Re: [Gluster-devel] regarding inode-unref on root inode

2014-06-24 Thread Raghavendra Bhat

On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote:

Does anyone know why inode_unref is a no-op for the root inode?

I see the following code in inode.c

 static inode_t *
 __inode_unref (inode_t *inode)
 {
         if (!inode)
                 return NULL;

         if (__is_root_gfid(inode->gfid))
                 return inode;
         ...
 }


I think it's done with the intention that the root inode should *never*
get removed from the active inodes list, not even accidentally. So an
unref on the root inode is a no-op. I don't know whether there are any
other reasons.
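
To illustrate the effect, here is a self-contained toy sketch (my example,
not GlusterFS code; toy_inode_t and toy_unref() are invented names): because
unref is a no-op for the root, callers can drop references unconditionally
without ever tearing down the table's root.

#include <stdio.h>

typedef struct {
        int refcount;
        int is_root;
} toy_inode_t;

static toy_inode_t *
toy_unref (toy_inode_t *inode)
{
        if (!inode)
                return NULL;
        if (inode->is_root)             /* mirrors __inode_unref(): never drop the root */
                return inode;
        if (--inode->refcount == 0)
                printf ("inode destroyed\n");
        return inode;
}

int
main (void)
{
        toy_inode_t root = { .refcount = 1, .is_root = 1 };

        toy_unref (&root);              /* no-op: the root stays on the active list */
        printf ("root refcount is still %d\n", root.refcount);
        return 0;
}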


Regards,
Raghavendra Bhat



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Data classification proposal

2014-06-24 Thread Krishnan Parthasarathi
Jeff,

- Original Message -
> > Am I right if I understood that the value for media-type is not
> > interpreted beyond the scope of matching rules? That is to say, we
> > don't need/have any notion of media-types that type check internally
> > for forming (sub)volumes using the rules specified.
> 
> Exactly.  To us it's just an opaque ID.

OK. That makes sense.

> 
> > Should the no. of bricks or lower-level subvolumes that match the rule
> > be an exact multiple of group-size?
> 
> Good question.  I think users see the current requirement to add bricks
> in multiples of the replica/stripe size as an annoyance.  This will only
> get worse with erasure coding where the group size is larger.  On the
> other hand, we do need to make sure that members of a group are on
> different machines.  This is why I think we need to be able to split
> bricks, so that we can use overlapping replica/erasure sets.  For
> example, if we have five bricks and two-way replication, we can split
> bricks to get a multiple of two and life's good again.  So *long term* I
> think we can/should remove any restriction on users, but there are a
> whole bunch of unsolved issues around brick splitting.  I'm not sure
> what to do in the short term.

For the short term, wouldn't it be OK to disallow adding a number of bricks
that is not a multiple of the group-size?

> 
> > > Here's a more complex example that adds replication and erasure
> > > coding to the mix.
> > >
> > > # Assume 20 hosts, four fast and sixteen slow (named
> > > appropriately).
> > >
> > > rule tier-1
> > > select *fast*
> > > group-size 2
> > > type cluster/afr
> > >
> > > rule tier-2
> > > # special pattern matching otherwise-unused bricks
> > > select %{unclaimed}
> > > group-size 8
> > > type cluster/ec parity=2
> > > # i.e. two groups, each six data plus two parity
> > >
> > > rule all
> > > select tier-1
> > > select tier-2
> > > type features/tiering
> > >
> >
> > In the above example we would have 2 subvolumes, each containing 2
> > bricks, that would be aggregated by rule tier-1. Let's call those
> > subvolumes tier-1-fast-0 and tier-1-fast-1.  Both of these subvolumes
> > are AFR-based two-way replicated subvolumes.  Are these instances of
> > tier-1-* composed using cluster/dht by the default semantics?
> 
> Yes.  Any time we have multiple subvolumes and no other specified way to
> combine them into one, we just slap DHT on top.  We do this already at
> the top level; with data classification we might do it at lower levels
> too.
> 

thanks,
Krish
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Addition of GlusterFS Port Maintainers

2014-06-24 Thread Anand Avati
On Tue, Jun 24, 2014 at 10:43 AM, Justin Clift  wrote:

> On 24/06/2014, at 6:34 PM, Vijay Bellur wrote:
> > Hi All,
> >
> > Since there has been traction for ports of GlusterFS to other unix
> distributions, we thought of adding maintainers for the various ports that
> are around. I am glad to announce that the following individuals who have
> been chugging GlusterFS along on those distributions have readily agreed to
> be port maintainers. Please welcome:
> >
> > 1. Emmanuel Dreyfus as maintainer for NetBSD
> >
> > 2. Harshavardhana and Dennis Schafroth for Mac OS X
> >
> > 3. Harshavardhana as interim maintainer for FreeBSD
> >
> > All port maintainers will have commit access to GlusterFS repository and
> will manage patches in gerrit that are necessary for keeping the ports
> functional. We believe that this effort will help in keeping releases on
> various ports up to date.
> >
> > Let us extend our co-operation to port maintainers and help evolve a
> > broader, more vibrant community for GlusterFS!
>
>
> Excellent stuff. :)
>
> + Justin
>

+1
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster on OSX

2014-06-24 Thread Justin Clift
On 23/05/2014, at 6:52 PM, Harshavardhana wrote:

>> Do you reckon we should get that Mac Mini in the Westford
>> lab set up to automatically test Gluster builds each
>> night or something?
>> 
>> If so, we should probably take/claim ownership of it,
>> upgrade the memory in it, and (possibly) see if it can be
>> put in the DMZ.
> 
> Up to you guys; it would be great. I am doing it manually for now, once
> every two days :-)


I've just ordered the RAM upgrade (16GB) for it.  Kaleb should
receive it in a week or so, and will be able to install it after
that. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-24 Thread Shyamsundar Ranganathan


- Original Message -
> From: "Anders Blomdell" 
> To: "Niels de Vos" 
> Cc: "Shyamsundar Ranganathan" , "Gluster Devel" 
> , "Susant Palai"
> 
> Sent: Tuesday, June 24, 2014 4:09:52 AM
> Subject: Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on 
> directories
> 
> 
> On 2014-06-23 12:03, Niels de Vos wrote:
> > On Tue, Jun 17, 2014 at 11:49:26AM -0400, Shyamsundar Ranganathan wrote:
> >> You maybe looking at the problem being fixed here, [1].
> >>
> >> On a lookup attribute mismatch was not being healed across
> >> directories, and this patch attempts to address the same. Currently
> >> the version of the patch does not heal the S_ISUID and S_ISGID bits,
> >> which is work in progress (but easy enough to incorporate and test
> >> based on the patch at [1]).
> >>
> >> On a separate note, add-brick just adds a brick to the cluster, the
> >> lookup is where the heal (or creation of the directory across all sub
> >> volumes in DHT xlator) is being done.
> >
> > I assume that this is not a regression between 3.5.0 and 3.5.1? If that
> > is the case, we can pull the fix in 3.5.2 because 3.5.1 really should
> > not get delayed much longer.
> No, it does not work in 3.5.0 either :-(

I ran these tests using your scripts and observed similar behavior; I need to
dig into this a little further to understand how to make this work reliably.

> 
> 
> The proposed patch does not work as intended, with the following hierarchy
> 
>    755    0:   0 /mnt/gluster
>   2777    0:1000 /mnt/gluster/test
>   2755 1000:1000 /mnt/gluster/test/dir1
>   2755 1000:1000 /mnt/gluster/test/dir1/dir2
> 
> In the roughly 25% of cases where my test-script does trigger a
> self-heal on disk2, 10% end up with the following (giving an access error on the client):
> 
>      0    0:   0 /data/disk2/gluster/test
>    755 1000:1000 /data/disk2/gluster/test/dir1
>    755 1000:1000 /data/disk2/gluster/test/dir1/dir2
> or
> 
>   2777    0:1000 /data/disk2/gluster/test
>      0    0:   0 /data/disk2/gluster/test/dir1
>    755 1000:1000 /data/disk2/gluster/test/dir1/dir2
> 
> or
> 
>   2777    0:1000 /data/disk2/gluster/test
>   2755 1000:1000 /data/disk2/gluster/test/dir1
>      0    0:   0 /data/disk2/gluster/test/dir1/dir2
> 
> 
> and 73% ends up with either partially healed directories
> (/data/disk2/gluster/test/dir1/dir2 or
>  /data/disk2/gluster/test/dir1 missing) or the sgid bit
> [randomly] set on some of the directories.
> 
> Since I don't even understand how to reliably trigger
> a self-heal of the directories, I'm currently clueless
> to the reason for this behaviour.
> 
> Soo, I think that the comment from susant in
> http://review.gluster.org/#/c/6983/3/xlators/cluster/dht/src/dht-common.c:
> 
>   susant palai  Jun 13 9:04 AM
> 
>I think we dont have to worry about that.
>Rebalance does not interfere with directory SUID/GID/STICKY bits.
> 
> unfortunately is wrong :-(, and I'm in too deep to understand how to
> fix this at the moment.

Just a note: rebalance is not run in the current test case, so the above
comment about rebalance refers to a different situation from what is observed here.

> 
> 
> N.B: with 00777 flags on the /mnt/gluster/test directory
> I have not been able to trigger any unreadable directories
> 
> /Anders
> 
> >
> > Thanks,
> > Niels
> >
> >>
> >> Shyam
> >>
> >> [1] http://review.gluster.org/#/c/6983/
> >>
> >> - Original Message -
> >>> From: "Anders Blomdell" 
> >>> To: "Gluster Devel" 
> >>> Sent: Tuesday, June 17, 2014 10:53:52 AM
> >>> Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on
> >>>   directories
> >>>
> > With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted
> > 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4
> > addresses), I get
> > weird behavior if I:
> >
> > 1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test)
> > 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
> > 3. Do an add-brick
> >
> > Before add-brick
> >
> >755 /mnt/gluster
> >   7775 /mnt/gluster/test
> >   2755 /mnt/gluster/test/dir1
> >
> > After add-brick
> >
> >755 /mnt/gluster
> >   1775 /mnt/gluster/test
> >755 /mnt/gluster/test/dir1
> >
> > On the server it looks like this:
> >
> >   7775 /data/disk1/gluster/test
> >   2755 /data/disk1/gluster/test/dir1
> >   1775 /data/disk2/gluster/test
> >755 /data/disk2/gluster/test/dir1
> >
> > Filed as bug:
> >
> >   https://bugzilla.redhat.com/show_bug.cgi?id=1110262
> >
> > If somebody can point me to where the logic of add-brick is placed, I can
> > give
> > it a shot (a find/grep on mkdir didn't immediately point me to the right
> > place).
> >
> >
> > Regards
> >
> > Anders Blomdell
> >
> >
> >
> >
> >>> ___
> >>> Gluster-devel mailing list
> >>> Gluster-devel@gluster.org
> >>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> >>>
> >> 

Re: [Gluster-devel] [Gluster-users] Addition of GlusterFS Port Maintainers

2014-06-24 Thread Justin Clift
On 24/06/2014, at 6:34 PM, Vijay Bellur wrote:
> Hi All,
> 
> Since there has been traction for ports of GlusterFS to other unix 
> distributions, we thought of adding maintainers for the various ports that 
> are around. I am glad to announce that the following individuals who have 
> been chugging GlusterFS along on those distributions have readily agreed to 
> be port maintainers. Please welcome:
> 
> 1. Emmanuel Dreyfus as maintainer for NetBSD
> 
> 2. Harshavardhana and Dennis Schafroth for Mac OS X
> 
> 3. Harshavardhana as interim maintainer for FreeBSD
> 
> All port maintainers will have commit access to GlusterFS repository and will 
> manage patches in gerrit that are necessary for keeping the ports functional. 
> We believe that this effort will help in keeping releases on various ports up 
> to date.
> 
> Let us extend our co-operation to port maintainers and help evolve a
> broader, more vibrant community for GlusterFS!


Excellent stuff. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Addition of GlusterFS Port Maintainers

2014-06-24 Thread Vijay Bellur

Hi All,

Since there has been traction for ports of GlusterFS to other unix 
distributions, we thought of adding maintainers for the various ports 
that are around. I am glad to announce that the following individuals 
who have been chugging GlusterFS along on those distributions have 
readily agreed to be port maintainers. Please welcome:


1. Emmanuel Dreyfus as maintainer for NetBSD

2. Harshavardhana and Dennis Schafroth for Mac OS X

3. Harshavardhana as interim maintainer for FreeBSD

All port maintainers will have commit access to GlusterFS repository and 
will manage patches in gerrit that are necessary for keeping the ports 
functional. We believe that this effort will help in keeping releases on 
various ports up to date.


Let us extend our co-operation to port maintainers and help evolve a
broader, more vibrant community for GlusterFS!


Cheers,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Data classification proposal

2014-06-24 Thread Jeff Darcy
> It's possible to express your example using lists if their entries are allowed
> to overlap. I see that you wanted a way to express a matrix (overlapping
> rules) with Gluster's tree-like syntax as a backdrop.
> 
> A polytree may be a better term than matrix (DAG without cycles), i.e. when
> there are overlaps a node in the graph gets multiple in-arcs.
> 
> Syntax aside, we seem to part on "where" to solve the problem: config file or
> UX. I prefer the UX have the logic to build the configuration file, given
> how complex it can be. My preference would be for the config file to be mostly
> "read only" with extremely simple syntax.
> 
> I'll put some more thought into this and believe this discussion has
> illuminated some good points.
> 
> Brick: host1:/SSD1  SSD1
> Brick: host1:/SSD2  SSD2
> Brick: host2:/SSD3  SSD3
> Brick: host2:/SSD4  SSD4
> Brick: host1:/DISK1 DISK1
> 
> rule rack4:
>   select SSD1, SSD2, DISK1
> 
> # some files should go on ssds in rack 4
> rule A:
>   option filter-condition *.lock
>   select SSD1, SSD2
> 
> # some files should go on ssds anywhere
> rule B:
>   option filter-condition *.out
>   select SSD1, SSD2, SSD3, SSD4
> 
> # some files should go anywhere in rack 4
> rule C
>   option filter-condition *.c
>   select rack4
> 
> # some files we just don't care
> rule D
>   option filter-condition *.h
>   select SSD1, SSD2, SSD3, SSD4, DISK1
> 
> volume:
>   option filter-condition A,B,C,D

This seems to leave us with two options.  One option is that "select"
supports only explicit enumeration, so that adding a brick means editing
multiple rules that apply to it.  The other option is that "select"
supports wildcards.  Using a regex to match parts of a name is
effectively the same as matching the explicit tags we started with,
except that expressing complex Boolean conditions using a regex can get
more than a bit messy.  As Jamie Zawinski famously said:

> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.

I think it's nice to support regexes instead of plain strings in
lower-level rules, but relying on them alone to express complex
higher-level policies would IMO be a mistake.  Likewise, defining a
proper syntax for a config file seems both more flexible and easier than
defining one for a CLI, where the parsing options are even more limited.
What happens when someone wants to use Puppet (for example) to set this
up?  Then the user would express their will in Puppet syntax, which
would have to convert it to our CLI syntax, which would convert it to
our config-file syntax.  Why not allow them to skip a step where
information might get lost or mangled in translation?  We can still have
CLI commands to do the most common kinds of manipulation, as we do for
volfiles, but the final form can be more extensible.  It will still be
more comprehensible than Ceph's CRUSH maps.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding inode-unref on root inode

2014-06-24 Thread Pranith Kumar Karampuri

Does anyone know why inode_unref is a no-op for the root inode?

I see the following code in inode.c

 static inode_t *
 __inode_unref (inode_t *inode)
 {
         if (!inode)
                 return NULL;

         if (__is_root_gfid(inode->gfid))
                 return inode;
         ...
 }

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Data classification proposal

2014-06-24 Thread Dan Lambright

It's possible to express your example using lists if their entries are allowed
to overlap. I see that you wanted a way to express a matrix (overlapping rules) 
with Gluster's tree-like syntax as a backdrop.

A polytree may be a better term than matrix (DAG without cycles), i.e. when 
there are overlaps a node in the graph gets multiple in-arcs.

Syntax aside, we seem to part on "where" to solve the problem: config file or
UX. I prefer the UX have the logic to build the configuration file, given how 
complex it can be. My preference would be for the config file to be mostly "read
only" with extremely simple syntax. 

I'll put some more thought into this and believe this discussion has 
illuminated some good points.

Brick: host1:/SSD1  SSD1
Brick: host1:/SSD2  SSD2
Brick: host2:/SSD3  SSD3
Brick: host2:/SSD4  SSD4
Brick: host1:/DISK1 DISK1

rule rack4: 
  select SSD1, SSD2, DISK1

# some files should go on ssds in rack 4
rule A: 
  option filter-condition *.lock
  select SSD1, SSD2

# some files should go on ssds anywhere
rule B: 
  option filter-condition *.out
  select SSD1, SSD2, SSD3, SSD4

# some files should go anywhere in rack 4
rule C 
  option filter-condition *.c
  select rack4

# some files we just don't care
rule D
  option filter-condition *.h
  select SSD1, SSD2, SSD3, SSD4, DISK1

volume:
  option filter-condition A,B,C,D

- Original Message -
From: "Jeff Darcy" 
To: "Dan Lambright" 
Cc: "Gluster Devel" 
Sent: Monday, June 23, 2014 7:11:44 PM
Subject: Re: [Gluster-devel] Data classification proposal

> Rather than using the keyword "unclaimed", my instinct was to
> explicitly list which bricks have not been "claimed".  Perhaps you
> have something more subtle in mind, it is not apparent to me from your
> response. Can you provide an example of why it is necessary and a list
> could not be provided in its place? If the list is somehow "difficult
> to figure out", due to a particularly complex setup or some such, I'd
> prefer a CLI/GUI build that list rather than having sysadmins
> hand-edit this file.

It's not *difficult* to make sure every brick has been enumerated by
some rule, and that there are no overlaps, but it's certainly tedious
and error prone.  Imagine that a user has four has bricks in four
machines, using names like serv1-b1, serv1-b2, ..., serv4-b6.
Accordingly, they've set up rules to put serv1* into one set and
serv[234]* into another set (which is already more flexibility than I
think your proposal gave them).  Now when they add serv5 they need an
extra step to add it to the tiering config, which wouldn't have been
necessary if we supported defaults.  What percentage of users would
forget that step at least once?  I don't know for sure, but I'd guess
it's pretty high.

Having a CLI or GUI create configs just means that we have to add
support for defaults there instead.  We'd still have to implement the
same logic, they'd still have to specify the same thing.  That just
seems like moving the problem around instead of solving it.

> The key-value piece seems like syntactic sugar - an "alias". If so,
> let the name itself be the alias. No notions of SSD or physical
> location need be inserted. Unless I am missing that it *is* necessary,
> I stand by that value judgement as a philosophy of not putting
> anything into the configuration file that you don't require. Can you
> provide an example of where it is necessary?

OK...
-


Brick: SSD1
Brick: SSD2
Brick: SSD3
Brick: SSD4
Brick: DISK1

rack4: SSD1, SSD2, DISK1

filter A : SSD1, SSD2

filter B : SSD1,SSD2, SSD3, SSD4

filter C: rack4

filter D: SSD1, SSD2, SSD3, SSD4, DISK1

meta-filter: filter A, filter B, filter C, filter D

  * some files should go on ssds in rack 4

  * some files should go on ssds anywhere

  * some files should go anywhere in rack 4

  * some files we just don't care

Notice how the rules *overlap*.  We can't support that if our syntax
only allows the user to express a list (or list of lists).  If the list
is ordered by type, we can't also support location-based rules.  If the
list is ordered by location, we lose type-based rules instead.   Brick
properties create a matrix, with an unknown number of dimensions (e.g.
security level, tenant ID, and so on as well as type and location).  The
logical way to represent such a space for rule-matching purposes is to
let users define however many dimensions (keys) as they want and as many
values for each dimension as they want.

Whether the exact string "type" or "unclaimed" appears anywhere isn't
the issue.  What matters is that the *semantics* of assigning properties
to a brick have to be more sophisticated than just assigning each a
position in a list, and we need a syntax that supports those semantics.
Otherwise we'll end up solving the same UX problems again and again each
time we add a feature that involves treating bricks or data differently.
Each time we'll probably do it a little differently and confuse users a
little more, if history is any guide.

Re: [Gluster-devel] [Gluster-users] Glusterfs Help needed

2014-06-24 Thread Niels de Vos
On Tue, Jun 24, 2014 at 04:45:30PM +0530, Chandrahasa S wrote:
> Dear All,
> 
> I am building GlusterFS on shared storage.
> 
> I have a disk array with 2 SAS controllers; one controller is connected to
> node A and the other to node B.
> 
> Can I create GlusterFS between these two nodes (A & B) without replication,
> but with data readable and writable on both nodes (for better performance)?
> In case node A fails, the data should be accessible from node B.

This does not sound like a use-case for GlusterFS. Gluster uses a local 
filesystem (like XFS) as backing storage, and that filesystem can only 
be mounted on one node (A or B) at the same time.

If you need a filesystem that can be mounted on two nodes (A and B) at 
the same time, you need to look at filesystems like GFS2.

HTH,
Niels

> 
> Please suggest.
> 
> Regards,
> Chandrahasa S
> Tata Consultancy Services
> Data Center- ( Non STPI)
> 2nd Pokharan Road,
> Subash Nagar ,
> Mumbai - 400601,Maharashtra
> India
> Ph:- +91 22 677-81825
> Buzz:- 4221825
> Mailto: chandrahas...@tcs.com
> Website: http://www.tcs.com
> 
> Experience certainty.   IT Services
> Business Solutions
> Consulting
> 
> 
> 
> 
> From:   jenk...@build.gluster.org (Gluster Build System)
> To: gluster-us...@gluster.org, gluster-devel@gluster.org
> Date:   06/24/2014 03:46 PM
> Subject:[Gluster-users] glusterfs-3.5.1 released
> Sent by:gluster-users-boun...@gluster.org
> 
> 
> 
> 
> 
> SRC: 
> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz
> 
> This release is made off jenkins-release-73
> 
> -- Gluster Build System
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 

> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Glusterfs Help needed

2014-06-24 Thread Chandrahasa S
Dear All,

I am building GlusterFS on shared storage.

I have a disk array with 2 SAS controllers; one controller is connected to
node A and the other to node B.

Can I create GlusterFS between these two nodes (A & B) without replication,
but with data readable and writable on both nodes (for better performance)?
In case node A fails, the data should be accessible from node B.

Please suggest.

Regards,
Chandrahasa S
Tata Consultancy Services
Data Center- ( Non STPI)
2nd Pokharan Road,
Subash Nagar ,
Mumbai - 400601,Maharashtra
India
Ph:- +91 22 677-81825
Buzz:- 4221825
Mailto: chandrahas...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting




From:   jenk...@build.gluster.org (Gluster Build System)
To: gluster-us...@gluster.org, gluster-devel@gluster.org
Date:   06/24/2014 03:46 PM
Subject:[Gluster-users] glusterfs-3.5.1 released
Sent by:gluster-users-boun...@gluster.org





SRC: 
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz

This release is made off jenkins-release-73

-- Gluster Build System
___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] glusterfs-3.5.1 released

2014-06-24 Thread Chandrahasa S
Dear All,

I am building GlusterFS on shared storage.

I have a disk array with 2 SAS controllers; one controller is connected to
node A and the other to node B.

Can I create GlusterFS between these two nodes (A & B) without replication,
but with data readable and writable on both nodes (for better performance)?
In case node A fails, the data should be accessible from node B.

Please suggest.

Regards,
Chandrahasa S
Tata Consultancy Services
Data Center- ( Non STPI)
2nd Pokharan Road,
Subash Nagar ,
Mumbai - 400601,Maharashtra
India
Ph:- +91 22 677-81825
Buzz:- 4221825
Mailto: chandrahas...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting




From:   jenk...@build.gluster.org (Gluster Build System)
To: gluster-us...@gluster.org, gluster-devel@gluster.org
Date:   06/24/2014 03:46 PM
Subject:[Gluster-users] glusterfs-3.5.1 released
Sent by:gluster-users-boun...@gluster.org





SRC: 
http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz

This release is made off jenkins-release-73

-- Gluster Build System
___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs-3.5.1 released

2014-06-24 Thread Gluster Build System


SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz

This release is made off jenkins-release-73

-- Gluster Build System
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

2014-06-24 Thread Vijaikumar M

Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll: 
edge triggered and multi-threaded epoll).
The testcase './tests/bugs/bug-873367.t' hangs with this fix (Please 
find the stack trace below).


In the code snippet below we found that 'SSL_pending' was returning 0.
I have added a condition here to return from the function when there is 
no data available.
Please suggest whether it is OK to do it this way, or whether we need to
restructure this function for multi-threaded epoll.



 178 static int
 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
 180 {
 ...
 211         switch (SSL_get_error(priv->ssl_ssl,r)) {
 212         case SSL_ERROR_NONE:
 213                 return r;
 214         case SSL_ERROR_WANT_READ:
 215                 if (SSL_pending(priv->ssl_ssl) == 0)
 216                         return r;
 217                 pfd.fd = priv->sock;
 ...
 221                 if (poll(&pfd,1,-1) < 0) {
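
A minimal hedged sketch (my own, not the actual GlusterFS patch; ssl_want_retry()
is an invented name) of the other direction this could take for edge-triggered,
multi-threaded epoll: instead of blocking in poll(), surface EAGAIN and let the
event loop re-drive the read once the socket really is readable.

#include <errno.h>
#include <openssl/ssl.h>

/* Sketch only: map an SSL_read()/SSL_write() result to either "done"
 * or "retry later" instead of blocking the handler thread in poll(). */
static int
ssl_want_retry (SSL *ssl, int ret)
{
        switch (SSL_get_error (ssl, ret)) {
        case SSL_ERROR_NONE:
                return ret;             /* call completed */
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
                errno = EAGAIN;         /* let the epoll loop retry later */
                return -1;
        default:
                return -1;              /* hard error */
        }
}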




Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request' is
waiting on a mutex lock. The lock is held by the function 'ssl_do', and that
function is blocked in a poll syscall.



(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at 
glusterfsd.c:2023



(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait 
() from /lib64/libpthread.so.0
  8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () 
from /lib64/libc.so.6
  3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () 
from /lib64/libpthread.so.0
* 1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in 
pthread_join () from /lib64/libpthread.so.0



(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  0x003daa80e264 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from 
/lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, 
prog=,
procnum=, cbkfn=0x7f3b892364b0 
, proghdr=0x7f3b8212f410,
proghdrcount=1, progpayload=0x0, progpayloadcount=0, iobref=<optimized out>, frame=0x7f3b93d2a454,
rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)

at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, 
req=,
frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, 
cbkfn=0x7f3b892364b0 , iobref=0x0,
rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,

xdrproc=0x7f3b94a4ede0 ) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
this=0x7f3b7c005ef0, data=0x7f3b8212f660)

at client-rpc-fops.c:3119


(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
1, __kind = 0, __spins = 0, __list = {

  __prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' 
, __align = 2}



(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  0x003daa4df343 in poll () from /lib64/libc.so.6

(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, 
buf=0x7f3b7c051264, len=4, func=0x3db2441570 )

at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<optimized out>, opvector=,

opcount=) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<optimized out>, vector=,
count=, pending_vector=0x7f3b7c051258, 
pending_count=0x7f3b7c051260, bytes=0x0, write=0)

at socket.c:422
#4  __socket_rwv (this=, vector=<optimized out>, count=,
pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, 
bytes=0x0, write=0) at socket.c:496
#5  

Re: [Gluster-devel] glusterfs-3.5.1 released

2014-06-24 Thread Humble Devassy Chirammal
Kudos to the folks behind this release !



On Tue, Jun 24, 2014 at 4:20 PM, Niels de Vos  wrote:

> On Tue, Jun 24, 2014 at 03:15:58AM -0700, Gluster Build System wrote:
> >
> >
> > SRC:
> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz
> >
> > This release is made off jenkins-release-73
>
> Many thanks to everyone who tested the glusterfs-3.5.1 beta releases and
> gave feedback. There were no regressions reported compared to the 3.5.0
> release.
>
> Many bugs have been fixed, and documentation for all new features in 3.5
> should be included now. Thanks to all the reporters, developers and
> testers for improving the 3.5 stable series.
>
> Below you will find the release notes in MarkDown format for
> glusterfs-3.5.1, these are included in the tar.gz as
> doc/release-notes/3.5.1.md. The mirror repository on GitHub provides
> a nicely rendered version:
> -
> https://github.com/gluster/glusterfs/blob/v3.5.1/doc/release-notes/3.5.1.md
>
> Packages for different Linux distributions will follow shortly.
> Notifications are normally sent to this list when the packages are
> available for download, and/or have reached the distributions update
> infrastructure.
>
> Changes for a new 3.5.2 release are now being accepted. The list of
> proposed fixes is already growing:
> -
> https://bugzilla.redhat.com/showdependencytree.cgi?hide_resolved=0&id=glusterfs-3.5.2
>
> Anyone is free to request a bugfix or backport for the 3.5.2 release. In
> order to do so, file a bug and set the 'blocked' field to
> 'glusterfs-3.5.2' so that we can track the requests. Use this link to
> make it a little easier for yourself:
> -
> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&blocked=glusterfs-3.5.2
>
> Cheers,
> Niels
>
>
>
> ## Release Notes for GlusterFS 3.5.1
>
> This is mostly a bugfix release. The [Release Notes for 3.5.0](3.5.0.md)
> contain a listing of all the new features that were added.
>
> There are two notable changes that are not only bug fixes, or documentation
> additions:
>
> 1. a new volume option `server.manage-gids` has been added
>This option should be used when users of a volume are in more than
>approximately 93 groups (Bug [1096425](
> https://bugzilla.redhat.com/1096425))
> 2. Duplicate Request Cache for NFS has now been disabled by default, this
> may
>reduce performance for certain workloads, but improves the overall
> stability
>and memory footprint for most users
>
> ### Bugs Fixed:
>
> * [765202](https://bugzilla.redhat.com/765202): lgetxattr called with
> invalid keys on the bricks
> * [833586](https://bugzilla.redhat.com/833586): inodelk hang from
> marker_rename_release_newp_lock
> * [859581](https://bugzilla.redhat.com/859581): self-heal process can
> sometimes create directories instead of symlinks for the root gfid file in
> .glusterfs
> * [986429](https://bugzilla.redhat.com/986429): Backupvolfile server
> option should work internal to GlusterFS framework
> * [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume
> heal info" should list the entries that actually required to be healed.
> * [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic
> Links
> * [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file
> self-heal there are warning messages reported in glustershd.log file
> * [1063190](https://bugzilla.redhat.com/1063190): Volume was not
> accessible after server side quorum was met
> * [1064096](https://bugzilla.redhat.com/1064096): The old Python
> Translator code (not Glupy) should be removed
> * [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a
> gluster mount with replica 3 (quorum-type auto) leads to a split-brain
> * [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS
> with mmap() on a sparse file created with open(), seek(), write()
> * [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal
> mismatching user extended attributes without any changelogs
> * [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked
> as versioned .so files, creating .so.0.0.0
> * [1086743](https://bugzilla.redhat.com/1086743): Add documentation for
> the Feature: RDMA-connection manager (RDMA-CM)
> * [1086748](https://bugzilla.redhat.com/1086748): Add documentation for
> the Feature: AFR CLI enhancements
> * [1086749](https://bugzilla.redhat.com/1086749): Add documentation for
> the Feature: Exposing Volume Capabilities
> * [1086750](https://bugzilla.redhat.com/1086750): Add documentation for
> the Feature: File Snapshots in GlusterFS
> * [1086751](https://bugzilla.redhat.com/1086751): Add documentation for
> the Feature: gfid-access
> * [1086752](https://bugzilla.redhat.com/1086752): Add documentation for
> the Feature: On-Wire Compression/Decompression
> * [1086754](https://bugzilla.redhat.com/1086754): Add documentation for
> the Feature: Quota Scalability
> * [1086755](https://bugzilla.redhat.com/1086755): Add documenta

Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool

2014-06-24 Thread Vijaikumar M

Hi Jeff,

I missed adding this:
SSL_pending was 0 before calling SSL_read, and hence SSL_get_error returned
'SSL_ERROR_WANT_READ'.


Thanks,
Vijay


On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:

Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ 
(epoll: edge triggered and multi-threaded epoll).
The testcase './tests/bugs/bug-873367.t' hangs with this fix (Please 
find the stack trace below).


In the code snippet below we found that 'SSL_pending' was returning 0.
I have added a condition here to return from the function when there 
is no data available.
Please suggest whether it is OK to do it this way, or whether we need to
restructure this function for multi-threaded epoll.



 178 static int
 179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
 180 {
 ...
 211         switch (SSL_get_error(priv->ssl_ssl,r)) {
 212         case SSL_ERROR_NONE:
 213                 return r;
 214         case SSL_ERROR_WANT_READ:
 215                 if (SSL_pending(priv->ssl_ssl) == 0)
 216                         return r;
 217                 pfd.fd = priv->sock;
 ...
 221                 if (poll(&pfd,1,-1) < 0) {




Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request' is
waiting on a mutex lock. The lock is held by the function 'ssl_do', and that
function is blocked in a poll syscall.



(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at 
glusterfsd.c:2023



(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  9 Thread 0x7f3b8ca82700 (LWP 26226) 0x003daa80f4b5 in sigwait 
() from /lib64/libpthread.so.0
  8 Thread 0x7f3b8c081700 (LWP 26227) 0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  7 Thread 0x7f3b8b680700 (LWP 26228) 0x003daa80b98e in 
pthread_cond_timedwait@@GLIBC_2.3.2 ()

   from /lib64/libpthread.so.0
  6 Thread 0x7f3b8a854700 (LWP 26232) 0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  5 Thread 0x7f3b89e53700 (LWP 26233) 0x003daa4e9163 in 
epoll_wait () from /lib64/libc.so.6
  4 Thread 0x7f3b833eb700 (LWP 26241) 0x003daa4df343 in poll () 
from /lib64/libc.so.6
  3 Thread 0x7f3b82130700 (LWP 26245) 0x003daa80e264 in 
__lll_lock_wait () from /lib64/libpthread.so.0
  2 Thread 0x7f3b8172f700 (LWP 26247) 0x003daa80e75d in read () 
from /lib64/libpthread.so.0
* 1 Thread 0x7f3b94a38700 (LWP 26224) 0x003daa80822d in 
pthread_join () from /lib64/libpthread.so.0



(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]#0  0x003daa80e264 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from 
/lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, 
req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, 
prog=,
procnum=, cbkfn=0x7f3b892364b0 
, proghdr=0x7f3b8212f410,
proghdrcount=1, progpayload=0x0, progpayloadcount=0, 
iobref=, frame=0x7f3b93d2a454,
rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
rsp_payload_count=0, rsp_iobref=0x7f3b700010d0)

at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, 
req=,
frame=0x7f3b93d2a454, prog=0x7f3b894525a0, procnum=27, 
cbkfn=0x7f3b892364b0 , iobref=0x0,
rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0, 
rsp_payload_count=0, rsp_iobref=0x7f3b700010d0,

xdrproc=0x7f3b94a4ede0 ) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, 
this=0x7f3b7c005ef0, data=0x7f3b8212f660)

at client-rpc-fops.c:3119


(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 
1, __kind = 0, __spins = 0, __list = {

  __prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", 
'\000' , __align = 2}



(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]#0  0x003daa4df343 in poll () from /lib64/libc.so.6

(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, 
buf=0x7f3b7c051264, len=4, func=0x3db2441570 )

at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<optimized out>, opvector=,

opcount=) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<optimized out>, vector=,
count=, pending_vector=0x7f3b7c051258, 
pending_count=0x7f3b7c051260, bytes=0x0, write

Re: [Gluster-devel] Data classification proposal

2014-06-24 Thread Jeff Darcy
> Am I right if I understood that the value for media-type is not
> interpreted beyond the scope of matching rules? That is to say, we
> don't need/have any notion of media-types that type check internally
> for forming (sub)volumes using the rules specified.

Exactly.  To us it's just an opaque ID.

> Should the no. of bricks or lower-level subvolumes that match the rule
> be an exact multiple of group-size?

Good question.  I think users see the current requirement to add bricks
in multiples of the replica/stripe size as an annoyance.  This will only
get worse with erasure coding where the group size is larger.  On the
other hand, we do need to make sure that members of a group are on
different machines.  This is why I think we need to be able to split
bricks, so that we can use overlapping replica/erasure sets.  For
example, if we have five bricks and two-way replication, we can split
bricks to get a multiple of two and life's good again.  So *long term* I
think we can/should remove any restriction on users, but there are a
whole bunch of unsolved issues around brick splitting.  I'm not sure
what to do in the short term.
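
To make the five-brick case concrete, here is a small illustrative sketch (my
example, not the proposal's actual splitting algorithm): split each brick i
into halves i-a and i-b, then chain the halves so every replica pair spans two
different bricks.

#include <stdio.h>

/* Five bricks, two-way replication: pairing half i-b with half (i+1)-a
 * (wrapping around) yields five replica pairs, each spanning two different
 * bricks/hosts.  This is just one possible pairing, for illustration. */
int
main (void)
{
        int n = 5;

        for (int i = 0; i < n; i++)
                printf ("replica pair %d: brick%d-b + brick%d-a\n",
                        i, i + 1, ((i + 1) % n) + 1);
        return 0;
}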

> > Here's a more complex example that adds replication and erasure
> > coding to the mix.
> >
> > # Assume 20 hosts, four fast and sixteen slow (named
> > appropriately).
> >
> > rule tier-1
> > select *fast*
> > group-size 2
> > type cluster/afr
> >
> > rule tier-2
> > # special pattern matching otherwise-unused bricks
> > select %{unclaimed}
> > group-size 8
> > type cluster/ec parity=2
> > # i.e. two groups, each six data plus two parity
> >
> > rule all
> > select tier-1
> > select tier-2
> > type features/tiering
> >
>
> In the above example we would have 2 subvolumes, each containing 2
> bricks, that would be aggregated by rule tier-1. Let's call those
> subvolumes tier-1-fast-0 and tier-1-fast-1.  Both of these subvolumes
> are AFR-based two-way replicated subvolumes.  Are these instances of
> tier-1-* composed using cluster/dht by the default semantics?

Yes.  Any time we have multiple subvolumes and no other specified way to
combine them into one, we just slap DHT on top.  We do this already at
the top level; with data classification we might do it at lower levels
too.
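
For illustration, a hedged volfile-style sketch of that composition for the
tier-1 example (the names are invented, and the brick-0 .. brick-3 client
subvolumes are assumed to be defined elsewhere in the volfile):

volume tier-1-fast-0
    type cluster/replicate
    subvolumes brick-0 brick-1
end-volume

volume tier-1-fast-1
    type cluster/replicate
    subvolumes brick-2 brick-3
end-volume

# nothing else combines the two replica sets, so DHT goes on top
volume tier-1
    type cluster/distribute
    subvolumes tier-1-fast-0 tier-1-fast-1
end-volume
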
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] glusterfs-3.5.1 released

2014-06-24 Thread Niels de Vos
On Tue, Jun 24, 2014 at 03:15:58AM -0700, Gluster Build System wrote:
> 
> 
> SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz
> 
> This release is made off jenkins-release-73

Many thanks to everyone who tested the glusterfs-3.5.1 beta releases and 
gave feedback. There were no regressions reported compared to the 3.5.0 
release.

Many bugs have been fixed, and documentation for all new features in 3.5 
should be included now. Thanks to all the reporters, developers and 
testers for improving the 3.5 stable series.

Below you will find the release notes in MarkDown format for 
glusterfs-3.5.1, these are included in the tar.gz as
doc/release-notes/3.5.1.md. The mirror repository on GitHub provides 
a nicely rendered version:
- https://github.com/gluster/glusterfs/blob/v3.5.1/doc/release-notes/3.5.1.md

Packages for different Linux distributions will follow shortly.  
Notifications are normally sent to this list when the packages are 
available for download, and/or have reached the distributions update 
infrastructure.

Changes for a new 3.5.2 release are now being accepted. The list of 
proposed fixes is already growing:
- 
https://bugzilla.redhat.com/showdependencytree.cgi?hide_resolved=0&id=glusterfs-3.5.2

Anyone is free to request a bugfix or backport for the 3.5.2 release. In 
order to do so, file a bug and set the 'blocked' field to 
'glusterfs-3.5.2' so that we can track the requests. Use this link to 
make it a little easier for yourself:
- 
https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&blocked=glusterfs-3.5.2

Cheers,
Niels



## Release Notes for GlusterFS 3.5.1

This is mostly a bugfix release. The [Release Notes for 3.5.0](3.5.0.md)
contain a listing of all the new features that were added.

There are two notable changes that are not only bug fixes, or documentation
additions:

1. a new volume option `server.manage-gids` has been added
   This option should be used when users of a volume are in more than
   approximately 93 groups (Bug [1096425](https://bugzilla.redhat.com/1096425))
2. Duplicate Request Cache for NFS has now been disabled by default, this may
   reduce performance for certain workloads, but improves the overall stability
   and memory footprint for most users

### Bugs Fixed:

* [765202](https://bugzilla.redhat.com/765202): lgetxattr called with invalid 
keys on the bricks
* [833586](https://bugzilla.redhat.com/833586): inodelk hang from 
marker_rename_release_newp_lock
* [859581](https://bugzilla.redhat.com/859581): self-heal process can sometimes 
create directories instead of symlinks for the root gfid file in .glusterfs
* [986429](https://bugzilla.redhat.com/986429): Backupvolfile server option 
should work internal to GlusterFS framework
* [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume heal 
info" should list the entries that actually required to be healed.
* [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic Links
* [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file 
self-heal there are warning messages reported in glustershd.log file
* [1063190](https://bugzilla.redhat.com/1063190): Volume was not accessible 
after server side quorum was met
* [1064096](https://bugzilla.redhat.com/1064096): The old Python Translator 
code (not Glupy) should be removed
* [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a gluster 
mount with replica 3 (quorum-type auto) leads to a split-brain
* [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS with 
mmap() on a sparse file created with open(), seek(), write()
* [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal 
mismatching user extended attributes without any changelogs
* [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked as 
versioned .so files, creating .so.0.0.0
* [1086743](https://bugzilla.redhat.com/1086743): Add documentation for the 
Feature: RDMA-connection manager (RDMA-CM)
* [1086748](https://bugzilla.redhat.com/1086748): Add documentation for the 
Feature: AFR CLI enhancements
* [1086749](https://bugzilla.redhat.com/1086749): Add documentation for the 
Feature: Exposing Volume Capabilities
* [1086750](https://bugzilla.redhat.com/1086750): Add documentation for the 
Feature: File Snapshots in GlusterFS
* [1086751](https://bugzilla.redhat.com/1086751): Add documentation for the 
Feature: gfid-access
* [1086752](https://bugzilla.redhat.com/1086752): Add documentation for the 
Feature: On-Wire Compression/Decompression
* [1086754](https://bugzilla.redhat.com/1086754): Add documentation for the 
Feature: Quota Scalability
* [1086755](https://bugzilla.redhat.com/1086755): Add documentation for the 
Feature: readdir-ahead
* [1086756](https://bugzilla.redhat.com/1086756): Add documentation for the 
Feature: zerofill API for GlusterFS
* [1086758](https://bugzilla.redhat.com/1086758): Add documentation for the 
Feature: Changelog based parallel 

Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-24 Thread Anders Blomdell

On 2014-06-23 12:03, Niels de Vos wrote:
> On Tue, Jun 17, 2014 at 11:49:26AM -0400, Shyamsundar Ranganathan wrote:
>> You maybe looking at the problem being fixed here, [1].
>>
>> On a lookup attribute mismatch was not being healed across
>> directories, and this patch attempts to address the same. Currently
>> the version of the patch does not heal the S_ISUID and S_ISGID bits,
>> which is work in progress (but easy enough to incorporate and test
>> based on the patch at [1]).
>>
>> On a separate note, add-brick just adds a brick to the cluster, the
>> lookup is where the heal (or creation of the directory across all sub
>> volumes in DHT xlator) is being done.
>
> I assume that this is not a regression between 3.5.0 and 3.5.1? If that
> is the case, we can pull the fix in 3.5.2 because 3.5.1 really should
> not get delayed much longer.
No, it does not work in 3.5.0 either :-(


The proposed patch does not work as intended, with the following hierarchy

   755    0:   0 /mnt/gluster
  2777    0:1000 /mnt/gluster/test
  2755 1000:1000 /mnt/gluster/test/dir1
  2755 1000:1000 /mnt/gluster/test/dir1/dir2

In the roughly 25% of cases where my test-script does trigger a
self-heal on disk2, 10% end up with the following (giving an access error on the client):

     0    0:   0 /data/disk2/gluster/test
   755 1000:1000 /data/disk2/gluster/test/dir1
   755 1000:1000 /data/disk2/gluster/test/dir1/dir2
or

  2777    0:1000 /data/disk2/gluster/test
     0    0:   0 /data/disk2/gluster/test/dir1
   755 1000:1000 /data/disk2/gluster/test/dir1/dir2

or

  2777    0:1000 /data/disk2/gluster/test
  2755 1000:1000 /data/disk2/gluster/test/dir1
     0    0:   0 /data/disk2/gluster/test/dir1/dir2


and 73% ends up with either partially healed directories
(/data/disk2/gluster/test/dir1/dir2 or
 /data/disk2/gluster/test/dir1 missing) or the sgid bit
[randomly] set on some of the directories.

Since I don't even understand how to reliably trigger
a self-heal of the directories, I'm currently clueless
to the reason for this behaviour.

Soo, I think that the comment from susant in
http://review.gluster.org/#/c/6983/3/xlators/cluster/dht/src/dht-common.c:

  susant palai  Jun 13 9:04 AM

   I think we dont have to worry about that.
   Rebalance does not interfere with directory SUID/GID/STICKY bits.

unfortunately is wrong :-(, and I'm in too deep to understand how to
fix this at the moment.


N.B: with 00777 flags on the /mnt/gluster/test directory
I have not been able to trigger any unreadable directories

/Anders

>
> Thanks,
> Niels
>
>>
>> Shyam
>>
>> [1] http://review.gluster.org/#/c/6983/
>>
>> - Original Message -
>>> From: "Anders Blomdell" 
>>> To: "Gluster Devel" 
>>> Sent: Tuesday, June 17, 2014 10:53:52 AM
>>> Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on
>>> directories
>>>
> With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted
> 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4
> addresses), I get
> weird behavior if I:
>
> 1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test)
> 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
> 3. Do an add-brick
>
> Before add-brick
>
>755 /mnt/gluster
>   7775 /mnt/gluster/test
>   2755 /mnt/gluster/test/dir1
>
> After add-brick
>
>755 /mnt/gluster
>   1775 /mnt/gluster/test
>755 /mnt/gluster/test/dir1
>
> On the server it looks like this:
>
>   7775 /data/disk1/gluster/test
>   2755 /data/disk1/gluster/test/dir1
>   1775 /data/disk2/gluster/test
>755 /data/disk2/gluster/test/dir1
>
> Filed as bug:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=1110262
>
> If somebody can point me to where the logic of add-brick is placed, I can
> give
> it a shot (a find/grep on mkdir didn't immediately point me to the right
> place).
>
>
> Regards
>
> Anders Blomdell
>
>
>
>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

-- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden


Re: [Gluster-devel] Data classification proposal

2014-06-24 Thread Krishnan Parthasarathi
Jeff,

I have a few questions regarding the rules syntax and how the rules apply.
I think this is different in spirit from the discussion Dan has started,
so I am keeping it separate. See my questions inline.

- Original Message -
> One of the things holding up our data classification efforts (which include
> tiering but also other stuff as well) has been the extension of the same
> conceptual model from the I/O path to the configuration subsystem and
> ultimately to the user experience.  How does an administrator define a
> tiering policy without tearing their hair out?  How does s/he define a mixed
> replication/erasure-coding setup without wanting to rip *our* hair out?  The
> included Markdown document attempts to remedy this by proposing one out of
> many possible models and user interfaces.  It includes examples for some of
> the most common use cases, including the "replica 2.5" case we've been
> discussing recently.  Constructive feedback would be greatly appreciated.
> 
> 
> 
> # Data Classification Interface
> 
> The data classification feature is extremely flexible, to cover use cases
> from
> SSD/disk tiering to rack-aware placement to security or other policies.  With
> this flexibility comes complexity.  While this complexity does not affect the
> I/O path much, it does affect both the volume-configuration subsystem and the
> user interface to set placement policies.  This document describes one
> possible
> model and user interface.
> 
> The model we used is based on two kinds of information: brick descriptions
> and
> aggregation rules.  Both are contained in a configuration file (format TBD)
> which can be associated with a volume using a volume option.
> 
> ## Brick Descriptions
> 
> A brick is described by a series of simple key/value pairs.  Predefined keys
> include:
> 
>  * **media-type**
>The underlying media type for the brick.  In its simplest form this might
>just be *ssd* or *disk*.  More sophisticated users might use something
>like
>*15krpm* to represent a faster disk, or *perc-raid5* to represent a brick
>backed by a RAID controller.

Am I right if I understood that the value for media-type is not interpreted 
beyond the
scope of matching rules? That is to say, we don't need/have any notion of 
media-types
that type check internally for forming (sub)volumes using the rules specified.

> 
>  * **rack** (and/or **row**)
>The physical location of the brick.  Some policy rules might be set up to
>spread data across more than one rack.
> 
> User-defined keys are also allowed.  For example, some users might use a
> *tenant* or *security-level* tag as the basis for their placement policy.
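
As a purely hypothetical sketch (the proposal leaves the file format TBD, so
the stanza layout and the values below are invented; only the key names
media-type and rack, plus a user-defined tenant key, come from the description
above), brick descriptions might look something like:

brick host1:/bricks/ssd0
    media-type ssd
    rack       4
    tenant     web-team

brick host2:/bricks/disk3
    media-type 15krpm
    rack       7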
> 
> ## Aggregation Rules
> 
> Aggregation rules are used to define how bricks should be combined into
> subvolumes, and those potentially combined into higher-level subvolumes, and
> so
> on until all of the bricks are accounted for.  Each aggregation rule consists
> of the following parts:
> 
>  * **id**
>The base name of the subvolumes the rule will create.  If a rule is
>applied
>multiple times this will yield *id-0*, *id-1*, and so on.
> 
>  * **selector**
>A "filter" for which bricks or lower-level subvolumes the rule will
>aggregate.  This is an expression similar to a *WHERE* clause in SQL,
>using
>brick/subvolume names and properties in lieu of columns.  These values are
>then matched against literal values or regular expressions, using the
>usual
>set of boolean operators to arrive at a *yes* or *no* answer to the
>question
>of whether this brick/subvolume is affected by this rule.
> 
>  * **group-size** (optional)
>The number of original bricks/subvolumes to be combined into each produced
>subvolume.  The special default value zero means to collect all original
>bricks or subvolumes into one final subvolume.  In this case, *id* is used
>directly instead of having a numeric suffix appended.

Should the no. of bricks or lower-level subvolumes that match the rule be an 
exact
multiple of group-size?

> 
>  * **type** (optional)
>The type of the generated translator definition(s).  Examples might
>include
>"AFR" to do replication, "EC" to do erasure coding, and so on.  The more
>general data classification task includes the definition of new
>translators
>to do tiering and other kinds of filtering, but those are beyond the scope
>of this document.  If no type is specified, cluster/dht will be used to do
>random placement among its constituents.
> 
>  * **tag** and **option** (optional, repeatable)
>Additional tags and/or options to be applied to each newly created
>subvolume.  See the "replica 2.5" example to see how this can be used.
> 
> Since each type might have unique requirements, such as ensuring that
> replication is done across machines or racks whenever possible, it is assumed
> that there will be corresponding type-specific scripts or functions to do the
> a