Re: [Gluster-devel] regarding inode-unref on root inode
On Tuesday 24 June 2014 08:17 PM, Pranith Kumar Karampuri wrote:
> Does anyone know why inode_unref is a no-op for the root inode? I see the
> following code in inode.c:
>
>     static inode_t *
>     __inode_unref (inode_t *inode)
>     {
>             if (!inode)
>                     return NULL;
>
>             if (__is_root_gfid(inode->gfid))
>                     return inode;
>             ...
>     }

I think it's done with the intention that the root inode should *never* get
removed from the active inodes list (not even accidentally), so an unref on
the root inode is a no-op. I don't know whether there are any other reasons.

Regards,
Raghavendra Bhat
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Data classification proposal
Jeff,

----- Original Message -----
> > Am I right if I understood that the value for media-type is not
> > interpreted beyond the scope of matching rules? That is to say, we
> > don't need/have any notion of media-types that type check internally
> > for forming (sub)volumes using the rules specified.
>
> Exactly. To us it's just an opaque ID.

OK. That makes sense.

> > Should the no. of bricks or lower-level subvolumes that match the rule
> > be an exact multiple of group-size?
>
> Good question. I think users see the current requirement to add bricks
> in multiples of the replica/stripe size as an annoyance. This will only
> get worse with erasure coding, where the group size is larger. On the
> other hand, we do need to make sure that members of a group are on
> different machines. This is why I think we need to be able to split
> bricks, so that we can use overlapping replica/erasure sets. For
> example, if we have five bricks and two-way replication, we can split
> bricks to get a multiple of two and life's good again. So *long term* I
> think we can/should remove any restriction on users, but there are a
> whole bunch of unsolved issues around brick splitting. I'm not sure
> what to do in the short term.

For the short term, wouldn't it be OK to disallow adding a number of bricks
that is not a multiple of the group size?

> > > Here's a more complex example that adds replication and erasure
> > > coding to the mix.
> > >
> > > # Assume 20 hosts, four fast and sixteen slow (named appropriately).
> > >
> > > rule tier-1
> > >     select *fast*
> > >     group-size 2
> > >     type cluster/afr
> > >
> > > rule tier-2
> > >     # special pattern matching otherwise-unused bricks
> > >     select %{unclaimed}
> > >     group-size 8
> > >     type cluster/ec parity=2
> > >     # i.e. two groups, each six data plus two parity
> > >
> > > rule all
> > >     select tier-1
> > >     select tier-2
> > >     type features/tiering
>
> > In the above example we would have 2 subvolumes, each containing 2
> > bricks, that would be aggregated by rule tier-1. Let's call those
> > subvolumes tier-1-fast-0 and tier-1-fast-1. Both of these subvolumes
> > are AFR-based two-way replicated subvolumes. Are these instances of
> > tier-1-* composed using cluster/dht by the default semantics?
>
> Yes. Any time we have multiple subvolumes and no other specified way to
> combine them into one, we just slap DHT on top. We do this already at
> the top level; with data classification we might do it at lower levels
> too.

thanks,
Krish
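The "slap DHT on top" composition Jeff describes could yield a graph shaped roughly like the following hand-written sketch in volfile syntax. The tier-1-fast-* names come from the thread; this is an illustration, not actual volgen output.

```
volume tier-1-fast-0
    type cluster/afr
    subvolumes fast-brick-1 fast-brick-2
end-volume

volume tier-1-fast-1
    type cluster/afr
    subvolumes fast-brick-3 fast-brick-4
end-volume

# With no other specified way to combine the two replica sets,
# DHT is placed on top to form the tier-1 subvolume.
volume tier-1
    type cluster/dht
    subvolumes tier-1-fast-0 tier-1-fast-1
end-volume
```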
Re: [Gluster-devel] [Gluster-users] Addition of GlusterFS Port Maintainers
On Tue, Jun 24, 2014 at 10:43 AM, Justin Clift wrote:
> On 24/06/2014, at 6:34 PM, Vijay Bellur wrote:
> > Hi All,
> >
> > Since there has been traction for ports of GlusterFS to other Unix
> > distributions, we thought of adding maintainers for the various ports
> > that are around. I am glad to announce that the following individuals,
> > who have been chugging GlusterFS along on those distributions, have
> > readily agreed to be port maintainers. Please welcome:
> >
> > 1. Emmanuel Dreyfus as maintainer for NetBSD
> > 2. Harshavardhana and Dennis Schafroth for Mac OS X
> > 3. Harshavardhana as interim maintainer for FreeBSD
> >
> > All port maintainers will have commit access to the GlusterFS repository
> > and will manage patches in Gerrit that are necessary for keeping the
> > ports functional. We believe that this effort will help in keeping
> > releases on various ports up to date.
> >
> > Let us extend our co-operation to port maintainers and help evolve a
> > broader, more vibrant community for GlusterFS!
>
> Excellent stuff. :)
>
> + Justin

+1
Re: [Gluster-devel] Gluster on OSX
On 23/05/2014, at 6:52 PM, Harshavardhana wrote:
>> Do you reckon we should get that Mac Mini in the Westford
>> lab set up to automatically test Gluster builds each
>> night or something?
>>
>> If so, we should probably take/claim ownership of it,
>> upgrade the memory in it, and (possibly) see if it can be
>> put in the DMZ.
>
> Up to you guys, it would be great. I am doing it manually for now, once
> in 2 days :-)

I've just ordered the RAM upgrade (16GB) for it. Kaleb should receive it
in a week or so, and will be able to install it after that. :)

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes,
and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
- Original Message - > From: "Anders Blomdell" > To: "Niels de Vos" > Cc: "Shyamsundar Ranganathan" , "Gluster Devel" > , "Susant Palai" > > Sent: Tuesday, June 24, 2014 4:09:52 AM > Subject: Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on > directories > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 2014-06-23 12:03, Niels de Vos wrote: > > On Tue, Jun 17, 2014 at 11:49:26AM -0400, Shyamsundar Ranganathan wrote: > >> You maybe looking at the problem being fixed here, [1]. > >> > >> On a lookup attribute mismatch was not being healed across > >> directories, and this patch attempts to address the same. Currently > >> the version of the patch does not heal the S_ISUID and S_ISGID bits, > >> which is work in progress (but easy enough to incorporate and test > >> based on the patch at [1]). > >> > >> On a separate note, add-brick just adds a brick to the cluster, the > >> lookup is where the heal (or creation of the directory across all sub > >> volumes in DHT xlator) is being done. > > > > I assume that this is not a regression between 3.5.0 and 3.5.1? If that > > is the case, we can pull the fix in 3.5.2 because 3.5.1 really should > > not get delayed much longer. > No, it does not work in 3.5.0 either :-( I ran these tests using your scripts and observed similar behavior and need to dig into this a little further to understand how to make this work reliably. 
> The proposed patch does not work as intended, with the following hierarchy:
>
>  755    0:0    /mnt/gluster
> 2777    0:1000 /mnt/gluster/test
> 2755 1000:1000 /mnt/gluster/test/dir1
> 2755 1000:1000 /mnt/gluster/test/dir1/dir2
>
> In the (approx 25%) of cases where my test script does trigger a
> self-heal on disk2, 10% end up with one of the following (giving an
> access error on the client):
>
>    0    0:0    /data/disk2/gluster/test
>  755 1000:1000 /data/disk2/gluster/test/dir1
>  755 1000:1000 /data/disk2/gluster/test/dir1/dir2
>
> or
>
> 2777    0:1000 /data/disk2/gluster/test
>    0    0:0    /data/disk2/gluster/test/dir1
>  755 1000:1000 /data/disk2/gluster/test/dir1/dir2
>
> or
>
> 2777    0:1000 /data/disk2/gluster/test
> 2755 1000:1000 /data/disk2/gluster/test/dir1
>    0    0:0    /data/disk2/gluster/test/dir1/dir2
>
> and 73% end up with either partially healed directories
> (/data/disk2/gluster/test/dir1/dir2 or /data/disk2/gluster/test/dir1
> missing) or the sgid bit [randomly] set on some of the directories.
>
> Since I don't even understand how to reliably trigger a self-heal of
> the directories, I'm currently clueless about the reason for this
> behaviour.
>
> So, I think that the comment from susant in
> http://review.gluster.org/#/c/6983/3/xlators/cluster/dht/src/dht-common.c:
>
>     susant palai  Jun 13 9:04 AM
>     I think we dont have to worry about that.
>     Rebalance does not interfere with directory SUID/GID/STICKY bits.
>
> unfortunately is wrong :-(, and I'm in too deep water to understand how
> to fix this at the moment.

Currently rebalance is not run in the test case, so the above comment in
relation to rebalance concerns something different from what is observed.
Just a note.
> > > N.B: with 00777 flags on the /mnt/gluster/test directory > I have not been able to trigger any unreadable directories > > /Anders > > > > > Thanks, > > Niels > > > >> > >> Shyam > >> > >> [1] http://review.gluster.org/#/c/6983/ > >> > >> - Original Message - > >>> From: "Anders Blomdell" > >>> To: "Gluster Devel" > >>> Sent: Tuesday, June 17, 2014 10:53:52 AM > >>> Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on > >>> directories > >>> > > With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted > > 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4 > > addresses), I get > > weird behavior if I: > > > > 1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test) > > 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1) > > 3. Do an add-brick > > > > Before add-brick > > > >755 /mnt/gluster > > 7775 /mnt/gluster/test > > 2755 /mnt/gluster/test/dir1 > > > > After add-brick > > > >755 /mnt/gluster > > 1775 /mnt/gluster/test > >755 /mnt/gluster/test/dir1 > > > > On the server it looks like this: > > > > 7775 /data/disk1/gluster/test > > 2755 /data/disk1/gluster/test/dir1 > > 1775 /data/disk2/gluster/test > >755 /data/disk2/gluster/test/dir1 > > > > Filed as bug: > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1110262 > > > > If somebody can point me to where the logic of add-brick is placed, I can > > give > > it a shot (a find/grep on mkdir didn't immediately point me to the right > > place). > > > > > > Regards > > > > Anders Blomdell > > > > > > > > > >>> ___ > >>> Gluster-devel mailing list > >>> Gluster-devel@gluster.org > >>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel > >>> > >>
Re: [Gluster-devel] [Gluster-users] Addition of GlusterFS Port Maintainers
On 24/06/2014, at 6:34 PM, Vijay Bellur wrote:
> Hi All,
>
> Since there has been traction for ports of GlusterFS to other Unix
> distributions, we thought of adding maintainers for the various ports
> that are around. I am glad to announce that the following individuals,
> who have been chugging GlusterFS along on those distributions, have
> readily agreed to be port maintainers. Please welcome:
>
> 1. Emmanuel Dreyfus as maintainer for NetBSD
> 2. Harshavardhana and Dennis Schafroth for Mac OS X
> 3. Harshavardhana as interim maintainer for FreeBSD
>
> All port maintainers will have commit access to the GlusterFS repository
> and will manage patches in Gerrit that are necessary for keeping the
> ports functional. We believe that this effort will help in keeping
> releases on various ports up to date.
>
> Let us extend our co-operation to port maintainers and help evolve a
> broader, more vibrant community for GlusterFS!

Excellent stuff. :)

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes,
and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
[Gluster-devel] Addition of GlusterFS Port Maintainers
Hi All,

Since there has been traction for ports of GlusterFS to other Unix
distributions, we thought of adding maintainers for the various ports that
are around. I am glad to announce that the following individuals, who have
been chugging GlusterFS along on those distributions, have readily agreed
to be port maintainers. Please welcome:

1. Emmanuel Dreyfus as maintainer for NetBSD
2. Harshavardhana and Dennis Schafroth for Mac OS X
3. Harshavardhana as interim maintainer for FreeBSD

All port maintainers will have commit access to the GlusterFS repository
and will manage patches in Gerrit that are necessary for keeping the ports
functional. We believe that this effort will help in keeping releases on
various ports up to date.

Let us extend our co-operation to port maintainers and help evolve a
broader, more vibrant community for GlusterFS!

Cheers,
Vijay
Re: [Gluster-devel] Data classification proposal
> It's possible to express your example using lists if their entries are
> allowed to overlap. I see that you wanted a way to express a matrix
> (overlapping rules) with Gluster's tree-like syntax as backdrop.
>
> A polytree may be a better term than matrix (a DAG without cycles), i.e.
> when there are overlaps, a node in the graph gets multiple in-arcs.
>
> Syntax aside, we seem to part on "where" to solve the problem: config
> file or UX. I prefer that the UX have the logic to build the
> configuration file, given how complex it can be. My preference would be
> for the config file to be mostly "read only", with extremely simple
> syntax.
>
> I'll put some more thought into this and believe this discussion has
> illuminated some good points.
>
> Brick: host1:/SSD1  SSD1
> Brick: host1:/SSD2  SSD2
> Brick: host2:/SSD3  SSD3
> Brick: host2:/SSD4  SSD4
> Brick: host1:/DISK1 DISK1
>
> rule rack4:
>     select SSD1, SSD2, DISK1
>
> # some files should go on ssds in rack 4
> rule A:
>     option filter-condition *.lock
>     select SSD1, SSD2
>
> # some files should go on ssds anywhere
> rule B:
>     option filter-condition *.out
>     select SSD1, SSD2, SSD3, SSD4
>
> # some files should go anywhere in rack 4
> rule C:
>     option filter-condition *.c
>     select rack4
>
> # some files we just don't care about
> rule D:
>     option filter-condition *.h
>     select SSD1, SSD2, SSD3, SSD4, DISK1
>
> volume:
>     option filter-condition A,B,C,D

This seems to leave us with two options. One option is that "select"
supports only explicit enumeration, so that adding a brick means editing
multiple rules that apply to it. The other option is that "select"
supports wildcards. Using a regex to match parts of a name is effectively
the same as matching the explicit tags we started with, except that
expressing complex Boolean conditions using a regex can get more than a
bit messy. As Jamie Zawinski famously said:

> Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.
I think it's nice to support regexes instead of plain strings in
lower-level rules, but relying on them alone to express complex
higher-level policies would IMO be a mistake.

Likewise, defining a proper syntax for a config file seems both more
flexible and easier than defining one for a CLI, where the parsing options
are even more limited. What happens when someone wants to use Puppet (for
example) to set this up? Then the user would express their will in Puppet
syntax, which would have to convert it to our CLI syntax, which would
convert it to our config-file syntax. Why not allow them to skip a step
where information might get lost or mangled in translation? We can still
have CLI commands to do the most common kinds of manipulation, as we do
for volfiles, but the final form can be more extensible. It will still be
more comprehensible than Ceph's CRUSH maps.
[Gluster-devel] regarding inode-unref on root inode
Does anyone know why inode_unref is a no-op for the root inode? I see the
following code in inode.c:

    static inode_t *
    __inode_unref (inode_t *inode)
    {
            if (!inode)
                    return NULL;

            if (__is_root_gfid(inode->gfid))
                    return inode;
            ...
    }

Pranith
Re: [Gluster-devel] Data classification proposal
It's possible to express your example using lists if their entries are
allowed to overlap. I see that you wanted a way to express a matrix
(overlapping rules) with Gluster's tree-like syntax as backdrop.

A polytree may be a better term than matrix (a DAG without cycles), i.e.
when there are overlaps, a node in the graph gets multiple in-arcs.

Syntax aside, we seem to part on "where" to solve the problem: config file
or UX. I prefer that the UX have the logic to build the configuration
file, given how complex it can be. My preference would be for the config
file to be mostly "read only", with extremely simple syntax.

I'll put some more thought into this and believe this discussion has
illuminated some good points.

Brick: host1:/SSD1  SSD1
Brick: host1:/SSD2  SSD2
Brick: host2:/SSD3  SSD3
Brick: host2:/SSD4  SSD4
Brick: host1:/DISK1 DISK1

rule rack4:
    select SSD1, SSD2, DISK1

# some files should go on ssds in rack 4
rule A:
    option filter-condition *.lock
    select SSD1, SSD2

# some files should go on ssds anywhere
rule B:
    option filter-condition *.out
    select SSD1, SSD2, SSD3, SSD4

# some files should go anywhere in rack 4
rule C:
    option filter-condition *.c
    select rack4

# some files we just don't care about
rule D:
    option filter-condition *.h
    select SSD1, SSD2, SSD3, SSD4, DISK1

volume:
    option filter-condition A,B,C,D

----- Original Message -----
From: "Jeff Darcy"
To: "Dan Lambright"
Cc: "Gluster Devel"
Sent: Monday, June 23, 2014 7:11:44 PM
Subject: Re: [Gluster-devel] Data classification proposal

> Rather than using the keyword "unclaimed", my instinct was to
> explicitly list which bricks have not been "claimed". Perhaps you have
> something more subtle in mind; it is not apparent to me from your
> response. Can you provide an example of why it is necessary and a list
> could not be provided in its place? If the list is somehow "difficult
> to figure out", due to a particularly complex setup or some such, I'd
> prefer a CLI/GUI build that list rather than having sysadmins
> hand-edit this file.
It's not *difficult* to make sure every brick has been enumerated by some
rule, and that there are no overlaps, but it's certainly tedious and error
prone. Imagine that a user has bricks in four machines, using names like
serv1-b1, serv1-b2, ..., serv4-b6. Accordingly, they've set up rules to
put serv1* into one set and serv[234]* into another set (which is already
more flexibility than I think your proposal gave them). Now when they add
serv5 they need an extra step to add it to the tiering config, which
wouldn't have been necessary if we supported defaults. What percentage of
users would forget that step at least once? I don't know for sure, but I'd
guess it's pretty high.

Having a CLI or GUI create configs just means that we have to add support
for defaults there instead. We'd still have to implement the same logic,
and they'd still have to specify the same thing. That just seems like
moving the problem around instead of solving it.

> The key-value piece seems like syntactic sugar - an "alias". If so, let
> the name itself be the alias. No notions of SSD or physical location
> need be inserted. Unless I am missing that it *is* necessary, I stand by
> that value judgement as a philosophy of not putting anything into the
> configuration file that you don't require. Can you provide an example of
> where it is necessary?

OK...

Brick: SSD1
Brick: SSD2
Brick: SSD3
Brick: SSD4
Brick: DISK1

rack4: SSD1, SSD2, DISK1

filter A: SSD1, SSD2
filter B: SSD1, SSD2, SSD3, SSD4
filter C: rack4
filter D: SSD1, SSD2, SSD3, SSD4, DISK1

meta-filter: filter A, filter B, filter C, filter D

* some files should go on ssds in rack 4
* some files should go on ssds anywhere
* some files should go anywhere in rack 4
* some files we just don't care about

Notice how the rules *overlap*. We can't support that if our syntax only
allows the user to express a list (or list of lists). If the list is
ordered by type, we can't also support location-based rules.
If the list is ordered by location, we lose type-based rules instead.
Brick properties create a matrix with an unknown number of dimensions
(e.g. security level, tenant ID, and so on, as well as type and location).
The logical way to represent such a space for rule-matching purposes is to
let users define however many dimensions (keys) they want, and as many
values for each dimension as they want. Whether the exact string "type" or
"unclaimed" appears anywhere isn't the issue. What matters is that the
*semantics* of assigning properties to a brick have to be more
sophisticated than just assigning each a position in a list, and we need a
syntax that supports those semantics. Otherwise we'll end up solving the
same UX problems again and again each time we add a feature that involves
treating bricks or data differently. Each time we'll probably do it a
little differently and confuse users a little more, if history is any
guide.
Re: [Gluster-devel] [Gluster-users] Glusterfs Help needed
On Tue, Jun 24, 2014 at 04:45:30PM +0530, Chandrahasa S wrote: > Dear All, > > I am building Glusterfs on shared storage. > > I got Disk array with 2 SAS controller, one controller connected to node A > and other Node B. > > Can I create Glusterfs between these two node ( A & B) without > replication, but data should be read / write on both node ( for better > performance). In case of node A fail data should be accessed from node B. This does not sound like a use-case for GlusterFS. Gluster uses a local filesystem (like XFS) as backing storage, and that filesystem can only be mounted on one node (A or B) at the same time. If you need a filesystem that can be mounted at two nodes (A and B) at the same time, you need to look at filesystems like GFS2. HTH, Niels > > Please suggest. > > Regards, > Chandrahasa S > Tata Consultancy Services > Data Center- ( Non STPI) > 2nd Pokharan Road, > Subash Nagar , > Mumbai - 400601,Maharashtra > India > Ph:- +91 22 677-81825 > Buzz:- 4221825 > Mailto: chandrahas...@tcs.com > Website: http://www.tcs.com > > Experience certainty. IT Services > Business Solutions > Consulting > > > > > From: jenk...@build.gluster.org (Gluster Build System) > To: gluster-us...@gluster.org, gluster-devel@gluster.org > Date: 06/24/2014 03:46 PM > Subject:[Gluster-users] glusterfs-3.5.1 released > Sent by:gluster-users-boun...@gluster.org > > > > > > SRC: > http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz > > This release is made off jenkins-release-73 > > -- Gluster Build System > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users > > =-=-= > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. 
If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > > ___ > Gluster-users mailing list > gluster-us...@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Glusterfs Help needed
Dear All, I am building Glusterfs on shared storage. I got Disk array with 2 SAS controller, one controller connected to node A and other Node B. Can I create Glusterfs between these two node ( A & B) without replication, but data should be read / write on both node ( for better performance). In case of node A fail data should be accessed from node B. Please suggest. Regards, Chandrahasa S Tata Consultancy Services Data Center- ( Non STPI) 2nd Pokharan Road, Subash Nagar , Mumbai - 400601,Maharashtra India Ph:- +91 22 677-81825 Buzz:- 4221825 Mailto: chandrahas...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Consulting From: jenk...@build.gluster.org (Gluster Build System) To: gluster-us...@gluster.org, gluster-devel@gluster.org Date: 06/24/2014 03:46 PM Subject:[Gluster-users] glusterfs-3.5.1 released Sent by:gluster-users-boun...@gluster.org SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz This release is made off jenkins-release-73 -- Gluster Build System ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users =-=-= Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] glusterfs-3.5.1 released
SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz This release is made off jenkins-release-73 -- Gluster Build System ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll:
edge triggered and multi-threaded epoll). The testcase
'./tests/bugs/bug-873367.t' hangs with this fix (please find the stack
trace below). In the code snippet below we found that 'SSL_pending' was
returning 0. I have added a condition here to return from the function
when there is no data available. Please suggest whether it is OK to do it
this way, or whether we need to restructure this function for
multi-threaded epoll.

    static int
    ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
    {
            ...
            switch (SSL_get_error(priv->ssl_ssl, r)) {
            case SSL_ERROR_NONE:
                    return r;
            case SSL_ERROR_WANT_READ:
                    if (SSL_pending(priv->ssl_ssl) == 0)
                            return r;
                    pfd.fd = priv->sock;
                    ...
                    if (poll(&pfd, 1, -1) < 0) {
            ...

Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request'
is waiting on a mutex lock. The lock is held by the function 'ssl_do',
which is blocked in the poll syscall.
(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023

(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
   8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () from /lib64/libc.so.6
   3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () from /lib64/libpthread.so.0
*  1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<optimized out>, procnum=<optimized out>,
    cbkfn=0x7f3b892364b0, proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0, progpayloadcount=0,
    iobref=<optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0,
    rsp_payload_count=0, rsp_iobref=0x7f3b700010d0) at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<optimized out>, frame=0x7f3b93d2a454,
    prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0, iobref=0x0, rsphdr=0x7f3b8212f4c0, rsphdr_count=1,
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0, xdrproc=0x7f3b94a4ede0) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data=0x7f3b8212f660) at client-rpc-fops.c:3119

(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0,
      __list = {__prev = 0x0, __next = 0x0}},
      __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' , __align = 2}

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570) at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<optimized out>, opvector=<optimized out>, opcount=<optimized out>) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<optimized out>, vector=<optimized out>, count=<optimized out>,
    pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write=0) at socket.c:422
#4  __socket_rwv (this=<optimized out>, vector=<optimized out>, count=<optimized out>,
    pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write=0) at socket.c:496
#5
Re: [Gluster-devel] glusterfs-3.5.1 released
Kudos to the folks behind this release!

On Tue, Jun 24, 2014 at 4:20 PM, Niels de Vos wrote:
> On Tue, Jun 24, 2014 at 03:15:58AM -0700, Gluster Build System wrote:
> >
> > SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz
> >
> > This release is made off jenkins-release-73
>
> Many thanks to everyone who tested the glusterfs-3.5.1 beta releases and
> gave feedback. There were no regressions reported compared to the 3.5.0
> release.
>
> Many bugs have been fixed, and documentation for all new features in 3.5
> should be included now. Thanks to all the reporters, developers and
> testers for improving the 3.5 stable series.
>
> Below you will find the release notes in MarkDown format for
> glusterfs-3.5.1; these are included in the tar.gz as
> doc/release-notes/3.5.1.md. The mirror repository on GitHub provides
> a nicely rendered version:
> - https://github.com/gluster/glusterfs/blob/v3.5.1/doc/release-notes/3.5.1.md
>
> Packages for different Linux distributions will follow shortly.
> Notifications are normally sent to this list when the packages are
> available for download, and/or have reached the distributions' update
> infrastructure.
>
> Changes for a new 3.5.2 release are now being accepted. The list of
> proposed fixes is already growing:
> - https://bugzilla.redhat.com/showdependencytree.cgi?hide_resolved=0&id=glusterfs-3.5.2
>
> Anyone is free to request a bugfix or backport for the 3.5.2 release. In
> order to do so, file a bug and set the 'blocked' field to
> 'glusterfs-3.5.2' so that we can track the requests. Use this link to
> make it a little easier for yourself:
> - https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&blocked=glusterfs-3.5.2
>
> Cheers,
> Niels
>
> ## Release Notes for GlusterFS 3.5.1
>
> This is mostly a bugfix release. The [Release Notes for 3.5.0](3.5.0.md)
> contain a listing of all the new features that were added.
> > There are two notable changes that are not only bug fixes, or documentation > additions: > > 1. a new volume option `server.manage-gids` has been added >This option should be used when users of a volume are in more than >approximately 93 groups (Bug [1096425]( > https://bugzilla.redhat.com/1096425)) > 2. Duplicate Request Cache for NFS has now been disabled by default, this > may >reduce performance for certain workloads, but improves the overall > stability >and memory footprint for most users > > ### Bugs Fixed: > > * [765202](https://bugzilla.redhat.com/765202): lgetxattr called with > invalid keys on the bricks > * [833586](https://bugzilla.redhat.com/833586): inodelk hang from > marker_rename_release_newp_lock > * [859581](https://bugzilla.redhat.com/859581): self-heal process can > sometimes create directories instead of symlinks for the root gfid file in > .glusterfs > * [986429](https://bugzilla.redhat.com/986429): Backupvolfile server > option should work internal to GlusterFS framework > * [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume > heal info" should list the entries that actually required to be healed. 
> * [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic > Links > * [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file > self-heal there are warning messages reported in glustershd.log file > * [1063190](https://bugzilla.redhat.com/1063190): Volume was not > accessible after server side quorum was met > * [1064096](https://bugzilla.redhat.com/1064096): The old Python > Translator code (not Glupy) should be removed > * [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a > gluster mount with replica 3 (quorum-type auto) leads to a split-brain > * [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS > with mmap() on a sparse file created with open(), seek(), write() > * [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal > mismatching user extended attributes without any changelogs > * [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked > as versioned .so files, creating .so.0.0.0 > * [1086743](https://bugzilla.redhat.com/1086743): Add documentation for > the Feature: RDMA-connection manager (RDMA-CM) > * [1086748](https://bugzilla.redhat.com/1086748): Add documentation for > the Feature: AFR CLI enhancements > * [1086749](https://bugzilla.redhat.com/1086749): Add documentation for > the Feature: Exposing Volume Capabilities > * [1086750](https://bugzilla.redhat.com/1086750): Add documentation for > the Feature: File Snapshots in GlusterFS > * [1086751](https://bugzilla.redhat.com/1086751): Add documentation for > the Feature: gfid-access > * [1086752](https://bugzilla.redhat.com/1086752): Add documentation for > the Feature: On-Wire Compression/Decompression > * [1086754](https://bugzilla.redhat.com/1086754): Add documentation for > the Feature: Quota Scalability > * [1086755](https://bugzilla.redhat.com/1086755): Add documenta
Re: [Gluster-devel] Change in glusterfs[master]: epoll: Handle client and server FDs in a separate event pool
Hi Jeff,

I missed adding this: SSL_pending was 0 before calling SSL_read, and hence SSL_get_error returned 'SSL_ERROR_WANT_READ'.

Thanks,
Vijay

On Tuesday 24 June 2014 05:15 PM, Vijaikumar M wrote:
Hi Jeff,

This is regarding the patch http://review.gluster.org/#/c/3842/ (epoll: edge triggered and multi-threaded epoll). The testcase './tests/bugs/bug-873367.t' hangs with this fix (please find the stack trace below).

In the code snippet below we found that 'SSL_pending' was returning 0. I have added a condition here to return from the function when there is no data available. Please suggest whether it is OK to do it this way, or whether we need to restructure this function for multi-threaded epoll:

178 static int
179 ssl_do (rpc_transport_t *this, void *buf, size_t len, SSL_trinary_func *func)
180 {
211         switch (SSL_get_error(priv->ssl_ssl,r)) {
212         case SSL_ERROR_NONE:
213                 return r;
214         case SSL_ERROR_WANT_READ:
215                 if (SSL_pending(priv->ssl_ssl) == 0)
216                         return r;
217                 pfd.fd = priv->sock;
221                 if (poll(&pfd,1,-1) < 0) {

Thanks,
Vijay

On Tuesday 24 June 2014 03:55 PM, Vijaikumar M wrote:
From the stack trace we found that the function 'socket_submit_request' is waiting on a mutex lock. The lock is held by the function 'ssl_do', which is blocked in the poll syscall.
(gdb) bt
#0  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0
#1  0x7f3b94eea9d0 in event_dispatch_epoll (event_pool=<optimized out>) at event-epoll.c:632
#2  0x00407ecd in main (argc=4, argv=0x7fff160a4528) at glusterfsd.c:2023

(gdb) info threads
  10 Thread 0x7f3b8d483700 (LWP 26225)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   9 Thread 0x7f3b8ca82700 (LWP 26226)  0x003daa80f4b5 in sigwait () from /lib64/libpthread.so.0
   8 Thread 0x7f3b8c081700 (LWP 26227)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   7 Thread 0x7f3b8b680700 (LWP 26228)  0x003daa80b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
   6 Thread 0x7f3b8a854700 (LWP 26232)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   5 Thread 0x7f3b89e53700 (LWP 26233)  0x003daa4e9163 in epoll_wait () from /lib64/libc.so.6
   4 Thread 0x7f3b833eb700 (LWP 26241)  0x003daa4df343 in poll () from /lib64/libc.so.6
   3 Thread 0x7f3b82130700 (LWP 26245)  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
   2 Thread 0x7f3b8172f700 (LWP 26247)  0x003daa80e75d in read () from /lib64/libpthread.so.0
*  1 Thread 0x7f3b94a38700 (LWP 26224)  0x003daa80822d in pthread_join () from /lib64/libpthread.so.0

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f3b82130700 (LWP 26245))]
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x003daa80e264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003daa809508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003daa8093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7f3b8aa74524 in socket_submit_request (this=0x7f3b7c0505c0, req=0x7f3b8212f0b0) at socket.c:3134
#4  0x7f3b94c6b7d5 in rpc_clnt_submit (rpc=0x7f3b7c029ce0, prog=<optimized out>, procnum=<optimized out>,
    cbkfn=0x7f3b892364b0, proghdr=0x7f3b8212f410, proghdrcount=1, progpayload=0x0, progpayloadcount=0,
    iobref=<optimized out>, frame=0x7f3b93d2a454, rsphdr=0x7f3b8212f4c0, rsphdr_count=1, rsp_payload=0x0,
    rsp_payload_count=0, rsp_iobref=0x7f3b700010d0) at rpc-clnt.c:1556
#5  0x7f3b892243b0 in client_submit_request (this=0x7f3b7c005ef0, req=<optimized out>, frame=0x7f3b93d2a454,
    prog=0x7f3b894525a0, procnum=27, cbkfn=0x7f3b892364b0, iobref=0x0, rsphdr=0x7f3b8212f4c0, rsphdr_count=1,
    rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x7f3b700010d0, xdrproc=0x7f3b94a4ede0) at client.c:243
#6  0x7f3b8922fa42 in client3_3_lookup (frame=0x7f3b93d2a454, this=0x7f3b7c005ef0, data=0x7f3b8212f660) at client-rpc-fops.c:3119
(gdb) p priv->lock
$1 = {__data = {__lock = 2, __count = 0, __owner = 26241, __nusers = 1, __kind = 0, __spins = 0,
    __list = {__prev = 0x0, __next = 0x0}},
  __size = "\002\000\000\000\000\000\000\000\201f\000\000\001", '\000' , __align = 2}

(gdb) thread 4
[Switching to thread 4 (Thread 0x7f3b833eb700 (LWP 26241))]
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x003daa4df343 in poll () from /lib64/libc.so.6
#1  0x7f3b8aa71fff in ssl_do (this=0x7f3b7c0505c0, buf=0x7f3b7c051264, len=4, func=0x3db2441570) at socket.c:216
#2  0x7f3b8aa7277b in __socket_ssl_readv (this=<optimized out>, opvector=<optimized out>, opcount=<optimized out>) at socket.c:335
#3  0x7f3b8aa72c26 in __socket_cached_read (this=<optimized out>, vector=<optimized out>, count=<optimized out>,
    pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write=0) at socket.c:422
#4  __socket_rwv (this=<optimized out>, vector=<optimized out>, count=<optimized out>,
    pending_vector=0x7f3b7c051258, pending_count=0x7f3b7c051260, bytes=0x0, write
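The two backtraces show the stall pattern plainly: thread 4 holds the transport mutex inside ssl_do() while blocking in poll(), and thread 3's socket_submit_request() waits on that same mutex. The pattern can be reproduced in miniature with plain Python threads — a hedged illustration only, not GlusterFS code; the function names are hypothetical and a sleep stands in for the blocking poll():

```python
import threading
import time

lock = threading.Lock()  # stands in for priv->lock in socket.c

def ssl_do_like():
    # Holds the transport lock, then blocks "in poll()" for a long time
    # (here simulated with a sleep) without releasing the lock.
    with lock:
        time.sleep(2.0)

def submit_request_like(result):
    # socket_submit_request() tries to take the same lock and stalls;
    # we use a timeout so the demo terminates instead of deadlocking.
    acquired = lock.acquire(timeout=0.5)
    result.append(acquired)
    if acquired:
        lock.release()

reader = threading.Thread(target=ssl_do_like)
reader.start()
time.sleep(0.1)          # let the reader grab the lock first

result = []
writer = threading.Thread(target=submit_request_like, args=(result,))
writer.start()
writer.join()
print(result[0])         # False: the writer never got the lock
reader.join()
```

With an infinite wait (as in the real pthread_mutex_lock) the writer would hang exactly as thread 3 does in the trace, which is why returning early from ssl_do when SSL_pending reports no buffered data avoids holding the lock across poll().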
Re: [Gluster-devel] Data classification proposal
> Am I right if I understood that the value for media-type is not
> interpreted beyond the scope of matching rules? That is to say, we
> don't need/have any notion of media-types that type check internally
> for forming (sub)volumes using the rules specified.

Exactly. To us it's just an opaque ID.

> Should the no. of bricks or lower-level subvolumes that match the rule
> be an exact multiple of group-size?

Good question. I think users see the current requirement to add bricks
in multiples of the replica/stripe size as an annoyance. This will only
get worse with erasure coding, where the group size is larger. On the
other hand, we do need to make sure that members of a group are on
different machines. This is why I think we need to be able to split
bricks, so that we can use overlapping replica/erasure sets. For
example, if we have five bricks and two-way replication, we can split
bricks to get a multiple of two and life's good again. So *long term* I
think we can/should remove any restriction on users, but there are a
whole bunch of unsolved issues around brick splitting. I'm not sure
what to do in the short term.

> > Here's a more complex example that adds replication and erasure
> > coding to the mix.
> >
> > # Assume 20 hosts, four fast and sixteen slow (named
> > appropriately).
> >
> > rule tier-1
> >     select *fast*
> >     group-size 2
> >     type cluster/afr
> >
> > rule tier-2
> >     # special pattern matching otherwise-unused bricks
> >     select %{unclaimed}
> >     group-size 8
> >     type cluster/ec parity=2
> >     # i.e. two groups, each six data plus two parity
> >
> > rule all
> >     select tier-1
> >     select tier-2
> >     type features/tiering
>
> In the above example we would have 2 subvolumes, each containing 2
> bricks, that would be aggregated by rule tier-1. Let's call those
> subvolumes tier-1-fast-0 and tier-1-fast-1. Both of these subvolumes
> are AFR-based two-way replicated subvolumes.
> Are these instances of tier-1-* composed using cluster/dht by the
> default semantics?

Yes. Any time we have multiple subvolumes and no other specified way to combine them into one, we just slap DHT on top. We do this already at the top level; with data classification we might do it at lower levels too.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] glusterfs-3.5.1 released
On Tue, Jun 24, 2014 at 03:15:58AM -0700, Gluster Build System wrote:
>
> SRC: http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.5.1.tar.gz
>
> This release is made off jenkins-release-73

Many thanks to everyone who tested the glusterfs-3.5.1 beta releases and gave feedback. There were no regressions reported compared to the 3.5.0 release.

Many bugs have been fixed, and documentation for all new features in 3.5 should be included now. Thanks to all the reporters, developers and testers for improving the 3.5 stable series.

Below you will find the release notes in MarkDown format for glusterfs-3.5.1; these are included in the tar.gz as doc/release-notes/3.5.1.md. The mirror repository on GitHub provides a nicely rendered version:
- https://github.com/gluster/glusterfs/blob/v3.5.1/doc/release-notes/3.5.1.md

Packages for different Linux distributions will follow shortly. Notifications are normally sent to this list when the packages are available for download, and/or have reached the distributions' update infrastructure.

Changes for a new 3.5.2 release are now being accepted. The list of proposed fixes is already growing:
- https://bugzilla.redhat.com/showdependencytree.cgi?hide_resolved=0&id=glusterfs-3.5.2

Anyone is free to request a bugfix or backport for the 3.5.2 release. In order to do so, file a bug and set the 'blocked' field to 'glusterfs-3.5.2' so that we can track the requests. Use this link to make it a little easier for yourself:
- https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&version=3.5.1&blocked=glusterfs-3.5.2

Cheers,
Niels

## Release Notes for GlusterFS 3.5.1

This is mostly a bugfix release. The [Release Notes for 3.5.0](3.5.0.md) contain a listing of all the new features that were added.

There are two notable changes that are not only bug fixes or documentation additions:

1.
a new volume option `server.manage-gids` has been added.
   This option should be used when users of a volume are in more than
   approximately 93 groups (Bug [1096425](https://bugzilla.redhat.com/1096425))
2. the Duplicate Request Cache for NFS has now been disabled by default; this may
   reduce performance for certain workloads, but improves the overall stability
   and memory footprint for most users

### Bugs Fixed:

* [765202](https://bugzilla.redhat.com/765202): lgetxattr called with invalid keys on the bricks
* [833586](https://bugzilla.redhat.com/833586): inodelk hang from marker_rename_release_newp_lock
* [859581](https://bugzilla.redhat.com/859581): self-heal process can sometimes create directories instead of symlinks for the root gfid file in .glusterfs
* [986429](https://bugzilla.redhat.com/986429): Backupvolfile server option should work internal to GlusterFS framework
* [1039544](https://bugzilla.redhat.com/1039544): [FEAT] "gluster volume heal info" should list the entries that actually required to be healed.
* [1046624](https://bugzilla.redhat.com/1046624): Unable to heal symbolic Links * [1046853](https://bugzilla.redhat.com/1046853): AFR : For every file self-heal there are warning messages reported in glustershd.log file * [1063190](https://bugzilla.redhat.com/1063190): Volume was not accessible after server side quorum was met * [1064096](https://bugzilla.redhat.com/1064096): The old Python Translator code (not Glupy) should be removed * [1066996](https://bugzilla.redhat.com/1066996): Using sanlock on a gluster mount with replica 3 (quorum-type auto) leads to a split-brain * [1071191](https://bugzilla.redhat.com/1071191): [3.5.1] Sporadic SIGBUS with mmap() on a sparse file created with open(), seek(), write() * [1078061](https://bugzilla.redhat.com/1078061): Need ability to heal mismatching user extended attributes without any changelogs * [1078365](https://bugzilla.redhat.com/1078365): New xlators are linked as versioned .so files, creating .so.0.0.0 * [1086743](https://bugzilla.redhat.com/1086743): Add documentation for the Feature: RDMA-connection manager (RDMA-CM) * [1086748](https://bugzilla.redhat.com/1086748): Add documentation for the Feature: AFR CLI enhancements * [1086749](https://bugzilla.redhat.com/1086749): Add documentation for the Feature: Exposing Volume Capabilities * [1086750](https://bugzilla.redhat.com/1086750): Add documentation for the Feature: File Snapshots in GlusterFS * [1086751](https://bugzilla.redhat.com/1086751): Add documentation for the Feature: gfid-access * [1086752](https://bugzilla.redhat.com/1086752): Add documentation for the Feature: On-Wire Compression/Decompression * [1086754](https://bugzilla.redhat.com/1086754): Add documentation for the Feature: Quota Scalability * [1086755](https://bugzilla.redhat.com/1086755): Add documentation for the Feature: readdir-ahead * [1086756](https://bugzilla.redhat.com/1086756): Add documentation for the Feature: zerofill API for GlusterFS * [1086758](https://bugzilla.redhat.com/1086758): 
Add documentation for the Feature: Changelog based parallel
Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
On 2014-06-23 12:03, Niels de Vos wrote:
> On Tue, Jun 17, 2014 at 11:49:26AM -0400, Shyamsundar Ranganathan wrote:
>> You maybe looking at the problem being fixed here, [1].
>>
>> On a lookup attribute mismatch was not being healed across
>> directories, and this patch attempts to address the same. Currently
>> the version of the patch does not heal the S_ISUID and S_ISGID bits,
>> which is work in progress (but easy enough to incorporate and test
>> based on the patch at [1]).
>>
>> On a separate note, add-brick just adds a brick to the cluster; the
>> lookup is where the heal (or creation of the directory across all sub
>> volumes in the DHT xlator) is being done.
>
> I assume that this is not a regression between 3.5.0 and 3.5.1? If that
> is the case, we can pull the fix in 3.5.2, because 3.5.1 really should
> not get delayed much longer.

No, it does not work in 3.5.0 either :-(

The proposed patch does not work as intended. With the following hierarchy (mode owner:group path):

 755     0:0    /mnt/gluster
2777     0:1000 /mnt/gluster/test
2755  1000:1000 /mnt/gluster/test/dir1
2755  1000:1000 /mnt/gluster/test/dir1/dir2

in the (approx 25%) of cases where my test-script does trigger a self-heal on disk2, 10% end up with (giving an access error on the client):

   0     0:0    /data/disk2/gluster/test
 755  1000:1000 /data/disk2/gluster/test/dir1
 755  1000:1000 /data/disk2/gluster/test/dir1/dir2

or

2777     0:1000 /data/disk2/gluster/test
   0     0:0    /data/disk2/gluster/test/dir1
 755  1000:1000 /data/disk2/gluster/test/dir1/dir2

or

2777     0:1000 /data/disk2/gluster/test
2755  1000:1000 /data/disk2/gluster/test/dir1
   0     0:0    /data/disk2/gluster/test/dir1/dir2

and 73% end up with either partially healed directories (/data/disk2/gluster/test/dir1/dir2 or /data/disk2/gluster/test/dir1 missing) or the sgid bit [randomly] set on some of the directories. Since I don't even understand how to reliably trigger a self-heal of the directories, I'm currently clueless about the reason for this behaviour.
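The bits being dropped here are the setuid/setgid/sticky bits that sit above the regular rwx permissions. A minimal sketch of what a correct directory heal has to preserve, using Python's stat module — copy_special_bits is a hypothetical helper for illustration, not the actual DHT code:

```python
import stat

# Octal modes from the listing above: the leading "2" in 2777/2755 is
# the setgid bit; setuid/setgid/sticky all live above the rwx bits.
SPECIAL = stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX   # == 0o7000

source_mode = 0o2755   # mode on the good brick
healed_mode = 0o755    # what the broken heal left on disk2

assert source_mode & stat.S_ISGID        # sgid is set on the source...
assert not (healed_mode & SPECIAL)       # ...and lost after the heal

def copy_special_bits(src_mode, dst_mode):
    """Hypothetical helper: reapply the special bits from the source."""
    return (dst_mode & 0o777) | (src_mode & SPECIAL)

fixed = copy_special_bits(source_mode, healed_mode)
print(oct(fixed))   # 0o2755
```

Whatever form the real fix takes, the invariant is the same: after a heal, `dst_mode & 0o7000` must equal `src_mode & 0o7000`.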
Soo, I think that the comment from susant in http://review.gluster.org/#/c/6983/3/xlators/cluster/dht/src/dht-common.c: susant palai Jun 13 9:04 AM I think we dont have to worry about that. Rebalance does not interfere with directory SUID/GID/STICKY bits. unfortunately is wrong :-(, and I'm on too deep water to understand how to fix this at the moment. N.B: with 00777 flags on the /mnt/gluster/test directory I have not been able to trigger any unreadable directories /Anders > > Thanks, > Niels > >> >> Shyam >> >> [1] http://review.gluster.org/#/c/6983/ >> >> - Original Message - >>> From: "Anders Blomdell" >>> To: "Gluster Devel" >>> Sent: Tuesday, June 17, 2014 10:53:52 AM >>> Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on >>> directories >>> > With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted > 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4 > addresses), I get > weird behavior if I: > > 1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test) > 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1) > 3. Do an add-brick > > Before add-brick > >755 /mnt/gluster > 7775 /mnt/gluster/test > 2755 /mnt/gluster/test/dir1 > > After add-brick > >755 /mnt/gluster > 1775 /mnt/gluster/test >755 /mnt/gluster/test/dir1 > > On the server it looks like this: > > 7775 /data/disk1/gluster/test > 2755 /data/disk1/gluster/test/dir1 > 1775 /data/disk2/gluster/test >755 /data/disk2/gluster/test/dir1 > > Filed as bug: > > https://bugzilla.redhat.com/show_bug.cgi?id=1110262 > > If somebody can point me to where the logic of add-brick is placed, I can > give > it a shot (a find/grep on mkdir didn't immediately point me to the right > place). 
>
> Regards
>
> Anders Blomdell

--
Anders Blomdell                  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University                  Phone: +46 46 222 4625
P.O. Box 118                     Fax:   +46 46 138118
SE-221 00 Lund, Sweden
Re: [Gluster-devel] Data classification proposal
Jeff,

I have a few questions regarding the rules syntax and how they apply. I think this is different in spirit from the discussion Dan has started, so I am keeping it separate. See questions inline.

- Original Message -
> One of the things holding up our data classification efforts (which include
> tiering but also other stuff as well) has been the extension of the same
> conceptual model from the I/O path to the configuration subsystem and
> ultimately to the user experience. How does an administrator define a
> tiering policy without tearing their hair out? How does s/he define a mixed
> replication/erasure-coding setup without wanting to rip *our* hair out? The
> included Markdown document attempts to remedy this by proposing one out of
> many possible models and user interfaces. It includes examples for some of
> the most common use cases, including the "replica 2.5" case we've been
> discussing recently. Constructive feedback would be greatly appreciated.
>
> # Data Classification Interface
>
> The data classification feature is extremely flexible, to cover use cases from
> SSD/disk tiering to rack-aware placement to security or other policies. With
> this flexibility comes complexity. While this complexity does not affect the
> I/O path much, it does affect both the volume-configuration subsystem and the
> user interface to set placement policies. This document describes one possible
> model and user interface.
>
> The model we used is based on two kinds of information: brick descriptions and
> aggregation rules. Both are contained in a configuration file (format TBD)
> which can be associated with a volume using a volume option.
>
> ## Brick Descriptions
>
> A brick is described by a series of simple key/value pairs. Predefined keys
> include:
>
> * **media-type**
>   The underlying media type for the brick. In its simplest form this might
>   just be *ssd* or *disk*.
> More sophisticated users might use something like
> *15krpm* to represent a faster disk, or *perc-raid5* to represent a brick
> backed by a RAID controller.

Am I right if I understood that the value for media-type is not interpreted beyond the scope of matching rules? That is to say, we don't need/have any notion of media-types that type check internally for forming (sub)volumes using the rules specified.

> * **rack** (and/or **row**)
>   The physical location of the brick. Some policy rules might be set up to
>   spread data across more than one rack.
>
> User-defined keys are also allowed. For example, some users might use a
> *tenant* or *security-level* tag as the basis for their placement policy.
>
> ## Aggregation Rules
>
> Aggregation rules are used to define how bricks should be combined into
> subvolumes, and those potentially combined into higher-level subvolumes, and so
> on until all of the bricks are accounted for. Each aggregation rule consists
> of the following parts:
>
> * **id**
>   The base name of the subvolumes the rule will create. If a rule is applied
>   multiple times this will yield *id-0*, *id-1*, and so on.
>
> * **selector**
>   A "filter" for which bricks or lower-level subvolumes the rule will
>   aggregate. This is an expression similar to a *WHERE* clause in SQL, using
>   brick/subvolume names and properties in lieu of columns. These values are
>   then matched against literal values or regular expressions, using the usual
>   set of boolean operators to arrive at a *yes* or *no* answer to the question
>   of whether this brick/subvolume is affected by this rule.
>
> * **group-size** (optional)
>   The number of original bricks/subvolumes to be combined into each produced
>   subvolume. The special default value zero means to collect all original
>   bricks or subvolumes into one final subvolume. In this case, *id* is used
>   directly instead of having a numeric suffix appended.

Should the no. of bricks or lower-level subvolumes that match the rule be an exact multiple of group-size?

> * **type** (optional)
>   The type of the generated translator definition(s). Examples might include
>   "AFR" to do replication, "EC" to do erasure coding, and so on. The more
>   general data classification task includes the definition of new translators
>   to do tiering and other kinds of filtering, but those are beyond the scope
>   of this document. If no type is specified, cluster/dht will be used to do
>   random placement among its constituents.
>
> * **tag** and **option** (optional, repeatable)
>   Additional tags and/or options to be applied to each newly created
>   subvolume. See the "replica 2.5" example to see how this can be used.
>
> Since each type might have unique requirements, such as ensuring that
> replication is done across machines or racks whenever possible, it is assumed
> that there will be corresponding type-specific scripts or functions to do the a
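The selector/group-size semantics described in the proposal can be sketched in a few lines. This is only an illustration of the rule model under the assumption that group-size must divide the match count evenly (the open question above); apply_rule is a hypothetical name, not anything in GlusterFS, and glob matching stands in for the proposal's richer WHERE-style expressions:

```python
from fnmatch import fnmatch

def apply_rule(rule_id, bricks, selector, group_size=0):
    """Filter bricks with the selector, then chunk them into subvolumes
    named rule_id-0, rule_id-1, ... (or just rule_id when group_size=0)."""
    matched = [b for b in bricks if fnmatch(b, selector)]
    if group_size == 0:
        # default: collect everything into one subvolume, no suffix
        return {rule_id: matched}
    if len(matched) % group_size != 0:
        raise ValueError("matched bricks must be a multiple of group-size")
    return {f"{rule_id}-{i}": matched[n:n + group_size]
            for i, n in enumerate(range(0, len(matched), group_size))}

# "rule tier-1 / select *fast* / group-size 2" over four fast bricks:
bricks = ["host1-fast", "host2-fast", "host3-fast", "host4-fast"]
subvols = apply_rule("tier-1", bricks, "*fast*", group_size=2)
print(subvols)
# {'tier-1-0': ['host1-fast', 'host2-fast'],
#  'tier-1-1': ['host3-fast', 'host4-fast']}
```

The two resulting groups are exactly the AFR pairs discussed earlier in the thread; a "rule all" would then combine the produced subvolumes the same way, one level up.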