Re: [Gluster-devel] Serialization of fops acting on same dentry on server
On 08/24/2016 05:29 AM, Vijay Bellur wrote:
> On Tue, Aug 23, 2016 at 12:18 PM, Niels de Vos wrote:
>> On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
>>> Hi,
>>>
>>> We have pushed a patch for fop serialization on the server side [1].
>>> If you have some time, please take a look at the patch. Your reviews
>>> are most welcome :)
>>>
>>> If I can accommodate all the comments by the end of the week, we are
>>> planning to get this in before the coming Friday.
>>
>> Without looking into the code yet, I would like to see a different name
>> for "DFS". It is a function of the Samba protocol, and having a name
>> like this in the Gluster sources will cause confusion.

Thanks for the suggestion. I will use a different name, or I will add a Gluster-related prefix.

>> Does this come with a design document in the glusterfs-specs repository?
>> Features like this cannot be accepted without one. If you want this
>> included in 3.9, it should also get added to
>> https://www.gluster.org/community/roadmap/3.9/ . It looks a little late
>> for proposing a new feature, with only a couple of days to review the
>> design and a 1500+ line patch that does not include any test cases yet.
>> If this really is the current state, I suggest moving it to the next
>> release and using the additional three months (only!) to stabilize it.

I understand the risk involved here. Do we have a feature page for 3.10, or any procedure to get started for 3.10?

> +1. We need to have more discussion on this one. Besides a 12-month-old
> email thread, I have not seen more details about this feature.
> Providing more details on the design, the nature of testing done,
> performance impact if any, etc. would be necessary before merging any
> patchset of this nature.

I will add more details about the performance and testing that we are planning to do to get this qualified. Thanks for your input.

Regards,
Rafi KC

> Regards,
> Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
On Tue, Aug 23, 2016 at 12:18 PM, Niels de Vos wrote:
> On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
>> Hi,
>>
>> We have pushed a patch for fop serialization on the server side [1].
>> If you have some time, please take a look at the patch. Your reviews
>> are most welcome :)
>>
>> If I can accommodate all the comments by the end of the week, we are
>> planning to get this in before the coming Friday.
>
> Without looking into the code yet, I would like to see a different name
> for "DFS". It is a function of the Samba protocol, and having a name
> like this in the Gluster sources will cause confusion.
>
> Does this come with a design document in the glusterfs-specs repository?
> Features like this cannot be accepted without one. If you want this
> included in 3.9, it should also get added to
> https://www.gluster.org/community/roadmap/3.9/ . It looks a little late
> for proposing a new feature, with only a couple of days to review the
> design and a 1500+ line patch that does not include any test cases yet.
> If this really is the current state, I suggest moving it to the next
> release and using the additional three months (only!) to stabilize it.

+1. We need to have more discussion on this one. Besides a 12-month-old email thread, I have not seen more details about this feature. Providing more details on the design, the nature of testing done, performance impact if any, etc. would be necessary before merging any patchset of this nature.

Regards,
Vijay
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
> Hi,
>
> We have pushed a patch for fop serialization on the server side [1].
> If you have some time, please take a look at the patch. Your reviews
> are most welcome :)
>
> If I can accommodate all the comments by the end of the week, we are
> planning to get this in before the coming Friday.

Without looking into the code yet, I would like to see a different name for "DFS". It is a function of the Samba protocol, and having a name like this in the Gluster sources will cause confusion.

Does this come with a design document in the glusterfs-specs repository? Features like this cannot be accepted without one. If you want this included in 3.9, it should also get added to https://www.gluster.org/community/roadmap/3.9/ . It looks a little late for proposing a new feature, with only a couple of days to review the design and a 1500+ line patch that does not include any test cases yet. If this really is the current state, I suggest moving it to the next release and using the additional three months (only!) to stabilize it.

Thanks,
Niels

> Note: Meanwhile, I will be working to get the performance numbers to
> see how much of a performance drop it can cause.
>
> [1] http://review.gluster.org/13451
>
> Regards,
> Rafi KC
>
> On 08/19/2015 02:55 PM, Pranith Kumar Karampuri wrote:
>> + Ravi, Anuradha
>>
>> On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
>>> All,
>>>
>>> Pranith and I were discussing the implementation of compound
>>> operations like "create + lock", "mkdir + lock", "open + lock", etc.
>>> These operations are useful in situations like:
>>>
>>> 1. Preventing locking on all subvols during directory creation as
>>> part of self-heal in dht. Currently we follow the approach of locking
>>> _all_ subvols in both rmdir and lookup-heal [1].
>>> 2. Locking a file in advance so that there is less performance hit
>>> during transactions in afr.
>>>
>>> While thinking about implementing such compound operations, it
>>> occurred to me that one of the problems would be how we handle a
>>> racing mkdir/create and a named lookup (simply referred to as lookup
>>> from now on) followed by a lock. This is because
>>> 1. creation of the directory/file on the backend, and
>>> 2. linking of the inode with the gfid corresponding to that
>>> file/directory
>>> are not atomic. It is not guaranteed that the inode passed down
>>> during the mkdir/create call is the one that survives in the inode
>>> table. Since the posix-locks xlator maintains all the lock state in
>>> the inode, it would be a problem if a different inode is linked in
>>> the inode table than the one passed during mkdir/create. One way to
>>> solve this problem is to serialize fops (like mkdir/create, lookup,
>>> rename, rmdir, unlink) that are happening on a particular dentry.
>>> This serialization would also solve other bugs, for example:
>>>
>>> 1. Issues solved by [2][3], and possibly many similar issues.
>>> 2. Stale dentries left in the bricks' inode table because of a racing
>>> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>>>
>>> The initial idea I have now is to maintain the fops in progress on a
>>> dentry in the parent inode (maybe in the resolver code in
>>> protocol/server). Based on this we can serialize the operations.
>>> Since we need to serialize _only_ operations on a dentry (we don't
>>> serialize nameless lookups), it is guaranteed that we always have a
>>> parent inode. Any comments/discussion on this would be appreciated.
>>>
>>> [1] http://review.gluster.org/11725
>>> [2] http://review.gluster.org/9913
>>> [3] http://review.gluster.org/5240
>>>
>>> regards,
>>> Raghavendra.
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
Hi,

We have pushed a patch for fop serialization on the server side [1]. If you have some time, please take a look at the patch. Your reviews are most welcome :)

If I can accommodate all the comments by the end of the week, we are planning to get this in before the coming Friday.

Note: Meanwhile, I will be working to get the performance numbers to see how much of a performance drop it can cause.

[1] http://review.gluster.org/13451

Regards,
Rafi KC

On 08/19/2015 02:55 PM, Pranith Kumar Karampuri wrote:
> + Ravi, Anuradha
>
> On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
>> All,
>>
>> Pranith and I were discussing the implementation of compound
>> operations like "create + lock", "mkdir + lock", "open + lock", etc.
>> These operations are useful in situations like:
>>
>> 1. Preventing locking on all subvols during directory creation as part
>> of self-heal in dht. Currently we follow the approach of locking _all_
>> subvols in both rmdir and lookup-heal [1].
>> 2. Locking a file in advance so that there is less performance hit
>> during transactions in afr.
>>
>> While thinking about implementing such compound operations, it
>> occurred to me that one of the problems would be how we handle a
>> racing mkdir/create and a named lookup (simply referred to as lookup
>> from now on) followed by a lock. This is because
>> 1. creation of the directory/file on the backend, and
>> 2. linking of the inode with the gfid corresponding to that
>> file/directory
>> are not atomic. It is not guaranteed that the inode passed down during
>> the mkdir/create call is the one that survives in the inode table.
>> Since the posix-locks xlator maintains all the lock state in the
>> inode, it would be a problem if a different inode is linked in the
>> inode table than the one passed during mkdir/create. One way to solve
>> this problem is to serialize fops (like mkdir/create, lookup, rename,
>> rmdir, unlink) that are happening on a particular dentry. This
>> serialization would also solve other bugs, for example:
>>
>> 1. Issues solved by [2][3], and possibly many similar issues.
>> 2. Stale dentries left in the bricks' inode table because of a racing
>> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>>
>> The initial idea I have now is to maintain the fops in progress on a
>> dentry in the parent inode (maybe in the resolver code in
>> protocol/server). Based on this we can serialize the operations. Since
>> we need to serialize _only_ operations on a dentry (we don't serialize
>> nameless lookups), it is guaranteed that we always have a parent
>> inode. Any comments/discussion on this would be appreciated.
>>
>> [1] http://review.gluster.org/11725
>> [2] http://review.gluster.org/9913
>> [3] http://review.gluster.org/5240
>>
>> regards,
>> Raghavendra.
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
+ Ravi, Anuradha

On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
> All,
>
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock", etc. These
> operations are useful in situations like:
>
> 1. Preventing locking on all subvols during directory creation as part
> of self-heal in dht. Currently we follow the approach of locking _all_
> subvols in both rmdir and lookup-heal [1].
> 2. Locking a file in advance so that there is less performance hit
> during transactions in afr.
>
> While thinking about implementing such compound operations, it occurred
> to me that one of the problems would be how we handle a racing
> mkdir/create and a named lookup (simply referred to as lookup from now
> on) followed by a lock. This is because
> 1. creation of the directory/file on the backend, and
> 2. linking of the inode with the gfid corresponding to that
> file/directory
> are not atomic. It is not guaranteed that the inode passed down during
> the mkdir/create call is the one that survives in the inode table.
> Since the posix-locks xlator maintains all the lock state in the inode,
> it would be a problem if a different inode is linked in the inode table
> than the one passed during mkdir/create. One way to solve this problem
> is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink)
> that are happening on a particular dentry. This serialization would
> also solve other bugs, for example:
>
> 1. Issues solved by [2][3], and possibly many similar issues.
> 2. Stale dentries left in the bricks' inode table because of a racing
> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>
> The initial idea I have now is to maintain the fops in progress on a
> dentry in the parent inode (maybe in the resolver code in
> protocol/server). Based on this we can serialize the operations. Since
> we need to serialize _only_ operations on a dentry (we don't serialize
> nameless lookups), it is guaranteed that we always have a parent inode.
> Any comments/discussion on this would be appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
On 08/17/2015 01:19 AM, Raghavendra Gowdappa wrote:
> ----- Original Message -----
>> From: "Raghavendra Gowdappa"
>> To: "Gluster Devel"
>> Cc: "Sakshi Bansal"
>> Sent: Monday, 17 August, 2015 10:39:38 AM
>> Subject: [Gluster-devel] Serialization of fops acting on same dentry on server
>>
>> All,
>>
>> Pranith and I were discussing the implementation of compound
>> operations like "create + lock", "mkdir + lock", "open + lock", etc.
>> These operations are useful in situations like:
>>
>> 1. Preventing locking on all subvols during directory creation as part
>> of self-heal in dht. Currently we follow the approach of locking _all_
>> subvols in both rmdir and lookup-heal [1].
>
> Correction. It should have been "to prevent locking on all subvols
> during rmdir". The lookup self-heal should lock on all subvols (with a
> compound "mkdir + lookup" if the directory is not present on a subvol).
> With this, rmdir/rename can lock on just any one subvol, and this will
> prevent any parallel lookup-heal from preventing directory creation.
>
>> 2. Locking a file in advance so that there is less performance hit
>> during transactions in afr.

I see multiple thoughts here, and am splitting what I think into these parts:

- Compound FOPs: The whole idea of, and need for, compound FOPs is I think very useful. Initially, compounding FOP+lock is a good idea, as this is mostly internal to Gluster and does not change any interface for any of the consumers. Also, as Pranith is involved, we can iron out AFR/EC-related possibilities in such compounding as well. In compounding, I am only concerned about cases where part of the compound operation succeeds on one replica but fails on the other. As an example, if the mkdir succeeds on one (and so locking subsequently succeeds) but mkdir fails on the other (because a competing client's compound FOP raced this one), how can we handle such situations? Do we need server-side AFR/EC with leader election, as in NSR, to handle this? (Maybe the example is not a good/firm one for this case, but nevertheless, can compounding create such problems?) Another question would be: do we need to compound it as lock+FOP rather than FOP+lock in some cases?

- Advance locking to reduce the serial RPC requests that degrade performance: This is again a good thing to do; part of such a concept is in eager locking already (as I see it). What I would like to see in this regard is eager leasing (piggyback leases) of a file (and loosely of a directory, though I need to think through that case more), so that we can optimize the common case where a file is being operated on by a single client, and degrade to fine-grained locking when multiple clients compete. Assuming eager leasing, AFR transactions need only client-side in-memory locking (to prevent two threads/consumers on the same client racing on the same file/dir). Also, with leasing and lease breaking, we can cooperate with other clients better than eager locking does now. In short, I would like the advance locking or leasing to be part of the client-side caching stack, so that multiple xlators on the client can leverage the same mechanism, and I would prefer the leasing model over the locking model as it allows easier breaking than locks do.

>> While thinking about implementing such compound operations, it
>> occurred to me that one of the problems would be how we handle a
>> racing mkdir/create and a named lookup (simply referred to as lookup
>> from now on) followed by a lock. This is because
>> 1. creation of the directory/file on the backend, and
>> 2. linking of the inode with the gfid corresponding to that
>> file/directory
>> are not atomic. It is not guaranteed that the inode passed down during
>> the mkdir/create call is the one that survives in the inode table.
>> Since the posix-locks xlator maintains all the lock state in the
>> inode, it would be a problem if a different inode is linked in the
>> inode table than the one passed during mkdir/create. One way to solve
>> this problem is to serialize fops (like mkdir/create, lookup, rename,
>> rmdir, unlink) that are happening on a particular dentry. This
>> serialization would also solve other bugs, for example:
>>
>> 1. Issues solved by [2][3], and possibly many similar issues.
>> 2. Stale dentries left in the bricks' inode table because of a racing
>> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>>
>> The initial idea I have now is to maintain the fops in progress on a
>> dentry in the parent inode (maybe in the resolver code in
>> protocol/server). Based on this we can serialize the operations. Since
>> we need to serialize _only_ operations on a dentry (we don't serialize
>> nameless lookups), it is guaranteed that we always have a parent
>> inode. Any comments/discussion on this would be appreciated.

My initial comments on this would be to refer to the filesystem locking notes in the Linux kernel, which set out rules for locking during dentry operations and such. The next part is as follows:

- Why create the name (dentry) before creati
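The lease-over-lock preference in the reply above (a lease can be recalled/broken when another client competes, where a lock simply blocks) can be sketched roughly as follows. This is an illustrative Python sketch only, not Gluster code; the names `Lease` and `LeaseTable` and the recall-callback shape are invented for the example.

```python
class Lease:
    """A client-held lease that the server can recall, unlike a lock."""

    def __init__(self, on_recall):
        self.held = True
        self._on_recall = on_recall  # holder's cleanup, e.g. flush cached state

    def recall(self):
        # Server asks the holder to give the lease up; the holder runs
        # its cleanup callback and then the lease is marked released.
        if self.held:
            self._on_recall()
            self.held = False


class LeaseTable:
    """Per-file (gfid) lease bookkeeping on the server side."""

    def __init__(self):
        self._leases = {}  # gfid -> Lease

    def acquire(self, gfid, on_recall):
        old = self._leases.get(gfid)
        if old is not None:
            old.recall()  # a competing client breaks the existing lease
        lease = Lease(on_recall)
        self._leases[gfid] = lease
        return lease
```

The point of the sketch: the single-client common case pays no coordination cost (the lease stays held across many operations), and only when a second client shows up does the first one get recalled and the system degrade to fine-grained coordination.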
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
----- Original Message -----
> From: "Niels de Vos"
> To: "Raghavendra Gowdappa"
> Cc: "Gluster Devel", "Sakshi Bansal"
> Sent: Monday, 17 August, 2015 11:14:18 AM
> Subject: Re: [Gluster-devel] Serialization of fops acting on same dentry on server
>
> On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
>> All,
>>
>> Pranith and I were discussing the implementation of compound
>> operations like "create + lock", "mkdir + lock", "open + lock", etc.
>> These operations are useful in situations like:
>>
>> 1. Preventing locking on all subvols during directory creation as part
>> of self-heal in dht. Currently we follow the approach of locking _all_
>> subvols in both rmdir and lookup-heal [1].
>> 2. Locking a file in advance so that there is less performance hit
>> during transactions in afr.
>
> I have an interest in compound/composite procedures too. My use case is
> a little different, and I am (was and still) planning to send more
> details about it soon.
>
> Basically, there are certain cases where libgfapi will not be able to
> automatically pass the uid/gid in the RPC header. A design for
> supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
> there is no option to use the Kerberos credentials of the user doing
> I/O (remote client, not using Kerberos to talk to samba/ganesha), the
> username (or uid/gid) needs to be passed to the storage servers.
>
> A compound/composite procedure would then look like this:
>
> [RPC header]
> [AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]
>
> [GlusterFS COMPOUND]
> [SETFSUID]
> [SETLOCKOWNER]
> [${FOP}]
> [.. more FOPs?]
>
> This idea has not been reviewed/commented on by some of the Kerberos
> experts that I want to involve. A more complete description of the
> plans to support Kerberos will follow.
>
> Do you think that this matches your ideas on compound operations?

What we had in mind was more the compounding of more than one Gluster fop; we really didn't think at the granularity of setfsuid, setlkowner, etc. But yes, it is not fundamentally different from what we had in mind.

> Thanks,
> Niels
>
>> While thinking about implementing such compound operations, it
>> occurred to me that one of the problems would be how we handle a
>> racing mkdir/create and a named lookup (simply referred to as lookup
>> from now on) followed by a lock. This is because
>> 1. creation of the directory/file on the backend, and
>> 2. linking of the inode with the gfid corresponding to that
>> file/directory
>> are not atomic. It is not guaranteed that the inode passed down during
>> the mkdir/create call is the one that survives in the inode table.
>> Since the posix-locks xlator maintains all the lock state in the
>> inode, it would be a problem if a different inode is linked in the
>> inode table than the one passed during mkdir/create. One way to solve
>> this problem is to serialize fops (like mkdir/create, lookup, rename,
>> rmdir, unlink) that are happening on a particular dentry. This
>> serialization would also solve other bugs, for example:
>>
>> 1. Issues solved by [2][3], and possibly many similar issues.
>> 2. Stale dentries left in the bricks' inode table because of a racing
>> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>>
>> The initial idea I have now is to maintain the fops in progress on a
>> dentry in the parent inode (maybe in the resolver code in
>> protocol/server). Based on this we can serialize the operations. Since
>> we need to serialize _only_ operations on a dentry (we don't serialize
>> nameless lookups), it is guaranteed that we always have a parent
>> inode. Any comments/discussion on this would be appreciated.
>>
>> [1] http://review.gluster.org/11725
>> [2] http://review.gluster.org/9913
>> [3] http://review.gluster.org/5240
>>
>> regards,
>> Raghavendra.
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> All,
>
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock", etc. These
> operations are useful in situations like:
>
> 1. Preventing locking on all subvols during directory creation as part
> of self-heal in dht. Currently we follow the approach of locking _all_
> subvols in both rmdir and lookup-heal [1].
> 2. Locking a file in advance so that there is less performance hit
> during transactions in afr.

I have an interest in compound/composite procedures too. My use case is a little different, and I am (was and still) planning to send more details about it soon.

Basically, there are certain cases where libgfapi will not be able to automatically pass the uid/gid in the RPC header. A design for supporting Kerberos will mainly use the standardized RPCSEC_GSS. If there is no option to use the Kerberos credentials of the user doing I/O (remote client, not using Kerberos to talk to samba/ganesha), the username (or uid/gid) needs to be passed to the storage servers.

A compound/composite procedure would then look like this:

[RPC header]
[AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]

[GlusterFS COMPOUND]
[SETFSUID]
[SETLOCKOWNER]
[${FOP}]
[.. more FOPs?]

This idea has not been reviewed/commented on by some of the Kerberos experts that I want to involve. A more complete description of the plans to support Kerberos will follow.

Do you think that this matches your ideas on compound operations?

Thanks,
Niels

> While thinking about implementing such compound operations, it occurred
> to me that one of the problems would be how we handle a racing
> mkdir/create and a named lookup (simply referred to as lookup from now
> on) followed by a lock. This is because
> 1. creation of the directory/file on the backend, and
> 2. linking of the inode with the gfid corresponding to that
> file/directory
> are not atomic. It is not guaranteed that the inode passed down during
> the mkdir/create call is the one that survives in the inode table.
> Since the posix-locks xlator maintains all the lock state in the inode,
> it would be a problem if a different inode is linked in the inode table
> than the one passed during mkdir/create. One way to solve this problem
> is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink)
> that are happening on a particular dentry. This serialization would
> also solve other bugs, for example:
>
> 1. Issues solved by [2][3], and possibly many similar issues.
> 2. Stale dentries left in the bricks' inode table because of a racing
> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>
> The initial idea I have now is to maintain the fops in progress on a
> dentry in the parent inode (maybe in the resolver code in
> protocol/server). Based on this we can serialize the operations. Since
> we need to serialize _only_ operations on a dentry (we don't serialize
> nameless lookups), it is guaranteed that we always have a parent inode.
> Any comments/discussion on this would be appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
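The compound layout Niels describes ([SETFSUID] [SETLOCKOWNER] [${FOP}] ...) could be modeled, very loosely, as an ordered list of sub-operations executed on the server until the first failure, so that later fops never run against a partially applied context. The following is an illustrative Python sketch only; the type names (`CompoundOp`, `CompoundRequest`) and the handler-table shape are invented for the example and are not Gluster's actual wire format or API.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundOp:
    name: str                     # e.g. "SETFSUID", "SETLOCKOWNER", "MKDIR"
    args: dict = field(default_factory=dict)


@dataclass
class CompoundRequest:
    ops: list                     # sub-ops, executed strictly in order


def execute(req, handlers):
    """Run each sub-op in order; stop at the first failure so the
    remaining fops are never attempted with half-applied state.
    Each handler returns (ok, result)."""
    results = []
    for op in req.ops:
        ok, result = handlers[op.name](**op.args)
        results.append(result)
        if not ok:
            break                 # abort the rest of the compound
    return results
```

A usage sketch: a request of [SETFSUID, MKDIR, LOCK] where MKDIR fails would return results for SETFSUID and MKDIR only, and LOCK would never run; this matches the abort-on-failure semantics a server would need so that, e.g., a lock is never taken under the wrong fsuid.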
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
----- Original Message -----
> From: "Raghavendra Gowdappa"
> To: "Gluster Devel"
> Cc: "Sakshi Bansal"
> Sent: Monday, 17 August, 2015 10:39:38 AM
> Subject: [Gluster-devel] Serialization of fops acting on same dentry on server
>
> All,
>
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock", etc. These
> operations are useful in situations like:
>
> 1. Preventing locking on all subvols during directory creation as part
> of self-heal in dht. Currently we follow the approach of locking _all_
> subvols in both rmdir and lookup-heal [1].

Correction. It should have been "to prevent locking on all subvols during rmdir". The lookup self-heal should lock on all subvols (with a compound "mkdir + lookup" if the directory is not present on a subvol). With this, rmdir/rename can lock on just any one subvol, and this will prevent any parallel lookup-heal from preventing directory creation.

> 2. Locking a file in advance so that there is less performance hit
> during transactions in afr.
>
> While thinking about implementing such compound operations, it occurred
> to me that one of the problems would be how we handle a racing
> mkdir/create and a named lookup (simply referred to as lookup from now
> on) followed by a lock. This is because
> 1. creation of the directory/file on the backend, and
> 2. linking of the inode with the gfid corresponding to that
> file/directory
> are not atomic. It is not guaranteed that the inode passed down during
> the mkdir/create call is the one that survives in the inode table.
> Since the posix-locks xlator maintains all the lock state in the inode,
> it would be a problem if a different inode is linked in the inode table
> than the one passed during mkdir/create. One way to solve this problem
> is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink)
> that are happening on a particular dentry. This serialization would
> also solve other bugs, for example:
>
> 1. Issues solved by [2][3], and possibly many similar issues.
> 2. Stale dentries left in the bricks' inode table because of a racing
> lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).
>
> The initial idea I have now is to maintain the fops in progress on a
> dentry in the parent inode (maybe in the resolver code in
> protocol/server). Based on this we can serialize the operations. Since
> we need to serialize _only_ operations on a dentry (we don't serialize
> nameless lookups), it is guaranteed that we always have a parent inode.
> Any comments/discussion on this would be appreciated.
>
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
>
> regards,
> Raghavendra.
[Gluster-devel] Serialization of fops acting on same dentry on server
All,

Pranith and I were discussing the implementation of compound operations like "create + lock", "mkdir + lock", "open + lock", etc. These operations are useful in situations like:

1. Preventing locking on all subvols during directory creation as part of self-heal in dht. Currently we follow the approach of locking _all_ subvols in both rmdir and lookup-heal [1].
2. Locking a file in advance so that there is less performance hit during transactions in afr.

While thinking about implementing such compound operations, it occurred to me that one of the problems would be how we handle a racing mkdir/create and a named lookup (simply referred to as lookup from now on) followed by a lock. This is because

1. creation of the directory/file on the backend, and
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the mkdir/create call is the one that survives in the inode table. Since the posix-locks xlator maintains all the lock state in the inode, it would be a problem if a different inode is linked in the inode table than the one passed during mkdir/create. One way to solve this problem is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink) that are happening on a particular dentry. This serialization would also solve other bugs, for example:

1. Issues solved by [2][3], and possibly many similar issues.
2. Stale dentries left in the bricks' inode table because of a racing lookup and dentry-modification ops (like rmdir, unlink, rename, etc.).

The initial idea I have now is to maintain the fops in progress on a dentry in the parent inode (maybe in the resolver code in protocol/server). Based on this we can serialize the operations. Since we need to serialize _only_ operations on a dentry (we don't serialize nameless lookups), it is guaranteed that we always have a parent inode. Any comments/discussion on this would be appreciated.

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240

regards,
Raghavendra.