Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2016-08-26 Thread Mohammed Rafi K C


On 08/24/2016 05:29 AM, Vijay Bellur wrote:
> On Tue, Aug 23, 2016 at 12:18 PM, Niels de Vos wrote:
>> On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
>>> Hi,
>>>
>>> We have pushed a patch for fop serialization on the server side [1]. If you
>>> have some time, please take a look at the patch. Your reviews are
>>> most welcome :)
>>>
>>>
>>> If I can accommodate all the comments by the end of the week, we are
>>> planning to get this in before the coming Friday.
>> Without looking into the code yet, I would like to see a different name
>> for "DFS". It is a function of the Samba protocol, and having a name
>> like this in the Gluster sources will cause confusion.

Thanks for the suggestion. I will use a different name or add a
Gluster-related prefix.

>>
>> Does this come with a design document in the glusterfs-specs repository?
>> Features like this cannot be accepted without one. If you want this
>> included in 3.9, it should also get added to
>> https://www.gluster.org/community/roadmap/3.9/ . It looks a little late
>> for proposing a new feature, leaving only a couple of days to review
>> the design and a 1500+ line patch that does not include any test cases yet.
>> If this really is the current state, I suggest moving it to the next
>> release and using the additional three months (only!) to stabilize it.
I understand the risk involved here. Do we have a feature page for 3.10,
or any procedure to get started for it?

>>
>
> +1. We need to have more discussion on this one. Besides a 12-month-old
> email thread, I have not seen more details about this feature.
> Providing more details on the design, the nature of testing done, the
> performance impact (if any), etc. would be necessary before merging any
> patchset of this nature.

I will add more details about the performance and testing work that we are
planning to do to get this qualified.

Thanks for your input.

Regards
Rafi KC


>
> Regards,
> Vijay



Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2016-08-23 Thread Vijay Bellur
On Tue, Aug 23, 2016 at 12:18 PM, Niels de Vos wrote:
> On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
>> Hi,
>>
>> We have pushed a patch for fop serialization on the server side [1]. If you
>> have some time, please take a look at the patch. Your reviews are
>> most welcome :)
>>
>>
>> If I can accommodate all the comments by the end of the week, we are
>> planning to get this in before the coming Friday.
>
> Without looking into the code yet, I would like to see a different name
> for "DFS". It is a function of the Samba protocol, and having a name
> like this in the Gluster sources will cause confusion.
>
> Does this come with a design document in the glusterfs-specs repository?
> Features like this cannot be accepted without one. If you want this
> included in 3.9, it should also get added to
> https://www.gluster.org/community/roadmap/3.9/ . It looks a little late
> for proposing a new feature, leaving only a couple of days to review
> the design and a 1500+ line patch that does not include any test cases yet.
> If this really is the current state, I suggest moving it to the next
> release and using the additional three months (only!) to stabilize it.
>


+1. We need to have more discussion on this one. Besides a 12-month-old
email thread, I have not seen more details about this feature.
Providing more details on the design, the nature of testing done, the
performance impact (if any), etc. would be necessary before merging any
patchset of this nature.

Regards,
Vijay


Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2016-08-23 Thread Niels de Vos
On Tue, Aug 23, 2016 at 08:16:54PM +0530, Mohammed Rafi K C wrote:
> Hi,
> 
> We have pushed a patch for fop serialization on the server side [1]. If you
> have some time, please take a look at the patch. Your reviews are
> most welcome :)
> 
> 
> If I can accommodate all the comments by the end of the week, we are
> planning to get this in before the coming Friday.

Without looking into the code yet, I would like to see a different name
for "DFS". It is a function of the Samba protocol, and having a name
like this in the Gluster sources will cause confusion.

Does this come with a design document in the glusterfs-specs repository?
Features like this cannot be accepted without one. If you want this
included in 3.9, it should also get added to
https://www.gluster.org/community/roadmap/3.9/ . It looks a little late
for proposing a new feature, leaving only a couple of days to review
the design and a 1500+ line patch that does not include any test cases yet.
If this really is the current state, I suggest moving it to the next
release and using the additional three months (only!) to stabilize it.

Thanks,
Niels


> Note: In the meantime, I will be working on getting performance numbers
> to see how much of a performance drop this can cause.
> 
> 
> [1] : http://review.gluster.org/13451
> 
> Regards
> 
> Rafi KC
> 
> 
> On 08/19/2015 02:55 PM, Pranith Kumar Karampuri wrote:
> > + Ravi, Anuradha
> >
> > On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
> >> All,
> >>
> >> Pranith and I were discussing the implementation of compound
> >> operations like "create + lock", "mkdir + lock", "open + lock", etc.
> >> These operations are useful in situations like:
> >>
> >> 1. To prevent locking on all subvols during directory creation as
> >> part of self-heal in dht. Currently we follow the approach of
> >> locking _all_ subvols in both rmdir and lookup-heal [1].
> >> 2. To lock a file in advance so that there is less of a performance
> >> hit during transactions in afr.
> >>
> >> While thinking about implementing such compound operations, it
> >> occurred to me that one of the problems would be how to handle a
> >> racing mkdir/create and a (named lookup - simply referred to as
> >> lookup from now on - followed by lock). This is because,
> >> 1. creation of the directory/file on the backend
> >> 2. linking of the inode with the gfid corresponding to that
> >> file/directory
> >>
> >> are not atomic. It is not guaranteed that the inode passed down
> >> during the mkdir/create call will be the one that survives in the
> >> inode table. Since the posix-locks xlator maintains all the
> >> lock-state in the inode, it would be a problem if a different inode
> >> is linked in the inode table than the one passed during
> >> mkdir/create. One way to solve this problem is to serialize fops
> >> (like mkdir/create, lookup, rename, rmdir, unlink) that are
> >> happening on a particular dentry. This serialization would also
> >> solve other bugs like:
> >>
> >> 1. issues solved by [2][3] and possibly many such issues.
> >> 2. Stale dentries left out in bricks' inode table because of a
> >> racing lookup and dentry modification ops (like rmdir, unlink,
> >> rename, etc.).
> >>
> >> The initial idea I have now is to maintain the fops in progress on
> >> a dentry in the parent inode (maybe in the resolver code in
> >> protocol/server). Based on this we can serialize the operations.
> >> Since we need to serialize _only_ operations on a dentry (we don't
> >> serialize nameless lookups), it is guaranteed that we always have a
> >> parent inode. Any comments/discussion on this would be appreciated.
> >>
> >> [1] http://review.gluster.org/11725
> >> [2] http://review.gluster.org/9913
> >> [3] http://review.gluster.org/5240
> >>
> >> regards,
> >> Raghavendra.
> >

Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2016-08-23 Thread Mohammed Rafi K C
Hi,

We have pushed a patch for fop serialization on the server side [1]. If you
have some time, please take a look at the patch. Your reviews are
most welcome :)


If I can accommodate all the comments by the end of the week, we are
planning to get this in before the coming Friday.


Note: In the meantime, I will be working on getting performance numbers
to see how much of a performance drop this can cause.


[1] : http://review.gluster.org/13451

Regards

Rafi KC


On 08/19/2015 02:55 PM, Pranith Kumar Karampuri wrote:
> + Ravi, Anuradha
>
> On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:
>> All,
>>
>> Pranith and I were discussing the implementation of compound
>> operations like "create + lock", "mkdir + lock", "open + lock", etc.
>> These operations are useful in situations like:
>>
>> 1. To prevent locking on all subvols during directory creation as
>> part of self-heal in dht. Currently we follow the approach of
>> locking _all_ subvols in both rmdir and lookup-heal [1].
>> 2. To lock a file in advance so that there is less of a performance
>> hit during transactions in afr.
>>
>> While thinking about implementing such compound operations, it
>> occurred to me that one of the problems would be how to handle a
>> racing mkdir/create and a (named lookup - simply referred to as
>> lookup from now on - followed by lock). This is because,
>> 1. creation of the directory/file on the backend
>> 2. linking of the inode with the gfid corresponding to that
>> file/directory
>>
>> are not atomic. It is not guaranteed that the inode passed down
>> during the mkdir/create call will be the one that survives in the
>> inode table. Since the posix-locks xlator maintains all the
>> lock-state in the inode, it would be a problem if a different inode
>> is linked in the inode table than the one passed during
>> mkdir/create. One way to solve this problem is to serialize fops
>> (like mkdir/create, lookup, rename, rmdir, unlink) that are
>> happening on a particular dentry. This serialization would also
>> solve other bugs like:
>>
>> 1. issues solved by [2][3] and possibly many such issues.
>> 2. Stale dentries left out in bricks' inode table because of a racing
>> lookup and dentry modification ops (like rmdir, unlink, rename, etc.).
>>
>> The initial idea I have now is to maintain the fops in progress on a
>> dentry in the parent inode (maybe in the resolver code in
>> protocol/server). Based on this we can serialize the operations.
>> Since we need to serialize _only_ operations on a dentry (we don't
>> serialize nameless lookups), it is guaranteed that we always have a
>> parent inode. Any comments/discussion on this would be appreciated.
>>
>> [1] http://review.gluster.org/11725
>> [2] http://review.gluster.org/9913
>> [3] http://review.gluster.org/5240
>>
>> regards,
>> Raghavendra.


Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-19 Thread Pranith Kumar Karampuri

+ Ravi, Anuradha

On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote:

All,

Pranith and I were discussing the implementation of compound operations like
"create + lock", "mkdir + lock", "open + lock", etc. These operations are
useful in situations like:

1. To prevent locking on all subvols during directory creation as part of
self-heal in dht. Currently we follow the approach of locking _all_ subvols
in both rmdir and lookup-heal [1].
2. To lock a file in advance so that there is less of a performance hit
during transactions in afr.
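
To make the first use concrete: below is a minimal, self-contained sketch of
what a "mkdir + lock" compound request could look like, modelled as an
ordered list of sub-operations executed on the brick. Every type and name is
invented for illustration; none of this is existing GlusterFS code.

/* Illustrative sketch only: "mkdir + lock" as one request carrying an
 * ordered list of sub-ops. All names are invented. */
#include <stdio.h>

enum fop_type { FOP_MKDIR, FOP_INODELK };

struct sub_fop {
    enum fop_type type;
    const char   *path;              /* dentry the sub-op acts on */
};

struct compound_req {
    int            n_ops;
    struct sub_fop ops[4];           /* executed in order on the brick */
};

int main(void)
{
    /* One round trip instead of two; if the mkdir sub-op fails, the
     * brick would skip the lock, so a lock can never outlive a failed
     * mkdir. */
    struct compound_req req = {
        .n_ops = 2,
        .ops   = { { FOP_MKDIR, "/dir" }, { FOP_INODELK, "/dir" } },
    };

    for (int i = 0; i < req.n_ops; i++)
        printf("sub-op %d on %s\n", (int)req.ops[i].type, req.ops[i].path);
    return 0;
}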

While thinking about implementing such compound operations, it occurred to me
that one of the problems would be how to handle a racing mkdir/create and a
(named lookup - simply referred to as lookup from now on - followed by lock).
This is because,
1. creation of the directory/file on the backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the
mkdir/create call will be the one that survives in the inode table. Since the
posix-locks xlator maintains all the lock-state in the inode, it would be a
problem if a different inode is linked in the inode table than the one passed
during mkdir/create. One way to solve this problem is to serialize fops (like
mkdir/create, lookup, rename, rmdir, unlink) that are happening on a
particular dentry. This serialization would also solve other bugs like:

1. issues solved by [2][3] and possibly many such issues.
2. Stale dentries left out in bricks' inode table because of a racing lookup
and dentry modification ops (like rmdir, unlink, rename, etc.).
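
A small, self-contained model of that inode-link race (all names invented;
this is not GlusterFS code): inode_link() may return a different inode than
the one the caller passed in, so lock state stored on the caller's inode can
end up on an object that never makes it into the table.

#include <stdio.h>
#include <string.h>

struct inode {
    char gfid[16];
    int  lock_count;                 /* stands in for posix-locks state */
};

#define TABLE_SIZE 8
static struct inode *table[TABLE_SIZE];
static int table_used;

/* Returns the surviving inode for this gfid: the caller's inode if no
 * inode with the same gfid was linked first, otherwise the existing one. */
static struct inode *inode_link(struct inode *in)
{
    for (int i = 0; i < table_used; i++)
        if (memcmp(table[i]->gfid, in->gfid, sizeof(in->gfid)) == 0)
            return table[i];         /* a racing lookup won */
    table[table_used++] = in;
    return in;
}

int main(void)
{
    struct inode from_mkdir  = { .gfid = "abc", .lock_count = 1 };
    struct inode from_lookup = { .gfid = "abc" };

    /* The racing lookup links its inode first ... */
    inode_link(&from_lookup);
    /* ... so the mkdir's inode, which carries the lock state, loses. */
    struct inode *survivor = inode_link(&from_mkdir);

    /* Prints 0: the lock state taken during mkdir is gone. */
    printf("lock_count on survivor: %d\n", survivor->lock_count);
    return 0;
}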

The initial idea I have now is to maintain the fops in progress on a dentry in
the parent inode (maybe in the resolver code in protocol/server). Based on
this we can serialize the operations. Since we need to serialize _only_
operations on a dentry (we don't serialize nameless lookups), it is guaranteed
that we always have a parent inode. Any comments/discussion on this would be
appreciated.
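
One possible shape for that bookkeeping, sketched with invented names (the
real home would presumably be the resolver in protocol/server): a per-parent
set of in-flight basenames, where a second fop on the same (parent, name)
pair waits until the first one unwinds.

#include <pthread.h>
#include <string.h>

#define MAX_INFLIGHT 16

/* Hypothetical context hung off a parent inode; initialize the mutex and
 * condvar with PTHREAD_MUTEX_INITIALIZER / PTHREAD_COND_INITIALIZER. */
struct parent_ctx {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    const char     *inflight[MAX_INFLIGHT]; /* basenames with a fop in flight */
};

static int find_name(struct parent_ctx *p, const char *name)
{
    for (int i = 0; i < MAX_INFLIGHT; i++)
        if (p->inflight[i] && strcmp(p->inflight[i], name) == 0)
            return i;
    return -1;
}

/* Called before winding a dentry-modifying fop (mkdir, rename, ...). */
void dentry_op_begin(struct parent_ctx *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    while (find_name(p, name) >= 0)          /* another fop owns this dentry */
        pthread_cond_wait(&p->cond, &p->lock);
    for (int i = 0; i < MAX_INFLIGHT; i++)
        if (!p->inflight[i]) { p->inflight[i] = name; break; }
    pthread_mutex_unlock(&p->lock);
}

/* Called when the fop unwinds, letting waiters on this dentry proceed. */
void dentry_op_end(struct parent_ctx *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    int i = find_name(p, name);
    if (i >= 0)
        p->inflight[i] = NULL;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}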

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240

regards,
Raghavendra.




Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-17 Thread Shyam

On 08/17/2015 01:19 AM, Raghavendra Gowdappa wrote:



- Original Message -

From: "Raghavendra Gowdappa" 
To: "Gluster Devel" 
Cc: "Sakshi Bansal" 
Sent: Monday, 17 August, 2015 10:39:38 AM
Subject: [Gluster-devel] Serialization of fops acting on same dentry on server

All,

Pranith and I were discussing the implementation of compound operations
like "create + lock", "mkdir + lock", "open + lock", etc. These operations
are useful in situations like:

1. To prevent locking on all subvols during directory creation as part of
self-heal in dht. Currently we follow the approach of locking _all_
subvols in both rmdir and lookup-heal [1].


Correction. It should've been "to prevent locking on all subvols during
rmdir". The lookup self-heal should lock on all subvols (with a compound
"mkdir + lookup" if the directory is not present on a subvol). With this,
rmdir/rename can lock on just any one subvol, and this will prevent a
parallel lookup-heal from performing directory creation.


2. To lock a file in advance so that there is less of a performance hit
during transactions in afr.


I see multiple thoughts here, and am splitting what I think into these parts:

- Compound FOPs:
The whole idea of, and need for, compound FOPs is I think very useful.
Initially, compounding FOP+Lock is a good idea, as this is mostly
internal to Gluster and does not change any interface for any of the
consumers. Also, as Pranith is involved, we can iron out the AFR/EC-related
possibilities in such compounding as well.


With compounding I am only concerned about cases where part of the
compound operation succeeds on one replica but fails on the other. As
an example: if the mkdir succeeds on one (and so the lock subsequently
succeeds), but the mkdir fails on the other (because a competing client's
compound FOP raced this one), how can we handle such situations? Do we
need server-side AFR/EC with leader election, like in NSR, to handle this?
(Maybe the example is not a good/firm one for this case, but
nevertheless, can compounding create such problems?)


Another question would be: we need to compound it as Lock+FOP rather
than FOP+Lock in some cases, right?


- Advance locking to reduce serial RPC requests that degrade performance:
This is again a good thing to do; part of such a concept is in eager
locking already (as I see it). What I would like to see in this regard
would be eager leasing (piggybacked leases) of a file (and, loosely, of a
directory, though I need to think through that case more) so that we can
optimize the common case, when a file is being operated on by a single
client, and degrade to fine-grained locking when multiple clients compete.


Assuming eager leasing, AFR transactions need only client-side in-memory
locking (to prevent two threads/consumers of the client racing on the same
file/dir); also, with leasing and lease breaking, we can get better at
cooperating with other clients than eager locking does now.
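
A rough sketch of that degradation path, with hypothetical names (this is
not an existing API): while the lease holds, the client serializes its own
threads purely in memory, and it falls back to a network lock only once a
competing client forces a lease break.

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-file client context, for illustration only. */
struct file_ctx {
    bool            lease_held;  /* granted eagerly, revoked on conflict */
    pthread_mutex_t local_lock;  /* serializes this client's own threads */
};

/* Lock phase of an AFR-style transaction under the leasing model. */
void txn_lock(struct file_ctx *f)
{
    if (f->lease_held) {
        /* Single-client common case: in-memory lock, no round trip. */
        pthread_mutex_lock(&f->local_lock);
    } else {
        /* Lease broken, multiple clients compete: fall back to a
         * fine-grained server-side lock. Placeholder, not a real call:
         * server_inodelk_acquire(f); */
    }
}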


In short, I would like the advance locking or leasing to be part of the
client-side caching stack, so that multiple xlators on the client can
leverage the same; and I would prefer the leasing model over the locking
model, as it allows easier breaking than locks do.




While thinking about implementing such compound operations, it occurred to me
that one of the problems would be how to handle a racing mkdir/create and
a (named lookup - simply referred to as lookup from now on - followed by
lock). This is because,
1. creation of the directory/file on the backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the
mkdir/create call will be the one that survives in the inode table. Since the
posix-locks xlator maintains all the lock-state in the inode, it would be a
problem if a different inode is linked in the inode table than the one passed
during mkdir/create. One way to solve this problem is to serialize fops
(like mkdir/create, lookup, rename, rmdir, unlink) that are happening on a
particular dentry. This serialization would also solve other bugs like:

1. issues solved by [2][3] and possibly many such issues.
2. Stale dentries left out in bricks' inode table because of a racing lookup
and dentry modification ops (like rmdir, unlink, rename, etc.).

The initial idea I have now is to maintain the fops in progress on a dentry in
the parent inode (maybe in the resolver code in protocol/server). Based on
this we can serialize the operations. Since we need to serialize _only_
operations on a dentry (we don't serialize nameless lookups), it is
guaranteed that we always have a parent inode. Any comments/discussion on
this would be appreciated.


My initial comment on this would be to refer to the FS locking notes in the
Linux kernel, which have rules for locking during dentry operations and such.


The next part is as follows,
- Why create the name (dentry) before creati

Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-16 Thread Raghavendra Gowdappa


- Original Message -
> From: "Niels de Vos" 
> To: "Raghavendra Gowdappa" 
> Cc: "Gluster Devel" , "Sakshi Bansal" 
> 
> Sent: Monday, 17 August, 2015 11:14:18 AM
> Subject: Re: [Gluster-devel] Serialization of fops acting on same dentry on 
> server
> 
> On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> > All,
> > 
> > Pranith and I were discussing the implementation of compound
> > operations like "create + lock", "mkdir + lock", "open + lock", etc.
> > These operations are useful in situations like:
> > 
> > 1. To prevent locking on all subvols during directory creation as part
> > of self-heal in dht. Currently we follow the approach of locking
> > _all_ subvols in both rmdir and lookup-heal [1].
> > 2. To lock a file in advance so that there is less of a performance
> > hit during transactions in afr.
> 
> I have an interest in compound/composite procedures too. My use-case is
> a little different, and I was (and still am) planning to send more
> details about it soon.
> 
> Basically, there are certain cases where libgfapi will not be able to
> automatically pass the uid/gid in the RPC-header. A design for
> supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
> there is no option to use the Kerberos credentials of the user doing
> I/O (remote client, not using Kerberos to talk to samba/ganesha), the
> username (or uid/gid) needs to be passed to the storage servers.
> 
> A compound/composite procedure would then look like this:
> 
>   [RPC header]
> [AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]
> 
>   [GlusterFS COMPOUND]
> [SETFSUID]
> [SETLOCKOWNER]
> [${FOP}]
> [.. more FOPs?]
> 
> This idea has not been reviewed/commented on with some of the Kerberos
> experts that I want to involve. A more complete description about the
> plans to support Kerberos will follow.
> 
> Do you think that this matches your ideas on compound operations?

What we had in mind was more about compounding more than one Gluster fop. We
really didn't think at the granularity of setfsuid, setlkowner, etc. But yes,
it's not something fundamentally different from what we had in mind.

> 
> Thanks,
> Niels
> 
> 
> > 
> > While thinking about implementing such compound operations, it
> > occurred to me that one of the problems would be how to handle a
> > racing mkdir/create and a (named lookup - simply referred to as
> > lookup from now on - followed by lock). This is because,
> > 1. creation of the directory/file on the backend
> > 2. linking of the inode with the gfid corresponding to that
> > file/directory
> > 
> > are not atomic. It is not guaranteed that the inode passed down
> > during the mkdir/create call will be the one that survives in the
> > inode table. Since the posix-locks xlator maintains all the
> > lock-state in the inode, it would be a problem if a different inode
> > is linked in the inode table than the one passed during
> > mkdir/create. One way to solve this problem is to serialize fops
> > (like mkdir/create, lookup, rename, rmdir, unlink) that are
> > happening on a particular dentry. This serialization would also
> > solve other bugs like:
> > 
> > 1. issues solved by [2][3] and possibly many such issues.
> > 2. Stale dentries left out in bricks' inode table because of a racing
> > lookup and dentry modification ops (like rmdir, unlink, rename, etc.).
> > 
> > The initial idea I have now is to maintain the fops in progress on a
> > dentry in the parent inode (maybe in the resolver code in
> > protocol/server). Based on this we can serialize the operations.
> > Since we need to serialize _only_ operations on a dentry (we don't
> > serialize nameless lookups), it is guaranteed that we always have a
> > parent inode. Any comments/discussion on this would be appreciated.
> > 
> > [1] http://review.gluster.org/11725
> > [2] http://review.gluster.org/9913
> > [3] http://review.gluster.org/5240
> > 
> > regards,
> > Raghavendra.


Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-16 Thread Niels de Vos
On Mon, Aug 17, 2015 at 01:09:38AM -0400, Raghavendra Gowdappa wrote:
> All,
> 
> Pranith and I were discussing the implementation of compound
> operations like "create + lock", "mkdir + lock", "open + lock", etc.
> These operations are useful in situations like:
> 
> 1. To prevent locking on all subvols during directory creation as part
> of self-heal in dht. Currently we follow the approach of locking
> _all_ subvols in both rmdir and lookup-heal [1].
> 2. To lock a file in advance so that there is less of a performance
> hit during transactions in afr.

I have an interest in compound/composite procedures too. My use-case is
a little different, and I was (and still am) planning to send more
details about it soon.

Basically, there are certain cases where libgfapi will not be able to
automatically pass the uid/gid in the RPC-header. A design for
supporting Kerberos will mainly use the standardized RPCSEC_GSS. If
there is no option to use the Kerberos credentials of the user doing
I/O (remote client, not using Kerberos to talk to samba/ganesha), the
username (or uid/gid) needs to be passed to the storage servers.

A compound/composite procedure would then look like this:

  [RPC header]
[AUTH_GSS + Kerberos principal for libgfapi/samba/ganesha/...]

  [GlusterFS COMPOUND]
[SETFSUID]
[SETLOCKOWNER]
[${FOP}]
[.. more FOPs?]

This idea has not been reviewed/commented on with some of the Kerberos
experts that I want to involve. A more complete description about the
plans to support Kerberos will follow.

Do you think that this matches your ideas on compound operations?

Thanks,
Niels


> 
> While thinking about implementing such compound operations, it
> occurred to me that one of the problems would be how to handle a
> racing mkdir/create and a (named lookup - simply referred to as
> lookup from now on - followed by lock). This is because,
> 1. creation of the directory/file on the backend
> 2. linking of the inode with the gfid corresponding to that
> file/directory
> 
> are not atomic. It is not guaranteed that the inode passed down
> during the mkdir/create call will be the one that survives in the
> inode table. Since the posix-locks xlator maintains all the
> lock-state in the inode, it would be a problem if a different inode
> is linked in the inode table than the one passed during
> mkdir/create. One way to solve this problem is to serialize fops
> (like mkdir/create, lookup, rename, rmdir, unlink) that are
> happening on a particular dentry. This serialization would also
> solve other bugs like:
> 
> 1. issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left out in bricks' inode table because of a racing
> lookup and dentry modification ops (like rmdir, unlink, rename, etc.).
> 
> The initial idea I have now is to maintain the fops in progress on a
> dentry in the parent inode (maybe in the resolver code in
> protocol/server). Based on this we can serialize the operations.
> Since we need to serialize _only_ operations on a dentry (we don't
> serialize nameless lookups), it is guaranteed that we always have a
> parent inode. Any comments/discussion on this would be appreciated.
> 
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
> 
> regards,
> Raghavendra.


Re: [Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-16 Thread Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra Gowdappa" 
> To: "Gluster Devel" 
> Cc: "Sakshi Bansal" 
> Sent: Monday, 17 August, 2015 10:39:38 AM
> Subject: [Gluster-devel] Serialization of fops acting on same dentry on   
> server
> 
> All,
> 
> Pranith and I were discussing the implementation of compound operations
> like "create + lock", "mkdir + lock", "open + lock", etc. These operations
> are useful in situations like:
> 
> 1. To prevent locking on all subvols during directory creation as part of
> self-heal in dht. Currently we follow the approach of locking _all_
> subvols in both rmdir and lookup-heal [1].

Correction. It should've been "to prevent locking on all subvols during
rmdir". The lookup self-heal should lock on all subvols (with a compound
"mkdir + lookup" if the directory is not present on a subvol). With this,
rmdir/rename can lock on just any one subvol, and this will prevent a
parallel lookup-heal from performing directory creation.
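
A toy illustration of why locking just one subvol is enough (not GlusterFS
code): lookup-heal must lock every subvol, so its lock set always intersects
whichever single subvol rmdir/rename picked, and the two serialize.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int n_subvols = 4;
    uint32_t  heal_set  = (1u << n_subvols) - 1; /* lookup-heal: all */
    uint32_t  rmdir_set = 1u << 2;               /* rmdir: any one   */

    if (heal_set & rmdir_set)
        printf("lock sets overlap: heal and rmdir cannot run in parallel\n");
    return 0;
}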

> 2. To lock a file in advance so that there is less of a performance hit
> during transactions in afr.
> 
> While thinking about implementing such compound operations, it occurred to
> me that one of the problems would be how to handle a racing mkdir/create
> and a (named lookup - simply referred to as lookup from now on - followed
> by lock). This is because,
> 1. creation of the directory/file on the backend
> 2. linking of the inode with the gfid corresponding to that file/directory
> 
> are not atomic. It is not guaranteed that the inode passed down during the
> mkdir/create call will be the one that survives in the inode table. Since
> the posix-locks xlator maintains all the lock-state in the inode, it would
> be a problem if a different inode is linked in the inode table than the one
> passed during mkdir/create. One way to solve this problem is to serialize
> fops (like mkdir/create, lookup, rename, rmdir, unlink) that are happening
> on a particular dentry. This serialization would also solve other bugs like:
> 
> 1. issues solved by [2][3] and possibly many such issues.
> 2. Stale dentries left out in bricks' inode table because of a racing lookup
> and dentry modification ops (like rmdir, unlink, rename, etc.).
> 
> The initial idea I have now is to maintain the fops in progress on a dentry
> in the parent inode (maybe in the resolver code in protocol/server). Based
> on this we can serialize the operations. Since we need to serialize _only_
> operations on a dentry (we don't serialize nameless lookups), it is
> guaranteed that we always have a parent inode. Any comments/discussion on
> this would be appreciated.
> 
> [1] http://review.gluster.org/11725
> [2] http://review.gluster.org/9913
> [3] http://review.gluster.org/5240
> 
> regards,
> Raghavendra.


[Gluster-devel] Serialization of fops acting on same dentry on server

2015-08-16 Thread Raghavendra Gowdappa
All,

Pranith and I were discussing the implementation of compound operations like
"create + lock", "mkdir + lock", "open + lock", etc. These operations are
useful in situations like:

1. To prevent locking on all subvols during directory creation as part of
self-heal in dht. Currently we follow the approach of locking _all_ subvols
in both rmdir and lookup-heal [1].
2. To lock a file in advance so that there is less of a performance hit
during transactions in afr.

While thinking about implementing such compound operations, it occurred to me
that one of the problems would be how to handle a racing mkdir/create and a
(named lookup - simply referred to as lookup from now on - followed by lock).
This is because,
1. creation of the directory/file on the backend
2. linking of the inode with the gfid corresponding to that file/directory

are not atomic. It is not guaranteed that the inode passed down during the
mkdir/create call will be the one that survives in the inode table. Since the
posix-locks xlator maintains all the lock-state in the inode, it would be a
problem if a different inode is linked in the inode table than the one passed
during mkdir/create. One way to solve this problem is to serialize fops (like
mkdir/create, lookup, rename, rmdir, unlink) that are happening on a
particular dentry. This serialization would also solve other bugs like:

1. issues solved by [2][3] and possibly many such issues.
2. Stale dentries left out in bricks' inode table because of a racing lookup
and dentry modification ops (like rmdir, unlink, rename, etc.).

The initial idea I have now is to maintain the fops in progress on a dentry in
the parent inode (maybe in the resolver code in protocol/server). Based on
this we can serialize the operations. Since we need to serialize _only_
operations on a dentry (we don't serialize nameless lookups), it is guaranteed
that we always have a parent inode. Any comments/discussion on this would be
appreciated.

[1] http://review.gluster.org/11725
[2] http://review.gluster.org/9913
[3] http://review.gluster.org/5240

regards,
Raghavendra.