Re: [Gluster-devel] Improvement of eager locking

2015-01-22 Thread Pranith Kumar Karampuri


On 01/16/2015 05:40 PM, Xavier Hernandez wrote:

On 01/16/2015 04:58 AM, Pranith Kumar Karampuri wrote:


On 01/15/2015 10:53 PM, Xavier Hernandez wrote:

Hi,

currently eager locking is implemented by checking the open-fd-count
special xattr for each write. If there's more than one open on the
same file, eager locking is disabled to avoid starvation.
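
Roughly, the current decision looks like this (a standalone sketch, not
the actual posix/AFR code; the struct and helper names are only
illustrative):

/* Sketch of the current behaviour: posix reports the number of open
 * fds on the file (requested through xdata on every write), and the
 * cluster xlator gives up eager locking when more than one fd is
 * open. */
#include <stdbool.h>
#include <stdint.h>

struct write_cbk_info {
    uint32_t open_fd_count;   /* value posix filled into the cbk xdata */
};

static bool keep_eager_lock(const struct write_cbk_info *info)
{
    /* More than one open fd may mean another client is also writing,
     * so release the eager lock to avoid starving it. */
    return info->open_fd_count <= 1;
}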

This works quite well for file writes, but makes eager locking
unusable for other request types that do not involve an open fd (in
fact, this method is only for writes on regular files, not reads or
directories). This may cause a performance problem for other
operations, like metadata.

To be able to use eager locking for other purposes, what do you think
about this proposal:

Instead of implementing open-fd-count on posix xlator, do something
similar but in locks xlator. The difference will be that locks xlator
can use the pending locking information to determine if there are
other processes waiting for a resource. If so, set a flag in the cbk
xdata to let high level xlators know that they should not use eager
locking (this can be done only upon request by xdata).
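
A sketch of the proposed server-side check (standalone pseudo-C; the
field and function names are made up for illustration and are not the
locks xlator's real structures):

/* The locks xlator inspects its own queue of blocked requests on the
 * inode and reports, through the cbk xdata, whether anyone else is
 * waiting for it. */
#include <stdbool.h>
#include <stddef.h>

struct lock_entry { struct lock_entry *next; };

struct inode_locks {
    struct lock_entry *granted;   /* currently held locks         */
    struct lock_entry *blocked;   /* requests waiting behind them */
};

/* Value a client would read back from the cbk xdata: a non-empty
 * blocked list means "someone else wants this inode, stop eager
 * locking". */
static bool other_clients_waiting(const struct inode_locks *l)
{
    return l->blocked != NULL;
}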

I think this way provides a more precise way to avoid starvation and
maximize performance at the same time, and it can be used for any
request even if it doesn't depend on an fd.

Another advantage is that if one file has been opened multiple times
but all of them from the same glusterfs client, that client could use
a single inodelk to manage all the accesses, not needing to release
the lock. The current implementation in the posix xlator cannot
differentiate between opens from the same client and opens from
different clients.

What do you think?

I like the idea. So basically we can propagate the list_empty
information of the 'blocking_locks' list. And for sending locks, we
need to use an lk-owner based on the gfid so that locks from the same
client (i.e. the same lkowner+transport) are granted irrespective of
conflicting locks. The respective xls need to make sure to order the
fops so that they don't step on each other within a single process.
This can be used for entry-locks also.
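
Roughly, the gfid-based lk-owner could look like this (illustrative
only; the struct below is a stand-in, not glusterfs' actual lk-owner
type):

#include <stdint.h>
#include <string.h>

#define GFID_SIZE 16

/* Stand-in for an lk-owner: a byte buffer plus its length. */
struct owner {
    uint8_t data[GFID_SIZE];
    size_t  len;
};

/* Derive the lock owner from the inode's gfid so that every fop on
 * this inode sent by the same client/transport carries the same owner,
 * letting conflicts between them be resolved by local ordering instead
 * of extra lock calls. */
static void owner_from_gfid(struct owner *o, const uint8_t gfid[GFID_SIZE])
{
    memcpy(o->data, gfid, GFID_SIZE);
    o->len = GFID_SIZE;
}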


I don't understand the benefits of checking lkowner+transport to grant
a lock that bypasses conflicts. It seems dangerous and I don't see
exactly how this can help the upper xlator. If this xlator already
needs to take care of fop ordering for each inode, it can share the
same lock, so there seems to be no need for additional locking calls.
I may be missing some detail though.
AFR, around 3.2 or 3.3, used to take a full-file lock for doing
self-heal. But this scheme was useless for VM healing, so we had to
migrate the locking in a backward-compatible way. The strategy we
adopted was for healing to take a lock on one 128KB chunk at a time,
heal that chunk and move on to the next one, while making sure that at
no point could another self-heal start on the same file. So the locking
scheme came to be: take a full-file lock, find the good/bad copies,
take a lock on chunk-1, unlock the full lock, heal chunk-1, then take a
lock on chunk-2, unlock the lock on chunk-1, and so on. For this to
work we needed locks to be granted even when there were conflicting
locks, so we chose (lk-owner + transport) being the same as the
condition for granting conflicting locks. We found that this can lead
to another problem where a truncate fop etc. can hang, so we are moving
to a different mechanism now. You can find the complete lock-evolution
document here:
https://github.com/gluster/glusterfs/blob/master/doc/code/xlators/cluster/afr/afr-locks-evolution.md
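
Roughly, that chunk-lock ordering looks like this (a standalone sketch
with hypothetical take_inodelk()/release_inodelk() helpers, not the
real AFR code; locking chunk-1 while the full-file lock is still held
only works because conflicting locks are granted when lk-owner+transport
match, as described above):

#include <stdint.h>
#include <stdio.h>

#define CHUNK (128 * 1024)

/* Pretend these send inodelk requests for [off, off+len);
 * len == 0 is used here to mean "till the end of the file". */
static void take_inodelk(uint64_t off, uint64_t len)
{
    printf("lock   %llu +%llu\n", (unsigned long long)off,
           (unsigned long long)len);
}

static void release_inodelk(uint64_t off, uint64_t len)
{
    printf("unlock %llu +%llu\n", (unsigned long long)off,
           (unsigned long long)len);
}

static void heal_chunk(uint64_t off)
{
    printf("heal   %llu\n", (unsigned long long)off);
}

static void self_heal(uint64_t file_size)
{
    take_inodelk(0, 0);        /* full-file lock: find good/bad copies */
    take_inodelk(0, CHUNK);    /* lock chunk-1 ...                     */
    release_inodelk(0, 0);     /* ... then drop the full lock          */

    for (uint64_t off = 0; off < file_size; off += CHUNK) {
        heal_chunk(off);
        if (off + CHUNK < file_size)
            take_inodelk(off + CHUNK, CHUNK);  /* lock next chunk first */
        release_inodelk(off, CHUNK);           /* then unlock this one  */
    }
}

int main(void) { self_heal(4 * CHUNK); return 0; }

Because at least one lock is held at every instant, no other self-heal
can slip in between chunks.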




Thinking a little more about the way to detect multiple accesses to the 
same inode using the list of pending locks, there's a case where some 
more logic must be added to avoid unnecessary delays.


Suppose you receive a request for an inode from one client. If nobody
else is waiting, a flag is set in the answer indicating that there's no
conflict. The caller then starts an eager lock timer because nobody
else is waiting. During that timeout, another client tries to access
the same inode. It will block until either the eager lock timer expires
(at which point the first client releases the inode lock) or another
request from the first client arrives (in which case that request is
served and the result indicates that the lock should be released
because other clients are waiting). When the lock is released, it is
granted to the other client. It's possible that this client completes
its request before the first one tries to acquire the lock again
(because it had more requests pending), causing the second client to
start another eager lock timer because no other client was waiting at
the moment the request was executed. This is an unnecessary delay.
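
Roughly, the client-side policy described above is (names invented;
this is not the actual afr/ec implementation):

#include <stdbool.h>

enum lock_action {
    RELEASE_LOCK,             /* other clients are waiting: let them in */
    KEEP_LOCK_AND_ARM_TIMER,  /* nobody waiting: hold the lock a while  */
};

/* Decision taken each time a fop completes under the eager lock, based
 * on the "is anyone else waiting" flag returned in the cbk xdata.  A
 * new fop arriving before the timer fires cancels it and reuses the
 * lock; if the timer fires first, the lock is released. */
static enum lock_action after_fop_cbk(bool others_waiting)
{
    return others_waiting ? RELEASE_LOCK : KEEP_LOCK_AND_ARM_TIMER;
}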


To avoid this problem, we could add a flag in the inodelk/entrylk calls
to indicate that the lock is being released to let other clients
proceed, but that we want it again as soon as possible. It would behave
like a combined unlock and lock on the same inode sent in a single
request. This way we avoid one round trip and the locks xlator has more
up-to-date information to decide whether there are concurrent accesses.

Re: [Gluster-devel] Improvement of eager locking

2015-01-16 Thread Xavier Hernandez

On 01/16/2015 04:58 AM, Pranith Kumar Karampuri wrote:


On 01/15/2015 10:53 PM, Xavier Hernandez wrote:

Hi,

currently eager locking is implemented by checking the open-fd-count
special xattr for each write. If there's more than one open on the
same file, eager locking is disabled to avoid starvation.

This works quite well for file writes, but makes eager locking
unusable for other request types that do not involve an open fd (in
fact, this method is only for writes on regular files, not reads or
directories). This may cause a performance problem for other
operations, like metadata.

To be able to use eager locking for other purposes, what do you think
about this proposal:

Instead of implementing open-fd-count on posix xlator, do something
similar but in locks xlator. The difference will be that locks xlator
can use the pending locking information to determine if there are
other processes waiting for a resource. If so, set a flag in the cbk
xdata to let high level xlators know that they should not use eager
locking (this can be done only upon request by xdata).

I think this way provides a more precise way to avoid starvation and
maximize performance at the same time, and it can be used for any
request even if it doesn't depend on an fd.

Another advantage is that if one file has been opened multiple times
but all of them from the same glusterfs client, that client could use
a single inodelk to manage all the accesses, not needing to release
the lock. The current implementation in the posix xlator cannot
differentiate between opens from the same client and opens from
different clients.

What do you think?

I like the idea. So basically we can propagate the list_empty
information of the 'blocking_locks' list. And for sending locks, we
need to use an lk-owner based on the gfid so that locks from the same
client (i.e. the same lkowner+transport) are granted irrespective of
conflicting locks. The respective xls need to make sure to order the
fops so that they don't step on each other within a single process.
This can be used for entry-locks also.


I don't understand the benefits of checking lkowner+transport to grant
a lock that bypasses conflicts. It seems dangerous and I don't see
exactly how this can help the upper xlator. If this xlator already
needs to take care of fop ordering for each inode, it can share the
same lock, so there seems to be no need for additional locking calls.
I may be missing some detail though.


Thinking a little more about the way to detect multiple accesses to the 
same inode using the list of pending locks, there's a case where some 
more logic must be added to avoid unnecessary delays.


Suppose you receive a request for an inode from one client. If nobody
else is waiting, a flag is set in the answer indicating that there's no
conflict. The caller then starts an eager lock timer because nobody
else is waiting. During that timeout, another client tries to access
the same inode. It will block until either the eager lock timer expires
(at which point the first client releases the inode lock) or another
request from the first client arrives (in which case that request is
served and the result indicates that the lock should be released
because other clients are waiting). When the lock is released, it is
granted to the other client. It's possible that this client completes
its request before the first one tries to acquire the lock again
(because it had more requests pending), causing the second client to
start another eager lock timer because no other client was waiting at
the moment the request was executed. This is an unnecessary delay.


To avoid this problem, we could add a flag in the inodelk/entrylk calls
to indicate that the lock is being released to let other clients
proceed, but that we want it again as soon as possible. It would behave
like a combined unlock and lock on the same inode sent in a single
request. This way we avoid one round trip and the locks xlator has more
up-to-date information to decide whether there are concurrent accesses.
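
A hypothetical shape for such a combined request (the relock_after
field is invented to illustrate the idea; nothing like it exists in the
current inodelk protocol):

#include <stdbool.h>
#include <stdint.h>

/* Release the currently held range and, in the same call, queue a new
 * blocking request for it, so the caller does not need a second round
 * trip to re-acquire the lock. */
struct inodelk_request {
    uint64_t offset;
    uint64_t length;
    bool     unlock;         /* release the lock held on this range      */
    bool     relock_after;   /* immediately wait for it again (proposal) */
};

The locks xlator would then hand the lock to the first waiter and
append the caller back to the blocked list in one step, so when the
lock is granted again the caller also gets fresh information about
whether other clients are still competing for the inode.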


Xavi


Re: [Gluster-devel] Improvement of eager locking

2015-01-15 Thread Pranith Kumar Karampuri


On 01/15/2015 10:53 PM, Xavier Hernandez wrote:

Hi,

currently eager locking is implemented by checking the open-fd-count 
special xattr for each write. If there's more than one open on the 
same file, eager locking is disabled to avoid starvation.


This works quite well for file writes, but makes eager locking 
unusable for other request types that do not involve an open fd (in 
fact, this method is only for writes on regular files, not reads or 
directories). This may cause a performance problem for other 
operations, like metadata.


To be able to use eager locking for other purposes, what do you think 
about this proposal:


Instead of implementing open-fd-count on posix xlator, do something 
similar but in locks xlator. The difference will be that locks xlator 
can use the pending locking information to determine if there are 
other processes waiting for a resource. If so, set a flag in the cbk 
xdata to let high level xlators know that they should not use eager 
locking (this can be done only upon request by xdata).


I think this way provides a more precise way to avoid starvation and 
maximize performance at the same time, and it can be used for any 
request even if it doesn't depend on an fd.


Another advantage is that if one file has been opened multiple times 
but all of them from the same glusterfs client, that client could use 
a single inodelk to manage all the accesses, not needing to release 
the lock. The current implementation in the posix xlator cannot
differentiate between opens from the same client and opens from
different clients.


What do you think?
I like the idea. So basically we can propagate the list_empty
information of the 'blocking_locks' list. And for sending locks, we
need to use an lk-owner based on the gfid so that locks from the same
client (i.e. the same lkowner+transport) are granted irrespective of
conflicting locks. The respective xls need to make sure to order the
fops so that they don't step on each other within a single process.
This can be used for entry-locks also.


Pranith


Xavi


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel