Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread Steven Whitehouse
Hi,

On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
 When unmounting, gfs2 does a full dlm_unlock operation on every
 cached lock.  This can create a very large amount of work and can
 take a long time to complete.  However, the vast majority of these
 dlm unlock operations are unnecessary because after all the unlocks
 are done, gfs2 leaves the dlm lockspace, which automatically clears
 the locks of the leaving node, without unlocking each one individually.
 So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
 remove the locks implicitly.  The one exception is when the lock's lvb is
 being used.  In this case, dlm_unlock is called because it may update the
 lvb of the resource.
 

I'm wondering just how much we are likely to gain from this; we
currently use LVBs for both quota and (more recently) rgrp too. If we
were to start using the LVBs for inodes and/or iopen locks eventually
then that would seem to rather reduce the benefits of this.

The other question is what the cost of conversion to NL vs unlock of an
NL lock is. Even with the patch we are still iterating over each lock to
do a conversion to NL in any case where the lock is not already in NL.
So all we are saving is the final NL -> unlocked change.

One thought is whether it would not be better to do a direct whatever
-> unlocked change in the first place, rather than splitting the
operation into two parts.

 Signed-off-by: David Teigland teigl...@redhat.com
 ---
  fs/gfs2/glock.c|1 +
  fs/gfs2/incore.h   |1 +
  fs/gfs2/lock_dlm.c |6 ++
  3 files changed, 8 insertions(+)
 
 diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
 index e6c2fd5..f3a5edb 100644
 --- a/fs/gfs2/glock.c
 +++ b/fs/gfs2/glock.c
 @@ -1528,6 +1528,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
  
  void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
  {
 + set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
   glock_hash_walk(clear_glock, sdp);
   flush_workqueue(glock_workqueue);
   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
 diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
 index 3d469d3..67a39cf 100644
 --- a/fs/gfs2/incore.h
 +++ b/fs/gfs2/incore.h
 @@ -539,6 +539,7 @@ enum {
   SDF_DEMOTE  = 5,
   SDF_NOJOURNALID = 6,
   SDF_RORECOVERY  = 7, /* read only recovery */
 + SDF_SKIP_DLM_UNLOCK = 8,
  };
  
  #define GFS2_FSNAME_LEN  256
 diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
 index 0fb6539..efd0fb6 100644
 --- a/fs/gfs2/lock_dlm.c
 +++ b/fs/gfs2/lock_dlm.c
 @@ -289,6 +289,12 @@ static void gdlm_put_lock(struct gfs2_glock *gl)
   gfs2_glstats_inc(gl, GFS2_LKS_DCOUNT);
   gfs2_sbstats_inc(gl, GFS2_LKS_DCOUNT);
   gfs2_update_request_times(gl);
 +
 + if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && !gl->gl_lvb[0]) {
There is no guarantee that just because the first byte of the lvb is 0
it is not in use. So I'd suggest adding a flag to each struct gfs2_glops
to say whether the lvb is used by this type of glock or not and then
using that instead of this test.
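
For illustration, a minimal sketch of that idea; the GLOF_LVB flag, and
hanging it off go_flags in the glock operations, are just illustrative
names here rather than existing code:

  #define GLOF_LVB	0x01	/* hypothetical: this glock type uses the lvb */

  const struct gfs2_glock_operations gfs2_rgrp_glops = {
  	/* ... existing ops ... */
  	.go_flags = GLOF_LVB,
  };

  /* then in gdlm_put_lock() */
  if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) &&
      !(gl->gl_ops->go_flags & GLOF_LVB)) {
  	gfs2_glock_free(gl);
  	return;
  }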

We assume though that it is only ok to write the LVB when the lock is in
the higher lock modes, so it is not going to have changed since the
last lock state change (to NL in this case), so maybe we don't even need
to care about this? Or at least not unless we merge the two state
changes as mentioned above.

 + gfs2_glock_free(gl);
 + return;
 + }
 +
   error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK,
  NULL, gl);
   if (error) {

Steve.




Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 10:26:53AM +, Steven Whitehouse wrote:
 Hi,
 
 On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
  When unmounting, gfs2 does a full dlm_unlock operation on every
  cached lock.  This can create a very large amount of work and can
  take a long time to complete.  However, the vast majority of these
  dlm unlock operations are unnecessary because after all the unlocks
  are done, gfs2 leaves the dlm lockspace, which automatically clears
  the locks of the leaving node, without unlocking each one individually.
  So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
  remove the locks implicitly.  The one exception is when the lock's lvb is
  being used.  In this case, dlm_unlock is called because it may update the
  lvb of the resource.
  
 
 I'm wondering just how much we are likely to gain from this; we
 currently use LVBs for both quota and (more recently) rgrp too. If we
 were to start using the LVBs for inodes and/or iopen locks eventually
 then that would seem to rather reduce the benefits of this.

Considering what you say below, after you've converted to NL, there's no
more lvb to consider, so the lvb is not an issue in that case. The lvb is
only written if you're unlocking from PW or EX, so there's bound to always
be many unlocks that could be skipped.  I'll adjust the patch to skip
unlock unless there's an lvb and the mode is PW or EX.
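
As a rough sketch of that adjustment (lvb_in_use() and mode_writes_lvb()
are hypothetical stand-ins for whatever test the updated patch actually
uses):

  /* skip the explicit dlm_unlock unless the lvb is in use and the
   * lock is in a mode (PW/EX) that can have modified it */
  if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) &&
      !(lvb_in_use(gl) && mode_writes_lvb(gl))) {
  	gfs2_glock_free(gl);
  	return;
  }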

 The other question is what the cost of conversion to NL vs unlock of an
 NL lock is. Even with the patch we are still iterating over each lock to
 do a conversion to NL in any case where the lock is not already in NL.
 So all we are saving is the final NL -> unlocked change.

yeah, I'd forgotten about that.

 One thought is whether it would not be better to do a direct whatever
 -> unlocked change in the first place, rather than splitting the
 operation into two parts.

Converting to NL would actually be less expensive than unlock because the
NL convert does not involve a reply message, but unlock does.

So skipping the unlocks is a first step that gives us a big benefit very
simply.  To benefit even further, we could later look into skipping the
convert to NL step also, and just abandoning the dlm locks in whatever
mode they're in; but that's probably not as simple a change.



Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread Steven Whitehouse
Hi,

On Thu, 2012-11-08 at 10:41 -0500, David Teigland wrote:
 On Thu, Nov 08, 2012 at 10:26:53AM +, Steven Whitehouse wrote:
  Hi,
  
  On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
   When unmounting, gfs2 does a full dlm_unlock operation on every
   cached lock.  This can create a very large amount of work and can
   take a long time to complete.  However, the vast majority of these
   dlm unlock operations are unnecessary because after all the unlocks
   are done, gfs2 leaves the dlm lockspace, which automatically clears
   the locks of the leaving node, without unlocking each one individually.
   So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
   remove the locks implicitly.  The one exception is when the lock's lvb is
   being used.  In this case, dlm_unlock is called because it may update the
   lvb of the resource.
   
  
  I'm wondering just how much we are likely to gain from this; we
  currently use LVBs for both quota and (more recently) rgrp too. If we
  were to start using the LVBs for inodes and/or iopen locks eventually
  then that would seem to rather reduce the benefits of this.
 
 Considering what you say below, after you've converted to NL, there's no
 more lvb to consider, so the lvb is not an issue in that case. The lvb is
 only written if you're unlocking from PW or EX, so there's bound to always
 be many unlocks that could be skipped.  I'll adjust the patch to skip
 unlock unless there's an lvb and the mode is PW or EX.
 
Ok

  The other question is what the cost of conversion to NL vs unlock of an
  NL lock is. Even with the patch we are still iterating over each lock to
  do a conversion to NL in any case where the lock is not already in NL.
  So all we are saving is the final NL -> unlocked change.
 
 yeah, I'd forgotten about that.
 
  One thought is whether it would not be better to do a direct whatever
  -> unlocked change in the first place, rather than splitting the
  operation into two parts.
 
 Converting to NL would actually be less expensive than unlock because the
 NL convert does not involve a reply message, but unlock does.
 
I'm not entirely sure I follow... at least from the filesystem point of
view (and without your proposed change) both conversions and unlocks
result in a reply. Is this a dlm internal reply perhaps?

 So skipping the unlocks is a first step that gives us a big benefit very
 simply.  To benefit even further, we could later look into skipping the
 convert to NL step also, and just abandoning the dlm locks in whatever
 mode they're in; but that's probably not as simple a change.
 

Yes, that's true... the issue is that the glock state machine treats all
glocks on an individual basis, and the demotion to NL also deals with
any writing back and invalidating of the cache that's required at the
same time. So that makes it tricky to separate from the requests to the
dlm.

That said, I'd like to be able to move towards dealing with batches of
glocks in the future, since that means we can provide a more favourable
ordering of i/o requests. That is not an easy thing to do though.

In addition to the benefit for umount, I'm also wondering whether, if
these unlocks are relatively slow, we should look at what happens during
normal operation, where we do, from time to time, send unlock requests.
Those are mostly (though indirectly) in response to memory pressure. Is
there anything we can do there to speed things up, I wonder?

Steve.





Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 06:48:19PM +, Steven Whitehouse wrote:
  Converting to NL would actually be less expensive than unlock because the
  NL convert does not involve a reply message, but unlock does.
  
 I'm not entirely sure I follow... at least from the filesystem point of
 view (and without your proposed change) both conversions and unlocks
 result in a reply. Is this a dlm internal reply perhaps?

Right, I was referring to the internal dlm reply over the network.

  So skipping the unlocks is a first step that gives us a big benefit very
  simply.  To benefit even further, we could later look into skipping the
  convert to NL step also, and just abandoning the dlm locks in whatever
  mode they're in; but that's probably not as simple a change.
  
 
 Yes, that's true... the issue is that the glock state machine treats all
 glocks on an individual basis, and the demotion to NL also deals with
 any writing back and invalidating of the cache that's required at the
 same time. So that makes it tricky to separate from the requests to the
 dlm.
 
 That said, I'd like to be able to move towards dealing with batches of
 glocks in the future, since that means we can provide a more favourable
 ordering of i/o requests. That is not an easy thing to do though.
 
 In addition to the benefit for umount, I'm also wondering whether, if
 these unlocks are relatively slow, we should look at what happens during
 normal operation, where we do, from time to time, send unlock requests.
 Those are mostly (though indirectly) in response to memory pressure. Is
 there anything we can do there to speed things up, I wonder?

The main thing would be to not use a completion callback for dlm_unlock
(either make dlm not send one, or ignore it in gfs2).  This would let you
free the glock memory right away.

But, removing unlock completions can create new problems, because you'd
need to handle new errors from dlm_lock() when it ran up against an
incomplete unlock.  Dealing with that complication may negate any benefit
from ignoring unlock completions.  Unless, of course, you knew you
wouldn't be making any more dlm_lock calls on that lock, e.g. during
unmount.



[Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-07 Thread David Teigland
When unmounting, gfs2 does a full dlm_unlock operation on every
cached lock.  This can create a very large amount of work and can
take a long time to complete.  However, the vast majority of these
dlm unlock operations are unnecessary because after all the unlocks
are done, gfs2 leaves the dlm lockspace, which automatically clears
the locks of the leaving node, without unlocking each one individually.
So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
remove the locks implicitly.  The one exception is when the lock's lvb is
being used.  In this case, dlm_unlock is called because it may update the
lvb of the resource.

Signed-off-by: David Teigland teigl...@redhat.com
---
 fs/gfs2/glock.c|1 +
 fs/gfs2/incore.h   |1 +
 fs/gfs2/lock_dlm.c |6 ++
 3 files changed, 8 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e6c2fd5..f3a5edb 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1528,6 +1528,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
 
 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
+   set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
glock_hash_walk(clear_glock, sdp);
flush_workqueue(glock_workqueue);
	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 3d469d3..67a39cf 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -539,6 +539,7 @@ enum {
SDF_DEMOTE  = 5,
SDF_NOJOURNALID = 6,
SDF_RORECOVERY  = 7, /* read only recovery */
+   SDF_SKIP_DLM_UNLOCK = 8,
 };
 
 #define GFS2_FSNAME_LEN	256
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 0fb6539..efd0fb6 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -289,6 +289,12 @@ static void gdlm_put_lock(struct gfs2_glock *gl)
gfs2_glstats_inc(gl, GFS2_LKS_DCOUNT);
gfs2_sbstats_inc(gl, GFS2_LKS_DCOUNT);
gfs2_update_request_times(gl);
+
+   if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && !gl->gl_lvb[0]) {
+   gfs2_glock_free(gl);
+   return;
+   }
+
	error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK,
   NULL, gl);
if (error) {
-- 
1.7.10.1.362.g242cab3