Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 06:48:19PM +0000, Steven Whitehouse wrote:
> > Converting to NL would actually be less expensive than unlock because the
> > NL convert does not involve a reply message, but unlock does.
> > 
> I'm not entirely sure I follow... at least from the filesystem point of
> view (and without your proposed change) both conversions and unlocks
> result in a reply. Is this a dlm internal reply perhaps?

Right, I was referring to the internal dlm reply over the network.

> > So skipping the unlocks is a first step that gives us a big benefit very
> > simply.  To benefit even further, we could later look into skipping the
> > "convert to NL" step also, and just abandoning the dlm locks in whatever
> > mode they're in; but that's probably not as simple a change.
> > 
> 
> Yes, that's true... the issue is that the glock state machine treats all
> glocks on an individual basis, and the demotion to NL also deals with
> any writing back and invalidating of the cache that's required at the
> same time. So that makes it tricky to separate from the requests to the
> dlm.
> 
> That said, I'd like to be able to move towards dealing with batches of
> glocks in the future, since that means we can provide a more favourable
> ordering of i/o requests. That is not an easy thing to do though.
> 
> In addition to the benefit for umount, I'm also wondering whether, if
> these unlocks are relatively slow, we should look at what happens during
> normal operation, where we do, from time to time, send unlock requests.
> Those are mostly (though indirectly) in response to memory pressure. Is
> there anything we can do there to speed things up, I wonder?

The main thing would be to not use a completion callback for dlm_unlock
(either make dlm not send one, or ignore it in gfs2).  This would let you
free the glock memory right away.

But removing unlock completions can create new problems, because you'd
need to handle new errors from dlm_lock() when it ran up against an
incomplete unlock.  Dealing with that complication may negate any benefit
from ignoring unlock completions.  Unless, of course, you knew you
wouldn't be making any more dlm_lock calls on that lock, e.g. during
unmount.
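
A minimal sketch of the sort of handling that would force, assuming dlm
returns -EBUSY for a request that collides with a still-pending unlock
(the surrounding names follow gfs2's gdlm_lock; the exact error value is
my assumption, not something stated in this thread):

	error = dlm_lock(ls->ls_dlm, req, &gl->gl_lksb, lkf, strname,
			 GDLM_STRNAME_BYTES - 1, 0, gdlm_ast, gl, gdlm_bast);
	if (error == -EBUSY) {
		/* assumed: the earlier, unacknowledged unlock is still
		   in flight, so this request must be requeued or retried */
	}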



Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread Steven Whitehouse
Hi,

On Thu, 2012-11-08 at 10:41 -0500, David Teigland wrote:
> On Thu, Nov 08, 2012 at 10:26:53AM +0000, Steven Whitehouse wrote:
> > Hi,
> > 
> > On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
> > > When unmounting, gfs2 does a full dlm_unlock operation on every
> > > cached lock.  This can create a very large amount of work and can
> > > take a long time to complete.  However, the vast majority of these
> > > dlm unlock operations are unnecessary because after all the unlocks
> > > are done, gfs2 leaves the dlm lockspace, which automatically clears
> > > the locks of the leaving node, without unlocking each one individually.
> > > So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
> > > remove the locks implicitly.  The one exception is when the lock's lvb is
> > > being used.  In this case, dlm_unlock is called because it may update the
> > > lvb of the resource.
> > > 
> > 
> > I'm wondering just how much we are likely to gain from this. We
> > currently use LVBs for both quota and (more recently) rgrp too. If we
> > were to start using the LVBs for inodes and/or iopen locks eventually
> > then that would seem to rather reduce the benefits of this.
> 
> Considering what you say below, after you've converted to NL, there's no
> more lvb to consider, so the lvb is not an issue in that case. The lvb is
> only written if you're unlocking from PW or EX, so there's bound to always
> be many unlocks that could be skipped.  I'll adjust the patch to skip
> unlock unless there's an lvb and the mode is PW or EX.
> 
Ok

> > The other question is what the cost of conversion to NL vs unlock of an
> > NL lock is. Even with the patch we are still iterating over each lock to
> > do a conversion to NL in any case where the lock is not already in NL.
> > So all we are saving is the final NL -> unlocked change.
> 
> yeah, I'd forgotten about that.
> 
> > One thought is whether it would not be better to do a direct "whatever"
> > -> unlocked change in the first place, rather than splitting the
> > operation into two parts.
> 
> Converting to NL would actually be less expensive than unlock because the
> NL convert does not involve a reply message, but unlock does.
> 
I'm not entirely sure I follow... at least from the filesystem point of
view (and without your proposed change) both conversions and unlocks
result in a reply. Is this a dlm internal reply perhaps?

> So skipping the unlocks is a first step that gives us a big benefit very
> simply.  To benefit even further, we could later look into skipping the
> "convert to NL" step also, and just abandoning the dlm locks in whatever
> mode they're in; but that's probably not as simple a change.
> 

Yes, that's true... the issue is that the glock state machine treats all
glocks on an individual basis, and the demotion to NL also deals with
any writing back and invalidating of the cache that's required at the
same time. So that makes it tricky to separate from the requests to the
dlm.

That said, I'd like to be able to move towards dealing with batches of
glocks in the future, since that means we can provide a more favourable
ordering of i/o requests. That is not an easy thing to do though.

In addition to the benefit for umount, I'm also wondering whether, if
these unlocks are relatively slow, we should look at what happens during
normal operation, where we do, from time to time, send unlock requests.
Those are mostly (though indirectly) in response to memory pressure. Is
there anything we can do there to speed things up, I wonder?

Steve.





Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread David Teigland
On Thu, Nov 08, 2012 at 10:26:53AM +0000, Steven Whitehouse wrote:
> Hi,
> 
> On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
> > When unmounting, gfs2 does a full dlm_unlock operation on every
> > cached lock.  This can create a very large amount of work and can
> > take a long time to complete.  However, the vast majority of these
> > dlm unlock operations are unnecessary because after all the unlocks
> > are done, gfs2 leaves the dlm lockspace, which automatically clears
> > the locks of the leaving node, without unlocking each one individually.
> > So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
> > remove the locks implicitly.  The one exception is when the lock's lvb is
> > being used.  In this case, dlm_unlock is called because it may update the
> > lvb of the resource.
> > 
> 
> I'm wondering just how much we are likely to gain from this. We
> currently use LVBs for both quota and (more recently) rgrp too. If we
> were to start using the LVBs for inodes and/or iopen locks eventually
> then that would seem to rather reduce the benefits of this.

Considering what you say below, after you've converted to NL, there's no
more lvb to consider, so the lvb is not an issue in that case. The lvb is
only written if you're unlocking from PW or EX, so there's bound to always
be many unlocks that could be skipped.  I'll adjust the patch to skip
unlock unless there's an lvb and the mode is PW or EX.
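
Roughly like this, as a sketch of the adjusted test (glock_uses_lvb() is
a hypothetical helper standing in for however lvb-bearing glocks end up
being identified; gl_state is the glock's current mode):

	if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) &&
	    (!glock_uses_lvb(gl) || gl->gl_state != LM_ST_EXCLUSIVE)) {
		/* no lvb to write back, or not held in PW/EX: skip unlock */
		gfs2_glock_free(gl);
		return;
	}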

> The other question is what the cost of conversion to NL vs unlock of an
> NL lock is. Even with the patch we are still iterating over each lock to
> do a conversion to NL in any case where the lock is not already in NL.
> So all we are saving is the final NL -> unlocked change.

yeah, I'd forgotten about that.

> One thought is whether it would not be better to do a direct "whatever"
> -> unlocked change in the first place, rather than splitting the
> operation into two parts.

Converting to NL would actually be less expensive than unlock because the
NL convert does not involve a reply message, but unlock does.

So skipping the unlocks is a first step that gives us a big benefit very
simply.  To benefit even further, we could later look into skipping the
"convert to NL" step also, and just abandoning the dlm locks in whatever
mode they're in; but that's probably not as simple a change.
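
For reference, the two operations being compared, sketched with the
kernel dlm API as gfs2 calls it (names follow gfs2's lock_dlm.c):

	/* convert to NL: a dlm_lock() call with DLM_LKF_CONVERT set;
	   no dlm-internal reply message over the network is needed */
	error = dlm_lock(ls->ls_dlm, DLM_LOCK_NL, &gl->gl_lksb,
			 DLM_LKF_CONVERT, strname, GDLM_STRNAME_BYTES - 1,
			 0, gdlm_ast, gl, gdlm_bast);

	/* full unlock: dlm_unlock(), which does involve an internal reply */
	error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK,
			   NULL, gl);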



Re: [Cluster-devel] gfs2: skip dlm_unlock calls in unmount

2012-11-08 Thread Steven Whitehouse
Hi,

On Wed, 2012-11-07 at 14:14 -0500, David Teigland wrote:
> When unmounting, gfs2 does a full dlm_unlock operation on every
> cached lock.  This can create a very large amount of work and can
> take a long time to complete.  However, the vast majority of these
> dlm unlock operations are unnecessary because after all the unlocks
> are done, gfs2 leaves the dlm lockspace, which automatically clears
> the locks of the leaving node, without unlocking each one individually.
> So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
> remove the locks implicitly.  The one exception is when the lock's lvb is
> being used.  In this case, dlm_unlock is called because it may update the
> lvb of the resource.
> 
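
For context, the implicit cleanup described above is a single call in
gfs2's unmount path (a sketch based on the kernel dlm API; the force
value of 2, i.e. release the lockspace even if locks remain, is what I
believe gdlm_unmount passes):

	/* leaving the lockspace frees every lock this node still holds,
	   which is what makes the per-lock dlm_unlock calls redundant */
	dlm_release_lockspace(ls->ls_dlm, 2);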

I'm wondering just how much we are likely to gain from this. We
currently use LVBs for both quota and (more recently) rgrp too. If we
were to start using the LVBs for inodes and/or iopen locks eventually
then that would seem to rather reduce the benefits of this.

The other question is what the cost of conversion to NL vs unlock of an
NL lock is. Even with the patch we are still iterating over each lock to
do a conversion to NL in any case where the lock is not already in NL.
So all we are saving is the final NL -> unlocked change.

One thought is whether it would not be better to do a direct "whatever"
-> unlocked change in the first place, rather than splitting the
operation into two parts.

> Signed-off-by: David Teigland 
> ---
>  fs/gfs2/glock.c    | 1 +
>  fs/gfs2/incore.h   | 1 +
>  fs/gfs2/lock_dlm.c | 6 ++++++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index e6c2fd5..f3a5edb 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1528,6 +1528,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
>  
>  void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
>  {
> + set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
>   glock_hash_walk(clear_glock, sdp);
>   flush_workqueue(glock_workqueue);
>   wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 3d469d3..67a39cf 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -539,6 +539,7 @@ enum {
>   SDF_DEMOTE  = 5,
>   SDF_NOJOURNALID = 6,
>   SDF_RORECOVERY  = 7, /* read only recovery */
> + SDF_SKIP_DLM_UNLOCK = 8,
>  };
>  
>  #define GFS2_FSNAME_LEN  256
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 0fb6539..efd0fb6 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -289,6 +289,12 @@ static void gdlm_put_lock(struct gfs2_glock *gl)
>   gfs2_glstats_inc(gl, GFS2_LKS_DCOUNT);
>   gfs2_sbstats_inc(gl, GFS2_LKS_DCOUNT);
>   gfs2_update_request_times(gl);
> +
> + if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) && !gl->gl_lvb[0]) {
There is no guarantee that the lvb is unused just because its first byte
is zero. So I'd suggest adding a flag to each struct gfs2_glops to say
whether the lvb is used by that type of glock or not, and then testing
that instead (see the sketch after the patch hunk below).

We do assume, though, that it is only ok to write the LVB when the lock
is held in the higher lock modes, so it will not have changed since the
last lock state change (to NL in this case). So maybe we don't even need
to care about this? Or at least not unless we merge the two state
changes as mentioned above.

> + gfs2_glock_free(gl);
> + return;
> + }
> +
>   error = dlm_unlock(ls->ls_dlm, gl->gl_lksb.sb_lkid, DLM_LKF_VALBLK,
>  NULL, gl);
>   if (error) {
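
Something like the following is what I have in mind, as a sketch only
(GLOF_LVB is a hypothetical flag name, following the existing
go_flags/GLOF_ASPACE pattern in incore.h):

	/* in struct gfs2_glock_operations */
	const unsigned long go_flags;
#define GLOF_ASPACE 1	/* existing */
#define GLOF_LVB    2	/* hypothetical: this glock type uses the lvb */

	/* then in gdlm_put_lock() */
	if (test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags) &&
	    !(gl->gl_ops->go_flags & GLOF_LVB)) {
		gfs2_glock_free(gl);
		return;
	}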

Steve.