Re: [Cluster-devel] [PATCH] gfs2: time journal recovery steps accurately

2018-03-29 Thread Bob Peterson
- Original Message -
> This patch spits out the time taken by the various steps in the
> journal recover process. Previously, the journal recovery time
> didn't account for finding the journal head in the log which takes
> up a significant portion of time.
> 
> Signed-off-by: Abhi Das 
> ---
Hi,

Thanks. This is now pushed to the for-next branch of the linux-gfs2 tree:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next=5e86d9d122d0d6fae00d9dff41c22d6f4d09f566

Regards,

Bob Peterson
Red Hat File Systems



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Herbert Xu
On Thu, Mar 29, 2018 at 06:52:34PM +0200, Andreas Gruenbacher wrote:
>
> Should rhashtable_walk_peek be kept around even if there are no more
> users? I have my doubts.

Absolutely.  All netlink dumps using rhashtable_walk_next are buggy
and need to switch over to rhashtable_walk_peek.  As otherwise
the object that triggers the out-of-space condition will be skipped
upon resumption.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



[Cluster-devel] [PATCH] gfs2: time journal recovery steps accurately

2018-03-29 Thread Abhi Das
This patch spits out the time taken by the various steps in the
journal recover process. Previously, the journal recovery time
didn't account for finding the journal head in the log which takes
up a significant portion of time.

Signed-off-by: Abhi Das 
---
 fs/gfs2/recovery.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index b6b2589..d8b622c 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "gfs2.h"
 #include "incore.h"
@@ -409,12 +410,13 @@ void gfs2_recover_func(struct work_struct *work)
struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
struct gfs2_log_header_host head;
struct gfs2_holder j_gh, ji_gh, thaw_gh;
-   unsigned long t;
+   ktime_t t_start, t_jlck, t_jhd, t_tlck, t_rep;
int ro = 0;
unsigned int pass;
int error;
int jlocked = 0;
 
+   t_start = ktime_get();
if (sdp->sd_args.ar_spectator ||
(jd->jd_jid != sdp->sd_lockstruct.ls_jid)) {
fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n",
@@ -446,6 +448,7 @@ void gfs2_recover_func(struct work_struct *work)
fs_info(sdp, "jid=%u, already locked for use\n", jd->jd_jid);
}
 
+   t_jlck = ktime_get();
fs_info(sdp, "jid=%u: Looking at journal...\n", jd->jd_jid);
 
error = gfs2_jdesc_check(jd);
@@ -455,13 +458,12 @@ void gfs2_recover_func(struct work_struct *work)
error = gfs2_find_jhead(jd, );
if (error)
goto fail_gunlock_ji;
+   t_jhd = ktime_get();
 
if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
fs_info(sdp, "jid=%u: Acquiring the transaction lock...\n",
jd->jd_jid);
 
-   t = jiffies;
-
/* Acquire a shared hold on the freeze lock */
 
error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED,
@@ -495,6 +497,7 @@ void gfs2_recover_func(struct work_struct *work)
goto fail_gunlock_thaw;
}
 
+   t_tlck = ktime_get();
fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);
 
for (pass = 0; pass < 2; pass++) {
@@ -509,9 +512,14 @@ void gfs2_recover_func(struct work_struct *work)
clean_journal(jd, );
 
gfs2_glock_dq_uninit(_gh);
-   t = DIV_ROUND_UP(jiffies - t, HZ);
-   fs_info(sdp, "jid=%u: Journal replayed in %lus\n",
-   jd->jd_jid, t);
+   t_rep = ktime_get();
+   fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "
+   "jhead:%lldms, tlck:%lldms, replay:%lldms]\n",
+   jd->jd_jid, ktime_ms_delta(t_rep, t_start),
+   ktime_ms_delta(t_jlck, t_start),
+   ktime_ms_delta(t_jhd, t_jlck),
+   ktime_ms_delta(t_tlck, t_jhd),
+   ktime_ms_delta(t_rep, t_tlck));
}
 
gfs2_recovery_done(sdp, jd->jd_jid, LM_RD_SUCCESS);
-- 
2.4.11



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher
On 29 March 2018 at 17:41, Herbert Xu  wrote:
> On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>>
>> For all I know, Neil's latest plan is to get rhashtable_walk_peek
>> replaced and removed because it is unfixable. This patch removes the
>> one and only user.
>
> His latest patch makes rhashtable_walk_peek stable in the face of
> removals.
>
> https://patchwork.ozlabs.org/patch/892534/

Ok, I can slightly update my patch description. The problem still
remains that glocks can be deleted from the rhashtable between
stop/start, and that needs to be fixed in gfs2. Once that's done,
keeping track of the current glock comes for free and we won't need
rhashtable_walk_peek anymore.

Should rhashtable_walk_peek be kept around even if there are no more
users? I have my doubts.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Herbert Xu
On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>
> For all I know, Neil's latest plan is to get rhashtable_walk_peek
> replaced and removed because it is unfixable. This patch removes the
> one and only user.

His latest patch makes rhashtable_walk_peek stable in the face of
removals.

https://patchwork.ozlabs.org/patch/892534/

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Re: [Cluster-devel] [PATCH] gfs2: Zero out fallocated blocks in fallocate_chunk

2018-03-29 Thread Bob Peterson
- Original Message -
| Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
| out in fallocate_chunk, much higher up the call stack.  This gets rid of
| gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
| buffer flag.  I can't think of a reason why zeroing out the blocks in
| gfs2_iomap_alloc would have any benefits: there is no additional locking
| at that level that would add protection to the newly allocated blocks.
| 
| While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.
| 
| Signed-off-by: Andreas Gruenbacher 
| ---
Hi,

Thanks. This is now pushed to the for-next branch of the linux-gfs2 tree:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next=fffb64127adc3eea6a19ceefdc88d171f68b9d34

Regards,

Bob Peterson
Red Hat File Systems



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher
On 29 March 2018 at 14:35, Herbert Xu  wrote:
> On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
>> Here's a second version of the patch (now a patch set) to eliminate
>> rhashtable_walk_peek in gfs2.
>>
>> The first patch introduces lockref_put_not_zero, the inverse of
>> lockref_get_not_zero.
>>
>> The second patch eliminates rhashtable_walk_peek in gfs2.  In
>> gfs2_glock_iter_next, the new lockref function from patch one is used to
>> drop a lockref count as long as the count doesn't drop to zero.  This is
>> almost always the case; if there is a risk of dropping the last
>> reference, we must defer that to a work queue because dropping the last
>> reference may sleep.
>
> In light of Neil's latest patch, do we still need this?

For all I know, Neil's latest plan is to get rhashtable_walk_peek
replaced and removed because it is unfixable. This patch removes the
one and only user.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Herbert Xu
On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
> Here's a second version of the patch (now a patch set) to eliminate
> rhashtable_walk_peek in gfs2.
> 
> The first patch introduces lockref_put_not_zero, the inverse of
> lockref_get_not_zero.
> 
> The second patch eliminates rhashtable_walk_peek in gfs2.  In
> gfs2_glock_iter_next, the new lockref function from patch one is used to
> drop a lockref count as long as the count doesn't drop to zero.  This is
> almost always the case; if there is a risk of dropping the last
> reference, we must defer that to a work queue because dropping the last
> reference may sleep.

In light of Neil's latest patch, do we still need this?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Steven Whitehouse

Hi,

Can we solve the problem another way, by not taking refs on the glocks 
when we are iterating over them for the debugfs files? I assume that is 
the main issue here.


We didn't used to take refs since the rcu locking was enough during the 
walk itself. We used to only keep track of the hash bucket and offset 
within the bucket when we dropped the rcu lock between calls to the 
iterator. I may have lost track of why that approach did not work?


Steve.


On 29/03/18 13:06, Andreas Gruenbacher wrote:

Here's a second version of the patch (now a patch set) to eliminate
rhashtable_walk_peek in gfs2.

The first patch introduces lockref_put_not_zero, the inverse of
lockref_get_not_zero.

The second patch eliminates rhashtable_walk_peek in gfs2.  In
gfs2_glock_iter_next, the new lockref function from patch one is used to
drop a lockref count as long as the count doesn't drop to zero.  This is
almost always the case; if there is a risk of dropping the last
reference, we must defer that to a work queue because dropping the last
reference may sleep.

Thanks,
Andreas

Andreas Gruenbacher (2):
   lockref: Add lockref_put_not_zero
   gfs2: Stop using rhashtable_walk_peek

  fs/gfs2/glock.c | 47 ---
  include/linux/lockref.h |  1 +
  lib/lockref.c   | 28 
  3 files changed, 57 insertions(+), 19 deletions(-)





[Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher
Here's a second version of the patch (now a patch set) to eliminate
rhashtable_walk_peek in gfs2.

The first patch introduces lockref_put_not_zero, the inverse of
lockref_get_not_zero.

The second patch eliminates rhashtable_walk_peek in gfs2.  In
gfs2_glock_iter_next, the new lockref function from patch one is used to
drop a lockref count as long as the count doesn't drop to zero.  This is
almost always the case; if there is a risk of dropping the last
reference, we must defer that to a work queue because dropping the last
reference may sleep.

Thanks,
Andreas

Andreas Gruenbacher (2):
  lockref: Add lockref_put_not_zero
  gfs2: Stop using rhashtable_walk_peek

 fs/gfs2/glock.c | 47 ---
 include/linux/lockref.h |  1 +
 lib/lockref.c   | 28 
 3 files changed, 57 insertions(+), 19 deletions(-)

-- 
2.14.3



[Cluster-devel] [PATCH v2 1/2] lockref: Add lockref_put_not_zero

2018-03-29 Thread Andreas Gruenbacher
Put a lockref unless the lockref is dead or its count would become zero.
This is the same as lockref_put_or_lock except that the lock is never
left held.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/lockref.h |  1 +
 lib/lockref.c   | 28 
 2 files changed, 29 insertions(+)

diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 2eac32095113..99f17cc8e163 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -37,6 +37,7 @@ struct lockref {
 extern void lockref_get(struct lockref *);
 extern int lockref_put_return(struct lockref *);
 extern int lockref_get_not_zero(struct lockref *);
+extern int lockref_put_not_zero(struct lockref *);
 extern int lockref_get_or_lock(struct lockref *);
 extern int lockref_put_or_lock(struct lockref *);
 
diff --git a/lib/lockref.c b/lib/lockref.c
index 47169ed7e964..3d468b53d4c9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -80,6 +80,34 @@ int lockref_get_not_zero(struct lockref *lockref)
 }
 EXPORT_SYMBOL(lockref_get_not_zero);
 
+/**
+ * lockref_put_not_zero - Decrements count unless count <= 1 before decrement
+ * @lockref: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count would become zero
+ */
+int lockref_put_not_zero(struct lockref *lockref)
+{
+   int retval;
+
+   CMPXCHG_LOOP(
+   new.count--;
+   if (old.count <= 1)
+   return 0;
+   ,
+   return 1;
+   );
+
+   spin_lock(>lock);
+   retval = 0;
+   if (lockref->count > 1) {
+   lockref->count--;
+   retval = 1;
+   }
+   spin_unlock(>lock);
+   return retval;
+}
+EXPORT_SYMBOL(lockref_put_not_zero);
+
 /**
  * lockref_get_or_lock - Increments count unless the count is 0 or dead
  * @lockref: pointer to lockref structure
-- 
2.14.3



[Cluster-devel] [PATCH v2 2/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher
Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump.  Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/glock.c | 47 ---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 82fb5583445c..097bd3c0f270 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1923,28 +1923,37 @@ void gfs2_glock_exit(void)
 
 static void gfs2_glock_iter_next(struct gfs2_glock_iter *gi, loff_t n)
 {
-   if (n == 0)
-   gi->gl = rhashtable_walk_peek(>hti);
-   else {
-   gi->gl = rhashtable_walk_next(>hti);
-   n--;
+   struct gfs2_glock *gl = gi->gl;
+
+   if (gl) {
+   if (n == 0)
+   return;
+   if (!lockref_put_not_zero(>gl_lockref))
+   gfs2_glock_queue_put(gl);
}
for (;;) {
-   if (IS_ERR_OR_NULL(gi->gl)) {
-   if (!gi->gl)
-   return;
-   if (PTR_ERR(gi->gl) != -EAGAIN) {
-   gi->gl = NULL;
-   return;
+   gl = rhashtable_walk_next(>hti);
+   if (IS_ERR_OR_NULL(gl)) {
+   if (gl == ERR_PTR(-EAGAIN)) {
+   n = 1;
+   continue;
}
-   n = 0;
-   } else if (gi->sdp == gi->gl->gl_name.ln_sbd &&
-  !__lockref_is_dead(>gl->gl_lockref)) {
-   if (!n--)
-   break;
+   gl = NULL;
+   break;
+   }
+   if (gl->gl_name.ln_sbd != gi->sdp)
+   continue;
+   if (n <= 1) {
+   if (!lockref_get_not_dead(>gl_lockref))
+   continue;
+   break;
+   } else {
+   if (__lockref_is_dead(>gl_lockref))
+   continue;
+   n--;
}
-   gi->gl = rhashtable_walk_next(>hti);
}
+   gi->gl = gl;
 }
 
 static void *gfs2_glock_seq_start(struct seq_file *seq, loff_t *pos)
@@ -1988,7 +1997,6 @@ static void gfs2_glock_seq_stop(struct seq_file *seq, 
void *iter_ptr)
 {
struct gfs2_glock_iter *gi = seq->private;
 
-   gi->gl = NULL;
rhashtable_walk_stop(>hti);
 }
 
@@ -2076,7 +2084,8 @@ static int gfs2_glocks_release(struct inode *inode, 
struct file *file)
struct seq_file *seq = file->private_data;
struct gfs2_glock_iter *gi = seq->private;
 
-   gi->gl = NULL;
+   if (gi->gl)
+   gfs2_glock_put(gi->gl);
rhashtable_walk_exit(>hti);
return seq_release_private(inode, file);
 }
-- 
2.14.3



Re: [Cluster-devel] [PATCH] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread NeilBrown
On Wed, Mar 28 2018, Andreas Gruenbacher wrote:

> Function rhashtable_walk_peek is problematic because there is no
> guarantee that the glock previously returned still exists; when that key
> is deleted, rhashtable_walk_peek can end up returning a different key,
> which would cause an inconsistent glock dump.  So instead of using
> rhashtable_walk_peek, keep track of the current glock in the seq file
> iterator functions.
>
> Signed-off-by: Andreas Gruenbacher 
> ---
>  fs/gfs2/glock.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 82fb5583445c..f1fc353875d3 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -55,6 +55,7 @@ struct gfs2_glock_iter {
>   struct gfs2_sbd *sdp;   /* incore superblock   */
>   struct rhashtable_iter hti; /* rhashtable iterator */
>   struct gfs2_glock *gl;  /* current glock struct*/
> + bool gl_held;
>   loff_t last_pos;/* last position   */
>  };
>  
> @@ -1923,9 +1924,11 @@ void gfs2_glock_exit(void)
>  
>  static void gfs2_glock_iter_next(struct gfs2_glock_iter *gi, loff_t n)
>  {
> - if (n == 0)
> - gi->gl = rhashtable_walk_peek(>hti);
> - else {
> + if (n != 0 || !gi->gl) {
> + if (gi->gl_held) {
> + gfs2_glock_queue_put(gi->gl);
> + gi->gl_held = false;
> + }
>   gi->gl = rhashtable_walk_next(>hti);
>   n--;
>   }

Thank for this patch!
The above looks a bit fragile to me.
gfs2_glock_iter_next() (And hence gfs2_glock_seq_start()) will sometimes
exit with gl_held true, and sometimes with it false.
gfs2_glock_seq_stop() assumes that it is false.
Normally gfs2_glock_seq_next() will normally be called between these
two and will clear gl_held, but I don't think there is a hard guarantee
of that.
Maybe we should always 'put' gi->gl in iter_next if gl_held??

Thanks,
NeilBrown



> @@ -1988,7 +1991,10 @@ static void gfs2_glock_seq_stop(struct seq_file *seq, 
> void *iter_ptr)
>  {
>   struct gfs2_glock_iter *gi = seq->private;
>  
> - gi->gl = NULL;
> + if (gi->gl) {
> + lockref_get(>gl->gl_lockref);
> + gi->gl_held = true;
> + }
>   rhashtable_walk_stop(>hti);
>  }
>  
> @@ -2061,6 +2067,7 @@ static int __gfs2_glocks_open(struct inode *inode, 
> struct file *file,
>*/
>   gi->last_pos = -1;
>   gi->gl = NULL;
> + gi->gl_held = false;
>   rhashtable_walk_enter(_hash_table, >hti);
>   }
>   return ret;
> @@ -2076,7 +2083,8 @@ static int gfs2_glocks_release(struct inode *inode, 
> struct file *file)
>   struct seq_file *seq = file->private_data;
>   struct gfs2_glock_iter *gi = seq->private;
>  
> - gi->gl = NULL;
> + if (gi->gl_held)
> + gfs2_glock_put(gi->gl);
>   rhashtable_walk_exit(>hti);
>   return seq_release_private(inode, file);
>  }
> -- 
> 2.14.3


signature.asc
Description: PGP signature


Re: [Cluster-devel] [PATCH] gfs2: Zero out fallocated blocks in fallocate_chunk

2018-03-29 Thread Christoph Hellwig
Looks good to me from the iomap POV:

Acked-by: Christoph Hellwig