Re: [Cluster-devel] [PATCH] GFS2: Hold gl_hash_table bucket locks in glock_hash_walk

2015-11-13 Thread Andrew Price

On 12/11/15 20:38, Steven Whitehouse wrote:

Hi,

On 12/11/15 20:11, Bob Peterson wrote:

- Original Message -

This lockdep splat was being triggered on umount:

[55715.973122] ===
[55715.980169] [ INFO: suspicious RCU usage. ]
[55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G    W
[55715.982353] ---
[55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!

The code it refers to is the rht_for_each_entry_safe usage in
glock_hash_walk. The condition that triggers the warning is
lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
__rcu_dereference_protected macro.

lockdep_rht_bucket_is_held returns false when rht_bucket_lock(tbl, hash)
is not held, which suggests that glock_hash_walk should be holding the
lock for each rhashtable bucket it looks at. Holding those locks clears
up the warning.

Signed-off-by: Andrew Price 
---
  fs/gfs2/glock.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 32e7471..7384cbb 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1424,11 +1424,13 @@ static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
 	rcu_read_lock();
 	tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
 	for (i = 0; i < tbl->size; i++) {
+		spin_lock(rht_bucket_lock(tbl, i));
 		rht_for_each_entry_safe(gl, pos, next, tbl, i, gl_node) {
 			if ((gl->gl_name.ln_sbd == sdp) &&
 			    lockref_get_not_dead(&gl->gl_lockref))
 				examiner(gl);
 		}
+		spin_unlock(rht_bucket_lock(tbl, i));
 	}
 	rcu_read_unlock();
 	cond_resched();
--

I'm not sure that this is the correct fix. There is nothing in the
examiner function that I can see which would require the bucket lock,
and we already cover this with rcu_read_lock(), which is correct since
we are only reading the hash lists, not changing them.


Good point. I wasn't sure whether it was safe to assume the examiner
functions don't change the buckets, so I erred on the side of caution.


rht_for_each_entry_safe() does an unconditional

rcu_dereference_protected(p, lockdep_rht_bucket_is_held(tbl, hash))

so we're likely using the wrong rht_for_each_entry* macro. I'm currently 
testing a version which uses rht_for_each_entry_rcu() instead. That one uses

rcu_dereference_check(p, lockdep_rht_bucket_is_held(tbl, hash))

which is defined as

#define rcu_dereference_check(p, c) \
	__rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)

so that looks like the right one to use. I'll send that patch shortly.
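
In other words, the two walkers differ only in the lockdep condition
they pass down. Roughly, paraphrasing include/linux/rhashtable.h from
this era (a sketch, not the verbatim source):

/* rht_for_each_entry_safe() dereferences each bucket entry via: */
#define rht_dereference_bucket(p, tbl, hash) \
	rcu_dereference_protected(p, lockdep_rht_bucket_is_held(tbl, hash))
/*
 * ...which warns unless the bucket lock is held, whereas
 * rht_for_each_entry_rcu() goes through rcu_dereference_check(),
 * whose condition is also satisfied by holding rcu_read_lock() alone.
 */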

Andy



Re: [Cluster-devel] [PATCH v2] GFS2: Use rht_for_each_entry_rcu in glock_hash_walk

2015-11-13 Thread Steven Whitehouse

Hi,

Looks good...

Acked-by: Steven Whitehouse 

Steve.

On 13/11/15 12:16, Andrew Price wrote:

This lockdep splat was being triggered on umount:

[55715.973122] ===
[55715.980169] [ INFO: suspicious RCU usage. ]
[55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G    W
[55715.982353] ---
[55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() 
usage!

The code it refers to is the rht_for_each_entry_safe usage in
glock_hash_walk. The condition that triggers the warning is
lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
__rcu_dereference_protected macro.

The rhashtable buckets are not changed in glock_hash_walk so it's safe
to rely on the rcu protection. Replace the rht_for_each_entry_safe()
usage with rht_for_each_entry_rcu(), which doesn't care whether the
bucket lock is held if the rcu read lock is held.

Signed-off-by: Andrew Price 
---
  fs/gfs2/glock.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 32e7471..430326e 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1417,14 +1417,14 @@ static struct shrinker glock_shrinker = {
 static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
 {
 	struct gfs2_glock *gl;
-	struct rhash_head *pos, *next;
+	struct rhash_head *pos;
 	const struct bucket_table *tbl;
 	int i;
 
 	rcu_read_lock();
 	tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
 	for (i = 0; i < tbl->size; i++) {
-		rht_for_each_entry_safe(gl, pos, next, tbl, i, gl_node) {
+		rht_for_each_entry_rcu(gl, pos, tbl, i, gl_node) {
 			if ((gl->gl_name.ln_sbd == sdp) &&
 			    lockref_get_not_dead(&gl->gl_lockref))
 				examiner(gl);




Re: [Cluster-devel] [PATCH 2/2] gfs2: Extended attribute readahead optimization

2015-11-13 Thread Andreas Gruenbacher
On Thu, Nov 12, 2015 at 9:33 PM, Steven Whitehouse  wrote:
> That looks good to me, and I assume that it should be faster too?

I ran some tests on a directory tree containing 3439 directories and
51556 files.

In that tree, 47313 or 86% of the 54995 files and directories ended up
with readahead enabled, while 7682 didn't.

On a slow SATA magnetic disc, I compared the time required for running
"find", then "find | xargs stat", then "find | xargs getfattr". I
booted with selinux=0 for these measurements, so "find" and "find |
xargs stat" didn't use the xattrs.

The results are attached as a spreadsheet. They show that xattr
readahead can save a significant amount of time (here, about 75%). The
optimized version of xattr readahead is about 3% faster than the simple
version in this test, so I'll leave it up to you whether that's worth
it.

It makes no difference whether the scheduler is "noop" or "deadline"
here; the "noop" scheduler still seems to merge adjacent I/O requests
(which makes sense).

Andreas


gfs2-readahead.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: [Cluster-devel] [PATCH v2] GFS2: Use rht_for_each_entry_rcu in glock_hash_walk

2015-11-13 Thread Bob Peterson
- Original Message -
> This lockdep splat was being triggered on umount:
> 
> [55715.973122] ===
> [55715.980169] [ INFO: suspicious RCU usage. ]
> [55715.981021] 4.3.0-11553-g8d3de01-dirty #15 Tainted: G    W
> [55715.982353] ---
> [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected()
> usage!
> 
> The code it refers to is the rht_for_each_entry_safe usage in
> glock_hash_walk. The condition that triggers the warning is
> lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
> __rcu_dereference_protected macro.
> 
> The rhashtable buckets are not changed in glock_hash_walk so it's safe
> to rely on the rcu protection. Replace the rht_for_each_entry_safe()
> usage with rht_for_each_entry_rcu(), which doesn't care whether the
> bucket lock is held if the rcu read lock is held.
> 
> Signed-off-by: Andrew Price 
> ---
>  fs/gfs2/glock.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 32e7471..430326e 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1417,14 +1417,14 @@ static struct shrinker glock_shrinker = {
>  static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
>  {
>   struct gfs2_glock *gl;
> - struct rhash_head *pos, *next;
> + struct rhash_head *pos;
>   const struct bucket_table *tbl;
>   int i;
>  
>   rcu_read_lock();
>   tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
>   for (i = 0; i < tbl->size; i++) {
> - rht_for_each_entry_safe(gl, pos, next, tbl, i, gl_node) {
> + rht_for_each_entry_rcu(gl, pos, tbl, i, gl_node) {
>   if ((gl->gl_name.ln_sbd == sdp) &&
>   lockref_get_not_dead(&gl->gl_lockref))
>   examiner(gl);
> --
> 2.4.3
> 
> 

ACK

I'll save it until after this merge window.

Bob Peterson
Red Hat File Systems



Re: [Cluster-devel] [PATCH] GFS2: Delete an unnecessary check before the function call "iput"

2015-11-13 Thread Bob Peterson
- Original Message -
> From: Markus Elfring 
> Date: Wed, 4 Nov 2015 21:23:43 +0100
> 
> The iput() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  fs/gfs2/ops_fstype.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index baab99b..1f9de17 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -910,8 +910,7 @@ fail_qc_i:
>  	iput(sdp->sd_qc_inode);
>  fail_ut_i:
>  	iput(sdp->sd_sc_inode);
>  fail:
> -	if (pn)
> -		iput(pn);
> +	iput(pn);
>  	return error;
>  }
>  
> --
> 2.6.2
> 
> 
Hi Markus,

ACK,

I'll save this patch and push it after the current merge window closes.
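
For the record, iput() has tolerated a NULL inode for a long time. The
guard at the top of fs/inode.c's iput() looks roughly like this (a
sketch, not the verbatim kernel source):

void iput(struct inode *inode)
{
	if (!inode)	/* NULL is a no-op, so callers need no check */
		return;
	/* ... drop a reference, possibly evicting the inode ... */
}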

Regards,

Bob Peterson
Red Hat File Systems



Re: [Cluster-devel] [PATCH 2/2] gfs2: Extended attribute readahead optimization

2015-11-13 Thread Bob Peterson
- Original Message -
> Here is an updated version of this patch, please review.
> 
> Thanks,
> Andreas
> 
> --
> 
> Instead of submitting a READ_SYNC bio for the inode and a READA bio for
> the inode's extended attributes through submit_bh, submit a single READ
> bio for both through submit_bio when possible.  This can be more
> efficient on some kinds of block devices.
> 
> Signed-off-by: Andreas Gruenbacher 
> ---
>  fs/gfs2/meta_io.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 63 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
> index 0f24828..e137d96 100644
> --- a/fs/gfs2/meta_io.c
> +++ b/fs/gfs2/meta_io.c
> @@ -187,19 +187,50 @@ struct buffer_head *gfs2_meta_new(struct gfs2_glock *gl, u64 blkno)
>   return bh;
>  }
>  
> -static void gfs2_meta_readahead(struct gfs2_glock *gl, u64 blkno)
> +static void gfs2_meta_read_endio(struct bio *bio)
>  {
> - struct buffer_head *bh;
> + struct bio_vec *bvec;
> + int i;
> +
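> + /* A bio segment can span several buffer heads: skip to the first
> +  * bh inside this bvec, then complete every bh the segment covers. */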
> + bio_for_each_segment_all(bvec, bio, i) {
> + struct page *page = bvec->bv_page;
> + struct buffer_head *bh = page_buffers(page);
> + unsigned int len = bvec->bv_len;
> +
> + while (bh_offset(bh) < bvec->bv_offset)
> + bh = bh->b_this_page;
> + do {
> + struct buffer_head *next = bh->b_this_page;
> + len -= bh->b_size;
> + bh->b_end_io(bh, !bio->bi_error);
> + bh = next;
> + } while (bh && len);
> + }
> + bio_put(bio);
> +}
>  
> - bh = gfs2_getbuf(gl, blkno, 1);
> - lock_buffer(bh);
> - if (buffer_uptodate(bh)) {
> - unlock_buffer(bh);
> - brelse(bh);
> +/*
> + * Submit several consecutive buffer head I/O requests as a single bio I/O
> + * request.  (See submit_bh_wbc.)
> + */
> +static void gfs2_submit_bhs(int rw, struct buffer_head *bhs[], int num)
> +{
> + struct buffer_head *bh = bhs[0];
> + struct bio *bio;
> + int i;
> +
> + if (!num)
>   return;
> +
> + bio = bio_alloc(GFP_NOIO, num);
> + bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
> + bio->bi_bdev = bh->b_bdev;
> + for (i = 0; i < num; i++) {
> + bh = bhs[i];
> + bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
>   }
> - bh->b_end_io = end_buffer_read_sync;
> - submit_bh(READA | REQ_META | REQ_PRIO, bh);
> + bio->bi_end_io = gfs2_meta_read_endio;
> + submit_bio(rw, bio);
>  }
>  
>  /**
> @@ -216,7 +247,8 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
>  		   int rahead, struct buffer_head **bhp)
>  {
>   struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
> - struct buffer_head *bh;
> + struct buffer_head *bh, *bhs[2];
> + int num = 0;
>  
>   if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
>   *bhp = NULL;
> @@ -228,18 +260,31 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
>   lock_buffer(bh);
>   if (buffer_uptodate(bh)) {
>   unlock_buffer(bh);
> - if (rahead)
> - gfs2_meta_readahead(gl, blkno + 1);
> - return 0;
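> + /* The primary block is already uptodate: clear DIO_WAIT so the
> +  * wait below is skipped, but still fall through for readahead. */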
> + flags &= ~DIO_WAIT;
> + } else {
> + bh->b_end_io = end_buffer_read_sync;
> + get_bh(bh);
> + bhs[num++] = bh;
>   }
> - bh->b_end_io = end_buffer_read_sync;
> - get_bh(bh);
> - submit_bh(READ_SYNC | REQ_META | REQ_PRIO, bh);
> - if (rahead)
> - gfs2_meta_readahead(gl, blkno + 1);
> +
> + if (rahead) {
> + bh = gfs2_getbuf(gl, blkno + 1, CREATE);
> +
> + lock_buffer(bh);
> + if (buffer_uptodate(bh)) {
> + unlock_buffer(bh);
> + brelse(bh);
> + } else {
> + bh->b_end_io = end_buffer_read_sync;
> + bhs[num++] = bh;
> + }
> + }
> +
> + gfs2_submit_bhs(READ_SYNC | REQ_META | REQ_PRIO, bhs, num);
>   if (!(flags & DIO_WAIT))
>   return 0;
>  
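> + /* Wait on the primary buffer only; a readahead bh completes
> +  * asynchronously via end_buffer_read_sync(). */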
> + bh = *bhp;
>   wait_on_buffer(bh);
>   if (unlikely(!buffer_uptodate(bh))) {
>   struct gfs2_trans *tr = current->journal_info;
> --
> 2.5.0
> 
> 
Hi,

ACK to both patches

Looks good. I'll hold onto them until this merge window closes.

Regards,

Bob Peterson
Red Hat File Systems