Re: [Cluster-devel] [PATCH 09/11] gfs2: Read bitmap buffers on demand

2019-01-14 Thread Andreas Gruenbacher
On Mon, 14 Jan 2019 at 19:41, Andreas Gruenbacher  wrote:
>
> Bob,
>
> On Mon, 14 Jan 2019 at 18:12, Bob Peterson  wrote:
> > - Original Message -
> > > Before this patch, when locking a resource group, gfs2 would read in the
> > > resource group header and all the bitmap buffers of the resource group.
> > > Those buffers would then be locked into memory until the resource group
> > > is unlocked, which will happen when the filesystem is unmounted or when
> > > transferring the resource group lock to another node, but not due to
> > > memory pressure.  Larger resource groups lock more buffers into memory,
> > > and cause more unnecessary I/O when resource group locks are transferred
> > > between nodes.
> > >
> > > With this patch, when locking a resource group, only the resource group
> > > header is read in.  The other bitmap buffers (the resource group header
> > > contains part of the bitmap) are only read in on demand.
> > >
> > > It would probably make sense to also only read in the resource group
> > > header on demand, when the resource group is modified, but since the
> > > header contains the number of free blocks in the resource group, there
> > > is a higher incentive to keep the header locked in memory after that.
> > >
> > > Signed-off-by: Andreas Gruenbacher 
> > > ---
> >
> > Hi,
> >
> > This patch looks good. However, I'm concerned about its performance,
> > at least until we get some bitmap read-ahead in place.
> > (see "/* FIXME: Might want to do read-ahead here. */")
> >
> > In particular, I'm concerned that it does the exact opposite of this
> > performance enhancement:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?id=39b0f1e9290880a6c905f639e7db6b646e302a4f
> >
> > So we need to test that fragmented 2GB rgrps (33 bitmap blocks) don't
> > suffer a disproportionate slowdown compared to, say, 128MB rgrps,
> > because we could end up spending so much time fetching pages out of the
> > page cache only to find them inadequate and call brelse() again. By
> > fragmented, I mean that all 33 bitmap blocks have a small number of
> > free blocks, which disqualifies them from "full" status but still
> > leaves enough free blocks that we might consider searching them.
> > Some of the patches we did after that one may have mitigated the problem
> > to some extent, but we need to be sure we're not regressing performance.
>
> Yes, there are a few ways this patch can still be tuned depending on
> the performance impact. This should speed up "normal" workloads by
> requiring fewer reads from disk when acquiring a resource group glock
> and by pinning less memory for filesystem metadata, but we do end up
> with more page cache lookups during allocations and, depending on
> read-ahead, possibly more small reads at inconvenient times.
>
> rd_last_alloc should help limit the performance impact when bitmaps
> become mostly full. (Forms of rd_last_alloc have been in gfs2 since
> the mainline merge.)

Hmm, that's actually rd_extfail_pt, not rd_last_alloc, and it predates
the optimization in commit 39b0f1e929 by less than two years.
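
For reference, the mechanism is roughly this (a user-space illustration
of the idea, not the rgrp.c code itself; only the field name
rd_extfail_pt is real, the struct and function names are invented for
the sketch):

#include <stdbool.h>
#include <stdint.h>

/* Minimal model of an "extent failure point" for one rgrp. */
struct rgrp_model {
	uint32_t rd_extfail_pt;	/* largest extent that may still succeed */
};

/* After failing to find 'extlen' free blocks, lower the failure point. */
static void note_search_failure(struct rgrp_model *rgd, uint32_t extlen)
{
	if (extlen - 1 < rgd->rd_extfail_pt)
		rgd->rd_extfail_pt = extlen - 1;
}

/* Skip bitmap scans that a previous failure already rules out. */
static bool worth_searching(const struct rgrp_model *rgd, uint32_t min_target)
{
	return min_target <= rgd->rd_extfail_pt;
}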

> One way to further reduce the number of page cache lookups would be to
> cache the buffer head of the last allocation, but it's unclear if this
> is even an issue, so let's do some performance testing first.

Andreas



Re: [Cluster-devel] [PATCH 09/11] gfs2: Read bitmap buffers on demand

2019-01-14 Thread Andreas Gruenbacher
Bob,

On Mon, 14 Jan 2019 at 18:12, Bob Peterson  wrote:
> - Original Message -
> > Before this patch, when locking a resource group, gfs2 would read in the
> > resource group header and all the bitmap buffers of the resource group.
> > Those buffers would then be locked into memory until the resource group
> > is unlocked, which will happen when the filesystem is unmounted or when
> > transferring the resource group lock to another node, but not due to
> > memory pressure.  Larger resource groups lock more buffers into memory,
> > and cause more unnecessary I/O when resource group locks are transferred
> > between nodes.
> >
> > With this patch, when locking a resource group, only the resource group
> > header is read in.  The other bitmap buffers (the resource group header
> > contains part of the bitmap) are only read in on demand.
> >
> > It would probably make sense to also only read in the resource group
> > header on demand, when the resource group is modified, but since the
> > header contains the number of free blocks in the resource group, there
> > is a higher incentive to keep the header locked in memory after that.
> >
> > Signed-off-by: Andreas Gruenbacher 
> > ---
>
> Hi,
>
> This patch looks good. However, I'm concerned about its performance,
> at least until we get some bitmap read-ahead in place.
> (see "/* FIXME: Might want to do read-ahead here. */")
>
> In particular, I'm concerned that it does the exact opposite of this
> performance enhancement:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?id=39b0f1e9290880a6c905f639e7db6b646e302a4f
>
> So we need to test that fragmented 2GB rgrps (33 bitmap blocks) don't
> suffer a disproportionate slowdown compared to, say, 128MB rgrps,
> because we could end up spending so much time fetching pages out of the
> page cache only to find them inadequate and call brelse() again. By
> fragmented, I mean that all 33 bitmap blocks have a small number of
> free blocks, which disqualifies them from "full" status but still
> leaves enough free blocks that we might consider searching them.
> Some of the patches we did after that one may have mitigated the problem
> to some extent, but we need to be sure we're not regressing performance.

Yes, there are a few ways this patch can still be tuned depending on
the performance impact. This should speed up "normal" workloads by
requiring fewer reads from disk when acquiring a resource group glock
and by pinning less memory for filesystem metadata, but we do end up
with more page cache lookups during allocations and, depending on
read-ahead, possibly more small reads at inconvenient times.
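
For the read-ahead part, one low-effort option for the FIXME in
__gfs2_rbm_get might be the rahead argument that gfs2_meta_read()
already takes (the patch currently passes 0). A sketch, assuming
rahead is interpreted as a count of blocks past blkno and that
rd_length covers the header plus all bitmap blocks:

	/* Read ahead the rgrp's remaining bitmap blocks (sketch only). */
	u32 rahead = rgd->rd_length - rbm->bii - 1;

	error = gfs2_meta_read(rgd->rd_gl, rgd->rd_addr + rbm->bii,
			       DIO_WAIT, rahead, &bh);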

rd_last_alloc should help limit the performance impact when bitmaps
become mostly full. (Forms of rd_last_alloc have been in gfs2 since
the mainline merge.)

One way to further reduce the number of page cache lookups would be to
cache the buffer head of the last allocation, but it's unclear if this
is even an issue, so let's do some performance testing first.
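
Concretely, that caching could look something like this; rd_cached_bh
and rd_cached_bii are invented fields (nothing by these names exists in
the patch), and the buffer reference counting a real version would need
is glossed over:

/* Hypothetical sketch: reuse the bitmap buffer of the last allocation. */
static int gfs2_rbm_get_cached(struct gfs2_rbm *rbm)
{
	struct gfs2_rgrpd *rgd = rbm->rgd;
	int error;

	if (rgd->rd_cached_bh && rgd->rd_cached_bii == rbm->bii) {
		rbm->bh = rgd->rd_cached_bh;	/* hit: no page cache lookup */
		return 0;
	}
	error = __gfs2_rbm_get(rbm);		/* miss: on-demand read */
	if (!error) {
		rgd->rd_cached_bh = rbm->bh;	/* remember for next time */
		rgd->rd_cached_bii = rbm->bii;
	}
	return error;
}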

Thanks,
Andreas



Re: [Cluster-devel] [PATCH 09/11] gfs2: Read bitmap buffers on demand

2019-01-14 Thread Bob Peterson
- Original Message -
> Before this patch, when locking a resource group, gfs2 would read in the
> resource group header and all the bitmap buffers of the resource group.
> Those buffers would then be locked into memory until the resource group
> is unlocked, which will happen when the filesystem is unmounted or when
> transferring the resource group lock to another node, but not due to
> memory pressure.  Larger resource groups lock more buffers into memory,
> and cause more unnecessary I/O when resource group locks are transferred
> between nodes.
> 
> With this patch, when locking a resource group, only the resource group
> header is read in.  The other bitmap buffers (the resource group header
> contains part of the bitmap) are only read in on demand.
> 
> It would probably make sense to also only read in the resource group
> header on demand, when the resource group is modified, but since the
> header contains the number of free blocks in the resource group, there
> is a higher incentive to keep the header locked in memory after that.
> 
> Signed-off-by: Andreas Gruenbacher 
> ---

Hi,

This patch looks good. However, I'm concerned about its performance,
at least until we get some bitmap read-ahead in place.
(see "/* FIXME: Might want to do read-ahead here. */")

In particular, I'm concerned that it does the exact opposite of this
performance enhancement:

https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?id=39b0f1e9290880a6c905f639e7db6b646e302a4f

So we need to test that fragmented 2GB rgrps (33 bitmap blocks) don't
suffer a disproportionate slowdown compared to, say, 128MB rgrps,
because we could end up spending so much time fetching pages out of the
page cache only to find them inadequate and call brelse() again. By
fragmented, I mean that all 33 bitmap blocks have a small number of
free blocks, which disqualifies them from "full" status but still
leaves enough free blocks that we might consider searching them.
Some of the patches we did after that one may have mitigated the problem
to some extent, but we need to be sure we're not regressing performance.
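
As a starting point for such a test, something like the program below
could produce that fragmentation pattern: fill one rgrp-sized file,
then punch a hole every so often so each bitmap block keeps a few free
blocks. The mount point, file size, and block size are placeholders to
adjust for the actual test setup:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/falloc.h>

int main(void)
{
	const off_t blksz = 4096;		/* filesystem block size */
	const off_t len = 2LL << 30;		/* roughly one 2GB rgrp */
	int fd = open("/mnt/gfs2/frag", O_CREAT | O_RDWR, 0644);

	if (fd < 0 || fallocate(fd, 0, 0, len) < 0) {
		perror("setup");
		return 1;
	}
	/* Free one block out of every 64, spread across all bitmaps. */
	for (off_t off = 0; off < len; off += 64 * blksz)
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, blksz) < 0) {
			perror("punch");
			return 1;
		}
	close(fd);
	return 0;
}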

Regards,

Bob Peterson



[Cluster-devel] [PATCH 09/11] gfs2: Read bitmap buffers on demand

2019-01-11 Thread Andreas Gruenbacher
Before this patch, when locking a resource group, gfs2 would read in the
resource group header and all the bitmap buffers of the resource group.
Those buffers would then be locked into memory until the resource group
is unlocked, which will happen when the filesystem is unmounted or when
transferring the resource group lock to another node, but not due to
memory pressure.  Larger resource groups lock more buffers into memory,
and cause more unnecessary I/O when resource group locks are transferred
between nodes.

With this patch, when locking a resource group, only the resource group
header is read in.  The other bitmap buffers (the resource group header
contains part of the bitmap) are only read in on demand.

It would probably make sense to also only read in the resource group
header on demand, when the resource group is modified, but since the
header contains the number of free blocks in the resource group, there
is a higher incentive to keep the header locked in memory after that.
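
The resulting calling convention, in sketch form (this is a summary of
the pattern, not a hunk from the patch; gfs2_rbm_put is an assumed
counterpart to gfs2_rbm_get, and only the get side appears in the hunks
below):

	struct gfs2_rbm rbm;
	int error = gfs2_rbm_get(&rbm, rgd, block); /* bitmap read on demand */

	if (!error) {
		u8 state = gfs2_testbit(&rbm, true); /* uses rbm.bh, not bi_bh */
		gfs2_rbm_put(&rbm);	/* assumed: releases the non-header bh */
	}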

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/incore.h |   1 -
 fs/gfs2/rgrp.c   | 382 ++-
 2 files changed, 241 insertions(+), 142 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 51eb41484e9af..d39c26b950121 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -87,7 +87,6 @@ struct gfs2_log_operations {
  * be reallocated in that same transaction.
  */
 struct gfs2_bitmap {
-   struct buffer_head *bi_bh;
char *bi_clone;
unsigned long bi_flags;
u32 bi_offset;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 339d6b064f1fc..503ea6f18ed74 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -62,6 +62,7 @@
 
 struct gfs2_rbm {
struct gfs2_rgrpd *rgd;
+   struct buffer_head *bh;
u32 offset; /* The offset is bitmap relative */
int bii;/* Bitmap index */
 };
@@ -112,8 +113,8 @@ static inline void gfs2_setbit(const struct gfs2_rbm *rbm, bool do_clone,
unsigned int buflen = bi->bi_bytes;
const unsigned int bit = (rbm->offset % GFS2_NBBY) * GFS2_BIT_SIZE;
 
-   byte1 = bi->bi_bh->b_data + bi->bi_offset + (rbm->offset / GFS2_NBBY);
-   end = bi->bi_bh->b_data + bi->bi_offset + buflen;
+   byte1 = rbm->bh->b_data + bi->bi_offset + (rbm->offset / GFS2_NBBY);
+   end = rbm->bh->b_data + bi->bi_offset + buflen;
 
BUG_ON(byte1 >= end);
 
@@ -126,7 +127,7 @@ static inline void gfs2_setbit(const struct gfs2_rbm *rbm, bool do_clone,
rbm->offset, cur_state, new_state);
fs_warn(sdp, "rgrp=0x%llx bi_start=0x%x biblk: 0x%llx\n",
(unsigned long long)rbm->rgd->rd_addr, bi->bi_start,
-   (unsigned long long)bi->bi_bh->b_blocknr);
+   (unsigned long long)rbm->bh->b_blocknr);
fs_warn(sdp, "bi_offset=0x%x bi_bytes=0x%x block=0x%llx\n",
bi->bi_offset, bi->bi_bytes,
(unsigned long long)gfs2_rbm_to_block(rbm));
@@ -164,7 +165,7 @@ static inline u8 gfs2_testbit(const struct gfs2_rbm *rbm, bool use_clone)
if (use_clone && bi->bi_clone)
buffer = bi->bi_clone;
else
-   buffer = bi->bi_bh->b_data;
+   buffer = rbm->bh->b_data;
buffer += bi->bi_offset;
byte = buffer + (rbm->offset / GFS2_NBBY);
bit = (rbm->offset % GFS2_NBBY) * GFS2_BIT_SIZE;
@@ -276,62 +277,152 @@ static u32 gfs2_bitfit(const u8 *buf, const unsigned int len,
 }
 
 /**
- * gfs2_rbm_from_block - Set the rbm based upon rgd and block number
- * @rbm: The rbm with rgd already set correctly
+ * __gfs2_rbm_get - Get the buffer head of @rbm
+ * @rbm: The rbm
+ *
+ * Get the buffer head of the bitmap block the rbm points at.
+ *
+ * Returns: 0 on success, or an error code
+ */
+static int __gfs2_rbm_get(struct gfs2_rbm *rbm) {
+   struct gfs2_rgrpd *rgd = rbm->rgd;
+   struct buffer_head *bh;
+   int error;
+
+   if (rbm->bii == 0) {
+   rbm->bh = rgd->rd_bh;
+   return 0;
+   }
+
+   /* FIXME: Might want to do read-ahead here. */
+   error = gfs2_meta_read(rgd->rd_gl, rgd->rd_addr + rbm->bii,
+  DIO_WAIT, 0, &bh);
+   if (error)
+   goto out;
+   if (gfs2_metatype_check(rgd->rd_sbd, bh, GFS2_METATYPE_RB)) {
+   brelse(bh);
+   error = -EIO;
+   goto out;
+   }
+   rbm->bh = bh;
+out:
+   return error;
+}
+
+/**
+ * gfs2_rbm_get - Set up @rbm to point at @block
+ * @rbm: The rbm
+ * @rgd: The resource group of @block
  * @block: The block number (filesystem relative)
  *
- * This sets the bi and offset members of an rbm based on a
- * resource group and a filesystem relative block number. The
- * resource group must be set in the rbm on entry, the bi and
- * offset members will be set by this function.
+ * This sets the