Re: [Cluster-devel] [GFS2 RFC PATCH 3/3] gfs2: introduce bio_pool to readahead journal to find jhead

2018-09-25 Thread Steven Whitehouse

Hi,



On 25/09/18 06:38, Abhi Das wrote:

This patch adds a new data structure called bio_pool. This is
basically a dynamically allocated array of struct bio* and
associated variables to manage this data structure.

The array is used in a circular fashion until the entire array
has bios that are in flight. i.e. they need to be waited on and
consumed upon completion, in order to make room for more. To
locate the journal head, we read the journal sequentially from
the beginning, creating bios and submitting them as necessary.

We wait for these inflight bios in the order we submit them even
though the block layer may complete them out of order. This strict
ordering allows us to determine the journal head without having
to do extra reads.

A tunable allows us to configure the size of the bio_pool.
I'd rather not introduce a new tunable. What size should the pool be? Is 
there any reason that we even need to have the array to keep track of 
the bios? If the pages are in the page cache (i.e. address space of the 
journal inode) then we should be able to simply wait on the pages in 
order I think, without needing a separate list,


Steve.


Signed-off-by: Abhi Das 
---
  fs/gfs2/incore.h |   3 +
  fs/gfs2/lops.c   | 359 +++
  fs/gfs2/lops.h   |   1 +
  fs/gfs2/ops_fstype.c |   2 +
  fs/gfs2/recovery.c   | 116 ++---
  fs/gfs2/sys.c|  27 ++--
  6 files changed, 391 insertions(+), 117 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b96d39c..424687f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -542,6 +542,8 @@ struct gfs2_jdesc {
int jd_recover_error;
/* Replay stuff */
  
+	struct gfs2_log_header_host jd_jhead;

+   struct mutex jd_jh_mutex;
unsigned int jd_found_blocks;
unsigned int jd_found_revokes;
unsigned int jd_replayed_blocks;
@@ -610,6 +612,7 @@ struct gfs2_tune {
unsigned int gt_complain_secs;
unsigned int gt_statfs_quantum;
unsigned int gt_statfs_slow;
+   unsigned int gt_bio_pool_size; /* No of bios to use for the bio_pool */
  };
  
  enum {

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index f2567f9..69fc058 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  
+#include "bmap.h"

  #include "dir.h"
  #include "gfs2.h"
  #include "incore.h"
@@ -370,6 +371,364 @@ void gfs2_log_write_page(struct gfs2_sbd *sdp, struct 
page *page)
   gfs2_log_bmap(sdp));
  }
  
+/*

+ * The bio_pool structure is an array of bios of length 'size'.
+ * 'cur' is the index of the next bio to be submitted for I/O.
+ * 'wait' is the index of bio we need to wait on for I/O completion.
+ * 'inflight' is the number of bios submitted, but not yet completed.
+ */
+struct bio_pool {
+   struct bio **bios;
+   unsigned int size;
+   unsigned int cur;
+   unsigned int wait;
+   unsigned int inflight;
+};
+typedef int (search_bio_t) (struct gfs2_jdesc *jd, const void *ptr);
+
+/**
+ * bio_pool_submit_bio - Submit the current bio in the pool
+ *
+ * @pool: The bio pool
+ *
+ * Submit the current bio (pool->bios[pool->cur]) and update internal pool
+ * management variables. If pool->inflight == pool->size, we've maxed out all
+ * the bios in our pool and the caller needs to wait on some bios, process and
+ * free them so new ones can be added.
+ *
+ * Returns: 1 if we maxed out our bios, 0, otherwise
+ */
+
+static int bio_pool_submit_bio(struct bio_pool *pool)
+{
+   int ret = 0;
+   BUG_ON(!pool || !pool->bios || !pool->bios[pool->cur]);
+
+   bio_set_op_attrs(pool->bios[pool->cur], REQ_OP_READ, 0);
+   submit_bio(pool->bios[pool->cur]);
+   pool->cur = pool->cur == pool->size - 1 ? 0 : pool->cur + 1;
+   pool->inflight++;
+   if (pool->inflight == pool->size)
+   ret = 1;
+   return ret;
+}
+
+/**
+ * bio_pool_get_cur - Do what's necessary to get a valid bio for the caller.
+ *
+ * @pool: The bio pool
+ * @sdp: The gfs2 superblock
+ * @blkno: The block number we wish to add to a bio
+ * @end_io: The end_io completion callback
+ *
+ * If there's no currently active bio, we allocate one for the blkno and 
return.
+ *
+ * If there's an active bio at pool->bios[pool->cur], we check if the requested
+ * block maybe to tacked onto it. If yes, we do nothing and return.
+ *
+ * If the block can't be added (non-contiguous), we submit the current bio.
+ * pool->cur, pool->inflight will change and we fall through to allocate a new
+ * bio and return. In this case, it is possible that submitting the current bio
+ * has maxed out our readahead (bio_pool_submit_bio() returns 1). We pass this
+ * error code back to the caller.
+ *
+ * Returns: 1 if bio_pool_submit_bio() maxed readahead, else 0.
+ */
+
+static int bio_pool_get_cur(struct bio_pool *pool, struct gfs2_sbd *sdp,
+   u64 blkno, bio_end_io_t end_io, void 

[Cluster-devel] [GFS2 RFC PATCH 3/3] gfs2: introduce bio_pool to readahead journal to find jhead

2018-09-24 Thread Abhi Das
This patch adds a new data structure called bio_pool. This is
basically a dynamically allocated array of struct bio* and
associated variables to manage this data structure.

The array is used in a circular fashion until the entire array
has bios that are in flight. i.e. they need to be waited on and
consumed upon completion, in order to make room for more. To
locate the journal head, we read the journal sequentially from
the beginning, creating bios and submitting them as necessary.

We wait for these inflight bios in the order we submit them even
though the block layer may complete them out of order. This strict
ordering allows us to determine the journal head without having
to do extra reads.

A tunable allows us to configure the size of the bio_pool.

Signed-off-by: Abhi Das 
---
 fs/gfs2/incore.h |   3 +
 fs/gfs2/lops.c   | 359 +++
 fs/gfs2/lops.h   |   1 +
 fs/gfs2/ops_fstype.c |   2 +
 fs/gfs2/recovery.c   | 116 ++---
 fs/gfs2/sys.c|  27 ++--
 6 files changed, 391 insertions(+), 117 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b96d39c..424687f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -542,6 +542,8 @@ struct gfs2_jdesc {
int jd_recover_error;
/* Replay stuff */
 
+   struct gfs2_log_header_host jd_jhead;
+   struct mutex jd_jh_mutex;
unsigned int jd_found_blocks;
unsigned int jd_found_revokes;
unsigned int jd_replayed_blocks;
@@ -610,6 +612,7 @@ struct gfs2_tune {
unsigned int gt_complain_secs;
unsigned int gt_statfs_quantum;
unsigned int gt_statfs_slow;
+   unsigned int gt_bio_pool_size; /* No of bios to use for the bio_pool */
 };
 
 enum {
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index f2567f9..69fc058 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+#include "bmap.h"
 #include "dir.h"
 #include "gfs2.h"
 #include "incore.h"
@@ -370,6 +371,364 @@ void gfs2_log_write_page(struct gfs2_sbd *sdp, struct 
page *page)
   gfs2_log_bmap(sdp));
 }
 
+/*
+ * The bio_pool structure is an array of bios of length 'size'.
+ * 'cur' is the index of the next bio to be submitted for I/O.
+ * 'wait' is the index of bio we need to wait on for I/O completion.
+ * 'inflight' is the number of bios submitted, but not yet completed.
+ */
+struct bio_pool {
+   struct bio **bios;
+   unsigned int size;
+   unsigned int cur;
+   unsigned int wait;
+   unsigned int inflight;
+};
+typedef int (search_bio_t) (struct gfs2_jdesc *jd, const void *ptr);
+
+/**
+ * bio_pool_submit_bio - Submit the current bio in the pool
+ *
+ * @pool: The bio pool
+ *
+ * Submit the current bio (pool->bios[pool->cur]) and update internal pool
+ * management variables. If pool->inflight == pool->size, we've maxed out all
+ * the bios in our pool and the caller needs to wait on some bios, process and
+ * free them so new ones can be added.
+ *
+ * Returns: 1 if we maxed out our bios, 0, otherwise
+ */
+
+static int bio_pool_submit_bio(struct bio_pool *pool)
+{
+   int ret = 0;
+   BUG_ON(!pool || !pool->bios || !pool->bios[pool->cur]);
+
+   bio_set_op_attrs(pool->bios[pool->cur], REQ_OP_READ, 0);
+   submit_bio(pool->bios[pool->cur]);
+   pool->cur = pool->cur == pool->size - 1 ? 0 : pool->cur + 1;
+   pool->inflight++;
+   if (pool->inflight == pool->size)
+   ret = 1;
+   return ret;
+}
+
+/**
+ * bio_pool_get_cur - Do what's necessary to get a valid bio for the caller.
+ *
+ * @pool: The bio pool
+ * @sdp: The gfs2 superblock
+ * @blkno: The block number we wish to add to a bio
+ * @end_io: The end_io completion callback
+ *
+ * If there's no currently active bio, we allocate one for the blkno and 
return.
+ *
+ * If there's an active bio at pool->bios[pool->cur], we check if the requested
+ * block maybe to tacked onto it. If yes, we do nothing and return.
+ *
+ * If the block can't be added (non-contiguous), we submit the current bio.
+ * pool->cur, pool->inflight will change and we fall through to allocate a new
+ * bio and return. In this case, it is possible that submitting the current bio
+ * has maxed out our readahead (bio_pool_submit_bio() returns 1). We pass this
+ * error code back to the caller.
+ *
+ * Returns: 1 if bio_pool_submit_bio() maxed readahead, else 0.
+ */
+
+static int bio_pool_get_cur(struct bio_pool *pool, struct gfs2_sbd *sdp,
+   u64 blkno, bio_end_io_t end_io, void *private)
+{
+   struct super_block *sb = sdp->sd_vfs;
+   struct bio *bio;
+   int ret = 0;
+
+   BUG_ON(!pool || !pool->bios);
+
+   if (pool->bios[pool->cur]) {
+   u64 nblk;
+   nblk = bio_end_sector(pool->bios[pool->cur]);
+   nblk >>= sdp->sd_fsb2bb_shift;
+   if (blkno == nblk)
+   return 0;
+   ret =