Re: [Cluster-devel] [PATCH V11 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 12:03:15PM +0100, Christoph Hellwig wrote:
> > +/* used for chunk_for_each_segment */
> > +static inline void bvec_next_segment(const struct bio_vec *bvec,
> > +struct bvec_iter_all *iter_all)
> 
> FYI, chunk_for_each_segment doesn't exist anymore, this is
> bvec_for_each_segment now.  Not sure the comment helps much, though.

OK, will remove the comment.

> 
> > +{
> > +   struct bio_vec *bv = _all->bv;
> > +
> > +   if (bv->bv_page) {
> > +   bv->bv_page += 1;
> 
> I think this needs to use nth_page() given that with discontigmem
> page structures might not be allocated contigously.

Good catch!

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 07/19] fs/buffer.c: use bvec iterator to truncate the bio

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 11:58:49AM +0100, Christoph Hellwig wrote:
> Btw, given that this is the last user of bvec_last_segment after my
> other patches I think we should kill bvec_last_segment and do something
> like this here:
> 
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index fa37ad52e962..af5e135d2b83 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2981,6 +2981,14 @@ static void end_bio_bh_io_sync(struct bio *bio)
>   bio_put(bio);
>  }
>  
> +static void zero_trailing_sectors(struct bio_vec *bvec, unsigned bytes)
> +{
> + unsigned last_page = (bvec->bv_offset + bvec->bv_len - 1) >> PAGE_SHIFT;
> +
> + zero_user(nth_page(bvec->bv_page, last_page),
> +   bvec->bv_offset % PAGE_SIZE + bvec->bv_len, bytes);
> +}

The above 'start' parameter is figured out as wrong, and the computation
isn't very obvious, so I'd suggest to keep bvec_last_segment().

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 11:30:33AM +0100, Christoph Hellwig wrote:
> Btw, this patch instead of the plain rever might make it a little
> more clear what is going on by skipping the confusing helper altogher
> and operating on the raw bvec array:
> 
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index e5b975fa0558..926550ce2d21 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -137,24 +137,18 @@ static inline bool bio_full(struct bio *bio)
>   for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; 
> iter_all.idx++)\
>   bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), 
> i, iter_all)
>  
> -static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter 
> *iter,
> -   unsigned bytes, unsigned max_seg_len)
> +static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> + unsigned bytes)
>  {
>   iter->bi_sector += bytes >> 9;
>  
>   if (bio_no_advance_iter(bio))
>   iter->bi_size -= bytes;
>   else
> - __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
> + bvec_iter_advance(bio->bi_io_vec, iter, bytes);
>   /* TODO: It is reasonable to complete bio with error here. */
>  }
>  
> -static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> - unsigned bytes)
> -{
> - __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
> -}
> -
>  #define __bio_for_each_segment(bvl, bio, iter, start)
> \
>   for (iter = (start);\
>(iter).bi_size &&  \
> @@ -168,7 +162,7 @@ static inline void bio_advance_iter(struct bio *bio, 
> struct bvec_iter *iter,
>   for (iter = (start);\
>(iter).bi_size &&  \
>   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
> -  __bio_advance_iter((bio), &(iter), (bvl).bv_len, BVEC_MAX_LEN))
> +  bio_advance_iter((bio), &(iter), (bvl).bv_len))
>  
>  /* returns one real segment(multi-page bvec) each time */
>  #define bio_for_each_bvec(bvl, bio, iter)\
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index cab36d838ed0..7d0f9bdb6f05 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -25,8 +25,6 @@
>  #include 
>  #include 
>  
> -#define BVEC_MAX_LEN  ((unsigned int)-1)
> -
>  /*
>   * was unsigned short, but we might as well be ready for > 64kB I/O pages
>   */
> @@ -102,8 +100,8 @@ struct bvec_iter_all {
>   .bv_offset  = segment_iter_offset((bvec), (iter)),  \
>  })
>  
> -static inline bool __bvec_iter_advance(const struct bio_vec *bv,
> - struct bvec_iter *iter, unsigned bytes, unsigned max_seg_len)
> +static inline bool bvec_iter_advance(const struct bio_vec *bv,
> + struct bvec_iter *iter, unsigned bytes)
>  {
>   if (WARN_ONCE(bytes > iter->bi_size,
>"Attempted to advance past end of bvec iter\n")) {
> @@ -112,20 +110,15 @@ static inline bool __bvec_iter_advance(const struct 
> bio_vec *bv,
>   }
>  
>   while (bytes) {
> - unsigned segment_len = segment_iter_len(bv, *iter);
> -
> - if (max_seg_len < BVEC_MAX_LEN)
> - segment_len = min_t(unsigned, segment_len,
> - max_seg_len -
> - bvec_iter_offset(bv, *iter));
> + const struct bio_vec *cur = bv + iter->bi_idx;
> + unsigned len = min3(bytes, iter->bi_size,
> + cur->bv_len - iter->bi_bvec_done);
>  
> - segment_len = min(bytes, segment_len);
> -
> - bytes -= segment_len;
> - iter->bi_size -= segment_len;
> - iter->bi_bvec_done += segment_len;
> + bytes -= len;
> + iter->bi_size -= len;
> + iter->bi_bvec_done += len;
>  
> - if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
> + if (iter->bi_bvec_done == cur->bv_len) {
>   iter->bi_bvec_done = 0;
>   iter->bi_idx++;
>   }

I'd rather not do the optimization part in this patchset, given it doesn't
belong to this patchset, and it may decrease readability. So I plan to revert
the delta part in V12 first.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-22 Thread Christoph Hellwig
> +/* used for chunk_for_each_segment */
> +static inline void bvec_next_segment(const struct bio_vec *bvec,
> +  struct bvec_iter_all *iter_all)

FYI, chunk_for_each_segment doesn't exist anymore, this is
bvec_for_each_segment now.  Not sure the comment helps much, though.

> +{
> + struct bio_vec *bv = _all->bv;
> +
> + if (bv->bv_page) {
> + bv->bv_page += 1;

I think this needs to use nth_page() given that with discontigmem
page structures might not be allocated contigously.



Re: [Cluster-devel] [PATCH V11 07/19] fs/buffer.c: use bvec iterator to truncate the bio

2018-11-22 Thread Christoph Hellwig
Btw, given that this is the last user of bvec_last_segment after my
other patches I think we should kill bvec_last_segment and do something
like this here:


diff --git a/fs/buffer.c b/fs/buffer.c
index fa37ad52e962..af5e135d2b83 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2981,6 +2981,14 @@ static void end_bio_bh_io_sync(struct bio *bio)
bio_put(bio);
 }
 
+static void zero_trailing_sectors(struct bio_vec *bvec, unsigned bytes)
+{
+   unsigned last_page = (bvec->bv_offset + bvec->bv_len - 1) >> PAGE_SHIFT;
+
+   zero_user(nth_page(bvec->bv_page, last_page),
+ bvec->bv_offset % PAGE_SIZE + bvec->bv_len, bytes);
+}
+
 /*
  * This allows us to do IO even on the odd last sectors
  * of a device, even if the block size is some multiple
@@ -3031,13 +3039,8 @@ void guard_bio_eod(int op, struct bio *bio)
bvec->bv_len -= truncated_bytes;
 
/* ..and clear the end of the buffer for reads */
-   if (op == REQ_OP_READ) {
-   struct bio_vec bv;
-
-   bvec_last_segment(bvec, );
-   zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
-   truncated_bytes);
-   }
+   if (op == REQ_OP_READ)
+   zero_trailing_sectors(bvec, truncated_bytes);
 }
 
 static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 06:46:05PM +0800, Ming Lei wrote:
> Then your patch should work by just replacing virt boundary with segment
> boudary limit. I will do that change in V12 if you don't object.

Please do, thanks a lot.



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 11:41:50AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 22, 2018 at 06:32:09PM +0800, Ming Lei wrote:
> > On Thu, Nov 22, 2018 at 11:04:28AM +0100, Christoph Hellwig wrote:
> > > On Thu, Nov 22, 2018 at 05:33:00PM +0800, Ming Lei wrote:
> > > > However, using virt boundary limit on non-cluster seems over-kill,
> > > > because the bio will be over-split(each small bvec may be split as one 
> > > > bio)
> > > > if it includes lots of small segment.
> > > 
> > > The combination of the virt boundary of PAGE_SIZE - 1 and a
> > > max_segment_size of PAGE_SIZE will only split if the to me merged
> > > segment is in a different page than the previous one, which is exactly
> > > what we need here.  Multiple small bvec inside the same page (e.g.
> > > 512 byte buffer_heads) will still be merged.
> > > 
> > > > What we want to do is just to avoid to merge bvecs to segment, which
> > > > should have been done by NO_SG_MERGE simply. However, after multi-page
> > > > is enabled, two adjacent bvecs won't be merged any more, I just forget
> > > > to remove the bvec merge code in V11.
> > > > 
> > > > So seems we can simply avoid to use virt boundary limit for non-cluster
> > > > after multipage bvec is enabled?
> > > 
> > > No, we can't just remove it.  As explained in the patch there is one very
> > > visible difference of setting the flag amd that is no segment will span a
> > > page boundary, and at least the iSCSI code seems to rely on that.
> > 
> > IMO, we should use queue_segment_boundary() to enhance the rule during 
> > splitting
> > segment after multi-page bvec is enabled.
> > 
> > Seems we miss the segment boundary limit in bvec_split_segs().
> 
> Yes, that looks like the right fix!

Then your patch should work by just replacing virt boundary with segment
boudary limit. I will do that change in V12 if you don't object.


Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 06:32:09PM +0800, Ming Lei wrote:
> On Thu, Nov 22, 2018 at 11:04:28AM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 22, 2018 at 05:33:00PM +0800, Ming Lei wrote:
> > > However, using virt boundary limit on non-cluster seems over-kill,
> > > because the bio will be over-split(each small bvec may be split as one 
> > > bio)
> > > if it includes lots of small segment.
> > 
> > The combination of the virt boundary of PAGE_SIZE - 1 and a
> > max_segment_size of PAGE_SIZE will only split if the to me merged
> > segment is in a different page than the previous one, which is exactly
> > what we need here.  Multiple small bvec inside the same page (e.g.
> > 512 byte buffer_heads) will still be merged.
> > 
> > > What we want to do is just to avoid to merge bvecs to segment, which
> > > should have been done by NO_SG_MERGE simply. However, after multi-page
> > > is enabled, two adjacent bvecs won't be merged any more, I just forget
> > > to remove the bvec merge code in V11.
> > > 
> > > So seems we can simply avoid to use virt boundary limit for non-cluster
> > > after multipage bvec is enabled?
> > 
> > No, we can't just remove it.  As explained in the patch there is one very
> > visible difference of setting the flag amd that is no segment will span a
> > page boundary, and at least the iSCSI code seems to rely on that.
> 
> IMO, we should use queue_segment_boundary() to enhance the rule during 
> splitting
> segment after multi-page bvec is enabled.
> 
> Seems we miss the segment boundary limit in bvec_split_segs().

Yes, that looks like the right fix!



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 06:26:17PM +0800, Ming Lei wrote:
> Suppose one bio includes (pg0, 0, 512) and (pg1, 512, 512):
> 
> The split is introduced by the following code in blk_bio_segment_split():
> 
>   if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
>   goto split;
> 
> Without this patch, for non-cluster, the two bvecs are just in different
> segment, but still handled by one same bio. Now you convert into two bios.

Oh, true - we create new bios instead of new segments.  So I guess we
need to do something special here if we don't want to pay that overhead.

Or just look into killing the cluster setting..



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 11:04:28AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 22, 2018 at 05:33:00PM +0800, Ming Lei wrote:
> > However, using virt boundary limit on non-cluster seems over-kill,
> > because the bio will be over-split(each small bvec may be split as one bio)
> > if it includes lots of small segment.
> 
> The combination of the virt boundary of PAGE_SIZE - 1 and a
> max_segment_size of PAGE_SIZE will only split if the to me merged
> segment is in a different page than the previous one, which is exactly
> what we need here.  Multiple small bvec inside the same page (e.g.
> 512 byte buffer_heads) will still be merged.
> 
> > What we want to do is just to avoid to merge bvecs to segment, which
> > should have been done by NO_SG_MERGE simply. However, after multi-page
> > is enabled, two adjacent bvecs won't be merged any more, I just forget
> > to remove the bvec merge code in V11.
> > 
> > So seems we can simply avoid to use virt boundary limit for non-cluster
> > after multipage bvec is enabled?
> 
> No, we can't just remove it.  As explained in the patch there is one very
> visible difference of setting the flag amd that is no segment will span a
> page boundary, and at least the iSCSI code seems to rely on that.

IMO, we should use queue_segment_boundary() to enhance the rule during splitting
segment after multi-page bvec is enabled.

Seems we miss the segment boundary limit in bvec_split_segs().

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-22 Thread Christoph Hellwig
Btw, this patch instead of the plain rever might make it a little
more clear what is going on by skipping the confusing helper altogher
and operating on the raw bvec array:


diff --git a/include/linux/bio.h b/include/linux/bio.h
index e5b975fa0558..926550ce2d21 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -137,24 +137,18 @@ static inline bool bio_full(struct bio *bio)
for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; 
iter_all.idx++)\
bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), 
i, iter_all)
 
-static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
- unsigned bytes, unsigned max_seg_len)
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
 {
iter->bi_sector += bytes >> 9;
 
if (bio_no_advance_iter(bio))
iter->bi_size -= bytes;
else
-   __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
+   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
/* TODO: It is reasonable to complete bio with error here. */
 }
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
-{
-   __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
-}
-
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
for (iter = (start);\
 (iter).bi_size &&  \
@@ -168,7 +162,7 @@ static inline void bio_advance_iter(struct bio *bio, struct 
bvec_iter *iter,
for (iter = (start);\
 (iter).bi_size &&  \
((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
-__bio_advance_iter((bio), &(iter), (bvl).bv_len, BVEC_MAX_LEN))
+bio_advance_iter((bio), &(iter), (bvl).bv_len))
 
 /* returns one real segment(multi-page bvec) each time */
 #define bio_for_each_bvec(bvl, bio, iter)  \
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index cab36d838ed0..7d0f9bdb6f05 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -25,8 +25,6 @@
 #include 
 #include 
 
-#define BVEC_MAX_LEN  ((unsigned int)-1)
-
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
@@ -102,8 +100,8 @@ struct bvec_iter_all {
.bv_offset  = segment_iter_offset((bvec), (iter)),  \
 })
 
-static inline bool __bvec_iter_advance(const struct bio_vec *bv,
-   struct bvec_iter *iter, unsigned bytes, unsigned max_seg_len)
+static inline bool bvec_iter_advance(const struct bio_vec *bv,
+   struct bvec_iter *iter, unsigned bytes)
 {
if (WARN_ONCE(bytes > iter->bi_size,
 "Attempted to advance past end of bvec iter\n")) {
@@ -112,20 +110,15 @@ static inline bool __bvec_iter_advance(const struct 
bio_vec *bv,
}
 
while (bytes) {
-   unsigned segment_len = segment_iter_len(bv, *iter);
-
-   if (max_seg_len < BVEC_MAX_LEN)
-   segment_len = min_t(unsigned, segment_len,
-   max_seg_len -
-   bvec_iter_offset(bv, *iter));
+   const struct bio_vec *cur = bv + iter->bi_idx;
+   unsigned len = min3(bytes, iter->bi_size,
+   cur->bv_len - iter->bi_bvec_done);
 
-   segment_len = min(bytes, segment_len);
-
-   bytes -= segment_len;
-   iter->bi_size -= segment_len;
-   iter->bi_bvec_done += segment_len;
+   bytes -= len;
+   iter->bi_size -= len;
+   iter->bi_bvec_done += len;
 
-   if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
+   if (iter->bi_bvec_done == cur->bv_len) {
iter->bi_bvec_done = 0;
iter->bi_idx++;
}
@@ -157,13 +150,6 @@ static inline bool bvec_iter_rewind(const struct bio_vec 
*bv,
return true;
 }
 
-static inline bool bvec_iter_advance(const struct bio_vec *bv,
-struct bvec_iter *iter,
-unsigned bytes)
-{
-   return __bvec_iter_advance(bv, iter, bytes, PAGE_SIZE);
-}
-
 #define for_each_bvec(bvl, bio_vec, iter, start)   \
for (iter = (start);\
 (iter).bi_size &&  \



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Ming Lei
On Thu, Nov 22, 2018 at 11:04:28AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 22, 2018 at 05:33:00PM +0800, Ming Lei wrote:
> > However, using virt boundary limit on non-cluster seems over-kill,
> > because the bio will be over-split(each small bvec may be split as one bio)
> > if it includes lots of small segment.
> 
> The combination of the virt boundary of PAGE_SIZE - 1 and a
> max_segment_size of PAGE_SIZE will only split if the to me merged
> segment is in a different page than the previous one, which is exactly
> what we need here.  Multiple small bvec inside the same page (e.g.
> 512 byte buffer_heads) will still be merged.

Suppose one bio includes (pg0, 0, 512) and (pg1, 512, 512):

The split is introduced by the following code in blk_bio_segment_split():

  if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
goto split;

Without this patch, for non-cluster, the two bvecs are just in different
segment, but still handled by one same bio. Now you convert into two bios.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 06:15:28PM +0800, Ming Lei wrote:
> > while (bytes) {
> > -   unsigned segment_len = segment_iter_len(bv, *iter);
> > -
> > -   if (max_seg_len < BVEC_MAX_LEN)
> > -   segment_len = min_t(unsigned, segment_len,
> > -   max_seg_len -
> > -   bvec_iter_offset(bv, *iter));
> > +   unsigned iter_len = bvec_iter_len(bv, *iter);
> > +   unsigned len = min(bytes, iter_len);
> 
> It may not work to always use bvec_iter_len() here, and 'segment_len'
> should be max length of the passed 'bv', however we don't know if it is
> single-page or mutli-page bvec if no one tells us.

The plain revert I sent isn't totally correct in your current tree
indeed because of the odd change that segment_iter_len now is the
full bvec len, so it should be changed to that until we switch to the
sane naming scheme used elsewhere.

But except for that, it will work.  The bvec we operate on (part of the
array in the bio pointed to by the iter) is always a multi-page capable
one. We only fake up a single page on for users using segment_iter_len.

Think of what you are doing in the (I think mostly hypothetical) case
that we call bvec_iter_advance with a bytes > PAGE_SIZE on a segment
small than page size.

In this case we will limit the current iteration to the while loop
until we git a page boundary.  This means we will not hit the condition
moving to the next (real, in the array) bvec at the end of the loop,
and just go on into the next iteration of the loop with the reduced
amount of bytes.  Depending on how large byte is this might repeat
a few more times.  Then on the final one we hit the limit and move
to the next (real, in the array) bvec.

Now if we don't have the segment limit we do exactly the same thing,
just a whole lot more efficiently in a single loop iteration.
tree where segment_iter_len is the 



Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-22 Thread Ming Lei
On Wed, Nov 21, 2018 at 06:12:17PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote:
> > No - I think we can always use the code without any segment in
> > bvec_iter_advance.  Because bvec_iter_advance only operates on the
> > iteractor, the generation of an actual single-page or multi-page
> > bvec is left to the caller using the bvec_iter_bvec or segment_iter_bvec
> > helpers.  The only difference is how many bytes you can move the
> > iterator forward in a single loop iteration - so if you pass in
> > PAGE_SIZE as the max_seg_len you just will have to loop more often
> > for a large enough bytes, but not actually do anything different.
> 
> FYI, this patch reverts the max_seg_len related changes back to where
> we are in mainline, and as expected everything works fine for me:
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index e5b975fa0558..926550ce2d21 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -137,24 +137,18 @@ static inline bool bio_full(struct bio *bio)
>   for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; 
> iter_all.idx++)\
>   bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), 
> i, iter_all)
>  
> -static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter 
> *iter,
> -   unsigned bytes, unsigned max_seg_len)
> +static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> + unsigned bytes)
>  {
>   iter->bi_sector += bytes >> 9;
>  
>   if (bio_no_advance_iter(bio))
>   iter->bi_size -= bytes;
>   else
> - __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
> + bvec_iter_advance(bio->bi_io_vec, iter, bytes);
>   /* TODO: It is reasonable to complete bio with error here. */
>  }
>  
> -static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> - unsigned bytes)
> -{
> - __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
> -}
> -
>  #define __bio_for_each_segment(bvl, bio, iter, start)
> \
>   for (iter = (start);\
>(iter).bi_size &&  \
> @@ -168,7 +162,7 @@ static inline void bio_advance_iter(struct bio *bio, 
> struct bvec_iter *iter,
>   for (iter = (start);\
>(iter).bi_size &&  \
>   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
> -  __bio_advance_iter((bio), &(iter), (bvl).bv_len, BVEC_MAX_LEN))
> +  bio_advance_iter((bio), &(iter), (bvl).bv_len))
>  
>  /* returns one real segment(multi-page bvec) each time */
>  #define bio_for_each_bvec(bvl, bio, iter)\
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index cab36d838ed0..138b4007b8f2 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -25,8 +25,6 @@
>  #include 
>  #include 
>  
> -#define BVEC_MAX_LEN  ((unsigned int)-1)
> -
>  /*
>   * was unsigned short, but we might as well be ready for > 64kB I/O pages
>   */
> @@ -102,8 +100,8 @@ struct bvec_iter_all {
>   .bv_offset  = segment_iter_offset((bvec), (iter)),  \
>  })
>  
> -static inline bool __bvec_iter_advance(const struct bio_vec *bv,
> - struct bvec_iter *iter, unsigned bytes, unsigned max_seg_len)
> +static inline bool bvec_iter_advance(const struct bio_vec *bv,
> + struct bvec_iter *iter, unsigned bytes)
>  {
>   if (WARN_ONCE(bytes > iter->bi_size,
>"Attempted to advance past end of bvec iter\n")) {
> @@ -112,18 +110,12 @@ static inline bool __bvec_iter_advance(const struct 
> bio_vec *bv,
>   }
>  
>   while (bytes) {
> - unsigned segment_len = segment_iter_len(bv, *iter);
> -
> - if (max_seg_len < BVEC_MAX_LEN)
> - segment_len = min_t(unsigned, segment_len,
> - max_seg_len -
> - bvec_iter_offset(bv, *iter));
> + unsigned iter_len = bvec_iter_len(bv, *iter);
> + unsigned len = min(bytes, iter_len);

It may not work to always use bvec_iter_len() here, and 'segment_len'
should be max length of the passed 'bv', however we don't know if it is
single-page or mutli-page bvec if no one tells us.

Thanks,
Ming



Re: [Cluster-devel] [PATCH 0/2] gfs2: improvements to recovery and withdraw process (v2)

2018-11-22 Thread Steven Whitehouse

Hi,


On 21/11/18 18:52, Bob Peterson wrote:

Hi,

This is a second draft of a two-patch set to fix some of the nasty
journal recovery problems I've found lately.

The original post from 08 November had horribly bad and inaccurate
comments, and Steve Whitehouse and Andreas Gruenbacher pointed out.
This version is hopefully better and more accurately describes what
the patches do and how they work. Also, I fixed a superblock flag
that was improperly declared as a glock flag.

Other than the renamed and re-valued superblock flag, the code
remains unchanged from the previous version. It probably needs a bit
more testing, but it seems to work well.
---
The problems have to do with file system corruption caused when recovery
replays a journal after the resource group blocks have been unlocked
by the recovery process. In other words, when no cluster node takes
responsibility to replay the journal of a withdrawing node, then it
gets replayed later on, after the blocks contents have been changed.

The first patch prevents gfs2 from attempting recovery if the file system
is withdrawn or has journal IO errors. Trying to recover your own journal
from either of these unstable conditions is dangerous and likely to corrupt
the file system.

The second patch is more extensive. When a node withdraws from a file system
it signals all other nodes with the file system mounted to perform recovery
on its journal, since it cannot safely recover its own journal. This is
accomplished by a new non-disk callback glop used exclusively by the
"live" glock, which sets up an lvb in the glock to indicate which
journal(s) need to be recovered.

Regards,

Bob Peterson
---
Bob Peterson (2):
   gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn
   gfs2: initiate journal recovery as soon as a node withdraws

  fs/gfs2/glock.c|  5 ++-
  fs/gfs2/glops.c| 47 +++
  fs/gfs2/incore.h   |  3 ++
  fs/gfs2/lock_dlm.c | 95 ++
  fs/gfs2/log.c  | 62 --
  fs/gfs2/super.c|  5 ++-
  fs/gfs2/super.h|  1 +
  fs/gfs2/util.c | 84 
  fs/gfs2/util.h | 13 +++
  9 files changed, 282 insertions(+), 33 deletions(-)


Yes, that looks a bit cleaner now,

Steve.



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Christoph Hellwig
On Thu, Nov 22, 2018 at 05:33:00PM +0800, Ming Lei wrote:
> However, using virt boundary limit on non-cluster seems over-kill,
> because the bio will be over-split(each small bvec may be split as one bio)
> if it includes lots of small segment.

The combination of the virt boundary of PAGE_SIZE - 1 and a
max_segment_size of PAGE_SIZE will only split if the to me merged
segment is in a different page than the previous one, which is exactly
what we need here.  Multiple small bvec inside the same page (e.g.
512 byte buffer_heads) will still be merged.

> What we want to do is just to avoid to merge bvecs to segment, which
> should have been done by NO_SG_MERGE simply. However, after multi-page
> is enabled, two adjacent bvecs won't be merged any more, I just forget
> to remove the bvec merge code in V11.
> 
> So seems we can simply avoid to use virt boundary limit for non-cluster
> after multipage bvec is enabled?

No, we can't just remove it.  As explained in the patch there is one very
visible difference of setting the flag amd that is no segment will span a
page boundary, and at least the iSCSI code seems to rely on that.



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-22 Thread Ming Lei
On Wed, Nov 21, 2018 at 06:46:21PM +0100, Christoph Hellwig wrote:
> Actually..
> 
> I think we can kill this code entirely.  If we look at what the
> clustering setting is really about it is to avoid ever merging a
> segement that spans a page boundary.  And we should be able to do
> that with something like this before your series:
> 
> ---
> From 0d46fa76c376493a74ea0dbe77305bd5fa2cf011 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 21 Nov 2018 18:39:47 +0100
> Subject: block: remove the "cluster" flag
> 
> The cluster flag implements some very old SCSI behavior.  As far as I
> can tell the original intent was to enable or disable any kind of
> segment merging.  But the actually visible effect to the LLDD is that
> it limits each segments to be inside a single page, which we can
> also affect by setting the maximum segment size and the virt
> boundary.

This approach is pretty good given we can do post-split during mapping
sg.

However, using virt boundary limit on non-cluster seems over-kill,
because the bio will be over-split(each small bvec may be split as one bio)
if it includes lots of small segment.

What we want to do is just to avoid to merge bvecs to segment, which
should have been done by NO_SG_MERGE simply. However, after multi-page
is enabled, two adjacent bvecs won't be merged any more, I just forget
to remove the bvec merge code in V11.

So seems we can simply avoid to use virt boundary limit for non-cluster
after multipage bvec is enabled?


thanks,
Ming