Re: [Cluster-devel] [PATCH V10 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE
On Thu, Nov 15, 2018 at 04:53:05PM +0800, Ming Lei wrote: > Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"), > physical segment number is mainly figured out in blk_queue_split() for > fast path, and the flag of BIO_SEG_VALID is set there too. > > Now only blk_recount_segments() and blk_recalc_rq_segments() use this > flag. > > Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID > is set in blk_queue_split(). > > For another user of blk_recalc_rq_segments(): > > - run in partial completion branch of blk_update_request, which is an unusual > case > > - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is > killed > since dm-rq is the only user. > > Multi-page bvec is enabled now, QUEUE_FLAG_NO_SG_MERGE doesn't make sense any > more. This commit message wasn't very clear. Is it the case that QUEUE_FLAG_NO_SG_MERGE is no longer set by any drivers?
Re: [Cluster-devel] [PATCH V10 17/19] block: don't use bio->bi_vcnt to figure out segment number
On Thu, Nov 15, 2018 at 04:53:04PM +0800, Ming Lei wrote: > It is wrong to use bio->bi_vcnt to figure out how many segments > there are in the bio even though CLONED flag isn't set on this bio, > because this bio may be splitted or advanced. > > So always use bio_segments() in blk_recount_segments(), and it shouldn't > cause any performance loss now because the physical segment number is figured > out in blk_queue_split() and BIO_SEG_VALID is set meantime since > bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"). > > Cc: Dave Chinner > Cc: Kent Overstreet > Fixes: 7f60dcaaf91 ("block: blk-merge: fix blk_recount_segments()") From what I can tell, the problem was originally introduced by 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments"). Is that right? > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. 
Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 8 +--- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index cb9f49bcfd36..153a659fde74 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -429,13 +429,7 @@ void blk_recalc_rq_segments(struct request *rq) > > void blk_recount_segments(struct request_queue *q, struct bio *bio) > { > - unsigned short seg_cnt; > - > - /* estimate segment number by bi_vcnt for non-cloned bio */ > - if (bio_flagged(bio, BIO_CLONED)) > - seg_cnt = bio_segments(bio); > - else > - seg_cnt = bio->bi_vcnt; > + unsigned short seg_cnt = bio_segments(bio); > > if (test_bit(QUEUE_FLAG_NO_SG_MERGE, >queue_flags) && > (seg_cnt < queue_max_segments(q))) > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 13/19] iomap & xfs: only account for new added page
On Thu, Nov 15, 2018 at 04:53:00PM +0800, Ming Lei wrote: > After multi-page is enabled, one new page may be merged to a segment > even though it is a new added page. > > This patch deals with this issue by post-check in case of merge, and > only a freshly new added page need to be dealt with for iomap & xfs. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > fs/iomap.c | 22 ++ > fs/xfs/xfs_aops.c | 10 -- > include/linux/bio.h | 11 +++ > 3 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/fs/iomap.c b/fs/iomap.c > index df0212560b36..a1b97a5c726a 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -288,6 +288,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, > loff_t length, void *data, > loff_t orig_pos = pos; > unsigned poff, plen; > sector_t sector; > + bool need_account = false; > > if (iomap->type == IOMAP_INLINE) { > WARN_ON_ONCE(pos); > @@ -313,18 +314,15 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, > loff_t length, void *data, >*/ > sector = iomap_sector(iomap, pos); > if (ctx->bio && bio_end_sector(ctx->bio) == sector) { > - if (__bio_try_merge_page(ctx->bio, page, plen, poff)) > + if (__bio_try_merge_page(ctx->bio, page, plen, poff)) { > + need_account = iop && bio_is_last_segment(ctx->bio, > + page, plen, poff); It's redundant to make this iop && ... since you already check iop && need_account below. Maybe rename it to added_page? Also, this indentation is wack. 
> goto done; > + } > is_contig = true; > } > > - /* > - * If we start a new segment we need to increase the read count, and we > - * need to do so before submitting any previous full bio to make sure > - * that we don't prematurely unlock the page. > - */ > - if (iop) > - atomic_inc(>read_count); > + need_account = true; > > if (!ctx->bio || !is_contig || bio_full(ctx->bio)) { > gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); > @@ -347,6 +345,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, > loff_t length, void *data, > __bio_add_page(ctx->bio, page, plen, poff); > done: > /* > + * If we add a new page we need to increase the read count, and we > + * need to do so before submitting any previous full bio to make sure > + * that we don't prematurely unlock the page. > + */ > + if (iop && need_account) > + atomic_inc(>read_count); > + > + /* >* Move the caller beyond our range so that it keeps making progress. >* For that we have to include any leading non-uptodate ranges, but >* we can skip trailing ones as they will be handled in the next > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 1f1829e506e8..d8e9cc9f751a 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -603,6 +603,7 @@ xfs_add_to_ioend( > unsignedlen = i_blocksize(inode); > unsignedpoff = offset & (PAGE_SIZE - 1); > sector_tsector; > + boolneed_account; > > sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) + > ((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9); > @@ -617,13 +618,18 @@ xfs_add_to_ioend( > } > > if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) { > - if (iop) > - atomic_inc(>write_count); > + need_account = true; > if (bio_full(wpc->ioend->io_bio)) > xfs_chain_bio(wpc->ioend, wbc, bdev, sector); > __bio_add_page(wpc->ioend->io_bio, page, len, poff); > + } else { > + need_account = iop && bio_is_last_segment(wpc->ioend->io_bio, > + page, len, poff); Same here, no need for iop &&, rename it added_page, indentation is 
off. > } > > + if (iop && need_account) > + atomic_inc(>write_count); > + > wpc->ioend->io_size += len; > } > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 1a2430a8b89d..5040e9a2eb09 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -341,6 +341,17 @@ static inline struct bio_vec
Re: [Cluster-devel] [PATCH V10 15/19] block: always define BIO_MAX_PAGES as 256
On Thu, Nov 15, 2018 at 04:53:02PM +0800, Ming Lei wrote: > Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to > increase BIO_MAX_PAGES for it. You mentioned it in the cover letter, but this needs more explanation in the commit message. Why did CONFIG_THP_SWAP require > 256? Why do multipage bvecs remove that requirement? > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > include/linux/bio.h | 8 > 1 file changed, 8 deletions(-) > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 5040e9a2eb09..277921ad42e7 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -34,15 +34,7 @@ > #define BIO_BUG_ON > #endif > > -#ifdef CONFIG_THP_SWAP > -#if HPAGE_PMD_NR > 256 > -#define BIO_MAX_PAGESHPAGE_PMD_NR > -#else > #define BIO_MAX_PAGES256 > -#endif > -#else > -#define BIO_MAX_PAGES256 > -#endif > > #define bio_prio(bio)(bio)->bi_ioprio > #define bio_set_prio(bio, prio) ((bio)->bi_ioprio = prio) > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 16/19] block: document usage of bio iterator helpers
On Thu, Nov 15, 2018 at 04:53:03PM +0800, Ming Lei wrote: > Now multi-page bvec is supported, some helpers may return page by > page, meantime some may return segment by segment, this patch > documents the usage. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > Documentation/block/biovecs.txt | 26 ++ > 1 file changed, 26 insertions(+) > > diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt > index 25689584e6e0..bfafb70d0d9e 100644 > --- a/Documentation/block/biovecs.txt > +++ b/Documentation/block/biovecs.txt > @@ -117,3 +117,29 @@ Other implications: > size limitations and the limitations of the underlying devices. Thus > there's no need to define ->merge_bvec_fn() callbacks for individual block > drivers. > + > +Usage of helpers: > += > + > +* The following helpers whose names have the suffix of "_all" can only be > used > +on non-BIO_CLONED bio, and usually they are used by filesystem code, and > driver > +shouldn't use them because bio may have been split before they got to the > driver: Putting an english teacher hat on, this is quite the run-on sentence. How about: * The following helpers whose names have the suffix of "_all" can only be used on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers shouldn't use them because the bio may have been split before it reached the driver. Maybe also an explanation of why the filesystem would want to use these? 
> + bio_for_each_segment_all() > + bio_first_bvec_all() > + bio_first_page_all() > + bio_last_bvec_all() > + > +* The following helpers iterate over single-page bvec, and the local > +variable of 'struct bio_vec' or the reference records single-page IO > +vector during the itearation: * The following helpers iterate over single-page bvecs. The passed 'struct bio_vec' will contain a single-page IO vector during the iteration. > + bio_for_each_segment() > + bio_for_each_segment_all() > + > +* The following helper iterates over multi-page bvec, and each bvec may > +include multiple physically contiguous pages, and the local variable of > +'struct bio_vec' or the reference records multi-page IO vector during the > +itearation: * The following helper iterates over multi-page bvecs. Each bvec may include multiple physically contiguous pages. The passed 'struct bio_vec' will contain a multi-page IO vector during the iteration. > + bio_for_each_bvec() > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 14/19] block: enable multipage bvecs
On Thu, Nov 15, 2018 at 04:53:01PM +0800, Ming Lei wrote: > This patch pulls the trigger for multi-page bvecs. > > Now any request queue which supports queue cluster will see multi-page > bvecs. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > block/bio.c | 24 ++-- > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index 6486722d4d4b..ed6df6f8e63d 100644 > --- a/block/bio.c > +++ b/block/bio.c This comment above __bio_try_merge_page() doesn't make sense after this change: This is a a useful optimisation for file systems with a block size smaller than the page size. Can you please get rid of it in this patch? > @@ -767,12 +767,24 @@ bool __bio_try_merge_page(struct bio *bio, struct page > *page, > > if (bio->bi_vcnt > 0) { > struct bio_vec *bv = >bi_io_vec[bio->bi_vcnt - 1]; > - > - if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) { > - bv->bv_len += len; > - bio->bi_iter.bi_size += len; > - return true; > - } > + struct request_queue *q = NULL; > + > + if (page == bv->bv_page && off == (bv->bv_offset + bv->bv_len) > + && (off + len) <= PAGE_SIZE) > + goto merge; The parentheses around (bv->bv_offset + bv->bv_len) and (off + len) are unnecessary noise. What's the point of the new (off + len) <= PAGE_SIZE check? 
> + > + if (bio->bi_disk) > + q = bio->bi_disk->queue; > + > + /* disable multi-page bvec too if cluster isn't enabled */ > + if (!q || !blk_queue_cluster(q) || > + ((page_to_phys(bv->bv_page) + bv->bv_offset + bv->bv_len) != > + (page_to_phys(page) + off))) More unnecessary parentheses here. > + return false; > + merge: > + bv->bv_len += len; > + bio->bi_iter.bi_size += len; > + return true; > } > return false; > } > -- > 2.9.5 >
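For readers following the thread: the merge condition being reviewed boils down to a physical-contiguity check. Below is a minimal standalone sketch of that check, not the kernel code; `pages_contiguous()` is a hypothetical name and plain integers stand in for page_to_phys(), which is unavailable outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the merge test in the patch: the new (page, offset) range
 * may be folded into the bio's last bvec only if it starts exactly
 * where the tail of that bvec ends in physical memory. "phys" stands
 * in for page_to_phys(page); this is illustration, not kernel code.
 */
static bool pages_contiguous(unsigned long tail_phys, unsigned tail_off,
			     unsigned tail_len, unsigned long new_phys,
			     unsigned new_off)
{
	return tail_phys + tail_off + tail_len == new_phys + new_off;
}
```

The same-page fast path in the patch is just the special case where `new_phys == tail_phys` and `new_off == tail_off + tail_len`.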
Re: [Cluster-devel] [PATCH V10 10/19] block: loop: pass multi-page bvec to iov_iter
On Thu, Nov 15, 2018 at 04:52:57PM +0800, Ming Lei wrote: > iov_iter is implemented with bvec itererator, so it is safe to pass > multipage bvec to it, and this way is much more efficient than > passing one page in each bvec. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval Comments below. > Signed-off-by: Ming Lei > --- > drivers/block/loop.c | 23 --- > 1 file changed, 12 insertions(+), 11 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index bf6bc35aaf88..a3fd418ec637 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -515,16 +515,16 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, > struct bio *bio = rq->bio; > struct file *file = lo->lo_backing_file; > unsigned int offset; > - int segments = 0; > + int nr_bvec = 0; > int ret; > > if (rq->bio != rq->biotail) { > - struct req_iterator iter; > + struct bvec_iter iter; > struct bio_vec tmp; > > __rq_for_each_bio(bio, rq) > - segments += bio_segments(bio); > - bvec = kmalloc_array(segments, sizeof(struct bio_vec), > + nr_bvec += bio_bvecs(bio); > + bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec), >GFP_NOIO); > if (!bvec) > return -EIO; > @@ -533,13 +533,14 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, > /* >* The bios of the request may be started from the middle of >* the 'bvec' because of bio splitting, so we can't directly > - * copy bio->bi_iov_vec to new bvec. 
The rq_for_each_segment > + * copy bio->bi_iov_vec to new bvec. The bio_for_each_bvec >* API will take care of all details for us. >*/ > - rq_for_each_segment(tmp, rq, iter) { > - *bvec = tmp; > - bvec++; > - } > + __rq_for_each_bio(bio, rq) > + bio_for_each_bvec(tmp, bio, iter) { > + *bvec = tmp; > + bvec++; > + } Even if they're not strictly necessary, could you please include the curly braces for __rq_for_each_bio() here? > bvec = cmd->bvec; > offset = 0; > } else { > @@ -550,11 +551,11 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, >*/ > offset = bio->bi_iter.bi_bvec_done; > bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); > - segments = bio_segments(bio); > + nr_bvec = bio_bvecs(bio); This scared me for a second, but it's fine to do here because we haven't actually enabled multipage bvecs yet, right? > } > atomic_set(>ref, 2); > > - iov_iter_bvec(, rw, bvec, segments, blk_rq_bytes(rq)); > + iov_iter_bvec(, rw, bvec, nr_bvec, blk_rq_bytes(rq)); > iter.iov_offset = offset; > > cmd->iocb.ki_pos = pos; > -- > 2.9.5 >
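For reference, the nested `__rq_for_each_bio()` / `bio_for_each_bvec()` walk discussed above, with the requested explicit braces on the outer loop, has the following shape. This is a hedged standalone model with stand-in types and a hypothetical `total_bytes()` helper, not the loop driver's actual code.

```c
#include <assert.h>

/* Stand-in for the kernel struct; only bv_len matters here. */
struct bio_vec { unsigned bv_len; };

/*
 * Sum the bytes across every bvec of every bio in a "request":
 * bios[b] is bio b's bvec table and nvecs[b] its bvec count. Note the
 * braces on the outer loop even though its body is one statement, as
 * asked for in the review.
 */
static unsigned total_bytes(struct bio_vec *bios[], const int nvecs[],
			    int nbios)
{
	unsigned total = 0;
	int b, i;

	for (b = 0; b < nbios; b++) {		/* __rq_for_each_bio() */
		for (i = 0; i < nvecs[b]; i++)	/* bio_for_each_bvec() */
			total += bios[b][i].bv_len;
	}
	return total;
}
```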
Re: [Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec
On Thu, Nov 15, 2018 at 04:52:59PM +0800, Ming Lei wrote: > This patch introduces one extra iterator variable to > bio_for_each_segment_all(), > then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. > > Given it is just one mechannical & simple change on all > bio_for_each_segment_all() > users, this patch does tree-wide change in one single patch, so that we can > avoid to use a temporary helper for this conversion. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: linux-fsde...@vger.kernel.org > Cc: Alexander Viro > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: linux-bt...@vger.kernel.org > Cc: David Sterba > Cc: Darrick J. Wong > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > block/bio.c | 27 ++- > block/blk-zoned.c | 1 + > block/bounce.c| 6 -- > drivers/md/bcache/btree.c | 3 ++- > drivers/md/dm-crypt.c | 3 ++- > drivers/md/raid1.c| 3 ++- > drivers/staging/erofs/data.c | 3 ++- > drivers/staging/erofs/unzip_vle.c | 3 ++- > fs/block_dev.c| 6 -- > fs/btrfs/compression.c| 3 ++- > fs/btrfs/disk-io.c| 3 ++- > fs/btrfs/extent_io.c | 12 > fs/btrfs/inode.c | 6 -- > fs/btrfs/raid56.c | 3 ++- > fs/crypto/bio.c | 3 ++- > fs/direct-io.c| 4 +++- > fs/exofs/ore.c| 3 ++- > fs/exofs/ore_raid.c | 3 ++- > fs/ext4/page-io.c | 3 ++- > fs/ext4/readpage.c| 3 ++- > fs/f2fs/data.c| 9 ++--- > fs/gfs2/lops.c| 6 -- > fs/gfs2/meta_io.c | 3 ++- > fs/iomap.c| 6 -- > fs/mpage.c| 3 ++- > fs/xfs/xfs_aops.c | 5 +++-- > include/linux/bio.h | 11 +-- > include/linux/bvec.h | 31 +++ > 28 files changed, 129 insertions(+), 46 deletions(-) > [snip] > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 3496c816946e..1a2430a8b89d 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -131,12 +131,19 @@ static inline bool 
bio_full(struct bio *bio) > return bio->bi_vcnt >= bio->bi_max_vecs; > } > > +#define bvec_for_each_segment(bv, bvl, i, iter_all) \ > + for (bv = bvec_init_iter_all(_all);\ > + (iter_all.done < (bvl)->bv_len) && \ > + ((bvec_next_segment((bvl), _all)), 1); \ The parentheses around (bvec_next_segment((bvl), _all)) are unnecessary. > + iter_all.done += bv->bv_len, i += 1) > + > /* > * drivers should _never_ use the all version - the bio may have been split > * before it got to the driver and the driver won't own all of it > */ > -#define bio_for_each_segment_all(bvl, bio, i) > \ > - for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++) > +#define bio_for_each_segment_all(bvl, bio, i, iter_all) \ > + for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; > iter_all.idx++)\ > + bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), > i, iter_all) Would it be possible to move i into iter_all to streamline this a bit? > static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter > *iter, > unsigned bytes, bool mp) > diff --git a/include/linux/bvec.h b/include/linux/bvec.h > index 01616a0b6220..02f26d2b59ad 100644 > --- a/include/linux/bvec.h > +++ b/include/linux/bvec.h > @@ -82,6 +82,12 @@ struct bvec_iter { > current bvec */ > }; > > +struct bvec_iter_all { > + struct bio_vec bv; > + int idx; > + unsigneddone; > +}; > + > /* > * various member access, note that bio_data should of course not be used > * on highmem page vectors > @@ -216,6 +222,31 @@ static inline bool mp_bvec_iter_advance(const struct > bio_vec *bv, > .bi_bvec_done = 0,\ > } > > +static inline struct bio_vec *bvec_init_iter_all(struct bvec_iter_all > *iter_all) > +{ > + iter_all->bv.bv_page = NULL; > + iter_all->done = 0; > + > + return _all->bv; > +} > + > +/* used for chunk_for_each_segment */ > +static inline void bvec_next_segment(const struct bio_vec *bvec, > + struct bvec_iter_all *iter_all) Indentation. > +{ > + struct bio_vec
Re: [Cluster-devel] [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()
On Thu, Nov 15, 2018 at 04:52:58PM +0800, Ming Lei wrote: > bch_bio_alloc_pages() is always called on one new bio, so it is safe > to access the bvec table directly. Given it is the only kind of this > case, open code the bvec table access since bio_for_each_segment_all() > will be changed to support for iterating over multipage bvec. > > Cc: Dave Chinner > Cc: Kent Overstreet > Acked-by: Coly Li > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > drivers/md/bcache/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c > index 20eddeac1531..8517aebcda2d 100644 > --- a/drivers/md/bcache/util.c > +++ b/drivers/md/bcache/util.c > @@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask) > int i; > struct bio_vec *bv; > > - bio_for_each_segment_all(bv, bio, i) { > + for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) { This is missing an i++.
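To spell out the bug: without the `i++` the loop condition never changes and the loop never terminates (for any non-empty bio). The corrected open-coded loop must advance both the index and the cursor. A standalone sketch with a stand-in struct, not the kernel's:

```c
#include <assert.h>

/* Minimal stand-in for the kernel's bvec table entry. */
struct bio_vec { void *bv_page; unsigned bv_len; unsigned bv_offset; };

/*
 * Visit every entry of a freshly built bvec table the way the fixed
 * open-coded loop would: both "i" and "bv" advance in the for-step,
 * i.e. "i++, bv++".
 */
static int visit_all(struct bio_vec *vec, int vcnt)
{
	int i, seen = 0;
	struct bio_vec *bv;

	for (i = 0, bv = vec; i < vcnt; i++, bv++)
		seen++;
	return seen;
}
```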
Re: [Cluster-devel] [PATCH V10 07/19] btrfs: use bvec_last_segment to get bio's last page
On Thu, Nov 15, 2018 at 04:52:54PM +0800, Ming Lei wrote: > Preparing for supporting multi-page bvec. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > fs/btrfs/compression.c | 5 - > fs/btrfs/extent_io.c | 5 +++-- > 2 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index 2955a4ea2fa8..161e14b8b180 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -400,8 +400,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode > *inode, u64 start, > static u64 bio_end_offset(struct bio *bio) > { > struct bio_vec *last = bio_last_bvec_all(bio); > + struct bio_vec bv; > > - return page_offset(last->bv_page) + last->bv_len + last->bv_offset; > + bvec_last_segment(last, ); > + > + return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset; > } > > static noinline int add_ra_bio_pages(struct inode *inode, > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index d228f706ff3e..5d5965297e7e 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2720,11 +2720,12 @@ static int __must_check submit_one_bio(struct bio > *bio, int mirror_num, > { > blk_status_t ret = 0; > struct bio_vec *bvec = bio_last_bvec_all(bio); > - struct page *page = bvec->bv_page; > + struct bio_vec bv; > struct extent_io_tree *tree = bio->bi_private; > u64 start; > > - start = page_offset(page) + bvec->bv_offset; > + bvec_last_segment(bvec, ); > + 
start = page_offset(bv.bv_page) + bv.bv_offset; > > bio->bi_private = NULL; > > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()
On Thu, Nov 15, 2018 at 04:52:56PM +0800, Ming Lei wrote: > There are still cases in which we need to use bio_bvecs() for get the > number of multi-page segment, so introduce it. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > include/linux/bio.h | 30 +- > 1 file changed, 25 insertions(+), 5 deletions(-) > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 1f0dcf109841..3496c816946e 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -196,7 +196,6 @@ static inline unsigned bio_segments(struct bio *bio) >* We special case discard/write same/write zeroes, because they >* interpret bi_size differently: >*/ > - > switch (bio_op(bio)) { > case REQ_OP_DISCARD: > case REQ_OP_SECURE_ERASE: > @@ -205,13 +204,34 @@ static inline unsigned bio_segments(struct bio *bio) > case REQ_OP_WRITE_SAME: > return 1; > default: > - break; > + bio_for_each_segment(bv, bio, iter) > + segs++; > + return segs; > } > +} > > - bio_for_each_segment(bv, bio, iter) > - segs++; > +static inline unsigned bio_bvecs(struct bio *bio) > +{ > + unsigned bvecs = 0; > + struct bio_vec bv; > + struct bvec_iter iter; > > - return segs; > + /* > + * We special case discard/write same/write zeroes, because they > + * interpret bi_size differently: > + */ > + switch (bio_op(bio)) { > + case REQ_OP_DISCARD: > + case REQ_OP_SECURE_ERASE: > + case REQ_OP_WRITE_ZEROES: > + return 0; > + case REQ_OP_WRITE_SAME: > + return 1; > + 
default: > + bio_for_each_bvec(bv, bio, iter) > + bvecs++; > + return bvecs; > + } > } > > /* > -- > 2.9.5 >
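The distinction the new helper draws can be illustrated standalone: a single multi-page bvec counts as one entry for bio_bvecs() but contributes one entry per touched page to bio_segments(). The helper below is a hypothetical model of that per-page count, assuming a 4 KiB page; it is not kernel code.

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/*
 * Number of single-page segments covered by one multi-page bvec
 * occupying (offset, len) relative to its first page: the index of
 * the last touched page minus the first, plus one.
 */
static unsigned pages_touched(unsigned offset, unsigned len)
{
	if (!len)
		return 0;
	return (offset + len - 1) / PAGE_SIZE - offset / PAGE_SIZE + 1;
}
```

So a bvec of three physically contiguous pages is one iteration of bio_for_each_bvec() but three of bio_for_each_segment().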
Re: [Cluster-devel] [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs
On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote: > BTRFS is the only user of this helper, so move this helper into > BTRFS, and implement it via bio_for_each_segment_all(), since > bio->bi_vcnt may not equal to number of pages after multipage bvec > is enabled. Shouldn't you also get rid of bio_pages_all() in this patch? > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > fs/btrfs/extent_io.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 5d5965297e7e..874bb9aeebdc 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2348,6 +2348,18 @@ struct bio *btrfs_create_repair_bio(struct inode > *inode, struct bio *failed_bio, > return bio; > } > > +static unsigned btrfs_bio_pages_all(struct bio *bio) > +{ > + unsigned i; > + struct bio_vec *bv; > + > + WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); > + > + bio_for_each_segment_all(bv, bio, i) > + ; > + return i; > +} > + > /* > * this is a generic handler for readpage errors (default > * readpage_io_failed_hook). if other copies exist, read those and write back > @@ -2368,7 +2380,7 @@ static int bio_readpage_error(struct bio *failed_bio, > u64 phy_offset, > int read_mode = 0; > blk_status_t status; > int ret; > - unsigned failed_bio_pages = bio_pages_all(failed_bio); > + unsigned failed_bio_pages = btrfs_bio_pages_all(failed_bio); > > BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); > > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 06/19] fs/buffer.c: use bvec iterator to truncate the bio
On Thu, Nov 15, 2018 at 04:52:53PM +0800, Ming Lei wrote: > Once multi-page bvec is enabled, the last bvec may include more than one > page, this patch use bvec_last_segment() to truncate the bio. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > fs/buffer.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/fs/buffer.c b/fs/buffer.c > index 1286c2b95498..fa37ad52e962 100644 > --- a/fs/buffer.c > +++ b/fs/buffer.c > @@ -3032,7 +3032,10 @@ void guard_bio_eod(int op, struct bio *bio) > > /* ..and clear the end of the buffer for reads */ > if (op == REQ_OP_READ) { > - zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len, > + struct bio_vec bv; > + > + bvec_last_segment(bvec, ); > + zero_user(bv.bv_page, bv.bv_offset + bv.bv_len, > truncated_bytes); > } > } > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V10 05/19] block: introduce bvec_last_segment()
On Thu, Nov 15, 2018 at 04:52:52PM +0800, Ming Lei wrote: > BTRFS and guard_bio_eod() need to get the last singlepage segment > from one multipage bvec, so introduce this helper to make them happy. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval Minor comments below. > Signed-off-by: Ming Lei > --- > include/linux/bvec.h | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/include/linux/bvec.h b/include/linux/bvec.h > index 3d61352cd8cf..01616a0b6220 100644 > --- a/include/linux/bvec.h > +++ b/include/linux/bvec.h > @@ -216,4 +216,29 @@ static inline bool mp_bvec_iter_advance(const struct > bio_vec *bv, > .bi_bvec_done = 0,\ > } > > +/* > + * Get the last singlepage segment from the multipage bvec and store it > + * in @seg > + */ > +static inline void bvec_last_segment(const struct bio_vec *bvec, > + struct bio_vec *seg) Indentation is all messed up here. 
> +{ > + unsigned total = bvec->bv_offset + bvec->bv_len; > + unsigned last_page = total / PAGE_SIZE; > + > + if (last_page * PAGE_SIZE == total) > + last_page--; I think this could just be unsigned int last_page = (total - 1) / PAGE_SIZE; > + seg->bv_page = nth_page(bvec->bv_page, last_page); > + > + /* the whole segment is inside the last page */ > + if (bvec->bv_offset >= last_page * PAGE_SIZE) { > + seg->bv_offset = bvec->bv_offset % PAGE_SIZE; > + seg->bv_len = bvec->bv_len; > + } else { > + seg->bv_offset = 0; > + seg->bv_len = total - last_page * PAGE_SIZE; > + } > +} > + > #endif /* __LINUX_BVEC_ITER_H */ > -- > 2.9.5 >
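The suggested simplification can be checked with a quick user-space model of the two computations (illustrative only: PAGE_SIZE is fixed at 4096 here, and `last_page_orig`/`last_page_simple` are made-up names, not kernel functions):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* The two-step computation from the patch: page count of
 * bv_offset + bv_len, decremented when the total ends exactly
 * on a page boundary. */
static unsigned int last_page_orig(unsigned int offset, unsigned int len)
{
	unsigned int total = offset + len;
	unsigned int last_page = total / PAGE_SIZE;

	if (last_page * PAGE_SIZE == total)
		last_page--;
	return last_page;
}

/* The suggested one-liner: index of the page holding the last byte. */
static unsigned int last_page_simple(unsigned int offset, unsigned int len)
{
	return (offset + len - 1) / PAGE_SIZE;
}
```

Both forms agree for any non-empty bvec (bv_len >= 1), including totals that land exactly on a page boundary.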
Re: [Cluster-devel] [PATCH V10 03/19] block: use bio_for_each_bvec() to compute multi-page bvec count
On Thu, Nov 15, 2018 at 04:05:10PM -0500, Mike Snitzer wrote: > On Thu, Nov 15 2018 at 3:20pm -0500, > Omar Sandoval wrote: > > > On Thu, Nov 15, 2018 at 04:52:50PM +0800, Ming Lei wrote: > > > First it is more efficient to use bio_for_each_bvec() in both > > > blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how > > > many multi-page bvecs there are in the bio. > > > > > > Secondly once bio_for_each_bvec() is used, the bvec may need to be > > > splitted because its length can be very longer than max segment size, > > > so we have to split the big bvec into several segments. > > > > > > Thirdly when splitting multi-page bvec into segments, the max segment > > > limit may be reached, so the bio split need to be considered under > > > this situation too. > > > > > > Cc: Dave Chinner > > > Cc: Kent Overstreet > > > Cc: Mike Snitzer > > > Cc: dm-de...@redhat.com > > > Cc: Alexander Viro > > > Cc: linux-fsde...@vger.kernel.org > > > Cc: Shaohua Li > > > Cc: linux-r...@vger.kernel.org > > > Cc: linux-er...@lists.ozlabs.org > > > Cc: David Sterba > > > Cc: linux-bt...@vger.kernel.org > > > Cc: Darrick J. Wong > > > Cc: linux-...@vger.kernel.org > > > Cc: Gao Xiang > > > Cc: Christoph Hellwig > > > Cc: Theodore Ts'o > > > Cc: linux-e...@vger.kernel.org > > > Cc: Coly Li > > > Cc: linux-bca...@vger.kernel.org > > > Cc: Boaz Harrosh > > > Cc: Bob Peterson > > > Cc: cluster-devel@redhat.com > > > Signed-off-by: Ming Lei > > > --- > > > block/blk-merge.c | 90 > > > ++- > > > 1 file changed, 76 insertions(+), 14 deletions(-) > > > > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > > > index 91b2af332a84..6f7deb94a23f 100644 > > > --- a/block/blk-merge.c > > > +++ b/block/blk-merge.c > > > @@ -160,6 +160,62 @@ static inline unsigned get_max_io_size(struct > > > request_queue *q, > > > return sectors; > > > } > > > > > > +/* > > > + * Split the bvec @bv into segments, and update all kinds of > > > + * variables. 
> > > + */ > > > +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, > > > + unsigned *nsegs, unsigned *last_seg_size, > > > + unsigned *front_seg_size, unsigned *sectors) > > > +{ > > > + bool need_split = false; > > > + unsigned len = bv->bv_len; > > > + unsigned total_len = 0; > > > + unsigned new_nsegs = 0, seg_size = 0; > > > > "unsigned int" here and everywhere else. > > Curious why? I've wondered what govens use of "unsigned" vs "unsigned > int" recently and haven't found _the_ reason to pick one over the other. My only reason to prefer unsigned int is consistency. unsigned int is much more common in the kernel: $ ag --cc -s 'unsigned\s+int' | wc -l 129632 $ ag --cc -s 'unsigned\s+(?!char|short|int|long)' | wc -l 22435 checkpatch also warns on plain unsigned.
Re: [Cluster-devel] [PATCH V10 04/19] block: use bio_for_each_bvec() to map sg
On Thu, Nov 15, 2018 at 04:52:51PM +0800, Ming Lei wrote: > It is more efficient to use bio_for_each_bvec() to map sg, meantime > we have to consider splitting multipage bvec as done in > blk_bio_segment_split(). > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 72 > +++ > 1 file changed, 52 insertions(+), 20 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 6f7deb94a23f..cb9f49bcfd36 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -473,6 +473,56 @@ static int blk_phys_contig_segment(struct request_queue > *q, struct bio *bio, > return biovec_phys_mergeable(q, _bv, _bv); > } > > +static struct scatterlist *blk_next_sg(struct scatterlist **sg, > + struct scatterlist *sglist) > +{ > + if (!*sg) > + return sglist; > + else { > + /* > + * If the driver previously mapped a shorter > + * list, we could see a termination bit > + * prematurely unless it fully inits the sg > + * table on each mapping. We KNOW that there > + * must be more entries here or the driver > + * would be buggy, so force clear the > + * termination bit to avoid doing a full > + * sg_init_table() in drivers for each command. 
> + */ > + sg_unmark_end(*sg); > + return sg_next(*sg); > + } > +} > + > +static unsigned blk_bvec_map_sg(struct request_queue *q, > + struct bio_vec *bvec, struct scatterlist *sglist, > + struct scatterlist **sg) > +{ > + unsigned nbytes = bvec->bv_len; > + unsigned nsegs = 0, total = 0; > + > + while (nbytes > 0) { > + unsigned seg_size; > + struct page *pg; > + unsigned offset, idx; > + > + *sg = blk_next_sg(sg, sglist); > + > + seg_size = min(nbytes, queue_max_segment_size(q)); > + offset = (total + bvec->bv_offset) % PAGE_SIZE; > + idx = (total + bvec->bv_offset) / PAGE_SIZE; > + pg = nth_page(bvec->bv_page, idx); > + > + sg_set_page(*sg, pg, seg_size, offset); > + > + total += seg_size; > + nbytes -= seg_size; > + nsegs++; > + } > + > + return nsegs; > +} > + > static inline void > __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec, >struct scatterlist *sglist, struct bio_vec *bvprv, > @@ -490,25 +540,7 @@ __blk_segment_map_sg(struct request_queue *q, struct > bio_vec *bvec, > (*sg)->length += nbytes; > } else { > new_segment: > - if (!*sg) > - *sg = sglist; > - else { > - /* > - * If the driver previously mapped a shorter > - * list, we could see a termination bit > - * prematurely unless it fully inits the sg > - * table on each mapping. We KNOW that there > - * must be more entries here or the driver > - * would be buggy, so force clear the > - * termination bit to avoid doing a full > - * sg_init_table() in drivers for each command. 
> - */ > - sg_unmark_end(*sg); > - *sg = sg_next(*sg); > - } > - > - sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset); > - (*nsegs)++; > + (*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg); > } > *bvprv = *bvec; > } > @@ -530,7 +562,7 @@ static int __blk_bios_map_sg(struct request_queue *q, > struct bio *bio, > int cluster = blk_queue_cluster(q), nsegs = 0; > > for_each_bio(bio) > - bio_for_each_segment(bvec, bio, iter) > + bio_for_each_bvec(bvec, bio, iter) > __blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg, > &nsegs, &cluster); > > -- > 2.9.5 >
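For reference, the page-index/offset arithmetic in the new blk_bvec_map_sg() can be modeled in user space (an illustrative sketch only, assuming a fixed PAGE_SIZE of 4096 and plain output arrays instead of a scatterlist; `map_bvec` is a made-up name):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/*
 * Model of the loop in blk_bvec_map_sg(): walk a multi-page bvec
 * (bv_offset, bv_len) and emit one entry per chunk of at most
 * max_seg bytes, recording the page index and in-page offset of
 * each chunk. Returns the number of entries emitted.
 */
static unsigned int map_bvec(unsigned int bv_offset, unsigned int bv_len,
			     unsigned int max_seg, unsigned int *idx,
			     unsigned int *off, unsigned int *len)
{
	unsigned int nbytes = bv_len, total = 0, nsegs = 0;

	while (nbytes > 0) {
		unsigned int seg_size = nbytes < max_seg ? nbytes : max_seg;

		/* same computations as in the patch */
		off[nsegs] = (total + bv_offset) % PAGE_SIZE;
		idx[nsegs] = (total + bv_offset) / PAGE_SIZE;
		len[nsegs] = seg_size;

		total += seg_size;
		nbytes -= seg_size;
		nsegs++;
	}
	return nsegs;
}
```

Because the pages backing a multi-page bvec are physically contiguous, an entry may legitimately extend past the end of the page it starts in; the page index only locates the first byte of the chunk.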
Re: [Cluster-devel] [GIT PULL] gfs2: 4.20 fixes
On Thu, 15 Nov 2018 at 19:23, Linus Torvalds wrote: > > On Thu, Nov 15, 2018 at 12:20 PM Andreas Gruenbacher > wrote: > > > > I guess rebasing the for-next branch onto something more recent to > > avoid the back-merge in the first place will be best, resulting in a > > cleaner history. > > Rebases aren't really any better at all. > > If you have a real *reason* for a merge, do the merge. But then the > reason should be clearly stated in the merge commit. Not just some > random undocumented merge message. Tell why some other branch was > relevant to your fix and needed to be pulled in. > > Better yet, don't do either merges or rebases. Ok, I've changed the merge commit as follows now: Merge tag 'v4.20-rc1' Pull in the gfs2 fixes that went into v4.19-rc8: gfs2: Fix iomap buffered write support for journaled files gfs2: Fix iomap buffered write support for journaled files (2) Without these two commits, the following fix would cause conflicts. So merging v4.19-rc8 would have been sufficient. v4.20-rc1 is what I ended up testing, though. Are you okay with that now? 
Thanks, Andreas -- The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git tags/gfs2-4.20.fixes3 for you to fetch changes up to 01ed1606d30966dc8fa255a941b2fc42d4e308a1: gfs2: Fix iomap buffer head reference counting bug (2018-11-15 21:31:58 +0100) Fix two bugs leading to leaked buffer head references: gfs2: Put bitmap buffers in put_super gfs2: Fix iomap buffer head reference counting bug And one bug leading to significant slow-downs when deleting large files: gfs2: Fix metadata read-ahead during truncate (2) Andreas Gruenbacher (4): gfs2: Put bitmap buffers in put_super gfs2: Fix metadata read-ahead during truncate (2) Merge tag 'v4.20-rc1' gfs2: Fix iomap buffer head reference counting bug fs/gfs2/bmap.c | 54 +++--- fs/gfs2/rgrp.c | 3 ++- 2 files changed, 29 insertions(+), 28 deletions(-)
Re: [Cluster-devel] [PATCH V10 03/19] block: use bio_for_each_bvec() to compute multi-page bvec count
On Thu, Nov 15, 2018 at 04:52:50PM +0800, Ming Lei wrote: > First it is more efficient to use bio_for_each_bvec() in both > blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how > many multi-page bvecs there are in the bio. > > Secondly once bio_for_each_bvec() is used, the bvec may need to be > splitted because its length can be very longer than max segment size, > so we have to split the big bvec into several segments. > > Thirdly when splitting multi-page bvec into segments, the max segment > limit may be reached, so the bio split need to be considered under > this situation too. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 90 > ++- > 1 file changed, 76 insertions(+), 14 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 91b2af332a84..6f7deb94a23f 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -160,6 +160,62 @@ static inline unsigned get_max_io_size(struct > request_queue *q, > return sectors; > } > > +/* > + * Split the bvec @bv into segments, and update all kinds of > + * variables. > + */ > +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, > + unsigned *nsegs, unsigned *last_seg_size, > + unsigned *front_seg_size, unsigned *sectors) > +{ > + bool need_split = false; > + unsigned len = bv->bv_len; > + unsigned total_len = 0; > + unsigned new_nsegs = 0, seg_size = 0; "unsigned int" here and everywhere else. 
> + if ((*nsegs >= queue_max_segments(q)) || !len) > + return need_split; > + > + /* > + * Multipage bvec may be too big to hold in one segment, > + * so the current bvec has to be splitted as multiple > + * segments. > + */ > + while (new_nsegs + *nsegs < queue_max_segments(q)) { > + seg_size = min(queue_max_segment_size(q), len); > + > + new_nsegs++; > + total_len += seg_size; > + len -= seg_size; > + > + if ((queue_virt_boundary(q) && ((bv->bv_offset + > + total_len) & queue_virt_boundary(q))) || !len) > + break; Checking queue_virt_boundary(q) != 0 is superfluous, and the len check could just control the loop, i.e., while (len && new_nsegs + *nsegs < queue_max_segments(q)) { seg_size = min(queue_max_segment_size(q), len); new_nsegs++; total_len += seg_size; len -= seg_size; if ((bv->bv_offset + total_len) & queue_virt_boundary(q)) break; } And if you rewrite it this way, I _think_ you can get rid of this special case: if ((*nsegs >= queue_max_segments(q)) || !len) return need_split; above. > + } > + > + /* split in the middle of the bvec */ > + if (len) > + need_split = true; need_split is unnecessary, just return len != 0. 
> + > + /* update front segment size */ > + if (!*nsegs) { > + unsigned first_seg_size = seg_size; > + > + if (new_nsegs > 1) > + first_seg_size = queue_max_segment_size(q); > + if (*front_seg_size < first_seg_size) > + *front_seg_size = first_seg_size; > + } > + > + /* update other varibles */ > + *last_seg_size = seg_size; > + *nsegs += new_nsegs; > + if (sectors) > + *sectors += total_len >> 9; > + > + return need_split; > +} > + > static struct bio *blk_bio_segment_split(struct request_queue *q, >struct bio *bio, >struct bio_set *bs, > @@ -173,7 +229,7 @@ static struct bio *blk_bio_segment_split(struct > request_queue *q, > struct bio *new = NULL; > const unsigned max_sectors = get_max_io_size(q, bio); > > - bio_for_each_segment(bv, bio, iter) { > + bio_for_each_bvec(bv, bio, iter) { > /* >* If the queue doesn't support SG gaps and adding this >* offset would create a gap, disallow it. > @@ -188,8 +244,12 @@ static struct bio *blk_bio_segment_split(struct > request_queue *q, >*/ > if (nsegs < queue_max_segments(q) && > sectors < max_sectors) { > -
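The loop restructuring suggested in this review can be sanity-checked with a user-space model of both shapes (illustrative only: `budget` stands in for the remaining queue_max_segments() slots, `mask` for queue_virt_boundary() with 0 meaning no boundary, and the caller is assumed to guarantee a non-zero budget, as blk_bio_segment_split() does):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Two models of bvec_split_segs()'s inner loop: split one bvec of
 * `len` bytes starting at `off` into segments of at most `max_seg`
 * bytes. Returns true when bytes remain, i.e. the bvec must be split.
 */
static bool split_orig(unsigned int off, unsigned int len,
		       unsigned int max_seg, unsigned int mask,
		       unsigned int budget, unsigned int *nsegs)
{
	unsigned int total = 0;

	*nsegs = 0;
	if (!budget || !len)
		return false;

	while (*nsegs < budget) {
		unsigned int seg = len < max_seg ? len : max_seg;

		(*nsegs)++;
		total += seg;
		len -= seg;

		/* original form: explicit boundary and !len checks */
		if ((mask && ((off + total) & mask)) || !len)
			break;
	}
	return len != 0;
}

static bool split_new(unsigned int off, unsigned int len,
		      unsigned int max_seg, unsigned int mask,
		      unsigned int budget, unsigned int *nsegs)
{
	unsigned int total = 0;

	*nsegs = 0;
	/* restructured form: `len` controls the loop directly */
	while (len && *nsegs < budget) {
		unsigned int seg = len < max_seg ? len : max_seg;

		(*nsegs)++;
		total += seg;
		len -= seg;

		if ((off + total) & mask)
			break;
	}
	return len != 0;
}
```

With `mask == 0` the boundary test `(off + total) & 0` can never fire, which is why the explicit `queue_virt_boundary(q)` check in the original is superfluous.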
Re: [Cluster-devel] [GIT PULL] gfs2: 4.20 fixes
On Thu, 15 Nov 2018 at 18:11, Linus Torvalds wrote: > On Thu, Nov 15, 2018 at 6:00 AM Andreas Gruenbacher > wrote: > > > > could you please pull the following gfs2 fixes for 4.20? > > No. > > I'm not pulling this useless commit message: > > "Merge tag 'v4.20-rc1'" > > with absolutely _zero_ explanation for why that merge was done. Sorry for that. I guess rebasing the for-next branch onto something more recent to avoid the back-merge in the first place will be best, resulting in a cleaner history. > Guys, stop doing this. Because I will stop pulling them. > > If you can't be bothered to explain exactly why you're doing a merge, > I can't be bothered to pull the result. > > Commit messages are important. They explain why something was done. > Merge commits are to some degree *more* important, because they do odd > things and the reason is not obvious from the code ("Oh, it's a > oneliner obvious fix"). > > So merge commits without a reason for them are simply not acceptable. Thanks, Andreas
Re: [Cluster-devel] [PATCH V10 02/19] block: introduce bio_for_each_bvec()
On Thu, Nov 15, 2018 at 04:52:49PM +0800, Ming Lei wrote: > This helper is used for iterating over multi-page bvec for bio > split & merge code. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval One comment below. > Signed-off-by: Ming Lei > --- > include/linux/bio.h | 34 +++--- > include/linux/bvec.h | 36 > 2 files changed, 63 insertions(+), 7 deletions(-) > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 056fb627edb3..1f0dcf109841 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -76,6 +76,9 @@ > #define bio_data_dir(bio) \ > (op_is_write(bio_op(bio)) ? WRITE : READ) > > +#define bio_iter_mp_iovec(bio, iter) \ > + mp_bvec_iter_bvec((bio)->bi_io_vec, (iter)) > + > /* > * Check whether this bio carries any data or not. A NULL bio is allowed. 
> */ > @@ -135,18 +138,33 @@ static inline bool bio_full(struct bio *bio) > #define bio_for_each_segment_all(bvl, bio, i) > \ > for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++) > > -static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter, > - unsigned bytes) > +static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter > *iter, > + unsigned bytes, bool mp) > { > iter->bi_sector += bytes >> 9; > > if (bio_no_advance_iter(bio)) > iter->bi_size -= bytes; > else > - bvec_iter_advance(bio->bi_io_vec, iter, bytes); > + if (!mp) > + bvec_iter_advance(bio->bi_io_vec, iter, bytes); > + else > + mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes); if (!foo) {} else {} hurts my brain, please do if (mp) mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes); else bvec_iter_advance(bio->bi_io_vec, iter, bytes);
Re: [Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers
On Thu, Nov 15, 2018 at 04:52:48PM +0800, Ming Lei wrote: > This patch introduces helpers of 'mp_bvec_iter_*' for multipage > bvec support. > > The introduced helpers treate one bvec as real multi-page segment, > which may include more than one pages. > > The existed helpers of bvec_iter_* are interfaces for supporting current > bvec iterator which is thought as single-page by drivers, fs, dm and > etc. These introduced helpers will build single-page bvec in flight, so > this way won't break current bio/bvec users, which needn't any change. > > Cc: Dave Chinner > Cc: Kent Overstreet > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Alexander Viro > Cc: linux-fsde...@vger.kernel.org > Cc: Shaohua Li > Cc: linux-r...@vger.kernel.org > Cc: linux-er...@lists.ozlabs.org > Cc: David Sterba > Cc: linux-bt...@vger.kernel.org > Cc: Darrick J. Wong > Cc: linux-...@vger.kernel.org > Cc: Gao Xiang > Cc: Christoph Hellwig > Cc: Theodore Ts'o > Cc: linux-e...@vger.kernel.org > Cc: Coly Li > Cc: linux-bca...@vger.kernel.org > Cc: Boaz Harrosh > Cc: Bob Peterson > Cc: cluster-devel@redhat.com Reviewed-by: Omar Sandoval But a couple of comments below. > Signed-off-by: Ming Lei > --- > include/linux/bvec.h | 63 > +--- > 1 file changed, 60 insertions(+), 3 deletions(-) > > diff --git a/include/linux/bvec.h b/include/linux/bvec.h > index 02c73c6aa805..8ef904a50577 100644 > --- a/include/linux/bvec.h > +++ b/include/linux/bvec.h > @@ -23,6 +23,44 @@ > #include > #include > #include > +#include > + > +/* > + * What is multi-page bvecs? > + * > + * - bvecs stored in bio->bi_io_vec is always multi-page(mp) style > + * > + * - bvec(struct bio_vec) represents one physically contiguous I/O > + * buffer, now the buffer may include more than one pages after > + * multi-page(mp) bvec is supported, and all these pages represented > + * by one bvec is physically contiguous. Before mp support, at most > + * one page is included in one bvec, we call it single-page(sp) > + * bvec. 
> + * > + * - .bv_page of the bvec represents the 1st page in the mp bvec > + * > + * - .bv_offset of the bvec represents offset of the buffer in the bvec > + * > + * The effect on the current drivers/filesystem/dm/bcache/...: > + * > + * - almost everyone supposes that one bvec only includes one single > + * page, so we keep the sp interface not changed, for example, > + * bio_for_each_segment() still returns bvec with single page > + * > + * - bio_for_each_segment*() will be changed to return single-page > + * bvec too > + * > + * - during iterating, iterator variable(struct bvec_iter) is always > + * updated in multipage bvec style and that means bvec_iter_advance() > + * is kept not changed > + * > + * - returned(copied) single-page bvec is built in flight by bvec > + * helpers from the stored multipage bvec > + * > + * - In case that some components(such as iov_iter) need to support > + * multi-page bvec, we introduce new helpers(mp_bvec_iter_*) for > + * them. > + */ This comment sounds more like a commit message (i.e., how were things before, and how are we changing them). In a couple of years when I read this code, I probably won't care how it was changed, just how it works. So I think a comment explaining the concepts of multi-page and single-page bvecs is very useful, but please move all of the "foo was changed" and "before mp support" type stuff to the commit message. 
> /* > * was unsigned short, but we might as well be ready for > 64kB I/O pages > @@ -50,16 +88,35 @@ struct bvec_iter { > */ > #define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx]) > > -#define bvec_iter_page(bvec, iter) \ > +#define mp_bvec_iter_page(bvec, iter)\ > (__bvec_iter_bvec((bvec), (iter))->bv_page) > > -#define bvec_iter_len(bvec, iter)\ > +#define mp_bvec_iter_len(bvec, iter) \ > min((iter).bi_size, \ > __bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done) > > -#define bvec_iter_offset(bvec, iter) \ > +#define mp_bvec_iter_offset(bvec, iter) \ > (__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done) > > +#define mp_bvec_iter_page_idx(bvec, iter)\ > + (mp_bvec_iter_offset((bvec), (iter)) / PAGE_SIZE) > + > +/* > + * of single-page(sp) segment. > + * > + * This helpers are for building sp bvec in flight. > + */ > +#define bvec_iter_offset(bvec, iter) \ > + (mp_bvec_iter_offset((bvec), (iter)) % PAGE_SIZE) > + > +#define bvec_iter_len(bvec, iter)\ > + min_t(unsigned, mp_bvec_iter_len((bvec), (iter)), \ > + (PAGE_SIZE - (bvec_iter_offset((bvec), (iter) The parentheses around
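The single-page helpers being defined here reduce to modular arithmetic on the multi-page offset. A user-space sketch, assuming PAGE_SIZE is 4096 and omitting the bi_size capping (`sp_segment` and `struct sp_seg` are illustrative names, not the kernel macros):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/*
 * Given a multi-page bvec (bv_offset, bv_len) and `done` bytes
 * already consumed (bi_bvec_done), derive the single-page segment
 * that bio_for_each_segment() users would see.
 */
struct sp_seg {
	unsigned int page_idx, offset, len;
};

static struct sp_seg sp_segment(unsigned int bv_offset,
				unsigned int bv_len, unsigned int done)
{
	unsigned int mp_offset = bv_offset + done;	/* mp_bvec_iter_offset() */
	unsigned int mp_len = bv_len - done;		/* mp_bvec_iter_len() */
	struct sp_seg s;

	s.page_idx = mp_offset / PAGE_SIZE;		/* mp_bvec_iter_page_idx() */
	s.offset = mp_offset % PAGE_SIZE;		/* bvec_iter_offset() */
	s.len = mp_len < PAGE_SIZE - s.offset ?		/* bvec_iter_len() */
		mp_len : PAGE_SIZE - s.offset;
	return s;
}
```

So each returned segment is clipped at the next page boundary, which is what keeps existing single-page users working unchanged on top of multi-page bvecs.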
Re: [Cluster-devel] lost idr_destroy for ls_recover_idr in release_lockspace() ?
On Thu, Nov 15, 2018 at 09:49:17AM +0300, Vasily Averin wrote: > Dear David, > I've noticed that release_lockspace() lacks idr_destroy(&ls->ls_recover_idr), > though it is called on rollback in new_lockspace(). > > It seems to me it is not critical, and should not lead to any leaks, > however could you please re-check it? > > Thank you, > Vasily Averin Thanks for the patches, I've pushed them to linux-dlm next, and added another for the missing idr_destroy. Dave
Re: [Cluster-devel] [GIT PULL] gfs2: 4.20 fixes
On Thu, Nov 15, 2018 at 12:20 PM Andreas Gruenbacher wrote: > > I guess rebasing the for-next branch onto something more recent to > avoid the back-merge in the first place will be best, resulting in a > cleaner history. Rebases aren't really any better at all. If you have a real *reason* for a merge, do the merge. But then the reason should be clearly stated in the merge commit. Not just some random undocumented merge message. Tell why some other branch was relevant to your fix and needed to be pulled in. Better yet, don't do either merges or rebases. Linus
Re: [Cluster-devel] [GIT PULL] gfs2: 4.20 fixes
On Thu, Nov 15, 2018 at 6:00 AM Andreas Gruenbacher wrote: > > could you please pull the following gfs2 fixes for 4.20? No. I'm not pulling this useless commit message: "Merge tag 'v4.20-rc1'" with absolutely _zero_ explanation for why that merge was done. Guys, stop doing this. Because I will stop pulling them. If you can't be bothered to explain exactly why you're doing a merge, I can't be bothered to pull the result. Commit messages are important. They explain why something was done. Merge commits are to some degree *more* important, because they do odd things and the reason is not obvious from the code ("Oh, it's a oneliner obvious fix"). So merge commits without a reason for them are simply not acceptable. Linus
Re: [Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec
On Thu, Nov 15, 2018 at 04:52:59PM +0800, Ming Lei wrote: > diff --git a/block/blk-zoned.c b/block/blk-zoned.c > index 13ba2011a306..789b09ae402a 100644 > --- a/block/blk-zoned.c > +++ b/block/blk-zoned.c > @@ -123,6 +123,7 @@ static int blk_report_zones(struct gendisk *disk, > sector_t sector, > unsigned int z = 0, n, nrz = *nr_zones; > sector_t capacity = get_capacity(disk); > int ret; > + struct bvec_iter_all iter_all; > > while (z < nrz && sector < capacity) { > n = nrz - z; iter_all is added but not used and I don't see any bio_for_each_segment_all for conversion in this function.
[Cluster-devel] [GIT PULL] gfs2: 4.20 fixes
Hi Linus, could you please pull the following gfs2 fixes for 4.20? Thank you, Andreas The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git tags/gfs2-4.20.fixes2 for you to fetch changes up to 77d36cabdf8e7c8d46b3e275a3d7958549de04b0: gfs2: Fix iomap buffer head reference counting bug (2018-11-11 12:20:05 +) Fix two bugs leading to leaked buffer head references: gfs2: Put bitmap buffers in put_super gfs2: Fix iomap buffer head reference counting bug And one bug leading to significant slow-downs when deleting large files: gfs2: Fix metadata read-ahead during truncate (2) Andreas Gruenbacher (4): gfs2: Put bitmap buffers in put_super gfs2: Fix metadata read-ahead during truncate (2) Merge tag 'v4.20-rc1' gfs2: Fix iomap buffer head reference counting bug fs/gfs2/bmap.c | 54 +++--- fs/gfs2/rgrp.c | 3 ++- 2 files changed, 29 insertions(+), 28 deletions(-)
[Cluster-devel] [PATCH 1/3] dlm: possible memory leak on error path in create_lkb()
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr") Cc: sta...@kernel.org # 3.1 Signed-off-by: Vasily Averin --- fs/dlm/lock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index cc91963683de..2cb125cc21c9 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1209,6 +1209,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret) if (rv < 0) { log_error(ls, "create_lkb idr error %d", rv); + dlm_free_lkb(lkb); return rv; } -- 2.17.1
[Cluster-devel] [PATCH 2/3] dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages") Cc: sta...@kernel.org # 3.5 Signed-off-by: Vasily Averin --- fs/dlm/lock.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 2cb125cc21c9..03d767b94f7b 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -4180,6 +4180,7 @@ static int receive_convert(struct dlm_ls *ls, struct dlm_message *ms) (unsigned long long)lkb->lkb_recover_seq, ms->m_header.h_nodeid, ms->m_lkid); error = -ENOENT; + dlm_put_lkb(lkb); goto fail; } @@ -4233,6 +4234,7 @@ static int receive_unlock(struct dlm_ls *ls, struct dlm_message *ms) lkb->lkb_id, lkb->lkb_remid, ms->m_header.h_nodeid, ms->m_lkid); error = -ENOENT; + dlm_put_lkb(lkb); goto fail; } -- 2.17.1
[Cluster-devel] [PATCH v2] dlm: fixed memory leaks after failed ls_remove_names allocation
If allocation fails on the last elements of the array, we need to free the already allocated elements. v2: just move the existing out_rsbtbl label to the right place Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: sta...@kernel.org # 3.6 Signed-off-by: Vasily Averin --- fs/dlm/lockspace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 5ba94be006ee..6a1529e478f3 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -680,11 +680,11 @@ static int new_lockspace(const char *name, const char *cluster, kfree(ls->ls_recover_buf); out_lkbidr: idr_destroy(&ls->ls_lkbidr); + out_rsbtbl: for (i = 0; i < DLM_REMOVE_NAMES_MAX; i++) { if (ls->ls_remove_names[i]) kfree(ls->ls_remove_names[i]); } - out_rsbtbl: vfree(ls->ls_rsbtbl); out_lsfree: if (do_unreg) -- 2.17.1
[Cluster-devel] [PATCH 3/3] dlm: memory leaks on error path in dlm_user_request()
According to comment in dlm_user_request() ua should be freed in dlm_free_lkb() after successful attach to lkb. However ua is attached to lkb not in set_lock_args() but later, inside request_lock(). Fixes 597d0cae0f99 ("[DLM] dlm: user locks") Cc: sta...@kernel.org # 2.6.19 Signed-off-by: Vasily Averin --- fs/dlm/lock.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 03d767b94f7b..a928ba008d7d 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -5795,20 +5795,20 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, goto out; } } - - /* After ua is attached to lkb it will be freed by dlm_free_lkb(). - When DLM_IFL_USER is set, the dlm knows that this is a userspace - lock and that lkb_astparam is the dlm_user_args structure. */ - error = set_lock_args(mode, &ua->lksb, flags, namelen, timeout_cs, fake_astfn, ua, fake_bastfn, &args); - lkb->lkb_flags |= DLM_IFL_USER; - if (error) { + kfree(ua->lksb.sb_lvbptr); + ua->lksb.sb_lvbptr = NULL; + kfree(ua); __put_lkb(ls, lkb); goto out; } + /* After ua is attached to lkb it will be freed by dlm_free_lkb(). + When DLM_IFL_USER is set, the dlm knows that this is a userspace + lock and that lkb_astparam is the dlm_user_args structure. */ + lkb->lkb_flags |= DLM_IFL_USER; error = request_lock(ls, lkb, name, namelen, &args); switch (error) { -- 2.17.1
[Cluster-devel] lost idr_destroy for ls_recover_idr in release_lockspace() ?
Dear David, I've noticed that release_lockspace() lacks idr_destroy(&ls->ls_recover_idr), though it is called on rollback in new_lockspace(). It seems to me it is not critical, and should not lead to any leaks, however could you please re-check it? Thank you, Vasily Averin
[Cluster-devel] [PATCH] dlm: fixed memory leaks after failed ls_remove_names allocation
If allocation fails on the last elements of the array, we need to free the already allocated elements. Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: sta...@kernel.org # 3.6 Signed-off-by: Vasily Averin --- fs/dlm/lockspace.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 5ba94be006ee..f99e110a0af8 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -532,7 +532,7 @@ static int new_lockspace(const char *name, const char *cluster, ls->ls_remove_names[i] = kzalloc(DLM_RESNAME_MAXLEN+1, GFP_KERNEL); if (!ls->ls_remove_names[i]) - goto out_rsbtbl; + goto out_remove_names; } idr_init(&ls->ls_lkbidr); @@ -680,6 +680,7 @@ static int new_lockspace(const char *name, const char *cluster, kfree(ls->ls_recover_buf); out_lkbidr: idr_destroy(&ls->ls_lkbidr); + out_remove_names: for (i = 0; i < DLM_REMOVE_NAMES_MAX; i++) { if (ls->ls_remove_names[i]) kfree(ls->ls_remove_names[i]); -- 2.17.1
[Cluster-devel] [PATCH V10 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
the physical segment number is mainly figured out in blk_queue_split() for
the fast path, and the BIO_SEG_VALID flag is set there too.

Now only blk_recount_segments() and blk_recalc_rq_segments() use this flag.

Basically blk_recount_segments() is bypassed in the fast path given
BIO_SEG_VALID is set in blk_queue_split().

For the other users of blk_recalc_rq_segments():

- it runs in the partial-completion branch of blk_update_request(), which is
  an unusual case

- it runs in blk_cloned_rq_check_limits(), which is still not a big problem
  if the flag is killed, since dm-rq is the only user

Multi-page bvec is enabled now, so QUEUE_FLAG_NO_SG_MERGE doesn't make sense
any more.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/blk-merge.c      | 31 ++-
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         |  3 ---
 drivers/md/dm-table.c  | 13 -
 include/linux/blkdev.h |  1 -
 5 files changed, 6 insertions(+), 43 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 153a659fde74..06be298be332 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -351,8 +351,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio)
 EXPORT_SYMBOL(blk_queue_split);
 
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-                                             struct bio *bio,
-                                             bool no_sg_merge)
+                                             struct bio *bio)
 {
         struct bio_vec bv, bvprv = { NULL };
         int cluster, prev = 0;
@@ -379,13 +378,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
         nr_phys_segs = 0;
         for_each_bio(bio) {
                 bio_for_each_bvec(bv, bio, iter) {
-                        /*
-                         * If SG merging is disabled, each bio vector is
-                         * a segment
-                         */
-                        if (no_sg_merge)
-                                goto new_segment;
-
                         if (prev && cluster) {
                                 if (seg_size + bv.bv_len
                                     > queue_max_segment_size(q))
@@ -420,27 +412,16 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-        bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
-                        &rq->q->queue_flags);
-
-        rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
-                        no_sg_merge);
+        rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
 }
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-        unsigned short seg_cnt = bio_segments(bio);
-
-        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
-                        (seg_cnt < queue_max_segments(q)))
-                bio->bi_phys_segments = seg_cnt;
-        else {
-                struct bio *nxt = bio->bi_next;
+        struct bio *nxt = bio->bi_next;
 
-                bio->bi_next = NULL;
-                bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
-                bio->bi_next = nxt;
-        }
+        bio->bi_next = NULL;
+        bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
+        bio->bi_next = nxt;
 
         bio_set_flag(bio, BIO_SEG_VALID);
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f021f4817b80..e188b1090759 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -128,7 +128,6 @@ static const char *const blk_queue_flag_name[] = {
         QUEUE_FLAG_NAME(SAME_FORCE),
         QUEUE_FLAG_NAME(DEAD),
         QUEUE_FLAG_NAME(INIT_DONE),
-        QUEUE_FLAG_NAME(NO_SG_MERGE),
         QUEUE_FLAG_NAME(POLL),
         QUEUE_FLAG_NAME(WC),
         QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 411be60d0cb6..ed484af5744b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2755,9 +2755,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
         q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
-        if (!(set->flags & BLK_MQ_F_SG_MERGE))
-                queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
-
         q->sg_reserved_size = INT_MAX;
 
         INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 9038c302d5c2..22fed6987aea 100644
---
[Cluster-devel] [PATCH V10 17/19] block: don't use bio->bi_vcnt to figure out segment number
It is wrong to use bio->bi_vcnt to figure out how many segments there are in
the bio, even though the CLONED flag isn't set on the bio, because the bio
may have been split or advanced.

So always use bio_segments() in blk_recount_segments(); it shouldn't cause
any performance loss now because the physical segment number is figured out
in blk_queue_split() and BIO_SEG_VALID has been set there too since
bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").

Cc: Dave Chinner
Cc: Kent Overstreet
Fixes: 7f60dcaaf91 ("block: blk-merge: fix blk_recount_segments()")
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/blk-merge.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index cb9f49bcfd36..153a659fde74 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -429,13 +429,7 @@ void blk_recalc_rq_segments(struct request *rq)
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-        unsigned short seg_cnt;
-
-        /* estimate segment number by bi_vcnt for non-cloned bio */
-        if (bio_flagged(bio, BIO_CLONED))
-                seg_cnt = bio_segments(bio);
-        else
-                seg_cnt = bio->bi_vcnt;
+        unsigned short seg_cnt = bio_segments(bio);
 
         if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
             (seg_cnt < queue_max_segments(q)))
-- 
2.9.5
[Cluster-devel] [PATCH V10 07/19] btrfs: use bvec_last_segment to get bio's last page
Preparing for supporting multi-page bvec.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 fs/btrfs/compression.c | 5 -
 fs/btrfs/extent_io.c   | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 2955a4ea2fa8..161e14b8b180 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -400,8 +400,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 static u64 bio_end_offset(struct bio *bio)
 {
         struct bio_vec *last = bio_last_bvec_all(bio);
+        struct bio_vec bv;
 
-        return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+        bvec_last_segment(last, &bv);
+
+        return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
 }
 
 static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d228f706ff3e..5d5965297e7e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2720,11 +2720,12 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 {
         blk_status_t ret = 0;
         struct bio_vec *bvec = bio_last_bvec_all(bio);
-        struct page *page = bvec->bv_page;
+        struct bio_vec bv;
         struct extent_io_tree *tree = bio->bi_private;
         u64 start;
 
-        start = page_offset(page) + bvec->bv_offset;
+        bvec_last_segment(bvec, &bv);
+        start = page_offset(bv.bv_page) + bv.bv_offset;
 
         bio->bi_private = NULL;
-- 
2.9.5
[Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()
There are still cases in which we need bio_bvecs() to get the number of
multi-page segments, so introduce it.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 include/linux/bio.h | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1f0dcf109841..3496c816946e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -196,7 +196,6 @@ static inline unsigned bio_segments(struct bio *bio)
          * We special case discard/write same/write zeroes, because they
          * interpret bi_size differently:
          */
-
         switch (bio_op(bio)) {
         case REQ_OP_DISCARD:
         case REQ_OP_SECURE_ERASE:
@@ -205,13 +204,34 @@ static inline unsigned bio_segments(struct bio *bio)
         case REQ_OP_WRITE_SAME:
                 return 1;
         default:
-                break;
+                bio_for_each_segment(bv, bio, iter)
+                        segs++;
+                return segs;
         }
+}
 
-        bio_for_each_segment(bv, bio, iter)
-                segs++;
+static inline unsigned bio_bvecs(struct bio *bio)
+{
+        unsigned bvecs = 0;
+        struct bio_vec bv;
+        struct bvec_iter iter;
 
-        return segs;
+        /*
+         * We special case discard/write same/write zeroes, because they
+         * interpret bi_size differently:
+         */
+        switch (bio_op(bio)) {
+        case REQ_OP_DISCARD:
+        case REQ_OP_SECURE_ERASE:
+        case REQ_OP_WRITE_ZEROES:
+                return 0;
+        case REQ_OP_WRITE_SAME:
+                return 1;
+        default:
+                bio_for_each_bvec(bv, bio, iter)
+                        bvecs++;
+                return bvecs;
+        }
 }
 
 /*
-- 
2.9.5
[Cluster-devel] [PATCH V10 16/19] block: document usage of bio iterator helpers
Now multi-page bvec is supported: some helpers return data page by page,
while others return it segment by segment. This patch documents the usage.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 Documentation/block/biovecs.txt | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 25689584e6e0..bfafb70d0d9e 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -117,3 +117,29 @@ Other implications:
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
+
+Usage of helpers:
+=================
+
+* The following helpers, whose names have the suffix "_all", can only be used
+on a non-BIO_CLONED bio. They are usually used by filesystem code; drivers
+shouldn't use them because the bio may have been split before it reached the
+driver:
+
+	bio_for_each_segment_all()
+	bio_first_bvec_all()
+	bio_first_page_all()
+	bio_last_bvec_all()
+
+* The following helpers iterate over single-page bvecs; the local variable of
+'struct bio_vec' or the reference records a single-page IO vector during the
+iteration:
+
+	bio_for_each_segment()
+	bio_for_each_segment_all()
+
+* The following helper iterates over multi-page bvecs; each bvec may include
+multiple physically contiguous pages, and the local variable of
+'struct bio_vec' or the reference records a multi-page IO vector during the
+iteration:
+
+	bio_for_each_bvec()
-- 
2.9.5
[Cluster-devel] [PATCH V10 13/19] iomap & xfs: only account for new added page
After multi-page bvec is enabled, a new page may be merged into an existing
segment even though it is a newly added page. This patch deals with the
issue by checking after the merge, so that only a freshly added page is
accounted for in iomap & xfs.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 fs/iomap.c          | 22 ++
 fs/xfs/xfs_aops.c   | 10 --
 include/linux/bio.h | 11 +++
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index df0212560b36..a1b97a5c726a 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -288,6 +288,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
         loff_t orig_pos = pos;
         unsigned poff, plen;
         sector_t sector;
+        bool need_account = false;
 
         if (iomap->type == IOMAP_INLINE) {
                 WARN_ON_ONCE(pos);
@@ -313,18 +314,15 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
          */
         sector = iomap_sector(iomap, pos);
         if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
-                if (__bio_try_merge_page(ctx->bio, page, plen, poff))
+                if (__bio_try_merge_page(ctx->bio, page, plen, poff)) {
+                        need_account = iop && bio_is_last_segment(ctx->bio,
+                                        page, plen, poff);
                         goto done;
+                }
                 is_contig = true;
         }
 
-        /*
-         * If we start a new segment we need to increase the read count, and we
-         * need to do so before submitting any previous full bio to make sure
-         * that we don't prematurely unlock the page.
-         */
-        if (iop)
-                atomic_inc(&iop->read_count);
+        need_account = true;
 
         if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
                 gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
@@ -347,6 +345,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
         __bio_add_page(ctx->bio, page, plen, poff);
 done:
         /*
+         * If we add a new page we need to increase the read count, and we
+         * need to do so before submitting any previous full bio to make sure
+         * that we don't prematurely unlock the page.
+         */
+        if (iop && need_account)
+                atomic_inc(&iop->read_count);
+
+        /*
          * Move the caller beyond our range so that it keeps making progress.
          * For that we have to include any leading non-uptodate ranges, but
          * we can skip trailing ones as they will be handled in the next
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1f1829e506e8..d8e9cc9f751a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -603,6 +603,7 @@ xfs_add_to_ioend(
         unsigned                len = i_blocksize(inode);
         unsigned                poff = offset & (PAGE_SIZE - 1);
         sector_t                sector;
+        bool                    need_account;
 
         sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
                 ((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
@@ -617,13 +618,18 @@ xfs_add_to_ioend(
         }
 
         if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
-                if (iop)
-                        atomic_inc(&iop->write_count);
+                need_account = true;
                 if (bio_full(wpc->ioend->io_bio))
                         xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
                 __bio_add_page(wpc->ioend->io_bio, page, len, poff);
+        } else {
+                need_account = iop && bio_is_last_segment(wpc->ioend->io_bio,
+                                page, len, poff);
         }
 
+        if (iop && need_account)
+                atomic_inc(&iop->write_count);
+
         wpc->ioend->io_size += len;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1a2430a8b89d..5040e9a2eb09 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -341,6 +341,17 @@ static inline struct bio_vec *bio_last_bvec_all(struct bio *bio)
         return &bio->bi_io_vec[bio->bi_vcnt - 1];
 }
 
+/* iomap needs this helper to deal with sub-pagesize bvec */
+static inline bool bio_is_last_segment(struct bio *bio, struct page *page,
+                unsigned int len, unsigned int off)
+{
+        struct bio_vec bv;
+
+        bvec_last_segment(bio_last_bvec_all(bio), &bv);
+
+        return bv.bv_page == page && bv.bv_len == len &&
[Cluster-devel] [PATCH V10 14/19] block: enable multipage bvecs
This patch pulls the trigger for multi-page bvecs: from now on, any request
queue which supports queue clustering will see multi-page bvecs.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/bio.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 6486722d4d4b..ed6df6f8e63d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -767,12 +767,24 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
         if (bio->bi_vcnt > 0) {
                 struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
-
-                if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
-                        bv->bv_len += len;
-                        bio->bi_iter.bi_size += len;
-                        return true;
-                }
+                struct request_queue *q = NULL;
+
+                if (page == bv->bv_page && off == (bv->bv_offset + bv->bv_len)
+                                && (off + len) <= PAGE_SIZE)
+                        goto merge;
+
+                if (bio->bi_disk)
+                        q = bio->bi_disk->queue;
+
+                /* disable multi-page bvec too if cluster isn't enabled */
+                if (!q || !blk_queue_cluster(q) ||
+                    ((page_to_phys(bv->bv_page) + bv->bv_offset + bv->bv_len) !=
+                     (page_to_phys(page) + off)))
+                        return false;
+ merge:
+                bv->bv_len += len;
+                bio->bi_iter.bi_size += len;
+                return true;
         }
         return false;
 }
-- 
2.9.5
[Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to
bio_for_each_segment_all(), so that bio_for_each_segment_all() can iterate
over multi-page bvecs.

Given it is just a mechanical & simple change for all
bio_for_each_segment_all() users, this patch does the tree-wide change in
one single patch, so that we can avoid using a temporary helper for the
conversion.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: linux-fsde...@vger.kernel.org
Cc: Alexander Viro
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: linux-bt...@vger.kernel.org
Cc: David Sterba
Cc: Darrick J. Wong
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/bio.c                       | 27 ++-
 block/blk-zoned.c                 |  1 +
 block/bounce.c                    |  6 --
 drivers/md/bcache/btree.c         |  3 ++-
 drivers/md/dm-crypt.c             |  3 ++-
 drivers/md/raid1.c                |  3 ++-
 drivers/staging/erofs/data.c      |  3 ++-
 drivers/staging/erofs/unzip_vle.c |  3 ++-
 fs/block_dev.c                    |  6 --
 fs/btrfs/compression.c            |  3 ++-
 fs/btrfs/disk-io.c                |  3 ++-
 fs/btrfs/extent_io.c              | 12 
 fs/btrfs/inode.c                  |  6 --
 fs/btrfs/raid56.c                 |  3 ++-
 fs/crypto/bio.c                   |  3 ++-
 fs/direct-io.c                    |  4 +++-
 fs/exofs/ore.c                    |  3 ++-
 fs/exofs/ore_raid.c               |  3 ++-
 fs/ext4/page-io.c                 |  3 ++-
 fs/ext4/readpage.c                |  3 ++-
 fs/f2fs/data.c                    |  9 ++---
 fs/gfs2/lops.c                    |  6 --
 fs/gfs2/meta_io.c                 |  3 ++-
 fs/iomap.c                        |  6 --
 fs/mpage.c                        |  3 ++-
 fs/xfs/xfs_aops.c                 |  5 +++--
 include/linux/bio.h               | 11 +--
 include/linux/bvec.h              | 31 +++
 28 files changed, 129 insertions(+), 46 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d5368a445561..6486722d4d4b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1072,8 +1072,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 {
         int i;
         struct bio_vec *bvec;
+        struct bvec_iter_all iter_all;
 
-        bio_for_each_segment_all(bvec, bio, i) {
+        bio_for_each_segment_all(bvec, bio, i, iter_all) {
                 ssize_t ret;
 
                 ret =
copy_page_from_iter(bvec->bv_page, @@ -1103,8 +1104,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter) { int i; struct bio_vec *bvec; + struct bvec_iter_all iter_all; - bio_for_each_segment_all(bvec, bio, i) { + bio_for_each_segment_all(bvec, bio, i, iter_all) { ssize_t ret; ret = copy_page_to_iter(bvec->bv_page, @@ -1126,8 +1128,9 @@ void bio_free_pages(struct bio *bio) { struct bio_vec *bvec; int i; + struct bvec_iter_all iter_all; - bio_for_each_segment_all(bvec, bio, i) + bio_for_each_segment_all(bvec, bio, i, iter_all) __free_page(bvec->bv_page); } EXPORT_SYMBOL(bio_free_pages); @@ -1293,6 +1296,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, struct bio *bio; int ret; struct bio_vec *bvec; + struct bvec_iter_all iter_all; if (!iov_iter_count(iter)) return ERR_PTR(-EINVAL); @@ -1366,7 +1370,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, return bio; out_unmap: - bio_for_each_segment_all(bvec, bio, j) { + bio_for_each_segment_all(bvec, bio, j, iter_all) { put_page(bvec->bv_page); } bio_put(bio); @@ -1377,11 +1381,12 @@ static void __bio_unmap_user(struct bio *bio) { struct bio_vec *bvec; int i; + struct bvec_iter_all iter_all; /* * make sure we dirty pages we wrote to */ - bio_for_each_segment_all(bvec, bio, i) { + bio_for_each_segment_all(bvec, bio, i, iter_all) { if (bio_data_dir(bio) == READ) set_page_dirty_lock(bvec->bv_page); @@ -1473,8 +1478,9 @@ static void bio_copy_kern_endio_read(struct bio *bio) char *p = bio->bi_private; struct bio_vec *bvec; int i; + struct bvec_iter_all iter_all; - bio_for_each_segment_all(bvec, bio, i) { + bio_for_each_segment_all(bvec, bio, i, iter_all) { memcpy(p, page_address(bvec->bv_page), bvec->bv_len); p += bvec->bv_len; } @@ -1583,8 +1589,9 @@ void bio_set_pages_dirty(struct bio *bio) { struct bio_vec *bvec; int i; + struct bvec_iter_all iter_all;
[Cluster-devel] [PATCH V10 00/19] block: support multi-page bvec
Hi,

This patchset brings multi-page bvec into the block layer:

1) what is multi-page bvec?

Multi-page bvecs mean that one 'struct bio_vec' can hold multiple pages
which are physically contiguous, instead of the single page that the Linux
kernel has used for a long time.

2) why is multi-page bvec introduced?

Kent proposed the idea[1] first. As systems' RAM becomes much bigger than
before, and huge pages, transparent huge pages and memory compaction are
widely used, it is now fairly easy to see physically contiguous pages from
the fs in I/O. On the other hand, from the block layer's view, it isn't
necessary to store the intermediate pages in the bvec; it is enough to just
store the physically contiguous 'segment' in each io vector.

Also, huge pages are being brought to filesystems and swap[2][6], and we can
do IO on a hugepage each time[3], which requires that one bio can transfer
at least one huge page at a time. It turns out it isn't flexible to simply
change BIO_MAX_PAGES[3][5]. Multi-page bvec fits this case very well.

As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured
much bigger, such as 512, which requires at least two 4K pages just for
holding the bvec table.

With multi-page bvec:

- Inside the block layer, both bio splitting and sg mapping can become more
  efficient than before by just traversing the physically contiguous
  'segment' instead of each page.

- Segment handling in the block layer can be improved much in future, since
  it should be quite easy to convert a multi-page bvec into a segment; for
  example, we might just store the segment in each bvec directly in future.

- bio size can be increased, and it should improve some high-bandwidth IO
  cases in theory[4].

- There is an opportunity in future to improve the memory footprint of
  bvecs.

3) how is multi-page bvec implemented in this patchset?
Patches 1 ~ 14 implement multi-page bvec in the block layer:

- put all tricks into the bvec/bio/rq iterators; as far as drivers and fs
  use these standard iterators, they are happy with multi-page bvec

- introduce bio_for_each_bvec() to iterate over multi-page bvecs for
  splitting bios and mapping sg

- keep the current bio_for_each_segment*() to iterate over single-page
  bvecs and make sure current users won't be broken; especially, the
  conversion to this new helper prototype is done in single patch 21 given
  it is basically a mechanical conversion

- deal with iomap & xfs's sub-pagesize io vec in patch 13

- enable multi-page bvec in patch 14

Patch 15 redefines BIO_MAX_PAGES as 256.

Patch 16 documents usage of the bio iterator helpers.

Patches 17~19 kill NO_SG_MERGE.

These patches can be found in the following git tree:

	git: https://github.com/ming1/linux.git for-4.21-block-mp-bvec-V10

Lots of tests (blktest, xfstests, ltp io, ...) have been run with this
patchset, and no regression was seen.

Thanks Christoph for reviewing the early version and providing very good
suggestions, such as: introduce bio_init_with_vec_table(), remove other
unnecessary helpers for cleanup and so on.

Any comments are welcome!
V10:
	- no code change; just add more people and lists to the patches' CC
	  lists, as suggested by Christoph and Dave Chinner

V9:
	- fix regression on iomap's sub-pagesize io vec, covered by patch 13

V8:
	- remove prepare patches, which have all been merged into Linus' tree
	- rebase on for-4.21/block
	- address comments on V7
	- add patches for killing NO_SG_MERGE

V7:
	- include Christoph's and Mike's bio_clone_bioset() patches, which
	  are actually prepare patches for multi-page bvec
	- address Christoph's comments

V6:
	- avoid introducing lots of renaming, following Jens' suggestion of
	  using the name of chunk for multi-page io vectors
	- include Christoph's three prepare patches
	- decrease stack usage for using bio_for_each_chunk_segment_all()
	- address Kent's comment

V5:
	- remove some of the prepare patches, which have been merged already
	- add bio_clone_seg_bioset() to fix DM's bio clone, which is
	  introduced by 18a25da84354c6b ("dm: ensure bio submission follows a
	  depth-first tree walk")
	- rebase on the latest block for-v4.18

V4:
	- rename bio_for_each_segment*() as bio_for_each_page*(), rename
	  bio_segments() as bio_pages(), rename rq_for_each_segment() as
	  rq_for_each_pages(), because these helpers never return a real
	  segment, and they always return a single-page bvec
	- introduce segment_for_each_page_all()
	- introduce new bio_for_each_segment*()/rq_for_each_segment()/
	  bio_segments() for returning real multi-page segments
	- rewrite segment_last_page()
	- rename bvec iterator helpers as suggested by Christoph
	- replace comment with applying bio helpers as suggested by Christoph
	- document usage of bio iterator helpers
	-
[Cluster-devel] [PATCH V10 06/19] fs/buffer.c: use bvec iterator to truncate the bio
Once multi-page bvec is enabled, the last bvec may include more than one
page; this patch uses bvec_last_segment() to truncate the bio.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 fs/buffer.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1286c2b95498..fa37ad52e962 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3032,7 +3032,10 @@ void guard_bio_eod(int op, struct bio *bio)
 
         /* ..and clear the end of the buffer for reads */
         if (op == REQ_OP_READ) {
-                zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+                struct bio_vec bv;
+
+                bvec_last_segment(bvec, &bv);
+                zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
                                 truncated_bytes);
         }
 }
-- 
2.9.5
[Cluster-devel] [PATCH V10 19/19] block: kill BLK_MQ_F_SG_MERGE
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/blk-mq-debugfs.c       | 1 -
 drivers/block/loop.c         | 2 +-
 drivers/block/nbd.c          | 2 +-
 drivers/block/rbd.c          | 2 +-
 drivers/block/skd_main.c     | 1 -
 drivers/block/xen-blkfront.c | 2 +-
 drivers/md/dm-rq.c           | 2 +-
 drivers/mmc/core/queue.c     | 3 +--
 drivers/scsi/scsi_lib.c      | 2 +-
 include/linux/blk-mq.h       | 1 -
 10 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index e188b1090759..e1c12358391a 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -250,7 +250,6 @@ static const char *const alloc_policy_name[] = {
 static const char *const hctx_flag_name[] = {
         HCTX_FLAG_NAME(SHOULD_MERGE),
         HCTX_FLAG_NAME(TAG_SHARED),
-        HCTX_FLAG_NAME(SG_MERGE),
         HCTX_FLAG_NAME(BLOCKING),
         HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a3fd418ec637..d509902a8046 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1907,7 +1907,7 @@ static int loop_add(struct loop_device **l, int i)
         lo->tag_set.queue_depth = 128;
         lo->tag_set.numa_node = NUMA_NO_NODE;
         lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-        lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+        lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
         lo->tag_set.driver_data = lo;
 
         err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 08696f5f00bb..999c94de78e5 100644
---
a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1570,7 +1570,7 @@ static int nbd_dev_add(int index) nbd->tag_set.numa_node = NUMA_NO_NODE; nbd->tag_set.cmd_size = sizeof(struct nbd_cmd); nbd->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | - BLK_MQ_F_SG_MERGE | BLK_MQ_F_BLOCKING; + BLK_MQ_F_BLOCKING; nbd->tag_set.driver_data = nbd; err = blk_mq_alloc_tag_set(>tag_set); diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 8e5140bbf241..3dfd300b5283 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3988,7 +3988,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev) rbd_dev->tag_set.ops = _mq_ops; rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth; rbd_dev->tag_set.numa_node = NUMA_NO_NODE; - rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; + rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; rbd_dev->tag_set.nr_hw_queues = 1; rbd_dev->tag_set.cmd_size = sizeof(struct work_struct); diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c index a10d5736d8f7..a7040f9a1b1b 100644 --- a/drivers/block/skd_main.c +++ b/drivers/block/skd_main.c @@ -2843,7 +2843,6 @@ static int skd_cons_disk(struct skd_device *skdev) skdev->sgs_per_request * sizeof(struct scatterlist); skdev->tag_set.numa_node = NUMA_NO_NODE; skdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | - BLK_MQ_F_SG_MERGE | BLK_ALLOC_POLICY_TO_MQ_FLAG(BLK_TAG_ALLOC_FIFO); skdev->tag_set.driver_data = skdev; rc = blk_mq_alloc_tag_set(>tag_set); diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 56452cabce5b..297412bf23e1 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -977,7 +977,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size, } else info->tag_set.queue_depth = BLK_RING_SIZE(info); info->tag_set.numa_node = NUMA_NO_NODE; - info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; + info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE; info->tag_set.cmd_size = sizeof(struct blkif_req); 
info->tag_set.driver_data = info; diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index 7cd36e4d1310..140ada0b99fc 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -536,7 +536,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t) md->tag_set->ops = _mq_ops; md->tag_set->queue_depth = dm_get_blk_mq_queue_depth(); md->tag_set->numa_node = md->numa_node_id; - md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; + md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
[Cluster-devel] [PATCH V10 15/19] block: always define BIO_MAX_PAGES as 256
Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to increase
BIO_MAX_PAGES for it.

Cc: Dave Chinner
Cc: Kent Overstreet
Cc: Mike Snitzer
Cc: dm-de...@redhat.com
Cc: Alexander Viro
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang
Cc: Christoph Hellwig
Cc: Theodore Ts'o
Cc: linux-e...@vger.kernel.org
Cc: Coly Li
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh
Cc: Bob Peterson
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 include/linux/bio.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5040e9a2eb09..277921ad42e7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -34,15 +34,7 @@
 #define BIO_BUG_ON
 #endif
 
-#ifdef CONFIG_THP_SWAP
-#if HPAGE_PMD_NR > 256
-#define BIO_MAX_PAGES		HPAGE_PMD_NR
-#else
 #define BIO_MAX_PAGES		256
-#endif
-#else
-#define BIO_MAX_PAGES		256
-#endif
 
 #define bio_prio(bio)			(bio)->bi_ioprio
 #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
-- 
2.9.5
[Cluster-devel] [PATCH V10 02/19] block: introduce bio_for_each_bvec()
This helper is used for iterating over multi-page bvecs in the bio
split & merge code.

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 include/linux/bio.h  | 34 +++++++++++++++++++++++++++++++---
 include/linux/bvec.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 056fb627edb3..1f0dcf109841 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -76,6 +76,9 @@
 #define bio_data_dir(bio) \
 	(op_is_write(bio_op(bio)) ? WRITE : READ)
 
+#define bio_iter_mp_iovec(bio, iter)				\
+	mp_bvec_iter_bvec((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -135,18 +138,33 @@ static inline bool bio_full(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-				    unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				      unsigned bytes, bool mp)
 {
 	iter->bi_sector += bytes >> 9;
 
 	if (bio_no_advance_iter(bio))
 		iter->bi_size -= bytes;
 	else
-		bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+		if (!mp)
+			bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+		else
+			mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
 		/* TODO: It is reasonable to complete bio with error here. */
 }
 
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+				    unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_mp_iter(struct bio *bio, struct bvec_iter *iter,
+				       unsigned bytes)
+{
+	__bio_advance_iter(bio, iter, bytes, true);
+}
+
 #define __bio_for_each_segment(bvl, bio, iter, start)			\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
@@ -156,6 +174,16 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_bvec(bvl, bio, iter, start)		\
+	for (iter = (start);					\
+	     (iter).bi_size &&					\
+		((bvl = bio_iter_mp_iovec((bio), (iter))), 1);	\
+	     bio_advance_mp_iter((bio), &(iter), (bvl).bv_len))
+
+/* returns one real segment(multi-page bvec) each time */
+#define bio_for_each_bvec(bvl, bio, iter)			\
+	__bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 8ef904a50577..3d61352cd8cf 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -124,8 +124,16 @@ struct bvec_iter {
 	.bv_offset	= bvec_iter_offset((bvec), (iter)),	\
 })
 
-static inline bool bvec_iter_advance(const struct bio_vec *bv,
-		struct bvec_iter *iter, unsigned bytes)
+#define mp_bvec_iter_bvec(bvec, iter)				\
+((struct bio_vec) {						\
+	.bv_page	= mp_bvec_iter_page((bvec), (iter)),	\
+	.bv_len		= mp_bvec_iter_len((bvec), (iter)),	\
+	.bv_offset	= mp_bvec_iter_offset((bvec), (iter)),	\
+})
+
+static inline bool __bvec_iter_advance(const struct bio_vec *bv,
+				       struct bvec_iter *iter,
+				       unsigned bytes, bool mp)
 {
 	if (WARN_ONCE(bytes > iter->bi_size,
 		     "Attempted to advance past end of bvec iter\n")) {
@@ -134,8 +142,14 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
 	}
 
 	while (bytes) {
-		unsigned iter_len = bvec_iter_len(bv, *iter);
-		unsigned len = min(bytes,
[Cluster-devel] [PATCH V10 05/19] block: introduce bvec_last_segment()
BTRFS and guard_bio_eod() need to get the last single-page segment from
one multi-page bvec, so introduce this helper to make them happy.

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 include/linux/bvec.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 3d61352cd8cf..01616a0b6220 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -216,4 +216,29 @@ static inline bool mp_bvec_iter_advance(const struct bio_vec *bv,
 	.bi_bvec_done	= 0,						\
 }
 
+/*
+ * Get the last single-page segment from the multi-page bvec and store it
+ * in @seg
+ */
+static inline void bvec_last_segment(const struct bio_vec *bvec,
+				     struct bio_vec *seg)
+{
+	unsigned total = bvec->bv_offset + bvec->bv_len;
+	unsigned last_page = total / PAGE_SIZE;
+
+	if (last_page * PAGE_SIZE == total)
+		last_page--;
+
+	seg->bv_page = nth_page(bvec->bv_page, last_page);
+
+	/* the whole segment is inside the last page */
+	if (bvec->bv_offset >= last_page * PAGE_SIZE) {
+		seg->bv_offset = bvec->bv_offset % PAGE_SIZE;
+		seg->bv_len = bvec->bv_len;
+	} else {
+		seg->bv_offset = 0;
+		seg->bv_len = total - last_page * PAGE_SIZE;
+	}
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.9.5
[Cluster-devel] [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs
BTRFS is the only user of this helper, so move it into BTRFS and
implement it via bio_for_each_segment_all(), since bio->bi_vcnt may not
equal the number of pages after multi-page bvec is enabled.

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 fs/btrfs/extent_io.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5d5965297e7e..874bb9aeebdc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2348,6 +2348,18 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 	return bio;
 }
 
+static unsigned btrfs_bio_pages_all(struct bio *bio)
+{
+	unsigned i;
+	struct bio_vec *bv;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	bio_for_each_segment_all(bv, bio, i)
+		;
+	return i;
+}
+
 /*
  * this is a generic handler for readpage errors (default
  * readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2368,7 +2380,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	int read_mode = 0;
 	blk_status_t status;
 	int ret;
-	unsigned failed_bio_pages = bio_pages_all(failed_bio);
+	unsigned failed_bio_pages = btrfs_bio_pages_all(failed_bio);
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-- 
2.9.5
[Cluster-devel] [PATCH V10 04/19] block: use bio_for_each_bvec() to map sg
It is more efficient to use bio_for_each_bvec() to map sg; meanwhile we
have to consider splitting the multi-page bvec as done in
blk_bio_segment_split().

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 block/blk-merge.c | 72 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6f7deb94a23f..cb9f49bcfd36 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -473,6 +473,56 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 	return biovec_phys_mergeable(q, &end_bv, &nxt_bv);
 }
 
+static struct scatterlist *blk_next_sg(struct scatterlist **sg,
+				       struct scatterlist *sglist)
+{
+	if (!*sg)
+		return sglist;
+	else {
+		/*
+		 * If the driver previously mapped a shorter
+		 * list, we could see a termination bit
+		 * prematurely unless it fully inits the sg
+		 * table on each mapping. We KNOW that there
+		 * must be more entries here or the driver
+		 * would be buggy, so force clear the
+		 * termination bit to avoid doing a full
+		 * sg_init_table() in drivers for each command.
+		 */
+		sg_unmark_end(*sg);
+		return sg_next(*sg);
+	}
+}
+
+static unsigned blk_bvec_map_sg(struct request_queue *q,
+		struct bio_vec *bvec, struct scatterlist *sglist,
+		struct scatterlist **sg)
+{
+	unsigned nbytes = bvec->bv_len;
+	unsigned nsegs = 0, total = 0;
+
+	while (nbytes > 0) {
+		unsigned seg_size;
+		struct page *pg;
+		unsigned offset, idx;
+
+		*sg = blk_next_sg(sg, sglist);
+
+		seg_size = min(nbytes, queue_max_segment_size(q));
+		offset = (total + bvec->bv_offset) % PAGE_SIZE;
+		idx = (total + bvec->bv_offset) / PAGE_SIZE;
+		pg = nth_page(bvec->bv_page, idx);
+
+		sg_set_page(*sg, pg, seg_size, offset);
+
+		total += seg_size;
+		nbytes -= seg_size;
+		nsegs++;
+	}
+
+	return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		     struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -490,25 +540,7 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 		(*sg)->length += nbytes;
 	} else {
 new_segment:
-		if (!*sg)
-			*sg = sglist;
-		else {
-			/*
-			 * If the driver previously mapped a shorter
-			 * list, we could see a termination bit
-			 * prematurely unless it fully inits the sg
-			 * table on each mapping. We KNOW that there
-			 * must be more entries here or the driver
-			 * would be buggy, so force clear the
-			 * termination bit to avoid doing a full
-			 * sg_init_table() in drivers for each command.
-			 */
-			sg_unmark_end(*sg);
-			*sg = sg_next(*sg);
-		}
-
-		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-		(*nsegs)++;
+		(*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
 	}
 	*bvprv = *bvec;
 }
@@ -530,7 +562,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 	int cluster = blk_queue_cluster(q), nsegs = 0;
 
 	for_each_bio(bio)
-		bio_for_each_segment(bvec, bio, iter)
+		bio_for_each_bvec(bvec, bio, iter)
 			__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 					     &nsegs, &cluster);
-- 
2.9.5
[Cluster-devel] [PATCH V10 10/19] block: loop: pass multi-page bvec to iov_iter
iov_iter is implemented on top of the bvec iterator, so it is safe to
pass a multi-page bvec to it, and this way is much more efficient than
passing one page in each bvec.

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 drivers/block/loop.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf6bc35aaf88..a3fd418ec637 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -515,16 +515,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	struct bio *bio = rq->bio;
 	struct file *file = lo->lo_backing_file;
 	unsigned int offset;
-	int segments = 0;
+	int nr_bvec = 0;
 	int ret;
 
 	if (rq->bio != rq->biotail) {
-		struct req_iterator iter;
+		struct bvec_iter iter;
 		struct bio_vec tmp;
 
 		__rq_for_each_bio(bio, rq)
-			segments += bio_segments(bio);
-		bvec = kmalloc_array(segments, sizeof(struct bio_vec),
+			nr_bvec += bio_bvecs(bio);
+		bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 				     GFP_NOIO);
 		if (!bvec)
 			return -EIO;
@@ -533,13 +533,14 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		/*
 		 * The bios of the request may be started from the middle of
 		 * the 'bvec' because of bio splitting, so we can't directly
-		 * copy bio->bi_iov_vec to new bvec. The rq_for_each_segment
+		 * copy bio->bi_iov_vec to new bvec. The bio_for_each_bvec
 		 * API will take care of all details for us.
 		 */
-		rq_for_each_segment(tmp, rq, iter) {
-			*bvec = tmp;
-			bvec++;
-		}
+		__rq_for_each_bio(bio, rq)
+			bio_for_each_bvec(tmp, bio, iter) {
+				*bvec = tmp;
+				bvec++;
+			}
 		bvec = cmd->bvec;
 		offset = 0;
 	} else {
@@ -550,11 +551,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 		 */
 		offset = bio->bi_iter.bi_bvec_done;
 		bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-		segments = bio_segments(bio);
+		nr_bvec = bio_bvecs(bio);
 	}
 	atomic_set(&cmd->ref, 2);
 
-	iov_iter_bvec(&iter, rw, bvec, segments, blk_rq_bytes(rq));
+	iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
 	iter.iov_offset = offset;
 
 	cmd->iocb.ki_pos = pos;
-- 
2.9.5
[Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers
This patch introduces the 'mp_bvec_iter_*' helpers for multi-page bvec
support. The introduced helpers treat one bvec as a real multi-page
segment, which may include more than one page.

The existing bvec_iter_* helpers are the interfaces for the current bvec
iterator, which drivers, filesystems, dm and so on treat as single-page.
The introduced helpers build a single-page bvec in flight, so this way
won't break current bio/bvec users, which need no change.

Cc: Dave Chinner Cc: Kent Overstreet Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Alexander Viro Cc: linux-fsde...@vger.kernel.org Cc: Shaohua Li Cc: linux-r...@vger.kernel.org Cc: linux-er...@lists.ozlabs.org Cc: David Sterba Cc: linux-bt...@vger.kernel.org Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: Gao Xiang Cc: Christoph Hellwig Cc: Theodore Ts'o Cc: linux-e...@vger.kernel.org Cc: Coly Li Cc: linux-bca...@vger.kernel.org Cc: Boaz Harrosh Cc: Bob Peterson Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei
---
 include/linux/bvec.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 02c73c6aa805..8ef904a50577 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -23,6 +23,44 @@
 #include
 #include
 #include
+#include
+
+/*
+ * What is multi-page bvecs?
+ *
+ * - bvecs stored in bio->bi_io_vec is always multi-page(mp) style
+ *
+ * - bvec(struct bio_vec) represents one physically contiguous I/O
+ *   buffer, now the buffer may include more than one page after
+ *   multi-page(mp) bvec is supported, and all these pages represented
+ *   by one bvec is physically contiguous. Before mp support, at most
+ *   one page is included in one bvec, we call it single-page(sp)
+ *   bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp bvec
+ *
+ * - .bv_offset of the bvec represents offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystem/dm/bcache/...:
+ *
+ * - almost everyone supposes that one bvec only includes one single
+ *   page, so we keep the sp interface not changed, for example,
+ *   bio_for_each_segment() still returns bvec with single page
+ *
+ * - bio_for_each_segment*() will be changed to return single-page
+ *   bvec too
+ *
+ * - during iterating, iterator variable(struct bvec_iter) is always
+ *   updated in multipage bvec style and that means bvec_iter_advance()
+ *   is kept not changed
+ *
+ * - returned(copied) single-page bvec is built in flight by bvec
+ *   helpers from the stored multipage bvec
+ *
+ * - In case that some components(such as iov_iter) need to support
+ *   multi-page bvec, we introduce new helpers(mp_bvec_iter_*) for
+ *   them.
+ */
 
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
@@ -50,16 +88,35 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)	(&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter)				\
+#define mp_bvec_iter_page(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)				\
+#define mp_bvec_iter_len(bvec, iter)				\
 	min((iter).bi_size,					\
 	    __bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)				\
+#define mp_bvec_iter_offset(bvec, iter)				\
 	(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define mp_bvec_iter_page_idx(bvec, iter)			\
+	(mp_bvec_iter_offset((bvec), (iter)) / PAGE_SIZE)
+
+/*
+ * <page, offset, len> of single-page(sp) segment.
+ *
+ * These helpers are for building sp bvec in flight.
+ */
+#define bvec_iter_offset(bvec, iter)				\
+	(mp_bvec_iter_offset((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len(bvec, iter)				\
+	min_t(unsigned, mp_bvec_iter_len((bvec), (iter)),	\
+	      (PAGE_SIZE - (bvec_iter_offset((bvec), (iter)))))
+
+#define bvec_iter_page(bvec, iter)				\
+	nth_page(mp_bvec_iter_page((bvec), (iter)),		\
+		 mp_bvec_iter_page_idx((bvec), (iter)))
+
 #define bvec_iter_bvec(bvec, iter)				\
 ((struct bio_vec) {						\
 	.bv_page	= bvec_iter_page((bvec), (iter)),	\
-- 
2.9.5