Re: [PATCH 2/3] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-08-01 Thread Ming Lei
On Wed, Aug 01, 2018 at 04:54:02PM +0200, Christoph Hellwig wrote:
> We still hit __blk_recalc_rq_segments at least once per submission,
> don't we?  Either rather wait with this until we have multi-page bvecs.

No.

Please see blk_queue_split(), where the physical segment number is
always updated, and the flag of BIO_SEG_VALID is set meantime.

And for callers of blk_recount_segments(), it will check the VALID flag
first, and blk_recount_segments() is only run when this flag isn't set.

blk_recount_segments
bio_phys_segments
ll_new_hw_segment
ll_back_merge_fn
ll_front_merge_fn 
blk_rq_bio_prep
blk_rq_append_bio
blk_init_request_from_bio
bio_add_pc_page
ll_back_merge_fn
ll_front_merge_fn


Thanks,
Ming


Re: [PATCH] block: copy ioprio in __bio_clone_fast()

2018-08-01 Thread Jens Axboe
On 8/1/18 8:38 AM, Hannes Reinecke wrote:
> We need to copy the io priority, too; otherwise the clone will run
> with a different priority than the original one.

Looks good, but please make the patch against the block 4.19 branch.

-- 
Jens Axboe



Re: [PATCH 0/3] blk-iolatency related ref counting fixes

2018-08-01 Thread Jens Axboe
On 7/31/18 10:39 AM, Josef Bacik wrote:
> These are three ref counting issues we found in testing.  They are pretty
> straightforward, the first two don't really happen in practice but need to be
> fixed, but the last one fixes a real panic we saw pretty regularly in our 
> stress
> testing.  They've been running on our test boxes internally under stress tests
> and seem to have fixed the remaining issues we were seeing.  Thanks,

Applied for 4.19, thanks.

-- 
Jens Axboe



Re: [PATCH] block: really disable runtime-pm for blk-mq

2018-08-01 Thread Christoph Hellwig
On Mon, Jul 30, 2018 at 08:02:19PM +0800, Ming Lei wrote:
> Runtime PM isn't ready for blk-mq yet, and commit 765e40b675a9 ("block:
> disable runtime-pm for blk-mq") tried to disable it. Unfortunately,
> it can't take effect in that way since user space still can switch
> it on via 'echo auto > /sys/block/sdN/device/power/control'.
> 
> This patch disables runtime-pm for blk-mq really by pm_runtime_disable()
> and fixes all kinds of PM related kernel crash.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/3] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-08-01 Thread Christoph Hellwig
We still hit __blk_recalc_rq_segments at least once per submission,
don't we?  Either rather wait with this until we have multi-page bvecs.


Re: [PATCH 1/3] block: don't use bio->bi_vcnt to figure out segment number

2018-08-01 Thread Christoph Hellwig
On Tue, Jul 31, 2018 at 06:49:12PM +0800, Ming Lei wrote:
> It is wrong to use bio->bi_vcnt to figure out how many segments
> there are in the bio even though CLONED flag isn't set on this bio,
> because this bio may be splitted or advanced.
> 
> So always use bio_segments() in blk_recount_segments(), and it shouldn't
> cause any performance loss now because the physical segment number is figured
> out in blk_queue_split() and BIO_SEG_VALID is set meantime since
> bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH] block: copy ioprio in __bio_clone_fast()

2018-08-01 Thread Johannes Thumshirn
Good catch,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH] block: copy ioprio in __bio_clone_fast()

2018-08-01 Thread Hannes Reinecke
We need to copy the io priority, too; otherwise the clone will run
with a different priority than the original one.

Fixes: 43b62ce3ff0a ("block: move bio io prio to a new field")
Signed-off-by: Hannes Reinecke 
---
 block/bio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 8ecc95615941..88129f20623d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -608,6 +608,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
bio->bi_io_vec = bio_src->bi_io_vec;
+   bio->bi_ioprio = bio_src->bi_ioprio;
 
bio_clone_blkcg_association(bio, bio_src);
 }
@@ -692,6 +693,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
bio->bi_write_hint  = bio_src->bi_write_hint;
bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
+   bio->bi_ioprio  = bio_src->bi_ioprio;
 
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
-- 
2.12.3



Re: [PATCH] block: only rewind to last split

2018-08-01 Thread Jason Yan
Sorry, please ignore this because I saw Greg Edwards's patch ("block: 
reset bi_iter.bi_done after splitting bio") already fixed this.


On 2018/8/1 17:33, Jason Yan wrote:

Commit 63573e359d05 ("bio-integrity: Restore original iterator on verify
stage") made the bio-integrity verify function works fine when the bio
is smaller than the queue limit.

But since commit 54efd50bfd87 ("block: make generic_make_request handle
arbitrarily sized bios") the bio will split in block layer if the size
of bio is larger than the max length of the queue limit. This will make
the bio-integrity verify function do not work. We will see such error
log with splited bios:

[69163.386065] sda: ref tag error at location 0 (rcvd 1024)

Suppose we have a bio of size 1024k submitted and the driver can only
accept bio of max size 512k.

The original bio:

   bi_sector 1024k
  bio  +---+

After the split we have two bios:

   bi_sector  512k
  split+---+
bi_sector  512k
  bio  +---+---+

The new split bio is cloned from the first 512k of the original bio. And
the original bio is advanced to the next 512k(that is 512k~1024k). The
we submit these two requests to the driver.

After the these two bios are complete and we rewind the original bio in
bio_integrity_verify_fn(), the bio restored to the beginnig, but it does
not match the data we submitted to the driver:

   bi_sector 1024k
  bio  +---+

When the split bio is complete the verify works fine. But when the whole
original bio is complete we just rewind the whole bio, which makes the
seed does not match the ref tag at all. So we see the above error log.

Fix this by clear the bi_iter.bi_done after split, so that we will only
rewind to the last sector we last start to split.

Fixes: 63573e359d05 ("bio-integrity: Restore original iterator on verify stage")
Signed-off-by: Jason Yan 
Reported-by: Zhang Liao 
Tested-by: Li Chunjiang 
CC: Jens Axboe 
CC: Kent Overstreet 
CC: Christoph Hellwig 
CC: Martin K. Petersen 
CC: Miao Xie 
CC: Zhaohongjiang 
---
  block/bio.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 67eff5eddc49..afa4f408b9dc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1867,6 +1867,14 @@ struct bio *bio_split(struct bio *bio, int sectors,

bio_advance(bio, split->bi_iter.bi_size);

+   /*
+* Now bi_iter.bi_done is only used for bio-integrity to verify
+* the data come up from the driver. If we don't clear bi_done
+* here, the original bio will rewind to the beginning and the
+* seed is error. Please check bio_integrity_verify_fn().
+*/
+   bio->bi_iter.bi_done = 0;
+
if (bio_flagged(bio, BIO_TRACE_COMPLETION))
bio_set_flag(split, BIO_TRACE_COMPLETION);






[PATCH] block: only rewind to last split

2018-08-01 Thread Jason Yan
Commit 63573e359d05 ("bio-integrity: Restore original iterator on verify
stage") made the bio-integrity verify function works fine when the bio
is smaller than the queue limit.

But since commit 54efd50bfd87 ("block: make generic_make_request handle
arbitrarily sized bios") the bio will split in block layer if the size
of bio is larger than the max length of the queue limit. This will make
the bio-integrity verify function do not work. We will see such error
log with splited bios:

[69163.386065] sda: ref tag error at location 0 (rcvd 1024)

Suppose we have a bio of size 1024k submitted and the driver can only
accept bio of max size 512k.

The original bio:

  bi_sector 1024k
 bio  +---+

After the split we have two bios:

  bi_sector  512k
 split+---+
   bi_sector  512k
 bio  +---+---+

The new split bio is cloned from the first 512k of the original bio. And
the original bio is advanced to the next 512k(that is 512k~1024k). The
we submit these two requests to the driver.

After the these two bios are complete and we rewind the original bio in
bio_integrity_verify_fn(), the bio restored to the beginnig, but it does
not match the data we submitted to the driver:

  bi_sector 1024k
 bio  +---+

When the split bio is complete the verify works fine. But when the whole
original bio is complete we just rewind the whole bio, which makes the
seed does not match the ref tag at all. So we see the above error log.

Fix this by clear the bi_iter.bi_done after split, so that we will only
rewind to the last sector we last start to split.

Fixes: 63573e359d05 ("bio-integrity: Restore original iterator on verify stage")
Signed-off-by: Jason Yan 
Reported-by: Zhang Liao 
Tested-by: Li Chunjiang 
CC: Jens Axboe 
CC: Kent Overstreet 
CC: Christoph Hellwig 
CC: Martin K. Petersen 
CC: Miao Xie 
CC: Zhaohongjiang 
---
 block/bio.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 67eff5eddc49..afa4f408b9dc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1867,6 +1867,14 @@ struct bio *bio_split(struct bio *bio, int sectors,
 
bio_advance(bio, split->bi_iter.bi_size);
 
+   /*
+* Now bi_iter.bi_done is only used for bio-integrity to verify
+* the data come up from the driver. If we don't clear bi_done
+* here, the original bio will rewind to the beginning and the
+* seed is error. Please check bio_integrity_verify_fn().
+*/
+   bio->bi_iter.bi_done = 0;
+
if (bio_flagged(bio, BIO_TRACE_COMPLETION))
bio_set_flag(split, BIO_TRACE_COMPLETION);
 
-- 
2.13.6