Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Christoph Hellwig
On Fri, Jan 13, 2017 at 06:16:02AM +, Dexuan Cui wrote:
> IMO this means not only SCSI Unmap command is affected, but
> some other SCSI commands can be affected too? 
> And it looks the bare metal can be affected too?

This affects all drivers looking at the sdb.length field for the
total I/O length - many drivers don't need it but just the SGL, including
both that I tested d this change on - one being virtualized and one
bare metal.

It also only affects commands where the data transfer length is different
from the length of the written blocks, so only affects WRITE SAME and
UNMAP commands, used for discard or zeroing.

I'll submit a cleaned up version with a proper block layer helper today.
Thanks for reporting and debugging this issue!
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Friday, January 13, 2017 11:05
> To: 'Christoph Hellwig' 
> Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris
> Valean (Cloudbase Solutions SRL) 
> Subject: RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve
> handling of the magic discard payload"
> 
> > From: Christoph Hellwig [mailto:h...@lst.de]
> > Sent: Friday, January 13, 2017 02:19
> > To: Dexuan Cui 
> > Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris
> > Valean (Cloudbase Solutions SRL) 
> > Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block:
> improve
> > handling of the magic discard payload"
> >
> > Next try:  (I've also dropped most of the Cc list)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index c35b6de..2f358f7 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req,
> > struct scsi_data_buffer *sdb)
> > count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
> > BUG_ON(count > sdb->table.nents);
> > sdb->table.nents = count;
> > -   sdb->length = blk_rq_bytes(req);
> > +   if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
> > +   sdb->length = req->special_vec.bv_len;
> > +   else
> > +   sdb->length = blk_rq_bytes(req);
> > return BLKPREP_OK;
> >  }
> 
> Hi Christoph,
> The patch works like a charm!
> fstrim can work now.
> Chris may help to do more test.
> 
> FWIW:
> If (req->rq_flags & RQF_SPECIAL_PAYLOAD) is true,
> req->special_vec.bv_len is always 24 in my test.
> 
> Thanks really a lot for your quick patch! :-)
> 
> Can the patch make it into v4.10?
> IMO It's a really important fix.
> 
> Thanks,
> -- Dexuan

FYI:  I did more tests and the patch worked just great!

BTW, fstrim/mkfs are not the only affected tools: I put a WARN_ON
before the new line and found python too (see the below calltrace).

IMO this means not only SCSI Unmap command is affected, but
some other SCSI commands can be affected too? 
And it looks the bare metal can be affected too?

Thanks,
-- Dexuan

//Dexcuan: in this case: req->special_vec.bv_len is 24 and 
// blk_rq_bytes(req) is 4096.

[   17.862939] CPU: 2 PID: 1430 Comm: python3 Tainted: GW   
4.10.0-rc3+ #1
[   17.862940] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS 090006  05/23/2012
[   17.862941] Call Trace:
[   17.862947]  dump_stack+0x63/0x90
[   17.862952]  __warn+0xcb/0xf0
[   17.862954]  warn_slowpath_fmt+0x5f/0x80
[   17.862955]  scsi_init_sgtable+0x92/0xc0
[   17.862956]  scsi_init_io+0x4f/0x1e0
[   17.862959]  sd_init_command+0x55b/0xdb0
[   17.862963]  ? scsi_host_alloc_command+0x44/0xc0
[   17.862965]  scsi_setup_cmnd+0xf0/0x150
[   17.862966]  scsi_prep_fn+0xef/0x170
[   17.862968]  blk_peek_request+0x180/0x2b0
[   17.862970]  scsi_request_fn+0x3e/0x620
[   17.862973]  ? elv_rb_add+0x61/0x70
[   17.862977]  ? deadline_add_request+0x36/0x80
[   17.862978]  __blk_run_queue+0x33/0x40
[   17.862979]  blk_queue_bio+0x3c8/0x3e0
[   17.862980]  generic_make_request+0xf2/0x1d0
[   17.862981]  submit_bio+0x73/0x150
[   17.862985]  submit_bh_wbc+0x14c/0x180
[   17.862987]  ll_rw_block+0x78/0xb0
[   17.862988]  __block_write_begin_int+0x4d6/0x5c0
[   17.863002]  ? ext4_inode_attach_jinode.part.67+0xb0/0xb0
[   17.863004]  ? ext4_da_write_begin+0x122/0x400
[   17.863006]  __block_write_begin+0x11/0x20
[   17.863007]  ext4_da_write_begin+0x178/0x400
[   17.863012]  generic_perform_write+0xc9/0x1c0
[   17.863015]  ? file_update_time+0xc8/0x110
[   17.863017]  __generic_file_write_iter+0x1a6/0x1f0
[   17.863020]  ext4_file_write_iter+0x89/0x370
[   17.863023]  ? _copy_to_user+0x2e/0x40
[   17.863026]  ? cp_new_stat+0x153/0x180
[   17.863030]  __vfs_write+0xe3/0x160
[   17.863031]  vfs_write+0xb8/0x1b0
[   17.863032]  SyS_write+0x55/0xc0
[   17.863036]  entry_SYSCALL_64_fastpath+0x1e/0xad
[   17.863037] RIP: 0033:0x7f27314bf4bd

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Dexuan Cui
> From: Christoph Hellwig [mailto:h...@lst.de]
> Sent: Friday, January 13, 2017 02:19
> To: Dexuan Cui 
> Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris
> Valean (Cloudbase Solutions SRL) 
> Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve
> handling of the magic discard payload"
> 
> Next try:  (I've also dropped most of the Cc list)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c35b6de..2f358f7 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req,
> struct scsi_data_buffer *sdb)
>   count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
>   BUG_ON(count > sdb->table.nents);
>   sdb->table.nents = count;
> - sdb->length = blk_rq_bytes(req);
> + if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
> + sdb->length = req->special_vec.bv_len;
> + else
> + sdb->length = blk_rq_bytes(req);
>   return BLKPREP_OK;
>  }

Hi Christoph,
The patch works like a charm!
fstrim can work now.
Chris may help to do more test.

FWIW:
If (req->rq_flags & RQF_SPECIAL_PAYLOAD) is true,
req->special_vec.bv_len is always 24 in my test.

Thanks really a lot for your quick patch! :-)

Can the patch make it into v4.10? 
IMO It's a really important fix.

Thanks,
-- Dexuan

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Chris Valean (Cloudbase Solutions SRL)
Hi Christoph,

Adding Nick and Alex to the thread.
We'll give it a try along with Dexuan and update you with the results.

Thank you!
Chris Valean

-Original Message-
From: Christoph Hellwig [mailto:h...@lst.de] 
Sent: Thursday, January 12, 2017 8:19 PM
To: Dexuan Cui 
Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris 
Valean (Cloudbase Solutions SRL) 
Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve 
handling of the magic discard payload"

Next try:  (I've also dropped most of the Cc list)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 
c35b6de..2f358f7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
BUG_ON(count > sdb->table.nents);
sdb->table.nents = count;
-   sdb->length = blk_rq_bytes(req);
+   if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
+   sdb->length = req->special_vec.bv_len;
+   else
+   sdb->length = blk_rq_bytes(req);
return BLKPREP_OK;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Christoph Hellwig
Next try:  (I've also dropped most of the Cc list)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c35b6de..2f358f7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, struct 
scsi_data_buffer *sdb)
count = blk_rq_map_sg(req->q, req, sdb->table.sgl);
BUG_ON(count > sdb->table.nents);
sdb->table.nents = count;
-   sdb->length = blk_rq_bytes(req);
+   if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
+   sdb->length = req->special_vec.bv_len;
+   else
+   sdb->length = blk_rq_bytes(req);
return BLKPREP_OK;
 }

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Christoph Hellwig
Can you check if this debug printk triggers for the discard commands?

---
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 888e16e..7ab7d08 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1031,6 +1031,10 @@ static void storvsc_command_completion(struct 
storvsc_cmd_request *cmd_request,
data_transfer_length = 0;
}
 
+   if (cmd_request->payload->range.len != data_transfer_length)
+   printk_ratelimited("request len: %u, transfer len: %u\n",
+   cmd_request->payload->range.len,
+   data_transfer_length);
scsi_set_resid(scmnd,
cmd_request->payload->range.len - data_transfer_length);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Dexuan Cui
> From: Christoph Hellwig [mailto:h...@lst.de]
> Sent: Thursday, January 12, 2017 21:44
> To: Dexuan Cui 
> Cc: Christoph Hellwig ; linux-block@vger.kernel.org; Jens Axboe
> ; Vitaly Kuznetsov ; linux-
> ker...@vger.kernel.org; KY Srinivasan ; Chris Valean
> (Cloudbase Solutions SRL) 
> Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve
> handling of the magic discard payload"
> 
> Hi Dexuan,
> 
> sorry for dropping the ball on the previous private report, I hoped
> I could get my hands on a Hyper-V VM and reproduce it myself, but
> that has obviously not happened.
> 
> Can you send me the output of the provisioning_mode file for the
> scsi disk in question to get started?

Hi Christoph,
Thank you very much for the help! 

The file just shows "unmap":

root@decui-u1604:~# cd /sys/class/scsi_disk/2\:0\:0\:0
root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# ls
allow_restart  cache_type  device  manage_start_stop   
max_write_same_blocks  protection_mode  provisioning_mode  thin_provisioning
app_tag_owndeferred_probe  FUA max_medium_access_timeouts  power
  protection_type  subsystem  uevent
root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# cat provisioning_mode
unmap

I'm ready to provide any info you need. :-)

Thanks,
-- Dexuan
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"

2017-01-12 Thread Christoph Hellwig
Hi Dexuan,

sorry for dropping the ball on the previous private report, I hoped
I could get my hands on a Hyper-V VM and reproduce it myself, but
that has obviously not happened.

Can you send me the output of the provisioning_mode file for the
scsi disk in question to get started?
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html