Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
On Fri, Jan 13, 2017 at 06:16:02AM +, Dexuan Cui wrote: > IMO this means not only SCSI Unmap command is affected, but > some other SCSI commands can be affected too? > And it looks the bare metal can be affected too? This affects all drivers looking at the sdb.length field for the total I/O length - many drivers don't need it but just the SGL, including both that I tested d this change on - one being virtualized and one bare metal. It also only affects commands where the data transfer length is different from the length of the written blocks, so only affects WRITE SAME and UNMAP commands, used for discard or zeroing. I'll submit a cleaned up version with a proper block layer helper today. Thanks for reporting and debugging this issue! -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
> From: Dexuan Cui > Sent: Friday, January 13, 2017 11:05 > To: 'Christoph Hellwig'> Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris > Valean (Cloudbase Solutions SRL) > Subject: RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve > handling of the magic discard payload" > > > From: Christoph Hellwig [mailto:h...@lst.de] > > Sent: Friday, January 13, 2017 02:19 > > To: Dexuan Cui > > Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris > > Valean (Cloudbase Solutions SRL) > > Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: > improve > > handling of the magic discard payload" > > > > Next try: (I've also dropped most of the Cc list) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index c35b6de..2f358f7 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, > > struct scsi_data_buffer *sdb) > > count = blk_rq_map_sg(req->q, req, sdb->table.sgl); > > BUG_ON(count > sdb->table.nents); > > sdb->table.nents = count; > > - sdb->length = blk_rq_bytes(req); > > + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) > > + sdb->length = req->special_vec.bv_len; > > + else > > + sdb->length = blk_rq_bytes(req); > > return BLKPREP_OK; > > } > > Hi Christoph, > The patch works like a charm! > fstrim can work now. > Chris may help to do more test. > > FWIW: > If (req->rq_flags & RQF_SPECIAL_PAYLOAD) is true, > req->special_vec.bv_len is always 24 in my test. > > Thanks really a lot for your quick patch! :-) > > Can the patch make it into v4.10? > IMO It's a really important fix. > > Thanks, > -- Dexuan FYI: I did more tests and the patch worked just great! BTW, fstrim/mkfs are not the only affected tools: I put a WARN_ON before the new line and found python too (see the below calltrace). IMO this means not only SCSI Unmap command is affected, but some other SCSI commands can be affected too? And it looks the bare metal can be affected too? Thanks, -- Dexuan //Dexcuan: in this case: req->special_vec.bv_len is 24 and // blk_rq_bytes(req) is 4096. [ 17.862939] CPU: 2 PID: 1430 Comm: python3 Tainted: GW 4.10.0-rc3+ #1 [ 17.862940] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012 [ 17.862941] Call Trace: [ 17.862947] dump_stack+0x63/0x90 [ 17.862952] __warn+0xcb/0xf0 [ 17.862954] warn_slowpath_fmt+0x5f/0x80 [ 17.862955] scsi_init_sgtable+0x92/0xc0 [ 17.862956] scsi_init_io+0x4f/0x1e0 [ 17.862959] sd_init_command+0x55b/0xdb0 [ 17.862963] ? scsi_host_alloc_command+0x44/0xc0 [ 17.862965] scsi_setup_cmnd+0xf0/0x150 [ 17.862966] scsi_prep_fn+0xef/0x170 [ 17.862968] blk_peek_request+0x180/0x2b0 [ 17.862970] scsi_request_fn+0x3e/0x620 [ 17.862973] ? elv_rb_add+0x61/0x70 [ 17.862977] ? deadline_add_request+0x36/0x80 [ 17.862978] __blk_run_queue+0x33/0x40 [ 17.862979] blk_queue_bio+0x3c8/0x3e0 [ 17.862980] generic_make_request+0xf2/0x1d0 [ 17.862981] submit_bio+0x73/0x150 [ 17.862985] submit_bh_wbc+0x14c/0x180 [ 17.862987] ll_rw_block+0x78/0xb0 [ 17.862988] __block_write_begin_int+0x4d6/0x5c0 [ 17.863002] ? ext4_inode_attach_jinode.part.67+0xb0/0xb0 [ 17.863004] ? ext4_da_write_begin+0x122/0x400 [ 17.863006] __block_write_begin+0x11/0x20 [ 17.863007] ext4_da_write_begin+0x178/0x400 [ 17.863012] generic_perform_write+0xc9/0x1c0 [ 17.863015] ? file_update_time+0xc8/0x110 [ 17.863017] __generic_file_write_iter+0x1a6/0x1f0 [ 17.863020] ext4_file_write_iter+0x89/0x370 [ 17.863023] ? _copy_to_user+0x2e/0x40 [ 17.863026] ? cp_new_stat+0x153/0x180 [ 17.863030] __vfs_write+0xe3/0x160 [ 17.863031] vfs_write+0xb8/0x1b0 [ 17.863032] SyS_write+0x55/0xc0 [ 17.863036] entry_SYSCALL_64_fastpath+0x1e/0xad [ 17.863037] RIP: 0033:0x7f27314bf4bd -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
> From: Christoph Hellwig [mailto:h...@lst.de] > Sent: Friday, January 13, 2017 02:19 > To: Dexuan Cui> Cc: linux-block@vger.kernel.org; KY Srinivasan ; Chris > Valean (Cloudbase Solutions SRL) > Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve > handling of the magic discard payload" > > Next try: (I've also dropped most of the Cc list) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index c35b6de..2f358f7 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, > struct scsi_data_buffer *sdb) > count = blk_rq_map_sg(req->q, req, sdb->table.sgl); > BUG_ON(count > sdb->table.nents); > sdb->table.nents = count; > - sdb->length = blk_rq_bytes(req); > + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) > + sdb->length = req->special_vec.bv_len; > + else > + sdb->length = blk_rq_bytes(req); > return BLKPREP_OK; > } Hi Christoph, The patch works like a charm! fstrim can work now. Chris may help to do more test. FWIW: If (req->rq_flags & RQF_SPECIAL_PAYLOAD) is true, req->special_vec.bv_len is always 24 in my test. Thanks really a lot for your quick patch! :-) Can the patch make it into v4.10? IMO It's a really important fix. Thanks, -- Dexuan -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
Hi Christoph, Adding Nick and Alex to the thread. We'll give it a try along with Dexuan and update you with the results. Thank you! Chris Valean -Original Message- From: Christoph Hellwig [mailto:h...@lst.de] Sent: Thursday, January 12, 2017 8:19 PM To: Dexuan CuiCc: linux-block@vger.kernel.org; KY Srinivasan ; Chris Valean (Cloudbase Solutions SRL) Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload" Next try: (I've also dropped most of the Cc list) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c35b6de..2f358f7 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb) count = blk_rq_map_sg(req->q, req, sdb->table.sgl); BUG_ON(count > sdb->table.nents); sdb->table.nents = count; - sdb->length = blk_rq_bytes(req); + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) + sdb->length = req->special_vec.bv_len; + else + sdb->length = blk_rq_bytes(req); return BLKPREP_OK; } -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
Next try: (I've also dropped most of the Cc list) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c35b6de..2f358f7 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1018,7 +1018,10 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb) count = blk_rq_map_sg(req->q, req, sdb->table.sgl); BUG_ON(count > sdb->table.nents); sdb->table.nents = count; - sdb->length = blk_rq_bytes(req); + if (req->rq_flags & RQF_SPECIAL_PAYLOAD) + sdb->length = req->special_vec.bv_len; + else + sdb->length = blk_rq_bytes(req); return BLKPREP_OK; } -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
Can you check if this debug printk triggers for the discard commands? --- diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 888e16e..7ab7d08 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1031,6 +1031,10 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request, data_transfer_length = 0; } + if (cmd_request->payload->range.len != data_transfer_length) + printk_ratelimited("request len: %u, transfer len: %u\n", + cmd_request->payload->range.len, + data_transfer_length); scsi_set_resid(scmnd, cmd_request->payload->range.len - data_transfer_length); -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
> From: Christoph Hellwig [mailto:h...@lst.de] > Sent: Thursday, January 12, 2017 21:44 > To: Dexuan Cui> Cc: Christoph Hellwig ; linux-block@vger.kernel.org; Jens Axboe > ; Vitaly Kuznetsov ; linux- > ker...@vger.kernel.org; KY Srinivasan ; Chris Valean > (Cloudbase Solutions SRL) > Subject: Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve > handling of the magic discard payload" > > Hi Dexuan, > > sorry for dropping the ball on the previous private report, I hoped > I could get my hands on a Hyper-V VM and reproduce it myself, but > that has obviously not happened. > > Can you send me the output of the provisioning_mode file for the > scsi disk in question to get started? Hi Christoph, Thank you very much for the help! The file just shows "unmap": root@decui-u1604:~# cd /sys/class/scsi_disk/2\:0\:0\:0 root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# ls allow_restart cache_type device manage_start_stop max_write_same_blocks protection_mode provisioning_mode thin_provisioning app_tag_owndeferred_probe FUA max_medium_access_timeouts power protection_type subsystem uevent root@decui-u1604:/sys/class/scsi_disk/2:0:0:0# cat provisioning_mode unmap I'm ready to provide any info you need. :-) Thanks, -- Dexuan -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Regression] fstrim hangs on Hyper-V: caused by "block: improve handling of the magic discard payload"
Hi Dexuan, sorry for dropping the ball on the previous private report, I hoped I could get my hands on a Hyper-V VM and reproduce it myself, but that has obviously not happened. Can you send me the output of the provisioning_mode file for the scsi disk in question to get started? -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html