Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-24 Thread Omar Sandoval
On Mon, Apr 23, 2018 at 02:25:03PM +0200, Steffen Maier wrote:
> 
> On 04/19/2018 10:18 PM, Omar Sandoval wrote:
> > On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
> >> On 4/19/18 1:41 PM, Bart Van Assche wrote:
> >>> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
>  On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> > Thanks for the test! Applied.
> 
>  Side note, it's unfortunate that this test takes 180 seconds to run only
>  because we have to wait for the command timeout. We should be able to
>  export request_queue->rq_timeout writeable in sysfs. Would you be
>  interested in doing that?
> >>>
> >>> Hello Omar,
> >>>
> >>> Is this perhaps what you are looking for?
> >>> # ls -l /sys/class/scsi_device/*/*/timeout
> >>> -rw-r--r-- 1 root root 4096 Apr 19 08:52 
> >>> /sys/class/scsi_device/2:0:0:0/device/timeout
> >>> -rw-r--r-- 1 root root 4096 Apr 19 12:39 
> >>> /sys/class/scsi_device/8:0:0:1/device/timeout
> >>
> >> We should have it generically available though, not just for SCSI. In
> >> retrospect, it should have been under queue/ from the start, now we'll
> >> end up with duplicate entries for SCSI.
> > 
> > For the sake of this test, I just decreased the timeout through SCSI.
> 
> Great idea.
> 
> > echo 5 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/timeout"
> 
> However, the timeout should be sufficiently larger than scsi_debug/delay,
> in order not to run into the command timeout.
> It may be unfortunate that scsi_debug/delay uses jiffies as unit and
> can thus differ in a range of an order of magnitude for different kernel 
> configs.
> 
> > # delay to reduce response repetition: around 1..10sec depending on HZ
> > echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
> 
> On s390, we typically have HZ=100, so 1000 jiffies are 10 seconds.

Good catch, I just switched this to use ndelay in nanoseconds instead of
delay.

> We can increase the sdev cmd timeout or decrease the scsi_debug/delay.
> 100 instead of 1000 for scsi_debug/delay worked for me;
> but for some reason the loop checking for busy did not work (any more?)
> causing an unexpected test case error:
> 
> > # ./check scsi/004
> > scsi/004 (ensure repeated TASK SET FULL results in EIO on timing out 
> > command) [failed]
> > runtime  31.892s  ...  31.720s
> > --- tests/scsi/004.out  2018-04-16 11:47:19.105931872 +0200
> > +++ results/nodev/scsi/004.out.bad  2018-04-23 14:07:33.615445253 
> > +0200
> > @@ -1,3 +1,3 @@
> >  Running scsi/004
> > -Input/output error
> > +modprobe: FATAL: Module scsi_debug is in use.
> >  Test complete
> 
> so I added another sleep hack:
> 
>  # dd closing SCSI disk causes implicit TUR also being delayed once
> +# sleep over time window where READ was done and TUR not yet queued
> +sleep 2
>  while grep -q -F "in_use_bm BUSY:" 
> "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
> 
> What do you think?

I've been hitting this on and off on all of the scsi-debug tests for
awhile, and I can't figure out where the lingering reference comes from.
I don't think it's related, but I'll look into it.


Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-23 Thread Steffen Maier

On 04/19/2018 10:18 PM, Omar Sandoval wrote:
> On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
>> On 4/19/18 1:41 PM, Bart Van Assche wrote:
>>> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
 On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> Thanks for the test! Applied.

 Side note, it's unfortunate that this test takes 180 seconds to run only
 because we have to wait for the command timeout. We should be able to
 export request_queue->rq_timeout writeable in sysfs. Would you be
 interested in doing that?
>>>
>>> Hello Omar,
>>>
>>> Is this perhaps what you are looking for?
>>> # ls -l /sys/class/scsi_device/*/*/timeout
>>> -rw-r--r-- 1 root root 4096 Apr 19 08:52 
>>> /sys/class/scsi_device/2:0:0:0/device/timeout
>>> -rw-r--r-- 1 root root 4096 Apr 19 12:39 
>>> /sys/class/scsi_device/8:0:0:1/device/timeout
>>
>> We should have it generically available though, not just for SCSI. In
>> retrospect, it should have been under queue/ from the start, now we'll
>> end up with duplicate entries for SCSI.
> 
> For the sake of this test, I just decreased the timeout through SCSI.

Great idea.

>   echo 5 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/timeout"

However, the timeout should be sufficiently larger than scsi_debug/delay,
in order not to run into the command timeout.
It may be unfortunate that scsi_debug/delay uses jiffies as unit and
can thus differ in a range of an order of magnitude for different kernel 
configs.

>   # delay to reduce response repetition: around 1..10sec depending on HZ
>   echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay

On s390, we typically have HZ=100, so 1000 jiffies are 10 seconds.

We can increase the sdev cmd timeout or decrease the scsi_debug/delay.
100 instead of 1000 for scsi_debug/delay worked for me;
but for some reason the loop checking for busy did not work (any more?)
causing an unexpected test case error:

> # ./check scsi/004
> scsi/004 (ensure repeated TASK SET FULL results in EIO on timing out command) 
> [failed]
> runtime  31.892s  ...  31.720s
> --- tests/scsi/004.out2018-04-16 11:47:19.105931872 +0200
> +++ results/nodev/scsi/004.out.bad2018-04-23 14:07:33.615445253 
> +0200
> @@ -1,3 +1,3 @@
>  Running scsi/004
> -Input/output error
> +modprobe: FATAL: Module scsi_debug is in use.
>  Test complete

so I added another sleep hack:

 # dd closing SCSI disk causes implicit TUR also being delayed once
+# sleep over time window where READ was done and TUR not yet queued
+sleep 2
 while grep -q -F "in_use_bm BUSY:" 
"/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do

What do you think?

-- 
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Omar Sandoval
On Thu, Apr 19, 2018 at 01:44:41PM -0600, Jens Axboe wrote:
> On 4/19/18 1:41 PM, Bart Van Assche wrote:
> > On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
> >> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> >>> Thanks for the test! Applied.
> >>
> >> Side note, it's unfortunate that this test takes 180 seconds to run only
> >> because we have to wait for the command timeout. We should be able to
> >> export request_queue->rq_timeout writeable in sysfs. Would you be
> >> interested in doing that?
> > 
> > Hello Omar,
> > 
> > Is this perhaps what you are looking for?
> > # ls -l /sys/class/scsi_device/*/*/timeout 
> > -rw-r--r-- 1 root root 4096 Apr 19 08:52 
> > /sys/class/scsi_device/2:0:0:0/device/timeout
> > -rw-r--r-- 1 root root 4096 Apr 19 12:39 
> > /sys/class/scsi_device/8:0:0:1/device/timeout
> 
> We should have it generically available though, not just for SCSI. In
> retrospect, it should have been under queue/ from the start, now we'll
> end up with duplicate entries for SCSI.

For the sake of this test, I just decreased the timeout through SCSI.


Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Jens Axboe
On 4/19/18 1:13 PM, Omar Sandoval wrote:
> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
>> Thanks for the test! Applied.
> 
> Side note, it's unfortunate that this test takes 180 seconds to run only
> because we have to wait for the command timeout. We should be able to
> export request_queue->rq_timeout writeable in sysfs. Would you be
> interested in doing that?

Here's one, I even tested it.


diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d00d1b0ec109..d5d628c3c12d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -496,6 +496,27 @@ static ssize_t queue_dax_show(struct request_queue *q, 
char *page)
return queue_var_show(blk_queue_dax(q), page);
 }
 
+static ssize_t queue_timeout_show(struct request_queue *q, char *page)
+{
+   return sprintf(page, "%u\n", q->rq_timeout / HZ);
+}
+
+static ssize_t queue_timeout_store(struct request_queue *q, const char *page,
+  size_t count)
+{
+   unsigned long timeout;
+   ssize_t ret;
+
+   ret = queue_var_store(, page, count);
+   if (ret < 0)
+   return ret;
+   if (!timeout)
+   return -EINVAL;
+
+   blk_queue_rq_timeout(q, timeout * HZ);
+   return count;
+}
+
 static struct queue_sysfs_entry queue_requests_entry = {
.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
.show = queue_requests_show,
@@ -678,6 +699,12 @@ static struct queue_sysfs_entry throtl_sample_time_entry = 
{
 };
 #endif
 
+static struct queue_sysfs_entry queue_timeout_entry = {
+   .attr = {.name = "timeout", .mode = S_IRUGO | S_IWUSR },
+   .show = queue_timeout_show,
+   .store = queue_timeout_store,
+};
+
 static struct attribute *default_attrs[] = {
_requests_entry.attr,
_ra_entry.attr,
@@ -714,6 +741,7 @@ static struct attribute *default_attrs[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
_sample_time_entry.attr,
 #endif
+   _timeout_entry.attr,
NULL,
 };
 

-- 
Jens Axboe



Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Jens Axboe
On 4/19/18 1:41 PM, Bart Van Assche wrote:
> On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
>> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
>>> Thanks for the test! Applied.
>>
>> Side note, it's unfortunate that this test takes 180 seconds to run only
>> because we have to wait for the command timeout. We should be able to
>> export request_queue->rq_timeout writeable in sysfs. Would you be
>> interested in doing that?
> 
> Hello Omar,
> 
> Is this perhaps what you are looking for?
> # ls -l /sys/class/scsi_device/*/*/timeout 
> -rw-r--r-- 1 root root 4096 Apr 19 08:52 
> /sys/class/scsi_device/2:0:0:0/device/timeout
> -rw-r--r-- 1 root root 4096 Apr 19 12:39 
> /sys/class/scsi_device/8:0:0:1/device/timeout

We should have it generically available though, not just for SCSI. In
retrospect, it should have been under queue/ from the start, now we'll
end up with duplicate entries for SCSI.

-- 
Jens Axboe



Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Bart Van Assche
On Thu, 2018-04-19 at 12:13 -0700, Omar Sandoval wrote:
> On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> > Thanks for the test! Applied.
> 
> Side note, it's unfortunate that this test takes 180 seconds to run only
> because we have to wait for the command timeout. We should be able to
> export request_queue->rq_timeout writeable in sysfs. Would you be
> interested in doing that?

Hello Omar,

Is this perhaps what you are looking for?
# ls -l /sys/class/scsi_device/*/*/timeout 
-rw-r--r-- 1 root root 4096 Apr 19 08:52 
/sys/class/scsi_device/2:0:0:0/device/timeout
-rw-r--r-- 1 root root 4096 Apr 19 12:39 
/sys/class/scsi_device/8:0:0:1/device/timeout

Thanks,

Bart.





Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Omar Sandoval
On Thu, Apr 19, 2018 at 11:53:30AM -0700, Omar Sandoval wrote:
> Thanks for the test! Applied.

Side note, it's unfortunate that this test takes 180 seconds to run only
because we have to wait for the command timeout. We should be able to
export request_queue->rq_timeout writeable in sysfs. Would you be
interested in doing that?


Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-19 Thread Omar Sandoval
On Tue, Apr 17, 2018 at 11:03:37AM +0200, Steffen Maier wrote:
> Signed-off-by: Steffen Maier 
> ---
>  tests/scsi/004 |   59 
> 
>  tests/scsi/004.out |3 ++
>  2 files changed, 62 insertions(+), 0 deletions(-)
>  create mode 100755 tests/scsi/004
>  create mode 100644 tests/scsi/004.out
> 
> diff --git a/tests/scsi/004 b/tests/scsi/004
> new file mode 100755
> index 000..4852efc
> --- /dev/null
> +++ b/tests/scsi/004
> @@ -0,0 +1,59 @@
> +#!/bin/bash
> +#
> +# Ensure repeated SAM_STAT_TASK_SET_FULL results in EIO on timing out 
> command.
> +#
> +# Regression test for commit cbe095e2b584 ("Revert "scsi: core: return
> +# BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"")
> +#
> +# Found independently of corresponding commit mail threads while
> +# experimenting with storage mirroring. This test is a storage-independent
> +# reproducer for the error case I ran into.
> +#
> +# Copyright IBM Corp. 2018
> +# Author: Steffen Maier 
> +#
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +
> +. common/scsi_debug
> +
> +DESCRIPTION="ensure repeated TASK SET FULL results in EIO on timing out 
> command"
> +
> +requires() {
> + _have_scsi_debug
> +}
> +
> +test() {
> + echo "Running ${TEST_NAME}"
> +
> + if ! _init_scsi_debug add_host=1 max_luns=1 statistics=1 every_nth=1; 
> then
> + return 1
> + fi
> + # every_nth RW with full queue gets SAM_STAT_TASK_SET_FULL
> + echo 0x800 > /sys/bus/pseudo/drivers/scsi_debug/opts
> + # delay to reduce response repetition: around 1..10sec depending on HZ
> + echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/delay
> + # a single command fills device queue to satisfy 0x800 opts condition
> + echo 1 > "/sys/block/${SCSI_DEBUG_DEVICES[0]}/device/queue_depth"
> + dd if="/dev/${SCSI_DEBUG_DEVICES[0]}" iflag=direct of=/dev/null bs=512 
> count=1 |& grep -o "Input/output error"
> + # stop injection
> + echo 0 > /sys/bus/pseudo/drivers/scsi_debug/opts
> + # dd closing SCSI disk causes implicit TUR also being delayed once
> + while grep -q -F "in_use_bm BUSY:" 
> "/proc/scsi/scsi_debug/${SCSI_DEBUG_HOSTS[0]}"; do
> + sleep 1
> + done
> + echo 1 > /sys/bus/pseudo/drivers/scsi_debug/delay
> + _exit_scsi_debug
> +
> + echo "Test complete"
> +}
> diff --git a/tests/scsi/004.out b/tests/scsi/004.out
> new file mode 100644
> index 000..b1126fb
> --- /dev/null
> +++ b/tests/scsi/004.out
> @@ -0,0 +1,3 @@
> +Running scsi/004
> +Input/output error
> +Test complete
> -- 
> 1.7.1
> 

Thanks for the test! Applied.


Re: [PATCH blktests] scsi/004: add regression test for false BLK_STS_OK with non good SAM status

2018-04-17 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850