Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere

2018-04-15 Thread Alan Jenkins

On 14/04/18 20:52, Jens Axboe wrote:

On 4/14/18 1:46 PM, Alan Jenkins wrote:

On 13/04/18 09:31, Johannes Thumshirn wrote:

Hi Alan,

On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:

# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd  # stop after 5 seconds

Can you please also add a regression test to blktests[1] for this?

[1] https://github.com/osandov/blktests

Thanks,
Johannes

Good question. It would be nice to promote this test.

Template looks like I need the commit (sha1) first.

I had some ideas about automating it, so I wrote a standalone (see
end).  I can automate the wakeup by using pm_test, but this is still a
system suspend test.  Unfortunately I don't think there's any
alternative. To give the most dire example

  # This test is non-destructive, but it exercises suspend in all drivers.
  # If your system has a problem with suspend, it might not wake up again.


So I'm not sure if it would be acceptable for the default set?

How useful is this going to be? Is there an expanded/full set of tests
that gets run somewhere?

If you can't guarantee it's going to be run somewhere, I'd worry the
cost/benefit  feels a little narrow :-(. There were one or two further
"interesting" details, and it might theoretically bitrot if it's not run
periodically.

I run it, just last week we found two new bugs with it. I'm requiring
anyone that submits block patches to run the test suite, and also
working towards having it be part of the 0-day runs so it gets run
on posted patches automatically.

So yes, it's useful and it won't bitrot. Please do turn it into a blktests
test.


Thanks, it's really great to have a test suite. I was specifically 
checking in on how we can include a system suspend test.


I've been thinking the suspend test could be opt-in test (e.g. 
ALLOW_PM_TEST=1), and then we have some infrastructure (you or 0-day 
runs) that does the opt-in.  Without knowing anything about the 
infrastructure, I didn't want to assume that would work.


I'm aware of one particular suspend issue; inside virt-manager VMs I see 
Linux crashing with a null pointer in qxl_drm_freeze.  A regression soon 
after I learned how to use VMs for suspend tests :-( , and it's been 
long enough that I suspect few people use it.


Partly what you saw me fishing for in the comments, is the idea of some 
kernel code allowing more direct testing of the queue freeze / 
preempt_only flag.  That might be better engineering, but I don't know 
where I could put it.


Alan


Re: [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Jens Axboe
On 4/12/18 12:11 PM, Alan Jenkins wrote:
> When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
> 
> The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
> ("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
> device is resumed asynchronously, i.e. after un-freezing userspace tasks.
> 
> So that commit exposed the bug as a regression in v4.15.  A mysterious
> SIGBUS (or -EIO) sometimes happened during the time the device was being
> resumed.  Most frequently, there was no kernel log message, and we saw Xorg
> or Xwayland killed by SIGBUS.[1]
> 
> [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
> 
> Without this fix, I get an IO error in this test:
> 
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds
> 
> The interruptible wait was added to blk_queue_enter in
> commit 3ef28e83ab15 ("block: generic request_queue reference counting").
> Before then, the interruptible wait was only in blk-mq, but I don't think
> it could ever have been correct.

Applied, thanks.

Still want that test in blktests, though!

-- 
Jens Axboe



Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Jens Axboe
On 4/14/18 1:46 PM, Alan Jenkins wrote:
> On 13/04/18 09:31, Johannes Thumshirn wrote:
>> Hi Alan,
>>
>> On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
>>> # dd if=/dev/sda of=/dev/null iflag=direct & \
>>>while killall -SIGUSR1 dd; do sleep 0.1; done & \
>>>echo mem > /sys/power/state ; \
>>>sleep 5; killall dd  # stop after 5 seconds
>> Can you please also add a regression test to blktests[1] for this?
>>
>> [1] https://github.com/osandov/blktests
>>
>> Thanks,
>>  Johannes
> 
> Good question. It would be nice to promote this test.
> 
> Template looks like I need the commit (sha1) first.
> 
> I had some ideas about automating it, so I wrote a standalone (see 
> end).  I can automate the wakeup by using pm_test, but this is still a 
> system suspend test.  Unfortunately I don't think there's any 
> alternative. To give the most dire example
> 
>  # This test is non-destructive, but it exercises suspend in all drivers.
>  # If your system has a problem with suspend, it might not wake up again.
> 
> 
> So I'm not sure if it would be acceptable for the default set?
> 
> How useful is this going to be? Is there an expanded/full set of tests 
> that gets run somewhere?
> 
> If you can't guarantee it's going to be run somewhere, I'd worry the 
> cost/benefit  feels a little narrow :-(. There were one or two further 
> "interesting" details, and it might theoretically bitrot if it's not run 
> periodically.

I run it, just last week we found two new bugs with it. I'm requiring
anyone that submits block patches to run the test suite, and also
working towards having it be part of the 0-day runs so it gets run
on posted patches automatically.

So yes, it's useful and it won't bitrot. Please do turn it into a blktests
test.

-- 
Jens Axboe



Re: blktest for [PATCH v2] block: do not use interruptible wait anywhere

2018-04-14 Thread Alan Jenkins

On 13/04/18 09:31, Johannes Thumshirn wrote:

Hi Alan,

On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:

# dd if=/dev/sda of=/dev/null iflag=direct & \
   while killall -SIGUSR1 dd; do sleep 0.1; done & \
   echo mem > /sys/power/state ; \
   sleep 5; killall dd  # stop after 5 seconds

Can you please also add a regression test to blktests[1] for this?

[1] https://github.com/osandov/blktests

Thanks,
Johannes


Good question. It would be nice to promote this test.

Template looks like I need the commit (sha1) first.

I had some ideas about automating it, so I wrote a standalone (see 
end).  I can automate the wakeup by using pm_test, but this is still a 
system suspend test.  Unfortunately I don't think there's any 
alternative. To give the most dire example


# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.


So I'm not sure if it would be acceptable for the default set?

How useful is this going to be? Is there an expanded/full set of tests 
that gets run somewhere?


If you can't guarantee it's going to be run somewhere, I'd worry the 
cost/benefit  feels a little narrow :-(. There were one or two further 
"interesting" details, and it might theoretically bitrot if it's not run 
periodically.


If you look at the diff and title for the fix, I don't think it's at 
high risk of being reversed unintentionally.


And I think you can trust users will notice if the fix gets merged away 
accidentally, before it hits -stable releases :-). The issue kills the 
entire GUI session on resume from suspend, say once every three days, on 
gnome-shell (due to Xwayland). One unfortunate user switched to Xorg 
only to find that was also affected.  I honestly assume the issue 
applies generally to laptop systems.  The only mitigating factor is if 
you have RAM to spare, so you don't hit the major pagefaults during resume.


#!/bin/bash

# This test is non-destructive, but it exercises suspend in all drivers.
# If your system has a problem with suspend, it might not wake up again.

# TEST_DEV must be SCSI (inc. libata).
#
# Additionally, this test will abort if $TEST_DEV is too tiny
# and we finish reading it within 3 seconds.  Sorry.
TEST_DEV=sda

# RATIONALE
#
# The original root cause issue was the behaviour around blk_queue_freeze().
# It put tasks into an interruptible wait, which is wrong for block devices.
#
# XXX Insert reference to fix commit XXX
#
# The freeze feature is not directly exposed to userspace, so I can not test
# it directly :(.  (It's used to "guarantee no request is in use, so we can
# change any data structure of the queue afterward".  I.e. freeze, modify the
# queue structure, unfreeze).
#
# However, this lead to a regression with a decent reproducer.  In v4.15 the
# same interruptible wait was also used for SCSI suspend/resume.  SCSI resume
# can take a second or so... hence we like to do it asynchronously.  This
# means we can observe the wait at resume time, and we can test if it is
# interruptible.
#
# Note `echo quiesce > /sys/class/scsi_device/*/device/state` can *not*
# trigger the specific wait in the block layer.  That code path only
# sets the SCSI device state; it does not set any block device state.
# (It does not call into blk_queue_freeze() or blk_set_preempt_only();
#  it literally just sets sdev->sdev_state to SDEV_QUIESCE).

set -o nounset

abort() {
echo "$*"
echo "=== Test ERROR ==="
exit 2
}

SYSFS_PM_TEST_DELAY=/sys/module/suspend/parameters/pm_test_delay
SAVED_PM_TEST_DELAY=

# Child process IDs
DD=
SUBSHELL=

cleanup() {
# In many cases the subshell will already have exited...
# and semantics for `wait` are crappy in shell.
# Failure will be harmless in most cases.
# Just try to provide enough context for the user to guess.

echo "Cleaning up"

if [ -n "$SUBSHELL" ]; then

echo "Killing sub-shell PID $SUBSHELL..."
kill $SUBSHELL
wait $SUBSHELL
fi
if [ -n "$DD" ]; then
echo "Killing 'dd' PID $DD..."
kill $DD
wait $DD
fi

echo "Resetting pm_test"
echo none > /sys/power/pm_test

echo "Resetting pm_test_delay"

if [ -n "$SAVED_PM_TEST_DELAY" ]; then
echo "$SAVED_PM_TEST_DELAY" > "$SYSFS_PM_TEST_DELAY"
fi
}
trap cleanup EXIT

# "If a user has disabled async probing a likely reason
#  is due to a storage enclosure that does not inject
#  staggered spin-ups. For safety, make resume
#  synchronous as well in that case."
if ! SCAN="$(cat /sys/module/scsi_mod/parameters/scan)"; then
abort "error reading '/sys/module/scsi_mod/parameters/scan' ?"
fi
if [ "$SCAN" != "async" ]; then
abort "This test does not work if you have set 'scsi_mod.scan=sync'"
fi

# Ignore USR1, in the hope that this applies to child processes.
# This allows us to safely `kill -USR1 $DD`, when we don't know
# whether the child process has fully started yet.

Re: [PATCH v2] block: do not use interruptible wait anywhere

2018-04-13 Thread Johannes Thumshirn
Hi Alan,

On Thu, 2018-04-12 at 19:11 +0100, Alan Jenkins wrote:
> # dd if=/dev/sda of=/dev/null iflag=direct & \
>   while killall -SIGUSR1 dd; do sleep 0.1; done & \
>   echo mem > /sys/power/state ; \
>   sleep 5; killall dd  # stop after 5 seconds

Can you please also add a regression test to blktests[1] for this?

[1] https://github.com/osandov/blktests

Thanks,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH v2] block: do not use interruptible wait anywhere

2018-04-12 Thread Alan Jenkins
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

# dd if=/dev/sda of=/dev/null iflag=direct & \
  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche 
Cc: sta...@vger.kernel.org
Signed-off-by: Alan Jenkins 
---
v2: fix indentation

 block/blk-core.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index abcb8684ba67..1a762f3980f2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -915,7 +915,6 @@ int blk_queue_enter(struct request_queue *q, 
blk_mq_req_flags_t flags)
 
while (true) {
bool success = false;
-   int ret;
 
rcu_read_lock();
if (percpu_ref_tryget_live(>q_usage_counter)) {
@@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, 
blk_mq_req_flags_t flags)
 */
smp_rmb();
 
-   ret = wait_event_interruptible(q->mq_freeze_wq,
-   (atomic_read(>mq_freeze_depth) == 0 &&
-(preempt || !blk_queue_preempt_only(q))) ||
-   blk_queue_dying(q));
+   wait_event(q->mq_freeze_wq,
+  (atomic_read(>mq_freeze_depth) == 0 &&
+   (preempt || !blk_queue_preempt_only(q))) ||
+  blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
-   if (ret)
-   return ret;
}
 }
 
-- 
2.14.3