Hello Jens, Christoph.
Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
as in a VM.
I use a complex disk layout involving MD, LUKS and LVM, and managed to get these
warnings from the VM:
===
…SNIP…
===
Deadlock with CPU hotplug?
On sobota 29. července 2017 17:27:41 CEST Oleksandr Natalenko wrote:
> Hello Jens, Christoph.
>
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch
> applied blk-mq breaks suspend to RAM for me. It is reproducible on my
> laptop as well as in a VM.
Hello Mike et al.
On neděle 30. července 2017 7:12:31 CEST Mike Galbraith wrote:
> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> stable fixed it.
My build already includes v4.12.4.
> If not, I'd find these two commits irresistible.
>
> 5f042e7cbd9eb blk-mq: Include all present CPUs in the default queue mapping
warning if the user specifies the "elevator" boot param.
Removing this option altogether might be considered in the future.
Signed-off-by: Oleksandr Natalenko <oleksa...@redhat.com>
---
block/elevator.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/elevator.c b/block/elevator.c
Greg,
this is 765e40b675a9566459ddcb8358ad16f3b8344bbe.
On úterý 8. srpna 2017 18:43:33 CEST Greg KH wrote:
> On Tue, Aug 08, 2017 at 06:36:01PM +0200, Oleksandr Natalenko wrote:
> > Could you queue "block: disable runtime-pm for blk-mq" too please? It is
> > also
2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote:
> >> Hello Mike et al.
> >>
> >> On neděle 30. července 2017 7:12:31 CEST Mike Galbraith wrote:
> >>> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if
> >>> stable fixed it.
Also
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
for the whole v8 series.
On úterý 3. října 2017 16:03:58 CEST Ming Lei wrote:
> Hi Jens,
>
> Please consider this patchset for V4.15, and it fixes one
> kind of long-term I/O hang issue in either block legacy path
> or blk-mq.
Hi.
By any chance, could this be backported to 4.14? I'm confused by "SCSI:
allow to pass null rq to scsi_prep_state_check()" since it uses the refactored
flags.
===
if (req && !(req->rq_flags & RQF_PREEMPT))
===
Is it safe to revert to REQ_PREEMPT here, or should rq_flags also be replaced?
Jens,
any thoughts about this?
On Mon, Aug 14, 2017 at 11:27 AM, Oleksandr Natalenko
<oleksa...@redhat.com> wrote:
> Setting I/O scheduler via kernel command line is not flexible enough
> anymore. Different schedulers might be desirable for different types
> of devices
Hello.
> Recently it was reported on the block layer mailing list that suspend
> does not work reliably for either the legacy block layer or blk-mq.
Since I was one of the reporters, please consider adding me to CC next time, as
I'm interested in resolving this issue properly and in testing each
Feel free to add:
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
since I'm running this on 4 machines without issues.
> Hi Jens,
>
> Ping...
For v4 with regard to suspend/resume:
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
On pondělí 11. září 2017 13:10:11 CEST Ming Lei wrote:
> Hi,
>
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
>
> Once SCSI device is put into QUIESCE,
Hello.
I can confirm that this patchset applied on top of v4.14-rc3 also (like Ming's
one) fixes the issue with suspend/resume+RAID10, reported by me previously.
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
On úterý 3. října 2017 0:52:10 CEST Bart Van Assche wrote:
Hey.
I can confirm that v6 of your patchset still works well for me. Tested on
v4.13 kernel.
Thanks.
On středa 27. září 2017 10:52:41 CEST Ming Lei wrote:
> On Wed, Sep 27, 2017 at 04:27:51PM +0800, Ming Lei wrote:
> > On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote:
Hi.
v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I have
blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same
result. Soft lockup happens showing stacktraces I'm pasting below.
The stacktrace shows that I/O hangs in md_super_wait(), meaning it waits
na 2017 13:45:43 CEST Oleksandr Natalenko wrote:
> Hi.
>
> v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I have
> blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same
> result. Soft lockup happens showing stacktraces I'm pasting below.
>
Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
On sobota 26. srpna 2017 12:37:29 CEST Oleksandr Natalenko wrote:
> Hi.
>
> I've re-checked this issue with 4.12.9, and it is still there.
>
> Also, I've managed to reproduce it in a VM with non-virtio disks (just
[ 63.668704] #1: ((>work)){+.+.+.}, at: []
process_one_work+0x1b3/0x620
[ 63.674307] #2: (>mutex){..}, at: []
device_resume+0x89/0x1e0
[ 63.678492] 2 locks held by kworker/u8:24/2619:
[ 63.680737] #0: ("events_unbound"){.+.+.+}, at: []
process_one_work+0x1b3/0x620
[ 63.685153] #1: (
wrote:
> On Sat, Aug 26, 2017 at 12:48:01PM +0200, Oleksandr Natalenko wrote:
> > Quick update: reproduced on both v4.12.7 and v4.13.0-rc6.
>
> BTW, given it hangs during resume, it isn't easy to collect debug
> info, and there should have been lots of useful info there.
>
Hi.
On pondělí 28. srpna 2017 14:58:28 CEST Ming Lei wrote:
> Could you verify if the following patch fixes your issue?
> …SNIP…
I've applied it to v4.12.9 and rechecked — the issue is still there,
unfortunately. Stacktrace is the same as before.
Were you able to reproduce it in a VM?
With regard to suspend/resume cycle:
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
On pátek 1. září 2017 20:49:49 CEST Ming Lei wrote:
> Hi,
>
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
>
> Once SCSI device is put into QUIESCE,
Again,
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
On sobota 2. září 2017 15:08:32 CEST Ming Lei wrote:
> Hi,
>
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
>
> Once SCSI device is put into QUIESCE, no new request except
at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> >
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> >
Since I'm in CC, does this series aim to replace 2 patches I've tested before:
blk-mq: add requests in the tail of hctx->dispatch
blk-mq: align to legacy's implementation of blk_execute_rq
?
On čtvrtek 31. srpna 2017 19:27:19 CEST Ming Lei wrote:
> The current SCSI quiesce isn't safe and easy
Hi Ming, Jens.
What is the fate of this patchset please?
03.10.2017 16:03, Ming Lei wrote:
Hi Jens,
Please consider this patchset for V4.15, and it fixes one
kind of long-term I/O hang issue in either block legacy path
or blk-mq.
The current SCSI quiesce isn't safe and easy to trigger I/O
Then,
Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
On čtvrtek 9. listopadu 2017 17:55:58 CET Jens Axboe wrote:
> On 11/09/2017 09:54 AM, Bart Van Assche wrote:
> > On Thu, 2017-11-09 at 07:16 +0100
Bart,
is this something known to you, or is it just my fault in applying this series to
v4.13? Apart from this warning, suspend/resume works for me:
===
[ 27.383846] sd 0:0:0:0: [sda] Starting disk
[ 27.383976] sd 1:0:0:0: [sdb] Starting disk
[ 27.451218] sdb: Attempt to allocate
Well,
I've cherry-picked this series onto the current upstream/master branch, and got
this while performing another suspend attempt:
===
[ 62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing
to freeze, wq_busy=0):
[ 62.421150] xfsaild/dm-7    D    0   289      2 0x8000
No harm is observed on either the test VM with the smartctl hammer or my laptop.
So,
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Thanks.
--
Oleksandr Natalenko (post-factum)
d state.
> + */
> + unsigned long service_from_wr;
>
> /*
>* Value of wr start time when switching to soft rt
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index 4456eda34e48..4498c43245e2 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
and visible smoke:
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
for both of them.
Many thanks, Paolo!
Hi.
30.01.2018 09:19, Ming Lei wrote:
Hi,
We knew there is IO hang issue on BFQ over USB-storage wrt. blk-mq, and
last time I found it is inside BFQ. You can try the debug patch in the
following link[1] to see if it is same with the previous report[1][2]:
[1]
Hi, Paolo, Ivan, Ming et al.
It looks like I've just encountered the issue Ivan has already described
in [1]. Since I'm able to reproduce it reliably in a VM, I'd like to
draw more attention to it.
First, I'm using v4.15 kernel with all pending BFQ fixes:
===
2ad909a300c4 bfq-iosched: don't
Hi.
06.02.2018 15:50, Paolo Valente wrote:
Could you please do a
gdb /block/bfq-iosched.o # or vmlinux.o if bfq is builtin
list *(bfq_finish_requeue_request+0x54)
list *(bfq_put_queue+0x10b)
for me?
Fresh crashes and gdb output are given below. A side note: it is harder
to trigger things on
Hi.
06.02.2018 12:57, Mike Galbraith wrote:
Not me. Box seems to be fairly sure that it is bfq. Twice again box
went belly up on me in fairly short order with bfq, but seemed fine
with deadline. I'm currently running deadline again, and box again
seems solid, though I won't say it _is_ solid
06.02.2018 15:50, Paolo Valente wrote:
Could you please do a
gdb /block/bfq-iosched.o # or vmlinux.o if bfq is builtin
list *(bfq_finish_requeue_request+0x54)
list *(bfq_put_queue+0x10b)
for me?
Yes. Just give me some time to recompile the kernel with minimal debug
info enabled. I'll post
Hi.
On pátek 9. února 2018 18:29:39 CET Mike Galbraith wrote:
> On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
> > In addition to this I think it should be worth considering CC'ing Greg
> > to pull this fix into 4.15 stable tree.
>
> This isn't one he can cherry-pick
properly implementing the hook requeue_request in BFQ.
Thanks, applied.
Hi Jens,
I forgot to add
Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name>
in the patch.
Is it still possible to add it?
In addition to this I think it should be worth considering CC'ing Greg
to pull this fix
-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Thank you.
Hi.
06.02.2018 08:56, Mike Galbraith wrote:
I was doing kbuilds, and it blew up on me twice. Switching back to cfq
seemed to confirm it was indeed the patch causing trouble, but that's
by no means a certainty.
Just to note, I was using v4.15.1, not the latest git HEAD. Are you able
to
Hi.
06.02.2018 14:46, Mike Galbraith wrote:
Sorry for the noise, but just to make it clear, are we talking about
"deadline" or "mq-deadline" now?
mq-deadline.
Okay, I've spent a little more time stressing the VM with BFQ +
this patch enabled, and managed to get it to crash relatively
Hi.
07.02.2018 12:27, Paolo Valente wrote:
Hi Oleksandr, Holger,
before I prepare a V2 candidate patch, could you please test my
instrumentation patch too, with the above change made. For your
convenience, I have attached a compressed archive with both the
instrumentation patch and a patch
Hi.
Cc'ing linux-block people (mainly, Christoph) too because of 17cb960f29c2.
Also, duplicating the initial statement for them.
With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy whitelist
warning and/or bug while running smartctl on a SATA disk with blk-mq and BFQ
enabled.
Hi.
09.04.2018 11:35, Christoph Hellwig wrote:
I really can't make sense of that report.
Sorry, I have nothing to add there so far, I just see the symptom of
something going wrong in the ioctl code path that is invoked by
smartctl, but I have no idea what's the minimal environment to
Hi.
10.04.2018 08:35, Oleksandr Natalenko wrote:
- does it reproduce _without_ hardened usercopy? (I would assume yes,
but you'd just not get any warning until the hangs started.) If it
does reproduce without hardened usercopy, then a new bisect run could
narrow the search even more.
Looks
Hi, Kees, Paolo et al.
10.04.2018 08:53, Kees Cook wrote:
Unfortunately I only had a single hang with no dumps. I haven't been
able to reproduce it since. :(
For your convenience I've prepared a VM that contains a reproducer.
It consists of 3 disk images (sda.img is for the system, it is
Hi.
On čtvrtek 12. dubna 2018 20:44:37 CEST Kees Cook wrote:
> My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire
> up timeouts"), which seems insane given that null_blk isn't even built
> in the .config. I managed to get the testing automated now for a "git
> bisect run ...",
Hi.
(fancy details for linux-block and BFQ people go below)
09.04.2018 20:32, Kees Cook wrote:
Ah, this detail I didn't have. I've changed my environment to
build with:
CONFIG_BLK_MQ_PCI=y
CONFIG_BLK_MQ_VIRTIO=y
CONFIG_IOSCHED_BFQ=y
boot with scsi_mod.use_blk_mq=1
and select BFQ in the
Hi.
09.04.2018 22:30, Kees Cook wrote:
echo 1 | tee /sys/block/sd*/queue/nr_requests
I can't get this below "4".
Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is
enforced explicitly in queue_requests_store(). It is the same for me.
echo 1 | tee
Hi.
17.04.2018 05:12, Kees Cook wrote:
Turning off HARDENED_USERCOPY and turning on KASAN, I see the same
report:
[ 38.274106] BUG: KASAN: slab-out-of-bounds in
_copy_to_user+0x42/0x60
[ 38.274841] Read of size 22 at addr 8800122b8c4b by task
smartctl/1064
[ 38.275630]
To avoid confusion, I should note that I've removed the
reproducer from my server, but I can re-upload it if needed.
--
Oleksandr Natalenko (post-factum)
Hi.
17.04.2018 23:47, Kees Cook wrote:
I sent the patch anyway, since it's kind of a robustness improvement,
I'd hope. If you fix BFQ also, please add:
Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name>
Root-caused-by: Kees Cook <keesc...@chromium.org>
:) I gotta task-swi
Cc: Pavel Machek
Cc: Paolo Valente
Cc: Jens Axboe
Cc: Ulf Hansson
Cc: Richard Weinberger
Cc: Adrian Hunter
Cc: Bart Van Assche
Cc: Jan Kara
Cc: Artem Bityutskiy
Cc: Christoph Hellwig
Cc: Alan Cox
Cc: Mark Brown
Cc: Damien Le Moal
Cc: Johannes Thumshirn
Cc: Oleksandr Natalenko
Cc: Jonat
537.646 540.609
1r-raw_seq 500.733 502.526
Throughput-wise, BFQ is on-par with mq-deadline. Latency-wise, BFQ is
much-much better.
--
Oleksandr Natalenko (post-factum)
struct/table now + policy search
helper?
Let's do it when it happens. Premature optimization is the root
of all evil ;)
I'd say this is a matter of code readability, not optimisation. I do
not strongly object to the current approach, though.
--
Oleksandr Natalenko (post-factum)