blk-mq breaks suspend even with runtime PM patch

2017-07-29 Thread Oleksandr Natalenko
Hello Jens, Christoph. Unfortunately, even with the "block: disable runtime-pm for blk-mq" patch applied, blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well as in a VM. I use a complex disk layout involving MD, LUKS and LVM, and managed to get these warnings from the VM via

Re: blk-mq breaks suspend even with runtime PM patch

2017-07-29 Thread Oleksandr Natalenko
68.698624] [ 368.698990] = [ 368.698990] === Deadlock with CPU hotplug? On Saturday, 29 July 2017 17:27:41 CEST Oleksandr Natalenko wrote: > Hello Jens, Christoph. > > Unfortunately, even with "block: disable runtime-pm for blk-mq" patch > applied blk-mq breaks

Re: blk-mq breaks suspend even with runtime PM patch

2017-07-30 Thread Oleksandr Natalenko
Hello Mike et al. On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote: > FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if > stable fixed it. My build already includes v4.12.4. > If not, I'd find these two commits irresistible. > > 5f042e7cbd9eb blk-mq: Include all

[RFC] block: deprecate choosing elevator via boot param

2017-08-14 Thread Oleksandr Natalenko
warning if the user specifies the "elevator" boot param. Removing this option altogether might be considered in the future. Signed-off-by: Oleksandr Natalenko <oleksa...@redhat.com> --- block/elevator.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/elevator.c b/block/elevator.c in
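The diff body itself is cut off in the preview above. As an illustration only, a deprecation warning of this kind would typically sit in the elevator= boot-parameter handler in block/elevator.c, roughly like the sketch below (the pr_warn() wording and the assumption that the existing chosen_elevator handling stays untouched are guesses, not the actual three-line patch):
===
/* block/elevator.c -- hypothetical sketch of the RFC, not the real diff */
static int __init elevator_setup(char *str)
{
	pr_warn("Choosing the I/O scheduler via the \"elevator=\" kernel parameter is deprecated; set the scheduler per device via sysfs instead\n");
	strncpy(chosen_elevator, str, sizeof(chosen_elevator) - 1);
	return 1;
}
__setup("elevator=", elevator_setup);
===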

Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Oleksandr Natalenko
Greg, this is 765e40b675a9566459ddcb8358ad16f3b8344bbe. On Tuesday, 8 August 2017 18:43:33 CEST Greg KH wrote: > On Tue, Aug 08, 2017 at 06:36:01PM +0200, Oleksandr Natalenko wrote: > > Could you queue "block: disable runtime-pm for blk-mq" too please? It is > > also

Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Oleksandr Natalenko
2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote: > >> Hello Mike et al. > >> > >> On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote: > >>> FWIW, first thing I'd do is update that 4.12.0 to 4.12.4, and see if > >>> stable fixed it.

Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-10-03 Thread Oleksandr Natalenko
Also Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> for the whole v8. On Tuesday, 3 October 2017 16:03:58 CEST Ming Lei wrote: > Hi Jens, > > Please consider this patchset for V4.15, and it fixes one > kind of long-term I/O hang issue in either block legac

Re: [PATCH V10 0/8] blk-mq-sched: improve sequential I/O performance

2017-10-14 Thread Oleksandr Natalenko
Hi. By any chance, could this be backported to 4.14? I'm confused by "SCSI: allow to pass null rq to scsi_prep_state_check()" since it uses the refactored flags. === if (req && !(req->rq_flags & RQF_PREEMPT)) === Is it safe to revert to REQ_PREEMPT here, or should rq_flags also be replaced
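For anyone hitting the same backport question, the two spellings being compared look roughly like this; the rq_flags/RQF_PREEMPT form is the one quoted from the patch, while the cmd_flags/REQ_PREEMPT form is how older kernels carried the flag, so which field applies to a given 4.14-based tree still has to be checked against that tree:
===
/* scsi_prep_state_check()-style guard for a QUIESCEd device (illustration only) */

/* form used by the patch (flag lives in rq_flags): */
if (req && !(req->rq_flags & RQF_PREEMPT))
	ret = BLKPREP_DEFER;

/* pre-refactor form on older trees (flag lived in cmd_flags): */
if (req && !(req->cmd_flags & REQ_PREEMPT))
	ret = BLKPREP_DEFER;
===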

Re: [RFC] block: deprecate choosing elevator via boot param

2017-08-30 Thread Oleksandr Natalenko
Jens, any thoughts about this? On Mon, Aug 14, 2017 at 11:27 AM, Oleksandr Natalenko <oleksa...@redhat.com> wrote: > Setting I/O scheduler via kernel command line is not flexible enough > anymore. Different schedulers might be desirable for different types > of devices

Re: [PATCH 0/5] Make SCSI device suspend work reliably

2017-09-09 Thread Oleksandr Natalenko
Hello. > Recently it was reported on the block layer mailing list that suspend > does not work reliably for either the legacy block layer or blk-mq. Since I was one of the reporters, please consider adding me to CC next time, as I'm interested in resolving this issue properly and testing each

Re: [PATCH V4 00/14] blk-mq-sched: improve SCSI-MQ performance

2017-09-06 Thread Oleksandr Natalenko
Feel free to add: Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> since I'm running this on 4 machines without issues. > Hi Jens, > > Ping...

Re: [PATCH V4 0/10] block/scsi: safe SCSI quiescing

2017-09-11 Thread Oleksandr Natalenko
For v4 with regard to suspend/resume: Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> On Monday, 11 September 2017 13:10:11 CEST Ming Lei wrote: > Hi, > > The current SCSI quiesce isn't safe and easily triggers I/O deadlock. > > Once a SCSI device is put into QUIESCE,

Re: [PATCH v5 0/8] block, scsi, md: Improve suspend and resume

2017-10-03 Thread Oleksandr Natalenko
Hello. I can confirm that this patchset, applied on top of v4.14-rc3, also (like Ming's one) fixes the issue with suspend/resume+RAID10 that I reported previously. Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> On Tuesday, 3 October 2017 0:52:10 CEST Bart Van Assche wrote: > H

Re: [PATCH V6 0/6] block/scsi: safe SCSI quiescing

2017-09-28 Thread Oleksandr Natalenko
Hey. I can confirm that v6 of your patchset still works well for me. Tested on a v4.13 kernel. Thanks. On Wednesday, 27 September 2017 10:52:41 CEST Ming Lei wrote: > On Wed, Sep 27, 2017 at 04:27:51PM +0800, Ming Lei wrote: > > On Wed, Sep 27, 2017 at 09:57:37AM +0200, Martin Steigerwald wrote: > > >

I/O hangs after resuming from suspend-to-ram

2017-08-22 Thread Oleksandr Natalenko
Hi. The v4.12.8 kernel hangs in the I/O path after resuming from suspend-to-RAM. I have blk-mq enabled and tried both the BFQ and mq-deadline schedulers with the same result. A soft lockup happens, showing the stacktraces I'm pasting below. The stacktrace shows that I/O hangs in md_super_wait(), which means it waits

Re: I/O hangs after resuming from suspend-to-ram

2017-08-26 Thread Oleksandr Natalenko
na 2017 13:45:43 CEST Oleksandr Natalenko wrote: > Hi. > > v4.12.8 kernel hangs in I/O path after resuming from suspend-to-ram. I have > blk-mq enabled, tried both BFQ and mq-deadline schedulers with the same > result. Soft lockup happens showing stacktraces I'm pasting below. >

Re: I/O hangs after resuming from suspend-to-ram

2017-08-26 Thread Oleksandr Natalenko
Quick update: reproduced on both v4.12.7 and v4.13.0-rc6. On Saturday, 26 August 2017 12:37:29 CEST Oleksandr Natalenko wrote: > Hi. > > I've re-checked this issue with 4.12.9, and it is still there. > > Also, I've managed to reproduce it in a VM with non-virtio disks (just

Re: I/O hangs after resuming from suspend-to-ram

2017-08-26 Thread Oleksandr Natalenko
620 [ 63.668704] #1: ((>work)){+.+.+.}, at: [] process_one_work+0x1b3/0x620 [ 63.674307] #2: (>mutex){..}, at: [] device_resume+0x89/0x1e0 [ 63.678492] 2 locks held by kworker/u8:24/2619: [ 63.680737] #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x1b3/0x620 [ 63.685153] #1: (

Re: I/O hangs after resuming from suspend-to-ram

2017-08-27 Thread Oleksandr Natalenko
wrote: > On Sat, Aug 26, 2017 at 12:48:01PM +0200, Oleksandr Natalenko wrote: > > Quick update: reproduced on both v4.12.7 and v4.13.0-rc6. > > BTW, given it hangs during resume, it isn't easy to collect debug > info, and there should have been lots of useful info there. > > Yo

Re: I/O hangs after resuming from suspend-to-ram

2017-08-28 Thread Oleksandr Natalenko
Hi. On Monday, 28 August 2017 14:58:28 CEST Ming Lei wrote: > Could you verify if the following patch fixes your issue? > …SNIP… I've applied it to v4.12.9 and rechecked — the issue is still there, unfortunately. The stacktrace is the same as before. Were you able to reproduce it in a VM? Should

Re: [PATCH V2 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
With regard to the suspend/resume cycle: Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> On Friday, 1 September 2017 20:49:49 CEST Ming Lei wrote: > Hi, > > The current SCSI quiesce isn't safe and easily triggers I/O deadlock. > > Once a SCSI device is put into QUIESCE,

Re: [PATCH V3 0/8] block/scsi: safe SCSI quiescing

2017-09-02 Thread Oleksandr Natalenko
Again, Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> On Saturday, 2 September 2017 15:08:32 CEST Ming Lei wrote: > Hi, > > The current SCSI quiesce isn't safe and easily triggers I/O deadlock. > > Once a SCSI device is put into QUIESCE, no new request except

Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-08-31 Thread Oleksandr Natalenko
at 07:34:06PM +0200, Oleksandr Natalenko wrote: > > Since I'm in CC, does this series aim to replace 2 patches I've tested > > before: > > > > blk-mq: add requests in the tail of hctx->dispatch > > blk-mq: align to legacy's implementation of blk_execute_rq

Re: [PATCH 0/9] block/scsi: safe SCSI quiescing

2017-08-31 Thread Oleksandr Natalenko
Since I'm in CC, does this series aim to replace 2 patches I've tested before: blk-mq: add requests in the tail of hctx->dispatch blk-mq: align to legacy's implementation of blk_execute_rq ? On Thursday, 31 August 2017 19:27:19 CEST Ming Lei wrote: > The current SCSI quiesce isn't safe and easy

Re: [PATCH V8 0/8] block/scsi: safe SCSI quiescing

2017-11-07 Thread Oleksandr Natalenko
Hi Ming, Jens. What is the fate of this patchset, please? 03.10.2017 16:03, Ming Lei wrote: Hi Jens, Please consider this patchset for V4.15; it fixes one kind of long-term I/O hang issue in either the legacy block path or blk-mq. The current SCSI quiesce isn't safe and easily triggers I/O

Re: [PATCH v11 0/7] block, scsi, md: Improve suspend and resume

2017-11-09 Thread Oleksandr Natalenko
Then, Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name> Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> On Thursday, 9 November 2017 17:55:58 CET Jens Axboe wrote: > On 11/09/2017 09:54 AM, Bart Van Assche wrote: > > On Thu, 2017-11-09 at 07:16 +0100

Re: [PATCH v11 0/7] block, scsi, md: Improve suspend and resume

2017-11-08 Thread Oleksandr Natalenko
Bart, is this something known to you, or is it just my fault for applying this series to v4.13? Apart from this warning, suspend/resume works for me: === [ 27.383846] sd 0:0:0:0: [sda] Starting disk [ 27.383976] sd 1:0:0:0: [sdb] Starting disk [ 27.451218] sdb: Attempt to allocate

Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume

2017-10-21 Thread Oleksandr Natalenko
Well, I've cherry-picked this series onto the current upstream/master branch, and got this while attempting another suspend: === [ 62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): [ 62.421150] xfsaild/dm-7D0 289 2 0x8000 [

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Oleksandr Natalenko
queue *bfqq) No harm is observed on either the test VM with the smartctl hammer or my laptop. So, Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> Thanks. -- Oleksandr Natalenko (post-factum)

Re: [PATCH IMPROVEMENT] block, bfq: limit sectors served with interactive weight raising

2017-12-29 Thread Oleksandr Natalenko
d state. > + */ > + unsigned long service_from_wr; > > /* >* Value of wr start time when switching to soft rt > diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c > index 4456eda34e48..4498c43245e2 100644 > --- a/block/bfq-wf2q.c > +++ b/block/bfq-wf2q.c

Re: [PATCH BUGFIX/IMPROVEMENT 0/2] block, bfq: two pending patches

2018-01-13 Thread Oleksandr Natalenko
and visible smoke: Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> for both of them. Many thanks, Paolo!

Re: v4.15 and I/O hang with BFQ

2018-01-30 Thread Oleksandr Natalenko
Hi. 30.01.2018 09:19, Ming Lei wrote: Hi, We know there is an I/O hang issue with BFQ over USB-storage wrt. blk-mq, and last time I found it was inside BFQ. You can try the debug patch in the following link[1] to see if it is the same as the previous report[1][2]: [1]

v4.15 and I/O hang with BFQ

2018-01-30 Thread Oleksandr Natalenko
Hi, Paolo, Ivan, Ming et al. It looks like I've just encountered the issue Ivan has already described in [1]. Since I'm able to reproduce it reliably in a VM, I'd like to draw more attention to it. First, I'm using a v4.15 kernel with all pending BFQ fixes: === 2ad909a300c4 bfq-iosched: don't

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Oleksandr Natalenko
Hi. 06.02.2018 15:50, Paolo Valente wrote: Could you please do a gdb /block/bfq-iosched.o # or vmlinux.o if bfq is builtin list *(bfq_finish_requeue_request+0x54) list *(bfq_put_queue+0x10b) for me? Fresh crashes and gdb output are given below. A side note: it is harder to trigger things on

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Oleksandr Natalenko
Hi. 06.02.2018 12:57, Mike Galbraith wrote: Not me.  Box seems to be fairly sure that it is bfq. Twice again box went belly up on me in fairly short order with bfq, but seemed fine with deadline. I'm currently running deadline again, and box again seems solid, though I won't say _is_ solid

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Oleksandr Natalenko
06.02.2018 15:50, Paolo Valente wrote: Could you please do a gdb /block/bfq-iosched.o # or vmlinux.o if bfq is builtin list *(bfq_finish_requeue_request+0x54) list *(bfq_put_queue+0x10b) for me? Yes. Just give me some time to recompile the kernel with minimal debug info enabled. I'll post

Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-10 Thread Oleksandr Natalenko
Hi. On Friday, 9 February 2018 18:29:39 CET Mike Galbraith wrote: > On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote: > > In addition to this I think it would be worth considering CC'ing Greg > > to pull this fix into 4.15 stable tree. > > This isn't one he can cher

Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-09 Thread Oleksandr Natalenko
rly implementing the hook requeue_request in BFQ. Thanks, applied. Jens, I forgot to add Tested-by: Oleksandr Natalenko <oleksa...@natalenko.name> in the patch. Is it still possible to add it? In addition to this I think it would be worth considering CC'ing Greg to pull this fix
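For context, the fix under discussion makes BFQ drop its per-request state when blk-mq requeues a request rather than completing it. A heavily simplified sketch of that wiring is given below; the hook names follow the patch title and the v4.15-era elevator_mq_ops interface, and the exact signatures and fields should be taken from the real patch, not from this sketch:
===
/* block/bfq-iosched.c -- simplified sketch, not the actual patch */
static void bfq_finish_requeue_request(struct request *rq)
{
	/* drop the bfq_queue reference and the scheduler-private data attached
	 * at prepare time, so a requeued request can safely be prepared again */
}

static struct elevator_type iosched_bfq_mq = {
	.ops.mq = {
		.requeue_request = bfq_finish_requeue_request,
		.finish_request  = bfq_finish_requeue_request,
		/* ... remaining hooks unchanged ... */
	},
	/* ... */
};
===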

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Oleksandr Natalenko
-by: Oleksandr Natalenko <oleksa...@natalenko.name> Thank you.

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Oleksandr Natalenko
Hi. 06.02.2018 08:56, Mike Galbraith wrote: I was doing kbuilds, and it blew up on me twice. Switching back to cfq seemed to confirm it was indeed the patch causing trouble, but that's by no means a certainty. Just to note, I was using v4.15.1, not the latest git HEAD. Are you able to

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Oleksandr Natalenko
Hi. 06.02.2018 14:46, Mike Galbraith wrote: Sorry for the noise, but just to make it clear, are we talking about "deadline" or "mq-deadline" now? mq-deadline. Okay, I've spent a little bit more time stressing the VM with BFQ + this patch enabled, and managed to get it to crash relatively

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Oleksandr Natalenko
Hi. 07.02.2018 12:27, Paolo Valente wrote: Hi Oleksandr, Holger, before I prepare a V2 candidate patch, could you please test my instrumentation patch too, with the above change made. For your convenience, I have attached a compressed archive with both the instrumentation patch and a patch

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-08 Thread Oleksandr Natalenko
Hi. Cc'ing linux-block people (mainly Christoph) too because of 17cb960f29c2. Also, duplicating the initial statement for them. With v4.16 (and now with v4.16.1) it is possible to trigger a usercopy whitelist warning and/or bug while running smartctl on a SATA disk with blk-mq and BFQ enabled.
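Some background on the "usercopy whitelist" wording: sense buffers come from a dedicated slab whose user-copyable window is declared when the cache is created, roughly as sketched below (this shows the generic kmem_cache_create_usercopy() API; the exact scsi_sense_cache setup differs per kernel version). Hardened usercopy then warns or BUGs whenever a copy to user space reads outside that window, which is the symptom reported in this thread:
===
/* sketch: a slab cache that whitelists its whole object for user copies */
scsi_sense_cache = kmem_cache_create_usercopy("scsi_sense_cache",
			SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
			0 /* useroffset */, SCSI_SENSE_BUFFERSIZE /* usersize */,
			NULL);
===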

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. 09.04.2018 11:35, Christoph Hellwig wrote: I really can't make sense of that report. Sorry, I have nothing to add there so far, I just see the symptom of something going wrong in the ioctl code path that is invoked by smartctl, but I have no idea what's the minimal environment to

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi. 10.04.2018 08:35, Oleksandr Natalenko wrote: - does it reproduce _without_ hardened usercopy? (I would assume yes, but you'd just not get any warning until the hangs started.) If it does reproduce without hardened usercopy, then a new bisect run could narrow the search even more. Looks

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi, Kees, Paolo et al. 10.04.2018 08:53, Kees Cook wrote: Unfortunately I only had a single hang with no dumps. I haven't been able to reproduce it since. :( For your convenience I've prepared a VM that contains a reproducer. It consists of 3 disk images (sda.img is for the system, it is

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-12 Thread Oleksandr Natalenko
Hi. On Thursday, 12 April 2018 20:44:37 CEST Kees Cook wrote: > My first bisect attempt gave me commit 5448aca41cd5 ("null_blk: wire > up timeouts"), which seems insane given that null_blk isn't even built > in the .config. I managed to get the testing automated now for a "git > bisect run ...",

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-09 Thread Oleksandr Natalenko
Hi. (fancy details for linux-block and BFQ people go below) 09.04.2018 20:32, Kees Cook wrote: Ah, this detail I didn't have. I've changed my environment to build with: CONFIG_BLK_MQ_PCI=y CONFIG_BLK_MQ_VIRTIO=y CONFIG_IOSCHED_BFQ=y boot with scsi_mod.use_blk_mq=1 and select BFQ in the

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-10 Thread Oleksandr Natalenko
Hi. 09.04.2018 22:30, Kees Cook wrote: echo 1 | tee /sys/block/sd*/queue/nr_requests I can't get this below "4". Oops, yeah. It cannot be less than BLKDEV_MIN_RQ (which is 4), so it is enforced explicitly in queue_requests_store(). It is the same for me. echo 1 | tee
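The clamping mentioned above happens in the sysfs store handler. A simplified sketch (not the literal kernel code) of why a write of 1 comes back as 4:
===
/* block/blk-sysfs.c -- simplified sketch of queue_requests_store() */
static ssize_t queue_requests_store(struct request_queue *q, const char *page,
				    size_t count)
{
	unsigned long nr;
	ssize_t ret;

	ret = queue_var_store(&nr, page, count);
	if (ret < 0)
		return ret;

	if (nr < BLKDEV_MIN_RQ)
		nr = BLKDEV_MIN_RQ;	/* anything below 4 is silently raised to 4 */

	/* ... resize the request lists / tag sets to nr ... */
	return ret;
}
===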

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko
Hi. 17.04.2018 05:12, Kees Cook wrote: Turning off HARDENED_USERCOPY and turning on KASAN, I see the same report: [ 38.274106] BUG: KASAN: slab-out-of-bounds in _copy_to_user+0x42/0x60 [ 38.274841] Read of size 22 at addr 8800122b8c4b by task smartctl/1064 [ 38.275630] [

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-20 Thread Oleksandr Natalenko
fusion I should note that I've removed the reproducer from my server, but I can re-upload it if needed. -- Oleksandr Natalenko (post-factum)

Re: usercopy whitelist woe in scsi_sense_cache

2018-04-17 Thread Oleksandr Natalenko
Hi. 17.04.2018 23:47, Kees Cook wrote: I sent the patch anyway, since it's kind of a robustness improvement, I'd hope. If you fix BFQ also, please add: Reported-by: Oleksandr Natalenko <oleksa...@natalenko.name> Root-caused-by: Kees Cook <keesc...@chromium.org> :) I gotta task-swi

Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-15 Thread Oleksandr Natalenko
Cc: Pavel Machek Cc: Paolo Valente Cc: Jens Axboe Cc: Ulf Hansson Cc: Richard Weinberger Cc: Adrian Hunter Cc: Bart Van Assche Cc: Jan Kara Cc: Artem Bityutskiy Cc: Christoph Hellwig Cc: Alan Cox Cc: Mark Brown Cc: Damien Le Moal Cc: Johannes Thumshirn Cc: Oleksandr Natalenko Cc: Jonat

Re: [PATCH v2] block: BFQ default for single queue devices

2018-11-02 Thread Oleksandr Natalenko
537.646 540.609 1r-raw_seq 500.733 502.526 Throughput-wise, BFQ is on par with mq-deadline. Latency-wise, BFQ is much, much better. -- Oleksandr Natalenko (post-factum)

Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-19 Thread Oleksandr Natalenko
struct/table now + policy search helper? Let's do it when it happens. Premature optimization is the root of all evil ;) I'd say this is a matter of code readability, not optimisation. I do not strongly object to the current approach, though. -- Oleksandr Natalenko (post-factum)