Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 13:50 -0600, Jens Axboe wrote: > On 5/9/18 12:31 PM, Mike Galbraith wrote: > > On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote: > >> On 5/9/18 10:57 AM, Mike Galbraith wrote: > >> > >>>>> Confirmed. Impressive high speed bug s

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 11:01 -0600, Jens Axboe wrote: > On 5/9/18 10:57 AM, Mike Galbraith wrote: > > >>> Confirmed. Impressive high speed bug stomping. > >> > >> Well, that's good news. Can I get you to try this patch? > > > > Sure thin

Re: bug in tag handling in blk-mq?

2018-05-09 Thread Mike Galbraith
On Wed, 2018-05-09 at 09:18 -0600, Jens Axboe wrote: > On 5/8/18 10:11 PM, Mike Galbraith wrote: > > On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote: > >> > >> Alright, I managed to reproduce it. What I think is happening is that > >> BFQ is limiting the

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote: > > - sdd has nothing pending, yet has 6 active waitqueues. sdd is where ccache storage lives, which should have been the only activity on that drive, as I built source in sdb and was doing nothing else that utilizes sdd. -Mike

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote: > > Alright, I managed to reproduce it. What I think is happening is that > BFQ is limiting the inflight case to something less than the wake > batch for sbitmap, which can lead to stalls. I don't have time to test > this tonight, but perhaps yo
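The stall Jens describes (BFQ capping inflight requests below the sbitmap wake batch) can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the kernel's sbitmap code: the names toy_waitq, toy_init, toy_tag_freed and toy_submitter_woken are hypothetical, and the model assumes a submitter sleeping for a free tag is woken only after wake_batch tags have been freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the kernel's sbitmap implementation.
 * Assumption: a submitter sleeping for a free tag is woken only after
 * wake_batch tags have been freed since the last wakeup. */
struct toy_waitq {
	int wake_batch;     /* frees needed before one wakeup is issued */
	int wait_cnt;       /* frees still needed for the next wakeup */
	bool waiter_woken;  /* set once the sleeping submitter is woken */
};

void toy_init(struct toy_waitq *wq, int wake_batch)
{
	assert(wake_batch > 0);
	wq->wake_batch = wake_batch;
	wq->wait_cnt = wake_batch;
	wq->waiter_woken = false;
}

/* Called once per completed request (freed tag). */
void toy_tag_freed(struct toy_waitq *wq)
{
	if (--wq->wait_cnt == 0) {
		wq->waiter_woken = true;       /* batch reached: wake the waiter */
		wq->wait_cnt = wq->wake_batch; /* re-arm for the next batch */
	}
}

/* The scheduler caps inflight requests at inflight_limit; once the cap is
 * hit, the next submitter sleeps. Nothing new is issued until it wakes, so
 * at most inflight_limit tags are freed. Returns whether it gets woken. */
bool toy_submitter_woken(int inflight_limit, int wake_batch)
{
	struct toy_waitq wq;

	toy_init(&wq, wake_batch);
	for (int i = 0; i < inflight_limit; i++)
		toy_tag_freed(&wq);
	return wq.waiter_woken;
}
```

In this model, toy_submitter_woken(4, 8) returns false: the four completions never add up to a full batch, so the sleeper is never woken, which matches the stall pattern described above. Any inflight limit at or above the batch size avoids it here.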

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote: > > All the block debug files are empty... Sigh. Take 2, this time cat debug files, having turned block tracing off before doing anything else (so trace bits in dmesg.txt should end AT the stall). -Mike dmesg.xz Description: applicat

Re: bug in tag handling in blk-mq?

2018-05-08 Thread Mike Galbraith
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote: > > I'm deadlined ATM, but will get to it. (Bah, even a zombie can type ccache -C; make -j8 and stare...) kbuild again hung on the first go (yay), and post hang data written to sdd1 survived (kernel source lives in sdb3).

Re: bug in tag handling in blk-mq?

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 20:02 +0200, Paolo Valente wrote: > > > > Is there a reproducer? Just building fat config kernels works for me. It was highly non-deterministic, but reproduced quickly twice in a row with Paolo's hack. > Ok Mike, I guess it's your turn now, for at least a stack trace.

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-07 Thread Mike Galbraith
On Mon, 2018-05-07 at 11:27 +0200, Paolo Valente wrote: > > > Where is the bug? Hm, seems potent pain-killers and C don't mix all that well.

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > diff --git a/block/bfq-mq-iosched.c b/block/bfq-mq-iosched.c > index 118f319af7c0..6662efe29b69 100644 > --- a/block/bfq-mq-iosched.c > +++ b/block/bfq-mq-iosched.c > @@ -525,8 +525,13 @@ static void bfq_limit_depth(unsigned int op, struc

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Mon, 2018-05-07 at 04:43 +0200, Mike Galbraith wrote: > On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > > > I've attached a compressed patch (to avoid possible corruption from my > > mailer). I'm little confident, but no pain, no gain, right?

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-06 Thread Mike Galbraith
On Sun, 2018-05-06 at 09:42 +0200, Paolo Valente wrote: > > I've attached a compressed patch (to avoid possible corruption from my > mailer). I'm little confident, but no pain, no gain, right? > > If possible, apply this patch on top of the fix I proposed in this > thread, just to eliminate poss

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Sat, 2018-05-05 at 12:39 +0200, Paolo Valente wrote: > > BTW, if you didn't run out of patience with this permanent issue yet, > I was thinking of two or three changes to retry to trigger your failure > reliably. Sure, fire away, I'll happily give the annoying little bugger opportunities to sho

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-05 Thread Mike Galbraith
On Fri, 2018-05-04 at 21:46 +0200, Mike Galbraith wrote: > Tentatively, I suspect you've just fixed the nasty stalls I reported a > while back. Oh well, so much for optimism. It took a lot, but just hung.

Re: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge

2018-05-04 Thread Mike Galbraith
Tentatively, I suspect you've just fixed the nasty stalls I reported a while back. Not a hint of stall as yet (should have shown itself by now), spinning rust buckets are being all they can be, box feels good. Later mq-deadline (I hope to eventually forget the module dependency eternities we've s

Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

2018-02-09 Thread Mike Galbraith
On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote: > > In addition to this I think it should be worth considering CC'ing Greg > to pull this fix into 4.15 stable tree. This isn't one he can cherry-pick, some munging required, in which case he usually wants a properly tested backport.

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 12:12 +0100, Paolo Valente wrote: > Just to be certain, before submitting a new patch: you changed *only* > the BUG_ON at line 4742, on top of my instrumentation patch. Nah, I completely rewrote it with only a little help from an ouija board to compensate for missing (all) k

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote: > > 2. Could you please turn that BUG_ON into: > if (!(rq->rq_flags & RQF_ELVPRIV)) > return; > and see what happens? That seems to make it forget how to make boom. -Mike

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 11:27 +0100, Paolo Valente wrote: > > 1. Could you paste a stack trace for this OOPS, just to understand how we > get there? [ 442.421058] kernel BUG at block/bfq-iosched.c:4742! [ 442.421762] invalid opcode: [#1] SMP PTI [ 442.422436] Dumping ftrace buffer: [ 442.4

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 10:45 +0100, Paolo Valente wrote: > > > On 07 Feb 2018, at 10:23, Mike Galbraith wrote: > > > > On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote: > >> > >> The first piece of information I need is w

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-07 Thread Mike Galbraith
On Wed, 2018-02-07 at 10:08 +0100, Paolo Valente wrote: > > The first piece of information I need is whether this failure happens > even without "BFQ hierarchical scheduling support". I presume you mean BFQ_GROUP_IOSCHED, which I do not have enabled. -Mike 

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:43 +0100, Holger Hoffstätte wrote: > > A much more interesting question to me is why there is kyber in the middle. :) Yeah, given that per sysfs I have zero devices using kyber. -Mike

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:26 +0100, Paolo Valente wrote: > > ok, right in the middle of bfq this time ... Was this the first OOPS in your > kernel log? Yeah.

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 13:16 +0100, Oleksandr Natalenko wrote: > Hi. > > 06.02.2018 12:57, Mike Galbraith wrote: > > Not me.  Box seems to be fairly sure that it is bfq. Twice again box > > went belly up on me in fairly short order with bfq, but seemed fine > > wi

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 10:38 +0100, Paolo Valente wrote: > > Hi Mike, > as you can imagine, I didn't get any failure in my pre-submission > tests on this patch. In addition, it is not that easy to link this > patch, which just adds some internal bfq housekeeping in case of a > requeue, with a corr

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-06 Thread Mike Galbraith
On Tue, 2018-02-06 at 09:37 +0100, Oleksandr Natalenko wrote: > Hi. > > 06.02.2018 08:56, Mike Galbraith wrote: > > I was doing kbuilds, and it blew up on me twice. Switching back to cfq > > seemed to confirm it was indeed the patch causing trouble, but that's

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
On Tue, 2018-02-06 at 08:44 +0100, Oleksandr Natalenko wrote: > Hi, Paolo. > > I can confirm that this patch fixes cfdisk hang for me. I've also tried > to trigger the issue Mike has encountered, but with no luck (maybe, I > wasn't insistent enough, just was doing dd on usb-storage device in the

Re: [PATCH BUGFIX 1/1] block, bfq: add requeue-request hook

2018-02-05 Thread Mike Galbraith
Hi Paolo, I applied this to master.today, flipped udev back to bfq and took it for a spin.  Unfortunately, box fairly quickly went boom under load. [ 454.739975] [ cut here ] [ 454.739979] list_add corruption. prev->next should be next (5f99a42a), but was

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2017-12-14 Thread Mike Galbraith
On Thu, 2017-12-14 at 22:54 +0100, Peter Zijlstra wrote: > On Thu, Dec 14, 2017 at 09:42:48PM +, Bart Van Assche wrote: > > > Some time ago the block layer was changed to handle timeouts in thread > > context > > instead of interrupt context. See also commit 287922eb0b18 ("block: defer > > ti

Re: possible deadlock in blk_trace_remove

2017-12-03 Thread Mike Galbraith
On Sun, 2017-12-03 at 17:47 -0700, Jens Axboe wrote: > On 12/03/2017 05:44 PM, Eric Biggers wrote: > > > >>> #syz fix: blktrace: fix trace mutex deadlock > >> > >> This is fixed in current -git. > >> > > > > I know, but syzbot needed to be told what commit fixes the bug. > > See https://github.co

Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-08-31 Thread Mike Galbraith
On Thu, 2017-08-31 at 19:12 +0200, Paolo Valente wrote: > > On 31 Aug 2017, at 19:06, Mike Galbraith wrote: > > > > On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote: > >> On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote:

Re: [PATCH BUGFIX/IMPROVEMENT V2 0/3] three bfq fixes restoring service guarantees with random sync writes in bg

2017-08-31 Thread Mike Galbraith
On Thu, 2017-08-31 at 15:42 +0100, Mel Gorman wrote: > On Thu, Aug 31, 2017 at 08:46:28AM +0200, Paolo Valente wrote: > > [SECOND TAKE, with just the name of one of the tester fixed] > > > > Hi, > > while testing the read-write unfairness issues reported by Mel, I > > found BFQ failing to guarante

Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Mike Galbraith
On Tue, 2017-08-08 at 18:50 +0200, Mike Galbraith wrote: > On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote: > > > > Should these go back farther than 4.12? Looks like they apply cleanly > > to 4.9, didn't look older than that... > > I met prerequisites a

Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Mike Galbraith
On Tue, 2017-08-08 at 09:44 -0700, Greg KH wrote: > > Should these go back farther than 4.12? Looks like they apply cleanly > to 4.9, didn't look older than that... I met prerequisites at 4.11, but I wasn't patching anything remotely resembling virgin source. -Mike

Re: blk-mq breaks suspend even with runtime PM patch

2017-08-08 Thread Mike Galbraith
On Tue, 2017-08-08 at 09:22 -0700, Greg KH wrote: > On Sun, Jul 30, 2017 at 03:50:15PM +0200, Oleksandr Natalenko wrote: > > Hello Mike et al. > > > > On Sunday, 30 July 2017 7:12:31 CEST Mike Galbraith wrote: > > > FWIW, first thing I'd do is updat

Re: blk-mq breaks suspend even with runtime PM patch

2017-07-29 Thread Mike Galbraith
On Sat, 2017-07-29 at 17:27 +0200, Oleksandr Natalenko wrote: > Hello Jens, Christoph. > > Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied > blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well > as in a VM. > > I use complex disk layout inv

Re: [PATCH] Fix loop device flush before configure v2

2017-06-07 Thread Mike Galbraith
On Thu, 2017-06-08 at 10:17 +0800, James Wang wrote: > This condition check was exist at before commit b5dd2f6047ca ("block: loop: > improve performance via blk-mq") When add MQ support to loop device, it be > removed because the member of '->lo_thread' be removed. And then upstream > add '->worker