Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/25/2017 11:17 AM, h...@lst.de wrote: > On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote: >> Bart, I pushed a fix here: >> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 > > Yeah, this looks fine to me. It was broken on blk-mq before, but basically impossible to hit. I wonder if we should have a debug mode where we set requests to a known pattern after they are freed to catch these sorts of bugs. tio/md would probably never change for the same set of requests, so it worked by happy accident before. Some of the blk-mq state remains valid after "freeing" a request, but we could selectively fill it, and we could definitely fill the entire PDU area with free poison. -- Jens Axboe
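A user-space sketch of the debug mode Jens floats here: fill the per-request PDU area with the slab free-poison byte at "free" time, so a stale dereference shows up immediately. The `request_stub` type and `poison_request_pdu()` name are hypothetical illustrations, not the kernel's actual types:

```c
#include <string.h>
#include <stddef.h>

/* Same byte pattern slab poisoning uses for freed memory. */
#define POISON_FREE 0x6b

/* Hypothetical stand-in for a blk-mq request with its per-driver
 * PDU area allocated right behind it. */
struct request_stub {
    unsigned char *pdu;
    size_t pdu_len;
};

/* Debug hook: scribble the PDU area when the request goes back to
 * the tag map, so any later dereference reads 0x6b6b6b6b6b6b6b6b. */
static void poison_request_pdu(struct request_stub *rq)
{
    memset(rq->pdu, POISON_FREE, rq->pdu_len);
}
```

Since blk-mq never actually frees its requests, a hook like this would have to run at "logical" free time, i.e. when the tag is returned, rather than on a real kfree().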
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote: > Bart, I pushed a fix here: > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 Yeah, this looks fine to me. It was broken on blk-mq before, but basically impossible to hit. I wonder if we should have a debug mode where we set requests to a known pattern after they are freed to catch these sorts of bugs.
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, 2017-02-24 at 13:22 -0700, Jens Axboe wrote: > Bart, I pushed a fix here: > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 Hello Jens, The same test passes against the kernel I obtained by merging your for-linus branch with the same version of Linus' master branch I mentioned in a previous e-mail. Feel free to add my Tested-by to the patch "dm-rq: don't dereference request payload after ending request". Thanks, Bart.
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 01:00 PM, Jens Axboe wrote: > On 02/24/2017 12:43 PM, Linus Torvalds wrote: >> On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche >> wrote: >>> >>> So the crash is caused by an attempt to dereference address >>> 0x6b6b6b6b6b6b6b6b >>> at offset 0x270. I think this means the crash is caused by a use-after-free. >> >> Yeah, that's POISON_FREE, and that might explain why you see crashes >> that others don't - you obviously have SLAB poisoning enabled. Jens >> may not have. >> >> %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing >> >> struct dm_rq_target_io *tio = tio_from_request(rq); >> struct request *clone = tio->clone; >> int rw; >> >> if (!clone) { >> rq_end_stats(tio->md, rq); >> rw = rq_data_dir(rq); >> if (!rq->q->mq_ops) >> blk_end_request_all(rq, tio->error); >> else >> blk_mq_end_request(rq, tio->error); >> rq_completed(tio->md, rw, false); >> return; >> } >> >> so it's the 'tio' pointer that has been free'd. But it's worth noting >> that we did apparently successfully dereference "tio" earlier in that >> dm_softirq_done() *without* getting the poison value, so what I think >> might be going on is that the 'tio' thing gets free'd when the code >> does the blk_end_request_all()/blk_mq_end_request() call. >> >> Which makes sense - that ends the lifetime of the request, which in >> turn also ends the lifetime of the "tio_from_request()", no? >> >> So the fix may be as simple as just doing >> >> if (!clone) { >> struct mapped_device *md = tio->md; >> >> rq_end_stats(md, rq); >> ... >> rq_completed(md, rw, false); >> return; >> } >> >> because the 'mapped_device' pointer hopefully is still valid, it's >> just 'tio' that has been freed. >> >> Jens? Bart? Christoph? Somebody who knows this code should >> double-check my thinking above. I don't actually know the tio >> lifetimes, I'm just going by looking at how earlier accesses seemed to >> be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, >> for example).
> > I think that is spot on. With the request changes for CDBs, for non > blk-mq, we now also carry the payload after the request. But since > blk-mq never frees the request, the above use-after-free with poison > will only happen for !mq. Caching 'md' and avoiding a dereference of > 'tio' after calling blk_end_request_all() will likely fix it. > > Bart, can you test that? Bart, I pushed a fix here: http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 12:43 PM, Linus Torvalds wrote: > On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche > wrote: >> >> So the crash is caused by an attempt to dereference address >> 0x6b6b6b6b6b6b6b6b >> at offset 0x270. I think this means the crash is caused by a use-after-free. > > Yeah, that's POISON_FREE, and that might explain why you see crashes > that others don't - you obviously have SLAB poisoning enabled. Jens > may not have. > > %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing > > struct dm_rq_target_io *tio = tio_from_request(rq); > struct request *clone = tio->clone; > int rw; > > if (!clone) { > rq_end_stats(tio->md, rq); > rw = rq_data_dir(rq); > if (!rq->q->mq_ops) > blk_end_request_all(rq, tio->error); > else > blk_mq_end_request(rq, tio->error); > rq_completed(tio->md, rw, false); > return; > } > > so it's the 'tio' pointer that has been free'd. But it's worth noting > that we did apparently successfully dereference "tio" earlier in that > dm_softirq_done() *without* getting the poison value, so what I think > might be going on is that the 'tio' thing gets free'd when the code > does the blk_end_request_all()/blk_mq_end_request() call. > > Which makes sense - that ends the lifetime of the request, which in > turn also ends the lifetime of the "tio_from_request()", no? > > So the fix may be as simple as just doing > > if (!clone) { > struct mapped_device *md = tio->md; > > rq_end_stats(md, rq); > ... > rq_completed(md, rw, false); > return; > } > > because the 'mapped_device' pointer hopefully is still valid, it's > just 'tio' that has been freed. > > Jens? Bart? Christoph? Somebody who knows this code should > double-check my thinking above. I don't actually know the tio > lifetimes, I'm just going by looking at how earlier accesses seemed to > be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, > for example). I think that is spot on.
With the request changes for CDBs, for non blk-mq, we now also carry the payload after the request. But since blk-mq never frees the request, the above use-after-free with poison will only happen for !mq. Caching 'md' and avoiding a dereference of 'tio' after calling blk_end_request_all() will likely fix it. Bart, can you test that? -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche wrote: > > So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b > at offset 0x270. I think this means the crash is caused by a use-after-free. Yeah, that's POISON_FREE, and that might explain why you see crashes that others don't - you obviously have SLAB poisoning enabled. Jens may not have. %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing struct dm_rq_target_io *tio = tio_from_request(rq); struct request *clone = tio->clone; int rw; if (!clone) { rq_end_stats(tio->md, rq); rw = rq_data_dir(rq); if (!rq->q->mq_ops) blk_end_request_all(rq, tio->error); else blk_mq_end_request(rq, tio->error); rq_completed(tio->md, rw, false); return; } so it's the 'tio' pointer that has been free'd. But it's worth noting that we did apparently successfully dereference "tio" earlier in that dm_softirq_done() *without* getting the poison value, so what I think might be going on is that the 'tio' thing gets free'd when the code does the blk_end_request_all()/blk_mq_end_request() call. Which makes sense - that ends the lifetime of the request, which in turn also ends the lifetime of the "tio_from_request()", no? So the fix may be as simple as just doing if (!clone) { struct mapped_device *md = tio->md; rq_end_stats(md, rq); ... rq_completed(md, rw, false); return; } because the 'mapped_device' pointer hopefully is still valid, it's just 'tio' that has been freed. Jens? Bart? Christoph? Somebody who knows this code should double-check my thinking above. I don't actually know the tio lifetimes, I'm just going by looking at how earlier accesses seemed to be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, for example). Linus
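The lifetime bug Linus describes, and his fix of caching `md` before ending the request, reduce to the following user-space sketch. All the types and helpers here are simplified stand-ins for the dm/block code, not the real kernel APIs:

```c
#include <stdlib.h>

struct mapped_device { int completed; };

/* Simplified stand-in for struct dm_rq_target_io. */
struct tio {
    struct mapped_device *md;
};

/* Stands in for blk_end_request_all()/blk_mq_end_request(): ending
 * the request also ends the lifetime of the tio behind it. */
static void end_request(struct tio *tio)
{
    free(tio);
}

static void rq_completed(struct mapped_device *md)
{
    md->completed++;
}

/* The fixed completion path: read tio->md into a local before the
 * request (and with it the tio) is freed. */
static void softirq_done(struct tio *tio)
{
    struct mapped_device *md = tio->md;  /* cache before freeing */

    end_request(tio);   /* 'tio' must not be touched after this */
    rq_completed(md);   /* safe: uses the cached pointer */
}
```

The unfixed version would do `rq_completed(tio->md)` after `end_request(tio)`, which is exactly the freed-memory read that SLAB poisoning turns into a 0x6b6b... dereference.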
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 10:39 AM, Bart Van Assche wrote: > On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote: >> On 02/20/2017 09:16 AM, Bart Van Assche wrote: >>> On 02/19/2017 11:35 PM, Christoph Hellwig wrote: On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote: > That said, we will look into this again, of course. Christoph, any idea? No idea really - this seems so far away from the code touched, and there are no obvious signs for a memory scramble from another object touched that I think if it really bisects down to that issue it must be a timing issue. But reading Bart's message again: Did you actually bisect it down to this commit? Or just test the whole tree? Between the 4.10-rc5 merge and all the block tree there might be a few more likely suspects like the scsi bdi lifetime fixes that James mentioned. >>> >>> Hello Christoph, >>> >>> As far as I know Jens does not rebase his trees so we can use the commit >>> date to check which patch went in when. From the first of Jan's bdi patches: >>> >>> CommitDate: Thu Feb 2 08:18:41 2017 -0700 >>> >>> So the bdi patches went in several days after I reported the general >>> protection >>> fault issue. >>> >>> In an e-mail of January 30th I wrote the following: "Running the srp-test >>> software against kernel 4.9.6 and kernel 4.10-rc5 went fine. With your >>> for-4.11/block branch (commit 400f73b23f457a) however I just ran into >>> the following warning: [ ... ]" That means that I did not hit the crash with >>> Jens' for-4.11/block branch but only with the for-next branch.
The patches >>> on Jens' for-next branch after that commit that were applied before I ran >>> my test are: >>> >>> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block >>> drivers/md/dm{,-mpath,-table}.[ch] >>> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in >>> __blk_rq_prep_clone >>> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of >>> struct request >>> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation >>> into bsg_setup_queue >>> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request >>> allocation to the owner of the request_queue >>> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for >>> extra command data >>> 5ea708d15a928f7a479987704203616d3274c03b block: simplify >>> blk_init_allocated_queue >>> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check >>> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into >>> for-4.11/rq-refactor >>> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable >>> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from >>> __blkdev_issue_zeroout >>> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size >>> and bdev_zone_size >>> >>> Do you see any patch in the above list that does not belong to the "split >>> scsi passthrough fields out of struct request" series and that could have >>> caused the reported behavior change? >> >> Bart, since you are the only one that can reproduce this, can you just bisect >> your way through that series? > > Hello Jens, > > Since Christoph also has access to IB hardware I will leave it to Christoph > to do the bisect. 
> Anyway, I just reproduced this crash with Linus' current tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq (see also https://github.com/bvanassche/srp-test):
>
> [ 1629.920553] general protection fault: [#1] SMP
> [ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G I 4.10.0-dbg+ #1
> [ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
> [ 1629.921316] RSP: 0018:c90001bdbda8 EFLAGS: 00010246
> [ 1629.921344] RAX: RBX: 6b6b6b6b6b6b6b6b RCX:
> [ 1629.921372] RDX: RSI: RDI: 6b6b6b6b6b6b6b6b
> [ 1629.921401] RBP: c90001bdbdc0 R08: 8803a3858d48 R09:
> [ 1629.921429] R10: R11: R12:
> [ 1629.921458] R13: R14: 81c05120 R15: 0004
> [ 1629.921489] FS: () GS:88046ef8() knlGS:
> [ 1629.921520] CS: 0010 DS: ES: CR0: 80050033
> [ 1629.921547] CR2: 7fb6324486b8 CR3: 01c0f000 CR4: 001406e0
> [ 1629.921576] Call Trace:
> [ 1629.921605] dm_softirq_done+0xe6/0x1e0 [dm_mod]
> [ 1629.921637] blk_done_softirq+0x88/0xa0
> [ 1629.921663] __do_softirq+0xba/0x4c0
> [ 1629.921744] run_ksoftirqd+0x1a/0x50
> [ 1629.921769] smpboot_thread_fn+0x123/0x1e0
> [ 1629.921797] kthread+0x107/0x140
> [ 1629.921944] ret_from_fork+0x2e/0x40
> [ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c>
Re: [GIT PULL] Block pull request for 4.11-rc1
On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote: > On 02/20/2017 09:16 AM, Bart Van Assche wrote: > > On 02/19/2017 11:35 PM, Christoph Hellwig wrote: > > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote: > > > > That said, we will look into this again, of course. Christoph, any idea? > > > > > > No idea really - this seems so far away from the code touched, and there > > > are no obvious signs for a memory scramble from another object touched > > > that I think if it really bisects down to that issue it must be a timing > > > issue. > > > > > > But reading Bart's message again: Did you actually bisect it down > > > to this commit? Or just test the whole tree? Between the 4.10-rc5 > > > merge and all the block tree there might be a few more likely suspects > > > like the scsi bdi lifetime fixes that James mentioned. > > > > Hello Christoph, > > > > As far as I know Jens does not rebase his trees so we can use the commit > > date to check which patch went in when. From the first of Jan's bdi patches: > > > > CommitDate: Thu Feb 2 08:18:41 2017 -0700 > > > > So the bdi patches went in several days after I reported the general > > protection > > fault issue. > > > > In an e-mail of January 30th I wrote the following: "Running the srp-test > > software against kernel 4.9.6 and kernel 4.10-rc5 went fine. With your > > for-4.11/block branch (commit 400f73b23f457a) however I just ran into > > the following warning: [ ... ]" That means that I did not hit the crash with > > Jens' for-4.11/block branch but only with the for-next branch.
The patches > > on Jens' for-next branch after that commit that were applied before I ran > > my test are: > > > > $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block > > drivers/md/dm{,-mpath,-table}.[ch] > > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in > > __blk_rq_prep_clone > > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of > > struct request > > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation > > into bsg_setup_queue > > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request > > allocation to the owner of the request_queue > > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for > > extra command data > > 5ea708d15a928f7a479987704203616d3274c03b block: simplify > > blk_init_allocated_queue > > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check > > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into > > for-4.11/rq-refactor > > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable > > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from > > __blkdev_issue_zeroout > > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size > > and bdev_zone_size > > > > Do you see any patch in the above list that does not belong to the "split > > scsi passthrough fields out of struct request" series and that could have > > caused the reported behavior change? > > Bart, since you are the only one that can reproduce this, can you just bisect > your way through that series? Hello Jens, Since Christoph also has access to IB hardware I will leave it to Christoph to do the bisect. 
Anyway, I just reproduced this crash with Linus' current tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq (see also https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G I 4.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:c90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: RBX: 6b6b6b6b6b6b6b6b RCX:
[ 1629.921372] RDX: RSI: RDI: 6b6b6b6b6b6b6b6b
[ 1629.921401] RBP: c90001bdbdc0 R08: 8803a3858d48 R09:
[ 1629.921429] R10: R11: R12:
[ 1629.921458] R13: R14: 81c05120 R15: 0004
[ 1629.921489] FS: () GS:88046ef8() knlGS:
[ 1629.921520] CS: 0010 DS: ES: CR0: 80050033
[ 1629.921547] CR2: 7fb6324486b8 CR3: 01c0f000 CR4: 001406e0
[ 1629.921576] Call Trace:
[ 1629.921605] dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637] blk_done_softirq+0x88/0xa0
[ 1629.921663] __do_softirq+0xba/0x4c0
[ 1629.921744] run_ksoftirqd+0x1a/0x50
[ 1629.921769] smpboot_thread_fn+0x123/0x1e0
[ 1629.921797] kthread+0x107/0x140
[ 1629.921944] ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod]
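The repeated 0x6b6b6b6b6b6b6b6b in RBX and RDI above is the tell: free poisoning fills objects with the byte 0x6b, so that is what a pointer loaded from freed slab memory looks like. A minimal user-space illustration:

```c
#include <string.h>

/* Simulate reading a pointer field out of an object that slab
 * poisoning has filled with the free-poison byte 0x6b. */
static unsigned long poisoned_pointer_value(void)
{
    void *p;

    memset(&p, 0x6b, sizeof(p));
    /* On a 64-bit machine this reads back as 0x6b6b6b6b6b6b6b6b,
     * the value sitting in RBX/RDI in the oops above. */
    return (unsigned long)p;
}
```

Dereferencing that value at offset 0x270 gives exactly the faulting address Bart decoded earlier in the thread.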
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 1:50 PM, Markus Trippelsdorf wrote: > > But what about e.g. SATA SSDs? Wouldn't they be better off without any > scheduler? > So perhaps setting "none" for queue/rotational==0 and mq-deadline for > spinning drives automatically in the sq blk-mq case? Jens already said that the merging advantage can outweigh the costs, but he didn't actually talk much about it. The scheduler advantage can outweigh the costs of running a scheduler by an absolutely _huge_ amount. An SSD isn't zero-cost, and each command tends to have some fixed overhead on the controller, and pretty much all SSD's heavily prefer fewer large requests over lots of tiny ones. There are also fairness/latency issues that tend to very heavily favor having an actual scheduler, ie reads want to be scheduled before writes on an SSD (within reason) in order to make latency better. Ten years ago, there were lots of people who argued that you don't want to do scheduling for SSD's, because SSD's were so fast that you only added overhead. Nobody really believes that fairytale any more. So you might have particular loads that look better with noop, but they will be rare and far between. Try it, by all means, and if it works for you, set it in your udev rules. The main place where a noop scheduler currently might make sense is likely for a ramdisk, but quite frankly, since the main real usecase for a ram-disk tends to be to make it easy to profile and find the bottlenecks for performance analysis (for emulating future "infinitely fast" media), even that isn't true - using noop there defeats the whole purpose. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 2017.02.22 at 11:44 -0700, Jens Axboe wrote: > On 02/22/2017 11:42 AM, Linus Torvalds wrote: > > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds > > wrote: > >> > >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR > >> USER? > > > > Basically, I'm pushing back on config options that I can't personally > > even sanely answer. > > I got that much, and I don't disagree on that part. > > > If it's a config option about "do I have a particular piece of > > hardware", it makes sense. But these new ones were just complete > > garbage. > > > > The whole "default IO scheduler" thing is a disease. We should stop > > making up these shit schedulers and then say "we don't know which one > > works best for you". > > > > All it does is encourage developers to make shortcuts and create crap > > that isn't generically useful, and then blame the user and say "well, > > you should have picked a different scheduler" when they say "this does > > not work well for me". > > > > We have had too many of those kinds of broken choices. And when the > > new Kconfig options get so confusing and so esoteric that I go "Hmm, I > > have no idea if my hardware does a single queue or not", I put my foot > > down. > > > > When the IO scheduler questions were about a generic IO scheduler for > > everything, I can kind of understand them. I think it was still a > > mistake (for the reasons outlined above), but at least it was a > > comprehensible question to ask. > > > > But when it gets to "what should I do about a single-queue version of > > a MQ scheduler", the question is no longer even remotely sensible. The > > question should simply NOT EXIST. There is no possible valid reason to > > ask that kind of crap. > > OK, so here's what I'll do: > > 1) We'll kill the default scheduler choices. sq blk-mq will default to >mq-deadline, mq blk-mq will default to "none" (at least for now, until >the new scheduler is done). But what about e.g. SATA SSDs?
Wouldn't they be better off without any scheduler? So perhaps setting "none" for queue/rotational==0 and mq-deadline for spinning drives automatically in the sq blk-mq case? -- Markus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 02:50 PM, Markus Trippelsdorf wrote: > On 2017.02.22 at 11:44 -0700, Jens Axboe wrote: >> On 02/22/2017 11:42 AM, Linus Torvalds wrote: >>> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds >>> wrote: And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER? >>> >>> Basically, I'm pushing back on config options that I can't personally >>> even sanely answer. >> >> I got that much, and I don't disagree on that part. >> >>> If it's a config option about "do I have a particular piece of >>> hardware", it makes sense. But these new ones were just complete >>> garbage. >>> >>> The whole "default IO scheduler" thing is a disease. We should stop >>> making up these shit schedulers and then say "we don't know which one >>> works best for you". >>> >>> All it does is encourage developers to make shortcuts and create crap >>> that isn't generically useful, and then blame the user and say "well, >>> you should have picked a different scheduler" when they say "this does >>> not work well for me". >>> >>> We have had too many of those kinds of broken choices. And when the >>> new Kconfig options get so confusing and so esoteric that I go "Hmm, I >>> have no idea if my hardware does a single queue or not", I put my foot >>> down. >>> >>> When the IO scheduler questions were about a generic IO scheduler for >>> everything, I can kind of understand them. I think it was still a >>> mistake (for the reasons outlined above), but at least it was a >>> comprehensible question to ask. >>> >>> But when it gets to "what should I do about a single-queue version of >>> a MQ scheduler", the question is no longer even remotely sensible. The >>> question should simply NOT EXIST. There is no possible valid reason to >>> ask that kind of crap. >> >> OK, so here's what I'll do: >> >> 1) We'll kill the default scheduler choices. sq blk-mq will default to >>mq-deadline, mq blk-mq will default to "none" (at least for now, until >>the new scheduler is done).
> > But what about e.g. SATA SSDs? Wouldn't they be better off without any > scheduler? Marginal. If they are single queue, using a basic scheduler like deadline isn't going to be a significant amount of overhead. In some cases they are going to be better off, due to better merging. In the worst case, overhead is slightly higher. Net result is positive, I'd say. > So perhaps setting "none" for queue/rotational==0 and mq-deadline for > spinning drives automatically in the sq blk-mq case? You can do that through a udev rule. The kernel doesn't know if the device is rotational or not when we set up the scheduler. So we'd either have to add code to do that, or simply just do it with a udev rule. I'd prefer the latter. -- Jens Axboe
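The udev approach Jens suggests could look roughly like the rules below. This is a sketch, not a tested rules file: the file name, the `sd[a-z]` kernel-name match, and the available scheduler names are assumptions to adapt to your system:

```
# Hypothetical /etc/udev/rules.d/60-io-scheduler.rules
# Spinning disk on single-queue blk-mq: use mq-deadline.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
# Non-rotational (SSD): run without a scheduler.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```

The same sysfs attribute can be inspected and changed at runtime, e.g. via /sys/block/sda/queue/scheduler, which is the "user visible ABI" Jens refers to later in the thread.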
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 12:04 PM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe wrote: >> On 02/22/2017 11:56 AM, Linus Torvalds wrote: >> >> OK, so here's what I'll do: >> >> 1) We'll kill the default scheduler choices. sq blk-mq will default to >>mq-deadline, mq blk-mq will default to "none" (at least for now, until >>the new scheduler is done). >> 2) The individual schedulers will be y/m/n selectable, just like any >>other driver. > > Yes. That makes sense as options. I can (or, perhaps even more > importantly, a distro can) answer those kinds of questions. Someone misspelled pacman: parman (PARMAN) [N/m/y] (NEW) ? There is no help available for this option. Or I think it's pacman, because I have no idea what else it could be. I'm going to say N. -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe wrote: > On 02/22/2017 11:56 AM, Linus Torvalds wrote: > > OK, so here's what I'll do: > > 1) We'll kill the default scheduler choices. sq blk-mq will default to >mq-deadline, mq blk-mq will default to "none" (at least for now, until >the new scheduler is done). > 2) The individual schedulers will be y/m/n selectable, just like any >other driver. Yes. That makes sense as options. I can (or, perhaps even more importantly, a distro can) answer those kinds of questions. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe wrote: >> >> It's that simple. > > No, it's not that simple at all. Fact is, some optimizations make sense > for some workloads, and some do not. Are you even listening? I'm saying no user can ever give a sane answer to your question. The question is insane and wrong. I already said you can have a dynamic configuration (and maybe even an automatic heuristic - like saying that a ramdisk gets NOOP by default, real hardware does not). But asking a user at kernel config time for a default is insane. If *you* cannot answer it, then the user sure as hell cannot. Other configuration questions have problems too, but at least the question about "should I support ext4" is something a user (or distro) can sanely answer. So your comparisons are pure bullshit. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe wrote: > > The fact is that we have two different sets, until we can yank > the old ones. So I can't just ask one question, since the sets > aren't identical. Bullshit. I'm saying: rip out the question ENTIRELY. For *both* cases. If you cannot yourself give a good answer, then there's no f*cking way any user can give a good answer. So asking the question is totally and utterly pointless. All it means is that different people will try different (in random ways) configurations, and the end result is random crap. So get rid of those questions. Pick a default, and live with it. And if people complain about performance, fix the performance issue. It's that simple. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 11:42 AM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds > wrote: >> >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR >> USER? > > Basically, I'm pushing back on config options that I can't personally > even sanely answer. I got that much, and I don't disagree on that part. > If it's a config option about "do I have a particular piece of > hardware", it makes sense. But these new ones were just complete > garbage. > > The whole "default IO scheduler" thing is a disease. We should stop > making up these shit schedulers and then say "we don't know which one > works best for you". > > All it does is encourage developers to make shortcuts and create crap > that isn't generically useful, and then blame the user and say "well, > you should have picked a different scheduler" when they say "this does > not work well for me". > > We have had too many of those kinds of broken choices. And when the > new Kconfig options get so confusing and so esoteric that I go "Hmm, I > have no idea if my hardware does a single queue or not", I put my foot > down. > > When the IO scheduler questions were about a generic IO scheduler for > everything, I can kind of understand them. I think it was still a > mistake (for the reasons outlined above), but at least it was a > comprehensible question to ask. > > But when it gets to "what should I do about a single-queue version of > a MQ scheduler", the question is no longer even remotely sensible. The > question should simply NOT EXIST. There is no possible valid reason to > ask that kind of crap. OK, so here's what I'll do: 1) We'll kill the default scheduler choices. sq blk-mq will default to mq-deadline, mq blk-mq will default to "none" (at least for now, until the new scheduler is done). 2) The individual schedulers will be y/m/n selectable, just like any other driver. I hope that works for everyone. -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds wrote: > > And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER? Basically, I'm pushing back on config options that I can't personally even sanely answer. If it's a config option about "do I have a particular piece of hardware", it makes sense. But these new ones were just complete garbage. The whole "default IO scheduler" thing is a disease. We should stop making up these shit schedulers and then say "we don't know which one works best for you". All it does is encourage developers to make shortcuts and create crap that isn't generically useful, and then blame the user and say "well, you should have picked a different scheduler" when they say "this does not work well for me". We have had too many of those kinds of broken choices. And when the new Kconfig options get so confusing and so esoteric that I go "Hmm, I have no idea if my hardware does a single queue or not", I put my foot down. When the IO scheduler questions were about a generic IO scheduler for everything, I can kind of understand them. I think it was still a mistake (for the reasons outlined above), but at least it was a comprehensible question to ask. But when it gets to "what should I do about a single-queue version of a MQ scheduler", the question is no longer even remotely sensible. The question should simply NOT EXIST. There is no possible valid reason to ask that kind of crap. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 11:26 AM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe wrote: >> >> What do you mean by "the regular IO scheduler"? These are different >> schedulers. > > Not to the user they aren't. > > If the user already answered once about the IO schedulers, we damn > well shouldn't ask again about another small implementation detail. > > How hard is this to understand? You're asking users stupid things. The fact is that we have two different sets, until we can yank the old ones. So I can't just ask one question, since the sets aren't identical. This IS confusing to the user, and it's an artifact of the situation that we have where we are phasing out the old IO path and switching to blk-mq. I don't want to user to know about blk-mq, I just want it to be what everything runs on. But until that happens, and it is happening, we are going to be stuck with that situation. We have this exposed in other places, too. Like for dm, and for SCSI. Not a perfect situation, but something that WILL go away eventually. > It's not just about the wording. It's a fundamental issue. These > questions are about internal implementation details. They make no > sense to a user. They don't even make sense to a kernel developer, for > chrissake! > > Don't make the kconfig mess worse. This "we can't make good defaults > in the kernel, so ask users about random things that they cannot > possibly answer" model is not an acceptable model. There are good defaults! mq single-queue should default to mq-deadline, and mq multi-queue should default to "none" for now. If you feel that strongly about it (and I'm guessing you do, judging by the speed typing and generally annoyed demeanor), then by all means, let's kill the config entries and I'll just hardwire the defaults. The config entries were implemented similarly to the old schedulers, and each scheduler is selectable individually. I'd greatly prefer just improving the wording so it makes more sense.
> If the new schedulers aren't better than NOOP, they shouldn't exist.
> And if you want people to be able to test, they should be dynamic.

They are dynamic! You can build them as modules, you can switch at
runtime. Just like we have always been able to. I can't make it more
dynamic than that. We're reusing the same internal infrastructure for
that, AND the user visible ABI for checking what is available, and
setting a new one.

> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE
> POOR USER?

BECAUSE IT'S POLICY! Fact of the matter is, if I just default to what we
had before, it'd all be running with none. In a few years time, if I'm
lucky, someone will have shipped udev rules setting this appropriately.
If I ask the question, we'll get testing NOW. People will run with the
default set.

-- 
Jens Axboe
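[Editor's note: the runtime switching described above goes through the
usual block sysfs ABI. The sketch below is illustrative, not from the
thread; the device name sda is an assumption, and the small helper just
parses the bracketed (active) entry out of the scheduler file's
contents.]

```shell
# The block layer marks the active scheduler with brackets in
# /sys/block/<dev>/queue/scheduler, e.g. "[none] mq-deadline".
# This helper extracts the bracketed entry from such a string.
current_sched() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Typical usage against a real device (sda is an assumption):
#   cat /sys/block/sda/queue/scheduler            # list what is available
#   echo mq-deadline > /sys/block/sda/queue/scheduler
#   current_sched "$(cat /sys/block/sda/queue/scheduler)"

current_sched "[none] mq-deadline"
```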
Re: [GIT PULL] Block pull request for-4.11-rc1
On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe wrote:
>
> What do you mean by "the regular IO scheduler"? These are different
> schedulers.

Not to the user they aren't.

If the user already answered once about the IO schedulers, we damn well
shouldn't ask again about another small implementation detail.

How hard is this to understand? You're asking users stupid things.

It's not just about the wording. It's a fundamental issue. These
questions are about internal implementation details. They make no sense
to a user. They don't even make sense to a kernel developer, for
chrissake!

Don't make the kconfig mess worse. This "we can't make good defaults in
the kernel, so ask users about random things that they cannot possibly
answer" model is not an acceptable model.

If the new schedulers aren't better than NOOP, they shouldn't exist. And
if you want people to be able to test, they should be dynamic.

And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR
USER?

It's really that simple.

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/21/2017 04:23 PM, Linus Torvalds wrote:
> On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe wrote:
>>
>> But under a device managed by blk-mq, that device exposes a number of
>> hardware queues. For older style devices, that number is typically 1
>> (single queue).
>
> ... but why would this ever be different from the normal IO scheduler?

Because we have a different set of schedulers for blk-mq, different than
the legacy path. mq-deadline is a basic port that will work fine with
rotational storage, but it's not going to be a good choice for NVMe
because of scalability issues. We'll have BFQ on the blk-mq side,
catering to the needs of those folks that currently rely on the richer
feature set that CFQ supports. We've continually been working towards
getting rid of the legacy IO path, and its set of schedulers. So if it's
any consolation, those options will go away in the future.

> IOW, what makes single-queue mq scheduling so special that
>
> (a) it needs its own config option
>
> (b) it is different from just the regular IO scheduler in the first
> place?
>
> So the whole thing stinks. The fact that it then has an
> incomprehensible config option seems to be just gravy on top of the
> crap.

What do you mean by "the regular IO scheduler"? These are different
schedulers. As explained above, single-queue mq devices generally DO
want mq-deadline. For multi-queue mq devices, we don't have a good
choice right now, so we retain the current behavior (that we've had
since blk-mq was introduced in 3.13) of NOT doing any IO scheduling for
them. If you do want scheduling for them, set the option, or configure
udev to make the right choice for you. I agree the wording isn't great,
and we can improve that. But I do think that the current choices make
sense.

>> "none" just means that we don't have a scheduler attached.
>
> .. which makes no sense to me in the first place.
> People used to try to convince us that doing IO schedulers was a
> mistake, because modern disk hardware did a better job than we could
> do in software.
>
> Those people were full of crap. The regular IO scheduler used to have
> a "NONE" option too. Maybe it even still has one, but only insane
> people actually use it.
>
> Why is the MQ stuff magically so different that NONE would make sense
> at all?

I was never one of those people, and I've always been a strong advocate
for imposing scheduling to keep devices in check. The regular IO
scheduler pool includes "noop", which is probably the one you are
thinking of. That one is a bit different than the new "none" option for
blk-mq, in that it does do insertion sorts and it does do merges. "none"
does some merging, but only where it happens to make sense. There's no
insertion sorting.

> And equally importantly: why do we _ask_ people these issues? Is this
> some kind of sick "cover your ass" thing, where you can say "well, I
> asked about it", when inevitably the choice ends up being the wrong
> one?
>
> We have too damn many Kconfig options as-is, I'm trying to push back
> on them. These two options seem fundamentally broken and stupid.
>
> The "we have no good idea, so let's add a Kconfig option" seems like a
> broken excuse for these things existing.
>
> So why ask this question in the first place?
>
> Is there any possible reason why "NONE" is a good option at all? And
> if it is the _only_ option (because no other better choice exists), it
> damn well shouldn't be a kconfig option!

I'm all for NOT asking questions, and not providing tunables. That's
generally how I do write code. See the blk-wbt stuff, for instance, that
basically just has one tunable that's set sanely by default, and we
figure out the rest. I don't want to regress performance of blk-mq
devices by attaching mq-deadline to them. When we do have a sane
scheduler choice, we'll make that the default.
And yes, maybe we can remove the Kconfig option at that point. For single queue devices, we could kill the option. But we're expecting bfq-mq for 4.12, and we'll want to have the option at that point unless you want to rely solely on runtime setting of the scheduler through udev or by the sysadmin. -- Jens Axboe
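[Editor's note: the udev route mentioned above can look roughly like the
fragment below. This is a hedged sketch, not something from the thread:
the rule file name is hypothetical, and the matching criteria are
assumptions you would adapt to your own devices.]

```
# /etc/udev/rules.d/60-io-scheduler.rules (hypothetical file name)
# Attach mq-deadline to rotational SCSI disks as they appear.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
  ATTR{queue/scheduler}="mq-deadline"
```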
Re: [GIT PULL] Block pull request for-4.11-rc1
On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe wrote:
>
> But under a device managed by blk-mq, that device exposes a number of
> hardware queues. For older style devices, that number is typically 1
> (single queue).

... but why would this ever be different from the normal IO scheduler?

IOW, what makes single-queue mq scheduling so special that

(a) it needs its own config option

(b) it is different from just the regular IO scheduler in the first
place?

So the whole thing stinks. The fact that it then has an incomprehensible
config option seems to be just gravy on top of the crap.

> "none" just means that we don't have a scheduler attached.

.. which makes no sense to me in the first place.

People used to try to convince us that doing IO schedulers was a
mistake, because modern disk hardware did a better job than we could do
in software.

Those people were full of crap. The regular IO scheduler used to have a
"NONE" option too. Maybe it even still has one, but only insane people
actually use it.

Why is the MQ stuff magically so different that NONE would make sense at
all?

And equally importantly: why do we _ask_ people these issues? Is this
some kind of sick "cover your ass" thing, where you can say "well, I
asked about it", when inevitably the choice ends up being the wrong one?

We have too damn many Kconfig options as-is, I'm trying to push back on
them. These two options seem fundamentally broken and stupid.

The "we have no good idea, so let's add a Kconfig option" seems like a
broken excuse for these things existing.

So why ask this question in the first place?

Is there any possible reason why "NONE" is a good option at all? And if
it is the _only_ option (because no other better choice exists), it damn
well shouldn't be a kconfig option!

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/21/2017 04:02 PM, Linus Torvalds wrote:
> Hmm. The new config options are incomprehensible and their help
> messages don't actually help.
>
> So can you fill in the blanks on what
>
>     Default single-queue blk-mq I/O scheduler
>     Default multi-queue blk-mq I/O scheduler
>
> config options mean, and why they default to none?
>
> The config phase of the kernel is one of the worst parts of the whole
> project, and adding these kinds of random and incomprehensible config
> options does *not* help.

I'll try and see if I can come up with some better sounding/reading
explanations. But under a device managed by blk-mq, that device exposes
a number of hardware queues. For older style devices, that number is
typically 1 (single queue). This is true for most SCSI devices that are
run by scsi-mq, which is often hosting rotational storage. Faster
devices, like nvme, expose a lot more hardware queues (multi-queue).
Hence the distinction between having a scheduler attached for
single-queue devices, and for multi-queue devices.

For rotational devices, we'll want to default to something like
mq-deadline, and I actually thought that was the default already. It
should be (see below). For multi-queue devices, we'll want to initially
default to "none", and then later attach a properly multiqueue
scheduler, when we have it (it's still in development). "none" just
means that we don't have a scheduler attached.

In essence, we want to default to having a sane IO scheduler attached
depending on device class. For single-queue devices, that's deadline for
now. For multi-queue, we'll want to wait until we have something that
scales really well. It's not that easy to present this information in a
user grokkable fashion, since most people would not know the difference
between the two. I'll send the below as a real patch, and also ponder
how we can improve the Kconfig text.
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 0715ce93daef..f6144c5d7c70 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -75,7 +75,7 @@ config MQ_IOSCHED_NONE
 
 choice
 	prompt "Default single-queue blk-mq I/O scheduler"
-	default DEFAULT_SQ_NONE
+	default DEFAULT_SQ_DEADLINE if MQ_IOSCHED_DEADLINE=y
 	help
 	  Select the I/O scheduler which will be used by default for blk-mq
 	  managed block devices with a single queue.

-- 
Jens Axboe
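[Editor's note: as a quick way to see which camp a device falls into,
blk-mq exposes one directory per hardware queue under sysfs, so counting
those directories gives the queue count. This is an illustrative sketch,
not from the thread; the device names sda and nvme0n1 are assumptions.]

```shell
# Count the hardware queues blk-mq exposes for a device under
# /sys/block/<dev>/mq. A SATA disk will typically show 1, while an
# NVMe device shows one queue per CPU or so. Prints 0 if the device
# does not exist or is not managed by blk-mq.
nr_hw_queues() {
    ls "/sys/block/$1/mq" 2>/dev/null | wc -l
}

nr_hw_queues sda       # single-queue device
nr_hw_queues nvme0n1   # multi-queue device
```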
Re: [GIT PULL] Block pull request for-4.11-rc1
Hmm. The new config options are incomprehensible and their help messages
don't actually help.

So can you fill in the blanks on what

    Default single-queue blk-mq I/O scheduler
    Default multi-queue blk-mq I/O scheduler

config options mean, and why they default to none?

The config phase of the kernel is one of the worst parts of the whole
project, and adding these kinds of random and incomprehensible config
options does *not* help.

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/20/2017 08:32 AM, Jens Axboe wrote:
> Bart, since you are the only one that can reproduce this, can you just
> bisect your way through that series?

Hello Jens,

I will do that as soon as I'm back in the office (later this week).

Bart.
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
>> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>>> That said, we will look into this again, of course. Christoph, any
>>> idea?
>>
>> No idea really - this seems so far away from the code touched, and
>> there are no obvious signs for a memory scramble from another object
>> touched, so I think if it really bisects down to that issue it must
>> be a timing issue.
>>
>> But reading Bart's message again: Did you actually bisect it down
>> to this commit? Or just test the whole tree? Between the 4.10-rc5
>> merge and all the block tree there might be a few more likely
>> suspects like the scsi bdi lifetime fixes that James mentioned.
>
> Hello Christoph,
>
> As far as I know Jens does not rebase his trees so we can use the
> commit date to check which patch went in when. From the first of Jan's
> bdi patches:
>
> CommitDate: Thu Feb 2 08:18:41 2017 -0700
>
> So the bdi patches went in several days after I reported the general
> protection fault issue.
>
> In an e-mail of January 30th I wrote the following: "Running the
> srp-test software against kernel 4.9.6 and kernel 4.10-rc5 went fine.
> With your for-4.11/block branch (commit 400f73b23f457a) however I just
> ran into the following warning: [ ... ]" That means that I did not hit
> the crash with Jens' for-4.11/block branch but only with the for-next
> branch.
> The patches on Jens' for-next branch after that commit that were
> applied before I ran my test are:
>
> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
>
> Do you see any patch in the above list that does not belong to the
> "split scsi passthrough fields out of struct request" series and that
> could have caused the reported behavior change?

Bart, since you are the only one that can reproduce this, can you just
bisect your way through that series?

-- 
Jens Axboe
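[Editor's note: walking a commit range like the one quoted above can be
automated with `git bisect run`. The sketch below is purely synthetic,
for illustration: the throwaway repo and the failing condition stand in
for the real tree and for Bart's srp-test run against each checkout; it
only demonstrates the mechanics.]

```shell
# Build a throwaway repo of 7 commits where the "bug" appears at
# commit 5, then let `git bisect run` find the first bad commit.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q .
git config user.email you@example.com
git config user.name "example"
for i in 1 2 3 4 5 6 7; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
root=$(git rev-list HEAD | tail -n 1)
git bisect start HEAD "$root" > /dev/null    # HEAD is bad, root is good
# The test command: exit 0 means good, non-zero means bad.
# Here "bad" simply means the state counter has reached 5.
git bisect run sh -c '[ "$(cat state)" -lt 5 ]' > /dev/null
git log -1 --format=%s refs/bisect/bad       # -> commit 5
```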
RE: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>> That said, we will look into this again, of course. Christoph, any
>> idea?
>
> No idea really - this seems so far away from the code touched, and
> there are no obvious signs for a memory scramble from another object
> touched, so I think if it really bisects down to that issue it must be
> a timing issue.
>
> But reading Bart's message again: Did you actually bisect it down
> to this commit? Or just test the whole tree? Between the 4.10-rc5
> merge and all the block tree there might be a few more likely suspects
> like the scsi bdi lifetime fixes that James mentioned.

Hello Christoph,

As far as I know Jens does not rebase his trees so we can use the commit
date to check which patch went in when. From the first of Jan's bdi
patches:

CommitDate: Thu Feb 2 08:18:41 2017 -0700

So the bdi patches went in several days after I reported the general
protection fault issue.

In an e-mail of January 30th I wrote the following: "Running the
srp-test software against kernel 4.9.6 and kernel 4.10-rc5 went fine.
With your for-4.11/block branch (commit 400f73b23f457a) however I just
ran into the following warning: [ ... ]" That means that I did not hit
the crash with Jens' for-4.11/block branch but only with the for-next
branch.
The patches on Jens' for-next branch after that commit that were applied
before I ran my test are:

$ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size

Do you see any patch in the above list that does not belong to the
"split scsi passthrough fields out of struct request" series and that
could have caused the reported behavior change?

Thanks,

Bart.
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 07:59 PM, Jens Axboe wrote:
> On 02/19/2017 07:12 PM, James Bottomley wrote:
>> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>>> - Removal of the BLOCK_PC support in struct request, and
>>>>>   refactoring of carrying SCSI payloads in the block layer. This
>>>>>   cleans up the code nicely, and enables us to kill the SCSI
>>>>>   specific parts of struct request, shrinking it down nicely.
>>>>>   From Christoph mainly, with help from Hannes.
>>>>
>>>> Hello Jens, Christoph and Mike,
>>>>
>>>> Is anyone working on a fix for the regression introduced by the
>>>> BLOCK_PC removal changes (general protection fault) that I had
>>>> reported three weeks ago? See also
>>>> https://www.spinics.net/lists/raid/msg55494.html
>>>
>>> I don't think that's a regression in this series, it just triggers
>>> more easily with this series. The BLOCK_PC removal fixes aren't
>>> touching device life times at all.
>>>
>>> That said, we will look into this again, of course. Christoph, any
>>> idea?
>>
>> We could do with tracing the bdi removal failure issue fingered both
>> by the block xfstests and Omar. It's something to do with this set of
>> commits
>>
>>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>>   problems from Jan and Dan.
>>
>> But no-one has actually root caused it yet.
>
> The above set from Jan and Dan fixed one set of issues around this,
> and the reproducer test case was happy as well. But we've recently
> found that there are still corner cases that aren't happy, Omar
> reported that separately on Friday. So there will be a followup set
> for that, hopefully intersecting with the issue that Bart reported.

Forgot to mention, that these cases exist in 4.0 and 4.6 as well, so
neither are a new problem with this series. The fixes from Jan and Dan
didn't close all of them.

-- 
Jens Axboe
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 07:12 PM, James Bottomley wrote:
> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>> - Removal of the BLOCK_PC support in struct request, and
>>>>   refactoring of carrying SCSI payloads in the block layer. This
>>>>   cleans up the code nicely, and enables us to kill the SCSI
>>>>   specific parts of struct request, shrinking it down nicely. From
>>>>   Christoph mainly, with help from Hannes.
>>>
>>> Hello Jens, Christoph and Mike,
>>>
>>> Is anyone working on a fix for the regression introduced by the
>>> BLOCK_PC removal changes (general protection fault) that I had
>>> reported three weeks ago? See also
>>> https://www.spinics.net/lists/raid/msg55494.html
>>
>> I don't think that's a regression in this series, it just triggers
>> more easily with this series. The BLOCK_PC removal fixes aren't
>> touching device life times at all.
>>
>> That said, we will look into this again, of course. Christoph, any
>> idea?
>
> We could do with tracing the bdi removal failure issue fingered both
> by the block xfstests and Omar. It's something to do with this set of
> commits
>
>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>   problems from Jan and Dan.
>
> But no-one has actually root caused it yet.

The above set from Jan and Dan fixed one set of issues around this, and
the reproducer test case was happy as well. But we've recently found
that there are still corner cases that aren't happy, Omar reported that
separately on Friday. So there will be a followup set for that,
hopefully intersecting with the issue that Bart reported.

-- 
Jens Axboe
Re: [GIT PULL] Block pull request for-4.11-rc1
On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote: > On 02/19/2017 06:09 PM, Bart Van Assche wrote: > > On 02/19/2017 04:11 PM, Jens Axboe wrote: > > > - Removal of the BLOCK_PC support in struct request, and > > > refactoring of > > > carrying SCSI payloads in the block layer. This cleans up the > > > code > > > nicely, and enables us to kill the SCSI specific parts of > > > struct > > > request, shrinking it down nicely. From Christoph mainly, with > > > help > > > from Hannes. > > > > Hello Jens, Christoph and Mike, > > > > Is anyone working on a fix for the regression introduced by the > > BLOCK_PC removal changes (general protection fault) that I had > > reported three weeks ago? See also > > https://www.spinics.net/lists/raid/msg55494.html > > I don't think that's a regression in this series, it just triggers > more easily with this series. The BLOCK_PC removal fixes aren't > touching device life times at all. > > That said, we will look into this again, of course. Christoph, any > idea? We could do with tracing the bdi removal failure issue fingered both by the block xfstests and Omar. It's something to do with this set of commits > - Fixes for duplicate bdi registrations and bdi/queue life time > problems from Jan and Dan. But no-one has actually root caused it yet. James
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 06:09 PM, Bart Van Assche wrote:
> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>> - Removal of the BLOCK_PC support in struct request, and refactoring
>>   of carrying SCSI payloads in the block layer. This cleans up the
>>   code nicely, and enables us to kill the SCSI specific parts of
>>   struct request, shrinking it down nicely. From Christoph mainly,
>>   with help from Hannes.
>
> Hello Jens, Christoph and Mike,
>
> Is anyone working on a fix for the regression introduced by the
> BLOCK_PC removal changes (general protection fault) that I had
> reported three weeks ago? See also
> https://www.spinics.net/lists/raid/msg55494.html

I don't think that's a regression in this series, it just triggers more
easily with this series. The BLOCK_PC removal fixes aren't touching
device life times at all.

That said, we will look into this again, of course. Christoph, any idea?

-- 
Jens Axboe
RE: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 04:11 PM, Jens Axboe wrote:
> - Removal of the BLOCK_PC support in struct request, and refactoring
>   of carrying SCSI payloads in the block layer. This cleans up the
>   code nicely, and enables us to kill the SCSI specific parts of
>   struct request, shrinking it down nicely. From Christoph mainly,
>   with help from Hannes.

Hello Jens, Christoph and Mike,

Is anyone working on a fix for the regression introduced by the BLOCK_PC
removal changes (general protection fault) that I had reported three
weeks ago? See also https://www.spinics.net/lists/raid/msg55494.html

Thanks,

Bart.
[GIT PULL] Block pull request for-4.11-rc1
Hi Linus,

This is the collected pull request for 4.11 for the block core and
drivers. It's really two different branches:

  for-4.11/block-signed
  for-4.11/next-signed

for-4.11/next exists because some of Christoph's patch series were based
on patches that were added after for-4.11/block was forked off, which
would have caused needless merge pain. So for-4.11/next was forked off
v4.10-rc5, with for-4.11/block pulled in. Feel free to pull each of
these in succession instead of this pre-merged branch.

Note that if you do opt for pulling the two branches, for-4.11/block
will throw a trivial merge conflict in block/blk-mq.c, where you need to
delete a function. for-4.11/next merges cleanly, HOWEVER, a commit that
was added in mainline since v4.10-rc5 (f2e767bb5d6e) adds a line of code
that is no longer valid with the changes in for-4.11/next. Hence, if you
do pull in separately, you'll want to --no-commit and edit:

  drivers/scsi/mpt3sas/mpt3sas_scsih.c

to fix up that one line, like I did in the for-4.11/linus-merge branch.

With that said, this pull request contains:

- blk-mq scheduling framework from me and Omar, with a port of the
  deadline scheduler for this framework. A port of BFQ from Paolo is in
  the works, and should be ready for 4.12.

- Various fixups and improvements to the above scheduling framework
  from Omar, Paolo, Bart, me, others.

- Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
  This allows us to export more information that helps debug hangs or
  performance issues, without cluttering or abusing the sysfs API.

- Fixes for the sbitmap code, the scalable bitmap code that was
  migrated from blk-mq, from Omar.

- Removal of the BLOCK_PC support in struct request, and refactoring of
  carrying SCSI payloads in the block layer. This cleans up the code
  nicely, and enables us to kill the SCSI specific parts of struct
  request, shrinking it down nicely. From Christoph mainly, with help
  from Hannes.
- Support for ranged discard requests and discard merging, also from
  Christoph.

- Support for OPAL in the block layer, and for NVMe as well. Mainly
  from Scott Bauer, with fixes/updates from various other folks.

- Error code fixup for gdrom from Christophe.

- cciss pci irq allocation cleanup from Christoph.

- Making the cdrom device operations read only, from Kees Cook.

- Fixes for duplicate bdi registrations and bdi/queue life time
  problems from Jan and Dan.

- Set of fixes and updates for lightnvm, from Matias and Javier.

- A few fixes for nbd from Josef, using idr to name devices and a
  workqueue deadlock fix on receive. Also marks Josef as the current
  maintainer of nbd.

- Fix from Josef, overwriting queue settings when the number of
  hardware queues is updated for a blk-mq device.

- NVMe fix from Keith, ensuring that we don't repeatedly mark an IO
  aborted, if we didn't end up aborting it.

- SG gap merging fix from Ming Lei for block.

- Loop fix also from Ming, fixing a race and crash between setting loop
  status and IO.

- Two block race fixes from Tahsin, fixing request list iteration and
  fixing a race between device registration and udev device add
  notifications.

- Double free fix from cgroup writeback, from Tejun.

- Another double free fix in blkcg, from Hou Tao.

- Partition overflow fix for EFI from Alden Tondettar.

Please pull! Either this pre-merged branch:

  git://git.kernel.dk/linux-block.git for-4.11/linus-merge-signed

or

  git://git.kernel.dk/linux-block.git for-4.11/block-signed
  git://git.kernel.dk/linux-block.git for-4.11/next-signed

in that order. All branches are signed tags.
Alden Tondettar (1):
      partitions/efi: Fix integer overflow in GPT size calculation

Alexander Potapenko (1):
      block: Initialize cfqq->ioprio_class in cfq_get_queue()

Bart Van Assche (5):
      blk-mq-debugfs: Add missing __acquires() / __releases() annotations
      blk-mq-debug: Avoid that sparse complains about req_flags_t usage
      blk-mq-debug: Make show() operations interruptible
      blk-mq-debug: Introduce debugfs_create_files()
      block: Update comments that refer to __bio_map_user() and bio_map_user()

Christoph Hellwig (39):
      block: add a op_is_flush helper
      md: cleanup bio op / flags handling in raid1_write_request
      block: fix elevator init check
      block: simplify blk_init_allocated_queue
      block: allow specifying size for extra command data
      block: cleanup tracing
      dm: remove incomplete BLOCK_PC support
      dm: always defer request allocation to the owner of the request_queue
      scsi: remove gfp_flags member in scsi_host_cmd_pool
      scsi: respect unchecked_isa_dma for blk-mq
      scsi: remove scsi_cmd_dma_pool
      scsi: remove __scsi_alloc_queue
      scsi: allocate scsi_cmnd structures as part of struct request
      block/bsg: move queue creation into bsg_setup_queue