Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/25/2017 11:17 AM, h...@lst.de wrote: > On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote: >> Bart, I pushed a fix here: >> >> http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 > > Yeah, this looks fine to me. It was broken on blk-mq before, but basically impossible to hit. I wonder if we should have a debug mode where we set requests to a known pattern after they are freed to catch these sorts of bugs. tio/md would probably never change for the same set of requests, so it worked by happy accident before. Some of the blk-mq state remains valid after "freeing" a request, but we could selectively fill it, and we could definitely fill the entire PDU area with free poison. -- Jens Axboe
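A user-space sketch of the debug mode Jens floats here: fill the per-request PDU area with the slab free-poison byte at "free" time, so a stale dereference shows up immediately. The `request_stub` type and `poison_request_pdu()` name are hypothetical illustrations, not the kernel's actual types:

```c
#include <string.h>
#include <stddef.h>

/* Same byte pattern slab poisoning uses for freed memory. */
#define POISON_FREE 0x6b

/* Hypothetical stand-in for a blk-mq request with its per-driver
 * PDU area allocated right behind it. */
struct request_stub {
    unsigned char *pdu;
    size_t pdu_len;
};

/* Debug hook: scribble the PDU area when the request goes back to
 * the tag map, so any later dereference reads 0x6b6b6b6b6b6b6b6b. */
static void poison_request_pdu(struct request_stub *rq)
{
    memset(rq->pdu, POISON_FREE, rq->pdu_len);
}
```

Since blk-mq never actually frees its requests, a hook like this would have to run at "logical" free time, i.e. when the tag is returned, rather than on a real kfree().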
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, Feb 24, 2017 at 01:22:42PM -0700, Jens Axboe wrote: > Bart, I pushed a fix here: > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 Yeah, this looks fine to me. It was broken on blk-mq before, but basically impossible to hit. I wonder if we should have a debug mode where we set requests to a known pattern after they are freed to catch these sorts of bugs.
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, 2017-02-24 at 13:22 -0700, Jens Axboe wrote: > Bart, I pushed a fix here: > > http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 Hello Jens, The same test passes against the kernel I obtained by merging your for-linus branch with the same version of Linus' master branch I mentioned in a previous e-mail. Feel free to add my Tested-by to the patch "dm-rq: don't dereference request payload after ending request". Thanks, Bart.
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 01:00 PM, Jens Axboe wrote: > On 02/24/2017 12:43 PM, Linus Torvalds wrote: >> On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche >> wrote: >>> >>> So the crash is caused by an attempt to dereference address >>> 0x6b6b6b6b6b6b6b6b >>> at offset 0x270. I think this means the crash is caused by a use-after-free. >> >> Yeah, that's POISON_FREE, and that might explain why you see crashes >> that others don't - you obviously have SLAB poisoning enabled. Jens >> may not have. >> >> %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing >> >> struct dm_rq_target_io *tio = tio_from_request(rq); >> struct request *clone = tio->clone; >> int rw; >> >> if (!clone) { >> rq_end_stats(tio->md, rq); >> rw = rq_data_dir(rq); >> if (!rq->q->mq_ops) >> blk_end_request_all(rq, tio->error); >> else >> blk_mq_end_request(rq, tio->error); >> rq_completed(tio->md, rw, false); >> return; >> } >> >> so it's the 'tio' pointer that has been free'd. But it's worth noting >> that we did apparently successfully dereference "tio" earlier in that >> dm_softirq_done() *without* getting the poison value, so what I think >> might be going on is that the 'tio' thing gets free'd when the code >> does the blk_end_request_all()/blk_mq_end_request() call. >> >> Which makes sense - that ends the lifetime of the request, which in >> turn also ends the lifetime of the "tio_from_request()", no? >> >> So the fix may be as simple as just doing >> >> if (!clone) { >> struct mapped_device *md = tio->md; >> >> rq_end_stats(md, rq); >> ... >> rq_completed(md, rw, false); >> return; >> } >> >> because the 'mapped_device' pointer hopefully is still valid, it's >> just 'tio' that has been freed. >> >> Jens? Bart? Christoph? Somebody who knows this code should >> double-check my thinking above. I don't actually know the tio >> lifetimes, I'm just going by looking at how earlier accesses seemed to >> be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, >> for example).
> > I think that is spot on. With the request changes for CDBs, for non > blk-mq, we now also carry the payload after the request. But since > blk-mq never frees the request, the above use-after-free with poison > will only happen for !mq. Caching 'md' and avoiding a dereference of > 'tio' after calling blk_end_request_all() will likely fix it. > > Bart, can you test that? Bart, I pushed a fix here: http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus=61febef40bfe8ab68259d8545257686e8a0d91d1 -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 12:43 PM, Linus Torvalds wrote: > On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche > wrote: >> >> So the crash is caused by an attempt to dereference address >> 0x6b6b6b6b6b6b6b6b >> at offset 0x270. I think this means the crash is caused by a use-after-free. > > Yeah, that's POISON_FREE, and that might explain why you see crashes > that others don't - you obviously have SLAB poisoning enabled. Jens > may not have. > > %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing > > struct dm_rq_target_io *tio = tio_from_request(rq); > struct request *clone = tio->clone; > int rw; > > if (!clone) { > rq_end_stats(tio->md, rq); > rw = rq_data_dir(rq); > if (!rq->q->mq_ops) > blk_end_request_all(rq, tio->error); > else > blk_mq_end_request(rq, tio->error); > rq_completed(tio->md, rw, false); > return; > } > > so it's the 'tio' pointer that has been free'd. But it's worth noting > that we did apparently successfully dereference "tio" earlier in that > dm_softirq_done() *without* getting the poison value, so what I think > might be going on is that the 'tio' thing gets free'd when the code > does the blk_end_request_all()/blk_mq_end_request() call. > > Which makes sense - that ends the lifetime of the request, which in > turn also ends the lifetime of the "tio_from_request()", no? > > So the fix may be as simple as just doing > > if (!clone) { > struct mapped_device *md = tio->md; > > rq_end_stats(md, rq); > ... > rq_completed(md, rw, false); > return; > } > > because the 'mapped_device' pointer hopefully is still valid, it's > just 'tio' that has been freed. > > Jens? Bart? Christoph? Somebody who knows this code should > double-check my thinking above. I don't actually know the tio > lifetimes, I'm just going by looking at how earlier accesses seemed to > be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, > for example). I think that is spot on.
With the request changes for CDBs, for non blk-mq, we now also carry the payload after the request. But since blk-mq never frees the request, the above use-after-free with poison will only happen for !mq. Caching 'md' and avoiding a dereference of 'tio' after calling blk_end_request_all() will likely fix it. Bart, can you test that? -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Fri, Feb 24, 2017 at 9:39 AM, Bart Van Assche wrote: > > So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b > at offset 0x270. I think this means the crash is caused by a use-after-free. Yeah, that's POISON_FREE, and that might explain why you see crashes that others don't - you obviously have SLAB poisoning enabled. Jens may not have. %rdi is "struct mapped_device *md", which came from dm_softirq_done() doing struct dm_rq_target_io *tio = tio_from_request(rq); struct request *clone = tio->clone; int rw; if (!clone) { rq_end_stats(tio->md, rq); rw = rq_data_dir(rq); if (!rq->q->mq_ops) blk_end_request_all(rq, tio->error); else blk_mq_end_request(rq, tio->error); rq_completed(tio->md, rw, false); return; } so it's the 'tio' pointer that has been free'd. But it's worth noting that we did apparently successfully dereference "tio" earlier in that dm_softirq_done() *without* getting the poison value, so what I think might be going on is that the 'tio' thing gets free'd when the code does the blk_end_request_all()/blk_mq_end_request() call. Which makes sense - that ends the lifetime of the request, which in turn also ends the lifetime of the "tio_from_request()", no? So the fix may be as simple as just doing if (!clone) { struct mapped_device *md = tio->md; rq_end_stats(md, rq); ... rq_completed(md, rw, false); return; } because the 'mapped_device' pointer hopefully is still valid, it's just 'tio' that has been freed. Jens? Bart? Christoph? Somebody who knows this code should double-check my thinking above. I don't actually know the tio lifetimes, I'm just going by looking at how earlier accesses seemed to be fine (eg that "tio->clone" got us NULL, not a POISON_FREE pointer, for example). Linus
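The lifetime bug Linus describes, and his fix of caching `md` before ending the request, reduce to the following user-space sketch. All the types and helpers here are simplified stand-ins for the dm/block code, not the real kernel APIs:

```c
#include <stdlib.h>

struct mapped_device { int completed; };

/* Simplified stand-in for struct dm_rq_target_io. */
struct tio {
    struct mapped_device *md;
};

/* Stands in for blk_end_request_all()/blk_mq_end_request(): ending
 * the request also ends the lifetime of the tio behind it. */
static void end_request(struct tio *tio)
{
    free(tio);
}

static void rq_completed(struct mapped_device *md)
{
    md->completed++;
}

/* The fixed completion path: read tio->md into a local before the
 * request (and with it the tio) is freed. */
static void softirq_done(struct tio *tio)
{
    struct mapped_device *md = tio->md;  /* cache before freeing */

    end_request(tio);   /* 'tio' must not be touched after this */
    rq_completed(md);   /* safe: uses the cached pointer */
}
```

The unfixed version would do `rq_completed(tio->md)` after `end_request(tio)`, which is exactly the freed-memory read that SLAB poisoning turns into a 0x6b6b... dereference.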
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/24/2017 10:39 AM, Bart Van Assche wrote: > On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote: >> On 02/20/2017 09:16 AM, Bart Van Assche wrote: >>> On 02/19/2017 11:35 PM, Christoph Hellwig wrote: On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote: > That said, we will look into this again, of course. Christoph, any idea? No idea really - this seems so far away from the code touched, and there are no obvious signs for a memory scramble from another object touched that I think if it really bisects down to that issue it must be a timing issue. But reading Bart's message again: Did you actually bisect it down to this commit? Or just test the whole tree? Between the 4.10-rc5 merge and all the block tree there might be a few more likely suspects like the scsi bdi lifetime fixes that James mentioned. >>> >>> Hello Christoph, >>> >>> As far as I know Jens does not rebase his trees so we can use the commit >>> date to check which patch went in when. From the first of Jan's bdi patches: >>> >>> CommitDate: Thu Feb 2 08:18:41 2017 -0700 >>> >>> So the bdi patches went in several days after I reported the general >>> protection >>> fault issue. >>> >>> In an e-mail of January 30th I wrote the following: "Running the srp-test >>> software against kernel 4.9.6 and kernel 4.10-rc5 went fine. With your >>> for-4.11/block branch (commit 400f73b23f457a) however I just ran into >>> the following warning: [ ... ]" That means that I did not hit the crash with >>> Jens' for-4.11/block branch but only with the for-next branch.
The patches >>> on Jens' for-next branch after that commit that were applied before I ran >>> my test are: >>> >>> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block >>> drivers/md/dm{,-mpath,-table}.[ch] >>> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in >>> __blk_rq_prep_clone >>> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of >>> struct request >>> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation >>> into bsg_setup_queue >>> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request >>> allocation to the owner of the request_queue >>> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for >>> extra command data >>> 5ea708d15a928f7a479987704203616d3274c03b block: simplify >>> blk_init_allocated_queue >>> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check >>> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into >>> for-4.11/rq-refactor >>> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable >>> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from >>> __blkdev_issue_zeroout >>> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size >>> and bdev_zone_size >>> >>> Do you see any patch in the above list that does not belong to the "split >>> scsi passthrough fields out of struct request" series and that could have >>> caused the reported behavior change? >> >> Bart, since you are the only one that can reproduce this, can you just bisect >> your way through that series? > > Hello Jens, > > Since Christoph also has access to IB hardware I will leave it to Christoph > to do the bisect. 
> Anyway, I just reproduced this crash with Linus' current tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq (see also https://github.com/bvanassche/srp-test):
>
> [ 1629.920553] general protection fault: [#1] SMP
> [ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G I 4.10.0-dbg+ #1
> [ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
> [ 1629.921316] RSP: 0018:c90001bdbda8 EFLAGS: 00010246
> [ 1629.921344] RAX: RBX: 6b6b6b6b6b6b6b6b RCX:
> [ 1629.921372] RDX: RSI: RDI: 6b6b6b6b6b6b6b6b
> [ 1629.921401] RBP: c90001bdbdc0 R08: 8803a3858d48 R09:
> [ 1629.921429] R10: R11: R12:
> [ 1629.921458] R13: R14: 81c05120 R15: 0004
> [ 1629.921489] FS: () GS:88046ef8() knlGS:
> [ 1629.921520] CS: 0010 DS: ES: CR0: 80050033
> [ 1629.921547] CR2: 7fb6324486b8 CR3: 01c0f000 CR4: 001406e0
> [ 1629.921576] Call Trace:
> [ 1629.921605] dm_softirq_done+0xe6/0x1e0 [dm_mod]
> [ 1629.921637] blk_done_softirq+0x88/0xa0
> [ 1629.921663] __do_softirq+0xba/0x4c0
> [ 1629.921744] run_ksoftirqd+0x1a/0x50
> [ 1629.921769] smpboot_thread_fn+0x123/0x1e0
> [ 1629.921797] kthread+0x107/0x140
> [ 1629.921944] ret_from_fork+0x2e/0x40
> [ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c>
Re: [GIT PULL] Block pull request for 4.11-rc1
On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote: > On 02/20/2017 09:16 AM, Bart Van Assche wrote: > > On 02/19/2017 11:35 PM, Christoph Hellwig wrote: > > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote: > > > > That said, we will look into this again, of course. Christoph, any idea? > > > > > > No idea really - this seems so far away from the code touched, and there > > > are no obvious signs for a memory scramble from another object touched > > > that I think if it really bisects down to that issue it must be a timing > > > issue. > > > > > > But reading Bart's message again: Did you actually bisect it down > > > to this commit? Or just test the whole tree? Between the 4.10-rc5 > > > merge and all the block tree there might be a few more likely suspects > > > like the scsi bdi lifetime fixes that James mentioned. > > > > Hello Christoph, > > > > As far as I know Jens does not rebase his trees so we can use the commit > > date to check which patch went in when. From the first of Jan's bdi patches: > > > > CommitDate: Thu Feb 2 08:18:41 2017 -0700 > > > > So the bdi patches went in several days after I reported the general > > protection > > fault issue. > > > > In an e-mail of January 30th I wrote the following: "Running the srp-test > > software against kernel 4.9.6 and kernel 4.10-rc5 went fine. With your > > for-4.11/block branch (commit 400f73b23f457a) however I just ran into > > the following warning: [ ... ]" That means that I did not hit the crash with > > Jens' for-4.11/block branch but only with the for-next branch.
The patches > > on Jens' for-next branch after that commit that were applied before I ran > > my test are: > > > > $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block > > drivers/md/dm{,-mpath,-table}.[ch] > > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in > > __blk_rq_prep_clone > > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of > > struct request > > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation > > into bsg_setup_queue > > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request > > allocation to the owner of the request_queue > > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for > > extra command data > > 5ea708d15a928f7a479987704203616d3274c03b block: simplify > > blk_init_allocated_queue > > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check > > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into > > for-4.11/rq-refactor > > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable > > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from > > __blkdev_issue_zeroout > > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size > > and bdev_zone_size > > > > Do you see any patch in the above list that does not belong to the "split > > scsi passthrough fields out of struct request" series and that could have > > caused the reported behavior change? > > Bart, since you are the only one that can reproduce this, can you just bisect > your way through that series? Hello Jens, Since Christoph also has access to IB hardware I will leave it to Christoph to do the bisect. 
Anyway, I just reproduced this crash with Linus' current tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq (see also https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G I 4.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:c90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: RBX: 6b6b6b6b6b6b6b6b RCX:
[ 1629.921372] RDX: RSI: RDI: 6b6b6b6b6b6b6b6b
[ 1629.921401] RBP: c90001bdbdc0 R08: 8803a3858d48 R09:
[ 1629.921429] R10: R11: R12:
[ 1629.921458] R13: R14: 81c05120 R15: 0004
[ 1629.921489] FS: () GS:88046ef8() knlGS:
[ 1629.921520] CS: 0010 DS: ES: CR0: 80050033
[ 1629.921547] CR2: 7fb6324486b8 CR3: 01c0f000 CR4: 001406e0
[ 1629.921576] Call Trace:
[ 1629.921605] dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637] blk_done_softirq+0x88/0xa0
[ 1629.921663] __do_softirq+0xba/0x4c0
[ 1629.921744] run_ksoftirqd+0x1a/0x50
[ 1629.921769] smpboot_thread_fn+0x123/0x1e0
[ 1629.921797] kthread+0x107/0x140
[ 1629.921944] ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod]
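The repeated 0x6b6b6b6b6b6b6b6b in RBX and RDI above is the tell: free poisoning fills objects with the byte 0x6b, so that is what a pointer loaded from freed slab memory looks like. A minimal user-space illustration:

```c
#include <string.h>

/* Simulate reading a pointer field out of an object that slab
 * poisoning has filled with the free-poison byte 0x6b. */
static unsigned long poisoned_pointer_value(void)
{
    void *p;

    memset(&p, 0x6b, sizeof(p));
    /* On a 64-bit machine this reads back as 0x6b6b6b6b6b6b6b6b,
     * the value sitting in RBX/RDI in the oops above. */
    return (unsigned long)p;
}
```

Dereferencing that value at offset 0x270 gives exactly the faulting address Bart decoded earlier in the thread.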
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 1:50 PM, Markus Trippelsdorf wrote: > > But what about e.g. SATA SSDs? Wouldn't they be better off without any > scheduler? > So perhaps setting "none" for queue/rotational==0 and mq-deadline for > spinning drives automatically in the sq blk-mq case? Jens already said that the merging advantage can outweigh the costs, but he didn't actually talk much about it. The scheduler advantage can outweigh the costs of running a scheduler by an absolutely _huge_ amount. An SSD isn't zero-cost, and each command tends to have some fixed overhead on the controller, and pretty much all SSD's heavily prefer fewer large requests over lots of tiny ones. There are also fairness/latency issues that tend to very heavily favor having an actual scheduler, ie reads want to be scheduled before writes on an SSD (within reason) in order to make latency better. Ten years ago, there were lots of people who argued that you don't want to do scheduling for SSD's, because SSD's were so fast that you only added overhead. Nobody really believes that fairytale any more. So you might have particular loads that look better with noop, but they will be rare and far between. Try it, by all means, and if it works for you, set it in your udev rules. The main place where a noop scheduler currently might make sense is likely for a ramdisk, but quite frankly, since the main real usecase for a ram-disk tends to be to make it easy to profile and find the bottlenecks for performance analysis (for emulating future "infinitely fast" media), even that isn't true - using noop there defeats the whole purpose. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 2017.02.22 at 11:44 -0700, Jens Axboe wrote: > On 02/22/2017 11:42 AM, Linus Torvalds wrote: > > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds > > wrote: > >> > >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR > >> USER? > > > > Basically, I'm pushing back on config options that I can't personally > > even sanely answer. > > I got that much, and I don't disagree on that part. > > > If it's a config option about "do I have a particular piece of > > hardware", it makes sense. But these new ones were just complete > > garbage. > > > > The whole "default IO scheduler" thing is a disease. We should stop > > making up these shit schedulers and then say "we don't know which one > > works best for you". > > > > All it does is encourage developers to make shortcuts and create crap > > that isn't generically useful, and then blame the user and say "well, > > you should have picked a different scheduler" when they say "this does > > not work well for me". > > > > We have had too many of those kinds of broken choices. And when the > > new Kconfig options get so confusing and so esoteric that I go "Hmm, I > > have no idea if my hardware does a single queue or not", I put my foot > > down. > > > > When the IO scheduler questions were about a generic IO scheduler for > > everything, I can kind of understand them. I think it was still a > > mistake (for the reasons outlined above), but at least it was a > > comprehensible question to ask. > > > > But when it gets to "what should I do about a single-queue version of > > a MQ scheduler", the question is no longer even remotely sensible. The > > question should simply NOT EXIST. There is no possible valid reason to > > ask that kind of crap. > > OK, so here's what I'll do: > > 1) We'll kill the default scheduler choices. sq blk-mq will default to >mq-deadline, mq blk-mq will default to "none" (at least for now, until >the new scheduler is done). But what about e.g. SATA SSDs?
Wouldn't they be better off without any scheduler? So perhaps setting "none" for queue/rotational==0 and mq-deadline for spinning drives automatically in the sq blk-mq case? -- Markus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 02:50 PM, Markus Trippelsdorf wrote: > On 2017.02.22 at 11:44 -0700, Jens Axboe wrote: >> On 02/22/2017 11:42 AM, Linus Torvalds wrote: >>> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds >>> wrote: And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER? >>> >>> Basically, I'm pushing back on config options that I can't personally >>> even sanely answer. >> >> I got that much, and I don't disagree on that part. >> >>> If it's a config option about "do I have a particular piece of >>> hardware", it makes sense. But these new ones were just complete >>> garbage. >>> >>> The whole "default IO scheduler" thing is a disease. We should stop >>> making up these shit schedulers and then say "we don't know which one >>> works best for you". >>> >>> All it does is encourage developers to make shortcuts and create crap >>> that isn't generically useful, and then blame the user and say "well, >>> you should have picked a different scheduler" when they say "this does >>> not work well for me". >>> >>> We have had too many of those kinds of broken choices. And when the >>> new Kconfig options get so confusing and so esoteric that I go "Hmm, I >>> have no idea if my hardware does a single queue or not", I put my foot >>> down. >>> >>> When the IO scheduler questions were about a generic IO scheduler for >>> everything, I can kind of understand them. I think it was still a >>> mistake (for the reasons outlined above), but at least it was a >>> comprehensible question to ask. >>> >>> But when it gets to "what should I do about a single-queue version of >>> a MQ scheduler", the question is no longer even remotely sensible. The >>> question should simply NOT EXIST. There is no possible valid reason to >>> ask that kind of crap. >> >> OK, so here's what I'll do: >> >> 1) We'll kill the default scheduler choices. sq blk-mq will default to >>mq-deadline, mq blk-mq will default to "none" (at least for now, until >>the new scheduler is done).
> > But what about e.g. SATA SSDs? Wouldn't they be better off without any > scheduler? Marginal. If they are single queue, using a basic scheduler like deadline isn't going to be a significant amount of overhead. In some cases they are going to be better off, due to better merging. In the worst case, overhead is slightly higher. Net result is positive, I'd say. > So perhaps setting "none" for queue/rotational==0 and mq-deadline for > spinning drives automatically in the sq blk-mq case? You can do that through a udev rule. The kernel doesn't know if the device is rotational or not when we set up the scheduler. So we'd either have to add code to do that, or simply just do it with a udev rule. I'd prefer the latter. -- Jens Axboe
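The udev approach Jens suggests could look roughly like the rules below. This is a sketch, not a tested rules file: the file name, the `sd[a-z]` kernel-name match, and the available scheduler names are assumptions to adapt to your system:

```
# Hypothetical /etc/udev/rules.d/60-io-scheduler.rules
# Spinning disk on single-queue blk-mq: use mq-deadline.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
# Non-rotational (SSD): run without a scheduler.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
```

The same sysfs attribute can be inspected and changed at runtime, e.g. via /sys/block/sda/queue/scheduler, which is the "user visible ABI" Jens refers to later in the thread.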
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 12:04 PM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe wrote: >> On 02/22/2017 11:56 AM, Linus Torvalds wrote: >> >> OK, so here's what I'll do: >> >> 1) We'll kill the default scheduler choices. sq blk-mq will default to >>mq-deadline, mq blk-mq will default to "none" (at least for now, until >>the new scheduler is done). >> 2) The individual schedulers will be y/m/n selectable, just like any >>other driver. > > Yes. That makes sense as options. I can (or, perhaps even more > importantly, a distro can) answer those kinds of questions. Someone misspelled pacman: parman (PARMAN) [N/m/y] (NEW) ? There is no help available for this option. Or I think it's pacman, because I have no idea what else it could be. I'm going to say N. -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe wrote: > On 02/22/2017 11:56 AM, Linus Torvalds wrote: > > OK, so here's what I'll do: > > 1) We'll kill the default scheduler choices. sq blk-mq will default to >mq-deadline, mq blk-mq will default to "none" (at least for now, until >the new scheduler is done). > 2) The individual schedulers will be y/m/n selectable, just like any >other driver. Yes. That makes sense as options. I can (or, perhaps even more importantly, a distro can) answer those kinds of questions. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe wrote: >> >> It's that simple. > > No, it's not that simple at all. Fact is, some optimizations make sense > for some workloads, and some do not. Are you even listening? I'm saying no user can ever give a sane answer to your question. The question is insane and wrong. I already said you can have a dynamic configuration (and maybe even an automatic heuristic - like saying that a ramdisk gets NOOP by default, real hardware does not). But asking a user at kernel config time for a default is insane. If *you* cannot answer it, then the user sure as hell cannot. Other configuration questions have problems too, but at least the question about "should I support ext4" is something a user (or distro) can sanely answer. So your comparisons are pure bullshit. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe wrote: > > The fact is that we have two different sets, until we can yank > the old ones. So I can't just ask one question, since the sets > aren't identical. Bullshit. I'm saying: rip out the question ENTIRELY. For *both* cases. If you cannot yourself give a good answer, then there's no f*cking way any user can give a good answer. So asking the question is totally and utterly pointless. All it means is that different people will try different (in random ways) configurations, and the end result is random crap. So get rid of those questions. Pick a default, and live with it. And if people complain about performance, fix the performance issue. It's that simple. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 11:42 AM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds > wrote: >> >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR >> USER? > > Basically, I'm pushing back on config options that I can't personally > even sanely answer. I got that much, and I don't disagree on that part. > If it's a config option about "do I have a particular piece of > hardware", it makes sense. But these new ones were just complete > garbage. > > The whole "default IO scheduler" thing is a disease. We should stop > making up these shit schedulers and then say "we don't know which one > works best for you". > > All it does is encourage developers to make shortcuts and create crap > that isn't generically useful, and then blame the user and say "well, > you should have picked a different scheduler" when they say "this does > not work well for me". > > We have had too many of those kinds of broken choices. And when the > new Kconfig options get so confusing and so esoteric that I go "Hmm, I > have no idea if my hardware does a single queue or not", I put my foot > down. > > When the IO scheduler questions were about a generic IO scheduler for > everything, I can kind of understand them. I think it was still a > mistake (for the reasons outlined above), but at least it was a > comprehensible question to ask. > > But when it gets to "what should I do about a single-queue version of > a MQ scheduler", the question is no longer even remotely sensible. The > question should simply NOT EXIST. There is no possible valid reason to > ask that kind of crap. OK, so here's what I'll do: 1) We'll kill the default scheduler choices. sq blk-mq will default to mq-deadline, mq blk-mq will default to "none" (at least for now, until the new scheduler is done). 2) The individual schedulers will be y/m/n selectable, just like any other driver. I hope that works for everyone. -- Jens Axboe
Re: [GIT PULL] Block pull request for 4.11-rc1
On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds wrote: > > And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER? Basically, I'm pushing back on config options that I can't personally even sanely answer. If it's a config option about "do I have a particular piece of hardware", it makes sense. But these new ones were just complete garbage. The whole "default IO scheduler" thing is a disease. We should stop making up these shit schedulers and then say "we don't know which one works best for you". All it does is encourage developers to make shortcuts and create crap that isn't generically useful, and then blame the user and say "well, you should have picked a different scheduler" when they say "this does not work well for me". We have had too many of those kinds of broken choices. And when the new Kconfig options get so confusing and so esoteric that I go "Hmm, I have no idea if my hardware does a single queue or not", I put my foot down. When the IO scheduler questions were about a generic IO scheduler for everything, I can kind of understand them. I think it was still a mistake (for the reasons outlined above), but at least it was a comprehensible question to ask. But when it gets to "what should I do about a single-queue version of a MQ scheduler", the question is no longer even remotely sensible. The question should simply NOT EXIST. There is no possible valid reason to ask that kind of crap. Linus
Re: [GIT PULL] Block pull request for 4.11-rc1
On 02/22/2017 11:26 AM, Linus Torvalds wrote: > On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe wrote: >> >> What do you mean by "the regular IO scheduler"? These are different >> schedulers. > > Not to the user they aren't. > > If the user already answered once about the IO schedulers, we damn > well shouldn't ask again about another small implementation detail. > > How hard is this to understand? You're asking users stupid things. The fact is that we have two different sets, until we can yank the old ones. So I can't just ask one question, since the sets aren't identical. This IS confusing to the user, and it's an artifact of the situation that we have where we are phasing out the old IO path and switching to blk-mq. I don't want to user to know about blk-mq, I just want it to be what everything runs on. But until that happens, and it is happening, we are going to be stuck with that situation. We have this exposed in other places, too. Like for dm, and for SCSI. Not a perfect situation, but something that WILL go away eventually. > It's not just about the wording. It's a fundamental issue. These > questions are about internal implementation details. They make no > sense to a user. They don't even make sense to a kernel developer, for > chrissake! > > Don't make the kconfig mess worse. This "we can't make good defaults > in the kernel, so ask users about random things that they cannot > possibly answer" model is not an acceptable model. There are good defaults! mq single-queue should default to mq-deadline, and mq multi-queue should default to "none" for now. If you feel that strongly about it (and I'm guessing you do, judging by the speed typing and generally annoyed demeanor), then by all means, let's kill the config entries and I'll just hardwire the defaults. The config entries were implemented similarly to the old schedulers, and each scheduler is selectable individually. I'd greatly prefer just improving the wording so it makes more sense.
> If the new schedulers aren't better than NOOP, they shouldn't exist.
> And if you want people to be able to test, they should be dynamic.

They are dynamic! You can build them as modules, you can switch at
runtime. Just like we have always been able to. I can't make it more
dynamic than that. We're reusing the same internal infrastructure for
that, AND the user visible ABI for checking what is available, and
setting a new one.

> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE
> POOR USER?

BECAUSE IT'S POLICY! Fact of the matter is, if I just default to what we
had before, it'd all be running with none. In a few years time, if I'm
lucky, someone will have shipped udev rules setting this appropriately.
If I ask the question, we'll get testing NOW. People will run with the
default set.

-- 
Jens Axboe
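[Editor's note: the runtime switching described above goes through the
usual block sysfs ABI. The sketch below is illustrative, not from the
thread; the device name sda is an assumption, and the small helper just
parses the bracketed (active) entry out of the scheduler file's
contents.]

```shell
# The block layer marks the active scheduler with brackets in
# /sys/block/<dev>/queue/scheduler, e.g. "[none] mq-deadline".
# This helper extracts the bracketed entry from such a string.
current_sched() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Typical usage against a real device (sda is an assumption):
#   cat /sys/block/sda/queue/scheduler            # list what is available
#   echo mq-deadline > /sys/block/sda/queue/scheduler
#   current_sched "$(cat /sys/block/sda/queue/scheduler)"

current_sched "[none] mq-deadline"
```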
Re: [GIT PULL] Block pull request for-4.11-rc1
On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe wrote:
>
> What do you mean by "the regular IO scheduler"? These are different
> schedulers.

Not to the user they aren't.

If the user already answered once about the IO schedulers, we damn well
shouldn't ask again about another small implementation detail.

How hard is this to understand? You're asking users stupid things.

It's not just about the wording. It's a fundamental issue. These
questions are about internal implementation details. They make no sense
to a user. They don't even make sense to a kernel developer, for
chrissake!

Don't make the kconfig mess worse. This "we can't make good defaults in
the kernel, so ask users about random things that they cannot possibly
answer" model is not an acceptable model.

If the new schedulers aren't better than NOOP, they shouldn't exist. And
if you want people to be able to test, they should be dynamic.

And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR
USER?

It's really that simple.

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/21/2017 04:23 PM, Linus Torvalds wrote:
> On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe wrote:
>>
>> But under a device managed by blk-mq, that device exposes a number of
>> hardware queues. For older style devices, that number is typically 1
>> (single queue).
>
> ... but why would this ever be different from the normal IO scheduler?

Because we have a different set of schedulers for blk-mq, different than
the legacy path. mq-deadline is a basic port that will work fine with
rotational storage, but it's not going to be a good choice for NVMe
because of scalability issues. We'll have BFQ on the blk-mq side,
catering to the needs of those folks that currently rely on the richer
feature set that CFQ supports. We've continually been working towards
getting rid of the legacy IO path, and its set of schedulers. So if it's
any consolation, those options will go away in the future.

> IOW, what makes single-queue mq scheduling so special that
>
> (a) it needs its own config option
>
> (b) it is different from just the regular IO scheduler in the first
> place?
>
> So the whole thing stinks. The fact that it then has an
> incomprehensible config option seems to be just gravy on top of the
> crap.

What do you mean by "the regular IO scheduler"? These are different
schedulers. As explained above, single-queue mq devices generally DO
want mq-deadline. For multi-queue mq devices, we don't have a good
choice right now, so we retain the current behavior (that we've had
since blk-mq was introduced in 3.13) of NOT doing any IO scheduling for
them. If you do want scheduling for them, set the option, or configure
udev to make the right choice for you. I agree the wording isn't great,
and we can improve that. But I do think that the current choices make
sense.

>> "none" just means that we don't have a scheduler attached.
>
> .. which makes no sense to me in the first place.
> People used to try to convince us that doing IO schedulers was a
> mistake, because modern disk hardware did a better job than we could
> do in software.
>
> Those people were full of crap. The regular IO scheduler used to have
> a "NONE" option too. Maybe it even still has one, but only insane
> people actually use it.
>
> Why is the MQ stuff magically so different that NONE would make sense
> at all?

I was never one of those people, and I've always been a strong advocate
for imposing scheduling to keep devices in check. The regular IO
scheduler pool includes "noop", which is probably the one you are
thinking of. That one is a bit different than the new "none" option for
blk-mq, in that it does do insertion sorts and it does do merges. "none"
does some merging, but only where it happens to make sense. There's no
insertion sorting.

> And equally importantly: why do we _ask_ people these issues? Is this
> some kind of sick "cover your ass" thing, where you can say "well, I
> asked about it", when inevitably the choice ends up being the wrong
> one?
>
> We have too damn many Kconfig options as-is, I'm trying to push back
> on them. These two options seem fundamentally broken and stupid.
>
> The "we have no good idea, so let's add a Kconfig option" seems like a
> broken excuse for these things existing.
>
> So why ask this question in the first place?
>
> Is there any possible reason why "NONE" is a good option at all? And
> if it is the _only_ option (because no other better choice exists), it
> damn well shouldn't be a kconfig option!

I'm all for NOT asking questions, and not providing tunables. That's
generally how I do write code. See the blk-wbt stuff, for instance, that
basically just has one tunable that's set sanely by default, and we
figure out the rest. I don't want to regress performance of blk-mq
devices by attaching mq-deadline to them. When we do have a sane
scheduler choice, we'll make that the default.
And yes, maybe we can remove the Kconfig option at that point. For single queue devices, we could kill the option. But we're expecting bfq-mq for 4.12, and we'll want to have the option at that point unless you want to rely solely on runtime setting of the scheduler through udev or by the sysadmin. -- Jens Axboe
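[Editor's note: the udev route mentioned above can look roughly like the
fragment below. This is a hedged sketch, not something from the thread:
the rule file name is hypothetical, and the matching criteria are
assumptions you would adapt to your own devices.]

```
# /etc/udev/rules.d/60-io-scheduler.rules (hypothetical file name)
# Attach mq-deadline to rotational SCSI disks as they appear.
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
  ATTR{queue/scheduler}="mq-deadline"
```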
Re: [GIT PULL] Block pull request for-4.11-rc1
On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe wrote:
>
> But under a device managed by blk-mq, that device exposes a number of
> hardware queues. For older style devices, that number is typically 1
> (single queue).

... but why would this ever be different from the normal IO scheduler?

IOW, what makes single-queue mq scheduling so special that

(a) it needs its own config option

(b) it is different from just the regular IO scheduler in the first
place?

So the whole thing stinks. The fact that it then has an incomprehensible
config option seems to be just gravy on top of the crap.

> "none" just means that we don't have a scheduler attached.

.. which makes no sense to me in the first place.

People used to try to convince us that doing IO schedulers was a
mistake, because modern disk hardware did a better job than we could do
in software.

Those people were full of crap. The regular IO scheduler used to have a
"NONE" option too. Maybe it even still has one, but only insane people
actually use it.

Why is the MQ stuff magically so different that NONE would make sense at
all?

And equally importantly: why do we _ask_ people these issues? Is this
some kind of sick "cover your ass" thing, where you can say "well, I
asked about it", when inevitably the choice ends up being the wrong one?

We have too damn many Kconfig options as-is, I'm trying to push back on
them. These two options seem fundamentally broken and stupid.

The "we have no good idea, so let's add a Kconfig option" seems like a
broken excuse for these things existing.

So why ask this question in the first place?

Is there any possible reason why "NONE" is a good option at all? And if
it is the _only_ option (because no other better choice exists), it damn
well shouldn't be a kconfig option!

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/21/2017 04:02 PM, Linus Torvalds wrote:
> Hmm. The new config options are incomprehensible and their help
> messages don't actually help.
>
> So can you fill in the blanks on what
>
>     Default single-queue blk-mq I/O scheduler
>     Default multi-queue blk-mq I/O scheduler
>
> config options mean, and why they default to none?
>
> The config phase of the kernel is one of the worst parts of the whole
> project, and adding these kinds of random and incomprehensible config
> options does *not* help.

I'll try and see if I can come up with some better sounding/reading
explanations. But under a device managed by blk-mq, that device exposes
a number of hardware queues. For older style devices, that number is
typically 1 (single queue). This is true for most SCSI devices that are
run by scsi-mq, which is often hosting rotational storage. Faster
devices, like nvme, expose a lot more hardware queues (multi-queue).
Hence the distinction between having a scheduler attached for
single-queue devices, and for multi-queue devices.

For rotational devices, we'll want to default to something like
mq-deadline, and I actually thought that was the default already. It
should be (see below). For multi-queue devices, we'll want to initially
default to "none", and then later attach a properly multiqueue
scheduler, when we have it (it's still in development). "none" just
means that we don't have a scheduler attached.

In essence, we want to default to having a sane IO scheduler attached
depending on device class. For single-queue devices, that's deadline for
now. For multi-queue, we'll want to wait until we have something that
scales really well. It's not that easy to present this information in a
user grokkable fashion, since most people would not know the difference
between the two. I'll send the below as a real patch, and also ponder
how we can improve the Kconfig text.
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 0715ce93daef..f6144c5d7c70 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -75,7 +75,7 @@ config MQ_IOSCHED_NONE
 
 choice
 	prompt "Default single-queue blk-mq I/O scheduler"
-	default DEFAULT_SQ_NONE
+	default DEFAULT_SQ_DEADLINE if MQ_IOSCHED_DEADLINE=y
 	help
 	  Select the I/O scheduler which will be used by default for blk-mq
 	  managed block devices with a single queue.

-- 
Jens Axboe
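[Editor's note: as a quick way to see which camp a device falls into,
blk-mq exposes one directory per hardware queue under sysfs, so counting
those directories gives the queue count. This is an illustrative sketch,
not from the thread; the device names sda and nvme0n1 are assumptions.]

```shell
# Count the hardware queues blk-mq exposes for a device under
# /sys/block/<dev>/mq. A SATA disk will typically show 1, while an
# NVMe device shows one queue per CPU or so. Prints 0 if the device
# does not exist or is not managed by blk-mq.
nr_hw_queues() {
    ls "/sys/block/$1/mq" 2>/dev/null | wc -l
}

nr_hw_queues sda       # single-queue device
nr_hw_queues nvme0n1   # multi-queue device
```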
Re: [GIT PULL] Block pull request for-4.11-rc1
Hmm. The new config options are incomprehensible and their help messages
don't actually help.

So can you fill in the blanks on what

    Default single-queue blk-mq I/O scheduler
    Default multi-queue blk-mq I/O scheduler

config options mean, and why they default to none?

The config phase of the kernel is one of the worst parts of the whole
project, and adding these kinds of random and incomprehensible config
options does *not* help.

Linus
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/20/2017 08:32 AM, Jens Axboe wrote:
> Bart, since you are the only one that can reproduce this, can you just
> bisect your way through that series?

Hello Jens,

I will do that as soon as I'm back in the office (later this week).

Bart.
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
>> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>>> That said, we will look into this again, of course. Christoph, any
>>> idea?
>>
>> No idea really - this seems so far away from the code touched, and
>> there are no obvious signs for a memory scramble from another object
>> touched, so I think if it really bisects down to that issue it must
>> be a timing issue.
>>
>> But reading Bart's message again: Did you actually bisect it down
>> to this commit? Or just test the whole tree? Between the 4.10-rc5
>> merge and all the block tree there might be a few more likely
>> suspects like the scsi bdi lifetime fixes that James mentioned.
>
> Hello Christoph,
>
> As far as I know Jens does not rebase his trees so we can use the
> commit date to check which patch went in when. From the first of Jan's
> bdi patches:
>
> CommitDate: Thu Feb 2 08:18:41 2017 -0700
>
> So the bdi patches went in several days after I reported the general
> protection fault issue.
>
> In an e-mail of January 30th I wrote the following: "Running the
> srp-test software against kernel 4.9.6 and kernel 4.10-rc5 went fine.
> With your for-4.11/block branch (commit 400f73b23f457a) however I just
> ran into the following warning: [ ... ]" That means that I did not hit
> the crash with Jens' for-4.11/block branch but only with the for-next
> branch.
> The patches on Jens' for-next branch after that commit that were
> applied before I ran my test are:
>
> $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
>
> Do you see any patch in the above list that does not belong to the
> "split scsi passthrough fields out of struct request" series and that
> could have caused the reported behavior change?

Bart, since you are the only one that can reproduce this, can you just
bisect your way through that series?

-- 
Jens Axboe
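[Editor's note: walking a commit range like the one quoted above can be
automated with `git bisect run`. The sketch below is purely synthetic,
for illustration: the throwaway repo and the failing condition stand in
for the real tree and for Bart's srp-test run against each checkout; it
only demonstrates the mechanics.]

```shell
# Build a throwaway repo of 7 commits where the "bug" appears at
# commit 5, then let `git bisect run` find the first bad commit.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q .
git config user.email you@example.com
git config user.name "example"
for i in 1 2 3 4 5 6 7; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
root=$(git rev-list HEAD | tail -n 1)
git bisect start HEAD "$root" > /dev/null    # HEAD is bad, root is good
# The test command: exit 0 means good, non-zero means bad.
# Here "bad" simply means the state counter has reached 5.
git bisect run sh -c '[ "$(cat state)" -lt 5 ]' > /dev/null
git log -1 --format=%s refs/bisect/bad       # -> commit 5
```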
RE: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
>> That said, we will look into this again, of course. Christoph, any
>> idea?
>
> No idea really - this seems so far away from the code touched, and
> there are no obvious signs for a memory scramble from another object
> touched, so I think if it really bisects down to that issue it must be
> a timing issue.
>
> But reading Bart's message again: Did you actually bisect it down
> to this commit? Or just test the whole tree? Between the 4.10-rc5
> merge and all the block tree there might be a few more likely suspects
> like the scsi bdi lifetime fixes that James mentioned.

Hello Christoph,

As far as I know Jens does not rebase his trees so we can use the commit
date to check which patch went in when. From the first of Jan's bdi
patches:

CommitDate: Thu Feb 2 08:18:41 2017 -0700

So the bdi patches went in several days after I reported the general
protection fault issue.

In an e-mail of January 30th I wrote the following: "Running the
srp-test software against kernel 4.9.6 and kernel 4.10-rc5 went fine.
With your for-4.11/block branch (commit 400f73b23f457a) however I just
ran into the following warning: [ ... ]" That means that I did not hit
the crash with Jens' for-4.11/block branch but only with the for-next
branch.
The patches on Jens' for-next branch after that commit that were applied
before I ran my test are:

$ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size

Do you see any patch in the above list that does not belong to the
"split scsi passthrough fields out of struct request" series and that
could have caused the reported behavior change?

Thanks,

Bart.
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 07:59 PM, Jens Axboe wrote:
> On 02/19/2017 07:12 PM, James Bottomley wrote:
>> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>>> - Removal of the BLOCK_PC support in struct request, and
>>>>>   refactoring of carrying SCSI payloads in the block layer. This
>>>>>   cleans up the code nicely, and enables us to kill the SCSI
>>>>>   specific parts of struct request, shrinking it down nicely.
>>>>>   From Christoph mainly, with help from Hannes.
>>>>
>>>> Hello Jens, Christoph and Mike,
>>>>
>>>> Is anyone working on a fix for the regression introduced by the
>>>> BLOCK_PC removal changes (general protection fault) that I had
>>>> reported three weeks ago? See also
>>>> https://www.spinics.net/lists/raid/msg55494.html
>>>
>>> I don't think that's a regression in this series, it just triggers
>>> more easily with this series. The BLOCK_PC removal fixes aren't
>>> touching device life times at all.
>>>
>>> That said, we will look into this again, of course. Christoph, any
>>> idea?
>>
>> We could do with tracing the bdi removal failure issue fingered both
>> by the block xfstests and Omar. It's something to do with this set of
>> commits
>>
>>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>>   problems from Jan and Dan.
>>
>> But no-one has actually root caused it yet.
>
> The above set from Jan and Dan fixed one set of issues around this,
> and the reproducer test case was happy as well. But we've recently
> found that there are still corner cases that aren't happy, Omar
> reported that separately on Friday. So there will be a followup set
> for that, hopefully intersecting with the issue that Bart reported.

Forgot to mention, that these cases exist in 4.0 and 4.6 as well, so
neither are a new problem with this series. The fixes from Jan and Dan
didn't close all of them.

-- 
Jens Axboe
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 07:12 PM, James Bottomley wrote:
> On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote:
>> On 02/19/2017 06:09 PM, Bart Van Assche wrote:
>>> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>>>> - Removal of the BLOCK_PC support in struct request, and
>>>>   refactoring of carrying SCSI payloads in the block layer. This
>>>>   cleans up the code nicely, and enables us to kill the SCSI
>>>>   specific parts of struct request, shrinking it down nicely. From
>>>>   Christoph mainly, with help from Hannes.
>>>
>>> Hello Jens, Christoph and Mike,
>>>
>>> Is anyone working on a fix for the regression introduced by the
>>> BLOCK_PC removal changes (general protection fault) that I had
>>> reported three weeks ago? See also
>>> https://www.spinics.net/lists/raid/msg55494.html
>>
>> I don't think that's a regression in this series, it just triggers
>> more easily with this series. The BLOCK_PC removal fixes aren't
>> touching device life times at all.
>>
>> That said, we will look into this again, of course. Christoph, any
>> idea?
>
> We could do with tracing the bdi removal failure issue fingered both
> by the block xfstests and Omar. It's something to do with this set of
> commits
>
>> - Fixes for duplicate bdi registrations and bdi/queue life time
>>   problems from Jan and Dan.
>
> But no-one has actually root caused it yet.

The above set from Jan and Dan fixed one set of issues around this, and
the reproducer test case was happy as well. But we've recently found
that there are still corner cases that aren't happy, Omar reported that
separately on Friday. So there will be a followup set for that,
hopefully intersecting with the issue that Bart reported.

-- 
Jens Axboe
Re: [GIT PULL] Block pull request for-4.11-rc1
On Sun, 2017-02-19 at 18:15 -0700, Jens Axboe wrote: > On 02/19/2017 06:09 PM, Bart Van Assche wrote: > > On 02/19/2017 04:11 PM, Jens Axboe wrote: > > > - Removal of the BLOCK_PC support in struct request, and > > > refactoring of > > > carrying SCSI payloads in the block layer. This cleans up the > > > code > > > nicely, and enables us to kill the SCSI specific parts of > > > struct > > > request, shrinking it down nicely. From Christoph mainly, with > > > help > > > from Hannes. > > > > Hello Jens, Christoph and Mike, > > > > Is anyone working on a fix for the regression introduced by the > > BLOCK_PC removal changes (general protection fault) that I had > > reported three weeks ago? See also > > https://www.spinics.net/lists/raid/msg55494.html > > I don't think that's a regression in this series, it just triggers > more easily with this series. The BLOCK_PC removal fixes aren't > touching device life times at all. > > That said, we will look into this again, of course. Christoph, any > idea? We could do with tracing the bdi removal failure issue fingered both by the block xfstests and Omar. It's something to do with this set of commits > - Fixes for duplicate bdi registrations and bdi/queue life time > problems from Jan and Dan. But no-one has actually root caused it yet. James
Re: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 06:09 PM, Bart Van Assche wrote:
> On 02/19/2017 04:11 PM, Jens Axboe wrote:
>> - Removal of the BLOCK_PC support in struct request, and refactoring
>>   of carrying SCSI payloads in the block layer. This cleans up the
>>   code nicely, and enables us to kill the SCSI specific parts of
>>   struct request, shrinking it down nicely. From Christoph mainly,
>>   with help from Hannes.
>
> Hello Jens, Christoph and Mike,
>
> Is anyone working on a fix for the regression introduced by the
> BLOCK_PC removal changes (general protection fault) that I had
> reported three weeks ago? See also
> https://www.spinics.net/lists/raid/msg55494.html

I don't think that's a regression in this series, it just triggers more
easily with this series. The BLOCK_PC removal fixes aren't touching
device life times at all.

That said, we will look into this again, of course. Christoph, any idea?

-- 
Jens Axboe
RE: [GIT PULL] Block pull request for-4.11-rc1
On 02/19/2017 04:11 PM, Jens Axboe wrote:
> - Removal of the BLOCK_PC support in struct request, and refactoring
>   of carrying SCSI payloads in the block layer. This cleans up the
>   code nicely, and enables us to kill the SCSI specific parts of
>   struct request, shrinking it down nicely. From Christoph mainly,
>   with help from Hannes.

Hello Jens, Christoph and Mike,

Is anyone working on a fix for the regression introduced by the BLOCK_PC
removal changes (general protection fault) that I had reported three
weeks ago? See also https://www.spinics.net/lists/raid/msg55494.html

Thanks,

Bart.
[GIT PULL] Block pull request for-4.11-rc1
Hi Linus,

This is the collected pull request for 4.11 for the block core and
drivers. It's really two different branches:

  for-4.11/block-signed
  for-4.11/next-signed

for-4.11/next exists because some of Christoph's patch series were based
on patches that were added after for-4.11/block was forked off, which
would have caused needless merge pain. So for-4.11/next was forked off
v4.10-rc5, with for-4.11/block pulled in. Feel free to pull each of
these in succession instead of this pre-merged branch.

Note that if you do opt for pulling the two branches, for-4.11/block
will throw a trivial merge conflict in block/blk-mq.c, where you need to
delete a function. for-4.11/next merges cleanly, HOWEVER, a commit that
was added in mainline since v4.10-rc5 (f2e767bb5d6e) adds a line of code
that is no longer valid with the changes in for-4.11/next. Hence, if you
do pull in separately, you'll want to --no-commit and edit:

  drivers/scsi/mpt3sas/mpt3sas_scsih.c

to fix up that one line, like I did in the for-4.11/linus-merge branch.

With that said, this pull request contains:

- blk-mq scheduling framework from me and Omar, with a port of the
  deadline scheduler for this framework. A port of BFQ from Paolo is in
  the works, and should be ready for 4.12.

- Various fixups and improvements to the above scheduling framework
  from Omar, Paolo, Bart, me, others.

- Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
  This allows us to export more information that helps debug hangs or
  performance issues, without cluttering or abusing the sysfs API.

- Fixes for the sbitmap code, the scalable bitmap code that was
  migrated from blk-mq, from Omar.

- Removal of the BLOCK_PC support in struct request, and refactoring of
  carrying SCSI payloads in the block layer. This cleans up the code
  nicely, and enables us to kill the SCSI specific parts of struct
  request, shrinking it down nicely. From Christoph mainly, with help
  from Hannes.
- Support for ranged discard requests and discard merging, also from
  Christoph.

- Support for OPAL in the block layer, and for NVMe as well. Mainly
  from Scott Bauer, with fixes/updates from various other folks.

- Error code fixup for gdrom from Christophe.

- cciss pci irq allocation cleanup from Christoph.

- Making the cdrom device operations read only, from Kees Cook.

- Fixes for duplicate bdi registrations and bdi/queue life time
  problems from Jan and Dan.

- Set of fixes and updates for lightnvm, from Matias and Javier.

- A few fixes for nbd from Josef, using idr to name devices and a
  workqueue deadlock fix on receive. Also marks Josef as the current
  maintainer of nbd.

- Fix from Josef, overwriting queue settings when the number of
  hardware queues is updated for a blk-mq device.

- NVMe fix from Keith, ensuring that we don't repeatedly mark an IO
  aborted, if we didn't end up aborting it.

- SG gap merging fix from Ming Lei for block.

- Loop fix also from Ming, fixing a race and crash between setting loop
  status and IO.

- Two block race fixes from Tahsin, fixing request list iteration and
  fixing a race between device registration and udev device add
  notifications.

- Double free fix from cgroup writeback, from Tejun.

- Another double free fix in blkcg, from Hou Tao.

- Partition overflow fix for EFI from Alden Tondettar.

Please pull! Either this pre-merged branch:

  git://git.kernel.dk/linux-block.git for-4.11/linus-merge-signed

or

  git://git.kernel.dk/linux-block.git for-4.11/block-signed
  git://git.kernel.dk/linux-block.git for-4.11/next-signed

in that order. All branches are signed tags.
Alden Tondettar (1):
      partitions/efi: Fix integer overflow in GPT size calculation

Alexander Potapenko (1):
      block: Initialize cfqq->ioprio_class in cfq_get_queue()

Bart Van Assche (5):
      blk-mq-debugfs: Add missing __acquires() / __releases() annotations
      blk-mq-debug: Avoid that sparse complains about req_flags_t usage
      blk-mq-debug: Make show() operations interruptible
      blk-mq-debug: Introduce debugfs_create_files()
      block: Update comments that refer to __bio_map_user() and bio_map_user()

Christoph Hellwig (39):
      block: add a op_is_flush helper
      md: cleanup bio op / flags handling in raid1_write_request
      block: fix elevator init check
      block: simplify blk_init_allocated_queue
      block: allow specifying size for extra command data
      block: cleanup tracing
      dm: remove incomplete BLOCK_PC support
      dm: always defer request allocation to the owner of the request_queue
      scsi: remove gfp_flags member in scsi_host_cmd_pool
      scsi: respect unchecked_isa_dma for blk-mq
      scsi: remove scsi_cmd_dma_pool
      scsi: remove __scsi_alloc_queue
      scsi: allocate scsi_cmnd structures as part of struct request
      block/bsg: move queue creation into bsg_setup_queue