Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-14 Thread Mike Snitzer
On Thu, Feb 09 2017 at  4:25pm -0500,
Kent Overstreet  wrote:

> On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote:
> > On Tue, Feb 07 2017 at 11:58pm -0500,
> > Kent Overstreet  wrote:
> > 
> > > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > > > > 
> > > > > > > So.. what needs to be done there?
> > > > > 
> > > > > > But, I just got an idea for how to handle this that might be 
> > > > > > halfway sane, maybe
> > > > > > I'll try and come up with a patch...
> > > > > 
> > > > > Ok, here's such a patch, only lightly tested:
> > > > 
> > > > I guess it would be nice for me to test it... but what is it against?
> > > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
> > > 
> > > Sorry, I forgot I had a few other patches in my branch that touch
> > > mempool/biosets code.
> > > 
> > > Also, after thinking about it more and looking at the relevant code, I'm 
> > > pretty
> > > sure we don't need rescuer threads for block devices that just split bios 
> > > - i.e.
> > > most of them, so I changed my patch to do that.
> > > 
> > > Tested it by ripping out the current->bio_list checks/workarounds from the
> > > bcache code, appears to work:
> > 
> > Feedback on this patch below, but first:
> > 
> > There are deeper issues with the current->bio_list and rescue workqueues
> > than thread counts.
> > 
> > I cannot help but feel like you (and Jens) are repeatedly ignoring the
> > issue that has been raised numerous times, most recently:
> > https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html
> > 
> > FYI, this test (albeit ugly) can be used to check if the dm-snapshot
> > deadlock is fixed:
> > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
> > 
> > This situation is the unfortunate pathological worst case for what
> > happens when changes are merged and nobody wants to own fixing the
> > unforeseen implications/regressions.  Like everyone else in a position
> > of Linux maintenance I've tried to stay away from owning the
> > responsibility of a fix -- it isn't working.  Ok, I'll stop bitching
> > now.. I do bear responsibility for not digging in myself.  We're all
> > busy and this issue is "hard".
> 
> Mike, it's not my job to debug DM code for you or sift through your bug 
> reports.
> I don't read dm-devel, and I don't know why you think that's my job.
> 
> If there's something you think the block layer should be doing differently, 
> post
> patches - or at the very least, explain what you'd like to be done, with 
> words.
> Don't get pissy because I'm not sifting through your bug reports.
> 
> Hell, I'm not getting paid to work on kernel code at all right now, and you
> trying to rope me into fixing device mapper sure makes me want to work on the
> block layer more.
> 
> DM developers have a long history of working siloed off from the rest of the
> block layer, building up their own crazy infrastructure (remember the old bio
> splitting code?) and going to extreme lengths to avoid having to work on or
> improve the core block layer infrastructure. It's ridiculous.

That is bullshit.  I try to help block core where/when I can (did so
with discards and limits stacking, other fixes here and there).

Your take on what DM code is and how it evolved is way off base.  But
that is to be expected from you.  Not going to waste time laboring over
correcting you.

> You know what would be nice? What'd really make my day is if just once I got a
> thank you or a bit of appreciation from DM developers for the bvec 
> iterators/bio
> splitting work I did that cleaned up a _lot_ of crazy hairy messes. Or getting
> rid of merge_bvec_fn, or trying to come up with a better solution for 
> deadlocks
> due to running under generic_make_request() now.

I do appreciate the immutable biovec work you did.  Helped speed up bio
cloning quite nicely.

But you've always been hostile to DM.  You'll fabricate any excuse to
never touch or care about it.  Repeatedly noted.  But this is a block
core bug/regression.  Not DM.

If you think I'm going to thank you for blindly breaking shit, and
refusing to care when you're made aware of it, then you're out of your
mind.

Save your predictably hostile response.  It'll get marked as spam
anyway.


Re: [dm-devel] v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-14 Thread Mikulas Patocka


On Thu, 9 Feb 2017, Kent Overstreet wrote:

> On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote:
> > On Tue, Feb 07 2017 at 11:58pm -0500,
> > Kent Overstreet  wrote:
> > 
> > > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > > > > 
> > > > > > > So.. what needs to be done there?
> > > > > 
> > > > > > But, I just got an idea for how to handle this that might be 
> > > > > > halfway sane, maybe
> > > > > > I'll try and come up with a patch...
> > > > > 
> > > > > Ok, here's such a patch, only lightly tested:
> > > > 
> > > > I guess it would be nice for me to test it... but what is it against?
> > > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
> > > 
> > > Sorry, I forgot I had a few other patches in my branch that touch
> > > mempool/biosets code.
> > > 
> > > Also, after thinking about it more and looking at the relevant code, I'm 
> > > pretty
> > > sure we don't need rescuer threads for block devices that just split bios 
> > > - i.e.
> > > most of them, so I changed my patch to do that.
> > > 
> > > Tested it by ripping out the current->bio_list checks/workarounds from the
> > > bcache code, appears to work:
> > 
> > Feedback on this patch below, but first:
> > 
> > There are deeper issues with the current->bio_list and rescue workqueues
> > than thread counts.
> > 
> > I cannot help but feel like you (and Jens) are repeatedly ignoring the
> > issue that has been raised numerous times, most recently:
> > https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html
> > 
> > FYI, this test (albeit ugly) can be used to check if the dm-snapshot
> > deadlock is fixed:
> > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
> > 
> > This situation is the unfortunate pathological worst case for what
> > happens when changes are merged and nobody wants to own fixing the
> > unforeseen implications/regressions.  Like everyone else in a position
> > of Linux maintenance I've tried to stay away from owning the
> > responsibility of a fix -- it isn't working.  Ok, I'll stop bitching
> > now.. I do bear responsibility for not digging in myself.  We're all
> > busy and this issue is "hard".
> 
> Mike, it's not my job to debug DM code for you or sift through your bug 
> reports.
> I don't read dm-devel, and I don't know why you think that's my job.
> 
> If there's something you think the block layer should be doing differently, 
> post
> patches - or at the very least, explain what you'd like to be done, with 
> words.
> Don't get pissy because I'm not sifting through your bug reports.

So I post this patch for that bug.

Will any of the block device maintainers respond to it?



From: Mikulas Patocka 
Date: Tue, 27 May 2014 11:03:36 -0400
Subject: block: flush queued bios when process blocks to avoid deadlock

The block layer uses a per-process bio list to avoid recursion in
generic_make_request.  When generic_make_request is called recursively,
the bio is added to current->bio_list and generic_make_request returns
immediately.  The top-level instance of generic_make_request takes bios
from current->bio_list and processes them.
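
For readers unfamiliar with that mechanism, here is a heavily simplified,
illustrative sketch of the recursion-to-iteration pattern (not the actual
kernel source):

void generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack;

	if (current->bio_list) {
		/* Recursive call from a stacking driver's make_request_fn:
		 * just queue the bio and return immediately. */
		bio_list_add(current->bio_list, bio);
		return;
	}

	/* Top-level call: iterate until the per-process list drains. */
	bio_list_init(&bio_list_on_stack);
	current->bio_list = &bio_list_on_stack;
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		q->make_request_fn(q, bio);	/* may queue more bios */
	} while ((bio = bio_list_pop(current->bio_list)));
	current->bio_list = NULL;
}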

The problem is that this bio queuing on current->bio_list creates an 
artificial locking dependency - a bio further on current->bio_list depends
on any locks that preceding bios could take.  This could result in a 
deadlock.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks by
redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However another deadlock (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule call.  Consequently, when
the process blocks on a mutex, the bios queued on current->bio_list are
dispatched to independent workqueues and they can complete without
waiting for the mutex to be available.
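
A minimal sketch of that idea (the patch body itself is not included in this
archive excerpt; blk_flush_bio_list() is an assumed helper name for punting
everything on current->bio_list to the owning biosets' rescue workqueues):

/* Sketch: called from the scheduler just before the current task blocks. */
static inline void sched_flush_bio_list(void)
{
	if (unlikely(current->bio_list && !bio_list_empty(current->bio_list)))
		blk_flush_bio_list();	/* punt queued bios to rescuer workqueues */
}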

Also, now we can remove punt_bios_to_rescuer() and bio_alloc_bioset()'s
calls to it because bio_alloc_bioset() will implicitly punt all bios on
current->bio_list if it performs a blocking allocation.

** Here is the dm-snapshot deadlock that was observed:

1) Process A sends a one-page read bio to the dm-snapshot target. The bio
spans a snapshot chunk boundary and so it is split into two bios by device
mapper.

2) Device mapper creates the first sub-bio and sends it to the snapshot
driver.

3) The function snapshot_map calls 

Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-09 Thread Kent Overstreet
On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote:
> On Tue, Feb 07 2017 at 11:58pm -0500,
> Kent Overstreet  wrote:
> 
> > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > > > 
> > > > > > So.. what needs to be done there?
> > > > 
> > > > > But, I just got an idea for how to handle this that might be halfway 
> > > > > sane, maybe
> > > > > I'll try and come up with a patch...
> > > > 
> > > > Ok, here's such a patch, only lightly tested:
> > > 
> > > I guess it would be nice for me to test it... but what is it against?
> > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
> > 
> > Sorry, I forgot I had a few other patches in my branch that touch
> > mempool/biosets code.
> > 
> > Also, after thinking about it more and looking at the relevant code, I'm 
> > pretty
> > sure we don't need rescuer threads for block devices that just split bios - 
> > i.e.
> > most of them, so I changed my patch to do that.
> > 
> > Tested it by ripping out the current->bio_list checks/workarounds from the
> > bcache code, appears to work:
> 
> Feedback on this patch below, but first:
> 
> There are deeper issues with the current->bio_list and rescue workqueues
> than thread counts.
> 
> I cannot help but feel like you (and Jens) are repeatedly ignoring the
> issue that has been raised numerous times, most recently:
> https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html
> 
> FYI, this test (albeit ugly) can be used to check if the dm-snapshot
> deadlock is fixed:
> https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
> 
> This situation is the unfortunate pathological worst case for what
> happens when changes are merged and nobody wants to own fixing the
> unforeseen implications/regressions.  Like everyone else in a position
> of Linux maintenance I've tried to stay away from owning the
> responsibility of a fix -- it isn't working.  Ok, I'll stop bitching
> now.. I do bear responsibility for not digging in myself.  We're all
> busy and this issue is "hard".

Mike, it's not my job to debug DM code for you or sift through your bug reports.
I don't read dm-devel, and I don't know why you think that's my job.

If there's something you think the block layer should be doing differently, post
patches - or at the very least, explain what you'd like to be done, with words.
Don't get pissy because I'm not sifting through your bug reports.

Hell, I'm not getting paid to work on kernel code at all right now, and you
trying to rope me into fixing device mapper sure makes me want to work on the
block layer more.

DM developers have a long history of working siloed off from the rest of the
block layer, building up their own crazy infrastructure (remember the old bio
splitting code?) and going to extreme lengths to avoid having to work on or
improve the core block layer infrastructure. It's ridiculous.

You know what would be nice? What'd really make my day is if just once I got a
thank you or a bit of appreciation from DM developers for the bvec iterators/bio
splitting work I did that cleaned up a _lot_ of crazy hairy messes. Or getting
rid of merge_bvec_fn, or trying to come up with a better solution for deadlocks
due to running under generic_make_request() now.


Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-08 Thread Mike Snitzer
On Tue, Feb 07 2017 at 11:58pm -0500,
Kent Overstreet  wrote:

> On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > > 
> > > > > So.. what needs to be done there?
> > > 
> > > > But, I just got an idea for how to handle this that might be halfway 
> > > > sane, maybe
> > > > I'll try and come up with a patch...
> > > 
> > > Ok, here's such a patch, only lightly tested:
> > 
> > I guess it would be nice for me to test it... but what is it against?
> > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
> 
> Sorry, I forgot I had a few other patches in my branch that touch
> mempool/biosets code.
> 
> Also, after thinking about it more and looking at the relevant code, I'm 
> pretty
> sure we don't need rescuer threads for block devices that just split bios - 
> i.e.
> most of them, so I changed my patch to do that.
> 
> Tested it by ripping out the current->bio_list checks/workarounds from the
> bcache code, appears to work:

Feedback on this patch below, but first:

There are deeper issues with the current->bio_list and rescue workqueues
than thread counts.

I cannot help but feel like you (and Jens) are repeatedly ignoring the
issue that has been raised numerous times, most recently:
https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html

FYI, this test (albeit ugly) can be used to check if the dm-snapshot
deadlock is fixed:
https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html

This situation is the unfortunate pathological worst case for what
happens when changes are merged and nobody wants to own fixing the
unforeseen implications/regressions.  Like everyone else in a position
of Linux maintenance I've tried to stay away from owning the
responsibility of a fix -- it isn't working.  Ok, I'll stop bitching
now.. I do bear responsibility for not digging in myself.  We're all
busy and this issue is "hard".

> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
> 
> Also, trigger rescuing whenever there are bios on current->bio_list, instead
> of only when we block in bio_alloc_bioset(). This is more correct, and
> should result in fewer rescuer threads.
> 
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.
> 
> Signed-off-by: Kent Overstreet 
> ---
...
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3086da5664..e1b22a68d9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1490,7 +1490,7 @@ static struct mapped_device *alloc_dev(int minor)
>   INIT_LIST_HEAD(&md->table_devices);
>   spin_lock_init(&md->uevent_lock);
>  
> - md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id);
> + md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id, 0);
>   if (!md->queue)
>   goto bad;
>  

This should be BLK_QUEUE_NO_RESCUER as DM isn't making direct use of
bio_queue_split() for its own internal splitting (maybe it should and
that'd start to fix the issue I've been harping about?) but as-is DM
destroys the rescuer workqueue (since commit dbba42d8a9eb "dm: eliminate
unused "bioset" process for each bio-based DM device").

Mike


Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Mike Galbraith
On Tue, 2017-02-07 at 19:58 -0900, Kent Overstreet wrote:
> On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > > 
> > > > > So.. what needs to be done there?
> > > 
> > > > But, I just got an idea for how to handle this that might be halfway 
> > > > sane, maybe
> > > > I'll try and come up with a patch...
> > > 
> > > Ok, here's such a patch, only lightly tested:
> > 
> > I guess it would be nice for me to test it... but what is it against?
> > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
> 
> Sorry, I forgot I had a few other patches in my branch that touch
> mempool/biosets code.
> 
> Also, after thinking about it more and looking at the relevant code, I'm 
> pretty
> sure we don't need rescuer threads for block devices that just split bios - 
> i.e.
> most of them, so I changed my patch to do that.
> 
> Tested it by ripping out the current->bio_list checks/workarounds from the
> bcache code, appears to work:

Patch killed every last one of them, but..

homer:/root # dmesg|grep WARNING
[   11.701447] WARNING: CPU: 4 PID: 801 at block/bio.c:388 
bio_alloc_bioset+0x1a7/0x240
[   11.711027] WARNING: CPU: 4 PID: 801 at block/blk-core.c:2013 
generic_make_request+0x191/0x1f0
[   19.728989] WARNING: CPU: 0 PID: 717 at block/bio.c:388 
bio_alloc_bioset+0x1a7/0x240
[   19.737020] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 
generic_make_request+0x191/0x1f0
[   19.746173] WARNING: CPU: 0 PID: 717 at block/bio.c:388 
bio_alloc_bioset+0x1a7/0x240
[   19.755260] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 
generic_make_request+0x191/0x1f0
[   19.763837] WARNING: CPU: 0 PID: 717 at block/bio.c:388 
bio_alloc_bioset+0x1a7/0x240
[   19.772526] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 
generic_make_request+0x191/0x1f0



Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Kent Overstreet
On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > 
> > > > So.. what needs to be done there?
> > 
> > > But, I just got an idea for how to handle this that might be halfway 
> > > sane, maybe
> > > I'll try and come up with a patch...
> > 
> > Ok, here's such a patch, only lightly tested:
> 
> I guess it would be nice for me to test it... but what is it against?
> I tried after v4.10-rc5 and linux-next, but got rejects in both cases.

Sorry, I forgot I had a few other patches in my branch that touch
mempool/biosets code.

Also, after thinking about it more and looking at the relevant code, I'm pretty
sure we don't need rescuer threads for block devices that just split bios - i.e.
most of them, so I changed my patch to do that.

Tested it by ripping out the current->bio_list checks/workarounds from the
bcache code, appears to work:

-- >8 --
Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset

Also, trigger rescuing whenever there are bios on current->bio_list, instead
of only when we block in bio_alloc_bioset(). This is more correct, and
should result in fewer rescuer threads.

XXX: The current->bio_list plugging needs to be unified with the
blk_plug mechanism.

Signed-off-by: Kent Overstreet 
---
 block/bio.c| 105 +++--
 block/blk-core.c   |  69 +++
 block/blk-mq.c |   3 +-
 block/blk-sysfs.c  |   2 +
 drivers/block/brd.c|   2 +-
 drivers/block/drbd/drbd_main.c |   2 +-
 drivers/block/null_blk.c   |   3 +-
 drivers/block/pktcdvd.c|   2 +-
 drivers/block/ps3vram.c|   2 +-
 drivers/block/rsxx/dev.c   |   2 +-
 drivers/block/umem.c   |   2 +-
 drivers/block/zram/zram_drv.c  |   2 +-
 drivers/lightnvm/gennvm.c  |   2 +-
 drivers/md/bcache/super.c  |   2 +-
 drivers/md/dm.c|   2 +-
 drivers/md/md.c|   2 +-
 drivers/nvdimm/blk.c   |   2 +-
 drivers/nvdimm/btt.c   |   2 +-
 drivers/nvdimm/pmem.c  |   3 +-
 drivers/s390/block/dcssblk.c   |   2 +-
 drivers/s390/block/xpram.c |   2 +-
 include/linux/bio.h|  16 +++
 include/linux/blkdev.h |  16 ++-
 include/linux/sched.h  |   2 +-
 kernel/sched/core.c|   6 +++
 25 files changed, 117 insertions(+), 138 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2b375020fc..9b89be1719 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -340,54 +340,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
 }
 EXPORT_SYMBOL(bio_chain);
 
-static void bio_alloc_rescue(struct work_struct *work)
-{
-   struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
-   struct bio *bio;
-
-   while (1) {
-   spin_lock(&bs->rescue_lock);
-   bio = bio_list_pop(&bs->rescue_list);
-   spin_unlock(&bs->rescue_lock);
-
-   if (!bio)
-   break;
-
-   generic_make_request(bio);
-   }
-}
-
-static void punt_bios_to_rescuer(struct bio_set *bs)
-{
-   struct bio_list punt, nopunt;
-   struct bio *bio;
-
-   /*
-* In order to guarantee forward progress we must punt only bios that
-* were allocated from this bio_set; otherwise, if there was a bio on
-* there for a stacking driver higher up in the stack, processing it
-* could require allocating bios from this bio_set, and doing that from
-* our own rescuer would be bad.
-*
-* Since bio lists are singly linked, pop them all instead of trying to
-* remove from the middle of the list:
-*/
-
-   bio_list_init(&punt);
-   bio_list_init(&nopunt);
-
-   while ((bio = bio_list_pop(current->bio_list)))
-   bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
-
-   *current->bio_list = nopunt;
-
-   spin_lock(&bs->rescue_lock);
-   bio_list_merge(&bs->rescue_list, &punt);
-   spin_unlock(&bs->rescue_lock);
-
-   queue_work(bs->rescue_workqueue, &bs->rescue_work);
-}
-
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -425,17 +377,20 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-   gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
struct bio_vec *bvl = NULL;
struct bio *bio;
void *p;
 
-   if (!bs) {
-   if (nr_iovecs > UIO_MAXIOV)
-   return NULL;
+   WARN(current->bio_list &&
+!current->bio_list->q->rescue_workqueue,
+

Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Mike Galbraith
On Tue, 2017-02-07 at 21:39 +0100, Pavel Machek wrote:
> On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > 
> > > > So.. what needs to be done there?
> > 
> > > But, I just got an idea for how to handle this that might be halfway 
> > > sane, maybe
> > > I'll try and come up with a patch...
> > 
> > Ok, here's such a patch, only lightly tested:
> 
> I guess it would be nice for me to test it... but what is it against?
> I tried after v4.10-rc5 and linux-next, but got rejects in both cases.

It wedged into master easily enough (box still seems to work.. but I'll
be rebooting in a very few seconds just in case:), but threads on my
desktop box only dropped from 73 to 71.  Poo.

-Mike


Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Ming Lei
On Tue, Feb 7, 2017 at 10:49 AM, Kent Overstreet
 wrote:
> On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
>> On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
>> > Still there on v4.9, 36 threads on nokia n900 cellphone.
>> >
>> > So.. what needs to be done there?
>
>> But, I just got an idea for how to handle this that might be halfway sane, 
>> maybe
>> I'll try and come up with a patch...
>
> Ok, here's such a patch, only lightly tested:
>
> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
>
> Note: this patch is very lightly tested.
>
> Also, trigger rescuing whenever there are bios on current->bio_list, instead
> of only when we block in bio_alloc_bioset(). This is more correct, and
> should result in fewer rescuer threads.

Looks like the rescuer stuff gets simplified a lot with this patch.

>
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.

Yeah, that can be another benefit, :-)

>
> TODO: If we change normal request_queue drivers to handle arbitrary size
> bios by processing requests incrementally, instead of splitting bios,
> then we can get rid of rescuer threads from those devices.

Also, the rescuer threads often come from reserved block devices such as
loop/nbd; we should allow these drivers to delay allocating the thread until
just before the disk is activated. Then the thread count could be decreased
a lot.
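
A rough sketch of that suggestion under the per-request_queue rescuer model
from Kent's patch (the function name and call site here are illustrative, not
an existing API):

/* Defer creating the rescuer workqueue until the disk actually goes live,
 * so reserved devices (loop, nbd, ...) that are never activated don't each
 * pin a rescuer thread at queue-allocation time. */
static int blk_queue_alloc_rescuer(struct request_queue *q, const char *name)
{
	if (q->rescue_workqueue)
		return 0;

	q->rescue_workqueue = alloc_workqueue("%s_rescue",
					      WQ_MEM_RECLAIM, 0, name);
	return q->rescue_workqueue ? 0 : -ENOMEM;
}

/* ... called from the add_disk()/activation path rather than from
 * blk_alloc_queue_node(). */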

> ---
>  block/bio.c| 107 
> -
>  block/blk-core.c   |  58 ---
>  block/blk-sysfs.c  |   2 +
>  include/linux/bio.h|  16 
>  include/linux/blkdev.h |  10 +
>  include/linux/sched.h  |   2 +-
>  kernel/sched/core.c|   4 ++
>  7 files changed, 83 insertions(+), 116 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index f3b5786202..9ad54a9b12 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
>  }
>  EXPORT_SYMBOL(bio_chain);
>
> -static void bio_alloc_rescue(struct work_struct *work)
> -{
> -   struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
> -   struct bio *bio;
> -
> -   while (1) {
> -   spin_lock(&bs->rescue_lock);
> -   bio = bio_list_pop(&bs->rescue_list);
> -   spin_unlock(&bs->rescue_lock);
> -
> -   if (!bio)
> -   break;
> -
> -   generic_make_request(bio);
> -   }
> -}
> -
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> -{
> -   struct bio_list punt, nopunt;
> -   struct bio *bio;
> -
> -   /*
> -* In order to guarantee forward progress we must punt only bios that
> -* were allocated from this bio_set; otherwise, if there was a bio on
> -* there for a stacking driver higher up in the stack, processing it
> -* could require allocating bios from this bio_set, and doing that 
> from
> -* our own rescuer would be bad.
> -*
> -* Since bio lists are singly linked, pop them all instead of trying 
> to
> -* remove from the middle of the list:
> -*/
> -
> -   bio_list_init(&punt);
> -   bio_list_init(&nopunt);
> -
> -   while ((bio = bio_list_pop(current->bio_list)))
> -   bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> -
> -   *current->bio_list = nopunt;
> -
> -   spin_lock(&bs->rescue_lock);
> -   bio_list_merge(&bs->rescue_list, &punt);
> -   spin_unlock(&bs->rescue_lock);
> -
> -   queue_work(bs->rescue_workqueue, &bs->rescue_work);
> -}
> -
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set 
> *bs)
>  {
> -   gfp_t saved_gfp = gfp_mask;
> unsigned front_pad;
> unsigned inline_vecs;
> struct bio_vec *bvl = NULL;
> struct bio *bio;
> void *p;
>
> -   if (!bs) {
> -   if (nr_iovecs > UIO_MAXIOV)
> -   return NULL;
> +   WARN(current->bio_list &&
> +!current->bio_list->q->rescue_workqueue,
> +"allocating bio beneath generic_make_request() without rescuer");
>
> +   if (nr_iovecs > UIO_MAXIOV)
> +   return NULL;
> +
> +   if (!bs) {
> p = kmalloc(sizeof(struct bio) +
> nr_iovecs * sizeof(struct bio_vec),
> gfp_mask);
> front_pad = 0;
> inline_vecs = nr_iovecs;
> } else {
> -   /*
> -* generic_make_request() converts recursion to iteration; 
> this
> -* means if we're running beneath it, any bios we allocate and
> -* submit will not be submitted (and thus freed) 

Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Pavel Machek
On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > 
> > > So.. what needs to be done there?
> 
> > But, I just got an idea for how to handle this that might be halfway sane, 
> > maybe
> > I'll try and come up with a patch...
> 
> Ok, here's such a patch, only lightly tested:

I guess it would be nice for me to test it... but what is it against?
I tried after v4.10-rc5 and linux-next, but got rejects in both cases.

Thanks,
Pavel

> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
> 
> Note: this patch is very lightly tested.
> 
> Also, trigger rescuing whenever we schedule with bios on current->bio_list, instead
> of only when we block in bio_alloc_bioset(). This is more correct, and
> should result in fewer rescuer threads.
> 
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.
> 
> TODO: If we change normal request_queue drivers to handle arbitrary size
> bios by processing requests incrementally, instead of splitting bios,
> then we can get rid of rescuer threads from those devices.
> ---
>  block/bio.c| 107 
> -
>  block/blk-core.c   |  58 ---
>  block/blk-sysfs.c  |   2 +
>  include/linux/bio.h|  16 
>  include/linux/blkdev.h |  10 +
>  include/linux/sched.h  |   2 +-
>  kernel/sched/core.c|   4 ++
>  7 files changed, 83 insertions(+), 116 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index f3b5786202..9ad54a9b12 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
>  }
>  EXPORT_SYMBOL(bio_chain);
>  
> -static void bio_alloc_rescue(struct work_struct *work)
> -{
> -	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
> -	struct bio *bio;
> -
> -	while (1) {
> -		spin_lock(&bs->rescue_lock);
> -		bio = bio_list_pop(&bs->rescue_list);
> -		spin_unlock(&bs->rescue_lock);
> -
> -		if (!bio)
> -			break;
> -
> -		generic_make_request(bio);
> -	}
> -}
> -
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> -{
> -	struct bio_list punt, nopunt;
> -	struct bio *bio;
> -
> -	/*
> -	 * In order to guarantee forward progress we must punt only bios that
> -	 * were allocated from this bio_set; otherwise, if there was a bio on
> -	 * there for a stacking driver higher up in the stack, processing it
> -	 * could require allocating bios from this bio_set, and doing that from
> -	 * our own rescuer would be bad.
> -	 *
> -	 * Since bio lists are singly linked, pop them all instead of trying to
> -	 * remove from the middle of the list:
> -	 */
> -
> -	bio_list_init(&punt);
> -	bio_list_init(&nopunt);
> -
> -	while ((bio = bio_list_pop(current->bio_list)))
> -		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> -
> -	*current->bio_list = nopunt;
> -
> -	spin_lock(&bs->rescue_lock);
> -	bio_list_merge(&bs->rescue_list, &punt);
> -	spin_unlock(&bs->rescue_lock);
> -
> -	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> -}
> -
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set 
> *bs)
>  {
> - gfp_t saved_gfp = gfp_mask;
>   unsigned front_pad;
>   unsigned inline_vecs;
>   struct bio_vec *bvl = NULL;
>   struct bio *bio;
>   void *p;
>  
> - if (!bs) {
> - if (nr_iovecs > UIO_MAXIOV)
> - return NULL;
> + WARN(current->bio_list &&
> +  !current->bio_list->q->rescue_workqueue,
> +  "allocating bio beneath generic_make_request() without rescuer");
>  
> + if (nr_iovecs > UIO_MAXIOV)
> + return NULL;
> +
> + if (!bs) {
>   p = kmalloc(sizeof(struct bio) +
>   nr_iovecs * sizeof(struct bio_vec),
>   gfp_mask);
>   front_pad = 0;
>   inline_vecs = nr_iovecs;
>   } else {
> - /*
> -  * generic_make_request() converts recursion to iteration; this
> -  * means if we're running beneath it, any bios we allocate and
> -  * submit will not be submitted (and thus freed) until after we
> -  * return.
> -  *
> -  * This exposes us to a potential deadlock if we allocate
> -  * multiple bios from the same bio_set() while running
> -  * underneath 

Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-07 Thread Mike Snitzer
On Mon, Feb 06 2017 at  9:49pm -0500,
Kent Overstreet  wrote:

> On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > 
> > > So.. what needs to be done there?
> 
> > But, I just got an idea for how to handle this that might be halfway sane, 
> > maybe
> > I'll try and come up with a patch...
> 
> Ok, here's such a patch, only lightly tested:
> 
> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
> 
> Note: this patch is very lightly tested.
> 
> Also, trigger rescuing whenever we schedule with bios on current->bio_list, instead
> of only when we block in bio_alloc_bioset(). This is more correct, and
> should result in fewer rescuer threads.
> 
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.
> 
> TODO: If we change normal request_queue drivers to handle arbitrary size
> bios by processing requests incrementally, instead of splitting bios,
> then we can get rid of rescuer threads from those devices.

Hi Kent,

I really appreciate you working on this further.  Thanks.

As I think you're probably already aware, a long standing issue with the
per bio_set rescuer is this bug (which manifests in dm-snapshot
deadlocks): https://bugzilla.kernel.org/show_bug.cgi?id=119841

Please also see this patch header, from a private branch from a while
ago, that describes the problem in detail:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=cd2c760b5a609e2aaf3735a7b9503a953535c368

Would welcome your consideration of that BZ as you think further and/or
iterate on this line of work.

Mike


Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-06 Thread Kent Overstreet
On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > 
> > So.. what needs to be done there?

> But, I just got an idea for how to handle this that might be halfway sane, 
> maybe
> I'll try and come up with a patch...

Ok, here's such a patch, only lightly tested:

-- >8 --
Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset

Note: this patch is very lightly tested.

Also, trigger rescuing whenever we schedule with bios on current->bio_list, instead
of only when we block in bio_alloc_bioset(). This is more correct, and
should result in fewer rescuer threads.

XXX: The current->bio_list plugging needs to be unified with the
blk_plug mechanism.

TODO: If we change normal request_queue drivers to handle arbitrary size
bios by processing requests incrementally, instead of splitting bios,
then we can get rid of rescuer threads from those devices.
---
 block/bio.c| 107 -
 block/blk-core.c   |  58 ---
 block/blk-sysfs.c  |   2 +
 include/linux/bio.h|  16 
 include/linux/blkdev.h |  10 +
 include/linux/sched.h  |   2 +-
 kernel/sched/core.c|   4 ++
 7 files changed, 83 insertions(+), 116 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f3b5786202..9ad54a9b12 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent)
 }
 EXPORT_SYMBOL(bio_chain);
 
-static void bio_alloc_rescue(struct work_struct *work)
-{
-	struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
-	struct bio *bio;
-
-	while (1) {
-		spin_lock(&bs->rescue_lock);
-		bio = bio_list_pop(&bs->rescue_list);
-		spin_unlock(&bs->rescue_lock);
-
-		if (!bio)
-			break;
-
-		generic_make_request(bio);
-	}
-}
-
-static void punt_bios_to_rescuer(struct bio_set *bs)
-{
-	struct bio_list punt, nopunt;
-	struct bio *bio;
-
-	/*
-	 * In order to guarantee forward progress we must punt only bios that
-	 * were allocated from this bio_set; otherwise, if there was a bio on
-	 * there for a stacking driver higher up in the stack, processing it
-	 * could require allocating bios from this bio_set, and doing that from
-	 * our own rescuer would be bad.
-	 *
-	 * Since bio lists are singly linked, pop them all instead of trying to
-	 * remove from the middle of the list:
-	 */
-
-	bio_list_init(&punt);
-	bio_list_init(&nopunt);
-
-	while ((bio = bio_list_pop(current->bio_list)))
-		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
-
-	*current->bio_list = nopunt;
-
-	spin_lock(&bs->rescue_lock);
-	bio_list_merge(&bs->rescue_list, &punt);
-	spin_unlock(&bs->rescue_lock);
-
-	queue_work(bs->rescue_workqueue, &bs->rescue_work);
-}
-
 /**
  * bio_alloc_bioset - allocate a bio for I/O
  * @gfp_mask:   the GFP_ mask given to the slab allocator
@@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-   gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
struct bio_vec *bvl = NULL;
struct bio *bio;
void *p;
 
-   if (!bs) {
-   if (nr_iovecs > UIO_MAXIOV)
-   return NULL;
+   WARN(current->bio_list &&
+!current->bio_list->q->rescue_workqueue,
+"allocating bio beneath generic_make_request() without rescuer");
 
+   if (nr_iovecs > UIO_MAXIOV)
+   return NULL;
+
+   if (!bs) {
p = kmalloc(sizeof(struct bio) +
nr_iovecs * sizeof(struct bio_vec),
gfp_mask);
front_pad = 0;
inline_vecs = nr_iovecs;
} else {
-   /*
-* generic_make_request() converts recursion to iteration; this
-* means if we're running beneath it, any bios we allocate and
-* submit will not be submitted (and thus freed) until after we
-* return.
-*
-* This exposes us to a potential deadlock if we allocate
-* multiple bios from the same bio_set() while running
-* underneath generic_make_request(). If we were to allocate
-* multiple bios (say a stacking block driver that was splitting
-* bios), we would deadlock if we exhausted the mempool's
-* reserve.
-*
-* We solve this, and guarantee forward progress, with a rescuer
-* workqueue per bio_set. If we go to allocate and there are
-

Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-06 Thread Kent Overstreet
On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> On Sat 2016-02-20 21:04:32, Pavel Machek wrote:
> > Hi!
> > 
> > > > > > > I know it is normal to spawn 8 threads for every single function,
> > > > > > ...
> > > > > > > but 28 threads?
> > > > > > > 
> > > > > > > root   974  0.0  0.0  0 0 ?S<   Dec08   0:00 
> > > > > > > [bioset]
> > > > > > ...
> > > > > > 
> > > > > > How many physical block devices do you have?
> > > > > > 
> > > > > > DM is doing its part to not contribute to this:
> > > > > > dbba42d8a ("dm: eliminate unused "bioset" process for each 
> > > > > > bio-based DM device")
> > > > > > 
> > > > > > (but yeah, all these extra 'bioset' threads aren't ideal)
> > > > > 
> > > > > Still there in 4.4-final.
> > > > 
> > > > ...and still there in 4.5-rc4 :-(.
> > > 
> > > You're directing this concern to the wrong person.
> > > 
> > > I already told you DM is _not_ contributing any extra "bioset" threads
> > > (ever since commit dbba42d8a).
> > 
> > Well, sorry about that. Note that l-k is on the cc list, so hopefully
> > the right person sees it too.
> > 
> > Ok, let me check... it seems that 
> > 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent
> > Overstreet  is to blame.
> > 
> > Um, and you acked the patch, so you are partly responsible.
> 
> Still there on v4.9, 36 threads on nokia n900 cellphone.
> 
> So.. what needs to be done there?

So background:

We need rescuer threads because:

Say you allocate a bio from a bioset, submit that bio, and then allocate again
from that same bioset: if you're running underneath generic_make_request(),
you'll deadlock. Real submission of the first bio you allocated is blocked until
you return from your make_request_fn(), but you're blocking that trying to
allocate - this is handled (in a hacky way!) with the punt_bios_to_rescuer code
when we go to allocate from a bioset but have to block.
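
To make that concrete, here's the shape of it (illustration only -
my_make_request() and my_bio_set are made-up names, the bios aren't set up for
real I/O, this just shows the pattern):

#include <linux/bio.h>
#include <linux/blkdev.h>

static struct bio_set *my_bio_set;	/* bioset with a small mempool reserve */

static blk_qc_t my_make_request(struct request_queue *q, struct bio *bio)
{
	struct bio *first, *second;

	/* First allocation from my_bio_set: succeeds, eats the reserve. */
	first = bio_alloc_bioset(GFP_NOIO, 0, my_bio_set);

	/*
	 * Beneath generic_make_request() this call only queues 'first' on
	 * current->bio_list; it isn't actually issued (and thus freed)
	 * until we return from this function.
	 */
	generic_make_request(first);

	/*
	 * Second allocation from the same bio_set: with the reserve gone,
	 * this blocks waiting for 'first' to complete - which can't happen
	 * until we return.  Deadlock, unless bio_alloc_bioset() first punts
	 * the bios sitting on current->bio_list over to a rescuer thread.
	 */
	second = bio_alloc_bioset(GFP_NOIO, 0, my_bio_set);

	generic_make_request(second);
	generic_make_request(bio);
	return BLK_QC_T_NONE;
}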

We need more than a single global rescuer, because:

The rescuer thread is just resubmitting bios, so if in the course of submitting
bios, _their_ drivers allocate new bios from biosets and block - oops, we're
recursing.

However:

The rescuer threads don't inherently need to be per bioset - they really ought
to be per block device.

Additionally, triggering the "punt bios to rescuer" code only when we go to
allocate from a bioset and block is wrong: it's possible to create these sorts
of deadlocks by blocking on other things. The right thing to do would be to
trigger this "punt bios to rescuer" thing whenever we schedule, and there's
still bios on current->bio_list.

This is actually how Jens's new(er) plugging code works (which postdates my
punt-bios-to-rescuer hack). What needs to happen is that Jens's scheduler hook
for block plugging gets unified with both the current->bio_list thing (which is
really a block plug, just open coded, as it predates _all_ of this) and the
rescuer thread stuff.
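
Roughly, I'm picturing something along the lines of the existing
sched_submit_work() hook in kernel/sched/core.c (sketch only, against mainline
as it stands today - blk_flush_bio_list() is a name I'm inventing here for
"hand whatever is on current->bio_list to the right rescuer", it's not
literally what the patch above does):

static inline void sched_submit_work(struct task_struct *tsk)
{
	if (!tsk->state || tsk_is_pi_blocked(tsk))
		return;

	/*
	 * New: don't go to sleep with unsubmitted bios queued on
	 * current->bio_list (blk_flush_bio_list() is hypothetical).
	 */
	if (tsk->bio_list && !bio_list_empty(tsk->bio_list))
		blk_flush_bio_list(tsk);

	/*
	 * If we are going to sleep and we have plugged IO queued,
	 * make sure to submit it to avoid deadlocks.
	 */
	if (blk_needs_flush_plug(tsk))
		blk_schedule_flush_plug(tsk);
}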

The tricky part is going to be making the rescuer threads per block device
_correctly_ (without introducing new deadlocks)... reasoning out these deadlocks
always makes my head hurt; the biggest reason I made the rescuer threads per
bioset was that when I wrote the code I wasn't at all confident I could get
anything else right. Still uneasy about that :)

What needs rescuer threads?

- if we're allocating from a bioset, but we're not running under
  generic_make_request() (e.g. we're in filesystem code) - we don't need a
  rescuer there, since we're not blocking previously allocated bios from being
  submitted.

- if we're a block driver that doesn't allocate new bios, we don't need a
  rescuer thread.

  But note that any block driver that calls blk_queue_split() to handle
  arbitrary size bios is allocating bios (see the sketch right after this
  list). If we converted e.g. the standard request queue code to process
  bios/requests incrementally, instead of requiring them to be split, this
  would go away (and that's something that should be done anyway; it would
  improve performance by getting rid of segment counting).
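
To illustrate that last point, here's a trivial hypothetical non-stacking
driver (simple_make_request() is made up, it's not any real driver) - it never
clones or stacks bios itself, yet it still allocates bios purely because of
the split, so today it still drags in a rescuer:

static blk_qc_t simple_make_request(struct request_queue *q, struct bio *bio)
{
	/* blk_queue_split() may allocate a clone from q->bio_split... */
	blk_queue_split(q, &bio, q->bio_split);

	/*
	 * ...do the actual I/O on 'bio' here without any further bio
	 * allocations (omitted), then complete it.
	 */
	bio_endio(bio);
	return BLK_QC_T_NONE;
}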

So, we should only need one rescuer thread per block device - and if we get rid
of splitting to handle arbitrary size bios, most block devices won't need
rescuers.

The catch is that the correct rescuer to punt a bio to corresponds to the device
that _issued_ the bio, not the device the bio was submitted to, and that
information is currently lost by the time we block - that's the other reason I
made rescuers per bioset, since bios do track the bioset they were allocated
from.

But, I just got an idea for how to handle this that might be halfway sane, maybe
I'll try and come up with a patch...


v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone

2017-02-06 Thread Pavel Machek
On Sat 2016-02-20 21:04:32, Pavel Machek wrote:
> Hi!
> 
> > > > > > I know it is normal to spawn 8 threads for every single function,
> > > > > ...
> > > > > > but 28 threads?
> > > > > > 
> > > > > > root   974  0.0  0.0  0 0 ?S<   Dec08   0:00 
> > > > > > [bioset]
> > > > > ...
> > > > > 
> > > > > How many physical block devices do you have?
> > > > > 
> > > > > DM is doing its part to not contribute to this:
> > > > > dbba42d8a ("dm: eliminate unused "bioset" process for each bio-based 
> > > > > DM device")
> > > > > 
> > > > > (but yeah, all these extra 'bioset' threads aren't ideal)
> > > > 
> > > > Still there in 4.4-final.
> > > 
> > > ...and still there in 4.5-rc4 :-(.
> > 
> > You're directing this concern to the wrong person.
> > 
> > I already told you DM is _not_ contributing any extra "bioset" threads
> > (ever since commit dbba42d8a).
> 
> Well, sorry about that. Note that l-k is on the cc list, so hopefully
> the right person sees it too.
> 
> Ok, let me check... it seems that 
> 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent
> Overstreet  is to blame.
> 
> Um, and you acked the patch, so you are partly responsible.

Still there on v4.9, 36 threads on nokia n900 cellphone.

So.. what needs to be done there?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



