Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Thu, Feb 09 2017 at 4:25pm -0500, Kent Overstreet wrote: > On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote: > > On Tue, Feb 07 2017 at 11:58pm -0500, > > Kent Overstreet wrote: > > > > > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > > > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > > > > > > > So.. what needs to be done there? > > > > > > > > > > > But, I just got an idea for how to handle this that might be > > > > > > halfway sane, maybe > > > > > > I'll try and come up with a patch... > > > > > > > > > > Ok, here's such a patch, only lightly tested: > > > > > > > > I guess it would be nice for me to test it... but what it is against? > > > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. > > > > > > Sorry, I forgot I had a few other patches in my branch that touch > > > mempool/biosets code. > > > > > > Also, after thinking about it more and looking at the relevant code, I'm > > > pretty > > > sure we don't need rescuer threads for block devices that just split bios > > > - i.e. > > > most of them, so I changed my patch to do that. > > > > > > Tested it by ripping out the current->bio_list checks/workarounds from the > > > bcache code, appears to work: > > > > Feedback on this patch below, but first: > > > > There are deeper issues with the current->bio_list and rescue workqueues > > than thread counts. > > > > I cannot help but feel like you (and Jens) are repeatedly ignoring the > > issue that has been raised numerous times, most recently: > > https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html > > > > FYI, this test (albeit ugly) can be used to check if the dm-snapshot > > deadlock is fixed: > > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html > > > > This situation is the unfortunate pathological worst case for what > > happens when changes are merged and nobody wants to own fixing the > > unforseen implications/regressions. Like everyone else in a position > > of Linux maintenance I've tried to stay away from owning the > > responsibility of a fix -- it isn't working. Ok, I'll stop bitching > > now.. I do bear responsibility for not digging in myself. We're all > > busy and this issue is "hard". > > Mike, it's not my job to debug DM code for you or sift through your bug > reports. > I don't read dm-devel, and I don't know why you think I that's my job. > > If there's something you think the block layer should be doing differently, > post > patches - or at the very least, explain what you'd like to be done, with > words. > Don't get pissy because I'm not sifting through your bug reports. > > Hell, I'm not getting paid to work on kernel code at all right now, and you > trying to rope me into fixing device mapper sure makes me want to work on the > block layer more. > > DM developers have a long history of working siloed off from the rest of the > block layer, building up their own crazy infrastructure (remember the old bio > splitting code?) and going to extreme lengths to avoid having to work on or > improve the core block layer infrastructure. It's ridiculous. That is bullshit. I try to help block core where/when I can (did so with discards and limits stacking, other fixes here and there). 
Your take on what DM code is and how it evolved is way off base. But that is to be expected from you. Not going to waste time laboring over correcting you. > You know what would be nice? What'd really make my day is if just once I got a > thank you or a bit of appreciation from DM developers for the bvec > iterators/bio > splitting work I did that cleaned up a _lot_ of crazy hairy messes. Or getting > rid of merge_bvec_fn, or trying to come up with a better solution for > deadlocks > due to running under generic_make_request() now. I do appreciate the immutable biovec work you did. Helped speed up bio cloning quite nicely. But you've always been hostile to DM. You'll fabricate any excuse to never touch or care about it. Repeatedly noted. But this is a block core bug/regression. Not DM. If you think I'm going to thank you for blindly breaking shit, and refusing to care when you're made aware of it, then you're out of your mind. Save your predictably hostile response. It'll get marked as spam anyway.
Re: [dm-devel] v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Thu, 9 Feb 2017, Kent Overstreet wrote: > On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote: > > On Tue, Feb 07 2017 at 11:58pm -0500, > > Kent Overstreet wrote: > > > > > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > > > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > > > > > > > So.. what needs to be done there? > > > > > > > > > > > But, I just got an idea for how to handle this that might be > > > > > > halfway sane, maybe > > > > > > I'll try and come up with a patch... > > > > > > > > > > Ok, here's such a patch, only lightly tested: > > > > > > > > I guess it would be nice for me to test it... but what it is against? > > > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. > > > > > > Sorry, I forgot I had a few other patches in my branch that touch > > > mempool/biosets code. > > > > > > Also, after thinking about it more and looking at the relevant code, I'm > > > pretty > > > sure we don't need rescuer threads for block devices that just split bios > > > - i.e. > > > most of them, so I changed my patch to do that. > > > > > > Tested it by ripping out the current->bio_list checks/workarounds from the > > > bcache code, appears to work: > > > > Feedback on this patch below, but first: > > > > There are deeper issues with the current->bio_list and rescue workqueues > > than thread counts. > > > > I cannot help but feel like you (and Jens) are repeatedly ignoring the > > issue that has been raised numerous times, most recently: > > https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html > > > > FYI, this test (albeit ugly) can be used to check if the dm-snapshot > > deadlock is fixed: > > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html > > > > This situation is the unfortunate pathological worst case for what > > happens when changes are merged and nobody wants to own fixing the > > unforseen implications/regressions. Like everyone else in a position > > of Linux maintenance I've tried to stay away from owning the > > responsibility of a fix -- it isn't working. Ok, I'll stop bitching > > now.. I do bear responsibility for not digging in myself. We're all > > busy and this issue is "hard". > > Mike, it's not my job to debug DM code for you or sift through your bug > reports. > I don't read dm-devel, and I don't know why you think I that's my job. > > If there's something you think the block layer should be doing differently, > post > patches - or at the very least, explain what you'd like to be done, with > words. > Don't get pissy because I'm not sifting through your bug reports. So I post this patch for that bug. Will any of the block device maintainers respond to it? From: Mikulas Patocka Date: Tue, 27 May 2014 11:03:36 -0400 Subject: block: flush queued bios when process blocks to avoid deadlock The block layer uses per-process bio list to avoid recursion in generic_make_request. When generic_make_request is called recursively, the bio is added to current->bio_list and generic_make_request returns immediately. The top-level instance of generic_make_request takes bios from current->bio_list and processes them. 
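For context, the recursion-avoidance mechanism the commit message describes looks roughly like this - a simplified sketch paraphrased from block/blk-core.c around v4.9, not a verbatim copy, with error handling and queue-liveness checks omitted:

/*
 * Simplified sketch (not verbatim kernel code): a nested call only
 * queues the bio on current->bio_list; the outermost call drains the
 * list, so recursion through stacked drivers becomes iteration. Note
 * that every queued bio implicitly waits on the bios ahead of it,
 * which is the artificial ordering dependency discussed below.
 */
blk_qc_t generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack;

	if (current->bio_list) {
		/* Called recursively from a make_request_fn: just queue. */
		bio_list_add(current->bio_list, bio);
		return BLK_QC_T_NONE;
	}

	/* Top-level call: iterate instead of recursing. */
	bio_list_init(&bio_list_on_stack);
	current->bio_list = &bio_list_on_stack;
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		q->make_request_fn(q, bio);	/* may queue more bios */
		bio = bio_list_pop(current->bio_list);
	} while (bio);
	current->bio_list = NULL;

	return BLK_QC_T_NONE;
}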
The problem is that this bio queuing on current->bio_list creates an artificial locking dependency - a bio further on current->bio_list depends on any locks that preceding bios could take. This could result in a deadlock. Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers") created a workqueue for every bio set and code in bio_alloc_bioset() that tries to resolve some low-memory deadlocks by redirecting bios queued on current->bio_list to the workqueue if the system is low on memory. However, another deadlock (see below **) may happen, without any low memory condition, because generic_make_request is queuing bios to current->bio_list (rather than submitting them). Fix this deadlock by redirecting any bios on current->bio_list to the bio_set's rescue workqueue on every schedule call. Consequently, when the process blocks on a mutex, the bios queued on current->bio_list are dispatched to independent workqueues and they can complete without waiting for the mutex to be available. Also, now we can remove punt_bios_to_rescuer() and bio_alloc_bioset()'s calls to it because bio_alloc_bioset() will implicitly punt all bios on current->bio_list if it performs a blocking allocation. ** Here is the dm-snapshot deadlock that was observed: 1) Process A sends a one-page read bio to the dm-snapshot target. The bio spans a snapshot chunk boundary and so it is split into two bios by device mapper. 2) Device mapper creates the first sub-bio and sends it to the snapshot driver. 3) The function snapshot_map calls
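The "on every schedule call" redirect described in this commit message would hook the scheduler's pre-block path. A hedged sketch of the shape of such a hook follows - sched_submit_work() is the function the scheduler really calls before a task sleeps, but flush_current_bio_list() is an assumed helper name standing in for whatever the patch actually adds:

static inline void sched_submit_work(struct task_struct *tsk)
{
	/*
	 * Sketch only: if the task that is about to block still has
	 * bios parked on its bio_list, punt them to the rescue
	 * workqueues so they can complete without waiting for the
	 * lock this task is going to sleep on.
	 * flush_current_bio_list() is a hypothetical name, not a
	 * verified kernel API.
	 */
	if (tsk->bio_list && !bio_list_empty(tsk->bio_list))
		flush_current_bio_list();
}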
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Wed, Feb 08, 2017 at 11:34:07AM -0500, Mike Snitzer wrote: > On Tue, Feb 07 2017 at 11:58pm -0500, > Kent Overstreet wrote: > > > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > > > > > So.. what needs to be done there? > > > > > > > > > But, I just got an idea for how to handle this that might be halfway > > > > > sane, maybe > > > > > I'll try and come up with a patch... > > > > > > > > Ok, here's such a patch, only lightly tested: > > > > > > I guess it would be nice for me to test it... but what it is against? > > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. > > > > Sorry, I forgot I had a few other patches in my branch that touch > > mempool/biosets code. > > > > Also, after thinking about it more and looking at the relevant code, I'm > > pretty > > sure we don't need rescuer threads for block devices that just split bios - > > i.e. > > most of them, so I changed my patch to do that. > > > > Tested it by ripping out the current->bio_list checks/workarounds from the > > bcache code, appears to work: > > Feedback on this patch below, but first: > > There are deeper issues with the current->bio_list and rescue workqueues > than thread counts. > > I cannot help but feel like you (and Jens) are repeatedly ignoring the > issue that has been raised numerous times, most recently: > https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html > > FYI, this test (albeit ugly) can be used to check if the dm-snapshot > deadlock is fixed: > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html > > This situation is the unfortunate pathological worst case for what > happens when changes are merged and nobody wants to own fixing the > unforseen implications/regressions. Like everyone else in a position > of Linux maintenance I've tried to stay away from owning the > responsibility of a fix -- it isn't working. Ok, I'll stop bitching > now.. I do bear responsibility for not digging in myself. We're all > busy and this issue is "hard". Mike, it's not my job to debug DM code for you or sift through your bug reports. I don't read dm-devel, and I don't know why you think that's my job. If there's something you think the block layer should be doing differently, post patches - or at the very least, explain what you'd like to be done, with words. Don't get pissy because I'm not sifting through your bug reports. Hell, I'm not getting paid to work on kernel code at all right now, and you trying to rope me into fixing device mapper sure makes me want to work on the block layer more. DM developers have a long history of working siloed off from the rest of the block layer, building up their own crazy infrastructure (remember the old bio splitting code?) and going to extreme lengths to avoid having to work on or improve the core block layer infrastructure. It's ridiculous. You know what would be nice? What'd really make my day is if just once I got a thank you or a bit of appreciation from DM developers for the bvec iterators/bio splitting work I did that cleaned up a _lot_ of crazy hairy messes. Or getting rid of merge_bvec_fn, or trying to come up with a better solution for deadlocks due to running under generic_make_request() now.
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Tue, Feb 07 2017 at 11:58pm -0500, Kent Overstreet wrote: > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > > > So.. what needs to be done there? > > > > > > > But, I just got an idea for how to handle this that might be halfway > > > > sane, maybe > > > > I'll try and come up with a patch... > > > > > > Ok, here's such a patch, only lightly tested: > > > > I guess it would be nice for me to test it... but what it is against? > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. > > Sorry, I forgot I had a few other patches in my branch that touch > mempool/biosets code. > > Also, after thinking about it more and looking at the relevant code, I'm > pretty > sure we don't need rescuer threads for block devices that just split bios - > i.e. > most of them, so I changed my patch to do that. > > Tested it by ripping out the current->bio_list checks/workarounds from the > bcache code, appears to work: Feedback on this patch below, but first: There are deeper issues with the current->bio_list and rescue workqueues than thread counts. I cannot help but feel like you (and Jens) are repeatedly ignoring the issue that has been raised numerous times, most recently: https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html FYI, this test (albeit ugly) can be used to check if the dm-snapshot deadlock is fixed: https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html This situation is the unfortunate pathological worst case for what happens when changes are merged and nobody wants to own fixing the unforeseen implications/regressions. Like everyone else in a position of Linux maintenance I've tried to stay away from owning the responsibility of a fix -- it isn't working. Ok, I'll stop bitching now.. I do bear responsibility for not digging in myself. We're all busy and this issue is "hard". > -- >8 -- > Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset > > Also, trigger rescuing whenever with bios on current->bio_list, instead > of only when we block in bio_alloc_bioset(). This is more correct, and > should result in fewer rescuer threads. > > XXX: The current->bio_list plugging needs to be unified with the > blk_plug mechanism. > > Signed-off-by: Kent Overstreet > --- ... > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 3086da5664..e1b22a68d9 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1490,7 +1490,7 @@ static struct mapped_device *alloc_dev(int minor) > INIT_LIST_HEAD(&md->table_devices); > spin_lock_init(&md->uevent_lock); > > - md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id); > + md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id, 0); > if (!md->queue) > goto bad; > This should be BLK_QUEUE_NO_RESCUER as DM isn't making direct use of bio_queue_split() for its own internal splitting (maybe it should and that'd start to fix the issue I've been harping about?) but as-is DM destroys the rescuer workqueue (since commit dbba42d8a9eb "dm: eliminate unused "bioset" process for each bio-based DM device"). Mike
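To make the suggestion concrete, the hunk Mike is asking for would look roughly like this - a sketch only, where BLK_QUEUE_NO_RESCUER is the flag name from this reply and not a merged kernel symbol:

	/* Sketch of the suggested alloc_dev() change in drivers/md/dm.c:
	 * request no rescuer workqueue for bio-based DM, which has not
	 * had one since commit dbba42d8a9eb. BLK_QUEUE_NO_RESCUER is the
	 * flag name proposed in this reply, not merged API. */
	md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id,
					 BLK_QUEUE_NO_RESCUER);
	if (!md->queue)
		goto bad;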
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Tue, 2017-02-07 at 19:58 -0900, Kent Overstreet wrote: > On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > > > So.. what needs to be done there? > > > > > > > But, I just got an idea for how to handle this that might be halfway > > > > sane, maybe > > > > I'll try and come up with a patch... > > > > > > Ok, here's such a patch, only lightly tested: > > > > I guess it would be nice for me to test it... but what it is against? > > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. > > Sorry, I forgot I had a few other patches in my branch that touch > mempool/biosets code. > > Also, after thinking about it more and looking at the relevant code, I'm > pretty > sure we don't need rescuer threads for block devices that just split bios - > i.e. > most of them, so I changed my patch to do that. > > Tested it by ripping out the current->bio_list checks/workarounds from the > bcache code, appears to work: Patch killed every last one of them, but.. homer:/root # dmesg|grep WARNING [ 11.701447] WARNING: CPU: 4 PID: 801 at block/bio.c:388 bio_alloc_bioset+0x1a7/0x240 [ 11.711027] WARNING: CPU: 4 PID: 801 at block/blk-core.c:2013 generic_make_request+0x191/0x1f0 [ 19.728989] WARNING: CPU: 0 PID: 717 at block/bio.c:388 bio_alloc_bioset+0x1a7/0x240 [ 19.737020] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 generic_make_request+0x191/0x1f0 [ 19.746173] WARNING: CPU: 0 PID: 717 at block/bio.c:388 bio_alloc_bioset+0x1a7/0x240 [ 19.755260] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 generic_make_request+0x191/0x1f0 [ 19.763837] WARNING: CPU: 0 PID: 717 at block/bio.c:388 bio_alloc_bioset+0x1a7/0x240 [ 19.772526] WARNING: CPU: 0 PID: 717 at block/blk-core.c:2013 generic_make_request+0x191/0x1f0
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote: > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > So.. what needs to be done there? > > > > > But, I just got an idea for how to handle this that might be halfway > > > sane, maybe > > > I'll try and come up with a patch... > > > > Ok, here's such a patch, only lightly tested: > > I guess it would be nice for me to test it... but what it is against? > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. Sorry, I forgot I had a few other patches in my branch that touch mempool/biosets code. Also, after thinking about it more and looking at the relevant code, I'm pretty sure we don't need rescuer threads for block devices that just split bios - i.e. most of them, so I changed my patch to do that. Tested it by ripping out the current->bio_list checks/workarounds from the bcache code, appears to work: -- >8 -- Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset Also, trigger rescuing whenever we block with bios on current->bio_list, instead of only when we block in bio_alloc_bioset(). This is more correct, and should result in fewer rescuer threads. XXX: The current->bio_list plugging needs to be unified with the blk_plug mechanism. Signed-off-by: Kent Overstreet --- block/bio.c| 105 +++-- block/blk-core.c | 69 +++ block/blk-mq.c | 3 +- block/blk-sysfs.c | 2 + drivers/block/brd.c| 2 +- drivers/block/drbd/drbd_main.c | 2 +- drivers/block/null_blk.c | 3 +- drivers/block/pktcdvd.c| 2 +- drivers/block/ps3vram.c| 2 +- drivers/block/rsxx/dev.c | 2 +- drivers/block/umem.c | 2 +- drivers/block/zram/zram_drv.c | 2 +- drivers/lightnvm/gennvm.c | 2 +- drivers/md/bcache/super.c | 2 +- drivers/md/dm.c| 2 +- drivers/md/md.c| 2 +- drivers/nvdimm/blk.c | 2 +- drivers/nvdimm/btt.c | 2 +- drivers/nvdimm/pmem.c | 3 +- drivers/s390/block/dcssblk.c | 2 +- drivers/s390/block/xpram.c | 2 +- include/linux/bio.h| 16 +++ include/linux/blkdev.h | 16 ++- include/linux/sched.h | 2 +- kernel/sched/core.c| 6 +++ 25 files changed, 117 insertions(+), 138 deletions(-) diff --git a/block/bio.c b/block/bio.c index 2b375020fc..9b89be1719 100644 --- a/block/bio.c +++ b/block/bio.c @@ -340,54 +340,6 @@ void bio_chain(struct bio *bio, struct bio *parent) } EXPORT_SYMBOL(bio_chain); -static void bio_alloc_rescue(struct work_struct *work) -{ - struct bio_set *bs = container_of(work, struct bio_set, rescue_work); - struct bio *bio; - - while (1) { - spin_lock(&bs->rescue_lock); - bio = bio_list_pop(&bs->rescue_list); - spin_unlock(&bs->rescue_lock); - - if (!bio) - break; - - generic_make_request(bio); - } -} - -static void punt_bios_to_rescuer(struct bio_set *bs) -{ - struct bio_list punt, nopunt; - struct bio *bio; - - /* -* In order to guarantee forward progress we must punt only bios that -* were allocated from this bio_set; otherwise, if there was a bio on -* there for a stacking driver higher up in the stack, processing it -* could require allocating bios from this bio_set, and doing that from -* our own rescuer would be bad. -* -* Since bio lists are singly linked, pop them all instead of trying to -* remove from the middle of the list: -*/ - - bio_list_init(&punt); - bio_list_init(&nopunt); - - while ((bio = bio_list_pop(current->bio_list))) - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); - - *current->bio_list = nopunt; - - spin_lock(&bs->rescue_lock); - bio_list_merge(&bs->rescue_list, &punt); - spin_unlock(&bs->rescue_lock); - - queue_work(bs->rescue_workqueue, &bs->rescue_work); -} - /** * bio_alloc_bioset - allocate a bio for I/O * @gfp_mask: the GFP_ mask given to the slab allocator @@ -425,17 +377,20 @@ static void punt_bios_to_rescuer(struct bio_set *bs) */ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs) { - gfp_t saved_gfp = gfp_mask; unsigned front_pad; unsigned inline_vecs; struct bio_vec *bvl = NULL; struct bio *bio; void *p; - if (!bs) { - if (nr_iovecs > UIO_MAXIOV) - return NULL; + WARN(current->bio_list && + !current->bio_list->q->rescue_workqueue, +
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Tue, 2017-02-07 at 21:39 +0100, Pavel Machek wrote: > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > > > So.. what needs to be done there? > > > > > But, I just got an idea for how to handle this that might be halfway > > > sane, maybe > > > I'll try and come up with a patch... > > > > Ok, here's such a patch, only lightly tested: > > I guess it would be nice for me to test it... but what it is against? > I tried after v4.10-rc5 and linux-next, but got rejects in both cases. It wedged into master easily enough (box still seems to work.. but I'll be rebooting in a very few seconds just in case:), but threads on my desktop box only dropped from 73 to 71. Poo. -Mike
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Tue, Feb 7, 2017 at 10:49 AM, Kent Overstreet wrote: > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: >> On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: >> > Still there on v4.9, 36 threads on nokia n900 cellphone. >> > >> > So.. what needs to be done there? > >> But, I just got an idea for how to handle this that might be halfway sane, >> maybe >> I'll try and come up with a patch... > > Ok, here's such a patch, only lightly tested: > > -- >8 -- > Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset > > Note: this patch is very lightly tested. > > Also, trigger rescuing whenever with bios on current->bio_list, instead > of only when we block in bio_alloc_bioset(). This is more correct, and > should result in fewer rescuer threads. Looks like the rescuer stuff gets simplified a lot with this patch. > > XXX: The current->bio_list plugging needs to be unified with the > blk_plug mechanism. Yeah, that can be another benefit, :-) > > TODO: If we change normal request_queue drivers to handle arbitrary size > bios by processing requests incrementally, instead of splitting bios, > then we can get rid of rescuer threads from those devices. Also, the rescue threads often come from some reserved block devices, such as loop/nbd, and we should allow these drivers to delay allocating the thread until just before the disk is activated. Then the thread number can be decreased a lot. > --- > block/bio.c| 107 > - > block/blk-core.c | 58 --- > block/blk-sysfs.c | 2 + > include/linux/bio.h| 16 > include/linux/blkdev.h | 10 + > include/linux/sched.h | 2 +- > kernel/sched/core.c| 4 ++ > 7 files changed, 83 insertions(+), 116 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index f3b5786202..9ad54a9b12 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent) > } > EXPORT_SYMBOL(bio_chain); > > -static void bio_alloc_rescue(struct work_struct *work) > -{ > - struct bio_set *bs = container_of(work, struct bio_set, rescue_work); > - struct bio *bio; > - > - while (1) { > - spin_lock(&bs->rescue_lock); > - bio = bio_list_pop(&bs->rescue_list); > - spin_unlock(&bs->rescue_lock); > - > - if (!bio) > - break; > - > - generic_make_request(bio); > - } > -} > - > -static void punt_bios_to_rescuer(struct bio_set *bs) > -{ > - struct bio_list punt, nopunt; > - struct bio *bio; > - > - /* > -* In order to guarantee forward progress we must punt only bios that > -* were allocated from this bio_set; otherwise, if there was a bio on > -* there for a stacking driver higher up in the stack, processing it > -* could require allocating bios from this bio_set, and doing that > from > -* our own rescuer would be bad. > -* > -* Since bio lists are singly linked, pop them all instead of trying > to > -* remove from the middle of the list: > -*/ > - > - bio_list_init(&punt); > - bio_list_init(&nopunt); > - > - while ((bio = bio_list_pop(current->bio_list))) > - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); > - > - *current->bio_list = nopunt; > - > - spin_lock(&bs->rescue_lock); > - bio_list_merge(&bs->rescue_list, &punt); > - spin_unlock(&bs->rescue_lock); > - > - queue_work(bs->rescue_workqueue, &bs->rescue_work); > -} > - > /** > * bio_alloc_bioset - allocate a bio for I/O > * @gfp_mask: the GFP_ mask given to the slab allocator > @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs) > */ > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set > *bs) > { > - gfp_t saved_gfp = gfp_mask; > unsigned front_pad; > unsigned inline_vecs; > struct bio_vec *bvl = NULL; > struct bio *bio; > void *p; > > - if (!bs) { > - if (nr_iovecs > UIO_MAXIOV) > - return NULL; > + WARN(current->bio_list && > +!current->bio_list->q->rescue_workqueue, > +"allocating bio beneath generic_make_request() without rescuer"); > > + if (nr_iovecs > UIO_MAXIOV) > + return NULL; > + > + if (!bs) { > p = kmalloc(sizeof(struct bio) + > nr_iovecs * sizeof(struct bio_vec), > gfp_mask); > front_pad = 0; > inline_vecs = nr_iovecs; > } else { > - /* > -* generic_make_request() converts recursion to iteration; > this > -* means if we're running beneath it, any bios we allocate and > -* submit will not be submitted (and thus freed)
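Regarding Ming's point above about loop/nbd: one way to realize it would be to create the rescuer lazily at disk activation rather than at queue allocation. A hedged sketch of that idea, reusing the per-queue rescue_workqueue field from this patch - the helper name and call site are assumptions, not anything proposed in the patch itself:

/* Hypothetical helper, e.g. called from add_disk(): create the rescuer
 * workqueue only when the disk actually goes live, so reserved but
 * unused devices (loop, nbd, ...) do not each pin an idle thread. */
static int blk_queue_alloc_rescuer(struct request_queue *q,
				   const char *disk_name)
{
	if (q->rescue_workqueue)
		return 0;	/* already created */

	q->rescue_workqueue = alloc_workqueue("%s_rescuer",
					      WQ_MEM_RECLAIM, 0, disk_name);
	return q->rescue_workqueue ? 0 : -ENOMEM;
}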
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Mon 2017-02-06 17:49:06, Kent Overstreet wrote: > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > So.. what needs to be done there? > > > But, I just got an idea for how to handle this that might be halfway sane, > > maybe > > I'll try and come up with a patch... > > Ok, here's such a patch, only lightly tested: I guess it would be nice for me to test it... but what is it against? I tried it on top of v4.10-rc5 and linux-next, but got rejects in both cases. Thanks, Pavel > -- >8 -- > Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset > > Note: this patch is very lightly tested. > > Also, trigger rescuing whenever with bios on current->bio_list, instead > of only when we block in bio_alloc_bioset(). This is more correct, and > should result in fewer rescuer threads. > > XXX: The current->bio_list plugging needs to be unified with the > blk_plug mechanism. > > TODO: If we change normal request_queue drivers to handle arbitrary size > bios by processing requests incrementally, instead of splitting bios, > then we can get rid of rescuer threads from those devices. > --- > block/bio.c| 107 > - > block/blk-core.c | 58 --- > block/blk-sysfs.c | 2 + > include/linux/bio.h| 16 > include/linux/blkdev.h | 10 + > include/linux/sched.h | 2 +- > kernel/sched/core.c| 4 ++ > 7 files changed, 83 insertions(+), 116 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index f3b5786202..9ad54a9b12 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent) > } > EXPORT_SYMBOL(bio_chain); > > -static void bio_alloc_rescue(struct work_struct *work) > -{ > - struct bio_set *bs = container_of(work, struct bio_set, rescue_work); > - struct bio *bio; > - > - while (1) { > - spin_lock(&bs->rescue_lock); > - bio = bio_list_pop(&bs->rescue_list); > - spin_unlock(&bs->rescue_lock); > - > - if (!bio) > - break; > - > - generic_make_request(bio); > - } > -} > - > -static void punt_bios_to_rescuer(struct bio_set *bs) > -{ > - struct bio_list punt, nopunt; > - struct bio *bio; > - > - /* > - * In order to guarantee forward progress we must punt only bios that > - * were allocated from this bio_set; otherwise, if there was a bio on > - * there for a stacking driver higher up in the stack, processing it > - * could require allocating bios from this bio_set, and doing that from > - * our own rescuer would be bad. > - * > - * Since bio lists are singly linked, pop them all instead of trying to > - * remove from the middle of the list: > - */ > - > - bio_list_init(&punt); > - bio_list_init(&nopunt); > - > - while ((bio = bio_list_pop(current->bio_list))) > - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); > - > - *current->bio_list = nopunt; > - > - spin_lock(&bs->rescue_lock); > - bio_list_merge(&bs->rescue_list, &punt); > - spin_unlock(&bs->rescue_lock); > - > - queue_work(bs->rescue_workqueue, &bs->rescue_work); > -} > - > /** > * bio_alloc_bioset - allocate a bio for I/O > * @gfp_mask: the GFP_ mask given to the slab allocator > @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs) > */ > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set > *bs) > { > - gfp_t saved_gfp = gfp_mask; > unsigned front_pad; > unsigned inline_vecs; > struct bio_vec *bvl = NULL; > struct bio *bio; > void *p; > > - if (!bs) { > - if (nr_iovecs > UIO_MAXIOV) > - return NULL; > + WARN(current->bio_list && > + !current->bio_list->q->rescue_workqueue, > + "allocating bio beneath generic_make_request() without rescuer"); > > + if (nr_iovecs > UIO_MAXIOV) > + return NULL; > + > + if (!bs) { > p = kmalloc(sizeof(struct bio) + > nr_iovecs * sizeof(struct bio_vec), > gfp_mask); > front_pad = 0; > inline_vecs = nr_iovecs; > } else { > - /* > - * generic_make_request() converts recursion to iteration; this > - * means if we're running beneath it, any bios we allocate and > - * submit will not be submitted (and thus freed) until after we > - * return. > - * > - * This exposes us to a potential deadlock if we allocate > - * multiple bios from the same bio_set() while running > - * underneath
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Mon, Feb 06 2017 at 9:49pm -0500, Kent Overstreet wrote: > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > > > So.. what needs to be done there? > > > But, I just got an idea for how to handle this that might be halfway sane, > > maybe > > I'll try and come up with a patch... > > Ok, here's such a patch, only lightly tested: > > -- >8 -- > Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset > > Note: this patch is very lightly tested. > > Also, trigger rescuing whenever we block with bios on current->bio_list, instead > of only when we block in bio_alloc_bioset(). This is more correct, and > should result in fewer rescuer threads. > > XXX: The current->bio_list plugging needs to be unified with the > blk_plug mechanism. > > TODO: If we change normal request_queue drivers to handle arbitrary size > bios by processing requests incrementally, instead of splitting bios, > then we can get rid of rescuer threads from those devices. Hi Kent, I really appreciate you working on this further. Thanks. As I think you're probably already aware, a long-standing issue with the per-bio_set rescuer is this bug (which manifests in dm-snapshot deadlocks): https://bugzilla.kernel.org/show_bug.cgi?id=119841 Please also see this patch header, from a private branch from a while ago, that describes the problem in detail: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=cd2c760b5a609e2aaf3735a7b9503a953535c368 Would welcome your consideration of that BZ as you think further and/or iterate on this line of work. Mike
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote: > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > > Still there on v4.9, 36 threads on nokia n900 cellphone. > > > > So.. what needs to be done there? > But, I just got an idea for how to handle this that might be halfway sane, > maybe > I'll try and come up with a patch... Ok, here's such a patch, only lightly tested: -- >8 -- Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset Note: this patch is very lightly tested. Also, trigger rescuing whenever we block with bios on current->bio_list, instead of only when we block in bio_alloc_bioset(). This is more correct, and should result in fewer rescuer threads. XXX: The current->bio_list plugging needs to be unified with the blk_plug mechanism. TODO: If we change normal request_queue drivers to handle arbitrary size bios by processing requests incrementally, instead of splitting bios, then we can get rid of rescuer threads from those devices. --- block/bio.c | 107 - block/blk-core.c | 58 --- block/blk-sysfs.c | 2 + include/linux/bio.h | 16 include/linux/blkdev.h | 10 + include/linux/sched.h | 2 +- kernel/sched/core.c | 4 ++ 7 files changed, 83 insertions(+), 116 deletions(-) diff --git a/block/bio.c b/block/bio.c index f3b5786202..9ad54a9b12 100644 --- a/block/bio.c +++ b/block/bio.c @@ -336,54 +336,6 @@ void bio_chain(struct bio *bio, struct bio *parent) } EXPORT_SYMBOL(bio_chain); -static void bio_alloc_rescue(struct work_struct *work) -{ - struct bio_set *bs = container_of(work, struct bio_set, rescue_work); - struct bio *bio; - - while (1) { - spin_lock(&bs->rescue_lock); - bio = bio_list_pop(&bs->rescue_list); - spin_unlock(&bs->rescue_lock); - - if (!bio) - break; - - generic_make_request(bio); - } -} - -static void punt_bios_to_rescuer(struct bio_set *bs) -{ - struct bio_list punt, nopunt; - struct bio *bio; - - /* - * In order to guarantee forward progress we must punt only bios that - * were allocated from this bio_set; otherwise, if there was a bio on - * there for a stacking driver higher up in the stack, processing it - * could require allocating bios from this bio_set, and doing that from - * our own rescuer would be bad. - * - * Since bio lists are singly linked, pop them all instead of trying to - * remove from the middle of the list: - */ - - bio_list_init(&punt); - bio_list_init(&nopunt); - - while ((bio = bio_list_pop(current->bio_list))) - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); - - *current->bio_list = nopunt; - - spin_lock(&bs->rescue_lock); - bio_list_merge(&bs->rescue_list, &punt); - spin_unlock(&bs->rescue_lock); - - queue_work(bs->rescue_workqueue, &bs->rescue_work); -} - /** * bio_alloc_bioset - allocate a bio for I/O * @gfp_mask: the GFP_ mask given to the slab allocator @@ -421,54 +373,27 @@ static void punt_bios_to_rescuer(struct bio_set *bs) */ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs) { - gfp_t saved_gfp = gfp_mask; unsigned front_pad; unsigned inline_vecs; struct bio_vec *bvl = NULL; struct bio *bio; void *p; - if (!bs) { - if (nr_iovecs > UIO_MAXIOV) - return NULL; + WARN(current->bio_list && + !current->bio_list->q->rescue_workqueue, + "allocating bio beneath generic_make_request() without rescuer"); + if (nr_iovecs > UIO_MAXIOV) + return NULL; + + if (!bs) { p = kmalloc(sizeof(struct bio) + nr_iovecs * sizeof(struct bio_vec), gfp_mask); front_pad = 0; inline_vecs = nr_iovecs; } else { - /* - * generic_make_request() converts recursion to iteration; this - * means if we're running beneath it, any bios we allocate and - * submit will not be submitted (and thus freed) until after we - * return. - * - * This exposes us to a potential deadlock if we allocate - * multiple bios from the same bio_set() while running - * underneath generic_make_request(). If we were to allocate - * multiple bios (say a stacking block driver that was splitting - * bios), we would deadlock if we exhausted the mempool's - * reserve. - * - * We solve this, and guarantee forward progress, with a rescuer - * workqueue per bio_set. If we go to allocate and there are -
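The diff is truncated by the archive below this point, but the visible WARN() already shows the direction of the change: the rescuer state moves from struct bio_set to struct request_queue, reachable as current->bio_list->q->rescue_workqueue. As a hedged sketch only (just the rescue_workqueue field is visible in the patch; the other field and function names here are illustrative guesses, not taken from the truncated hunks), the per-queue rescuer would presumably mirror the deleted per-bioset one:

	/* Hypothetical per-request_queue rescuer, mirroring the deleted
	 * bio_alloc_rescue() above; names other than rescue_workqueue are
	 * guesses for illustration, not taken from the (truncated) patch. */
	struct request_queue {
		/* ... existing fields ... */
		spinlock_t		rescue_lock;
		struct bio_list		rescue_list;
		struct work_struct	rescue_work;
		struct workqueue_struct	*rescue_workqueue;	/* NULL: no rescuer */
	};

	static void blk_queue_rescue_work(struct work_struct *work)
	{
		struct request_queue *q =
			container_of(work, struct request_queue, rescue_work);
		struct bio *bio;

		while (1) {
			spin_lock(&q->rescue_lock);
			bio = bio_list_pop(&q->rescue_list);
			spin_unlock(&q->rescue_lock);

			if (!bio)
				break;

			generic_make_request(bio);
		}
	}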
Re: v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote: > On Sat 2016-02-20 21:04:32, Pavel Machek wrote: > > Hi! > > > > > > > > > I know it is normal to spawn 8 threads for every single function, > > > > > > ... > > > > > > > but 28 threads? > > > > > > > > > > > > > > root 974 0.0 0.0 0 0 ? S< Dec08 0:00 > > > > > > > [bioset] > > > > > > ... > > > > > > > > > > > > How many physical block devices do you have? > > > > > > > > > > > > DM is doing its part to not contribute to this: > > > > > > dbba42d8a ("dm: eliminate unused "bioset" process for each > > > > > > bio-based DM device") > > > > > > > > > > > > (but yeah, all these extra 'bioset' threads aren't ideal) > > > > > > > > > > Still there in 4.4-final. > > > > > > > > ...and still there in 4.5-rc4 :-(. > > > > > > You're directing this concern to the wrong person. > > > > > > I already told you DM is _not_ contributing any extra "bioset" threads > > > (ever since commit dbba42d8a). > > > > Well, sorry about that. Note that l-k is on the cc list, so hopefully > > the right person sees it too. > > > > Ok, let me check... it seems that > > 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent > > Overstreet is to blame. > > > > Um, and you acked the patch, so you are partly responsible. > Still there on v4.9, 36 threads on nokia n900 cellphone. > > So.. what needs to be done there? So background: We need rescuer threads because: Say you allocate a bio from a bioset, submit that bio, and then allocate again from that same bioset: if you're running underneath generic_make_request(), you'll deadlock. Real submission of the first bio you allocated is blocked until you return from your make_request_fn(), but you're blocking that trying to allocate - this is handled (in a hacky way!) with the punt_bios_to_rescuer code when we go to allocate from a bioset but have to block (a code sketch of this deadlock pattern follows below). We need more than a single global rescuer, because: The rescuer thread is just resubmitting bios, so if in the course of submitting bios, _their_ drivers allocate new bios from biosets and block - oops, we're recursing. However: The rescuer threads don't inherently need to be per bioset - they really ought to be per block device. Additionally, triggering the "punt bios to rescuer" code only when we go to allocate from a bioset and block is wrong: it's possible to create these sorts of deadlocks by blocking on other things. The right thing to do would be to trigger this "punt bios to rescuer" thing whenever we schedule and there are still bios on current->bio_list. This is actually how Jens's new(er) plugging code works (which postdates my punt-bios-to-rescuer hack). What needs to happen is Jens's scheduler hook for block plugging needs to be unified with both the current->bio_list thing (which is really a block plug, just open coded, as it predates _all_ of this) and the rescuer thread stuff. The tricky part is going to be making the rescuer threads per block device _correctly_ (without introducing new deadlocks)... reasoning out these deadlocks always makes my head hurt; the biggest reason I made the rescuer threads per bioset was that when I wrote the code I wasn't at all confident I could get anything else right. Still uneasy about that :) What needs rescuer threads? - if we're allocating from a bioset, but we're not running under generic_make_request() (e.g. we're in filesystem code) - we don't need a rescuer there, since we're not blocking previously allocated bios from being submitted. 
- if we're a block driver that doesn't allocate new bios, we don't need a rescuer thread. But note that any block driver that calls blk_queue_split() to handle arbitrary size bios is allocating bios. If we converted e.g. the standard request queue code to process bios/requests incrementally, instead of requiring them to be split, this would go away (and that's something that should be done anyways, it would improve performance by getting rid of segment counting). So, we should only need one rescuer thread per block device - and if we get rid of splitting to handle arbitrary size bios, most block devices won't need rescuers. The catch is that the correct rescuer to punt a bio to corresponds to the device that _issued_ the bio, not the device the bio was submitted to, and that information is currently lost by the time we block - that's the other reason I made rescuers per bioset, since bios do track the bioset they were allocated from. But, I just got an idea for how to handle this that might be halfway sane, maybe I'll try and come up with a patch...
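To make the deadlock described above concrete, here is a hedged sketch of the dangerous pattern: a make_request_fn that allocates more than one bio from the same mempool-backed bio_set while running beneath generic_make_request(). The driver, MY_MAX_SECTORS, and my_bio_set are hypothetical; bio_split(), bio_chain(), and the blk_qc_t signature are the real v4.9-era APIs, and error/completion handling is omitted for brevity:

	/* Hypothetical stacking driver illustrating the deadlock that rescuer
	 * threads exist to break.  Beneath generic_make_request(), "submitting"
	 * a bio only queues it on current->bio_list; nothing is really issued
	 * (or freed) until this function returns. */
	static blk_qc_t example_make_request(struct request_queue *q, struct bio *bio)
	{
		struct bio *split;

		while (bio_sectors(bio) > MY_MAX_SECTORS) {
			/* May consume the last entry of my_bio_set's mempool. */
			split = bio_split(bio, MY_MAX_SECTORS, GFP_NOIO, my_bio_set);
			bio_chain(split, bio);

			/* Only parks "split" on current->bio_list. */
			generic_make_request(split);

			/* The next bio_split() allocates from my_bio_set again.
			 * If the mempool is now empty, we sleep waiting for a bio
			 * to be freed -- but the only candidates are the bios
			 * parked behind us on current->bio_list.  Without a
			 * rescuer pulling those off and really submitting them,
			 * this never wakes up: deadlock. */
		}

		generic_make_request(bio);
		return BLK_QC_T_NONE;
	}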
v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone
On Sat 2016-02-20 21:04:32, Pavel Machek wrote: > Hi! > > > > > > > I know it is normal to spawn 8 threads for every single function, > > > > > ... > > > > > > but 28 threads? > > > > > > > > > > > > root 974 0.0 0.0 0 0 ? S< Dec08 0:00 > > > > > > [bioset] > > > > > ... > > > > > > > > > > How many physical block devices do you have? > > > > > > > > > > DM is doing its part to not contribute to this: > > > > > dbba42d8a ("dm: eliminate unused "bioset" process for each bio-based > > > > > DM device") > > > > > > > > > > (but yeah, all these extra 'bioset' threads aren't ideal) > > > > > > > > Still there in 4.4-final. > > > > > > ...and still there in 4.5-rc4 :-(. > > > > You're directing this concern to the wrong person. > > > > I already told you DM is _not_ contributing any extra "bioset" threads > > (ever since commit dbba42d8a). > > Well, sorry about that. Note that l-k is on the cc list, so hopefully > the right person sees it too. > > Ok, let me check... it seems that > 54efd50bfd873e2dbf784e0b21a8027ba4299a3e is responsible, thus Kent > Overstreet is to blame. > > Um, and you acked the patch, so you are partly responsible. Still there on v4.9, 36 threads on nokia n900 cellphone. So.. what needs to be done there? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html