Re: [PATCH v2] block: flush queued bios when the process blocks
On Fri, Oct 16, 2015 at 11:29 PM, Mike Snitzer wrote:
> On Thu, Oct 15 2015 at 11:08pm -0400,
> Ming Lei wrote:
>
>> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer wrote:
>> > On Wed, Oct 14 2015 at 11:27pm -0400,
>> > Ming Lei wrote:
>> >
>> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer wrote:
>> >> >
>> >> > Turns out that this change:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >> >
>> >> > needed to be reverted with:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >> >
>> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> >> > within generic_make_request because XFS already added an outermost
>> >> > plug).
>> >>
>> >> Looks like you should have defined bio_list in the plug as
>> >>
>> >>     'struct bio_list bio_list'
>> >>
>> >> instead of one pointer.
>> >
>> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
>> > above that does exactly that).  That wasn't the problem.
>>
>> OK.
>>
>> >> >
>> >> > But even after fixing that I then hit issues with these changes now
>> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> >> > lazily once the outermost plug completes blk_finish_plug.  Manifested as
>> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>> >>
>> >> Looks like this problem should be related to the above 'bio_list' definition too.
>> >
>> > No, as I explained it was due to the nested plug:
>> >
>> >> >
>> >> > Basically using the blk-core's onstack plugging isn't workable for
>> >> > fixing this deadlock and we're back to having to seriously consider
>> >> > this (with its additional hook in the scheduler)
>> >
>> > To elaborate, for the code in DM (and other subsystems like bcache) that
>> > relies on accurate accounting of whether we're actively _in_
>> > generic_make_request: using the plug to store/manage the bio_list isn't
>>
>> That looks like an interesting requirement, which means DM just needs to know
>> if the current callsite is from generic_make_request(), so what you need
>> is just one per-task variable.
>>
>> With the stack variable of 'plug', it should be easier to do that for DM; for
>> example, you can introduce one flag in 'struct blk_plug', then set it at
>> the entry of generic_make_request() and clear it at the exit of the
>> function.
>
> Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
> generic_make_request() but then it just calls into question why the heck
> we're using the plug to begin with?  (especially given plugging is for
> request-based devices at this point!).
>
> It really doesn't make _any_ sense to overload blk_plug by moving the
> bio_list into there and adding an 'in_generic_make_request'... when you
> consider the _only_ reason this was suggested is to (ab)use the existing
> hook in scheduler/core.c.

At first glance, I mean it is doable to use blk_plug for the issue.
From last year's discussion, it looks like Jens thought the plug we
already have should have covered this case, and Kent also wanted to
implement plugging for bios.

> So I stand by my position that there is really no point in the exercise
> and that it actually hurts the code to try to make this a blk_plug
> "feature".
>
> We already have well-established current->bio_list semantics that can be
> reused as a flag given it is a pointer.  The block callout in the
> scheduler is going to grow a conditional either way.  What I've proposed
> _seems_ the cleanest to me and others.  Hopefully you can see that
> aspect of things.
>
> So if you could review the v3 patch with a critical eye that'd be very
> much appreciated.

Will do.

> But I do look forward to Jens also having a look at this and providing
> his review feedback.
>
>> > workable because nested plugs change the lifetime of when the bio_list
>> > is processed (as I implemented it -- which was to respect nested plugs).
>> > I could've forced the issue by making the bio_list get processed
>> > regardless of nesting but that would've made the onstack plugging much
>> > more convoluted (duality between nested vs not just for bio_list's
>> > benefit and for what gain?  Simply to avoid an extra conditional
>> > immediately in the scheduler?  That conditional was still added anyway
>> > but just as part of blk_needs_flush_plug so in the end there wasn't any
>> > benefit!).
>> >
>> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
>> > my position that (ab)using blk_plug for the bio_list management is
>> > _really_ awkward. ;)
>>
>> Hope it wasn't my reply that broke your sleep :-)
>
> No, my dog woke me up to go outside at 4am.. I was up and couldn't
> resist looking at my phone.. the rest is history ;)

--
Ming Lei
Re: [PATCH v2] block: flush queued bios when the process blocks
On Thu, Oct 15 2015 at 11:08pm -0400,
Ming Lei wrote:

> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer wrote:
> > On Wed, Oct 14 2015 at 11:27pm -0400,
> > Ming Lei wrote:
> >
> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer wrote:
> >> >
> >> > Turns out that this change:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >> >
> >> > needed to be reverted with:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >> >
> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> >> > within generic_make_request because XFS already added an outermost
> >> > plug).
> >>
> >> Looks like you should have defined bio_list in the plug as
> >>
> >>     'struct bio_list bio_list'
> >>
> >> instead of one pointer.
> >
> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> > above that does exactly that).  That wasn't the problem.
>
> OK.
>
> >> >
> >> > But even after fixing that I then hit issues with these changes now
> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
> >> > lazily once the outermost plug completes blk_finish_plug.  Manifested as
> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> >>
> >> Looks like this problem should be related to the above 'bio_list' definition too.
> >
> > No, as I explained it was due to the nested plug:
> >
> >> >
> >> > Basically using the blk-core's onstack plugging isn't workable for
> >> > fixing this deadlock and we're back to having to seriously consider
> >> > this (with its additional hook in the scheduler)
> >
> > To elaborate, for the code in DM (and other subsystems like bcache) that
> > relies on accurate accounting of whether we're actively _in_
> > generic_make_request: using the plug to store/manage the bio_list isn't
>
> That looks like an interesting requirement, which means DM just needs to know
> if the current callsite is from generic_make_request(), so what you need
> is just one per-task variable.
>
> With the stack variable of 'plug', it should be easier to do that for DM; for
> example, you can introduce one flag in 'struct blk_plug', then set it at
> the entry of generic_make_request() and clear it at the exit of the
> function.

Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
generic_make_request() but then it just calls into question why the heck
we're using the plug to begin with?  (especially given plugging is for
request-based devices at this point!).

It really doesn't make _any_ sense to overload blk_plug by moving the
bio_list into there and adding an 'in_generic_make_request'... when you
consider the _only_ reason this was suggested is to (ab)use the existing
hook in scheduler/core.c.

So I stand by my position that there is really no point in the exercise
and that it actually hurts the code to try to make this a blk_plug
"feature".

We already have well-established current->bio_list semantics that can be
reused as a flag given it is a pointer.  The block callout in the
scheduler is going to grow a conditional either way.  What I've proposed
_seems_ the cleanest to me and others.  Hopefully you can see that
aspect of things.

So if you could review the v3 patch with a critical eye that'd be very
much appreciated.

But I do look forward to Jens also having a look at this and providing
his review feedback.

> > workable because nested plugs change the lifetime of when the bio_list
> > is processed (as I implemented it -- which was to respect nested plugs).
> > I could've forced the issue by making the bio_list get processed
> > regardless of nesting but that would've made the onstack plugging much
> > more convoluted (duality between nested vs not just for bio_list's
> > benefit and for what gain?  Simply to avoid an extra conditional
> > immediately in the scheduler?  That conditional was still added anyway
> > but just as part of blk_needs_flush_plug so in the end there wasn't any
> > benefit!).
> >
> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
> > my position that (ab)using blk_plug for the bio_list management is
> > _really_ awkward. ;)
>
> Hope it wasn't my reply that broke your sleep :-)

No, my dog woke me up to go outside at 4am.. I was up and couldn't
resist looking at my phone.. the rest is history ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] block: flush queued bios when the process blocks
On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer wrote:
> On Wed, Oct 14 2015 at 11:27pm -0400,
> Ming Lei wrote:
>
>> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer wrote:
>> >
>> > Turns out that this change:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >
>> > needed to be reverted with:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >
>> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> > within generic_make_request because XFS already added an outermost
>> > plug).
>>
>> Looks like you should have defined bio_list in the plug as
>>
>>     'struct bio_list bio_list'
>>
>> instead of one pointer.
>
> I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> above that does exactly that).  That wasn't the problem.

OK.

>
>> >
>> > But even after fixing that I then hit issues with these changes now
>> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> > lazily once the outermost plug completes blk_finish_plug.  Manifested as
>> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>>
>> Looks like this problem should be related to the above 'bio_list' definition too.
>
> No, as I explained it was due to the nested plug:
>
>> >
>> > Basically using the blk-core's onstack plugging isn't workable for
>> > fixing this deadlock and we're back to having to seriously consider
>> > this (with its additional hook in the scheduler)
>
> To elaborate, for the code in DM (and other subsystems like bcache) that
> relies on accurate accounting of whether we're actively _in_
> generic_make_request: using the plug to store/manage the bio_list isn't

That looks like an interesting requirement, which means DM just needs to know
if the current callsite is from generic_make_request(), so what you need
is just one per-task variable.

With the stack variable of 'plug', it should be easier to do that for DM; for
example, you can introduce one flag in 'struct blk_plug', then set it at
the entry of generic_make_request() and clear it at the exit of the
function.

> workable because nested plugs change the lifetime of when the bio_list
> is processed (as I implemented it -- which was to respect nested plugs).
> I could've forced the issue by making the bio_list get processed
> regardless of nesting but that would've made the onstack plugging much
> more convoluted (duality between nested vs not just for bio_list's
> benefit and for what gain?  Simply to avoid an extra conditional
> immediately in the scheduler?  That conditional was still added anyway
> but just as part of blk_needs_flush_plug so in the end there wasn't any
> benefit!).
>
> Hopefully my middle-of-the-night reply is coherent and helped to clarify
> my position that (ab)using blk_plug for the bio_list management is
> _really_ awkward. ;)

Hope it wasn't my reply that broke your sleep :-)

Thanks,
Ming Lei
Re: [PATCH v2] block: flush queued bios when the process blocks
On Wed, Oct 14 2015 at 11:27pm -0400,
Ming Lei wrote:

> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer wrote:
> >
> > Turns out that this change:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >
> > needed to be reverted with:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >
> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> > within generic_make_request because XFS already added an outermost
> > plug).
>
> Looks like you should have defined bio_list in the plug as
>
>     'struct bio_list bio_list'
>
> instead of one pointer.

I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
above that does exactly that).  That wasn't the problem.

> >
> > But even after fixing that I then hit issues with these changes now
> > resulting in imperfect 'in_generic_make_request' accounting that happens
> > lazily once the outermost plug completes blk_finish_plug.  Manifested as
> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>
> Looks like this problem should be related to the above 'bio_list' definition too.

No, as I explained it was due to the nested plug:

> >
> > Basically using the blk-core's onstack plugging isn't workable for
> > fixing this deadlock and we're back to having to seriously consider
> > this (with its additional hook in the scheduler)

To elaborate, for the code in DM (and other subsystems like bcache) that
relies on accurate accounting of whether we're actively _in_
generic_make_request: using the plug to store/manage the bio_list isn't
workable because nested plugs change the lifetime of when the bio_list
is processed (as I implemented it -- which was to respect nested plugs).
I could've forced the issue by making the bio_list get processed
regardless of nesting but that would've made the onstack plugging much
more convoluted (duality between nested vs not just for bio_list's
benefit and for what gain?  Simply to avoid an extra conditional
immediately in the scheduler?  That conditional was still added anyway
but just as part of blk_needs_flush_plug so in the end there wasn't any
benefit!).

Hopefully my middle-of-the-night reply is coherent and helped to clarify
my position that (ab)using blk_plug for the bio_list management is
_really_ awkward. ;)

Thanks,
Mike
Re: [PATCH v2] block: flush queued bios when the process blocks
On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer wrote: > On Thu, Oct 08 2015 at 11:08am -0400, > Mike Snitzer wrote: > >> On Thu, Oct 08 2015 at 11:04am -0400, >> Mikulas Patocka wrote: >> >> > >> > >> > On Tue, 6 Oct 2015, Mike Snitzer wrote: >> > >> > > To give others context for why I'm caring about this issue again, this >> > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix: >> > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650 >> > > >> > > FYI, I cleaned up the plug-based approach a bit further, here is the >> > > incremental patch: >> > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6 >> > > >> > > And here is a new version of the overall combined patch (sharing now >> > > before I transition to looking at alternatives, though my gut is the use >> > > of a plug in generic_make_request really wouldn't hurt us.. famous last >> > > words): >> > > >> > > block/bio.c| 82 >> > > +- >> > > block/blk-core.c | 21 - >> > > drivers/md/dm-bufio.c | 2 +- >> > > drivers/md/raid1.c | 6 ++-- >> > > drivers/md/raid10.c| 6 ++-- >> > > include/linux/blkdev.h | 11 +-- >> > > include/linux/sched.h | 4 --- >> > > 7 files changed, 51 insertions(+), 81 deletions(-) >> > > >> ... 
>> > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c >> > > index 2dd3308..c2bff16 100644 >> > > --- a/drivers/md/dm-bufio.c >> > > +++ b/drivers/md/dm-bufio.c >> > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct >> > > dm_bufio_client *c) >> > > #define DM_BUFIO_CACHE(c) >> > > (dm_bufio_caches[dm_bufio_cache_index(c)]) >> > > #define DM_BUFIO_CACHE_NAME(c) >> > > (dm_bufio_cache_names[dm_bufio_cache_index(c)]) >> > > >> > > -#define dm_bufio_in_request()(!!current->bio_list) >> > > +#define dm_bufio_in_request()(current->plug && >> > > !!current->plug->bio_list) >> > >> > This condition is repeated several times throughout the whole patch - so >> > maybe you should make it a function in block device header file. >> >> Yeah, I thought of that too but forgot to come back to it. Will do, >> thanks. >> >> FYI, I found another bug in my last patch and fixed it up. I'll get >> some refactoring done (including your suggestion), actually _test_ the >> code (e.g. verify all of lvm testsuite passes) and then send out v3. > > Turns out that this change: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841 > > needed to be reverted with: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe > > Because nested plugs caused generic_make_request()'s onstack bio_list to > go out of scope (blk_finish_plug() wouldn't actually flush the list > within generic_make_request because XFS already added an outermost > plug). Looks you should have defined bio_list in plug as 'struct bio_list bio_list' instead of one pointer. > > But even after fixing that I then hit issues with these changes now > resulting in imperfect 'in_generic_make_request' accounting that happens > lazily once the outermost plug completes blk_finish_plug. manifested as > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting. 
It looks like this problem is related to the 'bio_list' definition above, too. > > Basically using the blk-core's onstack plugging isn't workable for > fixing this deadlock and we're back to having to seriously consider > this (with its additional hook in the scheduler): > > Mike -- Ming Lei
Re: [PATCH v2] block: flush queued bios when the process blocks
On Fri, Oct 09 2015 at 3:52pm -0400, Mike Snitzer wrote: > Turns out that this change: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841 > > needed to be reverted with: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe > > Because nested plugs caused generic_make_request()'s onstack bio_list to > go out of scope (blk_finish_plug() wouldn't actually flush the list > within generic_make_request because XFS already added an outermost > plug). > > But even after fixing that I then hit issues with these changes now > resulting in imperfect 'in_generic_make_request' accounting that happens > lazily once the outermost plug completes blk_finish_plug. This manifested as > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting. > > Basically using the blk-core's onstack plugging isn't workable for > fixing this deadlock and we're back to having to seriously consider > this (with its additional hook in the scheduler): http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=a91709cd32b5ca7ca047b68c9299e747f2ae6ca2
Re: [PATCH v2] block: flush queued bios when the process blocks
On Thu, Oct 08 2015 at 11:08am -0400, Mike Snitzer wrote: > On Thu, Oct 08 2015 at 11:04am -0400, > Mikulas Patocka wrote: > > > > > > > On Tue, 6 Oct 2015, Mike Snitzer wrote: > > > > > To give others context for why I'm caring about this issue again, this > > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix: > > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650 > > > > > > FYI, I cleaned up the plug-based approach a bit further, here is the > > > incremental patch: > > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6 > > > > > > And here is a new version of the overall combined patch (sharing now > > > before I transition to looking at alternatives, though my gut is the use > > > of a plug in generic_make_request really wouldn't hurt us.. famous last > > > words): > > > > > > block/bio.c| 82 > > > +- > > > block/blk-core.c | 21 - > > > drivers/md/dm-bufio.c | 2 +- > > > drivers/md/raid1.c | 6 ++-- > > > drivers/md/raid10.c| 6 ++-- > > > include/linux/blkdev.h | 11 +-- > > > include/linux/sched.h | 4 --- > > > 7 files changed, 51 insertions(+), 81 deletions(-) > > > > ... > > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c > > > index 2dd3308..c2bff16 100644 > > > --- a/drivers/md/dm-bufio.c > > > +++ b/drivers/md/dm-bufio.c > > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct > > > dm_bufio_client *c) > > > #define DM_BUFIO_CACHE(c) > > > (dm_bufio_caches[dm_bufio_cache_index(c)]) > > > #define DM_BUFIO_CACHE_NAME(c) > > > (dm_bufio_cache_names[dm_bufio_cache_index(c)]) > > > > > > -#define dm_bufio_in_request()(!!current->bio_list) > > > +#define dm_bufio_in_request()(current->plug && > > > !!current->plug->bio_list) > > > > This condition is repeated several times throughout the whole patch - so > > maybe you should make it a function in block device header file. > > Yeah, I thought of that too but forgot to come back to it. 
Will do, > thanks. > > FYI, I found another bug in my last patch and fixed it up. I'll get > some refactoring done (including your suggestion), actually _test_ the > code (e.g. verify all of lvm testsuite passes) and then send out v3. Turns out that this change: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841 needed to be reverted with: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe Because nested plugs caused generic_make_request()'s onstack bio_list to go out of scope (blk_finish_plug() wouldn't actually flush the list within generic_make_request because XFS already added an outermost plug). But even after fixing that I then hit issues with these changes now resulting in imperfect 'in_generic_make_request' accounting that happens lazily once the outermost plug completes blk_finish_plug. This manifested as dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting. Basically using the blk-core's onstack plugging isn't workable for fixing this deadlock and we're back to having to seriously consider this (with its additional hook in the scheduler): Mike
Re: [PATCH v2] block: flush queued bios when the process blocks
Hi Mike, [auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore] config: x86_64-lkp (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): In file included from include/linux/linkage.h:4:0, from include/linux/fs.h:4, from include/linux/highmem.h:4, from include/linux/bio.h:23, from drivers/md/bcache/bcache.h:181, from drivers/md/bcache/btree.c:23: drivers/md/bcache/btree.c: In function '__bch_btree_node_write': >> drivers/md/bcache/btree.c:454:16: error: 'struct task_struct' has no member >> named 'bio_list' BUG_ON(current->bio_list); ^ include/linux/compiler.h:166:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/md/bcache/btree.c:454:2: note: in expansion of macro 'BUG_ON' BUG_ON(current->bio_list); ^ drivers/md/bcache/btree.c: In function 'bch_btree_leaf_dirty': drivers/md/bcache/btree.c:548:14: error: 'struct task_struct' has no member named 'bio_list' !current->bio_list) ^ In file included from include/linux/linkage.h:4:0, from include/linux/fs.h:4, from include/linux/highmem.h:4, from include/linux/bio.h:23, from drivers/md/bcache/bcache.h:181, from drivers/md/bcache/btree.c:23: drivers/md/bcache/btree.c: In function 'mca_alloc': drivers/md/bcache/btree.c:893:16: error: 'struct task_struct' has no member named 'bio_list' BUG_ON(current->bio_list); ^ include/linux/compiler.h:166:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/md/bcache/btree.c:893:2: note: in expansion of macro 'BUG_ON' BUG_ON(current->bio_list); ^ drivers/md/bcache/btree.c: In function 'bch_btree_node_get': drivers/md/bcache/btree.c:980:14: error: 'struct task_struct' has no member named 'bio_list' if (current->bio_list) ^ drivers/md/bcache/btree.c: In function 'bch_btree_insert_node': drivers/md/bcache/btree.c:2131:13: error: 'struct task_struct' has no member named 'bio_list' if 
(current->bio_list) { ^ In file included from include/linux/linkage.h:4:0, from include/linux/fs.h:4, from include/linux/highmem.h:4, from include/linux/bio.h:23, from drivers/md/bcache/bcache.h:181, from drivers/md/bcache/btree.c:23: drivers/md/bcache/btree.c: In function 'bch_btree_insert': drivers/md/bcache/btree.c:2211:16: error: 'struct task_struct' has no member named 'bio_list' BUG_ON(current->bio_list); ^ include/linux/compiler.h:166:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/md/bcache/btree.c:2211:2: note: in expansion of macro 'BUG_ON' BUG_ON(current->bio_list); ^ drivers/md/bcache/btree.c: In function 'bch_btree_insert_node': drivers/md/bcache/btree.c:2147:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ vim +454 drivers/md/bcache/btree.c ee811287 Kent Overstreet 2013-12-17 448 struct bset *i = btree_bset_last(b); cafe5635 Kent Overstreet 2013-03-23 449 2a285686 Kent Overstreet 2014-03-04 450 lockdep_assert_held(&b->write_lock); 2a285686 Kent Overstreet 2014-03-04 451 c37511b8 Kent Overstreet 2013-04-26 452 trace_bcache_btree_write(b); c37511b8 Kent Overstreet 2013-04-26 453 cafe5635 Kent Overstreet 2013-03-23 @454 BUG_ON(current->bio_list); 57943511 Kent Overstreet 2013-04-25 455 BUG_ON(b->written >= btree_blocks(b)); 57943511 Kent Overstreet 2013-04-25 456 BUG_ON(b->written && !i->keys); ee811287 Kent Overstreet 2013-12-17 457 BUG_ON(btree_bset_first(b)->seq != i->seq); :: The code at line 454 was first introduced by commit :: cafe563591446cf80bfbc2fe3bc72a2e36cf1060 bcache: A block layer cache :: TO: Kent Overstreet :: CC: Kent Overstreet --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH v2] block: flush queued bios when the process blocks
On Thu, Oct 08 2015 at 11:04am -0400, Mikulas Patocka wrote: > > > On Tue, 6 Oct 2015, Mike Snitzer wrote: > > > To give others context for why I'm caring about this issue again, this > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix: > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650 > > > > FYI, I cleaned up the plug-based approach a bit further, here is the > > incremental patch: > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6 > > > > And here is a new version of the overall combined patch (sharing now > > before I transition to looking at alternatives, though my gut is the use > > of a plug in generic_make_request really wouldn't hurt us.. famous last > > words): > > > > block/bio.c| 82 > > +- > > block/blk-core.c | 21 - > > drivers/md/dm-bufio.c | 2 +- > > drivers/md/raid1.c | 6 ++-- > > drivers/md/raid10.c| 6 ++-- > > include/linux/blkdev.h | 11 +-- > > include/linux/sched.h | 4 --- > > 7 files changed, 51 insertions(+), 81 deletions(-) > > ... > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c > > index 2dd3308..c2bff16 100644 > > --- a/drivers/md/dm-bufio.c > > +++ b/drivers/md/dm-bufio.c > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct > > dm_bufio_client *c) > > #define DM_BUFIO_CACHE(c) (dm_bufio_caches[dm_bufio_cache_index(c)]) > > #define DM_BUFIO_CACHE_NAME(c) > > (dm_bufio_cache_names[dm_bufio_cache_index(c)]) > > > > -#define dm_bufio_in_request() (!!current->bio_list) > > +#define dm_bufio_in_request() (current->plug && > > !!current->plug->bio_list) > > This condition is repeated several times throughout the whole patch - so > maybe you should make it a function in block device header file. Yeah, I thought of that too but forgot to come back to it. Will do, thanks. FYI, I found another bug in my last patch and fixed it up. I'll get some refactoring done (including your suggestion), actually _test_ the code (e.g. 
verify all of lvm testsuite passes) and then send out v3. Mike
Re: [PATCH v2] block: flush queued bios when the process blocks
On Tue, 6 Oct 2015, Mike Snitzer wrote: > To give others context for why I'm caring about this issue again, this > recent BZ against 4.3-rc served as a reminder that we _need_ a fix: > https://bugzilla.redhat.com/show_bug.cgi?id=1267650 > > FYI, I cleaned up the plug-based approach a bit further, here is the > incremental patch: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6 > > And here is a new version of the overall combined patch (sharing now > before I transition to looking at alternatives, though my gut is the use > of a plug in generic_make_request really wouldn't hurt us.. famous last > words): > > block/bio.c| 82 > +- > block/blk-core.c | 21 - > drivers/md/dm-bufio.c | 2 +- > drivers/md/raid1.c | 6 ++-- > drivers/md/raid10.c| 6 ++-- > include/linux/blkdev.h | 11 +-- > include/linux/sched.h | 4 --- > 7 files changed, 51 insertions(+), 81 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index ad3f276..3d03668 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work) > } > } > > -static void punt_bios_to_rescuer(struct bio_set *bs) > +/** > + * blk_flush_bio_list > + * @plug: the blk_plug that may have collected bios > + * > + * Pop bios queued on plug->bio_list and submit each of them to > + * their rescue workqueue. > + * > + * If the bio doesn't have a bio_set, we use the default fs_bio_set. > + * However, stacking drivers should use bio_set, so this shouldn't be > + * an issue. 
> + */ > +void blk_flush_bio_list(struct blk_plug *plug) > { > - struct bio_list punt, nopunt; > struct bio *bio; > > - /* > - * In order to guarantee forward progress we must punt only bios that > - * were allocated from this bio_set; otherwise, if there was a bio on > - * there for a stacking driver higher up in the stack, processing it > - * could require allocating bios from this bio_set, and doing that from > - * our own rescuer would be bad. > - * > - * Since bio lists are singly linked, pop them all instead of trying to > - * remove from the middle of the list: > - */ > - > - bio_list_init(&punt); > - bio_list_init(&nopunt); > - > - while ((bio = bio_list_pop(current->bio_list))) > - bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio); > - > - *current->bio_list = nopunt; > - > - spin_lock(&bs->rescue_lock); > - bio_list_merge(&bs->rescue_list, &punt); > - spin_unlock(&bs->rescue_lock); > + while ((bio = bio_list_pop(&plug->bio_list))) { > + struct bio_set *bs = bio->bi_pool; > + if (!bs) > + bs = fs_bio_set; > > - queue_work(bs->rescue_workqueue, &bs->rescue_work); > + spin_lock(&bs->rescue_lock); > + bio_list_add(&bs->rescue_list, bio); > + queue_work(bs->rescue_workqueue, &bs->rescue_work); > + spin_unlock(&bs->rescue_lock); > + } > } > > /** > @@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs) > */ > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set > *bs) > { > - gfp_t saved_gfp = gfp_mask; > unsigned front_pad; > unsigned inline_vecs; > unsigned long idx = BIO_POOL_NONE; > @@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int > nr_iovecs, struct bio_set *bs) > /* should not use nobvec bioset for nr_iovecs > 0 */ > if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0)) > return NULL; > - /* > - * generic_make_request() converts recursion to iteration; this > - * means if we're running beneath it, any bios we allocate and > - * submit will not be submitted (and thus freed) until after we > - * return. 
> - * > - * This exposes us to a potential deadlock if we allocate > - * multiple bios from the same bio_set() while running > - * underneath generic_make_request(). If we were to allocate > - * multiple bios (say a stacking block driver that was splitting > - * bios), we would deadlock if we exhausted the mempool's > - * reserve. > - * > - * We solve this, and guarantee forward progress, with a rescuer > - * workqueue per bio_set. If we go to allocate and there are > - * bios on current->bio_list, we first try the allocation > - * without __GFP_WAIT; if that fails, we punt those bios we > - * would be blocking to the rescuer workqueue before we retry > - * with the original gfp_flags. > - */ > - > - if (current->bio_list && !bio_list_empty(current->bio_list)) > - gfp_mask &=
Re: [PATCH v2] block: flush queued bios when the process blocks
On Thu, Oct 08 2015 at 11:04am -0400,
Mikulas Patocka wrote:

>
> On Tue, 6 Oct 2015, Mike Snitzer wrote:
>
> > To give others context for why I'm caring about this issue again, this
> > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> >
> > FYI, I cleaned up the plug-based approach a bit further, here is the
> > incremental patch:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> >
> > And here is a new version of the overall combined patch (sharing now
> > before I transition to looking at alternatives, though my gut is the use
> > of a plug in generic_make_request really wouldn't hurt us.. famous last
> > words):
> >
> >  block/bio.c            | 82 +-
> >  block/blk-core.c       | 21 -
> >  drivers/md/dm-bufio.c  |  2 +-
> >  drivers/md/raid1.c     |  6 ++--
> >  drivers/md/raid10.c    |  6 ++--
> >  include/linux/blkdev.h | 11 +--
> >  include/linux/sched.h  |  4 ---
> >  7 files changed, 51 insertions(+), 81 deletions(-)
> >
> > ...
> >
> > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > index 2dd3308..c2bff16 100644
> > --- a/drivers/md/dm-bufio.c
> > +++ b/drivers/md/dm-bufio.c
> > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct dm_bufio_client *c)
> >  #define DM_BUFIO_CACHE(c)	(dm_bufio_caches[dm_bufio_cache_index(c)])
> >  #define DM_BUFIO_CACHE_NAME(c)	(dm_bufio_cache_names[dm_bufio_cache_index(c)])
> >
> > -#define dm_bufio_in_request()	(!!current->bio_list)
> > +#define dm_bufio_in_request()	(current->plug && !!current->plug->bio_list)
>
> This condition is repeated several times throughout the whole patch - so
> maybe you should make it a function in block device header file.

Yeah, I thought of that too but forgot to come back to it.  Will do, thanks.

FYI, I found another bug in my last patch and fixed it up.  I'll get some
refactoring done (including your suggestion), actually _test_ the code
(e.g. verify all of lvm testsuite passes) and then send out v3.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH v2] block: flush queued bios when the process blocks
On Tue, 6 Oct 2015, Mike Snitzer wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
>
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
>
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
>
>  block/bio.c            | 82 +-
>  block/blk-core.c       | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c     |  6 ++--
>  drivers/md/raid10.c    |  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>  	}
>  }
>
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> -	struct bio_list punt, nopunt;
>  	struct bio *bio;
>
> -	/*
> -	 * In order to guarantee forward progress we must punt only bios that
> -	 * were allocated from this bio_set; otherwise, if there was a bio on
> -	 * there for a stacking driver higher up in the stack, processing it
> -	 * could require allocating bios from this bio_set, and doing that from
> -	 * our own rescuer would be bad.
> -	 *
> -	 * Since bio lists are singly linked, pop them all instead of trying to
> -	 * remove from the middle of the list:
> -	 */
> -
> -	bio_list_init(&punt);
> -	bio_list_init(&nopunt);
> -
> -	while ((bio = bio_list_pop(current->bio_list)))
> -		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> -
> -	*current->bio_list = nopunt;
> -
> -	spin_lock(&bs->rescue_lock);
> -	bio_list_merge(&bs->rescue_list, &punt);
> -	spin_unlock(&bs->rescue_lock);
> +	while ((bio = bio_list_pop(&plug->bio_list))) {
> +		struct bio_set *bs = bio->bi_pool;
> +		if (!bs)
> +			bs = fs_bio_set;
>
> -	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +		spin_lock(&bs->rescue_lock);
> +		bio_list_add(&bs->rescue_list, bio);
> +		queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +		spin_unlock(&bs->rescue_lock);
> +	}
>  }
>
>  /**
> @@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  {
> -	gfp_t saved_gfp = gfp_mask;
>  	unsigned front_pad;
>  	unsigned inline_vecs;
>  	unsigned long idx = BIO_POOL_NONE;
> @@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  	/* should not use nobvec bioset for nr_iovecs > 0 */
>  	if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
>  		return NULL;
> -	/*
> -	 * generic_make_request() converts recursion to iteration; this
> -	 * means if we're running beneath it, any bios we allocate and
> -	 * submit will not be submitted (and thus freed) until after we
> -	 * return.
> -	 *
> -	 * This exposes us to a potential deadlock if we allocate
> -	 * multiple bios from the same bio_set() while running
> -	 * underneath generic_make_request(). If we were to allocate
> -	 * multiple bios (say a stacking block driver that was splitting
> -	 * bios), we would deadlock if we exhausted the mempool's
> -	 * reserve.
> -	 *
> -	 * We solve this, and guarantee forward progress, with a rescuer
> -	 * workqueue per bio_set. If we go to allocate and there are
> -	 * bios on current->bio_list, we first try the allocation
> -	 * without __GFP_WAIT; if that fails, we punt those bios we
> -	 * would be blocking to the rescuer workqueue before we retry
> -	 * with the original gfp_flags.
> -	 */
> -
> -	if (current->bio_list && !bio_list_empty(current->bio_list))
> -		gfp_mask &=
Re: [PATCH v2] block: flush queued bios when the process blocks
On Tue, Oct 06 2015 at 4:16pm -0400,
Mike Snitzer wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
>
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
>
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
>
>  block/bio.c            | 82 +-
>  block/blk-core.c       | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c     |  6 ++--
>  drivers/md/raid10.c    |  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>  	}
>  }
>
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> -	struct bio_list punt, nopunt;
>  	struct bio *bio;
>
> -	/*
> -	 * In order to guarantee forward progress we must punt only bios that
> -	 * were allocated from this bio_set; otherwise, if there was a bio on
> -	 * there for a stacking driver higher up in the stack, processing it
> -	 * could require allocating bios from this bio_set, and doing that from
> -	 * our own rescuer would be bad.
> -	 *
> -	 * Since bio lists are singly linked, pop them all instead of trying to
> -	 * remove from the middle of the list:
> -	 */
> -
> -	bio_list_init(&punt);
> -	bio_list_init(&nopunt);
> -
> -	while ((bio = bio_list_pop(current->bio_list)))
> -		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> -
> -	*current->bio_list = nopunt;
> -
> -	spin_lock(&bs->rescue_lock);
> -	bio_list_merge(&bs->rescue_list, &punt);
> -	spin_unlock(&bs->rescue_lock);
> +	while ((bio = bio_list_pop(&plug->bio_list))) {

Bleh, should be plug->bio_list.. obviously I didn't compile test this...

Here is the incremental I've folded in:

diff --git a/block/bio.c b/block/bio.c
index 3d03668..b868b9e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -369,7 +369,10 @@ void blk_flush_bio_list(struct blk_plug *plug)
 {
 	struct bio *bio;
 
-	while ((bio = bio_list_pop(&plug->bio_list))) {
+	if (!plug->bio_list)
+		return;
+
+	while ((bio = bio_list_pop(plug->bio_list))) {
 		struct bio_set *bs = bio->bi_pool;
 		if (!bs)
 			bs = fs_bio_set;
[PATCH v2] block: flush queued bios when the process blocks
To give others context for why I'm caring about this issue again, this
recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1267650

FYI, I cleaned up the plug-based approach a bit further, here is the
incremental patch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6

And here is a new version of the overall combined patch (sharing now
before I transition to looking at alternatives, though my gut is the use
of a plug in generic_make_request really wouldn't hurt us.. famous last
words):

 block/bio.c            | 82 +-
 block/blk-core.c       | 21 -
 drivers/md/dm-bufio.c  |  2 +-
 drivers/md/raid1.c     |  6 ++--
 drivers/md/raid10.c    |  6 ++--
 include/linux/blkdev.h | 11 +--
 include/linux/sched.h  |  4 ---
 7 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ad3f276..3d03668 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
 	}
 }
 
-static void punt_bios_to_rescuer(struct bio_set *bs)
+/**
+ * blk_flush_bio_list
+ * @plug: the blk_plug that may have collected bios
+ *
+ * Pop bios queued on plug->bio_list and submit each of them to
+ * their rescue workqueue.
+ *
+ * If the bio doesn't have a bio_set, we use the default fs_bio_set.
+ * However, stacking drivers should use bio_set, so this shouldn't be
+ * an issue.
+ */
+void blk_flush_bio_list(struct blk_plug *plug)
 {
-	struct bio_list punt, nopunt;
 	struct bio *bio;
 
-	/*
-	 * In order to guarantee forward progress we must punt only bios that
-	 * were allocated from this bio_set; otherwise, if there was a bio on
-	 * there for a stacking driver higher up in the stack, processing it
-	 * could require allocating bios from this bio_set, and doing that from
-	 * our own rescuer would be bad.
-	 *
-	 * Since bio lists are singly linked, pop them all instead of trying to
-	 * remove from the middle of the list:
-	 */
-
-	bio_list_init(&punt);
-	bio_list_init(&nopunt);
-
-	while ((bio = bio_list_pop(current->bio_list)))
-		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
-
-	*current->bio_list = nopunt;
-
-	spin_lock(&bs->rescue_lock);
-	bio_list_merge(&bs->rescue_list, &punt);
-	spin_unlock(&bs->rescue_lock);
+	while ((bio = bio_list_pop(&plug->bio_list))) {
+		struct bio_set *bs = bio->bi_pool;
+		if (!bs)
+			bs = fs_bio_set;
 
-	queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_lock(&bs->rescue_lock);
+		bio_list_add(&bs->rescue_list, bio);
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_unlock(&bs->rescue_lock);
+	}
 }
 
 /**
@@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-	gfp_t saved_gfp = gfp_mask;
 	unsigned front_pad;
 	unsigned inline_vecs;
 	unsigned long idx = BIO_POOL_NONE;
@@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 	/* should not use nobvec bioset for nr_iovecs > 0 */
 	if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
 		return NULL;
-	/*
-	 * generic_make_request() converts recursion to iteration; this
-	 * means if we're running beneath it, any bios we allocate and
-	 * submit will not be submitted (and thus freed) until after we
-	 * return.
-	 *
-	 * This exposes us to a potential deadlock if we allocate
-	 * multiple bios from the same bio_set() while running
-	 * underneath generic_make_request(). If we were to allocate
-	 * multiple bios (say a stacking block driver that was splitting
-	 * bios), we would deadlock if we exhausted the mempool's
-	 * reserve.
-	 *
-	 * We solve this, and guarantee forward progress, with a rescuer
-	 * workqueue per bio_set. If we go to allocate and there are
-	 * bios on current->bio_list, we first try the allocation
-	 * without __GFP_WAIT; if that fails, we punt those bios we
-	 * would be blocking to the rescuer workqueue before we retry
-	 * with the original gfp_flags.
-	 */
-
-	if (current->bio_list && !bio_list_empty(current->bio_list))
-		gfp_mask &= ~__GFP_WAIT;
 
 	p = mempool_alloc(bs->bio_pool, gfp_mask);
-	if (!p && gfp_mask != saved_gfp) {
-		punt_bios_to_rescuer(bs);