Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-17 Thread Ming Lei
On Fri, Oct 16, 2015 at 11:29 PM, Mike Snitzer  wrote:
> On Thu, Oct 15 2015 at 11:08pm -0400,
> Ming Lei  wrote:
>
>> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
>> > On Wed, Oct 14 2015 at 11:27pm -0400,
>> > Ming Lei  wrote:
>> >
>> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
>> >> >
>> >> > Turns out that this change:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >> >
>> >> > needed to be reverted with:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >> >
>> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> >> > within generic_make_request because XFS already added an outermost
>> >> > plug).
>> >>
>> >> Looks you should have defined bio_list in plug as
>> >>
>> >>'struct bio_list bio_list'
>> >>
>> >> instead of one pointer.
>> >
>> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
>> > above that does exactly that).  That wasn't the problem.
>>
>> OK.
>>
>> >
>> >> >
>> >> > But even after fixing that I then hit issues with these changes now
>> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> >> > lazily once the outermost plug completes blk_finish_plug.  manifested as
>> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>> >>
>> >> Looks this problem should be related with above 'bio_list' definition too.
>> >
>> > No, as I explained it was due to the nested plug:
>> >
>> >> >
>> >> > Basically using the blk-core's onstack plugging isn't workable for
>> >> > fixing this deadlock and we're back to having to seriously consider
>> >> > this (with its additional hook in the scheduler)
>> >
>> > To elaborate, for the code in DM (and other subsystems like bcache) that
>> > rely on accurate accounting of whether we're actively _in_
>> > generic_make_request: using plug to store/manage the bio_list isn't
>>
>> That looks an interesting requirement, which means DM just need to know
>> if the current callsite is from generic_make_request(), so what you need
>> is just one per-task variable.
>>
>> With the stack variable of 'plug', it should be easier to do that for DM, for
>> example, you can introduce one flag in 'struct blk_plug', then set it in
>> the entry of generic_make_request(), and clear it in the exit of the
>> function.
>
> Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
> generic_make_request() but then it just calls into question why the heck
> we're using the plug to begin with? (especially given plugging is for
> request-based devices at this point!).
>
> It really doesn't make _any_ sense to overload blk_plug by moving the
> bio_list into there and adding a 'in_generic_make_request'... when you
> consider the _only_ reason this was suggested is to (ab)use the existing
> hook in scheduler/core.c.

At the first glance, I mean it is doable to use blk_plug for the issue.

>From last year's discussion, looks Jens thought we have plug already which
should have covered this case, also Kent wanted to implement
plug for bio too.

>
> So I stand by my position that there is really no point in the exercise
> and that it actually hurts the code to try to make this a blk_plug
> "feature".
>
> We already have well established current->bio_list semantics that can be
> reused as a flag given it is a pointer.  The block callout in the
> scheduler is going to grow a conditional either way.  What I've proposed
> _seems_ the cleanest to me and others.  Hopefully you can see that
> aspect of things.
>
> So if you could review the v3 patch with a critical eye that'd be very
> much appreciated.

Will do.

>
> But I do look forward to Jens also having a look at this and providing
> his review feedback.
>
>> > workable because nested plugs change the lifetime of when the bio_list
>> > is processed (as I implemented it -- which was to respect nested plugs).
>> > I could've forced the issue by making the bio_list get processed
>> > regardless of nesting but that would've made the onstack plugging much
>> > more convoluted (duality between nested vs not just for bio_list's
>> > benefit and for what gain?  Simply to avoid an extra conditional
>> > immediately in the scheduler?  That conditional was still added anyway
>> > but just as part of blk_needs_flush_plug so in the end there wasn't any
>> > benefit!).
>> >
>> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
>> > my position that (ab)using blk_plug for the bio_list management is
>> > _really_ awkward. ;)
>>
>> Hope it wan't my reply to cause the break of your sleep, :-)
>
> No, my dog woke me up to go outside at 4am.. I was up and couldn't
> resist looking at my phone.. the rest is history ;)



-- 
Ming Lei
--
To unsubscribe from 

Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-17 Thread Ming Lei
On Fri, Oct 16, 2015 at 11:29 PM, Mike Snitzer  wrote:
> On Thu, Oct 15 2015 at 11:08pm -0400,
> Ming Lei  wrote:
>
>> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
>> > On Wed, Oct 14 2015 at 11:27pm -0400,
>> > Ming Lei  wrote:
>> >
>> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
>> >> >
>> >> > Turns out that this change:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >> >
>> >> > needed to be reverted with:
>> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >> >
>> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> >> > within generic_make_request because XFS already added an outermost
>> >> > plug).
>> >>
>> >> Looks you should have defined bio_list in plug as
>> >>
>> >>'struct bio_list bio_list'
>> >>
>> >> instead of one pointer.
>> >
>> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
>> > above that does exactly that).  That wasn't the problem.
>>
>> OK.
>>
>> >
>> >> >
>> >> > But even after fixing that I then hit issues with these changes now
>> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> >> > lazily once the outermost plug completes blk_finish_plug.  manifested as
>> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>> >>
>> >> Looks this problem should be related with above 'bio_list' definition too.
>> >
>> > No, as I explained it was due to the nested plug:
>> >
>> >> >
>> >> > Basically using the blk-core's onstack plugging isn't workable for
>> >> > fixing this deadlock and we're back to having to seriously consider
>> >> > this (with its additional hook in the scheduler)
>> >
>> > To elaborate, for the code in DM (and other subsystems like bcache) that
>> > rely on accurate accounting of whether we're actively _in_
>> > generic_make_request: using plug to store/manage the bio_list isn't
>>
>> That looks an interesting requirement, which means DM just need to know
>> if the current callsite is from generic_make_request(), so what you need
>> is just one per-task variable.
>>
>> With the stack variable of 'plug', it should be easier to do that for DM, for
>> example, you can introduce one flag in 'struct blk_plug', then set it in
>> the entry of generic_make_request(), and clear it in the exit of the
>> function.
>
> Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
> generic_make_request() but then it just calls into question why the heck
> we're using the plug to begin with? (especially given plugging is for
> request-based devices at this point!).
>
> It really doesn't make _any_ sense to overload blk_plug by moving the
> bio_list into there and adding a 'in_generic_make_request'... when you
> consider the _only_ reason this was suggested is to (ab)use the existing
> hook in scheduler/core.c.

At the first glance, I mean it is doable to use blk_plug for the issue.

>From last year's discussion, looks Jens thought we have plug already which
should have covered this case, also Kent wanted to implement
plug for bio too.

>
> So I stand by my position that there is really no point in the exercise
> and that it actually hurts the code to try to make this a blk_plug
> "feature".
>
> We already have well established current->bio_list semantics that can be
> reused as a flag given it is a pointer.  The block callout in the
> scheduler is going to grow a conditional either way.  What I've proposed
> _seems_ the cleanest to me and others.  Hopefully you can see that
> aspect of things.
>
> So if you could review the v3 patch with a critical eye that'd be very
> much appreciated.

Will do.

>
> But I do look forward to Jens also having a look at this and providing
> his review feedback.
>
>> > workable because nested plugs change the lifetime of when the bio_list
>> > is processed (as I implemented it -- which was to respect nested plugs).
>> > I could've forced the issue by making the bio_list get processed
>> > regardless of nesting but that would've made the onstack plugging much
>> > more convoluted (duality between nested vs not just for bio_list's
>> > benefit and for what gain?  Simply to avoid an extra conditional
>> > immediately in the scheduler?  That conditional was still added anyway
>> > but just as part of blk_needs_flush_plug so in the end there wasn't any
>> > benefit!).
>> >
>> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
>> > my position that (ab)using blk_plug for the bio_list management is
>> > _really_ awkward. ;)
>>
>> Hope it wan't my reply to cause the break of your sleep, :-)
>
> No, my dog woke me up to go outside at 4am.. I was up 

Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-16 Thread Mike Snitzer
On Thu, Oct 15 2015 at 11:08pm -0400,
Ming Lei  wrote:

> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
> > On Wed, Oct 14 2015 at 11:27pm -0400,
> > Ming Lei  wrote:
> >
> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> >> >
> >> > Turns out that this change:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >> >
> >> > needed to be reverted with:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >> >
> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> >> > within generic_make_request because XFS already added an outermost
> >> > plug).
> >>
> >> Looks you should have defined bio_list in plug as
> >>
> >>'struct bio_list bio_list'
> >>
> >> instead of one pointer.
> >
> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> > above that does exactly that).  That wasn't the problem.
> 
> OK.
> 
> >
> >> >
> >> > But even after fixing that I then hit issues with these changes now
> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
> >> > lazily once the outermost plug completes blk_finish_plug.  manifested as
> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> >>
> >> Looks this problem should be related with above 'bio_list' definition too.
> >
> > No, as I explained it was due to the nested plug:
> >
> >> >
> >> > Basically using the blk-core's onstack plugging isn't workable for
> >> > fixing this deadlock and we're back to having to seriously consider
> >> > this (with its additional hook in the scheduler)
> >
> > To elaborate, for the code in DM (and other subsystems like bcache) that
> > rely on accurate accounting of whether we're actively _in_
> > generic_make_request: using plug to store/manage the bio_list isn't
> 
> That looks an interesting requirement, which means DM just need to know
> if the current callsite is from generic_make_request(), so what you need
> is just one per-task variable.
> 
> With the stack variable of 'plug', it should be easier to do that for DM, for
> example, you can introduce one flag in 'struct blk_plug', then set it in
> the entry of generic_make_request(), and clear it in the exit of the
> function.

Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
generic_make_request() but then it just calls into question why the heck
we're using the plug to begin with? (especially given plugging is for
request-based devices at this point!).

It really doesn't make _any_ sense to overload blk_plug by moving the
bio_list into there and adding a 'in_generic_make_request'... when you
consider the _only_ reason this was suggested is to (ab)use the existing
hook in scheduler/core.c.

So I stand by my position that there is really no point in the exercise
and that it actually hurts the code to try to make this a blk_plug
"feature".

We already have well established current->bio_list semantics that can be
reused as a flag given it is a pointer.  The block callout in the
scheduler is going to grow a conditional either way.  What I've proposed
_seems_ the cleanest to me and others.  Hopefully you can see that
aspect of things.

So if you could review the v3 patch with a critical eye that'd be very
much appreciated.

But I do look forward to Jens also having a look at this and providing
his review feedback.

> > workable because nested plugs change the lifetime of when the bio_list
> > is processed (as I implemented it -- which was to respect nested plugs).
> > I could've forced the issue by making the bio_list get processed
> > regardless of nesting but that would've made the onstack plugging much
> > more convoluted (duality between nested vs not just for bio_list's
> > benefit and for what gain?  Simply to avoid an extra conditional
> > immediately in the scheduler?  That conditional was still added anyway
> > but just as part of blk_needs_flush_plug so in the end there wasn't any
> > benefit!).
> >
> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
> > my position that (ab)using blk_plug for the bio_list management is
> > _really_ awkward. ;)
> 
> Hope it wan't my reply to cause the break of your sleep, :-)

No, my dog woke me up to go outside at 4am.. I was up and couldn't
resist looking at my phone.. the rest is history ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-16 Thread Mike Snitzer
On Thu, Oct 15 2015 at 11:08pm -0400,
Ming Lei  wrote:

> On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
> > On Wed, Oct 14 2015 at 11:27pm -0400,
> > Ming Lei  wrote:
> >
> >> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> >> >
> >> > Turns out that this change:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >> >
> >> > needed to be reverted with:
> >> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >> >
> >> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> >> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> >> > within generic_make_request because XFS already added an outermost
> >> > plug).
> >>
> >> Looks you should have defined bio_list in plug as
> >>
> >>'struct bio_list bio_list'
> >>
> >> instead of one pointer.
> >
> > I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> > above that does exactly that).  That wasn't the problem.
> 
> OK.
> 
> >
> >> >
> >> > But even after fixing that I then hit issues with these changes now
> >> > resulting in imperfect 'in_generic_make_request' accounting that happens
> >> > lazily once the outermost plug completes blk_finish_plug.  manifested as
> >> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> >>
> >> Looks this problem should be related with above 'bio_list' definition too.
> >
> > No, as I explained it was due to the nested plug:
> >
> >> >
> >> > Basically using the blk-core's onstack plugging isn't workable for
> >> > fixing this deadlock and we're back to having to seriously consider
> >> > this (with its additional hook in the scheduler)
> >
> > To elaborate, for the code in DM (and other subsystems like bcache) that
> > rely on accurate accounting of whether we're actively _in_
> > generic_make_request: using plug to store/manage the bio_list isn't
> 
> That looks an interesting requirement, which means DM just need to know
> if the current callsite is from generic_make_request(), so what you need
> is just one per-task variable.
> 
> With the stack variable of 'plug', it should be easier to do that for DM, for
> example, you can introduce one flag in 'struct blk_plug', then set it in
> the entry of generic_make_request(), and clear it in the exit of the
> function.

Yes, I mean we _could_ set/clear the 'in_generic_make_request' flag _in_
generic_make_request() but then it just calls into question why the heck
we're using the plug to begin with? (especially given plugging is for
request-based devices at this point!).

It really doesn't make _any_ sense to overload blk_plug by moving the
bio_list into there and adding a 'in_generic_make_request'... when you
consider the _only_ reason this was suggested is to (ab)use the existing
hook in scheduler/core.c.

So I stand by my position that there is really no point in the exercise
and that it actually hurts the code to try to make this a blk_plug
"feature".

We already have well established current->bio_list semantics that can be
reused as a flag given it is a pointer.  The block callout in the
scheduler is going to grow a conditional either way.  What I've proposed
_seems_ the cleanest to me and others.  Hopefully you can see that
aspect of things.

So if you could review the v3 patch with a critical eye that'd be very
much appreciated.

But I do look forward to Jens also having a look at this and providing
his review feedback.

> > workable because nested plugs change the lifetime of when the bio_list
> > is processed (as I implemented it -- which was to respect nested plugs).
> > I could've forced the issue by making the bio_list get processed
> > regardless of nesting but that would've made the onstack plugging much
> > more convoluted (duality between nested vs not just for bio_list's
> > benefit and for what gain?  Simply to avoid an extra conditional
> > immediately in the scheduler?  That conditional was still added anyway
> > but just as part of blk_needs_flush_plug so in the end there wasn't any
> > benefit!).
> >
> > Hopefully my middle-of-the-night reply is coherent and helped to clarify
> > my position that (ab)using blk_plug for the bio_list management is
> > _really_ awkward. ;)
> 
> Hope it wan't my reply to cause the break of your sleep, :-)

No, my dog woke me up to go outside at 4am.. I was up and couldn't
resist looking at my phone.. the rest is history ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-15 Thread Ming Lei
On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
> On Wed, Oct 14 2015 at 11:27pm -0400,
> Ming Lei  wrote:
>
>> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
>> >
>> > Turns out that this change:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >
>> > needed to be reverted with:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >
>> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> > within generic_make_request because XFS already added an outermost
>> > plug).
>>
>> Looks you should have defined bio_list in plug as
>>
>>'struct bio_list bio_list'
>>
>> instead of one pointer.
>
> I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> above that does exactly that).  That wasn't the problem.

OK.

>
>> >
>> > But even after fixing that I then hit issues with these changes now
>> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> > lazily once the outermost plug completes blk_finish_plug.  manifested as
>> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>>
>> Looks this problem should be related with above 'bio_list' definition too.
>
> No, as I explained it was due to the nested plug:
>
>> >
>> > Basically using the blk-core's onstack plugging isn't workable for
>> > fixing this deadlock and we're back to having to seriously consider
>> > this (with its additional hook in the scheduler)
>
> To elaborate, for the code in DM (and other subsystems like bcache) that
> rely on accurate accounting of whether we're actively _in_
> generic_make_request: using plug to store/manage the bio_list isn't

That looks an interesting requirement, which means DM just need to know
if the current callsite is from generic_make_request(), so what you need
is just one per-task variable.

With the stack variable of 'plug', it should be easier to do that for DM, for
example, you can introduce one flag in 'struct blk_plug', then set it in
the entry of generic_make_request(), and clear it in the exit of the
function.

> workable because nested plugs change the lifetime of when the bio_list
> is processed (as I implemented it -- which was to respect nested plugs).
> I could've forced the issue by making the bio_list get processed
> regardless of nesting but that would've made the onstack plugging much
> more convoluted (duality between nested vs not just for bio_list's
> benefit and for what gain?  Simply to avoid an extra conditional
> immediately in the scheduler?  That conditional was still added anyway
> but just as part of blk_needs_flush_plug so in the end there wasn't any
> benefit!).
>
> Hopefully my middle-of-the-night reply is coherent and helped to clarify
> my position that (ab)using blk_plug for the bio_list management is
> _really_ awkward. ;)

Hope it wan't my reply to cause the break of your sleep, :-)

Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-15 Thread Mike Snitzer
On Wed, Oct 14 2015 at 11:27pm -0400,
Ming Lei  wrote:

> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> >
> > Turns out that this change:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >
> > needed to be reverted with:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >
> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> > within generic_make_request because XFS already added an outermost
> > plug).
> 
> Looks you should have defined bio_list in plug as
> 
>'struct bio_list bio_list'
> 
> instead of one pointer.

I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
above that does exactly that).  That wasn't the problem.

> >
> > But even after fixing that I then hit issues with these changes now
> > resulting in imperfect 'in_generic_make_request' accounting that happens
> > lazily once the outermost plug completes blk_finish_plug.  manifested as
> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> 
> Looks this problem should be related with above 'bio_list' definition too.

No, as I explained it was due to the nested plug:

> >
> > Basically using the blk-core's onstack plugging isn't workable for
> > fixing this deadlock and we're back to having to seriously consider
> > this (with its additional hook in the scheduler)

To elaborate, for the code in DM (and other subsystems like bcache) that
rely on accurate accounting of whether we're actively _in_
generic_make_request: using plug to store/manage the bio_list isn't
workable because nested plugs change the lifetime of when the bio_list
is processed (as I implemented it -- which was to respect nested plugs).
I could've forced the issue by making the bio_list get processed
regardless of nesting but that would've made the onstack plugging much
more convoluted (duality between nested vs not just for bio_list's
benefit and for what gain?  Simply to avoid an extra conditional
immediately in the scheduler?  That conditional was still added anyway
but just as part of blk_needs_flush_plug so in the end there wasn't any
benefit!).

Hopefully my middle-of-the-night reply is coherent and helped to clarify
my position that (ab)using blk_plug for the bio_list management is
_really_ awkward. ;)

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-15 Thread Mike Snitzer
On Wed, Oct 14 2015 at 11:27pm -0400,
Ming Lei  wrote:

> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> >
> > Turns out that this change:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> >
> > needed to be reverted with:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> >
> > Because nested plugs caused generic_make_request()'s onstack bio_list to
> > go out of scope (blk_finish_plug() wouldn't actually flush the list
> > within generic_make_request because XFS already added an outermost
> > plug).
> 
> Looks you should have defined bio_list in plug as
> 
>'struct bio_list bio_list'
> 
> instead of one pointer.

I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
above that does exactly that).  That wasn't the problem.

> >
> > But even after fixing that I then hit issues with these changes now
> > resulting in imperfect 'in_generic_make_request' accounting that happens
> > lazily once the outermost plug completes blk_finish_plug.  manifested as
> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> 
> Looks this problem should be related with above 'bio_list' definition too.

No, as I explained it was due to the nested plug:

> >
> > Basically using the blk-core's onstack plugging isn't workable for
> > fixing this deadlock and we're back to having to seriously consider
> > this (with its additional hook in the scheduler)

To elaborate, for the code in DM (and other subsystems like bcache) that
rely on accurate accounting of whether we're actively _in_
generic_make_request: using plug to store/manage the bio_list isn't
workable because nested plugs change the lifetime of when the bio_list
is processed (as I implemented it -- which was to respect nested plugs).
I could've forced the issue by making the bio_list get processed
regardless of nesting but that would've made the onstack plugging much
more convoluted (duality between nested vs not just for bio_list's
benefit and for what gain?  Simply to avoid an extra conditional
immediately in the scheduler?  That conditional was still added anyway
but just as part of blk_needs_flush_plug so in the end there wasn't any
benefit!).

Hopefully my middle-of-the-night reply is coherent and helped to clarify
my position that (ab)using blk_plug for the bio_list management is
_really_ awkward. ;)

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-15 Thread Ming Lei
On Thu, Oct 15, 2015 at 4:06 PM, Mike Snitzer  wrote:
> On Wed, Oct 14 2015 at 11:27pm -0400,
> Ming Lei  wrote:
>
>> On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
>> >
>> > Turns out that this change:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>> >
>> > needed to be reverted with:
>> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>> >
>> > Because nested plugs caused generic_make_request()'s onstack bio_list to
>> > go out of scope (blk_finish_plug() wouldn't actually flush the list
>> > within generic_make_request because XFS already added an outermost
>> > plug).
>>
>> Looks you should have defined bio_list in plug as
>>
>>'struct bio_list bio_list'
>>
>> instead of one pointer.
>
> I realized that and fixed it (see commit ad3ccd760da7c05b90 referenced
> above that does exactly that).  That wasn't the problem.

OK.

>
>> >
>> > But even after fixing that I then hit issues with these changes now
>> > resulting in imperfect 'in_generic_make_request' accounting that happens
>> > lazily once the outermost plug completes blk_finish_plug.  manifested as
>> > dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
>>
>> Looks this problem should be related with above 'bio_list' definition too.
>
> No, as I explained it was due to the nested plug:
>
>> >
>> > Basically using the blk-core's onstack plugging isn't workable for
>> > fixing this deadlock and we're back to having to seriously consider
>> > this (with its additional hook in the scheduler)
>
> To elaborate, for the code in DM (and other subsystems like bcache) that
> rely on accurate accounting of whether we're actively _in_
> generic_make_request: using plug to store/manage the bio_list isn't

That looks an interesting requirement, which means DM just need to know
if the current callsite is from generic_make_request(), so what you need
is just one per-task variable.

With the stack variable of 'plug', it should be easier to do that for DM, for
example, you can introduce one flag in 'struct blk_plug', then set it in
the entry of generic_make_request(), and clear it in the exit of the
function.

> workable because nested plugs change the lifetime of when the bio_list
> is processed (as I implemented it -- which was to respect nested plugs).
> I could've forced the issue by making the bio_list get processed
> regardless of nesting but that would've made the onstack plugging much
> more convoluted (duality between nested vs not just for bio_list's
> benefit and for what gain?  Simply to avoid an extra conditional
> immediately in the scheduler?  That conditional was still added anyway
> but just as part of blk_needs_flush_plug so in the end there wasn't any
> benefit!).
>
> Hopefully my middle-of-the-night reply is coherent and helped to clarify
> my position that (ab)using blk_plug for the bio_list management is
> _really_ awkward. ;)

Hope it wan't my reply to cause the break of your sleep, :-)

Thanks,
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-14 Thread Ming Lei
On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> On Thu, Oct 08 2015 at 11:08am -0400,
> Mike Snitzer  wrote:
>
>> On Thu, Oct 08 2015 at 11:04am -0400,
>> Mikulas Patocka  wrote:
>>
>> >
>> >
>> > On Tue, 6 Oct 2015, Mike Snitzer wrote:
>> >
>> > > To give others context for why I'm caring about this issue again, this
>> > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
>> > >
>> > > FYI, I cleaned up the plug-based approach a bit further, here is the
>> > > incremental patch:
>> > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
>> > >
>> > > And here is a new version of the overall combined patch (sharing now
>> > > before I transition to looking at alternatives, though my gut is the use
>> > > of a plug in generic_make_request really wouldn't hurt us.. famous last
>> > > words):
>> > >
>> > >  block/bio.c| 82 
>> > > +-
>> > >  block/blk-core.c   | 21 -
>> > >  drivers/md/dm-bufio.c  |  2 +-
>> > >  drivers/md/raid1.c |  6 ++--
>> > >  drivers/md/raid10.c|  6 ++--
>> > >  include/linux/blkdev.h | 11 +--
>> > >  include/linux/sched.h  |  4 ---
>> > >  7 files changed, 51 insertions(+), 81 deletions(-)
>> > >
>> ...
>> > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>> > > index 2dd3308..c2bff16 100644
>> > > --- a/drivers/md/dm-bufio.c
>> > > +++ b/drivers/md/dm-bufio.c
>> > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
>> > > dm_bufio_client *c)
>> > >  #define DM_BUFIO_CACHE(c)
>> > > (dm_bufio_caches[dm_bufio_cache_index(c)])
>> > >  #define DM_BUFIO_CACHE_NAME(c)   
>> > > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
>> > >
>> > > -#define dm_bufio_in_request()(!!current->bio_list)
>> > > +#define dm_bufio_in_request()(current->plug && 
>> > > !!current->plug->bio_list)
>> >
>> > This condition is repeated several times throughout the whole patch - so
>> > maybe you should make it a function in block device header file.
>>
>> Yeah, I thought of that too but forgot to come back to it.  Will do,
>> thanks.
>>
>> FYI, I found another bug in my last patch and fixed it up.  I'll get
>> some refactoring done (including your suggestion), actually _test_ the
>> code (e.g. verify all of lvm testsuite passes) and then send out v3.
>
> Turns out that this change:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>
> needed to be reverted with:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>
> Because nested plugs caused generic_make_request()'s onstack bio_list to
> go out of scope (blk_finish_plug() wouldn't actually flush the list
> within generic_make_request because XFS already added an outermost
> plug).

Looks you should have defined bio_list in plug as

   'struct bio_list bio_list'

instead of one pointer.

>
> But even after fixing that I then hit issues with these changes now
> resulting in imperfect 'in_generic_make_request' accounting that happens
> lazily once the outermost plug completes blk_finish_plug.  manifested as
> dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.

Looks this problem should be related with above 'bio_list' definition too.

>
> Basically using the blk-core's onstack plugging isn't workable for
> fixing this deadlock and we're back to having to seriously consider
> this (with its additional hook in the scheduler):
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-14 Thread Ming Lei
On Sat, Oct 10, 2015 at 3:52 AM, Mike Snitzer  wrote:
> On Thu, Oct 08 2015 at 11:08am -0400,
> Mike Snitzer  wrote:
>
>> On Thu, Oct 08 2015 at 11:04am -0400,
>> Mikulas Patocka  wrote:
>>
>> >
>> >
>> > On Tue, 6 Oct 2015, Mike Snitzer wrote:
>> >
>> > > To give others context for why I'm caring about this issue again, this
>> > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
>> > >
>> > > FYI, I cleaned up the plug-based approach a bit further, here is the
>> > > incremental patch:
>> > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
>> > >
>> > > And here is a new version of the overall combined patch (sharing now
>> > > before I transition to looking at alternatives, though my gut is the use
>> > > of a plug in generic_make_request really wouldn't hurt us.. famous last
>> > > words):
>> > >
>> > >  block/bio.c| 82 
>> > > +-
>> > >  block/blk-core.c   | 21 -
>> > >  drivers/md/dm-bufio.c  |  2 +-
>> > >  drivers/md/raid1.c |  6 ++--
>> > >  drivers/md/raid10.c|  6 ++--
>> > >  include/linux/blkdev.h | 11 +--
>> > >  include/linux/sched.h  |  4 ---
>> > >  7 files changed, 51 insertions(+), 81 deletions(-)
>> > >
>> ...
>> > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>> > > index 2dd3308..c2bff16 100644
>> > > --- a/drivers/md/dm-bufio.c
>> > > +++ b/drivers/md/dm-bufio.c
>> > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
>> > > dm_bufio_client *c)
>> > >  #define DM_BUFIO_CACHE(c)
>> > > (dm_bufio_caches[dm_bufio_cache_index(c)])
>> > >  #define DM_BUFIO_CACHE_NAME(c)   
>> > > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
>> > >
>> > > -#define dm_bufio_in_request()(!!current->bio_list)
>> > > +#define dm_bufio_in_request()(current->plug && 
>> > > !!current->plug->bio_list)
>> >
>> > This condition is repeated several times throughout the whole patch - so
>> > maybe you should make it a function in block device header file.
>>
>> Yeah, I thought of that too but forgot to come back to it.  Will do,
>> thanks.
>>
>> FYI, I found another bug in my last patch and fixed it up.  I'll get
>> some refactoring done (including your suggestion), actually _test_ the
>> code (e.g. verify all of lvm testsuite passes) and then send out v3.
>
> Turns out that this change:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
>
> needed to be reverted with:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
>
> Because nested plugs caused generic_make_request()'s onstack bio_list to
> go out of scope (blk_finish_plug() wouldn't actually flush the list
> within generic_make_request because XFS already added an outermost
> plug).

Looks you should have defined bio_list in plug as

   'struct bio_list bio_list'

instead of one pointer.

>
> But even after fixing that I then hit issues with these changes now
> resulting in imperfect 'in_generic_make_request' accounting that happens
> lazily once the outermost plug completes blk_finish_plug.  manifested as
> dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.

Looks this problem should be related with above 'bio_list' definition too.

>
> Basically using the blk-core's onstack plugging isn't workable for
> fixing this deadlock and we're back to having to seriously consider
> this (with its additional hook in the scheduler):
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread Mike Snitzer
On Fri, Oct 09 2015 at  3:52pm -0400,
Mike Snitzer  wrote:
 
> Turns out that this change:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> 
> needed to be reverted with:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> 
> Because nested plugs caused generic_make_request()'s onstack bio_list to
> go out of scope (blk_finish_plug() wouldn't actually flush the list
> within generic_make_request because XFS already added an outermost
> plug).
> 
> But even after fixing that I then hit issues with these changes now
> resulting in imperfect 'in_generic_make_request' accounting that happens
> lazily once the outermost plug completes blk_finish_plug.  manifested as
> dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> 
> Basically using the blk-core's onstack plugging isn't workable for
> fixing this deadlock and we're back to having to seriously consider
> this (with its additional hook in the scheduler): 

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=a91709cd32b5ca7ca047b68c9299e747f2ae6ca2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread Mike Snitzer
On Thu, Oct 08 2015 at 11:08am -0400,
Mike Snitzer  wrote:

> On Thu, Oct 08 2015 at 11:04am -0400,
> Mikulas Patocka  wrote:
> 
> > 
> > 
> > On Tue, 6 Oct 2015, Mike Snitzer wrote:
> > 
> > > To give others context for why I'm caring about this issue again, this
> > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> > > 
> > > FYI, I cleaned up the plug-based approach a bit further, here is the
> > > incremental patch:
> > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> > > 
> > > And here is a new version of the overall combined patch (sharing now
> > > before I transition to looking at alternatives, though my gut is the use
> > > of a plug in generic_make_request really wouldn't hurt us.. famous last
> > > words):
> > > 
> > >  block/bio.c| 82 
> > > +-
> > >  block/blk-core.c   | 21 -
> > >  drivers/md/dm-bufio.c  |  2 +-
> > >  drivers/md/raid1.c |  6 ++--
> > >  drivers/md/raid10.c|  6 ++--
> > >  include/linux/blkdev.h | 11 +--
> > >  include/linux/sched.h  |  4 ---
> > >  7 files changed, 51 insertions(+), 81 deletions(-)
> > > 
> ...
> > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > > index 2dd3308..c2bff16 100644
> > > --- a/drivers/md/dm-bufio.c
> > > +++ b/drivers/md/dm-bufio.c
> > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
> > > dm_bufio_client *c)
> > >  #define DM_BUFIO_CACHE(c)
> > > (dm_bufio_caches[dm_bufio_cache_index(c)])
> > >  #define DM_BUFIO_CACHE_NAME(c)   
> > > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
> > >  
> > > -#define dm_bufio_in_request()(!!current->bio_list)
> > > +#define dm_bufio_in_request()(current->plug && 
> > > !!current->plug->bio_list)
> > 
> > This condition is repeated several times throughout the whole patch - so 
> > maybe you should make it a function in block device header file.
> 
> Yeah, I thought of that too but forgot to come back to it.  Will do,
> thanks.
> 
> FYI, I found another bug in my last patch and fixed it up.  I'll get
> some refactoring done (including your suggestion), actually _test_ the
> code (e.g. verify all of lvm testsuite passes) and then send out v3.

Turns out that this change:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841

needed to be reverted with:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe

Because nested plugs caused generic_make_request()'s onstack bio_list to
go out of scope (blk_finish_plug() wouldn't actually flush the list
within generic_make_request because XFS already added an outermost
plug).

But even after fixing that I then hit issues with these changes now
resulting in imperfect 'in_generic_make_request' accounting that happens
lazily once the outermost plug completes blk_finish_plug.  manifested as
dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.

Basically using the blk-core's onstack plugging isn't workable for
fixing this deadlock and we're back to having to seriously consider
this (with its additional hook in the scheduler): 

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread kbuild test robot
Hi Mike,

[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: x86_64-lkp (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function '__bch_btree_node_write':
>> drivers/md/bcache/btree.c:454:16: error: 'struct task_struct' has no member 
>> named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:454:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_leaf_dirty':
   drivers/md/bcache/btree.c:548:14: error: 'struct task_struct' has no member 
named 'bio_list'
 !current->bio_list)
 ^
   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function 'mca_alloc':
   drivers/md/bcache/btree.c:893:16: error: 'struct task_struct' has no member 
named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:893:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_node_get':
   drivers/md/bcache/btree.c:980:14: error: 'struct task_struct' has no member 
named 'bio_list'
  if (current->bio_list)
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_insert_node':
   drivers/md/bcache/btree.c:2131:13: error: 'struct task_struct' has no member 
named 'bio_list'
 if (current->bio_list) {
^
   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function 'bch_btree_insert':
   drivers/md/bcache/btree.c:2211:16: error: 'struct task_struct' has no member 
named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:2211:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_insert_node':
   drivers/md/bcache/btree.c:2147:1: warning: control reaches end of non-void 
function [-Wreturn-type]
}
^

vim +454 drivers/md/bcache/btree.c

ee811287 Kent Overstreet 2013-12-17  448struct bset *i = 
btree_bset_last(b);
cafe5635 Kent Overstreet 2013-03-23  449  
2a285686 Kent Overstreet 2014-03-04  450
lockdep_assert_held(>write_lock);
2a285686 Kent Overstreet 2014-03-04  451  
c37511b8 Kent Overstreet 2013-04-26  452trace_bcache_btree_write(b);
c37511b8 Kent Overstreet 2013-04-26  453  
cafe5635 Kent Overstreet 2013-03-23 @454BUG_ON(current->bio_list);
57943511 Kent Overstreet 2013-04-25  455BUG_ON(b->written >= 
btree_blocks(b));
57943511 Kent Overstreet 2013-04-25  456BUG_ON(b->written && !i->keys);
ee811287 Kent Overstreet 2013-12-17  457BUG_ON(btree_bset_first(b)->seq 
!= i->seq);

:: The code at line 454 was first introduced by commit
:: cafe563591446cf80bfbc2fe3bc72a2e36cf1060 bcache: A block layer cache

:: TO: Kent Overstreet 
:: CC: Kent Overstreet 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread kbuild test robot
Hi Mike,

[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: x86_64-lkp (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function '__bch_btree_node_write':
>> drivers/md/bcache/btree.c:454:16: error: 'struct task_struct' has no member 
>> named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:454:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_leaf_dirty':
   drivers/md/bcache/btree.c:548:14: error: 'struct task_struct' has no member 
named 'bio_list'
 !current->bio_list)
 ^
   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function 'mca_alloc':
   drivers/md/bcache/btree.c:893:16: error: 'struct task_struct' has no member 
named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:893:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_node_get':
   drivers/md/bcache/btree.c:980:14: error: 'struct task_struct' has no member 
named 'bio_list'
  if (current->bio_list)
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_insert_node':
   drivers/md/bcache/btree.c:2131:13: error: 'struct task_struct' has no member 
named 'bio_list'
 if (current->bio_list) {
^
   In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/highmem.h:4,
from include/linux/bio.h:23,
from drivers/md/bcache/bcache.h:181,
from drivers/md/bcache/btree.c:23:
   drivers/md/bcache/btree.c: In function 'bch_btree_insert':
   drivers/md/bcache/btree.c:2211:16: error: 'struct task_struct' has no member 
named 'bio_list'
 BUG_ON(current->bio_list);
   ^
   include/linux/compiler.h:166:42: note: in definition of macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
 ^
   drivers/md/bcache/btree.c:2211:2: note: in expansion of macro 'BUG_ON'
 BUG_ON(current->bio_list);
 ^
   drivers/md/bcache/btree.c: In function 'bch_btree_insert_node':
   drivers/md/bcache/btree.c:2147:1: warning: control reaches end of non-void 
function [-Wreturn-type]
}
^

vim +454 drivers/md/bcache/btree.c

ee811287 Kent Overstreet 2013-12-17  448struct bset *i = 
btree_bset_last(b);
cafe5635 Kent Overstreet 2013-03-23  449  
2a285686 Kent Overstreet 2014-03-04  450
lockdep_assert_held(>write_lock);
2a285686 Kent Overstreet 2014-03-04  451  
c37511b8 Kent Overstreet 2013-04-26  452trace_bcache_btree_write(b);
c37511b8 Kent Overstreet 2013-04-26  453  
cafe5635 Kent Overstreet 2013-03-23 @454BUG_ON(current->bio_list);
57943511 Kent Overstreet 2013-04-25  455BUG_ON(b->written >= 
btree_blocks(b));
57943511 Kent Overstreet 2013-04-25  456BUG_ON(b->written && !i->keys);
ee811287 Kent Overstreet 2013-12-17  457BUG_ON(btree_bset_first(b)->seq 
!= i->seq);

:: The code at line 454 was first introduced by commit
:: cafe563591446cf80bfbc2fe3bc72a2e36cf1060 bcache: A block layer cache

:: TO: Kent Overstreet 
:: CC: Kent Overstreet 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread Mike Snitzer
On Thu, Oct 08 2015 at 11:08am -0400,
Mike Snitzer  wrote:

> On Thu, Oct 08 2015 at 11:04am -0400,
> Mikulas Patocka  wrote:
> 
> > 
> > 
> > On Tue, 6 Oct 2015, Mike Snitzer wrote:
> > 
> > > To give others context for why I'm caring about this issue again, this
> > > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> > > 
> > > FYI, I cleaned up the plug-based approach a bit further, here is the
> > > incremental patch:
> > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> > > 
> > > And here is a new version of the overall combined patch (sharing now
> > > before I transition to looking at alternatives, though my gut is the use
> > > of a plug in generic_make_request really wouldn't hurt us.. famous last
> > > words):
> > > 
> > >  block/bio.c| 82 
> > > +-
> > >  block/blk-core.c   | 21 -
> > >  drivers/md/dm-bufio.c  |  2 +-
> > >  drivers/md/raid1.c |  6 ++--
> > >  drivers/md/raid10.c|  6 ++--
> > >  include/linux/blkdev.h | 11 +--
> > >  include/linux/sched.h  |  4 ---
> > >  7 files changed, 51 insertions(+), 81 deletions(-)
> > > 
> ...
> > > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > > index 2dd3308..c2bff16 100644
> > > --- a/drivers/md/dm-bufio.c
> > > +++ b/drivers/md/dm-bufio.c
> > > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
> > > dm_bufio_client *c)
> > >  #define DM_BUFIO_CACHE(c)
> > > (dm_bufio_caches[dm_bufio_cache_index(c)])
> > >  #define DM_BUFIO_CACHE_NAME(c)   
> > > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
> > >  
> > > -#define dm_bufio_in_request()(!!current->bio_list)
> > > +#define dm_bufio_in_request()(current->plug && 
> > > !!current->plug->bio_list)
> > 
> > This condition is repeated several times throughout the whole patch - so 
> > maybe you should make it a function in block device header file.
> 
> Yeah, I thought of that too but forgot to come back to it.  Will do,
> thanks.
> 
> FYI, I found another bug in my last patch and fixed it up.  I'll get
> some refactoring done (including your suggestion), actually _test_ the
> code (e.g. verify all of lvm testsuite passes) and then send out v3.

Turns out that this change:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841

needed to be reverted with:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe

Because nested plugs caused generic_make_request()'s onstack bio_list to
go out of scope (blk_finish_plug() wouldn't actually flush the list
within generic_make_request because XFS already added an outermost
plug).

But even after fixing that I then hit issues with these changes now
resulting in imperfect 'in_generic_make_request' accounting that happens
lazily once the outermost plug completes blk_finish_plug.  manifested as
dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.

Basically using the blk-core's onstack plugging isn't workable for
fixing this deadlock and we're back to having to seriously consider
this (with its additional hook in the scheduler): 

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-09 Thread Mike Snitzer
On Fri, Oct 09 2015 at  3:52pm -0400,
Mike Snitzer  wrote:
 
> Turns out that this change:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=2639638c77768a86216be456c2764e32a2bcd841
> 
> needed to be reverted with:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=ad3ccd760da7c05b90775372f9b39dc2964086fe
> 
> Because nested plugs caused generic_make_request()'s onstack bio_list to
> go out of scope (blk_finish_plug() wouldn't actually flush the list
> within generic_make_request because XFS already added an outermost
> plug).
> 
> But even after fixing that I then hit issues with these changes now
> resulting in imperfect 'in_generic_make_request' accounting that happens
> lazily once the outermost plug completes blk_finish_plug.  manifested as
> dm-bufio.c:dm_bufio_prefetch's BUG_ON(dm_bufio_in_request()); hitting.
> 
> Basically using the blk-core's onstack plugging isn't workable for
> fixing this deadlock and we're back to having to seriously consider
> this (with its additional hook in the scheduler): 

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=a91709cd32b5ca7ca047b68c9299e747f2ae6ca2
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-08 Thread Mike Snitzer
On Thu, Oct 08 2015 at 11:04am -0400,
Mikulas Patocka  wrote:

> 
> 
> On Tue, 6 Oct 2015, Mike Snitzer wrote:
> 
> > To give others context for why I'm caring about this issue again, this
> > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> > 
> > FYI, I cleaned up the plug-based approach a bit further, here is the
> > incremental patch:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> > 
> > And here is a new version of the overall combined patch (sharing now
> > before I transition to looking at alternatives, though my gut is the use
> > of a plug in generic_make_request really wouldn't hurt us.. famous last
> > words):
> > 
> >  block/bio.c| 82 
> > +-
> >  block/blk-core.c   | 21 -
> >  drivers/md/dm-bufio.c  |  2 +-
> >  drivers/md/raid1.c |  6 ++--
> >  drivers/md/raid10.c|  6 ++--
> >  include/linux/blkdev.h | 11 +--
> >  include/linux/sched.h  |  4 ---
> >  7 files changed, 51 insertions(+), 81 deletions(-)
> > 
...
> > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > index 2dd3308..c2bff16 100644
> > --- a/drivers/md/dm-bufio.c
> > +++ b/drivers/md/dm-bufio.c
> > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
> > dm_bufio_client *c)
> >  #define DM_BUFIO_CACHE(c)  (dm_bufio_caches[dm_bufio_cache_index(c)])
> >  #define DM_BUFIO_CACHE_NAME(c) 
> > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
> >  
> > -#define dm_bufio_in_request()  (!!current->bio_list)
> > +#define dm_bufio_in_request()  (current->plug && 
> > !!current->plug->bio_list)
> 
> This condition is repeated several times throughout the whole patch - so 
> maybe you should make it a function in block device header file.

Yeah, I thought of that too but forgot to come back to it.  Will do,
thanks.

FYI, I found another bug in my last patch and fixed it up.  I'll get
some refactoring done (including your suggestion), actually _test_ the
code (e.g. verify all of lvm testsuite passes) and then send out v3.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-08 Thread Mikulas Patocka


On Tue, 6 Oct 2015, Mike Snitzer wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> 
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> 
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
> 
>  block/bio.c| 82 
> +-
>  block/blk-core.c   | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c |  6 ++--
>  drivers/md/raid10.c|  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>   }
>  }
>  
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> - struct bio_list punt, nopunt;
>   struct bio *bio;
>  
> - /*
> -  * In order to guarantee forward progress we must punt only bios that
> -  * were allocated from this bio_set; otherwise, if there was a bio on
> -  * there for a stacking driver higher up in the stack, processing it
> -  * could require allocating bios from this bio_set, and doing that from
> -  * our own rescuer would be bad.
> -  *
> -  * Since bio lists are singly linked, pop them all instead of trying to
> -  * remove from the middle of the list:
> -  */
> -
> - bio_list_init();
> - bio_list_init();
> -
> - while ((bio = bio_list_pop(current->bio_list)))
> - bio_list_add(bio->bi_pool == bs ?  : , bio);
> -
> - *current->bio_list = nopunt;
> -
> - spin_lock(>rescue_lock);
> - bio_list_merge(>rescue_list, );
> - spin_unlock(>rescue_lock);
> + while ((bio = bio_list_pop(>bio_list))) {
> + struct bio_set *bs = bio->bi_pool;
> + if (!bs)
> + bs = fs_bio_set;
>  
> - queue_work(bs->rescue_workqueue, >rescue_work);
> + spin_lock(>rescue_lock);
> + bio_list_add(>rescue_list, bio);
> + queue_work(bs->rescue_workqueue, >rescue_work);
> + spin_unlock(>rescue_lock);
> + }
>  }
>  
>  /**
> @@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set 
> *bs)
>  {
> - gfp_t saved_gfp = gfp_mask;
>   unsigned front_pad;
>   unsigned inline_vecs;
>   unsigned long idx = BIO_POOL_NONE;
> @@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int 
> nr_iovecs, struct bio_set *bs)
>   /* should not use nobvec bioset for nr_iovecs > 0 */
>   if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
>   return NULL;
> - /*
> -  * generic_make_request() converts recursion to iteration; this
> -  * means if we're running beneath it, any bios we allocate and
> -  * submit will not be submitted (and thus freed) until after we
> -  * return.
> -  *
> -  * This exposes us to a potential deadlock if we allocate
> -  * multiple bios from the same bio_set() while running
> -  * underneath generic_make_request(). If we were to allocate
> -  * multiple bios (say a stacking block driver that was splitting
> -  * bios), we would deadlock if we exhausted the mempool's
> -  * reserve.
> -  *
> -  * We solve this, and guarantee forward progress, with a rescuer
> -  * workqueue per bio_set. If we go to allocate and there are
> -  * bios on current->bio_list, we first try the allocation
> -  * without __GFP_WAIT; if that fails, we punt those bios we
> -  * would be blocking to the rescuer workqueue before we retry
> -  * with the original gfp_flags.
> -  */
> -
> - if (current->bio_list && !bio_list_empty(current->bio_list))
> - gfp_mask &= 

Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-08 Thread Mike Snitzer
On Thu, Oct 08 2015 at 11:04am -0400,
Mikulas Patocka  wrote:

> 
> 
> On Tue, 6 Oct 2015, Mike Snitzer wrote:
> 
> > To give others context for why I'm caring about this issue again, this
> > recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> > 
> > FYI, I cleaned up the plug-based approach a bit further, here is the
> > incremental patch:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> > 
> > And here is a new version of the overall combined patch (sharing now
> > before I transition to looking at alternatives, though my gut is the use
> > of a plug in generic_make_request really wouldn't hurt us.. famous last
> > words):
> > 
> >  block/bio.c| 82 
> > +-
> >  block/blk-core.c   | 21 -
> >  drivers/md/dm-bufio.c  |  2 +-
> >  drivers/md/raid1.c |  6 ++--
> >  drivers/md/raid10.c|  6 ++--
> >  include/linux/blkdev.h | 11 +--
> >  include/linux/sched.h  |  4 ---
> >  7 files changed, 51 insertions(+), 81 deletions(-)
> > 
...
> > diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
> > index 2dd3308..c2bff16 100644
> > --- a/drivers/md/dm-bufio.c
> > +++ b/drivers/md/dm-bufio.c
> > @@ -168,7 +168,7 @@ static inline int dm_bufio_cache_index(struct 
> > dm_bufio_client *c)
> >  #define DM_BUFIO_CACHE(c)  (dm_bufio_caches[dm_bufio_cache_index(c)])
> >  #define DM_BUFIO_CACHE_NAME(c) 
> > (dm_bufio_cache_names[dm_bufio_cache_index(c)])
> >  
> > -#define dm_bufio_in_request()  (!!current->bio_list)
> > +#define dm_bufio_in_request()  (current->plug && 
> > !!current->plug->bio_list)
> 
> This condition is repeated several times throughout the whole patch - so 
> maybe you should make it a function in block device header file.

Yeah, I thought of that too but forgot to come back to it.  Will do,
thanks.

FYI, I found another bug in my last patch and fixed it up.  I'll get
some refactoring done (including your suggestion), actually _test_ the
code (e.g. verify all of lvm testsuite passes) and then send out v3.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-08 Thread Mikulas Patocka


On Tue, 6 Oct 2015, Mike Snitzer wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> 
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> 
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
> 
>  block/bio.c| 82 
> +-
>  block/blk-core.c   | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c |  6 ++--
>  drivers/md/raid10.c|  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>   }
>  }
>  
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> - struct bio_list punt, nopunt;
>   struct bio *bio;
>  
> - /*
> -  * In order to guarantee forward progress we must punt only bios that
> -  * were allocated from this bio_set; otherwise, if there was a bio on
> -  * there for a stacking driver higher up in the stack, processing it
> -  * could require allocating bios from this bio_set, and doing that from
> -  * our own rescuer would be bad.
> -  *
> -  * Since bio lists are singly linked, pop them all instead of trying to
> -  * remove from the middle of the list:
> -  */
> -
> - bio_list_init();
> - bio_list_init();
> -
> - while ((bio = bio_list_pop(current->bio_list)))
> - bio_list_add(bio->bi_pool == bs ?  : , bio);
> -
> - *current->bio_list = nopunt;
> -
> - spin_lock(>rescue_lock);
> - bio_list_merge(>rescue_list, );
> - spin_unlock(>rescue_lock);
> + while ((bio = bio_list_pop(>bio_list))) {
> + struct bio_set *bs = bio->bi_pool;
> + if (!bs)
> + bs = fs_bio_set;
>  
> - queue_work(bs->rescue_workqueue, >rescue_work);
> + spin_lock(>rescue_lock);
> + bio_list_add(>rescue_list, bio);
> + queue_work(bs->rescue_workqueue, >rescue_work);
> + spin_unlock(>rescue_lock);
> + }
>  }
>  
>  /**
> @@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set 
> *bs)
>  {
> - gfp_t saved_gfp = gfp_mask;
>   unsigned front_pad;
>   unsigned inline_vecs;
>   unsigned long idx = BIO_POOL_NONE;
> @@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int 
> nr_iovecs, struct bio_set *bs)
>   /* should not use nobvec bioset for nr_iovecs > 0 */
>   if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
>   return NULL;
> - /*
> -  * generic_make_request() converts recursion to iteration; this
> -  * means if we're running beneath it, any bios we allocate and
> -  * submit will not be submitted (and thus freed) until after we
> -  * return.
> -  *
> -  * This exposes us to a potential deadlock if we allocate
> -  * multiple bios from the same bio_set() while running
> -  * underneath generic_make_request(). If we were to allocate
> -  * multiple bios (say a stacking block driver that was splitting
> -  * bios), we would deadlock if we exhausted the mempool's
> -  * reserve.
> -  *
> -  * We solve this, and guarantee forward progress, with a rescuer
> -  * workqueue per bio_set. If we go to allocate and there are
> -  * bios on current->bio_list, we first try the allocation
> -  * without __GFP_WAIT; if that fails, we punt those bios we
> -  * would be blocking to the rescuer workqueue before we retry
> -  * with the original gfp_flags.
> -  */
> -
> - if (current->bio_list && !bio_list_empty(current->bio_list))
> - gfp_mask &= 

Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-06 Thread Mike Snitzer
On Tue, Oct 06 2015 at  4:16P -0400,
Mike Snitzer  wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> 
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> 
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
> 
>  block/bio.c| 82 
> +-
>  block/blk-core.c   | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c |  6 ++--
>  drivers/md/raid10.c|  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>   }
>  }
>  
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> - struct bio_list punt, nopunt;
>   struct bio *bio;
>  
> - /*
> -  * In order to guarantee forward progress we must punt only bios that
> -  * were allocated from this bio_set; otherwise, if there was a bio on
> -  * there for a stacking driver higher up in the stack, processing it
> -  * could require allocating bios from this bio_set, and doing that from
> -  * our own rescuer would be bad.
> -  *
> -  * Since bio lists are singly linked, pop them all instead of trying to
> -  * remove from the middle of the list:
> -  */
> -
> - bio_list_init();
> - bio_list_init();
> -
> - while ((bio = bio_list_pop(current->bio_list)))
> - bio_list_add(bio->bi_pool == bs ?  : , bio);
> -
> - *current->bio_list = nopunt;
> -
> - spin_lock(>rescue_lock);
> - bio_list_merge(>rescue_list, );
> - spin_unlock(>rescue_lock);
> + while ((bio = bio_list_pop(>bio_list))) {

Bleh, should be plug->bio_list.. obviously I didn't compile test this...

Here is the incremental I've folded in:

diff --git a/block/bio.c b/block/bio.c
index 3d03668..b868b9e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -369,7 +369,10 @@ void blk_flush_bio_list(struct blk_plug *plug)
 {
struct bio *bio;
 
-   while ((bio = bio_list_pop(>bio_list))) {
+   if (!plug->bio_list)
+   return;
+
+   while ((bio = bio_list_pop(plug->bio_list))) {
struct bio_set *bs = bio->bi_pool;
if (!bs)
bs = fs_bio_set;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] block: flush queued bios when the process blocks

2015-10-06 Thread Mike Snitzer
To give others context for why I'm caring about this issue again, this
recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1267650

FYI, I cleaned up the plug-based approach a bit further, here is the
incremental patch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6

And here is a new version of the overall combined patch (sharing now
before I transition to looking at alternatives, though my gut is the use
of a plug in generic_make_request really wouldn't hurt us.. famous last
words):

 block/bio.c| 82 +-
 block/blk-core.c   | 21 -
 drivers/md/dm-bufio.c  |  2 +-
 drivers/md/raid1.c |  6 ++--
 drivers/md/raid10.c|  6 ++--
 include/linux/blkdev.h | 11 +--
 include/linux/sched.h  |  4 ---
 7 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ad3f276..3d03668 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
}
 }
 
-static void punt_bios_to_rescuer(struct bio_set *bs)
+/**
+ * blk_flush_bio_list
+ * @plug: the blk_plug that may have collected bios
+ *
+ * Pop bios queued on plug->bio_list and submit each of them to
+ * their rescue workqueue.
+ *
+ * If the bio doesn't have a bio_set, we use the default fs_bio_set.
+ * However, stacking drivers should use bio_set, so this shouldn't be
+ * an issue.
+ */
+void blk_flush_bio_list(struct blk_plug *plug)
 {
-   struct bio_list punt, nopunt;
struct bio *bio;
 
-   /*
-* In order to guarantee forward progress we must punt only bios that
-* were allocated from this bio_set; otherwise, if there was a bio on
-* there for a stacking driver higher up in the stack, processing it
-* could require allocating bios from this bio_set, and doing that from
-* our own rescuer would be bad.
-*
-* Since bio lists are singly linked, pop them all instead of trying to
-* remove from the middle of the list:
-*/
-
-   bio_list_init();
-   bio_list_init();
-
-   while ((bio = bio_list_pop(current->bio_list)))
-   bio_list_add(bio->bi_pool == bs ?  : , bio);
-
-   *current->bio_list = nopunt;
-
-   spin_lock(>rescue_lock);
-   bio_list_merge(>rescue_list, );
-   spin_unlock(>rescue_lock);
+   while ((bio = bio_list_pop(>bio_list))) {
+   struct bio_set *bs = bio->bi_pool;
+   if (!bs)
+   bs = fs_bio_set;
 
-   queue_work(bs->rescue_workqueue, >rescue_work);
+   spin_lock(>rescue_lock);
+   bio_list_add(>rescue_list, bio);
+   queue_work(bs->rescue_workqueue, >rescue_work);
+   spin_unlock(>rescue_lock);
+   }
 }
 
 /**
@@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-   gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
unsigned long idx = BIO_POOL_NONE;
@@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int 
nr_iovecs, struct bio_set *bs)
/* should not use nobvec bioset for nr_iovecs > 0 */
if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
return NULL;
-   /*
-* generic_make_request() converts recursion to iteration; this
-* means if we're running beneath it, any bios we allocate and
-* submit will not be submitted (and thus freed) until after we
-* return.
-*
-* This exposes us to a potential deadlock if we allocate
-* multiple bios from the same bio_set() while running
-* underneath generic_make_request(). If we were to allocate
-* multiple bios (say a stacking block driver that was splitting
-* bios), we would deadlock if we exhausted the mempool's
-* reserve.
-*
-* We solve this, and guarantee forward progress, with a rescuer
-* workqueue per bio_set. If we go to allocate and there are
-* bios on current->bio_list, we first try the allocation
-* without __GFP_WAIT; if that fails, we punt those bios we
-* would be blocking to the rescuer workqueue before we retry
-* with the original gfp_flags.
-*/
-
-   if (current->bio_list && !bio_list_empty(current->bio_list))
-   gfp_mask &= ~__GFP_WAIT;
 
p = mempool_alloc(bs->bio_pool, gfp_mask);
-   if (!p && gfp_mask != saved_gfp) {
-   punt_bios_to_rescuer(bs);

[PATCH v2] block: flush queued bios when the process blocks

2015-10-06 Thread Mike Snitzer
To give others context for why I'm caring about this issue again, this
recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1267650

FYI, I cleaned up the plug-based approach a bit further, here is the
incremental patch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6

And here is a new version of the overall combined patch (sharing now
before I transition to looking at alternatives, though my gut is the use
of a plug in generic_make_request really wouldn't hurt us.. famous last
words):

 block/bio.c| 82 +-
 block/blk-core.c   | 21 -
 drivers/md/dm-bufio.c  |  2 +-
 drivers/md/raid1.c |  6 ++--
 drivers/md/raid10.c|  6 ++--
 include/linux/blkdev.h | 11 +--
 include/linux/sched.h  |  4 ---
 7 files changed, 51 insertions(+), 81 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ad3f276..3d03668 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
}
 }
 
-static void punt_bios_to_rescuer(struct bio_set *bs)
+/**
+ * blk_flush_bio_list
+ * @plug: the blk_plug that may have collected bios
+ *
+ * Pop bios queued on plug->bio_list and submit each of them to
+ * their rescue workqueue.
+ *
+ * If the bio doesn't have a bio_set, we use the default fs_bio_set.
+ * However, stacking drivers should use bio_set, so this shouldn't be
+ * an issue.
+ */
+void blk_flush_bio_list(struct blk_plug *plug)
 {
-   struct bio_list punt, nopunt;
struct bio *bio;
 
-   /*
-* In order to guarantee forward progress we must punt only bios that
-* were allocated from this bio_set; otherwise, if there was a bio on
-* there for a stacking driver higher up in the stack, processing it
-* could require allocating bios from this bio_set, and doing that from
-* our own rescuer would be bad.
-*
-* Since bio lists are singly linked, pop them all instead of trying to
-* remove from the middle of the list:
-*/
-
-   bio_list_init();
-   bio_list_init();
-
-   while ((bio = bio_list_pop(current->bio_list)))
-   bio_list_add(bio->bi_pool == bs ?  : , bio);
-
-   *current->bio_list = nopunt;
-
-   spin_lock(>rescue_lock);
-   bio_list_merge(>rescue_list, );
-   spin_unlock(>rescue_lock);
+   while ((bio = bio_list_pop(>bio_list))) {
+   struct bio_set *bs = bio->bi_pool;
+   if (!bs)
+   bs = fs_bio_set;
 
-   queue_work(bs->rescue_workqueue, >rescue_work);
+   spin_lock(>rescue_lock);
+   bio_list_add(>rescue_list, bio);
+   queue_work(bs->rescue_workqueue, >rescue_work);
+   spin_unlock(>rescue_lock);
+   }
 }
 
 /**
@@ -422,7 +418,6 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
  */
 struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 {
-   gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
unsigned long idx = BIO_POOL_NONE;
@@ -443,37 +438,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int 
nr_iovecs, struct bio_set *bs)
/* should not use nobvec bioset for nr_iovecs > 0 */
if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
return NULL;
-   /*
-* generic_make_request() converts recursion to iteration; this
-* means if we're running beneath it, any bios we allocate and
-* submit will not be submitted (and thus freed) until after we
-* return.
-*
-* This exposes us to a potential deadlock if we allocate
-* multiple bios from the same bio_set() while running
-* underneath generic_make_request(). If we were to allocate
-* multiple bios (say a stacking block driver that was splitting
-* bios), we would deadlock if we exhausted the mempool's
-* reserve.
-*
-* We solve this, and guarantee forward progress, with a rescuer
-* workqueue per bio_set. If we go to allocate and there are
-* bios on current->bio_list, we first try the allocation
-* without __GFP_WAIT; if that fails, we punt those bios we
-* would be blocking to the rescuer workqueue before we retry
-* with the original gfp_flags.
-*/
-
-   if (current->bio_list && !bio_list_empty(current->bio_list))
-   gfp_mask &= ~__GFP_WAIT;
 
p = mempool_alloc(bs->bio_pool, gfp_mask);
-   if (!p && gfp_mask != saved_gfp) {
-   punt_bios_to_rescuer(bs);

Re: [PATCH v2] block: flush queued bios when the process blocks

2015-10-06 Thread Mike Snitzer
On Tue, Oct 06 2015 at  4:16P -0400,
Mike Snitzer  wrote:

> To give others context for why I'm caring about this issue again, this
> recent BZ against 4.3-rc served as a reminder that we _need_ a fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=1267650
> 
> FYI, I cleaned up the plug-based approach a bit further, here is the
> incremental patch:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip=f73d001ec692125308accbb5ca26f892f949c1b6
> 
> And here is a new version of the overall combined patch (sharing now
> before I transition to looking at alternatives, though my gut is the use
> of a plug in generic_make_request really wouldn't hurt us.. famous last
> words):
> 
>  block/bio.c| 82 
> +-
>  block/blk-core.c   | 21 -
>  drivers/md/dm-bufio.c  |  2 +-
>  drivers/md/raid1.c |  6 ++--
>  drivers/md/raid10.c|  6 ++--
>  include/linux/blkdev.h | 11 +--
>  include/linux/sched.h  |  4 ---
>  7 files changed, 51 insertions(+), 81 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index ad3f276..3d03668 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -354,35 +354,31 @@ static void bio_alloc_rescue(struct work_struct *work)
>   }
>  }
>  
> -static void punt_bios_to_rescuer(struct bio_set *bs)
> +/**
> + * blk_flush_bio_list
> + * @plug: the blk_plug that may have collected bios
> + *
> + * Pop bios queued on plug->bio_list and submit each of them to
> + * their rescue workqueue.
> + *
> + * If the bio doesn't have a bio_set, we use the default fs_bio_set.
> + * However, stacking drivers should use bio_set, so this shouldn't be
> + * an issue.
> + */
> +void blk_flush_bio_list(struct blk_plug *plug)
>  {
> - struct bio_list punt, nopunt;
>   struct bio *bio;
>  
> - /*
> -  * In order to guarantee forward progress we must punt only bios that
> -  * were allocated from this bio_set; otherwise, if there was a bio on
> -  * there for a stacking driver higher up in the stack, processing it
> -  * could require allocating bios from this bio_set, and doing that from
> -  * our own rescuer would be bad.
> -  *
> -  * Since bio lists are singly linked, pop them all instead of trying to
> -  * remove from the middle of the list:
> -  */
> -
> - bio_list_init();
> - bio_list_init();
> -
> - while ((bio = bio_list_pop(current->bio_list)))
> - bio_list_add(bio->bi_pool == bs ?  : , bio);
> -
> - *current->bio_list = nopunt;
> -
> - spin_lock(>rescue_lock);
> - bio_list_merge(>rescue_list, );
> - spin_unlock(>rescue_lock);
> + while ((bio = bio_list_pop(>bio_list))) {

Bleh, should be plug->bio_list.. obviously I didn't compile test this...

Here is the incremental I've folded in:

diff --git a/block/bio.c b/block/bio.c
index 3d03668..b868b9e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -369,7 +369,10 @@ void blk_flush_bio_list(struct blk_plug *plug)
 {
struct bio *bio;
 
-   while ((bio = bio_list_pop(>bio_list))) {
+   if (!plug->bio_list)
+   return;
+
+   while ((bio = bio_list_pop(plug->bio_list))) {
struct bio_set *bs = bio->bi_pool;
if (!bs)
bs = fs_bio_set;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/