Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread John Stoffel
> "Kent" == Kent Overstreet  writes:

Kent> On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
>> It's also instructive to remember why the code is the way it is: it used
>> to process bios for underlying devices immediately, but this sometimes
>> meant too much recursive stack growth.  If a per-device rescuer thread
>> is to be made available (as well as the mempool), the option of
>> reinstating recursion is there too - only punting to workqueue when the
>> stack actually becomes "too big".  (Also bear in mind that some dm
>> targets may have dependencies on their own mempools - submission can
>> block there too.)  I find it helpful only to consider splitting into two
>> pieces - it must always be possible to process the first piece (i.e.
>> process it at the next layer down in the stack) and complete it
>> independently of what happens to the second piece (which might require
>> further splitting and block until the first piece has completed).

Kent> I'm sure it could be made to work (and it may well simpler), but it
Kent> seems problematic from a performance pov.

Kent> With stacked devices you'd then have to switch stacks on _every_ bio.
Kent> That could be made fast enough I'm sure, but it wouldn't be free and I
Kent> don't know of any existing code in the kernel that implements what we'd
Kent> need (though if you know how you'd go about doing that, I'd love to
Kent> know! Would be useful for other things).

Kent> The real problem is that because we'd need these extra stacks for
Kent> handling all bios we couldn't get by with just one per bio_set. We'd
Kent> only need one to make forward progress so the rest could be allocated
Kent> on demand (i.e. what the workqueue code does) but that sounds like it's
Kent> starting to get expensive.

Maybe we need to limit the size of BIOs to that of the bottom-most
underlying device instead?  Or maybe limit BIOs to some small
multiple?  As you stack up DM targets one on top of each other, they
should respect the limits of the underlying devices and pass those
limits up the chain.

Or maybe I'm speaking giberish...

John
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 05:01:09PM -0400, John Stoffel wrote:
> > "Kent" == Kent Overstreet  writes:
> 
> Kent> On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
> >> It's also instructive to remember why the code is the way it is: it used
> >> to process bios for underlying devices immediately, but this sometimes
> >> meant too much recursive stack growth.  If a per-device rescuer thread
> >> is to be made available (as well as the mempool), the option of
> >> reinstating recursion is there too - only punting to workqueue when the
> >> stack actually becomes "too big".  (Also bear in mind that some dm
> >> targets may have dependencies on their own mempools - submission can
> >> block there too.)  I find it helpful only to consider splitting into two
> >> pieces - it must always be possible to process the first piece (i.e.
> >> process it at the next layer down in the stack) and complete it
> >> independently of what happens to the second piece (which might require
> >> further splitting and block until the first piece has completed).
> 
> Kent> I'm sure it could be made to work (and it may well simpler), but it
> Kent> seems problematic from a performance pov.
> 
> Kent> With stacked devices you'd then have to switch stacks on _every_ bio.
> Kent> That could be made fast enough I'm sure, but it wouldn't be free and I
> Kent> don't know of any existing code in the kernel that implements what we'd
> Kent> need (though if you know how you'd go about doing that, I'd love to
> Kent> know! Would be useful for other things).
> 
> Kent> The real problem is that because we'd need these extra stacks for
> Kent> handling all bios we couldn't get by with just one per bio_set. We'd
> Kent> only need one to make forward progress so the rest could be allocated
> Kent> on demand (i.e. what the workqueue code does) but that sounds like it's
> Kent> starting to get expensive.
> 
> Maybe we need to limit the size of BIOs to that of the bottom-most
> underlying device instead?  Or maybe limit BIOs to some small
> multiple?  As you stack up DM targets one on top of each other, they
> should respect the limits of the underlying devices and pass those
> limits up the chain.

That's the approach the block layer currently tries to take. It's
brittle, tricky and inefficient, and it completely breaks down when the
limits are dynamic - so things like dm and bcache are always going to
have to split anyways.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 06:23:36PM +0100, Alasdair G Kergon wrote:
> On Wed, Aug 29, 2012 at 10:13:45AM -0700, Kent Overstreet wrote:
> > Only on allocation failure.
>  
> Which as you already said assumes wrapping together the other mempools in the
> submission path first...

Yes? I'm not sure what you're getting at.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 10:13:45AM -0700, Kent Overstreet wrote:
> Only on allocation failure.
 
Which as you already said assumes wrapping together the other mempools in the
submission path first...

Alasdair

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 09:50:06AM -0700, Kent Overstreet wrote:
> The other thing we could do is make the rescuer thread per block device
> (which arguably makes more sense than per bio_set anyways), then
> associate bio_sets with specific block devices/rescuers.
 
For dm, it's equivalent - already one bioset required per live device:
dm_alloc_md_mempools().

Alasdair

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
> It's also instructive to remember why the code is the way it is: it used
> to process bios for underlying devices immediately, but this sometimes
> meant too much recursive stack growth.  If a per-device rescuer thread
> is to be made available (as well as the mempool), the option of
> reinstating recursion is there too - only punting to workqueue when the
> stack actually becomes "too big".  (Also bear in mind that some dm
> targets may have dependencies on their own mempools - submission can
> block there too.)  I find it helpful only to consider splitting into two
> pieces - it must always be possible to process the first piece (i.e.
> process it at the next layer down in the stack) and complete it
> independently of what happens to the second piece (which might require
> further splitting and block until the first piece has completed).

I'm sure it could be made to work (and it may well simpler), but it
seems problematic from a performance pov.

With stacked devices you'd then have to switch stacks on _every_ bio.
That could be made fast enough I'm sure, but it wouldn't be free and I
don't know of any existing code in the kernel that implements what we'd
need (though if you know how you'd go about doing that, I'd love to
know! Would be useful for other things).

The real problem is that because we'd need these extra stacks for
handling all bios we couldn't get by with just one per bio_set. We'd
only need one to make forward progress so the rest could be allocated
on demand (i.e. what the workqueue code does) but that sounds like it's
starting to get expensive.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 08:57:59AM -0400, Vivek Goyal wrote:
> I would say keep all the bio splitting patches and any fixes w.r.t
> deadlocks in a seprate series. As this is little complicated and a lot
> of is just theoritical corner cases. If you limit this series to just
> bio_set related cleanups, it becomes more acceptable for inclusion.
 
Yes, please keep the splitting patches separate.  The current code gets
away with what it does through statistics making the deadlocks very
unlikely.

It's also instructive to remember why the code is the way it is: it used
to process bios for underlying devices immediately, but this sometimes
meant too much recursive stack growth.  If a per-device rescuer thread
is to be made available (as well as the mempool), the option of
reinstating recursion is there too - only punting to workqueue when the
stack actually becomes "too big".  (Also bear in mind that some dm
targets may have dependencies on their own mempools - submission can
block there too.)  I find it helpful only to consider splitting into two
pieces - it must always be possible to process the first piece (i.e.
process it at the next layer down in the stack) and complete it
independently of what happens to the second piece (which might require
further splitting and block until the first piece has completed).

Alasdair

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 08:57:59AM -0400, Vivek Goyal wrote:
 I would say keep all the bio splitting patches and any fixes w.r.t
 deadlocks in a seprate series. As this is little complicated and a lot
 of is just theoritical corner cases. If you limit this series to just
 bio_set related cleanups, it becomes more acceptable for inclusion.
 
Yes, please keep the splitting patches separate.  The current code gets
away with what it does through statistics making the deadlocks very
unlikely.

It's also instructive to remember why the code is the way it is: it used
to process bios for underlying devices immediately, but this sometimes
meant too much recursive stack growth.  If a per-device rescuer thread
is to be made available (as well as the mempool), the option of
reinstating recursion is there too - only punting to workqueue when the
stack actually becomes too big.  (Also bear in mind that some dm
targets may have dependencies on their own mempools - submission can
block there too.)  I find it helpful only to consider splitting into two
pieces - it must always be possible to process the first piece (i.e.
process it at the next layer down in the stack) and complete it
independently of what happens to the second piece (which might require
further splitting and block until the first piece has completed).

Alasdair

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
 It's also instructive to remember why the code is the way it is: it used
 to process bios for underlying devices immediately, but this sometimes
 meant too much recursive stack growth.  If a per-device rescuer thread
 is to be made available (as well as the mempool), the option of
 reinstating recursion is there too - only punting to workqueue when the
 stack actually becomes too big.  (Also bear in mind that some dm
 targets may have dependencies on their own mempools - submission can
 block there too.)  I find it helpful only to consider splitting into two
 pieces - it must always be possible to process the first piece (i.e.
 process it at the next layer down in the stack) and complete it
 independently of what happens to the second piece (which might require
 further splitting and block until the first piece has completed).

I'm sure it could be made to work (and it may well simpler), but it
seems problematic from a performance pov.

With stacked devices you'd then have to switch stacks on _every_ bio.
That could be made fast enough I'm sure, but it wouldn't be free and I
don't know of any existing code in the kernel that implements what we'd
need (though if you know how you'd go about doing that, I'd love to
know! Would be useful for other things).

The real problem is that because we'd need these extra stacks for
handling all bios we couldn't get by with just one per bio_set. We'd
only need one to make forward progress so the rest could be allocated
on demand (i.e. what the workqueue code does) but that sounds like it's
starting to get expensive.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 09:50:06AM -0700, Kent Overstreet wrote:
 The other thing we could do is make the rescuer thread per block device
 (which arguably makes more sense than per bio_set anyways), then
 associate bio_sets with specific block devices/rescuers.
 
For dm, it's equivalent - already one bioset required per live device:
dm_alloc_md_mempools().

Alasdair

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Alasdair G Kergon
On Wed, Aug 29, 2012 at 10:13:45AM -0700, Kent Overstreet wrote:
 Only on allocation failure.
 
Which as you already said assumes wrapping together the other mempools in the
submission path first...

Alasdair

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 06:23:36PM +0100, Alasdair G Kergon wrote:
 On Wed, Aug 29, 2012 at 10:13:45AM -0700, Kent Overstreet wrote:
  Only on allocation failure.
  
 Which as you already said assumes wrapping together the other mempools in the
 submission path first...

Yes? I'm not sure what you're getting at.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread Kent Overstreet
On Wed, Aug 29, 2012 at 05:01:09PM -0400, John Stoffel wrote:
  Kent == Kent Overstreet koverstr...@google.com writes:
 
 Kent On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
  It's also instructive to remember why the code is the way it is: it used
  to process bios for underlying devices immediately, but this sometimes
  meant too much recursive stack growth.  If a per-device rescuer thread
  is to be made available (as well as the mempool), the option of
  reinstating recursion is there too - only punting to workqueue when the
  stack actually becomes too big.  (Also bear in mind that some dm
  targets may have dependencies on their own mempools - submission can
  block there too.)  I find it helpful only to consider splitting into two
  pieces - it must always be possible to process the first piece (i.e.
  process it at the next layer down in the stack) and complete it
  independently of what happens to the second piece (which might require
  further splitting and block until the first piece has completed).
 
 Kent I'm sure it could be made to work (and it may well simpler), but it
 Kent seems problematic from a performance pov.
 
 Kent With stacked devices you'd then have to switch stacks on _every_ bio.
 Kent That could be made fast enough I'm sure, but it wouldn't be free and I
 Kent don't know of any existing code in the kernel that implements what we'd
 Kent need (though if you know how you'd go about doing that, I'd love to
 Kent know! Would be useful for other things).
 
 Kent The real problem is that because we'd need these extra stacks for
 Kent handling all bios we couldn't get by with just one per bio_set. We'd
 Kent only need one to make forward progress so the rest could be allocated
 Kent on demand (i.e. what the workqueue code does) but that sounds like it's
 Kent starting to get expensive.
 
 Maybe we need to limit the size of BIOs to that of the bottom-most
 underlying device instead?  Or maybe limit BIOs to some small
 multiple?  As you stack up DM targets one on top of each other, they
 should respect the limits of the underlying devices and pass those
 limits up the chain.

That's the approach the block layer currently tries to take. It's
brittle, tricky and inefficient, and it completely breaks down when the
limits are dynamic - so things like dm and bcache are always going to
have to split anyways.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v7 9/9] block: Avoid deadlocks with bio allocation by stacking drivers

2012-08-29 Thread John Stoffel
 Kent == Kent Overstreet koverstr...@google.com writes:

Kent On Wed, Aug 29, 2012 at 03:39:14PM +0100, Alasdair G Kergon wrote:
 It's also instructive to remember why the code is the way it is: it used
 to process bios for underlying devices immediately, but this sometimes
 meant too much recursive stack growth.  If a per-device rescuer thread
 is to be made available (as well as the mempool), the option of
 reinstating recursion is there too - only punting to workqueue when the
 stack actually becomes too big.  (Also bear in mind that some dm
 targets may have dependencies on their own mempools - submission can
 block there too.)  I find it helpful only to consider splitting into two
 pieces - it must always be possible to process the first piece (i.e.
 process it at the next layer down in the stack) and complete it
 independently of what happens to the second piece (which might require
 further splitting and block until the first piece has completed).

Kent I'm sure it could be made to work (and it may well simpler), but it
Kent seems problematic from a performance pov.

Kent With stacked devices you'd then have to switch stacks on _every_ bio.
Kent That could be made fast enough I'm sure, but it wouldn't be free and I
Kent don't know of any existing code in the kernel that implements what we'd
Kent need (though if you know how you'd go about doing that, I'd love to
Kent know! Would be useful for other things).

Kent The real problem is that because we'd need these extra stacks for
Kent handling all bios we couldn't get by with just one per bio_set. We'd
Kent only need one to make forward progress so the rest could be allocated
Kent on demand (i.e. what the workqueue code does) but that sounds like it's
Kent starting to get expensive.

Maybe we need to limit the size of BIOs to that of the bottom-most
underlying device instead?  Or maybe limit BIOs to some small
multiple?  As you stack up DM targets one on top of each other, they
should respect the limits of the underlying devices and pass those
limits up the chain.

Or maybe I'm speaking giberish...

John
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/