Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-13 Thread Hannes Reinecke
On 08/08/2015 11:02 AM, Kent Overstreet wrote:
> On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
>> Wouldn't it be easier to move both max_write_same_sectors and
>> max_discard sectors to 64 bit (ie to type sector_t) and be done with the
>> overflow?
>> Seems to me this is far too much coding around self-imposed restrictions...
> 
> It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
> suppose wouldn't actually increase the size of struct bio (when sector_t is 64
> bits), since struct bvec_iter has padding right now...
> 
Which I guess we should be doing anyway.
Devices are getting larger and larger, so in the long run in need to
be moved to 64 bits.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-13 Thread Hannes Reinecke
On 08/08/2015 11:02 AM, Kent Overstreet wrote:
 On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
 Wouldn't it be easier to move both max_write_same_sectors and
 max_discard sectors to 64 bit (ie to type sector_t) and be done with the
 overflow?
 Seems to me this is far too much coding around self-imposed restrictions...
 
 It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I
 suppose wouldn't actually increase the size of struct bio (when sector_t is 64
 bits), since struct bvec_iter has padding right now...
 
Which I guess we should be doing anyway.
Devices are getting larger and larger, so in the long run in need to
be moved to 64 bits.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Ming Lin
On Tue, 2015-08-11 at 20:24 -0400, Martin K. Petersen wrote:
> > "Ming" == Ming Lin  writes:
> 
> Ming> Do you still agree we cap discard to 2G as an interim solution?
> 
> I can live with the 2G cap for 4.3 but would like to see it fixed
> properly in 4.4.

Sure. I'd like to work on the fix.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Ming" == Ming Lin  writes:

Ming> Do you still agree we cap discard to 2G as an interim solution?

I can live with the 2G cap for 4.3 but would like to see it fixed
properly in 4.4.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Ming Lin
On Tue, 2015-08-11 at 14:05 -0400, Martin K. Petersen wrote:
> > "Martin" == Martin K Petersen  writes:
> 
> Martin> I agree except I really don't want to lop off anything unless
> Martin> the device locks up if we send it partial blocks. There was an
> Martin> array that had problems a while back but I believe they have
> Martin> been fixed.
> 
> Oh, and there are several arrays out there that have allocation units
> that are not powers of two. *sigh*

Do you still agree we cap discard to 2G as an interim solution?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=block-generic-req=39d2f5f

If OK, I'll send out new version asking Jens to apply.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Martin" == Martin K Petersen  writes:

Martin> I agree except I really don't want to lop off anything unless
Martin> the device locks up if we send it partial blocks. There was an
Martin> array that had problems a while back but I believe they have
Martin> been fixed.

Oh, and there are several arrays out there that have allocation units
that are not powers of two. *sigh*

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> That is the benefit.  And when coupled with the new default
Mike> max_discard of 64K (pending change from Jens for 4.3) this 2GB
Mike> upper limit really isn't such a big deal.  Unless I'm missing
Mike> something...

2GB is fine for current SATA due to the stupid range descriptors. But
there are some changes in the pipeline to support descriptors with
bigger ranges so that will change.

However, we currently do 4GB discards on SCSI so the proposed cap will
cut that in half. I'm OK with that as an interim solution. though.

But I'm a bit concerned about what might be lurking in dm thinp if you
trip over partial blocks like in Ming's example...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Kent" == Kent Overstreet  writes:

Kent> This kind of logic really doesn't belong in dm

Well it does in this case since the thinp personality actually
provisions and unprovisions space.

But there is a difference between what dm thinp acts on for its own
internal provisioning purposes and what it passes down the stack. I am
really against dropping information anywhere along the path. We don't
round off read/write requests either.

The queue limits were meant as hints to mkfs.* so that on-disk data
structures could be laid out in an aligned and storage friendly way. I
never intended for the hints to affect runtime behavior.

Kent> IMO though it belongs in the driver - if a discard needs to be
Kent> dropped because it's too small and the hardware can't do it, that
Kent> should be the driver's responsibility.

I agree except I really don't want to lop off anything unless the device
locks up if we send it partial blocks. There was an array that had
problems a while back but I believe they have been fixed.

The fundamental premise should be that we pass as comprehensive
information as we can. And the device can then decide to ignore all or
parts of the request. That's fundamentally how things work at the
protocol level in both SCSI and SATA. I don't see any reason why the
Linux I/O stack should behave in a different manner.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Mike Snitzer
On Tue, Aug 11 2015 at  1:36pm -0400,
Martin K. Petersen  wrote:

> > "Mike" == Mike Snitzer  writes:
> 
> Mike> DM-thinp processes discards internally before it passes them down
> Mike> (if configured to do so).  If a discard is smaller than the
> Mike> granularity of a thinp block (whose size is configurable) or if
> Mike> the start and end of the discard's extent is misaligned (relative
> Mike> to the thinp blocks mapped to the logical extent) then the discard
> Mike> won't actually discard partial thinp blocks.
> 
> That's fine. You can throw away anything you don't like as long as
> discard_zeroes_data=0.

Correct, dm-thinp sets discard_zeroes_data=0

> But I don't understand why having an artificial cap at 2GB fixes
> things. Other than making it less likely for you to receive a runt by
> virtue of being aligned to a power of two.

That is the benefit.  And when coupled with the new default max_discard
of 64K (pending change from Jens for 4.3) this 2GB upper limit really
isn't such a big deal.  Unless I'm missing something...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> DM-thinp processes discards internally before it passes them down
Mike> (if configured to do so).  If a discard is smaller than the
Mike> granularity of a thinp block (whose size is configurable) or if
Mike> the start and end of the discard's extent is misaligned (relative
Mike> to the thinp blocks mapped to the logical extent) then the discard
Mike> won't actually discard partial thinp blocks.

That's fine. You can throw away anything you don't like as long as
discard_zeroes_data=0.

But I don't understand why having an artificial cap at 2GB fixes
things. Other than making it less likely for you to receive a runt by
virtue of being aligned to a power of two.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Mike Snitzer
On Mon, Aug 10 2015 at 11:38pm -0400,
Kent Overstreet  wrote:

> On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote:
> > On Mon, Aug 10 2015 at 10:00pm -0400,
> > Martin K. Petersen  wrote:
> > 
> > > > "Ming" == Ming Lin  writes:
> > > 
> > > Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()?
> > > 
> > > Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
> > > Ming> 1<<31.
> > > 
> > > I'm not sure why things are not working for dm-thinp. Presumably Kent's
> > > code would split the discard at a granularity boundary so why would that
> > > cause problems for dm?
> > 
> > DM-thinp processes discards internally before it passes them down (if
> > configured to do so).  If a discard is smaller than the granularity of a
> > thinp block (whose size is configurable) or if the start and end of the
> > discard's extent is misaligned (relative to the thinp blocks mapped to
> > the logical extent) then the discard won't actually discard partial
> > thinp blocks.
> 
> This kind of logic really doesn't belong in dm - if it's needed, it really
> belongs in bio_split() (which is supposed to work correctly for discards - so 
> if
> it is needed, then bio_split() needs fixing...)

DM thinp does advertise discard_granularity that reflects the thinp blocksize.
blk_queue_split() does look like it'd do the right thing.  But the
splitting that DM thinp is doing is a long standing implementation (in
DM core) that will need to be carefully reviewed/rewritten.  We can
tackle it after all this late splitting code lands.

> IMO though it belongs in the driver - if a discard needs to be dropped because
> it's too small and the hardware can't do it, that should be the driver's
> responsibility.

This isn't about the hardware's limits.  This is about the intermediate
remapping/stacking driver's own space management hooking off of the
discard bio too.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Ming Lin
On Tue, 2015-08-11 at 14:05 -0400, Martin K. Petersen wrote:
  Martin == Martin K Petersen martin.peter...@oracle.com writes:
 
 Martin I agree except I really don't want to lop off anything unless
 Martin the device locks up if we send it partial blocks. There was an
 Martin array that had problems a while back but I believe they have
 Martin been fixed.
 
 Oh, and there are several arrays out there that have allocation units
 that are not powers of two. *sigh*

Do you still agree we cap discard to 2G as an interim solution?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=block-generic-reqid=39d2f5f

If OK, I'll send out new version asking Jens to apply.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Mike Snitzer
On Mon, Aug 10 2015 at 11:38pm -0400,
Kent Overstreet kent.overstr...@gmail.com wrote:

 On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote:
  On Mon, Aug 10 2015 at 10:00pm -0400,
  Martin K. Petersen martin.peter...@oracle.com wrote:
  
Ming == Ming Lin m...@kernel.org writes:
   
   Ming Did you mean still use (UINT_MAX  9) in blkdev_issue_discard()?
   
   Ming But that doesn't work for dm-thinp. See Kent's suggestion to use
   Ming 131.
   
   I'm not sure why things are not working for dm-thinp. Presumably Kent's
   code would split the discard at a granularity boundary so why would that
   cause problems for dm?
  
  DM-thinp processes discards internally before it passes them down (if
  configured to do so).  If a discard is smaller than the granularity of a
  thinp block (whose size is configurable) or if the start and end of the
  discard's extent is misaligned (relative to the thinp blocks mapped to
  the logical extent) then the discard won't actually discard partial
  thinp blocks.
 
 This kind of logic really doesn't belong in dm - if it's needed, it really
 belongs in bio_split() (which is supposed to work correctly for discards - so 
 if
 it is needed, then bio_split() needs fixing...)

DM thinp does advertise discard_granularity that reflects the thinp blocksize.
blk_queue_split() does look like it'd do the right thing.  But the
splitting that DM thinp is doing is a long standing implementation (in
DM core) that will need to be carefully reviewed/rewritten.  We can
tackle it after all this late splitting code lands.

 IMO though it belongs in the driver - if a discard needs to be dropped because
 it's too small and the hardware can't do it, that should be the driver's
 responsibility.

This isn't about the hardware's limits.  This is about the intermediate
remapping/stacking driver's own space management hooking off of the
discard bio too.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Mike Snitzer
On Tue, Aug 11 2015 at  1:36pm -0400,
Martin K. Petersen martin.peter...@oracle.com wrote:

  Mike == Mike Snitzer snit...@redhat.com writes:
 
 Mike DM-thinp processes discards internally before it passes them down
 Mike (if configured to do so).  If a discard is smaller than the
 Mike granularity of a thinp block (whose size is configurable) or if
 Mike the start and end of the discard's extent is misaligned (relative
 Mike to the thinp blocks mapped to the logical extent) then the discard
 Mike won't actually discard partial thinp blocks.
 
 That's fine. You can throw away anything you don't like as long as
 discard_zeroes_data=0.

Correct, dm-thinp sets discard_zeroes_data=0

 But I don't understand why having an artificial cap at 2GB fixes
 things. Other than making it less likely for you to receive a runt by
 virtue of being aligned to a power of two.

That is the benefit.  And when coupled with the new default max_discard
of 64K (pending change from Jens for 4.3) this 2GB upper limit really
isn't such a big deal.  Unless I'm missing something...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Martin == Martin K Petersen martin.peter...@oracle.com writes:

Martin I agree except I really don't want to lop off anything unless
Martin the device locks up if we send it partial blocks. There was an
Martin array that had problems a while back but I believe they have
Martin been fixed.

Oh, and there are several arrays out there that have allocation units
that are not powers of two. *sigh*

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike That is the benefit.  And when coupled with the new default
Mike max_discard of 64K (pending change from Jens for 4.3) this 2GB
Mike upper limit really isn't such a big deal.  Unless I'm missing
Mike something...

2GB is fine for current SATA due to the stupid range descriptors. But
there are some changes in the pipeline to support descriptors with
bigger ranges so that will change.

However, we currently do 4GB discards on SCSI so the proposed cap will
cut that in half. I'm OK with that as an interim solution. though.

But I'm a bit concerned about what might be lurking in dm thinp if you
trip over partial blocks like in Ming's example...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike DM-thinp processes discards internally before it passes them down
Mike (if configured to do so).  If a discard is smaller than the
Mike granularity of a thinp block (whose size is configurable) or if
Mike the start and end of the discard's extent is misaligned (relative
Mike to the thinp blocks mapped to the logical extent) then the discard
Mike won't actually discard partial thinp blocks.

That's fine. You can throw away anything you don't like as long as
discard_zeroes_data=0.

But I don't understand why having an artificial cap at 2GB fixes
things. Other than making it less likely for you to receive a runt by
virtue of being aligned to a power of two.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Kent == Kent Overstreet kent.overstr...@gmail.com writes:

Kent This kind of logic really doesn't belong in dm

Well it does in this case since the thinp personality actually
provisions and unprovisions space.

But there is a difference between what dm thinp acts on for its own
internal provisioning purposes and what it passes down the stack. I am
really against dropping information anywhere along the path. We don't
round off read/write requests either.

The queue limits were meant as hints to mkfs.* so that on-disk data
structures could be laid out in an aligned and storage friendly way. I
never intended for the hints to affect runtime behavior.

Kent IMO though it belongs in the driver - if a discard needs to be
Kent dropped because it's too small and the hardware can't do it, that
Kent should be the driver's responsibility.

I agree except I really don't want to lop off anything unless the device
locks up if we send it partial blocks. There was an array that had
problems a while back but I believe they have been fixed.

The fundamental premise should be that we pass as comprehensive
information as we can. And the device can then decide to ignore all or
parts of the request. That's fundamentally how things work at the
protocol level in both SCSI and SATA. I don't see any reason why the
Linux I/O stack should behave in a different manner.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Ming == Ming Lin m...@kernel.org writes:

Ming Do you still agree we cap discard to 2G as an interim solution?

I can live with the 2G cap for 4.3 but would like to see it fixed
properly in 4.4.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Ming Lin
On Tue, 2015-08-11 at 20:24 -0400, Martin K. Petersen wrote:
  Ming == Ming Lin m...@kernel.org writes:
 
 Ming Do you still agree we cap discard to 2G as an interim solution?
 
 I can live with the 2G cap for 4.3 but would like to see it fixed
 properly in 4.4.

Sure. I'd like to work on the fix.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Kent Overstreet
On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote:
> On Mon, Aug 10 2015 at 10:00pm -0400,
> Martin K. Petersen  wrote:
> 
> > > "Ming" == Ming Lin  writes:
> > 
> > Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()?
> > 
> > Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
> > Ming> 1<<31.
> > 
> > I'm not sure why things are not working for dm-thinp. Presumably Kent's
> > code would split the discard at a granularity boundary so why would that
> > cause problems for dm?
> 
> DM-thinp processes discards internally before it passes them down (if
> configured to do so).  If a discard is smaller than the granularity of a
> thinp block (whose size is configurable) or if the start and end of the
> discard's extent is misaligned (relative to the thinp blocks mapped to
> the logical extent) then the discard won't actually discard partial
> thinp blocks.

This kind of logic really doesn't belong in dm - if it's needed, it really
belongs in bio_split() (which is supposed to work correctly for discards - so if
it is needed, then bio_split() needs fixing...)

IMO though it belongs in the driver - if a discard needs to be dropped because
it's too small and the hardware can't do it, that should be the driver's
responsibility.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Mon, Aug 10 2015 at 10:00pm -0400,
Martin K. Petersen  wrote:

> > "Ming" == Ming Lin  writes:
> 
> Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()?
> 
> Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
> Ming> 1<<31.
> 
> I'm not sure why things are not working for dm-thinp. Presumably Kent's
> code would split the discard at a granularity boundary so why would that
> cause problems for dm?

DM-thinp processes discards internally before it passes them down (if
configured to do so).  If a discard is smaller than the granularity of a
thinp block (whose size is configurable) or if the start and end of the
discard's extent is misaligned (relative to the thinp blocks mapped to
the logical extent) then the discard won't actually discard partial
thinp blocks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
> "Ming" == Ming Lin  writes:

Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()?

Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use
Ming> 1<<31.

I'm not sure why things are not working for dm-thinp. Presumably Kent's
code would split the discard at a granularity boundary so why would that
cause problems for dm?

In looking at this I just found out that we'll corrupt data on certain
SCSI configs with the granularity enforcement in place. I'll have to
conjure up a fix for that...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, Aug 10, 2015 at 11:13 AM, Mike Snitzer  wrote:
> On Mon, Aug 10 2015 at 12:14pm -0400,
> Ming Lin  wrote:
>
>> On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
>> >
>> > Aside from that, I'm in favor of seeing this late bio splitting patchset
>> > finally land upstream (hopefully in time for the 4.3 merge, Jens?):
>> >
>> > Acked-by: Mike Snitzer 
>>
>> Thanks!
>>
>> May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
>> also?
>
> Sure, but please fold in the removal of dm.c comments I made in this
> merge commit:
>
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting=d6df875bb65ef1ee10c91cf09cb58d009286321f

Rebased on top of latest Linus tree(4.2-rc6+).

https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/diff/drivers/md/dm.c?h=block-generic-req=3dd8509a3f05152cca83bc41c37a8ba3e9119736

>
> thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 12:22 -0400, Martin K. Petersen wrote:
> > "Mike" == Mike Snitzer  writes:
> 
> Mike> Shouldn't we also be using MAX_BIO_SECTORS in
> Mike> blkdev_issue_write_same (instead of UINT_MAX >> 9)?
> 
> The granularity of WRITE SAME is always 1 logical block and there is no
> reason to round it down to the next power of two.
> 
> +/*
> + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> + * it is of the proper granularity as long as the granularity is a power
> + * of two.
> + */
> +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
> +
> 
> That's fine for SATA since we're already capping at 2TB minus change.
> But it means we'll be capping unnecessarily on SCSI. And larger range
> counts are impending in SATA as well.
> 
> So this goes back to my original comment: The only place there can be a
> discard granularity is for SCSI UNMAP. And consequently, we should only
> enforce alignment and granularity when that code path is taken in sd.c.
> 
> I'm OK with Ming's patch series in general. Let's leave the discard cap
> at UINT_MAX and I'll twiddle the rest in SCSI.

Just to make sure I didn't misunderstand it.

Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()?

req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
instead of:
req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);

But that doesn't work for dm-thinp. See Kent's suggestion to use 1<<31.
https://www.redhat.com/archives/dm-devel/2015-August/msg00053.html

> 
> Reviewed-by: Martin K. Petersen 
> 

Thanks!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Mon, Aug 10 2015 at 12:14pm -0400,
Ming Lin  wrote:

> On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
> > 
> > Aside from that, I'm in favor of seeing this late bio splitting patchset
> > finally land upstream (hopefully in time for the 4.3 merge, Jens?):
> > 
> > Acked-by: Mike Snitzer 
> 
> Thanks!
> 
> May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
> also?

Sure, but please fold in the removal of dm.c comments I made in this
merge commit:

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting=d6df875bb65ef1ee10c91cf09cb58d009286321f

thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
> "Ming" == Ming Lin  writes:

Ming,

Ming> I also prefer using MAX_BIO_SECTORS.  Otherwise, we may have non
Ming> page size aligned splits.

This does not matter for write same and discard since there is only a
single logical block of payload. Also, given limitations in SATA we're
always issuing 2GB-32KB sized discards. Rounding those down to an even
1GB would impact performance.

I am sympathetic to wanting to issue I/Os that are aligned to powers of
two. But for most devices this matters little since the additional cost
is limited to misaligned head and tail blocks.

One thing that may be worth considering is switching bi_size from bytes
to blocks for REQ_FS. That would give us some headroom without
increasing bi_size beyond 32 bits. But I'm not entirely sure it's worth
the pain.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> Shouldn't we also be using MAX_BIO_SECTORS in
Mike> blkdev_issue_write_same (instead of UINT_MAX >> 9)?

The granularity of WRITE SAME is always 1 logical block and there is no
reason to round it down to the next power of two.

+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
+

That's fine for SATA since we're already capping at 2TB minus change.
But it means we'll be capping unnecessarily on SCSI. And larger range
counts are impending in SATA as well.

So this goes back to my original comment: The only place there can be a
discard granularity is for SCSI UNMAP. And consequently, we should only
enforce alignment and granularity when that code path is taken in sd.c.

I'm OK with Ming's patch series in general. Let's leave the discard cap
at UINT_MAX and I'll twiddle the rest in SCSI.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 09:14 -0700, Ming Lin wrote:
> On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
> > On Sun, Aug 09 2015 at  3:18am -0400,
> > Ming Lin  wrote:
> > 
> > > On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> > > > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > > > > Will change it to MAX_BIO_SECTORS.
> > > > > May I add your ACK?
> > > > 
> > > > Yes, please go ahead.
> > > 
> > > Thanks. I'll send a new version of the series once device-mapper guy
> > > acks.
> > > 
> > > Hi Mike,
> > > 
> > > I have updated my tree. Could you pull and re-test?
> > > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
> > > 
> > > The 2 thin-provisioning tests passed.
> > 
> > I've merged your latest branch with my dm-4.3 branch, I had one conflict
> > in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
> > surprise).  I've published the result here:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting
> > 
> > It passes the device-mapper-test-suite's 'thin-provisioning' tests.
> > 
> > > Hope I can have your ACK soon.
> > 
> > Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
> > (instead of UINT_MAX >> 9)?
> 
> I also prefer using MAX_BIO_SECTORS.
> Otherwise, we may have non page size aligned splits.
> 
> Say, write_same 8G.
> 
> Using UINT_MAX >> 9, we'll have 2 sector aligned splits in
> blkdev_issue_write_same():
> 0 - (4G - 512 - 1)
> (4G - 512, 8G -1)

Actually, 3 sector aligned splits.

0 - (4G-512-1)
(4G-512), (8G-512-1)
(8G-512), (8G-1)

> 
> This looks weired.
> 
> Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits:
> 0 - (2G -1)
> 2G - (4G - 1)
> 4G - (6G - 1)
> 6G - (8G - 1)
> 
> I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection.
> 
> > 
> > Aside from that, I'm in favor of seeing this late bio splitting patchset
> > finally land upstream (hopefully in time for the 4.3 merge, Jens?):
> > 
> > Acked-by: Mike Snitzer 
> 
> Thanks!
> 
> May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
> also?
> 
> > 
> > p.s. I'll be working with Joe Thornber on optimizing DM (particularly
> > dm-thinp and dm-cache) once this patchset is included upstream.  You'll
> > see I've already added a couple WIP dm-thinp patches ontop.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
> On Sun, Aug 09 2015 at  3:18am -0400,
> Ming Lin  wrote:
> 
> > On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> > > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > > > Will change it to MAX_BIO_SECTORS.
> > > > May I add your ACK?
> > > 
> > > Yes, please go ahead.
> > 
> > Thanks. I'll send a new version of the series once device-mapper guy
> > acks.
> > 
> > Hi Mike,
> > 
> > I have updated my tree. Could you pull and re-test?
> > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
> > 
> > The 2 thin-provisioning tests passed.
> 
> I've merged your latest branch with my dm-4.3 branch, I had one conflict
> in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
> surprise).  I've published the result here:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting
> 
> It passes the device-mapper-test-suite's 'thin-provisioning' tests.
> 
> > Hope I can have your ACK soon.
> 
> Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
> (instead of UINT_MAX >> 9)?

I also prefer using MAX_BIO_SECTORS.
Otherwise, we may have non page size aligned splits.

Say, write_same 8G.

Using UINT_MAX >> 9, we'll have 2 sector aligned splits in
blkdev_issue_write_same():
0 - (4G - 512 - 1)
(4G - 512, 8G -1)

This looks weired.

Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits:
0 - (2G -1)
2G - (4G - 1)
4G - (6G - 1)
6G - (8G - 1)

I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection.

> 
> Aside from that, I'm in favor of seeing this late bio splitting patchset
> finally land upstream (hopefully in time for the 4.3 merge, Jens?):
> 
> Acked-by: Mike Snitzer 

Thanks!

May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely"
also?

> 
> p.s. I'll be working with Joe Thornber on optimizing DM (particularly
> dm-thinp and dm-cache) once this patchset is included upstream.  You'll
> see I've already added a couple WIP dm-thinp patches ontop.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Sun, Aug 09 2015 at  3:18am -0400,
Ming Lin  wrote:

> On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > > Will change it to MAX_BIO_SECTORS.
> > > May I add your ACK?
> > 
> > Yes, please go ahead.
> 
> Thanks. I'll send a new version of the series once device-mapper guy
> acks.
> 
> Hi Mike,
> 
> I have updated my tree. Could you pull and re-test?
> https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
> 
> The 2 thin-provisioning tests passed.

I've merged your latest branch with my dm-4.3 branch, I had one conflict
in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
surprise).  I've published the result here:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting

It passes the device-mapper-test-suite's 'thin-provisioning' tests.

> Hope I can have your ACK soon.

Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
(instead of UINT_MAX >> 9)?

Aside from that, I'm in favor of seeing this late bio splitting patchset
finally land upstream (hopefully in time for the 4.3 merge, Jens?):

Acked-by: Mike Snitzer 

p.s. I'll be working with Joe Thornber on optimizing DM (particularly
dm-thinp and dm-cache) once this patchset is included upstream.  You'll
see I've already added a couple WIP dm-thinp patches ontop.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 09:14 -0700, Ming Lin wrote:
 On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
  On Sun, Aug 09 2015 at  3:18am -0400,
  Ming Lin m...@kernel.org wrote:
  
   On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
 Will change it to MAX_BIO_SECTORS.
 May I add your ACK?

Yes, please go ahead.
   
   Thanks. I'll send a new version of the series once device-mapper guy
   acks.
   
   Hi Mike,
   
   I have updated my tree. Could you pull and re-test?
   https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
   
   The 2 thin-provisioning tests passed.
  
  I've merged your latest branch with my dm-4.3 branch, I had one conflict
  in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
  surprise).  I've published the result here:
  http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting
  
  It passes the device-mapper-test-suite's 'thin-provisioning' tests.
  
   Hope I can have your ACK soon.
  
  Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
  (instead of UINT_MAX  9)?
 
 I also prefer using MAX_BIO_SECTORS.
 Otherwise, we may have non page size aligned splits.
 
 Say, write_same 8G.
 
 Using UINT_MAX  9, we'll have 2 sector aligned splits in
 blkdev_issue_write_same():
 0 - (4G - 512 - 1)
 (4G - 512, 8G -1)

Actually, 3 sector aligned splits.

0 - (4G-512-1)
(4G-512), (8G-512-1)
(8G-512), (8G-1)

 
 This looks weired.
 
 Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits:
 0 - (2G -1)
 2G - (4G - 1)
 4G - (6G - 1)
 6G - (8G - 1)
 
 I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection.
 
  
  Aside from that, I'm in favor of seeing this late bio splitting patchset
  finally land upstream (hopefully in time for the 4.3 merge, Jens?):
  
  Acked-by: Mike Snitzer snit...@redhat.com
 
 Thanks!
 
 May I add your Ack to PATCH: block: kill merge_bvec_fn() completely
 also?
 
  
  p.s. I'll be working with Joe Thornber on optimizing DM (particularly
  dm-thinp and dm-cache) once this patchset is included upstream.  You'll
  see I've already added a couple WIP dm-thinp patches ontop.
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Sun, Aug 09 2015 at  3:18am -0400,
Ming Lin m...@kernel.org wrote:

 On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
  On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
   Will change it to MAX_BIO_SECTORS.
   May I add your ACK?
  
  Yes, please go ahead.
 
 Thanks. I'll send a new version of the series once device-mapper guy
 acks.
 
 Hi Mike,
 
 I have updated my tree. Could you pull and re-test?
 https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
 
 The 2 thin-provisioning tests passed.

I've merged your latest branch with my dm-4.3 branch, I had one conflict
in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
surprise).  I've published the result here:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting

It passes the device-mapper-test-suite's 'thin-provisioning' tests.

 Hope I can have your ACK soon.

Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
(instead of UINT_MAX  9)?

Aside from that, I'm in favor of seeing this late bio splitting patchset
finally land upstream (hopefully in time for the 4.3 merge, Jens?):

Acked-by: Mike Snitzer snit...@redhat.com

p.s. I'll be working with Joe Thornber on optimizing DM (particularly
dm-thinp and dm-cache) once this patchset is included upstream.  You'll
see I've already added a couple WIP dm-thinp patches ontop.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
 On Sun, Aug 09 2015 at  3:18am -0400,
 Ming Lin m...@kernel.org wrote:
 
  On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
   On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
Will change it to MAX_BIO_SECTORS.
May I add your ACK?
   
   Yes, please go ahead.
  
  Thanks. I'll send a new version of the series once device-mapper guy
  acks.
  
  Hi Mike,
  
  I have updated my tree. Could you pull and re-test?
  https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req
  
  The 2 thin-provisioning tests passed.
 
 I've merged your latest branch with my dm-4.3 branch, I had one conflict
 in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no
 surprise).  I've published the result here:
 http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting
 
 It passes the device-mapper-test-suite's 'thin-provisioning' tests.
 
  Hope I can have your ACK soon.
 
 Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same
 (instead of UINT_MAX  9)?

I also prefer using MAX_BIO_SECTORS.
Otherwise, we may have non page size aligned splits.

Say, write_same 8G.

Using UINT_MAX  9, we'll have 2 sector aligned splits in
blkdev_issue_write_same():
0 - (4G - 512 - 1)
(4G - 512, 8G -1)

This looks weired.

Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits:
0 - (2G -1)
2G - (4G - 1)
4G - (6G - 1)
6G - (8G - 1)

I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection.

 
 Aside from that, I'm in favor of seeing this late bio splitting patchset
 finally land upstream (hopefully in time for the 4.3 merge, Jens?):
 
 Acked-by: Mike Snitzer snit...@redhat.com

Thanks!

May I add your Ack to PATCH: block: kill merge_bvec_fn() completely
also?

 
 p.s. I'll be working with Joe Thornber on optimizing DM (particularly
 dm-thinp and dm-cache) once this patchset is included upstream.  You'll
 see I've already added a couple WIP dm-thinp patches ontop.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike Shouldn't we also be using MAX_BIO_SECTORS in
Mike blkdev_issue_write_same (instead of UINT_MAX  9)?

The granularity of WRITE SAME is always 1 logical block and there is no
reason to round it down to the next power of two.

+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U  31)  9)
+

That's fine for SATA since we're already capping at 2TB minus change.
But it means we'll be capping unnecessarily on SCSI. And larger range
counts are impending in SATA as well.

So this goes back to my original comment: The only place there can be a
discard granularity is for SCSI UNMAP. And consequently, we should only
enforce alignment and granularity when that code path is taken in sd.c.

I'm OK with Ming's patch series in general. Let's leave the discard cap
at UINT_MAX and I'll twiddle the rest in SCSI.

Reviewed-by: Martin K. Petersen martin.peter...@oracle.com

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Mon, Aug 10 2015 at 12:14pm -0400,
Ming Lin m...@kernel.org wrote:

 On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
  
  Aside from that, I'm in favor of seeing this late bio splitting patchset
  finally land upstream (hopefully in time for the 4.3 merge, Jens?):
  
  Acked-by: Mike Snitzer snit...@redhat.com
 
 Thanks!
 
 May I add your Ack to PATCH: block: kill merge_bvec_fn() completely
 also?

Sure, but please fold in the removal of dm.c comments I made in this
merge commit:

http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splittingid=d6df875bb65ef1ee10c91cf09cb58d009286321f

thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, 2015-08-10 at 12:22 -0400, Martin K. Petersen wrote:
  Mike == Mike Snitzer snit...@redhat.com writes:
 
 Mike Shouldn't we also be using MAX_BIO_SECTORS in
 Mike blkdev_issue_write_same (instead of UINT_MAX  9)?
 
 The granularity of WRITE SAME is always 1 logical block and there is no
 reason to round it down to the next power of two.
 
 +/*
 + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
 + * it is of the proper granularity as long as the granularity is a power
 + * of two.
 + */
 +#define MAX_DISCARD_SECTORS ((1U  31)  9)
 +
 
 That's fine for SATA since we're already capping at 2TB minus change.
 But it means we'll be capping unnecessarily on SCSI. And larger range
 counts are impending in SATA as well.
 
 So this goes back to my original comment: The only place there can be a
 discard granularity is for SCSI UNMAP. And consequently, we should only
 enforce alignment and granularity when that code path is taken in sd.c.
 
 I'm OK with Ming's patch series in general. Let's leave the discard cap
 at UINT_MAX and I'll twiddle the rest in SCSI.

Just to make sure I didn't misunderstand it.

Did you mean still use (UINT_MAX  9) in blkdev_issue_discard()?

req_sects = min_t(sector_t, nr_sects, UINT_MAX  9);
instead of:
req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);

But that doesn't work for dm-thinp. See Kent's suggestion to use 131.
https://www.redhat.com/archives/dm-devel/2015-August/msg00053.html

 
 Reviewed-by: Martin K. Petersen martin.peter...@oracle.com
 

Thanks!


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
 Ming == Ming Lin m...@kernel.org writes:

Ming,

Ming I also prefer using MAX_BIO_SECTORS.  Otherwise, we may have non
Ming page size aligned splits.

This does not matter for write same and discard since there is only a
single logical block of payload. Also, given limitations in SATA we're
always issuing 2GB-32KB sized discards. Rounding those down to an even
1GB would impact performance.

I am sympathetic to wanting to issue I/Os that are aligned to powers of
two. But for most devices this matters little since the additional cost
is limited to misaligned head and tail blocks.

One thing that may be worth considering is switching bi_size from bytes
to blocks for REQ_FS. That would give us some headroom without
increasing bi_size beyond 32 bits. But I'm not entirely sure it's worth
the pain.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Ming Lin
On Mon, Aug 10, 2015 at 11:13 AM, Mike Snitzer snit...@redhat.com wrote:
 On Mon, Aug 10 2015 at 12:14pm -0400,
 Ming Lin m...@kernel.org wrote:

 On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote:
 
  Aside from that, I'm in favor of seeing this late bio splitting patchset
  finally land upstream (hopefully in time for the 4.3 merge, Jens?):
 
  Acked-by: Mike Snitzer snit...@redhat.com

 Thanks!

 May I add your Ack to PATCH: block: kill merge_bvec_fn() completely
 also?

 Sure, but please fold in the removal of dm.c comments I made in this
 merge commit:

 http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splittingid=d6df875bb65ef1ee10c91cf09cb58d009286321f

Rebased on top of latest Linus tree(4.2-rc6+).

https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/diff/drivers/md/dm.c?h=block-generic-reqid=3dd8509a3f05152cca83bc41c37a8ba3e9119736


 thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Kent Overstreet
On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote:
 On Mon, Aug 10 2015 at 10:00pm -0400,
 Martin K. Petersen martin.peter...@oracle.com wrote:
 
   Ming == Ming Lin m...@kernel.org writes:
  
  Ming Did you mean still use (UINT_MAX  9) in blkdev_issue_discard()?
  
  Ming But that doesn't work for dm-thinp. See Kent's suggestion to use
  Ming 131.
  
  I'm not sure why things are not working for dm-thinp. Presumably Kent's
  code would split the discard at a granularity boundary so why would that
  cause problems for dm?
 
 DM-thinp processes discards internally before it passes them down (if
 configured to do so).  If a discard is smaller than the granularity of a
 thinp block (whose size is configurable) or if the start and end of the
 discard's extent is misaligned (relative to the thinp blocks mapped to
 the logical extent) then the discard won't actually discard partial
 thinp blocks.

This kind of logic really doesn't belong in dm - if it's needed, it really
belongs in bio_split() (which is supposed to work correctly for discards - so if
it is needed, then bio_split() needs fixing...)

IMO though it belongs in the driver - if a discard needs to be dropped because
it's too small and the hardware can't do it, that should be the driver's
responsibility.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Martin K. Petersen
 Ming == Ming Lin m...@kernel.org writes:

Ming Did you mean still use (UINT_MAX  9) in blkdev_issue_discard()?

Ming But that doesn't work for dm-thinp. See Kent's suggestion to use
Ming 131.

I'm not sure why things are not working for dm-thinp. Presumably Kent's
code would split the discard at a granularity boundary so why would that
cause problems for dm?

In looking at this I just found out that we'll corrupt data on certain
SCSI configs with the granularity enforcement in place. I'll have to
conjure up a fix for that...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-10 Thread Mike Snitzer
On Mon, Aug 10 2015 at 10:00pm -0400,
Martin K. Petersen martin.peter...@oracle.com wrote:

  Ming == Ming Lin m...@kernel.org writes:
 
 Ming Did you mean still use (UINT_MAX  9) in blkdev_issue_discard()?
 
 Ming But that doesn't work for dm-thinp. See Kent's suggestion to use
 Ming 131.
 
 I'm not sure why things are not working for dm-thinp. Presumably Kent's
 code would split the discard at a granularity boundary so why would that
 cause problems for dm?

DM-thinp processes discards internally before it passes them down (if
configured to do so).  If a discard is smaller than the granularity of a
thinp block (whose size is configurable) or if the start and end of the
discard's extent is misaligned (relative to the thinp blocks mapped to
the logical extent) then the discard won't actually discard partial
thinp blocks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > Will change it to MAX_BIO_SECTORS.
> > May I add your ACK?
> 
> Yes, please go ahead.

Thanks. I'll send a new version of the series once device-mapper guy
acks.

Hi Mike,

I have updated my tree. Could you pull and re-test?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

The 2 thin-provisioning tests passed.
Hope I can have your ACK soon.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> Will change it to MAX_BIO_SECTORS.
> May I add your ACK?

Yes, please go ahead.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> > +/*
> > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> > + * it is of the proper granularity as long as the granularity is a power
> > + * of two.
> > + */
> > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
> 
> Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
> to something like Kent's multipage biovecs we'll actually need it for
> regular read/write bios in addition to discard and write same.
> 
> Except for that the patch looks reasonable to me.

Will change it to MAX_BIO_SECTORS.
May I add your ACK?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> +/*
> + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> + * it is of the proper granularity as long as the granularity is a power
> + * of two.
> + */
> +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)

Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
to something like Kent's multipage biovecs we'll actually need it for
regular read/write bios in addition to discard and write same.

Except for that the patch looks reasonable to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote:
> > "Mike" == Mike Snitzer  writes:
> 
> Mike> This will translate to all intermediate layers that might split
> Mike> discards needing to worry about granularity/alignment too
> Mike> (e.g. how dm-thinp will have to care because it must generate
> Mike> discard mappings with associated bios based on how blocks were
> Mike> mapped to thinp).
> 
> The fundamental issue here is that alignment and granularity should
> never, ever have been enforced at the top of the stack. Horrendous idea
> from the very beginning.
> 
> For the < handful of braindead devices that get confused when you do
> partial or misaligned blocks we should have had a quirk that did any
> range adjusting at the bottom in sd_setup_discard_cmnd().
> 
> There's a reason I turned discard_zeroes_data off for UNMAP!
> 
> Wrt. the range size I don't have a problem with capping at the 32-bit
> bi_size limit. We probably don't want to send commands much bigger than
> that anyway.

How about below?

commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a
Author: Ming Lin 
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we set max discard sectors to (1<<31)>>9 to ensure
it doesn't overflow bi_size and hopefully it is of the proper
granularity as long as the granularity is a power of two.

Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 47 +++
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
+   req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio->bi_iter.bi_sector = sector;

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote:
  Mike == Mike Snitzer snit...@redhat.com writes:
 
 Mike This will translate to all intermediate layers that might split
 Mike discards needing to worry about granularity/alignment too
 Mike (e.g. how dm-thinp will have to care because it must generate
 Mike discard mappings with associated bios based on how blocks were
 Mike mapped to thinp).
 
 The fundamental issue here is that alignment and granularity should
 never, ever have been enforced at the top of the stack. Horrendous idea
 from the very beginning.
 
 For the  handful of braindead devices that get confused when you do
 partial or misaligned blocks we should have had a quirk that did any
 range adjusting at the bottom in sd_setup_discard_cmnd().
 
 There's a reason I turned discard_zeroes_data off for UNMAP!
 
 Wrt. the range size I don't have a problem with capping at the 32-bit
 bi_size limit. We probably don't want to send commands much bigger than
 that anyway.

How about below?

commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a
Author: Ming Lin min...@ssi.samsung.com
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we set max discard sectors to (131)9 to ensure
it doesn't overflow bi_size and hopefully it is of the proper
granularity as long as the granularity is a power of two.

Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-lib.c | 47 +++
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U  31)  9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q-limits.discard_granularity  9, 1U);
-   alignment = (bdev_discard_alignment(bdev)  9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags  BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug(plug);
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
+   req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects  nr_sects 
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio-bi_iter.bi_sector = sector;
 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
 +/*
 + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
 + * it is of the proper granularity as long as the granularity is a power
 + * of two.
 + */
 +#define MAX_DISCARD_SECTORS ((1U  31)  9)

Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
to something like Kent's multipage biovecs we'll actually need it for
regular read/write bios in addition to discard and write same.

Except for that the patch looks reasonable to me.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote:
 On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
  +/*
  + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
  + * it is of the proper granularity as long as the granularity is a power
  + * of two.
  + */
  +#define MAX_DISCARD_SECTORS ((1U  31)  9)
 
 Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
 to something like Kent's multipage biovecs we'll actually need it for
 regular read/write bios in addition to discard and write same.
 
 Except for that the patch looks reasonable to me.

Will change it to MAX_BIO_SECTORS.
May I add your ACK?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
 Will change it to MAX_BIO_SECTORS.
 May I add your ACK?

Yes, please go ahead.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
 On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
  Will change it to MAX_BIO_SECTORS.
  May I add your ACK?
 
 Yes, please go ahead.

Thanks. I'll send a new version of the series once device-mapper guy
acks.

Hi Mike,

I have updated my tree. Could you pull and re-test?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

The 2 thin-provisioning tests passed.
Hope I can have your ACK soon.

Thanks.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> This will translate to all intermediate layers that might split
Mike> discards needing to worry about granularity/alignment too
Mike> (e.g. how dm-thinp will have to care because it must generate
Mike> discard mappings with associated bios based on how blocks were
Mike> mapped to thinp).

The fundamental issue here is that alignment and granularity should
never, ever have been enforced at the top of the stack. Horrendous idea
from the very beginning.

For the < handful of braindead devices that get confused when you do
partial or misaligned blocks we should have had a quirk that did any
range adjusting at the bottom in sd_setup_discard_cmnd().

There's a reason I turned discard_zeroes_data off for UNMAP!

Wrt. the range size I don't have a problem with capping at the 32-bit
bi_size limit. We probably don't want to send commands much bigger than
that anyway.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Christoph Hellwig
On Fri, Aug 07, 2015 at 09:22:26PM -0800, Kent Overstreet wrote:
> Is granularity required to be a power of two? One would hope, but looking at 
> the
> code that doesn't appear to be a requirement... ugh, that's terrible...

If devices have an odd granularity we'll just have to split more than
nessecary.  Let's hope no major hardware does that.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Kent Overstreet
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
> Wouldn't it be easier to move both max_write_same_sectors and
> max_discard sectors to 64 bit (ie to type sector_t) and be done with the
> overflow?
> Seems to me this is far too much coding around self-imposed restrictions...

It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
suppose wouldn't actually increase the size of struct bio (when sector_t is 64
bits), since struct bvec_iter has padding right now...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Hannes Reinecke
On 08/08/2015 01:40 AM, Ming Lin wrote:
> 
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> I'm for solution 3:
>>
>>  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>>a > 4GB bio in blkdev_issue_{discard,write_same}.
> 
> This has problem as I mentioned in solution 1.
> We need to also make sure max discard size is of proper granularity.
> See below example.
> 
>   4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
> 
> dm-thinp block size = default discard granularity = 128 sectors
> 
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
> 
> 1. Only ensure bi_size not overflow
> 
> It doesn't work.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388606], then dm-thinp splits it to 2 bios
> [0, 8388479]
> [8388480, 8388606] ---> this has problem in process_discard_bio(),
> because the discard size(7 sectors) covers 
> less than a block(128 sectors)
> [8388607, 8388607] ---> same problem 
> 
> 2. Ensure bi_size not overflow and max discard size is of proper granularity
> 
> It works.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388479]
> [8388480, 8388607]
> 
> 
> So how about below patch?
> 
> commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
> Author: Ming Lin 
> Date:   Fri Aug 7 15:07:07 2015 -0700
> 
> block: remove split code in blkdev_issue_{discard,write_same}
> 
> The split code in blkdev_issue_{discard,write_same} can go away
> now that any driver that cares does the split. We have to make
> sure bio size doesn't overflow.
> 
> For discard, we ensure max_discard_sectors is of the proper
> granularity. So if discard size > 4G, blkdev_issue_discard() always
> send multiple granularity requests to lower level, except that the
> last one may be not multiple granularity.
> 
> Signed-off-by: Ming Lin 
> ---
>  block/blk-lib.c | 37 +
>  1 file changed, 9 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 7688ee3..e178a07 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   struct request_queue *q = bdev_get_queue(bdev);
>   int type = REQ_WRITE | REQ_DISCARD;
>   unsigned int max_discard_sectors, granularity;
> - int alignment;
>   struct bio_batch bb;
>   struct bio *bio;
>   int ret = 0;
> @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>  
>   /* Zero-sector (unknown) and one-sector granularities are the same.  */
>   granularity = max(q->limits.discard_granularity >> 9, 1U);
> - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
>  
>   /*
> -  * Ensure that max_discard_sectors is of the proper
> -  * granularity, so that requests stay aligned after a split.
> -  */
> - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> +  * Ensure that max_discard_sectors doesn't overflow bi_size and is of
> +  * the proper granularity. So if discard size > 4G, 
> blkdev_issue_discard()
> +  * always split and send multiple granularity requests to lower level,
> +  * except that the last one may be not multiple granularity.
> + */
> + max_discard_sectors = UINT_MAX >> 9;
>   max_discard_sectors -= max_discard_sectors % granularity;
> - if (unlikely(!max_discard_sectors)) {
> - /* Avoid infinite loop below. Being cautious never hurts. */
> - return -EOPNOTSUPP;
> - }
>  
>   if (flags & BLKDEV_DISCARD_SECURE) {
>   if (!blk_queue_secdiscard(q))
> @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   blk_start_plug();
>   while (nr_sects) {
>   unsigned int req_sects;
> - sector_t end_sect, tmp;
> + sector_t end_sect;
>  
>   bio = bio_alloc(gfp_mask, 1);
>   if (!bio) {
> @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   }
>  
>   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
> -
> - /*
> -  * If splitting a request, and the next starting sector would be
> -  * misaligned, stop the discard at the previous aligned sector.
> -  */
>   end_sect = sector + req_sects;
> - tmp = end_sect;
> - if (req_sects < nr_sects &&
> - sector_div(tmp, granularity) != alignment) {
> - end_sect = end_sect - alignment;
> - sector_div(end_sect, granularity);
> - end_sect = end_sect * granularity + alignment;
> - req_sects = end_sect - sector;
> - }
>  
>   bio->bi_iter.bi_sector = sector;
> 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Hannes Reinecke
On 08/08/2015 01:40 AM, Ming Lin wrote:
 
 On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
 I'm for solution 3:

  - keep blk_bio_{discard,write_same}_split, but ensure we never built
a  4GB bio in blkdev_issue_{discard,write_same}.
 
 This has problem as I mentioned in solution 1.
 We need to also make sure max discard size is of proper granularity.
 See below example.
 
   4G: 8388608 sectors
 UINT_MAX: 8388607 sectors
 
 dm-thinp block size = default discard granularity = 128 sectors
 
 blkdev_issue_discard(sector=0, nr_sectors=8388608)
 
 1. Only ensure bi_size not overflow
 
 It doesn't work.
 
 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388606], then dm-thinp splits it to 2 bios
 [0, 8388479]
 [8388480, 8388606] --- this has problem in process_discard_bio(),
 because the discard size(7 sectors) covers 
 less than a block(128 sectors)
 [8388607, 8388607] --- same problem 
 
 2. Ensure bi_size not overflow and max discard size is of proper granularity
 
 It works.
 
 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388479]
 [8388480, 8388607]
 
 
 So how about below patch?
 
 commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
 Author: Ming Lin min...@ssi.samsung.com
 Date:   Fri Aug 7 15:07:07 2015 -0700
 
 block: remove split code in blkdev_issue_{discard,write_same}
 
 The split code in blkdev_issue_{discard,write_same} can go away
 now that any driver that cares does the split. We have to make
 sure bio size doesn't overflow.
 
 For discard, we ensure max_discard_sectors is of the proper
 granularity. So if discard size  4G, blkdev_issue_discard() always
 send multiple granularity requests to lower level, except that the
 last one may be not multiple granularity.
 
 Signed-off-by: Ming Lin min...@ssi.samsung.com
 ---
  block/blk-lib.c | 37 +
  1 file changed, 9 insertions(+), 28 deletions(-)
 
 diff --git a/block/blk-lib.c b/block/blk-lib.c
 index 7688ee3..e178a07 100644
 --- a/block/blk-lib.c
 +++ b/block/blk-lib.c
 @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   struct request_queue *q = bdev_get_queue(bdev);
   int type = REQ_WRITE | REQ_DISCARD;
   unsigned int max_discard_sectors, granularity;
 - int alignment;
   struct bio_batch bb;
   struct bio *bio;
   int ret = 0;
 @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
  
   /* Zero-sector (unknown) and one-sector granularities are the same.  */
   granularity = max(q-limits.discard_granularity  9, 1U);
 - alignment = (bdev_discard_alignment(bdev)  9) % granularity;
  
   /*
 -  * Ensure that max_discard_sectors is of the proper
 -  * granularity, so that requests stay aligned after a split.
 -  */
 - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
 +  * Ensure that max_discard_sectors doesn't overflow bi_size and is of
 +  * the proper granularity. So if discard size  4G, 
 blkdev_issue_discard()
 +  * always split and send multiple granularity requests to lower level,
 +  * except that the last one may be not multiple granularity.
 + */
 + max_discard_sectors = UINT_MAX  9;
   max_discard_sectors -= max_discard_sectors % granularity;
 - if (unlikely(!max_discard_sectors)) {
 - /* Avoid infinite loop below. Being cautious never hurts. */
 - return -EOPNOTSUPP;
 - }
  
   if (flags  BLKDEV_DISCARD_SECURE) {
   if (!blk_queue_secdiscard(q))
 @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   blk_start_plug(plug);
   while (nr_sects) {
   unsigned int req_sects;
 - sector_t end_sect, tmp;
 + sector_t end_sect;
  
   bio = bio_alloc(gfp_mask, 1);
   if (!bio) {
 @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   }
  
   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
 -
 - /*
 -  * If splitting a request, and the next starting sector would be
 -  * misaligned, stop the discard at the previous aligned sector.
 -  */
   end_sect = sector + req_sects;
 - tmp = end_sect;
 - if (req_sects  nr_sects 
 - sector_div(tmp, granularity) != alignment) {
 - end_sect = end_sect - alignment;
 - sector_div(end_sect, granularity);
 - end_sect = end_sect * granularity + alignment;
 - req_sects = end_sect - sector;
 - }
  
   bio-bi_iter.bi_sector = sector;
   bio-bi_end_io = bio_batch_end_io;
 @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Kent Overstreet
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
 Wouldn't it be easier to move both max_write_same_sectors and
 max_discard sectors to 64 bit (ie to type sector_t) and be done with the
 overflow?
 Seems to me this is far too much coding around self-imposed restrictions...

It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I
suppose wouldn't actually increase the size of struct bio (when sector_t is 64
bits), since struct bvec_iter has padding right now...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Christoph Hellwig
On Fri, Aug 07, 2015 at 09:22:26PM -0800, Kent Overstreet wrote:
 Is granularity required to be a power of two? One would hope, but looking at 
 the
 code that doesn't appear to be a requirement... ugh, that's terrible...

If devices have an odd granularity we'll just have to split more than
nessecary.  Let's hope no major hardware does that.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike This will translate to all intermediate layers that might split
Mike discards needing to worry about granularity/alignment too
Mike (e.g. how dm-thinp will have to care because it must generate
Mike discard mappings with associated bios based on how blocks were
Mike mapped to thinp).

The fundamental issue here is that alignment and granularity should
never, ever have been enforced at the top of the stack. Horrendous idea
from the very beginning.

For the  handful of braindead devices that get confused when you do
partial or misaligned blocks we should have had a quirk that did any
range adjusting at the bottom in sd_setup_discard_cmnd().

There's a reason I turned discard_zeroes_data off for UNMAP!

Wrt. the range size I don't have a problem with capping at the 32-bit
bi_size limit. We probably don't want to send commands much bigger than
that anyway.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote:
> On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
> > Ideally we'd get upper layers out of the business of knowing about the queue
> > limits at all - that was the point of the patch series, after all.
> >
> > Instead of using UINT_MAX, would it work to just make the max 1 << 31 
> > sectors?'
> 
> 1 << 31 = 2G bytes = 0x40 sectors.
> 
> Yes, that works as long as it's multiple of granularity.

Is granularity required to be a power of two? One would hope, but looking at the
code that doesn't appear to be a requirement... ugh, that's terrible...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
 wrote:
> On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
>>
>> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> > I'm for solution 3:
>> >
>> >  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>> >a > 4GB bio in blkdev_issue_{discard,write_same}.
>>
>> This has problem as I mentioned in solution 1.
>> We need to also make sure max discard size is of proper granularity.
>> See below example.
>>
>>   4G: 8388608 sectors
>> UINT_MAX: 8388607 sectors
>>
>> dm-thinp block size = default discard granularity = 128 sectors
>>
>> blkdev_issue_discard(sector=0, nr_sectors=8388608)
>>
>> 1. Only ensure bi_size not overflow
>>
>> It doesn't work.
>>
>> [start_sector, end_sector]
>> [0, 8388607]
>> [0, 8388606], then dm-thinp splits it to 2 bios
>> [0, 8388479]
>> [8388480, 8388606] ---> this has problem in process_discard_bio(),
>> because the discard size(7 sectors) covers 
>> less than a block(128 sectors)
>> [8388607, 8388607] ---> same problem
>>
>> 2. Ensure bi_size not overflow and max discard size is of proper granularity
>
> Ideally we'd get upper layers out of the business of knowing about the queue
> limits at all - that was the point of the patch series, after all.
>
> Instead of using UINT_MAX, would it work to just make the max 1 << 31 
> sectors?'

1 << 31 = 2G bytes = 0x40 sectors.

Yes, that works as long as it's multiple of granularity.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
> 
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> > I'm for solution 3:
> > 
> >  - keep blk_bio_{discard,write_same}_split, but ensure we never built
> >a > 4GB bio in blkdev_issue_{discard,write_same}.
> 
> This has problem as I mentioned in solution 1.
> We need to also make sure max discard size is of proper granularity.
> See below example.
> 
>   4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
> 
> dm-thinp block size = default discard granularity = 128 sectors
> 
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
> 
> 1. Only ensure bi_size not overflow
> 
> It doesn't work.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388606], then dm-thinp splits it to 2 bios
> [0, 8388479]
> [8388480, 8388606] ---> this has problem in process_discard_bio(),
> because the discard size(7 sectors) covers 
> less than a block(128 sectors)
> [8388607, 8388607] ---> same problem 
> 
> 2. Ensure bi_size not overflow and max discard size is of proper granularity

Ideally we'd get upper layers out of the business of knowing about the queue
limits at all - that was the point of the patch series, after all.

Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin

On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
> I'm for solution 3:
> 
>  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>a > 4GB bio in blkdev_issue_{discard,write_same}.

This has problem as I mentioned in solution 1.
We need to also make sure max discard size is of proper granularity.
See below example.

  4G: 8388608 sectors
UINT_MAX: 8388607 sectors

dm-thinp block size = default discard granularity = 128 sectors

blkdev_issue_discard(sector=0, nr_sectors=8388608)

1. Only ensure bi_size not overflow

It doesn't work.

[start_sector, end_sector]
[0, 8388607]
[0, 8388606], then dm-thinp splits it to 2 bios
[0, 8388479]
[8388480, 8388606] ---> this has problem in process_discard_bio(),
because the discard size(7 sectors) covers less 
than a block(128 sectors)
[8388607, 8388607] ---> same problem 

2. Ensure bi_size not overflow and max discard size is of proper granularity

It works.

[start_sector, end_sector]
[0, 8388607]
[0, 8388479]
[8388480, 8388607]


So how about below patch?

commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
Author: Ming Lin 
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we ensure max_discard_sectors is of the proper
granularity. So if discard size > 4G, blkdev_issue_discard() always
send multiple granularity requests to lower level, except that the
last one may be not multiple granularity.

Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..e178a07 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
 
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
 
/*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+* Ensure that max_discard_sectors doesn't overflow bi_size and is of
+* the proper granularity. So if discard size > 4G, 
blkdev_issue_discard()
+* always split and send multiple granularity requests to lower level,
+* except that the last one may be not multiple granularity.
+ */
+   max_discard_sectors = UINT_MAX >> 9;
max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
 
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
@@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
}
 
req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
@@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
if (!q)
return -ENXIO;
 
-   

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Christoph Hellwig
On Thu, Aug 06, 2015 at 05:00:04PM -0700, Kent Overstreet wrote:
> Why is it that we don't want to send giant discards? Is it a latency issue?

Yes.  Take a look at the "Configurable max discard size" thread(s).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Christoph Hellwig
I'm for solution 3:

 - keep blk_bio_{discard,write_same}_split, but ensure we never built
   a > 4GB bio in blkdev_issue_{discard,write_same}.

Note that this isn't special casing, we can't build > 4GB bios for
data either, it's just implemented as a side effect right now instead
of checked explicitly.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin

On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
 I'm for solution 3:
 
  - keep blk_bio_{discard,write_same}_split, but ensure we never built
a  4GB bio in blkdev_issue_{discard,write_same}.

This has problem as I mentioned in solution 1.
We need to also make sure max discard size is of proper granularity.
See below example.

  4G: 8388608 sectors
UINT_MAX: 8388607 sectors

dm-thinp block size = default discard granularity = 128 sectors

blkdev_issue_discard(sector=0, nr_sectors=8388608)

1. Only ensure bi_size not overflow

It doesn't work.

[start_sector, end_sector]
[0, 8388607]
[0, 8388606], then dm-thinp splits it to 2 bios
[0, 8388479]
[8388480, 8388606] --- this has problem in process_discard_bio(),
because the discard size(7 sectors) covers less 
than a block(128 sectors)
[8388607, 8388607] --- same problem 

2. Ensure bi_size not overflow and max discard size is of proper granularity

It works.

[start_sector, end_sector]
[0, 8388607]
[0, 8388479]
[8388480, 8388607]


So how about below patch?

commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
Author: Ming Lin min...@ssi.samsung.com
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we ensure max_discard_sectors is of the proper
granularity. So if discard size  4G, blkdev_issue_discard() always
send multiple granularity requests to lower level, except that the
last one may be not multiple granularity.

Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-lib.c | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..e178a07 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
 
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q-limits.discard_granularity  9, 1U);
-   alignment = (bdev_discard_alignment(bdev)  9) % granularity;
 
/*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
+* Ensure that max_discard_sectors doesn't overflow bi_size and is of
+* the proper granularity. So if discard size  4G, 
blkdev_issue_discard()
+* always split and send multiple granularity requests to lower level,
+* except that the last one may be not multiple granularity.
+ */
+   max_discard_sectors = UINT_MAX  9;
max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
 
if (flags  BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
@@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug(plug);
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
}
 
req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects  nr_sects 
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio-bi_iter.bi_sector = sector;
bio-bi_end_io = bio_batch_end_io;
@@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
if (!q)
return 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:
 
 On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
  I'm for solution 3:
  
   - keep blk_bio_{discard,write_same}_split, but ensure we never built
 a  4GB bio in blkdev_issue_{discard,write_same}.
 
 This has problem as I mentioned in solution 1.
 We need to also make sure max discard size is of proper granularity.
 See below example.
 
   4G: 8388608 sectors
 UINT_MAX: 8388607 sectors
 
 dm-thinp block size = default discard granularity = 128 sectors
 
 blkdev_issue_discard(sector=0, nr_sectors=8388608)
 
 1. Only ensure bi_size not overflow
 
 It doesn't work.
 
 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388606], then dm-thinp splits it to 2 bios
 [0, 8388479]
 [8388480, 8388606] --- this has problem in process_discard_bio(),
 because the discard size(7 sectors) covers 
 less than a block(128 sectors)
 [8388607, 8388607] --- same problem 
 
 2. Ensure bi_size not overflow and max discard size is of proper granularity

Ideally we'd get upper layers out of the business of knowing about the queue
limits at all - that was the point of the patch series, after all.

Instead of using UINT_MAX, would it work to just make the max 1  31 sectors?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Ming Lin
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
kent.overstr...@gmail.com wrote:
 On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote:

 On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
  I'm for solution 3:
 
   - keep blk_bio_{discard,write_same}_split, but ensure we never built
 a  4GB bio in blkdev_issue_{discard,write_same}.

 This has problem as I mentioned in solution 1.
 We need to also make sure max discard size is of proper granularity.
 See below example.

   4G: 8388608 sectors
 UINT_MAX: 8388607 sectors

 dm-thinp block size = default discard granularity = 128 sectors

 blkdev_issue_discard(sector=0, nr_sectors=8388608)

 1. Only ensure bi_size not overflow

 It doesn't work.

 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388606], then dm-thinp splits it to 2 bios
 [0, 8388479]
 [8388480, 8388606] --- this has problem in process_discard_bio(),
 because the discard size(7 sectors) covers 
 less than a block(128 sectors)
 [8388607, 8388607] --- same problem

 2. Ensure bi_size not overflow and max discard size is of proper granularity

 Ideally we'd get upper layers out of the business of knowing about the queue
 limits at all - that was the point of the patch series, after all.

 Instead of using UINT_MAX, would it work to just make the max 1  31 
 sectors?'

1  31 = 2G bytes = 0x40 sectors.

Yes, that works as long as it's multiple of granularity.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Kent Overstreet
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote:
 On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet
  Ideally we'd get upper layers out of the business of knowing about the queue
  limits at all - that was the point of the patch series, after all.
 
  Instead of using UINT_MAX, would it work to just make the max 1  31 
  sectors?'
 
 1  31 = 2G bytes = 0x40 sectors.
 
 Yes, that works as long as it's multiple of granularity.

Is granularity required to be a power of two? One would hope, but looking at the
code that doesn't appear to be a requirement... ugh, that's terrible...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Christoph Hellwig
On Thu, Aug 06, 2015 at 05:00:04PM -0700, Kent Overstreet wrote:
 Why is it that we don't want to send giant discards? Is it a latency issue?

Yes.  Take a look at the Configurable max discard size thread(s).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-07 Thread Christoph Hellwig
I'm for solution 3:

 - keep blk_bio_{discard,write_same}_split, but ensure we never built
   a  4GB bio in blkdev_issue_{discard,write_same}.

Note that this isn't special casing, we can't build  4GB bios for
data either, it's just implemented as a side effect right now instead
of checked explicitly.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-06 Thread Kent Overstreet
On Tue, Aug 04, 2015 at 01:36:26PM +0200, Christoph Hellwig wrote:
> On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> > I think the important thing is the late splitting for regular bio.
> > For discard/write_same bio, how about just don't do late splitting?
> 
> I'd hate having to special case them even more.  Especially as the
> discard splitting is nasty and we really don't want to send giant
> discards by default anyway (see Jens' patches to limit discard size
> by default).

Why is it that we don't want to send giant discards? Is it a latency issue?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-06 Thread Kent Overstreet
On Tue, Aug 04, 2015 at 01:36:26PM +0200, Christoph Hellwig wrote:
 On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
  I think the important thing is the late splitting for regular bio.
  For discard/write_same bio, how about just don't do late splitting?
 
 I'd hate having to special case them even more.  Especially as the
 discard splitting is nasty and we really don't want to send giant
 discards by default anyway (see Jens' patches to limit discard size
 by default).

Why is it that we don't want to send giant discards? Is it a latency issue?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-05 Thread Ming Lin
On Tue, 2015-08-04 at 13:36 +0200, Christoph Hellwig wrote:
> On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> > I think the important thing is the late splitting for regular bio.
> > For discard/write_same bio, how about just don't do late splitting?
> 
> I'd hate having to special case them even more.  Especially as the
> discard splitting is nasty and we really don't want to send giant
> discards by default anyway (see Jens' patches to limit discard size
> by default).
> 
> So I'd recommend to keep everything as-is, just make sure we don't
> overflow bi_size.

Did you mean to remove "PATCH 4 block: remove split code in
blkdev_issue_discard" or to keep it?

Which of below 2 solutions you prefer?

- Solution 1

remove splits in blkdev_issue_{discard,write_same} and keep
blk_bio_{discard,write_same}_split

But for blkdev_issue_discard(), it's not enough if only make sure
bi_size not overflow, for example, discard 4G

4G bytes = 8388608 sectors
UINT_MAX = 8388607 sectors

So blkdev_issue_discard() will send 2 discard bios.
First bio: sector 0 .. 8388606
Second bio: sector 8388607 .. 8388607

In this case, the 2 discard tests in device-mapper-test-suite still
fail, probably because the second bio start sector is not aligned with
discard_granularity.

So I have to take into account discard_granularity(assume 32 sectors),
then blkdev_issue_discard() will send 2 discard bios, as

First bio: sector 0 .. 8388575
Second bio: sector 8388576 .. 8388607

In this case, both discard tests passed.

- Solution 2

special case discard/write_same bios(You said you hate it).

That is to keep splits in blkdev_issue_{discard,write_same} and remove
blk_bio_{discard,write_same}_split

I think this is more clean way because blkdev_issue_{discard,write_same}
already make sure we don't overflow bi_size.

And blk_bio_{discard,write_same}_split are actually duplicated with the
splits in blkdev_issue_{discard,write_same}. It's OK to remove it.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-05 Thread Ming Lin
On Tue, 2015-08-04 at 13:36 +0200, Christoph Hellwig wrote:
 On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
  I think the important thing is the late splitting for regular bio.
  For discard/write_same bio, how about just don't do late splitting?
 
 I'd hate having to special case them even more.  Especially as the
 discard splitting is nasty and we really don't want to send giant
 discards by default anyway (see Jens' patches to limit discard size
 by default).
 
 So I'd recommend to keep everything as-is, just make sure we don't
 overflow bi_size.

Did you mean to remove PATCH 4 block: remove split code in
blkdev_issue_discard or to keep it?

Which of below 2 solutions you prefer?

- Solution 1

remove splits in blkdev_issue_{discard,write_same} and keep
blk_bio_{discard,write_same}_split

But for blkdev_issue_discard(), it's not enough if only make sure
bi_size not overflow, for example, discard 4G

4G bytes = 8388608 sectors
UINT_MAX = 8388607 sectors

So blkdev_issue_discard() will send 2 discard bios.
First bio: sector 0 .. 8388606
Second bio: sector 8388607 .. 8388607

In this case, the 2 discard tests in device-mapper-test-suite still
fail, probably because the second bio start sector is not aligned with
discard_granularity.

So I have to take into account discard_granularity(assume 32 sectors),
then blkdev_issue_discard() will send 2 discard bios, as

First bio: sector 0 .. 8388575
Second bio: sector 8388576 .. 8388607

In this case, both discard tests passed.

- Solution 2

special case discard/write_same bios(You said you hate it).

That is to keep splits in blkdev_issue_{discard,write_same} and remove
blk_bio_{discard,write_same}_split

I think this is more clean way because blkdev_issue_{discard,write_same}
already make sure we don't overflow bi_size.

And blk_bio_{discard,write_same}_split are actually duplicated with the
splits in blkdev_issue_{discard,write_same}. It's OK to remove it.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-04 Thread Christoph Hellwig
On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
> I think the important thing is the late splitting for regular bio.
> For discard/write_same bio, how about just don't do late splitting?

I'd hate having to special case them even more.  Especially as the
discard splitting is nasty and we really don't want to send giant
discards by default anyway (see Jens' patches to limit discard size
by default).

So I'd recommend to keep everything as-is, just make sure we don't
overflow bi_size.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-04 Thread Christoph Hellwig
On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote:
 I think the important thing is the late splitting for regular bio.
 For discard/write_same bio, how about just don't do late splitting?

I'd hate having to special case them even more.  Especially as the
discard splitting is nasty and we really don't want to send giant
discards by default anyway (see Jens' patches to limit discard size
by default).

So I'd recommend to keep everything as-is, just make sure we don't
overflow bi_size.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-02 Thread Ming Lin
On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote:
> On Sat, Aug 01 2015 at  2:58am -0400,
> Ming Lin  wrote:
> 
> > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> > > 
> > > OK, once setup, to run the 2 tests in question directly you'd do
> > > something like:
> > > 
> > > dmtest run --suite thin-provisioning -n discard_a_fragmented_device
> > > 
> > > dmtest run --suite thin-provisioning -n 
> > > discard_fully_provisioned_device_benchmark
> > > 
> > > Again, these tests pass without this patchset.
> > 
> > It's caused by patch 4.

Typo. I mean patch 5.

> > When discard size >=4G, the bio->bi_iter.bi_size overflows.
> 
> Thanks for tracking this down!

blkdev_issue_write_same() has same problem.

> 
> > Below is the new patch.
> > 
> > Christoph,
> > Could you also help to review it?
> > 
> > Now we still do "misaligned" check in blkdev_issue_discard().
> > So the same code in blk_bio_discard_split() was removed.
> 
> But I don't agree with this approach.  One of the most meaningful
> benefits of late bio splitting is the upper layers shouldn't _need_ to
> depend on the intermediate devices' queue_limits being stacked properly.
> Your solution to mix discard granularity/alignment checks at the upper
> layer(s) but then split based on max_discard_sectors at the lower layer
> defeats that benefit for discards.
> 
> This will translate to all intermediate layers that might split
> discards needing to worry about granularity/alignment
> too (e.g. how dm-thinp will have to care because it must generate
> discard mappings with associated bios based on how blocks were mapped to
> thinp).

I think the important thing is the late splitting for regular bio.
For discard/write_same bio, how about just don't do late splitting?

That is:
1. remove "PATCH 5: block: remove split code in blkdev_issue_discard"
2. Add below changes to PATCH 1

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1f5dfa0..90b085e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,59 +9,6 @@
 
 #include "blk.h"
 
-static struct bio *blk_bio_discard_split(struct request_queue *q,
-struct bio *bio,
-struct bio_set *bs)
-{
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
-   sector_t tmp;
-   unsigned split_sectors;
-
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
-
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-
-   if (unlikely(!max_discard_sectors)) {
-   /* XXX: warn */
-   return NULL;
-   }
-
-   if (bio_sectors(bio) <= max_discard_sectors)
-   return NULL;
-
-   split_sectors = max_discard_sectors;
-
-   /*
-* If the next starting sector would be misaligned, stop the discard at
-* the previous aligned sector.
-*/
-   alignment = (q->limits.discard_alignment >> 9) % granularity;
-
-   tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
-   tmp = sector_div(tmp, granularity);
-
-   if (split_sectors > tmp)
-   split_sectors -= tmp;
-
-   return bio_split(bio, split_sectors, GFP_NOIO, bs);
-}
-
-static struct bio *blk_bio_write_same_split(struct request_queue *q,
-   struct bio *bio,
-   struct bio_set *bs)
-{
-   if (!q->limits.max_write_same_sectors)
-   return NULL;
-
-   if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
-   return NULL;
-
-   return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
-}
-
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 struct bio *bio,
 struct bio_set *bs)
@@ -129,10 +76,8 @@ void blk_queue_split(struct request_queue *q, struct bio 
**bio,
 {
struct bio *split;
 
-   if ((*bio)->bi_rw & REQ_DISCARD)
-   split = blk_bio_discard_split(q, *bio, bs);
-   else if ((*bio)->bi_rw & REQ_WRITE_SAME)
-   split = blk_bio_write_same_split(q, *bio, bs);
+   if ((*bio)->bi_rw & REQ_DISCARD || (*bio)->bi_rw & REQ_WRITE_SAME)
+   split = NULL;
else
split = blk_bio_segment_split(q, *bio, q->bio_split);
 

> 
> Also, it is unfortunate that IO that doesn't have a payload is being
> artificially split simply because bio->bi_iter.bi_size is 32bits.

Indeed.
Will it be possible to make it 64bits? I guess no.

> 
> Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-02 Thread Ming Lin
On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote:
 On Sat, Aug 01 2015 at  2:58am -0400,
 Ming Lin m...@kernel.org wrote:
 
  On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
   
   OK, once setup, to run the 2 tests in question directly you'd do
   something like:
   
   dmtest run --suite thin-provisioning -n discard_a_fragmented_device
   
   dmtest run --suite thin-provisioning -n 
   discard_fully_provisioned_device_benchmark
   
   Again, these tests pass without this patchset.
  
  It's caused by patch 4.

Typo. I mean patch 5.

  When discard size =4G, the bio-bi_iter.bi_size overflows.
 
 Thanks for tracking this down!

blkdev_issue_write_same() has same problem.

 
  Below is the new patch.
  
  Christoph,
  Could you also help to review it?
  
  Now we still do misaligned check in blkdev_issue_discard().
  So the same code in blk_bio_discard_split() was removed.
 
 But I don't agree with this approach.  One of the most meaningful
 benefits of late bio splitting is the upper layers shouldn't _need_ to
 depend on the intermediate devices' queue_limits being stacked properly.
 Your solution to mix discard granularity/alignment checks at the upper
 layer(s) but then split based on max_discard_sectors at the lower layer
 defeats that benefit for discards.
 
 This will translate to all intermediate layers that might split
 discards needing to worry about granularity/alignment
 too (e.g. how dm-thinp will have to care because it must generate
 discard mappings with associated bios based on how blocks were mapped to
 thinp).

I think the important thing is the late splitting for regular bio.
For discard/write_same bio, how about just don't do late splitting?

That is:
1. remove PATCH 5: block: remove split code in blkdev_issue_discard
2. Add below changes to PATCH 1

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1f5dfa0..90b085e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,59 +9,6 @@
 
 #include blk.h
 
-static struct bio *blk_bio_discard_split(struct request_queue *q,
-struct bio *bio,
-struct bio_set *bs)
-{
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
-   sector_t tmp;
-   unsigned split_sectors;
-
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q-limits.discard_granularity  9, 1U);
-
-   max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-
-   if (unlikely(!max_discard_sectors)) {
-   /* XXX: warn */
-   return NULL;
-   }
-
-   if (bio_sectors(bio) = max_discard_sectors)
-   return NULL;
-
-   split_sectors = max_discard_sectors;
-
-   /*
-* If the next starting sector would be misaligned, stop the discard at
-* the previous aligned sector.
-*/
-   alignment = (q-limits.discard_alignment  9) % granularity;
-
-   tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
-   tmp = sector_div(tmp, granularity);
-
-   if (split_sectors  tmp)
-   split_sectors -= tmp;
-
-   return bio_split(bio, split_sectors, GFP_NOIO, bs);
-}
-
-static struct bio *blk_bio_write_same_split(struct request_queue *q,
-   struct bio *bio,
-   struct bio_set *bs)
-{
-   if (!q-limits.max_write_same_sectors)
-   return NULL;
-
-   if (bio_sectors(bio) = q-limits.max_write_same_sectors)
-   return NULL;
-
-   return bio_split(bio, q-limits.max_write_same_sectors, GFP_NOIO, bs);
-}
-
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 struct bio *bio,
 struct bio_set *bs)
@@ -129,10 +76,8 @@ void blk_queue_split(struct request_queue *q, struct bio 
**bio,
 {
struct bio *split;
 
-   if ((*bio)-bi_rw  REQ_DISCARD)
-   split = blk_bio_discard_split(q, *bio, bs);
-   else if ((*bio)-bi_rw  REQ_WRITE_SAME)
-   split = blk_bio_write_same_split(q, *bio, bs);
+   if ((*bio)-bi_rw  REQ_DISCARD || (*bio)-bi_rw  REQ_WRITE_SAME)
+   split = NULL;
else
split = blk_bio_segment_split(q, *bio, q-bio_split);
 

 
 Also, it is unfortunate that IO that doesn't have a payload is being
 artificially split simply because bio-bi_iter.bi_size is 32bits.

Indeed.
Will it be possible to make it 64bits? I guess no.

 
 Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-01 Thread Mike Snitzer
On Sat, Aug 01 2015 at  2:58am -0400,
Ming Lin  wrote:

> On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> > 
> > OK, once setup, to run the 2 tests in question directly you'd do
> > something like:
> > 
> > dmtest run --suite thin-provisioning -n discard_a_fragmented_device
> > 
> > dmtest run --suite thin-provisioning -n 
> > discard_fully_provisioned_device_benchmark
> > 
> > Again, these tests pass without this patchset.
> 
> It's caused by patch 4.
> When discard size >=4G, the bio->bi_iter.bi_size overflows.

Thanks for tracking this down!

> Below is the new patch.
> 
> Christoph,
> Could you also help to review it?
> 
> Now we still do "misaligned" check in blkdev_issue_discard().
> So the same code in blk_bio_discard_split() was removed.

But I don't agree with this approach.  One of the most meaningful
benefits of late bio splitting is the upper layers shouldn't _need_ to
depend on the intermediate devices' queue_limits being stacked properly.
Your solution to mix discard granularity/alignment checks at the upper
layer(s) but then split based on max_discard_sectors at the lower layer
defeats that benefit for discards.

This will translate to all intermediate layers that might split
discards needing to worry about granularity/alignment
too (e.g. how dm-thinp will have to care because it must generate
discard mappings with associated bios based on how blocks were mapped to
thinp).

Also, it is unfortunate that IO that doesn't have a payload is being
artificially split simply because bio->bi_iter.bi_size is 32bits.

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-01 Thread Ming Lin
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> On Fri, Jul 31 2015 at  5:19pm -0400,
> Ming Lin  wrote:
> 
> > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer  wrote:
> > > On Mon, Jul 06 2015 at  3:44P -0400,
> > > Ming Lin  wrote:
> > >
> > >> From: Kent Overstreet 
> > >>
> > >> The way the block layer is currently written, it goes to great lengths
> > >> to avoid having to split bios; upper layer code (such as bio_add_page())
> > >> checks what the underlying device can handle and tries to always create
> > >> bios that don't need to be split.
> > >>
> > >> But this approach becomes unwieldy and eventually breaks down with
> > >> stacked devices and devices with dynamic limits, and it adds a lot of
> > >> complexity. If the block layer could split bios as needed, we could
> > >> eliminate a lot of complexity elsewhere - particularly in stacked
> > >> drivers. Code that creates bios can then create whatever size bios are
> > >> convenient, and more importantly stacked drivers don't have to deal with
> > >> both their own bio size limitations and the limitations of the
> > >> (potentially multiple) devices underneath them.  In the future this will
> > >> let us delete merge_bvec_fn and a bunch of other code.
> > >>
> > >> We do this by adding calls to blk_queue_split() to the various
> > >> make_request functions that need it - a few can already handle arbitrary
> > >> size bios. Note that we add the call _after_ any call to
> > >> blk_queue_bounce(); this means that blk_queue_split() and
> > >> blk_recalc_rq_segments() don't need to be concerned with bouncing
> > >> affecting segment merging.
> > >>
> > >> Some make_request_fn() callbacks were simple enough to audit and verify
> > >> they don't need blk_queue_split() calls. The skipped ones are:
> > >>
> > >>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
> > >>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
> > >>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
> > >>  * brd_make_request (ramdisk - drivers/block/brd.c)
> > >>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
> > >>  * loop_make_request
> > >>  * null_queue_bio
> > >>  * bcache's make_request fns
> > >>
> > >> Some others are almost certainly safe to remove now, but will be left
> > >> for future patches.
> > >>
> > >> Cc: Jens Axboe 
> > >> Cc: Christoph Hellwig 
> > >> Cc: Al Viro 
> > >> Cc: Ming Lei 
> > >> Cc: Neil Brown 
> > >> Cc: Alasdair Kergon 
> > >> Cc: Mike Snitzer 
> > >> Cc: dm-de...@redhat.com
> > >> Cc: Lars Ellenberg 
> > >> Cc: drbd-u...@lists.linbit.com
> > >> Cc: Jiri Kosina 
> > >> Cc: Geoff Levand 
> > >> Cc: Jim Paris 
> > >> Cc: Joshua Morris 
> > >> Cc: Philip Kelleher 
> > >> Cc: Minchan Kim 
> > >> Cc: Nitin Gupta 
> > >> Cc: Oleg Drokin 
> > >> Cc: Andreas Dilger 
> > >> Acked-by: NeilBrown  (for the 'md/md.c' bits)
> > >> Signed-off-by: Kent Overstreet 
> > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
> > >> Signed-off-by: Dongsu Park 
> > >> Signed-off-by: Ming Lin 
> > > ...
> > >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> > >> index 30a0d9f..3707f30 100644
> > >> --- a/block/blk-merge.c
> > >> +++ b/block/blk-merge.c
> > >> @@ -9,12 +9,158 @@
> > >>
> > >>  #include "blk.h"
> > >>
> > >> +static struct bio *blk_bio_discard_split(struct request_queue *q,
> > >> +  struct bio *bio,
> > >> +  struct bio_set *bs)
> > >> +{
> > >> + unsigned int max_discard_sectors, granularity;
> > >> + int alignment;
> > >> + sector_t tmp;
> > >> + unsigned split_sectors;
> > >> +
> > >> + /* Zero-sector (unknown) and one-sector granularities are the 
> > >> same.  */
> > >> + granularity = max(q->limits.discard_granularity >> 9, 1U);
> > >> +
> > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX 
> > >> >> 9);
> > >> + max_discard_sectors -= max_discard_sectors % granularity;
> > >> +
> > >> + if (unlikely(!max_discard_sectors)) {
> > >> + /* XXX: warn */
> > >> + return NULL;
> > >> + }
> > >> +
> > >> + if (bio_sectors(bio) <= max_discard_sectors)
> > >> + return NULL;
> > >> +
> > >> + split_sectors = max_discard_sectors;
> > >> +
> > >> + /*
> > >> +  * If the next starting sector would be misaligned, stop the 
> > >> discard at
> > >> +  * the previous aligned sector.
> > >> +  */
> > >> + alignment = (q->limits.discard_alignment >> 9) % granularity;
> > >> +
> > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
> > >> + tmp = sector_div(tmp, granularity);
> > >> +
> > >> + if (split_sectors > tmp)
> > >> + split_sectors -= tmp;
> > >> +
> > >> + return bio_split(bio, split_sectors, GFP_NOIO, bs);
> > >> +}
> > >
> > > This code to stop the discard at the previous aligned sector could be
> > > the reason why I have 2 device-mapper-test-suite tests in the
> 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-01 Thread Ming Lin
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
 On Fri, Jul 31 2015 at  5:19pm -0400,
 Ming Lin m...@kernel.org wrote:
 
  On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote:
   On Mon, Jul 06 2015 at  3:44P -0400,
   Ming Lin m...@kernel.org wrote:
  
   From: Kent Overstreet kent.overstr...@gmail.com
  
   The way the block layer is currently written, it goes to great lengths
   to avoid having to split bios; upper layer code (such as bio_add_page())
   checks what the underlying device can handle and tries to always create
   bios that don't need to be split.
  
   But this approach becomes unwieldy and eventually breaks down with
   stacked devices and devices with dynamic limits, and it adds a lot of
   complexity. If the block layer could split bios as needed, we could
   eliminate a lot of complexity elsewhere - particularly in stacked
   drivers. Code that creates bios can then create whatever size bios are
   convenient, and more importantly stacked drivers don't have to deal with
   both their own bio size limitations and the limitations of the
   (potentially multiple) devices underneath them.  In the future this will
   let us delete merge_bvec_fn and a bunch of other code.
  
   We do this by adding calls to blk_queue_split() to the various
   make_request functions that need it - a few can already handle arbitrary
   size bios. Note that we add the call _after_ any call to
   blk_queue_bounce(); this means that blk_queue_split() and
   blk_recalc_rq_segments() don't need to be concerned with bouncing
   affecting segment merging.
  
   Some make_request_fn() callbacks were simple enough to audit and verify
   they don't need blk_queue_split() calls. The skipped ones are:
  
* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns
  
   Some others are almost certainly safe to remove now, but will be left
   for future patches.
  
   Cc: Jens Axboe ax...@kernel.dk
   Cc: Christoph Hellwig h...@infradead.org
   Cc: Al Viro v...@zeniv.linux.org.uk
   Cc: Ming Lei ming@canonical.com
   Cc: Neil Brown ne...@suse.de
   Cc: Alasdair Kergon a...@redhat.com
   Cc: Mike Snitzer snit...@redhat.com
   Cc: dm-de...@redhat.com
   Cc: Lars Ellenberg drbd-...@lists.linbit.com
   Cc: drbd-u...@lists.linbit.com
   Cc: Jiri Kosina jkos...@suse.cz
   Cc: Geoff Levand ge...@infradead.org
   Cc: Jim Paris j...@jtan.com
   Cc: Joshua Morris josh.h.mor...@us.ibm.com
   Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
   Cc: Minchan Kim minc...@kernel.org
   Cc: Nitin Gupta ngu...@vflare.org
   Cc: Oleg Drokin oleg.dro...@intel.com
   Cc: Andreas Dilger andreas.dil...@intel.com
   Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
   Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
   [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
   Signed-off-by: Dongsu Park dp...@posteo.net
   Signed-off-by: Ming Lin min...@ssi.samsung.com
   ...
   diff --git a/block/blk-merge.c b/block/blk-merge.c
   index 30a0d9f..3707f30 100644
   --- a/block/blk-merge.c
   +++ b/block/blk-merge.c
   @@ -9,12 +9,158 @@
  
#include blk.h
  
   +static struct bio *blk_bio_discard_split(struct request_queue *q,
   +  struct bio *bio,
   +  struct bio_set *bs)
   +{
   + unsigned int max_discard_sectors, granularity;
   + int alignment;
   + sector_t tmp;
   + unsigned split_sectors;
   +
   + /* Zero-sector (unknown) and one-sector granularities are the 
   same.  */
   + granularity = max(q-limits.discard_granularity  9, 1U);
   +
   + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 
9);
   + max_discard_sectors -= max_discard_sectors % granularity;
   +
   + if (unlikely(!max_discard_sectors)) {
   + /* XXX: warn */
   + return NULL;
   + }
   +
   + if (bio_sectors(bio) = max_discard_sectors)
   + return NULL;
   +
   + split_sectors = max_discard_sectors;
   +
   + /*
   +  * If the next starting sector would be misaligned, stop the 
   discard at
   +  * the previous aligned sector.
   +  */
   + alignment = (q-limits.discard_alignment  9) % granularity;
   +
   + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
   + tmp = sector_div(tmp, granularity);
   +
   + if (split_sectors  tmp)
   + split_sectors -= tmp;
   +
   + return bio_split(bio, split_sectors, GFP_NOIO, bs);
   +}
  
   This code to stop the discard at the previous aligned sector could be
   the reason why I have 2 device-mapper-test-suite tests in the
   

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-01 Thread Mike Snitzer
On Sat, Aug 01 2015 at  2:58am -0400,
Ming Lin m...@kernel.org wrote:

 On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
  
  OK, once setup, to run the 2 tests in question directly you'd do
  something like:
  
  dmtest run --suite thin-provisioning -n discard_a_fragmented_device
  
  dmtest run --suite thin-provisioning -n 
  discard_fully_provisioned_device_benchmark
  
  Again, these tests pass without this patchset.
 
 It's caused by patch 4.
 When discard size =4G, the bio-bi_iter.bi_size overflows.

Thanks for tracking this down!

 Below is the new patch.
 
 Christoph,
 Could you also help to review it?
 
 Now we still do misaligned check in blkdev_issue_discard().
 So the same code in blk_bio_discard_split() was removed.

But I don't agree with this approach.  One of the most meaningful
benefits of late bio splitting is the upper layers shouldn't _need_ to
depend on the intermediate devices' queue_limits being stacked properly.
Your solution to mix discard granularity/alignment checks at the upper
layer(s) but then split based on max_discard_sectors at the lower layer
defeats that benefit for discards.

This will translate to all intermediate layers that might split
discards needing to worry about granularity/alignment
too (e.g. how dm-thinp will have to care because it must generate
discard mappings with associated bios based on how blocks were mapped to
thinp).

Also, it is unfortunate that IO that doesn't have a payload is being
artificially split simply because bio-bi_iter.bi_size is 32bits.

Mike
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, Jul 31, 2015 at 3:02 PM, Ming Lin  wrote:
> On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
>> On Fri, Jul 31 2015 at  5:19pm -0400,
>> Ming Lin  wrote:
>>
>> > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer  wrote:
>> > > On Mon, Jul 06 2015 at  3:44P -0400,
>> > > Ming Lin  wrote:
>> > >
>> > >> From: Kent Overstreet 
>> > >>
>> > >> The way the block layer is currently written, it goes to great lengths
>> > >> to avoid having to split bios; upper layer code (such as bio_add_page())
>> > >> checks what the underlying device can handle and tries to always create
>> > >> bios that don't need to be split.
>> > >>
>> > >> But this approach becomes unwieldy and eventually breaks down with
>> > >> stacked devices and devices with dynamic limits, and it adds a lot of
>> > >> complexity. If the block layer could split bios as needed, we could
>> > >> eliminate a lot of complexity elsewhere - particularly in stacked
>> > >> drivers. Code that creates bios can then create whatever size bios are
>> > >> convenient, and more importantly stacked drivers don't have to deal with
>> > >> both their own bio size limitations and the limitations of the
>> > >> (potentially multiple) devices underneath them.  In the future this will
>> > >> let us delete merge_bvec_fn and a bunch of other code.
>> > >>
>> > >> We do this by adding calls to blk_queue_split() to the various
>> > >> make_request functions that need it - a few can already handle arbitrary
>> > >> size bios. Note that we add the call _after_ any call to
>> > >> blk_queue_bounce(); this means that blk_queue_split() and
>> > >> blk_recalc_rq_segments() don't need to be concerned with bouncing
>> > >> affecting segment merging.
>> > >>
>> > >> Some make_request_fn() callbacks were simple enough to audit and verify
>> > >> they don't need blk_queue_split() calls. The skipped ones are:
>> > >>
>> > >>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
>> > >>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>> > >>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>> > >>  * brd_make_request (ramdisk - drivers/block/brd.c)
>> > >>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
>> > >>  * loop_make_request
>> > >>  * null_queue_bio
>> > >>  * bcache's make_request fns
>> > >>
>> > >> Some others are almost certainly safe to remove now, but will be left
>> > >> for future patches.
>> > >>
>> > >> Cc: Jens Axboe 
>> > >> Cc: Christoph Hellwig 
>> > >> Cc: Al Viro 
>> > >> Cc: Ming Lei 
>> > >> Cc: Neil Brown 
>> > >> Cc: Alasdair Kergon 
>> > >> Cc: Mike Snitzer 
>> > >> Cc: dm-de...@redhat.com
>> > >> Cc: Lars Ellenberg 
>> > >> Cc: drbd-u...@lists.linbit.com
>> > >> Cc: Jiri Kosina 
>> > >> Cc: Geoff Levand 
>> > >> Cc: Jim Paris 
>> > >> Cc: Joshua Morris 
>> > >> Cc: Philip Kelleher 
>> > >> Cc: Minchan Kim 
>> > >> Cc: Nitin Gupta 
>> > >> Cc: Oleg Drokin 
>> > >> Cc: Andreas Dilger 
>> > >> Acked-by: NeilBrown  (for the 'md/md.c' bits)
>> > >> Signed-off-by: Kent Overstreet 
>> > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
>> > >> Signed-off-by: Dongsu Park 
>> > >> Signed-off-by: Ming Lin 
>> > > ...
>> > >> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > >> index 30a0d9f..3707f30 100644
>> > >> --- a/block/blk-merge.c
>> > >> +++ b/block/blk-merge.c
>> > >> @@ -9,12 +9,158 @@
>> > >>
>> > >>  #include "blk.h"
>> > >>
>> > >> +static struct bio *blk_bio_discard_split(struct request_queue *q,
>> > >> +  struct bio *bio,
>> > >> +  struct bio_set *bs)
>> > >> +{
>> > >> + unsigned int max_discard_sectors, granularity;
>> > >> + int alignment;
>> > >> + sector_t tmp;
>> > >> + unsigned split_sectors;
>> > >> +
>> > >> + /* Zero-sector (unknown) and one-sector granularities are the 
>> > >> same.  */
>> > >> + granularity = max(q->limits.discard_granularity >> 9, 1U);
>> > >> +
>> > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX 
>> > >> >> 9);
>> > >> + max_discard_sectors -= max_discard_sectors % granularity;
>> > >> +
>> > >> + if (unlikely(!max_discard_sectors)) {
>> > >> + /* XXX: warn */
>> > >> + return NULL;
>> > >> + }
>> > >> +
>> > >> + if (bio_sectors(bio) <= max_discard_sectors)
>> > >> + return NULL;
>> > >> +
>> > >> + split_sectors = max_discard_sectors;
>> > >> +
>> > >> + /*
>> > >> +  * If the next starting sector would be misaligned, stop the 
>> > >> discard at
>> > >> +  * the previous aligned sector.
>> > >> +  */
>> > >> + alignment = (q->limits.discard_alignment >> 9) % granularity;
>> > >> +
>> > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
>> > >> + tmp = sector_div(tmp, granularity);
>> > >> +
>> > >> + if (split_sectors > tmp)
>> > >> + split_sectors -= tmp;
>> > >> +
>> > >> + return bio_split(bio, split_sectors, 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> On Fri, Jul 31 2015 at  5:19pm -0400,
> Ming Lin  wrote:
> 
> > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer  wrote:
> > > On Mon, Jul 06 2015 at  3:44P -0400,
> > > Ming Lin  wrote:
> > >
> > >> From: Kent Overstreet 
> > >>
> > >> The way the block layer is currently written, it goes to great lengths
> > >> to avoid having to split bios; upper layer code (such as bio_add_page())
> > >> checks what the underlying device can handle and tries to always create
> > >> bios that don't need to be split.
> > >>
> > >> But this approach becomes unwieldy and eventually breaks down with
> > >> stacked devices and devices with dynamic limits, and it adds a lot of
> > >> complexity. If the block layer could split bios as needed, we could
> > >> eliminate a lot of complexity elsewhere - particularly in stacked
> > >> drivers. Code that creates bios can then create whatever size bios are
> > >> convenient, and more importantly stacked drivers don't have to deal with
> > >> both their own bio size limitations and the limitations of the
> > >> (potentially multiple) devices underneath them.  In the future this will
> > >> let us delete merge_bvec_fn and a bunch of other code.
> > >>
> > >> We do this by adding calls to blk_queue_split() to the various
> > >> make_request functions that need it - a few can already handle arbitrary
> > >> size bios. Note that we add the call _after_ any call to
> > >> blk_queue_bounce(); this means that blk_queue_split() and
> > >> blk_recalc_rq_segments() don't need to be concerned with bouncing
> > >> affecting segment merging.
> > >>
> > >> Some make_request_fn() callbacks were simple enough to audit and verify
> > >> they don't need blk_queue_split() calls. The skipped ones are:
> > >>
> > >>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
> > >>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
> > >>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
> > >>  * brd_make_request (ramdisk - drivers/block/brd.c)
> > >>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
> > >>  * loop_make_request
> > >>  * null_queue_bio
> > >>  * bcache's make_request fns
> > >>
> > >> Some others are almost certainly safe to remove now, but will be left
> > >> for future patches.
> > >>
> > >> Cc: Jens Axboe 
> > >> Cc: Christoph Hellwig 
> > >> Cc: Al Viro 
> > >> Cc: Ming Lei 
> > >> Cc: Neil Brown 
> > >> Cc: Alasdair Kergon 
> > >> Cc: Mike Snitzer 
> > >> Cc: dm-de...@redhat.com
> > >> Cc: Lars Ellenberg 
> > >> Cc: drbd-u...@lists.linbit.com
> > >> Cc: Jiri Kosina 
> > >> Cc: Geoff Levand 
> > >> Cc: Jim Paris 
> > >> Cc: Joshua Morris 
> > >> Cc: Philip Kelleher 
> > >> Cc: Minchan Kim 
> > >> Cc: Nitin Gupta 
> > >> Cc: Oleg Drokin 
> > >> Cc: Andreas Dilger 
> > >> Acked-by: NeilBrown  (for the 'md/md.c' bits)
> > >> Signed-off-by: Kent Overstreet 
> > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
> > >> Signed-off-by: Dongsu Park 
> > >> Signed-off-by: Ming Lin 
> > > ...
> > >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> > >> index 30a0d9f..3707f30 100644
> > >> --- a/block/blk-merge.c
> > >> +++ b/block/blk-merge.c
> > >> @@ -9,12 +9,158 @@
> > >>
> > >>  #include "blk.h"
> > >>
> > >> +static struct bio *blk_bio_discard_split(struct request_queue *q,
> > >> +  struct bio *bio,
> > >> +  struct bio_set *bs)
> > >> +{
> > >> + unsigned int max_discard_sectors, granularity;
> > >> + int alignment;
> > >> + sector_t tmp;
> > >> + unsigned split_sectors;
> > >> +
> > >> + /* Zero-sector (unknown) and one-sector granularities are the 
> > >> same.  */
> > >> + granularity = max(q->limits.discard_granularity >> 9, 1U);
> > >> +
> > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX 
> > >> >> 9);
> > >> + max_discard_sectors -= max_discard_sectors % granularity;
> > >> +
> > >> + if (unlikely(!max_discard_sectors)) {
> > >> + /* XXX: warn */
> > >> + return NULL;
> > >> + }
> > >> +
> > >> + if (bio_sectors(bio) <= max_discard_sectors)
> > >> + return NULL;
> > >> +
> > >> + split_sectors = max_discard_sectors;
> > >> +
> > >> + /*
> > >> +  * If the next starting sector would be misaligned, stop the 
> > >> discard at
> > >> +  * the previous aligned sector.
> > >> +  */
> > >> + alignment = (q->limits.discard_alignment >> 9) % granularity;
> > >> +
> > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
> > >> + tmp = sector_div(tmp, granularity);
> > >> +
> > >> + if (split_sectors > tmp)
> > >> + split_sectors -= tmp;
> > >> +
> > >> + return bio_split(bio, split_sectors, GFP_NOIO, bs);
> > >> +}
> > >
> > > This code to stop the discard at the previous aligned sector could be
> > > the reason why I have 2 device-mapper-test-suite tests in the
> 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Mike Snitzer
On Fri, Jul 31 2015 at  5:19pm -0400,
Ming Lin  wrote:

> On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer  wrote:
> > On Mon, Jul 06 2015 at  3:44P -0400,
> > Ming Lin  wrote:
> >
> >> From: Kent Overstreet 
> >>
> >> The way the block layer is currently written, it goes to great lengths
> >> to avoid having to split bios; upper layer code (such as bio_add_page())
> >> checks what the underlying device can handle and tries to always create
> >> bios that don't need to be split.
> >>
> >> But this approach becomes unwieldy and eventually breaks down with
> >> stacked devices and devices with dynamic limits, and it adds a lot of
> >> complexity. If the block layer could split bios as needed, we could
> >> eliminate a lot of complexity elsewhere - particularly in stacked
> >> drivers. Code that creates bios can then create whatever size bios are
> >> convenient, and more importantly stacked drivers don't have to deal with
> >> both their own bio size limitations and the limitations of the
> >> (potentially multiple) devices underneath them.  In the future this will
> >> let us delete merge_bvec_fn and a bunch of other code.
> >>
> >> We do this by adding calls to blk_queue_split() to the various
> >> make_request functions that need it - a few can already handle arbitrary
> >> size bios. Note that we add the call _after_ any call to
> >> blk_queue_bounce(); this means that blk_queue_split() and
> >> blk_recalc_rq_segments() don't need to be concerned with bouncing
> >> affecting segment merging.
> >>
> >> Some make_request_fn() callbacks were simple enough to audit and verify
> >> they don't need blk_queue_split() calls. The skipped ones are:
> >>
> >>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
> >>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
> >>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
> >>  * brd_make_request (ramdisk - drivers/block/brd.c)
> >>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
> >>  * loop_make_request
> >>  * null_queue_bio
> >>  * bcache's make_request fns
> >>
> >> Some others are almost certainly safe to remove now, but will be left
> >> for future patches.
> >>
> >> Cc: Jens Axboe 
> >> Cc: Christoph Hellwig 
> >> Cc: Al Viro 
> >> Cc: Ming Lei 
> >> Cc: Neil Brown 
> >> Cc: Alasdair Kergon 
> >> Cc: Mike Snitzer 
> >> Cc: dm-de...@redhat.com
> >> Cc: Lars Ellenberg 
> >> Cc: drbd-u...@lists.linbit.com
> >> Cc: Jiri Kosina 
> >> Cc: Geoff Levand 
> >> Cc: Jim Paris 
> >> Cc: Joshua Morris 
> >> Cc: Philip Kelleher 
> >> Cc: Minchan Kim 
> >> Cc: Nitin Gupta 
> >> Cc: Oleg Drokin 
> >> Cc: Andreas Dilger 
> >> Acked-by: NeilBrown  (for the 'md/md.c' bits)
> >> Signed-off-by: Kent Overstreet 
> >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
> >> Signed-off-by: Dongsu Park 
> >> Signed-off-by: Ming Lin 
> > ...
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index 30a0d9f..3707f30 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -9,12 +9,158 @@
> >>
> >>  #include "blk.h"
> >>
> >> +static struct bio *blk_bio_discard_split(struct request_queue *q,
> >> +  struct bio *bio,
> >> +  struct bio_set *bs)
> >> +{
> >> + unsigned int max_discard_sectors, granularity;
> >> + int alignment;
> >> + sector_t tmp;
> >> + unsigned split_sectors;
> >> +
> >> + /* Zero-sector (unknown) and one-sector granularities are the same.  
> >> */
> >> + granularity = max(q->limits.discard_granularity >> 9, 1U);
> >> +
> >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 
> >> 9);
> >> + max_discard_sectors -= max_discard_sectors % granularity;
> >> +
> >> + if (unlikely(!max_discard_sectors)) {
> >> + /* XXX: warn */
> >> + return NULL;
> >> + }
> >> +
> >> + if (bio_sectors(bio) <= max_discard_sectors)
> >> + return NULL;
> >> +
> >> + split_sectors = max_discard_sectors;
> >> +
> >> + /*
> >> +  * If the next starting sector would be misaligned, stop the discard 
> >> at
> >> +  * the previous aligned sector.
> >> +  */
> >> + alignment = (q->limits.discard_alignment >> 9) % granularity;
> >> +
> >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
> >> + tmp = sector_div(tmp, granularity);
> >> +
> >> + if (split_sectors > tmp)
> >> + split_sectors -= tmp;
> >> +
> >> + return bio_split(bio, split_sectors, GFP_NOIO, bs);
> >> +}
> >
> > This code to stop the discard at the previous aligned sector could be
> > the reason why I have 2 device-mapper-test-suite tests in the
> > 'thin-provisioning' testsuite failing due to this patchset:
> 
> I'm setting up the testsuite to debug.

OK, once setup, to run the 2 tests in question directly you'd do
something like:

dmtest run --suite thin-provisioning -n discard_a_fragmented_device

dmtest run --suite thin-provisioning -n 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer  wrote:
> On Mon, Jul 06 2015 at  3:44P -0400,
> Ming Lin  wrote:
>
>> From: Kent Overstreet 
>>
>> The way the block layer is currently written, it goes to great lengths
>> to avoid having to split bios; upper layer code (such as bio_add_page())
>> checks what the underlying device can handle and tries to always create
>> bios that don't need to be split.
>>
>> But this approach becomes unwieldy and eventually breaks down with
>> stacked devices and devices with dynamic limits, and it adds a lot of
>> complexity. If the block layer could split bios as needed, we could
>> eliminate a lot of complexity elsewhere - particularly in stacked
>> drivers. Code that creates bios can then create whatever size bios are
>> convenient, and more importantly stacked drivers don't have to deal with
>> both their own bio size limitations and the limitations of the
>> (potentially multiple) devices underneath them.  In the future this will
>> let us delete merge_bvec_fn and a bunch of other code.
>>
>> We do this by adding calls to blk_queue_split() to the various
>> make_request functions that need it - a few can already handle arbitrary
>> size bios. Note that we add the call _after_ any call to
>> blk_queue_bounce(); this means that blk_queue_split() and
>> blk_recalc_rq_segments() don't need to be concerned with bouncing
>> affecting segment merging.
>>
>> Some make_request_fn() callbacks were simple enough to audit and verify
>> they don't need blk_queue_split() calls. The skipped ones are:
>>
>>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
>>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>>  * brd_make_request (ramdisk - drivers/block/brd.c)
>>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
>>  * loop_make_request
>>  * null_queue_bio
>>  * bcache's make_request fns
>>
>> Some others are almost certainly safe to remove now, but will be left
>> for future patches.
>>
>> Cc: Jens Axboe 
>> Cc: Christoph Hellwig 
>> Cc: Al Viro 
>> Cc: Ming Lei 
>> Cc: Neil Brown 
>> Cc: Alasdair Kergon 
>> Cc: Mike Snitzer 
>> Cc: dm-de...@redhat.com
>> Cc: Lars Ellenberg 
>> Cc: drbd-u...@lists.linbit.com
>> Cc: Jiri Kosina 
>> Cc: Geoff Levand 
>> Cc: Jim Paris 
>> Cc: Joshua Morris 
>> Cc: Philip Kelleher 
>> Cc: Minchan Kim 
>> Cc: Nitin Gupta 
>> Cc: Oleg Drokin 
>> Cc: Andreas Dilger 
>> Acked-by: NeilBrown  (for the 'md/md.c' bits)
>> Signed-off-by: Kent Overstreet 
>> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
>> Signed-off-by: Dongsu Park 
>> Signed-off-by: Ming Lin 
> ...
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 30a0d9f..3707f30 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -9,12 +9,158 @@
>>
>>  #include "blk.h"
>>
>> +static struct bio *blk_bio_discard_split(struct request_queue *q,
>> +  struct bio *bio,
>> +  struct bio_set *bs)
>> +{
>> + unsigned int max_discard_sectors, granularity;
>> + int alignment;
>> + sector_t tmp;
>> + unsigned split_sectors;
>> +
>> + /* Zero-sector (unknown) and one-sector granularities are the same.  */
>> + granularity = max(q->limits.discard_granularity >> 9, 1U);
>> +
>> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 
>> 9);
>> + max_discard_sectors -= max_discard_sectors % granularity;
>> +
>> + if (unlikely(!max_discard_sectors)) {
>> + /* XXX: warn */
>> + return NULL;
>> + }
>> +
>> + if (bio_sectors(bio) <= max_discard_sectors)
>> + return NULL;
>> +
>> + split_sectors = max_discard_sectors;
>> +
>> + /*
>> +  * If the next starting sector would be misaligned, stop the discard at
>> +  * the previous aligned sector.
>> +  */
>> + alignment = (q->limits.discard_alignment >> 9) % granularity;
>> +
>> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
>> + tmp = sector_div(tmp, granularity);
>> +
>> + if (split_sectors > tmp)
>> + split_sectors -= tmp;
>> +
>> + return bio_split(bio, split_sectors, GFP_NOIO, bs);
>> +}
>
> This code to stop the discard at the previous aligned sector could be
> the reason why I have 2 device-mapper-test-suite tests in the
> 'thin-provisioning' testsuite failing due to this patchset:

I'm setting up the testsuite to debug.

Ming

>
>   4) Failure:
> test_discard_a_fragmented_device(DiscardQuickTests)
> 
> [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in
>  `block in assert_no_mappings'
>  
> /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in
>  `call'
>  
> /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in
>  `block in with_dev_md'
>  
> 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Mike Snitzer
On Mon, Jul 06 2015 at  3:44P -0400,
Ming Lin  wrote:

> From: Kent Overstreet 
> 
> The way the block layer is currently written, it goes to great lengths
> to avoid having to split bios; upper layer code (such as bio_add_page())
> checks what the underlying device can handle and tries to always create
> bios that don't need to be split.
> 
> But this approach becomes unwieldy and eventually breaks down with
> stacked devices and devices with dynamic limits, and it adds a lot of
> complexity. If the block layer could split bios as needed, we could
> eliminate a lot of complexity elsewhere - particularly in stacked
> drivers. Code that creates bios can then create whatever size bios are
> convenient, and more importantly stacked drivers don't have to deal with
> both their own bio size limitations and the limitations of the
> (potentially multiple) devices underneath them.  In the future this will
> let us delete merge_bvec_fn and a bunch of other code.
> 
> We do this by adding calls to blk_queue_split() to the various
> make_request functions that need it - a few can already handle arbitrary
> size bios. Note that we add the call _after_ any call to
> blk_queue_bounce(); this means that blk_queue_split() and
> blk_recalc_rq_segments() don't need to be concerned with bouncing
> affecting segment merging.
> 
> Some make_request_fn() callbacks were simple enough to audit and verify
> they don't need blk_queue_split() calls. The skipped ones are:
> 
>  * nfhd_make_request (arch/m68k/emu/nfblock.c)
>  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
>  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
>  * brd_make_request (ramdisk - drivers/block/brd.c)
>  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
>  * loop_make_request
>  * null_queue_bio
>  * bcache's make_request fns
> 
> Some others are almost certainly safe to remove now, but will be left
> for future patches.
> 
> Cc: Jens Axboe 
> Cc: Christoph Hellwig 
> Cc: Al Viro 
> Cc: Ming Lei 
> Cc: Neil Brown 
> Cc: Alasdair Kergon 
> Cc: Mike Snitzer 
> Cc: dm-de...@redhat.com
> Cc: Lars Ellenberg 
> Cc: drbd-u...@lists.linbit.com
> Cc: Jiri Kosina 
> Cc: Geoff Levand 
> Cc: Jim Paris 
> Cc: Joshua Morris 
> Cc: Philip Kelleher 
> Cc: Minchan Kim 
> Cc: Nitin Gupta 
> Cc: Oleg Drokin 
> Cc: Andreas Dilger 
> Acked-by: NeilBrown  (for the 'md/md.c' bits)
> Signed-off-by: Kent Overstreet 
> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
> Signed-off-by: Dongsu Park 
> Signed-off-by: Ming Lin 
...
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 30a0d9f..3707f30 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -9,12 +9,158 @@
>  
>  #include "blk.h"
>  
> +static struct bio *blk_bio_discard_split(struct request_queue *q,
> +  struct bio *bio,
> +  struct bio_set *bs)
> +{
> + unsigned int max_discard_sectors, granularity;
> + int alignment;
> + sector_t tmp;
> + unsigned split_sectors;
> +
> + /* Zero-sector (unknown) and one-sector granularities are the same.  */
> + granularity = max(q->limits.discard_granularity >> 9, 1U);
> +
> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> + max_discard_sectors -= max_discard_sectors % granularity;
> +
> + if (unlikely(!max_discard_sectors)) {
> + /* XXX: warn */
> + return NULL;
> + }
> +
> + if (bio_sectors(bio) <= max_discard_sectors)
> + return NULL;
> +
> + split_sectors = max_discard_sectors;
> +
> + /*
> +  * If the next starting sector would be misaligned, stop the discard at
> +  * the previous aligned sector.
> +  */
> + alignment = (q->limits.discard_alignment >> 9) % granularity;
> +
> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
> + tmp = sector_div(tmp, granularity);
> +
> + if (split_sectors > tmp)
> + split_sectors -= tmp;
> +
> + return bio_split(bio, split_sectors, GFP_NOIO, bs);
> +}

This code to stop the discard at the previous aligned sector could be
the reason why I have 2 device-mapper-test-suite tests in the
'thin-provisioning' testsuite failing due to this patchset:

  4) Failure:
test_discard_a_fragmented_device(DiscardQuickTests)

[/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in
 `block in assert_no_mappings'
 
/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in
 `call'
 
/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in
 `block in with_dev_md'
 
/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:40:in
 `each'
 
/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:40:in
 `with_dev_md'
 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Mike Snitzer
On Mon, Jul 06 2015 at  3:44P -0400,
Ming Lin m...@kernel.org wrote:

 From: Kent Overstreet kent.overstr...@gmail.com
 
 The way the block layer is currently written, it goes to great lengths
 to avoid having to split bios; upper layer code (such as bio_add_page())
 checks what the underlying device can handle and tries to always create
 bios that don't need to be split.
 
 But this approach becomes unwieldy and eventually breaks down with
 stacked devices and devices with dynamic limits, and it adds a lot of
 complexity. If the block layer could split bios as needed, we could
 eliminate a lot of complexity elsewhere - particularly in stacked
 drivers. Code that creates bios can then create whatever size bios are
 convenient, and more importantly stacked drivers don't have to deal with
 both their own bio size limitations and the limitations of the
 (potentially multiple) devices underneath them.  In the future this will
 let us delete merge_bvec_fn and a bunch of other code.
 
 We do this by adding calls to blk_queue_split() to the various
 make_request functions that need it - a few can already handle arbitrary
 size bios. Note that we add the call _after_ any call to
 blk_queue_bounce(); this means that blk_queue_split() and
 blk_recalc_rq_segments() don't need to be concerned with bouncing
 affecting segment merging.
 
 Some make_request_fn() callbacks were simple enough to audit and verify
 they don't need blk_queue_split() calls. The skipped ones are:
 
  * nfhd_make_request (arch/m68k/emu/nfblock.c)
  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
  * brd_make_request (ramdisk - drivers/block/brd.c)
  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
  * loop_make_request
  * null_queue_bio
  * bcache's make_request fns
 
 Some others are almost certainly safe to remove now, but will be left
 for future patches.
 
 Cc: Jens Axboe ax...@kernel.dk
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Al Viro v...@zeniv.linux.org.uk
 Cc: Ming Lei ming@canonical.com
 Cc: Neil Brown ne...@suse.de
 Cc: Alasdair Kergon a...@redhat.com
 Cc: Mike Snitzer snit...@redhat.com
 Cc: dm-de...@redhat.com
 Cc: Lars Ellenberg drbd-...@lists.linbit.com
 Cc: drbd-u...@lists.linbit.com
 Cc: Jiri Kosina jkos...@suse.cz
 Cc: Geoff Levand ge...@infradead.org
 Cc: Jim Paris j...@jtan.com
 Cc: Joshua Morris josh.h.mor...@us.ibm.com
 Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
 Cc: Minchan Kim minc...@kernel.org
 Cc: Nitin Gupta ngu...@vflare.org
 Cc: Oleg Drokin oleg.dro...@intel.com
 Cc: Andreas Dilger andreas.dil...@intel.com
 Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
 Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
 [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
 Signed-off-by: Dongsu Park dp...@posteo.net
 Signed-off-by: Ming Lin min...@ssi.samsung.com
...
 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index 30a0d9f..3707f30 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -9,12 +9,158 @@
  
  #include blk.h
  
 +static struct bio *blk_bio_discard_split(struct request_queue *q,
 +  struct bio *bio,
 +  struct bio_set *bs)
 +{
 + unsigned int max_discard_sectors, granularity;
 + int alignment;
 + sector_t tmp;
 + unsigned split_sectors;
 +
 + /* Zero-sector (unknown) and one-sector granularities are the same.  */
 + granularity = max(q-limits.discard_granularity  9, 1U);
 +
 + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
 + max_discard_sectors -= max_discard_sectors % granularity;
 +
 + if (unlikely(!max_discard_sectors)) {
 + /* XXX: warn */
 + return NULL;
 + }
 +
 + if (bio_sectors(bio) = max_discard_sectors)
 + return NULL;
 +
 + split_sectors = max_discard_sectors;
 +
 + /*
 +  * If the next starting sector would be misaligned, stop the discard at
 +  * the previous aligned sector.
 +  */
 + alignment = (q-limits.discard_alignment  9) % granularity;
 +
 + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
 + tmp = sector_div(tmp, granularity);
 +
 + if (split_sectors  tmp)
 + split_sectors -= tmp;
 +
 + return bio_split(bio, split_sectors, GFP_NOIO, bs);
 +}

This code to stop the discard at the previous aligned sector could be
the reason why I have 2 device-mapper-test-suite tests in the
'thin-provisioning' testsuite failing due to this patchset:

  4) Failure:
test_discard_a_fragmented_device(DiscardQuickTests)

[/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in
 `block in assert_no_mappings'
 
/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in
 `call'
 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Mike Snitzer
On Fri, Jul 31 2015 at  5:19pm -0400,
Ming Lin m...@kernel.org wrote:

 On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote:
  On Mon, Jul 06 2015 at  3:44P -0400,
  Ming Lin m...@kernel.org wrote:
 
  From: Kent Overstreet kent.overstr...@gmail.com
 
  The way the block layer is currently written, it goes to great lengths
  to avoid having to split bios; upper layer code (such as bio_add_page())
  checks what the underlying device can handle and tries to always create
  bios that don't need to be split.
 
  But this approach becomes unwieldy and eventually breaks down with
  stacked devices and devices with dynamic limits, and it adds a lot of
  complexity. If the block layer could split bios as needed, we could
  eliminate a lot of complexity elsewhere - particularly in stacked
  drivers. Code that creates bios can then create whatever size bios are
  convenient, and more importantly stacked drivers don't have to deal with
  both their own bio size limitations and the limitations of the
  (potentially multiple) devices underneath them.  In the future this will
  let us delete merge_bvec_fn and a bunch of other code.
 
  We do this by adding calls to blk_queue_split() to the various
  make_request functions that need it - a few can already handle arbitrary
  size bios. Note that we add the call _after_ any call to
  blk_queue_bounce(); this means that blk_queue_split() and
  blk_recalc_rq_segments() don't need to be concerned with bouncing
  affecting segment merging.
 
  Some make_request_fn() callbacks were simple enough to audit and verify
  they don't need blk_queue_split() calls. The skipped ones are:
 
   * nfhd_make_request (arch/m68k/emu/nfblock.c)
   * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
   * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
   * brd_make_request (ramdisk - drivers/block/brd.c)
   * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
   * loop_make_request
   * null_queue_bio
   * bcache's make_request fns
 
  Some others are almost certainly safe to remove now, but will be left
  for future patches.
 
  Cc: Jens Axboe ax...@kernel.dk
  Cc: Christoph Hellwig h...@infradead.org
  Cc: Al Viro v...@zeniv.linux.org.uk
  Cc: Ming Lei ming@canonical.com
  Cc: Neil Brown ne...@suse.de
  Cc: Alasdair Kergon a...@redhat.com
  Cc: Mike Snitzer snit...@redhat.com
  Cc: dm-de...@redhat.com
  Cc: Lars Ellenberg drbd-...@lists.linbit.com
  Cc: drbd-u...@lists.linbit.com
  Cc: Jiri Kosina jkos...@suse.cz
  Cc: Geoff Levand ge...@infradead.org
  Cc: Jim Paris j...@jtan.com
  Cc: Joshua Morris josh.h.mor...@us.ibm.com
  Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
  Cc: Minchan Kim minc...@kernel.org
  Cc: Nitin Gupta ngu...@vflare.org
  Cc: Oleg Drokin oleg.dro...@intel.com
  Cc: Andreas Dilger andreas.dil...@intel.com
  Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
  Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
  [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
  Signed-off-by: Dongsu Park dp...@posteo.net
  Signed-off-by: Ming Lin min...@ssi.samsung.com
  ...
  diff --git a/block/blk-merge.c b/block/blk-merge.c
  index 30a0d9f..3707f30 100644
  --- a/block/blk-merge.c
  +++ b/block/blk-merge.c
  @@ -9,12 +9,158 @@
 
   #include blk.h
 
  +static struct bio *blk_bio_discard_split(struct request_queue *q,
  +  struct bio *bio,
  +  struct bio_set *bs)
  +{
  + unsigned int max_discard_sectors, granularity;
  + int alignment;
  + sector_t tmp;
  + unsigned split_sectors;
  +
  + /* Zero-sector (unknown) and one-sector granularities are the same.  
  */
  + granularity = max(q-limits.discard_granularity  9, 1U);
  +
  + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  
  9);
  + max_discard_sectors -= max_discard_sectors % granularity;
  +
  + if (unlikely(!max_discard_sectors)) {
  + /* XXX: warn */
  + return NULL;
  + }
  +
  + if (bio_sectors(bio) = max_discard_sectors)
  + return NULL;
  +
  + split_sectors = max_discard_sectors;
  +
  + /*
  +  * If the next starting sector would be misaligned, stop the discard 
  at
  +  * the previous aligned sector.
  +  */
  + alignment = (q-limits.discard_alignment  9) % granularity;
  +
  + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
  + tmp = sector_div(tmp, granularity);
  +
  + if (split_sectors  tmp)
  + split_sectors -= tmp;
  +
  + return bio_split(bio, split_sectors, GFP_NOIO, bs);
  +}
 
  This code to stop the discard at the previous aligned sector could be
  the reason why I have 2 device-mapper-test-suite tests in the
  'thin-provisioning' testsuite failing due to this patchset:
 
 I'm setting up the testsuite to debug.

OK, once setup, to run the 2 tests in question directly you'd do
something like:

dmtest run 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote:
 On Mon, Jul 06 2015 at  3:44P -0400,
 Ming Lin m...@kernel.org wrote:

 From: Kent Overstreet kent.overstr...@gmail.com

 The way the block layer is currently written, it goes to great lengths
 to avoid having to split bios; upper layer code (such as bio_add_page())
 checks what the underlying device can handle and tries to always create
 bios that don't need to be split.

 But this approach becomes unwieldy and eventually breaks down with
 stacked devices and devices with dynamic limits, and it adds a lot of
 complexity. If the block layer could split bios as needed, we could
 eliminate a lot of complexity elsewhere - particularly in stacked
 drivers. Code that creates bios can then create whatever size bios are
 convenient, and more importantly stacked drivers don't have to deal with
 both their own bio size limitations and the limitations of the
 (potentially multiple) devices underneath them.  In the future this will
 let us delete merge_bvec_fn and a bunch of other code.

 We do this by adding calls to blk_queue_split() to the various
 make_request functions that need it - a few can already handle arbitrary
 size bios. Note that we add the call _after_ any call to
 blk_queue_bounce(); this means that blk_queue_split() and
 blk_recalc_rq_segments() don't need to be concerned with bouncing
 affecting segment merging.

 Some make_request_fn() callbacks were simple enough to audit and verify
 they don't need blk_queue_split() calls. The skipped ones are:

  * nfhd_make_request (arch/m68k/emu/nfblock.c)
  * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
  * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
  * brd_make_request (ramdisk - drivers/block/brd.c)
  * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
  * loop_make_request
  * null_queue_bio
  * bcache's make_request fns

 Some others are almost certainly safe to remove now, but will be left
 for future patches.

 Cc: Jens Axboe ax...@kernel.dk
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Al Viro v...@zeniv.linux.org.uk
 Cc: Ming Lei ming@canonical.com
 Cc: Neil Brown ne...@suse.de
 Cc: Alasdair Kergon a...@redhat.com
 Cc: Mike Snitzer snit...@redhat.com
 Cc: dm-de...@redhat.com
 Cc: Lars Ellenberg drbd-...@lists.linbit.com
 Cc: drbd-u...@lists.linbit.com
 Cc: Jiri Kosina jkos...@suse.cz
 Cc: Geoff Levand ge...@infradead.org
 Cc: Jim Paris j...@jtan.com
 Cc: Joshua Morris josh.h.mor...@us.ibm.com
 Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
 Cc: Minchan Kim minc...@kernel.org
 Cc: Nitin Gupta ngu...@vflare.org
 Cc: Oleg Drokin oleg.dro...@intel.com
 Cc: Andreas Dilger andreas.dil...@intel.com
 Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
 Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
 [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
 Signed-off-by: Dongsu Park dp...@posteo.net
 Signed-off-by: Ming Lin min...@ssi.samsung.com
 ...
 diff --git a/block/blk-merge.c b/block/blk-merge.c
 index 30a0d9f..3707f30 100644
 --- a/block/blk-merge.c
 +++ b/block/blk-merge.c
 @@ -9,12 +9,158 @@

  #include blk.h

 +static struct bio *blk_bio_discard_split(struct request_queue *q,
 +  struct bio *bio,
 +  struct bio_set *bs)
 +{
 + unsigned int max_discard_sectors, granularity;
 + int alignment;
 + sector_t tmp;
 + unsigned split_sectors;
 +
 + /* Zero-sector (unknown) and one-sector granularities are the same.  */
 + granularity = max(q-limits.discard_granularity  9, 1U);
 +
 + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  
 9);
 + max_discard_sectors -= max_discard_sectors % granularity;
 +
 + if (unlikely(!max_discard_sectors)) {
 + /* XXX: warn */
 + return NULL;
 + }
 +
 + if (bio_sectors(bio) = max_discard_sectors)
 + return NULL;
 +
 + split_sectors = max_discard_sectors;
 +
 + /*
 +  * If the next starting sector would be misaligned, stop the discard at
 +  * the previous aligned sector.
 +  */
 + alignment = (q-limits.discard_alignment  9) % granularity;
 +
 + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
 + tmp = sector_div(tmp, granularity);
 +
 + if (split_sectors  tmp)
 + split_sectors -= tmp;
 +
 + return bio_split(bio, split_sectors, GFP_NOIO, bs);
 +}

 This code to stop the discard at the previous aligned sector could be
 the reason why I have 2 device-mapper-test-suite tests in the
 'thin-provisioning' testsuite failing due to this patchset:

I'm setting up the testsuite to debug.

Ming


   4) Failure:
 test_discard_a_fragmented_device(DiscardQuickTests)
 
 [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in
  `block in assert_no_mappings'
  
 

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
 On Fri, Jul 31 2015 at  5:19pm -0400,
 Ming Lin m...@kernel.org wrote:
 
  On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote:
   On Mon, Jul 06 2015 at  3:44P -0400,
   Ming Lin m...@kernel.org wrote:
  
   From: Kent Overstreet kent.overstr...@gmail.com
  
   The way the block layer is currently written, it goes to great lengths
   to avoid having to split bios; upper layer code (such as bio_add_page())
   checks what the underlying device can handle and tries to always create
   bios that don't need to be split.
  
   But this approach becomes unwieldy and eventually breaks down with
   stacked devices and devices with dynamic limits, and it adds a lot of
   complexity. If the block layer could split bios as needed, we could
   eliminate a lot of complexity elsewhere - particularly in stacked
   drivers. Code that creates bios can then create whatever size bios are
   convenient, and more importantly stacked drivers don't have to deal with
   both their own bio size limitations and the limitations of the
   (potentially multiple) devices underneath them.  In the future this will
   let us delete merge_bvec_fn and a bunch of other code.
  
   We do this by adding calls to blk_queue_split() to the various
   make_request functions that need it - a few can already handle arbitrary
   size bios. Note that we add the call _after_ any call to
   blk_queue_bounce(); this means that blk_queue_split() and
   blk_recalc_rq_segments() don't need to be concerned with bouncing
   affecting segment merging.
  
   Some make_request_fn() callbacks were simple enough to audit and verify
   they don't need blk_queue_split() calls. The skipped ones are:
  
* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns
  
   Some others are almost certainly safe to remove now, but will be left
   for future patches.
  
   Cc: Jens Axboe ax...@kernel.dk
   Cc: Christoph Hellwig h...@infradead.org
   Cc: Al Viro v...@zeniv.linux.org.uk
   Cc: Ming Lei ming@canonical.com
   Cc: Neil Brown ne...@suse.de
   Cc: Alasdair Kergon a...@redhat.com
   Cc: Mike Snitzer snit...@redhat.com
   Cc: dm-de...@redhat.com
   Cc: Lars Ellenberg drbd-...@lists.linbit.com
   Cc: drbd-u...@lists.linbit.com
   Cc: Jiri Kosina jkos...@suse.cz
   Cc: Geoff Levand ge...@infradead.org
   Cc: Jim Paris j...@jtan.com
   Cc: Joshua Morris josh.h.mor...@us.ibm.com
   Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
   Cc: Minchan Kim minc...@kernel.org
   Cc: Nitin Gupta ngu...@vflare.org
   Cc: Oleg Drokin oleg.dro...@intel.com
   Cc: Andreas Dilger andreas.dil...@intel.com
   Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
   Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
   [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
   Signed-off-by: Dongsu Park dp...@posteo.net
   Signed-off-by: Ming Lin min...@ssi.samsung.com
   ...
   diff --git a/block/blk-merge.c b/block/blk-merge.c
   index 30a0d9f..3707f30 100644
   --- a/block/blk-merge.c
   +++ b/block/blk-merge.c
   @@ -9,12 +9,158 @@
  
#include blk.h
  
   +static struct bio *blk_bio_discard_split(struct request_queue *q,
   +  struct bio *bio,
   +  struct bio_set *bs)
   +{
   + unsigned int max_discard_sectors, granularity;
   + int alignment;
   + sector_t tmp;
   + unsigned split_sectors;
   +
   + /* Zero-sector (unknown) and one-sector granularities are the 
   same.  */
   + granularity = max(q-limits.discard_granularity  9, 1U);
   +
   + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 
9);
   + max_discard_sectors -= max_discard_sectors % granularity;
   +
   + if (unlikely(!max_discard_sectors)) {
   + /* XXX: warn */
   + return NULL;
   + }
   +
   + if (bio_sectors(bio) = max_discard_sectors)
   + return NULL;
   +
   + split_sectors = max_discard_sectors;
   +
   + /*
   +  * If the next starting sector would be misaligned, stop the 
   discard at
   +  * the previous aligned sector.
   +  */
   + alignment = (q-limits.discard_alignment  9) % granularity;
   +
   + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
   + tmp = sector_div(tmp, granularity);
   +
   + if (split_sectors  tmp)
   + split_sectors -= tmp;
   +
   + return bio_split(bio, split_sectors, GFP_NOIO, bs);
   +}
  
   This code to stop the discard at the previous aligned sector could be
   the reason why I have 2 device-mapper-test-suite tests in the
   

Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-31 Thread Ming Lin
On Fri, Jul 31, 2015 at 3:02 PM, Ming Lin m...@kernel.org wrote:
 On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
 On Fri, Jul 31 2015 at  5:19pm -0400,
 Ming Lin m...@kernel.org wrote:

  On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote:
   On Mon, Jul 06 2015 at  3:44P -0400,
   Ming Lin m...@kernel.org wrote:
  
   From: Kent Overstreet kent.overstr...@gmail.com
  
   The way the block layer is currently written, it goes to great lengths
   to avoid having to split bios; upper layer code (such as bio_add_page())
   checks what the underlying device can handle and tries to always create
   bios that don't need to be split.
  
   But this approach becomes unwieldy and eventually breaks down with
   stacked devices and devices with dynamic limits, and it adds a lot of
   complexity. If the block layer could split bios as needed, we could
   eliminate a lot of complexity elsewhere - particularly in stacked
   drivers. Code that creates bios can then create whatever size bios are
   convenient, and more importantly stacked drivers don't have to deal with
   both their own bio size limitations and the limitations of the
   (potentially multiple) devices underneath them.  In the future this will
   let us delete merge_bvec_fn and a bunch of other code.
  
   We do this by adding calls to blk_queue_split() to the various
   make_request functions that need it - a few can already handle arbitrary
   size bios. Note that we add the call _after_ any call to
   blk_queue_bounce(); this means that blk_queue_split() and
   blk_recalc_rq_segments() don't need to be concerned with bouncing
   affecting segment merging.
  
   Some make_request_fn() callbacks were simple enough to audit and verify
   they don't need blk_queue_split() calls. The skipped ones are:
  
* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns
  
   Some others are almost certainly safe to remove now, but will be left
   for future patches.
  
   Cc: Jens Axboe ax...@kernel.dk
   Cc: Christoph Hellwig h...@infradead.org
   Cc: Al Viro v...@zeniv.linux.org.uk
   Cc: Ming Lei ming@canonical.com
   Cc: Neil Brown ne...@suse.de
   Cc: Alasdair Kergon a...@redhat.com
   Cc: Mike Snitzer snit...@redhat.com
   Cc: dm-de...@redhat.com
   Cc: Lars Ellenberg drbd-...@lists.linbit.com
   Cc: drbd-u...@lists.linbit.com
   Cc: Jiri Kosina jkos...@suse.cz
   Cc: Geoff Levand ge...@infradead.org
   Cc: Jim Paris j...@jtan.com
   Cc: Joshua Morris josh.h.mor...@us.ibm.com
   Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
   Cc: Minchan Kim minc...@kernel.org
   Cc: Nitin Gupta ngu...@vflare.org
   Cc: Oleg Drokin oleg.dro...@intel.com
   Cc: Andreas Dilger andreas.dil...@intel.com
   Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
   Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
   [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
   Signed-off-by: Dongsu Park dp...@posteo.net
   Signed-off-by: Ming Lin min...@ssi.samsung.com
   ...
   diff --git a/block/blk-merge.c b/block/blk-merge.c
   index 30a0d9f..3707f30 100644
   --- a/block/blk-merge.c
   +++ b/block/blk-merge.c
   @@ -9,12 +9,158 @@
  
#include blk.h
  
   +static struct bio *blk_bio_discard_split(struct request_queue *q,
   +  struct bio *bio,
   +  struct bio_set *bs)
   +{
   + unsigned int max_discard_sectors, granularity;
   + int alignment;
   + sector_t tmp;
   + unsigned split_sectors;
   +
   + /* Zero-sector (unknown) and one-sector granularities are the 
   same.  */
   + granularity = max(q-limits.discard_granularity  9, 1U);
   +
   + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 
9);
   + max_discard_sectors -= max_discard_sectors % granularity;
   +
   + if (unlikely(!max_discard_sectors)) {
   + /* XXX: warn */
   + return NULL;
   + }
   +
   + if (bio_sectors(bio) = max_discard_sectors)
   + return NULL;
   +
   + split_sectors = max_discard_sectors;
   +
   + /*
   +  * If the next starting sector would be misaligned, stop the 
   discard at
   +  * the previous aligned sector.
   +  */
   + alignment = (q-limits.discard_alignment  9) % granularity;
   +
   + tmp = bio-bi_iter.bi_sector + split_sectors - alignment;
   + tmp = sector_div(tmp, granularity);
   +
   + if (split_sectors  tmp)
   + split_sectors -= tmp;
   +
   + return bio_split(bio, split_sectors, GFP_NOIO, bs);
   +}
  
   This code to stop the discard at the previous aligned sector could be
   the reason why I 

[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-06 Thread Ming Lin
From: Kent Overstreet 

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: Ming Lei 
Cc: Neil Brown 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg 
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina 
Cc: Geoff Levand 
Cc: Jim Paris 
Cc: Joshua Morris 
Cc: Philip Kelleher 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Oleg Drokin 
Cc: Andreas Dilger 
Acked-by: NeilBrown  (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet 
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 block/blk-core.c|  19 ++--
 block/blk-merge.c   | 159 ++--
 block/blk-mq.c  |   4 +
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_req.c   |   2 +
 drivers/block/pktcdvd.c |   6 +-
 drivers/block/ps3vram.c |   2 +
 drivers/block/rsxx/dev.c|   2 +
 drivers/block/umem.c|   2 +
 drivers/block/zram/zram_drv.c   |   2 +
 drivers/md/dm.c |   2 +
 drivers/md/md.c |   2 +
 drivers/s390/block/dcssblk.c|   2 +
 drivers/s390/block/xpram.c  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h  |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (q->id < 0)
goto fail_q;
 
+   q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+   if (!q->bio_split)
+   goto fail_id;
+
q->backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
err = bdi_init(>backing_dev_info);
if (err)
-   goto fail_id;
+   goto fail_split;
 
setup_timer(>backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
 fail_bdi:
bdi_destroy(>backing_dev_info);
+fail_split:
+   bioset_free(q->bio_split);
 fail_id:
ida_simple_remove(_queue_ida, q->id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct 
bio *bio)
struct request *req;
unsigned int request_count = 0;
 
+   blk_queue_split(q, , q->bio_split);
+
/*
 * low level driver can indicate that it wants pages above a
 * certain limit bounced to low memory (ie for highmem, or even
@@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
-   if (likely(bio_is_rw(bio) &&
-  nr_sectors > 

[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-06 Thread mlin
From: Kent Overstreet 

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: Ming Lei 
Cc: Neil Brown 
Cc: Alasdair Kergon 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg 
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina 
Cc: Geoff Levand 
Cc: Jim Paris 
Cc: Joshua Morris 
Cc: Philip Kelleher 
Cc: Minchan Kim 
Cc: Nitin Gupta 
Cc: Oleg Drokin 
Cc: Andreas Dilger 
Acked-by: NeilBrown  (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet 
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park 
Signed-off-by: Ming Lin 
---
 block/blk-core.c|  19 ++--
 block/blk-merge.c   | 159 ++--
 block/blk-mq.c  |   4 +
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_req.c   |   2 +
 drivers/block/pktcdvd.c |   6 +-
 drivers/block/ps3vram.c |   2 +
 drivers/block/rsxx/dev.c|   2 +
 drivers/block/umem.c|   2 +
 drivers/block/zram/zram_drv.c   |   2 +
 drivers/md/dm.c |   2 +
 drivers/md/md.c |   2 +
 drivers/s390/block/dcssblk.c|   2 +
 drivers/s390/block/xpram.c  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h  |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (q->id < 0)
goto fail_q;
 
+   q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
+   if (!q->bio_split)
+   goto fail_id;
+
q->backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
err = bdi_init(>backing_dev_info);
if (err)
-   goto fail_id;
+   goto fail_split;
 
setup_timer(>backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
 fail_bdi:
bdi_destroy(>backing_dev_info);
+fail_split:
+   bioset_free(q->bio_split);
 fail_id:
ida_simple_remove(_queue_ida, q->id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct 
bio *bio)
struct request *req;
unsigned int request_count = 0;
 
+   blk_queue_split(q, , q->bio_split);
+
/*
 * low level driver can indicate that it wants pages above a
 * certain limit bounced to low memory (ie for highmem, or even
@@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
-   if (likely(bio_is_rw(bio) &&
-  nr_sectors > 

[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-06 Thread Ming Lin
From: Kent Overstreet kent.overstr...@gmail.com

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe ax...@kernel.dk
Cc: Christoph Hellwig h...@infradead.org
Cc: Al Viro v...@zeniv.linux.org.uk
Cc: Ming Lei ming@canonical.com
Cc: Neil Brown ne...@suse.de
Cc: Alasdair Kergon a...@redhat.com
Cc: Mike Snitzer snit...@redhat.com
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg drbd-...@lists.linbit.com
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina jkos...@suse.cz
Cc: Geoff Levand ge...@infradead.org
Cc: Jim Paris j...@jtan.com
Cc: Joshua Morris josh.h.mor...@us.ibm.com
Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
Cc: Minchan Kim minc...@kernel.org
Cc: Nitin Gupta ngu...@vflare.org
Cc: Oleg Drokin oleg.dro...@intel.com
Cc: Andreas Dilger andreas.dil...@intel.com
Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-core.c|  19 ++--
 block/blk-merge.c   | 159 ++--
 block/blk-mq.c  |   4 +
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_req.c   |   2 +
 drivers/block/pktcdvd.c |   6 +-
 drivers/block/ps3vram.c |   2 +
 drivers/block/rsxx/dev.c|   2 +
 drivers/block/umem.c|   2 +
 drivers/block/zram/zram_drv.c   |   2 +
 drivers/md/dm.c |   2 +
 drivers/md/md.c |   2 +
 drivers/s390/block/dcssblk.c|   2 +
 drivers/s390/block/xpram.c  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h  |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (q-id  0)
goto fail_q;
 
+   q-bio_split = bioset_create(BIO_POOL_SIZE, 0);
+   if (!q-bio_split)
+   goto fail_id;
+
q-backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q-backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
err = bdi_init(q-backing_dev_info);
if (err)
-   goto fail_id;
+   goto fail_split;
 
setup_timer(q-backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
 fail_bdi:
bdi_destroy(q-backing_dev_info);
+fail_split:
+   bioset_free(q-bio_split);
 fail_id:
ida_simple_remove(blk_queue_ida, q-id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct 
bio *bio)
struct request 

[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-07-06 Thread mlin
From: Kent Overstreet kent.overstr...@gmail.com

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.  In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

 * nfhd_make_request (arch/m68k/emu/nfblock.c)
 * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
 * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
 * brd_make_request (ramdisk - drivers/block/brd.c)
 * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
 * loop_make_request
 * null_queue_bio
 * bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe ax...@kernel.dk
Cc: Christoph Hellwig h...@infradead.org
Cc: Al Viro v...@zeniv.linux.org.uk
Cc: Ming Lei ming@canonical.com
Cc: Neil Brown ne...@suse.de
Cc: Alasdair Kergon a...@redhat.com
Cc: Mike Snitzer snit...@redhat.com
Cc: dm-de...@redhat.com
Cc: Lars Ellenberg drbd-...@lists.linbit.com
Cc: drbd-u...@lists.linbit.com
Cc: Jiri Kosina jkos...@suse.cz
Cc: Geoff Levand ge...@infradead.org
Cc: Jim Paris j...@jtan.com
Cc: Joshua Morris josh.h.mor...@us.ibm.com
Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com
Cc: Minchan Kim minc...@kernel.org
Cc: Nitin Gupta ngu...@vflare.org
Cc: Oleg Drokin oleg.dro...@intel.com
Cc: Andreas Dilger andreas.dil...@intel.com
Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits)
Signed-off-by: Kent Overstreet kent.overstr...@gmail.com
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park dp...@posteo.net
Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-core.c|  19 ++--
 block/blk-merge.c   | 159 ++--
 block/blk-mq.c  |   4 +
 block/blk-sysfs.c   |   3 +
 drivers/block/drbd/drbd_req.c   |   2 +
 drivers/block/pktcdvd.c |   6 +-
 drivers/block/ps3vram.c |   2 +
 drivers/block/rsxx/dev.c|   2 +
 drivers/block/umem.c|   2 +
 drivers/block/zram/zram_drv.c   |   2 +
 drivers/md/dm.c |   2 +
 drivers/md/md.c |   2 +
 drivers/s390/block/dcssblk.c|   2 +
 drivers/s390/block/xpram.c  |   2 +
 drivers/staging/lustre/lustre/llite/lloop.c |   2 +
 include/linux/blkdev.h  |   3 +
 16 files changed, 192 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 82819e6..cecf80c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (q-id  0)
goto fail_q;
 
+   q-bio_split = bioset_create(BIO_POOL_SIZE, 0);
+   if (!q-bio_split)
+   goto fail_id;
+
q-backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q-backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK;
@@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
err = bdi_init(q-backing_dev_info);
if (err)
-   goto fail_id;
+   goto fail_split;
 
setup_timer(q-backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
@@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
 
 fail_bdi:
bdi_destroy(q-backing_dev_info);
+fail_split:
+   bioset_free(q-bio_split);
 fail_id:
ida_simple_remove(blk_queue_ida, q-id);
 fail_q:
@@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct 
bio *bio)
struct request