Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 11:02 AM, Kent Overstreet wrote: > On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: >> Wouldn't it be easier to move both max_write_same_sectors and >> max_discard sectors to 64 bit (ie to type sector_t) and be done with the >> overflow? >> Seems to me this is far too much coding around self-imposed restrictions... > > It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I > suppose wouldn't actually increase the size of struct bio (when sector_t is 64 > bits), since struct bvec_iter has padding right now... > Which I guess we should be doing anyway. Devices are getting larger and larger, so in the long run in need to be moved to 64 bits. Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 11:02 AM, Kent Overstreet wrote: On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: Wouldn't it be easier to move both max_write_same_sectors and max_discard sectors to 64 bit (ie to type sector_t) and be done with the overflow? Seems to me this is far too much coding around self-imposed restrictions... It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... Which I guess we should be doing anyway. Devices are getting larger and larger, so in the long run in need to be moved to 64 bits. Cheers, Hannes -- Dr. Hannes ReineckezSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-11 at 20:24 -0400, Martin K. Petersen wrote: > > "Ming" == Ming Lin writes: > > Ming> Do you still agree we cap discard to 2G as an interim solution? > > I can live with the 2G cap for 4.3 but would like to see it fixed > properly in 4.4. Sure. I'd like to work on the fix. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Ming" == Ming Lin writes: Ming> Do you still agree we cap discard to 2G as an interim solution? I can live with the 2G cap for 4.3 but would like to see it fixed properly in 4.4. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-11 at 14:05 -0400, Martin K. Petersen wrote: > > "Martin" == Martin K Petersen writes: > > Martin> I agree except I really don't want to lop off anything unless > Martin> the device locks up if we send it partial blocks. There was an > Martin> array that had problems a while back but I believe they have > Martin> been fixed. > > Oh, and there are several arrays out there that have allocation units > that are not powers of two. *sigh* Do you still agree we cap discard to 2G as an interim solution? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=block-generic-req=39d2f5f If OK, I'll send out new version asking Jens to apply. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Martin" == Martin K Petersen writes: Martin> I agree except I really don't want to lop off anything unless Martin> the device locks up if we send it partial blocks. There was an Martin> array that had problems a while back but I believe they have Martin> been fixed. Oh, and there are several arrays out there that have allocation units that are not powers of two. *sigh* -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> That is the benefit. And when coupled with the new default Mike> max_discard of 64K (pending change from Jens for 4.3) this 2GB Mike> upper limit really isn't such a big deal. Unless I'm missing Mike> something... 2GB is fine for current SATA due to the stupid range descriptors. But there are some changes in the pipeline to support descriptors with bigger ranges so that will change. However, we currently do 4GB discards on SCSI so the proposed cap will cut that in half. I'm OK with that as an interim solution. though. But I'm a bit concerned about what might be lurking in dm thinp if you trip over partial blocks like in Ming's example... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Kent" == Kent Overstreet writes: Kent> This kind of logic really doesn't belong in dm Well it does in this case since the thinp personality actually provisions and unprovisions space. But there is a difference between what dm thinp acts on for its own internal provisioning purposes and what it passes down the stack. I am really against dropping information anywhere along the path. We don't round off read/write requests either. The queue limits were meant as hints to mkfs.* so that on-disk data structures could be laid out in an aligned and storage friendly way. I never intended for the hints to affect runtime behavior. Kent> IMO though it belongs in the driver - if a discard needs to be Kent> dropped because it's too small and the hardware can't do it, that Kent> should be the driver's responsibility. I agree except I really don't want to lop off anything unless the device locks up if we send it partial blocks. There was an array that had problems a while back but I believe they have been fixed. The fundamental premise should be that we pass as comprehensive information as we can. And the device can then decide to ignore all or parts of the request. That's fundamentally how things work at the protocol level in both SCSI and SATA. I don't see any reason why the Linux I/O stack should behave in a different manner. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, Aug 11 2015 at 1:36pm -0400, Martin K. Petersen wrote: > > "Mike" == Mike Snitzer writes: > > Mike> DM-thinp processes discards internally before it passes them down > Mike> (if configured to do so). If a discard is smaller than the > Mike> granularity of a thinp block (whose size is configurable) or if > Mike> the start and end of the discard's extent is misaligned (relative > Mike> to the thinp blocks mapped to the logical extent) then the discard > Mike> won't actually discard partial thinp blocks. > > That's fine. You can throw away anything you don't like as long as > discard_zeroes_data=0. Correct, dm-thinp sets discard_zeroes_data=0 > But I don't understand why having an artificial cap at 2GB fixes > things. Other than making it less likely for you to receive a runt by > virtue of being aligned to a power of two. That is the benefit. And when coupled with the new default max_discard of 64K (pending change from Jens for 4.3) this 2GB upper limit really isn't such a big deal. Unless I'm missing something... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> DM-thinp processes discards internally before it passes them down Mike> (if configured to do so). If a discard is smaller than the Mike> granularity of a thinp block (whose size is configurable) or if Mike> the start and end of the discard's extent is misaligned (relative Mike> to the thinp blocks mapped to the logical extent) then the discard Mike> won't actually discard partial thinp blocks. That's fine. You can throw away anything you don't like as long as discard_zeroes_data=0. But I don't understand why having an artificial cap at 2GB fixes things. Other than making it less likely for you to receive a runt by virtue of being aligned to a power of two. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 11:38pm -0400, Kent Overstreet wrote: > On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote: > > On Mon, Aug 10 2015 at 10:00pm -0400, > > Martin K. Petersen wrote: > > > > > > "Ming" == Ming Lin writes: > > > > > > Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()? > > > > > > Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use > > > Ming> 1<<31. > > > > > > I'm not sure why things are not working for dm-thinp. Presumably Kent's > > > code would split the discard at a granularity boundary so why would that > > > cause problems for dm? > > > > DM-thinp processes discards internally before it passes them down (if > > configured to do so). If a discard is smaller than the granularity of a > > thinp block (whose size is configurable) or if the start and end of the > > discard's extent is misaligned (relative to the thinp blocks mapped to > > the logical extent) then the discard won't actually discard partial > > thinp blocks. > > This kind of logic really doesn't belong in dm - if it's needed, it really > belongs in bio_split() (which is supposed to work correctly for discards - so > if > it is needed, then bio_split() needs fixing...) DM thinp does advertise discard_granularity that reflects the thinp blocksize. blk_queue_split() does look like it'd do the right thing. But the splitting that DM thinp is doing is a long standing implementation (in DM core) that will need to be carefully reviewed/rewritten. We can tackle it after all this late splitting code lands. > IMO though it belongs in the driver - if a discard needs to be dropped because > it's too small and the hardware can't do it, that should be the driver's > responsibility. This isn't about the hardware's limits. This is about the intermediate remapping/stacking driver's own space management hooking off of the discard bio too. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-11 at 14:05 -0400, Martin K. Petersen wrote: Martin == Martin K Petersen martin.peter...@oracle.com writes: Martin I agree except I really don't want to lop off anything unless Martin the device locks up if we send it partial blocks. There was an Martin array that had problems a while back but I believe they have Martin been fixed. Oh, and there are several arrays out there that have allocation units that are not powers of two. *sigh* Do you still agree we cap discard to 2G as an interim solution? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/commit/?h=block-generic-reqid=39d2f5f If OK, I'll send out new version asking Jens to apply. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 11:38pm -0400, Kent Overstreet kent.overstr...@gmail.com wrote: On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote: On Mon, Aug 10 2015 at 10:00pm -0400, Martin K. Petersen martin.peter...@oracle.com wrote: Ming == Ming Lin m...@kernel.org writes: Ming Did you mean still use (UINT_MAX 9) in blkdev_issue_discard()? Ming But that doesn't work for dm-thinp. See Kent's suggestion to use Ming 131. I'm not sure why things are not working for dm-thinp. Presumably Kent's code would split the discard at a granularity boundary so why would that cause problems for dm? DM-thinp processes discards internally before it passes them down (if configured to do so). If a discard is smaller than the granularity of a thinp block (whose size is configurable) or if the start and end of the discard's extent is misaligned (relative to the thinp blocks mapped to the logical extent) then the discard won't actually discard partial thinp blocks. This kind of logic really doesn't belong in dm - if it's needed, it really belongs in bio_split() (which is supposed to work correctly for discards - so if it is needed, then bio_split() needs fixing...) DM thinp does advertise discard_granularity that reflects the thinp blocksize. blk_queue_split() does look like it'd do the right thing. But the splitting that DM thinp is doing is a long standing implementation (in DM core) that will need to be carefully reviewed/rewritten. We can tackle it after all this late splitting code lands. IMO though it belongs in the driver - if a discard needs to be dropped because it's too small and the hardware can't do it, that should be the driver's responsibility. This isn't about the hardware's limits. This is about the intermediate remapping/stacking driver's own space management hooking off of the discard bio too. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, Aug 11 2015 at 1:36pm -0400, Martin K. Petersen martin.peter...@oracle.com wrote: Mike == Mike Snitzer snit...@redhat.com writes: Mike DM-thinp processes discards internally before it passes them down Mike (if configured to do so). If a discard is smaller than the Mike granularity of a thinp block (whose size is configurable) or if Mike the start and end of the discard's extent is misaligned (relative Mike to the thinp blocks mapped to the logical extent) then the discard Mike won't actually discard partial thinp blocks. That's fine. You can throw away anything you don't like as long as discard_zeroes_data=0. Correct, dm-thinp sets discard_zeroes_data=0 But I don't understand why having an artificial cap at 2GB fixes things. Other than making it less likely for you to receive a runt by virtue of being aligned to a power of two. That is the benefit. And when coupled with the new default max_discard of 64K (pending change from Jens for 4.3) this 2GB upper limit really isn't such a big deal. Unless I'm missing something... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Martin == Martin K Petersen martin.peter...@oracle.com writes: Martin I agree except I really don't want to lop off anything unless Martin the device locks up if we send it partial blocks. There was an Martin array that had problems a while back but I believe they have Martin been fixed. Oh, and there are several arrays out there that have allocation units that are not powers of two. *sigh* -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike That is the benefit. And when coupled with the new default Mike max_discard of 64K (pending change from Jens for 4.3) this 2GB Mike upper limit really isn't such a big deal. Unless I'm missing Mike something... 2GB is fine for current SATA due to the stupid range descriptors. But there are some changes in the pipeline to support descriptors with bigger ranges so that will change. However, we currently do 4GB discards on SCSI so the proposed cap will cut that in half. I'm OK with that as an interim solution. though. But I'm a bit concerned about what might be lurking in dm thinp if you trip over partial blocks like in Ming's example... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike DM-thinp processes discards internally before it passes them down Mike (if configured to do so). If a discard is smaller than the Mike granularity of a thinp block (whose size is configurable) or if Mike the start and end of the discard's extent is misaligned (relative Mike to the thinp blocks mapped to the logical extent) then the discard Mike won't actually discard partial thinp blocks. That's fine. You can throw away anything you don't like as long as discard_zeroes_data=0. But I don't understand why having an artificial cap at 2GB fixes things. Other than making it less likely for you to receive a runt by virtue of being aligned to a power of two. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Kent == Kent Overstreet kent.overstr...@gmail.com writes: Kent This kind of logic really doesn't belong in dm Well it does in this case since the thinp personality actually provisions and unprovisions space. But there is a difference between what dm thinp acts on for its own internal provisioning purposes and what it passes down the stack. I am really against dropping information anywhere along the path. We don't round off read/write requests either. The queue limits were meant as hints to mkfs.* so that on-disk data structures could be laid out in an aligned and storage friendly way. I never intended for the hints to affect runtime behavior. Kent IMO though it belongs in the driver - if a discard needs to be Kent dropped because it's too small and the hardware can't do it, that Kent should be the driver's responsibility. I agree except I really don't want to lop off anything unless the device locks up if we send it partial blocks. There was an array that had problems a while back but I believe they have been fixed. The fundamental premise should be that we pass as comprehensive information as we can. And the device can then decide to ignore all or parts of the request. That's fundamentally how things work at the protocol level in both SCSI and SATA. I don't see any reason why the Linux I/O stack should behave in a different manner. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Ming == Ming Lin m...@kernel.org writes: Ming Do you still agree we cap discard to 2G as an interim solution? I can live with the 2G cap for 4.3 but would like to see it fixed properly in 4.4. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-11 at 20:24 -0400, Martin K. Petersen wrote: Ming == Ming Lin m...@kernel.org writes: Ming Do you still agree we cap discard to 2G as an interim solution? I can live with the 2G cap for 4.3 but would like to see it fixed properly in 4.4. Sure. I'd like to work on the fix. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote: > On Mon, Aug 10 2015 at 10:00pm -0400, > Martin K. Petersen wrote: > > > > "Ming" == Ming Lin writes: > > > > Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()? > > > > Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use > > Ming> 1<<31. > > > > I'm not sure why things are not working for dm-thinp. Presumably Kent's > > code would split the discard at a granularity boundary so why would that > > cause problems for dm? > > DM-thinp processes discards internally before it passes them down (if > configured to do so). If a discard is smaller than the granularity of a > thinp block (whose size is configurable) or if the start and end of the > discard's extent is misaligned (relative to the thinp blocks mapped to > the logical extent) then the discard won't actually discard partial > thinp blocks. This kind of logic really doesn't belong in dm - if it's needed, it really belongs in bio_split() (which is supposed to work correctly for discards - so if it is needed, then bio_split() needs fixing...) IMO though it belongs in the driver - if a discard needs to be dropped because it's too small and the hardware can't do it, that should be the driver's responsibility. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 10:00pm -0400, Martin K. Petersen wrote: > > "Ming" == Ming Lin writes: > > Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()? > > Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use > Ming> 1<<31. > > I'm not sure why things are not working for dm-thinp. Presumably Kent's > code would split the discard at a granularity boundary so why would that > cause problems for dm? DM-thinp processes discards internally before it passes them down (if configured to do so). If a discard is smaller than the granularity of a thinp block (whose size is configurable) or if the start and end of the discard's extent is misaligned (relative to the thinp blocks mapped to the logical extent) then the discard won't actually discard partial thinp blocks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Ming" == Ming Lin writes: Ming> Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()? Ming> But that doesn't work for dm-thinp. See Kent's suggestion to use Ming> 1<<31. I'm not sure why things are not working for dm-thinp. Presumably Kent's code would split the discard at a granularity boundary so why would that cause problems for dm? In looking at this I just found out that we'll corrupt data on certain SCSI configs with the granularity enforcement in place. I'll have to conjure up a fix for that... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10, 2015 at 11:13 AM, Mike Snitzer wrote: > On Mon, Aug 10 2015 at 12:14pm -0400, > Ming Lin wrote: > >> On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: >> > >> > Aside from that, I'm in favor of seeing this late bio splitting patchset >> > finally land upstream (hopefully in time for the 4.3 merge, Jens?): >> > >> > Acked-by: Mike Snitzer >> >> Thanks! >> >> May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely" >> also? > > Sure, but please fold in the removal of dm.c comments I made in this > merge commit: > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting=d6df875bb65ef1ee10c91cf09cb58d009286321f Rebased on top of latest Linus tree(4.2-rc6+). https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/diff/drivers/md/dm.c?h=block-generic-req=3dd8509a3f05152cca83bc41c37a8ba3e9119736 > > thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 12:22 -0400, Martin K. Petersen wrote: > > "Mike" == Mike Snitzer writes: > > Mike> Shouldn't we also be using MAX_BIO_SECTORS in > Mike> blkdev_issue_write_same (instead of UINT_MAX >> 9)? > > The granularity of WRITE SAME is always 1 logical block and there is no > reason to round it down to the next power of two. > > +/* > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully > + * it is of the proper granularity as long as the granularity is a power > + * of two. > + */ > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) > + > > That's fine for SATA since we're already capping at 2TB minus change. > But it means we'll be capping unnecessarily on SCSI. And larger range > counts are impending in SATA as well. > > So this goes back to my original comment: The only place there can be a > discard granularity is for SCSI UNMAP. And consequently, we should only > enforce alignment and granularity when that code path is taken in sd.c. > > I'm OK with Ming's patch series in general. Let's leave the discard cap > at UINT_MAX and I'll twiddle the rest in SCSI. Just to make sure I didn't misunderstand it. Did you mean still use (UINT_MAX >> 9) in blkdev_issue_discard()? req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9); instead of: req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); But that doesn't work for dm-thinp. See Kent's suggestion to use 1<<31. https://www.redhat.com/archives/dm-devel/2015-August/msg00053.html > > Reviewed-by: Martin K. Petersen > Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 12:14pm -0400, Ming Lin wrote: > On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: > > > > Aside from that, I'm in favor of seeing this late bio splitting patchset > > finally land upstream (hopefully in time for the 4.3 merge, Jens?): > > > > Acked-by: Mike Snitzer > > Thanks! > > May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely" > also? Sure, but please fold in the removal of dm.c comments I made in this merge commit: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splitting=d6df875bb65ef1ee10c91cf09cb58d009286321f thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Ming" == Ming Lin writes: Ming, Ming> I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non Ming> page size aligned splits. This does not matter for write same and discard since there is only a single logical block of payload. Also, given limitations in SATA we're always issuing 2GB-32KB sized discards. Rounding those down to an even 1GB would impact performance. I am sympathetic to wanting to issue I/Os that are aligned to powers of two. But for most devices this matters little since the additional cost is limited to misaligned head and tail blocks. One thing that may be worth considering is switching bi_size from bytes to blocks for REQ_FS. That would give us some headroom without increasing bi_size beyond 32 bits. But I'm not entirely sure it's worth the pain. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> Shouldn't we also be using MAX_BIO_SECTORS in Mike> blkdev_issue_write_same (instead of UINT_MAX >> 9)? The granularity of WRITE SAME is always 1 logical block and there is no reason to round it down to the next power of two. +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) + That's fine for SATA since we're already capping at 2TB minus change. But it means we'll be capping unnecessarily on SCSI. And larger range counts are impending in SATA as well. So this goes back to my original comment: The only place there can be a discard granularity is for SCSI UNMAP. And consequently, we should only enforce alignment and granularity when that code path is taken in sd.c. I'm OK with Ming's patch series in general. Let's leave the discard cap at UINT_MAX and I'll twiddle the rest in SCSI. Reviewed-by: Martin K. Petersen -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 09:14 -0700, Ming Lin wrote: > On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: > > On Sun, Aug 09 2015 at 3:18am -0400, > > Ming Lin wrote: > > > > > On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: > > > > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > > > > > Will change it to MAX_BIO_SECTORS. > > > > > May I add your ACK? > > > > > > > > Yes, please go ahead. > > > > > > Thanks. I'll send a new version of the series once device-mapper guy > > > acks. > > > > > > Hi Mike, > > > > > > I have updated my tree. Could you pull and re-test? > > > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req > > > > > > The 2 thin-provisioning tests passed. > > > > I've merged your latest branch with my dm-4.3 branch, I had one conflict > > in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no > > surprise). I've published the result here: > > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting > > > > It passes the device-mapper-test-suite's 'thin-provisioning' tests. > > > > > Hope I can have your ACK soon. > > > > Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same > > (instead of UINT_MAX >> 9)? > > I also prefer using MAX_BIO_SECTORS. > Otherwise, we may have non page size aligned splits. > > Say, write_same 8G. > > Using UINT_MAX >> 9, we'll have 2 sector aligned splits in > blkdev_issue_write_same(): > 0 - (4G - 512 - 1) > (4G - 512, 8G -1) Actually, 3 sector aligned splits. 0 - (4G-512-1) (4G-512), (8G-512-1) (8G-512), (8G-1) > > This looks weired. > > Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits: > 0 - (2G -1) > 2G - (4G - 1) > 4G - (6G - 1) > 6G - (8G - 1) > > I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection. > > > > > Aside from that, I'm in favor of seeing this late bio splitting patchset > > finally land upstream (hopefully in time for the 4.3 merge, Jens?): > > > > Acked-by: Mike Snitzer > > Thanks! > > May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely" > also? > > > > > p.s. I'll be working with Joe Thornber on optimizing DM (particularly > > dm-thinp and dm-cache) once this patchset is included upstream. You'll > > see I've already added a couple WIP dm-thinp patches ontop. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: > On Sun, Aug 09 2015 at 3:18am -0400, > Ming Lin wrote: > > > On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: > > > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > > > > Will change it to MAX_BIO_SECTORS. > > > > May I add your ACK? > > > > > > Yes, please go ahead. > > > > Thanks. I'll send a new version of the series once device-mapper guy > > acks. > > > > Hi Mike, > > > > I have updated my tree. Could you pull and re-test? > > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req > > > > The 2 thin-provisioning tests passed. > > I've merged your latest branch with my dm-4.3 branch, I had one conflict > in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no > surprise). I've published the result here: > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting > > It passes the device-mapper-test-suite's 'thin-provisioning' tests. > > > Hope I can have your ACK soon. > > Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same > (instead of UINT_MAX >> 9)? I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non page size aligned splits. Say, write_same 8G. Using UINT_MAX >> 9, we'll have 2 sector aligned splits in blkdev_issue_write_same(): 0 - (4G - 512 - 1) (4G - 512, 8G -1) This looks weired. Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits: 0 - (2G -1) 2G - (4G - 1) 4G - (6G - 1) 6G - (8G - 1) I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection. > > Aside from that, I'm in favor of seeing this late bio splitting patchset > finally land upstream (hopefully in time for the 4.3 merge, Jens?): > > Acked-by: Mike Snitzer Thanks! May I add your Ack to "PATCH: block: kill merge_bvec_fn() completely" also? > > p.s. I'll be working with Joe Thornber on optimizing DM (particularly > dm-thinp and dm-cache) once this patchset is included upstream. You'll > see I've already added a couple WIP dm-thinp patches ontop. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, Aug 09 2015 at 3:18am -0400, Ming Lin wrote: > On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: > > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > > > Will change it to MAX_BIO_SECTORS. > > > May I add your ACK? > > > > Yes, please go ahead. > > Thanks. I'll send a new version of the series once device-mapper guy > acks. > > Hi Mike, > > I have updated my tree. Could you pull and re-test? > https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req > > The 2 thin-provisioning tests passed. I've merged your latest branch with my dm-4.3 branch, I had one conflict in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no surprise). I've published the result here: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting It passes the device-mapper-test-suite's 'thin-provisioning' tests. > Hope I can have your ACK soon. Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same (instead of UINT_MAX >> 9)? Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer p.s. I'll be working with Joe Thornber on optimizing DM (particularly dm-thinp and dm-cache) once this patchset is included upstream. You'll see I've already added a couple WIP dm-thinp patches ontop. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 09:14 -0700, Ming Lin wrote: On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: On Sun, Aug 09 2015 at 3:18am -0400, Ming Lin m...@kernel.org wrote: On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. I've merged your latest branch with my dm-4.3 branch, I had one conflict in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no surprise). I've published the result here: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting It passes the device-mapper-test-suite's 'thin-provisioning' tests. Hope I can have your ACK soon. Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same (instead of UINT_MAX 9)? I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non page size aligned splits. Say, write_same 8G. Using UINT_MAX 9, we'll have 2 sector aligned splits in blkdev_issue_write_same(): 0 - (4G - 512 - 1) (4G - 512, 8G -1) Actually, 3 sector aligned splits. 0 - (4G-512-1) (4G-512), (8G-512-1) (8G-512), (8G-1) This looks weired. Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits: 0 - (2G -1) 2G - (4G - 1) 4G - (6G - 1) 6G - (8G - 1) I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection. Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer snit...@redhat.com Thanks! May I add your Ack to PATCH: block: kill merge_bvec_fn() completely also? p.s. I'll be working with Joe Thornber on optimizing DM (particularly dm-thinp and dm-cache) once this patchset is included upstream. You'll see I've already added a couple WIP dm-thinp patches ontop. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, Aug 09 2015 at 3:18am -0400, Ming Lin m...@kernel.org wrote: On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. I've merged your latest branch with my dm-4.3 branch, I had one conflict in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no surprise). I've published the result here: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting It passes the device-mapper-test-suite's 'thin-provisioning' tests. Hope I can have your ACK soon. Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same (instead of UINT_MAX 9)? Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer snit...@redhat.com p.s. I'll be working with Joe Thornber on optimizing DM (particularly dm-thinp and dm-cache) once this patchset is included upstream. You'll see I've already added a couple WIP dm-thinp patches ontop. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: On Sun, Aug 09 2015 at 3:18am -0400, Ming Lin m...@kernel.org wrote: On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. I've merged your latest branch with my dm-4.3 branch, I had one conflict in the merge due to the dm_merge_bvec() change from 4.2-rc6 (no surprise). I've published the result here: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=block-late-bio-splitting It passes the device-mapper-test-suite's 'thin-provisioning' tests. Hope I can have your ACK soon. Shouldn't we also be using MAX_BIO_SECTORS in blkdev_issue_write_same (instead of UINT_MAX 9)? I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non page size aligned splits. Say, write_same 8G. Using UINT_MAX 9, we'll have 2 sector aligned splits in blkdev_issue_write_same(): 0 - (4G - 512 - 1) (4G - 512, 8G -1) This looks weired. Using MAX_BIO_SECTORS, we'll have 4 page size aligned splits: 0 - (2G -1) 2G - (4G - 1) 4G - (6G - 1) 6G - (8G - 1) I'll use MAX_BIO_SECTORS in blkdev_issue_write_same() if no objection. Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer snit...@redhat.com Thanks! May I add your Ack to PATCH: block: kill merge_bvec_fn() completely also? p.s. I'll be working with Joe Thornber on optimizing DM (particularly dm-thinp and dm-cache) once this patchset is included upstream. You'll see I've already added a couple WIP dm-thinp patches ontop. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike Shouldn't we also be using MAX_BIO_SECTORS in Mike blkdev_issue_write_same (instead of UINT_MAX 9)? The granularity of WRITE SAME is always 1 logical block and there is no reason to round it down to the next power of two. +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) + That's fine for SATA since we're already capping at 2TB minus change. But it means we'll be capping unnecessarily on SCSI. And larger range counts are impending in SATA as well. So this goes back to my original comment: The only place there can be a discard granularity is for SCSI UNMAP. And consequently, we should only enforce alignment and granularity when that code path is taken in sd.c. I'm OK with Ming's patch series in general. Let's leave the discard cap at UINT_MAX and I'll twiddle the rest in SCSI. Reviewed-by: Martin K. Petersen martin.peter...@oracle.com -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 12:14pm -0400, Ming Lin m...@kernel.org wrote: On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer snit...@redhat.com Thanks! May I add your Ack to PATCH: block: kill merge_bvec_fn() completely also? Sure, but please fold in the removal of dm.c comments I made in this merge commit: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splittingid=d6df875bb65ef1ee10c91cf09cb58d009286321f thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, 2015-08-10 at 12:22 -0400, Martin K. Petersen wrote: Mike == Mike Snitzer snit...@redhat.com writes: Mike Shouldn't we also be using MAX_BIO_SECTORS in Mike blkdev_issue_write_same (instead of UINT_MAX 9)? The granularity of WRITE SAME is always 1 logical block and there is no reason to round it down to the next power of two. +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) + That's fine for SATA since we're already capping at 2TB minus change. But it means we'll be capping unnecessarily on SCSI. And larger range counts are impending in SATA as well. So this goes back to my original comment: The only place there can be a discard granularity is for SCSI UNMAP. And consequently, we should only enforce alignment and granularity when that code path is taken in sd.c. I'm OK with Ming's patch series in general. Let's leave the discard cap at UINT_MAX and I'll twiddle the rest in SCSI. Just to make sure I didn't misunderstand it. Did you mean still use (UINT_MAX 9) in blkdev_issue_discard()? req_sects = min_t(sector_t, nr_sects, UINT_MAX 9); instead of: req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); But that doesn't work for dm-thinp. See Kent's suggestion to use 131. https://www.redhat.com/archives/dm-devel/2015-August/msg00053.html Reviewed-by: Martin K. Petersen martin.peter...@oracle.com Thanks! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Ming == Ming Lin m...@kernel.org writes: Ming, Ming I also prefer using MAX_BIO_SECTORS. Otherwise, we may have non Ming page size aligned splits. This does not matter for write same and discard since there is only a single logical block of payload. Also, given limitations in SATA we're always issuing 2GB-32KB sized discards. Rounding those down to an even 1GB would impact performance. I am sympathetic to wanting to issue I/Os that are aligned to powers of two. But for most devices this matters little since the additional cost is limited to misaligned head and tail blocks. One thing that may be worth considering is switching bi_size from bytes to blocks for REQ_FS. That would give us some headroom without increasing bi_size beyond 32 bits. But I'm not entirely sure it's worth the pain. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10, 2015 at 11:13 AM, Mike Snitzer snit...@redhat.com wrote: On Mon, Aug 10 2015 at 12:14pm -0400, Ming Lin m...@kernel.org wrote: On Mon, 2015-08-10 at 11:02 -0400, Mike Snitzer wrote: Aside from that, I'm in favor of seeing this late bio splitting patchset finally land upstream (hopefully in time for the 4.3 merge, Jens?): Acked-by: Mike Snitzer snit...@redhat.com Thanks! May I add your Ack to PATCH: block: kill merge_bvec_fn() completely also? Sure, but please fold in the removal of dm.c comments I made in this merge commit: http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=block-late-bio-splittingid=d6df875bb65ef1ee10c91cf09cb58d009286321f Rebased on top of latest Linus tree(4.2-rc6+). https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/diff/drivers/md/dm.c?h=block-generic-reqid=3dd8509a3f05152cca83bc41c37a8ba3e9119736 thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10, 2015 at 10:41:55PM -0400, Mike Snitzer wrote: On Mon, Aug 10 2015 at 10:00pm -0400, Martin K. Petersen martin.peter...@oracle.com wrote: Ming == Ming Lin m...@kernel.org writes: Ming Did you mean still use (UINT_MAX 9) in blkdev_issue_discard()? Ming But that doesn't work for dm-thinp. See Kent's suggestion to use Ming 131. I'm not sure why things are not working for dm-thinp. Presumably Kent's code would split the discard at a granularity boundary so why would that cause problems for dm? DM-thinp processes discards internally before it passes them down (if configured to do so). If a discard is smaller than the granularity of a thinp block (whose size is configurable) or if the start and end of the discard's extent is misaligned (relative to the thinp blocks mapped to the logical extent) then the discard won't actually discard partial thinp blocks. This kind of logic really doesn't belong in dm - if it's needed, it really belongs in bio_split() (which is supposed to work correctly for discards - so if it is needed, then bio_split() needs fixing...) IMO though it belongs in the driver - if a discard needs to be dropped because it's too small and the hardware can't do it, that should be the driver's responsibility. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Ming == Ming Lin m...@kernel.org writes: Ming Did you mean still use (UINT_MAX 9) in blkdev_issue_discard()? Ming But that doesn't work for dm-thinp. See Kent's suggestion to use Ming 131. I'm not sure why things are not working for dm-thinp. Presumably Kent's code would split the discard at a granularity boundary so why would that cause problems for dm? In looking at this I just found out that we'll corrupt data on certain SCSI configs with the granularity enforcement in place. I'll have to conjure up a fix for that... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Aug 10 2015 at 10:00pm -0400, Martin K. Petersen martin.peter...@oracle.com wrote: Ming == Ming Lin m...@kernel.org writes: Ming Did you mean still use (UINT_MAX 9) in blkdev_issue_discard()? Ming But that doesn't work for dm-thinp. See Kent's suggestion to use Ming 131. I'm not sure why things are not working for dm-thinp. Presumably Kent's code would split the discard at a granularity boundary so why would that cause problems for dm? DM-thinp processes discards internally before it passes them down (if configured to do so). If a discard is smaller than the granularity of a thinp block (whose size is configurable) or if the start and end of the discard's extent is misaligned (relative to the thinp blocks mapped to the logical extent) then the discard won't actually discard partial thinp blocks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > > Will change it to MAX_BIO_SECTORS. > > May I add your ACK? > > Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. Hope I can have your ACK soon. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > Will change it to MAX_BIO_SECTORS. > May I add your ACK? Yes, please go ahead. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote: > On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: > > +/* > > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully > > + * it is of the proper granularity as long as the granularity is a power > > + * of two. > > + */ > > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) > > Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever > to something like Kent's multipage biovecs we'll actually need it for > regular read/write bios in addition to discard and write same. > > Except for that the patch looks reasonable to me. Will change it to MAX_BIO_SECTORS. May I add your ACK? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: > +/* > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully > + * it is of the proper granularity as long as the granularity is a power > + * of two. > + */ > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote: > > "Mike" == Mike Snitzer writes: > > Mike> This will translate to all intermediate layers that might split > Mike> discards needing to worry about granularity/alignment too > Mike> (e.g. how dm-thinp will have to care because it must generate > Mike> discard mappings with associated bios based on how blocks were > Mike> mapped to thinp). > > The fundamental issue here is that alignment and granularity should > never, ever have been enforced at the top of the stack. Horrendous idea > from the very beginning. > > For the < handful of braindead devices that get confused when you do > partial or misaligned blocks we should have had a quirk that did any > range adjusting at the bottom in sd_setup_discard_cmnd(). > > There's a reason I turned discard_zeroes_data off for UNMAP! > > Wrt. the range size I don't have a problem with capping at the 32-bit > bi_size limit. We probably don't want to send commands much bigger than > that anyway. How about below? commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a Author: Ming Lin Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we set max discard sectors to (1<<31)>>9 to ensure it doesn't overflow bi_size and hopefully it is of the proper granularity as long as the granularity is a power of two. Signed-off-by: Ming Lin --- block/blk-lib.c | 47 +++ 1 file changed, 11 insertions(+), 36 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..4859e4b 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err) bio_put(bio); } +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) + /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; - unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, if (!blk_queue_discard(q)) return -EOPNOTSUPP; - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q->limits.discard_granularity >> 9, 1U); - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; - - /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); - max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } - if (flags & BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) return -EOPNOTSUPP; @@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, break; } - req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ + req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects < nr_sects && - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio->bi_iter.bi_sector = sector;
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote: Mike == Mike Snitzer snit...@redhat.com writes: Mike This will translate to all intermediate layers that might split Mike discards needing to worry about granularity/alignment too Mike (e.g. how dm-thinp will have to care because it must generate Mike discard mappings with associated bios based on how blocks were Mike mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. How about below? commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a Author: Ming Lin min...@ssi.samsung.com Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we set max discard sectors to (131)9 to ensure it doesn't overflow bi_size and hopefully it is of the proper granularity as long as the granularity is a power of two. Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-lib.c | 47 +++ 1 file changed, 11 insertions(+), 36 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..4859e4b 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err) bio_put(bio); } +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) + /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; - unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, if (!blk_queue_discard(q)) return -EOPNOTSUPP; - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q-limits.discard_granularity 9, 1U); - alignment = (bdev_discard_alignment(bdev) 9) % granularity; - - /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); - max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } - if (flags BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) return -EOPNOTSUPP; @@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, break; } - req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ + req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects nr_sects - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio-bi_iter.bi_sector = sector;
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. Will change it to MAX_BIO_SECTORS. May I add your ACK? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. Hope I can have your ACK soon. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> This will translate to all intermediate layers that might split Mike> discards needing to worry about granularity/alignment too Mike> (e.g. how dm-thinp will have to care because it must generate Mike> discard mappings with associated bios based on how blocks were Mike> mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the < handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 09:22:26PM -0800, Kent Overstreet wrote: > Is granularity required to be a power of two? One would hope, but looking at > the > code that doesn't appear to be a requirement... ugh, that's terrible... If devices have an odd granularity we'll just have to split more than nessecary. Let's hope no major hardware does that. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: > Wouldn't it be easier to move both max_write_same_sectors and > max_discard sectors to 64 bit (ie to type sector_t) and be done with the > overflow? > Seems to me this is far too much coding around self-imposed restrictions... It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 01:40 AM, Ming Lin wrote: > > On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: >> I'm for solution 3: >> >> - keep blk_bio_{discard,write_same}_split, but ensure we never built >>a > 4GB bio in blkdev_issue_{discard,write_same}. > > This has problem as I mentioned in solution 1. > We need to also make sure max discard size is of proper granularity. > See below example. > > 4G: 8388608 sectors > UINT_MAX: 8388607 sectors > > dm-thinp block size = default discard granularity = 128 sectors > > blkdev_issue_discard(sector=0, nr_sectors=8388608) > > 1. Only ensure bi_size not overflow > > It doesn't work. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388606], then dm-thinp splits it to 2 bios > [0, 8388479] > [8388480, 8388606] ---> this has problem in process_discard_bio(), > because the discard size(7 sectors) covers > less than a block(128 sectors) > [8388607, 8388607] ---> same problem > > 2. Ensure bi_size not overflow and max discard size is of proper granularity > > It works. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388479] > [8388480, 8388607] > > > So how about below patch? > > commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 > Author: Ming Lin > Date: Fri Aug 7 15:07:07 2015 -0700 > > block: remove split code in blkdev_issue_{discard,write_same} > > The split code in blkdev_issue_{discard,write_same} can go away > now that any driver that cares does the split. We have to make > sure bio size doesn't overflow. > > For discard, we ensure max_discard_sectors is of the proper > granularity. So if discard size > 4G, blkdev_issue_discard() always > send multiple granularity requests to lower level, except that the > last one may be not multiple granularity. > > Signed-off-by: Ming Lin > --- > block/blk-lib.c | 37 + > 1 file changed, 9 insertions(+), 28 deletions(-) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 7688ee3..e178a07 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > struct request_queue *q = bdev_get_queue(bdev); > int type = REQ_WRITE | REQ_DISCARD; > unsigned int max_discard_sectors, granularity; > - int alignment; > struct bio_batch bb; > struct bio *bio; > int ret = 0; > @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > > /* Zero-sector (unknown) and one-sector granularities are the same. */ > granularity = max(q->limits.discard_granularity >> 9, 1U); > - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; > > /* > - * Ensure that max_discard_sectors is of the proper > - * granularity, so that requests stay aligned after a split. > - */ > - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); > + * Ensure that max_discard_sectors doesn't overflow bi_size and is of > + * the proper granularity. So if discard size > 4G, > blkdev_issue_discard() > + * always split and send multiple granularity requests to lower level, > + * except that the last one may be not multiple granularity. > + */ > + max_discard_sectors = UINT_MAX >> 9; > max_discard_sectors -= max_discard_sectors % granularity; > - if (unlikely(!max_discard_sectors)) { > - /* Avoid infinite loop below. Being cautious never hurts. */ > - return -EOPNOTSUPP; > - } > > if (flags & BLKDEV_DISCARD_SECURE) { > if (!blk_queue_secdiscard(q)) > @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > blk_start_plug(); > while (nr_sects) { > unsigned int req_sects; > - sector_t end_sect, tmp; > + sector_t end_sect; > > bio = bio_alloc(gfp_mask, 1); > if (!bio) { > @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > } > > req_sects = min_t(sector_t, nr_sects, max_discard_sectors); > - > - /* > - * If splitting a request, and the next starting sector would be > - * misaligned, stop the discard at the previous aligned sector. > - */ > end_sect = sector + req_sects; > - tmp = end_sect; > - if (req_sects < nr_sects && > - sector_div(tmp, granularity) != alignment) { > - end_sect = end_sect - alignment; > - sector_div(end_sect, granularity); > - end_sect = end_sect * granularity + alignment; > - req_sects = end_sect - sector; > - } > > bio->bi_iter.bi_sector = sector; >
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 01:40 AM, Ming Lin wrote: On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] --- this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] --- same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity It works. [start_sector, end_sector] [0, 8388607] [0, 8388479] [8388480, 8388607] So how about below patch? commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 Author: Ming Lin min...@ssi.samsung.com Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we ensure max_discard_sectors is of the proper granularity. So if discard size 4G, blkdev_issue_discard() always send multiple granularity requests to lower level, except that the last one may be not multiple granularity. Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-lib.c | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..e178a07 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q-limits.discard_granularity 9, 1U); - alignment = (bdev_discard_alignment(bdev) 9) % granularity; /* - * Ensure that max_discard_sectors is of the proper - * granularity, so that requests stay aligned after a split. - */ - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + * Ensure that max_discard_sectors doesn't overflow bi_size and is of + * the proper granularity. So if discard size 4G, blkdev_issue_discard() + * always split and send multiple granularity requests to lower level, + * except that the last one may be not multiple granularity. + */ + max_discard_sectors = UINT_MAX 9; max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } if (flags BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* - * If splitting a request, and the next starting sector would be - * misaligned, stop the discard at the previous aligned sector. - */ end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects nr_sects - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio-bi_iter.bi_sector = sector; bio-bi_end_io = bio_batch_end_io; @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: Wouldn't it be easier to move both max_write_same_sectors and max_discard sectors to 64 bit (ie to type sector_t) and be done with the overflow? Seems to me this is far too much coding around self-imposed restrictions... It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 09:22:26PM -0800, Kent Overstreet wrote: Is granularity required to be a power of two? One would hope, but looking at the code that doesn't appear to be a requirement... ugh, that's terrible... If devices have an odd granularity we'll just have to split more than nessecary. Let's hope no major hardware does that. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike This will translate to all intermediate layers that might split Mike discards needing to worry about granularity/alignment too Mike (e.g. how dm-thinp will have to care because it must generate Mike discard mappings with associated bios based on how blocks were Mike mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote: > On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet > > Ideally we'd get upper layers out of the business of knowing about the queue > > limits at all - that was the point of the patch series, after all. > > > > Instead of using UINT_MAX, would it work to just make the max 1 << 31 > > sectors?' > > 1 << 31 = 2G bytes = 0x40 sectors. > > Yes, that works as long as it's multiple of granularity. Is granularity required to be a power of two? One would hope, but looking at the code that doesn't appear to be a requirement... ugh, that's terrible... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet wrote: > On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote: >> >> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: >> > I'm for solution 3: >> > >> > - keep blk_bio_{discard,write_same}_split, but ensure we never built >> >a > 4GB bio in blkdev_issue_{discard,write_same}. >> >> This has problem as I mentioned in solution 1. >> We need to also make sure max discard size is of proper granularity. >> See below example. >> >> 4G: 8388608 sectors >> UINT_MAX: 8388607 sectors >> >> dm-thinp block size = default discard granularity = 128 sectors >> >> blkdev_issue_discard(sector=0, nr_sectors=8388608) >> >> 1. Only ensure bi_size not overflow >> >> It doesn't work. >> >> [start_sector, end_sector] >> [0, 8388607] >> [0, 8388606], then dm-thinp splits it to 2 bios >> [0, 8388479] >> [8388480, 8388606] ---> this has problem in process_discard_bio(), >> because the discard size(7 sectors) covers >> less than a block(128 sectors) >> [8388607, 8388607] ---> same problem >> >> 2. Ensure bi_size not overflow and max discard size is of proper granularity > > Ideally we'd get upper layers out of the business of knowing about the queue > limits at all - that was the point of the patch series, after all. > > Instead of using UINT_MAX, would it work to just make the max 1 << 31 > sectors?' 1 << 31 = 2G bytes = 0x40 sectors. Yes, that works as long as it's multiple of granularity. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote: > > On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: > > I'm for solution 3: > > > > - keep blk_bio_{discard,write_same}_split, but ensure we never built > >a > 4GB bio in blkdev_issue_{discard,write_same}. > > This has problem as I mentioned in solution 1. > We need to also make sure max discard size is of proper granularity. > See below example. > > 4G: 8388608 sectors > UINT_MAX: 8388607 sectors > > dm-thinp block size = default discard granularity = 128 sectors > > blkdev_issue_discard(sector=0, nr_sectors=8388608) > > 1. Only ensure bi_size not overflow > > It doesn't work. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388606], then dm-thinp splits it to 2 bios > [0, 8388479] > [8388480, 8388606] ---> this has problem in process_discard_bio(), > because the discard size(7 sectors) covers > less than a block(128 sectors) > [8388607, 8388607] ---> same problem > > 2. Ensure bi_size not overflow and max discard size is of proper granularity Ideally we'd get upper layers out of the business of knowing about the queue limits at all - that was the point of the patch series, after all. Instead of using UINT_MAX, would it work to just make the max 1 << 31 sectors? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: > I'm for solution 3: > > - keep blk_bio_{discard,write_same}_split, but ensure we never built >a > 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] ---> this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] ---> same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity It works. [start_sector, end_sector] [0, 8388607] [0, 8388479] [8388480, 8388607] So how about below patch? commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 Author: Ming Lin Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we ensure max_discard_sectors is of the proper granularity. So if discard size > 4G, blkdev_issue_discard() always send multiple granularity requests to lower level, except that the last one may be not multiple granularity. Signed-off-by: Ming Lin --- block/blk-lib.c | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..e178a07 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q->limits.discard_granularity >> 9, 1U); - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); +* Ensure that max_discard_sectors doesn't overflow bi_size and is of +* the proper granularity. So if discard size > 4G, blkdev_issue_discard() +* always split and send multiple granularity requests to lower level, +* except that the last one may be not multiple granularity. + */ + max_discard_sectors = UINT_MAX >> 9; max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } if (flags & BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects < nr_sects && - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio->bi_iter.bi_sector = sector; bio->bi_end_io = bio_batch_end_io; @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (!q) return -ENXIO; -
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Thu, Aug 06, 2015 at 05:00:04PM -0700, Kent Overstreet wrote: > Why is it that we don't want to send giant discards? Is it a latency issue? Yes. Take a look at the "Configurable max discard size" thread(s). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a > 4GB bio in blkdev_issue_{discard,write_same}. Note that this isn't special casing, we can't build > 4GB bios for data either, it's just implemented as a side effect right now instead of checked explicitly. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] --- this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] --- same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity It works. [start_sector, end_sector] [0, 8388607] [0, 8388479] [8388480, 8388607] So how about below patch? commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 Author: Ming Lin min...@ssi.samsung.com Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we ensure max_discard_sectors is of the proper granularity. So if discard size 4G, blkdev_issue_discard() always send multiple granularity requests to lower level, except that the last one may be not multiple granularity. Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-lib.c | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..e178a07 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q-limits.discard_granularity 9, 1U); - alignment = (bdev_discard_alignment(bdev) 9) % granularity; /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); +* Ensure that max_discard_sectors doesn't overflow bi_size and is of +* the proper granularity. So if discard size 4G, blkdev_issue_discard() +* always split and send multiple granularity requests to lower level, +* except that the last one may be not multiple granularity. + */ + max_discard_sectors = UINT_MAX 9; max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } if (flags BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects nr_sects - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio-bi_iter.bi_sector = sector; bio-bi_end_io = bio_batch_end_io; @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, if (!q) return
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote: On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] --- this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] --- same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity Ideally we'd get upper layers out of the business of knowing about the queue limits at all - that was the point of the patch series, after all. Instead of using UINT_MAX, would it work to just make the max 1 31 sectors? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet kent.overstr...@gmail.com wrote: On Fri, Aug 07, 2015 at 04:40:06PM -0700, Ming Lin wrote: On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] --- this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] --- same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity Ideally we'd get upper layers out of the business of knowing about the queue limits at all - that was the point of the patch series, after all. Instead of using UINT_MAX, would it work to just make the max 1 31 sectors?' 1 31 = 2G bytes = 0x40 sectors. Yes, that works as long as it's multiple of granularity. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Aug 07, 2015 at 10:17:43PM -0700, Ming Lin wrote: On Fri, Aug 7, 2015 at 5:30 PM, Kent Overstreet Ideally we'd get upper layers out of the business of knowing about the queue limits at all - that was the point of the patch series, after all. Instead of using UINT_MAX, would it work to just make the max 1 31 sectors?' 1 31 = 2G bytes = 0x40 sectors. Yes, that works as long as it's multiple of granularity. Is granularity required to be a power of two? One would hope, but looking at the code that doesn't appear to be a requirement... ugh, that's terrible... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Thu, Aug 06, 2015 at 05:00:04PM -0700, Kent Overstreet wrote: Why is it that we don't want to send giant discards? Is it a latency issue? Yes. Take a look at the Configurable max discard size thread(s). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. Note that this isn't special casing, we can't build 4GB bios for data either, it's just implemented as a side effect right now instead of checked explicitly. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, Aug 04, 2015 at 01:36:26PM +0200, Christoph Hellwig wrote: > On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: > > I think the important thing is the late splitting for regular bio. > > For discard/write_same bio, how about just don't do late splitting? > > I'd hate having to special case them even more. Especially as the > discard splitting is nasty and we really don't want to send giant > discards by default anyway (see Jens' patches to limit discard size > by default). Why is it that we don't want to send giant discards? Is it a latency issue? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, Aug 04, 2015 at 01:36:26PM +0200, Christoph Hellwig wrote: On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: I think the important thing is the late splitting for regular bio. For discard/write_same bio, how about just don't do late splitting? I'd hate having to special case them even more. Especially as the discard splitting is nasty and we really don't want to send giant discards by default anyway (see Jens' patches to limit discard size by default). Why is it that we don't want to send giant discards? Is it a latency issue? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-04 at 13:36 +0200, Christoph Hellwig wrote: > On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: > > I think the important thing is the late splitting for regular bio. > > For discard/write_same bio, how about just don't do late splitting? > > I'd hate having to special case them even more. Especially as the > discard splitting is nasty and we really don't want to send giant > discards by default anyway (see Jens' patches to limit discard size > by default). > > So I'd recommend to keep everything as-is, just make sure we don't > overflow bi_size. Did you mean to remove "PATCH 4 block: remove split code in blkdev_issue_discard" or to keep it? Which of below 2 solutions you prefer? - Solution 1 remove splits in blkdev_issue_{discard,write_same} and keep blk_bio_{discard,write_same}_split But for blkdev_issue_discard(), it's not enough if only make sure bi_size not overflow, for example, discard 4G 4G bytes = 8388608 sectors UINT_MAX = 8388607 sectors So blkdev_issue_discard() will send 2 discard bios. First bio: sector 0 .. 8388606 Second bio: sector 8388607 .. 8388607 In this case, the 2 discard tests in device-mapper-test-suite still fail, probably because the second bio start sector is not aligned with discard_granularity. So I have to take into account discard_granularity(assume 32 sectors), then blkdev_issue_discard() will send 2 discard bios, as First bio: sector 0 .. 8388575 Second bio: sector 8388576 .. 8388607 In this case, both discard tests passed. - Solution 2 special case discard/write_same bios(You said you hate it). That is to keep splits in blkdev_issue_{discard,write_same} and remove blk_bio_{discard,write_same}_split I think this is more clean way because blkdev_issue_{discard,write_same} already make sure we don't overflow bi_size. And blk_bio_{discard,write_same}_split are actually duplicated with the splits in blkdev_issue_{discard,write_same}. It's OK to remove it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Tue, 2015-08-04 at 13:36 +0200, Christoph Hellwig wrote: On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: I think the important thing is the late splitting for regular bio. For discard/write_same bio, how about just don't do late splitting? I'd hate having to special case them even more. Especially as the discard splitting is nasty and we really don't want to send giant discards by default anyway (see Jens' patches to limit discard size by default). So I'd recommend to keep everything as-is, just make sure we don't overflow bi_size. Did you mean to remove PATCH 4 block: remove split code in blkdev_issue_discard or to keep it? Which of below 2 solutions you prefer? - Solution 1 remove splits in blkdev_issue_{discard,write_same} and keep blk_bio_{discard,write_same}_split But for blkdev_issue_discard(), it's not enough if only make sure bi_size not overflow, for example, discard 4G 4G bytes = 8388608 sectors UINT_MAX = 8388607 sectors So blkdev_issue_discard() will send 2 discard bios. First bio: sector 0 .. 8388606 Second bio: sector 8388607 .. 8388607 In this case, the 2 discard tests in device-mapper-test-suite still fail, probably because the second bio start sector is not aligned with discard_granularity. So I have to take into account discard_granularity(assume 32 sectors), then blkdev_issue_discard() will send 2 discard bios, as First bio: sector 0 .. 8388575 Second bio: sector 8388576 .. 8388607 In this case, both discard tests passed. - Solution 2 special case discard/write_same bios(You said you hate it). That is to keep splits in blkdev_issue_{discard,write_same} and remove blk_bio_{discard,write_same}_split I think this is more clean way because blkdev_issue_{discard,write_same} already make sure we don't overflow bi_size. And blk_bio_{discard,write_same}_split are actually duplicated with the splits in blkdev_issue_{discard,write_same}. It's OK to remove it. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: > I think the important thing is the late splitting for regular bio. > For discard/write_same bio, how about just don't do late splitting? I'd hate having to special case them even more. Especially as the discard splitting is nasty and we really don't want to send giant discards by default anyway (see Jens' patches to limit discard size by default). So I'd recommend to keep everything as-is, just make sure we don't overflow bi_size. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, Aug 02, 2015 at 10:58:22PM -0700, Ming Lin wrote: I think the important thing is the late splitting for regular bio. For discard/write_same bio, how about just don't do late splitting? I'd hate having to special case them even more. Especially as the discard splitting is nasty and we really don't want to send giant discards by default anyway (see Jens' patches to limit discard size by default). So I'd recommend to keep everything as-is, just make sure we don't overflow bi_size. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote: > On Sat, Aug 01 2015 at 2:58am -0400, > Ming Lin wrote: > > > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: > > > > > > OK, once setup, to run the 2 tests in question directly you'd do > > > something like: > > > > > > dmtest run --suite thin-provisioning -n discard_a_fragmented_device > > > > > > dmtest run --suite thin-provisioning -n > > > discard_fully_provisioned_device_benchmark > > > > > > Again, these tests pass without this patchset. > > > > It's caused by patch 4. Typo. I mean patch 5. > > When discard size >=4G, the bio->bi_iter.bi_size overflows. > > Thanks for tracking this down! blkdev_issue_write_same() has same problem. > > > Below is the new patch. > > > > Christoph, > > Could you also help to review it? > > > > Now we still do "misaligned" check in blkdev_issue_discard(). > > So the same code in blk_bio_discard_split() was removed. > > But I don't agree with this approach. One of the most meaningful > benefits of late bio splitting is the upper layers shouldn't _need_ to > depend on the intermediate devices' queue_limits being stacked properly. > Your solution to mix discard granularity/alignment checks at the upper > layer(s) but then split based on max_discard_sectors at the lower layer > defeats that benefit for discards. > > This will translate to all intermediate layers that might split > discards needing to worry about granularity/alignment > too (e.g. how dm-thinp will have to care because it must generate > discard mappings with associated bios based on how blocks were mapped to > thinp). I think the important thing is the late splitting for regular bio. For discard/write_same bio, how about just don't do late splitting? That is: 1. remove "PATCH 5: block: remove split code in blkdev_issue_discard" 2. Add below changes to PATCH 1 diff --git a/block/blk-merge.c b/block/blk-merge.c index 1f5dfa0..90b085e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,59 +9,6 @@ #include "blk.h" -static struct bio *blk_bio_discard_split(struct request_queue *q, -struct bio *bio, -struct bio_set *bs) -{ - unsigned int max_discard_sectors, granularity; - int alignment; - sector_t tmp; - unsigned split_sectors; - - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q->limits.discard_granularity >> 9, 1U); - - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); - max_discard_sectors -= max_discard_sectors % granularity; - - if (unlikely(!max_discard_sectors)) { - /* XXX: warn */ - return NULL; - } - - if (bio_sectors(bio) <= max_discard_sectors) - return NULL; - - split_sectors = max_discard_sectors; - - /* -* If the next starting sector would be misaligned, stop the discard at -* the previous aligned sector. -*/ - alignment = (q->limits.discard_alignment >> 9) % granularity; - - tmp = bio->bi_iter.bi_sector + split_sectors - alignment; - tmp = sector_div(tmp, granularity); - - if (split_sectors > tmp) - split_sectors -= tmp; - - return bio_split(bio, split_sectors, GFP_NOIO, bs); -} - -static struct bio *blk_bio_write_same_split(struct request_queue *q, - struct bio *bio, - struct bio_set *bs) -{ - if (!q->limits.max_write_same_sectors) - return NULL; - - if (bio_sectors(bio) <= q->limits.max_write_same_sectors) - return NULL; - - return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs); -} - static struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio, struct bio_set *bs) @@ -129,10 +76,8 @@ void blk_queue_split(struct request_queue *q, struct bio **bio, { struct bio *split; - if ((*bio)->bi_rw & REQ_DISCARD) - split = blk_bio_discard_split(q, *bio, bs); - else if ((*bio)->bi_rw & REQ_WRITE_SAME) - split = blk_bio_write_same_split(q, *bio, bs); + if ((*bio)->bi_rw & REQ_DISCARD || (*bio)->bi_rw & REQ_WRITE_SAME) + split = NULL; else split = blk_bio_segment_split(q, *bio, q->bio_split); > > Also, it is unfortunate that IO that doesn't have a payload is being > artificially split simply because bio->bi_iter.bi_size is 32bits. Indeed. Will it be possible to make it 64bits? I guess no. > > Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote: On Sat, Aug 01 2015 at 2:58am -0400, Ming Lin m...@kernel.org wrote: On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: OK, once setup, to run the 2 tests in question directly you'd do something like: dmtest run --suite thin-provisioning -n discard_a_fragmented_device dmtest run --suite thin-provisioning -n discard_fully_provisioned_device_benchmark Again, these tests pass without this patchset. It's caused by patch 4. Typo. I mean patch 5. When discard size =4G, the bio-bi_iter.bi_size overflows. Thanks for tracking this down! blkdev_issue_write_same() has same problem. Below is the new patch. Christoph, Could you also help to review it? Now we still do misaligned check in blkdev_issue_discard(). So the same code in blk_bio_discard_split() was removed. But I don't agree with this approach. One of the most meaningful benefits of late bio splitting is the upper layers shouldn't _need_ to depend on the intermediate devices' queue_limits being stacked properly. Your solution to mix discard granularity/alignment checks at the upper layer(s) but then split based on max_discard_sectors at the lower layer defeats that benefit for discards. This will translate to all intermediate layers that might split discards needing to worry about granularity/alignment too (e.g. how dm-thinp will have to care because it must generate discard mappings with associated bios based on how blocks were mapped to thinp). I think the important thing is the late splitting for regular bio. For discard/write_same bio, how about just don't do late splitting? That is: 1. remove PATCH 5: block: remove split code in blkdev_issue_discard 2. Add below changes to PATCH 1 diff --git a/block/blk-merge.c b/block/blk-merge.c index 1f5dfa0..90b085e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,59 +9,6 @@ #include blk.h -static struct bio *blk_bio_discard_split(struct request_queue *q, -struct bio *bio, -struct bio_set *bs) -{ - unsigned int max_discard_sectors, granularity; - int alignment; - sector_t tmp; - unsigned split_sectors; - - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q-limits.discard_granularity 9, 1U); - - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); - max_discard_sectors -= max_discard_sectors % granularity; - - if (unlikely(!max_discard_sectors)) { - /* XXX: warn */ - return NULL; - } - - if (bio_sectors(bio) = max_discard_sectors) - return NULL; - - split_sectors = max_discard_sectors; - - /* -* If the next starting sector would be misaligned, stop the discard at -* the previous aligned sector. -*/ - alignment = (q-limits.discard_alignment 9) % granularity; - - tmp = bio-bi_iter.bi_sector + split_sectors - alignment; - tmp = sector_div(tmp, granularity); - - if (split_sectors tmp) - split_sectors -= tmp; - - return bio_split(bio, split_sectors, GFP_NOIO, bs); -} - -static struct bio *blk_bio_write_same_split(struct request_queue *q, - struct bio *bio, - struct bio_set *bs) -{ - if (!q-limits.max_write_same_sectors) - return NULL; - - if (bio_sectors(bio) = q-limits.max_write_same_sectors) - return NULL; - - return bio_split(bio, q-limits.max_write_same_sectors, GFP_NOIO, bs); -} - static struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio, struct bio_set *bs) @@ -129,10 +76,8 @@ void blk_queue_split(struct request_queue *q, struct bio **bio, { struct bio *split; - if ((*bio)-bi_rw REQ_DISCARD) - split = blk_bio_discard_split(q, *bio, bs); - else if ((*bio)-bi_rw REQ_WRITE_SAME) - split = blk_bio_write_same_split(q, *bio, bs); + if ((*bio)-bi_rw REQ_DISCARD || (*bio)-bi_rw REQ_WRITE_SAME) + split = NULL; else split = blk_bio_segment_split(q, *bio, q-bio_split); Also, it is unfortunate that IO that doesn't have a payload is being artificially split simply because bio-bi_iter.bi_size is 32bits. Indeed. Will it be possible to make it 64bits? I guess no. Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 01 2015 at 2:58am -0400, Ming Lin wrote: > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: > > > > OK, once setup, to run the 2 tests in question directly you'd do > > something like: > > > > dmtest run --suite thin-provisioning -n discard_a_fragmented_device > > > > dmtest run --suite thin-provisioning -n > > discard_fully_provisioned_device_benchmark > > > > Again, these tests pass without this patchset. > > It's caused by patch 4. > When discard size >=4G, the bio->bi_iter.bi_size overflows. Thanks for tracking this down! > Below is the new patch. > > Christoph, > Could you also help to review it? > > Now we still do "misaligned" check in blkdev_issue_discard(). > So the same code in blk_bio_discard_split() was removed. But I don't agree with this approach. One of the most meaningful benefits of late bio splitting is the upper layers shouldn't _need_ to depend on the intermediate devices' queue_limits being stacked properly. Your solution to mix discard granularity/alignment checks at the upper layer(s) but then split based on max_discard_sectors at the lower layer defeats that benefit for discards. This will translate to all intermediate layers that might split discards needing to worry about granularity/alignment too (e.g. how dm-thinp will have to care because it must generate discard mappings with associated bios based on how blocks were mapped to thinp). Also, it is unfortunate that IO that doesn't have a payload is being artificially split simply because bio->bi_iter.bi_size is 32bits. Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: > On Fri, Jul 31 2015 at 5:19pm -0400, > Ming Lin wrote: > > > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer wrote: > > > On Mon, Jul 06 2015 at 3:44P -0400, > > > Ming Lin wrote: > > > > > >> From: Kent Overstreet > > >> > > >> The way the block layer is currently written, it goes to great lengths > > >> to avoid having to split bios; upper layer code (such as bio_add_page()) > > >> checks what the underlying device can handle and tries to always create > > >> bios that don't need to be split. > > >> > > >> But this approach becomes unwieldy and eventually breaks down with > > >> stacked devices and devices with dynamic limits, and it adds a lot of > > >> complexity. If the block layer could split bios as needed, we could > > >> eliminate a lot of complexity elsewhere - particularly in stacked > > >> drivers. Code that creates bios can then create whatever size bios are > > >> convenient, and more importantly stacked drivers don't have to deal with > > >> both their own bio size limitations and the limitations of the > > >> (potentially multiple) devices underneath them. In the future this will > > >> let us delete merge_bvec_fn and a bunch of other code. > > >> > > >> We do this by adding calls to blk_queue_split() to the various > > >> make_request functions that need it - a few can already handle arbitrary > > >> size bios. Note that we add the call _after_ any call to > > >> blk_queue_bounce(); this means that blk_queue_split() and > > >> blk_recalc_rq_segments() don't need to be concerned with bouncing > > >> affecting segment merging. > > >> > > >> Some make_request_fn() callbacks were simple enough to audit and verify > > >> they don't need blk_queue_split() calls. The skipped ones are: > > >> > > >> * nfhd_make_request (arch/m68k/emu/nfblock.c) > > >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) > > >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) > > >> * brd_make_request (ramdisk - drivers/block/brd.c) > > >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) > > >> * loop_make_request > > >> * null_queue_bio > > >> * bcache's make_request fns > > >> > > >> Some others are almost certainly safe to remove now, but will be left > > >> for future patches. > > >> > > >> Cc: Jens Axboe > > >> Cc: Christoph Hellwig > > >> Cc: Al Viro > > >> Cc: Ming Lei > > >> Cc: Neil Brown > > >> Cc: Alasdair Kergon > > >> Cc: Mike Snitzer > > >> Cc: dm-de...@redhat.com > > >> Cc: Lars Ellenberg > > >> Cc: drbd-u...@lists.linbit.com > > >> Cc: Jiri Kosina > > >> Cc: Geoff Levand > > >> Cc: Jim Paris > > >> Cc: Joshua Morris > > >> Cc: Philip Kelleher > > >> Cc: Minchan Kim > > >> Cc: Nitin Gupta > > >> Cc: Oleg Drokin > > >> Cc: Andreas Dilger > > >> Acked-by: NeilBrown (for the 'md/md.c' bits) > > >> Signed-off-by: Kent Overstreet > > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] > > >> Signed-off-by: Dongsu Park > > >> Signed-off-by: Ming Lin > > > ... > > >> diff --git a/block/blk-merge.c b/block/blk-merge.c > > >> index 30a0d9f..3707f30 100644 > > >> --- a/block/blk-merge.c > > >> +++ b/block/blk-merge.c > > >> @@ -9,12 +9,158 @@ > > >> > > >> #include "blk.h" > > >> > > >> +static struct bio *blk_bio_discard_split(struct request_queue *q, > > >> + struct bio *bio, > > >> + struct bio_set *bs) > > >> +{ > > >> + unsigned int max_discard_sectors, granularity; > > >> + int alignment; > > >> + sector_t tmp; > > >> + unsigned split_sectors; > > >> + > > >> + /* Zero-sector (unknown) and one-sector granularities are the > > >> same. */ > > >> + granularity = max(q->limits.discard_granularity >> 9, 1U); > > >> + > > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX > > >> >> 9); > > >> + max_discard_sectors -= max_discard_sectors % granularity; > > >> + > > >> + if (unlikely(!max_discard_sectors)) { > > >> + /* XXX: warn */ > > >> + return NULL; > > >> + } > > >> + > > >> + if (bio_sectors(bio) <= max_discard_sectors) > > >> + return NULL; > > >> + > > >> + split_sectors = max_discard_sectors; > > >> + > > >> + /* > > >> + * If the next starting sector would be misaligned, stop the > > >> discard at > > >> + * the previous aligned sector. > > >> + */ > > >> + alignment = (q->limits.discard_alignment >> 9) % granularity; > > >> + > > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; > > >> + tmp = sector_div(tmp, granularity); > > >> + > > >> + if (split_sectors > tmp) > > >> + split_sectors -= tmp; > > >> + > > >> + return bio_split(bio, split_sectors, GFP_NOIO, bs); > > >> +} > > > > > > This code to stop the discard at the previous aligned sector could be > > > the reason why I have 2 device-mapper-test-suite tests in the >
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: On Fri, Jul 31 2015 at 5:19pm -0400, Ming Lin m...@kernel.org wrote: On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote: On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 01 2015 at 2:58am -0400, Ming Lin m...@kernel.org wrote: On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: OK, once setup, to run the 2 tests in question directly you'd do something like: dmtest run --suite thin-provisioning -n discard_a_fragmented_device dmtest run --suite thin-provisioning -n discard_fully_provisioned_device_benchmark Again, these tests pass without this patchset. It's caused by patch 4. When discard size =4G, the bio-bi_iter.bi_size overflows. Thanks for tracking this down! Below is the new patch. Christoph, Could you also help to review it? Now we still do misaligned check in blkdev_issue_discard(). So the same code in blk_bio_discard_split() was removed. But I don't agree with this approach. One of the most meaningful benefits of late bio splitting is the upper layers shouldn't _need_ to depend on the intermediate devices' queue_limits being stacked properly. Your solution to mix discard granularity/alignment checks at the upper layer(s) but then split based on max_discard_sectors at the lower layer defeats that benefit for discards. This will translate to all intermediate layers that might split discards needing to worry about granularity/alignment too (e.g. how dm-thinp will have to care because it must generate discard mappings with associated bios based on how blocks were mapped to thinp). Also, it is unfortunate that IO that doesn't have a payload is being artificially split simply because bio-bi_iter.bi_size is 32bits. Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31, 2015 at 3:02 PM, Ming Lin wrote: > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: >> On Fri, Jul 31 2015 at 5:19pm -0400, >> Ming Lin wrote: >> >> > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer wrote: >> > > On Mon, Jul 06 2015 at 3:44P -0400, >> > > Ming Lin wrote: >> > > >> > >> From: Kent Overstreet >> > >> >> > >> The way the block layer is currently written, it goes to great lengths >> > >> to avoid having to split bios; upper layer code (such as bio_add_page()) >> > >> checks what the underlying device can handle and tries to always create >> > >> bios that don't need to be split. >> > >> >> > >> But this approach becomes unwieldy and eventually breaks down with >> > >> stacked devices and devices with dynamic limits, and it adds a lot of >> > >> complexity. If the block layer could split bios as needed, we could >> > >> eliminate a lot of complexity elsewhere - particularly in stacked >> > >> drivers. Code that creates bios can then create whatever size bios are >> > >> convenient, and more importantly stacked drivers don't have to deal with >> > >> both their own bio size limitations and the limitations of the >> > >> (potentially multiple) devices underneath them. In the future this will >> > >> let us delete merge_bvec_fn and a bunch of other code. >> > >> >> > >> We do this by adding calls to blk_queue_split() to the various >> > >> make_request functions that need it - a few can already handle arbitrary >> > >> size bios. Note that we add the call _after_ any call to >> > >> blk_queue_bounce(); this means that blk_queue_split() and >> > >> blk_recalc_rq_segments() don't need to be concerned with bouncing >> > >> affecting segment merging. >> > >> >> > >> Some make_request_fn() callbacks were simple enough to audit and verify >> > >> they don't need blk_queue_split() calls. The skipped ones are: >> > >> >> > >> * nfhd_make_request (arch/m68k/emu/nfblock.c) >> > >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) >> > >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) >> > >> * brd_make_request (ramdisk - drivers/block/brd.c) >> > >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) >> > >> * loop_make_request >> > >> * null_queue_bio >> > >> * bcache's make_request fns >> > >> >> > >> Some others are almost certainly safe to remove now, but will be left >> > >> for future patches. >> > >> >> > >> Cc: Jens Axboe >> > >> Cc: Christoph Hellwig >> > >> Cc: Al Viro >> > >> Cc: Ming Lei >> > >> Cc: Neil Brown >> > >> Cc: Alasdair Kergon >> > >> Cc: Mike Snitzer >> > >> Cc: dm-de...@redhat.com >> > >> Cc: Lars Ellenberg >> > >> Cc: drbd-u...@lists.linbit.com >> > >> Cc: Jiri Kosina >> > >> Cc: Geoff Levand >> > >> Cc: Jim Paris >> > >> Cc: Joshua Morris >> > >> Cc: Philip Kelleher >> > >> Cc: Minchan Kim >> > >> Cc: Nitin Gupta >> > >> Cc: Oleg Drokin >> > >> Cc: Andreas Dilger >> > >> Acked-by: NeilBrown (for the 'md/md.c' bits) >> > >> Signed-off-by: Kent Overstreet >> > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] >> > >> Signed-off-by: Dongsu Park >> > >> Signed-off-by: Ming Lin >> > > ... >> > >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> > >> index 30a0d9f..3707f30 100644 >> > >> --- a/block/blk-merge.c >> > >> +++ b/block/blk-merge.c >> > >> @@ -9,12 +9,158 @@ >> > >> >> > >> #include "blk.h" >> > >> >> > >> +static struct bio *blk_bio_discard_split(struct request_queue *q, >> > >> + struct bio *bio, >> > >> + struct bio_set *bs) >> > >> +{ >> > >> + unsigned int max_discard_sectors, granularity; >> > >> + int alignment; >> > >> + sector_t tmp; >> > >> + unsigned split_sectors; >> > >> + >> > >> + /* Zero-sector (unknown) and one-sector granularities are the >> > >> same. */ >> > >> + granularity = max(q->limits.discard_granularity >> 9, 1U); >> > >> + >> > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> > >> >> 9); >> > >> + max_discard_sectors -= max_discard_sectors % granularity; >> > >> + >> > >> + if (unlikely(!max_discard_sectors)) { >> > >> + /* XXX: warn */ >> > >> + return NULL; >> > >> + } >> > >> + >> > >> + if (bio_sectors(bio) <= max_discard_sectors) >> > >> + return NULL; >> > >> + >> > >> + split_sectors = max_discard_sectors; >> > >> + >> > >> + /* >> > >> + * If the next starting sector would be misaligned, stop the >> > >> discard at >> > >> + * the previous aligned sector. >> > >> + */ >> > >> + alignment = (q->limits.discard_alignment >> 9) % granularity; >> > >> + >> > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; >> > >> + tmp = sector_div(tmp, granularity); >> > >> + >> > >> + if (split_sectors > tmp) >> > >> + split_sectors -= tmp; >> > >> + >> > >> + return bio_split(bio, split_sectors,
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: > On Fri, Jul 31 2015 at 5:19pm -0400, > Ming Lin wrote: > > > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer wrote: > > > On Mon, Jul 06 2015 at 3:44P -0400, > > > Ming Lin wrote: > > > > > >> From: Kent Overstreet > > >> > > >> The way the block layer is currently written, it goes to great lengths > > >> to avoid having to split bios; upper layer code (such as bio_add_page()) > > >> checks what the underlying device can handle and tries to always create > > >> bios that don't need to be split. > > >> > > >> But this approach becomes unwieldy and eventually breaks down with > > >> stacked devices and devices with dynamic limits, and it adds a lot of > > >> complexity. If the block layer could split bios as needed, we could > > >> eliminate a lot of complexity elsewhere - particularly in stacked > > >> drivers. Code that creates bios can then create whatever size bios are > > >> convenient, and more importantly stacked drivers don't have to deal with > > >> both their own bio size limitations and the limitations of the > > >> (potentially multiple) devices underneath them. In the future this will > > >> let us delete merge_bvec_fn and a bunch of other code. > > >> > > >> We do this by adding calls to blk_queue_split() to the various > > >> make_request functions that need it - a few can already handle arbitrary > > >> size bios. Note that we add the call _after_ any call to > > >> blk_queue_bounce(); this means that blk_queue_split() and > > >> blk_recalc_rq_segments() don't need to be concerned with bouncing > > >> affecting segment merging. > > >> > > >> Some make_request_fn() callbacks were simple enough to audit and verify > > >> they don't need blk_queue_split() calls. The skipped ones are: > > >> > > >> * nfhd_make_request (arch/m68k/emu/nfblock.c) > > >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) > > >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) > > >> * brd_make_request (ramdisk - drivers/block/brd.c) > > >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) > > >> * loop_make_request > > >> * null_queue_bio > > >> * bcache's make_request fns > > >> > > >> Some others are almost certainly safe to remove now, but will be left > > >> for future patches. > > >> > > >> Cc: Jens Axboe > > >> Cc: Christoph Hellwig > > >> Cc: Al Viro > > >> Cc: Ming Lei > > >> Cc: Neil Brown > > >> Cc: Alasdair Kergon > > >> Cc: Mike Snitzer > > >> Cc: dm-de...@redhat.com > > >> Cc: Lars Ellenberg > > >> Cc: drbd-u...@lists.linbit.com > > >> Cc: Jiri Kosina > > >> Cc: Geoff Levand > > >> Cc: Jim Paris > > >> Cc: Joshua Morris > > >> Cc: Philip Kelleher > > >> Cc: Minchan Kim > > >> Cc: Nitin Gupta > > >> Cc: Oleg Drokin > > >> Cc: Andreas Dilger > > >> Acked-by: NeilBrown (for the 'md/md.c' bits) > > >> Signed-off-by: Kent Overstreet > > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] > > >> Signed-off-by: Dongsu Park > > >> Signed-off-by: Ming Lin > > > ... > > >> diff --git a/block/blk-merge.c b/block/blk-merge.c > > >> index 30a0d9f..3707f30 100644 > > >> --- a/block/blk-merge.c > > >> +++ b/block/blk-merge.c > > >> @@ -9,12 +9,158 @@ > > >> > > >> #include "blk.h" > > >> > > >> +static struct bio *blk_bio_discard_split(struct request_queue *q, > > >> + struct bio *bio, > > >> + struct bio_set *bs) > > >> +{ > > >> + unsigned int max_discard_sectors, granularity; > > >> + int alignment; > > >> + sector_t tmp; > > >> + unsigned split_sectors; > > >> + > > >> + /* Zero-sector (unknown) and one-sector granularities are the > > >> same. */ > > >> + granularity = max(q->limits.discard_granularity >> 9, 1U); > > >> + > > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX > > >> >> 9); > > >> + max_discard_sectors -= max_discard_sectors % granularity; > > >> + > > >> + if (unlikely(!max_discard_sectors)) { > > >> + /* XXX: warn */ > > >> + return NULL; > > >> + } > > >> + > > >> + if (bio_sectors(bio) <= max_discard_sectors) > > >> + return NULL; > > >> + > > >> + split_sectors = max_discard_sectors; > > >> + > > >> + /* > > >> + * If the next starting sector would be misaligned, stop the > > >> discard at > > >> + * the previous aligned sector. > > >> + */ > > >> + alignment = (q->limits.discard_alignment >> 9) % granularity; > > >> + > > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; > > >> + tmp = sector_div(tmp, granularity); > > >> + > > >> + if (split_sectors > tmp) > > >> + split_sectors -= tmp; > > >> + > > >> + return bio_split(bio, split_sectors, GFP_NOIO, bs); > > >> +} > > > > > > This code to stop the discard at the previous aligned sector could be > > > the reason why I have 2 device-mapper-test-suite tests in the >
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31 2015 at 5:19pm -0400, Ming Lin wrote: > On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer wrote: > > On Mon, Jul 06 2015 at 3:44P -0400, > > Ming Lin wrote: > > > >> From: Kent Overstreet > >> > >> The way the block layer is currently written, it goes to great lengths > >> to avoid having to split bios; upper layer code (such as bio_add_page()) > >> checks what the underlying device can handle and tries to always create > >> bios that don't need to be split. > >> > >> But this approach becomes unwieldy and eventually breaks down with > >> stacked devices and devices with dynamic limits, and it adds a lot of > >> complexity. If the block layer could split bios as needed, we could > >> eliminate a lot of complexity elsewhere - particularly in stacked > >> drivers. Code that creates bios can then create whatever size bios are > >> convenient, and more importantly stacked drivers don't have to deal with > >> both their own bio size limitations and the limitations of the > >> (potentially multiple) devices underneath them. In the future this will > >> let us delete merge_bvec_fn and a bunch of other code. > >> > >> We do this by adding calls to blk_queue_split() to the various > >> make_request functions that need it - a few can already handle arbitrary > >> size bios. Note that we add the call _after_ any call to > >> blk_queue_bounce(); this means that blk_queue_split() and > >> blk_recalc_rq_segments() don't need to be concerned with bouncing > >> affecting segment merging. > >> > >> Some make_request_fn() callbacks were simple enough to audit and verify > >> they don't need blk_queue_split() calls. The skipped ones are: > >> > >> * nfhd_make_request (arch/m68k/emu/nfblock.c) > >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) > >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) > >> * brd_make_request (ramdisk - drivers/block/brd.c) > >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) > >> * loop_make_request > >> * null_queue_bio > >> * bcache's make_request fns > >> > >> Some others are almost certainly safe to remove now, but will be left > >> for future patches. > >> > >> Cc: Jens Axboe > >> Cc: Christoph Hellwig > >> Cc: Al Viro > >> Cc: Ming Lei > >> Cc: Neil Brown > >> Cc: Alasdair Kergon > >> Cc: Mike Snitzer > >> Cc: dm-de...@redhat.com > >> Cc: Lars Ellenberg > >> Cc: drbd-u...@lists.linbit.com > >> Cc: Jiri Kosina > >> Cc: Geoff Levand > >> Cc: Jim Paris > >> Cc: Joshua Morris > >> Cc: Philip Kelleher > >> Cc: Minchan Kim > >> Cc: Nitin Gupta > >> Cc: Oleg Drokin > >> Cc: Andreas Dilger > >> Acked-by: NeilBrown (for the 'md/md.c' bits) > >> Signed-off-by: Kent Overstreet > >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] > >> Signed-off-by: Dongsu Park > >> Signed-off-by: Ming Lin > > ... > >> diff --git a/block/blk-merge.c b/block/blk-merge.c > >> index 30a0d9f..3707f30 100644 > >> --- a/block/blk-merge.c > >> +++ b/block/blk-merge.c > >> @@ -9,12 +9,158 @@ > >> > >> #include "blk.h" > >> > >> +static struct bio *blk_bio_discard_split(struct request_queue *q, > >> + struct bio *bio, > >> + struct bio_set *bs) > >> +{ > >> + unsigned int max_discard_sectors, granularity; > >> + int alignment; > >> + sector_t tmp; > >> + unsigned split_sectors; > >> + > >> + /* Zero-sector (unknown) and one-sector granularities are the same. > >> */ > >> + granularity = max(q->limits.discard_granularity >> 9, 1U); > >> + > >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> > >> 9); > >> + max_discard_sectors -= max_discard_sectors % granularity; > >> + > >> + if (unlikely(!max_discard_sectors)) { > >> + /* XXX: warn */ > >> + return NULL; > >> + } > >> + > >> + if (bio_sectors(bio) <= max_discard_sectors) > >> + return NULL; > >> + > >> + split_sectors = max_discard_sectors; > >> + > >> + /* > >> + * If the next starting sector would be misaligned, stop the discard > >> at > >> + * the previous aligned sector. > >> + */ > >> + alignment = (q->limits.discard_alignment >> 9) % granularity; > >> + > >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; > >> + tmp = sector_div(tmp, granularity); > >> + > >> + if (split_sectors > tmp) > >> + split_sectors -= tmp; > >> + > >> + return bio_split(bio, split_sectors, GFP_NOIO, bs); > >> +} > > > > This code to stop the discard at the previous aligned sector could be > > the reason why I have 2 device-mapper-test-suite tests in the > > 'thin-provisioning' testsuite failing due to this patchset: > > I'm setting up the testsuite to debug. OK, once setup, to run the 2 tests in question directly you'd do something like: dmtest run --suite thin-provisioning -n discard_a_fragmented_device dmtest run --suite thin-provisioning -n
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer wrote: > On Mon, Jul 06 2015 at 3:44P -0400, > Ming Lin wrote: > >> From: Kent Overstreet >> >> The way the block layer is currently written, it goes to great lengths >> to avoid having to split bios; upper layer code (such as bio_add_page()) >> checks what the underlying device can handle and tries to always create >> bios that don't need to be split. >> >> But this approach becomes unwieldy and eventually breaks down with >> stacked devices and devices with dynamic limits, and it adds a lot of >> complexity. If the block layer could split bios as needed, we could >> eliminate a lot of complexity elsewhere - particularly in stacked >> drivers. Code that creates bios can then create whatever size bios are >> convenient, and more importantly stacked drivers don't have to deal with >> both their own bio size limitations and the limitations of the >> (potentially multiple) devices underneath them. In the future this will >> let us delete merge_bvec_fn and a bunch of other code. >> >> We do this by adding calls to blk_queue_split() to the various >> make_request functions that need it - a few can already handle arbitrary >> size bios. Note that we add the call _after_ any call to >> blk_queue_bounce(); this means that blk_queue_split() and >> blk_recalc_rq_segments() don't need to be concerned with bouncing >> affecting segment merging. >> >> Some make_request_fn() callbacks were simple enough to audit and verify >> they don't need blk_queue_split() calls. The skipped ones are: >> >> * nfhd_make_request (arch/m68k/emu/nfblock.c) >> * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) >> * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) >> * brd_make_request (ramdisk - drivers/block/brd.c) >> * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) >> * loop_make_request >> * null_queue_bio >> * bcache's make_request fns >> >> Some others are almost certainly safe to remove now, but will be left >> for future patches. >> >> Cc: Jens Axboe >> Cc: Christoph Hellwig >> Cc: Al Viro >> Cc: Ming Lei >> Cc: Neil Brown >> Cc: Alasdair Kergon >> Cc: Mike Snitzer >> Cc: dm-de...@redhat.com >> Cc: Lars Ellenberg >> Cc: drbd-u...@lists.linbit.com >> Cc: Jiri Kosina >> Cc: Geoff Levand >> Cc: Jim Paris >> Cc: Joshua Morris >> Cc: Philip Kelleher >> Cc: Minchan Kim >> Cc: Nitin Gupta >> Cc: Oleg Drokin >> Cc: Andreas Dilger >> Acked-by: NeilBrown (for the 'md/md.c' bits) >> Signed-off-by: Kent Overstreet >> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] >> Signed-off-by: Dongsu Park >> Signed-off-by: Ming Lin > ... >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> index 30a0d9f..3707f30 100644 >> --- a/block/blk-merge.c >> +++ b/block/blk-merge.c >> @@ -9,12 +9,158 @@ >> >> #include "blk.h" >> >> +static struct bio *blk_bio_discard_split(struct request_queue *q, >> + struct bio *bio, >> + struct bio_set *bs) >> +{ >> + unsigned int max_discard_sectors, granularity; >> + int alignment; >> + sector_t tmp; >> + unsigned split_sectors; >> + >> + /* Zero-sector (unknown) and one-sector granularities are the same. */ >> + granularity = max(q->limits.discard_granularity >> 9, 1U); >> + >> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> >> 9); >> + max_discard_sectors -= max_discard_sectors % granularity; >> + >> + if (unlikely(!max_discard_sectors)) { >> + /* XXX: warn */ >> + return NULL; >> + } >> + >> + if (bio_sectors(bio) <= max_discard_sectors) >> + return NULL; >> + >> + split_sectors = max_discard_sectors; >> + >> + /* >> + * If the next starting sector would be misaligned, stop the discard at >> + * the previous aligned sector. >> + */ >> + alignment = (q->limits.discard_alignment >> 9) % granularity; >> + >> + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; >> + tmp = sector_div(tmp, granularity); >> + >> + if (split_sectors > tmp) >> + split_sectors -= tmp; >> + >> + return bio_split(bio, split_sectors, GFP_NOIO, bs); >> +} > > This code to stop the discard at the previous aligned sector could be > the reason why I have 2 device-mapper-test-suite tests in the > 'thin-provisioning' testsuite failing due to this patchset: I'm setting up the testsuite to debug. Ming > > 4) Failure: > test_discard_a_fragmented_device(DiscardQuickTests) > > [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in > `block in assert_no_mappings' > > /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in > `call' > > /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in > `block in with_dev_md' > >
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin wrote: > From: Kent Overstreet > > The way the block layer is currently written, it goes to great lengths > to avoid having to split bios; upper layer code (such as bio_add_page()) > checks what the underlying device can handle and tries to always create > bios that don't need to be split. > > But this approach becomes unwieldy and eventually breaks down with > stacked devices and devices with dynamic limits, and it adds a lot of > complexity. If the block layer could split bios as needed, we could > eliminate a lot of complexity elsewhere - particularly in stacked > drivers. Code that creates bios can then create whatever size bios are > convenient, and more importantly stacked drivers don't have to deal with > both their own bio size limitations and the limitations of the > (potentially multiple) devices underneath them. In the future this will > let us delete merge_bvec_fn and a bunch of other code. > > We do this by adding calls to blk_queue_split() to the various > make_request functions that need it - a few can already handle arbitrary > size bios. Note that we add the call _after_ any call to > blk_queue_bounce(); this means that blk_queue_split() and > blk_recalc_rq_segments() don't need to be concerned with bouncing > affecting segment merging. > > Some make_request_fn() callbacks were simple enough to audit and verify > they don't need blk_queue_split() calls. The skipped ones are: > > * nfhd_make_request (arch/m68k/emu/nfblock.c) > * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) > * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) > * brd_make_request (ramdisk - drivers/block/brd.c) > * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) > * loop_make_request > * null_queue_bio > * bcache's make_request fns > > Some others are almost certainly safe to remove now, but will be left > for future patches. > > Cc: Jens Axboe > Cc: Christoph Hellwig > Cc: Al Viro > Cc: Ming Lei > Cc: Neil Brown > Cc: Alasdair Kergon > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: Lars Ellenberg > Cc: drbd-u...@lists.linbit.com > Cc: Jiri Kosina > Cc: Geoff Levand > Cc: Jim Paris > Cc: Joshua Morris > Cc: Philip Kelleher > Cc: Minchan Kim > Cc: Nitin Gupta > Cc: Oleg Drokin > Cc: Andreas Dilger > Acked-by: NeilBrown (for the 'md/md.c' bits) > Signed-off-by: Kent Overstreet > [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] > Signed-off-by: Dongsu Park > Signed-off-by: Ming Lin ... > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 30a0d9f..3707f30 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -9,12 +9,158 @@ > > #include "blk.h" > > +static struct bio *blk_bio_discard_split(struct request_queue *q, > + struct bio *bio, > + struct bio_set *bs) > +{ > + unsigned int max_discard_sectors, granularity; > + int alignment; > + sector_t tmp; > + unsigned split_sectors; > + > + /* Zero-sector (unknown) and one-sector granularities are the same. */ > + granularity = max(q->limits.discard_granularity >> 9, 1U); > + > + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); > + max_discard_sectors -= max_discard_sectors % granularity; > + > + if (unlikely(!max_discard_sectors)) { > + /* XXX: warn */ > + return NULL; > + } > + > + if (bio_sectors(bio) <= max_discard_sectors) > + return NULL; > + > + split_sectors = max_discard_sectors; > + > + /* > + * If the next starting sector would be misaligned, stop the discard at > + * the previous aligned sector. > + */ > + alignment = (q->limits.discard_alignment >> 9) % granularity; > + > + tmp = bio->bi_iter.bi_sector + split_sectors - alignment; > + tmp = sector_div(tmp, granularity); > + > + if (split_sectors > tmp) > + split_sectors -= tmp; > + > + return bio_split(bio, split_sectors, GFP_NOIO, bs); > +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the 'thin-provisioning' testsuite failing due to this patchset: 4) Failure: test_discard_a_fragmented_device(DiscardQuickTests) [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in `block in assert_no_mappings' /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in `call' /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in `block in with_dev_md' /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:40:in `each' /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:40:in `with_dev_md'
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the 'thin-provisioning' testsuite failing due to this patchset: 4) Failure: test_discard_a_fragmented_device(DiscardQuickTests) [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in `block in assert_no_mappings' /root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:43:in `call'
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31 2015 at 5:19pm -0400, Ming Lin m...@kernel.org wrote: On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote: On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the 'thin-provisioning' testsuite failing due to this patchset: I'm setting up the testsuite to debug. OK, once setup, to run the 2 tests in question directly you'd do something like: dmtest run
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote: On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the 'thin-provisioning' testsuite failing due to this patchset: I'm setting up the testsuite to debug. Ming 4) Failure: test_discard_a_fragmented_device(DiscardQuickTests) [/root/snitm/git/device-mapper-test-suite/lib/dmtest/tests/thin-provisioning/discard_tests.rb:49:in `block in assert_no_mappings'
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: On Fri, Jul 31 2015 at 5:19pm -0400, Ming Lin m...@kernel.org wrote: On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote: On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I have 2 device-mapper-test-suite tests in the
Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Fri, Jul 31, 2015 at 3:02 PM, Ming Lin m...@kernel.org wrote: On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote: On Fri, Jul 31 2015 at 5:19pm -0400, Ming Lin m...@kernel.org wrote: On Fri, Jul 31, 2015 at 12:23 PM, Mike Snitzer snit...@redhat.com wrote: On Mon, Jul 06 2015 at 3:44P -0400, Ming Lin m...@kernel.org wrote: From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com ... diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f..3707f30 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -9,12 +9,158 @@ #include blk.h +static struct bio *blk_bio_discard_split(struct request_queue *q, + struct bio *bio, + struct bio_set *bs) +{ + unsigned int max_discard_sectors, granularity; + int alignment; + sector_t tmp; + unsigned split_sectors; + + /* Zero-sector (unknown) and one-sector granularities are the same. */ + granularity = max(q-limits.discard_granularity 9, 1U); + + max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + max_discard_sectors -= max_discard_sectors % granularity; + + if (unlikely(!max_discard_sectors)) { + /* XXX: warn */ + return NULL; + } + + if (bio_sectors(bio) = max_discard_sectors) + return NULL; + + split_sectors = max_discard_sectors; + + /* + * If the next starting sector would be misaligned, stop the discard at + * the previous aligned sector. + */ + alignment = (q-limits.discard_alignment 9) % granularity; + + tmp = bio-bi_iter.bi_sector + split_sectors - alignment; + tmp = sector_div(tmp, granularity); + + if (split_sectors tmp) + split_sectors -= tmp; + + return bio_split(bio, split_sectors, GFP_NOIO, bs); +} This code to stop the discard at the previous aligned sector could be the reason why I
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
From: Kent Overstreet The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Al Viro Cc: Ming Lei Cc: Neil Brown Cc: Alasdair Kergon Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Lars Ellenberg Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina Cc: Geoff Levand Cc: Jim Paris Cc: Joshua Morris Cc: Philip Kelleher Cc: Minchan Kim Cc: Nitin Gupta Cc: Oleg Drokin Cc: Andreas Dilger Acked-by: NeilBrown (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park Signed-off-by: Ming Lin --- block/blk-core.c| 19 ++-- block/blk-merge.c | 159 ++-- block/blk-mq.c | 4 + block/blk-sysfs.c | 3 + drivers/block/drbd/drbd_req.c | 2 + drivers/block/pktcdvd.c | 6 +- drivers/block/ps3vram.c | 2 + drivers/block/rsxx/dev.c| 2 + drivers/block/umem.c| 2 + drivers/block/zram/zram_drv.c | 2 + drivers/md/dm.c | 2 + drivers/md/md.c | 2 + drivers/s390/block/dcssblk.c| 2 + drivers/s390/block/xpram.c | 2 + drivers/staging/lustre/lustre/llite/lloop.c | 2 + include/linux/blkdev.h | 3 + 16 files changed, 192 insertions(+), 22 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 82819e6..cecf80c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) if (q->id < 0) goto fail_q; + q->bio_split = bioset_create(BIO_POOL_SIZE, 0); + if (!q->bio_split) + goto fail_id; + q->backing_dev_info.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK; @@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) err = bdi_init(>backing_dev_info); if (err) - goto fail_id; + goto fail_split; setup_timer(>backing_dev_info.laptop_mode_wb_timer, laptop_mode_timer_fn, (unsigned long) q); @@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) fail_bdi: bdi_destroy(>backing_dev_info); +fail_split: + bioset_free(q->bio_split); fail_id: ida_simple_remove(_queue_ida, q->id); fail_q: @@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio) struct request *req; unsigned int request_count = 0; + blk_queue_split(q, , q->bio_split); + /* * low level driver can indicate that it wants pages above a * certain limit bounced to low memory (ie for highmem, or even @@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (likely(bio_is_rw(bio) && - nr_sectors >
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
From: Kent Overstreet The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe Cc: Christoph Hellwig Cc: Al Viro Cc: Ming Lei Cc: Neil Brown Cc: Alasdair Kergon Cc: Mike Snitzer Cc: dm-de...@redhat.com Cc: Lars Ellenberg Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina Cc: Geoff Levand Cc: Jim Paris Cc: Joshua Morris Cc: Philip Kelleher Cc: Minchan Kim Cc: Nitin Gupta Cc: Oleg Drokin Cc: Andreas Dilger Acked-by: NeilBrown (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park Signed-off-by: Ming Lin --- block/blk-core.c| 19 ++-- block/blk-merge.c | 159 ++-- block/blk-mq.c | 4 + block/blk-sysfs.c | 3 + drivers/block/drbd/drbd_req.c | 2 + drivers/block/pktcdvd.c | 6 +- drivers/block/ps3vram.c | 2 + drivers/block/rsxx/dev.c| 2 + drivers/block/umem.c| 2 + drivers/block/zram/zram_drv.c | 2 + drivers/md/dm.c | 2 + drivers/md/md.c | 2 + drivers/s390/block/dcssblk.c| 2 + drivers/s390/block/xpram.c | 2 + drivers/staging/lustre/lustre/llite/lloop.c | 2 + include/linux/blkdev.h | 3 + 16 files changed, 192 insertions(+), 22 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 82819e6..cecf80c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) if (q->id < 0) goto fail_q; + q->bio_split = bioset_create(BIO_POOL_SIZE, 0); + if (!q->bio_split) + goto fail_id; + q->backing_dev_info.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; q->backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK; @@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) err = bdi_init(>backing_dev_info); if (err) - goto fail_id; + goto fail_split; setup_timer(>backing_dev_info.laptop_mode_wb_timer, laptop_mode_timer_fn, (unsigned long) q); @@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) fail_bdi: bdi_destroy(>backing_dev_info); +fail_split: + bioset_free(q->bio_split); fail_id: ida_simple_remove(_queue_ida, q->id); fail_q: @@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio) struct request *req; unsigned int request_count = 0; + blk_queue_split(q, , q->bio_split); + /* * low level driver can indicate that it wants pages above a * certain limit bounced to low memory (ie for highmem, or even @@ -1832,15 +1840,6 @@ generic_make_request_checks(struct bio *bio) goto end_io; } - if (likely(bio_is_rw(bio) && - nr_sectors >
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-core.c| 19 ++-- block/blk-merge.c | 159 ++-- block/blk-mq.c | 4 + block/blk-sysfs.c | 3 + drivers/block/drbd/drbd_req.c | 2 + drivers/block/pktcdvd.c | 6 +- drivers/block/ps3vram.c | 2 + drivers/block/rsxx/dev.c| 2 + drivers/block/umem.c| 2 + drivers/block/zram/zram_drv.c | 2 + drivers/md/dm.c | 2 + drivers/md/md.c | 2 + drivers/s390/block/dcssblk.c| 2 + drivers/s390/block/xpram.c | 2 + drivers/staging/lustre/lustre/llite/lloop.c | 2 + include/linux/blkdev.h | 3 + 16 files changed, 192 insertions(+), 22 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 82819e6..cecf80c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) if (q-id 0) goto fail_q; + q-bio_split = bioset_create(BIO_POOL_SIZE, 0); + if (!q-bio_split) + goto fail_id; + q-backing_dev_info.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; q-backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK; @@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) err = bdi_init(q-backing_dev_info); if (err) - goto fail_id; + goto fail_split; setup_timer(q-backing_dev_info.laptop_mode_wb_timer, laptop_mode_timer_fn, (unsigned long) q); @@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) fail_bdi: bdi_destroy(q-backing_dev_info); +fail_split: + bioset_free(q-bio_split); fail_id: ida_simple_remove(blk_queue_ida, q-id); fail_q: @@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio) struct request
[PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
From: Kent Overstreet kent.overstr...@gmail.com The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe ax...@kernel.dk Cc: Christoph Hellwig h...@infradead.org Cc: Al Viro v...@zeniv.linux.org.uk Cc: Ming Lei ming@canonical.com Cc: Neil Brown ne...@suse.de Cc: Alasdair Kergon a...@redhat.com Cc: Mike Snitzer snit...@redhat.com Cc: dm-de...@redhat.com Cc: Lars Ellenberg drbd-...@lists.linbit.com Cc: drbd-u...@lists.linbit.com Cc: Jiri Kosina jkos...@suse.cz Cc: Geoff Levand ge...@infradead.org Cc: Jim Paris j...@jtan.com Cc: Joshua Morris josh.h.mor...@us.ibm.com Cc: Philip Kelleher pjk1...@linux.vnet.ibm.com Cc: Minchan Kim minc...@kernel.org Cc: Nitin Gupta ngu...@vflare.org Cc: Oleg Drokin oleg.dro...@intel.com Cc: Andreas Dilger andreas.dil...@intel.com Acked-by: NeilBrown ne...@suse.de (for the 'md/md.c' bits) Signed-off-by: Kent Overstreet kent.overstr...@gmail.com [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park dp...@posteo.net Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-core.c| 19 ++-- block/blk-merge.c | 159 ++-- block/blk-mq.c | 4 + block/blk-sysfs.c | 3 + drivers/block/drbd/drbd_req.c | 2 + drivers/block/pktcdvd.c | 6 +- drivers/block/ps3vram.c | 2 + drivers/block/rsxx/dev.c| 2 + drivers/block/umem.c| 2 + drivers/block/zram/zram_drv.c | 2 + drivers/md/dm.c | 2 + drivers/md/md.c | 2 + drivers/s390/block/dcssblk.c| 2 + drivers/s390/block/xpram.c | 2 + drivers/staging/lustre/lustre/llite/lloop.c | 2 + include/linux/blkdev.h | 3 + 16 files changed, 192 insertions(+), 22 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 82819e6..cecf80c 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -645,6 +645,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) if (q-id 0) goto fail_q; + q-bio_split = bioset_create(BIO_POOL_SIZE, 0); + if (!q-bio_split) + goto fail_id; + q-backing_dev_info.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE; q-backing_dev_info.capabilities = BDI_CAP_CGROUP_WRITEBACK; @@ -653,7 +657,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) err = bdi_init(q-backing_dev_info); if (err) - goto fail_id; + goto fail_split; setup_timer(q-backing_dev_info.laptop_mode_wb_timer, laptop_mode_timer_fn, (unsigned long) q); @@ -695,6 +699,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) fail_bdi: bdi_destroy(q-backing_dev_info); +fail_split: + bioset_free(q-bio_split); fail_id: ida_simple_remove(blk_queue_ida, q-id); fail_q: @@ -1612,6 +1618,8 @@ static void blk_queue_bio(struct request_queue *q, struct bio *bio) struct request