Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-12 Thread Christoph Hellwig
And more importantly please test with a file system that uses the
iomap direct I/O code (btrfs, gfs2, ext4, xfs, zonefs) as we should
never just work around a legacy codebase that should go away in the
block layer.


Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-12 Thread Changheun Lee
> On Sun, Apr 11, 2021 at 10:13:01PM +0000, Damien Le Moal wrote:
> > On 2021/04/09 23:47, Bart Van Assche wrote:
> > > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> > >> On 2021/04/07 18:46, Changheun Lee wrote:
> > >>> I'll prepare a new patch as you recommend. It will add setting of
> > >>> limit_bio_size automatically when the queue max sectors is determined.
> > >>
> > >> Please do that in the driver for the HW that benefits from it. Do not do
> > >> this for all block devices.
> > > 
> > > Hmm ... is it ever useful to build a bio with a size that exceeds 
> > > max_hw_sectors when submitting a bio directly to a block device, or in 
> > > other words, if no stacked block driver sits between the submitter and 
> > > the block device? Am I perhaps missing something?
> > 
> > Device performance wise, the benefits are certainly not obvious to me
> > either. But for very fast block devices, I think the CPU overhead of
> > building more smaller BIOs may be significant compared to splitting a
> > large BIO into multiple requests. Though it may be good to revisit this
> > with some benchmark numbers.
> 
> This patch tries to address the issue [1] in do_direct_IO(), in which
> Changheun observed that other operations take time between adding pages
> to the bio.
> 
> However, do_direct_IO() just does the following, apart from adding pages
> to the bio and submitting the bio:
> 
> - retrieves pages in batches (pinning 64 pages at a time from the VM), and
> 
> - retrieves the block mapping (get_more_blocks()), which is usually done
> very few times for 32MB; for a new mapping, clean_bdev_aliases() may
> take a bit of time.
> 
> If there is no system memory pressure, pinning 64 pages won't be slow, but
> get_more_blocks() may take a bit of time.
> 
> Changheun, can you check whether get_more_blocks() is called multiple times
> when submitting 32MB in your test?

It is called just about once.

> 
> In my 32MB sync dio f2fs test on an x86_64 VM, one buffer_head mapping can
> hold 32MB, but that is on a freshly created f2fs.
> 
> I'd suggest understanding the issue completely before settling on one
> solution.

Thank you for your advice. I'll analyze your point more deeply later. :)
But I think it's different from finding the main time-consuming point in
do_direct_IO(). I think the excessive looping should be controlled.
8,192 loop iterations in do_direct_IO() - for 32MB - to submit one bio is
too many on a 4KB page system. I want to apply an optional solution to
avoid the excessive looping caused by multi-page bvec.
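
For reference, the scenario under discussion amounts to a single large
O_DIRECT read. A minimal user-space reproducer might look like the sketch
below (the file path and 4KB alignment are assumptions, not taken from
this thread):

/*
 * Hedged sketch: one timed 32MB O_DIRECT read. Point the path at a
 * large file on the filesystem under test.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define CHUNK_SIZE	(32UL << 20)	/* 32MB chunk, as in the timelines above */

int main(void)
{
	struct timespec t0, t1;
	void *buf;
	ssize_t ret;
	int fd = open("/mnt/f2fs/testfile", O_RDONLY | O_DIRECT);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT requires aligned buffers; 4KB covers the usual block size */
	if (posix_memalign(&buf, 4096, CHUNK_SIZE))
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	ret = read(fd, buf, CHUNK_SIZE);	/* one pass of the do_direct_IO() loop */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("read %zd bytes in %ld us\n", ret,
	       (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000);

	free(buf);
	close(fd);
	return 0;
}

Running it against a large file with contiguous extents should exercise
the bio-merge path shown in the timelines above.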

Thanks,

Changheun Lee


Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-11 Thread Ming Lei
On Sun, Apr 11, 2021 at 10:13:01PM +0000, Damien Le Moal wrote:
> On 2021/04/09 23:47, Bart Van Assche wrote:
> > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> >> On 2021/04/07 18:46, Changheun Lee wrote:
> >>> I'll prepare a new patch as you recommend. It will add setting of
> >>> limit_bio_size automatically when the queue max sectors is determined.
> >>
> >> Please do that in the driver for the HW that benefits from it. Do not do
> >> this for all block devices.
> > 
> > Hmm ... is it ever useful to build a bio with a size that exceeds 
> > max_hw_sectors when submitting a bio directly to a block device, or in 
> > other words, if no stacked block driver sits between the submitter and 
> > the block device? Am I perhaps missing something?
> 
> Device performance wise, the benefits are certainly not obvious to me either.
> But for very fast block devices, I think the CPU overhead of building more
> smaller BIOs may be significant compared to splitting a large BIO into
> multiple requests. Though it may be good to revisit this with some
> benchmark numbers.

This patch tries to address the issue [1] in do_direct_IO(), in which
Changheun observed that other operations take time between adding pages
to the bio.

However, do_direct_IO() just does the following, apart from adding pages
to the bio and submitting the bio:

- retrieves pages in batches (pinning 64 pages at a time from the VM), and

- retrieves the block mapping (get_more_blocks()), which is usually done
very few times for 32MB; for a new mapping, clean_bdev_aliases() may
take a bit of time.

If there is no system memory pressure, pinning 64 pages won't be slow, but
get_more_blocks() may take a bit of time.

Changheun, can you check whether get_more_blocks() is called multiple times
when submitting 32MB in your test?

In my 32MB sync dio f2fs test on an x86_64 VM, one buffer_head mapping can
hold 32MB, but that is on a freshly created f2fs.

I'd suggest understanding the issue completely before settling on one
solution.


[1] 
https://lore.kernel.org/linux-block/20210202041204.28995-1-nanich@samsung.com/
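
To put rough numbers on the two steps above, here is a toy calculation (a
user-space sketch, not kernel code; DIO_PAGES mirrors the 64-page pin
batch size in fs/direct-io.c):

#include <stdio.h>

#define PAGE_SIZE_4K	4096UL
#define DIO_PAGES	64	/* page-pin batch size used by fs/direct-io.c */

int main(void)
{
	unsigned long total = 32UL << 20;		/* the 32MB request */
	unsigned long pages = total / PAGE_SIZE_4K;	/* page-add iterations */

	printf("pages added to bio(s):  %lu\n", pages);			/* 8192 */
	printf("64-page pin batches:    %lu\n", pages / DIO_PAGES);	/* 128 */
	/* for one contiguous extent, get_more_blocks() runs about once */
	printf("mapping lookups:        ~1 (contiguous extent)\n");
	return 0;
}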


Thanks,
Ming



Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-11 Thread Damien Le Moal
On 2021/04/09 23:47, Bart Van Assche wrote:
> On 4/7/21 3:27 AM, Damien Le Moal wrote:
>> On 2021/04/07 18:46, Changheun Lee wrote:
>>> I'll prepare a new patch as you recommend. It will add setting of
>>> limit_bio_size automatically when the queue max sectors is determined.
>>
>> Please do that in the driver for the HW that benefits from it. Do not do this
>> for all block devices.
> 
> Hmm ... is it ever useful to build a bio with a size that exceeds 
> max_hw_sectors when submitting a bio directly to a block device, or in 
> other words, if no stacked block driver sits between the submitter and 
> the block device? Am I perhaps missing something?

Device performance wise, the benefits are certainly not obvious to me either.
But for very fast block devices, I think the CPU overhead of building more
smaller BIOs may be significant compared to splitting a large BIO into multiple
requests. Though it may be good to revisit this with some benchmark numbers.

> 
> Thanks,
> 
> Bart.
> 


-- 
Damien Le Moal
Western Digital Research


Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-09 Thread Bart Van Assche

On 4/7/21 3:27 AM, Damien Le Moal wrote:
> On 2021/04/07 18:46, Changheun Lee wrote:
>> I'll prepare a new patch as you recommend. It will add setting of
>> limit_bio_size automatically when the queue max sectors is determined.
>
> Please do that in the driver for the HW that benefits from it. Do not do this
> for all block devices.

Hmm ... is it ever useful to build a bio with a size that exceeds
max_hw_sectors when submitting a bio directly to a block device, or in
other words, if no stacked block driver sits between the submitter and
the block device? Am I perhaps missing something?


Thanks,

Bart.


Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-07 Thread Damien Le Moal
On 2021/04/07 18:46, Changheun Lee wrote:
>> On Wed, Apr 07, 2021 at 03:55:07PM +0900, Changheun Lee wrote:
>> On Wed, Apr 07, 2021 at 02:06:33PM +0900, Changheun Lee wrote:
>> On Wed, Apr 07, 2021 at 09:16:12AM +0900, Changheun Lee wrote:
>> On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
>> bio size can grow up to 4GB when multi-page bvec is enabled,
>> but sometimes it leads to inefficient behavior.
>> In the case of large chunk direct I/O - a 32MB chunk read in user space -
>> all pages for the 32MB are merged into one bio structure if the pages'
>> physical addresses are contiguous. This delays the submit until the
>> merge is complete. bio max size should be limited to a proper size.
>>
>> When a 32MB chunk read with the direct I/O option comes from userspace,
>> current kernel behavior in the do_direct_IO() loop is the timeline below.
>>
>>  | bio merge for 32MB. total 8,192 pages are merged.
>>  | total elapsed time is over 2ms.
>>  |-------------------- ... -------------------->|
>>                                                 | 8,192 pages merged into a bio.
>>                                                 | at this time, the first bio submit is done.
>>                                                 | 1 bio is split into 32 read requests and issued.
>>                                                 |--------->
>>                                                  |--------->
>>                                                   |--------->
>>                                                      ......
>>                                                            |--------->
>>                                                             |--------->|
>>              total 19ms elapsed to complete the 32MB read from the device. |
>>
>> If bio max size is limited to 1MB, the behavior changes as below.
>>
>>  | bio merge for 1MB. 256 pages are merged for each bio.
>>  | total 32 bios will be made.
>>  | total elapsed time is over 2ms. it's the same.
>>  | but the first bio submit timing is fast: about 100us.
>>  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
>>       | 256 pages merged into a bio.
>>       | at this time, the first bio submit is done.
>>       | and 1 read request is issued for 1 bio.
>>       |--------->
>>        |--------->
>>         |--------->
>>            ......
>>                  |--------->
>>                   |--------->|
>>  total 17ms elapsed to complete the 32MB read from the device. |
>>
>> As a result, read requests are issued sooner when bio max size is
>> limited. With the current kernel behavior and multi-page bvec, a super
>> large bio can be created, and that delays issuing the first I/O request.
>>
>> Signed-off-by: Changheun Lee 
>> ---
>>  block/bio.c            | 13 ++++++++++++-
>>  include/linux/bio.h    |  2 +-
>>  include/linux/blkdev.h |  3 +++
>>  3 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index 1f2cc1fbe283..c528e1f944c7 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
>>  }
>>  EXPORT_SYMBOL(bio_init);
>>  
>> +unsigned int bio_max_size(struct bio *bio)
>> +{
>> +	struct request_queue *q = bio->bi_disk->queue;
>> +
>> +	if (blk_queue_limit_bio_size(q))
>> +		return blk_queue_get_max_sectors(q, bio_op(bio))
>> +				<< SECTOR_SHIFT;
>> +
>> +	return UINT_MAX;
>> +}
>> +
>>  /**
>>   * bio_reset - reinitialize a bio
>>   * @bio:	bio to reset
>> @@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>  
>>  		if (page_is_mergeable(bv, page, len, off, same_page)) {
>> -			if (bio->bi_iter.bi_size > UINT_MAX - len) {
>> +			if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>>  				*same_page = false;
>>  				return false;
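
For reference, the blk_queue_limit_bio_size() test in the hunk above
implies a per-queue opt-in flag. A hedged sketch of how a driver might
enable it (the flag name QUEUE_FLAG_LIMIT_BIO_SIZE and the 1MB cap are
assumptions; the flag itself would come from patch 2/2, not shown here):

#include <linux/blkdev.h>

/* hypothetical driver queue setup; flag name assumed from the hunk above */
static void example_setup_queue_limits(struct request_queue *q)
{
	/* cap requests at 1MB (2048 x 512B sectors) for this device */
	blk_queue_max_hw_sectors(q, 2048);

	/* opt in: bio_max_size() then returns max sectors << SECTOR_SHIFT */
	blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
}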

Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-07 Thread Changheun Lee
> On Wed, Apr 07, 2021 at 03:55:07PM +0900, Changheun Lee wrote:
> > > On Wed, Apr 07, 2021 at 02:06:33PM +0900, Changheun Lee wrote:
> > > > > On Wed, Apr 07, 2021 at 09:16:12AM +0900, Changheun Lee wrote:
> > > > > > > On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
> > > > > > > > > bio size can grow up to 4GB when multi-page bvec is enabled,
> > > > > > > > > but sometimes it leads to inefficient behavior.
> > > > > > > > > In the case of large chunk direct I/O - a 32MB chunk read in
> > > > > > > > > user space - all pages for the 32MB are merged into one bio
> > > > > > > > > structure if the pages' physical addresses are contiguous.
> > > > > > > > > This delays the submit until the merge is complete. bio max
> > > > > > > > > size should be limited to a proper size.
> > > > > > > > > 
> > > > > > > > > When a 32MB chunk read with the direct I/O option comes from
> > > > > > > > > userspace, current kernel behavior in the do_direct_IO() loop
> > > > > > > > > is the timeline below.
> > > > > > > > > 
> > > > > > > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > > > > > > >  | total elapsed time is over 2ms.
> > > > > > > > >  |--------------- ... -------------->|
> > > > > > > > >                                      | 8,192 pages merged into a bio.
> > > > > > > > >                                      | at this time, the first bio submit is done.
> > > > > > > > >                                      | 1 bio is split into 32 read requests and issued.
> > > > > > > > >                                      |--------->
> > > > > > > > >                                       |--------->
> > > > > > > > >                                        |--------->
> > > > > > > > >                                           ......
> > > > > > > > >                                                 |--------->
> > > > > > > > >                                                  |--------->|
> > > > > > > > >    total 19ms elapsed to complete the 32MB read from the device. |
> > > > > > > > > 
> > > > > > > > > If bio max size is limited to 1MB, the behavior changes as
> > > > > > > > > below.
> > > > > > > > > 
> > > > > > > > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > > > > > > >  | total 32 bios will be made.
> > > > > > > > >  | total elapsed time is over 2ms. it's the same.
> > > > > > > > >  | but the first bio submit timing is fast: about 100us.
> > > > > > > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > > > > > > >       | 256 pages merged into a bio.
> > > > > > > > >       | at this time, the first bio submit is done.
> > > > > > > > >       | and 1 read request is issued for 1 bio.
> > > > > > > > >       |--------->
> > > > > > > > >        |--------->
> > > > > > > > >         |--------->
> > > > > > > > >            ......
> > > > > > > > >                 |--------->
> > > > > > > > >                  |--------->|
> > > > > > > > >  total 17ms elapsed to complete the 32MB read from the device. |
> > > > > > > > > 
> > > > > > > > > As a result, read requests are issued sooner when bio max
> > > > > > > > > size is limited. With the current kernel behavior and
> > > > > > > > > multi-page bvec, a super large bio can be created, and that
> > > > > > > > > delays issuing the first I/O request.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Changheun Lee 
> > > > > > > > > ---
> > > > > > > > >  block/bio.c            | 13 ++++++++++++-
> > > > > > > > >  include/linux/bio.h    |  2 +-
> > > > > > > > >  include/linux/blkdev.h |  3 +++
> > > > > > > > >  3 files changed, 16 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/block/bio.c b/block/bio.c
> > > > > > > > > index 1f2cc1fbe283..c528e1f944c7 100644
> > > > > > > > > --- a/block/bio.c
> > > > > > > > > +++ b/block/bio.c
> > > > > > > > > @@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> > > > > > > > >  }
> > > > > > > > >  EXPORT_SYMBOL(bio_init);
> > > > > > > > >  
> > > > > > > > > +unsigned int bio_max_size(struct bio *bio)
> > > > > > > > > +{
> > > > > > > > > +	struct request_queue *q = bio->bi_disk->queue;
> > > > > > > > > +
> > > > > > > > > +	if (blk_queue_limit_bio_size(q))
> > > > > > > > > +		return blk_queue_get_max_sectors(q, bio_op(bio))
> > > > > > > > > +