Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2019-03-05 Thread Jaegeuk Kim
On 03/04, Damien Le Moal wrote:
> For zoned block devices, an array of zone types for each device is
> allocated and initialized in order to determine if a section is stored
> on a sequential zone (zone reset needed) or a conventional zone (no
> zone reset needed and regular discard applies). Considering this usage,
> the zone types stored in memory can be replaced with a bitmap to
> indicate an equivalent information, that is, if a zone is sequential or
> not. This reduces the memory usage for each zoned device by roughly 8:
> on a 14TB disk with zones of 256 MB, the zone type array consumes
> 13x4KB pages while the bitmap uses only 2x4KB pages.
> 
> This patch changes the f2fs_dev_info structure blkz_type field to the
> bitmap blkz_seq. Access to this bitmap is done using the helper
> function f2fs_blkz_is_seq(), which is a rewrite of the function
> get_blkz_type().
> 
> Signed-off-by: Damien Le Moal 
> ---
>  fs/f2fs/f2fs.h| 13 +++--
>  fs/f2fs/segment.c | 23 +++
>  fs/f2fs/super.c   | 13 -
>  3 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 12fabd6735dd..d7b2de930352 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1067,8 +1067,8 @@ struct f2fs_dev_info {
>   block_t start_blk;
>   block_t end_blk;
>  #ifdef CONFIG_BLK_DEV_ZONED
> - unsigned int nr_blkz;   /* Total number of zones */
> - u8 *blkz_type;  /* Array of zones type */
> + unsigned int nr_blkz;   /* Total number of zones */
> + unsigned long *blkz_seq;/* Bitmap indicating sequential zones */
>  #endif
>  };
>  
> @@ -3508,16 +3508,17 @@ F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
>  F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
>  
>  #ifdef CONFIG_BLK_DEV_ZONED
> -static inline int get_blkz_type(struct f2fs_sb_info *sbi,
> - struct block_device *bdev, block_t blkaddr)
> +static inline bool f2fs_blkz_is_seq(struct f2fs_sb_info *sbi,
> + struct block_device *bdev, block_t blkaddr)
>  {
>   unsigned int zno = blkaddr >> sbi->log_blocks_per_blkz;
>   int i;
>  
>   for (i = 0; i < sbi->s_ndevs; i++)
>   if (FDEV(i).bdev == bdev)
> - return FDEV(i).blkz_type[zno];
> - return -EINVAL;
> + return test_bit(zno, FDEV(i).blkz_seq);
> + WARN_ON_ONCE(1);
> + return false;
>  }
>  #endif
>  
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9b79056d705d..65941070776c 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1703,19 +1703,8 @@ static int __f2fs_issue_discard_zone(struct 
> f2fs_sb_info *sbi,
>   blkstart -= FDEV(devi).start_blk;
>   }
>  
> - /*
> -  * We need to know the type of the zone: for conventional zones,
> -  * use regular discard if the drive supports it. For sequential
> -  * zones, reset the zone write pointer.
> -  */
> - switch (get_blkz_type(sbi, bdev, blkstart)) {
> -
> - case BLK_ZONE_TYPE_CONVENTIONAL:
> - if (!blk_queue_discard(bdev_get_queue(bdev)))
> - return 0;
> - return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
> - case BLK_ZONE_TYPE_SEQWRITE_REQ:
> - case BLK_ZONE_TYPE_SEQWRITE_PREF:
> + /* For sequential zones, reset the zone write pointer */
> + if (f2fs_blkz_is_seq(sbi, bdev, blkstart)) {
>   sector = SECTOR_FROM_BLOCK(blkstart);
>   nr_sects = SECTOR_FROM_BLOCK(blklen);
>  
> @@ -1730,10 +1719,12 @@ static int __f2fs_issue_discard_zone(struct 
> f2fs_sb_info *sbi,
>   trace_f2fs_issue_reset_zone(bdev, blkstart);
>   return blkdev_reset_zones(bdev, sector,
> nr_sects, GFP_NOFS);
> - default:
> - /* Unknown zone type: broken device ? */
> - return -EIO;
>   }
> +
> +  /* For conventional zones, use regular discard if supported */
> + if (!blk_queue_discard(bdev_get_queue(bdev)))
> + return 0;
> + return __queue_discard_cmd(sbi, bdev, lblkstart, blklen);
>  }
>  #endif
>  
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index c46a1d4318d4..44860b4285b9 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1017,7 +1017,7 @@ static void destroy_device_list(struct f2fs_sb_info 
> *sbi)
>   for (i = 0; i < sbi->s_ndevs; i++) {
>   blkdev_put(FDEV(i).bdev, FMODE_EXCL);
>  #ifdef CONFIG_BLK_DEV_ZONED
> - kvfree(FDEV(i).blkz_type);
> + kfree(FDEV(i).blkz_seq);

We need to use kvfree() since f2fs_kzalloc() can do kvmalloc().

Thanks,

>  #endif
>   }
>   kvfree(sbi->devs);
> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, 
> int devi)
>   if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>   FDEV(devi).nr_blkz++;
>  
> - FDEV(devi).blkz_t

Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2019-03-05 Thread Damien Le Moal
On 2019/03/06 1:55, Jaegeuk Kim wrote:
> On 03/05, Damien Le Moal wrote:
>> Johannes,
>>
>> On 2019/03/04 18:46, Johannes Thumshirn wrote:
>>> Hi Damien,
>>>
>>> On 04/03/2019 08:04, Damien Le Moal wrote:
 @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, 
 int devi)
if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
FDEV(devi).nr_blkz++;
  
 -  FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
 -  GFP_KERNEL);
 -  if (!FDEV(devi).blkz_type)
 +  FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
 +  BITS_TO_LONGS(FDEV(devi).nr_blkz)
 +  * sizeof(unsigned long),
 +  GFP_KERNEL);
 +  if (!FDEV(devi).blkz_seq)
return -ENOMEM;
>>>
>>> Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
>>> normal kernel library.
>>
>> Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
>> basically kmalloc() or kvmalloc() but with error injection for tests. So I 
>> used
>> that instead of bitmap_zalloc() to preserve the error injection test.
>>
>> Jaegeuk,
>>
>> Which do you prefer ? The patch as it is or switching to bitmap_zalloc() ?
>> Whichever is fine with me so I can send a v2 if you prefer bitmap_zalloc().
> 
> Hi Damien,
> 
> I think f2fs_kmalloc would be fine for fault injection tests. It seems it'd
> better to write a clean-up patch which replaces all the bitmap allocations
> in f2fs with single f2fs_bitmap_zalloc() at once.

Sounds good to me. So will you take the patch as is ? Any comment on it ?
This change was tested on a 15TB SMR disks.

> 
> I'll take a look at this.
> Thanks,
> 
>>
>> Best regards.
>>
>> -- 
>> Damien Le Moal
>> Western Digital Research
> 


-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2019-03-05 Thread Jaegeuk Kim
On 03/05, Damien Le Moal wrote:
> Johannes,
> 
> On 2019/03/04 18:46, Johannes Thumshirn wrote:
> > Hi Damien,
> > 
> > On 04/03/2019 08:04, Damien Le Moal wrote:
> >> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, 
> >> int devi)
> >>if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
> >>FDEV(devi).nr_blkz++;
> >>  
> >> -  FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
> >> -  GFP_KERNEL);
> >> -  if (!FDEV(devi).blkz_type)
> >> +  FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
> >> +  BITS_TO_LONGS(FDEV(devi).nr_blkz)
> >> +  * sizeof(unsigned long),
> >> +  GFP_KERNEL);
> >> +  if (!FDEV(devi).blkz_seq)
> >>return -ENOMEM;
> > 
> > Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
> > normal kernel library.
> 
> Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
> basically kmalloc() or kvmalloc() but with error injection for tests. So I 
> used
> that instead of bitmap_zalloc() to preserve the error injection test.
> 
> Jaegeuk,
> 
> Which do you prefer ? The patch as it is or switching to bitmap_zalloc() ?
> Whichever is fine with me so I can send a v2 if you prefer bitmap_zalloc().

Hi Damien,

I think f2fs_kmalloc would be fine for fault injection tests. It seems it'd
better to write a clean-up patch which replaces all the bitmap allocations
in f2fs with single f2fs_bitmap_zalloc() at once.

I'll take a look at this.
Thanks,

> 
> Best regards.
> 
> -- 
> Damien Le Moal
> Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2019-03-04 Thread Damien Le Moal
Johannes,

On 2019/03/04 18:46, Johannes Thumshirn wrote:
> Hi Damien,
> 
> On 04/03/2019 08:04, Damien Le Moal wrote:
>> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, 
>> int devi)
>>  if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>>  FDEV(devi).nr_blkz++;
>>  
>> -FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
>> -GFP_KERNEL);
>> -if (!FDEV(devi).blkz_type)
>> +FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
>> +BITS_TO_LONGS(FDEV(devi).nr_blkz)
>> +* sizeof(unsigned long),
>> +GFP_KERNEL);
>> +if (!FDEV(devi).blkz_seq)
>>  return -ENOMEM;
> 
> Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
> normal kernel library.

Yes indeed... f2fs_kzalloc uses f2fs_kmalloc(__GFP_ZERO) and f2fs_kmalloc is
basically kmalloc() or kvmalloc() but with error injection for tests. So I used
that instead of bitmap_zalloc() to preserve the error injection test.

Jaegeuk,

Which do you prefer ? The patch as it is or switching to bitmap_zalloc() ?
Whichever is fine with me so I can send a v2 if you prefer bitmap_zalloc().

Best regards.

-- 
Damien Le Moal
Western Digital Research


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2019-03-04 Thread Johannes Thumshirn
Hi Damien,

On 04/03/2019 08:04, Damien Le Moal wrote:
> @@ -2765,9 +2765,11 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, 
> int devi)
>   if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
>   FDEV(devi).nr_blkz++;
>  
> - FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
> - GFP_KERNEL);
> - if (!FDEV(devi).blkz_type)
> + FDEV(devi).blkz_seq = f2fs_kzalloc(sbi,
> + BITS_TO_LONGS(FDEV(devi).nr_blkz)
> + * sizeof(unsigned long),
> + GFP_KERNEL);
> + if (!FDEV(devi).blkz_seq)
>   return -ENOMEM;

Not so sure about F2FS internals, but there is a bitmap_zalloc() in the
normal kernel library.

Byte,
Johannes
-- 
Johannes ThumshirnSUSE Labs Filesystems
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2018-02-27 Thread Christoph Hellwig
On Tue, Feb 20, 2018 at 03:06:10PM +0900, Damien Le Moal wrote:
> For a zoned block device mount, an array of zone types for the device is
> allocated and initialized in order to determine if a section is stored
> on a sequential zone (zone reset needed) or a conventional zone (no zone
> reset needed and regular discard applies). Considering this usage, the
> zone types stored in memory can be replaced with a bitmap to indicate
> equivalent information, that is, if a zone is sequential or not. This
> reduces the memory usage for the device mount by roughly 8 (on a 14TB
> disk with zones of 256 MB, the zone type array consumes 13x4KB pages
> while the bitmap uses only 2x4KB pages.
> 
> This patch changes the f2fs_dev_info structure blkz_type field to the
> bitmap blkz_seq. Access to this bitmap is done using the function
> f2fs_blkz_is_seq(), which is a rewrite of the function get_blkz_type().

Is there any way we could just provide a block layer helper to
figure this out so that the file system code could be simplified even
more?

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2018-02-27 Thread Damien Le Moal
Christoph,

On 2/21/18 11:39, Christoph Hellwig wrote:
> On Tue, Feb 20, 2018 at 03:06:10PM +0900, Damien Le Moal wrote:
>> For a zoned block device mount, an array of zone types for the device is
>> allocated and initialized in order to determine if a section is stored
>> on a sequential zone (zone reset needed) or a conventional zone (no zone
>> reset needed and regular discard applies). Considering this usage, the
>> zone types stored in memory can be replaced with a bitmap to indicate
>> equivalent information, that is, if a zone is sequential or not. This
>> reduces the memory usage for the device mount by roughly 8 (on a 14TB
>> disk with zones of 256 MB, the zone type array consumes 13x4KB pages
>> while the bitmap uses only 2x4KB pages.
>>
>> This patch changes the f2fs_dev_info structure blkz_type field to the
>> bitmap blkz_seq. Access to this bitmap is done using the function
>> f2fs_blkz_is_seq(), which is a rewrite of the function get_blkz_type().
> 
> Is there any way we could just provide a block layer helper to
> figure this out so that the file system code could be simplified even
> more?

I thought about that before sending the patch...

For a physical drive, the block device queue already has the bitmap
indicating sequential zones, and a helper could be used in that case.
But that is not true if the FS block device is a logical device provided
by something like dm. E.g. if the FS mounts a zoned block device that is
a part of a real disk split by dm-linear, then there is no sequential
zone bitmap available, and the FS has to discover that by itself.

Currently, the sequential zone bitmap is initialized in the scsi driver
only. We could move that initialization to the block device layer at
some point when the bdev is created and requests can be sent to the
underlying dev, but before the bdev is exposed. Any suggestion of an
appropriate place to do that ? I do not see any obvious path that works
in all cases (real disks and dm).

-- 
Damien Le Moal,
Western Digital
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Reduce zoned block device memory usage

2018-02-27 Thread Damien Le Moal
Christoph,

On 2/21/18 11:48, Damien Le Moal wrote:
> Christoph,
> 
> On 2/21/18 11:39, Christoph Hellwig wrote:
>> On Tue, Feb 20, 2018 at 03:06:10PM +0900, Damien Le Moal wrote:
>>> For a zoned block device mount, an array of zone types for the device is
>>> allocated and initialized in order to determine if a section is stored
>>> on a sequential zone (zone reset needed) or a conventional zone (no zone
>>> reset needed and regular discard applies). Considering this usage, the
>>> zone types stored in memory can be replaced with a bitmap to indicate
>>> equivalent information, that is, if a zone is sequential or not. This
>>> reduces the memory usage for the device mount by roughly 8 (on a 14TB
>>> disk with zones of 256 MB, the zone type array consumes 13x4KB pages
>>> while the bitmap uses only 2x4KB pages.
>>>
>>> This patch changes the f2fs_dev_info structure blkz_type field to the
>>> bitmap blkz_seq. Access to this bitmap is done using the function
>>> f2fs_blkz_is_seq(), which is a rewrite of the function get_blkz_type().
>>
>> Is there any way we could just provide a block layer helper to
>> figure this out so that the file system code could be simplified even
>> more?
> 
> I thought about that before sending the patch...
> 
> For a physical drive, the block device queue already has the bitmap
> indicating sequential zones, and a helper could be used in that case.
> But that is not true if the FS block device is a logical device provided
> by something like dm. E.g. if the FS mounts a zoned block device that is
> a part of a real disk split by dm-linear, then there is no sequential
> zone bitmap available, and the FS has to discover that by itself.
> 
> Currently, the sequential zone bitmap is initialized in the scsi driver
> only. We could move that initialization to the block device layer at
> some point when the bdev is created and requests can be sent to the
> underlying dev, but before the bdev is exposed. Any suggestion of an
> appropriate place to do that ? I do not see any obvious path that works
> in all cases (real disks and dm).

Answering to myself :)
I was thinking of optimizing by reusing the request queue sequentila
zone bitmap. But you may have been thinking of something more simple like:

struct blk_zone_info {
unsigned int nr_zones;
unsigned long *seq_zones:
};

struct blk_zone_info *blk_get_zone_info(struct block_device *bdev);
void blk_put_zone_info(struct block_device *bdev);
bool blk_zone_is_seq(struct block_device *bdev, sector_t sect);

Which is in essence what f2fs currently code.
That is easy to write. To avoid a lot of report zones, the blk_zone_info
structure can be added to the block device struct for caching (the zone
types and number of zones never change, even with hybrid SMR drives).

Would this be better ?

Best regards.

-- 
Damien Le Moal,
Western Digital
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel