Re: [PATCH v3] btrfs: zoned: move superblock logging zone location

2021-04-11 Thread Damien Le Moal
On 2021/04/10 19:15, David Sterba wrote:
> From: Naohiro Aota 
> 
> Moves the location of the superblock logging zones. The new locations of
> the logging zones are now determined based on fixed block addresses
> instead of on fixed zone numbers.
> 
> The old placement method based on fixed zone numbers causes problems when
> one needs to inspect a file system image without access to the drive zone
> information. In such a case, the super block locations cannot be reliably
> determined as the zone size is unknown. By locating the superblock logging
> zones using fixed addresses, we can scan a dumped file system image without
> the zone information since a super block copy will always be present at or
> after the fixed known locations.
> 
> Introduce the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>   - primary superblock: offset   0B (and the following zone)
>   - first copy:         offset 512G (and the following zone)
>   - second copy:        offset   4T (4096G, and the following zone)
> 
> If a logging zone is outside of the disk capacity, we do not record the
> superblock copy.
> 
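As an illustration of the placement rule quoted above, here is a minimal userspace sketch, not the code from the patch. It assumes the zone size is a power of two and is passed as a shift; the helper names sb_log_first_zone() and sb_log_pair_fits() are hypothetical, and the capacity check is only one plausible reading of "outside of the disk capacity".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed byte offsets of the first log zone of each pair (values from the patch). */
#define SB_LOG_PRIMARY_OFFSET 0ULL
#define SB_LOG_FIRST_OFFSET   (512ULL << 30)   /* 512G */
#define SB_LOG_SECOND_OFFSET  (4096ULL << 30)  /* 4T */

#define NR_SB_LOG_ZONES 2
#define SB_MIRROR_MAX   3

/*
 * Hypothetical helper: index of the zone that contains the fixed offset of
 * the given mirror. Assumes zone size == 1 << zone_size_shift.
 */
static uint64_t sb_log_first_zone(unsigned int zone_size_shift, int mirror)
{
	static const uint64_t offsets[SB_MIRROR_MAX] = {
		SB_LOG_PRIMARY_OFFSET,
		SB_LOG_FIRST_OFFSET,
		SB_LOG_SECOND_OFFSET,
	};

	assert(mirror >= 0 && mirror < SB_MIRROR_MAX);
	return offsets[mirror] >> zone_size_shift;
}

/*
 * Hypothetical helper: a copy is recorded only if both zones of the pair
 * exist on a device with nr_zones zones (assumed interpretation).
 */
static bool sb_log_pair_fits(unsigned int zone_size_shift, int mirror,
			     uint64_t nr_zones)
{
	return sb_log_first_zone(zone_size_shift, mirror) + NR_SB_LOG_ZONES <= nr_zones;
}
```

With 256MiB zones (shift 28), the first copy starts at zone 2048 and the second at zone 16384. A 1TiB device with 4096 such zones would hold the primary and the first copy, but not the second.
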
> The first copy position is much larger than for a non-zoned filesystem,
> which is at 64M.  This is to avoid overlapping with the log zones for
> the primary superblock. This higher location is arbitrary but allows
> supporting devices with very large zone sizes, with some spare space in
> between.
> 
> Such a large zone size is unrealistic and very unlikely to ever be seen in
> real devices. Currently, SMR disks have a zone size of 256MB, and we are
> expecting ZNS drives to be in the 1-4GB range, so this limit gives us
> room to breathe. For now, we only allow zone sizes up to 8GB. The
> maximum zone size that would still fit in the space is 256G.
> 
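The 256G figure quoted above appears to follow from the smallest gap between two consecutive fixed offsets (512G, between the primary and the first copy) divided by the two zones of a pair. A small sketch of that arithmetic, under that assumption; the function name max_fitting_zone_size() is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)
#define NR_SB_LOG_ZONES 2

/* Fixed offsets of the three superblock logging zone pairs (from the patch). */
static const uint64_t sb_log_offsets[] = { 0, 512 * GIB, 4096 * GIB };

/*
 * Largest zone size for which the NR_SB_LOG_ZONES zones starting at one
 * fixed offset still end at or before the next fixed offset.
 */
static uint64_t max_fitting_zone_size(void)
{
	const size_t n = sizeof(sb_log_offsets) / sizeof(sb_log_offsets[0]);
	uint64_t min_gap = UINT64_MAX;

	assert(n >= 2);
	for (size_t i = 1; i < n; i++) {
		uint64_t gap = sb_log_offsets[i] - sb_log_offsets[i - 1];

		if (gap < min_gap)
			min_gap = gap;
	}
	return min_gap / NR_SB_LOG_ZONES;
}
```

The result is 256G, well above the 8G limit the patch enforces, which is the "room to breathe" mentioned in the commit message.
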
> The fixed location addresses are somewhat arbitrary, with the intent of
> maintaining superblock reliability for smaller and larger devices, with
> the preference for the latter. For this reason, there are two superblocks
> under the first 1T. This should cover use cases for physical devices and
> for emulated/device-mapper devices.
> 
> The superblock logging zones are reserved for superblock logging and
> never used for data or metadata blocks. Note that we only reserve the
> two zones per primary/copy actually used for superblock logging. We do
> not reserve the ranges of zones possibly containing superblocks with the
> largest supported zone size (0-16G, 512G-528G, 4096G-4112G).
> 
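The ranges quoted above can be reproduced by taking the zone that contains each fixed offset plus the following zone. A minimal sketch, assuming zone-aligned rounding; struct byte_range and sb_log_span() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1ULL << 30)
#define NR_SB_LOG_ZONES 2

/* Half-open byte range [start, end). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Byte range covered by the pair of log zones for one fixed offset:
 * the zone containing the offset and the zone right after it.
 */
static struct byte_range sb_log_span(uint64_t offset, uint64_t zone_size)
{
	struct byte_range r;

	assert(zone_size != 0);
	r.start = offset - (offset % zone_size);  /* round down to zone start */
	r.end = r.start + NR_SB_LOG_ZONES * zone_size;
	return r;
}
```

With the 8G maximum zone size this gives 0-16G, 512G-528G and 4096G-4112G. With 256MiB zones the pair at 512G covers only 512MiB, which is why reserving just the two zones actually in use is much cheaper than reserving the worst-case ranges.
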
> The zones containing the fixed location offsets used to store
> superblocks on a non-zoned volume are also reserved to avoid confusion.
> 
> Signed-off-by: Naohiro Aota 
> Signed-off-by: David Sterba 
> ---
> 
> For context see replies under
> https://lore.kernel.org/linux-btrfs/2f58edb74695825632c77349b000d31f16cb3226.1617870145.git.naohiro.a...@wdc.com/
> 
>  fs/btrfs/zoned.c | 53 ++--
>  1 file changed, 42 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 1f972b75a9ab..eeb3ebe11d7a 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -21,9 +21,30 @@
>  /* Pseudo write pointer value for conventional zone */
>  #define WP_CONVENTIONAL ((u64)-2)
>  
> +/*
> + * Location of the first zone of superblock logging zone pairs.
> + *
> + * - primary superblock:    0B (zone 0)
> + * - first copy:          512G (zone starting at that offset)
> + * - second copy:           4T (zone starting at that offset)
> + */
> +#define BTRFS_SB_LOG_PRIMARY_OFFSET  (0ULL)
> +#define BTRFS_SB_LOG_FIRST_OFFSET    (512ULL * SZ_1G)
> +#define BTRFS_SB_LOG_SECOND_OFFSET   (4096ULL * SZ_1G)
> +
> +#define BTRFS_SB_LOG_FIRST_SHIFT     const_ilog2(BTRFS_SB_LOG_FIRST_OFFSET)
> +#define BTRFS_SB_LOG_SECOND_SHIFT    const_ilog2(BTRFS_SB_LOG_SECOND_OFFSET)
> +
>  /* Number of superblock log zones */
>  #define BTRFS_NR_SB_LOG_ZONES 2
>  
> +/*
> + * Maximum supported zone size. Currently, SMR disks have a zone size of
> + * 256MiB, and we are expecting ZNS drives to be in the 1-4GiB range. We do not
> + * expect the zone size to become larger than 8GiB in the near future.
> + */
> +#define BTRFS_MAX_ZONE_SIZE  SZ_8G
> +
>  static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
>  {
>   struct blk_zone *zones = data;
> @@ -111,23 +132,22 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>  }
>  
>  /*
> - * The following zones are reserved as the circular buffer on ZONED btrfs.
> - *  - The primary superblock: zones 0 and 1
> - *  - The first copy: zones 16 and 17
> - *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
> - * the following one
> + * Get the first zone number of the superblock mirror
>   */
>  static inline u32 sb_zone_number(int shift, int mirror)
>  {
> - ASSERT(mirror < BTRFS_SUPER_MIRROR_MAX);
> + u64 zone;
>  
> + ASSERT

Re: [PATCH v3] btrfs: zoned: move superblock logging zone location

2021-04-11 Thread Naohiro Aota
On Sat, Apr 10, 2021 at 12:12:23PM +0200, David Sterba wrote:
> [...]

Re: [PATCH v3] btrfs: zoned: move superblock logging zone location

2021-04-11 Thread Johannes Thumshirn
On 10/04/2021 12:15, David Sterba wrote:
> [...]
> Signed-off-by: Naohiro Aota 
> Signed-off-by: David Sterba 

Looks good to me, Thanks
Reviewed-by: Johannes Thumshirn