Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-04-07 Thread Josef Bacik

On 4/7/21 2:31 PM, Johannes Thumshirn wrote:

On 07/04/2021 19:54, Josef Bacik wrote:

On 3/15/21 1:53 AM, Naohiro Aota wrote:

This commit moves the location of superblock logging zones. The location of
the logging zones is determined based on fixed block addresses instead of
on fixed zone numbers.

By locating the superblock zones using fixed addresses, we can scan a
dumped file system image without the zone information. And, no drawbacks
exist.

We use the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

- Primary superblock: zone starting at offset 0 and the following zone
- First copy: zone containing offset 64GB and the following zone
- Second copy: zone containing offset 256GB and the following zone

If the zones are located outside of the disk, we don't record the
superblock copy.

These addresses are arbitrary, but using addresses that are too large
reduces superblock reliability for smaller devices, so we do not want to
exceed 1T to cover all cases nicely.

Also, LBAs are generally distributed initially across one head (platter
side) up to one or more zones, then go on the next head backward (the other
side of the same platter), and on to the following head/platter. Thus using
non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
likely results in each superblock copy being on a different head/platter,
which improves chances of recovery in case of superblock read error.

These zones are reserved for superblock logging and never used for data or
metadata blocks. Zones containing the offsets used to store superblocks in
a regular btrfs volume (non-zoned case) are also reserved to avoid
confusion.

Note that we only reserve the 2 zones per primary/copy actually used for
superblock logging. We don't reserve the ranges possibly containing
superblock with the largest supported zone size (0-16GB, 64G-80GB,
256G-272GB).

The first copy position is much larger than for a regular btrfs volume
(64M).  This increase is to avoid overlapping with the log zones for the
primary superblock. This higher location is arbitrary but allows supporting
devices with very large zone size, up to 32GB. But we only allow zone sizes
up to 8GB for now.



Ok it took me a few reads to figure out what's going on.

The problem is that with large zone sizes, our current choices put the back up
super blocks way out on the disk, correct?  So instead you've picked
arbitrary byte offsets, hoping that they'll be closer to the front of the disk
and thus actually be useful.

And then you've introduced the 8gib zone size as a way to avoid problems where
we get the same zone for the backup supers.

Are these statements correct?  If so the changelog should be updated to make
this clear up front, because it took me a while to work that out.


No, the problem is we're placing superblocks into specific zones, regardless of
the zone size. This creates a problem when you need to inspect a file system,
but don't have the block device available, because you can't look at the zone
size to calculate where the superblocks are on the device.

With this change we're placing the superblocks not into specific zone numbers,
but into the zones starting at specific offsets. We're taking 8G zone size as
a maximum expected zone size, to make sure we're not overlapping superblock
zones. Currently SMR disks have a zone size of 256MB and we're expecting ZNS
drives to be in the 1-2GB range, so this 8GB gives us room to breathe.

Hope this helps clear up any confusion.



Ok this makes a lot more sense, and should be the first thing in the changelog, 
because I still got it wrong after reading the thing a few times.


And I think it's worth pointing out in the comments that 8gib represents a zone 
size that doesn't exist currently and is likely to never exist.


That will make this much easier to grok and understand in the future.  Thanks,

Josef


Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-04-07 Thread Johannes Thumshirn
On 07/04/2021 19:54, Josef Bacik wrote:
> On 3/15/21 1:53 AM, Naohiro Aota wrote:
>> This commit moves the location of superblock logging zones. The location of
>> the logging zones is determined based on fixed block addresses instead of
>> on fixed zone numbers.
>>
>> By locating the superblock zones using fixed addresses, we can scan a
>> dumped file system image without the zone information. And, no drawbacks
>> exist.
>>
>> We use the following three pairs of zones containing fixed offset
>> locations, regardless of the device zone size.
>>
>>   - Primary superblock: zone starting at offset 0 and the following zone
>>   - First copy: zone containing offset 64GB and the following zone
>>   - Second copy: zone containing offset 256GB and the following zone
>>
>> If the zones are located outside of the disk, we don't record the
>> superblock copy.
>>
>> These addresses are arbitrary, but using addresses that are too large
>> reduces superblock reliability for smaller devices, so we do not want to
>> exceed 1T to cover all cases nicely.
>>
>> Also, LBAs are generally distributed initially across one head (platter
>> side) up to one or more zones, then go on the next head backward (the other
>> side of the same platter), and on to the following head/platter. Thus using
>> non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
>> likely results in each superblock copy being on a different head/platter,
>> which improves chances of recovery in case of superblock read error.
>>
>> These zones are reserved for superblock logging and never used for data or
>> metadata blocks. Zones containing the offsets used to store superblocks in
>> a regular btrfs volume (non-zoned case) are also reserved to avoid
>> confusion.
>>
>> Note that we only reserve the 2 zones per primary/copy actually used for
>> superblock logging. We don't reserve the ranges possibly containing
>> superblock with the largest supported zone size (0-16GB, 64G-80GB,
>> 256G-272GB).
>>
>> The first copy position is much larger than for a regular btrfs volume
>> (64M).  This increase is to avoid overlapping with the log zones for the
>> primary superblock. This higher location is arbitrary but allows supporting
>> devices with very large zone size, up to 32GB. But we only allow zone sizes
>> up to 8GB for now.
>>
> 
> Ok it took me a few reads to figure out what's going on.
> 
> The problem is that with large zone sizes, our current choices put the back up
> super blocks way out on the disk, correct?  So instead you've picked
> arbitrary byte offsets, hoping that they'll be closer to the front of the disk
> and thus actually be useful.
> 
> And then you've introduced the 8gib zone size as a way to avoid problems where
> we get the same zone for the backup supers.
> 
> Are these statements correct?  If so the changelog should be updated to make 
> this clear up front, because it took me a while to work that out.

No, the problem is we're placing superblocks into specific zones, regardless of
the zone size. This creates a problem when you need to inspect a file system,
but don't have the block device available, because you can't look at the zone 
size to calculate where the superblocks are on the device.
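
To illustrate that dependency, here is a minimal sketch of the placement rule
this patch removes (zones 0, 16, and the smaller of zone 1024 or the zone at
256GB). The helper name old_sb_log_offset() and the local SZ_1G definition are
made up for the example; this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/*
 * Hypothetical helper: byte offset of the first logging zone of a mirror
 * under the old, zone-number based rule. zone_size is in bytes and is
 * assumed to be a power of two.
 */
static uint64_t old_sb_log_offset(int mirror, uint64_t zone_size)
{
	uint64_t zone;

	assert(zone_size != 0 && (zone_size & (zone_size - 1)) == 0);

	switch (mirror) {
	case 0:
		return 0;
	case 1:
		/* Fixed zone number 16: the byte offset scales with zone size. */
		return 16 * zone_size;
	case 2:
		/* Zone 1024 or the zone at 256GB, whichever comes first. */
		zone = (256 * SZ_1G) / zone_size;
		if (zone > 1024)
			zone = 1024;
		return zone * zone_size;
	}
	return 0;
}
```

With 256MB zones the first copy lands at 4GB, with 1GB zones at 16GB, so
someone reading a dumped image cannot find it without knowing the zone size.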

With this change we're placing the superblocks not into specific zone numbers,
but into the zones starting at specific offsets. We're taking 8G zone size as
a maximum expected zone size, to make sure we're not overlapping superblock
zones. Currently SMR disks have a zone size of 256MB and we're expecting ZNS
drives to be in the 1-2GB range, so this 8GB gives us room to breathe.
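
As a rough sketch of the new rule (not the code from the patch): the zone number
is derived from a fixed byte offset, so the offset is known even when the zone
size is not. The helper name sb_log_first_zone(), the offset table, and the
exact "pair must fit on the device" check are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Fixed byte offsets named in the patch description: 0, 64G, 256G. */
static const uint64_t sb_log_offset[3] = { 0, 64 * SZ_1G, 256 * SZ_1G };

/*
 * Hypothetical helper: zone number of the first zone of the logging pair
 * for a mirror, or -1 if the pair does not fit on the device (in which
 * case the copy would not be recorded). Sizes are in bytes; zone_size is
 * assumed to be a power of two.
 */
static int64_t sb_log_first_zone(int mirror, uint64_t zone_size,
				 uint64_t dev_size)
{
	uint64_t zone;

	assert(mirror >= 0 && mirror < 3);
	assert(zone_size != 0 && (zone_size & (zone_size - 1)) == 0);

	/* The zone containing the fixed offset. */
	zone = sb_log_offset[mirror] / zone_size;

	/* Assumed check: both zones of the pair must lie on the device. */
	if ((zone + 2) * zone_size > dev_size)
		return -1;

	return (int64_t)zone;
}
```

For a 1TB device with 256MB zones this gives zones 0, 256 and 1024; with 8GB
zones it gives zones 0, 8 and 32, at the same byte offsets in both cases.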

Hope this helps clear up any confusion.

Byte,
Johannes


Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-04-07 Thread Josef Bacik

On 3/15/21 1:53 AM, Naohiro Aota wrote:

This commit moves the location of superblock logging zones. The location of
the logging zones is determined based on fixed block addresses instead of
on fixed zone numbers.

By locating the superblock zones using fixed addresses, we can scan a
dumped file system image without the zone information. And, no drawbacks
exist.

We use the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

   - Primary superblock: zone starting at offset 0 and the following zone
   - First copy: zone containing offset 64GB and the following zone
   - Second copy: zone containing offset 256GB and the following zone

If the zones are located outside of the disk, we don't record the
superblock copy.

These addresses are arbitrary, but using addresses that are too large
reduces superblock reliability for smaller devices, so we do not want to
exceed 1T to cover all cases nicely.

Also, LBAs are generally distributed initially across one head (platter
side) up to one or more zones, then go on the next head backward (the other
side of the same platter), and on to the following head/platter. Thus using
non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
likely results in each superblock copy being on a different head/platter,
which improves chances of recovery in case of superblock read error.

These zones are reserved for superblock logging and never used for data or
metadata blocks. Zones containing the offsets used to store superblocks in
a regular btrfs volume (non-zoned case) are also reserved to avoid
confusion.

Note that we only reserve the 2 zones per primary/copy actually used for
superblock logging. We don't reserve the ranges possibly containing
superblock with the largest supported zone size (0-16GB, 64G-80GB,
256G-272GB).

The first copy position is much larger than for a regular btrfs volume
(64M).  This increase is to avoid overlapping with the log zones for the
primary superblock. This higher location is arbitrary but allows supporting
devices with very large zone size, up to 32GB. But we only allow zone sizes
up to 8GB for now.



Ok it took me a few reads to figure out what's going on.

The problem is that with large zone sizes, our current choices put the back up 
super blocks way out on the disk, correct?  So instead you've picked
arbitrary byte offsets, hoping that they'll be closer to the front of the disk 
and thus actually be useful.


And then you've introduced the 8gib zone size as a way to avoid problems where 
we get the same zone for the backup supers.


Are these statements correct?  If so the changelog should be updated to make 
this clear up front, because it took me a while to work that out.


Something at the beginning like the following

"With larger zone sizes, for example 8gib, the 3rd backup super would be located 
8tib into the device.  However not all zoned block devices are this large.  In 
order to fix this limitation set the zones to a static byte offset, and 
calculate the zone number from there based on the device's zone size."


So that it's clear from the outset why we're making this change.

And this brings up another problem, in that what happens when we _do_ run into 
block devices that have huge zones, like 64gib zones?  We have to change the 
disk format to support these devices.  I'm not against that per se, but it seems
like a limitation, even if it's unlikely to ever happen.  With the locations we 
currently have, any arbitrary zone size is going to work in the future, and the 
only drawback is you need a device of a certain size to take advantage of the 
back up super blocks.  I would hope that we don't have 64gib zone size block 
devices that are only 128gib in size in the future.
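
As a rough check of that concern, the following sketch tests whether the three
logging zone pairs collide for a given zone size under the fixed offsets from
the patch (0, 64G, 256G). The function name and the offset table are invented
for the example, and zone sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Fixed byte offsets named in the patch description: 0, 64G, 256G. */
static const uint64_t sb_pair_offset[3] = { 0, 64 * SZ_1G, 256 * SZ_1G };

/*
 * Hypothetical helper: return true if any two of the three logging zone
 * pairs would share a zone for the given zone size (in bytes).
 */
static bool sb_log_pairs_overlap(uint64_t zone_size)
{
	uint64_t prev_end = 0;
	int i;

	assert(zone_size != 0 && (zone_size & (zone_size - 1)) == 0);

	for (i = 0; i < 3; i++) {
		/* Start of the zone containing the fixed offset. */
		uint64_t start = (sb_pair_offset[i] / zone_size) * zone_size;

		if (i > 0 && start < prev_end)
			return true;

		/* Each pair covers two contiguous zones. */
		prev_end = start + 2 * zone_size;
	}

	return false;
}
```

Under these assumptions, zone sizes up to 32GB keep the pairs apart, which
matches the limit named in the commit message, while 64GB zones make the first
copy's pair overlap the primary's.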




Signed-off-by: Naohiro Aota 
---
  fs/btrfs/zoned.c | 39 +++
  1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 43948bd40e02..6a72ca1f7988 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -21,9 +21,24 @@
  /* Pseudo write pointer value for conventional zone */
  #define WP_CONVENTIONAL ((u64)-2)
  
+/*
+ * Location of the first zone of superblock logging zone pairs.
+ * - Primary superblock: the zone containing offset 0 (zone 0)
+ * - First superblock copy: the zone containing offset 64G
+ * - Second superblock copy: the zone containing offset 256G
+ */
+#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
+#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
+#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
+#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
+#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
+
  /* Number of superblock log zones */
  #define BTRFS_NR_SB_LOG_ZONES 2
  
+/* Max size of supported zone size */
+#define BTRFS_MAX_ZONE_SIZE SZ_8G
+
  static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)

Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-03-26 Thread Johannes Thumshirn
On 15/03/2021 06:55, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones. The location of
> the logging zones is determined based on fixed block addresses instead of
> on fixed zone numbers.
> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>   - Primary superblock: zone starting at offset 0 and the following zone
>   - First copy: zone containing offset 64GB and the following zone
>   - Second copy: zone containing offset 256GB and the following zone
> 
> If the zones are located outside of the disk, we don't record the
> superblock copy.
> 
> These addresses are arbitrary, but using addresses that are too large
> reduces superblock reliability for smaller devices, so we do not want to
> exceed 1T to cover all cases nicely.
> 
> Also, LBAs are generally distributed initially across one head (platter
> side) up to one or more zones, then go on the next head backward (the other
> side of the same platter), and on to the following head/platter. Thus using
> non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
> likely results in each superblock copy being on a different head/platter,
> which improves chances of recovery in case of superblock read error.
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (non-zoned case) are also reserved to avoid
> confusion.
> 
> Note that we only reserve the 2 zones per primary/copy actually used for
> superblock logging. We don't reserve the ranges possibly containing
> superblock with the largest supported zone size (0-16GB, 64G-80GB,
> 256G-272GB).
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 
> Signed-off-by: Naohiro Aota 

Ping?


Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-03-24 Thread Damien Le Moal
On 2021/03/15 14:55, Naohiro Aota wrote:
> This commit moves the location of superblock logging zones. The location of
> the logging zones is determined based on fixed block addresses instead of
> on fixed zone numbers.

David,

Any comment on this? It would be nice to get this settled in this cycle so that
we have a stable on-disk format going forward. btrfs-tools and libblkid zoned
support patches also depend on this.

> 
> By locating the superblock zones using fixed addresses, we can scan a
> dumped file system image without the zone information. And, no drawbacks
> exist.
> 
> We use the following three pairs of zones containing fixed offset
> locations, regardless of the device zone size.
> 
>   - Primary superblock: zone starting at offset 0 and the following zone
>   - First copy: zone containing offset 64GB and the following zone
>   - Second copy: zone containing offset 256GB and the following zone
> 
> If the zones are located outside of the disk, we don't record the
> superblock copy.
> 
> These addresses are arbitrary, but using addresses that are too large
> reduces superblock reliability for smaller devices, so we do not want to
> exceed 1T to cover all cases nicely.
> 
> Also, LBAs are generally distributed initially across one head (platter
> side) up to one or more zones, then go on the next head backward (the other
> side of the same platter), and on to the following head/platter. Thus using
> non-sequential fixed addresses for superblock logging, such as 0/64G/256G,
> likely results in each superblock copy being on a different head/platter,
> which improves chances of recovery in case of superblock read error.
> 
> These zones are reserved for superblock logging and never used for data or
> metadata blocks. Zones containing the offsets used to store superblocks in
> a regular btrfs volume (non-zoned case) are also reserved to avoid
> confusion.
> 
> Note that we only reserve the 2 zones per primary/copy actually used for
> superblock logging. We don't reserve the ranges possibly containing
> superblock with the largest supported zone size (0-16GB, 64G-80GB,
> 256G-272GB).
> 
> The first copy position is much larger than for a regular btrfs volume
> (64M).  This increase is to avoid overlapping with the log zones for the
> primary superblock. This higher location is arbitrary but allows supporting
> devices with very large zone size, up to 32GB. But we only allow zone sizes
> up to 8GB for now.
> 
> Signed-off-by: Naohiro Aota 
> ---
>  fs/btrfs/zoned.c | 39 +++
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 43948bd40e02..6a72ca1f7988 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -21,9 +21,24 @@
>  /* Pseudo write pointer value for conventional zone */
>  #define WP_CONVENTIONAL ((u64)-2)
>  
> +/*
> + * Location of the first zone of superblock logging zone pairs.
> + * - Primary superblock: the zone containing offset 0 (zone 0)
> + * - First superblock copy: the zone containing offset 64G
> + * - Second superblock copy: the zone containing offset 256G
> + */
> +#define BTRFS_PRIMARY_SB_LOG_ZONE 0ULL
> +#define BTRFS_FIRST_SB_LOG_ZONE (64ULL * SZ_1G)
> +#define BTRFS_SECOND_SB_LOG_ZONE (256ULL * SZ_1G)
> +#define BTRFS_FIRST_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_FIRST_SB_LOG_ZONE)
> +#define BTRFS_SECOND_SB_LOG_ZONE_SHIFT const_ilog2(BTRFS_SECOND_SB_LOG_ZONE)
> +
>  /* Number of superblock log zones */
>  #define BTRFS_NR_SB_LOG_ZONES 2
>  
> +/* Max size of supported zone size */
> +#define BTRFS_MAX_ZONE_SIZE SZ_8G
> +
>  static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data)
>  {
>   struct blk_zone *zones = data;
> @@ -111,11 +126,8 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>  }
>  
>  /*
> - * The following zones are reserved as the circular buffer on ZONED btrfs.
> - *  - The primary superblock: zones 0 and 1
> - *  - The first copy: zones 16 and 17
> - *  - The second copy: zones 1024 or zone at 256GB which is minimum, and
> - * the following one
> + * Get the zone number of the first zone of a pair of contiguous zones used
> + * for superblock logging.
>   */
>  static inline u32 sb_zone_number(int shift, int mirror)
>  {
> @@ -123,8 +135,8 @@ static inline u32 sb_zone_number(int shift, int mirror)
>  
>   switch (mirror) {
>   case 0: return 0;
> - case 1: return 16;
> - case 2: return min_t(u64, btrfs_sb_offset(mirror) >> shift, 1024);
> + case 1: return 1 << (BTRFS_FIRST_SB_LOG_ZONE_SHIFT - shift);
> + case 2: return 1 << (BTRFS_SECOND_SB_LOG_ZONE_SHIFT - shift);
>   }
>  
>   return 0;
> @@ -300,10 +312,21 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device)
>   zone_sectors = bdev_zone_sectors(bdev);
>   }
>  
> - nr_sectors = bdev_nr_sectors(bdev);
>   /* Check if it's power of 2 (see is_power_of_2) 

Re: [PATCH] btrfs: zoned: move superblock logging zone location

2021-03-19 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn