Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones

2020-03-24 Thread Damien Le Moal
On 2020/03/24 16:51, Hannes Reinecke wrote:
> On 3/24/20 4:52 AM, Damien Le Moal wrote:
>> +Bob, who proposed a similar change last month.
>>
>> On 2020/03/24 0:04, Hannes Reinecke wrote:
>>> Implement 'cache' zones which reside on a different device.
>>> The device is logically split into zones, which then will be
>>> used as 'cache' zones, similar to the existing random write
>>> zones.
>>
>> It does look like the new "cache" zones are really used exactly as
>> conventional zones of the SMR drive. So I wonder: why even define this new
>> zone type? We could have the "cache" device split into random (conventional)
>> zones added to a single pool of random zones. We can simply add device
>> awareness to the zone allocator to avoid, as much as possible, using a
>> random zone from the same drive as the sequential zone it buffers. That
>> would avoid repeating most of the code for cache & random.
>>
> Yes, indeed, the idea was to keep 'cache' and 'random' zones
> essentially similar. But as there is a need to differentiate
> between them, I thought it easier to introduce a new zone type.
> 
> However, it's a nice idea to use the device to differentiate between
> the two. And it would even lend itself to a simpler reclaim mechanism:
> set the low watermark when all random zones on the cache device are
> full, and the high watermark when half of the random zones on the SMR
> device are full.
> 
> I'll give it a go and see where I end up.
> 
>> Furthermore, this work is really great for supporting SMR drives with no
>> conventional zones (there is a lot of demand for these). And considering
>> that the new FORMAT WITH PRESET command is coming soon, a user will be able
>> to reformat an SMR drive with sequential zones only to maximize capacity.
>> For these, the cache device would need to hold the random zones, at which
>> point the difference between cache and random goes away.
>>
> I know :-)
> 
>>>
>>> Signed-off-by: Hannes Reinecke 
>>> ---
>>>   drivers/md/dm-zoned-metadata.c | 174 -
>>>   drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>>   drivers/md/dm-zoned-target.c   | 109 ++---
>>>   drivers/md/dm-zoned.h  |  31 +-
>>>   4 files changed, 339 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>>> index 369de15c4e80..41cc3a29db0b 100644
>>> --- a/drivers/md/dm-zoned-metadata.c
>>> +++ b/drivers/md/dm-zoned-metadata.c
>>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>>  struct dmz_metadata {
>>>  	struct dmz_dev		*dev;
>>>  
>>> +	struct dmz_cdev		*cdev;
>>
>> Given the point above, we could have this generalized as an array of
>> devices, with the first one meeting the constraints:
>> * It contains the metadata
>> * It has random/conventional zones, or is a regular device (with all its
>>   capacity used through emulated random zones)
>>
>> I do not think that complicates your changes a lot. The reclaim part will
>> need some more love, I guess, to be efficient, but it may be as simple as
>> defining one work struct for each drive besides the first one.
>>
>> Thoughts?
>>
> Rather not. Stringing several devices together essentially emulates a 
> RAID0 setup without any of the benefits. And the reclaim mechanism gets 
> infinitely more complex.

OK. Fair point.

> 
> Another thing: I would need to update the metadata to hold the device
> and zoneset UUIDs; both devices need to carry metadata so that we can
> stitch them together upon restart.
> 
> But some bright soul put a crc in the middle of the metadata :-(
> So we can't easily extend the metadata with new fields, as then the
> meaning of the crc becomes unclear; as it stands, it would only cover
> the old fields, and not the new ones.

Haha! OK. You got me. Not my finest hour on this one :)

> So I would propose a 'v2' metadata format, holding the crc as the last
> entry of the metadata, and adding a device UUID and a cacheset UUID.
> We would also ensure that the first metadata set is stored on the cache
> device, and the backup one on the SMR device.

That would work.

> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research






Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones

2020-03-24 Thread Hannes Reinecke

On 3/24/20 4:52 AM, Damien Le Moal wrote:

> +Bob, who proposed a similar change last month.

> On 2020/03/24 0:04, Hannes Reinecke wrote:

>> Implement 'cache' zones which reside on a different device.
>> The device is logically split into zones, which then will be
>> used as 'cache' zones, similar to the existing random write
>> zones.


> It does look like the new "cache" zones are really used exactly as
> conventional zones of the SMR drive. So I wonder: why even define this new
> zone type? We could have the "cache" device split into random (conventional)
> zones added to a single pool of random zones. We can simply add device
> awareness to the zone allocator to avoid, as much as possible, using a
> random zone from the same drive as the sequential zone it buffers. That
> would avoid repeating most of the code for cache & random.

Yes, indeed, the idea was to keep 'cache' and 'random' zones
essentially similar. But as there is a need to differentiate
between them, I thought it easier to introduce a new zone type.


However, it's a nice idea to use the device to differentiate between
the two. And it would even lend itself to a simpler reclaim mechanism:
set the low watermark when all random zones on the cache device are
full, and the high watermark when half of the random zones on the SMR
device are full.


I'll give it a go and see where I end up.
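
For illustration, the trigger logic could be as simple as this user-space
sketch (the counters and function names are made up for the example, not
the actual dm-zoned code):

#include <stdbool.h>

/* Hypothetical counters modelling the two random-zone pools. */
struct zone_pools {
	unsigned int cache_free_rnd; /* free random zones on the cache device */
	unsigned int smr_free_rnd;   /* free random zones on the SMR device */
	unsigned int smr_nr_rnd;     /* total random zones on the SMR device */
};

/* Low watermark: the cache device has no free random zones left,
 * so reclaim must run aggressively. */
static bool reclaim_low_wm(const struct zone_pools *p)
{
	return p->cache_free_rnd == 0;
}

/* High watermark: half of the SMR device's random zones are in use,
 * so background reclaim should kick in. */
static bool reclaim_high_wm(const struct zone_pools *p)
{
	return p->smr_free_rnd <= p->smr_nr_rnd / 2;
}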


> Furthermore, this work is really great for supporting SMR drives with no
> conventional zones (there is a lot of demand for these). And considering
> that the new FORMAT WITH PRESET command is coming soon, a user will be able
> to reformat an SMR drive with sequential zones only to maximize capacity.
> For these, the cache device would need to hold the random zones, at which
> point the difference between cache and random goes away.


I know :-)



>> Signed-off-by: Hannes Reinecke 
>> ---
>>   drivers/md/dm-zoned-metadata.c | 174 -
>>   drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>   drivers/md/dm-zoned-target.c   | 109 ++---
>>   drivers/md/dm-zoned.h  |  31 +-
>>   4 files changed, 339 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 369de15c4e80..41cc3a29db0b 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>  struct dmz_metadata {
>>  	struct dmz_dev		*dev;
>>  
>> +	struct dmz_cdev		*cdev;


> Given the point above, we could have this generalized as an array of
> devices, with the first one meeting the constraints:
> * It contains the metadata
> * It has random/conventional zones, or is a regular device (with all its
>   capacity used through emulated random zones)
>
> I do not think that complicates your changes a lot. The reclaim part will
> need some more love, I guess, to be efficient, but it may be as simple as
> defining one work struct for each drive besides the first one.
>
> Thoughts?

Rather not. Stringing several devices together essentially emulates a 
RAID0 setup without any of the benefits. And the reclaim mechanism gets 
infinitely more complex.


Another thing: I would need to update the metadata to hold the device
and zoneset UUIDs; both devices need to carry metadata so that we can
stitch them together upon restart.


But some bright soul put a crc in the middle of the metadata :-(
So we can't easily extend the metadata with new fields, as then the
meaning of the crc becomes unclear; as it stands, it would only cover
the old fields, and not the new ones.


So I would propose a 'v2' metadata format, holding the crc as the last
entry of the metadata, and adding a device UUID and a cacheset UUID.
We would also ensure that the first metadata set is stored on the cache
device, and the backup one on the SMR device.
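
As a rough sketch of that layout (field names are hypothetical, not a
final on-disk format), with the crc stored last so that any new fields
stay covered by it:

#include <stdint.h>

#define DMZ_UUID_LEN 16

/* Hypothetical 'v2' superblock: the crc is the last field and covers
 * everything before it, so the format can grow without ambiguity
 * about what is checksummed. */
struct dmz_super_v2 {
	uint32_t magic;      /* dm-zoned magic */
	uint32_t version;    /* 2 */
	uint64_t gen;        /* metadata generation */
	uint64_t sb_block;   /* superblock location */
	uint32_t nr_meta_blocks;
	uint32_t nr_reserved_seq;
	uint32_t nr_chunks;
	uint32_t nr_map_blocks;
	uint32_t nr_bitmap_blocks;
	uint8_t  cacheset_uuid[DMZ_UUID_LEN]; /* identical on both devices */
	uint8_t  dev_uuid[DMZ_UUID_LEN];      /* unique per device */
	uint32_t crc;        /* CRC32 of all preceding fields */
} __attribute__((packed));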


Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer



Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones

2020-03-24 Thread Bob Liu
On 3/24/20 11:52 AM, Damien Le Moal wrote:
> +Bob, who proposed a similar change last month.
> 

Thanks!

> On 2020/03/24 0:04, Hannes Reinecke wrote:
>> Implement 'cache' zones which reside on a different device.
>> The device is logically split into zones, which then will be
>> used as 'cache' zones, similar to the existing random write
>> zones.
> 
> It does look like the new "cache" zones are really used exactly as
> conventional zones of the SMR drive. So I wonder: why even define this new
> zone type? We could have the "cache" device split into random (conventional)
> zones added to a single pool of random zones. We can simply add device
> awareness to the zone allocator to avoid, as much as possible, using a
> random zone from the same drive as the sequential zone it buffers. That
> would avoid repeating most of the code for cache & random.
> 
> Furthermore, this work is really great for supporting SMR drives with no
> conventional zones (there is a lot of demand for these). And considering
> that the new FORMAT WITH PRESET command is coming soon, a user will be able
> to reformat an SMR drive with sequential zones only to maximize capacity.
> For these, the cache device would need to hold the random zones, at which
> point the difference between cache and random goes away.
> 
>>
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  drivers/md/dm-zoned-metadata.c | 174 -
>>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>  drivers/md/dm-zoned-target.c   | 109 ++---
>>  drivers/md/dm-zoned.h  |  31 +-
>>  4 files changed, 339 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 369de15c4e80..41cc3a29db0b 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>  struct dmz_metadata {
>>  	struct dmz_dev		*dev;
>>  
>> +	struct dmz_cdev		*cdev;
> 
> Given the point above, we could have this generalized as an array of devices,
> with the first one meeting the constraints:
> * It contains the metadata
> * It has random/conventional zones, or is a regular device (with all its
> capacity used through emulated random zones)
> 

Yes, I was reworking my v2 patches along these lines.
I will send them out this week.

Regards,
Bob

> I do not think that complicates your changes a lot. The reclaim part will
> need some more love, I guess, to be efficient, but it may be as simple as
> defining one work struct for each drive besides the first one.
> 
> Thoughts?
> 




Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones

2020-03-23 Thread Damien Le Moal
+Bob, who proposed a similar change last month.

On 2020/03/24 0:04, Hannes Reinecke wrote:
> Implement 'cache' zones which reside on a different device.
> The device is logically split into zones, which then will be
> used as 'cache' zones, similar to the existing random write
> zones.

It does look like the new "cache" zones are really used exactly as
conventional zones of the SMR drive. So I wonder: why even define this new
zone type? We could have the "cache" device split into random (conventional)
zones added to a single pool of random zones. We can simply add device
awareness to the zone allocator to avoid, as much as possible, using a
random zone from the same drive as the sequential zone it buffers. That
would avoid repeating most of the code for cache & random.
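
For illustration, the device-aware pick could be as simple as this
sketch (made-up types and helpers, not the existing allocator code):

#include <stddef.h>

struct dm_zone {
	struct dm_zone *next; /* simplified free-list link */
	int dev_idx;          /* index of the device holding this zone */
};

/*
 * Pick a free random zone, preferring one that does not live on the
 * same device as the sequential zone it is about to buffer. Fall back
 * to any free zone if no other device has one available.
 */
static struct dm_zone *alloc_rnd_zone(struct dm_zone *free_list,
				      int seq_dev_idx)
{
	struct dm_zone *fallback = NULL;
	struct dm_zone *z;

	for (z = free_list; z; z = z->next) {
		if (z->dev_idx != seq_dev_idx)
			return z; /* best case: a zone on another device */
		if (!fallback)
			fallback = z;
	}
	return fallback; /* may be NULL if the pool is empty */
}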

Furthermore, this work is really great for supporting SMR drives with no
conventional zones (there is a lot of demand for these). And considering
that the new FORMAT WITH PRESET command is coming soon, a user will be able
to reformat an SMR drive with sequential zones only to maximize capacity.
For these, the cache device would need to hold the random zones, at which
point the difference between cache and random goes away.

> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/md/dm-zoned-metadata.c | 174 -
>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>  drivers/md/dm-zoned-target.c   | 109 ++---
>  drivers/md/dm-zoned.h  |  31 +-
>  4 files changed, 339 insertions(+), 51 deletions(-)
> 
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 369de15c4e80..41cc3a29db0b 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -132,6 +132,8 @@ struct dmz_sb {
>  struct dmz_metadata {
>  	struct dmz_dev		*dev;
>  
> +	struct dmz_cdev		*cdev;

Given the point above, we could have this generalized as an array of
devices, with the first one meeting the constraints:
* It contains the metadata
* It has random/conventional zones, or is a regular device (with all its
  capacity used through emulated random zones)

I do not think that complicates your changes a lot. The reclaim part will
need some more love, I guess, to be efficient, but it may be as simple as
defining one work struct for each drive besides the first one.

Thoughts?
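
Something along these lines, as a rough sketch only (placeholder types,
not a worked-out layout):

/* Hypothetical sketch: a device array replaces the dedicated
 * cache-device pointer. devs[0] must hold the metadata and provide
 * random/conventional (or emulated) zones. */
struct dmz_dev_slot {
	void *bdev;            /* stand-in for struct block_device * */
	unsigned int nr_zones; /* zones this device contributes */
};

struct dmz_metadata_sketch {
	struct dmz_dev_slot *devs; /* devs[0] meets the constraints above */
	unsigned int nr_devs;
};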

> +
>  	sector_t		zone_bitmap_size;
>  	unsigned int		zone_nr_bitmap_blocks;
>  	unsigned int		zone_bits_per_mblk;
> @@ -139,10 +141,12 @@ struct dmz_metadata {
>  	unsigned int		nr_bitmap_blocks;
>  	unsigned int		nr_map_blocks;
>  
> +	unsigned int		nr_zones;
>  	unsigned int		nr_useable_zones;
>  	unsigned int		nr_meta_blocks;
>  	unsigned int		nr_meta_zones;
>  	unsigned int		nr_data_zones;
> +	unsigned int		nr_cache_zones;
>  	unsigned int		nr_rnd_zones;
>  	unsigned int		nr_reserved_seq;
>  	unsigned int		nr_chunks;
> @@ -173,6 +177,11 @@ struct dmz_metadata {
>  	struct list_head	unmap_rnd_list;
>  	struct list_head	map_rnd_list;
>  
> +	unsigned int		nr_cache;
> +	atomic_t		unmap_nr_cache;
> +	struct list_head	unmap_cache_list;
> +	struct list_head	map_cache_list;
> +
>  	unsigned int		nr_seq;
>  	atomic_t		unmap_nr_seq;
>  	struct list_head	unmap_seq_list;
> @@ -189,17 +198,25 @@ struct dmz_metadata {
>   */
>  unsigned int dmz_id(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return ((unsigned int)(zone - zmd->zones));
> +	return zone->id;
>  }
>  
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return (sector_t)dmz_id(zmd, zone) << zmd->dev->zone_nr_sectors_shift;
> +	sector_t zone_id = dmz_id(zmd, zone);
> +
> +	if (dmz_is_cache(zone))
> +		zone_id -= zmd->dev->nr_zones;
> +	return zone_id << zmd->dev->zone_nr_sectors_shift;
>  }
>  
>  sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return (sector_t)dmz_id(zmd, zone) << zmd->dev->zone_nr_blocks_shift;
> +	sector_t zone_id = dmz_id(zmd, zone);
> +
> +	if (dmz_is_cache(zone))
> +		zone_id -= zmd->dev->nr_zones;
> +	return zone_id << zmd->dev->zone_nr_blocks_shift;
>  }
>  
>  unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
> @@ -217,6 +234,16 @@ unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_rnd);
>  }
>  
> +unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
> +{
> +	return zmd->nr_cache;
> +}
> +
> +unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
> +{
> +	return atomic_read(&zmd->unmap_nr_cache);
> +}
> +
>  /*
>   * Lock/unlock mapping table.
>   * The map lock also