Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones
On 2020/03/24 16:51, Hannes Reinecke wrote:
> On 3/24/20 4:52 AM, Damien Le Moal wrote:
>> +Bob, who had proposed a similar change last month.
>>
>> On 2020/03/24 0:04, Hannes Reinecke wrote:
>>> Implement 'cache' zones which reside on a different device.
>>> The device is logically split into zones, which then will be
>>> used as 'cache' zones, similar to the existing random write
>>> zones.
>>
>> It does look like the new "cache" zones are really used exactly as
>> conventional zones of the SMR drive. So I wonder: why even define this
>> new zone type? We could have the "cache" device split into random
>> (conventional) zones added to a single pool of random zones. We can
>> simply add device awareness to the zone allocator to avoid, as much as
>> possible, using a random zone from the same drive as the sequential
>> zone it buffers. That would avoid repeating most of the code for cache
>> & random zones.
>>
> Yes, indeed, the idea was to keep 'cache' and 'random' zones
> essentially similar. But as there is a need to differentiate between
> them, I thought it easier to introduce a new zone type.
>
> However, it's a nice idea to use the device to differentiate between
> the two. It would even lend itself to a simpler reclaim mechanism: set
> the low watermark when all random zones on the cache device are full,
> and the high watermark when half of the random zones on the SMR device
> are full.
>
> I'll give it a go and see where I end up.
>
>> Furthermore, this work is really great for supporting SMR drives with
>> no conventional zones (there is a lot of demand for these). And
>> considering that the new FORMAT WITH PRESET command is coming soon, a
>> user will be able to reformat an SMR drive with sequential zones only
>> to maximize capacity. For such drives, the cache device would need to
>> hold the random zones, at which point the difference between cache and
>> random zones goes away.
>>
> I know :-)
>
>>>
>>> Signed-off-by: Hannes Reinecke
>>> ---
>>>  drivers/md/dm-zoned-metadata.c | 174 -
>>>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>>  drivers/md/dm-zoned-target.c   | 109 ++---
>>>  drivers/md/dm-zoned.h          |  31 +-
>>>  4 files changed, 339 insertions(+), 51 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>>> index 369de15c4e80..41cc3a29db0b 100644
>>> --- a/drivers/md/dm-zoned-metadata.c
>>> +++ b/drivers/md/dm-zoned-metadata.c
>>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>>  struct dmz_metadata {
>>>  	struct dmz_dev		*dev;
>>>
>>> +	struct dmz_cdev		*cdev;
>>
>> Given the point above, we could have this generalized as an array of
>> devices, with the first one meeting these constraints:
>> * It contains the metadata.
>> * It has random/conventional zones, or is a regular device (with all
>>   its capacity used through emulated random zones).
>>
>> I do not think that complicates your changes a lot. The reclaim part
>> will need some more love to be efficient, I guess, but it may be as
>> simple as defining one work struct for each drive beside the first one.
>>
>> Thoughts?
>>
> Rather not. Stringing several devices together essentially emulates a
> RAID0 setup without any of the benefits. And the reclaim mechanism gets
> infinitely more complex.

OK. Fair point.

> Another thing: I would need to update the metadata to hold a device
> UUID and a zoneset UUID; both devices need to carry metadata so that we
> can stitch them together upon restart.
>
> But some bright soul put a crc in the middle of the metadata :-(
> So we can't easily extend the metadata with new fields, as the meaning
> of the crc would then be unclear; as it stands, it would only cover the
> old fields, not the new ones.

Haha! OK. You got me. Not my finest hour on this one :)

> So I would propose a 'v2' metadata, holding the crc as the last entry
> of the metadata, and adding a device UUID and a cacheset UUID.
> And ensuring that the first metadata set is stored on the cache device,
> and the backup one on the SMR device.

That would work.

> Cheers,
>
> Hannes

-- 
Damien Le Moal
Western Digital Research

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
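The 'v2' superblock discussed above could be sketched as follows. This is a hypothetical illustration only: field names and sizes are assumptions, not the actual dm-zoned on-disk format. The point is structural, as proposed in the thread: the new UUIDs are added and the CRC is moved to the very end, so the checksum unambiguously covers every field that precedes it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of a 'v2' superblock layout. Field names and
 * sizes are illustrative assumptions, not the dm-zoned on-disk format.
 */
struct dmz_super_v2 {
	uint32_t magic;          /* dm-zoned magic number */
	uint32_t version;        /* 2 for this layout */
	uint64_t gen;            /* metadata generation */
	uint8_t  dev_uuid[16];   /* new: identifies this device */
	uint8_t  set_uuid[16];   /* new: identifies the cache/zone set */
	uint32_t crc;            /* last field: covers all bytes before it */
} __attribute__((packed));

/* The CRC must be the final field so it covers both old and new fields. */
static const int crc_is_last =
	(offsetof(struct dmz_super_v2, crc) + sizeof(uint32_t)
	 == sizeof(struct dmz_super_v2));
```

With this layout, computing the checksum over `sizeof(struct dmz_super_v2) - sizeof(uint32_t)` bytes covers old and new fields alike, which is exactly the property the in-the-middle CRC lacks.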
Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones
On 3/24/20 4:52 AM, Damien Le Moal wrote:
> +Bob, who had proposed a similar change last month.
>
> On 2020/03/24 0:04, Hannes Reinecke wrote:
>> Implement 'cache' zones which reside on a different device.
>> The device is logically split into zones, which then will be
>> used as 'cache' zones, similar to the existing random write
>> zones.
>
> It does look like the new "cache" zones are really used exactly as
> conventional zones of the SMR drive. So I wonder: why even define this
> new zone type? We could have the "cache" device split into random
> (conventional) zones added to a single pool of random zones. We can
> simply add device awareness to the zone allocator to avoid, as much as
> possible, using a random zone from the same drive as the sequential
> zone it buffers. That would avoid repeating most of the code for cache
> & random zones.

Yes, indeed, the idea was to keep 'cache' and 'random' zones
essentially similar. But as there is a need to differentiate between
them, I thought it easier to introduce a new zone type.

However, it's a nice idea to use the device to differentiate between
the two. It would even lend itself to a simpler reclaim mechanism: set
the low watermark when all random zones on the cache device are full,
and the high watermark when half of the random zones on the SMR device
are full.

I'll give it a go and see where I end up.

> Furthermore, this work is really great for supporting SMR drives with
> no conventional zones (there is a lot of demand for these). And
> considering that the new FORMAT WITH PRESET command is coming soon, a
> user will be able to reformat an SMR drive with sequential zones only
> to maximize capacity. For such drives, the cache device would need to
> hold the random zones, at which point the difference between cache and
> random zones goes away.
I know :-)

>>
>> Signed-off-by: Hannes Reinecke
>> ---
>>  drivers/md/dm-zoned-metadata.c | 174 -
>>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>  drivers/md/dm-zoned-target.c   | 109 ++---
>>  drivers/md/dm-zoned.h          |  31 +-
>>  4 files changed, 339 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 369de15c4e80..41cc3a29db0b 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>  struct dmz_metadata {
>>  	struct dmz_dev		*dev;
>>
>> +	struct dmz_cdev		*cdev;
>
> Given the point above, we could have this generalized as an array of
> devices, with the first one meeting these constraints:
> * It contains the metadata.
> * It has random/conventional zones, or is a regular device (with all
>   its capacity used through emulated random zones).
>
> I do not think that complicates your changes a lot. The reclaim part
> will need some more love to be efficient, I guess, but it may be as
> simple as defining one work struct for each drive beside the first one.
>
> Thoughts?

Rather not. Stringing several devices together essentially emulates a
RAID0 setup without any of the benefits. And the reclaim mechanism gets
infinitely more complex.

Another thing: I would need to update the metadata to hold a device
UUID and a zoneset UUID; both devices need to carry metadata so that we
can stitch them together upon restart.

But some bright soul put a crc in the middle of the metadata :-(
So we can't easily extend the metadata with new fields, as the meaning
of the crc would then be unclear; as it stands, it would only cover the
old fields, not the new ones.

So I would propose a 'v2' metadata, holding the crc as the last entry
of the metadata, and adding a device UUID and a cacheset UUID.
And ensuring that the first metadata set is stored on the cache device,
and the backup one on the SMR device.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                    Teamlead Storage & Networking
h...@suse.de                           +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
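The watermark scheme sketched in this message can be expressed as two small predicates. This is only an illustration of the idea from the discussion; the struct and function names are invented here and do not correspond to the dm-zoned reclaim code.

```c
#include <stdbool.h>

/*
 * Sketch of the proposed reclaim triggers: reclaim becomes urgent (low
 * watermark) once every random zone on the cache device is in use, and
 * starts early (high watermark) once half of the random zones on the
 * SMR device are in use. Names are illustrative, not the dm-zoned API.
 */
struct zone_pool {
	unsigned int nr_rnd;       /* random zones on this device */
	unsigned int nr_unmap_rnd; /* of those, still unmapped (free) */
};

/* Low watermark: all cache-device random zones are mapped. */
static bool reclaim_low_wm(const struct zone_pool *cache_dev)
{
	return cache_dev->nr_unmap_rnd == 0;
}

/* High watermark: at least half of the SMR random zones are mapped. */
static bool reclaim_high_wm(const struct zone_pool *smr_dev)
{
	unsigned int used = smr_dev->nr_rnd - smr_dev->nr_unmap_rnd;

	return used * 2 >= smr_dev->nr_rnd;
}
```

A reclaim worker could then run continuously past the low watermark and opportunistically (only when the target is idle) past the high one, which is the "simpler reclaim mechanism" being alluded to.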
Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones
On 3/24/20 11:52 AM, Damien Le Moal wrote:
> +Bob, who had proposed a similar change last month.
>
Thanks!

> On 2020/03/24 0:04, Hannes Reinecke wrote:
>> Implement 'cache' zones which reside on a different device.
>> The device is logically split into zones, which then will be
>> used as 'cache' zones, similar to the existing random write
>> zones.
>
> It does look like the new "cache" zones are really used exactly as
> conventional zones of the SMR drive. So I wonder: why even define this
> new zone type? We could have the "cache" device split into random
> (conventional) zones added to a single pool of random zones. We can
> simply add device awareness to the zone allocator to avoid, as much as
> possible, using a random zone from the same drive as the sequential
> zone it buffers. That would avoid repeating most of the code for cache
> & random zones.
>
> Furthermore, this work is really great for supporting SMR drives with
> no conventional zones (there is a lot of demand for these). And
> considering that the new FORMAT WITH PRESET command is coming soon, a
> user will be able to reformat an SMR drive with sequential zones only
> to maximize capacity. For such drives, the cache device would need to
> hold the random zones, at which point the difference between cache and
> random zones goes away.
>
>>
>> Signed-off-by: Hannes Reinecke
>> ---
>>  drivers/md/dm-zoned-metadata.c | 174 -
>>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>>  drivers/md/dm-zoned-target.c   | 109 ++---
>>  drivers/md/dm-zoned.h          |  31 +-
>>  4 files changed, 339 insertions(+), 51 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 369de15c4e80..41cc3a29db0b 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -132,6 +132,8 @@ struct dmz_sb {
>>  struct dmz_metadata {
>>  	struct dmz_dev		*dev;
>>
>> +	struct dmz_cdev		*cdev;
>
> Given the point above, we could have this generalized as an array of
> devices, with the first one meeting these constraints:
> * It contains the metadata.
> * It has random/conventional zones, or is a regular device (with all
>   its capacity used through emulated random zones).
>
Yes, I was working on my v2 patches along these lines. Will send them
out this week.

Regards,
Bob

> I do not think that complicates your changes a lot. The reclaim part
> will need some more love to be efficient, I guess, but it may be as
> simple as defining one work struct for each drive beside the first one.
>
> Thoughts?
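The device-aware allocation Damien suggests above can be sketched with a simple preference loop: when picking a random zone to buffer writes for a sequential zone, try a free zone on a different device first and fall back to the same device only if nothing else is free. Types and names here are invented for illustration, not the dm-zoned allocator API.

```c
/*
 * Sketch of a device-aware random-zone allocator. A zone on a device
 * other than the one holding the buffered sequential zone is preferred,
 * so the buffer write and the eventual reclaim copy hit different
 * drives. Illustrative types only.
 */
struct zone {
	int dev_idx; /* which device this zone lives on */
	int free;    /* 1 if unmapped and available */
};

/* Return the index of a free random zone, preferring dev != seq_dev,
 * or -1 if no free zone exists at all. */
static int alloc_rnd_zone(const struct zone *pool, int nr, int seq_dev)
{
	int fallback = -1;

	for (int i = 0; i < nr; i++) {
		if (!pool[i].free)
			continue;
		if (pool[i].dev_idx != seq_dev)
			return i;        /* best case: other device */
		if (fallback < 0)
			fallback = i;    /* same device, keep as backup */
	}
	return fallback;
}
```

This keeps a single pool of random zones, as proposed, with device placement handled entirely inside the allocator rather than through a new zone type.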
Re: [dm-devel] [PATCH 1/2] dm-zoned: cache device for zones
+Bob, who had proposed a similar change last month.

On 2020/03/24 0:04, Hannes Reinecke wrote:
> Implement 'cache' zones which reside on a different device.
> The device is logically split into zones, which then will be
> used as 'cache' zones, similar to the existing random write
> zones.

It does look like the new "cache" zones are really used exactly as
conventional zones of the SMR drive. So I wonder: why even define this
new zone type? We could have the "cache" device split into random
(conventional) zones added to a single pool of random zones. We can
simply add device awareness to the zone allocator to avoid, as much as
possible, using a random zone from the same drive as the sequential
zone it buffers. That would avoid repeating most of the code for cache
& random zones.

Furthermore, this work is really great for supporting SMR drives with
no conventional zones (there is a lot of demand for these). And
considering that the new FORMAT WITH PRESET command is coming soon, a
user will be able to reformat an SMR drive with sequential zones only
to maximize capacity. For such drives, the cache device would need to
hold the random zones, at which point the difference between cache and
random zones goes away.
>
> Signed-off-by: Hannes Reinecke
> ---
>  drivers/md/dm-zoned-metadata.c | 174 -
>  drivers/md/dm-zoned-reclaim.c  |  76 +++---
>  drivers/md/dm-zoned-target.c   | 109 ++---
>  drivers/md/dm-zoned.h          |  31 +-
>  4 files changed, 339 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
> index 369de15c4e80..41cc3a29db0b 100644
> --- a/drivers/md/dm-zoned-metadata.c
> +++ b/drivers/md/dm-zoned-metadata.c
> @@ -132,6 +132,8 @@ struct dmz_sb {
>  struct dmz_metadata {
>  	struct dmz_dev		*dev;
>
> +	struct dmz_cdev		*cdev;

Given the point above, we could have this generalized as an array of
devices, with the first one meeting these constraints:
* It contains the metadata.
* It has random/conventional zones, or is a regular device (with all
  its capacity used through emulated random zones).

I do not think that complicates your changes a lot. The reclaim part
will need some more love to be efficient, I guess, but it may be as
simple as defining one work struct for each drive beside the first one.

Thoughts?
> +
>  	sector_t		zone_bitmap_size;
>  	unsigned int		zone_nr_bitmap_blocks;
>  	unsigned int		zone_bits_per_mblk;
> @@ -139,10 +141,12 @@ struct dmz_metadata {
>  	unsigned int		nr_bitmap_blocks;
>  	unsigned int		nr_map_blocks;
>
> +	unsigned int		nr_zones;
>  	unsigned int		nr_useable_zones;
>  	unsigned int		nr_meta_blocks;
>  	unsigned int		nr_meta_zones;
>  	unsigned int		nr_data_zones;
> +	unsigned int		nr_cache_zones;
>  	unsigned int		nr_rnd_zones;
>  	unsigned int		nr_reserved_seq;
>  	unsigned int		nr_chunks;
> @@ -173,6 +177,11 @@ struct dmz_metadata {
>  	struct list_head	unmap_rnd_list;
>  	struct list_head	map_rnd_list;
>
> +	unsigned int		nr_cache;
> +	atomic_t		unmap_nr_cache;
> +	struct list_head	unmap_cache_list;
> +	struct list_head	map_cache_list;
> +
>  	unsigned int		nr_seq;
>  	atomic_t		unmap_nr_seq;
>  	struct list_head	unmap_seq_list;
> @@ -189,17 +198,25 @@ struct dmz_metadata {
>   */
>  unsigned int dmz_id(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return ((unsigned int)(zone - zmd->zones));
> +	return zone->id;
>  }
>
>  sector_t dmz_start_sect(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return (sector_t)dmz_id(zmd, zone) << zmd->dev->zone_nr_sectors_shift;
> +	sector_t zone_id = dmz_id(zmd, zone);
> +
> +	if (dmz_is_cache(zone))
> +		zone_id -= zmd->dev->nr_zones;
> +	return zone_id << zmd->dev->zone_nr_sectors_shift;
>  }
>
>  sector_t dmz_start_block(struct dmz_metadata *zmd, struct dm_zone *zone)
>  {
> -	return (sector_t)dmz_id(zmd, zone) << zmd->dev->zone_nr_blocks_shift;
> +	sector_t zone_id = dmz_id(zmd, zone);
> +
> +	if (dmz_is_cache(zone))
> +		zone_id -= zmd->dev->nr_zones;
> +	return zone_id << zmd->dev->zone_nr_blocks_shift;
>  }
>
>  unsigned int dmz_nr_chunks(struct dmz_metadata *zmd)
> @@ -217,6 +234,16 @@ unsigned int dmz_nr_unmap_rnd_zones(struct dmz_metadata *zmd)
>  	return atomic_read(&zmd->unmap_nr_rnd);
>  }
>
> +unsigned int dmz_nr_cache_zones(struct dmz_metadata *zmd)
> +{
> +	return zmd->nr_cache;
> +}
> +
> +unsigned int dmz_nr_unmap_cache_zones(struct dmz_metadata *zmd)
> +{
> +	return atomic_read(&zmd->unmap_nr_cache);
> +}
> +
>  /*
>   * Lock/unlock mapping table.
>   * The map lock also
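The zone-numbering trick in the `dmz_start_sect()` hunk above can be shown with a small worked example: cache-device zones are numbered after the SMR device's zones, so the id is rebased before shifting into a sector offset. The constants below are made-up example values (not real drive geometry), and the function is a standalone sketch, not the patched kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Worked sketch of the patch's sector mapping: cache zones carry ids
 * >= NR_SMR_ZONES, so we subtract the SMR zone count to get the zone's
 * position on the cache device. Example constants, not real geometry.
 */
#define NR_SMR_ZONES		8U
#define ZONE_NR_SECTORS_SHIFT	19	/* 256 MiB zones / 512 B sectors */

static uint64_t start_sect(unsigned int zone_id, bool is_cache)
{
	uint64_t id = zone_id;

	if (is_cache)
		id -= NR_SMR_ZONES;	/* rebase onto the cache device */
	return id << ZONE_NR_SECTORS_SHIFT;
}
```

So zone 8 (the first cache zone in this example) starts at sector 0 of the cache device, while zone 8 of a hypothetical single-device layout would have started 8 zones into the SMR drive; the same subtraction appears in both `dmz_start_sect()` and `dmz_start_block()` in the patch.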