Re: [PATCH] qemu-io: add cvtnum() error handling for zone commands

2024-05-07 Thread Sam Li
Stefan Hajnoczi wrote on Tue, May 7, 2024 at 20:06:
>
> cvtnum() parses positive int64_t values and returns a negative errno on
> failure. Print errors and return early when cvtnum() fails.
>
> While we're at it, also reject nr_zones values greater than or equal to
> 2^32 since they cannot be represented.
>
> Reported-by: Peter Maydell 
> Cc: Sam Li 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qemu-io-cmds.c | 48 +++-
>  1 file changed, 47 insertions(+), 1 deletion(-)

Reviewed-by: Sam Li 

Hi Stefan,

Thank you for fixing that. I've been a little busy with moving house lately :)

Sam
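
For readers following along, the pattern described in the quoted commit
message (parse, bail out on a negative errno, then reject counts that do
not fit in 32 bits) might look roughly like the sketch below. This is not
the patch itself: parse_nonneg() is a hypothetical stand-in for cvtnum()
built on strtoll(), parse_nr_zones() is an invented helper name, and the
error codes returned are my own choice.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for cvtnum(): parse a non-negative int64_t and
 * return a negative errno on failure. Unlike the real helper, this
 * sketch does not understand size suffixes.
 */
static int64_t parse_nonneg(const char *s)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (end == s || *end != '\0' || v < 0) {
        return -EINVAL;
    }
    return v;
}

/* Parse a zone count and reject values that need more than 32 bits. */
static int parse_nr_zones(const char *s, uint32_t *nr_zones)
{
    int64_t v = parse_nonneg(s);

    if (v < 0) {
        return (int)v;      /* propagate the negative errno */
    }
    if (v > UINT32_MAX) {
        return -ERANGE;     /* 2^32 and above cannot be represented */
    }
    *nr_zones = (uint32_t)v;
    return 0;
}
```

A caller would print an error and return early whenever parse_nr_zones()
yields a negative value, which is the behaviour the commit message asks
for.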



Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Sam Li
Markus Armbruster wrote on Mon, Feb 19, 2024 at 21:42:
>
> Sam Li  writes:
>
> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 16:56:
> >>
> >> Sam Li  writes:
> >>
> >> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 15:40:
> >> >>
> >> >> Sam Li  writes:
> >> >>
> >> >> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 13:05:
> >> >> >>
> >> >> >> One more thing...
> >> >> >>
> >> >> >> Markus Armbruster  writes:
> >> >> >>
> >> >> >> > I apologize for the delayed review.
> >> >> >
> >> >> > No problems. Thanks for reviewing!
> >> >> >
> >> >> >> >
> >> >> >> > Sam Li  writes:
> >> >> >> >
> >> >> >> >> To configure the zoned format feature on the qcow2 driver, it
> >> >> >> >> requires settings as: the device size, zone model, zone size,
> >> >> >> >> zone capacity, number of conventional zones, limits on zone
> >> >> >> >> resources (max append bytes, max open zones, and 
> >> >> >> >> max_active_zones).
> >> >> >> >>
> >> >> >> >> To create a qcow2 image with zoned format feature, use command 
> >> >> >> >> like
> >> >> >> >> this:
> >> >> >> >> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> >> >> >> >> -o zone.size=64M -o zone.capacity=64M -o 
> >> >> >> >> zone.conventional_zones=0 \
> >> >> >> >> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> >> >> >> >> -o zone.max_active_zones=8 -o zone.mode=host-managed
> >> >> >> >>
> >> >> >> >> Signed-off-by: Sam Li 
> >> >> >> >
> >> >> >> > [...]
> >> >> >> >
> >> >> >> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> >> >> >> index ca390c5700..e2e0ec21a5 100644
> >> >> >> >> --- a/qapi/block-core.json
> >> >> >> >> +++ b/qapi/block-core.json
> >> >> >> >> @@ -5038,6 +5038,67 @@
> >> >> >> >>  { 'enum': 'Qcow2CompressionType',
> >> >> >> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >> >> >> >>
> >> >> >> >> +##
> >> >> >> >> +# @Qcow2ZoneModel:
> >> >> >> >> +#
> >> >> >> >> +# Zoned device model used in qcow2 image file
> >> >> >> >> +#
> >> >> >> >> +# @host-managed: The host-managed model only allows sequential 
> >> >> >> >> write over the
> >> >> >> >> +# device zones.
> >> >> >> >> +#
> >> >> >> >> +# Since 8.2
> >> >> >> >> +##
> >> >> >> >> +{ 'enum': 'Qcow2ZoneModel',
> >> >> >> >> +  'data': [ 'host-managed'] }
> >> >> >> >> +
> >> >> >> >> +##
> >> >> >> >> +# @Qcow2ZoneHostManaged:
> >> >> >> >> +#
> >> >> >> >> +# The host-managed zone model.  It only allows sequential writes.
> >> >> >> >> +#
> >> >> >> >> +# @size: Total number of bytes within zones.
> >> >> >> >
> >> >> >> > Default?
> >> >> >
> >> >> > It should be set by users. No default value provided. If it's unset
> >> >> > then it is zero and an error will be returned.
> >> >>
> >> >> If the user must provide @size, why is it optional then?
> >> >
> >> > It is not optional when the zone model is host-managed. If it's
> >> > non-zoned, then we don't care about zone info. I am not sure how to
> >> > make it unoptional.
> >>
> >> We have:
> >>
> >>blockdev-create argument @options of type BlockdevCreateOptions
> >>
> >>BlockdevCreateOptions union branch @qcow2 of type
> >>BlockdevCreateOptionsQcow2, union tag member is @driver
> >>
> >>BlockdevCreateOptionsQcow2 optional member @zone of type
> >>Qcow2ZoneCreateOptions, default not zoned
> >>
> >>Qcow2ZoneCreateOptions union branch @host-managed of type
> >>Qcow2ZoneHostManaged, union tag member is @mode
> >>
> >>Qcow2ZoneHostManaged optional member @size of type size.
> >>
> >> Making this member @size mandatory means we must specify it when
> >> BlockdevCreateOptionsQcow2 member @zone is present and @zone's member
> >> @mode is "host-managed".  Feels right to me.  Am I missing anything?
> >
> > That's right. And the checks when creating such an img can help do
> > that. It's not specified in the .json file directly.
>
> What would break if we did specify it in the QAPI schema directly?

Nothing I think. We can keep the current schema and add a default zone
size like 131072.

>
> [...]
>



Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Sam Li
Markus Armbruster wrote on Mon, Feb 19, 2024 at 16:56:
>
> Sam Li  writes:
>
> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 15:40:
> >>
> >> Sam Li  writes:
> >>
> >> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 13:05:
> >> >>
> >> >> One more thing...
> >> >>
> >> >> Markus Armbruster  writes:
> >> >>
> >> >> > I apologize for the delayed review.
> >> >
> >> > No problems. Thanks for reviewing!
> >> >
> >> >> >
> >> >> > Sam Li  writes:
> >> >> >
> >> >> >> To configure the zoned format feature on the qcow2 driver, it
> >> >> >> requires settings as: the device size, zone model, zone size,
> >> >> >> zone capacity, number of conventional zones, limits on zone
> >> >> >> resources (max append bytes, max open zones, and max_active_zones).
> >> >> >>
> >> >> >> To create a qcow2 image with zoned format feature, use command like
> >> >> >> this:
> >> >> >> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> >> >> >> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> >> >> >> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> >> >> >> -o zone.max_active_zones=8 -o zone.mode=host-managed
> >> >> >>
> >> >> >> Signed-off-by: Sam Li 
> >> >> >
> >> >> > [...]
> >> >> >
> >> >> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> >> >> index ca390c5700..e2e0ec21a5 100644
> >> >> >> --- a/qapi/block-core.json
> >> >> >> +++ b/qapi/block-core.json
> >> >> >> @@ -5038,6 +5038,67 @@
> >> >> >>  { 'enum': 'Qcow2CompressionType',
> >> >> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >> >> >>
> >> >> >> +##
> >> >> >> +# @Qcow2ZoneModel:
> >> >> >> +#
> >> >> >> +# Zoned device model used in qcow2 image file
> >> >> >> +#
> >> >> >> +# @host-managed: The host-managed model only allows sequential 
> >> >> >> write over the
> >> >> >> +# device zones.
> >> >> >> +#
> >> >> >> +# Since 8.2
> >> >> >> +##
> >> >> >> +{ 'enum': 'Qcow2ZoneModel',
> >> >> >> +  'data': [ 'host-managed'] }
> >> >> >> +
> >> >> >> +##
> >> >> >> +# @Qcow2ZoneHostManaged:
> >> >> >> +#
> >> >> >> +# The host-managed zone model.  It only allows sequential writes.
> >> >> >> +#
> >> >> >> +# @size: Total number of bytes within zones.
> >> >> >
> >> >> > Default?
> >> >
> >> > It should be set by users. No default value provided. If it's unset
> >> > then it is zero and an error will be returned.
> >>
> >> If the user must provide @size, why is it optional then?
> >
> > It is not optional when the zone model is host-managed. If it's
> > non-zoned, then we don't care about zone info. I am not sure how to
> > make it unoptional.
>
> We have:
>
>blockdev-create argument @options of type BlockdevCreateOptions
>
>BlockdevCreateOptions union branch @qcow2 of type
>BlockdevCreateOptionsQcow2, union tag member is @driver
>
>BlockdevCreateOptionsQcow2 optional member @zone of type
>Qcow2ZoneCreateOptions, default not zoned
>
>Qcow2ZoneCreateOptions union branch @host-managed of type
>Qcow2ZoneHostManaged, union tag member is @mode
>
>Qcow2ZoneHostManaged optional member @size of type size.
>
> Making this member @size mandatory means we must specify it when
> BlockdevCreateOptionsQcow2 member @zone is present and @zone's member
> @mode is "host-managed".  Feels right to me.  Am I missing anything?

That's right. And the checks when creating such an img can help do
that. It's not specified in the .json file directly.

>
> >>
> >> >> >
> >> >> >> +#
> >> >> >> +# @capacity: The number of usable logical blocks within zones
> >> >> >> +# in bytes.  A zone capacity is always smaller or equal to the
> >> >> >> +# zone size.
> >> >> >
> >> >> > Default?
> >> >
> >> > Same.
> >> >
> >> >> >
> >> >> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> >> >> +# append request that can be issued to the device.  It must be
> >> >> >> +# 512-byte aligned and less than the zone capacity.
> >> >> >
> >> >> > Default?
> >> >
> >> > Same.
> >> >
> >> > For those values, I guess it could be set when users provide no
> >> > information and still want a workable emulated zoned block device.
> >> >
> >> >> >
> >> >> >> +#
> >> >> >> +# Since 8.2
> >> >> >> +##
> >> >> >> +{ 'struct': 'Qcow2ZoneHostManaged',
> >> >> >> +  'data': { '*size':  'size',
> >> >> >> +'*capacity':  'size',
> >> >> >> +'*conventional-zones': 'uint32',
> >> >> >> +'*max-open-zones': 'uint32',
> >> >> >> +'*max-active-zones':   'uint32',
> >> >> >> +'*max-append-bytes':   'size' } }
> >>
> >> [...]
> >>
>



Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Sam Li
Markus Armbruster wrote on Mon, Feb 19, 2024 at 15:40:
>
> Sam Li  writes:
>
> > Markus Armbruster wrote on Mon, Feb 19, 2024 at 13:05:
> >>
> >> One more thing...
> >>
> >> Markus Armbruster  writes:
> >>
> >> > I apologize for the delayed review.
> >
> > No problems. Thanks for reviewing!
> >
> >> >
> >> > Sam Li  writes:
> >> >
> >> >> To configure the zoned format feature on the qcow2 driver, it
> >> >> requires settings as: the device size, zone model, zone size,
> >> >> zone capacity, number of conventional zones, limits on zone
> >> >> resources (max append bytes, max open zones, and max_active_zones).
> >> >>
> >> >> To create a qcow2 image with zoned format feature, use command like
> >> >> this:
> >> >> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> >> >> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> >> >> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> >> >> -o zone.max_active_zones=8 -o zone.mode=host-managed
> >> >>
> >> >> Signed-off-by: Sam Li 
> >> >
> >> > [...]
> >> >
> >> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> >> index ca390c5700..e2e0ec21a5 100644
> >> >> --- a/qapi/block-core.json
> >> >> +++ b/qapi/block-core.json
> >> >> @@ -5038,6 +5038,67 @@
> >> >>  { 'enum': 'Qcow2CompressionType',
> >> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >> >>
> >> >> +##
> >> >> +# @Qcow2ZoneModel:
> >> >> +#
> >> >> +# Zoned device model used in qcow2 image file
> >> >> +#
> >> >> +# @host-managed: The host-managed model only allows sequential write 
> >> >> over the
> >> >> +# device zones.
> >> >> +#
> >> >> +# Since 8.2
> >> >> +##
> >> >> +{ 'enum': 'Qcow2ZoneModel',
> >> >> +  'data': [ 'host-managed'] }
> >> >> +
> >> >> +##
> >> >> +# @Qcow2ZoneHostManaged:
> >> >> +#
> >> >> +# The host-managed zone model.  It only allows sequential writes.
> >> >> +#
> >> >> +# @size: Total number of bytes within zones.
> >> >
> >> > Default?
> >
> > It should be set by users. No default value provided. If it's unset
> > then it is zero and an error will be returned.
>
> If the user must provide @size, why is it optional then?

It is not optional when the zone model is host-managed. If it's
non-zoned, then we don't care about zone info. I am not sure how to
make it unoptional.

>
> >> >
> >> >> +#
> >> >> +# @capacity: The number of usable logical blocks within zones
> >> >> +# in bytes.  A zone capacity is always smaller or equal to the
> >> >> +# zone size.
> >> >
> >> > Default?
> >
> > Same.
> >
> >> >
> >> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> >> +# append request that can be issued to the device.  It must be
> >> >> +# 512-byte aligned and less than the zone capacity.
> >> >
> >> > Default?
> >
> > Same.
> >
> > For those values, I guess it could be set when users provide no
> > information and still want a workable emulated zoned block device.
> >
> >> >
> >> >> +#
> >> >> +# Since 8.2
> >> >> +##
> >> >> +{ 'struct': 'Qcow2ZoneHostManaged',
> >> >> +  'data': { '*size':  'size',
> >> >> +'*capacity':  'size',
> >> >> +'*conventional-zones': 'uint32',
> >> >> +'*max-open-zones': 'uint32',
> >> >> +'*max-active-zones':   'uint32',
> >> >> +'*max-append-bytes':   'size' } }
>
> [...]
>



Re: [PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-02-19 Thread Sam Li
Markus Armbruster wrote on Mon, Feb 19, 2024 at 13:05:
>
> One more thing...
>
> Markus Armbruster  writes:
>
> > I apologize for the delayed review.

No problems. Thanks for reviewing!

> >
> > Sam Li  writes:
> >
> >> To configure the zoned format feature on the qcow2 driver, it
> >> requires settings as: the device size, zone model, zone size,
> >> zone capacity, number of conventional zones, limits on zone
> >> resources (max append bytes, max open zones, and max_active_zones).
> >>
> >> To create a qcow2 image with zoned format feature, use command like
> >> this:
> >> qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
> >> -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
> >> -o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
> >> -o zone.max_active_zones=8 -o zone.mode=host-managed
> >>
> >> Signed-off-by: Sam Li 
> >
> > [...]
> >
> >> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> index ca390c5700..e2e0ec21a5 100644
> >> --- a/qapi/block-core.json
> >> +++ b/qapi/block-core.json
> >> @@ -5038,6 +5038,67 @@
> >>  { 'enum': 'Qcow2CompressionType',
> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >>
> >> +##
> >> +# @Qcow2ZoneModel:
> >> +#
> >> +# Zoned device model used in qcow2 image file
> >> +#
> >> +# @host-managed: The host-managed model only allows sequential write over 
> >> the
> >> +# device zones.
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'enum': 'Qcow2ZoneModel',
> >> +  'data': [ 'host-managed'] }
> >> +
> >> +##
> >> +# @Qcow2ZoneHostManaged:
> >> +#
> >> +# The host-managed zone model.  It only allows sequential writes.
> >> +#
> >> +# @size: Total number of bytes within zones.
> >
> > Default?

It should be set by users. No default value provided. If it's unset
then it is zero and an error will be returned.

> >
> >> +#
> >> +# @capacity: The number of usable logical blocks within zones
> >> +# in bytes.  A zone capacity is always smaller or equal to the
> >> +# zone size.
> >
> > Default?

Same.

> >
> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> +# append request that can be issued to the device.  It must be
> >> +# 512-byte aligned and less than the zone capacity.
> >
> > Default?

Same.

For those values, I guess it could be set when users provide no
information and still want a workable emulated zoned block device.
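
To make the constraints in the quoted documentation concrete (an unset
size reads as zero and is an error, capacity is at most the zone size,
max-append-bytes is 512-byte aligned and less than the capacity), here
is a rough sketch of such a check. The struct and function names are
hypothetical and are not the ones used in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the options discussed; not a QEMU type. */
struct zone_opts {
    uint64_t size;              /* zone size in bytes */
    uint64_t capacity;          /* usable bytes per zone */
    uint64_t max_append_bytes;  /* largest zone append request */
};

static bool zone_opts_valid(const struct zone_opts *o)
{
    if (o->size == 0) {
        return false;   /* an unset size reads as zero and is rejected */
    }
    if (o->capacity > o->size) {
        return false;   /* capacity must be smaller or equal to size */
    }
    if (o->max_append_bytes % 512 != 0) {
        return false;   /* must be 512-byte aligned */
    }
    if (o->max_append_bytes >= o->capacity) {
        return false;   /* must be less than the zone capacity */
    }
    return true;
}
```

With the values from the example command (64M size, 64M capacity, 4096
max append bytes) this check passes.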

> >
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'struct': 'Qcow2ZoneHostManaged',
> >> +  'data': { '*size':  'size',
> >> +'*capacity':  'size',
> >> +'*conventional-zones': 'uint32',
> >> +'*max-open-zones': 'uint32',
> >> +'*max-active-zones':   'uint32',
> >> +'*max-append-bytes':   'size' } }
> >> +
> >> +##
> >> +# @Qcow2ZoneCreateOptions:
> >> +#
> >> +# The zone device model for the qcow2 image.
>
> Please document member @mode.
>
> Fails to build since merge commit 61e7a0d27c1:
>
> qapi/block-core.json: In union 'Qcow2ZoneCreateOptions':
> qapi/block-core.json:5135: member 'mode' lacks documentation
>

I see. Will update to the latest commit.

> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'union': 'Qcow2ZoneCreateOptions',
> >> +  'base': { 'mode': 'Qcow2ZoneModel' },
> >> +  'discriminator': 'mode',
> >> +  'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
> >> +
> >>  ##
> >>  # @BlockdevCreateOptionsQcow2:
> >>  #
> >> @@ -5080,6 +5141,9 @@
> >>  # @compression-type: The image cluster compression method
> >>  # (default: zlib, since 5.1)
> >>  #
> >> +# @zone: The zone device model modes.  The default is that the device is
> >> +# not zoned.  (since 8.2)
> >> +#
> >>  # Since: 2.12
> >>  ##
> >>  { 'struct': 'BlockdevCreateOptionsQcow2',
> >> @@ -5096,7 +5160,8 @@
> >>  '*preallocation':   'PreallocMode',
> >>  '*lazy-refcounts':  'bool',
> >>  '*refcount-bits':   'int',
> >> -'*compression-type':'Qcow2CompressionType' } }
> >> +'*compression-type':'Qcow2CompressionType',
> >> +'*zone':'Qcow2ZoneCreateOptions' } }
> >>
> >>  ##
> >>  # @BlockdevCreateOptionsQed:
>



[RFC v3 7/7] hw/nvme: make ZDED persistent

2024-01-22 Thread Sam Li
Zone descriptor extension data (ZDED) is not persistent across QEMU
restarts. The zone descriptor extension valid bit (ZDEV) is part of
the zone attributes and is set to one when ZDED is associated with
the zone.

With a qcow2 image as the backing file, the NVMe ZNS device stores
the zone attributes in the eight bits that follow the zone type bit in
each zone's write pointer. The ZDED is stored as part of the zoned
metadata, just as the write pointers are.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 45 
 hw/nvme/ctrl.c   |  1 +
 include/block/block-common.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 43ee0f47b9..f2d58d86c4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -197,6 +198,17 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 
 #define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
 
+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+/*
+ * The zone attribute takes up one byte. Store it after the zoned
+ * bit.
+ */
+uint64_t addr = *wp;
+addr |= ((uint64_t)za << 51);
+*wp = addr;
+}
+
 /*
  * To emulate a real zoned device, closed, empty and full states are
  * preserved after a power cycle. The open states are in-memory and will
@@ -5053,6 +5065,36 @@ unlock:
 return ret;
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&bs->wps->colock);
+uint64_t *wp = &bs->wps->wp[index];
+BlockZoneState zs = qcow2_get_zone_state(bs, index);
+if (zs == BLK_ZS_EMPTY) {
+if (!qcow2_can_activate_zone(bs)) {
+goto unlock;
+}
+
+qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID);
+ret = qcow2_write_wp_at(bs, wp, index);
+if (ret < 0) {
+error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp);
+goto unlock;
+}
+s->nr_zones_closed++;
+qemu_co_mutex_unlock(&bs->wps->colock);
+return ret;
+}
+
+unlock:
+qemu_co_mutex_unlock(&bs->wps->colock);
+return NVME_ZONE_INVAL_TRANSITION;
+}
+
 static int coroutine_fn GRAPH_RDLOCK
 qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
int64_t offset, int64_t len)
@@ -5110,6 +5152,9 @@ qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_OFFLINE:
 /* There are no transitions from the offline state to any other state */
 break;
+case BLK_ZO_SET_ZDED:
+ret = qcow2_zns_set_zded(bs, index);
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index de41d8bac8..2799a3ac31 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3465,6 +3465,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 break;
 
 case NVME_ZONE_ACTION_SET_ZD_EXT:
+op = BLK_ZO_SET_ZDED;
 int zd_ext_size = blk_get_zd_ext_size(blk);
 trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
 if (all || !zd_ext_size) {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 7690b05149..7c501e053e 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -88,6 +88,7 @@ typedef enum BlockZoneOp {
 BLK_ZO_FINISH,
 BLK_ZO_RESET,
 BLK_ZO_OFFLINE,
+BLK_ZO_SET_ZDED,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
-- 
2.40.1




[RFC v3 5/7] hw/nvme: make the metadata of ZNS emulation persistent

2024-01-22 Thread Sam Li
The NVMe ZNS device emulation follows the NVMe ZNS spec, but the state of
namespace zones does not persist across QEMU restarts. This patch makes the
metadata of ZNS emulation persistent by using new block layer APIs. The
ZNS device calls the zone report and zone management APIs of the block
layer, which handle zone state transitions and manage zone resources.

Signed-off-by: Sam Li 
---
 block/qcow2.c|3 +
 hw/nvme/ctrl.c   | 1115 +++---
 hw/nvme/ns.c |   77 +--
 hw/nvme/nvme.h   |   85 +--
 include/block/block-common.h |8 +
 include/block/block_int-common.h |2 +
 6 files changed, 264 insertions(+), 1026 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5098edf656..0bb249fa6e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5107,6 +5107,9 @@ qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_RESET:
 ret = qcow2_reset_zone(bs, index, len);
 break;
+case BLK_ZO_OFFLINE:
+/* There are no transitions from the offline state to any other state */
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dae6f00e4f..e31aa52c06 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid,
 return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
 }
 
-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
-   NvmeZoneState state)
-{
-if (QTAILQ_IN_USE(zone, entry)) {
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_REMOVE(&ns->full_zones, zone, entry);
-default:
-;
-}
-}
-
-nvme_set_zone_state(zone, state);
-
-switch (state) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry);
-case NVME_ZONE_STATE_READ_ONLY:
-break;
-default:
-zone->d.za = 0;
-}
-}
-
-static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
- uint32_t opn, uint32_t zrwa)
-{
-if (zrwa > ns->zns.numzrwa) {
-return NVME_NOZRWA | NVME_DNR;
-}
-
-return NVME_SUCCESS;
-}
-
-/*
- * Check if we can open a zone without exceeding open/active limits.
- * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
- */
-static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
-{
-return nvme_zns_check_resources(ns, act, opn, 0);
-}
-
 static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer *ebuf)
 {
 NvmeFdpEvent *ret = NULL;
@@ -1769,355 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba)
 slba / ns->zone_size;
 }
 
-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-if (zone_idx >= ns->num_zones) {
-return NULL;
-}
-
-return &ns->zone_array[zone_idx];
-}
-
-static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone)
-{
-uint64_t zslba = zone->d.zslba;
-
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_EMPTY:
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-case NVME_ZONE_STATE_CLOSED:
-return NVME_SUCCESS;
-case NVME_ZONE_STATE_FULL:
-trace_pci_nvme_err_zone_is_full(zslba);
-return NVME_ZONE_FULL;
-case NVME_ZONE_STATE_OFFLINE:
-trace_pci_nvme_err_zone_is_offline(zslba);
-return NVME_ZONE_OFFLINE;
-case NVME_ZONE_STATE_READ_ONLY:
-trace_pci_nvme_err_zone_is_read_only(zslba);
-return NVME_ZONE_READ_ONLY;
-default:
-assert(false);
-}
-
-return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
-  uint64_t slba, uint32_t nlb)
-{
-uint64_t zcap = nvme_zone_wr_boundary(zone);
-uint16_t

[RFC v3 2/7] qcow2: add zd_extension configurations to zoned metadata

2024-01-22 Thread Sam Li
Zone descriptor extension data is host-defined data that is
associated with each zone. Add zone descriptor extensions
to the zonedmeta struct.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 70 +---
 block/qcow2.h|  2 +
 include/block/block_int-common.h |  6 +++
 qapi/block-core.json |  4 ++
 4 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index db28585b82..5098edf656 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -448,9 +448,9 @@ qcow2_refresh_zonedmeta(BlockDriverState *bs)
 {
 int ret;
 BDRVQcow2State *s = bs->opaque;
-uint64_t wps_size = s->zoned_header.zonedmeta_size;
+uint64_t wps_size = s->zoned_header.zonedmeta_size -
+s->zded_size;
 g_autofree uint64_t *temp;
-
 temp = g_new(uint64_t, wps_size);
 ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
  wps_size, temp, 0);
@@ -459,7 +459,17 @@ qcow2_refresh_zonedmeta(BlockDriverState *bs)
 return ret;
 }
 
+g_autofree uint8_t *zded = NULL;
+zded = g_try_malloc0(s->zded_size);
+ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size,
+ s->zded_size, zded, 0);
+if (ret < 0) {
+error_report("Can not read zded");
+return ret;
+}
+
 memcpy(bs->wps->wp, temp, wps_size);
+memcpy(bs->zd_extensions, zded, s->zded_size);
 return 0;
 }
 
@@ -520,6 +530,19 @@ qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt)
 zone_opt->max_open_zones = sequential_zones;
 }
 
+if (zone_opt->zd_extension_size) {
+if (zone_opt->zd_extension_size & 0x3f) {
+   error_report("zone descriptor extension size must be a "
+"multiple of 64B");
+   return false;
+}
+
+if ((zone_opt->zd_extension_size >> 6) > 0xff) {
+error_report("Zone descriptor extension size is too large");
+return false;
+}
+}
+
 return true;
 }
 return false;
@@ -784,6 +807,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 zoned_ext.conventional_zones =
 be32_to_cpu(zoned_ext.conventional_zones);
 zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.zd_extension_size =
+be32_to_cpu(zoned_ext.zd_extension_size);
 zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
 zoned_ext.max_active_zones =
 be32_to_cpu(zoned_ext.max_active_zones);
@@ -794,7 +819,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
 s->zoned_header = zoned_ext;
 bs->wps = g_malloc(sizeof(BlockZoneWps)
-+ s->zoned_header.zonedmeta_size);
++ zoned_ext.zonedmeta_size - s->zded_size);
+bs->zd_extensions = g_malloc0(s->zded_size);
 ret = qcow2_refresh_zonedmeta(bs);
 if (ret < 0) {
 return ret;
@@ -2370,6 +2396,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.zone_size = s->zoned_header.zone_size;
 bs->bl.zone_capacity = s->zoned_header.zone_capacity;
 bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
 static int GRAPH_UNLOCKED
@@ -3621,6 +3648,8 @@ int qcow2_update_header(BlockDriverState *bs)
 .conventional_zones =
 cpu_to_be32(s->zoned_header.conventional_zones),
 .nr_zones   = cpu_to_be32(s->zoned_header.nr_zones),
+.zd_extension_size  =
+cpu_to_be32(s->zoned_header.zd_extension_size),
 .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
 .max_active_zones   =
 cpu_to_be32(s->zoned_header.max_active_zones),
@@ -4373,6 +4402,15 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 }
 s->zoned_header.max_append_bytes = zone_host_managed->max_append_bytes;
 
+uint64_t zded_size = 0;
+if (zone_host_managed->has_descriptor_extension_size) {
+s->zoned_header.zd_extension_size =
+zone_host_managed->descriptor_extension_size;
+zded_size = s->zoned_header.zd_extension_size *
+bs->bl.nr_zones;
+}
+s->zded_size = zded_size;
+
+if (!qcow2_check_zone_options(&s->zoned_header)) {
 s->zoned_header.zoned = BLK_Z_NONE;
 ret = -EINVAL;
@@ -4380
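
Read in isolation, the size check that the hunk above adds to
qcow2_check_zone_options() accepts a zone descriptor extension size only
if it is zero (no extension), or a multiple of 64 bytes whose count of
64-byte units fits in eight bits. A standalone sketch of that rule
follows. The function name is hypothetical, and my reading of the 0xff
limit as mirroring an 8-bit size field counted in 64-byte units is an
assumption, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper restating the check from the hunk above. */
static bool zd_ext_size_valid(uint32_t zd_extension_size)
{
    if (zd_extension_size == 0) {
        return true;    /* no descriptor extension configured */
    }
    if (zd_extension_size & 0x3f) {
        return false;   /* not a multiple of 64 bytes */
    }
    if ((zd_extension_size >> 6) > 0xff) {
        return false;   /* more than 255 units of 64 bytes */
    }
    return true;
}
```

Under this rule the largest accepted size is 255 * 64 = 16320 bytes.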

[RFC v3 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer

2024-01-22 Thread Sam Li
The zone information is contained in the BlockLimits fields. Add blk_get_*()
functions to access the block layer, and update how the NVMe device emulation
accesses zone info.

Signed-off-by: Sam Li 
---
 block/block-backend.c | 72 +++
 hw/nvme/ctrl.c| 34 +--
 hw/nvme/ns.c  | 61 --
 hw/nvme/nvme.h|  3 --
 include/sysemu/block-backend-io.h |  9 
 5 files changed, 111 insertions(+), 68 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 209eb07528..c23f2a731b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2359,6 +2359,78 @@ int blk_get_max_iov(BlockBackend *blk)
 return blk->root->bs->bl.max_iov;
 }
 
+uint8_t blk_get_zone_model(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+return bs ? bs->bl.zoned: 0;
+
+}
+
+uint32_t blk_get_zone_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_size : 0;
+}
+
+uint32_t blk_get_zone_capacity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_capacity : 0;
+}
+
+uint32_t blk_get_max_open_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_open_zones : 0;
+}
+
+uint32_t blk_get_max_active_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_active_zones : 0;
+}
+
+uint32_t blk_get_max_append_sectors(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_append_sectors : 0;
+}
+
+uint32_t blk_get_nr_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.nr_zones : 0;
+}
+
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.write_granularity : 0;
+}
+
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->wps : NULL;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e..e64b021454 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
 static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
  uint32_t opn, uint32_t zrwa)
 {
-if (ns->params.max_active_zones != 0 &&
-ns->nr_active_zones + act > ns->params.max_active_zones) {
-trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
-}
-
-if (ns->params.max_open_zones != 0 &&
-ns->nr_open_zones + opn > ns->params.max_open_zones) {
-trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
-}
-
 if (zrwa > ns->zns.numzrwa) {
 return NVME_NOZRWA | NVME_DNR;
 }
@@ -1988,9 +1976,9 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, NvmeZone *zone)
 static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
 {
 NvmeZone *zone;
+int moz = blk_get_max_open_zones(ns->blkconf.blk);
 
-if (ns->params.max_open_zones &&
-ns->nr_open_zones == ns->params.max_open_zones) {
+if (moz && ns->nr_open_zones == moz) {
 zone = QTAILQ_FIRST(&ns->imp_open_zones);
 if (zone) {
 /*
@@ -2160,7 +2148,7 @@ void nvme_rw_complete_cb(void *opaque, int ret)
 block_acct_done(stats, acct);
 }
 
-if (ns->params.zoned && nvme_is_write(req)) {
+if (blk_get_zone_model(blk) && nvme_is_write(req)) {
 nvme_finalize_zoned_write(ns, req);
 }
 
@@ -2882,7 +2870,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int ret)
 goto out;
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 nvme_advance_zone_wp(ns, iocb->zone, nlb);
 }
 
@@ -2994,7 +2982,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int 
ret)
 goto invalid;
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
 if (status) {
 goto invalid;
@@ -3088,7 +3076,7 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
 }
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 status = nvme_check_zone_read(ns, slba, nlb);
 if (status) {
   

[RFC v3 0/7] Add persistence to NVMe ZNS emulation

2024-01-22 Thread Sam Li
ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This series makes the
metadata of ZNS emulation persistent by using new block layer APIs and
a qcow2 image as the backing file. It is the second part, following the series
- adding full zoned storage emulation to qcow2 driver [v7]

The metadata of ZNS emulation is divided into two parts: zone metadata and
zone descriptor extension data. The zone metadata is composed of zone
state, zone type, wp and zone attributes. The zone information is
stored in one uint64_t wp per zone to save space and allow easy access. The
structure of the wp of each zone is as follows:
|(4)| zone type (1)| zone attr (8)| wp (51) ||
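
The layout above can be sketched in C as a set of shifts and masks. This is
only an illustration, not code from the series: it assumes the fields are
listed from the most significant bit down, and that the unnamed 4-bit field
holds the zone state. All zns_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout, most significant bits first (hypothetical names):
 * | state (4) | zone type (1) | zone attr (8) | wp (51) |
 */
#define ZNS_WP_BITS     51
#define ZNS_ATTR_BITS   8
#define ZNS_TYPE_BITS   1
#define ZNS_ATTR_SHIFT  ZNS_WP_BITS                        /* bit 51 */
#define ZNS_TYPE_SHIFT  (ZNS_ATTR_SHIFT + ZNS_ATTR_BITS)   /* bit 59 */
#define ZNS_STATE_SHIFT (ZNS_TYPE_SHIFT + ZNS_TYPE_BITS)   /* bit 60 */
#define ZNS_WP_MASK     ((UINT64_C(1) << ZNS_WP_BITS) - 1)

/* Pack the four fields into one 64-bit word. */
static inline uint64_t zns_pack(unsigned state, unsigned type,
                                unsigned attr, uint64_t wp)
{
    assert(state < 16 && type < 2 && attr < 256 && wp <= ZNS_WP_MASK);
    return ((uint64_t)state << ZNS_STATE_SHIFT) |
           ((uint64_t)type << ZNS_TYPE_SHIFT) |
           ((uint64_t)attr << ZNS_ATTR_SHIFT) |
           wp;
}

/* Field accessors: shift the field down, then mask to its width. */
static inline unsigned zns_state(uint64_t v)
{
    return (unsigned)((v >> ZNS_STATE_SHIFT) & 0xf);
}

static inline unsigned zns_type(uint64_t v)
{
    return (unsigned)((v >> ZNS_TYPE_SHIFT) & 0x1);
}

static inline unsigned zns_attr(uint64_t v)
{
    return (unsigned)((v >> ZNS_ATTR_SHIFT) & 0xff);
}

static inline uint64_t zns_wp(uint64_t v)
{
    return v & ZNS_WP_MASK;
}
```

With this assumption the zone type lands on bit 59, the same bit that the
QCOW2_ZT_IS_CONV() macro in the qcow2 series tests.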

The zone descriptor extension data (ZDED) is relatively small compared to the
overall size, so we store the ZDED of all zones in an array, regardless of
whether the valid bit is set.
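
As a rough sketch of that flat-array idea (hypothetical helper names, not the
code in this series): every zone gets a fixed-size slot, so the slot of a zone
is found by plain multiplication. The multiple-of-64 check reflects the
constraint stated in the qcow2 spec patch of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes needed to hold one extension slot per zone. */
static inline size_t zded_total_size(uint32_t zd_ext_size, uint32_t nr_zones)
{
    return (size_t)zd_ext_size * nr_zones;
}

/*
 * Return the slot of zone_idx inside the flat array. Slots exist for every
 * zone, whether or not the zone's "extension valid" attribute is set.
 */
static inline uint8_t *zded_slot(uint8_t *zd_extensions, uint32_t zd_ext_size,
                                 uint32_t nr_zones, uint32_t zone_idx)
{
    assert(zd_ext_size % 64 == 0);   /* size must be a multiple of 64 */
    assert(zone_idx < nr_zones);
    return zd_extensions + (size_t)zone_idx * zd_ext_size;
}
```

The trade-off is some unused space for zones without a valid extension, in
exchange for constant-time lookup and no per-zone allocation.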

Creating a ZNS-format qcow2 image file adds one more option, zd_extension_size,
to the zoned device configuration.

For a closer look, you can apply the zns patches on this branch:
https://github.com/sgzerolc/qemu/tree/dev-qcow2-v7
Or use the local zns branch directly:
https://github.com/sgzerolc/qemu/tree/dev-zns-v7

To attach this file as an emulated ZNS drive on the QEMU command line, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \

Acked-by: Klaus Jensen 

---

v2->v3:
- fix compatibility issue with the qcow2 patch series [Markus]
- address review comments [Markus]

v1->v2:
- split [v1 2/5] patch to three (doc, config, block layer API)
- adapt qcow2 v6

Sam Li (7):
  docs/qcow2: add zd_extension_size option to the zoned format feature
  qcow2: add zd_extension configurations to zoned metadata
  hw/nvme: use blk_get_*() to access zone info in the block layer
  hw/nvme: add blk_get_zone_extension to access zd_extensions
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append write using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c |   88 ++
 block/qcow2.c |  120 ++-
 block/qcow2.h |2 +
 docs/interop/qcow2.txt|9 +
 hw/nvme/ctrl.c| 1246 -
 hw/nvme/ns.c  |  162 +---
 hw/nvme/nvme.h|   95 +--
 include/block/block-common.h  |9 +
 include/block/block_int-common.h  |8 +
 include/sysemu/block-backend-io.h |   11 +
 include/sysemu/dma.h  |3 +
 qapi/block-core.json  |4 +
 system/dma-helpers.c  |   17 +
 13 files changed, 648 insertions(+), 1126 deletions(-)

-- 
2.40.1




[RFC v3 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature

2024-01-22 Thread Sam Li
The NVMe ZNS command set has the zone descriptor extension feature for
associating data with a zone. Devices that support ZAC/ZBC have a zone
descriptor extension size of zero.

Signed-off-by: Sam Li 
---
 docs/interop/qcow2.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index a8dd4c3b15..106477d9ad 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -436,6 +436,15 @@ The fields of the zoned extension are:
The offset of zoned metadata structure in the contained
image, in bytes.
 
+ 44 - 51:  zd_extension_size
+   The size of zone descriptor extension data in bytes.
+   The value must be a multiple of 64.
+
+   The zone descriptor extension feature is associating data
+   to a zone which is only available in the NVMe ZNS command
+   set. A value of zero indicates the feature is not
+   available.
+
 == Full disk encryption header pointer ==
 
 The full disk encryption header must be present if, and only if, the
-- 
2.40.1




[RFC v3 6/7] hw/nvme: refactor zone append write using block layer APIs

2024-01-22 Thread Sam Li
Signed-off-by: Sam Li 
---
 block/qcow2.c|   2 +-
 hw/nvme/ctrl.c   | 190 ---
 include/sysemu/dma.h |   3 +
 system/dma-helpers.c |  17 
 4 files changed, 162 insertions(+), 50 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0bb249fa6e..43ee0f47b9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2395,7 +2395,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_open_zones = s->zoned_header.max_open_zones;
 bs->bl.zone_size = s->zoned_header.zone_size;
 bs->bl.zone_capacity = s->zoned_header.zone_capacity;
-bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+bs->bl.write_granularity = BDRV_SECTOR_SIZE; /* physical block size */
 bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e31aa52c06..de41d8bac8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1726,6 +1726,95 @@ static void nvme_misc_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+NvmeRequest *req;
+NvmeCmd *cmd;
+NvmeCtrl *n;
+
+union {
+struct {
+  uint32_t partial;
+  unsigned int nr_zones;
+  BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+  int64_t offset;
+} zone_append_data;
+};
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *cb = opaque;
+NvmeRequest *req = cb->req;
+int64_t *offset = (int64_t *)&req->cqe;
+
+if (ret) {
+nvme_aio_err(req, ret);
+}
+
+*offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+nvme_enqueue_req_completion(nvme_cq(req), req);
+g_free(cb);
+}
+
+static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset,
+  uint32_t align,
+  BlockCompletionFunc *cb,
+  NvmeZoneCmdAIOCB *aiocb)
+{
+NvmeRequest *req = aiocb->req;
+assert(req->sg.flags & NVME_SG_ALLOC);
+
+if (req->sg.flags & NVME_SG_DMA) {
+req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset,
+ align, cb, aiocb);
+} else {
+req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0,
+ cb, aiocb);
+}
+}
+
+static void nvme_zone_append_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *aiocb = opaque;
+NvmeRequest *req = aiocb->req;
+NvmeNamespace *ns = req->ns;
+
+BlockBackend *blk = ns->blkconf.blk;
+
+trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
+
+if (ret) {
+goto out;
+}
+
+if (ns->lbaf.ms) {
+NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+int64_t offset = aiocb->zone_append_data.offset;
+
+if (nvme_ns_ext(ns) || req->cmd.mptr) {
+uint16_t status;
+
+nvme_sg_unmap(&req->sg);
+status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+if (status) {
+ret = -EFAULT;
+goto out;
+}
+
+return nvme_blk_zone_append(blk, &offset, 1,
+nvme_blk_zone_append_complete_cb,
+aiocb);
+}
+}
+
+out:
+nvme_blk_zone_append_complete_cb(aiocb, ret);
+}
+
+
 void nvme_rw_complete_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
@@ -3052,6 +3141,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 uint64_t mapped_size = data_size;
 uint64_t data_offset;
 BlockBackend *blk = ns->blkconf.blk;
+BlockZoneWps *wps = blk_get_zone_wps(blk);
+uint32_t zone_size = blk_get_zone_size(blk);
+uint32_t zone_idx;
 uint16_t status;
 
 if (nvme_ns_ext(ns)) {
@@ -3082,42 +3174,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 }
 
 if (blk_get_zone_model(blk)) {
-uint32_t zone_size = blk_get_zone_size(blk);
-uint32_t zone_idx = slba / zone_size;
-int64_t zone_start = zone_idx * zone_size;
+assert(wps);
+if (zone_size) {
+zone_idx = slba / zone_size;
+int64_t zone_start = zone_idx * zone_size;
+
+if (append) {
+bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+
+if (n->params.zasl &&
+data_size > (uint64_t)
+n->page_size << n->params.zasl) {
+trace_pci_nvme_err_zasl(data_size);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
 
-if (append) {
-bool piremap = !!(ctrl & NVME_RW

[RFC v3 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions

2024-01-22 Thread Sam Li
Signed-off-by: Sam Li 
---
 block/block-backend.c | 16 
 hw/nvme/ctrl.c| 20 ++--
 hw/nvme/ns.c  | 24 
 hw/nvme/nvme.h|  7 ---
 include/sysemu/block-backend-io.h |  2 ++
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index c23f2a731b..3bebee12b9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2431,6 +2431,22 @@ BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
 return bs ? bs->wps : NULL;
 }
 
+uint8_t *blk_get_zone_extension(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->zd_extensions : NULL;
+}
+
+uint32_t blk_get_zd_ext_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zd_extension_size : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e64b021454..dae6f00e4f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4004,6 +4004,12 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl 
*n, NvmeZone *zone,
 return NVME_SUCCESS;
 }
 
+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+uint32_t zone_idx)
+{
+return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)];
+}
+
 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
@@ -4088,11 +4094,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 
 case NVME_ZONE_ACTION_SET_ZD_EXT:
 trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
-if (all || !ns->params.zd_extension_size) {
+if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 zd_ext = nvme_get_zd_extension(ns, zone_idx);
-status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req);
+status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), 
req);
 if (status) {
 trace_pci_nvme_err_zd_extension_map_error(zone_idx);
 return status;
@@ -4183,7 +4189,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
-if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) {
+if (zra == NVME_ZONE_REPORT_EXTENDED &&
+!blk_get_zd_ext_size(ns->blkconf.blk)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -4205,7 +4212,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 
 zone_entry_sz = sizeof(NvmeZoneDescr);
 if (zra == NVME_ZONE_REPORT_EXTENDED) {
-zone_entry_sz += ns->params.zd_extension_size;
+zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ;
 }
 
 max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
@@ -4243,11 +4250,12 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 if (zra == NVME_ZONE_REPORT_EXTENDED) {
+int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk);
 if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
 memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx),
-   ns->params.zd_extension_size);
+   zd_ext_size);
 }
-buf_p += ns->params.zd_extension_size;
+buf_p += zd_ext_size;
 }
 
 max_zones--;
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 82d4f7932d..45c08391f5 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -218,15 +218,15 @@ static int 
nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 
 static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 {
+BlockBackend *blk = ns->blkconf.blk;
 uint64_t start = 0, zone_size = ns->zone_size;
 uint64_t capacity = ns->num_zones * zone_size;
 NvmeZone *zone;
 int i;
 
 ns->zone_array = g_new0(NvmeZone, ns->num_zones);
-if (ns->params.zd_extension_size) {
-ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
-  ns->num_zones);
+if (blk_get_zone_extension(blk)) {
+ns->zd_extensions = blk_get_zone_extension(blk);
 }
 
 QTAILQ_INIT(&ns->exp_open_zones);
@@ -275,7 +275,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 for (i = 0; i <= ns->id_ns.nlbaf; i++) {
 id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size);
 id_ns_z->lbafe[i].zdes =
-ns->params.zd_extension_size >> 6; /* Units

[PATCH v7 2/4] qcow2: add configurations for zoned format extension

2024-01-22 Thread Sam Li
To configure the zoned format feature on the qcow2 driver, it
requires settings as: the device size, zone model, zone size,
zone capacity, number of conventional zones, limits on zone
resources (max append bytes, max open zones, and max_active_zones).

To create a qcow2 image with zoned format feature, use command like
this:
qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
-o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
-o zone.max_append_bytes=4096 -o zone.max_open_zones=6 \
-o zone.max_active_zones=8 -o zone.mode=host-managed

Signed-off-by: Sam Li 
---
 block/qcow2.c| 252 ++-
 block/qcow2.h|  36 -
 docs/interop/qcow2.txt   | 107 -
 include/block/block_int-common.h |  13 ++
 qapi/block-core.json |  67 +++-
 5 files changed, 469 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9bee66fff5..b987f1e751 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -194,6 +195,68 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, 
Error **errp)
 return cryptoopts_qdict;
 }
 
+/*
+ * Passing by the zoned device configurations by a zoned_header struct, check
+ * if the zone device options are under constraints. Return false when some
+ * option is invalid
+ */
+static inline bool
+qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt)
+{
+if (zone_opt) {
+uint32_t sequential_zones;
+
+if (zone_opt->zone_size == 0) {
+error_report("Zoned extension header zone_size field "
+ "can not be 0");
+return false;
+}
+
+if (zone_opt->zone_capacity > zone_opt->zone_size) {
+error_report("zone capacity %" PRIu32 "B exceeds zone size "
+ "%" PRIu32 "B", zone_opt->zone_capacity,
+ zone_opt->zone_size);
+return false;
+}
+
+if (zone_opt->max_append_bytes + BDRV_SECTOR_SIZE >=
+zone_opt->zone_capacity) {
+error_report("max append bytes %" PRIu32 "B exceeds zone "
+ "capacity %" PRIu32 "B by more than block size",
+ zone_opt->max_append_bytes,
+ zone_opt->zone_capacity);
+return false;
+}
+
+if (zone_opt->max_active_zones > zone_opt->nr_zones) {
+error_report("Max_active_zones %" PRIu32 " exceeds "
+ "nr_zones %" PRIu32 ". Set it to nr_zones.",
+ zone_opt->max_active_zones, zone_opt->nr_zones);
+zone_opt->max_active_zones = zone_opt->nr_zones;
+}
+
+if (zone_opt->max_open_zones > zone_opt->max_active_zones) {
+error_report("Max_open_zones %" PRIu32 " exceeds "
+ "max_active_zones %" PRIu32 ". Set it to "
+ "max_active_zones.",
+ zone_opt->max_open_zones,
+ zone_opt->max_active_zones);
+zone_opt->max_open_zones = zone_opt->max_active_zones;
+}
+
+sequential_zones = zone_opt->nr_zones - zone_opt->conventional_zones;
+if (zone_opt->max_open_zones > sequential_zones) {
+error_report("Max_open_zones field can not be larger "
+ "than the number of SWR zones. Set it to number of "
+ "SWR zones %" PRIu32 ".", sequential_zones);
+zone_opt->max_open_zones = sequential_zones;
+}
+
+return true;
+}
+return false;
+}
+
 /*
  * read qcow2 extension and fill bs
  * start reading from start_offset
@@ -211,6 +274,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -432,6 +496,51 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len < sizeof(zoned_ext)) {
+/* Missing fields */
+error_setg(errp, "zoned_ext: len=%" PRIu32 " too small "
+ 

[PATCH v7 1/4] docs/qcow2: add the zoned format feature

2024-01-22 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
Given the zoned information, a qcow2 file can then emulate a real
zoned device, exposed to the guest either through a virtio-blk
device or as an NVMe ZNS drive.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 docs/system/qemu-block-drivers.rst.inc | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..4db19b61ae 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,48 @@ This section describes each format and the options that 
are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zone.mode
+If this is set to ``host-managed``, the image is an emulated zoned
+block device. This option is only valid to emulated zoned device files.
+
+  .. option:: zone.size
+
+The size of a zone in bytes. The device is divided into zones of this
+size with the exception of the last zone, which may be smaller.
+
+  .. option:: zone.capacity
+
+The initial capacity value, in bytes, for all zones. The capacity must
+be less than or equal to zone size. If the last zone is smaller, then
+its capacity is capped.
+
+The zone capacity is per zone and may be different between zones in real
+devices. QCow2 sets all zones to the same capacity.
+
+  .. option:: zone.conventional_zones
+
+The number of conventional zones of the zoned device.
+
+  .. option:: zone.max_active_zones
+
+The limit of the zones with implicit open, explicit open or closed state.
+
+The max active zones must be less or equal to the number of SWR
+(sequential write required) zones of the device.
+
+  .. option:: zone.max_open_zones
+
+The maximal allowed open zones. The max open zones must not be larger than
+the max active zones.
+
+If the limits of open zones or active zones are equal to the number of
+SWR zones, then it is the same as having no limits.
+
+  .. option:: zone.max_append_bytes
+
+The number of bytes in a zone append request that can be issued to the
+device. It must be 512-byte aligned and less than the zone capacity.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




[PATCH v7 3/4] qcow2: add zoned emulation capability

2024-01-22 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation of a zoned device using
a qcow2 file. The zoned device metadata includes zone type,
zoned device state and write pointer of each zone, which are stored
in an array of unsigned integers.

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states: IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states are generally affected by device
internal events. Operations on zones cause the corresponding state
changes.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones. The limit is managed with active zone lists
that follow an LRU policy.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 791 ++-
 block/trace-events   |   2 +
 include/qemu/queue.h |   1 +
 3 files changed, 792 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b987f1e751..db28585b82 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -195,6 +195,274 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
*fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
+
+/*
+ * To emulate a real zoned device, closed, empty and full states are
+ * preserved after a power cycle. The open states are in-memory and will
+ * be lost after closing the device. Read-only and offline states are
+ * device-internal events, which are not considered for simplicity.
+ */
+static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
+  uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+uint64_t zone_wp = bs->wps->wp[index];
+uint64_t zone_start;
+
+if (QCOW2_ZT_IS_CONV(zone_wp)) {
+return BLK_ZS_NOT_WP;
+}
+
+if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+return BLK_ZS_EOPEN;
+}
+if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+return BLK_ZS_IOPEN;
+}
+
+zone_start = index * bs->bl.zone_size;
+if (zone_wp == zone_start) {
+return BLK_ZS_EMPTY;
+}
+if (zone_wp >= zone_start + bs->bl.zone_capacity) {
+return BLK_ZS_FULL;
+}
+if (zone_wp > zone_start) {
+if (!QLIST_IS_INSERTED(zone_entry, closed_zone_entry)) {
+/*
+ * The number of closed zones is not always updated in time when
+ * the device is closed. However, it only matters when doing
+ * zone report. Refresh the count and list of closed zones to
+ * provide correct zone states for zone report.
+ */
+QLIST_INSERT_HEAD(&s->closed_zones, zone_entry, closed_zone_entry);
+s->nr_zones_closed++;
+}
+return BLK_ZS_CLOSED;
+}
+return BLK_ZS_NOT_WP;
+}
+
+static void qcow2_rm_exp_open_zone(BDRVQcow2State *s,
+   uint32_t index)
+{
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+
+QLIST_REMOVE(zone_entry, exp_open_zone_entry);
+s->nr_zones_exp_open--;
+}
+
+static void qcow2_rm_imp_open_zone(BDRVQcow2State *s,
+   int32_t index)
+{
+Qcow2ZoneListEntry *zone_entry;
+if (index < 0) {
+/* Apply LRU when the index is not specified. */
+zone_entry = QLIST_LAST(&s->imp_open_zones, imp_open_zone_entry);
+} else {
+zone_entry = &s->zone_list_entries[index];
+}
+
+QLIST_REMOVE(zone_entry, imp_open_zone_entry);
+s->nr_zones_imp_open--;
+}
+
+static void qcow2_rm_open_zone(BDRVQcow2State *s,
+   uint32_t index)
+{
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+
+if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+qcow2_rm_exp_open_zone(s, index);
+} else if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+qcow2_rm_imp_open_zone(s, index);
+}
+}
+
+static void qcow2_rm_closed_zone(BDRVQcow2State *s,
+ uint32_t index)
+{
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+
+QLIST_REMOVE(zone_entry, closed_zone_entry);
+s->nr_zones_closed--;
+}
+
+static void qcow2_do_imp_open_zone(BDRVQcow2State *s,
+   uint32_t index,
+   BlockZoneState zs)
+{
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+
+switch (zs) {
+case BLK_ZS_EMPTY:
+break;
+case BLK_ZS_CLOSED:
+s->nr_zones_closed--;
+break;
+case BLK_ZS_IOPEN:
+/*
+ * The LRU policy: update the zone that is most recently
+ * used to the head of the zone list
+ */
+

[PATCH v7 4/4] iotests: test the zoned format feature for qcow2 file

2024-01-22 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 147 +++
 tests/qemu-iotests/tests/zoned-qcow2.out | 172 +++
 2 files changed, 319 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 
b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 00..ee49467576
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,147 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone.size=64M -o \
+zone.capacity=64M -o zone.conventional_zones=0 -o zone.max_append_bytes=32M \
+-o zone.max_open_zones=6 -o zone.max_active_zones=8 -o zone.mode=host-managed
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test zone operations one by one"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report zones[-1]:"  # zones[-1] dictates the last zone
+$QEMU_IO $IMG -c "zrp 0x2C00 2" # 0x2C00 / 512 = 0x16
+echo
+echo
+echo "(2) open zones[0], zones[1], zones[-1] then close, finish, reset:"
+$QEMU_IO $IMG << EOF
+zo 0 0x400
+zrp 0 1
+zo 0x400 0x400
+zrp 0x400 1
+zo 0x2C00 0x400
+zrp 0x2C00 2
+zc 0 0x400
+zrp 0 1
+zc 0x400 0x400
+zrp 0x400 1
+zc 0x2C00 0x400
+zrp 0x2C00 2
+zf 0 0x400
+zrp 0 1
+zf 64M 64M
+zrp 0x400 2
+zf 0x2C00 0x400
+zrp 0x2C00 2
+zrs 0 0x400
+zrp 0 1
+zrs 0x400 0x400
+zrp 0x400 1
+zrs 0x2C00 0x400
+zrp 0x2C00 2
+EOF
+
+echo
+echo "(3) append write with (4k, 8k) data"
+$QEMU_IO $IMG -c "zrp 0 12" # the physical block size of the device is 4096
+echo "Append write zones[0], zones[1] twice"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0x400 0x1000 0x2000
+zrp 0x400 1
+zap -p 0x400 0x1000 0x2000
+zrp 0x400 1
+EOF
+
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrp 0 12" -c "zrs 0 768M" -c "zrp 0 12"
+echo
+echo
+
+echo "case 2: test a sets of ops that works or not"
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zrp 0 1
+zap -p 0 0x1000 0x3ffd000
+zrp 0 1
+EOF
+
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M" -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zap -p 0xc00 0x1000 0x1000
+zap -p 0x2000 0x1000 0x1000
+zrp 0 12
+zrs 0 768M
+zrp 0 12
+EOF
+
+echo "case 3: test zone resource management"
+echo "(1) write in zones[0], zones[1], zones[2] and then close it"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zap -p 0x400 0x1000 0x1000
+zap -p 0x800 0x1000 0x1000
+zrp 0 12
+zc 0 64M
+zc 0x400 64M
+zc 0x800 64M
+zrp 0 12
+EOF
+
+echo "(2) reset all after 3(1)"
+$QEMU_IO $IMG << EOF
+zrs 0 768M
+zrp 0 12
+EOF
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out 
b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 00..743abeeea4
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2.out
@@ -0,0 +1,172 @@
+QA output created by zoned-qcow2
+
+=== Initial image setup ===
+
+Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib zone.mode=host-managed zone.size=67108864 
zone.capacity=67108864 zone.conventional_zones=0 zone.max_append_bytes=33554432 
zone.max_active_zones=8 zone.max_open_zones=6 size=805306368 lazy_refcounts=off 
refcount_bits=16
+
+=== Testing a qcow2 img with zoned format ===
+
+case 1: test zone operations one by one
+(1) re

[PATCH v7 0/4] Add full zoned storage emulation to qcow2 driver

2024-01-22 Thread Sam Li
This patch series adds a new extension - zoned format - to the
qcow2 driver, thereby allowing full zoned storage emulation on
a qcow2 image file. Users can attach such a qcow2 file to the
guest as a zoned device.

Write pointers are preserved in the zoned metadata and are
recovered after a power cycle. Meanwhile, any open (implicit or
explicit) zone will show up as closed.

Zone states are kept in memory. Read-only and offline states are
device-internal events, which are not considered in qcow2
emulation for simplicity. The other zone states
(closed, empty, full) can be inferred from write pointer
values, which persist across QEMU reboots. The open states are
kept in memory using open zone lists.
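
As an illustration of that inference (a simplified sketch with hypothetical
names; the actual check is qcow2_get_zone_state() in patch 3/4, which also
consults the open zone lists and the conventional-zone bit first):

```c
#include <assert.h>
#include <stdint.h>

/* Only the three states that can be derived from the write pointer alone. */
typedef enum {
    SKETCH_ZS_EMPTY,
    SKETCH_ZS_CLOSED,
    SKETCH_ZS_FULL,
} SketchZoneState;

/*
 * Derive the persistent state of a sequential zone from its write pointer.
 * All values are byte offsets; zone_start is index * zone_size.
 */
static inline SketchZoneState sketch_state_from_wp(uint64_t wp,
                                                   uint64_t zone_start,
                                                   uint64_t zone_capacity)
{
    assert(wp >= zone_start);
    if (wp == zone_start) {
        return SKETCH_ZS_EMPTY;      /* nothing written yet */
    }
    if (wp >= zone_start + zone_capacity) {
        return SKETCH_ZS_FULL;       /* write pointer reached the capacity */
    }
    return SKETCH_ZS_CLOSED;         /* partially written, not open */
}
```

A zone that was open before shutdown has a write pointer strictly between the
zone start and the capacity limit, which is why it shows up as closed after a
restart.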

To create a qcow2 file with zoned format, use command like this:

Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \

v6->v7:
- modify zone resource management (style) [Damien]
- fix accessing list with negative index err
- add some tests for zrm in iotests
- address review comments [Markus]

v5->v6:
- fix docs and specs [Eric, Markus, Stefan]
- add general sanity checks for zoned device configurations during creation
and opening [Eric]
- fix LRU when implicitly open a zone for a long time [Stefan]

v4->v5:
- add incompatible bit for zoned format [Eric]
- fix and manage zone resources via LRU [Damien]
- renaming functions and fields, spec changes [Markus, Damien]
- add closed zone list
- make qemu iotests for zoned device consecutive [Stefan]

v3->v4:
- use QLIST for implicit, explicit open zones management [Stefan]
- keep zone states in memory and drop state bits in wp metadata structure 
[Damien, Stefan]
- change zone resource management and iotests accordingly
- add tracing for number of implicit zones
- address review comments [Stefan, Markus]:
  * documentation, config, style

v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_mata allocation size
  * use bitwise OR rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 1041 +-
 block/qcow2.h|   36 +-
 block/trace-events   |2 +
 docs/interop/qcow2.txt   |  107 ++-
 docs/system/qemu-block-drivers.rst.inc   |   42 +
 include/block/block_int-common.h |   13 +
 include/qemu/queue.h |1 +
 qapi/block-core.json |   67 +-
 tests/qemu-iotests/tests/zoned-qcow2 |  147 +++
 tests/qemu-iotests/tests/zoned-qcow2.out |  172 
 10 files changed, 1621 insertions(+), 7 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation

2024-01-22 Thread Sam Li
Klaus Jensen wrote on Wed, Jan 10, 2024 at 07:52:
>
> Hi Sam,
>
> This is awesome. For the hw/nvme parts,
>
> Acked-by: Klaus Jensen 
>
> I'll give it a proper R-b when you drop the RFC status.

Hi Klaus,

Sorry for the late response. I will submit a new RFC patch series very
soon.

Now the zone states should persist. The following is the result of
regression tests on zonefs. It's been a while since I worked on this
series. Please let me know if I made any mistake.

Thanks,
Sam

[root@guest tests]# ./zonefs-tests.sh /dev/nvme0n1
Gathering information on /dev/nvme0n1...
zonefs-tests on /dev/nvme0n1:
  12 zones (0 conventional zones, 12 sequential zones)
  131072 512B sectors zone size (64 MiB)
  6 max open zones
  8 max active zones
Running tests
...
75 / 112 tests passed (37 skipped, 0 failures)



Re: [PATCH v6 0/4] Add full zoned storage emulation to qcow2 driver

2023-12-16 Thread Sam Li
On Thu, Nov 30, 2023 at 09:25, Markus Armbruster wrote:
>
> Clang reports
>
> ../block/qcow2.c:4066:5: error: mutex 'graph_lock' is not held on every path 
> through here [-Werror,-Wthread-safety-analysis]
> blk_co_unref(blk);
> ^
> ../block/qcow2.c:3928:5: note: mutex acquired here
> bdrv_graph_co_rdlock();
> ^
> ../block/qcow2.c:4066:5: error: mutex 'graph_lock' is not held on every path 
> through here [-Werror,-Wthread-safety-analysis]
> blk_co_unref(blk);
> ^
> ../block/qcow2.c:3928:5: note: mutex acquired here
> bdrv_graph_co_rdlock();
> ^
> 2 errors generated.
>

It turns out that my GCC 12.0 does not support the
-Wthread-safety-analysis flag. I need to use --cc=clang to reproduce
it. Thanks!

Sam



Re: [PATCH v6 2/4] qcow2: add configurations for zoned format extension

2023-12-16 Thread Sam Li
On Thu, Nov 30, 2023 at 09:40, Markus Armbruster wrote:
>
> Sam Li  writes:
>
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append bytes, max open zones, and max_active_zones).
> >
> > To create a qcow2 image with zoned format feature, use command like
> > this:
> > $path/to/qemu-img create -f qcow2 zbc.qcow2 -o size=768M
>
> I'd omit $path/to/
>
> > -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0
> > -o zone.max_append_bytes=4096 -o zone.max_open_zones=10
> > -o zone.max_active_zones=12 -o zone.mode=host-managed
>
> Suggest to add \ like this:
>
>   qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
>   -o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
>   -o zone.max_append_bytes=4096 -o zone.max_open_zones=10 \
>   -o zone.max_active_zones=12 -o zone.mode=host-managed
>
> >
> > Signed-off-by: Sam Li 
>
> [...]
>
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index ca390c5700..ef98dc83a0 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5038,6 +5038,64 @@
> >  { 'enum': 'Qcow2CompressionType',
> >'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >
> > +##
> > +# @Qcow2ZoneModel:
> > +#
> > +# Zoned device model used in qcow2 image file
> > +#
> > +# @host-managed: host-managed model only allows sequential write over the
>
> Suggest "the host-managed model ..."
>
> > +# device zones
> > +#
> > +# Since 8.2
> > +##
> > +{ 'enum': 'Qcow2ZoneModel',
> > +  'data': [ 'host-managed'] }
> > +
> > +##
> > +# @Qcow2ZoneHostManaged:
> > +#
> > +# The host-managed zone model.  It only allows sequential writes.
> > +#
> > +# @size: Total number of bytes within zones
> > +#
> > +# @capacity: The number of usable logical blocks within zones
> > +# in bytes.  A zone capacity is always smaller or equal to the
> > +# zone size
> > +#
> > +# @conventional-zones: The number of conventional zones of the
> > +# zoned device
> > +#
> > +# @max-open-zones: The maximal number of open zones
> > +#
> > +# @max-active-zones: The maximal number of zones in the implicit
> > +# open, explicit open or closed state
> > +#
> > +# @max-append-bytes: The maximal number of bytes of a zone
> > +# append request that can be issued to the device.  It must be
> > +# 512-byte aligned
>
> Missing period at the end.
>
> For all the optional members: what's the default?

The default for all optional members is 0. For max-open-zones and
max-active-zones, 0 means there is no limit on zone resources.
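
To illustrate the "0 means no limit" convention, here is a minimal sketch. The function name and parameters are hypothetical and are not taken from the patch; it only shows how a limit check would treat 0 as unlimited.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: may `wanted` more zones be opened (or activated)
 * when `in_use` zones already count against `limit`?
 * A limit of 0 is treated as "no limit", as described above.
 */
static bool demo_zone_limit_allows(uint32_t limit, uint32_t in_use,
                                   uint32_t wanted)
{
    if (limit == 0) {
        return true; /* unlimited */
    }
    /* Widen to 64 bits so the sum cannot wrap around. */
    return (uint64_t)in_use + wanted <= limit;
}
```

For example, with max-open-zones=6 and six zones already open, opening one more would be refused, while with max-open-zones=0 it would always be allowed.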

>
> > +#
> > +# Since 8.2
> > +##
> > +{ 'struct': 'Qcow2ZoneHostManaged',
> > +  'data': { '*size':  'size',
> > +'*capacity':  'size',
> > +'*conventional-zones': 'uint32',
> > +'*max-open-zones': 'uint32',
> > +'*max-active-zones':   'uint32',
> > +'*max-append-bytes':   'uint32' } }
>
> Why isn't @max-append-bytes 'size'?  It's a byte count...
>
> > +
> > +##
> > +# @Qcow2ZoneCreateOptions:
> > +#
> > +# The zone device model for the qcow2 image.
> > +#
> > +# Since 8.2
> > +##
> > +{ 'union': 'Qcow2ZoneCreateOptions',
> > +  'base': { 'mode': 'Qcow2ZoneModel' },
> > +  'discriminator': 'mode',
> > +  'data': { 'host-managed': 'Qcow2ZoneHostManaged' } }
> > +
> >  ##
> >  # @BlockdevCreateOptionsQcow2:
> >  #
> > @@ -5080,6 +5138,8 @@
> >  # @compression-type: The image cluster compression method
> >  # (default: zlib, since 5.1)
> >  #
> > +# @zone: @Qcow2ZoneCreateOptions.  The zone device model modes (since 8.2)
>
> Don't put the type into the description like that, because it comes out
> like
>
> "zone": "Qcow2ZoneCreateOptions" (optional)
>"Qcow2ZoneCreateOptions".  The zone device model modes (since 8.2)
>
> in formatted documentation.
>
> Let's spell out the default: the device is not zoned.
>
> > +#
> >  # Since: 2.12
> >  ##
> >  { 'struct': 'BlockdevCreateOptionsQcow2',
> > @@ -5096,7 +5156,8 @@
> >  '*preallocation':   'PreallocMode',
> >  '*lazy-refcounts':  'bool',
> >  '*refcount-bits':   'int',
> > -'*compression-type':'Qcow2CompressionType' } }
> > +'*compression-type':'Qcow2CompressionType',
> > +'*zone':'Qcow2ZoneCreateOptions' } }
> >
> >  ##
> >  # @BlockdevCreateOptionsQed:
>



Re: [RFC v2 0/7] Add persistence to NVMe ZNS emulation

2023-11-30 Thread Sam Li
On Thu, Nov 30, 2023 at 18:11, Markus Armbruster wrote:
>
> Sam Li  writes:
>
> > ZNS emulation follows NVMe ZNS spec but the state of namespace
> > zones does not persist across restarts of QEMU. This patch makes the
> > metadata of ZNS emulation persistent by using new block layer APIs and
> > the qcow2 img as backing file. It is the second part after the patches
> > - adding full zoned storage emulation to qcow2 driver.
> > https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilike...@gmail.com/
>
> In the future, also add this information the machine-readable way,
> i.e. like
>
>   Based-on: <20231127043703.49489-1-faithilike...@gmail.com>
>
> However, it doesn't apply on top of that series for me.  Got something I
> could pull?

Weird, I built this on top of the v6 qcow2 patches. I'll check that after
settling down. I am currently moving to another city.

Thanks,
Sam



[RFC v2 6/7] hw/nvme: refactor zone append write using block layer APIs

2023-11-27 Thread Sam Li
Signed-off-by: Sam Li 
---
 block/qcow2.c|   2 +-
 hw/nvme/ctrl.c   | 190 ---
 include/sysemu/dma.h |   3 +
 system/dma-helpers.c |  17 
 4 files changed, 162 insertions(+), 50 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dfaf5566e2..74d2e2bf39 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2290,7 +2290,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_open_zones = s->zoned_header.max_open_zones;
 bs->bl.zone_size = s->zoned_header.zone_size;
 bs->bl.zone_capacity = s->zoned_header.zone_capacity;
-bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+bs->bl.write_granularity = BDRV_SECTOR_SIZE; /* physical block size */
 bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b9ed3495e1..f65a87646e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1735,6 +1735,95 @@ static void nvme_misc_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+NvmeRequest *req;
+NvmeCmd *cmd;
+NvmeCtrl *n;
+
+union {
+struct {
+  uint32_t partial;
+  unsigned int nr_zones;
+  BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+  int64_t offset;
+} zone_append_data;
+};
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *cb = opaque;
+NvmeRequest *req = cb->req;
+int64_t *offset = (int64_t *)&req->cqe;
+
+if (ret) {
+nvme_aio_err(req, ret);
+}
+
+*offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+nvme_enqueue_req_completion(nvme_cq(req), req);
+g_free(cb);
+}
+
+static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset,
+  uint32_t align,
+  BlockCompletionFunc *cb,
+  NvmeZoneCmdAIOCB *aiocb)
+{
+NvmeRequest *req = aiocb->req;
+assert(req->sg.flags & NVME_SG_ALLOC);
+
+if (req->sg.flags & NVME_SG_DMA) {
+req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset,
+ align, cb, aiocb);
+} else {
+req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0,
+ cb, aiocb);
+}
+}
+
+static void nvme_zone_append_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *aiocb = opaque;
+NvmeRequest *req = aiocb->req;
+NvmeNamespace *ns = req->ns;
+
+BlockBackend *blk = ns->blkconf.blk;
+
+trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
+
+if (ret) {
+goto out;
+}
+
+if (ns->lbaf.ms) {
+NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+int64_t offset = aiocb->zone_append_data.offset;
+
+if (nvme_ns_ext(ns) || req->cmd.mptr) {
+uint16_t status;
+
+nvme_sg_unmap(&req->sg);
+status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+if (status) {
+ret = -EFAULT;
+goto out;
+}
+
+return nvme_blk_zone_append(blk, &offset, 1,
+nvme_blk_zone_append_complete_cb,
+aiocb);
+}
+}
+
+out:
+nvme_blk_zone_append_complete_cb(aiocb, ret);
+}
+
+
 void nvme_rw_complete_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
@@ -3061,6 +3150,9 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 uint64_t mapped_size = data_size;
 uint64_t data_offset;
 BlockBackend *blk = ns->blkconf.blk;
+BlockZoneWps *wps = blk_get_zone_wps(blk);
+uint32_t zone_size = blk_get_zone_size(blk);
+uint32_t zone_idx;
 uint16_t status;
 
 if (nvme_ns_ext(ns)) {
@@ -3091,42 +3183,47 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 }
 
 if (blk_get_zone_model(blk)) {
-uint32_t zone_size = blk_get_zone_size(blk);
-uint32_t zone_idx = slba / zone_size;
-int64_t zone_start = zone_idx * zone_size;
+assert(wps);
+if (zone_size) {
+zone_idx = slba / zone_size;
+int64_t zone_start = zone_idx * zone_size;
+
+if (append) {
+bool piremap = !!(ctrl & NVME_RW_PIREMAP);
+
+if (n->params.zasl &&
+data_size > (uint64_t)
+n->page_size << n->params.zasl) {
+trace_pci_nvme_err_zasl(data_size);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
 
-if (append) {
-bool piremap = !!(ctrl & NVME_RW

[RFC v2 4/7] hw/nvme: add blk_get_zone_extension to access zd_extensions

2023-11-27 Thread Sam Li
Signed-off-by: Sam Li 
---
 block/block-backend.c | 16 
 hw/nvme/ctrl.c| 20 ++--
 hw/nvme/ns.c  | 24 
 hw/nvme/nvme.h|  7 ---
 include/sysemu/block-backend-io.h |  2 ++
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 666df9cfea..fcdcbe28bf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2452,6 +2452,22 @@ BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
 return bs ? bs->wps : NULL;
 }
 
+uint8_t *blk_get_zone_extension(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->zd_extensions : NULL;
+}
+
+uint32_t blk_get_zd_ext_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zd_extension_size : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e64b021454..dae6f00e4f 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4004,6 +4004,12 @@ static uint16_t nvme_zone_mgmt_send_zrwa_flush(NvmeCtrl 
*n, NvmeZone *zone,
 return NVME_SUCCESS;
 }
 
+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+uint32_t zone_idx)
+{
+return &ns->zd_extensions[zone_idx * blk_get_zd_ext_size(ns->blkconf.blk)];
+}
+
 static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeZoneSendCmd *cmd = (NvmeZoneSendCmd *)&req->cmd;
@@ -4088,11 +4094,11 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 
 case NVME_ZONE_ACTION_SET_ZD_EXT:
 trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
-if (all || !ns->params.zd_extension_size) {
+if (all || !blk_get_zd_ext_size(ns->blkconf.blk)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 zd_ext = nvme_get_zd_extension(ns, zone_idx);
-status = nvme_h2c(n, zd_ext, ns->params.zd_extension_size, req);
+status = nvme_h2c(n, zd_ext, blk_get_zd_ext_size(ns->blkconf.blk), 
req);
 if (status) {
 trace_pci_nvme_err_zd_extension_map_error(zone_idx);
 return status;
@@ -4183,7 +4189,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
-if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) {
+if (zra == NVME_ZONE_REPORT_EXTENDED &&
+!blk_get_zd_ext_size(ns->blkconf.blk)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -4205,7 +4212,7 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 
 zone_entry_sz = sizeof(NvmeZoneDescr);
 if (zra == NVME_ZONE_REPORT_EXTENDED) {
-zone_entry_sz += ns->params.zd_extension_size;
+zone_entry_sz += blk_get_zd_ext_size(ns->blkconf.blk) ;
 }
 
 max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
@@ -4243,11 +4250,12 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 if (zra == NVME_ZONE_REPORT_EXTENDED) {
+int zd_ext_size = blk_get_zd_ext_size(ns->blkconf.blk);
 if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
 memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx),
-   ns->params.zd_extension_size);
+   zd_ext_size);
 }
-buf_p += ns->params.zd_extension_size;
+buf_p += zd_ext_size;
 }
 
 max_zones--;
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 82d4f7932d..45c08391f5 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -218,15 +218,15 @@ static int 
nvme_ns_zoned_check_calc_geometry(NvmeNamespace *ns, Error **errp)
 
 static void nvme_ns_zoned_init_state(NvmeNamespace *ns)
 {
+BlockBackend *blk = ns->blkconf.blk;
 uint64_t start = 0, zone_size = ns->zone_size;
 uint64_t capacity = ns->num_zones * zone_size;
 NvmeZone *zone;
 int i;
 
 ns->zone_array = g_new0(NvmeZone, ns->num_zones);
-if (ns->params.zd_extension_size) {
-ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
-  ns->num_zones);
+if (blk_get_zone_extension(blk)) {
+ns->zd_extensions = blk_get_zone_extension(blk);
 }
 
 QTAILQ_INIT(&ns->exp_open_zones);
@@ -275,7 +275,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 for (i = 0; i <= ns->id_ns.nlbaf; i++) {
 id_ns_z->lbafe[i].zsze = cpu_to_le64(ns->zone_size);
 id_ns_z->lbafe[i].zdes =
-ns->params.zd_extension_size >> 6; /* Units

[RFC v2 7/7] hw/nvme: make ZDED persistent

2023-11-27 Thread Sam Li
Zone descriptor extension data (ZDED) is not persistent across QEMU
restarts. The zone descriptor extension valid bit (ZDEV) is part of the
zone attributes and is set to one when ZDED is associated with the
zone.

With the qcow2 image as the backing file, the NVMe ZNS device stores
the zone attributes in the eight bits that follow the zone type bit of
each zone's write pointer. Like the write pointers, the ZDED is stored
as part of the zoned metadata.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 45 
 hw/nvme/ctrl.c   |  1 +
 include/block/block-common.h |  1 +
 3 files changed, 47 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 74d2e2bf39..861a8f9f06 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -235,6 +236,17 @@ static inline BlockZoneState 
qcow2_get_zone_state(BlockDriverState *bs,
 return BLK_ZS_NOT_WP;
 }
 
+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+/*
+ * The zone attribute takes up one byte. Store it after the zoned
+ * bit.
+ */
+uint64_t addr = *wp;
+addr |= ((uint64_t)za << 51);
+*wp = addr;
+}
+
 /*
  * Write the new wp value to the dedicated location of the image file.
  */
@@ -4990,6 +5002,36 @@ unlock:
 return ret;
 }
 
+static int qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&bs->wps->colock);
+uint64_t *wp = &bs->wps->wp[index];
+BlockZoneState zs = qcow2_get_zone_state(bs, index);
+if (zs == BLK_ZS_EMPTY) {
+ret = qcow2_check_zone_resources(bs, zs);
+if (ret < 0) {
+goto unlock;
+}
+
+qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID);
+ret = qcow2_write_wp_at(bs, wp, index);
+if (ret < 0) {
+error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp);
+goto unlock;
+}
+s->nr_zones_closed++;
+qemu_co_mutex_unlock(&bs->wps->colock);
+return ret;
+}
+
+unlock:
+qemu_co_mutex_unlock(&bs->wps->colock);
+return NVME_ZONE_INVAL_TRANSITION;
+}
+
 static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp 
op,
int64_t offset, int64_t len)
 {
@@ -5046,6 +5088,9 @@ static int coroutine_fn 
qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_OFFLINE:
 /* There are no transitions from the offline state to any other state 
*/
 break;
+case BLK_ZO_SET_ZDED:
+ret = qcow2_zns_set_zded(bs, index);
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f65a87646e..c33e24e303 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3474,6 +3474,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 break;
 
 case NVME_ZONE_ACTION_SET_ZD_EXT:
+op = BLK_ZO_SET_ZDED;
 int zd_ext_size = blk_get_zd_ext_size(blk);
 trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
 if (all || !zd_ext_size) {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index ea213c3887..b61541599f 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -91,6 +91,7 @@ typedef enum BlockZoneOp {
 BLK_ZO_FINISH,
 BLK_ZO_RESET,
 BLK_ZO_OFFLINE,
+BLK_ZO_SET_ZDED,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
-- 
2.40.1




[RFC v2 5/7] hw/nvme: make the metadata of ZNS emulation persistent

2023-11-27 Thread Sam Li
NVMe ZNS devices follow the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This patch makes the
metadata of ZNS emulation persistent by using new block layer APIs. The
ZNS device calls the zone report and zone mgmt APIs of the block layer,
which handle zone state transitions and manage zone resources.

Signed-off-by: Sam Li 
---
 block/qcow2.c|3 +
 hw/nvme/ctrl.c   | 1106 +++---
 hw/nvme/ns.c |   77 +--
 hw/nvme/nvme.h   |   85 +--
 include/block/block-common.h |8 +
 include/block/block_int-common.h |2 +
 6 files changed, 264 insertions(+), 1017 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 75dff27216..dfaf5566e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5043,6 +5043,9 @@ static int coroutine_fn 
qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_RESET:
 ret = qcow2_reset_zone(bs, index, len);
 break;
+case BLK_ZO_OFFLINE:
+/* There are no transitions from the offline state to any other state 
*/
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dae6f00e4f..b9ed3495e1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, 
uint16_t pid,
 return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
 }
 
-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
-   NvmeZoneState state)
-{
-if (QTAILQ_IN_USE(zone, entry)) {
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_REMOVE(&ns->full_zones, zone, entry);
-default:
-;
-}
-}
-
-nvme_set_zone_state(zone, state);
-
-switch (state) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry);
-case NVME_ZONE_STATE_READ_ONLY:
-break;
-default:
-zone->d.za = 0;
-}
-}
-
-static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
- uint32_t opn, uint32_t zrwa)
-{
-if (zrwa > ns->zns.numzrwa) {
-return NVME_NOZRWA | NVME_DNR;
-}
-
-return NVME_SUCCESS;
-}
-
-/*
- * Check if we can open a zone without exceeding open/active limits.
- * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
- */
-static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
-{
-return nvme_zns_check_resources(ns, act, opn, 0);
-}
-
 static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer 
*ebuf)
 {
 NvmeFdpEvent *ret = NULL;
@@ -1769,346 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace 
*ns, uint64_t slba)
 slba / ns->zone_size;
 }
 
-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-if (zone_idx >= ns->num_zones) {
-return NULL;
-}
-
-return &ns->zone_array[zone_idx];
-}
-
-static uint16_t nvme_check_zone_state_for_write(NvmeZone *zone)
-{
-uint64_t zslba = zone->d.zslba;
-
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_EMPTY:
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-case NVME_ZONE_STATE_CLOSED:
-return NVME_SUCCESS;
-case NVME_ZONE_STATE_FULL:
-trace_pci_nvme_err_zone_is_full(zslba);
-return NVME_ZONE_FULL;
-case NVME_ZONE_STATE_OFFLINE:
-trace_pci_nvme_err_zone_is_offline(zslba);
-return NVME_ZONE_OFFLINE;
-case NVME_ZONE_STATE_READ_ONLY:
-trace_pci_nvme_err_zone_is_read_only(zslba);
-return NVME_ZONE_READ_ONLY;
-default:
-assert(false);
-}
-
-return NVME_INTERNAL_DEV_ERROR;
-}
-
-static uint16_t nvme_check_zone_write(NvmeNamespace *ns, NvmeZone *zone,
-  uint64_t slba, uint32_t nlb)
-{
-uint64_t zcap = nvme_zone_wr_boundary(zo

[RFC v2 3/7] hw/nvme: use blk_get_*() to access zone info in the block layer

2023-11-27 Thread Sam Li
The zone information is contained in the BlockLimits fields. Add
blk_get_*() functions to access the block layer, and update how the
NVMe device emulation accesses zone info.
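
The accessor pattern can be sketched in isolation as follows. This is only an illustration under simplified assumptions: the Demo* types are hypothetical stand-ins for BlockBackend, BlockDriverState and BlockLimits, not the real QEMU definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the QEMU structures. */
typedef struct DemoLimits {
    uint32_t zone_size;
    uint32_t max_open_zones;
} DemoLimits;

typedef struct DemoState {
    DemoLimits bl;
} DemoState;

typedef struct DemoBackend {
    DemoState *bs; /* NULL when no medium is attached */
} DemoBackend;

/* Return the zone size, or 0 if the backend has no state attached. */
static uint32_t demo_get_zone_size(const DemoBackend *blk)
{
    const DemoState *bs = blk->bs;

    return bs ? bs->bl.zone_size : 0;
}
```

The point of the pattern is that device code asks the backend for a limit and gets a neutral value (0) when nothing is attached, instead of dereferencing the block driver state directly.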

Signed-off-by: Sam Li 
---
 block/block-backend.c | 72 +++
 hw/nvme/ctrl.c| 34 +--
 hw/nvme/ns.c  | 61 --
 hw/nvme/nvme.h|  3 --
 include/sysemu/block-backend-io.h |  9 
 5 files changed, 111 insertions(+), 68 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index ec21148806..666df9cfea 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2380,6 +2380,78 @@ int blk_get_max_iov(BlockBackend *blk)
 return blk->root->bs->bl.max_iov;
 }
 
+uint8_t blk_get_zone_model(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+return bs ? bs->bl.zoned: 0;
+
+}
+
+uint32_t blk_get_zone_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_size : 0;
+}
+
+uint32_t blk_get_zone_capacity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_capacity : 0;
+}
+
+uint32_t blk_get_max_open_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_open_zones : 0;
+}
+
+uint32_t blk_get_max_active_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_active_zones : 0;
+}
+
+uint32_t blk_get_max_append_sectors(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_append_sectors : 0;
+}
+
+uint32_t blk_get_nr_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.nr_zones : 0;
+}
+
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.write_granularity : 0;
+}
+
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->wps : NULL;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e..e64b021454 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
 static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
  uint32_t opn, uint32_t zrwa)
 {
-if (ns->params.max_active_zones != 0 &&
-ns->nr_active_zones + act > ns->params.max_active_zones) {
-trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
-}
-
-if (ns->params.max_open_zones != 0 &&
-ns->nr_open_zones + opn > ns->params.max_open_zones) {
-trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-return NVME_ZONE_TOO_MANY_OPEN | NVME_DNR;
-}
-
 if (zrwa > ns->zns.numzrwa) {
 return NVME_NOZRWA | NVME_DNR;
 }
@@ -1988,9 +1976,9 @@ static uint16_t nvme_zrm_reset(NvmeNamespace *ns, 
NvmeZone *zone)
 static void nvme_zrm_auto_transition_zone(NvmeNamespace *ns)
 {
 NvmeZone *zone;
+int moz = blk_get_max_open_zones(ns->blkconf.blk);
 
-if (ns->params.max_open_zones &&
-ns->nr_open_zones == ns->params.max_open_zones) {
+if (moz && ns->nr_open_zones == moz) {
 zone = QTAILQ_FIRST(&ns->imp_open_zones);
 if (zone) {
 /*
@@ -2160,7 +2148,7 @@ void nvme_rw_complete_cb(void *opaque, int ret)
 block_acct_done(stats, acct);
 }
 
-if (ns->params.zoned && nvme_is_write(req)) {
+if (blk_get_zone_model(blk) && nvme_is_write(req)) {
 nvme_finalize_zoned_write(ns, req);
 }
 
@@ -2882,7 +2870,7 @@ static void nvme_copy_out_completed_cb(void *opaque, int 
ret)
 goto out;
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 nvme_advance_zone_wp(ns, iocb->zone, nlb);
 }
 
@@ -2994,7 +2982,7 @@ static void nvme_copy_in_completed_cb(void *opaque, int 
ret)
 goto invalid;
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 status = nvme_check_zone_write(ns, iocb->zone, iocb->slba, nlb);
 if (status) {
 goto invalid;
@@ -3088,7 +3076,7 @@ static void nvme_do_copy(NvmeCopyAIOCB *iocb)
 }
 }
 
-if (ns->params.zoned) {
+if (blk_get_zone_model(ns->blkconf.blk)) {
 status = nvme_check_zone_read(ns, slba, nlb);
 if (status) {
   

[RFC v2 1/7] docs/qcow2: add zd_extension_size option to the zoned format feature

2023-11-27 Thread Sam Li
Signed-off-by: Sam Li 
---
 docs/interop/qcow2.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 0f1938f056..458d05371a 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -428,6 +428,9 @@ The fields of the zoned extension are:
The offset of zoned metadata structure in the contained
image, in bytes.
 
+  44 - 51:  zd_extension_size
+The size of zone descriptor extension data in bytes.
+
 == Full disk encryption header pointer ==
 
 The full disk encryption header must be present if, and only if, the
-- 
2.40.1




[RFC v2 0/7] Add persistence to NVMe ZNS emulation

2023-11-27 Thread Sam Li
ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This series makes the
metadata of ZNS emulation persistent by using new block layer APIs and
a qcow2 image as the backing file. It is the second part, following the
patches that add full zoned storage emulation to the qcow2 driver:
https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilike...@gmail.com/

The metadata of ZNS emulation is divided into two parts: zone metadata
and zone descriptor extension data. The zone metadata is composed of
zone states, zone type, wp and zone attributes. To save space and keep
access simple, the zone information can be stored in one uint64_t wp
per zone. The structure of the wp of each zone is as follows:
|(4)| zone type (1)| zone attr (8)| wp (51) ||
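
A minimal sketch of how such an entry could be packed and unpacked is shown below. It assumes the layout above is read from the most significant bit to the least significant bit, so that the write pointer occupies bits 0-50, the zone attributes bits 51-58 and the zone type bit 59. All names are hypothetical; only the shift by 51 is taken from the series itself.

```c
#include <stdint.h>

/* Assumed layout, MSB to LSB: (4) | zone type (1) | zone attr (8) | wp (51) */
#define DEMO_WP_BITS    51
#define DEMO_WP_MASK    ((UINT64_C(1) << DEMO_WP_BITS) - 1)
#define DEMO_ZA_SHIFT   DEMO_WP_BITS
#define DEMO_ZA_MASK    (UINT64_C(0xff) << DEMO_ZA_SHIFT)
#define DEMO_TYPE_SHIFT (DEMO_ZA_SHIFT + 8)

/* Extract the 51-bit write pointer. */
static inline uint64_t demo_entry_wp(uint64_t entry)
{
    return entry & DEMO_WP_MASK;
}

/* Extract the 8-bit zone attribute field. */
static inline uint8_t demo_entry_za(uint64_t entry)
{
    return (uint8_t)((entry & DEMO_ZA_MASK) >> DEMO_ZA_SHIFT);
}

/* Extract the single zone type bit. */
static inline unsigned demo_entry_type(uint64_t entry)
{
    return (unsigned)((entry >> DEMO_TYPE_SHIFT) & 1);
}

/* Replace the zone attribute field, leaving all other bits untouched. */
static inline uint64_t demo_entry_set_za(uint64_t entry, uint8_t za)
{
    return (entry & ~DEMO_ZA_MASK) | ((uint64_t)za << DEMO_ZA_SHIFT);
}
```

Unlike a plain bitwise OR, the setter in this sketch clears the old attribute bits first, so an attribute can also be reset. That is a design choice of the sketch, not a statement about the patch.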

The zone descriptor extension data is relatively small compared to the
overall size, so we store the ZDED of all zones in one array regardless
of whether the valid bit is set.
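
As a sketch of what "one array for all zones" implies, the slot of a zone can be found by plain index arithmetic. The helper names below are hypothetical; the computation mirrors the zone_idx * extension-size indexing used in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bytes needed for the ZDED of all zones. */
static size_t demo_zded_array_size(uint32_t nr_zones, uint32_t zd_ext_size)
{
    return (size_t)nr_zones * zd_ext_size;
}

/* Hypothetical helper: start of the ZDED slot that belongs to zone_idx. */
static uint8_t *demo_zded_slot(uint8_t *zd_extensions, uint32_t zone_idx,
                               uint32_t zd_ext_size)
{
    return zd_extensions + (size_t)zone_idx * zd_ext_size;
}
```

Every zone gets a slot whether or not its valid bit is set, so no per-zone lookup table is needed; the cost is nr_zones * zd_extension_size bytes of metadata.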

Creating a ZNS-format qcow2 image file adds one more option,
zd_extension_size, to the zoned device configuration.
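
The constraints that patch 2/7 places on this option (a multiple of 64 bytes, and at most 0xff units of 64 bytes) can be sketched as a stand-alone check. The function name is hypothetical, and treating 0 as "no extension" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validator for a zone descriptor extension size in bytes. */
static bool demo_zd_extension_size_valid(uint32_t size)
{
    if (size == 0) {
        return true;   /* assumed: 0 means no descriptor extension */
    }
    if (size & 0x3f) {
        return false;  /* must be a multiple of 64 bytes */
    }
    if ((size >> 6) > 0xff) {
        return false;  /* must fit in 8 bits when counted in 64B units */
    }
    return true;
}
```

Under these rules the largest accepted size is 255 * 64 = 16320 bytes.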

To attach this file as an emulated ZNS drive on the QEMU command line, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \

Sorry, I am sending this one more time due to network problems.

v1->v2:
- split [v1 2/5] patch to three (doc, config, block layer API)
- adapt to qcow2 v6

Sam Li (7):
  docs/qcow2: add zd_extension_size option to the zoned format feature
  qcow2: add zd_extension configurations to zoned metadata
  hw/nvme: use blk_get_*() to access zone info in the block layer
  hw/nvme: add blk_get_zone_extension to access zd_extensions
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append write using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c |   88 ++
 block/qcow2.c |  119 ++-
 block/qcow2.h |2 +
 docs/interop/qcow2.txt|3 +
 hw/nvme/ctrl.c| 1247 -
 hw/nvme/ns.c  |  162 +---
 hw/nvme/nvme.h|   95 +--
 include/block/block-common.h  |9 +
 include/block/block_int-common.h  |8 +
 include/sysemu/block-backend-io.h |   11 +
 include/sysemu/dma.h  |3 +
 qapi/block-core.json  |4 +
 system/dma-helpers.c  |   17 +
 13 files changed, 647 insertions(+), 1121 deletions(-)

-- 
2.40.1




[RFC v2 2/7] qcow2: add zd_extension configurations to zoned metadata

2023-11-27 Thread Sam Li
Zone descriptor data is host-defined data that is associated with
each zone. Add zone descriptor extensions to the zonedmeta struct.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 69 +---
 block/qcow2.h|  2 +
 include/block/block_int-common.h |  6 +++
 qapi/block-core.json |  4 ++
 4 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 26f2bb4a87..75dff27216 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -354,7 +354,8 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState 
*bs)
 {
 int ret;
 BDRVQcow2State *s = bs->opaque;
-uint64_t wps_size = s->zoned_header.zonedmeta_size;
+uint64_t wps_size = s->zoned_header.zonedmeta_size -
+s->zded_size;
 g_autofree uint64_t *temp = NULL;
 temp = g_new(uint64_t, wps_size);
 ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
@@ -364,7 +365,17 @@ static inline int qcow2_refresh_zonedmeta(BlockDriverState 
*bs)
 return ret;
 }
 
+g_autofree uint8_t *zded = NULL;
+zded = g_try_malloc0(s->zded_size);
+ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size,
+ s->zded_size, zded, 0);
+if (ret < 0) {
+error_report("Can not read zded");
+return ret;
+}
+
 memcpy(bs->wps->wp, temp, wps_size);
+memcpy(bs->zd_extensions, zded, s->zded_size);
 return 0;
 }
 
@@ -390,6 +401,19 @@ qcow2_check_zone_options(Qcow2ZonedHeaderExtension 
*zone_opt)
 return false;
 }
 
+if (zone_opt->zd_extension_size) {
+if (zone_opt->zd_extension_size & 0x3f) {
+error_report("zone descriptor extension size must be a "
+ "multiple of 64B");
+return false;
+}
+
+if ((zone_opt->zd_extension_size >> 6) > 0xff) {
+error_report("Zone descriptor extension size is too large");
+return false;
+}
+}
+
 if (zone_opt->max_active_zones > zone_opt->nr_zones) {
 error_report("Max_active_zones %" PRIu32 " exceeds "
  "nr_zones %" PRIu32". Set it to nr_zones.",
@@ -676,6 +700,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 zoned_ext.conventional_zones =
 be32_to_cpu(zoned_ext.conventional_zones);
 zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.zd_extension_size =
+be32_to_cpu(zoned_ext.zd_extension_size);
 zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
 zoned_ext.max_active_zones =
 be32_to_cpu(zoned_ext.max_active_zones);
@@ -686,7 +712,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
 s->zoned_header = zoned_ext;
 bs->wps = g_malloc(sizeof(BlockZoneWps)
-+ s->zoned_header.zonedmeta_size);
++ zoned_ext.zonedmeta_size - s->zded_size);
+bs->zd_extensions = g_malloc0(s->zded_size);
 ret = qcow2_refresh_zonedmeta(bs);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "zonedmeta: "
@@ -2264,6 +2291,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.zone_size = s->zoned_header.zone_size;
 bs->bl.zone_capacity = s->zoned_header.zone_capacity;
 bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 }
 
 static int GRAPH_UNLOCKED
@@ -3534,6 +3562,8 @@ int qcow2_update_header(BlockDriverState *bs)
 .conventional_zones =
 cpu_to_be32(s->zoned_header.conventional_zones),
 .nr_zones   = cpu_to_be32(s->zoned_header.nr_zones),
+.zd_extension_size  =
+cpu_to_be32(s->zoned_header.zd_extension_size),
 .max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
 .max_active_zones   =
 cpu_to_be32(s->zoned_header.max_active_zones),
@@ -4287,6 +4317,15 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
 }
 s->zoned_header.max_append_bytes = zone_host_managed->max_append_bytes;
 
+uint64_t zded_size = 0;
+if (zone_host_managed->has_descriptor_extension_size) {
+s->zoned_header.zd_extension_size =
+zone_host_managed->descriptor_extension_size;
+zded_size = s->zoned_header.zd_extension_size *
+bs->bl.nr_zone

[RFC v2 0/7] Add persistence to NVMe ZNS emulation

2023-11-27 Thread Sam Li
ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This series makes the
metadata of ZNS emulation persistent by using new block layer APIs and
a qcow2 image as the backing file. It is the second part, following the
patches that add full zoned storage emulation to the qcow2 driver:
https://patchwork.kernel.org/project/qemu-devel/cover/20231127043703.49489-1-faithilike...@gmail.com/

The metadata of ZNS emulation is divided into two parts: zone metadata and
zone descriptor extension data. The zone metadata is composed of the zone
state, zone type, wp and zone attributes. This information is packed into
one uint64_t per zone to save space and allow easy access. The layout of
each zone's wp entry is as follows:
|(4)| zone type (1)| zone attr (8)| wp (51) ||
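
As a rough illustration of the layout above (a sketch, not code from this series), the fields can be packed and unpacked as shown below. It assumes the fields are listed from the most significant bit down, so that the zone type lands on bit 59. The helper names (zns_pack, zns_wp, ...) are made up for this example, and the unlabeled leading 4-bit field is treated as an opaque value called top4.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, MSB to LSB: 4 bits | zone type (1) | zone attr (8) | wp (51). */
#define ZNS_WP_BITS     51
#define ZNS_ATTR_SHIFT  ZNS_WP_BITS            /* attr occupies bits 51..58 */
#define ZNS_TYPE_SHIFT  (ZNS_ATTR_SHIFT + 8)   /* type occupies bit 59 */
#define ZNS_TOP4_SHIFT  (ZNS_TYPE_SHIFT + 1)   /* unlabeled field, bits 60..63 */
#define ZNS_WP_MASK     ((1ULL << ZNS_WP_BITS) - 1)

/* Hypothetical helper: pack the four fields into one 64-bit entry. */
static inline uint64_t zns_pack(uint8_t top4, uint8_t type, uint8_t attr,
                                uint64_t wp)
{
    assert(top4 < 16 && type < 2 && wp <= ZNS_WP_MASK);
    return ((uint64_t)top4 << ZNS_TOP4_SHIFT) |
           ((uint64_t)type << ZNS_TYPE_SHIFT) |
           ((uint64_t)attr << ZNS_ATTR_SHIFT) |
           wp;
}

/* Hypothetical accessors: extract each field again. */
static inline uint64_t zns_wp(uint64_t e)   { return e & ZNS_WP_MASK; }
static inline uint8_t  zns_attr(uint64_t e) { return (e >> ZNS_ATTR_SHIFT) & 0xff; }
static inline uint8_t  zns_type(uint64_t e) { return (e >> ZNS_TYPE_SHIFT) & 0x1; }
static inline uint8_t  zns_top4(uint64_t e) { return (e >> ZNS_TOP4_SHIFT) & 0xf; }
```

Using bitwise OR on disjoint bit ranges keeps the fields independent: updating the wp never disturbs the type or attribute bits.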

The zone descriptor extension data is relatively small compared to the
overall size, so we store the zded of all zones in an array regardless of
whether the valid bit is set.

Creating a ZNS-format qcow2 image file adds one more option, zd_extension_size,
to the zoned device configuration.
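
For reference, the constraints that the qcow2 patch in this series places on zd_extension_size (a multiple of 64 bytes, and at most 0xff units of 64 bytes) can be sketched as a standalone check. The function name below is hypothetical; only the two conditions come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the zd_extension_size checks:
 * 0 means no descriptor extension, otherwise the size must be a
 * multiple of 64 bytes and must fit in 0xff units of 64 bytes.
 */
static bool zd_extension_size_is_valid(uint32_t size)
{
    if (size == 0) {
        return true;            /* extension not in use */
    }
    if (size & 0x3f) {
        return false;           /* not a multiple of 64B */
    }
    if ((size >> 6) > 0xff) {
        return false;           /* more than 255 * 64B */
    }
    return true;
}
```

Under these rules the largest accepted value is 255 * 64 = 16320 bytes.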

To attach this file as an emulated ZNS drive on the QEMU command line, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx

v1->v2:
- split [v1 2/5] patch to three (doc, config, block layer API)
- adapt qcow2 v6

Sam Li (7):
  docs/qcow2: add zd_extension_size option to the zoned format feature
  qcow2: add zd_extension configurations to zoned metadata
  hw/nvme: use blk_get_*() to access zone info in the block layer
  hw/nvme: add blk_get_zone_extension to access zd_extensions
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append write using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c |   88 ++
 block/qcow2.c |  119 ++-
 block/qcow2.h |2 +
 docs/interop/qcow2.txt|3 +
 hw/nvme/ctrl.c| 1247 -
 hw/nvme/ns.c  |  162 +---
 hw/nvme/nvme.h|   95 +--
 include/block/block-common.h  |9 +
 include/block/block_int-common.h  |8 +
 include/sysemu/block-backend-io.h |   11 +
 include/sysemu/dma.h  |3 +
 qapi/block-core.json  |4 +
 system/dma-helpers.c  |   17 +
 13 files changed, 647 insertions(+), 1121 deletions(-)

-- 
2.40.1




[PATCH v6 0/4] Add full zoned storage emulation to qcow2 driver

2023-11-26 Thread Sam Li
This patch series adds a new extension - zoned format - to the
qcow2 driver, thereby allowing full zoned storage emulation on
a qcow2 image file. Users can attach such a qcow2 file to the
guest as a zoned device.

Write pointers are preserved in the zoned metadata and are
recovered after a power cycle. Meanwhile, any open (implicit or
explicit) zone will show up as closed.

Zone states are kept in memory. Read-only and offline states are
device-internal events, which are not considered in the qcow2
emulation for simplicity. The other zone states
(closed, empty, full) can be inferred from write pointer
values, which persist across QEMU reboots. The open states are
kept in memory using open zone lists.
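
A minimal sketch of how the persistent states can be inferred from a write pointer, loosely following qcow2_get_zone_state() in patch 3/4. The enum and function names are invented for this example, and it assumes wp is a byte offset that is never below the start of its zone.

```c
#include <stdint.h>

/* Hypothetical enum covering only the states that survive a restart. */
typedef enum {
    ZONE_COND_EMPTY,
    ZONE_COND_CLOSED,
    ZONE_COND_FULL,
} ZoneCond;

/*
 * Infer the condition of zone `index` from its write pointer.
 * Assumes wp >= index * zone_size (all values in bytes).
 */
static ZoneCond zone_cond_from_wp(uint64_t wp, uint32_t index,
                                  uint64_t zone_size, uint64_t zone_capacity)
{
    uint64_t start = (uint64_t)index * zone_size;

    if (wp == start) {
        return ZONE_COND_EMPTY;     /* nothing written yet */
    }
    if (wp >= start + zone_capacity) {
        return ZONE_COND_FULL;      /* capacity exhausted */
    }
    return ZONE_COND_CLOSED;        /* partially written, not open */
}
```

A zone that was open before the restart is partially written (or empty), so after reopening the image it is reported as closed (or empty), which matches the behaviour described above.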

The zoned extension feature is optional. We only set it to host-managed
when emulating a zoned device. Non-zoned devices do not need to set
this option.

To create a qcow2 image with the zoned format feature, use a command like
this:
$path/to/qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
-o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
-o zone.max_append_bytes=4096 -o zone.max_open_zones=10 \
-o zone.max_active_zones=12 -o zone.mode=host-managed


Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1

v5->v6:
- fix docs and specs [Eric, Markus, Stefan]
- add general sanity checks for zoned device configurations on creation and
  opening [Eric]
- fix LRU handling when a zone stays implicitly open for a long time [Stefan]

v4->v5:
- add incompatible bit for zoned format [Eric]
- fix and manage zone resources via LRU [Damien]
- renaming functions and fields, spec changes [Markus, Damien]
- add closed zone list
- make qemu iotests for zoned device consecutive [Stefan]

v3->v4:
- use QLIST for implicit, explicit open zones management [Stefan]
- keep zone states in memory and drop state bits in wp metadata structure 
[Damien, Stefan]
- change zone resource management and iotests accordingly
- add tracing for number of implicit zones
- address review comments [Stefan, Markus]:
  * documentation, config, style

v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_meta allocation size
  * use bitwise OR rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 972 ++-
 block/qcow2.h|  36 +-
 block/trace-events   |   2 +
 docs/interop/qcow2.txt   |  99 ++-
 docs/system/qemu-block-drivers.rst.inc   |  35 +
 include/block/block_int-common.h |  13 +
 include/qemu/queue.h |   1 +
 qapi/block-core.json |  63 +-
 tests/qemu-iotests/tests/zoned-qcow2 | 126 +++
 tests/qemu-iotests/tests/zoned-qcow2.out | 118 +++
 10 files changed, 1460 insertions(+), 5 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




[PATCH v6 4/4] iotests: test the zoned format feature for qcow2 file

2023-11-26 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 126 +++
 tests/qemu-iotests/tests/zoned-qcow2.out | 118 +
 2 files changed, 244 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 00..d7141a35aa
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone.size=64M -o \
+zone.capacity=64M -o zone.conventional_zones=0 -o zone.max_append_bytes=131072 \
+-o zone.max_open_zones=10 -o zone.max_active_zones=12 -o zone.mode=host-managed
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test zone operations one by one"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report zones[-1]:"  # zones[-1] dictates the last zone
+$QEMU_IO $IMG -c "zrp 0x2C00 2" # 0x2C00 / 512 = 0x16
+echo
+echo
+echo "(2) open zones[0], zones[1], zones[-1] then close, finish, reset:"
+$QEMU_IO $IMG << EOF
+zo 0 0x400 # 0x400 / 512 = 0x2
+zrp 0 1
+zo 0x400 0x400
+zrp 0x400 1
+zo 0x2C00 0x400
+zrp 0x2C00 2
+zc 0 0x400
+zrp 0 1
+zc 0x2C00 0x400
+zrp 0x2C00 2
+zf 0 0x400
+zrp 0 1
+zf 64M 64M
+zrp 0x400 2
+zf 0x2C00 0x400
+zrp 0x2C00 2
+zrs 0 0x400
+zrp 0 1
+zrs 0x400 0x400
+zrp 0x400 1
+zrs 0x2C00 0x400
+zrp 0x2C00 2
+EOF
+
+echo
+echo "(3) append write with (4k, 8k) data"
+$QEMU_IO $IMG -c "zrp 0 12" # the physical block size of the device is 4096
+echo "Append write zones[0], zones[1] twice"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0x400 0x1000 0x2000
+zrp 0x400 1
+zap -p 0x400 0x1000 0x2000
+zrp 0x400 1
+EOF
+
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M" -c "zrp 0 12"
+echo
+echo
+
+echo "case 2: test a sets of ops that works or not"
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000 # wrote (4k, 4k):
+zrp 0 1
+zap -p 0 0x1000 0x3ffd000
+zrp 0 1
+EOF
+
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M" -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zap -p 0xc00 0x1000 0x1000
+zap -p 0x2000 0x1000 0x1000
+zrp 0 12
+zrs 0 768M
+zrp 0 12
+EOF
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 00..3b30ef545b
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2.out
@@ -0,0 +1,118 @@
+QA output created by zoned-qcow2
+
+=== Initial image setup ===
+
+Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib zone.mode=host-managed zone.size=67108864 zone.capacity=67108864 zone.conventional_zones=0 zone.max_append_bytes=131072 zone.max_active_zones=12 zone.max_open_zones=10 size=805306368 lazy_refcounts=off refcount_bits=16
+
+=== Testing a qcow2 img with zoned format ===
+
+case 1: test zone operations one by one
+(1) report zones[0]:
+start: 0x0, len 0x2, cap 0x2, wptr 0x0, zcond:1, [type: 2]
+
+report zones[0~9]:
+start: 0x0, len 0x2, cap 0x2, wptr 0x0, zcond:1, [type: 2]
+start: 0x2, len 0x2, cap 0x2, wptr 0x2, zcond:1, [type: 2]
+start: 0x4, len 0x2, cap 0x2, wptr 0x4, zcond:1, [type: 2]
+start: 0x6, len 0x2, cap 0x2, wptr 0x6, zcond:1, [type: 2]
+start: 0x8, len 0x2, ca

[PATCH v6 2/4] qcow2: add configurations for zoned format extension

2023-11-26 Thread Sam Li
Configuring the zoned format feature of the qcow2 driver requires the
following settings: the device size, zone model, zone size, zone
capacity, number of conventional zones, and limits on zone
resources (max append bytes, max open zones, and max_active_zones).

To create a qcow2 image with the zoned format feature, use a command like
this:
$path/to/qemu-img create -f qcow2 zbc.qcow2 -o size=768M \
-o zone.size=64M -o zone.capacity=64M -o zone.conventional_zones=0 \
-o zone.max_append_bytes=4096 -o zone.max_open_zones=10 \
-o zone.max_active_zones=12 -o zone.mode=host-managed

Signed-off-by: Sam Li 
---
 block/qcow2.c| 233 ++-
 block/qcow2.h|  36 -
 docs/interop/qcow2.txt   |  99 -
 include/block/block_int-common.h |  13 ++
 qapi/block-core.json |  63 -
 5 files changed, 440 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 13e032bd5e..9a92cd242c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -194,6 +195,55 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+/*
+ * Passing by the zoned device configurations by a zoned_header struct, check
+ * if the zone device options are under constraints. Return false when some
+ * option is invalid
+ */
+static inline bool
+qcow2_check_zone_options(Qcow2ZonedHeaderExtension *zone_opt)
+{
+if (zone_opt) {
+if (zone_opt->zone_size == 0) {
+error_report("Zoned extension header zone_size field "
+ "can not be 0");
+return false;
+}
+
+if (zone_opt->zone_capacity > zone_opt->zone_size) {
+error_report("zone capacity %" PRIu32 "B exceeds zone size "
+ "%" PRIu32"B", zone_opt->zone_capacity,
+ zone_opt->zone_size);
+return false;
+}
+
+if (zone_opt->max_active_zones > zone_opt->nr_zones) {
+error_report("Max_active_zones %" PRIu32 " exceeds "
+ "nr_zones %" PRIu32". Set it to nr_zones.",
+ zone_opt->max_active_zones, zone_opt->nr_zones);
+zone_opt->max_active_zones = zone_opt->nr_zones;
+}
+
+if (zone_opt->max_open_zones > zone_opt->max_active_zones) {
+error_report("Max_open_zones %" PRIu32 " exceeds "
+ "max_active_zones %" PRIu32". Set it to "
+ "max_active_zones.",
+ zone_opt->max_open_zones,
+ zone_opt->max_active_zones);
+zone_opt->max_open_zones = zone_opt->max_active_zones;
+}
+
+if (zone_opt->max_open_zones > zone_opt->nr_zones) {
+error_report("Max_open_zones field can not be larger "
+ "than the number of zones. Set it to nr_zones.");
+zone_opt->max_open_zones = zone_opt->nr_zones;
+}
+
+return true;
+}
+return false;
+}
+
 /*
  * read qcow2 extension and fill bs
  * start reading from start_offset
@@ -211,6 +261,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -432,6 +483,51 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len < sizeof(zoned_ext)) {
+/* Missing fields */
+error_setg(errp, "zoned_ext: len=%" PRIu32 " too small "
+   "(<%zu)", ext.len, sizeof(zoned_ext));
+return -EINVAL;
+}
+ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+return ret;
+}
+
+zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.conventional_zones =
+be32_to_cpu(zoned_ext

[PATCH v6 3/4] qcow2: add zoned emulation capability

2023-11-26 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation of a zoned device using
a qcow2 file. The zoned device metadata includes the zone type,
zoned device state and write pointer of each zone, which are stored
in an array of unsigned integers.

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states: IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states are generally caused by device-internal
events. Operations on zones cause the corresponding state
changes.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones. These limits are managed with active zone
lists following an LRU policy.
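
The limit checks can be sketched roughly as below, mirroring the counters used by qcow2_can_activate_zone() and qcow2_can_open_zone() in the diff. The struct and function names are hypothetical, and the LRU eviction step itself is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; not the struct used by the patch. */
typedef struct {
    uint32_t max_open_zones;    /* 0 means no limit */
    uint32_t max_active_zones;  /* 0 means no limit */
    uint32_t nr_exp_open;       /* explicitly open zones */
    uint32_t nr_imp_open;       /* implicitly open zones */
    uint32_t nr_closed;         /* closed zones */
} ZoneLimits;

/* Active zones are the zones in an open or closed state. */
static bool zone_can_activate(const ZoneLimits *z)
{
    if (z->max_active_zones == 0) {
        return true;
    }
    return z->nr_exp_open + z->nr_imp_open + z->nr_closed
           < z->max_active_zones;
}

/* True when one more zone can be opened without closing another first. */
static bool zone_has_free_open_slot(const ZoneLimits *z)
{
    if (z->max_open_zones == 0) {
        return true;
    }
    return z->nr_exp_open + z->nr_imp_open < z->max_open_zones;
}
```

When no open slot is free but the active limit still has room, the patch closes one implicitly open zone (the least recently used one) to make space. That step needs the zone lists and is not shown here.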

Signed-off-by: Sam Li 
---
 block/qcow2.c| 741 ++-
 block/trace-events   |   2 +
 include/qemu/queue.h |   1 +
 3 files changed, 742 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9a92cd242c..26f2bb4a87 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -195,6 +195,179 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
+
+/*
+ * To emulate a real zoned device, closed, empty and full states are
+ * preserved after a power cycle. Open states are in-memory and will
+ * be lost after closing the device. Read-only and offline states are
+ * device-internal events, which are not considered for simplicity.
+ */
+static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
+  uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+uint64_t zone_wp = bs->wps->wp[index];
+uint64_t zone_start;
+
+if (QCOW2_ZT_IS_CONV(zone_wp)) {
+return BLK_ZS_NOT_WP;
+}
+
+if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+return BLK_ZS_EOPEN;
+}
+if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+return BLK_ZS_IOPEN;
+}
+
+zone_start = index * bs->bl.zone_size;
+if (zone_wp == zone_start) {
+return BLK_ZS_EMPTY;
+}
+if (zone_wp >= zone_start + bs->bl.zone_capacity) {
+return BLK_ZS_FULL;
+}
+if (zone_wp > zone_start) {
+return BLK_ZS_CLOSED;
+}
+return BLK_ZS_NOT_WP;
+}
+
+/*
+ * Write the new wp value to the dedicated location of the image file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index) {
+BDRVQcow2State *s = bs->opaque;
+uint64_t wpv = *wp;
+int ret;
+
+ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+if (ret < 0) {
+goto exit;
+}
+trace_qcow2_wp_tracking(index, *wp >> BDRV_SECTOR_BITS);
+return ret;
+
+exit:
+*wp = wpv;
+error_report("Failed to write metadata with file");
+return ret;
+}
+
+static bool qcow2_can_activate_zone(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+/* When the max active zone is zero, there is no limit on active zones */
+if (!s->zoned_header.max_active_zones) {
+return true;
+}
+
+/* The active zones are zones with the states of open and closed */
+if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+< s->zoned_header.max_active_zones) {
+return true;
+}
+
+return false;
+}
+
+/*
+ * This function manages open zones under active zones limit. It checks
+ * if a zone can transition to open state while maintaining max open and
+ * active zone limits.
+ */
+static bool qcow2_can_open_zone(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2ZoneListEntry *zone_entry;
+
+/* When the max open zone is zero, there is no limit on open zones */
+if (!s->zoned_header.max_open_zones) {
+return true;
+}
+
+/*
+ * The open zones are zones with the states of explicitly and
+ * implicitly open.
+ */
+if (s->nr_zones_imp_open + s->nr_zones_exp_open <
+s->zoned_header.max_open_zones) {
+return true;
+}
+
+/*
+ * Zones are managed once at a time. Thus, the number of implicitly open
+ * zone can never be over the open zone limit. When the active zone limit
+ * is not reached, close only one implicitly open zone.
+ */
+if (qcow2_can_activate_zone(bs)) {
+/*
+ * The LRU policy is used for handling active zone lists. When
+ * removing a random zone entry, we discard the least recently used
+ * list item. The list item at the last is the least recently used
+ * one. The zone list

[PATCH v6 1/4] docs/qcow2: add the zoned format feature

2023-11-26 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
Given the zoned information, a qcow2 file can then emulate a real zoned
device, exposed to the guest either through a virtio-blk device or as an
NVMe ZNS drive.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 docs/system/qemu-block-drivers.rst.inc | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..955fea271e 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,41 @@ This section describes each format and the options that are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zone.mode
+If this is set to ``host-managed``, the image is an emulated zoned
+block device. This option is only valid to emulated zoned device files.
+
+  .. option:: zone.size
+
+The size of a zone in bytes. The device is divided into zones of this
+size with the exception of the last zone, which may be smaller.
+
+  .. option:: zone.capacity
+
+The initial capacity value, in bytes, for all zones. The capacity must
+be less than or equal to zone size. If the last zone is smaller, then
+its capacity is capped.
+
+The zone capacity is per zone and may be different between zones in real
+devices. QCow2 sets all zones to the same capacity.
+
+  .. option:: zone.conventional_zones
+
+The number of conventional zones of the zoned device.
+
+  .. option:: zone.max_open_zones
+
+The maximal allowed open zones.
+
+  .. option:: zone.max_active_zones
+
+The limit of the zones with implicit open, explicit open or closed state.
+
+  .. option:: zone.max_append_bytes
+
+The number of bytes in a zone append request that can be issued to the
+device. It must be 512-byte aligned.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension

2023-11-16 Thread Sam Li
Hi Eric,

Eric Blake wrote on Mon, Oct 30, 2023 at 22:53:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append bytes, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> > max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=host-managed
> >
> > Signed-off-by: Sam Li 
> >
> > fix config?
>
> Is this comment supposed to be part of the commit message?  If not,...
>
> > ---
>
> ...place it here under the divider, so 'git am' won't include it, if there is 
> nothing further to change on this patch.
>
> >  block/qcow2.c| 205 ++-
> >  block/qcow2.h|  37 +-
> >  docs/interop/qcow2.txt   |  67 +-
> >  include/block/block_int-common.h |  13 ++
> >  qapi/block-core.json |  45 ++-
> >  5 files changed, 362 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index aa01d9e7b5..cd53268ca7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> >  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> >  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> >  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
> >
> >  static int coroutine_fn
> >  qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  uint64_t offset;
> >  int ret;
> >  Qcow2BitmapHeaderExt bitmaps_ext;
> > +Qcow2ZonedHeaderExtension zoned_ext;
> >
> >  if (need_update_header != NULL) {
> >  *need_update_header = false;
> > @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  break;
> >  }
> >
> > +case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > +{
> > +if (ext.len != sizeof(zoned_ext)) {
> > +error_setg(errp, "zoned_ext: Invalid extension length");
> > +return -EINVAL;
> > +}
>
> Do we ever anticipate the struct growing in size in the future to add
> further features?  Forcing the size to be constant, rather than a
> minimum, may get in the way of clean upgrades to a future version of
> the extension header.
>
> > +ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > +if (ret < 0) {
> > +error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > +return ret;
> > +}
> > +
> > +if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> > +warn_report("A program lacking zoned format support "
> > +   "may modify this file and zoned metadata are "
> > +   "now considered inconsistent");
> > +error_printf("The zoned metadata is corrupted.\n");
>
> Why is this mixing warn_report and error_printf at the same time.
> Also, grammar is inconsistent from the similar
> QCOW2_AUTOCLEAR_BITMAPS, which used:
>
> if (s->qcow_version < 3) {
> /* Let's be a bit more specific */
> warn_report("This qcow2 v2 image contains bitmaps, but "
> "they may have been modified by a program "
> "without persistent bitmap support; so now "
> "they must all be considered inconsistent");
> } else {
> warn_report("a program lacking bitmap support "
> "modified this file, so all bitmaps are now "
> "considered inconsistent");
>
> This also raises the question whether we want to ever allow zoned
> support with a v2 image, or whether it should just be a hard error if
> it i

Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension

2023-11-16 Thread Sam Li
Markus Armbruster wrote on Fri, Nov 3, 2023 at 17:08:
>
> Eric Blake  writes:
>
> > On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> >> To configure the zoned format feature on the qcow2 driver, it
> >> requires settings as: the device size, zone model, zone size,
> >> zone capacity, number of conventional zones, limits on zone
> >> resources (max append bytes, max open zones, and max_active_zones).
> >>
> >> To create a qcow2 file with zoned format, use command like this:
> >> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> >> zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> >> max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> >> -o zone_model=host-managed
> >>
> >> Signed-off-by: Sam Li 
> >>
> >> fix config?
> >
> > Is this comment supposed to be part of the commit message?  If not,...
> >
> >> ---
> >
> > ...place it here under the divider, so 'git am' won't include it, if there 
> > is nothing further to change on this patch.
>
> [...]
>
> >> +++ b/qapi/block-core.json
> >> @@ -4981,6 +4981,21 @@
> >>  { 'enum': 'Qcow2CompressionType',
> >>'data': [ 'zlib', { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
> >>
> >> +##
> >> +# @Qcow2ZoneModel:
> >> +#
> >> +# Zoned device model used in qcow2 image file
> >> +#
> >> +# @non-zoned: non-zoned model is for regular block devices
> >> +#
> >> +# @host-managed: host-managed model only allows sequential write over the
> >> +# device zones
> >> +#
> >> +# Since 8.2
> >> +##
> >> +{ 'enum': 'Qcow2ZoneModel',
> >> +  'data': ['non-zoned', 'host-managed'] }
> >> +
> >>  ##
> >>  # @BlockdevCreateOptionsQcow2:
> >>  #
> >> @@ -5023,6 +5038,27 @@
> >>  # @compression-type: The image cluster compression method
> >>  # (default: zlib, since 5.1)
> >>  #
> >> +# @zone-model: @Qcow2ZoneModel.  The zone device model.
> >> +# (default: non-zoned, since 8.2)
> >> +#
> >> +# @zone-size: Total number of bytes within zones (since 8.2)
> >
> > If @zone-model is "non-zoned", does it make sense to even allow
> > @zone-size and friends?  Should this use a QMP union, where you can
> > pass in the remaining zone-* fields only when zone-model is set to
> > host-managed?
>
> Valid question; needs an answer.

Yes, it should use a QMP union. It's better to separate those fields
for zoned and non-zoned.

>
> >> +#
> >> +# @zone-capacity: The number of usable logical blocks within zones
> >> +# in bytes.  A zone capacity is always smaller or equal to the
> >> +# zone size (since 8.2)
> >> +#
> >> +# @conventional-zones: The number of conventional zones of the
> >> +# zoned device (since 8.2)
> >> +#
> >> +# @max-open-zones: The maximal number of open zones (since 8.2)
> >> +#
> >> +# @max-active-zones: The maximal number of zones in the implicit
> >> +# open, explicit open or closed state (since 8.2)
> >> +#
> >> +# @max-append-bytes: The maximal number of bytes of a zone
> >> +# append request that can be issued to the device.  It must be
> >> +# 512-byte aligned (since 8.2)
> >> +#
> >>  # Since: 2.12
> >>  ##
> >>  { 'struct': 'BlockdevCreateOptionsQcow2',
> >> @@ -5039,7 +5075,14 @@
> >>  '*preallocation':   'PreallocMode',
> >>  '*lazy-refcounts':  'bool',
> >>  '*refcount-bits':   'int',
> >> -'*compression-type':'Qcow2CompressionType' } }
> >> +'*compression-type':'Qcow2CompressionType',
> >> +'*zone-model': 'Qcow2ZoneModel',
> >> +'*zone-size':  'size',
> >> +'*zone-capacity':  'size',
> >> +'*conventional-zones': 'uint32',
> >> +'*max-open-zones': 'uint32',
> >> +'*max-active-zones':   'uint32',
> >> +'*max-append-bytes':   'uint32' } }
> >
> > In other words, I'm envisioning something like an optional
> > '*zone':'ZoneStruct', where:
> >
> > { 'struct': 'ZoneHostManaged',
> >   'data': { 'size': 'size', '*capacity': 'size', ..., '*max-append-bytes': 
> > 'uint32' } }
> > { 'union': 'ZoneStruct',
> >   'base': { 'model': 'Qcow2ZoneModel' },
> >   'discriminator': 'model',
> >   'data': { 'non-zoned': {},
> > 'host-managed': 'ZoneHostManaged' } }
> >
> > then over the wire, QMP can use the existing:
> > { ..., "compression-type":"zstd" }
> >
> > as a synonym for the new but explicit non-zoned:
> > { ..., "compression-type":"zstd", "zone":{"mode":"non-zoned"} }
>
> I.e. @zone is optional, and defaults to {"mode": "non-zoned"}.
>
> > and when we want to use zones, we pass:
> > { ..., "compression-type":"zstd", "zone":{"mode":"host-managed", 
> > "size":16777216} }
> >
> > where you don't have to have zone- prefixing everywhere because it is
> > instead contained in the smart union object where it is obvious from
> > the 'mode' field what other fields should be present.
>

Yes, it's better. Thanks!

Sam



Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension

2023-11-16 Thread Sam Li
Stefan Hajnoczi wrote on Fri, Nov 3, 2023 at 11:24:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > +typedef struct Qcow2ZoneListEntry {
> > +QLIST_ENTRY(Qcow2ZoneListEntry) exp_open_zone_entry;
> > +QLIST_ENTRY(Qcow2ZoneListEntry) imp_open_zone_entry;
> > +QLIST_ENTRY(Qcow2ZoneListEntry) closed_zone_entry;
>
> Where is closed_zone_entry used?

When the number of implicitly open zones reaches the maximum and
one implicitly open zone is closed to make room, the closed zone is
added to the closed zone list via closed_zone_entry. (This will be
in the next version.)



Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller

2023-10-31 Thread Sam Li
Looks good, thanks!

Hanna Czenczek wrote on Tue, Oct 31, 2023 at 17:24:

> On 25.08.23 06:05, Sam Li wrote:
>
> When a zoned request fails, only the wp of the target zones
> needs to be updated, so as not to disrupt the in-flight writes
> on the other zones. The wp is updated successfully after the
> request completes.
>
> Fixed the callers with right offset and nr_zones.
>
> Signed-off-by: Sam Li  
> ---
>  block/file-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
>
> Thanks, applied to my block branch:
>
> https://gitlab.com/hreitz/qemu/-/commits/block
>
> (Rebased on master, and I’ve also fixed the comment to read “boundaries”
> instead of “bounaries”.  Hope that’s OK!)
>
>
> Hanna
>


Re: [PATCH v5 2/4] qcow2: add configurations for zoned format extension

2023-10-30 Thread Sam Li
Hi Eric,

Eric Blake wrote on Mon, Oct 30, 2023 at 22:53:
>
> On Mon, Oct 30, 2023 at 08:18:45PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append bytes, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
> > max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=host-managed
> >
> > Signed-off-by: Sam Li 
> >
> > fix config?
>
> Is this comment supposed to be part of the commit message?  If not,...

No...

>
> > ---
>
> ...place it here under the divider, so 'git am' won't include it, if there is 
> nothing further to change on this patch.
>
> >  block/qcow2.c| 205 ++-
> >  block/qcow2.h|  37 +-
> >  docs/interop/qcow2.txt   |  67 +-
> >  include/block/block_int-common.h |  13 ++
> >  qapi/block-core.json |  45 ++-
> >  5 files changed, 362 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index aa01d9e7b5..cd53268ca7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> >  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> >  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> >  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
> >
> >  static int coroutine_fn
> >  qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  uint64_t offset;
> >  int ret;
> >  Qcow2BitmapHeaderExt bitmaps_ext;
> > +Qcow2ZonedHeaderExtension zoned_ext;
> >
> >  if (need_update_header != NULL) {
> >  *need_update_header = false;
> > @@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  break;
> >  }
> >
> > +case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > +{
> > +if (ext.len != sizeof(zoned_ext)) {
> > +error_setg(errp, "zoned_ext: Invalid extension length");
> > +return -EINVAL;
> > +}
>
> Do we ever anticipate the struct growing in size in the future to add
> further features?  Forcing the size to be constant, rather than a
> minimum, may get in the way of clean upgrades to a future version of
> the extension header.

The zoned extension could grow. So ext.len > sizeof(zoned_ext) -> invalid.
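
For reference, a minimal sketch of the "minimum size" check suggested above, so that a longer extension written by a newer version is still accepted. The two-field ZonedExtV1 layout and the helper names are hypothetical and are not the layout from this patch:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1 layout: only the fields this reader knows about. */
#define ZONED_EXT_V1_LEN 8u

typedef struct {
    uint32_t zone_size;
    uint32_t zone_capacity;
} ZonedExtV1;

/* Decode a big-endian 32-bit value from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Treat the known size as a minimum: a longer extension written by a
 * newer version is accepted and its trailing bytes are ignored.
 */
static int zoned_ext_parse(const uint8_t *buf, size_t len, ZonedExtV1 *out)
{
    if (len < ZONED_EXT_V1_LEN) {
        return -EINVAL; /* too short to hold the fields we need */
    }
    out->zone_size = load_be32(buf);
    out->zone_capacity = load_be32(buf + 4);
    return 0;
}
```

With a check like this, only a too-short extension is rejected. Whether unknown trailing fields are safe to ignore is a separate question that the spec would have to answer.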

>
> > +ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > +if (ret < 0) {
> > +error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > +return ret;
> > +}
> > +
> > +if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
> > +warn_report("A program lacking zoned format support "
> > +   "may modify this file and zoned metadata are "
> > +   "now considered inconsistent");
> > +error_printf("The zoned metadata is corrupted.\n");
>
> Why is this mixing warn_report and error_printf at the same time?
> Also, grammar is inconsistent from the similar
> QCOW2_AUTOCLEAR_BITMAPS, which used:
>
> if (s->qcow_version < 3) {
> /* Let's be a bit more specific */
> warn_report("This qcow2 v2 image contains bitmaps, but "
> "they may have been modified by a program "
> "without persistent bitmap support; so now "
> "they must all be considered inconsistent");
> } else {
> warn_report("a program lacking bitmap support "
> "modified this file, so all bitmaps are now "
> "considered inconsistent");
>
> This also raises the question whether we want to eve

Re: [PATCH v5 1/4] docs/qcow2: add the zoned format feature

2023-10-30 Thread Sam Li
Eric Blake wrote on Mon, Oct 30, 2023 at 22:05:
>
> On Mon, Oct 30, 2023 at 08:18:44PM +0800, Sam Li wrote:
> > Add the specs for the zoned format feature of the qcow2 driver.
> > The qcow2 file can be taken as zoned device and passed through by
> > virtio-blk device or NVMe ZNS device to the guest given zoned
> > information.
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >  docs/system/qemu-block-drivers.rst.inc | 33 ++
> >  1 file changed, 33 insertions(+)
> >
> > diff --git a/docs/system/qemu-block-drivers.rst.inc 
> > b/docs/system/qemu-block-drivers.rst.inc
> > index 105cb9679c..4647c5fa29 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -172,6 +172,39 @@ This section describes each format and the options 
> > that are supported for it.
> >  filename`` to check if the NOCOW flag is set or not (Capital 'C' is
> >  NOCOW flag).
> >
> > +  .. option:: zoned
> > +1 for host-managed zoned device and 0 for a non-zoned device.
>
> Should this be a bool or enum type, instead of requiring the user to
> know magic numbers?  Is there a potential to add yet another type in
> the future?

My mistake, sorry. I forgot to document this change, but the configuration
in the subsequent patch uses an enum type.

>
> > +
> > +  .. option:: zone_size
> > +
> > +The size of a zone in bytes. The device is divided into zones of this
> > +size with the exception of the last zone, which may be smaller.
> > +
> > +  .. option:: zone_capacity
> > +
> > +The initial capacity value, in bytes, for all zones. The capacity must
> > +be less than or equal to zone size. If the last zone is smaller, then
> > +its capacity is capped.
> > +
> > +The zone capacity is per zone and may be different between zones in 
> > real
> > +devices. For simplicity, QCow2 sets all zones to the same capacity.
>
> Just making sure I understand: One possible setup would be to describe
> a block device with zones of size 1024M but with capacity 1000M (that
> is, the zone reserves 24M capacity for other purposes)?

Yes, it is. The NVMe ZNS drive allows that.
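
A small sketch of how size and capacity relate per zone, including the shorter last zone mentioned in the documentation above. The helper names are hypothetical, and capping the last zone's capacity at its length is an assumption about what "capped" means:

```c
#include <stdint.h>

typedef struct {
    uint64_t start;    /* first byte of the zone */
    uint64_t length;   /* zone length in bytes (last zone may be shorter) */
    uint64_t capacity; /* writable bytes, never more than length */
} ZoneLayout;

/* Number of zones, rounding up so a short trailing zone is counted. */
static uint32_t nr_zones(uint64_t dev_size, uint64_t zone_size)
{
    return (uint32_t)((dev_size + zone_size - 1) / zone_size);
}

/* Layout of zone `index`; assumes index < nr_zones(dev_size, zone_size). */
static ZoneLayout zone_layout(uint64_t dev_size, uint64_t zone_size,
                              uint64_t zone_capacity, uint32_t index)
{
    ZoneLayout z;

    z.start = (uint64_t)index * zone_size;
    z.length = dev_size - z.start < zone_size ? dev_size - z.start : zone_size;
    z.capacity = zone_capacity < z.length ? zone_capacity : z.length;
    return z;
}
```

For the 1024M/1000M example, each full zone would have 1000M of writable space starting at a multiple of 1024M, and the remaining 24M of the zone is never written.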

>
> Otherwise, I'm having a hard time seeing when you would ever set a
> capacity different from size.
>
> Are there requirements that one (or both) of these values must be
> powers of 2?  Or is the requirement merely that they must be a
> multiple of 512 bytes (because sub-sector operations are not
> permitted)?  Is there any implicit requirement based on qcow2
> implementation that a zone size/capacity must be a multiple of cluster
> size (other than possibly for the last zone)?

Yes. Linux will only expose zoned devices that have a zone size
that is a power of 2 number of LBAs.

No, the zone size/capacity is not necessarily a multiple of the cluster size.
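
As a rough sketch, the constraints mentioned in this exchange could be checked like this. The function names are hypothetical, and the 512-byte LBA size is an assumption made for this example only:

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed LBA size for this sketch */

/* True when exactly one bit is set. */
static bool is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Zone size must be a power-of-2 number of LBAs (the Linux requirement
 * mentioned above) and capacity must not exceed the zone size.
 */
static bool zone_geometry_ok(uint64_t zone_size, uint64_t zone_capacity)
{
    if (zone_size == 0 || zone_size % SECTOR_SIZE != 0) {
        return false;
    }
    if (!is_power_of_2(zone_size / SECTOR_SIZE)) {
        return false;
    }
    return zone_capacity <= zone_size;
}
```

The sketch deliberately does not require the capacity to be a power of 2 or a multiple of the cluster size, matching the answers above.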

>
> > +
> > +  .. option:: zone_nr_conv
> > +
> > +The number of conventional zones of the zoned device.
> > +
> > +  .. option:: max_open_zones
> > +
> > +The maximal allowed open zones.
> > +
> > +  .. option:: max_active_zones
> > +
> > +The limit of the zones with implicit open, explicit open or closed 
> > state.
> > +
> > +  .. option:: max_append_sectors
> > +
> > +The maximal number of 512-byte sectors in a zone append request.
>
> Why is this value in sectors instead of bytes?  I understand that
> drivers may be written with sectors in mind, but any time we mix units
> in the public interface, it gets awkward.  I'd lean towards having
> bytes here, with a requirement that it be a multiple of 512.

Sorry. Same, already changed this in the following patches.

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>



[PATCH v5 3/4] qcow2: add zoned emulation capability

2023-10-30 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation support of zoned devices using
a qcow2 file. The zoned device metadata includes the zone type,
zoned device state and write pointer of each zone, which are stored
in an array of unsigned integers.

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states: IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states will generally be affected by device
internal events. The operations on zones cause the corresponding
state changes.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones. These limits are managed by active zone
lists following an LRU policy.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 731 ++-
 block/trace-events   |   2 +
 include/qemu/queue.h |   1 +
 3 files changed, 732 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index cd53268ca7..b0f9023fd9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,178 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
*fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
+
+/*
+ * To emulate a real zoned device, closed, empty and full states are
+ * preserved after a power cycle. Open states are in-memory and will
+ * be lost after closing the device. Read-only and offline states are
+ * device-internal events, which are not considered for simplicity.
+ */
+static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
+  uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
+uint64_t zone_wp = bs->wps->wp[index];
+uint64_t zone_start;
+
+if (QCOW2_ZT_IS_CONV(zone_wp)) {
+return BLK_ZS_NOT_WP;
+}
+
+if (QLIST_IS_INSERTED(zone_entry, exp_open_zone_entry)) {
+return BLK_ZS_EOPEN;
+}
+if (QLIST_IS_INSERTED(zone_entry, imp_open_zone_entry)) {
+return BLK_ZS_IOPEN;
+}
+
+zone_start = index * bs->bl.zone_size;
+if (zone_wp == zone_start) {
+return BLK_ZS_EMPTY;
+}
+if (zone_wp >= zone_start + bs->bl.zone_capacity) {
+return BLK_ZS_FULL;
+}
+if (zone_wp > zone_start) {
+return BLK_ZS_CLOSED;
+}
+return BLK_ZS_NOT_WP;
+}
+
+/*
+ * Write the new wp value to the dedicated location of the image file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index) {
+BDRVQcow2State *s = bs->opaque;
+uint64_t wpv = *wp;
+int ret;
+
+ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+if (ret < 0) {
+goto exit;
+}
+trace_qcow2_wp_tracking(index, *wp >> BDRV_SECTOR_BITS);
+return ret;
+
+exit:
+*wp = wpv;
+error_report("Failed to write metadata with file");
+return ret;
+}
+
+static bool qcow2_can_activate_zone(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+/* When the max active zone is zero, there is no limit on active zones */
+if (!s->zoned_header.max_active_zones) {
+return true;
+}
+
+/* The active zones are zones with the states of open and closed */
+if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+< s->zoned_header.max_active_zones) {
+return true;
+}
+
+return false;
+}
+
+/*
+ * This function manages open zones under active zones limit. It checks
+ * if a zone can transition to open state while maintaining max open and
+ * active zone limits.
+ */
+static bool qcow2_can_open_zone(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+Qcow2ZoneListEntry *zone_entry;
+
+/* When the max open zone is zero, there is no limit on open zones */
+if (!s->zoned_header.max_open_zones) {
+return true;
+}
+
+/*
+ * The open zones are zones with the states of explicitly and
+ * implicitly open.
+ */
+if (s->nr_zones_imp_open + s->nr_zones_exp_open <
+s->zoned_header.max_open_zones) {
+return true;
+}
+
+/*
+ * Zones are managed once at a time. Thus, the number of implicitly open
+ * zone can never be over the open zone limit. When the active zone limit
+ * is not reached, close only one implicitly open zone.
+ */
+if (qcow2_can_activate_zone(bs)) {
+/*
+ * The LRU policy is used for handling active zone lists. When
+ * removing a random zone entry, we discard the least recently used
+ * list item. The list item at the last is the least recently used
+ * one. The zone list

[PATCH v5 4/4] iotests: test the zoned format feature for qcow2 file

2023-10-30 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 126 +++
 tests/qemu-iotests/tests/zoned-qcow2.out | 118 +
 2 files changed, 244 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 
b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 0000000000..7749329480
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o conventional_zones=0 -o max_append_bytes=131072 \
+-o max_open_zones=0 -o max_active_zones=0 -o zone_model=host-managed
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test zone operations one by one"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report zones[-1]:"  # zones[-1] dictates the last zone
+$QEMU_IO $IMG -c "zrp 0x2C000000 2" # 0x2C000000 / 512 = 0x160000
+echo
+echo
+echo "(2) open zones[0], zones[1], zones[-1] then close, finish, reset:"
+$QEMU_IO $IMG << EOF
+zo 0 0x4000000 # 0x4000000 / 512 = 0x20000
+zrp 0 1
+zo 0x4000000 0x4000000
+zrp 0x4000000 1
+zo 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zc 0 0x4000000
+zrp 0 1
+zc 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zf 0 0x4000000
+zrp 0 1
+zf 64M 64M
+zrp 0x4000000 2
+zf 0x2C000000 0x4000000
+zrp 0x2C000000 2
+zrs 0 0x4000000
+zrp 0 1
+zrs 0x4000000 0x4000000
+zrp 0x4000000 1
+zrs 0x2C000000 0x4000000
+zrp 0x2C000000 2
+EOF
+
+echo
+echo "(3) append write with (4k, 8k) data"
+$QEMU_IO $IMG -c "zrp 0 12" # the physical block size of the device is 4096
+echo "Append write zones[0], zones[1] twice"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0 0x1000 0x2000
+zrp 0 1
+zap -p 0x4000000 0x1000 0x2000
+zrp 0x4000000 1
+zap -p 0x4000000 0x1000 0x2000
+zrp 0x4000000 1
+EOF
+
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M" -c "zrp 0 12"
+echo
+echo
+
+echo "case 2: test a sets of ops that works or not"
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000 # wrote (4k, 4k):
+zrp 0 1
+zap -p 0 0x1000 0x3ffd000
+zrp 0 1
+EOF
+
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M" -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG << EOF
+zap -p 0 0x1000 0x1000
+zap -p 0xc000000 0x1000 0x1000
+zap -p 0x20000000 0x1000 0x1000
+zrp 0 12
+zrs 0 768M
+zrp 0 12
+EOF
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out 
b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 0000000000..aec43f0d5b
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2.out
@@ -0,0 +1,118 @@
+QA output created by zoned-qcow2
+
+=== Initial image setup ===
+
+Formatting 'zbc.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off 
compression_type=zlib zone_model=host-managed zone_size=67108864 
zone_capacity=67108864 conventional_zones=0 max_append_bytes=131072 
max_active_zones=0 max_open_zones=0 size=805306368 lazy_refcounts=off 
refcount_bits=16
+
+=== Testing a qcow2 img with zoned format ===
+
+case 1: test zone operations one by one
+(1) report zones[0]:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+
+report zones[0~9]:
+start: 0x0, len 0x20000, cap 0x20000, wptr 0x0, zcond:1, [type: 2]
+start: 0x20000, len 0x20000, cap 0x20000, wptr 0x20000, zcond:1, [type: 2]
+start: 0x40000, len 0x20000, cap 0x20000, wptr 0x40000, zcond:1, [type: 2]
+start: 0x60000, len 0x20000, cap 0x20000, wptr 0x60000, zcond:1, [type: 2]
+start: 0x80000, len 0x20000, cap 0x20000, wptr 0x80000, zcond:1, [type: 2

[PATCH v5 2/4] qcow2: add configurations for zoned format extension

2023-10-30 Thread Sam Li
Configuring the zoned format feature on the qcow2 driver requires
the following settings: the device size, zone model, zone size,
zone capacity, number of conventional zones, and limits on zone
resources (max append bytes, max open zones, and max_active_zones).

To create a qcow2 file with zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=host-managed

Signed-off-by: Sam Li 

fix config?
---
 block/qcow2.c| 205 ++-
 block/qcow2.h|  37 +-
 docs/interop/qcow2.txt   |  67 +-
 include/block/block_int-common.h |  13 ++
 qapi/block-core.json |  45 ++-
 5 files changed, 362 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index aa01d9e7b5..cd53268ca7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x007a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -431,6 +433,63 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len != sizeof(zoned_ext)) {
+error_setg(errp, "zoned_ext: Invalid extension length");
+return -EINVAL;
+}
+ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+return ret;
+}
+
+if (s->incompatible_features & QCOW2_INCOMPAT_ZONED_FORMAT) {
+warn_report("A program lacking zoned format support "
+   "may modify this file and zoned metadata are "
+   "now considered inconsistent");
+error_printf("The zoned metadata is corrupted.\n");
+}
+
+zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.conventional_zones =
+be32_to_cpu(zoned_ext.conventional_zones);
+zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+zoned_ext.max_active_zones =
+be32_to_cpu(zoned_ext.max_active_zones);
+zoned_ext.max_append_bytes =
+be32_to_cpu(zoned_ext.max_append_bytes);
+s->zoned_header = zoned_ext;
+
+/* refuse to open broken images */
+if (zoned_ext.zone_size == 0) {
+error_setg(errp, "Zoned extension header zone_size field "
+ "can not be 0");
+return -EINVAL;
+}
+if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
+error_setg(errp, "Zoned extension header zone_capacity field "
+ "can not be larger that zone_size field");
+return -EINVAL;
+}
+if (zoned_ext.nr_zones != DIV_ROUND_UP(
+bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
+error_setg(errp, "Zoned extension header nr_zones field "
+ "is wrong");
+return -EINVAL;
+}
+
+#ifdef DEBUG_EXT
+printf("Qcow2: Got zoned format extension: "
+   "offset=%" PRIu32 "\n", offset);
+#endif
+break;
+}
+
 default:
 /* unknown magic - save it in case we need to rewrite the header */
 /* If you add a new feature, make sure to also update the fast
@@ -1967,6 +2026,15 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }
 bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
 bs->bl.pdiscard_alignment = s->cluster_size;
+bs->bl.zoned = s->zoned_header.zoned;
+bs->bl.nr_zones = s->zoned_header.nr_zones;
+bs->bl.max_append_sectors = s->zoned_header.max_append_bytes
+ 

[PATCH v5 1/4] docs/qcow2: add the zoned format feature

2023-10-30 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
The qcow2 file can be taken as zoned device and passed through by
virtio-blk device or NVMe ZNS device to the guest given zoned
information.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 docs/system/qemu-block-drivers.rst.inc | 33 ++
 1 file changed, 33 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..4647c5fa29 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,39 @@ This section describes each format and the options that 
are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zoned
+1 for host-managed zoned device and 0 for a non-zoned device.
+
+  .. option:: zone_size
+
+The size of a zone in bytes. The device is divided into zones of this
+size with the exception of the last zone, which may be smaller.
+
+  .. option:: zone_capacity
+
+The initial capacity value, in bytes, for all zones. The capacity must
+be less than or equal to zone size. If the last zone is smaller, then
+its capacity is capped.
+
+The zone capacity is per zone and may be different between zones in real
+devices. For simplicity, QCow2 sets all zones to the same capacity.
+
+  .. option:: zone_nr_conv
+
+The number of conventional zones of the zoned device.
+
+  .. option:: max_open_zones
+
+The maximal allowed open zones.
+
+  .. option:: max_active_zones
+
+The limit of the zones with implicit open, explicit open or closed state.
+
+  .. option:: max_append_sectors
+
+The maximal number of 512-byte sectors in a zone append request.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




[PATCH v5 0/4] Add full zoned storage emulation to qcow2 driver

2023-10-30 Thread Sam Li
This patch series adds a new extension - zoned format - to the
qcow2 driver, thereby allowing full zoned storage emulation on
a qcow2 image file. Users can attach such a qcow2 file to the
guest as a zoned device.

Write pointers are preserved in the zoned metadata and are
recovered after a power cycle. Meanwhile, any open (implicit or
explicit) zone will show up as closed.

Zone states are kept in memory. Read-only and offline states are
device-internal events, which are not considered in the qcow2
emulation for simplicity. The other zone states
(closed, empty, full) can be inferred from write pointer
values, which are persistent across QEMU reboots. The open states
are kept in memory using open zone lists.
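
A minimal sketch of how the persistent states can be derived from a stored write pointer alone. The enum and function names are hypothetical; bit 59 as the conventional-zone marker follows QCOW2_ZT_IS_CONV in patch 3/4, and the open states are left out because they live only in the in-memory lists:

```c
#include <stdint.h>

/* Marks a conventional zone, as in QCOW2_ZT_IS_CONV. */
#define ZONE_WP_CONV_BIT (1ULL << 59)

typedef enum {
    ZONE_NOT_WP, /* conventional zone, or no valid write pointer */
    ZONE_EMPTY,
    ZONE_CLOSED,
    ZONE_FULL,
} PersistedZoneState;

/* Derive the state a zone shows after reopening the image. */
static PersistedZoneState persisted_zone_state(uint64_t wp, uint32_t index,
                                               uint64_t zone_size,
                                               uint64_t zone_capacity)
{
    uint64_t start = (uint64_t)index * zone_size;

    if (wp & ZONE_WP_CONV_BIT) {
        return ZONE_NOT_WP;
    }
    if (wp == start) {
        return ZONE_EMPTY;
    }
    if (wp >= start + zone_capacity) {
        return ZONE_FULL;
    }
    if (wp > start) {
        return ZONE_CLOSED; /* partially written, was open or closed */
    }
    return ZONE_NOT_WP; /* wp below the zone start is not a valid pointer */
}
```

A zone that was implicitly or explicitly open before the restart has a write pointer strictly between the zone start and the end of its capacity, so it is reported as closed, which matches the behaviour described above.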

To create a qcow2 file with zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o conventional_zones=0 -o
max_append_bytes=4096 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=host-managed

Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \

v4->v5:
- add incompatible bit for zoned format [Eric]
- fix and manage zone resources via LRU [Damien]
- renaming functions and fields, spec changes [Markus, Damien]
- add closed zone list
- make qemu iotests for zoned device consecutive [Stefan]

v3->v4:
- use QLIST for implicit, explicit open zones management [Stefan]
- keep zone states in memory and drop state bits in wp metadata structure 
[Damien, Stefan]
- change zone resource management and iotests accordingly
- add tracing for number of implicit zones
- address review comments [Stefan, Markus]:
  * documentation, config, style

v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_meta allocation size
  * use bitwise or rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 934 ++-
 block/qcow2.h|  37 +-
 block/trace-events   |   2 +
 docs/interop/qcow2.txt   |  67 +-
 docs/system/qemu-block-drivers.rst.inc   |  33 +
 include/block/block_int-common.h |  13 +
 include/qemu/queue.h |   1 +
 qapi/block-core.json |  45 +-
 tests/qemu-iotests/tests/zoned-qcow2 | 126 +++
 tests/qemu-iotests/tests/zoned-qcow2.out | 118 +++
 10 files changed, 1370 insertions(+), 6 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




Re: [PATCH] file-posix: fix over-writing of returning zone_append offset

2023-10-30 Thread Sam Li
Naohiro Aota wrote on Mon, Oct 30, 2023 at 15:39:
>
> raw_co_zone_append() sets "s->offset" where "BDRVRawState *s". This pointer
> is used later at raw_co_prw() to save the block address where the data is
> written.
>
> When multiple IOs are on-going at the same time, a later IO's
> raw_co_zone_append() call over-writes a former IO's offset address before
> raw_co_prw() completes. As a result, the former zone append IO returns the
> initial value (= the start address of the writing zone), instead of the
> proper address.
>
> Fix the issue by passing the offset pointer to raw_co_prw() instead of
> passing it through s->offset. Also, remove "offset" from BDRVRawState as
> there is no usage anymore.
>
> Fixes: 4751d09adcc3 ("block: introduce zone append write for zoned devices")
> Signed-off-by: Naohiro Aota 
> ---
>  block/file-posix.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)

Thanks!

Reviewed-by: Sam Li 
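
To illustrate the problem described in the commit message, here is a simplified, single-threaded model of two zone appends in flight. All names are hypothetical; SharedState stands in for the old s->offset field, and the interleaving is replayed by hand rather than with real coroutines:

```c
#include <stdint.h>

/* Models the old BDRVRawState field: one slot shared by every request. */
typedef struct {
    int64_t *offset;
} SharedState;

/* Old scheme: submitting a request records where its result should go. */
static void submit_shared(SharedState *s, int64_t *offset)
{
    s->offset = offset;
}

/* Old scheme: completion writes through whatever pointer was stored last. */
static void complete_shared(SharedState *s, int64_t written_at)
{
    *s->offset = written_at;
}

/* Fixed scheme: each request carries its own result pointer. */
static void complete_private(int64_t *offset_ptr, int64_t written_at)
{
    *offset_ptr = written_at;
}

/* Two appends in flight; returns what request A sees after it completes. */
static int64_t request_a_result_shared(void)
{
    SharedState s;
    int64_t a = 0, b = 0;      /* both target the zone starting at 0 */

    submit_shared(&s, &a);
    submit_shared(&s, &b);     /* overwrites A's pointer */
    complete_shared(&s, 4096); /* A completes, but the result lands in b */
    (void)b;
    return a;                  /* still the zone start */
}

/* Same completion, but routed through A's own pointer. */
static int64_t request_a_result_private(void)
{
    int64_t a = 0;

    complete_private(&a, 4096);
    return a;
}
```

In the shared variant request A still reports the zone start, which is the symptom the patch fixes by passing the offset pointer down to raw_co_prw().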

>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 50e2b20d5c45..c39209358909 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -160,7 +160,6 @@ typedef struct BDRVRawState {
>  bool has_write_zeroes:1;
>  bool use_linux_aio:1;
>  bool use_linux_io_uring:1;
> -int64_t *offset; /* offset of zone append operation */
>  int page_cache_inconsistent; /* errno from fdatasync failure */
>  bool has_fallocate;
>  bool needs_alignment;
> @@ -2445,12 +2444,13 @@ static bool bdrv_qiov_is_aligned(BlockDriverState 
> *bs, QEMUIOVector *qiov)
>  return true;
>  }
>
> -static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
> +static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
> uint64_t bytes, QEMUIOVector *qiov, int 
> type)
>  {
>  BDRVRawState *s = bs->opaque;
>  RawPosixAIOData acb;
>  int ret;
> +uint64_t offset = *offset_ptr;
>
>  if (fd_open(bs) < 0)
>  return -EIO;
> @@ -2513,8 +2513,8 @@ out:
>  uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
>  if (!BDRV_ZT_IS_CONV(*wp)) {
>  if (type & QEMU_AIO_ZONE_APPEND) {
> -*s->offset = *wp;
> -trace_zbd_zone_append_complete(bs, *s->offset
> +*offset_ptr = *wp;
> +trace_zbd_zone_append_complete(bs, *offset_ptr
>  >> BDRV_SECTOR_BITS);
>  }
>  /* Advance the wp if needed */
> @@ -2536,14 +2536,14 @@ static int coroutine_fn 
> raw_co_preadv(BlockDriverState *bs, int64_t offset,
>int64_t bytes, QEMUIOVector *qiov,
>BdrvRequestFlags flags)
>  {
> -return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_READ);
> +return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_READ);
>  }
>
>  static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
> int64_t bytes, QEMUIOVector *qiov,
> BdrvRequestFlags flags)
>  {
> -return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
> +return raw_co_prw(bs, &offset, bytes, qiov, QEMU_AIO_WRITE);
>  }
>
>  static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
> @@ -3506,8 +3506,6 @@ static int coroutine_fn 
> raw_co_zone_append(BlockDriverState *bs,
>  int64_t zone_size_mask = bs->bl.zone_size - 1;
>  int64_t iov_len = 0;
>  int64_t len = 0;
> -BDRVRawState *s = bs->opaque;
> -s->offset = offset;
>
>  if (*offset & zone_size_mask) {
>  error_report("sector offset %" PRId64 " is not aligned to zone size "
> @@ -3528,7 +3526,7 @@ static int coroutine_fn 
> raw_co_zone_append(BlockDriverState *bs,
>  }
>
>  trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
> -return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
> +return raw_co_prw(bs, offset, len, qiov, QEMU_AIO_ZONE_APPEND);
>  }
>  #endif
>
> --
> 2.42.0
>



Re: [PATCH v2] block/file-posix: fix update_zones_wp() caller

2023-10-16 Thread Sam Li
Sam Li wrote on Fri, Aug 25, 2023 at 12:06:
>
> When a zoned request fails, only the wp of the target zones needs to
> be updated, so that in-flight writes on other zones are not
> disrupted. The wp is updated successfully after the
> request completes.
>
> Fix the callers to pass the right offset and nr_zones.
>
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Ping?

>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b16e9c21a1..55e7f06a2f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2522,7 +2522,8 @@ out:
>  }
>  } else {
>  if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> -update_zones_wp(bs, s->fd, 0, 1);
> +/* write and append write are not allowed to cross zone 
> bounaries */
> +update_zones_wp(bs, s->fd, offset, 1);
>  }
>  }
>
> @@ -3472,7 +3473,7 @@ static int coroutine_fn 
> raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
>  len >> BDRV_SECTOR_BITS);
>  ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
>  if (ret != 0) {
> -update_zones_wp(bs, s->fd, offset, i);
> +update_zones_wp(bs, s->fd, offset, nrz);
>  error_report("ioctl %s failed %d", op_name, ret);
>  return ret;
>  }
> --
> 2.40.1
>



Re: [PATCH v4 2/4] qcow2: add configurations for zoned format extension

2023-10-09 Thread Sam Li
Hello Eric,

Eric Blake wrote on Thu, Sep 28, 2023 at 23:15:
>
> On Mon, Sep 18, 2023 at 05:53:11PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append sectors, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=1
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c| 186 ++-
> >  block/qcow2.h|  28 +
> >  docs/interop/qcow2.txt   |  36 ++
> >  include/block/block_int-common.h |  13 +++
> >  qapi/block-core.json |  30 -
> >  5 files changed, 291 insertions(+), 2 deletions(-)
>
> Below, I'll focus only on the spec change, not the implementation:
>
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index b48cd9ce63..521276fc51 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> >  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> >  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> >  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>
> Why not spell it 0x007a6264 with 8 hex digits, like the others?  (I
> get why you choose that constant, though - ascii 'zbd')
>
> > +++ b/docs/interop/qcow2.txt
> > @@ -331,6 +331,42 @@ The fields of the bitmaps extension are:
> > Offset into the image file at which the bitmap directory
> > starts. Must be aligned to a cluster boundary.
> >
> > +== Zoned extension ==
>
> Where is the magic number for this extension called out?  That's
> missing, and MUST be part of the spec.

It's a part of the header extension type in the spec. I will add it.

>
> Back-compatibility constraints: you should consider what happens in
> both of the following cases:
>
> a program that intends to do read-only access to the qcow2 file but
> which does not understand this extension header (for example, an older
> version of 'qemu-img convert' being used to extract data from a newer
> .qcow2 file with this header present - but also the new 'nbdkit
> qcow2dec' decoder plugin just released in nbdkit 1.36).  Is it safe to
> read the data as-is, by basically ignoring zone informations?  Or will
> that ever produce wrong data (for example, if operations on a
> particular zone imply that the guest should read all zeroes after the
> current zone offset within that zone, regardless of whether non-zero
> content was previously stored at those offsets - then not honoring the
> existence of the extension header would require you to add and
> document an incompatible feature bit so that reader apps fail to open
> the file rather than reading wrong data).
>
> a program that intends to edit the qcow2 file but which does not
> understand this extension header (again, consider access by an older
> version of qemu).  Is it safe to just write data anywhere in the disk,
> but where failure to update the zone metadata means that all
> subsequent use of the file MUST behave as if it is now a non-zeoned
> device?  If so, then it is sufficient to document an autoclear feature
> bit: any time a newer qcow2 writer creates a file with a zoned
> extension, it also sets the autoclear feature bit; any time an older
> qcow2 writer edits a file with the autoclear bit, it clears the bit
> (because it has no idea if its edits invalidated the unknown
> extension).  Then when the new qcow2 program again accesses the file,
> it knows that the zone information is no longer reliable, and can fall
> back to forcing the image to behave as flat.

Consider access by an older version of qemu ('old qemu' for short) to
a qcow2 file created with the zoned extension ('new file' for short).
Reads from a new file by an old qemu, which does not understand the
zoned information, are safe. The zoned extension records the zone
state needed for every zone, which constrains the operations allowed
on those zones. For example, writes to offsets beyond the capacity of
a zone are not allowed, and such offsets read as zeroes. The old
qemu ignores that and reads the new file as a regular one anyway.

However, what is unsafe is when an old qemu program gets involved in
editing a new file. The new qemu will not see the write pointer
changes of

Re: [PATCH v4 3/4] qcow2: add zoned emulation capability

2023-10-09 Thread Sam Li
Eric Blake  于2023年9月29日周五 03:17写道:
>
> On Mon, Sep 18, 2023 at 05:53:12PM +0800, Sam Li wrote:
> > By adding zone operations and zoned metadata, the zoned emulation
> > capability enables full emulation support of zoned device using
> > a qcow2 file. The zoned device metadata includes zone type,
> > zoned device state and write pointer of each zone, which is stored
> > to an array of unsigned integers.
> >
> > Each zone of a zoned device makes state transitions following
> > the zone state machine. The zone state machine mainly describes
> > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> > READ ONLY and OFFLINE states will generally be affected by device
> > internal events. The operations on zones cause corresponding state
> > changing.
> >
> > Zoned devices have a limit on zone resources, which puts constraints on
> > write operations into zones.
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c  | 709 -
> >  block/qcow2.h  |   2 +
> >  block/trace-events |   2 +
> >  docs/interop/qcow2.txt |   6 +
> >  4 files changed, 717 insertions(+), 2 deletions(-)
>
> You may want to look at scripts/git.orderfile; putting spec changes
> (docs/*) first in your output before implementation is generally
> beneficial to reviewers.
>
> > +++ b/docs/interop/qcow2.txt
> > @@ -367,6 +367,12 @@ The fields of the zoned extension are:
> >  The maximal number of 512-byte sectors of a zone
> >  append request that can be issued to the device.
> >
> > +  36 - 43:  zonedmeta_offset
> > +The offset of zoned metadata structure in the file in 
> > bytes.
>
> For the spec to be useful, you also need to add a section describing
> the layout of the zoned metadata structure actually is.
>
> > +
> > +  44 - 51:  zonedmeta_size
> > +The size of zoned metadata in bytes.
> > +
>
> Can the zoned metadata structure ever occupy more than 4G, or can this
> field be sized at 4 bytes instead of 8?

The zoned metadata is the write pointers of all zones. The size of it
is nr_zones (uint32_t) * write_pointer size (uint64_t). So it will not
occupy more than 4G. But it still needs more than 4 bytes.
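
To make the sizing concrete, here is a rough sketch of the arithmetic,
assuming one 8-byte write pointer per zone and a 32-bit nr_zones as
described above. The function name is made up for illustration and is
not part of the patch. Realistic images stay far below 4G; the last
line only shows that the theoretical maximum of the formula does not
fit in a 4-byte field.

```python
WP_ENTRY_SIZE = 8  # bytes per write pointer (uint64_t), as stated above


def zonedmeta_size(nr_zones: int) -> int:
    """Size in bytes of the write pointer array for nr_zones zones."""
    if not 0 <= nr_zones <= 0xFFFFFFFF:
        raise ValueError("nr_zones must fit in a uint32_t")
    return nr_zones * WP_ENTRY_SIZE


# The 768M image with 64M zones from the commit message has 12 zones.
print(zonedmeta_size(12))  # 96

# Largest value the formula can produce with a 32-bit zone count,
# compared against the largest value a 4-byte field can hold.
worst_case = zonedmeta_size(0xFFFFFFFF)
print(worst_case, worst_case > 0xFFFFFFFF)
```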

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
>



Re: [PATCH v4 2/4] qcow2: add configurations for zoned format extension

2023-09-25 Thread Sam Li
Markus Armbruster  于2023年9月25日周一 21:05写道:
>
> Sam Li  writes:
>
> > To configure the zoned format feature on the qcow2 driver, it
> > requires settings as: the device size, zone model, zone size,
> > zone capacity, number of conventional zones, limits on zone
> > resources (max append sectors, max open zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=1
> >
> > Signed-off-by: Sam Li 
>
> [...]
>
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2b1d493d6e..2aad82c399 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5021,6 +5021,27 @@
> >  # @compression-type: The image cluster compression method
> >  # (default: zlib, since 5.1)
> >  #
> > +# @zone-model: Zoned device model, 1 for host-managed and 0 for
> > +# non-zoned devices (default: 0, since 8.2)
>
> Shouldn't this be a QAPI enum rather than a number?
>
> > +#
> > +# @zone-size: Total number of logical blocks within zones in bytes
> > +# (since 8.2)
> > +#
> > +# @zone-capacity: The number of usable logical blocks within zones
> > +# in bytes.  A zone capacity is always smaller or equal to the
> > +# zone size. (since 8.2)
> > +#
> > +# @nr-conv-zones: The number of conventional zones of the zoned device
> > +# (since 8.2)
>
> I still think @conventional-zones would be more obvious.
>
> > +#
> > +# @max-open-zones: The maximal number of open zones (since 8.2)
> > +#
> > +# @max-active-zones: The limit of the zones that have the implicit
> > +# open, explicit open or closed state (since 8.2)
>
> Maybe "The maximum number of zones in the implicit open, explicit open
> or closed state".
>
> (I'll repeat suggestions until you reject them, just to make sure they
> don't get ignored by accident)

Thanks for noticing. I will change them (enum, conv, maz) in v5.


>
> > +#
> > +# @max-append-sectors: The maximal number of 512-byte sectors of a zone
> > +# append request that can be issued to the device. (since 8.2)
> > +#
> >  # Since: 2.12
> >  ##
> >  { 'struct': 'BlockdevCreateOptionsQcow2',
> > @@ -5037,7 +5058,14 @@
> >  '*preallocation':   'PreallocMode',
> >  '*lazy-refcounts':  'bool',
> >  '*refcount-bits':   'int',
> > -'*compression-type':'Qcow2CompressionType' } }
> > +'*compression-type':'Qcow2CompressionType',
> > +'*zone-model': 'uint8',
> > +'*zone-size':  'size',
> > +'*zone-capacity':  'size',
> > +'*nr-conv-zones':  'uint32',
> > +'*max-open-zones': 'uint32',
> > +'*max-active-zones':   'uint32',
> > +'*max-append-sectors': 'uint32' } }
> >
> >  ##
> >  # @BlockdevCreateOptionsQed:
>



Re: [PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-09-18 Thread Sam Li
Markus Armbruster  于2023年9月18日周一 22:46写道:
>
> Sam Li  writes:
>
> > Markus Armbruster  于2023年9月1日周五 19:08写道:
> >>
> >> Sam Li  writes:
> >>
> >> > To configure the zoned format feature on the qcow2 driver, it
> >> > requires following arguments: the device size, zoned profile,
> >>
> >> "Zoned profile" is gone in v3.
> >>
> >> > zone model, zone size, zone capacity, number of conventional
> >> > zones, limits on zone resources (max append sectors, max open
> >> > zones, and max_active_zones).
> >> >
> >> > To create a qcow2 file with zoned format, use command like this:
> >> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> >> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> >> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> >> > -o zone_model=1
> >> >
> >> > Signed-off-by: Sam Li 
> >>
> >> [...]
> >>
> >> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> > index 2b1d493d6e..0d8f9e0a88 100644
> >> > --- a/qapi/block-core.json
> >> > +++ b/qapi/block-core.json
> >> > @@ -5021,6 +5021,27 @@
> >> >  # @compression-type: The image cluster compression method
> >> >  # (default: zlib, since 5.1)
> >> >  #
> >> > +# @zone-model: Zoned device model, 1 for host-managed and 0 for
> >>
> >> Why is this encoded as a number?
> >>
> >> If it's fundamentally a flag, use bool.
> >>
> >> If more models could appear in the future, make it an enum.
> >>
> >
> > Yes, it is an enum.
> >
> > typedef enum BlockZoneModel {
> > BLK_Z_NONE = 0x0, /* Regular block device */
> > BLK_Z_HM = 0x1, /* Host-managed zoned block device */
> > BLK_Z_HA = 0x2, /* Host-aware zoned block device */
> > } BlockZoneModel;
>
> Please make it an enum in the QAPI schema, too.

I see.

>
> >> > +# non-zoned devices (default: 0, since 8.0)
> >>
> >> Since 8.2.  More of the same below.
> >>
> >> > +#
> >> > +# @zone-size: Total number of logical blocks within zones in bytes
> >> > +# (since 8.0)
> >> > +#
> >> > +# @zone-capacity: The number of usable logical blocks within zones
> >> > +# in bytes. A zone capacity is always smaller or equal to the
> >> > +# zone size. (since 8.0)
> >>
> >> Two spaces between sentences for consistency, please.
> >>
> >> > +#
> >> > +# @nr-conv-zones: The number of conventional zones of the zoned device
> >> > +# (since 8.0)
> >>
> >> I think @conventional-zones would be more obvious.
> >>
> >> > +#
> >> > +# @max-open-zones: The maximal allowed open zones (since 8.0)
> >>
> >> Maybe "The maximum number of open zones".
> >>
> >> > +#
> >> > +# @max-active-zones: The limit of the zones that have the implicit
> >> > +# open, explicit open or closed state (since 8.0)
> >>
> >> Maybe "The maximum number of zones in the implicit open, explicit open
> >> or closed state".
> >>
> >> > +#
> >> > +# @max-append-sectors: The maximal data size in sectors of a zone
> >> > +# append request that can be issued to the device. (since 8.0)
> >>
> >> What's the sector size, and how can the user determine it?  Why can't we
> >> use bytes here?
> >
> > The sector size is 512 bytes.
>
> Needs to be documented.
>
> I believe bytes would be easier to document, which makes me suspect
> they'd be the simpler interface.
>
> >   It's more for conventional use.
>
> I'm afraid I don't understand this part.  Do I have to?

Not necessarily. I adopted the name from the zoned storage part of the
virtio spec.

+If the VIRTIO_BLK_F_ZONED feature is negotiated, then in
+\field{virtio_blk_zoned_characteristics},
+\begin{itemize}
+\item \field{zone_sectors} value is expressed in 512-byte sectors.
+\item \field{max_append_sectors} value is expressed in 512-byte sectors.
+\item \field{write_granularity} value is expressed in bytes.
+\end{itemize}
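
As an illustration of the unit question, converting between 512-byte
sectors and bytes could be sketched like this. The helper names are
hypothetical; only the 512-byte sector size comes from the text above.

```python
SECTOR_SIZE = 512  # bytes, the sector size used by the virtio fields above


def sectors_to_bytes(sectors: int) -> int:
    """Convert a count of 512-byte sectors to bytes."""
    return sectors * SECTOR_SIZE


def bytes_to_sectors(nbytes: int) -> int:
    """Convert bytes to 512-byte sectors, rejecting unaligned sizes."""
    if nbytes % SECTOR_SIZE:
        raise ValueError("size is not a multiple of the sector size")
    return nbytes // SECTOR_SIZE


# max_append_sectors=512 from the example command line, in bytes.
print(sectors_to_bytes(512))   # 262144
print(bytes_to_sectors(4096))  # 8
```

A byte-based option would make the first helper unnecessary for users,
which is the simplification suggested in the review.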

>
> >> > +#
> >> >  # Since: 2.12
> >> >  ##
> >> >  { 'struct': 'BlockdevCreateOptionsQcow2',
> >> > @@ -5037,7 +5058,14 @@
> >> >  '*preallocation':   'PreallocMode',
> >> >  '*lazy-refcounts':  'bool',
> >> >  '*refcount-bits':   'int',
> >> > -'*compression-type':'Qcow2CompressionType' } }
> >> > +'*compression-type':'Qcow2CompressionType',
> >> > +'*zone-model': 'uint8',
> >> > +'*zone-size':  'size',
> >> > +'*zone-capacity':  'size',
> >> > +'*nr-conv-zones':  'uint32',
> >> > +'*max-open-zones': 'uint32',
> >> > +'*max-active-zones':   'uint32',
> >> > +'*max-append-sectors': 'uint32' } }
> >> >
> >> >  ##
> >> >  # @BlockdevCreateOptionsQed:
> >>
>



[PATCH v4 4/4] iotests: test the zoned format feature for qcow2 file

2023-09-18 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 129 ++
 tests/qemu-iotests/tests/zoned-qcow2.out | 133 +++
 2 files changed, 262 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 00..22e34ff6a0
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,129 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o nr_conv_zones=0 -o max_append_sectors=131072 \
+-o max_open_zones=0 -o max_active_zones=0 -o zone_model=1
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test persistent zone states"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x2C00 2" # 0x2C00 / 512 = 0x16
+echo
+echo
+echo "(2) finish zones[0]:"
+$QEMU_IO $IMG -c "zf 0 0x400" # 0x400 / 512 = 0x2
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "finish zones[1]"
+$QEMU_IO $IMG -c "zf 64M 64M"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "finish the last zone"
+$QEMU_IO $IMG -c "zf 0x2C00 0x400"
+$QEMU_IO $IMG -c "zrp 0x2C00 2"
+echo
+echo
+echo "(3) reset zones[0]: full => empty"
+$QEMU_IO $IMG -c "zrs 0 0x400" # 0x400 / 512 = 0x2
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "reset zones[1]:"
+$QEMU_IO $IMG -c "zrs 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "reset the last zone"
+$QEMU_IO $IMG -c "zrs 0x2C00 0x400"
+$QEMU_IO $IMG -c "zrp 0x2C00 2"
+echo
+echo
+echo "(4) append write with (4k, 8k) data" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Append write zones[0] one time:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[0] twice:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[1] one time:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Append write zones[1] twice:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo
+echo
+echo "case 2: test a sets of ops that works or not"
+
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+echo "wrote (4k, 4k):"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x3ffd000"
+echo "wrote to full:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M"
+$QEMU_IO $IMG -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0xc00 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0x2000 0x1000 0x1000"
+echo "wrote three zones:"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned-qcow2.out b/tests/qemu-iotests/tests/zoned-qcow2.out
new file mode 100644
index 0

[PATCH v4 3/4] qcow2: add zoned emulation capability

2023-09-18 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation support of a zoned device using
a qcow2 file. The zoned device metadata includes the zone type,
zoned device state and write pointer of each zone, which are stored
in an array of unsigned integers.

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states: IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
The READ ONLY and OFFLINE states are generally entered through
device-internal events. Operations on zones cause the corresponding
state changes.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones.

Signed-off-by: Sam Li 
---
 block/qcow2.c  | 709 -
 block/qcow2.h  |   2 +
 block/trace-events |   2 +
 docs/interop/qcow2.txt |   6 +
 4 files changed, 717 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 521276fc51..8240f74de8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,156 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
+
+/*
+ * To emulate a real zoned device, closed, empty and full states are
+ * preserved after a power cycle. Open states are in-memory and will
+ * be lost after closing the device. Read-only and offline states are
+ * device-internal events, which are not considered for simplicity.
+ */
+static inline BlockZoneState qcow2_get_zs(BlockDriverState *bs,
+ uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+uint64_t zone_wp;
+uint64_t zone_start;
+
+if (QLIST_IS_INSERTED(>wp[index], exp_open_zone_entry)) {
+return BLK_ZS_EOPEN;
+} else if (QLIST_IS_INSERTED(>wp[index], imp_open_zone_entry)) {
+return BLK_ZS_IOPEN;
+}
+
+zone_start = index * bs->bl.zone_size;
+zone_wp = bs->wps->wp[index];
+if (zone_wp == zone_start) {
+return BLK_ZS_EMPTY;
+} else if (zone_wp >= zone_start + bs->bl.zone_capacity) {
+return BLK_ZS_FULL;
+} else if (zone_wp > zone_start) {
+return BLK_ZS_CLOSED;
+} else {
+return BLK_ZS_NOT_WP;
+}
+}
+
+/*
+ * Write the new wp value to the dedicated location of the disk file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index) {
+BDRVQcow2State *s = bs->opaque;
+uint64_t wpv = *wp;
+int ret;
+
+ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+if (ret < 0) {
+goto exit;
+}
+trace_qcow2_wp_tracking(index, *wp >> BDRV_SECTOR_BITS);
+return ret;
+
+exit:
+*wp = wpv;
+error_report("Failed to write metadata with file");
+return ret;
+}
+
+static bool qcow2_check_active_zones(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+if (!s->zoned_header.max_active_zones) {
+return true;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+< s->zoned_header.max_active_zones) {
+return true;
+}
+
+return false;
+}
+
+static bool qcow2_check_open_zones(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+if (!s->zoned_header.max_open_zones) {
+return true;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open
+< s->zoned_header.max_open_zones) {
+return true;
+}
+
+if(s->nr_zones_imp_open && qcow2_check_active_zones(bs)) {
+/*
+ * close one implicitly open zone only when there is room left
+ * for active zones
+ */
+QLIST_REMOVE(>wp[0], imp_open_zone_entry);
+s->nr_zones_imp_open--;
+trace_qcow2_imp_open_zones(0x23, s->nr_zones_imp_open);
+s->nr_zones_closed++;
+return true;
+}
+
+return false;
+}
+
+/*
+ * The zoned device has limited zone resources of open, closed, active
+ * zones. This function manages open zones with the constraint of max
+ * active zones limit. It checks if a zone can transition to implicit open
+ * or explicit open while maintaining max open zone and max active zone
+ * limits.
+ *
+ * Implicit open state can change to closed only if there is no room of
+ * open zones. Meanwhile, it must be within the active zone limit.
+ */
+static int qcow2_check_zone_resources(BlockDriverState *bs,
+  BlockZoneState zs)
+{
+switch (zs) {
+case BLK_ZS_EMPTY:
+if (!qcow2_check_active_zones(bs)) {
+error_report("Not enough active zones");
+return -EINVAL;
+}
+break;
+case 

[PATCH v4 2/4] qcow2: add configurations for zoned format extension

2023-09-18 Thread Sam Li
To configure the zoned format feature on the qcow2 driver, the
following settings are required: the device size, zone model, zone size,
zone capacity, number of conventional zones, limits on zone
resources (max append sectors, max open zones, and max_active_zones).

To create a qcow2 file with zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=1

Signed-off-by: Sam Li 
---
 block/qcow2.c| 186 ++-
 block/qcow2.h|  28 +
 docs/interop/qcow2.txt   |  36 ++
 include/block/block_int-common.h |  13 +++
 qapi/block-core.json |  30 -
 5 files changed, 291 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b48cd9ce63..521276fc51 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -431,6 +433,55 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len != sizeof(zoned_ext)) {
+error_setg(errp, "zoned_ext: Invalid extension length");
+return -EINVAL;
+}
+ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+return ret;
+}
+
+zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.nr_conv_zones = be32_to_cpu(zoned_ext.nr_conv_zones);
+zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+zoned_ext.max_active_zones =
+be32_to_cpu(zoned_ext.max_active_zones);
+zoned_ext.max_append_sectors =
+be32_to_cpu(zoned_ext.max_append_sectors);
+s->zoned_header = zoned_ext;
+
+/* refuse to open broken images */
+if (zoned_ext.zone_size == 0) {
+error_setg(errp, "Zoned extension header zone_size field "
+ "can not be 0");
+return -EINVAL;
+}
+if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
+error_setg(errp, "Zoned extension header zone_capacity field "
+ "can not be larger than zone_size field");
+return -EINVAL;
+}
+if (zoned_ext.nr_zones != DIV_ROUND_UP(
+bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
+error_setg(errp, "Zoned extension header nr_zones field "
+ "is wrong");
+return -EINVAL;
+}
+
+#ifdef DEBUG_EXT
+printf("Qcow2: Got zoned format extension: "
+   "offset=%" PRIu64 "\n", offset);
+#endif
+break;
+}
+
 default:
 /* unknown magic - save it in case we need to rewrite the header */
 /* If you add a new feature, make sure to also update the fast
@@ -1967,6 +2018,14 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 }
 bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
 bs->bl.pdiscard_alignment = s->cluster_size;
+bs->bl.zoned = s->zoned_header.zoned;
+bs->bl.nr_zones = s->zoned_header.nr_zones;
+bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
+bs->bl.max_active_zones = s->zoned_header.max_active_zones;
+bs->bl.max_open_zones = s->zoned_header.max_open_zones;
+bs->bl.zone_size = s->zoned_header.zone_size;
+bs->bl.zone_capacity = s->zoned_header.zone_capacity;
+bs->bl.write_granularity = BDRV_SECTOR_SIZE;
 }
 
 static int qcow2_reopen_prepare(BDRVReopenState *state,
@@ -3089,6 +3148,30 @@ int qcow2_update_header(BlockDriverState *bs)
 buflen -

[PATCH v4 1/4] docs/qcow2: add the zoned format feature

2023-09-18 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
Given the zoned information, a qcow2 file can be treated as a zoned
device and passed through to the guest by a virtio-blk device or an
NVMe ZNS device.

Signed-off-by: Sam Li 
---
 docs/system/qemu-block-drivers.rst.inc | 33 ++
 1 file changed, 33 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..4647c5fa29 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,39 @@ This section describes each format and the options that are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zoned
+1 for host-managed zoned device and 0 for a non-zoned device.
+
+  .. option:: zone_size
+
+The size of a zone in bytes. The device is divided into zones of this
+size with the exception of the last zone, which may be smaller.
+
+  .. option:: zone_capacity
+
+The initial capacity value, in bytes, for all zones. The capacity must
+be less than or equal to zone size. If the last zone is smaller, then
+its capacity is capped.
+
+The zone capacity is per zone and may be different between zones in real
+devices. For simplicity, QCow2 sets all zones to the same capacity.
+
+  .. option:: zone_nr_conv
+
+The number of conventional zones of the zoned device.
+
+  .. option:: max_open_zones
+
+The maximal allowed open zones.
+
+  .. option:: max_active_zones
+
+The limit of the zones with implicit open, explicit open or closed state.
+
+  .. option:: max_append_sectors
+
+The maximal number of 512-byte sectors in a zone append request.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




[PATCH v4 0/4] Add full zoned storage emulation to qcow2 driver

2023-09-18 Thread Sam Li
This patch series adds a new extension - zoned format - to the
qcow2 driver, thereby allowing full zoned storage emulation on
the qcow2 img file. Users can attach such a qcow2 file to the
guest as a zoned device.

Write pointers are preserved in the zoned metadata. They will be
recovered after a power cycle. Meanwhile, any open (implicit or
explicit) zone will show up as closed.

Zone states are kept in memory. Read-only and offline states are
device-internal events, which are not considered in qcow2
emulation for simplicity. The other zone states
(closed, empty, full) can be inferred from write pointer
values, which are persistent across QEMU reboots. The open states are
kept in memory using open zone lists.
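
As a rough sketch of how the persistent states could be inferred from
a write pointer alone, the following follows the comparison order of
qcow2_get_zs() in patch 3/4 but leaves out the open zone lists and
conventional zones. The Python names are made up for illustration, and
treating a write pointer below the zone start as an error is an
assumption of this sketch, not what the patch does.

```python
from enum import Enum

MiB = 1024 * 1024


class ZoneState(Enum):
    EMPTY = "empty"
    CLOSED = "closed"
    FULL = "full"


def state_after_restart(index, wp, zone_size, zone_capacity):
    """Infer a zone's state from its persisted write pointer (bytes)."""
    zone_start = index * zone_size
    if wp < zone_start:
        # Assumption of this sketch: such a value is simply rejected.
        raise ValueError("write pointer is before the zone start")
    if wp == zone_start:
        return ZoneState.EMPTY
    if wp >= zone_start + zone_capacity:
        return ZoneState.FULL
    # Partially written zone: an open zone also ends up here after a
    # restart, because the open lists only exist in memory.
    return ZoneState.CLOSED


print(state_after_restart(0, 0, 64 * MiB, 64 * MiB))
print(state_after_restart(1, 64 * MiB + 4096, 64 * MiB, 64 * MiB))
print(state_after_restart(0, 64 * MiB, 64 * MiB, 64 * MiB))
```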

To create a qcow2 file with zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=1

Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \
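
For the qemu-img example above, the resulting zone layout can be
worked out with a few lines of arithmetic. This is only a sketch: the
function name is hypothetical, and the checks are modelled on the
validation in patch 2/4 (zone_size must not be 0, zone_capacity must
not exceed zone_size, nr_zones is the size divided by zone_size,
rounded up).

```python
MiB = 1024 * 1024


def zone_layout(size, zone_size, zone_capacity):
    """Return a list of (start, capacity) byte pairs, one per zone."""
    if zone_size == 0:
        raise ValueError("zone_size can not be 0")
    if zone_capacity > zone_size:
        raise ValueError("zone_capacity can not be larger than zone_size")
    nr_zones = (size + zone_size - 1) // zone_size  # round up
    layout = []
    for i in range(nr_zones):
        start = i * zone_size
        # The last zone may be smaller, so its capacity is capped.
        layout.append((start, min(zone_capacity, size - start)))
    return layout


zones = zone_layout(768 * MiB, 64 * MiB, 64 * MiB)
print(len(zones))         # 12
print(hex(zones[-1][0]))  # 0x2c000000
```

With size=768M and zone_size=64M this gives 12 equally sized zones, so
the capping of the last zone only matters when the image size is not a
multiple of the zone size.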

v3->v4:
- use QLIST for implicit, explicit open zones management [Stefan]
- keep zone states in memory and drop state bits in wp metadata structure [Damien, Stefan]
- change zone resource management and iotests accordingly
- add tracing for number of implicit zones
- address review comments [Stefan, Markus]:
  * documentation, config, style

v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_meta allocation size
  * use bitwise OR rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 893 ++-
 block/qcow2.h|  30 +
 block/trace-events   |   2 +
 docs/interop/qcow2.txt   |  42 ++
 docs/system/qemu-block-drivers.rst.inc   |  33 +
 include/block/block_int-common.h |  13 +
 qapi/block-core.json |  30 +-
 tests/qemu-iotests/tests/zoned-qcow2 | 129 
 tests/qemu-iotests/tests/zoned-qcow2.out | 133 
 9 files changed, 1302 insertions(+), 3 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




Re: [PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-09-18 Thread Sam Li
Stefan Hajnoczi  于2023年9月14日周四 04:12写道:
>
> On Mon, Aug 28, 2023 at 11:09:53PM +0800, Sam Li wrote:
> > To configure the zoned format feature on the qcow2 driver, it
> > requires following arguments: the device size, zoned profile,
> > zone model, zone size, zone capacity, number of conventional
> > zones, limits on zone resources (max append sectors, max open
> > zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=1
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c| 176 ++-
> >  block/qcow2.h|  20 
> >  docs/interop/qcow2.txt   |  36 +++
> >  include/block/block_int-common.h |  13 +++
> >  qapi/block-core.json |  30 +-
> >  5 files changed, 273 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index c51388e99d..7074bfc620 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -73,6 +73,7 @@ typedef struct {
> >  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
> >  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
> >  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> > +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
> >
> >  static int coroutine_fn
> >  qcow2_co_preadv_compressed(BlockDriverState *bs,
> > @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  uint64_t offset;
> >  int ret;
> >  Qcow2BitmapHeaderExt bitmaps_ext;
> > +Qcow2ZonedHeaderExtension zoned_ext;
> >
> >  if (need_update_header != NULL) {
> >  *need_update_header = false;
> > @@ -431,6 +433,55 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> > start_offset,
> >  break;
> >  }
> >
> > +case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> > +{
> > +if (ext.len != sizeof(zoned_ext)) {
> > +error_setg(errp, "zoned_ext: Invalid extension length");
> > +return -EINVAL;
> > +}
> > +ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> > +if (ret < 0) {
> > +error_setg_errno(errp, -ret, "zoned_ext: "
> > + "Could not read ext header");
> > +return ret;
> > +}
> > +
> > +zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> > +zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> > +zoned_ext.nr_conv_zones = be32_to_cpu(zoned_ext.nr_conv_zones);
> > +zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> > +zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> > +zoned_ext.max_active_zones =
> > +be32_to_cpu(zoned_ext.max_active_zones);
> > +zoned_ext.max_append_sectors =
> > +be32_to_cpu(zoned_ext.max_append_sectors);
> > +s->zoned_header = zoned_ext;
> > +
> > +/* refuse to open broken images */
> > +if (zoned_ext.zone_size == 0) {
> > +error_setg(errp, "Zoned extension header zone_size field "
> > + "can not be 0");
> > +return -EINVAL;
> > +}
> > +if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> > +error_setg(errp, "Zoned extension header zone_capacity field "
> > + "can not be larger that zone_size field");
> > +return -EINVAL;
> > +}
> > +if (zoned_ext.nr_zones != DIV_ROUND_UP(
> > +bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> > +error_setg(errp, "Zoned extension header nr_zones field "
> > + "gets wrong");
>
> "gets" -> "is"
>
> > +return -EINVAL;
> > +}
> > +
> > +#ifdef DEBUG_EXT
> > +printf("Qcow2: Got zoned format extension: "
> > +   "offset=%"

Re: [PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-09-18 Thread Sam Li
Markus Armbruster wrote on Fri, Sep 1, 2023 at 19:08:
>
> Sam Li  writes:
>
> > To configure the zoned format feature on the qcow2 driver, it
> > requires following arguments: the device size, zoned profile,
>
> "Zoned profile" is gone in v3.
>
> > zone model, zone size, zone capacity, number of conventional
> > zones, limits on zone resources (max append sectors, max open
> > zones, and max_active_zones).
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zone_model=1
> >
> > Signed-off-by: Sam Li 
>
> [...]
>
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2b1d493d6e..0d8f9e0a88 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5021,6 +5021,27 @@
> >  # @compression-type: The image cluster compression method
> >  # (default: zlib, since 5.1)
> >  #
> > +# @zone-model: Zoned device model, 1 for host-managed and 0 for
>
> Why is this encoded as a number?
>
> If it's fundamentally a flag, use bool.
>
> If more models could appear in the future, make it an enum.
>

Yes, it is an enum.

typedef enum BlockZoneModel {
BLK_Z_NONE = 0x0, /* Regular block device */
BLK_Z_HM = 0x1, /* Host-managed zoned block device */
BLK_Z_HA = 0x2, /* Host-aware zoned block device */
} BlockZoneModel;

> > +# non-zoned devices (default: 0, since 8.0)
>
> Since 8.2.  More of the same below.
>
> > +#
> > +# @zone-size: Total number of logical blocks within zones in bytes
> > +# (since 8.0)
> > +#
> > +# @zone-capacity: The number of usable logical blocks within zones
> > +# in bytes. A zone capacity is always smaller or equal to the
> > +# zone size. (since 8.0)
>
> Two spaces between sentences for consistency, please.
>
> > +#
> > +# @nr-conv-zones: The number of conventional zones of the zoned device
> > +# (since 8.0)
>
> I think @conventional-zones would be more obvious.
>
> > +#
> > +# @max-open-zones: The maximal allowed open zones (since 8.0)
>
> Maybe "The maximum number of open zones".
>
> > +#
> > +# @max-active-zones: The limit of the zones that have the implicit
> > +# open, explicit open or closed state (since 8.0)
>
> Maybe "The maximum number of zones in the implicit open, explicit open
> or closed state".
>
> > +#
> > +# @max-append-sectors: The maximal data size in sectors of a zone
> > +# append request that can be issued to the device. (since 8.0)
>
> What's the sector size, and how can the user determine it?  Why can't we
> use bytes here?

The sector size is 512 bytes. Sectors are used here mostly by convention.

>
> > +#
> >  # Since: 2.12
> >  ##
> >  { 'struct': 'BlockdevCreateOptionsQcow2',
> > @@ -5037,7 +5058,14 @@
> >  '*preallocation':   'PreallocMode',
> >  '*lazy-refcounts':  'bool',
> >  '*refcount-bits':   'int',
> > -'*compression-type':'Qcow2CompressionType' } }
> > +'*compression-type':'Qcow2CompressionType',
> > +'*zone-model': 'uint8',
> > +'*zone-size':  'size',
> > +'*zone-capacity':  'size',
> > +'*nr-conv-zones':  'uint32',
> > +'*max-open-zones': 'uint32',
> > +'*max-active-zones':   'uint32',
> > +'*max-append-sectors': 'uint32' } }
> >
> >  ##
> >  # @BlockdevCreateOptionsQed:
>



Re: [PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-29 Thread Sam Li
Damien Le Moal wrote on Tue, Aug 29, 2023 at 15:14:
>
> On 8/29/23 15:27, Sam Li wrote:
> > Damien Le Moal wrote on Tue, Aug 29, 2023 at 14:06:
> >>
> >> On 8/28/23 20:55, Sam Li wrote:
> >>>>> +/* close one implicitly open zones to make it available */
> >>>>> +for (int i = s->zoned_header.zone_nr_conv;
> >>>>> +i < bs->bl.nr_zones; ++i) {
> > >>>>> +uint64_t *wp = &bs->wps->wp[i];
> >>>>> +if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> >>>>> +ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
> >>>>
> >>>> I'm wondering if it's correct to store the zone state persistently in
> >>>> the qcow2 file. If the guest or QEMU crashes, then zones will be left in
> >>>> states like EOPEN. Since the guest software will have forgotten about
> >>>> explicitly opened zones, the guest would need to recover zone states.
> >>>> I'm not sure if existing software is designed to do that.
> >>>>
> >>>> Damien: Should the zone state be persistent?
> >>
> >> Yes and no. Yes you need to preserve/maintain zone states but not as is.
> >> With a real drive, if you power cycle the device, you get the following
> >> state changes:
> >>
> >>  Before     | After power cycle
> >> ------------+-------------------
> >>  EMPTY      | EMPTY
> >>  FULL       | FULL
> >>  IMP. OPEN  | CLOSED
> >>  EXP. OPEN  | CLOSED
> >>  CLOSED     | CLOSED
> >>  READ-ONLY  | READ-ONLY
> >>  OFFLINE    | OFFLINE
> >>
> >> So any open (implicit or explicit) zone will show up as closed after power
> >> cycle. That is, the number of "active" zones does not change.
> >> For the qcow2 emulation, as long as you do not also emulate read-only and
> >> offline zones, you actually do not need to save the zone state in the zone
> >> metadata. On startup, you can infer the state from the zone write pointer:
> >>
> >> zone wp == zone start -> EMPTY
> >> zone wp >= zone capacity -> FULL
> >> zone wp > zone start -> CLOSED
> >>
> >> And make sure that all closed zones are counted as the initial number of
> >> active zones. The initial number of open zones will always be 0.
> >>
> >> So it is easy :)
> >
> > Thanks for the explanations!
> >
> > Read-only and offline are device internal events. Does qcow2 emulation
> > need to emulate that?
> >
> > Current NVMe ZNS emulation in QEMU has a nvme_offline_zone() function.
> > Does it suggest keeping the offline state persistent?
> > https://github.com/qemu/qemu/blob/master/hw/nvme/ctrl.c#L3740
>
> The offline state is useful for testing only. If a zone goes offline, it
> generally means that the device is dying...
> At least for now, I do not think it is needed for qcow2. That can always be
> added later.

OK. Then the wps in the zoned metadata structure would work almost like
the zoned emulation in file-posix. The current wp design can be kept as
is, though the zone state will then live only in memory.

This change will be reflected in v4 (the newest version is v3 for now).
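
The startup rule quoted above can be sketched roughly as follows. This is
only an illustration under assumptions: the names are hypothetical and not
the identifiers used in the patch, the write pointer is taken to be an
absolute byte offset, and "wp >= zone capacity" is read as "wp has reached
zone start + zone capacity".

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not the identifiers from the patch. */
typedef enum {
    SKETCH_ZS_EMPTY,
    SKETCH_ZS_CLOSED,
    SKETCH_ZS_FULL,
} SketchZoneState;

/*
 * Infer the state of a sequential zone at startup from its write pointer.
 * zone_start and wp are absolute byte offsets; zone_capacity is in bytes.
 * Open states never survive a restart, so only three results are possible.
 */
static SketchZoneState sketch_infer_zone_state(uint64_t zone_start,
                                               uint64_t zone_capacity,
                                               uint64_t wp)
{
    if (wp <= zone_start) {
        /* wp == zone start; "<" is treated the same way defensively */
        return SKETCH_ZS_EMPTY;
    }
    if (wp - zone_start >= zone_capacity) {
        return SKETCH_ZS_FULL;
    }
    return SKETCH_ZS_CLOSED;
}

/*
 * Count the zones that start out active. Only CLOSED zones qualify,
 * because the initial number of open zones is always 0.
 */
static uint32_t sketch_count_active_zones(const uint64_t *wps,
                                          uint32_t nr_zones,
                                          uint64_t zone_size,
                                          uint64_t zone_capacity)
{
    uint32_t active = 0;

    for (uint32_t i = 0; i < nr_zones; i++) {
        uint64_t start = (uint64_t)i * zone_size;

        if (sketch_infer_zone_state(start, zone_capacity, wps[i])
            == SKETCH_ZS_CLOSED) {
            active++;
        }
    }
    return active;
}
```

With this approach only the write pointers need to be read back from the
image; the per-zone state and the active-zone counter are rebuilt in memory.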

Sam



Re: [PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-29 Thread Sam Li
Damien Le Moal wrote on Tue, Aug 29, 2023 at 14:06:
>
> On 8/28/23 20:55, Sam Li wrote:
> >>> +/* close one implicitly open zones to make it available */
> >>> +for (int i = s->zoned_header.zone_nr_conv;
> >>> +i < bs->bl.nr_zones; ++i) {
> >>> +uint64_t *wp = &bs->wps->wp[i];
> >>> +if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> >>> +ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
> >>
> >> I'm wondering if it's correct to store the zone state persistently in
> >> the qcow2 file. If the guest or QEMU crashes, then zones will be left in
> >> states like EOPEN. Since the guest software will have forgotten about
> >> explicitly opened zones, the guest would need to recover zone states.
> >> I'm not sure if existing software is designed to do that.
> >>
> >> Damien: Should the zone state be persistent?
>
> Yes and no. Yes you need to preserve/maintain zone states but not as is.
> With a real drive, if you power cycle the device, you get the following
> state changes:
>
>  Before     | After power cycle
> ------------+-------------------
>  EMPTY      | EMPTY
>  FULL       | FULL
>  IMP. OPEN  | CLOSED
>  EXP. OPEN  | CLOSED
>  CLOSED     | CLOSED
>  READ-ONLY  | READ-ONLY
>  OFFLINE    | OFFLINE
>
> So any open (implicit or explicit) zone will show up as closed after power
> cycle. That is, the number of "active" zones does not change.
> For the qcow2 emulation, as long as you do not also emulate read-only and
> offline zones, you actually do not need to save the zone state in the zone
> metadata. On startup, you can infer the state from the zone write pointer:
>
> zone wp == zone start -> EMPTY
> zone wp >= zone capacity -> FULL
> zone wp > zone start -> CLOSED
>
> And make sure that all closed zones are counted as the initial number of
> active zones. The initial number of open zones will always be 0.
>
> So it is easy :)

Thanks for the explanations!

Read-only and offline are device internal events. Does qcow2 emulation
need to emulate that?

Current NVMe ZNS emulation in QEMU has a nvme_offline_zone() function.
Does it suggest keeping the offline state persistent?
https://github.com/qemu/qemu/blob/master/hw/nvme/ctrl.c#L3740

Sam



[PATCH v3 4/4] iotests: test the zoned format feature for qcow2 file

2023-08-28 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check -qcow2 zoned-qcow2

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 135 ++
 tests/qemu-iotests/tests/zoned-qcow2.out | 140 +++
 2 files changed, 275 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 00..7ec8b18860
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o nr_conv_zones=0 -o max_append_sectors=131072 \
+-o max_open_zones=0 -o max_active_zones=0 -o zone_model=1
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test if one zone operation works"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x2C00 2" # 0x2C00 / 512 = 0x16
+echo
+echo
+echo "open zones[0]:"
+$QEMU_IO $IMG -c "zo 0 0x400" # 0x400 / 512 = 0x2
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "open zones[1]"
+$QEMU_IO $IMG -c "zo 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "open the last zone"
+$QEMU_IO $IMG -c "zo 0x2C00 0x400"
+$QEMU_IO $IMG -c "zrp 0x2C00 2"
+echo
+echo
+echo "close zones[0]"
+$QEMU_IO $IMG -c "zc 0 0x400"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "close the last zone"
+$QEMU_IO $IMG -c "zc 0x3e7000 0x400"
+$QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(4) finish zones[1]"
+$QEMU_IO $IMG -c "zf 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo
+echo "(5) reset zones[1]"
+$QEMU_IO $IMG -c "zrs 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo
+echo "(6) append write with (4k, 8k) data" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Append write zones[0] one time:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[0] twice:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[1] one time:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Append write zones[1] twice:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo
+echo
+echo "case 2: test a sets of ops that works or not"
+
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+echo "wrote (4k, 4k):"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x3ffd000"
+echo "wrote to full:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M"
+$QEMU_IO $IMG -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0xc00 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0x2000 0x1000 0x1000"
+echo "wrote three zones:"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/t

[PATCH v3 3/4] qcow2: add zoned emulation capability

2023-08-28 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation support of zoned device using
a qcow2 file. The zoned device metadata includes zone type,
zoned device state and write pointer of each zone, which is stored
to an array of unsigned integers.
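
A minimal sketch of how one such 64-bit entry could be packed and unpacked,
assuming the layout suggested by the shifts in this patch (zone state in bits
60..63, a conventional-zone flag in bit 59, write pointer in the remaining low
bits). All names are hypothetical, and the helpers return uint64_t so that
large offsets are not narrowed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, read off the shifts in the patch; names are hypothetical. */
#define SKETCH_ZS_SHIFT  60
#define SKETCH_ZS_MASK   (0xFULL << SKETCH_ZS_SHIFT)
#define SKETCH_CONV_BIT  (1ULL << 59)
#define SKETCH_WP_MASK   ((1ULL << 59) - 1)

/* Write pointer offset with state and type bits cleared. */
static inline uint64_t sketch_get_wp(uint64_t entry)
{
    return entry & SKETCH_WP_MASK;
}

/* Zone state stored in the top four bits. */
static inline unsigned sketch_get_zs(uint64_t entry)
{
    return (unsigned)(entry >> SKETCH_ZS_SHIFT);
}

/* True if the entry describes a conventional zone. */
static inline bool sketch_is_conv(uint64_t entry)
{
    return (entry & SKETCH_CONV_BIT) != 0;
}

/* Return the entry with its state replaced, keeping type and offset. */
static inline uint64_t sketch_set_zs(uint64_t entry, unsigned zs)
{
    return (entry & ~SKETCH_ZS_MASK)
           | ((uint64_t)(zs & 0xF) << SKETCH_ZS_SHIFT);
}
```

Masking with named constants, rather than shifting left and right by a fixed
count, keeps the number of cleared bits tied to the layout definition.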

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
READ ONLY and OFFLINE states will generally be affected by device
internal events. The operations on zones cause corresponding state
changing.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones.
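
The resource limits can be pictured with a small sketch like the one below.
It assumes, as the checks in this patch do, that a limit of 0 means
"unlimited"; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for this sketch; not the patch's structures. */
typedef struct {
    uint32_t max_open_zones;    /* 0 means no limit */
    uint32_t max_active_zones;  /* 0 means no limit */
    uint32_t nr_exp_open;       /* explicitly open zones */
    uint32_t nr_imp_open;       /* implicitly open zones */
    uint32_t nr_closed;         /* closed zones */
} SketchZoneResources;

/* Active zones are those in implicit open, explicit open or closed state. */
static bool sketch_can_activate_zone(const SketchZoneResources *r)
{
    uint64_t active = (uint64_t)r->nr_exp_open + r->nr_imp_open
                      + r->nr_closed;

    return r->max_active_zones == 0 || active < r->max_active_zones;
}

/* Open zones are those in implicit open or explicit open state. */
static bool sketch_can_open_zone(const SketchZoneResources *r)
{
    uint64_t open = (uint64_t)r->nr_exp_open + r->nr_imp_open;

    return r->max_open_zones == 0 || open < r->max_open_zones;
}
```

A write to an EMPTY zone would consult the active-zone check, while opening a
CLOSED zone would consult the open-zone check.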

Signed-off-by: Sam Li 
---
 block/qcow2.c  | 657 -
 block/qcow2.h  |   2 +
 block/trace-events |   1 +
 docs/interop/qcow2.txt |   6 +
 4 files changed, 664 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7074bfc620..bc98d98c8e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,153 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
+
+static inline int qcow2_get_wp(uint64_t wp)
+{
+/* clear state and type information */
+return ((wp << 5) >> 5);
+}
+
+static inline int qcow2_get_zs(uint64_t wp)
+{
+return (wp >> 60);
+}
+
+static inline void qcow2_set_zs(uint64_t *wp, BlockZoneState zs)
+{
+uint64_t addr = qcow2_get_wp(*wp);
+addr |= ((uint64_t)zs << 60);
+*wp = addr;
+}
+
+/*
+ * Perform a state assignment and a flush operation that writes the new wp
+ * value to the dedicated location of the disk file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index, BlockZoneState zs) {
+BDRVQcow2State *s = bs->opaque;
+uint64_t wpv = *wp;
+int ret;
+
+qcow2_set_zs(wp, zs);
+ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+
+if (ret < 0) {
+goto exit;
+}
+trace_qcow2_wp_tracking(index, qcow2_get_wp(*wp) >> BDRV_SECTOR_BITS);
+return ret;
+
+exit:
+*wp = wpv;
+error_report("Failed to write metadata with file");
+return ret;
+}
+
+static bool qcow2_check_active_zones(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+
+if (!s->zoned_header.max_active_zones) {
+return true;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+< s->zoned_header.max_active_zones) {
+return true;
+}
+
+return false;
+}
+
+static int qcow2_check_open_zones(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+if (!s->zoned_header.max_open_zones) {
+return 0;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open
+< s->zoned_header.max_open_zones) {
+return 0;
+}
+
+if(s->nr_zones_imp_open && qcow2_check_active_zones(bs)) {
+/* TODO: it takes O(n) time complexity (n = nr_zones).
+ * Optimizations required. */
+/* close one implicitly open zones to make it available */
+for (int i = s->zoned_header.nr_conv_zones;
+i < bs->bl.nr_zones; ++i) {
+uint64_t *wp = &bs->wps->wp[i];
+if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
+ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
+if (ret < 0) {
+return ret;
+}
+bs->wps->wp[i] = *wp;
+s->nr_zones_imp_open--;
+s->nr_zones_closed++;
+break;
+}
+}
+return 0;
+}
+
+return -EINVAL;
+}
+
+/*
+ * The zoned device has limited zone resources of open, closed, active
+ * zones.
+ */
+static int qcow2_check_zone_resources(BlockDriverState *bs,
+  BlockZoneState zs)
+{
+int ret = 0;
+
+switch (zs) {
+case BLK_ZS_EMPTY:
+if (!qcow2_check_active_zones(bs)) {
+error_report("No enough active zones");
+return -EINVAL;
+}
+return ret;
+case BLK_ZS_CLOSED:
+ret = qcow2_check_open_zones(bs);
+if (ret < 0) {
+error_report("No enough open zones");
+return ret;
+}
+return ret;
+default:
+return -EINVAL;
+}
+
+}
+
+static inline int qcow2_refresh_zonedmeta(BlockDriverState *bs)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+uint64_t wps_size = s->zoned_header.zonedmeta_size;
+g_autofree uint64_t *temp = NULL;
+temp = g_new(uint64_t, wps_size);
+

[PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
To configure the zoned format feature on the qcow2 driver, it
requires following arguments: the device size, zoned profile,
zone model, zone size, zone capacity, number of conventional
zones, limits on zone resources (max append sectors, max open
zones, and max_active_zones).

To create a qcow2 file with zoned format, use command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=1

Signed-off-by: Sam Li 
---
 block/qcow2.c| 176 ++-
 block/qcow2.h|  20 
 docs/interop/qcow2.txt   |  36 +++
 include/block/block_int-common.h |  13 +++
 qapi/block-core.json |  30 +-
 5 files changed, 273 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..7074bfc620 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -431,6 +433,55 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len != sizeof(zoned_ext)) {
+error_setg(errp, "zoned_ext: Invalid extension length");
+return -EINVAL;
+}
+ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+return ret;
+}
+
+zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.nr_conv_zones = be32_to_cpu(zoned_ext.nr_conv_zones);
+zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+zoned_ext.max_active_zones =
+be32_to_cpu(zoned_ext.max_active_zones);
+zoned_ext.max_append_sectors =
+be32_to_cpu(zoned_ext.max_append_sectors);
+s->zoned_header = zoned_ext;
+
+/* refuse to open broken images */
+if (zoned_ext.zone_size == 0) {
+error_setg(errp, "Zoned extension header zone_size field "
+ "can not be 0");
+return -EINVAL;
+}
+if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
+error_setg(errp, "Zoned extension header zone_capacity field "
+ "can not be larger that zone_size field");
+return -EINVAL;
+}
+if (zoned_ext.nr_zones != DIV_ROUND_UP(
+bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
+error_setg(errp, "Zoned extension header nr_zones field "
+ "gets wrong");
+return -EINVAL;
+}
+
+#ifdef DEBUG_EXT
+printf("Qcow2: Got zoned format extension: "
+   "offset=%" PRIu32 "\n", offset);
+#endif
+break;
+}
+
 default:
 /* unknown magic - save it in case we need to rewrite the header */
 /* If you add a new feature, make sure to also update the fast
@@ -1967,6 +2018,14 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 }
 bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
 bs->bl.pdiscard_alignment = s->cluster_size;
+bs->bl.zoned = s->zoned_header.zoned;
+bs->bl.nr_zones = s->zoned_header.nr_zones;
+bs->wps = s->wps;
+bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
+bs->bl.max_active_zones = s->zoned_header.max_active_zones;
+bs->bl.max_open_zones = s->zoned_header.max_open_zones;
+bs->bl.zone_size = s->zoned_header.zone_size;
+bs->bl.write_granularity = BDRV_SECTOR_SIZE;
 }
 
 static int qcow2_reopen_prepare(BDRVReopenState *state,
@@ -3089,6 +3148,30 @@ int qcow2_update_header(BlockDriverState *bs)
 buflen -= ret;
 }
 
+   

[PATCH v3 0/4] Add full zoned storage emulation to qcow2 driver

2023-08-28 Thread Sam Li
This patch series adds a new extension, the zoned format, to the
qcow2 driver, allowing full zoned storage emulation on a qcow2
image file. Users can attach such a qcow2 file to the guest as a
zoned device.

To create a qcow2 file with zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zone_model=1

Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \

v2->v3:
- drop zoned_profile option [Klaus]
- reformat doc comments of qcow2 [Markus]
- add input validation and checks for zoned information [Stefan]
- code style: format, comments, documentation, naming [Stefan]
- add tracing function for wp tracking [Stefan]
- reconstruct io path in check_zone_resources [Stefan]

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_meta allocation size
  * use bitwise OR rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 831 ++-
 block/qcow2.h|  22 +
 block/trace-events   |   1 +
 docs/interop/qcow2.txt   |  42 ++
 docs/system/qemu-block-drivers.rst.inc   |  39 ++
 include/block/block_int-common.h |  13 +
 qapi/block-core.json |  30 +-
 tests/qemu-iotests/tests/zoned-qcow2 | 135 
 tests/qemu-iotests/tests/zoned-qcow2.out | 140 
 9 files changed, 1250 insertions(+), 3 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




[PATCH v3 1/4] docs/qcow2: add the zoned format feature

2023-08-28 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver.
Given the zoned information, a qcow2 file can be treated as a zoned
device and passed through to the guest by a virtio-blk device or an
NVMe ZNS device.

Signed-off-by: Sam Li 
---
 docs/system/qemu-block-drivers.rst.inc | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..640ab151a7 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,45 @@ This section describes each format and the options that are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zoned
+The zoned interface of zoned storage devices can take different forms,
+which are referred to as models. This option uses a number to represent
+the model: 1 for host-managed and 0 for non-zoned.
+
+  .. option:: zone_size
+
+The size of a zone of the zoned device in bytes. The device is divided
+into zones of this size with the exception of the last zone, which may
+be smaller.
+
+  .. option:: zone_capacity
+
+The initial capacity value for all zones. The capacity must be less than
+or equal to zone size. If the last zone is smaller, then its capacity is
+capped. A device that follows the ZBC protocol tends to have a zone
+capacity equal to its zone size.
+
+The zone capacity is per zone and may be different between zones in real
+devices. For simplicity, QCow2 emulation is limited to the same zone
+capacity for all zones.
+
+  .. option:: zone_nr_conv
+
+The number of conventional zones of the zoned device.
+
+  .. option:: max_open_zones
+
+The maximal allowed open zones.
+
+  .. option:: max_active_zones
+
+The limit of the zones with implicit open, explicit open or closed state.
+
+  .. option:: max_append_sectors
+
+The maximal sectors in 512B blocks that is allowed to append to zones
+while writing.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
Stefan Hajnoczi wrote on Mon, Aug 21, 2023 at 21:31:
>
> On Mon, Aug 14, 2023 at 04:58:00PM +0800, Sam Li wrote:
> > diff --git a/block/qcow2.h b/block/qcow2.h
> > index f789ce3ae0..3694c8d217 100644
> > --- a/block/qcow2.h
> > +++ b/block/qcow2.h
> > @@ -236,6 +236,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> >  uint64_t length;
> >  } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >
> > +typedef struct Qcow2ZonedHeaderExtension {
> > +/* Zoned device attributes */
> > +uint8_t zoned_profile;
> > +uint8_t zoned;
> > +uint16_t reserved16;
> > +uint32_t zone_size;
> > +uint32_t zone_capacity;
>
> Should zone capacity be stored individually for each zone (alongside the
> write pointer and other per zone metadata) instead of as a global value
> for all zones? My understanding is that NVMe ZNS does not have a global
> value and each zone could have a different zone capacity value.
>
> > +uint32_t nr_zones;
>
> Is this field necessary since it can be derived from other image
> options: nr_zones = DIV_ROUND_UP(total_length, zone_capacity)?

Yes. bs->total_sectors is still zero when refresh_limits runs, so
nr_zones cannot be derived there. Keeping a persistent nr_zones lets us
assign the right value instead of zero.

The process is roughly like this:
*_qcow2_create: calculate nr_zones and write it to zoned_header
->  *_qcow2_update_header: update nr_zones
->  *_qcow2_read_extensions: read nr_zones from zoned_header into
Qcow2State and check that it is right (the total size is valid here)
  -> *_refresh_limits(): set bl.nr_zones to zoned_header.nr_zones
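
The calculation and the later consistency check can be sketched as below.
This is an illustration only: the function names are hypothetical, and the
divisor is the zone size, matching the DIV_ROUND_UP check in
qcow2_read_extensions in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Number of zones needed to cover total_bytes, rounding up so that a
 * smaller last zone is still counted. Returns 0 for an invalid zone size.
 */
static uint64_t sketch_nr_zones(uint64_t total_bytes, uint64_t zone_size)
{
    if (zone_size == 0) {
        return 0;
    }
    return total_bytes / zone_size + (total_bytes % zone_size != 0);
}

/*
 * Check a stored nr_zones value against the image size, as would be done
 * once the total size is known.
 */
static bool sketch_nr_zones_valid(uint32_t stored_nr_zones,
                                  uint64_t total_bytes,
                                  uint64_t zone_size)
{
    return zone_size != 0 &&
           stored_nr_zones == sketch_nr_zones(total_bytes, zone_size);
}
```

For the 768M image with 64M zones used in the examples of this series, the
result is 12 zones.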

Sam



Re: [PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-28 Thread Sam Li
Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 03:48:
>
> On Mon, Aug 14, 2023 at 04:58:01PM +0800, Sam Li wrote:
> > By adding zone operations and zoned metadata, the zoned emulation
> > capability enables full emulation support of zoned device using
> > a qcow2 file. The zoned device metadata includes zone type,
> > zoned device state and write pointer of each zone, which is stored
> > to an array of unsigned integers.
> >
> > Each zone of a zoned device makes state transitions following
> > the zone state machine. The zone state machine mainly describes
> > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> > READ ONLY and OFFLINE states will generally be affected by device
> > internal events. The operations on zones cause corresponding state
> > changing.
> >
> > Zoned devices have a limit on zone resources, which puts constraints on
> > write operations into zones.
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/qcow2.c  | 676 -
> >  block/qcow2.h  |   2 +
> >  docs/interop/qcow2.txt |   2 +
> >  3 files changed, 678 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index c1077c4a4a..5ccf79cbe7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
> >  return cryptoopts_qdict;
> >  }
> >
> > +#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
> > +
> > +static inline int qcow2_get_wp(uint64_t wp)
> > +{
> > +/* clear state and type information */
> > +return ((wp << 5) >> 5);
> > +}
> > +
> > +static inline int qcow2_get_zs(uint64_t wp)
> > +{
> > +return (wp >> 60);
> > +}
> > +
> > +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
> > +{
> > +uint64_t addr = qcow2_get_wp(*wp);
> > +addr |= ((uint64_t)zs << 60);
> > +*wp = addr;
> > +}
> > +
> > +/*
> > + * File wp tracking: reset zone, finish zone and append zone can
> > + * change the value of write pointer. All zone operations will change
> > + * the state of that/those zone.
> > + * */
> > +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
> > +/* format: operations, the wp. */
> > +printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
> > +}
> > +
> > +/*
> > + * Perform a state assignment and a flush operation that writes the new wp
> > + * value to the dedicated location of the disk file.
> > + */
> > +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
> > + uint32_t index, BlockZoneState zs) {
> > +BDRVQcow2State *s = bs->opaque;
> > +int ret;
> > +
> > +qcow2_set_wp(wp, zs);
> > +ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
> > ++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
> > +
> > +if (ret < 0) {
> > +goto exit;
>
> Should *wp be restored to its original value to undo the effect of
> qcow2_set_wp()?
>
> > +}
> > +qcow2_wp_tracking_helper(index, *wp);
> > +return ret;
> > +
> > +exit:
> > +error_report("Failed to write metadata with file");
> > +return ret;
> > +}
> > +
> > +static int qcow2_check_active(BlockDriverState *bs)
>
> Please rename this to qcow2_check_active_zones() to avoid confusion with
> other uses "active" in qcow2.
>
> > +{
> > +BDRVQcow2State *s = bs->opaque;
> > +
> > +if (!s->zoned_header.max_active_zones) {
> > +return 0;
> > +}
> > +
> > +if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
> > +< s->zoned_header.max_active_zones) {
> > +return 0;
> > +}
> > +
> > +return -1;
> > +}
>
> (This function could return a bool instead of 0/-1 since it doesn't
> really need an int.)
>
> > +
> > +static int qcow2_check_open(BlockDriverState *bs)
>
> qcow2_check_open_zones() or, even better, qcow2_can_open_zone().
>
> > +{
> > +BDRVQcow2State *s = bs->opaque;
> > +int ret;
> > +
> > +if (!s->zoned_header.max_open_zones) {
> > +return 0;
> > +}
> > +
> > +if (s->nr_zones_exp_open + s->nr_zones_imp_open
>

Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
Damien Le Moal wrote on Mon, Aug 28, 2023 at 18:22:
>
> On 8/28/23 19:18, Sam Li wrote:
> > Damien Le Moal wrote on Mon, Aug 28, 2023 at 18:13:
> >>
> >> On 8/28/23 18:22, Sam Li wrote:
> >>> Stefan Hajnoczi wrote on Mon, Aug 21, 2023 at 21:31:
> >>>>
> >>>> On Mon, Aug 14, 2023 at 04:58:00PM +0800, Sam Li wrote:
> >>>>> diff --git a/block/qcow2.h b/block/qcow2.h
> >>>>> index f789ce3ae0..3694c8d217 100644
> >>>>> --- a/block/qcow2.h
> >>>>> +++ b/block/qcow2.h
> >>>>> @@ -236,6 +236,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> >>>>>  uint64_t length;
> >>>>>  } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >>>>>
> >>>>> +typedef struct Qcow2ZonedHeaderExtension {
> >>>>> +/* Zoned device attributes */
> >>>>> +uint8_t zoned_profile;
> >>>>> +uint8_t zoned;
> >>>>> +uint16_t reserved16;
> >>>>> +uint32_t zone_size;
> >>>>> +uint32_t zone_capacity;
> >>>>
> >>>> Should zone capacity be stored individually for each zone (alongside the
> >>>> write pointer and other per zone metadata) instead of as a global value
> >>>> for all zones? My understanding is that NVMe ZNS does not have a global
> >>>> value and each zone could have a different zone capacity value.
> >>>
> >>> Although zone capacity is a per-zone attribute, it stays the same for
> >>> all zones in most cases. According to the NVMe ZNS spec, zone capacity
> >>> can change on a RESET_ZONE op when the Variable Zone Capacity bit is
> >>> '1'. The spec does not say what it changes to. The current ZNS
> >>> emulation does not change zone capacity either.
> >>>
> >>> If the Variable Zone Capacity bit is cleared to ‘0’ in the Zone
> >>> Operation Characteristics field in the Zoned Namespace Command Set
> >>> specific Identify Namespace data structure, then this field does not
> >>> change without a change to the format of the zoned namespace.
> >>>
> >>> If the Variable Zone Capacity bit is set to ‘1’ in the Zone Operation
> >>> Characteristics field in the Zoned Namespace Command Set specific
> >>> Identify Namespace data structure, then the zone capacity may change
> >>> upon successful completion of a Zone Management Send command
> >>> specifying the Zone Send Action of Reset Zone.
> >>
> >> Regardless of the variable zone capacity feature, zone capacity is per
> >> zone and may be different between zones. That is why it is reported per
> >> zone in zone report. The IO path code should not assume that the zone
> >> capacity is the same for all zones.
> >
> > How is zone capacity changed, by devices or by commands? Can you give
> > an example, please?
>
> If the device does not support variable zone capacity, the zone capacity is
> fixed at device manufacturing time and never changes. It is reported per zone
> and you have to make things work with whatever value you see. The user cannot
> change device zone capacity.
>
> For your qcow2 zoned image, the equivalent is to fix the zone capacity when the
> image is created and not allow it to be changed. And for simplicity, the same
> zone capacity value can be used for all zones, so having the zone capacity
> value in the header is OK.

Thanks!

>
> >
> >>
> >> For this particular case though, given that this is QCow2 emulation,
> >> limiting ourselves to the same zone capacity for all zones is I think
> >> fine. But that should be clearly stated somewhere maybe...
> >
> > I see. The qcow2 documentation can state that.
> >
> >>
> >>>
> >>>>
> >>>>> +uint32_t nr_zones;
> >>>>
> >>>> Is this field necessary since it can be derived from other image
> >>>> options: nr_zones = DIV_ROUND_UP(total_length, zone_capacity)?
> >>>
> >>> It can be dropped. I added this for reducing duplication. Thanks!
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
> >>
>
> --
> Damien Le Moal
> Western Digital Research
>



Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
Damien Le Moal wrote on Mon, Aug 28, 2023 at 18:13:
>
> On 8/28/23 18:22, Sam Li wrote:
> > Stefan Hajnoczi wrote on Mon, Aug 21, 2023 at 21:31:
> >>
> >> On Mon, Aug 14, 2023 at 04:58:00PM +0800, Sam Li wrote:
> >>> diff --git a/block/qcow2.h b/block/qcow2.h
> >>> index f789ce3ae0..3694c8d217 100644
> >>> --- a/block/qcow2.h
> >>> +++ b/block/qcow2.h
> >>> @@ -236,6 +236,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> >>>  uint64_t length;
> >>>  } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >>>
> >>> +typedef struct Qcow2ZonedHeaderExtension {
> >>> +/* Zoned device attributes */
> >>> +uint8_t zoned_profile;
> >>> +uint8_t zoned;
> >>> +uint16_t reserved16;
> >>> +uint32_t zone_size;
> >>> +uint32_t zone_capacity;
> >>
> >> Should zone capacity be stored individually for each zone (alongside the
> >> write pointer and other per zone metadata) instead of as a global value
> >> for all zones? My understanding is that NVMe ZNS does not have a global
> >> value and each zone could have a different zone capacity value.
> >
> > Although zone capacity is a per-zone attribute, it stays the same for
> > all zones in most cases. According to the NVMe ZNS spec, zone capacity
> > can change on a RESET_ZONE op when the Variable Zone Capacity bit is
> > '1'. The spec does not say what it changes to. The current ZNS
> > emulation does not change zone capacity either.
> >
> > If the Variable Zone Capacity bit is cleared to ‘0’ in the Zone
> > Operation Characteristics field in the Zoned Namespace Command Set
> > specific Identify Namespace data structure, then this field does not
> > change without a change to the format of the zoned namespace.
> >
> > If the Variable Zone Capacity bit is set to ‘1’ in the Zone Operation
> > Characteristics field in the Zoned Namespace Command Set specific
> > Identify Namespace data structure, then the zone capacity may change
> > upon successful completion of a Zone Management Send command
> > specifying the Zone Send Action of Reset Zone.
>
> Regardless of the variable zone capacity feature, zone capacity is per zone
> and may be different between zones. That is why it is reported per zone in
> zone report. The IO path code should not assume that the zone capacity is the
> same for all zones.

How is zone capacity changed, by devices or by commands? Can you give
an example, please?

>
> For this particular case though, given that this is QCow2 emulation, limiting
> ourselves to the same zone capacity for all zones is I think fine. But that
> should be clearly stated somewhere may be...

I see. The qcow2 documentation can state that.

>
> >
> >>
> >>> +uint32_t nr_zones;
> >>
> >> Is this field necessary since it can be derived from other image
> >> options: nr_zones = DIV_ROUND_UP(total_length, zone_capacity)?
> >
> > It can be dropped. I added this for reducing duplication. Thanks!
>
> --
> Damien Le Moal
> Western Digital Research
>



Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
Stefan Hajnoczi wrote on Mon, Aug 21, 2023 at 21:31:
>
> On Mon, Aug 14, 2023 at 04:58:00PM +0800, Sam Li wrote:
> > diff --git a/block/qcow2.h b/block/qcow2.h
> > index f789ce3ae0..3694c8d217 100644
> > --- a/block/qcow2.h
> > +++ b/block/qcow2.h
> > @@ -236,6 +236,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> >  uint64_t length;
> >  } QEMU_PACKED Qcow2CryptoHeaderExtension;
> >
> > +typedef struct Qcow2ZonedHeaderExtension {
> > +/* Zoned device attributes */
> > +uint8_t zoned_profile;
> > +uint8_t zoned;
> > +uint16_t reserved16;
> > +uint32_t zone_size;
> > +uint32_t zone_capacity;
>
> Should zone capacity be stored individually for each zone (alongside the
> write pointer and other per zone metadata) instead of as a global value
> for all zones? My understanding is that NVMe ZNS does not have a global
> value and each zone could have a different zone capacity value.

Although zone capacity is a per-zone attribute, it stays the same for
all zones in most cases. According to the NVMe ZNS spec, zone capacity
can change on a RESET_ZONE op when the Variable Zone Capacity bit is
'1'. The spec does not say what it changes to. The current ZNS
emulation does not change zone capacity either.

If the Variable Zone Capacity bit is cleared to ‘0’ in the Zone
Operation Characteristics field in the Zoned Namespace Command Set
specific Identify Namespace data structure, then this field does not
change without a change to the format of the zoned namespace.

If the Variable Zone Capacity bit is set to ‘1’ in the Zone Operation
Characteristics field in the Zoned Namespace Command Set specific
Identify Namespace data structure, then the zone capacity may change
upon successful completion of a Zone Management Send command
specifying the Zone Send Action of Reset Zone.

>
> > +uint32_t nr_zones;
>
> Is this field necessary since it can be derived from other image
> options: nr_zones = DIV_ROUND_UP(total_length, zone_capacity)?

It can be dropped. I added this for reducing duplication. Thanks!



Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-28 Thread Sam Li
Markus Armbruster wrote on Mon, Aug 21, 2023 at 21:13:
>
> Sam Li  writes:
>
> > To configure the zoned format feature on the qcow2 driver, it
> > requires following arguments: the device size, zoned profile,
> > zoned model, zone size, zone capacity, number of conventional
> > zones, limits on zone resources (max append sectors, max open
> > zones, and max_active_zones). The zoned profile option is set
> > to zns when using the qcow2 file as a ZNS drive.
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> >  -o zoned_profile=zbc/zns
> >
> > Signed-off-by: Sam Li 
>
> [...]
>
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 2b1d493d6e..0c97ae678b 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -5020,24 +5020,42 @@
> >  #
> >  # @compression-type: The image cluster compression method
> >  # (default: zlib, since 5.1)
> > +# @zoned-profile: Two zoned device protocol options, zbc or zns
> > +# (default: off, since 8.0)
>
> When a 'str' thing accepts a fixed set of (string) values, it most
> likely should be an enum instead.  Have you considered making
> @zoned-profile one?
>
> > +# @zone-size: The size of a zone of the zoned device (since 8.0)
> > +# @zone-capacity: The capacity of a zone of the zoned device (since 8.0)
>
> In bytes, I presume?

Yes.

>
> What's the difference between size and capacity?
>

Zone size is the total number of logical blocks in a zone, expressed in
bytes. Zone capacity is the number of usable logical blocks in a zone,
also in bytes. Zone capacity is always smaller than or equal to zone
size. Under the ZBC/ZAC standards, zone capacity equals zone size,
while in the ZNS spec it can be smaller. I will add this, and the
explanation below, to the documentation in the next patches.
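The relation between size and capacity can be sketched as a small check. This is only an illustration under the assumptions stated above (capacity <= size, and capacity == size for ZBC/ZAC); zone_geometry_valid() and its "zns" flag are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate zone size and zone capacity (both in bytes).
 * "zns" selects the ZNS rules, where capacity may be smaller than size;
 * otherwise the ZBC/ZAC rule capacity == size is assumed.
 */
static bool zone_geometry_valid(uint64_t zone_size, uint64_t zone_capacity,
                                bool zns)
{
    if (zone_size == 0 || zone_capacity == 0) {
        return false;
    }
    if (zone_capacity > zone_size) {
        return false; /* usable space cannot exceed the zone itself */
    }
    if (!zns && zone_capacity != zone_size) {
        return false; /* ZBC/ZAC: capacity equals size */
    }
    return true;
}
```

For example, a 64M zone with a 48M capacity would pass only with the ZNS rules.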

> > +# @zone-nr-conv: The number of conventional zones of the zoned device
> > +#(since 8.0)
> > +# @max-open-zones: The maximal allowed open zones (since 8.0)
> > +# @max-active-zones: The limit of the zones that have the implicit open,
> > +#explicit open or closed state (since 8.0)
>
> Naming...  if I understand the comment correctly, then @zone-nr-conv,
> @max-open-zones, and @max-active-zones are all counting zones.  Rename
> @zone-nr-conv to @conventional-zones?
>
> > +# @max-append-sectors: The maximal sectors that is allowed to append write
>
> I'm not sure I understand the explanation.  Elaborate for me?

max_append_sectors is the maximum data size (in sectors) of a zone
append request that can be successfully issued to the device. It is a
constraint on the maximum amount of data that can be appended to a
zone in a single request.
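As a rough illustration of that constraint, a sketch assuming 512-byte sectors; SKETCH_SECTOR_SIZE and zone_append_size_ok() are hypothetical names, not code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sectors are 512 bytes. */
#define SKETCH_SECTOR_SIZE 512u

/*
 * Hypothetical check: does a zone append request of "bytes" bytes fit within
 * the max_append_sectors limit? Partial sectors are rounded up.
 */
static bool zone_append_size_ok(uint64_t bytes, uint32_t max_append_sectors)
{
    uint64_t sectors = bytes / SKETCH_SECTOR_SIZE;

    if (bytes % SKETCH_SECTOR_SIZE) {
        sectors++;
    }
    return sectors <= max_append_sectors;
}
```

With max_append_sectors=512 as in the qemu-img example, a single append could carry at most 256 KiB under this assumption.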

>
> > +#  (since 8.0)
>
> Please format like
>
>#
># @zoned-profile: Two zoned device protocol options, zbc or zns
># (default: off, since 8.0)
>#
># @zone-size: The size of a zone of the zoned device (since 8.0)
>#
># @zone-capacity: The capacity of a zone of the zoned device
># (since 8.0)
>#
># @zone-nr-conv: The number of conventional zones of the zoned device
># (since 8.0)
>#
># @max-open-zones: The maximal allowed open zones (since 8.0)
>#
># @max-active-zones: The limit of the zones that have the implicit
># open, explicit open or closed state (since 8.0)
>#
># @max-append-sectors: The maximal sectors that is allowed to append
># write (since 8.0)
>
> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
> to conform to current conventions).
>
> >  #
> >  # Since: 2.12
> >  ##
> >  { 'struct': 'BlockdevCreateOptionsQcow2',
> > -  'data': { 'file': 'BlockdevRef',
> > -'*data-file':   'BlockdevRef',
> > -'*data-file-raw':   'bool',
> > -'*extended-l2': 'bool',
> > -'size': 'size',
> > -'*version': 'BlockdevQcow2Version',
> > -'*backing-file':'str',
> > -'*backing-fmt': 'BlockdevDriver',
> > -'*encrypt': 'QCryptoBlockCreateOptions',
> > -'*cluster-size':'size',
> > -'*preallocation':   'PreallocMode',
> > -'*lazy-refcounts':  'bool',
> > -'*refcount-bits':   'int',
> > -'*compressi

[PATCH v2] block/file-posix: fix update_zones_wp() caller

2023-08-24 Thread Sam Li
When a zoned request fails, only the wp of the target zones needs to
be updated, so that in-flight writes on other zones are not
disrupted. The wp is updated successfully after the request completes.

Fix the callers to pass the right offset and nr_zones.

Signed-off-by: Sam Li 
---
 block/file-posix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b16e9c21a1..55e7f06a2f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2522,7 +2522,8 @@ out:
 }
 } else {
 if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
-update_zones_wp(bs, s->fd, 0, 1);
+/* write and append write are not allowed to cross zone boundaries */
+update_zones_wp(bs, s->fd, offset, 1);
 }
 }
 
@@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 len >> BDRV_SECTOR_BITS);
 ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, );
 if (ret != 0) {
-update_zones_wp(bs, s->fd, offset, i);
+update_zones_wp(bs, s->fd, offset, nrz);
 error_report("ioctl %s failed %d", op_name, ret);
 return ret;
 }
-- 
2.40.1




Re: [PATCH] block/file-posix: fix update_zones_wp() caller

2023-08-24 Thread Sam Li
Damien Le Moal wrote on Fri, Aug 25, 2023 at 11:32:
>
> On 8/25/23 12:05, Sam Li wrote:
> > Damien Le Moal wrote on Fri, Aug 25, 2023 at 07:49:
> >>
> >> On 8/25/23 02:39, Sam Li wrote:
> >>> When the zoned requests that may change wp fail, it needs to
> >>> update only wps of the zones within the range of the requests
> >>> for not disrupting the other in-flight requests. The wp is updated
> >>> successfully after the request completes.
> >>>
> >>> Fixed the callers with right offset and nr_zones.
> >>>
> >>> Signed-off-by: Sam Li 
> >>> ---
> >>>  block/file-posix.c | 5 +++--
> >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block/file-posix.c b/block/file-posix.c
> >>> index b16e9c21a1..22559d6c2d 100644
> >>> --- a/block/file-posix.c
> >>> +++ b/block/file-posix.c
> >>> @@ -2522,7 +2522,8 @@ out:
> >>>  }
> >>>  } else {
> >>>  if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> >>> -update_zones_wp(bs, s->fd, 0, 1);
> >>> +update_zones_wp(bs, s->fd, offset,
> >>> +ROUND_UP(bytes, bs->bl.zone_size));
> >>
> >> Write and zone append operations are not allowed to cross zone
> >> boundaries, so the number of zones should always be 1. The above changes
> >> a number of zones to a number of bytes, which seems wrong. The correct
> >> fix is I think:
> >>
> >> update_zones_wp(bs, s->fd, offset, 1);
> >>
> >
> > I see. I forgot this constraint.
> >
> >>>  }
> >>>  }
> >>>
> >>> @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >>>  len >> BDRV_SECTOR_BITS);
> >>>  ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, );
> >>>  if (ret != 0) {
> >>> -update_zones_wp(bs, s->fd, offset, i);
> >>> +update_zones_wp(bs, s->fd, offset, nrz);
> >>
> >> Same here. Why would you need to update all zones' wp? This will affect
> >> zones that do not have a write error and potentially change their
> >> correct in-memory wp to a wrong value. I think this also should be:
> >>
> >>update_zones_wp(bs, s->fd, offset, 1);
> >>
> >
> > Is update_zones_wp for cancelling the writes on invalid zones or
> > updating corrupted write pointers caused by caller (write, append or
> > zone_mgmt)?
> >
> > My thought is based on the latter. Zone_mgmt can manage multiple zones
> > with a single request. When the request fails, it's hard to tell which
> > zone is corrupted. The relation between the req (zone_mgmt) and
> > update_zones_wp is: if req succeeds, no updates; if req fails,
> > consider the req never happens and do again.
>
> You should update the wp of the zones that were touched by the operation that
> failed. No other zone should have its wp updated as that could cause
> corruptions of the wp if there are on-going writes on these other zones.
>
> so the call should be "update_zones_wp(bs, s->fd, offset, n);"
>
> with n being the number of zones that the operation targeted.

Yes, so it's nrz in zone_mgmt. Thanks!
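To make the "only the touched zones" rule concrete, a small sketch. It is not the update_zones_wp() implementation; refresh_wps() is a hypothetical helper that copies write pointers from a freshly reported array into the cached one, for nrz zones starting at the zone containing offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: refresh the cached write pointers of the "nrz" zones
 * starting at the zone that contains "offset". Zones outside that range are
 * left alone, so in-flight writes on them are not disturbed.
 */
static void refresh_wps(uint64_t *cached, const uint64_t *reported,
                        uint64_t zone_size, uint64_t offset, unsigned nrz)
{
    uint64_t first = offset / zone_size;
    unsigned i;

    for (i = 0; i < nrz; i++) {
        cached[first + i] = reported[first + i];
    }
}
```

For a write or zone append, nrz would be 1; for zone_mgmt it would be the number of zones covered by the request.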

>
> >
> > If the former is right, then it assumes only the first zone may
> > contain an error. I am not sure it's right.
> >
> >>>  error_report("ioctl %s failed %d", op_name, ret);
> >>>  return ret;
> >>>  }
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
> >>
>
> --
> Damien Le Moal
> Western Digital Research
>



Re: [PATCH] block/file-posix: fix update_zones_wp() caller

2023-08-24 Thread Sam Li
Damien Le Moal wrote on Fri, Aug 25, 2023 at 07:49:
>
> On 8/25/23 02:39, Sam Li wrote:
> > When the zoned requests that may change wp fail, it needs to
> > update only wps of the zones within the range of the requests
> > for not disrupting the other in-flight requests. The wp is updated
> > successfully after the request completes.
> >
> > Fixed the callers with right offset and nr_zones.
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/file-posix.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index b16e9c21a1..22559d6c2d 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2522,7 +2522,8 @@ out:
> >  }
> >  } else {
> >  if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> > -update_zones_wp(bs, s->fd, 0, 1);
> > +update_zones_wp(bs, s->fd, offset,
> > +ROUND_UP(bytes, bs->bl.zone_size));
>
> Write and zone append operations are not allowed to cross zone boundaries,
> so the number of zones should always be 1. The above changes a number of
> zones to a number of bytes, which seems wrong. The correct fix is I think:
>
> update_zones_wp(bs, s->fd, offset, 1);
>

I see. I forgot this constraint.

> >  }
> >  }
> >
> > @@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >  len >> BDRV_SECTOR_BITS);
> >  ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, );
> >  if (ret != 0) {
> > -update_zones_wp(bs, s->fd, offset, i);
> > +update_zones_wp(bs, s->fd, offset, nrz);
>
> Same here. Why would you need to update all zones' wp? This will affect zones
> that do not have a write error and potentially change their correct in-memory
> wp to a wrong value. I think this also should be:
>
>update_zones_wp(bs, s->fd, offset, 1);
>

Is update_zones_wp for cancelling the writes on invalid zones or
updating corrupted write pointers caused by caller (write, append or
zone_mgmt)?

My thinking is based on the latter. Zone_mgmt can manage multiple zones
with a single request. When the request fails, it's hard to tell which
zone is corrupted. The relation between the req (zone_mgmt) and
update_zones_wp is: if the req succeeds, no updates; if the req fails,
consider that the req never happened and do it again.

If the former is right, then it assumes only the first zone may
contain an error. I am not sure it's right.

> >  error_report("ioctl %s failed %d", op_name, ret);
> >  return ret;
> >  }
>
> --
> Damien Le Moal
> Western Digital Research
>



[PATCH] block/file-posix: fix update_zones_wp() caller

2023-08-24 Thread Sam Li
When a zoned request that may change the wp fails, only the wps of
the zones within the range of the request need to be updated, so
that other in-flight requests are not disrupted. The wp is updated
successfully after the request completes.

Fix the callers to pass the right offset and nr_zones.

Signed-off-by: Sam Li 
---
 block/file-posix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b16e9c21a1..22559d6c2d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2522,7 +2522,8 @@ out:
 }
 } else {
 if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
-update_zones_wp(bs, s->fd, 0, 1);
+update_zones_wp(bs, s->fd, offset,
+ROUND_UP(bytes, bs->bl.zone_size));
 }
 }
 
@@ -3472,7 +3473,7 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 len >> BDRV_SECTOR_BITS);
 ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, );
 if (ret != 0) {
-update_zones_wp(bs, s->fd, offset, i);
+update_zones_wp(bs, s->fd, offset, nrz);
 error_report("ioctl %s failed %d", op_name, ret);
 return ret;
 }
-- 
2.40.1




Re: [PATCH 4/5] file-posix: Simplify raw_co_prw's 'out' zone code

2023-08-24 Thread Sam Li
Hanna Czenczek wrote on Thu, Aug 24, 2023 at 23:53:
>
> We duplicate the same condition three times here, pull it out to the top
> level.
>
> Signed-off-by: Hanna Czenczek 
> ---
>  block/file-posix.c | 18 +-
>  1 file changed, 5 insertions(+), 13 deletions(-)

Reviewed-by: Sam Li 



Re: [PATCH 0/5] file-posix: Clean up and fix zoned checks

2023-08-24 Thread Sam Li
Hi Hanna,

Hanna Czenczek wrote on Thu, Aug 24, 2023 at 23:53:
>
> Hi,
>
> As presented in [1] there is a bug in the zone code in raw_co_prw(),
> specifically we don’t check whether there actually is zone information
> before running code that assumes there is (and thus we run into a
> division by zero).  This has now also been reported in [2].

Thanks for catching the bugs and your work.

>
> I believe the solution [1] is incomplete, though, which is why I’m
> sending this separate series: I don’t think checking bs->wps and/or
> bs->bl.zone_size to determine whether there is zone information is
> right; for example, we do not have raw_refresh_zoned_limits() clear
> those values if on a refresh, zone information were to disappear.
>
> It is also weird that we separate checking bs->wps and bs->bl.zone_size
> at all; raw_refresh_zoned_limits() seems to intend to ensure that either
> we have information with non-NULL bs->wps and non-zero bs->bl.zone_size,
> or we don’t.
>
> I think we should have a single flag that tells whether we have valid
> information or not, and it looks to me like bs->bl.zoned != BLK_Z_NONE
> is the condition that fits best.

The former approach only checks zone information where it is used, to
avoid divide-by-zero or NULL pointer errors. Routing the non-zoned
model to the error path implies that a zoned device must have a
non-zero zone size and allocated write pointers. Given that nothing
else sets zone_size to 0 or frees wps, it does simplify the code path.
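A sketch of what gating on a single "zoned model" flag could look like. The types below are simplified, hypothetical stand-ins (ZONED_NONE plays the role of BLK_Z_NONE), not the QEMU structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the block limits. */
typedef enum { ZONED_NONE, ZONED_HOST_AWARE, ZONED_HOST_MANAGED } ZonedModel;

typedef struct {
    ZonedModel zoned;   /* the single flag that says zone info is valid */
    uint64_t zone_size; /* only meaningful when zoned != ZONED_NONE */
    uint64_t *wps;      /* only meaningful when zoned != ZONED_NONE */
} ZoneLimits;

/*
 * Compute the zone index for an offset. The model flag is checked first, so
 * zone_size is never used as a divisor on a non-zoned device.
 */
static int zone_index_of(const ZoneLimits *bl, uint64_t offset,
                         uint64_t *index)
{
    if (bl->zoned == ZONED_NONE) {
        return -1; /* no zone information: leave *index untouched */
    }
    *index = offset / bl->zone_size;
    return 0;
}
```

The point is that zone_size and wps are only read after the model check, which is what avoids the division by zero.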

Thanks,
Sam



Re: [PATCH 3/5] file-posix: Fix zone update in I/O error path

2023-08-24 Thread Sam Li
Hanna Czenczek wrote on Thu, Aug 24, 2023 at 23:53:
>
> We must check that zone information is present before running
> update_zones_wp().
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
> Fixes: Coverity CID 1512459
> Signed-off-by: Hanna Czenczek 
> ---
>  block/file-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Sam Li 



Re: [PATCH 2/5] file-posix: Check bs->bl.zoned for zone info

2023-08-24 Thread Sam Li
Hanna Czenczek wrote on Thu, Aug 24, 2023 at 23:53:
>
> Instead of checking bs->wps or bs->bl.zone_size for whether zone
> information is present, check bs->bl.zoned.  That is the flag that
> raw_refresh_zoned_limits() reliably sets to indicate zone support.  If
> it is set to something other than BLK_Z_NONE, other values and objects
> like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it
> is not, we cannot rely on their validity.
>
> Signed-off-by: Hanna Czenczek 
> ---
>  block/file-posix.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Sam Li 



Re: [PATCH 1/5] file-posix: Clear bs->bl.zoned on error

2023-08-24 Thread Sam Li
Hanna Czenczek wrote on Thu, Aug 24, 2023 at 23:53:
>
> bs->bl.zoned is what indicates whether the zone information is present
> and valid; it is the only thing that raw_refresh_zoned_limits() sets if
> CONFIG_BLKZONED is not defined, and it is also the only thing that it
> sets if CONFIG_BLKZONED is defined, but there are no zones.
>
> Make sure that it is always set to BLK_Z_NONE if there is an error
> anywhere in raw_refresh_zoned_limits() so that we do not accidentally
> announce zones while our information is incomplete or invalid.
>
> This also fixes a memory leak in the last error path in
> raw_refresh_zoned_limits().
>
> Signed-off-by: Hanna Czenczek 
> ---
>  block/file-posix.c | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)

Reviewed-by: Sam Li 



Re: NVMe ZNS last zone size

2023-08-23 Thread Sam Li
Klaus Jensen wrote on Thu, Aug 24, 2023 at 02:53:
>
> On Aug 23 22:58, Sam Li wrote:
> > Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 22:41:
> > >
> > > On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> > > >
> > > > Hi Stefan,
> > > >
> > > > Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 21:26:
> > > > >
> > > > > Hi Sam and Klaus,
> > > > > Val is adding nvme-io_uring ZNS support to libblkio
> > > > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > > > how to test the size of the last zone when the namespace's total size
> > > > > is not a multiple of the zone size.
> > > >
> > > > I think a zone report operation can do the trick. Given zone configs,
> > > > the size of last zone should be [size - (nr_zones - 1) * zone_size].
> > > > Reporting last zone on such devices tells whether the value is
> > > > correct.
> > >
> > > In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded down:
> > >
> > >   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
> > >
> > > Afterwards nsze is recalculated as follows:
> > >
> > >   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
> > >
> > > I interpret this to mean that when the namespace's total size is not a
> > > multiple of the zone size, then the last part will be ignored and not
> > > exposed as a zone.
> >
> > I see. Current ZNS emulation does not support this case.
> >
>
> NVMe Zoned Namespaces requires all zones to be the same size. The
> "trailing zone" is a thing in SMR HDDs.

Thanks! Then qcow2 with ZNS should also ignore the trailing zone.

Sam



Re: NVMe ZNS last zone size

2023-08-23 Thread Sam Li
Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 22:41:
>
> On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> >
> > Hi Stefan,
> >
> > Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 21:26:
> > >
> > > Hi Sam and Klaus,
> > > Val is adding nvme-io_uring ZNS support to libblkio
> > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > how to test the size of the last zone when the namespace's total size
> > > is not a multiple of the zone size.
> >
> > I think a zone report operation can do the trick. Given zone configs,
> > the size of last zone should be [size - (nr_zones - 1) * zone_size].
> > Reporting last zone on such devices tells whether the value is
> > correct.
>
> In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded down:
>
>   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
>
> Afterwards nsze is recalculated as follows:
>
>   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
>
> I interpret this to mean that when the namespace's total size is not a
> multiple of the zone size, then the last part will be ignored and not
> exposed as a zone.

I see. Current ZNS emulation does not support this case.
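The effect of the two quoted lines can be sketched without the endianness helpers. zns_num_zones() and zns_trimmed_nsze() are hypothetical names; sizes are in logical blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole zones that fit in the namespace (rounded down). */
static uint64_t zns_num_zones(uint64_t nsze, uint64_t zone_size)
{
    return nsze / zone_size;
}

/* Namespace size after the trailing partial zone has been dropped. */
static uint64_t zns_trimmed_nsze(uint64_t nsze, uint64_t zone_size)
{
    return zns_num_zones(nsze, zone_size) * zone_size;
}
```

With nsze = 1000 and zone_size = 300, this gives 3 zones and an exposed size of 900, so the last 100 blocks are not exposed as a zone.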

>
> >
> > >
> > > My understanding is that the zoned storage model allows the last zone
> > > to be smaller than the zone size in this case. However, the NVMe ZNS
> > > emulation code in QEMU makes all zones a multiple of the zone size. I
> > > think QEMU cannot be used for this test case at the moment.
> > >
> > > Are there any plans to allow the last zone to have a different size?
> > > Maybe Sam's qcow2 work will allow this?
> >
> > Yes, the zone report in qcow2 allows smaller last zone.
> > Please let me know if there is any problem.
>
> Great. Val can try your qcow2 patches and see if that allows her to
> test last zone size != zone_size.

I am not sure how the test is set up. If it requires NVMe passthrough,
the zns patches may need to go on top of the qcow2 patches. There are
still some cases to be fixed up, so just let me know if any problem
turns out to be on my side.

In case Val needs it, the latest branch is:
https://github.com/sgzerolc/qemu/blob/dev-zns-v3/

Thanks,
Sam



Re: NVMe ZNS last zone size

2023-08-23 Thread Sam Li
Hi Stefan,

Stefan Hajnoczi wrote on Wed, Aug 23, 2023 at 21:26:
>
> Hi Sam and Klaus,
> Val is adding nvme-io_uring ZNS support to libblkio
> (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> how to test the size of the last zone when the namespace's total size
> is not a multiple of the zone size.

I think a zone report operation can do the trick. Given the zone configs,
the size of the last zone should be [size - (nr_zones - 1) * zone_size].
Reporting the last zone on such devices tells whether the value is
correct.
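A small sketch of that formula, assuming nr_zones is rounded up so that a trailing partial zone counts as a zone; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zones when a trailing partial zone is allowed (rounded up). */
static uint64_t nr_zones_round_up(uint64_t size, uint64_t zone_size)
{
    return (size + zone_size - 1) / zone_size;
}

/* Expected size of the last zone: size - (nr_zones - 1) * zone_size. */
static uint64_t last_zone_size(uint64_t size, uint64_t zone_size)
{
    uint64_t nr_zones = nr_zones_round_up(size, zone_size);

    if (nr_zones == 0) {
        return 0; /* empty device: there is no last zone */
    }
    return size - (nr_zones - 1) * zone_size;
}
```

A test could compare this value with the length reported for the last zone.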

>
> My understanding is that the zoned storage model allows the last zone
> to be smaller than the zone size in this case. However, the NVMe ZNS
> emulation code in QEMU makes all zones a multiple of the zone size. I
> think QEMU cannot be used for this test case at the moment.
>
> Are there any plans to allow the last zone to have a different size?
> Maybe Sam's qcow2 work will allow this?

Yes, the zone report in qcow2 allows a smaller last zone.
Please let me know if there is any problem.

Thanks,
Sam



Re: [PATCH v2 0/4] Add full zoned storage emulation to qcow2 driver

2023-08-16 Thread Sam Li
Klaus Jensen wrote on Wed, Aug 16, 2023 at 15:37:
>
> On Aug 14 16:57, Sam Li wrote:
> > This patch series add a new extension - zoned format - to the
> > qcow2 driver thereby allowing full zoned storage emulation on
> > the qcow2 img file. Users can attach such a qcow2 file to the
> > guest as a zoned device.
> >
> > To create a qcow2 file with zoned format, use command like this:
> > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > -o zoned_profile=zbc
> >
> > Then add it to the QEMU command line:
> > -blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
> > -device virtio-blk-pci,drive=drive1 \
> >
> > v1->v2:
> > - add more tests to qemu-io zoned commands
> > - make zone append change state to full when wp reaches end
> > - add documentation to qcow2 zoned extension header
> > - address review comments (Stefan):
> >   * fix zoned_mata allocation size
> >   * use bitwise or rather than addition
> >   * fix wp index overflow and locking
> >   * cleanups: comments, naming
> >
> > Sam Li (4):
> >   docs/qcow2: add the zoned format feature
> >   qcow2: add configurations for zoned format extension
> >   qcow2: add zoned emulation capability
> >   iotests: test the zoned format feature for qcow2 file
> >
> >  block/qcow2.c| 799 ++-
> >  block/qcow2.h|  23 +
> >  docs/interop/qcow2.txt   |  26 +
> >  docs/system/qemu-block-drivers.rst.inc   |  39 ++
> >  include/block/block-common.h |   5 +
> >  include/block/block_int-common.h |  16 +
> >  qapi/block-core.json |  46 +-
> >  tests/qemu-iotests/tests/zoned-qcow2 | 135 
> >  tests/qemu-iotests/tests/zoned-qcow2.out | 140 
> >  9 files changed, 1214 insertions(+), 15 deletions(-)
> >  create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
> >  create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
> >
>
> Hi Sam,
>
> Thanks for this and for the RFC for hw/nvme - this is an awesome
> improvement.
>
> Can you explain the need for the zoned_profile? I understand that only
> ZNS requires potentially setting zone_capacity and configuring extended
> descriptors. When an image is hooked up to a block emulation device that
> doesn't understand cap < size or extended descriptors, it could just
> fail on the cap < size and just ignore the extended descriptor
> space. Do we really need to add the complexity of the user explicitly
> having to set the profile? I also think it is fair for the QEMU zoned
> block api to accommodate both variations - if a particular configuration
> is supported or not is up to the emulating device.
>
> Checking the profile from hw/nvme or hw/block/virtio is the same as
> checking if cap < size or possibly the presence of extended descriptors.

Hi Klaus,

Thanks for your feedback.

The zoned_profile lets users choose the emulated device type, either
zbc or zns. It implies using virtio-blk or nvme pass through.
The zoned block api does accommodate both variations. Since the
zoned_profile can also be inferred from cap < size and the extended
descriptor config, this option can be dropped. The device type is then
determined by the configuration. When cap = size and there is no
extended descriptor, the img can be used with both virtio-blk and nvme
zns, depending on the QEMU command line.
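
To make the inference concrete, here is a minimal sketch in C. The helper
image_requires_zns() is hypothetical and not part of the series; it only
encodes the rule described above and assumes capacity <= zone size, as the
docs patch requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the image can only be exposed
 * through an NVMe ZNS device, i.e. when cap < size or a zone descriptor
 * extension is configured. Otherwise both virtio-blk and nvme zns work.
 */
static bool image_requires_zns(uint32_t zone_size, uint32_t zone_capacity,
                               uint32_t zd_extension_size)
{
    assert(zone_capacity <= zone_size);
    return zone_capacity < zone_size || zd_extension_size > 0;
}
```

With cap = size and no extended descriptor this returns false, so the choice
of device is left to the QEMU command line.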


Best regards,
Sam



[RFC 5/5] hw/nvme: make ZDED persistent

2023-08-16 Thread Sam Li
Zone descriptor extension data (ZDED) is not persistent across QEMU
restarts. The zone descriptor extension valid bit (ZDEV) is part of the
zone attributes and is set to one when ZDED is associated with the
zone.

With a qcow2-ZNS file as the backing file, the NVMe ZNS device stores
the zone attributes in the eight bits that follow the zone type bit of
each zone's write pointer. The ZDED itself is stored as part of the
zoned metadata, alongside the write pointers.

Signed-off-by: Sam Li 
---
 block/qcow2.c| 44 +++-
 hw/nvme/ctrl.c   |  6 +
 include/block/block-common.h |  1 +
 3 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5a038792f1..ac5ecef559 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "block/qdict.h"
+#include "block/nvme.h"
 #include "sysemu/block-backend.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -214,6 +215,17 @@ static inline void qcow2_set_wp(uint64_t *wp, 
BlockZoneState zs)
 *wp = addr;
 }
 
+static inline void qcow2_set_za(uint64_t *wp, uint8_t za)
+{
+/*
+ * The zone attribute takes up one byte. Store it after the zoned
+ * bit.
+ */
+uint64_t addr = *wp;
+addr |= ((uint64_t)za << 51);
+*wp = addr;
+}
+
 /*
  * File wp tracking: reset zone, finish zone and append zone can
  * change the value of write pointer. All zone operations will change
@@ -308,7 +320,7 @@ static int qcow2_check_open(BlockDriverState *bs)
 
 /*
  * The zoned device has limited zone resources of open, closed, active
- * zones.
+ * zones. Check if we can manage a zone without exceeding those limits.
  */
 static int qcow2_check_zone_resources(BlockDriverState *bs,
   BlockZoneState zs)
@@ -4801,6 +4813,33 @@ unlock:
 return ret;
 }
 
+static int qcow2_zns_set_zded(BlockDriverState *bs, uint32_t index)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&s->wps->colock);
+uint64_t *wp = &s->wps->wp[index];
+BlockZoneState zs = qcow2_get_zs(*wp);
+if (zs == BLK_ZS_EMPTY) {
+ret = qcow2_check_zone_resources(bs, zs);
+if (ret < 0) {
+return ret;
+}
+
+qcow2_set_za(wp, NVME_ZA_ZD_EXT_VALID);
+ret = qcow2_write_wp_at(bs, wp, index, BLK_ZO_CLOSE);
+if (ret < 0) {
+error_report("Failed to set zone extension at 0x%" PRIx64 "", *wp);
+return ret;
+}
+s->nr_zones_closed++;
+return ret;
+}
+
+return NVME_ZONE_INVAL_TRANSITION;
+}
+
 static int coroutine_fn qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp 
op,
int64_t offset, int64_t len)
 {
@@ -4857,6 +4896,9 @@ static int coroutine_fn 
qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_OFFLINE:
 ret = qcow2_write_wp_at(bs, >wp[index], index, BLK_ZO_OFFLINE);
 break;
+case BLK_ZO_SET_ZDED:
+ret = qcow2_zns_set_zded(bs, index);
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 3932b516ed..fcd774e3f7 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3425,11 +3425,6 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 NvmeNamespace *ns = req->ns;
 NvmeZoneMgmtAIOCB *iocb;
 uint64_t slba = 0;
-uint64_t offset;
-BlockBackend *blk = ns->blkconf.blk;
-uint32_t zone_size = blk_get_zone_size(blk);
-uint64_t size = zone_size * blk_get_nr_zones(blk);
-int64_t len;
 uint32_t zone_idx = 0;
 uint16_t status;
 uint8_t action = cmd->zsa;
@@ -3485,6 +3480,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, 
NvmeRequest *req)
 break;
 
 case NVME_ZONE_ACTION_SET_ZD_EXT:
+op = BLK_ZO_SET_ZDED;
 int zd_ext_size = blk_get_zd_ext_size(blk);
 trace_pci_nvme_set_descriptor_extension(slba, zone_idx);
 if (all || !zd_ext_size) {
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0cbed607a8..b369e77607 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -84,6 +84,7 @@ typedef enum BlockZoneOp {
 BLK_ZO_FINISH,
 BLK_ZO_RESET,
 BLK_ZO_OFFLINE,
+BLK_ZO_SET_ZDED,
 } BlockZoneOp;
 
 typedef enum BlockZoneModel {
-- 
2.40.1




[RFC 4/5] hw/nvme: refactor zone append writes using block layer APIs

2023-08-16 Thread Sam Li
Signed-off-by: Sam Li 
---
 block/block-backend.c |   8 ++
 block/qcow2.c |   7 +-
 hw/nvme/ctrl.c| 195 ++
 include/sysemu/block-backend-io.h |   1 +
 include/sysemu/dma.h  |   3 +
 softmmu/dma-helpers.c |  17 +++
 6 files changed, 181 insertions(+), 50 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 9c95ae0267..2aafb4cee3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2426,6 +2426,14 @@ uint32_t blk_get_nr_zones(BlockBackend *blk)
 return bs ? bs->bl.nr_zones : 0;
 }
 
+uint32_t blk_get_write_granularity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.write_granularity : 0;
+}
+
 uint8_t *blk_get_zone_extension(BlockBackend *blk) {
 BlockDriverState * bs = blk_bs(blk);
 IO_CODE();
diff --git a/block/qcow2.c b/block/qcow2.c
index 41549dd68b..5a038792f1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2198,7 +2198,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_active_zones = s->zoned_header.max_active_zones;
 bs->bl.max_open_zones = s->zoned_header.max_open_zones;
 bs->bl.zone_size = s->zoned_header.zone_size;
-bs->bl.write_granularity = BDRV_SECTOR_SIZE;
+bs->bl.write_granularity = 4096; /* physical block size */
 }
 
 static int qcow2_reopen_prepare(BDRVReopenState *state,
@@ -4915,6 +4915,11 @@ qcow2_co_zone_append(BlockDriverState *bs, int64_t 
*offset, QEMUIOVector *qiov,
 qemu_co_mutex_lock(&s->wps->colock);
 uint64_t wp = s->wps->wp[index];
 uint64_t wp_i = qcow2_get_wp(wp);
+printf("qcow2 offset 0x%lx\n", *offset);
+printf("checking wp[%ld]: 0b%lb\n", *offset / bs->bl.zone_size, wp);
+for (int i = 0; i < bs->bl.nr_zones; i++) {
+printf("Listing wp[%d]: 0b%lb\n", i, s->wps->wp[i]);
+}
 ret = qcow2_co_pwritev_part(bs, wp_i, len, qiov, 0, 0);
 if (ret == 0) {
 *offset = wp_i;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8d4c08dc4c..3932b516ed 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1740,6 +1740,95 @@ static void nvme_misc_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+typedef struct NvmeZoneCmdAIOCB {
+NvmeRequest *req;
+NvmeCmd *cmd;
+NvmeCtrl *n;
+
+union {
+struct {
+  uint32_t partial;
+  unsigned int nr_zones;
+  BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+  int64_t offset;
+} zone_append_data;
+};
+} NvmeZoneCmdAIOCB;
+
+static void nvme_blk_zone_append_complete_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *cb = opaque;
+NvmeRequest *req = cb->req;
+int64_t *offset = (int64_t *)&req->cqe;
+
+if (ret) {
+nvme_aio_err(req, ret);
+}
+
+*offset = nvme_b2l(req->ns, cb->zone_append_data.offset);
+nvme_enqueue_req_completion(nvme_cq(req), req);
+g_free(cb);
+}
+
+static inline void nvme_blk_zone_append(BlockBackend *blk, int64_t *offset,
+  uint32_t align,
+  BlockCompletionFunc *cb,
+  NvmeZoneCmdAIOCB *aiocb)
+{
+NvmeRequest *req = aiocb->req;
+assert(req->sg.flags & NVME_SG_ALLOC);
+
+if (req->sg.flags & NVME_SG_DMA) {
+req->aiocb = dma_blk_zone_append(blk, &req->sg.qsg, (int64_t)offset,
+ align, cb, aiocb);
+} else {
+req->aiocb = blk_aio_zone_append(blk, offset, &req->sg.iov, 0,
+ cb, aiocb);
+}
+}
+
+static void nvme_zone_append_cb(void *opaque, int ret)
+{
+NvmeZoneCmdAIOCB *aiocb = opaque;
+NvmeRequest *req = aiocb->req;
+NvmeNamespace *ns = req->ns;
+
+BlockBackend *blk = ns->blkconf.blk;
+
+trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk));
+
+if (ret) {
+goto out;
+}
+
+if (ns->lbaf.ms) {
+NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+int64_t offset = aiocb->zone_append_data.offset;
+
+if (nvme_ns_ext(ns) || req->cmd.mptr) {
+uint16_t status;
+
+nvme_sg_unmap(&req->sg);
+status = nvme_map_mdata(nvme_ctrl(req), nlb, req);
+if (status) {
+ret = -EFAULT;
+goto out;
+}
+
+return nvme_blk_zone_append(blk, &offset, 1,
+nvme_blk_zone_append_complete_cb,
+aiocb);
+}
+}
+
+out:
+nvme_blk_zone_append_complete_cb(aiocb, ret);
+}
+
+
 void nvme_rw_complete_cb(void *opaque, int ret)
 {
 NvmeRequest *req = opaque;
@@ -30

[RFC 3/5] hw/nvme: make the metadata of ZNS emulation persistent

2023-08-16 Thread Sam Li
NVMe ZNS devices follow the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This patch makes the
metadata of ZNS emulation persistent by using new block layer APIs. The
ZNS device calls the zone report and zone mgmt APIs of the block layer,
which handle zone state transitions and manage zone resources.

Signed-off-by: Sam Li 
---
 block/block-backend.c |   15 +
 block/qcow2.c |3 +
 hw/nvme/ctrl.c| 1114 ++---
 hw/nvme/ns.c  |   77 +-
 hw/nvme/nvme.h|   85 +--
 include/block/block-common.h  |8 +
 include/block/block_int-common.h  |2 +
 include/sysemu/block-backend-io.h |2 +
 8 files changed, 283 insertions(+), 1023 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f68c5263f3..9c95ae0267 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2418,6 +2418,14 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk)
 return bs ? bs->bl.max_append_sectors : 0;
 }
 
+uint32_t blk_get_nr_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.nr_zones : 0;
+}
+
 uint8_t *blk_get_zone_extension(BlockBackend *blk) {
 BlockDriverState * bs = blk_bs(blk);
 IO_CODE();
@@ -2433,6 +2441,13 @@ uint32_t blk_get_zd_ext_size(BlockBackend *blk)
 return bs ? bs->bl.zd_extension_size : 0;
 }
 
+BlockZoneWps *blk_get_zone_wps(BlockBackend *blk) {
+BlockDriverState * bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->wps : NULL;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/block/qcow2.c b/block/qcow2.c
index fce1fe83a7..41549dd68b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4854,6 +4854,9 @@ static int coroutine_fn 
qcow2_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 case BLK_ZO_RESET:
 ret = qcow2_reset_zone(bs, index, len);
 break;
+case BLK_ZO_OFFLINE:
+ret = qcow2_write_wp_at(bs, >wp[index], index, BLK_ZO_OFFLINE);
+break;
 default:
 error_report("Unsupported zone op: 0x%x", op);
 ret = -ENOTSUP;
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 4320f3a15c..8d4c08dc4c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -372,67 +372,6 @@ static inline bool nvme_parse_pid(NvmeNamespace *ns, 
uint16_t pid,
 return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg);
 }
 
-static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
-   NvmeZoneState state)
-{
-if (QTAILQ_IN_USE(zone, entry)) {
-switch (nvme_get_zone_state(zone)) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_REMOVE(&ns->full_zones, zone, entry);
-default:
-;
-}
-}
-
-nvme_set_zone_state(zone, state);
-
-switch (state) {
-case NVME_ZONE_STATE_EXPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_CLOSED:
-QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
-break;
-case NVME_ZONE_STATE_FULL:
-QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry);
-case NVME_ZONE_STATE_READ_ONLY:
-break;
-default:
-zone->d.za = 0;
-}
-}
-
-static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
- uint32_t opn, uint32_t zrwa)
-{
-if (zrwa > ns->zns.numzrwa) {
-return NVME_NOZRWA | NVME_DNR;
-}
-
-return NVME_SUCCESS;
-}
-
-/*
- * Check if we can open a zone without exceeding open/active limits.
- * AOR stands for "Active and Open Resources" (see TP 4053 section 2.5).
- */
-static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
-{
-return nvme_zns_check_resources(ns, act, opn, 0);
-}
-
 static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer 
*ebuf)
 {
 NvmeFdpEvent *ret = NULL;
@@ -1769,346 +1708,11 @@ static inline uint32_t nvme_zone_idx(NvmeNamespace 
*ns, uint64_t slba)
 slba / ns->zone_size;
 }
 
-static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba)
-{
-uint32_t zone_idx = nvme_zone_idx(ns, slba);
-
-if (zone_idx >= ns->num_zones) {
-return NULL;
-}
-
-return >

[RFC 2/5] qcow2: add zone device metadata with zd_extension

2023-08-16 Thread Sam Li
Zone descriptor data is host-defined data that is associated with
each zone. Add zone descriptor extensions to zonedmeta and add
blk_get_zone_extension() to access zd_extensions.

Signed-off-by: Sam Li 
---
 block/block-backend.c | 15 ++
 block/qcow2.c | 86 ++-
 block/qcow2.h |  3 ++
 docs/interop/qcow2.txt|  2 +
 hw/nvme/ctrl.c| 19 ---
 hw/nvme/ns.c  | 24 ++---
 hw/nvme/nvme.h|  7 ---
 include/block/block_int-common.h  |  6 +++
 include/sysemu/block-backend-io.h |  2 +
 qapi/block-core.json  |  3 ++
 10 files changed, 121 insertions(+), 46 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index ad410286a0..f68c5263f3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2418,6 +2418,21 @@ uint32_t blk_get_max_append_sectors(BlockBackend *blk)
 return bs ? bs->bl.max_append_sectors : 0;
 }
 
+uint8_t *blk_get_zone_extension(BlockBackend *blk) {
+BlockDriverState * bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->zd_extensions : NULL;
+}
+
+uint32_t blk_get_zd_ext_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zd_extension_size : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/block/qcow2.c b/block/qcow2.c
index 9de90ccc9f..fce1fe83a7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -340,15 +340,28 @@ static inline int 
qcow2_refresh_zonedmeta(BlockDriverState *bs)
 {
 int ret;
 BDRVQcow2State *s = bs->opaque;
-uint64_t *temp = g_malloc(s->zoned_header.zonedmeta_size);
+uint64_t wps_size = s->zoned_header.zonedmeta_size -
+s->zded_size;
+g_autofree uint64_t *temp = NULL;
+temp = g_new(uint64_t, wps_size);
 ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset,
- s->zoned_header.zonedmeta_size, temp, 0);
+ wps_size, temp, 0);
 if (ret < 0) {
-error_report("Can not read metadata\n");
+error_report("Can not read metadata");
 return ret;
 }
 
-memcpy(s->wps->wp, temp, s->zoned_header.zonedmeta_size);
+g_autofree uint8_t *zded = NULL;
+zded = g_try_malloc0(s->zded_size);
+ret = bdrv_pread(bs->file, s->zoned_header.zonedmeta_offset + wps_size,
+ s->zded_size, zded, 0);
+if (ret < 0) {
+error_report("Can not read zded");
+return ret;
+}
+
+memcpy(s->wps->wp, temp, wps_size);
+memcpy(bs->zd_extensions, zded, s->zded_size);
 return 0;
 }
 
@@ -607,6 +620,8 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 
 zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
 zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.zd_extension_size =
+be32_to_cpu(zoned_ext.zd_extension_size);
 zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
 zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
 zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
@@ -618,8 +633,10 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 be64_to_cpu(zoned_ext.zonedmeta_offset);
 zoned_ext.zonedmeta_size = be64_to_cpu(zoned_ext.zonedmeta_size);
 s->zoned_header = zoned_ext;
+
 s->wps = g_malloc(sizeof(BlockZoneWps)
-+ s->zoned_header.zonedmeta_size);
++ zoned_ext.zonedmeta_size - s->zded_size);
+bs->zd_extensions = g_malloc0(s->zded_size);
 ret = qcow2_refresh_zonedmeta(bs);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "zonedmeta: "
@@ -2174,6 +2191,7 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.zoned = s->zoned_header.zoned;
 bs->bl.zoned_profile = s->zoned_header.zoned_profile;
 bs->bl.zone_capacity = s->zoned_header.zone_capacity;
+bs->bl.zd_extension_size = s->zoned_header.zd_extension_size;
 bs->bl.nr_zones = s->zoned_header.nr_zones;
 bs->wps = s->wps;
 bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
@@ -3369,6 +3387,8 @@ int qcow2_update_header(BlockDriverState *bs)
 .nr_zones   = cpu_to_be32(s->zoned_header.nr_zones),
 .zone_size  = cpu_to_be32(s->zoned_header.zone_size),
 .zone_capacity  = cpu_to_be32(s->zoned_header.zone_capacity),
+.zd_extension_size  =
+cpu_to_be32(s->zoned_header.zd_extension_size),
 .zone_nr_conv   = cpu_to_be32(s->zoned_header.zon

[RFC 1/5] hw/nvme: use blk_get_*() to access zone info in the block layer

2023-08-16 Thread Sam Li
The zone information is contained in the BlockLimits fields. Add blk_get_*()
functions to access the block layer and update how zone info is accessed in
the NVMe device emulation.

Signed-off-by: Sam Li 
---
 block/block-backend.c | 56 
 block/qcow2.c | 20 +-
 hw/nvme/ctrl.c| 34 ++---
 hw/nvme/ns.c  | 62 ++-
 hw/nvme/nvme.h|  3 --
 include/sysemu/block-backend-io.h |  7 
 6 files changed, 112 insertions(+), 70 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..ad410286a0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2362,6 +2362,62 @@ int blk_get_max_iov(BlockBackend *blk)
 return blk->root->bs->bl.max_iov;
 }
 
+uint8_t blk_get_zone_model(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+return bs ? bs->bl.zoned: 0;
+
+}
+
+uint8_t blk_get_zone_profile(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+return bs ? bs->bl.zoned_profile: 0;
+
+}
+
+uint32_t blk_get_zone_size(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_size : 0;
+}
+
+uint32_t blk_get_zone_capacity(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.zone_capacity : 0;
+}
+
+uint32_t blk_get_max_open_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_open_zones : 0;
+}
+
+uint32_t blk_get_max_active_zones(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_active_zones : 0;
+}
+
+uint32_t blk_get_max_append_sectors(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+IO_CODE();
+
+return bs ? bs->bl.max_append_sectors : 0;
+}
+
 void *blk_try_blockalign(BlockBackend *blk, size_t size)
 {
 IO_CODE();
diff --git a/block/qcow2.c b/block/qcow2.c
index 5ccf79cbe7..9de90ccc9f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2172,6 +2172,8 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
 bs->bl.pdiscard_alignment = s->cluster_size;
 bs->bl.zoned = s->zoned_header.zoned;
+bs->bl.zoned_profile = s->zoned_header.zoned_profile;
+bs->bl.zone_capacity = s->zoned_header.zone_capacity;
 bs->bl.nr_zones = s->zoned_header.nr_zones;
 bs->wps = s->wps;
 bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
@@ -4083,8 +4085,22 @@ qcow2_co_create(BlockdevCreateOptions *create_options, 
Error **errp)
 s->zoned_header.zoned = BLK_Z_HM;
 s->zoned_header.zone_size = qcow2_opts->zone_size;
 s->zoned_header.zone_nr_conv = qcow2_opts->zone_nr_conv;
-s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
-s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
+
+if (!qcow2_opts->max_active_zones) {
+if (qcow2_opts->max_open_zones > qcow2_opts->max_active_zones) {
+error_setg(errp, "max_open_zones (%u) exceeds "
+   "max_active_zones (%u)", qcow2_opts->max_open_zones,
+   qcow2_opts->max_active_zones);
+return -1;
+}
+
+if (!qcow2_opts->max_open_zones) {
+s->zoned_header.max_open_zones = qcow2_opts->max_active_zones;
+}
+s->zoned_header.max_open_zones = qcow2_opts->max_open_zones;
+s->zoned_header.max_active_zones = qcow2_opts->max_active_zones;
+}
+
 s->zoned_header.max_append_sectors = qcow2_opts->max_append_sectors;
 s->zoned_header.nr_zones = qcow2_opts->size / qcow2_opts->zone_size;
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 539d273553..4e1608f0c1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -417,18 +417,6 @@ static void nvme_assign_zone_state(NvmeNamespace *ns, 
NvmeZone *zone,
 static uint16_t nvme_zns_check_resources(NvmeNamespace *ns, uint32_t act,
  uint32_t opn, uint32_t zrwa)
 {
-if (ns->params.max_active_zones != 0 &&
-ns->nr_active_zones + act > ns->params.max_active_zones) {
-trace_pci_nvme_err_insuff_active_res(ns->params.max_active_zones);
-return NVME_ZONE_TOO_MANY_ACTIVE | NVME_DNR;
-}
-
-if (ns->params.max_open_zones != 0 &&
-ns->nr_open_zones + opn > ns->params.max_open_zones) {
-trace_pci_nvme_err_insuff_open_res(ns->params.max_open_zones);
-return NVME_ZONE_TOO_MANY_OPEN | NVME_DN

[RFC 0/5] Add persistence to NVMe ZNS emulation

2023-08-16 Thread Sam Li
ZNS emulation follows the NVMe ZNS spec, but the state of namespace
zones does not persist across restarts of QEMU. This series makes the
metadata of ZNS emulation persistent by using new block layer APIs and
a qcow2 img as the backing file. It is the second part, following the
patches that add full zoned storage emulation to the qcow2 driver.

The metadata of ZNS emulation is divided into two parts: zone metadata
and zone descriptor extension data. The zone metadata is composed of the
zone state, zone type, wp and zone attributes. This zone information can
be stored in a single uint64_t wp per zone to save space and simplify
access. The structure of the wp of each zone is as follows:
|| zone state (4)| zone type (1)| zone attr (8)| wp (51) ||
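
The layout above can be sketched in C as follows. This is only an
illustration: the macro and function names are made up for this example, and
it assumes the fields are listed from the most significant bit down, which
matches the shift by 51 used for the zone attributes in patch 5/5.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions: state 63..60, type 59, attr 58..51, wp 50..0. */
#define EX_WP_BITS     51
#define EX_ATTR_SHIFT  51
#define EX_TYPE_SHIFT  59
#define EX_STATE_SHIFT 60
#define EX_WP_MASK     ((UINT64_C(1) << EX_WP_BITS) - 1)

/* Pack the four fields into one 64-bit word. */
static uint64_t ex_wp_pack(uint8_t state, uint8_t type, uint8_t attr,
                           uint64_t wp)
{
    assert(state < 16 && type < 2 && wp <= EX_WP_MASK);
    return ((uint64_t)state << EX_STATE_SHIFT) |
           ((uint64_t)type << EX_TYPE_SHIFT) |
           ((uint64_t)attr << EX_ATTR_SHIFT) |
           wp;
}

/* Extract each field again. */
static uint8_t ex_wp_state(uint64_t v) { return (uint8_t)(v >> EX_STATE_SHIFT); }
static uint8_t ex_wp_type(uint64_t v)  { return (uint8_t)((v >> EX_TYPE_SHIFT) & 0x1); }
static uint8_t ex_wp_attr(uint64_t v)  { return (uint8_t)((v >> EX_ATTR_SHIFT) & 0xff); }
static uint64_t ex_wp_offset(uint64_t v) { return v & EX_WP_MASK; }
```

The four field widths add up to 64 bits, so one uint64_t per zone is enough
and no extra per-zone array is needed for state, type or attributes.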

The zone descriptor extension data is relatively small compared to the
overall size, so we store the zded of all zones in an array regardless
of whether the valid bit is set.

To create a zns format qcow2 image file, use:
$ ./build/qemu-img create -f qcow2 zns.qcow2 -o size=768M
-o zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0
-o max_append_sectors=512 -o max_open_zones=0 -o
max_active_zones=0 -o zoned_profile=zns

To attach this file as emulated zns drive in the command line of QEMU, use:
  -drive file=${znsimg},id=nvmezns0,format=qcow2,if=none \
  -device nvme-ns,drive=nvmezns0,bus=nvme0,nsid=1,uuid=xxx \

Sam Li (5):
  hw/nvme: use blk_get_*() to access zone info in the block layer
  qcow2: add zone device metadata with zd_extension
  hw/nvme: make the metadata of ZNS emulation persistent
  hw/nvme: refactor zone append writes using block layer APIs
  hw/nvme: make ZDED persistent

 block/block-backend.c |   94 +++
 block/qcow2.c |  160 +++-
 block/qcow2.h |3 +
 docs/interop/qcow2.txt|2 +
 hw/nvme/ctrl.c| 1256 -
 hw/nvme/ns.c  |  163 +---
 hw/nvme/nvme.h|   95 +--
 include/block/block-common.h  |9 +
 include/block/block_int-common.h  |8 +
 include/sysemu/block-backend-io.h |   12 +
 include/sysemu/dma.h  |3 +
 qapi/block-core.json  |3 +
 softmmu/dma-helpers.c |   17 +
 13 files changed, 686 insertions(+), 1139 deletions(-)

-- 
2.40.1




[PATCH v2 4/4] iotests: test the zoned format feature for qcow2 file

2023-08-14 Thread Sam Li
The zoned format feature can be tested by:
$ tests/qemu-iotests/check zoned-qcow2

Signed-off-by: Sam Li 
---
 tests/qemu-iotests/tests/zoned-qcow2 | 135 ++
 tests/qemu-iotests/tests/zoned-qcow2.out | 140 +++
 2 files changed, 275 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

diff --git a/tests/qemu-iotests/tests/zoned-qcow2 
b/tests/qemu-iotests/tests/zoned-qcow2
new file mode 100755
index 00..473b462b50
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned-qcow2
@@ -0,0 +1,135 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations for qcow2 file.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+file_name="zbc.qcow2"
+_cleanup()
+{
+  _cleanup_test_img
+  _rm_test_img "$file_name"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with qcow2 image files.
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+$QEMU_IMG create -f qcow2 $file_name -o size=768M -o zone_size=64M \
+-o zone_capacity=64M -o zone_nr_conv=0 -o max_append_sectors=131072 \
+-o max_open_zones=0 -o max_active_zones=0 -o zoned_profile=zbc
+
+IMG="--image-opts -n driver=qcow2,file.driver=file,file.filename=$file_name"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo
+echo "=== Testing a qcow2 img with zoned format ==="
+echo
+echo "case 1: test if one zone operation works"
+
+echo "(1) report zones[0]:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report zones[0~9]:"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x2C00 2" # 0x2C00 / 512 = 0x16
+echo
+echo
+echo "open zones[0]:"
+$QEMU_IO $IMG -c "zo 0 0x400" # 0x400 / 512 = 0x2
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "open zones[1]"
+$QEMU_IO $IMG -c "zo 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "open the last zone"
+$QEMU_IO $IMG -c "zo 0x2C00 0x400"
+$QEMU_IO $IMG -c "zrp 0x2C00 2"
+echo
+echo
+echo "close zones[0]"
+$QEMU_IO $IMG -c "zc 0 0x400"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "close the last zone"
+$QEMU_IO $IMG -c "zc 0x3e7000 0x400"
+$QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(4) finish zones[1]"
+$QEMU_IO $IMG -c "zf 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo
+echo "(5) reset zones[1]"
+$QEMU_IO $IMG -c "zrs 0x400 0x400"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo
+echo "(6) append write with (4k, 8k) data" # the physical block size of the 
device is 4096
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Append write zones[0] one time:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[0] twice:"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "Append write zones[1] one time:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Append write zones[1] twice:"
+$QEMU_IO $IMG -c "zap -p 0x400 0x1000 0x2000"
+$QEMU_IO $IMG -c "zrp 0x400 1"
+echo
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo
+echo
+echo "case 2: test a sets of ops that works or not"
+
+echo "(1) append write (4k, 4k) and then write to full"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+echo "wrote (4k, 4k):"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x3ffd000"
+echo "wrote to full:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo "Reset zones[0]:"
+$QEMU_IO $IMG -c "zrs 0 64M"
+$QEMU_IO $IMG -c "zrp 0 1"
+
+echo "(2) write in zones[0], zones[3], zones[8], and then reset all"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0xc00 0x1000 0x1000"
+$QEMU_IO $IMG -c "zap -p 0x2000 0x1000 0x1000"
+echo "wrote three zones:"
+$QEMU_IO $IMG -c "zrp 0 12"
+echo "Reset all:"
+$QEMU_IO $IMG -c "zrs 0 768M"
+$QEMU_IO $IMG -c "zrp 0 12"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zo

[PATCH v2 0/4] Add full zoned storage emulation to qcow2 driver

2023-08-14 Thread Sam Li
This patch series adds a new extension - zoned format - to the
qcow2 driver thereby allowing full zoned storage emulation on
the qcow2 img file. Users can attach such a qcow2 file to the
guest as a zoned device.

To create a qcow2 file with zoned format, use command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
-o zoned_profile=zbc

Then add it to the QEMU command line:
-blockdev node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2 \
-device virtio-blk-pci,drive=drive1 \

v1->v2:
- add more tests to qemu-io zoned commands
- make zone append change state to full when wp reaches end
- add documentation to qcow2 zoned extension header
- address review comments (Stefan):
  * fix zoned_meta allocation size
  * use bitwise OR rather than addition
  * fix wp index overflow and locking
  * cleanups: comments, naming

Sam Li (4):
  docs/qcow2: add the zoned format feature
  qcow2: add configurations for zoned format extension
  qcow2: add zoned emulation capability
  iotests: test the zoned format feature for qcow2 file

 block/qcow2.c| 799 ++-
 block/qcow2.h|  23 +
 docs/interop/qcow2.txt   |  26 +
 docs/system/qemu-block-drivers.rst.inc   |  39 ++
 include/block/block-common.h |   5 +
 include/block/block_int-common.h |  16 +
 qapi/block-core.json |  46 +-
 tests/qemu-iotests/tests/zoned-qcow2 | 135 
 tests/qemu-iotests/tests/zoned-qcow2.out | 140 
 9 files changed, 1214 insertions(+), 15 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
 create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out

-- 
2.40.1




[PATCH v2 1/4] docs/qcow2: add the zoned format feature

2023-08-14 Thread Sam Li
Add the specs for the zoned format feature of the qcow2 driver. If
the zoned_profile is set to `zbc`, the qcow2 file can be treated as a
zoned device and passed through to the guest by a virtio-blk device.
If it is `zns`, the file can be passed through by a virtio-blk device
or an NVMe ZNS device as a ZNS drive.

Signed-off-by: Sam Li 
---
 docs/system/qemu-block-drivers.rst.inc | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index 105cb9679c..2c1620668f 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -172,6 +172,45 @@ This section describes each format and the options that 
are supported for it.
 filename`` to check if the NOCOW flag is set or not (Capital 'C' is
 NOCOW flag).
 
+  .. option:: zoned_profile
+
+This option configures the zoned format feature of the qcow2 driver. If
+it is set to ``zbc``, the image follows the basics of the ZBC/ZAC protocol.
+If it is set to ``zns``, the image follows the NVMe ZNS protocol.
+
+The virtio-blk device accepts both the ``zbc`` and ``zns`` profiles when
+passing through zoned devices, while the NVMe ZNS device accepts only ``zns``.
+
+  .. option:: zone_size
+
+The size of a zone of the zoned device in bytes. The device is divided
+into zones of this size with the exception of the last zone, which may
+be smaller.
+
+  .. option:: zone_capacity
+
+The initial capacity value for all zones. The capacity must be less than
+or equal to the zone size. If the last zone is smaller, its capacity is
+capped accordingly. A device that follows the ZBC protocol tends to have
+a zone capacity equal to its zone size.
+
+  .. option:: zone_nr_conv
+
+The number of conventional zones of the zoned device.
+
+  .. option:: max_open_zones
+
+The maximum number of open zones allowed.
+
+  .. option:: max_active_zones
+
+The maximum number of zones in the implicit open, explicit open or closed state.
+
+  .. option:: max_append_sectors
+
+The maximum number of 512B sectors that may be appended to zones
+while writing.
+
 .. program:: image-formats
 .. option:: qed
 
-- 
2.40.1




[PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-14 Thread Sam Li
By adding zone operations and zoned metadata, the zoned emulation
capability enables full emulation of a zoned device using a qcow2
file. The zoned device metadata includes the zone type, zone state
and write pointer of each zone, which are stored in an array of
unsigned integers.
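
As an illustration of that metadata array (not part of the patch), the
helpers in the diff below suggest that each 64-bit entry keeps the zone
state in the top four bits, a conventional-zone flag in bit 59, and the
write pointer in the remaining low bits. A minimal sketch under that
assumption follows. The zone_entry_* names and the mask macros are
hypothetical, and unlike the patch helpers the sketch returns the write
pointer as uint64_t and preserves the type bit when the state changes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of one 64-bit metadata entry (inferred from the patch
 * helpers, not authoritative):
 *   bits 63..60  zone state
 *   bit  59      conventional-zone flag
 *   bits 58..0   write pointer
 */
#define ZONE_STATE_SHIFT 60
#define ZONE_CONV_BIT    (1ULL << 59)
#define ZONE_WP_MASK     (ZONE_CONV_BIT - 1)

/* Extract the write pointer, dropping state and type bits. */
static inline uint64_t zone_entry_wp(uint64_t entry)
{
    return entry & ZONE_WP_MASK;
}

/* Extract the 4-bit zone state. */
static inline unsigned zone_entry_state(uint64_t entry)
{
    return (unsigned)(entry >> ZONE_STATE_SHIFT);
}

/* Non-zero if the entry describes a conventional zone. */
static inline int zone_entry_is_conv(uint64_t entry)
{
    return (entry & ZONE_CONV_BIT) != 0;
}

/* Replace the state bits while keeping write pointer and type bit. */
static inline uint64_t zone_entry_set_state(uint64_t entry, unsigned state)
{
    uint64_t kept = entry & (ZONE_WP_MASK | ZONE_CONV_BIT);
    return kept | ((uint64_t)(state & 0xF) << ZONE_STATE_SHIFT);
}
```

Masking instead of shifting left and right by 5 is a choice of this
sketch; both approaches clear the same top five bits when the result is
kept in a 64-bit type.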

Each zone of a zoned device makes state transitions following
the zone state machine. The zone state machine mainly describes
five states: IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
The READ ONLY and OFFLINE states are generally entered because of
device-internal events. Operations on zones cause the corresponding
state changes.

Zoned devices have a limit on zone resources, which puts constraints on
write operations into zones.

Signed-off-by: Sam Li 
---
 block/qcow2.c          | 676 -
 block/qcow2.h          |   2 +
 docs/interop/qcow2.txt |   2 +
 3 files changed, 678 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c1077c4a4a..5ccf79cbe7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char *fmt, Error **errp)
 return cryptoopts_qdict;
 }
 
+#define QCOW2_ZT_IS_CONV(wp) ((wp) & (1ULL << 59))
+
+static inline int qcow2_get_wp(uint64_t wp)
+{
+/* clear state and type information */
+return ((wp << 5) >> 5);
+}
+
+static inline int qcow2_get_zs(uint64_t wp)
+{
+return (wp >> 60);
+}
+
+static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
+{
+uint64_t addr = qcow2_get_wp(*wp);
+addr |= ((uint64_t)zs << 60);
+*wp = addr;
+}
+
+/*
+ * File wp tracking: reset zone, finish zone and append zone can
+ * change the value of the write pointer. All zone operations change
+ * the state of the affected zone(s).
+ */
+static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
+/* format: operations, the wp. */
+printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
+}
+
+/*
+ * Perform a state assignment and a flush operation that writes the new wp
+ * value to the dedicated location of the disk file.
+ */
+static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
+ uint32_t index, BlockZoneState zs) {
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+qcow2_set_wp(wp, zs);
+ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
+
+if (ret < 0) {
+goto exit;
+}
+qcow2_wp_tracking_helper(index, *wp);
+return ret;
+
+exit:
+error_report("Failed to write zone metadata to file");
+return ret;
+}
+
+static int qcow2_check_active(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+
+if (!s->zoned_header.max_active_zones) {
+return 0;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
+< s->zoned_header.max_active_zones) {
+return 0;
+}
+
+return -1;
+}
+
+static int qcow2_check_open(BlockDriverState *bs)
+{
+BDRVQcow2State *s = bs->opaque;
+int ret;
+
+if (!s->zoned_header.max_open_zones) {
+return 0;
+}
+
+if (s->nr_zones_exp_open + s->nr_zones_imp_open
+< s->zoned_header.max_open_zones) {
+return 0;
+}
+
+if (s->nr_zones_imp_open) {
+ret = qcow2_check_active(bs);
+if (ret == 0) {
+/* TODO: it takes O(n) time complexity (n = nr_zones).
+ * Optimizations required. */
+/* close one implicitly open zone to make it available */
+for (int i = s->zoned_header.zone_nr_conv;
+i < bs->bl.nr_zones; ++i) {
+uint64_t *wp = &s->wps->wp[i];
+if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
+ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
+if (ret < 0) {
+return ret;
+}
+s->wps->wp[i] = *wp;
+s->nr_zones_imp_open--;
+s->nr_zones_closed++;
+break;
+}
+}
+return 0;
+}
+return ret;
+}
+
+return -1;
+}
+
+/*
+ * The zoned device has limited zone resources of open, closed, active
+ * zones.
+ */
+static int qcow2_check_zone_resources(BlockDriverState *bs,
+  BlockZoneState zs)
+{
+int ret;
+
+switch (zs) {
+case BLK_ZS_EMPTY:
+ret = qcow2_check_active(bs);
+if (ret < 0) {
+error_report("Not enough active zones");
+return ret;
+}
+return ret;
+case BLK_ZS_CLOSED:
+ret = qcow2_check_open(bs);
+if (ret < 0) {
+error_report("Not enough open zones");
+  

[PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-14 Thread Sam Li
Configuring the zoned format feature of the qcow2 driver requires
the following arguments: the device size, zoned profile, zoned
model, zone size, zone capacity, number of conventional zones,
and limits on zone resources (max_append_sectors, max_open_zones,
and max_active_zones). The zoned profile option is set to zns
when using the qcow2 file as a ZNS drive.

To create a qcow2 file with the zoned format, use a command like this:
$ qemu-img create -f qcow2 test.qcow2 -o size=768M \
    -o zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 \
    -o max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0 \
    -o zoned_profile=zbc/zns
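
For reference, the zone count implied by these options can be worked out
as below. This is only a sketch, assuming the device is split into
zone_size chunks with a possibly smaller last zone, as the documentation
patch in this series describes. zone_count() and last_zone_size() are
hypothetical helper names, not functions from the patch, and both expect
a non-zero zone_size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zones needed to cover dev_size bytes (rounds up). */
static inline uint64_t zone_count(uint64_t dev_size, uint64_t zone_size)
{
    return (dev_size + zone_size - 1) / zone_size;
}

/* Size in bytes of the last zone, which may be smaller than zone_size. */
static inline uint64_t last_zone_size(uint64_t dev_size, uint64_t zone_size)
{
    uint64_t rem = dev_size % zone_size;
    return rem ? rem : zone_size;
}
```

With size=768M and zone_size=64M as in the command above, this gives 12
zones, all of full size.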

Signed-off-by: Sam Li 
---
 block/qcow2.c                    | 125 +++
 block/qcow2.h                    |  21 ++
 docs/interop/qcow2.txt           |  24 ++
 include/block/block-common.h     |   5 ++
 include/block/block_int-common.h |  16
 qapi/block-core.json             |  46
 6 files changed, 223 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c51388e99d..c1077c4a4a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -73,6 +73,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
 #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
+#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
@@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 uint64_t offset;
 int ret;
 Qcow2BitmapHeaderExt bitmaps_ext;
+Qcow2ZonedHeaderExtension zoned_ext;
 
 if (need_update_header != NULL) {
 *need_update_header = false;
@@ -431,6 +433,38 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 break;
 }
 
+case QCOW2_EXT_MAGIC_ZONED_FORMAT:
+{
+if (ext.len != sizeof(zoned_ext)) {
+error_setg(errp, "zoned_ext: "
+ "Invalid extension length");
+return -EINVAL;
+}
+ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "zoned_ext: "
+ "Could not read ext header");
+return ret;
+}
+
+zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
+zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
+zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
+zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
+zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
+zoned_ext.max_active_zones =
+be32_to_cpu(zoned_ext.max_active_zones);
+zoned_ext.max_append_sectors =
+be32_to_cpu(zoned_ext.max_append_sectors);
+s->zoned_header = zoned_ext;
+
+#ifdef DEBUG_EXT
+printf("Qcow2: Got zoned format extension: "
+   "offset=%" PRIu32 "\n", offset);
+#endif
+break;
+}
+
 default:
 /* unknown magic - save it in case we need to rewrite the header */
 /* If you add a new feature, make sure to also update the fast
@@ -3089,6 +3123,31 @@ int qcow2_update_header(BlockDriverState *bs)
 buflen -= ret;
 }
 
+/* Zoned devices header extension */
+if (s->zoned_header.zoned == BLK_Z_HM) {
+Qcow2ZonedHeaderExtension zoned_header = {
+.zoned_profile  = s->zoned_header.zoned_profile,
+.zoned  = s->zoned_header.zoned,
+.nr_zones   = cpu_to_be32(s->zoned_header.nr_zones),
+.zone_size  = cpu_to_be32(s->zoned_header.zone_size),
+.zone_capacity  = cpu_to_be32(s->zoned_header.zone_capacity),
+.zone_nr_conv   = cpu_to_be32(s->zoned_header.zone_nr_conv),
+.max_open_zones = cpu_to_be32(s->zoned_header.max_open_zones),
+.max_active_zones   =
+cpu_to_be32(s->zoned_header.max_active_zones),
+.max_append_sectors =
+cpu_to_be32(s->zoned_header.max_append_sectors)
+};
+ret = header_ext_add(buf, QCOW2_EXT_MAGIC_ZONED_FORMAT,
+ &zoned_header, sizeof(zoned_header),
+ buflen);
+if (ret < 0) {
+goto fail;
+}
+buf += ret;
+buflen -= ret;
+}
+
 /* Keep unknown header extensions */
 QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
 ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
@@ -3773,6 +3832,23 @@ qcow2_co_create(BlockdevCreateOptions *create_op

[PATCH v2] block/file-posix: fix g_file_get_contents return path

2023-07-27 Thread Sam Li
The g_file_get_contents() function returns a gboolean. If it fails, the
returned value will be 0 instead of -1. Fix this by testing the return
value directly instead of assigning it to ret.
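
The pitfall can be reproduced without GLib. In the sketch below,
read_attr() is a hypothetical stand-in for a function that, like
g_file_get_contents(), reports failure by returning false. The two
wrappers are likewise illustrative and are not the code from
file-posix.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reader: returns false on failure and leaves *val untouched. */
static inline bool read_attr(const char *path, const char **val)
{
    if (path == NULL) {
        return false;
    }
    *val = "1\n";
    return true;
}

/* Old pattern: a boolean result is compared against -1. */
static inline int get_val_buggy(const char *path, const char **val)
{
    int ret = read_attr(path, val);
    if (ret == -1) {        /* never true for a boolean result */
        return -ENOENT;
    }
    return ret;             /* 0 on failure, 1 on success */
}

/* Fixed pattern: test the boolean directly, return 0 or a negative errno. */
static inline int get_val_fixed(const char *path, const char **val)
{
    if (!read_attr(path, val)) {
        return -ENOENT;
    }
    return 0;
}
```

The buggy variant returns 0 on failure, so a caller that treats 0 as
success would go on to use an output pointer that was never set.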

This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.

Signed-off-by: Sam Li 
Reviewed-by: Matthew Rosato 
Reviewed-by: Stefan Hajnoczi 
---
 block/file-posix.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 9e8e3d8ca5..b16e9c21a1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1232,7 +1232,6 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 static int get_sysfs_str_val(struct stat *st, const char *attribute,
  char **val) {
 g_autofree char *sysfspath = NULL;
-int ret;
 size_t len;
 
 if (!S_ISBLK(st->st_mode)) {
@@ -1242,8 +1241,7 @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
 sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
 major(st->st_rdev), minor(st->st_rdev),
 attribute);
-ret = g_file_get_contents(sysfspath, val, &len, NULL);
-if (ret == -1) {
+if (!g_file_get_contents(sysfspath, val, &len, NULL)) {
 return -ENOENT;
 }
 
@@ -1253,7 +1251,7 @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
 if (*(p + len - 1) == '\n') {
 *(p + len - 1) = '\0';
 }
-return ret;
+return 0;
 }
 #endif
 
-- 
2.40.1




Re: [PATCH 1/2] block/file-posix: fix g_file_get_contents return path

2023-07-27 Thread Sam Li
Matthew Rosato wrote on Thu, Jul 27, 2023 at 19:46:
>
> On 7/5/23 10:54 AM, Matthew Rosato wrote:
> > On 6/4/23 2:16 AM, Sam Li wrote:
> >> The g_file_get_contents() function returns a g_boolean. If it fails, the
> >> returned value will be 0 instead of -1. Solve the issue by skipping
> >> assigning ret value.
> >>
> >> This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
> >> by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.
> >>
> >> Signed-off-by: Sam Li 
> >
> > Polite ping on this patch -- this issue still exists in master as of today 
> > and this patch resolves it for me.  Just want to make sure it gets into 8.1
> >
>
> Ping -- I can still reproduce this crash on -rc1.  Any chance this patch can 
> get picked up for the 8.1 release?
>
> @Sam I see you sent a v2 of only patch #2 in this series ('block/file-posix: 
> fix wps checking in raw_co_prw')..  I wonder if this one just got forgotten 
> since it wasn't sent as part of v2.  Maybe try a resend of this patch by 
> itself (plus the review tags added)?

Ok, I will resend it as a separate patch.

>
> Thanks,
> Matt
>
> >
> >> ---
> >>  block/file-posix.c | 6 ++
> >>  1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/block/file-posix.c b/block/file-posix.c
> >> index ac1ed54811..0d9d179a35 100644
> >> --- a/block/file-posix.c
> >> +++ b/block/file-posix.c
> >> @@ -1232,7 +1232,6 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
> >>  static int get_sysfs_str_val(struct stat *st, const char *attribute,
> >>   char **val) {
> >>  g_autofree char *sysfspath = NULL;
> >> -int ret;
> >>  size_t len;
> >>
> >>  if (!S_ISBLK(st->st_mode)) {
> >> @@ -1242,8 +1241,7 @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
> >>  sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> >>  major(st->st_rdev), minor(st->st_rdev),
> >>  attribute);
> >> -ret = g_file_get_contents(sysfspath, val, &len, NULL);
> >> -if (ret == -1) {
> >> +if (!g_file_get_contents(sysfspath, val, &len, NULL)) {
> >>  return -ENOENT;
> >>  }
> >>
> >> @@ -1253,7 +1251,7 @@ static int get_sysfs_str_val(struct stat *st, const char *attribute,
> >>  if (*(p + len - 1) == '\n') {
> >>  *(p + len - 1) = '\0';
> >>  }
> >> -return ret;
> >> +return 0;
> >>  }
> >>  #endif
> >>
> >
> >
>



Re: [RFC 2/4] qcow2: add configurations for zoned format extension

2023-06-20 Thread Sam Li
Stefan Hajnoczi wrote on Tue, Jun 20, 2023 at 22:48:
>
> On Mon, Jun 19, 2023 at 10:50:31PM +0800, Sam Li wrote:
> > Stefan Hajnoczi wrote on Mon, Jun 19, 2023 at 22:42:
> > >
> > > On Mon, Jun 19, 2023 at 06:32:52PM +0800, Sam Li wrote:
> > > > Stefan Hajnoczi wrote on Mon, Jun 19, 2023 at 18:10:
> > > > > On Mon, Jun 05, 2023 at 06:41:06PM +0800, Sam Li wrote:
> > > > > > diff --git a/block/qcow2.h b/block/qcow2.h
> > > > > > index 4f67eb912a..fe18dc4d97 100644
> > > > > > --- a/block/qcow2.h
> > > > > > +++ b/block/qcow2.h
> > > > > > @@ -235,6 +235,20 @@ typedef struct Qcow2CryptoHeaderExtension {
> > > > > >  uint64_t length;
> > > > > >  } QEMU_PACKED Qcow2CryptoHeaderExtension;
> > > > > >
> > > > > > +typedef struct Qcow2ZonedHeaderExtension {
> > > > > > +/* Zoned device attributes */
> > > > > > +BlockZonedProfile zoned_profile;
> > > > > > +BlockZoneModel zoned;
> > > > > > +uint32_t zone_size;
> > > > > > +uint32_t zone_capacity;
> > > > > > +uint32_t nr_zones;
> > > > > > +uint32_t zone_nr_conv;
> > > > > > +uint32_t max_active_zones;
> > > > > > +uint32_t max_open_zones;
> > > > > > +uint32_t max_append_sectors;
> > > > > > +uint8_t padding[3];
> > > > >
> > > > > This looks strange. Why is there 3 bytes of padding at the end?
> > > > > Normally padding would align to an even power-of-two number of
> > > > > bytes like 2, 4, 8, etc.
> > > >
> > > > It is calculated as 3 if sizeof(zoned+zoned_profile) = 8. Else if it's
> > > > 16, the padding is 2.
> > >
> > > I don't understand. Can you explain why there is padding at the end of
> > > this struct?
> >
> > The overall size should be aligned with 64 bit, which leaves use one
> > uint32_t and two fields zoned, zoned_profile. I am not sure the size
> > of macros here and it used 4 for each. So it makes 3 (*8) + 32 + 8 =
> > 64 in the end. If the macro size is wrong, then the padding will
> > change as well.
>
> The choice of the type (char or int) representing an enum is
> implementation-defined according to the C17 standard (see "6.7.2.2
> Enumeration specifiers").
>
> Therefore it's not portable to use enums in structs exposed to the
> outside world (on-disk formats or network protocols).
>
> Please use uint8_t for the zoned_profile and zoned fields and move them
> to the end of the struct so the uint32_t fields are naturally aligned.
>
> I think only 2 bytes of padding will be required to align the struct to
> a 64-bit boundary once you've done that.

I see. Thanks!
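
A sketch of the layout suggested above, assuming the seven uint32_t
fields come first and the two enum-backed fields become uint8_t at the
end. ZonedExtSketch is a hypothetical name, and the struct is
deliberately not marked QEMU_PACKED so that the sketch stays
self-contained. The assertions only check the arithmetic
(7 * 4 + 2 * 1 + 2 = 32 bytes, a multiple of 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout: fixed-width fields only, no enums. */
typedef struct ZonedExtSketch {
    uint32_t zone_size;
    uint32_t zone_capacity;
    uint32_t nr_zones;
    uint32_t zone_nr_conv;
    uint32_t max_active_zones;
    uint32_t max_open_zones;
    uint32_t max_append_sectors;
    uint8_t zoned_profile;   /* was an enum; its width is not portable */
    uint8_t zoned;           /* was an enum; its width is not portable */
    uint8_t padding[2];      /* pads 30 bytes up to 32 */
} ZonedExtSketch;

/* Every uint32_t field sits on a 4-byte boundary without packing. */
_Static_assert(offsetof(ZonedExtSketch, zoned_profile) == 28,
               "uint32_t fields should occupy the first 28 bytes");
_Static_assert(sizeof(ZonedExtSketch) == 32,
               "struct should be 32 bytes");
_Static_assert(sizeof(ZonedExtSketch) % 8 == 0,
               "struct size should be a multiple of 64 bits");
```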

Sam


