Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Klaus Jensen
On Nov  3 21:37, Philippe Mathieu-Daudé wrote:
> On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
> >> -Original Message-
> >> From: Philippe Mathieu-Daudé 
> ...
> >>>  typedef struct QEMU_PACKED NvmeCqe {
> >>> -uint32_tresult;
> >>> -uint32_trsvd;
> >>> +union {
> >>> +uint64_t result64;
> >>> +uint32_t result32;
> >>> +};
> >>
> >> When using packed structure you want to define all fields to
> >> avoid alignment confusion (and I'm surprised the compiler doesn't
> >> complain...). So this would be:
> >>
> >>union {
> >>uint64_t result64;
> >>struct {
> >>uint32_tresult32;
> >>uint32_trsvd32;
> >>};
> >>};
> >>

I align (hehe...) towards this. The amount of bit-juggling we need for
commands justify the need for separate NvmeCmd's, but in this case I
think an NvmeCqeZA is just unnecessary clutter. If the result value is
complex, the approach used by AERs is better I think:

   typedef struct QEMU_PACKED NvmeAerResult {
   uint8_t event_type;
   uint8_t event_info;
   uint8_t log_page;
   uint8_t resv;
   } NvmeAerResult;

   NvmeAerResult *result = (NvmeAerResult *)>cqe.result32;

Since storing the Zone Append ALBA in the result64 isn't really a
complex operation, let's just assign it into that member directly.

(Addendum) That DW1 is command specific and no longer reserved is
defined by TP 4056 (Namespace Types) - not v1.4 or any of its revisions.


signature.asc
Description: PGP signature


Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Philippe Mathieu-Daudé
On 11/3/20 8:48 PM, Dmitry Fomichev wrote:
>> -Original Message-
>> From: Philippe Mathieu-Daudé 
...
>>>  typedef struct QEMU_PACKED NvmeCqe {
>>> -uint32_tresult;
>>> -uint32_trsvd;
>>> +union {
>>> +uint64_t result64;
>>> +uint32_t result32;
>>> +};
>>
>> When using packed structure you want to define all fields to
>> avoid alignment confusion (and I'm surprised the compiler doesn't
>> complain...). So this would be:
>>
>>union {
>>uint64_t result64;
>>struct {
>>uint32_tresult32;
>>uint32_trsvd32;
>>};
>>};
>>
> 
> IMO, the compiler doesn't complain because it's a union. Smaller
> variants in unions are "padded" to the size of the largest variant
> regardless of whether the struct is packed or not.
> 
>> But since the ZNS is still a technical proposal and not in the spec,
>> this doesn't look correct (the spec list this field as 32-bit).
>>
>> What do you think about adding NvmeCqeZNS?
>>
>> Maybe:
>>
>>   typedef struct QEMU_PACKED NvmeCqeZNS {
>>   uint64_tresult;
>>   uint16_tsq_head;
>>   uint16_tsq_id;
>>   uint16_tcid;
>>   uint16_tstatus;
>>   } NvmeCqeZNS;
>>
>> Or clever:
>>
>>   typedef union QEMU_PACKED NvmeCqeZNS {
>>   union {
>>   struct {
>>   uint64_t result;
>>   uint32_t dw2;
>>   uint32_t dw3;
>>   };
>>   NvmeCqe  cqe;
>>   };
>>   } NvmeCqeZNS;
>>
> 
> The 1.4 base spec changes Reserved DW1 in CQE to become the
> Command Specific DW1, so it would rather make sense to define
> a command-specific CQE for Zone Append -
> 
> In include/block/nvme.h:
> 
> typedef struct QEMU_PACKED NvmeCqe {
>  uint32_tresult;
> -uint32_trsvd;
> +uint32_tdw1;
>  uint16_tsq_head;
>  uint16_tsq_id;
>  uint16_tcid;
>  uint16_tstatus;
> } NvmeCqe;
> 
> +/* Zone Append - specific CQE */
> +typedef struct QEMU_PACKED NvmeCqeZA {
> +uint64_tza_slba;
> +uint16_tsq_head;
> +uint16_tsq_id;
> +uint16_tcid;
> +uint16_tstatus;
> +} NvmeCqeZA;
> 
> ...
> 
> +QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != sizeof(NvmeCqeZA));
> 
> This will go away with all CQE unions and it will also allow the returned SLBA
> value to be properly named. What do you think?

This is cleaner, I like it :)

> 
>> I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".
> 
> NvmeCqeZA could simply be defined in include/block/nvme.h next to NvmeCqe.
> The problem with adding include/block/nvme-zns.h is that it would be hard if
> not impossible to separate all ZNS-specific content from block/nvme.h and it
> would become necessary for developers to deal with two files that present
> different parts of ZNS definitions instead of just one.

Got it.

Regards,

Phil.

> 
> Best regards,
> Dmitry




RE: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-03 Thread Dmitry Fomichev
> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Friday, October 30, 2020 3:16 AM
> To: Dmitry Fomichev ; Keith Busch
> ; Klaus Jensen ; Kevin Wolf
> ; Maxim Levitsky ; Fam Zheng
> 
> Cc: Alistair Francis ; Matias Bjorling
> ; Niklas Cassel ;
> Damien Le Moal ; qemu-block@nongnu.org;
> qemu-de...@nongnu.org
> Subject: Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> Hi Dmitry,
> 
> On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> emulator.
> > Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > Reviewed-by: Niklas Cassel 
> > ---
> >  block/nvme.c  |   2 +-
> >  hw/block/nvme-ns.c| 173 
> >  hw/block/nvme-ns.h|  54 +++
> >  hw/block/nvme.c   | 977
> +-
> >  hw/block/nvme.h   |   8 +
> >  hw/block/trace-events |  18 +-
> >  include/block/nvme.h  | 113 -
> 
> When you start modifying include/ files, it is recommended
> to start using scripts/git.orderfile as this makes review
> easier (no need to scroll back / up constantly).
>

Hi Philippe,

Thanks for the suggestion, indeed it makes browsing through
diffs much easier!

> As "block/nvme.h" is shared by 2 subsystems, keeping its
> changes in a separate patch is preferred.
>

No problem, I'll move all changes to include/block/nvme.h to
a separate patch.
 
> >  7 files changed, 1322 insertions(+), 23 deletions(-)
> >
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 05485fdd11..7a513c9a17 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const
> NvmeCqe *c)
> >  {
> >  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
> >  if (status) {
> > -trace_nvme_error(le32_to_cpu(c->result),
> > +trace_nvme_error(le32_to_cpu(c->result32),
> >   le16_to_cpu(c->sq_head),
> >   le16_to_cpu(c->sq_id),
> >   le16_to_cpu(c->cid),
> ...
> 
> > diff --git a/include/block/nvme.h b/include/block/nvme.h
> > index 3653b4aefc..ba8a45edf5 100644
> > --- a/include/block/nvme.h
> > +++ b/include/block/nvme.h
> > @@ -489,6 +489,9 @@ enum NvmeIoCommands {
> >  NVME_CMD_COMPARE= 0x05,
> >  NVME_CMD_WRITE_ZEROES   = 0x08,
> >  NVME_CMD_DSM= 0x09,
> > +NVME_CMD_ZONE_MGMT_SEND = 0x79,
> > +NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> > +NVME_CMD_ZONE_APPEND= 0x7d,
> >  };
> >
> >  typedef struct QEMU_PACKED NvmeDeleteQ {
> > @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeA

Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Niklas Cassel
On Fri, Oct 30, 2020 at 11:32:38AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 

(snip)

> +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> +{
> +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> +NvmeNamespace *ns = req->ns;
> +/* cdw12 is zero-based number of dwords to return. Convert to bytes */
> +uint32_t len = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +uint32_t zone_idx, zra, zrasf, partial;
> +uint64_t max_zones, nr_zones = 0;
> +uint16_t ret;
> +uint64_t slba;
> +NvmeZoneDescr *z;
> +NvmeZone *zs;
> +NvmeZoneReportHeader *header;
> +void *buf, *buf_p;
> +size_t zone_entry_sz;
> +
> +req->status = NVME_SUCCESS;
> +
> +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> +if (ret) {
> +return ret;
> +}
> +
> +if (len < sizeof(NvmeZoneReportHeader)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}

Just like nvme_read() and nvme_write(), nvme_zone_mgmt_recv()
has to do something like:

+ret = nvme_check_mdts(n, len);
+if (ret) {
+trace_pci_nvme_err_mdts(nvme_cid(req), len);
+return ret;
+}
+

To see that we are not exceeding MDTS.


Kind regards,
Niklas


> +
> +zra = dw13 & 0xff;
> +if (!(zra == NVME_ZONE_REPORT || zra == NVME_ZONE_REPORT_EXTENDED)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +if (zra == NVME_ZONE_REPORT_EXTENDED) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +zrasf = (dw13 >> 8) & 0xff;
> +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +partial = (dw13 >> 16) & 0x01;
> +
> +zone_entry_sz = sizeof(NvmeZoneDescr);
> +
> +max_zones = (len - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> +buf = g_malloc0(len);
> +
> +header = (NvmeZoneReportHeader *)buf;
> +buf_p = buf + sizeof(NvmeZoneReportHeader);
> +
> +while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> +zs = >zone_array[zone_idx];
> +
> +if (!nvme_zone_matches_filter(zrasf, zs)) {
> +zone_idx++;
> +continue;
> +}
> +
> +z = (NvmeZoneDescr *)buf_p;
> +buf_p += sizeof(NvmeZoneDescr);
> +nr_zones++;
> +
> +z->zt = zs->d.zt;
> +z->zs = zs->d.zs;
> +z->zcap = cpu_to_le64(zs->d.zcap);
> +z->zslba = cpu_to_le64(zs->d.zslba);
> +z->za = zs->d.za;
> +
> +if (nvme_wp_is_valid(zs)) {
> +z->wp = cpu_to_le64(zs->d.wp);
> +} else {
> +z->wp = cpu_to_le64(~0ULL);
> +}
> +
> +zone_idx++;
> +}
> +
> +if (!partial) {
> +for (; zone_idx < ns->num_zones; zone_idx++) {
> +zs = >zone_array[zone_idx];
> +if (nvme_zone_matches_filter(zrasf, zs)) {
> +nr_zones++;
> +}
> +}
> +}
> +header->nr_zones = cpu_to_le64(nr_zones);
> +
> +ret = nvme_dma(n, (uint8_t *)buf, len, DMA_DIRECTION_FROM_DEVICE, req);
> +
> +g_free(buf);
> +
> +return ret;
> +}


Re: [PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-30 Thread Philippe Mathieu-Daudé
Hi Dmitry,

On 10/30/20 3:32 AM, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 
> ---
>  block/nvme.c  |   2 +-
>  hw/block/nvme-ns.c| 173 
>  hw/block/nvme-ns.h|  54 +++
>  hw/block/nvme.c   | 977 +-
>  hw/block/nvme.h   |   8 +
>  hw/block/trace-events |  18 +-
>  include/block/nvme.h  | 113 -

When you start modifying include/ files, it is recommended
to start using scripts/git.orderfile as this makes review
easier (no need to scroll back / up constantly).

As "block/nvme.h" is shared by 2 subsystems, keeping its
changes in a separate patch is preferred.

>  7 files changed, 1322 insertions(+), 23 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  {
>  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
>  if (status) {
> -trace_nvme_error(le32_to_cpu(c->result),
> +trace_nvme_error(le32_to_cpu(c->result32),
>   le16_to_cpu(c->sq_head),
>   le16_to_cpu(c->sq_id),
>   le16_to_cpu(c->cid),
...

> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 3653b4aefc..ba8a45edf5 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -489,6 +489,9 @@ enum NvmeIoCommands {
>  NVME_CMD_COMPARE= 0x05,
>  NVME_CMD_WRITE_ZEROES   = 0x08,
>  NVME_CMD_DSM= 0x09,
> +NVME_CMD_ZONE_MGMT_SEND = 0x79,
> +NVME_CMD_ZONE_MGMT_RECV = 0x7a,
> +NVME_CMD_ZONE_APPEND= 0x7d,
>  };
>  
>  typedef struct QEMU_PACKED NvmeDeleteQ {
> @@ -649,8 +652,10 @@ typedef struct QEMU_PACKED NvmeAerResult {
>  } NvmeAerResult;
>  
>  typedef struct QEMU_PACKED NvmeCqe {
> -uint32_tresult;
> -uint32_trsvd;
> +union {
> +uint64_t result64;
> +uint32_t result32;
> +};

When using packed structure you want to define all fields to
avoid alignment confusion (and I'm surprised the compiler doesn't
complain...). So this would be:

   union {
   uint64_t result64;
   struct {
   uint32_tresult32;
   uint32_trsvd32;
   };
   };

But since the ZNS is still a technical proposal and not in the spec,
this doesn't look correct (the spec list this field as 32-bit).

What do you think about adding NvmeCqeZNS?

Maybe:

  typedef struct QEMU_PACKED NvmeCqeZNS {
  uint64_tresult;
  uint16_tsq_head;
  uint16_tsq_id;
  uint16_tcid;
  uint16_tstatus;
  } NvmeCqeZNS;

Or clever:

  typedef union QEMU_PACKED NvmeCqeZNS {
  union {
  struct {
  uint64_t result;
  uint32_t dw2;
  uint32_t dw3;
  };
  NvmeCqe  cqe;
  };
  } NvmeCqeZNS;

I wonder what part could go in hw/block/nvme-ns.h or "block/nvme-zns.h".

>  uint16_tsq_head;
>  uint16_tsq_id;
>  uint16_tcid;
> @@ -678,6 +683,7 @@ enum NvmeStatusCodes {
>  NVME_SGL_DESCR_TYPE_INVALID = 

[PATCH v8 07/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-29 Thread Dmitry Fomichev
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.

Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.

In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.

Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.

Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.

Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.

Subsequent commits in this series add ZDE support and checks for
active and open zone limits.

Signed-off-by: Niklas Cassel 
Signed-off-by: Hans Holmberg 
Signed-off-by: Ajay Joshi 
Signed-off-by: Chaitanya Kulkarni 
Signed-off-by: Matias Bjorling 
Signed-off-by: Aravind Ramesh 
Signed-off-by: Shin'ichiro Kawasaki 
Signed-off-by: Adam Manzanares 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 block/nvme.c  |   2 +-
 hw/block/nvme-ns.c| 173 
 hw/block/nvme-ns.h|  54 +++
 hw/block/nvme.c   | 977 +-
 hw/block/nvme.h   |   8 +
 hw/block/trace-events |  18 +-
 include/block/nvme.h  | 113 -
 7 files changed, 1322 insertions(+), 23 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 05485fdd11..7a513c9a17 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
 {
 uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
 if (status) {
-trace_nvme_error(le32_to_cpu(c->result),
+trace_nvme_error(le32_to_cpu(c->result32),
  le16_to_cpu(c->sq_head),
  le16_to_cpu(c->sq_id),
  le16_to_cpu(c->cid),
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index e191ef9be0..e6db7f7d3b 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -25,6 +25,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
 
+#include "trace.h"
 #include "nvme.h"
 #include "nvme-ns.h"
 
@@ -77,6 +78,151 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, 
Error **errp)
 return 0;
 }
 
+static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp)
+{
+uint64_t zone_size, zone_cap;
+uint32_t nz, lbasz = ns->blkconf.logical_block_size;
+
+if (ns->params.zone_size_bs) {
+zone_size = ns->params.zone_size_bs;
+} else {
+zone_size = NVME_DEFAULT_ZONE_SIZE;
+}
+if (ns->params.zone_cap_bs) {
+zone_cap = ns->params.zone_cap_bs;
+} else {
+zone_cap = zone_size;
+}
+if (zone_cap > zone_size) {
+error_setg(errp, "zone capacity %luB exceeds zone size %luB",
+   zone_cap, zone_size);
+return -1;
+}
+if (zone_size < lbasz) {
+error_setg(errp, "zone size %luB too small, must be at least %uB",
+   zone_size, lbasz);
+return -1;
+}
+if (zone_cap < lbasz) {
+error_setg(errp, "zone capacity %luB too small, must be at least %uB",
+   zone_cap, lbasz);
+return -1;
+}
+ns->zone_size = zone_size / lbasz;
+ns->zone_capacity = zone_cap / lbasz;
+
+nz = DIV_ROUND_UP(ns->size / lbasz, ns->zone_size);
+ns->num_zones = nz;
+ns->zone_array_size = sizeof(NvmeZone) * nz;
+ns->zone_size_log2 = 0;
+if (is_power_of_2(ns->zone_size)) {
+ns->zone_size_log2 = 63 - clz64(ns->zone_size);
+}
+
+return 0;
+}
+
+static void nvme_init_zone_state(NvmeNamespace *ns)
+{
+uint64_t start = 0, zone_size = ns->zone_size;
+uint64_t capacity = ns->num_zones * zone_size;
+NvmeZone *zone;
+int i;
+
+ns->zone_array = g_malloc0(ns->zone_array_size);
+
+QTAILQ_INIT(>exp_open_zones);
+QTAILQ_INIT(>imp_open_zones);
+QTAILQ_INIT(>closed_zones);
+QTAILQ_INIT(>full_zones);
+
+zone = ns->zone_array;
+for (i = 0; i < ns->num_zones; i++, zone++) {
+if (start + zone_size > capacity) {
+zone_size = capacity - start;
+}
+zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE;
+nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY);
+