Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-12-08 Thread Klaus Jensen
On Dec  8 20:02, Dmitry Fomichev wrote:
> Hi Klaus,
> 
> Thank you for your review! Please see replies below...
> 
> 
> On Thu, 2020-11-12 at 20:36 +0100, Klaus Jensen wrote:
> > Hi Dmitry,
> > 
> > I know you posted v10, but my comments should be relevant to that as
> > well.
> > 
> > On Nov  5 11:53, Dmitry Fomichev wrote:
> > > @@ -133,6 +300,12 @@ static Property nvme_ns_props[] = {
> > >  DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
> > >  DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> > >  DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
> > > +DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
> > 
> > I disagree on this. Using the "magic" value ensures that only one
> > command set can be selected. We can do a custom property so we can set
> > `iocs=zoned` as well as `iocs=0x2` if that makes it more user friendly?
> 
> I doubt that an average admin will even know what "iocs" really means, leave
> alone for knowing any magic values. On the other hand, it would be trivial
> to add a check to prevent users from doing zoned=true kv=true, etc. I don't
> see that as a big problem.
> 

OK, I'm fine with this.

> > 
> > > +DEFINE_PROP_SIZE("zoned.zsze", NvmeNamespace, params.zone_size_bs,
> > > + NVME_DEFAULT_ZONE_SIZE),
> > > +DEFINE_PROP_SIZE("zoned.zcap", NvmeNamespace, params.zone_cap_bs, 0),
> > > +DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace,
> > > + params.cross_zone_read, false),
> > 
> > Same reason why I think we should just expose ozcs directly instead of
> > adding more parameters.
> > 
> > We are already adding a bunch of new parameters - might as well keep the
> > number as low as possible.
> 
> There is only RAZB that is defined in OZCS as of now and you will not be
> able to reduce the number of module parameters by exposing OZCS instead of
> RAZB. But telling the user what RAZB really means in the parameter name is,
> IMO, a better choice.
> 

The TP that shall not be named puts stuff in there but I'm OK with the
zoned.cross_read parameter.

> > > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> > > +{
> > > +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> > > +NvmeNamespace *ns = req->ns;
> > > +/* cdw12 is zero-based number of dwords to return. Convert to bytes 
> > > */
> > > +uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> > > +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > > +uint32_t zone_idx, zra, zrasf, partial;
> > > +uint64_t max_zones, nr_zones = 0;
> > > +uint16_t ret;
> > > +uint64_t slba;
> > > +NvmeZoneDescr *z;
> > > +NvmeZone *zs;
> > > +NvmeZoneReportHeader *header;
> > > +void *buf, *buf_p;
> > > +size_t zone_entry_sz;
> > > +
> > > +req->status = NVME_SUCCESS;
> > > +
> > > +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> > > +if (ret) {
> > > +return ret;
> > > +}
> > 
> > Zone Management Receive does not specify anything for the given SLBA.
> > Out-of-bounds is acceptable, just results in no descriptors being
> > returned.
> 
> SLBA is an LBA in the lowest zone that is considered for reporting.
> The text in 4.4.1.1 a) says "report only zone descriptors for which
> the ZSLBA value is greater or equal to the ZSLBA value of the zone
> specified by the SLBA value in the command". The last part of this
> paragraph basically says that SLBA has to point to a zone, hence
> the error if it doesn't.
> 

Hmm. I tend to disagree since nowhere does the spec define that an error
should be returned if the given ZSLBA does not resolve to a valid zone.

> > > +
> > > +zone_idx++;
> > > +}
> > > +
> > > +if (!partial) {
> > > +for (; zone_idx < ns->num_zones; zone_idx++) {
> > > +zs = >zone_array[zone_idx];
> > > +if (nvme_zone_matches_filter(zrasf, zs)) {
> > > +nr_zones++;
> > > +}
> > > +}
> > > +}
> > 
> > I did something like this as well (only counting matching zones from
> > given SLBA), but when looking at the spec now, it seems that this is a
> > remnant from an older version of the spec? Please correct me if wrong.
> > 
> > On the Partial Report bit, the ratified specification just says that "If
> > this bit is cleared to '0', then the value in the Number of Zones field
> > indicates the number of zone descriptors that match the criteria in the
> > Zone Receive Action Specific field.".
> > 
> > So, I think if !partial, the Number of Zones field should not consider
> > the SLBA and just count from 0.
> 
> If Partial is 0, then the header contains the number of descriptors that
> can be potentially reported from SLBA until the end of LBA range if the
> buffer would be unlimited. If the Partial bit is 1, the same count is
> additionally limited by the number of descriptors that can fit to the
> provided buffer. Perhaps ZNS spec is not quite clear about this, but this
> is the way all 

Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-12-08 Thread Dmitry Fomichev
Hi Klaus,

Thank you for your review! Please see replies below...


On Thu, 2020-11-12 at 20:36 +0100, Klaus Jensen wrote:
> Hi Dmitry,
> 
> I know you posted v10, but my comments should be relevant to that as
> well.
> 
> On Nov  5 11:53, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> > 
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> > Define trace events where needed in newly introduced code.
> > 
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> > 
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> > 
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> > 
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> > 
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> > 
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > Reviewed-by: Niklas Cassel 
> > ---
> >  hw/block/nvme-ns.h|  54 +++
> >  hw/block/nvme.h   |   8 +
> >  hw/block/nvme-ns.c| 173 
> >  hw/block/nvme.c   | 971 +-
> >  hw/block/trace-events |  18 +-
> >  5 files changed, 1209 insertions(+), 15 deletions(-)
> > 
> > diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> > index 2d9cd29d07..d2631ff5a3 100644
> > --- a/hw/block/nvme-ns.h
> > +++ b/hw/block/nvme-ns.h
> > @@ -19,9 +19,20 @@
> >  #define NVME_NS(obj) \
> >  OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
> >  
> > +typedef struct NvmeZone {
> > +NvmeZoneDescr   d;
> > +uint64_tw_ptr;
> > +QTAILQ_ENTRY(NvmeZone) entry;
> > +} NvmeZone;
> > +
> >  typedef struct NvmeNamespaceParams {
> >  uint32_t nsid;
> >  QemuUUID uuid;
> > +
> > +bool zoned;
> > +bool cross_zone_read;
> > +uint64_t zone_size_bs;
> > +uint64_t zone_cap_bs;
> >  } NvmeNamespaceParams;
> >  
> >  typedef struct NvmeNamespace {
> > @@ -34,6 +45,18 @@ typedef struct NvmeNamespace {
> >  bool attached;
> >  uint8_t  csi;
> >  
> > +NvmeIdNsZoned   *id_ns_zoned;
> > +NvmeZone*zone_array;
> > +QTAILQ_HEAD(, NvmeZone) exp_open_zones;
> > +QTAILQ_HEAD(, NvmeZone) imp_open_zones;
> > +QTAILQ_HEAD(, NvmeZone) closed_zones;
> > +QTAILQ_HEAD(, NvmeZone) full_zones;
> > +uint32_tnum_zones;
> > +uint64_tzone_size;
> > +uint64_tzone_capacity;
> > +uint64_tzone_array_size;
> > +uint32_tzone_size_log2;
> > +
> >  NvmeNamespaceParams params;
> >  } NvmeNamespace;
> >  
> > @@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, 
> > uint64_t lba)
> >  
> >  typedef struct NvmeCtrl NvmeCtrl;
> >  
> > +static inline uint8_t nvme_get_zone_state(NvmeZone *zone)
> 
> This can (should?) return the NvmeZoneState enum.

Ok, good idea.

> 
> > +{
> > +return zone->d.zs >> 4;
> > +}
> > +
> > +static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState 
> > state)
> > +{
> > +zone->d.zs = state << 4;
> > +}
> > +
> > +static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone 
> > *zone)
> > +{
> > +return zone->d.zslba + ns->zone_size;
> > +}
> > +
> > +static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
> > +{
> > +return zone->d.zslba + zone->d.zcap;
> > +}
> > +
> > +static inline bool nvme_wp_is_valid(NvmeZone *zone)
> > +{
> > +uint8_t st = nvme_get_zone_state(zone);
> > +
> > +return st != NVME_ZONE_STATE_FULL &&
> > +   st != NVME_ZONE_STATE_READ_ONLY &&
> > +   st != NVME_ZONE_STATE_OFFLINE;
> > +}
> > +
> >  int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
> >  void nvme_ns_drain(NvmeNamespace *ns);
> >  void 

Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-25 Thread Klaus Jensen
On Nov 17 16:32, Keith Busch wrote:
> On Thu, Nov 12, 2020 at 08:36:39PM +0100, Klaus Jensen wrote:
> > On Nov  5 11:53, Dmitry Fomichev wrote:
> > > @@ -133,6 +300,12 @@ static Property nvme_ns_props[] = {
> > >  DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
> > >  DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> > >  DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
> > > +DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
> > 
> > I disagree on this. Using the "magic" value ensures that only one
> > command set can be selected. We can do a custom property so we can set
> > `iocs=zoned` as well as `iocs=0x2` if that makes it more user friendly?
> 
> IMO, 'iocs' is a rather difficult parameter name for a user to remember
> compared to 'zoned=true'. While 'iocs' is a spec field, the spec isn't
> very user friendly either, and these user parameters can hide the spec
> terms behind human comprehensible names.
> 

I'm okay with a "zoned" bool parameter and having zone size and capacity
in bytes. But parameters such as ZASL, MAR and MOR are "expert"
parameters so I think its better to use the spec field names and
meanings for those. The nvme device already has precedence for using
spec field names (and meanings, e.g. zeroes based) for "expert"
parameters (mdts, aerl).


signature.asc
Description: PGP signature


Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-17 Thread Keith Busch
On Thu, Nov 12, 2020 at 08:36:39PM +0100, Klaus Jensen wrote:
> On Nov  5 11:53, Dmitry Fomichev wrote:
> > @@ -133,6 +300,12 @@ static Property nvme_ns_props[] = {
> >  DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
> >  DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> >  DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
> > +DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
> 
> I disagree on this. Using the "magic" value ensures that only one
> command set can be selected. We can do a custom property so we can set
> `iocs=zoned` as well as `iocs=0x2` if that makes it more user friendly?

IMO, 'iocs' is a rather difficult parameter name for a user to remember
compared to 'zoned=true'. While 'iocs' is a spec field, the spec isn't
very user friendly either, and these user parameters can hide the spec
terms behind human comprehensible names.



Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-12 Thread Klaus Jensen
Hi Dmitry,

I know you posted v10, but my comments should be relevant to that as
well.

On Nov  5 11:53, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 
> ---
>  hw/block/nvme-ns.h|  54 +++
>  hw/block/nvme.h   |   8 +
>  hw/block/nvme-ns.c| 173 
>  hw/block/nvme.c   | 971 +-
>  hw/block/trace-events |  18 +-
>  5 files changed, 1209 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index 2d9cd29d07..d2631ff5a3 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -19,9 +19,20 @@
>  #define NVME_NS(obj) \
>  OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
>  
> +typedef struct NvmeZone {
> +NvmeZoneDescr   d;
> +uint64_tw_ptr;
> +QTAILQ_ENTRY(NvmeZone) entry;
> +} NvmeZone;
> +
>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
>  QemuUUID uuid;
> +
> +bool zoned;
> +bool cross_zone_read;
> +uint64_t zone_size_bs;
> +uint64_t zone_cap_bs;
>  } NvmeNamespaceParams;
>  
>  typedef struct NvmeNamespace {
> @@ -34,6 +45,18 @@ typedef struct NvmeNamespace {
>  bool attached;
>  uint8_t  csi;
>  
> +NvmeIdNsZoned   *id_ns_zoned;
> +NvmeZone*zone_array;
> +QTAILQ_HEAD(, NvmeZone) exp_open_zones;
> +QTAILQ_HEAD(, NvmeZone) imp_open_zones;
> +QTAILQ_HEAD(, NvmeZone) closed_zones;
> +QTAILQ_HEAD(, NvmeZone) full_zones;
> +uint32_tnum_zones;
> +uint64_tzone_size;
> +uint64_tzone_capacity;
> +uint64_tzone_array_size;
> +uint32_tzone_size_log2;
> +
>  NvmeNamespaceParams params;
>  } NvmeNamespace;
>  
> @@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t 
> lba)
>  
>  typedef struct NvmeCtrl NvmeCtrl;
>  
> +static inline uint8_t nvme_get_zone_state(NvmeZone *zone)

This can (should?) return the NvmeZoneState enum.

> +{
> +return zone->d.zs >> 4;
> +}
> +
> +static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState 
> state)
> +{
> +zone->d.zs = state << 4;
> +}
> +
> +static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone 
> *zone)
> +{
> +return zone->d.zslba + ns->zone_size;
> +}
> +
> +static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
> +{
> +return zone->d.zslba + zone->d.zcap;
> +}
> +
> +static inline bool nvme_wp_is_valid(NvmeZone *zone)
> +{
> +uint8_t st = nvme_get_zone_state(zone);
> +
> +return st != NVME_ZONE_STATE_FULL &&
> +   st != NVME_ZONE_STATE_READ_ONLY &&
> +   st != NVME_ZONE_STATE_OFFLINE;
> +}
> +
>  int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
>  void nvme_ns_drain(NvmeNamespace *ns);
>  void nvme_ns_flush(NvmeNamespace *ns);
> +void nvme_ns_shutdown(NvmeNamespace *ns);
> +void nvme_ns_cleanup(NvmeNamespace *ns);
>  
>  #endif /* NVME_NS_H */
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index e080a2318a..4cb0615128 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -6,6 +6,9 @@
>  
>  #define NVME_MAX_NAMESPACES 256
>  
> +#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
> +#define NVME_DEFAULT_MAX_ZA_SIZE (128 

RE: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-06 Thread Dmitry Fomichev
> -Original Message-
> From: Niklas Cassel 
> Sent: Friday, November 6, 2020 6:59 AM
> To: Dmitry Fomichev 
> Cc: Keith Busch ; Klaus Jensen
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Max Reitz ;
> Maxim Levitsky ; Fam Zheng ;
> Alistair Francis ; Matias Bjorling
> ; Damien Le Moal ;
> qemu-bl...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> emulator.
> > Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > Reviewed-by: Niklas Cassel 
> > ---
> >  hw/block/nvme-ns.h|  54 +++
> >  hw/block/nvme.h   |   8 +
> >  hw/block/nvme-ns.c| 173 
> >  hw/block/nvme.c   | 971
> +-
> >  hw/block/trace-events |  18 +-
> >  5 files changed, 1209 insertions(+), 15 deletions(-)
> >
> 
> (snip)
> 
> > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest
> *req)
> > +{
> > +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> > +NvmeNamespace *ns = req->ns;
> > +/* cdw12 is zero-based number of dwords to return. Convert to bytes
> */
> > +uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> > +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > +uint32_t zone_idx, zra, zrasf, partial;
> > +uint64_t max_zones, nr_zones = 0;
> > +uint16_t ret;
> > +uint64_t slba;
> > +NvmeZoneDescr *z;
> > +NvmeZone *zs;
> > +NvmeZoneReportHeader *header;
> > +void *buf, *buf_p;
> > +size_t zone_entry_sz;
> > +
> > +req->status = NVME_SUCCESS;
> > +
> > +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> > +if (ret) {
> > +return ret;
> > +}
> > +
> > +zra = dw13 & 0xff;
> > +if (zra != NVME_ZONE_REPORT) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +zrasf = (dw13 >> 8) & 0xff;
> > +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +if (data_size < sizeof(NvmeZoneReportHeader)) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +ret = nvme_map_dptr(n, data_size, req);
> 
> nvme_map_dptr() call was not here in v8 patch set.
> 
> On v7 I commented that you were missing a call to nvme_check_mdts().
> I think you still need to call nvme_check_mdts in order to verify
> that data_size < mdts, no?

Ugh, I've added nvme_map_dptr() instead of nvme_check_mdts() :o
Will send the corrected version shortly...

Cheers,
DF

&

Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-06 Thread Niklas Cassel
On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Niklas Cassel 
> ---
>  hw/block/nvme-ns.h|  54 +++
>  hw/block/nvme.h   |   8 +
>  hw/block/nvme-ns.c| 173 
>  hw/block/nvme.c   | 971 +-
>  hw/block/trace-events |  18 +-
>  5 files changed, 1209 insertions(+), 15 deletions(-)
> 

(snip)

> +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> +{
> +NvmeCmd *cmd = (NvmeCmd *)>cmd;
> +NvmeNamespace *ns = req->ns;
> +/* cdw12 is zero-based number of dwords to return. Convert to bytes */
> +uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> +uint32_t zone_idx, zra, zrasf, partial;
> +uint64_t max_zones, nr_zones = 0;
> +uint16_t ret;
> +uint64_t slba;
> +NvmeZoneDescr *z;
> +NvmeZone *zs;
> +NvmeZoneReportHeader *header;
> +void *buf, *buf_p;
> +size_t zone_entry_sz;
> +
> +req->status = NVME_SUCCESS;
> +
> +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, , _idx);
> +if (ret) {
> +return ret;
> +}
> +
> +zra = dw13 & 0xff;
> +if (zra != NVME_ZONE_REPORT) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +zrasf = (dw13 >> 8) & 0xff;
> +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +if (data_size < sizeof(NvmeZoneReportHeader)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
> +ret = nvme_map_dptr(n, data_size, req);

nvme_map_dptr() call was not here in v8 patch set.

On v7 I commented that you were missing a call to nvme_check_mdts().
I think you still need to call nvme_check_mdts in order to verify
that data_size < mdts, no?

This function already has a call do nvme_dma(). nvme_dma() already
calls nvme_map_dptr().
If you use nvme_dma(), you cannot use nvme_map_dptr().

It will call nvme_map_addr() (which calls qemu_sglist_add()) on the
same buffer twice, causing the qsg->size to be twice what the user
sent in. Which will cause the:

if (unlikely(residual)) {

check in nvme_dma() to fail.


Looking at nvme_read()/nvme_write(), they use nvme_map_dptr()
(without any call to nvme_dma()), and then use dma_blk_read() or
dma_blk_write(). (and they both call nvme_check_mdts())


Kind regards,
Niklas

> +if (ret) {
> +return ret;
> +}
> +
> +partial = (dw13 >> 16) & 0x01;
> +
> +zone_entry_sz = sizeof(NvmeZoneDescr);
> +
> +max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz;
> +buf = g_malloc0(data_size);
> +
> +header = (NvmeZoneReportHeader *)buf;
> +buf_p = buf + sizeof(NvmeZoneReportHeader);
> +
> +while (zone_idx < ns->num_zones && nr_zones < max_zones) {
> +zs = >zone_array[zone_idx];
> +
> +if (!nvme_zone_matches_filter(zrasf, zs)) {
> +zone_idx++;
> +continue;
> +}
> +
> +z = (NvmeZoneDescr *)buf_p;
> +buf_p += sizeof(NvmeZoneDescr);
> +nr_zones++;
> +
> +z->zt = zs->d.zt;
> +z->zs = zs->d.zs;
> +z->zcap = cpu_to_le64(zs->d.zcap);
> +z->zslba = 

[PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-04 Thread Dmitry Fomichev
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.

Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.

In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.

Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.

Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.

Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.

Subsequent commits in this series add ZDE support and checks for
active and open zone limits.

Signed-off-by: Niklas Cassel 
Signed-off-by: Hans Holmberg 
Signed-off-by: Ajay Joshi 
Signed-off-by: Chaitanya Kulkarni 
Signed-off-by: Matias Bjorling 
Signed-off-by: Aravind Ramesh 
Signed-off-by: Shin'ichiro Kawasaki 
Signed-off-by: Adam Manzanares 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h|  54 +++
 hw/block/nvme.h   |   8 +
 hw/block/nvme-ns.c| 173 
 hw/block/nvme.c   | 971 +-
 hw/block/trace-events |  18 +-
 5 files changed, 1209 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2d9cd29d07..d2631ff5a3 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -19,9 +19,20 @@
 #define NVME_NS(obj) \
 OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
 
+typedef struct NvmeZone {
+NvmeZoneDescr   d;
+uint64_tw_ptr;
+QTAILQ_ENTRY(NvmeZone) entry;
+} NvmeZone;
+
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
+
+bool zoned;
+bool cross_zone_read;
+uint64_t zone_size_bs;
+uint64_t zone_cap_bs;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -34,6 +45,18 @@ typedef struct NvmeNamespace {
 bool attached;
 uint8_t  csi;
 
+NvmeIdNsZoned   *id_ns_zoned;
+NvmeZone*zone_array;
+QTAILQ_HEAD(, NvmeZone) exp_open_zones;
+QTAILQ_HEAD(, NvmeZone) imp_open_zones;
+QTAILQ_HEAD(, NvmeZone) closed_zones;
+QTAILQ_HEAD(, NvmeZone) full_zones;
+uint32_tnum_zones;
+uint64_tzone_size;
+uint64_tzone_capacity;
+uint64_tzone_array_size;
+uint32_tzone_size_log2;
+
 NvmeNamespaceParams params;
 } NvmeNamespace;
 
@@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t 
lba)
 
 typedef struct NvmeCtrl NvmeCtrl;
 
+static inline uint8_t nvme_get_zone_state(NvmeZone *zone)
+{
+return zone->d.zs >> 4;
+}
+
+static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState 
state)
+{
+zone->d.zs = state << 4;
+}
+
+static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone)
+{
+return zone->d.zslba + ns->zone_size;
+}
+
+static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
+{
+return zone->d.zslba + zone->d.zcap;
+}
+
+static inline bool nvme_wp_is_valid(NvmeZone *zone)
+{
+uint8_t st = nvme_get_zone_state(zone);
+
+return st != NVME_ZONE_STATE_FULL &&
+   st != NVME_ZONE_STATE_READ_ONLY &&
+   st != NVME_ZONE_STATE_OFFLINE;
+}
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
+void nvme_ns_shutdown(NvmeNamespace *ns);
+void nvme_ns_cleanup(NvmeNamespace *ns);
 
 #endif /* NVME_NS_H */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a..4cb0615128 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -6,6 +6,9 @@
 
 #define NVME_MAX_NAMESPACES 256
 
+#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
+#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+
 typedef struct NvmeParams {
 char *serial;
 uint32_t num_queues; /* deprecated since 5.1 */
@@ -16,6 +19,7 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
+uint32_t zasl_bs;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
@@ -28,6 +32,8 @@ typedef struct NvmeRequest {
 struct NvmeNamespace*ns;
 BlockAIOCB  *aiocb;
 uint16_tstatus;
+