[PATCH] hw/nvme: remove param zoned.auto_transition

2022-08-12 Thread Niklas Cassel via
The intention of the Zoned Namespace Command Set Specification was
never to make an automatic zone transition optional.

Excerpt from the nvmexpress.org zns mailing list:
"""
A question came up internally on the differences between ZNS and ZAC/ZBC
that asked when a controller should transition a specific zone in the
Implicitly Opened state to the Closed state.

For example, consider a ZNS SSD that supports a max of 20 active zones,
and a max of 10 open zones, which has the following actions occur:

First, the host writes to ten empty zones, thereby transitioning 10 zones
to the Implicitly Opened state.

Second, the host issues a write to an 11th empty zone.

Given that state, my understanding of the second part is that the ZNS SSD
chooses one of the previously opened 10 zones, transitions the chosen zone to
the Closed state, and then proceeds to write to the new zone, which also
implicitly transitions it from the Empty state to the Impl. Open state.
After this, there would be 11 active zones in total: 10 in Impl. Open
state, and one in Closed state.

The above assumes that a ZNS SSD will always transition an implicitly
opened zone to the Closed state when required to free up resources when
another zone is opened. However, this isn't strictly stated in the ZNS spec.

The paragraph that should cover it is defined in section
2.1.1.4.1 – Managing Resources:
The controller may transition zones in the ZSIO:Implicitly Opened state
to the ZSC:Closed state for resource management purposes.

However, it doesn't say "when" this should occur. Thus, as the text stands,
it could be misinterpreted to mean that the controller shouldn't close a zone
to make room for a new zone. The issue with this is that it makes the
point of having implicitly managed zones moot.

The ZAC/ZBC specs are more specific and clarify when a zone should be
closed. I think it would be natural to do the same here.
"""

While the Zoned Namespace Command Set Specification hasn't received an
erratum yet, it is quite clear that the automatic zone transition was never
intended to be optional, as making it optional would defeat the whole point
of having implicitly opened zones. Therefore, remove the param
zoned.auto_transition, as this behavior was never supposed to be controller
implementation specific.
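
For reference, the mandatory behavior described in the excerpt boils down to
something like the following sketch. This is illustrative only, not the hw/nvme
code; all names (zns_dev, impl_open_zone, ...) are made up for the example.

/* Before implicitly opening another zone, close one implicitly opened
 * zone whenever the open-zone limit has already been reached. */
#include <stddef.h>

enum zone_state { ZS_EMPTY, ZS_IMPL_OPEN, ZS_CLOSED, ZS_FULL };

struct zns_dev {
    enum zone_state zones[32];
    size_t nr_zones;
    size_t nr_open;    /* zones currently in ZS_IMPL_OPEN */
    size_t max_open;   /* e.g. 10 in the example above */
};

static void close_one_impl_open_zone(struct zns_dev *d)
{
    for (size_t i = 0; i < d->nr_zones; i++) {
        if (d->zones[i] == ZS_IMPL_OPEN) {
            d->zones[i] = ZS_CLOSED;   /* still active, no longer open */
            d->nr_open--;
            return;
        }
    }
}

/* Called when a write lands in an empty zone. */
static void impl_open_zone(struct zns_dev *d, size_t idx)
{
    if (d->nr_open == d->max_open) {
        close_one_impl_open_zone(d);   /* always, not optionally */
    }
    d->zones[idx] = ZS_IMPL_OPEN;
    d->nr_open++;
}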

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 35 ---
 hw/nvme/nvme.h |  1 -
 2 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..b8c8075301 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -34,7 +34,6 @@
  *  aerl=,aer_max_queued=, \
  *  mdts=,vsl=, \
  *  zoned.zasl=, \
- *  zoned.auto_transition=, \
  *  sriov_max_vfs= \
  *  sriov_vq_flexible= \
  *  sriov_vi_flexible= \
@@ -106,11 +105,6 @@
  *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
  *   defaulting to the value of `mdts`).
  *
- * - `zoned.auto_transition`
- *   Indicates if zones in zone state implicitly opened can be automatically
- *   transitioned to zone state closed for resource management purposes.
- *   Defaults to 'on'.
- *
  * - `sriov_max_vfs`
  *   Indicates the maximum number of PCIe virtual functions supported
  *   by the controller. The default value is 0. Specifying a non-zero value
@@ -1867,8 +1861,8 @@ enum {
 NVME_ZRM_ZRWA = 1 << 1,
 };
 
-static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
-NvmeZone *zone, int flags)
+static uint16_t nvme_zrm_open_flags(NvmeNamespace *ns, NvmeZone *zone,
+int flags)
 {
 int act = 0;
 uint16_t status;
@@ -1880,9 +1874,7 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
 /* fallthrough */
 
 case NVME_ZONE_STATE_CLOSED:
-if (n->params.auto_transition_zones) {
-nvme_zrm_auto_transition_zone(ns);
-}
+nvme_zrm_auto_transition_zone(ns);
 status = nvme_zns_check_resources(ns, act, 1,
   (flags & NVME_ZRM_ZRWA) ? 1 : 0);
 if (status) {
@@ -1925,10 +1917,9 @@ static uint16_t nvme_zrm_open_flags(NvmeCtrl *n, NvmeNamespace *ns,
 }
 }
 
-static inline uint16_t nvme_zrm_auto(NvmeCtrl *n, NvmeNamespace *ns,
- NvmeZone *zone)
+static inline uint16_t nvme_zrm_auto(NvmeNamespace *ns, NvmeZone *zone)
 {
-return nvme_zrm_open_flags(n, ns, zone, NVME_ZRM_AUTO);
+return nvme_zrm_open_flags(ns, zone, NVME_ZRM_AUTO);
 }
 
 static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
@@ -3065,7 +3056,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 goto invalid;
 }
 
-status = nvme_zrm_auto(n, ns, iocb->zone);
+status = nvme_zrm_auto(ns, iocb->zone);
 if (status) {
 goto invalid;
 }
@@ -3471,7 +3462,7 @@ static 

[PATCH] hw/nvme: force nvme-ns param 'shared' to false if no nvme-subsys node

2022-06-28 Thread Niklas Cassel via
Since commit 916b0f0b5264 ("hw/nvme: change nvme-ns 'shared' default")
the default value of the nvme-ns param 'shared' is set to true, regardless
of whether there is an nvme-subsys node or not.

On a system without an nvme-subsys node, a namespace will never be able
to be attached to more than one controller, so for this configuration
it is counterintuitive for this parameter to be set by default.

Force the nvme-ns param 'shared' to false for configurations where
there is no nvme-subsys node, as the namespace will never be able to
attach to more than one controller anyway.
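
For example, with a standalone controller like the illustrative command line
below (file name and ids are placeholders), the namespace can only ever be
attached to that single controller, so 'shared' is now silently forced to
false:

 -device nvme,serial=deadbeef \
 -drive file=nvm.img,id=nvm-1,format=raw,if=none \
 -device nvme-ns,drive=nvm-1 \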

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ns.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 870c3ca1a2..62a1f97be0 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -546,6 +546,8 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 int i;
 
 if (!n->subsys) {
+/* If no subsys, the ns cannot be attached to more than one ctrl. */
+ns->params.shared = false;
 if (ns->params.detached) {
 error_setg(errp, "detached requires that the nvme device is "
"linked to an nvme-subsys device");
-- 
2.36.1




[PATCH] hw/nvme: fix example serial in documentation

2022-06-27 Thread Niklas Cassel via
The serial prop on the controller actually describes the nvme
subsystem serial, which has to be identical for all controllers within
the same nvme subsystem.

This is enforced since commit a859eb9f8f64 ("hw/nvme: enforce common
serial per subsystem").

Fix the documentation, so that people copying the qemu command line
example won't get an error on qemu start.

Signed-off-by: Niklas Cassel 
---
 docs/system/devices/nvme.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index aba253304e..30f841ef62 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -104,8 +104,8 @@ multipath I/O.
 .. code-block:: console
 
-device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
-   -device nvme,serial=a,subsys=nvme-subsys-0
-   -device nvme,serial=b,subsys=nvme-subsys-0
+   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
+   -device nvme,serial=deadbeef,subsys=nvme-subsys-0
 
 This will create an NVM subsystem with two controllers. Having controllers
 linked to an ``nvme-subsys`` device allows additional ``nvme-ns`` parameters:
-- 
2.36.1




[PATCH v2 3/4] hw/nvme: add support for ratified TP4084

2022-06-27 Thread Niklas Cassel via
TP4084 adds a new mode, CC.CRIME, that can be used to mark a namespace
as ready independently from the controller.

When CC.CRIME is 0 (default), things behave as before: all namespaces
are ready when CSTS.RDY gets set to 1.

When CC.CRIME is 1, the controller will become ready when CSTS.RDY gets
set to 1, but commands accessing a namespace are allowed to return
"Namespace Not Ready" or "Admin Command Media Not Ready".
After CRTO.CRWMT amount of time, if the namespace has not yet been
marked ready, the status codes also need to have the DNR bit set.

Add a new "ready_delay" namespace device parameter, in order to emulate
different ready latencies for namespaces.

Once a namespace is ready, it will set the NRDY bit in the I/O Command
Set Independent Identify Namespace Data Structure, and then send out a
Namespace Attribute Changed event.

This new "ready_delay" is only supported on controllers that are not part
of an NVMe subsystem. The reasons are many. One problem is that multiple
controllers can be running with different CC.CRIME modes. Another problem
is the extra locking needed. The third problem is when to actually clear
NRDY: suppose that a namespace clears NRDY when it no longer has any
controller online for that namespace; the problem then is that Linux will
reset the controllers one by one during probe time. The resets go so fast
that there is never a point when all controllers are in reset at the same
time, so NRDY would never get cleared. (The controllers are enabled by
SeaBIOS by default.) We could introduce a reset_time param, but this would
only increase the chances that all controllers are in reset at the same
time.
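
As a condensed sketch of the intended flow (illustrative, not the exact patch
code; ns->ready_timer and nvme_ns_set_ready() are made-up names here, and the
timer is assumed to have been created with timer_new_ms() on
QEMU_CLOCK_VIRTUAL):

static void nvme_ns_set_ready(NvmeNamespace *ns)
{
    /* I/O to the namespace stops failing with Namespace Not Ready */
    ns->id_indep_ns.nstat |= NVME_NSTAT_NRDY;
    /* ...then queue a Namespace Attribute Changed AEN for the host... */
}

static void nvme_set_ready_or_start_timer(NvmeCtrl *n, NvmeNamespace *ns)
{
    if (!NVME_CC_CRIME(ldl_le_p(&n->bar.cc)) || ns->params.ready_delay == 0) {
        nvme_ns_set_ready(ns);          /* old behaviour: ready at CSTS.RDY */
        return;
    }

    /* ready_delay is in units of 500 ms, like CRTO.CRWMT */
    timer_mod(ns->ready_timer,
              qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
              ns->params.ready_delay * 500);
}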

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c   | 123 +--
 hw/nvme/ns.c |  18 +++
 hw/nvme/nvme.h   |   6 +++
 hw/nvme/trace-events |   1 +
 include/block/nvme.h |  60 -
 5 files changed, 204 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 8ca824ea14..5404f67480 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -88,6 +88,12 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `crwmt`
+ *   This is the Controller Ready With Media Timeout (CRWMT) field that is
+ *   defined in the CRTO register. This specifies the worst-case time that host
+ *   software should wait for the controller and all attached namespaces to
+ *   become ready. The value is in units of 500 milliseconds.
+ *
  * - `mdts`
  *   Indicates the maximum data transfer size for a command that transfers data
  *   between host-accessible memory and the controller. The value is specified
@@ -157,6 +163,11 @@
  *   namespace will be available in the subsystem but not attached to any
  *   controllers.
  *
+ * - `ready_delay`
+ *   This parameter specifies the amount of time that the namespace should wait
+ *   before being marked ready. Only applicable if CC.CRIME is set by the user.
+ *   The value is in units of 500 milliseconds (to be consistent with `crwmt`).
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -216,6 +227,8 @@
 #define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
+#define NVME_DEFAULT_CRIMT 0xa
+#define NVME_DEFAULT_CRWMT 0xf
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -4188,6 +4201,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 return NVME_INVALID_OPCODE | NVME_DNR;
 }
 
+if (!(ns->id_indep_ns.nstat & NVME_NSTAT_NRDY)) {
+return NVME_NS_NOT_READY;
+}
+
 if (ns->status) {
 return ns->status;
 }
@@ -4791,6 +4808,27 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
+static uint16_t nvme_identify_cs_indep_ns(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint32_t nsid = le32_to_cpu(c->nsid);
+
+trace_pci_nvme_identify_cs_indep_ns(nsid);
+
+if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_NSID | NVME_DNR;
+}
+
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
+return nvme_c2h(n, (uint8_t *)&ns->id_indep_ns, sizeof(NvmeIdNsCsIndep),
+req);
+}
+
 static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req,
 bool attached)
 {
@@ -5081,6 +5119,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 return nvme_identify_ns(n, req, true);
 case NVME_ID_CNS_NS_PRESENT:
 return nvme_identify_ns(n, req, false);
+case NVME_ID_CNS_CS_INDEPENDENT_NS:
+return nvme_identify_cs_indep_ns(n, req);

[PATCH v2 2/4] hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace

2022-06-27 Thread Niklas Cassel via
Each NvmeNamespace can be used by several controllers,
but an NvmeNamespace can belong to at most a single NvmeSubsystem.
Store a pointer to the NvmeSubsystem if the namespace was realized
with an NvmeSubsystem.

This will be used by a follow-up patch.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ns.c   | 1 +
 hw/nvme/nvme.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 870c3ca1a2..8bee3e8b3b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -559,6 +559,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
 return;
 }
+ns->subsys = subsys;
 }
 
 if (nvme_ns_setup(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 0711b9748c..5487e2db40 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -167,6 +167,7 @@ typedef struct NvmeNamespace {
 int32_t nr_active_zones;
 
 NvmeNamespaceParams params;
+NvmeSubsystem *subsys;
 
 struct {
 uint32_t err_rec;
-- 
2.36.1




[PATCH v2 4/4] hw/nvme: add new never_ready parameter to test the DNR bit

2022-06-27 Thread Niklas Cassel via
Since we verify that the "ready_delay" parameter is smaller than CRWMT,
we know that the namespace will always become ready.
Therefore, the "Namespace Not Ready" status code will never have the DNR
bit set.

Add a new parameter "never_ready" that can be used to emulate a namespace
that never gets ready, such that the DNR bit gets set after CRWMT amount
of time.
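
An illustrative invocation (file name and ids are placeholders) that exercises
the DNR path could look like this; with crwmt=240 (240 * 500 ms = 120 s), I/O
to the namespace should start failing with "Namespace Not Ready" with the DNR
bit set roughly two minutes after the controller is enabled:

 -device nvme,serial=deadbeef,crwmt=240 \
 -drive file=nvm.img,id=nvm-1,format=raw,if=none \
 -device nvme-ns,drive=nvm-1,never_ready=true \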

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 28 +++-
 hw/nvme/ns.c   |  1 +
 hw/nvme/nvme.h |  2 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 5404f67480..5f98d4778d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -168,6 +168,12 @@
  *   before being marked ready. Only applicable if CC.CRIME is set by the user.
  *   The value is in units of 500 milliseconds (to be consistent with `crwmt`).
  *
+ * - `never_ready`
+ *   This parameter specifies that a namespace should never be marked as ready.
+ *   When `crwmt` amount of time has passed after enabling the controller,
+ *   status code "Namespace Not Ready" will have the DNR bit set. If specified
+ *   together with `ready_delay`, `never_ready` will take precedence.
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -4156,6 +4162,14 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
 return status;
 }
 
+static bool nvme_ready_has_passed_timeout(NvmeCtrl *n)
+{
+int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+int64_t elapsed_time = current_time - n->cc_enable_timestamp;
+
+return elapsed_time > n->params.crwmt * 500;
+}
+
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
@@ -4202,7 +4216,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 }
 
 if (!(ns->id_indep_ns.nstat & NVME_NSTAT_NRDY)) {
-return NVME_NS_NOT_READY;
+uint16_t ret = NVME_NS_NOT_READY;
+if (ns->params.never_ready && nvme_ready_has_passed_timeout(n)) {
+ret |= NVME_DNR;
+}
+return ret;
 }
 
 if (ns->status) {
@@ -5616,6 +5634,10 @@ static void nvme_set_ready_or_start_timer(NvmeCtrl *n, NvmeNamespace *ns)
 {
 int64_t expire_time;
 
+if (ns->params.never_ready) {
+return;
+}
+
 if (!NVME_CC_CRIME(ldl_le_p(&n->bar.cc)) || ns->params.ready_delay == 0) {
 ns->id_indep_ns.nstat |= NVME_NSTAT_NRDY;
 return;
@@ -6346,6 +6368,7 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
 }
 }
 
+n->cc_enable_timestamp = 0;
 n->aer_queued = 0;
 n->aer_mask = 0;
 n->outstanding_aers = 0;
@@ -6389,6 +6412,8 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 
 nvme_ns_shutdown(ns);
 }
+
+n->cc_enable_timestamp = 0;
 }
 
 static void nvme_ctrl_per_ns_action_on_start(NvmeCtrl *n)
@@ -6506,6 +6531,7 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 NVME_CAP_SET_TO(cap, new_cap_timeout);
 stq_le_p(&n->bar.cap, cap);
 
+n->cc_enable_timestamp = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->page_bits = page_bits;
 n->page_size = page_size;
 n->max_prp_ents = n->page_size / sizeof(uint64_t);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index c1d70183c4..fc12b4e0d3 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -661,6 +661,7 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_BOOL("eui64-default", NvmeNamespace, params.eui64_default,
  false),
 DEFINE_PROP_UINT16("ready_delay", NvmeNamespace, params.ready_delay, 0),
+DEFINE_PROP_BOOL("never_ready", NvmeNamespace, params.never_ready, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 9d2f13dfdb..292b1acf15 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -127,6 +127,7 @@ typedef struct NvmeNamespaceParams {
 uint64_t zrwafg;
 
 uint16_t ready_delay;
+bool never_ready;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -451,6 +452,7 @@ typedef struct NvmeCtrl {
 int cq_pending;
 uint64_t    host_timestamp;                 /* Timestamp sent by the host */
 uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
+uint64_t    cc_enable_timestamp;            /* QEMU clock time */
 uint64_t    starttime_ms;
 uint16_t    temperature;
 uint8_t smart_critical_warning;
-- 
2.36.1




[PATCH v2 0/4] hw/nvme: add support for TP4084

2022-06-27 Thread Niklas Cassel via
Hello there,

considering that Linux v5.19 will include support for NVMe TP4084:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme/host/core.c?id=354201c53e61e493017b15327294b0c8ab522d69

I thought that it might be nice to have QEMU support for the same.

TP4084 adds a new mode, CC.CRIME, that can be used to mark a namespace
as ready independently from the controller.

When CC.CRIME is 0 (default), things behave as before: all namespaces
are ready when CSTS.RDY gets set to 1.

Add a new "ready_delay" namespace device parameter, in order to emulate
different ready latencies for namespaces when CC.CRIME is 1.

The patch series also adds a "crwmt" controller parameter, in order to
be able to expose the worst-case time that the host should wait for
all namespaces to become ready.


Example qemu cmd line for the new options:

# delay in s (20s)
NS1_DELAY_S=20
# convert to units of 500ms
NS1_DELAY=$((NS1_DELAY_S*2))

# delay in s (60s)
NS2_DELAY_S=60
# convert to units of 500ms
NS2_DELAY=$((NS2_DELAY_S*2))

# timeout in s (120s)
CRWMT_S=120
# convert to units of 500ms
CRWMT=$((CRWMT_S*2))

 -device nvme,serial=deadbeef,crwmt=$CRWMT \
 -drive file=$NS1_DATA,id=nvm-1,format=raw,if=none \
 -device nvme-ns,drive=nvm-1,ready_delay=$NS1_DELAY \
 -drive file=$NS2_DATA,id=nvm-2,format=raw,if=none \
 -device nvme-ns,drive=nvm-2,ready_delay=$NS2_DELAY \


Changes since v1:
-Rebased on nvme-next
-Set id_indep_ns->nmic if ns->params.shared in patch 3/4.


Niklas Cassel (4):
  hw/nvme: claim NVMe 2.0 compliance
  hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace
  hw/nvme: add support for ratified TP4084
  hw/nvme: add new never_ready parameter to test the DNR bit

 hw/nvme/ctrl.c   | 151 +--
 hw/nvme/ns.c |  20 ++
 hw/nvme/nvme.h   |   9 +++
 hw/nvme/trace-events |   1 +
 include/block/nvme.h |  60 -
 5 files changed, 236 insertions(+), 5 deletions(-)

-- 
2.36.1




[PATCH v2 1/4] hw/nvme: claim NVMe 2.0 compliance

2022-06-27 Thread Niklas Cassel via
The CRMS.CRWMS bit shall be set to 1 on controllers compliant with versions
later than NVMe 1.4.

The first version later than NVMe 1.4 is NVMe 2.0.

Let's claim compliance with NVMe 2.0 such that a follow-up patch can
set the CRMS.CRWMS bit.

This is needed since CC.CRIME is only writable when both CRMS.CRIMS
and CRMS.CRWMS are set.
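
For reference, the VS register encodes the version as MJR (bits 31:16),
MNR (bits 15:08) and TER (bits 07:00), so NVMe 1.4.0 is 0x00010400 and
NVMe 2.0.0 is 0x00020000:

/* Illustrative helper, along the lines of Linux's NVME_VS() macro */
#define NVME_VS(mjr, mnr, ter)  (((mjr) << 16) | ((mnr) << 8) | (ter))
/* NVME_VS(1, 4, 0) == 0x00010400, NVME_VS(2, 0, 0) == 0x00020000 */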

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f8ec4a7be3..8ca824ea14 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -204,7 +204,7 @@
 
 #define NVME_MAX_IOQPAIRS 0xffff
 #define NVME_DB_SIZE  4
-#define NVME_SPEC_VER 0x00010400
+#define NVME_SPEC_VER 0x00020000
 #define NVME_CMB_BIR 2
 #define NVME_PMR_BIR 4
 #define NVME_TEMPERATURE 0x143
-- 
2.36.1




[PATCH 4/4] hw/nvme: add new never_ready parameter to test the DNR bit

2022-06-07 Thread Niklas Cassel via
Since we verify that the "ready_delay" parameter is smaller than CRWMT,
we know that the namespace will always become ready.
Therefore, the "Namespace Not Ready" status code will never have the DNR
bit set.

Add a new parameter "never_ready" that can be used to emulate a namespace
that never gets ready, such that the DNR bit gets set after CRWMT amount
of time.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 28 +++-
 hw/nvme/ns.c   |  1 +
 hw/nvme/nvme.h |  2 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 66d96714c3..5ec22ff13d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -134,6 +134,12 @@
  *   before being marked ready. Only applicable if CC.CRIME is set by the user.
  *   The value is in units of 500 milliseconds (to be consistent with `crwmt`).
  *
+ * - `never_ready`
+ *   This parameter specifies that a namespace should never be marked as ready.
+ *   When `crwmt` amount of time has passed after enabling the controller,
+ *   status code "Namespace Not Ready" will have the DNR bit set. If specified
+ *   together with `ready_delay`, `never_ready` will take precedence.
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -4118,6 +4124,14 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
 return status;
 }
 
+static bool nvme_ready_has_passed_timeout(NvmeCtrl *n)
+{
+int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
+int64_t elapsed_time = current_time - n->cc_enable_timestamp;
+
+return elapsed_time > n->params.crwmt * 500;
+}
+
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
@@ -4164,7 +4178,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 }
 
 if (!(ns->id_indep_ns.nstat & NVME_NSTAT_NRDY)) {
-return NVME_NS_NOT_READY;
+uint16_t ret = NVME_NS_NOT_READY;
+if (ns->params.never_ready && nvme_ready_has_passed_timeout(n)) {
+ret |= NVME_DNR;
+}
+return ret;
 }
 
 if (ns->status) {
@@ -5537,6 +5555,10 @@ static void nvme_set_ready_or_start_timer(NvmeCtrl *n, NvmeNamespace *ns)
 {
 int64_t expire_time;
 
+if (ns->params.never_ready) {
+return;
+}
+
 if (!NVME_CC_CRIME(ldl_le_p(&n->bar.cc)) || ns->params.ready_delay == 0) {
 ns->id_indep_ns.nstat |= NVME_NSTAT_NRDY;
 return;
@@ -5979,6 +6001,7 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
+n->cc_enable_timestamp = 0;
 }
 
 static void nvme_ctrl_shutdown(NvmeCtrl *n)
@@ -6000,6 +6023,8 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 
 nvme_ns_shutdown(ns);
 }
+
+n->cc_enable_timestamp = 0;
 }
 
 static void nvme_ctrl_per_ns_action_on_start(NvmeCtrl *n)
@@ -6109,6 +6134,7 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 NVME_CAP_SET_TO(cap, new_cap_timeout);
 stq_le_p(&n->bar.cap, cap);
 
+n->cc_enable_timestamp = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->page_bits = page_bits;
 n->page_size = page_size;
 n->max_prp_ents = n->page_size / sizeof(uint64_t);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index c4e9f0e5c8..89c31658de 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -658,6 +658,7 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_BOOL("eui64-default", NvmeNamespace, params.eui64_default,
  false),
 DEFINE_PROP_UINT16("ready_delay", NvmeNamespace, params.ready_delay, 0),
+DEFINE_PROP_BOOL("never_ready", NvmeNamespace, params.never_ready, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c9934d0097..6ff9725f21 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -122,6 +122,7 @@ typedef struct NvmeNamespaceParams {
 uint64_t zrwafg;
 
 uint16_t ready_delay;
+bool never_ready;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -436,6 +437,7 @@ typedef struct NvmeCtrl {
 int cq_pending;
 uint64_t    host_timestamp;                 /* Timestamp sent by the host */
 uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
+uint64_t    cc_enable_timestamp;            /* QEMU clock time */
 uint64_t    starttime_ms;
 uint16_t    temperature;
 uint8_t smart_critical_warning;
-- 
2.36.1




[PATCH 3/4] hw/nvme: add support for ratified TP4084

2022-06-07 Thread Niklas Cassel via
TP4084 adds a new mode, CC.CRIME, that can be used to mark a namespace
as ready independently from the controller.

When CC.CRIME is 0 (default), things behave as before: all namespaces
are ready when CSTS.RDY gets set to 1.

When CC.CRIME is 1, the controller will become ready when CSTS.RDY gets
set to 1, but commands accessing a namespace are allowed to return
"Namespace Not Ready" or "Admin Command Media Not Ready".
After CRTO.CRWMT amount of time, if the namespace has not yet been
marked ready, the status codes also need to have the DNR bit set.

Add a new "ready_delay" namespace device parameter, in order to emulate
different ready latencies for namespaces.

Once a namespace is ready, it will set the NRDY bit in the I/O Command
Set Independent Identify Namespace Data Structure, and then send out a
Namespace Attribute Changed event.

This new "ready_delay" is only supported on controllers that are not part
of an NVMe subsystem. The reasons are many. One problem is that multiple
controllers can be running with different CC.CRIME modes. Another problem
is the extra locking needed. The third problem is when to actually clear
NRDY: suppose that a namespace clears NRDY when it no longer has any
controller online for that namespace; the problem then is that Linux will
reset the controllers one by one during probe time. The resets go so fast
that there is never a point when all controllers are in reset at the same
time, so NRDY would never get cleared. (The controllers are enabled by
SeaBIOS by default.) We could introduce a reset_time param, but this would
only increase the chances that all controllers are in reset at the same
time.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c   | 123 +--
 hw/nvme/ns.c |  15 ++
 hw/nvme/nvme.h   |   6 +++
 hw/nvme/trace-events |   1 +
 include/block/nvme.h |  60 -
 5 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 91469834b0..66d96714c3 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -83,6 +83,12 @@
  *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `crwmt`
+ *   This is the Controller Ready With Media Timeout (CRWMT) field that is
+ *   defined in the CRTO register. This specifies the worst-case time that host
+ *   software should wait for the controller and all attached namespaces to
+ *   become ready. The value is in units of 500 milliseconds.
+ *
  * - `mdts`
  *   Indicates the maximum data transfer size for a command that transfers data
  *   between host-accessible memory and the controller. The value is specified
@@ -123,6 +129,11 @@
  *   namespace will be available in the subsystem but not attached to any
  *   controllers.
  *
+ * - `ready_delay`
+ *   This parameter specifies the amount of time that the namespace should wait
+ *   before being marked ready. Only applicable if CC.CRIME is set by the user.
+ *   The value is in units of 500 milliseconds (to be consistent with `crwmt`).
+ *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
  * In this case, the following namespace properties are available to configure
  * zoned operation:
@@ -176,6 +187,8 @@
 #define NVME_TEMPERATURE_CRITICAL 0x175
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+#define NVME_DEFAULT_CRIMT 0xa
+#define NVME_DEFAULT_CRWMT 0xf
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -4150,6 +4163,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 return NVME_INVALID_OPCODE | NVME_DNR;
 }
 
+if (!(ns->id_indep_ns.nstat & NVME_NSTAT_NRDY)) {
+return NVME_NS_NOT_READY;
+}
+
 if (ns->status) {
 return ns->status;
 }
@@ -4746,6 +4763,27 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
+static uint16_t nvme_identify_cs_indep_ns(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeNamespace *ns;
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint32_t nsid = le32_to_cpu(c->nsid);
+
+trace_pci_nvme_identify_cs_indep_ns(nsid);
+
+if (!nvme_nsid_valid(n, nsid) || nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_NSID | NVME_DNR;
+}
+
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
+return nvme_c2h(n, (uint8_t *)&ns->id_indep_ns, sizeof(NvmeIdNsCsIndep),
+req);
+}
+
 static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req,
 bool attached)
 {
@@ -5005,6 +5043,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 return nvme_identify_ns(n, req, true);
 case NVME_ID_CNS_NS_PRESENT:
 return nvme_identify_ns(n, req, false);
+case NVME_ID_CNS_CS_INDEPENDENT_NS:
+return 

[PATCH 1/4] hw/nvme: claim NVMe 2.0 compliance

2022-06-07 Thread Niklas Cassel via
The CRMS.CRWMS bit shall be set to 1 on controllers compliant with versions
later than NVMe 1.4.

The first version later than NVMe 1.4 is NVMe 2.0.

Let's claim compliance with NVMe 2.0 such that a follow-up patch can
set the CRMS.CRWMS bit.

This is needed since CC.CRIME is only writable when both CRMS.CRIMS
and CRMS.CRWMS are set.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 1e6e0fcad9..91469834b0 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -168,7 +168,7 @@
 
 #define NVME_MAX_IOQPAIRS 0xffff
 #define NVME_DB_SIZE  4
-#define NVME_SPEC_VER 0x00010400
+#define NVME_SPEC_VER 0x00020000
 #define NVME_CMB_BIR 2
 #define NVME_PMR_BIR 4
 #define NVME_TEMPERATURE 0x143
-- 
2.36.1




[PATCH 2/4] hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace

2022-06-07 Thread Niklas Cassel via
Each NvmeNamespace can be used by several controllers,
but an NvmeNamespace can belong to at most a single NvmeSubsystem.
Store a pointer to the NvmeSubsystem if the namespace was realized
with an NvmeSubsystem.

This will be used by a follow-up patch.

Signed-off-by: Niklas Cassel 
---
 hw/nvme/ns.c   | 1 +
 hw/nvme/nvme.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 1b9c9d1156..07759a973b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -559,6 +559,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 if (!qdev_set_parent_bus(dev, &subsys->bus.parent_bus, errp)) {
 return;
 }
+ns->subsys = subsys;
 }
 
 if (nvme_ns_setup(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e41771604f..32333d0c89 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -162,6 +162,7 @@ typedef struct NvmeNamespace {
 int32_t nr_active_zones;
 
 NvmeNamespaceParams params;
+NvmeSubsystem *subsys;
 
 struct {
 uint32_t err_rec;
-- 
2.36.1




[PATCH 0/4] hw/nvme: add support for TP4084

2022-06-07 Thread Niklas Cassel via
Hello there,

considering that Linux v5.19-rc1 is out, which includes support for
NVMe TP4084:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme/host/core.c?id=354201c53e61e493017b15327294b0c8ab522d69

I thought that it might be nice to have QEMU support for the same.

TP4084 adds a new mode, CC.CRIME, that can be used to mark a namespace
as ready independently from the controller.

When CC.CRIME is 0 (default), things behave as before: all namespaces
are ready when CSTS.RDY gets set to 1.

Add a new "ready_delay" namespace device parameter, in order to emulate
different ready latencies for namespaces when CC.CRIME is 1.

The patch series also adds a "crwmt" controller parameter, in order to
be able to expose the worst-case time that the host should wait for
all namespaces to become ready.


Example qemu cmd line for the new options:

# delay in s (20s)
NS1_DELAY_S=20
# convert to units of 500ms
NS1_DELAY=$((NS1_DELAY_S*2))

# delay in s (60s)
NS2_DELAY_S=60
# convert to units of 500ms
NS2_DELAY=$((NS2_DELAY_S*2))

# timeout in s (120s)
CRWMT_S=120
# convert to units of 500ms
CRWMT=$((CRWMT_S*2))

 -device nvme,serial=deadbeef,crwmt=$CRWMT \
 -drive file=$NS1_DATA,id=nvm-1,format=raw,if=none \
 -device nvme-ns,drive=nvm-1,ready_delay=$NS1_DELAY \
 -drive file=$NS2_DATA,id=nvm-2,format=raw,if=none \
 -device nvme-ns,drive=nvm-2,ready_delay=$NS2_DELAY \


Niklas Cassel (4):
  hw/nvme: claim NVMe 2.0 compliance
  hw/nvme: store a pointer to the NvmeSubsystem in the NvmeNamespace
  hw/nvme: add support for ratified TP4084
  hw/nvme: add new never_ready parameter to test the DNR bit

 hw/nvme/ctrl.c   | 151 +--
 hw/nvme/ns.c |  17 +
 hw/nvme/nvme.h   |   9 +++
 hw/nvme/trace-events |   1 +
 include/block/nvme.h |  60 -
 5 files changed, 233 insertions(+), 5 deletions(-)

-- 
2.36.1