Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Jason Wang



在 2021/4/6 上午4:00, Dongli Zhang 写道:


On 4/1/21 8:47 PM, Jason Wang wrote:

在 2021/3/30 下午3:29, Dongli Zhang 写道:

On 3/28/21 8:56 PM, Jason Wang wrote:

在 2021/3/27 上午5:16, Dongli Zhang 写道:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

在 2021/3/26 下午1:44, Dongli Zhang 写道:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$



... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
  kref = {
    refcount = {
  refs = {
    counter = 4
  }
    }
  },
  wqh = {
    lock = {
  {
    rlock = {
  raw_lock = {
    val = {
  counter = 0
    }
  }
    }
  }
    },
    head = {
  next = 0x8f841dc08e18,
  prev = 0x8f841dc08e18
    }
  },
  count = 15, ---> eventfd is 15 !!!
  flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
     "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? And it looks to me you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value could we gain from a dedicated debug interface like this
consider there're a lot of exisinting general purpose debugging method like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of virtqueue
kick which is unlikey to be buggy. Consider usually the ioeventfd will be
offloaded to KVM, it's more a chance that something is wrong in setuping
ioeventfd instead of here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.

So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

This is not only about whether notification is blocked.

It can also be used to help narrow down and understand if there is any
suspicious issue in blk-mq/scsi/netdev/napi code. The PV drivers are not only
drivers following virtio spec. It is closely related to many of other kernel
components.

Suppose IO was recovered after we inject an IRQ to vhost-scsi on purpose, we
will be able to analyze what may happen along the IO completion path starting
from when /where the IRQ is injected ... perhaps the root cause is not with
virtio but blk-mq/scsi (this is just an example).


In addition, this idea should help for vfio-pci. Suppose the developer for a
specific device driver suspects IO/networking hangs because of loss if IRQ, we
will be able to verify if that assumption is correct by injecting an IRQ on
purpose.

Therefore, this 

Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-05 Thread Dongli Zhang



On 4/1/21 8:47 PM, Jason Wang wrote:
> 
> 在 2021/3/30 下午3:29, Dongli Zhang 写道:
>>
>> On 3/28/21 8:56 PM, Jason Wang wrote:
>>> 在 2021/3/27 上午5:16, Dongli Zhang 写道:
 Hi Jason,

 On 3/26/21 12:24 AM, Jason Wang wrote:
> 在 2021/3/26 下午1:44, Dongli Zhang 写道:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$
>>
>>
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the 
>> new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, 
>> MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device 
>> (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the 
>> doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>>
>> This device can be used as a workaround if call/kick is lost due to
>> virtualization software (e.g., kernel or QEMU) issue.
>>
>> We may also implement the interface for VFIO PCI, e.g., to write to
>> VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
>> on purpose. This is considered future work once the virtio part is done.
>>
>>
>> Below is from live crash analysis. Initially, the queue=2 has count=15 
>> for
>> 'kick' eventfd_ctx. Suppose there is data in vring avail while there is 
>> no
>> used available. We suspect this is because vhost-scsi was not notified by
>> VM. In order to narrow down and analyze the issue, we use live crash to
>> dump the current counter of eventfd for queue=2.
>>
>> crash> eventfd_ctx 8f67f6bbe700
>> struct eventfd_ctx {
>>  kref = {
>>    refcount = {
>>  refs = {
>>    counter = 4
>>  }
>>    }
>>  },
>>  wqh = {
>>    lock = {
>>  {
>>    rlock = {
>>  raw_lock = {
>>    val = {
>>  counter = 0
>>    }
>>  }
>>    }
>>  }
>>    },
>>    head = {
>>  next = 0x8f841dc08e18,
>>  prev = 0x8f841dc08e18
>>    }
>>  },
>>  count = 15, ---> eventfd is 15 !!!
>>  flags = 526336
>> }
>>
>> Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
>> with this interface.
>>
>> { "execute": "x-debug-device-event",
>>  "arguments": { "dev": "/machine/peripheral/vscsi0",
>>     "event": "kick", "queue": 2 } }
>>
>> The counter is increased to 16. Suppose the hang issue is resolved, it
>> indicates something bad is in software that the 'kick' is lost.
> What do you mean by "software" here? And it looks to me you're testing 
> whether
> event_notifier_set() is called by virtio_queue_notify() here. If so, I'm 
> not
> sure how much value could we gain from a dedicated debug interface like 
> this
> consider there're a lot of exisinting general purpose debugging method 
> like
> tracing or gdb. I'd say the path from virtio_queue_notify() to
> event_notifier_set() is only a very small fraction of the process of 
> virtqueue
> kick which is unlikey to be buggy. Consider usually the ioeventfd will be
> offloaded to KVM, it's more a chance that something is wrong in setuping
> ioeventfd instead of here. Irq is even more complicated.
 Thank you very much!

 I am not testing whether event_notifier_set() is called by
 virtio_queue_notify().

 The 'software' indicates the data processing and event notification 
 mechanism
 involved with virtio/vhost PV driver frontend. E.g., while VM is waiting 
 for an
 extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
 erroneously returns false due to corrupted ring buffer status.
>>>
>>> So there could be several factors that may block the notification:
>>>
>>> 1) eventfd bug (ioeventfd vs irqfd)
>>> 2) wrong virtqueue state (either driver or device)
>>> 3) missing barriers (either driver or device)
>>> 4) Qemu bug (irqchip and routing)
>>> ...
>> This is not only about whether notification is blocked.
>>
>> It can also be used to help narrow down and understand if there is any
>> suspicious 

Re: [PATCH for-6.0 v2 0/8] hw/block/nvme: misc fixes

2021-04-05 Thread Keith Busch
On Mon, Apr 05, 2021 at 07:54:44PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Various fixes for 6.0.
> 
> v2:
>   - "hw/block/nvme: fix handling of private namespaces"
> update documentation (Gollu)
>   - add a patch for missing copyright headers

Series looks fine.

Reviewed-by: Keith Busch 



[PATCH for-6.0 v2 5/8] hw/block/nvme: fix warning about legacy namespace configuration

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Remove the unused BlockConf from the controller structure and fix the
constraint checking to actually check the right BlockConf and issue the
warning.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme.h | 1 -
 hw/block/nvme.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index c610ab30dc5c..1570f65989a7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -166,7 +166,6 @@ typedef struct NvmeCtrl {
 NvmeBar  bar;
 NvmeParams   params;
 NvmeBus  bus;
-BlockConfconf;
 
 uint16_tcntlid;
 boolqs_created;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6d31d5b17a0b..de0e726dfdd8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -5813,7 +5813,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 params->max_ioqpairs = params->num_queues - 1;
 }
 
-if (n->conf.blk) {
+if (n->namespace.blkconf.blk) {
 warn_report("drive property is deprecated; "
 "please use an nvme-ns device instead");
 }
-- 
2.31.1




[PATCH for-6.0 v2 2/8] hw/block/nvme: fix missing string representation for ns attachment

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Add the missing nvme_adm_opc_str entry for the Namespace Attachment
command.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 5b0031b11db2..9edc86d79e98 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -86,6 +86,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
 case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES";
 case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES";
 case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ";
+case NVME_ADM_CMD_NS_ATTACHMENT:return "NVME_ADM_CMD_NS_ATTACHMENT";
 case NVME_ADM_CMD_FORMAT_NVM:   return "NVME_ADM_CMD_FORMAT_NVM";
 default:return "NVME_ADM_CMD_UNKNOWN";
 }
-- 
2.31.1




[PATCH for-6.0 v2 7/8] hw/block/nvme: fix handling of private namespaces

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".

Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".

Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardsless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.

Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parametr) or implicitly
attaching to.

Fixes: e570768566b3 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.h |  10 +---
 hw/block/nvme-subsys.h |   7 +--
 hw/block/nvme.h|  39 +
 include/block/nvme.h   |   1 +
 hw/block/nvme-ns.c |  74 +
 hw/block/nvme-subsys.c |  28 --
 hw/block/nvme.c| 123 ++---
 hw/block/trace-events  |   1 -
 8 files changed, 113 insertions(+), 170 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 82340c4b2574..fb0a41f912e7 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeZone {
 
 typedef struct NvmeNamespaceParams {
 bool detached;
+bool shared;
 uint32_t nsid;
 QemuUUID uuid;
 
@@ -60,8 +61,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 uint16_t status;
+int  attached;
 
-NvmeSubsystem   *subsys;
 QTAILQ_ENTRY(NvmeNamespace) entry;
 
 NvmeIdNsZoned   *id_ns_zoned;
@@ -99,11 +100,6 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return 0;
 }
 
-static inline bool nvme_ns_shared(NvmeNamespace *ns)
-{
-return !!ns->subsys;
-}
-
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
@@ -225,7 +221,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index aafa04b84829..24132edd005c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  256
+#define NVME_MAX_NAMESPACES 256
 
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {
 
 NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 /* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
+NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1];
 
 struct {
 char *nqn;
@@ -32,7 +32,6 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 uint32_t cntlid)
@@ -54,7 +53,7 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem 
*subsys,
 return NULL;
 }
 
-assert(nsid && nsid <= NVME_SUBSYS_MAX_NAMESPACES);
+assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
 return subsys->namespaces[nsid];
 }
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1570f65989a7..644143597a0f 100644
--- a/hw/block/nvme.h
+++ 

[PATCH for-6.0 v2 6/8] hw/block/nvme: update dmsrl limit on namespace detachment

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

The Non-MDTS DMSRL limit must be recomputed when namespaces are
detached.

Fixes: 645ce1a70cb6 ("hw/block/nvme: support namespace attachment command")
Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index de0e726dfdd8..3dc51f407671 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4876,6 +4876,21 @@ static uint16_t nvme_aer(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static void __nvme_update_dmrsl(NvmeCtrl *n)
+{
+int nsid;
+
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (!ns) {
+continue;
+}
+
+n->dmrsl = MIN_NON_ZERO(n->dmrsl,
+BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1));
+}
+}
+
 static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
@@ -4925,6 +4940,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 nvme_ns_detach(ctrl, ns);
+
+__nvme_update_dmrsl(ctrl);
 }
 
 /*
-- 
2.31.1




[PATCH for-6.0 v2 8/8] hw/block/nvme: add missing copyright headers

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Add missing license/copyright headers to the nvme-dif.{c,h} files.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-dif.h | 10 ++
 hw/block/nvme-dif.c | 10 ++
 2 files changed, 20 insertions(+)

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
index 5a8e37c8525b..524faffbd7a0 100644
--- a/hw/block/nvme-dif.h
+++ b/hw/block/nvme-dif.h
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #ifndef HW_NVME_DIF_H
 #define HW_NVME_DIF_H
 
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index e6f04faafb5f..81b0a4cb1382 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -1,3 +1,13 @@
+/*
+ * QEMU NVM Express End-to-End Data Protection support
+ *
+ * Copyright (c) 2021 Samsung Electronics Co., Ltd.
+ *
+ * Authors:
+ *   Klaus Jensen   
+ *   Gollu Appalanaidu  
+ */
+
 #include "qemu/osdep.h"
 #include "hw/block/block.h"
 #include "sysemu/dma.h"
-- 
2.31.1




[PATCH for-6.0 v2 4/8] hw/block/nvme: fix controller namespaces array indexing

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

The controller namespaces array being 0-indexed requires 'nsid - 1'
everywhere. Something that is easy to miss. Align the controller
namespaces array with the subsystem namespaces array such that both are
1-indexed.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme.h | 8 
 hw/block/nvme.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9edc86d79e98..c610ab30dc5c 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -217,7 +217,7 @@ typedef struct NvmeCtrl {
  * Attached namespaces to this controller.  If subsys is not given, all
  * namespaces in this list will always be attached.
  */
-NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
+NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
 NvmeSQueue  **sq;
 NvmeCQueue  **cq;
 NvmeSQueue  admin_sq;
@@ -232,7 +232,7 @@ static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t 
nsid)
 return NULL;
 }
 
-return n->namespaces[nsid - 1];
+return n->namespaces[nsid];
 }
 
 static inline bool nvme_ns_is_attached(NvmeCtrl *n, NvmeNamespace *ns)
@@ -253,7 +253,7 @@ static inline void nvme_ns_attach(NvmeCtrl *n, 
NvmeNamespace *ns)
 uint32_t nsid = nvme_nsid(ns);
 assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
-n->namespaces[nsid - 1] = ns;
+n->namespaces[nsid] = ns;
 }
 
 static inline void nvme_ns_detach(NvmeCtrl *n, NvmeNamespace *ns)
@@ -261,7 +261,7 @@ static inline void nvme_ns_detach(NvmeCtrl *n, 
NvmeNamespace *ns)
 uint32_t nsid = nvme_nsid(ns);
 assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
-n->namespaces[nsid - 1] = NULL;
+n->namespaces[nsid] = NULL;
 }
 
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c54ec3c9523c..6d31d5b17a0b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -5915,7 +5915,7 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
*ns, Error **errp)
 return -1;
 }
 } else {
-if (n->namespaces[nsid - 1]) {
+if (n->namespaces[nsid]) {
 error_setg(errp, "namespace id '%d' is already in use", nsid);
 return -1;
 }
-- 
2.31.1




[PATCH for-6.0 v2 3/8] hw/block/nvme: fix the nsid 'invalid' value

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

The `nvme_nsid()` function returns '-1' (h) when the given
namespace is NULL. Since h is actually a valid namespace
identifier (the "broadcast" value), change this to be '0' since that
actually *is* the invalid value.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 9ab7894fc83e..82340c4b2574 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -96,7 +96,7 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return ns->params.nsid;
 }
 
-return -1;
+return 0;
 }
 
 static inline bool nvme_ns_shared(NvmeNamespace *ns)
-- 
2.31.1




[PATCH for-6.0 v2 1/8] hw/block/nvme: fix pi constraint check

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Protection Information can only be enabled if there is at least 8 bytes
of metadata.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7f8d139a8663..ca04ee1bacfb 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -394,7 +394,7 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
-if (ns->params.pi && !ns->params.ms) {
+if (ns->params.pi && ns->params.ms < 8) {
 error_setg(errp, "at least 8 bytes of metadata required to enable "
"protection information");
 return -1;
-- 
2.31.1




[PATCH for-6.0 v2 0/8] hw/block/nvme: misc fixes

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Various fixes for 6.0.

v2:
  - "hw/block/nvme: fix handling of private namespaces"
update documentation (Gollu)
  - add a patch for missing copyright headers

Klaus Jensen (8):
  hw/block/nvme: fix pi constraint check
  hw/block/nvme: fix missing string representation for ns attachment
  hw/block/nvme: fix the nsid 'invalid' value
  hw/block/nvme: fix controller namespaces array indexing
  hw/block/nvme: fix warning about legacy namespace configuration
  hw/block/nvme: update dmsrl limit on namespace detachment
  hw/block/nvme: fix handling of private namespaces
  hw/block/nvme: add missing copyright headers

 hw/block/nvme-dif.h|  10 +++
 hw/block/nvme-ns.h |  12 ++--
 hw/block/nvme-subsys.h |   7 +-
 hw/block/nvme.h|  45 ++---
 include/block/nvme.h   |   1 +
 hw/block/nvme-dif.c|  10 +++
 hw/block/nvme-ns.c |  76 ++
 hw/block/nvme-subsys.c |  28 
 hw/block/nvme.c| 142 +
 hw/block/trace-events  |   1 -
 10 files changed, 156 insertions(+), 176 deletions(-)

-- 
2.31.1




[PULL for-6.0 1/2] hw/block/nvme: remove description for zoned.append_size_limit

2021-04-05 Thread Klaus Jensen
From: Niklas Cassel 

The description was originally removed in commit 578d914b263c
("hw/block/nvme: align zoned.zasl with mdts") together with the removal
of the zoned.append_size_limit parameter itself.

However, it was (most likely accidentally), re-added in commit
f7dcd31885cb ("hw/block/nvme: add non-mdts command size limit for verify").

Remove the description again, since the parameter it describes,
zoned.append_size_limit, no longer exists.

Signed-off-by: Niklas Cassel 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c54ec3c9523c..08c204d46c43 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -91,14 +91,6 @@
  *   the minimum memory page size (CAP.MPSMIN). The default value is 0 (i.e.
  *   defaulting to the value of `mdts`).
  *
- * - `zoned.append_size_limit`
- *   The maximum I/O size in bytes that is allowed in Zone Append command.
- *   The default is 128KiB. Since internally this this value is maintained as
- *   ZASL = log2( / ), some values assigned
- *   to this property may be rounded down and result in a lower maximum ZA
- *   data size being in effect. By setting this property to 0, users can make
- *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
- *
  * nvme namespace device parameters
  * 
  * - `subsys`
-- 
2.31.1




[PULL for-6.0 0/2] emulated nvme fixes

2021-04-05 Thread Klaus Jensen
From: Klaus Jensen 

Hi Peter,

The following changes since commit 25d75c99b2e5941c67049ee776efdb226414f4c6:

  Merge remote-tracking branch 'remotes/xtensa/tags/20210403-xtensa' into 
staging (2021-04-04 21:48:45 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-for-6.0-pull-request

for you to fetch changes up to 498114b37bc99fddcfc24b92bff7f1323fb32672:

  hw/block/nvme: expose 'bootindex' property (2021-04-05 19:33:04 +0200)


emulated nvme fixes



Joelle van Dyne (1):
  hw/block/nvme: expose 'bootindex' property

Niklas Cassel (1):
  hw/block/nvme: remove description for zoned.append_size_limit

 hw/block/nvme.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

-- 
2.31.1




[PULL for-6.0 2/2] hw/block/nvme: expose 'bootindex' property

2021-04-05 Thread Klaus Jensen
From: Joelle van Dyne 

The check for `n->namespace.blkconf.blk` always fails because
this is in the initialization function.

Signed-off-by: Joelle van Dyne 
Reviewed-by: Klaus Jensen 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 08c204d46c43..7244534a89e9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -6328,11 +6328,9 @@ static void nvme_instance_init(Object *obj)
 {
 NvmeCtrl *n = NVME(obj);
 
-if (n->namespace.blkconf.blk) {
-device_add_bootindex_property(obj, >namespace.blkconf.bootindex,
-  "bootindex", "/namespace@1,0",
-  DEVICE(obj));
-}
+device_add_bootindex_property(obj, >namespace.blkconf.bootindex,
+  "bootindex", "/namespace@1,0",
+  DEVICE(obj));
 
 object_property_add(obj, "smart_critical_warning", "uint8",
 nvme_get_smart_warning,
-- 
2.31.1




Re: [PATCH for-6.0 7/7] hw/block/nvme: fix handling of private namespaces

2021-04-05 Thread Klaus Jensen
On Apr  5 18:02, Gollu Appalanaidu wrote:
> On Wed, Mar 24, 2021 at 09:09:07PM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Prior to this patch, if a private nvme-ns device (that is, a namespace
> > that is not linked to a subsystem) is wired up to an nvme-subsys linked
> > nvme controller device, the device fails to verify that the namespace id
> > is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
> > and Namespace Usage") states that because the device supports Namespace
> > Management, "NSIDs *shall* be unique within the NVM subsystem".
> > 
> > Additionally, prior to this patch, private namespaces are not known to
> > the subsystem and the namespace is considered exclusive to the
> > controller with which it is initially wired up to. However, this is not
> > the definition of a private namespace; per Section 1.6.33 ("private
> > namespace"), a private namespace is just a namespace that does not
> > support multipath I/O or namespace sharing, which means "that it is only
> > able to be attached to one controller at a time".
> > 
> > Fix this by always allocating namespaces in the subsystem (if one is
> > linked to the controller), regardsless of the shared/private status of
> > the namespace. Whether or not the namespace is shareable is controlled
> > by a new `shared` nvme-ns parameter.
> > 
> > Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
> > since the `shared` parameter now serves the purpose of attaching the
> > namespace to all controllers in the subsystem upon device realization.
> > It is invalid to have an nvme-ns namespace device with a linked
> > subsystem without the parent nvme controller device also being linked to
> > one and since the nvme-ns devices will unconditionally be "attached" (in
> > QEMU terms that is) to an nvme controller device through an NvmeBus, the
> > nvme-ns namespace device can always get a reference to the subsystem of
> > the controller it is explicitly (using 'bus=' parametr) or implicitly
> > attaching to.
> > 
> > Fixes: e570768566b3 ("hw/block/nvme: support for shared namespace in 
> > subsystem")
> > Cc: Minwoo Im 
> > Signed-off-by: Klaus Jensen 
> > ---
> > hw/block/nvme-ns.h |  10 ++--
> > hw/block/nvme-subsys.h |   7 ++-
> > hw/block/nvme.h|  39 +-
> > include/block/nvme.h   |   1 +
> > hw/block/nvme-ns.c |  74 +++
> > hw/block/nvme-subsys.c |  28 ---
> > hw/block/nvme.c| 112 +
> > hw/block/trace-events  |   1 -
> > 8 files changed, 106 insertions(+), 166 deletions(-)
> > 



> > 
> > static Property nvme_ns_props[] = {
> > DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
> > -DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
> > - NvmeSubsystem *),
> > DEFINE_PROP_BOOL("detached", NvmeNamespace, params.detached, false),
> > +DEFINE_PROP_BOOL("shared", NvmeNamespace, params.shared, false)i,
> 
> Nice change point, hope we need update the usage, removing "subsys" from
> nvme-ns device params and adding "shared" param?
> 

Good catch, thanks. Fixing that up for a v2.


signature.asc
Description: PGP signature


Re: [PATCH for-6.0 0/7] hw/block/nvme: misc fixes

2021-04-05 Thread Gollu Appalanaidu

On Wed, Mar 24, 2021 at 09:09:00PM +0100, Klaus Jensen wrote:

From: Klaus Jensen 

Various fixes for 6.0.

Klaus Jensen (7):
 hw/block/nvme: fix pi constraint check
 hw/block/nvme: fix missing string representation for ns attachment
 hw/block/nvme: fix the nsid 'invalid' value
 hw/block/nvme: fix controller namespaces array indexing
 hw/block/nvme: fix warning about legacy namespace configuration
 hw/block/nvme: update dmsrl limit on namespace detachment
 hw/block/nvme: fix handling of private namespaces

hw/block/nvme-ns.h |  12 ++--
hw/block/nvme-subsys.h |   7 +--
hw/block/nvme.h|  45 ++
include/block/nvme.h   |   1 +
hw/block/nvme-ns.c |  76 
hw/block/nvme-subsys.c |  28 -
hw/block/nvme.c| 131 -
hw/block/trace-events  |   1 -
8 files changed, 129 insertions(+), 172 deletions(-)

--


Reviewed-by: Gollu Appalanaidu 


2.31.0




Re: [PATCH for-6.0 7/7] hw/block/nvme: fix handling of private namespaces

2021-04-05 Thread Gollu Appalanaidu

On Wed, Mar 24, 2021 at 09:09:07PM +0100, Klaus Jensen wrote:

From: Klaus Jensen 

Prior to this patch, if a private nvme-ns device (that is, a namespace
that is not linked to a subsystem) is wired up to an nvme-subsys linked
nvme controller device, the device fails to verify that the namespace id
is unique within the subsystem. NVM Express v1.4b, Section 6.1.6 ("NSID
and Namespace Usage") states that because the device supports Namespace
Management, "NSIDs *shall* be unique within the NVM subsystem".

Additionally, prior to this patch, private namespaces are not known to
the subsystem and the namespace is considered exclusive to the
controller with which it is initially wired up to. However, this is not
the definition of a private namespace; per Section 1.6.33 ("private
namespace"), a private namespace is just a namespace that does not
support multipath I/O or namespace sharing, which means "that it is only
able to be attached to one controller at a time".

Fix this by always allocating namespaces in the subsystem (if one is
linked to the controller), regardsless of the shared/private status of
the namespace. Whether or not the namespace is shareable is controlled
by a new `shared` nvme-ns parameter.

Finally, this fix allows the nvme-ns `subsys` parameter to be removed,
since the `shared` parameter now serves the purpose of attaching the
namespace to all controllers in the subsystem upon device realization.
It is invalid to have an nvme-ns namespace device with a linked
subsystem without the parent nvme controller device also being linked to
one and since the nvme-ns devices will unconditionally be "attached" (in
QEMU terms that is) to an nvme controller device through an NvmeBus, the
nvme-ns namespace device can always get a reference to the subsystem of
the controller it is explicitly (using 'bus=' parametr) or implicitly
attaching to.

Fixes: e570768566b3 ("hw/block/nvme: support for shared namespace in subsystem")
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
---
hw/block/nvme-ns.h |  10 ++--
hw/block/nvme-subsys.h |   7 ++-
hw/block/nvme.h|  39 +-
include/block/nvme.h   |   1 +
hw/block/nvme-ns.c |  74 +++
hw/block/nvme-subsys.c |  28 ---
hw/block/nvme.c| 112 +
hw/block/trace-events  |   1 -
8 files changed, 106 insertions(+), 166 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 82340c4b2574..fb0a41f912e7 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeZone {

typedef struct NvmeNamespaceParams {
bool detached;
+bool shared;
uint32_t nsid;
QemuUUID uuid;

@@ -60,8 +61,8 @@ typedef struct NvmeNamespace {
const uint32_t *iocs;
uint8_t  csi;
uint16_t status;
+int  attached;

-NvmeSubsystem   *subsys;
QTAILQ_ENTRY(NvmeNamespace) entry;

NvmeIdNsZoned   *id_ns_zoned;
@@ -99,11 +100,6 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
return 0;
}

-static inline bool nvme_ns_shared(NvmeNamespace *ns)
-{
-return !!ns->subsys;
-}
-
static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
{
NvmeIdNs *id_ns = >id_ns;
@@ -225,7 +221,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
}

void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
void nvme_ns_drain(NvmeNamespace *ns);
void nvme_ns_shutdown(NvmeNamespace *ns);
void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index aafa04b84829..24132edd005c 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,7 +14,7 @@
OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)

#define NVME_SUBSYS_MAX_CTRLS   32
-#define NVME_SUBSYS_MAX_NAMESPACES  256
+#define NVME_MAX_NAMESPACES 256

typedef struct NvmeCtrl NvmeCtrl;
typedef struct NvmeNamespace NvmeNamespace;
@@ -24,7 +24,7 @@ typedef struct NvmeSubsystem {

NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
/* Allocated namespaces for this subsystem */
-NvmeNamespace *namespaces[NVME_SUBSYS_MAX_NAMESPACES + 1];
+NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1];

struct {
char *nqn;
@@ -32,7 +32,6 @@ typedef struct NvmeSubsystem {
} NvmeSubsystem;

int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
-int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp);

static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
uint32_t cntlid)
@@ -54,7 +53,7 @@ static inline NvmeNamespace *nvme_subsys_ns(NvmeSubsystem 
*subsys,
return NULL;
}

-assert(nsid && nsid <= NVME_SUBSYS_MAX_NAMESPACES);
+assert(nsid && nsid <= NVME_MAX_NAMESPACES);

return subsys->namespaces[nsid];
}
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 1570f65989a7..644143597a0f 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ 

Re: [PATCH for-6.0 0/7] hw/block/nvme: misc fixes

2021-04-05 Thread Klaus Jensen
On Mar 24 21:09, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Various fixes for 6.0.
> 
> Klaus Jensen (7):
>   hw/block/nvme: fix pi constraint check
>   hw/block/nvme: fix missing string representation for ns attachment
>   hw/block/nvme: fix the nsid 'invalid' value
>   hw/block/nvme: fix controller namespaces array indexing
>   hw/block/nvme: fix warning about legacy namespace configuration
>   hw/block/nvme: update dmsrl limit on namespace detachment
>   hw/block/nvme: fix handling of private namespaces
> 
>  hw/block/nvme-ns.h |  12 ++--
>  hw/block/nvme-subsys.h |   7 +--
>  hw/block/nvme.h|  45 ++
>  include/block/nvme.h   |   1 +
>  hw/block/nvme-ns.c |  76 
>  hw/block/nvme-subsys.c |  28 -
>  hw/block/nvme.c| 131 -
>  hw/block/trace-events  |   1 -
>  8 files changed, 129 insertions(+), 172 deletions(-)
> 

Ping on patches [3,4,7/7] :)


signature.asc
Description: PGP signature