Re: [PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-21 Thread Minwoo Im
On 21-01-21 15:17:16, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:06AM +0900, Minwoo Im wrote:
> > -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> > +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
> >  {
> >  NvmeIdCtrl *id = &n->id_ctrl;
> >  uint8_t *pci_conf = pci_dev->config;
> >  
> > +n->cntlid = cntlid;
> 
> I don't think 'cntlid' is important enough to be a function parameter.
> You can just set it within the 'NvmeCtrl' struct before calling this
> function like all the other properties.

Okay.  Rather than adding a parameter to this function,
nvme_init_subsys() may take this job to assign cntlid to the controller
instance first.  Let me fix one!
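
Something along these lines, I think (an untested sketch):

    static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
    {
        int cntlid;

        if (!n->subsys) {
            return 0;
        }

        cntlid = nvme_subsys_register_ctrl(n, errp);
        if (cntlid < 0) {
            return -1;
        }

        /* assigned here instead of being passed to nvme_init_ctrl() */
        n->cntlid = cntlid;

        return 0;
    }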

> > @@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
> >  return;
> >  }
> >  
> > -nvme_init_ctrl(n, pci_dev);
> > +cntlid = nvme_init_subsys(n, errp);
> > +if (cntlid < 0) {
> 
> error_propagate();

Thanks for catching this.
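
Right, presumably the usual local_err pattern (sketch):

    Error *local_err = NULL;
    int ret;

    ret = nvme_init_subsys(n, &local_err);
    if (ret < 0) {
        error_propagate(errp, local_err);
        return;
    }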



Re: [PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-21 Thread Minwoo Im
On 21-01-21 15:03:38, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:04AM +0900, Minwoo Im wrote:
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -23,6 +23,7 @@
> >   *  max_ioqpairs=, \
> >   *  aerl=, aer_max_queued=, \
> >   *  mdts=,zoned.append_size_limit= \
> > + *  ,subsys= \
> 
> For consistency, the ',' goes in the preceding line.

I have no idea what happened here :(.  Will fix it. Thanks!



Re: [PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-21 Thread Minwoo Im
On 21-01-21 14:52:02, Keith Busch wrote:
> On Fri, Jan 22, 2021 at 07:09:03AM +0900, Minwoo Im wrote:
> > +static void nvme_subsys_setup(NvmeSubsystem *subsys)
> > +{
> > +char *subnqn;
> > +
> > +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
> > +strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
> > +g_free(subnqn);
> 
> Instead of the duplication and copy, you could format the string
> directly into the destination:
> 
> snprintf(subsys->subnqn, sizeof(subsys->subnqn), "nqn.2019-08.org.qemu:%s",
>  subsys->parent_obj.id);

Oh, Thanks. Will fix it!



Re: [PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-21 Thread Keith Busch
On Fri, Jan 22, 2021 at 07:09:06AM +0900, Minwoo Im wrote:
> -static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> +static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
>  {
>  NvmeIdCtrl *id = &n->id_ctrl;
>  uint8_t *pci_conf = pci_dev->config;
>  
> +n->cntlid = cntlid;

I don't think 'cntlid' is important enough to be a function parameter.
You can just set it within the 'NvmeCtrl' struct before calling this
function like all the other properties.

> @@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
>  return;
>  }
>  
> -nvme_init_ctrl(n, pci_dev);
> +cntlid = nvme_init_subsys(n, errp);
> +if (cntlid < 0) {

error_propagate();

> +return;
> +}
> +nvme_init_ctrl(n, pci_dev, cntlid);



Re: [PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-21 Thread Keith Busch
On Fri, Jan 22, 2021 at 07:09:04AM +0900, Minwoo Im wrote:
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -23,6 +23,7 @@
>   *  max_ioqpairs=, \
>   *  aerl=, aer_max_queued=, \
>   *  mdts=,zoned.append_size_limit= \
> + *  ,subsys= \

For consistency, the ',' goes in the preceding line.

>   *  -device nvme-ns,drive=,bus=,nsid=,\
>   *  zoned=
>   *  -device nvme-subsys,id=

> @@ -4404,11 +4412,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
>  return 0;
>  }
>  
> +static void nvme_init_subnqn(NvmeCtrl *n)
> +{
> +NvmeSubsystem *subsys = n->subsys;
> +NvmeIdCtrl *id = &n->id_ctrl;
> +char *subnqn;
> +
> +if (!subsys) {
> +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
> +strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
> +g_free(subnqn);

snprintf(id->subnqn, sizeof(id->subnqn), "nqn.2019-08.org.qemu:%s",
         n->params.serial);

> +} else {
> +pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
> +}
> +}



Re: [PATCH v4 05/16] block/io: bdrv_pad_request(): support qemu_iovec_init_extended failure

2021-01-21 Thread Eric Blake
On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:
> Make bdrv_pad_request() honest: return error if
> qemu_iovec_init_extended() failed.
> 
> Update also bdrv_padding_destroy() to clean the structure for safety.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 45 +++--
>  1 file changed, 31 insertions(+), 14 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-21 Thread Keith Busch
On Fri, Jan 22, 2021 at 07:09:03AM +0900, Minwoo Im wrote:
> +static void nvme_subsys_setup(NvmeSubsystem *subsys)
> +{
> +char *subnqn;
> +
> +subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", 
> subsys->parent_obj.id);
> +strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
> +g_free(subnqn);

Instead of the duplication and copy, you could format the string
directly into the destination:

snprintf(subsys->subnqn, sizeof(subsys->subnqn), "nqn.2019-08.org.qemu:%s",
 subsys->parent_obj.id);



Re: [PATCH v4 04/16] block/io: refactor bdrv_pad_request(): move bdrv_pad_request() up

2021-01-21 Thread Eric Blake
On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:
> Prepare to the following patch when bdrv_pad_request() will be able to

s/Prepare to/Prepare for/

> fail. Update the comments.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 

> -if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
> +if (padded) {
> +/*
> + * Request was unaligned to request_alignment and therefore padded.
> + * We are going to do read-modify-write. User is not prepared to widened
> + * request intersections with other requests, so we serialize the
> + * request.

We are going to do read-modify-write, and must serialize the request to
prevent interactions of the widened region with other transactions.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 03/16] block: fix theoretical overflow in bdrv_init_padding()

2021-01-21 Thread Eric Blake
On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:
> Calculation of sum may theoretically overflow, so use 64bit type and
> add some good assertions.
> 
> Use int64_t constantly.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 21e8a50725..d9bc67f1b0 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1537,8 +1537,12 @@ static bool bdrv_init_padding(BlockDriverState *bs,
>int64_t offset, int64_t bytes,
>BdrvRequestPadding *pad)
>  {
> -uint64_t align = bs->bl.request_alignment;
> -size_t sum;
> +int64_t align = bs->bl.request_alignment;
> +int64_t sum;
> +
> +bdrv_check_request(offset, bytes, &error_abort);
> +assert(align <= INT_MAX); /* documented in block/block_int.h */
> +assert(align * 2 <= SIZE_MAX); /* so we can allocate the buffer */

Would look better as assert(align <= SIZE_MAX / 2); but not technically
wrong as written because the earlier line proved align is less than 32
bits so align*2 can't overflow 63 bits.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH V4 5/6] hw/block/nvme: add NMIC enum value for Identify Namespace

2021-01-21 Thread Minwoo Im
Added the Namespace Multi-path I/O and Namespace Sharing Capabilities (NMIC)
field to support namespaces shared among controller(s).

This field is in Identify Namespace data structure in [30].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index d6415a869c1c..ad68cdc2b92d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1203,6 +1203,10 @@ enum NvmeNsIdentifierType {
 NVME_NIDT_CSI   = 0x04,
 };
 
+enum NvmeIdNsNmic {
+NVME_NMIC_NS_SHARED = 1 << 0,
+};
+
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
 NVME_CSI_ZONED  = 0x02,
-- 
2.17.1




[PATCH V4 3/6] hw/block/nvme: add CMIC enum value for Identify Controller

2021-01-21 Thread Minwoo Im
Added the Controller Multi-path I/O and Namespace Sharing Capabilities
(CMIC) field to support the multi-controller scheme in the following patches.

This field is in Identify Controller data structure in [76].

Signed-off-by: Minwoo Im 
---
 include/block/nvme.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e4b918064df9..d6415a869c1c 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1034,6 +1034,10 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+enum NvmeIdCtrlCmic {
+NVME_CMIC_MULTI_CTRL= 1 << 1,
+};
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
-- 
2.17.1




[PATCH V4 2/6] hw/block/nvme: support to map controller to a subsystem

2021-01-21 Thread Minwoo Im
An nvme controller (nvme) can be mapped to an NVMe subsystem
(nvme-subsys).  This patch maps a controller to a subsystem by adding a
parameter 'subsys' to the nvme device.

To map a controller to a subsystem, we need to put nvme-subsys first and
then map the subsystem to the controller:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0

If the 'subsys' property is not given to the nvme controller, the
subsystem NQN will be created from the serial (e.g., 'foo' in the above
example); otherwise, it will be based on the subsys id (e.g., 'subsys0'
in the above example).

Signed-off-by: Minwoo Im 
---
 hw/block/nvme.c | 30 ++
 hw/block/nvme.h |  3 +++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index aabccdf36f4b..ab0531492ddd 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -23,6 +23,7 @@
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
  *  mdts=,zoned.append_size_limit= \
+ *  ,subsys= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
  *  -device nvme-subsys,id=
@@ -44,6 +45,13 @@
  *
  * nvme device parameters
  * ~~
+ * - `subsys`
 *   NVM Subsystem device. If given, the subsystem NQN will be initialized
 *   with the given subsys id. Otherwise, the serial will be taken for the
 *   subsystem NQN. Also, if given, it will enable the multi-controller
 *   capability represented in the Identify Controller data structure in
 *   CMIC (Controller Multi-path I/O and Namespace Sharing Capabilities).
+ *
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
  *   of concurrently outstanding Asynchronous Event Request commands support
@@ -4404,11 +4412,25 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 return 0;
 }
 
+static void nvme_init_subnqn(NvmeCtrl *n)
+{
+NvmeSubsystem *subsys = n->subsys;
+NvmeIdCtrl *id = &n->id_ctrl;
+char *subnqn;
+
+if (!subsys) {
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
+strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
+g_free(subnqn);
+} else {
+pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
+}
+}
+
 static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 {
 NvmeIdCtrl *id = &n->id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
-char *subnqn;
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
@@ -4455,9 +4477,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 id->sgls = cpu_to_le32(NVME_CTRL_SGLS_SUPPORT_NO_ALIGN |
NVME_CTRL_SGLS_BITBUCKET);
 
-subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", n->params.serial);
-strpadcpy((char *)id->subnqn, sizeof(id->subnqn), subnqn, '\0');
-g_free(subnqn);
+nvme_init_subnqn(n);
 
 id->psd[0].mp = cpu_to_le16(0x9c4);
 id->psd[0].enlat = cpu_to_le32(0x10);
@@ -4545,6 +4565,8 @@ static Property nvme_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf),
 DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
 DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
 DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dee6092bd45f..04d4684601fd 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -2,6 +2,7 @@
 #define HW_NVME_H
 
 #include "block/nvme.h"
+#include "nvme-subsys.h"
 #include "nvme-ns.h"
 
 #define NVME_MAX_NAMESPACES 256
@@ -170,6 +171,8 @@ typedef struct NvmeCtrl {
 
 uint8_t zasl;
 
+NvmeSubsystem   *subsys;
+
 NvmeNamespace   namespace;
 NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
 NvmeSQueue  **sq;
-- 
2.17.1




[PATCH V4 4/6] hw/block/nvme: support for multi-controller in subsystem

2021-01-21 Thread Minwoo Im
We have nvme-subsys and nvme devices mapped together.  To support a
multi-controller scheme in this setup, the controller identifier (id)
has to be managed.  Earlier, cntlid (controller id) was always 0 because
we had no subsystem scheme in which the controller id mattered.

This patch introduces a 'cntlid' attribute to the nvme controller
instance (NvmeCtrl) and makes it allocated by the nvme-subsys device
mapped to the controller.  If nvme-subsys is not given to the
controller, then it will always be 0 as before.

Added a 'ctrls' array to the nvme-subsys instance to manage controllers
attached to the subsystem, with a limit of 32.  This patch uses an array
rather than a list for the controllers, to stay seamless with the
nvme-ns device.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-subsys.c | 21 +
 hw/block/nvme-subsys.h |  4 
 hw/block/nvme.c| 34 --
 hw/block/nvme.h|  1 +
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index f1dc71d588d9..a01003136b12 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -22,6 +22,27 @@
 #include "nvme.h"
 #include "nvme-subsys.h"
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
+{
+NvmeSubsystem *subsys = n->subsys;
+int cntlid;
+
+for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
+if (!subsys->ctrls[cntlid]) {
+break;
+}
+}
+
+if (cntlid == ARRAY_SIZE(subsys->ctrls)) {
+error_setg(errp, "no more free controller id");
+return -1;
+}
+
+subsys->ctrls[cntlid] = n;
+
+return cntlid;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 40f06a4c7db0..4eba50d96a1d 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -20,6 +20,10 @@ typedef struct NvmeNamespace NvmeNamespace;
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
 uint8_t subnqn[256];
+
+NvmeCtrl*ctrls[NVME_SUBSYS_MAX_CTRLS];
 } NvmeSubsystem;
 
+int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+
 #endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ab0531492ddd..225f0d3f3a27 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4427,16 +4427,21 @@ static void nvme_init_subnqn(NvmeCtrl *n)
 }
 }
 
-static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
+static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t cntlid)
 {
 NvmeIdCtrl *id = &n->id_ctrl;
 uint8_t *pci_conf = pci_dev->config;
 
+n->cntlid = cntlid;
+
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
 strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
 strpadcpy((char *)id->fr, sizeof(id->fr), "1.0", ' ');
 strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
+
+id->cntlid = cntlid;
+
 id->rab = 6;
 id->ieee[0] = 0x00;
 id->ieee[1] = 0x02;
@@ -4483,6 +4488,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 id->psd[0].enlat = cpu_to_le32(0x10);
 id->psd[0].exlat = cpu_to_le32(0x4);
 
+if (n->subsys) {
+id->cmic |= NVME_CMIC_MULTI_CTRL;
+}
+
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
@@ -4497,11 +4506,28 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
 n->bar.intmc = n->bar.intms = 0;
 }
 
+static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
+{
+int cntlid;
+
+if (!n->subsys) {
+return 0;
+}
+
+cntlid = nvme_subsys_register_ctrl(n, errp);
+if (cntlid < 0) {
+return -1;
+}
+
+return cntlid;
+}
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
 NvmeCtrl *n = NVME(pci_dev);
 NvmeNamespace *ns;
 Error *local_err = NULL;
+int cntlid;
 
 nvme_check_constraints(n, _err);
 if (local_err) {
@@ -4517,7 +4543,11 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 return;
 }
 
-nvme_init_ctrl(n, pci_dev);
+cntlid = nvme_init_subsys(n, errp);
+if (cntlid < 0) {
+return;
+}
+nvme_init_ctrl(n, pci_dev, cntlid);
 
 /* setup a namespace if the controller drive property was given */
 if (n->namespace.blkconf.blk) {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 04d4684601fd..b8f5f2d6ffb8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -134,6 +134,7 @@ typedef struct NvmeCtrl {
 NvmeBus  bus;
 BlockConfconf;
 
+uint16_tcntlid;
 boolqs_created;
 uint32_tpage_size;
 uint16_tpage_bits;
-- 
2.17.1




[PATCH V4 6/6] hw/block/nvme: support for shared namespace in subsystem

2021-01-21 Thread Minwoo Im
An nvme-ns device is registered to an nvme controller device during
initialization in nvme_register_namespace() in case the 'bus' property
is given, which means it's mapped to a single controller.

This patch introduces a new property 'subsys', just like the controller
device instance did, to map a namespace to an NVMe subsystem.

If the 'subsys' property is given to the nvme-ns device, it will belong
to the specified subsystem and will be attached to all controllers in
that subsystem by enabling the shared namespace capability in NMIC
(Namespace Multi-path I/O and Namespace Sharing Capabilities) in
Identify Namespace.

Usage:

  -device nvme-subsys,id=subsys0
  -device nvme,serial=foo,id=nvme0,subsys=subsys0
  -device nvme,serial=bar,id=nvme1,subsys=subsys0
  -device nvme,serial=baz,id=nvme2,subsys=subsys0
  -device nvme-ns,id=ns1,drive=,nsid=1,subsys=subsys0  # Shared
  -device nvme-ns,id=ns2,drive=,nsid=2,bus=nvme2   # Non-shared

  In the above example, 'ns1' will be shared to 'nvme0' and 'nvme1' in
  the same subsystem.  On the other hand, 'ns2' will be attached to the
  'nvme2' only as a private namespace in that subsystem.

By default, every namespace with the 'subsys' parameter will be
attached to all controllers in the subsystem.

Signed-off-by: Minwoo Im 
---
 hw/block/nvme-ns.c | 23 ++-
 hw/block/nvme-ns.h |  7 +++
 hw/block/nvme-subsys.c | 25 +
 hw/block/nvme-subsys.h |  3 +++
 hw/block/nvme.c| 10 +-
 5 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 62b25cf69bfa..9b493f2ead03 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -63,6 +63,10 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+if (nvme_ns_shared(ns)) {
+id_ns->nmic |= NVME_NMIC_NS_SHARED;
+}
+
 return 0;
 }
 
@@ -365,16 +369,25 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (nvme_register_namespace(n, ns, errp)) {
-error_propagate_prepend(errp, local_err,
-"could not register namespace: ");
-return;
+if (ns->subsys) {
+if (nvme_subsys_register_ns(ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not setup namespace to subsys: ");
+return;
+}
+} else {
+if (nvme_register_namespace(n, ns, errp)) {
+error_propagate_prepend(errp, local_err,
+"could not register namespace: ");
+return;
+}
 }
-
 }
 
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
+DEFINE_PROP_LINK("subsys", NvmeNamespace, subsys, TYPE_NVME_SUBSYS,
+ NvmeSubsystem *),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
 DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 293ac990e3f6..929e78861903 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -47,6 +47,8 @@ typedef struct NvmeNamespace {
 const uint32_t *iocs;
 uint8_t  csi;
 
+NvmeSubsystem   *subsys;
+
 NvmeIdNsZoned   *id_ns_zoned;
 NvmeZone*zone_array;
 QTAILQ_HEAD(, NvmeZone) exp_open_zones;
@@ -77,6 +79,11 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline bool nvme_ns_shared(NvmeNamespace *ns)
+{
+return !!ns->subsys;
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = &ns->id_ns;
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index a01003136b12..e7efdcae7d0d 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -43,6 +43,31 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+int nvme_subsys_register_ns(NvmeNamespace *ns, Error **errp)
+{
+NvmeSubsystem *subsys = ns->subsys;
+NvmeCtrl *n;
+int i;
+
+if (subsys->namespaces[nvme_nsid(ns)]) {
+error_setg(errp, "namespace %d already registered to subsys %s",
+   nvme_nsid(ns), subsys->parent_obj.id);
+return -1;
+}
+
+subsys->namespaces[nvme_nsid(ns)] = ns;
+
+for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+n = subsys->ctrls[i];
+
+if (n && nvme_register_namespace(n, ns, errp)) {
+return -1;
+}
+}
+
+return 0;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 char *subnqn;
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
index 4eba50d96a1d..ccf6a71398d3 100644
--- a/hw/block/nvme-subsys.h
+++ b/hw/block/nvme-subsys.h
@@ -14,6 +14,7 @@
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 #define NVME_SUBSYS_MAX_CTRLS   32
+#define NVME_SUBSYS_MAX_NAMESPACES  32
 

[PATCH V4 1/6] hw/block/nvme: introduce nvme-subsys device

2021-01-21 Thread Minwoo Im
To support multi-path in the QEMU NVMe device model, we need an NVMe
subsystem hierarchy to map controllers and namespaces to an NVMe
subsystem.

This patch introduces a simple nvme-subsys device model.  The subsystem
will be prepared with a subsystem NQN built from the id provided to the
nvme-subsys device:

  ex) -device nvme-subsys,id=subsys0: nqn.2019-08.org.qemu:subsys0

Signed-off-by: Minwoo Im 
---
 hw/block/meson.build   |  2 +-
 hw/block/nvme-subsys.c | 63 ++
 hw/block/nvme-subsys.h | 25 +
 hw/block/nvme.c|  3 ++
 4 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/meson.build b/hw/block/meson.build
index 602ca6c8541d..83ea2d37978d 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,7 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_SH4', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c'))
+softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 'nvme-subsys.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: files('vhost-user-blk.c'))
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
new file mode 100644
index ..f1dc71d588d9
--- /dev/null
+++ b/hw/block/nvme-subsys.c
@@ -0,0 +1,63 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#include "qemu/units.h"
+#include "qemu/osdep.h"
+#include "qemu/uuid.h"
+#include "qemu/iov.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-core.h"
+#include "hw/block/block.h"
+#include "block/aio.h"
+#include "block/accounting.h"
+#include "sysemu/sysemu.h"
+#include "hw/pci/pci.h"
+#include "nvme.h"
+#include "nvme-subsys.h"
+
+static void nvme_subsys_setup(NvmeSubsystem *subsys)
+{
+char *subnqn;
+
+subnqn = g_strdup_printf("nqn.2019-08.org.qemu:%s", subsys->parent_obj.id);
+strpadcpy((char *)subsys->subnqn, sizeof(subsys->subnqn), subnqn, '\0');
+g_free(subnqn);
+}
+
+static void nvme_subsys_realize(DeviceState *dev, Error **errp)
+{
+NvmeSubsystem *subsys = NVME_SUBSYS(dev);
+
+nvme_subsys_setup(subsys);
+}
+
+static void nvme_subsys_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+dc->realize = nvme_subsys_realize;
+dc->desc = "Virtual NVMe subsystem";
+}
+
+static const TypeInfo nvme_subsys_info = {
+.name = TYPE_NVME_SUBSYS,
+.parent = TYPE_DEVICE,
+.class_init = nvme_subsys_class_init,
+.instance_size = sizeof(NvmeSubsystem),
+};
+
+static void nvme_subsys_register_types(void)
+{
+type_register_static(&nvme_subsys_info);
+}
+
+type_init(nvme_subsys_register_types)
diff --git a/hw/block/nvme-subsys.h b/hw/block/nvme-subsys.h
new file mode 100644
index ..40f06a4c7db0
--- /dev/null
+++ b/hw/block/nvme-subsys.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU NVM Express Subsystem: nvme-subsys
+ *
+ * Copyright (c) 2021 Minwoo Im 
+ *
+ * This code is licensed under the GNU GPL v2.  Refer COPYING.
+ */
+
+#ifndef NVME_SUBSYS_H
+#define NVME_SUBSYS_H
+
+#define TYPE_NVME_SUBSYS "nvme-subsys"
+#define NVME_SUBSYS(obj) \
+OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+
+#define NVME_SUBSYS_MAX_CTRLS   32
+
+typedef struct NvmeCtrl NvmeCtrl;
+typedef struct NvmeNamespace NvmeNamespace;
+typedef struct NvmeSubsystem {
+DeviceState parent_obj;
+uint8_t subnqn[256];
+} NvmeSubsystem;
+
+#endif /* NVME_SUBSYS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 21aec90637fa..aabccdf36f4b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -25,6 +25,7 @@
  *  mdts=,zoned.append_size_limit= \
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=
+ *  -device nvme-subsys,id=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. By default, the
@@ -38,6 +39,8 @@
  *
  * The PMR will use BAR 4/5 exclusively.
  *
 * To attach controller(s) and namespace(s) to a subsystem, provide the
 * nvme-subsys device as above.
  *
  * nvme device parameters
  * ~~
-- 
2.17.1




[PATCH V4 0/6] hw/block/nvme: support multi-path for ctrl/ns

2021-01-21 Thread Minwoo Im
Hello,

This is the fourth series supporting the NVMe subsystem scheme with
multi-controller and namespace sharing in a subsystem.

This time, I've removed the 'detached' scheme introduced in the previous
series to focus on the subsystem scheme in this series.  The
attach/detach scheme will be introduced in the next series along with
the ns-mgmt functionality.

Here's an example of how-to:

  # Specify a subsystem
  -device nvme-subsys,id=subsys0 \
  -device nvme,serial=foo,id=nvme0,subsys=subsys0 \
  -device nvme,serial=bar,id=nvme1,subsys=subsys0 \
  -device nvme,serial=baz,id=nvme2,subsys=subsys0 \
  -device nvme-ns,id=ns1,drive=drv10,nsid=1,subsys=subsys0 \
  -device nvme-ns,id=ns2,drive=drv11,nsid=2,bus=nvme2 \
  \
  # Not specify a subsystem
  -device nvme,serial=qux,id=nvme3 \
  -device nvme-ns,id=ns3,drive=drv12,nsid=3,bus=nvme3 \

# nvme list -v
  NVM Express Subsystems

  Subsystem     Subsystem-NQN                 Controllers
  ------------- ----------------------------- --------------------
  nvme-subsys1  nqn.2019-08.org.qemu:subsys0  nvme0, nvme1, nvme2
  nvme-subsys3  nqn.2019-08.org.qemu:qux      nvme3

  NVM Express Controllers

  Device  SN   MN              FR   TxPort  Address    Subsystem     Namespaces
  ------- ---- --------------- ---- ------- ---------- ------------- --------------------
  nvme0   foo  QEMU NVMe Ctrl  1.0  pcie    :00:06.0   nvme-subsys1  nvme1c0n1
  nvme1   bar  QEMU NVMe Ctrl  1.0  pcie    :00:07.0   nvme-subsys1  nvme1c1n1
  nvme2   baz  QEMU NVMe Ctrl  1.0  pcie    :00:08.0   nvme-subsys1  nvme1c2n1, nvme1c2n2
  nvme3   qux  QEMU NVMe Ctrl  1.0  pcie    :00:09.0   nvme-subsys3  nvme3c3n1

  NVM Express Namespaces

  Device   NSID  Usage                    Format       Controllers
  -------- ----- ------------------------ ------------ --------------------
  nvme1n1  1     134.22 MB / 134.22 MB    512 B + 0 B  nvme0, nvme1, nvme2
  nvme1n2  2     268.44 MB / 268.44 MB    512 B + 0 B  nvme2
  nvme3n1  3     268.44 MB / 268.44 MB    512 B + 0 B  nvme3

Thanks,

Since RFC V3:
  - Exclude 'detached' scheme from this series.  This will be covered in
the next series by covering all the ns-related admin commands
including ZNS and ns-mgmt. (Niklas)
  - Rebased on nvme-next.
  - Remove RFC tag from this V4.

Since RFC V2:
  - Rebased on nvme-next branch with trivial patches from the previous
version(V2) applied. (Klaus)
  - Fix enumeration type name convention with NvmeIdNs prefix. (Klaus)
  - Put 'cntlid' to NvmeCtrl instance in nvme_init_ctrl() which was
missed in V2.
  - Added 'detached' parameter to nvme-ns device to decide whether to
attach or not to controller(s) in the subsystem. (Klaus)
  - Implemented Identify Active Namespace ID List apart from Identify
    Allocated Namespace ID List by removing the fall-through statement.

Since RFC V1:
  - Updated namespace sharing scheme to be based on nvme-subsys
hierarchy.

Minwoo Im (6):
  hw/block/nvme: introduce nvme-subsys device
  hw/block/nvme: support to map controller to a subsystem
  hw/block/nvme: add CMIC enum value for Identify Controller
  hw/block/nvme: support for multi-controller in subsystem
  hw/block/nvme: add NMIC enum value for Identify Namespace
  hw/block/nvme: support for shared namespace in subsystem

 hw/block/meson.build   |   2 +-
 hw/block/nvme-ns.c |  23 +++--
 hw/block/nvme-ns.h |   7 +++
 hw/block/nvme-subsys.c | 109 +
 hw/block/nvme-subsys.h |  32 
 hw/block/nvme.c|  77 ++---
 hw/block/nvme.h|   4 ++
 include/block/nvme.h   |   8 +++
 8 files changed, 249 insertions(+), 13 deletions(-)
 create mode 100644 hw/block/nvme-subsys.c
 create mode 100644 hw/block/nvme-subsys.h

-- 
2.17.1




Re: [PATCH v4 02/16] util/iov: make qemu_iovec_init_extended() honest

2021-01-21 Thread Eric Blake
On 12/11/20 12:39 PM, Vladimir Sementsov-Ogievskiy wrote:
> Actually, we can't extend the io vector in all cases. Handle possible
> MAX_IOV and size_t overflows.
> 
> For now add assertion to callers (actually they rely on success anyway)
> and fix them in the following patch.
> 
> Add also some additional good assertions to qemu_iovec_init_slice()
> while being here.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qemu/iov.h |  2 +-
>  block/io.c | 10 +++---
>  util/iov.c | 25 +++--
>  3 files changed, 31 insertions(+), 6 deletions(-)
> 

> @@ -492,7 +506,14 @@ bool qemu_iovec_is_zero(QEMUIOVector *qiov, size_t offset, size_t bytes)
>  void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
> size_t offset, size_t len)
>  {
> -qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
> +int ret;
> +
> +assert(source->size >= len);
> +assert(source->size - len >= offset);
> +
> +/* We shrink the request, so we can't overflow neither size_t nor MAX_IOV */

We shrink the request, so neither size_t nor MAX_IOV will overflow

> +ret = qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
> +assert(ret == 0);
>  }
>  
>  void qemu_iovec_destroy(QEMUIOVector *qiov)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 7/7] block/rbd: change request alignment to 1 byte

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 3:29 PM Peter Lieven  wrote:
>
> > On 21.01.21 at 20:42, Jason Dillaman wrote:
> > On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven  wrote:
> >>
> >>> On 19.01.2021 at 15:20, Jason Dillaman wrote:
> >>>
> >>> On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven  wrote:
> > On 18.01.21 at 23:33, Jason Dillaman wrote:
> > On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven  wrote:
> >> On 15.01.21 at 16:27, Jason Dillaman wrote:
> >>> On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven  wrote:
>  On 14.01.21 at 20:19, Jason Dillaman wrote:
> > On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven  wrote:
> >> since we implement byte interfaces and librbd supports aio on byte 
> >> granularity we can lift
> >> the 512 byte alignment.
> >> Signed-off-by: Peter Lieven 
> >> ---
> >> block/rbd.c | 2 --
> >> 1 file changed, 2 deletions(-)
> >> diff --git a/block/rbd.c b/block/rbd.c
> >> index 27b4404adf..8673e8f553 100644
> >> --- a/block/rbd.c
> >> +++ b/block/rbd.c
> >> @@ -223,8 +223,6 @@ done:
> >> static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error 
> >> **errp)
> >> {
> >>BDRVRBDState *s = bs->opaque;
> >> -/* XXX Does RBD support AIO on less than 512-byte alignment? 
> >> */
> >> -bs->bl.request_alignment = 512;
> > Just a suggestion, but perhaps improve discard alignment, max 
> > discard,
> > optimal alignment (if that's something QEMU handles internally) if 
> > not
> > overridden by the user.
>  Qemu supports max_discard and discard_alignment. Is there a call to 
>  get these limits
>  from librbd?
>  What do you mean by optimal_alignment? The object size?
> >>> krbd does a good job of initializing defaults [1] where optimal and
> >>> discard alignment is 64KiB (can actually be 4KiB now), max IO size for
> >>> writes, discards, and write-zeroes is the object size * the stripe
> >>> count.
> >> Okay, I will have a look at it. If qemu issues a write, discard, 
> >> write_zero greater than
> >> obj_size  * stripe count will librbd split it internally or will the 
> >> request fail?
> > librbd will handle it as needed. My goal is really just to get the
> > hints down the guest OS.
> >> Regarding the alignment it seems that rbd_dev->opts->alloc_size is 
> >> something that comes from the device
> >> configuration and not from rbd? I don't have that information inside 
> >> the Qemu RBD driver.
> > librbd doesn't really have the information either. The 64KiB guess
> > that krbd uses was a compromise since that was the default OSD
> > allocation size for HDDs since Luminous. Starting with Pacific that
> > default is going down to 4KiB.
>  I will try to adjust these values as far as it is possible and makes 
>  sense.
>  Is there a way to check the minimum supported OSD release in the backend 
>  from librbd / librados?
> >>> It's not a minimum -- RADOS will gladly access 1 byte writes as well.
> >>> It's really just the optimal (performance and space-wise). Sadly,
> >>> there is no realistic way to query this data from the backend.
> >> So you would suggest to advertise an optimal transfer length of 64k and 
> >> max transfer length of obj size * stripe count to the guest unless we have 
> >> an API in the future to query these limits from the backend?
> > I'll open a Ceph tracker ticket to expose these via the API in a future 
> > release.
>
>
> That would be good to have!
>
>
> >
> >> I would leave request alignment at 1 byte as otherwise Qemu will issue 
> >> RMWs for all write requests that do not align. Everything that comes from 
> >> a guest OS is very likely 4k aligned anyway.
> > My goal is definitely not to have QEMU do any extra work for splitting
> > or aligning IOs. I am really only trying to get hints passed down the
> > guest via the virtio drivers. If there isn't the plumbing in QEMU for
> > that yet, disregard.
>
>
> From what I read from the code Qemu emulates the Block Limits VPD Page for 
> virtio-scsi, but the limits there are not taken from the backend driver. They 
> can be passed as config to the virtio-scsi device.
>
> However, Qemu uses all the Block Limit that can be found in 
> include/block/block_int.h in the BlockLimits struct to generate optimal 
> requests if it comes to block operations like mirroring, or zeroing of a whole
>
> device etc. Some of the alignments (e.g. the request alignment) have direct 
> influence and make Qemu split requests from the guest or even make RMW cycles.
>
> If my understanding is incorrect please anyone correct me.
>
>
> It would indeed be nice to have an option to propagate the settings from the 
> backend driver into the Guest. But from my understanding this is not there 
> yet.
>
>
> So 

Re: [PATCH 7/7] block/rbd: change request alignment to 1 byte

2021-01-21 Thread Peter Lieven
On 21.01.21 at 20:42, Jason Dillaman wrote:
> On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven  wrote:
>>
>>> On 19.01.2021 at 15:20, Jason Dillaman wrote:
>>>
>>> On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven  wrote:
> On 18.01.21 at 23:33, Jason Dillaman wrote:
> On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven  wrote:
>> On 15.01.21 at 16:27, Jason Dillaman wrote:
>>> On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven  wrote:
 On 14.01.21 at 20:19, Jason Dillaman wrote:
> On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven  wrote:
>> since we implement byte interfaces and librbd supports aio on byte 
>> granularity we can lift
>> the 512 byte alignment.
>> Signed-off-by: Peter Lieven 
>> ---
>> block/rbd.c | 2 --
>> 1 file changed, 2 deletions(-)
>> diff --git a/block/rbd.c b/block/rbd.c
>> index 27b4404adf..8673e8f553 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -223,8 +223,6 @@ done:
>> static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error 
>> **errp)
>> {
>>BDRVRBDState *s = bs->opaque;
>> -/* XXX Does RBD support AIO on less than 512-byte alignment? */
>> -bs->bl.request_alignment = 512;
> Just a suggestion, but perhaps improve discard alignment, max discard,
> optimal alignment (if that's something QEMU handles internally) if not
> overridden by the user.
 Qemu supports max_discard and discard_alignment. Is there a call to 
 get these limits
 from librbd?
 What do you mean by optimal_alignment? The object size?
>>> krbd does a good job of initializing defaults [1] where optimal and
>>> discard alignment is 64KiB (can actually be 4KiB now), max IO size for
>>> writes, discards, and write-zeroes is the object size * the stripe
>>> count.
>> Okay, I will have a look at it. If qemu issues a write, discard, 
>> write_zero greater than
>> obj_size  * stripe count will librbd split it internally or will the 
>> request fail?
> librbd will handle it as needed. My goal is really just to get the
> hints down the guest OS.
>> Regarding the alignment it seems that rbd_dev->opts->alloc_size is 
>> something that comes from the device
>> configuration and not from rbd? I don't have that information inside the 
>> Qemu RBD driver.
> librbd doesn't really have the information either. The 64KiB guess
> that krbd uses was a compromise since that was the default OSD
> allocation size for HDDs since Luminous. Starting with Pacific that
> default is going down to 4KiB.
 I will try to adjust these values as far as it is possible and makes sense.
 Is there a way to check the minimum supported OSD release in the backend 
 from librbd / librados?
>>> It's not a minimum -- RADOS will gladly access 1 byte writes as well.
>>> It's really just the optimal (performance and space-wise). Sadly,
>>> there is no realistic way to query this data from the backend.
>> So you would suggest to advertise an optimal transfer length of 64k and max 
>> transfer length of obj size * stripe count to the guest unless we have an 
>> API in the future to query these limits from the backend?
> I'll open a Ceph tracker ticket to expose these via the API in a future 
> release.


That would be good to have!


>
>> I would leave request alignment at 1 byte as otherwise Qemu will issue RMWs 
>> for all write requests that do not align. Everything that comes from a guest 
>> OS is very likely 4k aligned anyway.
> My goal is definitely not to have QEMU do any extra work for splitting
> or aligning IOs. I am really only trying to get hints passed down the
> guest via the virtio drivers. If there isn't the plumbing in QEMU for
> that yet, disregard.


From what I read from the code, QEMU emulates the Block Limits VPD Page for
virtio-scsi, but the limits there are not taken from the backend driver. They
can be passed as config to the virtio-scsi device.

However, Qemu uses all the Block Limit that can be found in 
include/block/block_int.h in the BlockLimits struct to generate optimal 
requests if it comes to block operations like mirroring, or zeroing of a whole

device etc. Some of the alignments (e.g. the request alignment) have direct 
influence and make Qemu split requests from the guest or even make RMW cycles.

If my understanding is incorrect please anyone correct me.


It would indeed be nice to have an option to propagate the settings from the 
backend driver into the Guest. But from my understanding this is not there yet.


So I would leave it as is. Drop the request_alignment = 512 (like in the patch)
and just advertise the cluster_size as obj_size. This is already in the rbd
driver today.

The cluster_size e.g. is used in any qemu-img convert operation to align read / 
write requests and size them 
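
For reference, a rough sketch of how such hints could be wired up in
qemu_rbd_refresh_limits(); rbd_stat() and the BlockLimits fields exist, but
the choice of using obj_size for all of the hints here is an assumption, not
a proposed patch:

    static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
    {
        BDRVRBDState *s = bs->opaque;
        rbd_image_info_t info;

        /* librbd does byte-granular AIO, so no mandatory alignment */
        bs->bl.request_alignment = 1;

        if (rbd_stat(s->image, &info, sizeof(info)) == 0) {
            /* hints only: optimal granularity, not hard requirements */
            bs->bl.opt_transfer = info.obj_size;
            bs->bl.max_transfer = info.obj_size;      /* ignoring striping */
            bs->bl.pdiscard_alignment = info.obj_size;
        }
    }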

Re: [PATCH 11/11] iotests/264: add backup-cancel test-case

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 05:21, Eric Blake wrote:

On 1/20/21 7:28 PM, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

Check that cancel doesn't wait for 10s of nbd reconnect timeout.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/264 | 21 ++---
  1 file changed, 14 insertions(+), 7 deletions(-)



Reviewed-by: Eric Blake 



With this patch applied as-is, the test fails for me:

--- /home/eblake/qemu/tests/qemu-iotests/264.out    2021-01-20 20:18:54.741366521 -0600
+++ /home/eblake/qemu/build/tests/qemu-iotests/264.out.bad  2021-01-20 20:20:37.242451172 -0600
@@ -1,5 +1,5 @@
-..
+...
  --
-Ran 2 tests
+Ran 3 tests

but with no obvious visibility into why.  Did you forget to check in
changes to 264.out?



Oops, definitely, I forgot to add the new test-case to the .out file.


--
Best regards,
Vladimir



Re: [PATCH 00/11] mirror: cancel nbd reconnect

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 05:14, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

The problem

Assume we have mirror job with nbd target node with enabled reconnect.
Connection failed. So, all current requests to nbd node are waiting for
nbd driver to reconnect. And they will wait for reconnect-delay time
specified in nbd blockdev options. This timeout may be long enough, for
example, we in Virtuozzo use 300 seconds by default.

So, if at this moment user tries to cancel the job, job will wait for
its in-flight requests to finish up to 300 seconds. From the user point
of view, cancelling the job takes a long time. Bad.

Solution

Let's just cancel "waiting for reconnect in in-flight request coroutines"
on mirror (and backup) cancel. Welcome the series below.
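
The NBD-side hook implementing this looks roughly as follows (a sketch; the
exact state and field names in block/nbd.c may differ):

    static void nbd_cancel_in_flight(BlockDriverState *bs)
    {
        BDRVNBDState *s = (BDRVNBDState *)bs->opaque;

        /* stop waiting out the reconnect-delay timer */
        reconnect_delay_timer_del(s);

        /* downgrade "wait for reconnect" so queued requests fail right away
         * instead of blocking for up to reconnect-delay seconds */
        if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
            qemu_co_queue_restart_all(&s->free_sema);
        }
    }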


Because of my question on 4/11, I did not queue the entire series yet.
But 6/11 was trivial enough to queue now.



Thanks!

--
Best regards,
Vladimir



Re: [PATCH 7/7] block/rbd: change request alignment to 1 byte

2021-01-21 Thread Jason Dillaman
On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven  wrote:
>
>
> > On 19.01.2021 at 15:20, Jason Dillaman wrote:
> >
> > On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven  wrote:
> >>> On 18.01.21 at 23:33, Jason Dillaman wrote:
> >>> On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven  wrote:
>  On 15.01.21 at 16:27, Jason Dillaman wrote:
> > On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven  wrote:
> >> On 14.01.21 at 20:19, Jason Dillaman wrote:
> >>> On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven  wrote:
>  since we implement byte interfaces and librbd supports aio on byte 
>  granularity we can lift
>  the 512 byte alignment.
>  Signed-off-by: Peter Lieven 
>  ---
>  block/rbd.c | 2 --
>  1 file changed, 2 deletions(-)
>  diff --git a/block/rbd.c b/block/rbd.c
>  index 27b4404adf..8673e8f553 100644
>  --- a/block/rbd.c
>  +++ b/block/rbd.c
>  @@ -223,8 +223,6 @@ done:
>  static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error 
>  **errp)
>  {
> BDRVRBDState *s = bs->opaque;
>  -/* XXX Does RBD support AIO on less than 512-byte alignment? */
>  -bs->bl.request_alignment = 512;
> >>> Just a suggestion, but perhaps improve discard alignment, max discard,
> >>> optimal alignment (if that's something QEMU handles internally) if not
> >>> overridden by the user.
> >> Qemu supports max_discard and discard_alignment. Is there a call to 
> >> get these limits
> >> from librbd?
> >> What do you mean by optimal_alignment? The object size?
> > krbd does a good job of initializing defaults [1] where optimal and
> > discard alignment is 64KiB (can actually be 4KiB now), max IO size for
> > writes, discards, and write-zeroes is the object size * the stripe
> > count.
>  Okay, I will have a look at it. If qemu issues a write, discard, 
>  write_zero greater than
>  obj_size  * stripe count will librbd split it internally or will the 
>  request fail?
> >>> librbd will handle it as needed. My goal is really just to get the
> >>> hints down the guest OS.
>  Regarding the alignment it seems that rbd_dev->opts->alloc_size is 
>  something that comes from the device
>  configuration and not from rbd? I don't have that information inside the 
>  Qemu RBD driver.
> >>> librbd doesn't really have the information either. The 64KiB guess
> >>> that krbd uses was a compromise since that was the default OSD
> >>> allocation size for HDDs since Luminous. Starting with Pacific that
> >>> default is going down to 4KiB.
> >> I will try to adjust these values as far as it is possible and makes sense.
> >> Is there a way to check the minimum supported OSD release in the backend 
> >> from librbd / librados?
> >
> > It's not a minimum -- RADOS will gladly access 1 byte writes as well.
> > It's really just the optimal (performance and space-wise). Sadly,
> > there is no realistic way to query this data from the backend.
>
> So you would suggest to advertise an optimal transfer length of 64k and max 
> transfer length of obj size * stripe count to the guest unless we have an API 
> in the future to query these limits from the backend?

I'll open a Ceph tracker ticket to expose these via the API in a future release.

> I would leave request alignment at 1 byte as otherwise Qemu will issue RMWs 
> for all write requests that do not align. Everything that comes from a guest 
> OS is very likely 4k aligned anyway.

My goal is definitely not to have QEMU do any extra work for splitting
or aligning IOs. I am really only trying to get hints passed down the
guest via the virtio drivers. If there isn't the plumbing in QEMU for
that yet, disregard.

> Peter
>
>


-- 
Jason




Re: [PATCH 03/11] block/raw-format: implement .bdrv_cancel_in_flight handler

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 02:15, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

We are going to cancel in-flight requests on mirror nbd target on job
cancel. Still nbd is often used not directly but as raw-format child.
So, add pass-through handler here.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/raw-format.c | 6 ++
  1 file changed, 6 insertions(+)


Should all filters do this automatically (or rather, should the block
layer do this automatically for all filters)?  But I understand that it
does NOT make sense for format drivers in general (cancelling a guest
request may still require metadata updates), where raw-format is the
exception, so doing it in raw-format instead of the block layer makes sense.
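
(The raw-format pass-through itself is presumably a one-liner along these
lines:)

    static void raw_cancel_in_flight(BlockDriverState *bs)
    {
        bdrv_cancel_in_flight(bs->file->bs);
    }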



Hmm.. probably not for backup_top filter. But of course OK for throttling




  BlockDriver bdrv_raw = {
  .format_name  = "raw",
  .instance_size= sizeof(BDRVRawState),
@@ -608,6 +613,7 @@ BlockDriver bdrv_raw = {
  .bdrv_has_zero_init   = _has_zero_init,
  .strong_runtime_opts  = raw_strong_runtime_opts,
  .mutable_opts = mutable_opts,
+.bdrv_cancel_in_flight = raw_cancel_in_flight,


A demonstration of why I don't like aligning the =.  But it's merely
cosmetic, so doesn't affect:

Reviewed-by: Eric Blake 




--
Best regards,
Vladimir



Re: [PATCH 09/11] iotests/264: add mirror-cancel test-case

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 04:26, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

Check that cancel doesn't wait for 10s of nbd reconnect timeout.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/264 | 38 ++
  tests/qemu-iotests/264.out |  4 ++--
  2 files changed, 32 insertions(+), 10 deletions(-)


  
+def test_mirror_cancel(self):

+# Mirror speed limit doesn't work well enough, it seems that mirror
+# will run many parallel requests anyway. MAX_IN_FLIGHT is 16 and
+# MAX_IO_BYTES is 1M in mirror.c, so let's use 20M disk.
+self.init_vm(20 * 1024 * 1024)
+self.start_job('blockdev-mirror')


Is this comment still accurate given recent work on the mirror filter?


Hmm, what do you mean? I missed it..


I'm fine taking the patch as-is and tweaking it with followups, though,
in order to make progress.


Good for me, of course




+
+result = self.vm.qmp('block-job-cancel', device='drive0')
+self.assert_qmp(result, 'return', {})
+
+start_t = time.time()
+self.vm.event_wait('BLOCK_JOB_CANCELLED')
+delta_t = time.time() - start_t
+self.assertTrue(delta_t < 2.0)


I hope this doesn't fail on CI platforms under heavy load.  It didn't
fail for me locally, but I hope we don't have to revisit it.  Is there
any way we can test this in a manner that is not as fragile?


Hmm, I don't know. We want to check that cancel is not as long as the
reconnect timeout.. If it fails, we'll adjust the constants :) And we have no
limit on it; we can use 1 hour for reconnect-timeout and 10 min for mirror to
cancel, for example (but probably something else may fail with such big
timeouts)



Reviewed-by: Eric Blake 




--
Best regards,
Vladimir



Re: [PATCH 07/11] iotests/264: move to python unittest

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 04:17, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

We are going to add more test cases, so use the library supporting test
cases.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/264 | 93 ++
  tests/qemu-iotests/264.out | 20 ++--
  2 files changed, 58 insertions(+), 55 deletions(-)




+++ b/tests/qemu-iotests/264.out
@@ -1,15 +1,5 @@
-Start NBD server
-{"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": "nbd", "reconnect-delay": 10, "server": {"path": 
"TEST_DIR/PID-nbd-sock", "type": "unix"}}, "node-name": "backup0"}}
-{"return": {}}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 1048576, "sync": "full", 
"target": "backup0"}}
-{"return": {}}
-Backup job is started
-Kill NBD server
-Backup job is still in progress
-{"execute": "block-job-set-speed", "arguments": {"device": "drive0", "speed": 
0}}
-{"return": {}}
-Start NBD server
-Backup completed: 5242880
-{"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
-{"return": {}}
-Kill NBD server
+.
+--
+Ran 1 tests
+
+OK


I find it a shame that the expected output no longer shows what was
executed.  But the test still passes, and if it makes it easier for you
to extend the test in a later patch, I won't stand in the way (this is
more an indication that by stripping the useful output, I'm no longer in
as decent a position to help debug if the test starts failing).



Still, what is executed is understandable from the test itself. And IMHO,
debugging python unittests is simpler: you get the stack and immediately see
what happens. With output-checking tests, you first have to work out which
statement corresponds to the wrong output. Not to mention that with unittests
I can simply run only one test-case (that's the reason why I think that tests
with several test-cases must be written as unittest tests). And debugging
output-checking tests with a lot of test-cases inside is always a pain for me.

Another benefit of unittest: on failure test immediately finishes. With 
output-checking tests, test continue to execute, and may produce unnecessary 
not-matching log, or hang, or anything else.

Another drawback of output-checking tests: they often test too many unrelated
things. Sometimes that's good: you can catch some unrelated bug :) But often
it's a pain: you have to modify test outputs when creating new features or
changing behaviour actually unrelated to what the test wants to test.

Python unittests are more difficult to write, as you have to understand what
exactly you want/should check, whereas with output-checking tests you can just
log everything. But in general I'm for python unittests.

Still, I sometimes think about supporting output for python-unittest based
tests (without losing the ability to execute test-cases separately; maybe a
.out file per test-case?); it may be a good compromise.


Reviewed-by: Eric Blake 




--
Best regards,
Vladimir



Re: [PATCH v6 3/7] qemu: add support for iOS host

2021-01-21 Thread Peter Maydell
On Tue, 5 Jan 2021 at 02:25, Joelle van Dyne  wrote:
>
> This introduces support for building for iOS hosts. When the correct Xcode
> toolchain is used, iOS host will be detected automatically.
>
> * block: disable features not supported by iOS sandbox
> * slirp: disable SMB features for iOS
> * osdep: disable system() calls for iOS

> +``ucontext`` support is broken on iOS. The implementation from 
> ``libucontext``
> +is used instead.

Just a note since it came up in another thread today, but
looking at libucontext its aarch64 backend doesn't handle
the floating point registers. I think if the *context
routines don't save/restore the callee-saves fp regs
(v8-v15, FPCR) then it's liable to result in tricky-to-track
down bugs where some kept-in-a-callee-saves-fp-register data
from a function further up the callstack gets corrupted, depending
on what the compiler happens to do. It would be good to work with
the libucontext maintainers to add that functionality.

thanks
-- PMM



Re: [PATCH 01/11] block: add new BlockDriver handler: bdrv_cancel_in_flight

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 01:27, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

It will be used to stop retrying NBD requests on mirror cancel.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  3 +++
  include/block/block_int.h |  9 +
  block/io.c| 11 +++
  3 files changed, 23 insertions(+)



How does this relate to the recent addition of the QMP yank command?


Hmm. Don't know.. Looking in spec:

# Currently implemented yank instances:
#  - nbd block device:
#Yanking it will shut down the connection to the nbd server without
#attempting to reconnect.

Looks like a close thing. But actually, I don't want to stop reconnecting 
process, but only cancel some requests which we don't want to wait for. After 
that, the nbd node may successfully reconnect and continue to work.
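
For context, the generic-layer entry point is presumably just a thin dispatch
into the driver hook, along these lines:

    void bdrv_cancel_in_flight(BlockDriverState *bs)
    {
        if (!bs || !bs->drv) {
            return;
        }

        if (bs->drv->bdrv_cancel_in_flight) {
            bs->drv->bdrv_cancel_in_flight(bs);
        }
    }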





+++ b/include/block/block_int.h
@@ -344,6 +344,15 @@ struct BlockDriver {
  bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
  int64_t *map, BlockDriverState **file);
  
+/*

+ * This informs the driver that we are not more interested in in-flight


that we are no longer interested in the result of in-flight requests, so


+ * requests results, so don't waste the time if possible.
+ *
+ * The example usage is to not wait for nbd target node reconnect timeout on
+ * job-cancel.


One example usage is to avoid waiting for an nbd target node reconnect
timeout during job-cancel.

Reviewed-by: Eric Blake 




--
Best regards,
Vladimir



Re: [PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()

2021-01-21 Thread Eric Blake
On 1/21/21 11:07 AM, Sergio Lopez wrote:
> Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
> bdrv_drain_all_begin().
> 
> Export drivers may have coroutines yielding at some point in the block
> layer, so we need to shut them down before draining the block layer,
> as otherwise they may get stuck blk_wait_while_drained().

stuck in

> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
> Signed-off-by: Sergio Lopez 
> ---
>  block.c| 1 -
>  softmmu/runstate.c | 9 +
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 

> @@ -783,6 +784,14 @@ void qemu_cleanup(void)
>   */
>  migration_shutdown();
>  
> +/*
> + * Close the exports before draining the block layer. The export
> + * drivers may have coroutines yielding on it, so we need to clean
> + * them up before the drain, as otherwise they may be get stuck in

s/be //

> + * blk_wait_while_drained().
> + */
> +blk_exp_close_all();
> +
>  /*
>   * We must cancel all block jobs while the block layer is drained,
>   * or cancelling will be affected by throttling and thus may block
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2021-01-21 Thread Eric Blake
On 1/21/21 11:06 AM, Sergio Lopez wrote:
> Some graphs may contain an indirect reference to the first BDS in the
> chain that can be reached while walking it bottom->up from one its

one of its

> children.
> 
> Doubling-processing of a BDS is especially problematic for the

Double-processing

> aio_notifiers, as they might attempt to work on both the old and the
> new AIO contexts.
> 
> To avoid this problem, add every child and parent to the ignore list
> before actually processing them.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Sergio Lopez 
> ---
>  block.c | 34 +++---
>  1 file changed, 27 insertions(+), 7 deletions(-)
> 
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 10/11] iotests: rewrite check into python

2021-01-21 Thread Eric Blake
On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> Just use classes introduced in previous three commits. Behavior
> difference is described in these three commits.
> 
> Drop group file, as it becomes unused.
> 
> Drop common.env: now check is in python, and for tests we use same
> python interpreter that runs the check itself. Use build environment
> PYTHON in check-block instead, to keep "make check" use the same
> python.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

> +++ b/tests/qemu-iotests/check
> @@ -1,7 +1,8 @@
> -#!/usr/bin/env bash
> +#!/usr/bin/env python3
>  #
> -# Copyright (C) 2009 Red Hat, Inc.
> -# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
> +# Configure environment and run group of tests in it.
> +#
> +# Copyright (c) 2020-2021 Virtuozzo International GmbH

Normally dropping old copyrights is suspicious, but as this is a
complete rewrite, it makes sense here.

> -exit
> +import os
> +import sys
> +import argparse
> +from findtests import TestFinder
> +from testenv import TestEnv
> +from testrunner import TestRunner
> +
> +
> +def make_argparser() -> argparse.ArgumentParser:
> +p = argparse.ArgumentParser(description="Test run options")
> +
> +p.add_argument('-n', '--dry-run', action='store_true',
> +   help='show me, do not run tests')
> +p.add_argument('-makecheck', action='store_true',
> +   help='pretty print output for make check')
> +
> +p.add_argument('-d', dest='debug', action='store_true', help='debug')
> +p.add_argument('-misalign', action='store_true',
> +   help='misalign memory allocations')
> +
> +g_env = p.add_argument_group('test environment options')
> +mg = g_env.add_mutually_exclusive_group()
> +# We don't set default for cachemode, as we need to distinguish dafult

default

> +# from user input later.
> +mg.add_argument('-nocache', dest='cachemode', action='store_const',
> +const='none', help='set cache mode "none" (O_DIRECT), '
> +'sets CACHEMODE environment variable')
> +mg.add_argument('-c', dest='cachemode',
> +help='sets CACHEMODE environment variable')
> +
> +g_env.add_argument('-i', dest='aiomode', default='threads',
> +   help='sets AIOMODE environment variable')
> +
> +p.set_defaults(imgfmt='raw', imgproto='file')
> +
> +format_list = ['raw', 'bochs', 'cloop', 'parallels', 'qcow', 'qcow2',
> +   'qed', 'vdi', 'vpc', 'vhdx', 'vmdk', 'luks', 'dmg']
> +g_fmt = p.add_argument_group(
> +'  image format options',
> +'The following options set the IMGFMT environment variable. '
> +'At most one choice is allowed, default is "raw"')
> +mg = g_fmt.add_mutually_exclusive_group()
> +for fmt in format_list:
> +mg.add_argument('-' + fmt, dest='imgfmt', action='store_const',
> +const=fmt, help=f'test {fmt}')
> +
> +protocol_list = ['file', 'rbd', 'sheepdoc', 'nbd', 'ssh', 'nfs',

sheepdog

> + 'fuse']
> +g_prt = p.add_argument_group(
> +'  image protocol options',
> +'The following options set the IMGPROTO environment variable. '
> +'At most one choice is allowed, default is "file"')
> +mg = g_prt.add_mutually_exclusive_group()
> +for prt in protocol_list:
> +mg.add_argument('-' + prt, dest='imgproto', action='store_const',
> +const=prt, help=f'test {prt}')
> +
> +g_bash = p.add_argument_group('bash tests options',
> +  'The following options are ignored by '
> +  'python tests.')
> +# TODO: make support for the following options in iotests.py
> +g_bash.add_argument('-o', dest='imgopts',
> +help='options to pass to qemu-img create/convert, '
> +'sets IMGOPTS environment variable')
> +g_bash.add_argument('-valgrind', dest='VALGRIND_QEMU',
> +action='store_const', const='y',
> +help='use valgrind, sets VALGRIND_QEMU environment '
> +'variable')
> +
> +g_sel = p.add_argument_group('test selecting options',
> + 'The following options specify test set '
> + 'to run.')
> +g_sel.add_argument('-g', '--groups', metavar='group1,...',
> +   help='include tests from these groups')
> +g_sel.add_argument('-x', '--exclude-groups', metavar='group1,...',
> +   help='exclude tests from these groups')
> +g_sel.add_argument('--start-from', metavar='TEST',
> +   help='Start from specified test: make sorted sequence 
> '
> +   'of tests as usual and then drop tests from the first 
> '
> +   'one to TEST (not inclusive). 

Re: [PATCH v7 09/11] iotests: add testrunner.py

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 20:02, Eric Blake wrote:

On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:

Add TestRunner class, which will run tests in a new python iotests
running framework.

There are some differences with current ./check behavior, most
significant are:
- Consider all tests self-executable, just run them, don't run python
   by hand.
- Elapsed time is cached in json file
- Elapsed time precision increased a bit
- use python difflib instead of "diff -w", to ignore spaces at line
   ends strip lines by hand. Do not ignore other spaces.


Awkward wording.  Maybe:

Instead of using "diff -w" which ignores all whitespace differences,
manually strip whitespace at line end then use python difflib, which no
longer ignores spacing mid-line



Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/testrunner.py | 344 +++
  1 file changed, 344 insertions(+)
  create mode 100644 tests/qemu-iotests/testrunner.py

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
new file mode 100644
index 00..92722cc68b
--- /dev/null
+++ b/tests/qemu-iotests/testrunner.py
@@ -0,0 +1,344 @@
+# Class for actual tests running.


for actually running tests

again, should this file be 755 with #! python?


In my latest thinking - no, it shouldn't. We can still implement the __main__
things later if needed.




+
+def file_diff(file1: str, file2: str) -> List[str]:
+with open(file1) as f1, open(file2) as f2:
+# We want to ignore spaces at line ends. There are a lot of mess about
+# it in iotests.
+# TODO: fix all tests to not produce extra spaces, fix all .out files
+# and use strict diff here!
+seq1 = [line.rstrip() for line in f1]
+seq2 = [line.rstrip() for line in f2]
+return list(difflib.unified_diff(seq1, seq2, file1, file2))


Offhand, do you have the list of tests where (actual/expected) output
has trailing whitespace and would fail without the .rstrip()?


No.. But it's simple to make one:
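
Something along these lines would do (an untested sketch, assuming it is run
from the tests/qemu-iotests directory):

import glob

# List expected-output files that have trailing whitespace on some line
for out in sorted(glob.glob('*.out') + glob.glob('tests/*.out')):
    with open(out, errors='replace') as f:
        if any(line.rstrip('\n') != line.rstrip() for line in f):
            print(out)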




+
+
+# We want to save current tty settings during test run,
+# since an aborting qemu call may leave things screwed up.
+@contextmanager
+def savetty() -> Iterator[None]:
+isterm = sys.stdin.isatty()
+if isterm:
+fd = sys.stdin.fileno()
+attr = termios.tcgetattr(0)


Isn't fd always going to be 0?  It looks odd to hard-code zero in the
very next line; either we should s/0/fd/ here, or...


Agreed, that's strange.




+
+try:
+yield
+finally:
+if isterm:
+termios.tcsetattr(fd, termios.TCSADRAIN, attr)


... s/fd/0/ here and drop fd altogether.


Either way is OK for me, I think; I'll do s/0/fd/:
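
With that, the fixed helper would look like this (just a sketch, imports
spelled out for completeness):

import sys
import termios
from contextlib import contextmanager
from typing import Iterator

# Save current tty settings during a test run, since an aborting qemu
# call may leave things screwed up.
@contextmanager
def savetty() -> Iterator[None]:
    isterm = sys.stdin.isatty()
    if isterm:
        fd = sys.stdin.fileno()
        attr = termios.tcgetattr(fd)  # was: termios.tcgetattr(0)

    try:
        yield
    finally:
        if isterm:
            termios.tcsetattr(fd, termios.TCSADRAIN, attr)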




+
+
+class LastElapsedTime(AbstractContextManager['LastElapsedTime']):
+""" Cache for elapsed time for tests, to show it during new test run
+
+Use get() in any time. But, if use update you should then call save(),
+or use update() inside with-block.


Grammar is hard, maybe:

It is safe to use get() at any time.  To use update(), you must either
use a with-block or first use save().


OK, thanks





+def test_print_one_line(self, test: str, starttime: str,



+
+if status == 'pass':
+col = '\033[32m'
+elif status == 'fail':
+col = '\033[1m\033[31m'
+elif status == 'not run':
+col = '\033[33m'


This hard-codes the use of ANSI escape sequences without first checking
that we are writing to a terminal.  Is that wise?  Should we have this
be tunable by a tri-state command-line option, similar to ls --color?
(--color=auto is default, and bases decision on istty(), --color=off
turns color off even for a terminal, --color=on uses color even when
outputting to a pipe, which can be useful depending on the other end of
the pipeline...)


Hmm, yes. It's preexisting in the old bash check script, I think. I can add a
separate patch for it; roughly:
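
(An untested sketch; the option name and plumbing are only a guess at what
such a patch could look like, following the 'ls --color' tri-state:)

import argparse
import sys

def should_use_color(mode: str) -> bool:
    """Decide whether to emit ANSI escapes, like 'ls --color'."""
    if mode == 'on':
        return True
    if mode == 'off':
        return False
    return sys.stdout.isatty()  # 'auto': color only when on a terminal

p = argparse.ArgumentParser()
p.add_argument('--color', choices=['auto', 'on', 'off'], default='auto',
               help='use terminal colors (default: auto)')
args = p.parse_args()
use_color = should_use_color(args.color)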





+with f_test.open() as f:
+try:
+if f.readline() == '#!/usr/bin/env python3':
+args.insert(0, self.env.python)
+except UnicodeDecodeError:  # binary test? for future.
+pass


Is pass the right action here?  Or will it silently skip a test file
with encoding errors?


No, we'll not skip it. Here we just failed to recognize a python test, so it
will be executed as self-executable. So, if there are real problems, we'll see
them when we try to execute the file.



Again, I'm not comfortable enough to give a full review of the python,
but it looks fairly similar to the existing shell code, and with the
series applied, things still work.  So I can offer

Tested-by: Eric Blake 




Thanks a lot for testing and reviewing so many of my patches!!

--
Best regards,
Vladimir



[PATCH v3 2/2] block: move blk_exp_close_all() to qemu_cleanup()

2021-01-21 Thread Sergio Lopez
Move blk_exp_close_all() from bdrv_close() to qemu_cleanup(), before
bdrv_drain_all_begin().

Export drivers may have coroutines yielding at some point in the block
layer, so we need to shut them down before draining the block layer,
as otherwise they may get stuck blk_wait_while_drained().

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
Signed-off-by: Sergio Lopez 
---
 block.c            | 1 -
 softmmu/runstate.c | 9 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 3da99312db..9682c82fa8 100644
--- a/block.c
+++ b/block.c
@@ -4435,7 +4435,6 @@ static void bdrv_close(BlockDriverState *bs)
 void bdrv_close_all(void)
 {
 assert(job_next(NULL) == NULL);
-blk_exp_close_all();
 
 /* Drop references from requests still in flight, such as canceled block
  * jobs whose AIO context has not been polled yet */
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 6177693a30..ac4b2e2540 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "audio/audio.h"
 #include "block/block.h"
+#include "block/export.h"
 #include "chardev/char.h"
 #include "crypto/cipher.h"
 #include "crypto/init.h"
@@ -783,6 +784,14 @@ void qemu_cleanup(void)
  */
 migration_shutdown();
 
+/*
+ * Close the exports before draining the block layer. The export
+ * drivers may have coroutines yielding on it, so we need to clean
+ * them up before the drain, as otherwise they may be get stuck in
+ * blk_wait_while_drained().
+ */
+blk_exp_close_all();
+
 /*
  * We must cancel all block jobs while the block layer is drained,
  * or cancelling will be affected by throttling and thus may block
-- 
2.26.2




[PATCH v3 0/2] nbd/server: Quiesce coroutines on context switch

2021-01-21 Thread Sergio Lopez
This series allows the NBD server to properly switch between AIO contexts,
having quiesced recv_coroutine and send_coroutine before doing the transition.

We need this because we send back devices running in IO Thread owned contexts
to the main context when stopping the data plane, something that can happen
multiple times during the lifetime of a VM (usually during the boot sequence or
on a reboot), and we drag the NBD server of the corresponding export with it.

While there, also fix a problem caused by a cross-dependency between
closing the export's client connections and draining the block
layer. The visible effect of this problem was QEMU getting hung when
the guest requests a power off while there's an active NBD client.

v3:
 - Drop already merged "block: Honor blk_set_aio_context() context
 requirements" and "nbd/server: Quiesce coroutines on context switch"
 - Change the strategy for avoiding processing BDS twice to adding
 every child and parent to the ignore list in advance before
 processing them. (Kevin Wolf)
 - Replace "nbd/server: Quiesce coroutines on context switch" with
 "block: move blk_exp_close_all() to qemu_cleanup()"

v2:
 - Replace "virtio-blk: Acquire context while switching them on
 dataplane start" with "block: Honor blk_set_aio_context() context
 requirements" (Kevin Wolf)
 - Add "block: Avoid processing BDS twice in
 bdrv_set_aio_context_ignore()"
 - Add "block: Close block exports in two steps"
 - Rename nbd_read_eof() to nbd_server_read_eof() (Eric Blake)
 - Fix double space and typo in comment. (Eric Blake)

Sergio Lopez (2):
  block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
  block: move blk_exp_close_all() to qemu_cleanup()

 block.c            | 35 +++
 softmmu/runstate.c |  9 +
 2 files changed, 36 insertions(+), 8 deletions(-)

-- 
2.26.2





Re: [PATCH v7 08/11] iotests: add testenv.py

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 19:48, Eric Blake wrote:

On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:

Add TestEnv class, which will handle test environment in a new python
iotests running framework.

Difference with current ./check interface:
- -v (verbose) option dropped, as it is unused

- -xdiff option is dropped, until somebody complains that it is needed
- same for -n option
- same for looking for binaries in $build_iotests directory.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/testenv.py | 255 ++
  1 file changed, 255 insertions(+)
  create mode 100644 tests/qemu-iotests/testenv.py

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
new file mode 100644
index 00..0fe5076088
--- /dev/null
+++ b/tests/qemu-iotests/testenv.py
@@ -0,0 +1,255 @@
+# TestEnv class to manage test environment variables.


Should this have permissions 755 and a #! python line?


I realized that keeping the cmdline interface inside these three classes
doesn't help and only makes things more difficult. So that's the change in v7:
the whole cmdline interface is now in the check script, and these three .py
files just define normal classes. So, I dropped the __main__ things together
with the executable permissions.




+def get_default_machine(qemu_prog: str) -> str:
+outp = subprocess.run([qemu_prog, '-machine', 'help'], check=True,
+  text=True, stdout=subprocess.PIPE).stdout
+
+machines = outp.split('\n')
+default_machine = next(m for m in machines if m.endswith(' (default)'))
+default_machine = default_machine.split(' ', 1)[0]
+
+alias_suf = ' (alias of {})'.format(default_machine)
+alias = next((m for m in machines if m.endswith(alias_suf)), None)
+if alias is not None:
+default_machine = alias.split(' ', 1)[0]
+
+return default_machine


I have no idea if this is the most efficient and idiomatic way for
python to extract the default machine out of the output list, but it
seems to work.


+
+
+class TestEnv(AbstractContextManager['TestEnv']):




+self.qsd_prog = os.getenv('QSD_PROG', root('storage-daemon',
+   'qemu-storage-daemon'))
+
+for b in [self.qemu_img_prog, self.qemu_io_prog, self.qemu_nbd_prog,
+  self.qemu_prog, self.qsd_prog]:
+if not os.path.exists(b):
+sys.exit('Not such file: ' + b)


No such file



+if self.imgfmt == 'vmkd':


vmdk (which means we need to test that './check -vmdk' still works...)

I can see a rough correspondence to the existing shell code in ./check,
and with the entire series applied, things still work (other than vmdk,
which should work with the typo fix).  I'm not comfortable enough to
offer my full review of the python code, but I can give:

Tested-by: Eric Blake 




--
Best regards,
Vladimir



[PATCH v3 1/2] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()

2021-01-21 Thread Sergio Lopez
Some graphs may contain an indirect reference to the first BDS in the
chain that can be reached while walking it bottom->up from one its
children.

Doubling-processing of a BDS is especially problematic for the
aio_notifiers, as they might attempt to work on both the old and the
new AIO contexts.

To avoid this problem, add every child and parent to the ignore list
before actually processing them.

Suggested-by: Kevin Wolf 
Signed-off-by: Sergio Lopez 
---
 block.c | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 8b9d457546..3da99312db 100644
--- a/block.c
+++ b/block.c
@@ -6414,7 +6414,10 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
  AioContext *new_context, GSList **ignore)
 {
 AioContext *old_context = bdrv_get_aio_context(bs);
-BdrvChild *child;
+GSList *children_to_process = NULL;
+GSList *parents_to_process = NULL;
+GSList *entry;
+BdrvChild *child, *parent;
 
 g_assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
@@ -6429,16 +6432,33 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
 continue;
 }
 *ignore = g_slist_prepend(*ignore, child);
-bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
+children_to_process = g_slist_prepend(children_to_process, child);
 }
-QLIST_FOREACH(child, &bs->parents, next_parent) {
-if (g_slist_find(*ignore, child)) {
+
+QLIST_FOREACH(parent, &bs->parents, next_parent) {
+if (g_slist_find(*ignore, parent)) {
 continue;
 }
-assert(child->klass->set_aio_ctx);
-*ignore = g_slist_prepend(*ignore, child);
-child->klass->set_aio_ctx(child, new_context, ignore);
+*ignore = g_slist_prepend(*ignore, parent);
+parents_to_process = g_slist_prepend(parents_to_process, parent);
+}
+
+for (entry = children_to_process;
+ entry != NULL;
+ entry = g_slist_next(entry)) {
+child = entry->data;
+bdrv_set_aio_context_ignore(child->bs, new_context, ignore);
+}
+g_slist_free(children_to_process);
+
+for (entry = parents_to_process;
+ entry != NULL;
+ entry = g_slist_next(entry)) {
+parent = entry->data;
+assert(parent->klass->set_aio_ctx);
+parent->klass->set_aio_ctx(parent, new_context, ignore);
 }
+g_slist_free(parents_to_process);
 
 bdrv_detach_aio_context(bs);
 
-- 
2.26.2




Re: [PATCH v7 09/11] iotests: add testrunner.py

2021-01-21 Thread Eric Blake
On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add TestRunner class, which will run tests in a new python iotests
> running framework.
> 
> There are some differences with current ./check behavior, most
> significant are:
> - Consider all tests self-executable, just run them, don't run python
>   by hand.
> - Elapsed time is cached in json file
> - Elapsed time precision increased a bit
> - use python difflib instead of "diff -w", to ignore spaces at line
>   ends strip lines by hand. Do not ignore other spaces.

Awkward wording.  Maybe:

Instead of using "diff -w" which ignores all whitespace differences,
manually strip whitespace at line end then use python difflib, which no
longer ignores spacing mid-line

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/testrunner.py | 344 +++
>  1 file changed, 344 insertions(+)
>  create mode 100644 tests/qemu-iotests/testrunner.py
> 
> diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
> new file mode 100644
> index 00..92722cc68b
> --- /dev/null
> +++ b/tests/qemu-iotests/testrunner.py
> @@ -0,0 +1,344 @@
> +# Class for actual tests running.

for actually running tests

again, should this file be 755 with #! python?

> +
> +def file_diff(file1: str, file2: str) -> List[str]:
> +with open(file1) as f1, open(file2) as f2:
> +# We want to ignore spaces at line ends. There are a lot of mess about
> +# it in iotests.
> +# TODO: fix all tests to not produce extra spaces, fix all .out files
> +# and use strict diff here!
> +seq1 = [line.rstrip() for line in f1]
> +seq2 = [line.rstrip() for line in f2]
> +return list(difflib.unified_diff(seq1, seq2, file1, file2))

Offhand, do you have the list of tests where (actual/expected) output
has trailing whitespace and would fail without the .rstrip()?

> +
> +
> +# We want to save current tty settings during test run,
> +# since an aborting qemu call may leave things screwed up.
> +@contextmanager
> +def savetty() -> Iterator[None]:
> +isterm = sys.stdin.isatty()
> +if isterm:
> +fd = sys.stdin.fileno()
> +attr = termios.tcgetattr(0)

Isn't fd always going to be 0?  It looks odd to hard-code zero in the
very next line; either we should s/0/fd/ here, or...

> +
> +try:
> +yield
> +finally:
> +if isterm:
> +termios.tcsetattr(fd, termios.TCSADRAIN, attr)

... s/fd/0/ here and drop fd altogether.

> +
> +
> +class LastElapsedTime(AbstractContextManager['LastElapsedTime']):
> +""" Cache for elapsed time for tests, to show it during new test run
> +
> +Use get() in any time. But, if use update you should then call save(),
> +or use update() inside with-block.

Grammar is hard, maybe:

It is safe to use get() at any time.  To use update(), you must either
use a with-block or first use save().


> +def test_print_one_line(self, test: str, starttime: str,

> +
> +if status == 'pass':
> +col = '\033[32m'
> +elif status == 'fail':
> +col = '\033[1m\033[31m'
> +elif status == 'not run':
> +col = '\033[33m'

This hard-codes the use of ANSI escape sequences without first checking
that we are writing to a terminal.  Is that wise?  Should we have this
be tunable by a tri-state command-line option, similar to ls --color?
(--color=auto is default, and bases decision on istty(), --color=off
turns color off even for a terminal, --color=on uses color even when
outputting to a pipe, which can be useful depending on the other end of
the pipeline...)


> +with f_test.open() as f:
> +try:
> +if f.readline() == '#!/usr/bin/env python3':
> +args.insert(0, self.env.python)
> +except UnicodeDecodeError:  # binary test? for future.
> +pass

Is pass the right action here?  Or will it silently skip a test file
with encoding errors?

Again, I'm not comfortable enough to give a full review of the python,
but it looks fairly similar to the existing shell code, and with the
series applied, things still work.  So I can offer

Tested-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 07/11] iotests: add findtests.py

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 19:21, Eric Blake wrote:

On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:

Add python script with new logic of searching for tests:




The file findtests.py is self-executable and may be used for debugging
purposes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---



diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
new file mode 100644
index 00..065ec13125
--- /dev/null
+++ b/tests/qemu-iotests/findtests.py
@@ -0,0 +1,159 @@
+# TestFinder class, define set of tests to run.


Per the commit message, the file is self-executable; doesn't that mean
it should have 755 permissions instead of 644, and have a #! python line?


+return sequence



I guess it would also need a __main__ blurb at the end?  Or maybe I'm
misunderstanding the intent of the commit message line.



These things were dropped recently and I forgot to update the commit message.

So, what should be done is just to drop this sentence from the commit message.


--
Best regards,
Vladimir



Re: [PATCH v7 08/11] iotests: add testenv.py

2021-01-21 Thread Eric Blake
On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add TestEnv class, which will handle test environment in a new python
> iotests running framework.
> 
> Difference with current ./check interface:
> - -v (verbose) option dropped, as it is unused
> 
> - -xdiff option is dropped, until somebody complains that it is needed
> - same for -n option
> - same for looking for binaries in $build_iotests directory.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/testenv.py | 255 ++
>  1 file changed, 255 insertions(+)
>  create mode 100644 tests/qemu-iotests/testenv.py
> 
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> new file mode 100644
> index 00..0fe5076088
> --- /dev/null
> +++ b/tests/qemu-iotests/testenv.py
> @@ -0,0 +1,255 @@
> +# TestEnv class to manage test environment variables.

Should this have permissions 755 and a #! python line?

> +def get_default_machine(qemu_prog: str) -> str:
> +outp = subprocess.run([qemu_prog, '-machine', 'help'], check=True,
> +  text=True, stdout=subprocess.PIPE).stdout
> +
> +machines = outp.split('\n')
> +default_machine = next(m for m in machines if m.endswith(' (default)'))
> +default_machine = default_machine.split(' ', 1)[0]
> +
> +alias_suf = ' (alias of {})'.format(default_machine)
> +alias = next((m for m in machines if m.endswith(alias_suf)), None)
> +if alias is not None:
> +default_machine = alias.split(' ', 1)[0]
> +
> +return default_machine

I have no idea if this is the most efficient and idiomatic way for
python to extract the default machine out of the output list, but it
seems to work.

> +
> +
> +class TestEnv(AbstractContextManager['TestEnv']):


> +self.qsd_prog = os.getenv('QSD_PROG', root('storage-daemon',
> +   'qemu-storage-daemon'))
> +
> +for b in [self.qemu_img_prog, self.qemu_io_prog, self.qemu_nbd_prog,
> +  self.qemu_prog, self.qsd_prog]:
> +if not os.path.exists(b):
> +sys.exit('Not such file: ' + b)

No such file


> +if self.imgfmt == 'vmkd':

vmdk (which means we need to test that './check -vmdk' still works...)

I can see a rough correspondence to the existing shell code in ./check,
and with the entire series applied, things still work (other than vmdk,
which should work with the typo fix).  I'm not comfortable enough to
offer my full review of the python code, but I can give:

Tested-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 07/11] iotests: add findtests.py

2021-01-21 Thread Eric Blake
On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add python script with new logic of searching for tests:
> 
> Current ./check behavior:
>  - tests are named [0-9][0-9][0-9]
>  - tests must be registered in group file (even if test doesn't belong
>to any group, like 142)
> 
> Behavior of findtests.py:
>  - group file is dropped
>  - tests are all files in tests/ subdirectory (except for .out files),
>so it's not needed more to "register the test", just create it with
>appropriate name in tests/ subdirectory. Old names like
>[0-9][0-9][0-9] (in root iotests directory) are supported too, but
>not recommended for new tests
>  - groups are parsed from '# group: ' line inside test files
>  - optional file group.local may be used to define some additional
>groups for downstreams
>  - 'disabled' group is used to temporary disable tests. So instead of
>commenting tests in old 'group' file you now can add them to
>disabled group with help of 'group.local' file
>  - selecting test ranges like 5-15 are not supported more
>(to support restarting failed ./check command from the middle of the
> process, new argument is added: --start-from)
> 
> Benefits:
>  - no rebase conflicts in group file on patch porting from branch to
>branch
>  - no conflicts in upstream, when different series want to occupy same
>test number
>  - meaningful names for test files
>For example, with digital number, when some person wants to add some
>test about block-stream, he most probably will just create a new
>test. But if there would be test-block-stream test already, he will
>at first look at it and may be just add a test-case into it.
>And anyway meaningful names are better.
> 
> This commit don't update check behavior (which will be done in further

doesn't

> commit), still, the documentation changed like new behavior is already
> here.  Let's live with this small inconsistency for the following few
> commits, until final change.
> 
> The file findtests.py is self-executable and may be used for debugging
> purposes.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/devel/testing.rst  |  50 +-
>  tests/qemu-iotests/findtests.py | 159 
>  2 files changed, 208 insertions(+), 1 deletion(-)
>  create mode 100644 tests/qemu-iotests/findtests.py
> 

> +++ b/tests/qemu-iotests/findtests.py

> +class TestFinder:
> +def __init__(self, test_dir: Optional[str] = None) -> None:
> +self.groups = defaultdict(set)
> +
> +with chdir(test_dir):
> +self.all_tests = glob.glob('[0-9][0-9][0-9]')
> +self.all_tests += [f for f in glob.iglob('tests/*')
> +   if not f.endswith('.out') and
> +   os.path.isfile(f + '.out')]

Interesting that 'NNN' is a test even if 'NNN.out' is not present, but
'tests/NNN' is not.  Not sure if it is worth tweaking, though.
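
(If the symmetry were wanted, a tweak along these lines would probably do --
untested, mirroring the tests/ branch of the quoted code:)

self.all_tests = [f for f in glob.glob('[0-9][0-9][0-9]')
                  if os.path.isfile(f + '.out')]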


> +def parse_test_name(self, name: str) -> str:
> +if '/' in name:
> +raise ValueError('Paths are unsupported for test selecting, '

selection

> + f'requiring "{name}" is wrong')
> +
> +if re.fullmatch(r'\d+', name):
> +# Numbered tests are old naming convetion. We should convert them

convention

> +# to three-digit-length, like 1 --> 001.
> +name = f'{int(name):03}'
> +else:
> +# Named tests all should be in tests/ subdirectory
> +name = os.path.join('tests', name)
> +
> +if name not in self.all_tests:
> +raise ValueError(f'Test "{name}" is not found')
> +
> +return name
> +
> +def find_tests(self, groups: Optional[List[str]] = None,
> +   exclude_groups: Optional[List[str]] = None,
> +   tests: Optional[List[str]] = None,
> +   start_from: Optional[str] = None) -> List[str]:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 07/11] iotests: add findtests.py

2021-01-21 Thread Eric Blake
On 1/16/21 7:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add python script with new logic of searching for tests:
> 

> The file findtests.py is self-executable and may be used for debugging
> purposes.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---

> diff --git a/tests/qemu-iotests/findtests.py b/tests/qemu-iotests/findtests.py
> new file mode 100644
> index 00..065ec13125
> --- /dev/null
> +++ b/tests/qemu-iotests/findtests.py
> @@ -0,0 +1,159 @@
> +# TestFinder class, define set of tests to run.

Per the commit message, the file is self-executable; doesn't that mean
it should have 755 permissions instead of 644, and have a #! python line?

> +return sequence
> 

I guess it would also need a __main__ blurb at the end?  Or maybe I'm
misunderstanding the intent of the commit message line.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 00/11] Rework iotests/check

2021-01-21 Thread Paolo Bonzini

On 16/01/21 14:44, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

This series has 3 goals:

  - get rid of group file (to forget about rebase and in-list conflicts)
  - introduce human-readable names for tests
  - rewrite check into python


Very nice!

I have wondered for some time if we should rewrite check as a wrapper 
around avocado.  Your effort certainly goes in the right direction, 
since it at least uses the same programming language and introduces 
abstractions that can be easily turned into avocado wrappers.


Paolo


v7:
   - fix wording and grammar
   - satisfy python linters
   - move argv interfaces all into one in new check script
   - support '-n' == '--dry-run' option
   - update check-block to run check with correct PYTHON

  findtests: - stop parsing test file after first '# group: ' line
 - use parse_test_name in add_group_file()
 - make new list instead of modifying parameter exclude_groups

  testenv: - fix machine_map
   - fix self.python

  testrunner: use env.python to run python tests

Vladimir Sementsov-Ogievskiy (11):
   iotests/277: use dot slash for nbd-fault-injector.py running
   iotests/303: use dot slash for qcow2.py running
   iotests: fix some whitespaces in test output files
   iotests: make tests executable
   iotests/294: add shebang line
   iotests: define group in each iotest
   iotests: add findtests.py
   iotests: add testenv.py
   iotests: add testrunner.py
   iotests: rewrite check into python
   iotests: rename and move 169 and 199 tests

  docs/devel/testing.rst |  50 +-
  Makefile               |   1 -
  tests/check-block.sh   |   2 +-
  tests/qemu-iotests/001 |   1 +
  tests/qemu-iotests/002 |   1 +
  tests/qemu-iotests/003 |   1 +
  tests/qemu-iotests/004 |   1 +
  tests/qemu-iotests/005 |   1 +
  tests/qemu-iotests/007 |   1 +
  tests/qemu-iotests/008 |   1 +
  tests/qemu-iotests/009 |   1 +
  tests/qemu-iotests/010 |   1 +
  tests/qemu-iotests/011 |   1 +
  tests/qemu-iotests/012 |   1 +
  tests/qemu-iotests/013 |   1 +
  tests/qemu-iotests/014 |   1 +
  tests/qemu-iotests/015 |   1 +
  tests/qemu-iotests/017 |   1 +
  tests/qemu-iotests/018 |   1 +
  tests/qemu-iotests/019 |   1 +
  tests/qemu-iotests/020 |   1 +
  tests/qemu-iotests/021 |   1 +
  tests/qemu-iotests/022 |   1 +
  tests/qemu-iotests/023 |   1 +
  tests/qemu-iotests/024 |   1 +
  tests/qemu-iotests/025 |   1 +
  tests/qemu-iotests/026 |   1 +
  tests/qemu-iotests/027 |   1 +
  tests/qemu-iotests/028 |   1 +
  tests/qemu-iotests/029 |   1 +
  tests/qemu-iotests/030 |   1 +
  tests/qemu-iotests/031 |   1 +
  tests/qemu-iotests/032 |   1 +
  tests/qemu-iotests/033 |   1 +
  tests/qemu-iotests/034 |   1 +
  tests/qemu-iotests/035 |   1 +
  tests/qemu-iotests/036 |   1 +
  tests/qemu-iotests/037 |   1 +
  tests/qemu-iotests/038 |   1 +
  tests/qemu-iotests/039 |   1 +
  tests/qemu-iotests/040 |   1 +
  tests/qemu-iotests/041 |   1 +
  tests/qemu-iotests/042 |   1 +
  tests/qemu-iotests/043 |   1 +
  tests/qemu-iotests/044 |   1 +
  tests/qemu-iotests/045 |   1 +
  tests/qemu-iotests/046 |   1 +
  tests/qemu-iotests/047 |   1 +
  tests/qemu-iotests/048 |   1 +
  tests/qemu-iotests/049 |   1 +
  tests/qemu-iotests/050 |   1 +
  tests/qemu-iotests/051 |   1 +
  tests/qemu-iotests/052 |   1 +
  tests/qemu-iotests/053 |   1 +
  tests/qemu-iotests/054 |   1 +
  tests/qemu-iotests/055 |   1 +
  tests/qemu-iotests/056 |   1 +
  tests/qemu-iotests/057 |   1 +
  tests/qemu-iotests/058 |   1 +
  tests/qemu-iotests/059 |   1 +
  tests/qemu-iotests/060

Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

2021-01-21 Thread Francisco Iglesias
Hi Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>  wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > >  wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > >  wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng 
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how 
> > > > > > > > > > > many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a 
> > > > > > > > > > > command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte 
> > > > > > > > > > > address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are 
> > > > > > > > > > > required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to 
> > > > > > > > > > > be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit 
> > > > > > > > > > > is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the 
> > > > > > > > > > > model has
> > > > > > > > > > > been using the number of dummy cycles for 
> > > > > > > > > > > s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to 
> > > > > > > > > > > bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the 
> > > > > > > > > > > Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula 
> > > > > > > > > > > (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor I must assume that 
> > > > > > > > > > above solution was
> > > > > > > > > > considered but not chosen by the developers due to it is 
> > > > > > > > > > inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a 
> > > > > > > > > > multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to 
> > > > > > > > > > generate 7 the error
> > > > > > > > > > wouldn't be caught and the controller will still be 
> > > > > > > > > > considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor 
> > > > > > > > > > of keeping it, this
> > > > > > > > > > also because the detail is already in use for catching 
> > > > > > > > > > exactly above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue from the commit message that my proposed 
> > > > > > > > > solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models 
> > > > > > > > > supporting
> > > > > > > > > software generation should have been found out seriously 
> > > > > > > > > broken long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for 
> > > > > > > > commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other 
> > > > > > > > commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the 
> > > > > > > case
> > > > > > > 1 controller, it's nothing special from the controller 
> > > > > > > perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and 
> > > > > > > that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either 
> > > > > > > be
> > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For 
> > > > > > that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary 

Re: [PATCH v2 3/3] hw/usb/dev-uas: Report command additional adb length as unsupported

2021-01-21 Thread Philippe Mathieu-Daudé
On 1/21/21 12:14 PM, Gerd Hoffmann wrote:
> On Wed, Jan 20, 2021 at 04:35:22PM +0100, Philippe Mathieu-Daudé wrote:
>> We are not ready to handle additional CDB data.
>>
>> If a guest sends a packet with such additional data,
>> report the command parameter as not supported.
>>
>> Specify a size (of 1 byte) for the add_cdb member we
>> are not using, to fix the following warning:
>>
>>   usb/dev-uas.c:157:31: error: field 'status' with variable sized type 'uas_iu' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
>>   uas_iu    status;
>>             ^
>>
>> Reported-by: Ed Maste 
>> Reported-by: Daniele Buono 
>> Reported-by: Han Han 
>> Reviewed-by: Eric Blake 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Cc: Marc-André Lureau 
>> Cc: Paolo Bonzini 
>> Cc: Gustavo A. R. Silva 
>>
>> v2: include Eric feedbacks
> 
> Queued 2+3, fixup #2 conflicts due to dropping #1.

Thank you!




Re: [PATCH v2 3/3] hw/usb/dev-uas: Report command additional adb length as unsupported

2021-01-21 Thread Gerd Hoffmann
On Wed, Jan 20, 2021 at 04:35:22PM +0100, Philippe Mathieu-Daudé wrote:
> We are not ready to handle additional CDB data.
> 
> If a guest sends a packet with such additional data,
> report the command parameter as not supported.
> 
> Specify a size (of 1 byte) for the add_cdb member we
> are not using, to fix the following warning:
> 
>   usb/dev-uas.c:157:31: error: field 'status' with variable sized type 'uas_iu' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
>   uas_iu    status;
>             ^
> 
> Reported-by: Ed Maste 
> Reported-by: Daniele Buono 
> Reported-by: Han Han 
> Reviewed-by: Eric Blake 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Marc-André Lureau 
> Cc: Paolo Bonzini 
> Cc: Gustavo A. R. Silva 
> 
> v2: include Eric feedbacks

Queued 2+3, fixup #2 conflicts due to dropping #1.

thanks,
  Gerd




Re: [RFC PATCH 0/2] Allow changing bs->file on reopen

2021-01-21 Thread Kevin Wolf
On 18.01.2021 at 11:22, Vladimir Sementsov-Ogievskiy wrote:
> 15.01.2021 16:02, Alberto Garcia wrote:
> > Hi,
> > 
> > during the past months we talked about making x-blockdev-reopen stable
> > API, and one of the missing things was having support for changing
> > bs->file. See here for the discusssion (I can't find the message from
> > Kashyap that started the thread in the web archives):
> > 
> > https://lists.gnu.org/archive/html/qemu-block/2020-10/msg00922.html
> > 
> > I was testing this and one of the problems that I found was that
> > removing a filter node using this command is tricky because of the
> > permission system, see here for details:
> > 
> > https://lists.gnu.org/archive/html/qemu-block/2020-12/msg00092.html
> > 
> > The good news is that Vladimir posted a set of patches that changes
> > the way that permissions are updated on reopen:
> > 
> > https://lists.gnu.org/archive/html/qemu-block/2020-11/msg00745.html
> > 
> > I was testing if this would be useful to solve the problem that I
> > mentioned earlier and it seems to be the case so I wrote a patch to
> > add support for changing bs->file, along with a couple of test cases.
> > 
> > This is still an RFC but you can see the idea.
> 
> Good idea, and I'm glad to see that my patches help :)
> 
> Hmm, still, removing a filter which wants to unshare WRITE even when it
> doesn't have any parents will be a problem anyway, so we'll need a new
> command to drop a filter with logic like that of bdrv_drop_filter in my
> series.
> 
> Or, we can introduce multiple reopen.. So that x-blockdev-reopen will
> take a list of BlockdevOptions, and do all modifications in one
> transaction. Then we'll be able to drop a filter by a transactional update
> of the top node's child and removal of the filter's child link.

Internally, we already have reopen queues anyway, so it would make sense
to me to expose them externally and take a list of BlockdevOptions.
This way we should be able to do even complex changes of the graph where
adding some edges requires the removal of other edges in a single atomic
operation.
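
Hypothetically, from an iotests-style QMP client that could end up looking
something like this (the list-valued 'options' argument is the proposed
interface, not something x-blockdev-reopen accepts today):

# One atomic transaction replacing two edges at once
result = vm.qmp('x-blockdev-reopen', options=[
    {'driver': 'qcow2', 'node-name': 'top', 'file': 'base-file'},
    {'driver': 'file', 'node-name': 'base-file', 'filename': 'base.img'},
])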

Kevin




Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

2021-01-21 Thread Francisco Iglesias
Dear Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>  wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > >  wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > >  wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng 
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how 
> > > > > > > > > > > many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a 
> > > > > > > > > > > command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte 
> > > > > > > > > > > address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are 
> > > > > > > > > > > required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to 
> > > > > > > > > > > be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit 
> > > > > > > > > > > is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the 
> > > > > > > > > > > model has
> > > > > > > > > > > been using the number of dummy cycles for 
> > > > > > > > > > > s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to 
> > > > > > > > > > > bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the 
> > > > > > > > > > > Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula 
> > > > > > > > > > > (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor I must assume that 
> > > > > > > > > > above solution was
> > > > > > > > > > considered but not chosen by the developers due to it is 
> > > > > > > > > > inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a 
> > > > > > > > > > multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to 
> > > > > > > > > > generate 7 the error
> > > > > > > > > > wouldn't be caught and the controller will still be 
> > > > > > > > > > considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor 
> > > > > > > > > > of keeping it, this
> > > > > > > > > > also because the detail is already in use for catching 
> > > > > > > > > > exactly above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue from the commit message that my proposed 
> > > > > > > > > solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models 
> > > > > > > > > supporting
> > > > > > > > > software generation should have been found out seriously 
> > > > > > > > > broken long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for 
> > > > > > > > commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other 
> > > > > > > > commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the 
> > > > > > > case
> > > > > > > 1 controller, it's nothing special from the controller 
> > > > > > > perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and 
> > > > > > > that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either 
> > > > > > > be
> > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For 
> > > > > > that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary 

[RFC PATCH v2 6/8] meson: Display block layer information altogether

2021-01-21 Thread Philippe Mathieu-Daudé
Display block layer information altogether,
when it is relevant.

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: qemu-block@nongnu.org

Should coroutine be displayed generically (here, in Misc?)
or restricted to have_block?

There are probably more features I missed...
---
 meson.build | 45 ++---
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/meson.build b/meson.build
index a0945749f94..e372b69f163 100644
--- a/meson.build
+++ b/meson.build
@@ -2325,6 +2325,7 @@
 summary_info += {'Documentation': build_docs}
 summary_info += {'system-mode emulation': have_system}
 summary_info += {'user-mode emulation': have_user}
+summary_info += {'block layer':   have_block}
 summary_info += {'Install blobs': get_option('install_blobs')}
 # TODO: add back version
 summary_info += {'slirp support': slirp_opt == 'disabled' ? false : 
slirp_opt}
@@ -2427,6 +2428,31 @@
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
+# Block layer
+summary_info = {}
+summary_info += {'coroutine backend': config_host['CONFIG_COROUTINE_BACKEND']}
+summary_info += {'coroutine pool':config_host['CONFIG_COROUTINE_POOL'] == 
'1'}
+if have_block
+  summary_info += {'Block whitelist (rw)': 
config_host['CONFIG_BDRV_RW_WHITELIST']}
+  summary_info += {'Block whitelist (ro)': 
config_host['CONFIG_BDRV_RO_WHITELIST']}
+  summary_info += {'VirtFS support':have_virtfs}
+  summary_info += {'build virtiofs daemon': have_virtiofsd}
+  summary_info += {'Live block migration': 
config_host.has_key('CONFIG_LIVE_BLOCK_MIGRATION')}
+  summary_info += {'replication support': 
config_host.has_key('CONFIG_REPLICATION')}
+  summary_info += {'bochs support': config_host.has_key('CONFIG_BOCHS')}
+  summary_info += {'cloop support': config_host.has_key('CONFIG_CLOOP')}
+  summary_info += {'dmg support':   config_host.has_key('CONFIG_DMG')}
+  summary_info += {'qcow v1 support':   config_host.has_key('CONFIG_QCOW1')}
+  summary_info += {'vdi support':   config_host.has_key('CONFIG_VDI')}
+  summary_info += {'vvfat support': config_host.has_key('CONFIG_VVFAT')}
+  summary_info += {'qed support':   config_host.has_key('CONFIG_QED')}
+  summary_info += {'parallels support': 
config_host.has_key('CONFIG_PARALLELS')}
+  summary_info += {'sheepdog support':  config_host.has_key('CONFIG_SHEEPDOG')}
+  summary_info += {'FUSE exports':  fuse.found()}
+  summary_info += {'FUSE lseek':fuse_lseek.found()}
+endif
+summary(summary_info, bool_yn: true, section: 'Block layer support')
+
 summary_info = {}
 summary_info += {'sparse enabled':sparse.found()}
 if targetos == 'darwin'
@@ -2460,10 +2486,6 @@
 # TODO: add back version
 summary_info += {'virgl support': config_host.has_key('CONFIG_VIRGL')}
 summary_info += {'curl support':  curl.found()}
-summary_info += {'Block whitelist (rw)': 
config_host['CONFIG_BDRV_RW_WHITELIST']}
-summary_info += {'Block whitelist (ro)': 
config_host['CONFIG_BDRV_RO_WHITELIST']}
-summary_info += {'VirtFS support':have_virtfs}
-summary_info += {'build virtiofs daemon': have_virtiofsd}
 summary_info += {'Multipath support': mpathpersist.found()}
 summary_info += {'VNC support':   vnc.found()}
 if vnc.found()
@@ -2509,14 +2531,11 @@
   summary_info += {'QGA MSI support':   config_host.has_key('CONFIG_QGA_MSI')}
 endif
 summary_info += {'seccomp support':   seccomp.found()}
-summary_info += {'coroutine backend': config_host['CONFIG_COROUTINE_BACKEND']}
-summary_info += {'coroutine pool':config_host['CONFIG_COROUTINE_POOL'] == 
'1'}
 summary_info += {'crypto afalg':  config_host.has_key('CONFIG_AF_ALG')}
 summary_info += {'GlusterFS support': glusterfs.found()}
 summary_info += {'TPM support':   config_host.has_key('CONFIG_TPM')}
 summary_info += {'libssh support':config_host.has_key('CONFIG_LIBSSH')}
 summary_info += {'QOM debugging': 
config_host.has_key('CONFIG_QOM_CAST_DEBUG')}
-summary_info += {'Live block migration': 
config_host.has_key('CONFIG_LIVE_BLOCK_MIGRATION')}
 summary_info += {'lzo support':   lzo.found()}
 summary_info += {'snappy support':snappy.found()}
 summary_info += {'bzip2 support': libbzip2.found()}
@@ -2524,24 +2543,12 @@
 summary_info += {'zstd support':  zstd.found()}
 summary_info += {'NUMA host support': config_host.has_key('CONFIG_NUMA')}
 summary_info += {'libxml2':   config_host.has_key('CONFIG_LIBXML2')}
-summary_info += {'replication support': 
config_host.has_key('CONFIG_REPLICATION')}
-summary_info += {'bochs support': config_host.has_key('CONFIG_BOCHS')}
-summary_info += {'cloop support': config_host.has_key('CONFIG_CLOOP')}
-summary_info += {'dmg support':   config_host.has_key('CONFIG_DMG')}
-summary_info += {'qcow v1 support':   config_host.has_key('CONFIG_QCOW1')}
-summary_info += {'vdi support':   config_host.has_key('CONFIG_VDI')}
-summary_info += {'vvfat support': 

Re: [RFC PATCH V3 8/8] hw/block/nvme: Add Identify Active Namespace ID List

2021-01-21 Thread Niklas Cassel
On Thu, Jan 21, 2021 at 06:58:19AM +0900, Minwoo Im wrote:
> > Hello Minwoo,
> > 
> > By introducing a detached parameter,
> > you are also implicitly making the following
> > NVMe commands no longer be spec compliant:
> > 
> > NVME_ID_CNS_NS, NVME_ID_CNS_CS_NS,
> > NVME_ID_CNS_NS_ACTIVE_LIST, NVME_ID_CNS_CS_NS_ACTIVE_LIST
> > 
> > When these commands are called on a detached namespace,
> > they should usually return a zero filled data struct.
> 
> Agreed.
> 
> > Dmitry and I had a patch on V8 on the ZNS series
> > that tried to fix some the existing NVMe commands
> > to be spec compliant, by handling detached namespaces
> > properly. In the end, in order to make it easier to
> > get the ZNS series accepted, we decided to drop the
> > detached related stuff from the series.
> > 
> > Feel free to look at that patch for inspiration:
> > https://github.com/dmitry-fomichev/qemu/commit/251c0ffee5149c739b1347811fa7e32a1c36bf7c
> 
> I've seen this patch, and as Klaus said, the only thing the patches need
> is to put an nvme_ns_is_attached() helper among the Identify
> handlers.
> 
> > I'm not sure if you want to modify all the functions that
> > our patch modifies, but I think that you should at least
> > modify the following nvme functions:
> > 
> > nvme_identify_ns()
> > nvme_identify_ns_csi()
> > nvme_identify_nslist()
> > nvme_identify_nslist_csi()
> 
> Yes, it pretty much makes sense to update them.  But now it seems like the
> 'attach/detach' scheme should have been separated out of this series,
> which just introduced multi-path for controllers and namespace
> sharing.  I will drop the 'detach' scheme from this series and make
> another series to support all of the Identify commands you mentioned above
> cleanly.

Hello Minwoo,

thank you for putting in work on this!

Kind regards,
Niklas


Re: [PATCH 04/11] job: add .cancel handler for the driver

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 02:17, Eric Blake wrote:

On 11/18/20 12:04 PM, Vladimir Sementsov-Ogievskiy wrote:

To be used in mirror in the following commit to cancel in-flight io on
target to not waste the time.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/qemu/job.h | 5 +
  job.c  | 3 +++
  2 files changed, 8 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 32aabb1c60..efc6fa7544 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -251,6 +251,11 @@ struct JobDriver {
   */
  void (*clean)(Job *job);
  
+/**
+ * If the callback is not NULL, it will be invoked in job_cancel_async
+ */
+void (*cancel)(Job *job);
+

Does the call need to be re-entrant or even worry about being invoked
more than once on the same BDS?  Or worded differently,


+++ b/job.c
@@ -712,6 +712,9 @@ static int job_finalize_single(Job *job)
  
 static void job_cancel_async(Job *job, bool force)
 {
+    if (job->driver->cancel) {
+        job->driver->cancel(job);
+    }
     if (job->user_paused) {


can job_cancel_async be reached more than once on the same BDS?


What do you mean by the same BDS? On the same job?

I don't think job_cancel_async() should be called several times during one
"cancel" operation. But even if it is, it's still safe enough with the only
.cancel implementation (in patch 05), which just calls
bdrv_cancel_in_flight() (which of course should be safe to call several
times).




     /* Do not call job_enter here, the caller will handle it.  */
     if (job->driver->user_resume) {
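
For illustration, that .cancel implementation has roughly this shape (a
sketch based on this thread, not the exact code of patch 05; MirrorBlockJob
and the field names follow block/mirror.c):

/*
 * Sketch: driver-side .cancel that only asks the target to cancel its
 * in-flight requests.  bdrv_cancel_in_flight() is safe to call more
 * than once, so a repeated job_cancel_async() stays harmless.
 */
static void mirror_cancel(Job *job)
{
    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
    BlockDriverState *target = blk_bs(s->target);

    bdrv_cancel_in_flight(target);
}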






--
Best regards,
Vladimir



Re: [PULL 10/13] iotests: define group in each iotest

2021-01-21 Thread Vladimir Sementsov-Ogievskiy

21.01.2021 05:36, Eric Blake wrote:

From: Vladimir Sementsov-Ogievskiy

We are going to drop group file. Define group in tests as a preparatory
step.

The patch is generated by

  cd tests/qemu-iotests

  grep '^[0-9]\{3\} ' group | while read line; do
      file=$(awk '{print $1}' <<< "$line");
      groups=$(sed -e 's/^... //' <<< "$line");
      awk "NR==2{print \"# group: $groups\"}1" $file > tmp;
      cat tmp > $file;
  done
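
So each test gains a "# group: ..." comment as its second line; a test
listed as "rw auto quick" in the group file would end up starting with
something like this (illustrative, not a specific test):

  #!/usr/bin/env bash
  # group: rw auto quick
  #
  # ...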

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Eric Blake
Message-Id:<20210116134424.82867-7-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake


Hmm. I think this patch should go together with the main series about 
refactoring iotests/check, not with the preparation fixes. Otherwise we'll 
diverge if more tests come in the meantime (or the group file changes). And 
just for good history, it's better to keep this patch close to the dropping 
of the group file.


--
Best regards,
Vladimir



Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

2021-01-21 Thread Bin Meng
Hi Francisco,

On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
 wrote:
>
> Dear Bin,
>
> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> >  wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > >  wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > From: Bin Meng 
> > > > > > > > > >
> > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many 
> > > > > > > > > > follow-up
> > > > > > > > > > bytes are expected to be received after it receives a 
> > > > > > > > > > command. For
> > > > > > > > > > example, depending on the address mode, either 3-byte 
> > > > > > > > > > address or
> > > > > > > > > > 4-byte address is needed.
> > > > > > > > > >
> > > > > > > > > > For fast read family commands, some dummy cycles are 
> > > > > > > > > > required after
> > > > > > > > > > sending the address bytes, and the dummy cycles need to be 
> > > > > > > > > > counted
> > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > >
> > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is 
> > > > > > > > > > in byte.
> > > > > > > > > > It is not in bit, or cycle. However for some reason the 
> > > > > > > > > > model has
> > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. 
> > > > > > > > > > The right
> > > > > > > > > > approach is to convert the number of dummy cycles to bytes 
> > > > > > > > > > based on
> > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast 
> > > > > > > > > > Read Quad
> > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 
> > > > > > > > > > * 4 / 8).
> > > > > > > > >
> > > > > > > > > While not being the original implementor I must assume that 
> > > > > > > > > the above solution was
> > > > > > > > > considered but not chosen by the developers due to its 
> > > > > > > > > inaccuracy (it
> > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a 
> > > > > > > > > multiple of 8,
> > > > > > > > > meaning that if the controller is wrongly programmed to 
> > > > > > > > > generate 7 the error
> > > > > > > > > wouldn't be caught and the controller will still be 
> > > > > > > > > considered "correct"). Now
> > > > > > > > > that we have this detail in the implementation I'm in favor 
> > > > > > > > > of keeping it, this
> > > > > > > > > also because the detail is already in use for catching 
> > > > > > > > > exactly above error.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I found no clue from the commit message that my proposed 
> > > > > > > > solution here
> > > > > > > > was ever considered, otherwise all SPI controller models 
> > > > > > > > supporting
> > > > > > > > software generation should have been found out seriously broken 
> > > > > > > > long
> > > > > > > > time ago!
> > > > > > >
> > > > > > >
> > > > > > > The controllers you are referring to might lack support for 
> > > > > > > commands requiring
> > > > > > > dummy clock cycles but I really hope they work with the other 
> > > > > > > commands? If so I
> > > > > >
> > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > that needs some special support from the SPI controller. For the 
> > > > > > case
> > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > controller just shifts data bit by bit from its tx fifo and that's 
> > > > > > it.
> > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > automatically generated (case 2 controller) by the hardware.
> > > > >
> > > > > Ok, I'll try to explain my viewpoint a little differently. For that 
> > > > > we also
> > > > > need to keep in mind that QEMU models HW, and any binary that runs on 
> > > > > a HW
> > > > > board supported in QEMU should ideally run on that board inside QEMU 
> > > > > as well
> > > > > (this can be a bare metal application equally well as a modified 
> > > > > u-boot/Linux
> > > > > using SPI commands with a non-multiple-of-8 number of dummy clock 
> > > > > 
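
For reference, the conversion proposed in the quoted commit message amounts
to the following (a standalone sketch, not the m25p80 code; the truncating
division is exactly the precision loss debated above):

#include <assert.h>
#include <stdint.h>

/*
 * bytes = cycles * bus_width / 8, e.g. Fast Read Quad I/O (EBh):
 * 6 cycles * 4 data lines / 8 = 3 bytes.  Integer division truncates,
 * so 7 cycles on a quad bus would also yield 3 bytes -- a wrongly
 * programmed controller would no longer be caught.
 */
static inline uint32_t dummy_cycles_to_bytes(uint32_t cycles,
                                             uint32_t bus_width)
{
    return (cycles * bus_width) / 8;
}

int main(void)
{
    assert(dummy_cycles_to_bytes(6, 4) == 3); /* EBh: Fast Read Quad I/O */
    assert(dummy_cycles_to_bytes(8, 1) == 1); /* 0Bh: Fast Read, single */
    return 0;
}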

Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

2021-01-21 Thread Francisco Iglesias
Dear Bin,

On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> Hi Francisco,
> 
> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>  wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > >  wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > >  wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > From: Bin Meng 
> > > > > > > > >
> > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many 
> > > > > > > > > follow-up
> > > > > > > > > bytes are expected to be received after it receives a 
> > > > > > > > > command. For
> > > > > > > > > example, depending on the address mode, either 3-byte address 
> > > > > > > > > or
> > > > > > > > > 4-byte address is needed.
> > > > > > > > >
> > > > > > > > > For fast read family commands, some dummy cycles are required 
> > > > > > > > > after
> > > > > > > > > sending the address bytes, and the dummy cycles need to be 
> > > > > > > > > counted
> > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > >
> > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in 
> > > > > > > > > byte.
> > > > > > > > > It is not in bit, or cycle. However for some reason the model 
> > > > > > > > > has
> > > > > > > > > been using the number of dummy cycles for s->needed_bytes. 
> > > > > > > > > The right
> > > > > > > > > approach is to convert the number of dummy cycles to bytes 
> > > > > > > > > based on
> > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast 
> > > > > > > > > Read Quad
> > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 
> > > > > > > > > 4 / 8).
> > > > > > > >
> > > > > > > > While not being the original implementor I must assume that 
> > > > > > > > the above solution was
> > > > > > > > considered but not chosen by the developers due to its 
> > > > > > > > inaccuracy (it
> > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a 
> > > > > > > > multiple of 8,
> > > > > > > > meaning that if the controller is wrongly programmed to 
> > > > > > > > generate 7 the error
> > > > > > > > wouldn't be caught and the controller will still be considered 
> > > > > > > > "correct"). Now
> > > > > > > > that we have this detail in the implementation I'm in favor of 
> > > > > > > > keeping it, this
> > > > > > > > also because the detail is already in use for catching exactly 
> > > > > > > > the above error.
> > > > > > > >
> > > > > > >
> > > > > > > I found no clue from the commit message that my proposed solution 
> > > > > > > here
> > > > > > > was ever considered, otherwise all SPI controller models 
> > > > > > > supporting
> > > > > > > software generation should have been found out seriously broken 
> > > > > > > long
> > > > > > > time ago!
> > > > > >
> > > > > >
> > > > > > The controllers you are referring to might lack support for 
> > > > > > commands requiring
> > > > > > dummy clock cycles but I really hope they work with the other 
> > > > > > commands? If so I
> > > > >
> > > > > I am not sure why you view dummy clock cycles as something special
> > > > > that needs some special support from the SPI controller. For the case
> > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > just like sending out a command, or address bytes, or data. The
> > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > automatically generated (case 2 controller) by the hardware.
> > > >
> > > > Ok, I'll try to explain my viewpoint a little differently. For that we 
> > > > also
> > > > need to keep in mind that QEMU models HW, and any binary that runs on a 
> > > > HW
> > > > board supported in QEMU should ideally run on that board inside QEMU 
> > > > as well
> > > > (this can be a bare metal application equally well as a modified 
> > > > u-boot/Linux
> > > > using SPI commands with a non-multiple-of-8 number of dummy clock 
> > > > cycles).
> > > >
> > > > Once functionality has been introduced into QEMU it is not easy to know 
> > > > which
> > > > intentional or unintentional features provided by the functionality are 
> > > > being
> > > > used by users. One of the (perhaps not well known) features I'm aware 
> > > > of that
> > > > is in use and is