Re: [PATCH v2 04/15] python: add directory structure README.rst files

2020-10-14 Thread John Snow

On 10/14/20 2:05 PM, Philippe Mathieu-Daudé wrote:

On 10/14/20 4:29 PM, John Snow wrote:

Add short readmes to python/, python/qemu/, and python/qemu/core that
explain the directory hierarchy. These readmes are visible when browsing


Maybe readmes -> READMEs



If you want it to match the filename, "README files".

--js


Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 


the source on e.g. gitlab/github and are designed to help new
developers/users quickly make sense of the source tree.

They are not designed for inclusion in a published manual.

Signed-off-by: John Snow 
---
  python/README.rst   | 27 +++
  python/qemu/README.rst  |  8 
  python/qemu/core/README.rst |  9 +
  3 files changed, 44 insertions(+)
  create mode 100644 python/README.rst
  create mode 100644 python/qemu/README.rst
  create mode 100644 python/qemu/core/README.rst







Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 15:51, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 13 +
  block/io.c   |  3 ++-
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b136895..278a11a 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -148,10 +148,15 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
  }
  }
  
-ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,

-  local_flags);
-if (ret < 0) {
-return ret;
+if (!!(flags & BDRV_REQ_PREFETCH) &


How about dropping the double negation and using a logical && instead of
the binary &?



Yes, that's correct.


+!(local_flags & BDRV_REQ_COPY_ON_READ)) {
+/* Skip non-guest reads if no copy needed */
+} else {


Hm.  I would have just written the negated form

(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

and put the “skip” comment above that condition.

(Since local_flags is initialized to flags, it can be written as a
single comparison, but that’s a matter of taste and I’m not going to
recommend either over the other:



I played with the flags to make the idea obvious to the eye of the 
beholder: "we neither read nor write".
Comparing BDRV_REQ_PREFETCH against 'flags' means that the flag 
comes from outside the function.

And the empty branch means we do nothing in that case.
Eventually, I will pick up the brief expression below.
Thanks,

Andrey


((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
BDRV_REQ_PREFETCH)

)


+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
  }
  
  offset += n;
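
For illustration, a minimal sketch of how this hunk might read with the negated form Max suggests above (this only restates the review suggestion, it is not the posted patch):

    /* Skip non-guest (prefetch) reads if no copy is needed */
    if (!(flags & BDRV_REQ_PREFETCH) ||
        (local_flags & BDRV_REQ_COPY_ON_READ)) {
        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
                                  local_flags);
        if (ret < 0) {
            return ret;
        }
    }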

diff --git a/block/io.c b/block/io.c
index 11df188..bff1808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
  
  max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);

  if (bytes <= max_bytes && bytes <= max_transfer) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
+ flags & bs->supported_read_flags);


Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
why it isn’t.

Max


  goto out;
  }
  








Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 15:22, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich 
---
  include/block/block.h | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b..2b7efd1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,9 +71,10 @@ typedef enum {
  BDRV_REQ_NO_FALLBACK= 0x100,
  
  /*

- * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
- * on read request and means that caller doesn't really need data to be
- * written to qiov parameter which may be NULL.
+ * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
+ * flag or when the COR-filter applied to read operations and means that


There’s some word missing here, but I’m not sure what it is...  At least
an “is” before “applied”.  Perhaps something like ”or when a COR filter
is involved (in read operations)” would be better.


+ * caller doesn't really need data to be written to qiov parameter which


And this “written to” confused me for a second, because we’re reading
into qiov.  Technically, that means writing into the buffer, but, you know.

Could we rewrite the whole thing, perhaps?  Something like

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
COR filter), in which case it signals that the COR operation need not
read the data into memory (qiov), but only ensure it is copied to the
top layer (i.e., that COR is done).”

I don’t know.

Max



I would modify a little:

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when a COR
filter is involved), in which case it signals that the COR operation
need not read the data into memory (qiov) but only ensure they are
copied to the top layer (i.e., that COR operation is done).”



+ * may be NULL.
   */
  BDRV_REQ_PREFETCH  = 0x200,
  /* Mask of valid flags */








Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 15:01, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Limit COR operations by the base node in the backing chain when the
overlay base node name is given. It will be useful for a block stream
job when the COR-filter is applied. The overlay base node is passed as
the base itself may change due to concurrent commit jobs on the same
backing chain.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 39 +--
  1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index c578b1b..dfbd6ad 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -122,8 +122,43 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
 size_t qiov_offset,
 int flags)
  {


[...]


+ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
+  state->base_overlay, true, offset,
+  n, &n);
+if (ret) {
+local_flags |= BDRV_REQ_COPY_ON_READ;
+}
+}


Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is at the EOF?  Is that something to look out for?
  (I’m not sure.)

Max



The check for EOF is managed earlier in the stream_run() for a 
block-stream job. For other cases of using the COR-filter, the check for 
EOF can be added to the cor_co_preadv_part().
I would be more than happy if we can escape the duplicated checking for 
is_allocated in the block-stream. But how the stream_run() can stop 
calling the blk_co_preadv() when EOF is reached if is_allocated removed 
from it? May the cor_co_preadv_part() return EOF (or other error code) 
to be handled by a caller if (ret == 0 && n == 0 && (flags & 
BDRV_REQ_PREFETCH)?


Andrey
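
For what it is worth, the EOF guard Andrey hints at could look roughly like the following sketch at the top of cor_co_preadv_part(); the use of bdrv_getlength() and the exact return value are assumptions, not part of the posted series:

    /* Hypothetical EOF guard, sketch only */
    int64_t len = bdrv_getlength(bs->file->bs);
    if (len < 0) {
        return len;                       /* propagate the error */
    }
    if (offset >= len) {
        return 0;                         /* nothing left to read or copy */
    }
    bytes = MIN(bytes, len - offset);     /* clamp the request to EOF */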



Re: [PATCH v2 13/15] python: move .isort.cfg into setup.cfg

2020-10-14 Thread Philippe Mathieu-Daudé

On 10/14/20 4:29 PM, John Snow wrote:

Signed-off-by: John Snow 
---
  python/.isort.cfg | 7 ---
  python/setup.cfg  | 8 
  2 files changed, 8 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 04/15] python: add directory structure README.rst files

2020-10-14 Thread Philippe Mathieu-Daudé

On 10/14/20 4:29 PM, John Snow wrote:

Add short readmes to python/, python/qemu/, and python/qemu/core that
explain the directory hierarchy. These readmes are visible when browsing


Maybe readmes -> READMEs

Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 


the source on e.g. gitlab/github and are designed to help new
developers/users quickly make sense of the source tree.

They are not designed for inclusion in a published manual.

Signed-off-by: John Snow 
---
  python/README.rst   | 27 +++
  python/qemu/README.rst  |  8 
  python/qemu/core/README.rst |  9 +
  3 files changed, 44 insertions(+)
  create mode 100644 python/README.rst
  create mode 100644 python/qemu/README.rst
  create mode 100644 python/qemu/core/README.rst





Re: [PATCH v2 01/15] python: create qemu.core package

2020-10-14 Thread Philippe Mathieu-Daudé

On 10/14/20 4:29 PM, John Snow wrote:

move python/qemu/*.py to python/qemu/core/*.py and update import
directives across the tree.

This is done to create a PEP420 namespace package, in which we may
create subpackages. To do this, the namespace directory ("qemu") should
not have any modules in it. Those files will go in a new 'core'
subpackage instead.

Bolster the core/__init__.py module, making the top-level classes and
functions from this package available directly inside the `qemu.core`
namespace.

This facilitates the convenient shorthand:


from qemu.core import QEMUQtestMachine, QEMUMonitorProtocol


Signed-off-by: John Snow 
---
  python/{qemu => }/.isort.cfg  |  0
  python/qemu/__init__.py   | 11 --
  python/qemu/{ => core}/.flake8|  0
  python/qemu/core/__init__.py  | 44 +++
  python/qemu/{ => core}/accel.py   |  0
  python/qemu/{ => core}/console_socket.py  |  0
  python/qemu/{ => core}/machine.py |  0
  python/qemu/{ => core}/pylintrc   |  0
  python/qemu/{ => core}/qmp.py |  0
  python/qemu/{ => core}/qtest.py   |  0
  scripts/device-crash-test |  2 +-
  scripts/qmp/qemu-ga-client|  2 +-
  scripts/qmp/qmp   |  2 +-
  scripts/qmp/qmp-shell |  2 +-
  scripts/qmp/qom-fuse  |  2 +-
  scripts/qmp/qom-get   |  2 +-
  scripts/qmp/qom-list  |  2 +-
  scripts/qmp/qom-set   |  2 +-
  scripts/qmp/qom-tree  |  2 +-
  scripts/render_block_graph.py |  6 ++--
  scripts/simplebench/bench_block_job.py|  4 +--
  tests/acceptance/avocado_qemu/__init__.py |  2 +-
  tests/acceptance/boot_linux.py|  3 +-
  tests/acceptance/virtio_check_params.py   |  2 +-
  tests/acceptance/virtio_version.py|  2 +-
  tests/migration/guestperf/engine.py   |  2 +-
  tests/qemu-iotests/235|  2 +-
  tests/qemu-iotests/297|  2 +-
  tests/qemu-iotests/300|  4 +--
  tests/qemu-iotests/iotests.py |  4 +--
  tests/vm/basevm.py|  6 ++--
  31 files changed, 71 insertions(+), 41 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 14:59, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Limit COR operations by the base node in the backing chain when the
overlay base node name is given. It will be useful for a block stream
job when the COR-filter is applied. The overlay base node is passed as
the base itself may change due to concurrent commit jobs on the same
backing chain.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 39 +--
  1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index c578b1b..dfbd6ad 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -122,8 +122,43 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
 size_t qiov_offset,
 int flags)
  {
-return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
-   flags | BDRV_REQ_COPY_ON_READ);
+int64_t n = 0;
+int64_t size = offset + bytes;
+int local_flags;
+int ret;
+BDRVStateCOR *state = bs->opaque;
+
+if (!state->base_overlay) {
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
+}
+
+while (offset < size) {
+local_flags = flags;
+
+/* In case of failure, try to copy-on-read anyway */
+ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
+if (!ret) {


In case of failure, a negative value is going to be returned, we won’t
go into this conditional block, and local_flags isn’t going to contain
BDRV_REQ_COPY_ON_READ.

So the idea of CORing in case of failure sounds sound to me, but it
doesn’t look like that’s done.



Yes, it's obvious. It was just my fault that I missed the additional 
condition for "ret < 0". Thank you for noticing that.


Andrey


+ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),


I think this should either be bdrv_backing_chain_next() or we must rule
out the possibility of bs->file->bs being a filter somewhere.  I think
I’d prefer the former.


+  state->base_overlay, true, offset,
+  n, &n);
+if (ret) {


“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
|| ret < 0” probably needed above), but correct either way.

Max
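
Putting these comments together, the allocation checks might end up looking roughly like this (sketch only: bdrv_backing_chain_next() replaces bdrv_cow_bs() as suggested, errors fall back to copy-on-read, and the handling of @n on error is left aside):

    ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
    if (ret <= 0) {
        /* Unallocated in the top image, or error: try to copy-on-read anyway */
        ret = bdrv_is_allocated_above(bdrv_backing_chain_next(bs->file->bs),
                                      state->base_overlay, true, offset,
                                      n, &n);
        if (ret == 1 || ret < 0) {
            local_flags |= BDRV_REQ_COPY_ON_READ;
        }
    }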





Re: [PATCH] hw/block/nvme: add block utilization tracking

2020-10-14 Thread Keith Busch
On Wed, Oct 14, 2020 at 10:47:21AM +0200, Klaus Jensen wrote:
> On Oct 13 14:06, Keith Busch wrote:
> 
> > If we were going to support it here, wouldn't it make more sense to
> > tie it to the filesystem's ability to support fallocate hole punch for
> > the backing namespace, and check if the range is allocated with
> > SEEK_HOLE?  Then you don't even need to track persistent state, and
> > we're actually getting the genuine capability.
> > 
> 
> Yes, this would be optimal, definitely. I think we have to rely on what
> we can do with the QEMU block layer, so please see my v2 that uses
> bdrv_block_status. I did do something like this initially, but was
> unsure if we could live with the fact that block drivers such as qcow2
> cannot honor a discard unless an entire cluster is discard/zeroed.
> 
> But see my commit message, I think we can work with it and still be in
> compliance with the spec.

Yes, I think that's better, though I'm not familiar with that particular
API. I definitely prefer implementing spec features in useful ways that
work with other layers rather than faking it.
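
As a rough illustration of the approach, checking whether a range of the backing image is allocated can go through the generic block layer instead of device-side tracking; a sketch under the assumption that the device model holds a BlockBackend pointer named blk:

    /* Sketch: query the allocation status of [offset, offset + bytes) */
    int64_t pnum;
    int ret = bdrv_block_status(blk_bs(blk), offset, bytes, &pnum, NULL, NULL);
    if (ret < 0) {
        /* error: conservatively treat the range as allocated */
    } else if (ret & BDRV_BLOCK_ZERO) {
        /* the first pnum bytes read as zeroes, i.e. effectively deallocated */
    } else {
        /* the first pnum bytes carry data */
    }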



Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> This patch completes the series with the COR-filter insertion for
> block-stream operations. Adding the filter makes it possible for copied
> regions to be discarded in backing files during the block-stream job,
> what will reduce the disk overuse.
> The COR-filter insertion incurs changes in the iotests case
> 245:test_block_stream_4 that reopens the backing chain during a
> block-stream job. There are changes in the iotests #030 as well.
> The iotests case 030:test_stream_parallel was deleted due to multiple
> conflicts between the concurrent job operations over the same backing
> chain. The base backing node for one job is the top node for another
> job. It may change due to the filter node inserted into the backing
> chain while both jobs are running. Another issue is that the parts of
> the backing chain are being frozen by the running job and may not be
> changed by the concurrent job when needed. The concept of the parallel
> jobs with common nodes is considered vital no more.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/stream.c | 93 
> +-
>  tests/qemu-iotests/030 | 51 +++--
>  tests/qemu-iotests/030.out |  4 +-
>  tests/qemu-iotests/141.out |  2 +-
>  tests/qemu-iotests/245 | 19 +++---
>  5 files changed, 81 insertions(+), 88 deletions(-)

Looks like stream_run() could be a bit streamlined now (the allocation
checking should be unnecessary, unconditionally calling
stream_populate() should be sufficient), but not necessary now.

> diff --git a/block/stream.c b/block/stream.c
> index d3e1812..93564db 100644
> --- a/block/stream.c
> +++ b/block/stream.c

[...]

> @@ -94,13 +94,14 @@ static void stream_clean(Job *job)
>  {
>  StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>  BlockJob *bjob = &s->common;
> -BlockDriverState *bs = blk_bs(bjob->blk);
> +
> +bdrv_cor_filter_drop(s->cor_filter_bs);
>  
>  /* Reopen the image back in read-only mode if necessary */
>  if (s->bs_read_only) {
>  /* Give up write permissions before making it read-only */
>  blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);

Perhaps it would be good to do so even before the filter is dropped.  I
don’t know whether moving bjob->blk from cor_filter_bs to target_bs
might cause problems otherwise.

> -bdrv_reopen_set_read_only(bs, true, NULL);
> +bdrv_reopen_set_read_only(s->target_bs, true, NULL);
>  }
>  }
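
A minimal sketch of the reordering suggested here, with the read-only reopen done while bjob->blk is still in place and the filter dropped last (hypothetical ordering, subject to the permission discussion):

    static void stream_clean(Job *job)
    {
        StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
        BlockJob *bjob = &s->common;

        /* Reopen the image back in read-only mode if necessary */
        if (s->bs_read_only) {
            /* Give up write permissions before making it read-only */
            blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
            bdrv_reopen_set_read_only(s->target_bs, true, NULL);
        }

        /* Drop the filter only after write access has been given up */
        bdrv_cor_filter_drop(s->cor_filter_bs);
    }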

[...]

> @@ -262,17 +249,48 @@ void stream_start(const char *job_id, BlockDriverState 
> *bs,
>  }
>  }
>  
> -/* Prevent concurrent jobs trying to modify the graph structure here, we
> - * already have our own plans. Also don't allow resize as the image size 
> is
> - * queried only at the job start and then cached. */
> -s = block_job_create(job_id, &stream_job_driver, NULL, bs,
> - basic_flags | BLK_PERM_GRAPH_MOD,
> - basic_flags | BLK_PERM_WRITE,
> +QDict *opts = qdict_new();

Declaration should be done at the start of the block.

> +
> +qdict_put_str(opts, "driver", "copy-on-read");
> +qdict_put_str(opts, "file", bdrv_get_node_name(bs));
> +if (base_overlay) {

@base_overlay is always non-NULL, this condition should check @base, I
think.

> +/* Pass the base_overlay rather than base */
> +qdict_put_str(opts, "base", base_overlay->node_name);
> +}
> +if (filter_node_name) {
> +qdict_put_str(opts, "node-name", filter_node_name);
> +}
> +
> +cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp);
> +if (cor_filter_bs == NULL) {
> +goto fail;
> +}
> +
> +if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {

Is there a reason why we can’t combine this with the
bdrv_free_backing_chain() from bs down to above_base?  I mean, the
effect should be the same, just asking.

> +bdrv_cor_filter_drop(cor_filter_bs);
> +cor_filter_bs = NULL;
> +goto fail;
> +}
> +
> +s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs,
> + BLK_PERM_CONSISTENT_READ,
> + basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,

Not that I’m an expert on the GRAPH_MOD permission, but why is this
shared here but not below?  Shouldn’t it be the same in both cases?
(Same for taking it as a permission.)

>   speed, creation_flags, NULL, NULL, errp);
>  if (!s) {
>  goto fail;
>  }
>  
> +/*
> + * Prevent concurrent jobs trying to modify the graph structure here, we
> + * already have our own plans. Also don't allow resize as the image size 
> is
> + * queried only at the job start and then cached.
> + */
> +if (block_job_add_bdrv(&s->common, "active node", bs,
> +   basic_flags | BLK_PERM_GRAPH_MOD,
> + 

Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 19:30, Max Reitz wrote:

On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:

14.10.2020 15:51, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
   block/copy-on-read.c | 13 +
   block/io.c   |  3 ++-
   2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b136895..278a11a 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -148,10 +148,15 @@ static int coroutine_fn
cor_co_preadv_part(BlockDriverState *bs,
   }
   }
   -    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
qiov_offset,
-  local_flags);
-    if (ret < 0) {
-    return ret;
+    if (!!(flags & BDRV_REQ_PREFETCH) &


How about dropping the double negation and using a logical && instead of
the binary &?


+    !(local_flags & BDRV_REQ_COPY_ON_READ)) {
+    /* Skip non-guest reads if no copy needed */
+    } else {


Hm.  I would have just written the negated form

(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

and put the “skip” comment above that condition.

(Since local_flags is initialized to flags, it can be written as a
single comparison, but that’s a matter of taste and I’m not going to
recommend either over the other:

((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
BDRV_REQ_PREFETCH)

)


+    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
qiov_offset,
+  local_flags);
+    if (ret < 0) {
+    return ret;
+    }
   }
     offset += n;
diff --git a/block/io.c b/block/io.c
index 11df188..bff1808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1512,7 +1512,8 @@ static int coroutine_fn
bdrv_aligned_preadv(BdrvChild *child,
     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
   if (bytes <= max_bytes && bytes <= max_transfer) {
-    ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
qiov_offset, 0);
+    ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
+ flags & bs->supported_read_flags);



When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)
NULL. This means, that we can't just drop the flag when call the driver
that doesn't support it.


True. :/


Actually, if driver doesn't support the PREFETCH flag we should do nothing.


Hm.  But at least in the case of COR, PREFETCH is not something that can
be optimized to be a no-op (unless the COR is a no-op); it still denotes
a command that must be executed.

So if we can’t pass it to the driver, I don’t think we should do
nothing, but to return an error.  Or maybe we could even assert that it
isn’t set for drivers that don’t support it, because at least right now
such a case would just be a bug.


Hmm. Reasonable..

So, let me summarize the cases:

1. bdrv_co_preadv(.. , PREFETCH | COR)

  In this case generic layer should handle both flags and pass flags=0 to driver

2. bdrv_co_preadv(.., PREFETCH)

2.1 driver supports PREFETCH
  
  OK, pass PREFETCH to driver, no problems


2.2 driver doesn't support PREFETCH

  We can just abort() here, as the only source of PREFETCH without COR would be
  stream job driver, which must read from COR filter.

  More generic solution is to allocate temporary buffer (at least if qiov is 
NULL)
  and call underlying driver .preadv with flags=0 on that temporary buffer. But
  just abort() is simpler and should work for now.
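
A sketch of how case 2.2 might look in the generic code right before the driver is called, using the supported_read_flags field proposed earlier in this series (assumption: case 1 has already been handled by the generic COR path at this point):

    /* PREFETCH without COPY_ON_READ must only reach drivers that support it;
     * anything else is a programming error today, so assert. */
    if ((flags & BDRV_REQ_PREFETCH) && !(flags & BDRV_REQ_COPY_ON_READ)) {
        assert(bs->supported_read_flags & BDRV_REQ_PREFETCH);
    }

    ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
                             flags & bs->supported_read_flags);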




Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
why it isn’t.




Could it be part of patch 07? I mean introduce new field
supported_read_flags and handle it in generic code in one patch, prior
to implementing support for it in COR driver.


Sure.

Max




--
Best regards,
Vladimir



Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Max Reitz
On 14.10.20 18:08, Andrey Shinkevich wrote:
> On 14.10.2020 14:09, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> We are going to use the COR-filter for a block-stream job.
>>> To limit COR operations by the base node in the backing chain during
>>> stream job, pass the name of overlay base node to the copy-on-read
>>> driver as base node itself may change due to possible concurrent jobs.
>>> The rest of the functionality will be implemented in the patch that
>>> follows.
>>>
>>> Signed-off-by: Andrey Shinkevich 
>>> ---
>>>   block/copy-on-read.c | 14 ++
>>>   1 file changed, 14 insertions(+)
>>
>> Is there a reason why you didn’t add this option to QAPI (as part of a
>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
>>
> 
> I agree that passing a base overlay under the base option looks clumsy.
> We could pass the base node name and find its overlay ourselves here in
> cor_open(). In that case, we can use the existing QAPI.

Sounds more complicated than to just split off BlockdevOptionsCor (from
BlockdevOptionsGenericFormat, like e.g. BlockdevOptionsRaw does it).

> The reason I used the existing QAPI is to make it easier for a user to
> operate with the traditional options and to keep things simple. So, the
> user shouldn't think what overlay or above-base node to pass.

Well, they don’t have to pass anything if they don’t need a bottom node.
 So unless they want to use a bottom/base-overlay node, it won’t become
more complicated.

> If we introduce the specific BlockdevOptionsCor, what other options may
> come with?

Nothing yet, I think.

>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index bcccf0f..c578b1b 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -24,19 +24,24 @@
>>>   #include "block/block_int.h"
>>>   #include "qemu/module.h"
>>>   #include "qapi/error.h"
>>> +#include "qapi/qmp/qerror.h"
>>>   #include "qapi/qmp/qdict.h"
>>>   #include "block/copy-on-read.h"
>>>       typedef struct BDRVStateCOR {
>>>   bool active;
>>> +    BlockDriverState *base_overlay;
>>>   } BDRVStateCOR;
>>>       static int cor_open(BlockDriverState *bs, QDict *options, int
>>> flags,
>>>   Error **errp)
>>>   {
>>> +    BlockDriverState *base_overlay = NULL;
>>>   BDRVStateCOR *state = bs->opaque;
>>> +    /* We need the base overlay node rather than the base itself */
>>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
>>
>> Shouldn’t it be called base-overlay or above-base then?
>>
> 
> The base_overlay identifier is used below as the pointer to BS. The
> base_overlay_node stands for the name of the node. I used that
> identifier to differ between the types. And the above_base has another
> meaning per block/stream.c - it can be a temporary filter with a JSON-name.

Yes, agreed; I’m not talking about the variable identifier though.  I’m
talking about the option name, which in this version is just “base”.
(And whenever that option is used, once here, once later in
block/stream.c, there is an explanatory comment why we don’t pass the
base node through it, but the first overlay above the base.  Which makes
me believe perhaps it shouldn’t be called “base”.)

>>>     bs->file = bdrv_open_child(NULL, options, "file", bs,
>>> &child_of_bds,
>>>  BDRV_CHILD_FILTERED |
>>> BDRV_CHILD_PRIMARY,
>>> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict
>>> *options, int flags,
>>>   ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>>   bs->file->bs->supported_zero_flags);
>>>   +    if (base_overlay_node) {
>>> +    qdict_del(options, "base");
>>> +    base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
>>
>> I think this is a use-after-free.  The storage @base_overlay_node points
>> to belongs to a QString, which is referenced only by @options; so
>> deleting that element of @options should free that string.
>>
>> Max
>>
> 
> I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
> Thank you.

Don’t forget that the error_setg() below also makes use of it:

>>> +    if (!base_overlay) {
>>> +    error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
>>> +    return -EINVAL;
>>> +    }
>>> +    }

Max
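
One possible way around the lifetime problem is to take a copy of the option string before qdict_del() can free it, so that both the lookup and the error message keep working; a sketch only, not part of the posted patch:

    /* Copy the node name so qdict_del() cannot free the string we still use */
    g_autofree char *base_node = g_strdup(qdict_get_try_str(options, "base"));

    if (base_node) {
        qdict_del(options, "base");
        base_overlay = bdrv_lookup_bs(NULL, base_node, NULL);
        if (!base_overlay) {
            error_setg(errp, QERR_BASE_NOT_FOUND, base_node);
            return -EINVAL;
        }
    }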





Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-14 Thread Max Reitz
On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 15:51, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>> COR-driver to skip unneeded reading. It can be taken into account for
>>> the COR-algorithms optimization. That check is being made during the
>>> block stream job by the moment.
>>>
>>> Signed-off-by: Andrey Shinkevich 
>>> ---
>>>   block/copy-on-read.c | 13 +
>>>   block/io.c   |  3 ++-
>>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index b136895..278a11a 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -148,10 +148,15 @@ static int coroutine_fn
>>> cor_co_preadv_part(BlockDriverState *bs,
>>>   }
>>>   }
>>>   -    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>> qiov_offset,
>>> -  local_flags);
>>> -    if (ret < 0) {
>>> -    return ret;
>>> +    if (!!(flags & BDRV_REQ_PREFETCH) &
>>
>> How about dropping the double negation and using a logical && instead of
>> the binary &?
>>
>>> +    !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>>> +    /* Skip non-guest reads if no copy needed */
>>> +    } else {
>>
>> Hm.  I would have just written the negated form
>>
>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
>>
>> and put the “skip” comment above that condition.
>>
>> (Since local_flags is initialized to flags, it can be written as a
>> single comparison, but that’s a matter of taste and I’m not going to
>> recommend either over the other:
>>
>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
>> BDRV_REQ_PREFETCH)
>>
>> )
>>
>>> +    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>> qiov_offset,
>>> +  local_flags);
>>> +    if (ret < 0) {
>>> +    return ret;
>>> +    }
>>>   }
>>>     offset += n;
>>> diff --git a/block/io.c b/block/io.c
>>> index 11df188..bff1808 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn
>>> bdrv_aligned_preadv(BdrvChild *child,
>>>     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>   if (bytes <= max_bytes && bytes <= max_transfer) {
>>> -    ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
>>> qiov_offset, 0);
>>> +    ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>> + flags & bs->supported_read_flags);
> 
> 
> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)
> NULL. This means, that we can't just drop the flag when call the driver
> that doesn't support it.

True. :/

> Actually, if driver doesn't support the PREFETCH flag we should do nothing.

Hm.  But at least in the case of COR, PREFETCH is not something that can
be optimized to be a no-op (unless the COR is a no-op); it still denotes
a command that must be executed.

So if we can’t pass it to the driver, I don’t think we should do
nothing, but to return an error.  Or maybe we could even assert that it
isn’t set for drivers that don’t support it, because at least right now
such a case would just be a bug.

>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>> why it isn’t.
>>
> 
> 
> Could it be part of patch 07? I mean introduce new field
> supported_read_flags and handle it in generic code in one patch, prior
> to implementing support for it in COR driver.

Sure.

Max





Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 14:09, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

We are going to use the COR-filter for a block-stream job.
To limit COR operations by the base node in the backing chain during
stream job, pass the name of overlay base node to the copy-on-read
driver as base node itself may change due to possible concurrent jobs.
The rest of the functionality will be implemented in the patch that
follows.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  1 file changed, 14 insertions(+)


Is there a reason why you didn’t add this option to QAPI (as part of a
yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.



I agree that passing a base overlay under the base option looks clumsy. 
We could pass the base node name and find its overlay ourselves here in 
cor_open(). In that case, we can use the existing QAPI.
The reason I used the existing QAPI is to make it easier for a user to 
operate with the traditional options and to keep things simple. So, the 
user shouldn't think what overlay or above-base node to pass.
If we introduce the specific BlockdevOptionsCor, what other options may 
come with?



diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index bcccf0f..c578b1b 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,24 @@
  #include "block/block_int.h"
  #include "qemu/module.h"
  #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
  #include "qapi/qmp/qdict.h"
  #include "block/copy-on-read.h"
  
  
  typedef struct BDRVStateCOR {

  bool active;
+BlockDriverState *base_overlay;
  } BDRVStateCOR;
  
  
  static int cor_open(BlockDriverState *bs, QDict *options, int flags,

  Error **errp)
  {
+BlockDriverState *base_overlay = NULL;
  BDRVStateCOR *state = bs->opaque;
+/* We need the base overlay node rather than the base itself */
+const char *base_overlay_node = qdict_get_try_str(options, "base");


Shouldn’t it be called base-overlay or above-base then?



The base_overlay identifier is used below as the pointer to BS. The 
base_overlay_node stands for the name of the node. I used that 
identifier to differ between the types. And the above_base has another 
meaning per block/stream.c - it can be a temporary filter with a JSON-name.


  
  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,

 BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
  
+if (base_overlay_node) {

+qdict_del(options, "base");
+base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);


I think this is a use-after-free.  The storage @base_overlay_node points
to belongs to a QString, which is referenced only by @options; so
deleting that element of @options should free that string.

Max



I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
Thank you.

Andrey


+if (!base_overlay) {
+error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
+return -EINVAL;
+}
+}
  state->active = true;
+state->base_overlay = base_overlay;
  
  /*

   * We don't need to call bdrv_child_refresh_perms() now as the permissions








Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions

2020-10-14 Thread Max Reitz
On 14.10.20 16:28, Andrey Shinkevich wrote:
> On 14.10.2020 13:44, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Provide API for the COR-filter insertion/removal.
>>> Also, drop the filter child permissions for an inactive state when the
>>> filter node is being removed.
>>>
>>> Signed-off-by: Andrey Shinkevich 
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/copy-on-read.c | 88
>>> 
>>>   block/copy-on-read.h | 35 +
>>>   2 files changed, 123 insertions(+)
>>>   create mode 100644 block/copy-on-read.h
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index cb03e0f..bcccf0f 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>
>> [...]
>>
>>> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>>>   bdrv_register(&bdrv_copy_on_read);
>>>   }
>>>   +
>>> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
>>> + QDict *node_options,
>>> + int flags, Error **errp)
>>
>> I had hoped you could make this a generic block layer function. :(
>>
>> (Because it really is rather generic)
>>
>> *shrug*
> 
> Actually, I did (and still can do) that for the 'append node' function
> only but not for the 'drop node' one so far...

Ah, yeah.

> diff --git a/block.c b/block.c
> index 11ab55f..f41e876 100644
> --- a/block.c
> +++ b/block.c
> @@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
>  g_free(bs);
>  }
> 
> +BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict
> *node_options,
> +   int flags, Error **errp)
> +{
> +    BlockDriverState *new_node_bs;
> +    Error *local_err = NULL;
> +
> +    new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
> +    if (new_node_bs == NULL) {
> +    error_prepend(errp, "Could not create node: ");
> +    return NULL;
> +    }
> +
> +    bdrv_drained_begin(bs);
> +    bdrv_replace_node(bs, new_node_bs, &local_err);
> +    bdrv_drained_end(bs);
> +
> +    if (local_err) {
> +    bdrv_unref(new_node_bs);
> +    error_propagate(errp, local_err);
> +    return NULL;
> +    }
> +
> +    return new_node_bs;
> +}
> +
> +void bdrv_remove_node(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    BlockDriverState *inferior_bs;
> +
> +    child = bdrv_filter_or_cow_child(bs);
> +    if (!child) {
> +    return;
> +    }
> +    inferior_bs = child->bs;
> +
> +    /* Retain the BDS until we complete the graph change. */
> +    bdrv_ref(inferior_bs);
> +    /* Hold a guest back from writing while permissions are being
> reset. */
> +    bdrv_drained_begin(inferior_bs);
> +    /* Refresh permissions before the graph change. */
> +    bdrv_child_refresh_perms(bs, child, &error_abort);
> +    bdrv_replace_node(bs, inferior_bs, &error_abort);
> +
> +    bdrv_drained_end(inferior_bs);
> +    bdrv_unref(inferior_bs);
> +    bdrv_unref(bs);
> +}
> 
> So, it is an intermediate solution in this patch of the series. I am
> going to make both functions generic once Vladimir overhauls the QEMU
> permission update system. Otherwise, the COR-filter node cannot be
> removed from the backing chain gracefully.

True.

> Thank you for your r-b. If the next version comes, I can move the
> 'append node' function only to the generic layer.

Thanks!  Even if you decide against it, I’m happy as long as there are
plans to perhaps make it generic at some point.

(I’m just afraid this function might be useful in some other case, too,
and we forget that it exists here already, and then end up
reimplementing it.)

Max
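
For reference, a hypothetical caller of the proposed generic helpers could look like this (option names follow the copy-on-read filter as used elsewhere in this series; not a final API):

    /* Wrap @bs with a copy-on-read filter via the proposed generic helper */
    QDict *opts = qdict_new();
    BlockDriverState *filter_bs;

    qdict_put_str(opts, "driver", "copy-on-read");
    qdict_put_str(opts, "file", bdrv_get_node_name(bs));

    filter_bs = bdrv_insert_node(bs, opts, BDRV_O_RDWR, errp);
    if (!filter_bs) {
        return NULL;
    }

    /* ... run the job with filter_bs on top of bs ... */

    bdrv_remove_node(filter_bs);   /* drops the filter and its reference */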





Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Max Reitz
On 14.10.20 16:56, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 14:57, Max Reitz wrote:
>> On 14.10.20 13:09, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
 We are going to use the COR-filter for a block-stream job.
 To limit COR operations by the base node in the backing chain during
 stream job, pass the name of overlay base node to the copy-on-read
 driver as base node itself may change due to possible concurrent jobs.
 The rest of the functionality will be implemented in the patch that
 follows.

 Signed-off-by: Andrey Shinkevich 
 ---
   block/copy-on-read.c | 14 ++
   1 file changed, 14 insertions(+)
>>>
>>> Is there a reason why you didn’t add this option to QAPI (as part of a
>>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it
>>> there.
>>>
 diff --git a/block/copy-on-read.c b/block/copy-on-read.c
 index bcccf0f..c578b1b 100644
 --- a/block/copy-on-read.c
 +++ b/block/copy-on-read.c
 @@ -24,19 +24,24 @@
   #include "block/block_int.h"
   #include "qemu/module.h"
   #include "qapi/error.h"
 +#include "qapi/qmp/qerror.h"
   #include "qapi/qmp/qdict.h"
   #include "block/copy-on-read.h"
       typedef struct BDRVStateCOR {
   bool active;
 +    BlockDriverState *base_overlay;
   } BDRVStateCOR;
       static int cor_open(BlockDriverState *bs, QDict *options, int
 flags,
   Error **errp)
   {
 +    BlockDriverState *base_overlay = NULL;
   BDRVStateCOR *state = bs->opaque;
 +    /* We need the base overlay node rather than the base itself */
 +    const char *base_overlay_node = qdict_get_try_str(options,
 "base");
>>>
>>> Shouldn’t it be called base-overlay or above-base then?
>>
>> While reviewing patch 5, I realized this sould indeed be base-overlay

(Just realized that sounds different from how I meant it.  I meant that
 “above-base” would be wrong, so from the two, if any, it should be
“base-overlay”.)

>> (i.e. the next allocation-bearing layer above the base, not just a
>> filter on top of base), but that should be noted somewhere, of course.
>> Best would be the QAPI documentation for this option. O:)
>>
> 
> What about naming it just "bottom" or "bottom-node"? If we don't have
> base, it's strange to have something "relative to base". And we can
> document, that "bottom" must be a non-filter node in a backing chain of
> "top".

Works for me, too.

Max





Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 19:08, Andrey Shinkevich wrote:

On 14.10.2020 14:09, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

We are going to use the COR-filter for a block-stream job.
To limit COR operations by the base node in the backing chain during
stream job, pass the name of overlay base node to the copy-on-read
driver as base node itself may change due to possible concurrent jobs.
The rest of the functionality will be implemented in the patch that
follows.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  1 file changed, 14 insertions(+)


Is there a reason why you didn’t add this option to QAPI (as part of a
yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.



I agree that passing a base overlay under the base option looks clumsy. We 
could pass the base node name and find its overlay ourselves here in 
cor_open(). In that case, we can use the existing QAPI.


Actually, there is no existing QAPI: if you don't modify qapi/*.json, the user is 
not able to pass the option through QAPI. It's still possible to pass the 
option on the command line, or when creating the filter internally (like we are 
going to do in block-stream), but not through QAPI. So, it's better to add a 
new QAPI parameter, to make the new option available through the QMP interface.


The reason I used the existing QAPI is to make it easier for a user to operate 
with the traditional options and to keep things simple. So, the user shouldn't 
think what overlay or above-base node to pass.
If we introduce the specific BlockdevOptionsCor, what other options may come 
with?


diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index bcccf0f..c578b1b 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,24 @@
  #include "block/block_int.h"
  #include "qemu/module.h"
  #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
  #include "qapi/qmp/qdict.h"
  #include "block/copy-on-read.h"
  typedef struct BDRVStateCOR {
  bool active;
+    BlockDriverState *base_overlay;
  } BDRVStateCOR;
  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
  {
+    BlockDriverState *base_overlay = NULL;
  BDRVStateCOR *state = bs->opaque;
+    /* We need the base overlay node rather than the base itself */
+    const char *base_overlay_node = qdict_get_try_str(options, "base");


Shouldn’t it be called base-overlay or above-base then?



The base_overlay identifier is used below as the pointer to BS. The 
base_overlay_node stands for the name of the node. I used that identifier to 
differ between the types. And the above_base has another meaning per 
block/stream.c - it can be a temporary filter with a JSON-name.


  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
 BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
+    if (base_overlay_node) {
+    qdict_del(options, "base");
+    base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);


I think this is a use-after-free.  The storage @base_overlay_node points
to belongs to a QString, which is referenced only by @options; so
deleting that element of @options should free that string.

Max



I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
Thank you.

Andrey


+    if (!base_overlay) {
+    error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
+    return -EINVAL;
+    }
+    }
  state->active = true;
+    state->base_overlay = base_overlay;
  /*
   * We don't need to call bdrv_child_refresh_perms() now as the permissions







--
Best regards,
Vladimir



Re: [PATCH v2 03/15] python: add VERSION file

2020-10-14 Thread John Snow

On 10/14/20 10:29 AM, John Snow wrote:

Python infrastructure as it exists today is not reliably capable of
single-sourcing a package version from a parent directory. The authors
of pip are working to correct this, but as of today this is not possible
to my knowledge.

The problem is that when using pip to build and install a python
package, it copies files over to a temporary directory and performs its
build there. This loses access to any information in the parent
directory, including git itself.



See also:

https://github.com/pypa/pip/issues/7549
https://github.com/pypa/pip/issues/7555


Further, Python versions have a standard (PEP 440) that may or may not
follow QEMU's versioning. In general, it does; but naturally QEMU does
not follow PEP 440. To avoid any automatically-generated conflict, a
manual version file is preferred.


I am proposing:

- Python core tooling synchronizes with the QEMU version directly
   (5.2.0, 5.1.1, 5.3.0, etc.)

- In the event that a Python package needs to be updated independently
   of the QEMU version, a pre-release alpha version should be preferred,
   but *only* after inclusion to the qemu development or stable branches.

   e.g. 5.2.0a1, 5.2.0a2, and so on should be preferred prior to 5.2.0's
   release.

- The Python core tooling makes absolutely no version compatibility
   checks or constraints. It *may* work with releases of QEMU from the
   past or future, but it is not required to.

   i.e., "qemu.core" will always remain in lock-step with QEMU.

- We reserve the right to split out e.g. qemu.core.qmp to qemu.qmp
   and begin independently versioning such a package separately from the
   QEMU version it accompanies.


Implement this versioning scheme by adding a VERSION file and setting it
to 5.2.0a1.

Signed-off-by: John Snow 
---
  python/VERSION   | 1 +
  python/setup.cfg | 1 +
  2 files changed, 2 insertions(+)
  create mode 100644 python/VERSION

diff --git a/python/VERSION b/python/VERSION
new file mode 100644
index 00..2e81039c82
--- /dev/null
+++ b/python/VERSION
@@ -0,0 +1 @@
+5.2.0a1
diff --git a/python/setup.cfg b/python/setup.cfg
index 12b99a796e..260f7f4e94 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -1,5 +1,6 @@
  [metadata]
  name = qemu
+version = file:VERSION
  maintainer = QEMU Developer Team
  maintainer_email = qemu-de...@nongnu.org
  url = https://www.qemu.org/






[PATCH 15/15] block/nvme: Set request_alignment at initialization

2020-10-14 Thread Philippe Mathieu-Daudé
When introducing this driver in commit bdd6a90a9e5
("block: Add VFIO based NVMe driver") we correctly
set the request_alignment in nvme_refresh_limits()
but forgot to set it at initialization. Do it now.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nvme.c b/block/nvme.c
index 4d2b7fdf4f4..04608b49f56 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -745,6 +745,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 s->page_size = MAX(4096, 1 << NVME_CAP_MPSMIN(cap));
 s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
 bs->bl.opt_mem_alignment = s->page_size;
+bs->bl.request_alignment = s->page_size;
 timeout_ms = MIN(500 * NVME_CAP_TO(cap), 3);
 
 /* Reset device to get a clean state. */
-- 
2.26.2




[PATCH 14/15] block/nvme: Report warning with warn_report()

2020-10-14 Thread Philippe Mathieu-Daudé
Instead of displaying the warning on stderr, use warn_report(),
which also displays it on the monitor.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index ea5180d8a27..4d2b7fdf4f4 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -390,8 +390,8 @@ static bool nvme_process_completion(NVMeQueuePair *q)
 }
 cid = le16_to_cpu(c->cid);
 if (cid == 0 || cid > NVME_QUEUE_SIZE) {
-fprintf(stderr, "Unexpected CID in completion queue: %" PRIu32 
"\n",
-cid);
+warn_report("NVMe: Unexpected CID in completion queue: %"PRIu32", "
+"queue size: %u", cid, NVME_QUEUE_SIZE);
 continue;
 }
 trace_nvme_complete_command(s, q->index, cid);
-- 
2.26.2




[PATCH 11/15] block/nvme: Trace controller capabilities

2020-10-14 Thread Philippe Mathieu-Daudé
Controllers have different capabilities and report them in the
CAP register. We are particularly interested in the page size
limits.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c   | 10 ++
 block/trace-events |  1 +
 2 files changed, 11 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index 11fba2d754d..fad727416ee 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -725,6 +725,16 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
  * Initialization". */
 
 cap = le64_to_cpu(regs->cap);
+trace_nvme_controller_capability("Maximum Queue Entries Supported",
+ NVME_CAP_MQES(cap));
+trace_nvme_controller_capability("Contiguous Queues Required",
+ NVME_CAP_CQR(cap));
+trace_nvme_controller_capability("Subsystem  Reset Supported",
+ NVME_CAP_NSSRS(cap));
+trace_nvme_controller_capability("Memory Page Size Minimum",
+ NVME_CAP_MPSMIN(cap));
+trace_nvme_controller_capability("Memory Page Size Maximum",
+ NVME_CAP_MPSMAX(cap));
 if (!NVME_CAP_CSS(cap)) {
 error_setg(errp, "Device doesn't support NVMe command set");
 ret = -EINVAL;
diff --git a/block/trace-events b/block/trace-events
index 6694c23e1c1..8a24d7a8650 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -134,6 +134,7 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t start, 
size_t len, uint64_t
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
"s %p acb %p ret %d offset %"PRIu64" len %zu"
 
 # nvme.c
+nvme_controller_capability(const char *desc, uint64_t value) "%s: %"PRIu64
 nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
"cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
-- 
2.26.2




[PATCH 13/15] block/nvme: Simplify ADMIN queue access

2020-10-14 Thread Philippe Mathieu-Daudé
We don't need to dereference BDRVNVMeState each time.
Keep an NVMeQueuePair pointer to the admin queue and use it.
nvme_init() becomes easier to review, matching the style
of nvme_add_io_queue().

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 299fc82f40f..ea5180d8a27 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -690,6 +690,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
  Error **errp)
 {
 BDRVNVMeState *s = bs->opaque;
+NVMeQueuePair *q;
 AioContext *aio_context = bdrv_get_aio_context(bs);
 int ret;
 uint64_t cap;
@@ -769,19 +770,20 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 
 /* Set up admin queue. */
 s->queues = g_new(NVMeQueuePair *, 1);
-s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context, 0,
+q = nvme_create_queue_pair(s, aio_context, 0,
   NVME_QUEUE_SIZE,
   errp);
-if (!s->queues[INDEX_ADMIN]) {
+if (!q) {
 ret = -EINVAL;
 goto out;
 }
+s->queues[INDEX_ADMIN] = q;
 s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
 regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << AQA_ACQS_SHIFT) |
 (NVME_QUEUE_SIZE << AQA_ASQS_SHIFT));
-regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
-regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
+regs->asq = cpu_to_le64(q->sq.iova);
+regs->acq = cpu_to_le64(q->cq.iova);
 
 /* After setting up all control registers we can enable device now. */
 regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << CC_IOCQES_SHIFT) |
-- 
2.26.2




[PATCH 12/15] block/nvme: Simplify device reset

2020-10-14 Thread Philippe Mathieu-Daudé
Avoid multiple endianness conversions by using the device endianness.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index fad727416ee..299fc82f40f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -747,7 +747,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 timeout_ms = MIN(500 * NVME_CAP_TO(cap), 3);
 
 /* Reset device to get a clean state. */
-regs->cc = cpu_to_le32(le32_to_cpu(regs->cc) & 0xFE);
+regs->cc &= const_le32(0xFE);
 /* Wait for CSTS.RDY = 0. */
 deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
 while (NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
-- 
2.26.2




[PATCH 08/15] block/nvme: Pass AioContext argument to nvme_add_io_queue()

2020-10-14 Thread Philippe Mathieu-Daudé
We want to get rid of the BlockDriverState pointer at some point,
so pass aio_context along.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 523814a1243..b841c5950c5 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -642,7 +642,9 @@ static bool nvme_poll_cb(void *opaque)
 return nvme_poll_queues(s);
 }
 
-static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
+/* Returns true on success, false on failure. */
+static bool nvme_add_io_queue(BlockDriverState *bs,
+  AioContext *aio_context, Error **errp)
 {
 BDRVNVMeState *s = bs->opaque;
 unsigned n = s->nr_queues;
@@ -650,8 +652,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
**errp)
 NvmeCmd cmd;
 unsigned queue_size = NVME_QUEUE_SIZE;
 
-q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
-   n, queue_size, errp);
+q = nvme_create_queue_pair(s, aio_context, n, queue_size, errp);
 if (!q) {
 return false;
 }
@@ -805,7 +806,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 }
 
 /* Set up command queues. */
-if (!nvme_add_io_queue(bs, errp)) {
+if (!nvme_add_io_queue(bs, aio_context, errp)) {
 ret = -EIO;
 }
 out:
-- 
2.26.2




[PATCH 02/15] block/nvme: Trace nvme_poll_queue() per queue

2020-10-14 Thread Philippe Mathieu-Daudé
As we want to enable multiple queues, report the event
in each nvme_poll_queue() call, rather than once in
the callback calling nvme_poll_queues().

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c   | 2 +-
 block/trace-events | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index a534c61b6b6..7c253eafa7f 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -585,6 +585,7 @@ static bool nvme_poll_queue(NVMeQueuePair *q)
 const size_t cqe_offset = q->cq.head * NVME_CQ_ENTRY_BYTES;
 NvmeCqe *cqe = (NvmeCqe *)&q->cq.queue[cqe_offset];
 
+trace_nvme_poll_queue(q->s, q->index);
 /*
  * Do an early check for completions. q->lock isn't needed because
  * nvme_process_completion() only runs in the event loop thread and
@@ -633,7 +634,6 @@ static bool nvme_poll_cb(void *opaque)
 BDRVNVMeState *s = container_of(e, BDRVNVMeState,
 irq_notifier[MSIX_SHARED_IRQ_IDX]);
 
-trace_nvme_poll_cb(s);
 return nvme_poll_queues(s);
 }
 
diff --git a/block/trace-events b/block/trace-events
index 0e351c3fa3d..fa50af6b6f3 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -143,7 +143,7 @@ nvme_complete_command(void *s, int index, int cid) "s %p 
queue %d cid %d"
 nvme_submit_command(void *s, int index, int cid) "s %p queue %d cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int 
c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
 nvme_handle_event(void *s) "s %p"
-nvme_poll_cb(void *s) "s %p"
+nvme_poll_queue(void *s, unsigned q_index) "s %p q #%u"
 nvme_prw_aligned(void *s, int is_write, uint64_t offset, uint64_t bytes, int 
flags, int niov) "s %p is_write %d offset %"PRId64" bytes %"PRId64" flags %d 
niov %d"
 nvme_write_zeroes(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p 
offset %"PRId64" bytes %"PRId64" flags %d"
 nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int 
align) "qiov %p n %d base %p size 0x%zx align 0x%x"
-- 
2.26.2




[PATCH 06/15] block/nvme: Make nvme_identify() return boolean indicating error

2020-10-14 Thread Philippe Mathieu-Daudé
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating an error is set or not.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 95f19e12cd6..95f8f8b360b 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -496,9 +496,11 @@ static int nvme_cmd_sync(BlockDriverState *bs, 
NVMeQueuePair *q,
 return ret;
 }
 
-static void nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
+/* Returns true on success, false on failure. */
+static bool nvme_identify(BlockDriverState *bs, int namespace, Error **errp)
 {
 BDRVNVMeState *s = bs->opaque;
+bool ret = false;
 union {
 NvmeIdCtrl ctrl;
 NvmeIdNs ns;
@@ -575,10 +577,13 @@ static void nvme_identify(BlockDriverState *bs, int 
namespace, Error **errp)
 goto out;
 }
 
+ret = true;
 s->blkshift = lbaf->ds;
 out:
 qemu_vfio_dma_unmap(s->vfio, id);
 qemu_vfree(id);
+
+return ret;
 }
 
 static bool nvme_poll_queue(NVMeQueuePair *q)
-- 
2.26.2
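
As an aside, for readers unfamiliar with the convention cited in the commit
message above (commit e3fe3988d7, "error: Document Error API usage rules"):
the point of returning a success value next to Error **errp is that callers
can branch on the return value instead of declaring a local Error * and
propagating it. A minimal, hypothetical sketch of the pattern -- the function
names are invented, only the Error API calls are real:

    #include "qemu/osdep.h"
    #include "qapi/error.h"

    /* Returns true on success, false on failure (and sets *errp). */
    static bool frob_set_size(size_t size, Error **errp)
    {
        if (size == 0) {
            error_setg(errp, "size must not be zero");
            return false;
        }
        /* ... do the actual work ... */
        return true;
    }

    /* Caller: no local Error * and no error_propagate() needed. */
    static bool frob_init(size_t size, Error **errp)
    {
        if (!frob_set_size(size, errp)) {
            return false;
        }
        return true;
    }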




[PATCH 09/15] block/nvme: Introduce Completion Queue definitions

2020-10-14 Thread Philippe Mathieu-Daudé
Rename Submission Queue flags with 'Sq' and introduce
Completion Queue flag definitions.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/block/nvme.h | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 65e68a82c89..079f884a2d3 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -491,6 +491,11 @@ typedef struct QEMU_PACKED NvmeCreateCq {
 #define NVME_CQ_FLAGS_PC(cq_flags)  (cq_flags & 0x1)
 #define NVME_CQ_FLAGS_IEN(cq_flags) ((cq_flags >> 1) & 0x1)
 
+enum NvmeFlagsCq {
+NVME_CQ_PC  = 1,
+NVME_CQ_IEN = 2,
+};
+
 typedef struct QEMU_PACKED NvmeCreateSq {
 uint8_t opcode;
 uint8_t flags;
@@ -508,12 +513,12 @@ typedef struct QEMU_PACKED NvmeCreateSq {
 #define NVME_SQ_FLAGS_PC(sq_flags)  (sq_flags & 0x1)
 #define NVME_SQ_FLAGS_QPRIO(sq_flags)   ((sq_flags >> 1) & 0x3)
 
-enum NvmeQueueFlags {
-NVME_Q_PC   = 1,
-NVME_Q_PRIO_URGENT  = 0,
-NVME_Q_PRIO_HIGH= 1,
-NVME_Q_PRIO_NORMAL  = 2,
-NVME_Q_PRIO_LOW = 3,
+enum NvmeFlagsSq {
+NVME_SQ_PC  = 1,
+NVME_SQ_PRIO_URGENT = 0,
+NVME_SQ_PRIO_HIGH   = 1,
+NVME_SQ_PRIO_NORMAL = 2,
+NVME_SQ_PRIO_LOW= 3,
 };
 
 typedef struct QEMU_PACKED NvmeIdentify {
-- 
2.26.2




[PATCH 10/15] block/nvme: Use definitions instead of magic values in add_io_queue()

2020-10-14 Thread Philippe Mathieu-Daudé
Replace magic values with definitions, and simplify, since the
number of queues will never reach 64K.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index b841c5950c5..11fba2d754d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -652,6 +652,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs,
 NvmeCmd cmd;
 unsigned queue_size = NVME_QUEUE_SIZE;
 
+assert(n <= UINT16_MAX);
 q = nvme_create_queue_pair(s, aio_context, n, queue_size, errp);
 if (!q) {
 return false;
@@ -659,8 +660,8 @@ static bool nvme_add_io_queue(BlockDriverState *bs,
 cmd = (NvmeCmd) {
 .opcode = NVME_ADM_CMD_CREATE_CQ,
 .dptr.prp1 = cpu_to_le64(q->cq.iova),
-.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
-.cdw11 = cpu_to_le32(0x3),
+.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
+.cdw11 = cpu_to_le32(NVME_CQ_IEN | NVME_CQ_PC),
 };
if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
 error_setg(errp, "Failed to create CQ io queue [%u]", n);
@@ -669,8 +670,8 @@ static bool nvme_add_io_queue(BlockDriverState *bs,
 cmd = (NvmeCmd) {
 .opcode = NVME_ADM_CMD_CREATE_SQ,
 .dptr.prp1 = cpu_to_le64(q->sq.iova),
-.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | (n & 0x)),
-.cdw11 = cpu_to_le32(0x1 | (n << 16)),
+.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n),
+.cdw11 = cpu_to_le32(NVME_SQ_PC | (n << 16)),
 };
if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
 error_setg(errp, "Failed to create SQ io queue [%u]", n);
-- 
2.26.2
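
For reference, the reason the assert() in the hunk above is enough to drop the
old 0x mask: in the NVMe Create I/O Completion/Submission Queue admin
commands, CDW10 carries the queue size minus one in bits 31:16 and the queue
identifier in bits 15:0, so the identifier must fit in 16 bits anyway. A
hedged sketch of the packing, mirroring the code above:

    /* Sketch: CDW10 bits 31:16 = queue size - 1, bits 15:0 = queue id. */
    assert(n <= UINT16_MAX);
    cmd.cdw10 = cpu_to_le32(((queue_size - 1) << 16) | n);
    /* CDW11 for the CQ: physically contiguous, interrupts enabled. */
    cmd.cdw11 = cpu_to_le32(NVME_CQ_IEN | NVME_CQ_PC);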




[PATCH 07/15] block/nvme: Make nvme_init_queue() return boolean indicating error

2020-10-14 Thread Philippe Mathieu-Daudé
Just for consistency, following the example documented since
commit e3fe3988d7 ("error: Document Error API usage rules"),
return a boolean value indicating whether an error is set.
This simplifies nvme_create_queue_pair() a bit.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 95f8f8b360b..523814a1243 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -153,7 +153,8 @@ static QemuOptsList runtime_opts = {
 },
 };
 
-static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
+/* Returns true on success, false on failure. */
+static bool nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
 unsigned nentries, size_t entry_bytes, Error 
**errp)
 {
 size_t bytes;
@@ -164,13 +165,14 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue 
*q,
 q->queue = qemu_try_memalign(s->page_size, bytes);
 if (!q->queue) {
 error_setg(errp, "Cannot allocate queue");
-return;
+return false;
 }
 memset(q->queue, 0, bytes);
r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
 if (r) {
 error_prepend(errp, "Cannot map queue: ");
 }
+return r == 0;
 }
 
 static void nvme_free_queue_pair(NVMeQueuePair *q)
@@ -203,7 +205,6 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState 
*s,
  Error **errp)
 {
 int i, r;
-Error *local_err = NULL;
 NVMeQueuePair *q;
 uint64_t prp_list_iova;
 
@@ -240,16 +241,12 @@ static NVMeQueuePair 
*nvme_create_queue_pair(BDRVNVMeState *s,
 req->prp_list_iova = prp_list_iova + i * s->page_size;
 }
 
-nvme_init_queue(s, &q->sq, size, NVME_SQ_ENTRY_BYTES, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!nvme_init_queue(s, &q->sq, size, NVME_SQ_ENTRY_BYTES, errp)) {
 goto fail;
 }
q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
 
-nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, errp)) {
 goto fail;
 }
q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
-- 
2.26.2




[PATCH 04/15] block/nvme: Improve nvme_free_req_queue_wait() trace information

2020-10-14 Thread Philippe Mathieu-Daudé
What we want to trace is the block driver state and the queue index.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c   | 2 +-
 block/trace-events | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index d84206b598d..e9410f2e0eb 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -286,7 +286,7 @@ static NVMeRequest *nvme_get_free_req(NVMeQueuePair *q)
 
 while (q->free_req_head == -1) {
 if (qemu_in_coroutine()) {
-trace_nvme_free_req_queue_wait(q);
+trace_nvme_free_req_queue_wait(q->s, q->index);
qemu_co_queue_wait(&q->free_req_queue, &q->lock);
 } else {
qemu_mutex_unlock(&q->lock);
diff --git a/block/trace-events b/block/trace-events
index 3bb5a238601..410789188cc 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -152,7 +152,7 @@ nvme_rw_done(void *s, int is_write, uint64_t offset, 
uint64_t bytes, int ret) "s
 nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset %"PRId64" 
bytes %"PRId64""
 nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset 
%"PRId64" bytes %"PRId64" ret %d"
 nvme_dma_map_flush(void *s) "s %p"
-nvme_free_req_queue_wait(void *q) "q %p"
+nvme_free_req_queue_wait(void *s, unsigned q_index) "s %p q #%u"
 nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
%p cmd %p req %p qiov %p entries %d"
 nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 
0x%"PRIx64
 nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p 
pages %d"
-- 
2.26.2




Re: [PATCH v3] migration/block-dirty-bitmap: fix uninitialized variable warning

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 14:44, Chen Qun wrote:

A default value is provided for the variable 'bitmap_name' to avoid a
compiler warning.

The compiler shows the warning:
migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
may be used uninitialized in this function [-Wmaybe-uninitialized]
g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name));
^~

Reported-by: Euler Robot
Signed-off-by: Chen Qun


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



[PATCH 05/15] block/nvme: Trace queue pair creation/deletion

2020-10-14 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c   | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index e9410f2e0eb..95f19e12cd6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -175,6 +175,7 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
 
 static void nvme_free_queue_pair(NVMeQueuePair *q)
 {
+trace_nvme_free_queue_pair(q->index, q);
 if (q->completion_bh) {
 qemu_bh_delete(q->completion_bh);
 }
@@ -210,6 +211,8 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState 
*s,
 if (!q) {
 return NULL;
 }
+trace_nvme_create_queue_pair(idx, q, size, aio_context,
+ event_notifier_get_fd(s->irq_notifier));
 q->prp_list_pages = qemu_try_memalign(s->page_size,
   s->page_size * NVME_NUM_REQS);
 if (!q->prp_list_pages) {
diff --git a/block/trace-events b/block/trace-events
index 410789188cc..6694c23e1c1 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -153,6 +153,8 @@ nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p 
offset %"PRId64" bytes
 nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset 
%"PRId64" bytes %"PRId64" ret %d"
 nvme_dma_map_flush(void *s) "s %p"
 nvme_free_req_queue_wait(void *s, unsigned q_index) "s %p q #%u"
+nvme_create_queue_pair(unsigned q_index, void *q, unsigned size, void 
*aio_context, int fd) "index %u q %p size %u aioctx %p fd %d"
+nvme_free_queue_pair(unsigned q_index, void *q) "index %u q %p"
 nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s 
%p cmd %p req %p qiov %p entries %d"
 nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 
0x%"PRIx64
 nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p 
pages %d"
-- 
2.26.2




[PATCH 03/15] block/nvme: Use unsigned integer for queue counter/size

2020-10-14 Thread Philippe Mathieu-Daudé
We cannot have a negative queue count/size/index, so use an unsigned type.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c   | 24 +++-
 block/trace-events | 10 +-
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 7c253eafa7f..d84206b598d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -103,7 +103,7 @@ struct BDRVNVMeState {
  * [1..]: io queues.
  */
 NVMeQueuePair **queues;
-int nr_queues;
+unsigned nr_queues;
 size_t page_size;
 /* How many uint32_t elements does each doorbell entry take. */
 size_t doorbell_scale;
@@ -154,7 +154,7 @@ static QemuOptsList runtime_opts = {
 };
 
 static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
-int nentries, int entry_bytes, Error **errp)
+unsigned nentries, size_t entry_bytes, Error 
**errp)
 {
 size_t bytes;
 int r;
@@ -198,7 +198,7 @@ static void nvme_free_req_queue_cb(void *opaque)
 
 static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState *s,
  AioContext *aio_context,
- int idx, int size,
+ unsigned idx, size_t size,
  Error **errp)
 {
 int i, r;
@@ -640,10 +640,10 @@ static bool nvme_poll_cb(void *opaque)
 static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
 {
 BDRVNVMeState *s = bs->opaque;
-int n = s->nr_queues;
+unsigned n = s->nr_queues;
 NVMeQueuePair *q;
 NvmeCmd cmd;
-int queue_size = NVME_QUEUE_SIZE;
+unsigned queue_size = NVME_QUEUE_SIZE;
 
 q = nvme_create_queue_pair(s, bdrv_get_aio_context(bs),
n, queue_size, errp);
@@ -657,7 +657,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
**errp)
 .cdw11 = cpu_to_le32(0x3),
 };
if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
-error_setg(errp, "Failed to create CQ io queue [%d]", n);
+error_setg(errp, "Failed to create CQ io queue [%u]", n);
 goto out_error;
 }
 cmd = (NvmeCmd) {
@@ -667,7 +667,7 @@ static bool nvme_add_io_queue(BlockDriverState *bs, Error 
**errp)
 .cdw11 = cpu_to_le32(0x1 | (n << 16)),
 };
if (nvme_cmd_sync(bs, s->queues[INDEX_ADMIN], &cmd)) {
-error_setg(errp, "Failed to create SQ io queue [%d]", n);
+error_setg(errp, "Failed to create SQ io queue [%u]", n);
 goto out_error;
 }
 s->queues = g_renew(NVMeQueuePair *, s->queues, n + 1);
@@ -869,10 +869,9 @@ static int 
nvme_enable_disable_write_cache(BlockDriverState *bs, bool enable,
 
 static void nvme_close(BlockDriverState *bs)
 {
-int i;
 BDRVNVMeState *s = bs->opaque;
 
-for (i = 0; i < s->nr_queues; ++i) {
+for (unsigned i = 0; i < s->nr_queues; ++i) {
 nvme_free_queue_pair(s->queues[i]);
 }
 g_free(s->queues);
@@ -1380,7 +1379,7 @@ static void nvme_detach_aio_context(BlockDriverState *bs)
 {
 BDRVNVMeState *s = bs->opaque;
 
-for (int i = 0; i < s->nr_queues; i++) {
+for (unsigned i = 0; i < s->nr_queues; i++) {
 NVMeQueuePair *q = s->queues[i];
 
 qemu_bh_delete(q->completion_bh);
@@ -1401,7 +1400,7 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
aio_set_event_notifier(new_context, &s->irq_notifier[MSIX_SHARED_IRQ_IDX],
false, nvme_handle_event, nvme_poll_cb);
 
-for (int i = 0; i < s->nr_queues; i++) {
+for (unsigned i = 0; i < s->nr_queues; i++) {
 NVMeQueuePair *q = s->queues[i];
 
 q->completion_bh =
@@ -1418,11 +1417,10 @@ static void nvme_aio_plug(BlockDriverState *bs)
 
 static void nvme_aio_unplug(BlockDriverState *bs)
 {
-int i;
 BDRVNVMeState *s = bs->opaque;
 assert(s->plugged);
 s->plugged = false;
-for (i = INDEX_IO(0); i < s->nr_queues; i++) {
+for (unsigned i = INDEX_IO(0); i < s->nr_queues; i++) {
 NVMeQueuePair *q = s->queues[i];
qemu_mutex_lock(&q->lock);
 nvme_kick(q);
diff --git a/block/trace-events b/block/trace-events
index fa50af6b6f3..3bb5a238601 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -134,13 +134,13 @@ qed_aio_write_postfill(void *s, void *acb, uint64_t 
start, size_t len, uint64_t
 qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) 
"s %p acb %p ret %d offset %"PRIu64" len %zu"
 
 # nvme.c
-nvme_kick(void *s, int queue) "s %p queue %d"
+nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
"cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
-nvme_process_completion(void *s, int index, int inflight) "s %p queue %d 
inflight %d"
-nvme_process_completion_queue_plugged(void *s, int index) "s %p queue %d"
-nvme_complete_command(void 

[PATCH 00/15] block/nvme: Improve debugging experience and minor fixes

2020-10-14 Thread Philippe Mathieu-Daudé
Another set of boring patches, in preparation for
supporting multiple queues (the next series).

Based-on: <20201014115253.25276-1-phi...@redhat.com>
"util/vfio-helpers: Improve debugging experience"
https://www.mail-archive.com/qemu-block@nongnu.org/msg75637.html

Philippe Mathieu-Daudé (15):
  block/nvme: Move nvme_poll_cb() earlier
  block/nvme: Trace nvme_poll_queue() per queue
  block/nvme: Use unsigned integer for queue counter/size
  block/nvme: Improve nvme_free_req_queue_wait() trace information
  block/nvme: Trace queue pair creation/deletion
  block/nvme: Make nvme_identify() return boolean indicating error
  block/nvme: Make nvme_init_queue() return boolean indicating error
  block/nvme: Pass AioContext argument to nvme_add_io_queue()
  block/nvme: Introduce Completion Queue definitions
  block/nvme: Use definitions instead of magic values in add_io_queue()
  block/nvme: Trace controller capabilities
  block/nvme: Simplify device reset
  block/nvme: Simplify ADMIN queue access
  block/nvme: Report warning with warn_report()
  block/nvme: Set request_alignment at initialization

 include/block/nvme.h |  17 ---
 block/nvme.c | 116 +--
 block/trace-events   |  17 ---
 3 files changed, 88 insertions(+), 62 deletions(-)

-- 
2.26.2





[PATCH 01/15] block/nvme: Move nvme_poll_cb() earlier

2020-10-14 Thread Philippe Mathieu-Daudé
We are going to use this callback in nvme_add_io_queue()
in a few commits. To avoid forward-declaring it, move it
earlier. No logical change.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5f662c55bbe..a534c61b6b6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -627,6 +627,16 @@ static void nvme_handle_event(EventNotifier *n)
 nvme_poll_queues(s);
 }
 
+static bool nvme_poll_cb(void *opaque)
+{
+EventNotifier *e = opaque;
+BDRVNVMeState *s = container_of(e, BDRVNVMeState,
+irq_notifier[MSIX_SHARED_IRQ_IDX]);
+
+trace_nvme_poll_cb(s);
+return nvme_poll_queues(s);
+}
+
 static bool nvme_add_io_queue(BlockDriverState *bs, Error **errp)
 {
 BDRVNVMeState *s = bs->opaque;
@@ -669,16 +679,6 @@ out_error:
 return false;
 }
 
-static bool nvme_poll_cb(void *opaque)
-{
-EventNotifier *e = opaque;
-BDRVNVMeState *s = container_of(e, BDRVNVMeState,
-irq_notifier[MSIX_SHARED_IRQ_IDX]);
-
-trace_nvme_poll_cb(s);
-return nvme_poll_queues(s);
-}
-
 static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
  Error **errp)
 {
-- 
2.26.2




Re: [PATCH v11 11/13] stream: mark backing-file argument as deprecated

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 18:03, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Whereas the block-stream job starts using a backing file name of the
base node overlay after the block-stream job completes, mark the QMP
'backing-file' argument as deprecated.

Signed-off-by: Andrey Shinkevich 
---
  docs/system/deprecated.rst | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 8b3ab5b..7491fcf 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -285,6 +285,12 @@ details.
  The ``query-events`` command has been superseded by the more powerful
  and accurate ``query-qmp-schema`` command.
  
+``block-stream`` argument ``backing-file`` (since 5.2)

+'
+
+The argument ``backing-file`` is deprecated. QEMU uses a backing file
+name of the base node overlay after the block-stream job completes.
+


Hm, why?  I don’t see the problem with it.



My wrong idea, sorry. I believed that the argument was unused when I was
reviewing v10. But it actually only becomes unused during the series, and that is wrong.

--
Best regards,
Vladimir



Re: [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 18:02, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Avoid writing a filter JSON-name to QCOW2 image when the backing file
is changed after the block stream job.

Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index e0540ee..51462bd 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
  BlockDriverState *bs = blk_bs(bjob->blk);
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
  Error *local_err = NULL;
  int ret = 0;
  
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
  
  if (bdrv_cow_child(unfiltered_bs)) {

  const char *base_id = NULL, *base_fmt = NULL;
-if (base) {
-base_id = s->backing_file_str;
-if (base->drv) {
-base_fmt = base->drv->format_name;
+if (base_unfiltered) {
+base_id = base_unfiltered->filename;


I think you have to leave this querying s->backing_file_str, and instead
change how qmp_block_stream() gets @base_name.  block-stream has a
backing-file parameter that can override the string that should be used
here.

(Or perhaps you can let qmp_block_stream() just set it to NULL if no
backing-file parameter is passed and then fall back to
base_unfiltered->filename only here.  Probably better, actually, in case
base_unfiltered is changed during the job run.)



Agree with the way in brackets.

If the user sets the backing-file parameter, we should handle it here.

If backing-file is not set, we should use the dynamically calculated unfiltered
base in stream_prepare().

--
Best regards,
Vladimir
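
To make the agreed fallback concrete, a rough sketch of what stream_prepare()
could end up doing (a sketch only; the exact shape is up to the next revision):
qmp_block_stream() would leave backing_file_str NULL when the user passed no
backing-file argument, and stream_prepare() would then fall back to the
filename of the unfiltered base as it stands at job completion:

    BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
    const char *base_id = NULL, *base_fmt = NULL;

    if (base_unfiltered) {
        /* An explicit backing-file argument wins over the node's filename. */
        base_id = s->backing_file_str ?: base_unfiltered->filename;
        if (base_unfiltered->drv) {
            base_fmt = base_unfiltered->drv->format_name;
        }
    }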



Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 15:51, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 13 +
  block/io.c   |  3 ++-
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b136895..278a11a 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -148,10 +148,15 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
  }
  }
  
-ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,

-  local_flags);
-if (ret < 0) {
-return ret;
+if (!!(flags & BDRV_REQ_PREFETCH) &


How about dropping the double negation and using a logical && instead of
the binary &?


+!(local_flags & BDRV_REQ_COPY_ON_READ)) {
+/* Skip non-guest reads if no copy needed */
+} else {


Hm.  I would have just written the negated form

(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

and put the “skip” comment above that condition.

(Since local_flags is initialized to flags, it can be written as a
single comparison, but that’s a matter of taste and I’m not going to
recommend either over the other:

((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
BDRV_REQ_PREFETCH)

)


+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
  }
  
  offset += n;

diff --git a/block/io.c b/block/io.c
index 11df188..bff1808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
  
  max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);

  if (bytes <= max_bytes && bytes <= max_transfer) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
+ flags & bs->supported_read_flags);



When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL.
This means that we can't just drop the flag when calling a driver that doesn't
support it.

Actually, if the driver doesn't support the PREFETCH flag, we should do nothing.




Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
why it isn’t.




Could it be part of patch 07? I mean, introduce the new field supported_read_flags
and handle it in generic code in one patch, prior to implementing support for it
in the COR driver.


--
Best regards,
Vladimir
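
Putting the two points above together, a hypothetical shape for the
copy-on-read hunk (not the final patch): only forward the read to bs->file
when data is actually wanted, i.e. when this is not a pure prefetch or when
copy-on-read will happen anyway -- keeping in mind that qiov may be NULL for
BDRV_REQ_PREFETCH requests:

    /* Skip non-guest (prefetch-only) reads if no copy is needed. */
    if (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ)) {
        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
                                  local_flags);
        if (ret < 0) {
            return ret;
        }
    }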



Re: [PATCH v11 12/13] stream: remove unused backing-file name parameter

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> The 'backing-file' argument is not used by the block-stream job. It
> designates a backing file name to set in QCOW2 image header after the
> block-stream job finished. A backing file name of the node above base
> is used instead.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/stream.c|  6 +-
>  blockdev.c| 21 ++---
>  include/block/block_int.h |  2 +-
>  3 files changed, 8 insertions(+), 21 deletions(-)

I don’t think you can deprecate a parameter and effectively remove
support for it in the same release.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 15:22, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich 
---
  include/block/block.h | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b..2b7efd1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,9 +71,10 @@ typedef enum {
  BDRV_REQ_NO_FALLBACK= 0x100,
  
  /*

- * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
- * on read request and means that caller doesn't really need data to be
- * written to qiov parameter which may be NULL.
+ * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
+ * flag or when the COR-filter applied to read operations and means that


There’s some word missing here, but I’m not sure what it is...  At least
an “is” before “applied”.  Perhaps something like ”or when a COR filter
is involved (in read operations)” would be better.


+ * caller doesn't really need data to be written to qiov parameter which


And this “written to” confused me for a second, because we’re reading
into qiov.  Technically, that means writing into the buffer, but, you know.

Could we rewrite the whole thing, perhaps?  Something like

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
COR filter), in which case it signals that the COR operation need not
read the data into memory (qiov), but only ensure it is copied to the
top layer (i.e., that COR is done).”



Sounds good




+ * may be NULL.
   */
  BDRV_REQ_PREFETCH  = 0x200,
  /* Mask of valid flags */







--
Best regards,
Vladimir



Re: [PATCH v11 11/13] stream: mark backing-file argument as deprecated

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Whereas the block-stream job starts using a backing file name of the
> base node overlay after the block-stream job completes, mark the QMP
> 'backing-file' argument as deprecated.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  docs/system/deprecated.rst | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> index 8b3ab5b..7491fcf 100644
> --- a/docs/system/deprecated.rst
> +++ b/docs/system/deprecated.rst
> @@ -285,6 +285,12 @@ details.
>  The ``query-events`` command has been superseded by the more powerful
>  and accurate ``query-qmp-schema`` command.
>  
> +``block-stream`` argument ``backing-file`` (since 5.2)
> +'
> +
> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
> +name of the base node overlay after the block-stream job completes.
> +

Hm, why?  I don’t see the problem with it.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Avoid writing a filter JSON-name to QCOW2 image when the backing file
> is changed after the block stream job.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/stream.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index e0540ee..51462bd 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>  BlockDriverState *bs = blk_bs(bjob->blk);
>  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> +BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
>  Error *local_err = NULL;
>  int ret = 0;
>  
> @@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
>  
>  if (bdrv_cow_child(unfiltered_bs)) {
>  const char *base_id = NULL, *base_fmt = NULL;
> -if (base) {
> -base_id = s->backing_file_str;
> -if (base->drv) {
> -base_fmt = base->drv->format_name;
> +if (base_unfiltered) {
> +base_id = base_unfiltered->filename;

I think you have to leave this querying s->backing_file_str, and instead
change how qmp_block_stream() gets @base_name.  block-stream has a
backing-file parameter that can override the string that should be used
here.

(Or perhaps you can let qmp_block_stream() just set it to NULL if no
backing-file parameter is passed and then fall back to
base_unfiltered->filename only here.  Probably better, actually, in case
base_unfiltered is changed during the job run.)

Max

> +if (base_unfiltered->drv) {
> +base_fmt = base_unfiltered->drv->format_name;
>  }
>  }
> bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2020 14:57, Max Reitz wrote:

On 14.10.20 13:09, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

We are going to use the COR-filter for a block-stream job.
To limit COR operations by the base node in the backing chain during
stream job, pass the name of overlay base node to the copy-on-read
driver as base node itself may change due to possible concurrent jobs.
The rest of the functionality will be implemented in the patch that
follows.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  1 file changed, 14 insertions(+)


Is there a reason why you didn’t add this option to QAPI (as part of a
yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.


diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index bcccf0f..c578b1b 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,24 @@
  #include "block/block_int.h"
  #include "qemu/module.h"
  #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
  #include "qapi/qmp/qdict.h"
  #include "block/copy-on-read.h"
  
  
  typedef struct BDRVStateCOR {

  bool active;
+BlockDriverState *base_overlay;
  } BDRVStateCOR;
  
  
  static int cor_open(BlockDriverState *bs, QDict *options, int flags,

  Error **errp)
  {
+BlockDriverState *base_overlay = NULL;
  BDRVStateCOR *state = bs->opaque;
+/* We need the base overlay node rather than the base itself */
+const char *base_overlay_node = qdict_get_try_str(options, "base");


Shouldn’t it be called base-overlay or above-base then?


While reviewing patch 5, I realized this sould indeed be base-overlay
(i.e. the next allocation-bearing layer above the base, not just a
filter on top of base), but that should be noted somewhere, of course.
Best would be the QAPI documentation for this option. O:)



What about naming it just "bottom" or "bottom-node"? If we don't have base, it's strange to have something
"relative to base". And we can document that "bottom" must be a non-filter node in the backing chain of
"top".


--
Best regards,
Vladimir
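
To make the naming discussion concrete, a hypothetical sketch of what
cor_open() could look like if the option were exposed as "bottom" (the option
name, its QAPI schema and the error handling are all still open questions; the
base_overlay field name is taken from the patch under review):

    static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
    {
        BDRVStateCOR *state = bs->opaque;
        const char *bottom_name = qdict_get_try_str(options, "bottom");

        if (bottom_name) {
            BlockDriverState *bottom = bdrv_find_node(bottom_name);
            if (!bottom) {
                error_setg(errp, "Bottom node '%s' not found", bottom_name);
                return -EINVAL;
            }
            state->base_overlay = bottom;
        }
        /* ... the rest of cor_open() stays as in the patch ... */
        return 0;
    }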



Re: [PATCH v4 7/7] nbd: Allow export of multiple bitmaps for one device

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

10.10.2020 00:55, Eric Blake wrote:

With this, 'qemu-nbd -B b0 -B b1 -f qcow2 img.qcow2' can let you sniff
out multiple bitmaps from one server.  qemu-img as client can still
only read one bitmap per client connection, but other NBD clients
(hello libnbd) can now read multiple bitmaps in a single pass.

Signed-off-by: Eric Blake 



You didn't update the nbd_export_create failure path; I suggest:

@@ -1533,6 +1537,7 @@ static int nbd_export_create(BlockExport *blk_exp, 
BlockExportOptions *exp_args,
 bool shared = !exp_args->writable;
 strList *bitmaps;
 int ret;
+size_t i;
 
 assert(exp_args->type == BLOCK_EXPORT_TYPE_NBD);
 
@@ -1632,11 +1637,15 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,

 goto fail;
 }
 
-bdrv_dirty_bitmap_set_busy(bm, true);

 exp->export_bitmaps[exp->nr_export_bitmaps++] = bm;
 assert(strlen(bitmap) <= BDRV_BITMAP_MAX_NAME_SIZE);
 }
 
+/* Mark bitmaps busy in a separate loop, to not bother with roll-back */

+for (i = 0; i < exp->nr_export_bitmaps; i++) {
+bdrv_dirty_bitmap_set_busy(bm, true);
+}
+
 exp->allocation_depth = arg->allocation_depth;
 
 blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);

@@ -1646,6 +1655,7 @@ static int nbd_export_create(BlockExport *blk_exp, 
BlockExportOptions *exp_args,
 return 0;
 
 fail:

+g_free(exp->export_bitmaps);
 g_free(exp->name);
 g_free(exp->description);
 return ret;

and with it:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


Also, would be good to add a comment:

@@ -29,6 +29,10 @@
 #define NBD_META_ID_BASE_ALLOCATION 0
 #define NBD_META_ID_ALLOCATION_DEPTH 1
 #define NBD_META_ID_DIRTY_BITMAP 2
+/*
+ * NBD_META_ID_DIRTY_BITMAP+i are reserved for dirty bitmaps, so keep
+ * NBD_META_ID_DIRTY_BITMAP the last one.
+ */
 
 /*

  * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical



--
Best regards,
Vladimir



[PATCH v2 15/15] python/qemu: add qemu package itself to pipenv

2020-10-14 Thread John Snow
This adds the python qemu packages themselves to the pipenv manifest.
'pipenv sync' will create a virtual environment sufficient to use the SDK.
'pipenv sync --dev' will create a virtual environment sufficient to use
and test the SDK (with pylint, mypy, isort, flake8, etc.)

The qemu packages are installed in 'editable' mode; all changes made to
the python package inside the git tree will be reflected in the
installed package without reinstallation. This includes changes made
via git pull and so on.

Signed-off-by: John Snow 
---
 python/Pipfile  | 1 +
 python/Pipfile.lock | 9 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/python/Pipfile b/python/Pipfile
index 75b96f29d8..214fb175e7 100644
--- a/python/Pipfile
+++ b/python/Pipfile
@@ -10,6 +10,7 @@ mypy = ">=0.770"
 pylint = ">=2.6.0"
 
 [packages]
+qemu = {editable = true,path = "."}
 
 [requires]
 python_version = "3.6"
diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index 74563b444c..386ca54ea6 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"b89c7a1b8a414f2a4cd708964123fb427d55419ee0b39e088bf2e7d4fbc11979"
+"sha256": 
"e38d142c3fadc2f2ed849e86f7ebd14e25974dc12228751490aef5a9ee074f2f"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -15,7 +15,12 @@
 }
 ]
 },
-"default": {},
+"default": {
+"qemu": {
+"editable": true,
+"path": "."
+}
+},
 "develop": {
 "astroid": {
 "hashes": [
-- 
2.26.2




[PATCH v2 11/15] python: move mypy.ini into setup.cfg

2020-10-14 Thread John Snow
mypy supports reading its configuration values from a central project
configuration file; do so.

Signed-off-by: John Snow 
---
 python/mypy.ini  | 4 
 python/setup.cfg | 5 +
 2 files changed, 5 insertions(+), 4 deletions(-)
 delete mode 100644 python/mypy.ini

diff --git a/python/mypy.ini b/python/mypy.ini
deleted file mode 100644
index 1a581c5f1e..00
--- a/python/mypy.ini
+++ /dev/null
@@ -1,4 +0,0 @@
-[mypy]
-strict = True
-python_version = 3.6
-warn_unused_configs = True
diff --git a/python/setup.cfg b/python/setup.cfg
index 103371ad5e..13a7a8cd9c 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -21,6 +21,11 @@ packages = find_namespace:
 [flake8]
 extend-ignore = E722  # Pylint handles this, but smarter.
 
+[mypy]
+strict = True
+python_version = 3.6
+warn_unused_configs = True
+
 [pylint.messages control]
 # Disable the message, report, category or checker with the given id(s). You
 # can either give multiple identifiers separated by comma (,) or put this
-- 
2.26.2




[PATCH v2 08/15] python: add pylint to pipenv

2020-10-14 Thread John Snow
We are specifying >= pylint 2.6.x for two reasons:

1. For setup.cfg support, added in pylint 2.5.x
2. To clarify that we are using a version that has incompatibly dropped
bad-whitespace checks.

Signed-off-by: John Snow 
---
 python/Pipfile  |   1 +
 python/Pipfile.lock | 127 
 2 files changed, 128 insertions(+)
 create mode 100644 python/Pipfile.lock

diff --git a/python/Pipfile b/python/Pipfile
index 9534830b5e..1e58986c89 100644
--- a/python/Pipfile
+++ b/python/Pipfile
url = "https://pypi.org/simple"
 verify_ssl = true
 
 [dev-packages]
+pylint = ">=2.6.0"
 
 [packages]
 
diff --git a/python/Pipfile.lock b/python/Pipfile.lock
new file mode 100644
index 00..71e0d40ffb
--- /dev/null
+++ b/python/Pipfile.lock
@@ -0,0 +1,127 @@
+{
+"_meta": {
+"hash": {
+"sha256": 
"b7ac1f2ad73bc166244c0378298afba64951a16bb749b81a9668dc41f40f941c"
+},
+"pipfile-spec": 6,
+"requires": {
+"python_version": "3.6"
+},
+"sources": [
+{
+"name": "pypi",
+"url": "https://pypi.org/simple;,
+"verify_ssl": true
+}
+]
+},
+"default": {},
+"develop": {
+"astroid": {
+"hashes": [
+
"sha256:2f4078c2a41bf377eea06d71c9d2ba4eb8f6b1af2135bec277d8f12bb703",
+
"sha256:bc58d83eb610252fd8de6363e39d4f1d0619c894b0ed24603b881c02e64c7386"
+],
+"markers": "python_version >= '3.5'",
+"version": "==2.4.2"
+},
+"isort": {
+"hashes": [
+
"sha256:dcab1d98b469a12a1a624ead220584391648790275560e1a43e54c5dceae65e7",
+
"sha256:dcaeec1b5f0eca77faea2a35ab790b4f3680ff75590bfcb7145986905aab2f58"
+],
+"markers": "python_version >= '3.6' and python_version < '4.0'",
+"version": "==5.6.4"
+},
+"lazy-object-proxy": {
+"hashes": [
+
"sha256:0c4b206227a8097f05c4dbdd323c50edf81f15db3b8dc064d08c62d37e1a504d",
+
"sha256:194d092e6f246b906e8f70884e620e459fc54db3259e60cf69a4d66c3fda3449",
+
"sha256:1be7e4c9f96948003609aa6c974ae59830a6baecc5376c25c92d7d697e684c08",
+
"sha256:4677f594e474c91da97f489fea5b7daa17b5517190899cf213697e48d3902f5a",
+
"sha256:48dab84ebd4831077b150572aec802f303117c8cc5c871e182447281ebf3ac50",
+
"sha256:5541cada25cd173702dbd99f8e22434105456314462326f06dba3e180f203dfd",
+
"sha256:59f79fef100b09564bc2df42ea2d8d21a64fdcda64979c0fa3db7bdaabaf6239",
+
"sha256:8d859b89baf8ef7f8bc6b00aa20316483d67f0b1cbf422f5b4dc56701c8f2ffb",
+
"sha256:9254f4358b9b541e3441b007a0ea0764b9d056afdeafc1a5569eee1cc6c1b9ea",
+
"sha256:9651375199045a358eb6741df3e02a651e0330be090b3bc79f6d0de31a80ec3e",
+
"sha256:97bb5884f6f1cdce0099f86b907aa41c970c3c672ac8b9c8352789e103cf3156",
+
"sha256:9b15f3f4c0f35727d3a0fba4b770b3c4ebbb1fa907dbcc046a1d2799f3edd142",
+
"sha256:a2238e9d1bb71a56cd710611a1614d1194dc10a175c1e08d75e1a7bcc250d442",
+
"sha256:a6ae12d08c0bf9909ce12385803a543bfe99b95fe01e752536a60af2b7797c62",
+
"sha256:ca0a928a3ddbc5725be2dd1cf895ec0a254798915fb3a36af0964a0a4149e3db",
+
"sha256:cb2c7c57005a6804ab66f106ceb8482da55f5314b7fcb06551db1edae4ad1531",
+
"sha256:d74bb8693bf9cf75ac3b47a54d716bbb1a92648d5f781fc799347cfc95952383",
+
"sha256:d945239a5639b3ff35b70a88c5f2f491913eb94871780ebfabb2568bd58afc5a",
+
"sha256:eba7011090323c1dadf18b3b689845fd96a61ba0a1dfbd7f24b921398affc357",
+
"sha256:efa1909120ce98bbb3777e8b6f92237f5d5c8ea6758efea36a473e1d38f7d3e4",
+
"sha256:f3900e8a5de27447acbf900b4750b0ddfd7ec1ea7fbaf11dfa911141bc522af0"
+],
+"markers": "python_version >= '2.7' and python_version not in 
'3.0, 3.1, 3.2, 3.3'",
+"version": "==1.4.3"
+},
+"mccabe": {
+"hashes": [
+
"sha256:ab8a6258860da4b6677da4bd2fe5dc2c659cff31b3ee4f7f5d64e79735b80d42",
+
"sha256:dd8d182285a0fe56bace7f45b5e7d1a6ebcbf524e8f3bd87eb0f125271b8831f"
+],
+"version": "==0.6.1"
+},
+"pylint": {
+"hashes": [
+
"sha256:bb4a908c9dadbc3aac18860550e870f58e1a02c9f2c204fdf5693d73be061210",
+
"sha256:bfe68f020f8a0fece830a22dd4d5dddb4ecc6137db04face4c3420a46a52239f"
+],
+"index": "pypi",
+"version": "==2.6.0"
+},
+"six": {
+"hashes": [
+
"sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259",
+

[PATCH v2 13/15] python: move .isort.cfg into setup.cfg

2020-10-14 Thread John Snow
Signed-off-by: John Snow 
---
 python/.isort.cfg | 7 ---
 python/setup.cfg  | 8 
 2 files changed, 8 insertions(+), 7 deletions(-)
 delete mode 100644 python/.isort.cfg

diff --git a/python/.isort.cfg b/python/.isort.cfg
deleted file mode 100644
index 6d0fd6cc0d..00
--- a/python/.isort.cfg
+++ /dev/null
@@ -1,7 +0,0 @@
-[settings]
-force_grid_wrap=4
-force_sort_within_sections=True
-include_trailing_comma=True
-line_length=72
-lines_after_imports=2
-multi_line_output=3
\ No newline at end of file
diff --git a/python/setup.cfg b/python/setup.cfg
index 13a7a8cd9c..04ad3d91bc 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -54,3 +54,11 @@ good-names=i,
 [pylint.similarities]
 # Ignore imports when computing similarities.
 ignore-imports=yes
+
+[isort]
+force_grid_wrap=4
+force_sort_within_sections=True
+include_trailing_comma=True
+line_length=72
+lines_after_imports=2
+multi_line_output=3
-- 
2.26.2




[PATCH v2 12/15] python: add mypy to pipenv

2020-10-14 Thread John Snow
0.730 appears to be about the oldest version that works with the
features we want, including nice human readable output (to make sure
iotest 297 passes), and type-parameterized Popen generics.

0.770, however, supports adding 'strict' to the config file, so require
at least 0.770.

mypy can be run from the python root by typing 'mypy qemu'.

Signed-off-by: John Snow 
---
 python/Pipfile  |  1 +
 python/Pipfile.lock | 37 -
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/python/Pipfile b/python/Pipfile
index d1f7045f68..51c537b0d1 100644
--- a/python/Pipfile
+++ b/python/Pipfile
@@ -5,6 +5,7 @@ verify_ssl = true
 
 [dev-packages]
 flake8 = ">=3.6.0"
+mypy = ">=0.770"
 pylint = ">=2.6.0"
 
 [packages]
diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index eb5ae75a0c..376f95a6a8 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"9f6d4857a6c72ad40fc1ec1e58cdb76f187a2986ac4156f0027e5eb798ec69a9"
+"sha256": 
"65f010a8f2e55e9870d2b7e0d8af129516097d23abf2504f396552748b067ade"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -83,6 +83,33 @@
 ],
 "version": "==0.6.1"
 },
+"mypy": {
+"hashes": [
+
"sha256:0a0d102247c16ce93c97066443d11e2d36e6cc2a32d8ccc1f705268970479324",
+
"sha256:0d34d6b122597d48a36d6c59e35341f410d4abfa771d96d04ae2c468dd201abc",
+
"sha256:2170492030f6faa537647d29945786d297e4862765f0b4ac5930ff62e300d802",
+
"sha256:2842d4fbd1b12ab422346376aad03ff5d0805b706102e475e962370f874a5122",
+
"sha256:2b21ba45ad9ef2e2eb88ce4aeadd0112d0f5026418324176fd494a6824b74975",
+
"sha256:72060bf64f290fb629bd4a67c707a66fd88ca26e413a91384b18db3876e57ed7",
+
"sha256:af4e9ff1834e565f1baa74ccf7ae2564ae38c8df2a85b057af1dbbc958eb",
+
"sha256:bd03b3cf666bff8d710d633d1c56ab7facbdc204d567715cb3b9f85c6e94f669",
+
"sha256:c614194e01c85bb2e551c421397e49afb2872c88b5830e3554f0519f9fb1c178",
+
"sha256:cf4e7bf7f1214826cf7333627cb2547c0db7e3078723227820d0a2490f117a01",
+
"sha256:da56dedcd7cd502ccd3c5dddc656cb36113dd793ad466e894574125945653cea",
+
"sha256:e86bdace26c5fe9cf8cb735e7cedfe7850ad92b327ac5d797c656717d2ca66de",
+
"sha256:e97e9c13d67fbe524be17e4d8025d51a7dca38f90de2e462243ab8ed8a9178d1",
+
"sha256:eea260feb1830a627fb526d22fbb426b750d9f5a47b624e8d5e7e004359b219c"
+],
+"index": "pypi",
+"version": "==0.790"
+},
+"mypy-extensions": {
+"hashes": [
+
"sha256:090fedd75945a69ae91ce1303b5824f428daf5a028d2f6ab8a299250a846f15d",
+
"sha256:2d82818f5bb3e369420cb3c4060a7970edba416647068eb4c5343488a6c604a8"
+],
+"version": "==0.4.3"
+},
 "pycodestyle": {
 "hashes": [
 
"sha256:2295e7b2f6b5bd100585ebcb1f616591b652db8a741695b3d8f5d28bdc934367",
@@ -149,6 +176,14 @@
 "markers": "implementation_name == 'cpython' and python_version < 
'3.8'",
 "version": "==1.4.1"
 },
+"typing-extensions": {
+"hashes": [
+
"sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918",
+
"sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c",
+
"sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f"
+],
+"version": "==3.7.4.3"
+},
 "wrapt": {
 "hashes": [
 
"sha256:b62ffa81fb85f4332a4f609cab4ac40709470da05643a082ec1eb88e6d9b97d7"
-- 
2.26.2




[PATCH v2 06/15] python: add pylint exceptions to __init__.py

2020-10-14 Thread John Snow
Pylint 2.5.x and 2.6.x have regressions that make import checking
inconsistent, see:

https://github.com/PyCQA/pylint/issues/3609
https://github.com/PyCQA/pylint/issues/3624
https://github.com/PyCQA/pylint/issues/3651
Signed-off-by: John Snow 
---
 python/qemu/core/__init__.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/qemu/core/__init__.py b/python/qemu/core/__init__.py
index bf23ccd839..804a414dfb 100644
--- a/python/qemu/core/__init__.py
+++ b/python/qemu/core/__init__.py
@@ -27,6 +27,9 @@
 # the COPYING file in the top-level directory.
 #
 
+# pylint: disable=import-error
+# see: https://github.com/PyCQA/pylint/issues/3624
+# see: https://github.com/PyCQA/pylint/issues/3651
 from .accel import kvm_available, list_accel, tcg_available
 from .machine import QEMUMachine
 from .qmp import QEMUMonitorProtocol
-- 
2.26.2




[PATCH v2 04/15] python: add directory structure README.rst files

2020-10-14 Thread John Snow
Add short readmes to python/, python/qemu/, and python/qemu/core that
explain the directory hierarchy. These readmes are visible when browsing
the source on e.g. gitlab/github and are designed to help new
developers/users quickly make sense of the source tree.

They are not designed for inclusion in a published manual.

Signed-off-by: John Snow 
---
 python/README.rst   | 27 +++
 python/qemu/README.rst  |  8 
 python/qemu/core/README.rst |  9 +
 3 files changed, 44 insertions(+)
 create mode 100644 python/README.rst
 create mode 100644 python/qemu/README.rst
 create mode 100644 python/qemu/core/README.rst

diff --git a/python/README.rst b/python/README.rst
new file mode 100644
index 00..fcc0552ec4
--- /dev/null
+++ b/python/README.rst
@@ -0,0 +1,27 @@
+QEMU Python Tooling
+===
+
+This directory houses Python tooling used by the QEMU project to build,
+configure, and test QEMU. It is organized by namespace (``qemu``), and
+then by package (``qemu/core``).
+
+``setup.py`` is used by ``pip`` to install this tooling to the current
+environment. You will generally invoke it by doing one of the following:
+
+1. ``pip3 install .`` will install these packages to your current
+   environment. If you are inside a virtual environment, they will
+   install there. If you are not, it will attempt to install to the
+   global environment, which is not recommended.
+
+2. ``pip3 install --user .`` will install these packages to your user's
+   local python packages. If you are inside of a virtual environment,
+   this will fail.
+
+If you amend the ``-e`` argument, pip will install in "editable" mode;
+which installs a version of the package that uses symlinks to these
+files, such that the package always reflects the latest version in your
+git tree.
+
+See `Installing packages using pip and virtual environments
+`_
+for more information.
diff --git a/python/qemu/README.rst b/python/qemu/README.rst
new file mode 100644
index 00..31209c80a5
--- /dev/null
+++ b/python/qemu/README.rst
@@ -0,0 +1,8 @@
+QEMU Python Namespace
+=
+
+This directory serves as the root of a `Python PEP 420 implicit
+namespace package <https://www.python.org/dev/peps/pep-0420/>`_.
+
+Each directory below is assumed to be an installable Python package that
+is available under the ``qemu.`` namespace.
diff --git a/python/qemu/core/README.rst b/python/qemu/core/README.rst
new file mode 100644
index 00..91668e00bd
--- /dev/null
+++ b/python/qemu/core/README.rst
@@ -0,0 +1,9 @@
+qemu.core Package
+=
+
+This package provides core utilities used for testing and debugging
+QEMU. It is used by the iotests, vm tests, and several other utilities
+in the ./scripts directory. It is not a fully-fledged SDK and it is
+subject to change at any time.
+
+See the documentation in ``__init__.py`` for more information.
-- 
2.26.2




[PATCH v2 01/15] python: create qemu.core package

2020-10-14 Thread John Snow
move python/qemu/*.py to python/qemu/core/*.py and update import
directives across the tree.

This is done to create a PEP420 namespace package, in which we may
create subpackages. To do this, the namespace directory ("qemu") should
not have any modules in it. Those files will go in a new 'core'
subpackage instead.

Bolster the core/__init__.py module, making the top-level classes and
functions from this package available directly inside the `qemu.core`
namespace.

This facilitates the convenient shorthand:

> from qemu.core import QEMUQtestMachine, QEMUMonitorProtocol

Signed-off-by: John Snow 
---
 python/{qemu => }/.isort.cfg  |  0
 python/qemu/__init__.py   | 11 --
 python/qemu/{ => core}/.flake8|  0
 python/qemu/core/__init__.py  | 44 +++
 python/qemu/{ => core}/accel.py   |  0
 python/qemu/{ => core}/console_socket.py  |  0
 python/qemu/{ => core}/machine.py |  0
 python/qemu/{ => core}/pylintrc   |  0
 python/qemu/{ => core}/qmp.py |  0
 python/qemu/{ => core}/qtest.py   |  0
 scripts/device-crash-test |  2 +-
 scripts/qmp/qemu-ga-client|  2 +-
 scripts/qmp/qmp   |  2 +-
 scripts/qmp/qmp-shell |  2 +-
 scripts/qmp/qom-fuse  |  2 +-
 scripts/qmp/qom-get   |  2 +-
 scripts/qmp/qom-list  |  2 +-
 scripts/qmp/qom-set   |  2 +-
 scripts/qmp/qom-tree  |  2 +-
 scripts/render_block_graph.py |  6 ++--
 scripts/simplebench/bench_block_job.py|  4 +--
 tests/acceptance/avocado_qemu/__init__.py |  2 +-
 tests/acceptance/boot_linux.py|  3 +-
 tests/acceptance/virtio_check_params.py   |  2 +-
 tests/acceptance/virtio_version.py|  2 +-
 tests/migration/guestperf/engine.py   |  2 +-
 tests/qemu-iotests/235|  2 +-
 tests/qemu-iotests/297|  2 +-
 tests/qemu-iotests/300|  4 +--
 tests/qemu-iotests/iotests.py |  4 +--
 tests/vm/basevm.py|  6 ++--
 31 files changed, 71 insertions(+), 41 deletions(-)
 rename python/{qemu => }/.isort.cfg (100%)
 delete mode 100644 python/qemu/__init__.py
 rename python/qemu/{ => core}/.flake8 (100%)
 create mode 100644 python/qemu/core/__init__.py
 rename python/qemu/{ => core}/accel.py (100%)
 rename python/qemu/{ => core}/console_socket.py (100%)
 rename python/qemu/{ => core}/machine.py (100%)
 rename python/qemu/{ => core}/pylintrc (100%)
 rename python/qemu/{ => core}/qmp.py (100%)
 rename python/qemu/{ => core}/qtest.py (100%)

diff --git a/python/qemu/.isort.cfg b/python/.isort.cfg
similarity index 100%
rename from python/qemu/.isort.cfg
rename to python/.isort.cfg
diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
deleted file mode 100644
index 4ca06c34a4..00
--- a/python/qemu/__init__.py
+++ /dev/null
@@ -1,11 +0,0 @@
-# QEMU library
-#
-# Copyright (C) 2015-2016 Red Hat Inc.
-# Copyright (C) 2012 IBM Corp.
-#
-# Authors:
-#  Fam Zheng 
-#
-# This work is licensed under the terms of the GNU GPL, version 2.  See
-# the COPYING file in the top-level directory.
-#
diff --git a/python/qemu/.flake8 b/python/qemu/core/.flake8
similarity index 100%
rename from python/qemu/.flake8
rename to python/qemu/core/.flake8
diff --git a/python/qemu/core/__init__.py b/python/qemu/core/__init__.py
new file mode 100644
index 00..bf23ccd839
--- /dev/null
+++ b/python/qemu/core/__init__.py
@@ -0,0 +1,44 @@
+"""
+QEMU development and testing library.
+
+This library provides a few high-level classes for driving QEMU from a
+test suite, not intended for production use.
+
+- QEMUMachine: Configure and Boot a QEMU VM
+ - QEMUQtestMachine: VM class, with a qtest socket.
+
+- QEMUMonitorProtocol: Connect to, send/receive QMP messages.
+- QEMUQtestProtocol: Connect to, send/receive qtest message.
+
+- list_accel: List available accelerators
+- kvm_available: Probe for KVM support
+- tcg_available: Probe for TCG support
+"""
+
+# Copyright (C) 2020 John Snow for Red Hat Inc.
+# Copyright (C) 2015-2016 Red Hat Inc.
+# Copyright (C) 2012 IBM Corp.
+#
+# Authors:
+#  John Snow 
+#  Fam Zheng 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+#
+
+from .accel import kvm_available, list_accel, tcg_available
+from .machine import QEMUMachine
+from .qmp import QEMUMonitorProtocol
+from .qtest import QEMUQtestMachine, QEMUQtestProtocol
+
+
+__all__ = (
+'list_accel',
+'kvm_available',
+'tcg_available',
+'QEMUMonitorProtocol',
+'QEMUMachine',
+'QEMUQtestProtocol',
+'QEMUQtestMachine',
+)
diff --git a/python/qemu/accel.py b/python/qemu/core/accel.py
similarity index 100%
rename from python/qemu/accel.py
rename to python/qemu/core/accel.py
diff --git 

[PATCH v2 02/15] python: add qemu package installer

2020-10-14 Thread John Snow
Add setup.cfg and setup.py, necessary for installing a package via
pip. Add a rst document explaining the basics of what this package is
for and who to contact for more information. This document will be used
as the landing page for the package on PyPI.

I am not yet using a pyproject.toml style package manifest, because
using pyproject.toml (and PEP-517) style packages means that pip is not
able to install in "editable" or "develop" mode, which I consider
necessary for the development of this package.

Use a light-weight setup.py instead.

Signed-off-by: John Snow 
---
 python/PACKAGE.rst | 23 +++
 python/setup.cfg   | 18 ++
 python/setup.py| 23 +++
 3 files changed, 64 insertions(+)
 create mode 100644 python/PACKAGE.rst
 create mode 100755 python/setup.cfg
 create mode 100755 python/setup.py

diff --git a/python/PACKAGE.rst b/python/PACKAGE.rst
new file mode 100644
index 00..6e82f05606
--- /dev/null
+++ b/python/PACKAGE.rst
@@ -0,0 +1,23 @@
+QEMU Python Tooling
+===
+
+This package provides QEMU tooling used by the QEMU project to build,
+configure, and test QEMU. It is not a fully-fledged SDK and it is subject
+to change at any time.
+
+Usage
+-
+
+The ``qemu.core`` subpackage offers rudimentary facilities for launching
+QEMU and communicating with it via QMP. Refer to the module documentation
+(``>>> help(qemu.core)``) for more information.
+
+Contributing
+
+
+This package is maintained by John Snow  as part of
+the QEMU source tree. Contributions are welcome and follow the `QEMU
+patch submission process
+`_. There is a `Gitlab
+mirror `_, but
+contributions must be sent to the list for inclusion.
diff --git a/python/setup.cfg b/python/setup.cfg
new file mode 100755
index 00..12b99a796e
--- /dev/null
+++ b/python/setup.cfg
@@ -0,0 +1,18 @@
+[metadata]
+name = qemu
+maintainer = QEMU Developer Team
+maintainer_email = qemu-de...@nongnu.org
+url = https://www.qemu.org/
+download_url = https://www.qemu.org/download/
+description = QEMU Python Build, Debug and SDK tooling.
+long_description = file:PACKAGE.rst
+long_description_content_type = text/x-rst
+classifiers =
+Development Status :: 3 - Alpha
+License :: OSI Approved :: GNU General Public License v2 (GPLv2)
+Natural Language :: English
+Operating System :: OS Independent
+
+[options]
+python_requires = >= 3.6
+packages = find_namespace:
diff --git a/python/setup.py b/python/setup.py
new file mode 100755
index 00..e93d0075d1
--- /dev/null
+++ b/python/setup.py
@@ -0,0 +1,23 @@
+#!/usr/bin/env python3
+"""
+QEMU tooling installer script
+Copyright (c) 2020 John Snow for Red Hat, Inc.
+"""
+
+import setuptools
+import pkg_resources
+
+
+def main():
+"""
+QEMU tooling installer
+"""
+
+# 
https://medium.com/@daveshawley/safely-using-setup-cfg-for-metadata-1babbe54c108
+pkg_resources.require('setuptools>=39.2')
+
+setuptools.setup()
+
+
+if __name__ == '__main__':
+main()
-- 
2.26.2




[PATCH v2 07/15] python: move pylintrc into setup.cfg

2020-10-14 Thread John Snow
Delete the empty settings now that it's sharing a home with settings for
other tools.

pylint can now be run from this folder as "pylint qemu".
Signed-off-by: John Snow 
---
 python/qemu/core/pylintrc | 58 ---
 python/setup.cfg  | 29 
 2 files changed, 29 insertions(+), 58 deletions(-)
 delete mode 100644 python/qemu/core/pylintrc

diff --git a/python/qemu/core/pylintrc b/python/qemu/core/pylintrc
deleted file mode 100644
index 3f69205000..00
--- a/python/qemu/core/pylintrc
+++ /dev/null
@@ -1,58 +0,0 @@
-[MASTER]
-
-[MESSAGES CONTROL]
-
-# Disable the message, report, category or checker with the given id(s). You
-# can either give multiple identifiers separated by comma (,) or put this
-# option multiple times (only on the command line, not in the configuration
-# file where it should appear only once). You can also use "--disable=all" to
-# disable everything first and then reenable specific checks. For example, if
-# you want to run only the similarities checker, you can use "--disable=all
-# --enable=similarities". If you want to run only the classes checker, but have
-# no Warning level messages displayed, use "--disable=all --enable=classes
-# --disable=W".
-disable=too-many-arguments,
-too-many-instance-attributes,
-too-many-public-methods,
-
-[REPORTS]
-
-[REFACTORING]
-
-[MISCELLANEOUS]
-
-[LOGGING]
-
-[BASIC]
-
-# Good variable names which should always be accepted, separated by a comma.
-good-names=i,
-   j,
-   k,
-   ex,
-   Run,
-   _,
-   fd,
-   c,
-[VARIABLES]
-
-[STRING]
-
-[SPELLING]
-
-[FORMAT]
-
-[SIMILARITIES]
-
-# Ignore imports when computing similarities.
-ignore-imports=yes
-
-[TYPECHECK]
-
-[CLASSES]
-
-[IMPORTS]
-
-[DESIGN]
-
-[EXCEPTIONS]
diff --git a/python/setup.cfg b/python/setup.cfg
index 260f7f4e94..a65802d85e 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -17,3 +17,32 @@ classifiers =
 [options]
 python_requires = >= 3.6
 packages = find_namespace:
+
+[pylint.messages control]
+# Disable the message, report, category or checker with the given id(s). You
+# can either give multiple identifiers separated by comma (,) or put this
+# option multiple times (only on the command line, not in the configuration
+# file where it should appear only once). You can also use "--disable=all" to
+# disable everything first and then reenable specific checks. For example, if
+# you want to run only the similarities checker, you can use "--disable=all
+# --enable=similarities". If you want to run only the classes checker, but have
+# no Warning level messages displayed, use "--disable=all --enable=classes
+# --disable=W".
+disable=too-many-arguments,
+too-many-instance-attributes,
+too-many-public-methods,
+
+[pylint.basic]
+# Good variable names which should always be accepted, separated by a comma.
+good-names=i,
+   j,
+   k,
+   ex,
+   Run,
+   _,
+   fd,
+   c,
+
+[pylint.similarities]
+# Ignore imports when computing similarities.
+ignore-imports=yes
-- 
2.26.2




[PATCH v2 14/15] python/qemu: add isort to pipenv

2020-10-14 Thread John Snow
isort 5.0.0 through 5.0.4 has a bug that causes it to misinterpret
certain "from ..." clauses that are not related to imports.

Require 5.0.5 or greater.

isort can be run with 'isort -c qemu' from the python root.

Signed-off-by: John Snow 
---
 python/Pipfile  | 1 +
 python/Pipfile.lock | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/python/Pipfile b/python/Pipfile
index 51c537b0d1..75b96f29d8 100644
--- a/python/Pipfile
+++ b/python/Pipfile
@@ -5,6 +5,7 @@ verify_ssl = true
 
 [dev-packages]
 flake8 = ">=3.6.0"
+isort = ">=5.0.5"
 mypy = ">=0.770"
 pylint = ">=2.6.0"
 
diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index 376f95a6a8..74563b444c 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"65f010a8f2e55e9870d2b7e0d8af129516097d23abf2504f396552748b067ade"
+"sha256": 
"b89c7a1b8a414f2a4cd708964123fb427d55419ee0b39e088bf2e7d4fbc11979"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -46,7 +46,7 @@
 
"sha256:dcab1d98b469a12a1a624ead220584391648790275560e1a43e54c5dceae65e7",
 
"sha256:dcaeec1b5f0eca77faea2a35ab790b4f3680ff75590bfcb7145986905aab2f58"
 ],
-"markers": "python_version >= '3.6' and python_version < '4.0'",
+"index": "pypi",
 "version": "==5.6.4"
 },
 "lazy-object-proxy": {
-- 
2.26.2




[PATCH v2 05/15] python: Add pipenv support

2020-10-14 Thread John Snow
pipenv is a tool used for managing virtual environments with pinned,
explicit dependencies. It is used for precisely recreating python
virtual environments.

pipenv uses two files to do this:

(1) Pipfile, which is similar in purpose and scope to what setup.py
lists. It specifies the requisite minimum to get a functional
environment for using this package.

(2) Pipfile.lock, which is similar in purpose to `pip freeze >
requirements.txt`. It specifies a canonical virtual environment used for
deployment or testing. This ensures that all users have repeatable
results.

The primary benefit of using this tool is to ensure repeatable CI
results with a known set of packages. Although I endeavor to support as
many versions as I can, the fluid nature of the Python toolchain often
means tailoring code for fairly specific versions.

Note that pipenv is *not* required to install or use this module; this is
purely for the sake of repeatable testing by CI or developers.

Here, a "blank" pipfile is added with no dependencies, but specifies
Python 3.6 for the virtual environment.

Pipfile will specify our version minimums, while Pipfile.lock specifies
an exact loadout of packages that were known to operate correctly. This
latter file provides the real value for easy setup of container images
and CI environments.

Signed-off-by: John Snow 
---
 python/Pipfile | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 python/Pipfile

diff --git a/python/Pipfile b/python/Pipfile
new file mode 100644
index 00..9534830b5e
--- /dev/null
+++ b/python/Pipfile
@@ -0,0 +1,11 @@
+[[source]]
+name = "pypi"
+url = "https://pypi.org/simple"
+verify_ssl = true
+
+[dev-packages]
+
+[packages]
+
+[requires]
+python_version = "3.6"
-- 
2.26.2




[PATCH v2 00/15] python: create installable package

2020-10-14 Thread John Snow
Based-on: https://gitlab.com/jsnow/qemu/-/tree/python

This series factors the python/qemu directory as an installable
module. It does not yet change the mechanics of how any other Python
source in the tree consumes it, beyond the import path.

The point of this series is primarily to formalize our dependencies on
mypy, flake8, isort, and pylint alongside versions that are known to
work. It also adds explicitly pinned versions of these dependencies that
should behave in a repeatable and known way for developers and CI
environments both.

With the python tooling as a proper package, you can install this
package in editable or production mode to a virtual environment, your
local user environment, or your system packages. The primary benefit of
this is to gain access to QMP tooling regardless of CWD, without needing
to battle sys.path.

For example: when developing, you may go to qemu/python/ and invoke
`pipenv shell` to activate a virtual environment that contains the qemu
packages.  This package will always reflect the current version of the
source files in the tree. When you are finished, you can simply exit the
shell to remove these packages from your python environment.

When not developing, you could install a version of this package to your
environment outright to gain access to the QMP and QEMUMachine classes
for lightweight scripting and testing by using pip: "pip install [--user] ."
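
For illustration only (not part of this series), such a lightweight
script might look roughly like the sketch below. It assumes the package
has been installed, that QEMUMachine keeps its current
launch()/command()/shutdown() interface, and that the binary path is a
placeholder for whatever is appropriate on your host:

    #!/usr/bin/env python3
    # Illustrative sketch only; the binary path is an assumption.
    from qemu.core.machine import QEMUMachine

    vm = QEMUMachine('/usr/bin/qemu-system-x86_64')
    vm.launch()
    try:
        # command() sends a QMP command and returns its 'return' payload.
        print(vm.command('query-status'))
    finally:
        vm.shutdown()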

Finally, this package is formatted in such a way that it COULD be
uploaded to https://pypi.org/project/qemu and installed independently of
qemu.git with `pip install qemu`, but that button remains unpushed.

TESTING THIS SERIES:

CD to qemu/python first, and then:

1. Try "pipenv shell" to get a venv with the package installed to it in
editable mode. Ctrl+d exits this venv shell. While in this shell, any
python script that uses "from qemu.core import ..." should work
correctly regardless of your CWD.

2. Try "pipenv sync --dev" to create/update the venv with the
development packages without actually entering the venv. This should
install isort, mypy, flake8 and pylint to the venv.

3. After the above sync, try "pipenv shell" again, and from the python
project root, try any of the following:

  - pylint qemu
  - flake8 qemu
  - isort -c qemu
  - mypy qemu

4. Leave any venv you are in, and from the project root, try the
following commands:

  - pipenv run pylint qemu
  - pipenv run flake8 qemu
  - pipenv run isort -c qemu
  - pipenv run mypy qemu

John Snow (15):
  python: create qemu.core package
  python: add qemu package installer
  python: add VERSION file
  python: add directory structure README.rst files
  python: Add pipenv support
  python: add pylint exceptions to __init__.py
  python: move pylintrc into setup.cfg
  python: add pylint to pipenv
  python: move flake8 config to setup.cfg
  python: Add flake8 to pipenv
  python: move mypy.ini into setup.cfg
  python: add mypy to pipenv
  python: move .isort.cfg into setup.cfg
  python/qemu: add isort to pipenv
  python/qemu: add qemu package itself to pipenv

 python/PACKAGE.rst|  23 +++
 python/README.rst |  27 +++
 python/qemu/README.rst|   8 +
 python/qemu/core/README.rst   |   9 +
 python/Pipfile|  16 ++
 python/Pipfile.lock   | 207 ++
 python/VERSION|   1 +
 python/mypy.ini   |   4 -
 python/qemu/.flake8   |   2 -
 python/qemu/.isort.cfg|   7 -
 python/qemu/__init__.py   |  11 --
 python/qemu/core/__init__.py  |  47 +
 python/qemu/{ => core}/accel.py   |   0
 python/qemu/{ => core}/console_socket.py  |   0
 python/qemu/{ => core}/machine.py |   0
 python/qemu/{ => core}/qmp.py |   0
 python/qemu/{ => core}/qtest.py   |   0
 python/{qemu/pylintrc => setup.cfg}   |  66 +++
 python/setup.py   |  23 +++
 scripts/device-crash-test |   2 +-
 scripts/qmp/qemu-ga-client|   2 +-
 scripts/qmp/qmp   |   2 +-
 scripts/qmp/qmp-shell |   2 +-
 scripts/qmp/qom-fuse  |   2 +-
 scripts/qmp/qom-get   |   2 +-
 scripts/qmp/qom-list  |   2 +-
 scripts/qmp/qom-set   |   2 +-
 scripts/qmp/qom-tree  |   2 +-
 scripts/render_block_graph.py |   6 +-
 scripts/simplebench/bench_block_job.py|   4 +-
 tests/acceptance/avocado_qemu/__init__.py |   2 +-
 tests/acceptance/boot_linux.py|   3 +-
 tests/acceptance/virtio_check_params.py   |   2 +-
 tests/acceptance/virtio_version.py|   2 +-
 tests/migration/guestperf/engine.py   |   2 +-
 tests/qemu-iotests/235|   2 +-
 tests/qemu-iotests/297  

[PATCH v2 03/15] python: add VERSION file

2020-10-14 Thread John Snow
Python infrastructure as it exists today is not reliably capable of
single-sourcing a package version from a parent directory. The authors
of pip are working to correct this, but as of today this is not possible
to my knowledge.

The problem is that when using pip to build and install a python
package, it copies files over to a temporary directory and performs its
build there. This loses access to any information in the parent
directory, including git itself.

Further, Python versions have a standard (PEP 440) that may or may not
follow QEMU's versioning. In general, it does; but naturally QEMU does
not follow PEP 440. To avoid any automatically-generated conflict, a
manual version file is preferred.


I am proposing:

- Python core tooling synchronizes with the QEMU version directly
  (5.2.0, 5.1.1, 5.3.0, etc.)

- In the event that a Python package needs to be updated independently
  of the QEMU version, a pre-release alpha version should be preferred,
  but *only* after inclusion to the qemu development or stable branches.

  e.g. 5.2.0a1, 5.2.0a2, and so on should be preferred prior to 5.2.0's
  release.

- The Python core tooling makes absolutely no version compatibility
  checks or constraints. It *may* work with releases of QEMU from the
  past or future, but it is not required to.

  i.e., "qemu.core" will always remain in lock-step with QEMU.

- We reserve the right to split out e.g. qemu.core.qmp to qemu.qmp
  and begin independently versioning such a package separately from the
  QEMU version it accompanies.


Implement this versioning scheme by adding a VERSION file and setting it
to 5.2.0a1.

Signed-off-by: John Snow 
---
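(As a purely illustrative aside, not part of this patch: the ordering
that the scheme above relies on can be sanity-checked with the
third-party "packaging" library, which implements PEP 440.)

    # Illustrative only; requires the third-party "packaging" library.
    from packaging.version import Version

    # PEP 440 sorts alpha pre-releases before the final release, so a
    # 5.2.0a1 tooling release always precedes the eventual 5.2.0.
    assert Version("5.2.0a1") < Version("5.2.0a2") < Version("5.2.0")
    assert Version("5.2.0a1").is_prerelease
    print("PEP 440 ordering holds for the proposed scheme")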
 python/VERSION   | 1 +
 python/setup.cfg | 1 +
 2 files changed, 2 insertions(+)
 create mode 100644 python/VERSION

diff --git a/python/VERSION b/python/VERSION
new file mode 100644
index 00..2e81039c82
--- /dev/null
+++ b/python/VERSION
@@ -0,0 +1 @@
+5.2.0a1
diff --git a/python/setup.cfg b/python/setup.cfg
index 12b99a796e..260f7f4e94 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -1,5 +1,6 @@
 [metadata]
 name = qemu
+version = file:VERSION
 maintainer = QEMU Developer Team
 maintainer_email = qemu-de...@nongnu.org
 url = https://www.qemu.org/
-- 
2.26.2




[PATCH v2 09/15] python: move flake8 config to setup.cfg

2020-10-14 Thread John Snow
Signed-off-by: John Snow 
---
 python/qemu/core/.flake8 | 2 --
 python/setup.cfg | 3 +++
 2 files changed, 3 insertions(+), 2 deletions(-)
 delete mode 100644 python/qemu/core/.flake8

diff --git a/python/qemu/core/.flake8 b/python/qemu/core/.flake8
deleted file mode 100644
index 45d8146f3f..00
--- a/python/qemu/core/.flake8
+++ /dev/null
@@ -1,2 +0,0 @@
-[flake8]
-extend-ignore = E722  # Pylint handles this, but smarter.
\ No newline at end of file
diff --git a/python/setup.cfg b/python/setup.cfg
index a65802d85e..103371ad5e 100755
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -18,6 +18,9 @@ classifiers =
 python_requires = >= 3.6
 packages = find_namespace:
 
+[flake8]
+extend-ignore = E722  # Pylint handles this, but smarter.
+
 [pylint.messages control]
 # Disable the message, report, category or checker with the given id(s). You
 # can either give multiple identifiers separated by comma (,) or put this
-- 
2.26.2




[PATCH v2 10/15] python: Add flake8 to pipenv

2020-10-14 Thread John Snow
flake8 3.5.x does not support the --extend-ignore syntax used in the
.flake8 file to gracefully extend default ignores, so 3.6.x is our
minimum requirement. There is no known upper bound.

flake8 can be run from the python/ directory with no arguments.

Signed-off-by: John Snow 
---
 python/Pipfile  |  1 +
 python/Pipfile.lock | 42 +-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/python/Pipfile b/python/Pipfile
index 1e58986c89..d1f7045f68 100644
--- a/python/Pipfile
+++ b/python/Pipfile
@@ -4,6 +4,7 @@ url = "https://pypi.org/simple"
 verify_ssl = true
 
 [dev-packages]
+flake8 = ">=3.6.0"
 pylint = ">=2.6.0"
 
 [packages]
diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index 71e0d40ffb..eb5ae75a0c 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -1,7 +1,7 @@
 {
 "_meta": {
 "hash": {
-"sha256": 
"b7ac1f2ad73bc166244c0378298afba64951a16bb749b81a9668dc41f40f941c"
+"sha256": 
"9f6d4857a6c72ad40fc1ec1e58cdb76f187a2986ac4156f0027e5eb798ec69a9"
 },
 "pipfile-spec": 6,
 "requires": {
@@ -25,6 +25,22 @@
 "markers": "python_version >= '3.5'",
 "version": "==2.4.2"
 },
+"flake8": {
+"hashes": [
+
"sha256:749dbbd6bfd0cf1318af27bf97a14e28e5ff548ef8e5b1566ccfb25a11e7c839",
+
"sha256:aadae8761ec651813c24be05c6f7b4680857ef6afaae4651a4eccaef97ce6c3b"
+],
+"index": "pypi",
+"version": "==3.8.4"
+},
+"importlib-metadata": {
+"hashes": [
+
"sha256:77a540690e24b0305878c37ffd421785a6f7e53c8b5720d211b211de8d0e95da",
+
"sha256:cefa1a2f919b866c5beb7c9f7b0ebb4061f30a8a9bf16d609b000e2dfaceb9c3"
+],
+"markers": "python_version < '3.8'",
+"version": "==2.0.0"
+},
 "isort": {
 "hashes": [
 
"sha256:dcab1d98b469a12a1a624ead220584391648790275560e1a43e54c5dceae65e7",
@@ -67,6 +83,22 @@
 ],
 "version": "==0.6.1"
 },
+"pycodestyle": {
+"hashes": [
+
"sha256:2295e7b2f6b5bd100585ebcb1f616591b652db8a741695b3d8f5d28bdc934367",
+
"sha256:c58a7d2815e0e8d7972bf1803331fb0152f867bd89adf8a01dfd55085434192e"
+],
+"markers": "python_version >= '2.7' and python_version not in 
'3.0, 3.1, 3.2, 3.3'",
+"version": "==2.6.0"
+},
+"pyflakes": {
+"hashes": [
+
"sha256:0d94e0e05a19e57a99444b6ddcf9a6eb2e5c68d3ca1e98e90707af8152c90a92",
+
"sha256:35b2d75ee967ea93b55750aa9edbbf72813e06a66ba54438df2cfac9e3c27fc8"
+],
+"markers": "python_version >= '2.7' and python_version not in 
'3.0, 3.1, 3.2, 3.3'",
+"version": "==2.2.0"
+},
 "pylint": {
 "hashes": [
 
"sha256:bb4a908c9dadbc3aac18860550e870f58e1a02c9f2c204fdf5693d73be061210",
@@ -122,6 +154,14 @@
 
"sha256:b62ffa81fb85f4332a4f609cab4ac40709470da05643a082ec1eb88e6d9b97d7"
 ],
 "version": "==1.12.1"
+},
+"zipp": {
+"hashes": [
+
"sha256:64ad89efee774d1897a58607895d80789c59778ea02185dd846ac38394a8642b",
+
"sha256:eed8ec0b8d1416b2ca33516a37a08892442f3954dee131e92cfd92d8fe3e7066"
+],
+"markers": "python_version >= '3.6'",
+"version": "==3.3.0"
 }
 }
 }
-- 
2.26.2




Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions

2020-10-14 Thread Andrey Shinkevich

On 14.10.2020 13:44, Max Reitz wrote:

On 12.10.20 19:43, Andrey Shinkevich wrote:

Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
  block/copy-on-read.c | 88 
  block/copy-on-read.h | 35 +
  2 files changed, 123 insertions(+)
  create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..bcccf0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c


[...]


@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
   bdrv_register(&bdrv_copy_on_read);
  }
  
+

+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+ QDict *node_options,
+ int flags, Error **errp)


I had hoped you could make this a generic block layer function. :(

(Because it really is rather generic)

*shrug*


Actually, I did (and still can do) that for the 'append node' function 
only but not for the 'drop node' one so far...


diff --git a/block.c b/block.c
index 11ab55f..f41e876 100644
--- a/block.c
+++ b/block.c
@@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
 g_free(bs);
 }

+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict 
*node_options,

+   int flags, Error **errp)
+{
+BlockDriverState *new_node_bs;
+Error *local_err = NULL;
+
+new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
+if (new_node_bs == NULL) {
+error_prepend(errp, "Could not create node: ");
+return NULL;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, new_node_bs, &local_err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(new_node_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+return new_node_bs;
+}
+
+void bdrv_remove_node(BlockDriverState *bs)
+{
+BdrvChild *child;
+BlockDriverState *inferior_bs;
+
+child = bdrv_filter_or_cow_child(bs);
+if (!child) {
+return;
+}
+inferior_bs = child->bs;
+
+/* Retain the BDS until we complete the graph change. */
+bdrv_ref(inferior_bs);
+/* Hold a guest back from writing while permissions are being reset. */
+bdrv_drained_begin(inferior_bs);
+/* Refresh permissions before the graph change. */
+bdrv_child_refresh_perms(bs, child, &error_abort);
+bdrv_replace_node(bs, inferior_bs, &error_abort);
+
+bdrv_drained_end(inferior_bs);
+bdrv_unref(inferior_bs);
+bdrv_unref(bs);
+}

So, it is an intermediate solution in this patch of the series. I am 
going to make both functions generic once Vladimir overhauls the QEMU 
permission update system. Otherwise, the COR-filter node cannot be 
removed from the backing chain gracefully.


Thank you for your r-b. If a next version is needed, I can move only the 
'append node' function to the generic layer.


Andrey



Reviewed-by: Max Reitz 


+{
+BlockDriverState *cor_filter_bs;
+Error *local_err = NULL;
+
+cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
+if (cor_filter_bs == NULL) {
+error_prepend(errp, "Could not create COR-filter node: ");
+return NULL;
+}
+
+if (!qdict_get_try_str(node_options, "node-name")) {
+cor_filter_bs->implicit = true;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, cor_filter_bs, &local_err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(cor_filter_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+return cor_filter_bs;
+}






Re: [PATCH v6 03/11] hw/block/nvme: Add support for Namespace Types

2020-10-14 Thread Niklas Cassel
On Wed, Oct 14, 2020 at 06:42:04AM +0900, Dmitry Fomichev wrote:
> From: Niklas Cassel 
> 
> Define the structures and constants required to implement
> Namespace Types support.
> 
> Namespace Types introduce a new command set, "I/O Command Sets",
> that allows the host to retrieve the command sets associated with
> a namespace. Introduce support for the command set and enable
> detection for the NVM Command Set.
> 
> The new workflows for identify commands rely heavily on zero-filled
> identify structs. E.g., certain CNS commands are defined to return
> a zero-filled identify struct when an inactive namespace NSID
> is supplied.
> 
> Add a helper function in order to avoid code duplication when
> reporting zero-filled identify structures.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Dmitry Fomichev 
> ---

(snip)

> @@ -2090,6 +2199,27 @@ static void nvme_clear_ctrl(NvmeCtrl *n)
>  n->bar.cc = 0;
>  }
>  
> +static void nvme_select_ns_iocs(NvmeCtrl *n)
> +{
> +NvmeNamespace *ns;
> +int i;
> +
> +for (i = 1; i <= n->num_namespaces; i++) {
> +ns = nvme_ns(n, i);
> +if (!ns) {
> +continue;
> +}
> +ns->iocs = nvme_cse_iocs_none;
> +switch (ns->csi) {
> +case NVME_CSI_NVM:
> +if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
> +ns->iocs = nvme_cse_iocs_nvm;
> +}
> +break;
> +}
> +}
> +}
> +
>  static int nvme_start_ctrl(NvmeCtrl *n)
>  {
>  uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
> @@ -2188,6 +2318,8 @@ static int nvme_start_ctrl(NvmeCtrl *n)
>  
>  QTAILQ_INIT(&n->aer_queue);
>  
> +nvme_select_ns_iocs(n);
> +
>  return 0;
>  }
>  
> @@ -2655,7 +2787,6 @@ int nvme_register_namespace(NvmeCtrl *n, NvmeNamespace 
> *ns, Error **errp)
>  trace_pci_nvme_register_namespace(nsid);
>  
>  n->namespaces[nsid - 1] = ns;
> -ns->iocs = nvme_cse_iocs_nvm;
>  
>  return 0;
>  }

Considering how tightly coupled the three above diffs are with the
Commands Supported and Effects log, and since patch 1 already adds
the ns->iocs checking in nvme_admin_cmd() and nvme_io_cmd(),
and since these three diffs are not really related to NS types,
I think they should be moved to patch 1.

It really helps the reviewer if both the ns->iocs assignment
and checking are done in the same patch, and introduced as early
as possible. And since this code is needed/valid simply to check
if ADMIN_ONLY is selected (even before NS Types were introduced),
I don't see any reason not to introduce them into patch 1
together with the other ns->iocs stuff.

(We were always able to select an I/O Command Set using CC.CSS
(Admin only/None, or NVM); NS types simply introduced the ability
to select/enable more than one command set at the same time.)


Kind regards,
Niklas


Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
> COR-driver to skip unneeded reading. It can be taken into account for
> the COR-algorithms optimization. That check is being made during the
> block stream job by the moment.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 13 +
>  block/io.c   |  3 ++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index b136895..278a11a 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -148,10 +148,15 @@ static int coroutine_fn 
> cor_co_preadv_part(BlockDriverState *bs,
>  }
>  }
>  
> -ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> -  local_flags);
> -if (ret < 0) {
> -return ret;
> +if (!!(flags & BDRV_REQ_PREFETCH) &

How about dropping the double negation and using a logical && instead of
the binary &?

> +!(local_flags & BDRV_REQ_COPY_ON_READ)) {
> +/* Skip non-guest reads if no copy needed */
> +} else {

Hm.  I would have just written the negated form

(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

and put the “skip” comment above that condition.

(Since local_flags is initialized to flags, it can be written as a
single comparison, but that’s a matter of taste and I’m not going to
recommend either over the other:

((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
BDRV_REQ_PREFETCH)

)
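
As an aside, a quick brute-force check (illustrative Python with
placeholder flag values, not QEMU's real definitions) confirms that the
patch's skip condition, the negated form, and the single-comparison
variant agree for every combination of the two bits:

    # Placeholder values, purely for illustration.
    PREFETCH = 0x200
    COPY_ON_READ = 0x2

    for flags in (0, PREFETCH, COPY_ON_READ, PREFETCH | COPY_ON_READ):
        for extra in (0, COPY_ON_READ):
            local_flags = flags | extra  # COR may get OR'd in later

            # The patch's condition for skipping the read:
            skip = bool(flags & PREFETCH) and not (local_flags & COPY_ON_READ)
            # Negated form (condition for doing the read):
            do_read = not (flags & PREFETCH) or bool(local_flags & COPY_ON_READ)
            # Single-comparison variant:
            skip2 = (local_flags & (PREFETCH | COPY_ON_READ)) == PREFETCH

            assert skip == (not do_read) == skip2
    print("all three formulations agree")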

> +ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +  local_flags);
> +if (ret < 0) {
> +return ret;
> +}
>  }
>  
>  offset += n;
> diff --git a/block/io.c b/block/io.c
> index 11df188..bff1808 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
> *child,
>  
>  max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>  if (bytes <= max_bytes && bytes <= max_transfer) {
> -ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
> +ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
> + flags & bs->supported_read_flags);

Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
why it isn’t.

Max

>  goto out;
>  }
>  
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 14:42 +0200, Philippe Mathieu-Daudé wrote:
> On 10/14/20 2:34 PM, Fam Zheng wrote:
> > On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> > > A bunch of boring patches that have been proven helpful
> > > while debugging.
> > > 
> > > Philippe Mathieu-Daudé (9):
> > >util/vfio-helpers: Improve reporting unsupported IOMMU type
> > >util/vfio-helpers: Trace PCI I/O config accesses
> > >util/vfio-helpers: Trace PCI BAR region info
> > >util/vfio-helpers: Trace where BARs are mapped
> > >util/vfio-helpers: Improve DMA trace events
> > >util/vfio-helpers: Convert vfio_dump_mapping to trace events
> > >util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
> > >util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
> > >util/vfio-helpers: Let qemu_vfio_verify_mappings() use
> > > error_report()
> > > 
> > >   include/qemu/vfio-helpers.h |  2 +-
> > >   block/nvme.c| 14 
> > >   util/vfio-helpers.c | 66 +-
> > > ---
> > > 
> > >   util/trace-events   | 10 --
> > >   4 files changed, 54 insertions(+), 38 deletions(-)
> > > 
> > > -- 
> > > 2.26.2
> > > 
> > > 
> > > 
> > 
> > Modular the g_strdup_printf() memleak I pointed out:
> > 
> > Reviewed-by: Fam Zheng 

Overlooked the auto free qualifier, so that one is okay too!

Fam

> 
> Thanks!
> 
> 




Re: [PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Philippe Mathieu-Daudé

On 10/14/20 2:34 PM, Fam Zheng wrote:

On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:

A bunch of boring patches that have been proven helpful
while debugging.

Philippe Mathieu-Daudé (9):
   util/vfio-helpers: Improve reporting unsupported IOMMU type
   util/vfio-helpers: Trace PCI I/O config accesses
   util/vfio-helpers: Trace PCI BAR region info
   util/vfio-helpers: Trace where BARs are mapped
   util/vfio-helpers: Improve DMA trace events
   util/vfio-helpers: Convert vfio_dump_mapping to trace events
   util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
   util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
   util/vfio-helpers: Let qemu_vfio_verify_mappings() use
error_report()

  include/qemu/vfio-helpers.h |  2 +-
  block/nvme.c| 14 
  util/vfio-helpers.c | 66 +

  util/trace-events   | 10 --
  4 files changed, 54 insertions(+), 38 deletions(-)

--
2.26.2





Modular the g_strdup_printf() memleak I pointed out:

Reviewed-by: Fam Zheng 


Thanks!




Re: [PATCH 3/9] util/vfio-helpers: Trace PCI BAR region info

2020-10-14 Thread Philippe Mathieu-Daudé

On 10/14/20 2:23 PM, Fam Zheng wrote:

On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:

For debug purpose, trace BAR regions info.

Signed-off-by: Philippe Mathieu-Daudé 
---
  util/vfio-helpers.c | 8 
  util/trace-events   | 1 +
  2 files changed, 9 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 1d4efafcaa4..cd6287c3a98 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -136,6 +136,7 @@ static inline void
assert_bar_index_valid(QEMUVFIOState *s, int index)
  
  static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error

**errp)
  {
+g_autofree char *barname = NULL;


^^


  assert_bar_index_valid(s, index);
  s->bar_region_info[index] = (struct vfio_region_info) {
  .index = VFIO_PCI_BAR0_REGION_INDEX + index,
@@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState
*s, int index, Error **errp)
  error_setg_errno(errp, errno, "Failed to get BAR region
info");
  return -errno;
  }
+barname = g_strdup_printf("bar[%d]", index);


Where is barname freed?


Using GLib g_autofree qualifier.



Fam


+trace_qemu_vfio_region_info(barname, s->bar_region_info[index].offset,
+s->bar_region_info[index].size,
+s->bar_region_info[index].cap_offset);
  
  return 0;

  }
@@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s,
const char *device,
  ret = -errno;
  goto fail;
  }
+trace_qemu_vfio_region_info("config", s->config_region_info.offset,
+s->config_region_info.size,
+s->config_region_info.cap_offset);
  
  for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {

  ret = qemu_vfio_pci_init_bar(s, i, errp);
diff --git a/util/trace-events b/util/trace-events
index c048f85f828..4d40c74a21f 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size,
bool temporary, uint64_t *io
  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t
region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d
(region ofs 0x%"PRIx64" size %"PRId64")"
  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t
region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d
(region ofs 0x%"PRIx64" size %"PRId64")"
+qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t
size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size
%"PRId64" cap_ofs %"PRId32







Re: [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags

s/write/read/

> of the COR-filter.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index dfbd6ad..b136895 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
> int flags,
>  return -EINVAL;
>  }
>  
> +bs->supported_read_flags = BDRV_REQ_PREFETCH;
>  bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>  (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);

Then we mustn’t let cor_co_preadv_part() pass the flag on to
bdrv_co_preadv_part() unless BDRV_REQ_COPY_ON_READ is set, too.  I
suspect the following patch is going to do that, but in the meantime the
code is wrong.

Perhaps just swap both patches?

And by the way, I’m also missing a patch that makes the block layer
evaluate supported_read_flags and e.g. strip BDRV_REQ_PREFETCH if it
isn’t supported, before it gets passed to such a block driver.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> A bunch of boring patches that have been proven helpful
> while debugging.
> 
> Philippe Mathieu-Daudé (9):
>   util/vfio-helpers: Improve reporting unsupported IOMMU type
>   util/vfio-helpers: Trace PCI I/O config accesses
>   util/vfio-helpers: Trace PCI BAR region info
>   util/vfio-helpers: Trace where BARs are mapped
>   util/vfio-helpers: Improve DMA trace events
>   util/vfio-helpers: Convert vfio_dump_mapping to trace events
>   util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
>   util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
>   util/vfio-helpers: Let qemu_vfio_verify_mappings() use
> error_report()
> 
>  include/qemu/vfio-helpers.h |  2 +-
>  block/nvme.c| 14 
>  util/vfio-helpers.c | 66 +
> 
>  util/trace-events   | 10 --
>  4 files changed, 54 insertions(+), 38 deletions(-)
> 
> -- 
> 2.26.2
> 
> 
> 

Modulo the g_strdup_printf() memleak I pointed out:

Reviewed-by: Fam Zheng 




Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
> use it alone and pass it to the COR-filter driver for further
> processing.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  include/block/block.h | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 981ab5b..2b7efd1 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -71,9 +71,10 @@ typedef enum {
>  BDRV_REQ_NO_FALLBACK= 0x100,
>  
>  /*
> - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
> - * on read request and means that caller doesn't really need data to be
> - * written to qiov parameter which may be NULL.
> + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
> + * flag or when the COR-filter applied to read operations and means that

There’s some word missing here, but I’m not sure what it is...  At least
an “is” before “applied”.  Perhaps something like ”or when a COR filter
is involved (in read operations)” would be better.

> + * caller doesn't really need data to be written to qiov parameter which

And this “written to” confused me for a second, because we’re reading
into qiov.  Technically, that means writing into the buffer, but, you know.

Could we rewrite the whole thing, perhaps?  Something like

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
COR filter), in which case it signals that the COR operation need not
read the data into memory (qiov), but only ensure it is copied to the
top layer (i.e., that COR is done).”

I don’t know.

Max

> + * may be NULL.
>   */
>  BDRV_REQ_PREFETCH  = 0x200,
>  /* Mask of valid flags */
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v11 07/13] block: include supported_read_flags into BDS structure

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Add the new member supported_read_flags to BlockDriverState structure.
> It will control the BDRV_REQ_PREFETCH flag set for copy-on-read
> operations.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  include/block/block_int.h | 4 
>  1 file changed, 4 insertions(+)

Not sure what the problem with passing BDRV_REQ_PREFETCH to drivers that
aren’t the COR filter would be, but:

Reviewed-by: Max Reitz 

(I mean, outside of the context of COR the flag is undefined, so it can
be anything, but intuitively I’d assume it means “no need to read the
data to memory” then, too.  So if a driver doesn’t support it, nothing
bad should happen.  Hm.  Well, unless the driver passes the flag on
(unsuspectingly) and a metadata-bearing child doesn’t return any data
that the driver needs.  Yeah, better to explicitly disallow the flag
for all unsupporting drivers then.)



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 3/9] util/vfio-helpers: Trace PCI BAR region info

2020-10-14 Thread Fam Zheng
On Wed, 2020-10-14 at 13:52 +0200, Philippe Mathieu-Daudé wrote:
> For debug purpose, trace BAR regions info.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  util/vfio-helpers.c | 8 
>  util/trace-events   | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
> index 1d4efafcaa4..cd6287c3a98 100644
> --- a/util/vfio-helpers.c
> +++ b/util/vfio-helpers.c
> @@ -136,6 +136,7 @@ static inline void
> assert_bar_index_valid(QEMUVFIOState *s, int index)
>  
>  static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error
> **errp)
>  {
> +g_autofree char *barname = NULL;
>  assert_bar_index_valid(s, index);
>  s->bar_region_info[index] = (struct vfio_region_info) {
>  .index = VFIO_PCI_BAR0_REGION_INDEX + index,
> @@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState
> *s, int index, Error **errp)
>  error_setg_errno(errp, errno, "Failed to get BAR region
> info");
>  return -errno;
>  }
> +barname = g_strdup_printf("bar[%d]", index);

Where is barname freed?

Fam

> +trace_qemu_vfio_region_info(barname, s-
> >bar_region_info[index].offset,
> +s->bar_region_info[index].size,
> +s-
> >bar_region_info[index].cap_offset);
>  
>  return 0;
>  }
> @@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s,
> const char *device,
>  ret = -errno;
>  goto fail;
>  }
> +trace_qemu_vfio_region_info("config", s-
> >config_region_info.offset,
> +s->config_region_info.size,
> +s->config_region_info.cap_offset);
>  
>  for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
>  ret = qemu_vfio_pci_init_bar(s, i, errp);
> diff --git a/util/trace-events b/util/trace-events
> index c048f85f828..4d40c74a21f 100644
> --- a/util/trace-events
> +++ b/util/trace-events
> @@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size,
> bool temporary, uint64_t *io
>  qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
>  qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t
> region_ofs, uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d
> (region ofs 0x%"PRIx64" size %"PRId64")"
>  qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t
> region_ofs, uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d
> (region ofs 0x%"PRIx64" size %"PRId64")"
> +qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t
> size, uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size
> %"PRId64" cap_ofs %"PRId32




Re: [PATCH v4 6/7] nbd: Refactor counting of metadata contexts

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

10.10.2020 00:55, Eric Blake wrote:

Rather than open-code the count of negotiated contexts at several
sites, embed it directly into the struct.

Signed-off-by: Eric Blake


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v4 5/7] nbd: Simplify qemu bitmap context name

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

10.10.2020 00:55, Eric Blake wrote:

Each dirty bitmap already knows its name; by reducing the scope of the
places where we construct "qemu:dirty-bitmap:NAME" strings, tracking
the name is more localized, and there are fewer per-export fields to
worry about.  This in turn will make it easier for an upcoming patch
to export more than one bitmap at once.

Signed-off-by: Eric Blake


Thanks for updating this!

Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v4 4/7] nbd: Update qapi to support exporting multiple bitmaps

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

10.10.2020 00:55, Eric Blake wrote:

Since 'nbd-server-add' is deprecated, and 'block-export-add' is new to
5.2, we can still tweak the interface.  Allowing 'bitmaps':['str'] is
nicer than 'bitmap':'str'.  This wires up the qapi and qemu-nbd
changes to permit passing multiple bitmaps as distinct metadata
contexts that the NBD client may request, but the actual support for
more than one will require a further patch to the server.

Signed-off-by: Eric Blake 
---


[..]


  break;
  case 'B':
-bitmap = optarg;
+tmp = g_new(strList, 1);
+tmp->value = g_strdup(optarg);
+tmp->next = bitmaps;
+bitmaps = tmp;


If we published QAPI_LIST_ADD, defined in block.c, it would look like:

QAPI_LIST_ADD(bitmaps, g_strdup(optarg));

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v6 01/11] hw/block/nvme: Add Commands Supported and Effects log

2020-10-14 Thread Niklas Cassel
On Tue, Oct 13, 2020 at 05:50:34PM -0700, Keith Busch wrote:
> On Wed, Oct 14, 2020 at 06:42:02AM +0900, Dmitry Fomichev wrote:
> > +{
> > +NvmeEffectsLog log = {};
> > +uint32_t *dst_acs = log.acs, *dst_iocs = log.iocs;
> > +uint32_t trans_len;
> > +int i;
> > +
> > +trace_pci_nvme_cmd_supp_and_effects_log_read();
> > +
> > +if (off >= sizeof(log)) {
> > +trace_pci_nvme_err_invalid_effects_log_offset(off);
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +for (i = 0; i < 256; i++) {
> > +dst_acs[i] = nvme_cse_acs[i];
> > +}
> > +
> > +for (i = 0; i < 256; i++) {
> > +dst_iocs[i] = nvme_cse_iocs_nvm[i];
> > +}
> 
> You're just copying the array, so let's do it like this:
> 
> memcpy(log.acs, nvme_cse_acs, sizeof(nvme_cse_acs));
> memcpy(log.iocs, nvme_cse_iocs_nvm, sizeof(nvme_cse_iocs_nvm));
> 
> I think you also need to check
> 
> if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY)
> 
> before copying iocs.

So it seems Dmitry actually does this in the Namespace Types patch:


@@ -1051,10 +1054,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));

-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
-return NVME_INVALID_OPCODE | NVME_DNR;
-}
-
 if (!nvme_nsid_valid(n, nsid)) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
@@ -1333,8 +1332,10 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t 
buf_len,
 dst_acs[i] = nvme_cse_acs[i];
 }

-for (i = 0; i < 256; i++) {
-dst_iocs[i] = nvme_cse_iocs_nvm[i];
+if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+for (i = 0; i < 256; i++) {
+dst_iocs[i] = nvme_cse_iocs_nvm[i];
+}
 }


Which later in the ZNS patch is changed to:

@@ -1335,13 +2178,31 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint32_t 
buf_len,
 return NVME_INVALID_FIELD | NVME_DNR;
 }

+switch (NVME_CC_CSS(n->bar.cc)) {
+case NVME_CC_CSS_NVM:
+src_iocs = nvme_cse_iocs_nvm;
+break;
+case NVME_CC_CSS_CSI:
+switch (csi) {
+case NVME_CSI_NVM:
+src_iocs = nvme_cse_iocs_nvm;
+break;
+case NVME_CSI_ZONED:
+src_iocs = nvme_cse_iocs_zoned;
+break;
+}
+/* fall through */
+case NVME_CC_CSS_ADMIN_ONLY:
+break;
+}
+
 for (i = 0; i < 256; i++) {
 dst_acs[i] = nvme_cse_acs[i];
 }

-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+if (src_iocs) {
 for (i = 0; i < 256; i++) {
-dst_iocs[i] = nvme_cse_iocs_nvm[i];
+dst_iocs[i] = src_iocs[i];
 }
 }


Perhaps the nvme_io_cmd() if-statement removal from the NS types patch +
the switch from the ZNS patch (without the NVME_CSI_ZONED) could be moved
to this patch instead?

The middle version of adding "+if (NVME_CC_CSS(n->bar.cc) != 
NVME_CC_CSS_ADMIN_ONLY) {"
can then be dropped from the NS types patch, since it is not part of the final 
code anyway.


Kind regards,
Niklas


Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn 
> cor_co_preadv_part(BlockDriverState *bs,
> size_t qiov_offset,
> int flags)
>  {
> -return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -   flags | BDRV_REQ_COPY_ON_READ);
> +int64_t n = 0;
> +int64_t size = offset + bytes;
> +int local_flags;
> +int ret;
> +BDRVStateCOR *state = bs->opaque;
> +
> +if (!state->base_overlay) {
> +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, 
> qiov_offset,
> +   flags | BDRV_REQ_COPY_ON_READ);
> +}
> +
> +while (offset < size) {
> +local_flags = flags;
> +
> +/* In case of failure, try to copy-on-read anyway */
> +ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +if (!ret) {

In case of failure, a negative value is going to be returned, we won’t
go into this conditional block, and local_flags isn’t going to contain
BDRV_REQ_COPY_ON_READ.

So the idea of CORing in case of failure sounds sound to me, but it
doesn’t look like that’s done.

> +ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),

I think this should either be bdrv_backing_chain_next() or we must rule
out the possibility of bs->file->bs being a filter somewhere.  I think
I’d prefer the former.

> +  state->base_overlay, true, offset,
> +  n, &n);
> +if (ret) {

“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
|| ret < 0” probably needed above), but correct either way.

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Max Reitz
On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn 
> cor_co_preadv_part(BlockDriverState *bs,
> size_t qiov_offset,
> int flags)
>  {
> -return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -   flags | BDRV_REQ_COPY_ON_READ);
> +int64_t n = 0;
> +int64_t size = offset + bytes;

Just when I hit send I noticed that “end” would be a more fitting name
for this variable.

> +int local_flags;
> +int ret;
> +BDRVStateCOR *state = bs->opaque;
> +
> +if (!state->base_overlay) {
> +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, 
> qiov_offset,
> +   flags | BDRV_REQ_COPY_ON_READ);
> +}
> +
> +while (offset < size) {

(because I got a bit confused looking at this)

(Though dropping @size (or @end) and just checking when @bytes becomes 0
should work, too.)

> +local_flags = flags;
> +
> +/* In case of failure, try to copy-on-read anyway */
> +ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +if (!ret) {
> +ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> +  state->base_overlay, true, offset,
> +  n, &n);
> +if (ret) {
> +local_flags |= BDRV_REQ_COPY_ON_READ;
> +}
> +}

Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is at the EOF?  Is that something to look out for?
 (I’m not sure.)

Max

> +
> +ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +  local_flags);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +offset += n;
> +qiov_offset += n;
> +bytes -= n;
> +}
> +
> +return 0;
>  }
>  
>  
> 




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v6 05/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-14 Thread Niklas Cassel
On Wed, Oct 14, 2020 at 06:42:06AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 

(snip)

> @@ -2260,6 +3155,11 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
>  ns->iocs = nvme_cse_iocs_nvm;
>  }
>  break;
> +case NVME_CSI_ZONED:
> +if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
> +ns->iocs = nvme_cse_iocs_zoned;
> +}
> +break;
>  }
>  }
>  }

Who knows how this whole command set mess is supposed to work,
since e.g. the Key Value Command Set assigns opcodes for new commands
(Delete, Exist) with opcode values (0x10,0x14) smaller than the
current highest opcode value (0x15) in the NVM Command Set,
while those opcodes (0x10,0x14) are reserved in the NVM Command Set.

At least for Zoned Command Set, they defined the new commands
(Zone Mgmt Send, Zone Mgmt Recv) to opcode values (0x79,0x7a)
that are higher than the current highest opcode value in the
NVM Command Set.

So since we know that the Zoned Command Set is a strict superset of
the NVM Command Set, I guess it might be nice to do something like:

case NVME_CSI_ZONED:
if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
ns->iocs = nvme_cse_iocs_zoned;
} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
ns->iocs = nvme_cse_iocs_nvm;
}
break;


I assume that the spec people intended reads/writes
to a ZNS namespace to still be possible when CC_CSS == NVM,
but who knows?


Kind regards,
Niklas


[PATCH 5/9] util/vfio-helpers: Improve DMA trace events

2020-10-14 Thread Philippe Mathieu-Daudé
For debugging purposes, trace where DMA regions are mapped.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 3 ++-
 util/trace-events   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 278c54902e7..c24a510df82 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -627,7 +627,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 .vaddr = (uintptr_t)host,
 .size = size,
 };
-trace_qemu_vfio_do_mapping(s, host, size, iova);
+trace_qemu_vfio_do_mapping(s, host, iova, size);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
 error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
@@ -783,6 +783,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 }
 }
 }
+trace_qemu_vfio_dma_mapped(s, host, iova0, size);
 if (iova) {
 *iova = iova0;
 }
diff --git a/util/trace-events b/util/trace-events
index 50652761a58..8598066acdb 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -82,8 +82,9 @@ qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s 
%p host %p size 0x%z
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t 
iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
-qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p 
host %p size 0x%zx iova 0x%"PRIx64
-qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d iova %p"
+qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64 " size 0x%zx"
+qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d &iova %p"
+qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
-- 
2.26.2




[PATCH 9/9] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()

2020-10-14 Thread Philippe Mathieu-Daudé
Instead of displaying the error on stderr, use error_report()
which also reports to the monitor.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 2c4598d7faa..488ddfca2a9 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -661,13 +661,13 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
 if (QEMU_VFIO_DEBUG) {
 for (i = 0; i < s->nr_mappings - 1; ++i) {
 if (!(s->mappings[i].host < s->mappings[i + 1].host)) {
-fprintf(stderr, "item %d not sorted!\n", i);
+error_report("item %d not sorted!", i);
 qemu_vfio_dump_mappings(s);
 return false;
 }
 if (!(s->mappings[i].host + s->mappings[i].size <=
   s->mappings[i + 1].host)) {
-fprintf(stderr, "item %d overlap with next!\n", i);
+error_report("item %d overlap with next!", i);
 qemu_vfio_dump_mappings(s);
 return false;
 }
-- 
2.26.2




Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Max Reitz
On 14.10.20 13:09, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> We are going to use the COR-filter for a block-stream job.
>> To limit COR operations by the base node in the backing chain during
>> stream job, pass the name of overlay base node to the copy-on-read
>> driver as base node itself may change due to possible concurrent jobs.
>> The rest of the functionality will be implemented in the patch that
>> follows.
>>
>> Signed-off-by: Andrey Shinkevich 
>> ---
>>  block/copy-on-read.c | 14 ++
>>  1 file changed, 14 insertions(+)
> 
> Is there a reason why you didn’t add this option to QAPI (as part of a
> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
> 
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index bcccf0f..c578b1b 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -24,19 +24,24 @@
>>  #include "block/block_int.h"
>>  #include "qemu/module.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "block/copy-on-read.h"
>>  
>>  
>>  typedef struct BDRVStateCOR {
>>  bool active;
>> +BlockDriverState *base_overlay;
>>  } BDRVStateCOR;
>>  
>>  
>>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>  Error **errp)
>>  {
>> +BlockDriverState *base_overlay = NULL;
>>  BDRVStateCOR *state = bs->opaque;
>> +/* We need the base overlay node rather than the base itself */
>> +const char *base_overlay_node = qdict_get_try_str(options, "base");
> 
> Shouldn’t it be called base-overlay or above-base then?

While reviewing patch 5, I realized this should indeed be base-overlay
(i.e. the next allocation-bearing layer above the base, not just a
filter on top of base), but that should be noted somewhere, of course.
Best would be the QAPI documentation for this option. O:)

Max



signature.asc
Description: OpenPGP digital signature


[PATCH 7/9] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error

2020-10-14 Thread Philippe Mathieu-Daudé
Currently qemu_vfio_dma_map() displays errors on stderr.
When using the management interface, this information is simply
lost. Pass qemu_vfio_dma_map() an Error* argument so it can
propagate the error to callers.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c| 14 +++---
 util/vfio-helpers.c | 12 +++-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 4491c8e1a6e..bde9495b254 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -18,7 +18,7 @@ typedef struct QEMUVFIOState QEMUVFIOState;
 QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
 void qemu_vfio_close(QEMUVFIOState *s);
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-  bool temporary, uint64_t *iova_list);
+  bool temporary, uint64_t *iova_list, Error **errp);
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
 void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
 void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
diff --git a/block/nvme.c b/block/nvme.c
index b48f6f25881..5f662c55bbe 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -167,9 +167,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
 return;
 }
 memset(q->queue, 0, bytes);
-r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
+r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
 if (r) {
-error_setg(errp, "Cannot map queue");
+error_prepend(errp, "Cannot map queue: ");
 }
 }
 
@@ -223,7 +223,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState 
*s,
 q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
 r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
   s->page_size * NVME_NUM_REQS,
-  false, &prp_list_iova);
+  false, &prp_list_iova, errp);
 if (r) {
 goto fail;
 }
@@ -514,9 +514,9 @@ static void nvme_identify(BlockDriverState *bs, int 
namespace, Error **errp)
 error_setg(errp, "Cannot allocate buffer for identify response");
 goto out;
 }
-r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
+r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova, errp);
 if (r) {
-error_setg(errp, "Cannot map buffer for DMA");
+error_prepend(errp, "Cannot map buffer for DMA: ");
 goto out;
 }
 
@@ -989,7 +989,7 @@ try_map:
 r = qemu_vfio_dma_map(s->vfio,
   qiov->iov[i].iov_base,
   qiov->iov[i].iov_len,
-  true, &iova);
+  true, &iova, NULL);
 if (r == -ENOMEM && retry) {
 retry = false;
 trace_nvme_dma_flush_queue_wait(s);
@@ -1436,7 +1436,7 @@ static void nvme_register_buf(BlockDriverState *bs, void 
*host, size_t size)
 int ret;
 BDRVNVMeState *s = bs->opaque;
 
-ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
+ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL, NULL);
 if (ret) {
 /* FIXME: we may run out of IOVA addresses after repeated
  * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 73f7bfa7540..c03fe0b7156 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -462,7 +462,7 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
 trace_qemu_vfio_ram_block_added(s, host, size);
-qemu_vfio_dma_map(s, host, size, false, NULL);
+qemu_vfio_dma_map(s, host, size, false, NULL, NULL);
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -477,6 +477,7 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
 
 static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
 {
+Error *local_err = NULL;
 void *host_addr = qemu_ram_get_host_addr(rb);
 ram_addr_t length = qemu_ram_get_used_length(rb);
 int ret;
@@ -485,10 +486,11 @@ static int qemu_vfio_init_ramblock(RAMBlock *rb, void 
*opaque)
 if (!host_addr) {
 return 0;
 }
-ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
+ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL, &local_err);
 if (ret) {
-fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-host_addr, (uint64_t)length);
+error_reportf_err(local_err,
+  "qemu_vfio_init_ramblock: failed %p %" PRId64 ":",
+  host_addr, (uint64_t)length);
 }
 return 0;
 }
@@ -724,7 +726,7 @@ qemu_vfio_find_temp_iova(QEMUVFIOState *s, size_t size, 
uint64_t *iova)
  * mapping status within this area is not allowed).
  */
 

[PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Philippe Mathieu-Daudé
A bunch of boring patches that have been proven helpful
while debugging.

Philippe Mathieu-Daudé (9):
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
  util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
  util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()

 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c| 14 
 util/vfio-helpers.c | 66 +
 util/trace-events   | 10 --
 4 files changed, 54 insertions(+), 38 deletions(-)

-- 
2.26.2





Re: [PATCH v4 2/7] nbd: Add new qemu:allocation-depth metadata context

2020-10-14 Thread Vladimir Sementsov-Ogievskiy

10.10.2020 00:55, Eric Blake wrote:

'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point the
qemu:dirty-bitmap:NAME metadata context can expose that information
via the creation of a temporary bitmap, but we can shorten the effort
by adding a new qemu:allocation-depth metadata context that does the
same thing without an intermediate bitmap (this patch does not
eliminate the need for that proposal, as it will have other uses as
well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But adding this extension would require
bdrv_is_allocated_above to return a depth number.

While documenting things, remember that although the NBD protocol has
NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
'metadata context', which is a more apt description of what is
actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
metadata by passing one or more context names.  So I also touched up
some existing wording to prefer the term 'metadata context' where it
makes sense.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  docs/interop/nbd.txt | 27 ++---


[..]


+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+bits 0-1: 00: NBD_STATE_DEPTH_UNALLOC, the extent is unallocated
+  01: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the
+  top level of the image


Hmm. I always thought that "image" == file, so a backing chain is a chain of
images, not several levels of one image. If so, then it should be "the top
level image". And "levels of the image" may suggest internal qcow2 snapshots,
which are unrelated here..


--
Best regards,
Vladimir



[PATCH 8/9] util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error

2020-10-14 Thread Philippe Mathieu-Daudé
Pass qemu_vfio_do_mapping() an Error** argument so it can propagate
any error to its callers. Replace error_report(), which only reports
to the monitor, with the more generic error_setg_errno().

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c03fe0b7156..2c4598d7faa 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -609,7 +609,7 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
 
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
-uint64_t iova)
+uint64_t iova, Error **errp)
 {
 struct vfio_iommu_type1_dma_map dma_map = {
 .argsz = sizeof(dma_map),
@@ -621,7 +621,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 trace_qemu_vfio_do_mapping(s, host, iova, size);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
-error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
+error_setg_errno(errp, errno, "VFIO_MAP_DMA failed");
 return -errno;
 }
 return 0;
@@ -757,7 +757,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 goto out;
 }
 assert(qemu_vfio_verify_mappings(s));
-ret = qemu_vfio_do_mapping(s, host, size, iova0);
+ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
 if (ret) {
 qemu_vfio_undo_mapping(s, mapping, NULL);
 goto out;
@@ -768,7 +768,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 ret = -ENOMEM;
 goto out;
 }
-ret = qemu_vfio_do_mapping(s, host, size, iova0);
+ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
 if (ret) {
 goto out;
 }
-- 
2.26.2




[PATCH 6/9] util/vfio-helpers: Convert vfio_dump_mapping to trace events

2020-10-14 Thread Philippe Mathieu-Daudé
The QEMU_VFIO_DEBUG definition is only modifiable at build time.
Trace events can be enabled at run time. As we prefer the latter,
convert qemu_vfio_dump_mappings() to use trace events instead
of printf().
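
The converted event can then be turned on at run time without a rebuild,
e.g. (assuming the default "log" trace backend):

  $ qemu-system-x86_64 -trace "qemu_vfio_*" ...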

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 19 ---
 util/trace-events   |  1 +
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c24a510df82..73f7bfa7540 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -521,23 +521,12 @@ QEMUVFIOState *qemu_vfio_open_pci(const char *device, 
Error **errp)
 return s;
 }
 
-static void qemu_vfio_dump_mapping(IOVAMapping *m)
-{
-if (QEMU_VFIO_DEBUG) {
-printf("  vfio mapping %p %" PRIx64 " to %" PRIx64 "\n", m->host,
-   (uint64_t)m->size, (uint64_t)m->iova);
-}
-}
-
 static void qemu_vfio_dump_mappings(QEMUVFIOState *s)
 {
-int i;
-
-if (QEMU_VFIO_DEBUG) {
-printf("vfio mappings\n");
-for (i = 0; i < s->nr_mappings; ++i) {
-qemu_vfio_dump_mapping(&s->mappings[i]);
-}
+for (int i = 0; i < s->nr_mappings; ++i) {
+trace_qemu_vfio_dump_mapping(s->mappings[i].host,
+ s->mappings[i].iova,
+ s->mappings[i].size);
 }
 }
 
diff --git a/util/trace-events b/util/trace-events
index 8598066acdb..7faad2a718c 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -80,6 +80,7 @@ qemu_mutex_unlock(void *mutex, const char *file, const int 
line) "released mutex
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
 qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
+qemu_vfio_dump_mapping(void *host, uint64_t iova, size_t size) "vfio mapping 
%p to iova 0x%08" PRIx64 " size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t 
iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64 " size 0x%zx"
-- 
2.26.2




[PATCH 3/9] util/vfio-helpers: Trace PCI BAR region info

2020-10-14 Thread Philippe Mathieu-Daudé
For debugging purposes, trace BAR region info.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 8 
 util/trace-events   | 1 +
 2 files changed, 9 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 1d4efafcaa4..cd6287c3a98 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -136,6 +136,7 @@ static inline void assert_bar_index_valid(QEMUVFIOState *s, 
int index)
 
 static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
 {
+g_autofree char *barname = NULL;
 assert_bar_index_valid(s, index);
 s->bar_region_info[index] = (struct vfio_region_info) {
 .index = VFIO_PCI_BAR0_REGION_INDEX + index,
@@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int 
index, Error **errp)
 error_setg_errno(errp, errno, "Failed to get BAR region info");
 return -errno;
 }
+barname = g_strdup_printf("bar[%d]", index);
+trace_qemu_vfio_region_info(barname, s->bar_region_info[index].offset,
+s->bar_region_info[index].size,
+s->bar_region_info[index].cap_offset);
 
 return 0;
 }
@@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char 
*device,
 ret = -errno;
 goto fail;
 }
+trace_qemu_vfio_region_info("config", s->config_region_info.offset,
+s->config_region_info.size,
+s->config_region_info.cap_offset);
 
 for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
 ret = qemu_vfio_pci_init_bar(s, i, errp);
diff --git a/util/trace-events b/util/trace-events
index c048f85f828..4d40c74a21f 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool 
temporary, uint64_t *io
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
+qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, 
uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs 
%"PRId32
-- 
2.26.2




[PATCH 4/9] util/vfio-helpers: Trace where BARs are mapped

2020-10-14 Thread Philippe Mathieu-Daudé
For debugging purposes, trace where a BAR is mapped.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 2 ++
 util/trace-events   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index cd6287c3a98..278c54902e7 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -166,6 +166,8 @@ void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
 p = mmap(NULL, MIN(size, s->bar_region_info[index].size - offset),
  prot, MAP_SHARED,
  s->device, s->bar_region_info[index].offset + offset);
+trace_qemu_vfio_pci_map_bar(index, s->bar_region_info[index].offset ,
+size, offset, p);
 if (p == MAP_FAILED) {
 error_setg_errno(errp, errno, "Failed to map BAR region");
 p = NULL;
diff --git a/util/trace-events b/util/trace-events
index 4d40c74a21f..50652761a58 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -88,3 +88,4 @@ qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
 qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, 
uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs 
%"PRId32
+qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, 
int ofs, void *host) "map region bar#%d ofs 0x%"PRIx64" size %"PRId64" ofs %d 
host %p"
-- 
2.26.2




[PATCH 2/9] util/vfio-helpers: Trace PCI I/O config accesses

2020-10-14 Thread Philippe Mathieu-Daudé
We sometimes get a kernel panic with some devices on AArch64
hosts. Alex Williamson suggests it might be a broken PCIe
root complex. Add trace events to record the latest I/O
access before crashing. Just in case, also assert our accesses
are aligned.
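
As a made-up example of what the new assertion checks: with a config
region at offset 0x3000, a 4-byte access at ofs 0x10 touches 0x3010,
which is 4-byte aligned and passes; a 4-byte access at ofs 0x11 would
touch 0x3011 and trip the assertion first.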

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Alex Williamson 
---
 util/vfio-helpers.c | 8 
 util/trace-events   | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 14a549510fe..1d4efafcaa4 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -227,6 +227,10 @@ static int qemu_vfio_pci_read_config(QEMUVFIOState *s, 
void *buf,
 {
 int ret;
 
+trace_qemu_vfio_pci_read_config(buf, ofs, size,
+s->config_region_info.offset,
+s->config_region_info.size);
+assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
 do {
 ret = pread(s->device, buf, size, s->config_region_info.offset + ofs);
 } while (ret == -1 && errno == EINTR);
@@ -237,6 +241,10 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, 
void *buf, int size, int
 {
 int ret;
 
+trace_qemu_vfio_pci_write_config(buf, ofs, size,
+ s->config_region_info.offset,
+ s->config_region_info.size);
+assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
 do {
 ret = pwrite(s->device, buf, size, s->config_region_info.offset + ofs);
 } while (ret == -1 && errno == EINTR);
diff --git a/util/trace-events b/util/trace-events
index 24c31803b01..c048f85f828 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -85,3 +85,5 @@ qemu_vfio_new_mapping(void *s, void *host, size_t size, int 
index, uint64_t iova
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p 
host %p size 0x%zx iova 0x%"PRIx64
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d iova %p"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
+qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
+qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
-- 
2.26.2




[PATCH 1/9] util/vfio-helpers: Improve reporting unsupported IOMMU type

2020-10-14 Thread Philippe Mathieu-Daudé
Change the confusing "VFIO IOMMU check failed" error message to
the explicit "VFIO IOMMU Type1 is not supported" one.

Example on POWER:

 $ qemu-system-ppc64 -drive 
if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw
 qemu-system-ppc64: -drive 
if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw: VFIO IOMMU Type1 is not 
supported

Suggested-by: Alex Williamson 
Reviewed-by: Fam Zheng 
Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c469beb0616..14a549510fe 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -300,7 +300,7 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char 
*device,
 }
 
 if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-error_setg_errno(errp, errno, "VFIO IOMMU check failed");
+error_setg_errno(errp, errno, "VFIO IOMMU Type1 is not supported");
 ret = -EINVAL;
 goto fail_container;
 }
-- 
2.26.2




[PATCH v3] migration/block-dirty-bitmap: fix uninitialized variable warning

2020-10-14 Thread Chen Qun
A default value is provided for the variable 'bitmap_name' to avoid a
compiler warning.

The compiler shows this warning:
migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
may be used uninitialized in this function [-Wmaybe-uninitialized]
   g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name));
   ^~
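
The warning boils down to a pattern like the following (simplified,
hypothetical names): GCC cannot prove that every path through the outer
'if' assigns the variable before it is read later on.

  const char *name;                  /* no initializer */

  if (!cancelled) {
      if (map) {
          name = lookup(map, alias); /* may return NULL, but is assigned */
      } else {
          name = alias;
      }
  }
  use(name);                         /* -Wmaybe-uninitialized fires here */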

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
v2->v3: Placing the declaration at the beginning of the block(Base on Max's 
suggestion).

Cc: Max Reitz 
Cc: Vladimir Sementsov-Ogievskiy 
Cc: Laurent Vivier 
Cc: Li Qiang 
---
 migration/block-dirty-bitmap.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 5bef793ac0..114987961a 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1071,18 +1071,15 @@ static int dirty_bitmap_load_header(QEMUFile *f, 
DBMLoadState *s,
 return -EINVAL;
 }
 
-if (!s->cancelled) {
-if (bitmap_alias_map) {
-bitmap_name = g_hash_table_lookup(bitmap_alias_map,
-  s->bitmap_alias);
-if (!bitmap_name) {
-error_report("Error: Unknown bitmap alias '%s' on node "
- "'%s' (alias '%s')", s->bitmap_alias,
- s->bs->node_name, s->node_alias);
-cancel_incoming_locked(s);
-}
-} else {
-bitmap_name = s->bitmap_alias;
+bitmap_name = s->bitmap_alias;
+if (!s->cancelled && bitmap_alias_map) {
+bitmap_name = g_hash_table_lookup(bitmap_alias_map,
+  s->bitmap_alias);
+if (!bitmap_name) {
+error_report("Error: Unknown bitmap alias '%s' on node "
+ "'%s' (alias '%s')", s->bitmap_alias,
+ s->bs->node_name, s->node_alias);
+cancel_incoming_locked(s);
 }
 }
 
-- 
2.23.0




[PATCH v3 9/9] hw/block/nvme: allow open to close zone transitions by controller

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Allow the controller to release open resources by transitioning
implicitly and explicitly opened zones to closed. This is done using a
naive "least recently opened" strategy.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|  5 
 hw/block/nvme-ns.c|  5 
 hw/block/nvme.c   | 57 ---
 hw/block/trace-events |  1 +
 4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 3d0269eef6f0..5d8523c047d8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -38,6 +38,8 @@ typedef struct NvmeZone {
 uint8_t*zde;
 
 uint64_t wp_staging;
+
+QTAILQ_ENTRY(NvmeZone) lru_entry;
 } NvmeZone;
 
 typedef struct NvmeNamespace {
@@ -64,6 +66,9 @@ typedef struct NvmeNamespace {
 struct {
 uint32_t open;
 uint32_t active;
+
+QTAILQ_HEAD(, NvmeZone) lru_open;
+QTAILQ_HEAD(, NvmeZone) lru_active;
 } resources;
 } zns;
 } NvmeNamespace;
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index a01cc5eeb445..cb8b44a78450 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -135,6 +135,9 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ?
 ns->params.zns.mor + 1 : ns->zns.num_zones;
 
+QTAILQ_INIT(&ns->zns.resources.lru_open);
+QTAILQ_INIT(&ns->zns.resources.lru_active);
+
 for (int i = 0; i < ns->zns.num_zones; i++) {
 NvmeZone *zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
@@ -158,6 +161,8 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 
 if (ns->zns.resources.active) {
 ns->zns.resources.active--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone,
+   lru_entry);
 break;
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cc637b3a68e9..1fab9d69261c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1105,11 +1105,47 @@ static inline void nvme_zone_reset_wp(NvmeZone *zone)
 zone->wp_staging = nvme_zslba(zone);
 }
 
+static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
+NvmeZoneState to);
+
+static uint16_t nvme_zrm_release_open(NvmeNamespace *ns)
+{
+NvmeZone *candidate;
+NvmeZoneState zs;
+uint16_t status;
+
+trace_pci_nvme_zrm_release_open(ns->params.nsid);
+
+QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) {
+zs = nvme_zs(candidate);
+
+/* skip explicitly opened zones */
+if (zs == NVME_ZS_ZSEO) {
+continue;
+}
+
+/* skip zones that have in-flight writes */
+if (candidate->wp_staging != nvme_wp(candidate)) {
+continue;
+}
+
+status = nvme_zrm_transition(ns, candidate, NVME_ZS_ZSC);
+if (status) {
+return status;
+}
+
+return NVME_SUCCESS;
+}
+
+return NVME_TOO_MANY_OPEN_ZONES;
+}
+
 static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 NvmeZoneState to)
 {
 NvmeZoneState from = nvme_zs(zone);
 NvmeZoneDescriptor *zd = zone->zd;
+uint16_t status;
 
 trace_pci_nvme_zrm_transition(ns->params.nsid, nvme_zslba(zone), from, to);
 
@@ -1131,6 +1167,7 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 }
 
 ns->zns.resources.active--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
 break;
 
@@ -1141,11 +1178,15 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 }
 
 if (!ns->zns.resources.open) {
-return NVME_TOO_MANY_OPEN_ZONES;
+status = nvme_zrm_release_open(ns);
+if (status) {
+return status;
+}
 }
 
 ns->zns.resources.active--;
 ns->zns.resources.open--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
 break;
 
@@ -1172,11 +1213,15 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 case NVME_ZS_ZSRO:
 ns->zns.resources.active++;
+ns->zns.resources.open++;
+QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
 
-/* fallthrough */
+break;
 
 case NVME_ZS_ZSC:
 ns->zns.resources.open++;
+QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
 break;
 
@@ -1201,16 +1246,22 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 case NVME_ZS_ZSRO:
 ns->zns.resources.active++;
+

[PATCH v3 8/9] hw/block/nvme: track and enforce zone resources

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Track number of open/active resources.
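
Both new parameters are 0's based, matching MAR/MOR in the spec, so e.g.
zns.mar=7 allows 8 concurrently active zones. A hypothetical namespace
using them could be configured like this (IDs and values made up):

  -device nvme-ns,drive=nvm-1,iocs=2,zns.zcap=4096,zns.mar=7,zns.mor=3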

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt  |  6 
 hw/block/nvme-ns.h   |  7 +
 include/block/nvme.h |  2 ++
 hw/block/nvme-ns.c   | 25 +++--
 hw/block/nvme.c  | 66 +++-
 5 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 80cb34406255..03bb4d9516b4 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -14,6 +14,12 @@ The nvme device (-device nvme) emulates an NVM Express 
Controller.
  zns.zcap; if the zone capacity is a power of two, the zone size will be
  set to that, otherwise it will default to the next power of two.
 
+  `zns.mar`; Specifies the number of active resources available. This is a 0s
+ based value.
+
+  `zns.mor`; Specifies the number of open resources available. This is a 0s
+ based value.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index a273e7f7ed4b..3d0269eef6f0 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -28,6 +28,8 @@ typedef struct NvmeNamespaceParams {
 uint64_t zcap;
 uint64_t zsze;
 uint8_t  zdes;
+uint32_t mar;
+uint32_t mor;
 } zns;
 } NvmeNamespaceParams;
 
@@ -58,6 +60,11 @@ typedef struct NvmeNamespace {
 NvmeZone   *zones;
 NvmeZoneDescriptor *zd;
 uint8_t*zde;
+
+struct {
+uint32_t open;
+uint32_t active;
+} resources;
 } zns;
 } NvmeNamespace;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index aecefefd43b6..4ca5f3fb15eb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -775,6 +775,8 @@ enum NvmeStatusCodes {
 NVME_ZONE_IS_READ_ONLY  = 0x01ba,
 NVME_ZONE_IS_OFFLINE= 0x01bb,
 NVME_ZONE_INVALID_WRITE = 0x01bc,
+NVME_TOO_MANY_ACTIVE_ZONES  = 0x01bd,
+NVME_TOO_MANY_OPEN_ZONES= 0x01be,
 NVME_INVALID_ZONE_STATE_TRANSITION = 0x01bf,
 NVME_WRITE_FAULT= 0x0280,
 NVME_UNRECOVERED_READ   = 0x0281,
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 0922b67dd6d8..a01cc5eeb445 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -92,8 +92,8 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 ns->zns.zde = g_malloc0_n(ns->zns.num_zones, nvme_ns_zdes_bytes(ns));
 }
 
-id_ns_zns->mar = 0xffffffff;
-id_ns_zns->mor = 0xffffffff;
+id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar);
+id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor);
 }
 
 static void nvme_ns_init(NvmeNamespace *ns)
@@ -130,6 +130,11 @@ static void nvme_ns_init(NvmeNamespace *ns)
 
 void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 {
+ns->zns.resources.active = ns->params.zns.mar != 0xffffffff ?
+ns->params.zns.mar + 1 : ns->zns.num_zones;
+ns->zns.resources.open = ns->params.zns.mor != 0xffffffff ?
+ns->params.zns.mor + 1 : ns->zns.num_zones;
+
 for (int i = 0; i < ns->zns.num_zones; i++) {
 NvmeZone *zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
@@ -148,9 +153,15 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 if (nvme_wp(zone) == nvme_zslba(zone) &&
 !(zone->zd->za & NVME_ZA_ZDEV)) {
 nvme_zs_set(zone, NVME_ZS_ZSE);
+break;
 }
 
-break;
+if (ns->zns.resources.active) {
+ns->zns.resources.active--;
+break;
+}
+
+/* fallthrough */
 
 case NVME_ZS_ZSIO:
 case NVME_ZS_ZSEO:
@@ -207,6 +218,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
+if (ns->params.zns.mor > ns->params.zns.mar) {
+error_setg(errp, "maximum open resources (zns.mor) must be less "
+   "than or equal to maximum active resources (zns.mar)");
+return -1;
+}
+
 break;
 
 default:
@@ -272,6 +289,8 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0),
 DEFINE_PROP_UINT64("zns.zsze", NvmeNamespace, params.zns.zsze, 0),
 DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0),
+DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0x),
+DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0x),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 947554c48b35..cc637b3a68e9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1119,6 +1119,40 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 
 switch (from) {
 case NVME_ZS_ZSE:
+switch (to) {
+case NVME_ZS_ZSF:
+case NVME_ZS_ZSRO:
+case NVME_ZS_ZSO:
+break;
+
+case 

[PATCH v3 5/9] hw/block/nvme: add the zone management receive command

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Add the Zone Management Receive command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|   8 +++
 hw/block/nvme.h   |   1 +
 include/block/nvme.h  |  46 +
 hw/block/nvme-ns.c|  12 +++-
 hw/block/nvme.c   | 146 ++
 hw/block/trace-events |   1 +
 6 files changed, 213 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index dd311012213e..a273e7f7ed4b 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -27,11 +27,13 @@ typedef struct NvmeNamespaceParams {
 struct {
 uint64_t zcap;
 uint64_t zsze;
+uint8_t  zdes;
 } zns;
 } NvmeNamespaceParams;
 
 typedef struct NvmeZone {
 NvmeZoneDescriptor *zd;
+uint8_t*zde;
 
 uint64_t wp_staging;
 } NvmeZone;
@@ -55,6 +57,7 @@ typedef struct NvmeNamespace {
 
 NvmeZone   *zones;
 NvmeZoneDescriptor *zd;
+uint8_t*zde;
 } zns;
 } NvmeNamespace;
 
@@ -105,6 +108,11 @@ static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns)
 return nvme_ns_lbafe(ns)->zsze;
 }
 
+static inline size_t nvme_ns_zdes_bytes(NvmeNamespace *ns)
+{
+return ns->params.zns.zdes << 6;
+}
+
 /* calculate the number of LBAs that the namespace can accommodate */
 static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
 {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index f66ed9ab7eff..523eef0bcad8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -71,6 +71,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
 case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
+case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index cf566032a9fa..fdbb3a0d5490 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -480,6 +480,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_ZONE_MGMT_RECV = 0x7a,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -592,6 +593,44 @@ enum {
 NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
 };
 
+typedef struct QEMU_PACKED NvmeZoneManagementRecvCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint8_t rsvd8[16];
+NvmeCmdDptr dptr;
+uint64_tslba;
+uint32_tnumdw;
+uint8_t zra;
+uint8_t zrasp;
+uint8_t zrasf;
+uint8_t rsvd55[9];
+} NvmeZoneManagementRecvCmd;
+
+typedef enum NvmeZoneManagementRecvAction {
+NVME_CMD_ZONE_MGMT_RECV_REPORT_ZONES  = 0x0,
+NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES = 0x1,
+} NvmeZoneManagementRecvAction;
+
+typedef enum NvmeZoneManagementRecvActionSpecificField {
+NVME_CMD_ZONE_MGMT_RECV_LIST_ALL  = 0x0,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE  = 0x1,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO = 0x2,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO = 0x3,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC  = 0x4,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF  = 0x5,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO = 0x6,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO  = 0x7,
+} NvmeZoneManagementRecvActionSpecificField;
+
+#define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1
+
+typedef struct QEMU_PACKED NvmeZoneReportHeader {
+uint64_t num_zones;
+uint8_t  rsvd[56];
+} NvmeZoneReportHeader;
+
 typedef struct QEMU_PACKED NvmeDsmCmd {
 uint8_t opcode;
 uint8_t flags;
@@ -810,6 +849,12 @@ typedef struct QEMU_PACKED NvmeZoneDescriptor {
 uint8_t  rsvd32[32];
 } NvmeZoneDescriptor;
 
+#define NVME_ZA_ZDEV (1 << 7)
+
+#define NVME_ZA_SET(za, attrs)   ((za) |= (attrs))
+#define NVME_ZA_CLEAR(za, attrs) ((za) &= ~(attrs))
+#define NVME_ZA_CLEAR_ALL(za)((za) = 0x0)
+
 enum NvmeSmartWarn {
 NVME_SMART_SPARE  = 1 << 0,
 NVME_SMART_TEMPERATURE= 1 << 1,
@@ -1161,6 +1206,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 1a5e99e50578..0922b67dd6d8 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -59,6 +59,9 @@ static void nvme_ns_zns_init_zones(NvmeNamespace *ns)
 
 zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
+if (ns->params.zns.zdes) {
+zone->zde = &ns->zns.zde[i];
+}
 

[PATCH v3 6/9] hw/block/nvme: add the zone management send command

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Add the Zone Management Send command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |   2 +
 include/block/nvme.h  |  28 +++
 hw/block/nvme.c   | 384 +-
 hw/block/trace-events |  11 ++
 4 files changed, 420 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 523eef0bcad8..658f15bb5fd1 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -28,6 +28,7 @@ typedef struct NvmeRequest {
 struct NvmeNamespace*ns;
 BlockAIOCB  *aiocb;
 uint16_tstatus;
+void*opaque;
 NvmeCqe cqe;
 NvmeCmd cmd;
 BlockAcctCookie acct;
@@ -71,6 +72,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
 case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
+case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_ZONE_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index fdbb3a0d5490..4181d0068edb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -480,6 +480,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
 };
 
@@ -593,6 +594,32 @@ enum {
 NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
 };
 
+typedef struct QEMU_PACKED NvmeZoneManagementSendCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint32_trsvd8[4];
+NvmeCmdDptr dptr;
+uint64_tslba;
+uint32_trsvd48;
+uint8_t zsa;
+uint8_t zsflags;
+uint16_trsvd54;
+uint32_trsvd56[2];
+} NvmeZoneManagementSendCmd;
+
+#define NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(zsflags) ((zsflags) & 0x1)
+
+typedef enum NvmeZoneManagementSendAction {
+NVME_CMD_ZONE_MGMT_SEND_CLOSE   = 0x1,
+NVME_CMD_ZONE_MGMT_SEND_FINISH  = 0x2,
+NVME_CMD_ZONE_MGMT_SEND_OPEN= 0x3,
+NVME_CMD_ZONE_MGMT_SEND_RESET   = 0x4,
+NVME_CMD_ZONE_MGMT_SEND_OFFLINE = 0x5,
+NVME_CMD_ZONE_MGMT_SEND_SET_ZDE = 0x10,
+} NvmeZoneManagementSendAction;
+
 typedef struct QEMU_PACKED NvmeZoneManagementRecvCmd {
 uint8_t opcode;
 uint8_t flags;
@@ -1206,6 +1233,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementSendCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a92cd02d8955..e02c89b1496b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -163,6 +163,8 @@ static const NvmeEffectsLog nvme_effects[NVME_IOCS_MAX] = {
 .iocs = {
 NVME_EFFECTS_NVM_INITIALIZER,
 [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP,
+[NVME_CMD_ZONE_MGMT_SEND] = NVME_EFFECTS_CSUPP |
+NVME_EFFECTS_LBCC,
 },
 },
 };
@@ -1080,6 +1082,12 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, 
uint64_t slba,
 return NVME_SUCCESS;
 }
 
+static inline void nvme_zone_reset_wp(NvmeZone *zone)
+{
+zone->zd->wp = zone->zd->zslba;
+zone->wp_staging = nvme_zslba(zone);
+}
+
 static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 NvmeZoneState to)
 {
@@ -1100,6 +1108,10 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSEO:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
 NVME_ZA_CLEAR_ALL(zd->za);
 
@@ -1120,6 +1132,10 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSC:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
 NVME_ZA_CLEAR_ALL(zd->za);
 
@@ -1152,8 +1168,12 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
-NVME_ZA_CLEAR_ALL(zd->za);
+NVME_ZA_CLEAR_ALL(zone->zd->za);
 
 /* fallthrough */
 
@@ -1254,6 +1274,354 @@ 

Re: [PATCH v6 02/11] hw/block/nvme: Generate namespace UUIDs

2020-10-14 Thread Klaus Jensen
On Oct 14 06:42, Dmitry Fomichev wrote:
> In NVMe 1.4, a namespace must report an ID descriptor of UUID type
> if it doesn't support EUI64 or NGUID. Add a new namespace property,
> "uuid", that provides the user the option to either specify the UUID
> explicitly or have a UUID generated automatically every time a
> namespace is initialized.
> 
> Suggested-by: Klaus Jansen 
> Signed-off-by: Dmitry Fomichev 

Reviewed-by: Klaus Jensen 

> ---
>  hw/block/nvme-ns.c | 1 +
>  hw/block/nvme-ns.h | 1 +
>  hw/block/nvme.c| 9 +
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index b69cdaf27e..de735eb9f3 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error 
> **errp)
>  static Property nvme_ns_props[] = {
>  DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
>  DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> +DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index ea8c2f785d..a38071884a 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -21,6 +21,7 @@
>  
>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
> +QemuUUID uuid;
>  } NvmeNamespaceParams;
>  
>  typedef struct NvmeNamespace {
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ee0eff98da..89d91926d9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1571,6 +1571,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
> NvmeRequest *req)
>  
>  static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
>  {
> +NvmeNamespace *ns;
>  NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
>  uint32_t nsid = le32_to_cpu(c->nsid);
>  uint8_t list[NVME_IDENTIFY_DATA_SIZE];
> @@ -1590,7 +1591,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
> *n, NvmeRequest *req)
>  return NVME_INVALID_NSID | NVME_DNR;
>  }
>  
> -if (unlikely(!nvme_ns(n, nsid))) {
> +ns = nvme_ns(n, nsid);
> +if (unlikely(!ns)) {
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> @@ -1599,12 +1601,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
> *n, NvmeRequest *req)
>  /*
>   * Because the NGUID and EUI64 fields are 0 in the Identify Namespace 
> data
>   * structure, a Namespace UUID (nidt = 0x3) must be reported in the
> - * Namespace Identification Descriptor. Add a very basic Namespace UUID
> - * here.
> + * Namespace Identification Descriptor. Add the namespace UUID here.
>   */
>  ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
>  ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
> -stl_be_p(&ns_descrs->uuid.v, nsid);
> +memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN);
>  
>  return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
>  DMA_DIRECTION_FROM_DEVICE, req);
> -- 
> 2.21.0
> 
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.


signature.asc
Description: PGP signature


[PATCH v3 0/9] hw/block/nvme: zoned namespace command set

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Updated version of my proposal.

Based-on: <20201014084324.333774-1-...@irrelevant.dk>

Changes for v3
~~

  * Rebased on nvme-next with "[PATCH v2] hw/block/nvme: add dulbe
support" applied.

  * "hw/block/nvme: add support for dulbe and block utilization tracking"
- Dropped from this series. This series instead builds on the
  support for DULBE that I added in "[PATCH v2] hw/block/nvme: add
  dulbe support", previously posted.

  * "hw/block/nvme: add the zone management send command"
- Use asynchronous discards.

  * "hw/block/nvme: add basic read/write for zoned namespaces"
  * "hw/block/nvme: add the zone management receive command"
  * "hw/block/nvme: add the zone management send command"
  * "hw/block/nvme: add the zone append command"
  * "hw/block/nvme: track and enforce zone resources"
  * "hw/block/nvme: allow open to close zone transitions by controller"
- In compliance with the consensus I dropped zone persistence
  support from all patches.

Changes for v2
~~

  * "hw/block/nvme: add support for dulbe and block utilization tracking"
- Factor out pstate init/load into separate functions.

- Fixed a stupid off-by-1 bug that would trigger when resetting the
  last zone.

- I added a more formalized pstate file format that includes a
  header. This is pretty similar to what is done in Dmitry's series,
  but with fewer fields. The device parameters for nvme-ns are still
  the "authoritative" ones, so if any parameters that influence LBA
  size, number of zones, etc. do not match, an error indicating the
  discrepancy will be produced. IIRC, Dmitry's version does the
  same.

  It is set up such that newer versions can load pstate files from
  older versions. The file format header is not unlike a standard
  nvme datastructure with reserved areas. This means that when
  adding new command sets that require persistent state, it is not
  needed to bump the version number, unless the header has to change
  dramatically.  This is demonstrated when the zoned namespace
  command set support is added in "hw/block/nvme: add basic
  read/write for zoned namespaces".

  * "hw/block/nvme: add basic read/write for zoned namespaces"
- The device will now transition all opened zones to Closed on
  "normal shutdown". You can force the "transition to Full" behavior
  by killing QEMU from the monitor.

  * "hw/block/nvme: add the zone append command"
- Slightly reordered the logic so a LBA Out of Range error is
  returned before Invalid Field in Command for normal read/write
  commands.

  * "hw/block/nvme: support zone active excursions"
- Dropped. Optional and non-critical.

  * "hw/block/nvme: support reset/finish recommended limits"
- Dropped. Optional and non-critical.

Gollu Appalanaidu (1):
  hw/block/nvme: add commands supported and effects log page

Klaus Jensen (8):
  hw/block/nvme: add uuid namespace parameter
  hw/block/nvme: support namespace types
  hw/block/nvme: add basic read/write for zoned namespaces
  hw/block/nvme: add the zone management receive command
  hw/block/nvme: add the zone management send command
  hw/block/nvme: add the zone append command
  hw/block/nvme: track and enforce zone resources
  hw/block/nvme: allow open to close zone transitions by controller

 docs/specs/nvme.txt   |   17 +
 hw/block/nvme-ns.h|  112 +++-
 hw/block/nvme.h   |   23 +
 include/block/nvme.h  |  216 ++-
 block/nvme.c  |4 +-
 hw/block/nvme-ns.c|  172 +-
 hw/block/nvme.c   | 1284 +++--
 hw/block/trace-events |   27 +-
 8 files changed, 1791 insertions(+), 64 deletions(-)

-- 
2.28.0




[PATCH v3 7/9] hw/block/nvme: add the zone append command

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Add the Zone Append command.
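
The zns.zasl parameter added here acts like MDTS, but in units of the
controller page size: the maximum data size of a single append is
page_size << zasl (see nvme_check_zasl() below). For example, with a
4096-byte page size, zns.zasl=5 caps a Zone Append at 128 KiB, while
zns.zasl=0 simply falls back to the MDTS check.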

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |  6 ++
 include/block/nvme.h  |  7 +++
 hw/block/nvme.c   | 46 +++
 hw/block/trace-events |  1 +
 4 files changed, 60 insertions(+)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 658f15bb5fd1..de128e9bb174 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -16,6 +16,10 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
+
+struct {
+uint8_t zasl;
+} zns;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
@@ -42,6 +46,7 @@ static inline bool nvme_req_is_write(NvmeRequest *req)
 switch (req->cmd.opcode) {
 case NVME_CMD_WRITE:
 case NVME_CMD_WRITE_ZEROES:
+case NVME_CMD_ZONE_APPEND:
 return true;
 default:
 return false;
@@ -74,6 +79,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
 case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_ZONE_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
+case NVME_CMD_ZONE_APPEND:  return "NVME_ZONED_CMD_ZONE_APPEND";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 4181d0068edb..aecefefd43b6 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -482,6 +482,7 @@ enum NvmeIoCommands {
 NVME_CMD_DSM= 0x09,
 NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
+NVME_CMD_ZONE_APPEND= 0x7d,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -1016,6 +1017,11 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+typedef struct QEMU_PACKED NvmeIdCtrlZns {
+uint8_t zasl;
+uint8_t rsvd1[4095];
+} NvmeIdCtrlZns;
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
@@ -1240,6 +1246,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
+QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZns) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZns) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e02c89b1496b..947554c48b35 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -165,6 +165,8 @@ static const NvmeEffectsLog nvme_effects[NVME_IOCS_MAX] = {
 [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP,
 [NVME_CMD_ZONE_MGMT_SEND] = NVME_EFFECTS_CSUPP |
 NVME_EFFECTS_LBCC,
+[NVME_CMD_ZONE_APPEND]= NVME_EFFECTS_CSUPP |
+NVME_EFFECTS_LBCC,
 },
 },
 };
@@ -1040,6 +1042,21 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, 
size_t len)
 return NVME_SUCCESS;
 }
 
+static inline uint16_t nvme_check_zasl(NvmeCtrl *n, size_t len)
+{
+uint8_t zasl = n->params.zns.zasl;
+
+if (!zasl) {
+return nvme_check_mdts(n, len);
+}
+
+if (len > n->page_size << zasl) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+return NVME_SUCCESS;
+}
+
 static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
  uint64_t slba, uint32_t nlb)
 {
@@ -1848,6 +1865,24 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 zone = nvme_ns_zone(ns, slba);
 assert(zone);
 
+if (req->cmd.opcode == NVME_CMD_ZONE_APPEND) {
+uint64_t wp = zone->wp_staging;
+
+if (slba != nvme_zslba(zone)) {
+trace_pci_nvme_err_invalid_zslba(nvme_cid(req), slba);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+status = nvme_check_zasl(n, data_size);
+if (status) {
+trace_pci_nvme_err_zasl(nvme_cid(req), data_size);
+goto invalid;
+}
+
+slba = wp;
+rw->slba = req->cqe.qw0 = cpu_to_le64(wp);
+}
+
 status = nvme_check_zone(n, slba, nlb, req, zone);
 if (status) {
 goto invalid;
@@ -1942,6 +1977,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 return nvme_write_zeroes(n, req);
 case NVME_CMD_WRITE:
 case NVME_CMD_READ:
+case NVME_CMD_ZONE_APPEND:
 return nvme_rw(n, req);
 case NVME_CMD_ZONE_MGMT_SEND:
 return nvme_zone_mgmt_send(n, req);
@@ -3635,6 +3671,11 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 return;
 }
 
+if (params->zns.zasl && params->zns.zasl > params->mdts) {

[PATCH v3 4/9] hw/block/nvme: add basic read/write for zoned namespaces

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

This adds basic read and write for zoned namespaces.

A zoned namespace is created by setting the iocs namespace parameter to
0x2 and specifying the zns.zcap parameter (zone capacity) in number of
logical blocks per zone. If a zone size (zns.zsze) is not specified, the
namespace device will set the zone size to be the next power of two and
fit in as many zones as possible on the underlying namespace blockdev.
This behavior is not required by the specification, but ensures that the
device can be initialized by the Linux kernel nvme driver, which
requires a power of two zone size.
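
As a made-up example, a minimal zoned namespace backed by a raw file
could be created with something like:

  -drive id=nvm-1,file=zns.img,format=raw,if=none \
  -device nvme,serial=deadbeef \
  -device nvme-ns,drive=nvm-1,iocs=2,zns.zcap=131072

which gives a zone capacity of 131072 logical blocks and, since that is
already a power of two, a zone size of 131072 blocks as well.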

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |   8 ++
 hw/block/nvme-ns.h|  80 +++
 hw/block/nvme.h   |  11 ++
 include/block/nvme.h  |  60 +++-
 hw/block/nvme-ns.c| 116 
 hw/block/nvme.c   | 308 +-
 hw/block/trace-events |   7 +
 7 files changed, 585 insertions(+), 5 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 619bd9ce4378..80cb34406255 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -6,6 +6,14 @@ The nvme device (-device nvme) emulates an NVM Express 
Controller.
   `iocs`; The "I/O Command Set" associated with the namespace. E.g. 0x0 for the
  NVM Command Set (the default), or 0x2 for the Zoned Namespace Command Set.
 
+  `zns.zcap`; If `iocs` is 0x2, this specifies the zone capacity. It is
+ specified in units of logical blocks.
+
+  `zns.zsze`; If `iocs` is 0x2, this specifies the zone size. It is specified
+ in units of the logical blocks. If not specified, the value depends on
+ zns.zcap; if the zone capacity is a power of two, the zone size will be
+ set to that, otherwise it will default to the next power of two.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 5eb135a0b73f..dd311012213e 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -23,8 +23,19 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 uint8_t  iocs;
 QemuUUID uuid;
+
+struct {
+uint64_t zcap;
+uint64_t zsze;
+} zns;
 } NvmeNamespaceParams;
 
+typedef struct NvmeZone {
+NvmeZoneDescriptor *zd;
+
+uint64_t wp_staging;
+} NvmeZone;
+
 typedef struct NvmeNamespace {
 DeviceState  parent_obj;
 BlockConfblkconf;
@@ -38,8 +49,20 @@ typedef struct NvmeNamespace {
 struct {
 uint32_t err_rec;
 } features;
+
+struct {
+int num_zones;
+
+NvmeZone   *zones;
+NvmeZoneDescriptor *zd;
+} zns;
 } NvmeNamespace;
 
+static inline bool nvme_ns_zoned(NvmeNamespace *ns)
+{
+return ns->iocs == NVME_IOCS_ZONED;
+}
+
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 {
 if (ns) {
@@ -54,17 +77,34 @@ static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
 return ns->id_ns[NVME_IOCS_NVM];
 }
 
+static inline NvmeIdNsZns *nvme_ns_id_zoned(NvmeNamespace *ns)
+{
+return ns->id_ns[NVME_IOCS_ZONED];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
 return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
+static inline NvmeLBAFE *nvme_ns_lbafe(NvmeNamespace *ns)
+{
+NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+return &id_ns_zns->lbafe[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
 static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 {
 return nvme_ns_lbaf(ns)->ds;
 }
 
+static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns)
+{
+return nvme_ns_lbafe(ns)->zsze;
+}
+
 /* calculate the number of LBAs that the namespace can accommodate */
 static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
 {
@@ -77,10 +117,50 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t 
lba)
 return lba << nvme_ns_lbads(ns);
 }
 
+static inline NvmeZone *nvme_ns_zone(NvmeNamespace *ns, uint64_t lba)
+{
+int idx = lba / nvme_ns_zsze(ns);
+if (unlikely(idx >= ns->zns.num_zones)) {
+return NULL;
+}
+
+return &ns->zns.zones[idx];
+}
+
+static inline NvmeZoneState nvme_zs(NvmeZone *zone)
+{
+return (zone->zd->zs >> 4) & 0xf;
+}
+
+static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs)
+{
+zone->zd->zs = zs << 4;
+}
+
+static inline uint64_t nvme_zslba(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->zslba);
+}
+
+static inline uint64_t nvme_zcap(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->zcap);
+}
+
+static inline uint64_t nvme_wp(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->wp);
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
+const char *nvme_zs_str(NvmeZone *zone);
+const char *nvme_zs_to_str(NvmeZoneState zs);
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
 
+void nvme_ns_zns_init_zone_state(NvmeNamespace *ns);
+
 #endif /* 

[PATCH v3 3/9] hw/block/nvme: support namespace types

2020-10-14 Thread Klaus Jensen
From: Klaus Jensen 

Implement support for TP 4056 ("Namespace Types"). This adds the 'iocs'
(I/O Command Set) device parameter to the nvme-ns device.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |   3 +
 hw/block/nvme-ns.h|  11 ++-
 hw/block/nvme.h   |   3 +
 include/block/nvme.h  |  52 +++---
 block/nvme.c  |   4 +-
 hw/block/nvme-ns.c|  21 +++-
 hw/block/nvme.c   | 225 +++---
 hw/block/trace-events |   6 +-
 8 files changed, 267 insertions(+), 58 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 56d393884e7a..619bd9ce4378 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -3,6 +3,9 @@ NVM Express Controller
 
 The nvme device (-device nvme) emulates an NVM Express Controller.
 
+  `iocs`; The "I/O Command Set" associated with the namespace. E.g. 0x0 for the
+ NVM Command Set (the default), or 0x2 for the Zoned Namespace Command Set.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 8951fc5e86b8..5eb135a0b73f 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+uint8_t  iocs;
 QemuUUID uuid;
 } NvmeNamespaceParams;
 
@@ -29,7 +30,8 @@ typedef struct NvmeNamespace {
 BlockConfblkconf;
 int32_t  bootindex;
 int64_t  size;
-NvmeIdNs id_ns;
+uint8_t  iocs;
+void *id_ns[NVME_IOCS_MAX];
 
 NvmeNamespaceParams params;
 
@@ -47,9 +49,14 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
+{
+return ns->id_ns[NVME_IOCS_NVM];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
-NvmeIdNs *id_ns = &ns->id_ns;
+NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
 return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a50..5c1de0ef16e7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -112,6 +112,7 @@ typedef struct NvmeFeatureVal {
 };
 uint32_tasync_config;
 uint32_tvwc;
+uint32_tiocsci;
 } NvmeFeatureVal;
 
 typedef struct NvmeCtrl {
@@ -139,6 +140,7 @@ typedef struct NvmeCtrl {
 uint64_ttimestamp_set_qemu_clock_ms;/* QEMU clock time */
 uint64_tstarttime_ms;
 uint16_ttemperature;
+uint64_tiocscs[512];
 
 HostMemoryBackend *pmrdev;
 
@@ -154,6 +156,7 @@ typedef struct NvmeCtrl {
 NvmeSQueue  admin_sq;
 NvmeCQueue  admin_cq;
 NvmeIdCtrl  id_ctrl;
+void*id_ctrl_iocss[NVME_IOCS_MAX];
 NvmeFeatureVal  features;
 } NvmeCtrl;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f71376072762..443f5c7e8376 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -84,6 +84,7 @@ enum NvmeCapMask {
 
 enum NvmeCapCss {
 NVME_CAP_CSS_NVM= 1 << 0,
+NVME_CAP_CSS_CSI= 1 << 6,
 NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
 };
 
@@ -117,6 +118,7 @@ enum NvmeCcMask {
 
 enum NvmeCcCss {
 NVME_CC_CSS_NVM= 0x0,
+NVME_CC_CSS_ALL= 0x6,
 NVME_CC_CSS_ADMIN_ONLY = 0x7,
 };
 
@@ -388,6 +390,11 @@ enum NvmePmrmscMask {
 #define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
 (pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
 
+enum NvmeCommandSet {
+NVME_IOCS_NVM = 0x0,
+NVME_IOCS_MAX = 0x1,
+};
+
 enum NvmeSglDescriptorType {
 NVME_SGL_DESCR_TYPE_DATA_BLOCK  = 0x0,
 NVME_SGL_DESCR_TYPE_BIT_BUCKET  = 0x1,
@@ -534,8 +541,13 @@ typedef struct QEMU_PACKED NvmeIdentify {
 uint64_trsvd2[2];
 uint64_tprp1;
 uint64_tprp2;
-uint32_tcns;
-uint32_trsvd11[5];
+uint8_t cns;
+uint8_t rsvd3;
+uint16_tcntid;
+uint16_tnvmsetid;
+uint8_t rsvd4;
+uint8_t csi;
+uint32_trsvd11[4];
 } NvmeIdentify;
 
 typedef struct QEMU_PACKED NvmeRwCmd {
@@ -627,8 +639,15 @@ typedef struct QEMU_PACKED NvmeAerResult {
 } NvmeAerResult;
 
 typedef struct QEMU_PACKED NvmeCqe {
-uint32_tresult;
-uint32_trsvd;
+union {
+struct {
+uint32_tdw0;
+uint32_tdw1;
+};
+
+uint64_t qw0;
+};
+
 uint16_tsq_head;
 uint16_tsq_id;
 uint16_tcid;
@@ -676,6 +695,10 @@ enum NvmeStatusCodes {
 NVME_FEAT_NOT_CHANGEABLE= 0x010e,
 NVME_FEAT_NOT_NS_SPEC   = 0x010f,
 NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+NVME_IOCS_NOT_SUPPORTED = 0x0129,
+NVME_IOCS_NOT_ENABLED   = 0x012a,
+NVME_IOCS_COMB_REJECTED = 0x012b,
+NVME_INVALID_IOCS   = 0x012c,
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
@@ -785,10 +808,14 @@ typedef struct QEMU_PACKED NvmePSD {
 #define 
