date:20201007

Re: [PATCH v2 4/5] nbd: Add new qemu:allocation-depth metacontext

2020-10-07 Thread Eric Blake

On 10/7/20 8:40 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.09.2020 15:11, Eric Blake wrote:
>> 'qemu-img map' provides a way to determine which extents of an image
>> come from the top layer vs. inherited from a backing chain.  This is
>> useful information worth exposing over NBD.  There is a proposal to
>> add a QMP command block-dirty-bitmap-populate which can create a dirty
>> bitmap that reflects allocation information, at which point
>> qemu:dirty-bitmap:NAME can expose that information via the creation of
>> a temporary bitmap, but we can shorten the effort by adding a new
>> qemu:allocation-depth context that does the same thing without an
>> intermediate bitmap (this patch does not eliminate the need for that
>> proposal, as it will have other uses as well).
>>
>> For this patch, I just encoded a tri-state value (unallocated, from
>> this layer, from any of the backing layers); an obvious extension
>> would be to provide the actual depth in bits 31-4 while keeping bits
>> 1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
>> from a hex number).  But this extension would require
>> bdrv_is_allocated_above to return a depth number.
>>
>> Note that this patch does not actually enable any way to request a
>> server to enable this context; that will come in the next patch.
>>

>> +In the allocation depth context, bits 0 and 1 form a tri-state value:
>> +
>> +    bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is
>> unallocated
>> +    bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this
>> image
> 
> Maybe, s/this/the top level/, as, what is "this" for NBD client?

Sure.


>> @@ -864,12 +867,21 @@ static bool nbd_meta_qemu_query(NBDClient
>> *client, NBDExportMetaContexts *meta,
>>
>>   if (!*query) {

We get here for "qemu:".

>>   if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
>> +    meta->allocation_depth = meta->exp->alloc_context;
>>   meta->bitmap = !!meta->exp->export_bitmap;
>>   }
>>   trace_nbd_negotiate_meta_query_parse("empty");
>>   return true;
>>   }
>>
>> +    if (nbd_strshift(&query, "allocation-depth")) {

We get here for "qemu:allocation-depth", and as you pointed out,
"qemu:allocation-depth-extended".

>> +    trace_nbd_negotiate_meta_query_parse("allocation-depth");
>> +    if (nbd_meta_empty_or_pattern(client, "", query)) {
> 
> How much it differs from "if (!*query) {", I don't see...

The nbd_meta_empty_or_pattern returns false for
"qemu:allocation-depth-extended"; it only returns true for
"qemu:allocation-depth".  But, as you pointed out,

> 
>> +    meta->allocation_depth = meta->exp->alloc_context;
>> +    }
>> +    return true;
>> +    }
> 
> why not just
> 
>  if (!strcmp(query, "allocation-depth")) {
>     meta->allocation_depth = meta->exp->alloc_context;
>     return true;
>  }
> 
> ?

that does seem shorter.

> 
> With your code you also parse something like
> "allocation-depth-extended".. Is it on purpose?

The string is parsed, but does not affect meta->XXX, which means nothing
gets advertised to the client.  The trace messages might differ, but
behavior is correct.

> 
>> +
>>   if (nbd_strshift(&query, "dirty-bitmap:")) {
>>   trace_nbd_negotiate_meta_query_parse("dirty-bitmap:");
>>   if (!meta->exp->export_bitmap) {
> 
> 
> Also, "trace_nbd_negotiate_meta_query_skip("not dirty-bitmap"); " at
> function end should
> now be something like trace_nbd_negotiate_meta_query_skip("unknown
> context in qemu: namespace");

Good idea.


>> +/* Get allocation depth from the exported device and send it to the
>> client */
>> +static int nbd_co_send_block_alloc(NBDClient *client, uint64_t handle,
>> +   BlockDriverState *bs, uint64_t
>> offset,
>> +   uint32_t length, bool dont_fragment,
>> +   bool last, uint32_t context_id,
>> +   Error **errp)
>> +{
>> +    int ret;
>> +    unsigned int nb_extents = dont_fragment ? 1 :
>> NBD_MAX_BLOCK_STATUS_EXTENTS;
>> +    g_autoptr(NBDExtentArray) ea = nbd_extent_array_new(nb_extents);
>> +
>> +    ret = blockalloc_to_extents(bs, offset, length, ea);
>> +    if (ret < 0) {
>> +    return nbd_co_send_structured_error(
>> +    client, handle, -ret, "can't get block status", errp);
>> +    }
>> +
>> +    return nbd_co_send_extents(client, handle, ea, last, context_id,
>> errp);
>> +}
> 
> 
> The function is a copy of nbd_co_send_block_status()..
> 
> Probably, we can reuse nbd_co_send_block_status(), and just call
> blockstatus_to_extents()
> or blockalloc_to_extents() depending on context_id parameter.

Nice idea to reduce the duplication.

> 
> Still, I'm OK with it as is.
> 

So is that Reviewed-by:, or do I need to post v3 with my tweaks in
response to your comments?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3

Re: [PATCH v2 3/5] nbd: Simplify meta-context parsing

2020-10-07 Thread Eric Blake

On 10/7/20 6:51 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.09.2020 15:11, Eric Blake wrote:
>> We had a premature optimization of trying to read as little from the
>> wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases.
>> But in reality, we HAVE to read the entire string from the client
>> before we can get to the next command, and it is easier to just read
>> it all at once than it is to read it in pieces.  And once we do that,
>> several functions end up no longer performing I/O, so they can drop
>> length and errp parameters, and just return a bool instead of
>> modifying through a pointer.
>>
>> Our iotests still pass; I also checked that libnbd's testsuite (which
>> covers more corner cases of odd meta context requests) still passes.
>>
> 
> Also, do not advertise bitmaps meta context when bitmap export is not set.

That was already there, although seeing the logic change is tricky and
the trace messages change:

> +static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts 
> *meta,
> +const char *query)
>  {
> -bool dirty_bitmap = false;
> -size_t dirty_bitmap_len = strlen("dirty-bitmap:");
> -int ret;
> -
> -if (!meta->exp->export_bitmap) {
> -trace_nbd_negotiate_meta_query_skip("no dirty-bitmap exported");
> -return nbd_opt_skip(client, len, errp);

Old code returned early if there was no bitmap export set

> +if (!nbd_strshift(&query, "qemu:")) {
> +return false;
>  }
> +trace_nbd_negotiate_meta_query_parse("qemu:");
> 
> -if (len == 0) {
> +if (!*query) {
>  if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
> +meta->bitmap = !!meta->exp->export_bitmap;

while the new code has to handle it specifically.  I'll tweak the commit
message to mention the change in trace messages, even when the end
behavior is the same.


> 
>> Signed-off-by: Eric Blake 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 1/5] qemu-nbd: Honor SIGINT and SIGHUP

2020-10-07 Thread Eric Blake

On 10/7/20 5:32 AM, Vladimir Sementsov-Ogievskiy wrote:
> 30.09.2020 15:11, Eric Blake wrote:
>> Honoring just SIGTERM on Linux is too weak; we also want to handle
>> other common signals, and do so even on BSD.  Why?  Because at least
>> 'qemu-nbd -B bitmap' needs a chance to clean up the in-use bit on
>> bitmaps when the server is shut down via a signal.
> 
> Probably not bad to update a comment [*] if you have a good wording in
> mind.
> 
>>
>> See also: http://bugzilla.redhat.com/1883608
>>
>> Signed-off-by: Eric Blake 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
>> ---
>>   qemu-nbd.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index bacb69b0898b..e7520261134f 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -581,7 +581,7 @@ int main(int argc, char **argv)
>>   const char *pid_file_name = NULL;
>>   BlockExportOptions *export_opts;
>>
>> -#if HAVE_NBD_DEVICE
>> +#ifdef CONFIG_POSIX
>>   /* The client thread uses SIGTERM to interrupt the server.  A
>> signal
>>    * handler ensures that "qemu-nbd -v -c" exits with a nice
>> status code.
> 
> [*]
> 

Sure, I went with:

diff --git i/qemu-nbd.c w/qemu-nbd.c
index e7520261134f..c731dda04ec0 100644
--- i/qemu-nbd.c
+++ w/qemu-nbd.c
@@ -582,8 +582,9 @@ int main(int argc, char **argv)
 BlockExportOptions *export_opts;

 #ifdef CONFIG_POSIX
-/* The client thread uses SIGTERM to interrupt the server.  A signal
- * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
+/*
+ * Exit gracefully on various signals, which includes SIGTERM used
+ * by 'qemu-nbd -v -c'.
  */
 struct sigaction sa_sigterm;
 memset(&sa_sigterm, 0, sizeof(sa_sigterm));

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v10 8/9] block: remove unused backing-file name parameter

2020-10-07 Thread Andrey Shinkevich




On 07.10.2020 13:21, Vladimir Sementsov-Ogievskiy wrote:

29.09.2020 15:38, Andrey Shinkevich wrote:

The block stream QMP parameter backing-file is in use no more. It
designates a backing file name to set in QCOW2 image header after the
block stream job finished. The base file name is used instead.

Signed-off-by: Andrey Shinkevich 


We can't just remove it without a deprecation period of three releases.


It has not been in use for a long. It's time.



So actually, in a previous patch, we should implement new behavior for
automatic backing-file detection if this parameter is unspecified. Amd
keep old behavior for backing-file-name if it is given.

Hmm. Or, probably, we can use direct base for base-filename? And in cases
when we should skip filters (for example of parallel jobs) user should
specify backing-file explicitly?


The backing_file_str is always specified if the base is specified and is 
always equal to the base->filename. So, the user's backing file name is 
always NULL for the stream job. Furthermore, it is not checked for being 
the correct backing node and can lead to a wrong record in the QCOW2 header.


Andrey




---
  block/monitor/block-hmp-cmds.c |  2 +-
  block/stream.c |  6 +-
  blockdev.c | 17 +
  include/block/block_int.h  |  2 +-
  qapi/block-core.json   | 17 +
  5 files changed, 5 insertions(+), 39 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c 
b/block/monitor/block-hmp-cmds.c

index 4e66775..5f19499 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -506,7 +506,7 @@ void hmp_block_stream(Monitor *mon, const QDict 
*qdict)

  int64_t speed = qdict_get_try_int(qdict, "speed", 0);
  qmp_block_stream(true, device, device, base != NULL, base, 
false, NULL,
- false, NULL, qdict_haskey(qdict, "speed"), 
speed, true,

+ qdict_haskey(qdict, "speed"), speed, true,
   BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, 
false, false,

   false, &error);
diff --git a/block/stream.c b/block/stream.c
index b0719e9..fe2663f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -34,7 +34,6 @@ typedef struct StreamBlockJob {
  BlockDriverState *base_overlay; /* COW overlay (stream from 
this) */

  BlockDriverState *above_base;   /* Node directly above the base */
  BlockdevOnError on_error;
-    char *backing_file_str;
  bool bs_read_only;
  bool chain_frozen;
  } StreamBlockJob;
@@ -103,8 +102,6 @@ static void stream_clean(Job *job)
  blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
  bdrv_reopen_set_read_only(bs, true, NULL);
  }
-
-    g_free(s->backing_file_str);
  }
  static int coroutine_fn stream_run(Job *job, Error **errp)
@@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
  };
  void stream_start(const char *job_id, BlockDriverState *bs,
-  BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *base,
    int creation_flags, int64_t speed,
    BlockdevOnError on_error,
    const char *filter_node_name,
@@ -295,7 +292,6 @@ void stream_start(const char *job_id, 
BlockDriverState *bs,

  s->base_overlay = base_overlay;
  s->above_base = above_base;
-    s->backing_file_str = g_strdup(backing_file_str);
  s->bs_read_only = bs_read_only;
  s->chain_frozen = true;
diff --git a/blockdev.c b/blockdev.c
index d719c47..b223601 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2486,7 +2486,6 @@ out:
  void qmp_block_stream(bool has_job_id, const char *job_id, const 
char *device,

    bool has_base, const char *base,
    bool has_base_node, const char *base_node,
-  bool has_backing_file, const char *backing_file,
    bool has_speed, int64_t speed,
    bool has_on_error, BlockdevOnError on_error,
    bool has_filter_node_name, const char 
*filter_node_name,
@@ -2498,7 +2497,6 @@ void qmp_block_stream(bool has_job_id, const 
char *job_id, const char *device,

  BlockDriverState *base_bs = NULL;
  AioContext *aio_context;
  Error *local_err = NULL;
-    const char *base_name = NULL;
  int job_flags = JOB_DEFAULT;
  if (!has_on_error) {
@@ -2526,7 +2524,6 @@ void qmp_block_stream(bool has_job_id, const 
char *job_id, const char *device,

  goto out;
  }
  assert(bdrv_get_aio_context(base_bs) == aio_context);
-    base_name = base;
  }
  if (has_base_node) {
@@ -2541,7 +2538,6 @@ void qmp_block_stream(bool has_job_id, const 
char *job_id, const char *device,

  }
  assert(bdrv_get_aio_context(base_bs) == aio_context);
  bdrv_refresh_filename(base_bs);
-    base_name = base_bs->filename;
  }
  /* Check for

Re: [PATCH v10 6/9] copy-on-read: skip non-guest reads if no copy needed

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


07.10.2020 22:01, Andrey Shinkevich wrote:


On 07.10.2020 13:06, Vladimir Sementsov-Ogievskiy wrote:

29.09.2020 15:38, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  block/io.c   |  2 +-
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index f53f7e0..5389dca 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -145,10 +145,16 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
  }
  }
-    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
-  local_flags);
-    if (ret < 0) {
-    return ret;
+    if ((flags & BDRV_REQ_PREFETCH) &


BDRV_REQ_PREFETCH is documented to be only used with BDRV_REQ_COPY_ON_READ. But 
here
BDRV_REQ_COPY_ON_READ appears intermediately. We should change documentation in 
block.h
in a separate patch (and probably code in bdrv_aligned_preadv())



OK, we will come here without the BDRV_REQ_PREFETCH flag set.


flag BDRV_REQ_PREFETCH should be set in stream job. Where should it be handled, 
I don't follow?


To differ between guest reads and the stream job ones, we would set it here by 
checking for the qiov NULL pointer:


diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 4e3b1c5..df2c2ab 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -144,6 +144,9 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState 
*bs,
    n, &n);
  if (ret) {
  local_flags |= BDRV_REQ_COPY_ON_READ;
+    if (!qiov) {
+    local_flags |= BDRV_REQ_PREFETCH;


if qiov is NULL, this means that flags must include BDRV_REQ_PREFETCH. 
local_flags should inherit flags I think.


+    }
  }
  }

Andrey


+    !(local_flags & BDRV_REQ_COPY_ON_READ)) {
+    /* Skip non-guest reads if no copy needed */
+    } else {
+


extra new-line ?


+    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+    if (ret < 0) {
+    return ret;
+    }
  }
  offset += n;
diff --git a/block/io.c b/block/io.c
index 11df188..62b75a5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1388,7 +1388,7 @@ static int coroutine_fn 
bdrv_co_do_copy_on_readv(BdrvChild *child,
  qemu_iovec_init_buf(&local_qiov, bounce_buffer, pnum);
  ret = bdrv_driver_preadv(bs, cluster_offset, pnum,
- &local_qiov, 0, 0);
+ &local_qiov, 0, flags & 
BDRV_REQ_PREFETCH);


Why? In this place we want to read. We'll write back the data a few lines 
below. What will we write,
if underlying driver decide to do nothing because of BDRV_REQ_PREFETCH?



See my comment above please.


Anyway, BDRV_REQ_PREFETCH here is wrong. You should not pass any qiov, if you 
set BDRV_REQ_PREFETCH flag.

If we come to bdrv_co_do_copy_on_readv, it means that we have COPY_ON_READ 
flag. And therefore, we will handle
PREFETCH and COPY_ON_READ here in generic layer. And therefore, we shouldn't 
pass them to driver.

On the contrary, if we have PREFETCH flag in bdrv_co_aligned_preadv, but don't 
have COPY_ON_READ in the same time,
this means that we must pass PREFETCH flag to the driver if it supports it. And 
do nothing if driver
doesn't support PREFETCH. That's how I see it.




  if (ret < 0) {
  goto err;
  }







--
Best regards,
Vladimir

Re: [PATCH 18/20] python/qemu/qmp.py: re-raise OSError when encountered

2020-10-07 Thread John Snow


On 10/7/20 7:30 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

Nested if conditions don't change when the exception block fires; we
need to explicitly re-raise the error if we didn't intend to capture and
suppress it.

Signed-off-by: John Snow 
---
  python/qemu/qmp.py | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/qemu/qmp.py b/python/qemu/qmp.py
index d911999da1f..bdbd1e9bdbb 100644
--- a/python/qemu/qmp.py
+++ b/python/qemu/qmp.py
@@ -169,9 +169,9 @@ def __get_events(self, wait: Union[bool, float] = False) -> 
None:
  try:
  self.__json_read()
  except OSError as err:
-if err.errno == errno.EAGAIN:
-# No data available
-pass
+# EAGAIN: No data available; not critical
+if err.errno != errno.EAGAIN:
+raise


Hm, if we re-raise the exception here, the socket remains non-blocking.
I think we intended to have it like that only temporarily.



Whoops. Yep, no good to go from one kind of broken to a different kind 
of broken.



The same kind of exception would raise QMPConnectError below instead of
directly OSError. Should we try to make this consistent?



Yeah, honestly I'm a bit confused about the error plumbing myself. We 
like to return "None" a lot, and I have been trying to remove that 
whenever possible, because the nature of what None can mean semantically 
is ambiguous.


I need to sit down with this code and learn all of the different ways it 
can actually and genuinely fail, and what each failure actually 
semantically means.


I suspect it would probably be best to always catch socket errors and 
wrap them in QMPConnectError just to be consistent about that.


I also need to revise the docstrings to be clear about what errors get 
raised where, when, and why. I almost included that for this series, but 
decided against it because I need to also adjust the docstring 
formatting and so on -- and pending discussion in the qapi series -- 
felt it would be best to tackle that just a little bit later.


Here's a docstring convention question:

I think that any method that directly raises an exception should always 
mention that with :raise X:. How far up the call chain, however, should 
anticipated exceptions be documented with :raise:?


My gut feeling is that it should stop at the first public call boundary, 
so accept() should repeat any :raise: comments that appear in private 
helpers.



  self.__sock.setblocking(True)
  
  # Wait for new events, if needed.


Kevin



Thanks for the review! Things seem like they're looking good.

--js

Re: [PATCH 20/20] python: add mypy config

2020-10-07 Thread John Snow


On 10/7/20 7:35 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

Formalize the options used for checking the python library. You can run
mypy from the directory that mypy.ini is in by typing `mypy qemu/`.

Signed-off-by: John Snow 
---
  python/mypy.ini | 4 
  1 file changed, 4 insertions(+)
  create mode 100644 python/mypy.ini

diff --git a/python/mypy.ini b/python/mypy.ini
new file mode 100644
index 000..7a70eca47c6
--- /dev/null
+++ b/python/mypy.ini
@@ -0,0 +1,4 @@
+[mypy]
+strict = True


$ mypy --strict qemu
mypy.ini: [mypy]: Strict mode is not supported in configuration files: specify 
individual flags instead (see 'mypy -h' for the list of flags enabled in strict 
mode)
Success: no issues found in 6 source files
$ mypy --version
mypy 0.740

Did this change in newer mypy versions? I guess it's time that I get the
new laptop which will involve installing a newer Fedora release. :-)


+python_version = 3.6
+warn_unused_configs = True
\ No newline at end of file


Kevin



0.770 lets you use strict in the config file. Fairly modern. I intend to 
use this version in the CI venv that I am cooking up to check these, so 
no need to hurry and update your fedora.


'pip3 install --user mypy>=0.770' should work out just fine until then.

Maybe I should drop back down to >=0.730, but I liked being able to 
force the stricter options in the conf file directly. I also liked the 
idea that if new strict options got added in the future, we'd acquire 
them automatically.


I felt like anything we disabled should be a conscious and explicit 
choice, instead of the opposite.


--js

Re: [PATCH 17/20] python/qemu/qmp.py: Preserve error context on re-raise

2020-10-07 Thread John Snow


On 10/7/20 7:21 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

Use the "from ..." phrasing when re-raising errors to preserve their
initial context, to help aid debugging when things go wrong.

This also silences a pylint 2.6.0+ error.

Signed-off-by: John Snow 


I don't really understand what this improves compared to the implicit
chaining we got before, but if pylint says so...

Reviewed-by: Kevin Wolf 



Yeah, it's a pretty minimal change. Depending on the context, I think it 
makes a bigger difference depending on how far away you are from the 
error you are re-raising, but I couldn't find a great real-world example 
for you right now.


In summary, it changes this line:

"During handling of the above exception, another exception occurred:"

to this one:

"The above exception was the direct cause of the following exception:"

Which disambiguates between wrapping an exception with a more 
semantically meaningful exception class, vs. your handler code itself 
faulted.


Minor change, I know. You are also allowed to use "from None" to 
suppress the chain. I use this in the QAPI series at one point because I 
felt the underlying error was not useful to see in the traceback.


I see the pylint change as forcing you not to rely on the implicit 
chaining. Eh, fine.


--js

Re: [PATCH v10 6/9] copy-on-read: skip non-guest reads if no copy needed

2020-10-07 Thread Andrey Shinkevich




On 07.10.2020 13:06, Vladimir Sementsov-Ogievskiy wrote:

29.09.2020 15:38, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  block/io.c   |  2 +-
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index f53f7e0..5389dca 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -145,10 +145,16 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,

  }
  }
-    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, 
qiov_offset,

-  local_flags);
-    if (ret < 0) {
-    return ret;
+    if ((flags & BDRV_REQ_PREFETCH) &


BDRV_REQ_PREFETCH is documented to be only used with 
BDRV_REQ_COPY_ON_READ. But here
BDRV_REQ_COPY_ON_READ appears intermediately. We should change 
documentation in block.h

in a separate patch (and probably code in bdrv_aligned_preadv())



OK, we will come here without the BDRV_REQ_PREFETCH flag set.
To differ between guest reads and the stream job ones, we would set it 
here by checking for the qiov NULL pointer:



diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 4e3b1c5..df2c2ab 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -144,6 +144,9 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,

   n, &n);
 if (ret) {
 local_flags |= BDRV_REQ_COPY_ON_READ;
+if (!qiov) {
+local_flags |= BDRV_REQ_PREFETCH;
+}
 }
 }

Andrey


+    !(local_flags & BDRV_REQ_COPY_ON_READ)) {
+    /* Skip non-guest reads if no copy needed */
+    } else {
+


extra new-line ?

+    ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, 
qiov_offset,

+  local_flags);
+    if (ret < 0) {
+    return ret;
+    }
  }
  offset += n;
diff --git a/block/io.c b/block/io.c
index 11df188..62b75a5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1388,7 +1388,7 @@ static int coroutine_fn jk(BdrvChild *child,
  qemu_iovec_init_buf(&local_qiov, bounce_buffer, pnum);
  ret = bdrv_driver_preadv(bs, cluster_offset, pnum,
- &local_qiov, 0, 0);
+ &local_qiov, 0, flags & 
BDRV_REQ_PREFETCH);


Why? In this place we want to read. We'll write back the data a few 
lines below. What will we write,

if underlying driver decide to do nothing because of BDRV_REQ_PREFETCH?



See my comment above please.


  if (ret < 0) {
  goto err;
  }

Re: [PATCH 12/20] python/qemu/console_socket.py: Correct type of recv()

2020-10-07 Thread John Snow


On 10/7/20 6:59 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

The type and parameter names of recv() should match socket.socket().


Should this be socket.socket without parentheses (the class name)?
socket.socket() is the constructor and it takes very different
parameters.



You're right.


OK, easy enough, but in the cases we don't pass straight through to the
real socket implementation, we probably can't accept such flags. OK, for
now, assert that we don't receive flags in such cases.

Signed-off-by: John Snow 


Reviewed-by: Kevin Wolf 



Thanks!

Re: [PATCH 11/20] python/qemu: Add mypy type annotations

2020-10-07 Thread John Snow


On 10/7/20 6:46 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

These should all be purely annotations with no changes in behavior at
all. You need to be in the python folder, but you should be able to
confirm that these annotations are correct (or at least self-consistent)
by running `mypy --strict qemu`.

Signed-off-by: John Snow 


'mypy --strict qemu' doesn't have clean output after this patch because
ConsoleSocket isn't converted yet. But then, technically the commit
message doesn't make this claim, and you can indeed check the
self-consistency when you ignore the ConsoleSocket related errors, so
probably not a problem.

Reviewed-by: Kevin Wolf 



Whoops, old commit message.

I decided against folding in the new changes because they are newer and 
have been through the review wringer a lot less.


I'll fix this commit message up.

Re: [PATCH 08/20] python/machine.py: fix _popen access

2020-10-07 Thread John Snow


On 10/7/20 6:07 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

As always, Optional[T] causes problems with unchecked access. Add a
helper that asserts the pipe is present before we attempt to talk with
it.

Signed-off-by: John Snow 


First a question about the preexisting state: I see that after
initialising self._popen once, we never reset it to None. Should we do
so on shutdown?



Yup, we should.


  python/qemu/machine.py | 16 +++-
  1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 3e9cf09fd2d..4e762fcd529 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -131,7 +131,7 @@ def __init__(self, binary, args=None, wrapper=None, 
name=None,
  # Runstate
  self._qemu_log_path = None
  self._qemu_log_file = None
-self._popen = None
+self._popen: Optional['subprocess.Popen[bytes]'] = None


Another option that we have, especially if it's an attribute that is
never reset, would be to set the attribute only when it first gets a
value other than None. Accessing it while it hasn't been set yet
automatically results in an AttributeError. I don't think that's much
worse than the exception raised explicitly in a property wrapper.

In this case, you would only declare the type in __init__, but not
assign a value to it:

 self._popen: Optional['subprocess.Popen[bytes]']



If you do this, you can just declare it as non-Optional. Whenever it 
exists, it is definitely a subprocess.Popen[bytes].



Maybe a nicer alternative in some cases than adding properties around
everything.

Instead of checking for None, you would then have to use hasattr(),
which is a bit uglier, so I guess it's mainly for attributes where you
can assume that you will always have a value (if the caller isn't buggy)
and therefore don't even have a check in most places.



As long as the style checkers are OK with that sort of thing. After a 
very quick test, it seems like they might be.


Generally, we run into trouble because pylint et al want variables to be 
declared in __init__, but doing so requires Optional[T] most of the time 
to allow something to be initialized later.


A lot of our stateful objects have this kind of pattern. QAPIGen has a 
ton of it. machine.py has a ton of it too.


You can basically imply the stateful check by just foregoing the actual 
initialization, which trades the explicit check for the implicit one 
when you get the AttributeError.


This is maybe more convenient -- less code to write, certainly. The 
error message you get I think is going to be a little worse, though.


I think I have been leaning towards the cute little @property shims 
because it follows a familiar OO model where a specific class always has 
a finite set of properties that does not grow or shrink. You can also 
use the shim to give a meaningful error that might be nicer to read than 
the AttributeError.


I'm open to suggestions on better patterns. I had considered at one 
point that it might be nice to split Machine out into a version with and 
without the console to make stronger typing guarantees. It has 
implications for how shutdown and cleanup and so on is handled, too.


(I had some WIP patches to do this, but I think I got a little stuck 
making the code pretty, and then the release, and then I got busy, and...)



  self._events = []
  self._iolog = None
  self._qmp_set = True   # Enable QMP monitor by default.
@@ -244,6 +244,12 @@ def is_running(self):
  """Returns true if the VM is running."""
  return self._popen is not None and self._popen.poll() is None
  
+@property

+def _subp(self) -> 'subprocess.Popen[bytes]':
+if self._popen is None:
+raise QEMUMachineError('Subprocess pipe not present')
+return self._popen
+
  def exitcode(self):
  """Returns the exit code if possible, or None."""
  if self._popen is None:


Of course, even if an alternative is possible, what you have is still
correct.

Reviewed-by: Kevin Wolf 



Thanks; I'll continue with this for now, but I really am open to talking 
about better ways to model the common pattern of "Optional sub-feature 
for a class that can be engaged post-initialization".


It's an interesting typing problem. If we were using semantic types, 
what we are describing is an f(x) such that:


f(object-without-feature) -> object-with-feature

It's a kind of semantic cast where we are doing something akin to an 
in-place transformation of a base type to a subtype. I'm not sure I have 
encountered any language that actually intentionally supports such a 
paradigm.


(Maybe haskell? I just assume haskell can do everything if you learn to 
lie to computers well enough.)


--js

Re: [PATCH 07/20] python/machine.py: Add _qmp access shim

2020-10-07 Thread John Snow


On 10/7/20 5:53 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

Like many other Optional[] types, it's not always a given that this
object will be set. Wrap it in a type-shim that raises a meaningful
error and will always return a concrete type.

Signed-off-by: John Snow 



@@ -515,11 +515,13 @@ def set_qmp_monitor(self, enabled=True):
  line. Default is True.
  @note: call this function before launch().
  """
-if enabled:
-self._qmp_set = True
-else:
-self._qmp_set = False
-self._qmp = None
+self._qmp_set = enabled


This change seems unrelated to wrapping the connection in a property.
Intuitively, it makes sense that the connection of a running instance
doesn't go away just because I disable QMP in the command line for the
next launch.

If this is the reasoning behind the change, maybe mention it in the
commit message.

With this:
Reviewed-by: Kevin Wolf 



Oh, yes. That's what happened here -- and it got folded in here 
specifically to make that access check consistent.


I'll update the commit message.

Re: [PATCH 03/20] python/machine.py: reorder init

2020-10-07 Thread John Snow


On 10/7/20 5:43 AM, Kevin Wolf wrote:

Am 07.10.2020 um 01:58 hat John Snow geschrieben:

Put the init arg handling all at the top, and mostly in order (deviating
when one is dependent on another), and put what is effectively runtime
state declaration at the bottom.

Signed-off-by: John Snow 
---
  python/qemu/machine.py | 44 --
  1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 3017ec072df..71fe58be050 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -84,42 +84,54 @@ def __init__(self, binary, args=None, wrapper=None, 
name=None,
  @param monitor_address: address for QMP monitor
  @param socket_scm_helper: helper program, required for send_fd_scm()
  @param sock_dir: where to create socket (overrides test_dir for sock)
-@param console_log: (optional) path to console log file
  @param drain_console: (optional) True to drain console socket to 
buffer
+@param console_log: (optional) path to console log file
  @note: Qemu process is not started until launch() is used.
  '''
+# Direct user configuration
+
+self._binary = binary
+
  if args is None:
  args = []
+# Copy mutable input: we will be modifying our copy
+self._args = list(args)
+
  if wrapper is None:
  wrapper = []
-if name is None:
-name = "qemu-%d" % os.getpid()
-if sock_dir is None:
-sock_dir = test_dir
-self._name = name
+self._wrapper = wrapper
+
+self._name = name or "qemu-%d" % os.getpid()
+self._test_dir = test_dir
+self._sock_dir = sock_dir or self._test_dir
+self._socket_scm_helper = socket_scm_helper


Interesting that you use a shortcut with 'or' for name and sock_dir,
but you don't have 'wraper or []' and 'args or []' above.

It's not wrong, of course, but if you have to respin for another reason,
maybe an inconsistency to address.



This winds up being because ... I delete those lines of code later. I 
very often have this problem where I clean up a bunch of stuff, and then 
split out that giant commit into a series.


Sometimes that causes stuff like this.


Reviewed-by: Kevin Wolf 



Thanks!

Re: [PATCH v1 03/22] hw/ide: restore replay support of IDE

2020-10-07 Thread Philippe Mathieu-Daudé

On 10/7/20 6:00 PM, Alex Bennée wrote:
> A recent change to weak reset handling broke replay due to the use of
> aio_bh_schedule_oneshot instead of the replay aware
> replay_bh_schedule_oneshot_event.
> 
> Fixes: 55adb3c456 ("ide: cancel pending callbacks on SRST")
> Suggested-by: Pavel Dovgalyuk 
> Signed-off-by: Alex Bennée 
> ---
>  hw/ide/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0e32abd779..693b352d5e 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2289,8 +2289,8 @@ void ide_ctrl_write(void *opaque, uint32_t addr, 
> uint32_t val)
>  s = &bus->ifs[i];
>  s->status |= BUSY_STAT;
>  }
> -aio_bh_schedule_oneshot(qemu_get_aio_context(),
> -ide_bus_perform_srst, bus);
> +replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
> + ide_bus_perform_srst, bus);

Reviewed-by: Philippe Mathieu-Daudé 

>  }
>  
>  bus->cmd = val;
>

Re: [PATCH v10 9/9] block: apply COR-filter to block-stream jobs

2020-10-07 Thread Andrey Shinkevich




On 07.10.2020 20:27, Andrey Shinkevich wrote:


On 29.09.2020 15:38, Andrey Shinkevich wrote:

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 93 
++

  tests/qemu-iotests/030 | 51 +++--
  tests/qemu-iotests/030.out |  4 +-
  tests/qemu-iotests/141.out |  2 +-
  tests/qemu-iotests/245 | 19 +++---
  5 files changed, 83 insertions(+), 86 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index fe2663f..240b3dc 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -17,8 +17,10 @@



One more change missed, as we use the COR-filter:

@@ -47,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend 
*blk,

  {
  assert(bytes < SIZE_MAX);

-    return blk_co_preadv(blk, offset, bytes, NULL,
- BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);

  +return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH);

Sorry, with the only flag BDRV_REQ_PREFETCH set.
A change in the comment at the flag BDRV_REQ_PREFETCH is coming with a 
separate patch as Vladimir suggested.


Andrey


+    return blk_co_preadv(blk, offset, bytes, NULL, 0);
  }

Andrey

Re: [PATCH v3 1/4] keyval: Parse help options

2020-10-07 Thread Eric Blake

On 10/7/20 11:49 AM, Kevin Wolf wrote:
> This adds a special meaning for 'help' and '?' as options to the keyval
> parser. Instead of being an error (because of a missing value) or a
> value for an implied key, they now request help, which is a new boolean
> ouput of the parser in addition to the QDict.

output

> 
> A new parameter 'p_help' is added to keyval_parse() that contains on
> return whether help was requested. If NULL is passed, requesting help
> results in an error and all other cases work like before.
> 
> Turning previous error cases into help is a compatible extension. The
> behaviour potentially changes for implied keys: They could previously
> get 'help' as their value, which is now interpreted as requesting help.
> 
> This is not a problem in practice because 'help' and '?' are not a valid
> values for the implied key of any option parsed with keyval_parse():
> 
> * audiodev: union Audiodev, implied key "driver" is enum AudiodevDriver,
>   "help" and "?" are not among its values
> 
> * display: union DisplayOptions, implied key "type" is enum
>   DisplayType, "help" and "?" are not among its values
> 
> * blockdev: union BlockdevOptions, implied key "driver is enum
>   BlockdevDriver, "help" and "?" are not among its values
> 
> * export: union BlockExport, implied key "type" is enum BlockExportType,
>   "help" and "?" are not among its values
> 
> * monitor: struct MonitorOptions, implied key "mode"is enum MonitorMode,

missing space

>   "help" and "?" are not among its values
> 
> * nbd-server: struct NbdServerOptions, no implied key.

Including the audit is nice (and thanks to Markus for helping derive the
list).

> 
> Signed-off-by: Kevin Wolf 
> ---

> +++ b/util/keyval.c
> @@ -14,7 +14,7 @@
>   * KEY=VALUE,... syntax:
>   *
>   *   key-vals = [ key-val { ',' key-val } [ ',' ] ]
> - *   key-val  = key '=' val
> + *   key-val  = 'help' | key '=' val

Maybe: = 'help' | '?' | key = '=' val

>   *   key  = key-fragment { '.' key-fragment }
>   *   key-fragment = / [^=,.]* /
>   *   val  = { / [^,]* / | ',,' }
> @@ -73,10 +73,14 @@
>   *
>   * Additional syntax for use with an implied key:
>   *
> - *   key-vals-ik  = val-no-key [ ',' key-vals ]
> + *   key-vals-ik  = 'help' | val-no-key [ ',' key-vals ]

and again for '?' here.  Actually, this should probably be:

key-vals-ik  = 'help' [ ',' key-vals ]
 | '?' [ ',' key-vals ]
 | val-no-key [ ',' key-vals ]

>   *   val-no-key   = / [^=,]* /

This is now slightly inaccurate, but I don't know how to concisely
express the regex [^=,]* but not '?' or 'help', and the prose covers the
ambiguity.


> @@ -410,6 +442,14 @@ QDict *keyval_parse(const char *params, const char 
> *implied_key,
>  implied_key = NULL;
>  }
>  
> +if (p_help) {
> +*p_help = help;
> +} else if (help) {
> +error_setg(errp, "Help is not available for this option");
> +qobject_unref(qdict);
> +return NULL;
> +}

This part is a definite improvement over v2.

I'm assuming Markus can help touch up the comments and spelling errors, so:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v10 9/9] block: apply COR-filter to block-stream jobs

2020-10-07 Thread Andrey Shinkevich




On 29.09.2020 15:38, Andrey Shinkevich wrote:

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 93 ++
  tests/qemu-iotests/030 | 51 +++--
  tests/qemu-iotests/030.out |  4 +-
  tests/qemu-iotests/141.out |  2 +-
  tests/qemu-iotests/245 | 19 +++---
  5 files changed, 83 insertions(+), 86 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index fe2663f..240b3dc 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -17,8 +17,10 @@



One more change missed, as we use the COR-filter:

@@ -47,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
 {
 assert(bytes < SIZE_MAX);

-return blk_co_preadv(blk, offset, bytes, NULL,
- BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
+return blk_co_preadv(blk, offset, bytes, NULL, 0);
 }

Andrey

[PATCH v3 3/4] qom: Add user_creatable_print_help_from_qdict()

2020-10-07 Thread Kevin Wolf

This adds a function that, given a QDict of non-help options, prints
help for user creatable objects.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 include/qom/object_interfaces.h | 21 ++---
 qom/object_interfaces.c |  9 +
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index f118fb516b..07d5cc8832 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -154,13 +154,28 @@ int user_creatable_add_opts_foreach(void *opaque,
  * @type: the QOM type to be added
  * @opts: options to create
  *
- * Prints help if requested in @opts.
+ * Prints help if requested in @type or @opts. Note that if @type is neither
+ * "help"/"?" nor a valid user creatable type, no help will be printed
+ * regardless of @opts.
  *
- * Returns: true if @opts contained a help option and help was printed, false
- * if no help option was found.
+ * Returns: true if a help option was found and help was printed, false
+ * otherwise.
  */
 bool user_creatable_print_help(const char *type, QemuOpts *opts);
 
+/**
+ * user_creatable_print_help_from_qdict:
+ * @args: options to create
+ *
+ * Prints help considering the other options given in @args (if "qom-type" is
+ * given and valid, print properties for the type, otherwise print valid types)
+ *
+ * In contrast to user_creatable_print_help(), this function can't return that
+ * no help was requested. It should only be called if we know that help is
+ * requested and it will always print some help.
+ */
+void user_creatable_print_help_from_qdict(QDict *args);
+
 /**
  * user_creatable_del:
  * @id: the unique ID for the object
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 3fd1da157e..ed896fe764 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -279,6 +279,15 @@ bool user_creatable_print_help(const char *type, QemuOpts 
*opts)
 return false;
 }
 
+void user_creatable_print_help_from_qdict(QDict *args)
+{
+const char *type = qdict_get_try_str(args, "qom-type");
+
+if (!type || !user_creatable_print_type_properites(type)) {
+user_creatable_print_types();
+}
+}
+
 bool user_creatable_del(const char *id, Error **errp)
 {
 Object *container;
-- 
2.25.4

[PATCH v3 0/4] qemu-storage-daemon: Remove QemuOpts from --object parser

2020-10-07 Thread Kevin Wolf

This replaces the QemuOpts-based help code for --object in the storage
daemon with code based on the keyval parser.

v3:
- Always parse help options, no matter if the caller implements help or
  not. If it doesn't, return an error. [Markus]
- Document changes to the keyval parser grammar [Markus]
- Support both 'help' and '?' [Eric]
- Test case fixes [Eric]
- Improved documentation of user_creatable_print_help(_from_qdict)
  [Markus]

v2:
- Fixed double comma by reusing the existing key and value parsers [Eric]
- More tests to cover the additional cases

Kevin Wolf (4):
  keyval: Parse help options
  qom: Factor out helpers from user_creatable_print_help()
  qom: Add user_creatable_print_help_from_qdict()
  qemu-storage-daemon: Remove QemuOpts from --object parser

 include/qemu/help_option.h   |   5 +
 include/qemu/option.h|   2 +-
 include/qom/object_interfaces.h  |  21 ++-
 qapi/qobject-input-visitor.c |   2 +-
 qom/object_interfaces.c  |  99 -
 storage-daemon/qemu-storage-daemon.c |  15 +-
 tests/test-keyval.c  | 205 +++
 util/keyval.c|  54 ++-
 8 files changed, 280 insertions(+), 123 deletions(-)

-- 
2.25.4

[PATCH v3 2/4] qom: Factor out helpers from user_creatable_print_help()

2020-10-07 Thread Kevin Wolf

This creates separate helper functions for printing a list of user
creatable object types and for printing a list of properties of a given
type. This will allow using these parts without having a QemuOpts.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 qom/object_interfaces.c | 90 -
 1 file changed, 52 insertions(+), 38 deletions(-)

diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index e8e1523960..3fd1da157e 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -214,54 +214,68 @@ char *object_property_help(const char *name, const char 
*type,
 return g_string_free(str, false);
 }
 
-bool user_creatable_print_help(const char *type, QemuOpts *opts)
+static void user_creatable_print_types(void)
+{
+GSList *l, *list;
+
+printf("List of user creatable objects:\n");
+list = object_class_get_list_sorted(TYPE_USER_CREATABLE, false);
+for (l = list; l != NULL; l = l->next) {
+ObjectClass *oc = OBJECT_CLASS(l->data);
+printf("  %s\n", object_class_get_name(oc));
+}
+g_slist_free(list);
+}
+
+static bool user_creatable_print_type_properites(const char *type)
 {
 ObjectClass *klass;
+ObjectPropertyIterator iter;
+ObjectProperty *prop;
+GPtrArray *array;
+int i;
 
-if (is_help_option(type)) {
-GSList *l, *list;
+klass = object_class_by_name(type);
+if (!klass) {
+return false;
+}
 
-printf("List of user creatable objects:\n");
-list = object_class_get_list_sorted(TYPE_USER_CREATABLE, false);
-for (l = list; l != NULL; l = l->next) {
-ObjectClass *oc = OBJECT_CLASS(l->data);
-printf("  %s\n", object_class_get_name(oc));
+array = g_ptr_array_new();
+object_class_property_iter_init(&iter, klass);
+while ((prop = object_property_iter_next(&iter))) {
+if (!prop->set) {
+continue;
 }
-g_slist_free(list);
-return true;
+
+g_ptr_array_add(array,
+object_property_help(prop->name, prop->type,
+ prop->defval, prop->description));
 }
+g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
+if (array->len > 0) {
+printf("%s options:\n", type);
+} else {
+printf("There are no options for %s.\n", type);
+}
+for (i = 0; i < array->len; i++) {
+printf("%s\n", (char *)array->pdata[i]);
+}
+g_ptr_array_set_free_func(array, g_free);
+g_ptr_array_free(array, true);
+return true;
+}
 
-klass = object_class_by_name(type);
-if (klass && qemu_opt_has_help_opt(opts)) {
-ObjectPropertyIterator iter;
-ObjectProperty *prop;
-GPtrArray *array = g_ptr_array_new();
-int i;
-
-object_class_property_iter_init(&iter, klass);
-while ((prop = object_property_iter_next(&iter))) {
-if (!prop->set) {
-continue;
-}
-
-g_ptr_array_add(array,
-object_property_help(prop->name, prop->type,
- prop->defval, 
prop->description));
-}
-g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
-if (array->len > 0) {
-printf("%s options:\n", type);
-} else {
-printf("There are no options for %s.\n", type);
-}
-for (i = 0; i < array->len; i++) {
-printf("%s\n", (char *)array->pdata[i]);
-}
-g_ptr_array_set_free_func(array, g_free);
-g_ptr_array_free(array, true);
+bool user_creatable_print_help(const char *type, QemuOpts *opts)
+{
+if (is_help_option(type)) {
+user_creatable_print_types();
 return true;
 }
 
+if (qemu_opt_has_help_opt(opts)) {
+return user_creatable_print_type_properites(type);
+}
+
 return false;
 }
 
-- 
2.25.4

Re: [PATCH v1 03/22] hw/ide: restore replay support of IDE

2020-10-07 Thread John Snow


On 10/7/20 12:00 PM, Alex Bennée wrote:

A recent change to weak reset handling broke replay due to the use of
aio_bh_schedule_oneshot instead of the replay aware
replay_bh_schedule_oneshot_event.

Fixes: 55adb3c456 ("ide: cancel pending callbacks on SRST")
Suggested-by: Pavel Dovgalyuk 
Signed-off-by: Alex Bennée 
---
  hw/ide/core.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0e32abd779..693b352d5e 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2289,8 +2289,8 @@ void ide_ctrl_write(void *opaque, uint32_t addr, uint32_t 
val)
  s = &bus->ifs[i];
  s->status |= BUSY_STAT;
  }
-aio_bh_schedule_oneshot(qemu_get_aio_context(),
-ide_bus_perform_srst, bus);
+replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
+ ide_bus_perform_srst, bus);
  }
  
  bus->cmd = val;




Reviewed-by: John Snow 
Acked-by: John Snow

[PATCH v3 4/4] qemu-storage-daemon: Remove QemuOpts from --object parser

2020-10-07 Thread Kevin Wolf

The command line parser for --object parses the input twice: Once into
QemuOpts just for detecting help options, and then again into a QDict
using the keyval parser for actually creating the object.

Now that the keyval parser can also detect help options, we can simplify
this and remove the QemuOpts part.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 storage-daemon/qemu-storage-daemon.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 6f0e0cfb36..e419ba9f19 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -264,21 +264,14 @@ static void process_options(int argc, char *argv[])
 }
 case OPTION_OBJECT:
 {
-QemuOpts *opts;
-const char *type;
 QDict *args;
+bool help;
 
-/* FIXME The keyval parser rejects 'help' arguments, so we must
- * unconditionall try QemuOpts first. */
-opts = qemu_opts_parse(&qemu_object_opts,
-   optarg, true, &error_fatal);
-type = qemu_opt_get(opts, "qom-type");
-if (type && user_creatable_print_help(type, opts)) {
+args = keyval_parse(optarg, "qom-type", &help, &error_fatal);
+if (help) {
+user_creatable_print_help_from_qdict(args);
 exit(EXIT_SUCCESS);
 }
-qemu_opts_del(opts);
-
-args = keyval_parse(optarg, "qom-type", NULL, &error_fatal);
 user_creatable_add_dict(args, true, &error_fatal);
 qobject_unref(args);
 break;
-- 
2.25.4

[PATCH v3 1/4] keyval: Parse help options

2020-10-07 Thread Kevin Wolf

This adds a special meaning for 'help' and '?' as options to the keyval
parser. Instead of being an error (because of a missing value) or a
value for an implied key, they now request help, which is a new boolean
ouput of the parser in addition to the QDict.

A new parameter 'p_help' is added to keyval_parse() that contains on
return whether help was requested. If NULL is passed, requesting help
results in an error and all other cases work like before.

Turning previous error cases into help is a compatible extension. The
behaviour potentially changes for implied keys: They could previously
get 'help' as their value, which is now interpreted as requesting help.

This is not a problem in practice because 'help' and '?' are not a valid
values for the implied key of any option parsed with keyval_parse():

* audiodev: union Audiodev, implied key "driver" is enum AudiodevDriver,
  "help" and "?" are not among its values

* display: union DisplayOptions, implied key "type" is enum
  DisplayType, "help" and "?" are not among its values

* blockdev: union BlockdevOptions, implied key "driver is enum
  BlockdevDriver, "help" and "?" are not among its values

* export: union BlockExport, implied key "type" is enum BlockExportType,
  "help" and "?" are not among its values

* monitor: struct MonitorOptions, implied key "mode"is enum MonitorMode,
  "help" and "?" are not among its values

* nbd-server: struct NbdServerOptions, no implied key.

Signed-off-by: Kevin Wolf 
---
 include/qemu/help_option.h   |   5 +
 include/qemu/option.h|   2 +-
 qapi/qobject-input-visitor.c |   2 +-
 storage-daemon/qemu-storage-daemon.c |   2 +-
 tests/test-keyval.c  | 205 +++
 util/keyval.c|  54 ++-
 6 files changed, 198 insertions(+), 72 deletions(-)

diff --git a/include/qemu/help_option.h b/include/qemu/help_option.h
index 328d2a89fd..952d628b9d 100644
--- a/include/qemu/help_option.h
+++ b/include/qemu/help_option.h
@@ -19,4 +19,9 @@ static inline bool is_help_option(const char *s)
 return !strcmp(s, "?") || !strcmp(s, "help");
 }
 
+static inline bool is_help_option_n(const char *s, size_t n)
+{
+return !strncmp(s, "?", n) || !strncmp(s, "help", n);
+}
+
 #endif
diff --git a/include/qemu/option.h b/include/qemu/option.h
index 05e8a15c73..ac69352e0e 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -149,6 +149,6 @@ void qemu_opts_free(QemuOptsList *list);
 QemuOptsList *qemu_opts_append(QemuOptsList *dst, QemuOptsList *list);
 
 QDict *keyval_parse(const char *params, const char *implied_key,
-Error **errp);
+bool *help, Error **errp);
 
 #endif
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index f918a05e5f..7b184b50a7 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -757,7 +757,7 @@ Visitor *qobject_input_visitor_new_str(const char *str,
 assert(args);
 v = qobject_input_visitor_new(QOBJECT(args));
 } else {
-args = keyval_parse(str, implied_key, errp);
+args = keyval_parse(str, implied_key, NULL, errp);
 if (!args) {
 return NULL;
 }
diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 1ae1cda481..6f0e0cfb36 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -278,7 +278,7 @@ static void process_options(int argc, char *argv[])
 }
 qemu_opts_del(opts);
 
-args = keyval_parse(optarg, "qom-type", &error_fatal);
+args = keyval_parse(optarg, "qom-type", NULL, &error_fatal);
 user_creatable_add_dict(args, true, &error_fatal);
 qobject_unref(args);
 break;
diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index e331a84149..b1433054de 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -27,27 +27,28 @@ static void test_keyval_parse(void)
 QDict *qdict, *sub_qdict;
 char long_key[129];
 char *params;
+bool help;
 
 /* Nothing */
-qdict = keyval_parse("", NULL, &error_abort);
+qdict = keyval_parse("", NULL, NULL, &error_abort);
 g_assert_cmpuint(qdict_size(qdict), ==, 0);
 qobject_unref(qdict);
 
 /* Empty key (qemu_opts_parse() accepts this) */
-qdict = keyval_parse("=val", NULL, &err);
+qdict = keyval_parse("=val", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Empty key fragment */
-qdict = keyval_parse(".", NULL, &err);
+qdict = keyval_parse(".", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
-qdict = keyval_parse("key.", NULL, &err);
+qdict = keyval_parse("key.", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Invalid non-empty key (qemu_opts_parse() doesn't care) */
-qdict = keyva

[PATCH v2] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Alberto Garcia

The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting of a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region touched by that
part of the write request. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h | 19 +++
 block/qcow2-cluster.c |  5 +++--
 block/qcow2.c | 19 +++
 3 files changed, 29 insertions(+), 14 deletions(-)

v2: Fix mistakes on the commit messages [Eric]

diff --git a/block/qcow2.h b/block/qcow2.h
index 125ea9679b..2e0272a7b8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -435,17 +435,18 @@ typedef struct Qcow2COWRegion {
 
 /**
  * Describes an in-flight (part of a) write request that writes to clusters
- * that are not referenced in their L2 table yet.
+ * that need to have their L2 table entries updated (because they are
+ * newly allocated or need changes in their L2 bitmaps)
  */
 typedef struct QCowL2Meta
 {
-/** Guest offset of the first newly allocated cluster */
+/** Guest offset of the first updated cluster */
 uint64_t offset;
 
-/** Host offset of the first newly allocated cluster */
+/** Host offset of the first updated cluster */
 uint64_t alloc_offset;
 
-/** Number of newly allocated clusters */
+/** Number of updated clusters */
 int nb_clusters;
 
 /** Do not free the old clusters */
@@ -458,14 +459,16 @@ typedef struct QCowL2Meta
 CoQueue dependent_requests;
 
 /**
- * The COW Region between the start of the first allocated cluster and the
- * area the guest actually writes to.
+ * The COW Region immediately before the area the guest actually
+ * writes to. This (part of the) write request starts at
+ * cow_start.offset + cow_start.nb_bytes.
  */
 Qcow2COWRegion cow_start;
 
 /**
- * The COW Region between the area the guest actually writes to and the
- * end of the last allocated cluster.
+ * The COW Region immediately after the area the guest actually
+ * writes to. This (part of the) write request ends at cow_end.offset
+ * (which must always be set even when cow_end.nb_bytes is 0).
  */
 Qcow2COWRegion cow_end;
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa87d3e99b..485b4cb92e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 
 assert(l2_index + m->nb_clusters <= s->l2_slice_size);
+assert(m->cow_end.offset + m->cow_end.nb_bytes <=
+   m->nb_clusters << s->cluster_bits);
 for (i = 0; i < m->nb_clusters; i++) {
 uint64_t offset = cluster_offset + ((uint64_t)i << s->cluster_bits);
 /* if two concurrent writes happen to the same unallocated cluster
@@ -1070,8 +1072,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 if (has_subclusters(s) && !m->prealloc) {
 uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
 unsigned written_from = m->cow_start.offset;
-unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
-m->nb_clusters << s->cluster_bits;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes;
 int first_sc, last_sc;
 /* Narrow written_from and written_to down to the current cluster 
*/
 written_from = MAX(written_from, i << s->cluster_bits);
diff --git a/block/qcow2.c b/block/qcow2.c
index b05512718c..e7b27fdf25 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2361,15 +2361,26 @@ static bool m

Re: [PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Alberto Garcia

On Wed 07 Oct 2020 05:47:32 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 07.10.2020 18:38, Alberto Garcia wrote:
>> On Wed 07 Oct 2020 04:42:37 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
/**
 - * The COW Region between the start of the first allocated cluster 
 and the
 - * area the guest actually writes to.
 + * The COW Region immediately before the area the guest actually
 + * writes to. This (part of the) write request starts at
 + * cow_start.offset + cow_start.nb_bytes.
>>>
>>> "starts at" is a bit misleading.. As here is not guest offset, not
>>> host offset, but offset relative to QCowL2Meta.offset
>> 
>> I thought it was clear by the context since Qcow2COWRegion.offset is
>> defined to be relative to QCowL2Meta.offset
>> 
 @@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState 
 *bs, QCowL2Meta *m)
qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);

assert(l2_index + m->nb_clusters <= s->l2_slice_size);
 +assert(m->cow_end.offset + m->cow_end.nb_bytes <=
 +   m->nb_clusters << s->cluster_bits);
>>>
>>> should we also assert that cow_end.offset + cow_end.nb_bytes is not
>>> zero?
>> 
>> No, a write request that fills a complete cluster has no COW but still
>> needs to call qcow2_alloc_cluster_link_l2() to update the L2 metadata.
>
> but cow_end.offset will not be zero still? I suggest it because you
> actually rely on this fact by dropping written_to conditional
> assignment.

No, cow_end.offset must not be 0 but the offset where the data region
ends, this is already enforced by this assertion in perform_cow():

assert(start->offset + start->nb_bytes <= end->offset);

And it is explicitly mentioned in the commit message:

Even when those regions themselves are empty their offsets must be
correct because they are used to know the location of the middle
region.

and also in qcow2.h:

  /**
   * The COW Region immediately after the area the guest actually
   * writes to. This (part of the) write request ends at cow_end.offset
   * (which must always be set even when cow_end.nb_bytes is 0).
   */
  Qcow2COWRegion cow_end;

Berto

[PATCH v1 03/22] hw/ide: restore replay support of IDE

2020-10-07 Thread Alex Bennée

A recent change to weak reset handling broke replay due to the use of
aio_bh_schedule_oneshot instead of the replay aware
replay_bh_schedule_oneshot_event.

Fixes: 55adb3c456 ("ide: cancel pending callbacks on SRST")
Suggested-by: Pavel Dovgalyuk 
Signed-off-by: Alex Bennée 
---
 hw/ide/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0e32abd779..693b352d5e 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2289,8 +2289,8 @@ void ide_ctrl_write(void *opaque, uint32_t addr, uint32_t 
val)
 s = &bus->ifs[i];
 s->status |= BUSY_STAT;
 }
-aio_bh_schedule_oneshot(qemu_get_aio_context(),
-ide_bus_perform_srst, bus);
+replay_bh_schedule_oneshot_event(qemu_get_aio_context(),
+ ide_bus_perform_srst, bus);
 }
 
 bus->cmd = val;
-- 
2.20.1

Re: [PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


07.10.2020 18:38, Alberto Garcia wrote:

On Wed 07 Oct 2020 04:42:37 PM CEST, Vladimir Sementsov-Ogievskiy wrote:

   /**
- * The COW Region between the start of the first allocated cluster and the
- * area the guest actually writes to.
+ * The COW Region immediately before the area the guest actually
+ * writes to. This (part of the) write request starts at
+ * cow_start.offset + cow_start.nb_bytes.


"starts at" is a bit misleading.. As here is not guest offset, not
host offset, but offset relative to QCowL2Meta.offset


I thought it was clear by the context since Qcow2COWRegion.offset is
defined to be relative to QCowL2Meta.offset


@@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
   qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
   
   assert(l2_index + m->nb_clusters <= s->l2_slice_size);

+assert(m->cow_end.offset + m->cow_end.nb_bytes <=
+   m->nb_clusters << s->cluster_bits);


should we also assert that cow_end.offset + cow_end.nb_bytes is not
zero?


No, a write request that fills a complete cluster has no COW but still
needs to call qcow2_alloc_cluster_link_l2() to update the L2 metadata.


but cow_end.offset will not be zero still? I suggest it because you actually 
rely on this fact by dropping written_to conditional assignment.




-/* The end region must be immediately after the data (middle)
- * region */
+/* The write request should end immediately before the second
+ * COW region (see above for why it does not always happen) */
   if (m->offset + m->cow_end.offset != offset + bytes) {
+assert(offset + bytes > m->offset + m->cow_end.offset);
+assert(m->cow_end.nb_bytes == 0);
   continue;
   }


Then it's interesting, why not to merge if we have this zero-length
cow region? What's the problem with it?


We do merge the request and the COW if one of the COW regions has zero
length, visually:

  1)
 <>  <--- cow_end --->
 <--- write request >

  2)
 <--- cow_start --->  <>
<--- write request >

In this case however the problem is not that the region is empty, but
that the request starts *before* (or after) that region and that there's
probably another QCowL2Meta earlier (or later):

 <  1st QCowL2Meta  > < 2nd QCowL2Meta >
 <--- cow_start --->   <> <>   <--- cow_end --->
< write request -->



In this picture it still seems possible to merge "write-request" to "1st 
QCowL2Meta",
keeping "2nd QCowL2Meta" in separate.. Or, in this case we should create 
additional
relation between these metas? OK, it's not so simple, thanks for the picture.


What we would need here is to merge all QCowL2Meta into one, but I don't
think that we want to do that because it looks like complicating the
code for a scenario that does not happen very often.

Berto




--
Best regards,
Vladimir

Re: [PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Alberto Garcia

On Wed 07 Oct 2020 04:42:37 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>   /**
>> - * The COW Region between the start of the first allocated cluster and 
>> the
>> - * area the guest actually writes to.
>> + * The COW Region immediately before the area the guest actually
>> + * writes to. This (part of the) write request starts at
>> + * cow_start.offset + cow_start.nb_bytes.
>
> "starts at" is a bit misleading.. As here is not guest offset, not
> host offset, but offset relative to QCowL2Meta.offset

I thought it was clear by the context since Qcow2COWRegion.offset is
defined to be relative to QCowL2Meta.offset

>> @@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
>> QCowL2Meta *m)
>>   qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>   
>>   assert(l2_index + m->nb_clusters <= s->l2_slice_size);
>> +assert(m->cow_end.offset + m->cow_end.nb_bytes <=
>> +   m->nb_clusters << s->cluster_bits);
>
> should we also assert that cow_end.offset + cow_end.nb_bytes is not
> zero?

No, a write request that fills a complete cluster has no COW but still
needs to call qcow2_alloc_cluster_link_l2() to update the L2 metadata.

>> -/* The end region must be immediately after the data (middle)
>> - * region */
>> +/* The write request should end immediately before the second
>> + * COW region (see above for why it does not always happen) */
>>   if (m->offset + m->cow_end.offset != offset + bytes) {
>> +assert(offset + bytes > m->offset + m->cow_end.offset);
>> +assert(m->cow_end.nb_bytes == 0);
>>   continue;
>>   }
>
> Then it's interesting, why not to merge if we have this zero-length
> cow region? What's the problem with it?

We do merge the request and the COW if one of the COW regions has zero
length, visually:

 1)
<>  <--- cow_end --->
<--- write request >

 2)
<--- cow_start --->  <>
   <--- write request >

In this case however the problem is not that the region is empty, but
that the request starts *before* (or after) that region and that there's
probably another QCowL2Meta earlier (or later):

<  1st QCowL2Meta  > < 2nd QCowL2Meta >
<--- cow_start --->   <> <>   <--- cow_end --->
   < write request -->

What we would need here is to merge all QCowL2Meta into one, but I don't
think that we want to do that because it looks like complicating the
code for a scenario that does not happen very often.

Berto

Re: [PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


07.10.2020 14:52, Alberto Garcia wrote:

The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting on a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region that that part
of the write request touches. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia 
---
  block/qcow2.h | 19 +++
  block/qcow2-cluster.c |  5 +++--
  block/qcow2.c | 19 +++
  3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 125ea9679b..2e0272a7b8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -435,17 +435,18 @@ typedef struct Qcow2COWRegion {
  
  /**

   * Describes an in-flight (part of a) write request that writes to clusters
- * that are not referenced in their L2 table yet.
+ * that need to have their L2 table entries updated (because they are
+ * newly allocated or need changes in their L2 bitmaps)
   */
  typedef struct QCowL2Meta
  {
-/** Guest offset of the first newly allocated cluster */
+/** Guest offset of the first updated cluster */
  uint64_t offset;
  
-/** Host offset of the first newly allocated cluster */

+/** Host offset of the first updated cluster */
  uint64_t alloc_offset;
  
-/** Number of newly allocated clusters */

+/** Number of updated clusters */
  int nb_clusters;
  
  /** Do not free the old clusters */

@@ -458,14 +459,16 @@ typedef struct QCowL2Meta
  CoQueue dependent_requests;
  
  /**

- * The COW Region between the start of the first allocated cluster and the
- * area the guest actually writes to.
+ * The COW Region immediately before the area the guest actually
+ * writes to. This (part of the) write request starts at
+ * cow_start.offset + cow_start.nb_bytes.


"starts at" is a bit misleading.. As here is not guest offset, not host offset,
but offset relative to QCowL2Meta.offset


   */
  Qcow2COWRegion cow_start;
  
  /**

- * The COW Region between the area the guest actually writes to and the
- * end of the last allocated cluster.
+ * The COW Region immediately after the area the guest actually
+ * writes to. This (part of the) write request ends at cow_end.offset


same here


+ * (which must always be set even when cow_end.nb_bytes is 0).
   */
  Qcow2COWRegion cow_end;
  
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c

index aa87d3e99b..485b4cb92e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
  qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
  
  assert(l2_index + m->nb_clusters <= s->l2_slice_size);

+assert(m->cow_end.offset + m->cow_end.nb_bytes <=
+   m->nb_clusters << s->cluster_bits);


should we also assert that cow_end.offset + cow_end.nb_bytes is not zero?


  for (i = 0; i < m->nb_clusters; i++) {
  uint64_t offset = cluster_offset + ((uint64_t)i << s->cluster_bits);
  /* if two concurrent writes happen to the same unallocated cluster
@@ -1070,8 +1072,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
  if (has_subclusters(s) && !m->prealloc) {
  uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
  unsigned written_from = m->cow_start.offset;
-unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
-m->nb_clusters << s->cluster_bits;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes;
  int first_sc, last_sc;
  /* Narrow written_from and written_to down to the current cluster 
*/
  written_from = MAX(w

Re: [PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Eric Blake

On 10/7/20 6:52 AM, Alberto Garcia wrote:
> The QCowL2Meta structure is used to store information about a part of
> a write request that touches clusters that need changes in their L2
> entries. This happens with newly-allocated clusters or subclusters.
> 
> This structure has changed a bit since it was first created and its
> current documentation is not quite up-to-date.
> 
> A write request can span a region consisting on a combination of

s/on/of/

> clusters of different types, and qcow2_alloc_host_offset() can
> repeatedly call handle_copied() and handle_alloc() to add more
> clusters to the mix as long as they all are contiguous on the image
> file.
> 
> Because of this a write request has a list of QCowL2Meta structures,
> one for each part of the request that needs changes in the L2
> metadata.
> 
> Each one of them spans nb_clusters and has two copy-on-write regions
> located immediately before and after the middle region that that part
> of the write request touches. Even when those regions themselves are

Grammatically correct, but the doubled 'that' caught me on my first
read.  Maybe you want:

s/that that part of the write request touches/touched by that part of
the write request/

> empty their offsets must be correct because they are used to know the
> location of the middle region.
> 
> This was not always the case but it is not a problem anymore
> because the only two places where QCowL2Meta structures are created
> (calculate_l2_meta() and qcow2_co_truncate()) ensure that the
> copy-on-write regions are correctly defined, and so do assertions like
> the ones in perform_cow().
> 
> The conditional initialization of the 'written_to' variable is
> therefore unnecessary and is removed by this patch.
> 
> Signed-off-by: Alberto Garcia 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v2 5/5] nbd: Add 'qemu-nbd -A' to expose allocation depth

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


30.09.2020 15:11, Eric Blake wrote:

Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.

qemu as client can be hacked into viewing this new context by using
the now-misnamed x-dirty-bitmap option when creating an NBD blockdev
(even though our x- naming means we could rename it, I did not think
it worth breaking back-compat of tools that have been using it while
waiting for a better solution).  It is worth noting the decoding of
how such context information will appear in 'qemu-img map
--output=json':

NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true
NBD_STATE_DEPTH_LOCAL   => "zero":false, "data":false
NBD_STATE_DEPTH_BACKING => "zero":true,  "data":true

libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)

Signed-off-by: Eric Blake 
---
  docs/tools/qemu-nbd.rst|  6 
  qapi/block-core.json   |  7 ++--
  qapi/block-export.json |  6 +++-
  blockdev-nbd.c |  2 ++
  nbd/server.c   |  2 ++
  qemu-nbd.c | 14 ++--
  tests/qemu-iotests/309 | 73 ++
  tests/qemu-iotests/309.out | 22 
  tests/qemu-iotests/group   |  1 +
  9 files changed, 127 insertions(+), 6 deletions(-)
  create mode 100755 tests/qemu-iotests/309
  create mode 100644 tests/qemu-iotests/309.out

diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 667861cb22e9..0e545a97cfa3 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -72,6 +72,12 @@ driver options if ``--image-opts`` is specified.

Export the disk as read-only.

+.. option:: -A, --allocation-depth
+
+  Expose allocation depth information via the
+  ``qemu:allocation-depth`` context accessible through
+  NBD_OPT_SET_META_CONTEXT.
+
  .. option:: -B, --bitmap=NAME

If *filename* has a qcow2 persistent bitmap *NAME*, expose
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d620bd1302b2..0379eb992db8 100644



[..]


+echo
+echo "=== Check allocation over NBD ==="
+echo
+
+# Normal -f raw NBD block status loses access to allocation information


^this comment is not for the next line, but for further lines ...


+$QEMU_IMG map --output=json -f qcow2 "$TEST_IMG"
+IMG="driver=nbd,server.type=unix,server.path=$nbd_unix_socket"
+nbd_server_start_unix_socket -r -f qcow2 -A "$TEST_IMG"


... should it be here?


+$QEMU_IMG map --output=json --image-opts \
+"$IMG" | _filter_qemu_img_map
+# But since we used -A, and use x-dirty-bitmap as a hack for reading bitmaps,
+# we can reconstruct it, by abusing block status to report:
+#NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true
+#NBD_STATE_DEPTH_LOCAL   => "zero":false, "data":false
+#NBD_STATE_DEPTH_BACKING => "zero":true,  "data":true
+$QEMU_IMG map --output=json --image-opts \
+"$IMG,x-dirty-bitmap=qemu:allocation-depth" | _filter_qemu_img_map
+
+# success, all done
+echo '*** done'



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH v5 0/7] vhost-user-blk: fix the migration issue and enhance qtests

2020-10-07 Thread Dima Stepanov

On Tue, Sep 29, 2020 at 12:48:38PM +0300, Dima Stepanov wrote:
> On Tue, Sep 29, 2020 at 03:13:09AM -0400, Michael S. Tsirkin wrote:
> > On Sun, Sep 27, 2020 at 09:48:28AM +0300, Dima Stepanov wrote:
> > > On Thu, Sep 24, 2020 at 07:26:14AM -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 11, 2020 at 11:39:42AM +0300, Dima Stepanov wrote:
> > > > > v4 -> v5:
> > > > >   - vhost: check queue state in the vhost_dev_set_log routine
> > > > > tests/qtest/vhost-user-test: prepare the tests for adding new
> > > > > dev class
> > > > > tests/qtest/vhost-user-test: add support for the
> > > > > vhost-user-blk device
> > > > > tests/qtest/vhost-user-test: add migrate_reconnect test
> > > > > Reviewed-by: Raphael Norwitz
> > > > >   - Update qtest, by merging vhost-user-blk "if" case with the
> > > > > virtio-blk case.
> > > > 
> > > > I dropped patches 3-7 since they were stalling on some systems.
> > > > Pls work with Peter Maydell (cc'd) to figure it out.
> > > Thanks!
> > > 
> > > Peter, can you share any details for the stalling errors with me?
> > 
> > I can say for sure that even on x86/linux the affected tests take
> > much longer to run with these applied.
> > I'd suggest making sure there are no timeouts involved in the good case 
> Could you help me to reproduce it? Because on my system i see only 10+ seconds
> increase for the qos-test set to pass (both x86_64 and i386). So on the
> current master i'm running it like:
>   - ./configure  --target-list="x86_64-softmmu i386-softmmu"
>   - with no patch set:
> time QTEST_QEMU_BINARY=./build/x86_64-softmmu/qemu-system-x86_64 
> ./build/tests/qtest/qos-test
> real0m6.394s
> user0m3.643s
> sys 0m3.477s
>   - without patch 7 (where i include vhost-user-net tests also):
> real0m9.955s
> user0m4.133s
> sys 0m4.397s
>   - full patch set:
> real0m17.263s
> user0m4.530s
> sys 0m4.802s
> For i386 target i see pretty the same numbers:
>   time QTEST_QEMU_BINARY=./build/i386-softmmu/qemu-system-i386 
> ./build/tests/qtest/qos-test
>   real0m17.386s
>   user0m4.503s
>   sys 0m4.911s
> So it looks like that i'm missing some step to reproduce an issue.
> 
> And if i run the exact test it takes ~2-3s to pass:
>   $ time QTEST_QEMU_BINARY=./build/x86_64-softmmu/qemu-system-x86_64 
> ./build/tests/qtest/qos-test -p 
> /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/vhost-user-blk-pci/vhost-user-blk/vhost-user-blk-tests/reconnect
>   
> /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/vhost-user-blk-pci/vhost-user-blk/vhost-user-blk-tests/reconnect:
>  OK
>   real0m2.253s
>   user0m0.118s
>   sys 0m0.104s
> And same numbers for i386.
> 
Sorry, but still fail to reproduce the stall issue on both i386 and
x86_64 targets. Any help is highly appreciated.

Thanks, Dima.

> > 
> > > > 
> > > > 
> > > > > v3 -> v4:
> > > > >   - vhost: recheck dev state in the vhost_migration_log routine
> > > > > Reviewed-by: Raphael Norwitz
> > > > >   - vhost: check queue state in the vhost_dev_set_log routine
> > > > > Use "continue" instead of "break" to handle non-initialized
> > > > > virtqueue case.
> > > > > 
> > > > > v2 -> v3:
> > > > >   - update commit message for the 
> > > > > "vhost: recheck dev state in the vhost_migration_log routine" 
> > > > > commit
> > > > >   - rename "started" field of the VhostUserBlk structure to
> > > > > "started_vu", so there will be no confustion with the VHOST 
> > > > > started
> > > > > field
> > > > >   - update vhost-user-test.c to always initialize nq local variable
> > > > > (spotted by patchew)
> > > > > 
> > > > > v1 -> v2:
> > > > >   - add comments to connected/started fields in the header file
> > > > >   - move the "s->started" logic from the vhost_user_blk_disconnect
> > > > > routine to the vhost_user_blk_stop routine
> > > > > 
> > > > > Reference e-mail threads:
> > > > >   - 
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
> > > > >   - 
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html
> > > > > 
> > > > > If vhost-user daemon is used as a backend for the vhost device, then 
> > > > > we
> > > > > should consider a possibility of disconnect at any moment. There was 
> > > > > a general
> > > > > question here: should we consider it as an error or okay state for 
> > > > > the vhost-user
> > > > > devices during migration process?
> > > > > I think the disconnect event for the vhost-user devices should not 
> > > > > break the
> > > > > migration process, because:
> > > > >   - the device will be in the stopped state, so it will not be changed
> > > > > during migration
> > > > >   - if reconnect will be made the migration log will be reinitialized 
> > > > > as
> > > > > part of reconnect/init process:
> > > > > #0  vhost_log_global_start (listener=0x563989cf7be0)
> > > > > at hw/virtio/vhost.c:920
> > >

Re: [PATCH v2 4/5] nbd: Add new qemu:allocation-depth metacontext

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


30.09.2020 15:11, Eric Blake wrote:

'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point
qemu:dirty-bitmap:NAME can expose that information via the creation of
a temporary bitmap, but we can shorten the effort by adding a new
qemu:allocation-depth context that does the same thing without an
intermediate bitmap (this patch does not eliminate the need for that
proposal, as it will have other uses as well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But this extension would require
bdrv_is_allocated_above to return a depth number.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake 
---
  docs/interop/nbd.txt |  22 +++--
  include/block/nbd.h  |   8 +++-
  nbd/server.c | 105 ---
  3 files changed, 125 insertions(+), 10 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f3b3cacc9621..56efec7aee12 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -17,9 +17,9 @@ namespace "qemu".

  == "qemu" namespace ==

-The "qemu" namespace currently contains only one type of context,
-related to exposing the contents of a dirty bitmap alongside the
-associated disk contents.  That context has the following form:
+The "qemu" namespace currently contains two types of context.  The
+first is related to exposing the contents of a dirty bitmap alongside
+the associated disk contents.  That context has the following form:

  qemu:dirty-bitmap:

@@ -28,8 +28,21 @@ in reply for NBD_CMD_BLOCK_STATUS:

  bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"

+The second is related to exposing the source of various extents within
+the image, with a single context named:
+
+qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
+bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image


Maybe, s/this/the top level/, as, what is "this" for NBD client?


+bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+   backing layer
+
  For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to "qemu:dirty-bitmap:":
+in addition to the specific "qemu:allocation-depth" and
+"qemu:dirty-bitmap:":

  * "qemu:" - returns list of all available metadata contexts in the
  namespace.
@@ -55,3 +68,4 @@ the operation of that feature.
  NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
  * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
  NBD_CMD_FLAG_FAST_ZERO
+* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3dd9a04546ec..06208bc25027 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
  /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
   *  Copyright (C) 2005  Anthony Liguori 
   *
   *  Network Block Device
@@ -259,6 +259,12 @@ enum {
  /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
  #define NBD_STATE_DIRTY (1 << 0)

+/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
+#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
+#define NBD_STATE_DEPTH_BACKING (2 << 0)
+#define NBD_STATE_DEPTH_MASK(0x3)
+
  static inline bool nbd_reply_type_is_error(int type)
  {
  return type & (1 << 15);
diff --git a/nbd/server.c b/nbd/server.c
index 7271a09b5c2b..830b21000be3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -27,7 +27,8 @@
  #include "qemu/units.h"

  #define NBD_META_ID_BASE_ALLOCATION 0
-#define NBD_META_ID_DIRTY_BITMAP 1
+#define NBD_META_ID_ALLOCATION_DEPTH 1
+#define NBD_META_ID_DIRTY_BITMAP 2

  /*
   * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
@@ -94,6 +95,7 @@ struct NBDExport {
  BlockBackend *eject_notifier_blk;
  Notifier eject_notifier;

+bool alloc_context;
  BdrvDirtyBitmap *export_bitmap;
  char *export_bitmap_context;
  };
@@ -108,6 +110,7 @@ typedef struct NBDExportMetaContexts {
  bool valid; /* means that negotiation of the option finished without
 errors */
  bool base_allocation; /* export base:allocation context (block status) */
+bool allocation_depth; /* export qemu:allocation-dep

Re: [PATCH 19/20] python/qemu/qmp.py: Straighten out exception hierarchy

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Be a little more rigorous about which exception we use, and when.
> Primarily, this makes QMPCapabilitiesError an extension of
> QMPprotocolError.
> 
> The family of errors:
> 
> QMPError (generic base)
>   QMPConnectError (For connection issues)
>   QMPTimeoutError (when waiting for an event expires)
>   QMPProtocolError (unexpected/garbled responses)
> QMPCapabilitiesError (handshake problems)
>   QMPResponseError (For in-band QMP error returns)
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH v8 00/17] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-10-07 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20201007115700.707938-1-pbonz...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201007115700.707938-1-pbonz...@redhat.com
Subject: [PATCH v8 00/17] Fix scsi devices plug/unplug races w.r.t virtio-scsi 
iothread

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20201007115700.707938-1-pbonz...@redhat.com -> 
patchew/20201007115700.707938-1-pbonz...@redhat.com
Switched to a new branch 'test'
7e51a35 scsi/scsi_bus: fix races in REPORT LUNS
3ba554c virtio-scsi: use scsi_device_get
caa2894 scsi/scsi_bus: Add scsi_device_get
8d7483b scsi/scsi-bus: scsi_device_find: don't return unrealized devices
8002737 device-core: use atomic_set on .realized property
d91937f scsi: switch to bus->check_address
562a123 device-core: use RCU for list of children of a bus
6bc4617 device_core: use drain_call_rcu in in qmp_device_add
97fbf85 scsi/scsi_bus: switch search direction in scsi_device_find
bb3b68e qdev: add "check if address free" callback for buses
adf9e3e qemu-iotests, qtest: rewrite test 067 as a qtest
093627d qtest: check that drives are really appearing and disappearing
80856de qtest: switch users back to qtest_qmp_receive
c96d5f0 device-plug-test: use qtest_qmp to send the device_del command
f1f04d9 qtest: remove qtest_qmp_receive_success
1148ec4 qtest: Reintroduce qtest_qmp_receive
20e3183 qtest: rename qtest_qmp_receive to qtest_qmp_receive_dict

=== OUTPUT BEGIN ===
1/17 Checking commit 20e3183ecbd8 (qtest: rename qtest_qmp_receive to 
qtest_qmp_receive_dict)
2/17 Checking commit 1148ec4900a6 (qtest: Reintroduce qtest_qmp_receive)
3/17 Checking commit f1f04d9c2320 (qtest: remove qtest_qmp_receive_success)
4/17 Checking commit c96d5f0aa32b (device-plug-test: use qtest_qmp to send the 
device_del command)
WARNING: line over 80 characters
#33: FILE: tests/qtest/device-plug-test.c:23:
+ "{'execute': 'device_del', 'arguments': { 'id': %s } }", 
id);

total: 0 errors, 1 warnings, 76 lines checked

Patch 4/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
5/17 Checking commit 80856deb548d (qtest: switch users back to 
qtest_qmp_receive)
6/17 Checking commit 093627d21203 (qtest: check that drives are really 
appearing and disappearing)
7/17 Checking commit adf9e3ea2b0c (qemu-iotests, qtest: rewrite test 067 as a 
qtest)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#43: 
deleted file mode 100755

WARNING: line over 80 characters
#646: FILE: tests/qtest/drive_del-test.c:19:
+static bool look_for_drive0(QTestState *qts, const char *command, const char 
*key)

total: 0 errors, 2 warnings, 308 lines checked

Patch 7/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
8/17 Checking commit bb3b68e3507e (qdev: add "check if address free" callback 
for buses)
9/17 Checking commit 97fbf857acae (scsi/scsi_bus: switch search direction in 
scsi_device_find)
10/17 Checking commit 6bc461779d34 (device_core: use drain_call_rcu in in 
qmp_device_add)
11/17 Checking commit 562a12390903 (device-core: use RCU for list of children 
of a bus)
12/17 Checking commit d91937ffcb2f (scsi: switch to bus->check_address)
ERROR: code indent should never use tabs
#55: FILE: hw/scsi/scsi-bus.c:137:
+^I^I^I^I int channel, int target, int lun,$

ERROR: code indent should never use tabs
#56: FILE: hw/scsi/scsi-bus.c:138:
+^I^I^I^I SCSIDevice **p_dev)$

WARNING: line over 80 characters
#71: FILE: hw/scsi/scsi-bus.c:153:
+static bool scsi_bus_check_address(BusState *qbus, DeviceState *qdev, Error 
**errp)

WARNING: line over 80 characters
#91: FILE: hw/scsi/scsi-bus.c:173:
+if (!scsi_bus_is_address_free(bus, dev->channel, dev->id, dev->lun, 
&d)) {

WARNING: line over 80 characters
#130: FILE: hw/scsi/scsi-bus.c:195:
+is_free = scsi_bus_is_address_free(bus, dev->channel, ++id, 
dev->lun, NULL);

WARNING: line over 80 characters
#143: FILE: hw/scsi/scsi-bus.c:205:
+is_free = scsi_bus_is_address_free(bus, dev->channel, dev->id, 
++lun, NULL);

total: 2 errors, 4 warnings, 182 lines checked

Patch 12/17 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

13/17 Checking commit 8002737303c9 (device-core: use atomic_set on .realized 
property)
14/17 Checking commit 8d7483be0fd9 (scsi/scsi-bus: scsi_device_find: don't 
return unrealized devices

[PATCH v8 17/17] scsi/scsi_bus: fix races in REPORT LUNS

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

Currently scsi_target_emulate_report_luns iterates over the child device list
twice, and there is no guarantee that this list is the same in both iterations.

The reason for iterating twice is that the first iteration calculates
how much memory to allocate.  However if we use a dynamic array we can
avoid iterating twice, and therefore we avoid this race.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1866707

Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-10-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-14-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 68 ++
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index eda8cb7e70..b901e701f0 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -438,19 +438,23 @@ struct SCSITargetReq {
 static void store_lun(uint8_t *outbuf, int lun)
 {
 if (lun < 256) {
+/* Simple logical unit addressing method*/
+outbuf[0] = 0;
 outbuf[1] = lun;
-return;
+} else {
+/* Flat space addressing method */
+outbuf[0] = 0x40 | (lun >> 8);
+outbuf[1] = (lun & 255);
 }
-outbuf[1] = (lun & 255);
-outbuf[0] = (lun >> 8) | 0x40;
 }
 
 static bool scsi_target_emulate_report_luns(SCSITargetReq *r)
 {
 BusChild *kid;
-int i, len, n;
 int channel, id;
-bool found_lun0;
+uint8_t tmp[8] = {0};
+int len = 0;
+GByteArray *buf;
 
 if (r->req.cmd.xfer < 16) {
 return false;
@@ -458,46 +462,40 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq 
*r)
 if (r->req.cmd.buf[2] > 2) {
 return false;
 }
+
+/* reserve space for 63 LUNs*/
+buf = g_byte_array_sized_new(512);
+
 channel = r->req.dev->channel;
 id = r->req.dev->id;
-found_lun0 = false;
-n = 0;
 
-RCU_READ_LOCK_GUARD();
+/* add size (will be updated later to correct value */
+g_byte_array_append(buf, tmp, 8);
+len += 8;
 
-QTAILQ_FOREACH_RCU(kid, &r->req.bus->qbus.children, sibling) {
-DeviceState *qdev = kid->child;
-SCSIDevice *dev = SCSI_DEVICE(qdev);
+/* add LUN0 */
+g_byte_array_append(buf, tmp, 8);
+len += 8;
 
-if (dev->channel == channel && dev->id == id) {
-if (dev->lun == 0) {
-found_lun0 = true;
+WITH_RCU_READ_LOCK_GUARD() {
+QTAILQ_FOREACH_RCU(kid, &r->req.bus->qbus.children, sibling) {
+DeviceState *qdev = kid->child;
+SCSIDevice *dev = SCSI_DEVICE(qdev);
+
+if (dev->channel == channel && dev->id == id && dev->lun != 0) {
+store_lun(tmp, dev->lun);
+g_byte_array_append(buf, tmp, 8);
+len += 8;
 }
-n += 8;
 }
 }
-if (!found_lun0) {
-n += 8;
-}
-
-scsi_target_alloc_buf(&r->req, n + 8);
-
-len = MIN(n + 8, r->req.cmd.xfer & ~7);
-memset(r->buf, 0, len);
-stl_be_p(&r->buf[0], n);
-i = found_lun0 ? 8 : 16;
-QTAILQ_FOREACH_RCU(kid, &r->req.bus->qbus.children, sibling) {
-DeviceState *qdev = kid->child;
-SCSIDevice *dev = SCSI_DEVICE(qdev);
 
-if (dev->channel == channel && dev->id == id) {
-store_lun(&r->buf[i], dev->lun);
-i += 8;
-}
-}
+r->buf_len = len;
+r->buf = g_byte_array_free(buf, FALSE);
+r->len = MIN(len, r->req.cmd.xfer & ~7);
 
-assert(i == n + 8);
-r->len = len;
+/* store the LUN list length */
+stl_be_p(&r->buf[0], len - 8);
 return true;
 }
 
-- 
2.26.2

[PATCH v8 16/17] virtio-scsi: use scsi_device_get

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

This will help us to avoid the scsi device disappearing
after we took a reference to it.

It doesn't by itself forbid case when we try to access
an unrealized device

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-9-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-13-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/virtio-scsi.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 971afbb217..3db9a8aae9 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -33,7 +33,7 @@ static inline int virtio_scsi_get_lun(uint8_t *lun)
 return ((lun[2] << 8) | lun[3]) & 0x3FFF;
 }
 
-static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
+static inline SCSIDevice *virtio_scsi_device_get(VirtIOSCSI *s, uint8_t *lun)
 {
 if (lun[0] != 1) {
 return NULL;
@@ -41,7 +41,7 @@ static inline SCSIDevice *virtio_scsi_device_find(VirtIOSCSI 
*s, uint8_t *lun)
 if (lun[2] != 0 && !(lun[2] >= 0x40 && lun[2] < 0x80)) {
 return NULL;
 }
-return scsi_device_find(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
+return scsi_device_get(&s->bus, 0, lun[1], virtio_scsi_get_lun(lun));
 }
 
 void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
@@ -256,7 +256,7 @@ static inline void virtio_scsi_ctx_check(VirtIOSCSI *s, 
SCSIDevice *d)
  *  case of async cancellation. */
 static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq *req)
 {
-SCSIDevice *d = virtio_scsi_device_find(s, req->req.tmf.lun);
+SCSIDevice *d = virtio_scsi_device_get(s, req->req.tmf.lun);
 SCSIRequest *r, *next;
 BusChild *kid;
 int target;
@@ -370,10 +370,10 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 
 rcu_read_lock();
 QTAILQ_FOREACH_RCU(kid, &s->bus.qbus.children, sibling) {
- d = SCSI_DEVICE(kid->child);
- if (d->channel == 0 && d->id == target) {
-qdev_reset_all(&d->qdev);
- }
+SCSIDevice *d1 = SCSI_DEVICE(kid->child);
+if (d1->channel == 0 && d1->id == target) {
+qdev_reset_all(&d1->qdev);
+}
 }
 rcu_read_unlock();
 
@@ -386,14 +386,17 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, 
VirtIOSCSIReq *req)
 break;
 }
 
+object_unref(OBJECT(d));
 return ret;
 
 incorrect_lun:
 req->resp.tmf.response = VIRTIO_SCSI_S_INCORRECT_LUN;
+object_unref(OBJECT(d));
 return ret;
 
 fail:
 req->resp.tmf.response = VIRTIO_SCSI_S_BAD_TARGET;
+object_unref(OBJECT(d));
 return ret;
 }
 
@@ -564,7 +567,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI 
*s, VirtIOSCSIReq *req)
 }
 }
 
-d = virtio_scsi_device_find(s, req->req.cmd.lun);
+d = virtio_scsi_device_get(s, req->req.cmd.lun);
 if (!d) {
 req->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
 virtio_scsi_complete_cmd_req(req);
@@ -580,10 +583,12 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI 
*s, VirtIOSCSIReq *req)
 req->sreq->cmd.xfer > req->qsgl.size)) {
 req->resp.cmd.response = VIRTIO_SCSI_S_OVERRUN;
 virtio_scsi_complete_cmd_req(req);
+object_unref(OBJECT(d));
 return -ENOBUFS;
 }
 scsi_req_ref(req->sreq);
 blk_io_plug(d->conf.blk);
+object_unref(OBJECT(d));
 return 0;
 }
 
-- 
2.26.2

[PATCH v8 02/17] qtest: Reintroduce qtest_qmp_receive

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

The new qtest_qmp_receive buffers all the received qmp events, allowing
qtest_qmp_eventwait_ref to return them.

This is intended to solve the race in regard to ordering of qmp events
vs qmp responses, as soon as the callers start using the new interface.

In addition to that, define qtest_qmp_event_ref a function which only scans
the buffer that qtest_qmp_receive stores the events to.

This is intended for callers that are only interested in events that were
received during the last call to the qtest_qmp_receive.

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
Message-Id: <20201006123904.610658-3-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/qtest/libqos/libqtest.h | 23 
 tests/qtest/libqtest.c| 49 ++-
 2 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index 9b3f99b322..a2e3961792 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -198,6 +198,16 @@ void qtest_qmp_vsend(QTestState *s, const char *fmt, 
va_list ap)
  */
 QDict *qtest_qmp_receive_dict(QTestState *s);
 
+/**
+ * qtest_qmp_receive:
+ * @s: #QTestState instance to operate on.
+ *
+ * Reads a QMP message from QEMU and returns the response.
+ * Buffers all the events received meanwhile, until a
+ * call to qtest_qmp_eventwait
+ */
+QDict *qtest_qmp_receive(QTestState *s);
+
 /**
  * qtest_qmp_eventwait:
  * @s: #QTestState instance to operate on.
@@ -217,6 +227,19 @@ void qtest_qmp_eventwait(QTestState *s, const char *event);
  */
 QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event);
 
+/**
+ * qtest_qmp_event_ref:
+ * @s: #QTestState instance to operate on.
+ * @s: #event event to return.
+ *
+ * Removes non-matching events from the buffer that was set by
+ * qtest_qmp_receive, until an event bearing the given name is found,
+ * and returns it.
+ * If no event matches, clears the buffer and returns NULL.
+ *
+ */
+QDict *qtest_qmp_event_ref(QTestState *s, const char *event);
+
 /**
  * qtest_qmp_receive_success:
  * @s: #QTestState instance to operate on
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index dadc465825..d4c49a52ff 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -63,6 +63,7 @@ struct QTestState
 bool irq_level[MAX_IRQ];
 GString *rx;
 QTestTransportOps ops;
+GList *pending_events;
 };
 
 static GHookList abrt_hooks;
@@ -279,6 +280,7 @@ QTestState *qtest_init_without_qmp_handshake(const char 
*extra_args)
 
 g_test_message("starting QEMU: %s", command);
 
+s->pending_events = NULL;
 s->wstatus = 0;
 s->expected_status = 0;
 s->qemu_pid = fork();
@@ -386,6 +388,13 @@ void qtest_quit(QTestState *s)
 close(s->fd);
 close(s->qmp_fd);
 g_string_free(s->rx, true);
+
+for (GList *it = s->pending_events; it != NULL; it = it->next) {
+qobject_unref((QDict *)it->data);
+}
+
+g_list_free(s->pending_events);
+
 g_free(s);
 }
 
@@ -603,6 +612,19 @@ QDict *qmp_fd_receive(int fd)
 return qmp.response;
 }
 
+QDict *qtest_qmp_receive(QTestState *s)
+{
+while (true) {
+QDict *response = qtest_qmp_receive_dict(s);
+
+if (!qdict_get_try_str(response, "event")) {
+return response;
+}
+/* Stash the event for a later consumption */
+s->pending_events = g_list_prepend(s->pending_events, response);
+}
+}
+
 QDict *qtest_qmp_receive_dict(QTestState *s)
 {
 return qmp_fd_receive(s->qmp_fd);
@@ -771,10 +793,34 @@ void qtest_qmp_send_raw(QTestState *s, const char *fmt, 
...)
 va_end(ap);
 }
 
-QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event)
+QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
 {
+GList *next = NULL;
 QDict *response;
 
+for (GList *it = s->pending_events; it != NULL; it = next) {
+
+next = it->next;
+response = (QDict *)it->data;
+
+s->pending_events = g_list_remove_link(s->pending_events, it);
+
+if (!strcmp(qdict_get_str(response, "event"), event)) {
+return response;
+}
+qobject_unref(response);
+}
+return NULL;
+}
+
+QDict *qtest_qmp_eventwait_ref(QTestState *s, const char *event)
+{
+QDict *response = qtest_qmp_event_ref(s, event);
+
+if (response) {
+return response;
+}
+
 for (;;) {
 response = qtest_qmp_receive_dict(s);
 if ((qdict_haskey(response, "event")) &&
@@ -1403,6 +1449,7 @@ QTestState *qtest_inproc_init(QTestState **s, bool log, 
const char* arch,
 {
 QTestState *qts;
 qts = g_new0(QTestState, 1);
+qts->pending_events = NULL;
 *s = qts; /* Expose qts early on, since the query endianness relies on it 
*/
 qts->wstatus = 0;
 for (int i = 0; i < MAX_IRQ; i++) {
-- 
2.26.2

[PATCH v8 12/17] scsi: switch to bus->check_address

2020-10-07 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-6-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 122 -
 1 file changed, 75 insertions(+), 47 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4cf1f404b4..4ab9811cd8 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -22,33 +22,6 @@ static void scsi_req_dequeue(SCSIRequest *req);
 static uint8_t *scsi_target_alloc_buf(SCSIRequest *req, size_t len);
 static void scsi_target_free_buf(SCSIRequest *req);
 
-static Property scsi_props[] = {
-DEFINE_PROP_UINT32("channel", SCSIDevice, channel, 0),
-DEFINE_PROP_UINT32("scsi-id", SCSIDevice, id, -1),
-DEFINE_PROP_UINT32("lun", SCSIDevice, lun, -1),
-DEFINE_PROP_END_OF_LIST(),
-};
-
-static void scsi_bus_class_init(ObjectClass *klass, void *data)
-{
-BusClass *k = BUS_CLASS(klass);
-HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
-
-k->get_dev_path = scsibus_get_dev_path;
-k->get_fw_dev_path = scsibus_get_fw_dev_path;
-hc->unplug = qdev_simple_device_unplug_cb;
-}
-
-static const TypeInfo scsi_bus_info = {
-.name = TYPE_SCSI_BUS,
-.parent = TYPE_BUS,
-.instance_size = sizeof(SCSIBus),
-.class_init = scsi_bus_class_init,
-.interfaces = (InterfaceInfo[]) {
-{ TYPE_HOTPLUG_HANDLER },
-{ }
-}
-};
 static int next_scsi_bus;
 
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
@@ -160,35 +133,68 @@ static void scsi_dma_restart_cb(void *opaque, int 
running, RunState state)
 }
 }
 
-static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
+static bool scsi_bus_is_address_free(SCSIBus *bus,
+int channel, int target, int lun,
+SCSIDevice **p_dev)
+{
+SCSIDevice *d = scsi_device_find(bus, channel, target, lun);
+if (d && d->lun == lun) {
+if (p_dev) {
+*p_dev = d;
+}
+return false;
+}
+if (p_dev) {
+*p_dev = NULL;
+}
+return true;
+}
+
+static bool scsi_bus_check_address(BusState *qbus, DeviceState *qdev, Error 
**errp)
 {
 SCSIDevice *dev = SCSI_DEVICE(qdev);
-SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
-SCSIDevice *d;
-Error *local_err = NULL;
+SCSIBus *bus = SCSI_BUS(qbus);
 
 if (dev->channel > bus->info->max_channel) {
 error_setg(errp, "bad scsi channel id: %d", dev->channel);
-return;
+return false;
 }
 if (dev->id != -1 && dev->id > bus->info->max_target) {
 error_setg(errp, "bad scsi device id: %d", dev->id);
-return;
+return false;
 }
 if (dev->lun != -1 && dev->lun > bus->info->max_lun) {
 error_setg(errp, "bad scsi device lun: %d", dev->lun);
-return;
+return false;
+}
+
+if (dev->id != -1 && dev->lun != -1) {
+SCSIDevice *d;
+if (!scsi_bus_is_address_free(bus, dev->channel, dev->id, dev->lun, 
&d)) {
+error_setg(errp, "lun already used by '%s'", d->qdev.id);
+return false;
+}
 }
 
+return true;
+}
+
+static void scsi_qdev_realize(DeviceState *qdev, Error **errp)
+{
+SCSIDevice *dev = SCSI_DEVICE(qdev);
+SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, dev->qdev.parent_bus);
+bool is_free;
+Error *local_err = NULL;
+
 if (dev->id == -1) {
 int id = -1;
 if (dev->lun == -1) {
 dev->lun = 0;
 }
 do {
-d = scsi_device_find(bus, dev->channel, ++id, dev->lun);
-} while (d && d->lun == dev->lun && id < bus->info->max_target);
-if (d && d->lun == dev->lun) {
+is_free = scsi_bus_is_address_free(bus, dev->channel, ++id, 
dev->lun, NULL);
+} while (!is_free && id < bus->info->max_target);
+if (!is_free) {
 error_setg(errp, "no free target");
 return;
 }
@@ -196,20 +202,13 @@ static void scsi_qdev_realize(DeviceState *qdev, Error 
**errp)
 } else if (dev->lun == -1) {
 int lun = -1;
 do {
-d = scsi_device_find(bus, dev->channel, dev->id, ++lun);
-} while (d && d->lun == lun && lun < bus->info->max_lun);
-if (d && d->lun == lun) {
+is_free = scsi_bus_is_address_free(bus, dev->channel, dev->id, 
++lun, NULL);
+} while (!is_free && lun < bus->info->max_lun);
+if (!is_free) {
 error_setg(errp, "no free lun");
 return;
 }
 dev->lun = lun;
-} else {
-d = scsi_device_find(bus, dev->channel, dev->id, dev->lun);
-assert(d);
-if (d->lun == dev->lun && dev != d) {
-error_setg(errp, "lun already used by '%s'", d->qdev.id);
-return;
-}
 }
 
 QTAILQ_INIT(&dev->requests);
@@ -1723,6 +1722,13 @@ const VMStateDescription vmstate_scsi_device = {
 }
 };
 
+static Proper

[PATCH v8 10/17] device_core: use drain_call_rcu in in qmp_device_add

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

Soon, a device removal might only happen on RCU callback execution.
This is okay for device-del which provides a DEVICE_DELETED event,
but not for the failure case of device-add.  To avoid changing
monitor semantics, just drain all pending RCU callbacks on error.

Signed-off-by: Maxim Levitsky 
Suggested-by: Stefan Hajnoczi 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-4-mlevi...@redhat.com>
[Don't use it in qmp_device_del. - Paolo]
Signed-off-by: Paolo Bonzini 
---
 softmmu/qdev-monitor.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index e9b7228480..bcfb90a08f 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -803,6 +803,18 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
Error **errp)
 return;
 }
 dev = qdev_device_add(opts, errp);
+
+/*
+ * Drain all pending RCU callbacks. This is done because
+ * some bus related operations can delay a device removal
+ * (in this case this can happen if device is added and then
+ * removed due to a configuration error)
+ * to a RCU callback, but user might expect that this interface
+ * will finish its job completely once qmp command returns result
+ * to the user
+ */
+drain_call_rcu();
+
 if (!dev) {
 qemu_opts_del(opts);
 return;
-- 
2.26.2

[PATCH v8 11/17] device-core: use RCU for list of children of a bus

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

This fixes the race between device emulation code that tries to find
a child device to dispatch the request to (e.g a scsi disk),
and hotplug of a new device to that bus.

Note that this doesn't convert all the readers of the list
but only these that might go over that list without BQL held.

This is a very small first step to make this code thread safe.

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-5-mlevi...@redhat.com>
[Use RCU_READ_LOCK_GUARD in more places, adjust testcase now that
 the delay in DEVICE_DELETED due to RCU is more consistent. - Paolo]
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-9-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/core/bus.c  | 28 +---
 hw/core/qdev.c | 37 +++--
 hw/scsi/scsi-bus.c | 12 +---
 hw/scsi/virtio-scsi.c  |  6 +-
 include/hw/qdev-core.h |  9 +
 5 files changed, 63 insertions(+), 29 deletions(-)

diff --git a/hw/core/bus.c b/hw/core/bus.c
index 6b987b6946..a0483859ae 100644
--- a/hw/core/bus.c
+++ b/hw/core/bus.c
@@ -49,12 +49,14 @@ int qbus_walk_children(BusState *bus,
 }
 }
 
-QTAILQ_FOREACH(kid, &bus->children, sibling) {
-err = qdev_walk_children(kid->child,
- pre_devfn, pre_busfn,
- post_devfn, post_busfn, opaque);
-if (err < 0) {
-return err;
+WITH_RCU_READ_LOCK_GUARD() {
+QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+err = qdev_walk_children(kid->child,
+ pre_devfn, pre_busfn,
+ post_devfn, post_busfn, opaque);
+if (err < 0) {
+return err;
+}
 }
 }
 
@@ -90,8 +92,10 @@ static void bus_reset_child_foreach(Object *obj, 
ResettableChildCallback cb,
 BusState *bus = BUS(obj);
 BusChild *kid;
 
-QTAILQ_FOREACH(kid, &bus->children, sibling) {
-cb(OBJECT(kid->child), opaque, type);
+WITH_RCU_READ_LOCK_GUARD() {
+QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+cb(OBJECT(kid->child), opaque, type);
+}
 }
 }
 
@@ -194,9 +198,11 @@ static void bus_set_realized(Object *obj, bool value, 
Error **errp)
 
 /* TODO: recursive realization */
 } else if (!value && bus->realized) {
-QTAILQ_FOREACH(kid, &bus->children, sibling) {
-DeviceState *dev = kid->child;
-qdev_unrealize(dev);
+WITH_RCU_READ_LOCK_GUARD() {
+QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+DeviceState *dev = kid->child;
+qdev_unrealize(dev);
+}
 }
 if (bc->unrealize) {
 bc->unrealize(bus);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 74db78df36..59e5e710b7 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -51,6 +51,12 @@ const VMStateDescription *qdev_get_vmsd(DeviceState *dev)
 return dc->vmsd;
 }
 
+static void bus_free_bus_child(BusChild *kid)
+{
+object_unref(OBJECT(kid->child));
+g_free(kid);
+}
+
 static void bus_remove_child(BusState *bus, DeviceState *child)
 {
 BusChild *kid;
@@ -60,15 +66,16 @@ static void bus_remove_child(BusState *bus, DeviceState 
*child)
 char name[32];
 
 snprintf(name, sizeof(name), "child[%d]", kid->index);
-QTAILQ_REMOVE(&bus->children, kid, sibling);
+QTAILQ_REMOVE_RCU(&bus->children, kid, sibling);
 
 bus->num_children--;
 
 /* This gives back ownership of kid->child back to us.  */
 object_property_del(OBJECT(bus), name);
-object_unref(OBJECT(kid->child));
-g_free(kid);
-return;
+
+/* free the bus kid, when it is safe to do so*/
+call_rcu(kid, bus_free_bus_child, rcu);
+break;
 }
 }
 }
@@ -83,7 +90,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
 kid->child = child;
 object_ref(OBJECT(kid->child));
 
-QTAILQ_INSERT_HEAD(&bus->children, kid, sibling);
+QTAILQ_INSERT_HEAD_RCU(&bus->children, kid, sibling);
 
 /* This transfers ownership of kid->child to the property.  */
 snprintf(name, sizeof(name), "child[%d]", kid->index);
@@ -672,17 +679,19 @@ DeviceState *qdev_find_recursive(BusState *bus, const 
char *id)
 DeviceState *ret;
 BusState *child;
 
-QTAILQ_FOREACH(kid, &bus->children, sibling) {
-DeviceState *dev = kid->child;
+WITH_RCU_READ_LOCK_GUARD() {
+QTAILQ_FOREACH_RCU(kid, &bus->children, sibling) {
+DeviceState *dev = kid->child;
 
-if (dev->id && strcmp(dev->id, id) == 0) {
-return dev;
-}
+if (dev->id && strcmp(dev->id, id) == 0) {
+return dev;
+

[PATCH v8 14/17] scsi/scsi-bus: scsi_device_find: don't return unrealized devices

2020-10-07 Thread Paolo Bonzini

The device core first places a device on the bus and then realizes it.
Make scsi_device_find avoid returing such devices to avoid
races in drivers that use an iothread (currently virtio-scsi)

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1812399

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-7-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-11-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 83 +-
 1 file changed, 53 insertions(+), 30 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4ab9811cd8..7599113efe 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -24,6 +24,55 @@ static void scsi_target_free_buf(SCSIRequest *req);
 
 static int next_scsi_bus;
 
+static SCSIDevice *do_scsi_device_find(SCSIBus *bus,
+   int channel, int id, int lun,
+   bool include_unrealized)
+{
+BusChild *kid;
+SCSIDevice *retval = NULL;
+
+QTAILQ_FOREACH_RCU(kid, &bus->qbus.children, sibling) {
+DeviceState *qdev = kid->child;
+SCSIDevice *dev = SCSI_DEVICE(qdev);
+
+if (dev->channel == channel && dev->id == id) {
+if (dev->lun == lun) {
+retval = dev;
+break;
+}
+
+/*
+ * If we don't find exact match (channel/bus/lun),
+ * we will return the first device which matches channel/bus
+ */
+
+if (!retval) {
+retval = dev;
+}
+}
+}
+
+/*
+ * This function might run on the IO thread and we might race against
+ * main thread hot-plugging the device.
+ * We assume that as soon as .realized is set to true we can let
+ * the user access the device.
+ */
+
+if (retval && !include_unrealized &&
+!qatomic_load_acquire(&retval->qdev.realized)) {
+retval = NULL;
+}
+
+return retval;
+}
+
+SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
+{
+RCU_READ_LOCK_GUARD();
+return do_scsi_device_find(bus, channel, id, lun, false);
+}
+
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
 {
 SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s);
@@ -137,7 +186,10 @@ static bool scsi_bus_is_address_free(SCSIBus *bus,
 int channel, int target, int lun,
 SCSIDevice **p_dev)
 {
-SCSIDevice *d = scsi_device_find(bus, channel, target, lun);
+SCSIDevice *d;
+
+RCU_READ_LOCK_GUARD();
+d = do_scsi_device_find(bus, channel, target, lun, true);
 if (d && d->lun == lun) {
 if (p_dev) {
 *p_dev = d;
@@ -1570,35 +1622,6 @@ static char *scsibus_get_fw_dev_path(DeviceState *dev)
qdev_fw_name(dev), d->id, d->lun);
 }
 
-SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int id, int lun)
-{
-BusChild *kid;
-SCSIDevice *target_dev = NULL;
-
-RCU_READ_LOCK_GUARD();
-QTAILQ_FOREACH_RCU(kid, &bus->qbus.children, sibling) {
-DeviceState *qdev = kid->child;
-SCSIDevice *dev = SCSI_DEVICE(qdev);
-
-if (dev->channel == channel && dev->id == id) {
-if (dev->lun == lun) {
-return dev;
-}
-
-/*
- * If we don't find exact match (channel/bus/lun),
- * we will return the first device which matches channel/bus
- */
-
-if (!target_dev) {
-target_dev = dev;
-}
-}
-}
-
-return target_dev;
-}
-
 /* SCSI request list.  For simplicity, pv points to the whole device */
 
 static int put_scsi_requests(QEMUFile *f, void *pv, size_t size,
-- 
2.26.2

[PATCH v8 04/17] device-plug-test: use qtest_qmp to send the device_del command

2020-10-07 Thread Paolo Bonzini

Simplify the code now that events are buffered.  There is no need
anymore to separate sending the command and retrieving the response.

Signed-off-by: Paolo Bonzini 
---
 tests/qtest/device-plug-test.c | 32 +---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index a2247856be..559d47727a 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -15,26 +15,17 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 
-static void device_del_start(QTestState *qtest, const char *id)
+static void device_del(QTestState *qtest, const char *id)
 {
-qtest_qmp_send(qtest,
-   "{'execute': 'device_del', 'arguments': { 'id': %s } }", 
id);
-}
+QDict *resp;
 
-static void device_del_finish(QTestState *qtest)
-{
-QDict *resp = qtest_qmp_receive_dict(qtest);
+resp = qtest_qmp(qtest,
+ "{'execute': 'device_del', 'arguments': { 'id': %s } }", 
id);
 
 g_assert(qdict_haskey(resp, "return"));
 qobject_unref(resp);
 }
 
-static void device_del_request(QTestState *qtest, const char *id)
-{
-device_del_start(qtest, id);
-device_del_finish(qtest);
-}
-
 static void system_reset(QTestState *qtest)
 {
 QDict *resp;
@@ -79,7 +70,7 @@ static void test_pci_unplug_request(void)
  * be processed. However during system reset, the removal will be
  * handled, removing the device.
  */
-device_del_request(qtest, "dev0");
+device_del(qtest, "dev0");
 system_reset(qtest);
 wait_device_deleted_event(qtest, "dev0");
 
@@ -90,13 +81,8 @@ static void test_ccw_unplug(void)
 {
 QTestState *qtest = qtest_initf("-device virtio-balloon-ccw,id=dev0");
 
-/*
- * The DEVICE_DELETED events will be sent before the command
- * completes.
- */
-device_del_start(qtest, "dev0");
+device_del(qtest, "dev0");
 wait_device_deleted_event(qtest, "dev0");
-device_del_finish(qtest);
 
 qtest_quit(qtest);
 }
@@ -109,7 +95,7 @@ static void test_spapr_cpu_unplug_request(void)
 "-device 
power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
 
 /* similar to test_pci_unplug_request */
-device_del_request(qtest, "dev0");
+device_del(qtest, "dev0");
 system_reset(qtest);
 wait_device_deleted_event(qtest, "dev0");
 
@@ -125,7 +111,7 @@ static void test_spapr_memory_unplug_request(void)
 "-device pc-dimm,id=dev0,memdev=mem0");
 
 /* similar to test_pci_unplug_request */
-device_del_request(qtest, "dev0");
+device_del(qtest, "dev0");
 system_reset(qtest);
 wait_device_deleted_event(qtest, "dev0");
 
@@ -139,7 +125,7 @@ static void test_spapr_phb_unplug_request(void)
 qtest = qtest_initf("-device spapr-pci-host-bridge,index=1,id=dev0");
 
 /* similar to test_pci_unplug_request */
-device_del_request(qtest, "dev0");
+device_del(qtest, "dev0");
 system_reset(qtest);
 wait_device_deleted_event(qtest, "dev0");
 
-- 
2.26.2

[PATCH v8 09/17] scsi/scsi_bus: switch search direction in scsi_device_find

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

This change will allow us to convert the bus children list to RCU,
while not changing the logic of this function

Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-2-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 3284a5d1fb..6b1ed7ae9a 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1572,7 +1572,7 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, 
int id, int lun)
 BusChild *kid;
 SCSIDevice *target_dev = NULL;
 
-QTAILQ_FOREACH_REVERSE(kid, &bus->qbus.children, sibling) {
+QTAILQ_FOREACH(kid, &bus->qbus.children, sibling) {
 DeviceState *qdev = kid->child;
 SCSIDevice *dev = SCSI_DEVICE(qdev);
 
@@ -1580,7 +1580,15 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, 
int id, int lun)
 if (dev->lun == lun) {
 return dev;
 }
-target_dev = dev;
+
+/*
+ * If we don't find exact match (channel/bus/lun),
+ * we will return the first device which matches channel/bus
+ */
+
+if (!target_dev) {
+target_dev = dev;
+}
 }
 }
 return target_dev;
-- 
2.26.2

[PATCH v8 13/17] device-core: use atomic_set on .realized property

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

Some code might race with placement of new devices on a bus.
We currently first place a (unrealized) device on the bus
and then realize it.

As a workaround, users that scan the child device list, can
check the realized property to see if it is safe to access such a device.
Use an atomic write here too to aid with this.

A separate discussion is what to do with devices that are unrealized:
It looks like for this case we only call the hotplug handler's unplug
callback and its up to it to unrealize the device.
An atomic operation doesn't cause harm for this code path though.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200913160259.32145-6-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-10-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/core/qdev.c | 19 ++-
 include/hw/qdev-core.h |  2 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 59e5e710b7..fc4daa36fa 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -946,7 +946,25 @@ static void device_set_realized(Object *obj, bool value, 
Error **errp)
 }
}
 
+   qatomic_store_release(&dev->realized, value);
+
 } else if (!value && dev->realized) {
+
+/*
+ * Change the value so that any concurrent users are aware
+ * that the device is going to be unrealized
+ *
+ * TODO: change .realized property to enum that states
+ * each phase of the device realization/unrealization
+ */
+
+qatomic_set(&dev->realized, value);
+/*
+ * Ensure that concurrent users see this update prior to
+ * any other changes done by unrealize.
+ */
+smp_wmb();
+
 QLIST_FOREACH(bus, &dev->child_bus, sibling) {
 qbus_unrealize(bus);
 }
@@ -961,7 +979,6 @@ static void device_set_realized(Object *obj, bool value, 
Error **errp)
 }
 
 assert(local_err == NULL);
-dev->realized = value;
 return;
 
 child_realize_fail:
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 2c6307e3ed..868973319e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -163,6 +163,8 @@ struct NamedClockList {
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
+ *When accessed outsize big qemu lock, must be accessed with
+ *atomic_load_acquire()
  * @reset: ResettableState for the device; handled by Resettable interface.
  *
  * This structure should not be accessed directly.  We declare it here
-- 
2.26.2

[PATCH v8 15/17] scsi/scsi_bus: Add scsi_device_get

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

Add scsi_device_get which finds the scsi device
and takes a reference to it.

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Maxim Levitsky 
Message-Id: <20200913160259.32145-8-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-12-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 11 +++
 include/hw/scsi/scsi.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 7599113efe..eda8cb7e70 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -73,6 +73,17 @@ SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int 
id, int lun)
 return do_scsi_device_find(bus, channel, id, lun, false);
 }
 
+SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int id, int lun)
+{
+SCSIDevice *d;
+RCU_READ_LOCK_GUARD();
+d = do_scsi_device_find(bus, channel, id, lun, false);
+if (d) {
+object_ref(d);
+}
+return d;
+}
+
 static void scsi_device_realize(SCSIDevice *s, Error **errp)
 {
 SCSIDeviceClass *sc = SCSI_DEVICE_GET_CLASS(s);
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 7a55cdbd74..09fa5c9d2a 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -190,6 +190,7 @@ int scsi_device_get_sense(SCSIDevice *dev, uint8_t *buf, 
int len, bool fixed);
 int scsi_SG_IO_FROM_DEV(BlockBackend *blk, uint8_t *cmd, uint8_t cmd_size,
 uint8_t *buf, uint8_t buf_size);
 SCSIDevice *scsi_device_find(SCSIBus *bus, int channel, int target, int lun);
+SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int target, int lun);
 
 /* scsi-generic.c. */
 extern const SCSIReqOps scsi_generic_req_ops;
-- 
2.26.2

[PATCH v8 07/17] qemu-iotests, qtest: rewrite test 067 as a qtest

2020-10-07 Thread Paolo Bonzini

Test 067 from qemu-iotests is executing QMP commands to hotplug
and hot-unplug disks, devices and blockdevs.  Because the power
of the text-based test harness is limited, it is actually limiting
the checks that it does, for example by skipping DEVICE_DELETED
events.

tests/qtest already has a similar test, drive_del-test.c.
We can merge them, and even reuse some of the existing code in
drive_del-test.c, and improve the quality of the test by
covering DEVICE_DELETED events.  The only difference is that
the new test will always use null-co:// for the medium
rather than qcow2 or raw, but this should be irrelevant
for what the test is covering.  For example there are
no "qemu-img check" runs in 067 that would check that
the file is properly closed.

The new tests requires PCI hot-plug support, so drive_del-test
is moved from qemu-system-ppc to qemu-system-ppc64.

Signed-off-by: Paolo Bonzini 
---
 .gitlab-ci.yml   |   2 +-
 tests/qemu-iotests/067   | 157 -
 tests/qemu-iotests/067.out   | 414 ---
 tests/qemu-iotests/group |   1 -
 tests/qtest/drive_del-test.c | 211 --
 tests/qtest/meson.build  |   4 +-
 6 files changed, 190 insertions(+), 599 deletions(-)
 delete mode 100755 tests/qemu-iotests/067
 delete mode 100644 tests/qemu-iotests/067.out

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index a51c89554f..a4cf87481e 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -232,7 +232,7 @@ build-tcg-disabled:
 - ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
 052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
 170 171 183 184 192 194 197 208 215 221 222 226 227 236 253 277
-- ./check -qcow2 028 051 056 057 058 065 067 068 082 085 091 095 096 102 
122
+- ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
 124 132 139 142 144 145 151 152 155 157 165 194 196 197 200 202
 208 209 215 216 218 222 227 234 246 247 248 250 254 255 257 258
 260 261 262 263 264 270 272 273 277 279
diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
deleted file mode 100755
index a63be9cabf..00
--- a/tests/qemu-iotests/067
+++ /dev/null
@@ -1,157 +0,0 @@
-#!/usr/bin/env bash
-#
-# Test automatic deletion of BDSes created by -drive/drive_add
-#
-# Copyright (C) 2013 Red Hat, Inc.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program.  If not, see .
-#
-
-# creator
-owner=kw...@redhat.com
-
-seq=`basename $0`
-echo "QA output created by $seq"
-
-status=1   # failure is the default!
-
-# get standard environment, filters and checks
-. ./common.rc
-. ./common.filter
-
-_supported_fmt qcow2
-_supported_proto file
-# Because anything other than 16 would change the output of query-block,
-# and external data files would change the output of
-# query-named-block-nodes
-_unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' data_file
-
-do_run_qemu()
-{
-echo Testing: "$@"
-$QEMU -nographic -qmp-pretty stdio -serial none "$@"
-echo
-}
-
-# Remove QMP events from (pretty-printed) output. Doesn't handle
-# nested dicts correctly, but we don't get any of those in this test.
-_filter_qmp_events()
-{
-tr '\n' '\t' | sed -e \
-   
's/{\s*"timestamp":\s*{[^}]*},\s*"event":[^,}]*\(,\s*"data":\s*{[^}]*}\)\?\s*}\s*//g'
 \
-   | tr '\t' '\n'
-}
-
-run_qemu()
-{
-do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qmp | _filter_qemu \
-  | _filter_actual_image_size \
-  | _filter_generated_node_ids | _filter_qmp_events \
-  | _filter_img_info
-}
-
-size=128M
-
-_make_test_img $size
-
-echo
-echo === -drive/-device and device_del ===
-echo
-
-run_qemu -drive file=$TEST_IMG,format=$IMGFMT,if=none,id=disk -device 
virtio-blk,drive=disk,id=virtio0

[PATCH v8 08/17] qdev: add "check if address free" callback for buses

2020-10-07 Thread Paolo Bonzini

Check if an address is free on the bus before plugging in the
device.  This makes it possible to do the check without any
side effects, and to detect the problem early without having
to do it in the realize callback.

Signed-off-by: Paolo Bonzini 
Message-Id: <20201006123904.610658-5-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/core/qdev.c | 17 +++--
 hw/net/virtio-net.c|  2 +-
 hw/sd/core.c   |  3 ++-
 include/hw/qdev-core.h | 13 -
 4 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 96772a15bd..74db78df36 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -94,13 +94,23 @@ static void bus_add_child(BusState *bus, DeviceState *child)
  0);
 }
 
-void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
+static bool bus_check_address(BusState *bus, DeviceState *child, Error **errp)
+{
+BusClass *bc = BUS_GET_CLASS(bus);
+return !bc->check_address || bc->check_address(bus, child, errp);
+}
+
+bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp)
 {
 BusState *old_parent_bus = dev->parent_bus;
 DeviceClass *dc = DEVICE_GET_CLASS(dev);
 
 assert(dc->bus_type && object_dynamic_cast(OBJECT(bus), dc->bus_type));
 
+if (!bus_check_address(bus, dev, errp)) {
+return false;
+}
+
 if (old_parent_bus) {
 trace_qdev_update_parent_bus(dev, object_get_typename(OBJECT(dev)),
 old_parent_bus, object_get_typename(OBJECT(old_parent_bus)),
@@ -126,6 +136,7 @@ void qdev_set_parent_bus(DeviceState *dev, BusState *bus)
 object_unref(OBJECT(old_parent_bus));
 object_unref(OBJECT(dev));
 }
+return true;
 }
 
 DeviceState *qdev_new(const char *name)
@@ -371,7 +382,9 @@ bool qdev_realize(DeviceState *dev, BusState *bus, Error 
**errp)
 assert(!dev->realized && !dev->parent_bus);
 
 if (bus) {
-qdev_set_parent_bus(dev, bus);
+if (!qdev_set_parent_bus(dev, bus, errp)) {
+return false;
+}
 } else {
 assert(!DEVICE_GET_CLASS(dev)->bus_type);
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7bf27b9db7..268cecc498 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3142,7 +3142,7 @@ static bool failover_replug_primary(VirtIONet *n, Error 
**errp)
 error_setg(errp, "virtio_net: couldn't find primary bus");
 return false;
 }
-qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+qdev_set_parent_bus(n->primary_dev, n->primary_bus, &error_abort);
 n->primary_should_be_hidden = false;
 if (!qemu_opt_set_bool(n->primary_device_opts,
"partially_hotplugged", true, errp)) {
diff --git a/hw/sd/core.c b/hw/sd/core.c
index 957d116f1a..08c93b5903 100644
--- a/hw/sd/core.c
+++ b/hw/sd/core.c
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "hw/sd/sd.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
 #include "trace.h"
 
 static inline const char *sdbus_name(SDBus *sdbus)
@@ -240,7 +241,7 @@ void sdbus_reparent_card(SDBus *from, SDBus *to)
 readonly = sc->get_readonly(card);
 
 sdbus_set_inserted(from, false);
-qdev_set_parent_bus(DEVICE(card), &to->qbus);
+qdev_set_parent_bus(DEVICE(card), &to->qbus, &error_abort);
 sdbus_set_inserted(to, true);
 sdbus_set_readonly(to, readonly);
 }
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 72064f4dd4..14d476c587 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -210,13 +210,24 @@ struct BusClass {
 /* FIXME first arg should be BusState */
 void (*print_dev)(Monitor *mon, DeviceState *dev, int indent);
 char *(*get_dev_path)(DeviceState *dev);
+
 /*
  * This callback is used to create Open Firmware device path in accordance
  * with OF spec http://forthworks.com/standards/of1275.pdf. Individual bus
  * bindings can be found at http://playground.sun.com/1275/bindings/.
  */
 char *(*get_fw_dev_path)(DeviceState *dev);
+
 void (*reset)(BusState *bus);
+
+/*
+ * Return whether the device can be added to @bus,
+ * based on the address that was set (via device properties)
+ * before realize.  If not, on return @errp contains the
+ * human-readable error message.
+ */
+bool (*check_address)(BusState *bus, DeviceState *dev, Error **errp);
+
 BusRealize realize;
 BusUnrealize unrealize;
 
@@ -788,7 +799,7 @@ const char *qdev_fw_name(DeviceState *dev);
 Object *qdev_get_machine(void);
 
 /* FIXME: make this a link<> */
-void qdev_set_parent_bus(DeviceState *dev, BusState *bus);
+bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
 
 extern bool qdev_hotplug;
 extern bool qdev_hot_removed;
-- 
2.26.2

[PATCH v8 06/17] qtest: check that drives are really appearing and disappearing

2020-10-07 Thread Paolo Bonzini

Do not just trust the HMP commands to create and delete the drive, use
query-block to check that this is actually the case.

Signed-off-by: Paolo Bonzini 
---
 tests/qtest/drive_del-test.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index 9d20a1ed8b..ff772b3671 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -14,20 +14,49 @@
 #include "libqos/libqtest.h"
 #include "libqos/virtio.h"
 #include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qlist.h"
+
+static bool has_drive(QTestState *qts)
+{
+QDict *response;
+QList *ret;
+QListEntry *entry;
+bool found;
+
+response = qtest_qmp(qts, "{'execute': 'query-block'}");
+g_assert(response && qdict_haskey(response, "return"));
+ret = qdict_get_qlist(response, "return");
+
+found = false;
+QLIST_FOREACH_ENTRY(ret, entry) {
+QDict *entry_dict = qobject_to(QDict, entry->value);
+if (!strcmp(qdict_get_str(entry_dict, "device"), "drive0")) {
+found = true;
+break;
+}
+}
+
+qobject_unref(response);
+return found;
+}
 
 static void drive_add(QTestState *qts)
 {
 char *resp = qtest_hmp(qts, "drive_add 0 if=none,id=drive0");
 
 g_assert_cmpstr(resp, ==, "OK\r\n");
+g_assert(has_drive(qts));
 g_free(resp);
 }
 
 static void drive_del(QTestState *qts)
 {
-char *resp = qtest_hmp(qts, "drive_del drive0");
+char *resp;
 
+g_assert(has_drive(qts));
+resp = qtest_hmp(qts, "drive_del drive0");
 g_assert_cmpstr(resp, ==, "");
+g_assert(!has_drive(qts));
 g_free(resp);
 }
 
@@ -130,6 +159,7 @@ static void test_drive_del_device_del(void)
  */
 drive_del(qts);
 device_del(qts);
+g_assert(!has_drive(qts));
 
 qtest_quit(qts);
 }
-- 
2.26.2

[PATCH v8 01/17] qtest: rename qtest_qmp_receive to qtest_qmp_receive_dict

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

In the next patch a new version of qtest_qmp_receive will be
reintroduced that will buffer received qmp events for later
consumption in qtest_qmp_eventwait_ref

No functional change intended.

Suggested-by: Paolo Bonzini 
Signed-off-by: Maxim Levitsky 
Signed-off-by: Paolo Bonzini 
---
 tests/qtest/ahci-test.c|  4 ++--
 tests/qtest/device-plug-test.c |  2 +-
 tests/qtest/drive_del-test.c   |  2 +-
 tests/qtest/libqos/libqtest.h  |  4 ++--
 tests/qtest/libqtest.c | 16 
 tests/qtest/pvpanic-test.c |  2 +-
 tests/qtest/qmp-test.c | 18 +-
 7 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index 5e1954852e..d42ebaeb4c 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1590,7 +1590,7 @@ static void test_atapi_tray(void)
 qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-open-tray', "
 "'arguments': {'id': 'cd0'}}");
 atapi_wait_tray(ahci, true);
-rsp = qtest_qmp_receive(ahci->parent->qts);
+rsp = qtest_qmp_receive_dict(ahci->parent->qts);
 qobject_unref(rsp);
 
 qmp_discard_response(ahci->parent->qts,
@@ -1620,7 +1620,7 @@ static void test_atapi_tray(void)
 qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-close-tray', "
"'arguments': {'id': 'cd0'}}");
 atapi_wait_tray(ahci, false);
-rsp = qtest_qmp_receive(ahci->parent->qts);
+rsp = qtest_qmp_receive_dict(ahci->parent->qts);
 qobject_unref(rsp);
 
 /* Now, to convince ATAPI we understand the media has changed... */
diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 9214892741..a2247856be 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -23,7 +23,7 @@ static void device_del_start(QTestState *qtest, const char 
*id)
 
 static void device_del_finish(QTestState *qtest)
 {
-QDict *resp = qtest_qmp_receive(qtest);
+QDict *resp = qtest_qmp_receive_dict(qtest);
 
 g_assert(qdict_haskey(resp, "return"));
 qobject_unref(resp);
diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index 2d765865ce..ba0cd77445 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -41,7 +41,7 @@ static void device_del(QTestState *qts)
 /* Complication: ignore DEVICE_DELETED event */
 qmp_discard_response(qts, "{'execute': 'device_del',"
  " 'arguments': { 'id': 'dev0' } }");
-response = qtest_qmp_receive(qts);
+response = qtest_qmp_receive_dict(qts);
 g_assert(response);
 g_assert(qdict_haskey(response, "return"));
 qobject_unref(response);
diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index 209fcf6973..9b3f99b322 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -191,12 +191,12 @@ void qtest_qmp_vsend(QTestState *s, const char *fmt, 
va_list ap)
 GCC_FMT_ATTR(2, 0);
 
 /**
- * qtest_receive:
+ * qtest_qmp_receive_dict:
  * @s: #QTestState instance to operate on.
  *
  * Reads a QMP message from QEMU and returns the response.
  */
-QDict *qtest_qmp_receive(QTestState *s);
+QDict *qtest_qmp_receive_dict(QTestState *s);
 
 /**
  * qtest_qmp_eventwait:
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 58f58e1ece..dadc465825 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -322,7 +322,7 @@ QTestState *qtest_init(const char *extra_args)
 QDict *greeting;
 
 /* Read the QMP greeting and then do the handshake */
-greeting = qtest_qmp_receive(s);
+greeting = qtest_qmp_receive_dict(s);
 qobject_unref(greeting);
 qobject_unref(qtest_qmp(s, "{ 'execute': 'qmp_capabilities' }"));
 
@@ -603,7 +603,7 @@ QDict *qmp_fd_receive(int fd)
 return qmp.response;
 }
 
-QDict *qtest_qmp_receive(QTestState *s)
+QDict *qtest_qmp_receive_dict(QTestState *s)
 {
 return qmp_fd_receive(s->qmp_fd);
 }
@@ -678,7 +678,7 @@ QDict *qtest_vqmp_fds(QTestState *s, int *fds, size_t 
fds_num,
 qtest_qmp_vsend_fds(s, fds, fds_num, fmt, ap);
 
 /* Receive reply */
-return qtest_qmp_receive(s);
+return qtest_qmp_receive_dict(s);
 }
 
 QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list ap)
@@ -686,7 +686,7 @@ QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list 
ap)
 qtest_qmp_vsend(s, fmt, ap);
 
 /* Receive reply */
-return qtest_qmp_receive(s);
+return qtest_qmp_receive_dict(s);
 }
 
 QDict *qmp_fd(int fd, const char *fmt, ...)
@@ -776,7 +776,7 @@ QDict *qtest_qmp_eventwait_ref(QTestState *s, const char 
*event)
 QDict *response;
 
 for (;;) {
-response = qtest_qmp_receive(s);
+response = qtest_qmp_receive_dict(s);
 if ((qdict_haskey(response, "event")) &&
 (strcmp(qdict_get_str(response, "event"), event) == 0)) {
 return response;
@@ -807,7 +807,7 @@ char *qte

[PATCH v8 00/17] Fix scsi devices plug/unplug races w.r.t virtio-scsi iothread

2020-10-07 Thread Paolo Bonzini

Hopefully the final version of the patches, fixing the remaining
testsuite issues.  Kevin or Max, please take a look at patches 6 and 7
as they affect qemu-iotests.

Paolo

Maxim Levitsky (11):
  qtest: rename qtest_qmp_receive to qtest_qmp_receive_dict
  qtest: Reintroduce qtest_qmp_receive
  qtest: remove qtest_qmp_receive_success
  qtest: switch users back to qtest_qmp_receive
  scsi/scsi_bus: switch search direction in scsi_device_find
  device_core: use drain_call_rcu in in qmp_device_add
  device-core: use RCU for list of children of a bus
  device-core: use atomic_set on .realized property
  scsi/scsi_bus: Add scsi_device_get
  virtio-scsi: use scsi_device_get
  scsi/scsi_bus: fix races in REPORT LUNS

Paolo Bonzini (6):
  device-plug-test: use qtest_qmp to send the device_del command
  qtest: check that drives are really appearing and disappearing
  qemu-iotests, qtest: rewrite test 067 as a qtest
  qdev: add "check if address free" callback for buses
  scsi: switch to bus->check_address
  scsi/scsi-bus: scsi_device_find: don't return unrealized devices

 .gitlab-ci.yml  |   2 +-
 hw/core/bus.c   |  28 ++-
 hw/core/qdev.c  |  73 --
 hw/net/virtio-net.c |   2 +-
 hw/scsi/scsi-bus.c  | 262 
 hw/scsi/virtio-scsi.c   |  27 ++-
 hw/sd/core.c|   3 +-
 include/hw/qdev-core.h  |  24 +-
 include/hw/scsi/scsi.h  |   1 +
 softmmu/qdev-monitor.c  |  12 +
 tests/qemu-iotests/067  | 157 
 tests/qemu-iotests/067.out  | 414 
 tests/qemu-iotests/group|   1 -
 tests/qtest/device-plug-test.c  |  32 +--
 tests/qtest/drive_del-test.c| 244 ---
 tests/qtest/libqos/libqtest.h   |  34 +--
 tests/qtest/libqtest.c  | 110 +
 tests/qtest/meson.build |   4 +-
 tests/qtest/migration-helpers.c |  25 +-
 tests/qtest/pvpanic-test.c  |   4 +-
 tests/qtest/qmp-test.c  |  18 +-
 tests/qtest/tpm-util.c  |   8 +-
 22 files changed, 637 insertions(+), 848 deletions(-)
 delete mode 100755 tests/qemu-iotests/067
 delete mode 100644 tests/qemu-iotests/067.out

-- 
2.26.2

[PATCH v8 03/17] qtest: remove qtest_qmp_receive_success

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

The purpose of qtest_qmp_receive_success was mostly to process events
that arrived between the issueing of a command and the "return"
line from QMP.  This is now handled by the buffering of events
that libqtest performs automatically.

Signed-off-by: Maxim Levitsky 
[Extracted from Maxim's patch to a separate commit. - Paolo]
Signed-off-by: Paolo Bonzini 
---
 tests/qtest/libqos/libqtest.h   | 17 ---
 tests/qtest/libqtest.c  | 53 -
 tests/qtest/migration-helpers.c | 25 
 3 files changed, 25 insertions(+), 70 deletions(-)

diff --git a/tests/qtest/libqos/libqtest.h b/tests/qtest/libqos/libqtest.h
index a2e3961792..64bb1cd9eb 100644
--- a/tests/qtest/libqos/libqtest.h
+++ b/tests/qtest/libqos/libqtest.h
@@ -240,23 +240,6 @@ QDict *qtest_qmp_eventwait_ref(QTestState *s, const char 
*event);
  */
 QDict *qtest_qmp_event_ref(QTestState *s, const char *event);
 
-/**
- * qtest_qmp_receive_success:
- * @s: #QTestState instance to operate on
- * @event_cb: Event callback
- * @opaque: Argument for @event_cb
- *
- * Poll QMP messages until a command success response is received.
- * If @event_cb, call it for each event received, passing @opaque,
- * the event's name and data.
- * Return the success response's "return" member.
- */
-QDict *qtest_qmp_receive_success(QTestState *s,
- void (*event_cb)(void *opaque,
-  const char *name,
-  QDict *data),
- void *opaque);
-
 /**
  * qtest_hmp:
  * @s: #QTestState instance to operate on.
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index d4c49a52ff..baac667b8d 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -1291,35 +1291,6 @@ void qtest_cb_for_every_machine(void (*cb)(const char 
*machine),
 qobject_unref(response);
 }
 
-QDict *qtest_qmp_receive_success(QTestState *s,
- void (*event_cb)(void *opaque,
-  const char *event,
-  QDict *data),
- void *opaque)
-{
-QDict *response, *ret, *data;
-const char *event;
-
-for (;;) {
-response = qtest_qmp_receive_dict(s);
-g_assert(!qdict_haskey(response, "error"));
-ret = qdict_get_qdict(response, "return");
-if (ret) {
-break;
-}
-event = qdict_get_str(response, "event");
-data = qdict_get_qdict(response, "data");
-if (event_cb) {
-event_cb(opaque, event, data);
-}
-qobject_unref(response);
-}
-
-qobject_ref(ret);
-qobject_unref(response);
-return ret;
-}
-
 /*
  * Generic hot-plugging test via the device_add QMP commands.
  */
@@ -1355,13 +1326,6 @@ void qtest_qmp_device_add(QTestState *qts, const char 
*driver, const char *id,
 qobject_unref(args);
 }
 
-static void device_deleted_cb(void *opaque, const char *name, QDict *data)
-{
-bool *got_event = opaque;
-
-g_assert_cmpstr(name, ==, "DEVICE_DELETED");
-*got_event = true;
-}
 
 /*
  * Generic hot-unplugging test via the device_del QMP command.
@@ -1378,24 +1342,17 @@ static void device_deleted_cb(void *opaque, const char 
*name, QDict *data)
  * and this one:
  *
  * {"return": {}}
- *
- * But the order of arrival may vary - so we've got to detect both.
  */
 void qtest_qmp_device_del(QTestState *qts, const char *id)
 {
-bool got_event = false;
 QDict *rsp;
 
-qtest_qmp_send(qts, "{'execute': 'device_del', 'arguments': {'id': %s}}",
-   id);
-rsp = qtest_qmp_receive_success(qts, device_deleted_cb, &got_event);
+rsp = qtest_qmp(qts, "{'execute': 'device_del', 'arguments': {'id': %s}}",
+id);
+
+g_assert(qdict_haskey(rsp, "return"));
 qobject_unref(rsp);
-if (!got_event) {
-rsp = qtest_qmp_receive_dict(qts);
-g_assert_cmpstr(qdict_get_try_str(rsp, "event"),
-==, "DEVICE_DELETED");
-qobject_unref(rsp);
-}
+qtest_qmp_eventwait(qts, "DEVICE_DELETED");
 }
 
 bool qmp_rsp_is_err(QDict *rsp)
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 516093b39a..b799dbafb7 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -17,10 +17,12 @@
 
 bool got_stop;
 
-static void stop_cb(void *opaque, const char *name, QDict *data)
+static void check_stop_event(QTestState *who)
 {
-if (!strcmp(name, "STOP")) {
+QDict *event = qtest_qmp_event_ref(who, "STOP");
+if (event) {
 got_stop = true;
+qobject_unref(event);
 }
 }
 
@@ -30,12 +32,19 @@ static void stop_cb(void *opaque, const char *name, QDict 
*data)
 QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...)
 {
 va_list ap;

[PATCH v8 05/17] qtest: switch users back to qtest_qmp_receive

2020-10-07 Thread Paolo Bonzini

From: Maxim Levitsky 

Let test use the new functionality for buffering events.
The only remaining users of qtest_qmp_receive_dict are tests
that fuzz the QMP protocol.

Tested with 'make check-qtest'.

Signed-off-by: Maxim Levitsky 
Message-Id: <20201006123904.610658-4-mlevi...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/qtest/ahci-test.c  |  4 ++--
 tests/qtest/drive_del-test.c |  9 +++--
 tests/qtest/libqtest.c   | 12 +++-
 tests/qtest/pvpanic-test.c   |  4 +---
 tests/qtest/tpm-util.c   |  8 ++--
 5 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index d42ebaeb4c..5e1954852e 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -1590,7 +1590,7 @@ static void test_atapi_tray(void)
 qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-open-tray', "
 "'arguments': {'id': 'cd0'}}");
 atapi_wait_tray(ahci, true);
-rsp = qtest_qmp_receive_dict(ahci->parent->qts);
+rsp = qtest_qmp_receive(ahci->parent->qts);
 qobject_unref(rsp);
 
 qmp_discard_response(ahci->parent->qts,
@@ -1620,7 +1620,7 @@ static void test_atapi_tray(void)
 qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-close-tray', "
"'arguments': {'id': 'cd0'}}");
 atapi_wait_tray(ahci, false);
-rsp = qtest_qmp_receive_dict(ahci->parent->qts);
+rsp = qtest_qmp_receive(ahci->parent->qts);
 qobject_unref(rsp);
 
 /* Now, to convince ATAPI we understand the media has changed... */
diff --git a/tests/qtest/drive_del-test.c b/tests/qtest/drive_del-test.c
index ba0cd77445..9d20a1ed8b 100644
--- a/tests/qtest/drive_del-test.c
+++ b/tests/qtest/drive_del-test.c
@@ -15,9 +15,6 @@
 #include "libqos/virtio.h"
 #include "qapi/qmp/qdict.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(q, ...) qobject_unref(qtest_qmp(q, __VA_ARGS__))
-
 static void drive_add(QTestState *qts)
 {
 char *resp = qtest_hmp(qts, "drive_add 0 if=none,id=drive0");
@@ -38,13 +35,13 @@ static void device_del(QTestState *qts)
 {
 QDict *response;
 
-/* Complication: ignore DEVICE_DELETED event */
-qmp_discard_response(qts, "{'execute': 'device_del',"
+response = qtest_qmp(qts, "{'execute': 'device_del',"
  " 'arguments': { 'id': 'dev0' } }");
-response = qtest_qmp_receive_dict(qts);
 g_assert(response);
 g_assert(qdict_haskey(response, "return"));
 qobject_unref(response);
+
+qtest_qmp_eventwait(qts, "DEVICE_DELETED");
 }
 
 static void test_drive_without_dev(void)
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index baac667b8d..08929f5ff6 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -324,7 +324,7 @@ QTestState *qtest_init(const char *extra_args)
 QDict *greeting;
 
 /* Read the QMP greeting and then do the handshake */
-greeting = qtest_qmp_receive_dict(s);
+greeting = qtest_qmp_receive(s);
 qobject_unref(greeting);
 qobject_unref(qtest_qmp(s, "{ 'execute': 'qmp_capabilities' }"));
 
@@ -700,7 +700,7 @@ QDict *qtest_vqmp_fds(QTestState *s, int *fds, size_t 
fds_num,
 qtest_qmp_vsend_fds(s, fds, fds_num, fmt, ap);
 
 /* Receive reply */
-return qtest_qmp_receive_dict(s);
+return qtest_qmp_receive(s);
 }
 
 QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list ap)
@@ -708,7 +708,7 @@ QDict *qtest_vqmp(QTestState *s, const char *fmt, va_list 
ap)
 qtest_qmp_vsend(s, fmt, ap);
 
 /* Receive reply */
-return qtest_qmp_receive_dict(s);
+return qtest_qmp_receive(s);
 }
 
 QDict *qmp_fd(int fd, const char *fmt, ...)
@@ -850,12 +850,6 @@ char *qtest_vhmp(QTestState *s, const char *fmt, va_list 
ap)
  " 'arguments': {'command-line': %s}}",
  cmd);
 ret = g_strdup(qdict_get_try_str(resp, "return"));
-while (ret == NULL && qdict_get_try_str(resp, "event")) {
-/* Ignore asynchronous QMP events */
-qobject_unref(resp);
-resp = qtest_qmp_receive_dict(s);
-ret = g_strdup(qdict_get_try_str(resp, "return"));
-}
 g_assert(ret);
 qobject_unref(resp);
 g_free(cmd);
diff --git a/tests/qtest/pvpanic-test.c b/tests/qtest/pvpanic-test.c
index b0b20ad340..0657de797f 100644
--- a/tests/qtest/pvpanic-test.c
+++ b/tests/qtest/pvpanic-test.c
@@ -24,9 +24,7 @@ static void test_panic(void)
 
 qtest_outb(qts, 0x505, 0x1);
 
-response = qtest_qmp_receive_dict(qts);
-g_assert(qdict_haskey(response, "event"));
-g_assert_cmpstr(qdict_get_str(response, "event"), ==, "GUEST_PANICKED");
+response = qtest_qmp_eventwait_ref(qts, "GUEST_PANICKED");
 g_assert(qdict_haskey(response, "data"));
 data = qdict_get_qdict(response, "data");
 g_assert(qdict_haskey(data, "action"));
diff --git a/tests/qtest/tpm-util.c b/tests/qtest/tpm-util.c
index 3ed6c8548a..5a33a6ef0f 100644
--- a/tests/qtest/tpm-util

Re: [PATCH] tests/docker: Add genisoimage to the docker file

2020-10-07 Thread Alex Bennée



Thomas Huth  writes:

> genisoimage is needed for running the tests/qtest/cdrom-test qtest.

Queued to testing/next, thanks.

>
> Signed-off-by: Thomas Huth 
> ---
>  tests/docker/dockerfiles/centos8.docker  | 1 +
>  tests/docker/dockerfiles/debian-amd64.docker | 1 +
>  tests/docker/dockerfiles/fedora.docker   | 1 +
>  tests/docker/dockerfiles/ubuntu2004.docker   | 1 +
>  4 files changed, 4 insertions(+)
>
> diff --git a/tests/docker/dockerfiles/centos8.docker 
> b/tests/docker/dockerfiles/centos8.docker
> index f435616d6a..0fc2697491 100644
> --- a/tests/docker/dockerfiles/centos8.docker
> +++ b/tests/docker/dockerfiles/centos8.docker
> @@ -8,6 +8,7 @@ ENV PACKAGES \
>  dbus-daemon \
>  gcc \
>  gcc-c++ \
> +genisoimage \
>  gettext \
>  git \
>  glib2-devel \
> diff --git a/tests/docker/dockerfiles/debian-amd64.docker 
> b/tests/docker/dockerfiles/debian-amd64.docker
> index d2500dcff1..314c6bae83 100644
> --- a/tests/docker/dockerfiles/debian-amd64.docker
> +++ b/tests/docker/dockerfiles/debian-amd64.docker
> @@ -14,6 +14,7 @@ RUN apt update && \
>  RUN apt update && \
>  DEBIAN_FRONTEND=noninteractive eatmydata \
>  apt install -y --no-install-recommends \
> +genisoimage \
>  libbz2-dev \
>  liblzo2-dev \
>  libgcrypt20-dev \
> diff --git a/tests/docker/dockerfiles/fedora.docker 
> b/tests/docker/dockerfiles/fedora.docker
> index ec783418c8..85c975543d 100644
> --- a/tests/docker/dockerfiles/fedora.docker
> +++ b/tests/docker/dockerfiles/fedora.docker
> @@ -15,6 +15,7 @@ ENV PACKAGES \
>  findutils \
>  gcc \
>  gcc-c++ \
> +genisoimage \
>  gettext \
>  git \
>  glib2-devel \
> diff --git a/tests/docker/dockerfiles/ubuntu2004.docker 
> b/tests/docker/dockerfiles/ubuntu2004.docker
> index cafe8443fb..f4b9556b9e 100644
> --- a/tests/docker/dockerfiles/ubuntu2004.docker
> +++ b/tests/docker/dockerfiles/ubuntu2004.docker
> @@ -3,6 +3,7 @@ ENV PACKAGES flex bison \
>  ccache \
>  clang-10\
>  gcc \
> +genisoimage \
>  gettext \
>  git \
>  glusterfs-common \


-- 
Alex Bennée

[PATCH] qcow2: Document and enforce the QCowL2Meta invariants

2020-10-07 Thread Alberto Garcia

The QCowL2Meta structure is used to store information about a part of
a write request that touches clusters that need changes in their L2
entries. This happens with newly-allocated clusters or subclusters.

This structure has changed a bit since it was first created and its
current documentation is not quite up-to-date.

A write request can span a region consisting on a combination of
clusters of different types, and qcow2_alloc_host_offset() can
repeatedly call handle_copied() and handle_alloc() to add more
clusters to the mix as long as they all are contiguous on the image
file.

Because of this a write request has a list of QCowL2Meta structures,
one for each part of the request that needs changes in the L2
metadata.

Each one of them spans nb_clusters and has two copy-on-write regions
located immediately before and after the middle region that that part
of the write request touches. Even when those regions themselves are
empty their offsets must be correct because they are used to know the
location of the middle region.

This was not always the case but it is not a problem anymore
because the only two places where QCowL2Meta structures are created
(calculate_l2_meta() and qcow2_co_truncate()) ensure that the
copy-on-write regions are correctly defined, and so do assertions like
the ones in perform_cow().

The conditional initialization of the 'written_to' variable is
therefore unnecessary and is removed by this patch.

Signed-off-by: Alberto Garcia 
---
 block/qcow2.h | 19 +++
 block/qcow2-cluster.c |  5 +++--
 block/qcow2.c | 19 +++
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 125ea9679b..2e0272a7b8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -435,17 +435,18 @@ typedef struct Qcow2COWRegion {
 
 /**
  * Describes an in-flight (part of a) write request that writes to clusters
- * that are not referenced in their L2 table yet.
+ * that need to have their L2 table entries updated (because they are
+ * newly allocated or need changes in their L2 bitmaps)
  */
 typedef struct QCowL2Meta
 {
-/** Guest offset of the first newly allocated cluster */
+/** Guest offset of the first updated cluster */
 uint64_t offset;
 
-/** Host offset of the first newly allocated cluster */
+/** Host offset of the first updated cluster */
 uint64_t alloc_offset;
 
-/** Number of newly allocated clusters */
+/** Number of updated clusters */
 int nb_clusters;
 
 /** Do not free the old clusters */
@@ -458,14 +459,16 @@ typedef struct QCowL2Meta
 CoQueue dependent_requests;
 
 /**
- * The COW Region between the start of the first allocated cluster and the
- * area the guest actually writes to.
+ * The COW Region immediately before the area the guest actually
+ * writes to. This (part of the) write request starts at
+ * cow_start.offset + cow_start.nb_bytes.
  */
 Qcow2COWRegion cow_start;
 
 /**
- * The COW Region between the area the guest actually writes to and the
- * end of the last allocated cluster.
+ * The COW Region immediately after the area the guest actually
+ * writes to. This (part of the) write request ends at cow_end.offset
+ * (which must always be set even when cow_end.nb_bytes is 0).
  */
 Qcow2COWRegion cow_end;
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa87d3e99b..485b4cb92e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1049,6 +1049,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
 
 assert(l2_index + m->nb_clusters <= s->l2_slice_size);
+assert(m->cow_end.offset + m->cow_end.nb_bytes <=
+   m->nb_clusters << s->cluster_bits);
 for (i = 0; i < m->nb_clusters; i++) {
 uint64_t offset = cluster_offset + ((uint64_t)i << s->cluster_bits);
 /* if two concurrent writes happen to the same unallocated cluster
@@ -1070,8 +1072,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 if (has_subclusters(s) && !m->prealloc) {
 uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
 unsigned written_from = m->cow_start.offset;
-unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
-m->nb_clusters << s->cluster_bits;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes;
 int first_sc, last_sc;
 /* Narrow written_from and written_to down to the current cluster 
*/
 written_from = MAX(written_from, i << s->cluster_bits);
diff --git a/block/qcow2.c b/block/qcow2.c
index b05512718c..e7b27fdf25 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2361,15 +2361,26 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
 continue;
 }
 
-/* The data (middle) region

Re: [PATCH v2 3/5] nbd: Simplify meta-context parsing

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


30.09.2020 15:11, Eric Blake wrote:

We had a premature optimization of trying to read as little from the
wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases.
But in reality, we HAVE to read the entire string from the client
before we can get to the next command, and it is easier to just read
it all at once than it is to read it in pieces.  And once we do that,
several functions end up no longer performing I/O, so they can drop
length and errp parameters, and just return a bool instead of
modifying through a pointer.

Our iotests still pass; I also checked that libnbd's testsuite (which
covers more corner cases of odd meta context requests) still passes.



Also, do not advertise bitmaps meta context when bitmap export is not set.


Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH 20/20] python: add mypy config

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Formalize the options used for checking the python library. You can run
> mypy from the directory that mypy.ini is in by typing `mypy qemu/`.
> 
> Signed-off-by: John Snow 
> ---
>  python/mypy.ini | 4 
>  1 file changed, 4 insertions(+)
>  create mode 100644 python/mypy.ini
> 
> diff --git a/python/mypy.ini b/python/mypy.ini
> new file mode 100644
> index 000..7a70eca47c6
> --- /dev/null
> +++ b/python/mypy.ini
> @@ -0,0 +1,4 @@
> +[mypy]
> +strict = True

$ mypy --strict qemu
mypy.ini: [mypy]: Strict mode is not supported in configuration files: specify 
individual flags instead (see 'mypy -h' for the list of flags enabled in strict 
mode)
Success: no issues found in 6 source files
$ mypy --version
mypy 0.740

Did this change in newer mypy versions? I guess it's time that I get the
new laptop which will involve installing a newer Fedora release. :-)

> +python_version = 3.6
> +warn_unused_configs = True
> \ No newline at end of file

Kevin

Re: [PATCH 18/20] python/qemu/qmp.py: re-raise OSError when encountered

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Nested if conditions don't change when the exception block fires; we
> need to explicitly re-raise the error if we didn't intend to capture and
> suppress it.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/qmp.py | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/python/qemu/qmp.py b/python/qemu/qmp.py
> index d911999da1f..bdbd1e9bdbb 100644
> --- a/python/qemu/qmp.py
> +++ b/python/qemu/qmp.py
> @@ -169,9 +169,9 @@ def __get_events(self, wait: Union[bool, float] = False) 
> -> None:
>  try:
>  self.__json_read()
>  except OSError as err:
> -if err.errno == errno.EAGAIN:
> -# No data available
> -pass
> +# EAGAIN: No data available; not critical
> +if err.errno != errno.EAGAIN:
> +raise

Hm, if we re-raise the exception here, the socket remains non-blocking.
I think we intended to have it like that only temporarily.

The same kind of exception would raise QMPConnectError below instead of
directly OSError. Should we try to make this consistent?

>  self.__sock.setblocking(True)
>  
>  # Wait for new events, if needed.

Kevin

Re: [PATCH 17/20] python/qemu/qmp.py: Preserve error context on re-raise

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Use the "from ..." phrasing when re-raising errors to preserve their
> initial context, to help aid debugging when things go wrong.
> 
> This also silences a pylint 2.6.0+ error.
> 
> Signed-off-by: John Snow 

I don't really understand what this improves compared to the implicit
chaining we got before, but if pylint says so...

Reviewed-by: Kevin Wolf

Re: [PATCH 16/20] python/console_socket: avoid encoding to/from string

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> We can work directly in bytes instead of translating back and forth to
> string, which removes the question of which encodings to use.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 14/20] python/qemu/console_socket.py: Clarify type of drain_thread

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Mypy needs just a little help to guess the type here.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 12/20] python/qemu/console_socket.py: Correct type of recv()

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> The type and parameter names of recv() should match socket.socket().

Should this be socket.socket without parentheses (the class name)?
socket.socket() is the constructor and it takes very different
parameters.

> OK, easy enough, but in the cases we don't pass straight through to the
> real socket implementation, we probably can't accept such flags. OK, for
> now, assert that we don't receive flags in such cases.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 15/20] python/qemu/console_socket.py: Add type hint annotations

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Finish the typing of console_socket.py with annotations and no code
> changes.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 13/20] python/qemu/console_socket.py: fix typing of settimeout

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> The types and names of the parameters must match the socket.socket interface.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 11/20] python/qemu: Add mypy type annotations

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> These should all be purely annotations with no changes in behavior at
> all. You need to be in the python folder, but you should be able to
> confirm that these annotations are correct (or at least self-consistent)
> by running `mypy --strict qemu`.
> 
> Signed-off-by: John Snow 

'mypy --strict qemu' doesn't have clean output after this patch because
ConsoleSocket isn't converted yet. But then, technically the commit
message doesn't make this claim, and you can indeed check the
self-consistency when you ignore the ConsoleSocket related errors, so
probably not a problem.

Reviewed-by: Kevin Wolf

Re: [PATCH v2 2/5] nbd/server: Reject embedded NUL in NBD strings

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


30.09.2020 15:11, Eric Blake wrote:

The NBD spec is clear that any string sent from the client must not
contain embedded NUL characters.  If the client passes "a\0", we
should reject that option request rather than act on "a".

Testing this is not possible with a compliant client, but I was able
to use gdb to coerce libnbd into temporarily behaving as such a
client.

Signed-off-by: Eric Blake



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH v2 1/5] qemu-nbd: Honor SIGINT and SIGHUP

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


30.09.2020 15:11, Eric Blake wrote:

Honoring just SIGTERM on Linux is too weak; we also want to handle
other common signals, and do so even on BSD.  Why?  Because at least
'qemu-nbd -B bitmap' needs a chance to clean up the in-use bit on
bitmaps when the server is shut down via a signal.


Probably not bad to update a comment [*] if you have a good wording in mind.



See also: http://bugzilla.redhat.com/1883608

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  qemu-nbd.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index bacb69b0898b..e7520261134f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -581,7 +581,7 @@ int main(int argc, char **argv)
  const char *pid_file_name = NULL;
  BlockExportOptions *export_opts;

-#if HAVE_NBD_DEVICE
+#ifdef CONFIG_POSIX
  /* The client thread uses SIGTERM to interrupt the server.  A signal
   * handler ensures that "qemu-nbd -v -c" exits with a nice status code.


[*]


   */
@@ -589,9 +589,9 @@ int main(int argc, char **argv)
  memset(&sa_sigterm, 0, sizeof(sa_sigterm));
  sa_sigterm.sa_handler = termsig_handler;
  sigaction(SIGTERM, &sa_sigterm, NULL);
-#endif /* HAVE_NBD_DEVICE */
+sigaction(SIGINT, &sa_sigterm, NULL);
+sigaction(SIGHUP, &sa_sigterm, NULL);

-#ifdef CONFIG_POSIX
  signal(SIGPIPE, SIG_IGN);
  #endif




--
Best regards,
Vladimir

Re: [PATCH v10 8/9] block: remove unused backing-file name parameter

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


29.09.2020 15:38, Andrey Shinkevich wrote:

The block stream QMP parameter backing-file is in use no more. It
designates a backing file name to set in QCOW2 image header after the
block stream job finished. The base file name is used instead.

Signed-off-by: Andrey Shinkevich 


We can't just remove it without a deprecation period of three releases.

So actually, in a previous patch, we should implement new behavior for
automatic backing-file detection if this parameter is unspecified. Amd
keep old behavior for backing-file-name if it is given.

Hmm. Or, probably, we can use direct base for base-filename? And in cases
when we should skip filters (for example of parallel jobs) user should
specify backing-file explicitly?


---
  block/monitor/block-hmp-cmds.c |  2 +-
  block/stream.c |  6 +-
  blockdev.c | 17 +
  include/block/block_int.h  |  2 +-
  qapi/block-core.json   | 17 +
  5 files changed, 5 insertions(+), 39 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4e66775..5f19499 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -506,7 +506,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
  int64_t speed = qdict_get_try_int(qdict, "speed", 0);
  
  qmp_block_stream(true, device, device, base != NULL, base, false, NULL,

- false, NULL, qdict_haskey(qdict, "speed"), speed, true,
+ qdict_haskey(qdict, "speed"), speed, true,
   BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, 
false,
   false, &error);
  
diff --git a/block/stream.c b/block/stream.c

index b0719e9..fe2663f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -34,7 +34,6 @@ typedef struct StreamBlockJob {
  BlockDriverState *base_overlay; /* COW overlay (stream from this) */
  BlockDriverState *above_base;   /* Node directly above the base */
  BlockdevOnError on_error;
-char *backing_file_str;
  bool bs_read_only;
  bool chain_frozen;
  } StreamBlockJob;
@@ -103,8 +102,6 @@ static void stream_clean(Job *job)
  blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
  bdrv_reopen_set_read_only(bs, true, NULL);
  }
-
-g_free(s->backing_file_str);
  }
  
  static int coroutine_fn stream_run(Job *job, Error **errp)

@@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
  };
  
  void stream_start(const char *job_id, BlockDriverState *bs,

-  BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *base,
int creation_flags, int64_t speed,
BlockdevOnError on_error,
const char *filter_node_name,
@@ -295,7 +292,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
  
  s->base_overlay = base_overlay;

  s->above_base = above_base;
-s->backing_file_str = g_strdup(backing_file_str);
  s->bs_read_only = bs_read_only;
  s->chain_frozen = true;
  
diff --git a/blockdev.c b/blockdev.c

index d719c47..b223601 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2486,7 +2486,6 @@ out:
  void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
bool has_base, const char *base,
bool has_base_node, const char *base_node,
-  bool has_backing_file, const char *backing_file,
bool has_speed, int64_t speed,
bool has_on_error, BlockdevOnError on_error,
bool has_filter_node_name, const char *filter_node_name,
@@ -2498,7 +2497,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
  BlockDriverState *base_bs = NULL;
  AioContext *aio_context;
  Error *local_err = NULL;
-const char *base_name = NULL;
  int job_flags = JOB_DEFAULT;
  
  if (!has_on_error) {

@@ -2526,7 +2524,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
  goto out;
  }
  assert(bdrv_get_aio_context(base_bs) == aio_context);
-base_name = base;
  }
  
  if (has_base_node) {

@@ -2541,7 +2538,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
  }
  assert(bdrv_get_aio_context(base_bs) == aio_context);
  bdrv_refresh_filename(base_bs);
-base_name = base_bs->filename;
  }
  
  /* Check for op blockers in the whole chain between bs and base */

@@ -2553,17 +2549,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
  }
  }
  
-/* if we are streaming the entire chain, the result will have no backing

- * file, and specifying one is therefore an error */
-if (base_bs == NULL && has_backing_file) {
-error_setg(errp, "backing fil

Re: [PATCH v10 7/9] stream: skip filters when writing backing file name to QCOW2 header

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


29.09.2020 15:38, Andrey Shinkevich wrote:

Avoid writing a filter JSON-name to QCOW2 image when the backing file
is changed after the block stream job.

Signed-off-by: Andrey Shinkevich 
---
  block/stream.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index e0540ee..b0719e9 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
  BlockDriverState *bs = blk_bs(bjob->blk);
  BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
  BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+BlockDriverState *base_metadata = bdrv_skip_filters(base);


Could we call it base_unfiltered in according with all such things?


  Error *local_err = NULL;
  int ret = 0;
  
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
  
  if (bdrv_cow_child(unfiltered_bs)) {

  const char *base_id = NULL, *base_fmt = NULL;
-if (base) {
-base_id = s->backing_file_str;


Seems backing_file_str becomes unused and we should drop it. So, actually,
this patch fix two problems:

 - do not save backing_file_str at stream start (as base may change during the 
job)
 - avoid json filters in final qcow2 image metadata


-if (base->drv) {
-base_fmt = base->drv->format_name;
+if (base_metadata) {
+base_id = base_metadata->filename;
+if (base_metadata->drv) {
+base_fmt = base_metadata->drv->format_name;
  }
  }
  bdrv_set_backing_hd(unfiltered_bs, base, &local_err);




--
Best regards,
Vladimir

Re: [PATCH v10 6/9] copy-on-read: skip non-guest reads if no copy needed

2020-10-07 Thread Vladimir Sementsov-Ogievskiy


29.09.2020 15:38, Andrey Shinkevich wrote:

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
  block/copy-on-read.c | 14 ++
  block/io.c   |  2 +-
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index f53f7e0..5389dca 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -145,10 +145,16 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
  }
  }
  
-ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,

-  local_flags);
-if (ret < 0) {
-return ret;
+if ((flags & BDRV_REQ_PREFETCH) &


BDRV_REQ_PREFETCH is documented to be only used with BDRV_REQ_COPY_ON_READ. But 
here
BDRV_REQ_COPY_ON_READ appears intermediately. We should change documentation in 
block.h
in a separate patch (and probably code in bdrv_aligned_preadv())


+!(local_flags & BDRV_REQ_COPY_ON_READ)) {
+/* Skip non-guest reads if no copy needed */
+} else {
+


extra new-line ?


+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
  }
  
  offset += n;

diff --git a/block/io.c b/block/io.c
index 11df188..62b75a5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1388,7 +1388,7 @@ static int coroutine_fn jk(BdrvChild *child,
  qemu_iovec_init_buf(&local_qiov, bounce_buffer, pnum);
  
  ret = bdrv_driver_preadv(bs, cluster_offset, pnum,

- &local_qiov, 0, 0);
+ &local_qiov, 0, flags & 
BDRV_REQ_PREFETCH);


Why? In this place we want to read. We'll write back the data a few lines 
below. What will we write,
if underlying driver decide to do nothing because of BDRV_REQ_PREFETCH?


  if (ret < 0) {
  goto err;
  }




--
Best regards,
Vladimir

Re: [PATCH 08/20] python/machine.py: fix _popen access

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> As always, Optional[T] causes problems with unchecked access. Add a
> helper that asserts the pipe is present before we attempt to talk with
> it.
> 
> Signed-off-by: John Snow 

First a question about the preexisting state: I see that after
initialising self._popen once, we never reset it to None. Should we do
so on shutdown?

>  python/qemu/machine.py | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 3e9cf09fd2d..4e762fcd529 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -131,7 +131,7 @@ def __init__(self, binary, args=None, wrapper=None, 
> name=None,
>  # Runstate
>  self._qemu_log_path = None
>  self._qemu_log_file = None
> -self._popen = None
> +self._popen: Optional['subprocess.Popen[bytes]'] = None

Another option that we have, especially if it's an attribute that is
never reset, would be to set the attribute only when it first gets a
value other than None. Accessing it while it hasn't been set yet
automatically results in an AttributeError. I don't think that's much
worse than the exception raised explicitly in a property wrapper.

In this case, you would only declare the type in __init__, but not
assign a value to it:

self._popen: Optional['subprocess.Popen[bytes]']

Maybe a nicer alternative in some cases than adding properties around
everything.

Instead of checking for None, you would then have to use hasattr(),
which is a bit uglier, so I guess it's mainly for attributes where you
can assume that you will always have a value (if the caller isn't buggy)
and therefore don't even have a check in most places.

>  self._events = []
>  self._iolog = None
>  self._qmp_set = True   # Enable QMP monitor by default.
> @@ -244,6 +244,12 @@ def is_running(self):
>  """Returns true if the VM is running."""
>  return self._popen is not None and self._popen.poll() is None
>  
> +@property
> +def _subp(self) -> 'subprocess.Popen[bytes]':
> +if self._popen is None:
> +raise QEMUMachineError('Subprocess pipe not present')
> +return self._popen
> +
>  def exitcode(self):
>  """Returns the exit code if possible, or None."""
>  if self._popen is None:

Of course, even if an alternative is possible, what you have is still
correct.

Reviewed-by: Kevin Wolf

Re: [PATCH 07/20] python/machine.py: Add _qmp access shim

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Like many other Optional[] types, it's not always a given that this
> object will be set. Wrap it in a type-shim that raises a meaningful
> error and will always return a concrete type.
> 
> Signed-off-by: John Snow 

> @@ -515,11 +515,13 @@ def set_qmp_monitor(self, enabled=True):
>  line. Default is True.
>  @note: call this function before launch().
>  """
> -if enabled:
> -self._qmp_set = True
> -else:
> -self._qmp_set = False
> -self._qmp = None
> +self._qmp_set = enabled

This change seems unrelated to wrapping the connection in a property.
Intuitively, it makes sense that the connection of a running instance
doesn't go away just because I disable QMP in the command line for the
next launch.

If this is the reasoning behind the change, maybe mention it in the
commit message.

With this:
Reviewed-by: Kevin Wolf

Re: [PATCH 01/20] python/qemu: use isort to lay out imports

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:57 hat John Snow geschrieben:
> Borrowed from the QAPI cleanup series, use the same configuration to
> standardize the way we write and sort imports.
> 
> Signed-off-by: John Snow 

Reviewed-by: Kevin Wolf

Re: [PATCH 03/20] python/machine.py: reorder init

2020-10-07 Thread Kevin Wolf

Am 07.10.2020 um 01:58 hat John Snow geschrieben:
> Put the init arg handling all at the top, and mostly in order (deviating
> when one is dependent on another), and put what is effectively runtime
> state declaration at the bottom.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/machine.py | 44 --
>  1 file changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 3017ec072df..71fe58be050 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -84,42 +84,54 @@ def __init__(self, binary, args=None, wrapper=None, 
> name=None,
>  @param monitor_address: address for QMP monitor
>  @param socket_scm_helper: helper program, required for send_fd_scm()
>  @param sock_dir: where to create socket (overrides test_dir for sock)
> -@param console_log: (optional) path to console log file
>  @param drain_console: (optional) True to drain console socket to 
> buffer
> +@param console_log: (optional) path to console log file
>  @note: Qemu process is not started until launch() is used.
>  '''
> +# Direct user configuration
> +
> +self._binary = binary
> +
>  if args is None:
>  args = []
> +# Copy mutable input: we will be modifying our copy
> +self._args = list(args)
> +
>  if wrapper is None:
>  wrapper = []
> -if name is None:
> -name = "qemu-%d" % os.getpid()
> -if sock_dir is None:
> -sock_dir = test_dir
> -self._name = name
> +self._wrapper = wrapper
> +
> +self._name = name or "qemu-%d" % os.getpid()
> +self._test_dir = test_dir
> +self._sock_dir = sock_dir or self._test_dir
> +self._socket_scm_helper = socket_scm_helper

Interesting that you use a shortcut with 'or' for name and sock_dir,
but you don't have 'wraper or []' and 'args or []' above.

It's not wrong, of course, but if you have to respin for another reason,
maybe an inconsistency to address.

Reviewed-by: Kevin Wolf

75 matches

Mail list logo