Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

2020-09-21 Thread Paolo Bonzini
On 22/09/20 08:45, David Hildenbrand wrote:
>> It's certainly a good idea but it's quite verbose.
>>
>> What about using atomic__* as the prefix?  It is not very common in QEMU
>> but there are some cases (and I cannot think of anything better).
>
> aqomic_*, lol :)

Actually qatomic_ would be a good one, wouldn't it?

Paolo




Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

2020-09-21 Thread David Hildenbrand
On 22.09.20 08:27, Paolo Bonzini wrote:
> On 21/09/20 18:23, Stefan Hajnoczi wrote:
>> clang's C11 atomic_fetch_*() functions only take a C11 atomic type
>> pointer argument. QEMU uses direct types (int, etc) and this causes a
>> compiler error when QEMU code calls these functions in a source file
>> that also included <stdatomic.h> via a system header file:
>>
>>   $ CC=clang CXX=clang++ ./configure ... && make
>>   ../util/async.c:79:17: error: address argument to atomic operation must be 
>> a pointer to _Atomic type ('unsigned int *' invalid)
>>
>> Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
>> used by <stdatomic.h>. Prefix QEMU's APIs with qemu_ so that atomic.h
>> and <stdatomic.h> can co-exist.
>>
>> This patch was generated using:
>>
>>   $ git diff | grep -o '\<atomic_[a-z0-9_]*\>' | sort -u >/tmp/changed_identifiers
>>   $ for identifier in $(</tmp/changed_identifiers); do
>>   sed -i "s%\<$identifier\>%qemu_$identifier%" $(git grep -l 
>> "\<$identifier\>") \
>> done
> 
> It's certainly a good idea but it's quite verbose.
> 
> What about using atomic__* as the prefix?  It is not very common in QEMU
> but there are some cases (and I cannot think of anything better).
> 

aqomic_*, lol :)

> Paolo
> 


-- 
Thanks,

David / dhildenb
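As an aside, the word-boundary behavior that the quoted sed/grep recipe relies on is easy to check on a scratch file. The toy sketch below (invented file name and identifier, and assuming GNU sed for `-i` and `\<`/`\>`) renames a bare identifier while leaving a prefixed one alone, which is exactly what keeps the mechanical rename safe:

```shell
# Toy run of the rename recipe on a scratch file (names invented).
workdir=$(mktemp -d)
cat > "$workdir/example.c" <<'EOF'
int a = atomic_read(p);      /* should become qemu_atomic_read */
int b = my_atomic_read(p);   /* '_' is a word char: must stay untouched */
EOF
for identifier in atomic_read; do
    sed -i "s%\<$identifier\>%qemu_$identifier%g" "$workdir/example.c"
done
grep -n 'qemu_atomic_read' "$workdir/example.c"
```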




Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

2020-09-21 Thread Paolo Bonzini
On 21/09/20 18:23, Stefan Hajnoczi wrote:
> clang's C11 atomic_fetch_*() functions only take a C11 atomic type
> pointer argument. QEMU uses direct types (int, etc) and this causes a
> compiler error when QEMU code calls these functions in a source file
> that also included <stdatomic.h> via a system header file:
> 
>   $ CC=clang CXX=clang++ ./configure ... && make
>   ../util/async.c:79:17: error: address argument to atomic operation must be 
> a pointer to _Atomic type ('unsigned int *' invalid)
> 
> Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
> used by <stdatomic.h>. Prefix QEMU's APIs with qemu_ so that atomic.h
> and <stdatomic.h> can co-exist.
> 
> This patch was generated using:
> 
>   $ git diff | grep -o '\<atomic_[a-z0-9_]*\>' | sort -u >/tmp/changed_identifiers
>   $ for identifier in $(</tmp/changed_identifiers); do
>   sed -i "s%\<$identifier\>%qemu_$identifier%" $(git grep -l 
> "\<$identifier\>") \
> done

It's certainly a good idea but it's quite verbose.

What about using atomic__* as the prefix?  It is not very common in QEMU
but there are some cases (and I cannot think of anything better).

Paolo
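The clash discussed in this thread is easy to demonstrate outside QEMU. The sketch below is illustrative only (function names invented): the C11 macro from <stdatomic.h> demands an _Atomic object, while the GCC/clang __atomic builtins that QEMU's atomic.h wraps accept plain types — which is why the two APIs cannot share the atomic_* namespace.

```c
#include <stdatomic.h>

/* C11-style: atomic_fetch_add() from <stdatomic.h> requires an
 * _Atomic-qualified object; clang enforces this strictly, producing the
 * "address argument ... must be a pointer to _Atomic type" error when a
 * plain pointer is passed instead. */
static unsigned int c11_increment(_Atomic unsigned int *p)
{
    return atomic_fetch_add(p, 1);          /* returns the previous value */
}

/* QEMU-style: plain types, using the GCC/clang __atomic builtins that
 * QEMU's atomic.h wraps under its own macro names. */
static unsigned int plain_increment(unsigned int *p)
{
    return __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
}
```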




Re: [PATCH] hw: ide: check the pointer before do dma memory unmap

2020-09-21 Thread Alexander Bulekov
On 200815 0020, Li Qiang wrote:
> In 'map_page' we need to check the return value of
> 'dma_memory_map' to ensure that we actually mapped something.
> Otherwise, we will hit an assert in 'address_space_unmap'.
> This is because we can't find the MR with the NULL buffer.
> This is LP#1884693:
> 
> -->https://bugs.launchpad.net/qemu/+bug/1884693
> 
> Reported-by: Alexander Bulekov 
> Signed-off-by: Li Qiang 

I'm not very familiar with the IDE code, but this seems like a simple
null-ptr check, and Li has not received a response in over a month.

Reviewed-by: Alexander Bulekov 

> ---
>  hw/ide/ahci.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index 009120f88b..63e9fccdbe 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -250,6 +250,11 @@ static void map_page(AddressSpace *as, uint8_t **ptr, 
> uint64_t addr,
>  }
>  
>  *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
> +
> +if (!*ptr) {
> +return;
> +}
> +
>  if (len < wanted) {
>  dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, len);
>  *ptr = NULL;
> -- 
> 2.17.1
> 
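The guard being added can be illustrated with a self-contained mock of the DMA API (all names below are stand-ins, not QEMU's real implementations): mapping may fail outright (NULL) or come back shorter than requested, and only the second case may be handed back to unmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock DMA backing store and API; illustration only. */
static uint8_t backing[4096];

static uint8_t *mock_dma_memory_map(uint64_t addr, uint64_t *len)
{
    if (addr >= sizeof(backing)) {
        return NULL;                          /* unmappable region */
    }
    if (addr + *len > sizeof(backing)) {
        *len = sizeof(backing) - addr;        /* partial mapping */
    }
    return backing + addr;
}

static void mock_dma_memory_unmap(uint8_t *ptr)
{
    /* Stand-in for the assert in address_space_unmap() that the patch
     * avoids tripping: unmapping a NULL buffer is a bug. */
    assert(ptr != NULL);
}

/* The pattern from the patch: bail out on NULL *before* the length
 * check, so the unmap helper is never called with a NULL buffer. */
static uint8_t *map_page(uint64_t addr, uint64_t wanted)
{
    uint64_t len = wanted;
    uint8_t *ptr = mock_dma_memory_map(addr, &len);

    if (!ptr) {
        return NULL;                          /* the added guard */
    }
    if (len < wanted) {
        mock_dma_memory_unmap(ptr);           /* safe: ptr is non-NULL */
        return NULL;
    }
    return ptr;
}
```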



Re: [PATCH] hw: ide: check the pointer before do dma memory unmap

2020-09-21 Thread Li Qiang
Ping!!

Li Qiang wrote on Tue, Sep 15, 2020 at 9:38 PM:
>
> ping!!
>
> > Li Qiang wrote on Mon, Sep 7, 2020 at 9:39 AM:
> >
> > Ping!
> >
> > > Li Qiang wrote on Tue, Sep 1, 2020 at 6:34 PM:
> > >
> > > Ping.
> > >
> > > > Li Qiang wrote on Sat, Aug 15, 2020 at 3:21 PM:
> > > >
> > > > In 'map_page' we need to check the return value of
> > > > 'dma_memory_map' to ensure that we actually mapped something.
> > > > Otherwise, we will hit an assert in 'address_space_unmap'.
> > > > This is because we can't find the MR with the NULL buffer.
> > > > This is LP#1884693:
> > > >
> > > > -->https://bugs.launchpad.net/qemu/+bug/1884693
> > > >
> > > > Reported-by: Alexander Bulekov 
> > > > Signed-off-by: Li Qiang 
> > > > ---
> > > >  hw/ide/ahci.c | 5 +
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> > > > index 009120f88b..63e9fccdbe 100644
> > > > --- a/hw/ide/ahci.c
> > > > +++ b/hw/ide/ahci.c
> > > > @@ -250,6 +250,11 @@ static void map_page(AddressSpace *as, uint8_t 
> > > > **ptr, uint64_t addr,
> > > >  }
> > > >
> > > >  *ptr = dma_memory_map(as, addr, &len, DMA_DIRECTION_FROM_DEVICE);
> > > > +
> > > > +if (!*ptr) {
> > > > +return;
> > > > +}
> > > > +
> > > >  if (len < wanted) {
> > > >  dma_memory_unmap(as, *ptr, len, DMA_DIRECTION_FROM_DEVICE, 
> > > > len);
> > > >  *ptr = NULL;
> > > > --
> > > > 2.17.1
> > > >



[PULL 1/1] qemu-img: Support bitmap --merge into backing image

2020-09-21 Thread Eric Blake
If you have the chain 'base.qcow2 <- top.qcow2' and want to merge a
bitmap from top into base, qemu-img was failing with:

qemu-img: Could not open 'top.qcow2': Could not open backing file: Failed to 
get shared "write" lock
Is another process using the image [base.qcow2]?

The easiest fix is to not open the entire backing chain of either
image (source or destination); after all, the point of 'qemu-img
bitmap' is solely to manipulate bitmaps directly within a single qcow2
image, and this is made more precise if we don't pay attention to
other images in the chain that may happen to have a bitmap by the same
name.

However, note that on a case-by-case analysis, there _are_ times where
we treat it as a feature that we can access a bitmap from a backing
layer in association with an overlay BDS.  A demonstration of this is
using NBD to expose both an overlay BDS (for constant contents) and a
bitmap (for learning which blocks are interesting) during an
incremental backup:

Base <- Active <- Temporary
  \--block job ->/

where Temporary is being fed by a backup 'sync=none' job.  When
exposing Temporary over NBD, referring to a bitmap that lives only in
Active is less effort than having to copy a bitmap into Temporary [1].
So the testsuite additions in this patch check both where bitmaps get
allocated (the qemu-img info output), and that qemu-nbd is indeed able
to access a bitmap inherited from the backing chain since it is a
different use case than 'qemu-img bitmap'.

[1] Full disclosure: prior to the recent commit 374eedd1c4 and
friends, we were NOT able to see bitmaps through filters, which meant
that we actually did not have nice clean semantics for uniformly being
able to pick up bitmaps from anywhere in the backing chain (seen as a
change in behavior between qemu 4.1 and 4.2 at commit 00e30f05de, when
block-copy swapped from a one-off to a filter).  Which means libvirt
was already coded to copy bitmaps around for the sake of older qemu,
even though modern qemu no longer needs it.  Oh well.
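For reference, the fixed behavior can be exercised along these lines (a sketch with invented file and bitmap names; the merge syntax matches the testsuite addition in this patch):

```console
$ qemu-img create -f qcow2 base.qcow2 10M
$ qemu-img create -f qcow2 -F qcow2 -b base.qcow2 top.qcow2
$ qemu-img bitmap --add top.qcow2 b0
$ # Merge b0 from the overlay into a new bitmap b1 in the backing file.
$ # Before this patch, the next command failed to get the shared "write"
$ # lock on base.qcow2.
$ qemu-img bitmap --add --merge b0 -b top.qcow2 -F qcow2 -f qcow2 base.qcow2 b1
```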

Fixes: http://bugzilla.redhat.com/1877209
Reported-by: Eyal Shenitzky 
Signed-off-by: Eric Blake 
Message-Id: <20200914191009.644842-1-ebl...@redhat.com>
[eblake: more commit message tweaks, per Max Reitz review]
Reviewed-by: Max Reitz 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 qemu-img.c | 11 ++--
 tests/qemu-iotests/291 | 12 
 tests/qemu-iotests/291.out | 56 ++
 3 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 1d8c5cd778c0..d79c6ffa2e49 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4779,14 +4779,19 @@ static int img_bitmap(int argc, char **argv)
 filename = argv[optind];
 bitmap = argv[optind + 1];

-blk = img_open(image_opts, filename, fmt, BDRV_O_RDWR, false, false,
-   false);
+/*
+ * No need to open backing chains; we will be manipulating bitmaps
+ * directly in this image without reference to image contents.
+ */
+blk = img_open(image_opts, filename, fmt, BDRV_O_RDWR | BDRV_O_NO_BACKING,
+   false, false, false);
 if (!blk) {
 goto out;
 }
 bs = blk_bs(blk);
 if (src_filename) {
-src = img_open(false, src_filename, src_fmt, 0, false, false, false);
+src = img_open(false, src_filename, src_fmt, BDRV_O_NO_BACKING,
+   false, false, false);
 if (!src) {
 goto out;
 }
diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/291
index 1e0bb76959bb..4f837b205655 100755
--- a/tests/qemu-iotests/291
+++ b/tests/qemu-iotests/291
@@ -91,6 +91,15 @@ $QEMU_IMG bitmap --remove --image-opts \
 driver=$IMGFMT,file.driver=file,file.filename="$TEST_IMG" tmp
 _img_info --format-specific

+echo
+echo "=== Merge from top layer into backing image ==="
+echo
+
+$QEMU_IMG rebase -u -F qcow2 -b "$TEST_IMG.base" "$TEST_IMG"
+$QEMU_IMG bitmap --add --merge b2 -b "$TEST_IMG" -F $IMGFMT \
+ -f $IMGFMT "$TEST_IMG.base" b3
+_img_info --format-specific --backing-chain
+
 echo
 echo "=== Check bitmap contents ==="
 echo
@@ -107,6 +116,9 @@ $QEMU_IMG map --output=json --image-opts \
 nbd_server_start_unix_socket -r -f qcow2 -B b2 "$TEST_IMG"
 $QEMU_IMG map --output=json --image-opts \
 "$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2" | _filter_qemu_img_map
+nbd_server_start_unix_socket -r -f qcow2 -B b3 "$TEST_IMG"
+$QEMU_IMG map --output=json --image-opts \
+"$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b3" | _filter_qemu_img_map

 # success, all done
 echo '*** done'
diff --git a/tests/qemu-iotests/291.out b/tests/qemu-iotests/291.out
index ee89a728855f..3990f7aacc7b 100644
--- a/tests/qemu-iotests/291.out
+++ b/tests/qemu-iotests/291.out
@@ -68,6 +68,59 @@ Format specific information:
 corrupt: false
 extended l2: false

+=== Merge from top layer into backing image ===
+
+image: TEST_DIR/t.IMGFMT
+file format: IMGFMT
+virtual size: 10 MiB (10485760 bytes)
+clus

Re: [PATCH v5 4/7] tests/qtest/libqos/virtio-blk: add support for vhost-user-blk

2020-09-21 Thread Dima Stepanov
On Mon, Sep 21, 2020 at 07:04:20PM -0400, Raphael Norwitz wrote:
> MST already sent a PR with all the patches :)

Thank you! )

> 
> On Wed, Sep 16, 2020 at 6:13 PM Dima Stepanov  wrote:
> >
> > On Mon, Sep 14, 2020 at 09:23:42PM -0400, Raphael Norwitz wrote:
> > > On Fri, Sep 11, 2020 at 4:43 AM Dima Stepanov  
> > > wrote:
> > > >
> > > > Add support for the vhost-user-blk-pci device. This node can be used by
> > > > the vhost-user-blk tests. Tests for the vhost-user-blk device are added
> > > > in the following patches.
> > > >
> > > > Signed-off-by: Dima Stepanov 
> > >
> > > Reviewed-by: Raphael Norwitz 
> > Hi,
> >
> > Looks like the whole patch set is reviewed except 7/7. If it is an
> > issue, we can cut it from the set and merge the other six commits.
> >
> > Raphael,
> >
> > Will you take it for merge?
> >
> > Thanks, Dima.
> >
> > >
> > > > ---
> > > >  tests/qtest/libqos/virtio-blk.c | 14 +-
> > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tests/qtest/libqos/virtio-blk.c 
> > > > b/tests/qtest/libqos/virtio-blk.c
> > > > index 5da0259..c0fd9d2 100644
> > > > --- a/tests/qtest/libqos/virtio-blk.c
> > > > +++ b/tests/qtest/libqos/virtio-blk.c
> > > > @@ -30,7 +30,8 @@
> > > >  static void *qvirtio_blk_get_driver(QVirtioBlk *v_blk,
> > > >  const char *interface)
> > > >  {
> > > > -if (!g_strcmp0(interface, "virtio-blk")) {
> > > > +if (!g_strcmp0(interface, "virtio-blk") ||
> > > > +!g_strcmp0(interface, "vhost-user-blk")) {
> > > >  return v_blk;
> > > >  }
> > > >  if (!g_strcmp0(interface, "virtio")) {
> > > > @@ -120,6 +121,17 @@ static void virtio_blk_register_nodes(void)
> > > >  qos_node_produces("virtio-blk-pci", "virtio-blk");
> > > >
> > > >  g_free(arg);
> > > > +
> > > > +/* vhost-user-blk-pci */
> > > > +arg = g_strdup_printf("id=drv0,chardev=chdev0,addr=%x.%x",
> > > > +PCI_SLOT, PCI_FN);
> > > > +opts.extra_device_opts = arg;
> > > > +add_qpci_address(&opts, &addr);
> > > > +qos_node_create_driver("vhost-user-blk-pci", 
> > > > virtio_blk_pci_create);
> > > > +qos_node_consumes("vhost-user-blk-pci", "pci-bus", &opts);
> > > > +qos_node_produces("vhost-user-blk-pci", "vhost-user-blk");
> > > > +
> > > > +g_free(arg);
> > > >  }
> > > >
> > > >  libqos_init(virtio_blk_register_nodes);
> > > > --
> > > > 2.7.4
> > > >
> > > >



Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Kevin O'Connor
On Mon, Sep 21, 2020 at 12:31:21PM +0200, Philippe Mathieu-Daudé wrote:
> Back to the SDcard, it might be less critical, so a migration
> breaking change might be acceptable. I'm only aware of Paolo
> and Kevin using this device for testing. Not sure of its
> importance in production.

FWIW, I only use the sdcard for testing (and only use sdhci-pci).  I
don't know if others use it in production, however.

Cheers,
-Kevin



Re: [PATCH v5 4/7] tests/qtest/libqos/virtio-blk: add support for vhost-user-blk

2020-09-21 Thread Raphael Norwitz
MST already sent a PR with all the patches :)

On Wed, Sep 16, 2020 at 6:13 PM Dima Stepanov  wrote:
>
> On Mon, Sep 14, 2020 at 09:23:42PM -0400, Raphael Norwitz wrote:
> > On Fri, Sep 11, 2020 at 4:43 AM Dima Stepanov  
> > wrote:
> > >
> > > Add support for the vhost-user-blk-pci device. This node can be used by
> > > the vhost-user-blk tests. Tests for the vhost-user-blk device are added
> > > in the following patches.
> > >
> > > Signed-off-by: Dima Stepanov 
> >
> > Reviewed-by: Raphael Norwitz 
> Hi,
>
> Looks like the whole patch set is reviewed except 7/7. If it is an
> issue, we can cut it from the set and merge the other six commits.
>
> Raphael,
>
> Will you take it for merge?
>
> Thanks, Dima.
>
> >
> > > ---
> > >  tests/qtest/libqos/virtio-blk.c | 14 +-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/tests/qtest/libqos/virtio-blk.c 
> > > b/tests/qtest/libqos/virtio-blk.c
> > > index 5da0259..c0fd9d2 100644
> > > --- a/tests/qtest/libqos/virtio-blk.c
> > > +++ b/tests/qtest/libqos/virtio-blk.c
> > > @@ -30,7 +30,8 @@
> > >  static void *qvirtio_blk_get_driver(QVirtioBlk *v_blk,
> > >  const char *interface)
> > >  {
> > > -if (!g_strcmp0(interface, "virtio-blk")) {
> > > +if (!g_strcmp0(interface, "virtio-blk") ||
> > > +!g_strcmp0(interface, "vhost-user-blk")) {
> > >  return v_blk;
> > >  }
> > >  if (!g_strcmp0(interface, "virtio")) {
> > > @@ -120,6 +121,17 @@ static void virtio_blk_register_nodes(void)
> > >  qos_node_produces("virtio-blk-pci", "virtio-blk");
> > >
> > >  g_free(arg);
> > > +
> > > +/* vhost-user-blk-pci */
> > > +arg = g_strdup_printf("id=drv0,chardev=chdev0,addr=%x.%x",
> > > +PCI_SLOT, PCI_FN);
> > > +opts.extra_device_opts = arg;
> > > +add_qpci_address(&opts, &addr);
> > > +qos_node_create_driver("vhost-user-blk-pci", virtio_blk_pci_create);
> > > +qos_node_consumes("vhost-user-blk-pci", "pci-bus", &opts);
> > > +qos_node_produces("vhost-user-blk-pci", "vhost-user-blk");
> > > +
> > > +g_free(arg);
> > >  }
> > >
> > >  libqos_init(virtio_blk_register_nodes);
> > > --
> > > 2.7.4
> > >
> > >



Re: [PATCH v2] qemu-img: Support bitmap --merge into backing image

2020-09-21 Thread Eric Blake

On 9/17/20 5:19 AM, Max Reitz wrote:


temporary over NBD, referring to a bitmap that lives only in Active is
less effort than having to copy a bitmap into temporary [1].  So the
testsuite additions in this patch check both where bitmaps get
allocated (the qemu-img info output), and, when NOT using 'qemu-img
bitmap', that bitmaps are indeed visible through a backing chain.


Well.  It is useful over NBD but I would doubt that it isn’t useful in
general.  For example, the QMP commands that refer to bitmaps always do
so through a node-name + bitmap-name combination, and they require that
the given bitmap is exactly on the given node.

So I think this is very much a case-by-case question.  (And in
practice, NBD seems to be the outlier, not qemu-img bitmap.)



I'm happy to reword slightly to give that caveat.




The code looks good to me, but I wonder whether in the commit message it
should be noted that we don’t want to let bitmaps from deeper nodes
shine through by default everywhere, but just in specific cases where
that’s useful (i.e. only NBD so far AFAIA).


So is this a Reviewed-by?  I'm happy to queue it through my bitmaps
tree, if so.


It wasn’t meant as an R-b, because my R-b depends on how the commit
message addresses the question of when exactly bitmaps from the backing
chain should be visible on the top image.  Whether qemu-img bitmap is an
exception, or whether this is really a case-by-case question.

(I wanted to know whether you agree on including it.  Normally, I’m
happy to give an R-b on the basis of “with that done”, but in this case
I wasn’t entirely sure whether my request was reasonable, but I also
felt that in case it was, it wasn’t optional, given that you do have an
entire paragraph in the commit message dedicated to why the backing
image’s bitmap is visible on an image exported over NBD.)


Here's my rewording:

However, note that on a case-by-case analysis, there _are_ times where
we treat it as a feature that we can access a bitmap from a backing
layer in association with an overlay BDS.  A demonstration of this is
using NBD to expose both an overlay BDS (for constant contents) and a
bitmap (for learning which blocks are interesting) during an
incremental backup:

Base <- Active <- Temporary
  \--block job ->/

where Temporary is being fed by a backup 'sync=none' job.  When
exposing Temporary over NBD, referring to a bitmap that lives only in
Active is less effort than having to copy a bitmap into Temporary [1].
So the testsuite additions in this patch check both where bitmaps get
allocated (the qemu-img info output), and that qemu-nbd is indeed able
to access a bitmap inherited from the backing chain since it is a
different use case than 'qemu-img bitmap'.



I have to say I would like to see how you do phrase it in the end, but
given that you do agree on including it, I can give a

Reviewed-by: Max Reitz 

Now.


Okay, I think I've met your request, so I'll go ahead and send the pull 
request today.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200921162949.553863-1-phi...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or 
forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
Compiling C object authz/libauthz.fa.p/listfile.c.o
Compiling C object authz/libauthz.fa.p/list.c.o
../src/block/nvme.c: In function 'nvme_file_open':
../src/block/nvme.c:807:8: error: 'regs' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
 if (regs) {
^
../src/block/nvme.c:692:23: note: 'regs' was declared here
 volatile NvmeBar *regs;
   ^
cc1: all warnings being treated as errors
make: *** [libblock.fa.p/block_nvme.c.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', 
'--label', 'com.qemu.instance.uuid=99233811d86a42b1b27aa376bb85979d', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 
'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-wu686lwl/src/docker-src.2020-09-21-17.52.21.21740:/var/tmp/qemu:z,ro',
 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=99233811d86a42b1b27aa376bb85979d
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-wu686lwl/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m46.350s
user    0m15.940s


The full log is available at
http://patchew.org/logs/20200921162949.553863-1-phi...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

RE: [PATCH v3 01/15] hw/block/nvme: Define 64 bit cqe.result

2020-09-21 Thread Dmitry Fomichev
> -Original Message-
> From: Keith Busch 
> Sent: Friday, September 18, 2020 5:10 PM
> To: Klaus Jensen 
> Cc: Dmitry Fomichev ; Fam Zheng
> ; Kevin Wolf ; Damien Le Moal
> ; qemu-block@nongnu.org; Niklas Cassel
> ; Klaus Jensen ; qemu-
> de...@nongnu.org; Alistair Francis ; Philippe
> Mathieu-Daudé ; Matias Bjorling
> 
> Subject: Re: [PATCH v3 01/15] hw/block/nvme: Define 64 bit cqe.result
> 
> On Tue, Sep 15, 2020 at 10:48:49PM +0200, Klaus Jensen wrote:
> > On Sep 15 20:44, Dmitry Fomichev wrote:
> > >
> > > It is not necessary to change it in NST patch since result64 field is not 
> > > used
> > > in that patch. But if you insist, I can move it to NST patch :)
> >
> > You are right of course, but since it is a change to the "spec" related
> > data structures that go into include/block/nvme.h, I think it belongs in
> > "hw/block/nvme: Introduce the Namespace Types definitions".
> 
> Just my $.02, unless you're making external APIs, I really find it easier
> to review internal changes inline with the patches that use it.
> 
> Another example, there are patches in this series that introduce trace
> points, but the patch that use them come later. I find that harder to
> review since I need to look at two different patches to understand its
> value.
> 

I decided to split the trace events to be separate because I felt that it
could make the reviewing process simpler. Since the majority of the patches
in this series are on the large side, I thought it would be easier to open
them in separate sessions rather than to review a large single diff.
Let me know if you'd like me to fold the tracing stuff together with
the code...

DF

> I realize this particular patch is implementing a spec feature, but I'd
> prefer to see how it's used over making a round trip to the spec.



Re: [PATCH 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200921162949.553863-1-phi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200921162949.553863-1-phi...@redhat.com
Subject: [PATCH 0/6] block/nvme: Map doorbells pages write-only, remove magic 
from nvme_init

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
b0ef8b7 block/nvme: Replace magic value by SCALE_MS definition
1792c4d block/nvme: Use register definitions from 'block/nvme.h'
352be0c block/nvme: Drop NVMeRegs structure, directly use NvmeBar
7aa184a block/nvme: Reduce I/O registers scope
4b1effc block/nvme: Map doorbells pages write-only
7ae4653 util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()

=== OUTPUT BEGIN ===
1/6 Checking commit 7ae46536369a (util/vfio-helpers: Pass page protections to 
qemu_vfio_pci_map_bar())
2/6 Checking commit 4b1effcfcd2b (block/nvme: Map doorbells pages write-only)
3/6 Checking commit 7aa184a756dc (block/nvme: Reduce I/O registers scope)
4/6 Checking commit 352be0c97869 (block/nvme: Drop NVMeRegs structure, directly 
use NvmeBar)
ERROR: Use of volatile is usually wrong, please add a comment
#43: FILE: block/nvme.c:692:
+volatile NvmeBar *regs;

total: 1 errors, 0 warnings, 62 lines checked

Patch 4/6 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/6 Checking commit 1792c4da4bc2 (block/nvme: Use register definitions from 
'block/nvme.h')
6/6 Checking commit b0ef8b776eb1 (block/nvme: Replace magic value by SCALE_MS 
definition)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200921162949.553863-1-phi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

2020-09-21 Thread Eric Blake

On 9/21/20 11:23 AM, Stefan Hajnoczi wrote:

clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

   $ CC=clang CXX=clang++ ./configure ... && make
   ../util/async.c:79:17: error: address argument to atomic operation must be a 
pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with qemu_ so that atomic.h
and <stdatomic.h> can co-exist.

This patch was generated using:

   $ git diff | grep -o '\<atomic_[a-z0-9_]*\>' | sort -u >/tmp/changed_identifiers


Missing a step in the recipe: namely, you probably modified 
include/qemu/atomic*.h prior to running 'git diff' (so that you actually 
had input to feed to grep -o).  But spelling it 'git diff HEAD^ 
include/qemu/atomic*.h | ...' does indeed give me a sane list of 
identifiers that looks like what you touched in the rest of the patch.



   $ for identifier in $(</tmp/changed_identifiers); do

Also not quite the right recipe, based on the file name used in the line 
above.



sed -i "s%\<$identifier\>%qemu_$identifier%" $(git grep -l 
"\<$identifier\>") \
 done



Fortunately, running "git grep -c '\<atomic_'" on the 
pre-patch state of the tree gives me a list that is somewhat close to 
yours, where the obvious difference in line counts is explained by:



I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi 
---


First, focusing on the change summary:


  docs/devel/lockcnt.txt|  14 +-
  docs/devel/rcu.txt|  40 +--
  accel/tcg/atomic_template.h   |  20 +-
  include/block/aio-wait.h  |   4 +-
  include/block/aio.h   |   8 +-
  include/exec/cpu_ldst.h   |   2 +-
  include/exec/exec-all.h   |   6 +-
  include/exec/log.h|   6 +-
  include/exec/memory.h |   2 +-
  include/exec/ram_addr.h   |  27 +-
  include/exec/ramlist.h|   2 +-
  include/exec/tb-lookup.h  |   4 +-
  include/hw/core/cpu.h |   2 +-
  include/qemu/atomic.h | 258 +++---
  include/qemu/atomic128.h  |   6 +-


These two are the most important for the sake of this patch; perhaps 
it's worth a temporary override of your git orderfile if you have to 
respin, to list them first?



  include/qemu/bitops.h |   2 +-
  include/qemu/coroutine.h  |   2 +-
  include/qemu/log.h|   6 +-
  include/qemu/queue.h  |   8 +-
  include/qemu/rcu.h|  10 +-
  include/qemu/rcu_queue.h  | 103 +++---


Presumably, this and any other file with an odd number of changes was 
due to a difference in lines after reformatting long lines.



  include/qemu/seqlock.h|   8 +-

...


  util/stats64.c|  34 +-
  docs/devel/atomics.rst| 326 +-
  .../opensbi-riscv32-generic-fw_dynamic.elf| Bin 558668 -> 558698 bytes
  .../opensbi-riscv64-generic-fw_dynamic.elf| Bin 620424 -> 620454 bytes


Why are we regenerating .elf files in this patch?  Is your change even 
correct for those two files?



  scripts/kernel-doc|   2 +-
  tcg/aarch64/tcg-target.c.inc  |   2 +-
  tcg/mips/tcg-target.c.inc |   2 +-
  tcg/ppc/tcg-target.c.inc  |   6 +-
  tcg/sparc/tcg-target.c.inc|   5 +-
  135 files changed, 1195 insertions(+), 1130 deletions(-)


I don't spot accel/tcg/atomic_common.c.inc in the list (which declares 
functions such as atomic_trace_rmw_pre) - I guess that's intentional 
based on how you tried to edit only the identifiers you touched in 
include/qemu/atomic*.h.


For the rest of this patch, I only spot-checked in places, trusting the 
mechanical nature of this patch, and not spotting anything wrong in the 
places I checked.  But the two .elf files worry me enough to withhold 
R-b.  At the same time, because it's big, it will probably be a source 
of conflicts if we don't get it in soon, but can also be regenerated (if 
your recipe is corrected) without too much difficulty.  So I am in favor 
of the idea.



diff --git a/pc-bios/opensbi-riscv32-generic-fw_dynamic.elf 
b/pc-bios/opensbi-riscv32-generic-fw_dynamic.elf
index eb9ebf5674d3953ab25de6bf4db134abe904eb20..35b80512446dcf5c49424ae90caacf3c579be1b5 100644
GIT binary patch
delta 98
zcmX@pqx7mrsiB3jg{g(Pg=Gt?!4uZP)ZEhe?LdZ@5QIJ5?Hg+mgxS918!HgAZQt>Y
ceSHN~X?i|K5fhYsvykI97nHrFhGPaN0Hp#a^8f$<

delta 62
zcmaFWqjaW6siB3jg{g(Pg=Gt?!ISN#Pguo-rU!guEo

Re: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

2020-09-21 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200921162346.188997-1-stefa...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200921162346.188997-1-stefa...@redhat.com
Subject: [PATCH] qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200918103430.297167-1-th...@redhat.com -> 
patchew/20200918103430.297167-1-th...@redhat.com
Switched to a new branch 'test'
25ca702 qemu/atomic.h: prefix qemu_ to solve <stdatomic.h> collisions

=== OUTPUT BEGIN ===
ERROR: Macros with multiple statements should be enclosed in a do - while loop
#2968: FILE: include/qemu/atomic.h:152:
+#define qemu_atomic_rcu_read__nocheck(ptr, valptr)  \
 __atomic_load(ptr, valptr, __ATOMIC_RELAXED);   \
 smp_read_barrier_depends();

ERROR: space required before that '*' (ctx:VxB)
#3123: FILE: include/qemu/atomic.h:347:
+#define qemu_atomic_read__nocheck(p) (*(__typeof__(*(p)) volatile*) (p))
  ^

ERROR: Use of volatile is usually wrong, please add a comment
#3123: FILE: include/qemu/atomic.h:347:
+#define qemu_atomic_read__nocheck(p) (*(__typeof__(*(p)) volatile*) (p))

ERROR: space required before that '*' (ctx:VxB)
#3125: FILE: include/qemu/atomic.h:349:
+((*(__typeof__(*(p)) volatile*) (p)) = (i))
  ^

ERROR: Use of volatile is usually wrong, please add a comment
#3125: FILE: include/qemu/atomic.h:349:
+((*(__typeof__(*(p)) volatile*) (p)) = (i))

ERROR: space required after that ',' (ctx:VxV)
#3130: FILE: include/qemu/atomic.h:352:
+#define qemu_atomic_set(ptr, i) qemu_atomic_set__nocheck(ptr,i)
 ^

ERROR: memory barrier without comment
#3205: FILE: include/qemu/atomic.h:410:
+#define qemu_atomic_xchg(ptr, i) (smp_mb(), __sync_lock_test_and_set(ptr, i))

WARNING: Block comments use a leading /* on a separate line
#3280: FILE: include/qemu/atomic.h:462:
+/* qemu_atomic_mb_read/set semantics map Java volatile variables. They are

WARNING: Block comments use a leading /* on a separate line
#6394: FILE: util/bitmap.c:214:
+/* If we avoided the full barrier in qemu_atomic_or(), issue a

WARNING: Block comments use a leading /* on a separate line
#7430: FILE: util/rcu.c:85:
+/* Instead of using qemu_atomic_mb_set for index->waiting, and

WARNING: Block comments use a leading /* on a separate line
#7456: FILE: util/rcu.c:154:
+/* In either case, the qemu_atomic_mb_set below blocks stores that free

total: 7 errors, 4 warnings, 6507 lines checked

Commit 25ca7029b2f2 (qemu/atomic.h: prefix qemu_ to solve <stdatomic.h>
collisions) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200921162346.188997-1-stefa...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 03/17] hw/block/nvme: handle dma errors

2020-09-21 Thread Klaus Jensen
On Sep 21 09:50, Keith Busch wrote:
> On Fri, Sep 18, 2020 at 10:36:07PM +0200, Klaus Jensen wrote:
> > @@ -466,15 +476,21 @@ static void nvme_post_cqes(void *opaque)
> >  break;
> >  }
> >  
> > -QTAILQ_REMOVE(&cq->req_list, req, entry);
> >  sq = req->sq;
> >  req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
> >  req->cqe.sq_id = cpu_to_le16(sq->sqid);
> >  req->cqe.sq_head = cpu_to_le16(sq->head);
> >  addr = cq->dma_addr + cq->tail * n->cqe_size;
> > +ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> > +sizeof(req->cqe));
> > +if (ret) {
> > +trace_pci_nvme_err_addr_write(addr);
> > +timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> > +  500 * SCALE_MS);
> > +break;
> > +}
> > +QTAILQ_REMOVE(&cq->req_list, req, entry);
> 
> Is this error really ever transient such that a retry later may be
> successful? I didn't find a way that appeared that it could be. If not,
> this probably deserves a CFS condition rather than a retry.
> 

I'll admit that I did this patch with the blktests test case in mind,
and that retrying causes the test to pass.

But I think you are right that retrying might not be the right way to
"pass the test".

I tested and setting CFS also passes the test and now actually exercises
the reset path in the kernel. So wins all around.

I am thinking we do the same for a failed read in nvme_process_sq then?




Re: [PATCH v2 00/17] hw/block/nvme: multiple namespaces support

2020-09-21 Thread Keith Busch
Other than the comments on patches 3 and 9, the rest of the series looks
good to me.

Reviewed-by: Keith Busch 



Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Paolo Bonzini
I think we can just bite the bullet and bump the version number. Just like
not all boards are created equal in terms of migration compatibility,
neither are all devices.

Unfortunately pflash is among those that need some care, but we have much
more leeway with sdhci-pci.

Paolo

On Mon, Sep 21, 2020, 17:08 Markus Armbruster  wrote:

> Philippe Mathieu-Daudé  writes:
>
> > On 9/21/20 2:24 PM, Dr. David Alan Gilbert wrote:
> >> * Markus Armbruster (arm...@redhat.com) wrote:
> >>> Philippe Mathieu-Daudé  writes:
> >>>
>  +Paolo & Kevin.
> 
>  On 9/21/20 10:40 AM, Markus Armbruster wrote:
> > Philippe Mathieu-Daudé  writes:
> >
> >> As it is legal to WRITE/ERASE the address/block 0,
> >> change the value of this definition to an illegal
> >> address: UINT32_MAX.
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >> Cc: Dr. David Alan Gilbert 
> >> Cc: Markus Armbruster 
> >>
> >> Same problem I had with the pflash device last year...
> >> This break migration :(
> >> What is the best way to do this?
> >
> > Remind me: did we solve the problem with pflash, and if yes, how?
> 
>  No, we can't. The best I could do was add a comment, as this
>  is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
>  Document use of non-CFI compliant command '0x00'").
> 
>  I now consider the device in maintenance-only
>  mode and won't add any new features.
> 
>  I started working on a new implementation, hoping it can be a
>  drop in replacement. Laszlo still has hope that QEMU pflash
>  device will support sector locking so firmware developers could
>  test upgrading fw in VMs.
> 
>  Back to the SDcard, it might be less critical, so a migration
>  breaking change might be acceptable. I'm only aware of Paolo
>  and Kevin using this device for testing. Not sure of its
>  importance in production.
> >>>
> >>> Neither am I.
> >>>
> >>> Which machine types include this device by default?
> >>
> >> To me it looks like it's some of the ARM boards.
> >
> > My worry is TYPE_PCI_SDHCI ("sdhci-pci"):
> >
> > k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > k->device_id = PCI_DEVICE_ID_REDHAT_SDHCI;
> > k->class_id = PCI_CLASS_SYSTEM_SDHCI;
> >
> > config SDHCI_PCI
> > bool
> > default y if PCI_DEVICES
>
> Ah, now I remember.  Not the first time I wished it wouldn't exist...
>
> >>> How can a non-default device be added, and to which machine types?
> >>>
> >>> I gather the fix changes device state incompatibly.  Always, or only in
> >>> certain states?
>
> I think we need to answer this question.
>
> >>>  I'm asking because if device state remains compatible
> >>> most of the time, we might be able use subsection trickery to keep
> >>> migration working most of the time.  Has been done before, I think.
>
>


Re: [PATCH v4 9/9] migration: introduce snapshot-{save, load, delete} QMP commands

2020-09-21 Thread Eric Blake

On 9/16/20 3:17 AM, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


savevm, loadvm and delvm are some of the few HMP commands that have never
been converted to use QMP. The reasons for the lack of conversion are
that they blocked execution of the event thread, and the semantics
around choice of disks were ill-defined.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the problems, they are not a blocker to
all real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces new snapshot-{load,save,delete} commands to
QMP that are intended to replace the old HMP counterparts. The new
commands are given different names, because they will be using the new
QEMU job framework and thus will have diverging behaviour from the HMP
originals. It would thus be misleading to keep the same name.

While this design uses the generic job framework, the current impl is
still blocking. The intention is that the blocking problem is fixed later.
None the less applications using these new commands should assume that
they are asynchronous and thus wait for the job status change event to
indicate completion.

In addition to using the job framework, the new commands require the
caller to be explicit about all the block device nodes used in the
snapshot operations, with no built-in default heuristics in use.

Signed-off-by: Daniel P. Berrangé 

[...]

diff --git a/qapi/job.json b/qapi/job.json
index 280c2f76f1..b2cbb4fead 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -22,10 +22,17 @@
  #
  # @amend: image options amend job type, see "x-blockdev-amend" (since 5.1)
  #
+# @snapshot-load: snapshot load job type, see "snapshot-load" (since 5.2)
+#
+# @snapshot-save: snapshot save job type, see "snapshot-save" (since 5.2)
+#
+# @snapshot-delete: snapshot delete job type, see "snapshot-delete" (since 5.2)
+#
  # Since: 1.7
  ##
  { 'enum': 'JobType',
-  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend'] }
+  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
+   'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
  
  ##

  # @JobStatus:
diff --git a/qapi/migration.json b/qapi/migration.json
index 675f70bb67..b584c0be31 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1720,3 +1720,123 @@
  ##
  { 'event': 'UNPLUG_PRIMARY',
'data': { 'device-id': 'str' } }
+
+##
+# @snapshot-save:
+#
+# Save a VM snapshot
+#
+# @job-id: identifier for the newly created job
+# @tag: name of the snapshot to create
+# @devices: list of block device node names to save a snapshot to


Looks like you dropped the idea to also accept drive IDs.  Is that for
good, or would you like to add it later?


Is it necessary?  Several of our newer block interfaces have required 
node names, rather than permitting alternation.  If we rewrite the 
existing HMP commands to operate on top of the new QMP command, it is 
still possible for HMP to support drive names even when QMP does not.  I 
don't think the complexity of worrying about drive names is worth it; 
after all, the QMP command is new enough that the only libvirt that will 
use it is also a libvirt that knows how to use -blockdev, and thus node 
names are sufficient.


Yes, we can add drive ids later if I turn out to be wrong, but for now, 
I'm hoping their exclusion is intentional.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 09/17] hw/block/nvme: refactor aio submission

2020-09-21 Thread Klaus Jensen
On Sep 21 08:20, Keith Busch wrote:
> On Fri, Sep 18, 2020 at 10:36:13PM +0200, Klaus Jensen wrote:
> > +static inline bool nvme_req_is_write(NvmeRequest *req)
> > +{
> > +switch (req->cmd.opcode) {
> > +case NVME_CMD_WRITE:
> > +case NVME_CMD_WRITE_ZEROES:
> > +return true;
> > +default:
> > +return false;
> > +}
> > +}
> 
> It doesn't look like this is called for WRITE_ZEROES anywhere. It also
> looks like this helper is a bit unnecessary. We can reorganize some of
> the flow so that we're not checking the opcode twice:
> 

Ooops. Yes, that is a leftover from when I had a patch that combined
nvme_rw and nvme_write_zeroes in the series.

I'll remove it.




Re: [PATCH v4 6/9] migration: wire up support for snapshot device selection

2020-09-21 Thread Dr. David Alan Gilbert
* Daniel P. Berrangé (berra...@redhat.com) wrote:
> Modify load_snapshot/save_snapshot to accept the device list and vmstate
> node name parameters previously added to the block layer.
> 
> Signed-off-by: Daniel P. Berrangé 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  include/migration/snapshot.h | 12 ++--
>  migration/savevm.c   | 24 ++--
>  monitor/hmp-cmds.c   |  4 ++--
>  replay/replay-snapshot.c |  4 ++--
>  softmmu/vl.c |  2 +-
>  5 files changed, 29 insertions(+), 17 deletions(-)
> 
> diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
> index d7db1174ef..b2c72e0a1b 100644
> --- a/include/migration/snapshot.h
> +++ b/include/migration/snapshot.h
> @@ -15,7 +15,15 @@
>  #ifndef QEMU_MIGRATION_SNAPSHOT_H
>  #define QEMU_MIGRATION_SNAPSHOT_H
>  
> -int save_snapshot(const char *name, bool overwrite, Error **errp);
> -int load_snapshot(const char *name, Error **errp);
> +#include "qapi/qapi-builtin-types.h"
> +
> +int save_snapshot(const char *name, bool overwrite,
> +  const char *vmstate,
> +  bool has_devices, strList *devices,
> +  Error **errp);
> +int load_snapshot(const char *name,
> +  const char *vmstate,
> +  bool has_devices, strList *devices,
> +  Error **errp);
>  
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 2025e3e579..b3d2ce7045 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -43,6 +43,8 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-migration.h"
>  #include "qapi/qapi-commands-misc.h"
> +#include "qapi/clone-visitor.h"
> +#include "qapi/qapi-builtin-visit.h"
>  #include "qapi/qmp/qerror.h"
>  #include "qemu/error-report.h"
>  #include "sysemu/cpus.h"
> @@ -2658,7 +2660,8 @@ int qemu_load_device_state(QEMUFile *f)
>  return 0;
>  }
>  
> -int save_snapshot(const char *name, bool overwrite, Error **errp)
> +int save_snapshot(const char *name, bool overwrite, const char *vmstate,
> +  bool has_devices, strList *devices, Error **errp)
>  {
>  BlockDriverState *bs;
>  QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
> @@ -2680,18 +2683,18 @@ int save_snapshot(const char *name, bool overwrite, 
> Error **errp)
>  return ret;
>  }
>  
> -if (!bdrv_all_can_snapshot(false, NULL, errp)) {
> +if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
>  return ret;
>  }
>  
>  /* Delete old snapshots of the same name */
>  if (name && overwrite) {
> -if (bdrv_all_delete_snapshot(name, false, NULL, errp) < 0) {
> +if (bdrv_all_delete_snapshot(name, has_devices, devices, errp) < 0) {
>  return ret;
>  }
>  }
>  
> -bs = bdrv_all_find_vmstate_bs(NULL, false, NULL, errp);
> +bs = bdrv_all_find_vmstate_bs(vmstate, has_devices, devices, errp);
>  if (bs == NULL) {
>  return ret;
>  }
> @@ -2757,7 +2760,7 @@ int save_snapshot(const char *name, bool overwrite, 
> Error **errp)
>  aio_context_release(aio_context);
>  aio_context = NULL;
>  
> -ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, false, NULL, errp);
> +ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, has_devices, 
> devices, errp);
>  if (ret < 0) {
>  goto the_end;
>  }
> @@ -2858,7 +2861,8 @@ void qmp_xen_load_devices_state(const char *filename, 
> Error **errp)
>  migration_incoming_state_destroy();
>  }
>  
> -int load_snapshot(const char *name, Error **errp)
> +int load_snapshot(const char *name, const char *vmstate,
> +  bool has_devices, strList *devices, Error **errp)
>  {
>  BlockDriverState *bs_vm_state;
>  QEMUSnapshotInfo sn;
> @@ -2873,15 +2877,15 @@ int load_snapshot(const char *name, Error **errp)
>  return -1;
>  }
>  
> -if (!bdrv_all_can_snapshot(false, NULL, errp)) {
> +if (!bdrv_all_can_snapshot(has_devices, devices, errp)) {
>  return -1;
>  }
> -ret = bdrv_all_find_snapshot(name, false, NULL, errp);
> +ret = bdrv_all_find_snapshot(name, has_devices, devices, errp);
>  if (ret < 0) {
>  return -1;
>  }
>  
> -bs_vm_state = bdrv_all_find_vmstate_bs(NULL, false, NULL, errp);
> +bs_vm_state = bdrv_all_find_vmstate_bs(vmstate, has_devices, devices, 
> errp);
>  if (!bs_vm_state) {
>  return -1;
>  }
> @@ -2902,7 +2906,7 @@ int load_snapshot(const char *name, Error **errp)
>  /* Flush all IO requests so they don't interfere with the new state.  */
>  bdrv_drain_all_begin();
>  
> -ret = bdrv_all_goto_snapshot(name, false, NULL, errp);
> +ret = bdrv_all_goto_snapshot(name, has_devices, devices, errp);
>  if (ret < 0) {
>  goto err_drain;
>  }
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index c1b8533d0c..1191aac3ae 100644
> --- a/monitor/hmp-cmds.c
> +++

[PATCH] docs: Document the throttle block filter

2020-09-21 Thread Alberto Garcia
This filter was added back in 2017 for QEMU 2.11 but it was never
properly documented, so let's explain how it works and add a couple of
examples.

Signed-off-by: Alberto Garcia 
---
 docs/throttle.txt | 107 +-
 1 file changed, 106 insertions(+), 1 deletion(-)

diff --git a/docs/throttle.txt b/docs/throttle.txt
index cd4e109d39..c06d1b9662 100644
--- a/docs/throttle.txt
+++ b/docs/throttle.txt
@@ -1,6 +1,6 @@
 The QEMU throttling infrastructure
 ==
-Copyright (C) 2016 Igalia, S.L.
+Copyright (C) 2016,2020 Igalia, S.L.
 Author: Alberto Garcia 
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -253,3 +253,108 @@ up. After those 60 seconds the bucket will have leaked 60 
x 100 =
 
 Also, due to the way the algorithm works, longer burst can be done at
 a lower I/O rate, e.g. 1000 IOPS during 120 seconds.
+
+
+The 'throttle' block filter
+---
+Since QEMU 2.11 it is possible to configure the I/O limits using a
+'throttle' block filter. This filter uses the exact same throttling
+infrastructure described above but can be used anywhere in the node
+graph, allowing for more flexibility.
+
+The user can create an arbitrary number of filters and each one of
+them must be assigned to a group that contains the actual I/O limits.
+Different filters can use the same group so the limits are shared as
+described earlier in "Applying I/O limits to groups of disks".
+
+A group can be created using the object-add QMP function:
+
+   { "execute": "object-add",
+ "arguments": {
+   "qom-type": "throttle-group",
+   "id": "group0",
+   "props": {
+ "limits" : {
+   "iops-total": 1000
+   "bps-write": 2097152
+ }
+   }
+ }
+   }
+
+throttle-group has a 'limits' property (of type ThrottleLimits as
+defined in qapi/block-core.json) which can be set on creation or later
+with 'qom-set'.
+
+A throttle-group can also be created with the -object command line
+option but at the moment there is no way to pass a 'limits' parameter
+that contains a ThrottleLimits structure. The solution is to set the
+individual values directly, like in this example:
+
+   -object throttle-group,id=group0,x-iops-total=1000,x-bps-write=2097152
+
+Note however that this is not a stable API (hence the 'x-' prefixes) and
+can change or disappear in the future.
+
+Once we have a throttle-group we can use the throttle block filter,
+where the 'file' property must be set to the block device that we want
+to filter:
+
+   { "execute": "blockdev-add",
+ "arguments": {
+"options":  {
+   "driver": "qcow2",
+   "node-name": "disk0",
+   "file": {
+  "driver": "file",
+  "filename": "/path/to/disk.qcow2"
+   }
+}
+ }
+   }
+
+   { "execute": "blockdev-add",
+ "arguments": {
+"driver": "throttle",
+"node-name": "throttle0",
+"throttle-group": "group0",
+"file": "disk0"
+ }
+   }
+
+A similar setup can also be done with the command line, for example:
+
+   -drive driver=throttle,throttle-group=group0,
+  file.driver=qcow2,file.file.filename=/path/to/disk.qcow2
+
+The scenario described so far is very simple but the throttle block
+filter allows for more complex configurations. For example, let's say
+that we have three different drives and we want to set I/O limits for
+each one of them and an additional set of limits for the combined I/O
+of all three drives.
+
+First we would define all throttle groups, one for each one of the
+drives and one that would apply to all of them:
+
+   -object throttle-group,id=limits0,x-iops-total=2000
+   -object throttle-group,id=limits1,x-iops-total=2500
+   -object throttle-group,id=limits2,x-iops-total=3000
+   -object throttle-group,id=limits012,x-iops-total=4000
+
+Now we can define the drives, and for each one of them we use two
+chained throttle filters: the drive's own filter and the combined
+filter.
+
+   -drive driver=throttle,throttle-group=limits012,
+  file.driver=throttle,file.throttle-group=limits0
+  file.file.driver=qcow2,file.file.file.filename=/path/to/disk0.qcow2
+   -drive driver=throttle,throttle-group=limits012,
+  file.driver=throttle,file.throttle-group=limits1
+  file.file.driver=qcow2,file.file.file.filename=/path/to/disk1.qcow2
+   -drive driver=throttle,throttle-group=limits012,
+  file.driver=throttle,file.throttle-group=limits2
+  file.file.driver=qcow2,file.file.file.filename=/path/to/disk2.qcow2
+
+In this example the individual drives have IOPS limits of 2000, 2500
+and 3000 respectively but the total combined I/O can never exceed 4000
+IOPS.
-- 
2.20.1




Re: [PATCH v2 03/17] hw/block/nvme: handle dma errors

2020-09-21 Thread Keith Busch
On Fri, Sep 18, 2020 at 10:36:07PM +0200, Klaus Jensen wrote:
> @@ -466,15 +476,21 @@ static void nvme_post_cqes(void *opaque)
>  break;
>  }
>  
> -QTAILQ_REMOVE(&cq->req_list, req, entry);
>  sq = req->sq;
>  req->cqe.status = cpu_to_le16((req->status << 1) | cq->phase);
>  req->cqe.sq_id = cpu_to_le16(sq->sqid);
>  req->cqe.sq_head = cpu_to_le16(sq->head);
>  addr = cq->dma_addr + cq->tail * n->cqe_size;
> +ret = pci_dma_write(&n->parent_obj, addr, (void *)&req->cqe,
> +sizeof(req->cqe));
> +if (ret) {
> +trace_pci_nvme_err_addr_write(addr);
> +timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> +  500 * SCALE_MS);
> +break;
> +}
> +QTAILQ_REMOVE(&cq->req_list, req, entry);

Is this error really ever transient such that a retry later may be
successful? I didn't find a way that appeared that it could be. If not,
this probably deserves a CFS condition rather than a retry.



Re: [RFC] Move to C11 Atomics

2020-09-21 Thread Stefan Hajnoczi
On Mon, Sep 21, 2020 at 04:16:07PM +0200, Paolo Bonzini wrote:
> On 21/09/20 15:44, Stefan Hajnoczi wrote:
> > On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
> >> On 21/09/20 12:41, Stefan Hajnoczi wrote:
> > They don't provide atomic arithmetic/logic operations. The only
> > non-seq-cst ALU operation I see in atomic.h is
> > atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
> > an atomic ALU instruction).
> 
> Seq-cst is fine for RMW operations (arithmetic/logic, xchg, cmpxchg),
> also because they're usually less performance critical than loads and
> stores.  It's only loads and stores that give a false sense of
> correctness as in the above commit.

Okay. I've sent a patch to simply prefix atomic.h atomic_*() functions
with qemu_. That way they don't conflict with <stdatomic.h>.

Stefan




[PATCH 6/6] block/nvme: Replace magic value by SCALE_MS definition

2020-09-21 Thread Philippe Mathieu-Daudé
Use self-explicit SCALE_MS definition instead of magic value
(missed in similar commit e4f310fe7f5).

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index d18e897c7c9..b3dab1c4610 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -772,7 +772,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
CC_EN_MASK);
 /* Wait for CSTS.RDY = 1. */
 now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-deadline = now + timeout_ms * 1000000;
+deadline = now + timeout_ms * SCALE_MS;
 while (!NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to start (%"
-- 
2.26.2




[PATCH 4/6] block/nvme: Drop NVMeRegs structure, directly use NvmeBar

2020-09-21 Thread Philippe Mathieu-Daudé
NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar
directly.

This triggers a checkpatch.pl error:

  ERROR: Use of volatile is usually wrong, please add a comment
  #30: FILE: block/nvme.c:691:
  +volatile NvmeBar *regs;

This is a false positive as in our case we are using I/O registers,
so the 'volatile' use is justified.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index e517c7539ff..83bae6b2f13 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -81,11 +81,6 @@ typedef struct {
 QEMUBH  *completion_bh;
 } NVMeQueuePair;
 
-/* Memory mapped registers */
-typedef volatile struct {
-NvmeBar ctrl;
-} NVMeRegs;
-
 #define INDEX_ADMIN 0
 #define INDEX_IO(n) (1 + n)
 
@@ -694,7 +689,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 uint64_t timeout_ms;
 uint64_t deadline, now;
 Error *local_err = NULL;
-NVMeRegs *regs;
+volatile NvmeBar *regs;
 
 qemu_co_mutex_init(&s->dma_map_lock);
 qemu_co_queue_init(&s->dma_flush_queue);
@@ -722,7 +717,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 /* Perform initialize sequence as described in NVMe spec "7.6.1
  * Initialization". */
 
-cap = le64_to_cpu(regs->ctrl.cap);
+cap = le64_to_cpu(regs->cap);
 if (!(cap & (1ULL << 37))) {
 error_setg(errp, "Device doesn't support NVMe command set");
 ret = -EINVAL;
@@ -735,10 +730,10 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
 
 /* Reset device to get a clean state. */
-regs->ctrl.cc = cpu_to_le32(le32_to_cpu(regs->ctrl.cc) & 0xFE);
+regs->cc = cpu_to_le32(le32_to_cpu(regs->cc) & 0xFE);
 /* Wait for CSTS.RDY = 0. */
 deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
-while (le32_to_cpu(regs->ctrl.csts) & 0x1) {
+while (le32_to_cpu(regs->csts) & 0x1) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to reset (%"
  PRId64 " ms)",
@@ -766,18 +761,18 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 }
 s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
-regs->ctrl.aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
-regs->ctrl.asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
-regs->ctrl.acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
+regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
+regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
+regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
 
 /* After setting up all control registers we can enable device now. */
-regs->ctrl.cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
+regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
   (ctz32(NVME_SQ_ENTRY_BYTES) << 16) |
   0x1);
 /* Wait for CSTS.RDY = 1. */
 now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 deadline = now + timeout_ms * 1000000;
-while (!(le32_to_cpu(regs->ctrl.csts) & 0x1)) {
+while (!(le32_to_cpu(regs->csts) & 0x1)) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to start (%"
  PRId64 " ms)",
-- 
2.26.2




[PATCH 3/6] block/nvme: Reduce I/O registers scope

2020-09-21 Thread Philippe Mathieu-Daudé
We only access the I/O register in nvme_init().
Remove the reference in BDRVNVMeState and reduce its scope.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 3c834da8fec..e517c7539ff 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -98,7 +98,6 @@ enum {
 struct BDRVNVMeState {
 AioContext *aio_context;
 QEMUVFIOState *vfio;
-NVMeRegs *regs;
 /* Memory mapped registers */
 volatile struct {
 uint32_t sq_tail;
@@ -695,6 +694,7 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 uint64_t timeout_ms;
 uint64_t deadline, now;
 Error *local_err = NULL;
+NVMeRegs *regs;
 
 qemu_co_mutex_init(&s->dma_map_lock);
 qemu_co_queue_init(&s->dma_flush_queue);
@@ -713,16 +713,16 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 goto out;
 }
 
-s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
-PROT_READ | PROT_WRITE, errp);
-if (!s->regs) {
+regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
+ PROT_READ | PROT_WRITE, errp);
+if (!regs) {
 ret = -EINVAL;
 goto out;
 }
 /* Perform initialize sequence as described in NVMe spec "7.6.1
  * Initialization". */
 
-cap = le64_to_cpu(s->regs->ctrl.cap);
+cap = le64_to_cpu(regs->ctrl.cap);
 if (!(cap & (1ULL << 37))) {
 error_setg(errp, "Device doesn't support NVMe command set");
 ret = -EINVAL;
@@ -735,10 +735,10 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
 
 /* Reset device to get a clean state. */
-s->regs->ctrl.cc = cpu_to_le32(le32_to_cpu(s->regs->ctrl.cc) & 0xFE);
+regs->ctrl.cc = cpu_to_le32(le32_to_cpu(regs->ctrl.cc) & 0xFE);
 /* Wait for CSTS.RDY = 0. */
 deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
-while (le32_to_cpu(s->regs->ctrl.csts) & 0x1) {
+while (le32_to_cpu(regs->ctrl.csts) & 0x1) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to reset (%"
  PRId64 " ms)",
@@ -766,18 +766,18 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 }
 s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
-s->regs->ctrl.aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
-s->regs->ctrl.asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
-s->regs->ctrl.acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
+regs->ctrl.aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
+regs->ctrl.asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
+regs->ctrl.acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
 
 /* After setting up all control registers we can enable device now. */
-s->regs->ctrl.cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
+regs->ctrl.cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
   (ctz32(NVME_SQ_ENTRY_BYTES) << 16) |
   0x1);
 /* Wait for CSTS.RDY = 1. */
 now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 deadline = now + timeout_ms * 1000000;
-while (!(le32_to_cpu(s->regs->ctrl.csts) & 0x1)) {
+while (!(le32_to_cpu(regs->ctrl.csts) & 0x1)) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to start (%"
  PRId64 " ms)",
@@ -808,6 +808,10 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 ret = -EIO;
 }
 out:
+if (regs) {
+qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)regs, 0, sizeof(NvmeBar));
+}
+
 /* Cleaning up is done in nvme_file_open() upon error. */
 return ret;
 }
@@ -882,7 +886,6 @@ static void nvme_close(BlockDriverState *bs)
 event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
 qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
 sizeof(NvmeBar), NVME_DOORBELL_SIZE);
-qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, sizeof(NvmeBar));
 qemu_vfio_close(s->vfio);
 
 g_free(s->device);
-- 
2.26.2




[PATCH 5/6] block/nvme: Use register definitions from 'block/nvme.h'

2020-09-21 Thread Philippe Mathieu-Daudé
Use the NVMe register definitions from "block/nvme.h" which
ease reviewing the code a bit, as the names match the datasheet.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 83bae6b2f13..d18e897c7c9 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -718,22 +718,22 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
  * Initialization". */
 
 cap = le64_to_cpu(regs->cap);
-if (!(cap & (1ULL << 37))) {
+if (!NVME_CAP_CSS(cap)) {
 error_setg(errp, "Device doesn't support NVMe command set");
 ret = -EINVAL;
 goto out;
 }
 
-s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
-s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
+s->page_size = MAX(4096, 1 << NVME_CAP_MPSMIN(cap));
+s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
 bs->bl.opt_mem_alignment = s->page_size;
-timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 30000);
+timeout_ms = MIN(500 * NVME_CAP_TO(cap), 30000);
 
 /* Reset device to get a clean state. */
 regs->cc = cpu_to_le32(le32_to_cpu(regs->cc) & 0xFE);
 /* Wait for CSTS.RDY = 0. */
 deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
-while (le32_to_cpu(regs->csts) & 0x1) {
+while (NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to reset (%"
  PRId64 " ms)",
@@ -761,18 +761,19 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 }
 s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
-regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
+regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << AQA_ACQS_SHIFT) |
+(NVME_QUEUE_SIZE << AQA_ASQS_SHIFT));
 regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
 regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
 
 /* After setting up all control registers we can enable device now. */
-regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
-  (ctz32(NVME_SQ_ENTRY_BYTES) << 16) |
-  0x1);
+regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << CC_IOCQES_SHIFT) |
+   (ctz32(NVME_SQ_ENTRY_BYTES) << CC_IOSQES_SHIFT) |
+   CC_EN_MASK);
 /* Wait for CSTS.RDY = 1. */
 now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 deadline = now + timeout_ms * 100;
-while (!(le32_to_cpu(regs->csts) & 0x1)) {
+while (!NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to start (%"
  PRId64 " ms)",
-- 
2.26.2




[PATCH 2/6] block/nvme: Map doorbells pages write-only

2020-09-21 Thread Philippe Mathieu-Daudé
Per the datasheet sections 3.1.13/3.1.14:
  "The host should not read the doorbell registers."

As we don't need read access, map the doorbells with write-only
permission. We keep a reference to this mapped address in the
BDRVNVMeState structure.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nvme.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5a4dc6a722a..3c834da8fec 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -31,7 +31,7 @@
 #define NVME_SQ_ENTRY_BYTES 64
 #define NVME_CQ_ENTRY_BYTES 16
 #define NVME_QUEUE_SIZE 128
-#define NVME_BAR_SIZE 8192
+#define NVME_DOORBELL_SIZE 4096
 
 /*
  * We have to leave one slot empty as that is the full queue case where
@@ -84,10 +84,6 @@ typedef struct {
 /* Memory mapped registers */
 typedef volatile struct {
 NvmeBar ctrl;
-struct {
-uint32_t sq_tail;
-uint32_t cq_head;
-} doorbells[];
 } NVMeRegs;
 
 #define INDEX_ADMIN 0
@@ -103,6 +99,11 @@ struct BDRVNVMeState {
 AioContext *aio_context;
 QEMUVFIOState *vfio;
 NVMeRegs *regs;
+/* Memory mapped registers */
+volatile struct {
+uint32_t sq_tail;
+uint32_t cq_head;
+} *doorbells;
 /* The submission/completion queue pairs.
  * [0]: admin queue.
  * [1..]: io queues.
@@ -247,14 +248,14 @@ static NVMeQueuePair 
*nvme_create_queue_pair(BDRVNVMeState *s,
 error_propagate(errp, local_err);
 goto fail;
 }
-q->sq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].sq_tail;
+q->sq.doorbell = &s->doorbells[idx * s->doorbell_scale].sq_tail;
 
 nvme_init_queue(s, &q->cq, size, NVME_CQ_ENTRY_BYTES, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto fail;
 }
-q->cq.doorbell = &s->regs->doorbells[idx * s->doorbell_scale].cq_head;
+q->cq.doorbell = &s->doorbells[idx * s->doorbell_scale].cq_head;
 
 return q;
 fail:
@@ -712,13 +713,12 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 goto out;
 }
 
-s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE,
+s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, sizeof(NvmeBar),
 PROT_READ | PROT_WRITE, errp);
 if (!s->regs) {
 ret = -EINVAL;
 goto out;
 }
-
 /* Perform initialize sequence as described in NVMe spec "7.6.1
  * Initialization". */
 
@@ -748,6 +748,13 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 }
 }
 
+s->doorbells = qemu_vfio_pci_map_bar(s->vfio, 0, sizeof(NvmeBar),
+ NVME_DOORBELL_SIZE, PROT_WRITE, errp);
+if (!s->doorbells) {
+ret = -EINVAL;
+goto out;
+}
+
 /* Set up admin queue. */
 s->queues = g_new(NVMeQueuePair *, 1);
 s->queues[INDEX_ADMIN] = nvme_create_queue_pair(s, aio_context, 0,
@@ -873,7 +880,9 @@ static void nvme_close(BlockDriverState *bs)
&s->irq_notifier[MSIX_SHARED_IRQ_IDX],
false, NULL, NULL);
 event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]);
-qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, NVME_BAR_SIZE);
+qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->doorbells,
+sizeof(NvmeBar), NVME_DOORBELL_SIZE);
+qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)s->regs, 0, sizeof(NvmeBar));
 qemu_vfio_close(s->vfio);
 
 g_free(s->device);
-- 
2.26.2




[PATCH 0/6] block/nvme: Map doorbells pages write-only, remove magic from nvme_init

2020-09-21 Thread Philippe Mathieu-Daudé
Instead of mapping 8K of I/O + doorbells as RW during the whole
execution, map the I/O registers temporarily at init, and map the doorbells WO.

Replace various magic values with slightly more explicit macros from
"block/nvme.h".

Philippe Mathieu-Daudé (6):
  util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()
  block/nvme: Map doorbells pages write-only
  block/nvme: Reduce I/O registers scope
  block/nvme: Drop NVMeRegs structure, directly use NvmeBar
  block/nvme: Use register definitions from 'block/nvme.h'
  block/nvme: Replace magic value by SCALE_MS definition

 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c| 73 +
 util/vfio-helpers.c |  4 +-
 3 files changed, 44 insertions(+), 35 deletions(-)

-- 
2.26.2




[PATCH 1/6] util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()

2020-09-21 Thread Philippe Mathieu-Daudé
Pages are currently mapped READ/WRITE. To be able to use different
protections, add a new argument to qemu_vfio_pci_map_bar().

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/vfio-helpers.h | 2 +-
 block/nvme.c| 3 ++-
 util/vfio-helpers.c | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 1f057c2b9e4..4491c8e1a6e 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -22,7 +22,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
 void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
 void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
-uint64_t offset, uint64_t size,
+uint64_t offset, uint64_t size, int prot,
 Error **errp);
 void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
  uint64_t offset, uint64_t size);
diff --git a/block/nvme.c b/block/nvme.c
index f4f27b6da7d..5a4dc6a722a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -712,7 +712,8 @@ static int nvme_init(BlockDriverState *bs, const char 
*device, int namespace,
 goto out;
 }
 
-s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE, errp);
+s->regs = qemu_vfio_pci_map_bar(s->vfio, 0, 0, NVME_BAR_SIZE,
+PROT_READ | PROT_WRITE, errp);
 if (!s->regs) {
 ret = -EINVAL;
 goto out;
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 583bdfb36fc..9ac307e3d42 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -146,13 +146,13 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int 
index, Error **errp)
  * Map a PCI bar area.
  */
 void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
-uint64_t offset, uint64_t size,
+uint64_t offset, uint64_t size, int prot,
 Error **errp)
 {
 void *p;
 assert_bar_index_valid(s, index);
 p = mmap(NULL, MIN(size, s->bar_region_info[index].size - offset),
- PROT_READ | PROT_WRITE, MAP_SHARED,
+ prot, MAP_SHARED,
  s->device, s->bar_region_info[index].offset + offset);
 if (p == MAP_FAILED) {
 error_setg_errno(errp, errno, "Failed to map BAR region");
-- 
2.26.2




Re: [PATCH v2 07/17] hw/block/nvme: fix endian conversion

2020-09-21 Thread Philippe Mathieu-Daudé
On 9/18/20 10:36 PM, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The raw NLB field is a 16-bit value, so use le16_to_cpu instead of
> le32_to_cpu and cast to uint32_t before incrementing the value to not
> wrap around.
> 
> Signed-off-by: Klaus Jensen 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  hw/block/nvme.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 62db87460413..32267a3e4782 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -645,7 +645,7 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, 
> NvmeRequest *req)
>  NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
>  NvmeNamespace *ns = req->ns;
>  uint64_t slba = le64_to_cpu(rw->slba);
> -uint32_t nlb  = le16_to_cpu(rw->nlb) + 1;
> +uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
>  uint64_t offset = nvme_l2b(ns, slba);
>  uint32_t count = nvme_l2b(ns, nlb);
>  uint16_t status;
> @@ -669,7 +669,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
>  {
>  NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
>  NvmeNamespace *ns = req->ns;
> -uint32_t nlb  = le32_to_cpu(rw->nlb) + 1;
> +uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
>  uint64_t slba = le64_to_cpu(rw->slba);
>  
>  uint64_t data_size = nvme_l2b(ns, nlb);
> 




Re: [PATCH v2 01/17] hw/block/nvme: fix typo in trace event

2020-09-21 Thread Philippe Mathieu-Daudé
On 9/18/20 10:36 PM, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Fix a typo in the sq doorbell trace event.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/trace-events | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index ec94c56a4165..8ff4cbc4932c 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -70,7 +70,7 @@ pci_nvme_enqueue_req_completion(uint16_t cid, uint16_t 
> cqid, uint16_t status) "c
>  pci_nvme_mmio_read(uint64_t addr) "addr 0x%"PRIx64""
>  pci_nvme_mmio_write(uint64_t addr, uint64_t data) "addr 0x%"PRIx64" data 
> 0x%"PRIx64""
>  pci_nvme_mmio_doorbell_cq(uint16_t cqid, uint16_t new_head) "cqid %"PRIu16" 
> new_head %"PRIu16""
> -pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "cqid %"PRIu16" 
> new_tail %"PRIu16""
> +pci_nvme_mmio_doorbell_sq(uint16_t sqid, uint16_t new_tail) "sqid %"PRIu16" 
> new_tail %"PRIu16""

Oops.
Reviewed-by: Philippe Mathieu-Daudé 

>  pci_nvme_mmio_intm_set(uint64_t data, uint64_t new_mask) "wrote MMIO, 
> interrupt mask set, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  pci_nvme_mmio_intm_clr(uint64_t data, uint64_t new_mask) "wrote MMIO, 
> interrupt mask clr, data=0x%"PRIx64", new_mask=0x%"PRIx64""
>  pci_nvme_mmio_cfg(uint64_t data) "wrote MMIO, config controller 
> config=0x%"PRIx64""
> 




Re: [PATCH v2 09/17] hw/block/nvme: refactor aio submission

2020-09-21 Thread Keith Busch
On Fri, Sep 18, 2020 at 10:36:13PM +0200, Klaus Jensen wrote:
> +static inline bool nvme_req_is_write(NvmeRequest *req)
> +{
> +switch (req->cmd.opcode) {
> +case NVME_CMD_WRITE:
> +case NVME_CMD_WRITE_ZEROES:
> +return true;
> +default:
> +return false;
> +}
> +}

It doesn't look like this is called for WRITE_ZEROES anywhere. It also
looks like this helper is a bit unnecessary. We can reorganize some of
the flow so that we're not checking the opcode twice:

> +static uint16_t nvme_do_aio(BlockBackend *blk, int64_t offset, size_t len,
> +NvmeRequest *req)
> +{
> +BlockAcctCookie *acct = &req->acct;
> +BlockAcctStats *stats = blk_get_stats(blk);
> +
> +bool is_write;

bool is_write = false;

> +trace_pci_nvme_do_aio(nvme_cid(req), req->cmd.opcode,
> +  nvme_io_opc_str(req->cmd.opcode), blk_name(blk),
> +  offset, len);
> +
> +switch (req->cmd.opcode) {
> +case NVME_CMD_FLUSH:
> +block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
> +req->aiocb = blk_aio_flush(blk, nvme_rw_cb, req);
> +break;
> +
> +case NVME_CMD_WRITE_ZEROES:
> +block_acct_start(stats, acct, len, BLOCK_ACCT_WRITE);
> +req->aiocb = blk_aio_pwrite_zeroes(blk, offset, len,
> +   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
> +   req);
> +break;
> +
> +case NVME_CMD_READ:
> +case NVME_CMD_WRITE:
> +is_write = nvme_req_is_write(req);

   case NVME_CMD_WRITE:
   is_write = true; /* fallthrough */
   case NVME_CMD_READ:
   

> +
> +block_acct_start(stats, acct, len,
> + is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> +
> +if (req->qsg.sg) {
> +if (is_write) {
> +req->aiocb = dma_blk_write(blk, &req->qsg, offset,
> +   BDRV_SECTOR_SIZE, nvme_rw_cb, 
> req);
> +} else {
> +req->aiocb = dma_blk_read(blk, &req->qsg, offset,
> +  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
> +}
> +} else {
> +if (is_write) {
> +req->aiocb = blk_aio_pwritev(blk, offset, &req->iov, 0,
> + nvme_rw_cb, req);
> +} else {
> +req->aiocb = blk_aio_preadv(blk, offset, &req->iov, 0,
> +nvme_rw_cb, req);
> +}
> +}
> +
> +break;
> +}
> +
> +return NVME_NO_COMPLETE;
>  }



Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 9/21/20 2:24 PM, Dr. David Alan Gilbert wrote:
>> * Markus Armbruster (arm...@redhat.com) wrote:
>>> Philippe Mathieu-Daudé  writes:
>>>
 +Paolo & Kevin.

 On 9/21/20 10:40 AM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
>
>> As it is legal to WRITE/ERASE the address/block 0,
>> change the value of this definition to an illegal
>> address: UINT32_MAX.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Cc: Dr. David Alan Gilbert 
>> Cc: Markus Armbruster 
>>
>> Same problem I had with the pflash device last year...
>> This break migration :(
>> What is the best way to do this?
>
> Remind me: did we solve the problem with pflash, and if yes, how?

 No, we can't. The best I could do was add a comment, as this
 is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
 Document use of non-CFI compliant command '0x00'").

 I now consider the device in maintenance-only
 mode and won't add any new features.

 I started working on a new implementation, hoping it can be a
 drop-in replacement. Laszlo still has hope that QEMU pflash
 device will support sector locking so firmware developers could
 test upgrading fw in VMs.

 Back to the SDcard, it might be less critical, so a migration
 breaking change might be acceptable. I'm only aware of Paolo
 and Kevin using this device for testing. Not sure of its
 importance in production.
>>>
>>> Neither am I.
>>>
>>> Which machine types include this device by default?
>> 
>> To me it looks like it's some of the ARM boards.
>
> My worry is TYPE_PCI_SDHCI ("sdhci-pci"):
>
> k->vendor_id = PCI_VENDOR_ID_REDHAT;
> k->device_id = PCI_DEVICE_ID_REDHAT_SDHCI;
> k->class_id = PCI_CLASS_SYSTEM_SDHCI;
>
> config SDHCI_PCI
> bool
> default y if PCI_DEVICES

Ah, now I remember.  Not the first time I wished it wouldn't exist...

>>> How can a non-default device be added, and to which machine types?
>>>
>>> I gather the fix changes device state incompatibly.  Always, or only in
>>> certain states?

I think we need to answer this question.

>>>  I'm asking because if device state remains compatible
>>> most of the time, we might be able use subsection trickery to keep
>>> migration working most of the time.  Has been done before, I think.




[PATCH v4 2/2] qcow2: Skip copy-on-write when allocating a zero cluster

2020-09-21 Thread Alberto Garcia
Since commit c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a, when a write
request results in a new allocation, QEMU first checks whether the
rest of the cluster outside the written area contains only zeroes.

In that case, instead of doing a normal copy-on-write operation and
writing explicit zero buffers to disk, the code zeroes the whole
cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.

This improves performance very significantly but it only happens when
we are writing to an area that was completely unallocated before. Zero
clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
are therefore slower to allocate.

This happens because the code uses bdrv_is_allocated_above() rather
than bdrv_block_status_above(). The former is not as accurate for this
purpose but it is faster. However in the case of qcow2 the underlying
call does already report zero clusters just fine so there is no reason
why we cannot use that information.

After testing 4KB writes on an image that only contains zero clusters
this patch results in almost five times more IOPS.

Signed-off-by: Alberto Garcia 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block.h |  2 ++
 block/io.c| 27 +++
 block/qcow2.c | 35 +++
 3 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..26ada4445b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -496,6 +496,8 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
 bool include_base, int64_t offset, int64_t bytes,
 int64_t *pnum);
+int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
+  int64_t bytes);
 
 bool bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
diff --git a/block/io.c b/block/io.c
index ef1ea806e8..8084dec522 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2541,6 +2541,33 @@ int bdrv_block_status(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
offset, bytes, pnum, map, file);
 }
 
+/*
+ * Check @bs (and its backing chain) to see if the range defined
+ * by @offset and @bytes is known to read as zeroes.
+ * Return 1 if that is the case, 0 otherwise and -errno on error.
+ * This test is meant to be fast rather than accurate so returning 0
+ * does not guarantee non-zero data.
+ */
+int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
+  int64_t bytes)
+{
+int ret;
+int64_t pnum = bytes;
+
+if (!bytes) {
+return 1;
+}
+
+ret = bdrv_common_block_status_above(bs, NULL, false, offset,
+ bytes, &pnum, NULL, NULL);
+
+if (ret < 0) {
+return ret;
+}
+
+return (pnum == bytes) && (ret & BDRV_BLOCK_ZERO);
+}
+
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
int64_t bytes, int64_t *pnum)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index b05512718c..114526ce62 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2387,26 +2387,26 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
 return false;
 }
 
-static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
-int64_t nr;
-return !bytes ||
-(!bdrv_is_allocated_above(bs, NULL, false, offset, bytes, &nr) &&
- nr == bytes);
-}
-
-static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+/*
+ * Return 1 if the COW regions read as zeroes, 0 if not, < 0 on error.
+ * Note that returning 0 does not guarantee non-zero data.
+ */
+static int is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
 /*
  * This check is designed for optimization shortcut so it must be
  * efficient.
- * Instead of is_zero(), use is_unallocated() as it is faster (but not
- * as accurate and can result in false negatives).
+ * Instead of is_zero(), use bdrv_co_is_zero_fast() as it is
+ * faster (but not as accurate and can result in false negatives).
  */
-return is_unallocated(bs, m->offset + m->cow_start.offset,
-  m->cow_start.nb_bytes) &&
-   is_unallocated(bs, m->offset + m->cow_end.offset,
-  m->cow_end.nb_bytes);
+int ret = bdrv_co_is_zero_fast(bs, m->offset + m->cow_start.offset,
+   m->cow_start.nb_bytes);
+if (ret <= 0) {
+return ret;
+}
+
+return bdrv_co_is_zero_fast(bs, m->offset + m->cow_end.offset,
+m->cow_end.nb_bytes);
 }
 
 static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
@@ -2432,7 +2432,10 @@ static int

[PATCH v4 1/2] qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status()

2020-09-21 Thread Alberto Garcia
If a BlockDriverState supports backing files but has none then any
unallocated area reads back as zeroes.

bdrv_co_block_status() is only reporting this is if want_zero is true,
but this is an inexpensive test and there is no reason not to do it in
all cases.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Alberto Garcia 
---
 block/io.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index a2389bb38c..ef1ea806e8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2391,17 +2391,17 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 
 if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
 ret |= BDRV_BLOCK_ALLOCATED;
-} else if (want_zero && bs->drv->supports_backing) {
+} else if (bs->drv->supports_backing) {
 BlockDriverState *cow_bs = bdrv_cow_bs(bs);
 
-if (cow_bs) {
+if (!cow_bs) {
+ret |= BDRV_BLOCK_ZERO;
+} else if (want_zero) {
 int64_t size2 = bdrv_getlength(cow_bs);
 
 if (size2 >= 0 && offset >= size2) {
 ret |= BDRV_BLOCK_ZERO;
 }
-} else {
-ret |= BDRV_BLOCK_ZERO;
 }
 }
 
-- 
2.20.1




[PATCH v4 0/2] Skip copy-on-write when allocating a zero cluster

2020-09-21 Thread Alberto Garcia
I had to rebase the series due to conflicting changes on master. There
are no other differences.

Berto

v4:
- Fix rebase conflicts after cb8503159a

v3: https://lists.gnu.org/archive/html/qemu-block/2020-09/msg00912.html
- Add a new patch to improve the reporting of BDRV_BLOCK_ZERO [Vladimir]
- Rename function to bdrv_co_is_zero_fast() [Vladimir, Kevin]
- Don't call bdrv_common_block_status_above() if bytes == 0

v2: https://lists.gnu.org/archive/html/qemu-block/2020-08/msg01165.html
- Add new, simpler API: bdrv_is_unallocated_or_zero_above()

v1: https://lists.gnu.org/archive/html/qemu-block/2020-08/msg00403.html

Alberto Garcia (2):
  qcow2: Report BDRV_BLOCK_ZERO more accurately in
bdrv_co_block_status()
  qcow2: Skip copy-on-write when allocating a zero cluster

 include/block/block.h |  2 ++
 block/io.c| 35 +++
 block/qcow2.c | 35 +++
 3 files changed, 52 insertions(+), 20 deletions(-)

-- 
2.20.1




Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Philippe Mathieu-Daudé
On 9/21/20 2:24 PM, Dr. David Alan Gilbert wrote:
> * Markus Armbruster (arm...@redhat.com) wrote:
>> Philippe Mathieu-Daudé  writes:
>>
>>> +Paolo & Kevin.
>>>
>>> On 9/21/20 10:40 AM, Markus Armbruster wrote:
 Philippe Mathieu-Daudé  writes:

> As it is legal to WRITE/ERASE the address/block 0,
> change the value of this definition to an illegal
> address: UINT32_MAX.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Dr. David Alan Gilbert 
> Cc: Markus Armbruster 
>
> Same problem I had with the pflash device last year...
> This breaks migration :(
> What is the best way to do this?

 Remind me: did we solve the problem with pflash, and if yes, how?
>>>
>>> No, we can't. The best I could do was add a comment, as this
>>> is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
>>> Document use of non-CFI compliant command '0x00'").
>>>
>>> I now consider the device in maintenance-only
>>> mode and won't add any new features.
>>>
>>> I started working on a new implementation, hoping it can be a
>>> drop-in replacement. Laszlo still has hope that QEMU pflash
>>> device will support sector locking so firmware developers could
>>> test upgrading fw in VMs.
>>>
>>> Back to the SDcard, it might be less critical, so a migration
>>> breaking change might be acceptable. I'm only aware of Paolo
>>> and Kevin using this device for testing. Not sure of its
>>> importance in production.
>>
>> Neither am I.
>>
>> Which machine types include this device by default?
> 
> To me it looks like it's some of the ARM boards.

My worry is TYPE_PCI_SDHCI ("sdhci-pci"):

k->vendor_id = PCI_VENDOR_ID_REDHAT;
k->device_id = PCI_DEVICE_ID_REDHAT_SDHCI;
k->class_id = PCI_CLASS_SYSTEM_SDHCI;

config SDHCI_PCI
bool
default y if PCI_DEVICES

> 
> Dave
> 
>> How can a non-default device be added, and to which machine types?
>>
>> I gather the fix changes device state incompatibly.  Always, or only in
>> certain states?  I'm asking because if device state remains compatible
>> most of the time, we might be able use subsection trickery to keep
>> migration working most of the time.  Has been done before, I think.




Re: [RFC] Move to C11 Atomics

2020-09-21 Thread Paolo Bonzini
On 21/09/20 15:44, Stefan Hajnoczi wrote:
> On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
>> On 21/09/20 12:41, Stefan Hajnoczi wrote:
>>> The upshot is that all atomic variables in QEMU need to use C11 Atomic
>>> atomic_* types. This is a big change!
>>
>> The main issue with this is that C11 atomic types do too much magic,
>> including defaulting to seq-cst operations for loads and stores. As
>> documented in docs/devel/atomics.rst, seq-cst loads and stores are
>> almost never what you want (they're only a little below volatile :)):
> 
> I can't find where atomics.rst says seq-cst operations are almost never
> what you want?

Note that I'm talking only about loads and stores.  They are so rarely
what you want that we don't even provide a wrapper for them in
qemu/atomic.h.

  ``qemu/atomic.h`` also provides loads and stores that cannot be
  reordered with each other::

typeof(*ptr) atomic_mb_read(ptr)
void atomic_mb_set(ptr, val)

  However these do not provide sequential consistency and, in
  particular, they do not participate in the total ordering enforced by
  sequentially-consistent operations.  For this reason they are
  deprecated.  They should instead be replaced with any of the following
  (ordered from easiest to hardest):

  - accesses inside a mutex or spinlock

  - lightweight synchronization primitives such as ``QemuEvent``

  - RCU operations (``atomic_rcu_read``, ``atomic_rcu_set``) when
  publishing or accessing a new version of a data structure

  - other atomic accesses: ``atomic_read`` and ``atomic_load_acquire``
  for loads, ``atomic_set`` and ``atomic_store_release`` for stores,
  ``smp_mb`` to forbid reordering subsequent loads before a store.

where seq-cst loads and stores are again completely missing.  smp_mb is
there to replace them, as we did in commit 5710a3e0 ("async: use
explicit memory barriers").

>> we can use store-release/load-acquire
> 
> They don't provide atomic arithmetic/logic operations. The only
> non-seq-cst ALU operation I see in atomic.h is
> atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
> an atomic ALU instruction).

Seq-cst is fine for RMW operations (arithmetic/logic, xchg, cmpxchg),
also because they're usually less performance critical than loads and
stores.  It's only loads and stores that give a false sense of
correctness as in the above commit.

Paolo





Re: [PATCH v2 17/20] backup: move to block-copy

2020-09-21 Thread Vladimir Sementsov-Ogievskiy

23.07.2020 12:47, Max Reitz wrote:

On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:

This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block-copy.h |   9 +--
  block/backup.c | 145 +++--
  block/block-copy.c |  21 ++
  3 files changed, 83 insertions(+), 92 deletions(-)


This patch feels like it should be multiple ones.  I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.

More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch.  Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.

(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now.  We’d just stop using the progress
callback in backup, and remove it in one separate patch.)

[...]


diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
  BlockdevOnError on_source_error;
  BlockdevOnError on_target_error;
  uint64_t len;
-uint64_t bytes_read;
  int64_t cluster_size;
  int max_workers;
  int64_t max_chunk;
  
  BlockCopyState *bcs;

+
+BlockCopyCallState *bcs_call;


Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?


+int ret;
+bool error_is_read;
  } BackupBlockJob;
  
  static const BlockJobDriver backup_job_driver;
  


[...]


  static int coroutine_fn backup_loop(BackupBlockJob *job)
  {
-bool error_is_read;
-int64_t offset;
-BdrvDirtyBitmapIter *bdbi;
-int ret = 0;
+while (true) { /* retry loop */
+assert(!job->bcs_call);
+job->bcs_call = block_copy_async(job->bcs, 0,
+ QEMU_ALIGN_UP(job->len,
+   job->cluster_size),
+ true, job->max_workers, 
job->max_chunk,
+ backup_block_copy_callback, job);
  
-bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));

-while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-do {
-if (yield_and_check(job)) {
-goto out;
+while (job->bcs_call && !job->common.job.cancelled) {
+/* wait and handle pauses */


Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused?  I can’t see anyone doing that.

Well, or even just reentering the block-copy operation after
backup_pause() has cancelled it.  Is there some magic I’m missing?

Does pausing (which leads to cancelling the block-copy operation) just
yield to the CB being invoked, so the copy operation is considered done,
and then we just enter the next iteration of the loop and try again?


Yes, that's my idea: we cancel on pause and just run a new block_copy
operation on resume.


But cancelling the block-copy operation leads to it returning 0, AFAIR,
so...


Looks like it should be fixed, either by returning ECANCELED or by checking the bitmap.




+
+job_pause_point(&job->common.job);
+
+if (job->bcs_call && !job->common.job.cancelled) {
+job_yield(&job->common.job);
  }
-ret = backup_do_cow(job, offset, job->cluster_size, 
&error_is_read);
-if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
-   BLOCK_ERROR_ACTION_REPORT)
-{
-goto out;
+}
+
+if (!job->bcs_call && job->ret == 0) {
+/* Success */
+return 0;


...I would assume we return here when the job is paused.


+}
+
+if (job->common.job.cancelled) {
+if (job->bcs_call) {
+block_copy_cancel(job->bcs_call);
  }
-} while (ret < 0);
+return 0;
+}
+
+if (!job->bcs_call && job->ret < 0 &&


Is it even possible for bcs_call to be non-NULL here?


+(backup_error_action(job, job->error_is_read, -job->ret) ==
+ BLOCK_ERROR_ACTION_REPORT))
+{
+return job->ret;
+}


So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again?  That doesn’t sound very IGNORE-y to me.


Not the whole operation. We have the dirty bitmap: clusters that were
already copied are not touched again.



--
Best regards,
Vladimir



Re: [RFC] Move to C11 Atomics

2020-09-21 Thread Stefan Hajnoczi
On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
> On 21/09/20 12:41, Stefan Hajnoczi wrote:
> > The upshot is that all atomic variables in QEMU need to use C11 Atomic
> > atomic_* types. This is a big change!
> 
> The main issue with this is that C11 atomic types do too much magic,
> including defaulting to seq-cst operations for loads and stores. As
> documented in docs/devel/atomics.rst, seq-cst loads and stores are
> almost never what you want (they're only a little below volatile :)):

I can't find where atomics.rst says seq-cst operations are almost never
what you want?

I'm surprised that this isn't documented more prominently. Seq-cst
operations are the first type of atomic operation documented. It would
help to move them to the end of the document if they shouldn't be used.

> 1) in almost all cases where we do message passing between threads, we
> can use store-release/load-acquire

They don't provide atomic arithmetic/logic operations. The only
non-seq-cst ALU operation I see in atomic.h is
atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
an atomic ALU instruction).

> 2) when we want a full memory barrier, it's usually better to write the
> whole construct by hand, to avoid issues like the ones we had with
> ->notify_me.
> 
> In addition, atomics are complicated enough that you almost always want
> a sign that you are using them, even at the cost of heavier-looking
> code.  For example "atomic_read" straight ahead tells you "this is
> complicated but somebody has thought about it", while a naked access to
> a global variable or a struct field tells you "check which mutex is held
> when this function is called".  By allowing shortcuts for seq-cst loads
> and stores, C11 atomics fool you into thinking that seq-cst accesses are
> all you need, while in reality they create more problems than they solve. :(

Good point. But it's easy to error out on 'atomic_fetch_add()' and
insist on 'atomic_fetch_add_explicit()' (where the user specifies the
memory order) in checkpatch.pl.
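For illustration, call sites under that rule would spell out the memory order explicitly; the refcount helpers and the orders chosen here are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

/* The _explicit forms make the memory order visible at the call site,
 * which is the property checkpatch would enforce in this proposal.
 * This refcount example is a sketch, not QEMU code. */
static atomic_uint refcount;

static void ref(void)
{
    /* relaxed is enough for a pure counter increment */
    atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed);
}

static unsigned unref(void)
{
    /* release so prior writes are ordered before the drop;
     * returns the new count (old value minus one) */
    return atomic_fetch_sub_explicit(&refcount, 1, memory_order_release) - 1;
}
```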

atomic_load_explicit()/atomic_store_explicit() can be used instead of
bare variable accesses. Although here I don't know how to enforce that
except via typedef struct { atomic_int i; } AtomicInt.

Thanks for suggesting the namespace cleanup. I'll give that a try.

> > 1. Reimplement everything in terms of atomic_fetch_*() and other C11
> >Atomic APIs. For example atomic_fetch_inc() becomes
> >atomic_fetch_add(ptr, 1).
> > 
> > 2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
> >performing the operation twice (once atomic, then again non-atomic
> >using the fetched old atomic value). Yuck!
> 
> They're hidden in plain sight if you are okay with seq-cst operations
> (which we almost always are for RMW operations, unlike loads and
> stores): "x += 3" is an atomic_add_fetch(&x, 3).  However, they share
> with loads and stores the problem of being too magic; if you see "x +=
> 3" you don't think of it as something that's thread safe.

Ah, I see!

Stefan


signature.asc
Description: PGP signature


Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Dr. David Alan Gilbert
* Markus Armbruster (arm...@redhat.com) wrote:
> Philippe Mathieu-Daudé  writes:
> 
> > +Paolo & Kevin.
> >
> > On 9/21/20 10:40 AM, Markus Armbruster wrote:
> >> Philippe Mathieu-Daudé  writes:
> >> 
> >>> As it is legal to WRITE/ERASE the address/block 0,
> >>> change the value of this definition to an illegal
> >>> address: UINT32_MAX.
> >>>
> >>> Signed-off-by: Philippe Mathieu-Daudé 
> >>> ---
> >>> Cc: Dr. David Alan Gilbert 
> >>> Cc: Markus Armbruster 
> >>>
> >>> Same problem I had with the pflash device last year...
> >>> This breaks migration :(
> >>> What is the best way to do this?
> >> 
> >> Remind me: did we solve the problem with pflash, and if yes, how?
> >
> > No, we can't. The best I could do was add a comment, as this
> > is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
> > Document use of non-CFI compliant command '0x00'").
> >
> > I now consider the device in maintenance-only
> > mode and won't add any new features.
> >
> > I started working on a new implementation, hoping it can be a
> > drop-in replacement. Laszlo still has hope that the QEMU pflash
> > device will support sector locking so firmware developers could
> > test upgrading firmware in VMs.
> >
> > Back to the SD card, it might be less critical, so a
> > migration-breaking change might be acceptable. I'm only aware of
> > Paolo and Kevin using this device for testing. Not sure of its
> > importance in production.
> 
> Neither am I.
> 
> Which machine types include this device by default?

To me it looks like it's some of the ARM boards.

Dave

> How can a non-default device be added, and to which machine types?
> 
> I gather the fix changes device state incompatibly.  Always, or only in
> certain states?  I'm asking because if device state remains compatible
> most of the time, we might be able use subsection trickery to keep
> migration working most of the time.  Has been done before, I think.
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> +Paolo & Kevin.
>
> On 9/21/20 10:40 AM, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:
>> 
>>> As it is legal to WRITE/ERASE the address/block 0,
>>> change the value of this definition to an illegal
>>> address: UINT32_MAX.
>>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>> Cc: Dr. David Alan Gilbert 
>>> Cc: Markus Armbruster 
>>>
>>> Same problem I had with the pflash device last year...
> >>> This breaks migration :(
>>> What is the best way to do this?
>> 
>> Remind me: did we solve the problem with pflash, and if yes, how?
>
> No, we can't. The best I could do was add a comment, as this
> is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
> Document use of non-CFI compliant command '0x00'").
>
> I now consider the device in maintenance-only
> mode and won't add any new features.
>
> I started working on a new implementation, hoping it can be a
> drop-in replacement. Laszlo still has hope that the QEMU pflash
> device will support sector locking so firmware developers could
> test upgrading firmware in VMs.
>
> Back to the SD card, it might be less critical, so a
> migration-breaking change might be acceptable. I'm only aware of
> Paolo and Kevin using this device for testing. Not sure of its
> importance in production.

Neither am I.

Which machine types include this device by default?

How can a non-default device be added, and to which machine types?

I gather the fix changes device state incompatibly.  Always, or only in
certain states?  I'm asking because if device state remains compatible
most of the time, we might be able use subsection trickery to keep
migration working most of the time.  Has been done before, I think.




[PULL v3 06/15] vhost: recheck dev state in the vhost_migration_log routine

2020-09-21 Thread Michael S. Tsirkin
From: Dima Stepanov 

vhost-user devices can get a disconnect in the middle of the VHOST-USER
handshake at migration start. If a disconnect event happens right
before sending the next VHOST-USER command, the vhost_dev_set_log()
call in the vhost_migration_log() function will return an error. This
error leads to an assert() and closes the QEMU migration source process.
For vhost-user devices the disconnect event should not break the
migration process, because:
  - the device will be in the stopped state, so it will not be changed
during migration
  - if a reconnect is made, the migration log will be reinitialized as
part of the reconnect/init process:
#0  vhost_log_global_start (listener=0x563989cf7be0)
at hw/virtio/vhost.c:920
#1  0x56398603d8bc in listener_add_address_space 
(listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2664
#2  0x56398603dd30 in memory_listener_register (listener=0x563989cf7be0,
as=0x563986ea4340 )
at softmmu/memory.c:2740
#3  0x563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
busyloop_timeout=0)
at hw/virtio/vhost.c:1385
#4  0x563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
at hw/block/vhost-user-blk.c:315
#5  0x563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
event=CHR_EVENT_OPENED)
at hw/block/vhost-user-blk.c:379
Update the vhost-user-blk device with the internal started_vu field which
will be used for initialization (vhost_user_blk_start) and clean up
(vhost_user_blk_stop). This additional flag in the VhostUserBlk structure
will be used to track whether the device really needs to be stopped and
cleaned up on a vhost-user level.
The disconnect event will set the overall VHOST device (not vhost-user) to
the stopped state, so it can be used by the general vhost_migration_log
routine.
Such an approach could be propagated to the other vhost-user devices, but a
better idea is to use the same connect/disconnect code for all the
vhost-user devices.

This migration issue was slightly discussed earlier:
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html

Signed-off-by: Dima Stepanov 
Reviewed-by: Raphael Norwitz 
Message-Id: 
<9fbfba06791a87813fcee3e2315f0b904cc6789a.1599813294.git.dimas...@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/vhost-user-blk.h | 10 ++
 hw/block/vhost-user-blk.c  | 19 ---
 hw/virtio/vhost.c  | 27 ---
 3 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost-user-blk.h 
b/include/hw/virtio/vhost-user-blk.h
index dc40ab6f11..5d86ff2e87 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -42,7 +42,17 @@ struct VHostUserBlk {
 VhostUserState vhost_user;
 struct vhost_virtqueue *vhost_vqs;
 VirtQueue **virtqs;
+
+/*
+ * There are at least two steps of initialization of the
+ * vhost-user device. The first is a "connect" step and
+ * second is a "start" step. Make a separation between
+ * those initialization phases by using two fields.
+ */
+/* vhost_user_blk_connect/vhost_user_blk_disconnect */
 bool connected;
+/* vhost_user_blk_start/vhost_user_blk_stop */
+bool started_vu;
 };
 
 #endif
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 39aec42dae..a076b1e54d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -150,6 +150,7 @@ static int vhost_user_blk_start(VirtIODevice *vdev)
 error_report("Error starting vhost: %d", -ret);
 goto err_guest_notifiers;
 }
+s->started_vu = true;
 
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here. virtio-pci will do the right thing by
@@ -175,6 +176,11 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 int ret;
 
+if (!s->started_vu) {
+return;
+}
+s->started_vu = false;
+
 if (!k->set_guest_notifiers) {
 return;
 }
@@ -341,9 +347,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
 }
 s->connected = false;
 
-if (s->dev.started) {
-vhost_user_blk_stop(vdev);
-}
+vhost_user_blk_stop(vdev);
 
 vhost_dev_cleanup(&s->dev);
 }
@@ -399,6 +403,15 @@ static void vhost_user_blk_event(void *opaque, 
QEMUChrEvent event)
 NULL, NULL, false);
 aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
 }
+
+/*
+ * Move vhost device to the stopped state. The vhost-user device
+ * will be cleaned up and disconnected in a BH. This can be useful in
+ * the vhost mi

Re: [RFC] Move to C11 Atomics

2020-09-21 Thread Paolo Bonzini
On 21/09/20 12:41, Stefan Hajnoczi wrote:
> The upshot is that all atomic variables in QEMU need to use C11 Atomic
> atomic_* types. This is a big change!

The main issue with this is that C11 atomic types do too much magic,
including defaulting to seq-cst operations for loads and stores. As
documented in docs/devel/atomics.rst, seq-cst loads and stores are
almost never what you want (they're only a little below volatile :)):

1) in almost all cases where we do message passing between threads, we
can use store-release/load-acquire
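The release/acquire message-passing pattern in point 1 can be sketched as follows; the helpers and names are illustrative, not QEMU code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of release/acquire message passing between threads. */
static int payload;          /* plain data, published via the flag */
static atomic_bool ready;

/* Producer: write the data, then set the flag with release semantics so
 * the payload write is ordered before the flag becomes visible. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag pairs with the release store, so
 * once 'ready' is observed, the payload read below sees the new value.
 * Returns 1 and fills *out if the message has arrived, 0 otherwise. */
static int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return 0;
    }
    *out = payload;
    return 1;
}
```

Neither access needs seq-cst; the release/acquire pair is the whole ordering contract.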

2) when we want a full memory barrier, it's usually better to write the
whole construct by hand, to avoid issues like the ones we had with
->notify_me.

In addition, atomics are complicated enough that you almost always want
a sign that you are using them, even at the cost of heavier-looking
code.  For example "atomic_read" straight ahead tells you "this is
complicated but somebody has thought about it", while a naked access to
a global variable or a struct field tells you "check which mutex is held
when this function is called".  By allowing shortcuts for seq-cst loads
and stores, C11 atomics fool you into thinking that seq-cst accesses are
all you need, while in reality they create more problems than they solve. :(

So if we need to make a big change in the implementation of atomics,
what we should do is to wrap variables that are accessed atomically with
a struct.  This would enforce using atomic_read/atomic_set to access
them.  These structs could be types like stdatomic.h's atomic_char,
though probably we'd name them something like AtomicChar.
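A minimal sketch of that wrapper idea, with illustrative names (the accessors loosely mirror what qemu/atomic.h's atomic_read/atomic_set do today):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hiding the value in a struct makes plain reads/writes a compile error,
 * so callers must go through the accessors. Names are hypothetical. */
typedef struct {
    atomic_int v;
} AtomicInt;

static inline int qatomic_read(AtomicInt *a)
{
    /* relaxed load, like qemu/atomic.h's atomic_read() */
    return atomic_load_explicit(&a->v, memory_order_relaxed);
}

static inline void qatomic_set(AtomicInt *a, int val)
{
    /* relaxed store, like qemu/atomic.h's atomic_set() */
    atomic_store_explicit(&a->v, val, memory_order_relaxed);
}
```

The point is purely defensive: the wrapper makes atomic accesses visible in the code, rather than adding any new semantics.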

It would move us further from C11; however, it would also make it harder
to write buggy code.

If there is a problem of namespace pollution between qemu/atomic.h and
C11 atomics we could also fix that.

> 1. Reimplement everything in terms of atomic_fetch_*() and other C11
>Atomic APIs. For example atomic_fetch_inc() becomes
>atomic_fetch_add(ptr, 1).
> 
> 2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
>performing the operation twice (once atomic, then again non-atomic
>using the fetched old atomic value). Yuck!

They're hidden in plain sight if you are okay with seq-cst operations
(which we almost always are for RMW operations, unlike loads and
stores): "x += 3" is an atomic_add_fetch(&x, 3).  However, they share
with loads and stores the problem of being too magic; if you see "x +=
3" you don't think of it as something that's thread safe.

Sorry for being so negative.  Unfortunately while the C11 memory model
is a great addition, in my not so humble opinion stdatomic.h was
designed without enough experience with atomics and with the memory
model itself (it even took people like Hans Boehm and Paul McKenney to
convince the committee that non-seq-cst atomics have a place).

Thanks,

Paolo

> 3. Can everything in atomic.h really be converted to C11 Atomics? I'm
>not sure yet :(.
> 
> Better ideas?
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio.h   |  2 +-
>  include/qemu/atomic.h | 79 +--
>  include/qemu/bitops.h |  2 +-
>  include/qom/object.h  |  3 +-
>  util/aio-posix.h  |  2 +-
>  util/async.c  |  2 +-
>  meson.build   |  3 ++
>  7 files changed, 70 insertions(+), 23 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index b2f703fa3f..466c058880 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -220,7 +220,7 @@ struct AioContext {
>   */
>  QEMUTimerListGroup tlg;
>  
> -int external_disable_cnt;
> +atomic_int external_disable_cnt;
>  
>  /* Number of AioHandlers without .io_poll() */
>  int poll_disable_cnt;
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index ff72db5115..4fbbd5f362 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -15,6 +15,32 @@
>  #ifndef QEMU_ATOMIC_H
>  #define QEMU_ATOMIC_H
>  
> +#include 
> +#include 
> +
> +/* Use C11 Atomics if possible, otherwise fall back to custom definitions */
> +#ifdef CONFIG_STDATOMIC_H
> +#include <stdatomic.h>
> +#else
> +/* Commonly used types from C11 "7.17.6 Atomic integer types" */
> +typedef bool atomic_bool;
> +typedef char atomic_char;
> +typedef signed char atomic_schar;
> +typedef unsigned char atomic_uchar;
> +typedef short atomic_short;
> +typedef unsigned short atomic_ushort;
> +typedef int atomic_int;
> +typedef unsigned int atomic_uint;
> +typedef long atomic_long;
> +typedef unsigned long atomic_ulong;
> +typedef long long atomic_llong;
> +typedef unsigned long long atomic_ullong;
> +typedef intptr_t atomic_intptr_t;
> +typedef uintptr_t atomic_uintptr_t;
> +typedef size_t atomic_size_t;
> +typedef ptrdiff_t atomic_ptrdiff_t;
> +#endif
> +
>  /* Compiler barrier */
>  #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
>  
> @@ -205,10 +231,6 @@
>  atomic_cmpxchg__nocheck(ptr, old, new); \
>  })
>  
> -/* Provid

Re: [PATCH v1 0/2] Add timeout mechanism to qmp actions

2020-09-21 Thread Fam Zheng
On 2020-09-19 10:22, Zhenyu Ye wrote:
> On 2020/9/18 22:06, Fam Zheng wrote:
> > 
> > I can see how blocking in a slow io_submit can cause trouble for main
> > thread. I think one way to fix it (until it's made truly async in new
> > kernels) is moving the io_submit call to thread pool, and wrapped in a
> > coroutine, perhaps.
> >
> 
> I'm not sure if any other operation will block the main thread, other
> than io_submit().

Then that's a problem with io_submit which should be fixed. Or more
precisely, that is a long-held lock that we should avoid in QEMU's event
loops.

> 
> > I'm not sure qmp timeout is a complete solution because we would still
> > suffer from a blocked state for a period, in this exact situation before
> > the timeout.
> 
> Anyway, the qmp timeout may be the last measure to prevent the VM
> soft lockup. 

Maybe, but I don't think baking such a workaround into the QMP API is a
good idea. No QMP command should be synchronously long-running, so
having a timeout parameter is simply the wrong design.

Thanks,

Fam



Re: [PATCH] block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition

2020-09-21 Thread Alberto Garcia
On Mon 21 Sep 2020 01:01:45 PM CEST, Philippe Mathieu-Daudé wrote:
> Use self-explicit NANOSECONDS_PER_SECOND definition instead
> of magic value.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alberto Garcia 

Berto



[PATCH] block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition

2020-09-21 Thread Philippe Mathieu-Daudé
Use self-explicit NANOSECONDS_PER_SECOND definition instead
of magic value.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/sheepdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index cbbebc1aaf2..cbc655a1a05 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -740,7 +740,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
 if (s->fd < 0) {
 trace_sheepdog_reconnect_to_sdog();
 error_report_err(local_err);
-qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000000ULL);
+qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, NANOSECONDS_PER_SECOND);
 }
 };
 
-- 
2.26.2




[RFC] Move to C11 Atomics

2020-09-21 Thread Stefan Hajnoczi
This patch is incomplete but I am looking for feedback on the approach
before fully implementing it (which will involve lots of changes).

QEMU's atomic.h provides atomic operations and is intended to work with
or without <stdatomic.h>. Some of the atomic.h APIs are from C11 Atomics
while others are Linux-inspired or QEMU-specific extensions.

atomic.h works fine with gcc but clang enforces the following:

  atomic_fetch_add() and friends must use C11 Atomic atomic_* types.

  __atomic_fetch_add() and friends must use direct types (int, etc) and
  NOT C11 Atomic types.

The consequences are:

1. atomic_fetch_*() produces compilation errors since QEMU code uses
   direct types and not C11 Atomic types.

2. atomic_fetch_*() cannot be used on the same variables as
   __atomic_fetch_*() because they support different types. This is a
   problem because QEMU's atomic.h builds on __atomic_fetch_*() and code
   expects to use both atomic_fetch_*() and QEMU atomic.h APIs on the
   same variables.

I would like to move QEMU to C11 Atomics, removing QEMU-specific APIs
which have C11 equivalents. The new atomic.h would #include
<stdatomic.h> and define additional APIs on top. It also needs to carry
a <stdatomic.h> fallback implementation for RHEL 7 because gcc does not
have <stdatomic.h> there.

The upshot is that all atomic variables in QEMU need to use C11 Atomic
atomic_* types. This is a big change!

Also, existing atomic.h APIs that use __atomic_fetch_*() need to move to
C11 Atomics instead so that they take atomic_* types.

This patch contains a few examples of this conversion. Things to note:

1. Reimplement everything in terms of atomic_fetch_*() and other C11
   Atomic APIs. For example atomic_fetch_inc() becomes
   atomic_fetch_add(ptr, 1).

2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
   performing the operation twice (once atomic, then again non-atomic
   using the fetched old atomic value). Yuck!

3. Can everything in atomic.h really be converted to C11 Atomics? I'm
   not sure yet :(.

Better ideas?

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h   |  2 +-
 include/qemu/atomic.h | 79 +--
 include/qemu/bitops.h |  2 +-
 include/qom/object.h  |  3 +-
 util/aio-posix.h  |  2 +-
 util/async.c  |  2 +-
 meson.build   |  3 ++
 7 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index b2f703fa3f..466c058880 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -220,7 +220,7 @@ struct AioContext {
  */
 QEMUTimerListGroup tlg;
 
-int external_disable_cnt;
+atomic_int external_disable_cnt;
 
 /* Number of AioHandlers without .io_poll() */
 int poll_disable_cnt;
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index ff72db5115..4fbbd5f362 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -15,6 +15,32 @@
 #ifndef QEMU_ATOMIC_H
 #define QEMU_ATOMIC_H
 
+#include 
+#include 
+
+/* Use C11 Atomics if possible, otherwise fall back to custom definitions */
+#ifdef CONFIG_STDATOMIC_H
+#include <stdatomic.h>
+#else
+/* Commonly used types from C11 "7.17.6 Atomic integer types" */
+typedef bool atomic_bool;
+typedef char atomic_char;
+typedef signed char atomic_schar;
+typedef unsigned char atomic_uchar;
+typedef short atomic_short;
+typedef unsigned short atomic_ushort;
+typedef int atomic_int;
+typedef unsigned int atomic_uint;
+typedef long atomic_long;
+typedef unsigned long atomic_ulong;
+typedef long long atomic_llong;
+typedef unsigned long long atomic_ullong;
+typedef intptr_t atomic_intptr_t;
+typedef uintptr_t atomic_uintptr_t;
+typedef size_t atomic_size_t;
+typedef ptrdiff_t atomic_ptrdiff_t;
+#endif
+
 /* Compiler barrier */
 #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
 
@@ -205,10 +231,6 @@
 atomic_cmpxchg__nocheck(ptr, old, new); \
 })
 
-/* Provide shorter names for GCC atomic builtins, return old value */
-#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
-
 #ifndef atomic_fetch_add
 #define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
@@ -217,22 +239,41 @@
 #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 #endif
 
-#define atomic_inc_fetch(ptr)__atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_dec_fetch(ptr)__atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
+/* Provide shorter names

Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Philippe Mathieu-Daudé
+Paolo & Kevin.

On 9/21/20 10:40 AM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
> 
>> As it is legal to WRITE/ERASE the address/block 0,
>> change the value of this definition to an illegal
>> address: UINT32_MAX.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Cc: Dr. David Alan Gilbert 
>> Cc: Markus Armbruster 
>>
>> Same problem I had with the pflash device last year...
>> This breaks migration :(
>> What is the best way to do this?
> 
> Remind me: did we solve the problem with pflash, and if yes, how?

No, we can't. The best I could do was add a comment, as this
is not fixable. See commit aba53a12bd5: ("hw/block/pflash_cfi01:
Document use of non-CFI compliant command '0x00'").

I now consider the device in maintenance-only
mode and won't add any new features.

I started working on a new implementation, hoping it can be a
drop-in replacement. Laszlo still has hope that the QEMU pflash
device will support sector locking so firmware developers could
test upgrading firmware in VMs.

Back to the SD card, it might be less critical, so a
migration-breaking change might be acceptable. I'm only aware of
Paolo and Kevin using this device for testing. Not sure of its
importance in production.

> 
>> ---
>>  hw/sd/sd.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
>> index 30ae435d669..4c05152f189 100644
>> --- a/hw/sd/sd.c
>> +++ b/hw/sd/sd.c
>> @@ -53,7 +53,7 @@
>>  
>>  #define SDSC_MAX_CAPACITY   (2 * GiB)
>>  
>> -#define INVALID_ADDRESS 0
>> +#define INVALID_ADDRESS UINT32_MAX
>>  
>>  typedef enum {
>>  sd_r0 = 0,/* no response */
>> @@ -666,8 +666,8 @@ static int sd_vmstate_pre_load(void *opaque)
>>  
>>  static const VMStateDescription sd_vmstate = {
>>  .name = "sd-card",
>> -.version_id = 1,
>> -.minimum_version_id = 1,
>> +.version_id = 2,
>> +.minimum_version_id = 2,
>>  .pre_load = sd_vmstate_pre_load,
>>  .fields = (VMStateField[]) {
>>  VMSTATE_UINT32(mode, SDState),
> 
> 




Re: [PATCH v2 04/20] block/block-copy: More explicit call_state

2020-09-21 Thread Max Reitz
On 18.09.20 22:11, Vladimir Sementsov-Ogievskiy wrote:
> 17.07.2020 16:45, Max Reitz wrote:
>> On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>> Refactor common path to use BlockCopyCallState pointer as parameter, to
>>> prepare it for use in asynchronous block-copy (at least, we'll need to
>>> run block-copy in a coroutine, passing the whole parameters as one
>>> pointer).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/block-copy.c | 51 ++
>>>   1 file changed, 38 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>> index 43a018d190..75882a094c 100644
>>> --- a/block/block-copy.c
>>> +++ b/block/block-copy.c
>>
>> [...]
>>
>>> @@ -646,16 +653,16 @@ out:
>>>    * it means that some I/O operation failed in context of _this_
>>> block_copy call,
>>>    * not some parallel operation.
>>>    */
>>> -int coroutine_fn block_copy(BlockCopyState *s, int64_t offset,
>>> int64_t bytes,
>>> -    bool *error_is_read)
>>> +static int coroutine_fn block_copy_common(BlockCopyCallState
>>> *call_state)
>>>   {
>>>   int ret;
>>>     do {
>>> -    ret = block_copy_dirty_clusters(s, offset, bytes,
>>> error_is_read);
>>> +    ret = block_copy_dirty_clusters(call_state);
>>
>> It’s possible that much of this code will change in a future patch of
>> this series, but as it is, it might be nice to make
>> block_copy_dirty_clusters’s argument a const pointer so it’s clear that
>> the call to block_copy_wait_one() below will use the original @offset
>> and @bytes values.
> 
> Hm. I'm trying this, and it doesn't work:
> 
> block_copy_task_entry() wants to change call_state:
> 
>    t->call_state->failed = true;

Too bad :)

>> *shrug*
>>
>> Reviewed-by: Max Reitz 
>>
>>>     if (ret == 0) {
>>> -    ret = block_copy_wait_one(s, offset, bytes);
>>> +    ret = block_copy_wait_one(call_state->s,
>>> call_state->offset,
>>> +  call_state->bytes);
>>>   }
>>>     /*
>>
> 
> 




signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 3/6] hw/sd/sdcard: Do not use legal address '0' for INVALID_ADDRESS

2020-09-21 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> As it is legal to WRITE/ERASE the address/block 0,
> change the value of this definition to an illegal
> address: UINT32_MAX.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Cc: Dr. David Alan Gilbert 
> Cc: Markus Armbruster 
>
> Same problem I had with the pflash device last year...
> This breaks migration :(
> What is the best way to do this?

Remind me: did we solve the problem with pflash, and if yes, how?

> ---
>  hw/sd/sd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 30ae435d669..4c05152f189 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -53,7 +53,7 @@
>  
>  #define SDSC_MAX_CAPACITY   (2 * GiB)
>  
> -#define INVALID_ADDRESS 0
> +#define INVALID_ADDRESS UINT32_MAX
>  
>  typedef enum {
>  sd_r0 = 0,/* no response */
> @@ -666,8 +666,8 @@ static int sd_vmstate_pre_load(void *opaque)
>  
>  static const VMStateDescription sd_vmstate = {
>  .name = "sd-card",
> -.version_id = 1,
> -.minimum_version_id = 1,
> +.version_id = 2,
> +.minimum_version_id = 2,
>  .pre_load = sd_vmstate_pre_load,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(mode, SDState),




RE: [PATCH] xen: rework pci_piix3_xen_ide_unplug

2020-09-21 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD 
> Sent: 18 September 2020 15:53
> To: qemu-de...@nongnu.org
> Cc: Paul Durrant ; Stefano Stabellini ; 
> Anthony PERARD
> ; John Snow ; 
> qemu-block@nongnu.org
> Subject: [PATCH] xen: rework pci_piix3_xen_ide_unplug
> 
> This is to allow IDE disks to be unplugged when added to QEMU via:
> -drive file=/root/disk_file,if=none,id=ide-disk0,format=raw
> -device ide-hd,drive=ide-disk0,bus=ide.0,unit=0
> 
> as the current code only works for disks added with:
> -drive file=/root/disk_file,if=ide,index=0,media=disk,format=raw
> 
> Since the code already has the IDE controller as `dev`, we don't need
> to use the legacy DriveInfo to find all the drives we want to unplug.
> We can simply use `blk` from the controller, as it was already
> assumed to be the same, by setting it to NULL.
> 
> Signed-off-by: Anthony PERARD 
> ---
>  hw/ide/piix.c | 27 +--
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index b402a936362b..16fcbe58d6fe 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -164,30 +164,29 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
> **errp)
>  int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
>  {
>  PCIIDEState *pci_ide;
> -DriveInfo *di;
>  int i;
>  IDEDevice *idedev;
> +IDEBus *idebus;
> +BlockBackend *blk;
> 
>  pci_ide = PCI_IDE(dev);
> 
>  for (i = aux ? 1 : 0; i < 4; i++) {
> -di = drive_get_by_index(IF_IDE, i);
> -if (di != NULL && !di->media_cd) {
> -BlockBackend *blk = blk_by_legacy_dinfo(di);
> -DeviceState *ds = blk_get_attached_dev(blk);
> +idebus = &pci_ide->bus[i / 2];
> +blk = idebus->ifs[i % 2].blk;
> 
> -blk_drain(blk);
> -blk_flush(blk);
> -
> -if (ds) {
> -blk_detach_dev(blk, ds);
> -}
> -pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
> +if (blk && idebus->ifs[i%2].drive_kind != IDE_CD) {

Spaces around '%'

>  if (!(i % 2)) {
> -idedev = pci_ide->bus[di->bus].master;
> +idedev = idebus->master;
>  } else {
> -idedev = pci_ide->bus[di->bus].slave;
> +idedev = idebus->slave;
>  }

Perhaps use a ternary operator here rather than if/else. Also there are
three uses of 'i % 2', so I wonder whether a local 'device index'
variable might be neater.
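A sketch of that suggestion on a minimal mock structure (the types here are stand-ins for illustration, not QEMU's actual IDE structures):

```c
#include <assert.h>
#include <stddef.h>

/* Mock of an IDE bus with a master and a slave device slot. */
struct MockBus {
    const char *master;
    const char *slave;
};

/* Compute the device index once, then pick master/slave with a ternary
 * instead of an if/else plus repeated 'i % 2' expressions. */
static const char *pick_device(struct MockBus *bus, int i)
{
    int di = i % 2;   /* device index on the bus */

    return di ? bus->slave : bus->master;
}
```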

  Paul

> +
> +blk_drain(blk);
> +blk_flush(blk);
> +
> +blk_detach_dev(blk, DEVICE(idedev));
> +idebus->ifs[i % 2].blk = NULL;
>  idedev->conf.blk = NULL;
>  monitor_remove_blk(blk);
>  blk_unref(blk);
> --
> Anthony PERARD