Re: [PATCH v2 for-5.1? 0/5] Fix nbd reconnect dead-locks

2020-07-27 Thread Eric Blake

On 7/27/20 1:47 PM, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

v2: it's a bit updated "[PATCH for-5.1? 0/3] Fix nbd reconnect dead-locks"
plus completely rewritten "[PATCH for-5.1? 0/4] non-blocking connect"
(which is now the only one patch 05)

01: new
02: rebased on 01, fix (add outer "if")
03-04: add Eric's r-b:
05: new

If 05 is too big for 5.1, it's OK to take only 01-04 or fewer, or to
postpone everything to 5.2, as none of this is a regression in 5.1
(it is a regression from 4.2, together with the whole reconnect feature).


I think I like where 5/5 is headed, but am not sure yet whether all 
paths are thread-safe or if there is anything we can reuse to make its 
implementation smaller.  You are right that it's probably best to defer 
that to 5.2.  In the meantime, I'll queue 1-4 for my NBD pull request 
for -rc2.




Vladimir Sementsov-Ogievskiy (5):
   block/nbd: split nbd_establish_connection out of nbd_client_connect
   block/nbd: allow drain during reconnect attempt
   block/nbd: on shutdown terminate connection attempt
   block/nbd: nbd_co_reconnect_loop(): don't sleep if drained
   block/nbd: use non-blocking connect: fix vm hang on connect()

  block/nbd.c| 360 +
  block/trace-events |   4 +-
  2 files changed, 331 insertions(+), 33 deletions(-)



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v2 0/4] Fix convert to qcow2 compressed to NBD

2020-07-27 Thread Nir Soffer
Fix qemu-img convert -O qcow2 -c to NBD URL and add missing test for this
usage.

This already works now, but unfortunately qemu-img fails when trying to
truncate the target image to the same size at the end of the operation.

Changes since v1:
- Include complete code for creating OVA file [Eric]
- Use qcow2 for source file to avoid issues with random CI filesystem [Max]
- Fix many typos [Eric, Max]
- Make qemu_nbd_popen a context manager
- Add more qemu_img_* helpers
- Verify OVA file contents

v1 was here:
https://lists.nongnu.org/archive/html/qemu-block/2020-07/msg01543.html

Nir Soffer (4):
  block: nbd: Fix convert qcow2 compressed to nbd
  iotests: Make qemu_nbd_popen() a contextmanager
  iotests: Add more qemu_img helpers
  iotests: Test convert to qcow2 compressed to NBD

 block/nbd.c   |  30 
 tests/qemu-iotests/264|  76 
 tests/qemu-iotests/264.out|   2 +
 tests/qemu-iotests/302| 127 ++
 tests/qemu-iotests/302.out|  31 +
 tests/qemu-iotests/group  |   1 +
 tests/qemu-iotests/iotests.py |  34 -
 7 files changed, 251 insertions(+), 50 deletions(-)
 create mode 100755 tests/qemu-iotests/302
 create mode 100644 tests/qemu-iotests/302.out

-- 
2.25.4




Re: equivalent to "-drive if=ide,id=disk0....."

2020-07-27 Thread Kashyap Chamarthy
[Cc: qemu-block]

On Mon, Jul 27, 2020 at 05:11:15PM +0800, Derek Su wrote:
> Hello,
> 
> I'm trying to replace "-drive if=ide,id=disk0." with "-blockdev
> '{"node-name": "top-node","
> The "id" is the name of BlockBackend, and the "node-name" is the name
> of the BDS tree's root.
> Is there any equivalent for "id" when use "-blockdev '{"node-name":
> "top-node"," ?

IIUC, specifying 'node-name' should be sufficient.  Also, you don't need
to specify JSON syntax on the command-line; you can 'flatten it' (see
below).

On 'id' vs. 'node-name', from the documentation of `blockdev-add`,
https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=qapi/block-core.json#l4032

# Creates a new block device. If the @id option is given at the top level, a
# BlockBackend will be created; otherwise, @node-name is mandatory at the top
# level and no BlockBackend will be created.

- - -


And here's a minimal working example that I use with '-blockdev'

/usr/bin/qemu-system-x86_64\
-display none  \
-no-user-config\
-nodefaults\
-serial  stdio \
-cpu host  \
-smp 1,maxcpus=2   \
-machine q35,accel=kvm,usb=off \
-m 2048\
-blockdev node-name=node-Base,driver=qcow2,file.driver=file,file.filename=./base.qcow2 \
-device virtio-blk,drive=node-Base,id=virtio0 \


[...]

-- 
/kashyap




[PATCH v2 4/4] iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Nir Soffer
Add a test for "qemu-img convert -O qcow2 -c" to an NBD target. The test
creates an OVA file and writes compressed qcow2 disk content directly into
the OVA file via qemu-nbd.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/302 | 127 +
 tests/qemu-iotests/302.out |  31 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 159 insertions(+)
 create mode 100755 tests/qemu-iotests/302
 create mode 100644 tests/qemu-iotests/302.out

diff --git a/tests/qemu-iotests/302 b/tests/qemu-iotests/302
new file mode 100755
index 00..a8506bda15
--- /dev/null
+++ b/tests/qemu-iotests/302
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+#
+# Tests converting qcow2 compressed to NBD
+#
+# Copyright (c) 2020 Nir Soffer 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# owner=nir...@gmail.com
+
+import io
+import tarfile
+
+import iotests
+
+from iotests import (
+file_path,
+qemu_img,
+qemu_img_check,
+qemu_img_create,
+qemu_img_log,
+qemu_img_measure,
+qemu_io,
+qemu_nbd_popen,
+)
+
+iotests.script_initialize(supported_fmts=["qcow2"])
+
+# Create source disk. Use qcow2 to enable strict comparison later, and to
+# avoid issues with the random filesystem on the CI environment.
+src_disk = file_path("disk.qcow2")
+qemu_img_create("-f", iotests.imgfmt, src_disk, "1g")
+qemu_io("-f", iotests.imgfmt, "-c", "write 1m 64k", src_disk)
+
+# The use case is writing a qcow2 image directly into an OVA file, which
+# is a tar file with a specific layout. This is tricky since we don't know
+# the size of the image before compressing, so we have to:
+# 1. Add an ovf file.
+# 2. Find the offset of the next member data.
+# 3. Make room for image data, allocating for the worst case.
+# 4. Write compressed image data into the tar.
+# 5. Add a tar entry with the actual image size.
+# 6. Shrink the tar to the actual size, aligned to 512 bytes.
+
+tar_file = file_path("test.ova")
+
+with tarfile.open(tar_file, "w") as tar:
+
+# 1. Add an ovf file.
+
+ovf_data = b""
+ovf = tarfile.TarInfo("vm.ovf")
+ovf.size = len(ovf_data)
+tar.addfile(ovf, io.BytesIO(ovf_data))
+
+# 2. Find the offset of the next member data.
+
+offset = tar.fileobj.tell() + 512
+
+# 3. Make room for image data, allocating for the worst case.
+
+measure = qemu_img_measure("-O", "qcow2", src_disk)
+tar.fileobj.truncate(offset + measure["required"])
+
+# 4. Write compressed image data into the tar.
+
+nbd_sock = file_path("nbd-sock", base_dir=iotests.sock_dir)
+nbd_uri = "nbd+unix:///exp?socket=" + nbd_sock
+
+# Use raw format to allow creating qcow2 directly into tar file.
+with qemu_nbd_popen(
+"--socket", nbd_sock,
+"--export-name", "exp",
+"--format", "raw",
+"--offset", str(offset),
+tar_file):
+
+iotests.log("=== Target image info ===")
+qemu_img_log("info", nbd_uri)
+
+qemu_img(
+"convert",
+"-f", iotests.imgfmt,
+"-O", "qcow2",
+"-c",
+src_disk,
+nbd_uri)
+
+iotests.log("=== Converted image info ===")
+qemu_img_log("info", nbd_uri)
+
+iotests.log("=== Converted image check ===")
+qemu_img_log("check", nbd_uri)
+
+iotests.log("=== Comparing to source disk ===")
+qemu_img_log("compare", src_disk, nbd_uri)
+
+actual_size = qemu_img_check(nbd_uri)["image-end-offset"]
+
+# 5. Add a tar entry with the actual image size.
+
+disk = tarfile.TarInfo("disk")
+disk.size = actual_size
+tar.addfile(disk)
+
+# 6. Shrink the tar to the actual size, aligned to 512 bytes.
+
+tar_size = offset + (disk.size + 511) & ~511
+tar.fileobj.seek(tar_size)
+tar.fileobj.truncate(tar_size)
+
+with tarfile.open(tar_file) as tar:
+members = [{"name": m.name, "size": m.size, "offset": m.offset_data}
+   for m in tar]
+iotests.log("=== OVA file contents ===")
+iotests.log(members)
diff --git a/tests/qemu-iotests/302.out b/tests/qemu-iotests/302.out
new file mode 100644
index 00..e37d3a1030
--- /dev/null
+++ b/tests/qemu-iotests/302.out
@@ -0,0 +1,31 @@
+Start NBD server
+=== Target image info ===
+image: nbd+unix:///exp?socket=SOCK_DIR/PID-nbd-sock
+file format: raw
+virtual size: 448 KiB (458752 bytes)
+disk 

[PATCH v2 2/4] iotests: Make qemu_nbd_popen() a contextmanager

2020-07-27 Thread Nir Soffer
Instead of duplicating the code to wait until the server is ready and
remembering to terminate the server and wait for it, make it possible to
use the helper like this:

with qemu_nbd_popen('-k', sock, image):
# Access image via qemu-nbd socket...

Only test 264 used this helper, but I had to modify the output since it
did not log consistently when starting and stopping qemu-nbd.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/264| 76 +--
 tests/qemu-iotests/264.out|  2 +
 tests/qemu-iotests/iotests.py | 28 -
 3 files changed, 56 insertions(+), 50 deletions(-)

diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index 304a7443d7..666f164ed8 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -36,48 +36,32 @@ wait_step = 0.2
 
 qemu_img_create('-f', iotests.imgfmt, disk_a, str(size))
 qemu_img_create('-f', iotests.imgfmt, disk_b, str(size))
-srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
 
-# Wait for NBD server availability
-t = 0
-ok = False
-while t < wait_limit:
-ok = qemu_io_silent_check('-f', 'raw', '-c', 'read 0 512', nbd_uri)
-if ok:
-break
-time.sleep(wait_step)
-t += wait_step
+with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
+vm = iotests.VM().add_drive(disk_a)
+vm.launch()
+vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
+
+vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
+   **{'node_name': 'backup0',
+  'driver': 'raw',
+  'file': {'driver': 'nbd',
+   'server': {'type': 'unix', 'path': nbd_sock},
+   'reconnect-delay': 10}})
+vm.qmp_log('blockdev-backup', device='drive0', sync='full', 
target='backup0',
+   speed=(1 * 1024 * 1024))
+
+# Wait for some progress
+t = 0
+while t < wait_limit:
+jobs = vm.qmp('query-block-jobs')['return']
+if jobs and jobs[0]['offset'] > 0:
+break
+time.sleep(wait_step)
+t += wait_step
 
-assert ok
-
-vm = iotests.VM().add_drive(disk_a)
-vm.launch()
-vm.hmp_qemu_io('drive0', 'write 0 {}'.format(size))
-
-vm.qmp_log('blockdev-add', filters=[iotests.filter_qmp_testfiles],
-   **{'node_name': 'backup0',
-  'driver': 'raw',
-  'file': {'driver': 'nbd',
-   'server': {'type': 'unix', 'path': nbd_sock},
-   'reconnect-delay': 10}})
-vm.qmp_log('blockdev-backup', device='drive0', sync='full', target='backup0',
-   speed=(1 * 1024 * 1024))
-
-# Wait for some progress
-t = 0
-while t < wait_limit:
-jobs = vm.qmp('query-block-jobs')['return']
 if jobs and jobs[0]['offset'] > 0:
-break
-time.sleep(wait_step)
-t += wait_step
-
-if jobs and jobs[0]['offset'] > 0:
-log('Backup job is started')
-
-log('Kill NBD server')
-srv.kill()
-srv.wait()
+log('Backup job is started')
 
 jobs = vm.qmp('query-block-jobs')['return']
 if jobs and jobs[0]['offset'] < jobs[0]['len']:
@@ -88,12 +72,8 @@ vm.qmp_log('block-job-set-speed', device='drive0', speed=0)
 # Emulate server down time for 1 second
 time.sleep(1)
 
-log('Start NBD server')
-srv = qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b)
-
-e = vm.event_wait('BLOCK_JOB_COMPLETED')
-log('Backup completed: {}'.format(e['data']['offset']))
-
-vm.qmp_log('blockdev-del', node_name='backup0')
-srv.kill()
-vm.shutdown()
+with qemu_nbd_popen('-k', nbd_sock, '-f', iotests.imgfmt, disk_b):
+e = vm.event_wait('BLOCK_JOB_COMPLETED')
+log('Backup completed: {}'.format(e['data']['offset']))
+vm.qmp_log('blockdev-del', node_name='backup0')
+vm.shutdown()
diff --git a/tests/qemu-iotests/264.out b/tests/qemu-iotests/264.out
index 3000944b09..c45b1e81ef 100644
--- a/tests/qemu-iotests/264.out
+++ b/tests/qemu-iotests/264.out
@@ -1,3 +1,4 @@
+Start NBD server
 {"execute": "blockdev-add", "arguments": {"driver": "raw", "file": {"driver": 
"nbd", "reconnect-delay": 10, "server": {"path": "TEST_DIR/PID-nbd-sock", 
"type": "unix"}}, "node-name": "backup0"}}
 {"return": {}}
 {"execute": "blockdev-backup", "arguments": {"device": "drive0", "speed": 
1048576, "sync": "full", "target": "backup0"}}
@@ -11,3 +12,4 @@ Start NBD server
 Backup completed: 5242880
 {"execute": "blockdev-del", "arguments": {"node-name": "backup0"}}
 {"return": {}}
+Kill NBD server
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3590ed78a0..8f79668435 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -28,10 +28,13 @@ import signal
 import struct
 import subprocess
 import sys
+import time
 from typing import (Any, Callable, Dict, Iterable,
 List, Optional, Sequence, Tuple, TypeVar)
 import unittest
 
+from contextlib import contextmanager
+
 # pylint: disable=import-error, wrong-import-position
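The rest of the iotests.py hunk is not shown above. For orientation, here is a
minimal sketch of how a context-manager form of qemu_nbd_popen() could look.
It is only an illustration: log() and qemu_nbd_args already exist in
iotests.py, and the fixed sleep standing in for the real readiness check is an
assumption, not the actual implementation.

    import subprocess
    import time
    from contextlib import contextmanager

    @contextmanager
    def qemu_nbd_popen(*args):
        '''Run qemu-nbd for the duration of the with-block (sketch only).'''
        log('Start NBD server')
        # qemu_nbd_args is the qemu-nbd command prefix defined in iotests.py.
        p = subprocess.Popen(qemu_nbd_args + list(args))
        try:
            # The real helper waits until the server is actually ready;
            # a short fixed sleep stands in for that check here.
            time.sleep(1)
            yield
        finally:
            log('Kill NBD server')
            p.kill()
            p.wait()

The 'Start NBD server' and 'Kill NBD server' lines match the new output seen
in 264.out below.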
 

[PATCH v2 3/4] iotests: Add more qemu_img helpers

2020-07-27 Thread Nir Soffer
Add 2 helpers for measuring and checking images:
- qemu_img_measure()
- qemu_img_check()

Both use --output json and parse the returned JSON to make it easy to use
in other tests. I'm going to use them in a new test, and I hope they
will be useful in many other tests.
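A brief usage sketch (the paths are illustrative; the result keys mirror how
the new test in this series uses the helpers):

    # Worst-case space needed for a compressed qcow2 copy of the source image.
    measure = qemu_img_measure("-O", "qcow2", "disk.qcow2")
    required = measure["required"]

    # After conversion, find where the written image data actually ends.
    check = qemu_img_check("nbd+unix:///exp?socket=/tmp/nbd.sock")
    actual_size = check["image-end-offset"]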

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/iotests.py | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 8f79668435..717b5b652c 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -141,6 +141,12 @@ def qemu_img_create(*args):
 
 return qemu_img(*args)
 
+def qemu_img_measure(*args):
+return json.loads(qemu_img_pipe("measure", "--output", "json", *args))
+
+def qemu_img_check(*args):
+return json.loads(qemu_img_pipe("check", "--output", "json", *args))
+
 def qemu_img_verbose(*args):
 '''Run qemu-img without suppressing its output and return the exit code'''
 exitcode = subprocess.call(qemu_img_args + list(args))
-- 
2.25.4




[PATCH v2 1/4] block: nbd: Fix convert qcow2 compressed to nbd

2020-07-27 Thread Nir Soffer
When converting to qcow2 compressed format, the last step is a special
zero-length compressed write, ending in a call to bdrv_co_truncate(). This
call always fails for the nbd driver since it does not implement
bdrv_co_truncate().

For block devices, which have the same limits, the call succeeds since the
file driver implements bdrv_co_truncate(): if the caller asked to truncate
to the same or a smaller size with exact=false, the truncate succeeds.
Implement the same logic for nbd.

Example failing without this change:

In one shell, start qemu-nbd:

$ truncate -s 1g test.tar
$ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar

In another shell, convert an image to compressed qcow2 via NBD:

$ echo "disk data" > disk.raw
$ truncate -s 1g disk.raw
$ qemu-img convert -f raw -O qcow2 -c disk.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $?
1

qemu-img failed, but the conversion was successful:

$ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock
image: nbd+unix://?socket=/tmp/nbd.sock
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
...

$ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock
No errors were found on the image.
1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters
Image end offset: 393216

$ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock
Images are identical.

Fixes: https://bugzilla.redhat.com/1860627
Signed-off-by: Nir Soffer 
---
 block/nbd.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 65a4f56924..dcb0b03641 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1966,6 +1966,33 @@ static void nbd_close(BlockDriverState *bs)
 nbd_clear_bdrvstate(s);
 }
 
+/*
+ * NBD cannot truncate, but if the caller asks to truncate to the same size, or
+ * to a smaller size with exact=false, there is no reason to fail the
+ * operation.
+ *
+ * Preallocation mode is ignored since it does not seem useful to fail when
+ * we never change anything.
+ */
+static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t offset,
+bool exact, PreallocMode prealloc,
+BdrvRequestFlags flags, Error **errp)
+{
+BDRVNBDState *s = bs->opaque;
+
+if (offset != s->info.size && exact) {
+error_setg(errp, "Cannot resize NBD nodes");
+return -ENOTSUP;
+}
+
+if (offset > s->info.size) {
+error_setg(errp, "Cannot grow NBD nodes");
+return -EINVAL;
+}
+
+return 0;
+}
+
 static int64_t nbd_getlength(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
@@ -2045,6 +2072,7 @@ static BlockDriver bdrv_nbd = {
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_pdiscard   = nbd_client_co_pdiscard,
 .bdrv_refresh_limits= nbd_refresh_limits,
+.bdrv_co_truncate   = nbd_co_truncate,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_client_detach_aio_context,
 .bdrv_attach_aio_context= nbd_client_attach_aio_context,
@@ -2072,6 +2100,7 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_pdiscard   = nbd_client_co_pdiscard,
 .bdrv_refresh_limits= nbd_refresh_limits,
+.bdrv_co_truncate   = nbd_co_truncate,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_client_detach_aio_context,
 .bdrv_attach_aio_context= nbd_client_attach_aio_context,
@@ -2099,6 +2128,7 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_co_flush_to_os= nbd_co_flush,
 .bdrv_co_pdiscard   = nbd_client_co_pdiscard,
 .bdrv_refresh_limits= nbd_refresh_limits,
+.bdrv_co_truncate   = nbd_co_truncate,
 .bdrv_getlength = nbd_getlength,
 .bdrv_detach_aio_context= nbd_client_detach_aio_context,
 .bdrv_attach_aio_context= nbd_client_attach_aio_context,
-- 
2.25.4




Re: [PATCH v2 2/5] block/nbd: allow drain during reconnect attempt

2020-07-27 Thread Eric Blake

On 7/27/20 1:47 PM, Vladimir Sementsov-Ogievskiy wrote:

> It should be to reenter qio_channel_yield() on io/channel read/write

be safe

> path, so it's safe to reduce in_flight and allow attaching new aio
> context. And no problem to allow drain itself: connection attempt is
> not a guest request. Moreover, if remote server is down, we can hang
> in negotiation, blocking drain section and provoking a dead lock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
 echo $n; ((n++));
 ./nbd-fault-injector.py 127.0.0.1:1 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
 echo $n; ((n++));
 ./qemu-io -c 'read 0 512' nbd://127.0.0.1:1;
done





Note that the hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: on shutdown terminate connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2ec6623c18..6d19f3c660 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -291,8 +291,22 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
  goto out;
  }
  
+bdrv_dec_in_flight(s->bs);

+
   ret = nbd_client_handshake(s->bs, sioc, &local_err);
  
+if (s->drained) {

+s->wait_drained_end = true;
+while (s->drained) {
+/*
+ * We may be entered once from nbd_client_attach_aio_context_bh
+ * and then from nbd_client_co_drain_end. So here is a loop.
+ */
+qemu_coroutine_yield();
+}
+}
+bdrv_inc_in_flight(s->bs);
+
  out:
  s->connect_status = ret;
  error_free(s->connect_err);



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PULL 24/24] migration: Fix typos in bitmap migration comments

2020-07-27 Thread Eric Blake
Noticed while reviewing the file for newer patches.

Fixes: b35ebdf076
Signed-off-by: Eric Blake 
Message-Id: <20200727203206.134996-1-ebl...@redhat.com>
---
 migration/block-dirty-bitmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1f675b792fc9..784330ebe130 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -97,7 +97,7 @@

 #define DIRTY_BITMAP_MIG_START_FLAG_ENABLED  0x01
 #define DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT   0x02
-/* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
+/* 0x04 was "AUTOLOAD" flags on older versions, now it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK0xf8

 /* State of one bitmap during save process */
@@ -180,7 +180,7 @@ static uint32_t qemu_get_bitmap_flags(QEMUFile *f)

 static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
 {
-/* The code currently do not send flags more than one byte */
+/* The code currently does not send flags as more than one byte */
 assert(!(flags & (0xff00 | DIRTY_BITMAP_MIG_EXTRA_FLAGS)));

 qemu_put_byte(f, flags);
-- 
2.27.0




Re: [PATCH v2 1/5] block/nbd: split nbd_establish_connection out of nbd_client_connect

2020-07-27 Thread Eric Blake

On 7/27/20 1:47 PM, Vladimir Sementsov-Ogievskiy wrote:

We are going to implement non-blocking version of
nbd_establish_connection, which for a while will be used only for
nbd_reconnect_attempt, not for nbd_open, so we need to call it
separately.

Refactor nbd_reconnect_attempt in a way which makes next commit
simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd.c| 60 +++---
  block/trace-events |  4 ++--
  2 files changed, 38 insertions(+), 26 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PULL 22/24] qemu-iotests/199: add source-killed case to bitmaps postcopy

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Previous patches fixed the behavior of bitmaps migration, so that errors
are handled by just removing unfinished bitmaps, not by failing or trying
to recover the postcopy migration. Add a corresponding test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-22-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 15 +++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 140930b2b12e..58fad872a12c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -241,6 +241,21 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.launch()
 check_bitmaps(self.vm_a, 0)

+def test_early_kill_source(self):
+self.start_postcopy()
+
+self.vm_a_events = self.vm_a.get_qmp_events()
+self.vm_a.kill()
+
+self.vm_a.launch()
+
+match = {'data': {'status': 'completed'}}
+e_complete = self.vm_b.event_wait('MIGRATION', match=match)
+self.vm_b_events.append(e_complete)
+
+check_bitmaps(self.vm_a, 0)
+check_bitmaps(self.vm_b, 0)
+

 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index fbc63e62f885..8d7e99670093 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 2 tests
+Ran 3 tests

 OK
-- 
2.27.0




[PULL 21/24] qemu-iotests/199: add early shutdown case to bitmaps postcopy

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Previous patches fixed two crashes which may occur on shutdown before
bitmaps postcopy has finished. Check that it works now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-21-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 24 
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 5fd34f0fcdfa..140930b2b12e 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -217,6 +217,30 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
 self.assert_qmp(result, 'return/sha256', sha)

+def test_early_shutdown_destination(self):
+self.start_postcopy()
+
+self.vm_b_events += self.vm_b.get_qmp_events()
+self.vm_b.shutdown()
+# recreate vm_b, so there is no incoming option, which prevents
+# loading bitmaps from disk
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_b.launch()
+check_bitmaps(self.vm_b, 0)
+
+# Bitmaps will be lost if we just shutdown the vm, as they are marked
+# to skip storing to disk when prepared for migration. And that's
+# correct, as actual data may be modified in target vm, so we play
+# safe.
+# Still, this mark would be taken away if we do 'cont', and bitmaps
+# become persistent again. (see iotest 169 for such behavior case)
+result = self.vm_a.qmp('query-status')
+assert not result['return']['running']
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_a.shutdown()
+self.vm_a.launch()
+check_bitmaps(self.vm_a, 0)
+

 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index ae1213e6f863..fbc63e62f885 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-.
+..
 --
-Ran 1 tests
+Ran 2 tests

 OK
-- 
2.27.0




[PULL 23/24] iotests: Adjust which migration tests are quick

2020-07-27 Thread Eric Blake
A quick run of './check -qcow2 -g migration' shows that test 169 is
NOT quick, but meanwhile several other tests ARE quick.  Let's adjust
the test designations accordingly.

Signed-off-by: Eric Blake 
Message-Id: <20200727195117.132151-1-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/group | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 1d0252e1f051..806044642c69 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -112,7 +112,7 @@
 088 rw quick
 089 rw auto quick
 090 rw auto quick
-091 rw migration
+091 rw migration quick
 092 rw quick
 093 throttle
 094 rw quick
@@ -186,7 +186,7 @@
 162 quick
 163 rw
 165 rw quick
-169 rw quick migration
+169 rw migration
 170 rw auto quick
 171 rw quick
 172 auto
@@ -197,9 +197,9 @@
 177 rw auto quick
 178 img
 179 rw auto quick
-181 rw auto migration
+181 rw auto migration quick
 182 rw quick
-183 rw migration
+183 rw migration quick
 184 rw auto quick
 185 rw
 186 rw auto
@@ -216,9 +216,9 @@
 198 rw
 199 rw migration
 200 rw
-201 rw migration
+201 rw migration quick
 202 rw quick
-203 rw auto migration
+203 rw auto migration quick
 204 rw quick
 205 rw quick
 206 rw
-- 
2.27.0




[PULL 15/24] migration/block-dirty-bitmap: keep bitmap state for all bitmaps

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Keep bitmap state for disabled bitmaps too. Keep the state until the
end of the process. It's needed for the following commit to implement
bitmap postcopy canceling.

To clean up the new list, the following logic is used:
We need two events to consider bitmap migration finished:
1. a chunk with the DIRTY_BITMAP_MIG_FLAG_COMPLETE flag should be received
2. dirty_bitmap_mig_before_vm_start should be called
These two events may come in either order, so we track which one happens
last, and on the last of them we remove the bitmap migration state from
the list.
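The rule above can be illustrated with a small sketch (hypothetical Python,
not the actual C code in the diff below; the names are made up to mirror
DBMLoadState and its bitmaps list):

    class LoadBitmapState:
        def __init__(self):
            self.migrated = False          # COMPLETE chunk received

    bitmaps = []                           # mirrors DBMLoadState.bitmaps
    before_vm_start_handled = False

    def on_complete_chunk(b):
        # Event 1 for this bitmap: its COMPLETE chunk arrived.
        b.migrated = True
        if before_vm_start_handled:        # event 2 already happened, so we
            bitmaps.remove(b)              # are last: drop the state now

    def before_vm_start():
        # Event 2 (happens once): drop states whose COMPLETE chunk already
        # arrived, keep the rest until their COMPLETE chunk comes.
        global bitmaps, before_vm_start_handled
        bitmaps = [b for b in bitmaps if not b.migrated]
        before_vm_start_handled = True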

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-15-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 64 +++---
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 405a259296d9..eb4ffeac4d1b 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -132,6 +132,7 @@ typedef struct LoadBitmapState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
 bool migrated;
+bool enabled;
 } LoadBitmapState;

 /* State of the dirty bitmap migration (DBM) during load process */
@@ -142,8 +143,10 @@ typedef struct DBMLoadState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;

-GSList *enabled_bitmaps;
-QemuMutex lock; /* protect enabled_bitmaps */
+bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+
+GSList *bitmaps;
+QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;

 typedef struct DBMState {
@@ -526,6 +529,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)
 Error *local_err = NULL;
 uint32_t granularity = qemu_get_be32(f);
 uint8_t flags = qemu_get_byte(f);
+LoadBitmapState *b;

 if (s->bitmap) {
 error_report("Bitmap with the same name ('%s') already exists on "
@@ -552,45 +556,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)

 bdrv_disable_dirty_bitmap(s->bitmap);
 if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-LoadBitmapState *b;
-
 bdrv_dirty_bitmap_create_successor(s->bitmap, _err);
 if (local_err) {
 error_report_err(local_err);
 return -EINVAL;
 }
-
-b = g_new(LoadBitmapState, 1);
-b->bs = s->bs;
-b->bitmap = s->bitmap;
-b->migrated = false;
-s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
 }

+b = g_new(LoadBitmapState, 1);
+b->bs = s->bs;
+b->bitmap = s->bitmap;
+b->migrated = false;
+b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
+
+s->bitmaps = g_slist_prepend(s->bitmaps, b);
+
 return 0;
 }

-void dirty_bitmap_mig_before_vm_start(void)
+/*
+ * before_vm_start_handle_item
+ *
+ * g_slist_foreach helper
+ *
+ * item is LoadBitmapState*
+ * opaque is DBMLoadState*
+ */
+static void before_vm_start_handle_item(void *item, void *opaque)
 {
-DBMLoadState *s = &dbm_state.load;
-GSList *item;
-
-qemu_mutex_lock(&s->lock);
-
-for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
-LoadBitmapState *b = item->data;
+DBMLoadState *s = opaque;
+LoadBitmapState *b = item;

+if (b->enabled) {
 if (b->migrated) {
 bdrv_enable_dirty_bitmap(b->bitmap);
 } else {
 bdrv_dirty_bitmap_enable_successor(b->bitmap);
 }
+}

+if (b->migrated) {
+s->bitmaps = g_slist_remove(s->bitmaps, b);
 g_free(b);
 }
+}

-g_slist_free(s->enabled_bitmaps);
-s->enabled_bitmaps = NULL;
+void dirty_bitmap_mig_before_vm_start(void)
+{
+DBMLoadState *s = &dbm_state.load;
+qemu_mutex_lock(&s->lock);
+
+assert(!s->before_vm_start_handled);
+g_slist_foreach(s->bitmaps, before_vm_start_handle_item, s);
+s->before_vm_start_handled = true;

 qemu_mutex_unlock(&s->lock);
 }
@@ -607,11 +625,15 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
 }

-for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
+for (item = s->bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;

 if (b->bitmap == s->bitmap) {
 b->migrated = true;
+if (s->before_vm_start_handled) {
+s->bitmaps = g_slist_remove(s->bitmaps, b);
+g_free(b);
+}
 break;
 }
 }
-- 
2.27.0




[PULL 09/24] migration/block-dirty-bitmap: rename state structure types

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Rename types to be symmetrical for load/save part and shorter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
Message-Id: <20200727194236.19551-9-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 70 ++
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 0739f1259e05..1d57bff4f6c7 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -100,23 +100,25 @@
 /* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK0xf8

-typedef struct DirtyBitmapMigBitmapState {
+/* State of one bitmap during save process */
+typedef struct SaveBitmapState {
 /* Written during setup phase. */
 BlockDriverState *bs;
 const char *node_name;
 BdrvDirtyBitmap *bitmap;
 uint64_t total_sectors;
 uint64_t sectors_per_chunk;
-QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+QSIMPLEQ_ENTRY(SaveBitmapState) entry;
 uint8_t flags;

 /* For bulk phase. */
 bool bulk_completed;
 uint64_t cur_sector;
-} DirtyBitmapMigBitmapState;
+} SaveBitmapState;

-typedef struct DirtyBitmapMigState {
-QSIMPLEQ_HEAD(, DirtyBitmapMigBitmapState) dbms_list;
+/* State of the dirty bitmap migration (DBM) during save process */
+typedef struct DBMSaveState {
+QSIMPLEQ_HEAD(, SaveBitmapState) dbms_list;

 bool bulk_completed;
 bool no_bitmaps;
@@ -124,23 +126,25 @@ typedef struct DirtyBitmapMigState {
 /* for send_bitmap_bits() */
 BlockDriverState *prev_bs;
 BdrvDirtyBitmap *prev_bitmap;
-} DirtyBitmapMigState;
+} DBMSaveState;

-typedef struct DirtyBitmapLoadState {
+/* State of the dirty bitmap migration (DBM) during load process */
+typedef struct DBMLoadState {
 uint32_t flags;
 char node_name[256];
 char bitmap_name[256];
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
-} DirtyBitmapLoadState;
+} DBMLoadState;

-static DirtyBitmapMigState dirty_bitmap_mig_state;
+static DBMSaveState dirty_bitmap_mig_state;

-typedef struct DirtyBitmapLoadBitmapState {
+/* State of one bitmap during load process */
+typedef struct LoadBitmapState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
 bool migrated;
-} DirtyBitmapLoadBitmapState;
+} LoadBitmapState;
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;

@@ -170,7 +174,7 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t 
flags)
 qemu_put_byte(f, flags);
 }

-static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
uint32_t additional_flags)
 {
 BlockDriverState *bs = dbms->bs;
@@ -199,19 +203,19 @@ static void send_bitmap_header(QEMUFile *f, 
DirtyBitmapMigBitmapState *dbms,
 }
 }

-static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
 {
 send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
 qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
 qemu_put_byte(f, dbms->flags);
 }

-static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
 {
 send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }

-static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
  uint64_t start_sector, uint32_t nr_sectors)
 {
 /* align for buffer_is_zero() */
@@ -257,7 +261,7 @@ static void send_bitmap_bits(QEMUFile *f, 
DirtyBitmapMigBitmapState *dbms,
 /* Called with iothread lock taken.  */
 static void dirty_bitmap_mig_cleanup(void)
 {
-DirtyBitmapMigBitmapState *dbms;
+SaveBitmapState *dbms;

 while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
 QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
@@ -271,7 +275,7 @@ static void dirty_bitmap_mig_cleanup(void)
 static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
 {
 BdrvDirtyBitmap *bitmap;
-DirtyBitmapMigBitmapState *dbms;
+SaveBitmapState *dbms;
 Error *local_err = NULL;

 FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
@@ -309,7 +313,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const 
char *bs_name)
 bdrv_ref(bs);
 bdrv_dirty_bitmap_set_busy(bitmap, true);

-dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+dbms = g_new0(SaveBitmapState, 1);
 dbms->bs = bs;
 dbms->node_name = bs_name;
 dbms->bitmap = bitmap;
@@ -334,7 +338,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const 
char *bs_name)
 static int 

[PULL 20/24] qemu-iotests/199: check persistent bitmaps

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Check that persistent bitmaps are not stored on source and that bitmaps
are persistent on destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-20-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 355c0b288592..5fd34f0fcdfa 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -117,7 +117,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 for i in range(nb_bitmaps):
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap{}'.format(i),
-   granularity=granularity)
+   granularity=granularity,
+   persistent=True)
 self.assert_qmp(result, 'return', {})

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
@@ -193,6 +194,19 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)

+# check that there are no bitmaps stored on source
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_a.shutdown()
+self.vm_a.launch()
+check_bitmaps(self.vm_a, 0)
+
+# check that bitmaps are migrated and persistence works
+check_bitmaps(self.vm_b, nb_bitmaps)
+self.vm_b.shutdown()
+# recreate vm_b, so there is no incoming option, which prevents
+# loading bitmaps from disk
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_b.launch()
 check_bitmaps(self.vm_b, nb_bitmaps)

 # Check content of migrated bitmaps. Still, don't waste time checking
-- 
2.27.0




[PULL 17/24] migration/block-dirty-bitmap: cancel migration on shutdown

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

If the target is turned off before postcopy has finished, the target
crashes because busy bitmaps are found at shutdown.
Canceling the incoming migration helps, as it removes all unfinished
(and therefore busy) bitmaps.

Similarly, on the source we crash in bdrv_close_all(), which asserts that
all bdrv states are removed, because the bdrv states involved in dirty
bitmap migration are referenced by it. So, we need to cancel the outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-17-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/migration.h  |  2 ++
 migration/block-dirty-bitmap.c | 16 
 migration/migration.c  | 13 +
 3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index ab20c756f549..6c6a931d0dc2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);

 void dirty_bitmap_mig_before_vm_start(void);
+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
 void migrate_add_address(SocketAddress *address);

 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index f91015a4f88f..1f675b792fc9 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
 s->bitmaps = NULL;
 }

+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+dirty_bitmap_do_save_cleanup(&dbm_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+DBMLoadState *s = &dbm_state.load;
+
+qemu_mutex_lock(&s->lock);
+
+cancel_incoming_locked(s);
+
+qemu_mutex_unlock(&s->lock);
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
 GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 1c61428988e9..8fe36339dbe8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -188,6 +188,19 @@ void migration_shutdown(void)
  */
 migrate_fd_cancel(current_migration);
 object_unref(OBJECT(current_migration));
+
+/*
+ * Cancel outgoing migration of dirty bitmaps. It should
+ * at least unref used block nodes.
+ */
+dirty_bitmap_mig_cancel_outgoing();
+
+/*
+ * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+ * are non-critical data, and their loss never considered as
+ * something serious.
+ */
+dirty_bitmap_mig_cancel_incoming();
 }

 /* For outgoing */
-- 
2.27.0




[PULL 19/24] qemu-iotests/199: prepare for new test-cases addition

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Move the future common part to the start_postcopy() method. Move the check
of the number of bitmaps to check_bitmaps().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-19-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index d8532e49da00..355c0b288592 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -29,6 +29,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')

+granularity = 512
+nb_bitmaps = 15

 GiB = 1024 * 1024 * 1024

@@ -61,6 +63,15 @@ def event_dist(e1, e2):
 return event_seconds(e2) - event_seconds(e1)


+def check_bitmaps(vm, count):
+result = vm.qmp('query-block')
+
+if count == 0:
+assert 'dirty-bitmaps' not in result['return'][0]
+else:
+assert len(result['return'][0]['dirty-bitmaps']) == count
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 def tearDown(self):
 if debug:
@@ -101,10 +112,8 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a_events = []
 self.vm_b_events = []

-def test_postcopy(self):
-granularity = 512
-nb_bitmaps = 15
-
+def start_postcopy(self):
+""" Run migration until RESUME event on target. Return this event. """
 for i in range(nb_bitmaps):
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap{}'.format(i),
@@ -119,10 +128,10 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap0')
-discards1_sha256 = result['return']['sha256']
+self.discards1_sha256 = result['return']['sha256']

 # Check, that updating the bitmap by discards works
-assert discards1_sha256 != empty_sha256
+assert self.discards1_sha256 != empty_sha256

 # We want to calculate resulting sha256. Do it in bitmap0, so, disable
 # other bitmaps
@@ -135,7 +144,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap0')
-all_discards_sha256 = result['return']['sha256']
+self.all_discards_sha256 = result['return']['sha256']

 # Now, enable some bitmaps, to be updated during migration
 for i in range(2, nb_bitmaps, 2):
@@ -160,6 +169,10 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)
+return event_resume
+
+def test_postcopy_success(self):
+event_resume = self.start_postcopy()

 # enabled bitmaps should be updated
 apply_discards(self.vm_b, discards2)
@@ -180,18 +193,15 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)

-# Assert that bitmap migration is finished (check that successor bitmap
-# is removed)
-result = self.vm_b.qmp('query-block')
-assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+check_bitmaps(self.vm_b, nb_bitmaps)

 # Check content of migrated bitmaps. Still, don't waste time checking
 # every bitmap
 for i in range(0, nb_bitmaps, 5):
 result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap{}'.format(i))
-sha256 = discards1_sha256 if i % 2 else all_discards_sha256
-self.assert_qmp(result, 'return/sha256', sha256)
+sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
+self.assert_qmp(result, 'return/sha256', sha)


 if __name__ == '__main__':
-- 
2.27.0




[PULL 16/24] migration/block-dirty-bitmap: relax error handling in incoming part

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Bitmap data is not critical, and we should not fail the migration (or
use postcopy recovery) because of a dirty-bitmaps migration failure.
Instead we should just lose the unfinished bitmaps.

Still we have to report io stream violation errors, as they affect the
whole migration stream.

While touching this, tighten code that was previously blindly calling
malloc on a size read from the migration stream, as a corrupted stream
(perhaps from a malicious user) should not be able to convince us to
allocate an inordinate amount of memory.
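
The idea can be sketched outside of QEMU as follows (hypothetical Python;
the limit and names are illustrative, not QEMU's actual constants):

    MAX_CHUNK_SIZE = 10 * 1024 * 1024      # illustrative cap, not QEMU's value

    def read_bitmap_chunk(stream):
        # The size comes from the (possibly corrupted) migration stream, so
        # validate it before allocating a buffer of that size.
        buf_size = int.from_bytes(stream.read(8), 'big')
        if buf_size > MAX_CHUNK_SIZE:
            raise ValueError('corrupted stream: chunk of %d bytes refused'
                             % buf_size)
        return stream.read(buf_size)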

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20200727194236.19551-16-vsement...@virtuozzo.com>
Reviewed-by: Eric Blake 
[eblake: typo fixes, enhance commit message]
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 162 +
 1 file changed, 126 insertions(+), 36 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index eb4ffeac4d1b..f91015a4f88f 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -145,6 +145,15 @@ typedef struct DBMLoadState {

 bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */

+/*
+ * cancelled
+ * Incoming migration is cancelled for some reason. That means that we
+ * still should read our chunks from migration stream, to not affect other
+ * migration objects (like RAM), but just ignore them and do not touch any
+ * bitmaps or nodes.
+ */
+bool cancelled;
+
 GSList *bitmaps;
 QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
@@ -531,6 +540,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)
 uint8_t flags = qemu_get_byte(f);
 LoadBitmapState *b;

+if (s->cancelled) {
+return 0;
+}
+
 if (s->bitmap) {
 error_report("Bitmap with the same name ('%s') already exists on "
  "destination", bdrv_dirty_bitmap_name(s->bitmap));
@@ -613,14 +626,48 @@ void dirty_bitmap_mig_before_vm_start(void)
 qemu_mutex_unlock(>lock);
 }

+static void cancel_incoming_locked(DBMLoadState *s)
+{
+GSList *item;
+
+if (s->cancelled) {
+return;
+}
+
+s->cancelled = true;
+s->bs = NULL;
+s->bitmap = NULL;
+
+/* Drop all unfinished bitmaps */
+for (item = s->bitmaps; item; item = g_slist_next(item)) {
+LoadBitmapState *b = item->data;
+
+/*
+ * Bitmap must be unfinished, as finished bitmaps should already be
+ * removed from the list.
+ */
+assert(!s->before_vm_start_handled || !b->migrated);
+if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
+bdrv_reclaim_dirty_bitmap(b->bitmap, _abort);
+}
+bdrv_release_dirty_bitmap(b->bitmap);
+}
+
+g_slist_free_full(s->bitmaps, g_free);
+s->bitmaps = NULL;
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
 GSList *item;
 trace_dirty_bitmap_load_complete();
+
+if (s->cancelled) {
+return;
+}
+
 bdrv_dirty_bitmap_deserialize_finish(s->bitmap);

-qemu_mutex_lock(>lock);
-
 if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
 bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
 }
@@ -637,8 +684,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 break;
 }
 }
-
-qemu_mutex_unlock(>lock);
 }

 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -650,15 +695,46 @@ static int dirty_bitmap_load_bits(QEMUFile *f, 
DBMLoadState *s)

 if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
 trace_dirty_bitmap_load_bits_zeroes();
-bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
- false);
+if (!s->cancelled) {
+bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
+ nr_bytes, false);
+}
 } else {
 size_t ret;
-uint8_t *buf;
+g_autofree uint8_t *buf = NULL;
 uint64_t buf_size = qemu_get_be64(f);
-uint64_t needed_size =
-bdrv_dirty_bitmap_serialization_size(s->bitmap,
- first_byte, nr_bytes);
+uint64_t needed_size;
+
+/*
+ * The actual check for buf_size is done a bit later. We can't do it in
+ * cancelled mode as we don't have the bitmap to check the constraints
+ * (so, we allocate a buffer and read prior to the check). On the other
+ * hand, we shouldn't blindly g_malloc the number from the stream.
+ * Actually one chunk should not be larger than CHUNK_SIZE. Let's allow
+ * a bit larger (which means that bitmap migration will fail anyway and
+ * the whole migration will most probably fail soon due to broken
+ * stream).
+ */
+if 

[PULL 10/24] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
stress that it belongs to the save part.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
Message-Id: <20200727194236.19551-10-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1d57bff4f6c7..01a536d7d3d3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,
 }

 /* Called with iothread lock taken.  */
-static void dirty_bitmap_mig_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(void)
 {
 SaveBitmapState *dbms;

@@ -406,7 +406,7 @@ static int init_dirty_bitmap_migration(void)

 fail:
 g_hash_table_destroy(handled_by_blk);
-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();

 return -1;
 }
@@ -445,7 +445,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();
 }

 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
@@ -480,7 +480,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void 
*opaque)

 trace_dirty_bitmap_save_complete_finish();

-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();
 return 0;
 }

-- 
2.27.0




[PULL 13/24] migration/block-dirty-bitmap: rename finish_lock to just lock

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

finish_lock is a bad name, as the lock is used not only at the end of the process.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-13-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9b39e7aa2b4f..9194807b54f1 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -143,7 +143,7 @@ typedef struct DBMLoadState {
 BdrvDirtyBitmap *bitmap;

 GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+QemuMutex lock; /* protect enabled_bitmaps */
 } DBMLoadState;

 typedef struct DBMState {
@@ -575,7 +575,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 DBMLoadState *s = &dbm_state.load;
 GSList *item;

-qemu_mutex_lock(&s->finish_lock);
+qemu_mutex_lock(&s->lock);

 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
@@ -592,7 +592,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 g_slist_free(s->enabled_bitmaps);
 s->enabled_bitmaps = NULL;

-qemu_mutex_unlock(&s->finish_lock);
+qemu_mutex_unlock(&s->lock);
 }

 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -601,7 +601,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 trace_dirty_bitmap_load_complete();
 bdrv_dirty_bitmap_deserialize_finish(s->bitmap);

-qemu_mutex_lock(&s->finish_lock);
+qemu_mutex_lock(&s->lock);

 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
@@ -633,7 +633,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 bdrv_dirty_bitmap_unlock(s->bitmap);
 }

-qemu_mutex_unlock(&s->finish_lock);
+qemu_mutex_unlock(&s->lock);
 }

 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -815,7 +815,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
 QSIMPLEQ_INIT(&dbm_state.save.dbms_list);
-qemu_mutex_init(&dbm_state.load.finish_lock);
+qemu_mutex_init(&dbm_state.load.lock);

 register_savevm_live("dirty-bitmap", 0, 1,
  _dirty_bitmap_handlers,
-- 
2.27.0




[PULL 12/24] migration/block-dirty-bitmap: refactor state global variables

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Move all state variables into one global struct. Reduce global
variable usage, utilizing opaque pointer where possible.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-12-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 179 ++---
 1 file changed, 99 insertions(+), 80 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 4b67e4f4fbcd..9b39e7aa2b4f 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -128,6 +128,12 @@ typedef struct DBMSaveState {
 BdrvDirtyBitmap *prev_bitmap;
 } DBMSaveState;

+typedef struct LoadBitmapState {
+BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
+bool migrated;
+} LoadBitmapState;
+
 /* State of the dirty bitmap migration (DBM) during load process */
 typedef struct DBMLoadState {
 uint32_t flags;
@@ -135,18 +141,17 @@ typedef struct DBMLoadState {
 char bitmap_name[256];
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
+
+GSList *enabled_bitmaps;
+QemuMutex finish_lock;
 } DBMLoadState;

-static DBMSaveState dirty_bitmap_mig_state;
+typedef struct DBMState {
+DBMSaveState save;
+DBMLoadState load;
+} DBMState;

-/* State of one bitmap during load process */
-typedef struct LoadBitmapState {
-BlockDriverState *bs;
-BdrvDirtyBitmap *bitmap;
-bool migrated;
-} LoadBitmapState;
-static GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+static DBMState dbm_state;

 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
@@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t 
flags)
 qemu_put_byte(f, flags);
 }

-static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
-   uint32_t additional_flags)
+static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
+   SaveBitmapState *dbms, uint32_t 
additional_flags)
 {
 BlockDriverState *bs = dbms->bs;
 BdrvDirtyBitmap *bitmap = dbms->bitmap;
 uint32_t flags = additional_flags;
 trace_send_bitmap_header_enter();

-if (bs != dirty_bitmap_mig_state.prev_bs) {
-dirty_bitmap_mig_state.prev_bs = bs;
+if (bs != s->prev_bs) {
+s->prev_bs = bs;
 flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
 }

-if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
-dirty_bitmap_mig_state.prev_bitmap = bitmap;
+if (bitmap != s->prev_bitmap) {
+s->prev_bitmap = bitmap;
 flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
 }

@@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, 
SaveBitmapState *dbms,
 }
 }

-static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
+  SaveBitmapState *dbms)
 {
-send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
+send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
 qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
 qemu_put_byte(f, dbms->flags);
 }

-static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
+ SaveBitmapState *dbms)
 {
-send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
+send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }

-static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
+ SaveBitmapState *dbms,
  uint64_t start_sector, uint32_t nr_sectors)
 {
 /* align for buffer_is_zero() */
@@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,

 trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);

-send_bitmap_header(f, dbms, flags);
+send_bitmap_header(f, s, dbms, flags);

 qemu_put_be64(f, start_sector);
 qemu_put_be32(f, nr_sectors);
@@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,
 }

 /* Called with iothread lock taken.  */
-static void dirty_bitmap_do_save_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 {
 SaveBitmapState *dbms;

-while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
-QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
+QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
 bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
 bdrv_unref(dbms->bs);
 g_free(dbms);
@@ -267,7 +275,8 @@ static void dirty_bitmap_do_save_cleanup(void)
 }

 /* Called with iothread lock taken. */
-static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
+static int 

[PULL 07/24] qemu-iotests/199: increase postcopy period

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

The test wants to force a bitmap postcopy. Still, the resulting
postcopy period is very small. Let's increase it by adding more
bitmaps to migrate. Also, test migration of disabled bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-7-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 66 +++---
 1 file changed, 43 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index da4dae01fb5d..d8532e49da00 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -103,30 +103,46 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 def test_postcopy(self):
 granularity = 512
+nb_bitmaps = 15

-result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
-   name='bitmap', granularity=granularity)
-self.assert_qmp(result, 'return', {})
+for i in range(nb_bitmaps):
+result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+   name='bitmap{}'.format(i),
+   granularity=granularity)
+self.assert_qmp(result, 'return', {})

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
+   node='drive0', name='bitmap0')
 empty_sha256 = result['return']['sha256']

-apply_discards(self.vm_a, discards1 + discards2)
-
-result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
-sha256 = result['return']['sha256']
-
-# Check, that updating the bitmap by discards works
-assert sha256 != empty_sha256
-
-result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
-   name='bitmap')
-self.assert_qmp(result, 'return', {})
-
 apply_discards(self.vm_a, discards1)

+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap0')
+discards1_sha256 = result['return']['sha256']
+
+# Check, that updating the bitmap by discards works
+assert discards1_sha256 != empty_sha256
+
+# We want to calculate resulting sha256. Do it in bitmap0, so, disable
+# other bitmaps
+for i in range(1, nb_bitmaps):
+result = self.vm_a.qmp('block-dirty-bitmap-disable', node='drive0',
+   name='bitmap{}'.format(i))
+self.assert_qmp(result, 'return', {})
+
+apply_discards(self.vm_a, discards2)
+
+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap0')
+all_discards_sha256 = result['return']['sha256']
+
+# Now, enable some bitmaps, to be updated during migration
+for i in range(2, nb_bitmaps, 2):
+result = self.vm_a.qmp('block-dirty-bitmap-enable', node='drive0',
+   name='bitmap{}'.format(i))
+self.assert_qmp(result, 'return', {})
+
 caps = [{'capability': 'dirty-bitmaps', 'state': True},
 {'capability': 'events', 'state': True}]

@@ -145,6 +161,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)

+# enabled bitmaps should be updated
 apply_discards(self.vm_b, discards2)

 match = {'data': {'status': 'completed'}}
@@ -158,7 +175,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 downtime = event_dist(event_stop, event_resume)
 postcopy_time = event_dist(event_resume, event_complete)

-# TODO: assert downtime * 10 < postcopy_time
+assert downtime * 10 < postcopy_time
 if debug:
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)
@@ -166,12 +183,15 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 # Assert that bitmap migration is finished (check that successor bitmap
 # is removed)
 result = self.vm_b.qmp('query-block')
-assert len(result['return'][0]['dirty-bitmaps']) == 1
+assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps

-# Check content of migrated (and updated by new writes) bitmap
-result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
-self.assert_qmp(result, 'return/sha256', sha256)
+# Check content of migrated bitmaps. Still, don't waste time checking
+# every bitmap
+for i in range(0, nb_bitmaps, 5):
+  

[PULL 11/24] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

No reason to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Dr. David Alan Gilbert 
Message-Id: <20200727194236.19551-11-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/migration.h  | 1 -
 migration/block-dirty-bitmap.c | 6 +-
 migration/migration.c  | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index f617960522aa..ab20c756f549 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,7 +335,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);

 void dirty_bitmap_mig_before_vm_start(void);
-void init_dirty_bitmap_incoming_migration(void);
 void migrate_add_address(SocketAddress *address);

 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 01a536d7d3d3..4b67e4f4fbcd 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;

-void init_dirty_bitmap_incoming_migration(void)
-{
-qemu_mutex_init(&finish_lock);
-}
-
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
 uint8_t flags = qemu_get_byte(f);
@@ -801,6 +796,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
 QSIMPLEQ_INIT(&dirty_bitmap_mig_state.dbms_list);
+qemu_mutex_init(&finish_lock);

 register_savevm_live("dirty-bitmap", 0, 1,
  &savevm_dirty_bitmap_handlers,
diff --git a/migration/migration.c b/migration/migration.c
index 2ed99232272e..1c61428988e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,8 +165,6 @@ void migration_object_init(void)
 qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);

-init_dirty_bitmap_incoming_migration();
-
 if (!migration_object_check(current_migration, &err)) {
 error_report_err(err);
 exit(1);
-- 
2.27.0




[PULL 14/24] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
postcopy, the bitmap successor must be enabled, and the reclaim
operation will enable the bitmap.

So, we actually just need to call _reclaim_ in both if branches, and
keeping the branches different only to add an assertion does not seem
worthwhile. The logic becomes simple: on load complete we do reclaim
and that's all.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Message-Id: <20200727194236.19551-14-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9194807b54f1..405a259296d9 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -603,6 +603,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)

 qemu_mutex_lock(&s->lock);

+if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
+bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
+}
+
 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;

@@ -612,27 +616,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 }
 }

-if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
-bdrv_dirty_bitmap_lock(s->bitmap);
-if (s->enabled_bitmaps == NULL) {
-/* in postcopy */
-bdrv_reclaim_dirty_bitmap_locked(s->bitmap, &error_abort);
-bdrv_enable_dirty_bitmap_locked(s->bitmap);
-} else {
-/* target not started, successor must be empty */
-int64_t count = bdrv_get_dirty_count(s->bitmap);
-BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
-NULL);
-/* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
- * must be) or on merge fail, but merge can't fail when second
- * bitmap is empty
- */
-assert(ret == s->bitmap &&
-   count == bdrv_get_dirty_count(s->bitmap));
-}
-bdrv_dirty_bitmap_unlock(s->bitmap);
-}
-
 qemu_mutex_unlock(&s->lock);
 }

-- 
2.27.0




[PULL 03/24] qemu-iotests/199: drop extra constraints

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

We don't need any specific format constraints here. Still keep qcow2
for two reasons:
1. No extra runs of this format-unrelated test
2. We will add some checks around persistent bitmaps in the future
   (which require qcow2)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-3-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index de9ba8d94c23..dda918450a8b 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -116,5 +116,4 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):


 if __name__ == '__main__':
-iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
- supported_protocols=['file'])
+iotests.main(supported_fmts=['qcow2'])
-- 
2.27.0




[PULL 08/24] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Using the _locked version of bdrv_enable_dirty_bitmap to bypass locking
is wrong as we do not already own the mutex.  Moreover, the adjacent
call to bdrv_dirty_bitmap_enable_successor grabs the mutex.
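For illustration, a toy sketch of the locking convention involved (made-up
types, not QEMU code): a *_locked() helper assumes the caller already holds
the mutex, while the plain wrapper takes and releases it itself, so calling
the _locked variant with no lock held silently drops the mutual exclusion:

#include <pthread.h>
#include <stdbool.h>

typedef struct Bitmap {
    pthread_mutex_t lock;
    bool enabled;
} Bitmap;

/* caller must already hold bm->lock */
static void bitmap_enable_locked(Bitmap *bm)
{
    bm->enabled = true;
}

/* takes and releases the lock itself */
static void bitmap_enable(Bitmap *bm)
{
    pthread_mutex_lock(&bm->lock);
    bitmap_enable_locked(bm);
    pthread_mutex_unlock(&bm->lock);
}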

Fixes: 58f72b965e9e1q
Cc: qemu-sta...@nongnu.org # v3.0
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
Message-Id: <20200727194236.19551-8-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index b0dbf9eeed43..0739f1259e05 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -566,7 +566,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 DirtyBitmapLoadBitmapState *b = item->data;

 if (b->migrated) {
-bdrv_enable_dirty_bitmap_locked(b->bitmap);
+bdrv_enable_dirty_bitmap(b->bitmap);
 } else {
 bdrv_dirty_bitmap_enable_successor(b->bitmap);
 }
-- 
2.27.0




[PULL 04/24] qemu-iotests/199: better catch postcopy time

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

The test aims to test _postcopy_ migration, and wants to do some write
operations during postcopy time.

The test considers the migrate status=complete event on the source as
the start of postcopy. This is completely wrong: that event marks the
completion of the whole migration process. Let's instead consider the
destination start as the start of postcopy, and use the RESUME event
for it.

Next, as the migration finish point, let's use the migration
status=complete event on the target, as this is closer to what libvirt
or another user will do than tracking the number of dirty bitmaps.

Finally, add a possibility to dump events for debugging. With debug set
to True, we see that the actual postcopy period is very small relative
to the whole test duration (~0.2 seconds out of >40 seconds for me).
This means the test is very inefficient at what it is supposed to do.
Let's improve it in the following commits.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-4-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 72 +-
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dda918450a8b..dd6044768c76 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -20,17 +20,43 @@

 import os
 import iotests
-import time
 from iotests import qemu_img

+debug = False
+
 disk_a = os.path.join(iotests.test_dir, 'disk_a')
 disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')


+def event_seconds(event):
+return event['timestamp']['seconds'] + \
+event['timestamp']['microseconds'] / 100.0
+
+
+def event_dist(e1, e2):
+return event_seconds(e2) - event_seconds(e1)
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 def tearDown(self):
+if debug:
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_b_events += self.vm_b.get_qmp_events()
+for e in self.vm_a_events:
+e['vm'] = 'SRC'
+for e in self.vm_b_events:
+e['vm'] = 'DST'
+events = (self.vm_a_events + self.vm_b_events)
+events = [(e['timestamp']['seconds'],
+   e['timestamp']['microseconds'],
+   e['vm'],
+   e['event'],
+   e.get('data', '')) for e in events]
+for e in sorted(events):
+print('{}.{:06} {} {} {}'.format(*e))
+
 self.vm_a.shutdown()
 self.vm_b.shutdown()
 os.remove(disk_a)
@@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.launch()
 self.vm_b.launch()

+# collect received events for debug
+self.vm_a_events = []
+self.vm_b_events = []
+
 def test_postcopy(self):
 write_size = 0x4000
 granularity = 512
@@ -77,15 +107,13 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1

-bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
-events_cap = {'capability': 'events', 'state': True}
+caps = [{'capability': 'dirty-bitmaps', 'state': True},
+{'capability': 'events', 'state': True}]

-result = self.vm_a.qmp('migrate-set-capabilities',
-   capabilities=[bitmaps_cap, events_cap])
+result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
 self.assert_qmp(result, 'return', {})

-result = self.vm_b.qmp('migrate-set-capabilities',
-   capabilities=[bitmaps_cap])
+result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
 self.assert_qmp(result, 'return', {})

 result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
@@ -94,24 +122,38 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 result = self.vm_a.qmp('migrate-start-postcopy')
 self.assert_qmp(result, 'return', {})

-while True:
-event = self.vm_a.event_wait('MIGRATION')
-if event['data']['status'] == 'completed':
-break
+event_resume = self.vm_b.event_wait('RESUME')
+self.vm_b_events.append(event_resume)

 s = 0x8000
 while s < write_size:
 self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1

+match = {'data': {'status': 'completed'}}
+event_complete = self.vm_b.event_wait('MIGRATION', match=match)
+self.vm_b_events.append(event_complete)
+
+# take queued event, should already been happened
+event_stop = self.vm_a.event_wait('STOP')
+self.vm_a_events.append(event_stop)
+
+   

[PULL 06/24] qemu-iotests/199: change discard patterns

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

iotest 199 takes too long because of its many discard operations. At
the same time, the postcopy period is very short, in spite of all these
efforts.

So, let's use fewer discards (and with more interesting patterns) to
reduce the test run time. In the next commit we'll increase the
postcopy period.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-6-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 44 +-
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 190e820b8408..da4dae01fb5d 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -30,6 +30,28 @@ size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')


+GiB = 1024 * 1024 * 1024
+
+discards1 = (
+(0, GiB),
+(2 * GiB + 512 * 5, 512),
+(3 * GiB + 512 * 5, 512),
+(100 * GiB, GiB)
+)
+
+discards2 = (
+(3 * GiB + 512 * 8, 512),
+(4 * GiB + 512 * 8, 512),
+(50 * GiB, GiB),
+(100 * GiB + GiB // 2, GiB)
+)
+
+
+def apply_discards(vm, discards):
+for d in discards:
+vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
+
+
 def event_seconds(event):
 return event['timestamp']['seconds'] + \
 event['timestamp']['microseconds'] / 100.0
@@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events = []

 def test_postcopy(self):
-discard_size = 0x4000
 granularity = 512
-chunk = 4096

 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap', granularity=granularity)
@@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
node='drive0', name='bitmap')
 empty_sha256 = result['return']['sha256']

-s = 0
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
-s = 0x8000
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+apply_discards(self.vm_a, discards1 + discards2)

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')
@@ -111,10 +124,8 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
 self.assert_qmp(result, 'return', {})
-s = 0
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+
+apply_discards(self.vm_a, discards1)

 caps = [{'capability': 'dirty-bitmaps', 'state': True},
 {'capability': 'events', 'state': True}]
@@ -134,10 +145,7 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)

-s = 0x8000
-while s < discard_size:
-self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+apply_discards(self.vm_b, discards2)

 match = {'data': {'status': 'completed'}}
 event_complete = self.vm_b.event_wait('MIGRATION', match=match)
-- 
2.27.0




[PULL 01/24] qcow2: Fix capitalization of header extension constant.

2020-07-27 Thread Eric Blake
From: Andrey Shinkevich 

Make the capitalization of the hexadecimal numbers consistent for the
QCOW2 header extension constants in docs/interop/qcow2.txt.

Suggested-by: Eric Blake 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <1594973699-781898-2-git-send-email-andrey.shinkev...@virtuozzo.com>
Reviewed-by: Eric Blake 
Signed-off-by: Eric Blake 
---
 docs/interop/qcow2.txt | 2 +-
 block/qcow2.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index cb723463f241..f072e27900e6 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -231,7 +231,7 @@ be stored. Each extension has a structure like the 
following:

 Byte  0 -  3:   Header extension type:
 0x - End of the header extension area
-0xE2792ACA - Backing file format name string
+0xe2792aca - Backing file format name string
 0x6803f857 - Feature name table
 0x23852875 - Bitmaps extension
 0x0537be77 - Full disk encryption header pointer
diff --git a/block/qcow2.c b/block/qcow2.c
index fadf3422f8c5..6ad6bdc166ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -66,7 +66,7 @@ typedef struct {
 } QEMU_PACKED QCowExtension;

 #define  QCOW2_EXT_MAGIC_END 0
-#define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
+#define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xe2792aca
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
 #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
 #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
-- 
2.27.0




[PULL 05/24] qemu-iotests/199: improve performance: set bitmap by discard

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Discard dirties the dirty bitmap just like write does, but works faster.
Let's use it instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-5-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dd6044768c76..190e820b8408 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -67,8 +67,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 os.mkfifo(fifo)
 qemu_img('create', '-f', iotests.imgfmt, disk_a, size)
 qemu_img('create', '-f', iotests.imgfmt, disk_b, size)
-self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a)
-self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a,
+  'discard=unmap')
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b,
+  'discard=unmap')
 self.vm_b.add_incoming("exec: cat '" + fifo + "'")
 self.vm_a.launch()
 self.vm_b.launch()
@@ -78,7 +80,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events = []

 def test_postcopy(self):
-write_size = 0x4000
+discard_size = 0x4000
 granularity = 512
 chunk = 4096

@@ -86,25 +88,32 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
name='bitmap', granularity=granularity)
 self.assert_qmp(result, 'return', {})

+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap')
+empty_sha256 = result['return']['sha256']
+
 s = 0
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1
 s = 0x8000
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1

 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')
 sha256 = result['return']['sha256']

+# Check, that updating the bitmap by discards works
+assert sha256 != empty_sha256
+
 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
 self.assert_qmp(result, 'return', {})
 s = 0
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1

 caps = [{'capability': 'dirty-bitmaps', 'state': True},
@@ -126,8 +135,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events.append(event_resume)

 s = 0x8000
-while s < write_size:
-self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1

 match = {'data': {'status': 'completed'}}
-- 
2.27.0




[PULL 02/24] qemu-iotests/199: fix style

2020-07-27 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

Mostly, satisfy pep8 complaints.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
Message-Id: <20200727194236.19551-2-vsement...@virtuozzo.com>
Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/199 | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 40774eed74c2..de9ba8d94c23 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -28,8 +28,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')

+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
-
 def tearDown(self):
 self.vm_a.shutdown()
 self.vm_b.shutdown()
@@ -54,7 +54,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap', granularity=granularity)
-self.assert_qmp(result, 'return', {});
+self.assert_qmp(result, 'return', {})

 s = 0
 while s < write_size:
@@ -71,7 +71,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):

 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
-self.assert_qmp(result, 'return', {});
+self.assert_qmp(result, 'return', {})
 s = 0
 while s < write_size:
 self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
@@ -104,15 +104,16 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1

-result = self.vm_b.qmp('query-block');
+result = self.vm_b.qmp('query-block')
 while len(result['return'][0]['dirty-bitmaps']) > 1:
 time.sleep(2)
-result = self.vm_b.qmp('query-block');
+result = self.vm_b.qmp('query-block')

 result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')

-self.assert_qmp(result, 'return/sha256', sha256);
+self.assert_qmp(result, 'return/sha256', sha256)
+

 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-- 
2.27.0




Re: [PATCH] migration: Fix typos in bitmap migration comments

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 23:32, Eric Blake wrote:

Noticed while reviewing the file for newer patches.

Fixes: b35ebdf076
Signed-off-by: Eric Blake 
---

This is trivial enough that I'll throw it in my pull request today.

  migration/block-dirty-bitmap.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1f675b792fc9..784330ebe130 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -97,7 +97,7 @@

  #define DIRTY_BITMAP_MIG_START_FLAG_ENABLED  0x01
  #define DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT   0x02
-/* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
+/* 0x04 was "AUTOLOAD" flags on older versions, now it is ignored */


may be also s/flags/flag


  #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK    0xf8

  /* State of one bitmap during save process */
@@ -180,7 +180,7 @@ static uint32_t qemu_get_bitmap_flags(QEMUFile *f)

  static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
  {
-/* The code currently do not send flags more than one byte */
+/* The code currently does not send flags as more than one byte */


Hmm, why "as more than", not just "more than"?.
(This note is about the following: the protocol allows adding more than
one byte of flags by using DIRTY_BITMAP_MIG_EXTRA_FLAGS. Still, this
possibility is currently unused, and we assert that only one byte is sent.)


  assert(!(flags & (0xff00 | DIRTY_BITMAP_MIG_EXTRA_FLAGS)));

  qemu_put_byte(f, flags);
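
For illustration, a rough sketch of how a stream reader could consume such an
optional second flags byte (simplified; not the actual qemu_get_bitmap_flags()
in the tree, and the helper name is made up):

static uint32_t get_bitmap_flags_sketch(QEMUFile *f)
{
    uint32_t flags = qemu_get_byte(f);

    if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
        /* a second byte of flags follows the first one */
        flags = (flags << 8) | qemu_get_byte(f);
        if (flags & DIRTY_BITMAP_MIG_EXTRA_FLAGS) {
            /* a third byte is not defined by the format */
            return 0xffffffff; /* caller treats this as an error */
        }
    }

    return flags;
}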



Anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH 00/16] hw/block/nvme: dma handling and address mapping cleanup

2020-07-27 Thread Keith Busch
On Mon, Jul 27, 2020 at 11:42:46AM +0200, Klaus Jensen wrote:
> On Jul 20 13:37, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > This series consists of patches that refactors dma read/write and adds a
> > number of address mapping helper functions.
> > 
> > Based-on: <20200706061303.246057-1-...@irrelevant.dk>
> > 
> > Klaus Jensen (16):
> >   hw/block/nvme: memset preallocated requests structures
> >   hw/block/nvme: add mapping helpers
> >   hw/block/nvme: replace dma_acct with blk_acct equivalent
> >   hw/block/nvme: remove redundant has_sg member
> >   hw/block/nvme: refactor dma read/write
> >   hw/block/nvme: pass request along for tracing
> >   hw/block/nvme: add request mapping helper
> >   hw/block/nvme: verify validity of prp lists in the cmb
> >   hw/block/nvme: refactor request bounds checking
> >   hw/block/nvme: add check for mdts
> >   hw/block/nvme: be consistent about zeros vs zeroes
> >   hw/block/nvme: refactor NvmeRequest clearing
> >   hw/block/nvme: add a namespace reference in NvmeRequest
> >   hw/block/nvme: consolidate qsg/iov clearing
> >   hw/block/nvme: remove NvmeCmd parameter
> >   hw/block/nvme: use preallocated qsg/iov in nvme_dma_prp
> > 
> >  block/nvme.c  |   4 +-
> >  hw/block/nvme.c   | 498 +++---
> >  hw/block/nvme.h   |   4 +-
> >  hw/block/trace-events |   4 +
> >  include/block/nvme.h  |   4 +-
> >  5 files changed, 331 insertions(+), 183 deletions(-)
> > 
> > -- 
> > 2.27.0
> > 
> 
> Gentle ping on this.

I'll have free time to get back to this probably end of the week,
possibly early next week.



[PATCH] migration: Fix typos in bitmap migration comments

2020-07-27 Thread Eric Blake
Noticed while reviewing the file for newer patches.

Fixes: b35ebdf076
Signed-off-by: Eric Blake 
---

This is trivial enough that I'll throw it in my pull request today.

 migration/block-dirty-bitmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1f675b792fc9..784330ebe130 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -97,7 +97,7 @@

 #define DIRTY_BITMAP_MIG_START_FLAG_ENABLED  0x01
 #define DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT   0x02
-/* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
+/* 0x04 was "AUTOLOAD" flags on older versions, now it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK    0xf8

 /* State of one bitmap during save process */
@@ -180,7 +180,7 @@ static uint32_t qemu_get_bitmap_flags(QEMUFile *f)

 static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t flags)
 {
-/* The code currently do not send flags more than one byte */
+/* The code currently does not send flags as more than one byte */
 assert(!(flags & (0xff00 | DIRTY_BITMAP_MIG_EXTRA_FLAGS)));

 qemu_put_byte(f, flags);
-- 
2.27.0




Re: [PATCH v4 15/21] migration/block-dirty-bitmap: relax error handling in incoming part

2020-07-27 Thread Eric Blake

On 7/27/20 2:42 PM, Vladimir Sementsov-Ogievskiy wrote:

Bitmaps data is not critical, and we should not fail the migration (or
use postcopy recovering) because of dirty-bitmaps migration failure.
Instead we should just lose unfinished bitmaps.

Still we have to report io stream violation errors, as they affect the
whole migration stream.



I'm amending this to also add:

While touching this, tighten code that was previously blindly calling 
malloc on a size read from the migration stream, as a corrupted stream 
(perhaps from a malicious user) should not be able to convince us to 
allocate an inordinate amount of memory.
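
A minimal sketch of that pattern (bounding a wire-provided size before
allocating), assuming a hypothetical MAX_CHUNK_BYTES limit standing in for
the 10 * CHUNK_SIZE bound used in the hunk below:

static int read_chunk_sketch(QEMUFile *f)
{
    g_autofree uint8_t *buf = NULL;
    uint64_t buf_size = qemu_get_be64(f);

    if (buf_size > MAX_CHUNK_BYTES) {
        /* refuse to allocate whatever a corrupted stream asks for */
        error_report("bitmap migration stream requests a too large buffer "
                     "(%" PRIu64 " bytes)", buf_size);
        return -EINVAL;
    }

    buf = g_malloc(buf_size);
    if (qemu_get_buffer(f, buf, buf_size) != buf_size) {
        return -EIO;
    }

    /* ... deserialize bits from buf ... */
    return 0;
}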



Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  migration/block-dirty-bitmap.c | 164 +
  1 file changed, 127 insertions(+), 37 deletions(-)




@@ -650,15 +695,46 @@ static int dirty_bitmap_load_bits(QEMUFile *f, 
DBMLoadState *s)
  
  if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {

  trace_dirty_bitmap_load_bits_zeroes();
-bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
- false);
+if (!s->cancelled) {
+bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
+ nr_bytes, false);
+}
  } else {
  size_t ret;
-uint8_t *buf;
+g_autofree uint8_t *buf = NULL;
  uint64_t buf_size = qemu_get_be64(f);
-uint64_t needed_size =
-bdrv_dirty_bitmap_serialization_size(s->bitmap,
- first_byte, nr_bytes);
+uint64_t needed_size;
+
+/*
+ * Actual check for buf_size is done a bit later. We can't do it in


s/Actual/The actual/


+ * cancelled mode as we don't have the bitmap to check the constraints
+ * (so, we do allocate buffer and read prior to the check). On the 
other
+ * hand, we shouldn't blindly g_malloc the number from the stream.
+ * Actually one chunk should not be larger thatn CHUNK_SIZE. Let's 
allow


than


+ * a bit larger (which means that bitmap migration will fail anyway and
+ * the whole migration will most probably fail soon due to broken
+ * stream).
+ */
+if (buf_size > 10 * CHUNK_SIZE) {
+error_report("Bitmap migration stream requests too large buffer "
+ "size to allocate");


Bitmap migration stream buffer allocation request is too large

I'll make those touchups.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH for-5.1?] iotests: Adjust which tests are quick

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 22:51, Eric Blake wrote:

A quick run of './check -qcow2 -g migration' shows that test 169 is
NOT quick, but meanwhile several other tests ARE quick.  Let's adjust
the test designations accordingly.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 

Still, why do we need the quick group? make check uses the "auto" group..
Some tests are considered important enough to run even though they are not
quick. Probably everyone who doesn't want to run all tests should run the
"auto" group, not "quick"?
When I want to check my changes, I run all tests or limit them with the
help of grep. I mostly run tests on tmpfs, so they are all quick enough.
Saving several minutes of CPU work isn't worth missing a bug..


--
Best regards,
Vladimir



Re: [PATCH v4 for-5.1 00/21] Fix error handling during bitmap postcopy

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 22:53, Eric Blake wrote:

On 7/27/20 2:42 PM, Vladimir Sementsov-Ogievskiy wrote:

v4:

01: typo in commit msg
07: reword commit msg, add Eric's r-b
10: add Dr. David's r-b
15: add check for buf_size
 use g_autofree (and fix introduced in v3)
 use QEMU_LOCK_GUARD
17: fix commit msg, add Eric's r-b
20-21: add Eric's t-b


What timing!  I was literally in the middle of composing my pull request when 
this landed in my inbox; I'll refresh my local contents to pick this up (and 
see if you tweaked anything differently than I did).

Therefore, my pull request is now shifted by an hour or two, but will still 
come today ;)



Sorry :). Thanks for your work!

--
Best regards,
Vladimir



[PATCH v4 17/21] migration/savevm: don't worry if bitmap migration postcopy failed

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
First, if only bitmap postcopy is enabled (and not RAM postcopy),
postcopy_pause_incoming crashes on the assertion
assert(mis->to_src_file).

And anyway, bitmap postcopy is not prepared to be recovered in any way.
The original idea instead is that if bitmap postcopy fails, we just
lose some bitmaps, which is not critical. So, on failure we just need
to remove the unfinished bitmaps, and the guest should continue
execution on the destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
---
 migration/savevm.c | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..a843d202b5 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1813,6 +1813,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
 MigrationIncomingState *mis = migration_incoming_get_current();
 QEMUFile *f = mis->from_src_file;
 int load_res;
+MigrationState *migr = migrate_get_current();
+
+object_ref(OBJECT(migr));
 
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1839,11 +1842,24 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
 trace_postcopy_ram_listen_thread_exit();
 if (load_res < 0) {
-error_report("%s: loadvm failed: %d", __func__, load_res);
 qemu_file_set_error(f, load_res);
-migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-   MIGRATION_STATUS_FAILED);
-} else {
+dirty_bitmap_mig_cancel_incoming();
+if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+!migrate_postcopy_ram() && migrate_dirty_bitmaps())
+{
+error_report("%s: loadvm failed during postcopy: %d. All states "
+ "are migrated except dirty bitmaps. Some dirty "
+ "bitmaps may be lost, and present migrated dirty "
+ "bitmaps are correctly migrated and valid.",
+ __func__, load_res);
+load_res = 0; /* prevent further exit() */
+} else {
+error_report("%s: loadvm failed: %d", __func__, load_res);
+migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+   MIGRATION_STATUS_FAILED);
+}
+}
+if (load_res >= 0) {
 /*
  * This looks good, but it's possible that the device loading in the
  * main thread hasn't finished yet, and so we might not be in 'RUN'
@@ -1879,6 +1895,8 @@ static void *postcopy_ram_listen_thread(void *opaque)
 mis->have_listen_thread = false;
 postcopy_state_set(POSTCOPY_INCOMING_END);
 
+object_unref(OBJECT(migr));
+
 return NULL;
 }
 
@@ -2437,6 +2455,8 @@ static bool 
postcopy_pause_incoming(MigrationIncomingState *mis)
 {
 trace_postcopy_pause_incoming();
 
+assert(migrate_postcopy_ram());
+
 /* Clear the triggered bit to allow one recovery */
 mis->postcopy_recover_triggered = false;
 
@@ -2521,15 +2541,22 @@ out:
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 
+/* Cancel bitmaps incoming regardless of recovery */
+dirty_bitmap_mig_cancel_incoming();
+
 /*
  * If we are during an active postcopy, then we pause instead
  * of bail out to at least keep the VM's dirty data.  Note
  * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
  * during which we're still receiving device states and we
  * still haven't yet started the VM on destination.
+ *
+ * Only RAM postcopy supports recovery. Still, if RAM postcopy is
+ * enabled, canceled bitmaps postcopy will not affect RAM postcopy
+ * recovering.
  */
 if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-postcopy_pause_incoming(mis)) {
+migrate_postcopy_ram() && postcopy_pause_incoming(mis)) {
 /* Reset f to point to the newly created channel */
 f = mis->from_src_file;
 goto retry;
-- 
2.21.0




Re: [PATCH v4 for-5.1 00/21] Fix error handling during bitmap postcopy

2020-07-27 Thread Eric Blake

On 7/27/20 2:42 PM, Vladimir Sementsov-Ogievskiy wrote:

v4:

01: typo in commit msg
07: reword commit msg, add Eric's r-b
10: add Dr. David's r-b
15: add check for buf_size
 use g_autofree (and fix introduced in v3)
 use QEMU_LOCK_GUARD
17: fix commit msg, add Eric's r-b
20-21: add Eric's t-b


What timing!  I was literally in the middle of composing my pull request 
when this landed in my inbox; I'll refresh my local contents to pick 
this up (and see if you tweaked anything differently than I did).


Therefore, my pull request is now shifted by an hour or two, but will 
still come today ;)


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v4 11/21] migration/block-dirty-bitmap: refactor state global variables

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Move all state variables into one global struct. Reduce global
variable usage, utilizing opaque pointer where possible.
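
For illustration, a toy sketch of that direction (made-up names, not the
patch below): keep the state in one struct and hand it to the callbacks
through their opaque pointer instead of reaching for file-scope globals:

typedef struct DBMStateSketch {
    int save_stub;   /* stands in for the save sub-state */
    int load_stub;   /* stands in for the load sub-state */
} DBMStateSketch;

static DBMStateSketch state_sketch;

static int save_iterate_sketch(void *f, void *opaque)
{
    DBMStateSketch *s = opaque;   /* no global lookup in the callback */
    (void)f;
    (void)s;
    return 0;
}

/*
 * Registration would then pass &state_sketch as the opaque pointer,
 * roughly: register_savevm_live("dirty-bitmap", 0, 1, &handlers,
 * &state_sketch);  (hypothetical call, mirroring the idea only)
 */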

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 migration/block-dirty-bitmap.c | 179 ++---
 1 file changed, 99 insertions(+), 80 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 4b67e4f4fb..9b39e7aa2b 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -128,6 +128,12 @@ typedef struct DBMSaveState {
 BdrvDirtyBitmap *prev_bitmap;
 } DBMSaveState;
 
+typedef struct LoadBitmapState {
+BlockDriverState *bs;
+BdrvDirtyBitmap *bitmap;
+bool migrated;
+} LoadBitmapState;
+
 /* State of the dirty bitmap migration (DBM) during load process */
 typedef struct DBMLoadState {
 uint32_t flags;
@@ -135,18 +141,17 @@ typedef struct DBMLoadState {
 char bitmap_name[256];
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
+
+GSList *enabled_bitmaps;
+QemuMutex finish_lock;
 } DBMLoadState;
 
-static DBMSaveState dirty_bitmap_mig_state;
+typedef struct DBMState {
+DBMSaveState save;
+DBMLoadState load;
+} DBMState;
 
-/* State of one bitmap during load process */
-typedef struct LoadBitmapState {
-BlockDriverState *bs;
-BdrvDirtyBitmap *bitmap;
-bool migrated;
-} LoadBitmapState;
-static GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+static DBMState dbm_state;
 
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
@@ -169,21 +174,21 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t 
flags)
 qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
-   uint32_t additional_flags)
+static void send_bitmap_header(QEMUFile *f, DBMSaveState *s,
+   SaveBitmapState *dbms, uint32_t 
additional_flags)
 {
 BlockDriverState *bs = dbms->bs;
 BdrvDirtyBitmap *bitmap = dbms->bitmap;
 uint32_t flags = additional_flags;
 trace_send_bitmap_header_enter();
 
-if (bs != dirty_bitmap_mig_state.prev_bs) {
-dirty_bitmap_mig_state.prev_bs = bs;
+if (bs != s->prev_bs) {
+s->prev_bs = bs;
 flags |= DIRTY_BITMAP_MIG_FLAG_DEVICE_NAME;
 }
 
-if (bitmap != dirty_bitmap_mig_state.prev_bitmap) {
-dirty_bitmap_mig_state.prev_bitmap = bitmap;
+if (bitmap != s->prev_bitmap) {
+s->prev_bitmap = bitmap;
 flags |= DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME;
 }
 
@@ -198,19 +203,22 @@ static void send_bitmap_header(QEMUFile *f, 
SaveBitmapState *dbms,
 }
 }
 
-static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, DBMSaveState *s,
+  SaveBitmapState *dbms)
 {
-send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
+send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_START);
 qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
 qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, DBMSaveState *s,
+ SaveBitmapState *dbms)
 {
-send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
+send_bitmap_header(f, s, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, DBMSaveState *s,
+ SaveBitmapState *dbms,
  uint64_t start_sector, uint32_t nr_sectors)
 {
 /* align for buffer_is_zero() */
@@ -235,7 +243,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,
 
 trace_send_bitmap_bits(flags, start_sector, nr_sectors, buf_size);
 
-send_bitmap_header(f, dbms, flags);
+send_bitmap_header(f, s, dbms, flags);
 
 qemu_put_be64(f, start_sector);
 qemu_put_be32(f, nr_sectors);
@@ -254,12 +262,12 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_do_save_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(DBMSaveState *s)
 {
 SaveBitmapState *dbms;
 
-while ((dbms = QSIMPLEQ_FIRST(&dirty_bitmap_mig_state.dbms_list)) != NULL) {
-QSIMPLEQ_REMOVE_HEAD(&dirty_bitmap_mig_state.dbms_list, entry);
+while ((dbms = QSIMPLEQ_FIRST(&s->dbms_list)) != NULL) {
+QSIMPLEQ_REMOVE_HEAD(&s->dbms_list, entry);
 bdrv_dirty_bitmap_set_busy(dbms->bitmap, false);
 bdrv_unref(dbms->bs);
 g_free(dbms);
@@ -267,7 +275,8 @@ static void dirty_bitmap_do_save_cleanup(void)
 }
 
 /* Called with iothread lock taken. */
-static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
+static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
+   const char *bs_name)
 {
 

[PATCH v4 14/21] migration/block-dirty-bitmap: keep bitmap state for all bitmaps

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Keep bitmap state for disabled bitmaps too. Keep the state until the
end of the process. It's needed for the following commit to implement
bitmap postcopy canceling.

To clean up the new list, the following logic is used (see the sketch
below): we need two events to consider a bitmap's migration finished:
1. a chunk with the DIRTY_BITMAP_MIG_FLAG_COMPLETE flag is received
2. dirty_bitmap_mig_before_vm_start is called
These two events may come in any order, so we track which one comes
last, and on the last of them we remove the bitmap migration state from
the list.
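
For illustration, a toy sketch of that clean-up rule (made-up types, not the
code below): the per-bitmap load state is freed by whichever of the two
events happens last:

#include <stdbool.h>
#include <stdlib.h>

typedef struct LoadState {
    bool migrated;                /* COMPLETE chunk seen for this bitmap */
} LoadState;

static bool before_vm_start_handled;  /* before_vm_start hook already ran */

/* called when the COMPLETE chunk for this bitmap arrives */
static void on_complete_chunk(LoadState *b)
{
    b->migrated = true;
    if (before_vm_start_handled) {
        free(b);                  /* second event for this bitmap: drop state */
    }
}

/* called once, when the destination VM is about to start */
static void on_before_vm_start(LoadState *b)
{
    before_vm_start_handled = true;
    if (b->migrated) {
        free(b);                  /* second event for this bitmap: drop state */
    }
}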

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 migration/block-dirty-bitmap.c | 64 +++---
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 405a259296..eb4ffeac4d 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -132,6 +132,7 @@ typedef struct LoadBitmapState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
 bool migrated;
+bool enabled;
 } LoadBitmapState;
 
 /* State of the dirty bitmap migration (DBM) during load process */
@@ -142,8 +143,10 @@ typedef struct DBMLoadState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
 
-GSList *enabled_bitmaps;
-QemuMutex lock; /* protect enabled_bitmaps */
+bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+
+GSList *bitmaps;
+QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -526,6 +529,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)
 Error *local_err = NULL;
 uint32_t granularity = qemu_get_be32(f);
 uint8_t flags = qemu_get_byte(f);
+LoadBitmapState *b;
 
 if (s->bitmap) {
 error_report("Bitmap with the same name ('%s') already exists on "
@@ -552,45 +556,59 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)
 
 bdrv_disable_dirty_bitmap(s->bitmap);
 if (flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED) {
-LoadBitmapState *b;
-
 bdrv_dirty_bitmap_create_successor(s->bitmap, &local_err);
 if (local_err) {
 error_report_err(local_err);
 return -EINVAL;
 }
-
-b = g_new(LoadBitmapState, 1);
-b->bs = s->bs;
-b->bitmap = s->bitmap;
-b->migrated = false;
-s->enabled_bitmaps = g_slist_prepend(s->enabled_bitmaps, b);
 }
 
+b = g_new(LoadBitmapState, 1);
+b->bs = s->bs;
+b->bitmap = s->bitmap;
+b->migrated = false;
+b->enabled = flags & DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
+
+s->bitmaps = g_slist_prepend(s->bitmaps, b);
+
 return 0;
 }
 
-void dirty_bitmap_mig_before_vm_start(void)
+/*
+ * before_vm_start_handle_item
+ *
+ * g_slist_foreach helper
+ *
+ * item is LoadBitmapState*
+ * opaque is DBMLoadState*
+ */
+static void before_vm_start_handle_item(void *item, void *opaque)
 {
-DBMLoadState *s = &dbm_state.load;
-GSList *item;
-
-qemu_mutex_lock(&s->lock);
-
-for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
-LoadBitmapState *b = item->data;
+DBMLoadState *s = opaque;
+LoadBitmapState *b = item;
 
+if (b->enabled) {
 if (b->migrated) {
 bdrv_enable_dirty_bitmap(b->bitmap);
 } else {
 bdrv_dirty_bitmap_enable_successor(b->bitmap);
 }
+}
 
+if (b->migrated) {
+s->bitmaps = g_slist_remove(s->bitmaps, b);
 g_free(b);
 }
+}
 
-g_slist_free(s->enabled_bitmaps);
-s->enabled_bitmaps = NULL;
+void dirty_bitmap_mig_before_vm_start(void)
+{
+DBMLoadState *s = &dbm_state.load;
+qemu_mutex_lock(&s->lock);
+
+assert(!s->before_vm_start_handled);
+g_slist_foreach(s->bitmaps, before_vm_start_handle_item, s);
+s->before_vm_start_handled = true;
 
 qemu_mutex_unlock(&s->lock);
 }
@@ -607,11 +625,15 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
 }
 
-for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
+for (item = s->bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
 
 if (b->bitmap == s->bitmap) {
 b->migrated = true;
+if (s->before_vm_start_handled) {
+s->bitmaps = g_slist_remove(s->bitmaps, b);
+g_free(b);
+}
 break;
 }
 }
-- 
2.21.0




[PATCH v4 20/21] qemu-iotests/199: add early shutdown case to bitmaps postcopy

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Previous patches fixed two crashes which could occur on shutdown before
bitmap postcopy finished. Check that it works now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 24 
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 5fd34f0fcd..140930b2b1 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -217,6 +217,30 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
 self.assert_qmp(result, 'return/sha256', sha)
 
+def test_early_shutdown_destination(self):
+self.start_postcopy()
+
+self.vm_b_events += self.vm_b.get_qmp_events()
+self.vm_b.shutdown()
+# recreate vm_b, so there is no incoming option, which prevents
+# loading bitmaps from disk
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_b.launch()
+check_bitmaps(self.vm_b, 0)
+
+# Bitmaps will be lost if we just shutdown the vm, as they are marked
+# to skip storing to disk when prepared for migration. And that's
+# correct, as actual data may be modified in target vm, so we play
+# safe.
+# Still, this mark would be taken away if we do 'cont', and bitmaps
+# become persistent again. (see iotest 169 for such behavior case)
+result = self.vm_a.qmp('query-status')
+assert not result['return']['running']
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_a.shutdown()
+self.vm_a.launch()
+check_bitmaps(self.vm_a, 0)
+
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-.
+..
 --
-Ran 1 tests
+Ran 2 tests
 
 OK
-- 
2.21.0




[PATCH v4 21/21] qemu-iotests/199: add source-killed case to bitmaps postcopy

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Previous patches fix the behavior of bitmap migration, so that errors
are handled by just removing unfinished bitmaps rather than failing or
trying to recover the postcopy migration. Add a corresponding test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 15 +++
 tests/qemu-iotests/199.out |  4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 140930b2b1..58fad872a1 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -241,6 +241,21 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.launch()
 check_bitmaps(self.vm_a, 0)
 
+def test_early_kill_source(self):
+self.start_postcopy()
+
+self.vm_a_events = self.vm_a.get_qmp_events()
+self.vm_a.kill()
+
+self.vm_a.launch()
+
+match = {'data': {'status': 'completed'}}
+e_complete = self.vm_b.event_wait('MIGRATION', match=match)
+self.vm_b_events.append(e_complete)
+
+check_bitmaps(self.vm_a, 0)
+check_bitmaps(self.vm_b, 0)
+
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/199.out b/tests/qemu-iotests/199.out
index fbc63e62f8..8d7e996700 100644
--- a/tests/qemu-iotests/199.out
+++ b/tests/qemu-iotests/199.out
@@ -1,5 +1,5 @@
-..
+...
 --
-Ran 2 tests
+Ran 3 tests
 
 OK
-- 
2.21.0




[PATCH for-5.1?] iotests: Adjust which tests are quick

2020-07-27 Thread Eric Blake
A quick run of './check -qcow2 -g migration' shows that test 169 is
NOT quick, but meanwhile several other tests ARE quick.  Let's adjust
the test designations accordingly.

Signed-off-by: Eric Blake 
---

I noticed this while working on my pending pull request that includes
Vladimir's massive speedup of 199 (but even with his speedup, that test
is still not quick).

 tests/qemu-iotests/group | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 1d0252e1f051..806044642c69 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -112,7 +112,7 @@
 088 rw quick
 089 rw auto quick
 090 rw auto quick
-091 rw migration
+091 rw migration quick
 092 rw quick
 093 throttle
 094 rw quick
@@ -186,7 +186,7 @@
 162 quick
 163 rw
 165 rw quick
-169 rw quick migration
+169 rw migration
 170 rw auto quick
 171 rw quick
 172 auto
@@ -197,9 +197,9 @@
 177 rw auto quick
 178 img
 179 rw auto quick
-181 rw auto migration
+181 rw auto migration quick
 182 rw quick
-183 rw migration
+183 rw migration quick
 184 rw auto quick
 185 rw
 186 rw auto
@@ -216,9 +216,9 @@
 198 rw
 199 rw migration
 200 rw
-201 rw migration
+201 rw migration quick
 202 rw quick
-203 rw auto migration
+203 rw auto migration quick
 204 rw quick
 205 rw quick
 206 rw
-- 
2.27.0




[PATCH v4 19/21] qemu-iotests/199: check persistent bitmaps

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Check that persistent bitmaps are not stored on the source and that
bitmaps are persistent on the destination.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 tests/qemu-iotests/199 | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 355c0b2885..5fd34f0fcd 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -117,7 +117,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 for i in range(nb_bitmaps):
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap{}'.format(i),
-   granularity=granularity)
+   granularity=granularity,
+   persistent=True)
 self.assert_qmp(result, 'return', {})
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
@@ -193,6 +194,19 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)
 
+# check that there are no bitmaps stored on source
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_a.shutdown()
+self.vm_a.launch()
+check_bitmaps(self.vm_a, 0)
+
+# check that bitmaps are migrated and persistence works
+check_bitmaps(self.vm_b, nb_bitmaps)
+self.vm_b.shutdown()
+# recreate vm_b, so there is no incoming option, which prevents
+# loading bitmaps from disk
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_b.launch()
 check_bitmaps(self.vm_b, nb_bitmaps)
 
 # Check content of migrated bitmaps. Still, don't waste time checking
-- 
2.21.0




[PATCH v4 09/21] migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Rename dirty_bitmap_mig_cleanup to dirty_bitmap_do_save_cleanup, to
stress that it belongs to the save path.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 1d57bff4f6..01a536d7d3 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -259,7 +259,7 @@ static void send_bitmap_bits(QEMUFile *f, SaveBitmapState 
*dbms,
 }
 
 /* Called with iothread lock taken.  */
-static void dirty_bitmap_mig_cleanup(void)
+static void dirty_bitmap_do_save_cleanup(void)
 {
 SaveBitmapState *dbms;
 
@@ -406,7 +406,7 @@ static int init_dirty_bitmap_migration(void)
 
 fail:
 g_hash_table_destroy(handled_by_blk);
-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();
 
 return -1;
 }
@@ -445,7 +445,7 @@ static void bulk_phase(QEMUFile *f, bool limit)
 /* for SaveVMHandlers */
 static void dirty_bitmap_save_cleanup(void *opaque)
 {
-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();
 }
 
 static int dirty_bitmap_save_iterate(QEMUFile *f, void *opaque)
@@ -480,7 +480,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void 
*opaque)
 
 trace_dirty_bitmap_save_complete_finish();
 
-dirty_bitmap_mig_cleanup();
+dirty_bitmap_do_save_cleanup();
 return 0;
 }
 
-- 
2.21.0




[PATCH v4 13/21] migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
The bdrv_enable_dirty_bitmap_locked() call does nothing: if we are in
postcopy, the bitmap successor must be enabled, and the reclaim
operation will enable the bitmap.

So, we actually just need to call _reclaim_ in both if branches, and
keeping the branches different only to add an assertion does not seem
worthwhile. The logic becomes simple: on load complete we do reclaim
and that's all.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 migration/block-dirty-bitmap.c | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9194807b54..405a259296 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -603,6 +603,10 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 
 qemu_mutex_lock(>lock);
 
+if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
+bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
+}
+
 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
 
@@ -612,27 +616,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 }
 }
 
-if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
-bdrv_dirty_bitmap_lock(s->bitmap);
-if (s->enabled_bitmaps == NULL) {
-/* in postcopy */
-bdrv_reclaim_dirty_bitmap_locked(s->bitmap, _abort);
-bdrv_enable_dirty_bitmap_locked(s->bitmap);
-} else {
-/* target not started, successor must be empty */
-int64_t count = bdrv_get_dirty_count(s->bitmap);
-BdrvDirtyBitmap *ret = bdrv_reclaim_dirty_bitmap_locked(s->bitmap,
-NULL);
-/* bdrv_reclaim_dirty_bitmap can fail only on no successor (it
- * must be) or on merge fail, but merge can't fail when second
- * bitmap is empty
- */
-assert(ret == s->bitmap &&
-   count == bdrv_get_dirty_count(s->bitmap));
-}
-bdrv_dirty_bitmap_unlock(s->bitmap);
-}
-
 qemu_mutex_unlock(>lock);
 }
 
-- 
2.21.0




[PATCH v4 18/21] qemu-iotests/199: prepare for new test-cases addition

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Move the future common part into the start_postcopy() method. Move the
check of the number of bitmaps into check_bitmaps().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 tests/qemu-iotests/199 | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index d8532e49da..355c0b2885 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -29,6 +29,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
+granularity = 512
+nb_bitmaps = 15
 
 GiB = 1024 * 1024 * 1024
 
@@ -61,6 +63,15 @@ def event_dist(e1, e2):
 return event_seconds(e2) - event_seconds(e1)
 
 
+def check_bitmaps(vm, count):
+result = vm.qmp('query-block')
+
+if count == 0:
+assert 'dirty-bitmaps' not in result['return'][0]
+else:
+assert len(result['return'][0]['dirty-bitmaps']) == count
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 def tearDown(self):
 if debug:
@@ -101,10 +112,8 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a_events = []
 self.vm_b_events = []
 
-def test_postcopy(self):
-granularity = 512
-nb_bitmaps = 15
-
+def start_postcopy(self):
+""" Run migration until RESUME event on target. Return this event. """
 for i in range(nb_bitmaps):
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap{}'.format(i),
@@ -119,10 +128,10 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap0')
-discards1_sha256 = result['return']['sha256']
+self.discards1_sha256 = result['return']['sha256']
 
 # Check, that updating the bitmap by discards works
-assert discards1_sha256 != empty_sha256
+assert self.discards1_sha256 != empty_sha256
 
 # We want to calculate resulting sha256. Do it in bitmap0, so, disable
 # other bitmaps
@@ -135,7 +144,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap0')
-all_discards_sha256 = result['return']['sha256']
+self.all_discards_sha256 = result['return']['sha256']
 
 # Now, enable some bitmaps, to be updated during migration
 for i in range(2, nb_bitmaps, 2):
@@ -160,6 +169,10 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)
+return event_resume
+
+def test_postcopy_success(self):
+event_resume = self.start_postcopy()
 
 # enabled bitmaps should be updated
 apply_discards(self.vm_b, discards2)
@@ -180,18 +193,15 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)
 
-# Assert that bitmap migration is finished (check that successor bitmap
-# is removed)
-result = self.vm_b.qmp('query-block')
-assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+check_bitmaps(self.vm_b, nb_bitmaps)
 
 # Check content of migrated bitmaps. Still, don't waste time checking
 # every bitmap
 for i in range(0, nb_bitmaps, 5):
 result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap{}'.format(i))
-sha256 = discards1_sha256 if i % 2 else all_discards_sha256
-self.assert_qmp(result, 'return/sha256', sha256)
+sha = self.discards1_sha256 if i % 2 else self.all_discards_sha256
+self.assert_qmp(result, 'return/sha256', sha)
 
 
 if __name__ == '__main__':
-- 
2.21.0




[PATCH v4 07/21] migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Using the _locked version of bdrv_enable_dirty_bitmap to bypass locking
is wrong as we do not already own the mutex.  Moreover, the adjacent
call to bdrv_dirty_bitmap_enable_successor grabs the mutex.
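
For illustration, the rule the fix follows can be sketched like this (a
simplified sketch, not the actual QEMU declarations; the wrapper
function names are made up):

/*
 * Illustrative sketch only. The "_locked" variant assumes the caller
 * already holds the dirty-bitmap mutex; the plain variant takes and
 * releases the mutex itself.
 */
static void enable_bitmap_unlocked_caller(BdrvDirtyBitmap *bitmap)
{
    /* Correct from a context that does not hold the mutex. */
    bdrv_enable_dirty_bitmap(bitmap);
}

static void enable_bitmap_locked_caller(BdrvDirtyBitmap *bitmap)
{
    bdrv_dirty_bitmap_lock(bitmap);
    /* Only valid while the mutex is held. */
    bdrv_enable_dirty_bitmap_locked(bitmap);
    bdrv_dirty_bitmap_unlock(bitmap);
}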

Fixes: 58f72b965e9e1
Cc: qemu-sta...@nongnu.org # v3.0
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index b0dbf9eeed..0739f1259e 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -566,7 +566,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 DirtyBitmapLoadBitmapState *b = item->data;
 
 if (b->migrated) {
-bdrv_enable_dirty_bitmap_locked(b->bitmap);
+bdrv_enable_dirty_bitmap(b->bitmap);
 } else {
 bdrv_dirty_bitmap_enable_successor(b->bitmap);
 }
-- 
2.21.0




[PATCH v4 05/21] qemu-iotests/199: change discard patterns

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
iotest 199 takes too long because of the many discard operations. At
the same time, the postcopy period is very short, in spite of all these
efforts.

So, let's use fewer discards (and with more interesting patterns) to
reduce the test runtime. In the next commit we'll increase the postcopy
period.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 44 +-
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 190e820b84..da4dae01fb 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -30,6 +30,28 @@ size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+GiB = 1024 * 1024 * 1024
+
+discards1 = (
+(0, GiB),
+(2 * GiB + 512 * 5, 512),
+(3 * GiB + 512 * 5, 512),
+(100 * GiB, GiB)
+)
+
+discards2 = (
+(3 * GiB + 512 * 8, 512),
+(4 * GiB + 512 * 8, 512),
+(50 * GiB, GiB),
+(100 * GiB + GiB // 2, GiB)
+)
+
+
+def apply_discards(vm, discards):
+for d in discards:
+vm.hmp_qemu_io('drive0', 'discard {} {}'.format(*d))
+
+
 def event_seconds(event):
 return event['timestamp']['seconds'] + \
 event['timestamp']['microseconds'] / 100.0
@@ -80,9 +102,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events = []
 
 def test_postcopy(self):
-discard_size = 0x4000
 granularity = 512
-chunk = 4096
 
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap', granularity=granularity)
@@ -92,14 +112,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
node='drive0', name='bitmap')
 empty_sha256 = result['return']['sha256']
 
-s = 0
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
-s = 0x8000
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+apply_discards(self.vm_a, discards1 + discards2)
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')
@@ -111,10 +124,8 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
 self.assert_qmp(result, 'return', {})
-s = 0
-while s < discard_size:
-self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+
+apply_discards(self.vm_a, discards1)
 
 caps = [{'capability': 'dirty-bitmaps', 'state': True},
 {'capability': 'events', 'state': True}]
@@ -134,10 +145,7 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)
 
-s = 0x8000
-while s < discard_size:
-self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
-s += 0x1
+apply_discards(self.vm_b, discards2)
 
 match = {'data': {'status': 'completed'}}
 event_complete = self.vm_b.event_wait('MIGRATION', match=match)
-- 
2.21.0




[PATCH v4 15/21] migration/block-dirty-bitmap: relax error handling in incoming part

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Bitmap data is not critical, and we should not fail the migration (or
enter postcopy recovery) because of a dirty-bitmap migration failure.
Instead we should just lose the unfinished bitmaps.

Still, we have to report I/O stream violation errors, as they affect
the whole migration stream.
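
The overall shape of the incoming-side handling can be sketched as
follows (a hypothetical, simplified handler, not the code added by this
patch; dbm_read_and_apply_chunk() is a made-up name):

/*
 * Sketch of the relaxed error handling. Assumes s->lock is held by the
 * caller, as in the patch. Once cancelled, chunks are still read from
 * the stream (so other migration data stays in sync) but ignored.
 */
static int dbm_handle_chunk(QEMUFile *f, DBMLoadState *s)
{
    /* Hypothetical helper: skips application when s->cancelled is set. */
    int ret = dbm_read_and_apply_chunk(f, s);

    if (ret == -EIO) {
        /* Stream violation: this breaks the whole migration stream. */
        return ret;
    }
    if (ret < 0) {
        /* Bitmap-only failure: drop unfinished bitmaps, keep going. */
        cancel_incoming_locked(s);
    }
    return 0;
}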

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 migration/block-dirty-bitmap.c | 164 +
 1 file changed, 127 insertions(+), 37 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index eb4ffeac4d..4e45e79251 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -145,6 +145,15 @@ typedef struct DBMLoadState {
 
 bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
 
+/*
+ * cancelled
+ * Incoming migration is cancelled for some reason. That means that we
+ * still should read our chunks from migration stream, to not affect other
+ * migration objects (like RAM), but just ignore them and do not touch any
+ * bitmaps or nodes.
+ */
+bool cancelled;
+
 GSList *bitmaps;
 QemuMutex lock; /* protect bitmaps */
 } DBMLoadState;
@@ -531,6 +540,10 @@ static int dirty_bitmap_load_start(QEMUFile *f, 
DBMLoadState *s)
 uint8_t flags = qemu_get_byte(f);
 LoadBitmapState *b;
 
+if (s->cancelled) {
+return 0;
+}
+
 if (s->bitmap) {
 error_report("Bitmap with the same name ('%s') already exists on "
  "destination", bdrv_dirty_bitmap_name(s->bitmap));
@@ -613,13 +626,47 @@ void dirty_bitmap_mig_before_vm_start(void)
 qemu_mutex_unlock(>lock);
 }
 
+static void cancel_incoming_locked(DBMLoadState *s)
+{
+GSList *item;
+
+if (s->cancelled) {
+return;
+}
+
+s->cancelled = true;
+s->bs = NULL;
+s->bitmap = NULL;
+
+/* Drop all unfinished bitmaps */
+for (item = s->bitmaps; item; item = g_slist_next(item)) {
+LoadBitmapState *b = item->data;
+
+/*
+ * Bitmap must be unfinished, as finished bitmaps should already be
+ * removed from the list.
+ */
+assert(!s->before_vm_start_handled || !b->migrated);
+if (bdrv_dirty_bitmap_has_successor(b->bitmap)) {
+bdrv_reclaim_dirty_bitmap(b->bitmap, _abort);
+}
+bdrv_release_dirty_bitmap(b->bitmap);
+}
+
+g_slist_free_full(s->bitmaps, g_free);
+s->bitmaps = NULL;
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
 GSList *item;
 trace_dirty_bitmap_load_complete();
-bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-qemu_mutex_lock(>lock);
+if (s->cancelled) {
+return;
+}
+
+bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
 if (bdrv_dirty_bitmap_has_successor(s->bitmap)) {
 bdrv_reclaim_dirty_bitmap(s->bitmap, _abort);
@@ -637,8 +684,6 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 break;
 }
 }
-
-qemu_mutex_unlock(>lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -650,15 +695,46 @@ static int dirty_bitmap_load_bits(QEMUFile *f, 
DBMLoadState *s)
 
 if (s->flags & DIRTY_BITMAP_MIG_FLAG_ZEROES) {
 trace_dirty_bitmap_load_bits_zeroes();
-bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte, nr_bytes,
- false);
+if (!s->cancelled) {
+bdrv_dirty_bitmap_deserialize_zeroes(s->bitmap, first_byte,
+ nr_bytes, false);
+}
 } else {
 size_t ret;
-uint8_t *buf;
+g_autofree uint8_t *buf = NULL;
 uint64_t buf_size = qemu_get_be64(f);
-uint64_t needed_size =
-bdrv_dirty_bitmap_serialization_size(s->bitmap,
- first_byte, nr_bytes);
+uint64_t needed_size;
+
+/*
+ * Actual check for buf_size is done a bit later. We can't do it in
+ * cancelled mode as we don't have the bitmap to check the constraints
+ * (so, we do allocate buffer and read prior to the check). On the 
other
+ * hand, we shouldn't blindly g_malloc the number from the stream.
+ * Actually one chunk should not be larger than CHUNK_SIZE. Let's allow
+ * a bit larger (which means that bitmap migration will fail anyway and
+ * the whole migration will most probably fail soon due to broken
+ * stream).
+ */
+if (buf_size > 10 * CHUNK_SIZE) {
+error_report("Bitmap migration stream requests too large buffer "
+ "size to allocate");
+return -EIO;
+}
+
+buf = g_malloc(buf_size);
+ret = qemu_get_buffer(f, buf, buf_size);
+if (ret != buf_size) {
+error_report("Failed to read bitmap bits");
+return -EIO;
+

[PATCH v4 16/21] migration/block-dirty-bitmap: cancel migration on shutdown

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
If the target is turned off before postcopy has finished, the target
crashes because busy bitmaps are found at shutdown.
Canceling the incoming migration helps, as it removes all unfinished
(and therefore busy) bitmaps.

Similarly, on the source we crash in bdrv_close_all(), which asserts
that all bdrv states are removed, because the bdrv states involved in
dirty bitmap migration are still referenced by it. So, we need to
cancel the outgoing migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 migration/migration.h  |  2 ++
 migration/block-dirty-bitmap.c | 16 
 migration/migration.c  | 13 +
 3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index ab20c756f5..6c6a931d0d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 4e45e79251..36ca8be392 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
 s->bitmaps = NULL;
 }
 
+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+dirty_bitmap_do_save_cleanup(_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+DBMLoadState *s = _state.load;
+
+qemu_mutex_lock(>lock);
+
+cancel_incoming_locked(s);
+
+qemu_mutex_unlock(>lock);
+}
+
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
 {
 GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 1c61428988..8fe36339db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -188,6 +188,19 @@ void migration_shutdown(void)
  */
 migrate_fd_cancel(current_migration);
 object_unref(OBJECT(current_migration));
+
+/*
+ * Cancel outgoing migration of dirty bitmaps. It should
+ * at least unref used block nodes.
+ */
+dirty_bitmap_mig_cancel_outgoing();
+
+/*
+ * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+ * are non-critical data, and their loss never considered as
+ * something serious.
+ */
+dirty_bitmap_mig_cancel_incoming();
 }
 
 /* For outgoing */
-- 
2.21.0




[PATCH v4 03/21] qemu-iotests/199: better catch postcopy time

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
The test aims to test _postcopy_ migration, and wants to do some write
operations during the postcopy phase.

The test considers the migrate status=complete event on the source as
the start of postcopy. This is completely wrong: that is the completion
of the whole migration process. Let's instead consider the destination
start as the start of postcopy, and use the RESUME event for it.

Next, as the end of migration, let's use the migration status=complete
event on the target, as this method is closer to what libvirt or
another user would do than tracking the number of dirty bitmaps.

Finally, add the possibility to dump events for debugging. If debug is
set to True, we see that the actual postcopy period is very small
relative to the whole test duration (~0.2 seconds out of >40 seconds
for me). This means the test is very inefficient at what it is supposed
to do. Let's improve it in the following commits.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 72 +-
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dda918450a..dd6044768c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -20,17 +20,43 @@
 
 import os
 import iotests
-import time
 from iotests import qemu_img
 
+debug = False
+
 disk_a = os.path.join(iotests.test_dir, 'disk_a')
 disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
 
+def event_seconds(event):
+return event['timestamp']['seconds'] + \
+event['timestamp']['microseconds'] / 100.0
+
+
+def event_dist(e1, e2):
+return event_seconds(e2) - event_seconds(e1)
+
+
 class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 def tearDown(self):
+if debug:
+self.vm_a_events += self.vm_a.get_qmp_events()
+self.vm_b_events += self.vm_b.get_qmp_events()
+for e in self.vm_a_events:
+e['vm'] = 'SRC'
+for e in self.vm_b_events:
+e['vm'] = 'DST'
+events = (self.vm_a_events + self.vm_b_events)
+events = [(e['timestamp']['seconds'],
+   e['timestamp']['microseconds'],
+   e['vm'],
+   e['event'],
+   e.get('data', '')) for e in events]
+for e in sorted(events):
+print('{}.{:06} {} {} {}'.format(*e))
+
 self.vm_a.shutdown()
 self.vm_b.shutdown()
 os.remove(disk_a)
@@ -47,6 +73,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.launch()
 self.vm_b.launch()
 
+# collect received events for debug
+self.vm_a_events = []
+self.vm_b_events = []
+
 def test_postcopy(self):
 write_size = 0x4000
 granularity = 512
@@ -77,15 +107,13 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1
 
-bitmaps_cap = {'capability': 'dirty-bitmaps', 'state': True}
-events_cap = {'capability': 'events', 'state': True}
+caps = [{'capability': 'dirty-bitmaps', 'state': True},
+{'capability': 'events', 'state': True}]
 
-result = self.vm_a.qmp('migrate-set-capabilities',
-   capabilities=[bitmaps_cap, events_cap])
+result = self.vm_a.qmp('migrate-set-capabilities', capabilities=caps)
 self.assert_qmp(result, 'return', {})
 
-result = self.vm_b.qmp('migrate-set-capabilities',
-   capabilities=[bitmaps_cap])
+result = self.vm_b.qmp('migrate-set-capabilities', capabilities=caps)
 self.assert_qmp(result, 'return', {})
 
 result = self.vm_a.qmp('migrate', uri='exec:cat>' + fifo)
@@ -94,24 +122,38 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 result = self.vm_a.qmp('migrate-start-postcopy')
 self.assert_qmp(result, 'return', {})
 
-while True:
-event = self.vm_a.event_wait('MIGRATION')
-if event['data']['status'] == 'completed':
-break
+event_resume = self.vm_b.event_wait('RESUME')
+self.vm_b_events.append(event_resume)
 
 s = 0x8000
 while s < write_size:
 self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1
 
+match = {'data': {'status': 'completed'}}
+event_complete = self.vm_b.event_wait('MIGRATION', match=match)
+self.vm_b_events.append(event_complete)
+
+# take queued event, should already been happened
+event_stop = self.vm_a.event_wait('STOP')
+self.vm_a_events.append(event_stop)
+
+downtime = event_dist(event_stop, event_resume)
+postcopy_time = event_dist(event_resume, 

[PATCH v4 12/21] migration/block-dirty-bitmap: rename finish_lock to just lock

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
finish_lock is a bad name, as the lock is used not only at the end of
the process.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
 migration/block-dirty-bitmap.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9b39e7aa2b..9194807b54 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -143,7 +143,7 @@ typedef struct DBMLoadState {
 BdrvDirtyBitmap *bitmap;
 
 GSList *enabled_bitmaps;
-QemuMutex finish_lock;
+QemuMutex lock; /* protect enabled_bitmaps */
 } DBMLoadState;
 
 typedef struct DBMState {
@@ -575,7 +575,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 DBMLoadState *s = _state.load;
 GSList *item;
 
-qemu_mutex_lock(>finish_lock);
+qemu_mutex_lock(>lock);
 
 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
@@ -592,7 +592,7 @@ void dirty_bitmap_mig_before_vm_start(void)
 g_slist_free(s->enabled_bitmaps);
 s->enabled_bitmaps = NULL;
 
-qemu_mutex_unlock(>finish_lock);
+qemu_mutex_unlock(>lock);
 }
 
 static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
@@ -601,7 +601,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 trace_dirty_bitmap_load_complete();
 bdrv_dirty_bitmap_deserialize_finish(s->bitmap);
 
-qemu_mutex_lock(>finish_lock);
+qemu_mutex_lock(>lock);
 
 for (item = s->enabled_bitmaps; item; item = g_slist_next(item)) {
 LoadBitmapState *b = item->data;
@@ -633,7 +633,7 @@ static void dirty_bitmap_load_complete(QEMUFile *f, 
DBMLoadState *s)
 bdrv_dirty_bitmap_unlock(s->bitmap);
 }
 
-qemu_mutex_unlock(>finish_lock);
+qemu_mutex_unlock(>lock);
 }
 
 static int dirty_bitmap_load_bits(QEMUFile *f, DBMLoadState *s)
@@ -815,7 +815,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
 QSIMPLEQ_INIT(_state.save.dbms_list);
-qemu_mutex_init(_state.load.finish_lock);
+qemu_mutex_init(_state.load.lock);
 
 register_savevm_live("dirty-bitmap", 0, 1,
  _dirty_bitmap_handlers,
-- 
2.21.0




[PATCH v4 10/21] migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
No reason to keep two public init functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/migration.h  | 1 -
 migration/block-dirty-bitmap.c | 6 +-
 migration/migration.c  | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index f617960522..ab20c756f5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,7 +335,6 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
 void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
 
 void dirty_bitmap_mig_before_vm_start(void);
-void init_dirty_bitmap_incoming_migration(void);
 void migrate_add_address(SocketAddress *address);
 
 int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 01a536d7d3..4b67e4f4fb 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -148,11 +148,6 @@ typedef struct LoadBitmapState {
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
-void init_dirty_bitmap_incoming_migration(void)
-{
-qemu_mutex_init(_lock);
-}
-
 static uint32_t qemu_get_bitmap_flags(QEMUFile *f)
 {
 uint8_t flags = qemu_get_byte(f);
@@ -801,6 +796,7 @@ static SaveVMHandlers savevm_dirty_bitmap_handlers = {
 void dirty_bitmap_mig_init(void)
 {
 QSIMPLEQ_INIT(_bitmap_mig_state.dbms_list);
+qemu_mutex_init(_lock);
 
 register_savevm_live("dirty-bitmap", 0, 1,
  _dirty_bitmap_handlers,
diff --git a/migration/migration.c b/migration/migration.c
index 2ed9923227..1c61428988 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,8 +165,6 @@ void migration_object_init(void)
 qemu_sem_init(_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(_incoming->postcopy_pause_sem_fault, 0);
 
-init_dirty_bitmap_incoming_migration();
-
 if (!migration_object_check(current_migration, )) {
 error_report_err(err);
 exit(1);
-- 
2.21.0




[PATCH v4 04/21] qemu-iotests/199: improve performance: set bitmap by discard

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Discard dirties the dirty bitmap just like write does, but works
faster. Let's use it instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index dd6044768c..190e820b84 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -67,8 +67,10 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 os.mkfifo(fifo)
 qemu_img('create', '-f', iotests.imgfmt, disk_a, size)
 qemu_img('create', '-f', iotests.imgfmt, disk_b, size)
-self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a)
-self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b)
+self.vm_a = iotests.VM(path_suffix='a').add_drive(disk_a,
+  'discard=unmap')
+self.vm_b = iotests.VM(path_suffix='b').add_drive(disk_b,
+  'discard=unmap')
 self.vm_b.add_incoming("exec: cat '" + fifo + "'")
 self.vm_a.launch()
 self.vm_b.launch()
@@ -78,7 +80,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events = []
 
 def test_postcopy(self):
-write_size = 0x4000
+discard_size = 0x4000
 granularity = 512
 chunk = 4096
 
@@ -86,25 +88,32 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
name='bitmap', granularity=granularity)
 self.assert_qmp(result, 'return', {})
 
+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap')
+empty_sha256 = result['return']['sha256']
+
 s = 0
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1
 s = 0x8000
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')
 sha256 = result['return']['sha256']
 
+# Check, that updating the bitmap by discards works
+assert sha256 != empty_sha256
+
 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
 self.assert_qmp(result, 'return', {})
 s = 0
-while s < write_size:
-self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_a.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1
 
 caps = [{'capability': 'dirty-bitmaps', 'state': True},
@@ -126,8 +135,8 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b_events.append(event_resume)
 
 s = 0x8000
-while s < write_size:
-self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
+while s < discard_size:
+self.vm_b.hmp_qemu_io('drive0', 'discard %d %d' % (s, chunk))
 s += 0x1
 
 match = {'data': {'status': 'completed'}}
-- 
2.21.0




[PATCH v4 02/21] qemu-iotests/199: drop extra constraints

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
We don't need any specific format constraints here. Still, keep qcow2
for two reasons:
1. Avoid extra runs of this format-unrelated test
2. We will add checks around persistent bitmaps in the future (they
   require qcow2)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index de9ba8d94c..dda918450a 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -116,5 +116,4 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 
 if __name__ == '__main__':
-iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
- supported_protocols=['file'])
+iotests.main(supported_fmts=['qcow2'])
-- 
2.21.0




[PATCH v4 06/21] qemu-iotests/199: increase postcopy period

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
The test wants to force a bitmap postcopy. Still, the resulting
postcopy period is very small. Let's increase it by adding more
bitmaps to migrate. Also, test migration of disabled bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 58 --
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index da4dae01fb..d8532e49da 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -103,29 +103,45 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 def test_postcopy(self):
 granularity = 512
+nb_bitmaps = 15
 
-result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
-   name='bitmap', granularity=granularity)
-self.assert_qmp(result, 'return', {})
+for i in range(nb_bitmaps):
+result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
+   name='bitmap{}'.format(i),
+   granularity=granularity)
+self.assert_qmp(result, 'return', {})
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
+   node='drive0', name='bitmap0')
 empty_sha256 = result['return']['sha256']
 
-apply_discards(self.vm_a, discards1 + discards2)
+apply_discards(self.vm_a, discards1)
 
 result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
-sha256 = result['return']['sha256']
+   node='drive0', name='bitmap0')
+discards1_sha256 = result['return']['sha256']
 
 # Check, that updating the bitmap by discards works
-assert sha256 != empty_sha256
+assert discards1_sha256 != empty_sha256
 
-result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
-   name='bitmap')
-self.assert_qmp(result, 'return', {})
+# We want to calculate resulting sha256. Do it in bitmap0, so, disable
+# other bitmaps
+for i in range(1, nb_bitmaps):
+result = self.vm_a.qmp('block-dirty-bitmap-disable', node='drive0',
+   name='bitmap{}'.format(i))
+self.assert_qmp(result, 'return', {})
 
-apply_discards(self.vm_a, discards1)
+apply_discards(self.vm_a, discards2)
+
+result = self.vm_a.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap0')
+all_discards_sha256 = result['return']['sha256']
+
+# Now, enable some bitmaps, to be updated during migration
+for i in range(2, nb_bitmaps, 2):
+result = self.vm_a.qmp('block-dirty-bitmap-enable', node='drive0',
+   name='bitmap{}'.format(i))
+self.assert_qmp(result, 'return', {})
 
 caps = [{'capability': 'dirty-bitmaps', 'state': True},
 {'capability': 'events', 'state': True}]
@@ -145,6 +161,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 event_resume = self.vm_b.event_wait('RESUME')
 self.vm_b_events.append(event_resume)
 
+# enabled bitmaps should be updated
 apply_discards(self.vm_b, discards2)
 
 match = {'data': {'status': 'completed'}}
@@ -158,7 +175,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 downtime = event_dist(event_stop, event_resume)
 postcopy_time = event_dist(event_resume, event_complete)
 
-# TODO: assert downtime * 10 < postcopy_time
+assert downtime * 10 < postcopy_time
 if debug:
 print('downtime:', downtime)
 print('postcopy_time:', postcopy_time)
@@ -166,12 +183,15 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 # Assert that bitmap migration is finished (check that successor bitmap
 # is removed)
 result = self.vm_b.qmp('query-block')
-assert len(result['return'][0]['dirty-bitmaps']) == 1
-
-# Check content of migrated (and updated by new writes) bitmap
-result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
-   node='drive0', name='bitmap')
-self.assert_qmp(result, 'return/sha256', sha256)
+assert len(result['return'][0]['dirty-bitmaps']) == nb_bitmaps
+
+# Check content of migrated bitmaps. Still, don't waste time checking
+# every bitmap
+for i in range(0, nb_bitmaps, 5):
+result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
+   node='drive0', name='bitmap{}'.format(i))
+sha256 = discards1_sha256 if i % 2 else 

[PATCH v4 01/21] qemu-iotests/199: fix style

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Mostly, satisfy pep8 complaints.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Tested-by: Eric Blake 
---
 tests/qemu-iotests/199 | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/199 b/tests/qemu-iotests/199
index 40774eed74..de9ba8d94c 100755
--- a/tests/qemu-iotests/199
+++ b/tests/qemu-iotests/199
@@ -28,8 +28,8 @@ disk_b = os.path.join(iotests.test_dir, 'disk_b')
 size = '256G'
 fifo = os.path.join(iotests.test_dir, 'mig_fifo')
 
-class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
+class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 def tearDown(self):
 self.vm_a.shutdown()
 self.vm_b.shutdown()
@@ -54,7 +54,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 result = self.vm_a.qmp('block-dirty-bitmap-add', node='drive0',
name='bitmap', granularity=granularity)
-self.assert_qmp(result, 'return', {});
+self.assert_qmp(result, 'return', {})
 
 s = 0
 while s < write_size:
@@ -71,7 +71,7 @@ class TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 
 result = self.vm_a.qmp('block-dirty-bitmap-clear', node='drive0',
name='bitmap')
-self.assert_qmp(result, 'return', {});
+self.assert_qmp(result, 'return', {})
 s = 0
 while s < write_size:
 self.vm_a.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
@@ -104,15 +104,16 @@ class 
TestDirtyBitmapPostcopyMigration(iotests.QMPTestCase):
 self.vm_b.hmp_qemu_io('drive0', 'write %d %d' % (s, chunk))
 s += 0x1
 
-result = self.vm_b.qmp('query-block');
+result = self.vm_b.qmp('query-block')
 while len(result['return'][0]['dirty-bitmaps']) > 1:
 time.sleep(2)
-result = self.vm_b.qmp('query-block');
+result = self.vm_b.qmp('query-block')
 
 result = self.vm_b.qmp('x-debug-block-dirty-bitmap-sha256',
node='drive0', name='bitmap')
 
-self.assert_qmp(result, 'return/sha256', sha256);
+self.assert_qmp(result, 'return/sha256', sha256)
+
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2'], supported_cache_modes=['none'],
-- 
2.21.0




[PATCH v4 08/21] migration/block-dirty-bitmap: rename state structure types

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Rename the types to be shorter and symmetrical between the load and
save parts.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
Reviewed-by: Eric Blake 
---
 migration/block-dirty-bitmap.c | 70 ++
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 0739f1259e..1d57bff4f6 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -100,23 +100,25 @@
 /* 0x04 was "AUTOLOAD" flags on elder versions, no it is ignored */
 #define DIRTY_BITMAP_MIG_START_FLAG_RESERVED_MASK0xf8
 
-typedef struct DirtyBitmapMigBitmapState {
+/* State of one bitmap during save process */
+typedef struct SaveBitmapState {
 /* Written during setup phase. */
 BlockDriverState *bs;
 const char *node_name;
 BdrvDirtyBitmap *bitmap;
 uint64_t total_sectors;
 uint64_t sectors_per_chunk;
-QSIMPLEQ_ENTRY(DirtyBitmapMigBitmapState) entry;
+QSIMPLEQ_ENTRY(SaveBitmapState) entry;
 uint8_t flags;
 
 /* For bulk phase. */
 bool bulk_completed;
 uint64_t cur_sector;
-} DirtyBitmapMigBitmapState;
+} SaveBitmapState;
 
-typedef struct DirtyBitmapMigState {
-QSIMPLEQ_HEAD(, DirtyBitmapMigBitmapState) dbms_list;
+/* State of the dirty bitmap migration (DBM) during save process */
+typedef struct DBMSaveState {
+QSIMPLEQ_HEAD(, SaveBitmapState) dbms_list;
 
 bool bulk_completed;
 bool no_bitmaps;
@@ -124,23 +126,25 @@ typedef struct DirtyBitmapMigState {
 /* for send_bitmap_bits() */
 BlockDriverState *prev_bs;
 BdrvDirtyBitmap *prev_bitmap;
-} DirtyBitmapMigState;
+} DBMSaveState;
 
-typedef struct DirtyBitmapLoadState {
+/* State of the dirty bitmap migration (DBM) during load process */
+typedef struct DBMLoadState {
 uint32_t flags;
 char node_name[256];
 char bitmap_name[256];
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
-} DirtyBitmapLoadState;
+} DBMLoadState;
 
-static DirtyBitmapMigState dirty_bitmap_mig_state;
+static DBMSaveState dirty_bitmap_mig_state;
 
-typedef struct DirtyBitmapLoadBitmapState {
+/* State of one bitmap during load process */
+typedef struct LoadBitmapState {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
 bool migrated;
-} DirtyBitmapLoadBitmapState;
+} LoadBitmapState;
 static GSList *enabled_bitmaps;
 QemuMutex finish_lock;
 
@@ -170,7 +174,7 @@ static void qemu_put_bitmap_flags(QEMUFile *f, uint32_t 
flags)
 qemu_put_byte(f, flags);
 }
 
-static void send_bitmap_header(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_header(QEMUFile *f, SaveBitmapState *dbms,
uint32_t additional_flags)
 {
 BlockDriverState *bs = dbms->bs;
@@ -199,19 +203,19 @@ static void send_bitmap_header(QEMUFile *f, 
DirtyBitmapMigBitmapState *dbms,
 }
 }
 
-static void send_bitmap_start(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_start(QEMUFile *f, SaveBitmapState *dbms)
 {
 send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_START);
 qemu_put_be32(f, bdrv_dirty_bitmap_granularity(dbms->bitmap));
 qemu_put_byte(f, dbms->flags);
 }
 
-static void send_bitmap_complete(QEMUFile *f, DirtyBitmapMigBitmapState *dbms)
+static void send_bitmap_complete(QEMUFile *f, SaveBitmapState *dbms)
 {
 send_bitmap_header(f, dbms, DIRTY_BITMAP_MIG_FLAG_COMPLETE);
 }
 
-static void send_bitmap_bits(QEMUFile *f, DirtyBitmapMigBitmapState *dbms,
+static void send_bitmap_bits(QEMUFile *f, SaveBitmapState *dbms,
  uint64_t start_sector, uint32_t nr_sectors)
 {
 /* align for buffer_is_zero() */
@@ -257,7 +261,7 @@ static void send_bitmap_bits(QEMUFile *f, 
DirtyBitmapMigBitmapState *dbms,
 /* Called with iothread lock taken.  */
 static void dirty_bitmap_mig_cleanup(void)
 {
-DirtyBitmapMigBitmapState *dbms;
+SaveBitmapState *dbms;
 
 while ((dbms = QSIMPLEQ_FIRST(_bitmap_mig_state.dbms_list)) != NULL) 
{
 QSIMPLEQ_REMOVE_HEAD(_bitmap_mig_state.dbms_list, entry);
@@ -271,7 +275,7 @@ static void dirty_bitmap_mig_cleanup(void)
 static int add_bitmaps_to_list(BlockDriverState *bs, const char *bs_name)
 {
 BdrvDirtyBitmap *bitmap;
-DirtyBitmapMigBitmapState *dbms;
+SaveBitmapState *dbms;
 Error *local_err = NULL;
 
 FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
@@ -309,7 +313,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const 
char *bs_name)
 bdrv_ref(bs);
 bdrv_dirty_bitmap_set_busy(bitmap, true);
 
-dbms = g_new0(DirtyBitmapMigBitmapState, 1);
+dbms = g_new0(SaveBitmapState, 1);
 dbms->bs = bs;
 dbms->node_name = bs_name;
 dbms->bitmap = bitmap;
@@ -334,7 +338,7 @@ static int add_bitmaps_to_list(BlockDriverState *bs, const 
char *bs_name)
 static int init_dirty_bitmap_migration(void)
 {
 BlockDriverState *bs;
-DirtyBitmapMigBitmapState *dbms;
+SaveBitmapState 

[PATCH v4 for-5.1 00/21] Fix error handling during bitmap postcopy

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
v4:

01: typo in commit msg
07: reword commit msg, add Eric's r-b
10: add Dr. David's r-b
15: add check for buf_size
    use g_autofree (and fix introduced in v3)
    use QEMU_LOCK_GUARD
17: fix commit msg, add Eric's r-b
20-21: add Eric's t-b

The original idea of bitmap postcopy migration is that bitmaps are
non-critical data, and their loss is not a serious problem. So, when
using the postcopy method, on any failure we should just drop the
unfinished bitmaps and continue guest execution.

However, it doesn't work that way. It crashes, fails, or falls into the
postcopy-recovery feature. It does anything except the behavior we
want. This series fixes at least some of the problems with error
handling during bitmap postcopy migration.

Vladimir Sementsov-Ogievskiy (21):
  qemu-iotests/199: fix style
  qemu-iotests/199: drop extra constraints
  qemu-iotests/199: better catch postcopy time
  qemu-iotests/199: improve performance: set bitmap by discard
  qemu-iotests/199: change discard patterns
  qemu-iotests/199: increase postcopy period
  migration/block-dirty-bitmap: fix dirty_bitmap_mig_before_vm_start
  migration/block-dirty-bitmap: rename state structure types
  migration/block-dirty-bitmap: rename dirty_bitmap_mig_cleanup
  migration/block-dirty-bitmap: move mutex init to dirty_bitmap_mig_init
  migration/block-dirty-bitmap: refactor state global variables
  migration/block-dirty-bitmap: rename finish_lock to just lock
  migration/block-dirty-bitmap: simplify dirty_bitmap_load_complete
  migration/block-dirty-bitmap: keep bitmap state for all bitmaps
  migration/block-dirty-bitmap: relax error handling in incoming part
  migration/block-dirty-bitmap: cancel migration on shutdown
  migration/savevm: don't worry if bitmap migration postcopy failed
  qemu-iotests/199: prepare for new test-cases addition
  qemu-iotests/199: check persistent bitmaps
  qemu-iotests/199: add early shutdown case to bitmaps postcopy
  qemu-iotests/199: add source-killed case to bitmaps postcopy

 migration/migration.h  |   3 +-
 migration/block-dirty-bitmap.c | 470 +
 migration/migration.c  |  15 +-
 migration/savevm.c |  37 ++-
 tests/qemu-iotests/199 | 250 ++
 tests/qemu-iotests/199.out |   4 +-
 6 files changed, 545 insertions(+), 234 deletions(-)

-- 
2.21.0




Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 20:06, Vladimir Sementsov-Ogievskiy wrote:

27.07.2020 16:21, Dr. David Alan Gilbert wrote:

* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:

If target is turned off prior to postcopy finished, target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.

Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
  migration/migration.h  |  2 ++
  migration/block-dirty-bitmap.c | 16 
  migration/migration.c  | 13 +
  3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index ab20c756f5..6c6a931d0d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
  void dirty_bitmap_mig_before_vm_start(void);
+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
  void migrate_add_address(SocketAddress *address);
  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c24d4614bf..a198ec7278 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
  s->bitmaps = NULL;
  }
+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+    dirty_bitmap_do_save_cleanup(_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+    DBMLoadState *s = _state.load;
+
+    qemu_mutex_lock(>lock);
+
+    cancel_incoming_locked(s);
+
+    qemu_mutex_unlock(>lock);
+}
+
  static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
  {
  GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 1c61428988..8fe36339db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -188,6 +188,19 @@ void migration_shutdown(void)
   */
  migrate_fd_cancel(current_migration);
  object_unref(OBJECT(current_migration));
+
+    /*
+ * Cancel outgoing migration of dirty bitmaps. It should
+ * at least unref used block nodes.
+ */
+    dirty_bitmap_mig_cancel_outgoing();
+
+    /*
+ * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+ * are non-critical data, and their loss never considered as
+ * something serious.
+ */
+    dirty_bitmap_mig_cancel_incoming();


Are you sure this is the right place to put them - I'm thinking that
perhaps the object_unref of current_migration should still be after
them?


Hmm, looks strange, I will check.


It's OK. These functions operate on the global bitmap migration state,
which is separate from current_migration, and do post-processing of
dirty bitmaps, so it seems OK to do it last.






  }
  /* For outgoing */
--
2.21.0


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK







--
Best regards,
Vladimir



Re: [PATCH v5 3/3] nvme: allow cmb and pmr to be enabled on same device

2020-07-27 Thread Andrzej Jakowski
On 7/27/20 2:06 AM, Klaus Jensen wrote:
> On Jul 23 09:03, Andrzej Jakowski wrote:
>> So far it was not possible to have CMB and PMR emulated on the same
>> device, because BAR2 was used exclusively either of PMR or CMB. This
>> patch places CMB at BAR4 offset so it not conflicts with MSI-X vectors.
>>
>> Signed-off-by: Andrzej Jakowski 
>> ---
>>  hw/block/nvme.c  | 120 +--
>>  hw/block/nvme.h  |   1 +
>>  include/block/nvme.h |   4 +-
>>  3 files changed, 85 insertions(+), 40 deletions(-)
>>
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index 43866b744f..d55a71a346 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -22,12 +22,13 @@
>>   *  [pmrdev=,] \
>>   *  max_ioqpairs=
>>   *
>> - * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>> - * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
>> + * Note cmb_size_mb denotes size of CMB in MB. CMB when configured is 
>> assumed
>> + * to be resident in BAR4 at offset that is 2MiB aligned. When CMB is 
>> emulated
>> + * on Linux guest it is recommended to make cmb_size_mb multiple of 2. Both
>> + * size and alignment restrictions are imposed by Linux guest.
>>   *
>> - * cmb_size_mb= and pmrdev= options are mutually exclusive due to limitation
>> - * in available BAR's. cmb_size_mb= will take precedence over pmrdev= when
>> - * both provided.
>> + * pmrdev is assumed to be resident in BAR2/BAR3. When configured it 
>> consumes
>> + * whole BAR2/BAR3 exclusively.
>>   * Enabling pmr emulation can be achieved by pointing to 
>> memory-backend-file.
>>   * For example:
>>   * -object memory-backend-file,id=,share=on,mem-path=, \
>> @@ -57,8 +58,8 @@
>>  #define NVME_MAX_IOQPAIRS 0x
>>  #define NVME_DB_SIZE  4
>>  #define NVME_SPEC_VER 0x00010300
>> -#define NVME_CMB_BIR 2
>>  #define NVME_PMR_BIR 2
>> +#define NVME_MSIX_BIR 4
> 
> I think that either we keep the CMB constant (but updated with '4' of
> course) or we just get rid of both NVME_{CMB,MSIX}_BIR and use a literal
> '4' in nvme_bar4_init. It is very clear that is only BAR 4 we use.
> 
>>  #define NVME_TEMPERATURE 0x143
>>  #define NVME_TEMPERATURE_WARNING 0x157
>>  #define NVME_TEMPERATURE_CRITICAL 0x175
>> @@ -111,16 +112,18 @@ static uint16_t nvme_sqid(NvmeRequest *req)
>>  
>>  static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr)
>>  {
>> -hwaddr low = n->ctrl_mem.addr;
>> -hwaddr hi  = n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size);
>> +hwaddr low = memory_region_to_absolute_addr(>ctrl_mem, 0);
>> +hwaddr hi  = low + int128_get64(n->ctrl_mem.size);
> 
> Are we really really sure we want to use a global helper like this? What
> are the chances/risk that we ever introduce another overlay? I'd say
> zero. We are not even using a *real* overlay, it's just an io memory
> region (ctrl_mem) on top of a pure container (bar4). Can't we live with
> an internal helper doing `n->bar4.addr + n->ctrl_mem.addr` and be done
> with it? It also removes a data structure walk on each invocation of
> nvme_addr_is_cmb (which is done for **all** addresses in PRP lists and
> SGLs).

Thx!
My understanding of the memory_region_to_absolute_addr() function ([1])
is that it walks the memory hierarchy up to the root while accumulating
the absolute address. It is very similar to the
n->bar4.addr + n->ctrl_mem.addr approach, with the following
differences:
 * n->bar4.addr + n->ctrl_mem.addr assumes a single-level hierarchy.
   Updates would be needed if another memory level is added.
 * memory_region_to_absolute_addr() works for a hierarchy of any depth,
   at the cost of walking the data structure on every call.

I don't have data on how likely a new memory level is, nor on how much
more memory_region_to_absolute_addr() costs compared to
n->bar4.addr + n->ctrl_mem.addr.

Please let me know which approach is preferred; a sketch of the simpler
helper follows the listing in [1] below.

[1]
hwaddr memory_region_to_absolute_addr(MemoryRegion *mr, hwaddr offset)
{
MemoryRegion *root;
hwaddr abs_addr = offset;

abs_addr += mr->addr;
for (root = mr; root->container; ) {
root = root->container;
abs_addr += root->addr;
}

return abs_addr;
}
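
For comparison, the single-level alternative discussed above could look
roughly like this (a sketch only, under the assumption that ctrl_mem is
placed directly inside the bar4 container as in this patch;
nvme_cmb_addr() and nvme_addr_is_cmb_sketch() are made-up names):

/* Sketch: absolute address of the CMB assuming a single-level
 * hierarchy, i.e. ctrl_mem sits directly inside the bar4 container. */
static inline hwaddr nvme_cmb_addr(NvmeCtrl *n)
{
    return n->bar4.addr + n->ctrl_mem.addr;
}

static bool nvme_addr_is_cmb_sketch(NvmeCtrl *n, hwaddr addr)
{
    hwaddr lo = nvme_cmb_addr(n);
    hwaddr hi = lo + int128_get64(n->ctrl_mem.size);

    return addr >= lo && addr < hi;
}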

> 
>>  
>>  return addr >= low && addr < hi;
>>  }
>>  
>>  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>>  {
>> +hwaddr cmb_addr = memory_region_to_absolute_addr(>ctrl_mem, 0);
>> +
>>  if (n->bar.cmbsz && nvme_addr_is_cmb(n, addr)) {
>> -memcpy(buf, (void *)>cmbuf[addr - n->ctrl_mem.addr], size);
>> +memcpy(buf, (void *)>cmbuf[addr - cmb_addr], size);
>>  return;
>>  }
>>  
>> @@ -207,17 +210,18 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>   uint64_t prp2, uint32_t len, NvmeCtrl *n)
>>  {
>>  hwaddr trans_len = n->page_size - (prp1 % n->page_size);
>> +hwaddr cmb_addr = memory_region_to_absolute_addr(>ctrl_mem, 0);
>>  trans_len = MIN(len, trans_len);
>>  int num_prps = 

[PATCH v2 5/5] block/nbd: use non-blocking connect: fix vm hang on connect()

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
This makes the nbd connection_co yield during reconnects, so that a
reconnect doesn't hang up the main thread. This is very important in
the case of an unavailable nbd server host: the connect() call may take
a long time, blocking the main thread (and due to reconnect, it will
hang again and again, with only small gaps of working time during the
pauses between connection attempts).

Implementation notes:

 - We don't want to implement non-blocking connect() over a
 non-blocking socket, because getaddrinfo() has no portable
 non-blocking implementation anyway, so let's just use a thread for
 both getaddrinfo() and connect().

 - We can't use qio_channel_socket_connect_async (which behaves
 similarly and starts a thread to execute the connect() call), as it
 relies on someone iterating the main loop (g_main_loop_run() or
 something like that), which is not always the case.

 - We can't use the thread_pool_submit_co API, as the thread pool waits
 for all threads to finish (but we don't want to wait for a blocking
 reconnect attempt on shutdown).

 So, we just create the thread by hand. Some additional difficulties
 are:

 - We want our connect not to block drained sections and aio context
 switches. To achieve this, we make it possible to "cancel" the
 synchronous wait for the connect (which is actually a coroutine
 yield); still, the thread continues in the background, and its
 successful result may be reused on the next reconnect attempt.

 - We don't want to wait for the reconnect on shutdown, so there is a
 CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the
 block layer is no longer interested in the result, and the thread
 should close the newly connected socket on finish and free the state.
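
In outline, the approach above looks roughly like this (a heavily
simplified sketch, not the exact code of this patch;
connect_thread_func_sketch() is a made-up name and error handling is
trimmed):

/*
 * Sketch of the detached-connect-thread pattern: the thread publishes
 * its result under the mutex and wakes the requester via a bottom
 * half, or cleans up after itself if the requester has detached.
 */
static void *connect_thread_func_sketch(void *opaque)
{
    NBDConnectThread *thr = opaque;
    bool do_free = false;

    thr->sioc = qio_channel_socket_new();
    if (qio_channel_socket_connect_sync(thr->sioc, thr->saddr, &thr->err) < 0) {
        object_unref(OBJECT(thr->sioc));
        thr->sioc = NULL;
    }

    qemu_mutex_lock(&thr->mutex);
    switch (thr->state) {
    case CONNECT_THREAD_RUNNING:
        thr->state = thr->sioc ? CONNECT_THREAD_SUCCESS : CONNECT_THREAD_FAIL;
        if (thr->bh_ctx) {
            /* Requester is still waiting: wake it in its AioContext. */
            aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque);
            thr->bh_ctx = NULL;
        }
        break;
    case CONNECT_THREAD_RUNNING_DETACHED:
        /* Nobody is interested any more: clean up after ourselves. */
        do_free = true;
        break;
    default:
        abort();
    }
    qemu_mutex_unlock(&thr->mutex);

    if (do_free) {
        if (thr->sioc) {
            object_unref(OBJECT(thr->sioc));
        }
        error_free(thr->err);
        qapi_free_SocketAddress(thr->saddr);
        g_free(thr);
    }
    return NULL;
}

The key point of the design is that the requester may stop waiting at
any time (drain, AioContext switch, shutdown) without joining the
thread: the thread either publishes its result for a later reconnect
attempt or, in the detached case, releases everything itself.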

How to reproduce the bug, fixed with this commit:

1. Create an image on node1:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server on node1:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
-vnc :0 -qmp stdio -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:

5.1 Kill NBD server on node1

5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
again. The command should fail and a lot of error messages about
failing disk may appear as well.

Now NBD client driver in Qemu tries to reconnect.
Still, VM works well.

6. Make node1 unavailable on NBD port, so connect() from node2 will
   last for a long time:

   On node1 (Note, that 10809 is just a default NBD port):

   sudo iptables -A INPUT -p tcp --dport 10809 -j DROP

   After some time the guest hangs, and you may check in gdb that Qemu
   hangs in connect() call, issued from the main thread. This is the
   BUG.

7. Don't forget to drop iptables rule from your node1:

   sudo iptables -D INPUT -p tcp --dport 10809 -j DROP

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 266 +++-
 1 file changed, 265 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 8c5df68856..75352adf89 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -38,6 +38,7 @@
 
 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
+#include "qapi/clone-visitor.h"
 
 #include "block/qdict.h"
 #include "block/nbd.h"
@@ -62,6 +63,47 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
+typedef enum NBDConnectThreadState {
+/* No thread, no pending results */
+CONNECT_THREAD_NONE,
+
+/* Thread is running, no results for now */
+CONNECT_THREAD_RUNNING,
+
+/*
+ * Thread is running, but requestor exited. Thread should close the new socket
+ * and free the connect state on exit.
+ */
+CONNECT_THREAD_RUNNING_DETACHED,
+
+/* Thread finished, results are stored in a state */
+CONNECT_THREAD_FAIL,
+CONNECT_THREAD_SUCCESS
+} NBDConnectThreadState;
+
+typedef struct NBDConnectThread {
+/* Initialization constants */
+SocketAddress *saddr; /* address to connect to */
+/*
+ * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
+ * NULL
+ */
+QEMUBHFunc *bh_func;
+void *bh_opaque;
+
+/*
+ * Result of last attempt. Valid in FAIL and SUCCESS states.
+ * If you want to steal error, don't forget to set pointer to NULL.
+ */
+QIOChannelSocket *sioc;
+Error *err;
+
+/* state and bh_ctx are protected by mutex */
+QemuMutex mutex;
+NBDConnectThreadState state; /* current state of the thread */
+AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
+} NBDConnectThread;
+
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -91,10 +133,17 @@ 

[PATCH v2 4/5] block/nbd: nbd_co_reconnect_loop(): don't sleep if drained

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
We try to go into a wakeable sleep so that, if drain begins, it will
break the sleep. But what if nbd_client_co_drain_begin() has already
been called and s->drained is already true? We'll go to sleep, and the
drain will have to wait for the whole timeout. Let's improve this.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/nbd.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index dfe1408b2d..8c5df68856 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -341,8 +341,6 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState 
*s)
 qemu_co_queue_restart_all(>free_sema);
 }
 
-qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
-  >connection_co_sleep_ns_state);
 if (s->drained) {
 bdrv_dec_in_flight(s->bs);
 s->wait_drained_end = true;
@@ -354,9 +352,12 @@ static coroutine_fn void 
nbd_co_reconnect_loop(BDRVNBDState *s)
 qemu_coroutine_yield();
 }
 bdrv_inc_in_flight(s->bs);
-}
-if (timeout < max_timeout) {
-timeout *= 2;
+} else {
+qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
+  >connection_co_sleep_ns_state);
+if (timeout < max_timeout) {
+timeout *= 2;
+}
 }
 
 nbd_reconnect_attempt(s);
-- 
2.21.0




[PATCH v2 1/5] block/nbd: split nbd_establish_connection out of nbd_client_connect

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
We are going to implement a non-blocking version of
nbd_establish_connection, which for a while will be used only for
nbd_reconnect_attempt, not for nbd_open, so we need to be able to call
it separately.

Refactor nbd_reconnect_attempt in a way that makes the next commit
simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c| 60 +++---
 block/trace-events |  4 ++--
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 65a4f56924..2ec6623c18 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -93,7 +93,10 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 } BDRVNBDState;
 
-static int nbd_client_connect(BlockDriverState *bs, Error **errp);
+static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
+  Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
+Error **errp);
 
 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
@@ -241,7 +244,9 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
+int ret;
 Error *local_err = NULL;
+QIOChannelSocket *sioc;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -280,19 +285,25 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-s->connect_status = nbd_client_connect(s->bs, &local_err);
+sioc = nbd_establish_connection(s->saddr, &local_err);
+if (!sioc) {
+ret = -ECONNREFUSED;
+goto out;
+}
+
+ret = nbd_client_handshake(s->bs, sioc, &local_err);
+
+out:
+s->connect_status = ret;
 error_free(s->connect_err);
 s->connect_err = NULL;
 error_propagate(&s->connect_err, local_err);
 
-if (s->connect_status < 0) {
-/* failed attempt */
-return;
+if (ret >= 0) {
+/* successfully connected */
+s->state = NBD_CLIENT_CONNECTED;
+qemu_co_queue_restart_all(&s->free_sema);
 }
-
-/* successfully connected */
-s->state = NBD_CLIENT_CONNECTED;
-qemu_co_queue_restart_all(&s->free_sema);
 }
 
 static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
@@ -1425,24 +1436,15 @@ static QIOChannelSocket 
*nbd_establish_connection(SocketAddress *saddr,
 return sioc;
 }
 
-static int nbd_client_connect(BlockDriverState *bs, Error **errp)
+/* nbd_client_handshake takes ownership on sioc. On failure it is unref'ed. */
+static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
+Error **errp)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 AioContext *aio_context = bdrv_get_aio_context(bs);
 int ret;
 
-/*
- * establish TCP connection, return error if it fails
- * TODO: Configurable retry-until-timeout behaviour.
- */
-QIOChannelSocket *sioc = nbd_establish_connection(s->saddr, errp);
-
-if (!sioc) {
-return -ECONNREFUSED;
-}
-
-/* NBD handshake */
-trace_nbd_client_connect(s->export);
+trace_nbd_client_handshake(s->export);
 qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
 qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
 
@@ -1489,7 +1491,7 @@ static int nbd_client_connect(BlockDriverState *bs, Error 
**errp)
 object_ref(OBJECT(s->ioc));
 }
 
-trace_nbd_client_connect_success(s->export);
+trace_nbd_client_handshake_success(s->export);
 
 return 0;
 
@@ -1894,6 +1896,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 {
 int ret;
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+QIOChannelSocket *sioc;
 
 ret = nbd_process_options(bs, options, errp);
 if (ret < 0) {
@@ -1904,7 +1907,16 @@ static int nbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 qemu_co_mutex_init(&s->send_mutex);
 qemu_co_queue_init(&s->free_sema);
 
-ret = nbd_client_connect(bs, errp);
+/*
+ * establish TCP connection, return error if it fails
+ * TODO: Configurable retry-until-timeout behaviour.
+ */
+sioc = nbd_establish_connection(s->saddr, errp);
+if (!sioc) {
+return -ECONNREFUSED;
+}
+
+ret = nbd_client_handshake(bs, sioc, errp);
 if (ret < 0) {
 nbd_clear_bdrvstate(s);
 return ret;
diff --git a/block/trace-events b/block/trace-events
index d3533ca896..9158335061 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -168,8 +168,8 @@ nbd_parse_blockstatus_compliance(const char *err) "ignoring 
extra data from non-
 nbd_structured_read_compliance(const char *type) "server sent non-compliant 
unaligned read %s chunk"
 nbd_read_reply_entry_fail(int ret, const char *err) "ret = %d, err: %s"
 nbd_co_request_fail(uint64_t from, uint32_t len, uint64_t handle, uint16_t 
flags, uint16_t type, const char *name, int ret, const char *err) "Request 
failed { .from = %" 

[PATCH v2 3/5] block/nbd: on shutdown terminate connection attempt

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
On shutdown the nbd driver may be in a connecting state. We should shut
it down as well, otherwise we may hang in
nbd_teardown_connection, waiting for connection_co to finish in the
BDRV_POLL_WHILE(bs, s->connection_co) loop if the remote server is down.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
echo $n; ((n++));
./nbd-fault-injector.py 127.0.0.1:1 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
echo $n; ((n++));
./qemu-io -c 'read 0 512' nbd://127.0.0.1:1;
done

After some time, qemu-io will hang. Note that this hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: allow drain during reconnect attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/nbd.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6d19f3c660..dfe1408b2d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -209,11 +209,15 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-if (s->state == NBD_CLIENT_CONNECTED) {
+if (s->ioc) {
 /* finish any pending coroutines */
-assert(s->ioc);
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+} else if (s->sioc) {
+/* abort negotiation */
+qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH,
+ NULL);
 }
+
 s->state = NBD_CLIENT_QUIT;
 if (s->connection_co) {
 if (s->connection_co_sleep_ns_state) {
@@ -1459,6 +1463,9 @@ static int nbd_client_handshake(BlockDriverState *bs, 
QIOChannelSocket *sioc,
 int ret;
 
 trace_nbd_client_handshake(s->export);
+
+s->sioc = sioc;
+
 qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
 qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
 
@@ -1473,6 +1480,7 @@ static int nbd_client_handshake(BlockDriverState *bs, 
QIOChannelSocket *sioc,
 g_free(s->info.name);
 if (ret < 0) {
 object_unref(OBJECT(sioc));
+s->sioc = NULL;
 return ret;
 }
 if (s->x_dirty_bitmap && !s->info.base_allocation) {
@@ -1498,8 +1506,6 @@ static int nbd_client_handshake(BlockDriverState *bs, 
QIOChannelSocket *sioc,
 }
 }
 
-s->sioc = sioc;
-
 if (!s->ioc) {
 s->ioc = QIO_CHANNEL(sioc);
 object_ref(OBJECT(s->ioc));
@@ -1520,6 +1526,7 @@ static int nbd_client_handshake(BlockDriverState *bs, 
QIOChannelSocket *sioc,
 nbd_send_request(s->ioc ?: QIO_CHANNEL(sioc), &request);
 
 object_unref(OBJECT(sioc));
+s->sioc = NULL;
 
 return ret;
 }
-- 
2.21.0




[PATCH v2 2/5] block/nbd: allow drain during reconnect attempt

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
It should be safe to reenter qio_channel_yield() on the io/channel
read/write path, so it's safe to reduce in_flight and allow attaching a new
aio context. And there is no problem with allowing drain itself: a connection
attempt is not a guest request. Moreover, if the remote server is down, we can
hang in negotiation, blocking the drain section and provoking a deadlock.

How to reproduce the dead lock:

1. Create nbd-fault-injector.conf with the following contents:

[inject-error "mega1"]
event=data
io=readwrite
when=before

2. In one terminal run nbd-fault-injector in a loop, like this:

n=1; while true; do
echo $n; ((n++));
./nbd-fault-injector.py 127.0.0.1:1 nbd-fault-injector.conf;
done

3. In another terminal run qemu-io in a loop, like this:

n=1; while true; do
echo $n; ((n++));
./qemu-io -c 'read 0 512' nbd://127.0.0.1:1;
done

After some time, qemu-io will hang trying to drain, for example, like
this:

 #3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
util/aio-posix.c:600
 #4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
 #5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
 #6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
 #7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
 #8 bdrv_open_inherit (filename=0x7fffba1563bc
"nbd+tcp://127.0.0.1:1", reference=0x0, options=0x55f006be86d0,
flags=24578, parent=0x0, child_class=0x0, child_role=0,
errp=0x7fffba154620) at block.c:3491
 #9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:1",
reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
block.c:3513
 #10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:1",
reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
block/block-backend.c:421

And connection_co stack like this:

 #0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
 #1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
 #2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
io/channel.c:472
 #3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
 #4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
 #5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
"\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
io/channel.c:247
 #6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
/work/src/qemu/master/include/block/nbd.h:365
 #7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
/work/src/qemu/master/include/block/nbd.h:391
 #8 nbd_start_negotiate (aio_context=0x55f006bdd890,
ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
outioc=0x55f006bf19f8, structured_reply=true,
zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
 #9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
nbd/client.c:1032
 #10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
block/nbd.c:1460
 #11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
 #12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
 #13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
 #14 coroutine_trampoline (i0=113190480, i1=22000) at
util/coroutine-ucontext.c:173

Note that the hang may be
triggered by another bug, so the whole case is fixed only together with
commit "block/nbd: on shutdown terminate connection attempt".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2ec6623c18..6d19f3c660 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -291,8 +291,22 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 goto out;
 }
 
+bdrv_dec_in_flight(s->bs);
+
 ret = nbd_client_handshake(s->bs, sioc, &local_err);
 
+if (s->drained) {
+s->wait_drained_end = true;
+while (s->drained) {
+/*
+ * We may be entered once from nbd_client_attach_aio_context_bh
+ * and then from nbd_client_co_drain_end. So here is a loop.
+ */
+qemu_coroutine_yield();
+}
+}
+bdrv_inc_in_flight(s->bs);
+
 out:
 s->connect_status = ret;
 error_free(s->connect_err);
-- 
2.21.0




[PATCH v2 for-5.1? 0/5] Fix nbd reconnect dead-locks

2020-07-27 Thread Vladimir Sementsov-Ogievskiy
Hi all!

v2: it's a bit updated "[PATCH for-5.1? 0/3] Fix nbd reconnect dead-locks"
plus completely rewritten "[PATCH for-5.1? 0/4] non-blocking connect"
(which is now the only one patch 05)

01: new
02: rebased on 01, fix (add outer "if")
03-04: add Eric's r-b:
05: new

If 05 is too big for 5.1, it's OK to take only 01-04 or less, as well as
postponing everything to 5.2, as it's all not a degradation of 5.1
(it's a degradation of 4.2, together with the whole reconnect feature).

Vladimir Sementsov-Ogievskiy (5):
  block/nbd: split nbd_establish_connection out of nbd_client_connect
  block/nbd: allow drain during reconnect attempt
  block/nbd: on shutdown terminate connection attempt
  block/nbd: nbd_co_reconnect_loop(): don't sleep if drained
  block/nbd: use non-blocking connect: fix vm hang on connect()

 block/nbd.c| 360 +
 block/trace-events |   4 +-
 2 files changed, 331 insertions(+), 33 deletions(-)

-- 
2.21.0




Re: [PATCH 3/3] block/nbd: nbd_co_reconnect_loop(): don't sleep if drained

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

23.07.2020 21:55, Eric Blake wrote:

On 7/20/20 4:00 AM, Vladimir Sementsov-Ogievskiy wrote:

We try to go to wakeable sleep, so that, if drain begins it will break
the sleep. But what if nbd_client_co_drain_begin() already called and
s->drained is already true? We'll go to sleep, and drain will have to
wait for the whole timeout. Let's improve it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)



How frequently did you hit this case?  At any rate, the optimization looks 
sane, and I'm happy to include it in 5.1.



I don't remember. Probably once? And, as I said in the cover letter, it wasn't
even the master branch but some experiment of mine.

--
Best regards,
Vladimir



Re: [PATCH v7 44/47] iotests: Add filter commit test cases

2020-07-27 Thread Andrey Shinkevich

On 25.06.2020 18:22, Max Reitz wrote:

This patch adds some tests on how commit copes with filter nodes.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/040 | 177 +
  tests/qemu-iotests/040.out |   4 +-
  2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 32c82b4ec6..e7fa244738 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -713,6 +713,183 @@ class TestErrorHandling(iotests.QMPTestCase):
  self.assertTrue(iotests.compare_images(mid_img, backing_img, 
fmt2='raw'),
  'target image does not match source after commit')
  
+class TestCommitWithFilters(iotests.QMPTestCase):

+img0 = os.path.join(iotests.test_dir, '0.img')
+img1 = os.path.join(iotests.test_dir, '1.img')
+img2 = os.path.join(iotests.test_dir, '2.img')
+img3 = os.path.join(iotests.test_dir, '3.img')
+
+def do_test_io(self, read_or_write):



The method definition could be moved down after those of setUp() and 
tearDown().




+for index, pattern_file in enumerate(self.pattern_files):
+result = qemu_io('-f', iotests.imgfmt,
+ '-c', '{} -P {} {}M 1M'.format(read_or_write,
+index + 1, index),



The Python 3 format string f'{read_or_write} ...' might be used instead of 
the .format one.
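
For reference, a minimal sketch of the suggested f-string form, reusing the
same variable names as the quoted test code (nothing new is assumed beyond
those names):

    read_or_write = 'read'
    index = 0
    # Equivalent of '{} -P {} {}M 1M'.format(read_or_write, index + 1, index)
    cmd = f'{read_or_write} -P {index + 1} {index}M 1M'
    print(cmd)   # prints: read -P 1 0M 1M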


Andrey



+ pattern_file)
+self.assertFalse('Pattern verification failed' in result)
+
+def setUp(self):


...


Reviewed-by: Andrey Shinkevich 





Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 16:21, Dr. David Alan Gilbert wrote:

* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:

If the target is turned off before postcopy has finished, the target crashes
because busy bitmaps are found at shutdown.
Canceling incoming migration helps, as it removes all unfinished (and
therefore busy) bitmaps.

Similarly on source we crash in bdrv_close_all which asserts that all
bdrv states are removed, because bdrv states involved into dirty bitmap
migration are referenced by it. So, we need to cancel outgoing
migration as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Andrey Shinkevich 
---
  migration/migration.h  |  2 ++
  migration/block-dirty-bitmap.c | 16 
  migration/migration.c  | 13 +
  3 files changed, 31 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index ab20c756f5..6c6a931d0d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
*mis,
  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
  
  void dirty_bitmap_mig_before_vm_start(void);

+void dirty_bitmap_mig_cancel_outgoing(void);
+void dirty_bitmap_mig_cancel_incoming(void);
  void migrate_add_address(SocketAddress *address);
  
  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c24d4614bf..a198ec7278 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
  s->bitmaps = NULL;
  }
  
+void dirty_bitmap_mig_cancel_outgoing(void)

+{
+dirty_bitmap_do_save_cleanup(&dbm_state.save);
+}
+
+void dirty_bitmap_mig_cancel_incoming(void)
+{
+DBMLoadState *s = &dbm_state.load;
+
+qemu_mutex_lock(&s->lock);
+
+cancel_incoming_locked(s);
+
+qemu_mutex_unlock(&s->lock);
+}
+
  static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
  {
  GSList *item;
diff --git a/migration/migration.c b/migration/migration.c
index 1c61428988..8fe36339db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -188,6 +188,19 @@ void migration_shutdown(void)
   */
  migrate_fd_cancel(current_migration);
  object_unref(OBJECT(current_migration));
+
+/*
+ * Cancel outgoing migration of dirty bitmaps. It should
+ * at least unref used block nodes.
+ */
+dirty_bitmap_mig_cancel_outgoing();
+
+/*
+ * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
+ * are non-critical data, and their loss never considered as
+ * something serious.
+ */
+dirty_bitmap_mig_cancel_incoming();


Are you sure this is the right place to put them - I'm thinking that
perhaps the object_unref of current_migration should still be after
them?


Hmm, looks strange, I will check.




  }
  
  /* For outgoing */

--
2.21.0


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




--
Best regards,
Vladimir



[PATCH v2 4/6] block: add ability to specify list of blockdevs during snapshot

2020-07-27 Thread Daniel P . Berrangé
When running snapshot operations, there are various rules for which
blockdevs are included/excluded. While this provides reasonable default
behaviour, there are scenarios that are not well handled by the default
logic. Some of the conditions do not have a single correct answer.

Thus there needs to be a way for the mgmt app to provide an explicit
list of blockdevs to perform snapshots across. This can be achieved
by passing a list of node names that should be used.

Signed-off-by: Daniel P. Berrangé 
---
 block/monitor/block-hmp-cmds.c |  4 +--
 block/snapshot.c   | 48 ++
 include/block/snapshot.h   | 13 -
 migration/savevm.c | 16 ++--
 monitor/hmp-cmds.c |  2 +-
 5 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 9df11494d6..db76c43cc2 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -900,7 +900,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 SnapshotEntry *snapshot_entry;
 Error *err = NULL;
 
-bs = bdrv_all_find_vmstate_bs(&err);
+bs = bdrv_all_find_vmstate_bs(NULL, &err);
 if (!bs) {
 error_report_err(err);
 return;
@@ -952,7 +952,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 total = 0;
 for (i = 0; i < nb_sns; i++) {
 SnapshotEntry *next_sn;
-if (bdrv_all_find_snapshot(sn_tab[i].name, NULL) == 0) {
+if (bdrv_all_find_snapshot(sn_tab[i].name, NULL, NULL) == 0) {
 global_snapshots[total] = i;
 total++;
 QTAILQ_FOREACH(image_entry, _list, next) {
diff --git a/block/snapshot.c b/block/snapshot.c
index 6839060622..f2600a8c7f 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -385,22 +385,34 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState 
*bs,
 return ret;
 }
 
-static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
+static bool bdrv_all_snapshots_includes_bs(BlockDriverState *bs,
+   strList *devices)
 {
-if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+if (devices) {
+const char *node_name = bdrv_get_node_name(bs);
+while (devices) {
+if (g_str_equal(node_name, devices->value)) {
+return true;
+}
+devices = devices->next;
+}
 return false;
-}
+} else {
+if (!bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+return false;
+}
 
-/* Include all nodes that are either in use by a BlockBackend, or that
- * aren't attached to any node, but owned by the monitor. */
-return bdrv_has_blk(bs) || QLIST_EMPTY(>parents);
+/* Include all nodes that are either in use by a BlockBackend, or that
+ * aren't attached to any node, but owned by the monitor. */
+return bdrv_has_blk(bs) || QLIST_EMPTY(>parents);
+}
 }
 
 /* Group operations. All block drivers are involved.
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers) */
 
-bool bdrv_all_can_snapshot(Error **errp)
+bool bdrv_all_can_snapshot(strList *devices, Error **errp)
 {
 BlockDriverState *bs;
 BdrvNextIterator it;
@@ -410,7 +422,7 @@ bool bdrv_all_can_snapshot(Error **errp)
 bool ok;
 
 aio_context_acquire(ctx);
-if (bdrv_all_snapshots_includes_bs(bs)) {
+if (bdrv_all_snapshots_includes_bs(bs, devices)) {
 ok = bdrv_can_snapshot(bs);
 }
 aio_context_release(ctx);
@@ -425,7 +437,7 @@ bool bdrv_all_can_snapshot(Error **errp)
 return true;
 }
 
-int bdrv_all_delete_snapshot(const char *name, Error **errp)
+int bdrv_all_delete_snapshot(const char *name, strList *devices, Error **errp)
 {
 BlockDriverState *bs;
 BdrvNextIterator it;
@@ -436,7 +448,7 @@ int bdrv_all_delete_snapshot(const char *name, Error **errp)
 int ret;
 
 aio_context_acquire(ctx);
-if (bdrv_all_snapshots_includes_bs(bs) &&
+if (bdrv_all_snapshots_includes_bs(bs, devices) &&
 bdrv_snapshot_find(bs, snapshot, name) >= 0)
 {
 ret = bdrv_snapshot_delete(bs, snapshot->id_str,
@@ -455,7 +467,7 @@ int bdrv_all_delete_snapshot(const char *name, Error **errp)
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, Error **errp)
+int bdrv_all_goto_snapshot(const char *name, strList *devices, Error **errp)
 {
 BlockDriverState *bs;
 BdrvNextIterator it;
@@ -465,7 +477,7 @@ int bdrv_all_goto_snapshot(const char *name, Error **errp)
 int ret;
 
 aio_context_acquire(ctx);
-if (bdrv_all_snapshots_includes_bs(bs)) {
+if (bdrv_all_snapshots_includes_bs(bs, devices)) {
 ret = bdrv_snapshot_goto(bs, name, errp);
 }
 aio_context_release(ctx);
@@ -480,7 +492,7 @@ int 

Re: [PATCH 1/2] block: nbd: Fix convert qcow2 compressed to nbd

2020-07-27 Thread Nir Soffer
On Mon, Jul 27, 2020 at 5:04 PM Eric Blake  wrote:
>
> On 7/26/20 10:25 AM, Nir Soffer wrote:
> > When converting to qcow2 compressed format, the last step is a special
> > zero length compressed write, ending in call to bdrv_co_truncate(). This
> > call always fail for the nbd driver since it does not implement
>
> fails
>
> > bdrv_co_truncate().
>
> Arguably, qemu-img should be taught to ignore the failure, since it is
> not unique to the nbd driver. But I can live with your approach here.
>
> >
> > For block devices, which have the same limits, the call succeeds since
> > file driver implements bdrv_co_truncate(). If the caller asked to
> > truncate to the same or smaller size with exact=false, the truncate
> > succeeds. Implement the same logic for nbd.
> >
> > Example failing without this change:
> >
>
> >
> > Fixes: https://bugzilla.redhat.com/1860627
> > Signed-off-by: Nir Soffer 
> > ---
> >   block/nbd.c | 27 +++
> >   1 file changed, 27 insertions(+)
> >
> > diff --git a/block/nbd.c b/block/nbd.c
> > index 65a4f56924..2154113af3 100644
> > --- a/block/nbd.c
> > +++ b/block/nbd.c
> > @@ -1966,6 +1966,30 @@ static void nbd_close(BlockDriverState *bs)
> >   nbd_clear_bdrvstate(s);
> >   }
> >
> > +/*
> > + * NBD cannot truncate, but if the caller ask to truncate to the same 
> > size, or
>
> asks
>
> > + * to a smaller size with extact=false, there is not reason to fail the
>
> exact, no
>
> > + * operation.
> > + */
> > +static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t 
> > offset,
> > +bool exact, PreallocMode prealloc,
> > +BdrvRequestFlags flags, Error 
> > **errp)
> > +{
> > +BDRVNBDState *s = bs->opaque;
> > +
> > +if (offset != s->info.size && exact) {
> > +error_setg(errp, "Cannot resize NBD nodes");
> > +return -ENOTSUP;
> > +}
> > +
> > +if (offset > s->info.size) {
> > +error_setg(errp, "Cannot grow NBD nodes");
> > +return -EINVAL;
> > +}
> > +
> > +return 0;
>
> Looks reasonable.  As Max said, I wonder if we want to reject particular
> preallocation modes (looking at block/file-posix.c:raw_co_truncate), in
> the case where the image was resized down and then back up (since
> s->info.size is constant, but the BDS size is not if inexact resize
> succeeds).

Do we want to fail if someone specifies -o preallocation={falloc,full}?

I see we convert BDRV_REQ_MAY_UNMAP to NBD_CMD_FLAG_NO_HOLE,
so using -o preallocation=falloc,full should be correct. But the last
zero-length write request does not do anything, so failing does not look
useful.

> As you have a bugzilla entry, I think this is safe for -rc2; I'll be
> touching up the typos and queuing it through my NBD tree later today.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>




[PATCH v2 6/6] migration: introduce snapshot-{save, load, delete} QMP commands

2020-07-27 Thread Daniel P . Berrangé
savevm, loadvm and delvm are some of the few HMP commands that have never
been converted to use QMP. The primary reason for this lack of conversion
is that they block execution of the thread for as long as they run.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the blocking problem, this is not an
immediate obstacle to real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces new snapshot-{load,save,delete} commands to
QMP that are intended to replace the old HMP counterparts. The new
commands are given different names, because they will be using the new
QEMU job framework and thus will have diverging behaviour from the HMP
originals. It would thus be misleading to keep the same name.

While this design uses the generic job framework, the current impl is
still blocking. The intention is that the blocking problem will be fixed later.
Nonetheless, applications using these new commands should assume that
they are asynchronous and thus wait for the job status change event to
indicate completion.

Signed-off-by: Daniel P. Berrangé 
---
 include/migration/snapshot.h |  10 +-
 migration/savevm.c   | 172 +--
 monitor/hmp-cmds.c   |   4 +-
 qapi/job.json|   9 +-
 qapi/migration.json  | 112 +++
 replay/replay-snapshot.c |   4 +-
 softmmu/vl.c |   2 +-
 tests/qemu-iotests/310   | 125 +
 tests/qemu-iotests/310.out   |   0
 tests/qemu-iotests/group |   1 +
 10 files changed, 421 insertions(+), 18 deletions(-)
 create mode 100755 tests/qemu-iotests/310
 create mode 100644 tests/qemu-iotests/310.out

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index c85b6ec75b..f2ed9d1e43 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -15,7 +15,13 @@
 #ifndef QEMU_MIGRATION_SNAPSHOT_H
 #define QEMU_MIGRATION_SNAPSHOT_H
 
-int save_snapshot(const char *name, Error **errp);
-int load_snapshot(const char *name, Error **errp);
+#include "qapi/qapi-builtin-types.h"
+
+int save_snapshot(const char *name,
+  const char *vmstate, strList *devices,
+  Error **errp);
+int load_snapshot(const char *name,
+  const char *vmstate, strList *devices,
+  Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index 1707fa30db..13c5a54aae 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -43,6 +43,8 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
+#include "qapi/clone-visitor.h"
+#include "qapi/qapi-builtin-visit.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "sysemu/cpus.h"
@@ -2631,7 +2633,8 @@ int qemu_load_device_state(QEMUFile *f)
 return 0;
 }
 
-int save_snapshot(const char *name, Error **errp)
+int save_snapshot(const char *name, const char *vmstate,
+  strList *devices, Error **errp)
 {
 BlockDriverState *bs;
 QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
@@ -2653,18 +2656,18 @@ int save_snapshot(const char *name, Error **errp)
 return ret;
 }
 
-if (!bdrv_all_can_snapshot(NULL, errp)) {
+if (!bdrv_all_can_snapshot(devices, errp)) {
 return ret;
 }
 
 /* Delete old snapshots of the same name */
 if (name) {
-if (bdrv_all_delete_snapshot(name, NULL, errp) < 0) {
+if (bdrv_all_delete_snapshot(name, devices, errp) < 0) {
 return ret;
 }
 }
 
-bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
+bs = bdrv_all_find_vmstate_bs(vmstate, devices, errp);
 if (bs == NULL) {
 return ret;
 }
@@ -2730,7 +2733,7 @@ int save_snapshot(const char *name, Error **errp)
 aio_context_release(aio_context);
 aio_context = NULL;
 
-ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, NULL, errp);
+ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, devices, errp);
 if (ret < 0) {
 goto the_end;
 }
@@ -2831,7 +2834,8 @@ void qmp_xen_load_devices_state(const char *filename, 
Error **errp)
 migration_incoming_state_destroy();
 }
 
-int load_snapshot(const char *name, Error **errp)
+int load_snapshot(const char *name, const char *vmstate,
+  strList *devices, Error **errp)
 {
 BlockDriverState *bs_vm_state;
 QEMUSnapshotInfo sn;
@@ -2846,15 +2850,15 @@ int load_snapshot(const char *name, Error **errp)
 return -1;
 }
 
-if (!bdrv_all_can_snapshot(NULL, errp)) {
+

[PATCH v2 5/6] block: allow specifying name of block device for vmstate storage

2020-07-27 Thread Daniel P . Berrangé
Currently the vmstate will be stored in the first block device that
supports snapshots. Historically this would have usually been the
root device, but with UEFI it might be the variable store. There
needs to be a way to override the choice of block device to store
the state in.

Signed-off-by: Daniel P. Berrangé 
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/snapshot.c   | 64 ++
 include/block/snapshot.h   |  4 ++-
 migration/savevm.c |  4 +--
 4 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index db76c43cc2..81d1b52262 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -900,7 +900,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 SnapshotEntry *snapshot_entry;
 Error *err = NULL;
 
-bs = bdrv_all_find_vmstate_bs(NULL, &err);
+bs = bdrv_all_find_vmstate_bs(NULL, NULL, &err);
 if (!bs) {
 error_report_err(err);
 return;
diff --git a/block/snapshot.c b/block/snapshot.c
index f2600a8c7f..b1ad70e278 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -551,27 +551,63 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
 return 0;
 }
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp)
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+   strList *devices,
+   Error **errp)
 {
 BlockDriverState *bs;
 BdrvNextIterator it;
 
-for (bs = bdrv_first(); bs; bs = bdrv_next()) {
-AioContext *ctx = bdrv_get_aio_context(bs);
-bool found;
+if (vmstate_bs) {
+bool usable = false;
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
+AioContext *ctx = bdrv_get_aio_context(bs);
+bool match = false;
 
-aio_context_acquire(ctx);
-found = bdrv_all_snapshots_includes_bs(bs, devices) &&
-bdrv_can_snapshot(bs);
-aio_context_release(ctx);
+aio_context_acquire(ctx);
+if (g_str_equal(vmstate_bs, bdrv_get_node_name(bs))) {
+match = true;
+usable = bdrv_can_snapshot(bs);
+}
+aio_context_release(ctx);
+if (match) {
+bdrv_next_cleanup();
+break;
+}
+}
+if (!bs) {
+error_setg(errp,
+   "block device '%s' does not exist",
+   vmstate_bs);
+return NULL;
+}
 
-if (found) {
-bdrv_next_cleanup();
-break;
+if (!usable) {
+error_setg(errp,
+   "block device '%s' does not support snapshots",
+   vmstate_bs);
+return NULL;
+}
+} else {
+for (bs = bdrv_first(); bs; bs = bdrv_next()) {
+AioContext *ctx = bdrv_get_aio_context(bs);
+bool found;
+
+aio_context_acquire(ctx);
+found = bdrv_all_snapshots_includes_bs(bs, devices) &&
+bdrv_can_snapshot(bs);
+aio_context_release(ctx);
+
+if (found) {
+bdrv_next_cleanup();
+break;
+}
+}
+
+if (!bs) {
+error_setg(errp, "No block device supports snapshots");
+return NULL;
 }
-}
-if (!bs) {
-error_setg(errp, "No block device supports snapshots");
 }
 return bs;
 }
diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 1c5b0705a9..05550e5da1 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -86,6 +86,8 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
  strList *devices,
  Error **errp);
 
-BlockDriverState *bdrv_all_find_vmstate_bs(strList *devices, Error **errp);
+BlockDriverState *bdrv_all_find_vmstate_bs(const char *vmstate_bs,
+   strList *devices,
+   Error **errp);
 
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index cdc1f2f2d8..1707fa30db 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2664,7 +2664,7 @@ int save_snapshot(const char *name, Error **errp)
 }
 }
 
-bs = bdrv_all_find_vmstate_bs(NULL, errp);
+bs = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
 if (bs == NULL) {
 return ret;
 }
@@ -2854,7 +2854,7 @@ int load_snapshot(const char *name, Error **errp)
 return -1;
 }
 
-bs_vm_state = bdrv_all_find_vmstate_bs(NULL, errp);
+bs_vm_state = bdrv_all_find_vmstate_bs(NULL, NULL, errp);
 if (!bs_vm_state) {
 return -1;
 }
-- 
2.26.2




[PATCH v2 3/6] migration: stop returning errno from load_snapshot()

2020-07-27 Thread Daniel P . Berrangé
None of the callers care about the errno value since there is a full
Error object populated. This gives consistency with save_snapshot()
which already just returns -1.

Signed-off-by: Daniel P. Berrangé 
---
 migration/savevm.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 19259ef7c0..6c4d80fc5a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2843,20 +2843,20 @@ int load_snapshot(const char *name, Error **errp)
 if (!replay_can_snapshot()) {
 error_setg(errp, "Record/replay does not allow loading snapshot "
"right now. Try once more later.");
-return -EINVAL;
+return -1;
 }
 
 if (!bdrv_all_can_snapshot(errp)) {
-return -ENOTSUP;
+return -1;
 }
 ret = bdrv_all_find_snapshot(name, errp);
 if (ret < 0) {
-return ret;
+return -1;
 }
 
 bs_vm_state = bdrv_all_find_vmstate_bs(errp);
 if (!bs_vm_state) {
-return -ENOTSUP;
+return -1;
 }
 aio_context = bdrv_get_aio_context(bs_vm_state);
 
@@ -2865,11 +2865,11 @@ int load_snapshot(const char *name, Error **errp)
 ret = bdrv_snapshot_find(bs_vm_state, &sn, name);
 aio_context_release(aio_context);
 if (ret < 0) {
-return ret;
+return -1;
 } else if (sn.vm_state_size == 0) {
 error_setg(errp, "This is a disk-only snapshot. Revert to it "
" offline using qemu-img");
-return -EINVAL;
+return -1;
 }
 
 /* Flush all IO requests so they don't interfere with the new state.  */
@@ -2884,7 +2884,6 @@ int load_snapshot(const char *name, Error **errp)
 f = qemu_fopen_bdrv(bs_vm_state, 0);
 if (!f) {
 error_setg(errp, "Could not open VM state file");
-ret = -EINVAL;
 goto err_drain;
 }
 
@@ -2900,14 +2899,14 @@ int load_snapshot(const char *name, Error **errp)
 
 if (ret < 0) {
 error_setg(errp, "Error %d while loading VM state", ret);
-return ret;
+return -1;
 }
 
 return 0;
 
 err_drain:
 bdrv_drain_all_end();
-return ret;
+return -1;
 }
 
 void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
-- 
2.26.2




[PATCH v2 2/6] block: push error reporting into bdrv_all_*_snapshot functions

2020-07-27 Thread Daniel P . Berrangé
The bdrv_all_*_snapshot functions return a BlockDriverState pointer
for the invalid backend, which the callers then use to report an
error message. In some cases multiple callers are reporting the
same error message, but with slightly different text. In the future
there will be more error scenarios for some of these methods, which
will benefit from fine grained error message reporting. So it is
helpful to push error reporting down a level.

Signed-off-by: Daniel P. Berrangé 
---
 block/monitor/block-hmp-cmds.c |  7 ++--
 block/snapshot.c   | 77 +-
 include/block/snapshot.h   | 14 +++
 migration/savevm.c | 37 +---
 monitor/hmp-cmds.c |  7 +---
 tests/qemu-iotests/267.out | 10 ++---
 6 files changed, 65 insertions(+), 87 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4c8c375172..9df11494d6 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -898,10 +898,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 
 ImageEntry *image_entry, *next_ie;
 SnapshotEntry *snapshot_entry;
+Error *err = NULL;
 
-bs = bdrv_all_find_vmstate_bs();
+bs = bdrv_all_find_vmstate_bs(&err);
 if (!bs) {
-monitor_printf(mon, "No available block device supports snapshots\n");
+error_report_err(err);
 return;
 }
 aio_context = bdrv_get_aio_context(bs);
@@ -951,7 +952,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
 total = 0;
 for (i = 0; i < nb_sns; i++) {
 SnapshotEntry *next_sn;
-if (bdrv_all_find_snapshot(sn_tab[i].name, &bs1) == 0) {
+if (bdrv_all_find_snapshot(sn_tab[i].name, NULL) == 0) {
 global_snapshots[total] = i;
 total++;
 QTAILQ_FOREACH(image_entry, _list, next) {
diff --git a/block/snapshot.c b/block/snapshot.c
index bd9fb01817..6839060622 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -400,14 +400,14 @@ static bool 
bdrv_all_snapshots_includes_bs(BlockDriverState *bs)
  * These functions will properly handle dataplane (take aio_context_acquire
  * when appropriate for appropriate block drivers) */
 
-bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
+bool bdrv_all_can_snapshot(Error **errp)
 {
-bool ok = true;
 BlockDriverState *bs;
 BdrvNextIterator it;
 
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 AioContext *ctx = bdrv_get_aio_context(bs);
+bool ok;
 
 aio_context_acquire(ctx);
 if (bdrv_all_snapshots_includes_bs(bs)) {
@@ -415,26 +415,25 @@ bool bdrv_all_can_snapshot(BlockDriverState 
**first_bad_bs)
 }
 aio_context_release(ctx);
 if (!ok) {
+error_setg(errp, "Device '%s' is writable but does not support "
+   "snapshots", bdrv_get_device_or_node_name(bs));
 bdrv_next_cleanup();
-goto fail;
+return false;
 }
 }
 
-fail:
-*first_bad_bs = bs;
-return ok;
+return true;
 }
 
-int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
- Error **errp)
+int bdrv_all_delete_snapshot(const char *name, Error **errp)
 {
-int ret = 0;
 BlockDriverState *bs;
 BdrvNextIterator it;
 QEMUSnapshotInfo sn1, *snapshot = &sn1;
 
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 AioContext *ctx = bdrv_get_aio_context(bs);
+int ret;
 
 aio_context_acquire(ctx);
 if (bdrv_all_snapshots_includes_bs(bs) &&
@@ -445,26 +444,25 @@ int bdrv_all_delete_snapshot(const char *name, 
BlockDriverState **first_bad_bs,
 }
 aio_context_release(ctx);
 if (ret < 0) {
+error_prepend(errp, "Could not delete snapshot '%s' on '%s': ",
+  name, bdrv_get_device_or_node_name(bs));
 bdrv_next_cleanup();
-goto fail;
+return -1;
 }
 }
 
-fail:
-*first_bad_bs = bs;
-return ret;
+return 0;
 }
 
 
-int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs,
-   Error **errp)
+int bdrv_all_goto_snapshot(const char *name, Error **errp)
 {
-int ret = 0;
 BlockDriverState *bs;
 BdrvNextIterator it;
 
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
 AioContext *ctx = bdrv_get_aio_context(bs);
+int ret;
 
 aio_context_acquire(ctx);
 if (bdrv_all_snapshots_includes_bs(bs)) {
@@ -472,75 +470,75 @@ int bdrv_all_goto_snapshot(const char *name, 
BlockDriverState **first_bad_bs,
 }
 aio_context_release(ctx);
 if (ret < 0) {
+error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
+  name, bdrv_get_device_or_node_name(bs));
 bdrv_next_cleanup();
-goto fail;
+return -1;
 }
 

[PATCH v2 1/6] migration: improve error reporting of block driver state name

2020-07-27 Thread Daniel P . Berrangé
With blockdev, a BlockDriverState may not have a device name,
so using a node name is required as an alternative.

Signed-off-by: Daniel P. Berrangé 
---
 migration/savevm.c | 12 ++--
 tests/qemu-iotests/267.out |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 45c9dd9d8a..cffee6cab7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2655,7 +2655,7 @@ int save_snapshot(const char *name, Error **errp)
 
 if (!bdrv_all_can_snapshot()) {
 error_setg(errp, "Device '%s' is writable but does not support "
-   "snapshots", bdrv_get_device_name(bs));
+   "snapshots", bdrv_get_device_or_node_name(bs));
 return ret;
 }
 
@@ -2664,7 +2664,7 @@ int save_snapshot(const char *name, Error **errp)
 ret = bdrv_all_delete_snapshot(name, &bs1, errp);
 if (ret < 0) {
 error_prepend(errp, "Error while deleting snapshot on device "
-  "'%s': ", bdrv_get_device_name(bs1));
+  "'%s': ", bdrv_get_device_or_node_name(bs1));
 return ret;
 }
 }
@@ -2739,7 +2739,7 @@ int save_snapshot(const char *name, Error **errp)
 ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
 if (ret < 0) {
 error_setg(errp, "Error while creating snapshot on '%s'",
-   bdrv_get_device_name(bs));
+   bdrv_get_device_or_node_name(bs));
 goto the_end;
 }
 
@@ -2857,14 +2857,14 @@ int load_snapshot(const char *name, Error **errp)
 if (!bdrv_all_can_snapshot()) {
 error_setg(errp,
"Device '%s' is writable but does not support snapshots",
-   bdrv_get_device_name(bs));
+   bdrv_get_device_or_node_name(bs));
 return -ENOTSUP;
 }
 ret = bdrv_all_find_snapshot(name, &bs);
 if (ret < 0) {
 error_setg(errp,
"Device '%s' does not have the requested snapshot '%s'",
-   bdrv_get_device_name(bs), name);
+   bdrv_get_device_or_node_name(bs), name);
 return ret;
 }
 
@@ -2893,7 +2893,7 @@ int load_snapshot(const char *name, Error **errp)
 ret = bdrv_all_goto_snapshot(name, &bs, errp);
 if (ret < 0) {
 error_prepend(errp, "Could not load snapshot '%s' on '%s': ",
-  name, bdrv_get_device_name(bs));
+  name, bdrv_get_device_or_node_name(bs));
 goto err_drain;
 }
 
diff --git a/tests/qemu-iotests/267.out b/tests/qemu-iotests/267.out
index d6d80c099f..215902b3ad 100644
--- a/tests/qemu-iotests/267.out
+++ b/tests/qemu-iotests/267.out
@@ -81,11 +81,11 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 Testing: -blockdev driver=file,filename=TEST_DIR/t.IMGFMT,node-name=file
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) savevm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) info snapshots
 No available block device supports snapshots
 (qemu) loadvm snap0
-Error: Device '' is writable but does not support snapshots
+Error: Device 'file' is writable but does not support snapshots
 (qemu) quit
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
-- 
2.26.2




[PATCH v2 (BROKEN) 0/6] migration: bring improved savevm/loadvm/delvm to QMP

2020-07-27 Thread Daniel P . Berrangé
A followup to:

 v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html

When QMP was first introduced some 10+ years ago now, the snapshot
related commands (savevm/loadvm/delvm) were not converted. This was
primarily because their implementation causes blocking of the thread
running the monitor commands. This was (and still is) considered
undesirable behaviour both in HMP and QMP.

In theory someone was supposed to fix this flaw at some point in the
past 10 years and bring them into the QMP world. Sadly, thus far it
hasn't happened as people always had more important things to work
on. Enterprise apps were much more interested in external snapshots
than internal snapshots as they have many more features.

Meanwhile users still want to use internal snapshots as there is
a certainly simplicity in having everything self-contained in one
image, even though it has limitations. Thus the apps that end up
executing the savevm/loadvm/delvm via the "human-monitor-command"
QMP command.

IOW, the problematic blocking behaviour that was one of the reasons
for not having savevm/loadvm/delvm in QMP is experienced by applications
regardless. By not porting the commands to QMP due to one design flaw,
we've forced apps and users to suffer from other design flaws of HMP (
bad error reporting, strong type checking of args, no introspection) for
an additional 10 years. This feels rather sub-optimal :-(

In practice users don't appear to care strongly about the fact that these
commands block the VM while they run. I might have seen one bug report
about it, but it certainly isn't something that comes up as a frequent
topic except among us QEMU maintainers. Users do care about having
access to the snapshot feature.

Where I am seeing frequent complaints is wrt the use of OVMF combined
with snapshots which has some serious pain points. This is getting worse
as the push to ditch legacy BIOS in favour of UEFI gain momentum both
across OS vendors and mgmt apps. Solving it requires new parameters to
the commands, but doing this in HMP is super unappealing.

After 10 years, I think it is time for us to be a little pragmatic about
our handling of snapshot commands. My desire is that libvirt should never
use "human-monitor-command" under any circumstances, because of the
inherent flaws in HMP as a protocol for machine consumption.

Thus in this series I'm proposing a fairly direct mapping of the existing
HMP commands for savevm/loadvm/delvm into QMP as a first step. This does
not solve the blocking thread problem, but it does put in a place a
design using the jobs framework which can facilitate solving it later.
It does also solve the error reporting, type checking and introspection
problems inherent to HMP. So we're winning on 3 out of the 4 problems,
and have pushed apps to a QMP design that will let us solve the last
remaining problem.

With a QMP variant, we reasonably deal with the problems related to OVMF:

 - The logic to pick which disk to store the vmstate in is not
   satisfactory.

   The first block driver state cannot be assumed to be the root disk
   image, it might be OVMF varstore and we don't want to store vmstate
   in there.

 - The logic to decide which disks must be snapshotted is hardwired
   to all disks which are writable

   Again with OVMF there might be a writable varstore, but this can be
   raw rather than qcow2 format, and thus unable to be snapshotted.
   While users might wish to snapshot their varstore, in some/many/most
   cases it is entirely unnecessary. Users are blocked from snapshotting
   their VM though due to this varstore.

These are solved by adding two parameters to the commands. The first is
a block device node name that identifies the image to store vmstate in,
and the second is a list of node names to include for the snapshots.
If the list of nodes isn't given, it falls back to the historical
behaviour of using all disks matching some undocumented criteria.
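
To make the intended usage concrete, here is a rough iotests-style sketch of
driving the new command with these two parameters. The QMP argument names
('job-id', 'tag', 'vmstate', 'devices') and the node/file names are
assumptions based on the C API in patch 6/6, not a confirmed QAPI schema:

    # Hedged sketch only: argument and node names below are assumptions.
    import iotests

    vm = iotests.VM()
    # Assumption: the guest gets a qcow2 node named 'diskA' via -blockdev,
    # backed by a pre-existing image file 'disk.qcow2'.
    vm.add_blockdev('driver=qcow2,node-name=diskA,'
                    'file.driver=file,file.filename=disk.qcow2')
    vm.launch()

    # Store the vmstate in 'diskA' and snapshot only that node.
    vm.qmp('snapshot-save', job_id='snapsave0', tag='snap0',
           vmstate='diskA', devices=['diskA'])

    # The command is modelled as a job, so treat it as asynchronous and
    # wait for the job to conclude rather than trusting the return value.
    while True:
        event = vm.event_wait('JOB_STATUS_CHANGE')
        if event['data']['status'] == 'concluded':
            break
    vm.qmp('job-dismiss', id='snapsave0')

    vm.shutdown()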

In the block code I've only dealt with node names for block devices, as
IIUC, this is all that libvirt should need in the -blockdev world it now
lives in. IOW, I've made no attempt to cope with people wanting to use
these QMP commands in combination with -drive args.

I've done some minimal work in libvirt to start to make use of the new
commands to validate their functionality, but this isn't finished yet.

My ultimate goal is to make the GNOME Boxes maintainer happy again by
having internal snapshots work with OVMF:

  https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5ef45c5f64048f16a6e

HELP NEEDED:  this series starts to implement the approach that Kevin
suggested wrto use of generic jobs.

When I try to actually run the code though it crashes:

ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: (qemu_mutex_iothread_locked())
Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_iothread: assertion failed: 

Re: [PATCH 1/2] block: nbd: Fix convert qcow2 compressed to nbd

2020-07-27 Thread Nir Soffer
On Mon, Jul 27, 2020 at 5:04 PM Eric Blake  wrote:
>
> On 7/26/20 10:25 AM, Nir Soffer wrote:
> > When converting to qcow2 compressed format, the last step is a special
> > zero length compressed write, ending in call to bdrv_co_truncate(). This
> > call always fail for the nbd driver since it does not implement
>
> fails
>
> > bdrv_co_truncate().
>
> Arguably, qemu-img should be taught to ignore the failure, since it is
> not unique to the nbd driver. But I can live with your approach here.

I started with ignoring ENOTSUP in qcow2, but felt less safe about this
approach since the same issue may happen in other flows, and making nbd
driver behave like a block device looks like a safer change.

> > For block devices, which have the same limits, the call succeeds since
> > file driver implements bdrv_co_truncate(). If the caller asked to
> > truncate to the same or smaller size with exact=false, the truncate
> > succeeds. Implement the same logic for nbd.
> >
> > Example failing without this change:
> >
>
> >
> > Fixes: https://bugzilla.redhat.com/1860627
> > Signed-off-by: Nir Soffer 
> > ---
> >   block/nbd.c | 27 +++
> >   1 file changed, 27 insertions(+)
> >
> > diff --git a/block/nbd.c b/block/nbd.c
> > index 65a4f56924..2154113af3 100644
> > --- a/block/nbd.c
> > +++ b/block/nbd.c
> > @@ -1966,6 +1966,30 @@ static void nbd_close(BlockDriverState *bs)
> >   nbd_clear_bdrvstate(s);
> >   }
> >
> > +/*
> > + * NBD cannot truncate, but if the caller ask to truncate to the same 
> > size, or
>
> asks
>
> > + * to a smaller size with extact=false, there is not reason to fail the
>
> exact, no
>
> > + * operation.
> > + */
> > +static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t 
> > offset,
> > +bool exact, PreallocMode prealloc,
> > +BdrvRequestFlags flags, Error 
> > **errp)
> > +{
> > +BDRVNBDState *s = bs->opaque;
> > +
> > +if (offset != s->info.size && exact) {
> > +error_setg(errp, "Cannot resize NBD nodes");
> > +return -ENOTSUP;
> > +}
> > +
> > +if (offset > s->info.size) {
> > +error_setg(errp, "Cannot grow NBD nodes");
> > +return -EINVAL;
> > +}
> > +
> > +return 0;
>
> Looks reasonable.  As Max said, I wonder if we want to reject particular
> preallocation modes (looking at block/file-posix.c:raw_co_truncate), in
> the case where the image was resized down and then back up (since
> s->info.size is constant, but the BDS size is not if inexact resize
> succeeds).
>
> As you have a bugzilla entry, I think this is safe for -rc2; I'll be
> touching up the typos and queuing it through my NBD tree later today.

I'll post v2 with the test fixes later today.

>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>




Re: [PATCH 2/2] qemu-iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Nir Soffer
On Mon, Jul 27, 2020 at 1:05 PM Max Reitz  wrote:
>
> On 26.07.20 17:25, Nir Soffer wrote:
> > Add test for "qemu-img convert -O qcow2 -c" to NBD target. The use case
> > is writing compressed disk content to OVA archive.
> >
> > Signed-off-by: Nir Soffer 
> > ---
> >  tests/qemu-iotests/302 | 83 ++
> >  tests/qemu-iotests/302.out | 27 +
> >  tests/qemu-iotests/group   |  1 +
> >  3 files changed, 111 insertions(+)
> >  create mode 100755 tests/qemu-iotests/302
> >  create mode 100644 tests/qemu-iotests/302.out
> >
> > diff --git a/tests/qemu-iotests/302 b/tests/qemu-iotests/302
> > new file mode 100755
> > index 00..cefde1f7cf
> > --- /dev/null
> > +++ b/tests/qemu-iotests/302
> > @@ -0,0 +1,83 @@
> > +#!/usr/bin/env python3
> > +#
> > +# Tests conveting qcow2 compressed to NBD
>
> *converting
>
> > +#
> > +# Copyright (c) 2020 Nir Soffer 
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 2 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see .
> > +#
> > +# owner=nir...@gmail.com
> > +
> > +import json
> > +import iotests
> > +
> > +from iotests import (
> > +file_path,
> > +qemu_img,
> > +qemu_img_create,
> > +qemu_img_log,
> > +qemu_img_pipe,
> > +qemu_io,
> > +qemu_nbd,
> > +)
> > +
> > +iotests.script_initialize(supported_fmts=["qcow2"])
> > +
> > +# Create source disk, format does not matter.
> > +src_disk = file_path("disk.img")
> > +qemu_img_create("-f", "raw", src_disk, "10m")
>
> If the format doesn’t matter, why not just use qcow2 and so put
> iotests.imgfmt here?  (And everywhere else where you now have -f raw.)

I tried to use the simplest setup that is less likely to break, but
thinking about
CI environments with strange storage, maybe using qcow2 source disk will be
more robust.

> > +qemu_io("-f", "raw", "-c", "write 1m 64K", src_disk)
>
> (Except I think qemu_io already has -f qcow2 in its arguments by
> default, so specifying the format wouldn’t even be necessary here.)
>
> > +# The use case is writing qcow2 image directly into a tar file. Code to 
> > create
> > +# real tar file not included.
> > +#
> > +# offsetcontent
> > +# ---
> > +#  0first memebr header
>
> *member
>
> > +#512first member data
> > +#   1024second memeber header
>
> *member
>
> > +#   1536second member data
> > +
> > +tar_file = file_path("test.tar")
> > +out = qemu_img_pipe("measure", "-O", "qcow2", "--output", "json", src_disk)
> > +measure = json.loads(out)
> > +qemu_img_create("-f", "raw", tar_file, str(measure["required"]))
>
> Should this be measure["required"] + 1536?
>
> > +
> > +nbd_sock = file_path("nbd-sock", base_dir=iotests.sock_dir)
> > +nbd_uri = "nbd+unix:///exp?socket=" + nbd_sock
> > +
> > +# Use raw format to allow creating qcow2 directy into tar file.
> > +qemu_nbd(
> > +"--socket", nbd_sock,
> > +"--persistent",
> > +"--export-name", "exp",
> > +"--format", "raw",
> > +"--offset", "1536",
> > +tar_file)
> > +
> > +iotests.log("=== Target image info ===")
> > +qemu_img_log("info", nbd_uri)
> > +
> > +# Write image into the tar file. In a real applicatio we would write a tar
>
> *application
>
> > +# entry after writing the image.
> > +qemu_img("convert", "-f", "raw", "-O", "qcow2", "-c", src_disk, nbd_uri)
> > +
> > +iotests.log("=== Converted image info ===")
> > +qemu_img_log("info", nbd_uri)
> > +
> > +iotests.log("=== Converted image check ===")
> > +qemu_img_log("check", nbd_uri)
> > +
> > +iotests.log("=== Comparing to source disk ===")
> > +qemu_img_log("compare", src_disk, nbd_uri)
> > diff --git a/tests/qemu-iotests/302.out b/tests/qemu-iotests/302.out
> > new file mode 100644
> > index 00..babef3d574
> > --- /dev/null
> > +++ b/tests/qemu-iotests/302.out
> > @@ -0,0 +1,27 @@
> > +=== Target image info ===
> > +image: nbd+unix:///exp?socket=SOCK_DIR/PID-nbd-sock
> > +file format: raw
> > +virtual size: 446 KiB (457216 bytes)
> > +disk size: unavailable
> > +
> > +=== Converted image info ===
> > +image: nbd+unix:///exp?socket=SOCK_DIR/PID-nbd-sock
> > +file format: qcow2
> > +virtual size: 10 MiB (10485760 bytes)
> > +disk size: unavailable
> > +cluster_size: 65536
> > +Format specific information:
> > +compat: 1.1
> > +compression type: zlib
> > +lazy refcounts: false
> > +refcount bits: 16
> > +corrupt: false

Re: [PATCH 2/2] qemu-iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Nir Soffer
On Mon, Jul 27, 2020 at 5:41 PM Eric Blake  wrote:
>
> On 7/27/20 9:35 AM, Nir Soffer wrote:
>
> >> I guess it's okay that you don't create a real tar file here, but
> >> listing the commands to create it (even as a comment) is better than
> >> just saying "trust me".  And it doesn't seem like that much more work -
> >> it looks like the key to your test is that you created a tar file
> >> containing two files, where the first file was less than 512 bytes and
> >> the second file is your target destination that you will be rewriting.
> >
> > The real code is more complicated, something like:
> >
> >  offset = tar.fileobj.tell() + BLOCK_SIZE
> >
> >  with open(tar.name, "r+") as f:
> >  f.truncate(offset + measure["required"])
> >
> >  convert_image(image, tar.name, offset)
> >
> >  check = check_image(tar.name, offset)
> >  size = check["image-end-offset"]
> >
> >  member = tarfile.TarInfo(name)
> >  member.size = size
> >  tar.addfile(member)
> >
> >  tar_size = offset + round_up(size)
> >
> >  tar.fileobj.seek(tar_size)
> >  with open(tar.name, "r+") as f:
> >  f.truncate(tar_size)
> >
> > I'm not sure it helps qemu developers working on these tests.
>
> The closer the iotest is to reality, the more likely it will serve as a
> good regression test.  Cutting corners risks a test that passes in
> isolation even when we've done something that breaks the overall process
> in one of the corners you cut.

I'll add this code then.
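
A rough sketch of that tar-creation code (hypothetical, not from any
posted patch; it relies only on Python's tarfile module and the 512-byte
block layout shown in the test comment, tar_file is the test's tar path
and the member names are made up):

    import io
    import tarfile

    with tarfile.open(tar_file, "w") as tar:
        # First member: small metadata file.  Its header occupies
        # bytes 0-511, its data is padded to one block (512-1023).
        meta = b"placeholder metadata\n"
        info = tarfile.TarInfo("vm.ovf")
        info.size = len(meta)
        tar.addfile(info, io.BytesIO(meta))

        # Second member: its header lands at offset 1024, so its data
        # area begins at offset 1536.  The zero size is a placeholder;
        # the real code rewrites the header once qemu-img convert has
        # produced the image at that offset.
        disk = tarfile.TarInfo("disk.qcow2")
        disk.size = 0
        tar.addfile(disk)

qemu-nbd is then pointed at --offset 1536 inside the tar file, exactly as
the quoted test does.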

> >>
> >> At any rate, given the urgency of getting pull requests for -rc2 in
> >> before slamming Peter tomorrow, I'll probably try to touch up the issues
> >> Max pointed out and queue it today.
> >
> > Thanks Max and Eric.
> >
> > Should I post a fixed version later today?
>
> A v2 would be helpful.

Will post later today.




Re: [PATCH 2/2] qemu-iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Eric Blake

On 7/27/20 9:35 AM, Nir Soffer wrote:


I guess it's okay that you don't create a real tar file here, but
listing the commands to create it (even as a comment) is better than
just saying "trust me".  And it doesn't seem like that much more work -
it looks like the key to your test is that you created a tar file
containing two files, where the first file was less than 512 bytes and
the second file is your target destination that you will be rewriting.


The real code is more complicated, something like:

 offset = tar.fileobj.tell() + BLOCK_SIZE

 with open(tar.name, "r+") as f:
 f.truncate(offset + measure["required"])

 convert_image(image, tar.name, offset)

 check = check_image(tar.name, offset)
 size = check["image-end-offset"]

 member = tarfile.TarInfo(name)
 member.size = size
 tar.addfile(member)

 tar_size = offset + round_up(size)

 tar.fileobj.seek(tar_size)
 with open(tar.name, "r+") as f:
 f.truncate(tar_size)

I'm not sure it helps qemu developers working on these tests.


The closer the iotest is to reality, the more likely it will serve as a 
good regression test.  Cutting corners risks a test that passes in 
isolation even when we've done something that breaks the overall process 
in one of the corners you cut.





At any rate, given the urgency of getting pull requests for -rc2 in
before slamming Peter tomorrow, I'll probably try to touch up the issues
Max pointed out and queue it today.


Thanks Max and Eric.

Should I post a fixed version later today?


A v2 would be helpful.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PULL 3/3] iotests/197: Fix for compat=0.10

2020-07-27 Thread Max Reitz
Writing zeroes to a qcow2 v2 image without a backing file results in an
unallocated cluster as of 61b3043965.  197 has a test for COR-ing a
cluster on an image without a backing file, which means that the data
will be zero, so now on a v2 image that cluster will just stay
unallocated, and so the test fails.  Just force compat=1.1 for that
particular case to enforce the cluster to get allocated.

Fixes: 61b3043965fe3552ee2684a97e7cc809ca7a71b3
Signed-off-by: Max Reitz 
Message-Id: <20200727135237.1096841-1-mre...@redhat.com>
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/197 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index 95f05b0e34..121959a09c 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -112,7 +112,9 @@ echo
 echo '=== Partial final cluster ==='
 echo
 
-_make_test_img 1024
+# Force compat=1.1, because writing zeroes on a v2 image without a
+# backing file would just result in an unallocated cluster
+_make_test_img -o compat=1.1 1024
 $QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IO -f $IMGFMT -c map "$TEST_IMG"
 _check_test_img
-- 
2.26.2




[PULL 1/3] block/amend: Check whether the node exists

2020-07-27 Thread Max Reitz
We should check whether the user-specified node-name actually refers to
a node.  The simplest way to do that is to use bdrv_lookup_bs() instead
of bdrv_find_node() (the former wraps the latter, and produces an error
message if necessary).

Reported-by: Coverity (CID 1430268)
Fixes: ced914d0ab9fb2c900f873f6349a0b8eecd1fdbe
Signed-off-by: Max Reitz 
Message-Id: <20200710095037.10885-1-mre...@redhat.com>
Reviewed-by: Maxim Levitsky 
---
 block/amend.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/amend.c b/block/amend.c
index f4612dcf08..392df9ef83 100644
--- a/block/amend.c
+++ b/block/amend.c
@@ -69,8 +69,12 @@ void qmp_x_blockdev_amend(const char *job_id,
 BlockdevAmendJob *s;
 const char *fmt = BlockdevDriver_str(options->driver);
 BlockDriver *drv = bdrv_find_format(fmt);
-BlockDriverState *bs = bdrv_find_node(node_name);
+BlockDriverState *bs;
 
+bs = bdrv_lookup_bs(NULL, node_name, errp);
+if (!bs) {
+return;
+}
 
 if (!drv) {
 error_setg(errp, "Block driver '%s' not found or not supported", fmt);
-- 
2.26.2




[PULL 2/3] iotests: Select a default machine for the rx and avr targets

2020-07-27 Thread Max Reitz
From: Thomas Huth 

If you are building only with either the new rx-softmmu or avr-softmmu
target, "make check-block" fails a couple of tests since there is no
default machine defined in these new targets. We have to select a machine
in the "check" script for these, just like we already do for the arm- and
tricore-softmmu targets.

Signed-off-by: Thomas Huth 
Message-Id: <20200722161908.25383-1-th...@redhat.com>
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/check | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index e0d8049012..0657f7286c 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -595,15 +595,19 @@ then
 fi
 export QEMU_PROG="$(type -p "$QEMU_PROG")"
 
+export QEMU_OPTIONS="-nodefaults -display none -accel qtest"
 case "$QEMU_PROG" in
 *qemu-system-arm|*qemu-system-aarch64)
-export QEMU_OPTIONS="-nodefaults -display none -machine virt -accel 
qtest"
+export QEMU_OPTIONS="$QEMU_OPTIONS -machine virt"
 ;;
-*qemu-system-tricore)
-export QEMU_OPTIONS="-nodefaults -display none -machine 
tricore_testboard -accel qtest"
+*qemu-system-avr)
+export QEMU_OPTIONS="$QEMU_OPTIONS -machine mega2560"
+;;
+*qemu-system-rx)
+export QEMU_OPTIONS="$QEMU_OPTIONS -machine gdbsim-r5f562n8"
 ;;
-*)
-export QEMU_OPTIONS="-nodefaults -display none -accel qtest"
+*qemu-system-tricore)
+export QEMU_OPTIONS="-$QEMU_OPTIONS -machine tricore_testboard"
 ;;
 esac
 
-- 
2.26.2




[PULL 0/3] Block patches for 5.1

2020-07-27 Thread Max Reitz
The following changes since commit 4215d3413272ad6d1c6c9d0234450b602e46a74c:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-5.1-20200727' into staging (2020-07-27 09:33:04 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2020-07-27

for you to fetch changes up to 1855536256eb0a5708b04b85f744de69559ea323:

  iotests/197: Fix for compat=0.10 (2020-07-27 16:35:17 +0200)


Block patches for 5.1:
- Coverity fix
- iotests fix for rx and avr
- iotests fix for qcow2 -o compat=0.10


Max Reitz (2):
  block/amend: Check whether the node exists
  iotests/197: Fix for compat=0.10

Thomas Huth (1):
  iotests: Select a default machine for the rx and avr targets

 block/amend.c|  6 +-
 tests/qemu-iotests/197   |  4 +++-
 tests/qemu-iotests/check | 14 +-
 3 files changed, 17 insertions(+), 7 deletions(-)

-- 
2.26.2




Re: [PATCH 2/2] qemu-iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Nir Soffer
On Mon, Jul 27, 2020 at 5:14 PM Eric Blake  wrote:
>
> On 7/27/20 5:04 AM, Max Reitz wrote:
> > On 26.07.20 17:25, Nir Soffer wrote:
> >> Add test for "qemu-img convert -O qcow2 -c" to NBD target. The use case
> >> is writing compressed disk content to OVA archive.
> >>
> >> Signed-off-by: Nir Soffer 
> >> ---
>
> >
> >> +# The use case is writing qcow2 image directly into a tar file. Code to create
> >> +# real tar file not included.
> >> +#
> >> +# offsetcontent
> >> +# ---
> >> +#  0first memebr header
> >
> > *member

Sorry for the typos, I need to set up an automated spell check :-)

> >
> >> +#512first member data
> >> +#   1024second memeber header
> >
> > *member
> >
> >> +#   1536second member data
> >> +
> >> +tar_file = file_path("test.tar")
>
> I guess it's okay that you don't create a real tar file here, but
> listing the commands to create it (even as a comment) is better than
> just saying "trust me".  And it doesn't seem like that much more work -
> it looks like the key to your test is that you created a tar file
> containing two files, where the first file was less than 512 bytes and
> the second file is your target destination that you will be rewriting.

The real code is more complicated, something like:

offset = tar.fileobj.tell() + BLOCK_SIZE

with open(tar.name, "r+") as f:
f.truncate(offset + measure["required"])

convert_image(image, tar.name, offset)

check = check_image(tar.name, offset)
size = check["image-end-offset"]

member = tarfile.TarInfo(name)
member.size = size
tar.addfile(member)

tar_size = offset + round_up(size)

tar.fileobj.seek(tar_size)
with open(tar.name, "r+") as f:
f.truncate(tar_size)

I'm not sure it helps qemu developers working on these tests.

> >> +out = qemu_img_pipe("measure", "-O", "qcow2", "--output", "json", src_disk)
> >> +measure = json.loads(out)
> >> +qemu_img_create("-f", "raw", tar_file, str(measure["required"]))
> >
> > Should this be measure["required"] + 1536?
>
> The test works without it (because of compression), but yes, if you are
> going to test writing into an offset, you should oversize your file by
> that same offset.

Right, in the real code using this I indeed use offset + required.
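
In test terms that would be something like (a hypothetical one-liner
using the same names as the quoted test):

    # Oversize the target by the 1536-byte offset the image is written at:
    qemu_img_create("-f", "raw", tar_file, str(1536 + measure["required"]))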

> >> +
> >> +nbd_sock = file_path("nbd-sock", base_dir=iotests.sock_dir)
> >> +nbd_uri = "nbd+unix:///exp?socket=" + nbd_sock
> >> +
> >> +# Use raw format to allow creating qcow2 directy into tar file.
> >> +qemu_nbd(
> >> +"--socket", nbd_sock,
> >> +"--persistent",
> >> +"--export-name", "exp",
> >> +"--format", "raw",
> >> +"--offset", "1536",
> >> +tar_file)
> >> +
> >> +iotests.log("=== Target image info ===")
> >> +qemu_img_log("info", nbd_uri)
> >> +
> >> +# Write image into the tar file. In a real applicatio we would write a tar
> >
> > *application
> >
>
> >> +=== Converted image check ===
> >> +No errors were found on the image.
> >> +1/160 = 0.62% allocated, 100.00% fragmented, 100.00% compressed clusters
> >> +Image end offset: 393216
> >
> > I hope none of this is fs-dependant.  (I don’t think it is, but who
> > knows.  I suppose we’ll find out.)
>
> Indeed - time to see what CI thinks of this.
>
> At any rate, given the urgency of getting pull requests for -rc2 in
> before slamming Peter tomorrow, I'll probably try to touch up the issues
> Max pointed out and queue it today.

Thanks Max and Eric.

Should I post a fixed version later today?




Re: [PATCH] iotests/197: Fix for compat=0.10

2020-07-27 Thread Eric Blake

On 7/27/20 8:52 AM, Max Reitz wrote:

Writing zeroes to a qcow2 v2 image without a backing file results in an
unallocated cluster as of 61b3043965.  197 has a test for COR-ing a
cluster on an image without a backing file, which means that the data
will be zero, so now on a v2 image that cluster will just stay
unallocated, and so the test fails.  Just force compat=1.1 for that
particular case to enforce the cluster to get allocated.

Fixes: 61b3043965fe3552ee2684a97e7cc809ca7a71b3
Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/197 | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Eric Blake 



diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index 95f05b0e34..121959a09c 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -112,7 +112,9 @@ echo
  echo '=== Partial final cluster ==='
  echo
  
-_make_test_img 1024

+# Force compat=1.1, because writing zeroes on a v2 image without a
+# backing file would just result in an unallocated cluster
+_make_test_img -o compat=1.1 1024
  $QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
  $QEMU_IO -f $IMGFMT -c map "$TEST_IMG"
  _check_test_img



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/2] qemu-iotests: Test convert to qcow2 compressed to NBD

2020-07-27 Thread Eric Blake

On 7/27/20 5:04 AM, Max Reitz wrote:

On 26.07.20 17:25, Nir Soffer wrote:

Add test for "qemu-img convert -O qcow2 -c" to NBD target. The use case
is writing compressed disk content to OVA archive.

Signed-off-by: Nir Soffer 
---





+# The use case is writing qcow2 image directly into a tar file. Code to create
+# real tar file not included.
+#
+# offsetcontent
+# ---
+#  0first memebr header


*member


+#512first member data
+#   1024second memeber header


*member


+#   1536second member data
+
+tar_file = file_path("test.tar")


I guess it's okay that you don't create a real tar file here, but 
listing the commands to create it (even as a comment) is better than 
just saying "trust me".  And it doesn't seem like that much more work - 
it looks like the key to your test is that you created a tar file 
containing two files, where the first file was less than 512 bytes and 
the second file is your target destination that you will be rewriting.



+out = qemu_img_pipe("measure", "-O", "qcow2", "--output", "json", src_disk)
+measure = json.loads(out)
+qemu_img_create("-f", "raw", tar_file, str(measure["required"]))


Should this be measure["required"] + 1536?


The test works without it (because of compression), but yes, if you are 
going to test writing into an offset, you should oversize your file by 
that same offset.





+
+nbd_sock = file_path("nbd-sock", base_dir=iotests.sock_dir)
+nbd_uri = "nbd+unix:///exp?socket=" + nbd_sock
+
+# Use raw format to allow creating qcow2 directy into tar file.
+qemu_nbd(
+"--socket", nbd_sock,
+"--persistent",
+"--export-name", "exp",
+"--format", "raw",
+"--offset", "1536",
+tar_file)
+
+iotests.log("=== Target image info ===")
+qemu_img_log("info", nbd_uri)
+
+# Write image into the tar file. In a real applicatio we would write a tar


*application




+=== Converted image check ===
+No errors were found on the image.
+1/160 = 0.62% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 393216


I hope none of this is fs-dependant.  (I don’t think it is, but who
knows.  I suppose we’ll find out.)


Indeed - time to see what CI thinks of this.

At any rate, given the urgency of getting pull requests for -rc2 in 
before slamming Peter tomorrow, I'll probably try to touch up the issues 
Max pointed out and queue it today.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/2] block: nbd: Fix convert qcow2 compressed to nbd

2020-07-27 Thread Eric Blake

On 7/26/20 10:25 AM, Nir Soffer wrote:

When converting to qcow2 compressed format, the last step is a special
zero length compressed write, ending in call to bdrv_co_truncate(). This
call always fail for the nbd driver since it does not implement


fails


bdrv_co_truncate().


Arguably, qemu-img should be taught to ignore the failure, since it is 
not unique to the nbd driver. But I can live with your approach here.




For block devices, which have the same limits, the call succeeds since
file driver implements bdrv_co_truncate(). If the caller asked to
truncate to the same or smaller size with exact=false, the truncate
succeeds. Implement the same logic for nbd.

Example failing without this change:





Fixes: https://bugzilla.redhat.com/1860627
Signed-off-by: Nir Soffer 
---
  block/nbd.c | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 65a4f56924..2154113af3 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1966,6 +1966,30 @@ static void nbd_close(BlockDriverState *bs)
  nbd_clear_bdrvstate(s);
  }
  
+/*

+ * NBD cannot truncate, but if the caller ask to truncate to the same size, or


asks


+ * to a smaller size with extact=false, there is not reason to fail the


exact, no


+ * operation.
+ */
+static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t offset,
+bool exact, PreallocMode prealloc,
+BdrvRequestFlags flags, Error **errp)
+{
+BDRVNBDState *s = bs->opaque;
+
+if (offset != s->info.size && exact) {
+error_setg(errp, "Cannot resize NBD nodes");
+return -ENOTSUP;
+}
+
+if (offset > s->info.size) {
+error_setg(errp, "Cannot grow NBD nodes");
+return -EINVAL;
+}
+
+return 0;


Looks reasonable.  As Max said, I wonder if we want to reject particular 
preallocation modes (looking at block/file-posix.c:raw_co_truncate), in 
the case where the image was resized down and then back up (since 
s->info.size is constant, but the BDS size is not if inexact resize 
succeeds).


As you have a bugzilla entry, I think this is safe for -rc2; I'll be 
touching up the typos and queuing it through my NBD tree later today.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH] iotests/197: Fix for compat=0.10

2020-07-27 Thread Max Reitz
Writing zeroes to a qcow2 v2 image without a backing file results in an
unallocated cluster as of 61b3043965.  197 has a test for COR-ing a
cluster on an image without a backing file, which means that the data
will be zero, so now on a v2 image that cluster will just stay
unallocated, and so the test fails.  Just force compat=1.1 for that
particular case to enforce the cluster to get allocated.

Fixes: 61b3043965fe3552ee2684a97e7cc809ca7a71b3
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/197 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197
index 95f05b0e34..121959a09c 100755
--- a/tests/qemu-iotests/197
+++ b/tests/qemu-iotests/197
@@ -112,7 +112,9 @@ echo
 echo '=== Partial final cluster ==='
 echo
 
-_make_test_img 1024
+# Force compat=1.1, because writing zeroes on a v2 image without a
+# backing file would just result in an unallocated cluster
+_make_test_img -o compat=1.1 1024
 $QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IO -f $IMGFMT -c map "$TEST_IMG"
 _check_test_img
-- 
2.26.2




Re: [PATCH v7 43/47] iotests: Let complete_and_wait() work with commit

2020-07-27 Thread Andrey Shinkevich

On 25.06.2020 18:22, Max Reitz wrote:

complete_and_wait() and wait_ready() currently only work for mirror
jobs.  Let them work for active commit jobs, too.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/iotests.py | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 5ea4c4df8b..57b32d8ad3 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -932,8 +932,12 @@ class QMPTestCase(unittest.TestCase):
  
  def wait_ready(self, drive='drive0'):

  """Wait until a BLOCK_JOB_READY event, and return the event."""
-f = {'data': {'type': 'mirror', 'device': drive}}
-return self.vm.event_wait(name='BLOCK_JOB_READY', match=f)
+return self.vm.events_wait([
+('BLOCK_JOB_READY',
+ {'data': {'type': 'mirror', 'device': drive}}),
+('BLOCK_JOB_READY',
+ {'data': {'type': 'commit', 'device': drive}})
+])
  
  def wait_ready_and_cancel(self, drive='drive0'):

  self.wait_ready(drive=drive)
@@ -952,7 +956,7 @@ class QMPTestCase(unittest.TestCase):
  self.assert_qmp(result, 'return', {})
  
  event = self.wait_until_completed(drive=drive, error=completion_error)

-self.assert_qmp(event, 'data/type', 'mirror')
+self.assertTrue(event['data']['type'] in ['mirror', 'commit'])
  
  def pause_wait(self, job_id='job0'):

  with Timeout(3, "Timeout waiting for job to pause"):



Reviewed-by: Andrey Shinkevich 
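
As a usage illustration (a hypothetical test snippet, not part of the
patch): with this change an active-commit test can drive the job through
the same helper that mirror tests already use:

    # Commit the active layer (no 'top' argument), then complete the job;
    # wait_ready()/complete_and_wait() now also match the 'commit' type:
    result = self.vm.qmp('block-commit', device='drive0')
    self.assert_qmp(result, 'return', {})
    self.complete_and_wait(drive='drive0')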





Re: [PATCH v3 17/21] migration/savevm: don't worry if bitmap migration postcopy failed

2020-07-27 Thread Dr. David Alan Gilbert
* Eric Blake (ebl...@redhat.com) wrote:
> On 7/24/20 3:43 AM, Vladimir Sementsov-Ogievskiy wrote:
> > First, if only bitmaps postcopy enabled (not ram postcopy)
> 
> is enabled (and not ram postcopy),
> 
> > postcopy_pause_incoming crashes on assertion assert(mis->to_src_file).
> 
> on an
> 
> > 
> > And anyway, bitmaps postcopy is not prepared to be somehow recovered.
> > The original idea instead is that if bitmaps postcopy failed, we just
> > loss some bitmaps, which is not critical. So, on failure we just need
> 
> lose
> 
> > to remove unfinished bitmaps and guest should continue execution on
> > destination.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > Reviewed-by: Dr. David Alan Gilbert 
> > Reviewed-by: Andrey Shinkevich 
> > ---
> >   migration/savevm.c | 37 -
> >   1 file changed, 32 insertions(+), 5 deletions(-)
> > 
> 
> Definitely a bug fix, but I'd like David's opinion on whether this is still
> 5.1 material (because it is limited to just bitmaps migration, which is
> opt-in) or too risky (because we've already had several releases where it
> was broken, what's one more?).

I think it's OK for 5.1

Dave

> I'm less familiar with the code, so this is weak, but I did read through it
> and nothing jumped out at me, so:
> 
> Reviewed-by: Eric Blake 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v7 42/47] iotests: Test that qcow2's data-file is flushed

2020-07-27 Thread Andrey Shinkevich

On 25.06.2020 18:22, Max Reitz wrote:

Flushing a qcow2 node must lead to the data-file node being flushed as
well.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/244 | 49 ++
  tests/qemu-iotests/244.out |  7 ++
  2 files changed, 56 insertions(+)

diff --git a/tests/qemu-iotests/244 b/tests/qemu-iotests/244
index efe3c0428b..f2b2dddf1c 100755
--- a/tests/qemu-iotests/244
+++ b/tests/qemu-iotests/244
@@ -217,6 +217,55 @@ $QEMU_IMG amend -f $IMGFMT -o "data_file=blkdebug::$TEST_IMG.data" "$TEST_IMG"
  $QEMU_IMG convert -f $IMGFMT -O $IMGFMT -n -C "$TEST_IMG.src" "$TEST_IMG"
  $QEMU_IMG compare -f $IMGFMT -F $IMGFMT "$TEST_IMG.src" "$TEST_IMG"
  
+echo

+echo "=== Flushing should flush the data file ==="
+echo
+
+# We are going to flush a qcow2 file with a blkdebug node inserted
+# between the qcow2 node and its data file node.  The blkdebug node
+# will return an error for all flushes and so we if the data file is
+# flushed, we will see qemu-io return an error.
+
+# We need to write something or the flush will not do anything; we
+# also need -t writeback so the write is not done as a FUA write
+# (which would then fail thanks to the implicit flush)
+$QEMU_IO -c 'write 0 512' -c flush \
+-t writeback \
+"json:{
+ 'driver': 'qcow2',
+ 'file': {
+ 'driver': 'file',
+ 'filename': '$TEST_IMG'
+ },
+ 'data-file': {
+ 'driver': 'blkdebug',
+ 'inject-error': [{
+ 'event': 'none',
+ 'iotype': 'flush'
+ }],
+ 'image': {
+ 'driver': 'file',
+ 'filename': '$TEST_IMG.data'
+ }
+ }
+ }" \
+| _filter_qemu_io
+
+result=${PIPESTATUS[0]}
+echo
+
+case $result in
+0)
+echo "ERROR: qemu-io succeeded, so the data file was not flushed"
+;;
+1)
+echo "Success: qemu-io failed, so the data file was flushed"
+;;
+*)
+echo "ERROR: qemu-io returned unknown exit code $result"
+;;
+esac
+
  # success, all done
  echo "*** done"
  rm -f $seq.full
diff --git a/tests/qemu-iotests/244.out b/tests/qemu-iotests/244.out
index dbab7359a9..7269b4295a 100644
--- a/tests/qemu-iotests/244.out
+++ b/tests/qemu-iotests/244.out
@@ -131,4 +131,11 @@ Offset  Length  Mapped to   File
  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 data_file=TEST_DIR/t.IMGFMT.data
  Images are identical.
  Images are identical.
+
+=== Flushing should flush the data file ===
+
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Success: qemu-io failed, so the data file was flushed
  *** done



Reviewed-by: Andrey Shinkevich 
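
For a Python iotest wanting the same setup, the node graph in the quoted
shell heredoc can be built as a dict and passed through the json:
pseudo-protocol (a hypothetical sketch; test_img stands for the image
path):

    import json

    opts = {
        "driver": "qcow2",
        "file": {"driver": "file", "filename": test_img},
        "data-file": {
            "driver": "blkdebug",
            "inject-error": [{"event": "none", "iotype": "flush"}],
            "image": {"driver": "file", "filename": test_img + ".data"},
        },
    }
    # qemu-io accepts this as its filename argument:
    filename = "json:" + json.dumps(opts)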





Re: [PATCH v3 16/21] migration/block-dirty-bitmap: cancel migration on shutdown

2020-07-27 Thread Dr. David Alan Gilbert
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> If target is turned off before postcopy has finished, target crashes
> because busy bitmaps are found at shutdown.
> Canceling incoming migration helps, as it removes all unfinished (and
> therefore busy) bitmaps.
> 
> Similarly on source we crash in bdrv_close_all which asserts that all
> bdrv states are removed, because bdrv states involved in dirty bitmap
> migration are referenced by it. So, we need to cancel outgoing
> migration as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Andrey Shinkevich 
> ---
>  migration/migration.h  |  2 ++
>  migration/block-dirty-bitmap.c | 16 
>  migration/migration.c  | 13 +
>  3 files changed, 31 insertions(+)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index ab20c756f5..6c6a931d0d 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -335,6 +335,8 @@ void migrate_send_rp_recv_bitmap(MigrationIncomingState 
> *mis,
>  void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value);
>  
>  void dirty_bitmap_mig_before_vm_start(void);
> +void dirty_bitmap_mig_cancel_outgoing(void);
> +void dirty_bitmap_mig_cancel_incoming(void);
>  void migrate_add_address(SocketAddress *address);
>  
>  int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index c24d4614bf..a198ec7278 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -657,6 +657,22 @@ static void cancel_incoming_locked(DBMLoadState *s)
>  s->bitmaps = NULL;
>  }
>  
> +void dirty_bitmap_mig_cancel_outgoing(void)
> +{
> +dirty_bitmap_do_save_cleanup(&dbm_state.save);
> +}
> +
> +void dirty_bitmap_mig_cancel_incoming(void)
> +{
> +DBMLoadState *s = &dbm_state.load;
> +
> +qemu_mutex_lock(&s->lock);
> +
> +cancel_incoming_locked(s);
> +
> +qemu_mutex_unlock(&s->lock);
> +}
> +
>  static void dirty_bitmap_load_complete(QEMUFile *f, DBMLoadState *s)
>  {
>  GSList *item;
> diff --git a/migration/migration.c b/migration/migration.c
> index 1c61428988..8fe36339db 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -188,6 +188,19 @@ void migration_shutdown(void)
>   */
>  migrate_fd_cancel(current_migration);
>  object_unref(OBJECT(current_migration));
> +
> +/*
> + * Cancel outgoing migration of dirty bitmaps. It should
> + * at least unref used block nodes.
> + */
> +dirty_bitmap_mig_cancel_outgoing();
> +
> +/*
> + * Cancel incoming migration of dirty bitmaps. Dirty bitmaps
> + * are non-critical data, and their loss never considered as
> + * something serious.
> + */
> +dirty_bitmap_mig_cancel_incoming();

Are you sure this is the right place to put them - I'm thinking that
perhaps the object_unref of current_migration should still be after
them?

Dave

>  }
>  
>  /* For outgoing */
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v7 0/7] coroutines: generate wrapper code

2020-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2020 15:48, Stefan Hajnoczi wrote:

On Wed, Jun 10, 2020 at 01:03:29PM +0300, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

The aim of the series is to reduce code-duplication and writing
parameters structure-packing by hand around coroutine function wrappers.

Benefits:
  - no code duplication
  - less indirection


Please add documentation so others know when and how to use this.

I suggest adding a docs/devel/coroutine-wrapper.rst document and adding
a code comment to #define generated_co_wrapper pointing to the
documentation.

Please rename coroutine-wrapper.py to block-coroutine-wrapper.py since
it is specific to the block layer.



OK, will do. Thanks for taking a look!


--
Best regards,
Vladimir



Re: [PATCH v7 0/7] coroutines: generate wrapper code

2020-07-27 Thread Stefan Hajnoczi
On Wed, Jun 10, 2020 at 01:03:29PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> The aim of the series is to reduce code-duplication and writing
> parameters structure-packing by hand around coroutine function wrappers.
> 
> Benefits:
>  - no code duplication
>  - less indirection

Please add documentation so others know when and how to use this.

I suggest adding a docs/devel/coroutine-wrapper.rst document and adding
a code comment to #define generated_co_wrapper pointing to the
documentation.

Please rename coroutine-wrapper.py to block-coroutine-wrapper.py since
it is specific to the block layer.

Stefan



