Re: [Qemu-block] [Qemu-devel] [PATCH] job: drop job_drain

2019-08-16 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20190816170457.522990-1-vsement...@virtuozzo.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote 
/tmp/qemu-test/src/tcg -iquote /tmp/qemu-test/src/tcg/i386 
-I/tmp/qemu-test/src/linux-headers -I/tmp/qemu-test/build/linux-headers -iquote 
. -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote 
/tmp/qemu-test/src/include -I/usr/include/pixman-1  
-I/tmp/qemu-test/src/dtc/libfdt -Werror  -pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wno-string-plus-int 
-Wno-typedef-redefinition -Wno-initializer-overrides -Wexpansion-to-defined 
-Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body 
-Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-definition -Wtype-limits 
-fstack-protector-strong  -I/usr/include/p11-kit-1 -I/usr/include/libpng16  
-I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 
-I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid 
-I/usr/include/uuid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -MMD 
-MP -MT tests/test-shift128.o -MF tests/test-shift128.d -fsanitize=undefined 
-fsanitize=address -g   -c -o tests/test-shift128.o 
/tmp/qemu-test/src/tests/test-shift128.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote 
/tmp/qemu-test/src/tcg -iquote /tmp/qemu-test/src/tcg/i386 
-I/tmp/qemu-test/src/linux-headers -I/tmp/qemu-test/build/linux-headers -iquote 
. -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote 
/tmp/qemu-test/src/include -I/usr/include/pixman-1  
-I/tmp/qemu-test/src/dtc/libfdt -Werror  -pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wno-string-plus-int 
-Wno-typedef-redefinition -Wno-initializer-overrides -Wexpansion-to-defined 
-Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body 
-Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-definition -Wtype-limits 
-fstack-protector-strong  -I/usr/include/p11-kit-1 -I/usr/include/libpng16  
-I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 
-I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid 
-I/usr/include/uuid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -MMD 
-MP -MT tests/test-mul64.o -MF tests/test-mul64.d -fsanitize=undefined 
-fsanitize=address -g   -c -o tests/test-mul64.o 
/tmp/qemu-test/src/tests/test-mul64.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote 
/tmp/qemu-test/src/tcg -iquote /tmp/qemu-test/src/tcg/i386 
-I/tmp/qemu-test/src/linux-headers -I/tmp/qemu-test/build/linux-headers -iquote 
. -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote 
/tmp/qemu-test/src/include -I/usr/include/pixman-1  
-I/tmp/qemu-test/src/dtc/libfdt -Werror  -pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wno-string-plus-int 
-Wno-typedef-redefinition -Wno-initializer-overrides -Wexpansion-to-defined 
-Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body 
-Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-definition -Wtype-limits 
-fstack-protector-strong  -I/usr/include/p11-kit-1 -I/usr/include/libpng16  
-I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 
-I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid 
-I/usr/include/uuid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -MMD 
-MP -MT tests/test-int128.o -MF tests/test-int128.d -fsanitize=undefined 
-fsanitize=address -g   -c -o tests/test-int128.o 
/tmp/qemu-test/src/tests/test-int128.c
/tmp/qemu-

[Qemu-block] [PULL 32/36] iotests/257: test traditional sync modes

2019-08-16 Thread John Snow
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-12-js...@redhat.com
[Edit 'Bitmap' --> 'bitmap' in 257.out --js]
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 |   41 +-
 tests/qemu-iotests/257.out | 3089 
 2 files changed, 3128 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index 53ab31c92e1..c2a72c577aa 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -283,6 +283,12 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
   Bitmaps are always synchronized, regardless of failure.
   (Partial images must be kept.)
 
+:param msync_mode: The mirror sync mode to use for the first backup.
+   Can be any one of:
+- bitmap: Backups based on bitmap manifest.
+- full:   Full backups.
+- top:Full backups of the top layer only.
+
 :param failure: Is the (optional) failure mode, and can be any of:
 - None: No failure. Test the normative path. Default.
 - simulated:Cancel the job right before it completes.
@@ -393,7 +399,7 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
 # group 1 gets cleared first, then group two gets written.
 if ((bsync_mode == 'on-success' and not failure) or
 (bsync_mode == 'always')):
-ebitmap.clear_group(1)
+ebitmap.clear()
 ebitmap.dirty_group(2)
 
 vm.run_job(job, auto_dismiss=True, auto_finalize=False,
@@ -404,8 +410,19 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
 log('')
 
 if bsync_mode == 'always' and failure == 'intermediate':
+# TOP treats anything allocated as dirty, expect to see:
+if msync_mode == 'top':
+ebitmap.dirty_group(0)
+
 # We manage to copy one sector (one bit) before the error.
 ebitmap.clear_bit(ebitmap.first_bit)
+
+# Full returns all bits set except what was copied/skipped
+if msync_mode == 'full':
+fail_bit = ebitmap.first_bit
+ebitmap.clear()
+ebitmap.dirty_bits(range(fail_bit, SIZE // GRANULARITY))
+
 ebitmap.compare(get_bitmap(bitmaps, drive0.device, 'bitmap0'))
 
 # 2 - Writes and Reference Backup
@@ -499,10 +516,25 @@ def test_backup_api():
 'bitmap404': ['on-success', 'always', 'never', None],
 'bitmap0':   [None],
 },
+'full': {
+None:['on-success', 'always', 'never'],
+'bitmap404': ['on-success', 'always', 'never', None],
+'bitmap0':   ['never', None],
+},
+'top': {
+None:['on-success', 'always', 'never'],
+'bitmap404': ['on-success', 'always', 'never', None],
+'bitmap0':   ['never', None],
+},
+'none': {
+None:['on-success', 'always', 'never'],
+'bitmap404': ['on-success', 'always', 'never', None],
+'bitmap0':   ['on-success', 'always', 'never', None],
+}
 }
 
 # Dicts, as always, are not stably-ordered prior to 3.7, so use tuples:
-for sync_mode in ('incremental', 'bitmap'):
+for sync_mode in ('incremental', 'bitmap', 'full', 'top', 'none'):
 log("-- Sync mode {:s} tests --\n".format(sync_mode))
 for bitmap in (None, 'bitmap404', 'bitmap0'):
 for policy in error_cases[sync_mode][bitmap]:
@@ -517,6 +549,11 @@ def main():
 for failure in ("simulated", "intermediate", None):
 test_bitmap_sync(bsync_mode, "bitmap", failure)
 
+for sync_mode in ('full', 'top'):
+for bsync_mode in ('on-success', 'always'):
+for failure in ('simulated', 'intermediate', None):
+test_bitmap_sync(bsync_mode, sync_mode, failure)
+
 test_backup_api()
 
 if __name__ == '__main__':
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index 811b1b11f19..84b79d7bfe9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -2246,6 +2246,3002 @@ qemu_img compare "TEST_DIR/PID-bsync2" 
"TEST_DIR/PID-fbackup2" ==> Identical, OK
 qemu_img compare "TEST_DIR/PID-img" "TEST_DIR/PID-fbackup2" ==> Identical, OK!
 
 
+=== Mode full; Bitmap Sync on-success with simulated failure ===
+
+--- Preparing image & VM ---
+
+{"execute": "blockdev-add", "arguments": {"driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/PID-img"}, "node-name": "drive0"}}
+{"return": {}}
+{"execute": "device_add", "arguments": {"drive": "drive0", "driver": 
"scsi-hd", "id": "device0", "share-rw": true}}
+{"return": {}}
+
+--- Write #0 ---
+
+write -P0x49 0x000 0x1
+{"ret

[Qemu-block] [PULL 34/36] block/backup: deal with zero detection

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

We have a detect_zeroes option, so at least for blockdev-backup the user
should set it if zero detection is needed. For drive-backup, leave
detection enabled by default, but do it through the existing option
instead of open-coding it.
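
As a rough illustration (not part of this patch; the node names, filenames
and values below are made up), a management layer that wants zero detection
with blockdev-backup would now request it on the target node itself:

    # Hedged sketch of the QMP commands involved; names are hypothetical.
    add_target = {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "qcow2",
            "node-name": "target0",
            "detect-zeroes": "unmap",   # zero detection configured here
            "file": {"driver": "file", "filename": "target.qcow2"},
        },
    }
    start_backup = {
        "execute": "blockdev-backup",
        "arguments": {
            "job-id": "backup0",
            "device": "drive0",
            "target": "target0",
            "sync": "full",
        },
    }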

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190730163251.755248-2-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 block/backup.c | 15 ++-
 blockdev.c |  8 
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index a9be07258c1..083fc189af9 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -113,7 +113,10 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
+int write_flags =
+(job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
+(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -131,14 +134,8 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 goto fail;
 }
 
-if (buffer_is_zero(*bounce_buffer, nbytes)) {
-ret = blk_co_pwrite_zeroes(job->target, start,
-   nbytes, write_flags | BDRV_REQ_MAY_UNMAP);
-} else {
-ret = blk_co_pwrite(job->target, start,
-nbytes, *bounce_buffer, write_flags |
-(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
-}
+ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
+write_flags);
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
diff --git a/blockdev.c b/blockdev.c
index 64d06d1f672..2e536dde3e9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3615,7 +3615,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 BlockDriverState *source = NULL;
 BlockJob *job = NULL;
 AioContext *aio_context;
-QDict *options = NULL;
+QDict *options;
 Error *local_err = NULL;
 int flags;
 int64_t size;
@@ -3688,10 +3688,10 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 goto out;
 }
 
+options = qdict_new();
+qdict_put_str(options, "discard", "unmap");
+qdict_put_str(options, "detect-zeroes", "unmap");
 if (backup->format) {
-if (!options) {
-options = qdict_new();
-}
 qdict_put_str(options, "driver", backup->format);
 }
 
-- 
2.21.0




[Qemu-block] [PULL 35/36] block/backup: refactor write_flags

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

The write flags are constant, so let's store them in BackupBlockJob instead
of recalculating them. This also makes two boolean fields unused, so drop
them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190730163251.755248-4-vsement...@virtuozzo.com
Signed-off-by: John Snow 
---
 block/backup.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 083fc189af9..2baf7bed65a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -50,14 +50,13 @@ typedef struct BackupBlockJob {
 uint64_t len;
 uint64_t bytes_read;
 int64_t cluster_size;
-bool compress;
 NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 
 bool use_copy_range;
 int64_t copy_range_size;
 
-bool serialize_target_writes;
+BdrvRequestFlags write_flags;
 bool initializing_bitmap;
 } BackupBlockJob;
 
@@ -113,10 +112,6 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags =
-(job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
-(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
-
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -135,7 +130,7 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 }
 
 ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-write_flags);
+job->write_flags);
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
@@ -163,7 +158,6 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
 assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
@@ -172,7 +166,7 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
 job->cluster_size * nr_clusters);
 ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-read_flags, write_flags);
+read_flags, job->write_flags);
 if (ret < 0) {
 trace_backup_do_cow_copy_range_fail(job, start, ret);
 bdrv_set_dirty_bitmap(job->copy_bitmap, start,
@@ -751,10 +745,16 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->sync_mode = sync_mode;
 job->sync_bitmap = sync_bitmap;
 job->bitmap_mode = bitmap_mode;
-job->compress = compress;
 
-/* Detect image-fleecing (and similar) schemes */
-job->serialize_target_writes = bdrv_chain_contains(target, bs);
+/*
+ * Set write flags:
+ * 1. Detect image-fleecing (and similar) schemes
+ * 2. Handle compression
+ */
+job->write_flags =
+(bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) |
+(compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
 job->cluster_size = cluster_size;
 job->copy_bitmap = copy_bitmap;
 copy_bitmap = NULL;
-- 
2.21.0




[Qemu-block] [PULL 26/36] iotests/257: test API failures

2019-08-16 Thread John Snow
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-6-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 | 67 ++
 tests/qemu-iotests/257.out | 85 ++
 2 files changed, 152 insertions(+)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index aaa8f595043..53ab31c92e1 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -447,10 +447,77 @@ def test_bitmap_sync(bsync_mode, msync_mode='bitmap', 
failure=None):
 compare_images(img_path, fbackup2)
 log('')
 
+def test_backup_api():
+"""
+Test malformed and prohibited invocations of the backup API.
+"""
+with iotests.FilePaths(['img', 'bsync1']) as \
+ (img_path, backup_path), \
+ iotests.VM() as vm:
+
+log("\n=== API failure tests ===\n")
+log('--- Preparing image & VM ---\n')
+drive0 = Drive(img_path, vm=vm)
+drive0.img_create(iotests.imgfmt, SIZE)
+vm.add_device("{},id=scsi0".format(iotests.get_virtio_scsi_device()))
+vm.launch()
+
+file_config = {
+'driver': 'file',
+'filename': drive0.path
+}
+
+vm.qmp_log('blockdev-add',
+   filters=[iotests.filter_qmp_testfiles],
+   node_name="drive0",
+   driver=drive0.fmt,
+   file=file_config)
+drive0.node = 'drive0'
+drive0.device = 'device0'
+vm.qmp_log("device_add", id=drive0.device,
+   drive=drive0.name, driver="scsi-hd")
+log('')
+
+target0 = Drive(backup_path, vm=vm)
+target0.create_target("backup_target", drive0.fmt, drive0.size)
+log('')
+
+vm.qmp_log("block-dirty-bitmap-add", node=drive0.name,
+   name="bitmap0", granularity=GRANULARITY)
+log('')
+
+log('-- Testing invalid QMP commands --\n')
+
+error_cases = {
+'incremental': {
+None:['on-success', 'always', 'never', None],
+'bitmap404': ['on-success', 'always', 'never', None],
+'bitmap0':   ['always', 'never']
+},
+'bitmap': {
+None:['on-success', 'always', 'never', None],
+'bitmap404': ['on-success', 'always', 'never', None],
+'bitmap0':   [None],
+},
+}
+
+# Dicts, as always, are not stably-ordered prior to 3.7, so use tuples:
+for sync_mode in ('incremental', 'bitmap'):
+log("-- Sync mode {:s} tests --\n".format(sync_mode))
+for bitmap in (None, 'bitmap404', 'bitmap0'):
+for policy in error_cases[sync_mode][bitmap]:
+blockdev_backup(drive0.vm, drive0.name, "backup_target",
+sync_mode, job_id='api_job',
+bitmap=bitmap, bitmap_mode=policy)
+log('')
+
+
 def main():
 for bsync_mode in ("never", "on-success", "always"):
 for failure in ("simulated", "intermediate", None):
 test_bitmap_sync(bsync_mode, "bitmap", failure)
 
+test_backup_api()
+
 if __name__ == '__main__':
 iotests.script_main(main, supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index 0abc96acd36..43f2e0f9c99 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -2245,3 +2245,88 @@ qemu_img compare "TEST_DIR/PID-bsync1" 
"TEST_DIR/PID-fbackup1" ==> Identical, OK
 qemu_img compare "TEST_DIR/PID-bsync2" "TEST_DIR/PID-fbackup2" ==> Identical, 
OK!
 qemu_img compare "TEST_DIR/PID-img" "TEST_DIR/PID-fbackup2" ==> Identical, OK!
 
+
+=== API failure tests ===
+
+--- Preparing image & VM ---
+
+{"execute": "blockdev-add", "arguments": {"driver": "qcow2", "file": 
{"driver": "file", "filename": "TEST_DIR/PID-img"}, "node-name": "drive0"}}
+{"return": {}}
+{"execute": "device_add", "arguments": {"drive": "drive0", "driver": 
"scsi-hd", "id": "device0"}}
+{"return": {}}
+
+{}
+{"execute": "job-dismiss", "arguments": {"id": "bdc-file-job"}}
+{"return": {}}
+{}
+{}
+{"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
+{"return": {}}
+{}
+
+{"execute": "block-dirty-bitmap-add", "arguments": {"granularity": 65536, 
"name": "bitmap0", "node": "drive0"}}
+{"return": {}}
+
+-- Testing invalid QMP commands --
+
+-- Sync mode incremental tests --
+
+{"execute": "blockdev-backup", "arguments": {"bitmap-mode": "on-success", 
"device": "drive0", "job-id": "api_job", "sync": "incremental", "target": 
"backup_target"}}
+{"error": {"class": "GenericError", "desc": "must provide a valid bitmap name 
for 'incremental' sync mode"}}
+
+{"execute": "blockdev-backup", "arguments": {"bitmap-mode": "always", 
"device": "drive0", "job-id": "api_job", "sync": "incremental", "target": 
"backup_target"}}
+{"error": {"class": "G

[Qemu-block] [PULL 36/36] tests/test-hbitmap: test next_zero and _next_dirty_area after truncate

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Test that hbitmap_next_zero and hbitmap_next_dirty_area can find things
past the old bitmap end.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-id: 20190805164652.42409-1-vsement...@virtuozzo.com
Tested-by: John Snow 
Reviewed-by: John Snow 
Signed-off-by: John Snow 
---
 tests/test-hbitmap.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 592d8219db2..eed5d288cbc 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -1004,6 +1004,15 @@ static void test_hbitmap_next_zero_4(TestHBitmapData 
*data, const void *unused)
 test_hbitmap_next_zero_do(data, 4);
 }
 
+static void test_hbitmap_next_zero_after_truncate(TestHBitmapData *data,
+  const void *unused)
+{
+hbitmap_test_init(data, L1, 0);
+hbitmap_test_truncate_impl(data, L1 * 2);
+hbitmap_set(data->hb, 0, L1);
+test_hbitmap_next_zero_check(data, 0);
+}
+
 static void test_hbitmap_next_dirty_area_check(TestHBitmapData *data,
uint64_t offset,
uint64_t count)
@@ -1104,6 +1113,15 @@ static void 
test_hbitmap_next_dirty_area_4(TestHBitmapData *data,
 test_hbitmap_next_dirty_area_do(data, 4);
 }
 
+static void test_hbitmap_next_dirty_area_after_truncate(TestHBitmapData *data,
+const void *unused)
+{
+hbitmap_test_init(data, L1, 0);
+hbitmap_test_truncate_impl(data, L1 * 2);
+hbitmap_set(data->hb, L1 + 1, 1);
+test_hbitmap_next_dirty_area_check(data, 0, UINT64_MAX);
+}
+
 int main(int argc, char **argv)
 {
 g_test_init(&argc, &argv, NULL);
@@ -1169,6 +1187,8 @@ int main(int argc, char **argv)
  test_hbitmap_next_zero_0);
 hbitmap_test_add("/hbitmap/next_zero/next_zero_4",
  test_hbitmap_next_zero_4);
+hbitmap_test_add("/hbitmap/next_zero/next_zero_after_truncate",
+ test_hbitmap_next_zero_after_truncate);
 
 hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_0",
  test_hbitmap_next_dirty_area_0);
@@ -1176,6 +1196,8 @@ int main(int argc, char **argv)
  test_hbitmap_next_dirty_area_1);
 hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_4",
  test_hbitmap_next_dirty_area_4);
+hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_after_truncate",
+ test_hbitmap_next_dirty_area_after_truncate);
 
 g_test_run();
 
-- 
2.21.0




[Qemu-block] [PULL 28/36] block/backup: centralize copy_bitmap initialization

2019-08-16 Thread John Snow
Just a few housekeeping changes that keep the following commit easier
to read; perform the initial copy_bitmap initialization in one place.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-8-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index b04ab2d5f0c..305f9b3468b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -451,16 +451,22 @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
 return ret;
 }
 
-/* init copy_bitmap from sync_bitmap */
-static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
+static void backup_init_copy_bitmap(BackupBlockJob *job)
 {
-bool ret = bdrv_dirty_bitmap_merge_internal(job->copy_bitmap,
-job->sync_bitmap,
-NULL, true);
-assert(ret);
+bool ret;
+uint64_t estimate;
 
-job_progress_set_remaining(&job->common.job,
-   bdrv_get_dirty_count(job->copy_bitmap));
+if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
+ret = bdrv_dirty_bitmap_merge_internal(job->copy_bitmap,
+   job->sync_bitmap,
+   NULL, true);
+assert(ret);
+} else {
+bdrv_set_dirty_bitmap(job->copy_bitmap, 0, job->len);
+}
+
+estimate = bdrv_get_dirty_count(job->copy_bitmap);
+job_progress_set_remaining(&job->common.job, estimate);
 }
 
 static int coroutine_fn backup_run(Job *job, Error **errp)
@@ -472,12 +478,7 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 QLIST_INIT(&s->inflight_reqs);
 qemu_co_rwlock_init(&s->flush_rwlock);
 
-if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
-backup_incremental_init_copy_bitmap(s);
-} else {
-bdrv_set_dirty_bitmap(s->copy_bitmap, 0, s->len);
-job_progress_set_remaining(job, s->len);
-}
+backup_init_copy_bitmap(s);
 
 s->before_write.notify = backup_before_write_notify;
 bdrv_add_before_write_notifier(bs, &s->before_write);
-- 
2.21.0




[Qemu-block] [PULL 29/36] block/backup: add backup_is_cluster_allocated

2019-08-16 Thread John Snow
Modify bdrv_is_unallocated_range to utilize the pnum return from
bdrv_is_allocated, and in the process change the semantics from
"is unallocated" to "is allocated."

Optionally, it returns the number of contiguous clusters that share the
same allocation status.

This will be used to carefully toggle bits in the bitmap for sync=top
initialization in the following commits.
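
As a rough sketch of the coalescing logic, here is a pure-Python model of the
new helper. It assumes an is_allocated(offset, nbytes) callable that behaves
like bdrv_is_allocated() and returns (allocated, count); all names here are
illustrative, not QEMU API:

    # Hedged Python model of backup_is_cluster_allocated(); not QEMU code.
    def is_cluster_allocated(offset, length, cluster_size, is_allocated):
        """Return (allocated, nclusters): the status of the cluster at offset
        plus how many contiguous clusters share that status."""
        total = 0
        remaining = length - offset
        while True:
            allocated, count = is_allocated(offset, remaining)
            total += count
            if allocated or count == 0:
                # Partial segments count as allocated; an unallocated tail
                # counts as one whole segment.
                return allocated, -(-total // cluster_size)  # ceil division
            if total >= cluster_size:
                return False, total // cluster_size
            offset += count
            remaining -= count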

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-9-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c | 62 +++---
 1 file changed, 44 insertions(+), 18 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 305f9b3468b..f6bf32c9438 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -185,6 +185,48 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 return nbytes;
 }
 
+/*
+ * Check if the cluster starting at offset is allocated or not.
+ * return via pnum the number of contiguous clusters sharing this allocation.
+ */
+static int backup_is_cluster_allocated(BackupBlockJob *s, int64_t offset,
+   int64_t *pnum)
+{
+BlockDriverState *bs = blk_bs(s->common.blk);
+int64_t count, total_count = 0;
+int64_t bytes = s->len - offset;
+int ret;
+
+assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+
+while (true) {
+ret = bdrv_is_allocated(bs, offset, bytes, &count);
+if (ret < 0) {
+return ret;
+}
+
+total_count += count;
+
+if (ret || count == 0) {
+/*
+ * ret: partial segment(s) are considered allocated.
+ * otherwise: unallocated tail is treated as an entire segment.
+ */
+*pnum = DIV_ROUND_UP(total_count, s->cluster_size);
+return ret;
+}
+
+/* Unallocated segment(s) with uncertain following segment(s) */
+if (total_count >= s->cluster_size) {
+*pnum = total_count / s->cluster_size;
+return 0;
+}
+
+offset += count;
+bytes -= count;
+}
+}
+
 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   int64_t offset, uint64_t bytes,
   bool *error_is_read,
@@ -398,34 +440,18 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
*job)
 return false;
 }
 
-static bool bdrv_is_unallocated_range(BlockDriverState *bs,
-  int64_t offset, int64_t bytes)
-{
-int64_t end = offset + bytes;
-
-while (offset < end && !bdrv_is_allocated(bs, offset, bytes, &bytes)) {
-if (bytes == 0) {
-return true;
-}
-offset += bytes;
-bytes = end - offset;
-}
-
-return offset >= end;
-}
-
 static int coroutine_fn backup_loop(BackupBlockJob *job)
 {
 bool error_is_read;
 int64_t offset;
 BdrvDirtyBitmapIter *bdbi;
-BlockDriverState *bs = blk_bs(job->common.blk);
 int ret = 0;
+int64_t dummy;
 
 bdbi = bdrv_dirty_iter_new(job->copy_bitmap);
 while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
 if (job->sync_mode == MIRROR_SYNC_MODE_TOP &&
-bdrv_is_unallocated_range(bs, offset, job->cluster_size))
+!backup_is_cluster_allocated(job, offset, &dummy))
 {
 bdrv_reset_dirty_bitmap(job->copy_bitmap, offset,
 job->cluster_size);
-- 
2.21.0




[Qemu-block] [PULL 20/36] qapi: implement block-dirty-bitmap-remove transaction action

2019-08-16 Thread John Snow
It is used to do transactional movement of a bitmap (which is
possible in conjunction with the merge command). Transactional bitmap
movement is needed in external-snapshot scenarios, when we don't
want to leave a copy of the bitmap in the base image.
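
For illustration only (the node and bitmap names are hypothetical), the
intended pattern is to add a bitmap on the new overlay, merge the old one
into it, and remove the old one, all within a single transaction:

    # Hedged sketch; 'overlay0', 'base0' and 'bitmap0' are made-up names.
    move_bitmap = {
        "execute": "transaction",
        "arguments": {"actions": [
            {"type": "block-dirty-bitmap-add",
             "data": {"node": "overlay0", "name": "bitmap0",
                      "persistent": True}},
            {"type": "block-dirty-bitmap-merge",
             "data": {"node": "overlay0", "target": "bitmap0",
                      "bitmaps": [{"node": "base0", "name": "bitmap0"}]}},
            {"type": "block-dirty-bitmap-remove",
             "data": {"node": "base0", "name": "bitmap0"}},
        ]},
    }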

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190708220502.12977-3-js...@redhat.com
[Edited "since" version to 4.2 --js]
Signed-off-by: John Snow 
---
 block.c|  2 +-
 block/dirty-bitmap.c   | 15 +++
 blockdev.c | 79 +++---
 include/block/dirty-bitmap.h   |  2 +-
 migration/block-dirty-bitmap.c |  2 +-
 qapi/transaction.json  |  2 +
 6 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 2a2d0696672..3e698e9cabd 100644
--- a/block.c
+++ b/block.c
@@ -5346,7 +5346,7 @@ static void coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs,
 for (bm = bdrv_dirty_bitmap_next(bs, NULL); bm;
  bm = bdrv_dirty_bitmap_next(bs, bm))
 {
-bdrv_dirty_bitmap_set_migration(bm, false);
+bdrv_dirty_bitmap_skip_store(bm, false);
 }
 
 ret = refresh_total_sectors(bs, bs->total_sectors);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 75a5daf116f..134e0c9a0c8 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -48,10 +48,9 @@ struct BdrvDirtyBitmap {
 bool inconsistent;  /* bitmap is persistent, but inconsistent.
It cannot be used at all in any way, except
a QMP user can remove it. */
-bool migration; /* Bitmap is selected for migration, it should
-   not be stored on the next inactivation
-   (persistent flag doesn't matter until next
-   invalidation).*/
+bool skip_store;/* We are either migrating or deleting this
+ * bitmap; it should not be stored on the next
+ * inactivation. */
 QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
 
@@ -762,16 +761,16 @@ void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap 
*bitmap)
 }
 
 /* Called with BQL taken. */
-void bdrv_dirty_bitmap_set_migration(BdrvDirtyBitmap *bitmap, bool migration)
+void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip)
 {
 qemu_mutex_lock(bitmap->mutex);
-bitmap->migration = migration;
+bitmap->skip_store = skip;
 qemu_mutex_unlock(bitmap->mutex);
 }
 
 bool bdrv_dirty_bitmap_get_persistence(BdrvDirtyBitmap *bitmap)
 {
-return bitmap->persistent && !bitmap->migration;
+return bitmap->persistent && !bitmap->skip_store;
 }
 
 bool bdrv_dirty_bitmap_inconsistent(const BdrvDirtyBitmap *bitmap)
@@ -783,7 +782,7 @@ bool bdrv_has_changed_persistent_bitmaps(BlockDriverState 
*bs)
 {
 BdrvDirtyBitmap *bm;
 QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
-if (bm->persistent && !bm->readonly && !bm->migration) {
+if (bm->persistent && !bm->readonly && !bm->skip_store) {
 return true;
 }
 }
diff --git a/blockdev.c b/blockdev.c
index bcd766a1a24..210226d8290 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2136,6 +2136,51 @@ static void 
block_dirty_bitmap_merge_prepare(BlkActionState *common,
 errp);
 }
 
+static BdrvDirtyBitmap *do_block_dirty_bitmap_remove(
+const char *node, const char *name, bool release,
+BlockDriverState **bitmap_bs, Error **errp);
+
+static void block_dirty_bitmap_remove_prepare(BlkActionState *common,
+  Error **errp)
+{
+BlockDirtyBitmap *action;
+BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
+ common, common);
+
+if (action_check_completion_mode(common, errp) < 0) {
+return;
+}
+
+action = common->action->u.block_dirty_bitmap_remove.data;
+
+state->bitmap = do_block_dirty_bitmap_remove(action->node, action->name,
+ false, &state->bs, errp);
+if (state->bitmap) {
+bdrv_dirty_bitmap_skip_store(state->bitmap, true);
+bdrv_dirty_bitmap_set_busy(state->bitmap, true);
+}
+}
+
+static void block_dirty_bitmap_remove_abort(BlkActionState *common)
+{
+BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
+ common, common);
+
+if (state->bitmap) {
+bdrv_dirty_bitmap_skip_store(state->bitmap, false);
+bdrv_dirty_bitmap_set_busy(state->bitmap, false);
+}
+}
+
+static void block_dirty_bitmap_remove_commit(BlkActionState *common)
+{
+BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
+ common, common);

[Qemu-block] [PULL 31/36] block/backup: support bitmap sync modes for non-bitmap backups

2019-08-16 Thread John Snow
Accept bitmaps and sync policies for the other backup modes.
This allows us to do things like create a bitmap synced to a full backup
without a transaction, or start a resumable backup process.

Some combinations don't make sense, though:

- NEVER policy combined with any non-BITMAP mode doesn't do anything,
  because the bitmap isn't used for input or output.
  It's harmless, but is almost certainly never what the user wanted.

- sync=NONE is more questionable. It can't use on-success because this
  job never completes with success anyway, and the resulting artifact
  of 'always' is suspect: because we start with a full bitmap and only
  copy out segments that get written to, the final output bitmap will
  always be ... a fully set bitmap.

  Maybe there are contexts in which bitmaps make sense for sync=none,
  but not without more severe changes to the current job, and omitting
  it here doesn't prevent us from adding it later.
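
As an illustrative sketch (names are hypothetical and the bitmap is assumed
to already exist on the source), creating a bitmap that ends up synchronized
to a full backup is now a single command rather than a transaction:

    # Hedged sketch; 'drive0', 'target0' and 'bitmap0' are made-up names.
    full_backup_with_bitmap = {
        "execute": "blockdev-backup",
        "arguments": {
            "job-id": "backup0",
            "device": "drive0",
            "target": "target0",
            "sync": "full",
            "bitmap": "bitmap0",
            # on success, the bitmap ends up tracking only the writes made
            # after this backup started
            "bitmap-mode": "on-success",
        },
    }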

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-11-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c   |  8 +---
 blockdev.c   | 22 ++
 qapi/block-core.json |  6 --
 3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 9e1382ec5c6..a9be07258c1 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -697,7 +697,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 return NULL;
 }
 
-if (sync_mode == MIRROR_SYNC_MODE_BITMAP) {
+if (sync_bitmap) {
 /* If we need to write to this bitmap, check that we can: */
 if (bitmap_mode != BITMAP_SYNC_MODE_NEVER &&
 bdrv_dirty_bitmap_check(sync_bitmap, BDRV_BITMAP_DEFAULT, errp)) {
@@ -708,12 +708,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
 return NULL;
 }
-} else if (sync_bitmap) {
-error_setg(errp,
-   "a bitmap was given to backup_job_create, "
-   "but it received an incompatible sync_mode (%s)",
-   MirrorSyncMode_str(sync_mode));
-return NULL;
 }
 
 len = bdrv_getlength(bs);
diff --git a/blockdev.c b/blockdev.c
index f889da0b427..64d06d1f672 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3567,6 +3567,28 @@ static BlockJob *do_backup_common(BackupCommon *backup,
 if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_ALLOW_RO, errp)) {
 return NULL;
 }
+
+/* This does not produce a useful bitmap artifact: */
+if (backup->sync == MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "sync mode '%s' does not produce meaningful 
bitmap"
+   " outputs", MirrorSyncMode_str(backup->sync));
+return NULL;
+}
+
+/* If the bitmap isn't used for input or output, this is useless: */
+if (backup->bitmap_mode == BITMAP_SYNC_MODE_NEVER &&
+backup->sync != MIRROR_SYNC_MODE_BITMAP) {
+error_setg(errp, "Bitmap sync mode '%s' has no meaningful effect"
+   " when combined with sync mode '%s'",
+   BitmapSyncMode_str(backup->bitmap_mode),
+   MirrorSyncMode_str(backup->sync));
+return NULL;
+}
+}
+
+if (!backup->has_bitmap && backup->has_bitmap_mode) {
+error_setg(errp, "Cannot specify bitmap sync mode without a bitmap");
+return NULL;
 }
 
 if (!backup->auto_finalize) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8344fbe2030..d72cf5f354b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1352,13 +1352,15 @@
 # @speed: the maximum speed, in bytes per second. The default is 0,
 # for unlimited.
 #
-# @bitmap: the name of a dirty bitmap if sync is "bitmap" or "incremental".
+# @bitmap: The name of a dirty bitmap to use.
 #  Must be present if sync is "bitmap" or "incremental".
+#  Can be present if sync is "full" or "top".
 #  Must not be present otherwise.
 #  (Since 2.4 (drive-backup), 3.1 (blockdev-backup))
 #
 # @bitmap-mode: Specifies the type of data the bitmap should contain after
-#   the operation concludes. Must be present if sync is "bitmap".
+#   the operation concludes.
+#   Must be present if a bitmap was provided,
 #   Must NOT be present otherwise. (Since 4.2)
 #
 # @compress: true to compress data, if the target format supports it.
-- 
2.21.0




[Qemu-block] [PULL 33/36] qapi: add dirty-bitmaps to query-named-block-nodes result

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Let's add the possibility to query dirty bitmaps not only on root nodes.
It is useful when dealing with both snapshots and incremental backups.
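
A rough sketch of how a client might use this; the node name, bitmap fields
and values below are invented for illustration and abridged:

    # Hedged sketch: query every node and pick out its bitmaps, if any.
    query = {"execute": "query-named-block-nodes"}
    # Example shape of one reply entry (values invented, fields abridged):
    example_node = {
        "node-name": "overlay0",
        # ... other BlockDeviceInfo fields ...
        "dirty-bitmaps": [
            {"name": "bitmap0", "granularity": 65536, "count": 0,
             "recording": True, "busy": False, "persistent": True},
        ],
    }
    bitmaps = example_node.get("dirty-bitmaps", [])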

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
Message-id: 20190717173937.18747-1-js...@redhat.com
[Added deprecation information. --js]
Signed-off-by: John Snow 
[Fixed spelling --js]
---
 block/qapi.c |  5 +
 qapi/block-core.json |  6 +-
 qemu-deprecated.texi | 12 
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/block/qapi.c b/block/qapi.c
index 917435f0226..15f10302647 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -79,6 +79,11 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 info->backing_file = g_strdup(bs->backing_file);
 }
 
+if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
+info->has_dirty_bitmaps = true;
+info->dirty_bitmaps = bdrv_query_dirty_bitmaps(bs);
+}
+
 info->detect_zeroes = bs->detect_zeroes;
 
 if (blk && blk_get_public(blk)->throttle_group_member.throttle_state) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d72cf5f354b..e9364a4a293 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -360,6 +360,9 @@
 # @write_threshold: configured write threshold for the device.
 #   0 if disabled. (Since 2.3)
 #
+# @dirty-bitmaps: dirty bitmaps information (only present if node
+# has one or more dirty bitmaps) (Since 4.2)
+#
 # Since: 0.14.0
 #
 ##
@@ -378,7 +381,7 @@
 '*bps_wr_max_length': 'int', '*iops_max_length': 'int',
 '*iops_rd_max_length': 'int', '*iops_wr_max_length': 'int',
 '*iops_size': 'int', '*group': 'str', 'cache': 'BlockdevCacheInfo',
-'write_threshold': 'int' } }
+'write_threshold': 'int', '*dirty-bitmaps': ['BlockDirtyInfo'] } }
 
 ##
 # @BlockDeviceIoStatus:
@@ -656,6 +659,7 @@
 #
 # @dirty-bitmaps: dirty bitmaps information (only present if the
 # driver has one or more dirty bitmaps) (Since 2.0)
+# Deprecated in 4.2; see BlockDeviceInfo instead.
 #
 # @io-status: @BlockDeviceIoStatus. Only present if the device
 # supports it and the VM is configured to stop on errors
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index f7680c08e10..00a4b6f3504 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -154,6 +154,18 @@ The ``status'' field of the ``BlockDirtyInfo'' structure, 
returned by
 the query-block command is deprecated. Two new boolean fields,
 ``recording'' and ``busy'' effectively replace it.
 
+@subsection query-block result field dirty-bitmaps (Since 4.2)
+
+The ``dirty-bitmaps`` field of the ``BlockInfo`` structure, returned by
+the query-block command is itself now deprecated. The ``dirty-bitmaps``
+field of the ``BlockDeviceInfo`` struct should be used instead, which is the
+type of the ``inserted`` field in query-block replies, as well as the
+type of array items in query-named-block-nodes.
+
+Since the ``dirty-bitmaps`` field is optionally present in both the old and
+new locations, clients must use introspection to learn where to anticipate
+the field if/when it does appear in command output.
+
 @subsection query-cpus (since 2.12.0)
 
 The ``query-cpus'' command is replaced by the ``query-cpus-fast'' command.
-- 
2.21.0




[Qemu-block] [PULL 17/36] iotests: add test 257 for bitmap-mode backups

2019-08-16 Thread John Snow
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-18-js...@redhat.com
[Removed 'auto' group, as per new testing config guidelines --js]
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 |  416 +++
 tests/qemu-iotests/257.out | 2247 
 tests/qemu-iotests/group   |1 +
 3 files changed, 2664 insertions(+)
 create mode 100755 tests/qemu-iotests/257
 create mode 100644 tests/qemu-iotests/257.out

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
new file mode 100755
index 000..39526837499
--- /dev/null
+++ b/tests/qemu-iotests/257
@@ -0,0 +1,416 @@
+#!/usr/bin/env python
+#
+# Test bitmap-sync backups (incremental, differential, and partials)
+#
+# Copyright (c) 2019 John Snow for Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# owner=js...@redhat.com
+
+from collections import namedtuple
+import math
+import os
+
+import iotests
+from iotests import log, qemu_img
+
+SIZE = 64 * 1024 * 1024
+GRANULARITY = 64 * 1024
+
+Pattern = namedtuple('Pattern', ['byte', 'offset', 'size'])
+def mkpattern(byte, offset, size=GRANULARITY):
+"""Constructor for Pattern() with default size"""
+return Pattern(byte, offset, size)
+
+class PatternGroup:
+"""Grouping of Pattern objects. Initialize with an iterable of Patterns."""
+def __init__(self, patterns):
+self.patterns = patterns
+
+def bits(self, granularity):
+"""Calculate the unique bits dirtied by this pattern grouping"""
+res = set()
+for pattern in self.patterns:
+lower = pattern.offset // granularity
+upper = (pattern.offset + pattern.size - 1) // granularity
+res = res | set(range(lower, upper + 1))
+return res
+
+GROUPS = [
+PatternGroup([
+# Batch 0: 4 clusters
+mkpattern('0x49', 0x000),
+mkpattern('0x6c', 0x010),   # 1M
+mkpattern('0x6f', 0x200),   # 32M
+mkpattern('0x76', 0x3ff)]), # 64M - 64K
+PatternGroup([
+# Batch 1: 6 clusters (3 new)
+mkpattern('0x65', 0x000),   # Full overwrite
+mkpattern('0x77', 0x00f8000),   # Partial-left (1M-32K)
+mkpattern('0x72', 0x2008000),   # Partial-right (32M+32K)
+mkpattern('0x69', 0x3fe)]), # Adjacent-left (64M - 128K)
+PatternGroup([
+# Batch 2: 7 clusters (3 new)
+mkpattern('0x74', 0x001),   # Adjacent-right
+mkpattern('0x69', 0x00e8000),   # Partial-left  (1M-96K)
+mkpattern('0x6e', 0x2018000),   # Partial-right (32M+96K)
+mkpattern('0x67', 0x3fe,
+  2*GRANULARITY)]), # Overwrite [(64M-128K)-64M)
+PatternGroup([
+# Batch 3: 8 clusters (5 new)
+# Carefully chosen such that nothing re-dirties the one cluster
+# that copies out successfully before failure in Group #1.
+mkpattern('0xaa', 0x001,
+  3*GRANULARITY),   # Overwrite and 2x Adjacent-right
+mkpattern('0xbb', 0x00d8000),   # Partial-left (1M-160K)
+mkpattern('0xcc', 0x2028000),   # Partial-right (32M+160K)
+mkpattern('0xdd', 0x3fc)]), # New; leaving a gap to the right
+]
+
+class Drive:
+"""Represents, vaguely, a drive attached to a VM.
+Includes format, graph, and device information."""
+
+def __init__(self, path, vm=None):
+self.path = path
+self.vm = vm
+self.fmt = None
+self.size = None
+self.node = None
+self.device = None
+
+@property
+def name(self):
+return self.node or self.device
+
+def img_create(self, fmt, size):
+self.fmt = fmt
+self.size = size
+iotests.qemu_img_create('-f', self.fmt, self.path, str(self.size))
+
+def create_target(self, name, fmt, size):
+basename = os.path.basename(self.path)
+file_node_name = "file_{}".format(basename)
+vm = self.vm
+
+log(vm.command('blockdev-create', job_id='bdc-file-job',
+   options={
+   'driver': 'file',
+   'filename': self.path,
+   'size': 0,
+   }))
+vm.run_job('bdc-file-job')
+log(vm.command('blockdev-add', driver='file',
+   node_name=file_node_name, filename

[Qemu-block] [PULL 27/36] block/backup: improve sync=bitmap work estimates

2019-08-16 Thread John Snow
When making backups based on bitmaps, the work estimate can be more
accurate. Update iotests to reflect the new strategy.

TOP work estimates are broken, but do not get worse with this commit.
That issue is addressed in the following commits instead.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-7-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c |  8 +++-
 tests/qemu-iotests/256.out |  4 ++--
 tests/qemu-iotests/257.out | 36 ++--
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f704c83a98f..b04ab2d5f0c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -459,9 +459,8 @@ static void 
backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 NULL, true);
 assert(ret);
 
-/* TODO job_progress_set_remaining() would make more sense */
-job_progress_update(&job->common.job,
-job->len - bdrv_get_dirty_count(job->copy_bitmap));
+job_progress_set_remaining(&job->common.job,
+   bdrv_get_dirty_count(job->copy_bitmap));
 }
 
 static int coroutine_fn backup_run(Job *job, Error **errp)
@@ -473,12 +472,11 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 QLIST_INIT(&s->inflight_reqs);
 qemu_co_rwlock_init(&s->flush_rwlock);
 
-job_progress_set_remaining(job, s->len);
-
 if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
 backup_incremental_init_copy_bitmap(s);
 } else {
 bdrv_set_dirty_bitmap(s->copy_bitmap, 0, s->len);
+job_progress_set_remaining(job, s->len);
 }
 
 s->before_write.notify = backup_before_write_notify;
diff --git a/tests/qemu-iotests/256.out b/tests/qemu-iotests/256.out
index eec38614ec4..f18ecb0f912 100644
--- a/tests/qemu-iotests/256.out
+++ b/tests/qemu-iotests/256.out
@@ -113,7 +113,7 @@
 {
   "return": {}
 }
-{"data": {"device": "j2", "len": 67108864, "offset": 67108864, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "j3", "len": 67108864, "offset": 67108864, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "j2", "len": 0, "offset": 0, "speed": 0, "type": 
"backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
+{"data": {"device": "j3", "len": 0, "offset": 0, "speed": 0, "type": 
"backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
 
 --- Done ---
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index 43f2e0f9c99..811b1b11f19 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -150,7 +150,7 @@ expecting 7 dirty sectors; have 7. OK!
 {"execute": "job-cancel", "arguments": {"id": "backup_1"}}
 {"return": {}}
 {"data": {"id": "backup_1", "type": "backup"}, "event": "BLOCK_JOB_PENDING", 
"timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "backup_1", "len": 67108864, "offset": 67108864, "speed": 
0, "type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "backup_1", "len": 393216, "offset": 393216, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_CANCELLED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {
   "bitmaps": {
 "device0": [
@@ -228,7 +228,7 @@ expecting 15 dirty sectors; have 15. OK!
 {"execute": "job-finalize", "arguments": {"id": "backup_2"}}
 {"return": {}}
 {"data": {"id": "backup_2", "type": "backup"}, "event": "BLOCK_JOB_PENDING", 
"timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "backup_2", "len": 67108864, "offset": 67108864, "speed": 
0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "backup_2", "len": 983040, "offset": 983040, "speed": 0, 
"type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {
   "bitmaps": {
 "device0": [
@@ -367,7 +367,7 @@ expecting 6 dirty sectors; have 6. OK!
 {"execute": "blockdev-backup", "arguments": {"auto-finalize": false, "bitmap": 
"bitmap0", "bitmap-mode": "never", "device": "drive0", "job-id": "backup_1", 
"sync": "bitmap", "target": "backup_target_1"}}
 {"return": {}}
 {"data": {"action": "report", "device": "backup_1", "operation": "read"}, 
"event": "BLOCK_JOB_ERROR", "timestamp": {"microseconds": "USECS", "seconds": 
"SECS"}}
-{"data": {"device": "backup_1", "error": "Input/output error", "len": 
67108864, "offset": 66781184, "speed": 0, "type": "backup"}, "event": 
"BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": 
"SECS"}}
+{"data": {"device": "backup_1", "error": "Inp

[Qemu-block] [PULL 19/36] blockdev: reduce aio_context locked sections in bitmap add/remove

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Commit 0a6c86d024c52 brought these locks back to the add/remove
functionality, to protect persistent-bitmap-related I/O from intersecting
with other I/O. But the other bitmap-related functions called here are
unrelated to that problem, and there is no need to keep those calls inside
the critical sections.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190708220502.12977-2-js...@redhat.com
Signed-off-by: John Snow 
---
 blockdev.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index a44ab1f709e..bcd766a1a24 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2813,7 +2813,6 @@ void qmp_block_dirty_bitmap_add(const char *node, const 
char *name,
 {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
-AioContext *aio_context = NULL;
 
 if (!name || name[0] == '\0') {
 error_setg(errp, "Bitmap name cannot be empty");
@@ -2849,16 +2848,20 @@ void qmp_block_dirty_bitmap_add(const char *node, const 
char *name,
 }
 
 if (persistent) {
-aio_context = bdrv_get_aio_context(bs);
+AioContext *aio_context = bdrv_get_aio_context(bs);
+bool ok;
+
 aio_context_acquire(aio_context);
-if (!bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp)) {
-goto out;
+ok = bdrv_can_store_new_dirty_bitmap(bs, name, granularity, errp);
+aio_context_release(aio_context);
+if (!ok) {
+return;
 }
 }
 
 bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
 if (bitmap == NULL) {
-goto out;
+return;
 }
 
 if (disabled) {
@@ -2866,10 +2869,6 @@ void qmp_block_dirty_bitmap_add(const char *node, const 
char *name,
 }
 
 bdrv_dirty_bitmap_set_persistence(bitmap, persistent);
- out:
-if (aio_context) {
-aio_context_release(aio_context);
-}
 }
 
 void qmp_block_dirty_bitmap_remove(const char *node, const char *name,
@@ -2877,8 +2876,6 @@ void qmp_block_dirty_bitmap_remove(const char *node, 
const char *name,
 {
 BlockDriverState *bs;
 BdrvDirtyBitmap *bitmap;
-Error *local_err = NULL;
-AioContext *aio_context = NULL;
 
 bitmap = block_dirty_bitmap_lookup(node, name, &bs, errp);
 if (!bitmap || !bs) {
@@ -2891,20 +2888,19 @@ void qmp_block_dirty_bitmap_remove(const char *node, 
const char *name,
 }
 
 if (bdrv_dirty_bitmap_get_persistence(bitmap)) {
-aio_context = bdrv_get_aio_context(bs);
+AioContext *aio_context = bdrv_get_aio_context(bs);
+Error *local_err = NULL;
+
 aio_context_acquire(aio_context);
 bdrv_remove_persistent_dirty_bitmap(bs, name, &local_err);
+aio_context_release(aio_context);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
-goto out;
+return;
 }
 }
 
 bdrv_release_dirty_bitmap(bs, bitmap);
- out:
-if (aio_context) {
-aio_context_release(aio_context);
-}
 }
 
 /**
-- 
2.21.0




[Qemu-block] [PULL 24/36] iotests/257: Refactor backup helpers

2019-08-16 Thread John Snow
This test needs support for non-bitmap backups and missing or
unspecified bitmap sync modes, so rewrite the helpers to be a little
more generic.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-4-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 |  56 ++-
 tests/qemu-iotests/257.out | 192 ++---
 2 files changed, 128 insertions(+), 120 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index bc66ea03b24..aaa8f595043 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -207,31 +207,37 @@ def get_bitmap(bitmaps, drivename, name, recording=None):
 return bitmap
 return None
 
+def blockdev_backup(vm, device, target, sync, **kwargs):
+# Strip any arguments explicitly nulled by the caller:
+kwargs = {key: val for key, val in kwargs.items() if val is not None}
+result = vm.qmp_log('blockdev-backup',
+device=device,
+target=target,
+sync=sync,
+**kwargs)
+return result
+
+def blockdev_backup_mktarget(drive, target_id, filepath, sync, **kwargs):
+target_drive = Drive(filepath, vm=drive.vm)
+target_drive.create_target(target_id, drive.fmt, drive.size)
+blockdev_backup(drive.vm, drive.name, target_id, sync, **kwargs)
+
 def reference_backup(drive, n, filepath):
 log("--- Reference Backup #{:d} ---\n".format(n))
 target_id = "ref_target_{:d}".format(n)
 job_id = "ref_backup_{:d}".format(n)
-target_drive = Drive(filepath, vm=drive.vm)
-
-target_drive.create_target(target_id, drive.fmt, drive.size)
-drive.vm.qmp_log("blockdev-backup",
- job_id=job_id, device=drive.name,
- target=target_id, sync="full")
+blockdev_backup_mktarget(drive, target_id, filepath, "full",
+ job_id=job_id)
 drive.vm.run_job(job_id, auto_dismiss=True)
 log('')
 
-def bitmap_backup(drive, n, filepath, bitmap, bitmap_mode):
-log("--- Bitmap Backup #{:d} ---\n".format(n))
-target_id = "bitmap_target_{:d}".format(n)
-job_id = "bitmap_backup_{:d}".format(n)
-target_drive = Drive(filepath, vm=drive.vm)
-
-target_drive.create_target(target_id, drive.fmt, drive.size)
-drive.vm.qmp_log("blockdev-backup", job_id=job_id, device=drive.name,
- target=target_id, sync="bitmap",
- bitmap_mode=bitmap_mode,
- bitmap=bitmap,
- auto_finalize=False)
+def backup(drive, n, filepath, sync, **kwargs):
+log("--- Test Backup #{:d} ---\n".format(n))
+target_id = "backup_target_{:d}".format(n)
+job_id = "backup_{:d}".format(n)
+kwargs.setdefault('auto-finalize', False)
+blockdev_backup_mktarget(drive, target_id, filepath, sync,
+ job_id=job_id, **kwargs)
 return job_id
 
 def perform_writes(drive, n):
@@ -263,7 +269,7 @@ def compare_images(image, reference, baseimg=None, 
expected_match=True):
 "OK!" if ret == expected_ret else "ERROR!"),
 filters=[iotests.filter_testfiles])
 
-def test_bitmap_sync(bsync_mode, failure=None):
+def test_bitmap_sync(bsync_mode, msync_mode='bitmap', failure=None):
 """
 Test bitmap backup routines.
 
@@ -291,7 +297,7 @@ def test_bitmap_sync(bsync_mode, failure=None):
  fbackup0, fbackup1, fbackup2), \
  iotests.VM() as vm:
 
-mode = "Bitmap Sync Mode {:s}".format(bsync_mode)
+mode = "Mode {:s}; Bitmap Sync {:s}".format(msync_mode, bsync_mode)
 preposition = "with" if failure else "without"
 cond = "{:s} {:s}".format(preposition,
   "{:s} failure".format(failure) if failure
@@ -362,12 +368,13 @@ def test_bitmap_sync(bsync_mode, failure=None):
 ebitmap.compare(bitmap)
 reference_backup(drive0, 1, fbackup1)
 
-# 1 - Bitmap Backup (Optional induced failure)
+# 1 - Test Backup (w/ Optional induced failure)
 if failure == 'intermediate':
 # Activate blkdebug induced failure for second-to-next read
 log(vm.hmp_qemu_io(drive0.name, 'flush'))
 log('')
-job = bitmap_backup(drive0, 1, bsync1, "bitmap0", bsync_mode)
+job = backup(drive0, 1, bsync1, msync_mode,
+ bitmap="bitmap0", bitmap_mode=bsync_mode)
 
 def _callback():
 """Issue writes while the job is open to test bitmap divergence."""
@@ -408,7 +415,8 @@ def test_bitmap_sync(bsync_mode, failure=None):
 reference_backup(drive0, 2, fbackup2)
 
 # 2 - Bitmap Backup (In failure modes, this is a recovery.)
-job = bitmap_backup(drive0, 2, bsync2, "bitmap0", bsync_mode)
+job = backup(drive0, 2, bsync2, "bitmap",
+ bitmap="bitmap0", bitmap_mode=bsync_mode)
 vm.run_job

[Qemu-block] [PULL 30/36] block/backup: teach TOP to never copy unallocated regions

2019-08-16 Thread John Snow
Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
In the write notifier handler, we dutifully copy out such regions.

Fix this in three parts:

1. Mark the bitmap as being initialized before the first yield.
2. After the first yield but before the backup loop, interrogate the
allocation status asynchronously and initialize the bitmap.
3. Teach the write notifier to interrogate allocation status if it is
invoked during bitmap initialization.

As an effect of this patch, the job progress for TOP backups
now behaves like this:

- total progress starts at bdrv_length.
- As allocation status is interrogated, total progress decreases.
- As blocks are copied, current progress increases.

Taken together, the floor and ceiling move to meet each other.
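
A small, purely illustrative sketch of the resulting progress arithmetic
(the disk size and allocation figures below are invented):

    # Hedged illustration only: a 64 MiB disk with 3 MiB actually allocated.
    disk_len  = 64 * 1024 * 1024   # job total starts at the disk length
    allocated =  3 * 1024 * 1024   # data that really needs copying
    # As unallocated clusters are discovered, the total (ceiling) shrinks
    # from disk_len toward `allocated`; as clusters are copied, the current
    # offset (floor) grows from 0 toward the same value, until they meet.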


Signed-off-by: John Snow 
Message-id: 20190716000117.25219-10-js...@redhat.com
[Remove ret = -ECANCELED change. --js]
[Squash in conflict resolution based on Max's patch --js]
Message-id: c8b0ab36-79c8-0b4b-3193-4e12ed8c8...@redhat.com
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
---
 block/backup.c | 79 --
 block/trace-events |  1 +
 2 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f6bf32c9438..9e1382ec5c6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -58,6 +58,7 @@ typedef struct BackupBlockJob {
 int64_t copy_range_size;
 
 bool serialize_target_writes;
+bool initializing_bitmap;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
@@ -227,6 +228,35 @@ static int backup_is_cluster_allocated(BackupBlockJob *s, 
int64_t offset,
 }
 }
 
+/**
+ * Reset bits in copy_bitmap starting at offset if they represent unallocated
+ * data in the image. May reset subsequent contiguous bits.
+ * @return 0 when the cluster at @offset was unallocated,
+ * 1 otherwise, and -ret on error.
+ */
+static int64_t backup_bitmap_reset_unallocated(BackupBlockJob *s,
+   int64_t offset, int64_t *count)
+{
+int ret;
+int64_t clusters, bytes, estimate;
+
+ret = backup_is_cluster_allocated(s, offset, &clusters);
+if (ret < 0) {
+return ret;
+}
+
+bytes = clusters * s->cluster_size;
+
+if (!ret) {
+bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+estimate = bdrv_get_dirty_count(s->copy_bitmap);
+job_progress_set_remaining(&s->common.job, estimate);
+}
+
+*count = bytes;
+return ret;
+}
+
 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   int64_t offset, uint64_t bytes,
   bool *error_is_read,
@@ -236,6 +266,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 int ret = 0;
 int64_t start, end; /* bytes */
 void *bounce_buffer = NULL;
+int64_t status_bytes;
 
 qemu_co_rwlock_rdlock(&job->flush_rwlock);
 
@@ -262,6 +293,17 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 dirty_end = end;
 }
 
+if (job->initializing_bitmap) {
+ret = backup_bitmap_reset_unallocated(job, start, &status_bytes);
+if (ret == 0) {
+trace_backup_do_cow_skip_range(job, start, status_bytes);
+start += status_bytes;
+continue;
+}
+/* Clamp to known allocated region */
+dirty_end = MIN(dirty_end, start + status_bytes);
+}
+
 trace_backup_do_cow_process(job, start);
 
 if (job->use_copy_range) {
@@ -446,18 +488,9 @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
 int64_t offset;
 BdrvDirtyBitmapIter *bdbi;
 int ret = 0;
-int64_t dummy;
 
 bdbi = bdrv_dirty_iter_new(job->copy_bitmap);
 while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-if (job->sync_mode == MIRROR_SYNC_MODE_TOP &&
-!backup_is_cluster_allocated(job, offset, &dummy))
-{
-bdrv_reset_dirty_bitmap(job->copy_bitmap, offset,
-job->cluster_size);
-continue;
-}
-
 do {
 if (yield_and_check(job)) {
 goto out;
@@ -488,6 +521,13 @@ static void backup_init_copy_bitmap(BackupBlockJob *job)
NULL, true);
 assert(ret);
 } else {
+if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+/*
+ * We can't hog the coroutine to initialize this thoroughly.
+ * Set a flag and resume work when we are able to yield safely.
+ */
+job->initializing_bitmap = true;
+}
 bdrv_set_dirty_bitmap(job->copy_bitmap, 0, job->len);
 }
 
@@ -509,6 +549,26 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 s->before_write.notify = backup_before_write_notify;
 bdrv_add_before_write_notifier(bs, &s->before_

[Qemu-block] [PULL 22/36] iotests/257: add Pattern class

2019-08-16 Thread John Snow
Just kidding, this is easier to manage with a full class instead of a
namedtuple.

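For reference, a hypothetical use of the new class (the offset and the
expected bit are made up for illustration; the granularity is the test's
64 KiB constant):

    p = Pattern('0x6c', 0x0100000)         # one pattern at 1 MiB, default size
    assert p.bits(64 * 1024) == {16}       # it dirties exactly bit 16
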
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-2-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 | 58 +++---
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index 39526837499..02f9ae06490 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -19,7 +19,6 @@
 #
 # owner=js...@redhat.com
 
-from collections import namedtuple
 import math
 import os
 
@@ -29,10 +28,18 @@ from iotests import log, qemu_img
 SIZE = 64 * 1024 * 1024
 GRANULARITY = 64 * 1024
 
-Pattern = namedtuple('Pattern', ['byte', 'offset', 'size'])
-def mkpattern(byte, offset, size=GRANULARITY):
-"""Constructor for Pattern() with default size"""
-return Pattern(byte, offset, size)
+
+class Pattern:
+def __init__(self, byte, offset, size=GRANULARITY):
+self.byte = byte
+self.offset = offset
+self.size = size
+
+def bits(self, granularity):
+lower = self.offset // granularity
+upper = (self.offset + self.size - 1) // granularity
+return set(range(lower, upper + 1))
+
 
 class PatternGroup:
 """Grouping of Pattern objects. Initialize with an iterable of Patterns."""
@@ -43,40 +50,39 @@ class PatternGroup:
 """Calculate the unique bits dirtied by this pattern grouping"""
 res = set()
 for pattern in self.patterns:
-lower = pattern.offset // granularity
-upper = (pattern.offset + pattern.size - 1) // granularity
-res = res | set(range(lower, upper + 1))
+res |= pattern.bits(granularity)
 return res
 
+
 GROUPS = [
 PatternGroup([
 # Batch 0: 4 clusters
-mkpattern('0x49', 0x000),
-mkpattern('0x6c', 0x010),   # 1M
-mkpattern('0x6f', 0x200),   # 32M
-mkpattern('0x76', 0x3ff)]), # 64M - 64K
+Pattern('0x49', 0x000),
+Pattern('0x6c', 0x010),   # 1M
+Pattern('0x6f', 0x200),   # 32M
+Pattern('0x76', 0x3ff)]), # 64M - 64K
 PatternGroup([
 # Batch 1: 6 clusters (3 new)
-mkpattern('0x65', 0x000),   # Full overwrite
-mkpattern('0x77', 0x00f8000),   # Partial-left (1M-32K)
-mkpattern('0x72', 0x2008000),   # Partial-right (32M+32K)
-mkpattern('0x69', 0x3fe)]), # Adjacent-left (64M - 128K)
+Pattern('0x65', 0x000),   # Full overwrite
+Pattern('0x77', 0x00f8000),   # Partial-left (1M-32K)
+Pattern('0x72', 0x2008000),   # Partial-right (32M+32K)
+Pattern('0x69', 0x3fe)]), # Adjacent-left (64M - 128K)
 PatternGroup([
 # Batch 2: 7 clusters (3 new)
-mkpattern('0x74', 0x001),   # Adjacent-right
-mkpattern('0x69', 0x00e8000),   # Partial-left  (1M-96K)
-mkpattern('0x6e', 0x2018000),   # Partial-right (32M+96K)
-mkpattern('0x67', 0x3fe,
-  2*GRANULARITY)]), # Overwrite [(64M-128K)-64M)
+Pattern('0x74', 0x001),   # Adjacent-right
+Pattern('0x69', 0x00e8000),   # Partial-left  (1M-96K)
+Pattern('0x6e', 0x2018000),   # Partial-right (32M+96K)
+Pattern('0x67', 0x3fe,
+2*GRANULARITY)]), # Overwrite [(64M-128K)-64M)
 PatternGroup([
 # Batch 3: 8 clusters (5 new)
 # Carefully chosen such that nothing re-dirties the one cluster
 # that copies out successfully before failure in Group #1.
-mkpattern('0xaa', 0x001,
-  3*GRANULARITY),   # Overwrite and 2x Adjacent-right
-mkpattern('0xbb', 0x00d8000),   # Partial-left (1M-160K)
-mkpattern('0xcc', 0x2028000),   # Partial-right (32M+160K)
-mkpattern('0xdd', 0x3fc)]), # New; leaving a gap to the right
+Pattern('0xaa', 0x001,
+3*GRANULARITY),   # Overwrite and 2x Adjacent-right
+Pattern('0xbb', 0x00d8000),   # Partial-left (1M-160K)
+Pattern('0xcc', 0x2028000),   # Partial-right (32M+160K)
+Pattern('0xdd', 0x3fc)]), # New; leaving a gap to the right
 ]
 
 class Drive:
-- 
2.21.0




[Qemu-block] [PULL 23/36] iotests/257: add EmulatedBitmap class

2019-08-16 Thread John Snow
Represent a bitmap with an object that we can mark and clear bits in.
This makes it easier to manage partial writes when we don't write a
full group's worth of patterns before an error.

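A hedged usage sketch (qmp_bitmap stands for a bitmap dict as returned by
query-block; GROUPS is the pattern table already defined in the test):

    ebitmap = EmulatedBitmap()
    ebitmap.dirty_group(1)         # mirror the writes from pattern group 1
    ebitmap.clear_group(1)         # simulate a successful bitmap-mode sync
    ebitmap.dirty_group(2)         # mirror the next round of writes
    ebitmap.compare(qmp_bitmap)    # log expected vs. reported dirty count
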
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-3-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/257 | 124 +
 1 file changed, 75 insertions(+), 49 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index 02f9ae06490..bc66ea03b24 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -85,6 +85,59 @@ GROUPS = [
 Pattern('0xdd', 0x3fc)]), # New; leaving a gap to the right
 ]
 
+
+class EmulatedBitmap:
+def __init__(self, granularity=GRANULARITY):
+self._bits = set()
+self.granularity = granularity
+
+def dirty_bits(self, bits):
+self._bits |= set(bits)
+
+def dirty_group(self, n):
+self.dirty_bits(GROUPS[n].bits(self.granularity))
+
+def clear(self):
+self._bits = set()
+
+def clear_bits(self, bits):
+self._bits -= set(bits)
+
+def clear_bit(self, bit):
+self.clear_bits({bit})
+
+def clear_group(self, n):
+self.clear_bits(GROUPS[n].bits(self.granularity))
+
+@property
+def first_bit(self):
+return sorted(self.bits)[0]
+
+@property
+def bits(self):
+return self._bits
+
+@property
+def count(self):
+return len(self.bits)
+
+def compare(self, qmp_bitmap):
+"""
+Print a nice human-readable message checking that a bitmap as reported
+by the QMP interface has as many bits set as we expect it to.
+"""
+
+name = qmp_bitmap.get('name', '(anonymous)')
+log("= Checking Bitmap {:s} =".format(name))
+
+want = self.count
+have = qmp_bitmap['count'] // qmp_bitmap['granularity']
+
+log("expecting {:d} dirty sectors; have {:d}. {:s}".format(
+want, have, "OK!" if want == have else "ERROR!"))
+log('')
+
+
 class Drive:
 """Represents, vaguely, a drive attached to a VM.
 Includes format, graph, and device information."""
@@ -195,27 +248,6 @@ def perform_writes(drive, n):
 log('')
 return bitmaps
 
-def calculate_bits(groups=None):
-"""Calculate how many bits we expect to see dirtied."""
-if groups:
-bits = set.union(*(GROUPS[group].bits(GRANULARITY) for group in 
groups))
-return len(bits)
-return 0
-
-def bitmap_comparison(bitmap, groups=None, want=0):
-"""
-Print a nice human-readable message checking that this bitmap has as
-many bits set as we expect it to.
-"""
-log("= Checking Bitmap {:s} =".format(bitmap.get('name', '(anonymous)')))
-
-if groups:
-want = calculate_bits(groups)
-have = bitmap['count'] // bitmap['granularity']
-
-log("expecting {:d} dirty sectors; have {:d}. {:s}".format(
-want, have, "OK!" if want == have else "ERROR!"))
-log('')
 
 def compare_images(image, reference, baseimg=None, expected_match=True):
 """
@@ -321,12 +353,13 @@ def test_bitmap_sync(bsync_mode, failure=None):
 vm.qmp_log("block-dirty-bitmap-add", node=drive0.name,
name="bitmap0", granularity=GRANULARITY)
 log('')
+ebitmap = EmulatedBitmap()
 
 # 1 - Writes and Reference Backup
 bitmaps = perform_writes(drive0, 1)
-dirty_groups = {1}
+ebitmap.dirty_group(1)
 bitmap = get_bitmap(bitmaps, drive0.device, 'bitmap0')
-bitmap_comparison(bitmap, groups=dirty_groups)
+ebitmap.compare(bitmap)
 reference_backup(drive0, 1, fbackup1)
 
 # 1 - Bitmap Backup (Optional induced failure)
@@ -342,54 +375,47 @@ def test_bitmap_sync(bsync_mode, failure=None):
 log('')
 bitmaps = perform_writes(drive0, 2)
 # Named bitmap (static, should be unchanged)
-bitmap_comparison(get_bitmap(bitmaps, drive0.device, 'bitmap0'),
-  groups=dirty_groups)
+ebitmap.compare(get_bitmap(bitmaps, drive0.device, 'bitmap0'))
 # Anonymous bitmap (dynamic, shows new writes)
-bitmap_comparison(get_bitmap(bitmaps, drive0.device, '',
- recording=True), groups={2})
-dirty_groups.add(2)
+anonymous = EmulatedBitmap()
+anonymous.dirty_group(2)
+anonymous.compare(get_bitmap(bitmaps, drive0.device, '',
+ recording=True))
+
+# Simulate the order in which this will happen:
+# group 1 gets cleared first, then group two gets written.
+if ((bsync_mode == 'on-success' and not failure) or
+(bsync_mode == 'always')):
+ebitmap.clear_group(1)
+ebitmap.dirty_group(2)
 
 vm.run_job(job, auto_dismiss=True, auto_fina

[Qemu-block] [PULL 21/36] iotests: test bitmap moving inside 254

2019-08-16 Thread John Snow
From: Vladimir Sementsov-Ogievskiy 

Test persistent bitmap copying with and without removal of the original
bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190708220502.12977-4-js...@redhat.com
[Edited comment "bitmap1" --> "bitmap2" as per review. --js]
Signed-off-by: John Snow 
---
 tests/qemu-iotests/254 | 30 +-
 tests/qemu-iotests/254.out | 82 ++
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/254 b/tests/qemu-iotests/254
index 8edba91c5d4..09584f3f7de 100755
--- a/tests/qemu-iotests/254
+++ b/tests/qemu-iotests/254
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Test external snapshot with bitmap copying.
+# Test external snapshot with bitmap copying and moving.
 #
 # Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
 #
@@ -32,6 +32,10 @@ vm = iotests.VM().add_drive(disk, opts='node-name=base')
 vm.launch()
 
 vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap0')
+vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap1',
+   persistent=True)
+vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap2',
+   persistent=True)
 
 vm.hmp_qemu_io('drive0', 'write 0 512K')
 
@@ -39,16 +43,38 @@ vm.qmp_log('transaction', indent=2, actions=[
 {'type': 'blockdev-snapshot-sync',
  'data': {'device': 'drive0', 'snapshot-file': top,
   'snapshot-node-name': 'snap'}},
+
+# copy non-persistent bitmap0
 {'type': 'block-dirty-bitmap-add',
  'data': {'node': 'snap', 'name': 'bitmap0'}},
 {'type': 'block-dirty-bitmap-merge',
  'data': {'node': 'snap', 'target': 'bitmap0',
-  'bitmaps': [{'node': 'base', 'name': 'bitmap0'}]}}
+  'bitmaps': [{'node': 'base', 'name': 'bitmap0'}]}},
+
+# copy persistent bitmap1, original will be saved to base image
+{'type': 'block-dirty-bitmap-add',
+ 'data': {'node': 'snap', 'name': 'bitmap1', 'persistent': True}},
+{'type': 'block-dirty-bitmap-merge',
+ 'data': {'node': 'snap', 'target': 'bitmap1',
+  'bitmaps': [{'node': 'base', 'name': 'bitmap1'}]}},
+
+# move persistent bitmap2, original will be removed and not saved
+# to base image
+{'type': 'block-dirty-bitmap-add',
+ 'data': {'node': 'snap', 'name': 'bitmap2', 'persistent': True}},
+{'type': 'block-dirty-bitmap-merge',
+ 'data': {'node': 'snap', 'target': 'bitmap2',
+  'bitmaps': [{'node': 'base', 'name': 'bitmap2'}]}},
+{'type': 'block-dirty-bitmap-remove',
+ 'data': {'node': 'base', 'name': 'bitmap2'}}
 ], filters=[iotests.filter_qmp_testfiles])
 
 result = vm.qmp('query-block')['return'][0]
 log("query-block: device = {}, node-name = {}, dirty-bitmaps:".format(
 result['device'], result['inserted']['node-name']))
 log(result['dirty-bitmaps'], indent=2)
+log("\nbitmaps in backing image:")
+log(result['inserted']['image']['backing-image']['format-specific'] \
+['data']['bitmaps'], indent=2)
 
 vm.shutdown()
diff --git a/tests/qemu-iotests/254.out b/tests/qemu-iotests/254.out
index d7394cf0026..d185c0532f6 100644
--- a/tests/qemu-iotests/254.out
+++ b/tests/qemu-iotests/254.out
@@ -1,5 +1,9 @@
 {"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": 
"drive0"}}
 {"return": {}}
+{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap1", "node": 
"drive0", "persistent": true}}
+{"return": {}}
+{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap2", "node": 
"drive0", "persistent": true}}
+{"return": {}}
 {
   "execute": "transaction",
   "arguments": {
@@ -31,6 +35,55 @@
   "target": "bitmap0"
 },
 "type": "block-dirty-bitmap-merge"
+  },
+  {
+"data": {
+  "name": "bitmap1",
+  "node": "snap",
+  "persistent": true
+},
+"type": "block-dirty-bitmap-add"
+  },
+  {
+"data": {
+  "bitmaps": [
+{
+  "name": "bitmap1",
+  "node": "base"
+}
+  ],
+  "node": "snap",
+  "target": "bitmap1"
+},
+"type": "block-dirty-bitmap-merge"
+  },
+  {
+"data": {
+  "name": "bitmap2",
+  "node": "snap",
+  "persistent": true
+},
+"type": "block-dirty-bitmap-add"
+  },
+  {
+"data": {
+  "bitmaps": [
+{
+  "name": "bitmap2",
+  "node": "base"
+}
+  ],
+  "node": "snap",
+  "target": "bitmap2"
+},
+"type": "block-dirty-bitmap-merge"
+  },
+  {
+"data": {
+  "name": "bitmap2",
+  "node": "base"
+},
+"type": "block-dirty-bitmap-remove"
   }
 ]
   }
@@ -40,6 +93,24 @@
 }
 query-block: device = drive0, node-name = snap, dirty-bitmaps:
 [
+ 

[Qemu-block] [PULL 25/36] block/backup: hoist bitmap check into QMP interface

2019-08-16 Thread John Snow
This is nicer to do in the unified QMP interface that we have now,
because it lets us use the right terminology back at the user.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190716000117.25219-5-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c | 13 -
 blockdev.c | 10 ++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index f8309be01b3..f704c83a98f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -576,6 +576,10 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 assert(bs);
 assert(target);
 
+/* QMP interface protects us from these cases */
+assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
+assert(sync_bitmap || sync_mode != MIRROR_SYNC_MODE_BITMAP);
+
 if (bs == target) {
 error_setg(errp, "Source and target cannot be the same");
 return NULL;
@@ -607,16 +611,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 return NULL;
 }
 
-/* QMP interface should have handled translating this to bitmap mode */
-assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
-
 if (sync_mode == MIRROR_SYNC_MODE_BITMAP) {
-if (!sync_bitmap) {
-error_setg(errp, "must provide a valid bitmap name for "
-   "'%s' sync mode", MirrorSyncMode_str(sync_mode));
-return NULL;
-}
-
 /* If we need to write to this bitmap, check that we can: */
 if (bitmap_mode != BITMAP_SYNC_MODE_NEVER &&
 bdrv_dirty_bitmap_check(sync_bitmap, BDRV_BITMAP_DEFAULT, errp)) {
diff --git a/blockdev.c b/blockdev.c
index 210226d8290..f889da0b427 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3529,6 +3529,16 @@ static BlockJob *do_backup_common(BackupCommon *backup,
 return NULL;
 }
 
+if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||
+(backup->sync == MIRROR_SYNC_MODE_INCREMENTAL)) {
+/* done before desugaring 'incremental' to print the right message */
+if (!backup->has_bitmap) {
+error_setg(errp, "must provide a valid bitmap name for "
+   "'%s' sync mode", MirrorSyncMode_str(backup->sync));
+return NULL;
+}
+}
+
 if (backup->sync == MIRROR_SYNC_MODE_INCREMENTAL) {
 if (backup->has_bitmap_mode &&
 backup->bitmap_mode != BITMAP_SYNC_MODE_ON_SUCCESS) {
-- 
2.21.0




[Qemu-block] [PULL 14/36] iotests: teach run_job to cancel pending jobs

2019-08-16 Thread John Snow
run_job can cancel pending jobs to simulate failure. This lets us use
the pending callback to issue test commands while the job is open, but
then still have the job fail in the end.

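A hedged sketch of the intended call pattern (the VM handle, job id and the
mid-job command are illustrative):

    def while_pending():
        vm.qmp_log('query-jobs')   # poke the job while it sits in 'pending'

    # pre_finalize runs at 'pending'; cancel=True then issues job-cancel
    # instead of job-finalize, so the test exercises the failure path.
    vm.run_job('backup0', auto_finalize=False,
               pre_finalize=while_pending, cancel=True)
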
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-15-js...@redhat.com
[Maintainer edit: Merge conflict resolution in run_job]
Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7fc062cdcf4..81ae7b911ac 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -541,7 +541,23 @@ class VM(qtest.QEMUQtestMachine):
 
 # Returns None on success, and an error string on failure
 def run_job(self, job, auto_finalize=True, auto_dismiss=False,
-pre_finalize=None, use_log=True, wait=60.0):
+pre_finalize=None, cancel=False, use_log=True, wait=60.0):
+"""
+run_job moves a job from creation through to dismissal.
+
+:param job: String. ID of recently-launched job
+:param auto_finalize: Bool. True if the job was launched with
+  auto_finalize. Defaults to True.
+:param auto_dismiss: Bool. True if the job was launched with
+ auto_dismiss=True. Defaults to False.
+:param pre_finalize: Callback. A callable that takes no arguments to be
+ invoked prior to issuing job-finalize, if any.
+:param cancel: Bool. When true, cancels the job after the pre_finalize
+   callback.
+:param use_log: Bool. When false, does not log QMP messages.
+:param wait: Float. Timeout value specifying how long to wait for any
+ event, in seconds. Defaults to 60.0.
+"""
 match_device = {'data': {'device': job}}
 match_id = {'data': {'id': job}}
 events = [
@@ -570,7 +586,11 @@ class VM(qtest.QEMUQtestMachine):
 elif status == 'pending' and not auto_finalize:
 if pre_finalize:
 pre_finalize()
-if use_log:
+if cancel and use_log:
+self.qmp_log('job-cancel', id=job)
+elif cancel:
+self.qmp('job-cancel', id=job)
+elif use_log:
 self.qmp_log('job-finalize', id=job)
 else:
 self.qmp('job-finalize', id=job)
-- 
2.21.0




[Qemu-block] [PULL 16/36] iotests: Add virtio-scsi device helper

2019-08-16 Thread John Snow
Seems that it comes up enough.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-17-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/040| 6 +-
 tests/qemu-iotests/093| 6 ++
 tests/qemu-iotests/139| 7 ++-
 tests/qemu-iotests/238| 5 +
 tests/qemu-iotests/iotests.py | 4 
 5 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index aa0b1847e30..6db9abf8e6e 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -85,11 +85,7 @@ class TestSingleDrive(ImageCommitTestCase):
 qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', 
mid_img)
 self.vm = iotests.VM().add_drive(test_img, 
"node-name=top,backing.node-name=mid,backing.backing.node-name=base", 
interface="none")
-if iotests.qemu_default_machine == 's390-ccw-virtio':
-self.vm.add_device("virtio-scsi-ccw")
-else:
-self.vm.add_device("virtio-scsi-pci")
-
+self.vm.add_device(iotests.get_virtio_scsi_device())
 self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
 self.vm.launch()
 self.has_quit = False
diff --git a/tests/qemu-iotests/093 b/tests/qemu-iotests/093
index 4b2cac1d0c6..3c4f5173cea 100755
--- a/tests/qemu-iotests/093
+++ b/tests/qemu-iotests/093
@@ -367,10 +367,8 @@ class ThrottleTestGroupNames(iotests.QMPTestCase):
 class ThrottleTestRemovableMedia(iotests.QMPTestCase):
 def setUp(self):
 self.vm = iotests.VM()
-if iotests.qemu_default_machine == 's390-ccw-virtio':
-self.vm.add_device("virtio-scsi-ccw,id=virtio-scsi")
-else:
-self.vm.add_device("virtio-scsi-pci,id=virtio-scsi")
+self.vm.add_device("{},id=virtio-scsi".format(
+iotests.get_virtio_scsi_device()))
 self.vm.launch()
 
 def tearDown(self):
diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139
index 933b45121a9..2176ea51ba8 100755
--- a/tests/qemu-iotests/139
+++ b/tests/qemu-iotests/139
@@ -35,11 +35,8 @@ class TestBlockdevDel(iotests.QMPTestCase):
 def setUp(self):
 iotests.qemu_img('create', '-f', iotests.imgfmt, base_img, '1M')
 self.vm = iotests.VM()
-if iotests.qemu_default_machine == 's390-ccw-virtio':
-self.vm.add_device("virtio-scsi-ccw,id=virtio-scsi")
-else:
-self.vm.add_device("virtio-scsi-pci,id=virtio-scsi")
-
+self.vm.add_device("{},id=virtio-scsi".format(
+iotests.get_virtio_scsi_device()))
 self.vm.launch()
 
 def tearDown(self):
diff --git a/tests/qemu-iotests/238 b/tests/qemu-iotests/238
index 08bc7e6b4be..e5ac2b2ff84 100755
--- a/tests/qemu-iotests/238
+++ b/tests/qemu-iotests/238
@@ -23,10 +23,7 @@ import os
 import iotests
 from iotests import log
 
-if iotests.qemu_default_machine == 's390-ccw-virtio':
-virtio_scsi_device = 'virtio-scsi-ccw'
-else:
-virtio_scsi_device = 'virtio-scsi-pci'
+virtio_scsi_device = iotests.get_virtio_scsi_device()
 
 vm = iotests.VM()
 vm.launch()
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 385dbad16ac..84438e837cb 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -164,6 +164,10 @@ def qemu_io_silent(*args):
  (-exitcode, ' '.join(args)))
 return exitcode
 
+def get_virtio_scsi_device():
+if qemu_default_machine == 's390-ccw-virtio':
+return 'virtio-scsi-ccw'
+return 'virtio-scsi-pci'
 
 class QemuIoInteractive:
 def __init__(self, *args):
-- 
2.21.0




[Qemu-block] [PULL 13/36] iotests: add testing shim for script-style python tests

2019-08-16 Thread John Snow
Because the new-style python tests don't use the iotests.main() test
launcher, we don't turn on the debugger logging for these scripts
when invoked via ./check -d.

Refactor the launcher shim into new and old style shims so that they
share environmental configuration.

Two cleanup notes: debug was not actually used as a global, and there
was no reason to create a class in an inner scope just to achieve
default variables; we can simply create an instance of the runner with
the values we want instead.

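A minimal sketch of a script-style test using the new shim (the test body
and format list are made up; only script_main itself comes from this patch):

    import iotests
    from iotests import log

    def main_test():
        log('hello from a script-style test')

    if __name__ == '__main__':
        iotests.script_main(main_test, supported_fmts=['qcow2'])
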
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-14-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 40 +++
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 91172c39a52..7fc062cdcf4 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -61,7 +61,6 @@ cachemode = os.environ.get('CACHEMODE')
 qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
 
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
-debug = False
 
 luks_default_secret_object = 'secret,id=keysec0,data=' + \
  os.environ.get('IMGKEYSECRET', '')
@@ -858,11 +857,22 @@ def skip_if_unsupported(required_formats=[], 
read_only=False):
 return func_wrapper
 return skip_test_decorator
 
-def main(supported_fmts=[], supported_oses=['linux'], supported_cache_modes=[],
- unsupported_fmts=[]):
-'''Run tests'''
+def execute_unittest(output, verbosity, debug):
+runner = unittest.TextTestRunner(stream=output, descriptions=True,
+ verbosity=verbosity)
+try:
+# unittest.main() will use sys.exit(); so expect a SystemExit
+# exception
+unittest.main(testRunner=runner)
+finally:
+if not debug:
+sys.stderr.write(re.sub(r'Ran (\d+) tests? in [\d.]+s',
+r'Ran \1 tests', output.getvalue()))
 
-global debug
+def execute_test(test_function=None,
+ supported_fmts=[], supported_oses=['linux'],
+ supported_cache_modes=[], unsupported_fmts=[]):
+"""Run either unittest or script-style tests."""
 
 # We are using TEST_DIR and QEMU_DEFAULT_MACHINE as proxies to
 # indicate that we're not being run via "check". There may be
@@ -894,13 +904,15 @@ def main(supported_fmts=[], supported_oses=['linux'], 
supported_cache_modes=[],
 
 logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
 
-class MyTestRunner(unittest.TextTestRunner):
-def __init__(self, stream=output, descriptions=True, 
verbosity=verbosity):
-unittest.TextTestRunner.__init__(self, stream, descriptions, 
verbosity)
+if not test_function:
+execute_unittest(output, verbosity, debug)
+else:
+test_function()
 
-# unittest.main() will use sys.exit() so expect a SystemExit exception
-try:
-unittest.main(testRunner=MyTestRunner)
-finally:
-if not debug:
-sys.stderr.write(re.sub(r'Ran (\d+) tests? in [\d.]+s', r'Ran \1 
tests', output.getvalue()))
+def script_main(test_function, *args, **kwargs):
+"""Run script-style tests outside of the unittest framework"""
+execute_test(test_function, *args, **kwargs)
+
+def main(*args, **kwargs):
+"""Run tests using the unittest framework"""
+execute_test(None, *args, **kwargs)
-- 
2.21.0




[Qemu-block] [PULL 11/36] block/backup: upgrade copy_bitmap to BdrvDirtyBitmap

2019-08-16 Thread John Snow
This simplifies some interface matters; namely the initialization and
(later) the merging of the manifest back into the sync_bitmap, if one
was provided.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-12-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c | 84 ++
 1 file changed, 44 insertions(+), 40 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d07b838930f..474f8eeae29 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -38,7 +38,10 @@ typedef struct CowRequest {
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockBackend *target;
+
 BdrvDirtyBitmap *sync_bitmap;
+BdrvDirtyBitmap *copy_bitmap;
+
 MirrorSyncMode sync_mode;
 BitmapSyncMode bitmap_mode;
 BlockdevOnError on_source_error;
@@ -51,7 +54,6 @@ typedef struct BackupBlockJob {
 NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 
-HBitmap *copy_bitmap;
 bool use_copy_range;
 int64_t copy_range_size;
 
@@ -113,7 +115,7 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-hbitmap_reset(job->copy_bitmap, start, job->cluster_size);
+bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
 nbytes = MIN(job->cluster_size, job->len - start);
 if (!*bounce_buffer) {
 *bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -146,7 +148,7 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 
 return nbytes;
 fail:
-hbitmap_set(job->copy_bitmap, start, job->cluster_size);
+bdrv_set_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
 return ret;
 
 }
@@ -169,12 +171,14 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 nbytes = MIN(job->copy_range_size, end - start);
 nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
-hbitmap_reset(job->copy_bitmap, start, job->cluster_size * nr_clusters);
+bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
+job->cluster_size * nr_clusters);
 ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
 read_flags, write_flags);
 if (ret < 0) {
 trace_backup_do_cow_copy_range_fail(job, start, ret);
-hbitmap_set(job->copy_bitmap, start, job->cluster_size * nr_clusters);
+bdrv_set_dirty_bitmap(job->copy_bitmap, start,
+  job->cluster_size * nr_clusters);
 return ret;
 }
 
@@ -204,13 +208,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 while (start < end) {
 int64_t dirty_end;
 
-if (!hbitmap_get(job->copy_bitmap, start)) {
+if (!bdrv_dirty_bitmap_get(job->copy_bitmap, start)) {
 trace_backup_do_cow_skip(job, start);
 start += job->cluster_size;
 continue; /* already copied */
 }
 
-dirty_end = hbitmap_next_zero(job->copy_bitmap, start, (end - start));
+dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
+(end - start));
 if (dirty_end < 0) {
 dirty_end = end;
 }
@@ -307,14 +312,16 @@ static void backup_abort(Job *job)
 static void backup_clean(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
+BlockDriverState *bs = blk_bs(s->common.blk);
+
+if (s->copy_bitmap) {
+bdrv_release_dirty_bitmap(bs, s->copy_bitmap);
+s->copy_bitmap = NULL;
+}
+
 assert(s->target);
 blk_unref(s->target);
 s->target = NULL;
-
-if (s->copy_bitmap) {
-hbitmap_free(s->copy_bitmap);
-s->copy_bitmap = NULL;
-}
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -329,7 +336,7 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 return;
 }
 
-hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
+bdrv_set_dirty_bitmap(backup_job->copy_bitmap, 0, backup_job->len);
 }
 
 static void backup_drain(BlockJob *job)
@@ -398,59 +405,52 @@ static bool bdrv_is_unallocated_range(BlockDriverState 
*bs,
 
 static int coroutine_fn backup_loop(BackupBlockJob *job)
 {
-int ret;
 bool error_is_read;
 int64_t offset;
-HBitmapIter hbi;
+BdrvDirtyBitmapIter *bdbi;
 BlockDriverState *bs = blk_bs(job->common.blk);
+int ret = 0;
 
-hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
-while ((offset = hbitmap_iter_next(&hbi)) != -1) {
+bdbi = bdrv_dirty_iter_new(job->copy_bitmap);
+while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
 if (job->sync_mode == MIRROR_SYNC_MODE_TOP &&
 bdrv_is_unallocated_range(bs, offset, job->cluster_size))
   

[Qemu-block] [PULL 05/36] block/backup: Add mirror sync mode 'bitmap'

2019-08-16 Thread John Snow
We don't need or want a new sync mode for simple differences in
semantics.  Create a new mode simply named "BITMAP" that is designed to
make use of the new Bitmap Sync Mode field.

Because the only bitmap sync mode is 'on-success', this adds no new
functionality to the backup job (yet). The old incremental backup mode
is maintained as a syntactic sugar for sync=bitmap, mode=on-success.

Add all of the plumbing necessary to support this new instruction.

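As a hedged QMP illustration (device, target and job names are invented),
the old sugar and its desugared form should now behave identically:

    # old sugared spelling
    vm.qmp_log('drive-backup', job_id='b0', device='drive0',
               target='backup.qcow2', sync='incremental', bitmap='bitmap0')
    # what it desugars to with this series
    vm.qmp_log('drive-backup', job_id='b1', device='drive0',
               target='backup.qcow2', sync='bitmap', bitmap='bitmap0',
               bitmap_mode='on-success')
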
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-6-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c| 20 
 block/mirror.c|  6 --
 block/replication.c   |  2 +-
 blockdev.c| 25 +++--
 include/block/block_int.h |  4 +++-
 qapi/block-core.json  | 21 +++--
 6 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4743c8f0bc5..2b4c5c23e4e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -38,9 +38,9 @@ typedef struct CowRequest {
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockBackend *target;
-/* bitmap for sync=incremental */
 BdrvDirtyBitmap *sync_bitmap;
 MirrorSyncMode sync_mode;
+BitmapSyncMode bitmap_mode;
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
 CoRwlock flush_rwlock;
@@ -461,7 +461,7 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 
 job_progress_set_remaining(job, s->len);
 
-if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
 backup_incremental_init_copy_bitmap(s);
 } else {
 hbitmap_set(s->copy_bitmap, 0, s->len);
@@ -545,6 +545,7 @@ static int64_t 
backup_calculate_cluster_size(BlockDriverState *target,
 BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
+  BitmapSyncMode bitmap_mode,
   bool compress,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
@@ -592,10 +593,13 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 return NULL;
 }
 
-if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+/* QMP interface should have handled translating this to bitmap mode */
+assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
+
+if (sync_mode == MIRROR_SYNC_MODE_BITMAP) {
 if (!sync_bitmap) {
 error_setg(errp, "must provide a valid bitmap name for "
- "\"incremental\" sync mode");
+   "'%s' sync mode", MirrorSyncMode_str(sync_mode));
 return NULL;
 }
 
@@ -605,8 +609,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 } else if (sync_bitmap) {
 error_setg(errp,
-   "a sync_bitmap was provided to backup_run, "
-   "but received an incompatible sync_mode (%s)",
+   "a bitmap was given to backup_job_create, "
+   "but it received an incompatible sync_mode (%s)",
MirrorSyncMode_str(sync_mode));
 return NULL;
 }
@@ -649,8 +653,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->on_source_error = on_source_error;
 job->on_target_error = on_target_error;
 job->sync_mode = sync_mode;
-job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
-   sync_bitmap : NULL;
+job->sync_bitmap = sync_bitmap;
+job->bitmap_mode = bitmap_mode;
 job->compress = compress;
 
 /* Detect image-fleecing (and similar) schemes */
diff --git a/block/mirror.c b/block/mirror.c
index 9b36391bb97..70f24d9ef63 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1755,8 +1755,10 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
 bool is_none_mode;
 BlockDriverState *base;
 
-if (mode == MIRROR_SYNC_MODE_INCREMENTAL) {
-error_setg(errp, "Sync mode 'incremental' not supported");
+if ((mode == MIRROR_SYNC_MODE_INCREMENTAL) ||
+(mode == MIRROR_SYNC_MODE_BITMAP)) {
+error_setg(errp, "Sync mode '%s' not supported",
+   MirrorSyncMode_str(mode));
 return;
 }
 is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
diff --git a/block/replication.c b/block/replication.c
index 23b2993d747..936b2f8b5a4 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -543,7 +543,7 @@ static void replication_start(ReplicationState *rs, 
ReplicationMode mode,
 
 s->backup_job = backup_job_create(
 NULL, s->secondary_disk->bs, 
s->hidden_disk->bs,
-0, MIRROR_SYNC_MODE_NONE, NULL, false,
+

[Qemu-block] [PULL 15/36] iotests: teach FilePath to produce multiple paths

2019-08-16 Thread John Snow
Use "FilePaths" instead of "FilePath" to request multiple files be
cleaned up after we leave that object's scope.

This is not crucial, but it saves a little typing.

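A hedged sketch of the intended use (image names are illustrative):

    with iotests.FilePaths(['base.img', 'top.img']) as (base_path, top_path):
        iotests.qemu_img('create', '-f', iotests.imgfmt, base_path, '1M')
        iotests.qemu_img('create', '-f', iotests.imgfmt, top_path, '1M')
        # ... use both images ...
    # both files have been removed here
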
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-16-js...@redhat.com
Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 81ae7b911ac..385dbad16ac 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -358,31 +358,45 @@ class Timeout:
 def timeout(self, signum, frame):
 raise Exception(self.errmsg)
 
+def file_pattern(name):
+return "{0}-{1}".format(os.getpid(), name)
 
-class FilePath(object):
-'''An auto-generated filename that cleans itself up.
+class FilePaths(object):
+"""
+FilePaths is an auto-generated filename that cleans itself up.
 
 Use this context manager to generate filenames and ensure that the file
 gets deleted::
 
-with TestFilePath('test.img') as img_path:
+with FilePaths(['test.img']) as img_path:
 qemu_img('create', img_path, '1G')
 # migration_sock_path is automatically deleted
-'''
-def __init__(self, name):
-filename = '{0}-{1}'.format(os.getpid(), name)
-self.path = os.path.join(test_dir, filename)
+"""
+def __init__(self, names):
+self.paths = []
+for name in names:
+self.paths.append(os.path.join(test_dir, file_pattern(name)))
 
 def __enter__(self):
-return self.path
+return self.paths
 
 def __exit__(self, exc_type, exc_val, exc_tb):
 try:
-os.remove(self.path)
+for path in self.paths:
+os.remove(path)
 except OSError:
 pass
 return False
 
+class FilePath(FilePaths):
+"""
+FilePath is a specialization of FilePaths that takes a single filename.
+"""
+def __init__(self, name):
+super(FilePath, self).__init__([name])
+
+def __enter__(self):
+return self.paths[0]
 
 def file_path_remover():
 for path in reversed(file_path_remover.paths):
@@ -407,7 +421,7 @@ def file_path(*names):
 
 paths = []
 for name in names:
-filename = '{0}-{1}'.format(os.getpid(), name)
+filename = file_pattern(name)
 path = os.path.join(test_dir, filename)
 file_path_remover.paths.append(path)
 paths.append(path)
-- 
2.21.0




[Qemu-block] [PULL 18/36] block/backup: loosen restriction on readonly bitmaps

2019-08-16 Thread John Snow
With the "never" sync policy, we actually can utilize readonly bitmaps
now. Loosen the check at the QMP level, and tighten it based on
provided arguments down at the job creation level instead.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-19-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c | 6 ++
 blockdev.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index 2be570c0bfd..f8309be01b3 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -617,6 +617,12 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 return NULL;
 }
 
+/* If we need to write to this bitmap, check that we can: */
+if (bitmap_mode != BITMAP_SYNC_MODE_NEVER &&
+bdrv_dirty_bitmap_check(sync_bitmap, BDRV_BITMAP_DEFAULT, errp)) {
+return NULL;
+}
+
 /* Create a new bitmap, and freeze/disable this one. */
 if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
 return NULL;
diff --git a/blockdev.c b/blockdev.c
index 985b6cd75c0..a44ab1f709e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3491,7 +3491,7 @@ static BlockJob *do_backup_common(BackupCommon *backup,
"when providing a bitmap");
 return NULL;
 }
-if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_DEFAULT, errp)) {
+if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_ALLOW_RO, errp)) {
 return NULL;
 }
 }
-- 
2.21.0




[Qemu-block] [PULL 12/36] block/backup: add 'always' bitmap sync policy

2019-08-16 Thread John Snow
This adds an "always" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, the bitmap is *always* synchronized. This means
that for backups that fail part-way through, the bitmap retains a record of
which sectors need to be copied out to accomplish a new backup using the
old, partial result.

In effect, this allows us to "resume" a failed backup; however, the new backup
will be from the new point in time, so it isn't a "resume" as much as it is
an "incremental retry." This can be useful for extremely large backups that
fail a considerable way through the operation, where we would like not to
waste the work that was already performed.

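A rough sketch of the retry flow this enables (all names invented, error
handling elided):

    # First attempt: even if this job fails, bitmap-mode=always rewrites
    # bitmap0 so it describes exactly what still needs copying.
    vm.qmp_log('drive-backup', job_id='b0', device='drive0',
               target='inc.0.qcow2', sync='bitmap', bitmap='bitmap0',
               bitmap_mode='always')
    # ... suppose the job fails part-way through ...
    # Retry from the new point in time; the surviving bits (plus any new
    # writes) are all that get copied.
    vm.qmp_log('drive-backup', job_id='b1', device='drive0',
               target='inc.1.qcow2', sync='bitmap', bitmap='bitmap0',
               bitmap_mode='always')
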
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-13-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c   | 27 +++
 qapi/block-core.json |  5 -
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 474f8eeae29..2be570c0bfd 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -278,18 +278,29 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob 
*job, int ret)
 {
 BdrvDirtyBitmap *bm;
 BlockDriverState *bs = blk_bs(job->common.blk);
+bool sync = (((ret == 0) || (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS)) 
\
+ && (job->bitmap_mode != BITMAP_SYNC_MODE_NEVER));
 
-if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
+if (sync) {
 /*
- * Failure, or we don't want to synchronize the bitmap.
- * Merge the successor back into the parent, delete nothing.
+ * We succeeded, or we always intended to sync the bitmap.
+ * Delete this bitmap and install the child.
  */
-bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
-assert(bm);
-} else {
-/* Everything is fine, delete this bitmap and install the backup. */
 bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
-assert(bm);
+} else {
+/*
+ * We failed, or we never intended to sync the bitmap anyway.
+ * Merge the successor back into the parent, keeping all data.
+ */
+bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
+}
+
+assert(bm);
+
+if (ret < 0 && job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
+/* If we failed and synced, merge in the bits we didn't copy: */
+bdrv_dirty_bitmap_merge_internal(bm, job->copy_bitmap,
+ NULL, true);
 }
 }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 06e34488a30..8344fbe2030 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1149,10 +1149,13 @@
 # @never: The bitmap is never synchronized with the operation, and is
 # treated solely as a read-only manifest of blocks to copy.
 #
+# @always: The bitmap is always synchronized with the operation,
+#  regardless of whether or not the operation was successful.
+#
 # Since: 4.2
 ##
 { 'enum': 'BitmapSyncMode',
-  'data': ['on-success', 'never'] }
+  'data': ['on-success', 'never', 'always'] }
 
 ##
 # @MirrorCopyMode:
-- 
2.21.0




[Qemu-block] [PULL 10/36] block/dirty-bitmap: add bdrv_dirty_bitmap_get

2019-08-16 Thread John Snow
Add a public interface for get. While we're at it,
rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked".

(There are more functions to rename to the bdrv_dirty_bitmap_VERB form,
but they will wait until the conclusion of this series.)

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-11-js...@redhat.com
Signed-off-by: John Snow 
---
 block/dirty-bitmap.c | 19 ---
 block/mirror.c   |  2 +-
 include/block/dirty-bitmap.h |  4 ++--
 migration/block.c|  5 ++---
 nbd/server.c |  2 +-
 5 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 7881fea684b..75a5daf116f 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -509,14 +509,19 @@ BlockDirtyInfoList 
*bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 }
 
 /* Called within bdrv_dirty_bitmap_lock..unlock */
-bool bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t offset)
+bool bdrv_dirty_bitmap_get_locked(BdrvDirtyBitmap *bitmap, int64_t offset)
 {
-if (bitmap) {
-return hbitmap_get(bitmap->bitmap, offset);
-} else {
-return false;
-}
+return hbitmap_get(bitmap->bitmap, offset);
+}
+
+bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset)
+{
+bool ret;
+bdrv_dirty_bitmap_lock(bitmap);
+ret = bdrv_dirty_bitmap_get_locked(bitmap, offset);
+bdrv_dirty_bitmap_unlock(bitmap);
+
+return ret;
 }
 
 /**
diff --git a/block/mirror.c b/block/mirror.c
index 70f24d9ef63..2b870683f14 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -476,7 +476,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 int64_t next_offset = offset + nb_chunks * s->granularity;
 int64_t next_chunk = next_offset / s->granularity;
 if (next_offset >= s->bdev_length ||
-!bdrv_get_dirty_locked(source, s->dirty_bitmap, next_offset)) {
+!bdrv_dirty_bitmap_get_locked(s->dirty_bitmap, next_offset)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 62682eb865f..0120ef3f05a 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -84,12 +84,12 @@ void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, 
bool busy);
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_set_migration(BdrvDirtyBitmap *bitmap, bool migration);
+bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
 
 /* Functions that require manual locking.  */
 void bdrv_dirty_bitmap_lock(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_unlock(BdrvDirtyBitmap *bitmap);
-bool bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t offset);
+bool bdrv_dirty_bitmap_get_locked(BdrvDirtyBitmap *bitmap, int64_t offset);
 void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   int64_t offset, int64_t bytes);
 void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
diff --git a/migration/block.c b/migration/block.c
index e81fd7e14fa..aa747b55fa8 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -521,7 +521,6 @@ static int mig_save_device_dirty(QEMUFile *f, 
BlkMigDevState *bmds,
  int is_async)
 {
 BlkMigBlock *blk;
-BlockDriverState *bs = blk_bs(bmds->blk);
 int64_t total_sectors = bmds->total_sectors;
 int64_t sector;
 int nr_sectors;
@@ -536,8 +535,8 @@ static int mig_save_device_dirty(QEMUFile *f, 
BlkMigDevState *bmds,
 blk_mig_unlock();
 }
 bdrv_dirty_bitmap_lock(bmds->dirty_bitmap);
-if (bdrv_get_dirty_locked(bs, bmds->dirty_bitmap,
-  sector * BDRV_SECTOR_SIZE)) {
+if (bdrv_dirty_bitmap_get_locked(bmds->dirty_bitmap,
+ sector * BDRV_SECTOR_SIZE)) {
 if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
 nr_sectors = total_sectors - sector;
 } else {
diff --git a/nbd/server.c b/nbd/server.c
index 3eacb898757..f55ccf8edfd 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2004,7 +2004,7 @@ static unsigned int bitmap_to_extents(BdrvDirtyBitmap 
*bitmap, uint64_t offset,
 bdrv_dirty_bitmap_lock(bitmap);
 
 it = bdrv_dirty_iter_new(bitmap);
-dirty = bdrv_get_dirty_locked(NULL, bitmap, offset);
+dirty = bdrv_dirty_bitmap_get_locked(bitmap, offset);
 
 assert(begin < overall_end && nb_extents);
 while (begin < overall_end && i < nb_extents) {
-- 
2.21.0




[Qemu-block] [PULL 09/36] block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal

2019-08-16 Thread John Snow
I'm surprised it didn't come up sooner, but sometimes we have a +busy
bitmap as a source. This is dangerous from the QMP API, but if we are
the owner that marked the bitmap busy, it is safe to merge it, using it as
a read-only source.

It is not safe in the general case to allow users to read from in-use
bitmaps, so create an internal variant that foregoes the safety
checking.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-10-js...@redhat.com
Signed-off-by: John Snow 
---
 block/dirty-bitmap.c  | 54 +++
 include/block/block_int.h |  3 +++
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 95a9c2a5d8a..7881fea684b 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -810,6 +810,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap 
*bitmap,
 return hbitmap_next_dirty_area(bitmap->bitmap, offset, bytes);
 }
 
+/**
+ * bdrv_merge_dirty_bitmap: merge src into dest.
+ * Ensures permissions on bitmaps are reasonable; use for public API.
+ *
+ * @backup: If provided, make a copy of dest here prior to merge.
+ */
 void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp)
 {
@@ -833,6 +839,42 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const 
BdrvDirtyBitmap *src,
 goto out;
 }
 
+ret = bdrv_dirty_bitmap_merge_internal(dest, src, backup, false);
+assert(ret);
+
+out:
+qemu_mutex_unlock(dest->mutex);
+if (src->mutex != dest->mutex) {
+qemu_mutex_unlock(src->mutex);
+}
+}
+
+/**
+ * bdrv_dirty_bitmap_merge_internal: merge src into dest.
+ * Does NOT check bitmap permissions; not suitable for use as public API.
+ *
+ * @backup: If provided, make a copy of dest here prior to merge.
+ * @lock: If true, lock and unlock bitmaps on the way in/out.
+ * returns true if the merge succeeded; false if unattempted.
+ */
+bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
+  const BdrvDirtyBitmap *src,
+  HBitmap **backup,
+  bool lock)
+{
+bool ret;
+
+assert(!bdrv_dirty_bitmap_readonly(dest));
+assert(!bdrv_dirty_bitmap_inconsistent(dest));
+assert(!bdrv_dirty_bitmap_inconsistent(src));
+
+if (lock) {
+qemu_mutex_lock(dest->mutex);
+if (src->mutex != dest->mutex) {
+qemu_mutex_lock(src->mutex);
+}
+}
+
 if (backup) {
 *backup = dest->bitmap;
 dest->bitmap = hbitmap_alloc(dest->size, hbitmap_granularity(*backup));
@@ -840,11 +882,13 @@ void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const 
BdrvDirtyBitmap *src,
 } else {
 ret = hbitmap_merge(dest->bitmap, src->bitmap, dest->bitmap);
 }
-assert(ret);
 
-out:
-qemu_mutex_unlock(dest->mutex);
-if (src->mutex != dest->mutex) {
-qemu_mutex_unlock(src->mutex);
+if (lock) {
+qemu_mutex_unlock(dest->mutex);
+if (src->mutex != dest->mutex) {
+qemu_mutex_unlock(src->mutex);
+}
 }
+
+return ret;
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 80953ac8aeb..aa697f1f694 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1253,6 +1253,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, 
int64_t bytes);
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
 void bdrv_restore_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *backup);
+bool bdrv_dirty_bitmap_merge_internal(BdrvDirtyBitmap *dest,
+  const BdrvDirtyBitmap *src,
+  HBitmap **backup, bool lock);
 
 void bdrv_inc_in_flight(BlockDriverState *bs);
 void bdrv_dec_in_flight(BlockDriverState *bs);
-- 
2.21.0




[Qemu-block] [PULL 04/36] qapi: add BitmapSyncMode enum

2019-08-16 Thread John Snow
Depending on what a user is trying to accomplish, there are a few bitmap
cleanup actions that could usefully occur when an operation finishes.

I am proposing three:
- NEVER: The bitmap is never synchronized against what was copied.
- ALWAYS: The bitmap is always synchronized, even on failures.
- ON-SUCCESS: The bitmap is synchronized only on success.

The existing incremental backup modes use 'on-success' semantics,
so add just that one for right now.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Reviewed-by: Markus Armbruster 
Message-id: 20190709232550.10724-5-js...@redhat.com
Signed-off-by: John Snow 
---
 qapi/block-core.json | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8ca12004ae9..06eb3bb3d78 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1134,6 +1134,20 @@
 { 'enum': 'MirrorSyncMode',
   'data': ['top', 'full', 'none', 'incremental'] }
 
+##
+# @BitmapSyncMode:
+#
+# An enumeration of possible behaviors for the synchronization of a bitmap
+# when used for data copy operations.
+#
+# @on-success: The bitmap is only synced when the operation is successful.
+#  This is the behavior always used for 'INCREMENTAL' backups.
+#
+# Since: 4.2
+##
+{ 'enum': 'BitmapSyncMode',
+  'data': ['on-success'] }
+
 ##
 # @MirrorCopyMode:
 #
-- 
2.21.0




[Qemu-block] [PULL 08/36] hbitmap: enable merging across granularities

2019-08-16 Thread John Snow
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-9-js...@redhat.com
Signed-off-by: John Snow 
---
 util/hbitmap.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/util/hbitmap.c b/util/hbitmap.c
index 83927f3c08a..fd44c897ab0 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -781,7 +781,27 @@ void hbitmap_truncate(HBitmap *hb, uint64_t size)
 
 bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b)
 {
-return (a->size == b->size) && (a->granularity == b->granularity);
+return (a->orig_size == b->orig_size);
+}
+
+/**
+ * hbitmap_sparse_merge: performs dst = dst | src
+ * works with differing granularities.
+ * best used when src is sparsely populated.
+ */
+static void hbitmap_sparse_merge(HBitmap *dst, const HBitmap *src)
+{
+uint64_t offset = 0;
+uint64_t count = src->orig_size;
+
+while (hbitmap_next_dirty_area(src, &offset, &count)) {
+hbitmap_set(dst, offset, count);
+offset += count;
+if (offset >= src->orig_size) {
+break;
+}
+count = src->orig_size - offset;
+}
 }
 
 /**
@@ -812,10 +832,24 @@ bool hbitmap_merge(const HBitmap *a, const HBitmap *b, 
HBitmap *result)
 return true;
 }
 
+if (a->granularity != b->granularity) {
+if ((a != result) && (b != result)) {
+hbitmap_reset_all(result);
+}
+if (a != result) {
+hbitmap_sparse_merge(result, a);
+}
+if (b != result) {
+hbitmap_sparse_merge(result, b);
+}
+return true;
+}
+
 /* This merge is O(size), as BITS_PER_LONG and HBITMAP_LEVELS are constant.
  * It may be possible to improve running times for sparsely populated maps
  * by using hbitmap_iter_next, but this is suboptimal for dense maps.
  */
+assert(a->size == b->size);
 for (i = HBITMAP_LEVELS - 1; i >= 0; i--) {
 for (j = 0; j < a->sizes[i]; j++) {
 result->levels[i][j] = a->levels[i][j] | b->levels[i][j];
-- 
2.21.0




[Qemu-block] [PULL 06/36] block/backup: add 'never' policy to bitmap sync mode

2019-08-16 Thread John Snow
This adds a "never" policy for bitmap synchronization. Regardless of if
the job succeeds or fails, we never update the bitmap. This can be used
to perform differential backups, or simply to avoid the job modifying a
bitmap.

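A hedged one-line illustration of the differential case (names invented):

    # 'never' leaves bitmap0 untouched, so repeated runs each copy everything
    # dirtied since the bitmap was last cleared: a differential backup.
    vm.qmp_log('drive-backup', device='drive0', target='diff.qcow2',
               sync='bitmap', bitmap='bitmap0', bitmap_mode='never')
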
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-7-js...@redhat.com
Signed-off-by: John Snow 
---
 block/backup.c   | 7 +--
 qapi/block-core.json | 5 -
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 2b4c5c23e4e..d07b838930f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -274,8 +274,11 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob 
*job, int ret)
 BdrvDirtyBitmap *bm;
 BlockDriverState *bs = blk_bs(job->common.blk);
 
-if (ret < 0) {
-/* Merge the successor back into the parent, delete nothing. */
+if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
+/*
+ * Failure, or we don't want to synchronize the bitmap.
+ * Merge the successor back into the parent, delete nothing.
+ */
 bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
 assert(bm);
 } else {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index dd926f78285..06e34488a30 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1146,10 +1146,13 @@
 # @on-success: The bitmap is only synced when the operation is successful.
 #  This is the behavior always used for 'INCREMENTAL' backups.
 #
+# @never: The bitmap is never synchronized with the operation, and is
+# treated solely as a read-only manifest of blocks to copy.
+#
 # Since: 4.2
 ##
 { 'enum': 'BitmapSyncMode',
-  'data': ['on-success'] }
+  'data': ['on-success', 'never'] }
 
 ##
 # @MirrorCopyMode:
-- 
2.21.0




[Qemu-block] [PULL 01/36] qapi/block-core: Introduce BackupCommon

2019-08-16 Thread John Snow
drive-backup and blockdev-backup have an awful lot of fields in common.
Let's deduplicate them.

I don't deduplicate 'target', because the semantics actually did change
between each structure. Leave that one alone so it can be documented
separately.

Where documentation was not identical, use the most up-to-date version.
For "speed", use Blockdev-Backup's version. For "sync", use
Drive-Backup's version.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
[Maintainer edit: modified commit message. --js]
Reviewed-by: Markus Armbruster 
Message-id: 20190709232550.10724-2-js...@redhat.com
Signed-off-by: John Snow 
---
 qapi/block-core.json | 103 ++-
 1 file changed, 33 insertions(+), 70 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f1e7701fbea..8ca12004ae9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1315,32 +1315,23 @@
   'data': { 'node': 'str', 'overlay': 'str' } }
 
 ##
-# @DriveBackup:
+# @BackupCommon:
 #
 # @job-id: identifier for the newly-created block job. If
 #  omitted, the device name will be used. (Since 2.7)
 #
 # @device: the device name or node-name of a root node which should be copied.
 #
-# @target: the target of the new image. If the file exists, or if it
-#  is a device, the existing file/device will be used as the new
-#  destination.  If it does not exist, a new file will be created.
-#
-# @format: the format of the new destination, default is to
-#  probe if @mode is 'existing', else the format of the source
-#
 # @sync: what parts of the disk image should be copied to the destination
 #(all the disk, only the sectors allocated in the topmost image, from a
 #dirty bitmap, or only new I/O).
 #
-# @mode: whether and how QEMU should create a new image, default is
-#'absolute-paths'.
-#
-# @speed: the maximum speed, in bytes per second
+# @speed: the maximum speed, in bytes per second. The default is 0,
+# for unlimited.
 #
 # @bitmap: the name of dirty bitmap if sync is "incremental".
 #  Must be present if sync is "incremental", must NOT be present
-#  otherwise. (Since 2.4)
+#  otherwise. (Since 2.4 (drive-backup), 3.1 (blockdev-backup))
 #
 # @compress: true to compress data, if the target format supports it.
 #(default: false) (since 2.8)
@@ -1370,75 +1361,47 @@
 # I/O.  If an error occurs during a guest write request, the device's
 # rerror/werror actions will be used.
 #
+# Since: 4.2
+##
+{ 'struct': 'BackupCommon',
+  'data': { '*job-id': 'str', 'device': 'str',
+'sync': 'MirrorSyncMode', '*speed': 'int',
+'*bitmap': 'str', '*compress': 'bool',
+'*on-source-error': 'BlockdevOnError',
+'*on-target-error': 'BlockdevOnError',
+'*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
+
+##
+# @DriveBackup:
+#
+# @target: the target of the new image. If the file exists, or if it
+#  is a device, the existing file/device will be used as the new
+#  destination.  If it does not exist, a new file will be created.
+#
+# @format: the format of the new destination, default is to
+#  probe if @mode is 'existing', else the format of the source
+#
+# @mode: whether and how QEMU should create a new image, default is
+#'absolute-paths'.
+#
 # Since: 1.6
 ##
 { 'struct': 'DriveBackup',
-  'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
-'*format': 'str', 'sync': 'MirrorSyncMode',
-'*mode': 'NewImageMode', '*speed': 'int',
-'*bitmap': 'str', '*compress': 'bool',
-'*on-source-error': 'BlockdevOnError',
-'*on-target-error': 'BlockdevOnError',
-'*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
+  'base': 'BackupCommon',
+  'data': { 'target': 'str',
+'*format': 'str',
+'*mode': 'NewImageMode' } }
 
 ##
 # @BlockdevBackup:
 #
-# @job-id: identifier for the newly-created block job. If
-#  omitted, the device name will be used. (Since 2.7)
-#
-# @device: the device name or node-name of a root node which should be copied.
-#
 # @target: the device name or node-name of the backup target node.
 #
-# @sync: what parts of the disk image should be copied to the destination
-#(all the disk, only the sectors allocated in the topmost image, or
-#only new I/O).
-#
-# @speed: the maximum speed, in bytes per second. The default is 0,
-# for unlimited.
-#
-# @bitmap: the name of dirty bitmap if sync is "incremental".
-#  Must be present if sync is "incremental", must NOT be present
-#  otherwise. (Since 3.1)
-#
-# @compress: true to compress data, if the target format supports it.
-#(default: false) (since 2.8)
-#
-# @on-source-error: the action to take on an error on the source,
-#   default 'report'.  'stop' and 'enospc' can only be used
-# 

[Qemu-block] [PULL 03/36] blockdev-backup: utilize do_backup_common

2019-08-16 Thread John Snow
Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-4-js...@redhat.com
Signed-off-by: John Snow 
---
 blockdev.c | 65 +-
 1 file changed, 6 insertions(+), 59 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d822b19b4b0..8e4f70a8d66 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3626,78 +3626,25 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
JobTxn *txn,
 {
 BlockDriverState *bs;
 BlockDriverState *target_bs;
-Error *local_err = NULL;
-BdrvDirtyBitmap *bmap = NULL;
 AioContext *aio_context;
-BlockJob *job = NULL;
-int job_flags = JOB_DEFAULT;
-int ret;
-
-if (!backup->has_speed) {
-backup->speed = 0;
-}
-if (!backup->has_on_source_error) {
-backup->on_source_error = BLOCKDEV_ON_ERROR_REPORT;
-}
-if (!backup->has_on_target_error) {
-backup->on_target_error = BLOCKDEV_ON_ERROR_REPORT;
-}
-if (!backup->has_job_id) {
-backup->job_id = NULL;
-}
-if (!backup->has_auto_finalize) {
-backup->auto_finalize = true;
-}
-if (!backup->has_auto_dismiss) {
-backup->auto_dismiss = true;
-}
-if (!backup->has_compress) {
-backup->compress = false;
-}
+BlockJob *job;
 
 bs = bdrv_lookup_bs(backup->device, backup->device, errp);
 if (!bs) {
 return NULL;
 }
 
-aio_context = bdrv_get_aio_context(bs);
-aio_context_acquire(aio_context);
-
 target_bs = bdrv_lookup_bs(backup->target, backup->target, errp);
 if (!target_bs) {
-goto out;
+return NULL;
 }
 
-ret = bdrv_try_set_aio_context(target_bs, aio_context, errp);
-if (ret < 0) {
-goto out;
-}
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
 
-if (backup->has_bitmap) {
-bmap = bdrv_find_dirty_bitmap(bs, backup->bitmap);
-if (!bmap) {
-error_setg(errp, "Bitmap '%s' could not be found", backup->bitmap);
-goto out;
-}
-if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_DEFAULT, errp)) {
-goto out;
-}
-}
+job = do_backup_common(qapi_BlockdevBackup_base(backup),
+   bs, target_bs, aio_context, txn, errp);
 
-if (!backup->auto_finalize) {
-job_flags |= JOB_MANUAL_FINALIZE;
-}
-if (!backup->auto_dismiss) {
-job_flags |= JOB_MANUAL_DISMISS;
-}
-job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
-backup->sync, bmap, backup->compress,
-backup->on_source_error, backup->on_target_error,
-job_flags, NULL, NULL, txn, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-}
-out:
 aio_context_release(aio_context);
 return job;
 }
-- 
2.21.0




[Qemu-block] [PULL 07/36] hbitmap: Fix merge when b is empty, and result is not an alias of a

2019-08-16 Thread John Snow
Nobody calls the function like this currently, but we neither prohibit
nor cope with this behavior. I decided to make the function cope with it.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-8-js...@redhat.com
Signed-off-by: John Snow 
---
 util/hbitmap.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/util/hbitmap.c b/util/hbitmap.c
index bcc0acdc6a0..83927f3c08a 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -785,8 +785,9 @@ bool hbitmap_can_merge(const HBitmap *a, const HBitmap *b)
 }
 
 /**
- * Given HBitmaps A and B, let A := A (BITOR) B.
- * Bitmap B will not be modified.
+ * Given HBitmaps A and B, let R := A (BITOR) B.
+ * Bitmaps A and B will not be modified,
+ * except when bitmap R is an alias of A or B.
  *
  * @return true if the merge was successful,
  * false if it was not attempted.
@@ -801,7 +802,13 @@ bool hbitmap_merge(const HBitmap *a, const HBitmap *b, 
HBitmap *result)
 }
 assert(hbitmap_can_merge(b, result));
 
-if (hbitmap_count(b) == 0) {
+if ((!hbitmap_count(a) && result == b) ||
+(!hbitmap_count(b) && result == a)) {
+return true;
+}
+
+if (!hbitmap_count(a) && !hbitmap_count(b)) {
+hbitmap_reset_all(result);
 return true;
 }
 
-- 
2.21.0




[Qemu-block] [PULL 00/36] Bitmaps patches

2019-08-16 Thread John Snow
The following changes since commit afd760539308a5524accf964107cdb1d54a059e3:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190816' 
into staging (2019-08-16 17:21:40 +0100)

are available in the Git repository at:

  https://github.com/jnsnow/qemu.git tags/bitmaps-pull-request

for you to fetch changes up to a5f8a60b3eafd5563af48546d5d126d448e62ac5:

  tests/test-hbitmap: test next_zero and _next_dirty_area after truncate 
(2019-08-16 18:29:43 -0400)


Pull request

Rebase notes:

011/36:[0003] [FC] 'block/backup: upgrade copy_bitmap to BdrvDirtyBitmap'
016/36:[] [-C] 'iotests: Add virtio-scsi device helper'
017/36:[0002] [FC] 'iotests: add test 257 for bitmap-mode backups'
030/36:[0011] [FC] 'block/backup: teach TOP to never copy unallocated regions'
032/36:[0018] [FC] 'iotests/257: test traditional sync modes'

11: A new hbitmap call was added late in 4.1, changed to
bdrv_dirty_bitmap_next_zero.
16: Context-only (self.has_quit is new context in 040)
17: Removed 'auto' to follow upstream trends in iotest fashion
30: Handled explicitly on-list with R-B from Max.
32: Fix capitalization in test, as mentioned on-list.



John Snow (30):
  qapi/block-core: Introduce BackupCommon
  drive-backup: create do_backup_common
  blockdev-backup: utilize do_backup_common
  qapi: add BitmapSyncMode enum
  block/backup: Add mirror sync mode 'bitmap'
  block/backup: add 'never' policy to bitmap sync mode
  hbitmap: Fix merge when b is empty, and result is not an alias of a
  hbitmap: enable merging across granularities
  block/dirty-bitmap: add bdrv_dirty_bitmap_merge_internal
  block/dirty-bitmap: add bdrv_dirty_bitmap_get
  block/backup: upgrade copy_bitmap to BdrvDirtyBitmap
  block/backup: add 'always' bitmap sync policy
  iotests: add testing shim for script-style python tests
  iotests: teach run_job to cancel pending jobs
  iotests: teach FilePath to produce multiple paths
  iotests: Add virtio-scsi device helper
  iotests: add test 257 for bitmap-mode backups
  block/backup: loosen restriction on readonly bitmaps
  qapi: implement block-dirty-bitmap-remove transaction action
  iotests/257: add Pattern class
  iotests/257: add EmulatedBitmap class
  iotests/257: Refactor backup helpers
  block/backup: hoist bitmap check into QMP interface
  iotests/257: test API failures
  block/backup: improve sync=bitmap work estimates
  block/backup: centralize copy_bitmap initialization
  block/backup: add backup_is_cluster_allocated
  block/backup: teach TOP to never copy unallocated regions
  block/backup: support bitmap sync modes for non-bitmap backups
  iotests/257: test traditional sync modes

Vladimir Sementsov-Ogievskiy (6):
  blockdev: reduce aio_context locked sections in bitmap add/remove
  iotests: test bitmap moving inside 254
  qapi: add dirty-bitmaps to query-named-block-nodes result
  block/backup: deal with zero detection
  block/backup: refactor write_flags
  tests/test-hbitmap: test next_zero and _next_dirty_area after truncate

 block.c|2 +-
 block/backup.c |  312 +-
 block/dirty-bitmap.c   |   88 +-
 block/mirror.c |8 +-
 block/qapi.c   |5 +
 block/replication.c|2 +-
 block/trace-events |1 +
 blockdev.c |  353 ++-
 include/block/block_int.h  |7 +-
 include/block/dirty-bitmap.h   |6 +-
 migration/block-dirty-bitmap.c |2 +-
 migration/block.c  |5 +-
 nbd/server.c   |2 +-
 qapi/block-core.json   |  146 +-
 qapi/transaction.json  |2 +
 qemu-deprecated.texi   |   12 +
 tests/qemu-iotests/040 |6 +-
 tests/qemu-iotests/093 |6 +-
 tests/qemu-iotests/139 |7 +-
 tests/qemu-iotests/238 |5 +-
 tests/qemu-iotests/254 |   30 +-
 tests/qemu-iotests/254.out |   82 +
 tests/qemu-iotests/256.out |4 +-
 tests/qemu-iotests/257 |  560 
 tests/qemu-iotests/257.out | 5421 
 tests/qemu-iotests/group   |1 +
 tests/qemu-iotests/iotests.py  |  102 +-
 tests/test-hbitmap.c   |   22 +
 util/hbitmap.c |   49 +-
 29 files changed, 6843 insertions(+), 405 deletions(-)
 create mode 100755 tests/qemu-iotests/257
 create mode 100644 tests/qemu-iotests/257.out

-- 
2.21.0




[Qemu-block] [PULL 02/36] drive-backup: create do_backup_common

2019-08-16 Thread John Snow
Create a common core that comprises the actual meat of what the backup API
boundary needs to do, and then switch drive-backup to use it.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
Message-id: 20190709232550.10724-3-js...@redhat.com
Signed-off-by: John Snow 
---
 blockdev.c | 122 +
 1 file changed, 67 insertions(+), 55 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 95cdd5a5cb0..d822b19b4b0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3427,6 +3427,70 @@ out:
 aio_context_release(aio_context);
 }
 
+/* Common QMP interface for drive-backup and blockdev-backup */
+static BlockJob *do_backup_common(BackupCommon *backup,
+  BlockDriverState *bs,
+  BlockDriverState *target_bs,
+  AioContext *aio_context,
+  JobTxn *txn, Error **errp)
+{
+BlockJob *job = NULL;
+BdrvDirtyBitmap *bmap = NULL;
+int job_flags = JOB_DEFAULT;
+int ret;
+
+if (!backup->has_speed) {
+backup->speed = 0;
+}
+if (!backup->has_on_source_error) {
+backup->on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!backup->has_on_target_error) {
+backup->on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!backup->has_job_id) {
+backup->job_id = NULL;
+}
+if (!backup->has_auto_finalize) {
+backup->auto_finalize = true;
+}
+if (!backup->has_auto_dismiss) {
+backup->auto_dismiss = true;
+}
+if (!backup->has_compress) {
+backup->compress = false;
+}
+
+ret = bdrv_try_set_aio_context(target_bs, aio_context, errp);
+if (ret < 0) {
+return NULL;
+}
+
+if (backup->has_bitmap) {
+bmap = bdrv_find_dirty_bitmap(bs, backup->bitmap);
+if (!bmap) {
+error_setg(errp, "Bitmap '%s' could not be found", backup->bitmap);
+return NULL;
+}
+if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_DEFAULT, errp)) {
+return NULL;
+}
+}
+
+if (!backup->auto_finalize) {
+job_flags |= JOB_MANUAL_FINALIZE;
+}
+if (!backup->auto_dismiss) {
+job_flags |= JOB_MANUAL_DISMISS;
+}
+
+job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+backup->sync, bmap, backup->compress,
+backup->on_source_error, backup->on_target_error,
+job_flags, NULL, NULL, txn, errp);
+return job;
+}
+
 static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn,
  Error **errp)
 {
@@ -3434,39 +3498,16 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 BlockDriverState *target_bs;
 BlockDriverState *source = NULL;
 BlockJob *job = NULL;
-BdrvDirtyBitmap *bmap = NULL;
 AioContext *aio_context;
 QDict *options = NULL;
 Error *local_err = NULL;
-int flags, job_flags = JOB_DEFAULT;
+int flags;
 int64_t size;
 bool set_backing_hd = false;
-int ret;
 
-if (!backup->has_speed) {
-backup->speed = 0;
-}
-if (!backup->has_on_source_error) {
-backup->on_source_error = BLOCKDEV_ON_ERROR_REPORT;
-}
-if (!backup->has_on_target_error) {
-backup->on_target_error = BLOCKDEV_ON_ERROR_REPORT;
-}
 if (!backup->has_mode) {
 backup->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
 }
-if (!backup->has_job_id) {
-backup->job_id = NULL;
-}
-if (!backup->has_auto_finalize) {
-backup->auto_finalize = true;
-}
-if (!backup->has_auto_dismiss) {
-backup->auto_dismiss = true;
-}
-if (!backup->has_compress) {
-backup->compress = false;
-}
 
 bs = bdrv_lookup_bs(backup->device, backup->device, errp);
 if (!bs) {
@@ -3543,12 +3584,6 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 goto out;
 }
 
-ret = bdrv_try_set_aio_context(target_bs, aio_context, errp);
-if (ret < 0) {
-bdrv_unref(target_bs);
-goto out;
-}
-
 if (set_backing_hd) {
 bdrv_set_backing_hd(target_bs, source, &local_err);
 if (local_err) {
@@ -3556,31 +3591,8 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 }
 }
 
-if (backup->has_bitmap) {
-bmap = bdrv_find_dirty_bitmap(bs, backup->bitmap);
-if (!bmap) {
-error_setg(errp, "Bitmap '%s' could not be found", backup->bitmap);
-goto unref;
-}
-if (bdrv_dirty_bitmap_check(bmap, BDRV_BITMAP_DEFAULT, errp)) {
-goto unref;
-}
-}
-if (!backup->auto_finalize) {
-job_flags |= JOB_MANUAL_FINALIZE;
-}
-if (!backup->auto_dismiss) {
-job_flags |= JOB_MANUAL_DISMISS;
-}
-
-job = backup_job_create(backup->job_id, bs, target_bs,

Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread John Snow



On 8/16/19 6:45 PM, Nir Soffer wrote:
> On Sat, Aug 17, 2019 at 12:57 AM John Snow wrote:
> 
> On 8/16/19 5:21 PM, Nir Soffer wrote:
> > When creating an image with preallocation "off" or "falloc", the first
> > block of the image is typically not allocated. When using Gluster
> > storage backed by XFS filesystem, reading this block using direct I/O
> > succeeds regardless of request length, fooling alignment detection.
> >
> > In this case we fall back to a safe value (4096) instead of the optimal
> > value (512), which may lead to unneeded data copying when aligning
> > requests.  Allocating the first block avoids the fallback.
> >
> 
> Where does this detection/fallback happen? (Can it be improved?)
> 
> 
> In raw_probe_alignment().
> 
> This patch explains the issues:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html
> 
> Here Kevin and I discussed ways to improve it:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html
> 

Thanks for the reading!
That does help explain this patch better.

> > When using preallocation=off, we always allocate at least one
> filesystem
> > block:
> >
> >     $ ./qemu-img create -f raw test.raw 1g
> >     Formatting 'test.raw', fmt=raw size=1073741824
> >
> >     $ ls -lhs test.raw
> >     4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> >
> > I did quick performance tests for these flows:
> > - Provisioning a VM with a new raw image.
> > - Copying disks with qemu-img convert to new raw target image
> >
> > I installed Fedora 29 server on raw sparse image, measuring the time
> > from clicking "Begin installation" until the "Reboot" button appears:
> >
> > Before(s)  After(s)     Diff(%)
> > ---
> >      356        389        +8.4
> >
> > I ran this only once, so we cannot tell much from these results.
> >
> 
> That seems like a pretty big difference for just having pre-allocated a
> single block. What was the actual command line / block graph for
> that test?
> 
> 
> Having the first block allocated changes the alignment.
> 
> Before this patch, we detect request_alignment=1, so we fall back to 4096.
> Then we detect buf_align=1, so we fall back to the value of request alignment.
> 
> The guest sees a disk with:
> logical_block_size = 512
> physical_block_size = 512
> 
> But qemu uses:
> request_alignment = 4096
> buf_align = 4096
> 
> storage uses:
> logical_block_size = 512
> physical_block_size = 512
> 
> If the guest does direct I/O using 512-byte alignment, qemu has to copy
> the buffers to align them to 4096 bytes.
> 
> After this patch, qemu detects the alignment correctly, so we have:
> 
> guest
> logical_block_size = 512
> physical_block_size = 512
> 
> qemu
> request_alignment = 512
> buf_align = 512
> 
> storage:
> logical_block_size = 512
> physical_block_size = 512
> 
> We expect this to be more efficient because qemu does not have to emulate
> anything.
> 
> Was this over a network that could explain the variance?
> 
> 
> Maybe; this is a complete install of Fedora 29 server, and I'm not sure if the
> installation
> accesses the network.
> 
> > The second test was cloning the installation image with qemu-img
> > convert, doing 10 runs:
> >
> >     for i in $(seq 10); do
> >         rm -f dst.raw
> >         sleep 10
> >         time ./qemu-img convert -f raw -O raw -t none -T none
> src.raw dst.raw
> >     done
> >
> > Here is a table comparing the total time spent:
> >
> > Type    Before(s)   After(s)    Diff(%)
> > ---
> > real      530.028    469.123      -11.4
> > user       17.204     10.768      -37.4
> > sys        17.881      7.011      -60.7
> >
> > Here we see very clear improvement in CPU usage.
> >
> 
> Hard to argue much with that. I feel a little strange trying to force
> the allocation of the first block, but I suppose in practice "almost no
> preallocation" is indistinguishable from "exactly no preallocation" if
> you squint.
> 
> 
> Right.
> 
> The real issue is that filesystems and block devices do not expose the
> alignment
> requirement for direct I/O, so we need to use these hacks and assumptions.
> 
> With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment,
> but this does
> not help for an XFS filesystem used by Gluster on the server side.
> 
> I hope that Niels is working on adding a similar ioctl for Gluster, so it
> can expose the properties
> of the remote filesystem.
> 
> Nir

That sounds quite a bit less hacky, but I agree we still have to do what
we can in the meantime.

(It looks like you've been hashing this out with Kevin for a while, so
I'm going to sheepishly defer to his judgment on this patch. While I
think it's probably a fine trade-

Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread Nir Soffer
On Sat, Aug 17, 2019 at 12:57 AM John Snow  wrote:

> On 8/16/19 5:21 PM, Nir Soffer wrote:
> > When creating an image with preallocation "off" or "falloc", the first
> > block of the image is typically not allocated. When using Gluster
> > storage backed by XFS filesystem, reading this block using direct I/O
> > succeeds regardless of request length, fooling alignment detection.
> >
> > In this case we fall back to a safe value (4096) instead of the optimal
> > value (512), which may lead to unneeded data copying when aligning
> > requests.  Allocating the first block avoids the fallback.
> >
>
> Where does this detection/fallback happen? (Can it be improved?)
>

In raw_probe_alignment().

This patch explains the issues:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00568.html

Here Kevin and I discussed ways to improve it:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00426.html

> When using preallocation=off, we always allocate at least one filesystem
> > block:
> >
> > $ ./qemu-img create -f raw test.raw 1g
> > Formatting 'test.raw', fmt=raw size=1073741824
> >
> > $ ls -lhs test.raw
> > 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> >
> > I did quick performance tests for these flows:
> > - Provisioning a VM with a new raw image.
> > - Copying disks with qemu-img convert to new raw target image
> >
> > I installed Fedora 29 server on raw sparse image, measuring the time
> > from clicking "Begin installation" until the "Reboot" button appears:
> >
> > Before(s)  After(s)     Diff(%)
> > ---
> >      356        389        +8.4
> >
> > I ran this only once, so we cannot tell much from these results.
> >
>
> That seems like a pretty big difference for just having pre-allocated a
> single block. What was the actual command line / block graph for that test?
>

Having the first block allocated changes the alignment.

Before this patch, we detect request_alignment=1, so we fall back to 4096.
Then we detect buf_align=1, so we fall back to the value of request alignment.

The guest sees a disk with:
logical_block_size = 512
physical_block_size = 512

But qemu uses:
request_alignment = 4096
buf_align = 4096

storage uses:
logical_block_size = 512
physical_block_size = 512

If the guest does direct I/O using 512-byte alignment, qemu has to copy
the buffers to align them to 4096 bytes.

After this patch, qemu detects the alignment correctly, so we have:

guest
logical_block_size = 512
physical_block_size = 512

qemu
request_alignment = 512
buf_align = 512

storage:
logical_block_size = 512
physical_block_size = 512

We expect this to be more efficient because qemu does not have to emulate
anything.
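
For illustration only, the guessing that the probe falls back to boils down
to roughly the sketch below. This is a simplified, hypothetical helper, not
the actual file-posix.c code; it assumes fd was opened with O_DIRECT. It
reads at offset 0 with increasing request sizes and takes the first size
that succeeds; if even a 1-byte read "succeeds" (as it does on the
unallocated first block here), the probe has learned nothing and we pick
the safe 4096 instead.

    #include <stdlib.h>
    #include <unistd.h>

    static size_t guess_request_alignment(int fd)
    {
        static const size_t sizes[] = {1, 512, 1024, 2048, 4096};
        const size_t max_align = 4096;
        void *buf = NULL;
        size_t i, result = max_align;

        if (posix_memalign(&buf, max_align, max_align) != 0) {
            return max_align;   /* cannot probe, keep the safe value */
        }

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
            if (pread(fd, buf, sizes[i], 0) >= 0) {
                /* A 1-byte success tells us nothing about alignment. */
                result = (sizes[i] == 1) ? max_align : sizes[i];
                break;
            }
        }

        free(buf);
        return result;
    }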

> Was this over a network that could explain the variance?
>

Maybe; this is a complete install of Fedora 29 server, and I'm not sure if the
installation
accesses the network.

> The second test was cloning the installation image with qemu-img
> > convert, doing 10 runs:
> >
> > for i in $(seq 10); do
> > rm -f dst.raw
> > sleep 10
> > time ./qemu-img convert -f raw -O raw -t none -T none src.raw
> dst.raw
> > done
> >
> > Here is a table comparing the total time spent:
> >
> > Type    Before(s)   After(s)    Diff(%)
> > ---
> > real      530.028    469.123      -11.4
> > user       17.204     10.768      -37.4
> > sys        17.881      7.011      -60.7
> >
> > Here we see very clear improvement in CPU usage.
> >
>
> Hard to argue much with that. I feel a little strange trying to force
> the allocation of the first block, but I suppose in practice "almost no
> preallocation" is indistinguishable from "exactly no preallocation" if
> you squint.
>

Right.

The real issue is that filesystems and block devices do not expose the
alignment
requirement for direct I/O, so we need to use these hacks and assumptions.

With local XFS we use xfsctl(XFS_IOC_DIOINFO) to get request_alignment, but
this does
not help for an XFS filesystem used by Gluster on the server side.
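
For reference, that query looks roughly like the sketch below. It is a
hedged illustration, not the exact QEMU code; it assumes the xfsprogs
headers are available and the helper name is made up:

    #include <xfs/xfs.h>    /* xfsctl(), XFS_IOC_DIOINFO, struct dioattr */

    /*
     * Ask a local XFS file descriptor for its direct I/O constraints.
     * Returns the minimum direct I/O request size, or 0 if the query
     * failed (for example, the file does not live on XFS).
     */
    static unsigned int xfs_dio_min_size(int fd)
    {
        struct dioattr da;

        if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) < 0) {
            return 0;
        }
        return da.d_miniosz;    /* smallest allowed direct I/O request */
    }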

I hope that Niels is working on adding a similar ioctl for Gluster, so it can
expose the properties
of the remote filesystem.

Nir


Re: [Qemu-block] bitmaps branch conflict resolution

2019-08-16 Thread Max Reitz
On 17.08.19 00:07, John Snow wrote:
> Hi Max, I took your patch and adjusted it slightly: I don't like
> "skip_bytes" anymore because it's clear now that we don't only read that
> value when we're skipping bytes, so now it's just status_bytes.

Yep, sure.

> Since this is based on your fixup, would you like to offer an
> Ack/S-o-b/R-B/whichever here?

Sure:

Reviewed-by: Max Reitz 

Additional explanation for others:

The conflict resolution in itself is just a matter of the
“backup_bitmap_reset_unallocated” block and the
“bdrv_dirty_bitmap_next_zero” block introduced in the same place in two
separate patches (one went to master, the other to bitmaps-next).

So the question is how to order them.  At first glance, it doesn’t
matter; it can go both ways.

On a second glance, it turns out we need to combine the results, hence
the new MIN() here.

If we are initializing the bitmap, bdrv_dirty_bitmap_next_zero() does
not necessarily return the correct result.  It is only accurate insofar
as we have actually initialized the bitmap.  We can get that information
from backup_bitmap_reset_unallocated(): It ensures that the bitmap is
accurate in the [start, start + status_bytes) range.

Therefore, we have to limit dirty_end by start + status_bytes.

I don’t think it really matters whether we do the
backup_bitmap_reset_unallocated() or the bdrv_dirty_bitmap_next_zero()
first.  It’s just that it’s slightly simpler to do the latter first,
because the former is in a conditional block, so we can put the MIN()
right there.  Hence the order change here.

(If we did it the other way around, we’d need another conditional block
“if (job->initializing_bitmap) { dirty_end = MIN(...) }” after we have
both dirty_end and status_bytes.)

Max

> diff --git a/block/backup.c b/block/backup.c
> index ee4d5598986..9e1382ec5c6 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -266,7 +266,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob
> *job,
>  int ret = 0;
>  int64_t start, end; /* bytes */
>  void *bounce_buffer = NULL;
> -int64_t skip_bytes;
> +int64_t status_bytes;
> 
>  qemu_co_rwlock_rdlock(&job->flush_rwlock);
> 
> @@ -287,21 +287,23 @@ static int coroutine_fn
> backup_do_cow(BackupBlockJob *job,
>  continue; /* already copied */
>  }
> 
> -if (job->initializing_bitmap) {
> -ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
> -if (ret == 0) {
> -trace_backup_do_cow_skip_range(job, start, skip_bytes);
> -start += skip_bytes;
> -continue;
> -}
> -}
> -
>  dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
>  (end - start));
>  if (dirty_end < 0) {
>  dirty_end = end;
>  }
> 
> +if (job->initializing_bitmap) {
> +ret = backup_bitmap_reset_unallocated(job, start,
> &status_bytes);
> +if (ret == 0) {
> +trace_backup_do_cow_skip_range(job, start, status_bytes);
> +start += status_bytes;
> +continue;
> +}
> +/* Clamp to known allocated region */
> +dirty_end = MIN(dirty_end, start + status_bytes);
> +}
> +
>  trace_backup_do_cow_process(job, start);
> 
>  if (job->use_copy_range) {
> 






[Qemu-block] bitmaps branch conflict resolution

2019-08-16 Thread John Snow
Hi Max, I took your patch and adjusted it slightly: I don't like
"skip_bytes" anymore because it's clear now that we don't only read that
value when we're skipping bytes, so now it's just status_bytes.

Since this is based on your fixup, would you like to offer an
Ack/S-o-b/R-B/whichever here?

--js

diff --git a/block/backup.c b/block/backup.c
index ee4d5598986..9e1382ec5c6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -266,7 +266,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob
*job,
 int ret = 0;
 int64_t start, end; /* bytes */
 void *bounce_buffer = NULL;
-int64_t skip_bytes;
+int64_t status_bytes;

 qemu_co_rwlock_rdlock(&job->flush_rwlock);

@@ -287,21 +287,23 @@ static int coroutine_fn
backup_do_cow(BackupBlockJob *job,
 continue; /* already copied */
 }

-if (job->initializing_bitmap) {
-ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
-if (ret == 0) {
-trace_backup_do_cow_skip_range(job, start, skip_bytes);
-start += skip_bytes;
-continue;
-}
-}
-
 dirty_end = bdrv_dirty_bitmap_next_zero(job->copy_bitmap, start,
 (end - start));
 if (dirty_end < 0) {
 dirty_end = end;
 }

+if (job->initializing_bitmap) {
+ret = backup_bitmap_reset_unallocated(job, start,
&status_bytes);
+if (ret == 0) {
+trace_backup_do_cow_skip_range(job, start, status_bytes);
+start += status_bytes;
+continue;
+}
+/* Clamp to known allocated region */
+dirty_end = MIN(dirty_end, start + status_bytes);
+}
+
 trace_backup_do_cow_process(job, start);

 if (job->use_copy_range) {



Re: [Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread John Snow



On 8/16/19 5:21 PM, Nir Soffer wrote:
> When creating an image with preallocation "off" or "falloc", the first
> block of the image is typically not allocated. When using Gluster
> storage backed by XFS filesystem, reading this block using direct I/O
> succeeds regardless of request length, fooling alignment detection.
> 
> In this case we fall back to a safe value (4096) instead of the optimal
> value (512), which may lead to unneeded data copying when aligning
> requests.  Allocating the first block avoids the fallback.
> 

Where does this detection/fallback happen? (Can it be improved?)

> When using preallocation=off, we always allocate at least one filesystem
> block:
> 
> $ ./qemu-img create -f raw test.raw 1g
> Formatting 'test.raw', fmt=raw size=1073741824
> 
> $ ls -lhs test.raw
> 4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw
> 
> I did quick performance tests for these flows:
> - Provisioning a VM with a new raw image.
> - Copying disks with qemu-img convert to new raw target image
> 
> I installed Fedora 29 server on raw sparse image, measuring the time
> from clicking "Begin installation" until the "Reboot" button appears:
> 
> Before(s)  After(s)     Diff(%)
> ---
>      356        389        +8.4
> 
> I ran this only once, so we cannot tell much from these results.
> 

That seems like a pretty big difference for just having pre-allocated a
single block. What was the actual command line / block graph for that test?

Was this over a network that could explain the variance?

> The second test was cloning the installation image with qemu-img
> convert, doing 10 runs:
> 
> for i in $(seq 10); do
> rm -f dst.raw
> sleep 10
> time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
> done
> 
> Here is a table comparing the total time spent:
> 
> Type    Before(s)   After(s)    Diff(%)
> ---
> real      530.028    469.123      -11.4
> user       17.204     10.768      -37.4
> sys        17.881      7.011      -60.7
> 
> Here we see very clear improvement in CPU usage.
> 

Hard to argue much with that. I feel a little strange trying to force
the allocation of the first block, but I suppose in practice "almost no
preallocation" is indistinguishable from "exactly no preallocation" if
you squint.

> Signed-off-by: Nir Soffer 
> ---
>  block/file-posix.c | 25 +
>  tests/qemu-iotests/150.out |  1 +
>  tests/qemu-iotests/160 |  4 
>  tests/qemu-iotests/175 | 19 +--
>  tests/qemu-iotests/175.out |  8 
>  tests/qemu-iotests/221.out | 12 
>  tests/qemu-iotests/253.out | 12 
>  7 files changed, 63 insertions(+), 18 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index b9c33c8f6c..3964dd2021 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
>  return ret;
>  }
>  
> +/*
> + * Help alignment detection by allocating the first block.
> + *
> + * When reading with direct I/O from unallocated area on Gluster backed by 
> XFS,
> + * reading succeeds regardless of request length. In this case we fallback to
> + * safe aligment which is not optimal. Allocating the first block avoids this
> + * fallback.
> + *
> + * Returns: 0 on success, -errno on failure.
> + */
> +static int allocate_first_block(int fd)
> +{
> +ssize_t n;
> +
> +do {
> +n = pwrite(fd, "\0", 1, 0);
> +} while (n == -1 && errno == EINTR);
> +
> +return (n == -1) ? -errno : 0;
> +}
> +
>  static int handle_aiocb_truncate(void *opaque)
>  {
>  RawPosixAIOData *aiocb = opaque;
> @@ -1794,6 +1815,8 @@ static int handle_aiocb_truncate(void *opaque)
>  /* posix_fallocate() doesn't set errno. */
>  error_setg_errno(errp, -result,
>   "Could not preallocate new data");
> +} else if (current_length == 0) {
> +allocate_first_block(fd);
>  }
>  } else {
>  result = 0;
> @@ -1855,6 +1878,8 @@ static int handle_aiocb_truncate(void *opaque)
>  if (ftruncate(fd, offset) != 0) {
>  result = -errno;
>  error_setg_errno(errp, -result, "Could not resize file");
> +} else if (current_length == 0 && offset > current_length) {
> +allocate_first_block(fd);
>  }
>  return result;
>  default:
> diff --git a/tests/qemu-iotests/150.out b/tests/qemu-iotests/150.out
> index 2a54e8dcfa..3cdc7727a5 100644
> --- a/tests/qemu-iotests/150.out
> +++ b/tests/qemu-iotests/150.out
> @@ -3,6 +3,7 @@ QA output created by 150
>  === Mapping sparse conversion ===
>  
>  Offset  Length  File
> +0   0x1000  TEST_DIR/t.IMGFMT
>  
>  === Mapping non-sparse conversion ===
>  
> diff --git a/tests/qemu-iot

[Qemu-block] [PATCH] block: posix: Always allocate the first block

2019-08-16 Thread Nir Soffer
When creating an image with preallocation "off" or "falloc", the first
block of the image is typically not allocated. When using Gluster
storage backed by XFS filesystem, reading this block using direct I/O
succeeds regardless of request length, fooling alignment detection.

In this case we fall back to a safe value (4096) instead of the optimal
value (512), which may lead to unneeded data copying when aligning
requests.  Allocating the first block avoids the fallback.

When using preallocation=off, we always allocate at least one filesystem
block:

$ ./qemu-img create -f raw test.raw 1g
Formatting 'test.raw', fmt=raw size=1073741824

$ ls -lhs test.raw
4.0K -rw-r--r--. 1 nsoffer nsoffer 1.0G Aug 16 23:48 test.raw

I did quick performance tests for these flows:
- Provisioning a VM with a new raw image.
- Copying disks with qemu-img convert to new raw target image

I installed Fedora 29 server on raw sparse image, measuring the time
from clicking "Begin installation" until the "Reboot" button appears:

Before(s)  After(s)     Diff(%)
---
     356        389        +8.4

I ran this only once, so we cannot tell much from these results.

The second test was cloning the installation image with qemu-img
convert, doing 10 runs:

for i in $(seq 10); do
rm -f dst.raw
sleep 10
time ./qemu-img convert -f raw -O raw -t none -T none src.raw dst.raw
done

Here is a table comparing the total time spent:

Type    Before(s)   After(s)    Diff(%)
---
real      530.028    469.123      -11.4
user       17.204     10.768      -37.4
sys        17.881      7.011      -60.7

Here we see very clear improvement in CPU usage.

Signed-off-by: Nir Soffer 
---
 block/file-posix.c | 25 +
 tests/qemu-iotests/150.out |  1 +
 tests/qemu-iotests/160 |  4 
 tests/qemu-iotests/175 | 19 +--
 tests/qemu-iotests/175.out |  8 
 tests/qemu-iotests/221.out | 12 
 tests/qemu-iotests/253.out | 12 
 7 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b9c33c8f6c..3964dd2021 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1755,6 +1755,27 @@ static int handle_aiocb_discard(void *opaque)
 return ret;
 }
 
+/*
+ * Help alignment detection by allocating the first block.
+ *
+ * When reading with direct I/O from unallocated area on Gluster backed by XFS,
+ * reading succeeds regardless of request length. In this case we fallback to
+ * safe aligment which is not optimal. Allocating the first block avoids this
+ * fallback.
+ *
+ * Returns: 0 on success, -errno on failure.
+ */
+static int allocate_first_block(int fd)
+{
+ssize_t n;
+
+do {
+n = pwrite(fd, "\0", 1, 0);
+} while (n == -1 && errno == EINTR);
+
+return (n == -1) ? -errno : 0;
+}
+
 static int handle_aiocb_truncate(void *opaque)
 {
 RawPosixAIOData *aiocb = opaque;
@@ -1794,6 +1815,8 @@ static int handle_aiocb_truncate(void *opaque)
 /* posix_fallocate() doesn't set errno. */
 error_setg_errno(errp, -result,
  "Could not preallocate new data");
+} else if (current_length == 0) {
+allocate_first_block(fd);
 }
 } else {
 result = 0;
@@ -1855,6 +1878,8 @@ static int handle_aiocb_truncate(void *opaque)
 if (ftruncate(fd, offset) != 0) {
 result = -errno;
 error_setg_errno(errp, -result, "Could not resize file");
+} else if (current_length == 0 && offset > current_length) {
+allocate_first_block(fd);
 }
 return result;
 default:
diff --git a/tests/qemu-iotests/150.out b/tests/qemu-iotests/150.out
index 2a54e8dcfa..3cdc7727a5 100644
--- a/tests/qemu-iotests/150.out
+++ b/tests/qemu-iotests/150.out
@@ -3,6 +3,7 @@ QA output created by 150
 === Mapping sparse conversion ===
 
 Offset  Length  File
+0   0x1000  TEST_DIR/t.IMGFMT
 
 === Mapping non-sparse conversion ===
 
diff --git a/tests/qemu-iotests/160 b/tests/qemu-iotests/160
index df89d3864b..ad2d054a47 100755
--- a/tests/qemu-iotests/160
+++ b/tests/qemu-iotests/160
@@ -57,6 +57,10 @@ for skip in $TEST_SKIP_BLOCKS; do
 $QEMU_IMG dd if="$TEST_IMG" of="$TEST_IMG.out" skip="$skip" -O "$IMGFMT" \
 2> /dev/null
 TEST_IMG="$TEST_IMG.out" _check_test_img
+
+# We always write the first byte of an image.
+printf "\0" > "$TEST_IMG.out.dd"
+
 dd if="$TEST_IMG" of="$TEST_IMG.out.dd" skip="$skip" status=none
 
 echo
diff --git a/tests/qemu-iotests/175 b/tests/qemu-iotests/175
index 51e62c8276..c6a3a7bb1e 100755
--- a/tests/qemu-iotests/175
+++ b/tests/qemu-iotests/175
@@ -37,14 +37,16 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # the file size.  This function hides the resulting difference in the
 #

Re: [Qemu-block] [PATCH v5 3/6] iotests: Add casenotrun report to bash tests

2019-08-16 Thread Cleber Rosa
On Thu, Aug 15, 2019 at 08:44:11PM -0400, John Snow wrote:
> 
> 
> On 7/19/19 12:30 PM, Andrey Shinkevich wrote:
> > The new function _casenotrun() is to be invoked if a test case cannot
> > be run for some reason. The user will be notified by a message passed
> > to the function.
> > 
> 
> Oh, I assume this is at sub-test granularity, for when we need to skip
> individual items.
> 
> I'm good with this, but we should CC Cleber Rosa, who has struggled
> against this in the past, too.
>

The discussion I was involved in was not that much about skipping
tests per se, but about how to determine if a test should be skipped
or not.  At that time, we proposed an integration with the build
system, but the downside (and the reason for not pushing it forward)
was the requirement to run the iotest outside of a build tree.

> > Suggested-by: Kevin Wolf 
> > Signed-off-by: Andrey Shinkevich 
> > ---
> >  tests/qemu-iotests/common.rc | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> > index 6e461a1..1089050 100644
> > --- a/tests/qemu-iotests/common.rc
> > +++ b/tests/qemu-iotests/common.rc
> > @@ -428,6 +428,13 @@ _notrun()
> >  exit
> >  }
> >  
> > +# bail out, setting up .casenotrun file
> > +#
> > +_casenotrun()
> > +{
> > +echo "[case not run] $*" >>"$OUTPUT_DIR/$seq.casenotrun"
> > +}
> > +
> >  # just plain bail out
> >  #
> >  _fail()
> > 
> 
> seems fine to me otherwise.
> 
> Reviewed-by: John Snow 

Yeah, this also LGTM.

Reviewed-by: Cleber Rosa 



Re: [Qemu-block] [Qemu-devel] [PATCH v5 0/6] Allow Valgrind checking all QEMU processes

2019-08-16 Thread Cleber Rosa
On Fri, Jul 19, 2019 at 07:30:10PM +0300, Andrey Shinkevich wrote:
> In the current implementation of the QEMU bash iotests, only qemu-io
> processes may be run under Valgrind, which is a useful tool for
> finding memory usage issues. Let's allow the common.rc bash script
> to run all the QEMU processes, such as qemu-kvm, qemu-img, qemu-nbd
> and qemu-vxhs, under the Valgrind tool.
>

FYI, this looks very similar (in purpose) to:

   https://avocado-framework.readthedocs.io/en/71.0/WrapProcess.html

And in fact Valgrind was one of the original motivations:

   
https://github.com/avocado-framework/avocado/blob/master/examples/wrappers/valgrind.sh

Maybe this can be helpful for the Python based iotests.

- Cleber.

> v5:
>   01: The patch "block/nbd: NBDReply is used being uninitialized" was detached
>   and taken into account in the patch "nbd: Initialize reply on failure"
>   by Eric Blake.
> 
> v4:
>   01: The patch "iotests: Set read-zeroes on in null block driver for 
> Valgrind"
>   was extended with new cases and issued as a separate series.
>   02: The new patch "block/nbd: NBDReply is used being uninitialized" was
>   added to resolve the failure of the iotest 083 run under Valgrind.
> 
> v3:
>   01: The new function _casenotrun() was added to the common.rc bash
>   script to notify the user of test cases dropped for some reason.
>   Suggested by Kevin.
>   Particularly, the notification about the nonexistent TMPDIR in
>   the test 051 was added (noticed by Vladimir).
>   02: The timeout in some test cases was extended for Valgrind because
>   it differs when running on the ramdisk.
>   03: Since the common.nbd script has been changed with commit
>   b28f582c, the patch "iotests: amend QEMU NBD process synchronization"
>   is no longer needed. Note that QEMU_NBD is launched in a nested bash
>   shell in _qemu_nbd_wrapper() as it was before in common.rc.
>   04: The patch "iotests: new file to suppress Valgrind errors" was dropped
>   due to my superficial understanding of the work of the function
>   blk_pread_unthrottled(). Special thanks to Kevin who shed the light
>   on the null block driver involved. Now, the parameter 'read-zeroes=on'
>   is passed to the null block driver to initialize the buffer in the
>   function guess_disk_lchs() that Valgrind was complaining about.
> 
> v2:
>   01: The patch 2/7 of v1 was merged into the patch 1/7, suggested by Daniel.
>   02: Another patch 7/7 was added to introduce the Valgrind error suppression
>   file into the QEMU project.
>   Discussed in the email thread with the message ID:
>   <1560276131-683243-1-git-send-email-andrey.shinkev...@virtuozzo.com>
> 
> Andrey Shinkevich (6):
>   iotests: allow Valgrind checking all QEMU processes
>   iotests: exclude killed processes from running under  Valgrind
>   iotests: Add casenotrun report to bash tests
>   iotests: Valgrind fails with nonexistent directory
>   iotests: extended timeout under Valgrind
>   iotests: extend sleeping time under Valgrind
> 
>  tests/qemu-iotests/028   |  6 +++-
>  tests/qemu-iotests/039   |  5 +++
>  tests/qemu-iotests/039.out   | 30 +++--
>  tests/qemu-iotests/051   |  4 +++
>  tests/qemu-iotests/061   |  2 ++
>  tests/qemu-iotests/061.out   | 12 ++-
>  tests/qemu-iotests/137   |  1 +
>  tests/qemu-iotests/137.out   |  6 +---
>  tests/qemu-iotests/183   |  9 +-
>  tests/qemu-iotests/192   |  6 +++-
>  tests/qemu-iotests/247   |  6 +++-
>  tests/qemu-iotests/common.rc | 76 
> +---
>  12 files changed, 101 insertions(+), 62 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> 



Re: [Qemu-block] [PATCH 0/4] backup: fix skipping unallocated clusters

2019-08-16 Thread John Snow



On 8/14/19 12:54 PM, Vladimir Sementsov-Ogievskiy wrote:
> 
> 
> On Aug 14, 2019, at 17:43, Vladimir Sementsov-Ogievskiy
> wrote:
> 
> Hi all!
> 
> There is a bug in the not-yet-merged patch
> "block/backup: teach TOP to never copy unallocated regions"
> in https://github.com/jnsnow/qemu bitmaps. 04 fixes it. So, I propose
> to put 01-03 somewhere before
> "block/backup: teach TOP to never copy unallocated regions"
> and squash 04 into "block/backup: teach TOP to never copy
> unallocated regions" 
> 
> 
> Hmm, don't bother with it. It's simpler to fix the bug in your commit by just
> using the skip_bytes variable when initializing dirty_end.
> 

OK, just use Max's fix instead of this entire 4-patch series?

--js



Re: [Qemu-block] [PATCH] virtio-blk: Cancel the pending BH when the dataplane is reset

2019-08-16 Thread John Snow



On 8/16/19 1:15 PM, Philippe Mathieu-Daudé wrote:
> When 'system_reset' is called, the main loop clears the memory
> region cache before the BH has a chance to execute. Later, when
> the deferred function is called, some assumptions that were
> made when scheduling it are no longer true when it actually
> executes.
> 
> This is what happens using a virtio-blk device (fresh RHEL7.8 install):
> 
>  $ (sleep 12.3; echo system_reset; sleep 12.3; echo system_reset; sleep 1; 
> echo q) \
>| qemu-system-x86_64 -m 4G -smp 8 -boot menu=on \
>  -device virtio-blk-pci,id=image1,drive=drive_image1 \
>  -drive 
> file=/var/lib/libvirt/images/rhel78.qcow2,if=none,id=drive_image1,format=qcow2,cache=none
>  \
>  -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
>  -netdev tap,id=net0,script=/bin/true,downscript=/bin/true,vhost=on \
>  -monitor stdio -serial null -nographic
>   (qemu) system_reset
>   (qemu) system_reset
>   (qemu) qemu-system-x86_64: hw/virtio/virtio.c:225: vring_get_region_caches: 
> Assertion `caches != NULL' failed.
>   Aborted
> 
>   (gdb) bt
>   Thread 1 (Thread 0x7f109c17b680 (LWP 10939)):
>   #0  0x5604083296d1 in vring_get_region_caches (vq=0x56040a24bdd0) at 
> hw/virtio/virtio.c:227
>   #1  0x56040832972b in vring_avail_flags (vq=0x56040a24bdd0) at 
> hw/virtio/virtio.c:235
>   #2  0x56040832d13d in virtio_should_notify (vdev=0x56040a240630, 
> vq=0x56040a24bdd0) at hw/virtio/virtio.c:1648
>   #3  0x56040832d1f8 in virtio_notify_irqfd (vdev=0x56040a240630, 
> vq=0x56040a24bdd0) at hw/virtio/virtio.c:1662
>   #4  0x5604082d213d in notify_guest_bh (opaque=0x56040a243ec0) at 
> hw/block/dataplane/virtio-blk.c:75
>   #5  0x56040883dc35 in aio_bh_call (bh=0x56040a243f10) at util/async.c:90
>   #6  0x56040883dccd in aio_bh_poll (ctx=0x560409161980) at 
> util/async.c:118
>   #7  0x560408842af7 in aio_dispatch (ctx=0x560409161980) at 
> util/aio-posix.c:460
>   #8  0x56040883e068 in aio_ctx_dispatch (source=0x560409161980, 
> callback=0x0, user_data=0x0) at util/async.c:261
>   #9  0x7f10a8fca06d in g_main_context_dispatch () at 
> /lib64/libglib-2.0.so.0
>   #10 0x560408841445 in glib_pollfds_poll () at util/main-loop.c:215
>   #11 0x5604088414bf in os_host_main_loop_wait (timeout=0) at 
> util/main-loop.c:238
>   #12 0x5604088415c4 in main_loop_wait (nonblocking=0) at 
> util/main-loop.c:514
>   #13 0x560408416b1e in main_loop () at vl.c:1923
>   #14 0x56040841e0e8 in main (argc=20, argv=0x7ffc2c3f9c58, 
> envp=0x7ffc2c3f9d00) at vl.c:4578
> 
> Fix this by cancelling the BH when the virtio dataplane is stopped.
> 
> Reported-by: Yihuang Yu 
> Suggested-by: Stefan Hajnoczi 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1839428
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/dataplane/virtio-blk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 9299a1a7c2..4030faa21d 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -301,6 +301,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>  /* Clean up guest notifier (irq) */
>  k->set_guest_notifiers(qbus->parent, nvqs, false);
>  
> +qemu_bh_cancel(s->bh);
> +
>  vblk->dataplane_started = false;
>  s->stopping = false;
>  }
> 

Naive question:

Since we're canceling the BH here and we're stopping the device, do we
need to do anything like clearing out batch_notify_vqs? I assume in
system_reset contexts that's going to be handled anyway; are there
non-reset contexts where it matters?

--js



Re: [Qemu-block] [PATCH for-4.2 09/13] qcow2: Fix overly long snapshot tables

2019-08-16 Thread Max Reitz
On 31.07.19 11:22, Max Reitz wrote:
> On 30.07.19 21:08, Eric Blake wrote:
>> On 7/30/19 12:25 PM, Max Reitz wrote:
>>> We currently refuse to open qcow2 images with overly long snapshot
>>> tables.  This patch makes qemu-img check -r all drop all offending
>>> entries past what we deem acceptable.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---
>>>  block/qcow2-snapshot.c | 89 +-
>>>  1 file changed, 79 insertions(+), 10 deletions(-)
>>
>> I'm less sure about this one.  8/13 should have no semantic effect (if
>> the user _depended_ on that much extra data, they should have set an
>> incompatible feature flag bit, at which point we'd leave their data
>> alone because we don't recognize the feature bit; so it is safe to
>> assume the user did not depend on the data and that we can thus nuke it
>> with impunity).  But here, we are throwing away the user's internal
>> snapshots, and not even giving them a say in which ones to throw away
>> (more likely, by trimming from the end, we are destroying the most
>> recent snapshots in favor of the older ones - but I could argue that
>> throwing away the oldest also has its uses).
> 
> First, I don’t think there really is a legitimate use case for having an
> overly long snapshot table.  In fact, I think our limit is too high as
> it is and we just introduced it this way because we didn’t have any
> repair functionality, and so just had to pick some limit that nobody
> could ever reasonably reach.
> 
> (As the test shows, you need more than 500 snapshots with 64 kB names
> and ID strings, and 1 kB of extra data to reach this limit.)
> 
> So the only likely cause to reach this number of snapshots is
> corruption.  OK, so maybe we don’t need to be able to fix it, then,
> because the image is corrupted anyway.
> 
> But I think we do want to be able to fix it, because otherwise you just
> can’t open the image at all and thus not even read the active layer.
> 
> 
> This gets me to: Second, it doesn’t make things worse.  Right now, we
> just refuse to open such images in all cases.  I’d personally prefer
> discarding some data on my image over losing it all.
> 
> 
> And third, I wonder what interface you have in mind.  I think adding an
> interface to qemu-img check to properly address this problem (letting
> the user discard individual snapshots) is hard.  I could imagine two things:
> 
> (A) Making qemu-img snapshot sometimes set BDRV_O_CHECK, too, or
> something.  For qemu-img snapshot -d, you don’t need to read the whole
> table into memory, and thus we don’t need to impose any limit.  But that
> seems pretty hackish to me.
> 
> (B) Maybe the proper solution would be to add an interactive interface
> to bdrv_check().  I can imagine that in the future, we may get more
> cases where we want interaction with the user on what data to delete and
> so on.  But that's hard...  (I’ll try.  Good thing stdio is already the
> standard interface in bdrv_check(), so I won’t have to feel bad if I go
> down that route even further.)

After some fiddling around, I don’t think this is worth it.  As I said,
this is an extremely rare case anyway, so the main goal should be to
just being able to access the active layer to copy at least that data
off the image.

The other side is that this would introduce quite complex code that
basically cannot be tested reasonably.  I’d rather not do that.

Max





[Qemu-block] [PATCH] virtio-blk: Cancel the pending BH when the dataplane is reset

2019-08-16 Thread Philippe Mathieu-Daudé
When 'system_reset' is called, the main loop clears the memory
region cache before the BH has a chance to execute. Later, when
the deferred function is called, some assumptions that were
made when scheduling it are no longer true when it actually
executes.

This is what happens using a virtio-blk device (fresh RHEL7.8 install):

 $ (sleep 12.3; echo system_reset; sleep 12.3; echo system_reset; sleep 1; echo 
q) \
   | qemu-system-x86_64 -m 4G -smp 8 -boot menu=on \
 -device virtio-blk-pci,id=image1,drive=drive_image1 \
 -drive 
file=/var/lib/libvirt/images/rhel78.qcow2,if=none,id=drive_image1,format=qcow2,cache=none
 \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
 -netdev tap,id=net0,script=/bin/true,downscript=/bin/true,vhost=on \
 -monitor stdio -serial null -nographic
  (qemu) system_reset
  (qemu) system_reset
  (qemu) qemu-system-x86_64: hw/virtio/virtio.c:225: vring_get_region_caches: 
Assertion `caches != NULL' failed.
  Aborted

  (gdb) bt
  Thread 1 (Thread 0x7f109c17b680 (LWP 10939)):
  #0  0x5604083296d1 in vring_get_region_caches (vq=0x56040a24bdd0) at 
hw/virtio/virtio.c:227
  #1  0x56040832972b in vring_avail_flags (vq=0x56040a24bdd0) at 
hw/virtio/virtio.c:235
  #2  0x56040832d13d in virtio_should_notify (vdev=0x56040a240630, 
vq=0x56040a24bdd0) at hw/virtio/virtio.c:1648
  #3  0x56040832d1f8 in virtio_notify_irqfd (vdev=0x56040a240630, 
vq=0x56040a24bdd0) at hw/virtio/virtio.c:1662
  #4  0x5604082d213d in notify_guest_bh (opaque=0x56040a243ec0) at 
hw/block/dataplane/virtio-blk.c:75
  #5  0x56040883dc35 in aio_bh_call (bh=0x56040a243f10) at util/async.c:90
  #6  0x56040883dccd in aio_bh_poll (ctx=0x560409161980) at util/async.c:118
  #7  0x560408842af7 in aio_dispatch (ctx=0x560409161980) at 
util/aio-posix.c:460
  #8  0x56040883e068 in aio_ctx_dispatch (source=0x560409161980, 
callback=0x0, user_data=0x0) at util/async.c:261
  #9  0x7f10a8fca06d in g_main_context_dispatch () at 
/lib64/libglib-2.0.so.0
  #10 0x560408841445 in glib_pollfds_poll () at util/main-loop.c:215
  #11 0x5604088414bf in os_host_main_loop_wait (timeout=0) at 
util/main-loop.c:238
  #12 0x5604088415c4 in main_loop_wait (nonblocking=0) at 
util/main-loop.c:514
  #13 0x560408416b1e in main_loop () at vl.c:1923
  #14 0x56040841e0e8 in main (argc=20, argv=0x7ffc2c3f9c58, 
envp=0x7ffc2c3f9d00) at vl.c:4578

Fix this by cancelling the BH when the virtio dataplane is stopped.

Reported-by: Yihuang Yu 
Suggested-by: Stefan Hajnoczi 
Fixes: https://bugs.launchpad.net/qemu/+bug/1839428
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/dataplane/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 9299a1a7c2..4030faa21d 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -301,6 +301,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, nvqs, false);
 
+qemu_bh_cancel(s->bh);
+
 vblk->dataplane_started = false;
 s->stopping = false;
 }
-- 
2.20.1




Re: [Qemu-block] [PATCH] job: drop job_drain

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
16.08.2019 20:10, Vladimir Sementsov-Ogievskiy wrote:
> 16.08.2019 20:04, Vladimir Sementsov-Ogievskiy wrote:
>> In job_finish_sync, job_enter should be enough for a job to make some
>> progress, and draining is the wrong tool for it. So use job_enter directly
>> here and drop job_drain along with all the related stuff that is no longer used.
>>
>> Suggested-by: Kevin Wolf 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>
>> It's a continuation of
>>     [PATCH v4] blockjob: drain all job nodes in block_job_drain
>>
>>   include/block/blockjob_int.h | 19 ---
>>   include/qemu/job.h   | 13 -
>>   block/backup.c   | 19 +--
>>   block/commit.c   |  1 -
>>   block/mirror.c   | 28 +++-
>>   block/stream.c   |  1 -
>>   blockjob.c   | 13 -
>>   job.c    | 12 +---
>>   8 files changed, 5 insertions(+), 101 deletions(-)
>>
>> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
>> index e4a318dd15..e2824a36a8 100644
>> --- a/include/block/blockjob_int.h
>> +++ b/include/block/blockjob_int.h
>> @@ -52,17 +52,6 @@ struct BlockJobDriver {
>>    * besides job->blk to the new AioContext.
>>    */
>>   void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
>> -
>> -    /*
>> - * If the callback is not NULL, it will be invoked when the job has to 
>> be
>> - * synchronously cancelled or completed; it should drain 
>> BlockDriverStates
>> - * as required to ensure progress.
>> - *
>> - * Block jobs must use the default implementation for job_driver.drain,
>> - * which will in turn call this callback after doing generic block job
>> - * stuff.
>> - */
>> -    void (*drain)(BlockJob *job);
>>   };
>>   /**
>> @@ -107,14 +96,6 @@ void block_job_free(Job *job);
>>    */
>>   void block_job_user_resume(Job *job);
>> -/**
>> - * block_job_drain:
>> - * Callback to be used for JobDriver.drain in all block jobs. Drains the 
>> main
>> - * block node associated with the block jobs and calls BlockJobDriver.drain 
>> for
>> - * job-specific actions.
>> - */
>> -void block_job_drain(Job *job);
>> -
>>   /**
>>    * block_job_ratelimit_get_delay:
>>    *
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 9e7cd1e4a0..09739b8dd9 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -220,13 +220,6 @@ struct JobDriver {
>>    */
>>   void (*complete)(Job *job, Error **errp);
>> -    /*
>> - * If the callback is not NULL, it will be invoked when the job has to 
>> be
>> - * synchronously cancelled or completed; it should drain any activities
>> - * as required to ensure progress.
>> - */
>> -    void (*drain)(Job *job);
>> -
>>   /**
>>    * If the callback is not NULL, prepare will be invoked when all the 
>> jobs
>>    * belonging to the same transaction complete; or upon this job's 
>> completion
>> @@ -470,12 +463,6 @@ bool job_user_paused(Job *job);
>>    */
>>   void job_user_resume(Job *job, Error **errp);
>> -/*
>> - * Drain any activities as required to ensure progress. This can be called 
>> in a
>> - * loop to synchronously complete a job.
>> - */
>> -void job_drain(Job *job);
>> -
>>   /**
>>    * Get the next element from the list of block jobs after @job, or the
>>    * first one if @job is %NULL.
>> diff --git a/block/backup.c b/block/backup.c
>> index 715e1d3be8..d1ecdfa9aa 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -320,21 +320,6 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
>>   hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
>>   }
>> -static void backup_drain(BlockJob *job)
>> -{
>> -    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>> -
>> -    /* Need to keep a reference in case blk_drain triggers execution
>> - * of backup_complete...
>> - */
>> -    if (s->target) {
>> -    BlockBackend *target = s->target;
>> -    blk_ref(target);
>> -    blk_drain(target);
>> -    blk_unref(target);
>> -    }
>> -}
>> -
>>   static BlockErrorAction backup_error_action(BackupBlockJob *job,
>>   bool read, int error)
>>   {
>> @@ -488,13 +473,11 @@ static const BlockJobDriver backup_job_driver = {
>>   .job_type   = JOB_TYPE_BACKUP,
>>   .free   = block_job_free,
>>   .user_resume    = block_job_user_resume,
>> -    .drain  = block_job_drain,
>>   .run    = backup_run,
>>   .commit = backup_commit,
>>   .abort  = backup_abort,
>>   .clean  = backup_clean,
>> -    },
>> -    .drain  = backup_drain,
>> +    }
>>   };
>>   static int64_t backup_calculate_cluster_size(BlockDriverState *target,
>> diff --git a/block/c

Re: [Qemu-block] [PATCH] job: drop job_drain

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
16.08.2019 20:04, Vladimir Sementsov-Ogievskiy wrote:
> In job_finish_sync, job_enter should be enough for a job to make some
> progress, and draining is the wrong tool for it. So use job_enter directly
> here and drop job_drain together with all the related stuff that is no
> longer used.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> It's a continuation for
> [PATCH v4] blockjob: drain all job nodes in block_job_drain
> 
>   include/block/blockjob_int.h | 19 ---
>   include/qemu/job.h   | 13 -
>   block/backup.c   | 19 +--
>   block/commit.c   |  1 -
>   block/mirror.c   | 28 +++-
>   block/stream.c   |  1 -
>   blockjob.c   | 13 -
>   job.c| 12 +---
>   8 files changed, 5 insertions(+), 101 deletions(-)
> 
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index e4a318dd15..e2824a36a8 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -52,17 +52,6 @@ struct BlockJobDriver {
>* besides job->blk to the new AioContext.
>*/
>   void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
> -
> -/*
> - * If the callback is not NULL, it will be invoked when the job has to be
> - * synchronously cancelled or completed; it should drain 
> BlockDriverStates
> - * as required to ensure progress.
> - *
> - * Block jobs must use the default implementation for job_driver.drain,
> - * which will in turn call this callback after doing generic block job
> - * stuff.
> - */
> -void (*drain)(BlockJob *job);
>   };
>   
>   /**
> @@ -107,14 +96,6 @@ void block_job_free(Job *job);
>*/
>   void block_job_user_resume(Job *job);
>   
> -/**
> - * block_job_drain:
> - * Callback to be used for JobDriver.drain in all block jobs. Drains the main
> - * block node associated with the block jobs and calls BlockJobDriver.drain 
> for
> - * job-specific actions.
> - */
> -void block_job_drain(Job *job);
> -
>   /**
>* block_job_ratelimit_get_delay:
>*
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index 9e7cd1e4a0..09739b8dd9 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -220,13 +220,6 @@ struct JobDriver {
>*/
>   void (*complete)(Job *job, Error **errp);
>   
> -/*
> - * If the callback is not NULL, it will be invoked when the job has to be
> - * synchronously cancelled or completed; it should drain any activities
> - * as required to ensure progress.
> - */
> -void (*drain)(Job *job);
> -
>   /**
>* If the callback is not NULL, prepare will be invoked when all the 
> jobs
>* belonging to the same transaction complete; or upon this job's 
> completion
> @@ -470,12 +463,6 @@ bool job_user_paused(Job *job);
>*/
>   void job_user_resume(Job *job, Error **errp);
>   
> -/*
> - * Drain any activities as required to ensure progress. This can be called 
> in a
> - * loop to synchronously complete a job.
> - */
> -void job_drain(Job *job);
> -
>   /**
>* Get the next element from the list of block jobs after @job, or the
>* first one if @job is %NULL.
> diff --git a/block/backup.c b/block/backup.c
> index 715e1d3be8..d1ecdfa9aa 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -320,21 +320,6 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
>   hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
>   }
>   
> -static void backup_drain(BlockJob *job)
> -{
> -BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> -
> -/* Need to keep a reference in case blk_drain triggers execution
> - * of backup_complete...
> - */
> -if (s->target) {
> -BlockBackend *target = s->target;
> -blk_ref(target);
> -blk_drain(target);
> -blk_unref(target);
> -}
> -}
> -
>   static BlockErrorAction backup_error_action(BackupBlockJob *job,
>   bool read, int error)
>   {
> @@ -488,13 +473,11 @@ static const BlockJobDriver backup_job_driver = {
>   .job_type   = JOB_TYPE_BACKUP,
>   .free   = block_job_free,
>   .user_resume= block_job_user_resume,
> -.drain  = block_job_drain,
>   .run= backup_run,
>   .commit = backup_commit,
>   .abort  = backup_abort,
>   .clean  = backup_clean,
> -},
> -.drain  = backup_drain,
> +}
>   };
>   
>   static int64_t backup_calculate_cluster_size(BlockDriverState *target,
> diff --git a/block/commit.c b/block/commit.c
> index 2c5a6d4ebc..697a779d8e 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -216,7 +216,6 @@ static const BlockJobDriver commi

[Qemu-block] [PATCH] job: drop job_drain

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
In job_finish_sync, job_enter should be enough for a job to make some
progress, and draining is the wrong tool for it. So use job_enter directly
here and drop job_drain together with all the related stuff that is no
longer used.

Suggested-by: Kevin Wolf 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

It's a continuation for
   [PATCH v4] blockjob: drain all job nodes in block_job_drain

 include/block/blockjob_int.h | 19 ---
 include/qemu/job.h   | 13 -
 block/backup.c   | 19 +--
 block/commit.c   |  1 -
 block/mirror.c   | 28 +++-
 block/stream.c   |  1 -
 blockjob.c   | 13 -
 job.c| 12 +---
 8 files changed, 5 insertions(+), 101 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index e4a318dd15..e2824a36a8 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -52,17 +52,6 @@ struct BlockJobDriver {
  * besides job->blk to the new AioContext.
  */
 void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
-
-/*
- * If the callback is not NULL, it will be invoked when the job has to be
- * synchronously cancelled or completed; it should drain BlockDriverStates
- * as required to ensure progress.
- *
- * Block jobs must use the default implementation for job_driver.drain,
- * which will in turn call this callback after doing generic block job
- * stuff.
- */
-void (*drain)(BlockJob *job);
 };
 
 /**
@@ -107,14 +96,6 @@ void block_job_free(Job *job);
  */
 void block_job_user_resume(Job *job);
 
-/**
- * block_job_drain:
- * Callback to be used for JobDriver.drain in all block jobs. Drains the main
- * block node associated with the block jobs and calls BlockJobDriver.drain for
- * job-specific actions.
- */
-void block_job_drain(Job *job);
-
 /**
  * block_job_ratelimit_get_delay:
  *
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 9e7cd1e4a0..09739b8dd9 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -220,13 +220,6 @@ struct JobDriver {
  */
 void (*complete)(Job *job, Error **errp);
 
-/*
- * If the callback is not NULL, it will be invoked when the job has to be
- * synchronously cancelled or completed; it should drain any activities
- * as required to ensure progress.
- */
-void (*drain)(Job *job);
-
 /**
  * If the callback is not NULL, prepare will be invoked when all the jobs
  * belonging to the same transaction complete; or upon this job's 
completion
@@ -470,12 +463,6 @@ bool job_user_paused(Job *job);
  */
 void job_user_resume(Job *job, Error **errp);
 
-/*
- * Drain any activities as required to ensure progress. This can be called in a
- * loop to synchronously complete a job.
- */
-void job_drain(Job *job);
-
 /**
  * Get the next element from the list of block jobs after @job, or the
  * first one if @job is %NULL.
diff --git a/block/backup.c b/block/backup.c
index 715e1d3be8..d1ecdfa9aa 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -320,21 +320,6 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
 }
 
-static void backup_drain(BlockJob *job)
-{
-BackupBlockJob *s = container_of(job, BackupBlockJob, common);
-
-/* Need to keep a reference in case blk_drain triggers execution
- * of backup_complete...
- */
-if (s->target) {
-BlockBackend *target = s->target;
-blk_ref(target);
-blk_drain(target);
-blk_unref(target);
-}
-}
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
 bool read, int error)
 {
@@ -488,13 +473,11 @@ static const BlockJobDriver backup_job_driver = {
 .job_type   = JOB_TYPE_BACKUP,
 .free   = block_job_free,
 .user_resume= block_job_user_resume,
-.drain  = block_job_drain,
 .run= backup_run,
 .commit = backup_commit,
 .abort  = backup_abort,
 .clean  = backup_clean,
-},
-.drain  = backup_drain,
+}
 };
 
 static int64_t backup_calculate_cluster_size(BlockDriverState *target,
diff --git a/block/commit.c b/block/commit.c
index 2c5a6d4ebc..697a779d8e 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -216,7 +216,6 @@ static const BlockJobDriver commit_job_driver = {
 .job_type  = JOB_TYPE_COMMIT,
 .free  = block_job_free,
 .user_resume   = block_job_user_resume,
-.drain = block_job_drain,
 .run   = commit_run,
 .prepare   = commit_prepare,
 .abort = commit_abort,
diff --git a/block/mirror.c b/block/mirror.c
index 8cb75fb409..b91abe0288 

Re: [Qemu-block] [PATCH v3 0/4] qcow2: async handling of fragmented io

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
15.08.2019 18:39, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2019 17:09, Max Reitz wrote:
>> On 15.08.19 14:10, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> Here is an asynchronous scheme for handling fragmented qcow2
>>> reads and writes. Both the qcow2 read and write functions loop through
>>> sequential portions of data. The series aims to parallelize these
>>> loop iterations.
>>> It improves performance for fragmented qcow2 images; I've tested it
>>> as described below.
>>
>> Looks good to me, but I can’t take it yet because I need to wait for
>> Stefan’s branch to be merged, of course.
>>
>> Speaking of which, why didn’t you add any tests for the *_part()
>> methods?  I find it a bit unsettling that nothing would have caught the
>> bug you had in v2 in patch 3.
>>
> 
> Hmm, any test with a write to a fragmented area should have caught it... OK,
> I'll think of something.
> 
> 

And now I see that it's not trivial to make such a test:

1. qcow2 write is broken when we pass a nonzero qiov_offset to it, but only
qcow2_write calls bdrv_co_pwritev_part, so we need a test where one qcow2
node is the file child of another qcow2 node

2. Then, the bug causes the beginning of the qiov to be written to all parts.
But our testing tool qemu-io only has the "write -P" command, which fills the
buffer with a single repeated byte, so we can't catch it.
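
To illustrate point 2 with a standalone sketch (not QEMU code, just a
demonstration of why a uniform pattern hides the bug): when the whole request
buffer holds one repeated byte, copying from the wrong qiov_offset produces
byte-identical data, so a read-back check cannot see the corruption.

#include <assert.h>
#include <string.h>

int main(void)
{
    unsigned char qiov[128 * 1024];
    unsigned char ok[64 * 1024], buggy[64 * 1024];

    memset(qiov, 0xcd, sizeof(qiov));           /* what "write -P 0xcd" provides */

    memcpy(ok, qiov + 64 * 1024, sizeof(ok));   /* correct: honour qiov_offset */
    memcpy(buggy, qiov, sizeof(buggy));         /* buggy: always copy from the start */

    assert(memcmp(ok, buggy, sizeof(ok)) == 0); /* indistinguishable on disk */
    return 0;
}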


-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] [PULL 00/16] Block layer patches

2019-08-16 Thread Peter Maydell
On Fri, 16 Aug 2019 at 10:36, Kevin Wolf  wrote:
>
> The following changes since commit 9e06029aea3b2eca1d5261352e695edc1e7d7b8b:
>
>   Update version for v4.1.0 release (2019-08-15 13:03:37 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to a6b257a08e3d72219f03e461a52152672fec0612:
>
>   file-posix: Handle undetectable alignment (2019-08-16 11:29:11 +0200)
>
> 
> Block layer patches:
>
> - file-posix: Fix O_DIRECT alignment detection
> - Fixes for concurrent block jobs
> - block-backend: Queue requests while drained (fix IDE vs. job crashes)
> - qemu-img convert: Deprecate using -n and -o together
> - iotests: Migration tests with filter nodes
> - iotests: More media change tests
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



Re: [Qemu-block] [PATCH v6 37/42] block: Leave BDS.backing_file constant

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:14, Max Reitz wrote:
> Parts of the block layer treat BDS.backing_file as if it were whatever
> the image header says (i.e., if it is a relative path, it is relative to
> the overlay), other parts treat it like a cache for
> bs->backing->bs->filename (relative paths are relative to the CWD).
> Considering bs->backing->bs->filename exists, let us make it mean the
> former.
> 
> Among other things, this now allows the user to specify a base when
> using qemu-img to commit an image file in a directory that is not the
> CWD (assuming, everything uses relative filenames).
> 
> Before this patch:
> 
> $ ./qemu-img create -f qcow2 foo/bot.qcow2 1M
> $ ./qemu-img create -f qcow2 -b bot.qcow2 foo/mid.qcow2
> $ ./qemu-img create -f qcow2 -b mid.qcow2 foo/top.qcow2
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find '[...]/foo/mid.qcow2' in the backing chain of 
> 'foo/top.qcow2'

nothing works

> 
> After this patch:
> 
> $ ./qemu-img commit -b mid.qcow2 foo/top.qcow2
> Image committed.
> $ ./qemu-img commit -b foo/mid.qcow2 foo/top.qcow2
> qemu-img: Did not find 'foo/mid.qcow2' in the backing chain of 'foo/top.qcow2'
> $ ./qemu-img commit -b $PWD/foo/mid.qcow2 foo/top.qcow2
> Image committed.

something works. However, it seems that the non-working invocation is actually
the one most likely to be used by a user. Anyway, something is better than
nothing.

> 
> With this change, bdrv_find_backing_image() must look at whether the
> user has overridden a BDS's backing file.  If so, it can no longer use
> bs->backing_file, but must instead compare the given filename against
> the backing node's filename directly.
> 
> Note that this changes the QAPI output for a node's backing_file.  We
> had very inconsistent output there (sometimes what the image header
> said, sometimes the actual filename of the backing image).  This
> inconsistent output was effectively useless, so we have to decide one
> way or the other.  Considering that bs->backing_file usually at runtime
> contained the path to the image relative to qemu's CWD (or absolute),
> this patch changes QAPI's backing_file to always report the
> bs->backing->bs->filename from now on.  If you want to receive the image
> header information, you have to refer to full-backing-filename.
> 
> This necessitates a change to iotest 228.  The interesting information
> it really wanted is the image header, and it can get that now, but it
> has to use full-backing-filename instead of backing_file.  Because of
> this patch's changes to bs->backing_file's behavior, we also need some
> reference output changes.
> 
> Along with the changes to bs->backing_file, stop updating
> BDS.backing_format in bdrv_backing_attach() as well.  This necessitates
> a change to the reference output of iotest 191.
> 
> iotest 245 changes in behavior: With the backing node no longer
> overriding the parent node's backing_file string, you can now omit the
> @backing option when reopening a node with neither a default nor a
> current backing file even if it used to have a backing node at some
> point.
> 
> Signed-off-by: Max Reitz 
> ---
>   include/block/block_int.h  | 19 ++-
>   block.c| 35 ---
>   block/qapi.c   |  7 ---
>   tests/qemu-iotests/191.out |  1 -
>   tests/qemu-iotests/228 |  6 +++---
>   tests/qemu-iotests/228.out |  6 +++---
>   tests/qemu-iotests/245 |  4 +++-
>   7 files changed, 55 insertions(+), 23 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 42ee2fcf7f..993bafc090 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -784,11 +784,20 @@ struct BlockDriverState {
>   bool walking_aio_notifiers; /* to make removal during iteration safe */
>   
>   char filename[PATH_MAX];
> -char backing_file[PATH_MAX]; /* if non zero, the image is a diff of
> -this file image */
> -/* The backing filename indicated by the image header; if we ever
> - * open this file, then this is replaced by the resulting BDS's
> - * filename (i.e. after a bdrv_refresh_filename() run). */
> +/*
> + * If not empty, this image is a diff in relation to backing_file.
> + * Note that this is the name given in the image header

Is it synced when the image header is updated? If yes, it's not constant; if
not, it's just wrong.

> and
> + * therefore may or may not be equal to .backing->bs->filename.
> + * If this field contains a relative path, it is to be resolved
> + * relatively to the overlay's location.
> + */
> +char backing_file[PATH_MAX];
> +/*
> + * The backing filename indicat

[Qemu-block] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
qemu-io now prefixes its errors and warnings with "qemu-io:".
Commit 36b9986b08787019e fixed a lot of iotest output but forgot about
026.out.nocache. Fix it too.

Fixes: 99e98d7c9fc1a1639fad ("qemu-io: Use error_[gs]et_progname()")
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/026.out.nocache | 168 ++---
 1 file changed, 84 insertions(+), 84 deletions(-)

diff --git a/tests/qemu-iotests/026.out.nocache 
b/tests/qemu-iotests/026.out.nocache
index 1ca6cda15c..6dda95dfb4 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -14,8 +14,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 1 leaked clusters were found on the image.
@@ -23,8 +23,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 1 leaked clusters were found on the image.
@@ -42,8 +42,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 1 leaked clusters were found on the image.
@@ -51,8 +51,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 1 leaked clusters were found on the image.
@@ -136,8 +136,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_update; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -146,8 +146,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_update; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -168,8 +168,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_update; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -178,8 +178,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_update; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -198,15 +198,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGF

[Qemu-block] [PATCH v4 5/5] block/qcow2: introduce parallel subrequest handling in read and write

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
It improves performance for fragmented qcow2 images. It also affects the 026
iotest, increasing the number of leaked clusters, which is not surprising when
we run several sub-requests of a qcow2 request in parallel.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.h  |   3 +
 block/qcow2.c  | 125 ++---
 block/trace-events |   1 +
 tests/qemu-iotests/026.out |  18 +++--
 tests/qemu-iotests/026.out.nocache |  20 ++---
 5 files changed, 138 insertions(+), 29 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 998bcdaef1..fdfa9c31cd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -65,6 +65,9 @@
 #define QCOW2_MAX_BITMAPS 65535
 #define QCOW2_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW2_MAX_BITMAPS)
 
+/* Maximum of parallel sub-request per guest request */
+#define QCOW2_MAX_WORKERS 8
+
 /* indicate that the refcount of the referenced cluster is exactly one. */
 #define QCOW_OFLAG_COPIED (1ULL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
diff --git a/block/qcow2.c b/block/qcow2.c
index 3aaa180e2b..36b41e8536 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -40,6 +40,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "block/aio_task.h"
 
 /*
   Differences with QCOW:
@@ -2017,6 +2018,60 @@ fail:
 return ret;
 }
 
+typedef struct Qcow2AioTask {
+AioTask task;
+
+BlockDriverState *bs;
+QCow2ClusterType cluster_type; /* only for read */
+uint64_t file_cluster_offset;
+uint64_t offset;
+uint64_t bytes;
+QEMUIOVector *qiov;
+uint64_t qiov_offset;
+QCowL2Meta *l2meta; /* only for write */
+} Qcow2AioTask;
+
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
+static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
+   AioTaskPool *pool,
+   AioTaskFunc func,
+   QCow2ClusterType cluster_type,
+   uint64_t file_cluster_offset,
+   uint64_t offset,
+   uint64_t bytes,
+   QEMUIOVector *qiov,
+   size_t qiov_offset,
+   QCowL2Meta *l2meta)
+{
+Qcow2AioTask local_task;
+Qcow2AioTask *task = pool ? g_new(Qcow2AioTask, 1) : &local_task;
+
+*task = (Qcow2AioTask) {
+.task.func = func,
+.bs = bs,
+.cluster_type = cluster_type,
+.qiov = qiov,
+.file_cluster_offset = file_cluster_offset,
+.offset = offset,
+.bytes = bytes,
+.qiov_offset = qiov_offset,
+.l2meta = l2meta,
+};
+
+trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
+ func == qcow2_co_preadv_task_entry ? "read" : "write",
+ cluster_type, file_cluster_offset, offset, bytes,
+ qiov, qiov_offset);
+
+if (!pool) {
+return func(&task->task);
+}
+
+aio_task_pool_start_task(pool, &task->task);
+
+return 0;
+}
+
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
  QCow2ClusterType cluster_type,
  uint64_t file_cluster_offset,
@@ -2066,18 +2121,28 @@ static coroutine_fn int 
qcow2_co_preadv_task(BlockDriverState *bs,
 g_assert_not_reached();
 }
 
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
+{
+Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+assert(!t->l2meta);
+
+return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
+t->offset, t->bytes, t->qiov, t->qiov_offset);
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
-int ret;
+int ret = 0;
 unsigned int cur_bytes; /* number of bytes in current iteration */
 uint64_t cluster_offset = 0;
+AioTaskPool *aio = NULL;
 
-while (bytes != 0) {
-
+while (bytes != 0 && aio_task_pool_status(aio) == 0) {
 /* prepare next request */
 cur_bytes = MIN(bytes, INT_MAX);
 if (s->crypto) {
@@ -2089,7 +2154,7 @@ static coroutine_fn int 
qcow2_co_preadv_part(BlockDriverState *bs,
 ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, 
&cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
-return ret;
+goto out;
 }
 
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
@@ -2098,

[Qemu-block] [PATCH v4 4/5] block/qcow2: refactor qcow2_co_pwritev_part

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
Similarly to the previous commit, prepare for parallelizing write-loop
iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 153 +-
 1 file changed, 89 insertions(+), 64 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 89afb4272e..3aaa180e2b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2234,6 +2234,87 @@ static int handle_alloc_space(BlockDriverState *bs, 
QCowL2Meta *l2meta)
 return 0;
 }
 
+/*
+ * qcow2_co_pwritev_task
+ * Called with s->lock unlocked
+ * l2meta  - if not NULL, qcow2_co_do_pwritev() will consume it. Caller must 
not
+ *   use it somehow after qcow2_co_pwritev_task() call
+ */
+static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
+  uint64_t file_cluster_offset,
+  uint64_t offset, uint64_t bytes,
+  QEMUIOVector *qiov,
+  uint64_t qiov_offset,
+  QCowL2Meta *l2meta)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+void *crypt_buf = NULL;
+int offset_in_cluster = offset_into_cluster(s, offset);
+QEMUIOVector encrypted_qiov;
+
+if (bs->encrypted) {
+assert(s->crypto);
+assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
+if (crypt_buf == NULL) {
+ret = -ENOMEM;
+goto out_unlocked;
+}
+qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
+
+if (qcow2_co_encrypt(bs, file_cluster_offset, offset,
+ crypt_buf, bytes) < 0) {
+ret = -EIO;
+goto out_unlocked;
+}
+
+qemu_iovec_init_buf(&encrypted_qiov, crypt_buf, bytes);
+qiov = &encrypted_qiov;
+qiov_offset = 0;
+}
+
+/* Try to efficiently initialize the physical space with zeroes */
+ret = handle_alloc_space(bs, l2meta);
+if (ret < 0) {
+goto out_unlocked;
+}
+
+/*
+ * If we need to do COW, check if it's possible to merge the
+ * writing of the guest data together with that of the COW regions.
+ * If it's not possible (or not necessary) then write the
+ * guest data now.
+ */
+if (!merge_cow(offset, bytes, qiov, qiov_offset, l2meta)) {
+BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+trace_qcow2_writev_data(qemu_coroutine_self(),
+file_cluster_offset + offset_in_cluster);
+ret = bdrv_co_pwritev_part(s->data_file,
+   file_cluster_offset + offset_in_cluster,
+   bytes, qiov, qiov_offset, 0);
+if (ret < 0) {
+goto out_unlocked;
+}
+}
+
+qemu_co_mutex_lock(&s->lock);
+
+ret = qcow2_handle_l2meta(bs, &l2meta, true);
+goto out_locked;
+
+out_unlocked:
+qemu_co_mutex_lock(&s->lock);
+
+out_locked:
+qcow2_handle_l2meta(bs, &l2meta, false);
+qemu_co_mutex_unlock(&s->lock);
+
+qemu_vfree(crypt_buf);
+
+return ret;
+}
+
 static coroutine_fn int qcow2_co_pwritev_part(
 BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 QEMUIOVector *qiov, size_t qiov_offset, int flags)
@@ -2243,15 +2324,10 @@ static coroutine_fn int qcow2_co_pwritev_part(
 int ret;
 unsigned int cur_bytes; /* number of sectors in current iteration */
 uint64_t cluster_offset;
-QEMUIOVector encrypted_qiov;
-uint64_t bytes_done = 0;
-uint8_t *cluster_data = NULL;
 QCowL2Meta *l2meta = NULL;
 
 trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
-qemu_co_mutex_lock(&s->lock);
-
 while (bytes != 0) {
 
 l2meta = NULL;
@@ -2265,6 +2341,8 @@ static coroutine_fn int qcow2_co_pwritev_part(
 - offset_in_cluster);
 }
 
+qemu_co_mutex_lock(&s->lock);
+
 ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
  &cluster_offset, &l2meta);
 if (ret < 0) {
@@ -2282,73 +2360,20 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
 qemu_co_mutex_unlock(&s->lock);
 
-if (bs->encrypted) {
-assert(s->crypto);
-if (!cluster_data) {
-cluster_data = qemu_try_blockalign(bs->file->bs,
-   QCOW_MAX_CRYPT_CLUSTERS
-   * s->cluster_size);
-if (cluster_data == NULL) {
-ret = -ENOMEM;
-goto out_unlocked;
-}
-}
-
-assert(cur_bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
-qemu_iovec_to_buf(qiov, qiov_offset + bytes_done,
-

[Qemu-block] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
A further patch will run the partial requests of qcow2_co_preadv's iterations
in parallel for performance reasons. To prepare for this, separate the part
that may be parallelized into its own function (qcow2_co_preadv_task).

While at it, also split the reading of encrypted clusters into its own
function, as is already done for compressed reads.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 qapi/block-core.json |   2 +-
 block/qcow2.c| 205 +++
 2 files changed, 111 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0d43d4f37c..dd80aa11db 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3266,7 +3266,7 @@
 'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
 'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-'cor_write', 'cluster_alloc_space', 'none'] }
+'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
 
 ##
 # @BlkdebugIOType:
diff --git a/block/qcow2.c b/block/qcow2.c
index 93ab7edcea..89afb4272e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1967,17 +1967,114 @@ out:
 return ret;
 }
 
+static coroutine_fn int
+qcow2_co_preadv_encrypted(BlockDriverState *bs,
+   uint64_t file_cluster_offset,
+   uint64_t offset,
+   uint64_t bytes,
+   QEMUIOVector *qiov,
+   uint64_t qiov_offset)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+uint8_t *buf;
+
+assert(bs->encrypted && s->crypto);
+assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+
+/*
+ * For encrypted images, read everything into a temporary
+ * contiguous buffer on which the AES functions can work.
+ * Also, decryption in a separate buffer is better as it
+ * prevents the guest from learning information about the
+ * encrypted nature of the virtual disk.
+ */
+
+buf = qemu_try_blockalign(s->data_file->bs, bytes);
+if (buf == NULL) {
+return -ENOMEM;
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_READ_ENCRYPTED);
+ret = bdrv_co_pread(s->data_file,
+file_cluster_offset + offset_into_cluster(s, offset),
+bytes, buf, 0);
+if (ret < 0) {
+goto fail;
+}
+
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+if (qcow2_co_decrypt(bs, file_cluster_offset, offset, buf, bytes) < 0) {
+ret = -EIO;
+goto fail;
+}
+qemu_iovec_from_buf(qiov, qiov_offset, buf, bytes);
+
+fail:
+qemu_vfree(buf);
+
+return ret;
+}
+
+static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
+ QCow2ClusterType cluster_type,
+ uint64_t file_cluster_offset,
+ uint64_t offset, uint64_t bytes,
+ QEMUIOVector *qiov,
+ size_t qiov_offset)
+{
+BDRVQcow2State *s = bs->opaque;
+int offset_in_cluster = offset_into_cluster(s, offset);
+
+switch (cluster_type) {
+case QCOW2_CLUSTER_ZERO_PLAIN:
+case QCOW2_CLUSTER_ZERO_ALLOC:
+/* Both zero types are handled in qcow2_co_preadv_part */
+g_assert_not_reached();
+
+case QCOW2_CLUSTER_UNALLOCATED:
+assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
+
+BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
+return bdrv_co_preadv_part(bs->backing, offset, bytes,
+   qiov, qiov_offset, 0);
+
+case QCOW2_CLUSTER_COMPRESSED:
+return qcow2_co_preadv_compressed(bs, file_cluster_offset,
+  offset, bytes, qiov, qiov_offset);
+
+case QCOW2_CLUSTER_NORMAL:
+if ((file_cluster_offset & 511) != 0) {
+return -EIO;
+}
+
+if (bs->encrypted) {
+return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
+ offset, bytes, qiov, qiov_offset);
+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+return bdrv_co_preadv_part(s->data_file,
+   file_cluster_offset + offset_in_cluster,
+   bytes, qiov, qiov_offset, 0);
+
+default:
+g_assert_not_reached();
+}
+
+g_assert_not_reached();
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov,
  size_t qiov_offset, int flags)
 {
 BDRVQcow2State *s =

[Qemu-block] [PATCH v4 0/5] qcow2: async handling of fragmented io

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
Hi all!

Here is an asynchronous scheme for handling fragmented qcow2
reads and writes. Both the qcow2 read and write functions loop through
sequential portions of data. This series aims to parallelize these
loop iterations.
It improves performance for fragmented qcow2 images; I've tested it
as described below.

v4 [perf results not updated]:
01: new patch. Unrelated, but need to fix 026 before the series to
correctly fix it after :)
02: - use coroutine_fn where appropriate (i.e. in aio_task_pool_new too)
- add Max's r-b
03,04: add Max's r-b
05: fix 026 output

v3 (by Max's comments) [perf results not updated]:

01: - use coroutine_fn where appropriate !!!
- add aio_task_pool_free
- add some comments
- move header to include/block
- s/wait_done/waiting
02: - Rewrite note about decryption in guest buffers [thx to Eric]
- separate g_assert_not_reached for QCOW2_CLUSTER_ZERO_*
- drop return after g_assert_not_reached
03: - drop bytes_done and correctly use qiov_offset
- fix comment
04: - move QCOW2_MAX_WORKERS to block/qcow2.h
- initialize ret in qcow2_co_preadv_part
Based-on: https://github.com/stefanha/qemu/commits/block


v2: changed a lot, because
 1. a lot of preparation around locks, hd_qiovs, and threads for encryption
has been done
 2. I decided to create a separate file with an async request handling API, to
reuse it for backup, stream and copy-on-read to improve their performance
too. Mirror and qemu-img convert have their own async request handling;
maybe we'll finally be able to merge all this similar code into one
feature.
Note that not all API calls are used in qcow2 yet; some will be needed in
following steps for parallelizing other io loops.

About testing:

I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
t-seq.qcow2 - sequentially written qcow2 image
t-reverse.qcow2 - filled by writing 64k portions from end to the start
t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
(see source code of image generation in the end for details)

and I've done several runs like the following (sequential io by 1mb chunks):

out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr in 
{"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m 
-t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >> $out; done; 
done


short info about parameters:
  -w - do writes (otherwise do reads)
  -c - count of blocks
  -s - block size
  -t none - disable cache
  -n - native aio
  -d 1 - don't use parallel requests provided by qemu-img bench itself

results:

+---+-+-+
| file  | master  | async   |
+---+-+-+
| /ssd/t-part-rand.qcow2| 14.671  | 9.193   |
+---+-+-+
| /ssd/t-part-rand.qcow2 -w | 11.434  | 8.621   |
+---+-+-+
| /ssd/t-rand.qcow2 | 20.421  | 10.05   |
+---+-+-+
| /ssd/t-rand.qcow2 -w  | 11.097  | 8.915   |
+---+-+-+
| /ssd/t-reverse.qcow2  | 17.515  | 9.407   |
+---+-+-+
| /ssd/t-reverse.qcow2 -w   | 11.255  | 8.649   |
+---+-+-+
| /ssd/t-seq.qcow2  | 9.081   | 9.072   |
+---+-+-+
| /ssd/t-seq.qcow2 -w   | 8.761   | 8.747   |
+---+-+-+
| /tmp/t-part-rand.qcow2| 41.179  | 41.37   |
+---+-+-+
| /tmp/t-part-rand.qcow2 -w | 54.097  | 55.323  |
+---+-+-+
| /tmp/t-rand.qcow2 | 711.899 | 514.339 |
+---+-+-+
| /tmp/t-rand.qcow2 -w  | 546.259 | 642.114 |
+---+-+-+
| /tmp/t-reverse.qcow2  | 86.065  | 96.522  |
+---+-+-+
| /tmp/t-reverse.qcow2 -w   | 46.557  | 48.499  |
+---+-+-+
| /tmp/t-seq.qcow2  | 33.804  | 33.862  |
+---+-+-+
| /tmp/t-seq.qcow2 -w   | 34.299  | 34.233  |
+---+-+-+


The performance gain is obvious, especially for reads and especially on ssd.
For hdd there is a degradation in the reverse case, but this is the least
likely case in practice and does not seem critical.

How images are generated:

=== gen-writes ===
#!/usr/bin/env python
import random
import sys

size = 4 * 1024 * 1024 * 1024
block = 64 * 1024
block2 = 1024 * 1024

a

[Qemu-block] [PATCH v4 2/5] block: introduce aio task pool

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and maybe other places.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 include/block/aio_task.h |  54 +
 block/aio_task.c | 124 +++
 block/Makefile.objs  |   2 +
 3 files changed, 180 insertions(+)
 create mode 100644 include/block/aio_task.h
 create mode 100644 block/aio_task.c

diff --git a/include/block/aio_task.h b/include/block/aio_task.h
new file mode 100644
index 00..50bc1e1817
--- /dev/null
+++ b/include/block/aio_task.h
@@ -0,0 +1,54 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_AIO_TASK_H
+#define BLOCK_AIO_TASK_H
+
+#include "qemu/coroutine.h"
+
+typedef struct AioTaskPool AioTaskPool;
+typedef struct AioTask AioTask;
+typedef int coroutine_fn (*AioTaskFunc)(AioTask *task);
+struct AioTask {
+AioTaskPool *pool;
+AioTaskFunc func;
+int ret;
+};
+
+AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks);
+void aio_task_pool_free(AioTaskPool *);
+
+/* error code of failed task or 0 if all is OK */
+int aio_task_pool_status(AioTaskPool *pool);
+
+bool aio_task_pool_empty(AioTaskPool *pool);
+
+/* User provides filled @task, however task->pool will be set automatically */
+void coroutine_fn aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
+
+void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool);
+void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool);
+void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool);
+
+#endif /* BLOCK_AIO_TASK_H */
diff --git a/block/aio_task.c b/block/aio_task.c
new file mode 100644
index 00..88989fa248
--- /dev/null
+++ b/block/aio_task.c
@@ -0,0 +1,124 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/aio_task.h"
+
+struct AioTaskPool {
+Coroutine *main_co;
+int status;
+int max_busy_tasks;
+int busy_tasks;
+bool waiting;
+};
+
+static void coroutine_fn aio_task_co(void *opaque)
+{
+AioTask *task = opaque;
+AioTaskPool *pool = task->pool;
+
+assert(pool->busy_tasks < pool->max_busy_tasks);
+pool->busy_tasks++;
+
+task->ret = task->func(task);
+
+pool->busy_tasks--;
+
+if (task->ret < 0 && pool->status == 0) {
+pool->status = task->ret;
+}
+
+g_free(task);
+
+if (pool->waiting) {
+pool->waiting = false;
+aio_co_wake(pool->main_co);
+}
+}
+
+void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
+{
+assert(pool->busy_tasks > 0);
+assert(qemu_coroutine_self() == pool->main_co);
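
For reference, here is a minimal usage sketch of this API (illustration only,
not part of the patch; MyTask and my_io_step are made-up names, and the usual
QEMU headers such as qemu/osdep.h and block/aio_task.h are assumed):

typedef struct MyTask {
    AioTask task;       /* must come first: the pool frees tasks via this pointer */
    uint64_t offset;
    uint64_t bytes;
} MyTask;

static int coroutine_fn my_io_step(AioTask *t)
{
    MyTask *m = container_of(t, MyTask, task);

    /* ... do one bounded piece of I/O for [m->offset, m->offset + m->bytes) ... */
    (void)m;
    return 0;           /* or a negative errno on failure */
}

static int coroutine_fn my_io_loop(uint64_t offset, uint64_t bytes)
{
    AioTaskPool *pool = aio_task_pool_new(8);   /* at most 8 tasks in flight */
    int ret;

    while (bytes != 0 && aio_task_pool_status(pool) == 0) {
        uint64_t cur = MIN(bytes, 1024 * 1024);
        MyTask *m = g_new(MyTask, 1);           /* freed by the pool on completion */

        *m = (MyTask) {
            .task.func = my_io_step,
            .offset    = offset,
            .bytes     = cur,
        };
        aio_task_pool_start_task(pool, &m->task);

        offset += cur;
        bytes -= cur;
    }

    aio_task_pool_wait_all(pool);               /* wait for the remaining tasks */
    ret = aio_task_pool_status(pool);           /* 0, or the first task's error */
    aio_task_pool_free(pool);
    return ret;
}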

Re: [Qemu-block] [PATCH v6 35/42] block: Fix check_to_replace_node()

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
16.08.2019 16:30, Max Reitz wrote:
> On 16.08.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
>> 15.08.2019 20:01, Max Reitz wrote:
>>> On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote:
 09.08.2019 19:14, Max Reitz wrote:
> Currently, check_to_replace_node() only allows mirror to replace a node
> in the chain of the source node, and only if it is the first non-filter
> node below the source.  Well, technically, the idea is that you can
> exactly replace a quorum child by mirroring from quorum.
>
> This has (probably) two reasons:
> (1) We do not want to create loops.
> (2) @replaces and @device should have exactly the same content so
>replacing them does not cause visible data to change.
>
> This has two issues:
> (1) It is overly restrictive.  It is completely fine for @replaces to be
>a filter.
> (2) It is not restrictive enough.  You can create loops with this as
>follows:
>
> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
> $ qemu-system-x86_64 -qmp stdio
> {"execute": "qmp_capabilities"}
> {"execute": "object-add",
> "arguments": {"qom-type": "throttle-group", "id": "tg0"}}
> {"execute": "blockdev-add",
> "arguments": {
> "node-name": "source",
> "driver": "throttle",
> "throttle-group": "tg0",
> "file": {
> "node-name": "filtered",
> "driver": "qcow2",
> "file": {
> "driver": "file",
> "filename": "/tmp/source.qcow2"
> } } } }
> {"execute": "drive-mirror",
> "arguments": {
> "job-id": "mirror",
> "device": "source",
> "target": "/tmp/target.qcow2",
> "format": "qcow2",
> "node-name": "target",
> "sync" :"none",
> "replaces": "filtered"
> } }
> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
>
> And qemu crashes because of a stack overflow due to the loop being
> created (target's backing file is source, so when it replaces filtered,
> it points to itself through source).
>
> (blockdev-mirror can be broken similarly.)
>
> So let us make the checks for the two conditions above explicit, which
> makes the whole function exactly as restrictive as it needs to be.
>
> Signed-off-by: Max Reitz 
> ---
> include/block/block.h |  1 +
> block.c   | 83 +++
> blockdev.c| 34 --
> 3 files changed, 110 insertions(+), 8 deletions(-)
>
> diff --git a/include/block/block.h b/include/block/block.h
> index 6ba853fb90..8da706cd89 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
> *candidate);
> 
> /* check if a named node can be replaced when doing drive-mirror */
> BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
> +BlockDriverState *backing_bs,
> const char *node_name, Error 
> **errp);
> 
> /* async block I/O */
> diff --git a/block.c b/block.c
> index 915b80153c..4858d3e718 100644
> --- a/block.c
> +++ b/block.c
> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
> *candidate)
> return false;
> }
> 
> +static bool is_child_of(BlockDriverState *child, BlockDriverState 
> *parent)
> +{
> +BdrvChild *c;
> +
> +if (!parent) {
> +return false;
> +}
> +
> +QLIST_FOREACH(c, &parent->children, next) {
> +if (c->bs == child || is_child_of(child, c->bs)) {
> +return true;
> +}
> +}
> +
> +return false;
> +}
> +
> +/*
> + * Return true if there are only filters in [@top, @base).  Note that
> + * this may include quorum (which bdrv_chain_contains() cannot
> + * handle).

 More precisely: return true if there exists a chain of filters from top to
 base, or if top == base.

 I keep in mind backup-top filter:

 [backup-top]
 |  \target
>>>
>>> backup-top can’t be a filter if it has two children with different
>>> contents, though.
>>
>> Why? The target is a special child, unrelated to what is read/written over
>> backup-top. It's backup-top's own business.
>>
>>>
>>> (commit-top and mirror-top aren’t filters either.)
>>
>> Ahm, I missed something. They have is_filter = true and their children are
>> considered to be filtered-rw children in your series? Then what are they?
>> Format nodes? And how do they appear in backing chains then?
> 
> Er

Re: [Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Alberto Garcia
On Fri 16 Aug 2019 04:08:19 PM CEST, Kevin Wolf wrote:
>> And yes, the odd value in the 512KB row that we discussed last month
>> is due to this same bug:
>> 
>> https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html
>
> Hm... And suddenly it makes sense. :-)
>
> So I assume all of 512k/1024k/2048k actually perform better? Or is the
> effect negligible for 1024k/2048k?

The 512K case is the only one that performs better; my test image was
too small (40 GB) for the other cases.

Berto



Re: [Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Kevin Wolf
Am 16.08.2019 um 15:30 hat Alberto Garcia geschrieben:
> On Fri 16 Aug 2019 02:59:21 PM CEST, Kevin Wolf wrote:
> > The requirement so that this bug doesn't affect the user seems to be
> > that the image size is a multiple of 64k * 8k = 512 MB. Which means
> > that users are probably often lucky enough in practice.
> 
> Or rather: cluster_size^2 / 8, which, if my numbers are right:
> 
> |--------------+-------------|
> | Cluster size | Multiple of |
> |--------------+-------------|
> |         4 KB |        2 MB |
> |         8 KB |        8 MB |
> |        16 KB |       32 MB |
> |        32 KB |      128 MB |
> |        64 KB |      512 MB |
> |       128 KB |        2 GB |
> |       256 KB |        8 GB |
> |       512 KB |       32 GB |
> |      1024 KB |      128 GB |
> |      2048 KB |      512 GB |
> |--------------+-------------|
> 
> It gets trickier with larger clusters, but if you have a larger cluster
> size you probably have a very large image anyway, so yes, I also think
> that users are probably lucky enough in practice.

Yes, I assumed 64k clusters.

The other somewhat popular cluster size is probably 2 MB, where I think
image sizes that are not a multiple of 512 GB are rather likely...

> (also, the number of cache tables is always >= 2, so if the image size
> is less than twice those numbers then it's also safe)

Right. I already corrected my statement to include > 1024 MB in the Red
Hat Bugzilla (but still didn't consider other cluster sizes).

> And yes, the odd value in the 512KB row that we discussed last month
> is due to this same bug:
> 
> https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html

Hm... And suddenly it makes sense. :-)

So I assume all of 512k/1024k/2048k actually perform better? Or is the
effect negligible for 1024k/2048k?

Kevin



Re: [Qemu-block] [PATCH v6 35/42] block: Fix check_to_replace_node()

2019-08-16 Thread Max Reitz
On 16.08.19 13:01, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2019 20:01, Max Reitz wrote:
>> On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.08.2019 19:14, Max Reitz wrote:
 Currently, check_to_replace_node() only allows mirror to replace a node
 in the chain of the source node, and only if it is the first non-filter
 node below the source.  Well, technically, the idea is that you can
 exactly replace a quorum child by mirroring from quorum.

 This has (probably) two reasons:
 (1) We do not want to create loops.
 (2) @replaces and @device should have exactly the same content so
   replacing them does not cause visible data to change.

 This has two issues:
 (1) It is overly restrictive.  It is completely fine for @replaces to be
   a filter.
 (2) It is not restrictive enough.  You can create loops with this as
   follows:

 $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
 $ qemu-system-x86_64 -qmp stdio
 {"execute": "qmp_capabilities"}
 {"execute": "object-add",
"arguments": {"qom-type": "throttle-group", "id": "tg0"}}
 {"execute": "blockdev-add",
"arguments": {
"node-name": "source",
"driver": "throttle",
"throttle-group": "tg0",
"file": {
"node-name": "filtered",
"driver": "qcow2",
"file": {
"driver": "file",
"filename": "/tmp/source.qcow2"
} } } }
 {"execute": "drive-mirror",
"arguments": {
"job-id": "mirror",
"device": "source",
"target": "/tmp/target.qcow2",
"format": "qcow2",
"node-name": "target",
"sync" :"none",
"replaces": "filtered"
} }
 {"execute": "block-job-complete", "arguments": {"device": "mirror"}}

 And qemu crashes because of a stack overflow due to the loop being
 created (target's backing file is source, so when it replaces filtered,
 it points to itself through source).

 (blockdev-mirror can be broken similarly.)

 So let us make the checks for the two conditions above explicit, which
 makes the whole function exactly as restrictive as it needs to be.

 Signed-off-by: Max Reitz 
 ---
include/block/block.h |  1 +
block.c   | 83 +++
blockdev.c| 34 --
3 files changed, 110 insertions(+), 8 deletions(-)

 diff --git a/include/block/block.h b/include/block/block.h
 index 6ba853fb90..8da706cd89 100644
 --- a/include/block/block.h
 +++ b/include/block/block.h
 @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
 *candidate);

/* check if a named node can be replaced when doing drive-mirror */
BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
 +BlockDriverState *backing_bs,
const char *node_name, Error 
 **errp);

/* async block I/O */
 diff --git a/block.c b/block.c
 index 915b80153c..4858d3e718 100644
 --- a/block.c
 +++ b/block.c
 @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
 *candidate)
return false;
}

 +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
 +{
 +BdrvChild *c;
 +
 +if (!parent) {
 +return false;
 +}
 +
 +QLIST_FOREACH(c, &parent->children, next) {
 +if (c->bs == child || is_child_of(child, c->bs)) {
 +return true;
 +}
 +}
 +
 +return false;
 +}
 +
 +/*
 + * Return true if there are only filters in [@top, @base).  Note that
 + * this may include quorum (which bdrv_chain_contains() cannot
 + * handle).
>>>
>>> More precisely: return true if there exists a chain of filters from top to
>>> base, or if top == base.
>>>
>>> I keep in mind backup-top filter:
>>>
>>> [backup-top]
>>> |  \target
>>
>> backup-top can’t be a filter if it has two children with different
>> contents, though.
> 
> Why? The target is a special child, unrelated to what is read/written over
> backup-top. It's backup-top's own business.
> 
>>
>> (commit-top and mirror-top aren’t filters either.)
> 
> Ahm, I missed something. They have is_filter = true and their children are
> considered to be filtered-rw children in your series? Then what are they?
> Format nodes? And how do they appear in backing chains then?

Er, right, I remember, I made them filters in patch 1 of this series. m( :-)

But the chain would still be unique, in a sense, because backup-top only
has one filtered child, so you could go down the chain with
bdrv_f

Re: [Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Alberto Garcia
On Fri 16 Aug 2019 02:59:21 PM CEST, Kevin Wolf wrote:
> The requirement so that this bug doesn't affect the user seems to be
> that the image size is a multiple of 64k * 8k = 512 MB. Which means
> that users are probably often lucky enough in practice.

Or rather: cluster_size^2 / 8, which, if my numbers are right:

|--------------+-------------|
| Cluster size | Multiple of |
|--------------+-------------|
|         4 KB |        2 MB |
|         8 KB |        8 MB |
|        16 KB |       32 MB |
|        32 KB |      128 MB |
|        64 KB |      512 MB |
|       128 KB |        2 GB |
|       256 KB |        8 GB |
|       512 KB |       32 GB |
|      1024 KB |      128 GB |
|      2048 KB |      512 GB |
|--------------+-------------|

It gets trickier with larger clusters, but if you have a larger cluster
size you probably have a very large image anyway, so yes, I also think
that users are probably lucky enough in practice.

(also, the number of cache tables is always >= 2, so if the image size
is less than twice those numbers then it's also safe)
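
For reference, a small standalone program (illustration only, not part of any
patch) that reproduces the numbers in the table above:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    for (uint64_t cs = 4 * 1024; cs <= 2048 * 1024; cs *= 2) {
        /* One L2 table is one cluster and holds cs / 8 eight-byte entries,
         * so it covers cs * cs / 8 bytes of virtual disk. */
        uint64_t covered = cs / 8 * cs;

        if (covered >= 1024 * 1024 * 1024ULL) {
            printf("%4" PRIu64 " KB clusters -> multiple of %3" PRIu64 " GB\n",
                   cs / 1024, covered / (1024 * 1024 * 1024ULL));
        } else {
            printf("%4" PRIu64 " KB clusters -> multiple of %3" PRIu64 " MB\n",
                   cs / 1024, covered / (1024 * 1024));
        }
    }
    return 0;
}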

And yes, the odd value in the 512KB row that we discussed last month
is due to this same bug:

https://lists.gnu.org/archive/html/qemu-block/2019-07/msg00496.html

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qapi: deprecate implicit filters

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
16.08.2019 15:33, Markus Armbruster wrote:
> Kevin Wolf  writes:
> 
>> Am 15.08.2019 um 21:24 hat Markus Armbruster geschrieben:
> [...]
>>> Let's assume all libvirt ever does with deprecation notices is logging
>>> them.  Would that solve the problem of reliably alerting libvirt
>>> developers to deprecation issues?  Nope.  But it could help
>>> occasionally.
>>
>> I'm not saying that deprecation notices would hurt, just that they
>> probably won't solve problem alone.
> 
> No argument.
> 
>> Crashing if --future is given and logging otherwise seems reasonable
>> enough to me. Whether we need to wire up a new deprecation mechanism in
>> QMP for the logging or if we can just keep printing to stderr is
>> debatable. stderr already ends up in a log file, a QMP extension would
>> require new libvirt code. If libvirt would log deprecation notices more
>> prominently, or use the information for tainting or any other kind of
>> processing, a dedicated QMP mechanism could be justified.
> 
> I'd like to start with two tasks:
> 
> * A CLI option to configure what to do on use of a deprecated feature.
> 
>We currently warn.  We want to be able to crash instead.  Silencing
>the warnings might be useful.  Turning them into errors might be
>useful.
> 
>The existing ad hoc warnings need to be replaced by a call of a common
>function that implements the configurable behavior.
> 
> * QAPI feature flag "deprecated", for introspectable deprecation, and
>without ad hoc code.
> 
> Then see whether our users need more.
> 

Crashing is obviously useful for libvirt developers: just enable
crash-on-deprecated in all testing environments and we will most probably
not miss such a case.

For QAPI I doubt it is really needed. Implementing code in libvirt which
checks whether a command (or its parameter, or its parameter's
"optionality") is deprecated? That's hard, and what should libvirt report
to the final user? It becomes a kind of synthetic error in libvirt code,
like

...
log_error("We are going to divide by zero. It's a bug, please report it to
developers!");
x = a / 0;
...

It's simpler to fix the second line than to implement a special mechanism,
including a protocol specification, to report such a case.

I exaggerate with this example, of course, but I doubt that implementing a
special protocol for it is worth doing. And I think that notifying libvirt
by email (as Peter said) and providing a "crash-on-deprecated" option in
QEMU are enough for libvirt developers to prevent and fix uses of
deprecated things.

In other words, I don't see why reporting deprecated feature usage in
libvirt is better than doing it in QEMU (by warning, error or crash), and
in QEMU it's much simpler and doesn't need a QAPI protocol extension.

(I'm sorry if I'm repeating arguments that were already made; I haven't
read the whole thread.)

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Kevin Wolf
Am 16.08.2019 um 14:17 hat Alberto Garcia geschrieben:
> The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
> larger than the maximum amount of L2 metadata that the image can have.
> For example: with 64 KB clusters the user would need a qcow2 image
> with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
> 
> Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
> we forbid the L2 cache to become larger than the maximum amount of L2
> metadata for the image, calculated using this formula:
> 
> uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> The problem with this formula is that the result should be rounded up
> to the cluster size because an L2 table on disk always takes one full
> cluster.
> 
> For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
> 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
> the last 32 KB of those are not going to be used.
> 
> However QEMU rounds the numbers down and only creates 2 cache tables
> (128 KB), which is not enough for the image.
> 
> A quick test doing 4KB random writes on a 1280 MB image gives me
> around 500 IOPS, while with the correct cache size I get 16K IOPS.
> 
> Signed-off-by: Alberto Garcia 

Hm, this is bad. :-(

The requirement for this bug not to affect the user seems to be that the
image size is a multiple of 64k * 8k = 512 MB. Which means that users are
probably often lucky enough in practice.

I'll Cc: qemu-stable anyway.

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Alberto Garcia
Cc qemu-stable

This bug means that under certain conditions it's impossible to
create a cache large enough for the image, resulting in reduced I/O
performance.

On Fri, Aug 16, 2019 at 03:17:42PM +0300, Alberto Garcia wrote:
> The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
> larger than the maximum amount of L2 metadata that the image can have.
> For example: with 64 KB clusters the user would need a qcow2 image
> with a virtual size of 256 GB in order to have 32 MB of L2 metadata.
> 
> Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
> we forbid the L2 cache to become larger than the maximum amount of L2
> metadata for the image, calculated using this formula:
> 
> uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> 
> The problem with this formula is that the result should be rounded up
> to the cluster size because an L2 table on disk always takes one full
> cluster.
> 
> For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
> 160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
> the last 32 KB of those are not going to be used.
> 
> However QEMU rounds the numbers down and only creates 2 cache tables
> (128 KB), which is not enough for the image.
> 
> A quick test doing 4KB random writes on a 1280 MB image gives me
> around 500 IOPS, while with the correct cache size I get 16K IOPS.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 039bdc2f7e..865839682c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -826,7 +826,11 @@ static void read_cache_sizes(BlockDriverState *bs, 
> QemuOpts *opts,
>  bool l2_cache_entry_size_set;
>  int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
>  uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
> -uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
> +uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, 
> s->cluster_size);
> +/* An L2 table is always one cluster in size so the max cache size
> + * should be a multiple of the cluster size. */
> +uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
> + s->cluster_size);
>  
>  combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
>  l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
> -- 
> 2.20.1



Re: [Qemu-block] [Qemu-devel] [PATCH 2/2] qapi: deprecate implicit filters

2019-08-16 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 15.08.2019 um 21:24 hat Markus Armbruster geschrieben:
[...]
>> Let's assume all libvirt ever does with deprecation notices is logging
>> them.  Would that solve the problem of reliably alerting libvirt
>> developers to deprecation issues?  Nope.  But it could help
>> occasionally.
>
> I'm not saying that deprecation notices would hurt, just that they
> probably won't solve problem alone.

No argument.

> Crashing if --future is given and logging otherwise seems reasonable
> enough to me. Whether we need to wire up a new deprecation mechanism in
> QMP for the logging or if we can just keep printing to stderr is
> debatable. stderr already ends up in a log file, a QMP extension would
> require new libvirt code. If libvirt would log deprecation notices more
> prominently, or use the information for tainting or any other kind of
> processing, a dedicated QMP mechanism could be justified.

I'd like to start with two tasks:

* A CLI option to configure what to do on use of a deprecated feature.

  We currently warn.  We want to be able to crash instead.  Silencing
  the warnings might be useful.  Turning them into errors might be
  useful.

  The existing ad hoc warnings need to be replaced by a call of a common
  function that implements the configurable behavior.

* QAPI feature flag "deprecated", for introspectable deprecation, and
  without ad hoc code.

Then see whether our users need more.



[Qemu-block] [PATCH] qcow2: Fix the calculation of the maximum L2 cache size

2019-08-16 Thread Alberto Garcia
The size of the qcow2 L2 cache defaults to 32 MB, which can be easily
larger than the maximum amount of L2 metadata that the image can have.
For example: with 64 KB clusters the user would need a qcow2 image
with a virtual size of 256 GB in order to have 32 MB of L2 metadata.

Because of that, since commit b749562d9822d14ef69c9eaa5f85903010b86c30
we forbid the L2 cache to become larger than the maximum amount of L2
metadata for the image, calculated using this formula:

uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);

The problem with this formula is that the result should be rounded up
to the cluster size because an L2 table on disk always takes one full
cluster.

For example, a 1280 MB qcow2 image with 64 KB clusters needs exactly
160 KB of L2 metadata, but we need 192 KB on disk (3 clusters) even if
the last 32 KB of those are not going to be used.

However QEMU rounds the numbers down and only creates 2 cache tables
(128 KB), which is not enough for the image.

A quick test doing 4KB random writes on a 1280 MB image gives me
around 500 IOPS, while with the correct cache size I get 16K IOPS.
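
To make the rounding concrete, here is a small standalone calculation (not
part of the patch, only an illustration) that redoes the 1280 MB / 64 KB
example above with the old and the new formula:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define ROUND_UP(n, d) (DIV_ROUND_UP(n, d) * (d))

int main(void)
{
    uint64_t cluster_size = 64 * 1024;
    uint64_t virtual_disk_size = 1280ULL * 1024 * 1024;

    /* Old formula: rounds down to 160 KB, i.e. only 2 cache tables of 64 KB */
    uint64_t old_max = virtual_disk_size / (cluster_size / 8);

    /* New formula: one full cluster per L2 table, 192 KB, i.e. 3 tables */
    uint64_t l2_entries = DIV_ROUND_UP(virtual_disk_size, cluster_size);
    uint64_t new_max = ROUND_UP(l2_entries * sizeof(uint64_t), cluster_size);

    printf("old max_l2_cache = %" PRIu64 " KB\n", old_max / 1024); /* 160 */
    printf("new max_l2_cache = %" PRIu64 " KB\n", new_max / 1024); /* 192 */
    return 0;
}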

Signed-off-by: Alberto Garcia 
---
 block/qcow2.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 039bdc2f7e..865839682c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -826,7 +826,11 @@ static void read_cache_sizes(BlockDriverState *bs, 
QemuOpts *opts,
 bool l2_cache_entry_size_set;
 int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
 uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
+/* An L2 table is always one cluster in size so the max cache size
+ * should be a multiple of the cluster size. */
+uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+ s->cluster_size);
 
 combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
 l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] [PATCH v7 00/42] Invert Endian bit in SPARCv9 MMU TTE

2019-08-16 Thread David Gibson
On Fri, Aug 16, 2019 at 11:58:05AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Tony,
> 
> On 8/16/19 8:28 AM, tony.ngu...@bt.com wrote:
> > This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE.
> > 
> > v7:
> [...]
> > - Re-declared many native endian devices as little or big endian. This is 
> > why
> >   v7 has +16 patches.
> 
> Why are you doing that? What is the rational?
> 
> Anyhow if this not required by your series, you should split it out of
> it, and send it on your principal changes are merged.
> I'm worried because this these new patches involve many subsystems (thus
> maintainers) and reviewing them will now take a fair amount of time.
> 
> > For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> > targets from the set of target/hw/*/device.o.
> >
> > If the set of targets are all little or all big endian, re-declare
> > the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> > respectively.
> 
> If only little endian targets use a device, that doesn't mean the device
> is designed in little endian...
> 
> Then if a big endian target plan to use this device, it will require
> more work and you might have introduced regressions...

Uh.. only if they make the version of the device on a big endian
target big endian.  Which is a terrible idea - if you know a hardware
designer planning to do this, please slap them.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] [PATCH v7 00/42] Invert Endian bit in SPARCv9 MMU TTE

2019-08-16 Thread Peter Maydell
On Fri, 16 Aug 2019 at 12:37,  wrote:
>
> Hi Phillippe,
>
> On 8/16/19 7:58 PM, Philippe Mathieu-Daudé wrote:
> >On 8/16/19 8:28 AM, tony.ngu...@bt.com wrote:
> >> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> >> targets from the set of target/hw/*/device.o.
> >>
> >> If the set of targets are all little or all big endian, re-declare
> >> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> >> respectively.
> >
> >If only little endian targets use a device, that doesn't mean the device
> >is designed in little endian...
> >
> >Then if a big endian target plan to use this device, it will require
> >more work and you might have introduced regressions...
> >
> >I'm not sure this is a safe move.
> >
> >> This *naive* deduction may result in genuinely native endian devices
> >> being incorrectly declared as little or big endian, but should not
> >> introduce regressions for current targets.
> >
>
> Roger. Evidently too naive. TBH, most devices I've never heard of...

OTOH it's worth noting that it's quite likely that most of
the implementations of these DEVICE_NATIVE_ENDIAN devices
picked it in an equally naive way, by just copying some other
device's code...

thanks
-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH v7 00/42] Invert Endian bit in SPARCv9 MMU TTE

2019-08-16 Thread tony.nguyen
Hi Phillippe,

On 8/16/19 7:58 PM, Philippe Mathieu-Daudé wrote:
>On 8/16/19 8:28 AM, tony.ngu...@bt.com wrote:
>> This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE.
>>
>> v7:
>[...]
>> - Re-declared many native endian devices as little or big endian. This is why
>>   v7 has +16 patches.
>
>Why are you doing that? What is the rational?

While collapsing the byte swaps, it was suggested in patch #11 of v5 that
consistent use of MemOp simplified endian comparisons. This led to the
deprecation of enum device_endian in favour of MemOp.

As MO_TE is conditional upon NEED_CPU_H, the s/DEVICE_NATIVE_ENDIAN/MO_TE/
change required moving some device object files from common-obj-* to obj-*.
In patch #15 of v6 Paolo noted that most devices should not have been
DEVICE_NATIVE_ENDIAN and hinted at a clean-up.

The +16 patches in v7 are that clean-up effort.

>Anyhow if this not required by your series, you should split it out of
>it, and send it on your principal changes are merged.
>I'm worried because this these new patches involve many subsystems (thus
>maintainers) and reviewing them will now take a fair amount of time.

Yes, let's split these patches out. They are very much a tangent to the
series' purpose.

>> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
>> targets from the set of target/hw/*/device.o.
>>
>> If the set of targets are all little or all big endian, re-declare
>> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
>> respectively.
>
>If only little endian targets use a device, that doesn't mean the device
>is designed in little endian...
>
>Then if a big endian target plan to use this device, it will require
>more work and you might have introduced regressions...
>
>I'm not sure this is a safe move.
>
>> This *naive* deduction may result in genuinely native endian devices
>> being incorrectly declared as little or big endian, but should not
>> introduce regressions for current targets.
>

Roger. Evidently too naive. TBH, most devices I've never heard of...

Regards,
Tony


Re: [Qemu-block] [PATCH v4] blockjob: drain all job nodes in block_job_drain

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
15.08.2019 16:15, Max Reitz wrote:
> On 02.08.19 11:52, Vladimir Sementsov-Ogievskiy wrote:
>> Instead of draining additional nodes in each job code, let's do it in
>> common block_job_drain, draining just all job's children.
>> BlockJobDriver.drain becomes unused, so, drop it at all.
>>
>> It's also a first step to finally get rid of blockjob->blk.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
> 
> What do you think of Kevin’s comment that draining the block nodes may
> actually be entirely unnecessary?
> 

Hmmm, I'm afraid there's nothing more I can do than try it, and if the
iotests show no real problems I'll send a patch.


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v6 35/42] block: Fix check_to_replace_node()

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
15.08.2019 20:01, Max Reitz wrote:
> On 15.08.19 17:21, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 19:14, Max Reitz wrote:
>>> Currently, check_to_replace_node() only allows mirror to replace a node
>>> in the chain of the source node, and only if it is the first non-filter
>>> node below the source.  Well, technically, the idea is that you can
>>> exactly replace a quorum child by mirroring from quorum.
>>>
>>> This has (probably) two reasons:
>>> (1) We do not want to create loops.
>>> (2) @replaces and @device should have exactly the same content so
>>>   replacing them does not cause visible data to change.
>>>
>>> This has two issues:
>>> (1) It is overly restrictive.  It is completely fine for @replaces to be
>>>   a filter.
>>> (2) It is not restrictive enough.  You can create loops with this as
>>>   follows:
>>>
>>> $ qemu-img create -f qcow2 /tmp/source.qcow2 64M
>>> $ qemu-system-x86_64 -qmp stdio
>>> {"execute": "qmp_capabilities"}
>>> {"execute": "object-add",
>>>"arguments": {"qom-type": "throttle-group", "id": "tg0"}}
>>> {"execute": "blockdev-add",
>>>"arguments": {
>>>"node-name": "source",
>>>"driver": "throttle",
>>>"throttle-group": "tg0",
>>>"file": {
>>>"node-name": "filtered",
>>>"driver": "qcow2",
>>>"file": {
>>>"driver": "file",
>>>"filename": "/tmp/source.qcow2"
>>>} } } }
>>> {"execute": "drive-mirror",
>>>"arguments": {
>>>"job-id": "mirror",
>>>"device": "source",
>>>"target": "/tmp/target.qcow2",
>>>"format": "qcow2",
>>>"node-name": "target",
>>>"sync" :"none",
>>>"replaces": "filtered"
>>>} }
>>> {"execute": "block-job-complete", "arguments": {"device": "mirror"}}
>>>
>>> And qemu crashes because of a stack overflow due to the loop being
>>> created (target's backing file is source, so when it replaces filtered,
>>> it points to itself through source).
>>>
>>> (blockdev-mirror can be broken similarly.)
>>>
>>> So let us make the checks for the two conditions above explicit, which
>>> makes the whole function exactly as restrictive as it needs to be.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---
>>>include/block/block.h |  1 +
>>>block.c   | 83 +++
>>>blockdev.c| 34 --
>>>3 files changed, 110 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 6ba853fb90..8da706cd89 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -404,6 +404,7 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>>> *candidate);
>>>
>>>/* check if a named node can be replaced when doing drive-mirror */
>>>BlockDriverState *check_to_replace_node(BlockDriverState *parent_bs,
>>> +BlockDriverState *backing_bs,
>>>const char *node_name, Error 
>>> **errp);
>>>
>>>/* async block I/O */
>>> diff --git a/block.c b/block.c
>>> index 915b80153c..4858d3e718 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -6290,7 +6290,59 @@ bool bdrv_is_first_non_filter(BlockDriverState 
>>> *candidate)
>>>return false;
>>>}
>>>
>>> +static bool is_child_of(BlockDriverState *child, BlockDriverState *parent)
>>> +{
>>> +BdrvChild *c;
>>> +
>>> +if (!parent) {
>>> +return false;
>>> +}
>>> +
>>> +QLIST_FOREACH(c, &parent->children, next) {
>>> +if (c->bs == child || is_child_of(child, c->bs)) {
>>> +return true;
>>> +}
>>> +}
>>> +
>>> +return false;
>>> +}
>>> +
>>> +/*
>>> + * Return true if there are only filters in [@top, @base).  Note that
>>> + * this may include quorum (which bdrv_chain_contains() cannot
>>> + * handle).
>>
>> More precisely: return true if there exists a chain of filters from top to
>> base, or if top == base.
>>
>> I keep in mind backup-top filter:
>>
>> [backup-top]
>> |  \target
> 
> backup-top can’t be a filter if it has two children with different
> contents, though.

Why? target is a special child, unrelated to what is read/written through
backup-top. It's backup-top's own business.

> 
> (commit-top and mirror-top aren’t filters either.)

Ah, I missed something. They have is_filter = true and their children are
considered to be filtered-rw children in your series? And then, what are
they? Format nodes? And how do they appear in backing chains then?

> 
> That’s why there must be a unique chain [@top, @base).
> 
> I should probably note that it will return true if top == base, though, yes.
> 
>> |backing>[target]
>> V/
>> [source]  <-/backing
>>
>>> + */
>>> +static bool is_filtered_child(BlockDriverState *top, BlockDriverState 
>>> *base)
>>> +{
>>> +BdrvChild *c;
>>> +
>>> +if (!top) {
>>> +return false;
>>>

Re: [Qemu-block] [PATCH] nbd: Advertise multi-conn for shared read-only connections

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
16.08.2019 13:23, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2019 21:50, Eric Blake wrote:
>> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
>> advertised when the server promises cache consistency between
>> simultaneous clients (basically, rules that determine what FUA and
>> flush from one client are able to guarantee for reads from another
>> client).  When we don't permit simultaneous clients (such as qemu-nbd
>> without -e), the bit makes no sense; and for writable images, we
>> probably have a lot more work before we can declare that actions from
>> one client are cache-consistent with actions from another.  But for
>> read-only images, where flush isn't changing any data, we might as
>> well advertise multi-conn support.  What's more, advertisement of the
>> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
>> use, where a second connection will succeed rather than hang until the
>> first client goes away.
>>
>> This patch affects qemu as server in advertising the bit.  We may want
>> to consider patches to qemu as client to attempt parallel connections
>> for higher throughput by spreading the load over those connections
>> when a server advertises multi-conn, but for now sticking to one
>> connection per nbd:// BDS is okay.
>>
>> See also: https://bugzilla.redhat.com/1708300
>> Signed-off-by: Eric Blake 
>> ---
>>   docs/interop/nbd.txt | 1 +
>>   include/block/nbd.h  | 2 +-
>>   blockdev-nbd.c   | 2 +-
>>   nbd/server.c | 4 +++-
>>   qemu-nbd.c   | 2 +-
>>   5 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
>> index fc64473e02b2..6dfec7f47647 100644
>> --- a/docs/interop/nbd.txt
>> +++ b/docs/interop/nbd.txt
>> @@ -53,3 +53,4 @@ the operation of that feature.
>>   * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>>   * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>>   NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
>> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 7b36d672f046..991fd52a5134 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
>>
>>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>     uint64_t size, const char *name, const char 
>> *desc,
>> -  const char *bitmap, uint16_t nbdflags,
>> +  const char *bitmap, uint16_t nbdflags, bool 
>> shared,
>>     void (*close)(NBDExport *), bool writethrough,
>>     BlockBackend *on_eject_blk, Error **errp);
>>   void nbd_export_close(NBDExport *exp);
>> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
>> index 66eebab31875..e5d228771292 100644
>> --- a/blockdev-nbd.c
>> +++ b/blockdev-nbd.c
>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool 
>> has_name, const char *name,
>>   }
>>
>>   exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>> - writable ? 0 : NBD_FLAG_READ_ONLY,
>> + writable ? 0 : NBD_FLAG_READ_ONLY, true,
> 
> s/true/!writable ?

Oh, I see, John already noticed this; it's checked in nbd_export_new() anyway.

> 
>>    NULL, false, on_eject_blk, errp);
>>   if (!exp) {
>>   return;
>> diff --git a/nbd/server.c b/nbd/server.c
>> index a2cf085f7635..a602d85070ff 100644
>> --- a/nbd/server.c
>> +++ b/nbd/server.c
>> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
>>
>>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>     uint64_t size, const char *name, const char 
>> *desc,
>> -  const char *bitmap, uint16_t nbdflags,
>> +  const char *bitmap, uint16_t nbdflags, bool 
>> shared,
>>     void (*close)(NBDExport *), bool writethrough,
>>     BlockBackend *on_eject_blk, Error **errp)
>>   {
>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, 
>> uint64_t dev_offset,
>>   perm = BLK_PERM_CONSISTENT_READ;
>>   if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>   perm |= BLK_PERM_WRITE;
>> +    } else if (shared) {
>> +    nbdflags |= NBD_FLAG_CAN_MULTI_CONN;

For me it looks a bit strange: we already have an nbdflags parameter for
nbd_export_new(), so why add a separate boolean to pass one of the nbdflags
flags?

Also, for qemu-nbd, shouldn't we allow -e only together with -r?
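
For illustration, roughly what I mean at the blockdev-nbd.c call site
(untested sketch, keeping nbd_export_new()'s old signature):

/* Untested sketch: the caller sets the flag in nbdflags itself instead of
 * passing an extra bool down to nbd_export_new(). */
uint16_t flags = writable ? 0 : (NBD_FLAG_READ_ONLY | NBD_FLAG_CAN_MULTI_CONN);

exp = nbd_export_new(bs, 0, len, name, NULL, bitmap, flags,
                     NULL, false, on_eject_blk, errp);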

>>   }
>>   blk = blk_new(bdrv_get_aio_context(bs), perm,
>>     BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 049645491dab..55f5ceaf5c92 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -1173,7 +1173,7 @@ int main(int argc, char 

Re: [Qemu-block] [PATCH] nbd: Advertise multi-conn for shared read-only connections

2019-08-16 Thread Vladimir Sementsov-Ogievskiy
15.08.2019 21:50, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).  When we don't permit simultaneous clients (such as qemu-nbd
> without -e), the bit makes no sense; and for writable images, we
> probably have a lot more work before we can declare that actions from
> one client are cache-consistent with actions from another.  But for
> read-only images, where flush isn't changing any data, we might as
> well advertise multi-conn support.  What's more, advertisement of the
> bit makes it easier for clients to determine if 'qemu-nbd -e' was in
> use, where a second connection will succeed rather than hang until the
> first client goes away.
> 
> This patch affects qemu as server in advertising the bit.  We may want
> to consider patches to qemu as client to attempt parallel connections
> for higher throughput by spreading the load over those connections
> when a server advertises multi-conn, but for now sticking to one
> connection per nbd:// BDS is okay.
> 
> See also: https://bugzilla.redhat.com/1708300
> Signed-off-by: Eric Blake 
> ---
>   docs/interop/nbd.txt | 1 +
>   include/block/nbd.h  | 2 +-
>   blockdev-nbd.c   | 2 +-
>   nbd/server.c | 4 +++-
>   qemu-nbd.c   | 2 +-
>   5 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
> index fc64473e02b2..6dfec7f47647 100644
> --- a/docs/interop/nbd.txt
> +++ b/docs/interop/nbd.txt
> @@ -53,3 +53,4 @@ the operation of that feature.
>   * 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
>   * 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
>   NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
> +* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 7b36d672f046..991fd52a5134 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
> 
>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> uint64_t size, const char *name, const char *desc,
> -  const char *bitmap, uint16_t nbdflags,
> +  const char *bitmap, uint16_t nbdflags, bool shared,
> void (*close)(NBDExport *), bool writethrough,
> BlockBackend *on_eject_blk, Error **errp);
>   void nbd_export_close(NBDExport *exp);
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index 66eebab31875..e5d228771292 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool 
> has_name, const char *name,
>   }
> 
>   exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> - writable ? 0 : NBD_FLAG_READ_ONLY,
> + writable ? 0 : NBD_FLAG_READ_ONLY, true,

s/true/!writable ?

>NULL, false, on_eject_blk, errp);
>   if (!exp) {
>   return;
> diff --git a/nbd/server.c b/nbd/server.c
> index a2cf085f7635..a602d85070ff 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
> 
>   NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> uint64_t size, const char *name, const char *desc,
> -  const char *bitmap, uint16_t nbdflags,
> +  const char *bitmap, uint16_t nbdflags, bool shared,
> void (*close)(NBDExport *), bool writethrough,
> BlockBackend *on_eject_blk, Error **errp)
>   {
> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, 
> uint64_t dev_offset,
>   perm = BLK_PERM_CONSISTENT_READ;
>   if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>   perm |= BLK_PERM_WRITE;
> +} else if (shared) {
> +nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>   }
>   blk = blk_new(bdrv_get_aio_context(bs), perm,
> BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 049645491dab..55f5ceaf5c92 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
>   }
> 
>   export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> -export_description, bitmap, nbdflags,
> +export_description, bitmap, nbdflags, shared > 1,
>   nbd_export_closed, writethrough, NULL,
>   &error_fatal);
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] [PULL 00/16] Block layer patches

2019-08-16 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20190816093439.14262-1-kw...@redhat.com/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/target/arm/sve_helper.o
  CC  lm32-softmmu/hw/input/milkymist-softusb.o
  CC  lm32-softmmu/hw/misc/milkymist-hpdmc.o
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:209: qemu-system-arm] Error 1
make: *** [Makefile:472: arm-softmmu/all] Error 2
make: *** Waiting for unfinished jobs


The full log is available at
http://patchew.org/logs/20190816093439.14262-1-kw...@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-block] [qemu-s390x] [Qemu-devel] [PATCH v7 33/42] exec: Replace device_endian with MemOp

2019-08-16 Thread Thomas Huth
On 8/16/19 9:37 AM, tony.ngu...@bt.com wrote:
> Simplify endianness comparisons with consistent use of the more
> expressive MemOp.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Tony Nguyen 
> Reviewed-by: Richard Henderson 
> Acked-by: David Gibson 

This patch is *huge*, more than 800kB. It keeps getting stuck in the
filter of the qemu-s390x list each time you send it. Please:

1) Try to break it up in more digestible pieces, e.g. change only one
subsystem at a time (this is also better reviewable by people who are
interested in one area)

2) Do not send HTML emails to the mailing list.

 Thanks,
  Thomas



Re: [Qemu-block] [Qemu-devel] [PATCH v7 27/42] hw/pci-host: Declare device little or big endian

2019-08-16 Thread Philippe Mathieu-Daudé
On 8/16/19 9:35 AM, tony.ngu...@bt.com wrote:
> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> targets from the set of target/hw/*/device.o.
> 
> If the set of targets are all little or all big endian, re-declare
> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> respectively.
> 
> This *naive* deduction may result in genuinely native endian devices
> being incorrectly declared as little or big endian, but should not
> introduce regressions for current targets.
> 
> These devices should be re-declared as DEVICE_NATIVE_ENDIAN if 1) it
> has a new target with an opposite endian or 2) someone informed knows
> better =)
> 
> Signed-off-by: Tony Nguyen 
> ---
>  hw/pci-host/q35.c       | 2 +-
>  hw/pci-host/versatile.c | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 0a010be..fd20f72 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -288,7 +288,7 @@ static void tseg_blackhole_write(void *opaque,
> hwaddr addr, uint64_t val,
>  static const MemoryRegionOps tseg_blackhole_ops = {
>      .read = tseg_blackhole_read,
>      .write = tseg_blackhole_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

OK.

>      .valid.min_access_size = 1,
>      .valid.max_access_size = 4,
>      .impl.min_access_size = 4,
> diff --git a/hw/pci-host/versatile.c b/hw/pci-host/versatile.c
> index 791b321..e7017f3 100644
> --- a/hw/pci-host/versatile.c
> +++ b/hw/pci-host/versatile.c
> @@ -240,7 +240,7 @@ static uint64_t pci_vpb_reg_read(void *opaque,
> hwaddr addr,
>  static const MemoryRegionOps pci_vpb_reg_ops = {
>      .read = pci_vpb_reg_read,
>      .write = pci_vpb_reg_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
>      .valid = {
>          .min_access_size = 4,
>          .max_access_size = 4,
> @@ -306,7 +306,7 @@ static uint64_t pci_vpb_config_read(void *opaque,
> hwaddr addr,
>  static const MemoryRegionOps pci_vpb_config_ops = {
>      .read = pci_vpb_config_read,
>      .write = pci_vpb_config_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

Eh, hard to say, PCI is not clear about endianness...

>  };
>  
>  static int pci_vpb_map_irq(PCIDevice *d, int irq_num)
> -- 
> 1.8.3.1
> 
>
> 
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v7 25/42] hw/misc: Declare device little or big endian

2019-08-16 Thread Philippe Mathieu-Daudé
On 8/16/19 9:34 AM, tony.ngu...@bt.com wrote:
> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> targets from the set of target/hw/*/device.o.
> 
> If the set of targets are all little or all big endian, re-declare
> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> respectively.
> 
> This *naive* deduction may result in genuinely native endian devices
> being incorrectly declared as little or big endian, but should not
> introduce regressions for current targets.
> 
> These devices should be re-declared as DEVICE_NATIVE_ENDIAN if 1) it
> has a new target with an opposite endian or 2) someone informed knows
> better =)
> 
> Signed-off-by: Tony Nguyen 
> ---
>  hw/misc/a9scu.c    | 2 +-
>  hw/misc/applesmc.c | 6 +++---
>  hw/misc/arm11scu.c | 2 +-
>  hw/misc/arm_l2x0.c | 2 +-
>  hw/misc/puv3_pm.c  | 2 +-
>  5 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/misc/a9scu.c b/hw/misc/a9scu.c
> index 4307f00..3de8cd3 100644
> --- a/hw/misc/a9scu.c
> +++ b/hw/misc/a9scu.c
> @@ -94,7 +94,7 @@ static void a9_scu_write(void *opaque, hwaddr offset,
>  static const MemoryRegionOps a9_scu_ops = {
>      .read = a9_scu_read,
>      .write = a9_scu_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

Uh, I doubt that.

>  };
>  
>  static void a9_scu_reset(DeviceState *dev)
> diff --git a/hw/misc/applesmc.c b/hw/misc/applesmc.c
> index 2d7eb3c..6c91f29 100644
> --- a/hw/misc/applesmc.c
> +++ b/hw/misc/applesmc.c
> @@ -285,7 +285,7 @@ static void qdev_applesmc_isa_reset(DeviceState *dev)
>  static const MemoryRegionOps applesmc_data_io_ops = {
>      .write = applesmc_io_data_write,
>      .read = applesmc_io_data_read,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
>      .impl = {
>          .min_access_size = 1,
>          .max_access_size = 1,
> @@ -295,7 +295,7 @@ static const MemoryRegionOps applesmc_data_io_ops = {
>  static const MemoryRegionOps applesmc_cmd_io_ops = {
>      .write = applesmc_io_cmd_write,
>      .read = applesmc_io_cmd_read,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
>      .impl = {
>          .min_access_size = 1,
>          .max_access_size = 1,
> @@ -305,7 +305,7 @@ static const MemoryRegionOps applesmc_cmd_io_ops = {
>  static const MemoryRegionOps applesmc_err_io_ops = {
>      .write = applesmc_io_err_write,
>      .read = applesmc_io_err_read,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,
>      .impl = {
>          .min_access_size = 1,
>          .max_access_size = 1,

Being ioport, this one might be OK.

> diff --git a/hw/misc/arm11scu.c b/hw/misc/arm11scu.c
> index 84275df..59fd7c0 100644
> --- a/hw/misc/arm11scu.c
> +++ b/hw/misc/arm11scu.c
> @@ -57,7 +57,7 @@ static void mpcore_scu_write(void *opaque, hwaddr offset,
>  static const MemoryRegionOps mpcore_scu_ops = {
>      .read = mpcore_scu_read,
>      .write = mpcore_scu_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

I don't think so,

>  };
>  
>  static void arm11_scu_realize(DeviceState *dev, Error **errp)
> diff --git a/hw/misc/arm_l2x0.c b/hw/misc/arm_l2x0.c
> index b88f40a..72ecf46 100644
> --- a/hw/misc/arm_l2x0.c
> +++ b/hw/misc/arm_l2x0.c
> @@ -157,7 +157,7 @@ static void l2x0_priv_reset(DeviceState *dev)
>  static const MemoryRegionOps l2x0_mem_ops = {
>      .read = l2x0_priv_read,
>      .write = l2x0_priv_write,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

neither here, but Peter will confirm.

>   };
>  
>  static void l2x0_priv_init(Object *obj)
> diff --git a/hw/misc/puv3_pm.c b/hw/misc/puv3_pm.c
> index b538b4a..cd82b69 100644
> --- a/hw/misc/puv3_pm.c
> +++ b/hw/misc/puv3_pm.c
> @@ -118,7 +118,7 @@ static const MemoryRegionOps puv3_pm_ops = {
>          .min_access_size = 4,
>          .max_access_size = 4,
>      },
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

This one I can't tell.

>  };
>  
>  static void puv3_pm_realize(DeviceState *dev, Error **errp)
> -- 
> 1.8.3.1
> 
>
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v7 24/42] hw/isa: Declare device little or big endian

2019-08-16 Thread Philippe Mathieu-Daudé
On 8/16/19 9:34 AM, tony.ngu...@bt.com wrote:
> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> targets from the set of target/hw/*/device.o.
> 
> If the set of targets are all little or all big endian, re-declare
> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> respectively.
> 
> This *naive* deduction may result in genuinely native endian devices
> being incorrectly declared as little or big endian, but should not
> introduce regressions for current targets.
> 
> These devices should be re-declared as DEVICE_NATIVE_ENDIAN if 1) it
> has a new target with an opposite endian or 2) someone informed knows
> better =)
> 
> Signed-off-by: Tony Nguyen 
> ---
>  hw/isa/vt82c686.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
> index 12c460590..adf65d3 100644
> --- a/hw/isa/vt82c686.c
> +++ b/hw/isa/vt82c686.c
> @@ -108,7 +108,7 @@ static uint64_t superio_ioport_readb(void *opaque,
> hwaddr addr, unsigned size)
>  static const MemoryRegionOps superio_ops = {
>      .read = superio_ioport_readb,
>      .write = superio_ioport_writeb,
> -    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .endianness = DEVICE_LITTLE_ENDIAN,

Being ioport, one is probably OK.

>      .impl = {
>          .min_access_size = 1,
>          .max_access_size = 1,
> -- 
> 1.8.3.1
> 
>
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v7 00/42] Invert Endian bit in SPARCv9 MMU TTE

2019-08-16 Thread Philippe Mathieu-Daudé
Hi Tony,

On 8/16/19 8:28 AM, tony.ngu...@bt.com wrote:
> This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE.
> 
> v7:
[...]
> - Re-declared many native endian devices as little or big endian. This is why
>   v7 has +16 patches.

Why are you doing that? What is the rationale?

Anyhow, if this is not required by your series, you should split it out
and send it once your principal changes are merged.
I'm worried because these new patches involve many subsystems (thus many
maintainers) and reviewing them will now take a fair amount of time.

> For each device declared with DEVICE_NATIVE_ENDIAN, find the set of
> targets from the set of target/hw/*/device.o.
>
> If the set of targets are all little or all big endian, re-declare
> the device endianness as DEVICE_LITTLE_ENDIAN or DEVICE_BIG_ENDIAN
> respectively.

If only little endian targets use a device, that doesn't mean the device
is designed in little endian...

Then if a big endian target plans to use this device, it will require
more work and you might have introduced regressions...

I'm not sure this is a safe move.

> This *naive* deduction may result in genuinely native endian devices
> being incorrectly declared as little or big endian, but should not
> introduce regressions for current targets.

Regards,

Phil.



[Qemu-block] [PATCH] file-posix: Fix has_write_zeroes after NO_FALLBACK

2019-08-16 Thread Kevin Wolf
If QEMU_AIO_NO_FALLBACK is given, we always return failure and don't
even try to use the BLKZEROOUT ioctl. In this failure case, we shouldn't
disable has_write_zeroes because we didn't learn anything about the
ioctl. The next request might not set QEMU_AIO_NO_FALLBACK and we can
still use the ioctl then.

Reported-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b8b4dad553..e927f9d3c3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1555,12 +1555,12 @@ static ssize_t 
handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
 } while (errno == EINTR);
 
 ret = translate_err(-errno);
+if (ret == -ENOTSUP) {
+s->has_write_zeroes = false;
+}
 }
 #endif
 
-if (ret == -ENOTSUP) {
-s->has_write_zeroes = false;
-}
 return ret;
 }
 
-- 
2.20.1




[Qemu-block] [PULL 12/16] block: Remove blk_pread_unthrottled()

2019-08-16 Thread Kevin Wolf
The functionality offered by blk_pread_unthrottled() goes back to commit
498e386c584. Then, we couldn't perform I/O throttling with synchronous
requests because timers wouldn't be executed in polling loops. So the
commit automatically disabled I/O throttling as soon as a synchronous
request was issued.

However, for geometry detection during disk initialisation, we always
used (and still use) synchronous requests even if guest requests use AIO
later. We did not want geometry detection to disable I/O throttling, so
bdrv_pread_unthrottled() was introduced, which disabled throttling only
temporarily.

All of this isn't necessary any more because we do run timers in polling
loop and even synchronous requests are now using coroutine
infrastructure internally. For this reason, commit 90c78624f already
removed the automatic disabling of I/O throttling.

It's time to get rid of the workaround for the removed code, and its
abuse of blk_root_drained_begin()/end(), as well.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 include/sysemu/block-backend.h |  2 --
 block/block-backend.c  | 16 
 hw/block/hd-geometry.c |  7 +--
 3 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 733c4957eb..7320b58467 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -117,8 +117,6 @@ char *blk_get_attached_dev_id(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-  int bytes);
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
unsigned int bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags);
diff --git a/block/block-backend.c b/block/block-backend.c
index 0056b526b8..fdd6b01ecf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1237,22 +1237,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, 
uint8_t *buf,
 return rwco.ret;
 }
 
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-  int count)
-{
-int ret;
-
-ret = blk_check_byte_request(blk, offset, count);
-if (ret < 0) {
-return ret;
-}
-
-blk_root_drained_begin(blk->root);
-ret = blk_pread(blk, offset, buf, count);
-blk_root_drained_end(blk->root, NULL);
-return ret;
-}
-
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   int bytes, BdrvRequestFlags flags)
 {
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 79384a2b0a..dcbccee294 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -63,12 +63,7 @@ static int guess_disk_lchs(BlockBackend *blk,
 
 blk_get_geometry(blk, &nb_sectors);
 
-/**
- * The function will be invoked during startup not only in sync I/O mode,
- * but also in async I/O mode. So the I/O throttling function has to
- * be disabled temporarily here, not permanently.
- */
-if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
+if (blk_pread(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
 return -1;
 }
 /* test msdos magic */
-- 
2.20.1




[Qemu-block] [PULL 11/16] iotests: Add test for concurrent stream/commit

2019-08-16 Thread Kevin Wolf
From: Max Reitz 

We already have 030 for that in general, but this tests very specific
cases of both jobs finishing concurrently.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/258 | 163 +
 tests/qemu-iotests/258.out |  33 
 tests/qemu-iotests/group   |   1 +
 3 files changed, 197 insertions(+)
 create mode 100755 tests/qemu-iotests/258
 create mode 100644 tests/qemu-iotests/258.out

diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
new file mode 100755
index 00..b84cf02254
--- /dev/null
+++ b/tests/qemu-iotests/258
@@ -0,0 +1,163 @@
+#!/usr/bin/env python
+#
+# Very specific tests for adjacent commit/stream block jobs
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Max Reitz 
+
+import iotests
+from iotests import log, qemu_img, qemu_io_silent, \
+filter_qmp_testfiles, filter_qmp_imgfmt
+
+# Need backing file and change-backing-file support
+iotests.verify_image_format(supported_fmts=['qcow2', 'qed'])
+iotests.verify_platform(['linux'])
+
+
+# Returns a node for blockdev-add
+def node(node_name, path, backing=None, fmt=None, throttle=None):
+if fmt is None:
+fmt = iotests.imgfmt
+
+res = {
+'node-name': node_name,
+'driver': fmt,
+'file': {
+'driver': 'file',
+'filename': path
+}
+}
+
+if backing is not None:
+res['backing'] = backing
+
+if throttle:
+res['file'] = {
+'driver': 'throttle',
+'throttle-group': throttle,
+'file': res['file']
+}
+
+return res
+
+# Finds a node in the debug block graph
+def find_graph_node(graph, node_id):
+return next(node for node in graph['nodes'] if node['id'] == node_id)
+
+
+def test_concurrent_finish(write_to_stream_node):
+log('')
+log('=== Commit and stream finish concurrently (letting %s write) ===' % \
+('stream' if write_to_stream_node else 'commit'))
+log('')
+
+# All chosen in such a way that when the commit job wants to
+# finish, it polls and thus makes stream finish concurrently --
+# and the other way around, depending on whether the commit job
+# is finalized before stream completes or not.
+
+with iotests.FilePath('node4.img') as node4_path, \
+ iotests.FilePath('node3.img') as node3_path, \
+ iotests.FilePath('node2.img') as node2_path, \
+ iotests.FilePath('node1.img') as node1_path, \
+ iotests.FilePath('node0.img') as node0_path, \
+ iotests.VM() as vm:
+
+# It is important to use raw for the base layer (so that
+# permissions are just handed through to the protocol layer)
+assert qemu_img('create', '-f', 'raw', node0_path, '64M') == 0
+
+stream_throttle=None
+commit_throttle=None
+
+for path in [node1_path, node2_path, node3_path, node4_path]:
+assert qemu_img('create', '-f', iotests.imgfmt, path, '64M') == 0
+
+if write_to_stream_node:
+# This is what (most of the time) makes commit finish
+# earlier and then pull in stream
+assert qemu_io_silent(node2_path,
+  '-c', 'write %iK 64K' % (65536 - 192),
+  '-c', 'write %iK 64K' % (65536 -  64)) == 0
+
+stream_throttle='tg'
+else:
+# And this makes stream finish earlier
+assert qemu_io_silent(node1_path,
+  '-c', 'write %iK 64K' % (65536 - 64)) == 0
+
+commit_throttle='tg'
+
+vm.launch()
+
+vm.qmp_log('object-add',
+   qom_type='throttle-group',
+   id='tg',
+   props={
+   'x-iops-write': 1,
+   'x-iops-write-max': 1
+   })
+
+vm.qmp_log('blockdev-add',
+   filters=[filter_qmp_testfiles, filter_qmp_imgfmt],
+   **node('node4', node4_path, throttle=stream_throttle,
+ backing=node('node3', node3_path,
+ backing=node('node2', node2_path,
+ backing=node('node1', node1_path,
+ backing=node('node0', node0_path, 
throttle=commit_throttle,
+ 

[Qemu-block] [PULL 14/16] block-backend: Queue requests while drained

2019-08-16 Thread Kevin Wolf
This fixes devices like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.

The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:

1. Block jobs: We already make sure that block jobs are paused in a
   drain section, so they won't start new requests. However, if the
   drain_begin is called on the job's BlockBackend first, it can happen
   that we deadlock because the job stays busy until it reaches a pause
   point - which it can't if its requests aren't processed any more.

   The proper solution here would be to make all requests through the
   job's filter node instead of using a BlockBackend. For now, just
   disabling request queuing on the job BlockBackend is simpler.

2. In test cases where making requests through bdrv_* would be
   cumbersome because we'd need a BdrvChild. As we already got the
   functionality to disable request queuing from 1., use it in tests,
   too, for convenience.
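
The other half of the mechanism -- waking the queued requests again once
the drained section ends -- is not visible in the hunks below; presumably
it is roughly this (sketch, not the literal patch):

    /* On drained_end, release the coroutines parked in
     * blk_wait_while_drained(). */
    if (--blk->quiesce_counter == 0) {
        qemu_co_queue_restart_all(&blk->queued_requests);
    }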

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 include/sysemu/block-backend.h |  1 +
 block/backup.c |  1 +
 block/block-backend.c  | 53 --
 block/commit.c |  2 ++
 block/mirror.c |  1 +
 blockjob.c |  3 ++
 tests/test-bdrv-drain.c|  1 +
 7 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7320b58467..368d53af77 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -104,6 +104,7 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, 
uint64_t *shared_perm);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow);
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable);
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
diff --git a/block/backup.c b/block/backup.c
index b26c22c4b8..4743c8f0bc 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -644,6 +644,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 if (ret < 0) {
 goto error;
 }
+blk_set_disable_request_queuing(job->target, true);
 
 job->on_source_error = on_source_error;
 job->on_target_error = on_target_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index fdd6b01ecf..c13c5c83b0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -79,6 +79,9 @@ struct BlockBackend {
 QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
 
 int quiesce_counter;
+CoQueue queued_requests;
+bool disable_request_queuing;
+
 VMChangeStateEntry *vmsh;
 bool force_allow_inactivate;
 
@@ -339,6 +342,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 
 block_acct_init(&blk->stats);
 
+qemu_co_queue_init(&blk->queued_requests);
 notifier_list_init(&blk->remove_bs_notifiers);
 notifier_list_init(&blk->insert_bs_notifiers);
 QLIST_INIT(&blk->aio_notifiers);
@@ -1096,6 +1100,11 @@ void blk_set_allow_aio_context_change(BlockBackend *blk, 
bool allow)
 blk->allow_aio_context_change = allow;
 }
 
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable)
+{
+blk->disable_request_queuing = disable;
+}
+
 static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
   size_t size)
 {
@@ -1127,13 +1136,24 @@ static int blk_check_byte_request(BlockBackend *blk, 
int64_t offset,
 return 0;
 }
 
+static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
+{
+if (blk->quiesce_counter && !blk->disable_request_queuing) {
+qemu_co_queue_wait(&blk->queued_requests, NULL);
+}
+}
+
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
unsigned int bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)
 {
 int ret;
-BlockDriverState *bs = blk_bs(blk);
+BlockDriverState *bs;
 
+blk_wait_while_drained(blk);
+
+/* Call blk_bs() only after waiting, the graph may have changed */
+bs = blk_bs(blk);
 trace_blk_co_preadv(blk, bs, offset, bytes, flags);
 
 ret = blk_check_byte_request(blk, offset, bytes);
@@ -1159,8 +1179,12 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, 
int64_t offset,
 BdrvRequestFlags flags)
 {
 int ret;
-BlockDriverState *bs = blk_bs(blk);
+BlockDriverState *bs;
 
+blk_wait_while_drained(blk);
+
+/* Call blk_bs() only after waiting, the graph may have changed */
+bs = blk_bs(b

[Qemu-block] [PULL 16/16] file-posix: Handle undetectable alignment

2019-08-16 Thread Kevin Wolf
From: Nir Soffer 

In some cases buf_align or request_alignment cannot be detected:

1. With Gluster, buf_align cannot be detected since the actual I/O is
   done on the Gluster server, and qemu buffer alignment does not matter.
   Since we don't have an alignment requirement, buf_align=1 is the best
   value.

2. With a local XFS filesystem, buf_align cannot be detected if reading
   from an unallocated area. In this case we must align the buffer, but
   we don't know the correct size. Using the wrong alignment results in
   an I/O error.

3. With Gluster backed by XFS, request_alignment cannot be detected if
   reading from an unallocated area. In this case we need to use the
   correct alignment, and failing to do so results in I/O errors.

4. With NFS, the server does not use direct I/O, so neither buf_align nor
   request_alignment can be detected. In this case we don't need any
   alignment so we can use buf_align=1 and request_alignment=1.

These cases seem to work when the storage sector size is 512 bytes, because
the current code starts checking with align=512. If the check succeeds
because alignment cannot be detected, we use 512. But this does not work
for storage with a 4k sector size.

To determine whether we can detect the alignment, we probe first with
align=1. If probing succeeds, either there is no alignment requirement
(cases 1, 4) or we are probing an unallocated area (cases 2, 3). Since we
have no way to tell, we treat this as undetectable alignment. If probing
with align=1 fails with EINVAL, but probing with one of the expected
alignments succeeds, we know that we have found a working alignment.

In practice the alignment requirements are the same for buffer alignment,
buffer length, and offset in the file. So if we cannot detect buf_align,
we can use the request alignment. If we cannot detect the request
alignment, we can fall back to a safe value. To use this logic, we probe
the request alignment first instead of buf_align.
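
For illustration, here is a minimal standalone sketch of that probing
order (an assumption-level example, not the raw_probe_alignment() code in
the patch below; the command-line handling and output are made up): open
a file with O_DIRECT and report the smallest read size the kernel
accepts, treating a successful align=1 probe as undetectable.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_ALIGN 4096

int main(int argc, char **argv)
{
    /* Probe align=1 first so the "undetectable" case can be recognized. */
    static const size_t alignments[] = {1, 512, 1024, 2048, 4096};
    char *buf;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* The buffer itself is maximally aligned, so only the request length
     * (and the offset, which is 0 here) is being probed. */
    if (posix_memalign((void **)&buf, MAX_ALIGN, MAX_ALIGN)) {
        return 1;
    }

    for (size_t i = 0; i < sizeof(alignments) / sizeof(alignments[0]); i++) {
        if (pread(fd, buf, alignments[i], 0) >= 0) {
            if (i == 0) {
                printf("alignment undetectable, fall back to a safe value\n");
            } else {
                printf("detected request alignment: %zu\n", alignments[i]);
            }
            break;
        } else if (errno != EINVAL) {
            perror("pread");
            break;
        }
    }

    free(buf);
    close(fd);
    return 0;
}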

Here is a table showing the behaviour with the current code (the value in
parentheses is the optimal value).

Case  Sector    buf_align (opt)   request_alignment (opt)   result
==================================================================
1     512       512   (1)         512   (512)               OK
1     4096      512   (1)         4096  (4096)              FAIL
------------------------------------------------------------------
2     512       512   (512)       512   (512)               OK
2     4096      512   (4096)      4096  (4096)              FAIL
------------------------------------------------------------------
3     512       512   (1)         512   (512)               OK
3     4096      512   (1)         512   (4096)              FAIL
------------------------------------------------------------------
4     512       512   (1)         512   (1)                 OK
4     4096      512   (1)         512   (1)                 OK

Same cases with this change:

Case  Sector    buf_align (opt)   request_alignment (opt)   result
==================================================================
1     512       512   (1)         512   (512)               OK
1     4096      4096  (1)         4096  (4096)              OK
------------------------------------------------------------------
2     512       512   (512)       512   (512)               OK
2     4096      4096  (4096)      4096  (4096)              OK
------------------------------------------------------------------
3     512       4096  (1)         4096  (512)               OK
3     4096      4096  (1)         4096  (4096)              OK
------------------------------------------------------------------
4     512       4096  (1)         4096  (1)                 OK
4     4096      4096  (1)         4096  (1)                 OK

I tested that provisioning VMs and copying disks on local XFS and
Gluster with a 4k sector size now works, resolving bugs [1],[2].
I also tested on XFS, NFS, and Gluster with a 512 byte sector size.

[1] https://bugzilla.redhat.com/1737256
[2] https://bugzilla.redhat.com/1738657

Signed-off-by: Nir Soffer 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4479cc7ab4..b8b4dad553 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -323,6 +323,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 BDRVRawState *s = bs->opaque;
 char *buf;
 size_t max_align = MAX(MAX_BLOCKSIZE, getpagesize());
+size_t alignments[] = {1, 512, 1024, 2048, 4096};
 
 /* For SCSI generic devices the alignment is not really used.
With buffered I/O, we don't have any restrictions. */
@@ -349,25 +350,38 @@ static void raw_probe_alignment(BlockDriverState *bs, int 
fd, Error **errp)
 }
 #endif
 
-/* If we could not get the sizes so f

[Qemu-block] [PULL 15/16] qemu-img convert: Deprecate using -n and -o together

2019-08-16 Thread Kevin Wolf
bdrv_create options specified with -o have no effect when skipping image
creation with -n, so this doesn't make sense. Warn against the misuse
and deprecate the combination so we can make it a hard error later.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: John Snow 
Reviewed-by: Eric Blake 
---
 qemu-img.c   | 5 +
 qemu-deprecated.texi | 7 +++
 2 files changed, 12 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 79983772de..d9321f6418 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2231,6 +2231,11 @@ static int img_convert(int argc, char **argv)
 goto fail_getopt;
 }
 
+if (skip_create && options) {
+warn_report("-o has no effect when skipping image creation");
+warn_report("This will become an error in future QEMU versions.");
+}
+
 s.src_num = argc - optind - 1;
 out_filename = s.src_num >= 1 ? argv[argc - 1] : NULL;
 
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index fff07bb2a3..f7680c08e1 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -305,6 +305,13 @@ to just export the entire image and then mount only 
/dev/nbd0p1 than
 it is to reinvoke @command{qemu-nbd -c /dev/nbd0} limited to just a
 subset of the image.
 
+@subsection qemu-img convert -n -o (since 4.2.0)
+
+All options specified in @option{-o} are image creation options, so
+they have no effect when used with @option{-n} to skip image creation.
+Silently ignored options can be confusing, so this combination of
+options will be made an error in future versions.
+
 @section Build system
 
 @subsection Python 2 support (since 4.1.0)
-- 
2.20.1




[Qemu-block] [PULL 13/16] mirror: Keep mirror_top_bs drained after dropping permissions

2019-08-16 Thread Kevin Wolf
mirror_top_bs is currently implicitly drained through its connection to
the source or the target node. However, the drain section for target_bs
ends early after moving mirror_top_bs from src to target_bs, so that
requests can already be restarted while mirror_top_bs is still present
in the chain, but has dropped all permissions and therefore runs into an
assertion failure like this:

qemu-system-x86_64: block/io.c:1634: bdrv_co_write_req_prepare:
Assertion `child->perm & BLK_PERM_WRITE' failed.

Keep mirror_top_bs drained until all graph changes have completed.
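
The race can be modelled in a few lines (a toy standalone sketch with
made-up names, not the actual mirror or block layer code): a write
submitted to a node that has already dropped its WRITE permission trips
the assertion, unless the node stays quiesced until it is out of the
graph.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct node {
    bool write_perm;
    int quiesce_counter;
};

static void submit_write(struct node *n)
{
    if (n->quiesce_counter > 0) {
        /* With the fix, requests restarted too early are held back here. */
        printf("write held back: node is drained\n");
        return;
    }
    /* Toy analogue of bdrv_co_write_req_prepare()'s permission assertion. */
    assert(n->write_perm);
    printf("write issued\n");
}

int main(void)
{
    struct node top = { .write_perm = true };

    top.quiesce_counter++;    /* bdrv_drained_begin(mirror_top_bs) */
    top.write_perm = false;   /* drop WRITE/RESIZE permissions */
    submit_write(&top);       /* held back instead of asserting */
    /* ... graph changes complete, node leaves the chain ... */
    top.quiesce_counter--;    /* bdrv_drained_end(mirror_top_bs) */

    return 0;
}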

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/mirror.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9f5c59ece1..642d6570cc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -656,7 +656,10 @@ static int mirror_exit_common(Job *job)
 s->target = NULL;
 
 /* We don't access the source any more. Dropping any WRITE/RESIZE is
- * required before it could become a backing file of target_bs. */
+ * required before it could become a backing file of target_bs. Not having
+ * these permissions any more means that we can't allow any new requests on
+ * mirror_top_bs from now on, so keep it drained. */
+bdrv_drained_begin(mirror_top_bs);
 bs_opaque->stop = true;
 bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
  &error_abort);
@@ -724,6 +727,7 @@ static int mirror_exit_common(Job *job)
 bs_opaque->job = NULL;
 
 bdrv_drained_end(src);
+bdrv_drained_end(mirror_top_bs);
 s->in_drain = false;
 bdrv_unref(mirror_top_bs);
 bdrv_unref(src);
-- 
2.20.1




[Qemu-block] [PULL 09/16] tests: Test polling in bdrv_drop_intermediate()

2019-08-16 Thread Kevin Wolf
From: Max Reitz 

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 tests/test-bdrv-drain.c | 167 
 1 file changed, 167 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 03fa1142a1..1600d41e9a 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -100,6 +100,13 @@ static void bdrv_test_child_perm(BlockDriverState *bs, 
BdrvChild *c,
   nperm, nshared);
 }
 
+static int bdrv_test_change_backing_file(BlockDriverState *bs,
+ const char *backing_file,
+ const char *backing_fmt)
+{
+return 0;
+}
+
 static BlockDriver bdrv_test = {
 .format_name= "test",
 .instance_size  = sizeof(BDRVTestState),
@@ -111,6 +118,8 @@ static BlockDriver bdrv_test = {
 .bdrv_co_drain_end  = bdrv_test_co_drain_end,
 
 .bdrv_child_perm= bdrv_test_child_perm,
+
+.bdrv_change_backing_file = bdrv_test_change_backing_file,
 };
 
 static void aio_ret_cb(void *opaque, int ret)
@@ -1671,6 +1680,161 @@ static void test_blockjob_commit_by_drained_end(void)
 bdrv_unref(bs_child);
 }
 
+
+typedef struct TestSimpleBlockJob {
+BlockJob common;
+bool should_complete;
+bool *did_complete;
+} TestSimpleBlockJob;
+
+static int coroutine_fn test_simple_job_run(Job *job, Error **errp)
+{
+TestSimpleBlockJob *s = container_of(job, TestSimpleBlockJob, common.job);
+
+while (!s->should_complete) {
+job_sleep_ns(job, 0);
+}
+
+return 0;
+}
+
+static void test_simple_job_clean(Job *job)
+{
+TestSimpleBlockJob *s = container_of(job, TestSimpleBlockJob, common.job);
+*s->did_complete = true;
+}
+
+static const BlockJobDriver test_simple_job_driver = {
+.job_driver = {
+.instance_size  = sizeof(TestSimpleBlockJob),
+.free   = block_job_free,
+.user_resume= block_job_user_resume,
+.drain  = block_job_drain,
+.run= test_simple_job_run,
+.clean  = test_simple_job_clean,
+},
+};
+
+static int drop_intermediate_poll_update_filename(BdrvChild *child,
+  BlockDriverState *new_base,
+  const char *filename,
+  Error **errp)
+{
+/*
+ * We are free to poll here, which may change the block graph, if
+ * it is not drained.
+ */
+
+/* If the job is not drained: Complete it, schedule job_exit() */
+aio_poll(qemu_get_current_aio_context(), false);
+/* If the job is not drained: Run job_exit(), finish the job */
+aio_poll(qemu_get_current_aio_context(), false);
+
+return 0;
+}
+
+/**
+ * Test a poll in the midst of bdrv_drop_intermediate().
+ *
+ * bdrv_drop_intermediate() calls BdrvChildRole.update_filename(),
+ * which can yield or poll.  This may lead to graph changes, unless
+ * the whole subtree in question is drained.
+ *
+ * We test this on the following graph:
+ *
+ *Job
+ *
+ * |
+ *  job-node
+ * |
+ * v
+ *
+ *  job-node
+ *
+ * |
+ *  backing
+ * |
+ * v
+ *
+ * node-2 --chain--> node-1 --chain--> node-0
+ *
+ * We drop node-1 with bdrv_drop_intermediate(top=node-1, base=node-0).
+ *
+ * This first updates node-2's backing filename by invoking
+ * drop_intermediate_poll_update_filename(), which polls twice.  This
+ * causes the job to finish, which in turns causes the job-node to be
+ * deleted.
+ *
+ * bdrv_drop_intermediate() uses a QLIST_FOREACH_SAFE() loop, so it
+ * already has a pointer to the BdrvChild edge between job-node and
+ * node-1.  When it tries to handle that edge, we probably get a
+ * segmentation fault because the object no longer exists.
+ *
+ *
+ * The solution is for bdrv_drop_intermediate() to drain top's
+ * subtree.  This prevents graph changes from happening just because
+ * BdrvChildRole.update_filename() yields or polls.  Thus, the block
+ * job is paused during that drained section and must finish before or
+ * after.
+ *
+ * (In addition, bdrv_replace_child() must keep the job paused.)
+ */
+static void test_drop_intermediate_poll(void)
+{
+static BdrvChildRole chain_child_role;
+BlockDriverState *chain[3];
+TestSimpleBlockJob *job;
+BlockDriverState *job_node;
+bool job_has_completed = false;
+int i;
+int ret;
+
+chain_child_role = child_backing;
+chain_child_role.update_filename = drop_intermediate_poll_update_filename;
+
+for (i = 0; i < 3; i++) {
+char name[32];
+snprintf(name, 32, "node-%i", i);
+
+chain[i] = bdrv_new_open_driver(&bdrv_test, name, 0, &error_abort);
+}
+
+job_node = bdrv_new_open_driv

[Qemu-block] [PULL 08/16] block: Reduce (un)drains when replacing a child

2019-08-16 Thread Kevin Wolf
From: Max Reitz 

Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.

This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way.  bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.

In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained.  So if anything, we have to drain the
parent before detaching the old child node.  Conversely, we have to
undrain it only after attaching the new child node.

Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node, then drain it (if necessary), replace the child node, and then
undrain it.
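
The bookkeeping can be illustrated with a small standalone model (a
sketch with made-up numbers and names, not the bdrv_replace_child_noperm()
code below): the saldo is the difference between how drained the new
child is and how often the parent has already been drained for this edge;
a positive saldo is paid off with drained_begin before the switch, a
negative one with drained_end after it.

#include <stdio.h>

/* How often the parent has been drained on behalf of this child edge. */
static int parent_quiesce_counter;

static void parent_drained_begin(void)
{
    printf("drained_begin -> %d\n", ++parent_quiesce_counter);
}

static void parent_drained_end(void)
{
    printf("drained_end   -> %d\n", --parent_quiesce_counter);
}

static void replace_child(int new_child_quiesce_counter)
{
    int drain_saldo = new_child_quiesce_counter - parent_quiesce_counter;

    /* New child more deeply drained than the parent: drain the parent
     * before detaching the old child, so no request reaches the new
     * child while it is supposed to be quiesced. */
    while (drain_saldo > 0) {
        parent_drained_begin();
        drain_saldo--;
    }

    /* ... detach the old child and attach the new one here ... */

    /* Old child was more deeply drained: only now allow requests again. */
    while (drain_saldo < 0) {
        parent_drained_end();
        drain_saldo++;
    }
}

int main(void)
{
    parent_quiesce_counter = 2;  /* old child currently drained twice */
    replace_child(0);            /* new child undrained: two drained_end */
    replace_child(3);            /* new child drained 3x: three drained_begin */
    return 0;
}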

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index df3407934b..66e8602e68 100644
--- a/block.c
+++ b/block.c
@@ -2230,13 +2230,27 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
   BlockDriverState *new_bs)
 {
 BlockDriverState *old_bs = child->bs;
-int i;
+int new_bs_quiesce_counter;
+int drain_saldo;
 
 assert(!child->frozen);
 
 if (old_bs && new_bs) {
 assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
 }
+
+new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;
+
+/*
+ * If the new child node is drained but the old one was not, flush
+ * all outstanding requests to the old child node.
+ */
+while (drain_saldo > 0 && child->role->drained_begin) {
+bdrv_parent_drained_begin_single(child, true);
+drain_saldo--;
+}
+
 if (old_bs) {
 /* Detach first so that the recursive drain sections coming from @child
  * are already gone and we only end the drain sections that came from
@@ -2244,28 +2258,22 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 if (child->role->detach) {
 child->role->detach(child);
 }
-while (child->parent_quiesce_counter) {
-bdrv_parent_drained_end_single(child);
-}
 QLIST_REMOVE(child, next_parent);
-} else {
-assert(child->parent_quiesce_counter == 0);
 }
 
 child->bs = new_bs;
 
 if (new_bs) {
 QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-if (new_bs->quiesce_counter) {
-int num = new_bs->quiesce_counter;
-if (child->role->parent_is_bds) {
-num -= bdrv_drain_all_count;
-}
-assert(num >= 0);
-for (i = 0; i < num; i++) {
-bdrv_parent_drained_begin_single(child, true);
-}
-}
+
+/*
+ * Detaching the old node may have led to the new node's
+ * quiesce_counter having been decreased.  Not a problem, we
+ * just need to recognize this here and then invoke
+ * drained_end appropriately more often.
+ */
+assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
+drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
 
 /* Attach only after starting new drained sections, so that recursive
  * drain sections coming from @child don't get an extra .drained_begin
@@ -2274,6 +2282,15 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 child->role->attach(child);
 }
 }
+
+/*
+ * If the old child node was drained but the new one is not, allow
+ * requests to come in only after the new node has been attached.
+ */
+while (drain_saldo < 0 && child->role->drained_end) {
+bdrv_parent_drained_end_single(child);
+drain_saldo++;
+}
 }
 
 /*
-- 
2.20.1




[Qemu-block] [PULL 10/16] tests: Test mid-drain bdrv_replace_child_noperm()

2019-08-16 Thread Kevin Wolf
From: Max Reitz 

Add a test for what happens when you call bdrv_replace_child_noperm()
for various drain situations ({old,new} child {drained,not drained}).

Most importantly, if both the old and the new child are drained, the
parent must not be undrained at any point.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 tests/test-bdrv-drain.c | 308 
 1 file changed, 308 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 1600d41e9a..9dffd86c47 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -1835,6 +1835,311 @@ static void test_drop_intermediate_poll(void)
 bdrv_unref(chain[2]);
 }
 
+
+typedef struct BDRVReplaceTestState {
+bool was_drained;
+bool was_undrained;
+bool has_read;
+
+int drain_count;
+
+bool yield_before_read;
+Coroutine *io_co;
+Coroutine *drain_co;
+} BDRVReplaceTestState;
+
+static void bdrv_replace_test_close(BlockDriverState *bs)
+{
+}
+
+/**
+ * If @bs has a backing file:
+ *   Yield if .yield_before_read is true (and wait for drain_begin to
+ *   wake us up).
+ *   Forward the read to bs->backing.  Set .has_read to true.
+ *   If drain_begin has woken us, wake it in turn.
+ *
+ * Otherwise:
+ *   Set .has_read to true and return success.
+ */
+static int coroutine_fn bdrv_replace_test_co_preadv(BlockDriverState *bs,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+int flags)
+{
+BDRVReplaceTestState *s = bs->opaque;
+
+if (bs->backing) {
+int ret;
+
+g_assert(!s->drain_count);
+
+s->io_co = qemu_coroutine_self();
+if (s->yield_before_read) {
+s->yield_before_read = false;
+qemu_coroutine_yield();
+}
+s->io_co = NULL;
+
+ret = bdrv_preadv(bs->backing, offset, qiov);
+s->has_read = true;
+
+/* Wake up drain_co if it runs */
+if (s->drain_co) {
+aio_co_wake(s->drain_co);
+}
+
+return ret;
+}
+
+s->has_read = true;
+return 0;
+}
+
+/**
+ * If .drain_count is 0, wake up .io_co if there is one; and set
+ * .was_drained.
+ * Increment .drain_count.
+ */
+static void coroutine_fn bdrv_replace_test_co_drain_begin(BlockDriverState *bs)
+{
+BDRVReplaceTestState *s = bs->opaque;
+
+if (!s->drain_count) {
+/* Keep waking io_co up until it is done */
+s->drain_co = qemu_coroutine_self();
+while (s->io_co) {
+aio_co_wake(s->io_co);
+s->io_co = NULL;
+qemu_coroutine_yield();
+}
+s->drain_co = NULL;
+
+s->was_drained = true;
+}
+s->drain_count++;
+}
+
+/**
+ * Reduce .drain_count, set .was_undrained once it reaches 0.
+ * If .drain_count reaches 0 and the node has a backing file, issue a
+ * read request.
+ */
+static void coroutine_fn bdrv_replace_test_co_drain_end(BlockDriverState *bs)
+{
+BDRVReplaceTestState *s = bs->opaque;
+
+g_assert(s->drain_count > 0);
+if (!--s->drain_count) {
+int ret;
+
+s->was_undrained = true;
+
+if (bs->backing) {
+char data;
+QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
+
+/* Queue a read request post-drain */
+ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
+g_assert(ret >= 0);
+}
+}
+}
+
+static BlockDriver bdrv_replace_test = {
+.format_name= "replace_test",
+.instance_size  = sizeof(BDRVReplaceTestState),
+
+.bdrv_close = bdrv_replace_test_close,
+.bdrv_co_preadv = bdrv_replace_test_co_preadv,
+
+.bdrv_co_drain_begin= bdrv_replace_test_co_drain_begin,
+.bdrv_co_drain_end  = bdrv_replace_test_co_drain_end,
+
+.bdrv_child_perm= bdrv_format_default_perms,
+};
+
+static void coroutine_fn test_replace_child_mid_drain_read_co(void *opaque)
+{
+int ret;
+char data;
+
+ret = blk_co_pread(opaque, 0, 1, &data, 0);
+g_assert(ret >= 0);
+}
+
+/**
+ * We test two things:
+ * (1) bdrv_replace_child_noperm() must not undrain the parent if both
+ * children are drained.
+ * (2) bdrv_replace_child_noperm() must never flush I/O requests to a
+ * drained child.  If the old child is drained, it must flush I/O
+ * requests after the new one has been attached.  If the new child
+ * is drained, it must flush I/O requests before the old one is
+ * detached.
+ *
+ * To do so, we create one parent node and two child nodes; then
+ * attach one of the children (old_child_bs) to the parent, then
+ * drain both old_child_bs and new_child_bs according to
+ * old_drain_count and new_drain_count, respectively, and finally
+ * we invoke bdrv_replace_node() to repla

[Qemu-block] [PULL 04/16] iotests: Move migration helpers to iotests.py

2019-08-16 Thread Kevin Wolf
234 implements functions that are useful for doing migration between two
VMs. Move them to iotests.py so that other test cases can use them, too.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/234| 30 +++---
 tests/qemu-iotests/iotests.py | 16 
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/234 b/tests/qemu-iotests/234
index c4c26bc21e..34c818c485 100755
--- a/tests/qemu-iotests/234
+++ b/tests/qemu-iotests/234
@@ -26,22 +26,6 @@ import os
 iotests.verify_image_format(supported_fmts=['qcow2'])
 iotests.verify_platform(['linux'])
 
-def enable_migration_events(vm, name):
-iotests.log('Enabling migration QMP events on %s...' % name)
-iotests.log(vm.qmp('migrate-set-capabilities', capabilities=[
-{
-'capability': 'events',
-'state': True
-}
-]))
-
-def wait_migration(vm):
-while True:
-event = vm.event_wait('MIGRATION')
-iotests.log(event, filters=[iotests.filter_qmp_event])
-if event['data']['status'] == 'completed':
-break
-
 with iotests.FilePath('img') as img_path, \
  iotests.FilePath('backing') as backing_path, \
  iotests.FilePath('mig_fifo_a') as fifo_a, \
@@ -62,7 +46,7 @@ with iotests.FilePath('img') as img_path, \
  .add_blockdev('%s,file=drive0-backing-file,node-name=drive0-backing' 
% (iotests.imgfmt))
  .launch())
 
-enable_migration_events(vm_a, 'A')
+vm_a.enable_migration_events('A')
 
 iotests.log('Launching destination VM...')
 (vm_b.add_blockdev('file,filename=%s,node-name=drive0-file' % (img_path))
@@ -72,7 +56,7 @@ with iotests.FilePath('img') as img_path, \
  .add_incoming("exec: cat '%s'" % (fifo_a))
  .launch())
 
-enable_migration_events(vm_b, 'B')
+vm_b.enable_migration_events('B')
 
 # Add a child node that was created after the parent node. The reverse case
 # is covered by the -blockdev options above.
@@ -85,9 +69,9 @@ with iotests.FilePath('img') as img_path, \
 iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo_a)))
 with iotests.Timeout(3, 'Migration does not complete'):
 # Wait for the source first (which includes setup=setup)
-wait_migration(vm_a)
+vm_a.wait_migration()
 # Wait for the destination second (which does not)
-wait_migration(vm_b)
+vm_b.wait_migration()
 
 iotests.log(vm_a.qmp('query-migrate')['return']['status'])
 iotests.log(vm_b.qmp('query-migrate')['return']['status'])
@@ -105,7 +89,7 @@ with iotests.FilePath('img') as img_path, \
  .add_incoming("exec: cat '%s'" % (fifo_b))
  .launch())
 
-enable_migration_events(vm_a, 'A')
+vm_a.enable_migration_events('A')
 
 iotests.log(vm_a.qmp('blockdev-snapshot', node='drive0-backing',
  overlay='drive0'))
@@ -114,9 +98,9 @@ with iotests.FilePath('img') as img_path, \
 iotests.log(vm_b.qmp('migrate', uri='exec:cat >%s' % (fifo_b)))
 with iotests.Timeout(3, 'Migration does not complete'):
 # Wait for the source first (which includes setup=setup)
-wait_migration(vm_b)
+vm_b.wait_migration()
 # Wait for the destination second (which does not)
-wait_migration(vm_a)
+vm_a.wait_migration()
 
 iotests.log(vm_a.qmp('query-migrate')['return']['status'])
 iotests.log(vm_b.qmp('query-migrate')['return']['status'])
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index ce74177ab1..91172c39a5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -583,6 +583,22 @@ class VM(qtest.QEMUQtestMachine):
 elif status == 'null':
 return error
 
+def enable_migration_events(self, name):
+log('Enabling migration QMP events on %s...' % name)
+log(self.qmp('migrate-set-capabilities', capabilities=[
+{
+'capability': 'events',
+'state': True
+}
+]))
+
+def wait_migration(self):
+while True:
+event = self.event_wait('MIGRATION')
+log(event, filters=[filter_qmp_event])
+if event['data']['status'] == 'completed':
+break
+
 def node_info(self, node_name):
 nodes = self.qmp('query-named-block-nodes')
 for x in nodes['return']:
-- 
2.20.1




[Qemu-block] [PULL 05/16] iotests: Test migration with all kinds of filter nodes

2019-08-16 Thread Kevin Wolf
This test case is motivated by commit 2b23f28639 ('block/copy-on-read:
Fix permissions for inactive node'). Instead of just testing
copy-on-read on migration, let's stack all sorts of filter nodes on top
of each other and see whether the resulting VM can still migrate
successfully. For good measure, put everything into an iothread, because
why not?

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/262 | 82 ++
 tests/qemu-iotests/262.out | 17 
 tests/qemu-iotests/group   |  1 +
 3 files changed, 100 insertions(+)
 create mode 100755 tests/qemu-iotests/262
 create mode 100644 tests/qemu-iotests/262.out

diff --git a/tests/qemu-iotests/262 b/tests/qemu-iotests/262
new file mode 100755
index 00..398f63587e
--- /dev/null
+++ b/tests/qemu-iotests/262
@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+# Creator/Owner: Kevin Wolf 
+#
+# Test migration with filter drivers present. Keep everything in an
+# iothread just for fun.
+
+import iotests
+import os
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.verify_platform(['linux'])
+
+with iotests.FilePath('img') as img_path, \
+ iotests.FilePath('mig_fifo') as fifo, \
+ iotests.VM(path_suffix='a') as vm_a, \
+ iotests.VM(path_suffix='b') as vm_b:
+
+def add_opts(vm):
+vm.add_object('iothread,id=iothread0')
+vm.add_object('throttle-group,id=tg0,x-bps-total=65536')
+vm.add_blockdev('file,filename=%s,node-name=drive0-file' % (img_path))
+vm.add_blockdev('%s,file=drive0-file,node-name=drive0-fmt' % 
(iotests.imgfmt))
+vm.add_blockdev('copy-on-read,file=drive0-fmt,node-name=drive0-cor')
+
vm.add_blockdev('throttle,file=drive0-cor,node-name=drive0-throttle,throttle-group=tg0')
+vm.add_blockdev('blkdebug,image=drive0-throttle,node-name=drive0-dbg')
+vm.add_blockdev('null-co,node-name=null,read-zeroes=on')
+
vm.add_blockdev('blkverify,test=drive0-dbg,raw=null,node-name=drive0-verify')
+
+if iotests.supports_quorum():
+
vm.add_blockdev('quorum,children.0=drive0-verify,vote-threshold=1,node-name=drive0-quorum')
+root = "drive0-quorum"
+else:
+root = "drive0-verify"
+
+vm.add_device('virtio-blk,drive=%s,iothread=iothread0' % root)
+
+iotests.qemu_img_pipe('create', '-f', iotests.imgfmt, img_path, '64M')
+
+os.mkfifo(fifo)
+
+iotests.log('Launching source VM...')
+add_opts(vm_a)
+vm_a.launch()
+
+vm_a.enable_migration_events('A')
+
+iotests.log('Launching destination VM...')
+add_opts(vm_b)
+vm_b.add_incoming("exec: cat '%s'" % (fifo))
+vm_b.launch()
+
+vm_b.enable_migration_events('B')
+
+iotests.log('Starting migration to B...')
+iotests.log(vm_a.qmp('migrate', uri='exec:cat >%s' % (fifo)))
+with iotests.Timeout(3, 'Migration does not complete'):
+# Wait for the source first (which includes setup=setup)
+vm_a.wait_migration()
+# Wait for the destination second (which does not)
+vm_b.wait_migration()
+
+iotests.log(vm_a.qmp('query-migrate')['return']['status'])
+iotests.log(vm_b.qmp('query-migrate')['return']['status'])
+
+iotests.log(vm_a.qmp('query-status'))
+iotests.log(vm_b.qmp('query-status'))
diff --git a/tests/qemu-iotests/262.out b/tests/qemu-iotests/262.out
new file mode 100644
index 00..5a58e5e9f8
--- /dev/null
+++ b/tests/qemu-iotests/262.out
@@ -0,0 +1,17 @@
+Launching source VM...
+Enabling migration QMP events on A...
+{"return": {}}
+Launching destination VM...
+Enabling migration QMP events on B...
+{"return": {}}
+Starting migration to B...
+{"return": {}}
+{"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+completed
+completed
+{

[Qemu-block] [PULL 06/16] block: Simplify bdrv_filter_default_perms()

2019-08-16 Thread Kevin Wolf
The same change as commit 2b23f28639 ('block/copy-on-read: Fix
permissions for inactive node') made for the copy-on-read driver can be
made for bdrv_filter_default_perms(): Retaining the old permissions from
the BdrvChild if it is given complicates things unnecessarily when in
the end this only means that the options set in the c == NULL case (i.e.
during child creation) are retained.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index cbd8da5f3b..6db8ecd62b 100644
--- a/block.c
+++ b/block.c
@@ -2168,16 +2168,8 @@ void bdrv_filter_default_perms(BlockDriverState *bs, 
BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
-if (c == NULL) {
-*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
-*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | 
DEFAULT_PERM_UNCHANGED;
-return;
-}
-
-*nperm = (perm & DEFAULT_PERM_PASSTHROUGH) |
- (c->perm & DEFAULT_PERM_UNCHANGED);
-*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
-   (c->shared_perm & DEFAULT_PERM_UNCHANGED);
+*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+*nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
 }
 
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
-- 
2.20.1




[Qemu-block] [PULL 07/16] block: Keep subtree drained in drop_intermediate

2019-08-16 Thread Kevin Wolf
From: Max Reitz 

bdrv_drop_intermediate() calls BdrvChildRole.update_filename().  That
may poll, thus changing the graph, which potentially breaks the
QLIST_FOREACH_SAFE() loop.

Just keep the whole subtree drained.  This is probably the right thing
to do anyway (dropping nodes while the subtree is not drained seems
wrong).

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 6db8ecd62b..df3407934b 100644
--- a/block.c
+++ b/block.c
@@ -4491,6 +4491,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,
 int ret = -EIO;
 
 bdrv_ref(top);
+bdrv_subtree_drained_begin(top);
 
 if (!top->drv || !base->drv) {
 goto exit;
@@ -4562,6 +4563,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,
 
 ret = 0;
 exit:
+bdrv_subtree_drained_end(top);
 bdrv_unref(top);
 return ret;
 }
-- 
2.20.1




[Qemu-block] [PULL 02/16] iotests/118: Create test classes dynamically

2019-08-16 Thread Kevin Wolf
We're getting a ridiculous number of child classes of
TestInitiallyFilled and TestInitiallyEmpty that differ only in a few
attributes that we want to test in all combinations.

Instead of explicitly writing down every combination, let's use a loop
and create those classes dynamically.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/118 | 69 +-
 1 file changed, 21 insertions(+), 48 deletions(-)

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index 3c20d2d61f..c281259215 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -294,15 +294,15 @@ class GeneralChangeTestsBaseClass(ChangeBaseClass):
 class TestInitiallyFilled(GeneralChangeTestsBaseClass):
 was_empty = False
 
-def setUp(self, media, interface):
+def setUp(self):
 qemu_img('create', '-f', iotests.imgfmt, old_img, '1440k')
 qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
 self.vm = iotests.VM()
-self.vm.add_drive(old_img, 'media=%s' % media, 'none')
-if interface == 'scsi':
+self.vm.add_drive(old_img, 'media=%s' % self.media, 'none')
+if self.interface == 'scsi':
 self.vm.add_device('virtio-scsi-pci')
 self.vm.add_device('%s,drive=drive0,id=%s' %
-   (interface_to_device_name(interface),
+   (interface_to_device_name(self.interface),
 self.device_name))
 self.vm.launch()
 
@@ -331,13 +331,13 @@ class TestInitiallyFilled(GeneralChangeTestsBaseClass):
 class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
 was_empty = True
 
-def setUp(self, media, interface):
+def setUp(self):
 qemu_img('create', '-f', iotests.imgfmt, new_img, '1440k')
-self.vm = iotests.VM().add_drive(None, 'media=%s' % media, 'none')
-if interface == 'scsi':
+self.vm = iotests.VM().add_drive(None, 'media=%s' % self.media, 'none')
+if self.interface == 'scsi':
 self.vm.add_device('virtio-scsi-pci')
 self.vm.add_device('%s,drive=drive0,id=%s' %
-   (interface_to_device_name(interface),
+   (interface_to_device_name(self.interface),
 self.device_name))
 self.vm.launch()
 
@@ -355,50 +355,23 @@ class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
 # Should be a no-op
 self.assert_qmp(result, 'return', {})
 
-class TestCDInitiallyFilled(TestInitiallyFilled):
-TestInitiallyFilled = TestInitiallyFilled
-has_real_tray = True
-
-def setUp(self):
-self.TestInitiallyFilled.setUp(self, 'cdrom', 'ide')
-
-class TestCDInitiallyEmpty(TestInitiallyEmpty):
-TestInitiallyEmpty = TestInitiallyEmpty
-has_real_tray = True
-
-def setUp(self):
-self.TestInitiallyEmpty.setUp(self, 'cdrom', 'ide')
+# Do this in a function to avoid leaking variables like case into the global
+# name space (otherwise tests would be run for the abstract base classes)
+def create_basic_test_classes():
+for (media, interface, has_real_tray) in [ ('cdrom', 'ide', True),
+   ('cdrom', 'scsi', True),
+   ('disk', 'floppy', False) ]:
 
-class TestSCSICDInitiallyFilled(TestInitiallyFilled):
-TestInitiallyFilled = TestInitiallyFilled
-has_real_tray = True
+for case in [ TestInitiallyFilled, TestInitiallyEmpty ]:
 
-def setUp(self):
-self.TestInitiallyFilled.setUp(self, 'cdrom', 'scsi')
+attr = { 'media': media,
+ 'interface': interface,
+ 'has_real_tray': has_real_tray }
 
-class TestSCSICDInitiallyEmpty(TestInitiallyEmpty):
-TestInitiallyEmpty = TestInitiallyEmpty
-has_real_tray = True
+name = '%s_%s_%s' % (case.__name__, media, interface)
+globals()[name] = type(name, (case, ), attr)
 
-def setUp(self):
-self.TestInitiallyEmpty.setUp(self, 'cdrom', 'scsi')
-
-class TestFloppyInitiallyFilled(TestInitiallyFilled):
-TestInitiallyFilled = TestInitiallyFilled
-has_real_tray = False
-
-def setUp(self):
-self.TestInitiallyFilled.setUp(self, 'disk', 'floppy')
-
-class TestFloppyInitiallyEmpty(TestInitiallyEmpty):
-TestInitiallyEmpty = TestInitiallyEmpty
-has_real_tray = False
-
-def setUp(self):
-self.TestInitiallyEmpty.setUp(self, 'disk', 'floppy')
-# FDDs not having a real tray and there not being a medium inside the
-# tray at startup means the tray will be considered open
-self.has_opened = True
+create_basic_test_classes()
 
 class TestChangeReadOnly(ChangeBaseClass):
 device_name = 'qdev0'
-- 
2.20.1



