Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191025095849.25283-1-mre...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  block/sheepdog.o
  CC  block/accounting.o
/tmp/qemu-test/src/block/file-posix.c: In function 'raw_open_common':
/tmp/qemu-test/src/block/file-posix.c:671:5: error: implicit declaration of 
function 'platform_test_xfs_fd' [-Werror=implicit-function-declaration]
 if (platform_test_xfs_fd(s->fd)) {
 ^
/tmp/qemu-test/src/block/file-posix.c:671:5: error: nested extern declaration 
of 'platform_test_xfs_fd' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [block/file-posix.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=c74d47e9bfb24107b6e94885fa8a2151', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-ytazf4e4/src/docker-src.2019-10-25-20.11.53.7609:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=c74d47e9bfb24107b6e94885fa8a2151
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ytazf4e4/src'
make: *** [docker-run-test-quick@centos7] Error 2

real2m32.235s
user0m8.092s


The full log is available at
http://patchew.org/logs/20191025095849.25283-1-mre...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/4] SCSI COMPARE_AND_WRITE support

2019-10-25 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/1571996163-27688-1-git-send-email-baiyao...@cmss.chinamobile.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/4] SCSI COMPARE_AND_WRITE support
Type: series
Message-id: 1571996163-27688-1-git-send-email-baiyao...@cmss.chinamobile.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
9dec556 scsi-disk: add FUA support for COMPARE_AND_WRITE
8ee0b25 hw/scsi: add SCSI COMPARE_AND_WRITE support
6c2bd51 block/rbd: implement bdrv_aio_compare_and_write interface
f2cafca block: add SCSI COMPARE_AND_WRITE support

=== OUTPUT BEGIN ===
1/4 Checking commit f2cafca98400 (block: add SCSI COMPARE_AND_WRITE support)
2/4 Checking commit 6c2bd51ede14 (block/rbd: implement 
bdrv_aio_compare_and_write interface)
ERROR: braces {} are necessary for all arms of this statement
#59: FILE: block/rbd.c:808:
+if (LIBRBD_HAVE_COMPARE_AND_WRITE)
[...]

ERROR: line over 90 characters
#87: FILE: block/rbd.c:1015:
+r = rbd_aio_compare_and_write(s->image, off, size/2, rcb->buf, 
(rcb->buf + size/2), c, 0, 0);

ERROR: spaces required around that '/' (ctx:VxV)
#87: FILE: block/rbd.c:1015:
+r = rbd_aio_compare_and_write(s->image, off, size/2, rcb->buf, 
(rcb->buf + size/2), c, 0, 0);
  ^

ERROR: spaces required around that '/' (ctx:VxV)
#87: FILE: block/rbd.c:1015:
+r = rbd_aio_compare_and_write(s->image, off, size/2, rcb->buf, 
(rcb->buf + size/2), c, 0, 0);

^

WARNING: line over 80 characters
#98: FILE: block/rbd.c:1082:
+  uint64_t offset, uint64_t 
bytes,

total: 4 errors, 1 warnings, 90 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/4 Checking commit 8ee0b25d3d83 (hw/scsi: add SCSI COMPARE_AND_WRITE support)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#219: 
new file mode 100644

total: 0 errors, 1 warnings, 190 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/4 Checking commit 9dec556b752c (scsi-disk: add FUA support for 
COMPARE_AND_WRITE)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1571996163-27688-1-git-send-email-baiyao...@cmss.chinamobile.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PULL 1/2] virtio-blk: Add blk_drain() to virtio_blk_device_unrealize()

2019-10-25 Thread Stefan Hajnoczi
From: Julia Suvorova 

QEMU does not wait for completed I/O requests, assuming that the guest
driver will reset the device before calling unrealize(). This does not
happen on Windows, and QEMU crashes in virtio_notify(), getting the
result of a completed I/O request on hot-unplugged device.

Signed-off-by: Julia Suvorova 
Message-Id: <20191018142856.31870-1-jus...@redhat.com>
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ed2ddebd2b..14e9f85b8b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1207,6 +1207,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 
+blk_drain(s->blk);
 virtio_blk_data_plane_destroy(s->dataplane);
 s->dataplane = NULL;
 qemu_del_vm_change_state_handler(s->change);
-- 
2.21.0




[PULL 0/2] Block patches

2019-10-25 Thread Stefan Hajnoczi
The following changes since commit 58560ad254fbda71d4daa6622d71683190070ee2:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191024' into 
staging (2019-10-24 16:22:58 +0100)

are available in the Git repository at:

  https://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to d154ef37ff885918fa3e512fd7a8e42870291667:

  yield_until_fd_readable: make it work with any AioContect (2019-10-25 
14:38:29 +0200)


Pull request



Dietmar Maurer (1):
  yield_until_fd_readable: make it work with any AioContect

Julia Suvorova (1):
  virtio-blk: Add blk_drain() to virtio_blk_device_unrealize()

 hw/block/virtio-blk.c| 1 +
 util/qemu-coroutine-io.c | 7 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.21.0




[PULL 2/2] yield_until_fd_readable: make it work with any AioContect

2019-10-25 Thread Stefan Hajnoczi
From: Dietmar Maurer 

Simply use qemu_get_current_aio_context().

Signed-off-by: Dietmar Maurer 
Message-Id: <20191024045610.9071-1-diet...@proxmox.com>
Signed-off-by: Stefan Hajnoczi 
---
 util/qemu-coroutine-io.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c
index 44a8969a69..5b80bb416f 100644
--- a/util/qemu-coroutine-io.c
+++ b/util/qemu-coroutine-io.c
@@ -67,6 +67,7 @@ qemu_co_send_recv(int sockfd, void *buf, size_t bytes, bool 
do_send)
 }
 
 typedef struct {
+AioContext *ctx;
 Coroutine *co;
 int fd;
 } FDYieldUntilData;
@@ -74,7 +75,7 @@ typedef struct {
 static void fd_coroutine_enter(void *opaque)
 {
 FDYieldUntilData *data = opaque;
-qemu_set_fd_handler(data->fd, NULL, NULL, NULL);
+aio_set_fd_handler(data->ctx, data->fd, false, NULL, NULL, NULL, NULL);
 qemu_coroutine_enter(data->co);
 }
 
@@ -83,8 +84,10 @@ void coroutine_fn yield_until_fd_readable(int fd)
 FDYieldUntilData data;
 
 assert(qemu_in_coroutine());
+data.ctx = qemu_get_current_aio_context();
 data.co = qemu_coroutine_self();
 data.fd = fd;
-qemu_set_fd_handler(fd, fd_coroutine_enter, NULL, &data);
+aio_set_fd_handler(
+data.ctx, fd, false, fd_coroutine_enter, NULL, NULL, &data);
 qemu_coroutine_yield();
 }
-- 
2.21.0




Re: [PATCH v2 02/15] qapi/block-core: add option for io_uring

2019-10-25 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> From: Aarushi Mehta 
>
> Only enumerates option for devices that support it.

I'm not sure I get this sentence.

> Since QAPI schema
> supports io_uring, which is the actual name of the Linux API, it is
> preferred over io-uring.

I guess this one means something like "Since io_uring is the actual name
of the Linux API, we use it as enum value even though the QAPI schema
conventions would prefer io-uring."

> Signed-off-by: Aarushi Mehta 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qapi/block-core.json | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b274aef713..3196f40178 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2851,11 +2851,13 @@
>  #
>  # @threads: Use qemu's thread pool
>  # @native:  Use native AIO backend (only Linux and Windows)
> +# @io_uring:Use linux io_uring (since 4.2)
>  #
>  # Since: 2.9
>  ##
>  { 'enum': 'BlockdevAioOptions',
> -  'data': [ 'threads', 'native' ] }
> +  'data': [ 'threads', 'native',
> +{ 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] 
> }
>  
>  ##
>  # @BlockdevCacheOptions:

I encourage you to polish the commit message some.

Acked-by: Markus Armbruster 




[PATCH v7 0/4] colo: Add support for continuous replication

2019-10-25 Thread Lukas Straub
Hello Everyone,
These Patches add support for continuous replication to colo. This means
that after the Primary fails and the Secondary did a failover, the Secondary
can then become Primary and resume replication to a new Secondary.

Regards,
Lukas Straub

v7:
 - clarify meaning of ip's in documentation and note that active and hidden
   images just need to be created once
 - reverted removal of bdrv_is_root_node(top_bs) in replication and adjusted
   the top-id= parameter in documentation acordingly

v6:
 - documented the position= and insert= options
 - renamed replication test
 - clarified documentation by using different ip's for primary and secondary
 - added Reviewed-by tags

v5:
 - change syntax for the position= parameter
 - fix spelling mistake

v4:
 - fix checkpatch.pl warnings

v3:
 - add test for replication changes
 - check if the filter to be inserted before/behind belongs to the same 
interface
 - fix the error message for the position= parameter
 - rename term "after" -> "behind" and variable "insert_before" -> 
"insert_before_flag"
 - document the quorum node on the secondary side
 - simplify quorum parameters in documentation
 - remove trailing spaces in documentation
 - clarify the testing procedure in documentation

v2:
 - fix email formating
 - fix checkpatch.pl warnings
 - fix patchew error
 - clearer commit messages


Lukas Straub (4):
  block/replication.c: Ignore requests after failover
  tests/test-replication.c: Add test for for secondary node continuing
replication
  net/filter.c: Add Options to insert filters anywhere in the filter
list
  colo: Update Documentation for continuous replication

 block/replication.c|  35 +-
 docs/COLO-FT.txt   | 224 +++--
 docs/block-replication.txt |  28 +++--
 include/net/filter.h   |   2 +
 net/filter.c   |  92 ++-
 qemu-options.hx|  31 -
 tests/test-replication.c   |  52 +
 7 files changed, 389 insertions(+), 75 deletions(-)

--
2.20.1



[PATCH v7 0/4] colo: Add support for continuous replication

2019-10-25 Thread Lukas Straub
Hello Everyone,
These Patches add support for continuous replication to colo. This means
that after the Primary fails and the Secondary did a failover, the Secondary
can then become Primary and resume replication to a new Secondary.

Regards,
Lukas Straub

v7:
 - clarify meaning of ip's in documentation and note that active and hidden
   images just need to be created once
 - reverted removal of bdrv_is_root_node(top_bs) in replication and adjusted
   the top-id= parameter in documentation acordingly

v6:
 - documented the position= and insert= options
 - renamed replication test
 - clarified documentation by using different ip's for primary and secondary
 - added Reviewed-by tags

v5:
 - change syntax for the position= parameter
 - fix spelling mistake

v4:
 - fix checkpatch.pl warnings

v3:
 - add test for replication changes
 - check if the filter to be inserted before/behind belongs to the same 
interface
 - fix the error message for the position= parameter
 - rename term "after" -> "behind" and variable "insert_before" -> 
"insert_before_flag"
 - document the quorum node on the secondary side
 - simplify quorum parameters in documentation
 - remove trailing spaces in documentation
 - clarify the testing procedure in documentation

v2:
 - fix email formating
 - fix checkpatch.pl warnings
 - fix patchew error
 - clearer commit messages


Lukas Straub (4):
  block/replication.c: Ignore requests after failover
  tests/test-replication.c: Add test for for secondary node continuing
replication
  net/filter.c: Add Options to insert filters anywhere in the filter
list
  colo: Update Documentation for continuous replication

 block/replication.c|  35 +-
 docs/COLO-FT.txt   | 224 +++--
 docs/block-replication.txt |  28 +++--
 include/net/filter.h   |   2 +
 net/filter.c   |  92 ++-
 qemu-options.hx|  31 -
 tests/test-replication.c   |  52 +
 7 files changed, 389 insertions(+), 75 deletions(-)

--
2.20.1



[PATCH v2 10/15] block/io_uring: adds userspace completion polling

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index a5c0d16220..56892fd1ab 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -274,6 +274,21 @@ static void qemu_luring_completion_cb(void *opaque)
 luring_process_completions_and_submit(s);
 }
 
+static bool qemu_luring_poll_cb(void *opaque)
+{
+LuringState *s = opaque;
+struct io_uring_cqe *cqes;
+
+if (io_uring_peek_cqe(&s->ring, &cqes) == 0) {
+if (cqes) {
+luring_process_completions_and_submit(s);
+return true;
+}
+}
+
+return false;
+}
+
 static void ioq_init(LuringQueue *io_q)
 {
 QSIMPLEQ_INIT(&io_q->submit_queue);
@@ -387,7 +402,7 @@ void luring_attach_aio_context(LuringState *s, AioContext 
*new_context)
 s->aio_context = new_context;
 s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s);
 aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false,
-   qemu_luring_completion_cb, NULL, NULL, s);
+   qemu_luring_completion_cb, NULL, qemu_luring_poll_cb, 
s);
 }
 
 LuringState *luring_init(Error **errp)
-- 
2.21.0




[PATCH v2 06/15] util/async: add aio interfaces for io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 util/async.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/util/async.c b/util/async.c
index ca83e32c7f..c9880e1812 100644
--- a/util/async.c
+++ b/util/async.c
@@ -276,6 +276,14 @@ aio_ctx_finalize(GSource *source)
 }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+if (ctx->linux_io_uring) {
+luring_detach_aio_context(ctx->linux_io_uring, ctx);
+luring_cleanup(ctx->linux_io_uring);
+ctx->linux_io_uring = NULL;
+}
+#endif
+
 assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
 qemu_bh_delete(ctx->co_schedule_bh);
 
@@ -340,6 +348,29 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 }
 #endif
 
+#ifdef CONFIG_LINUX_IO_URING
+LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp)
+{
+if (ctx->linux_io_uring) {
+return ctx->linux_io_uring;
+}
+
+ctx->linux_io_uring = luring_init(errp);
+if (!ctx->linux_io_uring) {
+return NULL;
+}
+
+luring_attach_aio_context(ctx->linux_io_uring, ctx);
+return ctx->linux_io_uring;
+}
+
+LuringState *aio_get_linux_io_uring(AioContext *ctx)
+{
+assert(ctx->linux_io_uring);
+return ctx->linux_io_uring;
+}
+#endif
+
 void aio_notify(AioContext *ctx)
 {
 /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
@@ -435,6 +466,11 @@ AioContext *aio_context_new(Error **errp)
 #ifdef CONFIG_LINUX_AIO
 ctx->linux_aio = NULL;
 #endif
+
+#ifdef CONFIG_LINUX_IO_URING
+ctx->linux_io_uring = NULL;
+#endif
+
 ctx->thread_pool = NULL;
 qemu_rec_mutex_init(&ctx->lock);
 timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);
-- 
2.21.0




[PATCH v2 07/15] blockdev: adds bdrv_parse_aio to use io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block.h |  1 +
 block.c   | 22 ++
 blockdev.c| 12 
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index bdb48dcd1b..bf36fa9863 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -300,6 +300,7 @@ void bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
 void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
Error **errp);
 
+int bdrv_parse_aio(const char *mode, int *flags);
 int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough);
 int bdrv_parse_discard_flags(const char *mode, int *flags);
 BdrvChild *bdrv_open_child(const char *filename,
diff --git a/block.c b/block.c
index dad5a3d8e0..290ab7d2bf 100644
--- a/block.c
+++ b/block.c
@@ -845,6 +845,28 @@ static BlockdevDetectZeroesOptions 
bdrv_parse_detect_zeroes(QemuOpts *opts,
 return detect_zeroes;
 }
 
+/**
+ * Set open flags for aio engine
+ *
+ * Return 0 on success, -1 if the engine specified is invalid
+ */
+int bdrv_parse_aio(const char *mode, int *flags)
+{
+if (!strcmp(mode, "threads")) {
+/* do nothing, default */
+} else if (!strcmp(mode, "native")) {
+*flags |= BDRV_O_NATIVE_AIO;
+#ifdef CONFIG_LINUX_IO_URING
+} else if (!strcmp(mode, "io_uring")) {
+*flags |= BDRV_O_IO_URING;
+#endif
+} else {
+return -1;
+}
+
+return 0;
+}
+
 /**
  * Set open flags for a given discard mode
  *
diff --git a/blockdev.c b/blockdev.c
index 03c7cd7651..6ceef376a6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -385,13 +385,9 @@ static void extract_common_blockdev_options(QemuOpts 
*opts, int *bdrv_flags,
 }
 
 if ((aio = qemu_opt_get(opts, "aio")) != NULL) {
-if (!strcmp(aio, "native")) {
-*bdrv_flags |= BDRV_O_NATIVE_AIO;
-} else if (!strcmp(aio, "threads")) {
-/* this is the default */
-} else {
-   error_setg(errp, "invalid aio option");
-   return;
+if (bdrv_parse_aio(aio, bdrv_flags) < 0) {
+error_setg(errp, "invalid aio option");
+return;
 }
 }
 }
@@ -4642,7 +4638,7 @@ QemuOptsList qemu_common_drive_opts = {
 },{
 .name = "aio",
 .type = QEMU_OPT_STRING,
-.help = "host AIO implementation (threads, native)",
+.help = "host AIO implementation (threads, native, io_uring)",
 },{
 .name = BDRV_OPT_CACHE_WB,
 .type = QEMU_OPT_BOOL,
-- 
2.21.0




[PATCH v2 15/15] tests/qemu-iotests: use AIOMODE with various tests

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
v11:
 * Drop line-wrapping changes that accidentally broke qemu-iotests
---
 tests/qemu-iotests/028 |  2 +-
 tests/qemu-iotests/058 |  2 +-
 tests/qemu-iotests/089 |  4 ++--
 tests/qemu-iotests/091 |  4 ++--
 tests/qemu-iotests/109 |  2 +-
 tests/qemu-iotests/147 |  5 +++--
 tests/qemu-iotests/181 |  8 
 tests/qemu-iotests/183 |  4 ++--
 tests/qemu-iotests/185 | 10 +-
 tests/qemu-iotests/200 |  2 +-
 tests/qemu-iotests/201 |  8 
 11 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/tests/qemu-iotests/028 b/tests/qemu-iotests/028
index bba1ee59ae..d4d972c884 100755
--- a/tests/qemu-iotests/028
+++ b/tests/qemu-iotests/028
@@ -108,7 +108,7 @@ echo block-backup
 echo
 
 qemu_comm_method="monitor"
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk
+_launch_qemu -drive 
file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk
 h=$QEMU_HANDLE
 if [ "${VALGRIND_QEMU}" == "y" ]; then
 QEMU_COMM_TIMEOUT=7
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 8c3212a72f..38d1ed90c0 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -64,7 +64,7 @@ nbd_snapshot_img="nbd:unix:$nbd_unix_socket"
 converted_image=$TEST_IMG.converted
 
 # Use -f raw instead of -f $IMGFMT for the NBD connection
-QEMU_IO_NBD="$QEMU_IO -f raw --cache=$CACHEMODE"
+QEMU_IO_NBD="$QEMU_IO -f raw --cache=$CACHEMODE --aio=$AIOMODE"
 
 echo
 echo "== preparing image =="
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index ad029f1f09..059ad75e28 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -64,7 +64,7 @@ $QEMU_IO -c 'write -P 42 0 512' -c 'write -P 23 512 512' \
 
 $QEMU_IMG convert -f raw -O $IMGFMT "$TEST_IMG.base" "$TEST_IMG"
 
-$QEMU_IO_PROG --cache $CACHEMODE \
+$QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE \
  -c 'read -P 42 0 512' -c 'read -P 23 512 512' \
  -c 'read -P 66 1024 512' "json:{
 \"driver\": \"$IMGFMT\",
@@ -111,7 +111,7 @@ $QEMU_IO -c 'write -P 42 0x38000 512' "$TEST_IMG" | 
_filter_qemu_io
 
 # The "image.filename" part tests whether "a": { "b": "c" } and "a.b": "c" do
 # the same (which they should).
-$QEMU_IO_PROG --cache $CACHEMODE \
+$QEMU_IO_PROG --cache $CACHEMODE --aio $AIOMODE  \
  -c 'read -P 42 0x38000 512' "json:{
 \"driver\": \"$IMGFMT\",
 \"file\": {
diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index f4b44659ae..3a29469324 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -60,13 +60,13 @@ echo === Starting QEMU VM1 ===
 echo
 
 qemu_comm_method="monitor"
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk
+_launch_qemu -drive 
file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk
 h1=$QEMU_HANDLE
 
 echo
 echo === Starting QEMU VM2 ===
 echo
-_launch_qemu -drive file="${TEST_IMG}",cache=${CACHEMODE},id=disk \
+_launch_qemu -drive 
file="${TEST_IMG}",cache=${CACHEMODE},aio=${AIOMODE},id=disk \
  -incoming "exec: cat '${MIG_FIFO}'"
 h2=$QEMU_HANDLE
 
diff --git a/tests/qemu-iotests/109 b/tests/qemu-iotests/109
index 9897ceb6cd..3a3dcf94eb 100755
--- a/tests/qemu-iotests/109
+++ b/tests/qemu-iotests/109
@@ -52,7 +52,7 @@ run_qemu()
 local qmp_format="$3"
 local qmp_event="$4"
 
-_launch_qemu -drive 
file="${source_img}",format=raw,cache=${CACHEMODE},id=src
+_launch_qemu -drive 
file="${source_img}",format=raw,cache=${CACHEMODE},aio=${AIOMODE},id=src
 _send_qemu_cmd $QEMU_HANDLE "{ 'execute': 'qmp_capabilities' }" "return"
 
 _send_qemu_cmd $QEMU_HANDLE \
diff --git a/tests/qemu-iotests/147 b/tests/qemu-iotests/147
index ab8480b9a4..6bc50ff5d9 100755
--- a/tests/qemu-iotests/147
+++ b/tests/qemu-iotests/147
@@ -24,7 +24,7 @@ import socket
 import stat
 import time
 import iotests
-from iotests import cachemode, imgfmt, qemu_img, qemu_nbd, qemu_nbd_early_pipe
+from iotests import cachemode, aiomode, imgfmt, qemu_img, qemu_nbd, 
qemu_nbd_early_pipe
 
 NBD_PORT_START  = 32768
 NBD_PORT_END= NBD_PORT_START + 1024
@@ -134,7 +134,8 @@ class BuiltinNBD(NBDBlockdevAddBase):
 self.server.add_drive_raw('if=none,id=nbd-export,' +
   'file=%s,' % test_img +
   'format=%s,' % imgfmt +
-  'cache=%s' % cachemode)
+  'cache=%s' % cachemode +
+  'aio=%s' % aiomode)
 self.server.launch()
 
 def tearDown(self):
diff --git a/tests/qemu-iotests/181 b/tests/qemu-iotests/181
index e317e63422..5b7de72f18 100755
--- a/tests/qemu-iotests/181
+++ b/tests/qemu-iotests/181
@@ -58,20 +58,20 @@ qemu_comm_method="monitor"
 
 if [ "$IMGOPTSSYNTAX" = "true" ]; then
 _launch_qemu \
--drive "${TEST_IMG}",cache=${CACHEMODE},id=disk
+-drive "${TEST_IMG}",cache=${CACHEMODE},aio=$AIOMODE,id=disk
 else
 _launch_qe

[PATCH v2 05/15] stubs: add stubs for io_uring interface

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS |  1 +
 stubs/Makefile.objs |  1 +
 stubs/io_uring.c| 32 
 3 files changed, 34 insertions(+)
 create mode 100644 stubs/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 31b090033c..c5cf2466b6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2551,6 +2551,7 @@ M: Stefan Hajnoczi 
 L: qemu-block@nongnu.org
 S: Maintained
 F: block/io_uring.c
+F: stubs/io_uring.c
 
 qcow2
 M: Kevin Wolf 
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 4a50e95ec3..56c177c507 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -13,6 +13,7 @@ stub-obj-y += iothread.o
 stub-obj-y += iothread-lock.o
 stub-obj-y += is-daemonized.o
 stub-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+stub-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 stub-obj-y += machine-init-done.o
 stub-obj-y += migr-blocker.o
 stub-obj-y += change-state-handler.o
diff --git a/stubs/io_uring.c b/stubs/io_uring.c
new file mode 100644
index 00..622d1e4648
--- /dev/null
+++ b/stubs/io_uring.c
@@ -0,0 +1,32 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/raw-aio.h"
+
+void luring_detach_aio_context(LuringState *s, AioContext *old_context)
+{
+abort();
+}
+
+void luring_attach_aio_context(LuringState *s, AioContext *new_context)
+{
+abort();
+}
+
+LuringState *luring_init(Error **errp)
+{
+abort();
+}
+
+void luring_cleanup(LuringState *s)
+{
+abort();
+}
-- 
2.21.0




[PATCH v2 12/15] qemu-img: adds option to use aio engine for benchmarking

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
v10:
 * Add missing space to qemu-img command-line documentation
---
 qemu-img.c   | 11 ++-
 qemu-img-cmds.hx |  4 ++--
 qemu-img.texi|  5 -
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 8b03ef8171..3e696a4d8e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4210,7 +4210,8 @@ static int img_bench(int argc, char **argv)
 {"force-share", no_argument, 0, 'U'},
 {0, 0, 0, 0}
 };
-c = getopt_long(argc, argv, ":hc:d:f:no:qs:S:t:wU", long_options, 
NULL);
+c = getopt_long(argc, argv, ":hc:d:f:ni:o:qs:S:t:wU", long_options,
+NULL);
 if (c == -1) {
 break;
 }
@@ -4253,6 +4254,14 @@ static int img_bench(int argc, char **argv)
 case 'n':
 flags |= BDRV_O_NATIVE_AIO;
 break;
+case 'i':
+ret = bdrv_parse_aio(optarg, &flags);
+if (ret < 0) {
+error_report("Invalid aio option: %s", optarg);
+ret = -1;
+goto out;
+}
+break;
 case 'o':
 {
 offset = cvtnum(optarg);
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1c93e6d185..77b5a8dda8 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -20,9 +20,9 @@ STEXI
 ETEXI
 
 DEF("bench", img_bench,
-"bench [-c count] [-d depth] [-f fmt] [--flush-interval=flush_interval] 
[-n] [--no-drain] [-o offset] [--pattern=pattern] [-q] [-s buffer_size] [-S 
step_size] [-t cache] [-w] [-U] filename")
+"bench [-c count] [-d depth] [-f fmt] [--flush-interval=flush_interval] 
[-n] [--no-drain] [-o offset] [--pattern=pattern] [-q] [-s buffer_size] [-S 
step_size] [-t cache] [-i aio] [-w] [-U] filename")
 STEXI
-@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] 
[--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] 
[--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t 
@var{cache}] [-w] [-U] @var{filename}
+@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] 
[--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] 
[--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t 
@var{cache}] [-i @var{aio}] [-w] [-U] @var{filename}
 ETEXI
 
 DEF("check", img_check,
diff --git a/qemu-img.texi b/qemu-img.texi
index b5156d6316..20136fcb94 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -206,7 +206,7 @@ Command description:
 Amends the image format specific @var{options} for the image file
 @var{filename}. Not all file formats support this operation.
 
-@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] 
[--flush-interval=@var{flush_interval}] [-n] [--no-drain] [-o @var{offset}] 
[--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S @var{step_size}] [-t 
@var{cache}] [-w] [-U] @var{filename}
+@item bench [-c @var{count}] [-d @var{depth}] [-f @var{fmt}] 
[--flush-interval=@var{flush_interval}] [-n] [-i @var{aio}] [--no-drain] [-o 
@var{offset}] [--pattern=@var{pattern}] [-q] [-s @var{buffer_size}] [-S 
@var{step_size}] [-t @var{cache}] [-w] [-U] @var{filename}
 
 Run a simple sequential I/O benchmark on the specified image. If @code{-w} is
 specified, a write test is performed, otherwise a read test is performed.
@@ -227,6 +227,9 @@ If @code{-n} is specified, the native AIO backend is used 
if possible. On
 Linux, this option only works if @code{-t none} or @code{-t directsync} is
 specified as well.
 
+If @code{-i} is specified, aio option can be used to specify different AIO
+backends: @var{threads}, @var{native} or @var{io_uring}.
+
 For write tests, by default a buffer filled with zeros is written. This can be
 overridden with a pattern byte specified by @var{pattern}.
 
-- 
2.21.0




[PATCH v2 11/15] qemu-io: adds option to use aio engine

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-io.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 91e3276592..3adc5a7d0d 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -130,7 +130,8 @@ static void open_help(void)
 " -C, -- use copy-on-read\n"
 " -n, -- disable host cache, short for -t none\n"
 " -U, -- force shared permissions\n"
-" -k, -- use kernel AIO implementation (on Linux only)\n"
+" -k, -- use kernel AIO implementation (Linux only, prefer use of -i)\n"
+" -i, -- use AIO mode (threads, native or io_uring)\n"
 " -t, -- use the given cache mode for the image\n"
 " -d, -- use the given discard mode for the image\n"
 " -o, -- options to be given to the block driver"
@@ -172,7 +173,7 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
 QDict *opts;
 bool force_share = false;
 
-while ((c = getopt(argc, argv, "snCro:kt:d:U")) != -1) {
+while ((c = getopt(argc, argv, "snCro:ki:t:d:U")) != -1) {
 switch (c) {
 case 's':
 flags |= BDRV_O_SNAPSHOT;
@@ -204,6 +205,13 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
 return -EINVAL;
 }
 break;
+case 'i':
+if (bdrv_parse_aio(optarg, &flags) < 0) {
+error_report("Invalid aio option: %s", optarg);
+qemu_opts_reset(&empty_opts);
+return -EINVAL;
+}
+break;
 case 'o':
 if (imageOpts) {
 printf("--image-opts and 'open -o' are mutually exclusive\n");
@@ -291,7 +299,9 @@ static void usage(const char *name)
 "  -n, --nocachedisable host cache, short for -t none\n"
 "  -C, --copy-on-read   enable copy-on-read\n"
 "  -m, --misalign   misalign allocations for O_DIRECT\n"
-"  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
+"  -k, --native-aio use kernel AIO implementation\n"
+"   (Linux only, prefer use of -i)\n"
+"  -i, --aio=MODE   use AIO mode (threads, native or io_uring)\n"
 "  -t, --cache=MODE use the given cache mode for the image\n"
 "  -d, --discard=MODE   use the given discard mode for the image\n"
 "  -T, --trace [[enable=]][,events=][,file=]\n"
@@ -496,7 +506,7 @@ static QemuOptsList file_opts = {
 int main(int argc, char **argv)
 {
 int readonly = 0;
-const char *sopt = "hVc:d:f:rsnCmkt:T:U";
+const char *sopt = "hVc:d:f:rsnCmki:t:T:U";
 const struct option lopt[] = {
 { "help", no_argument, NULL, 'h' },
 { "version", no_argument, NULL, 'V' },
@@ -508,6 +518,7 @@ int main(int argc, char **argv)
 { "copy-on-read", no_argument, NULL, 'C' },
 { "misalign", no_argument, NULL, 'm' },
 { "native-aio", no_argument, NULL, 'k' },
+{ "aio", required_argument, NULL, 'i' },
 { "discard", required_argument, NULL, 'd' },
 { "cache", required_argument, NULL, 't' },
 { "trace", required_argument, NULL, 'T' },
@@ -575,6 +586,12 @@ int main(int argc, char **argv)
 case 'k':
 flags |= BDRV_O_NATIVE_AIO;
 break;
+case 'i':
+if (bdrv_parse_aio(optarg, &flags) < 0) {
+error_report("Invalid aio option: %s", optarg);
+exit(1);
+}
+break;
 case 't':
 if (bdrv_parse_cache_mode(optarg, &flags, &writethrough) < 0) {
 error_report("Invalid cache option: %s", optarg);
-- 
2.21.0




[PATCH v2 03/15] block/block: add BDRV flag for io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/block/block.h b/include/block/block.h
index 89606bd9f8..bdb48dcd1b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -126,6 +126,7 @@ typedef struct HDGeometry {
   ignoring the format layer */
 #define BDRV_O_NO_IO   0x1 /* don't initialize for I/O */
 #define BDRV_O_AUTO_RDONLY 0x2 /* degrade to read-only if opening 
read-write fails */
+#define BDRV_O_IO_URING0x4 /* use io_uring instead of the thread pool 
*/
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
-- 
2.21.0




[PATCH v2 02/15] qapi/block-core: add option for io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Only enumerates option for devices that support it. Since QAPI schema
supports io_uring, which is the actual name of the Linux API, it is
preferred over io-uring.

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-core.json | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b274aef713..3196f40178 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2851,11 +2851,13 @@
 #
 # @threads: Use qemu's thread pool
 # @native:  Use native AIO backend (only Linux and Windows)
+# @io_uring:Use linux io_uring (since 4.2)
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevAioOptions',
-  'data': [ 'threads', 'native' ] }
+  'data': [ 'threads', 'native',
+{ 'name': 'io_uring', 'if': 'defined(CONFIG_LINUX_IO_URING)' } ] }
 
 ##
 # @BlockdevCacheOptions:
-- 
2.21.0




[PATCH v2 14/15] tests/qemu-iotests: enable testing with aio options

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/check  | 15 ++-
 tests/qemu-iotests/common.rc  | 14 ++
 tests/qemu-iotests/iotests.py | 12 ++--
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 588c453a94..150fb71cf9 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -132,6 +132,7 @@ sortme=false
 expunge=true
 have_test_arg=false
 cachemode=false
+aiomode=false
 
 tmp="${TEST_DIR}"/$$
 rm -f $tmp.list $tmp.tmp $tmp.sed
@@ -141,6 +142,7 @@ export IMGFMT_GENERIC=true
 export IMGPROTO=file
 export IMGOPTS=""
 export CACHEMODE="writeback"
+export AIOMODE="threads"
 export QEMU_IO_OPTIONS=""
 export QEMU_IO_OPTIONS_NO_FMT=""
 export CACHEMODE_IS_DEFAULT=true
@@ -225,6 +227,11 @@ s/ .*//p
 CACHEMODE_IS_DEFAULT=false
 cachemode=false
 continue
+elif $aiomode
+then
+AIOMODE="$r"
+aiomode=false
+continue
 fi
 
 xpand=true
@@ -269,6 +276,7 @@ other options
 -n  show me, do not run tests
 -o options  -o options to pass to qemu-img create/convert
 -c mode cache mode
+-i mode AIO mode
 -makecheck  pretty print output for make check
 
 testlist options
@@ -433,10 +441,13 @@ testlist options
 cachemode=true
 xpand=false
 ;;
+-i)
+aiomode=true
+xpand=false
+;;
 -T)# deprecated timestamp option
 xpand=false
 ;;
-
 -v)
 verbose=true
 xpand=false
@@ -515,6 +526,8 @@ done
 
 # Set qemu-io cache mode with $CACHEMODE we have
 QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS --cache $CACHEMODE"
+# Set qemu-io aio mode with $AIOMODE we have
+QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS --aio $AIOMODE"
 
 QEMU_IO_OPTIONS_NO_FMT="$QEMU_IO_OPTIONS"
 if [ "$IMGOPTSSYNTAX" != "true" ]; then
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 12b4751848..13f934f79b 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -576,6 +576,20 @@ _default_cache_mode()
 return
 fi
 }
+_supported_aio_modes()
+{
+for mode; do
+if [ "$mode" = "$AIOMODE" ]; then
+return
+fi
+done
+_notrun "not suitable for aio mode: $AIOMODE"
+}
+_default_aio_mode()
+{
+AIOMODE="$1"
+QEMU_IO="$QEMU_IO --aio $1"
+}
 
 _unsupported_imgopts()
 {
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 693fde155a..0d4ef6b3e6 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -59,6 +59,7 @@ imgproto = os.environ.get('IMGPROTO', 'file')
 test_dir = os.environ.get('TEST_DIR')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
 cachemode = os.environ.get('CACHEMODE')
+aiomode = os.environ.get('AIOMODE')
 qemu_default_machine = os.environ.get('QEMU_DEFAULT_MACHINE')
 
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
@@ -483,6 +484,7 @@ class VM(qtest.QEMUQtestMachine):
 options.append('file=%s' % path)
 options.append('format=%s' % format)
 options.append('cache=%s' % cachemode)
+options.append('aio=%s' % aiomode)
 
 if opts:
 options.append(opts)
@@ -888,6 +890,10 @@ def verify_cache_mode(supported_cache_modes=[]):
 if supported_cache_modes and (cachemode not in supported_cache_modes):
 notrun('not suitable for this cache mode: %s' % cachemode)
 
+def verify_aio_mode(supported_aio_modes=[]):
+if supported_aio_modes and (aiomode not in supported_aio_modes):
+notrun('not suitable for this aio mode: %s' % aiomode)
+
 def supports_quorum():
 return 'quorum' in qemu_img_pipe('--help')
 
@@ -945,8 +951,9 @@ def execute_unittest(output, verbosity, debug):
 
 def execute_test(test_function=None,
  supported_fmts=[], supported_oses=['linux'],
- supported_cache_modes=[], unsupported_fmts=[],
- supported_protocols=[], unsupported_protocols=[]):
+ supported_cache_modes=[], supported_aio_modes={},
+ unsupported_fmts=[], supported_protocols=[],
+ unsupported_protocols=[]):
 """Run either unittest or script-style tests."""
 
 # We are using TEST_DIR and QEMU_DEFAULT_MACHINE as proxies to
@@ -963,6 +970,7 @@ def execute_test(test_function=None,
 verify_protocol(supported_protocols, unsupported_protocols)
 verify_platform(supported_oses)
 verify_cache_mode(supported_cache_modes)
+verify_aio_mode(supported_aio_modes)
 
 if debug:
 output = sys.stdout
-- 
2.21.0




[PATCH v2 13/15] qemu-nbd: adds option for aio engines

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Acked-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 qemu-nbd.c| 12 
 qemu-nbd.texi |  4 ++--
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index caacf0ed73..1761b3e70a 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -135,7 +135,7 @@ static void usage(const char *name)
 "'[ID_OR_NAME]'\n"
 "  -n, --nocache disable host cache\n"
 "  --cache=MODE  set cache mode (none, writeback, ...)\n"
-"  --aio=MODEset AIO mode (native or threads)\n"
+"  --aio=MODEset AIO mode (native, io_uring or threads)\n"
 "  --discard=MODEset discard mode (ignore, unmap)\n"
 "  --detect-zeroes=MODE  set detect-zeroes mode (off, on, unmap)\n"
 "  --image-opts  treat FILE as a full set of image options\n"
@@ -726,13 +726,9 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 seen_aio = true;
-if (!strcmp(optarg, "native")) {
-flags |= BDRV_O_NATIVE_AIO;
-} else if (!strcmp(optarg, "threads")) {
-/* this is the default */
-} else {
-   error_report("invalid aio mode `%s'", optarg);
-   exit(EXIT_FAILURE);
+if (bdrv_parse_aio(optarg, &flags) < 0) {
+error_report("Invalid aio mode '%s'", optarg);
+exit(EXIT_FAILURE);
 }
 break;
 case QEMU_NBD_OPT_DISCARD:
diff --git a/qemu-nbd.texi b/qemu-nbd.texi
index 7f55657722..3ee3e4bdee 100644
--- a/qemu-nbd.texi
+++ b/qemu-nbd.texi
@@ -77,8 +77,8 @@ as an read-only device, @var{snapshot_param} format is
 The cache mode to be used with the file.  See the documentation of
 the emulator's @code{-drive cache=...} option for allowed values.
 @item --aio=@var{aio}
-Set the asynchronous I/O mode between @samp{threads} (the default)
-and @samp{native} (Linux only).
+Set the asynchronous I/O mode between @samp{threads} (the default),
+@samp{native} (Linux only) and @samp{io_uring} (Linux 5.1+).
 @item --discard=@var{discard}
 Control whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap})
 requests are ignored or passed to the filesystem.  @var{discard} is one of
-- 
2.21.0




[PATCH v2 08/15] block/file-posix.c: extend to use io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Stefan Hajnoczi 
---
 block/file-posix.c | 99 --
 1 file changed, 79 insertions(+), 20 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 695fcf740d..14c80b8cb3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -156,6 +156,7 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
 bool use_linux_aio:1;
+bool use_linux_io_uring:1;
 bool page_cache_inconsistent:1;
 bool has_fallocate;
 bool needs_alignment;
@@ -444,7 +445,7 @@ static QemuOptsList raw_runtime_opts = {
 {
 .name = "aio",
 .type = QEMU_OPT_STRING,
-.help = "host AIO implementation (threads, native)",
+.help = "host AIO implementation (threads, native, io_uring)",
 },
 {
 .name = "locking",
@@ -503,9 +504,15 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 goto fail;
 }
 
-aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
-  ? BLOCKDEV_AIO_OPTIONS_NATIVE
-  : BLOCKDEV_AIO_OPTIONS_THREADS;
+if (bdrv_flags & BDRV_O_NATIVE_AIO) {
+aio_default = BLOCKDEV_AIO_OPTIONS_NATIVE;
+#ifdef CONFIG_LINUX_IO_URING
+} else if (bdrv_flags & BDRV_O_IO_URING) {
+aio_default = BLOCKDEV_AIO_OPTIONS_IO_URING;
+#endif
+} else {
+aio_default = BLOCKDEV_AIO_OPTIONS_THREADS;
+}
 aio = qapi_enum_parse(&BlockdevAioOptions_lookup,
   qemu_opt_get(opts, "aio"),
   aio_default, &local_err);
@@ -514,7 +521,11 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 ret = -EINVAL;
 goto fail;
 }
+
 s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
+#ifdef CONFIG_LINUX_IO_URING
+s->use_linux_io_uring = (aio == BLOCKDEV_AIO_OPTIONS_IO_URING);
+#endif
 
 locking = qapi_enum_parse(&OnOffAuto_lookup,
   qemu_opt_get(opts, "locking"),
@@ -578,7 +589,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 s->shared_perm = BLK_PERM_ALL;
 
 #ifdef CONFIG_LINUX_AIO
- /* Currently Linux does AIO only for files opened with O_DIRECT */
+/* Currently Linux does AIO only for files opened with O_DIRECT */
 if (s->use_linux_aio) {
 if (!(s->open_flags & O_DIRECT)) {
 error_setg(errp, "aio=native was specified, but it requires "
@@ -600,6 +611,22 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 #endif /* !defined(CONFIG_LINUX_AIO) */
 
+#ifdef CONFIG_LINUX_IO_URING
+if (s->use_linux_io_uring) {
+if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
+error_prepend(errp, "Unable to use io_uring: ");
+goto fail;
+}
+}
+#else
+if (s->use_linux_io_uring) {
+error_setg(errp, "aio=io_uring was specified, but is not supported "
+ "in this build.");
+ret = -EINVAL;
+goto fail;
+}
+#endif /* !defined(CONFIG_LINUX_IO_URING) */
+
 s->has_discard = true;
 s->has_write_zeroes = true;
 if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
@@ -1876,21 +1903,25 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
*bs, uint64_t offset,
 return -EIO;
 
 /*
- * Check if the underlying device requires requests to be aligned,
- * and if the request we are trying to submit is aligned or not.
- * If this is the case tell the low-level driver that it needs
- * to copy the buffer.
+ * When using O_DIRECT, the request must be aligned to be able to use
+ * either libaio or io_uring interface. If not fail back to regular thread
+ * pool read/write code which emulates this for us if we
+ * set QEMU_AIO_MISALIGNED.
  */
-if (s->needs_alignment) {
-if (!bdrv_qiov_is_aligned(bs, qiov)) {
-type |= QEMU_AIO_MISALIGNED;
+if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
+type |= QEMU_AIO_MISALIGNED;
+#ifdef CONFIG_LINUX_IO_URING
+} else if (s->use_linux_io_uring) {
+LuringState *aio = aio_get_linux_io_uring(bdrv_get_aio_context(bs));
+assert(qiov->size == bytes);
+return luring_co_submit(bs, aio, s->fd, offset, qiov, type);
+#endif
 #ifdef CONFIG_LINUX_AIO
-} else if (s->use_linux_aio) {
-LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
-assert(qiov->size == bytes);
-return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
+} else if (s->use_linux_aio) {
+LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
+assert(qiov->size == bytes);
+return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
 #endif
-}
 }
 
 acb = (RawPosixAIOData) {
@@ -1926,24 +1957,

[PATCH v2 04/15] block/io_uring: implements interfaces for io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Aborts when sqe fails to be set as sqes cannot be returned to the
ring. Adds slow path for short reads for older kernels

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
v10:
 * Update MAINTAINERS file [Julia]
 * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
 * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
 * Declare variables at the beginning of the block in luring_init() [Julia]
---
 MAINTAINERS |   8 +
 block/Makefile.objs |   3 +
 include/block/aio.h |  16 +-
 include/block/raw-aio.h |  12 ++
 block/io_uring.c| 403 
 5 files changed, 441 insertions(+), 1 deletion(-)
 create mode 100644 block/io_uring.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ed41d7d1b6..31b090033c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2544,6 +2544,14 @@ F: block/file-posix.c
 F: block/file-win32.c
 F: block/win32-aio.c
 
+Linux io_uring
+M: Aarushi Mehta 
+M: Julia Suvorova 
+M: Stefan Hajnoczi 
+L: qemu-block@nongnu.org
+S: Maintained
+F: block/io_uring.c
+
 qcow2
 M: Kevin Wolf 
 M: Max Reitz 
diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0b6c..035abb9c5c 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block-backend.o snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-$(CONFIG_LINUX_IO_URING) += io_uring.o
 block-obj-y += null.o mirror.o commit.o io.o create.o
 block-obj-y += throttle-groups.o
 block-obj-$(CONFIG_LINUX) += nvme.o
@@ -65,5 +66,7 @@ block-obj-$(if $(CONFIG_LZFSE),m,n) += dmg-lzfse.o
 dmg-lzfse.o-libs   := $(LZFSE_LIBS)
 qcow.o-libs:= -lz
 linux-aio.o-libs   := -laio
+io_uring.o-cflags  := $(LINUX_IO_URING_CFLAGS)
+io_uring.o-libs:= $(LINUX_IO_URING_LIBS)
 parallels.o-cflags := $(LIBXML2_CFLAGS)
 parallels.o-libs   := $(LIBXML2_LIBS)
diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..7ba9bd7874 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -49,6 +49,7 @@ typedef void IOHandler(void *opaque);
 struct Coroutine;
 struct ThreadPool;
 struct LinuxAioState;
+struct LuringState;
 
 struct AioContext {
 GSource source;
@@ -117,11 +118,19 @@ struct AioContext {
 struct ThreadPool *thread_pool;
 
 #ifdef CONFIG_LINUX_AIO
-/* State for native Linux AIO.  Uses aio_context_acquire/release for
+/*
+ * State for native Linux AIO.  Uses aio_context_acquire/release for
  * locking.
  */
 struct LinuxAioState *linux_aio;
 #endif
+#ifdef CONFIG_LINUX_IO_URING
+/*
+ * State for Linux io_uring.  Uses aio_context_acquire/release for
+ * locking.
+ */
+struct LuringState *linux_io_uring;
+#endif
 
 /* TimerLists for calling timers - one per clock type.  Has its own
  * locking.
@@ -386,6 +395,11 @@ struct LinuxAioState *aio_setup_linux_aio(AioContext *ctx, 
Error **errp);
 /* Return the LinuxAioState bound to this AioContext */
 struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
 
+/* Setup the LuringState bound to this AioContext */
+struct LuringState *aio_setup_linux_io_uring(AioContext *ctx, Error **errp);
+
+/* Return the LuringState bound to this AioContext */
+struct LuringState *aio_get_linux_io_uring(AioContext *ctx);
 /**
  * aio_timer_new_with_attrs:
  * @ctx: the aio context
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 4629f24d08..251b10d273 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -57,6 +57,18 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext 
*new_context);
 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
 void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
 #endif
+/* io_uring.c - Linux io_uring implementation */
+#ifdef CONFIG_LINUX_IO_URING
+typedef struct LuringState LuringState;
+LuringState *luring_init(Error **errp);
+void luring_cleanup(LuringState *s);
+int coroutine_fn luring_co_submit(BlockDriverState *bs, LuringState *s, int fd,
+uint64_t offset, QEMUIOVector *qiov, int type);
+void luring_detach_aio_context(LuringState *s, AioContext *old_context);
+void luring_attach_aio_context(LuringState *s, AioContext *new_context);
+void luring_io_plug(BlockDriverState *bs, LuringState *s);
+void luring_io_unplug(BlockDriverState *bs, LuringState *s);
+#endif
 
 #ifdef _WIN32
 typedef struct QEMUWin32AIOState QEMUWin32AIOState;
diff --git a/block/io_uring.c b/block/io_uring.c
new file mode 100644
index 00..307c4c5823
--- /dev/null
+++ b/block/io_uring.c
@@ -0,0 +1,403 @@
+/*
+ * Linux io_uring support.
+ *
+ * Copyright (C) 2009 IBM, Corp.
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2019 Aarushi Mehta
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "

[PATCH v2 01/15] configure: permit use of io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Stefan Hajnoczi 
---
 configure | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/configure b/configure
index 145fcabbb3..745429c925 100755
--- a/configure
+++ b/configure
@@ -370,6 +370,7 @@ xen=""
 xen_ctrl_version=""
 xen_pci_passthrough=""
 linux_aio=""
+linux_io_uring=""
 cap_ng=""
 attr=""
 libattr=""
@@ -1250,6 +1251,10 @@ for opt do
   ;;
   --enable-linux-aio) linux_aio="yes"
   ;;
+  --disable-linux-io-uring) linux_io_uring="no"
+  ;;
+  --enable-linux-io-uring) linux_io_uring="yes"
+  ;;
   --disable-attr) attr="no"
   ;;
   --enable-attr) attr="yes"
@@ -1760,6 +1765,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   vde support for vde network
   netmap  support for netmap network
   linux-aio   Linux AIO support
+  linux-io-uring  Linux io_uring support
   cap-ng  libcap-ng support
   attrattr and xattr support
   vhost-net   vhost-net kernel acceleration support
@@ -3950,6 +3956,21 @@ EOF
 linux_aio=no
   fi
 fi
+##
+# linux-io-uring probe
+
+if test "$linux_io_uring" != "no" ; then
+  if $pkg_config liburing; then
+linux_io_uring_cflags=$($pkg_config --cflags liburing)
+linux_io_uring_libs=$($pkg_config --libs liburing)
+linux_io_uring=yes
+  else
+if test "$linux_io_uring" = "yes" ; then
+  feature_not_found "linux io_uring" "Install liburing devel"
+fi
+linux_io_uring=no
+  fi
+fi
 
 ##
 # TPM emulation is only on POSIX
@@ -6356,6 +6377,7 @@ echo "PIE   $pie"
 echo "vde support   $vde"
 echo "netmap support$netmap"
 echo "Linux AIO support $linux_aio"
+echo "Linux io_uring support $linux_io_uring"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
@@ -6844,6 +6866,11 @@ fi
 if test "$linux_aio" = "yes" ; then
   echo "CONFIG_LINUX_AIO=y" >> $config_host_mak
 fi
+if test "$linux_io_uring" = "yes" ; then
+  echo "CONFIG_LINUX_IO_URING=y" >> $config_host_mak
+  echo "LINUX_IO_URING_CFLAGS=$linux_io_uring_cflags" >> $config_host_mak
+  echo "LINUX_IO_URING_LIBS=$linux_io_uring_libs" >> $config_host_mak
+fi
 if test "$attr" = "yes" ; then
   echo "CONFIG_ATTR=y" >> $config_host_mak
 fi
-- 
2.21.0




[PATCH v2 09/15] block: add trace events for io_uring

2019-10-25 Thread Stefan Hajnoczi
From: Aarushi Mehta 

Signed-off-by: Aarushi Mehta 
Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c   | 21 ++---
 block/trace-events | 12 
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 307c4c5823..a5c0d16220 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -17,6 +17,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "trace.h"
 
 /* io_uring ring size */
 #define MAX_ENTRIES 128
@@ -85,6 +86,8 @@ static void luring_resubmit_short_read(LuringState *s, 
LuringAIOCB *luringcb,
 QEMUIOVector *resubmit_qiov;
 size_t remaining;
 
+trace_luring_resubmit_short_read(s, luringcb, nread);
+
 /* Update read position */
 luringcb->total_read = nread;
 remaining = luringcb->qiov->size - luringcb->total_read;
@@ -156,6 +159,7 @@ static void luring_process_completions(LuringState *s)
 
 /* Change counters one-by-one because we can be nested. */
 s->io_q.in_flight--;
+trace_luring_process_completion(s, luringcb, ret);
 
 /* total_read is non-zero only for resubmitted read requests */
 total_bytes = ret + luringcb->total_read;
@@ -224,6 +228,7 @@ static int ioq_submit(LuringState *s)
 QSIMPLEQ_REMOVE_HEAD(&s->io_q.submit_queue, next);
 }
 ret = io_uring_submit(&s->ring);
+trace_luring_io_uring_submit(s, ret);
 /* Prevent infinite loop if submission is refused */
 if (ret <= 0) {
 if (ret == -EAGAIN) {
@@ -280,12 +285,15 @@ static void ioq_init(LuringQueue *io_q)
 
 void luring_io_plug(BlockDriverState *bs, LuringState *s)
 {
+trace_luring_io_plug(s);
 s->io_q.plugged++;
 }
 
 void luring_io_unplug(BlockDriverState *bs, LuringState *s)
 {
 assert(s->io_q.plugged);
+trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
+   s->io_q.in_queue, s->io_q.in_flight);
 if (--s->io_q.plugged == 0 &&
 !s->io_q.blocked && s->io_q.in_queue > 0) {
 ioq_submit(s);
@@ -306,6 +314,7 @@ void luring_io_unplug(BlockDriverState *bs, LuringState *s)
 static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 uint64_t offset, int type)
 {
+int ret;
 struct io_uring_sqe *sqes = &luringcb->sqeq;
 
 switch (type) {
@@ -329,11 +338,14 @@ static int luring_do_submit(int fd, LuringAIOCB 
*luringcb, LuringState *s,
 
 QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
 s->io_q.in_queue++;
-
+trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
+   s->io_q.in_queue, s->io_q.in_flight);
 if (!s->io_q.blocked &&
 (!s->io_q.plugged ||
  s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-return ioq_submit(s);
+ret = ioq_submit(s);
+trace_luring_do_submit_done(s, ret);
+return ret;
 }
 return 0;
 }
@@ -348,8 +360,10 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, 
LuringState *s, int fd,
 .qiov   = qiov,
 .is_read= (type == QEMU_AIO_READ),
 };
-
+trace_luring_co_submit(bs, s, &luringcb, fd, offset, qiov ? qiov->size : 0,
+   type);
 ret = luring_do_submit(fd, &luringcb, s, offset, type);
+
 if (ret < 0) {
 return ret;
 }
@@ -400,4 +414,5 @@ void luring_cleanup(LuringState *s)
 {
 io_uring_queue_exit(&s->ring);
 g_free(s);
+trace_luring_cleanup_state(s);
 }
diff --git a/block/trace-events b/block/trace-events
index b8d70f5242..c8e1647e90 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -63,6 +63,18 @@ qmp_block_stream(void *bs) "bs %p"
 file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) 
"acb %p opaque %p offset %"PRId64" count %d type %d"
 file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t 
dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset 
%"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
 
+#io_uring.c
+luring_init_state(void *s, size_t size) "s %p size %zu"
+luring_cleanup_state(void *s) "%p freed"
+luring_io_plug(void *s) "LuringState %p plug"
+luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) 
"LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) 
"LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
+luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, 
size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " 
nbytes %zd type %d"
+luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p 
luringcb %p ret %d"
+luring_io_uring_submit(void *s, int ret) "LuringState %p ret %d"
+luring_resubmit_

[PATCH v2 00/15] io_uring: add Linux io_uring AIO engine

2019-10-25 Thread Stefan Hajnoczi
v11:
 * Drop fd registration because it breaks QEMU's file locking and will need to
   be resolved in a separate patch series
 * Drop line-wrapping changes that accidentally broke several qemu-iotests

v10:
 * Dropped kernel submission queue polling, it requires root and has additional
   limitations.  It should be benchmarked and considered for inclusion later,
   maybe even together with kernel side changes.
 * Add io_uring_register_files() return value to trace_luring_fd_register()
 * Fix indentation in luring_fd_unregister()
 * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
 * Simplify fd registration code
 * Add luring_fd_unregister() and call it from file-posix.c to prevent
   fd leaks
 * Add trace_luring_fd_unregister() trace event
 * Add missing space to qemu-img command-line documentation
 * Update MAINTAINERS file [Julia]
 * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
 * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
 * Declare variables at the beginning of the block in luring_init() [Julia]

This patch series is based on Aarushi Mehta's v9 patch series written for
Google Summer of Code 2019:

  https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html

It adds a new AIO engine that uses the new Linux io_uring API.  This is the
successor to Linux AIO with a number of improvements:
1. Both O_DIRECT and buffered I/O work
2. fdatasync(2) is supported (no need for a separate thread pool!)
3. True async behavior so the syscall doesn't block (Linux AIO got there to 
some degree...)
4. Advanced performance optimizations are available (file registration, memory
   buffer registration, completion polling, submission polling).

Since Aarushi has been busy, I have taken up this patch series.  Booting a
guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none with a
raw file on XFS.

I currently recommend using -drive aio=io_uring only with host block devices
(like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when using
image files on ext4 or XFS.

Aarushi Mehta (15):
  configure: permit use of io_uring
  qapi/block-core: add option for io_uring
  block/block: add BDRV flag for io_uring
  block/io_uring: implements interfaces for io_uring
  stubs: add stubs for io_uring interface
  util/async: add aio interfaces for io_uring
  blockdev: adds bdrv_parse_aio to use io_uring
  block/file-posix.c: extend to use io_uring
  block: add trace events for io_uring
  block/io_uring: adds userspace completion polling
  qemu-io: adds option to use aio engine
  qemu-img: adds option to use aio engine for benchmarking
  qemu-nbd: adds option for aio engines
  tests/qemu-iotests: enable testing with aio options
  tests/qemu-iotests: use AIOMODE with various tests

 MAINTAINERS   |   9 +
 qapi/block-core.json  |   4 +-
 configure |  27 +++
 block/Makefile.objs   |   3 +
 stubs/Makefile.objs   |   1 +
 include/block/aio.h   |  16 +-
 include/block/block.h |   2 +
 include/block/raw-aio.h   |  12 +
 block.c   |  22 ++
 block/file-posix.c|  99 ++--
 block/io_uring.c  | 433 ++
 blockdev.c|  12 +-
 qemu-img.c|  11 +-
 qemu-io.c |  25 +-
 qemu-nbd.c|  12 +-
 stubs/io_uring.c  |  32 +++
 util/async.c  |  36 +++
 block/trace-events|  12 +
 qemu-img-cmds.hx  |   4 +-
 qemu-img.texi |   5 +-
 qemu-nbd.texi |   4 +-
 tests/qemu-iotests/028|   2 +-
 tests/qemu-iotests/058|   2 +-
 tests/qemu-iotests/089|   4 +-
 tests/qemu-iotests/091|   4 +-
 tests/qemu-iotests/109|   2 +-
 tests/qemu-iotests/147|   5 +-
 tests/qemu-iotests/181|   8 +-
 tests/qemu-iotests/183|   4 +-
 tests/qemu-iotests/185|  10 +-
 tests/qemu-iotests/200|   2 +-
 tests/qemu-iotests/201|   8 +-
 tests/qemu-iotests/check  |  15 +-
 tests/qemu-iotests/common.rc  |  14 ++
 tests/qemu-iotests/iotests.py |  12 +-
 35 files changed, 797 insertions(+), 76 deletions(-)
 create mode 100644 block/io_uring.c
 create mode 100644 stubs/io_uring.c

-- 
2.21.0




Re: [PATCH] qemu-iotests: restrict 264 to qcow2 only

2019-10-25 Thread Eric Blake

On 10/25/19 9:50 AM, Vladimir Sementsov-Ogievskiy wrote:

264 is unprepared to run with different formats, for example luks needs
handling keys, cloop doesn't support image creation, vpc creates image
larger than requested (which breaks "Backup completed: 5242880" in test
output).

The test is here to check nbd-reconnect feature and we actually don't
need it for all formats. Let's restrict it to qcow2 only.

Reported-by: Max Reitz 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  tests/qemu-iotests/264 | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Eric Blake 

I will queue through my NBD tree; it may miss soft freeze (due to my 
travel schedule for KVM Forum), but is definite 4.2 material, so it will 
be in by rc1.




diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index c8cd97ae2b..131366422b 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -24,6 +24,8 @@ import iotests
  from iotests import qemu_img_create, qemu_io_silent_check, file_path, \
  qemu_nbd_popen, log
  
+iotests.verify_image_format(supported_fmts=['qcow2'])

+
  disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
  nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
  size = 5 * 1024 * 1024



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: Early 4.1.1 release for image corruption fixes?

2019-10-25 Thread Daniel P . Berrangé
On Fri, Oct 25, 2019 at 04:29:15PM +0200, Kevin Wolf wrote:
> Hi,
> 
> the QEMU 4.1.0 release has two problems that can easily cause image
> corruption with any qcow2 images (no special configuration needed to
> trigger the bugs):
> 
> 1. A locking bug in the qcow2 code. I just sent a pull request that
>includes the fix for this. The important patch there is:
> 
>'qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()'
> 
> 2. A kernel bug in the XFS driver that became visible by new I/O
>patterns the qcow2 implementation started to use in 4.1. Max is
>currently working on a workaround for this.
> 
> The Planning/4.1 wiki page tells me that a 4.1.1 release is planned for
> end of November, which isn't too far, but I was wondering if want to
> have a stable release even earlier, right after the fixes for both
> problems are in.

IMHO data corruption for our primary image format is an excellent
reason to do a stable release at the soonest viable opportunity.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Peter Maydell
On Fri, 25 Oct 2019 at 15:21, Max Reitz  wrote:
>
> On 25.10.19 16:17, Peter Maydell wrote:
> > On Fri, 25 Oct 2019 at 15:16, Max Reitz  wrote:
> >> I’ve created an RH BZ here:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1765547
> >
> > Currently "You are not authorized to access bug #1765547." for
> > anonymous browsers, just fyi.
>
> Err.  Oops.  That wasn’t my intention.  I hate that misfeature.
>
> Thanks for telling me, I fixed it. :-)

Yeah, that's a really well written bug report, you definitely
don't want to hide it away :-) Thanks for making it public.

-- PMM



[PATCH] qemu-iotests: restrict 264 to qcow2 only

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
264 is unprepared to run with different formats, for example luks needs
handling keys, cloop doesn't support image creation, vpc creates image
larger than requested (which breaks "Backup completed: 5242880" in test
output).

The test is here to check nbd-reconnect feature and we actually don't
need it for all formats. Let's restrict it to qcow2 only.

Reported-by: Max Reitz 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/264 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qemu-iotests/264 b/tests/qemu-iotests/264
index c8cd97ae2b..131366422b 100755
--- a/tests/qemu-iotests/264
+++ b/tests/qemu-iotests/264
@@ -24,6 +24,8 @@ import iotests
 from iotests import qemu_img_create, qemu_io_silent_check, file_path, \
 qemu_nbd_popen, log
 
+iotests.verify_image_format(supported_fmts=['qcow2'])
+
 disk_a, disk_b, nbd_sock = file_path('disk_a', 'disk_b', 'nbd-sock')
 nbd_uri = 'nbd+unix:///?socket=' + nbd_sock
 size = 5 * 1024 * 1024
-- 
2.21.0




Re: [PULL 3/3] iotests: test nbd reconnect

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 17:28, Max Reitz wrote:
> On 23.10.19 04:01, Eric Blake wrote:
>> From: Vladimir Sementsov-Ogievskiy 
>>
>> Add test, which starts backup to nbd target and restarts nbd server
>> during backup.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Message-Id: <20191009084158.15614-4-vsement...@virtuozzo.com>
>> Reviewed-by: Eric Blake 
>> Signed-off-by: Eric Blake 
>> ---
>>   tests/qemu-iotests/264| 95 +++
>>   tests/qemu-iotests/264.out| 13 +
>>   tests/qemu-iotests/group  |  1 +
>>   tests/qemu-iotests/iotests.py | 11 
>>   4 files changed, 120 insertions(+)
>>   create mode 100755 tests/qemu-iotests/264
>>   create mode 100644 tests/qemu-iotests/264.out
> 
> I suppose this test should limit the supported image formats.  For me,
> it fails for at least LUKS, cloop, and vpc.  (Due to different reasons
> for each format.)
> 

Sorry for this. I should finally remember to (almost) always restrict my tests 
to qcow2.
Patch sent.


-- 
Best regards,
Vladimir


Re: Early 4.1.1 release for image corruption fixes?

2019-10-25 Thread Max Reitz
On 25.10.19 16:29, Kevin Wolf wrote:
> Hi,
> 
> the QEMU 4.1.0 release has two problems that can easily cause image
> corruption with any qcow2 images (no special configuration needed to
> trigger the bugs):

Three, actually.  There’s also b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2
which fixed 50ba5b2d994853b38fed10e0841b119da0f8b8e5 that was part of
4.1.0, too.

> 1. A locking bug in the qcow2 code. I just sent a pull request that
>includes the fix for this. The important patch there is:
> 
>'qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()'
> 
> 2. A kernel bug in the XFS driver that became visible by new I/O
>patterns the qcow2 implementation started to use in 4.1. Max is
>currently working on a workaround for this.

You most likely won’t see the kernel bug on 4.1.0, because 50ba5b2d99 is
basically the same bug, but in qemu...

(I don’t know if it has the same technical reason, but it causes the
same symptoms.)

Max

> The Planning/4.1 wiki page tells me that a 4.1.1 release is planned for
> end of November, which isn't too far, but I was wondering if want to
> have a stable release even earlier, right after the fixes for both
> problems are in.
> 
> Peter, if we haven't made a decision until then, maybe discussing this
> is an agenda item for QEMU Summit, too.
> 
> Kevin
> 




signature.asc
Description: OpenPGP digital signature


Re: [PULL 0/7] Block layer patches

2019-10-25 Thread Peter Maydell
On Fri, 25 Oct 2019 at 14:46, Kevin Wolf  wrote:
>
> The following changes since commit 7bc8f9734213b76e76631a483be13d6737c2adbc:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20191025' into staging (2019-10-25 
> 13:12:16 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 5e9785505210e2477e590e61b1ab100d0ec22b01:
>
>   qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() 
> (2019-10-25 15:18:55 +0200)
>
> 
> Block layer patches:
>
> - qcow2: Fix data corruption bug that is triggered in partial cluster
>   allocation with default options
> - qapi: add support for blkreplay driver
> - doc: Describe missing generic -blockdev options
> - iotests: Fix 118 when run as root
> - Minor code cleanups
>
> 



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



Re: [PATCH v14 1/9] esp: add pseudo-DMA as used by Macintosh

2019-10-25 Thread Philippe Mathieu-Daudé

On 10/25/19 4:01 PM, Paolo Bonzini wrote:

On 22/10/19 13:17, Laurent Vivier wrote:

+if (s->dma_memory_read) {
+s->dma_memory_read(s->dma_opaque, &s->cmdbuf[s->cmdlen], len);
+} else {
+set_pdma(s, CMD, s->cmdlen, len);
+s->pdma_cb = do_dma_pdma_cb;
+esp_raise_drq(s);
+return;
+}
+trace_esp_handle_ti_cmd(s->cmdlen);
+s->ti_size = 0;
+s->cmdlen = 0;
+s->do_cmd = 0;
+do_cmd(s, s->cmdbuf);
  return;


Can you explain these lines after s->dma_memory_read?  I suppose they
are related to


-}
-if (s->do_cmd) {
+} else if (s->do_cmd) {


If so, it would be nice to make those a separate patch.  Otherwise seems
okay.


Third reviewer asking, so it seems worthwhile.



Re: [PATCH v14 1/9] esp: add pseudo-DMA as used by Macintosh

2019-10-25 Thread Laurent Vivier
Le 25/10/2019 à 16:54, Philippe Mathieu-Daudé a écrit :
> On 10/25/19 4:01 PM, Paolo Bonzini wrote:
>> On 22/10/19 13:17, Laurent Vivier wrote:
>>> +    if (s->dma_memory_read) {
>>> +    s->dma_memory_read(s->dma_opaque, &s->cmdbuf[s->cmdlen],
>>> len);
>>> +    } else {
>>> +    set_pdma(s, CMD, s->cmdlen, len);
>>> +    s->pdma_cb = do_dma_pdma_cb;
>>> +    esp_raise_drq(s);
>>> +    return;
>>> +    }
>>> +    trace_esp_handle_ti_cmd(s->cmdlen);
>>> +    s->ti_size = 0;
>>> +    s->cmdlen = 0;
>>> +    s->do_cmd = 0;
>>> +    do_cmd(s, s->cmdbuf);
>>>   return;
>>
>> Can you explain these lines after s->dma_memory_read?  I suppose they
>> are related to
>>
>>> -    }
>>> -    if (s->do_cmd) {
>>> +    } else if (s->do_cmd) {
>>
>> If so, it would be nice to make those a separate patch.  Otherwise seems
>> okay.
> 
> Third reviewer asking, so it seems worthwhile.

It seems, yes :)

I'm going to try to make a separate patch.

Thank you to all of you.

Laurent



Re: [PULL 3/3] iotests: test nbd reconnect

2019-10-25 Thread Max Reitz
On 23.10.19 04:01, Eric Blake wrote:
> From: Vladimir Sementsov-Ogievskiy 
> 
> Add test, which starts backup to nbd target and restarts nbd server
> during backup.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Message-Id: <20191009084158.15614-4-vsement...@virtuozzo.com>
> Reviewed-by: Eric Blake 
> Signed-off-by: Eric Blake 
> ---
>  tests/qemu-iotests/264| 95 +++
>  tests/qemu-iotests/264.out| 13 +
>  tests/qemu-iotests/group  |  1 +
>  tests/qemu-iotests/iotests.py | 11 
>  4 files changed, 120 insertions(+)
>  create mode 100755 tests/qemu-iotests/264
>  create mode 100644 tests/qemu-iotests/264.out

I suppose this test should limit the supported image formats.  For me,
it fails for at least LUKS, cloop, and vpc.  (Due to different reasons
for each format.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Kevin Wolf
Am 25.10.2019 um 16:19 hat Max Reitz geschrieben:
> On 25.10.19 15:56, Vladimir Sementsov-Ogievskiy wrote:
> > 25.10.2019 16:40, Vladimir Sementsov-Ogievskiy wrote:
> >> 25.10.2019 12:58, Max Reitz wrote:
> >>> Hi,
> >>>
> >>> It seems to me that there is a bug in Linux’s XFS kernel driver, as
> >>> I’ve explained here:
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
> >>>
> >>> In combination with our commit c8bb23cbdbe32f, this may lead to guest
> >>> data corruption when using qcow2 images on XFS with aio=native.
> >>>
> >>> We can’t wait until the XFS kernel driver is fixed, we should work
> >>> around the problem ourselves.
> >>>
> >>> This is an RFC for two reasons:
> >>> (1) I don’t know whether this is the right way to address the issue,
> >>> (2) Ideally, we should detect whether the XFS kernel driver is fixed and
> >>>  if so stop applying the workaround.
> >>>  I don’t know how we would go about this, so this series doesn’t do
> >>>  it.  (Hence it’s an RFC.)
> >>> (3) Perhaps it’s a bit of a layering violation to let the file-posix
> >>>  driver access and modify a BdrvTrackedRequest object.
> >>>
> >>> As for how we can address the issue, I see three ways:
> >>> (1) The one presented in this series: On XFS with aio=native, we extend
> >>>  tracked requests for post-EOF fallocate() calls (i.e., write-zero
> >>>  operations) to reach until infinity (INT64_MAX in practice), mark
> >>>  them serializing and wait for other conflicting requests.
> >>>
> >>>  Advantages:
> >>>  + Limits the impact to very specific cases
> >>>    (And that means it wouldn’t hurt too much to keep this workaround
> >>>    even when the XFS driver has been fixed)
> >>>  + Works around the bug where it happens, namely in file-posix
> >>>
> >>>  Disadvantages:
> >>>  - A bit complex
> >>>  - A bit of a layering violation (should file-posix have access to
> >>>    tracked requests?)
> >>>
> >>> (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
> >>>  becomes visible due to that function: I don’t think qcow2 writes
> >>>  zeroes in any other I/O path, and raw images are fixed in size so
> >>>  post-EOF writes won’t happen.
> >>>
> >>>  Advantages:
> >>>  + Maybe simpler, depending on how difficult it is to handle the
> >>>    layering violation
> >>>  + Also fixes the performance problem of handle_alloc_space() being
> >>>    slow on ppc64+XFS.
> >>>
> >>>  Disadvantages:
> >>>  - Huge layering violation because qcow2 would need to know whether
> >>>    the image is stored on XFS or not.
> >>>  - We’d definitely want to skip this workaround when the XFS driver
> >>>    has been fixed, so we need some method to find out whether it has
> >>>
> >>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
> >>>  To my knowledge I’m the only one who has provided any benchmarks for
> >>>  this commit, and even then I was a bit skeptical because it performs
> >>>  well in some cases and bad in others.  I concluded that it’s
> >>>  probably worth it because the “some cases” are more likely to occur.
> >>>
> >>>  Now we have this problem of corruption here (granted due to a bug in
> >>>  the XFS driver), and another report of massively degraded
> >>>  performance on ppc64
> >>>  (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
> >>>  private BZ; I hate that :-/  The report is about 40 % worse
> >>>  performance for an in-guest fio write benchmark.)
> >>>
> >>>  So I have to ask the question about what the justification for
> >>>  keeping c8bb23cbdbe32f is.  How much does performance increase with
> >>>  it actually?  (On non-(ppc64+XFS) machines, obviously)
> >>>
> >>>  Advantages:
> >>>  + Trivial
> >>>  + No layering violations
> >>>  + We wouldn’t need to keep track of whether the kernel bug has been
> >>>    fixed or not
> >>>  + Fixes the ppc64+XFS performance problem
> >>>
> >>>  Disadvantages:
> >>>  - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
> >>>    levels, whatever that means
> >>>
> >>> So this is the main reason this is an RFC: What should we do?  Is (1)
> >>> really the best choice?
> >>>
> >>>
> >>> In any case, I’ve ran the test case I showed in
> >>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
> >>> more than ten times with this series applied and the installation
> >>> succeeded every time.  (Without this series, it fails like every other
> >>> time.)
> >>>
> >>>
> >>
> >> Hi!
> >>
> >> First, great thanks for your investigation!
> >>
> >> We need c8bb23cbdbe3 patch, because we use 1M clusters, and zeroing 1M is 
> >> significant
> >> in time.
> >>
> >> I've tested a bit:
> >>
> >> test:
> >> for img in /ssd/test.img /test.img; do for cl in 64K 1M; do for step in 4K 
> >> 64K 1M; do ./qemu-img creat

Re: [PATCH v2 0/2] qcow2: Fix image corruption bug in 4.1

2019-10-25 Thread Michael Weiser
Hello Kevin,

On Thu, Oct 24, 2019 at 04:26:56PM +0200, Kevin Wolf wrote:

> Kevin Wolf (2):
>   coroutine: Add qemu_co_mutex_assert_locked()
>   qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()

Tested-by: Michael Weiser 

with offending 69f47505e and today's master
(58560ad254fbda71d4daa6622d71683190070ee2).

Corruption does not happen with series applied.
Assertion tiggers as expected if lock is not taken.

FWIW: Reviewed-by: Michael Weiser 
-- 
Thanks,
Michael



Early 4.1.1 release for image corruption fixes?

2019-10-25 Thread Kevin Wolf
Hi,

the QEMU 4.1.0 release has two problems that can easily cause image
corruption with any qcow2 images (no special configuration needed to
trigger the bugs):

1. A locking bug in the qcow2 code. I just sent a pull request that
   includes the fix for this. The important patch there is:

   'qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()'

2. A kernel bug in the XFS driver that became visible by new I/O
   patterns the qcow2 implementation started to use in 4.1. Max is
   currently working on a workaround for this.

The Planning/4.1 wiki page tells me that a 4.1.1 release is planned for
end of November, which isn't too far, but I was wondering if want to
have a stable release even earlier, right after the fixes for both
problems are in.

Peter, if we haven't made a decision until then, maybe discussing this
is an agenda item for QEMU Summit, too.

Kevin




Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 17:19, Max Reitz wrote:
> On 25.10.19 15:56, Vladimir Sementsov-Ogievskiy wrote:
>> 25.10.2019 16:40, Vladimir Sementsov-Ogievskiy wrote:
>>> 25.10.2019 12:58, Max Reitz wrote:
 Hi,

 It seems to me that there is a bug in Linux’s XFS kernel driver, as
 I’ve explained here:

 https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html

 In combination with our commit c8bb23cbdbe32f, this may lead to guest
 data corruption when using qcow2 images on XFS with aio=native.

 We can’t wait until the XFS kernel driver is fixed, we should work
 around the problem ourselves.

 This is an RFC for two reasons:
 (1) I don’t know whether this is the right way to address the issue,
 (2) Ideally, we should detect whether the XFS kernel driver is fixed and
   if so stop applying the workaround.
   I don’t know how we would go about this, so this series doesn’t do
   it.  (Hence it’s an RFC.)
 (3) Perhaps it’s a bit of a layering violation to let the file-posix
   driver access and modify a BdrvTrackedRequest object.

 As for how we can address the issue, I see three ways:
 (1) The one presented in this series: On XFS with aio=native, we extend
   tracked requests for post-EOF fallocate() calls (i.e., write-zero
   operations) to reach until infinity (INT64_MAX in practice), mark
   them serializing and wait for other conflicting requests.

   Advantages:
   + Limits the impact to very specific cases
     (And that means it wouldn’t hurt too much to keep this workaround
     even when the XFS driver has been fixed)
   + Works around the bug where it happens, namely in file-posix

   Disadvantages:
   - A bit complex
   - A bit of a layering violation (should file-posix have access to
     tracked requests?)

 (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
   becomes visible due to that function: I don’t think qcow2 writes
   zeroes in any other I/O path, and raw images are fixed in size so
   post-EOF writes won’t happen.

   Advantages:
   + Maybe simpler, depending on how difficult it is to handle the
     layering violation
   + Also fixes the performance problem of handle_alloc_space() being
     slow on ppc64+XFS.

   Disadvantages:
   - Huge layering violation because qcow2 would need to know whether
     the image is stored on XFS or not.
   - We’d definitely want to skip this workaround when the XFS driver
     has been fixed, so we need some method to find out whether it has

 (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
   To my knowledge I’m the only one who has provided any benchmarks for
   this commit, and even then I was a bit skeptical because it performs
   well in some cases and bad in others.  I concluded that it’s
   probably worth it because the “some cases” are more likely to occur.

   Now we have this problem of corruption here (granted due to a bug in
   the XFS driver), and another report of massively degraded
   performance on ppc64
   (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
   private BZ; I hate that :-/  The report is about 40 % worse
   performance for an in-guest fio write benchmark.)

   So I have to ask the question about what the justification for
   keeping c8bb23cbdbe32f is.  How much does performance increase with
   it actually?  (On non-(ppc64+XFS) machines, obviously)

   Advantages:
   + Trivial
   + No layering violations
   + We wouldn’t need to keep track of whether the kernel bug has been
     fixed or not
   + Fixes the ppc64+XFS performance problem

   Disadvantages:
   - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
     levels, whatever that means

 So this is the main reason this is an RFC: What should we do?  Is (1)
 really the best choice?


 In any case, I’ve ran the test case I showed in
 https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
 more than ten times with this series applied and the installation
 succeeded every time.  (Without this series, it fails like every other
 time.)


>>>
>>> Hi!
>>>
>>> First, great thanks for your investigation!
>>>
>>> We need c8bb23cbdbe3 patch, because we use 1M clusters, and zeroing 1M is 
>>> significant
>>> in time.
>>>
>>> I've tested a bit:
>>>
>>> test:
>>> for img in /ssd/test.img /test.img; do for cl in 64K 1M; do for step in 4K 
>>> 64K 1M; do ./qemu-img create -f qcow2 -o cluster_size=$cl $img 15G > 
>>> /dev/null; printf '%-15s%-7s

Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Max Reitz
On 25.10.19 16:17, Peter Maydell wrote:
> On Fri, 25 Oct 2019 at 15:16, Max Reitz  wrote:
>>
>> On 25.10.19 15:46, Peter Maydell wrote:
>>> On Fri, 25 Oct 2019 at 11:09, Max Reitz  wrote:

 Hi,

 It seems to me that there is a bug in Linux’s XFS kernel driver, as
 I’ve explained here:

 https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html

 In combination with our commit c8bb23cbdbe32f, this may lead to guest
 data corruption when using qcow2 images on XFS with aio=native.
>>>
>>> Have we reported that upstream to the xfs folks?
>>
>> I’ve created an RH BZ here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1765547
> 
> Currently "You are not authorized to access bug #1765547." for
> anonymous browsers, just fyi.

Err.  Oops.  That wasn’t my intention.  I hate that misfeature.

Thanks for telling me, I fixed it. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Max Reitz
On 25.10.19 15:56, Vladimir Sementsov-Ogievskiy wrote:
> 25.10.2019 16:40, Vladimir Sementsov-Ogievskiy wrote:
>> 25.10.2019 12:58, Max Reitz wrote:
>>> Hi,
>>>
>>> It seems to me that there is a bug in Linux’s XFS kernel driver, as
>>> I’ve explained here:
>>>
>>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
>>>
>>> In combination with our commit c8bb23cbdbe32f, this may lead to guest
>>> data corruption when using qcow2 images on XFS with aio=native.
>>>
>>> We can’t wait until the XFS kernel driver is fixed, we should work
>>> around the problem ourselves.
>>>
>>> This is an RFC for two reasons:
>>> (1) I don’t know whether this is the right way to address the issue,
>>> (2) Ideally, we should detect whether the XFS kernel driver is fixed and
>>>  if so stop applying the workaround.
>>>  I don’t know how we would go about this, so this series doesn’t do
>>>  it.  (Hence it’s an RFC.)
>>> (3) Perhaps it’s a bit of a layering violation to let the file-posix
>>>  driver access and modify a BdrvTrackedRequest object.
>>>
>>> As for how we can address the issue, I see three ways:
>>> (1) The one presented in this series: On XFS with aio=native, we extend
>>>  tracked requests for post-EOF fallocate() calls (i.e., write-zero
>>>  operations) to reach until infinity (INT64_MAX in practice), mark
>>>  them serializing and wait for other conflicting requests.
>>>
>>>  Advantages:
>>>  + Limits the impact to very specific cases
>>>    (And that means it wouldn’t hurt too much to keep this workaround
>>>    even when the XFS driver has been fixed)
>>>  + Works around the bug where it happens, namely in file-posix
>>>
>>>  Disadvantages:
>>>  - A bit complex
>>>  - A bit of a layering violation (should file-posix have access to
>>>    tracked requests?)
>>>
>>> (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
>>>  becomes visible due to that function: I don’t think qcow2 writes
>>>  zeroes in any other I/O path, and raw images are fixed in size so
>>>  post-EOF writes won’t happen.
>>>
>>>  Advantages:
>>>  + Maybe simpler, depending on how difficult it is to handle the
>>>    layering violation
>>>  + Also fixes the performance problem of handle_alloc_space() being
>>>    slow on ppc64+XFS.
>>>
>>>  Disadvantages:
>>>  - Huge layering violation because qcow2 would need to know whether
>>>    the image is stored on XFS or not.
>>>  - We’d definitely want to skip this workaround when the XFS driver
>>>    has been fixed, so we need some method to find out whether it has
>>>
>>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>>>  To my knowledge I’m the only one who has provided any benchmarks for
>>>  this commit, and even then I was a bit skeptical because it performs
>>>  well in some cases and bad in others.  I concluded that it’s
>>>  probably worth it because the “some cases” are more likely to occur.
>>>
>>>  Now we have this problem of corruption here (granted due to a bug in
>>>  the XFS driver), and another report of massively degraded
>>>  performance on ppc64
>>>  (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>>>  private BZ; I hate that :-/  The report is about 40 % worse
>>>  performance for an in-guest fio write benchmark.)
>>>
>>>  So I have to ask the question about what the justification for
>>>  keeping c8bb23cbdbe32f is.  How much does performance increase with
>>>  it actually?  (On non-(ppc64+XFS) machines, obviously)
>>>
>>>  Advantages:
>>>  + Trivial
>>>  + No layering violations
>>>  + We wouldn’t need to keep track of whether the kernel bug has been
>>>    fixed or not
>>>  + Fixes the ppc64+XFS performance problem
>>>
>>>  Disadvantages:
>>>  - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>>>    levels, whatever that means
>>>
>>> So this is the main reason this is an RFC: What should we do?  Is (1)
>>> really the best choice?
>>>
>>>
>>> In any case, I’ve ran the test case I showed in
>>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
>>> more than ten times with this series applied and the installation
>>> succeeded every time.  (Without this series, it fails like every other
>>> time.)
>>>
>>>
>>
>> Hi!
>>
>> First, great thanks for your investigation!
>>
>> We need c8bb23cbdbe3 patch, because we use 1M clusters, and zeroing 1M is 
>> significant
>> in time.
>>
>> I've tested a bit:
>>
>> test:
>> for img in /ssd/test.img /test.img; do for cl in 64K 1M; do for step in 4K 
>> 64K 1M; do ./qemu-img create -f qcow2 -o cluster_size=$cl $img 15G > 
>> /dev/null; printf '%-15s%-7s%-10s : ' $img cl=$cl step=$step; ./qemu-img 
>> bench -c $((15 * 1024)) -n -s 4K -S $step -t none -w $img | tail -1 | awk 
>> '{print $4}'; done; done; done
>>
>> on master:
>>
>> /ssd/test.img  cl=6

Re: [PATCH 2/3] qcow2: Assert that qcow2_cache_get() callers hold s->lock

2019-10-25 Thread Michael Weiser
Hi Kevin,

On Wed, Oct 23, 2019 at 05:37:49PM +0200, Kevin Wolf wrote:

> > qcow2_cache_do_get() requires that s->lock is locked because it can
> > yield between picking a cache entry and actually taking ownership of it
> > by setting offset and increasing the reference count.
> > 
> > Add an assertion to make sure the caller really holds the lock. The
> > function can be called outside of coroutine context, where bdrv_pread
> > and flushes become synchronous operations. The lock cannot and need not
> > be taken in this case.
> I'm still running tests to see if any other code paths trigger the
> assertion, but image creation calls this without the lock held (which is
> harmless because nobody else knows about the image so there won't be
> concurrent requests). The following patch is needed additionally to make
> image creation work with the new assertion.

I can confirm that with all four patches corruption does no longer
occur as of commit 69f47505ee66afaa513305de0c1895a224e52c45. Removing
only 3/3 (qcow2: Fix corruption bug in
qcow2_detect_metadata_preallocation()) the assertion triggers after a
few seconds, leaving behind a few leaked clusters but no errors in the
image.

(qemu) qemu-system-x86_64:qemu/include/qemu/coroutine.h:175:
qemu_co_mutex_assert_locked: Assertion `mutex->locked && mutex->holder
== qemu_coroutine_self()' failed.
Aborted (core dumped)

$ qemu-img check qtest.qcow2 
Leaked cluster 169257 refcount=3 reference=2
Leaked cluster 172001 refcount=1 reference=0
Leaked cluster 172002 refcount=1 reference=0
Leaked cluster 172003 refcount=1 reference=0
Leaked cluster 172004 refcount=1 reference=0
Leaked cluster 172005 refcount=1 reference=0
Leaked cluster 172006 refcount=1 reference=0
Leaked cluster 172007 refcount=1 reference=0
Leaked cluster 172008 refcount=1 reference=0
Leaked cluster 172009 refcount=1 reference=0
Leaked cluster 172010 refcount=1 reference=0
Leaked cluster 172011 refcount=1 reference=0
Leaked cluster 172012 refcount=1 reference=0

13 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
255525/327680 = 77.98% allocated, 3.22% fragmented, 0.00% compressed
clusters
Image end offset: 17106403328

I was going to test with master as well but got overtaken by v2. Will
move on to test v2 now. :)

Series:
Tested-by: Michael Weiser 

No biggie but if there's a chance could you switch my address to the
above?
-- 
Thanks,
Michael



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Peter Maydell
On Fri, 25 Oct 2019 at 15:16, Max Reitz  wrote:
>
> On 25.10.19 15:46, Peter Maydell wrote:
> > On Fri, 25 Oct 2019 at 11:09, Max Reitz  wrote:
> >>
> >> Hi,
> >>
> >> It seems to me that there is a bug in Linux’s XFS kernel driver, as
> >> I’ve explained here:
> >>
> >> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
> >>
> >> In combination with our commit c8bb23cbdbe32f, this may lead to guest
> >> data corruption when using qcow2 images on XFS with aio=native.
> >
> > Have we reported that upstream to the xfs folks?
>
> I’ve created an RH BZ here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1765547

Currently "You are not authorized to access bug #1765547." for
anonymous browsers, just fyi.

thanks
-- PMM



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Max Reitz
On 25.10.19 15:46, Peter Maydell wrote:
> On Fri, 25 Oct 2019 at 11:09, Max Reitz  wrote:
>>
>> Hi,
>>
>> It seems to me that there is a bug in Linux’s XFS kernel driver, as
>> I’ve explained here:
>>
>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
>>
>> In combination with our commit c8bb23cbdbe32f, this may lead to guest
>> data corruption when using qcow2 images on XFS with aio=native.
> 
> Have we reported that upstream to the xfs folks?

I’ve created an RH BZ here:

https://bugzilla.redhat.com/show_bug.cgi?id=1765547

So at least some XFS folks are aware of it (and I trust them to inform
the others as necessary :-)).  (Eric Sandeen has been part of the
discussion for the last couple of days already.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PULL 00/19] Trivial branch patches

2019-10-25 Thread Peter Maydell
On Fri, 25 Oct 2019 at 09:35, Laurent Vivier  wrote:
>
> The following changes since commit 58560ad254fbda71d4daa6622d71683190070ee2:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191024' 
> into staging (2019-10-24 16:22:58 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/vivier/qemu.git tags/trivial-branch-pull-request
>
> for you to fetch changes up to fabb862f76f093cdd1610571de9ba714dc1c:
>
>   hw/rtc/aspeed_rtc: Remove unused includes (2019-10-24 20:35:28 +0200)
>
> 
> Fix typos and docs, trivial changes and RTC devices split
>
> 
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



Re: [PATCH v14 1/9] esp: add pseudo-DMA as used by Macintosh

2019-10-25 Thread Paolo Bonzini
On 22/10/19 13:17, Laurent Vivier wrote:
> +if (s->dma_memory_read) {
> +s->dma_memory_read(s->dma_opaque, &s->cmdbuf[s->cmdlen], len);
> +} else {
> +set_pdma(s, CMD, s->cmdlen, len);
> +s->pdma_cb = do_dma_pdma_cb;
> +esp_raise_drq(s);
> +return;
> +}
> +trace_esp_handle_ti_cmd(s->cmdlen);
> +s->ti_size = 0;
> +s->cmdlen = 0;
> +s->do_cmd = 0;
> +do_cmd(s, s->cmdbuf);
>  return;

Can you explain these lines after s->dma_memory_read?  I suppose they
are related to

> -}
> -if (s->do_cmd) {
> +} else if (s->do_cmd) {

If so, it would be nice to make those a separate patch.  Otherwise seems
okay.

Paolo



Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 16:40, Vladimir Sementsov-Ogievskiy wrote:
> 25.10.2019 12:58, Max Reitz wrote:
>> Hi,
>>
>> It seems to me that there is a bug in Linux’s XFS kernel driver, as
>> I’ve explained here:
>>
>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
>>
>> In combination with our commit c8bb23cbdbe32f, this may lead to guest
>> data corruption when using qcow2 images on XFS with aio=native.
>>
>> We can’t wait until the XFS kernel driver is fixed, we should work
>> around the problem ourselves.
>>
>> This is an RFC for two reasons:
>> (1) I don’t know whether this is the right way to address the issue,
>> (2) Ideally, we should detect whether the XFS kernel driver is fixed and
>>  if so stop applying the workaround.
>>  I don’t know how we would go about this, so this series doesn’t do
>>  it.  (Hence it’s an RFC.)
>> (3) Perhaps it’s a bit of a layering violation to let the file-posix
>>  driver access and modify a BdrvTrackedRequest object.
>>
>> As for how we can address the issue, I see three ways:
>> (1) The one presented in this series: On XFS with aio=native, we extend
>>  tracked requests for post-EOF fallocate() calls (i.e., write-zero
>>  operations) to reach until infinity (INT64_MAX in practice), mark
>>  them serializing and wait for other conflicting requests.
>>
>>  Advantages:
>>  + Limits the impact to very specific cases
>>    (And that means it wouldn’t hurt too much to keep this workaround
>>    even when the XFS driver has been fixed)
>>  + Works around the bug where it happens, namely in file-posix
>>
>>  Disadvantages:
>>  - A bit complex
>>  - A bit of a layering violation (should file-posix have access to
>>    tracked requests?)
>>
>> (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
>>  becomes visible due to that function: I don’t think qcow2 writes
>>  zeroes in any other I/O path, and raw images are fixed in size so
>>  post-EOF writes won’t happen.
>>
>>  Advantages:
>>  + Maybe simpler, depending on how difficult it is to handle the
>>    layering violation
>>  + Also fixes the performance problem of handle_alloc_space() being
>>    slow on ppc64+XFS.
>>
>>  Disadvantages:
>>  - Huge layering violation because qcow2 would need to know whether
>>    the image is stored on XFS or not.
>>  - We’d definitely want to skip this workaround when the XFS driver
>>    has been fixed, so we need some method to find out whether it has
>>
>> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>>  To my knowledge I’m the only one who has provided any benchmarks for
>>  this commit, and even then I was a bit skeptical because it performs
>>  well in some cases and bad in others.  I concluded that it’s
>>  probably worth it because the “some cases” are more likely to occur.
>>
>>  Now we have this problem of corruption here (granted due to a bug in
>>  the XFS driver), and another report of massively degraded
>>  performance on ppc64
>>  (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>>  private BZ; I hate that :-/  The report is about 40 % worse
>>  performance for an in-guest fio write benchmark.)
>>
>>  So I have to ask the question about what the justification for
>>  keeping c8bb23cbdbe32f is.  How much does performance increase with
>>  it actually?  (On non-(ppc64+XFS) machines, obviously)
>>
>>  Advantages:
>>  + Trivial
>>  + No layering violations
>>  + We wouldn’t need to keep track of whether the kernel bug has been
>>    fixed or not
>>  + Fixes the ppc64+XFS performance problem
>>
>>  Disadvantages:
>>  - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>>    levels, whatever that means
>>
>> So this is the main reason this is an RFC: What should we do?  Is (1)
>> really the best choice?
>>
>>
>> In any case, I’ve ran the test case I showed in
>> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
>> more than ten times with this series applied and the installation
>> succeeded every time.  (Without this series, it fails like every other
>> time.)
>>
>>
> 
> Hi!
> 
> First, great thanks for your investigation!
> 
> We need c8bb23cbdbe3 patch, because we use 1M clusters, and zeroing 1M is 
> significant
> in time.
> 
> I've tested a bit:
> 
> test:
> for img in /ssd/test.img /test.img; do for cl in 64K 1M; do for step in 4K 
> 64K 1M; do ./qemu-img create -f qcow2 -o cluster_size=$cl $img 15G > 
> /dev/null; printf '%-15s%-7s%-10s : ' $img cl=$cl step=$step; ./qemu-img 
> bench -c $((15 * 1024)) -n -s 4K -S $step -t none -w $img | tail -1 | awk 
> '{print $4}'; done; done; done
> 
> on master:
> 
> /ssd/test.img  cl=64K step=4K    : 0.291
> /ssd/test.img  cl=64K step=64K   : 0.813
> /ssd/test.img  cl=64K step=1M    : 2.799
> /ssd/test.img  cl=1M  step=4K    : 0.217
> /ssd/test.img 

Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Peter Maydell
On Fri, 25 Oct 2019 at 11:09, Max Reitz  wrote:
>
> Hi,
>
> It seems to me that there is a bug in Linux’s XFS kernel driver, as
> I’ve explained here:
>
> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
>
> In combination with our commit c8bb23cbdbe32f, this may lead to guest
> data corruption when using qcow2 images on XFS with aio=native.

Have we reported that upstream to the xfs folks?

thanks
-- PMM



[PULL 6/7] coroutine: Add qemu_co_mutex_assert_locked()

2019-10-25 Thread Kevin Wolf
Some functions require that the caller holds a certain CoMutex for them
to operate correctly. Add a function so that they can assert the lock is
really held.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Tested-by: Michael Weiser 
Reviewed-by: Michael Weiser 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Denis V. Lunev 
Reviewed-by: Max Reitz 
---
 include/qemu/coroutine.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 8d55663062..dfd261c5b1 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -167,6 +167,21 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex);
  */
 void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex);
 
+/**
+ * Assert that the current coroutine holds @mutex.
+ */
+static inline coroutine_fn void qemu_co_mutex_assert_locked(CoMutex *mutex)
+{
+/*
+ * mutex->holder doesn't need any synchronisation if the assertion holds
+ * true because the mutex protects it. If it doesn't hold true, we still
+ * don't mind if another thread takes or releases mutex behind our back,
+ * because the condition will be false no matter whether we read NULL or
+ * the pointer for any other coroutine.
+ */
+assert(atomic_read(&mutex->locked) &&
+   mutex->holder == qemu_coroutine_self());
+}
 
 /**
  * CoQueues are a mechanism to queue coroutines in order to continue executing
-- 
2.20.1




[PULL 1/7] qapi: add support for blkreplay driver

2019-10-25 Thread Kevin Wolf
From: Pavel Dovgalyuk 

This patch adds support for blkreplay driver to the blockdev options.
Now blkreplay can be used with -blockdev command line option
in the following format:
-blockdev driver=blkreplay,image=file-node-name,node-name=replay-node-name

This option makes possible implementation of the better command
line support for record/replay invocations.

Signed-off-by: Pavel Dovgalyuk 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b274aef713..aa97ee2641 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2883,12 +2883,13 @@
 # @nvme: Since 2.12
 # @copy-on-read: Since 3.0
 # @blklogwrites: Since 3.0
+# @blkreplay: Since 4.2
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
-  'data': [ 'blkdebug', 'blklogwrites', 'blkverify', 'bochs', 'cloop',
-'copy-on-read', 'dmg', 'file', 'ftp', 'ftps', 'gluster',
+  'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
+'cloop', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps', 'gluster',
 'host_cdrom', 'host_device', 'http', 'https', 'iscsi', 'luks',
 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels', 'qcow',
 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
@@ -3501,6 +3502,18 @@
   'data': { 'test': 'BlockdevRef',
 'raw': 'BlockdevRef' } }
 
+##
+# @BlockdevOptionsBlkreplay:
+#
+# Driver specific block device options for blkreplay.
+#
+# @image: disk image which should be controlled with blkreplay
+#
+# Since: 4.2
+##
+{ 'struct': 'BlockdevOptionsBlkreplay',
+  'data': { 'image': 'BlockdevRef' } }
+
 ##
 # @QuorumReadPattern:
 #
@@ -4028,6 +4041,7 @@
   'blkdebug':   'BlockdevOptionsBlkdebug',
   'blklogwrites':'BlockdevOptionsBlklogwrites',
   'blkverify':  'BlockdevOptionsBlkverify',
+  'blkreplay':  'BlockdevOptionsBlkreplay',
   'bochs':  'BlockdevOptionsGenericFormat',
   'cloop':  'BlockdevOptionsGenericFormat',
   'copy-on-read':'BlockdevOptionsGenericFormat',
-- 
2.20.1




[PULL 2/7] iotests: Skip read-only cases in 118 when run as root

2019-10-25 Thread Kevin Wolf
Some tests in 118 use chmod to remove write permissions from the file
and assume that the image can indeed not be opened read-write
afterwards. This doesn't work when the test is run as root, because root
can still open the file as writable even when the permission bit isn't
set.

Introduce a @skip_if_root decorator and use it in 118 to skip the tests
in question when the script is run as root.

Signed-off-by: Kevin Wolf 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/qemu-iotests/118|  3 +++
 tests/qemu-iotests/iotests.py | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index ea0b326ae0..e20080e9a6 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -446,6 +446,7 @@ class TestChangeReadOnly(ChangeBaseClass):
 self.assert_qmp(result, 'return[0]/inserted/ro', True)
 self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
 
+@iotests.skip_if_user_is_root
 def test_rw_ro_retain(self):
 os.chmod(new_img, 0o444)
 self.vm.add_drive(old_img, 'media=disk', 'none')
@@ -530,6 +531,7 @@ class TestChangeReadOnly(ChangeBaseClass):
 self.assert_qmp(result, 'return[0]/inserted/ro', True)
 self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
 
+@iotests.skip_if_user_is_root
 def test_make_ro_rw(self):
 os.chmod(new_img, 0o444)
 self.vm.add_drive(old_img, 'media=disk', 'none')
@@ -571,6 +573,7 @@ class TestChangeReadOnly(ChangeBaseClass):
 self.assert_qmp(result, 'return[0]/inserted/ro', True)
 self.assert_qmp(result, 'return[0]/inserted/image/filename', new_img)
 
+@iotests.skip_if_user_is_root
 def test_make_ro_rw_by_retain(self):
 os.chmod(new_img, 0o444)
 self.vm.add_drive(old_img, 'media=disk', 'none')
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 693fde155a..709def4d5d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -931,6 +931,16 @@ def skip_if_unsupported(required_formats=[], 
read_only=False):
 return func_wrapper
 return skip_test_decorator
 
+def skip_if_user_is_root(func):
+'''Skip Test Decorator
+   Runs the test only without root permissions'''
+def func_wrapper(*args, **kwargs):
+if os.getuid() == 0:
+case_notrun('{}: cannot be run as root'.format(args[0]))
+else:
+return func(*args, **kwargs)
+return func_wrapper
+
 def execute_unittest(output, verbosity, debug):
 runner = unittest.TextTestRunner(stream=output, descriptions=True,
  verbosity=verbosity)
-- 
2.20.1




[PULL 0/7] Block layer patches

2019-10-25 Thread Kevin Wolf
The following changes since commit 7bc8f9734213b76e76631a483be13d6737c2adbc:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20191025' 
into staging (2019-10-25 13:12:16 +0100)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 5e9785505210e2477e590e61b1ab100d0ec22b01:

  qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation() 
(2019-10-25 15:18:55 +0200)


Block layer patches:

- qcow2: Fix data corruption bug that is triggered in partial cluster
  allocation with default options
- qapi: add support for blkreplay driver
- doc: Describe missing generic -blockdev options
- iotests: Fix 118 when run as root
- Minor code cleanups


Kevin Wolf (5):
  iotests: Skip read-only cases in 118 when run as root
  blockdev: Use error_report() in hmp_commit()
  doc: Describe missing generic -blockdev options
  coroutine: Add qemu_co_mutex_assert_locked()
  qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()

Pavel Dovgaluk (1):
  qapi: add support for blkreplay driver

Vladimir Sementsov-Ogievskiy (1):
  block/backup: drop dead code from backup_job_create

 qapi/block-core.json  | 18 --
 include/qemu/coroutine.h  | 15 +++
 block/backup.c|  5 +
 block/qcow2-refcount.c|  2 ++
 block/qcow2.c |  3 ++-
 blockdev.c|  7 +++
 qemu-options.hx   | 22 +-
 tests/qemu-iotests/118|  3 +++
 tests/qemu-iotests/iotests.py | 10 ++
 9 files changed, 73 insertions(+), 12 deletions(-)




[PULL 7/7] qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()

2019-10-25 Thread Kevin Wolf
qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
requires s->lock to be taken to protect its accesses to the refcount
table and refcount blocks. However, nothing in this code path actually
took the lock. This could cause the same cache entry to be used by two
requests at the same time, for different tables at different offsets,
resulting in image corruption.

As it would be preferable to base the detection on consistent data (even
though it's just heuristics), let's take the lock not only around the
qcow2_get_refcount() calls, but around the whole function.

This patch takes the lock in qcow2_co_block_status() earlier and asserts
in qcow2_detect_metadata_preallocation() that we hold the lock.

Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
Cc: qemu-sta...@nongnu.org
Reported-by: Michael Weiser 
Signed-off-by: Kevin Wolf 
Tested-by: Michael Weiser 
Reviewed-by: Michael Weiser 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/qcow2-refcount.c | 2 ++
 block/qcow2.c  | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ef965d7895..0d64bf5a5e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3455,6 +3455,8 @@ int qcow2_detect_metadata_preallocation(BlockDriverState 
*bs)
 int64_t i, end_cluster, cluster_count = 0, threshold;
 int64_t file_length, real_allocation, real_clusters;
 
+qemu_co_mutex_assert_locked(&s->lock);
+
 file_length = bdrv_getlength(bs->file->bs);
 if (file_length < 0) {
 return file_length;
diff --git a/block/qcow2.c b/block/qcow2.c
index 8b05933565..0bc69e6996 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1916,6 +1916,8 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 unsigned int bytes;
 int status = 0;
 
+qemu_co_mutex_lock(&s->lock);
+
 if (!s->metadata_preallocation_checked) {
 ret = qcow2_detect_metadata_preallocation(bs);
 s->metadata_preallocation = (ret == 1);
@@ -1923,7 +1925,6 @@ static int coroutine_fn 
qcow2_co_block_status(BlockDriverState *bs,
 }
 
 bytes = MIN(INT_MAX, count);
-qemu_co_mutex_lock(&s->lock);
 ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
 qemu_co_mutex_unlock(&s->lock);
 if (ret < 0) {
-- 
2.20.1




[PULL 5/7] doc: Describe missing generic -blockdev options

2019-10-25 Thread Kevin Wolf
We added more generic options after introducing -blockdev and forgot to
update the documentation (man page and --help output) accordingly. Do
that now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Peter Maydell 
---
 qemu-options.hx | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 996b6fba74..19709f973d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -864,7 +864,8 @@ ETEXI
 DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
 "-blockdev [driver=]driver[,node-name=N][,discard=ignore|unmap]\n"
 "  [,cache.direct=on|off][,cache.no-flush=on|off]\n"
-"  [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
+"  [,read-only=on|off][,auto-read-only=on|off]\n"
+"  [,force-share=on|off][,detect-zeroes=on|off|unmap]\n"
 "  [,driver specific parameters...]\n"
 "configure a block backend\n", QEMU_ARCH_ALL)
 STEXI
@@ -900,6 +901,25 @@ name is not intended to be predictable and changes between 
QEMU invocations.
 For the top level, an explicit node name must be specified.
 @item read-only
 Open the node read-only. Guest write attempts will fail.
+
+Note that some block drivers support only read-only access, either generally or
+in certain configurations. In this case, the default value
+@option{read-only=off} does not work and the option must be specified
+explicitly.
+@item auto-read-only
+If @option{auto-read-only=on} is set, QEMU may fall back to read-only usage
+even when @option{read-only=off} is requested, or even switch between modes as
+needed, e.g. depending on whether the image file is writable or whether a
+writing user is attached to the node.
+@item force-share
+Override the image locking system of QEMU by forcing the node to utilize
+weaker shared access for permissions where it would normally request exclusive
+access.  When there is the potential for multiple instances to have the same
+file open (whether this invocation of QEMU is the first or the second
+instance), both instances must permit shared access for the second instance to
+succeed at opening the file.
+
+Enabling @option{force-share=on} requires @option{read-only=on}.
 @item cache.direct
 The host page cache can be avoided with @option{cache.direct=on}. This will
 attempt to do disk IO directly to the guest's memory. QEMU may still perform an
-- 
2.20.1




[PULL 3/7] blockdev: Use error_report() in hmp_commit()

2019-10-25 Thread Kevin Wolf
Instead of using monitor_printf() to report errors, hmp_commit() should
use error_report() like other places do.

Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
---
 blockdev.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 03c7cd7651..ba491e3ef5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1088,11 +1088,11 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 
 blk = blk_by_name(device);
 if (!blk) {
-monitor_printf(mon, "Device '%s' not found\n", device);
+error_report("Device '%s' not found", device);
 return;
 }
 if (!blk_is_available(blk)) {
-monitor_printf(mon, "Device '%s' has no medium\n", device);
+error_report("Device '%s' has no medium", device);
 return;
 }
 
@@ -1105,8 +1105,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 aio_context_release(aio_context);
 }
 if (ret < 0) {
-monitor_printf(mon, "'commit' error for '%s': %s\n", device,
-   strerror(-ret));
+error_report("'commit' error for '%s': %s", device, strerror(-ret));
 }
 }
 
-- 
2.20.1




Re: [RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 12:58, Max Reitz wrote:
> Hi,
> 
> It seems to me that there is a bug in Linux’s XFS kernel driver, as
> I’ve explained here:
> 
> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html
> 
> In combination with our commit c8bb23cbdbe32f, this may lead to guest
> data corruption when using qcow2 images on XFS with aio=native.
> 
> We can’t wait until the XFS kernel driver is fixed, we should work
> around the problem ourselves.
> 
> This is an RFC for two reasons:
> (1) I don’t know whether this is the right way to address the issue,
> (2) Ideally, we should detect whether the XFS kernel driver is fixed and
>  if so stop applying the workaround.
>  I don’t know how we would go about this, so this series doesn’t do
>  it.  (Hence it’s an RFC.)
> (3) Perhaps it’s a bit of a layering violation to let the file-posix
>  driver access and modify a BdrvTrackedRequest object.
> 
> As for how we can address the issue, I see three ways:
> (1) The one presented in this series: On XFS with aio=native, we extend
>  tracked requests for post-EOF fallocate() calls (i.e., write-zero
>  operations) to reach until infinity (INT64_MAX in practice), mark
>  them serializing and wait for other conflicting requests.
> 
>  Advantages:
>  + Limits the impact to very specific cases
>(And that means it wouldn’t hurt too much to keep this workaround
>even when the XFS driver has been fixed)
>  + Works around the bug where it happens, namely in file-posix
> 
>  Disadvantages:
>  - A bit complex
>  - A bit of a layering violation (should file-posix have access to
>tracked requests?)
> 
> (2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
>  becomes visible due to that function: I don’t think qcow2 writes
>  zeroes in any other I/O path, and raw images are fixed in size so
>  post-EOF writes won’t happen.
> 
>  Advantages:
>  + Maybe simpler, depending on how difficult it is to handle the
>layering violation
>  + Also fixes the performance problem of handle_alloc_space() being
>slow on ppc64+XFS.
> 
>  Disadvantages:
>  - Huge layering violation because qcow2 would need to know whether
>the image is stored on XFS or not.
>  - We’d definitely want to skip this workaround when the XFS driver
>has been fixed, so we need some method to find out whether it has
> 
> (3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
>  To my knowledge I’m the only one who has provided any benchmarks for
>  this commit, and even then I was a bit skeptical because it performs
>  well in some cases and bad in others.  I concluded that it’s
>  probably worth it because the “some cases” are more likely to occur.
> 
>  Now we have this problem of corruption here (granted due to a bug in
>  the XFS driver), and another report of massively degraded
>  performance on ppc64
>  (https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
>  private BZ; I hate that :-/  The report is about 40 % worse
>  performance for an in-guest fio write benchmark.)
> 
>  So I have to ask the question about what the justification for
>  keeping c8bb23cbdbe32f is.  How much does performance increase with
>  it actually?  (On non-(ppc64+XFS) machines, obviously)
> 
>  Advantages:
>  + Trivial
>  + No layering violations
>  + We wouldn’t need to keep track of whether the kernel bug has been
>fixed or not
>  + Fixes the ppc64+XFS performance problem
> 
>  Disadvantages:
>  - Reverts cluster allocation performance to pre-c8bb23cbdbe32f
>levels, whatever that means
> 
> So this is the main reason this is an RFC: What should we do?  Is (1)
> really the best choice?
> 
> 
> In any case, I’ve ran the test case I showed in
> https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
> more than ten times with this series applied and the installation
> succeeded every time.  (Without this series, it fails like every other
> time.)
> 
> 

Hi!

First, great thanks for your investigation!

We need c8bb23cbdbe3 patch, because we use 1M clusters, and zeroing 1M is 
significant
in time.

I've tested a bit:

test:
for img in /ssd/test.img /test.img; do for cl in 64K 1M; do for step in 4K 64K 
1M; do ./qemu-img create -f qcow2 -o cluster_size=$cl $img 15G > /dev/null; 
printf '%-15s%-7s%-10s : ' $img cl=$cl step=$step; ./qemu-img bench -c $((15 * 
1024)) -n -s 4K -S $step -t none -w $img | tail -1 | awk '{print $4}'; done; 
done; done

on master:

/ssd/test.img  cl=64K step=4K: 0.291
/ssd/test.img  cl=64K step=64K   : 0.813
/ssd/test.img  cl=64K step=1M: 2.799
/ssd/test.img  cl=1M  step=4K: 0.217
/ssd/test.img  cl=1M  step=64K   : 0.332
/ssd/test.img  cl=1M  step=1M: 0.685
/test.img  cl=64K step=4K: 1.751
/test.img  cl=64K step=64K   : 14.811
/test.img  cl=64K step=1M

[PULL 4/7] block/backup: drop dead code from backup_job_create

2019-10-25 Thread Kevin Wolf
From: Vladimir Sementsov-Ogievskiy 

After commit 00e30f05de1d195, there is no more "goto error" points
after job creation, so after "error:" @job is always NULL and we don't
need roll-back job creation.

Reported-by: Coverity (CID 1406402)
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
---
 block/backup.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index dddcf77f53..cf62b1a38c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -474,10 +474,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 if (sync_bitmap) {
 bdrv_reclaim_dirty_bitmap(sync_bitmap, NULL);
 }
-if (job) {
-backup_clean(&job->common.job);
-job_early_fail(&job->common.job);
-} else if (backup_top) {
+if (backup_top) {
 bdrv_backup_top_drop(backup_top);
 }
 
-- 
2.20.1




Re: [PATCH v2 2/2] iotests: add 269 to check maximum of bitmaps in qcow2

2019-10-25 Thread Max Reitz
On 14.10.19 13:51, Vladimir Sementsov-Ogievskiy wrote:
> Check that it's impossible to create more persistent bitmaps than qcow2
> supports.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  tests/qemu-iotests/269 | 47 ++
>  tests/qemu-iotests/269.out |  3 +++
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 51 insertions(+)
>  create mode 100755 tests/qemu-iotests/269
>  create mode 100644 tests/qemu-iotests/269.out

Is there no way to make this test any faster, e.g. by creating like
65534 bitmaps with dd and a binary blob?  (Similarly to what I do in
“iotests: Test qcow2's snapshot table handling”)

This is such an edge case, but running the test took 3:46 min before
patch 1 (which I already find much too long), and 8:13 min afterwards
(on my machine).

(To be honest, if we take this test as-is, I’m probably just never going
to run it.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 1/2] qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 15:50, Max Reitz wrote:
> On 14.10.19 13:51, Vladimir Sementsov-Ogievskiy wrote:
>> qcow2_can_store_new_dirty_bitmap works wrong, as it considers only
>> bitmaps already stored in the qcow2 image and ignores persistent
>> BdrvDirtyBitmap objects.
>>
>> So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2
>> bitmaps on open, so there should not be any bitmap in the image for
>> which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of
>> corruption, and no reason to check for corruptions here (open() and
>> close() are better places for it).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/qcow2-bitmap.c | 41 ++---
>>   1 file changed, 18 insertions(+), 23 deletions(-)
> 
> Am I right in interpreting qcow2_load_dirty_bitmaps() in such a way that
> every persistent dirty bitmap will cause a runtime dirty bitmap to be
> created?

Yes, we load all the bitmaps.

Every bitmap stored in qcow2 image is loaded and corresponding BdrvDirtyBitmap
is created. If bitmap has IN_USE flag in the image, created BdrvDirtyBitmap is
marked inconsistent, but it is still there. If bitmap doesn't have AUTO flag,
it becomes disabled bitmap.

> 
> If so:
> 
> Reviewed-by: Max Reitz 
> 


-- 
Best regards,
Vladimir


Re: [PATCH v2 1/2] qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap

2019-10-25 Thread Max Reitz
On 14.10.19 13:51, Vladimir Sementsov-Ogievskiy wrote:
> qcow2_can_store_new_dirty_bitmap works wrong, as it considers only
> bitmaps already stored in the qcow2 image and ignores persistent
> BdrvDirtyBitmap objects.
> 
> So, let's instead count persistent BdrvDirtyBitmaps. We load all qcow2
> bitmaps on open, so there should not be any bitmap in the image for
> which we don't have BdrvDirtyBitmaps version. If it is - it's a kind of
> corruption, and no reason to check for corruptions here (open() and
> close() are better places for it).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2-bitmap.c | 41 ++---
>  1 file changed, 18 insertions(+), 23 deletions(-)

Am I right in interpreting qcow2_load_dirty_bitmaps() in such a way that
every persistent dirty bitmap will cause a runtime dirty bitmap to be
created?

If so:

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/3] qcow2: Assert that qcow2_cache_get() callers hold s->lock

2019-10-25 Thread Kevin Wolf
Am 25.10.2019 um 12:35 hat Michael Weiser geschrieben:
> I was going to test with master as well but got overtaken by v2. Will
> move on to test v2 now. :)
> 
> Series:
> Tested-by: Michael Weiser 

Thanks for testing!

The fix itself is unchanged in v2, so I assume the result will be the
same, but testing it explicitly can't hurt. I'm going to send a pull
request today, but if you're quick, I can wait for your results.

> No biggie but if there's a chance could you switch my address to the
> above?

No problem, I've updated the address in the Reported-by tag.

Kevin




Re: [PATCH v3 0/6] block-copy: memory limit

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
25.10.2019 15:31, Max Reitz wrote:
> On 22.10.19 13:17, Vladimir Sementsov-Ogievskiy wrote:
>> I'm going to bring block-status driven, async copying process to
>> block-copy, to make it fast. The first step is to limit memory usage of
>> backup, here is it.
>>
>> v3:
>> 03: add Max's r-b
>> 04: fix commit message and include guards, add Max's r-b
>> 05-06: add Max's r-b
>>
>> v2: [mostly by Max's comments]
>> Now based on master (Thank you Max!)
>> 01: add Max's r-b
>> 02: add Max's r-b
>> 03: - refactor block_copy_do_copy goto/return
>>  - add small comment to block_copy_do_copy
>> 04: - a lot of renaming and wording fixes
>>  - refactor to use "available" instead of "taken"
>>  - refactor co_get_from_shres
>> 05: rebase on 04 changes
>> 06: drop extra things from max_transfer calculation
>>
>> Vladimir Sementsov-Ogievskiy (6):
>>block/block-copy: allocate buffer in block_copy_with_bounce_buffer
>>block/block-copy: limit copy_range_size to 16 MiB
>>block/block-copy: refactor copying
>>util: introduce SharedResource
>>block/block-copy: add memory limit
>>block/block-copy: increase buffered copy request
>>
>>   include/block/block-copy.h|   5 +-
>>   include/qemu/co-shared-resource.h |  71 
>>   block/block-copy.c| 182 +++---
>>   util/qemu-co-shared-resource.c|  76 +
>>   block/trace-events|   6 +-
>>   util/Makefile.objs|   1 +
>>   6 files changed, 249 insertions(+), 92 deletions(-)
>>   create mode 100644 include/qemu/co-shared-resource.h
>>   create mode 100644 util/qemu-co-shared-resource.c
> 
> Thanks, applied to my block branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> Max
> 

Thank you Max!

-- 
Best regards,
Vladimir


Re: [PATCH v3 0/6] block-copy: memory limit

2019-10-25 Thread Max Reitz
On 22.10.19 13:17, Vladimir Sementsov-Ogievskiy wrote:
> I'm going to bring block-status driven, async copying process to
> block-copy, to make it fast. The first step is to limit memory usage of
> backup, here is it.
> 
> v3:
> 03: add Max's r-b
> 04: fix commit message and include guards, add Max's r-b
> 05-06: add Max's r-b
> 
> v2: [mostly by Max's comments]
> Now based on master (Thank you Max!)
> 01: add Max's r-b
> 02: add Max's r-b
> 03: - refactor block_copy_do_copy goto/return
> - add small comment to block_copy_do_copy
> 04: - a lot of renaming and wording fixes
> - refactor to use "available" instead of "taken"
> - refactor co_get_from_shres
> 05: rebase on 04 changes
> 06: drop extra things from max_transfer calculation
> 
> Vladimir Sementsov-Ogievskiy (6):
>   block/block-copy: allocate buffer in block_copy_with_bounce_buffer
>   block/block-copy: limit copy_range_size to 16 MiB
>   block/block-copy: refactor copying
>   util: introduce SharedResource
>   block/block-copy: add memory limit
>   block/block-copy: increase buffered copy request
> 
>  include/block/block-copy.h|   5 +-
>  include/qemu/co-shared-resource.h |  71 
>  block/block-copy.c| 182 +++---
>  util/qemu-co-shared-resource.c|  76 +
>  block/trace-events|   6 +-
>  util/Makefile.objs|   1 +
>  6 files changed, 249 insertions(+), 92 deletions(-)
>  create mode 100644 include/qemu/co-shared-resource.h
>  create mode 100644 util/qemu-co-shared-resource.c

Thanks, applied to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 0/8] Packed virtqueue for virtio

2019-10-25 Thread Michael S. Tsirkin
On Fri, Oct 25, 2019 at 10:35:19AM +0200, Eugenio Pérez wrote:
> Hi:
> 
> This is an updated version of packed virtqueue support based on Wei and 
> Jason's
> V6, mainly solving the clang leak detector error CI gave.


Looks good, I will queue this up.

It would be nice to add libqos based tests on top,
based on Stefan's work.


> Please review.
> 
> Changes from V6:
> - Commit reorder: Squash bugfix and sepparate big changes into smaller 
> commits.
> 
> Changes from V5:
> - Fix qemu's CI asan error.
> - Move/copy rcu comments.
> - Merge duplicated vdev->broken check between split and packet version.
> 
> Eugenio Pérez (2):
>   virtio: Free blk virqueues at unrealize()
>   virtio: Free rnd virqueue at unrealize()
> 
> Jason Wang (4):
>   virtio: basic packed virtqueue support
>   virtio: event suppression support for packed ring
>   vhost_net: enable packed ring support
>   virtio: add property to enable packed virtqueue
> 
> Wei Xu (2):
>   virtio: basic structure for packed ring
>   virtio: device/driver area size calculation refactor for split ring
> 
>  hw/block/virtio-blk.c   |7 +-
>  hw/char/virtio-serial-bus.c |2 +-
>  hw/net/vhost_net.c  |2 +
>  hw/scsi/virtio-scsi.c   |3 +-
>  hw/virtio/virtio-rng.c  |1 +
>  hw/virtio/virtio.c  | 1154 
> ++-
>  include/hw/virtio/virtio.h  |   14 +-
>  7 files changed, 1045 insertions(+), 138 deletions(-)
> 
> -- 
> 2.16.5



Re: [PATCH v4 00/16] libqos: add VIRTIO PCI 1.0 support

2019-10-25 Thread Michael S. Tsirkin
On Wed, Oct 23, 2019 at 11:04:09AM +0100, Stefan Hajnoczi wrote:
> v4:
>  * Introduce bool d->features_negotiated so that tests can negotiate a
>0 feature bit set in Legacy mode [Thomas]
>  * Make the FEATURES_OK code change in qvirtio_set_driver_ok() clearer and
>mention it in the commit description [Thomas]
>  * Fix indentation in qvring_init() [Thomas]
> v3:
>  * Now implements VIRTIO 1.0 fully (vring, device initialization).
>This required several new patches to address the following issues:
>1. Tests that do not negotiate features (non-compliant!)
>2. Tests that access configuration space before feature negotiation
>   (non-compliant!)
>3. Tests that set up virtqueues before feature negotiation (non-compliant!)
>4. vring accesses require byte-swapping when VIRTIO 1.0 is used with a
>   big-endian guest because the qtest_readX/writeX() API automatically
>   converts to guest-endian
>5. VIRTIO 1.0 has an additional FEATURES_OK step during Device
>   Initialization
>  * Change uint8_t bar_idx to int [Thomas]
>  * Document qpci_find_capability() [Thomas]
>  * Every commit tested with arm, ppc64, and x86_64 targets using:
>git rebase -i origin/master -x 'make tests/qos-test &&
>QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/qos-test &&
>QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/qos-test'
>QTEST_QEMU_BINARY=arm-softmmu/qemu-system-arm tests/qos-test'
> v2:
>  * Fix checkpatch.pl issues, except MAINTAINERS file warning.  libqos already
>has maintainers and the new virtio-pci-modern.[ch] files don't need extra
>entries since they are already covered by the existing libqos/ entry.
> 
> New VIRTIO devices are Non-Transitional.  This means they only expose the
> VIRTIO 1.0 interface, not the Legacy interface.
> 
> The libqos only supports Legacy and Transitional devices (in Legacy mode).
> This patch series adds VIRTIO 1.0 support so that tests can run against
> Non-Transitional devices too.

Very nice, thanks!
I'll queue this up in my tree.

> The virtio-fs device is Non-Transitional, so this is a prerequisite for
> virtio-fs qos tests.
> 
> Stefan Hajnoczi (16):
>   tests/virtio-blk-test: read config space after feature negotiation
>   libqos: read QVIRTIO_MMIO_VERSION register
>   libqos: extend feature bits to 64-bit
>   virtio-scsi-test: add missing feature negotiation
>   tests/virtio-blk-test: set up virtqueue after feature negotiation
>   libqos: add missing virtio-9p feature negotiation
>   libqos: enforce Device Initialization order
>   libqos: implement VIRTIO 1.0 FEATURES_OK step
>   libqos: access VIRTIO 1.0 vring in little-endian
>   libqos: add iteration support to qpci_find_capability()
>   libqos: pass full QVirtQueue to set_queue_address()
>   libqos: add MSI-X callbacks to QVirtioPCIDevice
>   libqos: expose common virtqueue setup/cleanup functions
>   libqos: make the virtio-pci BAR index configurable
>   libqos: extract Legacy virtio-pci.c code
>   libqos: add VIRTIO PCI 1.0 support
> 
>  tests/Makefile.include   |   1 +
>  tests/libqos/pci.h   |   2 +-
>  tests/libqos/virtio-mmio.h   |   1 +
>  tests/libqos/virtio-pci-modern.h |  17 ++
>  tests/libqos/virtio-pci.h|  34 ++-
>  tests/libqos/virtio.h|  19 +-
>  tests/libqos/pci.c   |  30 ++-
>  tests/libqos/virtio-9p.c |   6 +
>  tests/libqos/virtio-mmio.c   |  38 ++-
>  tests/libqos/virtio-net.c|   6 +-
>  tests/libqos/virtio-pci-modern.c | 443 +++
>  tests/libqos/virtio-pci.c| 105 +---
>  tests/libqos/virtio.c| 160 ---
>  tests/virtio-blk-test.c  |  66 +++--
>  tests/virtio-scsi-test.c |   8 +
>  15 files changed, 802 insertions(+), 134 deletions(-)
>  create mode 100644 tests/libqos/virtio-pci-modern.h
>  create mode 100644 tests/libqos/virtio-pci-modern.c
> 
> -- 
> 2.21.0



Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-25 Thread Fernando Casas Schössow
I managed to upgrade to qemu 4.1 on a test KVM host and I can confirm I can't 
repro the issue in this version.
Great news that is fixed in 4.1.

Thanks everyone for your inputs and the fast replies.

Kind regards,

Fernando

On vie, oct 25, 2019 at 12:28 PM, Fernando Casas Schössow 
 wrote:
Thanks for the reply Kevin.

I will do my best to upgrade to 4.1, test again and report back if this is 
fixed or not in that version.
Hopefully it is.

Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf  wrote:
Am 23.10.2019 um 19:28 hat Fernando Casas Schössow geschrieben:
Hi John, Thanks for looking into this. I can quickly repro the problem with 
qemu 4.0 binary with debugging symbols enabled as I have it available already. 
Doing the same with qemu 4.1 or development head may be too much hassle but if 
it's really the only way I can give it try.
We had a lot of iothread related fixes in 4.1, so this would really be the only 
way to tell if it's a bug that still exists. I suspect that it's already fixed 
(and to be more precise, I assume that commit d0ee0204f fixed it). Kevin






Re: [PATCH v2 0/2] qcow2: Fix image corruption bug in 4.1

2019-10-25 Thread Max Reitz
On 24.10.19 16:26, Kevin Wolf wrote:
> This series fixes an image corruption bug that was introduced in commit
> 69f47505e ('block: avoid recursive block_status call if possible'),
> first contained in the QEMU 4.1.0 release.
> 
> This bug was reported by Michael Weiser on Launchpad:
> https://bugs.launchpad.net/qemu/+bug/1846427
> 
> v2:
> 
> - Dropped the assertion in qcow2_cache_do_get() for now. Making sure
>   that it actually holds true for all callers requires more work and
>   getting the corruption fix in quickly is important.
> 
> - Use atomic_read() and add comment to qemu_co_mutex_assert_locked()
>   implementation [Denis]
> 
> Kevin Wolf (2):
>   coroutine: Add qemu_co_mutex_assert_locked()
>   qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
> 
>  include/qemu/coroutine.h | 15 +++
>  block/qcow2-refcount.c   |  2 ++
>  block/qcow2.c|  3 ++-
>  3 files changed, 19 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [RFC 2/3] block/file-posix: Detect XFS with CONFIG_FALLOCATE

2019-10-25 Thread Kevin Wolf
Am 25.10.2019 um 11:58 hat Max Reitz geschrieben:
> We will need this for the next patch.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/file-posix.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 695fcf740d..5cd54c8bff 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -149,7 +149,7 @@ typedef struct BDRVRawState {
>  int perm_change_flags;
>  BDRVReopenState *reopen_state;
>  
> -#ifdef CONFIG_XFS
> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
>  bool is_xfs:1;
>  #endif
>  bool has_discard:1;
> @@ -667,7 +667,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> *options,
>  }
>  #endif
>  
> -#ifdef CONFIG_XFS
> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
>  if (platform_test_xfs_fd(s->fd)) {

platform_test_xfs_fd() is defined in a header file from xfsprogs. I
don't think you can call that without CONFIG_XFS, it would break the
build.

>  s->is_xfs = true;
>  }

Kevin




Re: [RFC 2/3] block/file-posix: Detect XFS with CONFIG_FALLOCATE

2019-10-25 Thread Max Reitz
On 25.10.19 12:35, Kevin Wolf wrote:
> Am 25.10.2019 um 12:22 hat Max Reitz geschrieben:
>> On 25.10.19 12:19, Kevin Wolf wrote:
>>> Am 25.10.2019 um 11:58 hat Max Reitz geschrieben:
 We will need this for the next patch.

 Signed-off-by: Max Reitz 
 ---
  block/file-posix.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/block/file-posix.c b/block/file-posix.c
 index 695fcf740d..5cd54c8bff 100644
 --- a/block/file-posix.c
 +++ b/block/file-posix.c
 @@ -149,7 +149,7 @@ typedef struct BDRVRawState {
  int perm_change_flags;
  BDRVReopenState *reopen_state;
  
 -#ifdef CONFIG_XFS
 +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
  bool is_xfs:1;
  #endif
  bool has_discard:1;
 @@ -667,7 +667,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
 *options,
  }
  #endif
  
 -#ifdef CONFIG_XFS
 +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
  if (platform_test_xfs_fd(s->fd)) {
>>>
>>> platform_test_xfs_fd() is defined in a header file from xfsprogs. I
>>> don't think you can call that without CONFIG_XFS, it would break the
>>> build.
>>
>> Aw.
>>
>> OK.  Should we then just assume is_xfs to be true (for the next patch)
>> with !defined(CONFIG_XFS) && defined(CONFIG_FALLOCATE)?
> 
> It's a small inline function that basically just calls statfs() and then
> checks f_type.

Yes, this is where my error came from.  I asked cscope, which happily
referred me to the inline implementation and so I didn’t bother looking
at the filename and just assumed it’d be some code that belongs to qemu.

> I think we can have a small function to implement this in the file-posix
> code. Don't copy it from the header because of licensing (LGPL, while
> file-posix is MIT); refer to the statfs() manpage instead and write it
> yourself.

OK.

Max



signature.asc
Description: OpenPGP digital signature


Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-25 Thread Kevin Wolf
Am 23.10.2019 um 19:28 hat Fernando Casas Schössow geschrieben:
> Hi John,
> 
> Thanks for looking into this.  I can quickly repro the problem with
> qemu 4.0 binary with debugging symbols enabled as I have it available
> already.  Doing the same with qemu 4.1 or development head may be too
> much hassle but if it's really the only way I can give it try.

We had a lot of iothread related fixes in 4.1, so this would really be
the only way to tell if it's a bug that still exists. I suspect that
it's already fixed (and to be more precise, I assume that commit
d0ee0204f fixed it).

Kevin




Re: [RFC 2/3] block/file-posix: Detect XFS with CONFIG_FALLOCATE

2019-10-25 Thread Kevin Wolf
Am 25.10.2019 um 12:22 hat Max Reitz geschrieben:
> On 25.10.19 12:19, Kevin Wolf wrote:
> > Am 25.10.2019 um 11:58 hat Max Reitz geschrieben:
> >> We will need this for the next patch.
> >>
> >> Signed-off-by: Max Reitz 
> >> ---
> >>  block/file-posix.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/file-posix.c b/block/file-posix.c
> >> index 695fcf740d..5cd54c8bff 100644
> >> --- a/block/file-posix.c
> >> +++ b/block/file-posix.c
> >> @@ -149,7 +149,7 @@ typedef struct BDRVRawState {
> >>  int perm_change_flags;
> >>  BDRVReopenState *reopen_state;
> >>  
> >> -#ifdef CONFIG_XFS
> >> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
> >>  bool is_xfs:1;
> >>  #endif
> >>  bool has_discard:1;
> >> @@ -667,7 +667,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> >> *options,
> >>  }
> >>  #endif
> >>  
> >> -#ifdef CONFIG_XFS
> >> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
> >>  if (platform_test_xfs_fd(s->fd)) {
> > 
> > platform_test_xfs_fd() is defined in a header file from xfsprogs. I
> > don't think you can call that without CONFIG_XFS, it would break the
> > build.
> 
> Aw.
> 
> OK.  Should we then just assume is_xfs to be true (for the next patch)
> with !defined(CONFIG_XFS) && defined(CONFIG_FALLOCATE)?

It's a small inline function that basically just calls statfs() and then
checks f_type.

I think we can have a small function to implement this in the file-posix
code. Don't copy it from the header because of licensing (LGPL, while
file-posix is MIT); refer to the statfs() manpage instead and write it
yourself.

Kevin


signature.asc
Description: PGP signature


Re: qemu crashing when attaching an ISO file to a virtio-scsi CD-ROM device through libvirt

2019-10-25 Thread Fernando Casas Schössow
Thanks for the reply Kevin.

I will do my best to upgrade to 4.1, test again and report back if this is 
fixed or not in that version.
Hopefully it is.

Fernando

On vie, oct 25, 2019 at 12:07 PM, Kevin Wolf  wrote:
Am 23.10.2019 um 19:28 hat Fernando Casas Schössow geschrieben:
Hi John, Thanks for looking into this. I can quickly repro the problem with 
qemu 4.0 binary with debugging symbols enabled as I have it available already. 
Doing the same with qemu 4.1 or development head may be too much hassle but if 
it's really the only way I can give it try.
We had a lot of iothread related fixes in 4.1, so this would really be the only 
way to tell if it's a bug that still exists. I suspect that it's already fixed 
(and to be more precise, I assume that commit d0ee0204f fixed it). Kevin




Re: [RFC 2/3] block/file-posix: Detect XFS with CONFIG_FALLOCATE

2019-10-25 Thread Max Reitz
On 25.10.19 12:19, Kevin Wolf wrote:
> Am 25.10.2019 um 11:58 hat Max Reitz geschrieben:
>> We will need this for the next patch.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/file-posix.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 695fcf740d..5cd54c8bff 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -149,7 +149,7 @@ typedef struct BDRVRawState {
>>  int perm_change_flags;
>>  BDRVReopenState *reopen_state;
>>  
>> -#ifdef CONFIG_XFS
>> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
>>  bool is_xfs:1;
>>  #endif
>>  bool has_discard:1;
>> @@ -667,7 +667,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
>> *options,
>>  }
>>  #endif
>>  
>> -#ifdef CONFIG_XFS
>> +#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
>>  if (platform_test_xfs_fd(s->fd)) {
> 
> platform_test_xfs_fd() is defined in a header file from xfsprogs. I
> don't think you can call that without CONFIG_XFS, it would break the
> build.

Aw.

OK.  Should we then just assume is_xfs to be true (for the next patch)
with !defined(CONFIG_XFS) && defined(CONFIG_FALLOCATE)?

Max



signature.asc
Description: OpenPGP digital signature


[RFC 2/3] block/file-posix: Detect XFS with CONFIG_FALLOCATE

2019-10-25 Thread Max Reitz
We will need this for the next patch.

Signed-off-by: Max Reitz 
---
 block/file-posix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 695fcf740d..5cd54c8bff 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -149,7 +149,7 @@ typedef struct BDRVRawState {
 int perm_change_flags;
 BDRVReopenState *reopen_state;
 
-#ifdef CONFIG_XFS
+#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
 bool is_xfs:1;
 #endif
 bool has_discard:1;
@@ -667,7 +667,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 #endif
 
-#ifdef CONFIG_XFS
+#if defined(CONFIG_XFS) || defined(CONFIG_FALLOCATE)
 if (platform_test_xfs_fd(s->fd)) {
 s->is_xfs = true;
 }
-- 
2.21.0




[RFC 3/3] block/file-posix: Let post-EOF fallocate serialize

2019-10-25 Thread Max Reitz
The XFS kernel driver has a bug that may cause data corruption for qcow2
images as of qemu commit c8bb23cbdbe32f.  We can work around it by
treating post-EOF fallocates as serializing up until infinity (INT64_MAX
in practice).

Signed-off-by: Max Reitz 
---
 block/file-posix.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5cd54c8bff..1f5a01df70 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2713,6 +2713,48 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t 
offset, int bytes,
 RawPosixAIOData acb;
 ThreadPoolFunc *handler;
 
+#ifdef CONFIG_FALLOCATE
+if (s->is_xfs && s->use_linux_aio &&
+offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE)
+{
+BdrvTrackedRequest *req;
+uint64_t end;
+
+/*
+ * The Linux XFS driver has a bug where it will discard writes
+ * submitted through the AIO interface if they happen beyond a
+ * concurrently running fallocate() that increases the file
+ * length (i.e., both the write and the fallocate() happen
+ * beyond the EOF).
+ *
+ * To work around it, we look for the tracked request for this
+ * zero write, extend it until INT64_MAX (effectively
+ * infinity), and mark it as serializing.
+ *
+ * TODO: Detect whether this has been fixed in the XFS driver.
+ */
+
+QLIST_FOREACH(req, &bs->tracked_requests, list) {
+if (req->co == qemu_coroutine_self() &&
+req->type == BDRV_TRACKED_WRITE)
+{
+break;
+}
+}
+
+assert(req);
+assert(req->offset <= offset);
+assert(req->offset + req->bytes >= offset + bytes);
+
+end = INT64_MAX & -(uint64_t)bs->bl.request_alignment;
+req->bytes = end - req->offset;
+req->overlap_bytes = req->bytes;
+
+bdrv_mark_request_serialising(req, bs->bl.request_alignment);
+bdrv_wait_serialising_requests(req);
+}
+#endif
+
 acb = (RawPosixAIOData) {
 .bs = bs,
 .aio_fildes = s->fd,
-- 
2.21.0




[RFC 0/3] block/file-posix: Work around XFS bug

2019-10-25 Thread Max Reitz
Hi,

It seems to me that there is a bug in Linux’s XFS kernel driver, as
I’ve explained here:

https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01429.html

In combination with our commit c8bb23cbdbe32f, this may lead to guest
data corruption when using qcow2 images on XFS with aio=native.

We can’t wait until the XFS kernel driver is fixed, we should work
around the problem ourselves.

This is an RFC for two reasons:
(1) I don’t know whether this is the right way to address the issue,
(2) Ideally, we should detect whether the XFS kernel driver is fixed and
if so stop applying the workaround.
I don’t know how we would go about this, so this series doesn’t do
it.  (Hence it’s an RFC.)
(3) Perhaps it’s a bit of a layering violation to let the file-posix
driver access and modify a BdrvTrackedRequest object.

As for how we can address the issue, I see three ways:
(1) The one presented in this series: On XFS with aio=native, we extend
tracked requests for post-EOF fallocate() calls (i.e., write-zero
operations) to reach until infinity (INT64_MAX in practice), mark
them serializing and wait for other conflicting requests.

Advantages:
+ Limits the impact to very specific cases
  (And that means it wouldn’t hurt too much to keep this workaround
  even when the XFS driver has been fixed)
+ Works around the bug where it happens, namely in file-posix

Disadvantages:
- A bit complex
- A bit of a layering violation (should file-posix have access to
  tracked requests?)

(2) Always skip qcow2’s handle_alloc_space() on XFS.  The XFS bug only
becomes visible due to that function: I don’t think qcow2 writes
zeroes in any other I/O path, and raw images are fixed in size so
post-EOF writes won’t happen.

Advantages:
+ Maybe simpler, depending on how difficult it is to handle the
  layering violation
+ Also fixes the performance problem of handle_alloc_space() being
  slow on ppc64+XFS.

Disadvantages:
- Huge layering violation because qcow2 would need to know whether
  the image is stored on XFS or not.
- We’d definitely want to skip this workaround when the XFS driver
  has been fixed, so we need some method to find out whether it has

(3) Drop handle_alloc_space(), i.e. revert c8bb23cbdbe32f.
To my knowledge I’m the only one who has provided any benchmarks for
this commit, and even then I was a bit skeptical because it performs
well in some cases and bad in others.  I concluded that it’s
probably worth it because the “some cases” are more likely to occur.

Now we have this problem of corruption here (granted due to a bug in
the XFS driver), and another report of massively degraded
performance on ppc64
(https://bugzilla.redhat.com/show_bug.cgi?id=1745823 – sorry, a
private BZ; I hate that :-/  The report is about 40 % worse
performance for an in-guest fio write benchmark.)

So I have to ask the question about what the justification for
keeping c8bb23cbdbe32f is.  How much does performance increase with
it actually?  (On non-(ppc64+XFS) machines, obviously)

Advantages:
+ Trivial
+ No layering violations
+ We wouldn’t need to keep track of whether the kernel bug has been
  fixed or not
+ Fixes the ppc64+XFS performance problem

Disadvantages:
- Reverts cluster allocation performance to pre-c8bb23cbdbe32f
  levels, whatever that means

So this is the main reason this is an RFC: What should we do?  Is (1)
really the best choice?


In any case, I’ve ran the test case I showed in
https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html
more than ten times with this series applied and the installation
succeeded every time.  (Without this series, it fails like every other
time.)


Max Reitz (3):
  block: Make wait/mark serialising requests public
  block/file-posix: Detect XFS with CONFIG_FALLOCATE
  block/file-posix: Let post-EOF fallocate serialize

 include/block/block_int.h |  3 +++
 block/file-posix.c| 46 +--
 block/io.c| 24 ++--
 3 files changed, 59 insertions(+), 14 deletions(-)

-- 
2.21.0




[RFC 1/3] block: Make wait/mark serialising requests public

2019-10-25 Thread Max Reitz
Make both bdrv_mark_request_serialising() and
bdrv_wait_serialising_requests() public so they can be used from block
drivers.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |  3 +++
 block/io.c| 24 
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ca4ccac4c1..c85733293d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -984,6 +984,9 @@ extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState 
*old_parent);
 
+bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self);
+void bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align);
+
 int get_tmp_filename(char *filename, int size);
 BlockDriver *bdrv_probe_all(const uint8_t *buf, int buf_size,
 const char *filename);
diff --git a/block/io.c b/block/io.c
index f0b86c1d19..a65cc7fb61 100644
--- a/block/io.c
+++ b/block/io.c
@@ -715,7 +715,7 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 qemu_co_mutex_unlock(&bs->reqs_lock);
 }
 
-static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
+void bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 {
 int64_t overlap_offset = req->offset & ~(align - 1);
 uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
@@ -805,7 +805,7 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
 bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self)
 {
 BlockDriverState *bs = self->bs;
 BdrvTrackedRequest *req;
@@ -1437,14 +1437,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
  * with each other for the same cluster.  For example, in copy-on-read
  * it ensures that the CoR read and write operations are atomic and
  * guest writes cannot interleave between them. */
-mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
 }
 
 /* BDRV_REQ_SERIALISING is only for write operation */
 assert(!(flags & BDRV_REQ_SERIALISING));
 
 if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-wait_serialising_requests(req);
+bdrv_wait_serialising_requests(req);
 }
 
 if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1841,10 +1841,10 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t 
offset, uint64_t bytes,
 assert(!(flags & ~BDRV_REQ_MASK));
 
 if (flags & BDRV_REQ_SERIALISING) {
-mark_request_serialising(req, bdrv_get_cluster_size(bs));
+bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
 }
 
-waited = wait_serialising_requests(req);
+waited = bdrv_wait_serialising_requests(req);
 
 assert(!waited || !req->serialising ||
is_request_serialising_and_aligned(req));
@@ -2008,8 +2008,8 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild 
*child,
 
 padding = bdrv_init_padding(bs, offset, bytes, &pad);
 if (padding) {
-mark_request_serialising(req, align);
-wait_serialising_requests(req);
+bdrv_mark_request_serialising(req, align);
+bdrv_wait_serialising_requests(req);
 
 bdrv_padding_rmw_read(child, req, &pad, true);
 
@@ -2111,8 +2111,8 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
 }
 
 if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
-mark_request_serialising(&req, align);
-wait_serialising_requests(&req);
+bdrv_mark_request_serialising(&req, align);
+bdrv_wait_serialising_requests(&req);
 bdrv_padding_rmw_read(child, &req, &pad, false);
 }
 
@@ -3205,7 +3205,7 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 /* BDRV_REQ_SERIALISING is only for write operation */
 assert(!(read_flags & BDRV_REQ_SERIALISING));
 if (!(read_flags & BDRV_REQ_NO_SERIALISING)) {
-wait_serialising_requests(&req);
+bdrv_wait_serialising_requests(&req);
 }
 
 ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
@@ -3332,7 +3332,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset,
  * new area, we need to make sure that no write requests are made to it
  * concurrently or they might be overwritten by preallocation. */
 if (new_bytes) {
-mark_request_serialising(&req, 1);
+bdrv_mark_request_serialising(&req, 1);
 }
 if (bs->read_only) {
 error_setg(errp, "Image is read-only");
-- 
2.21.0




[bugfix ping] Re: [PATCH v2 0/2] fix qcow2_can_store_new_dirty_bitmap

2019-10-25 Thread Vladimir Sementsov-Ogievskiy
Hi!

Don't we forget it?

Here is a bug-fix, I think we want it for 4.2.

14.10.2019 14:51, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a fix for persistent bitmaps managing: we must check existent
> but not yet stored bitmaps for qcow2-related constraints, like maximum
> number of bitmaps in qcow2 image.
> 
> v2:
> 
> 01: change assertion to error-return at function start
>  Be free to add
>  Reported-by: aihua liang 
>  Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1712636
>  if it's appropriate
> 02: new test
>  Ohh, it takes about 4 minutes. Be free to drop it, as I doubt that
>  it worth to have. The case is simple, we may live without a
>  test.
> 
> Vladimir Sementsov-Ogievskiy (2):
>qcow2-bitmaps: fix qcow2_can_store_new_dirty_bitmap
>iotests: add 269 to check maximum of bitmaps in qcow2
> 
>   block/qcow2-bitmap.c   | 41 +++--
>   tests/qemu-iotests/269 | 47 ++
>   tests/qemu-iotests/269.out |  3 +++
>   tests/qemu-iotests/group   |  1 +
>   4 files changed, 69 insertions(+), 23 deletions(-)
>   create mode 100755 tests/qemu-iotests/269
>   create mode 100644 tests/qemu-iotests/269.out
> 


-- 
Best regards,
Vladimir


Re: [PATCH 3/4] hw/scsi: add SCSI COMPARE_AND_WRITE support

2019-10-25 Thread Yaowei Bai
On Fri, Oct 25, 2019 at 05:36:01PM +0800, Yaowei Bai wrote:
>  
> diff --git a/include/tcmu/tcmu.h b/include/tcmu/tcmu.h
> new file mode 100644
> index 000..656a545
> --- /dev/null
> +++ b/include/tcmu/tcmu.h
> @@ -0,0 +1,14 @@
> +#ifndef QEMU_TCMU_H
> +#define QEMU_TCMU_H
> +
> +#include "qemu-common.h"
> +
> +typedef struct TCMUExport TCMUExport;
> +extern QemuOptsList qemu_tcmu_export_opts;
> +
> +void qemu_tcmu_stop(void);
> +void qemu_tcmu_start(const char *subtype, Error **errp);
> +TCMUExport *tcmu_export_new(BlockBackend *blk, bool writable, Error **errp);
> +int export_init_func(void *opaque, QemuOpts *all_opts, Error **errp);
> +
> +#endif

Please ignore this odd part.

> -- 
> 1.8.3.1
> 





[PATCH 3/4] hw/scsi: add SCSI COMPARE_AND_WRITE support

2019-10-25 Thread Yaowei Bai
This patch emulates COMPARE_AND_WRITE command with the
BDRV_REQ_COMPARE_AND_WRITE flag introduced by last patch. It matches
the SBC-4 standard except the FUA bit support, it'll be finished in
the next patch.

Note that cmd->xfer is set 2 * the number got by scsi_data_cdb_xfer
so we could touch the least code.

Signed-off-by: Yaowei Bai 
---
 hw/scsi/emulation.c |  1 +
 hw/scsi/scsi-bus.c  |  4 +++
 hw/scsi/scsi-disk.c | 88 +
 hw/scsi/trace-events|  1 +
 include/hw/scsi/emulation.h |  3 ++
 include/scsi/utils.h|  2 ++
 include/tcmu/tcmu.h | 14 
 scsi/utils.c|  5 +++
 8 files changed, 118 insertions(+)
 create mode 100644 include/tcmu/tcmu.h

diff --git a/hw/scsi/emulation.c b/hw/scsi/emulation.c
index 06d62f3..1f53c4a 100644
--- a/hw/scsi/emulation.c
+++ b/hw/scsi/emulation.c
@@ -9,6 +9,7 @@ int scsi_emulate_block_limits(uint8_t *outbuf, const 
SCSIBlockLimits *bl)
 memset(outbuf, 0, 0x3c);
 
 outbuf[0] = bl->wsnz; /* wsnz */
+outbuf[1] = MAX_COMPARE_AND_WRITE_LENGTH;
 
 if (bl->max_io_sectors) {
 /* optimal transfer length granularity.  This field and the optimal
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index bccb7cc..15aab8a 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -987,6 +987,9 @@ static int scsi_req_xfer(SCSICommand *cmd, SCSIDevice *dev, 
uint8_t *buf)
 case WRITE_VERIFY_16:
 cmd->xfer *= dev->blocksize;
 break;
+case COMPARE_AND_WRITE:
+cmd->xfer *= 2 * dev->blocksize;
+break;
 case READ_6:
 case READ_REVERSE:
 /* length 0 means 256 blocks */
@@ -1206,6 +1209,7 @@ static void scsi_cmd_xfer_mode(SCSICommand *cmd)
 case PERSISTENT_RESERVE_OUT:
 case MAINTENANCE_OUT:
 case SET_WINDOW:
+case COMPARE_AND_WRITE:
 case SCAN:
 /* SCAN conflicts with START_STOP.  START_STOP has cmd->xfer set to 0 
for
  * non-scanner devices, so we only get here for SCAN and not for 
START_STOP.
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 68b1675..4bff862 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -477,6 +477,9 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int error, 
bool acct_failed)
 case ENOSPC:
 scsi_check_condition(r, SENSE_CODE(SPACE_ALLOC_FAILED));
 break;
+case EILSEQ:
+scsi_check_condition(r, SENSE_CODE(MISCOMPARE_DURING_VERIFY));
+break;
 default:
 scsi_check_condition(r, SENSE_CODE(IO_ERROR));
 break;
@@ -1824,6 +1827,84 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, 
uint8_t *inbuf)
scsi_write_same_complete, data);
 }
 
+typedef struct CompareAndWriteCBData {
+SCSIDiskReq *r;
+int64_t sector;
+int nb_sectors;
+QEMUIOVector qiov;
+struct iovec iov;
+} CompareAndWriteCBData;
+
+static void scsi_compare_and_write_complete(void *opaque, int ret)
+{
+CompareAndWriteCBData *data = opaque;
+SCSIDiskReq *r = data->r;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
+if (scsi_disk_req_check_error(r, ret, true)) {
+goto done;
+}
+
+block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
+scsi_req_complete(&r->req, GOOD);
+
+done:
+scsi_req_unref(&r->req);
+qemu_vfree(data->iov.iov_base);
+g_free(data);
+aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
+}
+
+static void scsi_disk_emulate_compare_and_write(SCSIDiskReq *r, uint8_t *inbuf)
+{
+SCSIRequest *req = &r->req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+uint32_t nb_sectors = scsi_data_cdb_xfer(r->req.cmd.buf);
+CompareAndWriteCBData *data;
+uint8_t *buf;
+int i;
+
+if (nb_sectors > MAX_COMPARE_AND_WRITE_LENGTH) {
+scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
+return;
+}
+
+if (blk_is_read_only(s->qdev.conf.blk)) {
+scsi_check_condition(r, SENSE_CODE(WRITE_PROTECTED));
+return;
+}
+
+if (r->req.cmd.lba > s->qdev.max_lba ||
+!check_lba_range(s, r->req.cmd.lba, nb_sectors)) {
+scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
+return;
+}
+
+data = g_new0(CompareAndWriteCBData, 1);
+data->r = r;
+data->sector = r->req.cmd.lba * (s->qdev.blocksize / 512);
+data->nb_sectors = r->req.cmd.xfer * (s->qdev.blocksize / 512);
+data->iov.iov_len = data->nb_sectors;
+data->iov.iov_base = buf = blk_blockalign(s->qdev.conf.blk,
+  data->iov.iov_len);
+qemu_iovec_init_external(&data->qiov, &data->iov, 1);
+
+for (i = 0; i < data->iov.iov_len; i += s->qdev.blocksize) {
+memcpy(&buf[i], &inbuf[i]

[PATCH 0/4] SCSI COMPARE_AND_WRITE support

2019-10-25 Thread Yaowei Bai
Recently ceph/librbd added several interfaces to handle SCSI commands like
COMPARE_AND_WRITE directly. However they were only be used in special
scenarios, i.e. ISCSI. That involves more software components which makes
the IO path longer and could bring more potential issues. Actually we're
maintaining several ceph clusters with ISCSI protocal being used which,
i have to say, is not an easy job.

So supporting COMPARE_AND_WRITE in scsi-disk would be a reasonable solution
and easy to implement with the help of the SCSI handlers in librbd, which
could leave alone the ISCSI stuff and make the IO path shorter.

This patchset implements it by reusing the blk_aio_pwritev interface and
introducing a new BDRV_REQ_COMPARE_AND_WRITE element into BdrvRequestFlags
to indicate a COMPARE_AND_WRITE request, rather than adding a new interface
into block-backend.c.

The FUA support is implemented in the blk_aio_pwritev's callback function
similar to its emulation in scsi_write_do_fua function, maybe through
BDRV_REQ_FUA is another doable way.

This patchset is tested with the method of sg_compare_and_write.txt from
sg3_utils. Link below:

https://github.com/hreinecke/sg3_utils/blob/master/examples/sg_compare_and_write.txt

Yaowei Bai (4):
  block: add SCSI COMPARE_AND_WRITE support
  block/rbd: implement bdrv_aio_compare_and_write interface
  hw/scsi: add SCSI COMPARE_AND_WRITE support
  scsi-disk: add FUA support for COMPARE_AND_WRITE

 block/io.c  | 20 +
 block/raw-format.c  |  3 +-
 block/rbd.c | 41 ++-
 hw/scsi/emulation.c |  1 +
 hw/scsi/scsi-bus.c  |  4 ++
 hw/scsi/scsi-disk.c | 98 +
 hw/scsi/trace-events|  1 +
 include/block/block.h   |  5 ++-
 include/block/block_int.h   |  3 ++
 include/hw/scsi/emulation.h |  3 ++
 include/scsi/utils.h|  2 +
 include/tcmu/tcmu.h | 14 +++
 scsi/utils.c|  5 +++
 13 files changed, 195 insertions(+), 5 deletions(-)
 create mode 100644 include/tcmu/tcmu.h

-- 
1.8.3.1






[PATCH 4/4] scsi-disk: add FUA support for COMPARE_AND_WRITE

2019-10-25 Thread Yaowei Bai
It is implemented in the blk_aio_pwritev's callback function in a way
similar to its emulation in scsi_write_do_fua function

Signed-off-by: Yaowei Bai 
---
 hw/scsi/scsi-disk.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 4bff862..ef9c257 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -228,6 +228,7 @@ static bool scsi_is_cmd_fua(SCSICommand *cmd)
 case WRITE_10:
 case WRITE_12:
 case WRITE_16:
+case COMPARE_AND_WRITE:
 return (cmd->buf[1] & 8) != 0;
 
 case VERIFY_10:
@@ -1849,10 +1850,17 @@ static void scsi_compare_and_write_complete(void 
*opaque, int ret)
 }
 
 block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
+if (r->need_fua_emulation) {
+block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
+ BLOCK_ACCT_FLUSH);
+r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_aio_complete, r);
+goto free;
+}
 scsi_req_complete(&r->req, GOOD);
 
 done:
 scsi_req_unref(&r->req);
+free:
 qemu_vfree(data->iov.iov_base);
 g_free(data);
 aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
@@ -1953,6 +1961,7 @@ static int32_t scsi_disk_emulate_command(SCSIRequest 
*req, uint8_t *buf)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
 uint64_t nb_sectors;
 uint8_t *outbuf;
 int buflen;
@@ -2208,6 +2217,7 @@ static int32_t scsi_disk_emulate_command(SCSIRequest 
*req, uint8_t *buf)
 return 0;
 }
 assert(!r->req.aiocb);
+r->need_fua_emulation = sdc->need_fua_emulation(&r->req.cmd);
 r->iov.iov_len = MIN(r->buflen, req->cmd.xfer);
 if (r->iov.iov_len == 0) {
 scsi_req_complete(&r->req, GOOD);
-- 
1.8.3.1






[PATCH 2/4] block/rbd: implement bdrv_aio_compare_and_write interface

2019-10-25 Thread Yaowei Bai
This patch adds librbd's SCSI COMPARE_AND_WRITE command interface
support with bdrv_aio_compare_and_write function pointer. Note
currently when a miscompare happens a mismatch offset of 0 is
always reported rather than the actual mismatch offset. This
should not be a big issue contemporarily and will be fixed
accordingly in the future.

Signed-off-by: Yaowei Bai 
---
 block/raw-format.c |  3 ++-
 block/rbd.c| 41 +++--
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 42c28cc..3d8f201 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -438,7 +438,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 
 bs->sg = bs->file->bs->sg;
 bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
-(BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+(BDRV_REQ_FUA & bs->file->bs->supported_write_flags) |
+(BDRV_REQ_COMPARE_AND_WRITE & bs->file->bs->supported_write_flags);
 bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
diff --git a/block/rbd.c b/block/rbd.c
index c71e45d..7065343 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,11 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif
 
+#ifdef LIBRBD_SUPPORTS_COMPARE_AND_WRITE
+#define LIBRBD_HAVE_COMPARE_AND_WRITE 1
+#else
+#define LIBRBD_HAVE_COMPARE_AND_WRITE 0
+#endif
+
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
 RBD_AIO_DISCARD,
-RBD_AIO_FLUSH
+RBD_AIO_FLUSH,
+RBD_AIO_COMPARE_AND_WRITE
 } RBDAIOCmd;
 
 typedef struct RBDAIOCB {
@@ -798,6 +805,9 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
+if (LIBRBD_HAVE_COMPARE_AND_WRITE)
+bs->supported_write_flags = BDRV_REQ_COMPARE_AND_WRITE;
+
 r = 0;
 goto out;
 
@@ -933,7 +943,15 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 rcb = g_new(RADOSCB, 1);
 
-if (!LIBRBD_USE_IOVEC) {
+if (cmd == RBD_AIO_COMPARE_AND_WRITE) {
+acb->bounce = qemu_try_blockalign(bs, qiov->size);
+if (acb->bounce == NULL) {
+goto failed;
+}
+
+qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
+rcb->buf = acb->bounce;
+} else if (!LIBRBD_USE_IOVEC) {
 if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
 acb->bounce = NULL;
 } else {
@@ -993,6 +1011,9 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 case RBD_AIO_FLUSH:
 r = rbd_aio_flush_wrapper(s->image, c);
 break;
+case RBD_AIO_COMPARE_AND_WRITE:
+r = rbd_aio_compare_and_write(s->image, off, size/2, rcb->buf, 
(rcb->buf + size/2), c, 0, 0);
+break;
 default:
 r = -EINVAL;
 }
@@ -1056,6 +1077,18 @@ static int qemu_rbd_co_flush(BlockDriverState *bs)
 }
 #endif
 
+#ifdef LIBRBD_SUPPORTS_COMPARE_AND_WRITE
+static BlockAIOCB *qemu_rbd_aio_compare_and_write(BlockDriverState *bs,
+  uint64_t offset, uint64_t 
bytes,
+  QEMUIOVector *qiov, int 
flags,
+  BlockCompletionFunc *cb,
+  void *opaque)
+{
+return rbd_start_aio(bs, offset, qiov, bytes, cb, opaque,
+ RBD_AIO_COMPARE_AND_WRITE);
+}
+#endif
+
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVRBDState *s = bs->opaque;
@@ -1309,6 +1342,10 @@ static BlockDriver bdrv_rbd = {
 .bdrv_aio_pdiscard  = qemu_rbd_aio_pdiscard,
 #endif
 
+#ifdef LIBRBD_SUPPORTS_COMPARE_AND_WRITE
+.bdrv_aio_compare_and_write = qemu_rbd_aio_compare_and_write,
+#endif
+
 .bdrv_snapshot_create   = qemu_rbd_snap_create,
 .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
 .bdrv_snapshot_list = qemu_rbd_snap_list,
-- 
1.8.3.1






[PATCH 1/4] block: add SCSI COMPARE_AND_WRITE support

2019-10-25 Thread Yaowei Bai
Some storages(i.e. librbd) already have interfaces to handle some SCSI
commands directly. This patch adds COMPARE_AND_WRITE command support
through the write path in the block io layer by introducing a new element
BDRV_REQ_COMPARE_AND_WRITE into BdrvRequestFlags which indicates a
COMPARE_AND_WRITE request. In this way we could easily extend to other
SCSI commands support like WRITE_SAME in the future.

Signed-off-by: Yaowei Bai 
---
 block/io.c| 20 
 include/block/block.h |  5 +++--
 include/block/block_int.h |  3 +++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index f0b86c1..667b5ea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1168,6 +1168,26 @@ static int coroutine_fn 
bdrv_driver_pwritev(BlockDriverState *bs,
 goto emulate_flags;
 }
 
+if (drv->bdrv_aio_compare_and_write &&
+  (flags & BDRV_REQ_COMPARE_AND_WRITE)) {
+BlockAIOCB *acb;
+CoroutineIOCompletion co = {
+.coroutine = qemu_coroutine_self(),
+};
+
+acb = drv->bdrv_aio_compare_and_write(bs, offset, bytes, qiov,
+flags & bs->supported_write_flags,
+bdrv_co_io_em_complete, &co);
+flags &= ~bs->supported_write_flags;
+if (acb == NULL) {
+ret = -EIO;
+} else {
+qemu_coroutine_yield();
+ret = co.ret;
+}
+goto emulate_flags;
+}
+
 if (drv->bdrv_aio_pwritev) {
 BlockAIOCB *acb;
 CoroutineIOCompletion co = {
diff --git a/include/block/block.h b/include/block/block.h
index 89606bd..7159e03 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -92,9 +92,10 @@ typedef enum {
  * on read request and means that caller doesn't really need data to be
  * written to qiov parameter which may be NULL.
  */
-BDRV_REQ_PREFETCH  = 0x200,
+BDRV_REQ_PREFETCH   = 0x200,
+BDRV_REQ_COMPARE_AND_WRITE  = 0x400,
 /* Mask of valid flags */
-BDRV_REQ_MASK   = 0x3ff,
+BDRV_REQ_MASK   = 0x7ff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ca4ccac..a039b68 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -189,6 +189,9 @@ struct BlockDriver {
 BlockAIOCB *(*bdrv_aio_pdiscard)(BlockDriverState *bs,
 int64_t offset, int bytes,
 BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *(*bdrv_aio_compare_and_write)(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags,
+BlockCompletionFunc *cb, void *opaque);
 
 int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
-- 
1.8.3.1






[PULL 19/19] hw/rtc/aspeed_rtc: Remove unused includes

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The system  include is already provided by "osdep.h"
(the scripts/clean-includes file clean such headers).

Commit 64552b6be47 suggests we don't need to include "hw/irq.h":

Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to
qemu/typedefs.h, and then include hw/irq.h only where it's still
needed.

Reviewed-by: Cédric Le Goater 
Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-15-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 include/hw/rtc/aspeed_rtc.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/hw/rtc/aspeed_rtc.h b/include/hw/rtc/aspeed_rtc.h
index 3fde854ad99c..b94a7102688a 100644
--- a/include/hw/rtc/aspeed_rtc.h
+++ b/include/hw/rtc/aspeed_rtc.h
@@ -8,9 +8,6 @@
 #ifndef HW_RTC_ASPEED_RTC_H
 #define HW_RTC_ASPEED_RTC_H
 
-#include 
-
-#include "hw/irq.h"
 #include "hw/sysbus.h"
 
 typedef struct AspeedRtcState {
-- 
2.21.0




[PULL 16/19] hw: Move Aspeed RTC from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

Move RTC devices under the hw/rtc/ subdirectory.

Reviewed-by: Cédric Le Goater 
Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-12-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/Makefile.objs   | 1 +
 hw/{timer => rtc}/aspeed_rtc.c | 2 +-
 hw/rtc/trace-events| 4 
 hw/timer/Makefile.objs | 2 +-
 hw/timer/trace-events  | 4 
 include/hw/arm/aspeed_soc.h| 2 +-
 include/hw/{timer => rtc}/aspeed_rtc.h | 6 +++---
 7 files changed, 11 insertions(+), 10 deletions(-)
 rename hw/{timer => rtc}/aspeed_rtc.c (99%)
 rename include/hw/{timer => rtc}/aspeed_rtc.h (84%)

diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 3d4763fc269e..8dc9fcd3a983 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -10,3 +10,4 @@ common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
 common-obj-$(CONFIG_EXYNOS4) += exynos4210_rtc.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
+common-obj-$(CONFIG_ASPEED_SOC) += aspeed_rtc.o
diff --git a/hw/timer/aspeed_rtc.c b/hw/rtc/aspeed_rtc.c
similarity index 99%
rename from hw/timer/aspeed_rtc.c
rename to hw/rtc/aspeed_rtc.c
index 531301735334..3ca1183558b7 100644
--- a/hw/timer/aspeed_rtc.c
+++ b/hw/rtc/aspeed_rtc.c
@@ -8,7 +8,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "hw/timer/aspeed_rtc.h"
+#include "hw/rtc/aspeed_rtc.h"
 #include "migration/vmstate.h"
 #include "qemu/log.h"
 #include "qemu/timer.h"
diff --git a/hw/rtc/trace-events b/hw/rtc/trace-events
index 7f1945ad4cc6..d6749f4616a0 100644
--- a/hw/rtc/trace-events
+++ b/hw/rtc/trace-events
@@ -13,3 +13,7 @@ pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 
0x%08x"
 pl031_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
 pl031_alarm_raised(void) "alarm raised"
 pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
+
+# aspeed-rtc.c
+aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
+aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 33191d74cb98..83091770df3a 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -29,7 +29,7 @@ common-obj-$(CONFIG_MIPS_CPS) += mips_gictimer.o
 common-obj-$(CONFIG_ALLWINNER_A10_PIT) += allwinner-a10-pit.o
 
 common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
-common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o aspeed_rtc.o
+common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o
 
 common-obj-$(CONFIG_CMSDK_APB_TIMER) += cmsdk-apb-timer.o
 common-obj-$(CONFIG_CMSDK_APB_DUALTIMER) += cmsdk-apb-dualtimer.o
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index 1459d07237b9..e18b87fc96dd 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -66,10 +66,6 @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, 
unsigned size) "CMSDK A
 cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) 
"CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 
-# hw/timer/aspeed-rtc.c
-aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
-aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
-
 # nrf51_timer.c
 nrf51_timer_read(uint64_t addr, uint32_t value, unsigned size) "read addr 0x%" 
PRIx64 " data 0x%" PRIx32 " size %u"
 nrf51_timer_write(uint64_t addr, uint32_t value, unsigned size) "write addr 
0x%" PRIx64 " data 0x%" PRIx32 " size %u"
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index cccb684a19bb..495c08be1b84 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -18,7 +18,7 @@
 #include "hw/misc/aspeed_sdmc.h"
 #include "hw/misc/aspeed_xdma.h"
 #include "hw/timer/aspeed_timer.h"
-#include "hw/timer/aspeed_rtc.h"
+#include "hw/rtc/aspeed_rtc.h"
 #include "hw/i2c/aspeed_i2c.h"
 #include "hw/ssi/aspeed_smc.h"
 #include "hw/watchdog/wdt_aspeed.h"
diff --git a/include/hw/timer/aspeed_rtc.h b/include/hw/rtc/aspeed_rtc.h
similarity index 84%
rename from include/hw/timer/aspeed_rtc.h
rename to include/hw/rtc/aspeed_rtc.h
index 15ba42912b7f..3fde854ad99c 100644
--- a/include/hw/timer/aspeed_rtc.h
+++ b/include/hw/rtc/aspeed_rtc.h
@@ -5,8 +5,8 @@
  * Copyright 2019 IBM Corp
  * SPDX-License-Identifier: GPL-2.0-or-later
  */
-#ifndef ASPEED_RTC_H
-#define ASPEED_RTC_H
+#ifndef HW_RTC_ASPEED_RTC_H
+#define HW_RTC_ASPEED_RTC_H
 
 #include 
 
@@ -27,4 +27,4 @@ typedef struct AspeedRtcState {
 #define TYPE_ASPEED_RTC "aspeed.rtc"
 #define ASPEED_RTC(obj) OBJECT_CHECK(AspeedRtcState, (obj), TYPE_ASPEED_RTC)
 
-#endif /* ASPEED_RTC_H */
+#endif /* HW_RTC_ASPEED_RTC_H */
-- 
2.21.0




[PATCH v7 8/8] virtio: add property to enable packed virtqueue

2019-10-25 Thread Eugenio Pérez
From: Jason Wang 

Signed-off-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
Reviewed-by: Jens Freimann 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index d123d5b181..40ddeafadb 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -285,7 +285,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("any_layout", _state, _field, \
   VIRTIO_F_ANY_LAYOUT, true), \
 DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
-  VIRTIO_F_IOMMU_PLATFORM, false)
+  VIRTIO_F_IOMMU_PLATFORM, false), \
+DEFINE_PROP_BIT64("packed", _state, _field, \
+  VIRTIO_F_RING_PACKED, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled(VirtIODevice *vdev, int n);
-- 
2.16.5




[PATCH v7 7/8] vhost_net: enable packed ring support

2019-10-25 Thread Eugenio Pérez
From: Jason Wang 

Signed-off-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
Reviewed-by: Jens Freimann 
---
 hw/net/vhost_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e975700f95..6b82803fa7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -49,6 +49,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_VERSION_1,
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_PACKED,
 VHOST_INVALID_FEATURE_BIT
 };
 
@@ -74,6 +75,7 @@ static const int user_feature_bits[] = {
 VIRTIO_NET_F_MRG_RXBUF,
 VIRTIO_NET_F_MTU,
 VIRTIO_F_IOMMU_PLATFORM,
+VIRTIO_F_RING_PACKED,
 
 /* This bit implies RARP isn't sent by QEMU out of band */
 VIRTIO_NET_F_GUEST_ANNOUNCE,
-- 
2.16.5




[PULL 14/19] hw: Move Xilinx ZynqMP RTC from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

Move RTC devices under the hw/rtc/ subdirectory.

Remove Alistair outdated email address (see commit c22e580c2ad).

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-10-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/Makefile.objs| 1 +
 hw/rtc/trace-events | 3 +++
 hw/{timer => rtc}/xlnx-zynqmp-rtc.c | 2 +-
 hw/timer/Makefile.objs  | 1 -
 hw/timer/trace-events   | 3 ---
 include/hw/arm/xlnx-zynqmp.h| 2 +-
 include/hw/{timer => rtc}/xlnx-zynqmp-rtc.h | 6 +++---
 7 files changed, 9 insertions(+), 9 deletions(-)
 rename hw/{timer => rtc}/xlnx-zynqmp-rtc.c (99%)
 rename include/hw/{timer => rtc}/xlnx-zynqmp-rtc.h (95%)

diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index b195863291d1..543a550a0f11 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -6,5 +6,6 @@ common-obj-$(CONFIG_M48T59) += m48t59-isa.o
 endif
 common-obj-$(CONFIG_PL031) += pl031.o
 common-obj-$(CONFIG_TWL92230) += twl92230.o
+common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
diff --git a/hw/rtc/trace-events b/hw/rtc/trace-events
index ac9e0e0fba32..7f1945ad4cc6 100644
--- a/hw/rtc/trace-events
+++ b/hw/rtc/trace-events
@@ -4,6 +4,9 @@
 sun4v_rtc_read(uint64_t addr, uint64_t value) "read: addr 0x%" PRIx64 " value 
0x%" PRIx64
 sun4v_rtc_write(uint64_t addr, uint64_t value) "write: addr 0x%" PRIx64 " 
value 0x%" PRIx64
 
+# xlnx-zynqmp-rtc.c
+xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int 
sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
+
 # pl031.c
 pl031_irq_state(int level) "irq state %d"
 pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
diff --git a/hw/timer/xlnx-zynqmp-rtc.c b/hw/rtc/xlnx-zynqmp-rtc.c
similarity index 99%
rename from hw/timer/xlnx-zynqmp-rtc.c
rename to hw/rtc/xlnx-zynqmp-rtc.c
index 5692db98c2db..f9f09b72965a 100644
--- a/hw/timer/xlnx-zynqmp-rtc.c
+++ b/hw/rtc/xlnx-zynqmp-rtc.c
@@ -36,7 +36,7 @@
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
-#include "hw/timer/xlnx-zynqmp-rtc.h"
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
 #include "migration/vmstate.h"
 
 #ifndef XLNX_ZYNQMP_RTC_ERR_DEBUG
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 70b61b69c7a4..294465ef47ad 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -14,7 +14,6 @@ common-obj-$(CONFIG_IMX) += imx_epit.o
 common-obj-$(CONFIG_IMX) += imx_gpt.o
 common-obj-$(CONFIG_LM32) += lm32_timer.o
 common-obj-$(CONFIG_MILKYMIST) += milkymist-sysctl.o
-common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
 common-obj-$(CONFIG_NRF51_SOC) += nrf51_timer.o
 
 common-obj-$(CONFIG_ALTERA_TIMER) += altera_timer.o
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index ce34b967db9f..1459d07237b9 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -70,9 +70,6 @@ cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
 aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
 
-# xlnx-zynqmp-rtc.c
-xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int 
sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
-
 # nrf51_timer.c
 nrf51_timer_read(uint64_t addr, uint32_t value, unsigned size) "read addr 0x%" 
PRIx64 " data 0x%" PRIx32 " size %u"
 nrf51_timer_write(uint64_t addr, uint32_t value, unsigned size) "write addr 
0x%" PRIx64 " data 0x%" PRIx32 " size %u"
diff --git a/include/hw/arm/xlnx-zynqmp.h b/include/hw/arm/xlnx-zynqmp.h
index d7483c3b4285..53076fa29a75 100644
--- a/include/hw/arm/xlnx-zynqmp.h
+++ b/include/hw/arm/xlnx-zynqmp.h
@@ -29,7 +29,7 @@
 #include "hw/dma/xlnx-zdma.h"
 #include "hw/display/xlnx_dp.h"
 #include "hw/intc/xlnx-zynqmp-ipi.h"
-#include "hw/timer/xlnx-zynqmp-rtc.h"
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
 #include "hw/cpu/cluster.h"
 #include "target/arm/cpu.h"
 
diff --git a/include/hw/timer/xlnx-zynqmp-rtc.h 
b/include/hw/rtc/xlnx-zynqmp-rtc.h
similarity index 95%
rename from include/hw/timer/xlnx-zynqmp-rtc.h
rename to include/hw/rtc/xlnx-zynqmp-rtc.h
index 97e32322ed70..6fa1cb2f43f8 100644
--- a/include/hw/timer/xlnx-zynqmp-rtc.h
+++ b/include/hw/rtc/xlnx-zynqmp-rtc.h
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2017 Xilinx Inc.
  *
- * Written-by: Alistair Francis 
+ * Written-by: Alistair Francis
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
@@ -24,8 +24,8 @@
  * THE SOFTWARE.
  */
 
-#ifndef HW_TIMER_XLNX_ZYNQMP_RTC_H
-#define HW_TIMER_XLNX_ZYNQMP_RTC_H
+#ifndef HW_RTC_XLNX_ZYNQMP_H
+#define HW_RTC_XLNX_ZYNQMP_H
 

[PULL 18/19] hw/rtc/xlnx-zynqmp-rtc: Remove unused "ptimer.h" include

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The "hw/ptimer.h" header is not used, remove it.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-14-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/xlnx-zynqmp-rtc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/rtc/xlnx-zynqmp-rtc.c b/hw/rtc/xlnx-zynqmp-rtc.c
index f9f09b72965a..2bcd14d7795d 100644
--- a/hw/rtc/xlnx-zynqmp-rtc.c
+++ b/hw/rtc/xlnx-zynqmp-rtc.c
@@ -32,7 +32,6 @@
 #include "qemu/log.h"
 #include "qemu/module.h"
 #include "hw/irq.h"
-#include "hw/ptimer.h"
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
-- 
2.21.0




[PULL 12/19] hw: Move TWL92230 device from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The TWL92230 is an "energy management device" companion with
a RTC. Since we mostly model the RTC, move it under the hw/rtc/
subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-8-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 MAINTAINERS  | 2 +-
 hw/rtc/Kconfig   | 4 
 hw/rtc/Makefile.objs | 1 +
 hw/{timer => rtc}/twl92230.c | 0
 hw/timer/Kconfig | 4 
 hw/timer/Makefile.objs   | 1 -
 6 files changed, 6 insertions(+), 6 deletions(-)
 rename hw/{timer => rtc}/twl92230.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index a7de5e2d1253..0dc72d37bdc8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -663,7 +663,7 @@ F: hw/display/blizzard.c
 F: hw/input/lm832x.c
 F: hw/input/tsc2005.c
 F: hw/misc/cbus.c
-F: hw/timer/twl92230.c
+F: hw/rtc/twl92230.c
 F: include/hw/display/blizzard.h
 F: include/hw/input/tsc2xxx.h
 F: include/hw/misc/cbus.h
diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index cc7fead764f4..dff9d60946af 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -8,6 +8,10 @@ config M48T59
 config PL031
 bool
 
+config TWL92230
+bool
+depends on I2C
+
 config MC146818RTC
 bool
 
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 4621b37bc2f6..810a38ee7b3d 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -4,5 +4,6 @@ ifeq ($(CONFIG_ISA_BUS),y)
 common-obj-$(CONFIG_M48T59) += m48t59-isa.o
 endif
 common-obj-$(CONFIG_PL031) += pl031.o
+common-obj-$(CONFIG_TWL92230) += twl92230.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
diff --git a/hw/timer/twl92230.c b/hw/rtc/twl92230.c
similarity index 100%
rename from hw/timer/twl92230.c
rename to hw/rtc/twl92230.c
diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index b04c928136c2..9357875f285d 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -20,10 +20,6 @@ config HPET
 config I8254
 bool
 
-config TWL92230
-bool
-depends on I2C
-
 config ALTERA_TIMER
 bool
 select PTIMER
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 034bd30255c0..23be70b71d32 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -7,7 +7,6 @@ common-obj-$(CONFIG_DS1338) += ds1338.o
 common-obj-$(CONFIG_HPET) += hpet.o
 common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 common-obj-$(CONFIG_PUV3) += puv3_ost.o
-common-obj-$(CONFIG_TWL92230) += twl92230.o
 common-obj-$(CONFIG_XILINX) += xilinx_timer.o
 common-obj-$(CONFIG_SLAVIO) += slavio_timer.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_timer.o
-- 
2.21.0




[PULL 11/19] hw: Move sun4v hypervisor RTC from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

Move RTC devices under the hw/rtc/ subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Artyom Tarasenko 
Message-Id: <20191003230404.19384-7-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 MAINTAINERS   |  4 ++--
 hw/rtc/Kconfig|  3 +++
 hw/rtc/Makefile.objs  |  1 +
 hw/{timer => rtc}/sun4v-rtc.c |  2 +-
 hw/rtc/trace-events   |  4 
 hw/sparc64/niagara.c  |  2 +-
 hw/timer/Kconfig  |  3 ---
 hw/timer/Makefile.objs|  1 -
 hw/timer/trace-events |  4 
 include/hw/rtc/sun4v-rtc.h| 19 +++
 include/hw/timer/sun4v-rtc.h  |  1 -
 11 files changed, 31 insertions(+), 13 deletions(-)
 rename hw/{timer => rtc}/sun4v-rtc.c (98%)
 create mode 100644 include/hw/rtc/sun4v-rtc.h
 delete mode 100644 include/hw/timer/sun4v-rtc.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ba0d1906aae9..a7de5e2d1253 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1163,8 +1163,8 @@ Sun4v
 M: Artyom Tarasenko 
 S: Maintained
 F: hw/sparc64/niagara.c
-F: hw/timer/sun4v-rtc.c
-F: include/hw/timer/sun4v-rtc.h
+F: hw/rtc/sun4v-rtc.c
+F: include/hw/rtc/sun4v-rtc.h
 
 Leon3
 M: Fabien Chouteau 
diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index 434b20b2b1bd..cc7fead764f4 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -10,3 +10,6 @@ config PL031
 
 config MC146818RTC
 bool
+
+config SUN4V_RTC
+bool
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 89e8e48c6409..4621b37bc2f6 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -5,3 +5,4 @@ common-obj-$(CONFIG_M48T59) += m48t59-isa.o
 endif
 common-obj-$(CONFIG_PL031) += pl031.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
+common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
diff --git a/hw/timer/sun4v-rtc.c b/hw/rtc/sun4v-rtc.c
similarity index 98%
rename from hw/timer/sun4v-rtc.c
rename to hw/rtc/sun4v-rtc.c
index 54272a822fdf..ada01b57748b 100644
--- a/hw/timer/sun4v-rtc.c
+++ b/hw/rtc/sun4v-rtc.c
@@ -13,7 +13,7 @@
 #include "hw/sysbus.h"
 #include "qemu/module.h"
 #include "qemu/timer.h"
-#include "hw/timer/sun4v-rtc.h"
+#include "hw/rtc/sun4v-rtc.h"
 #include "trace.h"
 
 
diff --git a/hw/rtc/trace-events b/hw/rtc/trace-events
index 54c94ac557b2..ac9e0e0fba32 100644
--- a/hw/rtc/trace-events
+++ b/hw/rtc/trace-events
@@ -1,5 +1,9 @@
 # See docs/devel/tracing.txt for syntax documentation.
 
+# sun4v-rtc.c
+sun4v_rtc_read(uint64_t addr, uint64_t value) "read: addr 0x%" PRIx64 " value 
0x%" PRIx64
+sun4v_rtc_write(uint64_t addr, uint64_t value) "write: addr 0x%" PRIx64 " 
value 0x%" PRIx64
+
 # pl031.c
 pl031_irq_state(int level) "irq state %d"
 pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c
index 5987693659ed..5eb2d097b904 100644
--- a/hw/sparc64/niagara.c
+++ b/hw/sparc64/niagara.c
@@ -30,7 +30,7 @@
 #include "hw/misc/unimp.h"
 #include "hw/loader.h"
 #include "hw/sparc/sparc64.h"
-#include "hw/timer/sun4v-rtc.h"
+#include "hw/rtc/sun4v-rtc.h"
 #include "exec/address-spaces.h"
 #include "sysemu/block-backend.h"
 #include "qemu/error-report.h"
diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index a6b668b2559b..b04c928136c2 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -35,9 +35,6 @@ config ALLWINNER_A10_PIT
 config STM32F2XX_TIMER
 bool
 
-config SUN4V_RTC
-bool
-
 config CMSDK_APB_TIMER
 bool
 select PTIMER
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 2fb12162a623..034bd30255c0 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -35,7 +35,6 @@ common-obj-$(CONFIG_ALLWINNER_A10_PIT) += allwinner-a10-pit.o
 common-obj-$(CONFIG_STM32F2XX_TIMER) += stm32f2xx_timer.o
 common-obj-$(CONFIG_ASPEED_SOC) += aspeed_timer.o aspeed_rtc.o
 
-common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
 common-obj-$(CONFIG_CMSDK_APB_TIMER) += cmsdk-apb-timer.o
 common-obj-$(CONFIG_CMSDK_APB_DUALTIMER) += cmsdk-apb-dualtimer.o
 common-obj-$(CONFIG_MSF2) += mss-timer.o
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index 6936fe8573e9..ce34b967db9f 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -70,10 +70,6 @@ cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
 aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 
0x%08" PRIx64
 
-# sun4v-rtc.c
-sun4v_rtc_read(uint64_t addr, uint64_t value) "read: addr 0x%" PRIx64 " value 
0x%" PRIx64
-sun4v_rtc_write(uint64_t addr, uint64_t value) "write: addr 0x%" PRIx64 " 
value 0x%" PRIx64
-
 # xlnx-zynqmp-rtc.c
 xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int 
sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
 
diff --git a/include/hw/rtc/sun4v-rtc.h b/include/hw/rtc/sun4v-rtc.h
new file mode 100644
index ..fd868f6ed2fa
--- /dev/null
+++

[PULL 17/19] hw/rtc/mc146818: Include mc146818rtc_regs.h a bit less

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

Only 2 source files require the "mc146818rtc_regs.h" header.
Instead of having it processed 12 times, by all objects
using "mc146818rtc.h", include it directly where used.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191003230404.19384-13-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/mc146818rtc.c | 1 +
 hw/timer/hpet.c  | 1 +
 include/hw/rtc/mc146818rtc.h | 1 -
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index ced15f764fc1..9d4ed54f65e2 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -35,6 +35,7 @@
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "hw/rtc/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc_regs.h"
 #include "migration/vmstate.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc-target.h"
diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 02bf8a8ce8fc..9f17aaa278e2 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -34,6 +34,7 @@
 #include "hw/timer/hpet.h"
 #include "hw/sysbus.h"
 #include "hw/rtc/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc_regs.h"
 #include "migration/vmstate.h"
 #include "hw/timer/i8254.h"
 
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 2e9331637a6d..7fa59d4279c5 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -10,7 +10,6 @@
 #define HW_RTC_MC146818RTC_H
 
 #include "hw/isa/isa.h"
-#include "hw/rtc/mc146818rtc_regs.h"
 
 #define TYPE_MC146818_RTC "mc146818rtc"
 
-- 
2.21.0




[PATCH v7 5/8] virtio: basic packed virtqueue support

2019-10-25 Thread Eugenio Pérez
From: Jason Wang 

This patch implements basic support for the packed virtqueue. Compare
the split virtqueue which has three rings, packed virtqueue only have
one which is supposed to have better cache utilization and more
hardware friendly.

Please refer virtio specification for more information.

Signed-off-by: Wei Xu 
Signed-off-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
---
 hw/block/virtio-blk.c   |   2 +-
 hw/char/virtio-serial-bus.c |   2 +-
 hw/scsi/virtio-scsi.c   |   3 +-
 hw/virtio/virtio.c  | 901 
 include/hw/virtio/virtio.h  |  10 +-
 5 files changed, 836 insertions(+), 82 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ba846fe9dc..7dbdeaaab9 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1052,7 +1052,7 @@ static void virtio_blk_save_device(VirtIODevice *vdev, 
QEMUFile *f)
 qemu_put_be32(f, virtio_get_queue_index(req->vq));
 }
 
-qemu_put_virtqueue_element(f, &req->elem);
+qemu_put_virtqueue_element(vdev, f, &req->elem);
 req = req->next;
 }
 qemu_put_sbyte(f, 0);
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 4e0ed829ae..33259042a9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -708,7 +708,7 @@ static void virtio_serial_save_device(VirtIODevice *vdev, 
QEMUFile *f)
 if (elem_popped) {
 qemu_put_be32s(f, &port->iov_idx);
 qemu_put_be64s(f, &port->iov_offset);
-qemu_put_virtqueue_element(f, port->elem);
+qemu_put_virtqueue_element(vdev, f, port->elem);
 }
 }
 }
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index ee52aa7d17..e8b2b64d09 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -190,11 +190,12 @@ static void virtio_scsi_save_request(QEMUFile *f, 
SCSIRequest *sreq)
 {
 VirtIOSCSIReq *req = sreq->hba_private;
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(req->dev);
+VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
 uint32_t n = virtio_get_queue_index(req->vq) - 2;
 
 assert(n < vs->conf.num_queues);
 qemu_put_be32s(f, &n);
-qemu_put_virtqueue_element(f, &req->elem);
+qemu_put_virtqueue_element(vdev, f, &req->elem);
 }
 
 static void *virtio_scsi_load_request(QEMUFile *f, SCSIRequest *sreq)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 74cc10fad9..a694b4ab50 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -96,6 +96,7 @@ typedef struct VRingPackedDescEvent {
 struct VirtQueue
 {
 VRing vring;
+VirtQueueElement *used_elems;
 
 /* Next head to pop */
 uint16_t last_avail_idx;
@@ -160,6 +161,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, 
int n)
 VRingMemoryRegionCaches *new = NULL;
 hwaddr addr, size;
 int64_t len;
+bool packed;
 
 
 addr = vq->vring.desc;
@@ -168,8 +170,10 @@ static void virtio_init_region_cache(VirtIODevice *vdev, 
int n)
 }
 new = g_new0(VRingMemoryRegionCaches, 1);
 size = virtio_queue_get_desc_size(vdev, n);
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) ?
+   true : false;
 len = address_space_cache_init(&new->desc, vdev->dma_as,
-   addr, size, false);
+   addr, size, packed);
 if (len < size) {
 virtio_error(vdev, "Cannot map desc");
 goto err_desc;
@@ -225,8 +229,8 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n)
 }
 
 /* Called within rcu_read_lock().  */
-static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
-MemoryRegionCache *cache, int i)
+static void vring_split_desc_read(VirtIODevice *vdev, VRingDesc *desc,
+  MemoryRegionCache *cache, int i)
 {
 address_space_read_cached(cache, i * sizeof(VRingDesc),
   desc, sizeof(VRingDesc));
@@ -236,6 +240,7 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc 
*desc,
 virtio_tswap16s(vdev, &desc->next);
 }
 
+/* Called within rcu_read_lock().  */
 static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
 {
 VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
@@ -370,6 +375,95 @@ int virtio_queue_ready(VirtQueue *vq)
 return vq->vring.avail != 0;
 }
 
+static void vring_packed_desc_read_flags(VirtIODevice *vdev,
+ uint16_t *flags,
+ MemoryRegionCache *cache,
+ int i)
+{
+address_space_read_cached(cache,
+  i * sizeof(VRingPackedDesc) +
+  offsetof(VRingPackedDesc, flags),
+  flags, sizeof(*flags));
+virtio_tswap16s(vdev, flags);
+}
+
+static void vring_packed_desc_read(Virt

[PULL 09/19] hw: Move M48T59 device from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The M48T59 is a Real Time Clock, not a timer.
Move it under the hw/rtc/ subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191003230404.19384-5-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 MAINTAINERS |  4 +-
 hw/ppc/ppc405_boards.c  |  2 +-
 hw/ppc/prep.c   |  2 +-
 hw/rtc/Kconfig  |  3 ++
 hw/rtc/Makefile.objs|  4 ++
 hw/{timer => rtc}/m48t59-internal.h |  0
 hw/{timer => rtc}/m48t59-isa.c  |  4 +-
 hw/{timer => rtc}/m48t59.c  |  2 +-
 hw/sparc/sun4m.c|  2 +-
 hw/sparc64/sun4u.c  |  2 +-
 hw/timer/Kconfig|  3 --
 hw/timer/Makefile.objs  |  4 --
 include/hw/rtc/m48t59.h | 57 +
 include/hw/timer/m48t59.h   | 32 
 14 files changed, 73 insertions(+), 48 deletions(-)
 rename hw/{timer => rtc}/m48t59-internal.h (100%)
 rename hw/{timer => rtc}/m48t59-isa.c (98%)
 rename hw/{timer => rtc}/m48t59.c (99%)
 create mode 100644 include/hw/rtc/m48t59.h
 delete mode 100644 include/hw/timer/m48t59.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7eba146444ae..4e65f062f29c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1064,9 +1064,9 @@ F: hw/pci-host/prep.[hc]
 F: hw/isa/i82378.c
 F: hw/isa/pc87312.c
 F: hw/dma/i82374.c
-F: hw/timer/m48t59-isa.c
+F: hw/rtc/m48t59-isa.c
 F: include/hw/isa/pc87312.h
-F: include/hw/timer/m48t59.h
+F: include/hw/rtc/m48t59.h
 F: pc-bios/ppc_rom.bin
 
 sPAPR
diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
index 388cae0b4370..1f721feed6a4 100644
--- a/hw/ppc/ppc405_boards.c
+++ b/hw/ppc/ppc405_boards.c
@@ -29,7 +29,7 @@
 #include "cpu.h"
 #include "hw/ppc/ppc.h"
 #include "ppc405.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "hw/block/flash.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/qtest.h"
diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
index 3a51536e1a39..862345c2ac5f 100644
--- a/hw/ppc/prep.c
+++ b/hw/ppc/prep.c
@@ -25,7 +25,7 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "hw/char/serial.h"
 #include "hw/block/fdc.h"
 #include "net/net.h"
diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index 7ffd702268ad..159c2335171a 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -1,3 +1,6 @@
+config M48T59
+bool
+
 config PL031
 bool
 
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 3cac0d5a637b..c87f81405e9d 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -1,2 +1,6 @@
+common-obj-$(CONFIG_M48T59) += m48t59.o
+ifeq ($(CONFIG_ISA_BUS),y)
+common-obj-$(CONFIG_M48T59) += m48t59-isa.o
+endif
 common-obj-$(CONFIG_PL031) += pl031.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
diff --git a/hw/timer/m48t59-internal.h b/hw/rtc/m48t59-internal.h
similarity index 100%
rename from hw/timer/m48t59-internal.h
rename to hw/rtc/m48t59-internal.h
diff --git a/hw/timer/m48t59-isa.c b/hw/rtc/m48t59-isa.c
similarity index 98%
rename from hw/timer/m48t59-isa.c
rename to hw/rtc/m48t59-isa.c
index 5e5432abfdfa..7fde854c0f8f 100644
--- a/hw/timer/m48t59-isa.c
+++ b/hw/rtc/m48t59-isa.c
@@ -1,5 +1,5 @@
 /*
- * QEMU M48T59 and M48T08 NVRAM emulation (ISA bus interface
+ * QEMU M48T59 and M48T08 NVRAM emulation (ISA bus interface)
  *
  * Copyright (c) 2003-2005, 2007 Jocelyn Mayer
  * Copyright (c) 2013 Hervé Poussineau
@@ -26,7 +26,7 @@
 #include "qemu/osdep.h"
 #include "hw/isa/isa.h"
 #include "hw/qdev-properties.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "m48t59-internal.h"
 #include "qemu/module.h"
 
diff --git a/hw/timer/m48t59.c b/hw/rtc/m48t59.c
similarity index 99%
rename from hw/timer/m48t59.c
rename to hw/rtc/m48t59.c
index a9fc2f981a56..fc592b9fb126 100644
--- a/hw/timer/m48t59.c
+++ b/hw/rtc/m48t59.c
@@ -27,7 +27,7 @@
 #include "qemu-common.h"
 #include "hw/irq.h"
 #include "hw/qdev-properties.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "qemu/timer.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 6c5a17a02055..2aaa5bf1ae2e 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -31,7 +31,7 @@
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
 #include "hw/sparc/sun4m_iommu.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "migration/vmstate.h"
 #include "hw/sparc/sparc32_dma.h"
 #include "hw/block/fdc.h"
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index 1ded2a4c9ab3..955082773b06 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -36,7 +36,7 @@
 #include "hw/pci-host/sabre.h"
 #include "hw/char/serial.h"
 #include "hw/char/parallel.h"
-#include "hw/timer/m48t59.h"
+#include "hw/rtc/m48t59.h"
 #include "migration/vmstate.h"
 #include "hw/input/i8042.h"
 #include "hw/block/fdc.h"
diff --

[PULL 15/19] hw: Move Exynos4210 RTC from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

Move RTC devices under the hw/rtc/ subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-11-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/Makefile.objs   | 1 +
 hw/{timer => rtc}/exynos4210_rtc.c | 0
 hw/timer/Makefile.objs | 1 -
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename hw/{timer => rtc}/exynos4210_rtc.c (100%)

diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 543a550a0f11..3d4763fc269e 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -7,5 +7,6 @@ endif
 common-obj-$(CONFIG_PL031) += pl031.o
 common-obj-$(CONFIG_TWL92230) += twl92230.o
 common-obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp-rtc.o
+common-obj-$(CONFIG_EXYNOS4) += exynos4210_rtc.o
 obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
diff --git a/hw/timer/exynos4210_rtc.c b/hw/rtc/exynos4210_rtc.c
similarity index 100%
rename from hw/timer/exynos4210_rtc.c
rename to hw/rtc/exynos4210_rtc.c
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 294465ef47ad..33191d74cb98 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -19,7 +19,6 @@ common-obj-$(CONFIG_NRF51_SOC) += nrf51_timer.o
 common-obj-$(CONFIG_ALTERA_TIMER) += altera_timer.o
 common-obj-$(CONFIG_EXYNOS4) += exynos4210_mct.o
 common-obj-$(CONFIG_EXYNOS4) += exynos4210_pwm.o
-common-obj-$(CONFIG_EXYNOS4) += exynos4210_rtc.o
 common-obj-$(CONFIG_OMAP) += omap_gptimer.o
 common-obj-$(CONFIG_OMAP) += omap_synctimer.o
 common-obj-$(CONFIG_PXA2XX) += pxa2xx_timer.o
-- 
2.21.0




[PULL 05/19] qemu-timer: reuse MIN macro in qemu_timeout_ns_to_ms

2019-10-25 Thread Laurent Vivier
From: Frediano Ziglio 

Signed-off-by: Frediano Ziglio 
Reviewed-by: Laurent Vivier 
Message-Id: <20191023122652.2999-3-fzig...@redhat.com>
[lv: removed the two useless casts]
Reviewed-by: Eric Blake 
Signed-off-by: Laurent Vivier 
---
 util/qemu-timer.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index d428fec56772..ef52d28d3753 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -322,11 +322,7 @@ int qemu_timeout_ns_to_ms(int64_t ns)
 ms = DIV_ROUND_UP(ns, SCALE_MS);
 
 /* To avoid overflow problems, limit this to 2^31, i.e. approx 25 days */
-if (ms > (int64_t) INT32_MAX) {
-ms = INT32_MAX;
-}
-
-return (int) ms;
+return MIN(ms, INT32_MAX);
 }
 
 
-- 
2.21.0




[PULL 10/19] hw: Move M41T80 device from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The M41T80 is a Real Time Clock, not a timer.
Move it under the hw/rtc/ subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191003230404.19384-6-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 MAINTAINERS| 2 +-
 hw/rtc/Kconfig | 4 
 hw/rtc/Makefile.objs   | 1 +
 hw/{timer => rtc}/m41t80.c | 0
 hw/timer/Kconfig   | 4 
 hw/timer/Makefile.objs | 1 -
 6 files changed, 6 insertions(+), 6 deletions(-)
 rename hw/{timer => rtc}/m41t80.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e65f062f29c..ba0d1906aae9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -,7 +,7 @@ F: hw/ppc/sam460ex.c
 F: hw/ppc/ppc440_pcix.c
 F: hw/display/sm501*
 F: hw/ide/sii3112.c
-F: hw/timer/m41t80.c
+F: hw/rtc/m41t80.c
 F: pc-bios/canyonlands.dt[sb]
 F: pc-bios/u-boot-sam460ex-20100605.bin
 F: roms/u-boot-sam460ex
diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index 159c2335171a..434b20b2b1bd 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -1,3 +1,7 @@
+config M41T80
+bool
+depends on I2C
+
 config M48T59
 bool
 
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index c87f81405e9d..89e8e48c6409 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -1,3 +1,4 @@
+common-obj-$(CONFIG_M41T80) += m41t80.o
 common-obj-$(CONFIG_M48T59) += m48t59.o
 ifeq ($(CONFIG_ISA_BUS),y)
 common-obj-$(CONFIG_M48T59) += m48t59-isa.o
diff --git a/hw/timer/m41t80.c b/hw/rtc/m41t80.c
similarity index 100%
rename from hw/timer/m41t80.c
rename to hw/rtc/m41t80.c
diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index a57e9b59fca8..a6b668b2559b 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -20,10 +20,6 @@ config HPET
 config I8254
 bool
 
-config M41T80
-bool
-depends on I2C
-
 config TWL92230
 bool
 depends on I2C
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index fe2d1fbc4040..2fb12162a623 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -6,7 +6,6 @@ common-obj-$(CONFIG_CADENCE) += cadence_ttc.o
 common-obj-$(CONFIG_DS1338) += ds1338.o
 common-obj-$(CONFIG_HPET) += hpet.o
 common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
-common-obj-$(CONFIG_M41T80) += m41t80.o
 common-obj-$(CONFIG_PUV3) += puv3_ost.o
 common-obj-$(CONFIG_TWL92230) += twl92230.o
 common-obj-$(CONFIG_XILINX) += xilinx_timer.o
-- 
2.21.0




[PATCH v7 4/8] virtio: Free rnd virqueue at unrealize()

2019-10-25 Thread Eugenio Pérez
The function virtio_del_queue was not called at unrealize() callback.

This was detected due to add an allocated element on the vq introduce
in future commits (used_elems) and running address sanitizer memory
leak detector.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/virtio-rng.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index e93bed020f..b498a20332 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -238,6 +238,7 @@ static void virtio_rng_device_unrealize(DeviceState *dev, 
Error **errp)
 qemu_del_vm_change_state_handler(vrng->vmstate);
 timer_del(vrng->rate_limit_timer);
 timer_free(vrng->rate_limit_timer);
+virtio_del_queue(vdev, 0);
 virtio_cleanup(vdev);
 }
 
-- 
2.16.5




[PULL 08/19] hw: Move MC146818 device from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The MC146818 is a Real Time Clock, not a timer.
Move it under the hw/rtc/ subdirectory.

Use copyright statement from 80cabfad163 for "hw/rtc/mc146818rtc.h".

Reviewed-by: Alistair Francis 
Acked-by: David Gibson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20191003230404.19384-4-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 MAINTAINERS  |  4 ++--
 hw/alpha/dp264.c |  2 +-
 hw/hppa/machine.c|  2 +-
 hw/i386/acpi-build.c |  2 +-
 hw/i386/pc.c |  2 +-
 hw/i386/pc_q35.c |  2 +-
 hw/mips/mips_fulong2e.c  |  2 +-
 hw/mips/mips_jazz.c  |  2 +-
 hw/mips/mips_malta.c |  2 +-
 hw/mips/mips_r4k.c   |  2 +-
 hw/ppc/pnv.c |  2 +-
 hw/ppc/prep.c|  2 +-
 hw/rtc/Kconfig   |  3 +++
 hw/rtc/Makefile.objs |  1 +
 hw/{timer => rtc}/mc146818rtc.c  |  2 +-
 hw/timer/Kconfig |  3 ---
 hw/timer/Makefile.objs   |  2 --
 hw/timer/hpet.c  |  2 +-
 include/hw/{timer => rtc}/mc146818rtc.h  | 14 +++---
 include/hw/{timer => rtc}/mc146818rtc_regs.h |  5 +++--
 tests/rtc-test.c |  2 +-
 21 files changed, 34 insertions(+), 26 deletions(-)
 rename hw/{timer => rtc}/mc146818rtc.c (99%)
 rename include/hw/{timer => rtc}/mc146818rtc.h (58%)
 rename include/hw/{timer => rtc}/mc146818rtc_regs.h (96%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e13ba46282d..7eba146444ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1261,7 +1261,7 @@ F: hw/misc/debugexit.c
 F: hw/misc/pc-testdev.c
 F: hw/timer/hpet*
 F: hw/timer/i8254*
-F: hw/timer/mc146818rtc*
+F: hw/rtc/mc146818rtc*
 F: hw/watchdog/wdt_ib700.c
 F: hw/watchdog/wdt_i6300esb.c
 F: include/hw/display/vga.h
@@ -1273,7 +1273,7 @@ F: include/hw/isa/i8259_internal.h
 F: include/hw/isa/superio.h
 F: include/hw/timer/hpet.h
 F: include/hw/timer/i8254*
-F: include/hw/timer/mc146818rtc*
+F: include/hw/rtc/mc146818rtc*
 
 Machine core
 M: Eduardo Habkost 
diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c
index 51feee855812..51b3cf7a6128 100644
--- a/hw/alpha/dp264.c
+++ b/hw/alpha/dp264.c
@@ -14,7 +14,7 @@
 #include "alpha_sys.h"
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc.h"
 #include "hw/ide.h"
 #include "hw/timer/i8254.h"
 #include "hw/isa/superio.h"
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index 953d454f4879..b30aba6d5439 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -12,7 +12,7 @@
 #include "qemu/error-report.h"
 #include "sysemu/reset.h"
 #include "sysemu/sysemu.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc.h"
 #include "hw/ide.h"
 #include "hw/timer/i8254.h"
 #include "hw/char/serial.h"
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1d077a7cb772..d9435ba0b310 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -45,7 +45,7 @@
 #include "hw/acpi/vmgenid.h"
 #include "hw/boards.h"
 #include "sysemu/tpm_backend.h"
-#include "hw/timer/mc146818rtc_regs.h"
+#include "hw/rtc/mc146818rtc_regs.h"
 #include "migration/vmstate.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/numa.h"
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4b1904237ec6..51b72439b440 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -42,7 +42,7 @@
 #include "elf.h"
 #include "migration/vmstate.h"
 #include "multiboot.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc.h"
 #include "hw/dma/i8257.h"
 #include "hw/timer/i8254.h"
 #include "hw/input/i8042.h"
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 8fad20f3146a..748fc2ee15bf 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -33,7 +33,7 @@
 #include "hw/loader.h"
 #include "sysemu/arch_init.h"
 #include "hw/i2c/smbus_eeprom.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc.h"
 #include "hw/xen/xen.h"
 #include "sysemu/kvm.h"
 #include "kvm_i386.h"
diff --git a/hw/mips/mips_fulong2e.c b/hw/mips/mips_fulong2e.c
index cf537dd7e631..03a27e176796 100644
--- a/hw/mips/mips_fulong2e.c
+++ b/hw/mips/mips_fulong2e.c
@@ -39,7 +39,7 @@
 #include "hw/ide.h"
 #include "elf.h"
 #include "hw/isa/vt82c686.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/mc146818rtc.h"
 #include "hw/timer/i8254.h"
 #include "exec/address-spaces.h"
 #include "sysemu/qtest.h"
diff --git a/hw/mips/mips_jazz.c b/hw/mips/mips_jazz.c
index 8d010a0b6e10..d978bb64a0f5 100644
--- a/hw/mips/mips_jazz.c
+++ b/hw/mips/mips_jazz.c
@@ -39,7 +39,7 @@
 #include "hw/scsi/esp.h"
 #include "hw/mips/bios.h"
 #include "hw/loader.h"
-#include "hw/timer/mc146818rtc.h"
+#include "hw/rtc/m

[PULL 13/19] hw: Move DS1338 device from hw/timer/ to hw/rtc/ subdirectory

2019-10-25 Thread Laurent Vivier
From: Philippe Mathieu-Daudé 

The DS1338 is a Real Time Clock, not a timer.
Move it under the hw/rtc/ subdirectory.

Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Peter Maydell 
Message-Id: <20191003230404.19384-9-phi...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 hw/rtc/Kconfig | 4 
 hw/rtc/Makefile.objs   | 1 +
 hw/{timer => rtc}/ds1338.c | 0
 hw/timer/Kconfig   | 4 
 hw/timer/Makefile.objs | 1 -
 5 files changed, 5 insertions(+), 5 deletions(-)
 rename hw/{timer => rtc}/ds1338.c (100%)

diff --git a/hw/rtc/Kconfig b/hw/rtc/Kconfig
index dff9d60946af..45daa8d655c9 100644
--- a/hw/rtc/Kconfig
+++ b/hw/rtc/Kconfig
@@ -1,3 +1,7 @@
+config DS1338
+bool
+depends on I2C
+
 config M41T80
 bool
 depends on I2C
diff --git a/hw/rtc/Makefile.objs b/hw/rtc/Makefile.objs
index 810a38ee7b3d..b195863291d1 100644
--- a/hw/rtc/Makefile.objs
+++ b/hw/rtc/Makefile.objs
@@ -1,3 +1,4 @@
+common-obj-$(CONFIG_DS1338) += ds1338.o
 common-obj-$(CONFIG_M41T80) += m41t80.o
 common-obj-$(CONFIG_M48T59) += m48t59.o
 ifeq ($(CONFIG_ISA_BUS),y)
diff --git a/hw/timer/ds1338.c b/hw/rtc/ds1338.c
similarity index 100%
rename from hw/timer/ds1338.c
rename to hw/rtc/ds1338.c
diff --git a/hw/timer/Kconfig b/hw/timer/Kconfig
index 9357875f285d..a990f9fe35fc 100644
--- a/hw/timer/Kconfig
+++ b/hw/timer/Kconfig
@@ -9,10 +9,6 @@ config ARM_MPTIMER
 config A9_GTIMER
 bool
 
-config DS1338
-bool
-depends on I2C
-
 config HPET
 bool
 default y if PC
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index 23be70b71d32..70b61b69c7a4 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -3,7 +3,6 @@ common-obj-$(CONFIG_ARM_MPTIMER) += arm_mptimer.o
 common-obj-$(CONFIG_ARM_V7M) += armv7m_systick.o
 common-obj-$(CONFIG_A9_GTIMER) += a9gtimer.o
 common-obj-$(CONFIG_CADENCE) += cadence_ttc.o
-common-obj-$(CONFIG_DS1338) += ds1338.o
 common-obj-$(CONFIG_HPET) += hpet.o
 common-obj-$(CONFIG_I8254) += i8254_common.o i8254.o
 common-obj-$(CONFIG_PUV3) += puv3_ost.o
-- 
2.21.0




  1   2   >