Re: [PATCH 0/3] Fix active mirror dead-lock

2021-07-02 Thread Vladimir Sementsov-Ogievskiy

[Fix Den's email address in CC]

03.07.2021 00:16, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

We've faced a dead-lock in active mirror in our Rhev-8.4 based Qemu
build. And it's reproducible on master too.

Vladimir Sementsov-Ogievskiy (3):
   block/mirror: set .co for active-write MirrorOp objects
   iotest 151: add test-case that shows active mirror dead-lock
   block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts

  block/mirror.c | 13 +
  tests/qemu-iotests/151 | 54 --
  tests/qemu-iotests/151.out |  4 +--
  3 files changed, 67 insertions(+), 4 deletions(-)




--
Best regards,
Vladimir



Re: [PATCH] hw/sd: sdhci: Enable 64-bit system bus capability in the default SD/MMC host controller

2021-07-02 Thread Philippe Mathieu-Daudé
Hi Joanne,

Next time I recommend you to Cc the maintainers, otherwise they
might miss your patch. See:
https://wiki.qemu.org/Contribute/SubmitAPatch#CC_the_relevant_maintainer

$ ./scripts/get_maintainer.pl -f hw/sd/sdhci-internal.h
"Philippe Mathieu-Daudé"  (odd fixer:SD (Secure Card))
Bin Meng  (odd fixer:SD (Secure Card))
qemu-block@nongnu.org (open list:SD (Secure Card))
qemu-de...@nongnu.org (open list:All patches CC here)

On 6/23/21 8:59 PM, Joanne Koong wrote:
> The default SD/MMC host controller uses SD spec v2.00. 64-bit system bus 
> capability
> was added in v2.
> 
> In this change, we arrive at 0x157834b4 by computing (0x057834b4 | (1ul << 
> 28))
> where 28 represents the BUS64BIT SDHC_CAPAB field.
> 
> Signed-off-by: Joanne Koong 
> ---
>  hw/sd/sdhci-internal.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
> index e8c753d6d1..a76fc704e5 100644
> --- a/hw/sd/sdhci-internal.h
> +++ b/hw/sd/sdhci-internal.h
> @@ -316,16 +316,16 @@ extern const VMStateDescription sdhci_vmstate;
>   * - 3.3v and 1.8v voltages
>   * - SDMA/ADMA1/ADMA2
>   * - high-speed
> + * - 64-bit system bus
>   * max host controller R/W buffers size: 512B
>   * max clock frequency for SDclock: 52 MHz
>   * timeout clock frequency: 52 MHz
>   *
>   * does not support:
>   * - 3.0v voltage
> - * - 64-bit system bus
>   * - suspend/resume
>   */
> -#define SDHC_CAPAB_REG_DEFAULT 0x057834b4
> +#define SDHC_CAPAB_REG_DEFAULT 0x157834b4
>  
>  #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
>  DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
> 

Reviewed-by: Philippe Mathieu-Daudé 



[PATCH 2/3] iotest 151: add test-case that shows active mirror dead-lock

2021-07-02 Thread Vladimir Sementsov-Ogievskiy
There is a dead-lock in active mirror: when we have parallel
intersecting requests (note that non intersecting requests may be
considered intersecting after aligning to mirror granularity), it may
happen that request A waits request B in mirror_wait_on_conflicts() and
request B waits for A.

Look at the test for details. Test now dead-locks, that's why it's
disabled. Next commit will fix mirror and enable the test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/151 | 62 --
 tests/qemu-iotests/151.out |  4 +--
 2 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index 182f6b5321..ab46c5e8ba 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -38,8 +38,9 @@ class TestActiveMirror(iotests.QMPTestCase):
   'if': 'none',
   'node-name': 'source-node',
   'driver': iotests.imgfmt,
-  'file': {'driver': 'file',
-   'filename': source_img}}
+  'file': {'driver': 'blkdebug',
+   'image': {'driver': 'file',
+ 'filename': source_img}}}
 
 blk_target = {'node-name': 'target-node',
   'driver': iotests.imgfmt,
@@ -141,6 +142,63 @@ class TestActiveMirror(iotests.QMPTestCase):
 
 self.potential_writes_in_flight = False
 
+def testIntersectingActiveIO(self):
+# FIXME: test-case is dead-locking. To reproduce dead-lock just drop
+# this return statement
+return
+
+# Fill the source image
+result = self.vm.hmp_qemu_io('source', 'write -P 1 0 2M')
+
+# Start the block job (very slowly)
+result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ filter_node_name='mirror-node',
+ device='source-node',
+ target='target-node',
+ sync='full',
+ copy_mode='write-blocking',
+ speed=1)
+
+self.vm.hmp_qemu_io('source', 'break write_aio A')
+self.vm.hmp_qemu_io('source', 'aio_write 0 1M')  # 1
+self.vm.hmp_qemu_io('source', 'wait_break A')
+self.vm.hmp_qemu_io('source', 'aio_write 0 2M')  # 2
+self.vm.hmp_qemu_io('source', 'aio_write 0 2M')  # 3
+
+# Now 2 and 3 are in mirror_wait_on_conflicts, waiting for 1
+
+self.vm.hmp_qemu_io('source', 'break write_aio B')
+self.vm.hmp_qemu_io('source', 'aio_write 1M 2M')  # 4
+self.vm.hmp_qemu_io('source', 'wait_break B')
+
+# 4 doesn't wait for 2 and 3, because they didn't yet set
+# in_flight_bitmap. So, nothing prevents 4 to go except for our
+# break-point B.
+
+self.vm.hmp_qemu_io('source', 'resume A')
+
+# Now we resumed 1, so 2 and 3 goes to the next iteration of while loop
+# in mirror_wait_on_conflicts(). They don't exit, as bitmap is dirty
+# due to request 4. And they start to wait: 2 wait for 3, 3 wait for 2
+# - DEAD LOCK.
+# Note that it's important that we add request 4 at last: requests are
+# appended to the list, so we are sure that 4 is last in the list, so 2
+# and 3 now waits for each other, not for 4.
+
+self.vm.hmp_qemu_io('source', 'resume B')
+
+# Resuming 4 doesn't help, 2 and 3 already dead-locked
+# To check the dead-lock run:
+#gdb -p $(pidof qemu-system-x86_64) -ex 'set $job=(MirrorBlockJob 
*)jobs.lh_first' -ex 'p *$job->ops_in_flight.tqh_first' -ex 'p 
*$job->ops_in_flight.tqh_first->next.tqe_next'
+# You'll see two MirrorOp objects waiting on each other
+
+result = self.vm.qmp('block-job-set-speed', device='mirror', speed=0)
+self.assert_qmp(result, 'return', {})
+self.complete_and_wait(drive='mirror')
+
+self.potential_writes_in_flight = False
+
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2', 'raw'],
diff --git a/tests/qemu-iotests/151.out b/tests/qemu-iotests/151.out
index 8d7e996700..89968f35d7 100644
--- a/tests/qemu-iotests/151.out
+++ b/tests/qemu-iotests/151.out
@@ -1,5 +1,5 @@
-...
+
 --
-Ran 3 tests
+Ran 4 tests
 
 OK
-- 
2.29.2




[PATCH 3/3] block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts

2021-07-02 Thread Vladimir Sementsov-Ogievskiy
It's possible that requests start to wait each other in
mirror_wait_on_conflicts(). To avoid it let's use same technique as in
block/io.c in bdrv_wait_serialising_requests_locked() /
bdrv_find_conflicting_request(): don't wait on intersecting request if
it is already waiting for some other request.

For details of the dead-lock look at testIntersectingActiveIO()
test-case which we actually fixing now.

Fixes: d06107ade0ce74dc39739bac80de84b51ec18546
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Note, that I failed to run the test on d06107ade0ce74d, that introduces
active mirror. But it seems that the problem should exit in it too, and
it's better to leave "Fixes:" tag than don't do it.

 block/mirror.c | 12 
 tests/qemu-iotests/151 | 18 +-
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index ad6aac2f95..98fc66eabf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -107,6 +107,7 @@ struct MirrorOp {
 bool is_in_flight;
 CoQueue waiting_requests;
 Coroutine *co;
+MirrorOp *waiting_for_op;
 
 QTAILQ_ENTRY(MirrorOp) next;
 };
@@ -159,7 +160,18 @@ static void coroutine_fn mirror_wait_on_conflicts(MirrorOp 
*self,
 if (ranges_overlap(self_start_chunk, self_nb_chunks,
op_start_chunk, op_nb_chunks))
 {
+/*
+ * If the operation is already (indirectly) waiting for us, or
+ * will wait for us as soon as it wakes up, then just go on
+ * (instead of producing a deadlock in the former case).
+ */
+if (op->waiting_for_op) {
+continue;
+}
+
+self->waiting_for_op = op;
 qemu_co_queue_wait(>waiting_requests, NULL);
+self->waiting_for_op = NULL;
 break;
 }
 }
diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index ab46c5e8ba..93d14193d0 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -143,10 +143,6 @@ class TestActiveMirror(iotests.QMPTestCase):
 self.potential_writes_in_flight = False
 
 def testIntersectingActiveIO(self):
-# FIXME: test-case is dead-locking. To reproduce dead-lock just drop
-# this return statement
-return
-
 # Fill the source image
 result = self.vm.hmp_qemu_io('source', 'write -P 1 0 2M')
 
@@ -180,18 +176,14 @@ class TestActiveMirror(iotests.QMPTestCase):
 
 # Now we resumed 1, so 2 and 3 goes to the next iteration of while loop
 # in mirror_wait_on_conflicts(). They don't exit, as bitmap is dirty
-# due to request 4. And they start to wait: 2 wait for 3, 3 wait for 2
-# - DEAD LOCK.
-# Note that it's important that we add request 4 at last: requests are
-# appended to the list, so we are sure that 4 is last in the list, so 2
-# and 3 now waits for each other, not for 4.
+# due to request 4.
+# In the past at that point 2 and 3 would wait for each other producing
+# a dead-lock. Now this is fixed and they will wait for request 4.
 
 self.vm.hmp_qemu_io('source', 'resume B')
 
-# Resuming 4 doesn't help, 2 and 3 already dead-locked
-# To check the dead-lock run:
-#gdb -p $(pidof qemu-system-x86_64) -ex 'set $job=(MirrorBlockJob 
*)jobs.lh_first' -ex 'p *$job->ops_in_flight.tqh_first' -ex 'p 
*$job->ops_in_flight.tqh_first->next.tqe_next'
-# You'll see two MirrorOp objects waiting on each other
+# After resuming 4, one of 2 and 3 goes first and set in_flight_bitmap,
+# so the other will wait for it.
 
 result = self.vm.qmp('block-job-set-speed', device='mirror', speed=0)
 self.assert_qmp(result, 'return', {})
-- 
2.29.2




[PATCH 1/3] block/mirror: set .co for active-write MirrorOp objects

2021-07-02 Thread Vladimir Sementsov-Ogievskiy
This field is unused, but it very helpful for debugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/mirror.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/mirror.c b/block/mirror.c
index 019f6deaa5..ad6aac2f95 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1343,6 +1343,7 @@ static MirrorOp *coroutine_fn 
active_write_prepare(MirrorBlockJob *s,
 .bytes  = bytes,
 .is_active_write= true,
 .is_in_flight   = true,
+.co = qemu_coroutine_self(),
 };
 qemu_co_queue_init(>waiting_requests);
 QTAILQ_INSERT_TAIL(>ops_in_flight, op, next);
-- 
2.29.2




[PATCH 0/3] Fix active mirror dead-lock

2021-07-02 Thread Vladimir Sementsov-Ogievskiy
Hi all!

We've faced a dead-lock in active mirror in our Rhev-8.4 based Qemu
build. And it's reproducible on master too.

Vladimir Sementsov-Ogievskiy (3):
  block/mirror: set .co for active-write MirrorOp objects
  iotest 151: add test-case that shows active mirror dead-lock
  block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts

 block/mirror.c | 13 +
 tests/qemu-iotests/151 | 54 --
 tests/qemu-iotests/151.out |  4 +--
 3 files changed, 67 insertions(+), 4 deletions(-)

-- 
2.29.2




Re: [PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 7:24 PM Ilya Dryomov  wrote:
>
> This series migrates the qemu rbd driver from the old aio emulation
> to native coroutines and adds write zeroes support which is important
> for block operations.
>
> To achieve this we first bump the librbd requirement to the already
> outdated luminous release of ceph to get rid of some wrappers and
> ifdef'ry in the code.
>
> V4->V5:
>  - rebase on Kevin's block branch (https://repo.or.cz/qemu/kevin.git block)
>to resolve conflicts with "block/rbd: Add support for rbd image encryption"
>patch
>
> V3->V4:
>  - this patch is now rebased on top of current master
>  - Patch 1: just mention librbd, tweak version numbers [Ilya]
>  - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
>  - Patch 4: retain comment about using a BH in the callback [Ilya]
>  - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP 
> [Ilya]
>
> V2->V3:
>  - this patch is now rebased on top of current master
>  - Patch 1: only use cc.links and not cc.run to not break
>cross-compiling. [Kevin]
>Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
>support was dropped [Daniel]
>  - Patch 4: dropped
>  - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]
>
> V1->V2:
>  - this patch is now rebased on top of current master with Paolos
>upcoming fixes for the meson.build script included:
> - meson: accept either shared or static libraries if --disable-static
> - meson: honor --enable-rbd if cc.links test fails
>  - Patch 1: adjusted to meson.build script
>  - Patch 2: unchanged
>  - Patch 3: new patch
>  - Patch 4: do not implement empty detach_aio_context callback [Jason]
>  - Patch 5: - fix aio completion cleanup in error case [Jason]
> - return error codes from librbd
>  - Patch 6: - add support for thick provisioning [Jason]
> - do not set write zeroes alignment
>  - Patch 7: new patch
>
> Peter Lieven (6):
>   block/rbd: bump librbd requirement to luminous release
>   block/rbd: store object_size in BDRVRBDState
>   block/rbd: update s->image_size in qemu_rbd_getlength
>   block/rbd: migrate from aio to coroutines
>   block/rbd: add write zeroes support
>   block/rbd: drop qemu_rbd_refresh_limits
>
>  block/rbd.c | 406 
>  meson.build |   7 +-
>  2 files changed, 128 insertions(+), 285 deletions(-)
>
> --
> 2.19.2
>

Attempting to suit patchew:

Based-on: <20210627114635.39326-1-...@il.ibm.com>
([PATCH v2] block/rbd: Add support for rbd image encryption)

Thanks,

Ilya



Re: [PATCH 3/3] hw/sd: Check for valid address range in SEND_WRITE_PROT (CMD30)

2021-07-02 Thread Alexander Bulekov
On 210702 1759, Philippe Mathieu-Daudé wrote:
> OSS-Fuzz found sending illegal addresses when querying the write
> protection bits triggers an assertion:
> 
>   qemu-fuzz-i386: hw/sd/sd.c:824: uint32_t sd_wpbits(SDState *, uint64_t): 
> Assertion `wpnum < sd->wpgrps_size' failed.
>   ==11578== ERROR: libFuzzer: deadly signal
>   #8 0x7628e091 in __assert_fail
>   #9 0x588f1a3c in sd_wpbits hw/sd/sd.c:824:9
>   #10 0x588dd271 in sd_normal_command hw/sd/sd.c:1383:38
>   #11 0x588d777c in sd_do_command hw/sd/sd.c
>   #12 0x58cb25a0 in sdbus_do_command hw/sd/core.c:100:16
>   #13 0x58e02a9a in sdhci_send_command hw/sd/sdhci.c:337:12
>   #14 0x58dffa46 in sdhci_write hw/sd/sdhci.c:1187:9
>   #15 0x598b9d76 in memory_region_write_accessor softmmu/memory.c:489:5
> 
> Similarly to commit 8573378e62d ("hw/sd: fix out-of-bounds check
> for multi block reads"), check the address range before sending
> the status of the write protection bits.
> 
> Include the qtest reproducer provided by Alexander Bulekov:
> 
>   $ make check-qtest-i386
>   ...
>   Running test qtest-i386/fuzz-sdcard-test
>   qemu-system-i386: ../hw/sd/sd.c:824: sd_wpbits: Assertion `wpnum < 
> sd->wpgrps_size' failed.
> 
> Reported-by: OSS-Fuzz (Issue 29225)
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/450
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Alexander Bulekov 

Thanks

> ---
>  hw/sd/sd.c |  5 +++
>  tests/qtest/fuzz-sdcard-test.c | 66 ++
>  MAINTAINERS|  3 +-
>  tests/qtest/meson.build|  1 +
>  4 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 tests/qtest/fuzz-sdcard-test.c
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 9c8dd11bad1..c753ae24ba9 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1379,6 +1379,11 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  
>  switch (sd->state) {
>  case sd_transfer_state:
> +if (!address_in_range(sd, "SEND_WRITE_PROT",
> +  req.arg, sd->blk_len)) {
> +return sd_r1;
> +}
> +
>  sd->state = sd_sendingdata_state;
>  *(uint32_t *) sd->data = sd_wpbits(sd, req.arg);
>  sd->data_start = addr;
> diff --git a/tests/qtest/fuzz-sdcard-test.c b/tests/qtest/fuzz-sdcard-test.c
> new file mode 100644
> index 000..96602eac7e5
> --- /dev/null
> +++ b/tests/qtest/fuzz-sdcard-test.c
> @@ -0,0 +1,66 @@
> +/*
> + * QTest fuzzer-generated testcase for sdcard device
> + *
> + * Copyright (c) 2021 Philippe Mathieu-Daudé 
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqos/libqtest.h"
> +
> +/*
> + * https://gitlab.com/qemu-project/qemu/-/issues/450
> + * Used to trigger:
> + *  Assertion `wpnum < sd->wpgrps_size' failed.
> + */
> +static void oss_fuzz_29225(void)
> +{
> +QTestState *s;
> +
> +s = qtest_init(" -display none -m 512m -nodefaults -nographic"
> +   " -device sdhci-pci,sd-spec-version=3"
> +   " -device sd-card,drive=d0"
> +   " -drive 
> if=none,index=0,file=null-co://,format=raw,id=d0");
> +
> +qtest_outl(s, 0xcf8, 0x80001010);
> +qtest_outl(s, 0xcfc, 0xd0690);
> +qtest_outl(s, 0xcf8, 0x80001003);
> +qtest_outl(s, 0xcf8, 0x80001013);
> +qtest_outl(s, 0xcfc, 0x);
> +qtest_outl(s, 0xcf8, 0x80001003);
> +qtest_outl(s, 0xcfc, 0x3effe00);
> +
> +qtest_bufwrite(s, 0xff0d062c, "\xff", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\xb7", 0x1);
> +qtest_bufwrite(s, 0xff0d060a, "\xc9", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\x29", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\xc2", 0x1);
> +qtest_bufwrite(s, 0xff0d0628, "\xf7", 0x1);
> +qtest_bufwrite(s, 0x0, "\xe3", 0x1);
> +qtest_bufwrite(s, 0x7, "\x13", 0x1);
> +qtest_bufwrite(s, 0x8, "\xe3", 0x1);
> +qtest_bufwrite(s, 0xf, "\xe3", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\x03", 0x1);
> +qtest_bufwrite(s, 0xff0d0605, "\x01", 0x1);
> +qtest_bufwrite(s, 0xff0d060b, "\xff", 0x1);
> +qtest_bufwrite(s, 0xff0d060c, "\xff", 0x1);
> +qtest_bufwrite(s, 0xff0d060e, "\xff", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\x06", 0x1);
> +qtest_bufwrite(s, 0xff0d060f, "\x9e", 0x1);
> +
> +qtest_quit(s);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +const char *arch = qtest_get_arch();
> +
> +g_test_init(, , NULL);
> +
> +   if (strcmp(arch, "i386") == 0) {
> +qtest_add_func("fuzz/sdcard/oss_fuzz_29225", oss_fuzz_29225);
> +   }
> +
> +   return g_test_run();
> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cfbf7ef79bc..fb33fe12200 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1794,7 +1794,8 @@ F: include/hw/sd/sd*
>  F: hw/sd/core.c
>  F: hw/sd/sd*
>  F: hw/sd/ssi-sd.c
> -F: tests/qtest/sd*
> +F: tests/qtest/fuzz-sdcard-test.c
> +F: 

Re: [PATCH 1/3] hw/sd: When card is in wrong state, log which state it is

2021-07-02 Thread Alexander Bulekov
On 210702 1758, Philippe Mathieu-Daudé wrote:
> We report the card is in an inconsistent state, but don't precise
> in which state it is. Add this information, as it is useful when
> debugging problems.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> Reviewed-by: Bin Meng 
> Message-Id: <20210624142209.1193073-2-f4...@amsat.org>

Reviewed-by: Alexander Bulekov 

> ---
>  hw/sd/sd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 282d39a7042..d8fdf84f4db 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -1504,7 +1504,8 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
> SDRequest req)
>  return sd_illegal;
>  }
>  
> -qemu_log_mask(LOG_GUEST_ERROR, "SD: CMD%i in a wrong state\n", req.cmd);
> +qemu_log_mask(LOG_GUEST_ERROR, "SD: CMD%i in a wrong state: %s\n",
> +  req.cmd, sd_state_name(sd->state));
>  return sd_illegal;
>  }
>  
> -- 
> 2.31.1
> 



[PATCH v5 5/6] block/rbd: add write zeroes support

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores
BDRV_REQ_MAY_UNMAP for older librbd versions.

The rationale for this is as follows (citing Ilya Dryomov current RBD
maintainer):

---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
   and as a consequence always unmap if librbd is too old

   It's not clear what qemu's expectation is but in general Write
   Zeroes is allowed to unmap.  The only guarantee is that subsequent
   reads return zeroes, everything else is a hint.  This is how it is
   specified in the kernel and in the NVMe spec.

   In particular, block/nvme.c implements it as follows:

   if (flags & BDRV_REQ_MAY_UNMAP) {
   cdw12 |= (1 << 25);
   }

   This sets the Deallocate bit.  But if it's not set, the device may
   still deallocate:

   """
   If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
   command, and the namespace supports clearing all bytes to 0h in the
   values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
   from a deallocated logical block and its metadata (excluding
   protection information), then for each specified logical block, the
   controller:
   - should deallocate that logical block;

   ...

   If the Deallocate bit is cleared to '0' in a Write Zeroes command,
   and the namespace supports clearing all bytes to 0h in the values
   read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
   a deallocated logical block and its metadata (excluding protection
   information), then, for each specified logical block, the
   controller:
   - may deallocate that logical block;
   """

   
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf

b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags

   Again, it's not clear what qemu expects here, but without it we end
   up in a ridiculous situation where specifying the "don't allow slow
   fallback" switch immediately fails all efficient zeroing requests on
   a device where Write Zeroes is always efficient:

   $ qemu-io -c 'help write' | grep -- '-[zun]'
-n, -- with -z, don't allow slow fallback
-u, -- with -z, allow unmapping
-z, -- write zeroes using blk_co_pwrite_zeroes

   $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
   write failed: Operation not supported
--->8---

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index 380ad28861ad..3152bc8ba00a 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -75,7 +75,8 @@ typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
 RBD_AIO_DISCARD,
-RBD_AIO_FLUSH
+RBD_AIO_FLUSH,
+RBD_AIO_WRITE_ZEROES
 } RBDAIOCmd;
 
 typedef struct BDRVRBDState {
@@ -999,6 +1000,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
+#endif
+
 /* When extending regular files, we get zeros from the OS */
 bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
@@ -1123,6 +1128,18 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,
 case RBD_AIO_FLUSH:
 r = rbd_aio_flush(s->image, c);
 break;
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+case RBD_AIO_WRITE_ZEROES: {
+int zero_flags = 0;
+#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
+if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
+}
+#endif
+r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
+break;
+}
+#endif
 default:
 r = -EINVAL;
 }
@@ -1193,6 +1210,16 @@ static int coroutine_fn 
qemu_rbd_co_pdiscard(BlockDriverState *bs,
 return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
 }
 
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+static int
+coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+  int count, BdrvRequestFlags flags)
+{
+return qemu_rbd_start_co(bs, offset, count, NULL, flags,
+ RBD_AIO_WRITE_ZEROES);
+}
+#endif
+
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVRBDState *s = bs->opaque;
@@ -1473,6 +1500,9 @@ static BlockDriver bdrv_rbd = {
 .bdrv_co_pwritev= qemu_rbd_co_pwritev,
 .bdrv_co_flush_to_disk  = qemu_rbd_co_flush,
 .bdrv_co_pdiscard   = qemu_rbd_co_pdiscard,
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+.bdrv_co_pwrite_zeroes  = qemu_rbd_co_pwrite_zeroes,
+#endif
 
 .bdrv_snapshot_create   = qemu_rbd_snap_create,
 .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
-- 
2.19.2




[PATCH v5 6/6] block/rbd: drop qemu_rbd_refresh_limits

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

librbd supports 1 byte alignment for all aio operations.

Currently, there is no API call to query limits from the Ceph
ObjectStore backend.  So drop the bdrv_refresh_limits completely
until there is such an API call.

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 3152bc8ba00a..01a7b94d6257 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -240,14 +240,6 @@ done:
 return;
 }
 
-
-static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
-{
-/* XXX Does RBD support AIO on less than 512-byte alignment? */
-bs->bl.request_alignment = 512;
-}
-
-
 static int qemu_rbd_set_auth(rados_t cluster, BlockdevOptionsRbd *opts,
  Error **errp)
 {
@@ -1482,7 +1474,6 @@ static BlockDriver bdrv_rbd = {
 .format_name= "rbd",
 .instance_size  = sizeof(BDRVRBDState),
 .bdrv_parse_filename= qemu_rbd_parse_filename,
-.bdrv_refresh_limits= qemu_rbd_refresh_limits,
 .bdrv_file_open = qemu_rbd_open,
 .bdrv_close = qemu_rbd_close,
 .bdrv_reopen_prepare= qemu_rbd_reopen_prepare,
-- 
2.19.2




[PATCH v5 1/6] block/rbd: bump librbd requirement to luminous release

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

Ceph Luminous (version 12.2.z) is almost 4 years old at this point.
Bump the requirement to get rid of the ifdef'ry in the code.
Qemu 6.1 dropped the support for RHEL-7 which was the last supported
OS that required an older librbd.

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 120 
 meson.build |   7 ++-
 2 files changed, 13 insertions(+), 114 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f51adb364670..b4b928bbb99f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -55,24 +55,10 @@
  * leading "\".
  */
 
-/* rbd_aio_discard added in 0.1.2 */
-#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2)
-#define LIBRBD_SUPPORTS_DISCARD
-#else
-#undef LIBRBD_SUPPORTS_DISCARD
-#endif
-
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 
 #define RBD_MAX_SNAPS 100
 
-/* The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h */
-#ifdef LIBRBD_SUPPORTS_IOVEC
-#define LIBRBD_USE_IOVEC 1
-#else
-#define LIBRBD_USE_IOVEC 0
-#endif
-
 #define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
 
 static const char rbd_luks_header_verification[
@@ -96,7 +82,6 @@ typedef struct RBDAIOCB {
 BlockAIOCB common;
 int64_t ret;
 QEMUIOVector *qiov;
-char *bounce;
 RBDAIOCmd cmd;
 int error;
 struct BDRVRBDState *s;
@@ -106,7 +91,6 @@ typedef struct RADOSCB {
 RBDAIOCB *acb;
 struct BDRVRBDState *s;
 int64_t size;
-char *buf;
 int64_t ret;
 } RADOSCB;
 
@@ -354,13 +338,9 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
char *keypairs_json,
 
 static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 {
-if (LIBRBD_USE_IOVEC) {
-RBDAIOCB *acb = rcb->acb;
-iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
-   acb->qiov->size - offs);
-} else {
-memset(rcb->buf + offs, 0, rcb->size - offs);
-}
+RBDAIOCB *acb = rcb->acb;
+iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
+   acb->qiov->size - offs);
 }
 
 #ifdef LIBRBD_SUPPORTS_ENCRYPTION
@@ -787,13 +767,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 
 g_free(rcb);
 
-if (!LIBRBD_USE_IOVEC) {
-if (acb->cmd == RBD_AIO_READ) {
-qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
-}
-qemu_vfree(acb->bounce);
-}
-
 acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
 
 qemu_aio_unref(acb);
@@ -1174,28 +1147,6 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB 
*rcb)
  rbd_finish_bh, rcb);
 }
 
-static int rbd_aio_discard_wrapper(rbd_image_t image,
-   uint64_t off,
-   uint64_t len,
-   rbd_completion_t comp)
-{
-#ifdef LIBRBD_SUPPORTS_DISCARD
-return rbd_aio_discard(image, off, len, comp);
-#else
-return -ENOTSUP;
-#endif
-}
-
-static int rbd_aio_flush_wrapper(rbd_image_t image,
- rbd_completion_t comp)
-{
-#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
-return rbd_aio_flush(image, comp);
-#else
-return -ENOTSUP;
-#endif
-}
-
 static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
  int64_t off,
  QEMUIOVector *qiov,
@@ -1218,21 +1169,6 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 rcb = g_new(RADOSCB, 1);
 
-if (!LIBRBD_USE_IOVEC) {
-if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
-acb->bounce = NULL;
-} else {
-acb->bounce = qemu_try_blockalign(bs, qiov->size);
-if (acb->bounce == NULL) {
-goto failed;
-}
-}
-if (cmd == RBD_AIO_WRITE) {
-qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-}
-rcb->buf = acb->bounce;
-}
-
 acb->ret = 0;
 acb->error = 0;
 acb->s = s;
@@ -1246,7 +1182,7 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 }
 
 switch (cmd) {
-case RBD_AIO_WRITE: {
+case RBD_AIO_WRITE:
 /*
  * RBD APIs don't allow us to write more than actual size, so in order
  * to support growing images, we resize the image before write
@@ -1258,25 +1194,16 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 goto failed_completion;
 }
 }
-#ifdef LIBRBD_SUPPORTS_IOVEC
-r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
-#else
-r = rbd_aio_write(s->image, off, size, rcb->buf, c);
-#endif
+r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
 break;
-}
 case RBD_AIO_READ:
-#ifdef LIBRBD_SUPPORTS_IOVEC
-r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
-#else
-r = rbd_aio_read(s->image, off, size, rcb->buf, c);
-#endif
+r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, 

[PATCH v5 4/6] block/rbd: migrate from aio to coroutines

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 252 +++-
 1 file changed, 90 insertions(+), 162 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index e2028d3db5ff..380ad28861ad 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -78,22 +78,6 @@ typedef enum {
 RBD_AIO_FLUSH
 } RBDAIOCmd;
 
-typedef struct RBDAIOCB {
-BlockAIOCB common;
-int64_t ret;
-QEMUIOVector *qiov;
-RBDAIOCmd cmd;
-int error;
-struct BDRVRBDState *s;
-} RBDAIOCB;
-
-typedef struct RADOSCB {
-RBDAIOCB *acb;
-struct BDRVRBDState *s;
-int64_t size;
-int64_t ret;
-} RADOSCB;
-
 typedef struct BDRVRBDState {
 rados_t cluster;
 rados_ioctx_t io_ctx;
@@ -105,6 +89,13 @@ typedef struct BDRVRBDState {
 uint64_t object_size;
 } BDRVRBDState;
 
+typedef struct RBDTask {
+BlockDriverState *bs;
+Coroutine *co;
+bool complete;
+int64_t ret;
+} RBDTask;
+
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
@@ -337,13 +328,6 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
char *keypairs_json,
 return ret;
 }
 
-static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
-{
-RBDAIOCB *acb = rcb->acb;
-iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
-   acb->qiov->size - offs);
-}
-
 #ifdef LIBRBD_SUPPORTS_ENCRYPTION
 static int qemu_rbd_convert_luks_options(
 RbdEncryptionOptionsLUKSBase *luks_opts,
@@ -733,46 +717,6 @@ exit:
 return ret;
 }
 
-/*
- * This aio completion is being called from rbd_finish_bh() and runs in qemu
- * BH context.
- */
-static void qemu_rbd_complete_aio(RADOSCB *rcb)
-{
-RBDAIOCB *acb = rcb->acb;
-int64_t r;
-
-r = rcb->ret;
-
-if (acb->cmd != RBD_AIO_READ) {
-if (r < 0) {
-acb->ret = r;
-acb->error = 1;
-} else if (!acb->error) {
-acb->ret = rcb->size;
-}
-} else {
-if (r < 0) {
-qemu_rbd_memset(rcb, 0);
-acb->ret = r;
-acb->error = 1;
-} else if (r < rcb->size) {
-qemu_rbd_memset(rcb, r);
-if (!acb->error) {
-acb->ret = rcb->size;
-}
-} else if (!acb->error) {
-acb->ret = r;
-}
-}
-
-g_free(rcb);
-
-acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
-
-qemu_aio_unref(acb);
-}
-
 static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 {
 const char **vals;
@@ -1122,89 +1066,59 @@ static int qemu_rbd_resize(BlockDriverState *bs, 
uint64_t size)
 return 0;
 }
 
-static const AIOCBInfo rbd_aiocb_info = {
-.aiocb_size = sizeof(RBDAIOCB),
-};
-
-static void rbd_finish_bh(void *opaque)
+static void qemu_rbd_finish_bh(void *opaque)
 {
-RADOSCB *rcb = opaque;
-qemu_rbd_complete_aio(rcb);
+RBDTask *task = opaque;
+task->complete = 1;
+aio_co_wake(task->co);
 }
 
 /*
- * This is the callback function for rbd_aio_read and _write
+ * This is the completion callback function for all rbd aio calls
+ * started from qemu_rbd_start_co().
  *
  * Note: this function is being called from a non qemu thread so
  * we need to be careful about what we do here. Generally we only
  * schedule a BH, and do the rest of the io completion handling
- * from rbd_finish_bh() which runs in a qemu context.
+ * from qemu_rbd_finish_bh() which runs in a qemu context.
  */
-static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
+static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
 {
-RBDAIOCB *acb = rcb->acb;
-
-rcb->ret = rbd_aio_get_return_value(c);
+task->ret = rbd_aio_get_return_value(c);
 rbd_aio_release(c);
-
-replay_bh_schedule_oneshot_event(bdrv_get_aio_context(acb->common.bs),
- rbd_finish_bh, rcb);
+aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
+qemu_rbd_finish_bh, task);
 }
 
-static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
- int64_t off,
- QEMUIOVector *qiov,
- int64_t size,
- BlockCompletionFunc *cb,
- void *opaque,
- RBDAIOCmd cmd)
+static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
+  uint64_t offset,
+  uint64_t bytes,
+  QEMUIOVector *qiov,
+  int flags,
+  RBDAIOCmd cmd)
 {
-RBDAIOCB *acb;
-RADOSCB *rcb = NULL;
+BDRVRBDState *s = bs->opaque;
+

[PATCH v5 2/6] block/rbd: store object_size in BDRVRBDState

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index b4b928bbb99f..1ebf8f7e4875 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -102,6 +102,7 @@ typedef struct BDRVRBDState {
 char *snap;
 char *namespace;
 uint64_t image_size;
+uint64_t object_size;
 } BDRVRBDState;
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -958,6 +959,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 const QDictEntry *e;
 Error *local_err = NULL;
 char *keypairs, *secretid;
+rbd_image_info_t info;
 int r;
 
 keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
@@ -1035,12 +1037,14 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 #endif
 }
 
-r = rbd_get_size(s->image, >image_size);
+r = rbd_stat(s->image, , sizeof(info));
 if (r < 0) {
-error_setg_errno(errp, -r, "error getting image size from %s",
+error_setg_errno(errp, -r, "error getting image info from %s",
  s->image_name);
 goto failed_post_open;
 }
+s->image_size = info.size;
+s->object_size = info.obj_size;
 
 /* If we are using an rbd snapshot, we must be r/o, otherwise
  * leave as-is */
@@ -1253,15 +1257,7 @@ static BlockAIOCB *qemu_rbd_aio_flush(BlockDriverState 
*bs,
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVRBDState *s = bs->opaque;
-rbd_image_info_t info;
-int r;
-
-r = rbd_stat(s->image, , sizeof(info));
-if (r < 0) {
-return r;
-}
-
-bdi->cluster_size = info.obj_size;
+bdi->cluster_size = s->object_size;
 return 0;
 }
 
-- 
2.19.2




[PATCH v5 3/6] block/rbd: update s->image_size in qemu_rbd_getlength

2021-07-02 Thread Ilya Dryomov
From: Peter Lieven 

While at it just call rbd_get_size and avoid rbd_image_info_t.

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 1ebf8f7e4875..e2028d3db5ff 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1304,15 +1304,14 @@ static ImageInfoSpecific 
*qemu_rbd_get_specific_info(BlockDriverState *bs,
 static int64_t qemu_rbd_getlength(BlockDriverState *bs)
 {
 BDRVRBDState *s = bs->opaque;
-rbd_image_info_t info;
 int r;
 
-r = rbd_stat(s->image, , sizeof(info));
+r = rbd_get_size(s->image, >image_size);
 if (r < 0) {
 return r;
 }
 
-return info.size;
+return s->image_size;
 }
 
 static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs,
-- 
2.19.2




[PATCH v5 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-02 Thread Ilya Dryomov
This series migrates the qemu rbd driver from the old aio emulation
to native coroutines and adds write zeroes support which is important
for block operations.

To achieve this we first bump the librbd requirement to the already
outdated luminous release of ceph to get rid of some wrappers and
ifdef'ry in the code.

V4->V5:
 - rebase on Kevin's block branch (https://repo.or.cz/qemu/kevin.git block)
   to resolve conflicts with "block/rbd: Add support for rbd image encryption"
   patch

V3->V4:
 - this patch is now rebased on top of current master
 - Patch 1: just mention librbd, tweak version numbers [Ilya]
 - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
 - Patch 4: retain comment about using a BH in the callback [Ilya]
 - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP 
[Ilya]

V2->V3:
 - this patch is now rebased on top of current master
 - Patch 1: only use cc.links and not cc.run to not break
   cross-compiling. [Kevin]
   Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
   support was dropped [Daniel]
 - Patch 4: dropped
 - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]

V1->V2:
 - this patch is now rebased on top of current master with Paolos
   upcoming fixes for the meson.build script included:
- meson: accept either shared or static libraries if --disable-static
- meson: honor --enable-rbd if cc.links test fails
 - Patch 1: adjusted to meson.build script
 - Patch 2: unchanged
 - Patch 3: new patch
 - Patch 4: do not implement empty detach_aio_context callback [Jason]
 - Patch 5: - fix aio completion cleanup in error case [Jason]
- return error codes from librbd
 - Patch 6: - add support for thick provisioning [Jason]
- do not set write zeroes alignment
 - Patch 7: new patch

Peter Lieven (6):
  block/rbd: bump librbd requirement to luminous release
  block/rbd: store object_size in BDRVRBDState
  block/rbd: update s->image_size in qemu_rbd_getlength
  block/rbd: migrate from aio to coroutines
  block/rbd: add write zeroes support
  block/rbd: drop qemu_rbd_refresh_limits

 block/rbd.c | 406 
 meson.build |   7 +-
 2 files changed, 128 insertions(+), 285 deletions(-)

-- 
2.19.2




[PATCH 3/3] hw/sd: Check for valid address range in SEND_WRITE_PROT (CMD30)

2021-07-02 Thread Philippe Mathieu-Daudé
OSS-Fuzz found sending illegal addresses when querying the write
protection bits triggers an assertion:

  qemu-fuzz-i386: hw/sd/sd.c:824: uint32_t sd_wpbits(SDState *, uint64_t): 
Assertion `wpnum < sd->wpgrps_size' failed.
  ==11578== ERROR: libFuzzer: deadly signal
  #8 0x7628e091 in __assert_fail
  #9 0x588f1a3c in sd_wpbits hw/sd/sd.c:824:9
  #10 0x588dd271 in sd_normal_command hw/sd/sd.c:1383:38
  #11 0x588d777c in sd_do_command hw/sd/sd.c
  #12 0x58cb25a0 in sdbus_do_command hw/sd/core.c:100:16
  #13 0x58e02a9a in sdhci_send_command hw/sd/sdhci.c:337:12
  #14 0x58dffa46 in sdhci_write hw/sd/sdhci.c:1187:9
  #15 0x598b9d76 in memory_region_write_accessor softmmu/memory.c:489:5

Similarly to commit 8573378e62d ("hw/sd: fix out-of-bounds check
for multi block reads"), check the address range before sending
the status of the write protection bits.

Include the qtest reproducer provided by Alexander Bulekov:

  $ make check-qtest-i386
  ...
  Running test qtest-i386/fuzz-sdcard-test
  qemu-system-i386: ../hw/sd/sd.c:824: sd_wpbits: Assertion `wpnum < 
sd->wpgrps_size' failed.

Reported-by: OSS-Fuzz (Issue 29225)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/450
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c |  5 +++
 tests/qtest/fuzz-sdcard-test.c | 66 ++
 MAINTAINERS|  3 +-
 tests/qtest/meson.build|  1 +
 4 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/fuzz-sdcard-test.c

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 9c8dd11bad1..c753ae24ba9 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1379,6 +1379,11 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 
 switch (sd->state) {
 case sd_transfer_state:
+if (!address_in_range(sd, "SEND_WRITE_PROT",
+  req.arg, sd->blk_len)) {
+return sd_r1;
+}
+
 sd->state = sd_sendingdata_state;
 *(uint32_t *) sd->data = sd_wpbits(sd, req.arg);
 sd->data_start = addr;
diff --git a/tests/qtest/fuzz-sdcard-test.c b/tests/qtest/fuzz-sdcard-test.c
new file mode 100644
index 000..96602eac7e5
--- /dev/null
+++ b/tests/qtest/fuzz-sdcard-test.c
@@ -0,0 +1,66 @@
+/*
+ * QTest fuzzer-generated testcase for sdcard device
+ *
+ * Copyright (c) 2021 Philippe Mathieu-Daudé 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqtest.h"
+
+/*
+ * https://gitlab.com/qemu-project/qemu/-/issues/450
+ * Used to trigger:
+ *  Assertion `wpnum < sd->wpgrps_size' failed.
+ */
+static void oss_fuzz_29225(void)
+{
+QTestState *s;
+
+s = qtest_init(" -display none -m 512m -nodefaults -nographic"
+   " -device sdhci-pci,sd-spec-version=3"
+   " -device sd-card,drive=d0"
+   " -drive if=none,index=0,file=null-co://,format=raw,id=d0");
+
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xd0690);
+qtest_outl(s, 0xcf8, 0x80001003);
+qtest_outl(s, 0xcf8, 0x80001013);
+qtest_outl(s, 0xcfc, 0x);
+qtest_outl(s, 0xcf8, 0x80001003);
+qtest_outl(s, 0xcfc, 0x3effe00);
+
+qtest_bufwrite(s, 0xff0d062c, "\xff", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\xb7", 0x1);
+qtest_bufwrite(s, 0xff0d060a, "\xc9", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\x29", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\xc2", 0x1);
+qtest_bufwrite(s, 0xff0d0628, "\xf7", 0x1);
+qtest_bufwrite(s, 0x0, "\xe3", 0x1);
+qtest_bufwrite(s, 0x7, "\x13", 0x1);
+qtest_bufwrite(s, 0x8, "\xe3", 0x1);
+qtest_bufwrite(s, 0xf, "\xe3", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\x03", 0x1);
+qtest_bufwrite(s, 0xff0d0605, "\x01", 0x1);
+qtest_bufwrite(s, 0xff0d060b, "\xff", 0x1);
+qtest_bufwrite(s, 0xff0d060c, "\xff", 0x1);
+qtest_bufwrite(s, 0xff0d060e, "\xff", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\x06", 0x1);
+qtest_bufwrite(s, 0xff0d060f, "\x9e", 0x1);
+
+qtest_quit(s);
+}
+
+int main(int argc, char **argv)
+{
+const char *arch = qtest_get_arch();
+
+g_test_init(, , NULL);
+
+   if (strcmp(arch, "i386") == 0) {
+qtest_add_func("fuzz/sdcard/oss_fuzz_29225", oss_fuzz_29225);
+   }
+
+   return g_test_run();
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index cfbf7ef79bc..fb33fe12200 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1794,7 +1794,8 @@ F: include/hw/sd/sd*
 F: hw/sd/core.c
 F: hw/sd/sd*
 F: hw/sd/ssi-sd.c
-F: tests/qtest/sd*
+F: tests/qtest/fuzz-sdcard-test.c
+F: tests/qtest/sdhci-test.c
 
 USB
 M: Gerd Hoffmann 
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index b03e8541700..1bb75ee7324 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -21,6 +21,7 @@
   (config_all_devices.has_key('CONFIG_MEGASAS_SCSI_PCI') ? 
['fuzz-megasas-test'] : []) + \
   

[PATCH 2/3] hw/sd: Extract address_in_range() helper, log invalid accesses

2021-07-02 Thread Philippe Mathieu-Daudé
Multiple commands have to check the address requested is valid.
Extract this code pattern as a new address_in_range() helper, and
log invalid accesses as guest errors.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Message-Id: <20210624142209.1193073-3-f4...@amsat.org>
---
 hw/sd/sd.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index d8fdf84f4db..9c8dd11bad1 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -937,6 +937,18 @@ static void sd_lock_command(SDState *sd)
 sd->card_status &= ~CARD_IS_LOCKED;
 }
 
+static bool address_in_range(SDState *sd, const char *desc,
+ uint64_t addr, uint32_t length)
+{
+if (addr + length > sd->size) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s offset %lu > card %lu [%%%u]\n",
+  desc, addr, sd->size, length);
+sd->card_status |= ADDRESS_ERROR;
+return false;
+}
+return true;
+}
+
 static sd_rsp_type_t sd_normal_command(SDState *sd, SDRequest req)
 {
 uint32_t rca = 0x;
@@ -1218,8 +1230,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 switch (sd->state) {
 case sd_transfer_state:
 
-if (addr + sd->blk_len > sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "READ_BLOCK", addr, sd->blk_len)) {
 return sd_r1;
 }
 
@@ -1264,8 +1275,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 switch (sd->state) {
 case sd_transfer_state:
 
-if (addr + sd->blk_len > sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "WRITE_BLOCK", addr, sd->blk_len)) {
 return sd_r1;
 }
 
@@ -1325,8 +1335,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 
 switch (sd->state) {
 case sd_transfer_state:
-if (addr >= sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "SET_WRITE_PROT", addr, 1)) {
 return sd_r1b;
 }
 
@@ -1348,8 +1357,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 
 switch (sd->state) {
 case sd_transfer_state:
-if (addr >= sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "CLR_WRITE_PROT", addr, 1)) {
 return sd_r1b;
 }
 
@@ -1826,8 +1834,8 @@ void sd_write_byte(SDState *sd, uint8_t value)
 case 25:   /* CMD25:  WRITE_MULTIPLE_BLOCK */
 if (sd->data_offset == 0) {
 /* Start of the block - let's check the address is valid */
-if (sd->data_start + sd->blk_len > sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "WRITE_MULTIPLE_BLOCK",
+  sd->data_start, sd->blk_len)) {
 break;
 }
 if (sd->size <= SDSC_MAX_CAPACITY) {
@@ -1999,8 +2007,8 @@ uint8_t sd_read_byte(SDState *sd)
 
 case 18:   /* CMD18:  READ_MULTIPLE_BLOCK */
 if (sd->data_offset == 0) {
-if (sd->data_start + io_len > sd->size) {
-sd->card_status |= ADDRESS_ERROR;
+if (!address_in_range(sd, "READ_MULTIPLE_BLOCK",
+  sd->data_start, io_len)) {
 return 0x00;
 }
 BLK_READ_BLOCK(sd->data_start, io_len);
-- 
2.31.1




[PATCH 1/3] hw/sd: When card is in wrong state, log which state it is

2021-07-02 Thread Philippe Mathieu-Daudé
We report the card is in an inconsistent state, but don't precise
in which state it is. Add this information, as it is useful when
debugging problems.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Message-Id: <20210624142209.1193073-2-f4...@amsat.org>
---
 hw/sd/sd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 282d39a7042..d8fdf84f4db 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1504,7 +1504,8 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 return sd_illegal;
 }
 
-qemu_log_mask(LOG_GUEST_ERROR, "SD: CMD%i in a wrong state\n", req.cmd);
+qemu_log_mask(LOG_GUEST_ERROR, "SD: CMD%i in a wrong state: %s\n",
+  req.cmd, sd_state_name(sd->state));
 return sd_illegal;
 }
 
-- 
2.31.1




[PATCH 0/3] hw/sd: Check for valid address range in SEND_WRITE_PROT (CMD30)

2021-07-02 Thread Philippe Mathieu-Daudé
Trivial fix for https://gitlab.com/qemu-project/qemu/-/issues/450

Missing review: patch #3

Philippe Mathieu-Daudé (3):
  hw/sd: When card is in wrong state, log which state it is
  hw/sd: Extract address_in_range() helper, log invalid accesses
  hw/sd: Check for valid address range in SEND_WRITE_PROT (CMD30)

 hw/sd/sd.c | 40 ++---
 tests/qtest/fuzz-sdcard-test.c | 66 ++
 MAINTAINERS|  3 +-
 tests/qtest/meson.build|  1 +
 4 files changed, 96 insertions(+), 14 deletions(-)
 create mode 100644 tests/qtest/fuzz-sdcard-test.c

-- 
2.31.1




Re: [PULL 00/24] Block layer patches

2021-07-02 Thread Peter Maydell
On Wed, 30 Jun 2021 at 17:02, Kevin Wolf  wrote:
>
> The following changes since commit 13d5f87cc3b94bfccc501142df4a7b12fee3a6e7:
>
>   Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-axp-20210628' 
> into staging (2021-06-29 10:02:42 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to a527e312b59ac382cb84af4b91f517a846f50705:
>
>   vhost-user-blk: Implement reconnection during realize (2021-06-30 13:21:22 
> +0200)
>
> 
> Block layer patches
>
> - Supporting changing 'file' in x-blockdev-reopen
> - ssh: add support for sha256 host key fingerprints
> - vhost-user-blk: Implement reconnection during realize
> - introduce QEMU_AUTO_VFREE
> - Don't require password of encrypted backing file for image creation
> - Code cleanups


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: [PATCH V4 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-02 Thread Peter Lieven



> Am 02.07.2021 um 14:46 schrieb Ilya Dryomov :
> 
> On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>> 
>> this series migrates the qemu rbd driver from the old aio emulation
>> to native coroutines and adds write zeroes support which is important
>> for block operations.
>> 
>> To achive this we first bump the librbd requirement to the already
>> outdated luminous release of ceph to get rid of some wrappers and
>> ifdef'ry in the code.
>> 
>> V4->V4:
>> - this patch is now rebased on top of current master
>> - Patch 1: just mention librbd, tweak version numbers [Ilya]
>> - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
>> - Patch 4: retain comment about using a BH in the callback [Ilya]
>> - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP 
>> [Ilya]
>> 
>> V2->V3:
>> - this patch is now rebased on top of current master
>> - Patch 1: only use cc.links and not cc.run to not break
>>   cross-compiling. [Kevin]
>>   Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
>>   support was dropped [Daniel]
>> - Patch 4: dropped
>> - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]
>> 
>> V1->V2:
>> - this patch is now rebased on top of current master with Paolos
>>   upcoming fixes for the meson.build script included:
>>- meson: accept either shared or static libraries if --disable-static
>>- meson: honor --enable-rbd if cc.links test fails
>> - Patch 1: adjusted to meson.build script
>> - Patch 2: unchanged
>> - Patch 3: new patch
>> - Patch 4: do not implement empty detach_aio_context callback [Jason]
>> - Patch 5: - fix aio completion cleanup in error case [Jason]
>>- return error codes from librbd
>> - Patch 6: - add support for thick provisioning [Jason]
>>- do not set write zeroes alignment
>> - Patch 7: new patch
>> 
>> Peter Lieven (6):
>>  block/rbd: bump librbd requirement to luminous release
>>  block/rbd: store object_size in BDRVRBDState
>>  block/rbd: update s->image_size in qemu_rbd_getlength
>>  block/rbd: migrate from aio to coroutines
>>  block/rbd: add write zeroes support
>>  block/rbd: drop qemu_rbd_refresh_limits
>> 
>> block/rbd.c | 406 
>> meson.build |   7 +-
>> 2 files changed, 128 insertions(+), 285 deletions(-)
>> 
>> --
>> 2.17.1
>> 
>> 
> 
> Looks good to me!
> 
> Kevin picked up Or's encryption patch, so there are a few simple
> conflicts with https://repo.or.cz/qemu/kevin.git block now.  Do you
> want to rebase on top of Kevin's block branch and repost with
> "Based-on: <20210627114635.39326-1-...@il.ibm.com>" or some such in
> the cover letter or should I?
> 

Please do, i am already ooo and off for vacation. I wasn’t aware of a conflict 
in Kevin’s git repo, sorry.

Peter

> Thanks,
> 
>Ilya





Re: [PATCH V4 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>
> this series migrates the qemu rbd driver from the old aio emulation
> to native coroutines and adds write zeroes support which is important
> for block operations.
>
> To achive this we first bump the librbd requirement to the already
> outdated luminous release of ceph to get rid of some wrappers and
> ifdef'ry in the code.
>
> V4->V4:
>  - this patch is now rebased on top of current master
>  - Patch 1: just mention librbd, tweak version numbers [Ilya]
>  - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
>  - Patch 4: retain comment about using a BH in the callback [Ilya]
>  - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP 
> [Ilya]
>
> V2->V3:
>  - this patch is now rebased on top of current master
>  - Patch 1: only use cc.links and not cc.run to not break
>cross-compiling. [Kevin]
>Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
>support was dropped [Daniel]
>  - Patch 4: dropped
>  - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]
>
> V1->V2:
>  - this patch is now rebased on top of current master with Paolos
>upcoming fixes for the meson.build script included:
> - meson: accept either shared or static libraries if --disable-static
> - meson: honor --enable-rbd if cc.links test fails
>  - Patch 1: adjusted to meson.build script
>  - Patch 2: unchanged
>  - Patch 3: new patch
>  - Patch 4: do not implement empty detach_aio_context callback [Jason]
>  - Patch 5: - fix aio completion cleanup in error case [Jason]
> - return error codes from librbd
>  - Patch 6: - add support for thick provisioning [Jason]
> - do not set write zeroes alignment
>  - Patch 7: new patch
>
> Peter Lieven (6):
>   block/rbd: bump librbd requirement to luminous release
>   block/rbd: store object_size in BDRVRBDState
>   block/rbd: update s->image_size in qemu_rbd_getlength
>   block/rbd: migrate from aio to coroutines
>   block/rbd: add write zeroes support
>   block/rbd: drop qemu_rbd_refresh_limits
>
>  block/rbd.c | 406 
>  meson.build |   7 +-
>  2 files changed, 128 insertions(+), 285 deletions(-)
>
> --
> 2.17.1
>
>

Looks good to me!

Kevin picked up Or's encryption patch, so there are a few simple
conflicts with https://repo.or.cz/qemu/kevin.git block now.  Do you
want to rebase on top of Kevin's block branch and repost with
"Based-on: <20210627114635.39326-1-...@il.ibm.com>" or some such in
the cover letter or should I?

Thanks,

Ilya



Re: [PATCH V4 5/6] block/rbd: add write zeroes support

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>
> this patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores 
> BDRV_REQ_MAY_UNMAP
> for older librbd versions.
>
> The rationale for this is as following (citing Ilya Dryomov current RBD 
> maintainer):
> ---8<---
> a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
>and as a consequence always unmap if librbd is too old
>
>It's not clear what qemu's expectation is but in general Write
>Zeroes is allowed to unmap.  The only guarantee is that subsequent
>reads return zeroes, everything else is a hint.  This is how it is
>specified in the kernel and in the NVMe spec.
>
>In particular, block/nvme.c implements it as follows:
>
>if (flags & BDRV_REQ_MAY_UNMAP) {
>cdw12 |= (1 << 25);
>}
>
>This sets the Deallocate bit.  But if it's not set, the device may
>still deallocate:
>
>"""
>If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
>command, and the namespace supports clearing all bytes to 0h in the
>values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
>from a deallocated logical block and its metadata (excluding
>protection information), then for each specified logical block, the
>controller:
>- should deallocate that logical block;
>
>...
>
>If the Deallocate bit is cleared to '0' in a Write Zeroes command,
>and the namespace supports clearing all bytes to 0h in the values
>read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
>a deallocated logical block and its metadata (excluding protection
>information), then, for each specified logical block, the
>controller:
>- may deallocate that logical block;
>"""
>
>
> https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
>
> b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
>
>Again, it's not clear what qemu expects here, but without it we end
>up in a ridiculous situation where specifying the "don't allow slow
>fallback" switch immediately fails all efficient zeroing requests on
>a device where Write Zeroes is always efficient:
>
>$ qemu-io -c 'help write' | grep -- '-[zun]'
> -n, -- with -z, don't allow slow fallback
> -u, -- with -z, allow unmapping
> -z, -- write zeroes using blk_co_pwrite_zeroes
>
>$ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
>write failed: Operation not supported
> --->8---
>
> Signed-off-by: Peter Lieven 
> ---
>  block/rbd.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index be0471944a..149317d33c 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -63,7 +63,8 @@ typedef enum {
>  RBD_AIO_READ,
>  RBD_AIO_WRITE,
>  RBD_AIO_DISCARD,
> -RBD_AIO_FLUSH
> +RBD_AIO_FLUSH,
> +RBD_AIO_WRITE_ZEROES
>  } RBDAIOCmd;
>
>  typedef struct BDRVRBDState {
> @@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  }
>  }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
> +#endif
> +
>  /* When extending regular files, we get zeros from the OS */
>  bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
>
> @@ -827,6 +832,18 @@ static int coroutine_fn 
> qemu_rbd_start_co(BlockDriverState *bs,
>  case RBD_AIO_FLUSH:
>  r = rbd_aio_flush(s->image, c);
>  break;
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +case RBD_AIO_WRITE_ZEROES: {
> +int zero_flags = 0;
> +#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
> +if (!(flags & BDRV_REQ_MAY_UNMAP)) {
> +zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
> +}
> +#endif
> +r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
> +break;
> +}
> +#endif
>  default:
>  r = -EINVAL;
>  }
> @@ -897,6 +914,16 @@ static int coroutine_fn 
> qemu_rbd_co_pdiscard(BlockDriverState *bs,
>  return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
>  }
>
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +static int
> +coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> +  int count, BdrvRequestFlags flags)
> +{
> +return qemu_rbd_start_co(bs, offset, count, NULL, flags,
> + RBD_AIO_WRITE_ZEROES);
> +}
> +#endif
> +
>  static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
>  {
>  BDRVRBDState *s = bs->opaque;
> @@ -1120,6 +1147,9 @@ static BlockDriver bdrv_rbd = {
>  .bdrv_co_pwritev= qemu_rbd_co_pwritev,
>  .bdrv_co_flush_to_disk  = qemu_rbd_co_flush,
>  .bdrv_co_pdiscard   = qemu_rbd_co_pdiscard,
> +#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
> +.bdrv_co_pwrite_zeroes  = qemu_rbd_co_pwrite_zeroes,
> 

Re: [PATCH v2] block/rbd: Add support for rbd image encryption

2021-07-02 Thread Kevin Wolf
Am 27.06.2021 um 15:46 hat Ilya Dryomov geschrieben:
> On Sun, Jun 27, 2021 at 1:46 PM Or Ozeri  wrote:
> >
> > Starting from ceph Pacific, RBD has built-in support for image-level 
> > encryption.
> > Currently supported formats are LUKS version 1 and 2.
> >
> > There are 2 new relevant librbd APIs for controlling encryption, both 
> > expect an
> > open image context:
> >
> > rbd_encryption_format: formats an image (i.e. writes the LUKS header)
> > rbd_encryption_load: loads encryptor/decryptor to the image IO stack
> >
> > This commit extends the qemu rbd driver API to support the above.
> >
> > Signed-off-by: Or Ozeri 
> > ---
> > v2: handle encryption info only for the case where encryption is not loaded
> > ---
> >  block/rbd.c  | 361 ++-
> >  qapi/block-core.json | 110 -
> >  2 files changed, 465 insertions(+), 6 deletions(-)
> >
> > diff --git a/block/rbd.c b/block/rbd.c
> > index f098a89c7b..8edd1be49e 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -73,6 +73,18 @@
> >  #define LIBRBD_USE_IOVEC 0
> >  #endif
> >
> > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> > +
> > +static const char rbd_luks_header_verification[
> > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> > +};
> > +
> > +static const char rbd_luks2_header_verification[
> > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> > +};
> > +
> >  typedef enum {
> >  RBD_AIO_READ,
> >  RBD_AIO_WRITE,
> > @@ -341,6 +353,203 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t 
> > offs)
> >  }
> >  }
> >
> > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> > +static int qemu_rbd_convert_luks_options(
> > +RbdEncryptionOptionsLUKSBase *luks_opts,
> > +char **passphrase,
> > +size_t *passphrase_len,
> > +Error **errp)
> > +{
> > +return qcrypto_secret_lookup(luks_opts->key_secret, (uint8_t 
> > **)passphrase,
> > + passphrase_len, errp);
> > +}
> > +
> > +static int qemu_rbd_convert_luks_create_options(
> > +RbdEncryptionCreateOptionsLUKSBase *luks_opts,
> > +rbd_encryption_algorithm_t *alg,
> > +char **passphrase,
> > +size_t *passphrase_len,
> > +Error **errp)
> > +{
> > +int r = 0;
> > +
> > +r = qemu_rbd_convert_luks_options(
> > +qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
> > +passphrase, passphrase_len, errp);
> > +if (r < 0) {
> > +return r;
> > +}
> > +
> > +if (luks_opts->has_cipher_alg) {
> > +switch (luks_opts->cipher_alg) {
> > +case QCRYPTO_CIPHER_ALG_AES_128: {
> > +*alg = RBD_ENCRYPTION_ALGORITHM_AES128;
> > +break;
> > +}
> > +case QCRYPTO_CIPHER_ALG_AES_256: {
> > +*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
> > +break;
> > +}
> > +default: {
> > +r = -ENOTSUP;
> > +error_setg_errno(errp, -r, "unknown encryption algorithm: 
> > %u",
> > + luks_opts->cipher_alg);
> > +return r;
> > +}
> > +}
> > +} else {
> > +/* default alg */
> > +*alg = RBD_ENCRYPTION_ALGORITHM_AES256;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static int qemu_rbd_encryption_format(rbd_image_t image,
> > +  RbdEncryptionCreateOptions *encrypt,
> > +  Error **errp)
> > +{
> > +int r = 0;
> > +g_autofree char *passphrase = NULL;
> > +size_t passphrase_len;
> > +rbd_encryption_format_t format;
> > +rbd_encryption_options_t opts;
> > +rbd_encryption_luks1_format_options_t luks_opts;
> > +rbd_encryption_luks2_format_options_t luks2_opts;
> > +size_t opts_size;
> > +uint64_t raw_size, effective_size;
> > +
> > +r = rbd_get_size(image, _size);
> > +if (r < 0) {
> > +error_setg_errno(errp, -r, "cannot get raw image size");
> > +return r;
> > +}
> > +
> > +switch (encrypt->format) {
> > +case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
> > +memset(_opts, 0, sizeof(luks_opts));
> > +format = RBD_ENCRYPTION_FORMAT_LUKS1;
> > +opts = _opts;
> > +opts_size = sizeof(luks_opts);
> > +r = qemu_rbd_convert_luks_create_options(
> > +
> > qapi_RbdEncryptionCreateOptionsLUKS_base(>u.luks),
> > +_opts.alg, , _len, errp);
> > +if (r < 0) {
> > +return r;
> > +}
> > +luks_opts.passphrase = passphrase;
> > +luks_opts.passphrase_size = passphrase_len;
> > +break;
> > +}
> > +case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
> > +

Re: [PATCH] MAINTAINERS: update block/rbd.c maintainer

2021-07-02 Thread Kevin Wolf
Am 19.05.2021 um 13:25 hat Ilya Dryomov geschrieben:
> Jason has moved on from working on RBD and Ceph.  I'm taking over
> his role upstream.
> 
> Signed-off-by: Ilya Dryomov 

Thanks, applied to the block branch.

Kevin




Re: [PATCH V4 4/6] block/rbd: migrate from aio to coroutines

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>
> Signed-off-by: Peter Lieven 
> ---
>  block/rbd.c | 252 +++-
>  1 file changed, 90 insertions(+), 162 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 1f8dc84079..be0471944a 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -66,22 +66,6 @@ typedef enum {
>  RBD_AIO_FLUSH
>  } RBDAIOCmd;
>
> -typedef struct RBDAIOCB {
> -BlockAIOCB common;
> -int64_t ret;
> -QEMUIOVector *qiov;
> -RBDAIOCmd cmd;
> -int error;
> -struct BDRVRBDState *s;
> -} RBDAIOCB;
> -
> -typedef struct RADOSCB {
> -RBDAIOCB *acb;
> -struct BDRVRBDState *s;
> -int64_t size;
> -int64_t ret;
> -} RADOSCB;
> -
>  typedef struct BDRVRBDState {
>  rados_t cluster;
>  rados_ioctx_t io_ctx;
> @@ -93,6 +77,13 @@ typedef struct BDRVRBDState {
>  uint64_t object_size;
>  } BDRVRBDState;
>
> +typedef struct RBDTask {
> +BlockDriverState *bs;
> +Coroutine *co;
> +bool complete;
> +int64_t ret;
> +} RBDTask;
> +
>  static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
>  BlockdevOptionsRbd *opts, bool cache,
>  const char *keypairs, const char *secretid,
> @@ -325,13 +316,6 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
> char *keypairs_json,
>  return ret;
>  }
>
> -static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
> -{
> -RBDAIOCB *acb = rcb->acb;
> -iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
> -   acb->qiov->size - offs);
> -}
> -
>  /* FIXME Deprecate and remove keypairs or make it available in QMP. */
>  static int qemu_rbd_do_create(BlockdevCreateOptions *options,
>const char *keypairs, const char 
> *password_secret,
> @@ -450,46 +434,6 @@ exit:
>  return ret;
>  }
>
> -/*
> - * This aio completion is being called from rbd_finish_bh() and runs in qemu
> - * BH context.
> - */
> -static void qemu_rbd_complete_aio(RADOSCB *rcb)
> -{
> -RBDAIOCB *acb = rcb->acb;
> -int64_t r;
> -
> -r = rcb->ret;
> -
> -if (acb->cmd != RBD_AIO_READ) {
> -if (r < 0) {
> -acb->ret = r;
> -acb->error = 1;
> -} else if (!acb->error) {
> -acb->ret = rcb->size;
> -}
> -} else {
> -if (r < 0) {
> -qemu_rbd_memset(rcb, 0);
> -acb->ret = r;
> -acb->error = 1;
> -} else if (r < rcb->size) {
> -qemu_rbd_memset(rcb, r);
> -if (!acb->error) {
> -acb->ret = rcb->size;
> -}
> -} else if (!acb->error) {
> -acb->ret = r;
> -}
> -}
> -
> -g_free(rcb);
> -
> -acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
> -
> -qemu_aio_unref(acb);
> -}
> -
>  static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
>  {
>  const char **vals;
> @@ -826,89 +770,59 @@ static int qemu_rbd_resize(BlockDriverState *bs, 
> uint64_t size)
>  return 0;
>  }
>
> -static const AIOCBInfo rbd_aiocb_info = {
> -.aiocb_size = sizeof(RBDAIOCB),
> -};
> -
> -static void rbd_finish_bh(void *opaque)
> +static void qemu_rbd_finish_bh(void *opaque)
>  {
> -RADOSCB *rcb = opaque;
> -qemu_rbd_complete_aio(rcb);
> +RBDTask *task = opaque;
> +task->complete = 1;
> +aio_co_wake(task->co);
>  }
>
>  /*
> - * This is the callback function for rbd_aio_read and _write
> + * This is the completion callback function for all rbd aio calls
> + * started from qemu_rbd_start_co().
>   *
>   * Note: this function is being called from a non qemu thread so
>   * we need to be careful about what we do here. Generally we only
>   * schedule a BH, and do the rest of the io completion handling
> - * from rbd_finish_bh() which runs in a qemu context.
> + * from qemu_rbd_finish_bh() which runs in a qemu context.
>   */
> -static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
> +static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
>  {
> -RBDAIOCB *acb = rcb->acb;
> -
> -rcb->ret = rbd_aio_get_return_value(c);
> +task->ret = rbd_aio_get_return_value(c);
>  rbd_aio_release(c);
> -
> -replay_bh_schedule_oneshot_event(bdrv_get_aio_context(acb->common.bs),
> - rbd_finish_bh, rcb);
> +aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
> +qemu_rbd_finish_bh, task);
>  }
>
> -static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
> - int64_t off,
> - QEMUIOVector *qiov,
> - int64_t size,
> - BlockCompletionFunc *cb,
> - void *opaque,
> - RBDAIOCmd cmd)
> +static int coroutine_fn 

Re: [PATCH V4 3/6] block/rbd: update s->image_size in qemu_rbd_getlength

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>
> while at it just call rbd_get_size and avoid rbd_stat.
>
> Signed-off-by: Peter Lieven 
> ---
>  block/rbd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index b4caea4f1b..1f8dc84079 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -968,15 +968,14 @@ static int qemu_rbd_getinfo(BlockDriverState *bs, 
> BlockDriverInfo *bdi)
>  static int64_t qemu_rbd_getlength(BlockDriverState *bs)
>  {
>  BDRVRBDState *s = bs->opaque;
> -rbd_image_info_t info;
>  int r;
>
> -r = rbd_stat(s->image, , sizeof(info));
> +r = rbd_get_size(s->image, >image_size);
>  if (r < 0) {
>  return r;
>  }
>
> -return info.size;
> +return s->image_size;
>  }
>
>  static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs,
> --
> 2.17.1
>
>

Reviewed-by: Ilya Dryomov 

Thanks,

Ilya



Re: [PATCH V4 1/6] block/rbd: bump librbd requirement to luminous release

2021-07-02 Thread Ilya Dryomov
On Fri, Jul 2, 2021 at 11:09 AM Peter Lieven  wrote:
>
> even luminous (version 12.2) is unmaintained for over 3 years now.
> Bump the requirement to get rid of the ifdef'ry in the code.
> Qemu 6.1 dropped the support for RHEL-7 which was the last supported
> OS that required an older librbd.
>
> Signed-off-by: Peter Lieven 
> ---
>  block/rbd.c | 120 
>  meson.build |   7 ++-
>  2 files changed, 13 insertions(+), 114 deletions(-)
>
> diff --git a/block/rbd.c b/block/rbd.c
> index 26f64cce7c..6b1cbe1d75 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -55,24 +55,10 @@
>   * leading "\".
>   */
>
> -/* rbd_aio_discard added in 0.1.2 */
> -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2)
> -#define LIBRBD_SUPPORTS_DISCARD
> -#else
> -#undef LIBRBD_SUPPORTS_DISCARD
> -#endif
> -
>  #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
>
>  #define RBD_MAX_SNAPS 100
>
> -/* The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h */
> -#ifdef LIBRBD_SUPPORTS_IOVEC
> -#define LIBRBD_USE_IOVEC 1
> -#else
> -#define LIBRBD_USE_IOVEC 0
> -#endif
> -
>  typedef enum {
>  RBD_AIO_READ,
>  RBD_AIO_WRITE,
> @@ -84,7 +70,6 @@ typedef struct RBDAIOCB {
>  BlockAIOCB common;
>  int64_t ret;
>  QEMUIOVector *qiov;
> -char *bounce;
>  RBDAIOCmd cmd;
>  int error;
>  struct BDRVRBDState *s;
> @@ -94,7 +79,6 @@ typedef struct RADOSCB {
>  RBDAIOCB *acb;
>  struct BDRVRBDState *s;
>  int64_t size;
> -char *buf;
>  int64_t ret;
>  } RADOSCB;
>
> @@ -342,13 +326,9 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
> char *keypairs_json,
>
>  static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
>  {
> -if (LIBRBD_USE_IOVEC) {
> -RBDAIOCB *acb = rcb->acb;
> -iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
> -   acb->qiov->size - offs);
> -} else {
> -memset(rcb->buf + offs, 0, rcb->size - offs);
> -}
> +RBDAIOCB *acb = rcb->acb;
> +iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
> +   acb->qiov->size - offs);
>  }
>
>  /* FIXME Deprecate and remove keypairs or make it available in QMP. */
> @@ -504,13 +484,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
>
>  g_free(rcb);
>
> -if (!LIBRBD_USE_IOVEC) {
> -if (acb->cmd == RBD_AIO_READ) {
> -qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
> -}
> -qemu_vfree(acb->bounce);
> -}
> -
>  acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
>
>  qemu_aio_unref(acb);
> @@ -878,28 +851,6 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB 
> *rcb)
>   rbd_finish_bh, rcb);
>  }
>
> -static int rbd_aio_discard_wrapper(rbd_image_t image,
> -   uint64_t off,
> -   uint64_t len,
> -   rbd_completion_t comp)
> -{
> -#ifdef LIBRBD_SUPPORTS_DISCARD
> -return rbd_aio_discard(image, off, len, comp);
> -#else
> -return -ENOTSUP;
> -#endif
> -}
> -
> -static int rbd_aio_flush_wrapper(rbd_image_t image,
> - rbd_completion_t comp)
> -{
> -#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
> -return rbd_aio_flush(image, comp);
> -#else
> -return -ENOTSUP;
> -#endif
> -}
> -
>  static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
>   int64_t off,
>   QEMUIOVector *qiov,
> @@ -922,21 +873,6 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
>
>  rcb = g_new(RADOSCB, 1);
>
> -if (!LIBRBD_USE_IOVEC) {
> -if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
> -acb->bounce = NULL;
> -} else {
> -acb->bounce = qemu_try_blockalign(bs, qiov->size);
> -if (acb->bounce == NULL) {
> -goto failed;
> -}
> -}
> -if (cmd == RBD_AIO_WRITE) {
> -qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
> -}
> -rcb->buf = acb->bounce;
> -}
> -
>  acb->ret = 0;
>  acb->error = 0;
>  acb->s = s;
> @@ -950,7 +886,7 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
>  }
>
>  switch (cmd) {
> -case RBD_AIO_WRITE: {
> +case RBD_AIO_WRITE:
>  /*
>   * RBD APIs don't allow us to write more than actual size, so in 
> order
>   * to support growing images, we resize the image before write
> @@ -962,25 +898,16 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
>  goto failed_completion;
>  }
>  }
> -#ifdef LIBRBD_SUPPORTS_IOVEC
> -r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
> -#else
> -r = rbd_aio_write(s->image, off, size, rcb->buf, c);
> -#endif
> +r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
>  break;
> 

Re: [PULL 0/7] crypto patches

2021-07-02 Thread Peter Maydell
On Wed, 30 Jun 2021 at 13:02, Daniel P. Berrangé  wrote:
>
> The following changes since commit 13d5f87cc3b94bfccc501142df4a7b12fee3a6e7:
>
>   Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-axp-20210628' 
> into staging (2021-06-29 10:02:42 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/berrange/qemu tags/tls-deps-pull-request
>
> for you to fetch changes up to 678bcc3c2cf22262d0a72b52da57737c4a40e040:
>
>   crypto: Make QCryptoTLSCreds* structures private (2021-06-29 18:30:24 +0100)
>
> 
> Hide build time dependancy on gnutls fom non-crypto code
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



[PATCH V4 5/6] block/rbd: add write zeroes support

2021-07-02 Thread Peter Lieven
this patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores 
BDRV_REQ_MAY_UNMAP
for older librbd versions.

The rationale for this is as following (citing Ilya Dryomov current RBD 
maintainer):
---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes()
   and as a consequence always unmap if librbd is too old

   It's not clear what qemu's expectation is but in general Write
   Zeroes is allowed to unmap.  The only guarantee is that subsequent
   reads return zeroes, everything else is a hint.  This is how it is
   specified in the kernel and in the NVMe spec.

   In particular, block/nvme.c implements it as follows:

   if (flags & BDRV_REQ_MAY_UNMAP) {
   cdw12 |= (1 << 25);
   }

   This sets the Deallocate bit.  But if it's not set, the device may
   still deallocate:

   """
   If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes
   command, and the namespace supports clearing all bytes to 0h in the
   values read (e.g., bits 2:0 in the DLFEAT field are set to 001b)
   from a deallocated logical block and its metadata (excluding
   protection information), then for each specified logical block, the
   controller:
   - should deallocate that logical block;

   ...

   If the Deallocate bit is cleared to '0' in a Write Zeroes command,
   and the namespace supports clearing all bytes to 0h in the values
   read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from
   a deallocated logical block and its metadata (excluding protection
   information), then, for each specified logical block, the
   controller:
   - may deallocate that logical block;
   """

   
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf

b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags

   Again, it's not clear what qemu expects here, but without it we end
   up in a ridiculous situation where specifying the "don't allow slow
   fallback" switch immediately fails all efficient zeroing requests on
   a device where Write Zeroes is always efficient:

   $ qemu-io -c 'help write' | grep -- '-[zun]'
-n, -- with -z, don't allow slow fallback
-u, -- with -z, allow unmapping
-z, -- write zeroes using blk_co_pwrite_zeroes

   $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
   write failed: Operation not supported
--->8---

Signed-off-by: Peter Lieven 
---
 block/rbd.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index be0471944a..149317d33c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -63,7 +63,8 @@ typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
 RBD_AIO_DISCARD,
-RBD_AIO_FLUSH
+RBD_AIO_FLUSH,
+RBD_AIO_WRITE_ZEROES
 } RBDAIOCmd;
 
 typedef struct BDRVRBDState {
@@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
+#endif
+
 /* When extending regular files, we get zeros from the OS */
 bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
 
@@ -827,6 +832,18 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState 
*bs,
 case RBD_AIO_FLUSH:
 r = rbd_aio_flush(s->image, c);
 break;
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+case RBD_AIO_WRITE_ZEROES: {
+int zero_flags = 0;
+#ifdef RBD_WRITE_ZEROES_FLAG_THICK_PROVISION
+if (!(flags & BDRV_REQ_MAY_UNMAP)) {
+zero_flags = RBD_WRITE_ZEROES_FLAG_THICK_PROVISION;
+}
+#endif
+r = rbd_aio_write_zeroes(s->image, offset, bytes, c, zero_flags, 0);
+break;
+}
+#endif
 default:
 r = -EINVAL;
 }
@@ -897,6 +914,16 @@ static int coroutine_fn 
qemu_rbd_co_pdiscard(BlockDriverState *bs,
 return qemu_rbd_start_co(bs, offset, count, NULL, 0, RBD_AIO_DISCARD);
 }
 
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+static int
+coroutine_fn qemu_rbd_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+  int count, BdrvRequestFlags flags)
+{
+return qemu_rbd_start_co(bs, offset, count, NULL, flags,
+ RBD_AIO_WRITE_ZEROES);
+}
+#endif
+
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVRBDState *s = bs->opaque;
@@ -1120,6 +1147,9 @@ static BlockDriver bdrv_rbd = {
 .bdrv_co_pwritev= qemu_rbd_co_pwritev,
 .bdrv_co_flush_to_disk  = qemu_rbd_co_flush,
 .bdrv_co_pdiscard   = qemu_rbd_co_pdiscard,
+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+.bdrv_co_pwrite_zeroes  = qemu_rbd_co_pwrite_zeroes,
+#endif
 
 .bdrv_snapshot_create   = qemu_rbd_snap_create,
 .bdrv_snapshot_delete   = qemu_rbd_snap_remove,
-- 
2.17.1





[PATCH V4 4/6] block/rbd: migrate from aio to coroutines

2021-07-02 Thread Peter Lieven
Signed-off-by: Peter Lieven 
---
 block/rbd.c | 252 +++-
 1 file changed, 90 insertions(+), 162 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 1f8dc84079..be0471944a 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -66,22 +66,6 @@ typedef enum {
 RBD_AIO_FLUSH
 } RBDAIOCmd;
 
-typedef struct RBDAIOCB {
-BlockAIOCB common;
-int64_t ret;
-QEMUIOVector *qiov;
-RBDAIOCmd cmd;
-int error;
-struct BDRVRBDState *s;
-} RBDAIOCB;
-
-typedef struct RADOSCB {
-RBDAIOCB *acb;
-struct BDRVRBDState *s;
-int64_t size;
-int64_t ret;
-} RADOSCB;
-
 typedef struct BDRVRBDState {
 rados_t cluster;
 rados_ioctx_t io_ctx;
@@ -93,6 +77,13 @@ typedef struct BDRVRBDState {
 uint64_t object_size;
 } BDRVRBDState;
 
+typedef struct RBDTask {
+BlockDriverState *bs;
+Coroutine *co;
+bool complete;
+int64_t ret;
+} RBDTask;
+
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 BlockdevOptionsRbd *opts, bool cache,
 const char *keypairs, const char *secretid,
@@ -325,13 +316,6 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
char *keypairs_json,
 return ret;
 }
 
-static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
-{
-RBDAIOCB *acb = rcb->acb;
-iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
-   acb->qiov->size - offs);
-}
-
 /* FIXME Deprecate and remove keypairs or make it available in QMP. */
 static int qemu_rbd_do_create(BlockdevCreateOptions *options,
   const char *keypairs, const char 
*password_secret,
@@ -450,46 +434,6 @@ exit:
 return ret;
 }
 
-/*
- * This aio completion is being called from rbd_finish_bh() and runs in qemu
- * BH context.
- */
-static void qemu_rbd_complete_aio(RADOSCB *rcb)
-{
-RBDAIOCB *acb = rcb->acb;
-int64_t r;
-
-r = rcb->ret;
-
-if (acb->cmd != RBD_AIO_READ) {
-if (r < 0) {
-acb->ret = r;
-acb->error = 1;
-} else if (!acb->error) {
-acb->ret = rcb->size;
-}
-} else {
-if (r < 0) {
-qemu_rbd_memset(rcb, 0);
-acb->ret = r;
-acb->error = 1;
-} else if (r < rcb->size) {
-qemu_rbd_memset(rcb, r);
-if (!acb->error) {
-acb->ret = rcb->size;
-}
-} else if (!acb->error) {
-acb->ret = r;
-}
-}
-
-g_free(rcb);
-
-acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
-
-qemu_aio_unref(acb);
-}
-
 static char *qemu_rbd_mon_host(BlockdevOptionsRbd *opts, Error **errp)
 {
 const char **vals;
@@ -826,89 +770,59 @@ static int qemu_rbd_resize(BlockDriverState *bs, uint64_t 
size)
 return 0;
 }
 
-static const AIOCBInfo rbd_aiocb_info = {
-.aiocb_size = sizeof(RBDAIOCB),
-};
-
-static void rbd_finish_bh(void *opaque)
+static void qemu_rbd_finish_bh(void *opaque)
 {
-RADOSCB *rcb = opaque;
-qemu_rbd_complete_aio(rcb);
+RBDTask *task = opaque;
+task->complete = 1;
+aio_co_wake(task->co);
 }
 
 /*
- * This is the callback function for rbd_aio_read and _write
+ * This is the completion callback function for all rbd aio calls
+ * started from qemu_rbd_start_co().
  *
  * Note: this function is being called from a non qemu thread so
  * we need to be careful about what we do here. Generally we only
  * schedule a BH, and do the rest of the io completion handling
- * from rbd_finish_bh() which runs in a qemu context.
+ * from qemu_rbd_finish_bh() which runs in a qemu context.
  */
-static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
+static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
 {
-RBDAIOCB *acb = rcb->acb;
-
-rcb->ret = rbd_aio_get_return_value(c);
+task->ret = rbd_aio_get_return_value(c);
 rbd_aio_release(c);
-
-replay_bh_schedule_oneshot_event(bdrv_get_aio_context(acb->common.bs),
- rbd_finish_bh, rcb);
+aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
+qemu_rbd_finish_bh, task);
 }
 
-static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
- int64_t off,
- QEMUIOVector *qiov,
- int64_t size,
- BlockCompletionFunc *cb,
- void *opaque,
- RBDAIOCmd cmd)
+static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
+  uint64_t offset,
+  uint64_t bytes,
+  QEMUIOVector *qiov,
+  int flags,
+  RBDAIOCmd cmd)
 {
-RBDAIOCB *acb;
-RADOSCB *rcb = NULL;
+

[PATCH V4 2/6] block/rbd: store object_size in BDRVRBDState

2021-07-02 Thread Peter Lieven
Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 6b1cbe1d75..b4caea4f1b 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -90,6 +90,7 @@ typedef struct BDRVRBDState {
 char *snap;
 char *namespace;
 uint64_t image_size;
+uint64_t object_size;
 } BDRVRBDState;
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -675,6 +676,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 const QDictEntry *e;
 Error *local_err = NULL;
 char *keypairs, *secretid;
+rbd_image_info_t info;
 int r;
 
 keypairs = g_strdup(qdict_get_try_str(options, "=keyvalue-pairs"));
@@ -739,13 +741,15 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto failed_open;
 }
 
-r = rbd_get_size(s->image, >image_size);
+r = rbd_stat(s->image, , sizeof(info));
 if (r < 0) {
-error_setg_errno(errp, -r, "error getting image size from %s",
+error_setg_errno(errp, -r, "error getting image info from %s",
  s->image_name);
 rbd_close(s->image);
 goto failed_open;
 }
+s->image_size = info.size;
+s->object_size = info.obj_size;
 
 /* If we are using an rbd snapshot, we must be r/o, otherwise
  * leave as-is */
@@ -957,15 +961,7 @@ static BlockAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs,
 static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
 BDRVRBDState *s = bs->opaque;
-rbd_image_info_t info;
-int r;
-
-r = rbd_stat(s->image, , sizeof(info));
-if (r < 0) {
-return r;
-}
-
-bdi->cluster_size = info.obj_size;
+bdi->cluster_size = s->object_size;
 return 0;
 }
 
-- 
2.17.1





[PATCH V4 6/6] block/rbd: drop qemu_rbd_refresh_limits

2021-07-02 Thread Peter Lieven
librbd supports 1 byte alignment for all aio operations.

Currently, there is no API call to query limits from the ceph backend.
So drop the bdrv_refresh_limits completely until there is such an API call.

Signed-off-by: Peter Lieven 
Reviewed-by: Ilya Dryomov 
---
 block/rbd.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 149317d33c..93f4bc8b93 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -228,14 +228,6 @@ done:
 return;
 }
 
-
-static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
-{
-/* XXX Does RBD support AIO on less than 512-byte alignment? */
-bs->bl.request_alignment = 512;
-}
-
-
 static int qemu_rbd_set_auth(rados_t cluster, BlockdevOptionsRbd *opts,
  Error **errp)
 {
@@ -1130,7 +1122,6 @@ static BlockDriver bdrv_rbd = {
 .format_name= "rbd",
 .instance_size  = sizeof(BDRVRBDState),
 .bdrv_parse_filename= qemu_rbd_parse_filename,
-.bdrv_refresh_limits= qemu_rbd_refresh_limits,
 .bdrv_file_open = qemu_rbd_open,
 .bdrv_close = qemu_rbd_close,
 .bdrv_reopen_prepare= qemu_rbd_reopen_prepare,
-- 
2.17.1





[PATCH V4 0/6] block/rbd: migrate to coroutines and add write zeroes support

2021-07-02 Thread Peter Lieven
this series migrates the qemu rbd driver from the old aio emulation
to native coroutines and adds write zeroes support which is important
for block operations.

To achive this we first bump the librbd requirement to the already
outdated luminous release of ceph to get rid of some wrappers and
ifdef'ry in the code.

V4->V4:
 - this patch is now rebased on top of current master
 - Patch 1: just mention librbd, tweak version numbers [Ilya]
 - Patch 3: use rbd_get_size instead of rbd_stat [Ilya]
 - Patch 4: retain comment about using a BH in the callback [Ilya]
 - Patch 5: set BDRV_REQ_NO_FALLBACK and silently ignore BDRV_REQ_MAY_UNMAP 
[Ilya]

V2->V3:
 - this patch is now rebased on top of current master
 - Patch 1: only use cc.links and not cc.run to not break
   cross-compiling. [Kevin]
   Since Qemu 6.1 its okay to rely on librbd >= 12.x since RHEL-7
   support was dropped [Daniel]
 - Patch 4: dropped
 - Patch 5: store BDS in RBDTask and use bdrv_get_aio_context() [Kevin]

V1->V2:
 - this patch is now rebased on top of current master with Paolos
   upcoming fixes for the meson.build script included:
- meson: accept either shared or static libraries if --disable-static
- meson: honor --enable-rbd if cc.links test fails
 - Patch 1: adjusted to meson.build script
 - Patch 2: unchanged
 - Patch 3: new patch
 - Patch 4: do not implement empty detach_aio_context callback [Jason]
 - Patch 5: - fix aio completion cleanup in error case [Jason]
- return error codes from librbd
 - Patch 6: - add support for thick provisioning [Jason]
- do not set write zeroes alignment
 - Patch 7: new patch

Peter Lieven (6):
  block/rbd: bump librbd requirement to luminous release
  block/rbd: store object_size in BDRVRBDState
  block/rbd: update s->image_size in qemu_rbd_getlength
  block/rbd: migrate from aio to coroutines
  block/rbd: add write zeroes support
  block/rbd: drop qemu_rbd_refresh_limits

 block/rbd.c | 406 
 meson.build |   7 +-
 2 files changed, 128 insertions(+), 285 deletions(-)

-- 
2.17.1





[PATCH V4 1/6] block/rbd: bump librbd requirement to luminous release

2021-07-02 Thread Peter Lieven
even luminous (version 12.2) is unmaintained for over 3 years now.
Bump the requirement to get rid of the ifdef'ry in the code.
Qemu 6.1 dropped the support for RHEL-7 which was the last supported
OS that required an older librbd.

Signed-off-by: Peter Lieven 
---
 block/rbd.c | 120 
 meson.build |   7 ++-
 2 files changed, 13 insertions(+), 114 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 26f64cce7c..6b1cbe1d75 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -55,24 +55,10 @@
  * leading "\".
  */
 
-/* rbd_aio_discard added in 0.1.2 */
-#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2)
-#define LIBRBD_SUPPORTS_DISCARD
-#else
-#undef LIBRBD_SUPPORTS_DISCARD
-#endif
-
 #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
 
 #define RBD_MAX_SNAPS 100
 
-/* The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h */
-#ifdef LIBRBD_SUPPORTS_IOVEC
-#define LIBRBD_USE_IOVEC 1
-#else
-#define LIBRBD_USE_IOVEC 0
-#endif
-
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
@@ -84,7 +70,6 @@ typedef struct RBDAIOCB {
 BlockAIOCB common;
 int64_t ret;
 QEMUIOVector *qiov;
-char *bounce;
 RBDAIOCmd cmd;
 int error;
 struct BDRVRBDState *s;
@@ -94,7 +79,6 @@ typedef struct RADOSCB {
 RBDAIOCB *acb;
 struct BDRVRBDState *s;
 int64_t size;
-char *buf;
 int64_t ret;
 } RADOSCB;
 
@@ -342,13 +326,9 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
char *keypairs_json,
 
 static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
 {
-if (LIBRBD_USE_IOVEC) {
-RBDAIOCB *acb = rcb->acb;
-iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
-   acb->qiov->size - offs);
-} else {
-memset(rcb->buf + offs, 0, rcb->size - offs);
-}
+RBDAIOCB *acb = rcb->acb;
+iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
+   acb->qiov->size - offs);
 }
 
 /* FIXME Deprecate and remove keypairs or make it available in QMP. */
@@ -504,13 +484,6 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
 
 g_free(rcb);
 
-if (!LIBRBD_USE_IOVEC) {
-if (acb->cmd == RBD_AIO_READ) {
-qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
-}
-qemu_vfree(acb->bounce);
-}
-
 acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
 
 qemu_aio_unref(acb);
@@ -878,28 +851,6 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB 
*rcb)
  rbd_finish_bh, rcb);
 }
 
-static int rbd_aio_discard_wrapper(rbd_image_t image,
-   uint64_t off,
-   uint64_t len,
-   rbd_completion_t comp)
-{
-#ifdef LIBRBD_SUPPORTS_DISCARD
-return rbd_aio_discard(image, off, len, comp);
-#else
-return -ENOTSUP;
-#endif
-}
-
-static int rbd_aio_flush_wrapper(rbd_image_t image,
- rbd_completion_t comp)
-{
-#ifdef LIBRBD_SUPPORTS_AIO_FLUSH
-return rbd_aio_flush(image, comp);
-#else
-return -ENOTSUP;
-#endif
-}
-
 static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
  int64_t off,
  QEMUIOVector *qiov,
@@ -922,21 +873,6 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 rcb = g_new(RADOSCB, 1);
 
-if (!LIBRBD_USE_IOVEC) {
-if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
-acb->bounce = NULL;
-} else {
-acb->bounce = qemu_try_blockalign(bs, qiov->size);
-if (acb->bounce == NULL) {
-goto failed;
-}
-}
-if (cmd == RBD_AIO_WRITE) {
-qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-}
-rcb->buf = acb->bounce;
-}
-
 acb->ret = 0;
 acb->error = 0;
 acb->s = s;
@@ -950,7 +886,7 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 }
 
 switch (cmd) {
-case RBD_AIO_WRITE: {
+case RBD_AIO_WRITE:
 /*
  * RBD APIs don't allow us to write more than actual size, so in order
  * to support growing images, we resize the image before write
@@ -962,25 +898,16 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
 goto failed_completion;
 }
 }
-#ifdef LIBRBD_SUPPORTS_IOVEC
-r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
-#else
-r = rbd_aio_write(s->image, off, size, rcb->buf, c);
-#endif
+r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
 break;
-}
 case RBD_AIO_READ:
-#ifdef LIBRBD_SUPPORTS_IOVEC
-r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
-#else
-r = rbd_aio_read(s->image, off, size, rcb->buf, c);
-#endif
+r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
 break;
 case RBD_AIO_DISCARD:
-r = 

[PATCH V4 3/6] block/rbd: update s->image_size in qemu_rbd_getlength

2021-07-02 Thread Peter Lieven
while at it just call rbd_get_size and avoid rbd_stat.

Signed-off-by: Peter Lieven 
---
 block/rbd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index b4caea4f1b..1f8dc84079 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -968,15 +968,14 @@ static int qemu_rbd_getinfo(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 static int64_t qemu_rbd_getlength(BlockDriverState *bs)
 {
 BDRVRBDState *s = bs->opaque;
-rbd_image_info_t info;
 int r;
 
-r = rbd_stat(s->image, , sizeof(info));
+r = rbd_get_size(s->image, >image_size);
 if (r < 0) {
 return r;
 }
 
-return info.size;
+return s->image_size;
 }
 
 static int coroutine_fn qemu_rbd_co_truncate(BlockDriverState *bs,
-- 
2.17.1