Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On Mon, Jun 21, 2021 at 11:50:02AM +0200, Max Reitz wrote:
> > I don't think that this change is correct. In contrast with file-posix
> > you don't get extra information for free, you just make a larger
> > request. This means that server will have to do more work.
>
> Oh, oops. Seems I was blind in my rage to replace this MIN() pattern.
>
> You’re absolutely right. So this patch should be dropped.

I disagree - I think this patch is still correct, as written, _because_
we use the REQ_ONE flag.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On Sat, Jun 19, 2021 at 01:53:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > +++ b/block/nbd.c
> > @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
> >          .type = NBD_CMD_BLOCK_STATUS,
> >          .from = offset,
> >          .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
> > -                   MIN(bytes, s->info.size - offset)),
> > +                   s->info.size - offset),
> >          .flags = NBD_CMD_FLAG_REQ_ONE,
> >      };
>
> Hmm..
>
> I don't think that this change is correct. In contrast with file-posix you
> don't get extra information for free, you just make a larger request. This
> means that server will have to do more work.

Not necessarily. The fact that we have passed NBD_CMD_FLAG_REQ_ONE means
that the server is still only allowed to give us one extent in its answer,
and that it may not give us information beyond the length we requested.

You are right that if we lose the REQ_ONE flag, we may cause the server to
do more work providing additional extents that we will then ignore,
because we aren't yet set up for avoiding REQ_ONE. Fixing that is a
longer-term goal. But in the short term, I see no harm in giving a larger
length to the server with REQ_ONE.

> (look at blockstatus_to_extents, it calls bdrv_block_status_above in a
> loop).
>
> For example, assume that nbd export is a qcow2 image with all clusters
> allocated. With this change, nbd server will loop through the whole qcow2
> image, load all L2 tables to return big allocated extent.

No, the server is allowed to reply with less length than our request, and
that is particularly true if the server does NOT have free access to the
full length of our request. In the case of qcow2, since bdrv_block_status
is (by current design) clamped at cluster boundaries, requesting a 4G
length will NOT increase the amount of the server response any further
than the first cluster boundary (that is, the point where the server no
longer has free access to status without loading another cluster of L2
entries).
> So, only server can decide, could it add some extra free information to
> request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't allow it.

What the flag prohibits is the server giving us more information than the
length we requested. But this patch is increasing our request length for
the case where the server CAN give us more information than we need
locally, in the hope that, even though the server can only reply with one
extent, we waste fewer network back-and-forth trips than a smaller
request would have required.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [PATCH 0/4] export/fuse: Allow other users access to the export
On 21.06.21 18:12, Kevin Wolf wrote:
> Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:
>> Hi,
>>
>> With the default mount options, FUSE mounts are not accessible to any
>> users but the one who did the mount, not even to root. To allow such
>> accesses, allow_other must be passed.
>>
>> This is probably useful to some people (it certainly is to me, e.g. when
>> exporting some image as my normal user, and then trying to loop mount it
>> as root), so this series adds a QAPI allow-other bool that will make the
>> FUSE export code pass allow_other,default_permissions to FUSE.
>>
>> (default_permissions will make the kernel do the usual UNIX permission
>> checks, which is something that makes a lot of sense when allowing other
>> users access to the export.)
>>
>> This also requires our SETATTR code to be able to handle permission
>> changes, though, so the user can then run chmod/chown/chgrp on the
>> export to adjust its permissions to their need.
>>
>> The final patch adds a test.
>
> If there is even a use case for leaving the option off (not trusting
> root?), it must certainly be the less common case? So I'm not sure if
> allow-other should be an option at all, but if it is, enabling it by
> default would make more sense to me.
>
> Is there a reason why you picked false as the default, except that it is
> the old behaviour?

No. :)

Well, mostly. I also thought: if FUSE thinks allow_other shouldn’t be the
default, who am I to decide otherwise?

Now that I have tried to find out why FUSE has it as the default (I only
remember vague “security reasons”), I still couldn’t find out why, but I
did find that using this option as a non-root user requires
/etc/fuse.conf to contain user_allow_other, which I don’t think we can
require. So I think it must be an option.

As for which value should be the default, that probably depends on how
common having user_allow_other in /etc/fuse.conf is. I know I never put
it there, and it’s there on both my Fedora and my Arch system. So I guess
it seems rather common?

Max
Re: [PATCH 0/4] export/fuse: Allow other users access to the export
Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:
> Hi,
>
> With the default mount options, FUSE mounts are not accessible to any
> users but the one who did the mount, not even to root. To allow such
> accesses, allow_other must be passed.
>
> This is probably useful to some people (it certainly is to me, e.g. when
> exporting some image as my normal user, and then trying to loop mount it
> as root), so this series adds a QAPI allow-other bool that will make the
> FUSE export code pass allow_other,default_permissions to FUSE.
>
> (default_permissions will make the kernel do the usual UNIX permission
> checks, which is something that makes a lot of sense when allowing other
> users access to the export.)
>
> This also requires our SETATTR code to be able to handle permission
> changes, though, so the user can then run chmod/chown/chgrp on the
> export to adjust its permissions to their need.
>
> The final patch adds a test.

If there is even a use case for leaving the option off (not trusting
root?), it must certainly be the less common case? So I'm not sure if
allow-other should be an option at all, but if it is, enabling it by
default would make more sense to me.

Is there a reason why you picked false as the default, except that it is
the old behaviour?

Kevin
Re: [PATCH RFC] meson: add option to use zstd for qcow2 compression by default
Am 21.06.2021 um 10:22 hat Paolo Bonzini geschrieben:
> On 17/06/21 21:51, Vladimir Sementsov-Ogievskiy wrote:
> > So, it's an RFC. I also can split the patch so that refactoring of
> > qcow2_co_create() goes in a separate preparation patch.
> >
> > Another RFC question, shouldn't we move to zstd by default in upstream
> > too?
>
> I think backwards-incompatible changes in the past were not handled with
> build options, but that can be changed.
>
> However I would prefer to have an option like
> --with-qcow2-compression={zstd,zlib}. Meson supports multiple-choice
> options; they don't have to use enabled/disabled or (if boolean)
> true/false.

Yes, this is more extensible.

> Regarding changing the default, that would make images unreadable to
> QEMU 5.0 and earlier versions. Does that apply to images that have no
> compressed clusters?

I think it does, because you could be writing compressed clusters to the
image later. Originally, only 'qemu-img convert -c' could write
compressed clusters, but today the backup job can write them, too, and it
doesn't create the image file itself.

Kevin
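Paolo's suggestion maps onto meson's multiple-choice ("combo") option type. A sketch of what such an option could look like follows; the option name, choices, and default here are assumptions for illustration, not taken from the series.

```meson
# meson_options.txt (hypothetical): a multiple-choice option instead of a bool
option('qcow2-compression', type: 'combo',
       choices: ['zlib', 'zstd'], value: 'zlib',
       description: 'default compression algorithm for new qcow2 images')
```

In meson.build the chosen value could then be forwarded to the build, e.g. via something like `config_host_data.set_quoted('CONFIG_QCOW2_COMPRESSION', get_option('qcow2-compression'))`, so the qcow2 driver can pick its default at compile time.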
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé wrote:
>
> On 6/21/21 3:18 PM, Fam Zheng wrote:
>>
>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>>>
>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>>> -ENOMEM in case of error. The driver was correctly handling the
>>> error path to recycle its volatile IOVA mappings.
>>>
>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>>> DMA mappings per container", April 2019) added the -ENOSPC error to
>>> signal the user exhausted the DMA mappings available for a container.
>>>
>>> The block driver started to mis-behave:
>>>
>>>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>>   (qemu)
>>>   (qemu) info status
>>>   VM status: paused (io-error)
>>>   (qemu) c
>>>   VFIO_MAP_DMA failed: No space left on device
>>>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>>>   Assertion `ctx == blk->ctx' failed.
>>
>> Hi Phil,
>>
>> The diff looks good to me, but I’m not sure what exactly caused the
>> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
>> before, so it should be treated as a general case. What am I missing?
>
> Good catch, ENOSPC ends up setting BLOCK_DEVICE_IO_STATUS_NOSPACE
> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mappings
> exhausted. I don't understand the full "VM resume" path, but this
> is not what we want (IO_NOSPACE is meant to warn the operator to add
> more storage and resume, which is pointless in our case; resuming
> won't help until we flush the mappings).
>
> IIUC what we want is to return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.

I agree with that. It just makes me feel there’s another bug in the
resuming code path. Can you get a backtrace?

Fam
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
On 6/21/21 3:18 PM, Fam Zheng wrote:
>
>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>>
>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>> -ENOMEM in case of error. The driver was correctly handling the
>> error path to recycle its volatile IOVA mappings.
>>
>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>> DMA mappings per container", April 2019) added the -ENOSPC error to
>> signal the user exhausted the DMA mappings available for a container.
>>
>> The block driver started to mis-behave:
>>
>>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>   (qemu)
>>   (qemu) info status
>>   VM status: paused (io-error)
>>   (qemu) c
>>   VFIO_MAP_DMA failed: No space left on device
>>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>>   Assertion `ctx == blk->ctx' failed.
>
> Hi Phil,
>
> The diff looks good to me, but I’m not sure what exactly caused the
> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
> before, so it should be treated as a general case. What am I missing?

Good catch, ENOSPC ends up setting BLOCK_DEVICE_IO_STATUS_NOSPACE
-> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mappings
exhausted. I don't understand the full "VM resume" path, but this is not
what we want (IO_NOSPACE is meant to warn the operator to add more
storage and resume, which is pointless in our case; resuming won't help
until we flush the mappings).

IIUC what we want is to return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.
[PATCH] block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level
encryption. Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both
expect an open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri
---
 block/rbd.c          | 367 ++-
 qapi/block-core.json | 110 -
 2 files changed, 471 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..7e282a8e94 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif

+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
@@ -341,6 +353,202 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
     }
 }

+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+        RbdEncryptionOptionsLUKSBase *luks_opts,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    return qcrypto_secret_lookup(
+            luks_opts->key_secret, (uint8_t **)passphrase, passphrase_len,
+            errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+        RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+        rbd_encryption_algorithm_t *alg,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    int r = 0;
+
+    r = qemu_rbd_convert_luks_options(
+            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+            passphrase, passphrase_len, errp);
+    if (r < 0) {
+        return r;
+    }
+
+    if (luks_opts->has_cipher_alg) {
+        switch (luks_opts->cipher_alg) {
+        case QCRYPTO_CIPHER_ALG_AES_128: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+            break;
+        }
+        case QCRYPTO_CIPHER_ALG_AES_256: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+            break;
+        }
+        default: {
+            r = -ENOTSUP;
+            error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+                             luks_opts->cipher_alg);
+            return r;
+        }
+        }
+    } else {
+        /* default alg */
+        *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+    }
+
+    return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+                                      RbdEncryptionCreateOptions *encrypt,
+                                      Error **errp)
+{
+    int r = 0;
+    g_autofree char *passphrase = NULL;
+    size_t passphrase_len;
+    rbd_encryption_format_t format;
+    rbd_encryption_options_t opts;
+    rbd_encryption_luks1_format_options_t luks_opts;
+    rbd_encryption_luks2_format_options_t luks2_opts;
+    size_t opts_size;
+    uint64_t raw_size, effective_size;
+
+    r = rbd_get_size(image, &raw_size);
+    if (r < 0) {
+        error_setg_errno(errp, -r, "cannot get raw image size");
+        return r;
+    }
+
+    switch (encrypt->format) {
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+        memset(&luks_opts, 0, sizeof(luks_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS1;
+        opts = &luks_opts;
+        opts_size = sizeof(luks_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+                &luks_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks_opts.passphrase = passphrase;
+        luks_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+        memset(&luks2_opts, 0, sizeof(luks2_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS2;
+        opts = &luks2_opts;
+        opts_size = sizeof(luks2_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS2_base(
+                        &encrypt->u.luks2),
+                &luks2_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks2_opts.passphrase = passphrase;
+        luks2_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    default: {
+        r = -ENOTSUP;
+        error_setg_errno(
+                errp, -r, "unknown image encryption format: %u",
+                encrypt->format);
+        return
Re: [PATCH] block/rbd: Add support for rbd image encryption
Patchew URL: https://patchew.org/QEMU/20210621142103.1417408-1-...@il.ibm.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210621142103.1417408-1-...@il.ibm.com
Subject: [PATCH] block/rbd: Add support for rbd image encryption

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20210614083800.1166166-1-richard.hender...@linaro.org -> patchew/20210614083800.1166166-1-richard.hender...@linaro.org
 * [new tag]     patchew/20210621142103.1417408-1-...@il.ibm.com -> patchew/20210621142103.1417408-1-...@il.ibm.com
Switched to a new branch 'test'
19faaee block/rbd: Add support for rbd image encryption

=== OUTPUT BEGIN ===
ERROR: "(foo**)" should be "(foo **)"
#60: FILE: block/rbd.c:374:
+            luks_opts->key_secret, (uint8_t**)passphrase, passphrase_len, errp);

total: 1 errors, 0 warnings, 601 lines checked

Commit 19faaee12b6d (block/rbd: Add support for rbd image encryption) has
style problems, please review. If any of these errors are false positives
report them to the maintainer, see CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20210621142103.1417408-1-...@il.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
[PATCH] block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level
encryption. Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both
expect an open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri
---
 block/rbd.c          | 367 ++-
 qapi/block-core.json | 110 -
 2 files changed, 471 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..be1419a9bd 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif

+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
@@ -341,6 +353,202 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
     }
 }

+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+        RbdEncryptionOptionsLUKSBase *luks_opts,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    return qcrypto_secret_lookup(
+            luks_opts->key_secret, (uint8_t**)passphrase, passphrase_len, errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+        RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+        rbd_encryption_algorithm_t *alg,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    int r = 0;
+
+    r = qemu_rbd_convert_luks_options(
+            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+            passphrase, passphrase_len, errp);
+    if (r < 0) {
+        return r;
+    }
+
+    if (luks_opts->has_cipher_alg) {
+        switch (luks_opts->cipher_alg) {
+        case QCRYPTO_CIPHER_ALG_AES_128: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+            break;
+        }
+        case QCRYPTO_CIPHER_ALG_AES_256: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+            break;
+        }
+        default: {
+            r = -ENOTSUP;
+            error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+                             luks_opts->cipher_alg);
+            return r;
+        }
+        }
+    } else {
+        /* default alg */
+        *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+    }
+
+    return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+                                      RbdEncryptionCreateOptions *encrypt,
+                                      Error **errp)
+{
+    int r = 0;
+    g_autofree char *passphrase = NULL;
+    size_t passphrase_len;
+    rbd_encryption_format_t format;
+    rbd_encryption_options_t opts;
+    rbd_encryption_luks1_format_options_t luks_opts;
+    rbd_encryption_luks2_format_options_t luks2_opts;
+    size_t opts_size;
+    uint64_t raw_size, effective_size;
+
+    r = rbd_get_size(image, &raw_size);
+    if (r < 0) {
+        error_setg_errno(errp, -r, "cannot get raw image size");
+        return r;
+    }
+
+    switch (encrypt->format) {
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+        memset(&luks_opts, 0, sizeof(luks_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS1;
+        opts = &luks_opts;
+        opts_size = sizeof(luks_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+                &luks_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks_opts.passphrase = passphrase;
+        luks_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+        memset(&luks2_opts, 0, sizeof(luks2_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS2;
+        opts = &luks2_opts;
+        opts_size = sizeof(luks2_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS2_base(
+                        &encrypt->u.luks2),
+                &luks2_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks2_opts.passphrase = passphrase;
+        luks2_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    default: {
+        r = -ENOTSUP;
+        error_setg_errno(
+                errp, -r, "unknown image encryption format: %u",
+                encrypt->format);
+        return r;
Re: [PATCH 0/2] introduce QEMU_AUTO_VFREE
On Sat, Jun 19, 2021 at 05:21:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> There is a good movement to use the g_autofree macro, which helps to
> automatically call g_free on exit from a code block.
>
> We lack a similar possibility for the qemu_memalign() function family.
> Let's add it; it seems rather simple with help of the "cleanup"
> attribute.
>
> I'll update more places with a follow-up if this is accepted.
>
> Vladimir Sementsov-Ogievskiy (2):
>   introduce QEMU_AUTO_VFREE
>   block/commit: use QEMU_AUTO_VFREE
>
>  include/qemu/osdep.h |  2 ++
>  block/commit.c       | 25 +
>  2 files changed, 11 insertions(+), 16 deletions(-)
>
> --
> 2.29.2

Reviewed-by: Stefan Hajnoczi
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
>
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.
>
> The block driver started to mis-behave:
>
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>   Assertion `ctx == blk->ctx' failed.

Hi Phil,

The diff looks good to me, but I’m not sure what exactly caused the
assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
before, so it should be treated as a general case. What am I missing?

Reviewed-by: Fam Zheng

> Fix by handling the new -ENOSPC error (when DMA mappings are
> exhausted) without any distinction to the current -ENOMEM error,
> so we don't change the behavior on old kernels where the CVE-2019-3882
> fix is not present.
>
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
>
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
>
> Cc: qemu-sta...@nongnu.org
> Reported-by: Michal Prívozník
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/186
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé
> ---
> v2: KISS checking both errors undistinguishedly (Maxim)
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 2b5421e7aa6..c3d2a49866c 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1030,7 +1030,7 @@ try_map:
>          r = qemu_vfio_dma_map(s->vfio,
>                                qiov->iov[i].iov_base,
>                                len, true, &iova);
> -        if (r == -ENOMEM && retry) {
> +        if ((r == -ENOMEM || r == -ENOSPC) && retry) {
>              retry = false;
>              trace_nvme_dma_flush_queue_wait(s);
>              if (s->dma_map_count) {
Re: [PATCH v3 02/24] modules: collect module meta-data
On Fri, Jun 18, 2021 at 06:09:55PM +0200, Paolo Bonzini wrote:
> On 18/06/21 06:53, Gerd Hoffmann wrote:
> > +def find_command(src, target, compile_commands):
> > +    for command in compile_commands:
> > +        if command['file'] != src:
> > +            continue
> > +        if target != '' and command['command'].find(target) == -1:
> > +            continue
>
> Did you look into using extract_objects for this instead of looking for
> the target (which works, but yuck :))?

  ninja: error: 'libui-curses.a.p/meson-generated_.._config-host.h.o',
  needed by 'ui-curses.modinfo.test', missing and no known rule to make it

Hmm, not sure where this comes from. meson doesn't try to link
config-host.h.o into libui-curses.a, so why does extract_all_objects()
return it?

Test patch (incremental to this series) below.

take care,
  Gerd

From 5453683429d7b08b959e2cd63ee00fdccfb0c7b7 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann
Date: Mon, 21 Jun 2021 14:45:14 +0200
Subject: [PATCH] [wip] extract_all_objects experiments

---
 meson.build             | 7 +++
 scripts/modinfo-test.sh | 8
 2 files changed, 15 insertions(+)
 create mode 100755 scripts/modinfo-test.sh

diff --git a/meson.build b/meson.build
index 03bacca7cddb..8e7f176c 100644
--- a/meson.build
+++ b/meson.build
@@ -2042,6 +2042,7 @@ target_modules += { 'accel' : { 'qtest': qtest_module_ss,

 modinfo_collect = find_program('scripts/modinfo-collect.py')
 modinfo_generate = find_program('scripts/modinfo-generate.py')
+modinfo_test = find_program('scripts/modinfo-test.sh')

 modinfo_files = []
 block_mods = []
@@ -2063,6 +2064,12 @@ foreach d, list : modules
                              input: module_ss.sources(),
                              capture: true,
                              command: [modinfo_collect, '@INPUT@'])
+        custom_target(d + '-' + m + '.modinfo.test',
+                      output: d + '-' + m + '.modinfo.test',
+                      input: sl.extract_all_objects(recursive: true),
+                      capture: true,
+                      build_by_default: true, # to be removed when added to a target
+                      command: [modinfo_test, '@INPUT@'])
       endif
     else
       if d == 'block'
diff --git a/scripts/modinfo-test.sh b/scripts/modinfo-test.sh
new file mode 100755
index ..979c9cc9aeef
--- /dev/null
+++ b/scripts/modinfo-test.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+if test "$1" = "--target"; then
+    echo "# target $2"
+    shift; shift
+fi
+for item in "$@"; do
+    echo "# input $item"
+done
--
2.31.1
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 1:27 PM Daniel P. Berrangé wrote:
>
> On Mon, Jun 21, 2021 at 01:23:46PM +0200, Ilya Dryomov wrote:
> > On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote:
> > >
> > > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote:
> > > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote:
> > > > >
> > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote:
> > > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote:
> > > > > > >
> > > > > > > Starting from ceph Pacific, RBD has built-in support for
> > > > > > > image-level encryption.
> > > > > > > Currently supported formats are LUKS version 1 and 2.
> > > > > > >
> > > > > > > There are 2 new relevant librbd APIs for controlling
> > > > > > > encryption, both expect an open image context:
> > > > > > >
> > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS
> > > > > > > header)
> > > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO
> > > > > > > stack
> > > > > > >
> > > > > > > This commit extends the qemu rbd driver API to support the
> > > > > > > above.
> > > > > > >
> > > > > > > Signed-off-by: Or Ozeri
> > > > > > > ---
> > > > > > >  block/raw-format.c   |   7 +
> > > > > > >  block/rbd.c          | 371 ++-
> > > > > > >  qapi/block-core.json | 110 -
> > > > > > >  3 files changed, 482 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > > > > index f098a89c7b..183b17cd84 100644
> > > > > > > --- a/block/rbd.c
> > > > > > > +++ b/block/rbd.c
> > > > > > > @@ -73,6 +73,18 @@
> > > > > > >  #define LIBRBD_USE_IOVEC 0
> > > > > > >  #endif
> > > > > > >
> > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> > > > > > > +
> > > > > > > +static const char rbd_luks_header_verification[
> > > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> > > > > > > +};
> > > > > > > +
> > > > > > > +static const char rbd_luks2_header_verification[
> > > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> > > > > > > +};
> > > > > > > +
> > > > > > >  typedef enum {
> > > > > > >      RBD_AIO_READ,
> > > > > > >      RBD_AIO_WRITE,
> > > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> > > > > > > +static int qemu_rbd_convert_luks_options(
> > > > > > > +        RbdEncryptionOptionsLUKSBase *luks_opts,
> > > > > > > +        char **passphrase,
> > > > > > > +        Error **errp)
> > > > > > > +{
> > > > > > > +    int r = 0;
> > > > > > > +
> > > > > > > +    if (!luks_opts->has_key_secret) {
> > > > > > > +        r = -EINVAL;
> > > > > > > +        error_setg_errno(errp, -r, "missing encrypt.key-secret");
> > > > > > > +        return r;
> > > > > > > +    }
> > > > > >
> > > > > > Why is key-secret optional?
> > > > >
> > > > > It doesn't look like it is handled correctly here, but we need to
> > > > > be able to run 'qemu-img info ' and get information back
> > > > > on the size of the image, and whether or not it is encrypted,
> > > > > without having to supply a passphrase upfront. So it is right that
> > > > > key-secret be optional, but also we shouldn't return a fatal
> > > > > error like this.
> > > >
> > > > Hi Daniel,
> > > >
> > > > The key-secret lives inside RbdEncryptionOptions (or
> > > > RbdEncryptionCreateOptions) which are already optional:
> > > >
> > > >     '*encrypt': 'RbdEncryptionOptions'
> > > >
> > > >     '*encrypt': 'RbdEncryptionCreateOptions'
> > > >
> > > > The image is opened as usual and then, if "encrypt" is specified,
> > > > the encryption profile is loaded (or created and laid down). It does
> > > > not make sense to attempt to load or create the encryption profile
> > > > without the passphrase -- it would always fail.
> > >
> > > Ah, that sounds like it is probably ok then.
> > >
> > > > > Only if BDRV_O_NO_IO is NOT set, should this error be reported
> > > > >
> > > > > >
> > > > > > >  static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > > > > > >  {
> > > > > > >      BDRVRBDState *s = bs->opaque;
> > > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> > > > > > >              .type = QEMU_OPT_STRING,
> > > > > > >              .help = "ID of secret providing the password",
> > > > > > >          },
> > > > > > > +        {
> > > > > > > +            .name = "encrypt.format",
> > > > > > > +            .type = QEMU_OPT_STRING,
> > > > > > > +            .help = "Encrypt the image, format choices: 'luks', 'luks2'",
> > > > > >
> > > > > > I think it should be "luks1" and
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 01:23:46PM +0200, Ilya Dryomov wrote:
> On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote:
> >
> > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote:
> > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote:
> > > >
> > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote:
> > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote:
> > > > > >
> > > > > > Starting from ceph Pacific, RBD has built-in support for
> > > > > > image-level encryption.
> > > > > > Currently supported formats are LUKS version 1 and 2.
> > > > > >
> > > > > > There are 2 new relevant librbd APIs for controlling encryption,
> > > > > > both expect an open image context:
> > > > > >
> > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS
> > > > > > header)
> > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO
> > > > > > stack
> > > > > >
> > > > > > This commit extends the qemu rbd driver API to support the above.
> > > > > >
> > > > > > Signed-off-by: Or Ozeri
> > > > > > ---
> > > > > >  block/raw-format.c   |   7 +
> > > > > >  block/rbd.c          | 371 ++-
> > > > > >  qapi/block-core.json | 110 -
> > > > > >  3 files changed, 482 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > > > index f098a89c7b..183b17cd84 100644
> > > > > > --- a/block/rbd.c
> > > > > > +++ b/block/rbd.c
> > > > > > @@ -73,6 +73,18 @@
> > > > > >  #define LIBRBD_USE_IOVEC 0
> > > > > >  #endif
> > > > > >
> > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> > > > > > +
> > > > > > +static const char rbd_luks_header_verification[
> > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> > > > > > +};
> > > > > > +
> > > > > > +static const char rbd_luks2_header_verification[
> > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> > > > > > +};
> > > > > > +
> > > > > >  typedef enum {
> > > > > >      RBD_AIO_READ,
> > > > > >      RBD_AIO_WRITE,
> > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> > > > > > +static int qemu_rbd_convert_luks_options(
> > > > > > +        RbdEncryptionOptionsLUKSBase *luks_opts,
> > > > > > +        char **passphrase,
> > > > > > +        Error **errp)
> > > > > > +{
> > > > > > +    int r = 0;
> > > > > > +
> > > > > > +    if (!luks_opts->has_key_secret) {
> > > > > > +        r = -EINVAL;
> > > > > > +        error_setg_errno(errp, -r, "missing encrypt.key-secret");
> > > > > > +        return r;
> > > > > > +    }
> > > > >
> > > > > Why is key-secret optional?
> > > >
> > > > It doesn't look like it is handled correctly here, but we need to
> > > > be able to run 'qemu-img info ' and get information back
> > > > on the size of the image, and whether or not it is encrypted,
> > > > without having to supply a passphrase upfront. So it is right that
> > > > key-secret be optional, but also we shouldn't return a fatal
> > > > error like this.
> > >
> > > Hi Daniel,
> > >
> > > The key-secret lives inside RbdEncryptionOptions (or
> > > RbdEncryptionCreateOptions) which are already optional:
> > >
> > >     '*encrypt': 'RbdEncryptionOptions'
> > >
> > >     '*encrypt': 'RbdEncryptionCreateOptions'
> > >
> > > The image is opened as usual and then, if "encrypt" is specified,
> > > the encryption profile is loaded (or created and laid down). It does
> > > not make sense to attempt to load or create the encryption profile
> > > without the passphrase -- it would always fail.
> >
> > Ah, that sounds like it is probably ok then.
> >
> > > > It doesn't look like it is handled correctly here, but we need to
> > > > be able to run 'qemu-img info ' and get information back on the
> > > > size of the image. Only if BDRV_O_NO_IO is NOT set, should this
> > > > error be reported
> > > >
> > > > >
> > > > > >  static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > > > > >  {
> > > > > >      BDRVRBDState *s = bs->opaque;
> > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> > > > > >              .type = QEMU_OPT_STRING,
> > > > > >              .help = "ID of secret providing the password",
> > > > > >          },
> > > > > > +        {
> > > > > > +            .name = "encrypt.format",
> > > > > > +            .type = QEMU_OPT_STRING,
> > > > > > +            .help = "Encrypt the image, format choices: 'luks', 'luks2'",
> > > > >
> > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and
> > > > > "rbd encryption format" command.
> > > >
> > > > No, it should stay "luks" not "luks1", to match the existing QEMU
> > > > terminology for its LUKS v1 encryption support.
> > >
> > > If you insist on following the QEMU nomenclature here it's fine with
> > > me but I want to point out that
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote: > > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote: > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé > > wrote: > > > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > > > encryption. > > > > > Currently supported formats are LUKS version 1 and 2. > > > > > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > > > expect an > > > > > open image context: > > > > > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > > > > > Signed-off-by: Or Ozeri > > > > > --- > > > > > block/raw-format.c | 7 + > > > > > block/rbd.c | 371 > > > > > ++- > > > > > qapi/block-core.json | 110 - > > > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > > > index f098a89c7b..183b17cd84 100644 > > > > > --- a/block/rbd.c > > > > > +++ b/block/rbd.c > > > > > @@ -73,6 +73,18 @@ > > > > > #define LIBRBD_USE_IOVEC 0 > > > > > #endif > > > > > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > > > + > > > > > +static const char rbd_luks_header_verification[ > > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > > > +}; > > > > > + > > > > > +static const char rbd_luks2_header_verification[ > > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > > > +}; > > > > > + > > > > > typedef enum { > > > > > RBD_AIO_READ, > > > > > RBD_AIO_WRITE, > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, > > > 
> > int64_t offs) > > > > > } > > > > > } > > > > > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > > > +static int qemu_rbd_convert_luks_options( > > > > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > > > > +char **passphrase, > > > > > +Error **errp) > > > > > +{ > > > > > +int r = 0; > > > > > + > > > > > +if (!luks_opts->has_key_secret) { > > > > > +r = -EINVAL; > > > > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > > > +return r; > > > > > +} > > > > > > > > Why is key-secret optional? > > > > > > It doesn't look like it is handled correctly here, but we need to > > > be able to run 'qemu-img info ' and get information back > > > on the size of the image, and whether or not it is encrypted, > > > without having to supply a passphrase upfront. So it is right that > > > key-secret be optional, but also we shouldn't return an fatal > > > error like this. > > > > Hi Daniel, > > > > The key-secret lives inside RbdEncryptionOptions (or > > RbdEncryptionCreateOptions) which are already optional: > > > > '*encrypt': 'RbdEncryptionOptions' > > > > '*encrypt' :'RbdEncryptionCreateOptions' > > > > The image is opened as usual and then, if "encrypt" is specified, > > the encryption profile is loaded (or created and laid down). It does > > not make sense to attempt to load or create the encryption profile > > without the passphrase -- it would always fail. > > Ah, that sounds like it is probably ok then. 
> > > > > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > > > > > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > > > { > > > > > BDRVRBDState *s = bs->opaque; > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > > > .type = QEMU_OPT_STRING, > > > > > .help = "ID of secret providing the password", > > > > > }, > > > > > +{ > > > > > +.name = "encrypt.format", > > > > > +.type = QEMU_OPT_STRING, > > > > > +.help = "Encrypt the image, format choices: 'luks', > > > > > 'luks2'", > > > > > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > > > "rbd encryption format" command. > > > > > > No, it should stay "luks" not "luks1", to match the existing QEMU > > > terminology for its LUKS v1 encryption support. > > > > If you insist on following the QEMU nomenclature here it's fine with > > me but I want to point out that encryption-formatted clones won't be > > interoperable with QEMU LUKS driver or dm-crypt so making the names > > match QEMU instead of librbd and rbd CLI seems a bit misleading. > > In what way is it not interoperable ? > > If we don't specify any 'encrypt' option, does the guest see the > raw LUKS header + payload, or is the
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote: > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé > wrote: > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > > encryption. > > > > Currently supported formats are LUKS version 1 and 2. > > > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > > expect an > > > > open image context: > > > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > > > Signed-off-by: Or Ozeri > > > > --- > > > > block/raw-format.c | 7 + > > > > block/rbd.c | 371 ++- > > > > qapi/block-core.json | 110 - > > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > > index f098a89c7b..183b17cd84 100644 > > > > --- a/block/rbd.c > > > > +++ b/block/rbd.c > > > > @@ -73,6 +73,18 @@ > > > > #define LIBRBD_USE_IOVEC 0 > > > > #endif > > > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > > + > > > > +static const char rbd_luks_header_verification[ > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > > +}; > > > > + > > > > +static const char rbd_luks2_header_verification[ > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > > +}; > > > > + > > > > typedef enum { > > > > RBD_AIO_READ, > > > > RBD_AIO_WRITE, > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > > > offs) > > > > } > > > > } > > > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > > +static int qemu_rbd_convert_luks_options( > > > > 
+RbdEncryptionOptionsLUKSBase *luks_opts, > > > > +char **passphrase, > > > > +Error **errp) > > > > +{ > > > > +int r = 0; > > > > + > > > > +if (!luks_opts->has_key_secret) { > > > > +r = -EINVAL; > > > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > > +return r; > > > > +} > > > > > > Why is key-secret optional? > > > > It doesn't look like it is handled correctly here, but we need to > > be able to run 'qemu-img info ' and get information back > > on the size of the image, and whether or not it is encrypted, > > without having to supply a passphrase upfront. So it is right that > > key-secret be optional, but also we shouldn't return an fatal > > error like this. > > Hi Daniel, > > The key-secret lives inside RbdEncryptionOptions (or > RbdEncryptionCreateOptions) which are already optional: > > '*encrypt': 'RbdEncryptionOptions' > > '*encrypt' :'RbdEncryptionCreateOptions' > > The image is opened as usual and then, if "encrypt" is specified, > the encryption profile is loaded (or created and laid down). It does > not make sense to attempt to load or create the encryption profile > without the passphrase -- it would always fail. Ah, that sounds like it is probably ok then. > > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > > { > > > > BDRVRBDState *s = bs->opaque; > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > > .type = QEMU_OPT_STRING, > > > > .help = "ID of secret providing the password", > > > > }, > > > > +{ > > > > +.name = "encrypt.format", > > > > +.type = QEMU_OPT_STRING, > > > > +.help = "Encrypt the image, format choices: 'luks', > > > > 'luks2'", > > > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > > "rbd encryption format" command. > > > > No, it should stay "luks" not "luks1", to match the existing QEMU > > terminology for its LUKS v1 encryption support. 
> > If you insist on following the QEMU nomenclature here it's fine with > me but I want to point out that encryption-formatted clones won't be > interoperable with QEMU LUKS driver or dm-crypt so making the names > match QEMU instead of librbd and rbd CLI seems a bit misleading. In what way is it not interoperable ? If we don't specify any 'encrypt' option, does the guest see the raw LUKS header + payload, or is the header completely hidden and only ever accessible via RBD ? > > > > +## > > > > +# @RbdEncryptionCreateOptionsLUKSBase: > > > > +# > > > > +# @cipher-alg: The encryption algorithm > > > > +# > > > > +# Since: 6.1 > > > > +## > > > > +{ 'struct': 'RbdEncryptionCreateOptionsLUKSBase', > > > > + 'base':
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote: > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > encryption. > > > Currently supported formats are LUKS version 1 and 2. > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > expect an > > > open image context: > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > Signed-off-by: Or Ozeri > > > --- > > > block/raw-format.c | 7 + > > > block/rbd.c | 371 ++- > > > qapi/block-core.json | 110 - > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > index f098a89c7b..183b17cd84 100644 > > > --- a/block/rbd.c > > > +++ b/block/rbd.c > > > @@ -73,6 +73,18 @@ > > > #define LIBRBD_USE_IOVEC 0 > > > #endif > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > + > > > +static const char rbd_luks_header_verification[ > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > +}; > > > + > > > +static const char rbd_luks2_header_verification[ > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > +}; > > > + > > > typedef enum { > > > RBD_AIO_READ, > > > RBD_AIO_WRITE, > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > > offs) > > > } > > > } > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > +static int qemu_rbd_convert_luks_options( > > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > > +char **passphrase, > > > +Error **errp) > > > +{ > > > +int r = 0; > > > + > > > +if (!luks_opts->has_key_secret) { > > > +r = -EINVAL; > > > 
+error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > +return r; > > > +} > > > > Why is key-secret optional? > > It doesn't look like it is handled correctly here, but we need to > be able to run 'qemu-img info ' and get information back > on the size of the image, and whether or not it is encrypted, > without having to supply a passphrase upfront. So it is right that > key-secret be optional, but also we shouldn't return an fatal > error like this. Hi Daniel, The key-secret lives inside RbdEncryptionOptions (or RbdEncryptionCreateOptions) which are already optional: '*encrypt': 'RbdEncryptionOptions' '*encrypt' :'RbdEncryptionCreateOptions' The image is opened as usual and then, if "encrypt" is specified, the encryption profile is loaded (or created and laid down). It does not make sense to attempt to load or create the encryption profile without the passphrase -- it would always fail. "qemu-img info" can get the size, etc without loading the encryption profile (i.e. ->bdrv_get_info and ->bdrv_get_specific_info don't need it). But note that once the encryption profile is loaded, the size changes because librbd subtracts the LUKS header overhead. So $ qemu-img info --image-opts driver=rbd,pool=foo,image=bar and $ qemu-img info --object secret,file=passphrase.txt,id=sec0 --image-opts driver=rbd,pool=foo,image=bar,encrypt.format=luks2,encrypt.key-secret=sec0 would give different results. 
> > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > { > > > BDRVRBDState *s = bs->opaque; > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > .type = QEMU_OPT_STRING, > > > .help = "ID of secret providing the password", > > > }, > > > +{ > > > +.name = "encrypt.format", > > > +.type = QEMU_OPT_STRING, > > > +.help = "Encrypt the image, format choices: 'luks', 'luks2'", > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > "rbd encryption format" command. > > No, it should stay "luks" not "luks1", to match the existing QEMU > terminology for its LUKS v1 encryption support. If you insist on following the QEMU nomenclature here it's fine with me but I want to point out that encryption-formatted clones won't be interoperable with QEMU LUKS driver or dm-crypt so making the names match QEMU instead of librbd and rbd CLI seems a bit misleading. > > > > > @@ -3609,6 +3622,94 @@ > > > { 'enum': 'RbdAuthMode', > > >'data': [ 'cephx', 'none' ] } > > > > > > +## > > > +# @RbdImageEncryptionFormat: > > > +# > > > +# Since: 6.1 > > > +## > > > +{ 'enum': 'RbdImageEncryptionFormat', > > > + 'data': [ 'luks', 'luks2' ] } > > > > Ditto
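The 8-byte verification arrays quoted from the patch encode the LUKS magic ("LUKS" followed by 0xBA 0xBE) plus a 16-bit big-endian version number (1 or 2). A minimal Python sketch of the same detection idea (the helper name is mine, not from the patch):

```python
# Hypothetical re-implementation of the header probe from block/rbd.c:
# the first 8 bytes of a LUKS volume are the magic b"LUKS\xba\xbe"
# followed by a 16-bit big-endian version field.
LUKS_MAGIC = b'LUKS\xba\xbe'

def probe_luks_version(header: bytes):
    """Return 1 or 2 for a LUKS1/LUKS2 header, or None if not LUKS."""
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    version = int.from_bytes(header[6:8], 'big')
    return version if version in (1, 2) else None
```

This mirrors why the patch keeps two separate arrays: the magic is shared and only the trailing version bytes (0,1 vs 0,2) differ.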
[PATCH v6 16/16] docs/devel/testing: add -p option to the debug section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- docs/devel/testing.rst | 4 1 file changed, 4 insertions(+) diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst index fa85592a38..28a0b37b84 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing.rst @@ -250,6 +250,10 @@ a failing test: * ``-d`` (debug) just increases the logging verbosity, showing for example the QMP commands and answers. +* ``-p`` (print) redirects QEMU’s stdout and stderr to the test output, + instead of saving it into a log file in + ``$TEST_DIR/qemu-machine-``. + Test case groups -- 2.31.1
[PATCH v6 06/16] qemu-iotests: delay QMP socket timers
Attaching gdbserver implies that the qmp socket should wait indefinitely for an answer from QEMU. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index c86f239d81..e176a84620 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -477,10 +477,14 @@ def __init__(self, seconds, errmsg="Timeout"): self.seconds = seconds self.errmsg = errmsg def __enter__(self): +if qemu_gdb: +return self signal.signal(signal.SIGALRM, self.timeout) signal.setitimer(signal.ITIMER_REAL, self.seconds) return self def __exit__(self, exc_type, value, traceback): +if qemu_gdb: +return False signal.setitimer(signal.ITIMER_REAL, 0) return False def timeout(self, signum, frame): @@ -575,7 +579,7 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) -timer = 15.0 +timer = 15.0 if not qemu_gdb else None super().__init__(qemu_prog, qemu_opts, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, -- 2.31.1
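The diff above disables the iotests ``Timeout`` context manager when gdb is in use. Distilled into a standalone sketch (``disabled`` stands in for the global ``qemu_gdb`` check used by the patch):

```python
import signal

class Timeout:
    """Raise TimeoutError after `seconds`, unless disabled --
    as iotests does when a gdbserver is attached to QEMU."""
    def __init__(self, seconds, errmsg="Timeout", disabled=False):
        self.seconds = seconds
        self.errmsg = errmsg
        self.disabled = disabled

    def __enter__(self):
        if self.disabled:                    # debugger attached: wait forever
            return self
        signal.signal(signal.SIGALRM, self._timeout)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self

    def __exit__(self, exc_type, value, traceback):
        if self.disabled:
            return False
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel pending alarm
        return False

    def _timeout(self, signum, frame):
        raise TimeoutError(self.errmsg)
```

Note that ``signal.setitimer`` only works in the main thread on Unix, which is the situation the iotests harness runs in.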
[PATCH v6 15/16] qemu-iotests: add option to show qemu binary logs on stdout
Using the flag -p, allow the qemu binary to print to stdout. Also create the common function _close_qemu_log_file() to avoid accessing machine.py private fields directly and have duplicate code. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/qemu/machine/machine.py | 9 ++--- tests/qemu-iotests/check | 4 +++- tests/qemu-iotests/iotests.py | 8 tests/qemu-iotests/testenv.py | 9 +++-- 4 files changed, 24 insertions(+), 6 deletions(-) diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py index fdf2fc0e9c..c9d344d955 100644 --- a/python/qemu/machine/machine.py +++ b/python/qemu/machine/machine.py @@ -338,6 +338,11 @@ def _post_launch(self) -> None: if self._qmp_connection: self._qmp.accept(self._qmp_timer) +def _close_qemu_log_file(self) -> None: +if self._qemu_log_file is not None: +self._qemu_log_file.close() +self._qemu_log_file = None + def _post_shutdown(self) -> None: """ Called to cleanup the VM instance after the process has exited. 
@@ -350,9 +355,7 @@ def _post_shutdown(self) -> None: self._qmp.close() self._qmp_connection = None -if self._qemu_log_file is not None: -self._qemu_log_file.close() -self._qemu_log_file = None +self._close_qemu_log_file() self._load_io_log() diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check index ebd27946db..da1bfb839e 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -36,6 +36,8 @@ def make_argparser() -> argparse.ArgumentParser: help='pretty print output for make check') p.add_argument('-d', dest='debug', action='store_true', help='debug') +p.add_argument('-p', dest='print', action='store_true', +help='redirects qemu\'s stdout and stderr to the test output') p.add_argument('-gdb', action='store_true', help="start gdbserver with $GDB_OPTIONS options \ ('localhost:12345' if $GDB_OPTIONS is empty)") @@ -119,7 +121,7 @@ if __name__ == '__main__': aiomode=args.aiomode, cachemode=args.cachemode, imgopts=args.imgopts, misalign=args.misalign, debug=args.debug, valgrind=args.valgrind, - gdb=args.gdb) + gdb=args.gdb, qprint=args.print) if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--': if not args.tests: diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 7aa6707032..eee6fb7a9f 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -79,6 +79,8 @@ if gdb_qemu_env: qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ') +qemu_print = os.environ.get('PRINT_QEMU', False) + imgfmt = os.environ.get('IMGFMT', 'raw') imgproto = os.environ.get('IMGPROTO', 'file') output_dir = os.environ.get('OUTPUT_DIR', '.') @@ -613,6 +615,12 @@ def _post_shutdown(self) -> None: else: os.remove(valgrind_filename) +def _pre_launch(self) -> None: +super()._pre_launch() +if qemu_print: +# set QEMU binary output to stdout +self._close_qemu_log_file() + def add_object(self, opts): self._args.append('-object') self._args.append(opts) diff --git a/tests/qemu-iotests/testenv.py 
b/tests/qemu-iotests/testenv.py index 8bf154376f..70da0d60c8 100644 --- a/tests/qemu-iotests/testenv.py +++ b/tests/qemu-iotests/testenv.py @@ -74,7 +74,7 @@ class TestEnv(ContextManager['TestEnv']): 'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU', 'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX', 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_', - 'GDB_OPTIONS'] + 'GDB_OPTIONS', 'PRINT_QEMU'] def prepare_subprocess(self, args: List[str]) -> Dict[str, str]: if self.debug: @@ -181,7 +181,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str, misalign: bool = False, debug: bool = False, valgrind: bool = False, - gdb: bool = False) -> None: + gdb: bool = False, + qprint: bool = False) -> None: self.imgfmt = imgfmt self.imgproto = imgproto self.aiomode = aiomode @@ -189,6 +190,9 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str, self.misalign = misalign self.debug = debug +if qprint: +self.print_qemu = 'y' + if gdb: self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS) if not self.gdb_options: @@ -299,6 +303,7 @@ def print_env(self) -> None: SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER} GDB_OPTIONS -- {GDB_OPTIONS}
[PATCH v6 13/16] qemu-iotests: insert valgrind command line as wrapper for qemu binary
If -gdb and -valgrind are both defined, return an error. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 85d8c0abbb..7aa6707032 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -591,7 +591,11 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None -super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb, +if qemu_gdb and qemu_valgrind: +sys.stderr.write('Either use gdb or valgrind, not together\n') +sys.exit(1) +wrapper = qemu_gdb if qemu_gdb else qemu_valgrind +super().__init__(qemu_prog, qemu_opts, wrapper=wrapper, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, -- 2.31.1
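The selection logic in the diff can be sketched as a tiny helper (names follow the patch; the ``sys.exit(1)`` is replaced by an exception here to keep the sketch testable):

```python
# Wrapper selection as done in iotests.py: gdb and valgrind are
# mutually exclusive; otherwise whichever non-empty list was built
# from the environment becomes the qemu wrapper command prefix.
def pick_wrapper(qemu_gdb, qemu_valgrind):
    if qemu_gdb and qemu_valgrind:
        raise ValueError('Either use gdb or valgrind, not together')
    return qemu_gdb if qemu_gdb else qemu_valgrind
```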
[PATCH v6 02/16] python: Reduce strictness of pylint's duplicate-code check
From: John Snow Pylint prior to 2.8.3 (We pin at >= 2.8.0) includes function and method signatures as part of its duplicate checking algorithm. This check does not listen to pragmas, so the only way to disable it is to turn it off completely or increase the minimum duplicate lines so that it doesn't trigger for functions with long, multi-line signatures. When we decide to upgrade to pylint 2.8.3 or greater, we will be able to use 'ignore-signatures = true' to the config instead. I'd prefer not to keep us on the very bleeding edge of pylint if I can help it -- 2.8.3 came out only three days ago at time of writing. See: https://github.com/PyCQA/pylint/pull/4474 Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/setup.cfg | 5 + 1 file changed, 5 insertions(+) diff --git a/python/setup.cfg b/python/setup.cfg index 0fcdec6f32..d82c39aa46 100644 --- a/python/setup.cfg +++ b/python/setup.cfg @@ -78,6 +78,11 @@ good-names=i, # Ignore imports when computing similarities. ignore-imports=yes +# Minimum lines number of a similarity. +# TODO: Remove after we opt in to Pylint 2.8.3. See commit msg. +min-similarity-lines=6 + + [isort] force_grid_wrap=4 force_sort_within_sections=True -- 2.31.1
[PATCH v6 14/16] docs/devel/testing: add -valgrind option to the debug section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- docs/devel/testing.rst | 7 +++ 1 file changed, 7 insertions(+) diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst index 8b24e6fb47..fa85592a38 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing.rst @@ -240,6 +240,13 @@ a failing test: If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored, regardless on whether it is set or not. +* ``-valgrind`` attaches a valgrind instance to QEMU. If it detects + warnings, it will print and save the log in + ``$TEST_DIR/.valgrind``. + The final command line will be ``valgrind --log-file=$TEST_DIR/ + .valgrind --error-exitcode=99 $QEMU ...`` + Note: if used together with ``-gdb``, this command will be ignored. + * ``-d`` (debug) just increases the logging verbosity, showing for example the QMP commands and answers. -- 2.31.1
[PATCH v6 01/16] python: qemu: add timer parameter for qmp.accept socket
Also add a new _qmp_timer field to the QEMUMachine class. Let's change the default socket timeout to None, so that if a subclass needs to add a timer, it can be done by modifying this private field. At the same time, restore the timer to be 15 seconds in iotests.py, to give an upper bound to the QMP monitor test command execution. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/qemu/machine/machine.py | 7 +-- python/qemu/machine/qtest.py | 5 +++-- tests/qemu-iotests/iotests.py | 3 ++- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py index b62435528e..fdf2fc0e9c 100644 --- a/python/qemu/machine/machine.py +++ b/python/qemu/machine/machine.py @@ -95,7 +95,8 @@ def __init__(self, socket_scm_helper: Optional[str] = None, sock_dir: Optional[str] = None, drain_console: bool = False, - console_log: Optional[str] = None): + console_log: Optional[str] = None, + qmp_timer: Optional[float] = None): ''' Initialize a QEMUMachine @@ -109,6 +110,7 @@ def __init__(self, @param sock_dir: where to create socket (defaults to base_temp_dir) @param drain_console: (optional) True to drain console socket to buffer @param console_log: (optional) path to console log file +@param qmp_timer: (optional) default QMP socket timeout @note: Qemu process is not started until launch() is used. 
''' # Direct user configuration @@ -116,6 +118,7 @@ def __init__(self, self._binary = binary self._args = list(args) self._wrapper = wrapper +self._qmp_timer = qmp_timer self._name = name or "qemu-%d" % os.getpid() self._base_temp_dir = base_temp_dir @@ -333,7 +336,7 @@ def _pre_launch(self) -> None: def _post_launch(self) -> None: if self._qmp_connection: -self._qmp.accept() +self._qmp.accept(self._qmp_timer) def _post_shutdown(self) -> None: """ diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py index 93700684d1..33a86a9d69 100644 --- a/python/qemu/machine/qtest.py +++ b/python/qemu/machine/qtest.py @@ -115,14 +115,15 @@ def __init__(self, name: Optional[str] = None, base_temp_dir: str = "/var/tmp", socket_scm_helper: Optional[str] = None, - sock_dir: Optional[str] = None): + sock_dir: Optional[str] = None, + qmp_timer: Optional[float] = None): if name is None: name = "qemu-%d" % os.getpid() if sock_dir is None: sock_dir = base_temp_dir super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir, socket_scm_helper=socket_scm_helper, - sock_dir=sock_dir) + sock_dir=sock_dir, qmp_timer=qmp_timer) self._qtest: Optional[QEMUQtestProtocol] = None self._qtest_path = os.path.join(sock_dir, name + "-qtest.sock") diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 89663dac06..6b0db4ce54 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -570,10 +570,11 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) +timer = 15.0 super().__init__(qemu_prog, qemu_opts, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, - sock_dir=sock_dir) + sock_dir=sock_dir, qmp_timer=timer) self._num_drives = 0 def add_object(self, opts): -- 2.31.1
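The plumbing in this patch boils down to: the generic machine class defaults to no QMP timeout (block forever on ``accept()``), and the iotests ``VM`` subclass opts back in to a 15-second bound. A toy model of just that parameter flow (class shapes simplified from the real ``QEMUMachine``):

```python
from typing import Optional

class Machine:
    """Base: no timeout by default, so accept() would block forever."""
    def __init__(self, qmp_timer: Optional[float] = None):
        self._qmp_timer = qmp_timer

class VM(Machine):
    """iotests subclass: restore the 15 s upper bound on QMP command
    execution, unless a debugger is holding QEMU stopped."""
    def __init__(self, debugger_attached: bool = False):
        super().__init__(qmp_timer=None if debugger_attached else 15.0)
```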
[PATCH v6 10/16] qemu-iotests: extend the check script to prepare supporting valgrind for python tests
Currently, the check script only parses the option and sets the VALGRIND_QEMU environment variable to "y". Add another local python variable that prepares the command line, identical to the one provided in the test scripts. Because the python script does not know in advance the valgrind PID to assign to the log file name, use the "%p" flag in the valgrind log file name that automatically puts the process PID at runtime. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/check | 7 --- tests/qemu-iotests/iotests.py | 11 +++ tests/qemu-iotests/testenv.py | 1 + 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check index 4365bb8066..ebd27946db 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -39,6 +39,10 @@ def make_argparser() -> argparse.ArgumentParser: p.add_argument('-gdb', action='store_true', help="start gdbserver with $GDB_OPTIONS options \ ('localhost:12345' if $GDB_OPTIONS is empty)") +p.add_argument('-valgrind', action='store_true', +help='use valgrind, sets VALGRIND_QEMU environment ' +'variable') + p.add_argument('-misalign', action='store_true', help='misalign memory allocations') p.add_argument('--color', choices=['on', 'off', 'auto'], @@ -88,9 +92,6 @@ def make_argparser() -> argparse.ArgumentParser: g_bash.add_argument('-o', dest='imgopts', help='options to pass to qemu-img create/convert, ' 'sets IMGOPTS environment variable') -g_bash.add_argument('-valgrind', action='store_true', -help='use valgrind, sets VALGRIND_QEMU environment ' -'variable') g_sel = p.add_argument_group('test selecting options', 'The following options specify test set ' diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index e7e3d92d3e..6aa1dc48ba 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -96,6 +96,17 @@ sys.stderr.write('Please run this test via the "check"
script\n') sys.exit(os.EX_USAGE) +qemu_valgrind = [] +if os.environ.get('VALGRIND_QEMU') == "y" and \ +os.environ.get('NO_VALGRIND') != "y": +valgrind_logfile = "--log-file=" + test_dir +# %p allows to put the valgrind process PID, since +# we don't know it a priori (subprocess.Popen is +# not yet invoked) +valgrind_logfile += "/%p.valgrind" + +qemu_valgrind = ['valgrind', valgrind_logfile, '--error-exitcode=99'] + socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper') luks_default_secret_object = 'secret,id=keysec0,data=' + \ diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py index 8501c6caf5..8bf154376f 100644 --- a/tests/qemu-iotests/testenv.py +++ b/tests/qemu-iotests/testenv.py @@ -298,6 +298,7 @@ def print_env(self) -> None: SOCK_DIR -- {SOCK_DIR} SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER} GDB_OPTIONS -- {GDB_OPTIONS} +VALGRIND_QEMU -- {VALGRIND_QEMU} """ args = collections.defaultdict(str, self.get_env()) -- 2.31.1
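The wrapper-preparation logic this patch adds to iotests.py boils down to the following standalone Python sketch (the function name and `env` parameter are invented here for testability; the real code reads `os.environ` at module scope):

```python
def build_valgrind_wrapper(env, test_dir):
    """Return the command-line prefix used to wrap the QEMU binary,
    or an empty list when valgrind is not requested.

    Mirrors the VALGRIND_QEMU / NO_VALGRIND handling added to
    iotests.py; 'env' stands in for os.environ.
    """
    if env.get('VALGRIND_QEMU') == 'y' and env.get('NO_VALGRIND') != 'y':
        # %p expands to the process PID at runtime, since the PID is
        # unknown before subprocess.Popen is invoked.
        logfile = '--log-file=' + test_dir + '/%p.valgrind'
        return ['valgrind', logfile, '--error-exitcode=99']
    return []
```

The returned list is then passed as the `wrapper` argument when spawning QEMU, so valgrind becomes the parent process of the binary under test.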
[PATCH v6 09/16] docs/devel/testing: add -gdb option to the debugging section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 docs/devel/testing.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 9d6a8f8636..8b24e6fb47 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -229,6 +229,17 @@ Debugging a test case
 The following options to the ``check`` script can be useful when debugging
 a failing test:

+* ``-gdb`` wraps every QEMU invocation in a ``gdbserver``, which waits for a
+  connection from a gdb client.  The options given to ``gdbserver`` (e.g. the
+  address on which to listen for connections) are taken from the
+  ``$GDB_OPTIONS`` environment variable.  By default (if ``$GDB_OPTIONS`` is
+  empty), it listens on ``localhost:12345``.
+  It is possible to connect to it for example with
+  ``gdb -iex "target remote $addr"``, where ``$addr`` is the address
+  ``gdbserver`` listens on.
+  If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
+  regardless of whether it is set.
+
 * ``-d`` (debug) just increases the logging verbosity, showing
   for example the QMP commands and answers.

-- 
2.31.1
[PATCH v6 08/16] qemu-iotests: add gdbserver option to script tests too
The only limitation here is that running a script test under gdbserver makes the test output mismatch the expected results, so the test is reported as failing.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/common.rc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index cbbf6d7c7f..a1ef2b5c2f 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -166,8 +166,14 @@ _qemu_wrapper()
     if [ -n "${QEMU_NEED_PID}" ]; then
         echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"
     fi
+
+    GDB=""
+    if [ ! -z ${GDB_OPTIONS} ]; then
+        GDB="gdbserver ${GDB_OPTIONS}"
+    fi
+
     VALGRIND_QEMU="${VALGRIND_QEMU_VM}" _qemu_proc_exec "${VALGRIND_LOGFILE}" \
-        "$QEMU_PROG" $QEMU_OPTIONS "$@"
+        $GDB "$QEMU_PROG" $QEMU_OPTIONS "$@"
     )
     RETVAL=$?
     _qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
-- 
2.31.1
[PATCH v6 07/16] qemu_iotests: insert gdbserver command line as wrapper for qemu binary
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index e176a84620..e7e3d92d3e 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -580,7 +580,8 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) timer = 15.0 if not qemu_gdb else None -super().__init__(qemu_prog, qemu_opts, name=name, +super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb, + name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, sock_dir=sock_dir, qmp_timer=timer) -- 2.31.1
[PATCH v6 11/16] qemu-iotests: extend QMP socket timeout when using valgrind
As with gdbserver, valgrind delays the test execution, so the default QMP socket timeout and the generic Timeout class in iotests.py time out too soon.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/iotests.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6aa1dc48ba..26c580f9e7 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -488,13 +488,13 @@ def __init__(self, seconds, errmsg="Timeout"):
         self.seconds = seconds
         self.errmsg = errmsg
     def __enter__(self):
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             return self
         signal.signal(signal.SIGALRM, self.timeout)
         signal.setitimer(signal.ITIMER_REAL, self.seconds)
         return self
     def __exit__(self, exc_type, value, traceback):
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             return False
         signal.setitimer(signal.ITIMER_REAL, 0)
         return False
@@ -590,7 +590,7 @@ class VM(qtest.QEMUQtestMachine):

     def __init__(self, path_suffix=''):
         name = "qemu%s-%d" % (path_suffix, os.getpid())
-        timer = 15.0 if not qemu_gdb else None
+        timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None
         super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
                          name=name,
                          base_temp_dir=test_dir,
-- 
2.31.1
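The timer-selection rule in this patch is small enough to state as a one-liner; here is a standalone sketch (the function is invented for illustration — in the real code the expression appears inline in `VM.__init__`):

```python
def qmp_timer(qemu_gdb, qemu_valgrind):
    """Pick the QMP socket timeout: the usual 15 seconds, or no timeout
    at all when a debugger or valgrind is slowing QEMU down.

    qemu_gdb / qemu_valgrind are the wrapper command lists from
    iotests.py; an empty list means the feature is disabled.
    """
    return 15.0 if not (qemu_gdb or qemu_valgrind) else None
```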
[PATCH v6 05/16] qemu-iotests: add option to attach gdbserver
Define the -gdb flag and the GDB_OPTIONS environment variable to let python tests attach a gdbserver to each qemu instance. This patch only adds and parses the flag; it does not yet add the implementation for it.

If -gdb is not provided but $GDB_OPTIONS is set, the environment variable is ignored.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/check      |  6 +-
 tests/qemu-iotests/iotests.py |  5 +
 tests/qemu-iotests/testenv.py | 17 +++--
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 2dd529eb75..4365bb8066 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -36,6 +36,9 @@ def make_argparser() -> argparse.ArgumentParser:
                    help='pretty print output for make check')
     p.add_argument('-d', dest='debug', action='store_true', help='debug')
+    p.add_argument('-gdb', action='store_true',
+                   help="start gdbserver with $GDB_OPTIONS options \
+('localhost:12345' if $GDB_OPTIONS is empty)")
     p.add_argument('-misalign', action='store_true',
                    help='misalign memory allocations')
     p.add_argument('--color', choices=['on', 'off', 'auto'],
@@ -114,7 +117,8 @@ if __name__ == '__main__':
     env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
                   aiomode=args.aiomode, cachemode=args.cachemode,
                   imgopts=args.imgopts, misalign=args.misalign,
-                  debug=args.debug, valgrind=args.valgrind)
+                  debug=args.debug, valgrind=args.valgrind,
+                  gdb=args.gdb)

     if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--':
         if not args.tests:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6b0db4ce54..c86f239d81 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -74,6 +74,11 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')

+gdb_qemu_env = os.environ.get('GDB_OPTIONS')
+qemu_gdb = []
+if gdb_qemu_env:
+    qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ')
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 0c3fe75636..8501c6caf5 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -27,6 +27,7 @@ import glob
 from typing import List, Dict, Any, Optional, ContextManager

+DEF_GDB_OPTIONS = 'localhost:12345'

 def isxfile(path: str) -> bool:
     return os.path.isfile(path) and os.access(path, os.X_OK)
@@ -72,7 +73,8 @@ class TestEnv(ContextManager['TestEnv']):
                      'QEMU_NBD_OPTIONS', 'IMGOPTS', 'IMGFMT', 'IMGPROTO',
                      'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
                      'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
-                     'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_']
+                     'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
+                     'GDB_OPTIONS']

     def prepare_subprocess(self, args: List[str]) -> Dict[str, str]:
         if self.debug:
@@ -178,7 +180,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
                  imgopts: Optional[str] = None,
                  misalign: bool = False,
                  debug: bool = False,
-                 valgrind: bool = False) -> None:
+                 valgrind: bool = False,
+                 gdb: bool = False) -> None:
         self.imgfmt = imgfmt
         self.imgproto = imgproto
         self.aiomode = aiomode
@@ -186,6 +189,15 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
         self.misalign = misalign
         self.debug = debug

+        if gdb:
+            self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS)
+            if not self.gdb_options:
+                # cover the case 'export GDB_OPTIONS='
+                self.gdb_options = DEF_GDB_OPTIONS
+        elif 'GDB_OPTIONS' in os.environ:
+            # to not propagate it in prepare_subprocess()
+            del os.environ['GDB_OPTIONS']
+
         if valgrind:
             self.valgrind_qemu = 'y'

@@ -285,6 +297,7 @@ def print_env(self) -> None:
 TEST_DIR      -- {TEST_DIR}
 SOCK_DIR      -- {SOCK_DIR}
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
+GDB_OPTIONS   -- {GDB_OPTIONS}
 """

 args = collections.defaultdict(str, self.get_env())
-- 
2.31.1
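The GDB_OPTIONS resolution rules this patch implements in testenv.py — default when unset, default again when set but empty, and removal from the environment when -gdb is absent — can be captured in a standalone sketch (the function name and the `env` dict parameter are invented for testability):

```python
DEF_GDB_OPTIONS = 'localhost:12345'

def resolve_gdb_options(gdb_requested, env):
    """Model of the GDB_OPTIONS handling in TestEnv.__init__.

    'env' stands in for os.environ.  Returns the options gdbserver
    should be started with, or '' when -gdb was not given.
    """
    if gdb_requested:
        opts = env.get('GDB_OPTIONS', DEF_GDB_OPTIONS)
        # Cover the 'export GDB_OPTIONS=' case: an empty string still
        # falls back to the default.
        return opts or DEF_GDB_OPTIONS
    # Without -gdb, drop the variable so it is not propagated to the
    # test subprocesses.
    env.pop('GDB_OPTIONS', None)
    return ''
```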
[PATCH v6 12/16] qemu-iotests: allow valgrind to read/delete the generated log file
When using -valgrind on the script tests, it generates a log file in $TEST_DIR that is either read (if valgrind finds problems) or otherwise deleted. Provide the same exact behavior when using -valgrind on the python tests. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 11 +++ 1 file changed, 11 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 26c580f9e7..85d8c0abbb 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -598,6 +598,17 @@ def __init__(self, path_suffix=''): sock_dir=sock_dir, qmp_timer=timer) self._num_drives = 0 +def _post_shutdown(self) -> None: +super()._post_shutdown() +if not qemu_valgrind or not self._popen: +return +valgrind_filename = f"{test_dir}/{self._popen.pid}.valgrind" +if self.exitcode() == 99: +with open(valgrind_filename) as f: +print(f.read()) +else: +os.remove(valgrind_filename) + def add_object(self, opts): self._args.append('-object') self._args.append(opts) -- 2.31.1
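The read-or-delete behavior of the `_post_shutdown()` hook above can be sketched as a standalone function (name and signature invented — the real hook reads the PID and exit code from the machine object):

```python
import os

def check_valgrind_log(test_dir, pid, exitcode):
    """Model of the valgrind log handling in VM._post_shutdown().

    If valgrind reported errors (exit code 99, from --error-exitcode=99),
    return the log contents so they can be printed; otherwise delete the
    log and return None.
    """
    logfile = f"{test_dir}/{pid}.valgrind"
    if exitcode == 99:
        with open(logfile) as f:
            return f.read()
    os.remove(logfile)
    return None
```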
[PATCH v6 00/16] qemu_iotests: improve debugging options
This series adds the option to attach gdbserver and valgrind to the QEMU binary running in qemu_iotests. It also allows redirecting the output of the QEMU binaries in the python tests to stdout, instead of a log file.

Patches 1-9 introduce the -gdb option to both python and bash tests, 10-14 extend the already existing -valgrind flag to work also on python tests, and patches 15-16 introduce -p to enable logging to stdout.

In particular, patches 1, 6 and 11 focus on extending the QMP socket timers when using gdb/valgrind, otherwise the python tests will fail due to delays in the QMP responses.

Signed-off-by: Emanuele Giuseppe Esposito
---
v6:
* undo the previous series change "base this series on the double dash options, so define --gdb instead of -gdb"
* undo Vladimir's suggestion on patch 5 to use @contextmanager, because it produces a pylint warning.

Emanuele Giuseppe Esposito (15):
  python: qemu: add timer parameter for qmp.accept socket
  python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine
  docs/devel/testing: add debug section to the QEMU iotests chapter
  qemu-iotests: add option to attach gdbserver
  qemu-iotests: delay QMP socket timers
  qemu_iotests: insert gdbserver command line as wrapper for qemu binary
  qemu-iotests: add gdbserver option to script tests too
  docs/devel/testing: add -gdb option to the debugging section of QEMU iotests
  qemu-iotests: extend the check script to prepare supporting valgrind for python tests
  qemu-iotests: extend QMP socket timeout when using valgrind
  qemu-iotests: allow valgrind to read/delete the generated log file
  qemu-iotests: insert valgrind command line as wrapper for qemu binary
  docs/devel/testing: add -valgrind option to the debug section of QEMU iotests
  qemu-iotests: add option to show qemu binary logs on stdout
  docs/devel/testing: add -p option to the debug section of QEMU iotests

John Snow (1):
  python: Reduce strictness of pylint's duplicate-code check

 docs/devel/testing.rst         | 30 +
 python/qemu/machine/machine.py | 16 +++
 python/qemu/machine/qtest.py   |  9 ---
 python/setup.cfg               |  5 
 tests/qemu-iotests/check       | 15 ---
 tests/qemu-iotests/common.rc   |  8 +-
 tests/qemu-iotests/iotests.py  | 49 --
 tests/qemu-iotests/testenv.py  | 23 ++--
 8 files changed, 138 insertions(+), 17 deletions(-)

-- 
2.31.1
[PATCH v6 04/16] docs/devel/testing: add debug section to the QEMU iotests chapter
Introduce the "Debugging a test case" section, in preparation for the additional flags that will be added in the next patches.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 docs/devel/testing.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 4e42392810..9d6a8f8636 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -224,6 +224,14 @@
 another application on the host may have locked the file, possibly leading to
 a test failure. If using such devices are explicitly desired, consider adding
 ``locking=off`` option to disable image locking.

+Debugging a test case
+---------------------
+The following options to the ``check`` script can be useful when debugging
+a failing test:
+
+* ``-d`` (debug) just increases the logging verbosity, showing
+  for example the QMP commands and answers.
+
 Test case groups
 ----------------
-- 
2.31.1
[PATCH v6 03/16] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow Reviewed-by: Max Reitz --- python/qemu/machine/qtest.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py index 33a86a9d69..dc2b5ccfb1 100644 --- a/python/qemu/machine/qtest.py +++ b/python/qemu/machine/qtest.py @@ -112,6 +112,7 @@ class QEMUQtestMachine(QEMUMachine): def __init__(self, binary: str, args: Sequence[str] = (), + wrapper: Sequence[str] = (), name: Optional[str] = None, base_temp_dir: str = "/var/tmp", socket_scm_helper: Optional[str] = None, @@ -121,7 +122,8 @@ def __init__(self, name = "qemu-%d" % os.getpid() if sock_dir is None: sock_dir = base_temp_dir -super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir, +super().__init__(binary, args, wrapper=wrapper, name=name, + base_temp_dir=base_temp_dir, socket_scm_helper=socket_scm_helper, sock_dir=sock_dir, qmp_timer=qmp_timer) self._qtest: Optional[QEMUQtestProtocol] = None -- 2.31.1
Re: [PATCH 2/6] block: block-status cache for data regions
On 19.06.21 12:20, Vladimir Sementsov-Ogievskiy wrote: 17.06.2021 18:52, Max Reitz wrote: As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. 
protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Max Reitz --- include/block/block_int.h | 19 ++ block.c | 2 + block/io.c | 80 +-- 3 files changed, 98 insertions(+), 3 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index a8f9598102..c09512556a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -832,6 +832,23 @@ struct BdrvChild { QLIST_ENTRY(BdrvChild) next_parent; }; +/* + * Allows bdrv_co_block_status() to cache one data region for a + * protocol node. + * + * @lock: Lock for accessing this object's fields + * @valid: Whether the cache is valid + * @data_start: Offset where we know (or strongly assume) is data + * @data_end: Offset where the data region ends (which is not necessarily + * the start of a zeroed region) + */ +typedef struct BdrvBlockStatusCache { + CoMutex lock; + bool valid; + int64_t data_start; + int64_t data_end; +} BdrvBlockStatusCache; + struct BlockDriverState { /* Protected by big QEMU lock or read-only after opening. No special * locking needed during I/O... 
@@ -997,6 +1014,8 @@ struct BlockDriverState { /* BdrvChild links to this node may never be frozen */ bool never_freeze; + + BdrvBlockStatusCache block_status_cache; }; struct BlockBackendRootState { diff --git a/block.c b/block.c index 3f456892d0..bad64d5c4f 100644 --- a/block.c +++ b/block.c @@ -398,6 +398,8 @@ BlockDriverState *bdrv_new(void) qemu_co_queue_init(>flush_queue); + qemu_co_mutex_init(>block_status_cache.lock); + for (i = 0; i < bdrv_drain_all_count; i++) { bdrv_drained_begin(bs); } diff --git a/block/io.c b/block/io.c index 323854d063..320638cc48 100644 --- a/block/io.c +++ b/block/io.c @@ -35,6 +35,7 @@ #include "qapi/error.h" #include "qemu/error-report.h" #include "qemu/main-loop.h" +#include "qemu/range.h" #include "sysemu/replay.h" /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */ @@ -1862,6 +1863,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, bool need_flush = false; int head = 0; int tail = 0; + BdrvBlockStatusCache *bsc = >block_status_cache; int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX); int alignment = MAX(bs->bl.pwrite_zeroes_alignment, @@ -1878,6 +1880,16 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, return -ENOTSUP; } + /* Invalidate the cached block-status data range if this write overlaps */ + qemu_co_mutex_lock(>lock); + if (bsc->valid && + ranges_overlap(offset, bytes, bsc->data_start, + bsc->data_end - bsc->data_start)) + { + bsc->valid = false; + } +
Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On 19.06.21 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 17.06.2021 18:52, Max Reitz wrote:
>> bdrv_co_block_status() does it for us, we do not need to do it here.
>>
>> The advantage of not capping *pnum is that bdrv_co_block_status() can
>> cache larger data regions than requested by its caller.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  block/nbd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 616f9ae6c4..930bd234de 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>>          .type = NBD_CMD_BLOCK_STATUS,
>>          .from = offset,
>>          .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>> -                   MIN(bytes, s->info.size - offset)),
>> +                   s->info.size - offset),
>>          .flags = NBD_CMD_FLAG_REQ_ONE,
>>      };
>
> Hmm.. I don't think that this change is correct. In contrast with
> file-posix you don't get extra information for free, you just make a
> larger request. This means that server will have to do more work.

Oh, oops. Seems I was blind in my rage to replace this MIN() pattern.

You’re absolutely right. So this patch should be dropped.

Max

> (look at blockstatus_to_extents, it calls bdrv_block_status_above in
> a loop).
>
> For example, assume that nbd export is a qcow2 image with all clusters
> allocated. With this change, nbd server will loop through the whole
> qcow2 image, load all L2 tables to return big allocated extent.
>
> So, only server can decide, could it add some extra free information
> to request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't
> allow it.
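The effect of dropping the inner MIN() can be seen by computing the `.len` field both ways in a standalone sketch (the function and its `capped` flag are invented for illustration; `INT_MAX` and `QEMU_ALIGN_DOWN` are modeled in Python):

```python
INT_MAX = 2**31 - 1

def align_down(n, m):
    # Python model of QEMU_ALIGN_DOWN()
    return (n // m) * m

def nbd_status_len(offset, nbytes, export_size, request_alignment, capped):
    """Request length for NBD_CMD_BLOCK_STATUS.

    capped=True mimics the pre-patch behaviour of clamping to the
    caller's 'bytes'; capped=False asks up to the end of the export,
    still bounded by an alignment-rounded INT_MAX.
    """
    limit = align_down(INT_MAX, request_alignment)
    tail = export_size - offset
    return min(limit, min(nbytes, tail) if capped else tail)
```

With REQ_ONE set, the larger length only widens what the server *may* describe in its single extent; whether that is free information or extra server work is exactly the point under debate above.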
Re: [PATCH 4/6] block/gluster: Do not force-cap *pnum
On 19.06.21 12:36, Vladimir Sementsov-Ogievskiy wrote: 17.06.2021 18:52, Max Reitz wrote: bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/gluster.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/block/gluster.c b/block/gluster.c index e8ee14c8e9..8ef7bb18d5 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -1461,7 +1461,8 @@ exit: * the specified offset) that are known to be in the same * allocated/unallocated state. * - * 'bytes' is the max value 'pnum' should be set to. + * 'bytes' is a soft cap for 'pnum'. If the information is free, 'pnum' may + * well exceed it. * * (Based on raw_co_block_status() from file-posix.c.) */ @@ -1500,12 +1501,12 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs, } else if (data == offset) { /* On a data extent, compute bytes to the end of the extent, * possibly including a partial sector at EOF. */ - *pnum = MIN(bytes, hole - offset); + *pnum = hole - offset; ret = BDRV_BLOCK_DATA; Interesting, isn't it a bug that we don't ROUND_UP *pnum to request_alignment here like it is done in file-posix ? Guess I forgot gluster in 9c3db310ff0 O:) I don’t think I’ll be able to reproduce it for gluster, but I suppose just doing the same thing for gluster should be fine... Max
Re: [PATCH 3/6] block/file-posix: Do not force-cap *pnum
On 18.06.21 22:16, Eric Blake wrote: On Thu, Jun 17, 2021 at 05:52:44PM +0200, Max Reitz wrote: bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. We should update the documentation in include/block/block_int.h to mention that the driver's block_status callback may treat *pnum as a soft cap, and that returning a larger value is fine. Oh, sure. Max But I agree with this change in the individual drivers, as long as we remember to make our global contract explicit that we can now rely on it ;) Reviewed-by: Eric Blake
Re: [PATCH 2/6] block: block-status cache for data regions
On 18.06.21 20:51, Eric Blake wrote: On Thu, Jun 17, 2021 at 05:52:43PM +0200, Max Reitz wrote: To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. Here's hoping third time's the charm! Indeed :) (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Max Reitz --- include/block/block_int.h | 19 ++ block.c | 2 + block/io.c| 80 +-- 3 files changed, 98 insertions(+), 3 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index a8f9598102..c09512556a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -832,6 +832,23 @@ struct BdrvChild { QLIST_ENTRY(BdrvChild) next_parent; }; +/* + * Allows bdrv_co_block_status() to cache one data region for a + * protocol node. 
+ *
+ * @lock: Lock for accessing this object's fields
+ * @valid: Whether the cache is valid
+ * @data_start: Offset where we know (or strongly assume) is data
+ * @data_end: Offset where the data region ends (which is not necessarily
+ *            the start of a zeroed region)
+ */
+typedef struct BdrvBlockStatusCache {
+    CoMutex lock;
+    bool valid;
+    int64_t data_start;
+    int64_t data_end;
+} BdrvBlockStatusCache;

Looks like the right bits of information, and I'm glad you documented the need to be prepared for protocols that report split data sections rather than consolidated.

+++ b/block/io.c
@@ -35,6 +35,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "qemu/range.h"
 #include "sysemu/replay.h"

 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
@@ -1862,6 +1863,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     bool need_flush = false;
     int head = 0;
     int tail = 0;
+    BdrvBlockStatusCache *bsc = &bs->block_status_cache;

     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
@@ -1878,6 +1880,16 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
         return -ENOTSUP;
     }

+    /* Invalidate the cached block-status data range if this write overlaps */
+    qemu_co_mutex_lock(&bsc->lock);

Are we going to be suffering from a lot of lock contention performance degradation? Is there a way to take advantage of RCU access patterns for any more performance without sacrificing correctness?

The critical section is so short that I considered it fine. I wanted to use RW locks, but then realized that every RW lock operation is internally locked by another mutex, so it wouldn’t gain anything.

I’m not sure whether RCU is worth it here. We could try something very crude, namely to just not take a lock and make `valid` an atomic.
After all, it doesn’t really matter whether `data_start` and `data_end` are consistent values, and resetting `valid` to false is always safe. The worst that could happen is that a concurrent block-status call tries to set up an overlapping data area, which we thus fail to recognize here. But if such a thing were to happen, it could just as well happen before said concurrent call took any lock on `bsc`.

+    if (bsc->valid &&
+        ranges_overlap(offset, bytes, bsc->data_start,
+                       bsc->data_end - bsc->data_start))
+    {
+        bsc->valid = false;
+    }

Do we want to invalidate the entire bsc, or can we be smart and leave the prefix intact (if offset > bsc->data_start, then set bsc->data_end to offset)?

Perhaps we could be smart, but I don’t think it really makes a difference in practice, so I think keeping it simple is better.

+    qemu_co_mutex_unlock(&bsc->lock);

Worth using WITH_QEMU_LOCK_GUARD?

I knew I forgot something, right. Will use!

Max
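The single-region cache plus overlap-invalidation logic being discussed can be modeled in a few lines of Python (a toy, lock-free model with invented names — the real implementation is the C code quoted above, with a CoMutex around each access):

```python
def ranges_overlap(offset1, bytes1, offset2, bytes2):
    # Same predicate as QEMU's ranges_overlap() in qemu/range.h
    return offset1 < offset2 + bytes2 and offset2 < offset1 + bytes1

class BlockStatusCache:
    """Toy model of BdrvBlockStatusCache: one cached known-data region.

    Only data regions are cached because stale cache contents then err
    on the safe side: treating zeroes as data is fine, the reverse is not.
    """
    def __init__(self):
        self.valid = False
        self.data_start = 0
        self.data_end = 0

    def update(self, start, end):
        # Called after a block-status query reported [start, end) as data
        self.valid = True
        self.data_start = start
        self.data_end = end

    def lookup(self, offset):
        # Returns the number of known-data bytes from 'offset', or None
        # on a cache miss (forcing a real driver query).
        if self.valid and self.data_start <= offset < self.data_end:
            return self.data_end - offset
        return None

    def invalidate_on_write_zeroes(self, offset, nbytes):
        # Mirrors the invalidation in bdrv_co_do_pwrite_zeroes()
        if self.valid and ranges_overlap(offset, nbytes,
                                         self.data_start,
                                         self.data_end - self.data_start):
            self.valid = False
```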
[PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
When the NVMe block driver was introduced (see commit bdd6a90a9e5, January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to misbehave:

  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
  (qemu)
  (qemu) info status
  VM status: paused (io-error)
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device
  qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.

Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction from the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present.

An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module:

  # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-sta...@nongnu.org
Reported-by: Michal Prívozník
Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/186
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé
---
v2: KISS checking both errors undistinguishedly (Maxim)
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 2b5421e7aa6..c3d2a49866c 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1030,7 +1030,7 @@ try_map:
         r = qemu_vfio_dma_map(s->vfio, qiov->iov[i].iov_base,
                               len, true, &iova);
-        if (r == -ENOMEM && retry) {
+        if ((r == -ENOMEM || r == -ENOSPC) && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
             if (s->dma_map_count) {
-- 
2.31.1
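The retry policy the patch establishes — treat both "mappings exhausted" errors alike, recycle the volatile IOVA mappings once, then retry — can be sketched independently of the C code (function names and the callback-style interface are invented for testability):

```python
import errno

def map_with_retry(dma_map, unmap_all):
    """Model of the retry loop around qemu_vfio_dma_map() in block/nvme.c.

    -ENOMEM (old kernels) and -ENOSPC (kernels with the CVE-2019-3882
    fix) both mean the DMA mappings are exhausted: recycle the volatile
    IOVA mappings once and retry.  Any other error is returned as-is.
    """
    retry = True
    while True:
        r = dma_map()
        if r in (-errno.ENOMEM, -errno.ENOSPC) and retry:
            retry = False
            unmap_all()   # recycle volatile IOVA mappings
            continue
        return r
```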
Re: [PATCH v4 6/6] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState
On 19/06/2021 22:06, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true. And that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified also outside coroutines. To be honest, finished is modified only in coroutine. And read outside. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 6416929abd..5348e1f61b 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -53,14 +53,14 @@ typedef struct BlockCopyCallState { Coroutine *co; /* State */ - bool finished; + bool finished; /* atomic */ So, logic around finished: Thread of block_copy does: 0. finished is false 1. tasks set ret and error_is_read 2. qatomic_store_release finished -> true 3. after that point ret and error_is_read are not modified Other threads can: - qatomic_read finished, just to check are we finished or not - if finished, can read ret and error_is_read safely. If you not sure that block-copy finished, use qatomic_load_acquire() of finished first, to be sure that you read ret and error_is_read AFTER finished read and checked to be true. QemuCoSleep sleep; /* TODO: protect API with a lock */ /* To reference all call states from BlockCopyState */ QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ - bool cancelled; + bool cancelled; /* atomic */ Logic around cancelled is simpler: - false at start - qatomic_read is allowed from any thread - qatomic_write to true is allowed from any thread - never write to false Note that cancelling and finishing are racy. User can cancel block-copy that's already finished. We probably may improve change it, but I'm not sure that it worth doing. 
Still, maybe leave some comment in the API documentation.

     /* Fields protected by lock in BlockCopyState */
     bool error_is_read;
     int ret;

@@ -650,7 +650,8 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

-    while (bytes && aio_task_pool_status(aio) == 0 && !call_state->cancelled) {
+    while (bytes && aio_task_pool_status(aio) == 0 &&
+           !qatomic_read(&call_state->cancelled)) {
         BlockCopyTask *task;
         int64_t status_bytes;

@@ -761,7 +762,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
     do {
         ret = block_copy_dirty_clusters(call_state);

-        if (ret == 0 && !call_state->cancelled) {
+        if (ret == 0 && !qatomic_read(&call_state->cancelled)) {
             WITH_QEMU_LOCK_GUARD(&s->lock) {
                 /*
                  * Check that there is no task we still need to

@@ -792,9 +793,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
          * 2. We have waited for some intersecting block-copy request
          *    It may have failed and produced new dirty bits.
          */
-    } while (ret > 0 && !call_state->cancelled);
+    } while (ret > 0 && !qatomic_read(&call_state->cancelled));

-    call_state->finished = true;
+    qatomic_store_release(&call_state->finished, true);

So, all writes to ret and error_is_read are finished at this point.

     if (call_state->cb) {
         call_state->cb(call_state->cb_opaque);

@@ -857,35 +858,37 @@ void block_copy_call_free(BlockCopyCallState *call_state)
         return;
     }

-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));

Here we don't need load_acquire, as we don't read other fields. qatomic_read is enough.

So what you say makes sense; the only thing I wonder is: wouldn't it be better to have the acquire without the assertion (or assert afterwards), just to be sure that we delete when finished is true?

[...]
 }

 bool block_copy_call_cancelled(BlockCopyCallState *call_state)
 {
-    return call_state->cancelled;
+    return qatomic_read(&call_state->cancelled);
 }

 int block_copy_call_status(BlockCopyCallState *call_state, bool *error_is_read)
 {
-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));

Hmm. Here qatomic_load_acquire protects nothing (the assertion will crash if we are not yet finished anyway). So, the caller is doubly sure that block-copy is finished. Also it's misleading: if we think that it does some protection, we are doing the wrong thing: assertions may simply be compiled out, and we can't rely on statements inside assert() being executed. So, let's use a simple qatomic_read here too.

Same applies here.

     if (error_is_read) {
         *error_is_read =
Re: [PATCH V3 5/6] block/rbd: add write zeroes support
On 18.06.21 12:34, Ilya Dryomov wrote:
On Fri, Jun 18, 2021 at 11:00 AM Peter Lieven wrote:
On 16.06.21 14:34, Ilya Dryomov wrote:
On Wed, May 19, 2021 at 4:28 PM Peter Lieven wrote:

Signed-off-by: Peter Lieven
---
 block/rbd.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index 0d8612a988..ee13f08a74 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -63,7 +63,8 @@ typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
     RBD_AIO_DISCARD,
-    RBD_AIO_FLUSH
+    RBD_AIO_FLUSH,
+    RBD_AIO_WRITE_ZEROES
 } RBDAIOCmd;

 typedef struct BDRVRBDState {

@@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }

+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;

I wonder if we should also set BDRV_REQ_NO_FALLBACK here since librbd does not really have a notion of non-efficient explicit zeroing.

This is only true if thick provisioning is supported, which is in Octopus onwards, right?

Since Pacific, I think.

So it would only be correct to set this if thick provisioning is supported; otherwise we could fail with ENOTSUP and then qemu emulates the zeroing with plain writes.

I actually had a question about that. Why are you returning ENOTSUP in case BDRV_REQ_MAY_UNMAP is not specified and that can't be fulfilled because librbd is too old for RBD_WRITE_ZEROES_FLAG_THICK_PROVISION? My understanding has always been that BDRV_REQ_MAY_UNMAP is just a hint. Deallocating if BDRV_REQ_MAY_UNMAP is specified is not nice but should be perfectly acceptable. It is certainly better than returning ENOTSUP, particularly if ENOTSUP causes Qemu to do plain zeroing.

I think this was introduced to support different provisioning modes. If BDRV_REQ_MAY_UNMAP is not set, the caller of bdrv_write_zeroes expects that the driver does thick provisioning. If the driver cannot handle that (efficiently), qemu does a plain zero write. I am still not fully understanding the meaning of the BDRV_REQ_NO_FALLBACK flag.
The original commit states that it was introduced for qemu-img to efficiently zero out the target and avoid the slow fallback. When I last worked on qemu-img convert I remember that there was a call to zero out the target if bdrv_has_zero_init is not 1. It seems that meanwhile a target_is_zero cmdline switch for qemu-img convert was added to let the user assert that a preexisting target is zero. Maybe someone can help here with whether it would be right to set BDRV_REQ_NO_FALLBACK for rbd in either of the 2 cases (thick provisioning supported or not)?

Thanks,
Peter
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > encryption. > > Currently supported formats are LUKS version 1 and 2. > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > expect an > > open image context: > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > This commit extends the qemu rbd driver API to support the above. > > > > Signed-off-by: Or Ozeri > > --- > > block/raw-format.c | 7 + > > block/rbd.c | 371 ++- > > qapi/block-core.json | 110 - > > 3 files changed, 482 insertions(+), 6 deletions(-) > > diff --git a/block/rbd.c b/block/rbd.c > > index f098a89c7b..183b17cd84 100644 > > --- a/block/rbd.c > > +++ b/block/rbd.c > > @@ -73,6 +73,18 @@ > > #define LIBRBD_USE_IOVEC 0 > > #endif > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > + > > +static const char rbd_luks_header_verification[ > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > +}; > > + > > +static const char rbd_luks2_header_verification[ > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > +}; > > + > > typedef enum { > > RBD_AIO_READ, > > RBD_AIO_WRITE, > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > offs) > > } > > } > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > +static int qemu_rbd_convert_luks_options( > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > +char **passphrase, > > +Error **errp) > > +{ > > +int r = 0; > > + > > +if (!luks_opts->has_key_secret) { > > +r = -EINVAL; > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > +return r; > > +} > > Why is key-secret optional? 
It doesn't look like it is handled correctly here, but we need to be able to run 'qemu-img info ' and get information back on the size of the image, and whether or not it is encrypted, without having to supply a passphrase upfront. So it is right that key-secret be optional, but also we shouldn't return a fatal error like this. Only if BDRV_O_NO_IO is NOT set should this error be reported.

> > static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > {
> >     BDRVRBDState *s = bs->opaque;

> > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> >         .type = QEMU_OPT_STRING,
> >         .help = "ID of secret providing the password",
> >     },
> > +    {
> > +        .name = "encrypt.format",
> > +        .type = QEMU_OPT_STRING,
> > +        .help = "Encrypt the image, format choices: 'luks', 'luks2'",
>
> I think it should be "luks1" and "luks2" to match rbd/librbd.h and "rbd encryption format" command.

No, it should stay "luks" not "luks1", to match the existing QEMU terminology for its LUKS v1 encryption support.

> > @@ -3609,6 +3622,94 @@
> > { 'enum': 'RbdAuthMode',
> >   'data': [ 'cephx', 'none' ] }
> >
> > +##
> > +# @RbdImageEncryptionFormat:
> > +#
> > +# Since: 6.1
> > +##
> > +{ 'enum': 'RbdImageEncryptionFormat',
> > +  'data': [ 'luks', 'luks2' ] }
>
> Ditto -- "luks1" and "luks2".

No, the patch is correct as is.

> > +# @RbdEncryptionOptionsLUKSBase:
> > +#
> > +# @key-secret: ID of a QCryptoSecret object providing a passphrase
> > +#              for unlocking the encryption
> > +#
> > +# Since: 6.1
> > +##
> > +{ 'struct': 'RbdEncryptionOptionsLUKSBase',
> > +  'data': { '*key-secret': 'str' }}
>
> When would we not need a passphrase? I think it should be required.
When running 'qemu-img info' > > +## > > +# @RbdEncryptionCreateOptionsLUKSBase: > > +# > > +# @cipher-alg: The encryption algorithm > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionCreateOptionsLUKSBase', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': { '*cipher-alg': 'QCryptoCipherAlgorithm'}} > > Why QCryptoCipherAlgorithm instead of just enumerating the two > algorithms that librbd supports? An early failure when parsing > seems better than failing in qemu_rbd_convert_luks_create_options() > and having to clean up the newly created image. We don't want to duplicate algorithm names that already have a defined enum data type. > > > + > > +## > > +# @RbdEncryptionOptionsLUKS: > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionOptionsLUKS', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': {}} > > + > > +## > > +# @RbdEncryptionOptionsLUKS2: > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionOptionsLUKS2', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': {}} > > + > > +## > > +# @RbdEncryptionCreateOptionsLUKS: > >
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 20:53, Vladimir Sementsov-Ogievskiy wrote:
14.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -52,29 +52,35 @@ typedef struct BlockCopyCallState {
     /* Coroutine where async block-copy is running */
     Coroutine *co;

-    /* To reference all call states from BlockCopyState */
-    QLIST_ENTRY(BlockCopyCallState) list;
-
     /* State */
-    int ret;
     bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
+
+    /* To reference all call states from BlockCopyState */
+    QLIST_ENTRY(BlockCopyCallState) list;

     /* OUT parameters */
+    bool cancelled;
     bool error_is_read;
+    int ret;

Hmm, about that. Is @ret an "OUT parameter"? Yes, it is. But someone may think that out parameters don't need locking like "State" parameters (otherwise, what is the difference for the person who reads these comments?). But that is wrong, and ret is modified under the mutex for a reason.

In patch 5 I added a comment above @ret and @error_is_read: /* Fields protected by lock in BlockCopyState */. I can add your explanation too. Actually, the full description of the "ret" field's usage may look as follows: set concurrently by tasks under the mutex; only set once, by the first failed task (and untouched if no task failed); after finish (if call_state->finished is true) not modified anymore and may be read safely without the mutex.

So, before finished, ret is a kind of "State" too: it is both read and written by tasks. This shows me that dividing fields into "IN", "State" and "OUT" doesn't really help here. In this series we use different policies of concurrent access to fields: some are accessed only under the mutex, others have a more complex usage scenario (like this @ret), and some need atomic access.

Yes, but I think especially the IN vs State division helps a lot to understand what needs a lock and what doesn't.

Emanuele
Re: [PATCH RFC] meson: add option to use zstd for qcow2 compression by default
On 17/06/21 21:51, Vladimir Sementsov-Ogievskiy wrote: So, it's an RFC. I also can split the patch so that refactoring of qcow2_co_create() go in a separate preparation patch. Another RFC question, shouldn't we move to zstd by default in upstream too? I think backwards-incompatible changes in the past were not handled with build options, but that can be changed. However I would prefer to have an option like --with-qcow2-compression={zstd,zlib}. Meson supports multiple-choice options, they don't have to use enabled/disabled or (if boolean) true/false. Regarding changing the default, that would make images unreadable to QEMU 5.0 and earlier versions. Does that apply to images that have no compressed clusters? Paolo configure | 10 +- meson.build | 4 block/qcow2.c | 32 +--- meson_options.txt | 2 ++ 4 files changed, 40 insertions(+), 8 deletions(-) diff --git a/configure b/configure index debd50c085..b19af43525 100755 --- a/configure +++ b/configure @@ -385,6 +385,7 @@ snappy="auto" bzip2="auto" lzfse="auto" zstd="auto" +qcow2_zstd_default="no" guest_agent="$default_feature" guest_agent_with_vss="no" guest_agent_ntddscsi="no" @@ -1318,6 +1319,10 @@ for opt do ;; --enable-zstd) zstd="enabled" ;; + --disable-qcow2-zstd-default) qcow2_zstd_default="disabled" + ;; + --enable-qcow2-zstd-default) qcow2_zstd_default="enabled" + ;; --enable-guest-agent) guest_agent="yes" ;; --disable-guest-agent) guest_agent="no" @@ -1903,6 +1908,8 @@ disabled with --disable-FEATURE, default is enabled if available (for reading lzfse-compressed dmg images) zstdsupport for zstd compression library (for migration compression and qcow2 cluster compression) + qcow2-zstd-default Use zstd by default for qcow2 image creation + (requires zstd enabled) seccomp seccomp support coroutine-pool coroutine freelist (better performance) glusterfs GlusterFS backend @@ -6424,7 +6431,8 @@ if test "$skip_meson" = no; then -Dcurl=$curl -Dglusterfs=$glusterfs -Dbzip2=$bzip2 -Dlibiscsi=$libiscsi \ -Dlibnfs=$libnfs 
-Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\ -Drbd=$rbd -Dlzo=$lzo -Dsnappy=$snappy -Dlzfse=$lzfse \ --Dzstd=$zstd -Dseccomp=$seccomp -Dvirtfs=$virtfs -Dcap_ng=$cap_ng \ +-Dzstd=$zstd -Dqcow2_zstd_default=$qcow2_zstd_default \ +-Dseccomp=$seccomp -Dvirtfs=$virtfs -Dcap_ng=$cap_ng \ -Dattr=$attr -Ddefault_devices=$default_devices \ -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \ -Dvhost_user_blk_server=$vhost_user_blk_server -Dmultiprocess=$multiprocess \ diff --git a/meson.build b/meson.build index d8a92666fb..3d65b6c46b 100644 --- a/meson.build +++ b/meson.build @@ -484,6 +484,9 @@ if not get_option('zstd').auto() or have_block required: get_option('zstd'), method: 'pkg-config', kwargs: static_kwargs) endif +if not zstd.found() and get_option('qcow2_zstd_default').enabled() + error('--enable-qcow2-zstd-default: Cannot use zstd by default without enabling zstd') +endif gbm = not_found if 'CONFIG_GBM' in config_host gbm = declare_dependency(compile_args: config_host['GBM_CFLAGS'].split(), @@ -1168,6 +1171,7 @@ config_host_data.set('CONFIG_GETTID', has_gettid) config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim) config_host_data.set('CONFIG_STATX', has_statx) config_host_data.set('CONFIG_ZSTD', zstd.found()) +config_host_data.set('CONFIG_QCOW2_ZSTD_DEFAULT', get_option('qcow2_zstd_default').enabled()) config_host_data.set('CONFIG_FUSE', fuse.found()) config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found()) config_host_data.set('CONFIG_X11', x11.found()) diff --git a/block/qcow2.c b/block/qcow2.c index ee4530cdbd..06bfbbf7b8 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3540,17 +3540,36 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) } } -if (qcow2_opts->has_compression_type && -qcow2_opts->compression_type != QCOW2_COMPRESSION_TYPE_ZLIB) { - -ret = -EINVAL; - -if (version < 3) { +if (version < 3) { +if (qcow2_opts->has_compression_type && +qcow2_opts->compression_type != QCOW2_COMPRESSION_TYPE_ZLIB) 
+{ +ret = -EINVAL; error_setg(errp, "Non-zlib compression type is only supported with " "compatibility level 1.1 and above (use version=v3 or " "greater)"); goto out; } +} else { +if (qcow2_opts->has_compression_type) { +compression_type = qcow2_opts->compression_type; +#ifdef CONFIG_QCOW2_ZSTD_DEFAULT +} else { +compression_type = QCOW2_COMPRESSION_TYPE_ZSTD; +#endif +} +
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 19:27, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: As done in BlockCopyCallState, categorize BlockCopyTask and BlockCopyState in IN, State and OUT fields. This is just to understand which field has to be protected with a lock. .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 49 +- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 3f26be8ddc..5ff7764e87 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -52,29 +52,35 @@ typedef struct BlockCopyCallState { /* Coroutine where async block-copy is running */ Coroutine *co; - /* To reference all call states from BlockCopyState */ - QLIST_ENTRY(BlockCopyCallState) list; - /* State */ - int ret; bool finished; - QemuCoSleep sleep; - bool cancelled; + QemuCoSleep sleep; /* TODO: protect API with a lock */ + + /* To reference all call states from BlockCopyState */ + QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ + bool cancelled; actually, cancelled is not OUT field. It's set by user to cancel the operation. And block-copy tracks the field to understand should it cancel at the moment or not. So, it's part of state. Makes sense. Also, I just now understand, that "out parameter" sounds strange here. As "parameter" is an input by general meaning.. We may have "out parameters" for functions, as all parameters of a function are generally called "parameters" anyway. I think "OUT fields" is more correct here. I don't insist, as I'm not an expert in English (for sure, you'll find mistakes even in this paragraph :\ Actually I am using the terminology that was already there :) Anyways, I am not expert here either but fields do sounds better. I will change parameter -> field replacement to this patch. Emanuele
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 20:31, Vladimir Sementsov-Ogievskiy wrote: 19.06.2021 18:23, Vladimir Sementsov-Ogievskiy wrote: typedef struct BlockCopyTask { AioTask task; + /* + * IN parameters. Initialized in block_copy_task_create() + * and never changed. + */ That's just not true for method field :( I think, we just need to document that @method is never accessed concurrently Ok I got confused in the last patch. Method is read by block_copy_task_entry only after it is re-set in block_copy_dirty_clusters loop. Sorry for that. Will leave it as IN and document it better. Still, moving the lock granularity inside the while loop might not be too bad. Not sure though. At this point skip_unallocated can also be an atomic, even though I sense that you won't like that :) Emanuele
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 17:23, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: As done in BlockCopyCallState, categorize BlockCopyTask and BlockCopyState in IN, State and OUT fields. This is just to understand which field has to be protected with a lock. .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 49 +- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 3f26be8ddc..5ff7764e87 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -52,29 +52,35 @@ typedef struct BlockCopyCallState { /* Coroutine where async block-copy is running */ Coroutine *co; - /* To reference all call states from BlockCopyState */ - QLIST_ENTRY(BlockCopyCallState) list; - /* State */ - int ret; bool finished; - QemuCoSleep sleep; - bool cancelled; + QemuCoSleep sleep; /* TODO: protect API with a lock */ + + /* To reference all call states from BlockCopyState */ + QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ + bool cancelled; bool error_is_read; + int ret; } BlockCopyCallState; typedef struct BlockCopyTask { AioTask task; + /* + * IN parameters. Initialized in block_copy_task_create() + * and never changed. + */ That's just not true for method field :( You're right, because it is modified later in the while loop of block_copy_dirty_clusters, but the task is already in the list. Will move it to state field. 
BlockCopyState *s; BlockCopyCallState *call_state; int64_t offset; - int64_t bytes; BlockCopyMethod method; - QLIST_ENTRY(BlockCopyTask) list; + + /* State */ CoQueue wait_queue; /* coroutines blocked on this task */ + int64_t bytes; + QLIST_ENTRY(BlockCopyTask) list; } BlockCopyTask; static int64_t task_end(BlockCopyTask *task) @@ -90,15 +96,25 @@ typedef struct BlockCopyState { */ BdrvChild *source; BdrvChild *target; - BdrvDirtyBitmap *copy_bitmap; + + /* State */ int64_t in_flight_bytes; - int64_t cluster_size; BlockCopyMethod method; - int64_t max_transfer; - uint64_t len; QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */ QLIST_HEAD(, BlockCopyCallState) calls; + /* State fields that use a thread-safe API */ + BdrvDirtyBitmap *copy_bitmap; + ProgressMeter *progress; + SharedResource *mem; + RateLimit rate_limit; + /* + * IN parameters. Initialized in block_copy_state_new() + * and never changed. + */ + int64_t cluster_size; + int64_t max_transfer; + uint64_t len; BdrvRequestFlags write_flags; /* @@ -114,14 +130,11 @@ typedef struct BlockCopyState { * In this case, block_copy() will query the source’s allocation status, * skip unallocated regions, clear them in the copy_bitmap, and invoke * block_copy_reset_unallocated() every time it does. + * + * This field is set in backup_run() before coroutines are run, + * therefore is an IN. That's not true: it is modified from backup_run, when block-copy already initialized, and block_copy() calls may be done from backup-top filter. Ok, I am not very familiar with the backup code, so I did not see it. This means we need to protect this field too. 
At this point, I think we can increase the granularity of the lock in block_copy_dirty_clusters: instead of having, in each while loop,

block_copy_task_create();   // locks and releases internally
block_copy_block_status();  // no lock used, but uses skip_unallocated so it will need one
if ()
    block_copy_task_shrink();   // locks and releases internally
if (s->skip_unallocated)        // will need lock
    block_copy_task_end();      // locks and releases internally
[..]
if ()
    task->method = COPY_WRITE_ZEROS;  // needs lock
[..]

we can just lock the whole while section and eventually create _locked() versions of task_end and similar functions that are also used in non-locked code blocks.

Emanuele

     */
     bool skip_unallocated;
-
-    ProgressMeter *progress;
-
-    SharedResource *mem;
-
-    RateLimit rate_limit;
 } BlockCopyState;

 static BlockCopyTask *find_conflicting_task(BlockCopyState *s,