Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On Mon, Jun 21, 2021 at 11:50:02AM +0200, Max Reitz wrote:
> > I don't think that this change is correct. In contrast with file-posix
> > you don't get extra information for free, you just make a larger
> > request. This means that server will have to do more work.
>
> Oh, oops. Seems I was blind in my rage to replace this MIN() pattern.
>
> You’re absolutely right. So this patch should be dropped.

I disagree - I think this patch is still correct, as written, _because_
we use the REQ_ONE flag.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On Sat, Jun 19, 2021 at 01:53:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > +++ b/block/nbd.c
> > @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
> >          .type = NBD_CMD_BLOCK_STATUS,
> >          .from = offset,
> >          .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
> > -                   MIN(bytes, s->info.size - offset)),
> > +                   s->info.size - offset),
> >          .flags = NBD_CMD_FLAG_REQ_ONE,
> >      };
>
> Hmm..
>
> I don't think that this change is correct. In contrast with file-posix you
> don't get extra information for free, you just make a larger request. This
> means that server will have to do more work.

Not necessarily. The fact that we have passed NBD_CMD_FLAG_REQ_ONE means
that the server is still only allowed to give us one extent in its answer,
and that it may not give us information beyond the length we requested.

You are right that if we lose the REQ_ONE flag, we may cause the server to
do more work providing additional extents that we will then ignore,
because we aren't yet set up for avoiding REQ_ONE. Fixing that is a
longer-term goal. But in the short term, I see no harm in giving a larger
length to the server with REQ_ONE.

> (look at blockstatus_to_extents, it calls bdrv_block_status_above in a
> loop).
>
> For example, assume that nbd export is a qcow2 image with all clusters
> allocated. With this change, nbd server will loop through the whole qcow2
> image, load all L2 tables to return big allocated extent.

No, the server is allowed to reply with less length than our request, and
that is particularly true if the server does NOT have free access to the
full length of our request. In the case of qcow2, since bdrv_block_status
is (by current design) clamped at cluster boundaries, requesting a 4G
length will NOT increase the amount of the server response any further
than the first cluster boundary (that is, the point where the server no
longer has free access to status without loading another cluster of L2
entries).
> So, only server can decide, could it add some extra free information to
> request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't allow it.

What the flag prohibits is the server giving us more information than the
length we requested. But this patch is increasing our request length for
the case where the server CAN give us more information than we need
locally, in the hope that, even though the server can only reply with one
extent, we waste fewer network back-and-forth trips than a smaller
request would have required.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Re: [PATCH 0/4] export/fuse: Allow other users access to the export
On 21.06.21 18:12, Kevin Wolf wrote:
> Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:
>> Hi,
>>
>> With the default mount options, FUSE mounts are not accessible to any
>> users but the one who did the mount, not even to root. To allow such
>> accesses, allow_other must be passed.
>>
>> This is probably useful to some people (it certainly is to me, e.g. when
>> exporting some image as my normal user, and then trying to loop mount it
>> as root), so this series adds a QAPI allow-other bool that will make the
>> FUSE export code pass allow_other,default_permissions to FUSE.
>>
>> (default_permissions will make the kernel do the usual UNIX permission
>> checks, which is something that makes a lot of sense when allowing other
>> users access to the export.)
>>
>> This also requires our SETATTR code to be able to handle permission
>> changes, though, so the user can then run chmod/chown/chgrp on the
>> export to adjust its permissions to their need.
>>
>> The final patch adds a test.
>
> If there is even a use case for leaving the option off (not trusting
> root?), it must certainly be the less common case? So I'm not sure if
> allow-other should be an option at all, but if it is, enabling it by
> default would make more sense to me.
>
> Is there a reason why you picked false as the default, except that it is
> the old behaviour?

No. :)

Well, mostly. I also thought: if FUSE thinks allow_other shouldn’t be the
default, who am I to decide otherwise?

Now that I have tried to find out why FUSE has it as the default (I only
remember vague “security reasons”), I still couldn’t find out why, but I
did find that using this option as a non-root user requires
/etc/fuse.conf to contain user_allow_other, which I don’t think we can
require. So I think it must be an option.

As for which value should be the default, that probably depends on how
common having user_allow_other in /etc/fuse.conf is. I know I never put
it there, and it’s there on both my Fedora and my Arch system. So I guess
it seems rather common?

Max
Re: [PATCH 0/4] export/fuse: Allow other users access to the export
Am 14.06.2021 um 16:44 hat Max Reitz geschrieben:
> Hi,
>
> With the default mount options, FUSE mounts are not accessible to any
> users but the one who did the mount, not even to root. To allow such
> accesses, allow_other must be passed.
>
> This is probably useful to some people (it certainly is to me, e.g. when
> exporting some image as my normal user, and then trying to loop mount it
> as root), so this series adds a QAPI allow-other bool that will make the
> FUSE export code pass allow_other,default_permissions to FUSE.
>
> (default_permissions will make the kernel do the usual UNIX permission
> checks, which is something that makes a lot of sense when allowing other
> users access to the export.)
>
> This also requires our SETATTR code to be able to handle permission
> changes, though, so the user can then run chmod/chown/chgrp on the
> export to adjust its permissions to their need.
>
> The final patch adds a test.

If there is even a use case for leaving the option off (not trusting
root?), it must certainly be the less common case? So I'm not sure if
allow-other should be an option at all, but if it is, enabling it by
default would make more sense to me.

Is there a reason why you picked false as the default, except that it is
the old behaviour?

Kevin
Re: [PATCH RFC] meson: add option to use zstd for qcow2 compression by default
Am 21.06.2021 um 10:22 hat Paolo Bonzini geschrieben:
> On 17/06/21 21:51, Vladimir Sementsov-Ogievskiy wrote:
> > So, it's an RFC. I also can split the patch so that refactoring of
> > qcow2_co_create() goes in a separate preparation patch.
> >
> > Another RFC question, shouldn't we move to zstd by default in upstream
> > too?
>
> I think backwards-incompatible changes in the past were not handled with
> build options, but that can be changed.
>
> However I would prefer to have an option like
> --with-qcow2-compression={zstd,zlib}. Meson supports multiple-choice
> options; they don't have to use enabled/disabled or (if boolean)
> true/false.

Yes, this is more extensible.

> Regarding changing the default, that would make images unreadable to
> QEMU 5.0 and earlier versions. Does that apply to images that have no
> compressed clusters?

I think it does, because you could be writing compressed clusters to the
image later. Originally, only 'qemu-img convert -c' could write
compressed clusters, but today the backup job can write them, too, and it
doesn't create the image file itself.

Kevin
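Paolo's suggestion maps onto meson's multiple-choice ("combo") option type. A sketch of what such an option could look like follows; the option name, choices, and default here are assumptions for illustration, not taken from the series.

```meson
# meson_options.txt (hypothetical): a multiple-choice option instead of a bool
option('qcow2-compression', type: 'combo',
       choices: ['zlib', 'zstd'], value: 'zlib',
       description: 'default compression algorithm for new qcow2 images')
```

In meson.build the chosen value could then be forwarded to the build, e.g. via something like `config_host_data.set_quoted('CONFIG_QCOW2_COMPRESSION', get_option('qcow2-compression'))`, so the qcow2 driver can pick its default at compile time.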
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé wrote:
>
> On 6/21/21 3:18 PM, Fam Zheng wrote:
>>
>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>>>
>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>>> -ENOMEM in case of error. The driver was correctly handling the
>>> error path to recycle its volatile IOVA mappings.
>>>
>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>>> DMA mappings per container", April 2019) added the -ENOSPC error to
>>> signal the user exhausted the DMA mappings available for a container.
>>>
>>> The block driver started to mis-behave:
>>>
>>>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>>   (qemu)
>>>   (qemu) info status
>>>   VM status: paused (io-error)
>>>   (qemu) c
>>>   VFIO_MAP_DMA failed: No space left on device
>>>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>>>   Assertion `ctx == blk->ctx' failed.
>>
>> Hi Phil,
>>
>> The diff looks good to me, but I’m not sure what exactly caused the
>> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
>> before, so it should be treated as a general case. What am I missing?
>
> Good catch, ENOSPC ends up setting BLOCK_DEVICE_IO_STATUS_NOSPACE
> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mappings
> exhausted. I don't understand the full "VM resume" path, but this
> is not what we want (IO_NOSPACE is meant to warn the operator to add
> more storage and resume, which is pointless in our case; resuming
> won't help until we flush the mappings).
>
> IIUC what we want is to return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.

I agree with that. It just makes me feel there’s another bug in the
resuming code path. Can you get a backtrace?

Fam
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
On 6/21/21 3:18 PM, Fam Zheng wrote:
>
>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>>
>> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
>> -ENOMEM in case of error. The driver was correctly handling the
>> error path to recycle its volatile IOVA mappings.
>>
>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
>> DMA mappings per container", April 2019) added the -ENOSPC error to
>> signal the user exhausted the DMA mappings available for a container.
>>
>> The block driver started to mis-behave:
>>
>>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>>   (qemu)
>>   (qemu) info status
>>   VM status: paused (io-error)
>>   (qemu) c
>>   VFIO_MAP_DMA failed: No space left on device
>>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>>   Assertion `ctx == blk->ctx' failed.
>
> Hi Phil,
>
> The diff looks good to me, but I’m not sure what exactly caused the
> assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
> before, so it should be treated as a general case. What am I missing?

Good catch, ENOSPC ends up setting BLOCK_DEVICE_IO_STATUS_NOSPACE
-> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mappings
exhausted. I don't understand the full "VM resume" path, but this is not
what we want (IO_NOSPACE is meant to warn the operator to add more
storage and resume, which is pointless in our case; resuming won't help
until we flush the mappings).

IIUC what we want is to return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.
[PATCH] block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level
encryption. Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both
expect an open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri
---
 block/rbd.c          | 367 ++-
 qapi/block-core.json | 110 -
 2 files changed, 471 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..7e282a8e94 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif

+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
@@ -341,6 +353,202 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
     }
 }

+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+        RbdEncryptionOptionsLUKSBase *luks_opts,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    return qcrypto_secret_lookup(
+            luks_opts->key_secret, (uint8_t **)passphrase, passphrase_len,
+            errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+        RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+        rbd_encryption_algorithm_t *alg,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    int r = 0;
+
+    r = qemu_rbd_convert_luks_options(
+            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+            passphrase, passphrase_len, errp);
+    if (r < 0) {
+        return r;
+    }
+
+    if (luks_opts->has_cipher_alg) {
+        switch (luks_opts->cipher_alg) {
+        case QCRYPTO_CIPHER_ALG_AES_128: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+            break;
+        }
+        case QCRYPTO_CIPHER_ALG_AES_256: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+            break;
+        }
+        default: {
+            r = -ENOTSUP;
+            error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+                             luks_opts->cipher_alg);
+            return r;
+        }
+        }
+    } else {
+        /* default alg */
+        *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+    }
+
+    return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+                                      RbdEncryptionCreateOptions *encrypt,
+                                      Error **errp)
+{
+    int r = 0;
+    g_autofree char *passphrase = NULL;
+    size_t passphrase_len;
+    rbd_encryption_format_t format;
+    rbd_encryption_options_t opts;
+    rbd_encryption_luks1_format_options_t luks_opts;
+    rbd_encryption_luks2_format_options_t luks2_opts;
+    size_t opts_size;
+    uint64_t raw_size, effective_size;
+
+    r = rbd_get_size(image, &raw_size);
+    if (r < 0) {
+        error_setg_errno(errp, -r, "cannot get raw image size");
+        return r;
+    }
+
+    switch (encrypt->format) {
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+        memset(&luks_opts, 0, sizeof(luks_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS1;
+        opts = &luks_opts;
+        opts_size = sizeof(luks_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+                &luks_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks_opts.passphrase = passphrase;
+        luks_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+        memset(&luks2_opts, 0, sizeof(luks2_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS2;
+        opts = &luks2_opts;
+        opts_size = sizeof(luks2_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS2_base(
+                        &encrypt->u.luks2),
+                &luks2_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks2_opts.passphrase = passphrase;
+        luks2_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    default: {
+        r = -ENOTSUP;
+        error_setg_errno(
+                errp, -r, "unknown image encryption format: %u",
+                encrypt->format);
+        return
Re: [PATCH] block/rbd: Add support for rbd image encryption
Patchew URL: https://patchew.org/QEMU/20210621142103.1417408-1-...@il.ibm.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210621142103.1417408-1-...@il.ibm.com
Subject: [PATCH] block/rbd: Add support for rbd image encryption

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20210614083800.1166166-1-richard.hender...@linaro.org -> patchew/20210614083800.1166166-1-richard.hender...@linaro.org
 * [new tag]     patchew/20210621142103.1417408-1-...@il.ibm.com -> patchew/20210621142103.1417408-1-...@il.ibm.com
Switched to a new branch 'test'
19faaee block/rbd: Add support for rbd image encryption

=== OUTPUT BEGIN ===
ERROR: "(foo**)" should be "(foo **)"
#60: FILE: block/rbd.c:374:
+            luks_opts->key_secret, (uint8_t**)passphrase, passphrase_len, errp);

total: 1 errors, 0 warnings, 601 lines checked

Commit 19faaee12b6d (block/rbd: Add support for rbd image encryption) has
style problems, please review. If any of these errors are false positives
report them to the maintainer, see CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20210621142103.1417408-1-...@il.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
[PATCH] block/rbd: Add support for rbd image encryption
Starting from ceph Pacific, RBD has built-in support for image-level
encryption. Currently supported formats are LUKS version 1 and 2.

There are 2 new relevant librbd APIs for controlling encryption, both
expect an open image context:

rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack

This commit extends the qemu rbd driver API to support the above.

Signed-off-by: Or Ozeri
---
 block/rbd.c          | 367 ++-
 qapi/block-core.json | 110 -
 2 files changed, 471 insertions(+), 6 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f098a89c7b..be1419a9bd 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -73,6 +73,18 @@
 #define LIBRBD_USE_IOVEC 0
 #endif

+#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
+
+static const char rbd_luks_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_luks2_header_verification[
+        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
@@ -341,6 +353,202 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
     }
 }

+#ifdef LIBRBD_SUPPORTS_ENCRYPTION
+static int qemu_rbd_convert_luks_options(
+        RbdEncryptionOptionsLUKSBase *luks_opts,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    return qcrypto_secret_lookup(
+            luks_opts->key_secret, (uint8_t**)passphrase, passphrase_len, errp);
+}
+
+static int qemu_rbd_convert_luks_create_options(
+        RbdEncryptionCreateOptionsLUKSBase *luks_opts,
+        rbd_encryption_algorithm_t *alg,
+        char **passphrase,
+        size_t *passphrase_len,
+        Error **errp)
+{
+    int r = 0;
+
+    r = qemu_rbd_convert_luks_options(
+            qapi_RbdEncryptionCreateOptionsLUKSBase_base(luks_opts),
+            passphrase, passphrase_len, errp);
+    if (r < 0) {
+        return r;
+    }
+
+    if (luks_opts->has_cipher_alg) {
+        switch (luks_opts->cipher_alg) {
+        case QCRYPTO_CIPHER_ALG_AES_128: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES128;
+            break;
+        }
+        case QCRYPTO_CIPHER_ALG_AES_256: {
+            *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+            break;
+        }
+        default: {
+            r = -ENOTSUP;
+            error_setg_errno(errp, -r, "unknown encryption algorithm: %u",
+                             luks_opts->cipher_alg);
+            return r;
+        }
+        }
+    } else {
+        /* default alg */
+        *alg = RBD_ENCRYPTION_ALGORITHM_AES256;
+    }
+
+    return 0;
+}
+
+static int qemu_rbd_encryption_format(rbd_image_t image,
+                                      RbdEncryptionCreateOptions *encrypt,
+                                      Error **errp)
+{
+    int r = 0;
+    g_autofree char *passphrase = NULL;
+    size_t passphrase_len;
+    rbd_encryption_format_t format;
+    rbd_encryption_options_t opts;
+    rbd_encryption_luks1_format_options_t luks_opts;
+    rbd_encryption_luks2_format_options_t luks2_opts;
+    size_t opts_size;
+    uint64_t raw_size, effective_size;
+
+    r = rbd_get_size(image, &raw_size);
+    if (r < 0) {
+        error_setg_errno(errp, -r, "cannot get raw image size");
+        return r;
+    }
+
+    switch (encrypt->format) {
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+        memset(&luks_opts, 0, sizeof(luks_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS1;
+        opts = &luks_opts;
+        opts_size = sizeof(luks_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
+                &luks_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks_opts.passphrase = passphrase;
+        luks_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+        memset(&luks2_opts, 0, sizeof(luks2_opts));
+        format = RBD_ENCRYPTION_FORMAT_LUKS2;
+        opts = &luks2_opts;
+        opts_size = sizeof(luks2_opts);
+        r = qemu_rbd_convert_luks_create_options(
+                qapi_RbdEncryptionCreateOptionsLUKS2_base(
+                        &encrypt->u.luks2),
+                &luks2_opts.alg, &passphrase, &passphrase_len, errp);
+        if (r < 0) {
+            return r;
+        }
+        luks2_opts.passphrase = passphrase;
+        luks2_opts.passphrase_size = passphrase_len;
+        break;
+    }
+    default: {
+        r = -ENOTSUP;
+        error_setg_errno(
+                errp, -r, "unknown image encryption format: %u",
+                encrypt->format);
+        return r;
Re: [PATCH 0/2] introduce QEMU_AUTO_VFREE
On Sat, Jun 19, 2021 at 05:21:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> There is a good movement to use the g_autofree macro, which helps to
> automatically call g_free on exit from a code block.
>
> We lack a similar possibility for the qemu_memalign() function family.
> Let's add it; it seems rather simple with help of the "cleanup"
> attribute.
>
> I'll update more places with a follow-up if this is accepted.
>
> Vladimir Sementsov-Ogievskiy (2):
>   introduce QEMU_AUTO_VFREE
>   block/commit: use QEMU_AUTO_VFREE
>
>  include/qemu/osdep.h |  2 ++
>  block/commit.c       | 25 +
>  2 files changed, 11 insertions(+), 16 deletions(-)
>
> --
> 2.29.2

Reviewed-by: Stefan Hajnoczi
Re: [PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé wrote:
>
> When the NVMe block driver was introduced (see commit bdd6a90a9e5,
> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning
> -ENOMEM in case of error. The driver was correctly handling the
> error path to recycle its volatile IOVA mappings.
>
> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit
> DMA mappings per container", April 2019) added the -ENOSPC error to
> signal the user exhausted the DMA mappings available for a container.
>
> The block driver started to mis-behave:
>
>   qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
>   (qemu)
>   (qemu) info status
>   VM status: paused (io-error)
>   (qemu) c
>   VFIO_MAP_DMA failed: No space left on device
>   qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context:
>   Assertion `ctx == blk->ctx' failed.

Hi Phil,

The diff looks good to me, but I’m not sure what exactly caused the
assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC
before, so it should be treated as a general case. What am I missing?

Reviewed-by: Fam Zheng

> Fix by handling the new -ENOSPC error (when DMA mappings are
> exhausted) without any distinction to the current -ENOMEM error,
> so we don't change the behavior on old kernels where the CVE-2019-3882
> fix is not present.
>
> An easy way to reproduce this bug is to restrict the DMA mapping
> limit (65535 by default) when loading the VFIO IOMMU module:
>
>   # modprobe vfio_iommu_type1 dma_entry_limit=666
>
> Cc: qemu-sta...@nongnu.org
> Reported-by: Michal Prívozník
> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
> Buglink: https://bugs.launchpad.net/qemu/+bug/186
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
> Signed-off-by: Philippe Mathieu-Daudé
> ---
> v2: KISS checking both errors undistinguishedly (Maxim)
> ---
>  block/nvme.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/nvme.c b/block/nvme.c
> index 2b5421e7aa6..c3d2a49866c 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1030,7 +1030,7 @@ try_map:
>          r = qemu_vfio_dma_map(s->vfio,
>                                qiov->iov[i].iov_base,
>                                len, true, &iova);
> -        if (r == -ENOMEM && retry) {
> +        if ((r == -ENOMEM || r == -ENOSPC) && retry) {
>              retry = false;
>              trace_nvme_dma_flush_queue_wait(s);
>              if (s->dma_map_count) {
Re: [PATCH v3 02/24] modules: collect module meta-data
On Fri, Jun 18, 2021 at 06:09:55PM +0200, Paolo Bonzini wrote:
> On 18/06/21 06:53, Gerd Hoffmann wrote:
> > +def find_command(src, target, compile_commands):
> > +    for command in compile_commands:
> > +        if command['file'] != src:
> > +            continue
> > +        if target != '' and command['command'].find(target) == -1:
> > +            continue
>
> Did you look into using extract_objects for this instead of looking for
> the target (which works, but yuck :))?

  ninja: error: 'libui-curses.a.p/meson-generated_.._config-host.h.o',
  needed by 'ui-curses.modinfo.test', missing and no known rule to make it

Hmm, not sure where this comes from. meson doesn't try to link
config-host.h.o into libui-curses.a, so why does extract_all_objects()
return it?

Test patch (incremental to this series) below.

take care,
  Gerd

From 5453683429d7b08b959e2cd63ee00fdccfb0c7b7 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann
Date: Mon, 21 Jun 2021 14:45:14 +0200
Subject: [PATCH] [wip] extract_all_objects experiments

---
 meson.build             | 7 +++
 scripts/modinfo-test.sh | 8
 2 files changed, 15 insertions(+)
 create mode 100755 scripts/modinfo-test.sh

diff --git a/meson.build b/meson.build
index 03bacca7cddb..8e7f176c 100644
--- a/meson.build
+++ b/meson.build
@@ -2042,6 +2042,7 @@ target_modules += { 'accel' : { 'qtest': qtest_module_ss,

 modinfo_collect = find_program('scripts/modinfo-collect.py')
 modinfo_generate = find_program('scripts/modinfo-generate.py')
+modinfo_test = find_program('scripts/modinfo-test.sh')

 modinfo_files = []
 block_mods = []
@@ -2063,6 +2064,12 @@ foreach d, list : modules
                              input: module_ss.sources(),
                              capture: true,
                              command: [modinfo_collect, '@INPUT@'])
+        custom_target(d + '-' + m + '.modinfo.test',
+                      output: d + '-' + m + '.modinfo.test',
+                      input: sl.extract_all_objects(recursive: true),
+                      capture: true,
+                      build_by_default: true, # to be removed when added to a target
+                      command: [modinfo_test, '@INPUT@'])
       endif
     else
       if d == 'block'
diff --git a/scripts/modinfo-test.sh b/scripts/modinfo-test.sh
new file mode 100755
index ..979c9cc9aeef
--- /dev/null
+++ b/scripts/modinfo-test.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+if test "$1" = "--target"; then
+    echo "# target $2"
+    shift; shift
+fi
+for item in "$@"; do
+    echo "# input $item"
+done
--
2.31.1
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 1:27 PM Daniel P. Berrangé wrote:
>
> On Mon, Jun 21, 2021 at 01:23:46PM +0200, Ilya Dryomov wrote:
> > On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote:
> > >
> > > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote:
> > > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote:
> > > > >
> > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote:
> > > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote:
> > > > > > >
> > > > > > > Starting from ceph Pacific, RBD has built-in support for
> > > > > > > image-level encryption.
> > > > > > > Currently supported formats are LUKS version 1 and 2.
> > > > > > >
> > > > > > > There are 2 new relevant librbd APIs for controlling
> > > > > > > encryption, both expect an open image context:
> > > > > > >
> > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS
> > > > > > > header)
> > > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO
> > > > > > > stack
> > > > > > >
> > > > > > > This commit extends the qemu rbd driver API to support the
> > > > > > > above.
> > > > > > >
> > > > > > > Signed-off-by: Or Ozeri
> > > > > > > ---
> > > > > > >  block/raw-format.c   |   7 +
> > > > > > >  block/rbd.c          | 371 ++-
> > > > > > >  qapi/block-core.json | 110 -
> > > > > > >  3 files changed, 482 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > > > > index f098a89c7b..183b17cd84 100644
> > > > > > > --- a/block/rbd.c
> > > > > > > +++ b/block/rbd.c
> > > > > > > @@ -73,6 +73,18 @@
> > > > > > >  #define LIBRBD_USE_IOVEC 0
> > > > > > >  #endif
> > > > > > >
> > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> > > > > > > +
> > > > > > > +static const char rbd_luks_header_verification[
> > > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> > > > > > > +};
> > > > > > > +
> > > > > > > +static const char rbd_luks2_header_verification[
> > > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> > > > > > > +};
> > > > > > > +
> > > > > > >  typedef enum {
> > > > > > >      RBD_AIO_READ,
> > > > > > >      RBD_AIO_WRITE,
> > > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> > > > > > > +static int qemu_rbd_convert_luks_options(
> > > > > > > +        RbdEncryptionOptionsLUKSBase *luks_opts,
> > > > > > > +        char **passphrase,
> > > > > > > +        Error **errp)
> > > > > > > +{
> > > > > > > +    int r = 0;
> > > > > > > +
> > > > > > > +    if (!luks_opts->has_key_secret) {
> > > > > > > +        r = -EINVAL;
> > > > > > > +        error_setg_errno(errp, -r, "missing encrypt.key-secret");
> > > > > > > +        return r;
> > > > > > > +    }
> > > > > >
> > > > > > Why is key-secret optional?
> > > > >
> > > > > It doesn't look like it is handled correctly here, but we need to
> > > > > be able to run 'qemu-img info ' and get information back
> > > > > on the size of the image, and whether or not it is encrypted,
> > > > > without having to supply a passphrase upfront. So it is right that
> > > > > key-secret be optional, but also we shouldn't return a fatal
> > > > > error like this.
> > > >
> > > > Hi Daniel,
> > > >
> > > > The key-secret lives inside RbdEncryptionOptions (or
> > > > RbdEncryptionCreateOptions) which are already optional:
> > > >
> > > >     '*encrypt': 'RbdEncryptionOptions'
> > > >
> > > >     '*encrypt': 'RbdEncryptionCreateOptions'
> > > >
> > > > The image is opened as usual and then, if "encrypt" is specified,
> > > > the encryption profile is loaded (or created and laid down). It does
> > > > not make sense to attempt to load or create the encryption profile
> > > > without the passphrase -- it would always fail.
> > >
> > > Ah, that sounds like it is probably ok then.
> > >
> > > > > Only if BDRV_O_NO_IO is NOT set, should this error be reported
> > > > >
> > > > > >
> > > > > > >  static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > > > > > >  {
> > > > > > >      BDRVRBDState *s = bs->opaque;
> > > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> > > > > > >              .type = QEMU_OPT_STRING,
> > > > > > >              .help = "ID of secret providing the password",
> > > > > > >          },
> > > > > > > +        {
> > > > > > > +            .name = "encrypt.format",
> > > > > > > +            .type = QEMU_OPT_STRING,
> > > > > > > +            .help = "Encrypt the image, format choices: 'luks', 'luks2'",
> > > > > >
> > > > > > I think it should be "luks1" and
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 01:23:46PM +0200, Ilya Dryomov wrote:
> On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote:
> >
> > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote:
> > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote:
> > > >
> > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote:
> > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote:
> > > > > >
> > > > > > Starting from ceph Pacific, RBD has built-in support for
> > > > > > image-level encryption.
> > > > > > Currently supported formats are LUKS version 1 and 2.
> > > > > >
> > > > > > There are 2 new relevant librbd APIs for controlling encryption,
> > > > > > both expect an open image context:
> > > > > >
> > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS
> > > > > > header)
> > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO
> > > > > > stack
> > > > > >
> > > > > > This commit extends the qemu rbd driver API to support the above.
> > > > > >
> > > > > > Signed-off-by: Or Ozeri
> > > > > > ---
> > > > > >  block/raw-format.c   |   7 +
> > > > > >  block/rbd.c          | 371 ++-
> > > > > >  qapi/block-core.json | 110 -
> > > > > >  3 files changed, 482 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/block/rbd.c b/block/rbd.c
> > > > > > index f098a89c7b..183b17cd84 100644
> > > > > > --- a/block/rbd.c
> > > > > > +++ b/block/rbd.c
> > > > > > @@ -73,6 +73,18 @@
> > > > > >  #define LIBRBD_USE_IOVEC 0
> > > > > >  #endif
> > > > > >
> > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8
> > > > > > +
> > > > > > +static const char rbd_luks_header_verification[
> > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1
> > > > > > +};
> > > > > > +
> > > > > > +static const char rbd_luks2_header_verification[
> > > > > > +        RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
> > > > > > +    'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
> > > > > > +};
> > > > > > +
> > > > > >  typedef enum {
> > > > > >      RBD_AIO_READ,
> > > > > >      RBD_AIO_WRITE,
> > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION
> > > > > > +static int qemu_rbd_convert_luks_options(
> > > > > > +        RbdEncryptionOptionsLUKSBase *luks_opts,
> > > > > > +        char **passphrase,
> > > > > > +        Error **errp)
> > > > > > +{
> > > > > > +    int r = 0;
> > > > > > +
> > > > > > +    if (!luks_opts->has_key_secret) {
> > > > > > +        r = -EINVAL;
> > > > > > +        error_setg_errno(errp, -r, "missing encrypt.key-secret");
> > > > > > +        return r;
> > > > > > +    }
> > > > >
> > > > > Why is key-secret optional?
> > > >
> > > > It doesn't look like it is handled correctly here, but we need to
> > > > be able to run 'qemu-img info ' and get information back
> > > > on the size of the image, and whether or not it is encrypted,
> > > > without having to supply a passphrase upfront. So it is right that
> > > > key-secret be optional, but also we shouldn't return a fatal
> > > > error like this.
> > >
> > > Hi Daniel,
> > >
> > > The key-secret lives inside RbdEncryptionOptions (or
> > > RbdEncryptionCreateOptions) which are already optional:
> > >
> > >     '*encrypt': 'RbdEncryptionOptions'
> > >
> > >     '*encrypt': 'RbdEncryptionCreateOptions'
> > >
> > > The image is opened as usual and then, if "encrypt" is specified,
> > > the encryption profile is loaded (or created and laid down). It does
> > > not make sense to attempt to load or create the encryption profile
> > > without the passphrase -- it would always fail.
> >
> > Ah, that sounds like it is probably ok then.
> >
> > > > It doesn't look like it is handled correctly here, but we need to
> > > > be able to run 'qemu-img info ' and get information back on the
> > > > size of the image. Only if BDRV_O_NO_IO is NOT set, should this
> > > > error be reported
> > > >
> > > > >
> > > > > >  static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > > > > >  {
> > > > > >      BDRVRBDState *s = bs->opaque;
> > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> > > > > >              .type = QEMU_OPT_STRING,
> > > > > >              .help = "ID of secret providing the password",
> > > > > >          },
> > > > > > +        {
> > > > > > +            .name = "encrypt.format",
> > > > > > +            .type = QEMU_OPT_STRING,
> > > > > > +            .help = "Encrypt the image, format choices: 'luks', 'luks2'",
> > > > >
> > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and
> > > > > "rbd encryption format" command.
> > > >
> > > > No, it should stay "luks" not "luks1", to match the existing QEMU
> > > > terminology for its LUKS v1 encryption support.
> > >
> > > If you insist on following the QEMU nomenclature here it's fine with
> > > me but I want to point out that
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 1:04 PM Daniel P. Berrangé wrote: > > On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote: > > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé > > wrote: > > > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > > > encryption. > > > > > Currently supported formats are LUKS version 1 and 2. > > > > > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > > > expect an > > > > > open image context: > > > > > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > > > > > Signed-off-by: Or Ozeri > > > > > --- > > > > > block/raw-format.c | 7 + > > > > > block/rbd.c | 371 > > > > > ++- > > > > > qapi/block-core.json | 110 - > > > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > > > index f098a89c7b..183b17cd84 100644 > > > > > --- a/block/rbd.c > > > > > +++ b/block/rbd.c > > > > > @@ -73,6 +73,18 @@ > > > > > #define LIBRBD_USE_IOVEC 0 > > > > > #endif > > > > > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > > > + > > > > > +static const char rbd_luks_header_verification[ > > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > > > +}; > > > > > + > > > > > +static const char rbd_luks2_header_verification[ > > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > > > +}; > > > > > + > > > > > typedef enum { > > > > > RBD_AIO_READ, > > > > > RBD_AIO_WRITE, > > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, > > > 
> > int64_t offs) > > > > > } > > > > > } > > > > > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > > > +static int qemu_rbd_convert_luks_options( > > > > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > > > > +char **passphrase, > > > > > +Error **errp) > > > > > +{ > > > > > +int r = 0; > > > > > + > > > > > +if (!luks_opts->has_key_secret) { > > > > > +r = -EINVAL; > > > > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > > > +return r; > > > > > +} > > > > > > > > Why is key-secret optional? > > > > > > It doesn't look like it is handled correctly here, but we need to > > > be able to run 'qemu-img info ' and get information back > > > on the size of the image, and whether or not it is encrypted, > > > without having to supply a passphrase upfront. So it is right that > > > key-secret be optional, but also we shouldn't return an fatal > > > error like this. > > > > Hi Daniel, > > > > The key-secret lives inside RbdEncryptionOptions (or > > RbdEncryptionCreateOptions) which are already optional: > > > > '*encrypt': 'RbdEncryptionOptions' > > > > '*encrypt' :'RbdEncryptionCreateOptions' > > > > The image is opened as usual and then, if "encrypt" is specified, > > the encryption profile is loaded (or created and laid down). It does > > not make sense to attempt to load or create the encryption profile > > without the passphrase -- it would always fail. > > Ah, that sounds like it is probably ok then. 
> > > > > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > > > > > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > > > { > > > > > BDRVRBDState *s = bs->opaque; > > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > > > .type = QEMU_OPT_STRING, > > > > > .help = "ID of secret providing the password", > > > > > }, > > > > > +{ > > > > > +.name = "encrypt.format", > > > > > +.type = QEMU_OPT_STRING, > > > > > +.help = "Encrypt the image, format choices: 'luks', > > > > > 'luks2'", > > > > > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > > > "rbd encryption format" command. > > > > > > No, it should stay "luks" not "luks1", to match the existing QEMU > > > terminology for its LUKS v1 encryption support. > > > > If you insist on following the QEMU nomenclature here it's fine with > > me but I want to point out that encryption-formatted clones won't be > > interoperable with QEMU LUKS driver or dm-crypt so making the names > > match QEMU instead of librbd and rbd CLI seems a bit misleading. > > In what way is it not interoperable ? > > If we don't specify any 'encrypt' option, does the guest see the > raw LUKS header + payload, or is the
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 12:59:37PM +0200, Ilya Dryomov wrote: > On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé > wrote: > > > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > > encryption. > > > > Currently supported formats are LUKS version 1 and 2. > > > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > > expect an > > > > open image context: > > > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > > > Signed-off-by: Or Ozeri > > > > --- > > > > block/raw-format.c | 7 + > > > > block/rbd.c | 371 ++- > > > > qapi/block-core.json | 110 - > > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > > index f098a89c7b..183b17cd84 100644 > > > > --- a/block/rbd.c > > > > +++ b/block/rbd.c > > > > @@ -73,6 +73,18 @@ > > > > #define LIBRBD_USE_IOVEC 0 > > > > #endif > > > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > > + > > > > +static const char rbd_luks_header_verification[ > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > > +}; > > > > + > > > > +static const char rbd_luks2_header_verification[ > > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > > +}; > > > > + > > > > typedef enum { > > > > RBD_AIO_READ, > > > > RBD_AIO_WRITE, > > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > > > offs) > > > > } > > > > } > > > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > > +static int qemu_rbd_convert_luks_options( > > > > 
+RbdEncryptionOptionsLUKSBase *luks_opts, > > > > +char **passphrase, > > > > +Error **errp) > > > > +{ > > > > +int r = 0; > > > > + > > > > +if (!luks_opts->has_key_secret) { > > > > +r = -EINVAL; > > > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > > +return r; > > > > +} > > > > > > Why is key-secret optional? > > > > It doesn't look like it is handled correctly here, but we need to > > be able to run 'qemu-img info ' and get information back > > on the size of the image, and whether or not it is encrypted, > > without having to supply a passphrase upfront. So it is right that > > key-secret be optional, but also we shouldn't return an fatal > > error like this. > > Hi Daniel, > > The key-secret lives inside RbdEncryptionOptions (or > RbdEncryptionCreateOptions) which are already optional: > > '*encrypt': 'RbdEncryptionOptions' > > '*encrypt' :'RbdEncryptionCreateOptions' > > The image is opened as usual and then, if "encrypt" is specified, > the encryption profile is loaded (or created and laid down). It does > not make sense to attempt to load or create the encryption profile > without the passphrase -- it would always fail. Ah, that sounds like it is probably ok then. > > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > > { > > > > BDRVRBDState *s = bs->opaque; > > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > > .type = QEMU_OPT_STRING, > > > > .help = "ID of secret providing the password", > > > > }, > > > > +{ > > > > +.name = "encrypt.format", > > > > +.type = QEMU_OPT_STRING, > > > > +.help = "Encrypt the image, format choices: 'luks', > > > > 'luks2'", > > > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > > "rbd encryption format" command. > > > > No, it should stay "luks" not "luks1", to match the existing QEMU > > terminology for its LUKS v1 encryption support. 
> > If you insist on following the QEMU nomenclature here it's fine with > me but I want to point out that encryption-formatted clones won't be > interoperable with QEMU LUKS driver or dm-crypt so making the names > match QEMU instead of librbd and rbd CLI seems a bit misleading. In what way is it not interoperable ? If we don't specify any 'encrypt' option, does the guest see the raw LUKS header + payload, or is the header completely hidden and only ever accessible via RBD ? > > > > +## > > > > +# @RbdEncryptionCreateOptionsLUKSBase: > > > > +# > > > > +# @cipher-alg: The encryption algorithm > > > > +# > > > > +# Since: 6.1 > > > > +## > > > > +{ 'struct': 'RbdEncryptionCreateOptionsLUKSBase', > > > > + 'base':
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Mon, Jun 21, 2021 at 10:32 AM Daniel P. Berrangé wrote: > > On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > > encryption. > > > Currently supported formats are LUKS version 1 and 2. > > > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > > expect an > > > open image context: > > > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > > > This commit extends the qemu rbd driver API to support the above. > > > > > > Signed-off-by: Or Ozeri > > > --- > > > block/raw-format.c | 7 + > > > block/rbd.c | 371 ++- > > > qapi/block-core.json | 110 - > > > 3 files changed, 482 insertions(+), 6 deletions(-) > > > > > diff --git a/block/rbd.c b/block/rbd.c > > > index f098a89c7b..183b17cd84 100644 > > > --- a/block/rbd.c > > > +++ b/block/rbd.c > > > @@ -73,6 +73,18 @@ > > > #define LIBRBD_USE_IOVEC 0 > > > #endif > > > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > > + > > > +static const char rbd_luks_header_verification[ > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > > +}; > > > + > > > +static const char rbd_luks2_header_verification[ > > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > > +}; > > > + > > > typedef enum { > > > RBD_AIO_READ, > > > RBD_AIO_WRITE, > > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > > offs) > > > } > > > } > > > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > > +static int qemu_rbd_convert_luks_options( > > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > > +char **passphrase, > > > +Error **errp) > > > +{ > > > +int r = 0; > > > + > > > +if (!luks_opts->has_key_secret) { > > > +r = -EINVAL; > > > 
+error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > > +return r; > > > +} > > > > Why is key-secret optional? > > It doesn't look like it is handled correctly here, but we need to > be able to run 'qemu-img info ' and get information back > on the size of the image, and whether or not it is encrypted, > without having to supply a passphrase upfront. So it is right that > key-secret be optional, but also we shouldn't return an fatal > error like this. Hi Daniel, The key-secret lives inside RbdEncryptionOptions (or RbdEncryptionCreateOptions) which are already optional: '*encrypt': 'RbdEncryptionOptions' '*encrypt' :'RbdEncryptionCreateOptions' The image is opened as usual and then, if "encrypt" is specified, the encryption profile is loaded (or created and laid down). It does not make sense to attempt to load or create the encryption profile without the passphrase -- it would always fail. "qemu-img info" can get the size, etc without loading the encryption profile (i.e. ->bdrv_get_info and ->bdrv_get_specific_info don't need it). But note that once the encryption profile is loaded, the size changes because librbd subtracts the LUKS header overhead. So $ qemu-img info --image-opts driver=rbd,pool=foo,image=bar and $ qemu-img info --object secret,file=passphrase.txt,id=sec0 --image-opts driver=rbd,pool=foo,image=bar,encrypt.format=luks2,encrypt.key-secret=sec0 would give different results. 
> > Only if BDRV_O_NO_IO is NOT set, should this error be reported > > > > > > > static int64_t qemu_rbd_getlength(BlockDriverState *bs) > > > { > > > BDRVRBDState *s = bs->opaque; > > > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = { > > > .type = QEMU_OPT_STRING, > > > .help = "ID of secret providing the password", > > > }, > > > +{ > > > +.name = "encrypt.format", > > > +.type = QEMU_OPT_STRING, > > > +.help = "Encrypt the image, format choices: 'luks', 'luks2'", > > > > I think it should be "luks1" and "luks2" to match rbd/librbd.h and > > "rbd encryption format" command. > > No, it should stay "luks" not "luks1", to match the existing QEMU > terminology for its LUKS v1 encryption support. If you insist on following the QEMU nomenclature here it's fine with me but I want to point out that encryption-formatted clones won't be interoperable with QEMU LUKS driver or dm-crypt so making the names match QEMU instead of librbd and rbd CLI seems a bit misleading. > > > > > @@ -3609,6 +3622,94 @@ > > > { 'enum': 'RbdAuthMode', > > >'data': [ 'cephx', 'none' ] } > > > > > > +## > > > +# @RbdImageEncryptionFormat: > > > +# > > > +# Since: 6.1 > > > +## > > > +{ 'enum': 'RbdImageEncryptionFormat', > > > + 'data': [ 'luks', 'luks2' ] } > > > > Ditto
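The 8-byte verification arrays quoted from the patch encode the LUKS magic ("LUKS" followed by 0xBA 0xBE) plus a 16-bit big-endian version number (1 or 2). A minimal Python sketch of the same detection idea (the helper name is mine, not from the patch):

```python
# Hypothetical re-implementation of the header probe from block/rbd.c:
# the first 8 bytes of a LUKS volume are the magic b"LUKS\xba\xbe"
# followed by a 16-bit big-endian version field.
LUKS_MAGIC = b'LUKS\xba\xbe'

def probe_luks_version(header: bytes):
    """Return 1 or 2 for a LUKS1/LUKS2 header, or None if not LUKS."""
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    version = int.from_bytes(header[6:8], 'big')
    return version if version in (1, 2) else None
```

This mirrors why the patch keeps two separate arrays: the magic is shared and only the trailing version bytes (0,1 vs 0,2) differ.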
[PATCH v6 16/16] docs/devel/testing: add -p option to the debug section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- docs/devel/testing.rst | 4 1 file changed, 4 insertions(+) diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst index fa85592a38..28a0b37b84 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing.rst @@ -250,6 +250,10 @@ a failing test: * ``-d`` (debug) just increases the logging verbosity, showing for example the QMP commands and answers. +* ``-p`` (print) redirects QEMU’s stdout and stderr to the test output, + instead of saving it into a log file in + ``$TEST_DIR/qemu-machine-``. + Test case groups -- 2.31.1
[PATCH v6 06/16] qemu-iotests: delay QMP socket timers
Attaching gdbserver implies that the qmp socket should wait indefinitely for an answer from QEMU. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index c86f239d81..e176a84620 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -477,10 +477,14 @@ def __init__(self, seconds, errmsg="Timeout"): self.seconds = seconds self.errmsg = errmsg def __enter__(self): +if qemu_gdb: +return self signal.signal(signal.SIGALRM, self.timeout) signal.setitimer(signal.ITIMER_REAL, self.seconds) return self def __exit__(self, exc_type, value, traceback): +if qemu_gdb: +return False signal.setitimer(signal.ITIMER_REAL, 0) return False def timeout(self, signum, frame): @@ -575,7 +579,7 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) -timer = 15.0 +timer = 15.0 if not qemu_gdb else None super().__init__(qemu_prog, qemu_opts, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, -- 2.31.1
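The diff above disables the iotests ``Timeout`` context manager when gdb is in use. Distilled into a standalone sketch (``disabled`` stands in for the global ``qemu_gdb`` check used by the patch):

```python
import signal

class Timeout:
    """Raise TimeoutError after `seconds`, unless disabled --
    as iotests does when a gdbserver is attached to QEMU."""
    def __init__(self, seconds, errmsg="Timeout", disabled=False):
        self.seconds = seconds
        self.errmsg = errmsg
        self.disabled = disabled

    def __enter__(self):
        if self.disabled:                    # debugger attached: wait forever
            return self
        signal.signal(signal.SIGALRM, self._timeout)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self

    def __exit__(self, exc_type, value, traceback):
        if self.disabled:
            return False
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel pending alarm
        return False

    def _timeout(self, signum, frame):
        raise TimeoutError(self.errmsg)
```

Note that ``signal.setitimer`` only works in the main thread on Unix, which is the situation the iotests harness runs in.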
[PATCH v6 15/16] qemu-iotests: add option to show qemu binary logs on stdout
Using the flag -p, allow the qemu binary to print to stdout. Also create the common function _close_qemu_log_file() to avoid accessing machine.py private fields directly and have duplicate code. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/qemu/machine/machine.py | 9 ++--- tests/qemu-iotests/check | 4 +++- tests/qemu-iotests/iotests.py | 8 tests/qemu-iotests/testenv.py | 9 +++-- 4 files changed, 24 insertions(+), 6 deletions(-) diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py index fdf2fc0e9c..c9d344d955 100644 --- a/python/qemu/machine/machine.py +++ b/python/qemu/machine/machine.py @@ -338,6 +338,11 @@ def _post_launch(self) -> None: if self._qmp_connection: self._qmp.accept(self._qmp_timer) +def _close_qemu_log_file(self) -> None: +if self._qemu_log_file is not None: +self._qemu_log_file.close() +self._qemu_log_file = None + def _post_shutdown(self) -> None: """ Called to cleanup the VM instance after the process has exited. 
@@ -350,9 +355,7 @@ def _post_shutdown(self) -> None: self._qmp.close() self._qmp_connection = None -if self._qemu_log_file is not None: -self._qemu_log_file.close() -self._qemu_log_file = None +self._close_qemu_log_file() self._load_io_log() diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check index ebd27946db..da1bfb839e 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -36,6 +36,8 @@ def make_argparser() -> argparse.ArgumentParser: help='pretty print output for make check') p.add_argument('-d', dest='debug', action='store_true', help='debug') +p.add_argument('-p', dest='print', action='store_true', +help='redirects qemu\'s stdout and stderr to the test output') p.add_argument('-gdb', action='store_true', help="start gdbserver with $GDB_OPTIONS options \ ('localhost:12345' if $GDB_OPTIONS is empty)") @@ -119,7 +121,7 @@ if __name__ == '__main__': aiomode=args.aiomode, cachemode=args.cachemode, imgopts=args.imgopts, misalign=args.misalign, debug=args.debug, valgrind=args.valgrind, - gdb=args.gdb) + gdb=args.gdb, qprint=args.print) if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--': if not args.tests: diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 7aa6707032..eee6fb7a9f 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -79,6 +79,8 @@ if gdb_qemu_env: qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ') +qemu_print = os.environ.get('PRINT_QEMU', False) + imgfmt = os.environ.get('IMGFMT', 'raw') imgproto = os.environ.get('IMGPROTO', 'file') output_dir = os.environ.get('OUTPUT_DIR', '.') @@ -613,6 +615,12 @@ def _post_shutdown(self) -> None: else: os.remove(valgrind_filename) +def _pre_launch(self) -> None: +super()._pre_launch() +if qemu_print: +# set QEMU binary output to stdout +self._close_qemu_log_file() + def add_object(self, opts): self._args.append('-object') self._args.append(opts) diff --git a/tests/qemu-iotests/testenv.py 
b/tests/qemu-iotests/testenv.py index 8bf154376f..70da0d60c8 100644 --- a/tests/qemu-iotests/testenv.py +++ b/tests/qemu-iotests/testenv.py @@ -74,7 +74,7 @@ class TestEnv(ContextManager['TestEnv']): 'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU', 'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX', 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_', - 'GDB_OPTIONS'] + 'GDB_OPTIONS', 'PRINT_QEMU'] def prepare_subprocess(self, args: List[str]) -> Dict[str, str]: if self.debug: @@ -181,7 +181,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str, misalign: bool = False, debug: bool = False, valgrind: bool = False, - gdb: bool = False) -> None: + gdb: bool = False, + qprint: bool = False) -> None: self.imgfmt = imgfmt self.imgproto = imgproto self.aiomode = aiomode @@ -189,6 +190,9 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str, self.misalign = misalign self.debug = debug +if qprint: +self.print_qemu = 'y' + if gdb: self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS) if not self.gdb_options: @@ -299,6 +303,7 @@ def print_env(self) -> None: SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER} GDB_OPTIONS -- {GDB_OPTIONS}
[PATCH v6 13/16] qemu-iotests: insert valgrind command line as wrapper for qemu binary
If -gdb and -valgrind are both defined, return an error. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 85d8c0abbb..7aa6707032 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -591,7 +591,11 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None -super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb, +if qemu_gdb and qemu_valgrind: +sys.stderr.write('Either use gdb or valgrind, not together\n') +sys.exit(1) +wrapper = qemu_gdb if qemu_gdb else qemu_valgrind +super().__init__(qemu_prog, qemu_opts, wrapper=wrapper, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, -- 2.31.1
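The selection logic in the diff can be sketched as a tiny helper (names follow the patch; the ``sys.exit(1)`` is replaced by an exception here to keep the sketch testable):

```python
# Wrapper selection as done in iotests.py: gdb and valgrind are
# mutually exclusive; otherwise whichever non-empty list was built
# from the environment becomes the qemu wrapper command prefix.
def pick_wrapper(qemu_gdb, qemu_valgrind):
    if qemu_gdb and qemu_valgrind:
        raise ValueError('Either use gdb or valgrind, not together')
    return qemu_gdb if qemu_gdb else qemu_valgrind
```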
[PATCH v6 02/16] python: Reduce strictness of pylint's duplicate-code check
From: John Snow Pylint prior to 2.8.3 (We pin at >= 2.8.0) includes function and method signatures as part of its duplicate checking algorithm. This check does not listen to pragmas, so the only way to disable it is to turn it off completely or increase the minimum duplicate lines so that it doesn't trigger for functions with long, multi-line signatures. When we decide to upgrade to pylint 2.8.3 or greater, we will be able to use 'ignore-signatures = true' to the config instead. I'd prefer not to keep us on the very bleeding edge of pylint if I can help it -- 2.8.3 came out only three days ago at time of writing. See: https://github.com/PyCQA/pylint/pull/4474 Signed-off-by: John Snow Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/setup.cfg | 5 + 1 file changed, 5 insertions(+) diff --git a/python/setup.cfg b/python/setup.cfg index 0fcdec6f32..d82c39aa46 100644 --- a/python/setup.cfg +++ b/python/setup.cfg @@ -78,6 +78,11 @@ good-names=i, # Ignore imports when computing similarities. ignore-imports=yes +# Minimum lines number of a similarity. +# TODO: Remove after we opt in to Pylint 2.8.3. See commit msg. +min-similarity-lines=6 + + [isort] force_grid_wrap=4 force_sort_within_sections=True -- 2.31.1
[PATCH v6 14/16] docs/devel/testing: add -valgrind option to the debug section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- docs/devel/testing.rst | 7 +++ 1 file changed, 7 insertions(+) diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst index 8b24e6fb47..fa85592a38 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing.rst @@ -240,6 +240,13 @@ a failing test: If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored, regardless on whether it is set or not. +* ``-valgrind`` attaches a valgrind instance to QEMU. If it detects + warnings, it will print and save the log in + ``$TEST_DIR/.valgrind``. + The final command line will be ``valgrind --log-file=$TEST_DIR/ + .valgrind --error-exitcode=99 $QEMU ...`` + Note: if used together with ``-gdb``, this command will be ignored. + * ``-d`` (debug) just increases the logging verbosity, showing for example the QMP commands and answers. -- 2.31.1
[PATCH v6 01/16] python: qemu: add timer parameter for qmp.accept socket
Also add a new _qmp_timer field to the QEMUMachine class. Let's change the default socket timeout to None, so that if a subclass needs to add a timer, it can be done by modifying this private field. At the same time, restore the timer to be 15 seconds in iotests.py, to give an upper bound to the QMP monitor test command execution. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- python/qemu/machine/machine.py | 7 +-- python/qemu/machine/qtest.py | 5 +++-- tests/qemu-iotests/iotests.py | 3 ++- 3 files changed, 10 insertions(+), 5 deletions(-) diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py index b62435528e..fdf2fc0e9c 100644 --- a/python/qemu/machine/machine.py +++ b/python/qemu/machine/machine.py @@ -95,7 +95,8 @@ def __init__(self, socket_scm_helper: Optional[str] = None, sock_dir: Optional[str] = None, drain_console: bool = False, - console_log: Optional[str] = None): + console_log: Optional[str] = None, + qmp_timer: Optional[float] = None): ''' Initialize a QEMUMachine @@ -109,6 +110,7 @@ def __init__(self, @param sock_dir: where to create socket (defaults to base_temp_dir) @param drain_console: (optional) True to drain console socket to buffer @param console_log: (optional) path to console log file +@param qmp_timer: (optional) default QMP socket timeout @note: Qemu process is not started until launch() is used. 
''' # Direct user configuration @@ -116,6 +118,7 @@ def __init__(self, self._binary = binary self._args = list(args) self._wrapper = wrapper +self._qmp_timer = qmp_timer self._name = name or "qemu-%d" % os.getpid() self._base_temp_dir = base_temp_dir @@ -333,7 +336,7 @@ def _pre_launch(self) -> None: def _post_launch(self) -> None: if self._qmp_connection: -self._qmp.accept() +self._qmp.accept(self._qmp_timer) def _post_shutdown(self) -> None: """ diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py index 93700684d1..33a86a9d69 100644 --- a/python/qemu/machine/qtest.py +++ b/python/qemu/machine/qtest.py @@ -115,14 +115,15 @@ def __init__(self, name: Optional[str] = None, base_temp_dir: str = "/var/tmp", socket_scm_helper: Optional[str] = None, - sock_dir: Optional[str] = None): + sock_dir: Optional[str] = None, + qmp_timer: Optional[float] = None): if name is None: name = "qemu-%d" % os.getpid() if sock_dir is None: sock_dir = base_temp_dir super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir, socket_scm_helper=socket_scm_helper, - sock_dir=sock_dir) + sock_dir=sock_dir, qmp_timer=qmp_timer) self._qtest: Optional[QEMUQtestProtocol] = None self._qtest_path = os.path.join(sock_dir, name + "-qtest.sock") diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 89663dac06..6b0db4ce54 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -570,10 +570,11 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) +timer = 15.0 super().__init__(qemu_prog, qemu_opts, name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, - sock_dir=sock_dir) + sock_dir=sock_dir, qmp_timer=timer) self._num_drives = 0 def add_object(self, opts): -- 2.31.1
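The plumbing in this patch boils down to: the generic machine class defaults to no QMP timeout (block forever on ``accept()``), and the iotests ``VM`` subclass opts back in to a 15-second bound. A toy model of just that parameter flow (class shapes simplified from the real ``QEMUMachine``):

```python
from typing import Optional

class Machine:
    """Base: no timeout by default, so accept() would block forever."""
    def __init__(self, qmp_timer: Optional[float] = None):
        self._qmp_timer = qmp_timer

class VM(Machine):
    """iotests subclass: restore the 15 s upper bound on QMP command
    execution, unless a debugger is holding QEMU stopped."""
    def __init__(self, debugger_attached: bool = False):
        super().__init__(qmp_timer=None if debugger_attached else 15.0)
```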
[PATCH v6 10/16] qemu-iotests: extend the check script to prepare supporting valgrind for python tests
Currently, the check script only parses the option and sets the VALGRIND_QEMU environment variable to "y". Add another local python variable that prepares the command line, identical to the one provided in the test scripts. Because the python script does not know in advance the valgrind PID to assign to the log file name, use the "%p" flag in the valgrind log file name that automatically puts the process PID at runtime. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/check | 7 --- tests/qemu-iotests/iotests.py | 11 +++ tests/qemu-iotests/testenv.py | 1 + 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check index 4365bb8066..ebd27946db 100755 --- a/tests/qemu-iotests/check +++ b/tests/qemu-iotests/check @@ -39,6 +39,10 @@ def make_argparser() -> argparse.ArgumentParser: p.add_argument('-gdb', action='store_true', help="start gdbserver with $GDB_OPTIONS options \ ('localhost:12345' if $GDB_OPTIONS is empty)") +p.add_argument('-valgrind', action='store_true', +help='use valgrind, sets VALGRIND_QEMU environment ' +'variable') + p.add_argument('-misalign', action='store_true', help='misalign memory allocations') p.add_argument('--color', choices=['on', 'off', 'auto'], @@ -88,9 +92,6 @@ def make_argparser() -> argparse.ArgumentParser: g_bash.add_argument('-o', dest='imgopts', help='options to pass to qemu-img create/convert, ' 'sets IMGOPTS environment variable') -g_bash.add_argument('-valgrind', action='store_true', -help='use valgrind, sets VALGRIND_QEMU environment ' -'variable') g_sel = p.add_argument_group('test selecting options', 'The following options specify test set ' diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index e7e3d92d3e..6aa1dc48ba 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -96,6 +96,17 @@ sys.stderr.write('Please run this test via the "check"
script\n') sys.exit(os.EX_USAGE) +qemu_valgrind = [] +if os.environ.get('VALGRIND_QEMU') == "y" and \ +os.environ.get('NO_VALGRIND') != "y": +valgrind_logfile = "--log-file=" + test_dir +# %p allows to put the valgrind process PID, since +# we don't know it a priori (subprocess.Popen is +# not yet invoked) +valgrind_logfile += "/%p.valgrind" + +qemu_valgrind = ['valgrind', valgrind_logfile, '--error-exitcode=99'] + socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper') luks_default_secret_object = 'secret,id=keysec0,data=' + \ diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py index 8501c6caf5..8bf154376f 100644 --- a/tests/qemu-iotests/testenv.py +++ b/tests/qemu-iotests/testenv.py @@ -298,6 +298,7 @@ def print_env(self) -> None: SOCK_DIR -- {SOCK_DIR} SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER} GDB_OPTIONS -- {GDB_OPTIONS} +VALGRIND_QEMU -- {VALGRIND_QEMU} """ args = collections.defaultdict(str, self.get_env()) -- 2.31.1
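The wrapper-preparation logic this patch adds to iotests.py boils down to the following standalone Python sketch (the function name and `env` parameter are invented here for testability; the real code reads `os.environ` at module scope):

```python
def build_valgrind_wrapper(env, test_dir):
    """Return the command-line prefix used to wrap the QEMU binary,
    or an empty list when valgrind is not requested.

    Mirrors the VALGRIND_QEMU / NO_VALGRIND handling added to
    iotests.py; 'env' stands in for os.environ.
    """
    if env.get('VALGRIND_QEMU') == 'y' and env.get('NO_VALGRIND') != 'y':
        # %p expands to the process PID at runtime, since the PID is
        # unknown before subprocess.Popen is invoked.
        logfile = '--log-file=' + test_dir + '/%p.valgrind'
        return ['valgrind', logfile, '--error-exitcode=99']
    return []
```

The returned list is then passed as the `wrapper` argument when spawning QEMU, so valgrind becomes the parent process of the binary under test.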
[PATCH v6 09/16] docs/devel/testing: add -gdb option to the debugging section of QEMU iotests
Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 docs/devel/testing.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 9d6a8f8636..8b24e6fb47 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -229,6 +229,17 @@ Debugging a test case
 The following options to the ``check`` script can be useful when debugging
 a failing test:

+* ``-gdb`` wraps every QEMU invocation in a ``gdbserver``, which waits for a
+  connection from a gdb client.  The options given to ``gdbserver`` (e.g. the
+  address on which to listen for connections) are taken from the
+  ``$GDB_OPTIONS`` environment variable.  By default (if ``$GDB_OPTIONS`` is
+  empty), it listens on ``localhost:12345``.
+  It is possible to connect to it for example with
+  ``gdb -iex "target remote $addr"``, where ``$addr`` is the address
+  ``gdbserver`` listens on.
+  If the ``-gdb`` option is not used, ``$GDB_OPTIONS`` is ignored,
+  regardless of whether it is set.
+
 * ``-d`` (debug) just increases the logging verbosity, showing
   for example the QMP commands and answers.

-- 
2.31.1
[PATCH v6 08/16] qemu-iotests: add gdbserver option to script tests too
The only limitation here is that running a script test under gdbserver makes the test output mismatch the expected results, so the test is reported as failing.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/common.rc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index cbbf6d7c7f..a1ef2b5c2f 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -166,8 +166,14 @@ _qemu_wrapper()
     if [ -n "${QEMU_NEED_PID}" ]; then
         echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"
     fi
+
+    GDB=""
+    if [ ! -z ${GDB_OPTIONS} ]; then
+        GDB="gdbserver ${GDB_OPTIONS}"
+    fi
+
     VALGRIND_QEMU="${VALGRIND_QEMU_VM}" _qemu_proc_exec "${VALGRIND_LOGFILE}" \
-        "$QEMU_PROG" $QEMU_OPTIONS "$@"
+        $GDB "$QEMU_PROG" $QEMU_OPTIONS "$@"
     )
     RETVAL=$?
     _qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
-- 
2.31.1
[PATCH v6 07/16] qemu_iotests: insert gdbserver command line as wrapper for qemu binary
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index e176a84620..e7e3d92d3e 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -580,7 +580,8 @@ class VM(qtest.QEMUQtestMachine): def __init__(self, path_suffix=''): name = "qemu%s-%d" % (path_suffix, os.getpid()) timer = 15.0 if not qemu_gdb else None -super().__init__(qemu_prog, qemu_opts, name=name, +super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb, + name=name, base_temp_dir=test_dir, socket_scm_helper=socket_scm_helper, sock_dir=sock_dir, qmp_timer=timer) -- 2.31.1
[PATCH v6 11/16] qemu-iotests: extend QMP socket timeout when using valgrind
As with gdbserver, valgrind delays the test execution, so the default QMP socket timeout and the generic Timeout class in iotests.py time out too soon.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/iotests.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6aa1dc48ba..26c580f9e7 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -488,13 +488,13 @@ def __init__(self, seconds, errmsg="Timeout"):
         self.seconds = seconds
         self.errmsg = errmsg
     def __enter__(self):
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             return self
         signal.signal(signal.SIGALRM, self.timeout)
         signal.setitimer(signal.ITIMER_REAL, self.seconds)
         return self
     def __exit__(self, exc_type, value, traceback):
-        if qemu_gdb:
+        if qemu_gdb or qemu_valgrind:
             return False
         signal.setitimer(signal.ITIMER_REAL, 0)
         return False
@@ -590,7 +590,7 @@ class VM(qtest.QEMUQtestMachine):

     def __init__(self, path_suffix=''):
         name = "qemu%s-%d" % (path_suffix, os.getpid())
-        timer = 15.0 if not qemu_gdb else None
+        timer = 15.0 if not (qemu_gdb or qemu_valgrind) else None
         super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
                          name=name,
                          base_temp_dir=test_dir,
-- 
2.31.1
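The timer-selection rule in this patch is small enough to state as a one-liner; here is a standalone sketch (the function is invented for illustration — in the real code the expression appears inline in `VM.__init__`):

```python
def qmp_timer(qemu_gdb, qemu_valgrind):
    """Pick the QMP socket timeout: the usual 15 seconds, or no timeout
    at all when a debugger or valgrind is slowing QEMU down.

    qemu_gdb / qemu_valgrind are the wrapper command lists from
    iotests.py; an empty list means the feature is disabled.
    """
    return 15.0 if not (qemu_gdb or qemu_valgrind) else None
```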
[PATCH v6 05/16] qemu-iotests: add option to attach gdbserver
Define the -gdb flag and the GDB_OPTIONS environment variable to let python tests attach a gdbserver to each qemu instance. This patch only adds and parses the flag; it does not yet add the implementation for it.

If -gdb is not provided but $GDB_OPTIONS is set, the environment variable is ignored.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 tests/qemu-iotests/check      |  6 +-
 tests/qemu-iotests/iotests.py |  5 +
 tests/qemu-iotests/testenv.py | 17 +++--
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 2dd529eb75..4365bb8066 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -36,6 +36,9 @@ def make_argparser() -> argparse.ArgumentParser:
                    help='pretty print output for make check')
     p.add_argument('-d', dest='debug', action='store_true', help='debug')
+    p.add_argument('-gdb', action='store_true',
+                   help="start gdbserver with $GDB_OPTIONS options \
+('localhost:12345' if $GDB_OPTIONS is empty)")
     p.add_argument('-misalign', action='store_true',
                    help='misalign memory allocations')
     p.add_argument('--color', choices=['on', 'off', 'auto'],
@@ -114,7 +117,8 @@ if __name__ == '__main__':
     env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
                   aiomode=args.aiomode, cachemode=args.cachemode,
                   imgopts=args.imgopts, misalign=args.misalign,
-                  debug=args.debug, valgrind=args.valgrind)
+                  debug=args.debug, valgrind=args.valgrind,
+                  gdb=args.gdb)

     if len(sys.argv) > 1 and sys.argv[-len(args.tests)-1] == '--':
         if not args.tests:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6b0db4ce54..c86f239d81 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -74,6 +74,11 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')

+gdb_qemu_env = os.environ.get('GDB_OPTIONS')
+qemu_gdb = []
+if gdb_qemu_env:
+    qemu_gdb = ['gdbserver'] + gdb_qemu_env.strip().split(' ')
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 0c3fe75636..8501c6caf5 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -27,6 +27,7 @@ import glob
 from typing import List, Dict, Any, Optional, ContextManager

+DEF_GDB_OPTIONS = 'localhost:12345'

 def isxfile(path: str) -> bool:
     return os.path.isfile(path) and os.access(path, os.X_OK)
@@ -72,7 +73,8 @@ class TestEnv(ContextManager['TestEnv']):
                      'QEMU_NBD_OPTIONS', 'IMGOPTS', 'IMGFMT', 'IMGPROTO',
                      'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
                      'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
-                     'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_']
+                     'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
+                     'GDB_OPTIONS']

     def prepare_subprocess(self, args: List[str]) -> Dict[str, str]:
         if self.debug:
@@ -178,7 +180,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
                  imgopts: Optional[str] = None,
                  misalign: bool = False,
                  debug: bool = False,
-                 valgrind: bool = False) -> None:
+                 valgrind: bool = False,
+                 gdb: bool = False) -> None:
         self.imgfmt = imgfmt
         self.imgproto = imgproto
         self.aiomode = aiomode
@@ -186,6 +189,15 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
         self.misalign = misalign
         self.debug = debug

+        if gdb:
+            self.gdb_options = os.getenv('GDB_OPTIONS', DEF_GDB_OPTIONS)
+            if not self.gdb_options:
+                # cover the case 'export GDB_OPTIONS='
+                self.gdb_options = DEF_GDB_OPTIONS
+        elif 'GDB_OPTIONS' in os.environ:
+            # to not propagate it in prepare_subprocess()
+            del os.environ['GDB_OPTIONS']
+
         if valgrind:
             self.valgrind_qemu = 'y'

@@ -285,6 +297,7 @@ def print_env(self) -> None:
 TEST_DIR      -- {TEST_DIR}
 SOCK_DIR      -- {SOCK_DIR}
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
+GDB_OPTIONS   -- {GDB_OPTIONS}
 """

 args = collections.defaultdict(str, self.get_env())
-- 
2.31.1
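The GDB_OPTIONS resolution rules this patch implements in testenv.py — default when unset, default again when set but empty, and removal from the environment when -gdb is absent — can be captured in a standalone sketch (the function name and the `env` dict parameter are invented for testability):

```python
DEF_GDB_OPTIONS = 'localhost:12345'

def resolve_gdb_options(gdb_requested, env):
    """Model of the GDB_OPTIONS handling in TestEnv.__init__.

    'env' stands in for os.environ.  Returns the options gdbserver
    should be started with, or '' when -gdb was not given.
    """
    if gdb_requested:
        opts = env.get('GDB_OPTIONS', DEF_GDB_OPTIONS)
        # Cover the 'export GDB_OPTIONS=' case: an empty string still
        # falls back to the default.
        return opts or DEF_GDB_OPTIONS
    # Without -gdb, drop the variable so it is not propagated to the
    # test subprocesses.
    env.pop('GDB_OPTIONS', None)
    return ''
```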
[PATCH v6 12/16] qemu-iotests: allow valgrind to read/delete the generated log file
When using -valgrind on the script tests, it generates a log file in $TEST_DIR that is either read (if valgrind finds problems) or otherwise deleted. Provide the same exact behavior when using -valgrind on the python tests. Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/iotests.py | 11 +++ 1 file changed, 11 insertions(+) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 26c580f9e7..85d8c0abbb 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -598,6 +598,17 @@ def __init__(self, path_suffix=''): sock_dir=sock_dir, qmp_timer=timer) self._num_drives = 0 +def _post_shutdown(self) -> None: +super()._post_shutdown() +if not qemu_valgrind or not self._popen: +return +valgrind_filename = f"{test_dir}/{self._popen.pid}.valgrind" +if self.exitcode() == 99: +with open(valgrind_filename) as f: +print(f.read()) +else: +os.remove(valgrind_filename) + def add_object(self, opts): self._args.append('-object') self._args.append(opts) -- 2.31.1
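The read-or-delete behavior of the `_post_shutdown()` hook above can be sketched as a standalone function (name and signature invented — the real hook reads the PID and exit code from the machine object):

```python
import os

def check_valgrind_log(test_dir, pid, exitcode):
    """Model of the valgrind log handling in VM._post_shutdown().

    If valgrind reported errors (exit code 99, from --error-exitcode=99),
    return the log contents so they can be printed; otherwise delete the
    log and return None.
    """
    logfile = f"{test_dir}/{pid}.valgrind"
    if exitcode == 99:
        with open(logfile) as f:
            return f.read()
    os.remove(logfile)
    return None
```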
[PATCH v6 00/16] qemu_iotests: improve debugging options
This series adds the option to attach gdbserver and valgrind to the QEMU binary running in qemu_iotests. It also allows redirecting the output of the QEMU binaries in the python tests to stdout, instead of a log file.

Patches 1-9 introduce the -gdb option to both python and bash tests, 10-14 extend the already existing -valgrind flag to work also on python tests, and patches 15-16 introduce -p to enable logging to stdout.

In particular, patches 1, 6 and 11 focus on extending the QMP socket timers when using gdb/valgrind, otherwise the python tests will fail due to delays in the QMP responses.

Signed-off-by: Emanuele Giuseppe Esposito
---
v6:
* undo the previous series change "base this series on the double dash options, so define --gdb instead of -gdb"
* undo Vladimir's suggestion on patch 5 to use @contextmanager, because it produces a pylint warning.

Emanuele Giuseppe Esposito (15):
  python: qemu: add timer parameter for qmp.accept socket
  python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine
  docs/devel/testing: add debug section to the QEMU iotests chapter
  qemu-iotests: add option to attach gdbserver
  qemu-iotests: delay QMP socket timers
  qemu_iotests: insert gdbserver command line as wrapper for qemu binary
  qemu-iotests: add gdbserver option to script tests too
  docs/devel/testing: add -gdb option to the debugging section of QEMU iotests
  qemu-iotests: extend the check script to prepare supporting valgrind for python tests
  qemu-iotests: extend QMP socket timeout when using valgrind
  qemu-iotests: allow valgrind to read/delete the generated log file
  qemu-iotests: insert valgrind command line as wrapper for qemu binary
  docs/devel/testing: add -valgrind option to the debug section of QEMU iotests
  qemu-iotests: add option to show qemu binary logs on stdout
  docs/devel/testing: add -p option to the debug section of QEMU iotests

John Snow (1):
  python: Reduce strictness of pylint's duplicate-code check

 docs/devel/testing.rst         | 30 +
 python/qemu/machine/machine.py | 16 +++
 python/qemu/machine/qtest.py   |  9 ---
 python/setup.cfg               |  5 
 tests/qemu-iotests/check       | 15 ---
 tests/qemu-iotests/common.rc   |  8 +-
 tests/qemu-iotests/iotests.py  | 49 --
 tests/qemu-iotests/testenv.py  | 23 ++--
 8 files changed, 138 insertions(+), 17 deletions(-)

-- 
2.31.1
[PATCH v6 04/16] docs/devel/testing: add debug section to the QEMU iotests chapter
Introduce the "Debugging a test case" section, in preparation for the additional flags that will be added in the next patches.

Signed-off-by: Emanuele Giuseppe Esposito
Reviewed-by: Vladimir Sementsov-Ogievskiy
---
 docs/devel/testing.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 4e42392810..9d6a8f8636 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -224,6 +224,14 @@
 another application on the host may have locked the file, possibly leading to
 a test failure. If using such devices are explicitly desired, consider adding
 ``locking=off`` option to disable image locking.

+Debugging a test case
+---------------------
+The following options to the ``check`` script can be useful when debugging
+a failing test:
+
+* ``-d`` (debug) just increases the logging verbosity, showing
+  for example the QMP commands and answers.
+
 Test case groups
 ----------------
-- 
2.31.1
[PATCH v6 03/16] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine
Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: John Snow Reviewed-by: Max Reitz --- python/qemu/machine/qtest.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/python/qemu/machine/qtest.py b/python/qemu/machine/qtest.py index 33a86a9d69..dc2b5ccfb1 100644 --- a/python/qemu/machine/qtest.py +++ b/python/qemu/machine/qtest.py @@ -112,6 +112,7 @@ class QEMUQtestMachine(QEMUMachine): def __init__(self, binary: str, args: Sequence[str] = (), + wrapper: Sequence[str] = (), name: Optional[str] = None, base_temp_dir: str = "/var/tmp", socket_scm_helper: Optional[str] = None, @@ -121,7 +122,8 @@ def __init__(self, name = "qemu-%d" % os.getpid() if sock_dir is None: sock_dir = base_temp_dir -super().__init__(binary, args, name=name, base_temp_dir=base_temp_dir, +super().__init__(binary, args, wrapper=wrapper, name=name, + base_temp_dir=base_temp_dir, socket_scm_helper=socket_scm_helper, sock_dir=sock_dir, qmp_timer=qmp_timer) self._qtest: Optional[QEMUQtestProtocol] = None -- 2.31.1
Re: [PATCH 2/6] block: block-status cache for data regions
On 19.06.21 12:20, Vladimir Sementsov-Ogievskiy wrote: 17.06.2021 18:52, Max Reitz wrote: As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. 
protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Max Reitz --- include/block/block_int.h | 19 ++ block.c | 2 + block/io.c | 80 +-- 3 files changed, 98 insertions(+), 3 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index a8f9598102..c09512556a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -832,6 +832,23 @@ struct BdrvChild { QLIST_ENTRY(BdrvChild) next_parent; }; +/* + * Allows bdrv_co_block_status() to cache one data region for a + * protocol node. + * + * @lock: Lock for accessing this object's fields + * @valid: Whether the cache is valid + * @data_start: Offset where we know (or strongly assume) is data + * @data_end: Offset where the data region ends (which is not necessarily + * the start of a zeroed region) + */ +typedef struct BdrvBlockStatusCache { + CoMutex lock; + bool valid; + int64_t data_start; + int64_t data_end; +} BdrvBlockStatusCache; + struct BlockDriverState { /* Protected by big QEMU lock or read-only after opening. No special * locking needed during I/O... 
@@ -997,6 +1014,8 @@ struct BlockDriverState { /* BdrvChild links to this node may never be frozen */ bool never_freeze; + + BdrvBlockStatusCache block_status_cache; }; struct BlockBackendRootState { diff --git a/block.c b/block.c index 3f456892d0..bad64d5c4f 100644 --- a/block.c +++ b/block.c @@ -398,6 +398,8 @@ BlockDriverState *bdrv_new(void) qemu_co_queue_init(>flush_queue); + qemu_co_mutex_init(>block_status_cache.lock); + for (i = 0; i < bdrv_drain_all_count; i++) { bdrv_drained_begin(bs); } diff --git a/block/io.c b/block/io.c index 323854d063..320638cc48 100644 --- a/block/io.c +++ b/block/io.c @@ -35,6 +35,7 @@ #include "qapi/error.h" #include "qemu/error-report.h" #include "qemu/main-loop.h" +#include "qemu/range.h" #include "sysemu/replay.h" /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */ @@ -1862,6 +1863,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, bool need_flush = false; int head = 0; int tail = 0; + BdrvBlockStatusCache *bsc = >block_status_cache; int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX); int alignment = MAX(bs->bl.pwrite_zeroes_alignment, @@ -1878,6 +1880,16 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, return -ENOTSUP; } + /* Invalidate the cached block-status data range if this write overlaps */ + qemu_co_mutex_lock(>lock); + if (bsc->valid && + ranges_overlap(offset, bytes, bsc->data_start, + bsc->data_end - bsc->data_start)) + { + bsc->valid = false; + } +
Re: [PATCH 5/6] block/nbd: Do not force-cap *pnum
On 19.06.21 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 17.06.2021 18:52, Max Reitz wrote:
>> bdrv_co_block_status() does it for us, we do not need to do it here.
>>
>> The advantage of not capping *pnum is that bdrv_co_block_status() can
>> cache larger data regions than requested by its caller.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  block/nbd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 616f9ae6c4..930bd234de 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -1702,7 +1702,7 @@ static int coroutine_fn nbd_client_co_block_status(
>>          .type = NBD_CMD_BLOCK_STATUS,
>>          .from = offset,
>>          .len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
>> -                   MIN(bytes, s->info.size - offset)),
>> +                   s->info.size - offset),
>>          .flags = NBD_CMD_FLAG_REQ_ONE,
>>      };
>
> Hmm.. I don't think that this change is correct. In contrast with
> file-posix you don't get extra information for free, you just make a
> larger request. This means that server will have to do more work.

Oh, oops. Seems I was blind in my rage to replace this MIN() pattern.

You’re absolutely right. So this patch should be dropped.

Max

> (look at blockstatus_to_extents, it calls bdrv_block_status_above in
> a loop).
>
> For example, assume that nbd export is a qcow2 image with all clusters
> allocated. With this change, nbd server will loop through the whole
> qcow2 image, load all L2 tables to return big allocated extent.
>
> So, only server can decide, could it add some extra free information
> to request or not. But unfortunately NBD_CMD_FLAG_REQ_ONE doesn't
> allow it.
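The effect of dropping the inner MIN() can be seen by computing the `.len` field both ways in a standalone sketch (the function and its `capped` flag are invented for illustration; `INT_MAX` and `QEMU_ALIGN_DOWN` are modeled in Python):

```python
INT_MAX = 2**31 - 1

def align_down(n, m):
    # Python model of QEMU_ALIGN_DOWN()
    return (n // m) * m

def nbd_status_len(offset, nbytes, export_size, request_alignment, capped):
    """Request length for NBD_CMD_BLOCK_STATUS.

    capped=True mimics the pre-patch behaviour of clamping to the
    caller's 'bytes'; capped=False asks up to the end of the export,
    still bounded by an alignment-rounded INT_MAX.
    """
    limit = align_down(INT_MAX, request_alignment)
    tail = export_size - offset
    return min(limit, min(nbytes, tail) if capped else tail)
```

With REQ_ONE set, the larger length only widens what the server *may* describe in its single extent; whether that is free information or extra server work is exactly the point under debate above.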
Re: [PATCH 4/6] block/gluster: Do not force-cap *pnum
On 19.06.21 12:36, Vladimir Sementsov-Ogievskiy wrote: 17.06.2021 18:52, Max Reitz wrote: bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/gluster.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/block/gluster.c b/block/gluster.c index e8ee14c8e9..8ef7bb18d5 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -1461,7 +1461,8 @@ exit: * the specified offset) that are known to be in the same * allocated/unallocated state. * - * 'bytes' is the max value 'pnum' should be set to. + * 'bytes' is a soft cap for 'pnum'. If the information is free, 'pnum' may + * well exceed it. * * (Based on raw_co_block_status() from file-posix.c.) */ @@ -1500,12 +1501,12 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs, } else if (data == offset) { /* On a data extent, compute bytes to the end of the extent, * possibly including a partial sector at EOF. */ - *pnum = MIN(bytes, hole - offset); + *pnum = hole - offset; ret = BDRV_BLOCK_DATA; Interesting, isn't it a bug that we don't ROUND_UP *pnum to request_alignment here like it is done in file-posix ? Guess I forgot gluster in 9c3db310ff0 O:) I don’t think I’ll be able to reproduce it for gluster, but I suppose just doing the same thing for gluster should be fine... Max
Re: [PATCH 3/6] block/file-posix: Do not force-cap *pnum
On 18.06.21 22:16, Eric Blake wrote: On Thu, Jun 17, 2021 at 05:52:44PM +0200, Max Reitz wrote: bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. We should update the documentation in include/block/block_int.h to mention that the driver's block_status callback may treat *pnum as a soft cap, and that returning a larger value is fine. Oh, sure. Max But I agree with this change in the individual drivers, as long as we remember to make our global contract explicit that we can now rely on it ;) Reviewed-by: Eric Blake
Re: [PATCH 2/6] block: block-status cache for data regions
On 18.06.21 20:51, Eric Blake wrote: On Thu, Jun 17, 2021 at 05:52:43PM +0200, Max Reitz wrote: To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. Here's hoping third time's the charm! Indeed :) (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Max Reitz --- include/block/block_int.h | 19 ++ block.c | 2 + block/io.c| 80 +-- 3 files changed, 98 insertions(+), 3 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index a8f9598102..c09512556a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -832,6 +832,23 @@ struct BdrvChild { QLIST_ENTRY(BdrvChild) next_parent; }; +/* + * Allows bdrv_co_block_status() to cache one data region for a + * protocol node. 
+ *
+ * @lock: Lock for accessing this object's fields
+ * @valid: Whether the cache is valid
+ * @data_start: Offset where we know (or strongly assume) is data
+ * @data_end: Offset where the data region ends (which is not necessarily
+ *            the start of a zeroed region)
+ */
+typedef struct BdrvBlockStatusCache {
+    CoMutex lock;
+    bool valid;
+    int64_t data_start;
+    int64_t data_end;
+} BdrvBlockStatusCache;

Looks like the right bits of information, and I'm glad you documented the need to be prepared for protocols that report split data sections rather than consolidated.

+++ b/block/io.c
@@ -35,6 +35,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "qemu/range.h"
 #include "sysemu/replay.h"

 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
@@ -1862,6 +1863,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     bool need_flush = false;
     int head = 0;
     int tail = 0;
+    BdrvBlockStatusCache *bsc = &bs->block_status_cache;

     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
     int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
@@ -1878,6 +1880,16 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
         return -ENOTSUP;
     }

+    /* Invalidate the cached block-status data range if this write overlaps */
+    qemu_co_mutex_lock(&bsc->lock);

Are we going to be suffering from a lot of lock contention performance degradation? Is there a way to take advantage of RCU access patterns for any more performance without sacrificing correctness?

The critical section is so short that I considered it fine. I wanted to use RW locks, but then realized that every RW lock operation is internally locked by another mutex, so it wouldn’t gain anything.

I’m not sure whether RCU is worth it here. We could try something very crude, namely to just not take a lock and make `valid` an atomic.
After all, it doesn’t really matter whether `data_start` and `data_end` are consistent values, and resetting `valid` to false is always safe. The worst that could happen is that a concurrent block-status call tries to set up an overlapping data area, which we thus fail to recognize here. But if such a thing were to happen, it could just as well happen before said concurrent call took any lock on `bsc`.

+    if (bsc->valid &&
+        ranges_overlap(offset, bytes, bsc->data_start,
+                       bsc->data_end - bsc->data_start))
+    {
+        bsc->valid = false;
+    }

Do we want to invalidate the entire bsc, or can we be smart and leave the prefix intact (if offset > bsc->data_start, then set bsc->data_end to offset)?

Perhaps we could be smart, but I don’t think it really makes a difference in practice, so I think keeping it simple is better.

+    qemu_co_mutex_unlock(&bsc->lock);

Worth using WITH_QEMU_LOCK_GUARD?

I knew I forgot something, right. Will use!

Max
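The single-region cache plus overlap-invalidation logic being discussed can be modeled in a few lines of Python (a toy, lock-free model with invented names — the real implementation is the C code quoted above, with a CoMutex around each access):

```python
def ranges_overlap(offset1, bytes1, offset2, bytes2):
    # Same predicate as QEMU's ranges_overlap() in qemu/range.h
    return offset1 < offset2 + bytes2 and offset2 < offset1 + bytes1

class BlockStatusCache:
    """Toy model of BdrvBlockStatusCache: one cached known-data region.

    Only data regions are cached because stale cache contents then err
    on the safe side: treating zeroes as data is fine, the reverse is not.
    """
    def __init__(self):
        self.valid = False
        self.data_start = 0
        self.data_end = 0

    def update(self, start, end):
        # Called after a block-status query reported [start, end) as data
        self.valid = True
        self.data_start = start
        self.data_end = end

    def lookup(self, offset):
        # Returns the number of known-data bytes from 'offset', or None
        # on a cache miss (forcing a real driver query).
        if self.valid and self.data_start <= offset < self.data_end:
            return self.data_end - offset
        return None

    def invalidate_on_write_zeroes(self, offset, nbytes):
        # Mirrors the invalidation in bdrv_co_do_pwrite_zeroes()
        if self.valid and ranges_overlap(offset, nbytes,
                                         self.data_start,
                                         self.data_end - self.data_start):
            self.valid = False
```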
[PATCH v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device
When the NVMe block driver was introduced (see commit bdd6a90a9e5, January 2018), the Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings.

To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to misbehave:

  qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device
  (qemu)
  (qemu) info status
  VM status: paused (io-error)
  (qemu) c
  VFIO_MAP_DMA failed: No space left on device
  qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed.

Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction from the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present.

An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module:

  # modprobe vfio_iommu_type1 dma_entry_limit=666

Cc: qemu-sta...@nongnu.org
Reported-by: Michal Prívozník
Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver")
Buglink: https://bugs.launchpad.net/qemu/+bug/186
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65
Signed-off-by: Philippe Mathieu-Daudé
---
v2: KISS checking both errors undistinguishedly (Maxim)
---
 block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nvme.c b/block/nvme.c
index 2b5421e7aa6..c3d2a49866c 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1030,7 +1030,7 @@ try_map:
         r = qemu_vfio_dma_map(s->vfio, qiov->iov[i].iov_base,
                               len, true, &iova);
-        if (r == -ENOMEM && retry) {
+        if ((r == -ENOMEM || r == -ENOSPC) && retry) {
             retry = false;
             trace_nvme_dma_flush_queue_wait(s);
             if (s->dma_map_count) {
-- 
2.31.1
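The retry policy the patch establishes — treat both "mappings exhausted" errors alike, recycle the volatile IOVA mappings once, then retry — can be sketched independently of the C code (function names and the callback-style interface are invented for testability):

```python
import errno

def map_with_retry(dma_map, unmap_all):
    """Model of the retry loop around qemu_vfio_dma_map() in block/nvme.c.

    -ENOMEM (old kernels) and -ENOSPC (kernels with the CVE-2019-3882
    fix) both mean the DMA mappings are exhausted: recycle the volatile
    IOVA mappings once and retry.  Any other error is returned as-is.
    """
    retry = True
    while True:
        r = dma_map()
        if r in (-errno.ENOMEM, -errno.ENOSPC) and retry:
            retry = False
            unmap_all()   # recycle volatile IOVA mappings
            continue
        return r
```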
Re: [PATCH v4 6/6] block-copy: atomic .cancelled and .finished fields in BlockCopyCallState
On 19/06/2021 22:06, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true. And that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified also outside coroutines. To be honest, finished is modified only in coroutine. And read outside. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 33 ++--- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 6416929abd..5348e1f61b 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -53,14 +53,14 @@ typedef struct BlockCopyCallState { Coroutine *co; /* State */ - bool finished; + bool finished; /* atomic */ So, logic around finished: Thread of block_copy does: 0. finished is false 1. tasks set ret and error_is_read 2. qatomic_store_release finished -> true 3. after that point ret and error_is_read are not modified Other threads can: - qatomic_read finished, just to check are we finished or not - if finished, can read ret and error_is_read safely. If you not sure that block-copy finished, use qatomic_load_acquire() of finished first, to be sure that you read ret and error_is_read AFTER finished read and checked to be true. QemuCoSleep sleep; /* TODO: protect API with a lock */ /* To reference all call states from BlockCopyState */ QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ - bool cancelled; + bool cancelled; /* atomic */ Logic around cancelled is simpler: - false at start - qatomic_read is allowed from any thread - qatomic_write to true is allowed from any thread - never write to false Note that cancelling and finishing are racy. User can cancel block-copy that's already finished. We probably may improve change it, but I'm not sure that it worth doing. 
Still, maybe leave some comment in the API documentation.

     /* Fields protected by lock in BlockCopyState */
     bool error_is_read;
     int ret;

@@ -650,7 +650,8 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

-    while (bytes && aio_task_pool_status(aio) == 0 && !call_state->cancelled) {
+    while (bytes && aio_task_pool_status(aio) == 0 &&
+           !qatomic_read(&call_state->cancelled)) {
         BlockCopyTask *task;
         int64_t status_bytes;

@@ -761,7 +762,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
     do {
         ret = block_copy_dirty_clusters(call_state);

-        if (ret == 0 && !call_state->cancelled) {
+        if (ret == 0 && !qatomic_read(&call_state->cancelled)) {
             WITH_QEMU_LOCK_GUARD(&s->lock) {
                 /*
                  * Check that there is no task we still need to

@@ -792,9 +793,9 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
          * 2. We have waited for some intersecting block-copy request
          *    It may have failed and produced new dirty bits.
          */
-    } while (ret > 0 && !call_state->cancelled);
+    } while (ret > 0 && !qatomic_read(&call_state->cancelled));

-    call_state->finished = true;
+    qatomic_store_release(&call_state->finished, true);

So, all writes to ret and error_is_read are finished at this point.

     if (call_state->cb) {
         call_state->cb(call_state->cb_opaque);

@@ -857,35 +858,37 @@ void block_copy_call_free(BlockCopyCallState *call_state)
         return;
     }

-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));

Here we don't need load_acquire, as we don't read other fields. qatomic_read is enough.

So what you say makes sense; the only thing I wonder is: wouldn't it be better to have the acquire without the assertion (or assert afterwards), just to be sure that we delete when finished is true?

[...]
 }

 bool block_copy_call_cancelled(BlockCopyCallState *call_state)
 {
-    return call_state->cancelled;
+    return qatomic_read(&call_state->cancelled);
 }

 int block_copy_call_status(BlockCopyCallState *call_state, bool *error_is_read)
 {
-    assert(call_state->finished);
+    assert(qatomic_load_acquire(&call_state->finished));

Hmm. Here qatomic_load_acquire protects nothing (the assertion will crash if we are not yet finished anyway). So, the caller is doubly sure that block-copy is finished. Also it's misleading: if we think that it does some protection, we are doing the wrong thing: assertions may simply be compiled out, and we can't rely on statements inside assert() being executed. So, let's use a simple qatomic_read here too.

Same applies here.

     if (error_is_read) {
         *error_is_read =
Re: [PATCH V3 5/6] block/rbd: add write zeroes support
On 18.06.21 12:34, Ilya Dryomov wrote:
On Fri, Jun 18, 2021 at 11:00 AM Peter Lieven wrote:
On 16.06.21 14:34, Ilya Dryomov wrote:
On Wed, May 19, 2021 at 4:28 PM Peter Lieven wrote:

Signed-off-by: Peter Lieven
---
 block/rbd.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/block/rbd.c b/block/rbd.c
index 0d8612a988..ee13f08a74 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -63,7 +63,8 @@ typedef enum {
     RBD_AIO_READ,
     RBD_AIO_WRITE,
     RBD_AIO_DISCARD,
-    RBD_AIO_FLUSH
+    RBD_AIO_FLUSH,
+    RBD_AIO_WRITE_ZEROES
 } RBDAIOCmd;

 typedef struct BDRVRBDState {

@@ -705,6 +706,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }

+#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
+    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;

I wonder if we should also set BDRV_REQ_NO_FALLBACK here since librbd does not really have a notion of non-efficient explicit zeroing.

This is only true if thick provisioning is supported, which is in Octopus onwards, right?

Since Pacific, I think.

So it would only be correct to set this if thick provisioning is supported; otherwise we could fail with ENOTSUP and then qemu emulates the zeroing with plain writes.

I actually had a question about that. Why are you returning ENOTSUP in case BDRV_REQ_MAY_UNMAP is not specified and that can't be fulfilled because librbd is too old for RBD_WRITE_ZEROES_FLAG_THICK_PROVISION? My understanding has always been that BDRV_REQ_MAY_UNMAP is just a hint. Deallocating if BDRV_REQ_MAY_UNMAP is specified is not nice but should be perfectly acceptable. It is certainly better than returning ENOTSUP, particularly if ENOTSUP causes Qemu to do plain zeroing.

I think this was introduced to support different provisioning modes. If BDRV_REQ_MAY_UNMAP is not set, the caller of bdrv_write_zeroes expects that the driver does thick provisioning. If the driver cannot handle that (efficiently), qemu does a plain zero write. I am still not fully understanding the meaning of the BDRV_REQ_NO_FALLBACK flag.
The original commit states that it was introduced for qemu-img to efficiently zero out the target and avoid the slow fallback. When I last worked on qemu-img convert I remember that there was a call to zero out the target if bdrv_has_zero_init is not 1. It seems that meanwhile a target_is_zero cmdline switch for qemu-img convert was added to let the user assert that a preexisting target is zero. Maybe someone can help here with whether it would be right to set BDRV_REQ_NO_FALLBACK for rbd in either of the 2 cases (thick provisioning supported or not)?

Thanks,
Peter
Re: [PATCH] block/rbd: Add support for rbd image encryption
On Sat, Jun 19, 2021 at 09:44:32PM +0200, Ilya Dryomov wrote: > On Thu, Jun 17, 2021 at 6:05 PM Or Ozeri wrote: > > > > Starting from ceph Pacific, RBD has built-in support for image-level > > encryption. > > Currently supported formats are LUKS version 1 and 2. > > > > There are 2 new relevant librbd APIs for controlling encryption, both > > expect an > > open image context: > > > > rbd_encryption_format: formats an image (i.e. writes the LUKS header) > > rbd_encryption_load: loads encryptor/decryptor to the image IO stack > > > > This commit extends the qemu rbd driver API to support the above. > > > > Signed-off-by: Or Ozeri > > --- > > block/raw-format.c | 7 + > > block/rbd.c | 371 ++- > > qapi/block-core.json | 110 - > > 3 files changed, 482 insertions(+), 6 deletions(-) > > diff --git a/block/rbd.c b/block/rbd.c > > index f098a89c7b..183b17cd84 100644 > > --- a/block/rbd.c > > +++ b/block/rbd.c > > @@ -73,6 +73,18 @@ > > #define LIBRBD_USE_IOVEC 0 > > #endif > > > > +#define RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN 8 > > + > > +static const char rbd_luks_header_verification[ > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 1 > > +}; > > + > > +static const char rbd_luks2_header_verification[ > > +RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = { > > +'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2 > > +}; > > + > > typedef enum { > > RBD_AIO_READ, > > RBD_AIO_WRITE, > > @@ -341,6 +353,206 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t > > offs) > > } > > } > > > > +#ifdef LIBRBD_SUPPORTS_ENCRYPTION > > +static int qemu_rbd_convert_luks_options( > > +RbdEncryptionOptionsLUKSBase *luks_opts, > > +char **passphrase, > > +Error **errp) > > +{ > > +int r = 0; > > + > > +if (!luks_opts->has_key_secret) { > > +r = -EINVAL; > > +error_setg_errno(errp, -r, "missing encrypt.key-secret"); > > +return r; > > +} > > Why is key-secret optional? 
It doesn't look like it is handled correctly here, but we need to be able to run 'qemu-img info ' and get information back on the size of the image, and whether or not it is encrypted, without having to supply a passphrase upfront. So it is right that key-secret be optional, but also we shouldn't return a fatal error like this. Only if BDRV_O_NO_IO is NOT set should this error be reported.

> > static int64_t qemu_rbd_getlength(BlockDriverState *bs)
> > {
> >     BDRVRBDState *s = bs->opaque;

> > @@ -1243,6 +1589,22 @@ static QemuOptsList qemu_rbd_create_opts = {
> >         .type = QEMU_OPT_STRING,
> >         .help = "ID of secret providing the password",
> >     },
> > +    {
> > +        .name = "encrypt.format",
> > +        .type = QEMU_OPT_STRING,
> > +        .help = "Encrypt the image, format choices: 'luks', 'luks2'",
>
> I think it should be "luks1" and "luks2" to match rbd/librbd.h and "rbd encryption format" command.

No, it should stay "luks" not "luks1", to match the existing QEMU terminology for its LUKS v1 encryption support.

> > @@ -3609,6 +3622,94 @@
> > { 'enum': 'RbdAuthMode',
> >   'data': [ 'cephx', 'none' ] }
> >
> > +##
> > +# @RbdImageEncryptionFormat:
> > +#
> > +# Since: 6.1
> > +##
> > +{ 'enum': 'RbdImageEncryptionFormat',
> > +  'data': [ 'luks', 'luks2' ] }
>
> Ditto -- "luks1" and "luks2".

No, the patch is correct as is.

> > +# @RbdEncryptionOptionsLUKSBase:
> > +#
> > +# @key-secret: ID of a QCryptoSecret object providing a passphrase
> > +#              for unlocking the encryption
> > +#
> > +# Since: 6.1
> > +##
> > +{ 'struct': 'RbdEncryptionOptionsLUKSBase',
> > +  'data': { '*key-secret': 'str' }}
>
> When would we not need a passphrase? I think it should be required.
When running 'qemu-img info' > > +## > > +# @RbdEncryptionCreateOptionsLUKSBase: > > +# > > +# @cipher-alg: The encryption algorithm > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionCreateOptionsLUKSBase', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': { '*cipher-alg': 'QCryptoCipherAlgorithm'}} > > Why QCryptoCipherAlgorithm instead of just enumerating the two > algorithms that librbd supports? An early failure when parsing > seems better than failing in qemu_rbd_convert_luks_create_options() > and having to clean up the newly created image. We don't want to duplicate algorithm names that already have a defined enum data type. > > > + > > +## > > +# @RbdEncryptionOptionsLUKS: > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionOptionsLUKS', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': {}} > > + > > +## > > +# @RbdEncryptionOptionsLUKS2: > > +# > > +# Since: 6.1 > > +## > > +{ 'struct': 'RbdEncryptionOptionsLUKS2', > > + 'base': 'RbdEncryptionOptionsLUKSBase', > > + 'data': {}} > > + > > +## > > +# @RbdEncryptionCreateOptionsLUKS: > >
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 20:53, Vladimir Sementsov-Ogievskiy wrote:
14.06.2021 10:33, Emanuele Giuseppe Esposito wrote:

--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -52,29 +52,35 @@ typedef struct BlockCopyCallState {
     /* Coroutine where async block-copy is running */
     Coroutine *co;

-    /* To reference all call states from BlockCopyState */
-    QLIST_ENTRY(BlockCopyCallState) list;
-
     /* State */
-    int ret;
     bool finished;
-    QemuCoSleep sleep;
-    bool cancelled;
+    QemuCoSleep sleep; /* TODO: protect API with a lock */
+
+    /* To reference all call states from BlockCopyState */
+    QLIST_ENTRY(BlockCopyCallState) list;

     /* OUT parameters */
+    bool cancelled;
     bool error_is_read;
+    int ret;

Hmm, about that. Is @ret an "OUT parameter"? Yes, it is. But someone may think that out parameters don't need locking like "State" parameters (otherwise, what is the difference for the person who reads these comments?). But that is wrong, and ret is modified under the mutex for a reason.

In patch 5 I added a comment above @ret and @error_is_read: /* Fields protected by lock in BlockCopyState */. I can add your explanation too. Actually, the full description of the "ret" field's usage may look as follows: set concurrently by tasks under the mutex; only set once, by the first failed task (and untouched if no task failed); after finish (if call_state->finished is true) not modified anymore and may be read safely without the mutex.

So, before finished, ret is a kind of "State" too: it is both read and written by tasks. This shows me that dividing fields into "IN", "State" and "OUT" doesn't really help here. In this series we use different policies of concurrent access to fields: some are accessed only under the mutex, others have a more complex usage scenario (like this @ret), and some need atomic access.

Yes, but I think especially the IN vs State division helps a lot to understand what needs a lock and what doesn't.

Emanuele
Re: [PATCH RFC] meson: add option to use zstd for qcow2 compression by default
On 17/06/21 21:51, Vladimir Sementsov-Ogievskiy wrote: So, it's an RFC. I also can split the patch so that refactoring of qcow2_co_create() go in a separate preparation patch. Another RFC question, shouldn't we move to zstd by default in upstream too? I think backwards-incompatible changes in the past were not handled with build options, but that can be changed. However I would prefer to have an option like --with-qcow2-compression={zstd,zlib}. Meson supports multiple-choice options, they don't have to use enabled/disabled or (if boolean) true/false. Regarding changing the default, that would make images unreadable to QEMU 5.0 and earlier versions. Does that apply to images that have no compressed clusters? Paolo configure | 10 +- meson.build | 4 block/qcow2.c | 32 +--- meson_options.txt | 2 ++ 4 files changed, 40 insertions(+), 8 deletions(-) diff --git a/configure b/configure index debd50c085..b19af43525 100755 --- a/configure +++ b/configure @@ -385,6 +385,7 @@ snappy="auto" bzip2="auto" lzfse="auto" zstd="auto" +qcow2_zstd_default="no" guest_agent="$default_feature" guest_agent_with_vss="no" guest_agent_ntddscsi="no" @@ -1318,6 +1319,10 @@ for opt do ;; --enable-zstd) zstd="enabled" ;; + --disable-qcow2-zstd-default) qcow2_zstd_default="disabled" + ;; + --enable-qcow2-zstd-default) qcow2_zstd_default="enabled" + ;; --enable-guest-agent) guest_agent="yes" ;; --disable-guest-agent) guest_agent="no" @@ -1903,6 +1908,8 @@ disabled with --disable-FEATURE, default is enabled if available (for reading lzfse-compressed dmg images) zstdsupport for zstd compression library (for migration compression and qcow2 cluster compression) + qcow2-zstd-default Use zstd by default for qcow2 image creation + (requires zstd enabled) seccomp seccomp support coroutine-pool coroutine freelist (better performance) glusterfs GlusterFS backend @@ -6424,7 +6431,8 @@ if test "$skip_meson" = no; then -Dcurl=$curl -Dglusterfs=$glusterfs -Dbzip2=$bzip2 -Dlibiscsi=$libiscsi \ -Dlibnfs=$libnfs 
-Diconv=$iconv -Dcurses=$curses -Dlibudev=$libudev\ -Drbd=$rbd -Dlzo=$lzo -Dsnappy=$snappy -Dlzfse=$lzfse \ --Dzstd=$zstd -Dseccomp=$seccomp -Dvirtfs=$virtfs -Dcap_ng=$cap_ng \ +-Dzstd=$zstd -Dqcow2_zstd_default=$qcow2_zstd_default \ +-Dseccomp=$seccomp -Dvirtfs=$virtfs -Dcap_ng=$cap_ng \ -Dattr=$attr -Ddefault_devices=$default_devices \ -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \ -Dvhost_user_blk_server=$vhost_user_blk_server -Dmultiprocess=$multiprocess \ diff --git a/meson.build b/meson.build index d8a92666fb..3d65b6c46b 100644 --- a/meson.build +++ b/meson.build @@ -484,6 +484,9 @@ if not get_option('zstd').auto() or have_block required: get_option('zstd'), method: 'pkg-config', kwargs: static_kwargs) endif +if not zstd.found() and get_option('qcow2_zstd_default').enabled() + error('--enable-qcow2-zstd-default: Cannot use zstd by default without enabling zstd') +endif gbm = not_found if 'CONFIG_GBM' in config_host gbm = declare_dependency(compile_args: config_host['GBM_CFLAGS'].split(), @@ -1168,6 +1171,7 @@ config_host_data.set('CONFIG_GETTID', has_gettid) config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim) config_host_data.set('CONFIG_STATX', has_statx) config_host_data.set('CONFIG_ZSTD', zstd.found()) +config_host_data.set('CONFIG_QCOW2_ZSTD_DEFAULT', get_option('qcow2_zstd_default').enabled()) config_host_data.set('CONFIG_FUSE', fuse.found()) config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found()) config_host_data.set('CONFIG_X11', x11.found()) diff --git a/block/qcow2.c b/block/qcow2.c index ee4530cdbd..06bfbbf7b8 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3540,17 +3540,36 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp) } } -if (qcow2_opts->has_compression_type && -qcow2_opts->compression_type != QCOW2_COMPRESSION_TYPE_ZLIB) { - -ret = -EINVAL; - -if (version < 3) { +if (version < 3) { +if (qcow2_opts->has_compression_type && +qcow2_opts->compression_type != QCOW2_COMPRESSION_TYPE_ZLIB) 
+{ +ret = -EINVAL; error_setg(errp, "Non-zlib compression type is only supported with " "compatibility level 1.1 and above (use version=v3 or " "greater)"); goto out; } +} else { +if (qcow2_opts->has_compression_type) { +compression_type = qcow2_opts->compression_type; +#ifdef CONFIG_QCOW2_ZSTD_DEFAULT +} else { +compression_type = QCOW2_COMPRESSION_TYPE_ZSTD; +#endif +} +
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 19:27, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: As done in BlockCopyCallState, categorize BlockCopyTask and BlockCopyState in IN, State and OUT fields. This is just to understand which field has to be protected with a lock. .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 49 +- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 3f26be8ddc..5ff7764e87 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -52,29 +52,35 @@ typedef struct BlockCopyCallState { /* Coroutine where async block-copy is running */ Coroutine *co; - /* To reference all call states from BlockCopyState */ - QLIST_ENTRY(BlockCopyCallState) list; - /* State */ - int ret; bool finished; - QemuCoSleep sleep; - bool cancelled; + QemuCoSleep sleep; /* TODO: protect API with a lock */ + + /* To reference all call states from BlockCopyState */ + QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ + bool cancelled; actually, cancelled is not OUT field. It's set by user to cancel the operation. And block-copy tracks the field to understand should it cancel at the moment or not. So, it's part of state. Makes sense. Also, I just now understand, that "out parameter" sounds strange here. As "parameter" is an input by general meaning.. We may have "out parameters" for functions, as all parameters of a function are generally called "parameters" anyway. I think "OUT fields" is more correct here. I don't insist, as I'm not an expert in English (for sure, you'll find mistakes even in this paragraph :\ Actually I am using the terminology that was already there :) Anyways, I am not expert here either but fields do sounds better. I will change parameter -> field replacement to this patch. Emanuele
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 20:31, Vladimir Sementsov-Ogievskiy wrote: 19.06.2021 18:23, Vladimir Sementsov-Ogievskiy wrote: typedef struct BlockCopyTask { AioTask task; + /* + * IN parameters. Initialized in block_copy_task_create() + * and never changed. + */ That's just not true for method field :( I think, we just need to document that @method is never accessed concurrently Ok I got confused in the last patch. Method is read by block_copy_task_entry only after it is re-set in block_copy_dirty_clusters loop. Sorry for that. Will leave it as IN and document it better. Still, moving the lock granularity inside the while loop might not be too bad. Not sure though. At this point skip_unallocated can also be an atomic, even though I sense that you won't like that :) Emanuele
Re: [PATCH v4 3/6] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
On 19/06/2021 17:23, Vladimir Sementsov-Ogievskiy wrote: 14.06.2021 10:33, Emanuele Giuseppe Esposito wrote: As done in BlockCopyCallState, categorize BlockCopyTask and BlockCopyState in IN, State and OUT fields. This is just to understand which field has to be protected with a lock. .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. Signed-off-by: Emanuele Giuseppe Esposito --- block/block-copy.c | 49 +- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/block-copy.c b/block/block-copy.c index 3f26be8ddc..5ff7764e87 100644 --- a/block/block-copy.c +++ b/block/block-copy.c @@ -52,29 +52,35 @@ typedef struct BlockCopyCallState { /* Coroutine where async block-copy is running */ Coroutine *co; - /* To reference all call states from BlockCopyState */ - QLIST_ENTRY(BlockCopyCallState) list; - /* State */ - int ret; bool finished; - QemuCoSleep sleep; - bool cancelled; + QemuCoSleep sleep; /* TODO: protect API with a lock */ + + /* To reference all call states from BlockCopyState */ + QLIST_ENTRY(BlockCopyCallState) list; /* OUT parameters */ + bool cancelled; bool error_is_read; + int ret; } BlockCopyCallState; typedef struct BlockCopyTask { AioTask task; + /* + * IN parameters. Initialized in block_copy_task_create() + * and never changed. + */ That's just not true for method field :( You're right, because it is modified later in the while loop of block_copy_dirty_clusters, but the task is already in the list. Will move it to state field. 
BlockCopyState *s; BlockCopyCallState *call_state; int64_t offset; - int64_t bytes; BlockCopyMethod method; - QLIST_ENTRY(BlockCopyTask) list; + + /* State */ CoQueue wait_queue; /* coroutines blocked on this task */ + int64_t bytes; + QLIST_ENTRY(BlockCopyTask) list; } BlockCopyTask; static int64_t task_end(BlockCopyTask *task) @@ -90,15 +96,25 @@ typedef struct BlockCopyState { */ BdrvChild *source; BdrvChild *target; - BdrvDirtyBitmap *copy_bitmap; + + /* State */ int64_t in_flight_bytes; - int64_t cluster_size; BlockCopyMethod method; - int64_t max_transfer; - uint64_t len; QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls */ QLIST_HEAD(, BlockCopyCallState) calls; + /* State fields that use a thread-safe API */ + BdrvDirtyBitmap *copy_bitmap; + ProgressMeter *progress; + SharedResource *mem; + RateLimit rate_limit; + /* + * IN parameters. Initialized in block_copy_state_new() + * and never changed. + */ + int64_t cluster_size; + int64_t max_transfer; + uint64_t len; BdrvRequestFlags write_flags; /* @@ -114,14 +130,11 @@ typedef struct BlockCopyState { * In this case, block_copy() will query the source’s allocation status, * skip unallocated regions, clear them in the copy_bitmap, and invoke * block_copy_reset_unallocated() every time it does. + * + * This field is set in backup_run() before coroutines are run, + * therefore is an IN. That's not true: it is modified from backup_run, when block-copy already initialized, and block_copy() calls may be done from backup-top filter. Ok, I am not very familiar with the backup code, so I did not see it. This means we need to protect this field too. 
At this point, I think we can increase the granularity of the lock in block_copy_dirty_clusters: instead of having, in each while loop,

block_copy_task_create();   // locks and releases internally
block_copy_block_status();  // no lock used, but uses skip_unallocated so it will need one
if ()
    block_copy_task_shrink();   // locks and releases internally
if (s->skip_unallocated)        // will need lock
    block_copy_task_end();      // locks and releases internally
[..]
if ()
    task->method = COPY_WRITE_ZEROS;  // needs lock
[..]

we can just lock the whole while section and eventually create _locked() versions of task_end and similar functions that are also used in non-locked code blocks.

Emanuele

     */
     bool skip_unallocated;
-
-    ProgressMeter *progress;
-
-    SharedResource *mem;
-
-    RateLimit rate_limit;
 } BlockCopyState;

 static BlockCopyTask *find_conflicting_task(BlockCopyState *s,