Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
On 09/12/20 15:03, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20201209135355.561745-1-mlevi...@redhat.com Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20201209135355.561745-1-mlevi...@redhat.com -> patchew/20201209135355.561745-1-mlevi...@redhat.com Switched to a new branch 'test' 77c9000 block/scsi: correctly emulate the VPD block limits page 61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough 35c66d6 block: add max_ioctl_transfer to BlockLimits 08ba263 file-posix: add sg_get_max_segments that actually works with sg e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits === OUTPUT BEGIN === 1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from raw_refresh_limits) 2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that actually works with sg) 3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to BlockLimits) 4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for SCSI passthrough) 5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block limits page) ERROR: braces {} are necessary for all arms of this statement #39: FILE: hw/scsi/scsi-generic.c:204: +if (len < r->buflen) [...] total: 1 errors, 0 warnings, 28 lines checked Patch 5/5 has style problems, please review. 
If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com Time for v3? Paolo
[PATCH v5 2/4] block: add bdrv_co_delete_file_noerr
This function wraps bdrv_co_delete_file for the common case of removing a
file that was just created by a format driver, on an error condition.

It hides the -ENOTSUP error and reports all other errors.

Signed-off-by: Maxim Levitsky
Reviewed-by: Alberto Garcia
---
 block.c               | 24 ++++++++++++++++++++++++
 include/block/block.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/block.c b/block.c
index f1cedac362..5d35ba2fb8 100644
--- a/block.c
+++ b/block.c
@@ -704,6 +704,30 @@ int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp)
     return ret;
 }
 
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!bs) {
+        return;
+    }
+
+    ret = bdrv_co_delete_file(bs, &local_err);
+    /*
+     * ENOTSUP will happen if the block driver doesn't support
+     * the 'bdrv_co_delete_file' interface. This is a predictable
+     * scenario and shouldn't be reported back to the user.
+     */
+    if (ret == -ENOTSUP) {
+        error_free(local_err);
+    } else if (ret < 0) {
+        error_report_err(local_err);
+    }
+}
+
+
+
 /**
  * Try to get @bs's logical and physical block size.
  * On success, store them in @bsz struct and return 0.
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..af03022723 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -428,6 +428,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
                               Error **errp);
 void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base);
 int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 typedef struct BdrvCheckResult {
--
2.26.2
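Editorial sketch: the error-handling shape described in this commit message can be tried outside QEMU with stand-in types. Everything prefixed `sketch_` or `Sketch` below is hypothetical and is not QEMU's Error API; only the control flow (free the error silently on -ENOTSUP, report and free it on any other failure) mirrors the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an owned error object (not QEMU's Error). */
typedef struct SketchError {
    char *msg;
} SketchError;

/* Counts errors that were actually shown to the user. */
static int reported_errors;

/* Allocate an error carrying a copy of msg and hand ownership to *errp. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    SketchError *err;
    size_t n = strlen(msg) + 1;

    if (!errp) {
        return;
    }
    err = malloc(sizeof(*err));
    if (!err) {
        abort();
    }
    err->msg = malloc(n);
    if (!err->msg) {
        abort();
    }
    memcpy(err->msg, msg, n);
    *errp = err;
}

static void sketch_error_free(SketchError *err)
{
    if (err) {
        free(err->msg);
        free(err);
    }
}

/* Print the error, count it, and release it. */
static void sketch_error_report(SketchError *err)
{
    fprintf(stderr, "error: %s\n", err->msg);
    reported_errors++;
    sketch_error_free(err);
}

/*
 * Hypothetical delete operation: returns simulated_result (0 or a
 * negative errno) and sets *errp on failure.
 */
static int sketch_delete_file(int simulated_result, SketchError **errp)
{
    if (simulated_result < 0) {
        sketch_error_set(errp, strerror(-simulated_result));
    }
    return simulated_result;
}

/* Wrapper: swallow -ENOTSUP, report everything else, never leak. */
static void sketch_delete_file_noerr(int simulated_result)
{
    SketchError *local_err = NULL;
    int ret;

    ret = sketch_delete_file(simulated_result, &local_err);
    if (ret == -ENOTSUP) {
        sketch_error_free(local_err);
    } else if (ret < 0) {
        sketch_error_report(local_err);
    }
}
```

Every path that receives an error object either frees or reports it, which is the property the earlier "tiny memory leak" fix in this series restores.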
[PATCH v5 1/4] crypto: luks: Fix tiny memory leak
When the underlying block device doesn't support the bdrv_co_delete_file interface, an 'Error' object was leaked. Signed-off-by: Maxim Levitsky Reviewed-by: Alberto Garcia --- block/crypto.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/crypto.c b/block/crypto.c index aef5a5721a..b3a5275132 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -735,6 +735,8 @@ fail: */ if ((r_del < 0) && (r_del != -ENOTSUP)) { error_report_err(local_delete_err); +} else { +error_free(local_delete_err); } } -- 2.26.2
[PATCH v5 0/4] qcow2: don't leave partially initialized file on image creation
Use the bdrv_co_delete_file interface to delete the underlying file if qcow2
initialization fails (e.g. due to a bad encryption secret).

This makes the qcow2 driver behave the same way as the luks driver.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1845353

V3: addressed review feedback and reworked commit messages

V4: got rid of code duplication by adding bdrv_co_delete_file_noerr
and made the qcow2 driver use this function to delete
both the main and the data file.

V5: addressed review feedback on the reworked version.

Best regards,
    Maxim Levitsky

Maxim Levitsky (4):
  crypto: luks: Fix tiny memory leak
  block: add bdrv_co_delete_file_noerr
  crypto: luks: use bdrv_co_delete_file_noerr
  block: qcow2: remove the created file on initialization error

 block.c               | 24 ++++++++++++++++++++++++
 block/crypto.c        | 13 ++-----------
 block/qcow2.c         |  6 ++++--
 include/block/block.h |  1 +
 4 files changed, 31 insertions(+), 13 deletions(-)

--
2.26.2
[PATCH v5 3/4] crypto: luks: use bdrv_co_delete_file_noerr
This refactoring is now possible thanks to bdrv_co_delete_file_noerr.

Signed-off-by: Maxim Levitsky
Reviewed-by: Alberto Garcia
---
 block/crypto.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index b3a5275132..1d30fde38e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -725,19 +725,8 @@ fail:
      * If an error occurred, delete 'filename'. Even if the file existed
      * beforehand, it has been truncated and corrupted in the process.
      */
-    if (ret && bs) {
-        Error *local_delete_err = NULL;
-        int r_del = bdrv_co_delete_file(bs, &local_delete_err);
-        /*
-         * ENOTSUP will happen if the block driver doesn't support
-         * the 'bdrv_co_delete_file' interface. This is a predictable
-         * scenario and shouldn't be reported back to the user.
-         */
-        if ((r_del < 0) && (r_del != -ENOTSUP)) {
-            error_report_err(local_delete_err);
-        } else {
-            error_free(local_delete_err);
-        }
+    if (ret) {
+        bdrv_co_delete_file_noerr(bs);
     }
 
     bdrv_unref(bs);
--
2.26.2
[PATCH v5 4/4] block: qcow2: remove the created file on initialization error
If the qcow2 initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky
Reviewed-by: Alberto Garcia
---
 block/qcow2.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3a90ef2786..68c9182f92 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3847,12 +3847,14 @@ static int coroutine_fn qcow2_co_create_opts(BlockDriver *drv,
 
     /* Create the qcow2 image (format layer) */
     ret = qcow2_co_create(create_options, errp);
+finish:
     if (ret < 0) {
-        goto finish;
+        bdrv_co_delete_file_noerr(bs);
+        bdrv_co_delete_file_noerr(data_bs);
     }
 
     ret = 0;
-finish:
+
     qobject_unref(qdict);
     bdrv_unref(bs);
     bdrv_unref(data_bs);
--
2.26.2
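Editorial sketch: the idea in this commit message (remove a file that was created but could not be initialized) can be illustrated with plain POSIX calls and a single exit label. The function `create_image` and its `init_fails` parameter are hypothetical and stand in for the protocol-layer create and the format-layer initialization; this is not the qcow2 code path.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical two-step image creation:
 *   step 1 creates the file,
 *   step 2 initializes the format and may fail (simulated by init_fails).
 * Returns 0 on success or a negative errno. On failure, a file created
 * by this call is removed again so no stale file is left behind.
 */
static int create_image(const char *path, bool init_fails)
{
    bool created = false;
    int fd;
    int ret;

    /* O_EXCL: fail with EEXIST rather than reuse an existing file. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        ret = -errno;
        goto finish;
    }
    created = true;

    /* Stand-in for the format-layer initialization. */
    ret = init_fails ? -EINVAL : 0;

finish:
    if (fd >= 0) {
        close(fd);
    }
    if (ret < 0 && created) {
        /* Only delete what this call created. */
        unlink(path);
    }
    return ret;
}
```

One design choice here differs from the luks code quoted elsewhere in this series: the sketch never deletes a pre-existing file, because `O_EXCL` prevents it from touching one in the first place. The return value is also left untouched after cleanup so the caller still sees the original error.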
Re: [PATCH v4 4/4] block: qcow2: remove the created file on initialization error
On Wed, 2020-12-09 at 18:41 +0100, Alberto Garcia wrote: > On Wed 09 Dec 2020 05:44:41 PM CET, Maxim Levitsky wrote: > > @@ -3847,12 +3847,13 @@ static int coroutine_fn > > qcow2_co_create_opts(BlockDriver *drv, > > > > /* Create the qcow2 image (format layer) */ > > ret = qcow2_co_create(create_options, errp); > > + > > +finish: > > if (ret < 0) { > > -goto finish; > > +bdrv_co_delete_file_noerr(bs); > > +bdrv_co_delete_file_noerr(data_bs); > > } > > > > -ret = 0; > > Many/most functions in qcow2.c force ret to be 0 on success, we could > also keep that here (although in practice I don't think that ret can be > greater than 0 in this case, or that the caller would care). I also noticed this when I was sending the patches, and I wasn't sure if I want to keep that 'ret = 0' or not. I will add it back. Best regards, Maxim Levitsky > > Either way, > > Reviewed-by: Alberto Garcia > > Berto >
Re: [PATCH v4 2/4] block: add bdrv_co_delete_file_noerr
On Wed, 2020-12-09 at 18:34 +0100, Alberto Garcia wrote:
> On Wed 09 Dec 2020 05:44:39 PM CET, Maxim Levitsky wrote:
> > +void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
> > +{
> > +    Error *local_err = NULL;
> > +
> > +    if (!bs) {
> > +        return;
> > +    }
> > +
> > +    int ret = bdrv_co_delete_file(bs, &local_err);
>        ^^^
>
> According to the QEMU coding style we should not have declarations in
> the middle of a block.

Oops! I will send the next version now.
Thanks a lot for the review!

Best regards,
    Maxim Levitsky

>
> The patch looks otherwise fine.
>
> Reviewed-by: Alberto Garcia
>
> Berto
>
[PULL v2 48/65] libvhost-user: make it a meson subproject
From: Marc-André Lureau By making libvhost-user a subproject, check it builds standalone (without the global QEMU cflags etc). Note that the library still relies on QEMU include/qemu/atomic.h and linux_headers/. Signed-off-by: Marc-André Lureau Message-Id: <20201125100640.366523-6-marcandre.lur...@redhat.com> Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- contrib/vhost-user-gpu/vugpu.h| 2 +- include/qemu/vhost-user-server.h | 2 +- .../libvhost-user/libvhost-user-glib.h| 0 .../libvhost-user/libvhost-user.h | 0 block/export/vhost-user-blk-server.c | 2 +- contrib/vhost-user-blk/vhost-user-blk.c | 3 +-- contrib/vhost-user-input/main.c | 3 +-- contrib/vhost-user-scsi/vhost-user-scsi.c | 2 +- .../libvhost-user/libvhost-user-glib.c| 0 .../libvhost-user/libvhost-user.c | 0 tests/vhost-user-bridge.c | 2 +- tools/virtiofsd/fuse_virtio.c | 2 +- contrib/libvhost-user/meson.build | 4 contrib/vhost-user-blk/meson.build| 3 +-- contrib/vhost-user-gpu/meson.build| 3 +-- contrib/vhost-user-input/meson.build | 3 +-- contrib/vhost-user-scsi/meson.build | 3 +-- meson.build | 7 ++- subprojects/libvhost-user/meson.build | 20 +++ tests/meson.build | 3 +-- tools/virtiofsd/meson.build | 3 +-- 21 files changed, 40 insertions(+), 27 deletions(-) rename {contrib => subprojects}/libvhost-user/libvhost-user-glib.h (100%) rename {contrib => subprojects}/libvhost-user/libvhost-user.h (100%) rename {contrib => subprojects}/libvhost-user/libvhost-user-glib.c (100%) rename {contrib => subprojects}/libvhost-user/libvhost-user.c (100%) delete mode 100644 contrib/libvhost-user/meson.build create mode 100644 subprojects/libvhost-user/meson.build diff --git a/contrib/vhost-user-gpu/vugpu.h b/contrib/vhost-user-gpu/vugpu.h index 3153c9a6de..bdf9a74b46 100644 --- a/contrib/vhost-user-gpu/vugpu.h +++ b/contrib/vhost-user-gpu/vugpu.h @@ -17,7 +17,7 @@ #include "qemu/osdep.h" -#include "contrib/libvhost-user/libvhost-user-glib.h" +#include "libvhost-user-glib.h" #include 
"standard-headers/linux/virtio_gpu.h" #include "qemu/queue.h" diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h index 0da4c2cc4c..121ea1dedf 100644 --- a/include/qemu/vhost-user-server.h +++ b/include/qemu/vhost-user-server.h @@ -11,7 +11,7 @@ #ifndef VHOST_USER_SERVER_H #define VHOST_USER_SERVER_H -#include "contrib/libvhost-user/libvhost-user.h" +#include "subprojects/libvhost-user/libvhost-user.h" /* only for the type definitions */ #include "io/channel-socket.h" #include "io/channel-file.h" #include "io/net-listener.h" diff --git a/contrib/libvhost-user/libvhost-user-glib.h b/subprojects/libvhost-user/libvhost-user-glib.h similarity index 100% rename from contrib/libvhost-user/libvhost-user-glib.h rename to subprojects/libvhost-user/libvhost-user-glib.h diff --git a/contrib/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h similarity index 100% rename from contrib/libvhost-user/libvhost-user.h rename to subprojects/libvhost-user/libvhost-user.h diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index 62672d1cb9..a3d95ca012 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -11,7 +11,7 @@ */ #include "qemu/osdep.h" #include "block/block.h" -#include "contrib/libvhost-user/libvhost-user.h" +#include "subprojects/libvhost-user/libvhost-user.h" /* only for the type definitions */ #include "standard-headers/linux/virtio_blk.h" #include "qemu/vhost-user-server.h" #include "vhost-user-blk-server.h" diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index dc981bf945..6abd7835a8 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -17,8 +17,7 @@ #include "qemu/osdep.h" #include "standard-headers/linux/virtio_blk.h" -#include "contrib/libvhost-user/libvhost-user-glib.h" -#include "contrib/libvhost-user/libvhost-user.h" +#include 
"libvhost-user-glib.h" #if defined(__linux__) #include diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c index 6020c6f33a..3ea840cf44 100644 --- a/contrib/vhost-user-input/main.c +++ b/contrib/vhost-user-input/main.c @@ -12,8 +12,7 @@ #include "qemu/iov.h" #include "qemu/bswap.h" #include "qemu/sockets.h" -#include "contrib/libvhost-user/libvhost-user.h" -#include "contrib/libvhost-user/libvhost-user-glib.h" +#include "libvhost-user-glib.h" #include "standard-headers/linux/virtio_input.h" #include "qapi/error.h" diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c
[PULL v2 54/65] block/export: avoid g_return_val_if() input validation
From: Stefan Hajnoczi

Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster
Signed-off-by: Stefan Hajnoczi
Message-Id: <20201118091644.199527-5-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 block/export/vhost-user-blk-server.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index a3d95ca012..ab2c4d44c4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -267,7 +267,9 @@ vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len)
     VuServer *server = container_of(vu_dev, VuServer, vu_dev);
     VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
 
-    g_return_val_if_fail(len <= sizeof(struct virtio_blk_config), -1);
+    if (len > sizeof(struct virtio_blk_config)) {
+        return -1;
+    }
 
     memcpy(config, &vexp->blkcfg, len);
     return 0;
--
MST
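Editorial sketch: the hazard described above can be reproduced without GLib by using a stand-in macro. `SKETCH_RETURN_VAL_IF_FAIL` and `SKETCH_DISABLE_CHECKS` are hypothetical names that imitate the compile-out behaviour the commit message attributes to -DG_DISABLE_CHECKS; they are not GLib identifiers, and `struct sketch_config` is an invented 16-byte configuration blob.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a "return if the precondition fails" macro that a build
 * flag can turn into a no-op (hypothetical names, not GLib's).
 */
#ifdef SKETCH_DISABLE_CHECKS
#define SKETCH_RETURN_VAL_IF_FAIL(cond, val) do { } while (0)
#else
#define SKETCH_RETURN_VAL_IF_FAIL(cond, val) \
    do {                                     \
        if (!(cond)) {                       \
            return (val);                    \
        }                                    \
    } while (0)
#endif

/* Hypothetical device configuration blob. */
struct sketch_config {
    unsigned char bytes[16];
};

/*
 * Fragile variant: the length check disappears when the file is built
 * with -DSKETCH_DISABLE_CHECKS, and memcpy() would then read past cfg.
 */
static int get_config_macro(const struct sketch_config *cfg,
                            unsigned char *out, size_t len)
{
    SKETCH_RETURN_VAL_IF_FAIL(len <= sizeof(*cfg), -1);
    memcpy(out, cfg, len);
    return 0;
}

/* Robust variant: the explicit check is part of the function's logic. */
static int get_config_checked(const struct sketch_config *cfg,
                              unsigned char *out, size_t len)
{
    if (len > sizeof(*cfg)) {
        return -1;
    }
    memcpy(out, cfg, len);
    return 0;
}
```

With checks enabled both variants reject an oversized request. Only the explicit `if` keeps doing so under every build configuration, which is why untrusted input should not be validated through an assertion-style macro.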
Re: [PATCH v4 4/4] block: qcow2: remove the created file on initialization error
On Wed 09 Dec 2020 05:44:41 PM CET, Maxim Levitsky wrote: > @@ -3847,12 +3847,13 @@ static int coroutine_fn > qcow2_co_create_opts(BlockDriver *drv, > > /* Create the qcow2 image (format layer) */ > ret = qcow2_co_create(create_options, errp); > + > +finish: > if (ret < 0) { > -goto finish; > +bdrv_co_delete_file_noerr(bs); > +bdrv_co_delete_file_noerr(data_bs); > } > > -ret = 0; Many/most functions in qcow2.c force ret to be 0 on success, we could also keep that here (although in practice I don't think that ret can be greater than 0 in this case, or that the caller would care). Either way, Reviewed-by: Alberto Garcia Berto
Re: [PATCH] file-posix: detect the lock using the real file
On 09.12.2020 at 10:33, Daniel P. Berrangé wrote:
> On Tue, Dec 08, 2020 at 03:38:22PM +0100, Kevin Wolf wrote:
> > On 08.12.2020 at 13:59, Li Feng wrote:
> > > This patch addresses this issue:
> > > When accessing a volume on an NFS filesystem without supporting the file lock,
> > > tools, like qemu-img, will complain "Failed to lock byte 100".
> > >
> > > In the original code, the qemu_has_ofd_lock will test the lock on the
> > > "/dev/null" pseudo-file. Actually, the file.locking is per-drive property,
> > > which depends on the underlay filesystem.
> > >
> > > In this patch, make the 'qemu_has_ofd_lock' with a filename be more generic
> > > and reasonable.
> > >
> > > Signed-off-by: Li Feng
> >
> > Do you know any way how I could configure either the NFS server or the
> > NFS client such that locking would fail? For any patch related to this,
> > it would be good if I could even test the scenario.
>
> One could write a qtest that uses an LD_PRELOAD to replace the standard
> glibc fcntl() function with one that returns an error for locking commands.

Sounds a bit ugly, but for regression testing it could make sense.
However, part of the testing would be to verify that our checks
actually match the kernel code, which this approach couldn't solve.

Kevin
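Editorial sketch: the quoted patch description proposes probing lock support on the real file instead of on "/dev/null". A minimal version of such a probe, assuming Linux open file description (OFD) locks via fcntl(F_OFD_GETLK), could look like the following. The function name `probe_ofd_lock` is hypothetical and is not the patch's actual implementation; on systems where F_OFD_GETLK is not defined the sketch simply reports "unsupported".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* asks glibc to expose F_OFD_GETLK */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Probe whether OFD lock queries work on the filesystem backing path.
 * Returns 1 if the query succeeded, 0 if it failed or OFD locks are not
 * available at build time, and a negative errno if path cannot be opened.
 */
static int probe_ofd_lock(const char *path)
{
    int supported = 0;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -errno;
    }

#ifdef F_OFD_GETLK
    {
        /* l_pid must be 0 for OFD commands; the initializer zeroes it. */
        struct flock fl = {
            .l_type = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start = 0,
            .l_len = 0,
        };

        /* A query only: nothing is locked, so there is nothing to undo. */
        if (fcntl(fd, F_OFD_GETLK, &fl) == 0) {
            supported = 1;
        }
    }
#endif

    close(fd);
    return supported;
}
```

Because the query runs against the file the caller actually intends to use, the answer reflects that file's filesystem rather than the pseudo-filesystem behind "/dev/null". Whether a given NFS setup makes this particular query fail is not something the sketch can guarantee, which is the testing gap Kevin points out above.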
Re: [PATCH 2/2] nbd/server: Quiesce coroutines on context switch
On Fri, Dec 04, 2020 at 12:39:07PM -0600, Eric Blake wrote: > On 12/4/20 10:53 AM, Sergio Lopez wrote: > > When switching between AIO contexts we need to me make sure that both > > recv_coroutine and send_coroutine are not scheduled to run. Otherwise, > > QEMU may crash while attaching the new context with an error like > > this one: > > > > aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule' > > > > To achieve this we need a local implementation of > > 'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done > > by 'nbd/client.c') that allows us to interrupt the operation and to > > know when recv_coroutine is yielding. > > > > With this in place, we delegate detaching the AIO context to the > > owning context with a BH ('nbd_aio_detach_bh') scheduled using > > 'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the > > channel by setting 'client->quiescing' to 'true', and either waits for > > the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in > > 'nbd_read_eof', actively enters the coroutine to interrupt it. > > > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326 > > Signed-off-by: Sergio Lopez > > --- > > nbd/server.c | 120 +-- > > 1 file changed, 106 insertions(+), 14 deletions(-) > > A complex patch, so I'd appreciate a second set of eyes. > > > > > diff --git a/nbd/server.c b/nbd/server.c > > index 613ed2634a..7229f487d2 100644 > > --- a/nbd/server.c > > +++ b/nbd/server.c > > @@ -132,6 +132,9 @@ struct NBDClient { > > CoMutex send_lock; > > Coroutine *send_coroutine; > > > > +bool read_yielding; > > +bool quiescing; > > Will either of these fields need to be accessed atomically once the > 'yank' code is added, or are we still safe with direct access because > coroutines are not multithreaded? Yes, those are only accessed from coroutines, which will be scheduled on the same thread. 
> > +
> >      QTAILQ_ENTRY(NBDClient) next;
> >      int nb_requests;
> >      bool closing;
> > @@ -1352,14 +1355,60 @@ static coroutine_fn int nbd_negotiate(NBDClient *client, Error **errp)
> >      return 0;
> >  }
> >
> > -static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
> > +/* nbd_read_eof
> > + * Tries to read @size bytes from @ioc. This is a local implementation of
> > + * qio_channel_readv_all_eof. We have it here because we need it to be
> > + * interruptible and to know when the coroutine is yielding.
> > + * Returns 1 on success
> > + *         0 on eof, when no data was read (errp is not set)
> > + *         negative errno on failure (errp is set)
> > + */
> > +static inline int coroutine_fn
> > +nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> > +{
> > +    bool partial = false;
> > +
> > +    assert(size);
> > +    while (size > 0) {
> > +        struct iovec iov = { .iov_base = buffer, .iov_len = size };
> > +        ssize_t len;
> > +
> > +        len = qio_channel_readv(client->ioc, &iov, 1, errp);
> > +        if (len == QIO_CHANNEL_ERR_BLOCK) {
> > +            client->read_yielding = true;
> > +            qio_channel_yield(client->ioc, G_IO_IN);
> > +            client->read_yielding = false;
>
> nbd/client.c:nbd_read_eof() uses bdrv_dec/inc_in_flight instead of
> read_yielding...
>
> > +            if (client->quiescing) {
> > +                return -EAGAIN;
> > +            }
>
> and the quiescing check is new; otherwise, these two functions look
> identical.  Having two static functions with the same name makes gdb a
> bit more annoying (which one of the two did you want your breakpoint
> on?).  Is there any way we could write this code only once in
> nbd/common.c for reuse by both client and server?  But I can live with
> it as written.

I'm not happy with this either, but in the first implementation I
tried to come up with a single function for both use cases, and it
looked terrible.

We can easily use a different name, though.
> > @@ -2151,20 +2223,23 @@ static int nbd_co_send_bitmap(NBDClient *client, > > uint64_t handle, > > > > /* nbd_co_receive_request > > * Collect a client request. Return 0 if request looks valid, -EIO to drop > > - * connection right away, and any other negative value to report an error > > to > > - * the client (although the caller may still need to disconnect after > > reporting > > - * the error). > > + * connection right away, -EAGAIN to indicate we were interrupted and the > > + * channel should be quiesced, and any other negative value to report an > > error > > + * to the client (although the caller may still need to disconnect after > > + * reporting the error). > > */ > > static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, > >Error **errp) > > { > > NBDClient *client = req->client; > > int valid_flags; > > +int ret; > > > > g_assert(qemu_in_coroutine()); > >
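Editorial sketch: the return convention and the interruption point of the read loop quoted above (1 on success, 0 on a clean EOF before any data, negative errno otherwise, -EAGAIN when asked to quiesce while the source would block) can be modelled without coroutines. All names below are hypothetical: `SKETCH_WOULD_BLOCK` stands in for the channel's "would block" result, `mem_read` is an in-memory data source for demonstration, and the -EIO result for an EOF in the middle of a message is an assumption, since that part of the patch is not shown in the quoted text.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define SKETCH_WOULD_BLOCK (-2) /* hypothetical "no data yet" marker */

typedef ssize_t (*sketch_read_fn)(void *opaque, void *buf, size_t len);

struct sketch_reader {
    sketch_read_fn read;
    void *opaque;
    bool quiescing; /* set by whoever wants the reader to stop */
};

/* In-memory source that hands out at most chunk bytes per call. */
struct mem_source {
    const char *data;
    size_t left;
    size_t chunk;
    int would_block; /* number of leading "would block" results */
};

static ssize_t mem_read(void *opaque, void *buf, size_t len)
{
    struct mem_source *s = opaque;
    size_t n = len;

    if (s->would_block > 0) {
        s->would_block--;
        return SKETCH_WOULD_BLOCK;
    }
    if (n > s->chunk) {
        n = s->chunk;
    }
    if (n > s->left) {
        n = s->left;
    }
    memcpy(buf, s->data, n);
    s->data += n;
    s->left -= n;
    return (ssize_t)n;
}

/*
 * Read exactly size bytes.
 * Returns 1 on success, 0 on EOF before any byte was read,
 * -EAGAIN if interrupted for quiescing, -EIO on other failures.
 */
static int sketch_read_eof(struct sketch_reader *r, void *buffer, size_t size)
{
    bool partial = false;
    char *p = buffer;

    while (size > 0) {
        ssize_t len = r->read(r->opaque, p, size);

        if (len == SKETCH_WOULD_BLOCK) {
            /* Real code would yield here and be re-entered later. */
            if (r->quiescing) {
                return -EAGAIN;
            }
            continue;
        }
        if (len < 0) {
            return -EIO;
        }
        if (len == 0) {
            /* EOF is only clean if nothing was consumed yet. */
            return partial ? -EIO : 0;
        }
        partial = true;
        p += len;
        size -= (size_t)len;
    }
    return 1;
}
```

The point of the distinct -EAGAIN value is that the caller can tell "stop and retry after the context switch" apart from a genuine I/O failure, which matches the updated comment on nbd_co_receive_request quoted below.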
Re: [PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start
On 09.12.2020 at 17:51, Sergio Lopez wrote:
> On Mon, Dec 07, 2020 at 04:37:53PM +0100, Kevin Wolf wrote:
> > On 04.12.2020 at 17:53, Sergio Lopez wrote:
> > > On dataplane start, acquire the new AIO context before calling
> > > 'blk_set_aio_context', releasing it immediately afterwards. This
> > > prevents reaching the AIO context attach/detach notifier functions
> > > without having acquired it first.
> > >
> > > It was also the only place where 'blk_set_aio_context' was called with
> > > an unprotected AIO context.
> > >
> > > Signed-off-by: Sergio Lopez
> > > ---
> > >  hw/block/dataplane/virtio-blk.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > > index 37499c5564..034e43cb1f 100644
> > > --- a/hw/block/dataplane/virtio-blk.c
> > > +++ b/hw/block/dataplane/virtio-blk.c
> > > @@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > >      vblk->dataplane_started = true;
> > >      trace_virtio_blk_data_plane_start(s);
> > >
> > > +    aio_context_acquire(s->ctx);
> > >      r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
> > > +    aio_context_release(s->ctx);
> >
> > bdrv_set_aio_context_ignore() is documented like this:
> >
> >  * The caller must own the AioContext lock for the old AioContext of bs, but it
> >  * must not own the AioContext lock for new_context (unless new_context is the
> >  * same as the current context of bs).
>
> Does that rule apply to blk_set_aio_context too?

bdrv_set_aio_context_ignore() is what blk_set_aio_context() calls, so I
would say yes.

> All use cases I can find in the code are acquiring the new context:
> [...]

Hm... That's unfortunate.

I think the reason why you shouldn't hold it is that the
bdrv_drained_begin() call in bdrv_set_aio_context_ignore() could
deadlock if you hold the lock of a context that is not the current
context of the BlockDriverState. Maybe there are more reasons, I'm not
sure.

Kevin
Re: [PATCH v4 3/4] crypto: luks: use bdrv_co_delete_file_noerr
On Wed 09 Dec 2020 05:44:40 PM CET, Maxim Levitsky wrote: > This refactoring is now possible thanks to this function. > > Signed-off-by: Maxim Levitsky Reviewed-by: Alberto Garcia Berto
Re: [PATCH v4 2/4] block: add bdrv_co_delete_file_noerr
On Wed 09 Dec 2020 05:44:39 PM CET, Maxim Levitsky wrote:
> +void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
> +{
> +    Error *local_err = NULL;
> +
> +    if (!bs) {
> +        return;
> +    }
> +
> +    int ret = bdrv_co_delete_file(bs, &local_err);
       ^^^

According to the QEMU coding style we should not have declarations in
the middle of a block.

The patch looks otherwise fine.

Reviewed-by: Alberto Garcia

Berto
Re: [PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start
On Mon, Dec 07, 2020 at 04:37:53PM +0100, Kevin Wolf wrote:
> On 04.12.2020 at 17:53, Sergio Lopez wrote:
> > On dataplane start, acquire the new AIO context before calling
> > 'blk_set_aio_context', releasing it immediately afterwards. This
> > prevents reaching the AIO context attach/detach notifier functions
> > without having acquired it first.
> >
> > It was also the only place where 'blk_set_aio_context' was called with
> > an unprotected AIO context.
> >
> > Signed-off-by: Sergio Lopez
> > ---
> >  hw/block/dataplane/virtio-blk.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index 37499c5564..034e43cb1f 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >      vblk->dataplane_started = true;
> >      trace_virtio_blk_data_plane_start(s);
> >
> > +    aio_context_acquire(s->ctx);
> >      r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
> > +    aio_context_release(s->ctx);
>
> bdrv_set_aio_context_ignore() is documented like this:
>
>  * The caller must own the AioContext lock for the old AioContext of bs, but it
>  * must not own the AioContext lock for new_context (unless new_context is the
>  * same as the current context of bs).

Does that rule apply to blk_set_aio_context too? All use cases I can
find in the code are acquiring the new context:

hw/block/dataplane/xen-block.c:

void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
                               const unsigned int ring_ref[],
                               unsigned int nr_ring_ref,
                               unsigned int event_channel,
                               unsigned int protocol,
                               Error **errp)
{
    ...
    aio_context_acquire(dataplane->ctx);
    /* If other users keep the BlockBackend in the iothread, that's ok */
    blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
    /* Only reason for failure is a NULL channel */
    xen_device_set_event_channel_context(xendev, dataplane->event_channel,
                                         dataplane->ctx, &error_abort);
    aio_context_release(dataplane->ctx);

hw/scsi/virtio-scsi.c:

static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                Error **errp)
{
    ...
    virtio_scsi_acquire(s);
    ret = blk_set_aio_context(sd->conf.blk, s->ctx, errp);
    virtio_scsi_release(s);

Thanks,
Sergio.
[PATCH v4 1/4] crypto: luks: Fix tiny memory leak
When the underlying block device doesn't support the bdrv_co_delete_file interface, an 'Error' object was leaked. Signed-off-by: Maxim Levitsky Reviewed-by: Alberto Garcia --- block/crypto.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/crypto.c b/block/crypto.c index aef5a5721a..b3a5275132 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -735,6 +735,8 @@ fail: */ if ((r_del < 0) && (r_del != -ENOTSUP)) { error_report_err(local_delete_err); +} else { +error_free(local_delete_err); } } -- 2.26.2
[PATCH v4 2/4] block: add bdrv_co_delete_file_noerr
This function wraps bdrv_co_delete_file for the common case of removing a
file that was just created by a format driver, on an error condition.

It hides the -ENOTSUP error and reports all other errors.

Signed-off-by: Maxim Levitsky
---
 block.c               | 23 +++++++++++++++++++++++
 include/block/block.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/block.c b/block.c
index f1cedac362..57e6d9750a 100644
--- a/block.c
+++ b/block.c
@@ -704,6 +704,29 @@ int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp)
     return ret;
 }
 
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
+{
+    Error *local_err = NULL;
+
+    if (!bs) {
+        return;
+    }
+
+    int ret = bdrv_co_delete_file(bs, &local_err);
+    /*
+     * ENOTSUP will happen if the block driver doesn't support
+     * the 'bdrv_co_delete_file' interface. This is a predictable
+     * scenario and shouldn't be reported back to the user.
+     */
+    if (ret == -ENOTSUP) {
+        error_free(local_err);
+    } else if (ret < 0) {
+        error_report_err(local_err);
+    }
+}
+
+
+
 /**
  * Try to get @bs's logical and physical block size.
  * On success, store them in @bsz struct and return 0.
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..af03022723 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -428,6 +428,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
                               Error **errp);
 void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base);
 int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 typedef struct BdrvCheckResult {
--
2.26.2
[PATCH v4 4/4] block: qcow2: remove the created file on initialization error
If the qcow2 initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky
---
 block/qcow2.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3a90ef2786..b5169b7cad 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3847,12 +3847,13 @@ static int coroutine_fn qcow2_co_create_opts(BlockDriver *drv,
 
     /* Create the qcow2 image (format layer) */
     ret = qcow2_co_create(create_options, errp);
+
+finish:
     if (ret < 0) {
-        goto finish;
+        bdrv_co_delete_file_noerr(bs);
+        bdrv_co_delete_file_noerr(data_bs);
     }
 
-    ret = 0;
-finish:
     qobject_unref(qdict);
     bdrv_unref(bs);
     bdrv_unref(data_bs);
--
2.26.2
[PATCH v4 3/4] crypto: luks: use bdrv_co_delete_file_noerr
This refactoring is now possible thanks to bdrv_co_delete_file_noerr.

Signed-off-by: Maxim Levitsky
---
 block/crypto.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index b3a5275132..1d30fde38e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -725,19 +725,8 @@ fail:
      * If an error occurred, delete 'filename'. Even if the file existed
      * beforehand, it has been truncated and corrupted in the process.
      */
-    if (ret && bs) {
-        Error *local_delete_err = NULL;
-        int r_del = bdrv_co_delete_file(bs, &local_delete_err);
-        /*
-         * ENOTSUP will happen if the block driver doesn't support
-         * the 'bdrv_co_delete_file' interface. This is a predictable
-         * scenario and shouldn't be reported back to the user.
-         */
-        if ((r_del < 0) && (r_del != -ENOTSUP)) {
-            error_report_err(local_delete_err);
-        } else {
-            error_free(local_delete_err);
-        }
+    if (ret) {
+        bdrv_co_delete_file_noerr(bs);
     }
 
     bdrv_unref(bs);
--
2.26.2
[PATCH v4 0/4] qcow2: don't leave partially initialized file on image creation
Use the bdrv_co_delete_file interface to delete the underlying file if qcow2
initialization fails (e.g. due to a bad encryption secret).

This makes the qcow2 driver behave the same way as the luks driver.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1845353

V3: addressed review feedback and reworked commit messages

V4: got rid of code duplication by adding bdrv_co_delete_file_noerr
and made the qcow2 driver use this function to delete
both the main and the data file.

Best regards,
    Maxim Levitsky

Maxim Levitsky (4):
  crypto: luks: Fix tiny memory leak
  block: add bdrv_co_delete_file_noerr
  crypto: luks: use bdrv_co_delete_file_noerr
  block: qcow2: remove the created file on initialization error

 block.c               | 23 +++++++++++++++++++++++
 block/crypto.c        | 13 ++-----------
 block/qcow2.c         |  7 ++++---
 include/block/block.h |  1 +
 4 files changed, 30 insertions(+), 14 deletions(-)

--
2.26.2
Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
Patchew URL: https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201209135355.561745-1-mlevi...@redhat.com
Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20201209135355.561745-1-mlevi...@redhat.com -> patchew/20201209135355.561745-1-mlevi...@redhat.com
Switched to a new branch 'test'
77c9000 block/scsi: correctly emulate the VPD block limits page
61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough
35c66d6 block: add max_ioctl_transfer to BlockLimits
08ba263 file-posix: add sg_get_max_segments that actually works with sg
e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits

=== OUTPUT BEGIN ===
1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from raw_refresh_limits)
2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that actually works with sg)
3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to BlockLimits)
4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for SCSI passthrough)
5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block limits page)
ERROR: braces {} are necessary for all arms of this statement
#39: FILE: hw/scsi/scsi-generic.c:204:
+            if (len < r->buflen)
[...]

total: 1 errors, 0 warnings, 28 lines checked

Patch 5/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
On Wed, 2020-12-09 at 06:03 -0800, no-re...@patchew.org wrote:
> Patchew URL:
> https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/
>
>
>
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Type: series
> Message-id: 20201209135355.561745-1-mlevi...@redhat.com
> Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag]         patchew/20201209135355.561745-1-mlevi...@redhat.com ->
> patchew/20201209135355.561745-1-mlevi...@redhat.com
> Switched to a new branch 'test'
> 77c9000 block/scsi: correctly emulate the VPD block limits page
> 61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough
> 35c66d6 block: add max_ioctl_transfer to BlockLimits
> 08ba263 file-posix: add sg_get_max_segments that actually works with sg
> e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits
>
> === OUTPUT BEGIN ===
> 1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from
> raw_refresh_limits)
> 2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that
> actually works with sg)
> 3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to
> BlockLimits)
> 4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for
> SCSI passthrough)
> 5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block
> limits page)
> ERROR: braces {} are necessary for all arms of this statement
> #39: FILE: hw/scsi/scsi-generic.c:204:
> +            if (len < r->buflen)

+1 Good bot :-)

Best regards,
	Maxim Levitsky

> [...]
>
> total: 1 errors, 0 warnings, 28 lines checked
>
> Patch 5/5 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> === OUTPUT END ===
>
> Test command exited with code: 1
>
>
> The full log is available at
> http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-de...@redhat.com
Re: qemu 6.0 rbd driver rewrite
On Wed, Dec 9, 2020 at 7:19 AM Peter Lieven wrote:
>
> On 01.12.20 at 13:40, Peter Lieven wrote:
> > Hi,
> >
> >
> > I would like to submit a series for 6.0 which will convert the aio hooks to
> > native coroutine hooks and add write zeroes support.
> >
> > The aio routines are nowadays just an emulation on top of coroutines which
> > add additional overhead.
> >
> > For this I would like to lift the minimum librbd requirement to luminous
> > release to get rid of the ifdef'ry in the code.
> >
> >
> > Any objections?

None from me (speaking in my role under Ceph) -- even Luminous is EoL
for us upstream. Hopefully no one would attempt to install QEMU 6 but
expect to keep librbd frozen at a >3 year old Kraken or earlier
release.

> > Best,
> >
> > Peter
> >
>
> Kindly pinging as the 6.0 dev tree is now open. Also cc'ing qemu-devel which
> I accidentally forgot.
>
>
> Peter
>

--
Jason
[PATCH v2 5/5] block/scsi: correctly emulate the VPD block limits page
When the device doesn't support the VPD block limits page, we emulate it even
for SCSI passthrough.

As a part of the emulation we need to add it to the 'Supported VPD Pages'

The code that does this adds it to the page, but it doesn't increase the length
of the data to be copied to the guest, thus the guest never sees the VPD block
limits page as supported.

Bump the transfer size by 1 in this case.

Signed-off-by: Maxim Levitsky
---
 hw/scsi/scsi-generic.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 6df67bf889..4354469841 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -134,7 +134,7 @@ static int execute_command(BlockBackend *blk,
     return 0;
 }
 
-static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
+static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
 {
     uint8_t page, page_idx;
 
@@ -200,8 +200,12 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
                 r->buf[page_idx] = 0xb0;
             }
             stw_be_p(r->buf + 2, lduw_be_p(r->buf + 2) + 1);
+
+            if (len < r->buflen)
+                len++;
         }
     }
+    return len;
 }
 
 static int scsi_generic_emulate_block_limits(SCSIGenericReq *r, SCSIDevice *s)
@@ -316,7 +320,7 @@ static void scsi_read_complete(void * opaque, int ret)
         }
     }
     if (r->req.cmd.buf[0] == INQUIRY) {
-        scsi_handle_inquiry_reply(r, s);
+        len = scsi_handle_inquiry_reply(r, s, len);
     }
 
 req_complete:
-- 
2.26.2
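The bug described here comes down to two lengths that must grow together: the page length stored inside the reply and the number of bytes handed to the guest. A minimal standalone sketch of that bookkeeping follows. It assumes a "Supported VPD Pages" reply with a 4-byte header, a big-endian page length in bytes 2..3, and page codes in ascending order; `add_supported_vpd_page` is a hypothetical helper, not the function touched by the patch, and it handles only a reply that holds exactly the page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Insert `page` into a "Supported VPD Pages" reply held in buf[0..len).
 * Returns the new number of valid bytes (len + 1), or len unchanged if
 * the page is already listed, the reply is not exactly header + list,
 * or the buflen-byte buffer has no spare byte.
 */
int add_supported_vpd_page(uint8_t *buf, int buflen, int len, uint8_t page)
{
    int page_len, end, i;

    assert(buf && len >= 4 && len <= buflen);

    page_len = (buf[2] << 8) | buf[3];
    end = 4 + page_len;
    if (end != len || len >= buflen) {
        return len;
    }

    /* Find the first listed code that is >= page. */
    for (i = 4; i < end && buf[i] < page; i++) {
    }
    if (i < end && buf[i] == page) {
        return len;             /* already advertised */
    }

    /* Shift the tail up by one and drop the new code into the gap. */
    memmove(&buf[i + 1], &buf[i], (size_t)(end - i));
    buf[i] = page;

    /* Grow the page length field AND the length returned to the caller. */
    page_len++;
    buf[2] = (uint8_t)(page_len >> 8);
    buf[3] = (uint8_t)(page_len & 0xff);
    return len + 1;
}
```

If the caller kept using the old `len` after such an insertion, the last page code would be cut off from what the guest receives, which appears to be the symptom the commit message describes.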
[PATCH v2 4/5] block: use blk_get_max_ioctl_transfer for SCSI passthrough
Switch file-posix to expose only the max_ioctl_transfer limit.

Let the iscsi driver work as it did before since it is bound by the transfer
limit in both regular read/write and in SCSI passthrough case.

Switch the scsi-disk and scsi-block drivers to read the SG max transfer limits
using the new blk_get_max_ioctl_transfer interface.

Fixes: 867eccfed8 ("file-posix: Use max transfer length/segment count only for SCSI passthrough")
Signed-off-by: Maxim Levitsky
---
 block/file-posix.c     | 4 ++--
 block/iscsi.c          | 1 +
 hw/scsi/scsi-generic.c | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 10ebc4c5b7..0a94211847 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1284,12 +1284,12 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
                            get_max_transfer_length(s->fd);
 
     if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-        bs->bl.max_transfer = pow2floor(ret);
+        bs->bl.max_ioctl_transfer = pow2floor(ret);
     }
 
     ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
     if (ret > 0) {
-        bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+        bs->bl.max_ioctl_transfer = MIN_NON_ZERO(bs->bl.max_ioctl_transfer,
                                            ret * qemu_real_host_page_size);
     }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index e30a7e3606..3685da2971 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2065,6 +2065,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
 
     if (max_xfer_len * block_size < INT_MAX) {
         bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
+        bs->bl.max_ioctl_transfer = bs->bl.max_transfer;
     }
 
     if (iscsilun->lbp.lbpu) {
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 2cb23ca891..6df67bf889 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -167,7 +167,7 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
             uint32_t max_transfer =
-                blk_get_max_transfer(s->conf.blk) / s->blocksize;
+                blk_get_max_ioctl_transfer(s->conf.blk) / s->blocksize;
 
             assert(max_transfer);
             stl_be_p(&r->buf[8], max_transfer);
@@ -210,7 +210,7 @@ static int scsi_generic_emulate_block_limits(SCSIGenericReq *r, SCSIDevice *s)
     uint8_t buf[64];
 
     SCSIBlockLimits bl = {
-        .max_io_sectors = blk_get_max_transfer(s->conf.blk) / s->blocksize
+        .max_io_sectors = blk_get_max_ioctl_transfer(s->conf.blk) / s->blocksize
     };
 
     memset(r->buf, 0, r->buflen);
-- 
2.26.2
[PATCH v2 2/5] file-posix: add sg_get_max_segments that actually works with sg
From: Tom Yan

sg devices have different major/minor than their corresponding
block devices. Using sysfs to get max segments never really worked
for them.

Fortunately the sg driver provides an ioctl to get sg_tablesize,
which is apparently equivalent to max segments.

Signed-off-by: Tom Yan
Signed-off-by: Maxim Levitsky
---
 block/file-posix.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 226ddbbdad..10ebc4c5b7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1181,6 +1181,26 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
+static int sg_get_max_segments(int fd)
+{
+    /*
+     * /dev/sg* character devices report 'max_segments' via
+     * SG_GET_SG_TABLESIZE ioctl
+     */
+
+#ifdef SG_GET_SG_TABLESIZE
+    long max_segments = 0;
+
+    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
+        return max_segments;
+    } else {
+        return -errno;
+    }
+#else
+    return -ENOSYS;
+#endif
+}
+
 static int get_max_transfer_length(int fd)
 {
 #if defined(BLKSECTGET)
@@ -1267,7 +1287,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer = pow2floor(ret);
     }
 
-    ret = get_max_segments(s->fd);
+    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
     if (ret > 0) {
         bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
                                            ret * qemu_real_host_page_size);
-- 
2.26.2
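For readers who want to try the ioctl outside QEMU, a standalone query might look like the sketch below. `query_sg_tablesize` is a hypothetical name. The sketch passes a pointer to `int`, on the assumption that this is the type the sg driver writes through; the header guard and the `-ENOSYS` fallback are likewise choices of this sketch rather than anything taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#ifdef __linux__
#include <scsi/sg.h>    /* SG_GET_SG_TABLESIZE */
#endif

/*
 * Ask the sg driver for its scatter-gather table size on an already
 * opened /dev/sg* descriptor. Returns the element count on success or
 * a negative errno value on failure.
 */
int query_sg_tablesize(int fd)
{
#ifdef SG_GET_SG_TABLESIZE
    int tablesize = 0;  /* assumed type of the ioctl's output argument */

    if (ioctl(fd, SG_GET_SG_TABLESIZE, &tablesize) == 0) {
        assert(tablesize >= 0);
        return tablesize;
    }
    return -errno;
#else
    (void)fd;
    return -ENOSYS;     /* platform without the sg ioctl definitions */
#endif
}
```

A caller would open the character device first, for example with `open("/dev/sg0", O_RDONLY)`, and treat any non-positive result as "limit unknown".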
[PATCH v2 3/5] block: add max_ioctl_transfer to BlockLimits
Maximum transfer size when accessing a kernel block device is only relevant
when using SCSI passthrough (SG_IO ioctl) since only in this case the requests
are passed directly to underlying hardware with no pre-processing.

Same is true when using /dev/sg* character devices (which only support SG_IO).

Therefore split the block driver's advertised max transfer size into
the regular max transfer size, and the max transfer size for SCSI passthrough
(the new max_ioctl_transfer field).

In the next patch, the qemu block drivers that support SCSI passthrough
will set the max_ioctl_transfer field, and simultaneously, the block devices
that implement scsi passthrough will switch to 'blk_get_max_ioctl_transfer' to
query and to pass it to the guest.

Signed-off-by: Maxim Levitsky
---
 block/block-backend.c          | 12 ++++++++++++
 block/io.c                     |  2 ++
 include/block/block_int.h      |  4 ++++
 include/sysemu/block-backend.h |  1 +
 4 files changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index ce78d30794..c1d149a755 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1938,6 +1938,18 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
     return MIN_NON_ZERO(max, INT_MAX);
 }
 
+/* Returns the maximum transfer length, for SCSI passthrough */
+uint32_t blk_get_max_ioctl_transfer(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    uint32_t max = 0;
+
+    if (bs) {
+        max = bs->bl.max_ioctl_transfer;
+    }
+    return MIN_NON_ZERO(max, INT_MAX);
+}
+
 int blk_get_max_iov(BlockBackend *blk)
 {
     return blk->root->bs->bl.max_iov;
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..3eae176992 100644
--- a/block/io.c
+++ b/block/io.c
@@ -126,6 +126,8 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
     dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
     dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+    dst->max_ioctl_transfer = MIN_NON_ZERO(dst->max_ioctl_transfer,
+                                        src->max_ioctl_transfer);
     dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
                                  src->opt_mem_alignment);
     dst->min_mem_alignment = MAX(dst->min_mem_alignment,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95d9333be1..e9874c8c23 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -678,6 +678,10 @@ typedef struct BlockLimits {
      * clamped down. */
     uint32_t max_transfer;
 
+    /* Maximal transfer length for SCSI passthrough (ioctl interface) */
+    uint32_t max_ioctl_transfer;
+
+
     /* memory alignment, in bytes so that no bounce buffer is needed */
     size_t min_mem_alignment;
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..b019a37b7a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -203,6 +203,7 @@ void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 uint32_t blk_get_request_alignment(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
+uint32_t blk_get_max_ioctl_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
-- 
2.26.2
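The idea of carrying two independent caps and merging each one with "0 means unlimited" semantics can be shown in a few lines. This is a sketch under assumptions: `struct limits`, `min_non_zero`, and `merge_limits` are hypothetical stand-ins that imitate the shape of the hunks above, not QEMU's `BlockLimits` or `bdrv_merge_limits`.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, where 0 means "no limit" rather than "zero". */
uint32_t min_non_zero(uint32_t a, uint32_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Hypothetical, trimmed-down limits record with the two transfer caps. */
struct limits {
    uint32_t max_transfer;        /* regular read/write path, 0 = unlimited */
    uint32_t max_ioctl_transfer;  /* SG_IO passthrough path, 0 = unlimited */
};

/* Fold a child's limits into the parent's: the tighter cap wins per field. */
void merge_limits(struct limits *dst, const struct limits *src)
{
    assert(dst && src);
    dst->max_transfer = min_non_zero(dst->max_transfer, src->max_transfer);
    dst->max_ioctl_transfer = min_non_zero(dst->max_ioctl_transfer,
                                           src->max_ioctl_transfer);
}
```

Because the fields are merged separately, a child that only restricts passthrough (say 512 KiB for SG_IO) leaves the regular read/write path unlimited, which is the property the commit message is after.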
[PATCH v2 1/5] file-posix: split hdev_refresh_limits from raw_refresh_limits
From: Tom Yan

We can and should get max transfer length and max segments for all host
devices / cdroms (on Linux).

Also use MIN_NON_ZERO instead when we clamp max transfer length against
max segments.

Signed-off-by: Tom Yan
Signed-off-by: Maxim Levitsky
---
 block/file-posix.c | 59 +-
 1 file changed, 43 insertions(+), 16 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d5fd1dbcd2..226ddbbdad 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1162,6 +1162,12 @@ static void raw_reopen_abort(BDRVReopenState *state)
 
 static int sg_get_max_transfer_length(int fd)
 {
+    /*
+     * BLKSECTGET for /dev/sg* character devices incorrectly returns
+     * the max transfer size in bytes (rather than in blocks).
+     * Also note that /dev/sg* doesn't support BLKSSZGET ioctl.
+     */
+
 #ifdef BLKSECTGET
     int max_bytes = 0;
 
@@ -1175,7 +1181,22 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int get_max_transfer_length(int fd)
+{
+#if defined(BLKSECTGET)
+    int sect = 0;
+
+    if (ioctl(fd, BLKSECTGET, &sect) == 0) {
+        return sect << 9;
+    } else {
+        return -errno;
+    }
+#else
+    return -ENOSYS;
+#endif
+}
+
+static int get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
     char buf[32];
@@ -1230,23 +1251,29 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
 
-    if (bs->sg) {
-        int ret = sg_get_max_transfer_length(s->fd);
+    raw_probe_alignment(bs, s->fd, errp);
+    bs->bl.min_mem_alignment = s->buf_align;
+    bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+}
 
-        if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-            bs->bl.max_transfer = pow2floor(ret);
-        }
+static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
 
-        ret = sg_get_max_segments(s->fd);
-        if (ret > 0) {
-            bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-                                      ret * qemu_real_host_page_size);
-        }
+    int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
+                       get_max_transfer_length(s->fd);
+
+    if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+        bs->bl.max_transfer = pow2floor(ret);
     }
 
-    raw_probe_alignment(bs, s->fd, errp);
-    bs->bl.min_mem_alignment = s->buf_align;
-    bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+    ret = get_max_segments(s->fd);
+    if (ret > 0) {
+        bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+                                           ret * qemu_real_host_page_size);
+    }
+
+    raw_refresh_limits(bs, errp);
 }
 
 static int check_for_dasd(int fd)
@@ -3601,7 +3628,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_pdiscard = hdev_co_pdiscard,
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to = raw_co_copy_range_to,
-    .bdrv_refresh_limits = raw_refresh_limits,
+    .bdrv_refresh_limits = hdev_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
@@ -3725,7 +3752,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_preadv = raw_co_preadv,
     .bdrv_co_pwritev = raw_co_pwritev,
     .bdrv_co_flush_to_disk = raw_co_flush_to_disk,
-    .bdrv_refresh_limits = raw_refresh_limits,
+    .bdrv_refresh_limits = hdev_refresh_limits,
     .bdrv_io_plug = raw_aio_plug,
     .bdrv_io_unplug = raw_aio_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-- 
2.26.2
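Why MIN_NON_ZERO rather than MIN matters can be seen with a small calculation. When the byte limit query fails, the cap stays 0 ("unlimited"), and a plain minimum against the segment-derived value would keep that 0 and silently drop the segment limit. The sketch below reproduces the arithmetic with hypothetical helpers (`pow2_floor`, `transfer_cap`); it clamps the product to 32 bits, which is a choice of this sketch and not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= v, or 0 when v is 0. */
uint32_t pow2_floor(uint32_t v)
{
    uint32_t p = 1;

    if (v == 0) {
        return 0;
    }
    while (p <= v / 2) {
        p <<= 1;
    }
    return p;
}

/*
 * Derive a byte cap from the two host-reported figures. A value <= 0
 * for max_bytes or max_segments means "query failed / not reported".
 * The result 0 means "unlimited".
 */
uint32_t transfer_cap(int max_bytes, int max_segments, uint32_t page_size)
{
    uint32_t cap = 0;

    assert(page_size > 0);

    if (max_bytes > 0) {
        cap = pow2_floor((uint32_t)max_bytes);
    }
    if (max_segments > 0) {
        uint64_t by_segments = (uint64_t)max_segments * page_size;

        if (by_segments > UINT32_MAX) {
            by_segments = UINT32_MAX;
        }
        /* 0 must not win the comparison, hence not a plain minimum. */
        if (cap == 0 || by_segments < cap) {
            cap = (uint32_t)by_segments;
        }
    }
    return cap;
}
```

For example, with no byte limit reported, 128 segments, and 4096-byte pages, the cap becomes 128 * 4096 = 524288 bytes instead of staying unlimited.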
[PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
This patch series attempts to provide a solution to the problem of the transfer
limits of the raw file driver (host_device/file-posix), some of which I
already tried to fix in the past.

I included 2 patches from Tom Yan which fix two issues with reading the limits
correctly from the /dev/sg* character devices in the first place.

The only change to these patches is that I slightly tweaked the comments in the
source to better document the /dev/sg quirks.

The other two patches in this series split the max transfer limits that qemu
block devices expose in two: one limit is for regular IO, and another is for
SG_IO (aka bdrv_*_ioctl), and the two device drivers (scsi-block and
scsi-generic) that use the latter are switched to the new interface.

This should ensure that the raw driver can still advertise the unlimited
transfer length, unless it is used for SG_IO, because that yields the highest
performance.

Also I include a somewhat unrelated fix to a bug I found in qemu's
SCSI passthrough while testing this:
When qemu emulates the VPD block limits page for a SCSI device that doesn't
implement it, it doesn't really advertise the emulated page to the guest.

I tested this by doing both regular and SG_IO passthrough of my
USB SD card reader.

That device turned out to be a perfect device for the task, since it has a max
transfer size of 1024 blocks (512K), and it enforces it.

Also it didn't implement the VPD block limits page
(the transfer size limit probably comes from something USB related), which
triggered the unrelated bug.

I was able to see IO errors without the patches, and the wrong max transfer
size in the guest, and with the patches both issues were gone.

I also found an unrelated issue in /dev/sg passthrough in the kernel.
It turns out that the in-kernel driver has a limitation of 16 requests in
flight, regardless of what the underlying device supports.

With a large multi-threaded fio job and a debug print in qemu, it is easy to
see, although the errors don't do much harm to the guest as it retries the
IO and eventually succeeds.
It is an open question whether this should be solved.

V2: fixed an issue in a patch from Tom Yan (thanks), and removed
refactoring from the last patch according to Paolo's request.

Maxim Levitsky (3):
  block: add max_ioctl_transfer to BlockLimits
  block: use blk_get_max_ioctl_transfer for SCSI passthrough
  block/scsi: correctly emulate the VPD block limits page

Tom Yan (2):
  file-posix: split hdev_refresh_limits from raw_refresh_limits
  file-posix: add sg_get_max_segments that actually works with sg

 block/block-backend.c          | 12 ++
 block/file-posix.c             | 77 +++---
 block/io.c                     |  2 +
 block/iscsi.c                  |  1 +
 hw/scsi/scsi-generic.c         | 12 --
 include/block/block_int.h      |  4 ++
 include/sysemu/block-backend.h |  1 +
 7 files changed, 90 insertions(+), 19 deletions(-)

-- 
2.26.2
Re: [PATCH RFC] qemu co-mutex crash / question
09.12.2020 15:32, Vladimir Sementsov-Ogievskiy wrote:

test-aio-multithread: ../util/qemu-coroutine-lock.c:197: qemu_co_mutex_wake: Assertion `mutex == co->wait_on_mutex' failed.

Thread 18 "test-aio-multit" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe5ffb700 (LWP 24549)]
0x77063625 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x77063625 in raise () from /lib64/libc.so.6
#1  0x7704c8d9 in abort () from /lib64/libc.so.6
#2  0x7704c7a9 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7705ba66 in __assert_fail () from /lib64/libc.so.6
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
#5  0x5568c5a0 in qemu_co_mutex_unlock (mutex=0x55771360) at ../util/qemu-coroutine-lock.c:307
#6  0x5557acfd in test_multi_co_mutex_entry (opaque=0x0) at ../tests/test-aio-multithread.c:208
#7  0x556bb5d7 in coroutine_trampoline (i0=1434467712, i1=21845) at ../util/coroutine-ucontext.c:173
#8  0x77078d30 in ?? () from /lib64/libc.so.6
#9  0x7fffd850 in ?? ()
#10 0x in ?? ()
(gdb) fr 4
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
197         assert(mutex == co->wait_on_mutex);
(gdb) p mutex
$1 = (CoMutex *) 0x55771360
(gdb) p co->wait_on_mutex
$2 = (CoMutex *) 0x55771360
(gdb) p mutex == co->wait_on_mutex
$3 = 1

So, it failed, but in gdb the condition is true.. How can that be?

Interesting: I tried running the test on one CPU:

for i in {1..100}; do taskset -c 0 ./build/tests/test-aio-multithread -p /aio/multi/mutex/handoff; done

With taskset it takes many more tries to reproduce, but I finally have a correct coredump with a correct assertion failure:

(gdb) bt
#0  0x7ff7fa22d625 in raise () from /lib64/libc.so.6
#1  0x7ff7fa2168d9 in abort () from /lib64/libc.so.6
#2  0x7ff7fa2167a9 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7ff7fa225a66 in __assert_fail () from /lib64/libc.so.6
#4  0x564c7ca99153 in qemu_co_mutex_wake (mutex=0x564c7cb7e360, co=0x564c7d3f5c40) at ../util/qemu-coroutine-lock.c:197
#5  0x564c7ca995a0 in qemu_co_mutex_unlock (mutex=0x564c7cb7e360) at ../util/qemu-coroutine-lock.c:307
#6  0x564c7c987cfd in test_multi_co_mutex_entry (opaque=0x0) at ../tests/test-aio-multithread.c:208
#7  0x564c7cac85d7 in coroutine_trampoline (i0=2101304064, i1=22092) at ../util/coroutine-ucontext.c:173
#8  0x7ff7fa242d30 in ?? () from /lib64/libc.so.6
#9  0x7ffd3b3c6ac0 in ?? ()
#10 0x in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ff7ed19c000
(gdb) fr 4
#4  0x564c7ca99153 in qemu_co_mutex_wake (mutex=0x564c7cb7e360, co=0x564c7d3f5c40) at ../util/qemu-coroutine-lock.c:197
197         assert(mutex == co->wait_on_mutex);
(gdb) p mutex
$1 = (CoMutex *) 0x564c7cb7e360
(gdb) p co->wait_on_mutex
$2 = (CoMutex *) 0x0

other interesting threads:

Thread 7 (Thread 0x7ff7ef19f700 (LWP 261134)):
#0  0x564c7ca98f99 in push_waiter (mutex=0x564c7cb7e360, w=0x7ff7ed09aea0) at ../util/qemu-coroutine-lock.c:151
#1  0x564c7ca991c4 in qemu_co_mutex_lock_slowpath (ctx=0x7ff7e4000b60, mutex=0x564c7cb7e360) at ../util/qemu-coroutine-lock.c:211
#2  0x564c7ca993f5 in qemu_co_mutex_lock (mutex=0x564c7cb7e360) at ../util/qemu-coroutine-lock.c:277
#3  0x564c7c987ce2 in test_multi_co_mutex_entry (opaque=0x0) at ../tests/test-aio-multithread.c:206
#4  0x564c7cac85d7 in coroutine_trampoline (i0=2101304384, i1=22092) at ../util/coroutine-ucontext.c:173
#5  0x7ff7fa242d30 in ?? () from /lib64/libc.so.6
#6  0x7ffd3b3c6ac0 in ?? ()
#7  0x in ?? ()

#0  0x7ff7fa3cdf55 in nanosleep () from /lib64/libpthread.so.0
#1  0x7ff7fb0d27b7 in g_usleep () from /lib64/libglib-2.0.so.0
#2  0x564c7c987e05 in test_multi_co_mutex (threads=2, seconds=3) at ../tests/test-aio-multithread.c:237
#3  0x564c7c987eff in test_multi_co_mutex_2_3 () at ../tests/test-aio-multithread.c:270
#4  0x7ff7fb0cface in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#5  0x7ff7fb0cf874 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#6  0x7ff7fb0cf874 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#7  0x7ff7fb0cf874 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#8  0x7ff7fb0cff7b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#9  0x7ff7fb0cffd5 in g_test_run () from /lib64/libglib-2.0.so.0
#10 0x564c7c98874e in main (argc=1, argv=0x7ffd3b3c7868) at ../tests/test-aio-multithread.c:459

-- 
Best regards,
Vladimir
[PATCH RFC] qemu co-mutex crash / question
Hi all!

I have a coredump of our qemu branch, based on rhev-2.12.0-44.el7_8.2, which in turn is based on v2.12.0, and I don't have any way to reproduce it.

The backtrace:

#0  aio_co_schedule (ctx=0x0, co=0x55dd539fa340) at util/async.c:455
#1  0x55dd51149716 in aio_co_enter (ctx=<optimized out>, co=<optimized out>) at util/async.c:476
#2  0x55dd511497bc in aio_co_wake (co=<optimized out>) at util/async.c:470
#3  0x55dd5115ea43 in qemu_co_mutex_wake (mutex=0x55dd539c36b0, co=<optimized out>) at util/qemu-coroutine-lock.c:197
#4  qemu_co_mutex_unlock (mutex=mutex@entry=0x55dd539c36b0) at util/qemu-coroutine-lock.c:300
#5  0x55dd5109f4e0 in qcow2_co_pwritev_cluster_compressed (qiov=0x7fcbbc972a70, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at block/qcow2.c:4360
#6  qcow2_co_pwritev_compressed (bs=0x55dd539fe000, offset=17582325760, bytes=65536, qiov=0x7fcbbc972de0) at block/qcow2.c:4425
#7  0x55dd510d5cd2 in bdrv_driver_pwritev_compressed (qiov=0x7fcbbc972de0, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at block/io.c:1227
#8  bdrv_aligned_pwritev (req=req@entry=0x7fcbbc972c60, offset=offset@entry=17582325760, bytes=bytes@entry=65536, align=align@entry=1, qiov=qiov@entry=0x7fcbbc972de0, flags=flags@entry=32, child=0x55dd539cea80, child=0x55dd539cea80) at block/io.c:1850
#9  0x55dd510d6369 in bdrv_co_pwritev (child=0x55dd539cea80, offset=offset@entry=17582325760, bytes=bytes@entry=65536, qiov=qiov@entry=0x7fcbbc972de0, flags=BDRV_REQ_WRITE_COMPRESSED) at block/io.c:2144
#10 0x55dd510c3644 in blk_co_pwritev (blk=0x55dd539fc300, offset=17582325760, bytes=65536, qiov=0x7fcbbc972de0, flags=<optimized out>) at block/block-backend.c:1237
#11 0x55dd510c372c in blk_write_entry (opaque=0x7fcbbc972e00) at block/block-backend.c:1264
#12 0x55dd510c1e18 in blk_prw (blk=0x55dd539fc300, offset=17582325760, buf=buf@entry=0x55dd54a38000 "", bytes=bytes@entry=65536, co_entry=co_entry@entry=0x55dd510c3710, flags=BDRV_REQ_WRITE_COMPRESSED) at block/block-backend.c:1292
#13 0x55dd510c2f45 in blk_pwrite (blk=<optimized out>, offset=<optimized out>, buf=buf@entry=0x55dd54a38000, count=count@entry=65536, flags=<optimized out>) at block/block-backend.c:1486
#14 0x55dd510ef949 in nbd_handle_request (errp=0x7fcbbc972ef8, data=0x55dd54a38000 "", request=<optimized out>, client=0x55dd539ee420) at nbd/server.c:2264
#15 nbd_trip (opaque=0x55dd539ee420) at nbd/server.c:2393
#16 0x55dd5115f72a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
#17 0x7fcbc5422190 in ?? () from /work/crash-bugs/PSBM-123528/ccpp-2020-12-08-00_59_06-418945/root/lib64/libc.so.6
#18 0x7fcbbca736d0 in ?? ()
#19 0x in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fcbbc973000
(gdb) p *co
$1 = {entry = 0x0, entry_arg = 0x0, caller = 0x0, pool_next = {sle_next = 0x0}, locks_held = 0, ctx = 0x0, scheduled = 0x55dd51195660 <__func__.23793> "aio_co_schedule", co_queue_next = {sqe_next = 0x0}, co_queue_wakeup = {sqh_first = 0x0, sqh_last = 0x55dd539f5680}, co_scheduled_next = {sle_next = 0x0}}

So, it looks like we want to wake up a coroutine on co-mutex unlock, but the coroutine has already exited.

I had no idea how to debug it, and decided to add some assertions to make sure that a coroutine waiting on the mutex is entered through qemu_co_mutex_unlock; see the patch below.

Still, when I ran make check with this patch applied, I hit a crash in ./tests/test-aio-multithread, where my assertion fails:

test-aio-multithread: ../util/qemu-coroutine-lock.c:197: qemu_co_mutex_wake: Assertion `mutex == co->wait_on_mutex' failed.

Thread 18 "test-aio-multit" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe5ffb700 (LWP 24549)]
0x77063625 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x77063625 in raise () from /lib64/libc.so.6
#1  0x7704c8d9 in abort () from /lib64/libc.so.6
#2  0x7704c7a9 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7705ba66 in __assert_fail () from /lib64/libc.so.6
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
#5  0x5568c5a0 in qemu_co_mutex_unlock (mutex=0x55771360) at ../util/qemu-coroutine-lock.c:307
#6  0x5557acfd in test_multi_co_mutex_entry (opaque=0x0) at ../tests/test-aio-multithread.c:208
#7  0x556bb5d7 in coroutine_trampoline (i0=1434467712, i1=21845) at ../util/coroutine-ucontext.c:173
#8  0x77078d30 in ?? () from /lib64/libc.so.6
#9  0x7fffd850 in ?? ()
#10 0x in ?? ()
(gdb) fr 4
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
197         assert(mutex == co->wait_on_mutex);
(gdb) p mutex
$1 = (CoMutex *) 0x55771360
(gdb) p co->wait_on_mutex
$2 =
Re: qemu 6.0 rbd driver rewrite
On 01.12.20 at 13:40, Peter Lieven wrote:
> Hi,
>
> I would like to submit a series for 6.0 which will convert the aio hooks to
> native coroutine hooks and add write zeroes support.
>
> The aio routines are nowadays just an emulation on top of coroutines, which
> adds additional overhead.
>
> For this I would like to lift the minimum librbd requirement to the luminous
> release to get rid of the ifdef'ry in the code.
>
> Any objections?
>
> Best,
>
> Peter
>

Kindly pinging as the 6.0 dev tree is now open. Also cc'ing qemu-devel, which I accidentally forgot.

Peter
[PATCH] block/nfs: fix int overflow in nfs_client_open_qdict
nfs_client_open returns the file size in sectors. This effectively
makes it impossible to open files larger than 1TB.

Fixes: a1a42af422d46812f1f0cebe6b230c20409a3731
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven
---
 block/nfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index 77905f516d..8c1968bb41 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -592,7 +592,7 @@ static int64_t nfs_client_open_qdict(NFSClient *client, QDict *options,
                                      int flags, int open_flags, Error **errp)
 {
     BlockdevOptionsNfs *opts;
-    int ret;
+    int64_t ret;
 
     opts = nfs_options_qdict_to_qapi(options, errp);
     if (opts == NULL) {
-- 
2.17.1
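For readers who want to see where the 1TB figure in this commit message comes from, here is a minimal sketch. It assumes 512-byte sectors (QEMU's BDRV_SECTOR_SIZE) and a 32-bit int; the helper names are made up for illustration and are not part of block/nfs.c.

```c
#include <limits.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: same value as QEMU's BDRV_SECTOR_SIZE */

/* Number of 512-byte sectors needed to hold size_bytes, rounded up. */
static int64_t size_to_sectors(int64_t size_bytes)
{
    return (size_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Returns 1 if the sector count can be stored in a plain int without loss. */
static int sectors_fit_in_int(int64_t size_bytes)
{
    return size_to_sectors(size_bytes) <= INT_MAX;
}
```

With a 32-bit int the largest representable sector count is 2^31 - 1, so a file of 1 TiB (2^31 sectors) no longer fits. Widening the local variable to int64_t, as the patch does, removes that limit.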
Re: [PATCH] hw/block/nvme: fix bad clearing of CAP
On Dec 8 10:16, Klaus Jensen wrote:
> From: Klaus Jensen
>
> Commit 37712e00b1f0 ("hw/block/nvme: factor out pmr setup") changed the
> control flow such that the CAP register is erroneously cleared after
> nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure
> is zero-filled initially, there is no need for the explicit clearing, so
> just remove it.
>
> Fixes: 37712e00b1f0 ("hw/block/nvme: factor out pmr setup")
> Signed-off-by: Klaus Jensen
> ---
> hw/block/nvme.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 8814201364c1..28416b18a5c0 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -3040,7 +3040,6 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
>      id->psd[0].enlat = cpu_to_le32(0x10);
>      id->psd[0].exlat = cpu_to_le32(0x4);
>
> -    n->bar.cap = 0;
>      NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
>      NVME_CAP_SET_CQR(n->bar.cap, 1);
>      NVME_CAP_SET_TO(n->bar.cap, 0xf);
> --
> 2.29.2
>

Thanks for the reviews, applied to nvme-next.
[PATCH v4 2/2] hw/block/nvme: add simple copy command
From: Klaus Jensen Add support for TP 4065a ("Simple Copy Command"), v2020.05.04 ("Ratified"). The implementation uses a bounce buffer to first read in the source logical blocks, then issue a write of that bounce buffer. The default maximum number of source logical blocks is 128, translating to 512 KiB for 4k logical blocks which aligns with the default value of MDTS. Signed-off-by: Klaus Jensen --- hw/block/nvme-ns.h| 4 + hw/block/nvme.h | 1 + hw/block/nvme-ns.c| 8 ++ hw/block/nvme.c | 217 +- hw/block/trace-events | 6 ++ 5 files changed, 235 insertions(+), 1 deletion(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 44bf6271b744..745d288b09cf 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,10 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; + +uint16_t mssrl; +uint32_t mcl; +uint8_t msrc; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 574333caa3f9..f549abeeb930 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -62,6 +62,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc) case NVME_CMD_READ: return "NVME_NVM_CMD_READ"; case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES"; case NVME_CMD_DSM: return "NVME_NVM_CMD_DSM"; +case NVME_CMD_COPY: return "NVME_NVM_CMD_COPY"; default:return "NVME_NVM_CMD_UNKNOWN"; } } diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 2d69b5177b51..f53f8fc56fd8 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -59,6 +59,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->npda = id_ns->npdg = npdg - 1; +/* simple copy */ +id_ns->mssrl = cpu_to_le16(ns->params.mssrl); +id_ns->mcl = cpu_to_le32(ns->params.mcl); +id_ns->msrc = ns->params.msrc; + return 0; } @@ -150,6 +155,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), 
+DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128), +DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128), +DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 8814201364c1..fbfeb7ac8140 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -999,6 +999,109 @@ static void nvme_aio_discard_cb(void *opaque, int ret) nvme_enqueue_req_completion(nvme_cq(req), req); } +struct nvme_copy_ctx { +int copies; +uint8_t *bounce; +uint32_t nlb; +}; + +struct nvme_copy_in_ctx { +NvmeRequest *req; +QEMUIOVector iov; +}; + +static void nvme_copy_cb(void *opaque, int ret) +{ +NvmeRequest *req = opaque; +NvmeNamespace *ns = req->ns; +struct nvme_copy_ctx *ctx = req->opaque; + +trace_pci_nvme_copy_cb(nvme_cid(req)); + +if (!ret) { +block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct); +} else { +block_acct_failed(blk_get_stats(ns->blkconf.blk), &req->acct); +nvme_aio_err(req, ret); +} + +g_free(ctx->bounce); +g_free(ctx); + +nvme_enqueue_req_completion(nvme_cq(req), req); +} + +static void nvme_copy_in_complete(NvmeRequest *req) +{ +NvmeNamespace *ns = req->ns; +NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd; +struct nvme_copy_ctx *ctx = req->opaque; +uint64_t sdlba = le64_to_cpu(copy->sdlba); +uint16_t status; + +trace_pci_nvme_copy_in_complete(nvme_cid(req)); + +block_acct_done(blk_get_stats(ns->blkconf.blk), &req->acct); + +status = nvme_check_bounds(ns, sdlba, ctx->nlb); +if (status) { +trace_pci_nvme_err_invalid_lba_range(sdlba, ctx->nlb, ns->id_ns.nsze); +req->status = status; + +g_free(ctx->bounce); +g_free(ctx); + +nvme_enqueue_req_completion(nvme_cq(req), req); + +return; +} + +qemu_iovec_init(&req->iov, 1); +qemu_iovec_add(&req->iov, ctx->bounce, nvme_l2b(ns, ctx->nlb)); + +block_acct_start(blk_get_stats(ns->blkconf.blk), &req->acct, + nvme_l2b(ns, ctx->nlb), BLOCK_ACCT_WRITE); + +req->aiocb = blk_aio_pwritev(ns->blkconf.blk, nvme_l2b(ns, sdlba), + &req->iov, 0, nvme_copy_cb, req); +} + +static void 
nvme_aio_copy_in_cb(void *opaque, int ret) +{ +struct nvme_copy_in_ctx *in_ctx = opaque; +NvmeRequest *req = in_ctx->req; +NvmeNamespace *ns = req->ns; +struct nvme_copy_ctx *ctx = req->opaque; + +qemu_iovec_destroy(&in_ctx->iov); +g_free(in_ctx); + +trace_pci_nvme_aio_copy_in_cb(nvme_cid(req)); + +if (ret) { +nvme_aio_err(req, ret); +} + +ctx->copies--; + +if (ctx->copies) { +return; +} + +if (req->status) { +
[PATCH v4 1/2] nvme: updated shared header for copy command
From: Klaus Jensen Add new data structures and types for the Simple Copy command. Signed-off-by: Klaus Jensen Cc: Stefan Hajnoczi Cc: Fam Zheng Reviewed-by: Minwoo Im Acked-by: Stefan Hajnoczi --- include/block/nvme.h | 45 ++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/include/block/nvme.h b/include/block/nvme.h index e95ff6ca9b37..be3aca913a1d 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -472,6 +472,7 @@ enum NvmeIoCommands { NVME_CMD_COMPARE= 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM= 0x09, +NVME_CMD_COPY = 0x19, }; typedef struct QEMU_PACKED NvmeDeleteQ { @@ -603,6 +604,35 @@ typedef struct QEMU_PACKED NvmeDsmRange { uint64_tslba; } NvmeDsmRange; +enum { +NVME_COPY_FORMAT_0 = 0x0, +}; + +typedef struct NvmeCopyCmd { +uint8_t opcode; +uint8_t flags; +uint16_tcid; +uint32_tnsid; +uint32_trsvd2[4]; +NvmeCmdDptr dptr; +uint64_tsdlba; +uint32_tcdw12; +uint32_tcdw13; +uint32_tilbrt; +uint16_tlbat; +uint16_tlbatm; +} NvmeCopyCmd; + +typedef struct NvmeCopySourceRange { +uint8_t rsvd0[8]; +uint64_t slba; +uint16_t nlb; +uint8_t rsvd18[6]; +uint32_t eilbrt; +uint16_t elbat; +uint16_t elbatm; +} NvmeCopySourceRange; + enum NvmeAsyncEventRequest { NVME_AER_TYPE_ERROR = 0, NVME_AER_TYPE_SMART = 1, @@ -680,6 +710,7 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO= 0x0182, +NVME_CMD_SIZE_LIMIT = 0x0183, NVME_WRITE_FAULT= 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR= 0x0282, @@ -831,7 +862,7 @@ typedef struct QEMU_PACKED NvmeIdCtrl { uint8_t nvscc; uint8_t rsvd531; uint16_tacwu; -uint8_t rsvd534[2]; +uint16_tocfs; uint32_tsgls; uint8_t rsvd540[228]; uint8_t subnqn[256]; @@ -854,6 +885,11 @@ enum NvmeIdCtrlOncs { NVME_ONCS_FEATURES = 1 << 4, NVME_ONCS_RESRVATIONS = 1 << 5, NVME_ONCS_TIMESTAMP = 1 << 6, +NVME_ONCS_COPY = 1 << 8, +}; + +enum NvmeIdCtrlOcfs { +NVME_OCFS_COPY_FORMAT_0 = 1 << 0, }; enum NvmeIdCtrlFrmw { @@ -995,7 +1031,10 @@ typedef struct 
QEMU_PACKED NvmeIdNs { uint16_tnpdg; uint16_tnpda; uint16_tnows; -uint8_t rsvd74[30]; +uint16_tmssrl; +uint32_tmcl; +uint8_t msrc; +uint8_t rsvd81[23]; uint8_t nguid[16]; uint64_teui64; NvmeLBAFlbaf[16]; @@ -1059,6 +1098,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4); QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16); +QEMU_BUILD_BUG_ON(sizeof(NvmeCopySourceRange) != 32); QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) != 64); @@ -1066,6 +1106,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64); +QEMU_BUILD_BUG_ON(sizeof(NvmeCopyCmd) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); -- 2.29.2
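The size checks added to _nvme_check_size() in this patch can be reproduced outside the QEMU tree. The following is only a sketch in plain C11, not the QEMU header: the struct mirrors the NvmeCopySourceRange fields from the diff, `_Static_assert` stands in for QEMU_BUILD_BUG_ON, and the type name `CopySourceRange` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative copy of the source range entry from the patch. The type
 * name is hypothetical; field names and widths follow NvmeCopySourceRange.
 */
typedef struct CopySourceRange {
    uint8_t  rsvd0[8];
    uint64_t slba;      /* starting LBA of the source range */
    uint16_t nlb;       /* number of logical blocks in the range */
    uint8_t  rsvd18[6];
    uint32_t eilbrt;
    uint16_t elbat;
    uint16_t elbatm;
} CopySourceRange;

/* Plain C11 stand-in for QEMU_BUILD_BUG_ON(sizeof(...) != 32). */
_Static_assert(sizeof(CopySourceRange) == 32,
               "copy source range entry must be 32 bytes");

/* rsvd18 is named after its byte offset; check that it really sits there. */
_Static_assert(offsetof(CopySourceRange, rsvd18) == 18,
               "rsvd18 must start at byte 18");
```

The same bookkeeping applies to the NvmeIdNs hunk: mssrl (2 bytes), mcl (4 bytes), msrc (1 byte) and rsvd81[23] add up to the 30 bytes previously covered by rsvd74[30], so the overall structure size is unchanged.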
[PATCH v4 0/2] hw/block/nvme: add simple copy command
From: Klaus Jensen Add support for TP 4065 ("Simple Copy Command"). Changes for v4 * merge for-loops (Keith) Changes for v3 * rebased on nvme-next * changed the default msrc value to a more reasonable 127 from 255 to better align with the default mcl value of 128. Changes for v2 * prefer style that aligns with existing NvmeIdCtrl field enums (Minwoo) * swapped elbat/elbatm fields in copy source range. I've kept the R-b and A-b from Minwoo and Stefan since this is a non-functional change (the device does not use these fields at all). Klaus Jensen (2): nvme: updated shared header for copy command hw/block/nvme: add simple copy command hw/block/nvme-ns.h| 4 + hw/block/nvme.h | 1 + include/block/nvme.h | 45 - hw/block/nvme-ns.c| 8 ++ hw/block/nvme.c | 217 +- hw/block/trace-events | 6 ++ 6 files changed, 278 insertions(+), 3 deletions(-) -- 2.29.2
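The default values mentioned in this series can be checked with a little arithmetic. The sketch below is an illustration only. It assumes a 4 KiB logical block size, a 4 KiB minimum memory page size, and an MDTS value of 7, which I believe is the device default but have not verified against this tree. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes covered by nlb logical blocks of lba_size bytes each. */
static uint64_t copy_bytes(uint32_t nlb, uint32_t lba_size)
{
    return (uint64_t)nlb * lba_size;
}

/* MDTS is a power-of-two multiplier of the minimum memory page size. */
static uint64_t mdts_bytes(uint8_t mdts, uint32_t min_page_size)
{
    return (uint64_t)min_page_size << mdts;
}

/* MSRC is a 0's based value, so the usable number of ranges is msrc + 1. */
static uint32_t max_source_ranges(uint8_t msrc)
{
    return (uint32_t)msrc + 1;
}
```

Under those assumptions, 128 blocks of 4 KiB and an MDTS of 7 both come out at 512 KiB, and an msrc of 127 allows 128 source ranges, which lines up with the default mcl of 128.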
Re: [PATCH] hw/block/nvme: fix bad clearing of CAP
Hello, Reviewed-by: Minwoo Im
[PATCH v3 1/3] docs: generate qemu-storage-daemon-qmp-ref(7) man page
Although individual qemu-storage-daemon QMP commands are identical to QEMU QMP commands, qemu-storage-daemon only supports a subset of QEMU's QMP commands. Generate a manual page of just the commands supported by qemu-storage-daemon so that users know exactly what is available in qemu-storage-daemon. Add an h1 heading in storage-daemon/qapi/qapi-schema.json so that block-core.json is at the h2 heading level. Signed-off-by: Stefan Hajnoczi --- docs/interop/index.rst | 1 + docs/interop/qemu-storage-daemon-qmp-ref.rst | 13 + storage-daemon/qapi/qapi-schema.json | 3 +++ docs/interop/conf.py | 2 ++ docs/meson.build | 1 + 5 files changed, 20 insertions(+) create mode 100644 docs/interop/qemu-storage-daemon-qmp-ref.rst diff --git a/docs/interop/index.rst b/docs/interop/index.rst index cd78d679d8..95d56495f6 100644 --- a/docs/interop/index.rst +++ b/docs/interop/index.rst @@ -20,6 +20,7 @@ Contents: qemu-ga qemu-ga-ref qemu-qmp-ref + qemu-storage-daemon-qmp-ref vhost-user vhost-user-gpu vhost-vdpa diff --git a/docs/interop/qemu-storage-daemon-qmp-ref.rst b/docs/interop/qemu-storage-daemon-qmp-ref.rst new file mode 100644 index 00..caf9dad23a --- /dev/null +++ b/docs/interop/qemu-storage-daemon-qmp-ref.rst @@ -0,0 +1,13 @@ +QEMU Storage Daemon QMP Reference Manual + + +.. + TODO: the old Texinfo manual used to note that this manual + is GPL-v2-or-later. We should make that reader-visible + both here and in our Sphinx manuals more generally. + +.. + TODO: display the QEMU version, both here and in our Sphinx manuals + more generally. + +.. 
qapi-doc:: storage-daemon/qapi/qapi-schema.json diff --git a/storage-daemon/qapi/qapi-schema.json b/storage-daemon/qapi/qapi-schema.json index c6ad5ae1e3..28117c3aac 100644 --- a/storage-daemon/qapi/qapi-schema.json +++ b/storage-daemon/qapi/qapi-schema.json @@ -15,6 +15,9 @@ { 'include': '../../qapi/pragma.json' } +## +# = Block devices +## { 'include': '../../qapi/block-core.json' } { 'include': '../../qapi/block-export.json' } { 'include': '../../qapi/char.json' } diff --git a/docs/interop/conf.py b/docs/interop/conf.py index 2634ca3410..f4370aaa13 100644 --- a/docs/interop/conf.py +++ b/docs/interop/conf.py @@ -23,4 +23,6 @@ man_pages = [ [], 7), ('qemu-qmp-ref', 'qemu-qmp-ref', 'QEMU QMP Reference Manual', [], 7), +('qemu-storage-daemon-qmp-ref', 'qemu-storage-daemon-qmp-ref', + 'QEMU Storage Daemon QMP Reference Manual', [], 7), ] diff --git a/docs/meson.build b/docs/meson.build index ebd85d59f9..df5dc50485 100644 --- a/docs/meson.build +++ b/docs/meson.build @@ -56,6 +56,7 @@ if build_docs 'qemu-ga.8': (have_tools ? 'man8' : ''), 'qemu-ga-ref.7': 'man7', 'qemu-qmp-ref.7': 'man7', +'qemu-storage-daemon-qmp-ref.7': (have_tools ? 'man7' : ''), }, 'tools': { 'qemu-img.1': (have_tools ? 'man1' : ''), -- 2.28.0
[PATCH v3 3/3] MAINTAINERS: add Kevin Wolf as storage daemon maintainer
The MAINTAINERS file was not updated when the storage daemon was merged. Signed-off-by: Stefan Hajnoczi Acked-by: Kevin Wolf Reviewed-by: Philippe Mathieu-Daudé --- MAINTAINERS | 9 + 1 file changed, 9 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 68bc160f41..8676730cc9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2146,6 +2146,15 @@ F: qobject/block-qdict.c F: tests/check-block-qdict.c T: git https://repo.or.cz/qemu/kevin.git block +Storage daemon +M: Kevin Wolf +L: qemu-block@nongnu.org +S: Supported +F: storage-daemon/ +F: docs/interop/qemu-storage-daemon-qmp-ref.rst +F: docs/tools/qemu-storage-daemon.rst +T: git https://repo.or.cz/qemu/kevin.git block + Block I/O path M: Stefan Hajnoczi M: Fam Zheng -- 2.28.0
[PATCH v3 2/3] docs: add qemu-storage-daemon(1) man page
Document the qemu-storage-daemon tool. Most of the command-line options are identical to their QEMU counterparts. Perhaps Sphinx hxtool integration could be extended to extract documentation for individual command-line options so they can be shared. For now the qemu-storage-daemon simply refers to the qemu(1) man page where the command-line options are identical. Signed-off-by: Stefan Hajnoczi --- docs/tools/index.rst | 1 + docs/tools/qemu-storage-daemon.rst | 148 + docs/tools/conf.py | 2 + 3 files changed, 151 insertions(+) create mode 100644 docs/tools/qemu-storage-daemon.rst diff --git a/docs/tools/index.rst b/docs/tools/index.rst index b99f86c7c6..3a5829c17a 100644 --- a/docs/tools/index.rst +++ b/docs/tools/index.rst @@ -11,6 +11,7 @@ Contents: :maxdepth: 2 qemu-img + qemu-storage-daemon qemu-nbd qemu-pr-helper qemu-trace-stap diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst new file mode 100644 index 00..f63627eaf6 --- /dev/null +++ b/docs/tools/qemu-storage-daemon.rst @@ -0,0 +1,148 @@ +QEMU Storage Daemon +=== + +Synopsis + + +**qemu-storage-daemon** [options] + +Description +--- + +qemu-storage-daemon provides disk image functionality from QEMU, qemu-img, and +qemu-nbd in a long-running process controlled via QMP commands without running +a virtual machine. It can export disk images, run block job operations, and +perform other disk-related operations. The daemon is controlled via a QMP +monitor and initial configuration from the command-line. + +The daemon offers the following subset of QEMU features: + +* Block nodes +* Block jobs +* Block exports +* Throttle groups +* Character devices +* Crypto and secrets +* QMP +* IOThreads + +Commands can be sent over a QEMU Monitor Protocol (QMP) connection. See the +:manpage:`qemu-storage-daemon-qmp-ref(7)` manual page for a description of the +commands. + +The daemon runs until it is stopped using the ``quit`` QMP command or +SIGINT/SIGHUP/SIGTERM. 
+ +**Warning:** Never modify images in use by a running virtual machine or any +other process; this may destroy the image. Also, be aware that querying an +image that is being modified by another process may encounter inconsistent +state. + +Options +--- + +.. program:: qemu-storage-daemon + +Standard options: + +.. option:: -h, --help + + Display help and exit + +.. option:: -V, --version + + Display version information and exit + +.. option:: -T, --trace [[enable=]PATTERN][,events=FILE][,file=FILE] + + .. include:: ../qemu-option-trace.rst.inc + +.. option:: --blockdev BLOCKDEVDEF + + is a block node definition. See the :manpage:`qemu(1)` manual page for a + description of block node properties and the :manpage:`qemu-block-drivers(7)` + manual page for a description of driver-specific parameters. + +.. option:: --chardev CHARDEVDEF + + is a character device definition. See the :manpage:`qemu(1)` manual page for + a description of character device properties. A common character device + definition configures a UNIX domain socket:: + + --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait + +.. option:: --export [type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=] + --export [type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=] + --export [type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=] + + is a block export definition. ``node-name`` is the block node that should be + exported. ``writable`` determines whether or not the export allows write + requests for modifying data (the default is off). + + The ``nbd`` export type requires ``--nbd-server`` (see below). ``name`` is + the NBD export name. ``bitmap`` is the name of a dirty bitmap reachable from + the block node, so the NBD client can use NBD_OPT_SET_META_CONTEXT with the + metadata context name "qemu:dirty-bitmap:BITMAP" to inspect the bitmap. 
+ + The ``vhost-user-blk`` export type takes a vhost-user socket address on which + it accepts incoming connections. Both + ``addr.type=unix,addr.path=`` for UNIX domain sockets and + ``addr.type=fd,addr.str=`` for file descriptor passing are supported. + ``logical-block-size`` sets the logical block size in bytes (the default is + 512). ``num-queues`` sets the number of virtqueues (the default is 1). + +.. option:: --monitor MONITORDEF + + is a QMP monitor definition. See the :manpage:`qemu(1)` manual page for + a description of QMP monitor properties. A common QMP monitor definition + configures a monitor on character device ``char1``:: + + --monitor chardev=char1 + +.. option:: --nbd-server addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=] + --nbd-server addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=] + + is a server
[PATCH v3 0/3] docs: add qemu-storage-daemon documentation
v3:
 * Address Kevin's comments

v2:
 * Drop block-core.json h2 header removal, add an h1 header to
   storage-daemon/qapi/qapi-schema.json instead [Kevin]
 * Add Examples section to man page [Kevin]

Add documentation for the qemu-storage-daemon program and its QMP commands.

The man page looks like this:

QEMU-STORAGE-DAEMON(1)            QEMU            QEMU-STORAGE-DAEMON(1)

NAME
       qemu-storage-daemon - QEMU storage daemon

SYNOPSIS
       qemu-storage-daemon [options]

DESCRIPTION
       qemu-storage-daemon provides disk image functionality from QEMU,
       qemu-img, and qemu-nbd in a long-running process controlled via QMP
       commands without running a virtual machine. It can export disk
       images, run block job operations, and perform other disk-related
       operations. The daemon is controlled via a QMP monitor and initial
       configuration from the command-line.

       The daemon offers the following subset of QEMU features:

       • Block nodes
       • Block jobs
       • Block exports
       • Throttle groups
       • Character devices
       • Crypto and secrets
       • QMP
       • IOThreads

       Commands can be sent over a QEMU Monitor Protocol (QMP) connection.
       See the qemu-storage-daemon-qmp-ref(7) manual page for a description
       of the commands.

       The daemon runs until it is stopped using the quit QMP command or
       SIGINT/SIGHUP/SIGTERM.

       Warning: Never modify images in use by a running virtual machine or
       any other process; this may destroy the image. Also, be aware that
       querying an image that is being modified by another process may
       encounter inconsistent state.

OPTIONS
       Standard options:

       -h, --help
              Display help and exit

       -V, --version
              Display version information and exit

       -T, --trace [[enable=]PATTERN][,events=FILE][,file=FILE]
              Specify tracing options.

              [enable=]PATTERN
                     Immediately enable events matching PATTERN (either
                     event name or a globbing pattern). This option is only
                     available if QEMU has been compiled with the simple,
                     log or ftrace tracing backend. To specify multiple
                     events or patterns, specify the -trace option multiple
                     times.

                     Use -trace help to print a list of names of trace
                     points.

              events=FILE
                     Immediately enable events listed in FILE. The file
                     must contain one event name (as listed in the
                     trace-events-all file) per line; globbing patterns are
                     accepted too. This option is only available if QEMU
                     has been compiled with the simple, log or ftrace
                     tracing backend.

              file=FILE
                     Log output traces to FILE. This option is only
                     available if QEMU has been compiled with the simple
                     tracing backend.

       --blockdev BLOCKDEVDEF
              is a block node definition. See the qemu(1) manual page for a
              description of block node properties and the
              qemu-block-drivers(7) manual page for a description of
              driver-specific parameters.

       --chardev CHARDEVDEF
              is a character device definition. See the qemu(1) manual page
              for a description of character device properties. A common
              character device definition configures a UNIX domain socket:

                 --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait

       --export [type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]

       --export [type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]

       --export [type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
              is a block export definition. node-name is the block node
              that should be exported. writable determines whether or not
              the export allows write requests for modifying data (the
              default is off).

              The nbd export type requires --nbd-server (see below). name
              is the NBD export name. bitmap is the name of a dirty bitmap
              reachable from the block node, so the NBD client can use
              NBD_OPT_SET_META_CONTEXT with the metadata context name
              "qemu:dirty-bitmap:BITMAP" to inspect the bitmap.

              The vhost-user-blk export type takes a vhost-user socket
              address on which it accepts incoming connections. Both
              addr.type=unix,addr.path= for UNIX domain
Re: [PATCH v2 1/3] docs: generate qemu-storage-daemon-qmp-ref(7) man page
On Tue, Oct 06, 2020 at 12:22:55PM +0200, Kevin Wolf wrote:
> On 10.09.2020 at 16:43, Stefan Hajnoczi wrote:
> > Although qemu-storage-daemon QMP commands are identical to QEMU QMP
> > commands they are a subset. Generate a manual page of just the commands
> > supported by qemu-storage-daemon so that users know exactly what is
> > available in qemu-storage-daemon.
> >
> > Add an h1 heading in storage-daemon/qapi/qapi-schema.json so that
> > block-core.json is at the h2 heading level.
> >
> > Signed-off-by: Stefan Hajnoczi
>
> As the series doesn't apply any more, I can't actually try it out
> easily, but is the order of includes in the schema right now?
>
> I seem to remember that in v1 we discussed that nested includes result
> in an unexpected section structure in the documentation in some cases
> (such as generic jobs being documented in a subsection of block
> devices), and that we need to reorder includes in qapi-schema.json to
> fix this because a more clever doc generator wasn't considered worth the
> effort.

v2 onwards takes a different approach and leaves the header where it is.

Stefan
Re: [PATCH v3 2/2] hw/block/nvme: add simple copy command
On Dec 9 07:13, Keith Busch wrote: > On Tue, Dec 08, 2020 at 09:33:39AM +0100, Klaus Jensen wrote: > > +static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req) > > +{ > > > > > +for (i = 0; i < nr; i++) { > > +uint32_t _nlb = le16_to_cpu(range[i].nlb) + 1; > > +if (_nlb > le16_to_cpu(ns->id_ns.mssrl)) { > > +return NVME_CMD_SIZE_LIMIT | NVME_DNR; > > +} > > + > > +nlb += _nlb; > > +} > > + > > +if (nlb > le32_to_cpu(ns->id_ns.mcl)) { > > +return NVME_CMD_SIZE_LIMIT | NVME_DNR; > > +} > > + > > +bounce = bouncep = g_malloc(nvme_l2b(ns, nlb)); > > + > > +for (i = 0; i < nr; i++) { > > +uint64_t slba = le64_to_cpu(range[i].slba); > > +uint32_t nlb = le16_to_cpu(range[i].nlb) + 1; > > + > > +status = nvme_check_bounds(ns, slba, nlb); > > +if (status) { > > +trace_pci_nvme_err_invalid_lba_range(slba, nlb, > > ns->id_ns.nsze); > > +goto free_bounce; > > +} > > + > > +if (NVME_ERR_REC_DULBE(ns->features.err_rec)) { > > +status = nvme_check_dulbe(ns, slba, nlb); > > +if (status) { > > +goto free_bounce; > > +} > > +} > > +} > > Only comment I have is that these two for-loops look like they can be > collapsed into one, which also simplifies how you account for the bounce > buffer when error'ing out. > Yeah. And the shadowing of nlb is not good either. I'll fix it up.
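To make the suggestion concrete, here is a rough sketch of what a single validation pass could look like. This is not the code that went into v4. The types, status codes and function name are simplified, hypothetical stand-ins (host-endian, no DULBE check), meant only to show that with one loop nothing has been allocated yet when an error is detected, so no cleanup label is needed.

```c
#include <stdint.h>

/* Hypothetical status codes; the real ones live in include/block/nvme.h. */
enum {
    COPY_OK             = 0,
    COPY_CMD_SIZE_LIMIT = 1,
    COPY_LBA_RANGE      = 2,
};

/* Simplified, host-endian stand-in for one source range entry. */
struct src_range {
    uint64_t slba;
    uint16_t nlb;   /* 0's based, as in the patch */
};

/* Simplified per-namespace limits (mssrl, mcl) and size (nsze). */
struct ns_limits {
    uint16_t mssrl;
    uint32_t mcl;
    uint64_t nsze;
};

/*
 * Validate all ranges in a single pass. On success, store the total
 * number of logical blocks in *total_nlb.
 */
static int copy_validate(const struct ns_limits *ns,
                         const struct src_range *range, unsigned int nr,
                         uint32_t *total_nlb)
{
    uint32_t total = 0;

    for (unsigned int i = 0; i < nr; i++) {
        /* Distinct name, so the running total is never shadowed. */
        uint32_t range_nlb = (uint32_t)range[i].nlb + 1;

        if (range_nlb > ns->mssrl) {
            return COPY_CMD_SIZE_LIMIT;
        }

        /* Bounds check, written so that slba + range_nlb cannot wrap. */
        if (range[i].slba > ns->nsze ||
            range_nlb > ns->nsze - range[i].slba) {
            return COPY_LBA_RANGE;
        }

        total += range_nlb;
    }

    if (total > ns->mcl) {
        return COPY_CMD_SIZE_LIMIT;
    }

    *total_nlb = total;
    return COPY_OK;
}
```

The caller would allocate the bounce buffer only after this returns success. One visible difference from the two-loop version is the order in which errors are reported: a range that is out of bounds is now detected before the total is compared against mcl.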
Re: [PATCH v11 00/13] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Hi Dmitry, By and large, this looks OK to me. There are still some issues here and there, and some comments of mine that you did not address, but I will follow up with patches to fix that. Let's get this merged. It looks like the nvme-next you rebased on is slightly old and missing two commits: "hw/block/nvme: remove superfluous NvmeCtrl parameter" and "hw/block/nvme: pull aio error handling" It caused a couple of conflicts, but nothing that I couldn't fix up. Since I didn't manage to convince anyone about the zsze and zcap parameters being in terms of LBAs, I'll revert that to be 'zoned.zone_size' and 'zoned.zone_capacity'. Finally, would you accept that we skip "hw/block/nvme: Add injection of Offline/Read-Only zones" for now? I'd like to discuss it a bit since I think the random injects feels a bit ad-hoc. Back when I did OCSSD emulation with Hans, we did something like this for setting up state through a descriptor text file - I think we should explore something like that before we lock down the two parameters. I'll amend the final documentation commit to not include those parameters. Sounds good? Otherwise, I think this is mergeable to nvme-next. So, for the series (excluding "hw/block/nvme: Add injection of Offline/Read-Only zones"): Reviewed-by: Klaus Jensen On Dec 9 05:03, Dmitry Fomichev wrote: > v10 -> v11: > > - Address review comments by Klaus. > > - Add a patch to separate the handling of controller reset >and subsystem shutdown. Place the patch at the beginning >of the series so it can be picked up separately. > > - Rebase on the current nvme-next branch. > > v9 -> v10: > > - Correctly check for MDTS in Zone Management Receive handler. > > - Change Klaus' "Reviewed-by" email in UUID patch. > > v8 -> v9: > > - Move the modifications to "include/block/nvme.h" made to >introduce ZNS-related definitions to a separate patch. 
> > - Add a new struct, NvmeZonedResult, along the same lines as the >existing NvmeAerResult, to carry Zone Append LBA returned to >the host. Now, there is no need to modify NvmeCqe struct except >renaming DW1 field from "rsvd" to "dw1". > > - Add check for MDTS in Zone Management Receive handler. > > - Remove checks for ns->attached since the value of this flag >is always true for now. > > - Rebase to the current quemu-nvme/nvme-next branch. > > v7 -> v8: > > - Move refactoring commits to the front of the series. > > - Remove "attached" and "fill_pattern" device properties. > > - Only close open zones upon subsystem shutdown, not when CC.EN flag >is set to 0. Avoid looping through all zones by iterating through >lists of open and closed zones. > > - Improve bulk processing of zones aka zoned operations with "all" >flag set. Avoid looping through the entire zone array for all zone >operations except Offline Zone. > > - Prefix ZNS-related property names with "zoned.". The "zoned" Boolean >property is retained to turn on zoned command set as it is much more >intuitive and user-friendly compared to setting a magic number value >to csi property. > > - Address review comments. > > - Remove unused trace events. > > v6 -> v7: > > - Introduce ns->iocs initialization function earlier in the series, >in CSE Log patch. > > - Set NVM iocs for zoned namespaces when CC.CSS is set to >NVME_CC_CSS_NVM. > > - Clean up code in CSE log handler. > > v5 -> v6: > > - Remove zoned state persistence code. Replace position-independent >zone lists with QTAILQs. > > - Close all open zones upon clearing of the controller. This is >a similar procedure to the one previously performed upon powering >up with zone persistence. > > - Squash NS Types and ZNS triplets of commits to keep definitions >and trace event definitions together with the implementation code. > > - Move namespace UUID generation to a separate patch. Add the new >"uuid" property as suggested by Klaus. 
> > - Rework Commands and Effects patch to make sure that the log is >always in sync with the actual set of commands supported. > > - Add two refactoring commits at the end of the series to >optimize read and write i/o path. > > - Incorporate feedback from Keith, Klaus and Niklas: > > * fix rebase errors in nvme_identify_ns_descr_list() > * remove unnecessary code from nvme_write_bar() > * move csi to NvmeNamespace and use it from the beginning in NSTypes > patch > * change zone read processing to cover all corner cases with RAZB=1 > * sync w_ptr and d.wp in case of a i/o error at the preceding zone > * reword the commit message in active/inactive patch with the new > text from Niklas > * correct dlfeat reporting depending on the fill pattern set > * add more checks for "attached" n/s parameter to prevent i/o and > get/set features on inactive namespaces > * Use DEFINE_PROP_SIZE and DEFINE_PROP_SIZE32 for zone size/capacity > and ZASL
Re: [PATCH] file-posix: detect the lock using the real file
On Tue, Dec 08, 2020 at 03:38:22PM +0100, Kevin Wolf wrote: > On 08.12.2020 at 13:59, Li Feng wrote: > > This patch addresses this issue: > > When accessing a volume on an NFS filesystem without supporting the file > > lock, > > tools, like qemu-img, will complain "Failed to lock byte 100". > > > > In the original code, the qemu_has_ofd_lock will test the lock on the > > "/dev/null" pseudo-file. Actually, the file.locking is per-drive property, > > which depends on the underlay filesystem. > > > > In this patch, make the 'qemu_has_ofd_lock' with a filename be more generic > > and reasonable. > > > > Signed-off-by: Li Feng > > Do you know any way how I could configure either the NFS server or the > NFS client such that locking would fail? For any patch related to this, > it would be good if I could even test the scenario. One could write a qtest that uses an LD_PRELOAD to replace the standard glibc fcntl() function with one that returns an error for locking commands. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
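As a rough idea of what such an LD_PRELOAD shim might look like, here is a sketch for Linux with glibc. It is untested against QEMU and makes several assumptions: ENOLCK is used as the error, the helper name is_lock_cmd is made up, and non-locking commands are forwarded with a raw SYS_fcntl system call instead of looking up the original libc symbol. Since QEMU is built with 64-bit file offsets, some glibc versions may route its calls through fcntl64 instead, in which case a real shim would probably need to export that symbol as well.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the F_OFD_* commands */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 for the fcntl() commands that take or query a record lock. */
static int is_lock_cmd(int cmd)
{
    switch (cmd) {
    case F_GETLK:
    case F_SETLK:
    case F_SETLKW:
#ifdef F_OFD_GETLK
    case F_OFD_GETLK:
    case F_OFD_SETLK:
    case F_OFD_SETLKW:
#endif
        return 1;
    default:
        return 0;
    }
}

/*
 * Replacement for the libc fcntl(): locking commands fail with ENOLCK,
 * everything else is passed straight to the kernel.
 */
int fcntl(int fd, int cmd, ...)
{
    va_list ap;
    void *arg;

    /*
     * Like libc itself, fetch the optional third argument as a pointer
     * sized value regardless of the command.
     */
    va_start(ap, cmd);
    arg = va_arg(ap, void *);
    va_end(ap);

    if (is_lock_cmd(cmd)) {
        errno = ENOLCK;
        return -1;
    }

    return (int)syscall(SYS_fcntl, fd, cmd, arg);
}
```

Built as a shared object (for example with `gcc -shared -fPIC -o nolock.so nolock.c`, where both file names are placeholders) and loaded through LD_PRELOAD, this should make every lock attempt fail the way it would on a filesystem without lock support, while leaving other fcntl() uses alone.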