date:20201209

Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

2020-12-09 Thread Paolo Bonzini


On 09/12/20 15:03, no-re...@patchew.org wrote:

Patchew URL: 
https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201209135355.561745-1-mlevi...@redhat.com
Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
 From https://github.com/patchew-project/qemu
  * [new tag] patchew/20201209135355.561745-1-mlevi...@redhat.com -> 
patchew/20201209135355.561745-1-mlevi...@redhat.com
Switched to a new branch 'test'
77c9000 block/scsi: correctly emulate the VPD block limits page
61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough
35c66d6 block: add max_ioctl_transfer to BlockLimits
08ba263 file-posix: add sg_get_max_segments that actually works with sg
e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits

=== OUTPUT BEGIN ===
1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from 
raw_refresh_limits)
2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that 
actually works with sg)
3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to BlockLimits)
4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for 
SCSI passthrough)
5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block 
limits page)
ERROR: braces {} are necessary for all arms of this statement
#39: FILE: hw/scsi/scsi-generic.c:204:
+if (len < r->buflen)
[...]

total: 1 errors, 0 warnings, 28 lines checked

Patch 5/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com



Time for v3?

Paolo

[PATCH v5 2/4] block: add bdrv_co_delete_file_noerr

2020-12-09 Thread Maxim Levitsky

This function wraps bdrv_co_delete_file for the common case of removing a file,
which was just created by format driver, on an error condition.

It hides the -ENOTSUPP error, and reports all other errors otherwise.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Alberto Garcia 
---
 block.c   | 24 
 include/block/block.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/block.c b/block.c
index f1cedac362..5d35ba2fb8 100644
--- a/block.c
+++ b/block.c
@@ -704,6 +704,30 @@ int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, 
Error **errp)
 return ret;
 }
 
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
+{
+Error *local_err = NULL;
+int ret;
+
+if (!bs) {
+return;
+}
+
+ret = bdrv_co_delete_file(bs, _err);
+/*
+ * ENOTSUP will happen if the block driver doesn't support
+ * the 'bdrv_co_delete_file' interface. This is a predictable
+ * scenario and shouldn't be reported back to the user.
+ */
+if (ret == -ENOTSUP) {
+error_free(local_err);
+} else if (ret < 0) {
+error_report_err(local_err);
+}
+}
+
+
+
 /**
  * Try to get @bs's logical and physical block size.
  * On success, store them in @bsz struct and return 0.
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..af03022723 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -428,6 +428,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base,
   Error **errp);
 void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base);
 int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 typedef struct BdrvCheckResult {
-- 
2.26.2

[PATCH v5 1/4] crypto: luks: Fix tiny memory leak

2020-12-09 Thread Maxim Levitsky

When the underlying block device doesn't support the
bdrv_co_delete_file interface, an 'Error' object was leaked.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Alberto Garcia 
---
 block/crypto.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/crypto.c b/block/crypto.c
index aef5a5721a..b3a5275132 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -735,6 +735,8 @@ fail:
  */
 if ((r_del < 0) && (r_del != -ENOTSUP)) {
 error_report_err(local_delete_err);
+} else {
+error_free(local_delete_err);
 }
 }
 
-- 
2.26.2

[PATCH v5 0/4] qcow2: don't leave partially initialized file on image creation

2020-12-09 Thread Maxim Levitsky

Use the bdrv_co_delete_file interface to delete the underlying
file if qcow2 initialization fails (e.g due to bad encryption secret)

This makes the qcow2 driver behave the same way as the luks driver behaves.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1845353

V3: addressed review feedback and reworked commit messages

V4: got rid of code duplication by adding bdrv_co_delete_file_noerr
and made the qcow2 driver use this function to delete
both the main and the data file.

V5: addresssed review feedback on reworked version.

Best regards,
Maxim Levitsky

Maxim Levitsky (4):
  crypto: luks: Fix tiny memory leak
  block: add bdrv_co_delete_file_noerr
  crypto: luks: use bdrv_co_delete_file_noerr
  block: qcow2: remove the created file on initialization error

 block.c   | 24 
 block/crypto.c| 13 ++---
 block/qcow2.c |  6 --
 include/block/block.h |  1 +
 4 files changed, 31 insertions(+), 13 deletions(-)

-- 
2.26.2

[PATCH v5 3/4] crypto: luks: use bdrv_co_delete_file_noerr

2020-12-09 Thread Maxim Levitsky

This refactoring is now possible thanks to this function.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Alberto Garcia 
---
 block/crypto.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index b3a5275132..1d30fde38e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -725,19 +725,8 @@ fail:
  * If an error occurred, delete 'filename'. Even if the file existed
  * beforehand, it has been truncated and corrupted in the process.
  */
-if (ret && bs) {
-Error *local_delete_err = NULL;
-int r_del = bdrv_co_delete_file(bs, _delete_err);
-/*
- * ENOTSUP will happen if the block driver doesn't support
- * the 'bdrv_co_delete_file' interface. This is a predictable
- * scenario and shouldn't be reported back to the user.
- */
-if ((r_del < 0) && (r_del != -ENOTSUP)) {
-error_report_err(local_delete_err);
-} else {
-error_free(local_delete_err);
-}
+if (ret) {
+bdrv_co_delete_file_noerr(bs);
 }
 
 bdrv_unref(bs);
-- 
2.26.2

[PATCH v5 4/4] block: qcow2: remove the created file on initialization error

2020-12-09 Thread Maxim Levitsky

If the qcow initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Alberto Garcia 
---
 block/qcow2.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3a90ef2786..68c9182f92 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3847,12 +3847,14 @@ static int coroutine_fn 
qcow2_co_create_opts(BlockDriver *drv,
 
 /* Create the qcow2 image (format layer) */
 ret = qcow2_co_create(create_options, errp);
+finish:
 if (ret < 0) {
-goto finish;
+bdrv_co_delete_file_noerr(bs);
+bdrv_co_delete_file_noerr(data_bs);
 }
 
 ret = 0;
-finish:
+
 qobject_unref(qdict);
 bdrv_unref(bs);
 bdrv_unref(data_bs);
-- 
2.26.2

Re: [PATCH v4 4/4] block: qcow2: remove the created file on initialization error

2020-12-09 Thread Maxim Levitsky

On Wed, 2020-12-09 at 18:41 +0100, Alberto Garcia wrote:
> On Wed 09 Dec 2020 05:44:41 PM CET, Maxim Levitsky wrote:
> > @@ -3847,12 +3847,13 @@ static int coroutine_fn 
> > qcow2_co_create_opts(BlockDriver *drv,
> >  
> >  /* Create the qcow2 image (format layer) */
> >  ret = qcow2_co_create(create_options, errp);
> > +
> > +finish:
> >  if (ret < 0) {
> > -goto finish;
> > +bdrv_co_delete_file_noerr(bs);
> > +bdrv_co_delete_file_noerr(data_bs);
> >  }
> >  
> > -ret = 0;
> 
> Many/most functions in qcow2.c force ret to be 0 on success, we could
> also keep that here (although in practice I don't think that ret can be
> greater than 0 in this case, or that the caller would care).

I also noticed this when I was sending the patches, and I wasn't sure
if I want to keep that 'ret = 0' or not.
I will add it back.

Best regards,
Maxim Levitsky

> 
> Either way,
> 
> Reviewed-by: Alberto Garcia 
> 
> Berto
>

Re: [PATCH v4 2/4] block: add bdrv_co_delete_file_noerr

2020-12-09 Thread Maxim Levitsky

On Wed, 2020-12-09 at 18:34 +0100, Alberto Garcia wrote:
> On Wed 09 Dec 2020 05:44:39 PM CET, Maxim Levitsky wrote:
> > +void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
> > +{
> > +Error *local_err = NULL;
> > +
> > +if (!bs) {
> > +return;
> > +}
> > +
> > +int ret = bdrv_co_delete_file(bs, _err);
>^^^
> 
> According to the QEMU coding style we should not have declarations in
> the middle of a block.

Oops!

I will send next version now.

Thanks a lot for the review!

Best regards,
Maxim Levitsky

> 
> The patch looks otherwise fine.
> 
> Reviewed-by: Alberto Garcia 
> 
> Berto
>

[PULL v2 48/65] libvhost-user: make it a meson subproject

2020-12-09 Thread Michael S. Tsirkin

From: Marc-André Lureau 

By making libvhost-user a subproject, check it builds
standalone (without the global QEMU cflags etc).

Note that the library still relies on QEMU include/qemu/atomic.h and
linux_headers/.

Signed-off-by: Marc-André Lureau 
Message-Id: <20201125100640.366523-6-marcandre.lur...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 contrib/vhost-user-gpu/vugpu.h|  2 +-
 include/qemu/vhost-user-server.h  |  2 +-
 .../libvhost-user/libvhost-user-glib.h|  0
 .../libvhost-user/libvhost-user.h |  0
 block/export/vhost-user-blk-server.c  |  2 +-
 contrib/vhost-user-blk/vhost-user-blk.c   |  3 +--
 contrib/vhost-user-input/main.c   |  3 +--
 contrib/vhost-user-scsi/vhost-user-scsi.c |  2 +-
 .../libvhost-user/libvhost-user-glib.c|  0
 .../libvhost-user/libvhost-user.c |  0
 tests/vhost-user-bridge.c |  2 +-
 tools/virtiofsd/fuse_virtio.c |  2 +-
 contrib/libvhost-user/meson.build |  4 
 contrib/vhost-user-blk/meson.build|  3 +--
 contrib/vhost-user-gpu/meson.build|  3 +--
 contrib/vhost-user-input/meson.build  |  3 +--
 contrib/vhost-user-scsi/meson.build   |  3 +--
 meson.build   |  7 ++-
 subprojects/libvhost-user/meson.build | 20 +++
 tests/meson.build |  3 +--
 tools/virtiofsd/meson.build   |  3 +--
 21 files changed, 40 insertions(+), 27 deletions(-)
 rename {contrib => subprojects}/libvhost-user/libvhost-user-glib.h (100%)
 rename {contrib => subprojects}/libvhost-user/libvhost-user.h (100%)
 rename {contrib => subprojects}/libvhost-user/libvhost-user-glib.c (100%)
 rename {contrib => subprojects}/libvhost-user/libvhost-user.c (100%)
 delete mode 100644 contrib/libvhost-user/meson.build
 create mode 100644 subprojects/libvhost-user/meson.build

diff --git a/contrib/vhost-user-gpu/vugpu.h b/contrib/vhost-user-gpu/vugpu.h
index 3153c9a6de..bdf9a74b46 100644
--- a/contrib/vhost-user-gpu/vugpu.h
+++ b/contrib/vhost-user-gpu/vugpu.h
@@ -17,7 +17,7 @@
 
 #include "qemu/osdep.h"
 
-#include "contrib/libvhost-user/libvhost-user-glib.h"
+#include "libvhost-user-glib.h"
 #include "standard-headers/linux/virtio_gpu.h"
 
 #include "qemu/queue.h"
diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 0da4c2cc4c..121ea1dedf 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -11,7 +11,7 @@
 #ifndef VHOST_USER_SERVER_H
 #define VHOST_USER_SERVER_H
 
-#include "contrib/libvhost-user/libvhost-user.h"
+#include "subprojects/libvhost-user/libvhost-user.h" /* only for the type 
definitions */
 #include "io/channel-socket.h"
 #include "io/channel-file.h"
 #include "io/net-listener.h"
diff --git a/contrib/libvhost-user/libvhost-user-glib.h 
b/subprojects/libvhost-user/libvhost-user-glib.h
similarity index 100%
rename from contrib/libvhost-user/libvhost-user-glib.h
rename to subprojects/libvhost-user/libvhost-user-glib.h
diff --git a/contrib/libvhost-user/libvhost-user.h 
b/subprojects/libvhost-user/libvhost-user.h
similarity index 100%
rename from contrib/libvhost-user/libvhost-user.h
rename to subprojects/libvhost-user/libvhost-user.h
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 62672d1cb9..a3d95ca012 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -11,7 +11,7 @@
  */
 #include "qemu/osdep.h"
 #include "block/block.h"
-#include "contrib/libvhost-user/libvhost-user.h"
+#include "subprojects/libvhost-user/libvhost-user.h" /* only for the type 
definitions */
 #include "standard-headers/linux/virtio_blk.h"
 #include "qemu/vhost-user-server.h"
 #include "vhost-user-blk-server.h"
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index dc981bf945..6abd7835a8 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -17,8 +17,7 @@
 
 #include "qemu/osdep.h"
 #include "standard-headers/linux/virtio_blk.h"
-#include "contrib/libvhost-user/libvhost-user-glib.h"
-#include "contrib/libvhost-user/libvhost-user.h"
+#include "libvhost-user-glib.h"
 
 #if defined(__linux__)
 #include 
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index 6020c6f33a..3ea840cf44 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -12,8 +12,7 @@
 #include "qemu/iov.h"
 #include "qemu/bswap.h"
 #include "qemu/sockets.h"
-#include "contrib/libvhost-user/libvhost-user.h"
-#include "contrib/libvhost-user/libvhost-user-glib.h"
+#include "libvhost-user-glib.h"
 #include "standard-headers/linux/virtio_input.h"
 #include "qapi/error.h"
 
diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c

[PULL v2 54/65] block/export: avoid g_return_val_if() input validation

2020-12-09 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

Do not validate input with g_return_val_if(). This API is intended for
checking programming errors and is compiled out with -DG_DISABLE_CHECKS.

Use an explicit if statement for input validation so it cannot
accidentally be compiled out.

Suggested-by: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20201118091644.199527-5-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 block/export/vhost-user-blk-server.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index a3d95ca012..ab2c4d44c4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -267,7 +267,9 @@ vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t 
len)
 VuServer *server = container_of(vu_dev, VuServer, vu_dev);
 VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server);
 
-g_return_val_if_fail(len <= sizeof(struct virtio_blk_config), -1);
+if (len > sizeof(struct virtio_blk_config)) {
+return -1;
+}
 
 memcpy(config, >blkcfg, len);
 return 0;
-- 
MST

Re: [PATCH v4 4/4] block: qcow2: remove the created file on initialization error

2020-12-09 Thread Alberto Garcia

On Wed 09 Dec 2020 05:44:41 PM CET, Maxim Levitsky wrote:
> @@ -3847,12 +3847,13 @@ static int coroutine_fn 
> qcow2_co_create_opts(BlockDriver *drv,
>  
>  /* Create the qcow2 image (format layer) */
>  ret = qcow2_co_create(create_options, errp);
> +
> +finish:
>  if (ret < 0) {
> -goto finish;
> +bdrv_co_delete_file_noerr(bs);
> +bdrv_co_delete_file_noerr(data_bs);
>  }
>  
> -ret = 0;

Many/most functions in qcow2.c force ret to be 0 on success, we could
also keep that here (although in practice I don't think that ret can be
greater than 0 in this case, or that the caller would care).

Either way,

Reviewed-by: Alberto Garcia 

Berto

Re: [PATCH] file-posix: detect the lock using the real file

2020-12-09 Thread Kevin Wolf

Am 09.12.2020 um 10:33 hat Daniel P. BerrangÃ© geschrieben:
> On Tue, Dec 08, 2020 at 03:38:22PM +0100, Kevin Wolf wrote:
> > Am 08.12.2020 um 13:59 hat Li Feng geschrieben:
> > > This patch addresses this issue:
> > > When accessing a volume on an NFS filesystem without supporting the file 
> > > lock,
> > > tools, like qemu-img, will complain "Failed to lock byte 100".
> > > 
> > > In the original code, the qemu_has_ofd_lock will test the lock on the
> > > "/dev/null" pseudo-file. Actually, the file.locking is per-drive property,
> > > which depends on the underlay filesystem.
> > > 
> > > In this patch, make the 'qemu_has_ofd_lock' with a filename be more 
> > > generic
> > > and reasonable.
> > > 
> > > Signed-off-by: Li Feng 
> > 
> > Do you know any way how I could configure either the NFS server or the
> > NFS client such that locking would fail? For any patch related to this,
> > it would be good if I could even test the scenario.
> 
> One could write a qtest that uses an LD_PRELOAD to replace the standard
> glibc fcntl() function with one that returns an error for locking commands.

Sounds a bit ugly, but for regression testing it could make sense.

However, part of the testing would be to verify that we our checks
actually match the kernel code, which this approach couldn't solve.

Kevin

Re: [PATCH 2/2] nbd/server: Quiesce coroutines on context switch

2020-12-09 Thread Sergio Lopez

On Fri, Dec 04, 2020 at 12:39:07PM -0600, Eric Blake wrote:
> On 12/4/20 10:53 AM, Sergio Lopez wrote:
> > When switching between AIO contexts we need to me make sure that both
> > recv_coroutine and send_coroutine are not scheduled to run. Otherwise,
> > QEMU may crash while attaching the new context with an error like
> > this one:
> > 
> > aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'
> > 
> > To achieve this we need a local implementation of
> > 'qio_channel_readv_all_eof' named 'nbd_read_eof' (a trick already done
> > by 'nbd/client.c') that allows us to interrupt the operation and to
> > know when recv_coroutine is yielding.
> > 
> > With this in place, we delegate detaching the AIO context to the
> > owning context with a BH ('nbd_aio_detach_bh') scheduled using
> > 'aio_wait_bh_oneshot'. This BH signals that we need to quiesce the
> > channel by setting 'client->quiescing' to 'true', and either waits for
> > the coroutine to finish using AIO_WAIT_WHILE or, if it's yielding in
> > 'nbd_read_eof', actively enters the coroutine to interrupt it.
> > 
> > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900326
> > Signed-off-by: Sergio Lopez 
> > ---
> >  nbd/server.c | 120 +--
> >  1 file changed, 106 insertions(+), 14 deletions(-)
> 
> A complex patch, so I'd appreciate a second set of eyes.
> 
> > 
> > diff --git a/nbd/server.c b/nbd/server.c
> > index 613ed2634a..7229f487d2 100644
> > --- a/nbd/server.c
> > +++ b/nbd/server.c
> > @@ -132,6 +132,9 @@ struct NBDClient {
> >  CoMutex send_lock;
> >  Coroutine *send_coroutine;
> >  
> > +bool read_yielding;
> > +bool quiescing;
> 
> Will either of these fields need to be accessed atomically once the
> 'yank' code is added, or are we still safe with direct access because
> coroutines are not multithreaded?

Yes, those are only accessed from coroutines, which will be scheduled
on the same thread.

> > +
> >  QTAILQ_ENTRY(NBDClient) next;
> >  int nb_requests;
> >  bool closing;
> > @@ -1352,14 +1355,60 @@ static coroutine_fn int nbd_negotiate(NBDClient 
> > *client, Error **errp)
> >  return 0;
> >  }
> >  
> > -static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
> > +/* nbd_read_eof
> > + * Tries to read @size bytes from @ioc. This is a local implementation of
> > + * qio_channel_readv_all_eof. We have it here because we need it to be
> > + * interruptible and to know when the coroutine is yielding.
> > + * Returns 1 on success
> > + * 0 on eof, when no data was read (errp is not set)
> > + * negative errno on failure (errp is set)
> > + */
> > +static inline int coroutine_fn
> > +nbd_read_eof(NBDClient *client, void *buffer, size_t size, Error **errp)
> > +{
> > +bool partial = false;
> > +
> > +assert(size);
> > +while (size > 0) {
> > +struct iovec iov = { .iov_base = buffer, .iov_len = size };
> > +ssize_t len;
> > +
> > +len = qio_channel_readv(client->ioc, , 1, errp);
> > +if (len == QIO_CHANNEL_ERR_BLOCK) {
> > +client->read_yielding = true;
> > +qio_channel_yield(client->ioc, G_IO_IN);
> > +client->read_yielding = false;
> 
> nbd/client.c:nbd_read_eof() uses bdrv_dec/inc_in_flight instead of
> read_yielding...
> 
> > +if (client->quiescing) {
> > +return -EAGAIN;
> > +}
> 
> and the quiescing check is new; otherwise, these two functions look
> identical.  Having two static functions with the same name makes gdb a
> bit more annoying (which one of the two did you want your breakpoint
> on?).  Is there any way we could write this code only once in
> nbd/common.c for reuse by both client and server?  But I can live with
> it as written.

I'm not happy with this either, but on the first implementation I've
tried to come up with a unique function for both use cases, and it
looked terrible.

We can easily use a different name, though.

> > @@ -2151,20 +2223,23 @@ static int nbd_co_send_bitmap(NBDClient *client, 
> > uint64_t handle,
> >  
> >  /* nbd_co_receive_request
> >   * Collect a client request. Return 0 if request looks valid, -EIO to drop
> > - * connection right away, and any other negative value to report an error 
> > to
> > - * the client (although the caller may still need to disconnect after 
> > reporting
> > - * the error).
> > + * connection right away, -EAGAIN to indicate we were interrupted and the
> > + * channel should be quiesced, and any other negative value to report an 
> > error
> > + * to the client (although the caller may still need to disconnect after
> > + * reporting the error).
> >   */
> >  static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
> >Error **errp)
> >  {
> >  NBDClient *client = req->client;
> >  int valid_flags;
> > +int ret;
> >  
> >  g_assert(qemu_in_coroutine());
> >

Re: [PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start

2020-12-09 Thread Kevin Wolf

Am 09.12.2020 um 17:51 hat Sergio Lopez geschrieben:
> On Mon, Dec 07, 2020 at 04:37:53PM +0100, Kevin Wolf wrote:
> > Am 04.12.2020 um 17:53 hat Sergio Lopez geschrieben:
> > > On dataplane start, acquire the new AIO context before calling
> > > 'blk_set_aio_context', releasing it immediately afterwards. This
> > > prevents reaching the AIO context attach/detach notifier functions
> > > without having acquired it first.
> > > 
> > > It was also the only place where 'blk_set_aio_context' was called with
> > > an unprotected AIO context.
> > > 
> > > Signed-off-by: Sergio Lopez 
> > > ---
> > >  hw/block/dataplane/virtio-blk.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/hw/block/dataplane/virtio-blk.c 
> > > b/hw/block/dataplane/virtio-blk.c
> > > index 37499c5564..034e43cb1f 100644
> > > --- a/hw/block/dataplane/virtio-blk.c
> > > +++ b/hw/block/dataplane/virtio-blk.c
> > > @@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > >  vblk->dataplane_started = true;
> > >  trace_virtio_blk_data_plane_start(s);
> > >  
> > > +aio_context_acquire(s->ctx);
> > >  r = blk_set_aio_context(s->conf->conf.blk, s->ctx, _err);
> > > +aio_context_release(s->ctx);
> > 
> > bdrv_set_aio_context_ignore() is documented like this:
> > 
> >  * The caller must own the AioContext lock for the old AioContext of bs, 
> > but it
> >  * must not own the AioContext lock for new_context (unless new_context is 
> > the
> >  * same as the current context of bs).
> 
> Does that rule apply to blk_set_aio_context too?

bdrv_set_aio_context_ignore() is what blk_set_aio_context() calls, so I
would say yes.

> All use cases I can find in the code are acquiring the new context:
> [...]

Hm... That's unfortunate.

I think the reason why you shouldn't hold it is that the
bdrv_drained_begin() call in bdrv_set_aio_context_ignore() could
deadlock if you hold the lock of a context that is not the current
context of the BlockDriverState.

Maybe there are more reasons, I'm not sure.

Kevin


signature.asc
Description: PGP signature

Re: [PATCH v4 3/4] crypto: luks: use bdrv_co_delete_file_noerr

2020-12-09 Thread Alberto Garcia

On Wed 09 Dec 2020 05:44:40 PM CET, Maxim Levitsky wrote:
> This refactoring is now possible thanks to this function.
>
> Signed-off-by: Maxim Levitsky 

Reviewed-by: Alberto Garcia 

Berto

Re: [PATCH v4 2/4] block: add bdrv_co_delete_file_noerr

2020-12-09 Thread Alberto Garcia

On Wed 09 Dec 2020 05:44:39 PM CET, Maxim Levitsky wrote:
> +void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
> +{
> +Error *local_err = NULL;
> +
> +if (!bs) {
> +return;
> +}
> +
> +int ret = bdrv_co_delete_file(bs, _err);
   ^^^

According to the QEMU coding style we should not have declarations in
the middle of a block.

The patch looks otherwise fine.

Reviewed-by: Alberto Garcia 

Berto

Re: [PATCH 1/2] virtio-blk: Acquire context while switching them on dataplane start

2020-12-09 Thread Sergio Lopez

On Mon, Dec 07, 2020 at 04:37:53PM +0100, Kevin Wolf wrote:
> Am 04.12.2020 um 17:53 hat Sergio Lopez geschrieben:
> > On dataplane start, acquire the new AIO context before calling
> > 'blk_set_aio_context', releasing it immediately afterwards. This
> > prevents reaching the AIO context attach/detach notifier functions
> > without having acquired it first.
> > 
> > It was also the only place where 'blk_set_aio_context' was called with
> > an unprotected AIO context.
> > 
> > Signed-off-by: Sergio Lopez 
> > ---
> >  hw/block/dataplane/virtio-blk.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/block/dataplane/virtio-blk.c 
> > b/hw/block/dataplane/virtio-blk.c
> > index 37499c5564..034e43cb1f 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -214,7 +214,9 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> >  vblk->dataplane_started = true;
> >  trace_virtio_blk_data_plane_start(s);
> >  
> > +aio_context_acquire(s->ctx);
> >  r = blk_set_aio_context(s->conf->conf.blk, s->ctx, _err);
> > +aio_context_release(s->ctx);
> 
> bdrv_set_aio_context_ignore() is documented like this:
> 
>  * The caller must own the AioContext lock for the old AioContext of bs, but 
> it
>  * must not own the AioContext lock for new_context (unless new_context is the
>  * same as the current context of bs).

Does that rule apply to blk_set_aio_context too? All use cases I can
find in the code are acquiring the new context:

hw/block/dataplane/xen-block.c:
 719 void xen_block_dataplane_start(XenBlockDataPlane *dataplane,
 720const unsigned int ring_ref[],
 721unsigned int nr_ring_ref,
 722unsigned int event_channel,
 723unsigned int protocol,
 724Error **errp)
 725 {
 ...
 811 aio_context_acquire(dataplane->ctx);
 812 /* If other users keep the BlockBackend in the iothread, that's ok */
 813 blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL);
 814 /* Only reason for failure is a NULL channel */
 815 xen_device_set_event_channel_context(xendev, dataplane->event_channel,
 816  dataplane->ctx, _abort);
 817 aio_context_release(dataplane->ctx);

hw/scsi/virtio-scsi.c:
 818 static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState 
*dev,
 819 Error **errp)
 820 {
 ...
 830 virtio_scsi_acquire(s);
 831 ret = blk_set_aio_context(sd->conf.blk, s->ctx, errp);
 832 virtio_scsi_release(s);

Thanks,
Sergio.

signature.asc
Description: PGP signature

[PATCH v4 1/4] crypto: luks: Fix tiny memory leak

2020-12-09 Thread Maxim Levitsky

When the underlying block device doesn't support the
bdrv_co_delete_file interface, an 'Error' object was leaked.

Signed-off-by: Maxim Levitsky 
Reviewed-by: Alberto Garcia 
---
 block/crypto.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/crypto.c b/block/crypto.c
index aef5a5721a..b3a5275132 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -735,6 +735,8 @@ fail:
  */
 if ((r_del < 0) && (r_del != -ENOTSUP)) {
 error_report_err(local_delete_err);
+} else {
+error_free(local_delete_err);
 }
 }
 
-- 
2.26.2

[PATCH v4 2/4] block: add bdrv_co_delete_file_noerr

2020-12-09 Thread Maxim Levitsky

This function wraps bdrv_co_delete_file for the common case of removing a file,
which was just created by format driver, on an error condition.

It hides the -ENOTSUPP error, and reports all other errors otherwise.

Signed-off-by: Maxim Levitsky 
---
 block.c   | 23 +++
 include/block/block.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/block.c b/block.c
index f1cedac362..57e6d9750a 100644
--- a/block.c
+++ b/block.c
@@ -704,6 +704,29 @@ int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, 
Error **errp)
 return ret;
 }
 
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs)
+{
+Error *local_err = NULL;
+
+if (!bs) {
+return;
+}
+
+int ret = bdrv_co_delete_file(bs, _err);
+/*
+ * ENOTSUP will happen if the block driver doesn't support
+ * the 'bdrv_co_delete_file' interface. This is a predictable
+ * scenario and shouldn't be reported back to the user.
+ */
+if (ret == -ENOTSUP) {
+error_free(local_err);
+} else if (ret < 0) {
+error_report_err(local_err);
+}
+}
+
+
+
 /**
  * Try to get @bs's logical and physical block size.
  * On success, store them in @bsz struct and return 0.
diff --git a/include/block/block.h b/include/block/block.h
index c9d7c58765..af03022723 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -428,6 +428,7 @@ int bdrv_freeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base,
   Error **errp);
 void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState *base);
 int coroutine_fn bdrv_co_delete_file(BlockDriverState *bs, Error **errp);
+void coroutine_fn bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 typedef struct BdrvCheckResult {
-- 
2.26.2

[PATCH v4 4/4] block: qcow2: remove the created file on initialization error

2020-12-09 Thread Maxim Levitsky

If the qcow initialization fails, we should remove the file if it was
already created, to avoid leaving stale files around.

We already do this for luks raw images.

Signed-off-by: Maxim Levitsky 
---
 block/qcow2.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3a90ef2786..b5169b7cad 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3847,12 +3847,13 @@ static int coroutine_fn 
qcow2_co_create_opts(BlockDriver *drv,
 
 /* Create the qcow2 image (format layer) */
 ret = qcow2_co_create(create_options, errp);
+
+finish:
 if (ret < 0) {
-goto finish;
+bdrv_co_delete_file_noerr(bs);
+bdrv_co_delete_file_noerr(data_bs);
 }
 
-ret = 0;
-finish:
 qobject_unref(qdict);
 bdrv_unref(bs);
 bdrv_unref(data_bs);
-- 
2.26.2

[PATCH v4 3/4] crypto: luks: use bdrv_co_delete_file_noerr

2020-12-09 Thread Maxim Levitsky

This refactoring is now possible thanks to this function.

Signed-off-by: Maxim Levitsky 
---
 block/crypto.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index b3a5275132..1d30fde38e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -725,19 +725,8 @@ fail:
  * If an error occurred, delete 'filename'. Even if the file existed
  * beforehand, it has been truncated and corrupted in the process.
  */
-if (ret && bs) {
-Error *local_delete_err = NULL;
-int r_del = bdrv_co_delete_file(bs, _delete_err);
-/*
- * ENOTSUP will happen if the block driver doesn't support
- * the 'bdrv_co_delete_file' interface. This is a predictable
- * scenario and shouldn't be reported back to the user.
- */
-if ((r_del < 0) && (r_del != -ENOTSUP)) {
-error_report_err(local_delete_err);
-} else {
-error_free(local_delete_err);
-}
+if (ret) {
+bdrv_co_delete_file_noerr(bs);
 }
 
 bdrv_unref(bs);
-- 
2.26.2

[PATCH v4 0/4] qcow2: don't leave partially initialized file on image creation

2020-12-09 Thread Maxim Levitsky

Use the bdrv_co_delete_file interface to delete the underlying
file if qcow2 initialization fails (e.g due to bad encryption secret)

This makes the qcow2 driver behave the same way as the luks driver behaves.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1845353

V3: addressed review feedback and reworked commit messages

V4: got rid of code duplication by adding bdrv_co_delete_file_noerr
and made the qcow2 driver use this function to delete
both the main and the data file.

Best regards,
Maxim Levitsky

Maxim Levitsky (4):
  crypto: luks: Fix tiny memory leak
  block: add bdrv_co_delete_file_noerr
  crypto: luks: use bdrv_co_delete_file_noerr
  block: qcow2: remove the created file on initialization error

 block.c   | 23 +++
 block/crypto.c| 13 ++---
 block/qcow2.c |  7 ---
 include/block/block.h |  1 +
 4 files changed, 30 insertions(+), 14 deletions(-)

-- 
2.26.2

Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

2020-12-09 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201209135355.561745-1-mlevi...@redhat.com
Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20201209135355.561745-1-mlevi...@redhat.com -> 
patchew/20201209135355.561745-1-mlevi...@redhat.com
Switched to a new branch 'test'
77c9000 block/scsi: correctly emulate the VPD block limits page
61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough
35c66d6 block: add max_ioctl_transfer to BlockLimits
08ba263 file-posix: add sg_get_max_segments that actually works with sg
e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits

=== OUTPUT BEGIN ===
1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from 
raw_refresh_limits)
2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that 
actually works with sg)
3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to BlockLimits)
4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for 
SCSI passthrough)
5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block 
limits page)
ERROR: braces {} are necessary for all arms of this statement
#39: FILE: hw/scsi/scsi-generic.c:204:
+if (len < r->buflen)
[...]

total: 1 errors, 0 warnings, 28 lines checked

Patch 5/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

2020-12-09 Thread Maxim Levitsky

On Wed, 2020-12-09 at 06:03 -0800, no-re...@patchew.org wrote:
> Patchew URL: 
> https://patchew.org/QEMU/20201209135355.561745-1-mlevi...@redhat.com/
> 
> 
> 
> Hi,
> 
> This series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Message-id: 20201209135355.561745-1-mlevi...@redhat.com
> Subject: [PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> git rev-parse base > /dev/null || exit 0
> git config --local diff.renamelimit 0
> git config --local diff.renames True
> git config --local diff.algorithm histogram
> ./scripts/checkpatch.pl --mailback base..
> === TEST SCRIPT END ===
> 
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
>  * [new tag] patchew/20201209135355.561745-1-mlevi...@redhat.com -> 
> patchew/20201209135355.561745-1-mlevi...@redhat.com
> Switched to a new branch 'test'
> 77c9000 block/scsi: correctly emulate the VPD block limits page
> 61f49e1 block: use blk_get_max_ioctl_transfer for SCSI passthrough
> 35c66d6 block: add max_ioctl_transfer to BlockLimits
> 08ba263 file-posix: add sg_get_max_segments that actually works with sg
> e9fd749 file-posix: split hdev_refresh_limits from raw_refresh_limits
> 
> === OUTPUT BEGIN ===
> 1/5 Checking commit e9fd7498060c (file-posix: split hdev_refresh_limits from 
> raw_refresh_limits)
> 2/5 Checking commit 08ba263f565d (file-posix: add sg_get_max_segments that 
> actually works with sg)
> 3/5 Checking commit 35c66d636d83 (block: add max_ioctl_transfer to 
> BlockLimits)
> 4/5 Checking commit 61f49e1c953b (block: use blk_get_max_ioctl_transfer for 
> SCSI passthrough)
> 5/5 Checking commit 77c9000b7c30 (block/scsi: correctly emulate the VPD block 
> limits page)
> ERROR: braces {} are necessary for all arms of this statement
> #39: FILE: hw/scsi/scsi-generic.c:204:
> +if (len < r->buflen)
+1 Good bot :-)

Best regards,
Maxim Levitsky

> [...]
> 
> total: 1 errors, 0 warnings, 28 lines checked
> 
> Patch 5/5 has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
> 
> === OUTPUT END ===
> 
> Test command exited with code: 1
> 
> 
> The full log is available at
> http://patchew.org/logs/20201209135355.561745-1-mlevi...@redhat.com/testing.checkpatch/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-de...@redhat.com

Re: qemu 6.0 rbd driver rewrite

2020-12-09 Thread Jason Dillaman

On Wed, Dec 9, 2020 at 7:19 AM Peter Lieven  wrote:
>
> Am 01.12.20 um 13:40 schrieb Peter Lieven:
> > Hi,
> >
> >
> > i would like to submit a series for 6.0 which will convert the aio hooks to 
> > native coroutine hooks and add write zeroes support.
> >
> > The aio routines are nowadays just an emulation on top of coroutines which 
> > add additional overhead.
> >
> > For this I would like to lift the minimum librbd requirement to luminous 
> > release to get rid of the ifdef'ry in the code.
> >
> >
> > Any objections?

None from me (speaking in my role under Ceph) -- even Luminous is EoL
for us upstream. Hopefully no one would attempt to install QEMU 6 but
expect to keep librbd frozen at a >3 year old Kraken or earlier
release.

> > Best,
> >
> > Peter
> >
>
> Kindly pinging as the 6.0 dev tree is now open. Also cc'ing qemu-devel which 
> I accidently forgot.
>
>
> Peter
>
>

-- 
Jason

[PATCH v2 5/5] block/scsi: correctly emulate the VPD block limits page

2020-12-09 Thread Maxim Levitsky

When the device doesn't support the VPD block limits page, we emulate it even
for SCSI passthrough.

As a part of the emulation we need to add it to the 'Supported VPD Pages'

The code that does this adds it to the page, but it doesn't increase the length
of the data to be copied to the guest, thus the guest never sees the VPD block
limits page as supported.

Bump the transfer size by 1 in this case.

Signed-off-by: Maxim Levitsky 
---
 hw/scsi/scsi-generic.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 6df67bf889..4354469841 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -134,7 +134,7 @@ static int execute_command(BlockBackend *blk,
 return 0;
 }
 
-static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
+static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
 {
 uint8_t page, page_idx;
 
@@ -200,8 +200,12 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, 
SCSIDevice *s)
 r->buf[page_idx] = 0xb0;
 }
 stw_be_p(r->buf + 2, lduw_be_p(r->buf + 2) + 1);
+
+if (len < r->buflen)
+len++;
 }
 }
+return len;
 }
 
 static int scsi_generic_emulate_block_limits(SCSIGenericReq *r, SCSIDevice *s)
@@ -316,7 +320,7 @@ static void scsi_read_complete(void * opaque, int ret)
 }
 }
 if (r->req.cmd.buf[0] == INQUIRY) {
-scsi_handle_inquiry_reply(r, s);
+len = scsi_handle_inquiry_reply(r, s, len);
 }
 
 req_complete:
-- 
2.26.2

[PATCH v2 4/5] block: use blk_get_max_ioctl_transfer for SCSI passthrough

2020-12-09 Thread Maxim Levitsky

Switch file-posix to expose only the max_ioctl_transfer limit.

Let the iscsi driver work as it did before since it is bound by the transfer
limit in both regular read/write and in SCSI passthrough case.

Switch the scsi-disk and scsi-block drivers to read the SG max transfer limits
using the new blk_get_max_ioctl_transfer interface.


Fixes: 867eccfed8 ("file-posix: Use max transfer length/segment count only for 
SCSI passthrough")
Signed-off-by: Maxim Levitsky 
---
 block/file-posix.c | 4 ++--
 block/iscsi.c  | 1 +
 hw/scsi/scsi-generic.c | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 10ebc4c5b7..0a94211847 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1284,12 +1284,12 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
Error **errp)
get_max_transfer_length(s->fd);
 
 if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-bs->bl.max_transfer = pow2floor(ret);
+bs->bl.max_ioctl_transfer = pow2floor(ret);
 }
 
 ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
 if (ret > 0) {
-bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+bs->bl.max_ioctl_transfer = MIN_NON_ZERO(bs->bl.max_ioctl_transfer,
ret * qemu_real_host_page_size);
 }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index e30a7e3606..3685da2971 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -2065,6 +2065,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
 
 if (max_xfer_len * block_size < INT_MAX) {
 bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
+bs->bl.max_ioctl_transfer = bs->bl.max_transfer;
 }
 
 if (iscsilun->lbp.lbpu) {
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 2cb23ca891..6df67bf889 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -167,7 +167,7 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, 
SCSIDevice *s)
 page = r->req.cmd.buf[2];
 if (page == 0xb0) {
 uint32_t max_transfer =
-blk_get_max_transfer(s->conf.blk) / s->blocksize;
+blk_get_max_ioctl_transfer(s->conf.blk) / s->blocksize;
 
 assert(max_transfer);
 stl_be_p(>buf[8], max_transfer);
@@ -210,7 +210,7 @@ static int scsi_generic_emulate_block_limits(SCSIGenericReq 
*r, SCSIDevice *s)
 uint8_t buf[64];
 
 SCSIBlockLimits bl = {
-.max_io_sectors = blk_get_max_transfer(s->conf.blk) / s->blocksize
+.max_io_sectors = blk_get_max_ioctl_transfer(s->conf.blk) / 
s->blocksize
 };
 
 memset(r->buf, 0, r->buflen);
-- 
2.26.2

[PATCH v2 2/5] file-posix: add sg_get_max_segments that actually works with sg

2020-12-09 Thread Maxim Levitsky

From: Tom Yan 

sg devices have different major/minor than their corresponding
block devices. Using sysfs to get max segments never really worked
for them.

Fortunately the sg driver provides an ioctl to get sg_tablesize,
which is apparently equivalent to max segments.

Signed-off-by: Tom Yan 
Signed-off-by: Maxim Levitsky 
---
 block/file-posix.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 226ddbbdad..10ebc4c5b7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1181,6 +1181,26 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
+static int sg_get_max_segments(int fd)
+{
+/*
+ * /dev/sg* character devices report 'max_segments' via
+ * SG_GET_SG_TABLESIZE ioctl
+ */
+
+#ifdef SG_GET_SG_TABLESIZE
+long max_segments = 0;
+
+if (ioctl(fd, SG_GET_SG_TABLESIZE, _segments) == 0) {
+return max_segments;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
 static int get_max_transfer_length(int fd)
 {
 #if defined(BLKSECTGET)
@@ -1267,7 +1287,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_transfer = pow2floor(ret);
 }
 
-ret = get_max_segments(s->fd);
+ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
 if (ret > 0) {
 bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
ret * qemu_real_host_page_size);
-- 
2.26.2

[PATCH v2 3/5] block: add max_ioctl_transfer to BlockLimits

2020-12-09 Thread Maxim Levitsky

Maximum transfer size when accessing a kernel block device is only relevant
when using SCSI passthrough (SG_IO ioctl) since only in this case the requests
are passed directly to underlying hardware with no pre-processing.
Same is true when using /dev/sg* character devices (which only support SG_IO)

Therefore split the block driver's advertized max transfer size by
the regular max transfer size, and the max transfer size for SCSI passthrough
(the new max_ioctl_transfer field)

In the next patch, the qemu block drivers that support SCSI passthrough
will set the max_ioctl_transfer field, and simultaneously, the block devices
that implement scsi passthrough will switch to 'blk_get_max_ioctl_transfer' to
query and to pass it to the guest.

Signed-off-by: Maxim Levitsky 
---
 block/block-backend.c  | 12 
 block/io.c |  2 ++
 include/block/block_int.h  |  4 
 include/sysemu/block-backend.h |  1 +
 4 files changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index ce78d30794..c1d149a755 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1938,6 +1938,18 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
 return MIN_NON_ZERO(max, INT_MAX);
 }
 
+/* Returns the maximum transfer length, for SCSI passthrough */
+uint32_t blk_get_max_ioctl_transfer(BlockBackend *blk)
+{
+BlockDriverState *bs = blk_bs(blk);
+uint32_t max = 0;
+
+if (bs) {
+max = bs->bl.max_ioctl_transfer;
+}
+return MIN_NON_ZERO(max, INT_MAX);
+}
+
 int blk_get_max_iov(BlockBackend *blk)
 {
 return blk->root->bs->bl.max_iov;
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..3eae176992 100644
--- a/block/io.c
+++ b/block/io.c
@@ -126,6 +126,8 @@ static void bdrv_merge_limits(BlockLimits *dst, const 
BlockLimits *src)
 {
 dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
 dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+dst->max_ioctl_transfer = MIN_NON_ZERO(dst->max_ioctl_transfer,
+src->max_ioctl_transfer);
 dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
  src->opt_mem_alignment);
 dst->min_mem_alignment = MAX(dst->min_mem_alignment,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95d9333be1..e9874c8c23 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -678,6 +678,10 @@ typedef struct BlockLimits {
  * clamped down. */
 uint32_t max_transfer;
 
+/* Maximal transfer length for SCSI passthrough (ioctl interface) */
+uint32_t max_ioctl_transfer;
+
+
 /* memory alignment, in bytes so that no bounce buffer is needed */
 size_t min_mem_alignment;
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..b019a37b7a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -203,6 +203,7 @@ void blk_eject(BlockBackend *blk, bool eject_flag);
 int blk_get_flags(BlockBackend *blk);
 uint32_t blk_get_request_alignment(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
+uint32_t blk_get_max_ioctl_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
-- 
2.26.2

[PATCH v2 1/5] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-12-09 Thread Maxim Levitsky

From: Tom Yan 

We can and should get max transfer length and max segments for all host
devices / cdroms (on Linux).

Also use MIN_NON_ZERO instead when we clamp max transfer length against
max segments.

Signed-off-by: Tom Yan 
Signed-off-by: Maxim Levitsky 
---
 block/file-posix.c | 59 +-
 1 file changed, 43 insertions(+), 16 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index d5fd1dbcd2..226ddbbdad 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1162,6 +1162,12 @@ static void raw_reopen_abort(BDRVReopenState *state)
 
 static int sg_get_max_transfer_length(int fd)
 {
+/*
+ * BLKSECTGET for /dev/sg* character devices incorrectly returns
+ * the max transfer size in bytes (rather than in blocks).
+ * Also note that /dev/sg* doesn't support BLKSSZGET ioctl.
+ */
+
 #ifdef BLKSECTGET
 int max_bytes = 0;
 
@@ -1175,7 +1181,22 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int get_max_transfer_length(int fd)
+{
+#if defined(BLKSECTGET)
+int sect = 0;
+
+if (ioctl(fd, BLKSECTGET, ) == 0) {
+return sect << 9;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
+static int get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
 char buf[32];
@@ -1230,23 +1251,29 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 
-if (bs->sg) {
-int ret = sg_get_max_transfer_length(s->fd);
+raw_probe_alignment(bs, s->fd, errp);
+bs->bl.min_mem_alignment = s->buf_align;
+bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+}
 
-if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-bs->bl.max_transfer = pow2floor(ret);
-}
+static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVRawState *s = bs->opaque;
 
-ret = sg_get_max_segments(s->fd);
-if (ret > 0) {
-bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-  ret * qemu_real_host_page_size);
-}
+int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
+   get_max_transfer_length(s->fd);
+
+if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+bs->bl.max_transfer = pow2floor(ret);
 }
 
-raw_probe_alignment(bs, s->fd, errp);
-bs->bl.min_mem_alignment = s->buf_align;
-bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+ret = get_max_segments(s->fd);
+if (ret > 0) {
+bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+   ret * qemu_real_host_page_size);
+}
+
+raw_refresh_limits(bs, errp);
 }
 
 static int check_for_dasd(int fd)
@@ -3601,7 +3628,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_pdiscard   = hdev_co_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
@@ -3725,7 +3752,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_preadv = raw_co_preadv,
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-- 
2.26.2

[PATCH v2 0/5] SCSI: fix transfer limits for SCSI passthrough

2020-12-09 Thread Maxim Levitsky

This patch series attempts to provide a solution to the problem of the transfer
limits of the raw file driver (host_device/file-posix), some of which I
already tried to fix in the past.

I included 2 patches from Tom Yan which fix two issues with reading the limits
correctly from the */dev/sg* character devices in the first place.

The only change to these patches is that I tweaked a bit the comments in the
source to better document the /dev/sg quirks.

The other two patches in this series split the max transfer limits that qemu
block devices expose in two:
One limit is for the regular IO, and another is for the SG_IO (aka 
bdrv_*_ioctl),
and the two device drivers (scsi-block and scsi-generic) that use the later
are switched to the new interface.

This should ensure that the raw driver can still advertise the unlimited
transfer  length, unless it is used for SG_IO, because that yields the highest
performance.

Also I include a somewhat unrelated fix to a bug I found in qemu's
SCSI passthrough while testing this:
When qemu emulates the VPD block limit page, for a SCSI device that doesn't
implement it, it doesn't really advertise the emulated page to the guest.

I tested this by doing both regular and SG_IO passthrough of my
USB SD card reader.

That device turned out to be a perfect device for the task, since it has max
transfer size of 1024 blocks (512K), and it enforces it.

Also it didn't implement the VPD block limits page,
(transfer size limit probably comes from something USB related) which triggered
the unrelated bug.

I was able to see IO errors without the patches, and the wrong max transfer
size in the guest, and with patches both issues were gone.

I also found an unrelated issue in /dev/sg passthrough in the kernel.
It turns out that in-kernel driver has a limitation of 16 requests in flight,
regardless of what underlying device supports.

With a large multi-threaded fio job  and a debug print in qemu, it is easy to
see it, although the errors don't do much harm to the guest as it retries the
IO, and eventually succeed.
It is an open question if this should be solved.

V2: fixed an issue in a patch from Tom Yan (thanks), and removed
refactoring from last patch according to Paulo's request.

Maxim Levitsky (3):
  block: add max_ioctl_transfer to BlockLimits
  block: use blk_get_max_ioctl_transfer for SCSI passthrough
  block/scsi: correctly emulate the VPD block limits page

Tom Yan (2):
  file-posix: split hdev_refresh_limits from raw_refresh_limits
  file-posix: add sg_get_max_segments that actually works with sg

 block/block-backend.c  | 12 ++
 block/file-posix.c | 77 +++---
 block/io.c |  2 +
 block/iscsi.c  |  1 +
 hw/scsi/scsi-generic.c | 12 --
 include/block/block_int.h  |  4 ++
 include/sysemu/block-backend.h |  1 +
 7 files changed, 90 insertions(+), 19 deletions(-)

-- 
2.26.2

Re: [PATCH RFC] qemu co-mutex crash / question

2020-12-09 Thread Vladimir Sementsov-Ogievskiy


09.12.2020 15:32, Vladimir Sementsov-Ogievskiy wrote:

test-aio-multithread: ../util/qemu-coroutine-lock.c:197: qemu_co_mutex_wake: 
Assertion `mutex == co->wait_on_mutex' failed.

 Thread 18 "test-aio-multit" received signal SIGABRT, Aborted.
 [Switching to Thread 0x7fffe5ffb700 (LWP 24549)]
 0x77063625 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x77063625 in raise () from /lib64/libc.so.6
 #1  0x7704c8d9 in abort () from /lib64/libc.so.6
 #2  0x7704c7a9 in __assert_fail_base.cold () from /lib64/libc.so.6
 #3  0x7705ba66 in __assert_fail () from /lib64/libc.so.6
 #4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
 #5  0x5568c5a0 in qemu_co_mutex_unlock (mutex=0x55771360 
) at ../util/qemu-coroutine-lock.c:307
 #6  0x5557acfd in test_multi_co_mutex_entry (opaque=0x0) at 
../tests/test-aio-multithread.c:208
 #7  0x556bb5d7 in coroutine_trampoline (i0=1434467712, i1=21845) 
at ../util/coroutine-ucontext.c:173
 #8  0x77078d30 in ?? () from /lib64/libc.so.6
 #9  0x7fffd850 in ?? ()
 #10 0x in ?? ()
 (gdb) fr 4
 #4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
 197 assert(mutex == co->wait_on_mutex);
 (gdb) p mutex
 $1 = (CoMutex *) 0x55771360 
 (gdb) p co->wait_on_mutex
 $2 = (CoMutex *) 0x55771360 
 (gdb) p mutex == co->wait_on_mutex
 $3 = 1

So, it failed, but in gdb the condition is true.. How can that be?


Interesting: I tried to run test on one cpu:

for i in {1..100}; do taskset -c 0 ./build/tests/test-aio-multithread -p 
/aio/multi/mutex/handoff; done

with taskset it takes a lot more tries to reproduce, but finally I have correct 
coredump with correct assertion failure:

(gdb) bt
#0  0x7ff7fa22d625 in raise () from /lib64/libc.so.6
#1  0x7ff7fa2168d9 in abort () from /lib64/libc.so.6
#2  0x7ff7fa2167a9 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7ff7fa225a66 in __assert_fail () from /lib64/libc.so.6
#4  0x564c7ca99153 in qemu_co_mutex_wake (mutex=0x564c7cb7e360 , 
co=0x564c7d3f5c40) at ../util/qemu-coroutine-lock.c:197
#5  0x564c7ca995a0 in qemu_co_mutex_unlock (mutex=0x564c7cb7e360 ) 
at ../util/qemu-coroutine-lock.c:307
#6  0x564c7c987cfd in test_multi_co_mutex_entry (opaque=0x0) at 
../tests/test-aio-multithread.c:208
#7  0x564c7cac85d7 in coroutine_trampoline (i0=2101304064, i1=22092) at 
../util/coroutine-ucontext.c:173
#8  0x7ff7fa242d30 in ?? () from /lib64/libc.so.6
#9  0x7ffd3b3c6ac0 in ?? ()
#10 0x in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ff7ed19c000
(gdb) fr 4
#4  0x564c7ca99153 in qemu_co_mutex_wake (mutex=0x564c7cb7e360 , 
co=0x564c7d3f5c40) at ../util/qemu-coroutine-lock.c:197
197 assert(mutex == co->wait_on_mutex);
(gdb) p mutex
$1 = (CoMutex *) 0x564c7cb7e360 
(gdb) p co->wait_on_mutex
$2 = (CoMutex *) 0x0


other interesting threads:

Thread 7 (Thread 0x7ff7ef19f700 (LWP 261134)):
#0  0x564c7ca98f99 in push_waiter (mutex=0x564c7cb7e360 , 
w=0x7ff7ed09aea0) at ../util/qemu-coroutine-lock.c:151
#1  0x564c7ca991c4 in qemu_co_mutex_lock_slowpath (ctx=0x7ff7e4000b60, 
mutex=0x564c7cb7e360 ) at ../util/qemu-coroutine-lock.c:211
#2  0x564c7ca993f5 in qemu_co_mutex_lock (mutex=0x564c7cb7e360 ) 
at ../util/qemu-coroutine-lock.c:277
#3  0x564c7c987ce2 in test_multi_co_mutex_entry (opaque=0x0) at 
../tests/test-aio-multithread.c:206
#4  0x564c7cac85d7 in coroutine_trampoline (i0=2101304384, i1=22092) at 
../util/coroutine-ucontext.c:173
#5  0x7ff7fa242d30 in ?? () from /lib64/libc.so.6
#6  0x7ffd3b3c6ac0 in ?? ()
#7  0x in ?? ()

#0  0x7ff7fa3cdf55 in nanosleep () from /lib64/libpthread.so.0
#1  0x7ff7fb0d27b7 in g_usleep () from /lib64/libglib-2.0.so.0
#2  0x564c7c987e05 in test_multi_co_mutex (threads=2, seconds=3) at 
../tests/test-aio-multithread.c:237
#3  0x564c7c987eff in test_multi_co_mutex_2_3 () at 
../tests/test-aio-multithread.c:270
#4  0x7ff7fb0cface in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#5  0x7ff7fb0cf874 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#6  0x7ff7fb0cf874 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#7  0x7ff7fb0cf874 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#8  0x7ff7fb0cff7b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#9  0x7ff7fb0cffd5 in g_test_run () from /lib64/libglib-2.0.so.0
#10 0x564c7c98874e in main (argc=1, argv=0x7ffd3b3c7868) at 
../tests/test-aio-multithread.c:459



--
Best regards,
Vladimir

Re: [PATCH RFC] qemu co-mutex crash / question

2020-12-09 Thread Vladimir Sementsov-Ogievskiy


09.12.2020 15:32, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

I have a coredump of our qemu branch, based on rhev-2.12.0-44.el7_8.2,
which in turn is based on v2.12.0.. And don't have any kind of
reproduce.

The backtrace:

 #0  aio_co_schedule (ctx=0x0, co=0x55dd539fa340) at util/async.c:455
 #1  0x55dd51149716 in aio_co_enter (ctx=, co=) at util/async.c:476
 #2  0x55dd511497bc in aio_co_wake (co=) at 
util/async.c:470
 #3  0x55dd5115ea43 in qemu_co_mutex_wake (mutex=0x55dd539c36b0, 
co=) at util/qemu-coroutine-lock.c:197
 #4  qemu_co_mutex_unlock (mutex=mutex@entry=0x55dd539c36b0) at 
util/qemu-coroutine-lock.c:300
 #5  0x55dd5109f4e0 in qcow2_co_pwritev_cluster_compressed 
(qiov=0x7fcbbc972a70, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at 
block/qcow2.c:4360
 #6  qcow2_co_pwritev_compressed (bs=0x55dd539fe000, offset=17582325760, 
bytes=65536, qiov=0x7fcbbc972de0) at block/qcow2.c:4425
 #7  0x55dd510d5cd2 in bdrv_driver_pwritev_compressed 
(qiov=0x7fcbbc972de0, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at 
block/io.c:1227
 #8  bdrv_aligned_pwritev (req=req@entry=0x7fcbbc972c60, 
offset=offset@entry=17582325760, bytes=bytes@entry=65536, align=align@entry=1, 
qiov=qiov@entry=0x7fcbbc972de0, flags=flags@entry=32, child=0x55dd539cea80,
 child=0x55dd539cea80) at block/io.c:1850
 #9  0x55dd510d6369 in bdrv_co_pwritev (child=0x55dd539cea80, 
offset=offset@entry=17582325760, bytes=bytes@entry=65536, 
qiov=qiov@entry=0x7fcbbc972de0, flags=BDRV_REQ_WRITE_COMPRESSED) at 
block/io.c:2144
 #10 0x55dd510c3644 in blk_co_pwritev (blk=0x55dd539fc300, 
offset=17582325760, bytes=65536, qiov=0x7fcbbc972de0, flags=) at 
block/block-backend.c:1237
 #11 0x55dd510c372c in blk_write_entry (opaque=0x7fcbbc972e00) at 
block/block-backend.c:1264
 #12 0x55dd510c1e18 in blk_prw (blk=0x55dd539fc300, offset=17582325760, 
buf=buf@entry=0x55dd54a38000 "", bytes=bytes@entry=65536, 
co_entry=co_entry@entry=0x55dd510c3710 ,
 flags=BDRV_REQ_WRITE_COMPRESSED) at block/block-backend.c:1292
 #13 0x55dd510c2f45 in blk_pwrite (blk=, offset=, buf=buf@entry=0x55dd54a38000, count=count@entry=65536, flags=) at 
block/block-backend.c:1486
 #14 0x55dd510ef949 in nbd_handle_request (errp=0x7fcbbc972ef8, data=0x55dd54a38000 
"", request=, client=0x55dd539ee420) at nbd/server.c:2264
 #15 nbd_trip (opaque=0x55dd539ee420) at nbd/server.c:2393
 #16 0x55dd5115f72a in coroutine_trampoline (i0=, 
i1=) at util/coroutine-ucontext.c:116
 #17 0x7fcbc5422190 in ?? () from 
/work/crash-bugs/PSBM-123528/ccpp-2020-12-08-00_59_06-418945/root/lib64/libc.so.6
 #18 0x7fcbbca736d0 in ?? ()
 #19 0x in ?? ()
 Backtrace stopped: Cannot access memory at address 0x7fcbbc973000
 (gdb) p *co
 $1 = {entry = 0x0, entry_arg = 0x0, caller = 0x0, pool_next = {sle_next = 0x0}, locks_held 
= 0, ctx = 0x0, scheduled = 0x55dd51195660 <__func__.23793> "aio_co_schedule", 
co_queue_next = {sqe_next = 0x0},
   co_queue_wakeup = {sqh_first = 0x0, sqh_last = 0x55dd539f5680}, 
co_scheduled_next = {sle_next = 0x0}}

So, it looks like we want to wake up a coroutine on co-mutex unlock,
but the coroutine is already exited..

I had no idea what how to debug it, and decided to add some assertions,
to make sure that coroutine waiting on mutex is entered through
qemu_co_mutex_unlock, see the patch below.

Still, when I run make check with this patch applied, I faced a crash
in ./tests/test-aio-multithread, my assertion fails:

test-aio-multithread: ../util/qemu-coroutine-lock.c:197: qemu_co_mutex_wake: 
Assertion `mutex == co->wait_on_mutex' failed.

 Thread 18 "test-aio-multit" received signal SIGABRT, Aborted.
 [Switching to Thread 0x7fffe5ffb700 (LWP 24549)]
 0x77063625 in raise () from /lib64/libc.so.6
 (gdb) bt
 #0  0x77063625 in raise () from /lib64/libc.so.6
 #1  0x7704c8d9 in abort () from /lib64/libc.so.6
 #2  0x7704c7a9 in __assert_fail_base.cold () from /lib64/libc.so.6
 #3  0x7705ba66 in __assert_fail () from /lib64/libc.so.6
 #4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
 #5  0x5568c5a0 in qemu_co_mutex_unlock (mutex=0x55771360 
) at ../util/qemu-coroutine-lock.c:307
 #6  0x5557acfd in test_multi_co_mutex_entry (opaque=0x0) at 
../tests/test-aio-multithread.c:208
 #7  0x556bb5d7 in coroutine_trampoline (i0=1434467712, i1=21845) 
at ../util/coroutine-ucontext.c:173
 #8  0x77078d30 in ?? () from /lib64/libc.so.6
 #9  0x7fffd850 in ?? ()
 #10 0x in ?? ()
 (gdb) fr 4
 #4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
 197 assert(mutex == co->wait_on_mutex);

[PATCH RFC] qemu co-mutex crash / question

2020-12-09 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I have a coredump of our qemu branch, based on rhev-2.12.0-44.el7_8.2,
which in turn is based on v2.12.0.. And don't have any kind of
reproduce.

The backtrace:

#0  aio_co_schedule (ctx=0x0, co=0x55dd539fa340) at util/async.c:455
#1  0x55dd51149716 in aio_co_enter (ctx=, co=) at util/async.c:476
#2  0x55dd511497bc in aio_co_wake (co=) at 
util/async.c:470
#3  0x55dd5115ea43 in qemu_co_mutex_wake (mutex=0x55dd539c36b0, 
co=) at util/qemu-coroutine-lock.c:197
#4  qemu_co_mutex_unlock (mutex=mutex@entry=0x55dd539c36b0) at 
util/qemu-coroutine-lock.c:300
#5  0x55dd5109f4e0 in qcow2_co_pwritev_cluster_compressed 
(qiov=0x7fcbbc972a70, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at 
block/qcow2.c:4360
#6  qcow2_co_pwritev_compressed (bs=0x55dd539fe000, offset=17582325760, 
bytes=65536, qiov=0x7fcbbc972de0) at block/qcow2.c:4425
#7  0x55dd510d5cd2 in bdrv_driver_pwritev_compressed 
(qiov=0x7fcbbc972de0, bytes=65536, offset=17582325760, bs=0x55dd539fe000) at 
block/io.c:1227
#8  bdrv_aligned_pwritev (req=req@entry=0x7fcbbc972c60, 
offset=offset@entry=17582325760, bytes=bytes@entry=65536, align=align@entry=1, 
qiov=qiov@entry=0x7fcbbc972de0, flags=flags@entry=32, child=0x55dd539cea80,
child=0x55dd539cea80) at block/io.c:1850
#9  0x55dd510d6369 in bdrv_co_pwritev (child=0x55dd539cea80, 
offset=offset@entry=17582325760, bytes=bytes@entry=65536, 
qiov=qiov@entry=0x7fcbbc972de0, flags=BDRV_REQ_WRITE_COMPRESSED) at 
block/io.c:2144
#10 0x55dd510c3644 in blk_co_pwritev (blk=0x55dd539fc300, 
offset=17582325760, bytes=65536, qiov=0x7fcbbc972de0, flags=) at 
block/block-backend.c:1237
#11 0x55dd510c372c in blk_write_entry (opaque=0x7fcbbc972e00) at 
block/block-backend.c:1264
#12 0x55dd510c1e18 in blk_prw (blk=0x55dd539fc300, offset=17582325760, 
buf=buf@entry=0x55dd54a38000 "", bytes=bytes@entry=65536, 
co_entry=co_entry@entry=0x55dd510c3710 ,
flags=BDRV_REQ_WRITE_COMPRESSED) at block/block-backend.c:1292
#13 0x55dd510c2f45 in blk_pwrite (blk=, 
offset=, buf=buf@entry=0x55dd54a38000, count=count@entry=65536, 
flags=) at block/block-backend.c:1486
#14 0x55dd510ef949 in nbd_handle_request (errp=0x7fcbbc972ef8, 
data=0x55dd54a38000 "", request=, client=0x55dd539ee420) at 
nbd/server.c:2264
#15 nbd_trip (opaque=0x55dd539ee420) at nbd/server.c:2393
#16 0x55dd5115f72a in coroutine_trampoline (i0=, 
i1=) at util/coroutine-ucontext.c:116
#17 0x7fcbc5422190 in ?? () from 
/work/crash-bugs/PSBM-123528/ccpp-2020-12-08-00_59_06-418945/root/lib64/libc.so.6
#18 0x7fcbbca736d0 in ?? ()
#19 0x in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fcbbc973000
(gdb) p *co
$1 = {entry = 0x0, entry_arg = 0x0, caller = 0x0, pool_next = {sle_next = 
0x0}, locks_held = 0, ctx = 0x0, scheduled = 0x55dd51195660 <__func__.23793> 
"aio_co_schedule", co_queue_next = {sqe_next = 0x0},
  co_queue_wakeup = {sqh_first = 0x0, sqh_last = 0x55dd539f5680}, 
co_scheduled_next = {sle_next = 0x0}}

So, it looks like we want to wake up a coroutine on co-mutex unlock,
but the coroutine is already exited..

I had no idea what how to debug it, and decided to add some assertions,
to make sure that coroutine waiting on mutex is entered through
qemu_co_mutex_unlock, see the patch below.

Still, when I run make check with this patch applied, I faced a crash
in ./tests/test-aio-multithread, my assertion fails:

test-aio-multithread: ../util/qemu-coroutine-lock.c:197: qemu_co_mutex_wake: 
Assertion `mutex == co->wait_on_mutex' failed.

Thread 18 "test-aio-multit" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe5ffb700 (LWP 24549)]
0x77063625 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x77063625 in raise () from /lib64/libc.so.6
#1  0x7704c8d9 in abort () from /lib64/libc.so.6
#2  0x7704c7a9 in __assert_fail_base.cold () from /lib64/libc.so.6
#3  0x7705ba66 in __assert_fail () from /lib64/libc.so.6
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
#5  0x5568c5a0 in qemu_co_mutex_unlock (mutex=0x55771360 
) at ../util/qemu-coroutine-lock.c:307
#6  0x5557acfd in test_multi_co_mutex_entry (opaque=0x0) at 
../tests/test-aio-multithread.c:208
#7  0x556bb5d7 in coroutine_trampoline (i0=1434467712, i1=21845) at 
../util/coroutine-ucontext.c:173
#8  0x77078d30 in ?? () from /lib64/libc.so.6
#9  0x7fffd850 in ?? ()
#10 0x in ?? ()
(gdb) fr 4
#4  0x5568c153 in qemu_co_mutex_wake (mutex=0x55771360 
, co=0x55803ec0) at ../util/qemu-coroutine-lock.c:197
197 assert(mutex == co->wait_on_mutex);
(gdb) p mutex
$1 = (CoMutex *) 0x55771360 
(gdb) p co->wait_on_mutex
$2 =

Re: qemu 6.0 rbd driver rewrite

2020-12-09 Thread Peter Lieven

Am 01.12.20 um 13:40 schrieb Peter Lieven:
> Hi,
>
>
> i would like to submit a series for 6.0 which will convert the aio hooks to 
> native coroutine hooks and add write zeroes support.
>
> The aio routines are nowadays just an emulation on top of coroutines which 
> add additional overhead.
>
> For this I would like to lift the minimum librbd requirement to luminous 
> release to get rid of the ifdef'ry in the code.
>
>
> Any objections?
>
>
> Best,
>
> Peter
>

Kindly pinging as the 6.0 dev tree is now open. Also cc'ing qemu-devel which I 
accidently forgot.


Peter

[PATCH] block/nfs: fix int overflow in nfs_client_open_qdict

2020-12-09 Thread Peter Lieven

nfs_client_open returns the file size in sectors. This effectively
makes it impossible to open files larger than 1TB.

Fixes: a1a42af422d46812f1f0cebe6b230c20409a3731
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven 
---
 block/nfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index 77905f516d..8c1968bb41 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -592,7 +592,7 @@ static int64_t nfs_client_open_qdict(NFSClient *client, 
QDict *options,
  int flags, int open_flags, Error **errp)
 {
 BlockdevOptionsNfs *opts;
-int ret;
+int64_t ret;
 
 opts = nfs_options_qdict_to_qapi(options, errp);
 if (opts == NULL) {
-- 
2.17.1

Re: [PATCH] hw/block/nvme: fix bad clearing of CAP

2020-12-09 Thread Klaus Jensen

On Dec  8 10:16, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Commit 37712e00b1f0 ("hw/block/nvme: factor out pmr setup") changed the
> control flow such that the CAP register is erronously cleared after
> nvme_init_pmr() has configured it. Since the entire NvmeCtrl structure
> is zero-filled initially, there is no need for the explicit clearing, so
> just remove it.
> 
> Fixes: 37712e00b1f0 ("hw/block/nvme: factor out pmr setup")
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 8814201364c1..28416b18a5c0 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -3040,7 +3040,6 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  id->psd[0].enlat = cpu_to_le32(0x10);
>  id->psd[0].exlat = cpu_to_le32(0x4);
>  
> -n->bar.cap = 0;
>  NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
>  NVME_CAP_SET_CQR(n->bar.cap, 1);
>  NVME_CAP_SET_TO(n->bar.cap, 0xf);
> -- 
> 2.29.2
> 
> 

Thanks for the reviews, applied to nvme-next.


signature.asc
Description: PGP signature

[PATCH v4 2/2] hw/block/nvme: add simple copy command

2020-12-09 Thread Klaus Jensen

From: Klaus Jensen 

Add support for TP 4065a ("Simple Copy Command"), v2020.05.04
("Ratified").

The implementation uses a bounce buffer to first read in the source
logical blocks, then issue a write of that bounce buffer. The default
maximum number of source logical blocks is 128, translating to 512 KiB
for 4k logical blocks which aligns with the default value of MDTS.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|   4 +
 hw/block/nvme.h   |   1 +
 hw/block/nvme-ns.c|   8 ++
 hw/block/nvme.c   | 217 +-
 hw/block/trace-events |   6 ++
 5 files changed, 235 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 44bf6271b744..745d288b09cf 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,10 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+
+uint16_t mssrl;
+uint32_t mcl;
+uint8_t  msrc;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 574333caa3f9..f549abeeb930 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -62,6 +62,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
 case NVME_CMD_DSM:  return "NVME_NVM_CMD_DSM";
+case NVME_CMD_COPY: return "NVME_NVM_CMD_COPY";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
 }
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 2d69b5177b51..f53f8fc56fd8 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -59,6 +59,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
 id_ns->npda = id_ns->npdg = npdg - 1;
 
+/* simple copy */
+id_ns->mssrl = cpu_to_le16(ns->params.mssrl);
+id_ns->mcl = cpu_to_le32(ns->params.mcl);
+id_ns->msrc = ns->params.msrc;
+
 return 0;
 }
 
@@ -150,6 +155,9 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
+DEFINE_PROP_UINT16("mssrl", NvmeNamespace, params.mssrl, 128),
+DEFINE_PROP_UINT32("mcl", NvmeNamespace, params.mcl, 128),
+DEFINE_PROP_UINT8("msrc", NvmeNamespace, params.msrc, 127),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8814201364c1..fbfeb7ac8140 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -999,6 +999,109 @@ static void nvme_aio_discard_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
+struct nvme_copy_ctx {
+int copies;
+uint8_t *bounce;
+uint32_t nlb;
+};
+
+struct nvme_copy_in_ctx {
+NvmeRequest *req;
+QEMUIOVector iov;
+};
+
+static void nvme_copy_cb(void *opaque, int ret)
+{
+NvmeRequest *req = opaque;
+NvmeNamespace *ns = req->ns;
+struct nvme_copy_ctx *ctx = req->opaque;
+
+trace_pci_nvme_copy_cb(nvme_cid(req));
+
+if (!ret) {
+block_acct_done(blk_get_stats(ns->blkconf.blk), >acct);
+} else {
+block_acct_failed(blk_get_stats(ns->blkconf.blk), >acct);
+nvme_aio_err(req, ret);
+}
+
+g_free(ctx->bounce);
+g_free(ctx);
+
+nvme_enqueue_req_completion(nvme_cq(req), req);
+}
+
+static void nvme_copy_in_complete(NvmeRequest *req)
+{
+NvmeNamespace *ns = req->ns;
+NvmeCopyCmd *copy = (NvmeCopyCmd *)>cmd;
+struct nvme_copy_ctx *ctx = req->opaque;
+uint64_t sdlba = le64_to_cpu(copy->sdlba);
+uint16_t status;
+
+trace_pci_nvme_copy_in_complete(nvme_cid(req));
+
+block_acct_done(blk_get_stats(ns->blkconf.blk), >acct);
+
+status = nvme_check_bounds(ns, sdlba, ctx->nlb);
+if (status) {
+trace_pci_nvme_err_invalid_lba_range(sdlba, ctx->nlb, ns->id_ns.nsze);
+req->status = status;
+
+g_free(ctx->bounce);
+g_free(ctx);
+
+nvme_enqueue_req_completion(nvme_cq(req), req);
+
+return;
+}
+
+qemu_iovec_init(>iov, 1);
+qemu_iovec_add(>iov, ctx->bounce, nvme_l2b(ns, ctx->nlb));
+
+block_acct_start(blk_get_stats(ns->blkconf.blk), >acct,
+ nvme_l2b(ns, ctx->nlb), BLOCK_ACCT_WRITE);
+
+req->aiocb = blk_aio_pwritev(ns->blkconf.blk, nvme_l2b(ns, sdlba),
+ >iov, 0, nvme_copy_cb, req);
+}
+
+static void nvme_aio_copy_in_cb(void *opaque, int ret)
+{
+struct nvme_copy_in_ctx *in_ctx = opaque;
+NvmeRequest *req = in_ctx->req;
+NvmeNamespace *ns = req->ns;
+struct nvme_copy_ctx *ctx = req->opaque;
+
+qemu_iovec_destroy(_ctx->iov);
+g_free(in_ctx);
+
+trace_pci_nvme_aio_copy_in_cb(nvme_cid(req));
+
+if (ret) {
+nvme_aio_err(req, ret);
+}
+
+ctx->copies--;
+
+if (ctx->copies) {
+return;
+}
+
+if (req->status) {
+

[PATCH v4 1/2] nvme: updated shared header for copy command

2020-12-09 Thread Klaus Jensen

From: Klaus Jensen 

Add new data structures and types for the Simple Copy command.

Signed-off-by: Klaus Jensen 
Cc: Stefan Hajnoczi 
Cc: Fam Zheng 
Reviewed-by: Minwoo Im 
Acked-by: Stefan Hajnoczi 
---
 include/block/nvme.h | 45 ++--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e95ff6ca9b37..be3aca913a1d 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -472,6 +472,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_COPY   = 0x19,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -603,6 +604,35 @@ typedef struct QEMU_PACKED NvmeDsmRange {
 uint64_tslba;
 } NvmeDsmRange;
 
+enum {
+NVME_COPY_FORMAT_0 = 0x0,
+};
+
+typedef struct NvmeCopyCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint32_trsvd2[4];
+NvmeCmdDptr dptr;
+uint64_tsdlba;
+uint32_tcdw12;
+uint32_tcdw13;
+uint32_tilbrt;
+uint16_tlbat;
+uint16_tlbatm;
+} NvmeCopyCmd;
+
+typedef struct NvmeCopySourceRange {
+uint8_t  rsvd0[8];
+uint64_t slba;
+uint16_t nlb;
+uint8_t  rsvd18[6];
+uint32_t eilbrt;
+uint16_t elbat;
+uint16_t elbatm;
+} NvmeCopySourceRange;
+
 enum NvmeAsyncEventRequest {
 NVME_AER_TYPE_ERROR = 0,
 NVME_AER_TYPE_SMART = 1,
@@ -680,6 +710,7 @@ enum NvmeStatusCodes {
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
+NVME_CMD_SIZE_LIMIT = 0x0183,
 NVME_WRITE_FAULT= 0x0280,
 NVME_UNRECOVERED_READ   = 0x0281,
 NVME_E2E_GUARD_ERROR= 0x0282,
@@ -831,7 +862,7 @@ typedef struct QEMU_PACKED NvmeIdCtrl {
 uint8_t nvscc;
 uint8_t rsvd531;
 uint16_tacwu;
-uint8_t rsvd534[2];
+uint16_tocfs;
 uint32_tsgls;
 uint8_t rsvd540[228];
 uint8_t subnqn[256];
@@ -854,6 +885,11 @@ enum NvmeIdCtrlOncs {
 NVME_ONCS_FEATURES  = 1 << 4,
 NVME_ONCS_RESRVATIONS   = 1 << 5,
 NVME_ONCS_TIMESTAMP = 1 << 6,
+NVME_ONCS_COPY  = 1 << 8,
+};
+
+enum NvmeIdCtrlOcfs {
+NVME_OCFS_COPY_FORMAT_0 = 1 << 0,
 };
 
 enum NvmeIdCtrlFrmw {
@@ -995,7 +1031,10 @@ typedef struct QEMU_PACKED NvmeIdNs {
 uint16_tnpdg;
 uint16_tnpda;
 uint16_tnows;
-uint8_t rsvd74[30];
+uint16_tmssrl;
+uint32_tmcl;
+uint8_t msrc;
+uint8_t rsvd81[23];
 uint8_t nguid[16];
 uint64_teui64;
 NvmeLBAFlbaf[16];
@@ -1059,6 +1098,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16);
+QEMU_BUILD_BUG_ON(sizeof(NvmeCopySourceRange) != 32);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDeleteQ) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeCreateCq) != 64);
@@ -1066,6 +1106,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeCopyCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
-- 
2.29.2

[PATCH v4 0/2] hw/block/nvme: add simple copy command

2020-12-09 Thread Klaus Jensen

From: Klaus Jensen 

Add support for TP 4065 ("Simple Copy Command").

Changes for v4

  * merge for-loops (Keith)

Changes for v3

  * rebased on nvme-next
  * changed the default msrc value to a more reasonable 127 from 255 to
better align with the default mcl value of 128.

Changes for v2

  * prefer style that aligns with existing NvmeIdCtrl field enums
(Minwoo)
  * swapped elbat/elbatm fields in copy source range. I've kept the R-b
and A-b from Minwoo and Stefan since this is a non-functional change
(the device does not use these fields at all).

Klaus Jensen (2):
  nvme: updated shared header for copy command
  hw/block/nvme: add simple copy command

 hw/block/nvme-ns.h|   4 +
 hw/block/nvme.h   |   1 +
 include/block/nvme.h  |  45 -
 hw/block/nvme-ns.c|   8 ++
 hw/block/nvme.c   | 217 +-
 hw/block/trace-events |   6 ++
 6 files changed, 278 insertions(+), 3 deletions(-)

-- 
2.29.2

Re: [PATCH] hw/block/nvme: fix bad clearing of CAP

2020-12-09 Thread Minwoo Im

Hello,

Reviewed-by: Minwoo Im

[PATCH v3 1/3] docs: generate qemu-storage-daemon-qmp-ref(7) man page

2020-12-09 Thread Stefan Hajnoczi

Although individual qemu-storage-daemon QMP commands are identical to
QEMU QMP commands, qemu-storage-daemon only supports a subset of QEMU's
QMP commands. Generate a manual page of just the commands supported by
qemu-storage-daemon so that users know exactly what is available in
qemu-storage-daemon.

Add an h1 heading in storage-daemon/qapi/qapi-schema.json so that
block-core.json is at the h2 heading level.

Signed-off-by: Stefan Hajnoczi 
---
 docs/interop/index.rst   |  1 +
 docs/interop/qemu-storage-daemon-qmp-ref.rst | 13 +
 storage-daemon/qapi/qapi-schema.json |  3 +++
 docs/interop/conf.py |  2 ++
 docs/meson.build |  1 +
 5 files changed, 20 insertions(+)
 create mode 100644 docs/interop/qemu-storage-daemon-qmp-ref.rst

diff --git a/docs/interop/index.rst b/docs/interop/index.rst
index cd78d679d8..95d56495f6 100644
--- a/docs/interop/index.rst
+++ b/docs/interop/index.rst
@@ -20,6 +20,7 @@ Contents:
qemu-ga
qemu-ga-ref
qemu-qmp-ref
+   qemu-storage-daemon-qmp-ref
vhost-user
vhost-user-gpu
vhost-vdpa
diff --git a/docs/interop/qemu-storage-daemon-qmp-ref.rst 
b/docs/interop/qemu-storage-daemon-qmp-ref.rst
new file mode 100644
index 00..caf9dad23a
--- /dev/null
+++ b/docs/interop/qemu-storage-daemon-qmp-ref.rst
@@ -0,0 +1,13 @@
+QEMU Storage Daemon QMP Reference Manual
+
+
+..
+   TODO: the old Texinfo manual used to note that this manual
+   is GPL-v2-or-later. We should make that reader-visible
+   both here and in our Sphinx manuals more generally.
+
+..
+   TODO: display the QEMU version, both here and in our Sphinx manuals
+   more generally.
+
+.. qapi-doc:: storage-daemon/qapi/qapi-schema.json
diff --git a/storage-daemon/qapi/qapi-schema.json 
b/storage-daemon/qapi/qapi-schema.json
index c6ad5ae1e3..28117c3aac 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -15,6 +15,9 @@
 
 { 'include': '../../qapi/pragma.json' }
 
+##
+# = Block devices
+##
 { 'include': '../../qapi/block-core.json' }
 { 'include': '../../qapi/block-export.json' }
 { 'include': '../../qapi/char.json' }
diff --git a/docs/interop/conf.py b/docs/interop/conf.py
index 2634ca3410..f4370aaa13 100644
--- a/docs/interop/conf.py
+++ b/docs/interop/conf.py
@@ -23,4 +23,6 @@ man_pages = [
  [], 7),
 ('qemu-qmp-ref', 'qemu-qmp-ref', 'QEMU QMP Reference Manual',
  [], 7),
+('qemu-storage-daemon-qmp-ref', 'qemu-storage-daemon-qmp-ref',
+ 'QEMU Storage Daemon QMP Reference Manual', [], 7),
 ]
diff --git a/docs/meson.build b/docs/meson.build
index ebd85d59f9..df5dc50485 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -56,6 +56,7 @@ if build_docs
 'qemu-ga.8': (have_tools ? 'man8' : ''),
 'qemu-ga-ref.7': 'man7',
 'qemu-qmp-ref.7': 'man7',
+'qemu-storage-daemon-qmp-ref.7': (have_tools ? 'man7' : ''),
 },
 'tools': {
 'qemu-img.1': (have_tools ? 'man1' : ''),
-- 
2.28.0

[PATCH v3 3/3] MAINTAINERS: add Kevin Wolf as storage daemon maintainer

2020-12-09 Thread Stefan Hajnoczi

The MAINTAINERS file was not updated when the storage daemon was merged.

Signed-off-by: Stefan Hajnoczi 
Acked-by: Kevin Wolf 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 68bc160f41..8676730cc9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2146,6 +2146,15 @@ F: qobject/block-qdict.c
 F: tests/check-block-qdict.c
 T: git https://repo.or.cz/qemu/kevin.git block
 
+Storage daemon
+M: Kevin Wolf 
+L: qemu-block@nongnu.org
+S: Supported
+F: storage-daemon/
+F: docs/interop/qemu-storage-daemon-qmp-ref.rst
+F: docs/tools/qemu-storage-daemon.rst
+T: git https://repo.or.cz/qemu/kevin.git block
+
 Block I/O path
 M: Stefan Hajnoczi 
 M: Fam Zheng 
-- 
2.28.0

[PATCH v3 2/3] docs: add qemu-storage-daemon(1) man page

2020-12-09 Thread Stefan Hajnoczi

Document the qemu-storage-daemon tool. Most of the command-line options
are identical to their QEMU counterparts. Perhaps Sphinx hxtool
integration could be extended to extract documentation for individual
command-line options so they can be shared. For now the
qemu-storage-daemon simply refers to the qemu(1) man page where the
command-line options are identical.

Signed-off-by: Stefan Hajnoczi 
---
 docs/tools/index.rst   |   1 +
 docs/tools/qemu-storage-daemon.rst | 148 +
 docs/tools/conf.py |   2 +
 3 files changed, 151 insertions(+)
 create mode 100644 docs/tools/qemu-storage-daemon.rst

diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index b99f86c7c6..3a5829c17a 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -11,6 +11,7 @@ Contents:
:maxdepth: 2
 
qemu-img
+   qemu-storage-daemon
qemu-nbd
qemu-pr-helper
qemu-trace-stap
diff --git a/docs/tools/qemu-storage-daemon.rst 
b/docs/tools/qemu-storage-daemon.rst
new file mode 100644
index 00..f63627eaf6
--- /dev/null
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -0,0 +1,148 @@
+QEMU Storage Daemon
+===
+
+Synopsis
+
+
+**qemu-storage-daemon** [options]
+
+Description
+---
+
+qemu-storage-daemon provides disk image functionality from QEMU, qemu-img, and
+qemu-nbd in a long-running process controlled via QMP commands without running
+a virtual machine. It can export disk images, run block job operations, and
+perform other disk-related operations. The daemon is controlled via a QMP
+monitor and initial configuration from the command-line.
+
+The daemon offers the following subset of QEMU features:
+
+* Block nodes
+* Block jobs
+* Block exports
+* Throttle groups
+* Character devices
+* Crypto and secrets
+* QMP
+* IOThreads
+
+Commands can be sent over a QEMU Monitor Protocol (QMP) connection. See the
+:manpage:`qemu-storage-daemon-qmp-ref(7)` manual page for a description of the
+commands.
+
+The daemon runs until it is stopped using the ``quit`` QMP command or
+SIGINT/SIGHUP/SIGTERM.
+
+**Warning:** Never modify images in use by a running virtual machine or any
+other process; this may destroy the image. Also, be aware that querying an
+image that is being modified by another process may encounter inconsistent
+state.
+
+Options
+---
+
+.. program:: qemu-storage-daemon
+
+Standard options:
+
+.. option:: -h, --help
+
+  Display help and exit
+
+.. option:: -V, --version
+
+  Display version information and exit
+
+.. option:: -T, --trace [[enable=]PATTERN][,events=FILE][,file=FILE]
+
+  .. include:: ../qemu-option-trace.rst.inc
+
+.. option:: --blockdev BLOCKDEVDEF
+
+  is a block node definition. See the :manpage:`qemu(1)` manual page for a
+  description of block node properties and the :manpage:`qemu-block-drivers(7)`
+  manual page for a description of driver-specific parameters.
+
+.. option:: --chardev CHARDEVDEF
+
+  is a character device definition. See the :manpage:`qemu(1)` manual page for
+  a description of character device properties. A common character device
+  definition configures a UNIX domain socket::
+
+  --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait
+
+.. option:: --export 
[type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]
+  --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,logical-block-size=][,num-queues=]
+  --export 
[type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,logical-block-size=][,num-queues=]
+
+  is a block export definition. ``node-name`` is the block node that should be
+  exported. ``writable`` determines whether or not the export allows write
+  requests for modifying data (the default is off).
+
+  The ``nbd`` export type requires ``--nbd-server`` (see below). ``name`` is
+  the NBD export name. ``bitmap`` is the name of a dirty bitmap reachable from
+  the block node, so the NBD client can use NBD_OPT_SET_META_CONTEXT with the
+  metadata context name "qemu:dirty-bitmap:BITMAP" to inspect the bitmap.
+
+  The ``vhost-user-blk`` export type takes a vhost-user socket address on which
+  it accept incoming connections. Both
+  ``addr.type=unix,addr.path=`` for UNIX domain sockets and
+  ``addr.type=fd,addr.str=`` for file descriptor passing are supported.
+  ``logical-block-size`` sets the logical block size in bytes (the default is
+  512). ``num-queues`` sets the number of virtqueues (the default is 1).
+
+.. option:: --monitor MONITORDEF
+
+  is a QMP monitor definition. See the :manpage:`qemu(1)` manual page for
+  a description of QMP monitor properties. A common QMP monitor definition
+  configures a monitor on character device ``char1``::
+
+  --monitor chardev=char1
+
+.. option:: --nbd-server 
addr.type=inet,addr.host=,addr.port=[,tls-creds=][,tls-authz=][,max-connections=]
+  --nbd-server 
addr.type=unix,addr.path=[,tls-creds=][,tls-authz=][,max-connections=]
+
+  is a server

[PATCH v3 0/3] docs: add qemu-storage-daemon documentation

2020-12-09 Thread Stefan Hajnoczi

v3:
 * Address Kevin's comments
v2:
 * Drop block-core.json h2 header removal, add an h1 header to
   storage-daemon/qapi/qapi-schema.json instead [Kevin]
 * Add Examples section to man page [Kevin]

Add documentation for the qemu-storage-daemon program and its QMP commands.

The man page looks like this:

QEMU-STORAGE-DAEMON(1)   QEMU   QEMU-STORAGE-DAEMON(1)

NAME
   qemu-storage-daemon - QEMU storage daemon

SYNOPSIS
   qemu-storage-daemon [options]

DESCRIPTION
   qemu-storage-daemon  provides  disk  image  functionality  from
   QEMU, qemu-img, and qemu-nbd in  a  long-running  process  con‐
   trolled  via QMP commands without running a virtual machine. It
   can export disk images, run block job operations,  and  perform
   other  disk-related  operations. The daemon is controlled via a
   QMP monitor and initial configuration from the command-line.

   The daemon offers the following subset of QEMU features:

   • Block nodes

   • Block jobs

   • Block exports

   • Throttle groups

   • Character devices

   • Crypto and secrets

   • QMP

   • IOThreads

   Commands can be sent over a QEMU Monitor Protocol (QMP) connec‐
   tion.  See the qemu-storage-daemon-qmp-ref(7) manual page for a
   description of the commands.

   The daemon runs until it is stopped using the quit QMP  command
   or SIGINT/SIGHUP/SIGTERM.

   Warning:  Never  modify  images in use by a running virtual ma‐
   chine or any other process; this may destroy the  image.  Also,
   be  aware  that querying an image that is being modified by an‐
   other process may encounter inconsistent state.

OPTIONS
   Standard options:

   -h, --help
  Display help and exit

   -V, --version
  Display version information and exit

   -T, --trace [[enable=]PATTERN][,events=FILE][,file=FILE]
  Specify tracing options.

  [enable=]PATTERN
 Immediately enable events  matching  PATTERN  (either
 event  name  or  a globbing pattern).  This option is
 only available if QEMU has  been  compiled  with  the
 simple,  log  or  ftrace tracing backend.  To specify
 multiple events or patterns, specify the  -trace  op‐
 tion multiple times.

 Use  -trace  help  to  print a list of names of trace
 points.

  events=FILE
 Immediately enable events listed in FILE.   The  file
 must  contain  one  event  name  (as  listed  in  the
 trace-events-all file) per  line;  globbing  patterns
 are  accepted  too.  This option is only available if
 QEMU has been compiled with the simple, log or ftrace
 tracing backend.

  file=FILE
 Log  output  traces  to  FILE.   This  option is only
 available if QEMU has been compiled with  the  simple
 tracing backend.

   --blockdev BLOCKDEVDEF
  is  a block node definition. See the qemu(1) manual page
  for a description  of  block  node  properties  and  the
  qemu-block-drivers(7)  manual  page for a description of
  driver-specific parameters.

   --chardev CHARDEVDEF
  is a character device definition. See the qemu(1) manual
  page for a description of character device properties. A
  common character device definition configures a UNIX do‐
  main socket:

 --chardev socket,id=char1,path=/tmp/qmp.sock,server,nowait

   --export[type=]nbd,id=,node-name=[,name=][,writable=on|off][,bitmap=]

   --export
   
[type=]vhost-user-blk,id=,node-name=,addr.type=unix,addr.path=[,writable=on|off][,log‐
   ical-block-size=][,num-queues=]

   --export
   
[type=]vhost-user-blk,id=,node-name=,addr.type=fd,addr.str=[,writable=on|off][,log‐
   ical-block-size=][,num-queues=]
  is a block export definition.  node-name  is  the  block
  node   that  should  be  exported.  writable  determines
  whether or not the export allows write requests for mod‐
  ifying data (the default is off).

  The  nbd  export type requires --nbd-server (see below).
  name is the NBD export name. bitmap is  the  name  of  a
  dirty  bitmap  reachable from the block node, so the NBD
  client can use NBD_OPT_SET_META_CONTEXT with  the  meta‐
  data  context name "qemu:dirty-bitmap:BITMAP" to inspect
  the bitmap.

  The vhost-user-blk export type takes a vhost-user socket
  address  on  which  it accept incoming connections. Both
  addr.type=unix,addr.path= for  UNIX  domain

Re: [PATCH v2 1/3] docs: generate qemu-storage-daemon-qmp-ref(7) man page

2020-12-09 Thread Stefan Hajnoczi

On Tue, Oct 06, 2020 at 12:22:55PM +0200, Kevin Wolf wrote:
> Am 10.09.2020 um 16:43 hat Stefan Hajnoczi geschrieben:
> > Although qemu-storage-daemon QMP commands are identical to QEMU QMP
> > commands they are a subset. Generate a manual page of just the commands
> > supported by qemu-storage-daemon so that users know exactly what is
> > available in qemu-storage-daemon.
> > 
> > Add an h1 heading in storage-daemon/qapi/qapi-schema.json so that
> > block-core.json is at the h2 heading level.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> As the series doesn't apply any more, I can't actually try it out
> easily, but is the order of includes in the schema right now?
> 
> I seem to remember that in v1 we discussed that nested includes result
> in an unexpected section structure in the documentation in some cases
> (such as generic jobs being documented in a subsection of block
> devices), and that we need to reorder includes in qapi-schema.json to
> fix this because a more clever doc generator wasn't considered worth the
> effort.

v2 onwards takes a different approach and leaves the header where it is.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v3 2/2] hw/block/nvme: add simple copy command

2020-12-09 Thread Klaus Jensen

On Dec  9 07:13, Keith Busch wrote:
> On Tue, Dec 08, 2020 at 09:33:39AM +0100, Klaus Jensen wrote:
> > +static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
> > +{
> 
> 
> 
> > +for (i = 0; i < nr; i++) {
> > +uint32_t _nlb = le16_to_cpu(range[i].nlb) + 1;
> > +if (_nlb > le16_to_cpu(ns->id_ns.mssrl)) {
> > +return NVME_CMD_SIZE_LIMIT | NVME_DNR;
> > +}
> > +
> > +nlb += _nlb;
> > +}
> > +
> > +if (nlb > le32_to_cpu(ns->id_ns.mcl)) {
> > +return NVME_CMD_SIZE_LIMIT | NVME_DNR;
> > +}
> > +
> > +bounce = bouncep = g_malloc(nvme_l2b(ns, nlb));
> > +
> > +for (i = 0; i < nr; i++) {
> > +uint64_t slba = le64_to_cpu(range[i].slba);
> > +uint32_t nlb = le16_to_cpu(range[i].nlb) + 1;
> > +
> > +status = nvme_check_bounds(ns, slba, nlb);
> > +if (status) {
> > +trace_pci_nvme_err_invalid_lba_range(slba, nlb, 
> > ns->id_ns.nsze);
> > +goto free_bounce;
> > +}
> > +
> > +if (NVME_ERR_REC_DULBE(ns->features.err_rec)) {
> > +status = nvme_check_dulbe(ns, slba, nlb);
> > +if (status) {
> > +goto free_bounce;
> > +}
> > +}
> > +}
> 
> Only comment I have is that these two for-loops look like they can be
> collaped into one, which also simplifies how you account for the bounce
> buffer when error'ing out. 
> 

Yeah. And the shadowing of nlb is not good either. I'll fix it up.


signature.asc
Description: PGP signature

Re: [PATCH v11 00/13] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

2020-12-09 Thread Klaus Jensen

Hi Dmitry,

By and large, this looks OK to me. There are still some issues here and
there, and some comments of mine that you did not address, but I will
follow up with patches to fix that. Let's get this merged.

It looks like the nvme-next you rebased on is slightly old and missing
two commits:

  "hw/block/nvme: remove superfluous NvmeCtrl parameter" and
  "hw/block/nvme: pull aio error handling"

It caused a couple of conflicts, but nothing that I couldn't fix up.

Since I didn't manage to convince anyone about the zsze and zcap
parameters being in terms of LBAs, I'll revert that to be
'zoned.zone_size' and 'zoned.zone_capacity'.

Finally, would you accept that we skip "hw/block/nvme: Add injection of
Offline/Read-Only zones" for now? I'd like to discuss it a bit since I
think the random injects feels a bit ad-hoc. Back when I did OCSSD
emulation with Hans, we did something like this for setting up state
through a descriptor text file - I think we should explore something
like that before we lock down the two parameters. I'll amend the final
documentation commit to not include those parameters.

Sounds good?

Otherwise, I think this is mergeable to nvme-next. So, for the series
(excluding "hw/block/nvme: Add injection of Offline/Read-Only zones"):

Reviewed-by: Klaus Jensen 

On Dec  9 05:03, Dmitry Fomichev wrote:
> v10 -> v11:
> 
>  - Address review comments by Klaus.
> 
>  - Add a patch to separate the handling of controller reset
>and subsystem shutdown. Place the patch at the beginning
>of the series so it can be picked up separately.
> 
>  - Rebase on the current nvme-next branch.
> 
> v9 -> v10:
> 
>  - Correctly check for MDTS in Zone Management Receive handler.
> 
>  - Change Klaus' "Reviewed-by" email in UUID patch.
> 
> v8 -> v9:
> 
>  - Move the modifications to "include/block/nvme.h" made to
>introduce ZNS-related definitions to a separate patch.
> 
>  - Add a new struct, NvmeZonedResult, along the same lines as the
>existing NvmeAerResult, to carry Zone Append LBA returned to
>the host. Now, there is no need to modify NvmeCqe struct except
>renaming DW1 field from "rsvd" to "dw1".
> 
>  - Add check for MDTS in Zone Management Receive handler.
> 
>  - Remove checks for ns->attached since the value of this flag
>is always true for now.
> 
>  - Rebase to the current quemu-nvme/nvme-next branch.
> 
> v7 -> v8:
> 
>  - Move refactoring commits to the front of the series.
> 
>  - Remove "attached" and "fill_pattern" device properties.
> 
>  - Only close open zones upon subsystem shutdown, not when CC.EN flag
>is set to 0. Avoid looping through all zones by iterating through
>lists of open and closed zones.
> 
>  - Improve bulk processing of zones aka zoned operations with "all"
>flag set. Avoid looping through the entire zone array for all zone
>operations except Offline Zone.
> 
>  - Prefix ZNS-related property names with "zoned.". The "zoned" Boolean
>property is retained to turn on zoned command set as it is much more
>intuitive and user-friendly compared to setting a magic number value
>to csi property.
> 
>  - Address review comments.
> 
>  - Remove unused trace events.
> 
> v6 -> v7:
> 
>  - Introduce ns->iocs initialization function earlier in the series,
>in CSE Log patch.
> 
>  - Set NVM iocs for zoned namespaces when CC.CSS is set to
>NVME_CC_CSS_NVM.
> 
>  - Clean up code in CSE log handler.
>  
> v5 -> v6:
> 
>  - Remove zoned state persistence code. Replace position-independent
>zone lists with QTAILQs.
> 
>  - Close all open zones upon clearing of the controller. This is
>a similar procedure to the one previously performed upon powering
>up with zone persistence. 
> 
>  - Squash NS Types and ZNS triplets of commits to keep definitions
>and trace event definitions together with the implementation code.
> 
>  - Move namespace UUID generation to a separate patch. Add the new
>"uuid" property as suggested by Klaus.
> 
>  - Rework Commands and Effects patch to make sure that the log is
>always in sync with the actual set of commands supported.
> 
>  - Add two refactoring commits at the end of the series to
>optimize read and write i/o path.
> 
> - Incorporate feedback from Keith, Klaus and Niklas:
> 
>   * fix rebase errors in nvme_identify_ns_descr_list()
>   * remove unnecessary code from nvme_write_bar()
>   * move csi to NvmeNamespace and use it from the beginning in NSTypes
> patch
>   * change zone read processing to cover all corner cases with RAZB=1
>   * sync w_ptr and d.wp in case of a i/o error at the preceding zone
>   * reword the commit message in active/inactive patch with the new
> text from Niklas
>   * correct dlfeat reporting depending on the fill pattern set
>   * add more checks for "attached" n/s parameter to prevent i/o and
> get/set features on inactive namespaces
>   * Use DEFINE_PROP_SIZE and DEFINE_PROP_SIZE32 for zone size/capacity
> and ZASL

Re: [PATCH] file-posix: detect the lock using the real file

2020-12-09 Thread Daniel P . Berrangé

On Tue, Dec 08, 2020 at 03:38:22PM +0100, Kevin Wolf wrote:
> Am 08.12.2020 um 13:59 hat Li Feng geschrieben:
> > This patch addresses this issue:
> > When accessing a volume on an NFS filesystem without supporting the file 
> > lock,
> > tools, like qemu-img, will complain "Failed to lock byte 100".
> > 
> > In the original code, the qemu_has_ofd_lock will test the lock on the
> > "/dev/null" pseudo-file. Actually, the file.locking is per-drive property,
> > which depends on the underlay filesystem.
> > 
> > In this patch, make the 'qemu_has_ofd_lock' with a filename be more generic
> > and reasonable.
> > 
> > Signed-off-by: Li Feng 
> 
> Do you know any way how I could configure either the NFS server or the
> NFS client such that locking would fail? For any patch related to this,
> it would be good if I could even test the scenario.

One could write a qtest that uses an LD_PRELOAD to replace the standard
glibc fcntl() function with one that returns an error for locking commands.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

49 matches

Mail list logo