date:20230601

[PULL 09/21] cutils: Adjust signature of parse_uint[_full]

2023-06-01 Thread Eric Blake

It's already confusing that we have two very similar functions for
wrapping the parse of a 64-bit unsigned value, differing mainly on
whether they permit leading '-'.  Adjust the signature of parse_uint()
and parse_uint_full() to be like all of qemu_strto*(): put the result
parameter last, use the same types (uint64_t and unsigned long long
have the same width, but are not always the same type), and mark
endptr const (this latter change only affects the rare caller of
parse_uint).  Adjust all callers in the tree.

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
Message-Id: <20230522190441.64278-8-ebl...@redhat.com>
---
 include/qemu/cutils.h |   5 +-
 audio/audio_legacy.c  |   4 +-
 block/gluster.c   |   4 +-
 block/nfs.c   |   4 +-
 blockdev.c|   4 +-
 contrib/ivshmem-server/main.c |   4 +-
 qapi/opts-visitor.c   |  10 +--
 tests/unit/test-cutils.c  | 119 +++---
 ui/vnc.c  |   4 +-
 util/cutils.c |  13 ++--
 util/guest-random.c   |   4 +-
 util/qemu-sockets.c   |  10 +--
 12 files changed, 85 insertions(+), 100 deletions(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 92c436d8c70..92c927a6a35 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -163,9 +163,8 @@ int qemu_strtou64(const char *nptr, const char **endptr, 
int base,
 int qemu_strtod(const char *nptr, const char **endptr, double *result);
 int qemu_strtod_finite(const char *nptr, const char **endptr, double *result);

-int parse_uint(const char *s, unsigned long long *value, char **endptr,
-   int base);
-int parse_uint_full(const char *s, unsigned long long *value, int base);
+int parse_uint(const char *s, const char **endptr, int base, uint64_t *value);
+int parse_uint_full(const char *s, int base, uint64_t *value);

 int qemu_strtosz(const char *nptr, const char **end, uint64_t *result);
 int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result);
diff --git a/audio/audio_legacy.c b/audio/audio_legacy.c
index b848001ff70..dc72ba55e9a 100644
--- a/audio/audio_legacy.c
+++ b/audio/audio_legacy.c
@@ -35,8 +35,8 @@

 static uint32_t toui32(const char *str)
 {
-unsigned long long ret;
-if (parse_uint_full(str, , 10) || ret > UINT32_MAX) {
+uint64_t ret;
+if (parse_uint_full(str, 10, ) || ret > UINT32_MAX) {
 dolog("Invalid integer value `%s'\n", str);
 exit(1);
 }
diff --git a/block/gluster.c b/block/gluster.c
index 185a83e5e53..ad5fadbe793 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -424,7 +424,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 int ret;
 int old_errno;
 SocketAddressList *server;
-unsigned long long port;
+uint64_t port;

 glfs = glfs_find_preopened(gconf->volume);
 if (glfs) {
@@ -445,7 +445,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
server->value->u.q_unix.path, 0);
 break;
 case SOCKET_ADDRESS_TYPE_INET:
-if (parse_uint_full(server->value->u.inet.port, , 10) < 0 ||
+if (parse_uint_full(server->value->u.inet.port, 10, ) < 0 ||
 port > 65535) {
 error_setg(errp, "'%s' is not a valid port number",
server->value->u.inet.port);
diff --git a/block/nfs.c b/block/nfs.c
index 8f89ece69fa..c24df49747d 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -114,13 +114,13 @@ static int nfs_parse_uri(const char *filename, QDict 
*options, Error **errp)
 qdict_put_str(options, "path", uri->path);

 for (i = 0; i < qp->n; i++) {
-unsigned long long val;
+uint64_t val;
 if (!qp->p[i].value) {
 error_setg(errp, "Value for NFS parameter expected: %s",
qp->p[i].name);
 goto out;
 }
-if (parse_uint_full(qp->p[i].value, , 0)) {
+if (parse_uint_full(qp->p[i].value, 0, )) {
 error_setg(errp, "Illegal value for NFS parameter: %s",
qp->p[i].name);
 goto out;
diff --git a/blockdev.c b/blockdev.c
index db2725fe741..e6eba61484a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,10 +341,10 @@ static bool parse_stats_intervals(BlockAcctStats *stats, 
QList *intervals,
 switch (qobject_type(entry->value)) {

 case QTYPE_QSTRING: {
-unsigned long long length;
+uint64_t length;
 const char *str = qstring_get_str(qobject_to(QString,
  entry->value));
-if (parse_uint_full(str, , 10) == 0 &&
+if (parse_uint_full(str, 10, ) == 0 &&
 length > 0 && length <= UINT_MAX) {
 block_acct_add_interval(stats, (unsigned) length);
 } else {
diff --git

[PULL 19/21] cutils: Use parse_uint in qemu_strtosz for negative rejection

2023-06-01 Thread Eric Blake

Rather than open-coding two different ways to check for an unwanted
negative sign, reuse the same code in both functions.  That way, if we
decide down the road to accept "-0" instead of rejecting it, we have
fewer places to change.  Also, it means we now get ERANGE instead of
EINVAL for negative values in qemu_strtosz, which is reasonable for
what it represents.  This in turn changes the expected output of a
couple of iotests.

The change is not quite complete: negative fractional scaled values
can trip us up.  This will be fixed in a later patch addressing other
issues with fractional scaled values.

Signed-off-by: Eric Blake 
Reviewed-by: Hanna Czenczek 
Message-Id: <20230522190441.64278-18-ebl...@redhat.com>
---
 tests/unit/test-cutils.c | 7 +++
 util/cutils.c| 8 ++--
 tests/qemu-iotests/049.out   | 7 ++-
 tests/qemu-iotests/178.out.qcow2 | 3 +--
 tests/qemu-iotests/178.out.raw   | 3 +--
 5 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/tests/unit/test-cutils.c b/tests/unit/test-cutils.c
index e5b780672d1..c2dbed9eda9 100644
--- a/tests/unit/test-cutils.c
+++ b/tests/unit/test-cutils.c
@@ -3519,10 +3519,9 @@ static void test_qemu_strtosz_trailing(void)
 static void test_qemu_strtosz_erange(void)
 {
 /* FIXME negative values fit better as ERANGE */
-do_strtosz(" -0", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 3 */);
-do_strtosz("-1", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 2 */);
-do_strtosz_full("-2M", qemu_strtosz, -EINVAL /* FIXME -ERANGE */, 0,
-0 /* FIXME 2 */, -EINVAL, 0);
+do_strtosz(" -0", -ERANGE, 0, 3);
+do_strtosz("-1", -ERANGE, 0, 2);
+do_strtosz_full("-2M", qemu_strtosz, -ERANGE, 0, 2, -EINVAL, 0);
 do_strtosz(" -.0", -EINVAL /* FIXME -ERANGE */, 0, 0 /* FIXME 4 */);
 do_strtosz_full("-.1k", qemu_strtosz, -EINVAL /* FIXME -ERANGE */, 0,
 0 /* FIXME 3 */, -EINVAL, 0);
diff --git a/util/cutils.c b/util/cutils.c
index edfb71a2171..e3a49209a94 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -201,6 +201,7 @@ static int64_t suffix_mul(char suffix, int64_t unit)
  * - hex with scaling suffix, such as 0x20M
  * - octal, such as 08
  * - fractional hex, such as 0x1.8
+ * - negative values, including -0
  * - floating point exponents, such as 1e3
  *
  * The end pointer will be returned in *end, if not NULL.  If there is
@@ -226,15 +227,10 @@ static int do_strtosz(const char *nptr, const char **end,
 int64_t mul;

 /* Parse integral portion as decimal. */
-retval = qemu_strtou64(nptr, , 10, );
+retval = parse_uint(nptr, , 10, );
 if (retval) {
 goto out;
 }
-if (memchr(nptr, '-', endptr - nptr) != NULL) {
-endptr = nptr;
-retval = -EINVAL;
-goto out;
-}
 if (val == 0 && (*endptr == 'x' || *endptr == 'X')) {
 /* Input looks like hex; reparse, and insist on no fraction or suffix. 
*/
 retval = qemu_strtou64(nptr, , 16, );
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 8719c91b483..34e1b452e6e 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -92,13 +92,10 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 cluster_size=65536 
extended_l2=off comp
 == 3. Invalid sizes ==

 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1024
-qemu-img: Invalid image size specified. You may use k, M, G, T, P or E 
suffixes for
-qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
+qemu-img: Invalid image size specified. Must be between 0 and 
9223372036854775807.

 qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2
-qemu-img: TEST_DIR/t.qcow2: Parameter 'size' expects a non-negative number 
below 2^64
-Optional suffix k, M, G, T, P or E means kilo-, mega-, giga-, tera-, peta-
-and exabytes, respectively.
+qemu-img: TEST_DIR/t.qcow2: Value '-1024' is out of range for parameter 'size'

 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k
 qemu-img: Invalid image size specified. You may use k, M, G, T, P or E 
suffixes for
diff --git a/tests/qemu-iotests/178.out.qcow2 b/tests/qemu-iotests/178.out.qcow2
index 0d51fe401ec..fe193fd5f4f 100644
--- a/tests/qemu-iotests/178.out.qcow2
+++ b/tests/qemu-iotests/178.out.qcow2
@@ -13,8 +13,7 @@ qemu-img: Invalid option list: ,
 qemu-img: Invalid parameter 'snapshot.foo'
 qemu-img: Failed in parsing snapshot param 'snapshot.foo=bar'
 qemu-img: --output must be used with human or json as argument.
-qemu-img: Invalid image size specified. You may use k, M, G, T, P or E 
suffixes for
-qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
+qemu-img: Invalid image size specified. Must be between 0 and 
9223372036854775807.
 qemu-img: Unknown file format 'foo'

 == Size calculation for a new file (human) ==
diff --git a/tests/qemu-iotests/178.out.raw b/tests/qemu-iotests/178.out.raw
index 116241ddef2..445e460fad9 100644
--- a/tests/qemu-iotests/178.out.raw
+++

[PULL 01/21] iotests: Fix test 104 under NBD

2023-06-01 Thread Eric Blake

In the past, commit a231cb27 ("iotests: Fix 104 for NBD", v2.3.0)
added an additional filter to _filter_img_info to rewrite NBD URIs
into the expected output form.  This recently broke when we tweaked
tests to run in a per-format directory, which did not match the regex,
because _img_info itself is now already changing
SOCK_DIR=/tmp/tmpphjfbphd/raw-nbd-104 into
/tmp/tmpphjfbphd/IMGFMT-nbd-104 prior to _img_info_filter getting a
chance to further filter things.

While diagnosing the problem, I also noticed some filter lines
rendered completely useless by a typo when we switched from TCP to
Unix sockets for NBD (in shell, '\\+' is different from "\\+" (one
gives two backslash to the regex, matching the literal 2-byte sequence
<\+> after a single digit; the other gives one backslash to the regex,
as the metacharacter \+ to match one or more of <[0-9]>); since the
literal string  is not a valid URI, that regex
hasn't been matching anything for years so it is fine to just drop it
rather than fix the typo.

Fixes: f3923a72 ("iotests: Switch nbd tests to use Unix rather than TCP", 
v4.2.0)
Fixes: 5ba7db09 ("iotests: always use a unique sub-directory per test", v8.0.0)
Signed-off-by: Eric Blake 
Message-Id: <20230519150216.2599189-1-ebl...@redhat.com>
Reviewed-by: Daniel P. Berrangé 
---
 tests/qemu-iotests/common.filter | 4 +---
 tests/qemu-iotests/common.rc | 3 ++-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 6b32c7fbfa1..fc3c64bcb8e 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Copyright (C) 2009 Red Hat, Inc.
+# Copyright Red Hat
 # Copyright (c) 2000-2001 Silicon Graphics, Inc.  All Rights Reserved.
 #
 # This program is free software; you can redistribute it and/or
@@ -131,7 +131,6 @@ _filter_img_create_filenames()
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
 -e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "s#$IMGFMT#IMGFMT#g" \
--e 's#nbd:127.0.0.1:[0-9]\\+#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g'
 }

@@ -229,7 +228,6 @@ _filter_img_info()
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR#SOCK_DIR#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
--e 's#nbd://127.0.0.1:[0-9]\\+$#TEST_DIR/t.IMGFMT#g' \
 -e 's#nbd+unix:///\??socket=SOCK_DIR/nbd#TEST_DIR/t.IMGFMT#g' \
 -e 's#SOCK_DIR/fuse-#TEST_DIR/#g' \
 -e "/encrypted: yes/d" \
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index f4476b62f7d..d145f08201c 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 #
-# Copyright (C) 2009 Red Hat, Inc.
+# Copyright Red Hat
 # Copyright (c) 2000-2006 Silicon Graphics, Inc.  All Rights Reserved.
 #
 # This program is free software; you can redistribute it and/or modify
@@ -717,6 +717,7 @@ _img_info()
 -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \
 -e "s#$TEST_DIR#TEST_DIR#g" \
 -e "s#$SOCK_DIR/fuse-#TEST_DIR/#g" \
+-e "s#$SOCK_DIR/#SOCK_DIR/#g" \
 -e "s#$IMGFMT#IMGFMT#g" \
 -e 's/\(compression type: \)\(zlib\|zstd\)/\1COMPRESSION_TYPE/' \
 -e "/^disk size:/ D" \
-- 
2.40.1

[PULL 02/21] qcow2: Explicit mention of padding bytes

2023-06-01 Thread Eric Blake

Although we already covered the need for padding bytes with our
changes in commit 3ae3fcfa, commit 66fcbca5 (both v5.0.0) added one
byte and relied on the rest of the text for implicitly covering 7
padding bytes.  For consistency with other parts of the header (such
as the header extension format listing padding from n - m, or the
snapshot table entry listing variable padding), we might as well call
out the remaining 7 bytes as padding until such time (as any) as they
gain another meaning.

Signed-off-by: Eric Blake 
CC: Vladimir Sementsov-Ogievskiy 
Message-Id: <20230522184631.47211-1-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 docs/interop/qcow2.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index e7f036c286b..2c4618375ad 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -226,6 +226,7 @@ version 2.
  in QEMU. However, clusters with the
 deflate compression type do not have zlib headers.

+105 - 111:  Padding, contents defined below.

 === Header padding ===

-- 
2.40.1

Re: [PATCH v3 19/19] cutils: Improve qemu_strtosz handling of fractions

2023-06-01 Thread Eric Blake

On Mon, May 22, 2023 at 02:04:41PM -0500, Eric Blake wrote:
> We have several limitations and bugs worth fixing; they are
> inter-related enough that it is not worth splitting this patch into
> smaller pieces:
> 
> +++ b/util/cutils.c
> @@ -194,15 +194,18 @@ static int64_t suffix_mul(char suffix, int64_t unit)
>   * - 12345 - decimal, scale determined by @default_suffix and @unit
>   * - 12345{bBkKmMgGtTpPeE} - decimal, scale determined by suffix and @unit
>   * - 12345.678{kKmMgGtTpPeE} - decimal, scale determined by suffix, and
> - *   fractional portion is truncated to byte
> + *   fractional portion is truncated to byte, either side of . may be empty
>   * - 0x7fEE - hexadecimal, unit determined by @default_suffix
>   *
>   * The following are intentionally not supported
> - * - hex with scaling suffix, such as 0x20M
> - * - octal, such as 08
> - * - fractional hex, such as 0x1.8
> - * - negative values, including -0
> - * - floating point exponents, such as 1e3
> + * - hex with scaling suffix, such as 0x20M (0x1b is 27, not 1)
> + * - octal, such as 08 (parsed as decimal instead)
> + * - binary, such as 0b1000 (parsed as 0b with trailing garbage "1000")
> + * - fractional hex, such as 0x1.8 (parsed as 0 with trailing garbage "x1.8")
> + * - negative values, including -0 (fail with -ERANGE)
> + * - floating point exponents, such as 1e3 (parsed as 1e with trailing
> + *   garbage "3") or 0x1p3 (parsed as 1 with trailing garbage "p3")

This latter clause is wrong - we reject 0x1p3 earlier under the hex
with scaling suffix rule.  I've touched up the comment as part of
preparing the pull request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3 00/19] Fix qemu_strtosz() read-out-of-bounds

2023-06-01 Thread Eric Blake

On Mon, May 22, 2023 at 02:04:22PM -0500, Eric Blake wrote:
> v2 was here:
> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg02951.html
> 
> Since then:
>  - fix another qemu_strtoui bug
>  - address review comments from Hanna

This series has been reviewed; I fixed up the last few bits, and am
queueing it through my NBD tree (not really about NBD directly, but
tangentially we rely on size parsing in unit testing...), in order to
prepare a pull request today.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] qcow2: Explicit mention of padding bytes

2023-06-01 Thread Eric Blake

On Mon, May 22, 2023 at 11:26:03PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 22.05.23 21:46, Eric Blake wrote:
> > Although we already covered the need for padding bytes with our
> > changes in commit 3ae3fcfa, commit 66fcbca5 (both v5.0.0) added one
> > byte and relied on the rest of the text for implicitly covering 7
> > padding bytes.  For consistency with other parts of the header (such
> > as the header extension format listing padding from n - m, or the
> > snapshot table entry listing variable padding), we might as well call
> > out the remaining 7 bytes as padding until such time (as any) as they
> > gain another meaning.
> > 
> > Signed-off-by: Eric Blake 
> > CC: Vladimir Sementsov-Ogievskiy 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Not strictly related to NBD, but I'll pick it up since I'm about to do a pull 
request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 1/6] qemu-img: rebase: stop when reaching EOF of old backing file

2023-06-01 Thread Michael Tokarev


01.06.2023 22:28, Andrey Drobyshev via пишет:

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev 


It looks like you forgot the Reviewed-by: Denis V. Lunev here and
in the subsequent patch.

Should this be backported to -stable? Not that I've seen this issue,
it's a quite specific and somewhat rare case..

Thanks,

/mjt

Re: [PATCH v2 0/2] qemu-img: fix getting stuck in infinite loop on in-chain rebase

2023-06-01 Thread Andrey Drobyshev

On 5/25/23 21:02, Andrey Drobyshev wrote:
> v1 -> v2:
> 
>   * Avoid breaking the loop just yet, as the offsets beyond the old
> backing size need to be explicitly zeroed;
>   * Amend the commit message accordingly;
>   * Alter the added test case to take the last zeroed cluster into
> consideration.
> 
> v1: https://lists.nongnu.org/archive/html/qemu-block/2023-05/msg00674.html
> 
> Andrey Drobyshev (2):
>   qemu-img: rebase: stop when reaching EOF of old backing file
>   qemu-iotests: 024: add rebasing test case for overlay_size >
> backing_size
> 
>  qemu-img.c | 13 -
>  tests/qemu-iotests/024 | 57 ++
>  tests/qemu-iotests/024.out | 30 
>  3 files changed, 99 insertions(+), 1 deletion(-)
> 

Since there're no comments so far, I've included this same bugfix into
the bigger series regarding "qemu-img rebase".  Please refer to
https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00068.html.

Re: [PATCH v3 1/3] hw/i2c: add smbus pec utility function

2023-06-01 Thread Philippe Mathieu-Daudé


On 31/5/23 13:47, Klaus Jensen wrote:

From: Klaus Jensen 

Add i2c_smbus_pec() to calculate the SMBus Packet Error Code for a
message.

Signed-off-by: Klaus Jensen 
---
  hw/i2c/smbus_master.c | 28 
  include/hw/i2c/smbus_master.h |  2 ++
  2 files changed, 30 insertions(+)

diff --git a/hw/i2c/smbus_master.c b/hw/i2c/smbus_master.c
index 6a53c34e70b7..47f9eb24e033 100644
--- a/hw/i2c/smbus_master.c
+++ b/hw/i2c/smbus_master.c
@@ -15,6 +15,34 @@
  #include "hw/i2c/i2c.h"
  #include "hw/i2c/smbus_master.h"
  
+static uint8_t crc8(uint16_t data)

+{
+#define POLY (0x1070U << 3)


static const unsigned crc8_poly = ..., but why not inline the single use?
  data ^= 0x1070U << 3;
and
  data <<= 1;


+int i;
+
+for (i = 0; i < 8; i++) {
+if (data & 0x8000) {
+data = data ^ POLY;
+}
+
+data = data << 1;
+}
+
+return (uint8_t)(data >> 8);
+#undef POLY
+}


We have "qemu/crc32c.h", maybe we could have a similar crc8.h. Just 
wondering...

Re: [PATCH v3 0/3] hw/{i2c, nvme}: mctp endpoint, nvme management interface model

2023-06-01 Thread Corey Minyard

On Wed, May 31, 2023 at 01:47:41PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This adds a generic MCTP endpoint model that other devices may derive
> from. I'm not 100% happy with the design of the class methods, but it's
> a start.
> 
> Also included is a very basic implementation of an NVMe-MI device,
> supporting only a small subset of the required commands. Lior (CC'ed) has some
> patches coming up that adds futher support.
> 
> Since this all relies on i2c target mode, this can currently only be
> used with an SoC that includes the Aspeed I2C controller.
> 
> The easiest way to get up and running with this, is to grab my buildroot
> overlay[1]. It includes modified a modified dts as well as a couple of
> required packages.
> 
> QEMU can then be launched along these lines:
> 
>   qemu-system-arm \
> -nographic \
> -M ast2600-evb \
> -kernel output/images/zImage \
> -initrd output/images/rootfs.cpio \
> -dtb output/images/aspeed-ast2600-evb-nmi.dtb \
> -nic user,hostfwd=tcp::-:22 \
> -device nmi-i2c,address=0x3a \
> -serial mon:stdio
> 
> From within the booted system,
> 
>   mctp addr add 8 dev mctpi2c15
>   mctp link set mctpi2c15 up
>   mctp route add 9 via mctpi2c15
>   mctp neigh add 9 dev mctpi2c15 lladdr 0x3a
>   mi-mctp 1 9 info
> 
> Comments are very welcome!
> 
>   [1]: https://github.com/birkelund/buildroots/tree/main/mctp-i2c
> 
> Changes since v2
> 
> 
>   - Applied a bunch of feedback from Jonathan:
> + Moved a lot of internally used structs out of the include headers
>   and into the source files.
> + Added spec references in various places
> + Split the patch for i2c_smbus_pec() into its own
> + Fix a compile error (and bug) in nmi-i2c.c.
> 
>   - From Corey:
> + Reworked the buffer handling. The deriving devices now returns a
>   pointer to their own buffer that the mctp core copies into.

You didn't do what I asked here, I guess I wasn't clear.  You have:

+static void i2c_mctp_handle_control_set_eid(MCTPI2CEndpoint *mctp, uint8_t eid)
+{
+mctp->my_eid = eid;
+
+uint8_t buf[] = {
+0x0, 0x0, eid, 0x0,
+};
+
+memcpy(i2c_mctp_control_data(mctp->buffer), buf, sizeof(buf));
+mctp->len += sizeof(buf);
+}

That style of programming can lead to buffer overruns as code changes,
as you aren't checking the length of the target buffer.  I don't think
there are any issues now, but as people change the code you might end up
with one if someone gets a length wrong.

What I would like is for you to create a function like:

  i2c_mctp_add_bytes(mctp, buf, len)

that checks that len bytes will fit, then does the addition of the
bytes.  You need to adjust this to fit how you are doing things, and you
probably want one that adds just one byte, but hopefully you get the idea.

I'm sorry to be picky, but I've seen and fixed too many buffer overruns
(including one in the qemu i2c code) in situations like this.  Corey's
rule is: Never add anything to a buffer without checking the length.

Everything else looks good.

-corey

> + Added a couple of extra debugging trace events.
> 
> Changes since v1
> 
> 
>   - Fix SPDX-License tag for hw/nvme/nmi-i2c.c (Philippe)
>   - Add some asserts to verify buffer indices (by request from Corey).
>   - Drop short packets that could result in underflow (Corey)
>   - Move i2c_smbus_pec() to smbus common code (Corey)
>   - A couple of logic fixes (patch from Jeremy squashed in)
>   - Added a patch to handle messages with dest eid 0 (Matt)
> Maybe squash this as well.
> 
> Klaus Jensen (3):
>   hw/i2c: add smbus pec utility function
>   hw/i2c: add mctp core
>   hw/nvme: add nvme management interface model
> 
>  MAINTAINERS   |   7 +
>  hw/arm/Kconfig|   1 +
>  hw/i2c/Kconfig|   4 +
>  hw/i2c/mctp.c | 398 ++
>  hw/i2c/meson.build|   1 +
>  hw/i2c/smbus_master.c |  28 +++
>  hw/i2c/trace-events   |  13 ++
>  hw/nvme/meson.build   |   1 +
>  hw/nvme/nmi-i2c.c | 367 +++
>  hw/nvme/trace-events  |   6 +
>  include/hw/i2c/mctp.h | 137 
>  include/hw/i2c/smbus_master.h |   2 +
>  include/net/mctp.h|  28 +++
>  13 files changed, 993 insertions(+)
>  create mode 100644 hw/i2c/mctp.c
>  create mode 100644 hw/nvme/nmi-i2c.c
>  create mode 100644 include/hw/i2c/mctp.h
>  create mode 100644 include/net/mctp.h
> 
> -- 
> 2.40.0
> 
>

[PATCH 0/6] qemu-img: rebase: add compression support

2023-06-01 Thread Andrey Drobyshev via

This series is adding [-c | --compress] option to "qemu-img rebase"
command, which might prove useful for saving some disk space when, for
instance, manipulating chains of backup images.  Along the way I had to
make a couple of minor improvements.

The first 2 patches are a bug fix + corresponding test case.
Patch 3 merely fixes wrong args used in allocation.
Patch 4 makes write requests during rebase operation cluster_size-aligned,
which seems to be beneficial for both non-compressed and compressed mode.
The last 2 patches are the actual feature implementation + tests.

Andrey Drobyshev (6):
  qemu-img: rebase: stop when reaching EOF of old backing file
  qemu-iotests: 024: add rebasing test case for overlay_size >
backing_size
  qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  qemu-img: rebase: avoid unnecessary COW operations
  qemu-img: add compression option to rebase subcommand
  iotests: add test 314 for "qemu-img rebase" with compression

 docs/tools/qemu-img.rst|   6 +-
 qemu-img-cmds.hx   |   4 +-
 qemu-img.c | 106 ++--
 tests/qemu-iotests/024 |  57 +
 tests/qemu-iotests/024.out |  30 +++
 tests/qemu-iotests/314 | 165 +
 tests/qemu-iotests/314.out |  75 +
 7 files changed, 415 insertions(+), 28 deletions(-)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out

-- 
2.31.1

[PATCH 1/6] qemu-img: rebase: stop when reaching EOF of old backing file

2023-06-01 Thread Andrey Drobyshev via

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev 
---
 qemu-img.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 27f48051b0..78433f3746 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3801,6 +3801,8 @@ static int img_rebase(int argc, char **argv)
 }
 
 if (prefix_chain_bs) {
+uint64_t bytes = n;
+
 /*
  * If cluster wasn't changed since prefix_chain, we don't need
  * to take action
@@ -3813,9 +3815,18 @@ static int img_rebase(int argc, char **argv)
  strerror(-ret));
 goto out;
 }
-if (!ret) {
+if (!ret && n) {
 continue;
 }
+if (!n) {
+/*
+ * If we've reached EOF of the old backing, it means that
+ * offsets beyond the old backing size were read as zeroes.
+ * Now we will need to explicitly zero the cluster in
+ * order to preserve that state after the rebase.
+ */
+n = bytes;
+}
 }
 
 /*
-- 
2.31.1

[PATCH 2/6] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size

2023-06-01 Thread Andrey Drobyshev via

Before previous commit, rebase was getting infitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev 
---
 tests/qemu-iotests/024 | 57 ++
 tests/qemu-iotests/024.out | 30 
 2 files changed, 87 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 25a564a150..98a7c8fd65 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -199,6 +199,63 @@ echo
 # $BASE_OLD and $BASE_NEW)
 $QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
 
+# Check that rebase within the chain is working when
+# overlay_size > old_backing_size
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): 11 11 11 11 11
+# Backing (old): 22 22 22 22
+# Overlay:   -- -- -- -- --
+#
+# As a result, overlay should contain data identical to base_old, with the
+# last cluster remaining unallocated.
+
+echo
+echo "=== Test rebase within one backing chain ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 5 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+$(( CLUSTER_SIZE * 4 ))
+TEST_IMG=$OVERLAY _make_test_img -b "$BASE_OLD" -F $IMGFMT \
+$(( CLUSTER_SIZE * 5 ))
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_NEW" -c "write -P 0x11 0 $(( CLUSTER_SIZE * 5 ))" \
+| _filter_qemu_io
+$QEMU_IO "$BASE_OLD" -c "write -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+| _filter_qemu_io
+
+echo
+echo "Check the last cluster is zeroed in overlay before the rebase"
+echo
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+| _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+# Verify the first 4 clusters are still read the same as in the old base
+$QEMU_IO "$OVERLAY" -c "read -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+| _filter_qemu_io
+# Verify the last cluster still reads as zeroes
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+| _filter_qemu_io
+
+echo
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 973a5a3711..245fe8b1d1 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -171,4 +171,34 @@ read 65536/65536 bytes at offset 196608
 Offset  Length  File
 0   0x3 TEST_DIR/subdir/t.IMGFMT
 0x3 0x1 TEST_DIR/subdir/t.IMGFMT.base_new
+
+=== Test rebase within one backing chain ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=327680
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=262144 
backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=327680 
backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+
+Fill backing files with data
+
+wrote 327680/327680 bytes at offset 0
+320 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check the last cluster is zeroed in overlay before the rebase
+
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 *** done
-- 
2.31.1

[PATCH 5/6] qemu-img: add compression option to rebase subcommand

2023-06-01 Thread Andrey Drobyshev via

If we rebase an image whose backing file has compressed clusters, we
might end up wasting disk space since the copied clusters are now
uncompressed.  In order to have better control over this, let's add
"--compress" option to the "qemu-img rebase" command.

Note that this option affects only the clusters which are actually being
copied from the original backing file.  The clusters which were
uncompressed in the target image will remain so.

Signed-off-by: Andrey Drobyshev 
---
 docs/tools/qemu-img.rst |  6 --
 qemu-img-cmds.hx|  4 ++--
 qemu-img.c  | 19 +--
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..973a912dec 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -663,7 +663,7 @@ Command description:
 
   List, apply, create or delete snapshots in image *FILENAME*.
 
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 
   Changes the backing file of an image. Only the formats ``qcow2`` and
   ``qed`` support changing the backing file.
@@ -690,7 +690,9 @@ Command description:
 
 In order to achieve this, any clusters that differ between
 *BACKING_FILE* and the old backing file of *FILENAME* are merged
-into *FILENAME* before actually changing the backing file.
+into *FILENAME* before actually changing the backing file. With ``-c``
+option specified, the clusters which are being merged (but not the
+entire *FILENAME* image) are written in the compressed mode.
 
 Note that the safe mode is an expensive operation, comparable to
 converting an image. It only works if the old backing file still
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..068692d13e 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -88,9 +88,9 @@ SRST
 ERST
 
 DEF("rebase", img_rebase,
-"rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] 
[-T src_cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+"rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] 
[-T src_cache] [-p] [-u] [-c] -b backing_file [-F backing_fmt] filename")
 SRST
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 ERST
 
 DEF("resize", img_resize,
diff --git a/qemu-img.c b/qemu-img.c
index 9a469cd609..108da27b23 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3517,11 +3517,13 @@ static int img_rebase(int argc, char **argv)
 char *filename;
 const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
 int c, flags, src_flags, ret;
+BdrvRequestFlags write_flags = 0;
 bool writethrough, src_writethrough;
 int unsafe = 0;
 bool force_share = false;
 int progress = 0;
 bool quiet = false;
+bool compress = false;
 Error *local_err = NULL;
 bool image_opts = false;
 
@@ -3537,9 +3539,10 @@ static int img_rebase(int argc, char **argv)
 {"object", required_argument, 0, OPTION_OBJECT},
 {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
 {"force-share", no_argument, 0, 'U'},
+{"compress", no_argument, 0, 'c'},
 {0, 0, 0, 0}
 };
-c = getopt_long(argc, argv, ":hf:F:b:upt:T:qU",
+c = getopt_long(argc, argv, ":hf:F:b:upt:T:qUc",
 long_options, NULL);
 if (c == -1) {
 break;
@@ -3587,6 +3590,9 @@ static int img_rebase(int argc, char **argv)
 case 'U':
 force_share = true;
 break;
+case 'c':
+compress = true;
+break;
 }
 }
 
@@ -3639,6 +3645,14 @@ static int img_rebase(int argc, char **argv)
 
 unfiltered_bs = bdrv_skip_filters(bs);
 
+if (compress && !block_driver_can_compress(unfiltered_bs->drv)) {
+error_report("Compression not supported for this file format");
+ret = -1;
+goto out;
+} else if (compress) {
+write_flags |= BDRV_REQ_WRITE_COMPRESSED;
+}
+
 if (out_basefmt != NULL) {
 if (bdrv_find_format(out_basefmt) == NULL) {
 error_report("Invalid format name: '%s'", out_basefmt);
@@ -3903,7 +3917,8 @@ static int img_rebase(int argc, char **argv)
 bdi.cluster_size);
 end = end > size ? size : end;
 ret = blk_pwrite(blk, start, end - start,
- buf_old + (start -

[PATCH 6/6] iotests: add test 314 for "qemu-img rebase" with compression

2023-06-01 Thread Andrey Drobyshev via

The test cases considered so far:

1. Check that compression mode isn't compatible with "-f raw" (raw
   format doesn't support compression).
2. Check that rebasing an image onto no backing file preserves the data
   and writes the copied clusters actually compressed.
3. Same as 2, but with a raw backing file (i.e. the clusters copied from the
   backing are originally uncompressed -- we check they end up compressed
   after being merged).
4. Remove a single delta from a backing chain, perform the same checks
   as in 2.
5. Check that even when backing and overlay are initially uncompressed,
   copied clusters end up compressed when rebase with compression is
   performed.

Signed-off-by: Andrey Drobyshev 
---
 tests/qemu-iotests/314 | 165 +
 tests/qemu-iotests/314.out |  75 +
 2 files changed, 240 insertions(+)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out

diff --git a/tests/qemu-iotests/314 b/tests/qemu-iotests/314
new file mode 100755
index 00..96d7b4d258
--- /dev/null
+++ b/tests/qemu-iotests/314
@@ -0,0 +1,165 @@
+#!/usr/bin/env bash
+# group: rw backing auto quick
+#
+# Test qemu-img rebase with compression
+#
+# Copyright (c) 2023 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=andrey.drobys...@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+_rm_test_img "$TEST_IMG.base"
+_rm_test_img "$TEST_IMG.itmd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Want the size divisible by 2 and 3
+size=$(( 48 * 1024 * 1024 ))
+half_size=$(( size / 2 ))
+third_size=$(( size / 3 ))
+
+# 1. "qemu-img rebase -c" should refuse working with any format which doesn't
+# support compression.  We only check "-f raw" here.
+echo
+echo "=== Testing compressed rebase format compatibility ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG" "$size" | _filter_img_create
+$QEMU_IMG rebase -c -f raw -b "" "$TEST_IMG"
+
+# 2. Write the 1st half of $size to backing file (compressed), 2nd half -- to
+# the top image (also compressed).  Rebase the top image onto no backing file,
+# with compression (i.e. "qemu-img -c -b ''").  Check that the resulting image
+# has the written data preserved, and "qemu-img check" reports 100% clusters
+# as compressed.
+echo
+echo "=== Testing rebase with compression onto no backing file ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $half_size" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" "$TEST_IMG" \
+| _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 3. Same as the previous one, but with raw backing file (hence write to
+# the backing is uncompressed).
+echo
+echo "=== Testing rebase with compression with raw backing file ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$half_size" | _filter_img_create
+_make_test_img -b "$TEST_IMG.base" -F raw $size
+
+$QEMU_IO -f raw -c "write -P 0xaa 0 $half_size" "$TEST_IMG.base" \
+| _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" \
+"$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 4. Create a backing chain base<--itmd<--img, filling 1st, 2nd and 3rd
+# thirds of them, respectively (with compression).  Rebase img onto base,
+# effectively deleting itmd from the chain, and check that written data is
+# preserved in the resulting image.  Also check that "qemu-img check" reports
+# 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase removing single delta from the chain ==="
+echo
+

[PATCH 3/6] qemu-img: rebase: use backing files' BlockBackend for buffer alignment

2023-06-01 Thread Andrey Drobyshev via

Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
the data read from the old and new backing files are aligned using
BlockDriverState (or BlockBackend later on) referring to the target image.
However, this isn't quite right, because target image is only being
written to and has nothing to do with those buffers.  Let's fix that.

Signed-off-by: Andrey Drobyshev 
---
 qemu-img.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 78433f3746..60f4c06487 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3746,8 +3746,8 @@ static int img_rebase(int argc, char **argv)
 int64_t n;
 float local_progress = 0;
 
-buf_old = blk_blockalign(blk, IO_BUF_SIZE);
-buf_new = blk_blockalign(blk, IO_BUF_SIZE);
+buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
+buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
 
 size = blk_getlength(blk);
 if (size < 0) {
-- 
2.31.1

[PATCH 4/6] qemu-img: rebase: avoid unnecessary COW operations

2023-06-01 Thread Andrey Drobyshev via

When rebasing an image from one backing file to another, we need to
compare data from old and new backings.  If the diff between that data
happens to be unaligned to the target cluster size, we might end up
doing partial writes, which would lead to copy-on-write and additional IO.

Consider the following simple case (virtual_size == cluster_size == 64K):

base <-- inc1 <-- inc2

qemu-io -c "write -P 0xaa 0 32K" base.qcow2
qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2

While doing rebase, we'll write a half of the cluster to inc2, and block
layer will have to read the 2nd half of the same cluster from the base image
inc1 while doing this write operation, although the whole cluster is already
read earlier to perform data comparison.

In order to avoid these unnecessary IO cycles, let's make sure every
write request is aligned to the overlay cluster size.

Signed-off-by: Andrey Drobyshev 
---
 qemu-img.c | 72 +++---
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 60f4c06487..9a469cd609 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3513,6 +3513,7 @@ static int img_rebase(int argc, char **argv)
 uint8_t *buf_new = NULL;
 BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
 BlockDriverState *unfiltered_bs;
+BlockDriverInfo bdi = {0};
 char *filename;
 const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
 int c, flags, src_flags, ret;
@@ -3646,6 +3647,15 @@ static int img_rebase(int argc, char **argv)
 }
 }
 
+/* We need overlay cluster size to make sure write requests are aligned */
+ret = bdrv_get_info(unfiltered_bs, );
+if (ret < 0) {
+error_report("could not get block driver info");
+goto out;
+} else if (bdi.cluster_size == 0) {
+bdi.cluster_size = 1;
+}
+
 /* For safe rebasing we need to compare old and new backing file */
 if (!unsafe) {
 QDict *options = NULL;
@@ -3744,6 +3754,7 @@ static int img_rebase(int argc, char **argv)
 int64_t new_backing_size = 0;
 uint64_t offset;
 int64_t n;
+int64_t n_old = 0, n_new = 0;
 float local_progress = 0;
 
 buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
@@ -3784,7 +3795,7 @@ static int img_rebase(int argc, char **argv)
 }
 
 for (offset = 0; offset < size; offset += n) {
-bool buf_old_is_zero = false;
+bool old_backing_eof = false;
 
 /* How many bytes can we handle with the next read? */
 n = MIN(IO_BUF_SIZE, size - offset);
@@ -3829,33 +3840,38 @@ static int img_rebase(int argc, char **argv)
 }
 }
 
+/* At this point n must be aligned to the target cluster size. */
+if (offset + n < size) {
+assert(n % bdi.cluster_size == 0);
+}
+
+/*
+ * Much like the with the target image, we'll try to read as much
+ * of the old and new backings as we can.
+ */
+n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
+if (blk_new_backing) {
+n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
+}
+
 /*
  * Read old and new backing file and take into consideration that
  * backing files may be smaller than the COW image.
  */
-if (offset >= old_backing_size) {
-memset(buf_old, 0, n);
-buf_old_is_zero = true;
+memset(buf_old + n_old, 0, n - n_old);
+if (!n_old) {
+old_backing_eof = true;
 } else {
-if (offset + n > old_backing_size) {
-n = old_backing_size - offset;
-}
-
-ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
+ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
 if (ret < 0) {
 error_report("error while reading from old backing file");
 goto out;
 }
 }
 
-if (offset >= new_backing_size || !blk_new_backing) {
-memset(buf_new, 0, n);
-} else {
-if (offset + n > new_backing_size) {
-n = new_backing_size - offset;
-}
-
-ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
+memset(buf_new + n_new, 0, n - n_new);
+if (blk_new_backing && n_new) {
+ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
 if (ret < 0) {
 error_report("error while reading from new backing file");
 goto

Re: [PATCH v2 0/8] misc AHCI cleanups

2023-06-01 Thread John Snow

On Thu, Jun 1, 2023 at 9:45 AM Niklas Cassel  wrote:
>
> From: Niklas Cassel 
>
> Hello John,
>
> Here comes some misc AHCI cleanups.
>
> Most are related to error handling.
>
> Please review.
>
> (I'm also working on a second series which will add support for
> READ LOG EXT and READ LOG DMA EXT, but I will send that one out
> once it is ready. (It might take a couple of weeks still, since
> I've been a bit busy lately.))
>
>
> Changes since v1:
> -Picked up Reviewed-by tags.
>  (I did not convert your ACK to explicit Acked-by tags, since I assume
>  that the patches will go via your tree).

Guess so! I haven't been involved with IDE for a minute so I left the
ACKs in case I wandered off to signify that I hadn't reviewed them
thoroughly, but they *looked* good. Since I haven't wandered off,
guess I will actually take this and send an MR. I'll try to do this
Friday, June 2nd.

Thanks again for the very detailed commit messages, which make this easy. :)

--js

> -Rebased on master in order to fix a conflict in patch
>  "hw/ide/ahci: simplify and document PxCI handling".
> -Dropped patch "hw/ide/ahci: trigger either error IRQ or regular IRQ, not 
> both"
>  for now, as it caused a boot time regression in SeaBIOS.
>  This appears to be a bug in SeaBIOS, for more info see:
>  
> https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/RIHV3FZ4EVMAJA4TEDPASKNYV7V72O4C/
>  I will resend the QEMU patch separately once the SeaBIOS patch has been
>  merged, and once QEMU has updated to a SeaBIOS tag that includes the fix.
>
>
> Kind regards,
> Niklas
>
> Niklas Cassel (8):
>   hw/ide/ahci: remove stray backslash
>   hw/ide/core: set ERR_STAT in unsupported command completion
>   hw/ide/ahci: write D2H FIS on when processing NCQ command
>   hw/ide/ahci: simplify and document PxCI handling
>   hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
>   hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
>   hw/ide/ahci: fix ahci_write_fis_sdb()
>   hw/ide/ahci: fix broken SError handling
>
>  hw/ide/ahci.c | 112 +++---
>  hw/ide/core.c |   2 +-
>  2 files changed, 81 insertions(+), 33 deletions(-)
>
> --
> 2.40.1
>

Re: [PATCH 02/11] qdev-properties-system: Lock AioContext for blk_insert_bs()

2023-06-01 Thread Kevin Wolf

Am 31.05.2023 um 13:02 hat Kevin Wolf geschrieben:
> blk_insert_bs() requires that callers hold the AioContext lock for the
> node that should be inserted. Take it.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  hw/core/qdev-properties-system.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/core/qdev-properties-system.c 
> b/hw/core/qdev-properties-system.c
> index d42493f630..7f6b14276a 100644
> --- a/hw/core/qdev-properties-system.c
> +++ b/hw/core/qdev-properties-system.c
> @@ -147,7 +147,10 @@ static void set_drive_helper(Object *obj, Visitor *v, 
> const char *name,
>  blk = blk_new(ctx, 0, BLK_PERM_ALL);
>  blk_created = true;
>  
> +aio_context_acquire(ctx);
>  ret = blk_insert_bs(blk, bs, errp);
> +aio_context_release(ctx);

This one actually isn't completely right. ctx is only the AioContext of
bs if the device supports iothreads, but the main context otherwise.
It's the right thing for blk_new(), but for blk_insert_bs() we always
need to hold the lock for the AioContext of bs.

Kevin

[PULL 7/8] block/blkio: use qemu_open() to support fd passing for virtio-blk

2023-06-01 Thread Stefan Hajnoczi

From: Stefano Garzarella 

Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) supports the fd
passing. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-2-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 53 ++-
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 11be8787a3..527323d625 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -673,25 +673,60 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
*bs,
 {
 const char *path = qdict_get_try_str(options, "path");
 BDRVBlkioState *s = bs->opaque;
-int ret;
+bool fd_supported = false;
+int fd, ret;
 
 if (!path) {
 error_setg(errp, "missing 'path' option");
 return -EINVAL;
 }
 
-ret = blkio_set_str(s->blkio, "path", path);
-qdict_del(options, "path");
-if (ret < 0) {
-error_setg_errno(errp, -ret, "failed to set path: %s",
- blkio_get_error_msg());
-return ret;
-}
-
 if (!(flags & BDRV_O_NOCACHE)) {
 error_setg(errp, "cache.direct=off is not supported");
 return -EINVAL;
 }
+
+if (blkio_get_int(s->blkio, "fd", ) == 0) {
+fd_supported = true;
+}
+
+/*
+ * If the libblkio driver supports fd passing, let's always use qemu_open()
+ * to open the `path`, so we can handle fd passing from the management
+ * layer through the "/dev/fdset/N" special path.
+ */
+if (fd_supported) {
+int open_flags;
+
+if (flags & BDRV_O_RDWR) {
+open_flags = O_RDWR;
+} else {
+open_flags = O_RDONLY;
+}
+
+fd = qemu_open(path, open_flags, errp);
+if (fd < 0) {
+return -EINVAL;
+}
+
+ret = blkio_set_int(s->blkio, "fd", fd);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set fd: %s",
+ blkio_get_error_msg());
+qemu_close(fd);
+return ret;
+}
+} else {
+ret = blkio_set_str(s->blkio, "path", path);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set path: %s",
+ blkio_get_error_msg());
+return ret;
+}
+}
+
+qdict_del(options, "path");
+
 return 0;
 }
 
-- 
2.40.1

[PULL 6/8] block: remove bdrv_co_io_plug() API

2023-06-01 Thread Stefan Hajnoczi

No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-7-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-io.h |  3 ---
 include/block/block_int-common.h | 11 --
 block/io.c   | 37 
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87..43af816d75 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -259,9 +259,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, 
AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b1cbc1e00c..74195c3004 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -768,11 +768,6 @@ struct BlockDriver {
 void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
 BlockDriverState *bs, BlkdebugEvent event);
 
-/* io queue for linux-aio */
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState 
*bs);
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-BlockDriverState *bs);
-
 bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
 bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -1227,12 +1222,6 @@ struct BlockDriverState {
 unsigned int in_flight;
 unsigned int serialising_in_flight;
 
-/*
- * counter for nested bdrv_io_plug.
- * Accessed with atomic ops.
- */
-unsigned io_plugged;
-
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index 540bf8d26d..f2dfc7c405 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3223,43 +3223,6 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t 
size)
 return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-QLIST_FOREACH(child, >children, next) {
-bdrv_co_io_plug(child->bs);
-}
-
-if (qatomic_fetch_inc(>io_plugged) == 0) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_plug) {
-drv->bdrv_co_io_plug(bs);
-}
-}
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-assert(bs->io_plugged);
-if (qatomic_fetch_dec(>io_plugged) == 1) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_unplug) {
-drv->bdrv_co_io_unplug(bs);
-}
-}
-
-QLIST_FOREACH(child, >children, next) {
-bdrv_co_io_unplug(child->bs);
-}
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
-- 
2.40.1

[PULL 5/8] block/linux-aio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Kevin Wolf 
Reviewed-by: Stefano Garzarella 
Message-id: 20230530180959.1108766-6-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 28 
 block/linux-aio.c   | 41 +++--
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index da60ca13ef..0f63c2800c 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, 
QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index 7baa8491dd..ac1ed54811 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState 
*bs, int64_t offset,
 return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_plug();
-}
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_unplug(s->aio_max_batch);
-}
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 916f001e32..561c71a9ae 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -46,7 +47,6 @@ struct qemu_laiocb {
 };
 
 typedef struct {
-int plugged;
 unsigned int

[PULL 1/8] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c |  22 -
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b025a7b63..89f274f85e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2650,6 +2650,7 @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index d62a7ee773..be4dcef59d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,16 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 241f643507..4009ed5fed 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2582,28 +2582,6 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, 
Notifier *notify)
 notifier_list_add(>insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_plug(bs);
-}
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_unplug(bs);
-}
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
 IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index 00..98a155d2f4
--- /dev/null
+++ b/block/plug.c
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+void (*fn)(void *);
+void *opaque;
+} UnplugFn;
+
+/* Per-thread state */
+typedef struct {
+unsigned count;   /* how many times has plug()

[PULL 8/8] qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

2023-06-01 Thread Stefan Hajnoczi

From: Stefano Garzarella 

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports the fd
passing through the new 'fd' property.

Since now we are using qemu_open() on '@path' if the virtio-blk driver
supports the fd passing, let's announce it.
In this way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device. This is useful especially
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-3-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-core.json | 6 ++
 meson.build  | 4 
 2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98d9116dae..4bf89171c6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3955,10 +3955,16 @@
 #
 # @path: path to the vhost-vdpa character device.
 #
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+# (since 8.1)
+#
 # Since: 7.2
 ##
 { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
   'data': { 'path': 'str' },
+  'features': [ { 'name' :'fdset',
+  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
   'if': 'CONFIG_BLKIO' }
 
 ##
diff --git a/meson.build b/meson.build
index bc76ea96bf..a61d3e9b06 100644
--- a/meson.build
+++ b/meson.build
@@ -2106,6 +2106,10 @@ config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
 config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+   blkio.version().version_compare('>=1.3.0'))
+endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
-- 
2.40.1

[PULL 3/8] block/blkio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 72117fa005..11be8787a3 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 
 #include "block/block-io.h"
@@ -320,16 +321,30 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-if (qatomic_read(>io_plugged) == 0) {
-BDRVBlkioState *s = bs->opaque;
+BDRVBlkioState *s = opaque;
 
+WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
 }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+BDRVBlkioState *s = bs->opaque;
+
+blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -340,9 +355,9 @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, 
int64_t bytes)
 
 WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_discard(s->blkioq, offset, bytes, , 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -373,9 +388,9 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 
 WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_readv(s->blkioq, offset, iov, iovcnt, , 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -418,9 +433,9 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState 
*bs, int64_t offset,
 
 WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_writev(s->blkioq, offset, iov, iovcnt, , blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -439,9 +454,9 @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
 WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_flush(s->blkioq, , 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -467,22 +482,13 @@ static int coroutine_fn 
blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
 WITH_QEMU_LOCK_GUARD(>blkio_lock) {
 blkioq_write_zeroes(s->blkioq, offset, bytes, , blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-BDRVBlkioState *s = bs->opaque;
-
-WITH_QEMU_LOCK_GUARD(>blkio_lock) {
-blkio_submit_io(bs);
-}
-}
-
 typedef enum {
 BMRR_OK,
 BMRR_SKIP,
@@ -1004,7 +1010,6 @@ static void blkio_refresh_limits(BlockDriverState *bs, 
Error **errp)
 .bdrv_co_pwritev = blkio_co_pwritev, \
 .bdrv_co_flush_to_disk   = blkio_co_flush, \
 .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
-.bdrv_co_io_unplug   = blkio_co_io_unplug, \
 .bdrv_refresh_limits = blkio_refresh_limits, \
 .bdrv_register_buf   = blkio_register_buf, \
 .bdrv_unregister_buf = blkio_unregister_buf, \
-- 
2.40.1

[PULL 4/8] block/io_uring: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-5-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 10 --
 block/io_uring.c| 44 -
 block/trace-events  |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0fe85ade77..da60ca13ef 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -81,13 +81,6 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int 
fd, uint64_t offset,
   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index 0ab158efba..7baa8491dd 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2558,11 +2558,6 @@ static void coroutine_fn raw_co_io_plug(BlockDriverState 
*bs)
 laio_io_plug();
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_plug();
-}
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -2573,11 +2568,6 @@ static void coroutine_fn 
raw_co_io_unplug(BlockDriverState *bs)
 laio_io_unplug(s->aio_max_batch);
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_unplug();
-}
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index 3a77480e16..69d9820928 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -16,6 +16,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -41,7 +42,6 @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-int plugged;
 unsigned int in_queue;
 unsigned int in_flight;
 bool blocked;
@@ -267,7 +267,7 @@ static void 
luring_process_completions_and_submit(LuringState *s)
 {
 luring_process_completions(s);
 
-if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+if (s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -301,29 +301,17 @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
 QSIMPLEQ_INIT(_q->submit_queue);
-io_q->plugged = 0;
 io_q->in_queue = 0;
 io_q->in_flight = 0;
 io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-trace_luring_io_plug(s);
-s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-assert(s->io_q.plugged);
-trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (--s->io_q.plugged == 0 &&
-!s->io_q.blocked && s->io_q.in_queue > 0) {
+LuringState *s = opaque;
+trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked && s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -370,14 +358,16 @@ static int luring_do_submit(int fd, LuringAIOCB 
*luringcb, LuringState *s,
 
 QSIMPLEQ_INSERT_TAIL(>io_q.submit_queue, luringcb, next);
 s->io_q.in_queue++;
-trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (!s->io_q.blocked &&
-(!s->io_q.plugged ||
- s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-ret = ioq_submit(s);
-trace_luring_do_submit_done(s, ret);
-return ret;
+trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked) {
+if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+ret = ioq_submit(s);
+trace_luring_do_submit_done(s, ret);
+return ret;
+}
+
+blk_io_plug_call(luring_unplug_fn, s);
 }
 return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index 048ad27519..6f121b7636

[PULL 0/8] Block patches

2023-06-01 Thread Stefan Hajnoczi

The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:

  decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:

  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 
11:08:21 -0400)


Pull request

- Stefano Garzarella's blkio block driver 'fd' parameter
- My thread-local blk_io_plug() series



Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

Stefano Garzarella (2):
  block/blkio: use qemu_open() to support fd passing for virtio-blk
  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

 MAINTAINERS   |   1 +
 qapi/block-core.json  |   6 ++
 meson.build   |   4 +
 include/block/block-io.h  |   3 -
 include/block/block_int-common.h  |  11 ---
 include/block/raw-aio.h   |  14 ---
 include/sysemu/block-backend-io.h |  13 +--
 block/blkio.c |  96 --
 block/block-backend.c |  22 -
 block/file-posix.c|  38 ---
 block/io.c|  37 ---
 block/io_uring.c  |  44 -
 block/linux-aio.c |  41 +++-
 block/nvme.c  |  44 +++--
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 block/trace-events|   6 +-
 20 files changed, 293 insertions(+), 265 deletions(-)
 create mode 100644 block/plug.c

-- 
2.40.1

[PULL 2/8] block/nvme: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/nvme.c   | 44 
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 17937d398d..7ca85bc44a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -25,6 +25,7 @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -119,7 +120,6 @@ struct BDRVNVMeState {
 int blkshift;
 
 uint64_t max_transfer;
-bool plugged;
 
 bool supports_write_zeroes;
 bool supports_discard;
@@ -282,7 +282,7 @@ static void nvme_kick(NVMeQueuePair *q)
 {
 BDRVNVMeState *s = q->s;
 
-if (s->plugged || !q->need_kick) {
+if (!q->need_kick) {
 return;
 }
 trace_nvme_kick(s, q->index);
@@ -387,10 +387,6 @@ static bool nvme_process_completion(NVMeQueuePair *q)
 NvmeCqe *c;
 
 trace_nvme_process_completion(s, q->index, q->inflight);
-if (s->plugged) {
-trace_nvme_process_completion_queue_plugged(s, q->index);
-return false;
-}
 
 /*
  * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -480,6 +476,15 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+NVMeQueuePair *q = opaque;
+
+QEMU_LOCK_GUARD(>lock);
+nvme_kick(q);
+nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
 NvmeCmd *cmd, BlockCompletionFunc cb,
 void *opaque)
@@ -496,8 +501,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
NVMeRequest *req,
q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
 q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
 q->need_kick++;
-nvme_kick(q);
-nvme_process_completion(q);
+blk_io_plug_call(nvme_unplug_fn, q);
 qemu_mutex_unlock(>lock);
 }
 
@@ -1567,27 +1571,6 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(!s->plugged);
-s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(s->plugged);
-s->plugged = false;
-for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-NVMeQueuePair *q = s->queues[i];
-qemu_mutex_lock(>lock);
-nvme_kick(q);
-nvme_process_completion(q);
-qemu_mutex_unlock(>lock);
-}
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
   Error **errp)
 {
@@ -1664,9 +1647,6 @@ static BlockDriver bdrv_nvme = {
 .bdrv_detach_aio_context  = nvme_detach_aio_context,
 .bdrv_attach_aio_context  = nvme_attach_aio_context,
 
-.bdrv_co_io_plug  = nvme_co_io_plug,
-.bdrv_co_io_unplug= nvme_co_io_unplug,
-
 .bdrv_register_buf= nvme_register_buf,
 .bdrv_unregister_buf  = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index 32665158d6..048ad27519 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -141,7 +141,6 @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
"cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u 
inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int 
c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
-- 
2.40.1

Re: [PATCH v5 0/2] block/blkio: support fd passing for virtio-blk-vhost-vdpa driver

2023-06-01 Thread Stefan Hajnoczi

On Tue, May 30, 2023 at 09:19:39AM +0200, Stefano Garzarella wrote:
> v5:
> - moved `features` to the object level to simplify libvirt code [Jonathon]
> - wrapped a line too long in the documentation [Markus]
> - added Stefan R-b tags
> 
> v4: 
> https://lore.kernel.org/qemu-devel/20230526150304.158206-1-sgarz...@redhat.com/
> - added patch 02 to allow libvirt to discover we support fdset [Markus]
> - modified the commit description of patch 01
> 
> v3: 
> https://lore.kernel.org/qemu-devel/20230511091527.46620-1-sgarz...@redhat.com/
> - use qemu_open() on `path` to simplify libvirt code [Jonathon]
> - remove patch 01 since we are not using monitor_fd_param() anymore
> 
> v2: 
> https://lore.kernel.org/qemu-devel/20230504092843.62493-1-sgarz...@redhat.com/
> - added patch 01 to use monitor_fd_param() in the blkio module
> - use monitor_fd_param() to parse the fd like vhost devices [Stefan]
> 
> v1: 
> https://lore.kernel.org/qemu-devel/20230502145050.224615-1-sgarz...@redhat.com/
> 
> The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports the new
> 'fd' property. Let's expose this to the user, so the management layer
> can pass the file descriptor of an already opened vhost-vdpa character
> device. This is useful especially when the device can only be accessed
> with certain privileges.
> 
> Stefano Garzarella (2):
>   block/blkio: use qemu_open() to support fd passing for virtio-blk
>   qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa
> 
>  meson.build  |  4 
>  qapi/block-core.json |  6 +
>  block/blkio.c| 53 
>  3 files changed, 54 insertions(+), 9 deletions(-)
> 
> -- 
> 2.40.1
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 0/2] bulk: Remove pointless QOM casts

2023-06-01 Thread Richard Henderson


On 6/1/23 02:34, Philippe Mathieu-Daudé wrote:

As per Markus suggestion in [*], use Coccinelle to remove
pointless QOM cast macro uses. Since we have more than 1000
QOM types, add a script to generate the semantic patch.

[*]https://lore.kernel.org/qemu-devel/87mt1jafjt@pond.sub.org/

Philippe Mathieu-Daudé (2):
   scripts: Add qom-cast-macro-clean-cocci-gen.py
   bulk: Remove pointless QOM casts


Cool!

Reviewed-by: Richard Henderson 


r~

Re: [PATCH 01/12] file-posix: remove incorrect coroutine_fn calls

2023-06-01 Thread Paolo Bonzini

Il gio 1 giu 2023, 15:50 Eric Blake  ha scritto:

> > @@ -2696,7 +2696,7 @@ static int coroutine_fn
> raw_co_truncate(BlockDriverState *bs, int64_t offset,
> >  }
> >
> >  if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
> > -int64_t cur_length = raw_co_getlength(bs);
> > +int64_t cur_length = raw_getlength(bs);
>
> Shouldn't this one still call the raw_co_getlength() wrapper?
>

It could, but instead I wanted to clarify that this will never suspend
(because it's calling the function directly rather than bdrv_co_getlength).

Paolo


> > @@ -3245,7 +3250,7 @@ static int coroutine_fn
> raw_co_block_status(BlockDriverState *bs,
> >   * round up if necessary.
> >   */
> >  if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
> > -int64_t file_length = raw_co_getlength(bs);
> > +int64_t file_length = raw_getlength(bs);
>
> Likewise this one?
>
> >
> >  static bool coroutine_fn cdrom_co_is_inserted(BlockDriverState *bs)
> >  {
> > -return raw_co_getlength(bs) > 0;
> > +return raw_getlength(bs) > 0;
> >  }
>
> and this one?
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
>

Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies

2023-06-01 Thread Laszlo Ersek

On 6/1/23 15:00, Eric Blake wrote:
> On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:

>> Probably best to reorder the files in this patch for review:
> 
> I see what you mean: because of the state hierarchy, it is probably
> worth tweaking the git orderfile to favor files nearer the top of the
> state tree first, rather than strict alphabetical ordering of *.c.  I
> may just push a patch for that without needing review, as it doesn't
> affect library semantics at all.

Ouch, I'm sorry, I meant that *I* should rearrange the hunks in the
patch for review! I didn't mean to ask you for any orderfile changes! :)

Of course if you can do that: high five! :)

> 
> ...
>>> +++ b/generator/states-reply.c
>>> @@ -62,15 +62,19 @@  REPLY.START:
>>> */
>>>ssize_t r;
>>>
>>> -  /* We read all replies initially as if they are simple replies, but
>>> -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
>>> -   * This works because the structured_reply header is larger.
>>> +  /* With extended headers, there is only one size to read, so we can do it
>>> +   * all in one syscall.  But for older structured replies, we don't know 
>>> if
>>> +   * we have a simple or structured reply until we read the magic number,
>>> +   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
>>> */
>>>assert (h->reply_cmd == NULL);
>>>assert (h->rlen == 0);
>>>
>>>h->rbuf = >sbuf.reply.hdr;
>>> -  h->rlen = sizeof h->sbuf.reply.hdr.simple;
>>> +  if (h->extended_headers)
>>> +h->rlen = sizeof h->sbuf.reply.hdr.extended;
>>> +  else
>>> +h->rlen = sizeof h->sbuf.reply.hdr.simple;
>>>
>>>r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
>>>if (r == -1) {
>>
>> The comment "This works because the structured_reply header is larger"
>> is being removed, which makes the non-ext branch even less
>> comprehensible than before.
>>
>> (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is
>> smaller than "sizeof structured"?
> 
> Yep, I'm already rebasing onto some additional asserts along those
> lines, based on your reviews earlier in the series.
> 
>>
>>> @@ -116,16 +120,22 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
>>>uint64_t cookie;
>>>
>>>magic = be32toh (h->sbuf.reply.hdr.simple.magic);
>>> -  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
>>> +  switch (magic) {
>>> +  case NBD_SIMPLE_REPLY_MAGIC:
>>> +if (h->extended_headers)
>>> +  goto invalid;
>>>  SET_NEXT_STATE (%SIMPLE_REPLY.START);
>>> -  }
>>> -  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
>>> +break;
>>> +  case NBD_STRUCTURED_REPLY_MAGIC:
>>> +  case NBD_EXTENDED_REPLY_MAGIC:
>>> +if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
>>
>> Heh, I love this ((a == b) == c) "equivalence" form! :)
> 
> I still do a double-take every time I see 'a < b < c' in languages
> (like python) that support it as shorthand for 'a < b && b < c'.  Even
> weirder is when you see 'a < b > c'.  But yes, C's non-transitive ==
> can be a surprise for the uninitiated.
> 
>>
>>> +  goto invalid;
>>>  SET_NEXT_STATE (%STRUCTURED_REPLY.START);
>>> -  }
>>> -  else {
>>> -/* We've probably lost synchronization. */
>>> -SET_NEXT_STATE (%.DEAD);
>>> -set_error (0, "invalid reply magic 0x%" PRIx32, magic);
>>> +break;
>>> +  default:
>>> +  invalid:
>>> +SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
>>> +set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
>>>  #if 0 /* uncomment to see desynchronized data */
>>>  nbd_internal_hexdump (>sbuf.reply.hdr.simple,
>>>sizeof (h->sbuf.reply.hdr.simple),
>>
>> My apologies, but I find *this* placement of "invalid" terrible. I
>> thought the "goto invalid" statements were jumping to the end of the
>> function. Now I see they jump effectively to another case label.
>>
>> (3) How about this (on top of your patch, to be squashed):
> 
> That part of the patch pre-dates other reviews I've seen from you on
> the same topic (this patch series has been percolating on my local
> builds for a long time), so I'm not surprised by your request, and I'm
> happy to squash in your improvement.  I may have copied from other
> similar code in the generator/states-*.c files, if so, I'll probably
> do a separate cleanup patch first.
> 
>>
>> ... At the same time, even with this "cleanup" for the labels, I find it
>> relatively confusing (albeit correct, as far as I can tell!) that in
>> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we first check the magic
>> number(s) and *then* whether we negotiated extended headers, whereas in
>> all the other state handlers, the order (= condition nesting) is the
>> opposite: we first check extended headers, and deal with any other
>> objects second. This makes the assert()s in REPLY.STRUCTURED_REPLY.START
>> especially tricky to demonstrate -- I think I've managed to do it, it
>> looks correct, it's just hard.

Re: [PATCH 0/9] misc AHCI cleanups

2023-06-01 Thread Niklas Cassel

On Wed, May 17, 2023 at 01:06:06PM -0400, John Snow wrote:
> On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel  wrote:
> >
> > From: Niklas Cassel 
> >
> > Hello John,
> >
> 
> Hi Niklas!
> 
> I haven't been actively involved with AHCI for a while, so I am not
> sure I can find the time to give this a deep scrub. I'm going to
> assume based on '@wdc.com` that you probably know a thing or two more
> about AHCI than I do, though. Can you tell me what testing you've
> performed on this? As long as it doesn't cause any obvious
> regressions, we might be able to push it through, but it might not be
> up to me anymore. I can give it a review on technical merit, but with
> regards to "correctness" I have to admit I am flying blind.

Hello John,

The testing is mostly using linux and injecting NCQ errors using some
additional QEMU patches that are not part of this series.

> 
> (I haven't looked at the patches yet, but if in your commit messages
> you can point to the relevant sections of the relevant specifications,
> that'd help immensely.)
> 
> > Here comes some misc AHCI cleanups.
> >
> > Most are related to error handling.
> 
> I've always found this the most difficult part to verify. In a real
> system, the registers between AHCI and the actual hard disk are
> *physically separate*, and they update at specific times based on the
> transmission of the FIS packets. The model in QEMU doesn't bother with
> a perfect reproduction of that, and so it's been a little tough to
> verify correctness. I tried to improve it a bit back in the day, but
> my understanding has always been a bit limited :)
> 
> Are there any vendor tools you're aware of that have test suites we
> could use to verify behavior?

Unfortunately, I don't know of any good test suite.

> A question for you: is it worth solidifying which ATA specification we
> conform to? I don't believe we adhere to any one specific model,
> because a lot of the code is shared between PATA and SATA, and we "in
> theory" support IDE hard drives for fairly old guest operating systems
> that may or may not predate the DMA extensions. As a result, the
> actual device emulation is kind of a mish-mash of different ATA
> specifications, generally whichever version provided the
> least-abstracted detail and was easy to implement.
> 
> If you're adding the logging features, that seems to push us towards
> the newer end of the spectrum, but I'm not sure if this causes any
> problems for guest operating systems doing probing to guess what kind
> of device they're talking to.
> 
> Any input?

I agree.

In my next series, after we have General Purpose Logging support,
I intend to bump the major version to indicate ACS-5 support.
I will need to verify that we are not missing any other major
feature from ACS-5 first though (other than GPL).


Kind regards,
Niklas

Re: [PATCH 01/12] file-posix: remove incorrect coroutine_fn calls

2023-06-01 Thread Eric Blake

On Thu, Jun 01, 2023 at 01:51:34PM +0200, Paolo Bonzini wrote:
> raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a 
> coroutine
> function.  This is harmless because raw_co_getlength does not actually 
> suspend,
> but in the interest of clarity make it a non-coroutine_fn that is just wrapped
> by the coroutine_fn raw_co_getlength.  Likewise, check_cache_dropped was only
> a coroutine_fn because it called raw_co_getlength, so it can be made 
> non-coroutine
> as well.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/file-posix.c | 29 +
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 

> @@ -2696,7 +2696,7 @@ static int coroutine_fn 
> raw_co_truncate(BlockDriverState *bs, int64_t offset,
>  }
>  
>  if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
> -int64_t cur_length = raw_co_getlength(bs);
> +int64_t cur_length = raw_getlength(bs);

Shouldn't this one still call the raw_co_getlength() wrapper?

> @@ -3245,7 +3250,7 @@ static int coroutine_fn 
> raw_co_block_status(BlockDriverState *bs,
>   * round up if necessary.
>   */
>  if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
> -int64_t file_length = raw_co_getlength(bs);
> +int64_t file_length = raw_getlength(bs);

Likewise this one?

>  
>  static bool coroutine_fn cdrom_co_is_inserted(BlockDriverState *bs)
>  {
> -return raw_co_getlength(bs) > 0;
> +return raw_getlength(bs) > 0;
>  }

and this one?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH v2 8/8] hw/ide/ahci: fix broken SError handling

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

When encountering an NCQ error, you should not write the NCQ tag to the
SError register. This is completely wrong.

The SError register has a clear definition, where each bit represents a
different error, see PxSERR definition in AHCI 1.3.1.

If we write a random value (like the NCQ tag) in SError, e.g. Linux will
read SError, and will trigger arbitrary error handling depending on the
NCQ tag that happened to be executing.

In case of success, ncq_cb() will call ncq_finish().
In case of error, ncq_cb() will call ncq_err() (which will clear
ncq_tfs->used), and then call ncq_finish(), thus using ncq_tfs->used is
sufficient to tell if finished should get set or not.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index ef6c9fc378..d0a774bc17 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1012,7 +1012,6 @@ static void ncq_err(NCQTransferState *ncq_tfs)
 
 ide_state->error = ABRT_ERR;
 ide_state->status = READY_STAT | ERR_STAT;
-ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
 qemu_sglist_destroy(_tfs->sglist);
 ncq_tfs->used = 0;
 }
@@ -1022,7 +1021,7 @@ static void ncq_finish(NCQTransferState *ncq_tfs)
 /* If we didn't error out, set our finished bit. Errored commands
  * do not get a bit set for the SDB FIS ACT register, nor do they
  * clear the outstanding bit in scr_act (PxSACT). */
-if (!(ncq_tfs->drive->port_regs.scr_err & (1 << ncq_tfs->tag))) {
+if (ncq_tfs->used) {
 ncq_tfs->drive->finished |= (1 << ncq_tfs->tag);
 }
 
-- 
2.40.1

[PATCH v2 7/8] hw/ide/ahci: fix ahci_write_fis_sdb()

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

When there is an error, we need to raise a TFES error irq, see AHCI 1.3.1,
5.3.13.1 SDB:Entry.

If ERR_STAT is set, we jump to state ERR:FatalTaskfile, which will raise
a TFES IRQ unconditionally, regardless if the I bit is set in the FIS or
not.

Thus, we should never raise a normal IRQ after having sent an error IRQ.

It is valid to signal successfully completed commands as finished in the
same SDB FIS that generates the error IRQ. The important thing is that
commands that did not complete successfully (e.g. commands that were
aborted, do not get the finished bit set).

Before this commit, there was never a TFES IRQ raised on NCQ error.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 12aaadc554..ef6c9fc378 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -806,8 +806,14 @@ static void ahci_write_fis_sdb(AHCIState *s, 
NCQTransferState *ncq_tfs)
 pr->scr_act &= ~ad->finished;
 ad->finished = 0;
 
-/* Trigger IRQ if interrupt bit is set (which currently, it always is) */
-if (sdb_fis->flags & 0x40) {
+/*
+ * TFES IRQ is always raised if ERR_STAT is set, regardless of I bit.
+ * If ERR_STAT is not set, trigger SDBS IRQ if interrupt bit is set
+ * (which currently, it always is).
+ */
+if (sdb_fis->status & ERR_STAT) {
+ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_TFES);
+} else if (sdb_fis->flags & 0x40) {
 ahci_trigger_irq(s, ad, AHCI_PORT_IRQ_BIT_SDBS);
 }
 }
-- 
2.40.1

[PATCH v2 4/8] hw/ide/ahci: simplify and document PxCI handling

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

The AHCI spec states that:
For NCQ, PxCI is cleared on command queued successfully.

For non-NCQ, PxCI is cleared on command completed successfully.
(A non-NCQ command that completes with error does not clear PxCI.)

The current QEMU implementation either clears PxCI in check_cmd(),
or in ahci_cmd_done().

check_cmd() will clear PxCI for a command if handle_cmd() returns 0.
handle_cmd() will return -1 if BUSY or DRQ is set.

The QEMU implementation for NCQ commands will currently not set BUSY
or DRQ, so they will always have PxCI cleared by handle_cmd().
ahci_cmd_done() will never even get called for NCQ commands.

Non-NCQ commands are executed by ide_bus_exec_cmd().
Non-NCQ commands in QEMU are implemented either in a sync or in an async
way.

For non-NCQ commands implemented in a sync way, the command handler will
return true, and when ide_bus_exec_cmd() sees that a command handler
returns true, it will call ide_cmd_done() (which will call
ahci_cmd_done()). For a command implemented in a sync way,
ahci_cmd_done() will do nothing (since busy_slot is not set). Instead,
after ide_bus_exec_cmd() has finished, check_cmd() will clear PxCI for
these commands.

For non-NCQ commands implemented in an async way (using either aiocb or
pio_aiocb), the command handler will return false, ide_bus_exec_cmd()
will not call ide_cmd_done(), instead it is expected that the async
callback function will call ide_cmd_done() once the async command is
done. handle_cmd() will set busy_slot, if and only if BUSY or DRQ is
set, and this is checked _after_ ide_bus_exec_cmd() has returned.
handle_cmd() will return -1, so check_cmd() will not clear PxCI.
When the async callback calls ide_cmd_done() (which will call
ahci_cmd_done()), it will see that busy_slot is set, and
ahci_cmd_done() will clear PxCI.

This seems racy, since busy_slot is set _after_ ide_bus_exec_cmd() has
returned. The callback might come before busy_slot gets set. And it is
quite confusing that ahci_cmd_done() will be called for all non-NCQ
commands when the command is done, but will only clear PxCI in certain
cases, even though it will always write a D2H FIS and raise an IRQ.

Even worse, in the case where ahci_cmd_done() does not clear PxCI, it
still raises an IRQ. Host software might thus read an old PxCI value,
since PxCI is cleared (by check_cmd()) after the IRQ has been raised.

Try to simplify this by always setting busy_slot for non-NCQ commands,
such that ahci_cmd_done() will always be responsible for clearing PxCI
for non-NCQ commands.

For NCQ commands, clear PxCI when we receive the D2H FIS, but before
raising the IRQ, see AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and
RegFIS:ClearCI.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 70 ---
 1 file changed, 50 insertions(+), 20 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4b272397fd..3deaf01add 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -41,9 +41,10 @@
 #include "trace.h"
 
 static void check_cmd(AHCIState *s, int port);
-static int handle_cmd(AHCIState *s, int port, uint8_t slot);
+static void handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
 static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
+static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t slot);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -591,9 +592,8 @@ static void check_cmd(AHCIState *s, int port)
 
 if ((pr->cmd & PORT_CMD_START) && pr->cmd_issue) {
 for (slot = 0; (slot < 32) && pr->cmd_issue; slot++) {
-if ((pr->cmd_issue & (1U << slot)) &&
-!handle_cmd(s, port, slot)) {
-pr->cmd_issue &= ~(1U << slot);
+if (pr->cmd_issue & (1U << slot)) {
+handle_cmd(s, port, slot);
 }
 }
 }
@@ -1123,6 +1123,22 @@ static void process_ncq_command(AHCIState *s, int port, 
const uint8_t *cmd_fis,
 return;
 }
 
+/*
+ * A NCQ command clears the bit in PxCI after the command has been QUEUED
+ * successfully (ERROR not set, BUSY and DRQ cleared).
+ *
+ * For NCQ commands, PxCI will always be cleared here.
+ *
+ * (Once the NCQ command is COMPLETED, the device will send a SDB FIS with
+ * the interrupt bit set, which will clear PxSACT and raise an interrupt.)
+ */
+ahci_clear_cmd_issue(ad, slot);
+
+/*
+ * In reality, for NCQ commands, PxCI is cleared after receiving a D2H FIS
+ * without the interrupt bit set, but since ahci_write_fis_d2h() can raise
+ * an IRQ on error, we need to call them in reverse order.
+ */
 ahci_write_fis_d2h(ad, false);
 
 ncq_tfs->used = 1;
@@ -1197,6 +1213,7 @@ static void handle_reg_h2d_fis(AHCIState *s, int port,
 {
 IDEState *ide_state =

[PATCH v2 6/8] hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

According to AHCI 1.3.1 definition of PxSACT:
This field is cleared when PxCMD.ST is written from a '1' to a '0' by
software. This field is not cleared by a COMRESET or a software reset.

According to AHCI 1.3.1 definition of PxCI:
This field is also cleared when PxCMD.ST is written from a '1' to a '0'
by software.

Clearing PxCMD.ST is part of the error recovery procedure, see
AHCI 1.3.1, section "6.2 Error Recovery".

If we don't clear PxCI on error recovery, the previous command will
incorrectly still be marked as pending after error recovery.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 1237f94ddc..12aaadc554 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -329,6 +329,11 @@ static void ahci_port_write(AHCIState *s, int port, int 
offset, uint32_t val)
 ahci_check_irq(s);
 break;
 case AHCI_PORT_REG_CMD:
+if ((pr->cmd & PORT_CMD_START) && !(val & PORT_CMD_START)) {
+pr->scr_act = 0;
+pr->cmd_issue = 0;
+}
+
 /* Block any Read-only fields from being set;
  * including LIST_ON and FIS_ON.
  * The spec requires to set ICC bits to zero after the ICC change
-- 
2.40.1

[PATCH v2 5/8] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

For NCQ, PxCI is cleared on command queued successfully.
For non-NCQ, PxCI is cleared on command completed successfully.
Successfully means ERR_STAT, BUSY and DRQ are all cleared.

A command that has ERR_STAT set, does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

In the case of non-NCQ commands, not clearing PxCI is needed in order
for host software to be able to see which command slot that failed.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 3deaf01add..1237f94ddc 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1518,7 +1518,8 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t 
slot)
 {
 IDEState *ide_state = >port.ifs[0];
 
-if (!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
+if (!(ide_state->status & ERR_STAT) &&
+!(ide_state->status & (BUSY_STAT | DRQ_STAT))) {
 ad->port_regs.cmd_issue &= ~(1 << slot);
 }
 }
@@ -1527,6 +1528,7 @@ static void ahci_clear_cmd_issue(AHCIDevice *ad, uint8_t 
slot)
 static void ahci_cmd_done(const IDEDMA *dma)
 {
 AHCIDevice *ad = DO_UPCAST(AHCIDevice, dma, dma);
+IDEState *ide_state = >port.ifs[0];
 
 trace_ahci_cmd_done(ad->hba, ad->port_no);
 
@@ -1543,7 +1545,8 @@ static void ahci_cmd_done(const IDEDMA *dma)
  */
 ahci_write_fis_d2h(ad, true);
 
-if (ad->port_regs.cmd_issue && !ad->check_bh) {
+if (!(ide_state->status & ERR_STAT) &&
+ad->port_regs.cmd_issue && !ad->check_bh) {
 ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
>mem_reentrancy_guard);
 qemu_bh_schedule(ad->check_bh);
-- 
2.40.1

[PATCH v2 3/8] hw/ide/ahci: write D2H FIS on when processing NCQ command

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

The way that BUSY + PxCI is cleared for NCQ (FPDMA QUEUED) commands is
described in SATA 3.5a Gold:

11.15 FPDMA QUEUED command protocol
DFPDMAQ2: ClearInterfaceBsy
"Transmit Register Device to Host FIS with the BSY bit cleared to zero
and the DRQ bit cleared to zero and Interrupt bit cleared to zero to
mark interface ready for the next command."

PxCI is currently cleared by handle_cmd(), but we don't write the D2H
FIS to the FIS Receive Area that actually caused PxCI to be cleared.

Similar to how ahci_pio_transfer() calls ahci_write_fis_pio() with an
additional parameter to write a PIO Setup FIS without raising an IRQ,
add a parameter to ahci_write_fis_d2h() so that ahci_write_fis_d2h()
also can write the FIS to the FIS Receive Area without raising an IRQ.

Change process_ncq_command() to call ahci_write_fis_d2h() without
raising an IRQ (similar to ahci_pio_transfer()), such that the FIS
Receive Area is in sync with the PxTFD shadow register.

E.g. Linux reads status and error fields from the FIS Receive Area
directly, so it is wise to keep the FIS Receive Area and the PxTFD
shadow register in sync.

Signed-off-by: Niklas Cassel 
---
 hw/ide/ahci.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 48d550f633..4b272397fd 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -43,7 +43,7 @@
 static void check_cmd(AHCIState *s, int port);
 static int handle_cmd(AHCIState *s, int port, uint8_t slot);
 static void ahci_reset_port(AHCIState *s, int port);
-static bool ahci_write_fis_d2h(AHCIDevice *ad);
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i);
 static void ahci_init_d2h(AHCIDevice *ad);
 static int ahci_dma_prepare_buf(const IDEDMA *dma, int32_t limit);
 static bool ahci_map_clb_address(AHCIDevice *ad);
@@ -618,7 +618,7 @@ static void ahci_init_d2h(AHCIDevice *ad)
 return;
 }
 
-if (ahci_write_fis_d2h(ad)) {
+if (ahci_write_fis_d2h(ad, true)) {
 ad->init_d2h_sent = true;
 /* We're emulating receiving the first Reg H2D Fis from the device;
  * Update the SIG register, but otherwise proceed as normal. */
@@ -850,7 +850,7 @@ static void ahci_write_fis_pio(AHCIDevice *ad, uint16_t 
len, bool pio_fis_i)
 }
 }
 
-static bool ahci_write_fis_d2h(AHCIDevice *ad)
+static bool ahci_write_fis_d2h(AHCIDevice *ad, bool d2h_fis_i)
 {
 AHCIPortRegs *pr = >port_regs;
 uint8_t *d2h_fis;
@@ -864,7 +864,7 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
 d2h_fis = >res_fis[RES_FIS_RFIS];
 
 d2h_fis[0] = SATA_FIS_TYPE_REGISTER_D2H;
-d2h_fis[1] = (1 << 6); /* interrupt bit */
+d2h_fis[1] = d2h_fis_i ? (1 << 6) : 0; /* interrupt bit */
 d2h_fis[2] = s->status;
 d2h_fis[3] = s->error;
 
@@ -890,7 +890,10 @@ static bool ahci_write_fis_d2h(AHCIDevice *ad)
 ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_TFES);
 }
 
-ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+if (d2h_fis_i) {
+ahci_trigger_irq(ad->hba, ad, AHCI_PORT_IRQ_BIT_DHRS);
+}
+
 return true;
 }
 
@@ -1120,6 +1123,8 @@ static void process_ncq_command(AHCIState *s, int port, 
const uint8_t *cmd_fis,
 return;
 }
 
+ahci_write_fis_d2h(ad, false);
+
 ncq_tfs->used = 1;
 ncq_tfs->drive = ad;
 ncq_tfs->slot = slot;
@@ -1506,7 +1511,7 @@ static void ahci_cmd_done(const IDEDMA *dma)
 }
 
 /* update d2h status */
-ahci_write_fis_d2h(ad);
+ahci_write_fis_d2h(ad, true);
 
 if (ad->port_regs.cmd_issue && !ad->check_bh) {
 ad->check_bh = qemu_bh_new_guarded(ahci_check_cmd_bh, ad,
-- 
2.40.1

[PATCH v2 2/8] hw/ide/core: set ERR_STAT in unsupported command completion

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

Currently, the first time sending an unsupported command
(e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
Sending the unsupported command again, will correctly have ERR_STAT set.

When ide_cmd_permitted() returns false, it calls ide_abort_command().
ide_abort_command() first calls ide_transfer_stop(), which will call
ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
sets ERR_STAT in status.

ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
current status in the FIS, and raises an IRQ. (The status here will not
have ERR_STAT set!).

Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
ide_transfer_stop() will result in the FIS being written and an IRQ
being raised.

The reason why it works the second time, is that ERR_STAT will still
be set from the previous command, so when writing the FIS, the
completion will correctly have ERR_STAT set.

Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
raise an error IRQ correctly when receiving an unsupported command.

Signed-off-by: Niklas Cassel 
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index de48ff9f86..07971c0218 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -533,9 +533,9 @@ BlockAIOCB *ide_issue_trim(
 
 void ide_abort_command(IDEState *s)
 {
-ide_transfer_stop(s);
 s->status = READY_STAT | ERR_STAT;
 s->error = ABRT_ERR;
+ide_transfer_stop(s);
 }
 
 static void ide_set_retry(IDEState *s)
-- 
2.40.1

[PATCH v2 1/8] hw/ide/ahci: remove stray backslash

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

This backslash obviously does not belong here, so remove it.

Signed-off-by: Niklas Cassel 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: John Snow 
---
 hw/ide/ahci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4e76d6b191..48d550f633 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -690,7 +690,7 @@ static void ahci_reset_port(AHCIState *s, int port)
 
 s->dev[port].port_state = STATE_RUN;
 if (ide_state->drive_kind == IDE_CD) {
-ahci_set_signature(d, SATA_SIGNATURE_CDROM);\
+ahci_set_signature(d, SATA_SIGNATURE_CDROM);
 ide_state->status = SEEK_STAT | WRERR_STAT | READY_STAT;
 } else {
 ahci_set_signature(d, SATA_SIGNATURE_DISK);
-- 
2.40.1

[PATCH v2 0/8] misc AHCI cleanups

2023-06-01 Thread Niklas Cassel

From: Niklas Cassel 

Hello John,

Here comes some misc AHCI cleanups.

Most are related to error handling.

Please review.

(I'm also working on a second series which will add support for
READ LOG EXT and READ LOG DMA EXT, but I will send that one out
once it is ready. (It might take a couple of weeks still, since
I've been a bit busy lately.))


Changes since v1:
-Picked up Reviewed-by tags.
 (I did not convert your ACK to explicit Acked-by tags, since I assume
 that the patches will go via your tree).
-Rebased on master in order to fix a conflict in patch
 "hw/ide/ahci: simplify and document PxCI handling".
-Dropped patch "hw/ide/ahci: trigger either error IRQ or regular IRQ, not both"
 for now, as it caused a boot time regression in SeaBIOS.
 This appears to be a bug in SeaBIOS, for more info see:
 
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/RIHV3FZ4EVMAJA4TEDPASKNYV7V72O4C/
 I will resend the QEMU patch separately once the SeaBIOS patch has been
 merged, and once QEMU has updated to a SeaBIOS tag that includes the fix.


Kind regards,
Niklas

Niklas Cassel (8):
  hw/ide/ahci: remove stray backslash
  hw/ide/core: set ERR_STAT in unsupported command completion
  hw/ide/ahci: write D2H FIS on when processing NCQ command
  hw/ide/ahci: simplify and document PxCI handling
  hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
  hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
  hw/ide/ahci: fix ahci_write_fis_sdb()
  hw/ide/ahci: fix broken SError handling

 hw/ide/ahci.c | 112 +++---
 hw/ide/core.c |   2 +-
 2 files changed, 81 insertions(+), 33 deletions(-)

-- 
2.40.1

Re: [PATCH v2 0/3] block: remove separate bdrv_file_open callback

2023-06-01 Thread Eric Blake

On Thu, Jun 01, 2023 at 01:51:36PM +0200, Paolo Bonzini wrote:
> The value of the bdrv_file_open is sometimes checked to distinguish
> protocol and format drivers, but apart from that there is no difference
> between bdrv_file_open and bdrv_open.
> 
> However, they can all be distinguished by the non-NULL .protocol_name
> member.  Change the checks to use .protocol_name instead of .bdrv_file_open,
> and unify the two callbacks.
> 
> Paolo
> 
> v1->v2: fix bisectability
> 
> Paolo Bonzini (3):
>   block: make assertion more generic
>   block: do not check bdrv_file_open
>   block: remove separate bdrv_file_open callback

Series:
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Libguestfs] [libnbd PATCH v3 06/22] states: Break deadlock if server goofs on extended replies

2023-06-01 Thread Laszlo Ersek

On 5/25/23 15:00, Eric Blake wrote:
> One of the benefits of extended replies is that we can do a
> fixed-length read for the entire header of every server reply, which
> is fewer syscalls than the split-read approach required by structured
> replies.

(Totally tangential comment: recvmsg() could scatter the incoming
traffic into non-contiguous fields of a non-packed structure. But I
don't know if TLS has anything similar. The current "linear receive"
approach is probably the least demanding of the underlying socket
abstractions. At the same time it requires us to use packed structs,
and/or multiple syscalls.)

> But one of the drawbacks of doing a large read is that if
> the server is non-compliant (not a problem for normal servers, but
> something I hit rather more than I'd like to admit while developing
> extended header support in servers),

Haha, so this is where it's coming from! :) I can totally relate.

> nbd_pwrite() and friends will
> deadlock if the server replies with the wrong header.  Add in some
> code to catch that failure mode and move the state machine to DEAD
> sooner, to make it easier to diagnose the fault in the server.
> 
> Unlike in the case of an unexpected simply reply from a structured

(1) s/simply/simple/

> server (where we never over-read, and can therefore

(2) At this point my English parser gets thrown off:

> commit b31e7bac
> can merely fail the command with EPROTO and successfully move on to
> the next server reply),

resync here:

> in this case we really do have to move to
> DEAD: in addition to having already read the 16 or 20 bytes that the
> server sent in its (short) reply for this command, we may have already
> read the initial bytes of the server's next reply, but we have no way
> to push those extra bytes back onto our read stream for parsing on our
> next pass through the state machine.
> 
> Signed-off-by: Eric Blake 
> ---
>  generator/states-reply.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/generator/states-reply.c b/generator/states-reply.c
> index 4e9f2dde..d4710d91 100644
> --- a/generator/states-reply.c
> +++ b/generator/states-reply.c
> @@ -109,7 +109,28 @@  REPLY.START:
>   REPLY.RECV_REPLY:
>switch (recv_into_rbuf (h)) {
>case -1: SET_NEXT_STATE (%.DEAD); return 0;
> -  case 1: SET_NEXT_STATE (%.READY); return 0;
> +  case 1: SET_NEXT_STATE (%.READY);
> +/* Special case: if we have a short read, but got at least far
> + * enough to decode the magic number, we can check if the server
> + * is matching our expectations. This lets us avoid deadlocking if
> + * a buggy server sends only 16 bytes of a simple reply, and is
> + * waiting for our next command, while we are blocked waiting for
> + * the server to send 32 bytes of an extended reply.
> + */

(4) Slight inconsistency between commit message and code comment: the
former mentions "20 bytes", but the latter doesn't mention "structured
reply".

> +if (h->extended_headers &&
> +(char *)h->rbuf >= (char *)>sbuf.reply.hdr.extended.flags) {

(.) I wonder if (address-of-magic + size-of magic) might be more
readable / greppable. Just in case.

Feel free to ignore.

> +  uint32_t magic = be32toh (h->sbuf.reply.hdr.extended.magic);
> +  if (magic != NBD_EXTENDED_REPLY_MAGIC) {
> +SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
> +set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
> +#if 0 /* uncomment to see desynchronized data */
> +nbd_internal_hexdump (>sbuf.reply.hdr.extended.flags,
> +  sizeof (h->sbuf.reply.hdr.extended.flags),
> +  stderr);
> +#endif
> +  }
> +}
> +return 0;

(5) This could be factored out to a helper function, to share the
"invalid:" label logic with the previous patch.

(6) Commencing a dump from "flags" onward looks questionable. At this
point, the "flags" field need not to be complete, or even started -- all
we know is that the "magic" field *before" "flags" is complete.

I think we should dump "h->sbuf.reply.hdr.simple", like in patch#5.

(.) Side comment (so no bullet number assigned): because we can take
multiple iterations on RECV_REPLY, we may end up checking the "magic"
field multiple times. Not very costly, just something to be aware of.

(7) Assume that we have a short read first, but then complete
REPLY.RECV_REPLY successfully, and move to
REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY.

Then, the following condition will have been caught (excluded) under
RECV_REPLY:

  (extended_headers && magic != NBD_EXTENDED_REPLY_MAGIC) [1]

Consequently, the same condition will never hold in
REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY.

Now consider REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, from the previous
patch:

   magic = be32toh (h->sbuf.reply.hdr.simple.magic);
   switch (magic) {
   case NBD_SIMPLE_REPLY_MAGIC:
 if (h->extended_headers)

Re: [PATCH v2 2/4] block: complete public block status API

2023-06-01 Thread Eric Blake

On Thu, Jun 01, 2023 at 01:51:29PM +0200, Paolo Bonzini wrote:
> Include both coroutine and non-coroutine versions, the latter being
> co_wrapper_mixed_bdrv_rdlock of the former.
> 
> Reviewed-by: Eric Blake 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c   | 18 +-
>  include/block/block-io.h | 18 --
>  2 files changed, 17 insertions(+), 19 deletions(-)

Always funny to see adding functions by reducing line count, thanks to
the wrapper generator.

> +++ b/include/block/block-io.h
> @@ -128,17 +128,23 @@ int coroutine_fn GRAPH_RDLOCK 
> bdrv_co_zone_append(BlockDriverState *bs,
>BdrvRequestFlags flags);
>  
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
> -int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> -  int64_t bytes, int64_t *pnum, int64_t *map,
> -  BlockDriverState **file);
> +
> +int coroutine_fn GRAPH_RDLOCK
> +bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
> + int64_t bytes, int64_t *pnum,
> + int64_t *map, BlockDriverState **file);

Given that you line-wrapped this one at the function name,

> +int co_wrapper_mixed_bdrv_rdlock bdrv_block_status(BlockDriverState *bs, 
> int64_t offset,
> +   int64_t bytes, int64_t 
> *pnum,
> +   int64_t *map, 
> BlockDriverState **file);

shouldn't you do likewise here?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v2 1/4] block: rename the bdrv_co_block_status static function

2023-06-01 Thread Eric Blake

On Thu, Jun 01, 2023 at 01:51:28PM +0200, Paolo Bonzini wrote:
> bdrv_block_status exists as a wrapper for bdrv_block_status_above,
> but the name of the (hypothetical) coroutine version, bdrv_co_block_status,
> is squatted by a random static function.  Rename it to bdrv_do_block_status.

bdrv_co_do_block_status

> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)

> @@ -2509,8 +2509,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
>  for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
>   p = bdrv_filter_or_cow_bs(p))
>  {
> -ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
> -   file);
> +ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum, 
> map, file);

Did you intend to rewrap this line at 80 columns at a different parameter?

With those minor fixes,

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 2/9] hw/ide/core: set ERR_STAT in unsupported command completion

2023-06-01 Thread Niklas Cassel

On Wed, May 17, 2023 at 05:12:57PM -0400, John Snow wrote:
> On Fri, Apr 28, 2023 at 9:22 AM Niklas Cassel  wrote:
> >
> > From: Niklas Cassel 
> >
> > Currently, the first time sending an unsupported command
> > (e.g. READ LOG DMA EXT) will not have ERR_STAT set in the completion.
> > Sending the unsupported command again, will correctly have ERR_STAT set.
> >
> > When ide_cmd_permitted() returns false, it calls ide_abort_command().
> > ide_abort_command() first calls ide_transfer_stop(), which will call
> > ide_transfer_halt() and ide_cmd_done(), after that ide_abort_command()
> > sets ERR_STAT in status.
> >
> > ide_cmd_done() for AHCI will call ahci_write_fis_d2h() which writes the
> > current status in the FIS, and raises an IRQ. (The status here will not
> > have ERR_STAT set!).
> >
> > Thus, we cannot call ide_transfer_stop() before setting ERR_STAT, as
> > ide_transfer_stop() will result in the FIS being written and an IRQ
> > being raised.
> >
> > The reason why it works the second time, is that ERR_STAT will still
> > be set from the previous command, so when writing the FIS, the
> > completion will correctly have ERR_STAT set.
> >
> > Set ERR_STAT before writing the FIS (calling cmd_done), so that we will
> > raise an error IRQ correctly when receiving an unsupported command.
> >
> > Signed-off-by: Niklas Cassel 
> > ---
> >  hw/ide/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/ide/core.c b/hw/ide/core.c
> > index 45d14a25e9..c144d1155d 100644
> > --- a/hw/ide/core.c
> > +++ b/hw/ide/core.c
> > @@ -531,9 +531,9 @@ BlockAIOCB *ide_issue_trim(
> >
> >  void ide_abort_command(IDEState *s)
> >  {
> > -ide_transfer_stop(s);
> >  s->status = READY_STAT | ERR_STAT;
> >  s->error = ABRT_ERR;
> > +ide_transfer_stop(s);
> >  }
> >
> >  static void ide_set_retry(IDEState *s)
> > --
> > 2.40.0
> >
> 
> Seems OK at a glance. Does this change the behavior of
> ide_transfer_stop at all? I guess we've been using this order of
> operations since 2008 at least. I didn't know C then.

Hello John,

Not as far as I can see.


Kind regards,
Niklas

Re: [PATCH v3 0/6] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi

On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
>   [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
> 
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
> 
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
> 
> Random read 4KB performance with virtio-blk on a host NVMe block device:
> 
> iodepth   iops   change vs today
> 145612   -4%
> 287967   +2%
> 4   129872   +0%
> 8   171096   -3%
> 16  194508   -4%
> 32  208947   -1%
> 64  217647   +0%
> 128 229629   +0%
> 
> The results are within the noise for these benchmarks. This is to be expected
> because the plugging behavior for a single thread hasn't changed in this patch
> series, only that the state is thread-local now.
> 
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
>   decided against this approach in the end because it's more likely to
>   introduce performance regressions since I/O submission is deferred until the
>   end of the event loop iteration.
> 
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, we see that all plugging approaches are more or less equivalent in 
> this
> benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.
> 
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
> 
> Stefan Hajnoczi (6):
>   block: add blk_io_plug_call() API
>   block/nvme: convert to blk_io_plug_call() API
>   block/blkio: convert to blk_io_plug_call() API
>   block/io_uring: convert to blk_io_plug_call() API
>   block/linux-aio: convert to blk_io_plug_call() API
>   block: remove bdrv_co_io_plug() API
> 
>  MAINTAINERS   |   1 +
>  include/block/block-io.h  |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h   |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c |  43 
>  block/block-backend.c |  22 -
>  block/file-posix.c|  38 ---
>  block/io.c|  37 ---
>  block/io_uring.c  |  44 -
>  block/linux-aio.c |  41 +++-
>  block/nvme.c  |  44 +++--
>  block/plug.c  | 159 ++
>  hw/block/dataplane/xen-block.c|   8 +-
>  hw/block/virtio-blk.c |   4 +-
>  hw/scsi/virtio-scsi.c |   6 +-
>  block/meson.build |   1 +
>  block/trace-events|   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
>  create mode 100644 block/plug.c
> 
> -- 
> 2.40.1
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature

Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies

2023-06-01 Thread Eric Blake

On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:
> On 5/25/23 15:00, Eric Blake wrote:
> > Support receiving headers for 64-bit replies if extended headers were
> > negotiated.  We already insist that the server not send us too much
> > payload in one reply, so we can exploit that and merge the 64-bit
> > length back into a normalized 32-bit field for the rest of the payload
> > length calculations.  The NBD protocol specifically documents that
> > extended mode takes precedence over structured replies, and that there
> > are no simple replies in extended mode.  We can also take advantage
> > that the handle field is in the same offset in all the various reply
> > types.
> 
> (1) We might want to replace "handle" with "cookie" at this point.

Yep, I'm already rebasing the series to do that, now that c1df4df9
landed.

> 
> >
> > Note that if we negotiate extended headers, but a non-compliant server
> > replies with a non-extended header, this patch will stall waiting for
> > the server to send more bytes
> 
> Ah, yes. If we negotiate extended headers, we set "rlen" to "sizeof
> extended" in REPLY.START, which is larger than either of "sizeof simple"
> and "sizeof structured". Therefore we'll get stuck in REPLY.RECV_REPLY.
> Correct?

Yes.

> 
> > rather than noticing that the magic
> > number is wrong (for aio operations, you'll get a magic number
> > mismatch once you send a second command that elicits a reply; but for
> > blocking operations, we basically deadlock).  The easy alternative
> > would be to read just the first 4 bytes of magic, then determine how
> > many more bytes to expect; but that would require more states and
> > syscalls, and not worth it since the typical server will be compliant.
> 
> Not liking this "not worth it" argument. But...
> 
> > The other alternative is what the next patch implements: teaching
> > REPLY.RECV_REPLY to handle short reads that were at least long enough
> > to transmit magic to specifically look for magic number mismatch.
> 
> ... that sounds OK!

If you haven't guessed already, I actually did hit this bug (my
initial qemu patches sent the wrong reply type, and it took me several
minutes to figure out why libnbd was hung), so I was also pleasantly
surprised at how easy the next patch turned out to be, after first
trying (and failing) to go down the rabbit hole mentioned here of
adding yet more states.

But, as you mentioned earlier in the series, it MAY be worth
rearranging states anyways, to specifically route extended header
parsing differently from structured header parsing.  So I'm still
playing with that, and seeing what may happen to this patch as
fallout.

> 
> >
> > At this point, h->extended_headers is permanently false (we can't
> > enable it until all other aspects of the protocol have likewise been
> > converted).
> >
> > Signed-off-by: Eric Blake 
> > ---
> >  lib/internal.h  |  1 +
> >  generator/states-reply-structured.c | 52 +++--
> >  generator/states-reply.c| 34 ---
> >  3 files changed, 58 insertions(+), 29 deletions(-)
> 
> Probably best to reorder the files in this patch for review:

I see what you mean: because of the state hierarchy, it is probably
worth tweaking the git orderfile to favor files nearer the top of the
state tree first, rather than strict alphabetical ordering of *.c.  I
may just push a patch for that without needing review, as it doesn't
affect library semantics at all.

...
> > +++ b/generator/states-reply.c
> > @@ -62,15 +62,19 @@  REPLY.START:
> > */
> >ssize_t r;
> >
> > -  /* We read all replies initially as if they are simple replies, but
> > -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> > -   * This works because the structured_reply header is larger.
> > +  /* With extended headers, there is only one size to read, so we can do it
> > +   * all in one syscall.  But for older structured replies, we don't know 
> > if
> > +   * we have a simple or structured reply until we read the magic number,
> > +   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> > */
> >assert (h->reply_cmd == NULL);
> >assert (h->rlen == 0);
> >
> >h->rbuf = >sbuf.reply.hdr;
> > -  h->rlen = sizeof h->sbuf.reply.hdr.simple;
> > +  if (h->extended_headers)
> > +h->rlen = sizeof h->sbuf.reply.hdr.extended;
> > +  else
> > +h->rlen = sizeof h->sbuf.reply.hdr.simple;
> >
> >r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
> >if (r == -1) {
> 
> The comment "This works because the structured_reply header is larger"
> is being removed, which makes the non-ext branch even less
> comprehensible than before.
> 
> (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is
> smaller than "sizeof structured"?

Yep, I'm already rebasing onto some additional asserts along those
lines, based on your reviews earlier in the series.

> 
> > @@ -116,16 +120,22 @@

Re: [PATCH v3 7/7] hw/ide/piix: Move registration of VMStateDescription to DeviceClass

2023-06-01 Thread Mark Cave-Ayland


On 31/05/2023 22:10, Bernhard Beschow wrote:


The modern, declarative way to set up VM state handling is to assign to
DeviceClass::vmsd attribute.

There shouldn't be any change in behavior since dc->vmsd causes
vmstate_register_with_alias_id() to be called on the instance during
the instance init phase. vmstate_register() was also called during the
instance init phase which forwards to vmstate_register_with_alias_id()
internally. Checking the migration schema before and after this patch confirms:

before:

qemu-system-x86_64 -S
qemu > migrate -d exec:cat>before.mig


after:

qemu-system-x86_64 -S
qemu > migrate -d exec:cat>after.mig



analyze-migration.py -d desc -f before.mig > before.json
analyze-migration.py -d desc -f after.mig > after.json
diff before.json after.json

-> empty

Signed-off-by: Bernhard Beschow 
---
  hw/ide/piix.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 47e0b474c3..151f206046 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -28,7 +28,6 @@
   */
  
  #include "qemu/osdep.h"

-#include "migration/vmstate.h"
  #include "qapi/error.h"
  #include "hw/pci/pci.h"
  #include "hw/ide/piix.h"
@@ -159,8 +158,6 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
**errp)
  bmdma_setup_bar(d);
  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, >bmdma_bar);
  
-vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d);

-
  for (unsigned i = 0; i < 2; i++) {
  if (!pci_piix_init_bus(d, i, errp)) {
  return;
@@ -186,6 +183,7 @@ static void piix3_ide_class_init(ObjectClass *klass, void 
*data)
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
  dc->reset = piix_ide_reset;

+dc->vmsd = _ide_pci;
  k->realize = pci_piix_ide_realize;
  k->exit = pci_piix_ide_exitfn;
  k->vendor_id = PCI_VENDOR_ID_INTEL;
@@ -208,6 +206,7 @@ static void piix4_ide_class_init(ObjectClass *klass, void 
*data)
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
  dc->reset = piix_ide_reset;

+dc->vmsd = _ide_pci;
  k->realize = pci_piix_ide_realize;
  k->exit = pci_piix_ide_exitfn;
  k->vendor_id = PCI_VENDOR_ID_INTEL;


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.

Re: [PATCH v3 6/7] hw/ide/pci: Replace some magic numbers by constants

2023-06-01 Thread Mark Cave-Ayland


On 31/05/2023 22:10, Bernhard Beschow wrote:


Signed-off-by: Bernhard Beschow 
---
  hw/ide/pci.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 0b26a4ce9f..a25b352537 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -320,7 +320,8 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
  
  void bmdma_status_writeb(BMDMAState *bm, uint32_t val)

  {
-bm->status = (val & 0x60) | (bm->status & 1) | (bm->status & ~val & 0x06);
+bm->status = (val & 0x60) | (bm->status & BM_STATUS_DMAING)
+ | (bm->status & ~val & (BM_STATUS_ERROR | BM_STATUS_INT));
  }
  
  static uint64_t bmdma_addr_read(void *opaque, hwaddr addr,


Reviewed-by: Mark Cave-Ayland 


ATB,

Mark.

[PATCH 09/12] vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/vhdx-log.c | 36 +---
 block/vhdx.c | 73 +++-
 block/vhdx.h |  5 ++--
 3 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/block/vhdx-log.c b/block/vhdx-log.c
index 38148f107a97..cf36d2b3a346 100644
--- a/block/vhdx-log.c
+++ b/block/vhdx-log.c
@@ -169,9 +169,10 @@ exit:
  * It is assumed that 'buffer' is at least 4096*num_sectors large.
  *
  * 0 is returned on success, -errno otherwise */
-static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
-  uint32_t *sectors_written, void *buffer,
-  uint32_t num_sectors)
+static int coroutine_fn GRAPH_RDLOCK
+vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
+   uint32_t *sectors_written, void *buffer,
+   uint32_t num_sectors)
 {
 int ret = 0;
 uint64_t offset;
@@ -195,8 +196,7 @@ static int vhdx_log_write_sectors(BlockDriverState *bs, 
VHDXLogEntries *log,
 /* full */
 break;
 }
-ret = bdrv_pwrite(bs->file, offset, VHDX_LOG_SECTOR_SIZE, buffer_tmp,
-  0);
+ret = bdrv_co_pwrite(bs->file, offset, VHDX_LOG_SECTOR_SIZE, 
buffer_tmp, 0);
 if (ret < 0) {
 goto exit;
 }
@@ -853,8 +853,9 @@ static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor 
*desc,
 }
 
 
-static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
-  void *data, uint32_t length, uint64_t offset)
+static int coroutine_fn GRAPH_RDLOCK
+vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
+   void *data, uint32_t length, uint64_t offset)
 {
 int ret = 0;
 void *buffer = NULL;
@@ -924,7 +925,7 @@ static int vhdx_log_write(BlockDriverState *bs, 
BDRVVHDXState *s,
 
 sectors += partial_sectors;
 
-file_length = bdrv_getlength(bs->file->bs);
+file_length = bdrv_co_getlength(bs->file->bs);
 if (file_length < 0) {
 ret = file_length;
 goto exit;
@@ -971,8 +972,8 @@ static int vhdx_log_write(BlockDriverState *bs, 
BDRVVHDXState *s,
 
 if (i == 0 && leading_length) {
 /* partial sector at the front of the buffer */
-ret = bdrv_pread(bs->file, file_offset, VHDX_LOG_SECTOR_SIZE,
- merged_sector, 0);
+ret = bdrv_co_pread(bs->file, file_offset, VHDX_LOG_SECTOR_SIZE,
+merged_sector, 0);
 if (ret < 0) {
 goto exit;
 }
@@ -981,9 +982,9 @@ static int vhdx_log_write(BlockDriverState *bs, 
BDRVVHDXState *s,
 sector_write = merged_sector;
 } else if (i == sectors - 1 && trailing_length) {
 /* partial sector at the end of the buffer */
-ret = bdrv_pread(bs->file, file_offset + trailing_length,
- VHDX_LOG_SECTOR_SIZE - trailing_length,
- merged_sector + trailing_length, 0);
+ret = bdrv_co_pread(bs->file, file_offset + trailing_length,
+VHDX_LOG_SECTOR_SIZE - trailing_length,
+merged_sector + trailing_length, 0);
 if (ret < 0) {
 goto exit;
 }
@@ -1036,8 +1037,9 @@ exit:
 }
 
 /* Perform a log write, and then immediately flush the entire log */
-int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
- void *data, uint32_t length, uint64_t offset)
+int coroutine_fn GRAPH_RDLOCK
+vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
+ void *data, uint32_t length, uint64_t offset)
 {
 int ret = 0;
 VHDXLogSequence logs = { .valid = true,
@@ -1047,7 +1049,7 @@ int vhdx_log_write_and_flush(BlockDriverState *bs, 
BDRVVHDXState *s,
 
 /* Make sure data written (new and/or changed blocks) is stable
  * on disk, before creating log entry */
-ret = bdrv_flush(bs);
+ret = bdrv_co_flush(bs);
 if (ret < 0) {
 goto exit;
 }
@@ -1059,7 +1061,7 @@ int vhdx_log_write_and_flush(BlockDriverState *bs, 
BDRVVHDXState *s,
 logs.log = s->log;
 
 /* Make sure log is stable on disk */
-ret = bdrv_flush(bs);
+ret = bdrv_co_flush(bs);
 if (ret < 0) {
 goto exit;
 }
diff --git a/block/vhdx.c b/block/vhdx.c
index 798b6aacf419..45a11255b306 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1250,12 +1250,13 @@ exit:
  *
  * Returns the file offset start of the new payload block
  */
-static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
-

[PATCH 12/12] block: use bdrv_co_debug_event in coroutine context

2023-06-01 Thread Paolo Bonzini

bdrv_co_debug_event was recently introduced, with bdrv_debug_event
becoming a wrapper for use in unknown context.  Because most of the
time bdrv_debug_event is used on a BdrvChild via the wrapper macro
BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls
bdrv_co_debug_event, and switch whenever possible.

Signed-off-by: Paolo Bonzini 
---
 block/io.c   |  4 ++--
 block/qcow.c | 24 
 block/qcow2-cluster.c| 12 ++--
 block/qcow2-refcount.c   |  4 ++--
 block/qcow2.c| 18 +-
 block/qed-table.c|  6 +++---
 block/qed.c  |  8 
 block/raw-format.c   |  4 ++--
 block/vmdk.c | 24 
 include/block/block-io.h |  7 +++
 10 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/block/io.c b/block/io.c
index f537421ef523..e48b7454cfd8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2827,7 +2827,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 }
 
 /* Write back cached data to the OS even with cache=unsafe */
-BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_OS);
+BLKDBG_CO_EVENT(primary_child, BLKDBG_FLUSH_TO_OS);
 if (bs->drv->bdrv_co_flush_to_os) {
 ret = bs->drv->bdrv_co_flush_to_os(bs);
 if (ret < 0) {
@@ -2845,7 +2845,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 goto flush_children;
 }
 
-BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
+BLKDBG_CO_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
 if (!bs->drv) {
 /* bs->drv->bdrv_co_flush() might have ejected the BDS
  * (even in case of apparent success) */
diff --git a/block/qcow.c b/block/qcow.c
index 4aee835e8c36..94aba110fd32 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -379,7 +379,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 /* update the L1 entry */
 s->l1_table[l1_index] = l2_offset;
 tmp = cpu_to_be64(l2_offset);
-BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_L1_UPDATE);
 ret = bdrv_co_pwrite_sync(bs->file,
   s->l1_table_offset + l1_index * sizeof(tmp),
   sizeof(tmp), , 0);
@@ -410,7 +410,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 }
 }
 l2_table = s->l2_cache + (min_index << s->l2_bits);
-BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_L2_LOAD);
 if (new_l2_table) {
 memset(l2_table, 0, s->l2_size * sizeof(uint64_t));
 ret = bdrv_co_pwrite_sync(bs->file, l2_offset,
@@ -434,7 +434,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 ((cluster_offset & QCOW_OFLAG_COMPRESSED) && allocate == 1)) {
 if (!allocate)
 return 0;
-BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC);
 assert(QEMU_IS_ALIGNED(n_start | n_end, BDRV_SECTOR_SIZE));
 /* allocate a new cluster */
 if ((cluster_offset & QCOW_OFLAG_COMPRESSED) &&
@@ -451,7 +451,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 }
 cluster_offset = QEMU_ALIGN_UP(cluster_offset, s->cluster_size);
 /* write the cluster content */
-BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_WRITE_AIO);
 ret = bdrv_co_pwrite(bs->file, cluster_offset, s->cluster_size,
  s->cluster_cache, 0);
 if (ret < 0) {
@@ -491,7 +491,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
   NULL) < 0) {
 return -EIO;
 }
-BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_WRITE_AIO);
 ret = bdrv_co_pwrite(bs->file, cluster_offset + i,
  BDRV_SECTOR_SIZE,
  s->cluster_data, 0);
@@ -510,9 +510,9 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 tmp = cpu_to_be64(cluster_offset);
 l2_table[l2_index] = tmp;
 if (allocate == 2) {
-BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
 } else {
-BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
+BLKDBG_CO_EVENT(bs->file, BLKDBG_L2_UPDATE);
 }
 ret = bdrv_co_pwrite_sync(bs->file, l2_offset + l2_index * sizeof(tmp),
   sizeof(tmp), , 0);
@@ -595,7 +595,7 @@ decompress_cluster(BlockDriverState *bs, uint64_t

[PATCH 04/12] bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/bochs.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index 2f5ae52c908d..66e7a58e5e31 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -203,7 +203,8 @@ static void bochs_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }
 
-static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
+static int64_t coroutine_fn GRAPH_RDLOCK
+seek_to_sector(BlockDriverState *bs, int64_t sector_num)
 {
 BDRVBochsState *s = bs->opaque;
 uint64_t offset = sector_num * 512;
@@ -224,8 +225,8 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t 
sector_num)
 (s->extent_blocks + s->bitmap_blocks));
 
 /* read in bitmap for current extent */
-ret = bdrv_pread(bs->file, bitmap_offset + (extent_offset / 8), 1,
- _entry, 0);
+ret = bdrv_co_pread(bs->file, bitmap_offset + (extent_offset / 8), 1,
+_entry, 0);
 if (ret < 0) {
 return ret;
 }
-- 
2.40.1

[PATCH 08/12] vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/vmdk.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index a864069a5a1f..419868a42ae2 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -339,7 +339,8 @@ out:
 return ret;
 }
 
-static int vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
+static int coroutine_fn GRAPH_RDLOCK
+vmdk_write_cid(BlockDriverState *bs, uint32_t cid)
 {
 char *desc, *tmp_desc;
 char *p_name, *tmp_str;
@@ -348,7 +349,7 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t 
cid)
 
 desc = g_malloc0(DESC_SIZE);
 tmp_desc = g_malloc0(DESC_SIZE);
-ret = bdrv_pread(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
+ret = bdrv_co_pread(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 if (ret < 0) {
 goto out;
 }
@@ -368,7 +369,7 @@ static int vmdk_write_cid(BlockDriverState *bs, uint32_t 
cid)
 pstrcat(desc, DESC_SIZE, tmp_desc);
 }
 
-ret = bdrv_pwrite_sync(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
+ret = bdrv_co_pwrite_sync(bs->file, s->desc_offset, DESC_SIZE, desc, 0);
 
 out:
 g_free(desc);
@@ -2165,7 +2166,7 @@ vmdk_co_pwrite_zeroes(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 return ret;
 }
 
-static int GRAPH_UNLOCKED
+static int coroutine_fn GRAPH_UNLOCKED
 vmdk_init_extent(BlockBackend *blk, int64_t filesize, bool flat, bool compress,
  bool zeroed_grain, Error **errp)
 {
@@ -2176,7 +2177,7 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, 
bool flat, bool compress,
 int gd_buf_size;
 
 if (flat) {
-ret = blk_truncate(blk, filesize, false, PREALLOC_MODE_OFF, 0, errp);
+ret = blk_co_truncate(blk, filesize, false, PREALLOC_MODE_OFF, 0, 
errp);
 goto exit;
 }
 magic = cpu_to_be32(VMDK4_MAGIC);
@@ -2228,19 +2229,19 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, 
bool flat, bool compress,
 header.check_bytes[3] = 0xa;
 
 /* write all the data */
-ret = blk_pwrite(blk, 0, sizeof(magic), , 0);
+ret = blk_co_pwrite(blk, 0, sizeof(magic), , 0);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
-ret = blk_pwrite(blk, sizeof(magic), sizeof(header), , 0);
+ret = blk_co_pwrite(blk, sizeof(magic), sizeof(header), , 0);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
 
-ret = blk_truncate(blk, le64_to_cpu(header.grain_offset) << 9, false,
-   PREALLOC_MODE_OFF, 0, errp);
+ret = blk_co_truncate(blk, le64_to_cpu(header.grain_offset) << 9, false,
+  PREALLOC_MODE_OFF, 0, errp);
 if (ret < 0) {
 goto exit;
 }
@@ -2252,8 +2253,8 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, 
bool flat, bool compress,
  i < gt_count; i++, tmp += gt_size) {
 gd_buf[i] = cpu_to_le32(tmp);
 }
-ret = blk_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
- gd_buf_size, gd_buf, 0);
+ret = blk_co_pwrite(blk, le64_to_cpu(header.rgd_offset) * BDRV_SECTOR_SIZE,
+gd_buf_size, gd_buf, 0);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
@@ -2264,8 +2265,8 @@ vmdk_init_extent(BlockBackend *blk, int64_t filesize, 
bool flat, bool compress,
  i < gt_count; i++, tmp += gt_size) {
 gd_buf[i] = cpu_to_le32(tmp);
 }
-ret = blk_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
- gd_buf_size, gd_buf, 0);
+ret = blk_co_pwrite(blk, le64_to_cpu(header.gd_offset) * BDRV_SECTOR_SIZE,
+gd_buf_size, gd_buf, 0);
 if (ret < 0) {
 error_setg(errp, QERR_IO_ERROR);
 }
-- 
2.40.1

[PATCH 10/12] qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/qcow2-bitmap.c   |  26 +
 block/qcow2-cluster.c  |  12 ++--
 block/qcow2-refcount.c | 130 +
 block/qcow2.c  |   2 +-
 block/qcow2.h  |  33 +--
 5 files changed, 105 insertions(+), 98 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index a952fd58d85e..037fa2d4351a 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -283,10 +283,9 @@ static int free_bitmap_clusters(BlockDriverState *bs, 
Qcow2BitmapTable *tb)
 /* load_bitmap_data
  * @bitmap_table entries must satisfy specification constraints.
  * @bitmap must be cleared */
-static int load_bitmap_data(BlockDriverState *bs,
-const uint64_t *bitmap_table,
-uint32_t bitmap_table_size,
-BdrvDirtyBitmap *bitmap)
+static int coroutine_fn GRAPH_RDLOCK
+load_bitmap_data(BlockDriverState *bs, const uint64_t *bitmap_table,
+ uint32_t bitmap_table_size, BdrvDirtyBitmap *bitmap)
 {
 int ret = 0;
 BDRVQcow2State *s = bs->opaque;
@@ -319,7 +318,7 @@ static int load_bitmap_data(BlockDriverState *bs,
  * already cleared */
 }
 } else {
-ret = bdrv_pread(bs->file, data_offset, s->cluster_size, buf, 0);
+ret = bdrv_co_pread(bs->file, data_offset, s->cluster_size, buf, 
0);
 if (ret < 0) {
 goto finish;
 }
@@ -337,8 +336,9 @@ finish:
 return ret;
 }
 
-static BdrvDirtyBitmap *load_bitmap(BlockDriverState *bs,
-Qcow2Bitmap *bm, Error **errp)
+static coroutine_fn GRAPH_RDLOCK
+BdrvDirtyBitmap *load_bitmap(BlockDriverState *bs,
+ Qcow2Bitmap *bm, Error **errp)
 {
 int ret;
 uint64_t *bitmap_table = NULL;
@@ -649,9 +649,10 @@ fail:
 return NULL;
 }
 
-int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-  void **refcount_table,
-  int64_t *refcount_table_size)
+int coroutine_fn
+qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+  void **refcount_table,
+  int64_t *refcount_table_size)
 {
 int ret;
 BDRVQcow2State *s = bs->opaque;
@@ -957,8 +958,9 @@ static void set_readonly_helper(gpointer bitmap, gpointer 
value)
  * If header_updated is not NULL then it is set appropriately regardless of
  * the return value.
  */
-bool coroutine_fn qcow2_load_dirty_bitmaps(BlockDriverState *bs,
-   bool *header_updated, Error **errp)
+bool coroutine_fn GRAPH_RDLOCK
+qcow2_load_dirty_bitmaps(BlockDriverState *bs,
+ bool *header_updated, Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
 Qcow2BitmapList *bm_list;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 39cda7f907ec..427c96c1d039 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -823,10 +823,9 @@ static int get_cluster_table(BlockDriverState *bs, 
uint64_t offset,
  *
  * Return 0 on success and -errno in error cases
  */
-int coroutine_fn qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
-   uint64_t offset,
-   int compressed_size,
-   uint64_t *host_offset)
+int coroutine_fn GRAPH_RDLOCK
+qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs, uint64_t offset,
+  int compressed_size, uint64_t 
*host_offset)
 {
 BDRVQcow2State *s = bs->opaque;
 int l2_index, ret;
@@ -2014,8 +2013,9 @@ fail:
  * all clusters in the same L2 slice) and returns the number of zeroed
  * clusters.
  */
-static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
-uint64_t nb_clusters, int flags)
+static int coroutine_fn
+zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
+ uint64_t nb_clusters, int flags)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t *l2_slice;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 4cf91bd95581..de527c3ee496 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1069,7 +1069,7 @@ int64_t coroutine_fn 
qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offs
 
 /* only used to allocate compressed sectors. We try to allocate
contiguous sectors. size must be <= cluster_size */
-int64_t coroutine_fn qcow2_alloc_bytes(BlockDriverState *bs, int size)
+int64_t coroutine_fn GRAPH_RDLOCK

[PATCH 05/12] block: mark another function as coroutine_fns and GRAPH_UNLOCKED

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Because this function operates on a BlockBackend, mark it
GRAPH_UNLOCKED.

Signed-off-by: Paolo Bonzini 
---
 block.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 113e3d90fd52..98cba7d11a56 100644
--- a/block.c
+++ b/block.c
@@ -555,8 +555,9 @@ int coroutine_fn bdrv_co_create(BlockDriver *drv, const 
char *filename,
  * On success, return @blk's actual length.
  * Otherwise, return -errno.
  */
-static int64_t create_file_fallback_truncate(BlockBackend *blk,
- int64_t minimum_size, Error 
**errp)
+static int64_t coroutine_fn GRAPH_UNLOCKED
+create_file_fallback_truncate(BlockBackend *blk, int64_t minimum_size,
+  Error **errp)
 {
 Error *local_err = NULL;
 int64_t size;
@@ -564,14 +565,14 @@ static int64_t create_file_fallback_truncate(BlockBackend 
*blk,
 
 GLOBAL_STATE_CODE();
 
-ret = blk_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, 0,
-   _err);
+ret = blk_co_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, 0,
+  _err);
 if (ret < 0 && ret != -ENOTSUP) {
 error_propagate(errp, local_err);
 return ret;
 }
 
-size = blk_getlength(blk);
+size = blk_co_getlength(blk);
 if (size < 0) {
 error_free(local_err);
 error_setg_errno(errp, -size,
-- 
2.40.1

[PATCH 07/12] dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/dmg.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 2769900359f8..06a0244a9c16 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -616,7 +616,8 @@ err:
 return s->n_chunks; /* error */
 }
 
-static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
+static int coroutine_fn GRAPH_RDLOCK
+dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
 {
 BDRVDMGState *s = bs->opaque;
 
@@ -633,8 +634,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, 
uint64_t sector_num)
 case UDZO: { /* zlib compressed */
 /* we need to buffer, because only the chunk as whole can be
  * inflated. */
-ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
- s->compressed_chunk, 0);
+ret = bdrv_co_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+s->compressed_chunk, 0);
 if (ret < 0) {
 return -1;
 }
@@ -659,8 +660,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, 
uint64_t sector_num)
 }
 /* we need to buffer, because only the chunk as whole can be
  * inflated. */
-ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
- s->compressed_chunk, 0);
+ret = bdrv_co_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+s->compressed_chunk, 0);
 if (ret < 0) {
 return -1;
 }
@@ -680,8 +681,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, 
uint64_t sector_num)
 }
 /* we need to buffer, because only the chunk as whole can be
  * inflated. */
-ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
- s->compressed_chunk, 0);
+ret = bdrv_co_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+s->compressed_chunk, 0);
 if (ret < 0) {
 return -1;
 }
@@ -696,8 +697,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, 
uint64_t sector_num)
 }
 break;
 case UDRW: /* copy */
-ret = bdrv_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
- s->uncompressed_chunk, 0);
+ret = bdrv_co_pread(bs->file, s->offsets[chunk], s->lengths[chunk],
+s->uncompressed_chunk, 0);
 if (ret < 0) {
 return -1;
 }
@@ -713,7 +714,7 @@ static inline int dmg_read_chunk(BlockDriverState *bs, 
uint64_t sector_num)
 return 0;
 }
 
-static int coroutine_fn
+static int coroutine_fn GRAPH_RDLOCK
 dmg_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
   QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
-- 
2.40.1

[PATCH 03/12] vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/vpc.c | 52 ++--
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index 5af8e2807112..22a160267131 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -486,8 +486,8 @@ static int vpc_reopen_prepare(BDRVReopenState *state,
  * operation (the block bitmaps is updated then), 0 otherwise.
  * If write is true then err must not be NULL.
  */
-static inline int64_t get_image_offset(BlockDriverState *bs, uint64_t offset,
-   bool write, int *err)
+static int64_t coroutine_fn GRAPH_RDLOCK
+get_image_offset(BlockDriverState *bs, uint64_t offset, bool write, int *err)
 {
 BDRVVPCState *s = bs->opaque;
 uint64_t bitmap_offset, block_offset;
@@ -515,8 +515,7 @@ static inline int64_t get_image_offset(BlockDriverState 
*bs, uint64_t offset,
 
 s->last_bitmap_offset = bitmap_offset;
 memset(bitmap, 0xff, s->bitmap_size);
-r = bdrv_pwrite_sync(bs->file, bitmap_offset, s->bitmap_size, bitmap,
- 0);
+r = bdrv_co_pwrite_sync(bs->file, bitmap_offset, s->bitmap_size, 
bitmap, 0);
 if (r < 0) {
 *err = r;
 return -2;
@@ -532,13 +531,13 @@ static inline int64_t get_image_offset(BlockDriverState 
*bs, uint64_t offset,
  *
  * Returns 0 on success and < 0 on error
  */
-static int rewrite_footer(BlockDriverState *bs)
+static int coroutine_fn GRAPH_RDLOCK rewrite_footer(BlockDriverState *bs)
 {
 int ret;
 BDRVVPCState *s = bs->opaque;
 int64_t offset = s->free_data_block_offset;
 
-ret = bdrv_pwrite_sync(bs->file, offset, sizeof(s->footer), >footer, 0);
+ret = bdrv_co_pwrite_sync(bs->file, offset, sizeof(s->footer), >footer, 
0);
 if (ret < 0)
 return ret;
 
@@ -552,7 +551,8 @@ static int rewrite_footer(BlockDriverState *bs)
  *
  * Returns the sectors' offset in the image file on success and < 0 on error
  */
-static int64_t alloc_block(BlockDriverState *bs, int64_t offset)
+static int64_t coroutine_fn GRAPH_RDLOCK
+alloc_block(BlockDriverState *bs, int64_t offset)
 {
 BDRVVPCState *s = bs->opaque;
 int64_t bat_offset;
@@ -572,8 +572,8 @@ static int64_t alloc_block(BlockDriverState *bs, int64_t 
offset)
 
 /* Initialize the block's bitmap */
 memset(bitmap, 0xff, s->bitmap_size);
-ret = bdrv_pwrite_sync(bs->file, s->free_data_block_offset,
-   s->bitmap_size, bitmap, 0);
+ret = bdrv_co_pwrite_sync(bs->file, s->free_data_block_offset,
+  s->bitmap_size, bitmap, 0);
 if (ret < 0) {
 return ret;
 }
@@ -587,7 +587,7 @@ static int64_t alloc_block(BlockDriverState *bs, int64_t 
offset)
 /* Write BAT entry to disk */
 bat_offset = s->bat_offset + (4 * index);
 bat_value = cpu_to_be32(s->pagetable[index]);
-ret = bdrv_pwrite_sync(bs->file, bat_offset, 4, _value, 0);
+ret = bdrv_co_pwrite_sync(bs->file, bat_offset, 4, _value, 0);
 if (ret < 0)
 goto fail;
 
@@ -718,11 +718,11 @@ fail:
 return ret;
 }
 
-static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
-bool want_zero,
-int64_t offset, int64_t bytes,
-int64_t *pnum, int64_t *map,
-BlockDriverState **file)
+static int coroutine_fn GRAPH_RDLOCK
+vpc_co_block_status(BlockDriverState *bs, bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
 BDRVVPCState *s = bs->opaque;
 int64_t image_offset;
@@ -820,8 +820,8 @@ static int calculate_geometry(int64_t total_sectors, 
uint16_t *cyls,
 return 0;
 }
 
-static int create_dynamic_disk(BlockBackend *blk, VHDFooter *footer,
-   int64_t total_sectors)
+static int coroutine_fn create_dynamic_disk(BlockBackend *blk, VHDFooter 
*footer,
+int64_t total_sectors)
 {
 VHDDynDiskHeader dyndisk_header;
 uint8_t bat_sector[512];
@@ -834,13 +834,13 @@ static int create_dynamic_disk(BlockBackend *blk, 
VHDFooter *footer,
 block_size = 0x20;
 num_bat_entries = DIV_ROUND_UP(total_sectors, block_size / 512);
 
-ret = blk_pwrite(blk, offset, sizeof(*footer), footer, 0);
+ret = blk_co_pwrite(blk, offset, sizeof(*footer), footer, 0);
 if (ret < 0) {
 goto fail;
 }
 
 offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
-ret = blk_pwrite(blk, offset, sizeof(*footer), footer, 0);
+ret =

[PATCH v2 2/4] block: complete public block status API

2023-06-01 Thread Paolo Bonzini

Include both coroutine and non-coroutine versions, the latter being
co_wrapper_mixed_bdrv_rdlock of the former.

Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
---
 block/io.c   | 18 +-
 include/block/block-io.h | 18 --
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/block/io.c b/block/io.c
index 2c2c9656853c..806715a5bbe3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2575,21 +2575,13 @@ int coroutine_fn 
bdrv_co_block_status_above(BlockDriverState *bs,
  bytes, pnum, map, file, NULL);
 }
 
-int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
-int64_t offset, int64_t bytes, int64_t *pnum,
-int64_t *map, BlockDriverState **file)
+int coroutine_fn bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
+  int64_t bytes, int64_t *pnum,
+  int64_t *map, BlockDriverState **file)
 {
 IO_CODE();
-return bdrv_common_block_status_above(bs, base, false, true, offset, bytes,
-  pnum, map, file, NULL);
-}
-
-int bdrv_block_status(BlockDriverState *bs, int64_t offset, int64_t bytes,
-  int64_t *pnum, int64_t *map, BlockDriverState **file)
-{
-IO_CODE();
-return bdrv_block_status_above(bs, bdrv_filter_or_cow_bs(bs),
-   offset, bytes, pnum, map, file);
+return bdrv_co_block_status_above(bs, bdrv_filter_or_cow_bs(bs),
+  offset, bytes, pnum, map, file);
 }
 
 /*
diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87b6..ce75ff5ddde8 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -128,17 +128,23 @@ int coroutine_fn GRAPH_RDLOCK 
bdrv_co_zone_append(BlockDriverState *bs,
   BdrvRequestFlags flags);
 
 bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
-int bdrv_block_status(BlockDriverState *bs, int64_t offset,
-  int64_t bytes, int64_t *pnum, int64_t *map,
-  BlockDriverState **file);
+
+int coroutine_fn GRAPH_RDLOCK
+bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file);
+int co_wrapper_mixed_bdrv_rdlock bdrv_block_status(BlockDriverState *bs, 
int64_t offset,
+   int64_t bytes, int64_t 
*pnum,
+   int64_t *map, 
BlockDriverState **file);
 
 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_block_status_above(BlockDriverState *bs, BlockDriverState *base,
int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file);
-int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
-int64_t offset, int64_t bytes, int64_t *pnum,
-int64_t *map, BlockDriverState **file);
+int co_wrapper_mixed_bdrv_rdlock
+bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base,
+int64_t offset, int64_t bytes, int64_t *pnum,
+int64_t *map, BlockDriverState **file);
 
 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
-- 
2.40.1

[PATCH v2 3/4] block: switch to co_wrapper for bdrv_is_allocated_*

2023-06-01 Thread Paolo Bonzini

Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
---
 block/io.c   | 53 ++--
 include/block/block-io.h | 12 +
 2 files changed, 14 insertions(+), 51 deletions(-)

diff --git a/block/io.c b/block/io.c
index 806715a5bbe3..2fae64ad1eb6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2628,45 +2628,6 @@ int coroutine_fn bdrv_co_is_allocated(BlockDriverState 
*bs, int64_t offset,
 return !!(ret & BDRV_BLOCK_ALLOCATED);
 }
 
-int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
-  int64_t *pnum)
-{
-int ret;
-int64_t dummy;
-IO_CODE();
-
-ret = bdrv_common_block_status_above(bs, bs, true, false, offset,
- bytes, pnum ? pnum : , NULL,
- NULL, NULL);
-if (ret < 0) {
-return ret;
-}
-return !!(ret & BDRV_BLOCK_ALLOCATED);
-}
-
-/* See bdrv_is_allocated_above for documentation */
-int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
-BlockDriverState *base,
-bool include_base, int64_t offset,
-int64_t bytes, int64_t *pnum)
-{
-int depth;
-int ret;
-IO_CODE();
-
-ret = bdrv_co_common_block_status_above(top, base, include_base, false,
-offset, bytes, pnum, NULL, NULL,
-);
-if (ret < 0) {
-return ret;
-}
-
-if (ret & BDRV_BLOCK_ALLOCATED) {
-return depth;
-}
-return 0;
-}
-
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
@@ -2684,18 +2645,18 @@ int coroutine_fn 
bdrv_co_is_allocated_above(BlockDriverState *top,
  * words, the result is not necessarily the maximum possible range);
  * but 'pnum' will only be 0 when end of file is reached.
  */
-int bdrv_is_allocated_above(BlockDriverState *top,
-BlockDriverState *base,
-bool include_base, int64_t offset,
-int64_t bytes, int64_t *pnum)
+int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *bs,
+BlockDriverState *base,
+bool include_base, int64_t offset,
+int64_t bytes, int64_t *pnum)
 {
 int depth;
 int ret;
 IO_CODE();
 
-ret = bdrv_common_block_status_above(top, base, include_base, false,
- offset, bytes, pnum, NULL, NULL,
- );
+ret = bdrv_co_common_block_status_above(bs, base, include_base, false,
+offset, bytes, pnum, NULL, NULL,
+);
 if (ret < 0) {
 return ret;
 }
diff --git a/include/block/block-io.h b/include/block/block-io.h
index ce75ff5ddde8..3946bbefc5c2 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -149,16 +149,18 @@ bdrv_block_status_above(BlockDriverState *bs, 
BlockDriverState *base,
 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
  int64_t *pnum);
-int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
-  int64_t *pnum);
+int co_wrapper_mixed_bdrv_rdlock
+bdrv_is_allocated(BlockDriverState *bs, int64_t offset,
+  int64_t bytes, int64_t *pnum);
 
 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
bool include_base, int64_t offset, int64_t bytes,
int64_t *pnum);
-int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-bool include_base, int64_t offset, int64_t bytes,
-int64_t *pnum);
+int co_wrapper_mixed_bdrv_rdlock
+bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,
+bool include_base, int64_t offset,
+int64_t bytes, int64_t *pnum);
 
 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
-- 
2.40.1

[PATCH v2 4/4] block: convert more bdrv_is_allocated* and bdrv_block_status* calls to coroutine versions

2023-06-01 Thread Paolo Bonzini

Reviewed-by: Eric Blake 
Signed-off-by: Paolo Bonzini 
---
 block/copy-before-write.c |  2 +-
 block/copy-on-read.c  |  8 
 block/io.c|  6 +++---
 block/mirror.c| 10 +-
 block/qcow2.c |  5 +++--
 block/replication.c   |  8 
 block/stream.c|  8 
 block/vvfat.c | 18 +-
 8 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index b866e42271d0..2711d95718f7 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -305,7 +305,7 @@ cbw_co_snapshot_block_status(BlockDriverState *bs,
 return -EACCES;
 }
 
-ret = bdrv_block_status(child->bs, offset, cur_bytes, pnum, map, file);
+ret = bdrv_co_block_status(child->bs, offset, cur_bytes, pnum, map, file);
 if (child == s->target) {
 /*
  * We refer to s->target only for areas that we've written to it.
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b4d6b7efc30f..5149fcf63adc 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -146,11 +146,11 @@ cor_co_preadv_part(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 local_flags = flags;
 
 /* In case of failure, try to copy-on-read anyway */
-ret = bdrv_is_allocated(bs->file->bs, offset, bytes, );
+ret = bdrv_co_is_allocated(bs->file->bs, offset, bytes, );
 if (ret <= 0) {
-ret = 
bdrv_is_allocated_above(bdrv_backing_chain_next(bs->file->bs),
-  state->bottom_bs, true, offset,
-  n, );
+ret = 
bdrv_co_is_allocated_above(bdrv_backing_chain_next(bs->file->bs),
+ state->bottom_bs, true, offset,
+ n, );
 if (ret > 0 || ret < 0) {
 local_flags |= BDRV_REQ_COPY_ON_READ;
 }
diff --git a/block/io.c b/block/io.c
index 2fae64ad1eb6..3bd4c7de97f0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1216,8 +1216,8 @@ bdrv_co_do_copy_on_readv(BdrvChild *child, int64_t 
offset, int64_t bytes,
 ret = 1; /* "already allocated", so nothing will be copied */
 pnum = MIN(cluster_bytes, max_transfer);
 } else {
-ret = bdrv_is_allocated(bs, cluster_offset,
-MIN(cluster_bytes, max_transfer), );
+ret = bdrv_co_is_allocated(bs, cluster_offset,
+   MIN(cluster_bytes, max_transfer), 
);
 if (ret < 0) {
 /*
  * Safe to treat errors in querying allocation as if
@@ -1364,7 +1364,7 @@ bdrv_aligned_preadv(BdrvChild *child, BdrvTrackedRequest 
*req,
 /* The flag BDRV_REQ_COPY_ON_READ has reached its addressee */
 flags &= ~BDRV_REQ_COPY_ON_READ;
 
-ret = bdrv_is_allocated(bs, offset, bytes, );
+ret = bdrv_co_is_allocated(bs, offset, bytes, );
 if (ret < 0) {
 goto out;
 }
diff --git a/block/mirror.c b/block/mirror.c
index d3cacd170860..c0f40e696072 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -559,9 +559,9 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s)
 
 assert(!(offset % s->granularity));
 WITH_GRAPH_RDLOCK_GUARD() {
-ret = bdrv_block_status_above(source, NULL, offset,
-nb_chunks * s->granularity,
-_bytes, NULL, NULL);
+ret = bdrv_co_block_status_above(source, NULL, offset,
+ nb_chunks * s->granularity,
+ _bytes, NULL, NULL);
 }
 if (ret < 0) {
 io_bytes = MIN(nb_chunks * s->granularity, max_io_bytes);
@@ -875,8 +875,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }
 
 WITH_GRAPH_RDLOCK_GUARD() {
-ret = bdrv_is_allocated_above(bs, s->base_overlay, true, offset,
-  bytes, );
+ret = bdrv_co_is_allocated_above(bs, s->base_overlay, true, offset,
+ bytes, );
 }
 if (ret < 0) {
 return ret;
diff --git a/block/qcow2.c b/block/qcow2.c
index 7f3948360d05..5a27740223b4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3968,7 +3968,8 @@ finish:
 }
 
 
-static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
+static bool coroutine_fn GRAPH_RDLOCK
+is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 int64_t nr;
 int res;
@@ -3989,7 +3990,7 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, 
int64_t bytes)
  * backing file. So, we need a loop.
  */
 do {
-res = bdrv_block_status_above(bs, NULL, offset, bytes, , NULL,

[PATCH 11/12] block: use bdrv_co_getlength in coroutine context

2023-06-01 Thread Paolo Bonzini

bdrv_co_getlength was recently introduced, with bdrv_getlength becoming
a wrapper for use in unknown context.  Switch to bdrv_co_getlength when
possible.

Signed-off-by: Paolo Bonzini 
---
 block/io.c| 10 +-
 block/parallels.c |  4 ++--
 block/qcow.c  |  6 +++---
 block/vmdk.c  |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3bd4c7de97f0..f537421ef523 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1379,7 +1379,7 @@ bdrv_aligned_preadv(BdrvChild *child, BdrvTrackedRequest 
*req,
 }
 
 /* Forward the request to the BlockDriver, possibly fragmenting it */
-total_bytes = bdrv_getlength(bs);
+total_bytes = bdrv_co_getlength(bs);
 if (total_bytes < 0) {
 ret = total_bytes;
 goto out;
@@ -2252,7 +2252,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool 
want_zero,
 assert(pnum);
 assert_bdrv_graph_readable();
 *pnum = 0;
-total_size = bdrv_getlength(bs);
+total_size = bdrv_co_getlength(bs);
 if (total_size < 0) {
 ret = total_size;
 goto early_out;
@@ -2272,7 +2272,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool 
want_zero,
 bytes = n;
 }
 
-/* Must be non-NULL or bdrv_getlength() would have failed */
+/* Must be non-NULL or bdrv_co_getlength() would have failed */
 assert(bs->drv);
 has_filtered_child = bdrv_filter_child(bs);
 if (!bs->drv->bdrv_co_block_status && !has_filtered_child) {
@@ -2410,7 +2410,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool 
want_zero,
 if (!cow_bs) {
 ret |= BDRV_BLOCK_ZERO;
 } else if (want_zero) {
-int64_t size2 = bdrv_getlength(cow_bs);
+int64_t size2 = bdrv_co_getlength(cow_bs);
 
 if (size2 >= 0 && offset >= size2) {
 ret |= BDRV_BLOCK_ZERO;
@@ -3445,7 +3445,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset, bool exact,
 return ret;
 }
 
-old_size = bdrv_getlength(bs);
+old_size = bdrv_co_getlength(bs);
 if (old_size < 0) {
 error_setg_errno(errp, -old_size, "Failed to get old image size");
 return old_size;
diff --git a/block/parallels.c b/block/parallels.c
index 190406ba2e3d..91247cb157f6 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -193,7 +193,7 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 assert(idx < s->bat_size && idx + to_allocate <= s->bat_size);
 
 space = to_allocate * s->tracks;
-len = bdrv_getlength(bs->file->bs);
+len = bdrv_co_getlength(bs->file->bs);
 if (len < 0) {
 return len;
 }
@@ -426,7 +426,7 @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult 
*res,
 uint32_t i;
 bool flush_bat = false;
 
-size = bdrv_getlength(bs->file->bs);
+size = bdrv_co_getlength(bs->file->bs);
 if (size < 0) {
 res->check_errors++;
 return size;
diff --git a/block/qcow.c b/block/qcow.c
index 823e931cea2d..4aee835e8c36 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -370,7 +370,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 if (!allocate)
 return 0;
 /* allocate a new l2 entry */
-l2_offset = bdrv_getlength(bs->file->bs);
+l2_offset = bdrv_co_getlength(bs->file->bs);
 if (l2_offset < 0) {
 return l2_offset;
 }
@@ -445,7 +445,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 if (decompress_cluster(bs, cluster_offset) < 0) {
 return -EIO;
 }
-cluster_offset = bdrv_getlength(bs->file->bs);
+cluster_offset = bdrv_co_getlength(bs->file->bs);
 if ((int64_t) cluster_offset < 0) {
 return cluster_offset;
 }
@@ -458,7 +458,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, 
int allocate,
 return ret;
 }
 } else {
-cluster_offset = bdrv_getlength(bs->file->bs);
+cluster_offset = bdrv_co_getlength(bs->file->bs);
 if ((int64_t) cluster_offset < 0) {
 return cluster_offset;
 }
diff --git a/block/vmdk.c b/block/vmdk.c
index 419868a42ae2..c64f2eacc03f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -2132,7 +2132,7 @@ vmdk_co_pwritev_compressed(BlockDriverState *bs, int64_t 
offset, int64_t bytes,
 int64_t length;
 
 for (i = 0; i < s->num_extents; i++) {
-length = bdrv_getlength(s->extents[i].file->bs);
+length = bdrv_co_getlength(s->extents[i].file->bs);
 if (length < 0) {
 return length;
 }
@@ -2939,7 +2939,7 @@ vmdk_co_check(BlockDriverState *bs, BdrvCheckResult 
*result, BdrvCheckMode fix)
 break;
 }
 if (ret == VMDK_OK) {
-int64_t extent_len = bdrv_getlength(extent->file->bs);
+

[PATCH 00/12] block: more fixes to coroutine_fn marking

2023-06-01 Thread Paolo Bonzini

*** BLURB HERE ***

Paolo Bonzini (12):
  file-posix: remove incorrect coroutine_fn calls
  qed: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vpc: mark more functions as coroutine_fns and GRAPH_RDLOCK
  bochs: mark more functions as coroutine_fns and GRAPH_RDLOCK
  block: mark another function as coroutine_fns and GRAPH_UNLOCKED
  cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK
  dmg: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vmdk: mark more functions as coroutine_fns and GRAPH_RDLOCK
  vhdx: mark more functions as coroutine_fns and GRAPH_RDLOCK
  qcow2: mark more functions as coroutine_fns and GRAPH_RDLOCK
  block: use bdrv_co_getlength in coroutine context
  block: use bdrv_co_debug_event in coroutine context

 block.c  |  11 ++--
 block/bochs.c|   7 +-
 block/cloop.c|   9 +--
 block/dmg.c  |  21 +++---
 block/file-posix.c   |  29 +
 block/io.c   |  14 ++--
 block/parallels.c|   4 +-
 block/qcow.c |  30 -
 block/qcow2-bitmap.c |  26 
 block/qcow2-cluster.c|  24 +++
 block/qcow2-refcount.c   | 134 ---
 block/qcow2.c|  20 +++---
 block/qcow2.h|  33 +-
 block/qed-check.c|   5 +-
 block/qed-table.c|   6 +-
 block/qed.c  |  15 +++--
 block/raw-format.c   |   4 +-
 block/vhdx-log.c |  36 ++-
 block/vhdx.c |  73 ++---
 block/vhdx.h |   5 +-
 block/vmdk.c |  55 
 block/vpc.c  |  52 +++
 include/block/block-io.h |   7 ++
 23 files changed, 323 insertions(+), 297 deletions(-)

-- 
2.40.1

[PATCH v2 0/3] block: remove separate bdrv_file_open callback

2023-06-01 Thread Paolo Bonzini

The value of the bdrv_file_open is sometimes checked to distinguish
protocol and format drivers, but apart from that there is no difference
between bdrv_file_open and bdrv_open.

However, they can all be distinguished by the non-NULL .protocol_name
member.  Change the checks to use .protocol_name instead of .bdrv_file_open,
and unify the two callbacks.

Paolo

v1->v2: fix bisectability

Paolo Bonzini (3):
  block: make assertion more generic
  block: do not check bdrv_file_open
  block: remove separate bdrv_file_open callback

 block.c  | 17 +++--
 block/blkdebug.c |  2 +-
 block/blkio.c|  2 +-
 block/blkverify.c|  2 +-
 block/curl.c |  8 
 block/file-posix.c   |  8 
 block/file-win32.c   |  4 ++--
 block/gluster.c  |  8 
 block/iscsi.c|  4 ++--
 block/nbd.c  |  6 +++---
 block/nfs.c  |  2 +-
 block/null.c |  4 ++--
 block/nvme.c |  2 +-
 block/rbd.c  |  3 ++-
 block/ssh.c  |  2 +-
 block/vvfat.c|  2 +-
 include/block/block_int-common.h |  3 ---
 17 files changed, 37 insertions(+), 42 deletions(-)

-- 
2.40.1

[PATCH 06/12] cloop: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/cloop.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/cloop.c b/block/cloop.c
index 1e5a52d6b2dc..835a0fe3da0a 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -212,7 +212,8 @@ static void cloop_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O */
 }
 
-static inline int cloop_read_block(BlockDriverState *bs, int block_num)
+static int coroutine_fn GRAPH_RDLOCK
+cloop_read_block(BlockDriverState *bs, int block_num)
 {
 BDRVCloopState *s = bs->opaque;
 
@@ -220,8 +221,8 @@ static inline int cloop_read_block(BlockDriverState *bs, 
int block_num)
 int ret;
 uint32_t bytes = s->offsets[block_num + 1] - s->offsets[block_num];
 
-ret = bdrv_pread(bs->file, s->offsets[block_num], bytes,
- s->compressed_block, 0);
+ret = bdrv_co_pread(bs->file, s->offsets[block_num], bytes,
+s->compressed_block, 0);
 if (ret < 0) {
 return -1;
 }
@@ -244,7 +245,7 @@ static inline int cloop_read_block(BlockDriverState *bs, 
int block_num)
 return 0;
 }
 
-static int coroutine_fn
+static int coroutine_fn GRAPH_RDLOCK
 cloop_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
-- 
2.40.1

[PATCH v2 3/3] block: remove separate bdrv_file_open callback

2023-06-01 Thread Paolo Bonzini

bdrv_file_open and bdrv_open are completely equivalent, they are
never checked except to see which one to invoke.  So merge them
into a single one.

Signed-off-by: Paolo Bonzini 
---
 block.c  | 4 +---
 block/blkdebug.c | 2 +-
 block/blkio.c| 2 +-
 block/blkverify.c| 2 +-
 block/curl.c | 8 
 block/file-posix.c   | 8 
 block/file-win32.c   | 4 ++--
 block/gluster.c  | 8 
 block/iscsi.c| 4 ++--
 block/nbd.c  | 6 +++---
 block/nfs.c  | 2 +-
 block/null.c | 4 ++--
 block/nvme.c | 2 +-
 block/rbd.c  | 3 ++-
 block/ssh.c  | 2 +-
 block/vvfat.c| 2 +-
 include/block/block_int-common.h | 3 ---
 17 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/block.c b/block.c
index 40eeda213666..113e3d90fd52 100644
--- a/block.c
+++ b/block.c
@@ -1627,9 +1627,7 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, 
const char *node_name,
 bs->opaque = g_malloc0(drv->instance_size);
 
 assert(!drv->bdrv_needs_filename || bs->filename[0]);
-if (drv->bdrv_file_open) {
-ret = drv->bdrv_file_open(bs, options, open_flags, _err);
-} else if (drv->bdrv_open) {
+if (drv->bdrv_open) {
 ret = drv->bdrv_open(bs, options, open_flags, _err);
 } else {
 ret = 0;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index addad914b3f7..c9ae3cb6ae3d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1072,7 +1072,7 @@ static BlockDriver bdrv_blkdebug = {
 .is_filter  = true,
 
 .bdrv_parse_filename= blkdebug_parse_filename,
-.bdrv_file_open = blkdebug_open,
+.bdrv_open  = blkdebug_open,
 .bdrv_close = blkdebug_close,
 .bdrv_reopen_prepare= blkdebug_reopen_prepare,
 .bdrv_child_perm= blkdebug_child_perm,
diff --git a/block/blkio.c b/block/blkio.c
index 72117fa0059b..202cf20ca4bb 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -992,7 +992,7 @@ static void blkio_refresh_limits(BlockDriverState *bs, 
Error **errp)
 .format_name = name, \
 .protocol_name   = name, \
 .instance_size   = sizeof(BDRVBlkioState), \
-.bdrv_file_open  = blkio_file_open, \
+.bdrv_open   = blkio_open, \
 .bdrv_close  = blkio_close, \
 .bdrv_co_getlength   = blkio_co_getlength, \
 .bdrv_co_truncate= blkio_truncate, \
diff --git a/block/blkverify.c b/block/blkverify.c
index 7326461f30e0..263166046ea6 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -313,7 +313,7 @@ static BlockDriver bdrv_blkverify = {
 .instance_size= sizeof(BDRVBlkverifyState),
 
 .bdrv_parse_filename  = blkverify_parse_filename,
-.bdrv_file_open   = blkverify_open,
+.bdrv_open= blkverify_open,
 .bdrv_close   = blkverify_close,
 .bdrv_child_perm  = bdrv_default_perms,
 .bdrv_co_getlength= blkverify_co_getlength,
diff --git a/block/curl.c b/block/curl.c
index 0fc42d03d777..cd2dee3f0a8b 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -1032,7 +1032,7 @@ static BlockDriver bdrv_http = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
@@ -1051,7 +1051,7 @@ static BlockDriver bdrv_https = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
@@ -1070,7 +1070,7 @@ static BlockDriver bdrv_ftp = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
@@ -1089,7 +1089,7 @@ static BlockDriver bdrv_ftps = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
diff --git a/block/file-posix.c b/block/file-posix.c
index

[PATCH v2 1/3] block: make assertion more generic

2023-06-01 Thread Paolo Bonzini

.bdrv_needs_filename is only set for drivers that also set bdrv_file_open,
i.e. protocol drivers.

So we can make the assertion always, it will always pass for those drivers
that use bdrv_open.

Signed-off-by: Paolo Bonzini 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index dae629075c2c..a1452e747b62 100644
--- a/block.c
+++ b/block.c
@@ -1627,8 +1627,8 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, 
const char *node_name,
 bs->drv = drv;
 bs->opaque = g_malloc0(drv->instance_size);
 
+assert(!drv->bdrv_needs_filename || bs->filename[0]);
 if (drv->bdrv_file_open) {
-assert(!drv->bdrv_needs_filename || bs->filename[0]);
 ret = drv->bdrv_file_open(bs, options, open_flags, _err);
 } else if (drv->bdrv_open) {
 ret = drv->bdrv_open(bs, options, open_flags, _err);
-- 
2.40.1

[PATCH 02/12] qed: mark more functions as coroutine_fns and GRAPH_RDLOCK

2023-06-01 Thread Paolo Bonzini

Mark functions as coroutine_fn when they are only called by other coroutine_fns
and they can suspend.  Change calls to co_wrappers to use the non-wrapped
functions, which in turn requires adding GRAPH_RDLOCK annotations.

Signed-off-by: Paolo Bonzini 
---
 block/qed-check.c | 5 +++--
 block/qed.c   | 7 ---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qed-check.c b/block/qed-check.c
index 8fd94f405ed7..6a01b94f9c9c 100644
--- a/block/qed-check.c
+++ b/block/qed-check.c
@@ -200,7 +200,8 @@ static void qed_check_for_leaks(QEDCheck *check)
 /**
  * Mark an image clean once it passes check or has been repaired
  */
-static void qed_check_mark_clean(BDRVQEDState *s, BdrvCheckResult *result)
+static void coroutine_fn GRAPH_RDLOCK
+qed_check_mark_clean(BDRVQEDState *s, BdrvCheckResult *result)
 {
 /* Skip if there were unfixable corruptions or I/O errors */
 if (result->corruptions > 0 || result->check_errors > 0) {
@@ -213,7 +214,7 @@ static void qed_check_mark_clean(BDRVQEDState *s, 
BdrvCheckResult *result)
 }
 
 /* Ensure fixes reach storage before clearing check bit */
-bdrv_flush(s->bs);
+bdrv_co_flush(s->bs);
 
 s->header.features &= ~QED_F_NEED_CHECK;
 qed_write_header_sync(s);
diff --git a/block/qed.c b/block/qed.c
index 8e08f89bbd01..382c05c83fef 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -195,14 +195,15 @@ static bool qed_is_image_size_valid(uint64_t image_size, 
uint32_t cluster_size,
  *
  * The string is NUL-terminated.
  */
-static int qed_read_string(BdrvChild *file, uint64_t offset, size_t n,
-   char *buf, size_t buflen)
+static int coroutine_fn GRAPH_RDLOCK
+qed_read_string(BdrvChild *file, uint64_t offset,
+size_t n, char *buf, size_t buflen)
 {
 int ret;
 if (n >= buflen) {
 return -EINVAL;
 }
-ret = bdrv_pread(file, offset, n, buf, 0);
+ret = bdrv_co_pread(file, offset, n, buf, 0);
 if (ret < 0) {
 return ret;
 }
-- 
2.40.1

[PATCH 01/12] file-posix: remove incorrect coroutine_fn calls

2023-06-01 Thread Paolo Bonzini

raw_co_getlength is called by handle_aiocb_write_zeroes, which is not a 
coroutine
function.  This is harmless because raw_co_getlength does not actually suspend,
but in the interest of clarity make it a non-coroutine_fn that is just wrapped
by the coroutine_fn raw_co_getlength.  Likewise, check_cache_dropped was only
a coroutine_fn because it called raw_co_getlength, so it can be made 
non-coroutine
as well.

Signed-off-by: Paolo Bonzini 
---
 block/file-posix.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 942f529f6ffc..5e0afaba69a5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -193,7 +193,7 @@ static int fd_open(BlockDriverState *bs)
 return -EIO;
 }
 
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs);
+static int64_t raw_getlength(BlockDriverState *bs);
 
 typedef struct RawPosixAIOData {
 BlockDriverState *bs;
@@ -1974,7 +1974,7 @@ static int handle_aiocb_write_zeroes(void *opaque)
 #ifdef CONFIG_FALLOCATE
 /* Last resort: we are trying to extend the file with zeroed data. This
  * can be done via fallocate(fd, 0) */
-len = raw_co_getlength(aiocb->bs);
+len = raw_getlength(aiocb->bs);
 if (s->has_fallocate && len >= 0 && aiocb->aio_offset >= len) {
 int ret = do_fallocate(s->fd, 0, aiocb->aio_offset, aiocb->aio_nbytes);
 if (ret == 0 || ret != -ENOTSUP) {
@@ -2696,7 +2696,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState 
*bs, int64_t offset,
 }
 
 if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-int64_t cur_length = raw_co_getlength(bs);
+int64_t cur_length = raw_getlength(bs);
 
 if (offset != cur_length && exact) {
 error_setg(errp, "Cannot resize device files");
@@ -2714,7 +2714,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState 
*bs, int64_t offset,
 }
 
 #ifdef __OpenBSD__
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int fd = s->fd;
@@ -2733,7 +2733,7 @@ static int64_t coroutine_fn 
raw_co_getlength(BlockDriverState *bs)
 return st.st_size;
 }
 #elif defined(__NetBSD__)
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int fd = s->fd;
@@ -2758,7 +2758,7 @@ static int64_t coroutine_fn 
raw_co_getlength(BlockDriverState *bs)
 return st.st_size;
 }
 #elif defined(__sun__)
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 struct dk_minfo minfo;
@@ -2789,7 +2789,7 @@ static int64_t coroutine_fn 
raw_co_getlength(BlockDriverState *bs)
 return size;
 }
 #elif defined(CONFIG_BSD)
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int fd = s->fd;
@@ -2861,7 +2861,7 @@ again:
 return size;
 }
 #else
-static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+static int64_t raw_getlength(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
 int ret;
@@ -2880,6 +2880,11 @@ static int64_t coroutine_fn 
raw_co_getlength(BlockDriverState *bs)
 }
 #endif
 
+static int64_t coroutine_fn raw_co_getlength(BlockDriverState *bs)
+{
+return raw_getlength(bs);
+}
+
 static int64_t coroutine_fn raw_co_get_allocated_file_size(BlockDriverState 
*bs)
 {
 struct stat st;
@@ -3245,7 +3250,7 @@ static int coroutine_fn 
raw_co_block_status(BlockDriverState *bs,
  * round up if necessary.
  */
 if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
-int64_t file_length = raw_co_getlength(bs);
+int64_t file_length = raw_getlength(bs);
 if (file_length > 0) {
 /* Ignore errors, this is just a safeguard */
 assert(hole == file_length);
@@ -3267,7 +3272,7 @@ static int coroutine_fn 
raw_co_block_status(BlockDriverState *bs,
 
 #if defined(__linux__)
 /* Verify that the file is not in the page cache */
-static void coroutine_fn check_cache_dropped(BlockDriverState *bs, Error 
**errp)
+static void check_cache_dropped(BlockDriverState *bs, Error **errp)
 {
 const size_t window_size = 128 * 1024 * 1024;
 BDRVRawState *s = bs->opaque;
@@ -3282,7 +3287,7 @@ static void coroutine_fn 
check_cache_dropped(BlockDriverState *bs, Error **errp)
 page_size = sysconf(_SC_PAGESIZE);
 vec = g_malloc(DIV_ROUND_UP(window_size, page_size));
 
-end = raw_co_getlength(bs);
+end = raw_getlength(bs);
 
 for (offset = 0; offset < end; offset += window_size) {
 void *new_window;
@@ -4504,7 +4509,7 @@ static int cdrom_reopen(BlockDriverState *bs)
 
 static bool coroutine_fn

[PATCH v2 2/3] block: do not check bdrv_file_open

2023-06-01 Thread Paolo Bonzini

The set of BlockDrivers that have .bdrv_file_open coincides with those
that have .protocol_name and guess what---checking drv->bdrv_file_open
is done to see if the driver is a protocol.  So check drv->protocol_name
instead.

Signed-off-by: Paolo Bonzini 
---
 block.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index a1452e747b62..40eeda213666 100644
--- a/block.c
+++ b/block.c
@@ -913,7 +913,6 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 int i;
 
 GLOBAL_STATE_CODE();
-/* TODO Drivers without bdrv_file_open must be specified explicitly */
 
 /*
  * XXX(hch): we really should not let host device detection
@@ -1949,7 +1948,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 open_flags = bdrv_open_flags(bs, bs->open_flags);
 node_name = qemu_opt_get(opts, "node-name");
 
-assert(!drv->bdrv_file_open || file == NULL);
+assert(!drv->protocol_name || file == NULL);
 ret = bdrv_open_driver(bs, drv, node_name, options, open_flags, errp);
 if (ret < 0) {
 goto fail_opts;
@@ -2049,7 +2048,7 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 }
 /* If the user has explicitly specified the driver, this choice should
  * override the BDRV_O_PROTOCOL flag */
-protocol = drv->bdrv_file_open;
+protocol = drv->protocol_name;
 }
 
 if (protocol) {
@@ -4017,7 +4016,7 @@ bdrv_open_inherit(const char *filename, const char 
*reference, QDict *options,
 }
 
 /* BDRV_O_PROTOCOL must be set iff a protocol BDS is about to be created */
-assert(!!(flags & BDRV_O_PROTOCOL) == !!drv->bdrv_file_open);
+assert(!!(flags & BDRV_O_PROTOCOL) == !!drv->protocol_name);
 /* file must be NULL if a protocol BDS is about to be created
  * (the inverse results in an error message from bdrv_open_common()) */
 assert(!(flags & BDRV_O_PROTOCOL) || !file);
@@ -5816,7 +5815,7 @@ int64_t coroutine_fn 
bdrv_co_get_allocated_file_size(BlockDriverState *bs)
 return drv->bdrv_co_get_allocated_file_size(bs);
 }
 
-if (drv->bdrv_file_open) {
+if (drv->protocol_name) {
 /*
  * Protocol drivers default to -ENOTSUP (most of their data is
  * not stored in any of their children (if they even have any),
@@ -7932,7 +7931,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  *   Both of these conditions are represented by 
generate_json_filename.
  */
 if (primary_child_bs->exact_filename[0] &&
-primary_child_bs->drv->bdrv_file_open &&
+primary_child_bs->drv->protocol_name &&
 !drv->is_filter && !generate_json_filename)
 {
 strcpy(bs->exact_filename, primary_child_bs->exact_filename);
-- 
2.40.1

[PATCH v2 0/4] block: clean up coroutine versions of bdrv_{is_allocated, block_status}*

2023-06-01 Thread Paolo Bonzini

Provide coroutine versions of bdrv_is_allocated* and bdrv_block_status*,
since the underlying BlockDriver API is coroutine-based, and use
automatically-generated wrappers for the "mixed" versions.

Paolo

v1->v2: rename the old bdrv_co_block_status to bdrv_co_do_block_status


Paolo Bonzini (4):
  block: rename the bdrv_co_block_status static function
  block: complete public block status API
  block: switch to co_wrapper for bdrv_is_allocated_*
  block: convert more bdrv_is_allocated* and bdrv_block_status* calls to
coroutine versions

 block/copy-before-write.c |  2 +-
 block/copy-on-read.c  |  8 ++--
 block/io.c| 96 ++-
 block/mirror.c| 10 ++--
 block/qcow2.c |  5 +-
 block/replication.c   |  8 ++--
 block/stream.c|  8 ++--
 block/vvfat.c | 18 
 include/block/block-io.h  | 30 +++-
 9 files changed, 73 insertions(+), 112 deletions(-)

-- 
2.40.1

[PATCH v2 1/4] block: rename the bdrv_co_block_status static function

2023-06-01 Thread Paolo Bonzini

bdrv_block_status exists as a wrapper for bdrv_block_status_above,
but the name of the (hypothetical) coroutine version, bdrv_co_block_status,
is squatted by a random static function.  Rename it to bdrv_do_block_status.

Signed-off-by: Paolo Bonzini 
---
 block/io.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index 540bf8d26dff..2c2c9656853c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2236,9 +2236,9 @@ int bdrv_flush_all(void)
  * set to the host mapping and BDS corresponding to the guest offset.
  */
 static int coroutine_fn GRAPH_RDLOCK
-bdrv_co_block_status(BlockDriverState *bs, bool want_zero,
- int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file)
+bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
 int64_t total_size;
 int64_t n; /* bytes */
@@ -2397,8 +2397,8 @@ bdrv_co_block_status(BlockDriverState *bs, bool want_zero,
 
 if (ret & BDRV_BLOCK_RAW) {
 assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-ret = bdrv_co_block_status(local_file, want_zero, local_map,
-   *pnum, pnum, _map, _file);
+ret = bdrv_co_do_block_status(local_file, want_zero, local_map,
+  *pnum, pnum, _map, _file);
 goto out;
 }
 
@@ -2425,8 +2425,8 @@ bdrv_co_block_status(BlockDriverState *bs, bool want_zero,
 int64_t file_pnum;
 int ret2;
 
-ret2 = bdrv_co_block_status(local_file, want_zero, local_map,
-*pnum, _pnum, NULL, NULL);
+ret2 = bdrv_co_do_block_status(local_file, want_zero, local_map,
+   *pnum, _pnum, NULL, NULL);
 if (ret2 >= 0) {
 /* Ignore errors.  This is just providing extra information, it
  * is useful but not necessary.
@@ -2493,7 +2493,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 return 0;
 }
 
-ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file);
+ret = bdrv_co_do_block_status(bs, want_zero, offset, bytes, pnum, map, 
file);
 ++*depth;
 if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
 return ret;
@@ -2509,8 +2509,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
 for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
  p = bdrv_filter_or_cow_bs(p))
 {
-ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
-   file);
+ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum, map, 
file);
 ++*depth;
 if (ret < 0) {
 return ret;
-- 
2.40.1

Re: [PATCH v3 13/14] nbd/server: Prepare for per-request filtering of BLOCK_STATUS

2023-06-01 Thread Vladimir Sementsov-Ogievskiy


On 15.05.23 22:53, Eric Blake wrote:

The next commit will add support for the new addition of
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).

Signed-off-by: Eric Blake 
---
  include/block/nbd.h |  15 ++
  nbd/server.c| 108 +++-
  2 files changed, 72 insertions(+), 51 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 865bb4ee2e1..6696d61bd59 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -60,6 +60,20 @@ typedef enum NBDHeaderStyle {
  NBD_HEADER_EXTENDED,/* NBD_OPT_EXTENDED_HEADERS negotiated */
  } NBDHeaderStyle;

+/*
+ * NBDMetaContexts represents a list of meta contexts in use, as
+ * selected by NBD_OPT_SET_META_CONTEXT. Also used for
+ * NBD_OPT_LIST_META_CONTEXT, and payload filtering in
+ * NBD_CMD_BLOCK_STATUS.
+ */
+typedef struct NBDMetaContexts {
+size_t count; /* number of negotiated contexts */
+bool base_allocation; /* export base:allocation context (block status) */
+bool allocation_depth; /* export qemu:allocation-depth */
+size_t nr_bitmaps; /* Length of bitmaps array */
+bool *bitmaps; /* export qemu:dirty-bitmap: */
+} NBDMetaContexts;
+
  /*
   * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
   * request!
@@ -70,6 +84,7 @@ typedef struct NBDRequest {
  uint64_t len;   /* Effect length; 32 bit limit without extended headers */
  uint16_t flags; /* NBD_CMD_FLAG_* */
  uint16_t type;  /* NBD_CMD_* */
+NBDMetaContexts contexts; /* Used by NBD_CMD_BLOCK_STATUS */
  } NBDRequest;

  typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 6475a76c1f0..db550c82cd2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,20 +105,6 @@ struct NBDExport {

  static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
- * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-NBDExport *exp;
-size_t count; /* number of negotiated contexts */
-bool base_allocation; /* export base:allocation context (block status) */
-bool allocation_depth; /* export qemu:allocation-depth */
-bool *bitmaps; /*
-* export qemu:dirty-bitmap:,
-* sized by exp->nr_export_bitmaps
-*/
-} NBDExportMetaContexts;
-
  struct NBDClient {
  int refcount;
  void (*close_fn)(NBDClient *client, bool negotiated);
@@ -144,7 +130,8 @@ struct NBDClient {
  uint32_t check_align; /* If non-zero, check for aligned client requests */

  NBDHeaderStyle header_style;
-NBDExportMetaContexts export_meta;
+NBDExport *context_exp; /* export of last OPT_SET_META_CONTEXT */
+NBDMetaContexts contexts; /* Negotiated meta contexts */

  uint32_t opt; /* Current option being negotiated */
  uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -457,8 +444,8 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
Error **errp)

  static void nbd_check_meta_export(NBDClient *client)
  {
-if (client->exp != client->export_meta.exp) {
-client->export_meta.count = 0;
+if (client->exp != client->context_exp) {
+client->contexts.count = 0;
  }
  }

@@ -852,7 +839,7 @@ static bool nbd_strshift(const char **str, const char 
*prefix)
   * Handle queries to 'base' namespace. For now, only the base:allocation
   * context is available.  Return true if @query has been handled.
   */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
  const char *query)
  {
  if (!nbd_strshift(, "base:")) {
@@ -872,8 +859,8 @@ static bool nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts *meta,
   * and qemu:allocation-depth contexts are available.  Return true if @query
   * has been handled.
   */
-static bool nbd_meta_qemu_query(NBDClient *client, NBDExportMetaContexts *meta,
-const char *query)
+static bool nbd_meta_qemu_query(NBDClient *client, NBDExport *exp,
+NBDMetaContexts *meta, const char *query)
  {
  size_t i;

@@ -884,9 +871,9 @@ static bool nbd_meta_qemu_query(NBDClient *client, 
NBDExportMetaContexts *meta,

  if (!*query) {
  if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-meta->allocation_depth = meta->exp->allocation_depth;
-if (meta->exp->nr_export_bitmaps) {
-memset(meta->bitmaps, 1,

[PATCH 1/2] scripts: Add qom-cast-macro-clean-cocci-gen.py

2023-06-01 Thread Philippe Mathieu-Daudé

Add a script to generate Coccinelle semantic patch
removing all pointless QOM cast macro uses.

Suggested-by: Markus Armbruster 
Signed-off-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS   |  1 +
 scripts/qom-cast-macro-clean-cocci-gen.py | 49 +++
 2 files changed, 50 insertions(+)
 create mode 100644 scripts/qom-cast-macro-clean-cocci-gen.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b025a7b63..37a2412011 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3044,6 +3044,7 @@ F: include/qom/
 F: qapi/qom.json
 F: qapi/qdev.json
 F: scripts/coccinelle/qom-parent-type.cocci
+F: scripts/qom-cast-macro-clean-cocci-gen.py
 F: softmmu/qdev-monitor.c
 F: stubs/qdev.c
 F: qom/
diff --git a/scripts/qom-cast-macro-clean-cocci-gen.py 
b/scripts/qom-cast-macro-clean-cocci-gen.py
new file mode 100644
index 00..2fa8438a14
--- /dev/null
+++ b/scripts/qom-cast-macro-clean-cocci-gen.py
@@ -0,0 +1,49 @@
+#!/usr/bin/env python3
+#
+# Generate a Coccinelle semantic patch to remove pointless QOM cast.
+#
+# Usage:
+#
+# $ qom-cast-macro-clean-cocci-gen.py $(git ls-files) > 
qom_pointless_cast.cocci
+# $ spatch \
+#   --macro-file scripts/cocci-macro-file.h \
+#   --sp-file qom_pointless_cast.cocci \
+#   --keep-comments \
+#   --use-gitgrep \
+#   --in-place \
+#   --dir .
+#
+# SPDX-FileContributor: Philippe Mathieu-Daudé 
+# SPDX-FileCopyrightText: 2023 Linaro Ltd.
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import re
+import sys
+
+assert len(sys.argv) > 0
+
+def print_cocci_rule(qom_typedef, qom_cast_macro):
+print(f'''@@
+typedef {qom_typedef};
+{qom_typedef} *obj;
+@@
+-{qom_cast_macro}(obj)
++obj
+''')
+
+patterns = [
+r'DECLARE_INSTANCE_CHECKER\((\w+),\W*(\w+),\W*TYPE_\w+\)',
+r'DECLARE_OBJ_CHECKERS\((\w+),\W*\w+,\W*(\w+),\W*TYPE_\w+\)',
+r'OBJECT_DECLARE_TYPE\((\w+),\W*\w+,\W*(\w+)\)',
+r'OBJECT_DECLARE_SIMPLE_TYPE\((\w+),\W*(\w+)\)',
+r'INTERFACE_CHECK\((\w+),\W*\(\w+\),\W*TYPE_(\w+)\)',
+]
+
+for fn in sys.argv[1:]:
+try:
+content = open(fn, 'rt').read()
+except:
+continue
+for pattern in patterns:
+for match in re.findall(pattern, content):
+print_cocci_rule(match[0], match[1])
-- 
2.38.1

[PATCH 0/2] bulk: Remove pointless QOM casts

2023-06-01 Thread Philippe Mathieu-Daudé

As per Markus suggestion in [*], use Coccinelle to remove
pointless QOM cast macro uses. Since we have more than 1000
QOM types, add a script to generate the semantic patch.

[*] https://lore.kernel.org/qemu-devel/87mt1jafjt@pond.sub.org/

Philippe Mathieu-Daudé (2):
  scripts: Add qom-cast-macro-clean-cocci-gen.py
  bulk: Remove pointless QOM casts

 MAINTAINERS   |  1 +
 block/nbd.c   |  4 +-
 chardev/char-pty.c|  2 +-
 hw/arm/musicpal.c |  2 +-
 hw/arm/xlnx-versal.c  |  2 +-
 hw/display/vhost-user-gpu.c   |  4 +-
 hw/intc/loongarch_extioi.c|  6 +--
 hw/m68k/q800.c|  2 +-
 hw/pci-host/bonito.c  |  2 +-
 hw/ppc/pnv_lpc.c  |  2 +-
 hw/ppc/pnv_occ.c  |  2 +-
 hw/ppc/pnv_sbe.c  |  2 +-
 hw/riscv/virt.c   | 10 ++---
 hw/rx/rx62n.c |  2 +-
 hw/scsi/esp-pci.c | 18 -
 hw/sparc/sun4m.c  |  4 +-
 hw/virtio/virtio-mem-pci.c|  6 +--
 hw/virtio/virtio-pmem-pci.c   |  6 +--
 migration/fd.c|  4 +-
 migration/multifd.c   |  2 +-
 migration/yank_functions.c|  4 +-
 nbd/client-connection.c   |  2 +-
 nbd/server.c  |  2 +-
 softmmu/qdev-monitor.c|  2 +-
 ui/vnc-ws.c   |  6 +--
 scripts/qom-cast-macro-clean-cocci-gen.py | 49 +++
 26 files changed, 99 insertions(+), 49 deletions(-)
 create mode 100644 scripts/qom-cast-macro-clean-cocci-gen.py

-- 
2.38.1

[PATCH 2/2] bulk: Remove pointless QOM casts

2023-06-01 Thread Philippe Mathieu-Daudé

Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/nbd.c |  4 ++--
 chardev/char-pty.c  |  2 +-
 hw/arm/musicpal.c   |  2 +-
 hw/arm/xlnx-versal.c|  2 +-
 hw/display/vhost-user-gpu.c |  4 ++--
 hw/intc/loongarch_extioi.c  |  6 +++---
 hw/m68k/q800.c  |  2 +-
 hw/pci-host/bonito.c|  2 +-
 hw/ppc/pnv_lpc.c|  2 +-
 hw/ppc/pnv_occ.c|  2 +-
 hw/ppc/pnv_sbe.c|  2 +-
 hw/riscv/virt.c | 10 +-
 hw/rx/rx62n.c   |  2 +-
 hw/scsi/esp-pci.c   | 18 +-
 hw/sparc/sun4m.c|  4 ++--
 hw/virtio/virtio-mem-pci.c  |  6 +++---
 hw/virtio/virtio-pmem-pci.c |  6 +++---
 migration/fd.c  |  4 ++--
 migration/multifd.c |  2 +-
 migration/yank_functions.c  |  4 ++--
 nbd/client-connection.c |  2 +-
 nbd/server.c|  2 +-
 softmmu/qdev-monitor.c  |  2 +-
 ui/vnc-ws.c |  6 +++---
 24 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a3f8f8a9d5..5aef5cb6bd 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -397,7 +397,7 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 
 /* Finalize previous connection if any */
 if (s->ioc) {
-qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
+qio_channel_detach_aio_context(s->ioc);
 yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
  nbd_yank, s->bs);
 object_unref(OBJECT(s->ioc));
@@ -1455,7 +1455,7 @@ static void nbd_yank(void *opaque)
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
 QEMU_LOCK_GUARD(>requests_lock);
-qio_channel_shutdown(QIO_CHANNEL(s->ioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 s->state = NBD_CLIENT_QUIT;
 }
 
diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 92fd33c854..4e5deac18a 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -334,7 +334,7 @@ static void char_pty_open(Chardev *chr,
 s = PTY_CHARDEV(chr);
 s->ioc = QIO_CHANNEL(qio_channel_file_new_fd(master_fd));
 name = g_strdup_printf("chardev-pty-%s", chr->label);
-qio_channel_set_name(QIO_CHANNEL(s->ioc), name);
+qio_channel_set_name(s->ioc, name);
 g_free(name);
 s->timer_src = NULL;
 *be_opened = false;
diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 58f3d30c9b..dc4e43e0ee 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1250,7 +1250,7 @@ static void musicpal_init(MachineState *machine)
 uart_orgate = DEVICE(object_new(TYPE_OR_IRQ));
 object_property_set_int(OBJECT(uart_orgate), "num-lines", 2, _fatal);
 qdev_realize_and_unref(uart_orgate, NULL, _fatal);
-qdev_connect_gpio_out(DEVICE(uart_orgate), 0,
+qdev_connect_gpio_out(uart_orgate, 0,
   qdev_get_gpio_in(pic, MP_UART_SHARED_IRQ));
 
 serial_mm_init(address_space_mem, MP_UART1_BASE, 2,
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index 69b1b99e93..db1e0dee6e 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -327,7 +327,7 @@ static void versal_create_rtc(Versal *s, qemu_irq *pic)
 object_initialize_child(OBJECT(s), "rtc", >pmc.rtc,
 TYPE_XLNX_ZYNQMP_RTC);
 sbd = SYS_BUS_DEVICE(>pmc.rtc);
-sysbus_realize(SYS_BUS_DEVICE(sbd), _fatal);
+sysbus_realize(sbd, _fatal);
 
 mr = sysbus_mmio_get_region(sbd, 0);
 memory_region_add_subregion(>mr_ps, MM_PMC_RTC, mr);
diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 71dfd956b8..1386e869e5 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -364,11 +364,11 @@ vhost_user_gpu_gl_flushed(VirtIOGPUBase *b)
 VhostUserGPU *g = VHOST_USER_GPU(b);
 
 if (g->backend_blocked) {
-vhost_user_gpu_unblock(VHOST_USER_GPU(g));
+vhost_user_gpu_unblock(g);
 g->backend_blocked = false;
 }
 
-vhost_user_gpu_update_blocked(VHOST_USER_GPU(g), false);
+vhost_user_gpu_update_blocked(g, false);
 }
 
 static bool
diff --git a/hw/intc/loongarch_extioi.c b/hw/intc/loongarch_extioi.c
index 0e7a3e32f3..af75460643 100644
--- a/hw/intc/loongarch_extioi.c
+++ b/hw/intc/loongarch_extioi.c
@@ -276,7 +276,7 @@ static void loongarch_extioi_instance_init(Object *obj)
 int i, cpu, pin;
 
 for (i = 0; i < EXTIOI_IRQS; i++) {
-sysbus_init_irq(SYS_BUS_DEVICE(dev), >irq[i]);
+sysbus_init_irq(dev, >irq[i]);
 }
 
 qdev_init_gpio_in(DEVICE(obj), extioi_setirq, EXTIOI_IRQS);
@@ -284,14 +284,14 @@ static void loongarch_extioi_instance_init(Object *obj)
 for (cpu = 0; cpu < EXTIOI_CPUS; cpu++) {

Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies

2023-06-01 Thread Laszlo Ersek

On 5/25/23 15:00, Eric Blake wrote:
> Support receiving headers for 64-bit replies if extended headers were
> negotiated.  We already insist that the server not send us too much
> payload in one reply, so we can exploit that and merge the 64-bit
> length back into a normalized 32-bit field for the rest of the payload
> length calculations.  The NBD protocol specifically documents that
> extended mode takes precedence over structured replies, and that there
> are no simple replies in extended mode.  We can also take advantage
> that the handle field is in the same offset in all the various reply
> types.

(1) We might want to replace "handle" with "cookie" at this point.

>
> Note that if we negotiate extended headers, but a non-compliant server
> replies with a non-extended header, this patch will stall waiting for
> the server to send more bytes

Ah, yes. If we negotiate extended headers, we set "rlen" to "sizeof
extended" in REPLY.START, which is larger than either of "sizeof simple"
and "sizeof structured". Therefore we'll get stuck in REPLY.RECV_REPLY.
Correct?

> rather than noticing that the magic
> number is wrong (for aio operations, you'll get a magic number
> mismatch once you send a second command that elicits a reply; but for
> blocking operations, we basically deadlock).  The easy alternative
> would be to read just the first 4 bytes of magic, then determine how
> many more bytes to expect; but that would require more states and
> syscalls, and not worth it since the typical server will be compliant.

Not liking this "not worth it" argument. But...

> The other alternative is what the next patch implements: teaching
> REPLY.RECV_REPLY to handle short reads that were at least long enough
> to transmit magic to specifically look for magic number mismatch.

... that sounds OK!

>
> At this point, h->extended_headers is permanently false (we can't
> enable it until all other aspects of the protocol have likewise been
> converted).
>
> Signed-off-by: Eric Blake 
> ---
>  lib/internal.h  |  1 +
>  generator/states-reply-structured.c | 52 +++--
>  generator/states-reply.c| 34 ---
>  3 files changed, 58 insertions(+), 29 deletions(-)

Probably best to reorder the files in this patch for review:

>
> diff --git a/lib/internal.h b/lib/internal.h
> index 8a5f93d4..e4e21a4d 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -243,6 +243,7 @@ struct nbd_handle {
>union {
>  struct nbd_simple_reply simple;
>  struct nbd_structured_reply structured;
> +struct nbd_extended_reply extended;
>} hdr;
>union {
>  struct nbd_structured_reply_offset_data offset_data;

Seems OK.

> diff --git a/generator/states-reply.c b/generator/states-reply.c
> index 31e4bd2f..4e9f2dde 100644
> --- a/generator/states-reply.c
> +++ b/generator/states-reply.c
> @@ -62,15 +62,19 @@  REPLY.START:
> */
>ssize_t r;
>
> -  /* We read all replies initially as if they are simple replies, but
> -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> -   * This works because the structured_reply header is larger.
> +  /* With extended headers, there is only one size to read, so we can do it
> +   * all in one syscall.  But for older structured replies, we don't know if
> +   * we have a simple or structured reply until we read the magic number,
> +   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> */
>assert (h->reply_cmd == NULL);
>assert (h->rlen == 0);
>
>h->rbuf = >sbuf.reply.hdr;
> -  h->rlen = sizeof h->sbuf.reply.hdr.simple;
> +  if (h->extended_headers)
> +h->rlen = sizeof h->sbuf.reply.hdr.extended;
> +  else
> +h->rlen = sizeof h->sbuf.reply.hdr.simple;
>
>r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
>if (r == -1) {

The comment "This works because the structured_reply header is larger"
is being removed, which makes the non-ext branch even less
comprehensible than before.

(2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is
smaller than "sizeof structured"?

> @@ -116,16 +120,22 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
>uint64_t cookie;
>
>magic = be32toh (h->sbuf.reply.hdr.simple.magic);
> -  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
> +  switch (magic) {
> +  case NBD_SIMPLE_REPLY_MAGIC:
> +if (h->extended_headers)
> +  goto invalid;
>  SET_NEXT_STATE (%SIMPLE_REPLY.START);
> -  }
> -  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
> +break;
> +  case NBD_STRUCTURED_REPLY_MAGIC:
> +  case NBD_EXTENDED_REPLY_MAGIC:
> +if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)

Heh, I love this ((a == b) == c) "equivalence" form! :)

> +  goto invalid;
>  SET_NEXT_STATE (%STRUCTURED_REPLY.START);
> -  }
> -  else {
> -/* We've probably lost synchronization. */
> -SET_NEXT_STATE (%.DEAD);
> -set_error (0, "invalid reply magic 0x%" PRIx32,

Re: [PATCH v3 12/14] nbd/client: Request extended headers during negotiation

2023-06-01 Thread Vladimir Sementsov-Ogievskiy


On 31.05.23 23:26, Eric Blake wrote:

On Wed, May 31, 2023 at 09:33:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 31.05.23 20:54, Eric Blake wrote:

On Wed, May 31, 2023 at 08:39:53PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 15.05.23 22:53, Eric Blake wrote:

All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd


why must not? It should gracefully report ENOTSUP? Or not?


The kernel code does not yet know how to send extended requests; once
extended mode is negotiated, sending a simple request requires the


but how it could be negotiated if kernel doesn't support it?


That's the problem.  The kernel doesn't do the negotiation, userspace


oh yes, I totally forget these mechanics


does.  There is an ioctl for the userspace to tell the kernel what
flags were advertised as part of the negotiation, but that does not
include a flag for extended operation.  The kernel ONLY takes care of
NBD_CMD_ operations, it does not do NBD_OPT_ operations.  So when
qemu-nbd is preparing to connect to /dev/nbdN, it first has to
negotiate in userspace, avoiding any attempt to use extended headers,
then call the right ioctls for the kernel to take over command phase
using the older compact headers.






I mean if we request extended headers during negotiation with kernel, the kernel will 
just say "unsupported option", isn't it?


If we request extended headers in userspace before calling the ioctl
to tell the kernel to start transmission, then the kernel's first
command will use the compact style, which the server is not expecting,
and while we can hope the server will hang up on the kernel, I didn't
test what actually happens.




Or, in other words, I understand that kernel doesn't support it, I don't 
understand why you note it here. Is kernel different from other NBD server 
implementations which doesn't support extended requests at the moment?


The kernel is an NBD client, not a server.  But if we are about to
connect an NBD server over to the kernel for /dev/nbdN, we better make
sure the server is not using any features the kernel doesn't support.



thanks!

--
Best regards,
Vladimir

Re: [PATCH v3 0/6] block: add blk_io_plug_call() API

2023-06-01 Thread Kevin Wolf

Am 31.05.2023 um 21:50 hat Stefan Hajnoczi geschrieben:
> Hi Kevin,
> Do you want to review the thread-local blk_io_plug() patch series or
> should I merge it?

I haven't reviewed it in detail, but on the high level it looks good to
me, and you already got reviews for the details.

Acked-by: Kevin Wolf 


signature.asc
Description: PGP signature

82 matches

Mail list logo