Re: [Libguestfs] [PATCH v4 24/24] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:53AM -0500, Eric Blake wrote:
> Allow a client to request a subset of negotiated meta contexts.  For
> example, a client may ask to use a single connection to learn about
> both block status and dirty bitmaps, but where the dirty bitmap
> queries only need to be performed on a subset of the disk; forcing the
> server to compute that information on block status queries in the rest
> of the disk is wasted effort (both at the server, and on the amount of
> traffic sent over the wire to be parsed and ignored by the client).
> 
> Qemu as an NBD client never requests to use more than one meta
> context, so it has no need to use block status payloads.  Testing this
> instead requires support from libnbd, which CAN access multiple meta
> contexts in parallel from a single NBD connection; an interop test
> submitted to the libnbd project at the same time as this patch
> demonstrates the feature working, as well as testing some corner cases
> (for example, when the payload length is longer than the export
> length), although other corner cases (like passing the same id
> duplicated) requires a protocol fuzzer because libnbd is not wired up
> to break the protocol that badly.
> 
> This also includes tweaks to 'qemu-nbd --list' to show when a server
> is advertising the capability, and to the testsuite to reflect the
> addition to that output.
> 
> Signed-off-by: Eric Blake 
> ---

> +++ b/nbd/server.c
> @@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient 
> *client, bool no_zeroes,
>  if (client->mode >= NBD_MODE_STRUCTURED) {
>  myflags |= NBD_FLAG_SEND_DF;
>  }
> +if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
> +myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
> +}

This one's awkward.  At the last minute, I changed what went into
upstream NBD to state that a client that successfully negotiates
extended headers MUST NOT use NBD_OPT_EXPORT_NAME, which means this
branch should never fire for a compliant client.  I don't think it
hurts to leave this in, but it does point out that I am missing code
(either here or in 17/24) that at least logs when we detect a
non-compliant client trying to connect with the wrong command.
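
The check itself would look something like this (17/24 already adds an
equivalent rejection in nbd_negotiate_handle_export_name(); the open
question is whether we also want to log/trace the violation rather
than only failing the option):

    if (client->mode >= NBD_MODE_EXTENDED) {
        /* Non-compliant: extended headers were negotiated, yet the
         * client still sent NBD_OPT_EXPORT_NAME. */
        error_setg(errp, "Extended headers already negotiated");
        return -EINVAL;
    }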

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v4 23/24] nbd/server: Prepare for per-request filtering of BLOCK_STATUS

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:52AM -0500, Eric Blake wrote:
> The next commit will add support for the optional extension
> NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
> request that the server only return a subset of negotiated contexts,
> rather than all contexts.  To make that task easier, this patch
> populates the list of contexts to return on a per-command basis (for
> now, identical to the full set of negotiated contexts).
> 
> Signed-off-by: Eric Blake 
> ---
> 

> +++ b/nbd/server.c
> @@ -2491,6 +2491,8 @@ static int coroutine_fn 
> nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>  error_setg(errp, "No memory");
>  return -ENOMEM;
>  }
> +} else if (request->type == NBD_CMD_BLOCK_STATUS) {
> +request->contexts = &client->contexts;
>  }

There are paths where request->contexts is left NULL but request->type
was set...

> @@ -2841,6 +2848,11 @@ static coroutine_fn void nbd_trip(void *opaque)
>  } else {
>  ret = nbd_handle_request(client, &request, req->data, &local_err);
>  }
> +if (request.type == NBD_CMD_BLOCK_STATUS &&
> +request.contexts != &client->contexts) {
> +g_free(request.contexts->bitmaps);
> +g_free(request.contexts);
> +}

so I think this also needs to be tweaked to check that
request.contexts is non-NULL before dereferencing to free bitmaps.
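
Something along these lines (untested sketch) should cover it:

    if (request.type == NBD_CMD_BLOCK_STATUS && request.contexts &&
        request.contexts != &client->contexts) {
        /* Only free contexts allocated per-request by the payload
         * parser, never the client's long-lived negotiated set. */
        g_free(request.contexts->bitmaps);
        g_free(request.contexts);
    }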

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v4 18/24] nbd/client: Plumb errp through nbd_receive_replies

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:47AM -0500, Eric Blake wrote:
> Instead of ignoring the low-level error just to refabricate our own
> message to pass to the caller, we can just plump the caller's errp

plumb

(at least I got it right in the subject line...)

> down to the low level.
> 
> Signed-off-by: Eric Blake 
> ---
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v4 13/24] nbd/server: Refactor handling of request payload

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:42AM -0500, Eric Blake wrote:
> Upcoming additions to support NBD 64-bit effect lengths allow for the
> possibility to distinguish between payload length (capped at 32M) and
> effect length (up to 63 bits).  Without that extension, only the

Technically, the NBD spec does not (yet) document a cap at a 63-bit
size limit; I should probably propose a patch upstream that adds one
(I had such a patch in my draft work, but it hasn't yet made it
upstream).

> NBD_CMD_WRITE request has a payload; but with the extension, it makes
> sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload
> and effect length (where the payload is a limited-size struct that in
> turns gives the real effect length as well as a subset of known ids
> for which status is requested).  Other future NBD commands may also
> have a request payload, so the 64-bit extension introduces a new
> NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
> length is a payload length or an effect length, rather than
> hard-coding the decision based on the command.  Note that we do not
> support the payload version of BLOCK_STATUS yet.
> 
> For this patch, no semantic change is intended for a compliant client.
> For a non-compliant client, it is possible that the error behavior
> changes (a different message, a change on whether the connection is
> killed or remains alive for the next command, or so forth), in part
> because req->complete is set later on some paths, but all errors
> should still be handled gracefully.
> 
> Signed-off-by: Eric Blake 
> ---
> 
> v4: less indentation on several 'if's [Vladimir]
> ---
>  nbd/server.c | 76 ++--
>  nbd/trace-events |  1 +
>  2 files changed, 49 insertions(+), 28 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 4ac05d0cd7b..d7dc29f0445 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2329,6 +2329,8 @@ static int coroutine_fn 
> nbd_co_receive_request(NBDRequestData *req, NBDRequest *
> Error **errp)
>  {
>  NBDClient *client = req->client;
> +bool extended_with_payload;
> +unsigned payload_len = 0;
>  int valid_flags;
>  int ret;
> 
> @@ -2342,48 +2344,63 @@ static int coroutine_fn 
> nbd_co_receive_request(NBDRequestData *req, NBDRequest *
>  trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
>   nbd_cmd_lookup(request->type));
> 
> -if (request->type != NBD_CMD_WRITE) {
> -/* No payload, we are ready to read the next request.  */
> -req->complete = true;
> -}
> -
>  if (request->type == NBD_CMD_DISC) {
>  /* Special case: we're going to disconnect without a reply,
>   * whether or not flags, from, or len are bogus */
> +req->complete = true;
>  return -EIO;
>  }
> 
> -if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
> -request->type == NBD_CMD_CACHE)
> -{
> -if (request->len > NBD_MAX_BUFFER_SIZE) {
> -error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
> -   request->len, NBD_MAX_BUFFER_SIZE);
> -return -EINVAL;
> +/* Payload and buffer handling. */
> +extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
> +(request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
> +if ((request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
> + request->type == NBD_CMD_CACHE || extended_with_payload) &&
> +request->len > NBD_MAX_BUFFER_SIZE) {

I'm still debating whether to rewrite this series of slightly-different
'if's into a single switch (request->type) block; the benefit is that
all actions for one command would be localized instead of split over
multiple 'if's, while the drawback is that some code would have to be
duplicated across commands (such as the buffer allocation code for
CMD_READ and CMD_WRITE).
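
For illustration, the payload/length check alone might turn into
something like the following (just a sketch of the shape, not a full
conversion):

    switch (request->type) {
    case NBD_CMD_READ:
    case NBD_CMD_WRITE:
    case NBD_CMD_CACHE:
        /* Commands whose header length doubles as a buffer length. */
        if (request->len > NBD_MAX_BUFFER_SIZE) {
            error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
                       request->len, NBD_MAX_BUFFER_SIZE);
            return -EINVAL;
        }
        break;
    default:
        /* Other commands only carry a payload when
         * extended_with_payload is set; handle that separately. */
        break;
    }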

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v4 12/24] nbd: Prepare for 64-bit request effect lengths

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:41AM -0500, Eric Blake wrote:
> Widen the length field of NBDRequest to 64-bits, although we can
> assert that all current uses are still under 32 bits, because nothing
> ever puts us into NBD_MODE_EXTENDED yet.  Thus no semantic change.  No
> semantic change yet.

I should quit repeating myself ;) That last sentence is leftover from
a squash.

> 
> Signed-off-by: Eric Blake 
> ---
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 0/5] improve ahci test suite

2023-06-08 Thread John Snow
On Thu, Jun 8, 2023, 11:16 AM Niklas Cassel  wrote:

> From: Niklas Cassel 
>
> Hello John,
>
> This series should be applied on top of the series called:
> "[PATCH v2 0/8] misc AHCI cleanups"
> which can be found here:
> https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00038.html
>
> This series improves the ahci test suite to be in line with
> the AHCI specification when it comes to taskfile errors.
>
> Theoretically these commits could be squashed with the corresponding
> patch for the QEMU model, however, that would lose the commit log for
> the test patches, so I prefer to keep them separate.
>
> Please review.
>

We need to squash them, because we can't have any regressions that might
appear during git bisect.

Will test and review shortly.


>
> Kind regards,
> Niklas
>
> Niklas Cassel (5):
>   libqos/ahci: fix ahci_command_wait()
>   libqos/ahci: fix ahci_port_check_nonbusy()
>   libqos/ahci: simplify ahci_port_check_error()
>   libqos/ahci: improve ahci_port_check_error()
>   libqos/ahci: perform mandatory error recovery on error
>
>  tests/qtest/libqos/ahci.c | 106 --
>  tests/qtest/libqos/ahci.h |   8 ++-
>  2 files changed, 83 insertions(+), 31 deletions(-)
>
> --
> 2.40.1
>
>


[PATCH 4/5] libqos/ahci: improve ahci_port_check_error()

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

Improve ahci_port_check_error() to also assert that PxIS.TFES is set when
expecting errors.

Signed-off-by: Niklas Cassel 
---
 tests/qtest/libqos/ahci.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index 644ed7e79f..b216f61f14 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -409,13 +409,19 @@ void ahci_port_check_error(AHCIQState *ahci, AHCICommand 
*cmd)
 uint8_t port = cmd->port;
 uint32_t reg;
 
-/* The upper 9 bits of the IS register all indicate errors. */
-reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
-reg &= ~cmd->interrupts;
-reg >>= 23;
-g_assert_cmphex(reg, ==, 0);
+/* If expecting TF error, ensure that TFES is set. */
+if (cmd->errors) {
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+ASSERT_BIT_SET(reg, AHCI_PX_IS_TFES);
+} else {
+/* The upper 9 bits of the IS register all indicate errors. */
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+reg &= ~cmd->interrupts;
+reg >>= 23;
+g_assert_cmphex(reg, ==, 0);
+}
 
-/* The Sata Error Register should be empty. */
+/* The Sata Error Register should be empty, even when expecting TF error. 
*/
 reg = ahci_px_rreg(ahci, port, AHCI_PX_SERR);
 g_assert_cmphex(reg, ==, 0);
 
-- 
2.40.1




[PATCH 5/5] libqos/ahci: perform mandatory error recovery on error

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

When the HBA encounters an error, the host has to perform error
recovery (see AHCI 1.3.1 section 6.2.2.1) in order to be able to issue
new commands.
If we don't do this, all the commands that we queue will get aborted.

Some tests, e.g. test_atapi_tray(), call ahci_atapi_test_ready() with
ready == false, intentionally sending a command that will cause an error.

After test_atapi_tray() has called ahci_atapi_test_ready(), it directly
calls ahci_atapi_get_sense() to send a REQUEST SENSE command.

If we don't perform error recovery, the REQUEST SENSE command will get
aborted, and will not return the sense data that the test case expects.

Since the error recovery will clear PxCI, we also need to move the
ahci_port_check_nonbusy() call to before the error handling takes place.

Signed-off-by: Niklas Cassel 
---
 tests/qtest/libqos/ahci.c | 44 +--
 tests/qtest/libqos/ahci.h |  3 +--
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index b216f61f14..a2c94c6e06 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -425,6 +425,31 @@ void ahci_port_check_error(AHCIQState *ahci, AHCICommand 
*cmd)
 reg = ahci_px_rreg(ahci, port, AHCI_PX_SERR);
 g_assert_cmphex(reg, ==, 0);
 
+/* If expecting TF error, and TFES was set, perform error recovery
+ * (see AHCI 1.3 section 6.2.2.1) such that we can send new commands. */
+if (cmd->errors) {
+/* This will clear PxCI. */
+ahci_px_clr(ahci, port, AHCI_PX_CMD, AHCI_PX_CMD_ST);
+
+/* The port has 500ms to disengage. */
+usleep(50);
+reg = ahci_px_rreg(ahci, port, AHCI_PX_CMD);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_CMD_CR);
+
+/* Clear PxIS. */
+reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
+ahci_px_wreg(ahci, port, AHCI_PX_IS, reg);
+
+/* Check if we need to perform a COMRESET.
+ * Not implemented right now, as there is no reason why our QEMU model
+ * should need a COMRESET when expecting TF error. */
+reg = ahci_px_rreg(ahci, port, AHCI_PX_TFD);
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_BSY | AHCI_PX_TFD_STS_DRQ);
+
+/* Enable issuing new commands. */
+ahci_px_set(ahci, port, AHCI_PX_CMD, AHCI_PX_CMD_ST);
+}
+
 /* The TFD also has two error sections. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_TFD);
 if (!cmd->errors) {
@@ -436,17 +461,24 @@ void ahci_port_check_error(AHCIQState *ahci, AHCICommand 
*cmd)
 ASSERT_BIT_SET(reg, AHCI_PX_TFD_ERR & (cmd->errors << 8));
 }
 
-void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
-uint32_t intr_mask)
+void ahci_port_check_interrupts(AHCIQState *ahci, AHCICommand *cmd)
 {
+uint8_t port = cmd->port;
 uint32_t reg;
 
+/* If we expect errors, error handling in ahci_port_check_error() will
+ * already have cleared PxIS, so in that case this function cannot verify
+ * and clear expected interrupts. */
+if (cmd->errors) {
+return;
+}
+
 /* Check for expected interrupts */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
-ASSERT_BIT_SET(reg, intr_mask);
+ASSERT_BIT_SET(reg, cmd->interrupts);
 
 /* Clear expected interrupts and assert all interrupts now cleared. */
-ahci_px_wreg(ahci, port, AHCI_PX_IS, intr_mask);
+ahci_px_wreg(ahci, port, AHCI_PX_IS, cmd->interrupts);
 g_assert_cmphex(ahci_px_rreg(ahci, port, AHCI_PX_IS), ==, 0);
 }
 
@@ -1248,9 +1280,9 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand 
*cmd)
 uint8_t slot = cmd->slot;
 uint8_t port = cmd->port;
 
-ahci_port_check_error(ahci, cmd);
-ahci_port_check_interrupts(ahci, port, cmd->interrupts);
 ahci_port_check_nonbusy(ahci, cmd);
+ahci_port_check_error(ahci, cmd);
+ahci_port_check_interrupts(ahci, cmd);
 ahci_port_check_cmd_sanity(ahci, cmd);
 if (cmd->interrupts & AHCI_PX_IS_DHRS) {
 ahci_port_check_d2h_sanity(ahci, port, slot);
diff --git a/tests/qtest/libqos/ahci.h b/tests/qtest/libqos/ahci.h
index 137858d79c..48017864bf 100644
--- a/tests/qtest/libqos/ahci.h
+++ b/tests/qtest/libqos/ahci.h
@@ -591,8 +591,7 @@ void ahci_destroy_command(AHCIQState *ahci, uint8_t port, 
uint8_t slot);
 
 /* AHCI sanity check routines */
 void ahci_port_check_error(AHCIQState *ahci, AHCICommand *cmd);
-void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
-uint32_t intr_mask);
+void ahci_port_check_interrupts(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_nonbusy(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_d2h_sanity(AHCIQState *ahci, uint8_t port, uint8_t slot);
 void ahci_port_check_pio_sanity(AHCIQState *ahci, AHCICommand *cmd);
-- 
2.40.1




[PATCH 3/5] libqos/ahci: simplify ahci_port_check_error()

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

Modify ahci_port_check_error() to simply take a struct AHCICommand.
This way, the conditionals are in line with the existing code,
e.g. ahci_port_check_nonbusy(), which checks for cmd->errors.

This makes the code easier to reason about; we don't want to use
cmd->errors in some functions and emask in others.

No functional changes intended.

Signed-off-by: Niklas Cassel 
---
 tests/qtest/libqos/ahci.c | 14 +++---
 tests/qtest/libqos/ahci.h |  3 +--
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index 2d4981dae4..644ed7e79f 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -404,14 +404,14 @@ void ahci_port_clear(AHCIQState *ahci, uint8_t port)
 /**
  * Check a port for errors.
  */
-void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
-   uint32_t imask, uint8_t emask)
+void ahci_port_check_error(AHCIQState *ahci, AHCICommand *cmd)
 {
+uint8_t port = cmd->port;
 uint32_t reg;
 
 /* The upper 9 bits of the IS register all indicate errors. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_IS);
-reg &= ~imask;
+reg &= ~cmd->interrupts;
 reg >>= 23;
 g_assert_cmphex(reg, ==, 0);
 
@@ -421,13 +421,13 @@ void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
 
 /* The TFD also has two error sections. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_TFD);
-if (!emask) {
+if (!cmd->errors) {
 ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_STS_ERR);
 } else {
 ASSERT_BIT_SET(reg, AHCI_PX_TFD_STS_ERR);
 }
-ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR & (~emask << 8));
-ASSERT_BIT_SET(reg, AHCI_PX_TFD_ERR & (emask << 8));
+ASSERT_BIT_CLEAR(reg, AHCI_PX_TFD_ERR & (~cmd->errors << 8));
+ASSERT_BIT_SET(reg, AHCI_PX_TFD_ERR & (cmd->errors << 8));
 }
 
 void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
@@ -1242,7 +1242,7 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand 
*cmd)
 uint8_t slot = cmd->slot;
 uint8_t port = cmd->port;
 
-ahci_port_check_error(ahci, port, cmd->interrupts, cmd->errors);
+ahci_port_check_error(ahci, cmd);
 ahci_port_check_interrupts(ahci, port, cmd->interrupts);
 ahci_port_check_nonbusy(ahci, cmd);
 ahci_port_check_cmd_sanity(ahci, cmd);
diff --git a/tests/qtest/libqos/ahci.h b/tests/qtest/libqos/ahci.h
index 2234f46862..137858d79c 100644
--- a/tests/qtest/libqos/ahci.h
+++ b/tests/qtest/libqos/ahci.h
@@ -590,8 +590,7 @@ void ahci_set_command_header(AHCIQState *ahci, uint8_t port,
 void ahci_destroy_command(AHCIQState *ahci, uint8_t port, uint8_t slot);
 
 /* AHCI sanity check routines */
-void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
-   uint32_t imask, uint8_t emask);
+void ahci_port_check_error(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
 uint32_t intr_mask);
 void ahci_port_check_nonbusy(AHCIQState *ahci, AHCICommand *cmd);
-- 
2.40.1




[PATCH 2/5] libqos/ahci: fix ahci_port_check_nonbusy()

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

A command that has ERR_STAT set does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

Some tests, e.g. test_atapi_tray(), call ahci_atapi_test_ready() with
ready == false, intentionally sending a command that will cause an error.

Thus, ahci_port_check_nonbusy() cannot assume that PxCI and PxSACT will
always be cleared for the executed command.

Signed-off-by: Niklas Cassel 
---
 tests/qtest/libqos/ahci.c | 27 +--
 tests/qtest/libqos/ahci.h |  2 +-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index c1d571e477..2d4981dae4 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -444,17 +444,32 @@ void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t 
port,
 g_assert_cmphex(ahci_px_rreg(ahci, port, AHCI_PX_IS), ==, 0);
 }
 
-void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t port, uint8_t slot)
+void ahci_port_check_nonbusy(AHCIQState *ahci, AHCICommand *cmd)
 {
+uint8_t slot = cmd->slot;
+uint8_t port = cmd->port;
 uint32_t reg;
 
-/* Assert that the command slot is no longer busy (NCQ) */
+/* For NCQ command with error PxSACT bit should still be set.
+ * For NCQ command without error, PxSACT bit should be cleared.
+ * For non-NCQ command, PxSACT bit should always be cleared. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_SACT);
-ASSERT_BIT_CLEAR(reg, (1 << slot));
+if (cmd->props->ncq && cmd->errors) {
+ASSERT_BIT_SET(reg, (1 << slot));
+} else {
+ASSERT_BIT_CLEAR(reg, (1 << slot));
+}
 
-/* Non-NCQ */
+/* For non-NCQ command with error, PxCI bit should still be set.
+ * For non-NCQ command without error, PxCI bit should be cleared.
+ * For NCQ command without error, PxCI bit should be cleared.
+ * For NCQ command with error, PxCI bit may or may not be cleared. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_CI);
-ASSERT_BIT_CLEAR(reg, (1 << slot));
+if (!cmd->props->ncq && cmd->errors) {
+ASSERT_BIT_SET(reg, (1 << slot));
+} else if (!cmd->errors) {
+ASSERT_BIT_CLEAR(reg, (1 << slot));
+}
 
 /* And assert that we are generally not busy. */
 reg = ahci_px_rreg(ahci, port, AHCI_PX_TFD);
@@ -1229,7 +1244,7 @@ void ahci_command_verify(AHCIQState *ahci, AHCICommand 
*cmd)
 
 ahci_port_check_error(ahci, port, cmd->interrupts, cmd->errors);
 ahci_port_check_interrupts(ahci, port, cmd->interrupts);
-ahci_port_check_nonbusy(ahci, port, slot);
+ahci_port_check_nonbusy(ahci, cmd);
 ahci_port_check_cmd_sanity(ahci, cmd);
 if (cmd->interrupts & AHCI_PX_IS_DHRS) {
 ahci_port_check_d2h_sanity(ahci, port, slot);
diff --git a/tests/qtest/libqos/ahci.h b/tests/qtest/libqos/ahci.h
index 88835b6228..2234f46862 100644
--- a/tests/qtest/libqos/ahci.h
+++ b/tests/qtest/libqos/ahci.h
@@ -594,7 +594,7 @@ void ahci_port_check_error(AHCIQState *ahci, uint8_t port,
uint32_t imask, uint8_t emask);
 void ahci_port_check_interrupts(AHCIQState *ahci, uint8_t port,
 uint32_t intr_mask);
-void ahci_port_check_nonbusy(AHCIQState *ahci, uint8_t port, uint8_t slot);
+void ahci_port_check_nonbusy(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_d2h_sanity(AHCIQState *ahci, uint8_t port, uint8_t slot);
 void ahci_port_check_pio_sanity(AHCIQState *ahci, AHCICommand *cmd);
 void ahci_port_check_cmd_sanity(AHCIQState *ahci, AHCICommand *cmd);
-- 
2.40.1




[PATCH 0/5] improve ahci test suite

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

Hello John,

This series should be applied on top of the series called:
"[PATCH v2 0/8] misc AHCI cleanups"
which can be found here:
https://lists.nongnu.org/archive/html/qemu-block/2023-06/msg00038.html

This series improves the ahci test suite to be in line with
the AHCI specification when it comes to taskfile errors.

Theoretically these commits could be squashed with the corresponding
patch for the QEMU model, however, that would lose the commit log for
the test patches, so I prefer to keep them separate.

Please review.


Kind regards,
Niklas

Niklas Cassel (5):
  libqos/ahci: fix ahci_command_wait()
  libqos/ahci: fix ahci_port_check_nonbusy()
  libqos/ahci: simplify ahci_port_check_error()
  libqos/ahci: improve ahci_port_check_error()
  libqos/ahci: perform mandatory error recovery on error

 tests/qtest/libqos/ahci.c | 106 --
 tests/qtest/libqos/ahci.h |   8 ++-
 2 files changed, 83 insertions(+), 31 deletions(-)

-- 
2.40.1




[PATCH 1/5] libqos/ahci: fix ahci_command_wait()

2023-06-08 Thread Niklas Cassel
From: Niklas Cassel 

A command that has ERR_STAT set does not get to clear PxCI.
See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
and 5.3.16.5 ERR:FatalTaskfile.

Some tests, e.g. test_atapi_tray(), call ahci_atapi_test_ready() with
ready == false, intentionally sending a command that will cause an error.

Thus, ahci_command_wait() cannot simply wait indefinitely for PxCI to be
cleared; it also has to take ERR_STAT into account.

Signed-off-by: Niklas Cassel 
---
 tests/qtest/libqos/ahci.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/libqos/ahci.c b/tests/qtest/libqos/ahci.c
index f53f12aa99..c1d571e477 100644
--- a/tests/qtest/libqos/ahci.c
+++ b/tests/qtest/libqos/ahci.c
@@ -1207,9 +1207,10 @@ void ahci_command_wait(AHCIQState *ahci, AHCICommand 
*cmd)
 
 #define RSET(REG, MASK) (BITSET(ahci_px_rreg(ahci, cmd->port, (REG)), (MASK)))
 
-while (RSET(AHCI_PX_TFD, AHCI_PX_TFD_STS_BSY) ||
-   RSET(AHCI_PX_CI, 1 << cmd->slot) ||
-   (cmd->props->ncq && RSET(AHCI_PX_SACT, 1 << cmd->slot))) {
+while (!RSET(AHCI_PX_TFD, AHCI_PX_TFD_STS_ERR) &&
+   (RSET(AHCI_PX_TFD, AHCI_PX_TFD_STS_BSY) ||
+RSET(AHCI_PX_CI, 1 << cmd->slot) ||
+(cmd->props->ncq && RSET(AHCI_PX_SACT, 1 << cmd->slot)))) {
 usleep(50);
 }
 
-- 
2.40.1




Re: [Libguestfs] [PATCH v4 05/24] nbd: s/handle/cookie/ to match NBD spec

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:34AM -0500, Eric Blake wrote:
> Externally, libnbd exposed the 64-bit opaque marker for each client
> NBD packet as the "cookie", because it was less confusing when
> contrasted with 'struct nbd_handle *' holding all libnbd state.  It
> also avoids confusion between the nown 'handle' as a way to identify a

noun

> packet and the verb 'handle' for reacting to things like signals.
> Upstream NBD changed their spec to favor the name "cookie" based on
> libnbd's recommendations[1], so we can do likewise.
> 
> [1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b
> 
> Signed-off-by: Eric Blake 
> ---

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [Libguestfs] [PATCH v4 02/24] nbd: Consistent typedef usage in header

2023-06-08 Thread Eric Blake
On Thu, Jun 08, 2023 at 08:56:31AM -0500, Eric Blake wrote:
> We had a mix of struct declarataions followed by typedefs, and direct

declarations

> struct definitions as part of a typedef.  Pick a single style.  Also
> float a couple of opaque typedefs earlier in the file, as a later
> patch wants to refer NBDExport* in NBDRequest.  No semantic impact.

The curse of writing a commit message and then rebasing to a different
idea; in patch 22, I had originally intended to make NBDMetaContexts a
concrete type in nbd.h (which depends on NBDExport*, and would be
directly used in NBDRequest, which in turn is declared before the
pre-patch mention of NBDExport), but then changed my mind to instead
have NBDMetaContexts itself also be an opaque type with NBDRequest
only using NBDMetaContexts*.  And I missed floating the typedef for
NBDClientConnection to the same point, because we somewhat separated
opaque types along the lines of which .c files provide various
functions and opaque types.

> @@ -26,24 +26,25 @@
>  #include "qapi/error.h"
>  #include "qemu/bswap.h"
> 
> +typedef struct NBDExport NBDExport;
> +typedef struct NBDClient NBDClient;
> +

Preferences on how I should tweak that aspect of this patch?  Options:

- Don't float NBDExport or NBDClient, and drop that part of the commit
  message.  However, the later patch that adds the typedef for
  NBDMetaContexts still has to do it earlier than the definition of
  NBDRequest, rather than alongside the other opaque types relevant to
  server.c

- Also float NBDClientConnection up here (as sketched below), and reword the commit
  message along the lines of: Also float forward declarations of
  opaque types to the top of the file, rather than interspersed with
  function declarations, which will help a future patch that wants to
  expose yet another opaque type that will be referenced in
  NBDRequest.

- something else?
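
For reference, the second option would float all of the forward
declarations, so the top of nbd.h would end up looking roughly like:

    typedef struct NBDExport NBDExport;
    typedef struct NBDClient NBDClient;
    typedef struct NBDClientConnection NBDClientConnection;
    /* added later in the series, alongside the others: */
    typedef struct NBDMetaContexts NBDMetaContexts;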

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH v4 11/24] nbd: Add types for extended headers

2023-06-08 Thread Eric Blake
Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server, matching recent upstream nbd.git (through commit
e6f3b94a934).  This patch does not change any existing behavior, but
merely sets the stage for upcoming patches.

This patch does not change the status quo that neither the client nor
server use a packed-struct representation for the request header.
While most of the patch adds new types, there is also some churn for
renaming the existing NBDExtent to NBDExtent32 to contrast it with
NBDExtent64, which I thought was a nicer name than NBDExtentExt.

Signed-off-by: Eric Blake 
---

v4: Hoist earlier in series, tweak a few comments, defer docs/interop
change to when feature is actually turned on, NBDExtent rename, add
QEMU_BUG_BUILD_ON for sanity sake, hoist in block status payload bits
from v3 14/14; R-b dropped
---
 include/block/nbd.h | 124 +++-
 nbd/nbd-internal.h  |   3 +-
 block/nbd.c |   6 +--
 nbd/common.c|  12 -
 nbd/server.c|   6 +--
 5 files changed, 106 insertions(+), 45 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 52420660a65..f706e38dc72 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -59,7 +59,7 @@ typedef enum NBDMode {
 NBD_MODE_EXPORT_NAME,  /* newstyle but only OPT_EXPORT_NAME safe */
 NBD_MODE_SIMPLE,   /* newstyle but only simple replies */
 NBD_MODE_STRUCTURED,   /* newstyle, structured replies enabled */
-/* TODO add NBD_MODE_EXTENDED */
+NBD_MODE_EXTENDED, /* newstyle, extended headers enabled */
 } NBDMode;

 /* Transmission phase structs */
@@ -92,20 +92,36 @@ typedef struct NBDStructuredReplyChunk {
 uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+typedef struct NBDExtendedReplyChunk {
+uint32_t magic;  /* NBD_EXTENDED_REPLY_MAGIC */
+uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
+uint16_t type;   /* NBD_REPLY_TYPE_* */
+uint64_t cookie; /* request handle */
+uint64_t offset; /* request offset */
+uint64_t length; /* length of payload */
+} QEMU_PACKED NBDExtendedReplyChunk;
+
 typedef union NBDReply {
 NBDSimpleReply simple;
 NBDStructuredReplyChunk structured;
+NBDExtendedReplyChunk extended;
 struct {
 /*
- * @magic and @cookie fields have the same offset and size both in
- * simple reply and structured reply chunk, so let them be accessible
- * without ".simple." or ".structured." specification
+ * @magic and @cookie fields have the same offset and size in all
+ * forms of replies, so let them be accessible without ".simple.",
+ * ".structured.", or ".extended." specifications.
  */
 uint32_t magic;
 uint32_t _skip;
 uint64_t cookie;
-} QEMU_PACKED;
+};
 } NBDReply;
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, simple.cookie) !=
+  offsetof(NBDReply, cookie));
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, structured.cookie) !=
+  offsetof(NBDReply, cookie));
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, extended.cookie) !=
+  offsetof(NBDReply, cookie));

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
@@ -132,14 +148,34 @@ typedef struct NBDStructuredError {
 typedef struct NBDStructuredMeta {
 /* header's length >= 12 (at least one extent) */
 uint32_t context_id;
-/* extents follows */
+/* NBDExtent32 extents[] follows, array length implied by header */
 } QEMU_PACKED NBDStructuredMeta;

-/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
-typedef struct NBDExtent {
+/* Extent array element for NBD_REPLY_TYPE_BLOCK_STATUS */
+typedef struct NBDExtent32 {
 uint32_t length;
 uint32_t flags; /* NBD_STATE_* */
-} QEMU_PACKED NBDExtent;
+} QEMU_PACKED NBDExtent32;
+
+/* Header of NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtendedMeta {
+/* header's length >= 24 (at least one extent) */
+uint32_t context_id;
+uint32_t count; /* header length must be count * 16 + 8 */
+/* NBDExtent64 extents[count] follows */
+} QEMU_PACKED NBDExtendedMeta;
+
+/* Extent array element for NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtent64 {
+uint64_t length;
+uint64_t flags; /* NBD_STATE_* */
+} QEMU_PACKED NBDExtent64;
+
+/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */
+typedef struct NBDBlockStatusPayload {
+uint64_t effect_length;
+/* uint32_t ids[] follows, array length implied by header */
+} QEMU_PACKED NBDBlockStatusPayload;

 /* Transmission (export) flags: sent from server to client during handshake,
but describe what will happen during transmission */
@@ -157,20 +193,22 @@ enum {
 NBD_FLAG_SEND_RESIZE_BIT=  9, /* Send resize */
 NBD_FLAG_SEND_CACHE_BIT = 10, /* Send CACHE (prefetch) */
 

[PATCH v4 23/24] nbd/server: Prepare for per-request filtering of BLOCK_STATUS

2023-06-08 Thread Eric Blake
The next commit will add support for the optional extension
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).

Signed-off-by: Eric Blake 
---

v4: split out NBDMetaContexts refactoring to its own patch, track
NBDRequests.contexts as a pointer rather than inline
---
 include/block/nbd.h |  1 +
 nbd/server.c| 22 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index f240707f646..47850be5a66 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -76,6 +76,7 @@ typedef struct NBDRequest {
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type;  /* NBD_CMD_* */
 NBDMode mode;   /* Determines which network representation to use */
+NBDMetaContexts *contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 42a4300c95e..308846fe46b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2491,6 +2491,8 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req, NBDRequest *
 error_setg(errp, "No memory");
 return -ENOMEM;
 }
+} else if (request->type == NBD_CMD_BLOCK_STATUS) {
+request->contexts = &client->contexts;
 }

 if (payload_len) {
@@ -2716,11 +2718,11 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 }
 assert(client->mode >= NBD_MODE_EXTENDED ||
request->len <= UINT32_MAX);
-if (client->contexts.count) {
+if (request->contexts->count) {
 bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-int contexts_remaining = client->contexts.count;
+int contexts_remaining = request->contexts->count;

-if (client->contexts.base_allocation) {
+if (request->contexts->base_allocation) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from,
@@ -2733,7 +2735,7 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 }
 }

-if (client->contexts.allocation_depth) {
+if (request->contexts->allocation_depth) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from, request->len,
@@ -2746,8 +2748,9 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 }
 }

+assert(request->contexts->exp == client->exp);
 for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-if (!client->contexts.bitmaps[i]) {
+if (!request->contexts->bitmaps[i]) {
 continue;
 }
 ret = nbd_co_send_bitmap(client, request,
@@ -2763,6 +2766,10 @@ static coroutine_fn int nbd_handle_request(NBDClient 
*client,
 assert(!contexts_remaining);

 return 0;
+} else if (client->contexts.count) {
+return nbd_send_generic_reply(client, request, -EINVAL,
+  "CMD_BLOCK_STATUS payload not valid",
+  errp);
 } else {
 return nbd_send_generic_reply(client, request, -EINVAL,
   "CMD_BLOCK_STATUS not negotiated",
@@ -2841,6 +2848,11 @@ static coroutine_fn void nbd_trip(void *opaque)
 } else {
 ret = nbd_handle_request(client, &request, req->data, &local_err);
 }
+if (request.type == NBD_CMD_BLOCK_STATUS &&
+request.contexts != &client->contexts) {
+g_free(request.contexts->bitmaps);
+g_free(request.contexts);
+}
 if (ret < 0) {
 error_prepend(&local_err, "Failed to send reply: ");
 goto disconnect;
-- 
2.40.1




[PATCH v4 03/24] nbd/server: Prepare for alternate-size headers

2023-06-08 Thread Eric Blake
Upstream NBD now documents[1] an extension that supports 64-bit effect
lengths in requests.  As part of that extension, the size of the reply
headers will change in order to permit a 64-bit length in the reply
for symmetry[2].  Additionally, where the reply header is currently 16
bytes for a simple reply and 20 bytes for a structured reply, with the
extension enabled there will be only one extended reply header, of 32
bytes, with both structured and extended modes sending identical
payloads for chunked replies.

Since we are already wired up to use iovecs, it is easiest to allow
for this change in header size by splitting each structured reply
across multiple iovecs, one for the header (which will become wider in
a future patch according to client negotiation), and the other(s) for
the chunk payload, and removing the header from the payload struct
definitions.  Rename the affected functions with s/structured/chunk/
to make it obvious that the code will be reused in extended mode.

Interestingly, the client side code never utilized the packed types,
so only the server code needs to be updated.

[1] 
https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md
as of NBD commit e6f3b94a934

[2] Note that on the surface, this is because some future server might
permit a 4G+ NBD_CMD_READ and need to reply with that much data in one
transaction.  But even though the extended reply length is widened to
64 bits, for now the NBD spec is clear that servers will not reply
with more than a maximum payload bounded by the 32-bit
NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually
agree to transactions larger than 4G would require yet another
extension.

Signed-off-by: Eric Blake 
---

v4: hoist earlier in series, drop most changes to
nbd_co_send_simple_reply, pass niov to set_be_chunk, rename several
functions, drop R-b
---
 include/block/nbd.h |   8 +--
 nbd/server.c| 137 ++--
 nbd/trace-events|   8 +--
 3 files changed, 88 insertions(+), 65 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9c3ceae5ba5..e563f1774b0 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -96,28 +96,28 @@ typedef union NBDReply {

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
-NBDStructuredReplyChunk h; /* h.length >= 9 */
+/* header's .length >= 9 */
 uint64_t offset;
 /* At least one byte of data payload follows, calculated from h.length */
 } QEMU_PACKED NBDStructuredReadData;

 /* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE */
 typedef struct NBDStructuredReadHole {
-NBDStructuredReplyChunk h; /* h.length == 12 */
+/* header's length == 12 */
 uint64_t offset;
 uint32_t length;
 } QEMU_PACKED NBDStructuredReadHole;

 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
-NBDStructuredReplyChunk h; /* h.length >= 6 */
+/* header's length >= 6 */
 uint32_t error;
 uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;

 /* Header of NBD_REPLY_TYPE_BLOCK_STATUS */
 typedef struct NBDStructuredMeta {
-NBDStructuredReplyChunk h; /* h.length >= 12 (at least one extent) */
+/* header's length >= 12 (at least one extent) */
 uint32_t context_id;
 /* extents follows */
 } QEMU_PACKED NBDStructuredMeta;
diff --git a/nbd/server.c b/nbd/server.c
index febe001a399..6698ab46365 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2022 Red Hat, Inc.
+ *  Copyright Red Hat
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Server Side
@@ -1906,16 +1906,36 @@ static int coroutine_fn 
nbd_co_send_simple_reply(NBDClient *client,
 {.iov_base = data, .iov_len = len}
 };

+assert(!len || !nbd_err);
 trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
len);
 set_be_simple_reply(&reply, nbd_err, handle);

-return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
+return nbd_co_send_iov(client, iov, 2, errp);
 }

-static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
-uint16_t type, uint64_t handle, uint32_t 
length)
+/*
+ * Prepare the header of a reply chunk for network transmission.
+ *
+ * On input, @iov is partially initialized: iov[0].iov_base must point
+ * to an uninitialized NBDReply, while the remaining @niov elements
+ * (if any) must be ready for transmission.  This function then
+ * populates iov[0] for transmission.
+ */
+static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
+size_t niov, uint16_t flags, uint16_t type,
+uint64_t handle)
 {
+/* TODO - handle structured vs. extended replies */
+NBDStructuredReplyChunk *chunk = iov->iov_base;
+size_t i, length = 0;
+
+for (i = 1; i < niov; i++) {

[PATCH v4 15/24] nbd/server: Prepare to send extended header replies

2023-06-08 Thread Eric Blake
Although extended mode is not yet enabled, once we do turn it on, we
need to reply with extended headers to all messages.  Update the low
level entry points necessary so that all other callers automatically
get the right header based on the current mode.

Signed-off-by: Eric Blake 
---

v4: new patch, split out from v3 9/14
---
 nbd/server.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 119ac765f09..84c848a31d3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1947,8 +1947,6 @@ static inline void set_be_chunk(NBDClient *client, struct 
iovec *iov,
 size_t niov, uint16_t flags, uint16_t type,
 NBDRequest *request)
 {
-/* TODO - handle structured vs. extended replies */
-NBDStructuredReplyChunk *chunk = iov->iov_base;
 size_t i, length = 0;

 for (i = 1; i < niov; i++) {
@@ -1956,12 +1954,26 @@ static inline void set_be_chunk(NBDClient *client, 
struct iovec *iov,
 }
 assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData));

-iov[0].iov_len = sizeof(*chunk);
-stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-stw_be_p(&chunk->flags, flags);
-stw_be_p(&chunk->type, type);
-stq_be_p(&chunk->cookie, request->cookie);
-stl_be_p(&chunk->length, length);
+if (client->mode >= NBD_MODE_EXTENDED) {
+NBDExtendedReplyChunk *chunk = iov->iov_base;
+
+iov->iov_len = sizeof(*chunk);
+stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+stw_be_p(&chunk->flags, flags);
+stw_be_p(&chunk->type, type);
+stq_be_p(&chunk->cookie, request->cookie);
+stq_be_p(&chunk->offset, request->from);
+stq_be_p(&chunk->length, length);
+} else {
+NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+iov->iov_len = sizeof(*chunk);
+stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+stw_be_p(&chunk->flags, flags);
+stw_be_p(&chunk->type, type);
+stq_be_p(&chunk->cookie, request->cookie);
+stl_be_p(&chunk->length, length);
+}
 }

 static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
@@ -2478,6 +2490,8 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient 
*client,
 {
 if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) {
 return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+return nbd_co_send_chunk_done(client, request, errp);
 } else {
 return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
 NULL, 0, errp);
-- 
2.40.1




[PATCH v4 06/24] nbd/client: Simplify cookie vs. index computation

2023-06-08 Thread Eric Blake
Our code relies on a sentinel cookie value of zero for deciding when a
packet has been handled, as well as relying on array indices between 0
and MAX_NBD_REQUESTS-1 for dereferencing purposes.  As long as we can
symmetrically convert between two forms, there is no reason to go with
the odd choice of using XOR with a random pointer, when we can instead
simplify the mappings with a mere offset of 1.

Signed-off-by: Eric Blake 
---

v4: new patch
---
 block/nbd.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index be3c46c6fee..5322e66166c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -50,8 +50,8 @@
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS16

-#define COOKIE_TO_INDEX(bs, cookie) ((cookie) ^ (uint64_t)(intptr_t)(bs))
-#define INDEX_TO_COOKIE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
+#define COOKIE_TO_INDEX(cookie) ((cookie) - 1)
+#define INDEX_TO_COOKIE(index)  ((index) + 1)

 typedef struct {
 Coroutine *coroutine;
@@ -420,7 +420,7 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
 {
 int ret;
-uint64_t ind = COOKIE_TO_INDEX(s, cookie), ind2;
+uint64_t ind = COOKIE_TO_INDEX(cookie), ind2;
 QEMU_LOCK_GUARD(&s->receive_mutex);

 while (true) {
@@ -435,7 +435,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)
  * woken by whoever set s->reply.cookie (or never wait in this
  * yield). So, we should not wake it here.
  */
-ind2 = COOKIE_TO_INDEX(s, s->reply.cookie);
+ind2 = COOKIE_TO_INDEX(s->reply.cookie);
 assert(!s->requests[ind2].receiving);

 s->requests[ind].receiving = true;
@@ -468,7 +468,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
 }
-ind2 = COOKIE_TO_INDEX(s, s->reply.cookie);
+ind2 = COOKIE_TO_INDEX(s->reply.cookie);
 if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].coroutine) {
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
@@ -519,7 +519,7 @@ nbd_co_send_request(BlockDriverState *bs, NBDRequest 
*request,
 qemu_mutex_unlock(&s->requests_lock);

 qemu_co_mutex_lock(&s->send_mutex);
-request->cookie = INDEX_TO_COOKIE(s, i);
+request->cookie = INDEX_TO_COOKIE(i);

 assert(s->ioc);

@@ -832,7 +832,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 int *request_ret, QEMUIOVector *qiov, void **payload, Error **errp)
 {
 int ret;
-int i = COOKIE_TO_INDEX(s, cookie);
+int i = COOKIE_TO_INDEX(cookie);
 void *local_payload = NULL;
 NBDStructuredReplyChunk *chunk;

@@ -1038,7 +1038,7 @@ static bool coroutine_fn 
nbd_reply_chunk_iter_receive(BDRVNBDState *s,

 break_loop:
 qemu_mutex_lock(&s->requests_lock);
-s->requests[COOKIE_TO_INDEX(s, cookie)].coroutine = NULL;
+s->requests[COOKIE_TO_INDEX(cookie)].coroutine = NULL;
 s->in_flight--;
 qemu_co_queue_next(&s->free_sema);
 qemu_mutex_unlock(&s->requests_lock);
-- 
2.40.1




[PATCH v4 17/24] nbd/server: Enable initial support for extended headers

2023-06-08 Thread Eric Blake
Time to start supporting clients that request extended headers.  Now
we can finally reach the code added across several previous patches.

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or broken up over more than one data chunk), we are not
planning to take advantage of that, and continue to cap NBD_CMD_READ
to 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
patch took care of implementing the required
NBD_REPLY_TYPE_BLOCK_STATUS_EXT.

We do not yet support clients that want to do request payload
filtering of NBD_CMD_BLOCK_STATUS; that will be added in later
patches, but is not essential for qemu as a client since qemu only
requests the single context base:allocation.

Signed-off-by: Eric Blake 
---

v4: split out parts into earlier patches, rebase to earlier changes,
simplify handling of generic replies, retitle (compare to v3 9/14)
---
 docs/interop/nbd.txt |  1 +
 nbd/server.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a6..abaf4c28a96 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 8.1: NBD_OPT_EXTENDED_HEADERS
diff --git a/nbd/server.c b/nbd/server.c
index 3010ff0dca4..ae293663ca2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 [10 .. 133]   reserved (0) [unless no_zeroes]
  */
 trace_nbd_negotiate_handle_export_name();
+if (client->mode >= NBD_MODE_EXTENDED) {
+error_setg(errp, "Extended headers already negotiated");
+return -EINVAL;
+}
 if (client->optlen > NBD_MAX_STRING_SIZE) {
 error_setg(errp, "Bad length received");
 return -EINVAL;
@@ -1262,6 +1266,10 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
 case NBD_OPT_STRUCTURED_REPLY:
 if (length) {
 ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+"extended headers already negotiated");
 } else if (client->mode >= NBD_MODE_STRUCTURED) {
 ret = nbd_negotiate_send_rep_err(
 client, NBD_REP_ERR_INVALID, errp,
@@ -1278,6 +1286,19 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
  errp);
 break;

+case NBD_OPT_EXTENDED_HEADERS:
+if (length) {
+ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_INVALID, errp,
+"extended headers already negotiated");
+} else {
+ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+client->mode = NBD_MODE_EXTENDED;
+}
+break;
+
 default:
 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
"Unsupported option %" PRIu32 " (%s)",
-- 
2.40.1




[PATCH v4 05/24] nbd: s/handle/cookie/ to match NBD spec

2023-06-08 Thread Eric Blake
Externally, libnbd exposed the 64-bit opaque marker for each client
NBD packet as the "cookie", because it was less confusing when
contrasted with 'struct nbd_handle *' holding all libnbd state.  It
also avoids confusion between the nown 'handle' as a way to identify a
packet and the verb 'handle' for reacting to things like signals.
Upstream NBD changed their spec to favor the name "cookie" based on
libnbd's recommendations[1], so we can do likewise.

[1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b

Signed-off-by: Eric Blake 
---

v4: new patch
---
 include/block/nbd.h | 11 +++---
 block/nbd.c | 96 +++--
 nbd/client.c| 14 +++
 nbd/server.c| 29 +++---
 nbd/trace-events| 22 +--
 5 files changed, 87 insertions(+), 85 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index e563f1774b0..59db69bafa5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -58,7 +58,7 @@ typedef struct NBDOptionReplyMetaContext {
  * request and reply!
  */
 typedef struct NBDRequest {
-uint64_t handle;
+uint64_t cookie;
 uint64_t from;
 uint32_t len;
 uint16_t flags; /* NBD_CMD_FLAG_* */
@@ -68,7 +68,7 @@ typedef struct NBDRequest {
 typedef struct NBDSimpleReply {
 uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
 uint32_t error;
-uint64_t handle;
+uint64_t cookie;
 } QEMU_PACKED NBDSimpleReply;

 /* Header of all structured replies */
@@ -76,7 +76,7 @@ typedef struct NBDStructuredReplyChunk {
 uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
 uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
 uint16_t type;   /* NBD_REPLY_TYPE_* */
-uint64_t handle; /* request handle */
+uint64_t cookie; /* request handle */
 uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

@@ -84,13 +84,14 @@ typedef union NBDReply {
 NBDSimpleReply simple;
 NBDStructuredReplyChunk structured;
 struct {
-/* @magic and @handle fields have the same offset and size both in
+/*
+ * @magic and @cookie fields have the same offset and size both in
  * simple reply and structured reply chunk, so let them be accessible
  * without ".simple." or ".structured." specification
  */
 uint32_t magic;
 uint32_t _skip;
-uint64_t handle;
+uint64_t cookie;
 } QEMU_PACKED;
 } NBDReply;

diff --git a/block/nbd.c b/block/nbd.c
index 5aef5cb6bd5..be3c46c6fee 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1,8 +1,8 @@
 /*
- * QEMU Block driver for  NBD
+ * QEMU Block driver for NBD
  *
  * Copyright (c) 2019 Virtuozzo International GmbH.
- * Copyright (C) 2016 Red Hat, Inc.
+ * Copyright Red Hat
  * Copyright (C) 2008 Bull S.A.S.
  * Author: Laurent Vivier 
  *
@@ -50,8 +50,8 @@
 #define EN_OPTSTR ":exportname="
 #define MAX_NBD_REQUESTS16

-#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
-#define INDEX_TO_HANDLE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))
+#define COOKIE_TO_INDEX(bs, cookie) ((cookie) ^ (uint64_t)(intptr_t)(bs))
+#define INDEX_TO_COOKIE(bs, index)  ((index)  ^ (uint64_t)(intptr_t)(bs))

 typedef struct {
 Coroutine *coroutine;
@@ -417,25 +417,25 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 reconnect_delay_timer_del(s);
 }

-static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
+static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
 {
 int ret;
-uint64_t ind = HANDLE_TO_INDEX(s, handle), ind2;
+uint64_t ind = COOKIE_TO_INDEX(s, cookie), ind2;
 QEMU_LOCK_GUARD(&s->receive_mutex);

 while (true) {
-if (s->reply.handle == handle) {
+if (s->reply.cookie == cookie) {
 /* We are done */
 return 0;
 }

-if (s->reply.handle != 0) {
+if (s->reply.cookie != 0) {
 /*
  * Some other request is being handled now. It should already be
- * woken by whoever set s->reply.handle (or never wait in this
+ * woken by whoever set s->reply.cookie (or never wait in this
  * yield). So, we should not wake it here.
  */
-ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
+ind2 = COOKIE_TO_INDEX(s, s->reply.cookie);
 assert(!s->requests[ind2].receiving);

 s->requests[ind].receiving = true;
@@ -445,9 +445,9 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t handle)
 /*
  * We may be woken for 2 reasons:
  * 1. From this function, executing in parallel coroutine, when our
- *handle is received.
+ *cookie is received.
  * 2. From nbd_co_receive_one_chunk(), when previous request is
- *finished and s->reply.handle set to 0.
+ *

[PATCH v4 24/24] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

2023-06-08 Thread Eric Blake
Allow a client to request a subset of negotiated meta contexts.  For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).

Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads.  Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) requires a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.

Signed-off-by: Eric Blake 
---
 docs/interop/nbd.txt  |  2 +-
 nbd/server.c  | 99 ++-
 qemu-nbd.c|  1 +
 nbd/trace-events  |  1 +
 tests/qemu-iotests/223.out| 12 +--
 tests/qemu-iotests/307.out| 10 +-
 .../tests/nbd-qemu-allocation.out |  2 +-
 7 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index abaf4c28a96..83d85ce8d13 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.1: NBD_OPT_EXTENDED_HEADERS
+* 8.1: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/nbd/server.c b/nbd/server.c
index 308846fe46b..696afcf5c46 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
 stq_be_p(buf, client->exp->size);
 stw_be_p(buf + 8, myflags);
@@ -699,6 +702,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED &&
+(client->contexts.count || client->opt == NBD_OPT_INFO)) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
 stq_be_p(buf, exp->size);
 stw_be_p(buf + 8, myflags);
@@ -2424,6 +2431,81 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient 
*client,
 return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but payload consumed, return 0 with
+ * request->len and request->contexts->count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  On I/O
+ * error, return -EIO.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+ Error **errp)
+{
+int payload_len = request->len;
+g_autofree char *buf = NULL;
+size_t count, i, nr_bitmaps;
+uint32_t id;
+
+assert(request->len <= NBD_MAX_BUFFER_SIZE);
+assert(client->contexts.exp == client->exp);
+nr_bitmaps = client->exp->nr_export_bitmaps;
+request->contexts = g_new0(NBDMetaContexts, 1);
+request->contexts->exp = client->exp;
+
+if (payload_len % sizeof(uint32_t) ||
+payload_len < sizeof(NBDBlockStatusPayload) ||
+payload_len > (sizeof(NBDBlockStatusPayload) +
+   sizeof(id) * client->contexts.count)) {
+goto skip;
+}
+
+buf = g_malloc(payload_len);
+if (nbd_read(client->ioc, buf, payload_len,
+ "CMD_BLOCK_STATUS data", errp) < 0) {
+return -EIO;
+}
+trace_nbd_co_receive_request_payload_received(request->cookie,
+ 

[PATCH v4 19/24] nbd/client: Initial support for extended headers

2023-06-08 Thread Eric Blake
Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.
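
For quick reference, the two request layouts the client must now be able
to emit differ only in the width of the trailing length field (a sketch
using the sizes defined elsewhere in this series; the extended reply
chunk header grows analogously, which is why the parser below switches
on the magic number to pick a struct size):

  /* compact request, 28 bytes (NBD_REQUEST_SIZE):
   *   magic(4) flags(2) type(2) cookie(8) offset(8) length(4)
   * extended request, 32 bytes (NBD_EXTENDED_REQUEST_SIZE):
   *   magic(4) flags(2) type(2) cookie(8) offset(8) length(8)
   */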

Signed-off-by: Eric Blake 
---

v4: split off errp handling to separate patch [Vladimir], better
function naming [Vladimir]
---
 include/block/nbd.h |   3 +-
 block/nbd.c |   2 +-
 nbd/client.c| 100 +---
 nbd/trace-events|   3 +-
 4 files changed, 72 insertions(+), 36 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index dc05f5981fb..af80087e2cd 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -389,7 +389,8 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo 
*info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-   NBDReply *reply, Error **errp);
+   NBDReply *reply, NBDMode mode,
+   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index c17ce935f17..e281fac43d1 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -459,7 +459,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie,

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp);
+ret = nbd_receive_reply(s->bs, s->ioc, >reply, s->info.mode, errp);
 if (ret == 0) {
 ret = -EIO;
 error_setg(errp, "server dropped connection");
diff --git a/nbd/client.c b/nbd/client.c
index 1495a9b0ab1..a4598a95427 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1352,22 +1352,29 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+size_t len;

-assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
-assert(request->len <= UINT32_MAX);
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));

-stl_be_p(buf, NBD_REQUEST_MAGIC);
 stw_be_p(buf + 4, request->flags);
 stw_be_p(buf + 6, request->type);
 stq_be_p(buf + 8, request->cookie);
 stq_be_p(buf + 16, request->from);
-stl_be_p(buf + 24, request->len);
+if (request->mode >= NBD_MODE_EXTENDED) {
+stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
+stq_be_p(buf + 24, request->len);
+len = NBD_EXTENDED_REQUEST_SIZE;
+} else {
+assert(request->len <= UINT32_MAX);
+stl_be_p(buf, NBD_REQUEST_MAGIC);
+stl_be_p(buf + 24, request->len);
+len = NBD_REQUEST_SIZE;
+}

-return nbd_write(ioc, buf, sizeof(buf), NULL);
+return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1394,30 +1401,36 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, 
NBDSimpleReply *reply,
 return 0;
 }

-/* nbd_receive_structured_reply_chunk
+/* nbd_receive_reply_chunk_header
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-  NBDStructuredReplyChunk *chunk,
-  Error **errp)
+static int nbd_receive_reply_chunk_header(QIOChannel *ioc, NBDReply *chunk,
+  Error **errp)
 {
 int ret;
+size_t len;
+uint64_t payload_len;

-assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+len = sizeof(chunk->structured);
+} else {
+assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+len = sizeof(chunk->extended);
+}

 ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+   len - sizeof(chunk->magic), "structured chunk",
errp);
 if (ret < 0) {
 return ret;
 }

-chunk->flags = be16_to_cpu(chunk->flags);
-

[PATCH v4 16/24] nbd/server: Support 64-bit block status

2023-06-08 Thread Eric Blake
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits.  As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.

For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &num)
  -> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers).  This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64 bits all the way through when the
client understands that.
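
To make the helper pairing easier to follow, a minimal sketch (not a
verbatim excerpt) of how a sender is expected to pick the reply width
with the functions added below:

  NBDExtentArray *ea = nbd_extent_array_new(nb_extents, client->mode);
  /* ... fill via nbd_extent_array_add() ... */
  if (client->mode >= NBD_MODE_EXTENDED) {
      nbd_extent_array_convert_to_be(ea);   /* 64-bit wire extents */
      /* send ea->extents as NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
  } else {
      NBDExtent32 *narrow = nbd_extent_array_convert_to_narrow(ea);
      /* send narrow[] as NBD_REPLY_TYPE_BLOCK_STATUS, then g_free(narrow) */
  }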

Signed-off-by: Eric Blake 
---

v4: split conversion to big-endian across two helper functions rather
than in-place union [Vladimir]
---
 nbd/server.c | 104 ++-
 1 file changed, 78 insertions(+), 26 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 84c848a31d3..3010ff0dca4 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2111,20 +2111,24 @@ static int coroutine_fn 
nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-NBDExtent32 *extents;
+NBDExtent64 *extents;
 unsigned int nb_alloc;
 unsigned int count;
 uint64_t total_length;
+bool extended;
 bool can_add;
 bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+NBDMode mode)
 {
 NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

+assert(mode >= NBD_MODE_STRUCTURED);
 ea->nb_alloc = nb_alloc;
-ea->extents = g_new(NBDExtent32, nb_alloc);
+ea->extents = g_new(NBDExtent64, nb_alloc);
+ea->extended = mode >= NBD_MODE_EXTENDED;
 ea->can_add = true;

 return ea;
@@ -2143,15 +2147,36 @@ static void 
nbd_extent_array_convert_to_be(NBDExtentArray *ea)
 int i;

 assert(!ea->converted_to_be);
+assert(ea->extended);
 ea->can_add = false;
 ea->converted_to_be = true;

 for (i = 0; i < ea->count; i++) {
-ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
 }
 }

+/* Further modifications of the array after conversion are abandoned */
+static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea)
+{
+int i;
+NBDExtent32 *extents = g_new(NBDExtent32, ea->count);
+
+assert(!ea->converted_to_be);
+assert(!ea->extended);
+ea->can_add = false;
+ea->converted_to_be = true;
+
+for (i = 0; i < ea->count; i++) {
+assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX);
+extents[i].length = cpu_to_be32(ea->extents[i].length);
+extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+}
+
+return extents;
+}
+
 /*
  * Add extent to NBDExtentArray. If extent can't be added (no available space),
  * return -1.
@@ -2162,19 +2187,23 @@ static void 
nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-uint32_t length, uint32_t flags)
+uint64_t length, uint32_t flags)
 {
 assert(ea->can_add);

 if (!length) {
 return 0;
 }
+if (!ea->extended) {
+assert(length <= UINT32_MAX);
+}

 /* Extend previous extent if flags are the same */
 if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+uint64_t sum = length + ea->extents[ea->count - 1].length;

-if (sum <= UINT32_MAX) {
+assert(sum >= length);
+if (sum <= UINT32_MAX || ea->extended) {
 ea->extents[ea->count - 1].length = sum;
 ea->total_length += length;
 return 0;
@@ -2187,7 +2216,7 @@ static int nbd_extent_array_add(NBDExtentArray *ea,

[PATCH v4 04/24] nbd/server: Refactor to pass full request around

2023-06-08 Thread Eric Blake
Part of NBD's 64-bit headers extension involves passing the client's
requested offset back as part of the reply header (one reason it
stated for this change: converting absolute offsets stored in
NBD_REPLY_TYPE_OFFSET_DATA to relative offsets within the buffer is
easier if the absolute offset of the buffer is also available).  This
is a refactoring patch to pass the full request around the reply
stack, rather than just the handle, so that later patches can then
access request->from when extended headers are active.  Meanwhile,
this patch enables us to now assert that simple replies are only
attempted when appropriate, and otherwise has no semantic change.
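
The change is mechanical; a before/after sketch for one reply helper
(names taken from the hunks below):

  /* before: callers passed the bare 64-bit handle */
  nbd_co_send_chunk_done(client, request->handle, errp);
  /* after: callers pass the whole request, so a later patch can also
   * reach request->from when building extended replies */
  nbd_co_send_chunk_done(client, request, errp);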

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v4: reorder earlier in series, add assertion, keep R-b
---
 nbd/server.c | 114 ++-
 1 file changed, 59 insertions(+), 55 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 6698ab46365..26b27d69202 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1893,7 +1893,7 @@ static inline void set_be_simple_reply(NBDSimpleReply 
*reply, uint64_t error,
 }

 static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
- uint64_t handle,
+ NBDRequest *request,
  uint32_t error,
  void *data,
  size_t len,
@@ -1907,9 +1907,10 @@ static int coroutine_fn 
nbd_co_send_simple_reply(NBDClient *client,
 };

 assert(!len || !nbd_err);
-trace_nbd_co_send_simple_reply(handle, nbd_err, nbd_err_lookup(nbd_err),
-   len);
-set_be_simple_reply(, nbd_err, handle);
+assert(!client->structured_reply || request->type != NBD_CMD_READ);
+trace_nbd_co_send_simple_reply(request->handle, nbd_err,
+   nbd_err_lookup(nbd_err), len);
+set_be_simple_reply(, nbd_err, request->handle);

 return nbd_co_send_iov(client, iov, 2, errp);
 }
@@ -1924,7 +1925,7 @@ static int coroutine_fn 
nbd_co_send_simple_reply(NBDClient *client,
  */
 static inline void set_be_chunk(NBDClient *client, struct iovec *iov,
 size_t niov, uint16_t flags, uint16_t type,
-uint64_t handle)
+NBDRequest *request)
 {
 /* TODO - handle structured vs. extended replies */
 NBDStructuredReplyChunk *chunk = iov->iov_base;
@@ -1939,12 +1940,12 @@ static inline void set_be_chunk(NBDClient *client, 
struct iovec *iov,
 stl_be_p(>magic, NBD_STRUCTURED_REPLY_MAGIC);
 stw_be_p(>flags, flags);
 stw_be_p(>type, type);
-stq_be_p(>handle, handle);
+stq_be_p(>handle, request->handle);
 stl_be_p(>length, length);
 }

 static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
-   uint64_t handle,
+   NBDRequest *request,
Error **errp)
 {
 NBDReply hdr;
@@ -1952,15 +1953,15 @@ static int coroutine_fn 
nbd_co_send_chunk_done(NBDClient *client,
 {.iov_base = },
 };

-trace_nbd_co_send_chunk_done(handle);
+trace_nbd_co_send_chunk_done(request->handle);
 set_be_chunk(client, iov, 1, NBD_REPLY_FLAG_DONE,
- NBD_REPLY_TYPE_NONE, handle);
+ NBD_REPLY_TYPE_NONE, request);

 return nbd_co_send_iov(client, iov, 1, errp);
 }

 static int coroutine_fn nbd_co_send_chunk_read(NBDClient *client,
-   uint64_t handle,
+   NBDRequest *request,
uint64_t offset,
void *data,
size_t size,
@@ -1976,16 +1977,16 @@ static int coroutine_fn 
nbd_co_send_chunk_read(NBDClient *client,
 };

 assert(size);
-trace_nbd_co_send_chunk_read(handle, offset, data, size);
+trace_nbd_co_send_chunk_read(request->handle, offset, data, size);
 set_be_chunk(client, iov, 3, final ? NBD_REPLY_FLAG_DONE : 0,
- NBD_REPLY_TYPE_OFFSET_DATA, handle);
+ NBD_REPLY_TYPE_OFFSET_DATA, request);
 stq_be_p(, offset);

 return nbd_co_send_iov(client, iov, 3, errp);
 }
-
+/*ebb*/
 static int coroutine_fn nbd_co_send_chunk_error(NBDClient *client,
-uint64_t handle,
+NBDRequest *request,
 uint32_t error,
 const char *msg,
 Error **errp)
@@ -2000,10 +2001,10 @@ static int coroutine_fn 

[PATCH v4 13/24] nbd/server: Refactor handling of request payload

2023-06-08 Thread Eric Blake
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (up to 63 bits).  Without that extension, only the
NBD_CMD_WRITE request has a payload; but with the extension, it makes
sense to allow at least NBD_CMD_BLOCK_STATUS to have both a payload
and effect length (where the payload is a limited-size struct that in
turns gives the real effect length as well as a subset of known ids
for which status is requested).  Other future NBD commands may also
have a request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command.  Note that we do not
support the payload version of BLOCK_STATUS yet.

For this patch, no semantic change is intended for a compliant client.
For a non-compliant client, it is possible that the error behavior
changes (a different message, a change on whether the connection is
killed or remains alive for the next command, or so forth), in part
because req->complete is set later on some paths, but all errors
should still be handled gracefully.
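
The rule being implemented can be summarized in rough pseudo-C (a sketch
of the hunk below, not an exact excerpt):

  unsigned payload_len = 0;
  bool payload = client->mode >= NBD_MODE_EXTENDED &&
                 (request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
  if (request->type == NBD_CMD_WRITE || payload) {
      payload_len = request->len;   /* header length counts payload bytes */
      if (request->type != NBD_CMD_WRITE) {
          request->len = 0;         /* unsupported payload: skip it, keep
                                     * the connection alive */
      }
  }
  /* otherwise request->len is a pure effect length with no payload */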

Signed-off-by: Eric Blake 
---

v4: less indentation on several 'if's [Vladimir]
---
 nbd/server.c | 76 ++--
 nbd/trace-events |  1 +
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 4ac05d0cd7b..d7dc29f0445 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2329,6 +2329,8 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req, NBDRequest *
Error **errp)
 {
 NBDClient *client = req->client;
+bool extended_with_payload;
+unsigned payload_len = 0;
 int valid_flags;
 int ret;

@@ -2342,48 +2344,63 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req, NBDRequest *
 trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
  nbd_cmd_lookup(request->type));

-if (request->type != NBD_CMD_WRITE) {
-/* No payload, we are ready to read the next request.  */
-req->complete = true;
-}
-
 if (request->type == NBD_CMD_DISC) {
 /* Special case: we're going to disconnect without a reply,
  * whether or not flags, from, or len are bogus */
+req->complete = true;
 return -EIO;
 }

-if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
-request->type == NBD_CMD_CACHE)
-{
-if (request->len > NBD_MAX_BUFFER_SIZE) {
-error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
-   request->len, NBD_MAX_BUFFER_SIZE);
-return -EINVAL;
+/* Payload and buffer handling. */
+extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
+(request->flags & NBD_CMD_FLAG_PAYLOAD_LEN);
+if ((request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
+ request->type == NBD_CMD_CACHE || extended_with_payload) &&
+request->len > NBD_MAX_BUFFER_SIZE) {
+error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
+   request->len, NBD_MAX_BUFFER_SIZE);
+return -EINVAL;
+}
+
+if (request->type == NBD_CMD_WRITE || extended_with_payload) {
+payload_len = request->len;
+if (request->type != NBD_CMD_WRITE) {
+/*
+ * For now, we don't support payloads on other
+ * commands; but we can keep the connection alive.
+ */
+request->len = 0;
+} else if (client->mode >= NBD_MODE_EXTENDED &&
+   !extended_with_payload) {
+/* The client is noncompliant. Trace it, but proceed. */
+trace_nbd_co_receive_ext_payload_compliance(request->from,
+request->len);
 }
+}

-if (request->type != NBD_CMD_CACHE) {
-req->data = blk_try_blockalign(client->exp->common.blk,
-   request->len);
-if (req->data == NULL) {
-error_setg(errp, "No memory");
-return -ENOMEM;
-}
+if (request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ) {
+req->data = blk_try_blockalign(client->exp->common.blk,
+   request->len);
+if (req->data == NULL) {
+error_setg(errp, "No memory");
+return -ENOMEM;
 }
 }

-if (request->type == NBD_CMD_WRITE) {
-assert(request->len <= NBD_MAX_BUFFER_SIZE);
-if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
- errp) < 0)
-{
+if (payload_len) {
+if (req->data) {
+ret 

[PATCH v4 07/24] nbd/client: Add safety check on chunk payload length

2023-06-08 Thread Eric Blake
Our existing use of structured replies either reads into a qiov capped
at 32M (NBD_CMD_READ) or caps allocation to 1000 bytes (see
NBD_MAX_MALLOC_PAYLOAD in block/nbd.c).  But the existing length
checks are rather late; if we encounter a buggy (or malicious) server
that sends a super-large payload length, we should drop the connection
right then rather than assuming the layer on top will be careful.
This becomes more important when we permit 64-bit lengths which are
even more likely to have the potential for attempted denial of service
abuse.
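
For concreteness, the bound used below works out to roughly the 32 MiB
READ cap plus the 8-byte offset prefix of an OFFSET_DATA chunk (the
exact constants are whatever NBD_MAX_BUFFER_SIZE and
NBDStructuredReadData define); a sketch with the numbers spelled out:

  if (chunk->length > 32 * 1024 * 1024 + 8) {
      /* no compliant server sends this much in one chunk; treat it as
       * fatal instead of attempting the allocation */
  }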

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v4: sink this later in series [Vladimir]
---
 nbd/client.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/nbd/client.c b/nbd/client.c
index ea3590ca3d0..1b5569556fe 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1413,6 +1413,18 @@ static int nbd_receive_structured_reply_chunk(QIOChannel 
*ioc,
 chunk->cookie = be64_to_cpu(chunk->cookie);
 chunk->length = be32_to_cpu(chunk->length);

+/*
+ * Because we use BLOCK_STATUS with REQ_ONE, and cap READ requests
+ * at 32M, no valid server should send us payload larger than
+ * this.  Even if we stopped using REQ_ONE, sane servers will cap
+ * the number of extents they return for block status.
+ */
+if (chunk->length > NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData)) {
+error_setg(errp, "server chunk %" PRIu32 " (%s) payload is too long",
+   chunk->type, nbd_rep_lookup(chunk->type));
+return -EINVAL;
+}
+
 return 0;
 }

-- 
2.40.1




[PATCH v4 21/24] nbd/client: Request extended headers during negotiation

2023-06-08 Thread Eric Blake
All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them, but expects us to do the negotiation in userspace before handing
the socket over to the kernel), but there is no harm in all other
clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for them gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.
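
The resulting client-side negotiation order, sketched (matching the
nbd_start_negotiate() hunk below; the qemu-nbd kernel-client path
presumably caps its requested mode at NBD_MODE_STRUCTURED, per the
paragraph above):

  /* 1. try NBD_OPT_EXTENDED_HEADERS     -> on ACK, NBD_MODE_EXTENDED
   * 2. else try NBD_OPT_STRUCTURED_REPLY -> on ACK, NBD_MODE_STRUCTURED
   * 3. else                              -> NBD_MODE_SIMPLE
   */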

Signed-off-by: Eric Blake 
---

v4: rebase to earlier changes, tweak commit message for why qemu-nbd
connection to /dev/nbd cannot use extended mode [Vladimir]
---
 nbd/client-connection.c   |  2 +-
 nbd/client.c  | 20 ++-
 qemu-nbd.c|  3 +++
 tests/qemu-iotests/223.out|  6 ++
 tests/qemu-iotests/233.out|  4 
 tests/qemu-iotests/241.out|  3 +++
 tests/qemu-iotests/307.out|  5 +
 .../tests/nbd-qemu-allocation.out |  1 +
 8 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 13e4cb6684b..d9d946da006 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .do_negotiation = do_negotiation,

 .initial_info.request_sizes = true,
-.initial_info.mode = NBD_MODE_STRUCTURED,
+.initial_info.mode = NBD_MODE_EXTENDED,
 .initial_info.base_allocation = true,
 .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
 .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index a4598a95427..99c0e5c8114 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -958,15 +958,23 @@ static int nbd_start_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 if (fixedNewStyle) {
 int result = 0;

+if (max_mode >= NBD_MODE_EXTENDED) {
+result = nbd_request_simple_option(ioc,
+   NBD_OPT_EXTENDED_HEADERS,
+   false, errp);
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_EXTENDED;
+}
+}
 if (max_mode >= NBD_MODE_STRUCTURED) {
 result = nbd_request_simple_option(ioc,
NBD_OPT_STRUCTURED_REPLY,
false, errp);
-if (result < 0) {
-return -EINVAL;
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_STRUCTURED;
 }
 }
-return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE;
+return NBD_MODE_SIMPLE;
 } else {
 return NBD_MODE_EXPORT_NAME;
 }
@@ -1040,6 +1048,7 @@ int nbd_receive_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 }

 switch (info->mode) {
+case NBD_MODE_EXTENDED:
 case NBD_MODE_STRUCTURED:
 if (base_allocation) {
 result = nbd_negotiate_simple_meta_context(ioc, info, errp);
@@ -1150,7 +1159,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,

 *info = NULL;
 result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, ,
- NBD_MODE_STRUCTURED, NULL, errp);
+ NBD_MODE_EXTENDED, NULL, errp);
 if (tlscreds && sioc) {
 ioc = sioc;
 }
@@ -1161,6 +1170,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 switch ((NBDMode)result) {
 case NBD_MODE_SIMPLE:
 case NBD_MODE_STRUCTURED:
+case NBD_MODE_EXTENDED:
 /* newstyle - use NBD_OPT_LIST to populate array, then try
  * NBD_OPT_INFO on each array member. If structured replies
  * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1197,7 +1207,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 break;
 }

-if (result == NBD_MODE_STRUCTURED &&
+if (result >= NBD_MODE_STRUCTURED &&
 nbd_list_meta_contexts(ioc, [i], errp) < 0) {
 goto out;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 3ddd0bf02b4..1d155fc2c66 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -238,6 +238,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, 
QCryptoTLSCreds *tls,
 printf("  opt block: %u\n", list[i].opt_block);
 printf("  max block: %u\n", list[i].max_block);
 }
+printf("  transaction size: %s\n",
+ 

[PATCH v4 18/24] nbd/client: Plumb errp through nbd_receive_replies

2023-06-08 Thread Eric Blake
Instead of ignoring the low-level error just to refabricate our own
message to pass to the caller, we can just plumb the caller's errp
down to the low level.
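
The visible effect is only on the error text; roughly (an illustration
assuming the strings in the hunks below):

  /* before: low-level detail discarded, caller refabricates:
   *   "Connection closed"
   * after: caller merely prepends context to whatever the lower layer set:
   *   "Connection closed: server dropped connection"
   */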

Signed-off-by: Eric Blake 
---

v4: new patch [Vladimir]
---
 block/nbd.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 57123c17f94..c17ce935f17 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -417,7 +417,8 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 reconnect_delay_timer_del(s);
 }

-static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
+static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie,
+Error **errp)
 {
 int ret;
 uint64_t ind = COOKIE_TO_INDEX(cookie), ind2;
@@ -458,9 +459,12 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, >reply, NULL);
-if (ret <= 0) {
-ret = ret ? ret : -EIO;
+ret = nbd_receive_reply(s->bs, s->ioc, >reply, errp);
+if (ret == 0) {
+ret = -EIO;
+error_setg(errp, "server dropped connection");
+}
+if (ret < 0) {
 nbd_channel_error(s, ret);
 return ret;
 }
@@ -843,9 +847,9 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }
 *request_ret = 0;

-ret = nbd_receive_replies(s, cookie);
+ret = nbd_receive_replies(s, cookie, errp);
 if (ret < 0) {
-error_setg(errp, "Connection closed");
+error_prepend(errp, "Connection closed: ");
 return -EIO;
 }
 assert(s->ioc);
-- 
2.40.1




[PATCH v4 10/24] nbd/client: Pass mode through to nbd_send_request

2023-06-08 Thread Eric Blake
Once the 64-bit headers extension is enabled, the data layout we send
over the wire for a client request depends on the mode negotiated with
the server.  Rather than adding a parameter to nbd_send_request, we
can add a member to struct NBDRequest, since it already does not
reflect on-wire format.  Some callers initialize it directly; many
others rely on a common initialization point during
nbd_co_send_request().  At this point, there is no semantic change.
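
A caller-side sketch (illustrative, not a verbatim hunk) of the intended
usage once the member exists:

  NBDRequest request = {
      .type = NBD_CMD_READ,
      .from = offset,
      .len  = bytes,
      .mode = s->info.mode,  /* negotiated mode selects the wire layout
                              * inside nbd_send_request() later on */
  };
  nbd_send_request(s->ioc, &request);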

Signed-off-by: Eric Blake 
---

v4: new patch, based on ideas in v3 4/14, but by modifying NBDRequest
instead of adding a parameter
---
 include/block/nbd.h | 12 +++-
 block/nbd.c |  5 +++--
 nbd/client.c|  3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index fea69ac24bb..52420660a65 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -62,17 +62,19 @@ typedef enum NBDMode {
 /* TODO add NBD_MODE_EXTENDED */
 } NBDMode;

-/* Transmission phase structs
- *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+/* Transmission phase structs */
+
+/*
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
 typedef struct NBDRequest {
 uint64_t cookie;
 uint64_t from;
 uint32_t len;
 uint16_t flags; /* NBD_CMD_FLAG_* */
-uint16_t type; /* NBD_CMD_* */
+uint16_t type;  /* NBD_CMD_* */
+NBDMode mode;   /* Determines which network representation to use */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/block/nbd.c b/block/nbd.c
index 5f88f7a819b..ca5991f868a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -339,7 +339,7 @@ int coroutine_fn 
nbd_co_do_establish_connection(BlockDriverState *bs,
  * We have connected, but must fail for other reasons.
  * Send NBD_CMD_DISC as a courtesy to the server.
  */
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = s->info.mode };

 nbd_send_request(s->ioc, );

@@ -521,6 +521,7 @@ nbd_co_send_request(BlockDriverState *bs, NBDRequest 
*request,

 qemu_co_mutex_lock(>send_mutex);
 request->cookie = INDEX_TO_COOKIE(i);
+request->mode = s->info.mode;

 assert(s->ioc);

@@ -1466,7 +1467,7 @@ static void nbd_yank(void *opaque)
 static void nbd_client_close(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = s->info.mode };

 if (s->ioc) {
 nbd_send_request(s->ioc, );
diff --git a/nbd/client.c b/nbd/client.c
index faa054c4527..40a1eb72346 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1224,7 +1224,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 /* Send NBD_CMD_DISC as a courtesy to the server, but ignore all
  * errors now that we have the information we wanted. */
 if (nbd_drop(ioc, 124, NULL) == 0) {
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = result };

 nbd_send_request(ioc, );
 }
@@ -1354,6 +1354,7 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
 uint8_t buf[NBD_REQUEST_SIZE];

+assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));
-- 
2.40.1




[PATCH v4 01/24] nbd/client: Use smarter assert

2023-06-08 Thread Eric Blake
Assigning strlen() to a uint32_t and then asserting that it isn't too
large doesn't catch the case of an input string 4G in length.
Thankfully, the incoming strings can never be that large: if the
export name or query is reflecting a string the client got from the
server, we already guarantee that we dropped the NBD connection if the
server sent more than 32M in a single reply to our NBD_OPT_* request;
if the export name is coming from qemu, nbd_receive_negotiate()
asserted that strlen(info->name) <= NBD_MAX_STRING_SIZE; and
similarly, a query string via x->dirty_bitmap coming from the user was
bounds-checked in either qemu-nbd or by the limitations of QMP.
Still, it doesn't hurt to be more explicit in how we write our
assertions to not have to analyze whether inadvertent wraparound is
possible.
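
For illustration, the failure mode being closed off (with a hypothetical
oversized input that real callers never produce):

  /* old pattern: a 4 GiB + 10 byte name wraps to 10 when truncated to
   * 32 bits, so the assertion cannot fire */
  uint32_t export_len = strlen(export);
  assert(export_len <= NBD_MAX_STRING_SIZE);

  /* new pattern: bound the scan itself; the later truncation is then
   * provably safe */
  assert(strnlen(export, NBD_MAX_STRING_SIZE + 1) <= NBD_MAX_STRING_SIZE);
  export_len = strlen(export);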

Fixes: 93676c88 ("nbd: Don't send oversize strings", v4.2.0)
Reported-by: Dr. David Alan Gilbert 
Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 30d5383cb19..ff75722e487 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -650,19 +650,20 @@ static int nbd_send_meta_query(QIOChannel *ioc, uint32_t 
opt,
Error **errp)
 {
 int ret;
-uint32_t export_len = strlen(export);
+uint32_t export_len;
 uint32_t queries = !!query;
 uint32_t query_len = 0;
 uint32_t data_len;
 char *data;
 char *p;

+assert(strnlen(export, NBD_MAX_STRING_SIZE + 1) <= NBD_MAX_STRING_SIZE);
+export_len = strlen(export);
 data_len = sizeof(export_len) + export_len + sizeof(queries);
-assert(export_len <= NBD_MAX_STRING_SIZE);
 if (query) {
+assert(strnlen(query, NBD_MAX_STRING_SIZE + 1) <= NBD_MAX_STRING_SIZE);
 query_len = strlen(query);
 data_len += sizeof(query_len) + query_len;
-assert(query_len <= NBD_MAX_STRING_SIZE);
 } else {
 assert(opt == NBD_OPT_LIST_META_CONTEXT);
 }
-- 
2.40.1




[PATCH v4 12/24] nbd: Prepare for 64-bit request effect lengths

2023-06-08 Thread Eric Blake
Widen the length field of NBDRequest to 64 bits, although we can
assert that all current uses are still under 32 bits, because nothing
ever puts us into NBD_MODE_EXTENDED yet.  Thus no semantic change yet.
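
One place the widening is visible is the BLOCK_STATUS clamp; a sketch of
the new shape (see the block/nbd.c hunk below):

  request.len = MIN(bytes, s->info.size - offset);      /* now 64-bit */
  if (s->info.mode < NBD_MODE_EXTENDED) {
      /* the old 32-bit wire format still needs the INT_MAX-aligned clamp */
      request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
                        request.len);
  }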

Signed-off-by: Eric Blake 
---

v4: split off enum changes to earlier patches [Vladimir]
---
 include/block/nbd.h |  4 ++--
 block/nbd.c | 25 +++--
 nbd/client.c|  1 +
 nbd/server.c| 11 ---
 nbd/trace-events|  8 
 5 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index f706e38dc72..dc05f5981fb 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -70,8 +70,8 @@ typedef enum NBDMode {
  */
 typedef struct NBDRequest {
 uint64_t cookie;
-uint64_t from;
-uint32_t len;
+uint64_t from;  /* Offset touched by the command */
+uint64_t len;   /* Effect length; 32 bit limit without extended headers */
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type;  /* NBD_CMD_* */
 NBDMode mode;   /* Determines which network representation to use */
diff --git a/block/nbd.c b/block/nbd.c
index c7581794873..57123c17f94 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1306,10 +1306,11 @@ nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset, int64_t bytes,
 NBDRequest request = {
 .type = NBD_CMD_WRITE_ZEROES,
 .from = offset,
-.len = bytes,  /* .len is uint32_t actually */
+.len = bytes,
 };

-assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+/* rely on max_pwrite_zeroes */
+assert(bytes <= UINT32_MAX || s->info.mode >= NBD_MODE_EXTENDED);

 assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
 if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1356,10 +1357,11 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int64_t bytes)
 NBDRequest request = {
 .type = NBD_CMD_TRIM,
 .from = offset,
-.len = bytes, /* len is uint32_t */
+.len = bytes,
 };

-assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+/* rely on max_pdiscard */
+assert(bytes <= UINT32_MAX || s->info.mode >= NBD_MODE_EXTENDED);

 assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
 if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1381,8 +1383,7 @@ static int coroutine_fn GRAPH_RDLOCK 
nbd_client_co_block_status(
 NBDRequest request = {
 .type = NBD_CMD_BLOCK_STATUS,
 .from = offset,
-.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-   MIN(bytes, s->info.size - offset)),
+.len = MIN(bytes, s->info.size - offset),
 .flags = NBD_CMD_FLAG_REQ_ONE,
 };

@@ -1392,6 +1393,10 @@ static int coroutine_fn GRAPH_RDLOCK 
nbd_client_co_block_status(
 *file = bs;
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
+if (s->info.mode < NBD_MODE_EXTENDED) {
+request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+  request.len);
+}

 /*
  * Work around the fact that the block layer doesn't do
@@ -1956,6 +1961,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;

+/*
+ * Assume that if the server supports extended headers, it also
+ * supports unlimited size zero and trim commands.
+ */
+if (s->info.mode >= NBD_MODE_EXTENDED) {
+bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+}
+
 if (s->info.opt_block &&
 s->info.opt_block > bs->bl.opt_transfer) {
 bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client.c b/nbd/client.c
index 40a1eb72346..1495a9b0ab1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1355,6 +1355,7 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 uint8_t buf[NBD_REQUEST_SIZE];

 assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
+assert(request->len <= UINT32_MAX);
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index 9b16f7e5405..4ac05d0cd7b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1439,7 +1439,7 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 request->type   = lduw_be_p(buf + 6);
 request->cookie = ldq_be_p(buf + 8);
 request->from   = ldq_be_p(buf + 16);
-request->len= ldl_be_p(buf + 24);
+request->len= ldl_be_p(buf + 24); /* widen 32 to 64 bits */

 trace_nbd_receive_request(magic, request->flags, request->type,
   request->from, request->len);
@@ -2357,7 +2357,7 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req, NBDRequest *
 request->type == 

[PATCH v4 09/24] nbd: Replace bool structured_reply with mode enum

2023-06-08 Thread Eric Blake
The upcoming patches for 64-bit extensions require various points in
the protocol to make decisions based on what was negotiated.  While we
could easily add a 'bool extended_headers' alongside the existing
'bool structured_reply', this does not scale well if more modes are
added in the future.  Better is to expose the mode enum added in the
previous patch out to a wider use in the code base.

Where the code previously checked for structured_reply being set or
clear, it now prefers checking for an inequality; this works because
the modes are in a continuum of increasing abilities, and allows us to
touch fewer places if we ever insert other modes in the middle of the
enum.  There should be no semantic change in this patch.
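
The conversion is mechanical; a representative before/after
(illustrative):

  /* before */ if (client->structured_reply) { ... }
  /* after  */ if (client->mode >= NBD_MODE_STRUCTURED) { ... }
  /* the ordered comparison keeps working if another mode is later
   * inserted into the middle of the enum */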

Signed-off-by: Eric Blake 
---

v4: new patch, expanding enum idea from v3 4/14
---
 include/block/nbd.h |  2 +-
 block/nbd.c |  8 +---
 nbd/client-connection.c |  4 ++--
 nbd/client.c| 18 +-
 nbd/server.c| 27 +++
 qemu-nbd.c  |  4 +++-
 6 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index aba4279b56c..fea69ac24bb 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -304,7 +304,7 @@ typedef struct NBDExportInfo {

 /* In-out fields, set by client before nbd_receive_negotiate() and
  * updated by server results during nbd_receive_negotiate() */
-bool structured_reply;
+NBDMode mode; /* input maximum mode tolerated; output actual mode chosen */
 bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS 
*/

 /* Set by server results during nbd_receive_negotiate() and
diff --git a/block/nbd.c b/block/nbd.c
index 5322e66166c..5f88f7a819b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -464,7 +464,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)
 nbd_channel_error(s, ret);
 return ret;
 }
-if (nbd_reply_is_structured(>reply) && !s->info.structured_reply) {
+if (nbd_reply_is_structured(>reply) &&
+s->info.mode < NBD_MODE_STRUCTURED) {
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
 }
@@ -867,7 +868,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }

 /* handle structured reply chunk */
-assert(s->info.structured_reply);
+assert(s->info.mode >= NBD_MODE_STRUCTURED);
 chunk = >reply.structured;

 if (chunk->type == NBD_REPLY_TYPE_NONE) {
@@ -1071,7 +1072,8 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
cookie,
 void *payload = NULL;
 Error *local_err = NULL;

-NBD_FOREACH_REPLY_CHUNK(s, iter, cookie, s->info.structured_reply,
+NBD_FOREACH_REPLY_CHUNK(s, iter, cookie,
+s->info.mode >= NBD_MODE_STRUCTURED,
 qiov, , )
 {
 int ret;
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 3d14296c042..13e4cb6684b 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Block driver for  NBD
+ * QEMU Block driver for NBD
  *
  * Copyright (c) 2021 Virtuozzo International GmbH.
  *
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .do_negotiation = do_negotiation,

 .initial_info.request_sizes = true,
-.initial_info.structured_reply = true,
+.initial_info.mode = NBD_MODE_STRUCTURED,
 .initial_info.base_allocation = true,
 .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
 .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index 479208d5d9d..faa054c4527 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -880,7 +880,7 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
QCryptoTLSCreds *tlscreds,
const char *hostname, QIOChannel **outioc,
-   bool structured_reply, bool *zeroes,
+   NBDMode max_mode, bool *zeroes,
Error **errp)
 {
 ERRP_GUARD();
@@ -958,7 +958,7 @@ static int nbd_start_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 if (fixedNewStyle) {
 int result = 0;

-if (structured_reply) {
+if (max_mode >= NBD_MODE_STRUCTURED) {
 result = nbd_request_simple_option(ioc,
NBD_OPT_STRUCTURED_REPLY,
false, errp);
@@ -1028,20 +1028,19 @@ int nbd_receive_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 trace_nbd_receive_negotiate_name(info->name);

 result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
-   

[PATCH v4 22/24] nbd/server: Refactor list of negotiated meta contexts

2023-06-08 Thread Eric Blake
Perform several minor refactorings of how the list of negotiated meta
contexts is managed, to make upcoming patches easier: Promote the
internal type NBDExportMetaContexts to the public opaque type
NBDMetaContexts, and mark exp const.  Use a shorter member name in
NBDClient.  Hoist calls to nbd_check_meta_export() earlier in their
callers, as the number of negotiated contexts may impact the flags
exposed in regards to an export, which in turn requires a new
parameter.  Drop a redundant parameter to nbd_negotiate_meta_queries.
No semantic change intended.
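
The net shape of the client state after the refactor, sketched (member
names from the hunks below):

  struct NBDClient {
      /* ... */
      NBDMetaContexts contexts;  /* was: NBDExportMetaContexts export_meta */
  };
  /* nbd_check_meta_export(client, exp) now runs before the export flags
   * are computed, so a later patch can make those flags depend on
   * contexts.count */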

Signed-off-by: Eric Blake 
---

v4: new patch split out from v3 13/14, with smaller impact (quit
trying to separate exp outside of NBDMetaContexts)
---
 include/block/nbd.h |  1 +
 nbd/server.c| 55 -
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index af80087e2cd..f240707f646 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -28,6 +28,7 @@

 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
+typedef struct NBDMetaContexts NBDMetaContexts;

 extern const BlockExportDriver blk_exp_nbd;

diff --git a/nbd/server.c b/nbd/server.c
index ae293663ca2..42a4300c95e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,11 +105,13 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
+/*
+ * NBDMetaContexts represents a list of meta contexts in use,
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-NBDExport *exp;
+ * NBD_OPT_LIST_META_CONTEXT.
+ */
+struct NBDMetaContexts {
+const NBDExport *exp; /* associated export */
 size_t count; /* number of negotiated contexts */
 bool base_allocation; /* export base:allocation context (block status) */
 bool allocation_depth; /* export qemu:allocation-depth */
@@ -117,7 +119,7 @@ typedef struct NBDExportMetaContexts {
 * export qemu:dirty-bitmap:,
 * sized by exp->nr_export_bitmaps
 */
-} NBDExportMetaContexts;
+};

 struct NBDClient {
 int refcount;
@@ -144,7 +146,7 @@ struct NBDClient {
 uint32_t check_align; /* If non-zero, check for aligned client requests */

 NBDMode mode;
-NBDExportMetaContexts export_meta;
+NBDMetaContexts contexts; /* Negotiated meta contexts */

 uint32_t opt; /* Current option being negotiated */
 uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -455,10 +457,10 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
Error **errp)
 return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-if (client->exp != client->export_meta.exp) {
-client->export_meta.count = 0;
+if (exp != client->contexts.exp) {
+client->contexts.count = 0;
 }
 }

@@ -504,6 +506,7 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 error_setg(errp, "export not found");
 return -EINVAL;
 }
+nbd_check_meta_export(client, client->exp);

 myflags = client->exp->nbdflags;
 if (client->mode >= NBD_MODE_STRUCTURED) {
@@ -521,7 +524,6 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,

 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 blk_exp_ref(>exp->common);
-nbd_check_meta_export(client);

 return 0;
 }
@@ -641,6 +643,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
   errp, "export '%s' not present",
   sane_name);
 }
+if (client->opt == NBD_OPT_GO) {
+nbd_check_meta_export(client, exp);
+}

 /* Don't bother sending NBD_INFO_NAME unless client requested it */
 if (sendname) {
@@ -729,7 +734,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 client->check_align = check_align;
 QTAILQ_INSERT_TAIL(>exp->clients, client, next);
 blk_exp_ref(>exp->common);
-nbd_check_meta_export(client);
 rc = 1;
 }
 return rc;
@@ -852,7 +856,7 @@ static bool nbd_strshift(const char **str, const char 
*prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+static bool nbd_meta_base_query(NBDClient *client, NBDMetaContexts *meta,
 const char *query)
 {
 if (!nbd_strshift(, "base:")) {
@@ -872,7 +876,7 @@ static bool nbd_meta_base_query(NBDClient *client, 
NBDExportMetaContexts 

[PATCH v4 02/24] nbd: Consistent typedef usage in header

2023-06-08 Thread Eric Blake
We had a mix of struct declarations followed by typedefs, and direct
struct definitions as part of a typedef.  Pick a single style.  Also
float a couple of opaque typedefs earlier in the file, as a later
patch wants to refer to NBDExport* in NBDRequest.  No semantic impact.

Signed-off-by: Eric Blake 
---

v4: new patch
---
 include/block/nbd.h | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a4c98169c39..9c3ceae5ba5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2022 Red Hat, Inc.
+ *  Copyright Red Hat
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device
@@ -26,24 +26,25 @@
 #include "qapi/error.h"
 #include "qemu/bswap.h"

+typedef struct NBDExport NBDExport;
+typedef struct NBDClient NBDClient;
+
 extern const BlockExportDriver blk_exp_nbd;

 /* Handshake phase structs - this struct is passed on the wire */

-struct NBDOption {
+typedef struct NBDOption {
 uint64_t magic; /* NBD_OPTS_MAGIC */
 uint32_t option; /* NBD_OPT_* */
 uint32_t length;
-} QEMU_PACKED;
-typedef struct NBDOption NBDOption;
+} QEMU_PACKED NBDOption;

-struct NBDOptionReply {
+typedef struct NBDOptionReply {
 uint64_t magic; /* NBD_REP_MAGIC */
 uint32_t option; /* NBD_OPT_* */
 uint32_t type; /* NBD_REP_* */
 uint32_t length;
-} QEMU_PACKED;
-typedef struct NBDOptionReply NBDOptionReply;
+} QEMU_PACKED NBDOptionReply;

 typedef struct NBDOptionReplyMetaContext {
 NBDOptionReply h; /* h.type = NBD_REP_META_CONTEXT, h.length > 4 */
@@ -56,14 +57,13 @@ typedef struct NBDOptionReplyMetaContext {
  * Note: these are _NOT_ the same as the network representation of an NBD
  * request and reply!
  */
-struct NBDRequest {
+typedef struct NBDRequest {
 uint64_t handle;
 uint64_t from;
 uint32_t len;
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type; /* NBD_CMD_* */
-};
-typedef struct NBDRequest NBDRequest;
+} NBDRequest;

 typedef struct NBDSimpleReply {
 uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
@@ -282,7 +282,7 @@ static inline bool nbd_reply_type_is_error(int type)
 #define NBD_ESHUTDOWN  108

 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
-struct NBDExportInfo {
+typedef struct NBDExportInfo {
 /* Set by client before nbd_receive_negotiate() */
 bool request_sizes;
 char *x_dirty_bitmap;
@@ -310,8 +310,7 @@ struct NBDExportInfo {
 char *description;
 int n_contexts;
 char **contexts;
-};
-typedef struct NBDExportInfo NBDExportInfo;
+} NBDExportInfo;

 int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
   QCryptoTLSCreds *tlscreds,
@@ -330,9 +329,6 @@ int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);

-typedef struct NBDExport NBDExport;
-typedef struct NBDClient NBDClient;
-
 void nbd_export_set_on_eject_blk(BlockExport *exp, BlockBackend *blk);

 AioContext *nbd_export_aio_context(NBDExport *exp);
-- 
2.40.1




[PATCH v4 20/24] nbd/client: Accept 64-bit block status chunks

2023-06-08 Thread Eric Blake
Once extended mode is enabled, we need to accept 64-bit status replies
(even for replies that don't exceed a 32-bit length).  It is easier to
normalize narrow replies into wide format so that the rest of our code
only has to handle one width.  Although a server is non-compliant if
it sends a 64-bit reply in compact mode, or a 32-bit reply in extended
mode, it is still easy enough to tolerate these mismatches.

In normal execution, we are only requesting "base:allocation" which
never exceeds 32 bits for flag values. But during testing with
x-dirty-bitmap, we can force qemu to connect to some other context
that might have 64-bit status bits; however, we ignore those upper bits
(other than mapping qemu:allocation-depth into something that
'qemu-img map --output=json' can expose), and since that only affects
testing, we really don't bother with checking whether more than the
two least-significant bits are set.
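
For reference, a sketch of the two reply payload layouts that get
normalized into NBDExtent64 (sizes in bytes, derived from the parsing
code below):

  /* NBD_REPLY_TYPE_BLOCK_STATUS (compact):
   *   context_id(4)  { length(4) flags(4) } ...
   * NBD_REPLY_TYPE_BLOCK_STATUS_EXT (extended):
   *   context_id(4) count(4)  { length(8) flags(8) } ...
   */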

Signed-off-by: Eric Blake 
---

v4: tweak comments and error message about count mismatch, fix setting
of wide in loop [Vladimir]
---
 block/nbd.c| 47 --
 block/trace-events |  1 +
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index e281fac43d1..74c0a9d3b8c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -614,13 +614,16 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
  NBDStructuredReplyChunk *chunk,
- uint8_t *payload, uint64_t 
orig_length,
- NBDExtent32 *extent, Error **errp)
+ uint8_t *payload, bool wide,
+ uint64_t orig_length,
+ NBDExtent64 *extent, Error **errp)
 {
 uint32_t context_id;
+uint32_t count;
+size_t len = wide ? sizeof(*extent) : sizeof(NBDExtent32);

 /* The server succeeded, so it must have sent [at least] one extent */
-if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+if (chunk->length < sizeof(context_id) + wide * sizeof(count) + len) {
 error_setg(errp, "Protocol error: invalid payload for "
  "NBD_REPLY_TYPE_BLOCK_STATUS");
 return -EINVAL;
@@ -635,8 +638,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 return -EINVAL;
 }

-extent->length = payload_advance32();
-extent->flags = payload_advance32();
+if (wide) {
+count = payload_advance32();
+extent->length = payload_advance64();
+extent->flags = payload_advance64();
+} else {
+count = 0;
+extent->length = payload_advance32();
+extent->flags = payload_advance32();
+}

 if (extent->length == 0) {
 error_setg(errp, "Protocol error: server sent status chunk with "
@@ -671,13 +681,16 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 /*
  * We used NBD_CMD_FLAG_REQ_ONE, so the server should not have
  * sent us any more than one extent, nor should it have included
- * status beyond our request in that extent. However, it's easy
- * enough to ignore the server's noncompliance without killing the
+ * status beyond our request in that extent. Furthermore, a wide
+ * server should have replied with an accurate count (we left
+ * count at 0 for a narrow server).  However, it's easy enough to
+ * ignore the server's noncompliance without killing the
  * connection; just ignore trailing extents, and clamp things to
  * the length of our request.
  */
-if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
-trace_nbd_parse_blockstatus_compliance("more than one extent");
+if (count != wide ||
+chunk->length > sizeof(context_id) + wide * sizeof(count) + len) {
+trace_nbd_parse_blockstatus_compliance("unexpected extent count");
 }
 if (extent->length > orig_length) {
 extent->length = orig_length;
@@ -1123,7 +1136,7 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
cookie,

 static int coroutine_fn
 nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
- uint64_t length, NBDExtent32 *extent,
+ uint64_t length, NBDExtent64 *extent,
  int *request_ret, Error **errp)
 {
 NBDReplyChunkIter iter;
@@ -1136,11 +1149,17 @@ nbd_co_receive_blockstatus_reply(BDRVNBDState *s, 
uint64_t cookie,
 NBD_FOREACH_REPLY_CHUNK(s, iter, cookie, false, NULL, , ) {
 int ret;
 NBDStructuredReplyChunk *chunk = 
+bool wide;

 assert(nbd_reply_is_structured());

 switch (chunk->type) {
+case NBD_REPLY_TYPE_BLOCK_STATUS_EXT:
 case NBD_REPLY_TYPE_BLOCK_STATUS:
+wide = chunk->type == NBD_REPLY_TYPE_BLOCK_STATUS_EXT;

[PATCH v4 14/24] nbd/server: Prepare to receive extended header requests

2023-06-08 Thread Eric Blake
Although extended mode is not yet enabled, once we do turn it on, we
need to accept extended requests for all messages.  Previous patches
have already taken care of supporting 64-bit lengths; now we just need
to read them off the wire.

Note that this implementation will block indefinitely on a buggy
client that sends a non-extended payload (that is, we try to read a
full packet before we ever check the magic number, but a client that
mistakenly sends a simple request after negotiating extended headers
doesn't send us enough bytes), but it's no different from any other
client that stops talking to us partway through a packet and thus not
worth coding around.

Signed-off-by: Eric Blake 
---

v4: new patch, split out from v3 9/14
---
 nbd/nbd-internal.h |  5 -
 nbd/server.c   | 43 ++-
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 133b1d94b50..dfa02f77ee4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -34,8 +34,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE  (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index d7dc29f0445..119ac765f09 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1413,11 +1413,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t 
size, Error **errp)
 static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest 
*request,
 Error **errp)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
-uint32_t magic;
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+uint32_t magic, expect;
 int ret;
+size_t size = client->mode >= NBD_MODE_EXTENDED ?
+NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;

-ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+ret = nbd_read_eof(client, buf, size, errp);
 if (ret < 0) {
 return ret;
 }
@@ -1425,13 +1427,21 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 return -EIO;
 }

-/* Request
-   [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-   [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-   [ 6 ..  7]   type(NBD_CMD_READ, ...)
-   [ 8 .. 15]   cookie
-   [16 .. 23]   from
-   [24 .. 27]   len
+/*
+ * Compact request
+ *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 27]   len
+ * Extended request
+ *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 31]   len
  */

 magic = ldl_be_p(buf);
@@ -1439,13 +1449,20 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 request->type   = lduw_be_p(buf + 6);
 request->cookie = ldq_be_p(buf + 8);
 request->from   = ldq_be_p(buf + 16);
-request->len= ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+if (client->mode >= NBD_MODE_EXTENDED) {
+request->len = ldq_be_p(buf + 24);
+expect = NBD_EXTENDED_REQUEST_MAGIC;
+} else {
+request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+expect = NBD_REQUEST_MAGIC;
+}

 trace_nbd_receive_request(magic, request->flags, request->type,
   request->from, request->len);

-if (magic != NBD_REQUEST_MAGIC) {
-error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
+if (magic != expect) {
+error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%"
+   PRIx32 ")", magic, expect);
 return -EINVAL;
 }
 return 0;
-- 
2.40.1




[PATCH v4 08/24] nbd: Use enum for various negotiation modes

2023-06-08 Thread Eric Blake
Deciphering the hard-coded list of integer return values from
nbd_start_negotiate() will only get more confusing when adding support
for 64-bit extended headers.  Better is to name things in an enum.
Although the function in question is private to client.c, putting the
enum in a public header and including an enum-to-string conversion
will allow its use in more places in upcoming patches.

The enum is intentionally laid out so that comparison operators like
<= can group multiple modes with similar characteristics, and so that
the least powerful mode has value 0, even though this patch does not
yet exploit that.  No semantic change intended.
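
For illustration only (this snippet is not part of the patch), the
ordering allows later code to group capabilities with a single
comparison, along these hypothetical lines:

    if (mode >= NBD_MODE_STRUCTURED) {
        /* structured replies, and anything newer, are available */
    }
    if (mode <= NBD_MODE_EXPORT_NAME) {
        /* only NBD_OPT_EXPORT_NAME is safe to send */
    }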

Signed-off-by: Eric Blake 
---

v4: new patch, expanding enum idea from v3 4/14
---
 include/block/nbd.h | 11 +++
 nbd/client.c| 46 -
 nbd/common.c| 17 +
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 59db69bafa5..aba4279b56c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -52,6 +52,16 @@ typedef struct NBDOptionReplyMetaContext {
 /* metadata context name follows */
 } QEMU_PACKED NBDOptionReplyMetaContext;

+/* Track results of negotiation */
+typedef enum NBDMode {
+/* Keep this list in a continuum of increasing features. */
+NBD_MODE_OLDSTYLE, /* server lacks newstyle negotiation */
+NBD_MODE_EXPORT_NAME,  /* newstyle but only OPT_EXPORT_NAME safe */
+NBD_MODE_SIMPLE,   /* newstyle but only simple replies */
+NBD_MODE_STRUCTURED,   /* newstyle, structured replies enabled */
+/* TODO add NBD_MODE_EXTENDED */
+} NBDMode;
+
 /* Transmission phase structs
  *
  * Note: these are _NOT_ the same as the network representation of an NBD
@@ -404,6 +414,7 @@ const char *nbd_rep_lookup(uint32_t rep);
 const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 const char *nbd_err_lookup(int err);
+const char *nbd_mode_lookup(NBDMode mode);

 /* nbd/client-connection.c */
 typedef struct NBDClientConnection NBDClientConnection;
diff --git a/nbd/client.c b/nbd/client.c
index 1b5569556fe..479208d5d9d 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -875,10 +875,7 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
  * Start the handshake to the server.  After a positive return, the server
  * is ready to accept additional NBD_OPT requests.
  * Returns: negative errno: failure talking to server
- *  0: server is oldstyle, must call nbd_negotiate_finish_oldstyle
- *  1: server is newstyle, but can only accept EXPORT_NAME
- *  2: server is newstyle, but lacks structured replies
- *  3: server is newstyle and set up for structured replies
+ *  non-negative: enum NBDMode describing server abilities
  */
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
QCryptoTLSCreds *tlscreds,
@@ -969,16 +966,16 @@ static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
 return -EINVAL;
 }
 }
-return 2 + result;
+return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE;
 } else {
-return 1;
+return NBD_MODE_EXPORT_NAME;
 }
 } else if (magic == NBD_CLIENT_MAGIC) {
 if (tlscreds) {
 error_setg(errp, "Server does not support STARTTLS");
 return -EINVAL;
 }
-return 0;
+return NBD_MODE_OLDSTYLE;
 } else {
 error_setg(errp, "Bad server magic received: 0x%" PRIx64, magic);
 return -EINVAL;
@@ -1032,6 +1029,9 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,

 result = nbd_start_negotiate(aio_context, ioc, tlscreds, hostname, outioc,
  info->structured_reply, , errp);
+if (result < 0) {
+return result;
+}

 info->structured_reply = false;
 info->base_allocation = false;
@@ -1039,8 +1039,8 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
 ioc = *outioc;
 }

-switch (result) {
-case 3: /* newstyle, with structured replies */
+switch ((NBDMode)result) {
+case NBD_MODE_STRUCTURED:
 info->structured_reply = true;
 if (base_allocation) {
 result = nbd_negotiate_simple_meta_context(ioc, info, errp);
@@ -1050,7 +1050,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
 info->base_allocation = result == 1;
 }
 /* fall through */
-case 2: /* newstyle, try OPT_GO */
+case NBD_MODE_SIMPLE:
 /* Try NBD_OPT_GO first - if it works, we are done (it
  * also gives us a good message if the server requires
  * TLS).  If it is not available, fall back to
@@ -1073,7 +1073,7 @@ int nbd_receive_negotiate(AioContext *aio_context, QIOChannel *ioc,
 return 

[PATCH v4 00/24] qemu patches for 64-bit NBD extensions

2023-06-08 Thread Eric Blake
v3 was here:
https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg03607.html

Since then, I've incorporated lots of feedback from Vladimir:
 - split several patches into smaller pieces
 - use an enum to track various negotiation modes
 - reorder a few patches
 - several new patches (2, 5, 6)
 - other cleanups

001/24:[] [--] 'nbd/client: Use smarter assert'
002/24:[down] 'nbd: Consistent typedef usage in header'
003/24:[0137] [FC] 'nbd/server: Prepare for alternate-size headers'
004/24:[0093] [FC] 'nbd/server: Refactor to pass full request around'
005/24:[down] 'nbd: s/handle/cookie/ to match NBD spec'
006/24:[down] 'nbd/client: Simplify cookie vs. index computation'
007/24:[] [-C] 'nbd/client: Add safety check on chunk payload length'
008/24:[down] 'nbd: Use enum for various negotiation modes'
009/24:[down] 'nbd: Replace bool structured_reply with mode enum'
010/24:[down] 'nbd/client: Pass mode through to nbd_send_request'
011/24:[0096] [FC] 'nbd: Add types for extended headers'
012/24:[0118] [FC] 'nbd: Prepare for 64-bit request effect lengths'
013/24:[0071] [FC] 'nbd/server: Refactor handling of request payload'
014/24:[down] 'nbd/server: Prepare to receive extended header requests'
015/24:[down] 'nbd/server: Prepare to send extended header replies'
016/24:[0132] [FC] 'nbd/server: Support 64-bit block status'
017/24:[down] 'nbd/server: Enable initial support for extended headers'
018/24:[down] 'nbd/client: Plumb errp through nbd_receive_replies'
019/24:[0066] [FC] 'nbd/client: Initial support for extended headers'
020/24:[0032] [FC] 'nbd/client: Accept 64-bit block status chunks'
021/24:[0058] [FC] 'nbd/client: Request extended headers during negotiation'
022/24:[down] 'nbd/server: Refactor list of negotiated meta contexts'
023/24:[0132] [FC] 'nbd/server: Prepare for per-request filtering of BLOCK_STATUS'
024/24:[0109] [FC] 'nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS'

Eric Blake (24):
  nbd/client: Use smarter assert
  nbd: Consistent typedef usage in header
  nbd/server: Prepare for alternate-size headers
  nbd/server: Refactor to pass full request around
  nbd: s/handle/cookie/ to match NBD spec
  nbd/client: Simplify cookie vs. index computation
  nbd/client: Add safety check on chunk payload length
  nbd: Use enum for various negotiation modes
  nbd: Replace bool structured_reply with mode enum
  nbd/client: Pass mode through to nbd_send_request
  nbd: Add types for extended headers
  nbd: Prepare for 64-bit request effect lengths
  nbd/server: Refactor handling of request payload
  nbd/server: Prepare to receive extended header requests
  nbd/server: Prepare to send extended header replies
  nbd/server: Support 64-bit block status
  nbd/server: Enable initial support for extended headers
  nbd/client: Plumb errp through nbd_receive_replies
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Refactor list of negotiated meta contexts
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

 docs/interop/nbd.txt  |   1 +
 include/block/nbd.h   | 201 --
 nbd/nbd-internal.h|   8 +-
 block/nbd.c   | 189 +++--
 nbd/client-connection.c   |   4 +-
 nbd/client.c  | 199 --
 nbd/common.c  |  29 +-
 nbd/server.c  | 666 --
 qemu-nbd.c|   8 +-
 block/trace-events|   1 +
 nbd/trace-events  |  29 +-
 tests/qemu-iotests/223.out|  18 +-
 tests/qemu-iotests/233.out|   4 +
 tests/qemu-iotests/241.out|   3 +
 tests/qemu-iotests/307.out|  15 +-
 .../tests/nbd-qemu-allocation.out |   3 +-
 16 files changed, 937 insertions(+), 441 deletions(-)


base-commit: 4f65e89f8cf0e079b4ec3ddfede314bbb4e35c76
-- 
2.40.1




Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers

2023-06-08 Thread Richard W.M. Jones
On Thu, Jun 08, 2023 at 02:38:40PM +0200, Laszlo Ersek wrote:
> On 6/8/23 14:20, Richard W.M. Jones wrote:
> > On Thu, Jun 08, 2023 at 01:48:41PM +0200, Laszlo Ersek wrote:
> >> On 6/7/23 12:00, Richard W.M. Jones wrote:
> >>> On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:
>  BTW I'm foreseeing a problem: if the extended block descriptor can
>  provide an unsigned 64-bit length, we're going to have trouble exposing
>  that in OCaml, because OCaml only has signed 64-bit integers. So that's
>  going to reproduce the same issue, only for OCaml callers of the *new* 
>  API.
> 
>  I can see Eric's series includes patches like "ocaml: Add example for
>  64-bit extents" -- I've not looked at those yet; for now I'm just
>  wondering what tricks we might need in the bindings generator. The
>  method seen in the "middle patch" above won't work; we don't have a
>  native OCaml "i128" type for example that we could use as an escape
>  hatch, for representing C's uint64_t.
> >>>
> >>> I think that's OK because disk sizes are already limited to
> >>> 2^63 - 1 by the kernel (and for qemu even less than that).
> >>> The OCaml bindings return a (signed) int64 for NBD.get_size.
> >>
> >> Under patch#7 yesterday, I made a proposal for "armoring" at least one
> >> instance / direction of the uint64_t <-> int64 conversion. It raised an
> >> interesting problem: raising OCaml exceptions in such C functions that
> >> are *not* directly called by the OCaml runtime. Comments would be much
> >> appreciated in that subthread!
> > 
> > I can't seem to find that thread (but also gmail-- split the messages
> > randomly over the 3 different mailing lists because Google don't
> > understand how email works).  Do you have a link?
> 
> It starts here (link to patch#7):
> 
> http://mid.mail-archive.com/20230525130108.757242-8-eblake@redhat.com

OK I see it now (in the reply).  I have answered there.

> So for example, in extent64_wrapper_locked(), if we exited the called
> nbd_internal_ocaml_alloc_extent64_array() function with caml_failwith(),
> the caml_copy_string() and caml_copy_int64() allocations, stored earlier
> into "CAMLlocal"s "metacontextv" and "offsetv", would not be leaked?

Correct.  When unwinding the stack, those frame structs created by
CAML* macros are unlinked.  Then the heap variables will have no
references and will be freed in the course of garbage collection.

> > 
> >> (On a tangent: I've also noticed we use CAMLparam0() & friends in some
> >> of our functions that are *not* directly called by the OCaml runtime.
> >> They certainly run on the OCaml runtime's stack, but there's at least
> >> one intervening stack frame where the C-language function is provided by
> >> us. Now I know we must use CAMLparam0() in our *outermost* such
> >> function, but what about the further functions (inner C-language
> >> functions) that our outermost function calls in turn? I think the inner
> >> functions are at liberty not to use CAMLparam0() -- otherwise, our
> >> functions couldn't even call normal C library functions!)
> > 
> > These macros just set up a linked list of frames.  You don't need to
> > use them in every function, only ones which are using OCaml values.
> 
> Ah, understood.
> 
> > The macros are fairly easy to understand by reading them:
> > 
> > https://github.com/ocaml/ocaml/blob/864f772e5338dcf6be2093d5cc3ed6f7fbce16b7/runtime/caml/memory.h#L270
> > 
> > When the GC runs it walks up the linked list of the current thread to
> > find roots.  The only tricky thing about it is making sure that at any
> > point where the GC could run, each slot contains a valid entry and not
> > some intermediate or uninitialized value, since this is precise (not
> > conservative) garbage collection.
> 
> Right, the OCaml manual's chapter on interfacing C with OCaml also
> contains a related warning:
> 
>   Rule 5  After a structured block (a block with tag less than
>   No_scan_tag) is allocated with the low-level functions, all fields of
>   this block must be filled with well-formed values before the next
>   allocation operation. [...]
> 
> Thankfully it also says, "You can ignore those rules if you stick to the
> simplified allocation function caml_alloc".
> 
> > 
> > This mechanism is only used by C code.  In OCaml code there's a bitmap
> > generated for each function showing which stack slots contain values
> > (versus ints, return addresses, other stuff).
> 
> I'm slightly interested in learning the OCaml runtime details, but at
> the same time I feel like not knowing them (and relying only on the
> docs) might allow me to write more "portable" code...

Yes the real rules are quite subtle, and following the documentation
is a good idea.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

Re: [Libguestfs] [libnbd PATCH v3 07/22] generator: Add struct nbd_extent in prep for 64-bit extents

2023-06-08 Thread Richard W.M. Jones
On Wed, Jun 07, 2023 at 04:23:27PM +0200, Laszlo Ersek wrote:
[...]
> > diff --git a/ocaml/helpers.c b/ocaml/helpers.c
> > index 3361a696..09666daf 100644
> > --- a/ocaml/helpers.c
> > +++ b/ocaml/helpers.c
> > @@ -133,6 +133,26 @@ nbd_internal_ocaml_alloc_i64_from_u32_array (uint32_t 
> > *a, size_t len)
> >CAMLreturn (rv);
> >  }
> >
> > +value
> > +nbd_internal_ocaml_alloc_extent64_array (nbd_extent *a, size_t len)
> > +{
> > +  CAMLparam0 ();
> > +  CAMLlocal3 (s, v, rv);
> > +  size_t i;
> > +
> > +  rv = caml_alloc (len, 0);
> > +  for (i = 0; i < len; ++i) {
> > +s = caml_alloc (2, 0);
> > +v = caml_copy_int64 (a[i].length);
> > +Store_field (s, 0, v);
> > +v = caml_copy_int64 (a[i].flags);
> > +Store_field (s, 1, v);
> > +Store_field (rv, i, s);
> > +  }
> > +
> > +  CAMLreturn (rv);
> > +}
> > +
> >  /* Convert a Unix.sockaddr to a C struct sockaddr. */
> >  void
> >  nbd_internal_unix_sockaddr_to_sa (value sockaddrv,
> 
> (19) I'd suggest the following addition:
> 
> > diff --git a/ocaml/helpers.c b/ocaml/helpers.c
> > index 09666dafa7d1..db652943141d 100644
> > --- a/ocaml/helpers.c
> > +++ b/ocaml/helpers.c
> > @@ -25,6 +25,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -140,6 +141,16 @@ nbd_internal_ocaml_alloc_extent64_array (nbd_extent 
> > *a, size_t len)
> >CAMLlocal3 (s, v, rv);
> >size_t i;
> >
> > +  for (i = 0; i < len; ++i)
> > +if (a[i].length > INT64_MAX || a[i].flags > INT64_MAX) {
> > +  char errmsg[256];
> > +
> > +  snprintf (errmsg, sizeof errmsg,
> > +"%s: extent[%zu] = { .length = %"PRIu64", .flags = 
> > %"PRIu64"}",
> > +__func__, i, a[i].length, a[i].flags);
> > +  caml_failwith (errmsg);
> > +}
> > +
> >rv = caml_alloc (len, 0);
> >for (i = 0; i < len; ++i) {
> >  s = caml_alloc (2, 0);
>
> *However*, considering the nbd_internal_ocaml_alloc_extent64_array()
> call site, in the generated extent64_wrapper_locked() function
> [ocaml/nbd-c.c]:
> 
> > /* Wrapper for extent64 callback. */
> > static int
> > extent64_wrapper_locked (void *user_data, const char *metacontext,
> >  uint64_t offset, nbd_extent *entries,
> >  size_t nr_entries, int *error)
> > {
> >   CAMLparam0 ();
> >   CAMLlocal4 (metacontextv, offsetv, entriesv, errorv);
> >   CAMLlocal2 (exn, rv);
> >   const struct user_data *data = user_data;
> >   int r;
> >   value args[4];
> >
> >   metacontextv = caml_copy_string (metacontext);
> >   offsetv = caml_copy_int64 (offset);
> >   entriesv = nbd_internal_ocaml_alloc_extent64_array (
> >entries,
> >nr_entries
> >  );
> >   errorv = caml_alloc_tuple (1);
> >   Store_field (errorv, 0, Val_int (*error));
> >   args[0] = metacontextv;
> >   args[1] = offsetv;
> >   args[2] = entriesv;
> >   args[3] = errorv;
> >   rv = caml_callbackN_exn (data->fnv, 4, args);
> >   *error = Int_val (Field (errorv, 0));
> >   if (Is_exception_result (rv)) {
> > nbd_internal_ocaml_exception_in_wrapper ("extent64", rv);
> > CAMLreturnT (int, -1);
> >   }
> >
> >   r = Int_val (rv);
> >   assert (r >= 0);
> >   CAMLreturnT (int, r);
> > }
> 
> I'm not sure if raising an OCaml exception like this, in an *inner* C
> function, is appropriate. caml_failwith() may only be suitable for C
> functions *directly* called by the OCaml runtime.

caml_failwith is just a longjmp.  The OCaml values on the stack are
cleaned up by following a chain of stack frames which are created by
the CAMLparam* macros.  These macros don't need to be used in every
function, although they should be used in functions which have OCaml
value parameters or value local variables (but there's no harm in
using them unnecessarily).  They can also be omitted for functions
which do not invoke the OCaml GC (don't allocate on the OCaml heap,
basically).

C local variables however will not be cleaned up, so if there are any
arrays that have to be freed then it gets a bit more complicated.
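
A minimal sketch of that pattern (mine, not actual libnbd code): the
CAML* macros root the OCaml values, caml_failwith() longjmps out and
the frame they registered is unlinked during the unwind, but any C
allocation has to be released by hand before raising:

  #include <caml/alloc.h>
  #include <caml/fail.h>
  #include <caml/memory.h>
  #include <caml/mlvalues.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical helper, not part of libnbd. */
  static value
  wrap_pair (const char *name, uint64_t n)
  {
    CAMLparam0 ();
    CAMLlocal3 (namev, v, rv);
    char *tmp = strdup (name);        /* C memory: invisible to the GC */

    namev = caml_copy_string (name);  /* OCaml heap value, rooted above */
    if (n > INT64_MAX) {
      free (tmp);                     /* C cleanup must be done by hand */
      caml_failwith ("value does not fit in an OCaml int64");
    }
    rv = caml_alloc (2, 0);
    Store_field (rv, 0, namev);
    v = caml_copy_int64 ((int64_t) n);
    Store_field (rv, 1, v);
    free (tmp);
    CAMLreturn (rv);
  }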

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [libnbd PATCH v3 17/22] ocaml: Add example for 64-bit extents

2023-06-08 Thread Richard W.M. Jones
On Thu, Jun 08, 2023 at 10:37:59AM +0100, Richard W.M. Jones wrote:
> Yes, the API is nicer now we return the subelements as a list instead
> of having to iterate over the list in pairs.  I might change that to
> an array or struct after these patches go upstream as that will be a
> tiny bit more efficient.

Actually the second sentence here is wrong.  It's allocated as a pair,
not a list, and a pair has the same internal representation as a
2-element array.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers

2023-06-08 Thread Laszlo Ersek
On 6/8/23 14:20, Richard W.M. Jones wrote:
> On Thu, Jun 08, 2023 at 01:48:41PM +0200, Laszlo Ersek wrote:
>> On 6/7/23 12:00, Richard W.M. Jones wrote:
>>> On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:
 BTW I'm foreseeing a problem: if the extended block descriptor can
 provide an unsigned 64-bit length, we're going to have trouble exposing
 that in OCaml, because OCaml only has signed 64-bit integers. So that's
 going to reproduce the same issue, only for OCaml callers of the *new* API.

 I can see Eric's series includes patches like "ocaml: Add example for
 64-bit extents" -- I've not looked at those yet; for now I'm just
 wondering what tricks we might need in the bindings generator. The
 method seen in the "middle patch" above won't work; we don't have a
 native OCaml "i128" type for example that we could use as an escape
 hatch, for representing C's uint64_t.
>>>
>>> I think that's OK because disk sizes are already limited to
>>> 2^63 - 1 by the kernel (and for qemu even less than that).
>>> The OCaml bindings return a (signed) int64 for NBD.get_size.
>>
>> Under patch#7 yesterday, I made a proposal for "armoring" at least one
>> instance / direction of the uint64_t <-> int64 conversion. It raised an
>> interesting problem: raising OCaml exceptions in such C functions that
>> are *not* directly called by the OCaml runtime. Comments would be much
>> appreciated in that subthread!
> 
> I can't seem to find that thread (but also gmail-- split the messages
> randomly over the 3 different mailing lists because Google don't
> understand how email works).  Do you have a link?

It starts here (link to patch#7):

http://mid.mail-archive.com/20230525130108.757242-8-eblake@redhat.com

(The primary libguestfs archive works too, but mailman splits the
archives on month boundaries, and so the patch is still in the May
bucket, but my reply is in the June one:

https://listman.redhat.com/archives/libguestfs/2023-May/031613.html
https://listman.redhat.com/archives/libguestfs/2023-June/031736.html
)

The particular topic is at the very end of my message.

> But the answer is you can raise an exception anywhere since it's
> really just a longjmp.  The only issue is if you need stack-allocated
> variables to be freed, which for OCaml values is handled by the
> CAMLlocal* macros and for C variables you need to be careful about and
> deal with yourself.

Understood! Thanks!

So for example, in extent64_wrapper_locked(), if we exited the called
nbd_internal_ocaml_alloc_extent64_array() function with caml_failwith(),
the caml_copy_string() and caml_copy_int64() allocations, stored earlier
into "CAMLlocal"s "metacontextv" and "offsetv", would not be leaked?

> 
>> (On a tangent: I've also noticed we use CAMLparam0() & friends in some
>> of our functions that are *not* directly called by the OCaml runtime.
>> They certainly run on the OCaml runtime's stack, but there's at least
>> one intervening stack frame where the C-language function is provided by
>> us. Now I know we must use CAMLparam0() in our *outermost* such
>> function, but what about the further functions (inner C-language
>> functions) that our outermost function calls in turn? I think the inner
>> functions are at liberty not to use CAMLparam0() -- otherwise, our
>> functions couldn't even call normal C library functions!)
> 
> These macros just set up a linked list of frames.  You don't need to
> use them in every function, only ones which are using OCaml values.

Ah, understood.

> The macros are fairly easy to understand by reading them:
> 
> https://github.com/ocaml/ocaml/blob/864f772e5338dcf6be2093d5cc3ed6f7fbce16b7/runtime/caml/memory.h#L270
> 
> When the GC runs it walks up the linked list of the current thread to
> find roots.  The only tricky thing about it is making sure that at any
> point where the GC could run, each slot contains a valid entry and not
> some intermediate or uninitialized value, since this is precise (not
> conservative) garbage collection.

Right, the OCaml manual's chapter on interfacing C with OCaml also
contains a related warning:

  Rule 5  After a structured block (a block with tag less than
  No_scan_tag) is allocated with the low-level functions, all fields of
  this block must be filled with well-formed values before the next
  allocation operation. [...]

Thankfully it also says, "You can ignore those rules if you stick to the
simplified allocation function caml_alloc".

> 
> This mechanism is only used by C code.  In OCaml code there's a bitmap
> generated for each function showing which stack slots contain values
> (versus ints, return addresses, other stuff).

I'm slightly interested in learning the OCaml runtime details, but at
the same time I feel like not knowing them (and relying only on the
docs) might allow me to write more "portable" code...

Laszlo




Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers

2023-06-08 Thread Richard W.M. Jones
On Thu, Jun 08, 2023 at 01:48:41PM +0200, Laszlo Ersek wrote:
> On 6/7/23 12:00, Richard W.M. Jones wrote:
> > On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:
> >> BTW I'm foreseeing a problem: if the extended block descriptor can
> >> provide an unsigned 64-bit length, we're going to have trouble exposing
> >> that in OCaml, because OCaml only has signed 64-bit integers. So that's
> >> going to reproduce the same issue, only for OCaml callers of the *new* API.
> >>
> >> I can see Eric's series includes patches like "ocaml: Add example for
> >> 64-bit extents" -- I've not looked at those yet; for now I'm just
> >> wondering what tricks we might need in the bindings generator. The
> >> method seen in the "middle patch" above won't work; we don't have a
> >> native OCaml "i128" type for example that we could use as an escape
> >> hatch, for representing C's uint64_t.
> > 
> > I think that's OK because disk sizes are already limited to
> > 2^63 - 1 by the kernel (and for qemu even less than that).
> > The OCaml bindings return a (signed) int64 for NBD.get_size.
> 
> Under patch#7 yesterday, I made a proposal for "armoring" at least one
> instance / direction of the uint64_t <-> int64 conversion. It raised an
> interesting problem: raising OCaml exceptions in such C functions that
> are *not* directly called by the OCaml runtime. Comments would be much
> appreciated in that subthread!

I can't seem to find that thread (but also gmail-- split the messages
randomly over the 3 different mailing lists because Google don't
understand how email works).  Do you have a link?

But the answer is you can raise an exception anywhere since it's
really just a longjmp.  The only issue is if you need stack-allocated
variables to be freed, which for OCaml values is handled by the
CAMLlocal* macros and for C variables you need to be careful about and
deal with yourself.

> (On a tangent: I've also noticed we use CAMLparam0() & friends in some
> of our functions that are *not* directly called by the OCaml runtime.
> They certainly run on the OCaml runtime's stack, but there's at least
> one intervening stack frame where the C-language function is provided by
> us. Now I know we must use CAMLparam0() in our *outermost* such
> function, but what about the further functions (inner C-language
> functions) that our outermost function calls in turn? I think the inner
> functions are at liberty not to use CAMLparam0() -- otherwise, our
> functions couldn't even call normal C library functions!)

These macros just set up a linked list of frames.  You don't need to
use them in every function, only ones which are using OCaml values.

The macros are fairly easy to understand by reading them:

https://github.com/ocaml/ocaml/blob/864f772e5338dcf6be2093d5cc3ed6f7fbce16b7/runtime/caml/memory.h#L270

When the GC runs it walks up the linked list of the current thread to
find roots.  The only tricky thing about it is making sure that at any
point where the GC could run, each slot contains a valid entry and not
some intermediate or uninitialized value, since this is precise (not
conservative) garbage collection.

This mechanism is only used by C code.  In OCaml code there's a bitmap
generated for each function showing which stack slots contain values
(versus ints, return addresses, other stuff).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers

2023-06-08 Thread Laszlo Ersek
On 6/7/23 12:00, Richard W.M. Jones wrote:
> On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:
>> BTW I'm foreseeing a problem: if the extended block descriptor can
>> provide an unsigned 64-bit length, we're going to have trouble exposing
>> that in OCaml, because OCaml only has signed 64-bit integers. So that's
>> going to reproduce the same issue, only for OCaml callers of the *new* API.
>>
>> I can see Eric's series includes patches like "ocaml: Add example for
>> 64-bit extents" -- I've not looked at those yet; for now I'm just
>> wondering what tricks we might need in the bindings generator. The
>> method seen in the "middle patch" above won't work; we don't have a
>> native OCaml "i128" type for example that we could use as an escape
>> hatch, for representing C's uint64_t.
> 
> I think that's OK because disk sizes are already limited to
> 2^63 - 1 by the kernel (and for qemu even less than that).
> The OCaml bindings return a (signed) int64 for NBD.get_size.

Under patch#7 yesterday, I made a proposal for "armoring" at least one
instance / direction of the uint64_t <-> int64 conversion. It raised an
interesting problem: raising OCaml exceptions in such C functions that
are *not* directly called by the OCaml runtime. Comments would be much
appreciated in that subthread!

(On a tangent: I've also noticed we use CAMLparam0() & friends in some
of our functions that are *not* directly called by the OCaml runtime.
They certainly run on the OCaml runtime's stack, but there's at least
one intervening stack frame where the C-language function is provided by
us. Now I know we must use CAMLparam0() in our *outermost* such
function, but what about the further functions (inner C-language
functions) that our outermost function calls in turn? I think the inner
functions are at liberty not to use CAMLparam0() -- otherwise, our
functions couldn't even call normal C library functions!)

Thanks,
Laszlo




Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies

2023-06-08 Thread Laszlo Ersek
On 6/7/23 16:55, Richard W.M. Jones wrote:
> On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:
>> On 5/25/23 15:00, Eric Blake wrote:
>>> @@ -69,11 +75,18 @@  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
>>>   REPLY.STRUCTURED_REPLY.CHECK:
>>>struct command *cmd = h->reply_cmd;
>>>uint16_t flags, type;
>>> -  uint32_t length;
>>> +  uint64_t length;
>>> +  uint64_t offset = -1;
>>
>> (6) I disagree with initializing the local variable "offset" here.
>>
>> Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read
>> "offset" back if "extended_headers" is set. But if "extended_headers" is
>> set, we also store a value to "offset", before the read.
>>
>> Initializing "offset" to -1 suggests that the code might otherwise read
>> an indeterminate value from "offset" -- but that's not the case.
> 
> You may find that the compiler will give a warning.  It's usually not
> good about dealing with the case where a variable being initialized +
> used depends on another variable being true.

Good point; that reminds me we used to encounter that issue specifically
in IA32 edk2 builds, but only when using old gcc (RHEL-7 era, gcc-4.8.5).

Of course it might still happen today. If that's the case, can we add a
comment noting that we initialize "offset" only to shut up the compiler?
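
Something along these lines, just as a sketch of the kind of comment I
mean (not the actual patch):

    uint64_t length;
    uint64_t offset = -1;  /* only read when extended_headers is set (and
                            * then always written first); initialized here
                            * solely to silence -Wmaybe-uninitialized on
                            * older gcc */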

Thanks!
Laszlo




Re: [Libguestfs] [libnbd PATCH v3 00/22] NBD 64-bit extensions (libnbd portion)

2023-06-08 Thread Richard W.M. Jones


Sorry it took me so long to get around to this one.  Hopefully
the next version will need only cursory approval.

I did read all of the patches, and I basically agree with Laszlo on
the ones he reviewed already.  Therefore I didn't add any comment
except where necessary.  You can assume Acked-by on those patches.

Thanks!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [Libguestfs] [libnbd PATCH v3 22/22] api: Add nbd_[aio_]block_status_filter()

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:08AM -0500, Eric Blake wrote:
> As part of extending NBD to support 64-bit lengths, the protocol also
> added an option for servers to allow clients to request filtered
> responses to NBD_CMD_BLOCK_STATUS when more than one meta-context is
> negotiated (see NBD commit e6f3b94a).  At the same time as this patch,
> qemu-nbd was taught to support and advertise this feature as a server,
> but does not utilize it as a client (qemu doesn't yet need to connect
> to multiple contexts at once).  Thus, adding generic client support
> and enhancing the interop/ test in libnbd is needed to prove that the
> feature is viable and worth standardizing.
> ---
>  lib/internal.h   |   5 +-
>  generator/API.ml |  71 +++--
>  generator/states-issue-command.c |   4 +-
>  lib/aio.c|   7 +-
>  lib/rw.c | 127 ++-
>  interop/block-status-payload.c   | 117 +++-
>  interop/block-status-payload.sh  |  14 +++-
>  info/info-can.sh |   3 +
>  8 files changed, 336 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/internal.h b/lib/internal.h
> index 2948b77b..64921de9 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -73,6 +73,8 @@ struct meta_context {
>  };
>  DEFINE_VECTOR_TYPE (meta_vector, struct meta_context);
> 
> +DEFINE_VECTOR_TYPE(uint32_vector, uint32_t);
> +
>  struct export {
>char *name;
>char *description;
> @@ -380,7 +382,8 @@ struct command {
>uint64_t cookie;
>uint64_t offset;
>uint64_t count;
> -  void *data; /* Buffer for read/write */
> +  void *data; /* Buffer for read/write, uint32_vector* for status payload */
> +  uint32_vector *ids; /* For block status with payload */
>struct command_cb cb;
>bool initialized; /* For read, true if getting a hole may skip memset */
>uint32_t data_seen; /* For read, cumulative size of data chunks seen */
> diff --git a/generator/API.ml b/generator/API.ml
> index 5a31ce3b..a26ed1da 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -2335,12 +2335,13 @@   "can_block_status_payload", {
>  longdesc = "\
>  Returns true if the server supports the use of the
>  C flag to allow filtering of the
> -block status command.  Returns
> +block status command (see L).  Returns
>  false if the server does not.  Note that this will never return
>  true if L is false."
>  ^ non_blocking_test_call_description;
>  see_also = [SectionLink "Flag calls"; Link "opt_info";
> -Link "get_extended_headers_negotiated"];
> +Link "get_extended_headers_negotiated";
> +Link "block_status_filter"];
>  example = Some "examples/server-flags.c";
>};
> 
> @@ -2409,6 +2410,10 @@   "can_meta_context", {
>  meta contexts were requested but there is a missing or failed
>  attempt at NBD_OPT_SET_META_CONTEXT during option negotiation.
> 
> +If the server supports block status filtering (see
> +L, this function must return
> +true for any filter name passed to L.
> +
>  The single parameter is the name of the metadata context,
>  for example C.
>  Blibnbd.hE> includes defined constants for well-known
> @@ -2941,9 +2946,12 @@   "block_status_64", {
>  information about blocks beginning from the specified
>  offset to be returned. The C parameter is a hint: the
>  server may choose to return less status, or the final block
> -may extend beyond the requested range. If multiple contexts
> +may extend beyond the requested range. When multiple contexts
>  are supported, the number of blocks and cumulative length
> -of those blocks need not be identical between contexts.
> +of those blocks need not be identical between contexts; this
> +command generally returns the status of all negotiated contexts,
> +while some servers also support a filtered request (see
> +L, L).
> 
>  Note that not all servers can support a C of 4GiB or larger;
>  L indicates which servers
> @@ -2993,11 +3001,38 @@   "block_status_64", {
>  does not exceed C bytes; however, libnbd does not
>  validate that the server obeyed the flag."
>  ^ strict_call_description;
> -see_also = [Link "block_status";
> +see_also = [Link "block_status"; Link "block_status_filter";
>  Link "add_meta_context"; Link "can_meta_context";
>  Link "aio_block_status_64"; Link "set_strict_mode"];
>};
> 
> +  "block_status_filter", {
> +default_call with
> +args = [ UInt64 "count"; UInt64 "offset"; StringList "contexts";
> + Closure extent64_closure ];
> +optargs = [ OFlags ("flags", cmd_flags, Some ["REQ_ONE"; "PAYLOAD_LEN"]) 
> ];
> +ret = RErr;
> +permitted_states = [ Connected ];
> +shortdesc = "send filtered block status command, with 64-bit callback";
> +longdesc = "\
> +Issue a filtered block status command to the NBD server.  If
> +supported by the server (see L),
> +this causes metadata context 

Re: [PULL 02/17] block: Collapse padded I/O vecs exceeding IOV_MAX

2023-06-08 Thread Peter Maydell
On Mon, 5 Jun 2023 at 16:48, Hanna Czenczek  wrote:
>
> When processing vectored guest requests that are not aligned to the
> storage request alignment, we pad them by adding head and/or tail
> buffers for a read-modify-write cycle.

Hi; Coverity complains (CID 1512819) that the assert added
in this commit is always true:

> @@ -1573,26 +1701,34 @@ static void bdrv_padding_destroy(BdrvRequestPadding 
> *pad)
>  static int bdrv_pad_request(BlockDriverState *bs,
>  QEMUIOVector **qiov, size_t *qiov_offset,
>  int64_t *offset, int64_t *bytes,
> +bool write,
>  BdrvRequestPadding *pad, bool *padded,
>  BdrvRequestFlags *flags)
>  {
>  int ret;
> +struct iovec *sliced_iov;
> +int sliced_niov;
> +size_t sliced_head, sliced_tail;
>
>  bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, 
> _abort);
>
> -if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
> +if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) {
>  if (padded) {
>  *padded = false;
>  }
>  return 0;
>  }
>
> -ret = qemu_iovec_init_extended(>local_qiov, pad->buf, pad->head,
> -   *qiov, *qiov_offset, *bytes,
> -   pad->buf + pad->buf_len - pad->tail,
> -   pad->tail);
> +sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
> +  _head, _tail,
> +  _niov);
> +
> +/* Guaranteed by bdrv_check_qiov_request() */
> +assert(*bytes <= SIZE_MAX);

This one. (The Coverity complaint is because SIZE_MAX for it is
UINT64_MAX and an int64_t can't possibly be bigger than that.)

Is this because the assert() is deliberately handling the case
of a 32-bit host where SIZE_MAX might not be UINT64_MAX ? Or was
the bound supposed to be SSIZE_MAX or INT64_MAX ?

Looking at bdrv_check_qiov_request(), it seems to check bytes
against BDRV_MAX_LENGTH, which is defined as something somewhere
near INT64_MAX. So on a 32-bit host I'm not sure that function
does guarantee that the bytes count is going to be less than
SIZE_MAX...

(CID 1512819)

thanks
-- PMM



Re: [libnbd PATCH v3 21/22] api: Add nbd_can_block_status_payload()

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:07AM -0500, Eric Blake wrote:
> In the recent NBD protocol extensions to add 64-bit commands [1], an
> additional option was added to allow NBD_CMD_BLOCK_STATUS pass a

to pass

> client payload instructing the server to filter its answers in nbd.git
> commit e6f3b94a (mainly useful when the client requests more than one
> meta context with NBD_OPT_SET_META_CONTEXT).  This patch lays the
> groundwork by exposing servers that advertise this capability,
> although libnbd does not yet actually utilize it until the next patch.
> 
> At the time this patch was written, qemu-nbd was also patched to
> provide such support; hence, an interop/ test shows the API in action.
> 
> [1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/
> 
> Signed-off-by: Eric Blake 
> ---
>  info/nbdinfo.pod|  10 ++-
>  lib/nbd-protocol.h  |  29 +---
>  generator/API.ml|  18 +
>  lib/flags.c |  12 +++
>  examples/server-flags.c |   7 +-
>  interop/Makefile.am |   6 ++
>  interop/block-status-payload.c  | 126 
>  interop/block-status-payload.sh |  68 +
>  .gitignore  |   1 +
>  info/can.c  |   5 ++
>  info/show.c |   9 ++-
>  11 files changed, 274 insertions(+), 17 deletions(-)
>  create mode 100644 interop/block-status-payload.c
>  create mode 100755 interop/block-status-payload.sh
> 
> diff --git a/info/nbdinfo.pod b/info/nbdinfo.pod
> index 9ea4a278..f5dc53fa 100644
> --- a/info/nbdinfo.pod
> +++ b/info/nbdinfo.pod
> @@ -178,6 +178,8 @@ rotating disk: accessing nearby blocks may be faster than 
> random
>  access and requests should be sorted to improve performance.  Many
>  servers do not or cannot report this accurately.
> 
> +=item nbdinfo --can block-status-payload URI

Another case where "--has" sounds better ...

>  =item nbdinfo --can cache URI
> 
>  =item nbdinfo --can df URI
> @@ -345,10 +347,10 @@ The command does not print anything.  Instead it exits 
> with success
> 
>  For further information see the L  protocol|https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md>
> -and the following libnbd functions: L,
> -L, L, L,
> -L, L, L,
> -L, L,
> +and the following libnbd functions: L,
> +L, L, L,
> +L, L, L,
> +L, L, L,
>  L,
>  L.
> 
> diff --git a/lib/nbd-protocol.h b/lib/nbd-protocol.h
> index b6fa9b8a..9e358122 100644
> --- a/lib/nbd-protocol.h
> +++ b/lib/nbd-protocol.h
> @@ -102,17 +102,18 @@ struct nbd_fixed_new_option_reply {
>  #define NBD_FLAG_NO_ZEROES (1 << 1)
> 
>  /* Per-export flags. */
> -#define NBD_FLAG_HAS_FLAGS (1 << 0)
> -#define NBD_FLAG_READ_ONLY (1 << 1)
> -#define NBD_FLAG_SEND_FLUSH(1 << 2)
> -#define NBD_FLAG_SEND_FUA  (1 << 3)
> -#define NBD_FLAG_ROTATIONAL(1 << 4)
> -#define NBD_FLAG_SEND_TRIM (1 << 5)
> -#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)
> -#define NBD_FLAG_SEND_DF   (1 << 7)
> -#define NBD_FLAG_CAN_MULTI_CONN(1 << 8)
> -#define NBD_FLAG_SEND_CACHE(1 << 10)
> -#define NBD_FLAG_SEND_FAST_ZERO(1 << 11)
> +#define NBD_FLAG_HAS_FLAGS(1 << 0)
> +#define NBD_FLAG_READ_ONLY(1 << 1)
> +#define NBD_FLAG_SEND_FLUSH   (1 << 2)
> +#define NBD_FLAG_SEND_FUA (1 << 3)
> +#define NBD_FLAG_ROTATIONAL   (1 << 4)
> +#define NBD_FLAG_SEND_TRIM(1 << 5)
> +#define NBD_FLAG_SEND_WRITE_ZEROES(1 << 6)
> +#define NBD_FLAG_SEND_DF  (1 << 7)
> +#define NBD_FLAG_CAN_MULTI_CONN   (1 << 8)
> +#define NBD_FLAG_SEND_CACHE   (1 << 10)
> +#define NBD_FLAG_SEND_FAST_ZERO   (1 << 11)
> +#define NBD_FLAG_BLOCK_STATUS_PAYLOAD (1 << 12)
> 
>  /* NBD options (new style handshake only). */
>  #define NBD_OPT_EXPORT_NAME1
> @@ -204,6 +205,12 @@ struct nbd_request_ext {
>uint64_t count;   /* Request effect or payload length. */
>  } NBD_ATTRIBUTE_PACKED;
> 
> +/* Extended request payload for NBD_CMD_BLOCK_STATUS, when supported. */
> +struct nbd_block_status_payload {
> +  uint64_t length;  /* Effective length of client request */
> +  /* followed by array of uint32_t ids */
> +} NBD_ATTRIBUTE_PACKED;
> +
>  /* Simple reply (server -> client). */
>  struct nbd_simple_reply {
>uint32_t magic;   /* NBD_SIMPLE_REPLY_MAGIC. */
> diff --git a/generator/API.ml b/generator/API.ml
> index 85625bbd..5a31ce3b 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -2327,6 +2327,23 @@   "can_fast_zero", {
>  example = Some "examples/server-flags.c";
>};
> 
> +  "can_block_status_payload", {
> +default_call with
> +args = []; ret = RBool;
> +permitted_states = [ Negotiating; Connected; Closed ];
> +shortdesc = "does the server support the block status payload flag?";
> +longdesc = "\
> +Returns true if the server supports 

Re: [Libguestfs] [libnbd PATCH v3 20/22] interop: Add test of 64-bit block status

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:06AM -0500, Eric Blake wrote:
> Prove that we can round-trip a block status request larger than 4G
> through a new-enough qemu-nbd.  Also serves as a unit test of our shim
> for converting internal 64-bit representation back to the older 32-bit
> nbd_block_status callback interface.

I think it would be best to call this test "large-block-status.{c,sh}"
as "large-status" is ambiguous.  (Or even "block-status-64"?)

The test itself is fine, so if renamed:

Reviewed-by: Richard W.M. Jones 

> Signed-off-by: Eric Blake 
> ---
>  interop/Makefile.am |   6 ++
>  interop/large-status.c  | 186 
>  interop/large-status.sh |  49 +++
>  .gitignore  |   1 +
>  4 files changed, 242 insertions(+)
>  create mode 100644 interop/large-status.c
>  create mode 100755 interop/large-status.sh
> 
> diff --git a/interop/Makefile.am b/interop/Makefile.am
> index 3f81df0c..9a7a5967 100644
> --- a/interop/Makefile.am
> +++ b/interop/Makefile.am
> @@ -21,6 +21,7 @@ EXTRA_DIST = \
>   dirty-bitmap.sh \
>   interop-qemu-storage-daemon.sh \
>   interop-qemu-block-size.sh \
> + large-status.sh \
>   list-exports-nbd-config \
>   list-exports-test-dir/disk1 \
>   list-exports-test-dir/disk2 \
> @@ -134,6 +135,7 @@ check_PROGRAMS += \
>   list-exports-qemu-nbd \
>   socket-activation-qemu-nbd \
>   dirty-bitmap \
> + large-status \
>   structured-read \
>   opt-extended-headers \
>   $(NULL)
> @@ -144,6 +146,7 @@ TESTS += \
>   list-exports-qemu-nbd \
>   socket-activation-qemu-nbd \
>   dirty-bitmap.sh \
> + large-status.sh \
>   structured-read.sh \
>   interop-qemu-block-size.sh \
>   opt-extended-headers.sh \
> @@ -235,6 +238,9 @@ socket_activation_qemu_nbd_LDADD = 
> $(top_builddir)/lib/libnbd.la
>  dirty_bitmap_SOURCES = dirty-bitmap.c
>  dirty_bitmap_LDADD = $(top_builddir)/lib/libnbd.la
> 
> +large_status_SOURCES = large-status.c
> +large_status_LDADD = $(top_builddir)/lib/libnbd.la
> +
>  structured_read_SOURCES = structured-read.c
>  structured_read_LDADD = $(top_builddir)/lib/libnbd.la
> 
> diff --git a/interop/large-status.c b/interop/large-status.c
> new file mode 100644
> index ..36415653
> --- /dev/null
> +++ b/interop/large-status.c
> @@ -0,0 +1,186 @@
> +/* NBD client library in userspace
> + * Copyright Red Hat
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
> USA
> + */
> +
> +/* Test 64-bit block status with qemu. */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +static const char *bitmap;
> +
> +struct data {
> +  bool req_one;/* input: true if req_one was passed to request */
> +  int count;   /* input: count of expected remaining calls */
> +  bool seen_base;  /* output: true if base:allocation encountered */
> +  bool seen_dirty; /* output: true if qemu:dirty-bitmap encountered */
> +};
> +
> +static int
> +cb32 (void *opaque, const char *metacontext, uint64_t offset,
> +  uint32_t *entries, size_t len, int *error)
> +{
> +  struct data *data = opaque;
> +
> +  assert (offset == 0);
> +  assert (data->count-- > 0);
> +
> +  if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) == 0) {
> +assert (!data->seen_base);
> +data->seen_base = true;
> +
> +/* Data block offset 0 size 64k, remainder is hole */
> +assert (len == 4);
> +assert (entries[0] == 65536);
> +assert (entries[1] == 0);
> +/* libnbd had to truncate qemu's >4G answer */
> +assert (entries[2] == 4227858432);
> +assert (entries[3] == (LIBNBD_STATE_HOLE|LIBNBD_STATE_ZERO));
> +  }
> +  else if (strcmp (metacontext, bitmap) == 0) {
> +assert (!data->seen_dirty);
> +data->seen_dirty = true;
> +
> +/* Dirty block at offset 5G-64k, remainder is clean */
> +/* libnbd had to truncate qemu's >4G answer */
> +assert (len == 2);
> +assert (entries[0] == 4227858432);
> +assert (entries[1] == 0);
> +  }
> +  else {
> +fprintf (stderr, "unexpected context %s\n", metacontext);
> +exit (EXIT_FAILURE);
> +  }
> +  return 0;
> +}
> +
> +static int
> +cb64 (void *opaque, const char *metacontext, 

Re: [libnbd PATCH v3 19/22] api: Add nbd_[aio_]opt_extended_headers()

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:05AM -0500, Eric Blake wrote:
> Very similar to the recent addition of nbd_opt_structured_reply,
> giving us fine-grained control over an extended headers request.
> 
> Because nbdkit does not yet support extended headers, testsuite
> coverage is limited to interop testing with qemu-nbd.  It shows that
> extended headers imply structured replies, regardless of which order
> the two are manually negotiated in.
> 
> Signed-off-by: Eric Blake 
> ---
>  generator/API.ml  |  79 +++--
>  .../states-newstyle-opt-extended-headers.c|  30 +++-
>  generator/states-newstyle.c   |   3 +
>  lib/opt.c |  44 +
>  interop/Makefile.am   |   6 +
>  interop/opt-extended-headers.c| 153 ++
>  interop/opt-extended-headers.sh   |  29 
>  .gitignore|   1 +
>  8 files changed, 329 insertions(+), 16 deletions(-)
>  create mode 100644 interop/opt-extended-headers.c
>  create mode 100755 interop/opt-extended-headers.sh
> 
> diff --git a/generator/API.ml b/generator/API.ml
> index 078f140f..85625bbd 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -825,6 +825,7 @@   "set_request_extended_headers", {
>  if L is also set to false,
>  since the use of extended headers implies structured replies.";
>  see_also = [Link "get_request_extended_headers";
> +Link "opt_extended_headers";
>  Link "set_handshake_flags"; Link "set_strict_mode";
>  Link "get_extended_headers_negotiated";
>  Link "zero"; Link "trim"; Link "cache";
> @@ -856,7 +857,9 @@   "get_extended_headers_negotiated", {
>  shortdesc = "see if extended headers are in use";
>  longdesc = "\
>  After connecting you may call this to find out if the connection is
> -using extended headers.
> +using extended headers.  Note that this setting is sticky; this
> +can return true even after a second L
> +returns false because the server detected a duplicate request.
> 
>  When extended headers are not in use, commands are limited to a
>  32-bit length, even when the libnbd API uses a 64-bit parameter
> @@ -938,7 +941,7 @@   "get_structured_replies_negotiated", {
>  attempted.";
>  see_also = [Link "set_request_structured_replies";
>  Link "get_request_structured_replies";
> -Link "opt_structured_reply";
> +Link "opt_structured_reply"; Link "opt_extended_headers";
>  Link "get_protocol";
>  Link "get_extended_headers_negotiated"];
>};
> @@ -1211,12 +1214,13 @@   "set_opt_mode", {
>  newstyle server.  This setting has no effect when connecting to an
>  oldstyle server.
> 
> -Note that libnbd defaults to attempting C and
> -C before letting you control remaining
> -negotiation steps; if you need control over these steps as well,
> -first set L to C and
> -L to false before starting
> -the connection attempt.
> +Note that libnbd defaults to attempting C,
> +C, and C
> +before letting you control remaining negotiation steps; if you
> +need control over these steps as well, first set L
> +to C, and L
> +or L to false, before
> +starting the connection attempt.
> 
>  When option mode is enabled, you have fine-grained control over which
>  options are negotiated, compared to the default of the server
> @@ -1324,6 +1328,35 @@   "opt_starttls", {
>  Link "supports_tls"]
>};
> 
> +  "opt_extended_headers", {
> +default_call with
> +args = []; ret = RBool;
> +permitted_states = [ Negotiating ];
> +shortdesc = "request the server to enable extended headers";
> +longdesc = "\
> +Request that the server use extended headers, by sending
> +C.  This can only be used if
> +L enabled option mode; furthermore, libnbd
> +defaults to automatically requesting this unless you use
> +L or
> +L prior to connecting.
> +This function is mainly useful for integration testing of corner
> +cases in server handling.
> +
> +This function returns true if the server replies with success,
> +false if the server replies with an error, and fails only if
> +the server does not reply (such as for a loss of connection).
> +Note that some servers fail a second request as redundant;
> +libnbd assumes that once one request has succeeded, then
> +extended headers are supported (as visible by
> +L) regardless if
> +later calls to this function return false.  If this function
> +returns true, the use of structured replies is implied.";
> +see_also = [Link "set_opt_mode"; Link "aio_opt_extended_headers";
> +Link "opt_go"; Link "set_request_extended_headers";
> +Link "set_request_structured_replies"]
> +  };
> +
>"opt_structured_reply", {
>  default_call with
>  args = []; ret = RBool;
> @@ -1345,7 +1378,9 @@   "opt_structured_reply", {
>  libnbd 

Re: [libnbd PATCH v3 18/22] generator: Actually request extended headers

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:04AM -0500, Eric Blake wrote:
> This is the culmination of the previous patches' preparation work for
> using extended headers when possible.  The new states in the state
> machine are copied extensively from our handling of
> OPT_STRUCTURED_REPLY.  The next patch will then expose a new API
> nbd_opt_extended_headers() for manual control.
> 
> At the same time I posted this patch, I had patches for qemu-nbd to
> support extended headers as server (nbdkit is a bit tougher).  The
> next patches will add some interop tests that pass when using a new
> enough qemu-nbd, showing that we have cross-project interoperability
> and therefore an extension worth standardizing.
> 
> Signed-off-by: Eric Blake 
> ---
>  generator/API.ml  | 87 -
>  generator/Makefile.am |  1 +
>  generator/state_machine.ml| 41 
>  .../states-newstyle-opt-extended-headers.c| 94 +++
>  generator/states-newstyle-opt-starttls.c  |  6 +-
>  5 files changed, 184 insertions(+), 45 deletions(-)
>  create mode 100644 generator/states-newstyle-opt-extended-headers.c
> 
> diff --git a/generator/API.ml b/generator/API.ml
> index 7616990a..078f140f 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -953,23 +953,24 @@   "set_request_meta_context", {
>  (all C calls when L is false,
>  or L and L when option mode is
>  enabled) will also try to issue NBD_OPT_SET_META_CONTEXT when
> -the server supports structured replies and any contexts were
> -registered by L.  The default setting
> -is true; however the extra step of negotiating meta contexts is
> -not always desirable: performing both info and go on the same
> -export works without needing to re-negotiate contexts on the
> -second call; integration testing of other servers may benefit
> -from manual invocation of L at
> -other times in the negotiation sequence; and even when using just
> -L, it can be faster to collect the server's
> +the server supports structured replies or extended headers and
> +any contexts were registered by L.  The
> +default setting is true; however the extra step of negotiating
> +meta contexts is not always desirable: performing both info and
> +go on the same export works without needing to re-negotiate
> +contexts on the second call; integration testing of other servers
> +may benefit from manual invocation of L
> +at other times in the negotiation sequence; and even when using
> +just L, it can be faster to collect the server's
>  results by relying on the callback function passed to
>  L than a series of post-process
>  calls to L.
> 
>  Note that this control has no effect if the server does not
> -negotiate structured replies, or if the client did not request
> -any contexts via L.  Setting this
> -control to false may cause L to fail.";
> +negotiate structured replies or extended headers, or if the
> +client did not request any contexts via L.
> +Setting this control to false may cause L
> +to fail.";
>  see_also = [Link "set_opt_mode"; Link "opt_go"; Link "opt_info";
>  Link "opt_list_meta_context"; Link "opt_set_meta_context";
>  Link "get_structured_replies_negotiated";
> @@ -1404,11 +1405,11 @@   "opt_info", {
>  If successful, functions like L and
>  L will report details about that export.  If
>  L is set (the default) and
> -structured replies were negotiated, it is also valid to use
> -L after this call.  However, it may be
> -more efficient to clear that setting and manually utilize
> -L with its callback approach, for
> -learning which contexts an export supports.  In general, if
> +structured replies or extended headers were negotiated, it is also
> +valid to use L after this call.  However,
> +it may be more efficient to clear that setting and manually
> +utilize L with its callback approach,
> +for learning which contexts an export supports.  In general, if
>  L is called next, that call will likely succeed
>  with the details remaining the same, although this is not
>  guaranteed by all servers.
> @@ -1538,12 +1539,12 @@   "opt_set_meta_context", {
>  recent L or L.
>  This can only be used if L enabled option
>  mode.  Normally, this function is redundant, as L
> -automatically does the same task if structured replies have
> -already been negotiated.  But manual control over meta context
> -requests can be useful for fine-grained testing of how a server
> -handles unusual negotiation sequences.  Often, use of this
> -function is coupled with L to
> -bypass the automatic context request normally performed by
> +automatically does the same task if structured replies or extended
> +headers have already been negotiated.  But manual control over
> +meta context requests can be useful for fine-grained testing of
> +how a server handles unusual negotiation sequences.  Often, use
> +of this function is coupled with L
> +to bypass the automatic context request 
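
As a rough illustration of the coupling described in the quoted documentation above, a client can disable the automatic request and drive meta context negotiation by hand, roughly like this.  This is a minimal C sketch, not code from the patch series; in particular the nbd_context_callback shape passed to nbd_opt_set_meta_context is assumed to mirror the one used by nbd_opt_list_meta_context, so check it against the real libnbd headers before relying on it:

  #include <stdio.h>
  #include <stdlib.h>
  #include <stdbool.h>
  #include <libnbd.h>

  /* Print each context the server accepts during manual negotiation. */
  static int
  context_cb (void *user_data, const char *name)
  {
    printf ("server accepted meta context: %s\n", name);
    return 0;
  }

  int
  main (void)
  {
    struct nbd_handle *nbd = nbd_create ();
    if (nbd == NULL)
      exit (EXIT_FAILURE);

    nbd_set_opt_mode (nbd, true);               /* stay in option mode */
    nbd_add_meta_context (nbd, "base:allocation");
    nbd_set_request_meta_context (nbd, false);  /* no automatic request */

    if (nbd_connect_uri (nbd, "nbd://localhost") == -1 ||
        nbd_opt_set_meta_context (nbd,          /* manual request instead */
            (nbd_context_callback) { .callback = context_cb }) == -1 ||
        nbd_opt_go (nbd) == -1) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }

    nbd_close (nbd);
    return 0;
  }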

Re: [libnbd PATCH v3 17/22] ocaml: Add example for 64-bit extents

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:03AM -0500, Eric Blake wrote:
> Since our example program for 32-bit extents is inherently limited to
> 32-bit lengths, it is also worth demonstrating the 64-bit extent API,
> including the difference in the array indexing being saner.
> 
> Signed-off-by: Eric Blake 
> ---
>  ocaml/examples/Makefile.am  |  1 +
>  ocaml/examples/extents64.ml | 42 +
>  2 files changed, 43 insertions(+)
>  create mode 100644 ocaml/examples/extents64.ml
> 
> diff --git a/ocaml/examples/Makefile.am b/ocaml/examples/Makefile.am
> index 28b4ab94..a4eb47a5 100644
> --- a/ocaml/examples/Makefile.am
> +++ b/ocaml/examples/Makefile.am
> @@ -20,6 +20,7 @@ include $(top_srcdir)/subdir-rules.mk
>  ml_examples = \
>   asynch_copy.ml \
>   extents.ml \
> + extents64.ml \
>   get_size.ml \
>   open_qcow2.ml \
>   server_flags.ml \
> diff --git a/ocaml/examples/extents64.ml b/ocaml/examples/extents64.ml
> new file mode 100644
> index ..8ee7e218
> --- /dev/null
> +++ b/ocaml/examples/extents64.ml
> @@ -0,0 +1,42 @@
> +open Printf
> +
> +let () =
> +  NBD.with_handle (
> +fun nbd ->
> +  NBD.add_meta_context nbd "base:allocation";
> +  NBD.connect_command nbd
> +  ["nbdkit"; "-s"; "--exit-with-parent"; "-r";
> +   "sparse-random"; "8G"];
> +
> +  (* Read the extents and print them. *)
> +  let size = NBD.get_size nbd in
> +  let cap =
> +match NBD.get_extended_headers_negotiated nbd with
> +| true -> size
> +| false -> 0x8000_0000_L in
> +  let fetch_offset = ref 0_L in
> +  while !fetch_offset < size do
> +let remaining = Int64.sub size !fetch_offset in
> +let fetch_size = min remaining cap in
> +NBD.block_status_64 nbd fetch_size !fetch_offset (
> +  fun meta _ entries err ->
> +printf "nbd_block_status callback: meta=%s err=%d\n" meta !err;
> +if meta = "base:allocation" then (
> +  printf "index\t%16s %16s %s\n" "offset" "length" "flags";
> +  for i = 0 to Array.length entries - 1 do
> +let len = fst entries.(i)
> +and flags =
> +  match snd entries.(i) with
> +  | 0_L -> "data"
> +  | 1_L -> "hole"
> +  | 2_L -> "zero"
> +  | 3_L -> "hole+zero"
> +  | unknown -> sprintf "unknown (%Ld)" unknown in
> +printf "%d:\t%16Ld %16Ld %s\n" i !fetch_offset len flags;
> +fetch_offset := Int64.add !fetch_offset len
> +  done;
> +);
> +0
> +) (* NBD.block_status *)
> +  done
> +  )

Yes, the API is nicer now that we return the subelements as a list
instead of having to iterate over the list in (length, flags) pairs.  I
might change that to an array or struct after these patches go upstream,
as that will be a tiny bit more efficient.
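
The same shape change is visible from the C side of the series: the 32-bit callback receives a flat uint32_t array of interleaved (length, flags) values, while the 64-bit callback receives an array of nbd_extent structs with .length and .flags members.  A minimal sketch of the two callback shapes (printing only, no error handling; not code from any of the quoted patches):

  #include <stdio.h>
  #include <inttypes.h>
  #include <libnbd.h>

  /* 32-bit API: entries arrive as interleaved (length, flags) pairs. */
  static int
  extent32_cb (void *user_data, const char *metacontext, uint64_t offset,
               uint32_t *entries, size_t nr_entries, int *error)
  {
    for (size_t i = 0; i < nr_entries; i += 2) {
      uint32_t length = entries[i];
      uint32_t flags = entries[i + 1];
      printf ("%" PRIu64 "\t%" PRIu32 "\t0x%" PRIx32 "\n",
              offset, length, flags);
      offset += length;
    }
    return 0;
  }

  /* 64-bit API: each entry is a struct, so no pairwise indexing. */
  static int
  extent64_cb (void *user_data, const char *metacontext, uint64_t offset,
               nbd_extent *entries, size_t nr_entries, int *error)
  {
    for (size_t i = 0; i < nr_entries; i++) {
      printf ("%" PRIu64 "\t%" PRIu64 "\t0x%" PRIx64 "\n",
              offset, entries[i].length, entries[i].flags);
      offset += entries[i].length;
    }
    return 0;
  }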

Reviewed-by: Richard W.M. Jones 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [libnbd PATCH v3 16/22] examples: Update copy-libev to use 64-bit block status

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:02AM -0500, Eric Blake wrote:
> Although our use of "base:allocation" doesn't require the use of the
> 64-bit API for flags, we might perform slightly faster for a server
> that does give us 64-bit extent lengths and honors larger nbd_zero
> lengths.
> 
> Signed-off-by: Eric Blake 
> ---
>  examples/copy-libev.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/examples/copy-libev.c b/examples/copy-libev.c
> index 32cb46b3..9346faf7 100644
> --- a/examples/copy-libev.c
> +++ b/examples/copy-libev.c
> @@ -96,7 +96,7 @@ struct request {
>  };
> 
>  struct extent {
> -uint32_t length;
> +uint64_t length;
>  bool zero;
>  };
> 
> @@ -184,7 +184,7 @@ get_events (struct connection *c)
> 
>  static int
>  extent_callback (void *user_data, const char *metacontext, uint64_t offset,
> - uint32_t *entries, size_t nr_entries, int *error)
> + nbd_extent *entries, size_t nr_entries, int *error)
>  {
>  struct request *r = user_data;
> 
> @@ -199,22 +199,21 @@ extent_callback (void *user_data, const char *metacontext, uint64_t offset,
>  return 1;
>  }
> 
> -/* Libnbd returns uint32_t pair (length, flags) for each extent. */
> -extents_len = nr_entries / 2;
> +extents_len = nr_entries;
> 
>  extents = malloc (extents_len * sizeof *extents);
>  if (extents == NULL)
>  FAIL ("Cannot allocated extents: %s", strerror (errno));
> 
>  /* Copy libnbd entries to extents array. */
> -for (int i = 0, j = 0; i < extents_len; i++, j=i*2) {
> -extents[i].length = entries[j];
> +for (int i = 0; i < extents_len; i++) {
> +extents[i].length = entries[i].length;
> 
>  /* Libnbd exposes both ZERO and HOLE flags. We care only about
>   * ZERO status, meaning we can copy this extent using efficinet
>   * zero method.
>   */
> -extents[i].zero = (entries[j + 1] & LIBNBD_STATE_ZERO) != 0;
> +extents[i].zero = (entries[i].flags & LIBNBD_STATE_ZERO) != 0;
>  }
> 
>  DEBUG ("r%zu: received %zu extents for %s",
> @@ -286,10 +285,10 @@ start_extents (struct request *r)
>  DEBUG ("r%zu: start extents offset=%" PRIi64 " count=%zu",
> r->index, offset, count);
> 
> -cookie = nbd_aio_block_status (
> +cookie = nbd_aio_block_status_64 (
>  src.nbd, count, offset,
> -(nbd_extent_callback) { .callback=extent_callback,
> -.user_data=r },
> +(nbd_extent64_callback) { .callback=extent_callback,
> +  .user_data=r },
>  (nbd_completion_callback) { .callback=extents_completed,
>  .user_data=r },
>  0);
> @@ -324,7 +323,7 @@ next_extent (struct request *r)
>  limit = MIN (REQUEST_SIZE, size - offset);
> 
>  while (length < limit) {
> -DEBUG ("e%zu: offset=%" PRIi64 " len=%" PRIu32 " zero=%d",
> +DEBUG ("e%zu: offset=%" PRIi64 " len=%" PRIu64 " zero=%d",
> extents_pos, offset, extents[extents_pos].length, is_zero);
> 
>  /* If this extent is too large, steal some data from it to

Reviewed-by: Richard W.M. Jones 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [libnbd PATCH v3 15/22] info: Update nbdinfo --map to use 64-bit block status

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:01AM -0500, Eric Blake wrote:
> Although we usually map "base:allocation" which doesn't require the
> use of the 64-bit API for flags, this application IS intended to map
> out other metacontexts that might have 64-bit flags.  And when
> extended headers are in use, we might as well ask for the server to
> give us extents as large as it wants, rather than breaking things up
> at 4G boundaries.
> 
> At the time this patch was written, there are no known servers that
> actually provide a metacontext with 64-bit flags.  However, that is
> planned for the nbdkit v3 protocol.
> 
> Signed-off-by: Eric Blake 
> ---
>  info/map.c | 65 --
>  1 file changed, 34 insertions(+), 31 deletions(-)
> 
> diff --git a/info/map.c b/info/map.c
> index 1169fce4..50b058f2 100644
> --- a/info/map.c
> +++ b/info/map.c
> @@ -36,13 +36,13 @@
> 
>  #include "nbdinfo.h"
> 
> -DEFINE_VECTOR_TYPE (uint32_vector, uint32_t);
> +DEFINE_VECTOR_TYPE (uint64_vector, uint64_t);
> 
> -static void print_extents (uint32_vector *entries);
> -static void print_totals (uint32_vector *entries, int64_t size);
> +static void print_extents (uint64_vector *entries);
> +static void print_totals (uint64_vector *entries, int64_t size);
>  static int extent_callback (void *user_data, const char *metacontext,
>  uint64_t offset,
> -uint32_t *entries, size_t nr_entries,
> +nbd_extent *entries, size_t nr_entries,
>  int *error);
> 
>  void
> @@ -50,7 +50,7 @@ do_map (void)
>  {
>size_t i;
>int64_t size;
> -  uint32_vector entries = empty_vector;
> +  uint64_vector entries = empty_vector;
>uint64_t offset, align, max_len;
>size_t prev_entries_size;
> 
> @@ -69,14 +69,16 @@ do_map (void)
>  fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
>  exit (EXIT_FAILURE);
>}
> +  if (nbd_get_extended_headers_negotiated (nbd) == 1)
> +max_len = size;
> 
>for (offset = 0; offset < size;) {
>  prev_entries_size = entries.len;
> -if (nbd_block_status (nbd, MIN (size - offset, max_len), offset,
> -  (nbd_extent_callback) {
> -.callback = extent_callback,
> -.user_data =  },
> -  0) == -1) {
> +if (nbd_block_status_64 (nbd, MIN (size - offset, max_len), offset,
> + (nbd_extent64_callback) {
> +   .callback = extent_callback,
> +   .user_data =  },
> + 0) == -1) {
>fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
>exit (EXIT_FAILURE);
>  }
> @@ -99,18 +101,18 @@ do_map (void)
>  }
> 
>  /* Callback handling --map. */
> -static void print_one_extent (uint64_t offset, uint64_t len, uint32_t type);
> -static void extent_description (const char *metacontext, uint32_t type,
> +static void print_one_extent (uint64_t offset, uint64_t len, uint64_t type);
> +static void extent_description (const char *metacontext, uint64_t type,
>  char **descr, bool *free_descr,
>  const char **fg, const char **bg);
> 
>  static int
>  extent_callback (void *user_data, const char *metacontext,
>   uint64_t offset,
> - uint32_t *entries, size_t nr_entries,
> + nbd_extent *entries, size_t nr_entries,
>   int *error)
>  {
> -  uint32_vector *list = user_data;
> +  uint64_vector *list = user_data;
>size_t i;
> 
>if (strcmp (metacontext, map) != 0)
> @@ -120,7 +122,8 @@ extent_callback (void *user_data, const char *metacontext,
> * print_extents below.
> */
>for (i = 0; i < nr_entries; ++i) {
> -if (uint32_vector_append (list, entries[i]) == -1) {
> +if (uint64_vector_append (list, entries[i].length) == -1 ||
> +uint64_vector_append (list, entries[i].flags) == -1) {
>perror ("realloc");
>exit (EXIT_FAILURE);
>  }
> @@ -129,7 +132,7 @@ extent_callback (void *user_data, const char *metacontext,
>  }
> 
>  static void
> -print_extents (uint32_vector *entries)
> +print_extents (uint64_vector *entries)
>  {
>size_t i, j;
>uint64_t offset = 0;  /* end of last extent printed + 1 */
> @@ -138,7 +141,7 @@ print_extents (uint32_vector *entries)
>if (json_output) fprintf (fp, "[\n");
> 
>for (i = 0; i < entries->len; i += 2) {
> -uint32_t type = entries->ptr[last+1];
> +uint64_t type = entries->ptr[last+1];
> 
>  /* If we're coalescing and the current type is different from the
>   * previous one then we should print everything up to this entry.
> @@ -157,7 +160,7 @@ print_extents (uint32_vector *entries)
> 
>/* Print the last extent if there is one. */
>if (last != i) {
> -uint32_t type = 

Re: [libnbd PATCH v3 14/22] info: Expose extended-headers support through nbdinfo

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:01:00AM -0500, Eric Blake wrote:
> Add another bit of overall server information, as well as a '--can
> extended-headers' silent query.  For now, the testsuite is written
> assuming that when nbdkit finally adds extended headers support, it
> will also add a --no-eh kill switch comparable to its existing --no-sr
> switch.
> 
> Signed-off-by: Eric Blake 
> ---
>  info/nbdinfo.pod | 11 ++-
>  info/can.c   |  9 +
>  info/info-can.sh | 27 +++
>  info/info-packets.sh | 17 -
>  info/main.c  |  7 ++-
>  5 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/info/nbdinfo.pod b/info/nbdinfo.pod
> index 7eb3c1a0..9ea4a278 100644
> --- a/info/nbdinfo.pod
> +++ b/info/nbdinfo.pod
> @@ -86,6 +86,7 @@ the I<--json> parameter:
> "protocol": "newstyle-fixed",
> "TLS": false,
> "structured": true,
> +   "extended": false,
> "exports": [
>   {
> "export-name": "",
> @@ -165,6 +166,11 @@ Test if the NBD URI connection is using TLS.
>  Test if server can respond with structured replies (a prerequisite
>  for supporting block status commands).
> 
> +=item nbdinfo --can extended-headers URI
> +
> +Test if server supports extended headers (a prerequisite for
> +supporting 64-bit commands; implies structured replies as well).

Maybe we should add "--has" and "--have" as yet more synonyms for
--is/--can.  "--has extended-headers" reads better.  It's only a tiny
extra change to this array:

https://gitlab.com/nbdkit/libnbd/-/blob/27d4ea0bd833f17349e7696353c7a9df069a3e2a/info/main.c#L109
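
For illustration only — this is not the actual layout of info/main.c — the kind of addition being suggested could look roughly like extra aliases in a getopt_long table, with both new spellings routed to the existing --can handling:

  #include <getopt.h>
  #include <limits.h>

  /* Hypothetical sketch: the enum values and table layout are made up
   * here, not copied from nbdinfo; only the idea of aliasing --has and
   * --have onto the existing --can handling is being illustrated.
   */
  enum { MODE_CAN = CHAR_MAX + 1, MODE_IS };

  static const struct option long_options[] = {
    { "can",  required_argument, NULL, MODE_CAN },
    { "is",   required_argument, NULL, MODE_IS },
    { "has",  required_argument, NULL, MODE_CAN },  /* new synonym */
    { "have", required_argument, NULL, MODE_CAN },  /* new synonym */
    { NULL }
  };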

>  =item nbdinfo --is rotational URI
> 
>  Test if the server export is backed by something which behaves like a
> @@ -312,6 +318,8 @@ Display brief command line help and exit.
> 
>  =item B<--can df>
> 
> +=item B<--can extended-headers>
> +
>  =item B<--can fast-zero>
> 
>  =item B<--can flush>
> @@ -341,7 +349,8 @@ and the following libnbd functions: L,
>  L, L, L,
>  L, L, L,
>  L, L,
> -L.
> +L,
> +L.
> 
>  =item B<--color>
> 
> diff --git a/info/can.c b/info/can.c
> index 01ab4806..31c4a1ca 100644
> --- a/info/can.c
> +++ b/info/can.c
> @@ -50,6 +50,15 @@ do_can (void)
> strcasecmp (can, "structured_replies") == 0)
>  feature = nbd_get_structured_replies_negotiated (nbd);
> 
> +  else if (strcasecmp (can, "eh") == 0 ||
> +   strcasecmp (can, "extended header") == 0 ||
> +   strcasecmp (can, "extended-header") == 0 ||
> +   strcasecmp (can, "extended_header") == 0 ||
> +   strcasecmp (can, "extended headers") == 0 ||
> +   strcasecmp (can, "extended-headers") == 0 ||
> +   strcasecmp (can, "extended_headers") == 0)
> +feature = nbd_get_extended_headers_negotiated (nbd);
> +
>else if (strcasecmp (can, "readonly") == 0 ||
> strcasecmp (can, "read-only") == 0 ||
> strcasecmp (can, "read_only") == 0)
> diff --git a/info/info-can.sh b/info/info-can.sh
> index 6cc8cbf4..8154d1ce 100755
> --- a/info/info-can.sh
> +++ b/info/info-can.sh
> @@ -61,6 +61,33 @@ esac
>  EOF
>  test $st = 2
> 
> +# --can extended-headers cannot be positively tested until nbdkit gains
> +# --no-eh support.  Otherwise, it is similar to --can structured-reply.
> +
> +no_eh=
> +if nbdkit --no-eh --help >/dev/null 2>/dev/null; then
> +no_eh=--no-eh
> +nbdkit -v -U - sh - \
> +   --run '$VG nbdinfo --can extended-headers "nbd+unix:///?socket=$unixsocket"' <<'EOF'
> +case "$1" in
> +  get_size) echo 1024 ;;
> +  pread) ;;
> +  *) exit 2 ;;
> +esac
> +EOF
> +fi
> +
> +st=0
> +nbdkit -v -U - $no_eh sh - \
> +   --run '$VG nbdinfo --can extended-headers "nbd+unix:///?socket=$unixsocket"' <<'EOF' || st=$?
> +case "$1" in
> +  get_size) echo 1024 ;;
> +  pread) ;;
> +  *) exit 2 ;;
> +esac
> +EOF
> +test $st = 2
> +
>  # --can cache and --can fua require special handling because in
>  # nbdkit-sh-plugin we must print "native" or "none".  Also the can_fua
>  # flag is only sent if the export is writable (hence can_write below).
> diff --git a/info/info-packets.sh b/info/info-packets.sh
> index 2460052e..410faef8 100755
> --- a/info/info-packets.sh
> +++ b/info/info-packets.sh
> @@ -27,12 +27,27 @@ requires nbdkit --no-sr memory --version
>  out=info-packets.out
>  cleanup_fn rm -f $out
> 
> +# Older nbdkit does not support extended headers; --no-eh is a reliable
> +# witness of whether nbdkit is new enough.
> +
> +no_eh=
> +if nbdkit --no-eh --help >/dev/null 2>/dev/null; then
> +no_eh=--no-eh
> +fi
> +
>  nbdkit --no-sr -U - memory size=1M \
> --run '$VG nbdinfo "nbd+unix:///?socket=$unixsocket"' > $out
>  cat $out
>  grep "protocol: .*using simple packets" $out
> 
> -nbdkit -U - memory size=1M \
> +nbdkit $no_eh -U - memory size=1M \
> --run '$VG nbdinfo "nbd+unix:///?socket=$unixsocket"' > $out
>  cat $out
>  grep "protocol: .*using structured packets" $out
> +
> +if test x != "x$no_eh"; then
> + 

Re: [Libguestfs] [libnbd PATCH v3 13/22] dump: Update nbddump to use 64-bit block status

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:00:59AM -0500, Eric Blake wrote:
> Although our use of "base:allocation" doesn't require the use of the
> 64-bit API for flags, we might perform slightly faster for a server
> that does give us 64-bit extent lengths.
> 
> Signed-off-by: Eric Blake 
> ---
>  dump/dump.c | 27 ++-
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/dump/dump.c b/dump/dump.c
> index b4aebe84..71053277 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -38,7 +38,7 @@
>  #include "version.h"
>  #include "vector.h"
> 
> -DEFINE_VECTOR_TYPE (uint32_vector, uint32_t);
> +DEFINE_VECTOR_TYPE (uint64_vector, uint64_t);
> 
>  static const char *progname;
>  static struct nbd_handle *nbd;
> @@ -262,10 +262,10 @@ catch_signal (int sig)
>  static int
>  extent_callback (void *user_data, const char *metacontext,
>   uint64_t offset,
> - uint32_t *entries, size_t nr_entries,
> + nbd_extent *entries, size_t nr_entries,
>   int *error)
>  {
> -  uint32_vector *list = user_data;
> +  uint64_vector *list = user_data;
>size_t i;
> 
>if (strcmp (metacontext, LIBNBD_CONTEXT_BASE_ALLOCATION) != 0)
> @@ -273,7 +273,8 @@ extent_callback (void *user_data, const char *metacontext,
> 
>/* Just append the entries we got to the list. */
>for (i = 0; i < nr_entries; ++i) {
> -if (uint32_vector_append (list, entries[i]) == -1) {
> +if (uint64_vector_append (list, entries[i].length) == -1 ||
> +uint64_vector_append (list, entries[i].flags) == -1) {
>perror ("realloc");
>exit (EXIT_FAILURE);
>  }
> @@ -284,7 +285,7 @@ extent_callback (void *user_data, const char *metacontext,
>  static bool
>  test_all_zeroes (uint64_t offset, size_t count)
>  {
> -  uint32_vector entries = empty_vector;
> +  uint64_vector entries = empty_vector;
>size_t i;
>uint64_t count_read;
> 
> @@ -296,22 +297,22 @@ test_all_zeroes (uint64_t offset, size_t count)
> * false, causing the main code to do a full read.  We could be
> * smarter and keep asking the server (XXX).
> */
> -  if (nbd_block_status (nbd, count, offset,
> -(nbd_extent_callback) {
> -  .callback = extent_callback,
> -  .user_data =  },
> -0) == -1) {
> +  if (nbd_block_status_64 (nbd, count, offset,
> +   (nbd_extent64_callback) {
> + .callback = extent_callback,
> + .user_data =  },
> +   0) == -1) {
>  fprintf (stderr, "%s: %s\n", progname, nbd_get_error ());
>  exit (EXIT_FAILURE);
>}
> 
>count_read = 0;
>for (i = 0; i < entries.len; i += 2) {
> -uint32_t len = entries.ptr[i];
> -uint32_t type = entries.ptr[i+1];
> +uint64_t len = entries.ptr[i];
> +uint64_t type = entries.ptr[i+1];
> 
>  count_read += len;
> -if (!(type & 2))/* not zero */
> +if (!(type & LIBNBD_STATE_ZERO))/* not zero */
>return false;
>}

Reviewed-by: Richard W.M. Jones 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: [Libguestfs] [libnbd PATCH v3 12/22] copy: Update nbdcopy to use 64-bit block status

2023-06-08 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:00:58AM -0500, Eric Blake wrote:
> Although our use of "base:allocation" doesn't require the use of the
> 64-bit API for flags, we might perform slightly faster for a server
> that does give us 64-bit extent lengths and honors larger nbd_zero
> lengths.

I think this change may turn out to be more significant than you have
described here.

Consider copying from a remote, very large, mostly blank NBD source,
and bear in mind that our block status querying is synchronous.  The old
nbdcopy might end up doing lots of serialized round trips over 4GB
segments of the source just to discover that it is sparse.  With this
change, as I understand it, those would be coalesced into a single RTT.

(Similarly when zeroing on the output side, but to a lesser extent
because those requests can be issued in parallel.)
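
To put rough numbers on that: a 32-bit reply cannot describe more than about 4 GiB per query, so a synchronous scan of a 1 TiB mostly-sparse source needs on the order of 256 back-to-back round trips, versus a single nbd_block_status_64 call covering the whole export once extended headers are negotiated.  A minimal C sketch of the capped loop, not taken from the patch itself (error checking trimmed; the callback and its state are assumed to be supplied by the caller, and a real loop would advance by the extents actually returned rather than by the requested count):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <libnbd.h>

  /* Sketch: scan the export synchronously, one block-status query per
   * iteration.  With extended headers a single query may cover the whole
   * disk; without them we cap each query at a 32-bit-safe length.
   */
  static void
  scan_extents (struct nbd_handle *nbd,
                int (*cb) (void *, const char *, uint64_t,
                           nbd_extent *, size_t, int *),
                void *data)
  {
    uint64_t size = (uint64_t) nbd_get_size (nbd);  /* error check omitted */
    uint64_t cap = size;

    if (nbd_get_extended_headers_negotiated (nbd) != 1)
      cap = UINT32_MAX;                  /* stay inside 32-bit reply limits */

    for (uint64_t offset = 0; offset < size; ) {
      uint64_t count = size - offset < cap ? size - offset : cap;

      if (nbd_block_status_64 (nbd, count, offset,
                               (nbd_extent64_callback) { .callback = cb,
                                                         .user_data = data },
                               0) == -1) {
        fprintf (stderr, "%s\n", nbd_get_error ());
        exit (EXIT_FAILURE);
      }
      offset += count;   /* simplification; see note above */
    }
  }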

> Signed-off-by: Eric Blake 
> ---
>  copy/nbd-ops.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/copy/nbd-ops.c b/copy/nbd-ops.c
> index f3b3bed3..71ee0a3b 100644
> --- a/copy/nbd-ops.c
> +++ b/copy/nbd-ops.c
> @@ -428,7 +428,7 @@ nbd_ops_asynch_notify_write (struct rw *rw, size_t index)
>   * request for extents in a single round trip.
>   */
>  static int add_extent (void *vp, const char *metacontext,
> -   uint64_t offset, uint32_t *entries, size_t nr_entries,
> +   uint64_t offset, nbd_extent *entries, size_t nr_entries,
> int *error);
> 
>  static void
> @@ -449,11 +449,11 @@ nbd_ops_get_extents (struct rw *rw, size_t index,
>  size_t i;
> 
>  exts.len = 0;
> -if (nbd_block_status (nbd, count, offset,
> -  (nbd_extent_callback) {
> -.user_data = ,
> -.callback = add_extent
> -  }, 0) == -1) {
> +if (nbd_block_status_64 (nbd, count, offset,
> + (nbd_extent64_callback) {
> +   .user_data = ,
> +   .callback = add_extent
> + }, 0) == -1) {
>/* XXX We could call default_get_extents, but unclear if it's
> * the right thing to do if the server is returning errors.
> */
> @@ -496,7 +496,7 @@ nbd_ops_get_extents (struct rw *rw, size_t index,
> 
>  static int
>  add_extent (void *vp, const char *metacontext,
> -uint64_t offset, uint32_t *entries, size_t nr_entries,
> +uint64_t offset, nbd_extent *entries, size_t nr_entries,
>  int *error)
>  {
>extent_list *ret = vp;
> @@ -505,25 +505,25 @@ add_extent (void *vp, const char *metacontext,
>if (strcmp (metacontext, "base:allocation") != 0 || *error)
>  return 0;
> 
> -  for (i = 0; i < nr_entries; i += 2) {
> +  for (i = 0; i < nr_entries; i++) {
>  struct extent e;
> 
>  e.offset = offset;
> -e.length = entries[i];
> +e.length = entries[i].length;
> 
>  /* Note we deliberately don't care about the HOLE flag.  There is
>   * no need to read extent that reads as zeroes.  We will convert
>   * to it to a hole or allocated extents based on the command line
>   * arguments.
>   */
> -e.zero = (entries[i+1] & LIBNBD_STATE_ZERO) != 0;
> +e.zero = (entries[i].flags & LIBNBD_STATE_ZERO) != 0;
> 
>  if (extent_list_append (ret, e) == -1) {
>perror ("realloc");
>exit (EXIT_FAILURE);
>  }
> 
> -offset += entries[i];
> +offset += entries[i].length;
>}
> 
>return 0;
> -- 
> 2.40.1

Reviewed-by: Richard W.M. Jones 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org