Re: [PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Sam Li
Damien Le Moal  于2023年6月8日周四 09:29写道:
>
> On 6/8/23 03:57, Sam Li wrote:
> > If the write operation fails and the wps is NULL, then accessing it will
> > lead to data corruption.
> >
> > Solve the issue by adding a NULL pointer check in get_zones_wp() where
> > the wps is used.
> >
> > This issue was found by Peter Maydell using the Coverity tool (CID
> > 1512459).
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/file-posix.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index ac1ed54811..4a6c71c7f5 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2523,7 +2523,7 @@ out:
> >  }
> >  }
> >  } else {
> > -if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> > +if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
> >  update_zones_wp(bs, s->fd, 0, 1);
>
> Nit: this could be:
>
> } else if (wps && type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>
> However, both the if and else sides do something only if the above condition
> is true, and we only need to do that for a zoned drive. So the entire code
> block could really be simplified to be a lot more readable. Something like
> this (totally untested, not even compiled):
>
> #if defined(CONFIG_BLKZONED)
> if (bs->bl.zone_size && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
>     BlockZoneWps *wps = bs->wps;
>     uint64_t *wp;
>
>     if (!wps) {
>         return ret;
>     }
>
>     if (ret) {
>         /* write error: update the wp from the underlying device */
>         update_zones_wp(bs, s->fd, 0, 1);
>         goto unlock;
>     }
>
>     wp = &wps->wp[offset / bs->bl.zone_size];
>     if (BDRV_ZT_IS_CONV(*wp)) {
>         /* Conventional zones do not have a write pointer */
>         goto unlock;
>     }
>
>     /* Return the written position for zone append */
>     if (type & QEMU_AIO_ZONE_APPEND) {
>         *s->offset = *wp;
>         trace_zbd_zone_append_complete(bs, *s->offset >> BDRV_SECTOR_BITS);
>     }
>
>     /* Advance the wp if needed */
>     if (offset + bytes > *wp) {
>         *wp = offset + bytes;
>     }
>
> unlock:
>     qemu_co_mutex_unlock(&wps->colock);
> }
> #endif
>
> And making this entire block a helper function (e.g. advance_zone_wp()) would
> further clean up the code. But that should be done in another patch. Care to
> send one?

Sure. If the current code block is replaced by advance_zone_wp(),
I guess this patch won't be necessary. So I will send another patch
(adding advance_zone_wp()...) after testing.

Sam



Re: [PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Damien Le Moal
On 6/8/23 03:57, Sam Li wrote:
> If the write operation fails and the wps is NULL, then accessing it will
> lead to data corruption.
> 
> Solve the issue by adding a NULL pointer check in get_zones_wp() where
> the wps is used.
> 
> This issue was found by Peter Maydell using the Coverity tool (CID
> 1512459).
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ac1ed54811..4a6c71c7f5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2523,7 +2523,7 @@ out:
>  }
>  }
>  } else {
> -if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> +if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
>  update_zones_wp(bs, s->fd, 0, 1);

Nit: this could be:

} else if (wps && type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {

However, both the if and else sides do something only if the above condition is
true, and we only need to do that for a zoned drive. So the entire code block
could really be simplified to be a lot more readable. Something like this
(totally untested, not even compiled):

#if defined(CONFIG_BLKZONED)
if (bs->bl.zone_size && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
    BlockZoneWps *wps = bs->wps;
    uint64_t *wp;

    if (!wps) {
        return ret;
    }

    if (ret) {
        /* write error: update the wp from the underlying device */
        update_zones_wp(bs, s->fd, 0, 1);
        goto unlock;
    }

    wp = &wps->wp[offset / bs->bl.zone_size];
    if (BDRV_ZT_IS_CONV(*wp)) {
        /* Conventional zones do not have a write pointer */
        goto unlock;
    }

    /* Return the written position for zone append */
    if (type & QEMU_AIO_ZONE_APPEND) {
        *s->offset = *wp;
        trace_zbd_zone_append_complete(bs, *s->offset >> BDRV_SECTOR_BITS);
    }

    /* Advance the wp if needed */
    if (offset + bytes > *wp) {
        *wp = offset + bytes;
    }

unlock:
    qemu_co_mutex_unlock(&wps->colock);
}
#endif

And making this entire block a helper function (e.g. advance_zone_wp()) would
further clean up the code. But that should be done in another patch. Care to
send one?

-- 
Damien Le Moal
Western Digital Research




[PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Sam Li
If the write operation fails and the wps is NULL, then accessing it will
lead to data corruption.

Solve the issue by adding a NULL pointer check in get_zones_wp() where
the wps is used.

This issue was found by Peter Maydell using the Coverity tool (CID
1512459).

Signed-off-by: Sam Li 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ac1ed54811..4a6c71c7f5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2523,7 +2523,7 @@ out:
 }
 }
 } else {
-if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
 update_zones_wp(bs, s->fd, 0, 1);
 }
 }
-- 
2.40.1




Re: [PATCH] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Richard Henderson

On 6/7/23 07:36, Vladimir Sementsov-Ogievskiy wrote:

The event is racy: it will not appear in the output if bitmap is
migrated during downtime period of migration and postcopy phase is not
started.

Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
Reported-by: Richard Henderson 
Signed-off-by: Vladimir Sementsov-Ogievskiy 


Queued and applied.


r~



Re: [PATCH v3 10/14] nbd/client: Initial support for extended headers

2023-06-07 Thread Eric Blake
The subject lines are confusing: 9/14 enables extended headers in the
server, while this one does not yet enable the headers but is merely a
preliminary step.  I'll probably retitle one or the other in v4.

On Wed, May 31, 2023 at 06:26:17PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Update the client code to be able to send an extended request, and
> > parse an extended header from the server.  Note that since we reject
> > any structured reply with a too-large payload, we can always normalize
> > a valid header back into the compact form, so that the caller need not
> > deal with two branches of a union.  Still, until a later patch lets
> > the client negotiate extended headers, the code added here should not
> > be reached.  Note that because of the different magic numbers, it is
> > just as easy to trace and then tolerate a non-compliant server sending
> > the wrong header reply as it would be to insist that the server is
> > compliant.
> > 
> > The only caller to nbd_receive_reply() always passed NULL for errp;
> > since we are changing the signature anyways, I decided to sink the
> > decision to ignore errors one layer lower.
> 
> This way, nbd_receive_simple_reply() and nbd_receive_structured_reply_chunk()
> are now called only with an explicit NULL last argument... And we start to
> drop all errors.
> 
> Also, actually, we'd better add an errp parameter to the caller -
> nbd_receive_replies(), because its caller (nbd_co_do_receive_one_chunk())
> will benefit from it, as it already has errp.

I can explore plumbing errp back through for v4.

> > @@ -1394,28 +1401,34 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
> > 
> >   /* nbd_receive_structured_reply_chunk
> >* Read structured reply chunk except magic field (which should be already
> > - * read).
> > + * read).  Normalize into the compact form.
> >* Payload is not read.
> >*/
> > -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> > -                                              NBDStructuredReplyChunk *chunk,
> > +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply *chunk,
> >                                                Error **errp)
> 
> Hmm, _structured_or_extended_? Or at least we should mention this in the
> comment above the function.

I'm going with 'nbd_receive_reply_chunk', since both structured and
extended modes receive chunks.

> 
> >   {
> >   int ret;
> > +size_t len;
> > +uint64_t payload_len;
> > 
> > -assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
> > +if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
> > +len = sizeof(chunk->structured);
> > +} else {
> > +assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
> > +len = sizeof(chunk->extended);
> > +}
> > 
> >   ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> > -   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
> 
> Would be good to print "extended chunk" in error message for EXTENDED case.

Or even just "chunk header", which covers both modes.

> >   int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
> > -   NBDReply *reply, Error **errp)
> > +   NBDReply *reply, NBDHeaderStyle hdr)
> >   {
> >   int ret;
> >   const char *type;
> > 
> > -ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
> > +ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
> >   if (ret <= 0) {
> >   return ret;
> >   }
> > 
> >   reply->magic = be32_to_cpu(reply->magic);
> > 
> > +/* Diagnose but accept wrong-width header */
> >   switch (reply->magic) {
> >   case NBD_SIMPLE_REPLY_MAGIC:
> > -ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> > +if (hdr >= NBD_HEADER_EXTENDED) {
> > +trace_nbd_receive_wrong_header(reply->magic);
> 
> maybe, trace also expected style

Sure, I can give that a shot.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Stefan Hajnoczi
On Wed, Jun 07, 2023 at 05:36:06PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The event is racy: it will not appear in the output if bitmap is
> migrated during downtime period of migration and postcopy phase is not
> started.
> 
> Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
> Reported-by: Richard Henderson 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> The patch fixes the problem described in
>   [PATCH] gitlab: Disable io-raw-194 for build-tcg-disabled
> and we can keep the test in gitlab ci
> 
>  tests/qemu-iotests/194 | 5 +
>  tests/qemu-iotests/194.out | 1 -
>  2 files changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Stefan Hajnoczi
On Sun, Jun 04, 2023 at 02:16:58PM +0800, Sam Li wrote:
> If the write operation fails and the wps is NULL, then accessing it will
> lead to data corruption.
> 
> Solve the issue by adding a NULL pointer check in get_zones_wp() where
> the wps is used.
> 
> This issue was found by Peter Maydell using the Coverity tool (CID
> 1512459).
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 0d9d179a35..620942bf40 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1340,6 +1340,10 @@ static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
>  rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
>  g_autofree struct blk_zone_report *rep = NULL;
>  
> +if (!wps) {
> +return -1;
> +}

An error will be printed every time this happens on a non-zoned device:

  static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
  unsigned int nrz)
  {
  if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
  error_report("update zone wp failed");

Please change the following code to avoid the call to update_zones_wp():

  #if defined(CONFIG_BLKZONED)
  {
  BlockZoneWps *wps = bs->wps;
  if (ret == 0) {
  if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
  && wps && bs->bl.zone_size) {
  uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
  if (!BDRV_ZT_IS_CONV(*wp)) {
  if (type & QEMU_AIO_ZONE_APPEND) {
  *s->offset = *wp;
  trace_zbd_zone_append_complete(bs, *s->offset
  >> BDRV_SECTOR_BITS);
  }
  /* Advance the wp if needed */
  if (offset + bytes > *wp) {
  *wp = offset + bytes;
  }
  }
  }
  } else {
- if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+ if (wps && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
  update_zones_wp(bs, s->fd, 0, 1);
  }
  }

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 1/2] block/file-posix: fix g_file_get_contents return path

2023-06-07 Thread Stefan Hajnoczi
On Sun, Jun 04, 2023 at 02:16:57PM +0800, Sam Li wrote:
> The g_file_get_contents() function returns a gboolean. If it fails, the
> returned value will be 0 instead of -1. Solve the issue by not assigning
> the return value to ret.
> 
> This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
> by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

The number of bytes returned was never used, so changing the return
value to 0 or -errno is fine:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 00/11] block: Re-enable the graph lock

2023-06-07 Thread Stefan Hajnoczi
On Mon, Jun 05, 2023 at 10:57:00AM +0200, Kevin Wolf wrote:
> This series fixes the deadlock that was observed before commit ad128dff
> ('graph-lock: Disable locking for now'), which just disabled the graph
> lock completely as a workaround to get 8.0.1 stable.
> 
> In theory the problem is simple: We can't poll while still holding the
> lock of a different AioContext. So bdrv_graph_wrlock() just needs to
> drop that lock before it polls. However, there are a number of callers
> that don't even hold the AioContext lock they are supposed to hold, so
> temporarily unlocking tries to unlock a mutex that isn't locked,
> resulting in assertion failures.
> 
> Therefore, much of this series is just for fixing AioContext locking
> correctness. It is only the last two patches that actually fix the
> deadlock and reenable the graph locking.
> 
> v2:
> - Fixed patch 2 to actually lock the correct AioContext even if the
>   device doesn't support iothreads
> - Improved the commit message for patch 7 [Eric]
> - Fixed mismerge in patch 11 (v1 incorrectly left an #if 0 around)
> 
> Kevin Wolf (11):
>   iotests: Test active commit with iothread and background I/O
>   qdev-properties-system: Lock AioContext for blk_insert_bs()
>   test-block-iothread: Lock AioContext for blk_insert_bs()
>   block: Fix AioContext locking in bdrv_open_child()
>   block: Fix AioContext locking in bdrv_attach_child_common()
>   block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing()
>   block: Fix AioContext locking in bdrv_open_inherit()
>   block: Fix AioContext locking in bdrv_open_backing_file()
>   blockjob: Fix AioContext locking in block_job_add_bdrv()
>   graph-lock: Unlock the AioContext while polling
>   Revert "graph-lock: Disable locking for now"
> 
>  include/block/graph-lock.h|   6 +-
>  block.c   | 103 --
>  block/graph-lock.c|  42 ---
>  blockjob.c|  17 ++-
>  hw/core/qdev-properties-system.c  |   8 +-
>  tests/unit/test-block-iothread.c  |   7 +-
>  .../tests/iothreads-commit-active |  85 +++
>  .../tests/iothreads-commit-active.out |  23 
>  8 files changed, 250 insertions(+), 41 deletions(-)
>  create mode 100755 tests/qemu-iotests/tests/iothreads-commit-active
>  create mode 100644 tests/qemu-iotests/tests/iothreads-commit-active.out
> 
> -- 
> 2.40.1
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 5/8] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-06-07 Thread Niklas Cassel
On Wed, Jun 07, 2023 at 06:01:17PM +0200, Niklas Cassel wrote:
> On Mon, Jun 05, 2023 at 08:19:43PM -0400, John Snow wrote:
> > On Thu, Jun 1, 2023 at 9:46 AM Niklas Cassel  wrote:
> > >
> > > From: Niklas Cassel 
> > >
> > > For NCQ, PxCI is cleared on command queued successfully.
> > > For non-NCQ, PxCI is cleared on command completed successfully.
> > > Successfully means ERR_STAT, BUSY and DRQ are all cleared.
> > >
> > > A command that has ERR_STAT set does not get to clear PxCI.
> > > See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
> > > and 5.3.16.5 ERR:FatalTaskfile.
> > >
> > > In the case of non-NCQ commands, not clearing PxCI is needed in order
> > > for host software to be able to see which command slot failed.
> > >
> > > Signed-off-by: Niklas Cassel 
> > 
> > This patch causes the ahci test suite to hang. You might just need to
> > update the AHCI test suite.
> > 
> > "make check" will hang on the ahci-test as of this patch.
> 
> Argh :)
> 
> Is there any simple way to run only the ahci test suite?

To answer my own question:
QTEST_QEMU_BINARY=./build/qemu-system-x86_64 QTEST_QEMU_IMG=./build/qemu-img \
gtester -k --verbose -m=quick build/tests/qtest/ahci-test -o test_log.xml


Kind regards,
Niklas

Re: [PATCH v2 5/8] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-06-07 Thread Niklas Cassel
On Mon, Jun 05, 2023 at 08:19:43PM -0400, John Snow wrote:
> On Thu, Jun 1, 2023 at 9:46 AM Niklas Cassel  wrote:
> >
> > From: Niklas Cassel 
> >
> > For NCQ, PxCI is cleared on command queued successfully.
> > For non-NCQ, PxCI is cleared on command completed successfully.
> > Successfully means ERR_STAT, BUSY and DRQ are all cleared.
> >
> > A command that has ERR_STAT set does not get to clear PxCI.
> > See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
> > and 5.3.16.5 ERR:FatalTaskfile.
> >
> > In the case of non-NCQ commands, not clearing PxCI is needed in order
> > for host software to be able to see which command slot failed.
> >
> > Signed-off-by: Niklas Cassel 
> 
> This patch causes the ahci test suite to hang. You might just need to
> update the AHCI test suite.
> 
> "make check" will hang on the ahci-test as of this patch.

Argh :)

Is there any simple way to run only the ahci test suite?

"make check" and "make check-qtest" are running many tests that I'm not
interested in.


Kind regards,
Niklas

[PATCH 1/3] block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()

2023-06-07 Thread Andrey Drobyshev via
Functions qcow2_get_host_offset(), get_cluster_offset() explicitly
report compressed cluster types when data is compressed.  However, this
information is never passed further.  Let's make use of it by adding new
BDRV_BLOCK_COMPRESSED flag for bdrv_block_status(), so that caller may
know that the data range is compressed.  In particular, we're going to
use this flag to tweak "qemu-img map" output.

This new flag is only being utilized by qcow and qcow2 formats, as only
these two support compression.

Signed-off-by: Andrey Drobyshev 
---
 block/qcow.c | 5 -
 block/qcow2.c| 3 +++
 include/block/block-common.h | 3 +++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/qcow.c b/block/qcow.c
index 3644bbf5cb..8416bcc2c3 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -549,7 +549,10 @@ qcow_co_block_status(BlockDriverState *bs, bool want_zero,
 if (!cluster_offset) {
 return 0;
 }
-if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypto) {
+if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
+return BDRV_BLOCK_DATA | BDRV_BLOCK_COMPRESSED;
+}
+if (s->crypto) {
 return BDRV_BLOCK_DATA;
 }
 *map = cluster_offset | index_in_cluster;
diff --git a/block/qcow2.c b/block/qcow2.c
index e23edd48c2..8e01adc610 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2162,6 +2162,9 @@ qcow2_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
 {
 status |= BDRV_BLOCK_RECURSE;
 }
+if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+status |= BDRV_BLOCK_COMPRESSED;
+}
 return status;
 }
 
diff --git a/include/block/block-common.h b/include/block/block-common.h
index e15395f2cb..f7a4e7d4db 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -282,6 +282,8 @@ typedef enum {
  *   layer rather than any backing, set by block layer
  * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
  * layer, set by block layer
+ * BDRV_BLOCK_COMPRESSED: the underlying data is compressed; only valid for
+ *        the formats supporting compression: qcow, qcow2
  *
  * Internal flags:
  * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
@@ -317,6 +319,7 @@ typedef enum {
 #define BDRV_BLOCK_ALLOCATED0x10
 #define BDRV_BLOCK_EOF  0x20
 #define BDRV_BLOCK_RECURSE  0x40
+#define BDRV_BLOCK_COMPRESSED   0x80
 
 typedef QTAILQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
 
-- 
2.31.1




[PATCH 3/3] qemu-iotests: update expected tests output to contain "compressed" field

2023-06-07 Thread Andrey Drobyshev via
The previous commit adds "compressed" boolean field to JSON output of
"qemu-img map" command.  Let's tweak expected tests output accordingly.

Signed-off-by: Andrey Drobyshev 
---
 tests/qemu-iotests/122.out|  84 
 tests/qemu-iotests/154.out| 194 +-
 tests/qemu-iotests/179.out| 178 
 tests/qemu-iotests/244.out|  24 +--
 tests/qemu-iotests/252.out|  10 +-
 tests/qemu-iotests/274.out|  48 ++---
 .../tests/nbd-qemu-allocation.out |   6 +-
 7 files changed, 272 insertions(+), 272 deletions(-)

diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index e18766e167..6a1aa3fe2b 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -67,12 +67,12 @@ read 65536/65536 bytes at offset 4194304
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 8388608
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 65536, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 4194304, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 4259840, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 8388608, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 8454144, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false}]
+[{ "start": 0, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 65536, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false},
+{ "start": 4194304, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 4259840, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false},
+{ "start": 8388608, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 8454144, "length": 4128768, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false}]
 read 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 65536/65536 bytes at offset 4194304
@@ -94,12 +94,12 @@ wrote 1024/1024 bytes at offset 1046528
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 1024/1024 bytes at offset 0
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 65536, "length": 65536, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 131072, "length": 196608, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 327680, "length": 655360, "depth": 0, "present": false, "zero": true, "data": false},
-{ "start": 983040, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true},
-{ "start": 1048576, "length": 1046528, "depth": 0, "present": false, "zero": true, "data": false}]
+[{ "start": 0, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 65536, "length": 65536, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false},
+{ "start": 131072, "length": 196608, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 327680, "length": 655360, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false},
+{ "start": 983040, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 1048576, "length": 1046528, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false}]
 read 16384/16384 bytes at offset 0
 16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 16384/16384 bytes at offset 16384
@@ -130,14 +130,14 @@ read 3145728/3145728 bytes at offset 0
 3 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 63963136/63963136 bytes at offset 3145728
 61 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 67108864, "depth": 0, "present": true, "zero": false, "data": true, "offset": OFFSET}]
+[{ "start": 0, "length": 67108864, "depth": 0, "present": true, "zero": false, "data": true, "compressed": false, "offset": OFFSET}]
 
 convert -c -S 0:
 read 3145728/3145728 bytes at offset 0
 3 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 63963136/63963136 bytes at offset 3145728
 61 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 67108864, "depth": 0, "present": true, "zero": false, "data": true}]
+[{ "start": 0, "length": 67108864, "depth": 0, "present": true, "zero": false, 

[PATCH 0/3] qemu-img: map: implement support for compressed clusters

2023-06-07 Thread Andrey Drobyshev via
This series adds "compressed" field to the output of "qemu-img map"
command, specifying whether or not data block is compressed.  Only
JSON output mode is affected.  With this applied, output looks like so:

# qemu-img create -f qcow2 img.qcow2 192K
# qemu-io -c "write -c -P 0xaa 0 64K" img.qcow2
# qemu-io -c "write -P 0xbb 64K 64K" img.qcow2
# qemu-img map --output=json img.qcow2

[{ "start": 0, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
{ "start": 65536, "length": 65536, "depth": 0, "present": true, "zero": false, "data": true, "compressed": false, "offset": 393216},
{ "start": 131072, "length": 65536, "depth": 0, "present": false, "zero": true, "data": false, "compressed": false}]

Only formats supporting compression are affected (qcow, qcow2).

Andrey Drobyshev (3):
  block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
  qemu-img: map: report compressed data blocks
  qemu-iotests: update expected tests output to contain "compressed"
field

 block/qcow.c  |   5 +-
 block/qcow2.c |   3 +
 include/block/block-common.h  |   3 +
 qapi/block-core.json  |   7 +-
 qemu-img.c|  16 +-
 tests/qemu-iotests/122.out|  84 
 tests/qemu-iotests/154.out| 194 +-
 tests/qemu-iotests/179.out| 178 
 tests/qemu-iotests/244.out|  24 +--
 tests/qemu-iotests/252.out|  10 +-
 tests/qemu-iotests/274.out|  48 ++---
 .../tests/nbd-qemu-allocation.out |   6 +-
 12 files changed, 300 insertions(+), 278 deletions(-)

-- 
2.31.1




[PATCH 2/3] qemu-img: map: report compressed data blocks

2023-06-07 Thread Andrey Drobyshev via
Right now "qemu-img map" reports compressed blocks as containing data
but having no host offset.  This is not very informative.  Instead,
let's add another boolean field named "compressed" in case JSON output
mode is specified.  This is achieved by utilizing new allocation status
flag BDRV_BLOCK_COMPRESSED for bdrv_block_status().

Signed-off-by: Andrey Drobyshev 
---
 qapi/block-core.json |  7 +--
 qemu-img.c   | 16 +---
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5dd5f7e4b0..bc6653e5d6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -409,6 +409,9 @@
 #
 # @zero: whether the virtual blocks read as zeroes
 #
+# @compressed: true indicates that data is stored compressed (the target
+# format must support compression)
+#
 # @depth: number of layers (0 = top image, 1 = top image's backing
 # file, ..., n - 1 = bottom image (where n is the number of images
 # in the chain)) before reaching one for which the range is
@@ -426,8 +429,8 @@
 ##
 { 'struct': 'MapEntry',
   'data': {'start': 'int', 'length': 'int', 'data': 'bool',
-   'zero': 'bool', 'depth': 'int', 'present': 'bool',
-   '*offset': 'int', '*filename': 'str' } }
+   'zero': 'bool', 'compressed': 'bool', 'depth': 'int',
+   'present': 'bool', '*offset': 'int', '*filename': 'str' } }
 
 ##
 # @BlockdevCacheInfo:
diff --git a/qemu-img.c b/qemu-img.c
index 27f48051b0..9bb69f58f6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3083,7 +3083,7 @@ static int img_info(int argc, char **argv)
 }
 
 static int dump_map_entry(OutputFormat output_format, MapEntry *e,
-  MapEntry *next)
+  MapEntry *next, bool can_compress)
 {
 switch (output_format) {
 case OFORMAT_HUMAN:
@@ -3112,6 +3112,9 @@ static int dump_map_entry(OutputFormat output_format, MapEntry *e,
e->present ? "true" : "false",
e->zero ? "true" : "false",
e->data ? "true" : "false");
+if (can_compress) {
+printf(", \"compressed\": %s", e->compressed ? "true" : "false");
+}
 if (e->has_offset) {
 printf(", \"offset\": %"PRId64"", e->offset);
 }
@@ -3172,6 +3175,7 @@ static int get_block_status(BlockDriverState *bs, int64_t offset,
 .length = bytes,
 .data = !!(ret & BDRV_BLOCK_DATA),
 .zero = !!(ret & BDRV_BLOCK_ZERO),
+.compressed = !!(ret & BDRV_BLOCK_COMPRESSED),
 .offset = map,
 .has_offset = has_offset,
 .depth = depth,
@@ -3189,6 +3193,7 @@ static inline bool entry_mergeable(const MapEntry *curr, const MapEntry *next)
 }
 if (curr->zero != next->zero ||
 curr->data != next->data ||
+curr->compressed != next->compressed ||
 curr->depth != next->depth ||
 curr->present != next->present ||
 !curr->filename != !next->filename ||
@@ -3218,6 +3223,7 @@ static int img_map(int argc, char **argv)
 bool force_share = false;
 int64_t start_offset = 0;
 int64_t max_length = -1;
+bool can_compress = false;
 
 fmt = NULL;
 output = NULL;
@@ -3313,6 +3319,10 @@ static int img_map(int argc, char **argv)
 length = MIN(start_offset + max_length, length);
 }
 
+if (output_format == OFORMAT_JSON) {
+can_compress = block_driver_can_compress(bs->drv);
+}
+
 curr.start = start_offset;
 while (curr.start + curr.length < length) {
 int64_t offset = curr.start + curr.length;
@@ -3330,7 +3340,7 @@ static int img_map(int argc, char **argv)
 }
 
 if (curr.length > 0) {
-ret = dump_map_entry(output_format, &curr, &next);
+ret = dump_map_entry(output_format, &curr, &next, can_compress);
 if (ret < 0) {
 goto out;
 }
@@ -3338,7 +3348,7 @@ static int img_map(int argc, char **argv)
 curr = next;
 }
 
-ret = dump_map_entry(output_format, &curr, NULL);
+ret = dump_map_entry(output_format, &curr, NULL, can_compress);
 if (output_format == OFORMAT_JSON) {
 puts("]");
 }
-- 
2.31.1




Re: [PULL 05/17] parallels: Out of image offset in BAT leads to image inflation

2023-06-07 Thread Hanna Czenczek

On 07.06.23 08:51, Michael Tokarev wrote:

05.06.2023 18:45, Hanna Czenczek wrote:

From: Alexander Ivanov 

The data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will
create the cluster at this offset and/or the image will be truncated to
this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and it is not a check (let the check repair the image). Set data_end
to the end of the cluster with the last correct offset.


Hi!

This, and a few other parallels changes in this series:

 parallels: Out of image offset in BAT leads to image inflation
 parallels: Fix high_off calculation in parallels_co_check()
 parallels: Fix image_end_offset and data_end after out-of-image check
 parallels: Fix statistics calculation (?)

Should these be applied to -stable too, or is it not important?


Personally, I don’t think they need to be in stable; but I’ll leave the 
final judgment to Alexander.


Hanna




Release of RSD (Rust QSD) 0.1

2023-06-07 Thread Hanna Czenczek

Hi everyone!

I’ve just released the first version (0.1) of RSD, which is a 
proof-of-concept to rewrite the qemu-storage-daemon (QSD) and thus the 
qemu block layer in Rust:


https://gitlab.com/hreitz/rsd
https://gitlab.com/hreitz/rsd/-/releases/v0.1

We’ve been talking for quite a long time about adding Rust to qemu and 
the qemu block layer, and usually ended with “Could be nice, we just 
need someone to start.”  After last year’s discussion about maybe 
adding C++ for language-supported coroutines, I thought if the time 
isn’t now, it’s never.


In the process, I’ve gathered some insights into obstacles and benefits 
that Rust could bring us, which I’ve summed up in two blog posts:


Part 1 (Overview): https://czenczek.de/blog/rsd-overview.html
Part 2 (Performance): https://czenczek.de/blog/rsd-performance.html

The bottom line so far is that Rust could bring us valuable benefits, 
but likely only if we rewrote everything.  A middle ground is possible, 
by keeping RSD focused on a specific subset of functionality, the one 
that is most valuable for it.  In any case, this (v0.1) is as far as is 
reasonable to go with RSD as an experiment – any further work would need 
to be done in earnest, with a definition of what exactly we want.  And 
that’s an important part of what this announcement is for: to see 
whether anyone has that interest!



Hanna




Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies

2023-06-07 Thread Richard W.M. Jones
On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:
> On 5/25/23 15:00, Eric Blake wrote:
> > @@ -69,11 +75,18 @@  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
> >   REPLY.STRUCTURED_REPLY.CHECK:
> >struct command *cmd = h->reply_cmd;
> >uint16_t flags, type;
> > -  uint32_t length;
> > +  uint64_t length;
> > +  uint64_t offset = -1;
> 
> (6) I disagree with initializing the local variable "offset" here.
> 
> Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read
> "offset" back if "extended_headers" is set. But if "extended_headers" is
> set, we also store a value to "offset", before the read.
> 
> Initializing "offset" to -1 suggests that the code might otherwise read
> an indeterminate value from "offset" -- but that's not the case.

You may find that the compiler will give a warning.  It's usually not
good at handling the case where whether a variable is initialized and
used depends on another variable being true.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




[PATCH] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Vladimir Sementsov-Ogievskiy
The event is racy: it will not appear in the output if the bitmap is
migrated during the downtime period of migration and the postcopy phase
never starts.

Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
Reported-by: Richard Henderson 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

The patch fixes the problem described in
  [PATCH] gitlab: Disable io-raw-194 for build-tcg-disabled
and we can keep the test in gitlab ci

 tests/qemu-iotests/194 | 5 +
 tests/qemu-iotests/194.out | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 68894371f5..c0ce82dd25 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -74,6 +74,11 @@ with iotests.FilePath('source.img') as source_img_path, \
 
 while True:
 event1 = source_vm.event_wait('MIGRATION')
+if event1['data']['status'] == 'postcopy-active':
+# This event is racy: whether it appears depends on whether postcopy
+# actually runs, or the bitmap was migrated entirely during downtime
+# (leaving no data for the postcopy phase). So, don't log it.
+continue
 iotests.log(event1, filters=[iotests.filter_qmp_event])
 if event1['data']['status'] in ('completed', 'failed'):
 iotests.log('Gracefully ending the `drive-mirror` job on 
source...')
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index 4e6df1565a..376ed1d2e6 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -14,7 +14,6 @@ Starting migration...
 {"return": {}}
 {"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"status": "postcopy-active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 Gracefully ending the `drive-mirror` job on source...
 {"return": {}}
-- 
2.34.1




Re: [PATCH] gitlab: Disable io-raw-194 for build-tcg-disabled

2023-06-07 Thread Vladimir Sementsov-Ogievskiy

On 07.06.23 15:44, Stefan Hajnoczi wrote:

The line of output that has changed was originally added by the
following commit:

commit ae00aa2398476824f0eca80461da215e7cdc1c3b
Author: Vladimir Sementsov-Ogievskiy 
Date:   Fri May 22 01:06:46 2020 +0300

 iotests: 194: test also migration of dirty bitmap

Vladimir: Any idea why the postcopy-active event may not be emitted in
some cases?



I think:

A fast connection + little postcopy data => postcopy never actually starts; 
everything is migrated during downtime, while both source and target are paused.

The test doesn't aim to test postcopy specifically, it just wants to check that 
bitmaps are migrated at all.


So, we need something like this:

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 68894371f5..c0ce82dd25 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -74,6 +74,11 @@ with iotests.FilePath('source.img') as source_img_path, \
 
 while True:

 event1 = source_vm.event_wait('MIGRATION')
+if event1['data']['status'] == 'postcopy-active':
+# This event is racy: whether it appears depends on whether postcopy
+# actually runs, or the bitmap was migrated entirely during downtime
+# (leaving no data for the postcopy phase). So, don't log it.
+continue
 iotests.log(event1, filters=[iotests.filter_qmp_event])
 if event1['data']['status'] in ('completed', 'failed'):
 iotests.log('Gracefully ending the `drive-mirror` job on 
source...')





On Tue, 6 Jun 2023 at 12:26, Richard Henderson
 wrote:


This test consistently fails on Azure cloud build hosts in
a way that suggests a timing problem in the test itself:

--- .../194.out
+++ .../194.out.bad
@@ -14,7 +14,6 @@
  {"return": {}}
  {"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", 
"seconds": "SECS"}}
  {"data": {"status": "active"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", 
"seconds": "SECS"}}
-{"data": {"status": "postcopy-active"}, "event": "MIGRATION", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
  {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", 
"seconds": "SECS"}}
  Gracefully ending the `drive-mirror` job on source...
  {"return": {}}

Signed-off-by: Richard Henderson 
---
  .gitlab-ci.d/buildtest.yml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 0f1be14cb6..62483f 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -236,7 +236,7 @@ build-tcg-disabled:
  - cd tests/qemu-iotests/
  - ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
  052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
-170 171 183 184 192 194 208 221 226 227 236 253 277 image-fleecing
+170 171 183 184 192 208 221 226 227 236 253 277 image-fleecing
  - ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
  124 132 139 142 144 145 151 152 155 157 165 194 196 200 202
  208 209 216 218 227 234 246 247 248 250 254 255 257 258
--
2.34.1




--
Best regards,
Vladimir




Re: [Libguestfs] [libnbd PATCH v3 07/22] generator: Add struct nbd_extent in prep for 64-bit extents

2023-06-07 Thread Laszlo Ersek
On 5/25/23 15:00, Eric Blake wrote:
> The existing nbd_block_status() callback is permanently stuck with an
> array of uint32_t pairs (len/2 extents), which is both constrained on

(

so the terminology is:

- we have "len" uint32_t elements in the array,

- one extent is one uint32_t pair,

- hence we have len/2 extents in the array

)

> maximum extent size (no larger than 4G) and on the status flags (must
> fit in 32 bits).  While the "base:allocation" metacontext will never
> exceed 32 bits, it is not hard to envision other extension
> metacontexts where a 64-bit status would be useful (for example, Zoned
> Block Devices expressing a 64-bit offset[1]).  Exposing 64-bit extents
> will require a new API; we now have the decision of whether to copy
> the existing API practice of returning a bare array containing len/2
> extent pairs,

The term "extent pairs" is inconsistent with the above terminology.

(1) We should say:

- "len/2 extents", or
- "len/2 uint64_t pairs".

> or with a saner idea

(2) s/or with/or go with/ ?

> of an array of structs with len
> extents.

This sounds OK; if the element type of the array is a struct, then one
struct describes one extent, and then the "len" elements of the array
describe "len" extents.

> Returning an array of structs is actually easier to map to
> various language bindings, particularly since we know the length field
> will generally fit in 63-bits (fallout from the fact that NBD exports
> are effectively capped in size by signed off_t), but where the status
> field must represent a full 64 bits (assuming that the user wants to
> connect to a metadata extension that utilizes 64 bits, rather than
> existing metadata contexts that only expose 32 bits).

This is a good argument, but I needed to chew on the "but" conjunction
for a good while until I understood the point. The "but" seemed out of
place at first; only later did I realize it was actually the core of the
argument ("easier to map").

The point is that for each extent, we have two properties that are same
in size but that differ in meaning (hence in type), and *therefore* a
struct is better than two adjacent, raw uint64_t elements. We're going
to have two uint64_t fields, but conceptually, that's one off_t field
(just serialized as a uint64_t) and a genuine uint64_t field, for
status.

(3) Can you append a sentence, "That is, conceptually, the fields have
different types, so a structure is best to express them"? Or something
similar?

>
> This patch introduces the struct we plan to use in the new API, along
> with language bindings.  The bindings for Python and OCaml were
> relatively straightforward; the Golang bindings took a bit more effort
> for me to write.  Temporary unused attributes are needed to keep the
> compiler happy until a later patch exposes a new API using the new
> callback type.

(4) Can you say "We need to insert '__attribute__ ((unused))' a number
of times, temporarily, to keep the compiler happy..."?

"Unused attributes" sounds as if we needed some attributes that were not
used. Confusing to me. What we need instead is to mark some variables or
functions with an attribute that says those vars/funcs are unused.

>
> Note that 'struct nbd_block_descriptor_ext' in lib/nbd-protocol.h is
> exactly the same as what we want to use in C.  But it is easier to
> stick a new public type in  than to figure out how to expose
> just part of a header we only want to use internally.

Could be OK or fishy; I should be able to tell later :)

>
> [1] https://zonedstorage.io/docs/linux/zbd-api
>
> Signed-off-by: Eric Blake 
> ---
>  generator/API.mli   |  1 +
>  generator/API.ml| 12 +++-
>  generator/C.ml  | 24 +---
>  generator/GoLang.ml | 24 
>  generator/OCaml.ml  | 24 +---
>  generator/Python.ml | 21 -
>  ocaml/helpers.c | 20 
>  ocaml/nbd-c.h   |  1 +
>  golang/handle.go|  6 ++
>  9 files changed, 125 insertions(+), 8 deletions(-)
>
> diff --git a/generator/API.mli b/generator/API.mli
> index c5bba8c2..96626c9b 100644
> --- a/generator/API.mli
> +++ b/generator/API.mli
> @@ -52,6 +52,7 @@ and
>  | BytesPersistOut of string * string
>  | Closure of closure   (** function pointer + void *opaque *)
>  | Enum of string * enum(** enum/union type, int in C *)
> +| Extent64 of string   (** extent descriptor, struct nbd_extent in C *)
>  | Fd of string (** file descriptor *)
>  | Flags of string * flags  (** flags, uint32_t in C *)
>  | Int of string(** small int *)
> diff --git a/generator/API.ml b/generator/API.ml
> index 02c1260d..1da1ec53 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -42,6 +42,7 @@
>  | BytesPersistOut of string * string
>  | Closure of closure
>  | Enum of string * enum
> +| Extent64 of string
>  | Fd of string
>  | Flags of string * flags
>  | Int of string
> @@ -157,6 +158,14 @@ let extent_closure 

Re: [PATCH 1/2] block/file-posix: fix g_file_get_contents return path

2023-06-07 Thread Matthew Rosato
On 6/4/23 2:16 AM, Sam Li wrote:
> The g_file_get_contents() function returns a gboolean. If it fails, the
> returned value will be 0 instead of -1. Solve the issue by not assigning
> its return value to ret.
> 
> This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
> by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.
> 
> Signed-off-by: Sam Li 

Reviewed-by: Matthew Rosato 

Also tested to verify that this fix resolves the reported issue.  Thanks!





Re: [PATCH] gitlab: Disable io-raw-194 for build-tcg-disabled

2023-06-07 Thread Stefan Hajnoczi
The line of output that has changed was originally added by the
following commit:

commit ae00aa2398476824f0eca80461da215e7cdc1c3b
Author: Vladimir Sementsov-Ogievskiy 
Date:   Fri May 22 01:06:46 2020 +0300

iotests: 194: test also migration of dirty bitmap

Vladimir: Any idea why the postcopy-active event may not be emitted in
some cases?

Stefan

On Tue, 6 Jun 2023 at 12:26, Richard Henderson
 wrote:
>
> This test consistently fails on Azure cloud build hosts in
> a way that suggests a timing problem in the test itself:
>
> --- .../194.out
> +++ .../194.out.bad
> @@ -14,7 +14,6 @@
>  {"return": {}}
>  {"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
>  {"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
> -{"data": {"status": "postcopy-active"}, "event": "MIGRATION", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
>  {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
>  Gracefully ending the `drive-mirror` job on source...
>  {"return": {}}
>
> Signed-off-by: Richard Henderson 
> ---
>  .gitlab-ci.d/buildtest.yml | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 0f1be14cb6..62483f 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -236,7 +236,7 @@ build-tcg-disabled:
>  - cd tests/qemu-iotests/
>  - ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 
> 048
>  052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
> -170 171 183 184 192 194 208 221 226 227 236 253 277 
> image-fleecing
> +170 171 183 184 192 208 221 226 227 236 253 277 image-fleecing
>  - ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
>  124 132 139 142 144 145 151 152 155 157 165 194 196 200 202
>  208 209 216 218 227 234 246 247 248 250 254 255 257 258
> --
> 2.34.1
>
>



Re: [PATCH v3 09/14] nbd/server: Initial support for extended headers

2023-06-07 Thread Eric Blake
On Wed, May 31, 2023 at 05:46:55PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Time to support clients that request extended headers.  Now we can
> > finally reach the code added across several previous patches.
> > 
> > Even though the NBD spec has been altered to allow us to accept
> > NBD_CMD_READ larger than the max payload size (provided our response
> > is a hole or broken up over more than one data chunk), we are not
> > planning to take advantage of that, and continue to cap NBD_CMD_READ
> > to 32M regardless of header size.
> > 
> > For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
> > supports 64-bit operations without any effort on our part.  For
> > NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
> > patch took care of implementing the required
> > NBD_REPLY_TYPE_BLOCK_STATUS_EXT.
> > 
> > Signed-off-by: Eric Blake 
> > ---
> >   nbd/nbd-internal.h |   5 +-
> 
> [..]
> 
> > 
> >   static inline void set_be_simple_reply(NBDClient *client, struct iovec 
> > *iov,
> > -   uint64_t error, NBDRequest *request)
> > +   uint32_t error, NBDStructuredError 
> > *err,
> > +   NBDRequest *request)
> >   {
> > -NBDSimpleReply *reply = iov->iov_base;
> > +if (client->header_style >= NBD_HEADER_EXTENDED) {
> > +NBDExtendedReplyChunk *chunk = iov->iov_base;
> > 
> > -iov->iov_len = sizeof(*reply);
> > -stl_be_p(>magic, NBD_SIMPLE_REPLY_MAGIC);
> > -stl_be_p(>error, error);
> > -stq_be_p(>handle, request->handle);
> > +iov->iov_len = sizeof(*chunk);
> > +stl_be_p(>magic, NBD_EXTENDED_REPLY_MAGIC);
> > +stw_be_p(>flags, NBD_REPLY_FLAG_DONE);
> > +stq_be_p(>handle, request->handle);
> > +stq_be_p(>offset, request->from);
> > +if (error) {
> > +assert(!iov[1].iov_base);
> > +iov[1].iov_base = err;
> > +iov[1].iov_len = sizeof(*err);
> > +stw_be_p(>type, NBD_REPLY_TYPE_ERROR);
> > +stq_be_p(>length, sizeof(*err));
> > +stl_be_p(>error, error);
> > +stw_be_p(>message_length, 0);
> > +} else {
> > +stw_be_p(>type, NBD_REPLY_TYPE_NONE);
> > +stq_be_p(>length, 0);
> > +}
> > +} else {
> > +NBDSimpleReply *reply = iov->iov_base;
> > +
> > +iov->iov_len = sizeof(*reply);
> > +stl_be_p(>magic, NBD_SIMPLE_REPLY_MAGIC);
> > +stl_be_p(>error, error);
> > +stq_be_p(>handle, request->handle);
> > +}
> >   }
> > 
> >   static int coroutine_fn nbd_co_send_simple_reply(NBDClient *client,
> > @@ -1906,30 +1966,44 @@ static int coroutine_fn 
> > nbd_co_send_simple_reply(NBDClient *client,
> 
> So, that's not _simple_ anymore. The function should be renamed, as should
> set_be_simple_reply(). _simple_or_extended_? A bit too long. But continuing
> to use "simple" clashes with how the specification uses the word "simple".

In fact, I added an assertion that set_be_simple_reply() can only be
reached when extended replies are not in use, so none of this
complexity here was needed after all.

> 
> Probably better to update callers? The only caller is
> nbd_send_generic_reply(). So, could we just add
> nbd_co_send_single_extended_reply() to call from nbd_send_generic_reply() in
> case of EXTENDED?
> 
> Also, transformation of set_be_simple_reply() do look like it should be two 
> separate functions.
> 
> >   {
> >   NBDReply hdr;
> >   int nbd_err = system_errno_to_nbd_errno(error);
> > +NBDStructuredError err;
> >   struct iovec iov[] = {
> >   {.iov_base = },
> >   {.iov_base = data, .iov_len = len}
> >   };
> > 
> > +assert(!len || !nbd_err);
> >   trace_nbd_co_send_simple_reply(request->handle, nbd_err,
> >  nbd_err_lookup(nbd_err), len);
> > -set_be_simple_reply(client, [0], nbd_err, request);
> > +set_be_simple_reply(client, [0], nbd_err, , request);
> > 
> > -return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
> > +return nbd_co_send_iov(client, iov, iov[1].iov_len ? 2 : 1, errp);

Not introduced in this patch, but it turns out that when
iov[1].iov_len == 0, blindly passing niov==2 to nbd_co_send_iov()
still does the right thing, so I can lose the conditional here.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [libnbd PATCH v3 04/22] states: Prepare to send 64-bit requests

2023-06-07 Thread Richard W.M. Jones
On Thu, May 25, 2023 at 08:00:50AM -0500, Eric Blake wrote:
> Support sending 64-bit requests if extended headers were negotiated.
> This includes setting NBD_CMD_FLAG_PAYLOAD_LEN any time we send an
> extended NBD_CMD_WRITE; this is such a fundamental part of the
> protocol that for now it is easier to silently ignore whatever value
> the user passes in for that bit in the flags parameter of nbd_pwrite
> regardless of the current settings in set_strict_mode, rather than
> trying to force the user to pass in the correct value to match whether
> extended mode is negotiated.  However, when we later add APIs to give
> the user more control for interoperability testing, it may be worth
> adding a new set_strict_mode control knob to explicitly enable the
> client to intentionally violate the protocol (the testsuite added in
> this patch would then be updated to match).

In hindsight it's unfortunate that the flags parameter of the library
ABI functions like nbd_pread, nbd_pwrite etc got directly mapped to
the command flags used on the wire.  But given that it is, I agree
with this approach.

> At this point, h->extended_headers is permanently false (we can't
> enable it until all other aspects of the protocol have likewise been
> converted).
> 
> Support for using FLAG_PAYLOAD_LEN with NBD_CMD_BLOCK_STATUS is less
> fundamental, and deserves to be in its own patch.
> 
> Signed-off-by: Eric Blake 
> ---
>  lib/internal.h  |  10 ++-
>  generator/API.ml|  20 +++--
>  generator/states-issue-command.c|  29 ---
>  generator/states-reply-structured.c |   2 +-
>  lib/rw.c|  17 +++--
>  tests/Makefile.am   |   4 +
>  tests/pwrite-extended.c | 112 
>  .gitignore  |   1 +
>  8 files changed, 169 insertions(+), 26 deletions(-)
>  create mode 100644 tests/pwrite-extended.c
> 
> diff --git a/lib/internal.h b/lib/internal.h
> index c71980ef..8a5f93d4 100644
> --- a/lib/internal.h
> +++ b/lib/internal.h
> @@ -110,6 +110,9 @@ struct nbd_handle {
>char *tls_username;   /* Username, NULL = use current username */
>char *tls_psk_file;   /* PSK filename, NULL = no PSK */
> 
> +  /* Extended headers. */
> +  bool extended_headers;/* If we negotiated NBD_OPT_EXTENDED_HEADERS 
> */
> +
>/* Desired metadata contexts. */
>bool request_sr;
>bool request_meta;
> @@ -263,7 +266,10 @@ struct nbd_handle {
>/* Issuing a command must use a buffer separate from sbuf, for the
> * case when we interrupt a request to service a reply.
> */
> -  struct nbd_request request;
> +  union {
> +struct nbd_request compact;
> +struct nbd_request_ext extended;
> +  } req;
>bool in_write_payload;
>bool in_write_shutdown;
> 
> @@ -364,7 +370,7 @@ struct command {
>uint16_t type;
>uint64_t cookie;
>uint64_t offset;
> -  uint32_t count;
> +  uint64_t count;
>void *data; /* Buffer for read/write */
>struct command_cb cb;
>bool initialized; /* For read, true if getting a hole may skip memset */
> diff --git a/generator/API.ml b/generator/API.ml
> index 5fcb0e1f..02c1260d 100644
> --- a/generator/API.ml
> +++ b/generator/API.ml
> @@ -198,11 +198,12 @@ let cmd_flags =
>flag_prefix = "CMD_FLAG";
>guard = Some "((h->strict & LIBNBD_STRICT_FLAGS) || flags > UINT16_MAX)";
>flags = [
> -"FUA",   1 lsl 0;
> -"NO_HOLE",   1 lsl 1;
> -"DF",1 lsl 2;
> -"REQ_ONE",   1 lsl 3;
> -"FAST_ZERO", 1 lsl 4;
> +"FUA", 1 lsl 0;
> +"NO_HOLE", 1 lsl 1;
> +"DF",  1 lsl 2;
> +"REQ_ONE", 1 lsl 3;
> +"FAST_ZERO",   1 lsl 4;
> +"PAYLOAD_LEN", 1 lsl 5;
>]
>  }
>  let handshake_flags = {
> @@ -2507,7 +2508,7 @@   "pread_structured", {
>"pwrite", {
>  default_call with
>  args = [ BytesIn ("buf", "count"); UInt64 "offset" ];
> -optargs = [ OFlags ("flags", cmd_flags, Some ["FUA"]) ];
> +optargs = [ OFlags ("flags", cmd_flags, Some ["FUA"; "PAYLOAD_LEN"]) ];
>  ret = RErr;
>  permitted_states = [ Connected ];
>  shortdesc = "write to the NBD server";
> @@ -2530,7 +2531,10 @@   "pwrite", {
>  C meaning that the server should not
>  return until the data has been committed to permanent storage
>  (if that is supported - some servers cannot do this, see
> -L)."
> +L).  For convenience, libnbd ignores the presence
> +or absence of the flag C in C,
> +while correctly using the flag over the wire according to whether
> +extended headers were negotiated."
>  ^ strict_call_description;
>  see_also = [Link "can_fua"; Link "is_read_only";
>  Link "aio_pwrite"; Link "get_block_size";
> @@ -3220,7 +3224,7 @@   "aio_pwrite", {
>  default_call with
>  args = [ BytesPersistIn ("buf", "count"); UInt64 "offset" ];
>  optargs = [ OClosure completion_closure;
> -OFlags ("flags", cmd_flags, Some 

Re: [Libguestfs] [libnbd PATCH v3 03/22] protocol: Add definitions for extended headers

2023-06-07 Thread Richard W.M. Jones
On Tue, May 30, 2023 at 05:48:25PM +0200, Laszlo Ersek wrote:
> BTW I'm foreseeing a problem: if the extended block descriptor can
> provide an unsigned 64-bit length, we're going to have trouble exposing
> that in OCaml, because OCaml only has signed 64-bit integers. So that's
> going to reproduce the same issue, only for OCaml callers of the *new* API.
> 
> I can see Eric's series includes patches like "ocaml: Add example for
> 64-bit extents" -- I've not looked at those yet; for now I'm just
> wondering what tricks we might need in the bindings generator. The
> method seen in the "middle patch" above won't work; we don't have a
> native OCaml "i128" type for example that we could use as an escape
> hatch, for representing C's uint64_t.

I think that's OK because disk sizes are already limited to
2^63 - 1 by the kernel (and for qemu even less than that).
The OCaml bindings return a (signed) int64 for NBD.get_size.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PULL 05/17] parallels: Out of image offset in BAT leads to image inflation

2023-06-07 Thread Michael Tokarev

07.06.2023 09:51, Michael Tokarev wrote:

05.06.2023 18:45, Hanna Czenczek wrote:

From: Alexander Ivanov 

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will
create the cluster at this offset and/or the image will be truncated to
this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and it is not a check (let the check repair the image). Set data_end
to the end of the cluster with the last correct offset.


Hi!

This, and a few other parallels changes in this series:

  parallels: Out of image offset in BAT leads to image inflation
  parallels: Fix high_off calculation in parallels_co_check()
  parallels: Fix image_end_offset and data_end after out-of-image check
  parallels: Fix statistics calculation (?)


And probably also:

  parallels: Incorrect condition in out-of-image check


Should these be applied to -stable too, or is it not important?

Thanks,

/mjt






Re: [PULL 05/17] parallels: Out of image offset in BAT leads to image inflation

2023-06-07 Thread Michael Tokarev

05.06.2023 18:45, Hanna Czenczek wrote:

From: Alexander Ivanov 

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will
create the cluster at this offset and/or the image will be truncated to
this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and it is not a check (let the check repair the image). Set data_end
to the end of the cluster with the last correct offset.


Hi!

This, and a few other parallels changes in this series:

 parallels: Out of image offset in BAT leads to image inflation
 parallels: Fix high_off calculation in parallels_co_check()
 parallels: Fix image_end_offset and data_end after out-of-image check
 parallels: Fix statistics calculation (?)

Should these be applied to -stable too, or is it not important?

Thanks,

/mjt