Re: [Qemu-block] [PATCH v2 1/7] nbd-client: Fix error message typos

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:56, Eric Blake wrote:

Provide missing spaces that are required when using string
concatenation to break error messages across source lines.
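
(Background for readers: adjacent C string literals are concatenated at
compile time, so a missing space at the split point silently fuses two
words. A standalone illustration:)

  #include <stdio.h>

  int main(void)
  {
      /* Without a trailing space the literals fuse into "...chunkwith..." */
      printf("Protocol error: server sent structured error chunk"
             "with error = 0\n");
      /* With the space restored the message reads correctly */
      printf("Protocol error: server sent structured error chunk "
             "with error = 0\n");
      return 0;
  }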

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



---
  block/nbd-client.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index b44d4d4a01..de6c153328 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -248,7 +248,7 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,

     error = nbd_errno_to_system_errno(payload_advance32(&payload));
     if (error == 0) {
-        error_setg(errp, "Protocol error: server sent structured error chunk"
+        error_setg(errp, "Protocol error: server sent structured error chunk "
                    "with error = 0");
         return -EINVAL;
     }
@@ -257,7 +257,7 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,
     message_size = payload_advance16(&payload);

     if (message_size > chunk->length - sizeof(error) - sizeof(message_size)) {
-        error_setg(errp, "Protocol error: server sent structured error chunk"
+        error_setg(errp, "Protocol error: server sent structured error chunk "
                    "with incorrect message size");
         return -EINVAL;
     }
@@ -408,7 +408,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
     if (chunk->type == NBD_REPLY_TYPE_NONE) {
         if (!(chunk->flags & NBD_REPLY_FLAG_DONE)) {
             error_setg(errp, "Protocol error: NBD_REPLY_TYPE_NONE chunk without"
-                             "NBD_REPLY_FLAG_DONE flag set");
+                       " NBD_REPLY_FLAG_DONE flag set");


I think it's better not to change the indentation here, as it is done 
this way in other places.

You don't like this way of indenting splitted strings?


  return -EINVAL;
  }
  return 0;



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 6/7] nbd-client: Stricter enforcing of structured reply spec

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:57, Eric Blake wrote:

Ensure that the server is not sending unexpected chunk lengths
for either the NONE or the OFFSET_DATA chunk, nor unexpected
hole length for OFFSET_HOLE.  This will flag any server that
responds to a zero-length read with an OFFSET_DATA as broken,


or OFFSET_HOLE


even though we previously fixed our client to never be able to
send such a request over the wire.

Signed-off-by: Eric Blake 
---
  block/nbd-client.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 0a675d0fab..73c9fe0905 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -216,7 +216,7 @@ static int nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
     offset = payload_advance64(&payload);
     hole_size = payload_advance32(&payload);

-    if (offset < orig_offset || hole_size > qiov->size ||
+    if (!hole_size || offset < orig_offset || hole_size > qiov->size ||
         offset > orig_offset + qiov->size - hole_size) {
         error_setg(errp, "Protocol error: server sent chunk exceeding requested"
                    " region");
@@ -281,7 +281,8 @@ static int nbd_co_receive_offset_data_payload(NBDClientSession *s,

     assert(nbd_reply_is_structured(&s->reply));

-    if (chunk->length < sizeof(offset)) {
+    /* The NBD spec requires at least one byte of payload */
+    if (chunk->length <= sizeof(offset)) {
         error_setg(errp, "Protocol error: invalid payload for "
                    "NBD_REPLY_TYPE_OFFSET_DATA");
         return -EINVAL;
@@ -293,7 +294,7 @@ static int nbd_co_receive_offset_data_payload(NBDClientSession *s,
     be64_to_cpus(&offset);

     data_size = chunk->length - sizeof(offset);
-    if (offset < orig_offset || data_size > qiov->size ||
+    if (!data_size || offset < orig_offset || data_size > qiov->size ||


!data_size is always false here (because of the previous check), and even 
if it could be zero, it wouldn't correspond to the error message.

without this, or with an assert instead:

Reviewed-by: Vladimir Sementsov-Ogievskiy 



         offset > orig_offset + qiov->size - data_size) {
         error_setg(errp, "Protocol error: server sent chunk exceeding requested"
                    " region");
@@ -411,6 +412,11 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
                        " NBD_REPLY_FLAG_DONE flag set");
             return -EINVAL;
         }
+        if (chunk->length) {
+            error_setg(errp, "Protocol error: NBD_REPLY_TYPE_NONE chunk with"
+                       " nonzero length");
+            return -EINVAL;
+        }
         return 0;
     }




--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] question: I found a qemu crash when attach virtio-scsi disk

2017-11-09 Thread Stefan Hajnoczi
On Tue, Nov 07, 2017 at 11:39:44AM +0100, Paolo Bonzini wrote:
> On 07/11/2017 02:59, Fam Zheng wrote:
> > On Mon, 11/06 17:33, Paolo Bonzini wrote:
> >> On 06/11/2017 17:11, Kevin Wolf wrote:
> >>> Am 03.11.2017 um 11:26 hat Stefan Hajnoczi geschrieben:
>  On Wed, Nov 01, 2017 at 06:42:33AM +, lizhengui wrote:
> > Hi, when I attach a virtio-scsi disk to a VM, a qemu crash happens with 
> > very low probability. 
> >
> > The qemu crash bt is below:
> >
> > #0  0x7f2be3ada1d7 in raise () from /usr/lib64/libc.so.6
> > #1  0x7f2be3adb8c8 in abort () from /usr/lib64/libc.so.6
> > #2  0x0084fe49 in PAT_abort ()
> > #3  0x0084ce8d in patchIllInsHandler ()
> > #4  <signal handler called>
> > #5  0x008228bb in qemu_strnlen ()
> > #6  0x00822934 in strpadcpy ()
> > #7  0x00684a88 in scsi_disk_emulate_inquiry ()
> > #8  0x0068744b in scsi_disk_emulate_command ()
> > #9  0x0068c481 in scsi_req_enqueue ()
> > #10 0x004b1f00 in virtio_scsi_handle_cmd_req_submit ()
> > #11 0x004b2e9e in virtio_scsi_handle_cmd_vq ()
> > #12 0x0076dba7 in aio_dispatch ()
> > #13 0x0076dd96 in aio_poll ()
> > #14 0x007a8673 in blk_prw ()
> > #15 0x007a922c in blk_pread ()
> > #16 0x007a9cd0 in blk_pread_unthrottled ()
> > #17 0x005cb404 in guess_disk_lchs ()
> > #18 0x005cb5b4 in hd_geometry_guess ()
> > #19 0x005cad56 in blkconf_geometry ()
> > #20 0x00685956 in scsi_realize ()
> > #21 0x0068d3e3 in scsi_qdev_realize ()
> > #22 0x005e3938 in device_set_realized ()
> > #23 0x0075bced in property_set_bool ()
> > #24 0x00760205 in object_property_set_qobject ()
> > #25 0x0075df64 in object_property_set_bool ()
> > #26 0x005580ad in qdev_device_add ()
> > #27 0x0055850b in qmp_device_add ()
> > #28 0x00818b37 in do_qmp_dispatch.constprop.1 ()
> > #29 0x00818d8b in qmp_dispatch ()
> > #30 0x0045d212 in handle_qmp_command ()
> > #31 0x0081f819 in json_message_process_token ()
> > #32 0x008434d0 in json_lexer_feed_char ()
> > #33 0x008435e6 in json_lexer_feed ()
> > #34 0x0045bd72 in monitor_qmp_read ()
> > #35 0x0055ecf3 in tcp_chr_read ()
> > #36 0x7f2be4cf899a in g_main_context_dispatch () from 
> > /usr/lib64/libglib-2.0.so.0
> > #37 0x0076b86b in os_host_main_loop_wait ()
> > #38 0x0076b995 in main_loop_wait ()
> > #39 0x00569c51 in main_loop ()
> > #40 0x00420665 in main ()
> >
> > From the qemu crash bt, we can see that scsi_realize has not 
> > completed yet. Some fields such as vendor and version in SCSIDiskState are 
> > NULL at this moment. If qemu handles a scsi request from this scsi disk 
> > at this moment, qemu will access some null pointers and crash.
> > How can I solve this problem? Should we add a check of whether the 
> > scsi disk has realized or not in scsi_disk_emulate_command before 
> > handling scsi requests? 
> 
>  Please try this patch:
> 
>  diff --git a/hw/block/block.c b/hw/block/block.c
>  index 27878d0087..df99ddb899 100644
>  --- a/hw/block/block.c
>  +++ b/hw/block/block.c
>  @@ -120,9 +120,16 @@ void blkconf_geometry(BlockConf *conf, int *ptrans,
>   }
>   }
>   if (!conf->cyls && !conf->heads && !conf->secs) {
>  +        AioContext *ctx = blk_get_aio_context(conf->blk);
>  +
>  +        /* Callers may not expect this function to dispatch aio handlers, so
>  +         * disable external aio such as guest device emulation.
>  +         */
>  +        aio_disable_external(ctx);
>           hd_geometry_guess(conf->blk,
>                             &conf->cyls, &conf->heads, &conf->secs,
>                             ptrans);
>  +        aio_enable_external(ctx);
>       } else if (ptrans && *ptrans == BIOS_ATA_TRANSLATION_AUTO) {
>           *ptrans = hd_bios_chs_auto_trans(conf->cyls, conf->heads, conf->secs);
>   }
> >>>
> >>> But why is the new disk even attached to the controller and visible to
> >>> the guest at this point when it hasn't completed its initialisation yet?
> >>> Isn't the root problem that we're initialising things in the wrong
> >>> order?
> >>
> >> Well, the root cause then is that scsi_device_find is just reusing the
> >> list of devices on the SCSI bus.  Devices are added to that list very
> >> early by qdev_set_parent_bus.
> >>
> >> Stefan's patch could make the issue even harder to hit, but I think that
> >> with iothreads you could hit it anyway.
> >>
> >> The solution is probably to add an "online" flag to the device, and set
> >> it in scsi_device_realize.  
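
(A self-contained sketch of the ordering problem and the proposed flag --
hypothetical names, only to illustrate why request dispatch must check a
flag that realize sets last; this is not actual QEMU code:)

  #include <stdbool.h>
  #include <stdio.h>

  typedef struct SCSIDev {
      const char *vendor;   /* filled in during realize */
      bool online;          /* set only once realize has completed */
  } SCSIDev;

  /* Stands in for scsi_req_enqueue(): refuse requests until the device
   * is online instead of dereferencing half-initialized fields. */
  static int handle_request(SCSIDev *dev)
  {
      if (!dev->online) {
          fprintf(stderr, "device not realized yet, rejecting request\n");
          return -1;
      }
      printf("INQUIRY vendor: %s\n", dev->vendor);
      return 0;
  }

  int main(void)
  {
      SCSIDev dev = {0};    /* already on the bus list, realize unfinished */

      handle_request(&dev); /* safely rejected instead of crashing */

      dev.vendor = "QEMU";  /* realize completes... */
      dev.online = true;    /* ...and marks the device usable last */
      return handle_request(&dev);
  }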

Re: [Qemu-block] [PATCH v2 3/7] nbd/client: Nicer trace of structured reply

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:56, Eric Blake wrote:

It's useful to know which structured reply chunk is being processed.

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




[Qemu-block] [PATCH] coroutine: simplify co_aio_sleep_ns() prototype

2017-11-09 Thread Stefan Hajnoczi
The AioContext pointer argument to co_aio_sleep_ns() is only used for
the sleep timer.  It does not affect where the caller coroutine is
resumed.

Due to changes to coroutine and AIO APIs it is now possible to drop the
AioContext pointer argument.  This is safe to do since no caller has
specific requirements for which AioContext the timer must run in.

This patch drops the AioContext pointer argument and renames the
function to simplify the API.

Reported-by: Paolo Bonzini 
Reported-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/coroutine.h| 6 +-
 block/null.c| 3 +--
 block/sheepdog.c| 3 +--
 blockjob.c  | 2 +-
 util/qemu-coroutine-sleep.c | 4 ++--
 5 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 9aff9a735e..ce2eb73670 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -261,12 +261,8 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
 
 /**
  * Yield the coroutine for a given duration
- *
- * Behaves similarly to co_sleep_ns(), but the sleeping coroutine will be
- * resumed when using aio_poll().
  */
-void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
-  int64_t ns);
+void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
 
 /**
  * Yield until a file descriptor becomes readable
diff --git a/block/null.c b/block/null.c
index dd9c13f9ba..0cdabaa440 100644
--- a/block/null.c
+++ b/block/null.c
@@ -110,8 +110,7 @@ static coroutine_fn int null_co_common(BlockDriverState *bs)
 BDRVNullState *s = bs->opaque;
 
 if (s->latency_ns) {
-co_aio_sleep_ns(bdrv_get_aio_context(bs), QEMU_CLOCK_REALTIME,
-s->latency_ns);
+qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, s->latency_ns);
 }
 return 0;
 }
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 696a71442a..a1edb992ff 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -776,8 +776,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
 if (s->fd < 0) {
 DPRINTF("Wait for connection to be established\n");
 error_report_err(local_err);
-co_aio_sleep_ns(bdrv_get_aio_context(s->bs), QEMU_CLOCK_REALTIME,
-10ULL);
+qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 10ULL);
 }
 };
 
diff --git a/blockjob.c b/blockjob.c
index 3a0c49137e..24d912161f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -799,7 +799,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 
 job->busy = false;
 if (!block_job_should_pause(job)) {
-co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
+qemu_co_sleep_ns(type, ns);
 }
 job->busy = true;
 
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c5655041b..b345a55994 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -28,9 +28,9 @@ static void co_sleep_cb(void *opaque)
 aio_co_wake(sleep_cb->co);
 }
 
-void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
-  int64_t ns)
+void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
 {
+AioContext *ctx = qemu_get_current_aio_context();
 CoSleepCB sleep_cb = {
 .co = qemu_coroutine_self(),
 };
-- 
2.13.6




Re: [Qemu-block] [PATCH 1/2] nbd/server: Implement sparse reads atop structured reply

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

07.11.2017 06:09, Eric Blake wrote:

The reason that NBD added structured reply in the first place was
to allow for efficient reads of sparse files, by allowing the
reply to include chunks to quickly communicate holes to the client
without sending lots of zeroes over the wire.  Time to implement
this in the server; our client can already read such data.

We can only skip holes insofar as the block layer can query them;
and only if the client is okay with a fragmented request (if a
client requests NBD_CMD_FLAG_DF and the entire read is a hole, we
could technically return a single NBD_REPLY_TYPE_OFFSET_HOLE, but
that's a fringe case not worth catering to here).  Sadly, the
control flow is a bit wonkier than I would have preferred, but
it was minimally invasive to have a split in the action between
a fragmented read (handled directly where we recognize
NBD_CMD_READ with the right conditions, and sending multiple
chunks) vs. a single read (handled at the end of nbd_trip, for
both simple and structured replies, when we know there is only
one thing being read).  Likewise, I didn't make any effort to
optimize the final chunk of a fragmented read to set the
NBD_REPLY_FLAG_DONE, but unconditionally send that as a separate
NBD_REPLY_TYPE_NONE.
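
(To make the chunking concrete: a standalone toy that splits a buffer into
data/hole chunks the way a fragmented read would, ending with the
unconditional NBD_REPLY_TYPE_NONE mentioned above. The real server asks the
block layer for hole information rather than scanning bytes; this is an
illustration only:)

  #include <stddef.h>
  #include <stdio.h>

  /* Print one chunk per run of zero/nonzero bytes, then the final
   * NBD_REPLY_TYPE_NONE chunk carrying NBD_REPLY_FLAG_DONE. */
  static void fragmented_read(const unsigned char *buf, size_t len)
  {
      size_t i = 0;

      while (i < len) {
          size_t start = i;
          int hole = (buf[i] == 0);

          while (i < len && (buf[i] == 0) == hole) {
              i++;
          }
          printf("%s offset=%zu length=%zu\n",
                 hole ? "NBD_REPLY_TYPE_OFFSET_HOLE"
                      : "NBD_REPLY_TYPE_OFFSET_DATA",
                 start, i - start);
      }
      printf("NBD_REPLY_TYPE_NONE (NBD_REPLY_FLAG_DONE)\n");
  }

  int main(void)
  {
      const unsigned char buf[] = { 'a', 'b', 0, 0, 0, 'c' };

      fragmented_read(buf, sizeof(buf));
      return 0;
  }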

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH 2/2] nbd/server: Optimize final chunk of sparse read

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

07.11.2017 06:09, Eric Blake wrote:

If we are careful to handle 0-length read requests correctly,
we can optimize our sparse read to send the NBD_REPLY_FLAG_DONE
bit on our last OFFSET_DATA or OFFSET_HOLE chunk rather than
needing a separate chunk.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH] block: Fix error path in bdrv_backing_update_filename()

2017-11-09 Thread Stefan Hajnoczi
On Mon, Nov 06, 2017 at 05:55:00PM +0100, Kevin Wolf wrote:
> error_setg_errno() takes a positive errno code.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v4 0/4] Don't write headers if BDS is INACTIVE

2017-11-09 Thread Stefan Hajnoczi
On Tue, Nov 07, 2017 at 08:10:32AM -0500, Jeff Cody wrote:
> Changes from v3->v4:
> 
> Patch 3: Add migrate_del_blocker and error_free (Thanks Stefan)
> 
> 
> git-backport-diff -r qemu/master.. -u ba11b69
> 
> 001/4:[----] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/4:[----] [--] 'block/parallels: Do not update header or truncate image when INMIGRATE'
> 003/4:[0003] [FC] 'block/parallels: add migration blocker'
> 004/4:[----] [--] 'qemu-iotests: update unsupported image formats in 194'
> 
> 
> Changes from v2->v3:
> 
> Patch 2: Uh... fix that misspelling.  Thanks Stefan :)
> Patch 3: New patch to block migration in parallels
> 
> git-backport-diff -r qemu/master.. -u 6dc6acb
> 
> 001/4:[----] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/4:[----] [--] 'block/parallels: Do not update header or truncate image when INMIGRATE'
> 003/4:[down] 'block/parallels: add migration blocker'
> 004/4:[----] [--] 'qemu-iotests: update unsupported image formats in 194'
> 
> 
> Changes from v1->v2:
> 
> * Drop previous parallels patches, just check BDRV_O_INACTIVE now
>   (Kevin)
> 
> git-backport-diff -r qemu/master.. -u github/master
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/3:[----] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/3:[down] 'block/parallals: Do not update header or truncate image when INMIGRATE'
> 003/3:[----] [--] 'qemu-iotests: update unsupported image formats in 194'
> 
> 
> v1:
> 
> VHDX and Parallels both blindly write headers to the image file
> if the images are opened R/W.  This causes an assert if the QEMU run
> state is INMIGRATE.
> 
> Jeff Cody (4):
>   block/vhdx.c: Don't blindly update the header
>   block/parallels: Do not update header or truncate image when INMIGRATE
>   block/parallels: add migration blocker
>   qemu-iotests: update unsupported image formats in 194
> 
>  block/parallels.c  | 22 +-
>  block/vhdx.c   |  7 ---
>  tests/qemu-iotests/194 |  2 +-
>  3 files changed, 18 insertions(+), 13 deletions(-)
> 
> -- 
> 2.9.5
> 
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Alberto Garcia
On Thu 09 Nov 2017 07:05:26 AM CET, Fam Zheng wrote:

>> > I can fix the crash by adding block_job_pause_point(&s->common) at
>> > the end of stream_run() (where the 'out' label is).
>> > 
>> > I'm thinking that perhaps we should add the pause point directly to
>> > block_job_defer_to_main_loop(), to prevent any block job from
>> > running the exit function when it's paused.
>> > 
>> 
>> Is it possible that the exit function is already deferred when the
>> jobs are being paused? (even though it's at least less likely to
>> happen)
>> 
>> Then should we flush the bottom halves somehow in addition to putting
>> the jobs to sleep? And also then it all probably has to happen before
>> bdrv_reopen_queue()
>
> Or we can stash away the BH temporarily during pause period:
>
> ---
>
> diff --git a/blockjob.c b/blockjob.c
> index 3a0c49137e..7058ff7ae1 100644

Yeah, what you suggest sounds good. We would still need to add a pause
point in addition to this.

Does this sound good to everyone? If so I can prepare the patches.

Berto
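
(For reference, the first fix Berto describes would look roughly like this
-- a sketch of the idea only, not the eventual patch:)

  /* At the end of stream_run(), where the 'out' label is: give a paused
   * job a real pause point before it defers its exit function to the
   * main loop. */
  out:
      block_job_pause_point(&s->common);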



Re: [Qemu-block] [PATCH v2 4/7] nbd: Fix struct name for structured reads

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:57, Eric Blake wrote:

A closer read of the NBD spec shows that a structured reply chunk
for a hole is not quite identical to the prefix of a data chunk,
because the hole has to also send a 32-bit size field.  Although
we do not yet send holes, we should fix the misleading information
in our header and make it easier for a future patch to support
sparse reads.
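
(For reference, the wire layouts in question, per the NBD spec: both chunk
payloads begin with a 64-bit offset, but a hole chunk carries an explicit
32-bit size instead of data bytes. The struct names below are illustrative,
not necessarily those used in the header:)

  #include <stdint.h>

  /* NBD_REPLY_TYPE_OFFSET_DATA payload: offset, then the data bytes;
   * the data length is implied by the chunk header's length field. */
  typedef struct NBDStructuredReadData {
      uint64_t offset;
      /* uint8_t data[]; follows on the wire */
  } NBDStructuredReadData;

  /* NBD_REPLY_TYPE_OFFSET_HOLE payload: offset plus an explicit 32-bit
   * hole size, since no data bytes are transmitted for a hole. */
  typedef struct NBDStructuredReadHole {
      uint64_t offset;
      uint32_t length;
  } NBDStructuredReadHole;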

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 7/7] nbd/server: Fix structured read of length 0

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:57, Eric Blake wrote:

The NBD spec was recently clarified to state that a read of length 0
should not be attempted by a compliant client; but that a server must
still handle it correctly in an unspecified manner (that is, either
a successful no-op or an error reply, but not a crash) [1].  However,
it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero
payload length, but our existing code was replying with a chunk
that a picky client could reject as invalid because it was missing
a payload.

We are already doing successful no-ops for 0-length writes and for
non-structured reads; so for consistency, we want structured reply
reads to also be a no-op.  The easiest way to do this is to return
a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper
function (especially since future patches for other structured
replies may benefit from using the same helper).

[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
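
(What such a helper might look like, as a rough sketch -- the helper names
set_be_chunk() and nbd_co_send_iov() are assumptions for illustration, not
necessarily what the actual patch uses:)

  /* Send a structured reply consisting of a single NBD_REPLY_TYPE_NONE
   * chunk with NBD_REPLY_FLAG_DONE set and no payload. */
  static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
                                                      uint64_t handle,
                                                      Error **errp)
  {
      NBDStructuredReplyChunk chunk;
      struct iovec iov[] = {
          { .iov_base = &chunk, .iov_len = sizeof(chunk) },
      };

      set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
      return nbd_co_send_iov(client, iov, 1, errp);
  }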

Signed-off-by: Eric Blake 



Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 2.5/7] fixup! nbd-client: Refuse read-only client with BDRV_O_RDWR

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 01:23, Eric Blake wrote:

...

Fix several iotests to comply with the new behavior (since
qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP,
default to a read-only export, we must tell blockdev-add/qemu-io to
set up a read-only client).

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 2/7] nbd-client: Refuse read-only client with BDRV_O_RDWR

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:56, Eric Blake wrote:

The NBD spec says that clients should not try to write/trim to
an export advertised as read-only by the server.  But we failed
to check that, and would allow the block layer to use NBD with
BDRV_O_RDWR even when the server is read-only, which meant we
were depending on the server sending a proper EPERM failure for
various commands, and also exposes a leaky abstraction: using
qemu-io in read-write mode would succeed on 'w -z 0 0' because
of local short-circuiting logic, but 'w 0 0' would send a
request over the wire (where it then depends on the server, and
fails at least for qemu-nbd but might pass for other NBD
implementations).

With this patch, a client MUST request read-only mode to access
a server that is doing a read-only export, or else it will get
a message like:

can't open device nbd://localhost:10809/foo: request for write access conflicts 
with read-only export

It is no longer possible to even attempt writes over the wire
(including the corner case of 0-length writes), because the block
layer enforces the explicit read-only request; this matches the
behavior of qcow2 when backed by a read-only POSIX file.

Fix iotest 147 to comply with the new behavior (since nbd-server-add
defaults to a read-only export, we must tell blockdev-add to set up a
read-only client).

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 



--
Best regards,
Vladimir




Re: [Qemu-block] [PATCH v2 5/7] nbd-client: Short-circuit 0-length operations

2017-11-09 Thread Vladimir Sementsov-Ogievskiy

09.11.2017 00:57, Eric Blake wrote:

The NBD spec was recently clarified to state that clients should
not send 0-length requests to the server, as the server behavior
is undefined [1].  We know that qemu-nbd's behavior is a successful
no-op (once it has filtered for read-only exports), but other NBD
implementations might return an error.  To avoid any questionable
server implementations, it is better to just short-circuit such
requests on the client side (we are relying on the block layer to
already filter out requests such as invalid offset, write to a
read-only volume, and so forth).

[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  block/nbd-client.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index daa4392531..0a675d0fab 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -674,6 +674,9 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     assert(bytes <= NBD_MAX_BUFFER_SIZE);
     assert(!flags);

+    if (!bytes) {
+        return 0;
+    }
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         return ret;
@@ -705,6 +708,9 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,

     assert(bytes <= NBD_MAX_BUFFER_SIZE);

+    if (!bytes) {
+        return 0;
+    }


we don't do this check before the flags manipulation, so that we don't miss 
possible asserts...



     return nbd_co_request(bs, &request, qiov);
 }

@@ -731,6 +737,9 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
         request.flags |= NBD_CMD_FLAG_NO_HOLE;
     }

+    if (!bytes) {
+        return 0;
+    }
     return nbd_co_request(bs, &request, NULL);
 }

@@ -759,7 +768,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
     };

     assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
-    if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
+    if (!(client->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
         return 0;
     }




--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] question: I found a qemu crash when attach virtio-scsi disk

2017-11-09 Thread Paolo Bonzini
On 09/11/2017 12:33, Stefan Hajnoczi wrote:
>>> Can main thread somehow call aio_context_acquire(vs->ctx) (and release) 
>>> around
>>> qdev_set_parent_bus()? virtio_scsi_device_find() takes the lock.
>> No, the context is not set yet.  But the locking is easy to add,
>> separately from the bug that Zhengui is reporting.
> Do you want to submit a patch for this instead of the
> aio_disable_external() approach I posted?  I think your idea is cleaner
> than modifying the geometry probing function.

Yes, I will.  Both the locking and the "online" status.

Paolo



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 1/7] nbd-client: Fix error message typos

2017-11-09 Thread Eric Blake
On 11/09/2017 02:58 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.11.2017 00:56, Eric Blake wrote:
>> Provide missing spaces that are required when using string
>> concatenation to break error messages across source lines.
>>
>> Signed-off-by: Eric Blake 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 

>> @@ -408,7 +408,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
>>      if (chunk->type == NBD_REPLY_TYPE_NONE) {
>>          if (!(chunk->flags & NBD_REPLY_FLAG_DONE)) {
>>              error_setg(errp, "Protocol error: NBD_REPLY_TYPE_NONE chunk without"
>> -                             "NBD_REPLY_FLAG_DONE flag set");
>> +                       " NBD_REPLY_FLAG_DONE flag set");
> 
> I think it's better not to change the indentation here, as it is done
> this way in other places.
> You don't like this way of indenting splitted strings?

s/splitted/split/ (one of those weird irregular English verbs)

I just did what emacs recommended when I hit TAB.  Your indentation
style also works in isolation, even if it isn't the default that emacs
tries to give me.  I could avoid the churn on this patch, but then my
addition in 6/7 looks inconsistent compared to this one, so I'll
probably just leave the indentation change in place.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 5/7] nbd-client: Short-circuit 0-length operations

2017-11-09 Thread Eric Blake
On 11/09/2017 03:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.11.2017 00:57, Eric Blake wrote:
>> The NBD spec was recently clarified to state that clients should
>> not send 0-length requests to the server, as the server behavior
>> is undefined [1].  We know that qemu-nbd's behavior is a successful
>> no-op (once it has filtered for read-only exports), but other NBD
>> implementations might return an error.  To avoid any questionable
>> server implementations, it is better to just short-circuit such
>> requests on the client side (we are relying on the block layer to
>> already filter out requests such as invalid offset, write to a
>> read-only volume, and so forth).
>>
>> [1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037
>>
>> Signed-off-by: Eric Blake 
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 

>> @@ -705,6 +708,9 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>
>>      assert(bytes <= NBD_MAX_BUFFER_SIZE);
>>
>> +    if (!bytes) {
>> +        return 0;
>> +    }
> 
> we don't do this check before the flags manipulation, so that we don't miss
> possible asserts...
> 
>>      return nbd_co_request(bs, &request, qiov);

Correct - I put the short-circuit as late as possible to ensure that
preconditions are still being met.  I can tweak the commit message to
make that more obvious, if desired.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] coroutine: simplify co_aio_sleep_ns() prototype

2017-11-09 Thread Eric Blake
On 11/09/2017 04:26 AM, Stefan Hajnoczi wrote:
> The AioContext pointer argument to co_aio_sleep_ns() is only used for
> the sleep timer.  It does not affect where the caller coroutine is
> resumed.
> 
> Due to changes to coroutine and AIO APIs it is now possible to drop the
> AioContext pointer argument.  This is safe to do since no caller has
> specific requirements for which AioContext the timer must run in.
> 
> This patch drops the AioContext pointer argument and renames the
> function to simplify the API.

And it fixes a stale comment ;)

> 
> Reported-by: Paolo Bonzini 
> Reported-by: Eric Blake 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/coroutine.h| 6 +-
>  block/null.c| 3 +--
>  block/sheepdog.c| 3 +--
>  blockjob.c  | 2 +-
>  util/qemu-coroutine-sleep.c | 4 ++--
>  5 files changed, 6 insertions(+), 12 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] RFC: use case for adding QMP, block jobs & multiple exports to qemu-nbd ?

2017-11-09 Thread Markus Armbruster
Max Reitz  writes:

> On 2017-11-02 13:02, Daniel P. Berrange wrote:
> [...]
>> One alternative approach to doing this would be to suggest that we should
>> instead just spawn qemu-system-x86_64 with '--machine none' and use that
>> as a replacement for qemu-nbd, since it already has a built-in NBD server
>> which can do many exports at once and arbitrary block jobs.
>
> As far as I know, we had wanted to add QMP support to qemu-nbd maybe one
> or two years ago, but nobody ever did it.
>
> I've had some discussions about this with Markus and Kevin at KVM Forum.
>  They appeared to strongly prefer this approach.  I agree with them that
> design-wise, a qemu with no machine at all (and not even -M none) and
> early QMP is the way we want to go anyway, and then this would be the
> correct tool to use.

"Strongly" is perhaps a bit strong, at least as far as I'm concerned.  I
just believe that we want the capability to run QEMU without a machine
anyway, and if that's good enough, then why bother duplicating so much
of qemu-system-FOO in qemu-nbd & friends?  Besides, once you start to
duplicate, you'll likely find it hard to stop.

>> I'm concerned that this could end up being a be a game of whack-a-mole
>> though, constantly trying to cut out/down all the bits of system emulation
>> in the machine emulators to get its resource overhead to match the low
>> overhead of standalone qemu-nbd.
>
> However, I personally share your concern.  Especially, I think that
> getting to a point where we can have no machine at all and early QMP
> will take much longer than just adding QMP to qemu-nbd -- or adding a
> qmp command to qemu-img (because you can add NBD exports through QMP, so
> qemu-nbd's hardcoded focus on NBD exports seems kind of weird then)[1].
>
> I'm very much torn here.  There are two approaches: Stripping fat qemu
> down, or fattening lean qemu-img (?) up.  The latter is very simple.

"Very simple" is perhaps debatable, but I think we can agree on
"temptingly simple".

> The former is what we want anyway.

Yes.

> Markus says it's not too hard to strip down qemu.  If that is true,

To find out, we need to experimentally remodel main() with an axe.
Volunteers?

> there is no point in fattening qemu-img now.  I personally am not
> convinced at all, but he knows the state of that project much better
> than me, so I cannot reasonably disagree.
>
> So my mail is more of a CC to Markus and Kevin -- but I think both are
> on PTO right now.

Back, nursing my conference cold.

> I guess the main question is: If someone were to introduce a qemu-img
> QMP subcommand -- would it be accepted? :-)
>
> Max
>
>
> [1] Also, adding QMP should trivially add block jobs and multiple
> exports to whatever tool we are talking about (in fact, qemu-img already
> does perform the mirror block job for committing).



Re: [Qemu-block] [Qemu-devel] [PATCH 2/5] iotests: Add missing 'blkdebug::' in 040

2017-11-09 Thread Eric Blake
On 11/08/2017 07:38 PM, Max Reitz wrote:
> 040 tries to invoke pause_drive() on a drive that does not use blkdebug.
> Good idea, but let's use blkdebug to make it actually work.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/040 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake 

> diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
> index c284d08796..90b5b4f2ad 100755
> --- a/tests/qemu-iotests/040
> +++ b/tests/qemu-iotests/040
> @@ -289,7 +289,7 @@ class TestSetSpeed(ImageCommitTestCase):
>  qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
>  qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 0 512', test_img)
>  qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', mid_img)
> -self.vm = iotests.VM().add_drive(test_img)
> +self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
>  self.vm.launch()
>  
>  def tearDown(self):
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] iotests: Make 030 less flaky

2017-11-09 Thread Eric Blake
On 11/08/2017 07:38 PM, Max Reitz wrote:
> This patch fixes two race conditions in 030:
> 
> 1. The first is in TestENSPC.test_enospc().  After resuming the job,

s/ENSPC/ENOSPC/

>querying it to confirm it is no longer paused may fail because in the
>meantime it might have completed already.  The same was fixed in
>TestEIO.test_ignore() already (in commit
>2c3b44da07d341557a8203cc509ea07fe3605e11).
> 
> 2. The second is in TestSetSpeed.test_set_speed_invalid(): Here, a
>stream job is started on a drive without any break points, with a
>block-job-set-speed invoked subsequently.  However, without any break
>points, the job might have completed in the meantime (on tmpfs at
>least); or it might complete before cancel_and_wait() which expects
>the job to still exist.  This can be fixed like everywhere else by
>pausing the drive (installing break points) before starting the job
>and letting cancel_and_wait() resume it.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/030 | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 3/5] iotests: Make 055 less flaky

2017-11-09 Thread Eric Blake
On 11/08/2017 07:38 PM, Max Reitz wrote:
> First of all, test 055 does a valiant job of invoking pause_drive()
> sometimes, but that is worth nothing without blkdebug.  So the first
> thing to do is to sprinkle a couple of "blkdebug::" in there -- with the
> exception of the transaction tests, because the blkdebug break points
> make the transaction QMP command hang (which is bad).  In that case, we
> can get away with throttling the block job so that it effectively is
> paused.
> 
> Then, 055 usually does not pause the drive before starting a block job
> that should be cancelled.  This means that the backup job might be
> completed already before block-job-cancel is invoked; thus making the
> test either fail (currently) or moot if cancel_and_wait() ignored this
> condition.  Fix this by pausing the drive before starting the job.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/055 | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 4/5] iotests: Make 083 less flaky

2017-11-09 Thread Eric Blake
On 11/08/2017 07:38 PM, Max Reitz wrote:
> 083 has (at least) two issues:

I think I hit one of them intermittently yesterday; thanks for
diagnosing these (and like you say, there may be more lurking, but we'll
whack them separately if we can reproduce and identify them).

> 
> 1. By launching the nbd-fault-injector in background, it may not be
>scheduled until the first grep on its output file is executed.
>However, until then, that file may not have been created yet -- so it
>either does not exist yet (thus making the grep emit an error), or it
>does exist but contains stale data (thus making the rest of the test
>case connect to a wrong address).
>Fix this by explicitly overwriting the output file before executing
>nbd-fault-injector.
> 
> 2. The nbd-fault-injector prints things other than "Listening on...".
>It also prints a "Closing connection" message from time to time.  We
>currently invoke sed on the whole file in the hope of it only
>containing the "Listening on..." line yet.  That hope is sometimes
>shattered by the brutal reality of race conditions, so invoke grep
>before sed.

Invoking 'grep | sed' is almost always a waste of a process; sed can do
the job alone.

> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/083 | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
> index 0306f112da..2f6444eeb9 100755
> --- a/tests/qemu-iotests/083
> +++ b/tests/qemu-iotests/083
> @@ -86,6 +86,7 @@ EOF
>  
>   rm -f "$TEST_DIR/nbd.sock"
>  
> +echo > "$TEST_DIR/nbd-fault-injector.out"

This makes the file contain a blank line.  Would it be any better as a
truly empty file, as in:

: > "$TEST_DIR/nbd-fault-injector.out"

>   $PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" 
> "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &
>  
>   # Wait for server to be ready
> @@ -94,7 +95,7 @@ EOF
>   done
>  
>   # Extract the final address (port number has now been assigned in tcp case)
> - nbd_addr=$(sed 's/Listening on \(.*\)$/\1/' "$TEST_DIR/nbd-fault-injector.out")
> +nbd_addr=$(grep 'Listening on ' "$TEST_DIR/nbd-fault-injector.out" | sed 's/Listening on \(.*\)$/\1/')

Fixing TAB damage while at it - nice.

Here's how to do it using just sed, and with less typing:

nbd_addr=$(sed -n 's/^Listening on //p' \
   "$TEST_DIR/nbd-fault-injector.out")

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-11-09 Thread Vladimir Sementsov-Ogievskiy
This is needed to implement the image-fleecing scheme, where we create
a temporary node, mark our active node as the backing node for the temp,
and start backup(sync=none) from the active node to the temp node.
The temp node then represents a kind of snapshot and may be used
for external backup through NBD.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

What was the reason to abandon non-root nodes?

 blockdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 56a6b24a0b..d0a5a107f0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
BlockJobTxn *txn,
 backup->compress = false;
 }
 
-bs = qmp_get_root_bs(backup->device, errp);
+bs = bdrv_lookup_bs(backup->device, backup->device, errp);
 if (!bs) {
 return NULL;
 }
-- 
2.11.1




Re: [Qemu-block] [Qemu-devel] [PATCH v4 04/45] tests: Replace fprintf(stderr, "*\n" with error_report()

2017-11-09 Thread Philippe Mathieu-Daudé
On 11/09/2017 11:38 AM, Eric Blake wrote:
> On 11/08/2017 04:56 PM, Alistair Francis wrote:
[...]
>>          exit(-1);
>>      }
>>      qemu_thread_create([n_threads], "test", func,
>>                         [n_threads],
>> @@ -417,7 +418,7 @@ static void gtest_stress_10_5(void)
>>
>>  static void usage(int argc, char *argv[])
>>  {
>> -    fprintf(stderr, "Usage: %s [nreaders [ perf | stress ] ]\n", argv[0]);
>> +    error_report("Usage: %s [nreaders [ perf | stress ] ]", argv[0]);
>>      exit(-1);
> 
> Separate patch - but 'exit(-1)' is almost always wrong (it gives
> status 255 through wait()/waitpid(); meanwhile waitid() is required
> by POSIX to get at the full 32-bit value except that Linux doesn't
> obey that requirement).  A process where wait() returns 255 makes
> xargs behave differently.  We really want exit(1).

$ git grep 'exit(-' origin/master | wc -l
60

Candidate for another mechanical patch?
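
(Eric's point is easy to demonstrate: the exit status is truncated to 8
bits, so wait() sees 255. A standalone example:)

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
      int status;
      pid_t pid = fork();

      if (pid == 0) {
          exit(-1);   /* child: -1 is truncated to 0xff */
      }
      waitpid(pid, &status, 0);
      printf("child exit status: %d\n", WEXITSTATUS(status));  /* 255 */
      return 0;
  }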

> 
>> +++ b/tests/tcg/linux-test.c
>> @@ -51,7 +51,7 @@ void error1(const char *filename, int line, const char *fmt, ...)
>>      va_start(ap, fmt);
>>      fprintf(stderr, "%s:%d: ", filename, line);
>>      vfprintf(stderr, fmt, ap);
>> -    fprintf(stderr, "\n");
>> +    error_report("");
> 
> Umm, a blank line is not a useful error.  This hunk is bogus; we 
> probably want to stick with fprintf() for the entire message.

=)



Re: [Qemu-block] [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)

2017-11-09 Thread Max Reitz
On 2017-11-09 05:21, Fam Zheng wrote:
> On Thu, 11/09 01:48, Max Reitz wrote:
>> Hi,
>>
>> More exciting news from the bdrv_drain() front!
>>
>> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
>> the tests on tmpfs, but I've just now verified that it happens on my SSD
>> just as well.
>>
>> So the reproducer is a plain:
>>
>> while ./check -raw 194; do; done
> 
> I cannot reproduce it on my machine.

Hm, too bad.  I see it both on my work laptop (with Fedora) and my
desktop (with Arch)...

>> (No difference between raw or qcow2, though.)
>>
>> And then, after a couple of runs (or a couple ten), it will just hang.
>> The reason is that the source VM lingers around and doesn't quit
>> voluntarily -- the test itself was successful, but it just can't exit.
>>
>> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
>> this is the backtrace:
>>
>> #0  0x7f7cfc297e06 in ppoll () at /lib64/libc.so.6
>> #1  0x563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
>> __nfds=<optimized out>, __fds=<optimized out>) at
>> /usr/include/bits/poll2.h:77
> 
> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> progress and causing the return value to be true in aio_poll().
> 
>> #2  0x563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
>> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
>> #3  0x563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
>> blocking=<optimized out>) at util/aio-posix.c:629
>> #4  0x563b8463afa4 in bdrv_drain_recurse
>> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
>> #5  0x563b8463baff in bdrv_drain_all_begin () at block/io.c:381
>> #6  0x563b8463bc99 in bdrv_drain_all () at block/io.c:411
>> #7  0x563b8459888b in block_migration_cleanup (opaque=<optimized out>) at migration/block.c:714
>> #8  0x563b845883be in qemu_savevm_state_cleanup () at
>> migration/savevm.c:1251
>> #9  0x563b845811fd in migration_thread (opaque=0x563b856f1da0) at
>> migration/migration.c:2298
>> #10 0x7f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
>> #11 0x7f7cfc2a3e1f in clone () at /lib64/libc.so.6
>>
>>
>> And when you make bdrv_drain_all_begin() print what we are trying to
>> drain, you can see that it's the format node (managed by the "raw"
>> driver in this case).
> 
> So what is the value of bs->in_flight?

gdb:
> (gdb) print bs->in_flight 
> $3 = 2307492233

"That's weird, why would it..."

> (gdb) print *bs
> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = false, 
> probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque = 0x0, 
> aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 
> 0x8989898989898989}, 
>   walking_aio_notifiers = 137, filename = '\211' , 
> backing_file = '\211' , backing_format = '\211'  16 times>, full_open_options = 0x8989898989898989, 
>   exact_filename = '\211' , backing = 0x8989898989898989, 
> file = 0x8989898989898989, bl = {request_alignment = 2307492233, max_pdiscard 
> = -1987475063, pdiscard_alignment = 2307492233, 
> max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, 
> opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 
> 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = 
> -1987475063}, 
>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, 
> node_name = '\211' , node_list = {tqe_next = 
> 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 
> 0x8989898989898989, 
> tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 
> 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, 
> op_blockers = {{lh_first = 0x8989898989898989} }, job = 
> 0x8989898989898989, 
>   inherits_from = 0x8989898989898989, children = {lh_first = 
> 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 
> 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 
> 2307492233, 
>   backing_blocker = 0x8989898989898989, total_sectors = -8536140394893047415, 
> before_write_notifiers = {notifiers = {lh_first = 0x8989898989898989}}, 
> write_threshold_offset = 9910603678816504201, write_threshold_notifier = 
> {notify = 
> 0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 
> 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = 
> -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 
> 2307492233, 
> __kind = -1987475063, __spins = -30327, __elision = -30327, __list = 
> {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = '\211' 
> , __align = -8536140394893047415}, initialized = 137}, 
>   dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset = {value 
> = 9910603678816504201}, copy_on_read = -1987475063, in_flight = 2307492233, 
> serialising_in_flight = 2307492233, wakeup = 137, io_plugged = 2307492233, 
>   enable_write_cache = -1987475063, 

Re: [Qemu-block] [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)

2017-11-09 Thread Fam Zheng
On Thu, 11/09 16:14, Max Reitz wrote:
> On 2017-11-09 05:21, Fam Zheng wrote:
> > On Thu, 11/09 01:48, Max Reitz wrote:
> >> Hi,
> >>
> >> More exciting news from the bdrv_drain() front!
> >>
> >> I've noticed in the past that iotest 194 sometimes hangs.  I usually run
> >> the tests on tmpfs, but I've just now verified that it happens on my SSD
> >> just as well.
> >>
> >> So the reproducer is a plain:
> >>
> >> while ./check -raw 194; do; done
> > 
> > I cannot reproduce it on my machine.
> 
> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
> desktop (with Arch)...
> 
> >> (No difference between raw or qcow2, though.)
> >>
> >> And then, after a couple of runs (or a couple ten), it will just hang.
> >> The reason is that the source VM lingers around and doesn't quit
> >> voluntarily -- the test itself was successful, but it just can't exit.
> >>
> >> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
> >> this is the backtrace:
> >>
> >> #0  0x7f7cfc297e06 in ppoll () at /lib64/libc.so.6
> >> #1  0x563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
> >> __nfds=<optimized out>, __fds=<optimized out>) at
> >> /usr/include/bits/poll2.h:77
> > 
> > Looking at the 0 timeout it seems we are in the aio_poll(ctx, 
> > blocking=false);
> > branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is 
> > making
> > progress and causing the return value to be true in aio_poll().
> > 
> >> #2  0x563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
> >> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
> >> #3  0x563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
> >> blocking=<optimized out>) at util/aio-posix.c:629
> >> #4  0x563b8463afa4 in bdrv_drain_recurse
> >> (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
> >> #5  0x563b8463baff in bdrv_drain_all_begin () at block/io.c:381
> >> #6  0x563b8463bc99 in bdrv_drain_all () at block/io.c:411
> >> #7  0x563b8459888b in block_migration_cleanup (opaque=<optimized out>) at migration/block.c:714
> >> #8  0x563b845883be in qemu_savevm_state_cleanup () at
> >> migration/savevm.c:1251
> >> #9  0x563b845811fd in migration_thread (opaque=0x563b856f1da0) at
> >> migration/migration.c:2298
> >> #10 0x7f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
> >> #11 0x7f7cfc2a3e1f in clone () at /lib64/libc.so.6
> >>
> >>
> >> And when you make bdrv_drain_all_begin() print what we are trying to
> >> drain, you can see that it's the format node (managed by the "raw"
> >> driver in this case).
> > 
> > So what is the value of bs->in_flight?
> 
> gdb:
> > (gdb) print bs->in_flight 
> > $3 = 2307492233
> 
> "That's weird, why would it..."
> 
> > (gdb) print *bs
> > $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = 
> > false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque 
> > = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 
> > 0x8989898989898989}, 
> >   walking_aio_notifiers = 137, filename = '\211' , 
> > backing_file = '\211' , backing_format = '\211' 
> > , full_open_options = 0x8989898989898989, 
> >   exact_filename = '\211' , backing = 
> > 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 
> > 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
> > max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, 
> > opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 
> > 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = 
> > -1987475063}, 
> >   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, 
> > node_name = '\211' , node_list = {tqe_next = 
> > 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 
> > 0x8989898989898989, 
> > tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 
> > 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, 
> > op_blockers = {{lh_first = 0x8989898989898989} }, job = 
> > 0x8989898989898989, 
> >   inherits_from = 0x8989898989898989, children = {lh_first = 
> > 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 
> > 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 
> > 2307492233, 
> >   backing_blocker = 0x8989898989898989, total_sectors = 
> > -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 
> > 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, 
> > write_threshold_notifier = {notify = 
> > 0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 
> > 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = 
> > -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 
> > 2307492233, 
> > __kind = -1987475063, __spins = -30327, __elision = -30327, __list 
> > = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = 
> > '\211' , __align = -8536140394893047415}, initialized = 
> > 137}, 
> >   dirty_bitmaps 

Re: [Qemu-block] [Qemu-devel] [PATCH v4 04/45] tests: Replace fprintf(stderr, "*\n" with error_report()

2017-11-09 Thread Eric Blake
On 11/08/2017 04:56 PM, Alistair Francis wrote:
> Replace a large number of the fprintf(stderr, "*\n" calls with
> error_report(). The functions were renamed with these commands and then
> compiler issues where manually fixed.

s/where/were/


> 
> Some of the error_report()'s were manually kept as fprintf(stderr, ...) to
> maintain standalone test cases.
> 
> Signed-off-by: Alistair Francis 
> Cc: "Michael S. Tsirkin" 
> Cc: Igor Mammedov 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Gerd Hoffmann 
> Cc: qemu-block@nongnu.org
> ---
> V2:
>  - Keep some of the fprintf() calls
>  - Remove a file I accidently checked in
> 

> +++ b/tests/ahci-test.c
> @@ -23,6 +23,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include 
>  
>  #include "libqtest.h"
> @@ -1859,7 +1860,7 @@ int main(int argc, char **argv)
>  ahci_pedantic = 1;
>  break;
>  default:
> -fprintf(stderr, "Unrecognized ahci_test option.\n");
> +error_report("Unrecognized ahci_test option.");

Most of our error_report() calls do not include trailing dot.

> +++ b/tests/bios-tables-test.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include 
>  #include "qemu-common.h"
>  #include "hw/smbios/smbios.h"
> @@ -396,7 +397,7 @@ try_again:
>  aml_file = g_strdup_printf("%s/%s/%.4s%s", data_dir, data->machine,
> (gchar *), ext);
>  if (getenv("V")) {
> -fprintf(stderr, "\nLooking for expected file '%s'\n", aml_file);
> +error_report("Looking for expected file '%s'", aml_file);

Here, it looks like this is a debug statement, gated by whether $V is
set in the environment; it's not really an error.

>  }
>  if (g_file_test(aml_file, G_FILE_TEST_EXISTS)) {
>  exp_sdt.aml_file = aml_file;
> @@ -408,7 +409,7 @@ try_again:
>  }
>  g_assert(exp_sdt.aml_file);
>  if (getenv("V")) {
> -fprintf(stderr, "\nUsing expected file '%s'\n", aml_file);
> +error_report("Using expected file '%s'", aml_file);
>  }

Ditto.

> +++ b/tests/i440fx-test.c
> @@ -13,7 +13,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> -
> +#include "qemu/error-report.h"
>  #include "libqtest.h"
>  #include "libqos/pci.h"
>  #include "libqos/pci-pc.h"
> @@ -295,18 +295,18 @@ static char *create_blob_file(void)
>  ret = -1;
>  fd = g_file_open_tmp("blob_XX", , );
>  if (fd == -1) {
> -fprintf(stderr, "unable to create blob file: %s\n", error->message);
> +error_report("unable to create blob file: %s", error->message);
>  g_error_free(error);

A candidate for
 error_reportf_err(error, "unable to create blob file: ");

> +++ b/tests/libqos/ahci.c
> @@ -23,7 +23,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> -
> +#include "qemu/error-report.h"
>  #include "libqtest.h"
>  #include "libqos/ahci.h"
>  #include "libqos/pci-pc.h"
> @@ -985,9 +985,9 @@ static void ahci_atapi_command_set_offset(AHCICommand *cmd, uint64_t lba)
>  default:
>  /* SCSI doesn't have uniform packet formats,
>   * so you have to add support for it manually. Sorry! */
> -fprintf(stderr, "The Libqos AHCI driver does not support the "
> +error_report("The Libqos AHCI driver does not support the "
>  "set_offset operation for ATAPI command 0x%02x, "
> -"please add support.\n",
> +"please add support.",

Trailing dot is unusual.

>  cbd[0]);
>  g_assert_not_reached();
>  }
> @@ -1060,8 +1060,8 @@ static void ahci_atapi_set_size(AHCICommand *cmd, uint64_t xbytes)
>  default:
>  /* SCSI doesn't have uniform packet formats,
>   * so you have to add support for it manually. Sorry! */
> -fprintf(stderr, "The Libqos AHCI driver does not support the set_size "
> -"operation for ATAPI command 0x%02x, please add support.\n",
> +error_report("The Libqos AHCI driver does not support the set_size "
> +"operation for ATAPI command 0x%02x, please add support.",

Again

> +++ b/tests/libqos/libqos.c

> @@ -199,7 +200,7 @@ void mkimg(const char *file, const char *fmt, unsigned size_mb)
>fmt, file, size_mb);
>  ret = g_spawn_command_line_sync(cli, , , , );
>  if (err) {
> -fprintf(stderr, "%s\n", err->message);
> +error_report("%s", err->message);
>  g_error_free(err);

Candidate for error_report_err()

> +++ b/tests/libqos/malloc.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "libqos/malloc.h"
>  #include "qemu-common.h"
>  #include "qemu/host-utils.h"
> @@ -193,7 +194,7 @@ static uint64_t mlist_alloc(QGuestAllocator *s, 

Re: [Qemu-block] [PATCH v4 0/4] Don't write headers if BDS is INACTIVE

2017-11-09 Thread Denis V. Lunev
On 11/07/2017 04:10 PM, Jeff Cody wrote:
> Changes from v3->v4:
>
> Patch 3: Add migrate_del_blocker and error_free (Thanks Stefan)
>
>
> git-backport-diff -r qemu/master.. -u ba11b69
>
> 001/4:[----] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/4:[----] [--] 'block/parallels: Do not update header or truncate image when INMIGRATE'
> 003/4:[0003] [FC] 'block/parallels: add migration blocker'
> 004/4:[----] [--] 'qemu-iotests: update unsupported image formats in 194'
>
>
> Changes from v2->v3:
>
> Patch 2: Uh... fix that misspelling.  Thanks Stefan :)
> Patch 3: New patch to block migration in parallels
>
> git-backport-diff -r qemu/master.. -u 6dc6acb
>
> 001/4:[----] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/4:[----] [--] 'block/parallels: Do not update header or truncate image when INMIGRATE'
> 003/4:[down] 'block/parallels: add migration blocker'
> 004/4:[----] [--] 'qemu-iotests: update unsupported image formats in 194'
>
>
> Changes from v1->v2:
>
> * Drop previous parallels patches, just check BDRV_O_INACTIVE now
>   (Kevin)
>
> git-backport-diff -r qemu/master.. -u github/master
> Key:
> [] : patches are identical
> [] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, 
> respectively
>
> 001/3:[] [--] 'block/vhdx.c: Don't blindly update the header'
> 002/3:[down] 'block/parallals: Do not update header or truncate image when 
> INMIGRATE'
> 003/3:[] [--] 'qemu-iotests: update unsupported image formats in 194'
>
>
> v1:
>
> VHDX and Parallels both blindly write headers to the image file
> if the images are opened R/W.  This causes an assert if the QEMU run
> state is INMIGRATE.
>
> Jeff Cody (4):
>   block/vhdx.c: Don't blindly update the header
>   block/parallels: Do not update header or truncate image when INMIGRATE
>   block/parallels: add migration blocker
>   qemu-iotests: update unsupported image formats in 194
>
>  block/parallels.c  | 22 +-
>  block/vhdx.c   |  7 ---
>  tests/qemu-iotests/194 |  2 +-
>  3 files changed, 18 insertions(+), 13 deletions(-)
>
Reviewed-by: Denis V. Lunev 



Re: [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete

2017-11-09 Thread Stefan Hajnoczi
On Tue, Oct 24, 2017 at 11:33:51AM +0800, sochin jiang wrote:
> Commit 7ca7f0 moves the throttling-related part of the BDS life cycle
> management to BlockBackend and adds a call to
> throttle_timers_detach_aio_context in blk_remove_bs.  Commit 1606e
> removes a block device from its throttle group in blk_delete by calling
> blk_io_limits_disable, which fixes an easily reproducible qemu crash. But
> deleting a BB without a BDS inserted can easily cause a qemu crash too,
> by calling bdrv_drained_begin in blk_io_limits_disable. Say, a simple
> drive_add and then a drive_del command.
> 
> This patch avoids draining the BDS by calling throttle_group_unregister_tgm
> directly instead of blk_io_limits_disable, leaving the draining operation to
> blk_remove_bs in case there is no BDS inserted. Furthermore, it makes sure
> the throttle timers are initialized or attached before throttle_timers_destroy
> is called in throttle_group_unregister_tgm.
> 
> Signed-off-by: sochin jiang 
> ---
>  block/block-backend.c   | 2 +-
>  block/throttle-groups.c | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 45d9101..39c7cca 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -341,7 +341,7 @@ static void blk_delete(BlockBackend *blk)
>  assert(!blk->name);
>  assert(!blk->dev);
>  if (blk->public.throttle_group_member.throttle_state) {
> -blk_io_limits_disable(blk);
> +throttle_group_unregister_tgm(&blk->public.throttle_group_member);

The following assertions fail without the drain when there are pending
requests:

  void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
  {
  ThrottleState *ts = tgm->throttle_state;
  ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
  ThrottleGroupMember *token;
  int i;

  if (!ts) {
  /* Discard already unregistered tgm */
  return;
  }

  assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
  assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
  assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));

A safer approach is making blk_io_limits_disable(blk) skip the draining
when blk_bs(blk) == NULL.  That is the only case where we are 100% sure
that there are no pending requests.
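
Something along these lines (untested sketch, not an actual patch):

    void blk_io_limits_disable(BlockBackend *blk)
    {
        BlockDriverState *bs = blk_bs(blk);

        assert(blk->public.throttle_group_member.throttle_state);
        if (bs) {
            /* Pending requests are only possible while a BDS is inserted */
            bdrv_drained_begin(bs);
        }
        throttle_group_unregister_tgm(&blk->public.throttle_group_member);
        if (bs) {
            bdrv_drained_end(bs);
        }
    }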




[Qemu-block] [PULL 3/8] nbd-client: Refuse read-only client with BDRV_O_RDWR

2017-11-09 Thread Eric Blake
The NBD spec says that clients should not try to write/trim to
an export advertised as read-only by the server.  But we failed
to check that, and would allow the block layer to use NBD with
BDRV_O_RDWR even when the server is read-only, which meant we
were depending on the server sending a proper EPERM failure for
various commands, and also exposes a leaky abstraction: using
qemu-io in read-write mode would succeed on 'w -z 0 0' because
of local short-circuiting logic, but 'w 0 0' would send a
request over the wire (where it then depends on the server, and
fails at least for qemu-nbd but might pass for other NBD
implementations).

With this patch, a client MUST request read-only mode to access
a server that is doing a read-only export, or else it will get
a message like:

can't open device nbd://localhost:10809/foo: request for write access conflicts 
with read-only export

It is no longer possible to even attempt writes over the wire
(including the corner case of 0-length writes), because the block
layer enforces the explicit read-only request; this matches the
behavior of qcow2 when backed by a read-only POSIX file.

Fix several iotests to comply with the new behavior (since
qemu-nbd of an internal snapshot, as well as nbd-server-add over QMP,
default to a read-only export, we must tell blockdev-add/qemu-io to
set up a read-only client).

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-3-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 9 +
 tests/qemu-iotests/058 | 8 
 tests/qemu-iotests/140 | 4 ++--
 tests/qemu-iotests/147 | 1 +
 4 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index de6c153328..daa4392531 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -697,6 +697,7 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 .len = bytes,
 };

+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (flags & BDRV_REQ_FUA) {
 assert(client->info.flags & NBD_FLAG_SEND_FUA);
 request.flags |= NBD_CMD_FLAG_FUA;
@@ -717,6 +718,7 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset,
 .len = bytes,
 };

+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (!(client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
 return -ENOTSUP;
 }
@@ -756,6 +758,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int bytes)
 .len = bytes,
 };

+assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
 if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
 return 0;
 }
@@ -814,6 +817,12 @@ int nbd_client_init(BlockDriverState *bs,
 logout("Failed to negotiate with the NBD server\n");
 return ret;
 }
+if (client->info.flags & NBD_FLAG_READ_ONLY &&
+!bdrv_is_read_only(bs)) {
+error_setg(errp,
+   "request for write access conflicts with read-only export");
+return -EACCES;
+}
 if (client->info.flags & NBD_FLAG_SEND_FUA) {
 bs->supported_write_flags = BDRV_REQ_FUA;
 bs->supported_zero_flags |= BDRV_REQ_FUA;
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 2253c6a6d1..5eb8784669 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -117,15 +117,15 @@ _export_nbd_snapshot sn1

 echo
 echo "== verifying the exported snapshot with patterns, method 1 =="
-$QEMU_IO_NBD -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
-$QEMU_IO_NBD -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io

 _export_nbd_snapshot1 sn1

 echo
 echo "== verifying the exported snapshot with patterns, method 2 =="
-$QEMU_IO_NBD -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
-$QEMU_IO_NBD -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xa 0x1000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io
+$QEMU_IO_NBD -r -c 'read -P 0xb 0x2000 0x1000' "$nbd_snapshot_img" | 
_filter_qemu_io

 $QEMU_IMG convert "$TEST_IMG" -l sn1 -O qcow2 "$converted_image"

diff --git a/tests/qemu-iotests/140 b/tests/qemu-iotests/140
index f89d0d6789..a8fc95145c 100755
--- a/tests/qemu-iotests/140
+++ b/tests/qemu-iotests/140
@@ -78,7 +78,7 @@ _send_qemu_cmd $QEMU_HANDLE \
'arguments': { 'device': 'drv' }}" \
 'return'

-$QEMU_IO_PROG -f raw -c 'read -P 42 0 64k' \
+$QEMU_IO_PROG -f raw -r -c 'read -P 42 0 64k' \
 "nbd+unix:///drv?socket=$TEST_DIR/nbd" 2>&1 \
 | _filter_qemu_io | _filter_nbd

@@ -87,7 +87,7 @@ _send_qemu_cmd $QEMU_HANDLE \
'arguments': { 'device': 'drv' }}" \
 'return'

-$QEMU_IO_PROG -f raw -c close 

Re: [Qemu-block] [PATCH 2/2] qmp: add nbd-server-remove

2017-11-09 Thread Eric Blake
On 11/09/2017 09:40 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add a command for removing an export. It is needed for cases when we
> don't want to keep the export after the operation on it has completed.
> The other example is a temporary node, created with blockdev-add.
> If we want to delete it, we should first remove the corresponding
> NBD export.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  qapi/block.json | 20 
>  blockdev-nbd.c  | 27 +++
>  2 files changed, 47 insertions(+)

It would be nice (okay as a followup patch) to add iotest coverage of
the new command.

>  ##
> +# @nbd-server-remove:
> +#
> +# Stop exporting block node through QEMU's embedded NBD server.
> +#
> +# @device: The device name or node name of the exported node. Should be equal
> +#  to @device parameter for corresponding nbd-server-add command 
> call.
> +#
> +# @force: Whether active connections to the export should be closed. If this
> +# parameter is false the export is only removed from named exports 
> list,
> +# so new connections are impossible and it would be freed after all
> +# clients are disconnected (default false).
> +#
> +# Returns: error if the server is not running or the device is not marked for
> +#  export.
> +#
> +# Since: 2.12

You are correct that this is too late for 2.11, but I like the concept.
Once we have some testsuite coverage, I'll queue it for 2.12.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-block] [PULL 4/8] nbd/client: Nicer trace of structured reply

2017-11-09 Thread Eric Blake
It's useful to know which structured reply chunk is being processed.
Missed in commit d2febedb.

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-4-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client.c | 4 +++-
 nbd/trace-events | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index 3d680e63e1..1880103d2a 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -979,6 +979,7 @@ static int nbd_receive_structured_reply_chunk(QIOChannel 
*ioc,
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
 int ret;
+const char *type;

 ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
 if (ret <= 0) {
@@ -1008,8 +1009,9 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, 
Error **errp)
 if (ret < 0) {
 break;
 }
+type = nbd_reply_type_lookup(reply->structured.type);
 trace_nbd_receive_structured_reply_chunk(reply->structured.flags,
- reply->structured.type,
+ reply->structured.type, type,
  reply->structured.handle,
  reply->structured.length);
 break;
diff --git a/nbd/trace-events b/nbd/trace-events
index 4a13757524..bbc75f6414 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -27,7 +27,7 @@ nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, 
uint16_t type, const char *name) "Sending request to server: { .from = %" 
PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", 
.type = %" PRIu16 " (%s) }"
 nbd_receive_simple_reply(int32_t error, const char *errname, uint64_t handle) 
"Got simple reply: { .error = %" PRId32 " (%s), handle = %" PRIu64" }"
-nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, uint64_t 
handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", 
type = %d, handle = %" PRIu64 ", length = %" PRIu32 " }"
+nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, const char 
*name, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 
0x%" PRIx16 ", type = %d (%s), handle = %" PRIu64 ", length = %" PRIu32 " }"

 # nbd/common.c
 nbd_unknown_error(int err) "Squashing unexpected error %d to EINVAL"
-- 
2.13.6




[Qemu-block] [PULL 2/8] nbd-client: Fix error message typos

2017-11-09 Thread Eric Blake
Provide missing spaces that are required when using string
concatenation to break error messages across source lines.
Introduced in commit f140e300.

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-2-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index b44d4d4a01..de6c153328 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -248,7 +248,7 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk 
*chunk,

 error = nbd_errno_to_system_errno(payload_advance32(&payload));
 if (error == 0) {
-error_setg(errp, "Protocol error: server sent structured error chunk"
+error_setg(errp, "Protocol error: server sent structured error chunk "
  "with error = 0");
 return -EINVAL;
 }
@@ -257,7 +257,7 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk 
*chunk,
 message_size = payload_advance16(&payload);

 if (message_size > chunk->length - sizeof(error) - sizeof(message_size)) {
-error_setg(errp, "Protocol error: server sent structured error chunk"
+error_setg(errp, "Protocol error: server sent structured error chunk "
  "with incorrect message size");
 return -EINVAL;
 }
@@ -408,7 +408,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 if (chunk->type == NBD_REPLY_TYPE_NONE) {
 if (!(chunk->flags & NBD_REPLY_FLAG_DONE)) {
 error_setg(errp, "Protocol error: NBD_REPLY_TYPE_NONE chunk 
without"
- "NBD_REPLY_FLAG_DONE flag set");
+   " NBD_REPLY_FLAG_DONE flag set");
 return -EINVAL;
 }
 return 0;
-- 
2.13.6




[Qemu-block] [PULL 1/8] nbd/server: fix nbd_negotiate_handle_info

2017-11-09 Thread Eric Blake
From: Vladimir Sementsov-Ogievskiy 

namelen should be here, length is unrelated, and always 0 at this
point.  Broken since its introduction in commit f37708f6, but mostly
harmless (replying with '' as the name does not violate protocol,
and does not confuse qemu as the nbd client since our implementation
does not ask for the name; but might confuse some other client that
does ask for the name especially if the default export is different
than the export name being queried).

Adding an assert makes it obvious that we are not skipping any bytes
in the client's message, as well as making it obvious that we were
using the wrong variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
CC: qemu-sta...@nongnu.org
Message-Id: <20171101154204.27146-1-vsement...@virtuozzo.com>
[eblake: improve commit message, squash in assert addition]
Signed-off-by: Eric Blake 
---
 nbd/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 70b40ed27e..bcf0cdb47c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -423,6 +423,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
uint32_t length,
 break;
 }
 }
+assert(length == 0);

 exp = nbd_export_find(name);
 if (!exp) {
@@ -433,7 +434,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
uint32_t length,

 /* Don't bother sending NBD_INFO_NAME unless client requested it */
 if (sendname) {
-rc = nbd_negotiate_send_info(client, opt, NBD_INFO_NAME, length, name,
+rc = nbd_negotiate_send_info(client, opt, NBD_INFO_NAME, namelen, name,
  errp);
 if (rc < 0) {
 return rc;
-- 
2.13.6




[Qemu-block] [PULL 5/8] nbd: Fix struct name for structured reads

2017-11-09 Thread Eric Blake
A closer read of the NBD spec shows that a structured reply chunk
for a hole is not quite identical to the prefix of a data chunk,
because the hole has to also send a 32-bit size field.  Although
we do not yet send holes, we should fix the misleading information
in our header and make it easier for a future patch to support
sparse reads.  Messed up in commit bae245d1.

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-5-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h | 18 +-
 nbd/server.c|  2 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 92d1723d7c..113c707a5e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -86,15 +86,23 @@ typedef union NBDReply {
 } QEMU_PACKED;
 } NBDReply;

-/* Header of NBD_REPLY_TYPE_OFFSET_DATA, complete NBD_REPLY_TYPE_OFFSET_HOLE */
-typedef struct NBDStructuredRead {
-NBDStructuredReplyChunk h;
+/* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
+typedef struct NBDStructuredReadData {
+NBDStructuredReplyChunk h; /* h.length >= 9 */
 uint64_t offset;
-} QEMU_PACKED NBDStructuredRead;
+/* At least one byte of data payload follows, calculated from h.length */
+} QEMU_PACKED NBDStructuredReadData;
+
+/* Complete chunk for NBD_REPLY_TYPE_OFFSET_HOLE */
+typedef struct NBDStructuredReadHole {
+NBDStructuredReplyChunk h; /* h.length == 12 */
+uint64_t offset;
+uint32_t length;
+} QEMU_PACKED NBDStructuredReadHole;

 /* Header of all NBD_REPLY_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
-NBDStructuredReplyChunk h;
+NBDStructuredReplyChunk h; /* h.length >= 6 */
 uint32_t error;
 uint16_t message_length;
 } QEMU_PACKED NBDStructuredError;
diff --git a/nbd/server.c b/nbd/server.c
index bcf0cdb47c..6ebb7d9c2e 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1280,7 +1280,7 @@ static int coroutine_fn 
nbd_co_send_structured_read(NBDClient *client,
 size_t size,
 Error **errp)
 {
-NBDStructuredRead chunk;
+NBDStructuredReadData chunk;
 struct iovec iov[] = {
 {.iov_base = , .iov_len = sizeof(chunk)},
 {.iov_base = data, .iov_len = size}
-- 
2.13.6




Re: [Qemu-block] [PATCH] coroutine: simplify co_aio_sleep_ns() prototype

2017-11-09 Thread Stefan Hajnoczi
On Thu, Nov 09, 2017 at 10:26:52AM +, Stefan Hajnoczi wrote:
> The AioContext pointer argument to co_aio_sleep_ns() is only used for
> the sleep timer.  It does not affect where the caller coroutine is
> resumed.
> 
> Due to changes to coroutine and AIO APIs it is now possible to drop the
> AioContext pointer argument.  This is safe to do since no caller has
> specific requirements for which AioContext the timer must run in.
> 
> This patch drops the AioContext pointer argument and renames the
> function to simplify the API.
> 
> Reported-by: Paolo Bonzini 
> Reported-by: Eric Blake 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/qemu/coroutine.h| 6 +-
>  block/null.c| 3 +--
>  block/sheepdog.c| 3 +--
>  blockjob.c  | 2 +-
>  util/qemu-coroutine-sleep.c | 4 ++--
>  5 files changed, 6 insertions(+), 12 deletions(-)

Thanks, applied to my block-next tree for QEMU 2.12:
https://github.com/stefanha/qemu/commits/block-next

Stefan




Re: [Qemu-block] Drainage in bdrv_replace_child_noperm()

2017-11-09 Thread Kevin Wolf
On 08.11.2017 at 21:16, Max Reitz wrote:
> On 2017-11-07 15:22, Kevin Wolf wrote:
> > I think the issue is much simpler, even though it still has two parts.
> > It's the old story of bdrv_drain mixing two separate concepts:
> > 
> > 1. Wait synchronously for the completion of all my requests to this
> >node. This needs to be propagated down the graph to the children.
> 
> So, flush without flushing protocol drivers? :-)

No, flush is a completely different thing. Drain is about completing all
in-flight requests, whereas flush is about writing out all caches. You
can do either one without the other.

In particular, a flush doesn't involve a drain; you can still issue requests
while a flush request is in flight. The semantics is that a flush makes
sure that all requests which had already completed when the flush
request was started are stable on disk. Later requests may or may not be
stable on disk yet.

> > 2. Make sure that nobody else sends new requests to this node. This
> >needs to be propagated up the graph to the parents.
> > 
> > Some callers want only 1. (usually callers of bdrv_drain_all() without a
> > begin/end pair), some callers want both 1. and 2. (that's the begin/end
> > construction for getting exclusive access). Not sure if there are
> > callers that want only 2., but possibly.
> > 
> > If we actually take this separation serious, the first step to a fix
> > could be that BdrvChildRole.drained_begin shouldn't be propagated to the
> > children.
> 
> That seems good to me, but considering that the default was to just
> propagate it to every child, I thought that I was missing something.

bdrv_child_cb_drained_begin() calls bdrv_drained_begin() to wait for the
completion of all running requests on its current node. The propagation
to every child is an unwanted side effect, but so far it didn't seem to
hurt anyone, so we didn't care about preventing it.

> >   We may still need to drain the requests at the node itself:
> > > Imagine a drained backing file of qcow2 node. Making sure that the qcow2
> > node doesn't get new requests isn't enough, we also must make sure that
> > in-flight requests don't access the backing file any more. This means
> > draining the qcow2 node, though we don't care whether its child nodes
> > still have requests in flight.
> 
> If you want to stop the qcow2 node from generating further requests, you
> can either flush them (as you said) or pause them (whatever that means...).

Pausing is probably way too complicated. qcow2 never issues requests by
itself. It only has requests running if someone else has a request
running on the qcow2 node. So it's enough to just drain the request
queue of the qcow2 node to get to the state we want.

The only thing we need to make sure is that we drain it _before_ its
child node is replaced with a drained one, so that its requests can
complete.

In fact, I think graph modifications should only be done with drained
nodes anyway. Otherwise you could switch over in the middle of a single
high-level operation and possibly get callbacks from a node that isn't a
child any more. Maybe we should add some assertions to that effect and
clean up whatever code breaks after it.
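
As a rough sketch (untested; whether bs->quiesce_counter is the right
condition to assert on is part of the question):

    static void bdrv_replace_child_noperm(BdrvChild *child,
                                          BlockDriverState *new_bs)
    {
        /* Graph changes only with both nodes drained */
        assert(!child->bs || child->bs->quiesce_counter > 0);
        assert(!new_bs || new_bs->quiesce_counter > 0);
        /* ... rest of the function unchanged ... */
    }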

> However, if you flush them, you also need to flush the children you have
> flushed them to...  So what I wrote was technically wrong ("don't pass
> parent drainage back to the child because the child is already
> drained"), instead it should be "don't pass parent drainage back to the
> child because the child is going to be drained (flushed) anyway".
> 
> So, pause requests first (either by parking them or flushing them,
> driver's choice), then flush the tree.
> 
> Of course that's easier said than done...  First, we would need a
> non-recursive flush.  Then, every node that is visited by the drain
> would (*after* recursing to its parents) need to flush itself.
> 
> (Note that I'm completely disregarding nodes which are below the
> original node that is supposed to be drained, and which therefore do not
> drain their parents (or do they?) because I'd like to retain at least
> some of my sanity for now.)

I don't follow, but I suppose this is related to the flush/drain
confusion.

> Secondly we have exactly the issue you describe below...
> 
> > The big question is whether bdrv_drain() would still work for a single
> > node without recursing to the children, but as it uses bs->in_flight, I
> > think that should be okay these days.
> > 
> >> (Most importantly, ideally we'd want to attach the new child to the
> >> parent and then drain the parent: This would give us exactly the state
> >> we want.  However, attaching the child first and then draining the
> >> parent is unsafe, so we cannot do it...)
> >>
> >> [1] Whether the parent (un-)drains the child depends on the
> >> BdrvChildRole.drained_{begin,end}() implementation, strictly speaking.
> >> We cannot say it generally.
> >>
> >> OK, so how to fix it?  I don't know, so I'm asking you. :-)
> > 
> > 

Re: [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-11-09 Thread Eric Blake
On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is needed to implement image-fleecing scheme, when we create
> a temporary node, mark our active node to be backing for the temp,
> and start backup(sync=none) from active node to the temp node.
> Temp node then represents a kind of snapshot and may be used
> for external backup through NBD.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> What was the reason to abandon non-root nodes?

I think the original restriction was that we didn't know all the
implications to having multiple readers to an intermediate node, so it
was easier to prevent it with plans to add it later than to add it up
front and deal with the fallout.  But I think that now we are
sufficiently versed in handling BDS trees with multiple readers, with
proper op-blocking in place; so you are right that we can probably
support it now.

However, I'm a bit worried that there is no documentation change about
this in a .json QAPI file, nor any easy way to introspect via QMP
whether a particular qemu implementation supports this (short of trying
it and seeing whether it works).  I'm also thinking that this is 2.12
material, unless we can prove it is fixing a bug for 2.11 that was not
previously present.

> 
>  blockdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 56a6b24a0b..d0a5a107f0 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3345,7 +3345,7 @@ BlockJob *do_blockdev_backup(BlockdevBackup *backup, 
> BlockJobTxn *txn,
>  backup->compress = false;
>  }
>  
> -bs = qmp_get_root_bs(backup->device, errp);
> +bs = bdrv_lookup_bs(backup->device, backup->device, errp);
>  if (!bs) {
>  return NULL;
>  }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH] block: all I/O should be completed before removing throttle timers.

2017-11-09 Thread Stefan Hajnoczi
On Sat, Oct 21, 2017 at 01:34:00PM +0800, Zhengui Li wrote:
> From: Zhengui 
> 
> In blk_remove_bs, all I/O should be completed before removing throttle
> timers. If there is inflight I/O, removing the throttle timers here
> will cause the inflight I/O to never return.
> This patch adds bdrv_drained_begin before throttle_timers_detach_aio_context
> to let all I/O complete before removing the throttle timers.
> 
> Signed-off-by: Zhengui 
> ---
>  block/block-backend.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [PATCH 0/2] add qmp nbd-server-remove

2017-11-09 Thread Eric Blake
On 11/09/2017 09:40 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add a command to remove an NBD export, the pair to nbd-server-add.
> The whole thing and its description are in patch 02.
> 
> Vladimir Sementsov-Ogievskiy (2):
>   nbd/server: add additional assert to nbd_export_put
>   qmp: add nbd-server-remove
> 
>  qapi/block.json | 20 
>  blockdev-nbd.c  | 27 +++
>  nbd/server.c|  1 +
>  3 files changed, 48 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH 0/5] iotests: Make some tests less flaky

2017-11-09 Thread Stefan Hajnoczi
On Thu, Nov 09, 2017 at 02:37:59AM +0100, Max Reitz wrote:
> There are a couple of tests that fail (on my machine) from time to
> time (and by that I mean that recently I've rarely ever had a test run
> with both 083 and 136 working on first try).
> This series should fix most (at least the issues I am aware of).
> 
> Notes:
> - 083 might have another issue, but if so it occurs extremely rarely and
>   so I was unable to debug it.
> 
> - 129 is flaky, too, because it tries to use block jobs with BB
>   throttling -- however, block jobs ignore that these days.  Making it
>   use a throttle filter node will require quite a bit of work.  See
>   http://lists.nongnu.org/archive/html/qemu-block/2017-11/msg00111.html
>   for more.
> 
> - 194 sometimes hangs because the source VM fails to drain its drive.
>   This is probably not an issue with the test, but actually an issue in
>   qemu.  See
>   http://lists.nongnu.org/archive/html/qemu-block/2017-11/msg00256.html
>   for more.
> 
> 
> "All tests have passed, let's ship it!"
> -- Me, 2:36 am
> 
> 
> (Editor's note: "all" means raw/file, qcow2/file, and raw/nbd.)
> 
> 
> Max Reitz (5):
>   iotests: Make 030 less flaky
>   iotests: Add missing 'blkdebug::' in 040
>   iotests: Make 055 less flaky
>   iotests: Make 083 less flaky
>   iotests: Make 136 less flaky
> 
>  tests/qemu-iotests/030 |  8 ++--
>  tests/qemu-iotests/040 |  2 +-
>  tests/qemu-iotests/055 | 25 -
>  tests/qemu-iotests/083 |  3 ++-
>  tests/qemu-iotests/136 | 14 +-
>  5 files changed, 38 insertions(+), 14 deletions(-)
> 
> -- 
> 2.13.6
> 
> 

Thanks for these fixes!  QEMU 2.11 material.

Reviewed-by: Stefan Hajnoczi 




[Qemu-block] [PATCH 2/2] qmp: add nbd-server-remove

2017-11-09 Thread Vladimir Sementsov-Ogievskiy
Add a command for removing an export. It is needed for cases when we
don't want to keep the export after the operation on it has completed.
The other example is a temporary node, created with blockdev-add.
If we want to delete it, we should first remove the corresponding
NBD export.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block.json | 20 
 blockdev-nbd.c  | 27 +++
 2 files changed, 47 insertions(+)

diff --git a/qapi/block.json b/qapi/block.json
index f093fa3f27..1827940717 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -223,6 +223,26 @@
 { 'command': 'nbd-server-add', 'data': {'device': 'str', '*writable': 'bool'} }
 
 ##
+# @nbd-server-remove:
+#
+# Stop exporting block node through QEMU's embedded NBD server.
+#
+# @device: The device name or node name of the exported node. Should be equal
+#  to @device parameter for corresponding nbd-server-add command call.
+#
+# @force: Whether active connections to the export should be closed. If this
+# parameter is false the export is only removed from named exports 
list,
+# so new connections are impossible and it would be freed after all
+# clients are disconnected (default false).
+#
+# Returns: error if the server is not running or the device is not marked for
+#  export.
+#
+# Since: 2.12
+##
+{ 'command': 'nbd-server-remove', 'data': {'device': 'str', '*force': 'bool'} }
+
+##
 # @nbd-server-stop:
 #
 # Stop QEMU's embedded NBD server, and unregister all devices previously
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 28f551a7b0..5f66951c33 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -203,6 +203,33 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 nbd_export_put(exp);
 }
 
+void qmp_nbd_server_remove(const char *device, bool has_force, bool force,
+   Error **errp)
+{
+NBDExport *exp;
+
+if (!nbd_server) {
+error_setg(errp, "NBD server not running");
+return;
+}
+
+exp = nbd_export_find(device);
+if (exp == NULL) {
+error_setg(errp, "'%s' is not exported", device);
+return;
+}
+
+if (!has_force) {
+force = false;
+}
+
+if (force) {
+nbd_export_close(exp);
+} else {
+nbd_export_set_name(exp, NULL);
+}
+}
+
 void qmp_nbd_server_stop(Error **errp)
 {
 nbd_export_close_all();
-- 
2.11.1




[Qemu-block] [PATCH 0/2] add qmp nbd-server-remove

2017-11-09 Thread Vladimir Sementsov-Ogievskiy
Add a command to remove an NBD export, the pair to nbd-server-add.
The whole thing and its description are in patch 02.

Vladimir Sementsov-Ogievskiy (2):
  nbd/server: add additional assert to nbd_export_put
  qmp: add nbd-server-remove

 qapi/block.json | 20 
 blockdev-nbd.c  | 27 +++
 nbd/server.c|  1 +
 3 files changed, 48 insertions(+)

-- 
2.11.1




[Qemu-block] [PATCH 1/2] nbd/server: add additional assert to nbd_export_put

2017-11-09 Thread Vladimir Sementsov-Ogievskiy
This place is not obvious: nbd_export_close may theoretically reduce
the refcount to 0. That can happen if someone calls nbd_export_put on
a named export (not through nbd_export_set_name) while the refcount is 1.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/server.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/server.c b/nbd/server.c
index 70b40ed27e..2f2e05943f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1179,6 +1179,7 @@ void nbd_export_put(NBDExport *exp)
 nbd_export_close(exp);
 }
 
+assert(exp->refcount > 0);
 if (--exp->refcount == 0) {
 assert(exp->name == NULL);
 assert(exp->description == NULL);
-- 
2.11.1




[Qemu-block] [PULL 8/8] nbd/server: Fix structured read of length 0

2017-11-09 Thread Eric Blake
The NBD spec was recently clarified to state that a read of length 0
should not be attempted by a compliant client; but that a server must
still handle it correctly in an unspecified manner (that is, either
a successful no-op or an error reply, but not a crash) [1].  However,
it also implies that NBD_REPLY_TYPE_OFFSET_DATA must have a non-zero
payload length, but our existing code was replying with a chunk
that a picky client could reject as invalid because it was missing
a payload (our own client implementation was recently patched to be
that picky, after first fixing it to not send 0-length requests).

We are already doing successful no-ops for 0-length writes and for
non-structured reads; so for consistency, we want structured reply
reads to also be a no-op.  The easiest way to do this is to return
a NBD_REPLY_TYPE_NONE chunk; this is best done via a new helper
function (especially since future patches for other structured
replies may benefit from using the same helper).

[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-8-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/server.c | 21 -
 nbd/trace-events |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 6ebb7d9c2e..df771fd42f 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1273,6 +1273,21 @@ static inline void set_be_chunk(NBDStructuredReplyChunk 
*chunk, uint16_t flags,
 stl_be_p(&chunk->length, length);
 }

+static int coroutine_fn nbd_co_send_structured_done(NBDClient *client,
+uint64_t handle,
+Error **errp)
+{
+NBDStructuredReplyChunk chunk;
+struct iovec iov[] = {
+{.iov_base = &chunk, .iov_len = sizeof(chunk)},
+};
+
+trace_nbd_co_send_structured_done(handle);
+set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+
+return nbd_co_send_iov(client, iov, 1, errp);
+}
+
 static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
 uint64_t handle,
 uint64_t offset,
@@ -1286,6 +1301,7 @@ static int coroutine_fn 
nbd_co_send_structured_read(NBDClient *client,
 {.iov_base = data, .iov_len = size}
 };

+assert(size);
 trace_nbd_co_send_structured_read(handle, offset, data, size);
 set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_OFFSET_DATA,
  handle, sizeof(chunk) - sizeof(chunk.h) + size);
@@ -1544,10 +1560,13 @@ reply:
 if (ret < 0) {
 ret = nbd_co_send_structured_error(req->client, request.handle,
-ret, msg, &local_err);
-} else {
+} else if (reply_data_len) {
 ret = nbd_co_send_structured_read(req->client, request.handle,
   request.from, req->data,
  reply_data_len, &local_err);
+} else {
+ret = nbd_co_send_structured_done(req->client, request.handle,
+  &local_err);
 }
 } else {
 ret = nbd_co_send_simple_reply(req->client, request.handle,
diff --git a/nbd/trace-events b/nbd/trace-events
index bbc75f6414..92568edce5 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -55,6 +55,7 @@ nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t 
type, uint64_t from
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching 
clients to AIO context %p\n"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients 
from AIO context %p\n"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, const char *errname, 
int len) "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 " (%s), 
len = %d"
+nbd_co_send_structured_done(uint64_t handle) "Send structured reply done: 
handle = %" PRIu64
 nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void *data, 
size_t size) "Send structured read data reply: handle = %" PRIu64 ", offset = 
%" PRIu64 ", data = %p, len = %zu"
 nbd_co_send_structured_error(uint64_t handle, int err, const char *errname, 
const char *msg) "Send structured error reply: handle = %" PRIu64 ", error = %d 
(%s), msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char 
*name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
-- 
2.13.6




[Qemu-block] [PULL 7/8] nbd-client: Stricter enforcing of structured reply spec

2017-11-09 Thread Eric Blake
Ensure that the server is not sending unexpected chunk lengths
for either the NONE or the OFFSET_DATA chunk, nor unexpected
hole length for OFFSET_HOLE.  This will flag any server as
broken that responds to a zero-length read with an OFFSET_DATA
(what our server currently does, but that's about to be fixed)
or with OFFSET_HOLE, even though we previously fixed our client
to never be able to send such a request over the wire.

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-7-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 0a675d0fab..bcfed0133d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -216,7 +216,7 @@ static int 
nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
 offset = payload_advance64(&payload);
 hole_size = payload_advance32(&payload);

-if (offset < orig_offset || hole_size > qiov->size ||
+if (!hole_size || offset < orig_offset || hole_size > qiov->size ||
 offset > orig_offset + qiov->size - hole_size) {
 error_setg(errp, "Protocol error: server sent chunk exceeding 
requested"
  " region");
@@ -281,7 +281,8 @@ static int 
nbd_co_receive_offset_data_payload(NBDClientSession *s,

 assert(nbd_reply_is_structured(>reply));

-if (chunk->length < sizeof(offset)) {
+/* The NBD spec requires at least one byte of payload */
+if (chunk->length <= sizeof(offset)) {
 error_setg(errp, "Protocol error: invalid payload for "
  "NBD_REPLY_TYPE_OFFSET_DATA");
 return -EINVAL;
@@ -293,6 +294,7 @@ static int 
nbd_co_receive_offset_data_payload(NBDClientSession *s,
 be64_to_cpus(&offset);

 data_size = chunk->length - sizeof(offset);
+assert(data_size);
 if (offset < orig_offset || data_size > qiov->size ||
 offset > orig_offset + qiov->size - data_size) {
 error_setg(errp, "Protocol error: server sent chunk exceeding 
requested"
@@ -411,6 +413,11 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
" NBD_REPLY_FLAG_DONE flag set");
 return -EINVAL;
 }
+if (chunk->length) {
+error_setg(errp, "Protocol error: NBD_REPLY_TYPE_NONE chunk with"
+   " nonzero length");
+return -EINVAL;
+}
 return 0;
 }

-- 
2.13.6




Re: [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-11-09 Thread Kevin Wolf
On 09.11.2017 at 17:33, Eric Blake wrote:
> On 11/09/2017 08:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> > This is needed to implement image-fleecing scheme, when we create
> > a temporary node, mark our active node to be backing for the temp,
> > and start backup(sync=none) from active node to the temp node.
> > Temp node then represents a kind of snapshot and may be used
> > for external backup through NBD.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > ---
> > 
> > What was the reason to abandon non-root nodes?
> 
> I think the original restriction was that we didn't know all the
> implications to having multiple readers to an intermediate node, so it
> was easier to prevent it with plans to add it later than to add it up
> front and deal with the fallout.  But I think that now we are
> sufficiently versed in handling BDS trees with multiple readers, with
> proper op-blocking in place; so you are right that we can probably
> support it now.

Op blockers are actually the reason why I'm not so sure. The old
blockers often still only work on the top level, and we haven't fully
replaced them yet. In particular, graph modifications might not be
protected well enough by the new system yet.

But then, backup works only on a single node in the source tree, not on
a whole subchain, so I don't see what other operation could conflict
with a backup job. The only thing is resizes, but that is covered by the
new system.

So in the end, I think this specific change should be okay.

> However, I'm a bit worried that there is no documentation change about
> this in a .json QAPI file, nor any easy way to introspect via QMP
> whether a particular qemu implementation supports this (short of trying
> it and seeing whether it works).  I'm also thinking that this is 2.12
> material, unless we can prove it is fixing a bug for 2.11 that was not
> previously present.

Sounds like another use case for the capability annotations that Markus
wants to introduce for commands in the QAPI schema.

Kevin




Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Alberto Garcia
On Wed 08 Nov 2017 03:45:38 PM CET, Alberto Garcia wrote:

> I'm thinking that perhaps we should add the pause point directly to
> block_job_defer_to_main_loop(), to prevent any block job from running
> the exit function when it's paused.

I was trying this and unfortunately this breaks the mirror job at least
(reproduced with a simple block-commit on the topmost node, which uses
commit_active_start() -> mirror_start_job()).

So what happens is that mirror_run() always calls bdrv_drained_begin()
before returning, pausing the block job. The closing bdrv_drained_end()
call is at the end of mirror_exit(), already in the main loop.

So the mirror job is always calling block_job_defer_to_main_loop() and
mirror_exit() when the job is paused.
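
For reference, the idea was essentially this (rough sketch only, not
the exact patch I tried):

    void block_job_defer_to_main_loop(BlockJob *job,
                                      BlockJobDeferToMainLoopFn *fn,
                                      void *opaque)
    {
        BlockJobDeferToMainLoopData *data = g_new(BlockJobDeferToMainLoopData, 1);

        data->job = job;
        data->aio_context = blk_get_aio_context(job->blk);
        data->fn = fn;
        data->opaque = opaque;
        job->deferred_to_main_loop = true;

        /* The added pause point: don't schedule the exit BH while the
         * job is paused.  This is what trips up mirror: mirror_run()
         * returns with bdrv_drained_begin() still in effect, so the
         * job pauses here and mirror_exit() -- which would issue the
         * matching bdrv_drained_end() -- is never scheduled. */
        block_job_pause_point(job);

        aio_bh_schedule_oneshot(qemu_get_aio_context(),
                                block_job_defer_to_main_loop_bh, data);
    }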

Berto



[Qemu-block] [PULL 6/8] nbd-client: Short-circuit 0-length operations

2017-11-09 Thread Eric Blake
The NBD spec was recently clarified to state that clients should
not send 0-length requests to the server, as the server behavior
is undefined [1].  We know that qemu-nbd's behavior is a successful
no-op (once it has filtered for read-only exports), but other NBD
implementations might return an error.  To avoid any questionable
server implementations, it is better to just short-circuit such
requests on the client side (we are relying on the block layer to
already filter out requests such as invalid offset, write to a
read-only volume, and so forth); do the short-circuit as late as
possible to still benefit from protections from assertions that
the block layer is not violating our assumptions.

[1] https://github.com/NetworkBlockDevice/nbd/commit/ee926037

Signed-off-by: Eric Blake 
Message-Id: <20171108215703.9295-6-ebl...@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd-client.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index daa4392531..0a675d0fab 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -674,6 +674,9 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t 
offset,
 assert(bytes <= NBD_MAX_BUFFER_SIZE);
 assert(!flags);

+if (!bytes) {
+return 0;
+}
 ret = nbd_co_send_request(bs, &request, NULL);
 if (ret < 0) {
 return ret;
@@ -705,6 +708,9 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,

 assert(bytes <= NBD_MAX_BUFFER_SIZE);

+if (!bytes) {
+return 0;
+}
 return nbd_co_request(bs, &request, qiov);
 }

@@ -731,6 +737,9 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset,
 request.flags |= NBD_CMD_FLAG_NO_HOLE;
 }

+if (!bytes) {
+return 0;
+}
 return nbd_co_request(bs, &request, NULL);
 }

@@ -759,7 +768,7 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int bytes)
 };

 assert(!(client->info.flags & NBD_FLAG_READ_ONLY));
-if (!(client->info.flags & NBD_FLAG_SEND_TRIM)) {
+if (!(client->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
 return 0;
 }

-- 
2.13.6




Re: [Qemu-block] [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)

2017-11-09 Thread Max Reitz
On 2017-11-09 16:30, Fam Zheng wrote:
> On Thu, 11/09 16:14, Max Reitz wrote:
>> On 2017-11-09 05:21, Fam Zheng wrote:
>>> On Thu, 11/09 01:48, Max Reitz wrote:
 Hi,

 More exciting news from the bdrv_drain() front!

 I've noticed in the past that iotest 194 sometimes hangs.  I usually run
 the tests on tmpfs, but I've just now verified that it happens on my SSD
 just as well.

 So the reproducer is a plain:

 while ./check -raw 194; do; done
>>>
>>> I cannot produce it on my machine.
>>
>> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
>> desktop (with Arch)...
>>
 (No difference between raw or qcow2, though.)

 And then, after a couple of runs (or a couple ten), it will just hang.
 The reason is that the source VM lingers around and doesn't quit
 voluntarily -- the test itself was successful, but it just can't exit.

 If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
 this is the backtrace:

 #0  0x7f7cfc297e06 in ppoll () at /lib64/libc.so.6
 #1  0x563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
 __nfds=<optimized out>, __fds=<optimized out>) at
 /usr/include/bits/poll2.h:77
>>>
>>> Looking at the 0 timeout it seems we are in the aio_poll(ctx, 
>>> blocking=false);
>>> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is 
>>> making
>>> progress and causing the return value to be true in aio_poll().
>>>
 #2  0x563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
 nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
 #3  0x563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80,
 blocking=<optimized out>) at util/aio-posix.c:629
 #4  0x563b8463afa4 in bdrv_drain_recurse
 (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
 #5  0x563b8463baff in bdrv_drain_all_begin () at block/io.c:381
 #6  0x563b8463bc99 in bdrv_drain_all () at block/io.c:411
 #7  0x563b8459888b in block_migration_cleanup (opaque=<optimized out>) at migration/block.c:714
 #8  0x563b845883be in qemu_savevm_state_cleanup () at
 migration/savevm.c:1251
 #9  0x563b845811fd in migration_thread (opaque=0x563b856f1da0) at
 migration/migration.c:2298
 #10 0x7f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
 #11 0x7f7cfc2a3e1f in clone () at /lib64/libc.so.6


 And when you make bdrv_drain_all_begin() print what we are trying to
 drain, you can see that it's the format node (managed by the "raw"
 driver in this case).
>>>
>>> So what is the value of bs->in_flight?
>>
>> gdb:
>>> (gdb) print bs->in_flight 
>>> $3 = 2307492233
>>
>> "That's weird, why would it..."
>>
>>> (gdb) print *bs
>>> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = 
>>> false, probed = false, force_share = 96, implicit = 159, drv = 0x0, opaque 
>>> = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first = 
>>> 0x8989898989898989}, 
>>>   walking_aio_notifiers = 137, filename = '\211' , 
>>> backing_file = '\211' , backing_format = '\211' 
>>> , full_open_options = 0x8989898989898989, 
>>>   exact_filename = '\211' , backing = 
>>> 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 
>>> 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
>>> max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 2307492233, 
>>> opt_transfer = 2307492233, max_transfer = 2307492233, min_mem_alignment = 
>>> 9910603678816504201, opt_mem_alignment = 9910603678816504201, max_iov = 
>>> -1987475063}, 
>>>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, 
>>> node_name = '\211' , node_list = {tqe_next = 
>>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 
>>> 0x8989898989898989, 
>>> tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 
>>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, 
>>> op_blockers = {{lh_first = 0x8989898989898989} }, job = 
>>> 0x8989898989898989, 
>>>   inherits_from = 0x8989898989898989, children = {lh_first = 
>>> 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 
>>> 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes = 
>>> 2307492233, 
>>>   backing_blocker = 0x8989898989898989, total_sectors = 
>>> -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 
>>> 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, 
>>> write_threshold_notifier = {notify = 
>>> 0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 
>>> 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = 
>>> -1987475063, __count = 2307492233, __owner = -1987475063, __nusers = 
>>> 2307492233, 
>>> __kind = -1987475063, __spins = -30327, __elision = -30327, __list 
>>> = {__prev = 0x8989898989898989, __next = 0x8989898989898989}}, __size = 
>>> '\211' , __align = -8536140394893047415}, 

[Qemu-block] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-09 Thread Max Reitz
Draining a BDS may lead to graph modifications, which in turn may result
in it and other BDS being stripped of their current references.  If
bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
references themselves, the BDS they are trying to drain (or undrain) may
disappear right under their feet -- or, more specifically, under the
feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().

This fixes an occasional hang of iotest 194.

Signed-off-by: Max Reitz 
---
 block/io.c | 47 ---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3d5ef2cabe..a0a2833e8e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
 bool waited = true;
 BlockDriverState *bs;
 BdrvNextIterator it;
-GSList *aio_ctxs = NULL, *ctx;
+GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
+
+/* Must be called from the main loop */
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
 block_job_pause_all();
 
@@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
 if (!g_slist_find(aio_ctxs, aio_context)) {
 aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
 }
+
+/* Keep a strong reference to all root BDS and copy them into
+ * an own list because draining them may lead to graph
+ * modifications. */
+bdrv_ref(bs);
+bs_list = g_slist_prepend(bs_list, bs);
 }
 
 /* Note that completion of an asynchronous I/O operation can trigger any
@@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
 AioContext *aio_context = ctx->data;
 
 aio_context_acquire(aio_context);
-for (bs = bdrv_first(); bs; bs = bdrv_next()) {
+for (bs_list_entry = bs_list; bs_list_entry;
+ bs_list_entry = bs_list_entry->next)
+{
+bs = bs_list_entry->data;
+
 if (aio_context == bdrv_get_aio_context(bs)) {
 waited |= bdrv_drain_recurse(bs, true);
 }
@@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
 }
 }
 
+for (bs_list_entry = bs_list; bs_list_entry;
+ bs_list_entry = bs_list_entry->next)
+{
+bdrv_unref(bs_list_entry->data);
+}
+
 g_slist_free(aio_ctxs);
+g_slist_free(bs_list);
 }
 
 void bdrv_drain_all_end(void)
 {
 BlockDriverState *bs;
 BdrvNextIterator it;
+GSList *bs_list = NULL, *bs_list_entry;
+
+/* Must be called from the main loop */
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
+/* Keep a strong reference to all root BDS and copy them into an
+ * own list because draining them may lead to graph modifications.
+ */
 for (bs = bdrv_first(); bs; bs = bdrv_next()) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
+bdrv_ref(bs);
+bs_list = g_slist_prepend(bs_list, bs);
+}
+
+for (bs_list_entry = bs_list; bs_list_entry;
+ bs_list_entry = bs_list_entry->next)
+{
+AioContext *aio_context;
+
+bs = bs_list_entry->data;
+aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
 aio_enable_external(aio_context);
 bdrv_parent_drained_end(bs);
 bdrv_drain_recurse(bs, false);
 aio_context_release(aio_context);
+
+bdrv_unref(bs);
 }
 
+g_slist_free(bs_list);
+
 block_job_resume_all();
 }
 
-- 
2.13.6




[Qemu-block] [PATCH for-2.11] iotests: Use new-style NBD connections

2017-11-09 Thread Eric Blake
Old-style NBD is deprecated upstream (it is documented, but no
longer implemented in the reference implementation), and it is
severely limited (it cannot support structured replies, which
means it cannot support efficient handling of zeroes), when
compared to new-style NBD.  We are better off having our iotests
favor new-style everywhere (although some explicit tests,
particularly 83, still cover old-style for back-compat reasons);
this is as simple as supplying the empty string as the default
export name, as it does not change the URI needed to connect a
client to the server.  This also gives us more coverage of the
just-added structured reply code, when not overriding $QEMU_NBD
to intentionally point to an older server.

Signed-off-by: Eric Blake 
---

Proposing this for 2.11; it can either go in through the NBD
tree (although I just send my 2.11-rc1 pull request) or through
Max' iotest tree.

 tests/qemu-iotests/common.rc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 0e8a33c696..dbae7d74ba 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -242,7 +242,7 @@ _make_test_img()
 if [ $IMGPROTO = "nbd" ]; then
 # Pass a sufficiently high number to -e that should be enough for all
 # tests
-eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT -e 42  
$TEST_IMG_FILE >/dev/null &"
+eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 -f $IMGFMT -e 42 -x '' 
$TEST_IMG_FILE >/dev/null &"
 sleep 1 # FIXME: qemu-nbd needs to be listening before we continue
 fi

-- 
2.13.6




[Qemu-block] [PATCH v2 2/5] iotests: Add missing 'blkdebug::' in 040

2017-11-09 Thread Max Reitz
040 tries to invoke pause_drive() on a drive that does not use blkdebug.
Good idea, but let's use blkdebug to make it actually work.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/040 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index c284d08796..90b5b4f2ad 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -289,7 +289,7 @@ class TestSetSpeed(ImageCommitTestCase):
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
mid_img, test_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x1 0 512', test_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', 
mid_img)
-self.vm = iotests.VM().add_drive(test_img)
+self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
 self.vm.launch()
 
 def tearDown(self):
-- 
2.13.6




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-09 Thread Eric Blake
On 11/09/2017 02:43 PM, Max Reitz wrote:
> Draining a BDS may lead to graph modifications, which in turn may result
> in it and other BDS being stripped of their current references.  If
> bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
> references themselves, the BDS they are trying to drain (or undrain) may
> disappear right under their feet -- or, more specifically, under the
> feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
> 
> This fixes an occasional hang of iotest 194.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/io.c | 47 ---
>  1 file changed, 44 insertions(+), 3 deletions(-)

> +
> +/* Keep a strong reference to all root BDS and copy them into
> + * an own list because draining them may lead to graph

'an own' sounds awkward; maybe 'copy them into a local list'

> + * modifications. */
> +bdrv_ref(bs);
> +bs_list = g_slist_prepend(bs_list, bs);
>  }

>  void bdrv_drain_all_end(void)
>  {
>  BlockDriverState *bs;
>  BdrvNextIterator it;
> +GSList *bs_list = NULL, *bs_list_entry;
> +
> +/* Must be called from the main loop */
> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>  
> +/* Keep a strong reference to all root BDS and copy them into an
> + * own list because draining them may lead to graph modifications.

And again.

With that tweak,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 4/5] iotests: Make 083 less flaky

2017-11-09 Thread Max Reitz
On 2017-11-09 15:11, Eric Blake wrote:
> On 11/08/2017 07:38 PM, Max Reitz wrote:
>> 083 has (at least) two issues:
> 
> I think I hit one of them intermittently yesterday; thanks for
> diagnosing these (and like you say, there may be more lurking, but we'll
> whack them separately if we can reproduce and identify them).
> 
>>
>> 1. By launching the nbd-fault-injector in background, it may not be
>>scheduled until the first grep on its output file is executed.
>>However, until then, that file may not have been created yet -- so it
>>either does not exist yet (thus making the grep emit an error), or it
>>does exist but contains stale data (thus making the rest of the test
>>case connect to a wrong address).
>>Fix this by explicitly overwriting the output file before executing
>>nbd-fault-injector.
>>
>> 2. The nbd-fault-injector prints things other than "Listening on...".
>>It also prints a "Closing connection" message from time to time.  We
>>currently invoke sed on the whole file in the hope of it only
>>containing the "Listening on..." line yet.  That hope is sometimes
>>shattered by the brutal reality of race conditions, so invoke grep
>>before sed.
> 
> Invoking 'grep | sed' is almost always a waste of a process; sed can do
> the job alone.
> 
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  tests/qemu-iotests/083 | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
>> index 0306f112da..2f6444eeb9 100755
>> --- a/tests/qemu-iotests/083
>> +++ b/tests/qemu-iotests/083
>> @@ -86,6 +86,7 @@ EOF
>>  
>>  rm -f "$TEST_DIR/nbd.sock"
>>  
>> +echo > "$TEST_DIR/nbd-fault-injector.out"
> 
> This makes the file contain a blank line.  Would it be any better as a
> truly empty file, as in:
> 
> : > "$TEST_DIR/nbd-fault-injector.out"

Although ":>" looks funny, I guess I'd rather stay with the echo...
Yes, it may look stupid to you, but I would have to look up what exactly
that will do, whereas the echo is clear.

And in the end, both work equally fine.

>>  $PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &
>>  
>>  # Wait for server to be ready
>> @@ -94,7 +95,7 @@ EOF
>>  done
>>  
>>  # Extract the final address (port number has now been assigned in tcp case)
>> -nbd_addr=$(sed 's/Listening on \(.*\)$/\1/' "$TEST_DIR/nbd-fault-injector.out")
>> +nbd_addr=$(grep 'Listening on ' "$TEST_DIR/nbd-fault-injector.out" | sed 's/Listening on \(.*\)$/\1/')
> 
> Fixing TAB damage while at it - nice.
> 
> Here's how to do it using just sed, and with less typing:
> 
> nbd_addr=$(sed -n 's/^Listening on //p' \
>"$TEST_DIR/nbd-fault-injector.out")

Oh, nice!  Will do, thanks.

Max
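
Both problems come down to reading a background process's log file too early
and too loosely.  As a self-contained illustration (Python, illustrative only
-- the actual test is shell), a robust reader has to tolerate both a missing
file and extra log lines:

import re
import time

def wait_for_listen(path, timeout=10.0):
    # Poll the server log until a 'Listening on ...' line appears.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with open(path) as f:
                for line in f:
                    m = re.match(r'Listening on (.*)$', line)
                    if m:
                        return m.group(1)  # skips 'Closing connection' noise
        except FileNotFoundError:
            pass  # server not scheduled yet; file not created
        time.sleep(0.05)
    raise TimeoutError('server never reported its listening address')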





Re: [Qemu-block] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-11-09 Thread John Snow


On 11/09/2017 09:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> What was the reason to abandon non-root nodes?

Eric had it correct: we were never convinced it would work properly, so
we went with a smaller set.



[Qemu-block] [PATCH v2 5/5] iotests: Make 136 less flaky

2017-11-09 Thread Max Reitz
136 executes some AIO requests without a final aio_flush; then it
advances the virtual clock and thus expects the last access time of the
device to be less than the current time when queried (i.e. idle_time_ns
to be greater than 0).  However, without the aio_flush, some requests
may be settled after the clock_step invocation.  In that case,
idle_time_ns would be 0 and the test fails.

Fix this by adding an aio_flush if any AIO request other than some other
aio_flush has been executed.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/136 | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/136 b/tests/qemu-iotests/136
index 4b994897af..88b97ea7c6 100644
--- a/tests/qemu-iotests/136
+++ b/tests/qemu-iotests/136
@@ -238,6 +238,18 @@ sector = "%d"
 for i in range(failed_wr_ops):
 ops.append("aio_write %d 512" % bad_offset)
 
+# We need an extra aio_flush to settle all outstanding AIO
+# operations before we can advance the virtual clock, so that
+# the last access happens before clock_step and idle_time_ns
+# will be greater than 0
+extra_flush = 0
+if rd_ops + wr_ops + invalid_rd_ops + invalid_wr_ops + \
+failed_rd_ops + failed_wr_ops > 0:
+extra_flush = 1
+
+if extra_flush > 0:
+ops.append("aio_flush")
+
 if failed_wr_ops > 0:
 highest_offset = max(highest_offset, bad_offset + 512)
 
@@ -251,7 +263,7 @@ sector = "%d"
 self.total_wr_bytes += wr_ops * wr_size
 self.total_wr_ops += wr_ops
 self.total_wr_merged += wr_merged
-self.total_flush_ops += flush_ops
+self.total_flush_ops += flush_ops + extra_flush
 self.invalid_rd_ops += invalid_rd_ops
 self.invalid_wr_ops += invalid_wr_ops
 self.failed_rd_ops += failed_rd_ops
-- 
2.13.6
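
The ordering that the patch enforces fits in a few lines (a sketch assuming
the iotests VM helpers hmp_qemu_io() and qtest(); details may differ):

# Settle all outstanding AIO first, only then advance the virtual clock, so
# the device's last access provably precedes the query and idle_time_ns > 0.
def settle_and_query(vm, step_ns):
    vm.hmp_qemu_io('drive0', 'aio_flush')  # wait for pending aio_read/aio_write
    vm.qtest('clock_step %d' % step_ns)    # advance QEMU_CLOCK_VIRTUAL
    return vm.qmp('query-blockstats')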




Re: [Qemu-block] [PATCH v2 4/5] iotests: Make 083 less flaky

2017-11-09 Thread Eric Blake
On 11/09/2017 02:30 PM, Max Reitz wrote:
> 083 has (at least) two issues:
> 
> 1. By launching the nbd-fault-injector in background, it may not be
>scheduled until the first grep on its output file is executed.
>However, until then, that file may not have been created yet -- so it
>either does not exist yet (thus making the grep emit an error), or it
>does exist but contains stale data (thus making the rest of the test
>case connect to a wrong address).
>Fix this by explicitly overwriting the output file before executing
>nbd-fault-injector.
> 
> 2. The nbd-fault-injector prints things other than "Listening on...".
>It also prints a "Closing connection" message from time to time.  We
>currently invoke sed on the whole file in the hope of it only
>containing the "Listening on..." line yet.  That hope is sometimes
>shattered by the brutal reality of race conditions, so invoke grep
>before sed.

Comment is now stale; s/invoke grep before sed/make the sed script more
robust/

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-block] [PATCH v2 3/5] iotests: Make 055 less flaky

2017-11-09 Thread Max Reitz
First of all, test 055 does a valiant job of invoking pause_drive()
sometimes, but that is worth nothing without blkdebug.  So the first
thing to do is to sprinkle a couple of "blkdebug::" in there -- with the
exception of the transaction tests, because the blkdebug break points
make the transaction QMP command hang (which is bad).  In that case, we
can get away with throttling the block job so that it is effectively
paused.

Then, 055 usually does not pause the drive before starting a block job
that should be cancelled.  This means that the backup job might be
completed already before block-job-cancel is invoked; thus making the
test either fail (currently) or moot if cancel_and_wait() ignored this
condition.  Fix this by pausing the drive before starting the job.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/055 | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index e1206caf9b..8a5d9fd269 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -48,7 +48,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 def setUp(self):
qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, str(image_len))
 
-self.vm = iotests.VM().add_drive(test_img)
+self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
 self.vm.add_drive(blockdev_target_img, interface="none")
 if iotests.qemu_default_machine == 'pc':
 self.vm.add_drive(None, 'media=cdrom', 'ide')
@@ -65,10 +65,11 @@ class TestSingleDrive(iotests.QMPTestCase):
 def do_test_cancel(self, cmd, target):
 self.assert_no_active_block_jobs()
 
+self.vm.pause_drive('drive0')
 result = self.vm.qmp(cmd, device='drive0', target=target, sync='full')
 self.assert_qmp(result, 'return', {})
 
-event = self.cancel_and_wait()
+event = self.cancel_and_wait(resume=True)
 self.assert_qmp(event, 'data/type', 'backup')
 
 def test_cancel_drive_backup(self):
@@ -166,7 +167,7 @@ class TestSetSpeed(iotests.QMPTestCase):
 def setUp(self):
qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, str(image_len))
 
-self.vm = iotests.VM().add_drive(test_img)
+self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
 self.vm.add_drive(blockdev_target_img, interface="none")
 self.vm.launch()
 
@@ -246,6 +247,8 @@ class TestSetSpeed(iotests.QMPTestCase):
 def test_set_speed_invalid_blockdev_backup(self):
 self.do_test_set_speed_invalid('blockdev-backup',  'drive1')
 
+# Note: We cannot use pause_drive() here, or the transaction command
+#   would stall.  Instead, we limit the block job speed here.
 class TestSingleTransaction(iotests.QMPTestCase):
 def setUp(self):
qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, str(image_len))
@@ -271,7 +274,8 @@ class TestSingleTransaction(iotests.QMPTestCase):
 'type': cmd,
 'data': { 'device': 'drive0',
   'target': target,
-  'sync': 'full' },
+  'sync': 'full',
+  'speed': 64 * 1024 },
 }
 ])
 
@@ -289,12 +293,12 @@ class TestSingleTransaction(iotests.QMPTestCase):
 def do_test_pause(self, cmd, target, image):
 self.assert_no_active_block_jobs()
 
-self.vm.pause_drive('drive0')
 result = self.vm.qmp('transaction', actions=[{
 'type': cmd,
 'data': { 'device': 'drive0',
   'target': target,
-  'sync': 'full' },
+  'sync': 'full',
+  'speed': 64 * 1024 },
 }
 ])
 self.assert_qmp(result, 'return', {})
@@ -302,7 +306,9 @@ class TestSingleTransaction(iotests.QMPTestCase):
 result = self.vm.qmp('block-job-pause', device='drive0')
 self.assert_qmp(result, 'return', {})
 
-self.vm.resume_drive('drive0')
+result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
+self.assert_qmp(result, 'return', {})
+
 self.pause_job('drive0')
 
 result = self.vm.qmp('query-block-jobs')
@@ -461,7 +467,7 @@ class TestDriveCompression(iotests.QMPTestCase):
 pass
 
 def do_prepare_drives(self, fmt, args, attach_target):
-self.vm = iotests.VM().add_drive(test_img)
+self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
 
 qemu_img('create', '-f', fmt, blockdev_target_img,
  str(TestDriveCompression.image_len), *args)
@@ -500,10 +506,11 @@ class TestDriveCompression(iotests.QMPTestCase):
 
 self.assert_no_active_block_jobs()
 
+self.vm.pause_drive('drive0')
 result = 
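
A condensed sketch of the throttle trick used for the transaction cases
(hypothetical helpers wrapping the QMP calls shown in the diff above):

def start_throttled_backup(vm, cmd, target):
    # A very low speed keeps the job from finishing before the test pokes it.
    return vm.qmp('transaction', actions=[{
        'type': cmd,
        'data': {'device': 'drive0', 'target': target,
                 'sync': 'full', 'speed': 64 * 1024},
    }])

def release_throttle(vm):
    # speed=0 means unlimited; the job may now run to completion.
    return vm.qmp('block-job-set-speed', device='drive0', speed=0)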

[Qemu-block] [PATCH v2 1/5] iotests: Make 030 less flaky

2017-11-09 Thread Max Reitz
This patch fixes two race conditions in 030:

1. The first is in TestENOSPC.test_enospc().  After resuming the job,
   querying it to confirm it is no longer paused may fail because in the
   meantime it might have completed already.  The same was fixed in
   TestEIO.test_ignore() already (in commit
   2c3b44da07d341557a8203cc509ea07fe3605e11).

2. The second is in TestSetSpeed.test_set_speed_invalid(): Here, a
   stream job is started on a drive without any break points, with a
   block-job-set-speed invoked subsequently.  However, without any break
   points, the job might have completed in the meantime (on tmpfs at
   least); or it might complete before cancel_and_wait() which expects
   the job to still exist.  This can be fixed like everywhere else by
   pausing the drive (installing break points) before starting the job
   and letting cancel_and_wait() resume it.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/030 | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 18838948fa..457984b8e9 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -666,6 +666,7 @@ class TestENOSPC(TestErrors):
 if event['event'] == 'BLOCK_JOB_ERROR':
 self.assert_qmp(event, 'data/device', 'drive0')
 self.assert_qmp(event, 'data/operation', 'read')
+error = True
 
 result = self.vm.qmp('query-block-jobs')
 self.assert_qmp(result, 'return[0]/paused', True)
@@ -676,9 +677,11 @@ class TestENOSPC(TestErrors):
 self.assert_qmp(result, 'return', {})
 
 result = self.vm.qmp('query-block-jobs')
+if result == {'return': []}:
+# Race; likely already finished. Check.
+continue
 self.assert_qmp(result, 'return[0]/paused', False)
 self.assert_qmp(result, 'return[0]/io-status', 'ok')
-error = True
 elif event['event'] == 'BLOCK_JOB_COMPLETED':
 self.assertTrue(error, 'job completed unexpectedly')
 self.assert_qmp(event, 'data/type', 'stream')
@@ -792,13 +795,14 @@ class TestSetSpeed(iotests.QMPTestCase):
 
 self.assert_no_active_block_jobs()
 
+self.vm.pause_drive('drive0')
 result = self.vm.qmp('block-stream', device='drive0')
 self.assert_qmp(result, 'return', {})
 
 result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
 self.assert_qmp(result, 'error/class', 'GenericError')
 
-self.cancel_and_wait()
+self.cancel_and_wait(resume=True)
 
 if __name__ == '__main__':
 iotests.main(supported_fmts=['qcow2', 'qed'])
-- 
2.13.6
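
The pattern added to TestENOSPC is worth spelling out, since the same race
exists anywhere a job is queried right after being resumed (a sketch with a
hypothetical helper name):

def query_job_after_resume(vm):
    # After block-job-resume, the job may finish before we can query it;
    # treat an empty job list as "already completed", not as an error.
    jobs = vm.qmp('query-block-jobs').get('return', [])
    if not jobs:
        return None  # caller should wait for BLOCK_JOB_COMPLETED instead
    assert jobs[0]['paused'] is False
    assert jobs[0]['io-status'] == 'ok'
    return jobs[0]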




[Qemu-block] [PATCH v2 0/5] iotests: Make some tests less flaky

2017-11-09 Thread Max Reitz
There are a couple of tests that fail (on my machine) from time to
time (and by that I mean that recently I've rarely ever had a test run
with both 083 and 136 working on first try).
This series should fix most (at least the issues I am aware of).

Notes:
- 083 might have another issue, but if so it occurs extremely rarely and
  so I was unable to debug it.

- 129 is flaky, too, because it tries to use block jobs with BB
  throttling -- however, block jobs ignore that these days.  Making it
  use a throttle filter node will require quite a bit of work.  See
  http://lists.nongnu.org/archive/html/qemu-block/2017-11/msg00111.html
  for more.

- 194 sometimes hangs because the source VM fails to drain its drive.
  This is probably not an issue with the test, but actually an issue in
  qemu.  See
  http://lists.nongnu.org/archive/html/qemu-block/2017-11/msg00256.html
  for more.


v2:
 - Spelling fixes [Eric]
 - Contract grep | sed to a single sed in 083 [Eric]


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[----] [--] 'iotests: Make 030 less flaky'
002/5:[----] [--] 'iotests: Add missing 'blkdebug::' in 040'
003/5:[----] [--] 'iotests: Make 055 less flaky'
004/5:[0003] [FC] 'iotests: Make 083 less flaky'
005/5:[----] [--] 'iotests: Make 136 less flaky'


Max Reitz (5):
  iotests: Make 030 less flaky
  iotests: Add missing 'blkdebug::' in 040
  iotests: Make 055 less flaky
  iotests: Make 083 less flaky
  iotests: Make 136 less flaky

 tests/qemu-iotests/030 |  8 ++--
 tests/qemu-iotests/040 |  2 +-
 tests/qemu-iotests/055 | 25 -
 tests/qemu-iotests/083 |  4 +++-
 tests/qemu-iotests/136 | 14 +-
 5 files changed, 39 insertions(+), 14 deletions(-)

-- 
2.13.6




[Qemu-block] [PATCH v2 4/5] iotests: Make 083 less flaky

2017-11-09 Thread Max Reitz
083 has (at least) two issues:

1. By launching the nbd-fault-injector in background, it may not be
   scheduled until the first grep on its output file is executed.
   However, until then, that file may not have been created yet -- so it
   either does not exist yet (thus making the grep emit an error), or it
   does exist but contains stale data (thus making the rest of the test
   case connect to a wrong address).
   Fix this by explicitly overwriting the output file before executing
   nbd-fault-injector.

2. The nbd-fault-injector prints things other than "Listening on...".
   It also prints a "Closing connection" message from time to time.  We
   currently invoke sed on the whole file in the hope of it only
   containing the "Listening on..." line yet.  That hope is sometimes
   shattered by the brutal reality of race conditions, so invoke grep
   before sed.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/083 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
index 0306f112da..3c1adbf0fb 100755
--- a/tests/qemu-iotests/083
+++ b/tests/qemu-iotests/083
@@ -86,6 +86,7 @@ EOF
 
rm -f "$TEST_DIR/nbd.sock"
 
+echo > "$TEST_DIR/nbd-fault-injector.out"
$PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &
 
# Wait for server to be ready
@@ -94,7 +95,8 @@ EOF
done
 
# Extract the final address (port number has now been assigned in tcp case)
-   nbd_addr=$(sed 's/Listening on \(.*\)$/\1/' "$TEST_DIR/nbd-fault-injector.out")
+nbd_addr=$(sed -n 's/^Listening on //p' \
+   "$TEST_DIR/nbd-fault-injector.out")
 
if [ "$proto" = "tcp" ]; then
nbd_url="nbd+tcp://$nbd_addr/$export_name"
-- 
2.13.6




Re: [Qemu-block] [PATCH v2 4/5] iotests: Make 083 less flaky

2017-11-09 Thread Max Reitz
On 2017-11-09 21:58, Eric Blake wrote:
> On 11/09/2017 02:30 PM, Max Reitz wrote:
>> 083 has (at least) two issues:
>>
>> 1. By launching the nbd-fault-injector in background, it may not be
>>scheduled until the first grep on its output file is executed.
>>However, until then, that file may not have been created yet -- so it
>>either does not exist yet (thus making the grep emit an error), or it
>>does exist but contains stale data (thus making the rest of the test
>>case connect to a wrong address).
>>Fix this by explicitly overwriting the output file before executing
>>nbd-fault-injector.
>>
>> 2. The nbd-fault-injector prints things other than "Listening on...".
>>It also prints a "Closing connection" message from time to time.  We
>>currently invoke sed on the whole file in the hope of it only
>>containing the "Listening on..." line yet.  That hope is sometimes
>>shattered by the brutal reality of race conditions, so invoke grep
>>before sed.
> 
> Comment is now stale; s/invoke grep before sed/make the sed script more
> robust/

*sigh*

It appears my hope of easily fixing patches is often shattered by the
brutal reality, too.

> Reviewed-by: Eric Blake 

Thanks!  I'll fix the sentence when applying the series.

Max





Re: [Qemu-block] [PATCH v2 0/5] iotests: Make some tests less flaky

2017-11-09 Thread Max Reitz
On 2017-11-09 21:30, Max Reitz wrote:
> There are a couple of tests that fail (on my machine) from time to
> time (and by that I mean that recently I've rarely ever had a test run
> with both 083 and 136 working on first try).
> This series should fix most (at least the issues I am aware of).

Fixed the commit message in patch 4 as suggested by Eric and applied to
my block branch:

https://github.com/XanClic/qemu/commits/block

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH for-2.11] iotests: Use new-style NBD connections

2017-11-09 Thread Max Reitz
On 2017-11-09 23:12, Eric Blake wrote:
> Old-style NBD is deprecated upstream (it is documented, but no
> longer implemented in the reference implementation), and it is
> severely limited (it cannot support structured replies, which
> means it cannot support efficient handling of zeroes), when
> compared to new-style NBD.  We are better off having our iotests
> favor new-style everywhere (although some explicit tests,
> particularly 83, still cover old-style for back-compat reasons);
> this is as simple as supplying the empty string as the default
> export name, as it does not change the URI needed to connect a
> client to the server.  This also gives us more coverage of the
> just-added structured reply code, when not overriding $QEMU_NBD
> to intentionally point to an older server.
> 
> Signed-off-by: Eric Blake 
> ---
> 
> Proposing this for 2.11; it can either go in through the NBD
> tree (although I just sent my 2.11-rc1 pull request) or through
> Max' iotest tree.

random.org said 1, so: Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Max





Re: [Qemu-block] [Qemu-devel] [PATCH v8 01/14] block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()

2017-11-09 Thread John Snow


On 10/30/2017 12:32 PM, Vladimir Sementsov-Ogievskiy wrote:
> Enabling bitmap successor is necessary to enable successors of bitmaps
> being migrated before target vm start.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/dirty-bitmap.h | 1 +
>  block/dirty-bitmap.c | 8 
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 3579a7597c..93d4336505 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -20,6 +20,7 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
>  BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
> BdrvDirtyBitmap *bitmap,
> Error **errp);
> +void bdrv_dirty_bitmap_enable_successor(BdrvDirtyBitmap *bitmap);
>  BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
>  const char *name);
>  void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index bd04e991b1..81adbeb6d4 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -237,6 +237,14 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
>  return 0;
>  }
>  
> +/* Called with BQL taken. */
> +void bdrv_dirty_bitmap_enable_successor(BdrvDirtyBitmap *bitmap)
> +{
> +qemu_mutex_lock(bitmap->mutex);
> +bdrv_enable_dirty_bitmap(bitmap->successor);
> +qemu_mutex_unlock(bitmap->mutex);
> +}
> +
>  /**
>   * For a bitmap with a successor, yield our name to the successor,
>   * delete the old bitmap, and return a handle to the new bitmap.
> 

Mechanically correct; though I haven't looked forward to how it's used yet.

Reviewed-by: John Snow 



Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

2017-11-09 Thread Fam Zheng
On Thu, 11/09 17:26, Alberto Garcia wrote:
> On Wed 08 Nov 2017 03:45:38 PM CET, Alberto Garcia wrote:
> 
> > I'm thinking that perhaps we should add the pause point directly to
> > block_job_defer_to_main_loop(), to prevent any block job from running
> > the exit function when it's paused.
> 
> I was trying this and unfortunately this breaks the mirror job at least
> (reproduced with a simple block-commit on the topmost node, which uses
> commit_active_start() -> mirror_start_job()).
> 
> So what happens is that mirror_run() always calls bdrv_drained_begin()
> before returning, pausing the block job. The closing bdrv_drained_end()
> call is at the end of mirror_exit(), already in the main loop.
> 
> So the mirror job is always calling block_job_defer_to_main_loop() and
> mirror_exit() when the job is paused.
> 

FWIW, I think Max's report on 194 failures is related:

https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01822.html

so perhaps it's worth testing his patch too:

https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01835.html

Fam



Re: [Qemu-block] [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)

2017-11-09 Thread Fam Zheng
On Thu, 11/09 20:31, Max Reitz wrote:
> On 2017-11-09 16:30, Fam Zheng wrote:
> > On Thu, 11/09 16:14, Max Reitz wrote:
> >> On 2017-11-09 05:21, Fam Zheng wrote:
> >>> On Thu, 11/09 01:48, Max Reitz wrote:
>  Hi,
> 
>  More exciting news from the bdrv_drain() front!
> 
>  I've noticed in the past that iotest 194 sometimes hangs.  I usually run
>  the tests on tmpfs, but I've just now verified that it happens on my SSD
>  just as well.
> 
>  So the reproducer is a plain:
> 
>  while ./check -raw 194; do; done
> >>>
> >>> I cannot reproduce it on my machine.
> >>
> >> Hm, too bad.  I see it both on my work laptop (with Fedora) and my
> >> desktop (with Arch)...
> >>
>  (No difference between raw or qcow2, though.)
> 
>  And then, after a couple of runs (or a couple ten), it will just hang.
>  The reason is that the source VM lingers around and doesn't quit
>  voluntarily -- the test itself was successful, but it just can't exit.
> 
>  If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
>  this is the backtrace:
> 
>  #0  0x7f7cfc297e06 in ppoll () at /lib64/libc.so.6
>  #1  0x563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> >>>
> >>> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false);
> >>> branches of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> >>> progress and causing the return value to be true in aio_poll().
> >>>
>  #2  0x563b846bcac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
>  #3  0x563b846be711 in aio_poll (ctx=ctx@entry=0x563b856e3e80, blocking=<optimized out>) at util/aio-posix.c:629
>  #4  0x563b8463afa4 in bdrv_drain_recurse (bs=bs@entry=0x563b865568a0, begin=begin@entry=true) at block/io.c:201
>  #5  0x563b8463baff in bdrv_drain_all_begin () at block/io.c:381
>  #6  0x563b8463bc99 in bdrv_drain_all () at block/io.c:411
>  #7  0x563b8459888b in block_migration_cleanup (opaque=<optimized out>) at migration/block.c:714
>  #8  0x563b845883be in qemu_savevm_state_cleanup () at migration/savevm.c:1251
>  #9  0x563b845811fd in migration_thread (opaque=0x563b856f1da0) at migration/migration.c:2298
>  #10 0x7f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
>  #11 0x7f7cfc2a3e1f in clone () at /lib64/libc.so.6
> 
> 
>  And when you make bdrv_drain_all_begin() print what we are trying to
>  drain, you can see that it's the format node (managed by the "raw"
>  driver in this case).
> >>>
> >>> So what is the value of bs->in_flight?
> >>
> >> gdb:
> >>> (gdb) print bs->in_flight 
> >>> $3 = 2307492233
> >>
> >> "That's weird, why would it..."
> >>
> >>> (gdb) print *bs
> >>> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg = 
> >>> false, probed = false, force_share = 96, implicit = 159, drv = 0x0, 
> >>> opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first 
> >>> = 0x8989898989898989}, 
> >>>   walking_aio_notifiers = 137, filename = '\211' , 
> >>> backing_file = '\211' , backing_format = '\211' 
> >>> , full_open_options = 0x8989898989898989, 
> >>>   exact_filename = '\211' , backing = 
> >>> 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment = 
> >>> 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233, 
> >>> max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment = 
> >>> 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233, 
> >>> min_mem_alignment = 9910603678816504201, opt_mem_alignment = 
> >>> 9910603678816504201, max_iov = -1987475063}, 
> >>>   supported_write_flags = 2307492233, supported_zero_flags = 2307492233, 
> >>> node_name = '\211' , node_list = {tqe_next = 
> >>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next = 
> >>> 0x8989898989898989, 
> >>> tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next = 
> >>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063, 
> >>> op_blockers = {{lh_first = 0x8989898989898989} }, job = 
> >>> 0x8989898989898989, 
> >>>   inherits_from = 0x8989898989898989, children = {lh_first = 
> >>> 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options = 
> >>> 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes 
> >>> = 2307492233, 
> >>>   backing_blocker = 0x8989898989898989, total_sectors = 
> >>> -8536140394893047415, before_write_notifiers = {notifiers = {lh_first = 
> >>> 0x8989898989898989}}, write_threshold_offset = 9910603678816504201, 
> >>> write_threshold_notifier = {notify = 
> >>> 0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev = 
> >>> 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock = 
> >>> -1987475063, __count = 2307492233, __owner = 

Re: [Qemu-block] [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-09 Thread Fam Zheng
On Thu, 11/09 21:43, Max Reitz wrote:
> Draining a BDS may lead to graph modifications, which in turn may result
> in it and other BDS being stripped of their current references.  If
> bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
> references themselves, the BDS they are trying to drain (or undrain) may
> disappear right under their feet -- or, more specifically, under the
> feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
> 
> This fixes an occasional hang of iotest 194.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/io.c | 47 ---
>  1 file changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 3d5ef2cabe..a0a2833e8e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
>  bool waited = true;
>  BlockDriverState *bs;
>  BdrvNextIterator it;
> -GSList *aio_ctxs = NULL, *ctx;
> +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
> +
> +/* Must be called from the main loop */
> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>  
>  block_job_pause_all();
>  
> @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
>  if (!g_slist_find(aio_ctxs, aio_context)) {
>  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
>  }
> +
> +/* Keep a strong reference to all root BDS and copy them into
> + * an own list because draining them may lead to graph
> + * modifications. */
> +bdrv_ref(bs);
> +bs_list = g_slist_prepend(bs_list, bs);
>  }
>  
>  /* Note that completion of an asynchronous I/O operation can trigger any
> @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
>  AioContext *aio_context = ctx->data;
>  
>  aio_context_acquire(aio_context);
> -for (bs = bdrv_first(); bs; bs = bdrv_next()) {
> +for (bs_list_entry = bs_list; bs_list_entry;
> + bs_list_entry = bs_list_entry->next)
> +{
> +bs = bs_list_entry->data;
> +
>  if (aio_context == bdrv_get_aio_context(bs)) {
>  waited |= bdrv_drain_recurse(bs, true);
>  }
> @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
>  }
>  }
>  
> +for (bs_list_entry = bs_list; bs_list_entry;
> + bs_list_entry = bs_list_entry->next)
> +{
> +bdrv_unref(bs_list_entry->data);
> +}
> +
>  g_slist_free(aio_ctxs);
> +g_slist_free(bs_list);
>  }
>  
>  void bdrv_drain_all_end(void)
>  {
>  BlockDriverState *bs;
>  BdrvNextIterator it;
> +GSList *bs_list = NULL, *bs_list_entry;
> +
> +/* Must be called from the main loop */
> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>  
> +/* Keep a strong reference to all root BDS and copy them into an
> + * own list because draining them may lead to graph modifications.
> + */
>  for (bs = bdrv_first(); bs; bs = bdrv_next()) {
> -AioContext *aio_context = bdrv_get_aio_context(bs);
> +bdrv_ref(bs);
> +bs_list = g_slist_prepend(bs_list, bs);
> +}
> +
> +for (bs_list_entry = bs_list; bs_list_entry;
> + bs_list_entry = bs_list_entry->next)
> +{
> +AioContext *aio_context;
> +
> +bs = bs_list_entry->data;
> +aio_context = bdrv_get_aio_context(bs);
>  
>  aio_context_acquire(aio_context);
>  aio_enable_external(aio_context);
>  bdrv_parent_drained_end(bs);
>  bdrv_drain_recurse(bs, false);
>  aio_context_release(aio_context);
> +
> +bdrv_unref(bs);
>  }
>  
> +g_slist_free(bs_list);
> +
>  block_job_resume_all();
>  }
>  
> -- 
> 2.13.6
> 
> 

Is it better to put the references into BdrvNextIterator and introduce
bdrv_next_iterator_destroy() to free them? You'll need to touch all callers
because it is not C++, but it secures all of the rest, which seems vulnerable
to the same pattern, for example the aio_poll() in iothread_stop_all().

Fam
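
Stripped of the AioContext plumbing, the essence of the fix is "snapshot and
pin before iterating" -- here as a toy Python analogy (illustrative only; the
real code uses bdrv_ref()/bdrv_unref() and a GSList):

class Node:
    # Toy stand-in for a BlockDriverState.
    def __init__(self, name):
        self.name = name
        self.refcnt = 1

    def drain(self, graph):
        pass  # in the real code, draining may detach nodes from the graph

def drain_all(graph):
    pinned = list(graph)     # snapshot: later graph changes cannot skip entries
    for bs in pinned:
        bs.refcnt += 1       # analogous to bdrv_ref(): keep bs alive
    try:
        for bs in pinned:
            bs.drain(graph)  # safe even if this mutates 'graph'
    finally:
        for bs in pinned:
            bs.refcnt -= 1   # analogous to bdrv_unref()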



Re: [Qemu-block] [Qemu-devel] [PATCH 5/5] iotests: Make 136 less flaky

2017-11-09 Thread Eric Blake
On 11/08/2017 07:38 PM, Max Reitz wrote:
> 136 executes some AIO requests without a final aio_flush; then it
> advances the virtual clock and thus expects the last access time of the
> device to be less than the current time when queried (i.e. idle_time_ns
> to be greater than 0).  However, without the aio_flush, some requests
> may be settled after the clock_step invocation.  In that case,
> idle_time_ns would be 0 and the test fails.
> 
> Fix this by adding an aio_flush if any AIO request but some other

s/but/other than/

> aio_flush has been executed.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/136 | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v2 6/7] nbd-client: Stricter enforcing of structured reply spec

2017-11-09 Thread Eric Blake
On 11/09/2017 03:37 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.11.2017 00:57, Eric Blake wrote:
>> Ensure that the server is not sending unexpected chunk lengths
>> for either the NONE or the OFFSET_DATA chunk, nor unexpected
>> hole length for OFFSET_HOLE.  This will flag any server that
>> responds to a zero-length read with an OFFSET_DATA as broken,
> 
> or OFFSET_HOLE

True, even though our implementation was not doing that.  I can tweak
the commit message.

> 
>> even though we previously fixed our client to never be able to
>> send such a request over the wire.
>>
>> Signed-off-by: Eric Blake 
>> ---
>>   block/nbd-client.c | 12 +---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>

>> @@ -281,7 +281,8 @@ static int
>> nbd_co_receive_offset_data_payload(NBDClientSession *s,
>>
>>   assert(nbd_reply_is_structured(>reply));
>>
>> -    if (chunk->length < sizeof(offset)) {
>> +    /* The NBD spec requires at least one byte of payload */
>> +    if (chunk->length <= sizeof(offset)) {
>>   error_setg(errp, "Protocol error: invalid payload for "
>>    "NBD_REPLY_TYPE_OFFSET_DATA");
>>   return -EINVAL;
>> @@ -293,7 +294,7 @@ static int
>> nbd_co_receive_offset_data_payload(NBDClientSession *s,
>>   be64_to_cpus();
>>
>>   data_size = chunk->length - sizeof(offset);
>> -    if (offset < orig_offset || data_size > qiov->size ||
>> +    if (!data_size || offset < orig_offset || data_size > qiov->size ||
> 
> !data_size - always false here (because of the previous check), and even
> if it could be zero, it doesn't correspond to the error message.
> 
> without this, or with an assert instead:

I like the assert instead.

> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
> 

Thanks for the reviews; I'll queue this series onto my NBD tree with
your suggestions incorporated, and send a pull request for 2.11 shortly.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
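
For anyone following the protocol details, the agreed-upon check looks like
this in a hedged Python sketch (a toy parser, not QEMU's C code): an
OFFSET_DATA chunk must carry its 8-byte offset plus at least one byte of
data, so by the time the bounds are checked the data size is provably
non-zero -- hence an assert rather than a second error path.

import struct

def parse_offset_data(payload, req_offset, req_len):
    # The NBD spec requires at least one byte of data after the offset.
    if len(payload) <= 8:
        raise ValueError('invalid payload for NBD_REPLY_TYPE_OFFSET_DATA')
    (offset,) = struct.unpack('>Q', payload[:8])
    data = payload[8:]
    assert data  # guaranteed non-empty by the length check above
    if offset < req_offset or offset + len(data) > req_offset + req_len:
        raise ValueError('server sent chunk exceeding requested region')
    return offset, data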



signature.asc
Description: OpenPGP digital signature