[PATCH] docs: Move bootindex.txt into system section and rstify

2021-07-27 Thread Peter Maydell
Move bootindex.txt into the system section of the manual and turn it
into rST format.  To make the document make more sense in the context
of the system manual, expand the title and introductory paragraphs to
give more context.

Signed-off-by: Peter Maydell 
---
 docs/bootindex.txt| 52 ---
 docs/system/bootindex.rst | 76 +++
 docs/system/index.rst |  1 +
 3 files changed, 77 insertions(+), 52 deletions(-)
 delete mode 100644 docs/bootindex.txt
 create mode 100644 docs/system/bootindex.rst

diff --git a/docs/bootindex.txt b/docs/bootindex.txt
deleted file mode 100644
index 6937862ba0d..000
--- a/docs/bootindex.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-= Bootindex property =
-
-Block and net devices have bootindex property. This property is used to
-determine the order in which firmware will consider devices for booting
-the guest OS. If the bootindex property is not set for a device, it gets
-lowest boot priority. There is no particular order in which devices with
-unset bootindex property will be considered for booting, but they will
-still be bootable.
-
-== Example ==
-
-Let's assume we have a QEMU machine with two NICs (virtio, e1000) and two
-disks (IDE, virtio):
-
-qemu -drive file=disk1.img,if=none,id=disk1
- -device ide-hd,drive=disk1,bootindex=4
- -drive file=disk2.img,if=none,id=disk2
- -device virtio-blk-pci,drive=disk2,bootindex=3
- -netdev type=user,id=net0 -device virtio-net-pci,netdev=net0,bootindex=2
- -netdev type=user,id=net1 -device e1000,netdev=net1,bootindex=1
-
-Given the command above, firmware should try to boot from the e1000 NIC
-first.  If this fails, it should try the virtio NIC next; if this fails
-too, it should try the virtio disk, and then the IDE disk.
-
-== Limitations ==
-
-1. Some firmware has limitations on which devices can be considered for
-booting.  For instance, the PC BIOS boot specification allows only one
-disk to be bootable.  If boot from disk fails for some reason, the BIOS
-won't retry booting from other disk.  It can still try to boot from
-floppy or net, though.
-
-2. Sometimes, firmware cannot map the device path QEMU wants firmware to
-boot from to a boot method.  It doesn't happen for devices the firmware
-can natively boot from, but if firmware relies on an option ROM for
-booting, and the same option ROM is used for booting from more then one
-device, the firmware may not be able to ask the option ROM to boot from
-a particular device reliably.  For instance with the PC BIOS, if a SCSI HBA
-has three bootable devices target1, target3, target5 connected to it,
-the option ROM will have a boot method for each of them, but it is not
-possible to map from boot method back to a specific target.  This is a
-shortcoming of the PC BIOS boot specification.
-
-== Mixing bootindex and boot order parameters ==
-
-Note that it does not make sense to use the bootindex property together
-with the "-boot order=..." (or "-boot once=...") parameter. The guest
-firmware implementations normally either support the one or the other,
-but not both parameters at the same time. Mixing them will result in
-undefined behavior, and thus the guest firmware will likely not boot
-from the expected devices.
diff --git a/docs/system/bootindex.rst b/docs/system/bootindex.rst
new file mode 100644
index 000..8b057f812f2
--- /dev/null
+++ b/docs/system/bootindex.rst
@@ -0,0 +1,76 @@
+Managing device boot order with bootindex properties
+
+
+QEMU can tell QEMU-aware guest firmware (like the x86 PC BIOS)
+which order it should look for a bootable OS on which devices.
+A simple way to set this order is to use the ``-boot order=`` option,
+but you can also do this more flexibly, by setting a ``bootindex``
+property on the individual block or net devices you specify
+on the QEMU command line.
+
+The ``bootindex`` properties are used to determine the order in which
+firmware will consider devices for booting the guest OS. If the
+``bootindex`` property is not set for a device, it gets the lowest
+boot priority. There is no particular order in which devices with no
+``bootindex`` property set will be considered for booting, but they
+will still be bootable.
+
+Some guest machine types (for instance the s390x machines) do
+not support ``-boot order=``; on those machines you must always
+use ``bootindex`` properties.
+
+There is no way to set a ``bootindex`` property if you are using
+a short-form option like ``-hda`` or ``-cdrom``, so to use
+``bootindex`` properties you will need to expand out those options
+into long-form ``-drive`` and ``-device`` option pairs.
+
+Example
+---
+
+Let's assume we have a QEMU machine with two NICs (virtio, e1000) and two
+disks (IDE, virtio):
+
+.. parsed-literal::
+
+  |qemu_system| -drive file=disk1.img,if=none,id=disk1 \\
+-device ide-hd,drive=disk1,bootindex=4 \\
+-drive

Re: [PATCH RFC 0/3] mirror: rework soft-cancelling READY mirror

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2021 19:47, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

That's an alternative to (part of) Max's
"[PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel"
and shows' my idea of handling soft-cancelling READY mirror case
directly in qmp_block_job_cancel. And cleanup all other job cancelling
functions.

That's untested draft, don't take it to heart :)



Side idea:

Instead of this all we can do the following:

1. Add new interface as alternative to soft-cancelling READY mirror.

It may be argument to job-complete qmp command, or some separate command to 
change job parameters, or just an option for blockdev-mirror command (may be 
our users actually know what they want when they start the job).

2. Deprecate block-job-cancel command (with recommendation to use job-cancel 
and new interface [1] instead)

3. Wait for 3 releases and apply patch 3, improved by dropping block-job-cancel 
at all.


This way, deprecated interface remains buggy until dropped, but that's not bad. 
It's good actually :)




Vladimir Sementsov-Ogievskiy (3):
   job: add job_complete_ex with do_graph_change argument
   job: use complete(do_graph_change=false) to handle soft cancel
   job: drop force argument of *job*cancel

  include/qemu/job.h   | 19 --
  block/backup.c   |  2 +-
  block/mirror.c   | 33 +---
  blockdev.c   | 13 +++--
  job-qmp.c|  2 +-
  job.c| 27 ++
  tests/unit/test-bdrv-drain.c |  2 +-
  tests/unit/test-block-iothread.c |  2 +-
  tests/unit/test-blockjob-txn.c   |  8 
  tests/unit/test-blockjob.c   |  4 ++--
  10 files changed, 62 insertions(+), 50 deletions(-)




--
Best regards,
Vladimir



[PATCH 3/3] job: drop force argument of *job*cancel

2021-07-27 Thread Vladimir Sementsov-Ogievskiy
Now, when soft-cancelling READY mirror handled in
qmp_block_job_cancel(), no other functions need to care of it:
cancel is always force.

So drop unused code paths.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/qemu/job.h | 14 --
 block/backup.c |  2 +-
 block/mirror.c | 13 -
 blockdev.c |  4 ++--
 job-qmp.c  |  2 +-
 job.c  | 18 --
 tests/unit/test-blockjob-txn.c |  8 
 7 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 3dfb79cee6..0e30665fed 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -103,12 +103,6 @@ typedef struct Job {
  */
 bool cancelled;
 
-/**
- * Set to true if the job should abort immediately without waiting
- * for data to be in sync.
- */
-bool force_cancel;
-
 /** Set to true when the job has deferred work to the main loop. */
 bool deferred_to_main_loop;
 
@@ -254,7 +248,7 @@ struct JobDriver {
 /**
  * If the callback is not NULL, it will be invoked in job_cancel_async
  */
-void (*cancel)(Job *job, bool force);
+void (*cancel)(Job *job);
 
 
 /** Called when the job is freed */
@@ -496,16 +490,16 @@ void job_complete(Job *job, Error **errp);
 void job_complete_ex(Job *job, bool do_graph_change, Error **errp);
 
 /**
- * Asynchronously cancel the specified @job. If @force is true, the job should
+ * Asynchronously cancel the specified @job.
  * be cancelled immediately without waiting for a consistent state.
  */
-void job_cancel(Job *job, bool force);
+void job_cancel(Job *job);
 
 /**
  * Cancels the specified job like job_cancel(), but may refuse to do so if the
  * operation isn't meaningful in the current state of the job.
  */
-void job_user_cancel(Job *job, bool force, Error **errp);
+void job_user_cancel(Job *job, Error **errp);
 
 /**
  * Synchronously cancel the @job.  The completion callback is called
diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..6cf2f974aa 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -331,7 +331,7 @@ static void coroutine_fn backup_set_speed(BlockJob *job, 
int64_t speed)
 }
 }
 
-static void backup_cancel(Job *job, bool force)
+static void backup_cancel(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 
diff --git a/block/mirror.c b/block/mirror.c
index ad9736eb5e..06a07baf46 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1095,9 +1095,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
 }
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 job_sleep_ns(&s->common.job, delay_ns);
-if (job_is_cancelled(&s->common.job) &&
-(!s->synced || s->common.job.force_cancel))
-{
+if (job_is_cancelled(&s->common.job)) {
 break;
 }
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1109,8 +1107,7 @@ immediate_exit:
  * or it was cancelled prematurely so that we do not guarantee that
  * the target is a copy of the source.
  */
-assert(ret < 0 || ((s->common.job.force_cancel || !s->synced) &&
-   job_is_cancelled(&s->common.job)));
+assert(ret < 0 || job_is_cancelled(&s->common.job));
 assert(need_drain);
 mirror_wait_for_all_io(s);
 }
@@ -1197,14 +1194,12 @@ static bool mirror_drained_poll(BlockJob *job)
 return !!s->in_flight;
 }
 
-static void mirror_cancel(Job *job, bool force)
+static void mirror_cancel(Job *job)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
 BlockDriverState *target = blk_bs(s->target);
 
-if (force || !job_is_ready(job)) {
-bdrv_cancel_in_flight(target);
-}
+bdrv_cancel_in_flight(target);
 }
 
 static const BlockJobDriver mirror_job_driver = {
diff --git a/blockdev.c b/blockdev.c
index c4ee5f02f4..cc424a451f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -150,7 +150,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 AioContext *aio_context = job->job.aio_context;
 aio_context_acquire(aio_context);
 
-job_cancel(&job->job, false);
+job_cancel(&job->job);
 
 aio_context_release(aio_context);
 }
@@ -3374,7 +3374,7 @@ void qmp_block_job_cancel(const char *device,
  */
 job_complete_ex(&job->job, false, errp);
 } else {
-job_user_cancel(&job->job, force, errp);
+job_user_cancel(&job->job, errp);
 }
 out:
 aio_context_release(aio_context);
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..272837bd1f 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -58,7 +58,7 @@ void qmp_job_cancel(const char *id, Error **errp)
 }
 
 trace_qmp_job_cancel(job);
-job_user_cancel(job, true, errp);
+job_user_cancel(job, errp);
 aio_context_rel

[PATCH 2/3] job: use complete(do_graph_change=false) to handle soft cancel

2021-07-27 Thread Vladimir Sementsov-Ogievskiy
Soft cancel of READY mirror is more like completion than cancelling. We
have bugs and misunderstanding because of this feature. Now, let's
handle it the other way to drop force cancelling at all in the
following commit.

This makes internal implementation cleaner. Still, we should deprecate
and drop old interface (through block-job-cancel) as a separate step.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 blockdev.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index 3d8ac368a1..c4ee5f02f4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3366,7 +3366,16 @@ void qmp_block_job_cancel(const char *device,
 }
 
 trace_qmp_block_job_cancel(job);
-job_user_cancel(&job->job, force, errp);
+if (!force && job_is_ready(&job->job)) {
+/*
+ * Hack to support old mirror soft-cancel. Please add new API to do
+ * complete with disabled graph-change, deprecate soft-cancel and
+ * finally drop this code.
+ */
+job_complete_ex(&job->job, false, errp);
+} else {
+job_user_cancel(&job->job, force, errp);
+}
 out:
 aio_context_release(aio_context);
 }
-- 
2.29.2




[PATCH 1/3] job: add job_complete_ex with do_graph_change argument

2021-07-27 Thread Vladimir Sementsov-Ogievskiy
It's an alternative for soft-cancelling mirror job after READY: mirror
should finish all in-flight requests, but don't change block graph in
any way.

To be used in the next commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/qemu/job.h   |  5 -
 block/mirror.c   | 20 ++--
 job.c|  9 +++--
 tests/unit/test-bdrv-drain.c |  2 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob.c   |  4 ++--
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 41162ed494..3dfb79cee6 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -211,7 +211,7 @@ struct JobDriver {
  * Optional callback for job types whose completion must be triggered
  * manually.
  */
-void (*complete)(Job *job, Error **errp);
+void (*complete)(Job *job, bool do_graph_change, Error **errp);
 
 /**
  * If the callback is not NULL, prepare will be invoked when all the jobs
@@ -492,6 +492,9 @@ void job_transition_to_ready(Job *job);
 /** Asynchronously complete the specified @job. */
 void job_complete(Job *job, Error **errp);
 
+/** Asynchronously complete the specified @job. */
+void job_complete_ex(Job *job, bool do_graph_change, Error **errp);
+
 /**
  * Asynchronously cancel the specified @job. If @force is true, the job should
  * be cancelled immediately without waiting for a consistent state.
diff --git a/block/mirror.c b/block/mirror.c
index 98fc66eabf..ad9736eb5e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -44,6 +44,11 @@ typedef struct MirrorBlockJob {
 BlockDriverState *base;
 BlockDriverState *base_overlay;
 
+/*
+ * Do final graph changes. True at start, may be changed by
+ * mirror_complete().
+ */
+bool do_graph_change;
 /* The name of the graph node to replace */
 char *replaces;
 /* The BDS to replace */
@@ -648,7 +653,7 @@ static int mirror_exit_common(Job *job)
 BlockDriverState *target_bs;
 BlockDriverState *mirror_top_bs;
 Error *local_err = NULL;
-bool abort = job->ret < 0;
+bool do_graph_change = s->do_graph_change && job->ret >= 0;
 int ret = 0;
 
 if (s->prepared) {
@@ -689,7 +694,7 @@ static int mirror_exit_common(Job *job)
 bs_opaque->stop = true;
 bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
  &error_abort);
-if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
+if (do_graph_change && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
 BlockDriverState *backing = s->is_none_mode ? src : s->base;
 BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
 
@@ -701,7 +706,7 @@ static int mirror_exit_common(Job *job)
 ret = -EPERM;
 }
 }
-} else if (!abort && s->backing_mode == MIRROR_OPEN_BACKING_CHAIN) {
+} else if (do_graph_change && s->backing_mode == 
MIRROR_OPEN_BACKING_CHAIN) {
 assert(!bdrv_backing_chain_next(target_bs));
 ret = bdrv_open_backing_file(bdrv_skip_filters(target_bs), NULL,
  "backing", &local_err);
@@ -716,7 +721,7 @@ static int mirror_exit_common(Job *job)
 aio_context_acquire(replace_aio_context);
 }
 
-if (s->should_complete && !abort) {
+if (s->should_complete && do_graph_change) {
 BlockDriverState *to_replace = s->to_replace ?: src;
 bool ro = bdrv_is_read_only(to_replace);
 
@@ -1124,7 +1129,7 @@ immediate_exit:
 return ret;
 }
 
-static void mirror_complete(Job *job, Error **errp)
+static void mirror_complete(Job *job, bool do_graph_change, Error **errp)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common.job);
 
@@ -1134,8 +1139,10 @@ static void mirror_complete(Job *job, Error **errp)
 return;
 }
 
+s->do_graph_change = do_graph_change;
+
 /* block all operations on to_replace bs */
-if (s->replaces) {
+if (s->do_graph_change && s->replaces) {
 AioContext *replace_aio_context;
 
 s->to_replace = bdrv_find_node(s->replaces);
@@ -1737,6 +1744,7 @@ static BlockJob *mirror_start_job(
 blk_set_allow_aio_context_change(s->target, true);
 blk_set_disable_request_queuing(s->target, true);
 
+s->do_graph_change = true;
 s->replaces = g_strdup(replaces);
 s->on_source_error = on_source_error;
 s->on_target_error = on_target_error;
diff --git a/job.c b/job.c
index e7a5d28854..52127dd6bd 100644
--- a/job.c
+++ b/job.c
@@ -987,7 +987,7 @@ int job_complete_sync(Job *job, Error **errp)
 return job_finish_sync(job, job_complete, errp);
 }
 
-void job_complete(Job *job, Error **errp)
+void job_complete_ex(Job *job, bool do_graph_change, Error **errp)
 {
 /* Should not be reachable via external interface for internal jobs */
 assert(job->id);
@@ -1000,7 +1000,12 @@ void job_complete(Job *job, Error **err

[PATCH RFC 0/3] mirror: rework soft-cancelling READY mirror

2021-07-27 Thread Vladimir Sementsov-Ogievskiy
Hi all!

That's an alternative to (part of) Max's
"[PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel"
and shows' my idea of handling soft-cancelling READY mirror case
directly in qmp_block_job_cancel. And cleanup all other job cancelling
functions.

That's untested draft, don't take it to heart :)

Vladimir Sementsov-Ogievskiy (3):
  job: add job_complete_ex with do_graph_change argument
  job: use complete(do_graph_change=false) to handle soft cancel
  job: drop force argument of *job*cancel

 include/qemu/job.h   | 19 --
 block/backup.c   |  2 +-
 block/mirror.c   | 33 +---
 blockdev.c   | 13 +++--
 job-qmp.c|  2 +-
 job.c| 27 ++
 tests/unit/test-bdrv-drain.c |  2 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob-txn.c   |  8 
 tests/unit/test-blockjob.c   |  4 ++--
 10 files changed, 62 insertions(+), 50 deletions(-)

-- 
2.29.2




Re: [PATCH] gitlab-ci.d/buildtest: Disable iotests 197 and 215

2021-07-27 Thread Willian Rampazzo
On Tue, Jul 27, 2021 at 1:26 PM Thomas Huth  wrote:
>
> The iotests 197 and 215 are occasionally failing in the gitlab-CI now.
> According to the log, the failure is "./common.rc: Killed" which might
> be an indication that the process has been killed due to out-of-memory
> reasons. Both tests are doing a big read with 2G that likely causes
> this issue. It used to work fine in the gitlab-CI in the past, but
> either the program is now requiring more free memory, or the the CI
> containers have changed, so that the OOM condition now sometimes occurs.
>
> Anyway, these two tests are not really suitable for CI containers if
> they are doing things like huge reads (which is likely also the reason
> why they haven't been added to the "auto" group in the past), so let's
> simply disable them in the gitlab-CI now, too.
>
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>

Reviewed-by: Willian Rampazzo 




Re: [PATCH] gitlab-ci.d/buildtest: Disable iotests 197 and 215

2021-07-27 Thread Philippe Mathieu-Daudé
On 7/27/21 6:25 PM, Thomas Huth wrote:
> The iotests 197 and 215 are occasionally failing in the gitlab-CI now.
> According to the log, the failure is "./common.rc: Killed" which might
> be an indication that the process has been killed due to out-of-memory
> reasons. Both tests are doing a big read with 2G that likely causes
> this issue. It used to work fine in the gitlab-CI in the past, but
> either the program is now requiring more free memory, or the the CI
> containers have changed, so that the OOM condition now sometimes occurs.
> 
> Anyway, these two tests are not really suitable for CI containers if
> they are doing things like huge reads (which is likely also the reason
> why they haven't been added to the "auto" group in the past), so let's
> simply disable them in the gitlab-CI now, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] gitlab-ci.d/buildtest: Disable iotests 197 and 215

2021-07-27 Thread Daniel P . Berrangé
On Tue, Jul 27, 2021 at 06:25:42PM +0200, Thomas Huth wrote:
> The iotests 197 and 215 are occasionally failing in the gitlab-CI now.
> According to the log, the failure is "./common.rc: Killed" which might
> be an indication that the process has been killed due to out-of-memory
> reasons. Both tests are doing a big read with 2G that likely causes
> this issue. It used to work fine in the gitlab-CI in the past, but
> either the program is now requiring more free memory, or the the CI
> containers have changed, so that the OOM condition now sometimes occurs.
> 
> Anyway, these two tests are not really suitable for CI containers if
> they are doing things like huge reads (which is likely also the reason
> why they haven't been added to the "auto" group in the past), so let's
> simply disable them in the gitlab-CI now, too.
> 
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] gitlab-ci.d/buildtest: Disable iotests 197 and 215

2021-07-27 Thread Thomas Huth
The iotests 197 and 215 are occasionally failing in the gitlab-CI now.
According to the log, the failure is "./common.rc: Killed" which might
be an indication that the process has been killed due to out-of-memory
reasons. Both tests are doing a big read with 2G that likely causes
this issue. It used to work fine in the gitlab-CI in the past, but
either the program is now requiring more free memory, or the the CI
containers have changed, so that the OOM condition now sometimes occurs.

Anyway, these two tests are not really suitable for CI containers if
they are doing things like huge reads (which is likely also the reason
why they haven't been added to the "auto" group in the past), so let's
simply disable them in the gitlab-CI now, too.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index ee0c83b600..63f1903f07 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -305,10 +305,10 @@ build-tcg-disabled:
 - cd tests/qemu-iotests/
 - ./check -raw 001 002 003 004 005 008 009 010 011 012 021 025 032 033 048
 052 063 077 086 101 104 106 113 148 150 151 152 157 159 160 163
-170 171 183 184 192 194 197 208 215 221 222 226 227 236 253 277
+170 171 183 184 192 194 208 221 222 226 227 236 253 277
 - ./check -qcow2 028 051 056 057 058 065 068 082 085 091 095 096 102 122
-124 132 139 142 144 145 151 152 155 157 165 194 196 197 200 202
-208 209 215 216 218 222 227 234 246 247 248 250 254 255 257 258
+124 132 139 142 144 145 151 152 155 157 165 194 196 200 202
+208 209 216 218 222 227 234 246 247 248 250 254 255 257 258
 260 261 262 263 264 270 272 273 277 279
 
 build-user:
-- 
2.27.0




Re: [PATCH] block: Fix in_flight leak in request padding error path

2021-07-27 Thread Max Reitz

On 27.07.21 17:49, Kevin Wolf wrote:

When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight
has been increased, but is never decreased again. This leads to a hang
when trying to drain the block node.

This bug was observed with Windows guests which issue a request that
fully uses IOV_MAX during installation, so that when padding is
necessary (O_DIRECT with a 4k sector size block device on the host),
adding another entry causes failure.

Call bdrv_dec_in_flight() to fix this. There is a larger problem to
solve here because this request shouldn't even fail, but Windows doesn't
seem to care and with this minimal fix the installation succeeds. So
given that we're already in freeze, let's take this minimal fix for 6.1.

Fixes: 98ca45494fcd6bf0336ecd559e440b6de6ea4cd3
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079
Reported-by: Qing Wang 
Signed-off-by: Kevin Wolf 
---
  block/io.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)


Reviewed-by: Max Reitz 




[PATCH] block: Fix in_flight leak in request padding error path

2021-07-27 Thread Kevin Wolf
When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight
has been increased, but is never decreased again. This leads to a hang
when trying to drain the block node.

This bug was observed with Windows guests which issue a request that
fully uses IOV_MAX during installation, so that when padding is
necessary (O_DIRECT with a 4k sector size block device on the host),
adding another entry causes failure.

Call bdrv_dec_in_flight() to fix this. There is a larger problem to
solve here because this request shouldn't even fail, but Windows doesn't
seem to care and with this minimal fix the installation succeeds. So
given that we're already in freeze, let's take this minimal fix for 6.1.

Fixes: 98ca45494fcd6bf0336ecd559e440b6de6ea4cd3
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079
Reported-by: Qing Wang 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index e0a689c584..a19942718b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1841,7 +1841,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
NULL);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 
 tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
@@ -1849,10 +1849,11 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
   bs->bl.request_alignment,
   qiov, qiov_offset, flags);
 tracked_request_end(&req);
-bdrv_dec_in_flight(bs);
-
 bdrv_padding_destroy(&pad);
 
+fail:
+bdrv_dec_in_flight(bs);
+
 return ret;
 }
 
-- 
2.31.1




Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

27.07.2021 18:39, Max Reitz wrote:

On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:

26.07.2021 17:46, Max Reitz wrote:

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any


job_cancel_requested()


jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
  include/qemu/job.h |  8 +++-
  block/mirror.c | 10 --
  job.c  |  7 ++-
  3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
  /** Returns true if the job should not be visible to the management layer. */
  bool job_is_internal(Job *job);
  -/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
  bool job_is_cancelled(Job *job);
  +/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
  /** Returns whether the job is in a completed state. */
  bool job_is_completed(Job *job);
  diff --git a/block/mirror.c b/block/mirror.c
index e93631a9f6..72e02fa34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  /* Transition to the READY state and wait for complete. */
  job_transition_to_ready(&s->common.job);
  s->actively_synced = true;
-    while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
+    while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
  job_yield(&s->common.job);
  }
  s->common.job.cancelled = false;
@@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  }
    should_complete = s->should_complete ||
-    job_is_cancelled(&s->common.job);
+    job_cancel_requested(&s->common.job);
  cnt = bdrv_get_dirty_count(s->dirty_bitmap);
  }
  @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error 
**errp)
  trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
    delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-    if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
+    if (job_is_cancelled(&s->common.job)) {
  break;
  }
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1099,9 +1099,7 @@ immediate_exit:
   * or it was cancelled prematurely so that we do not guarantee that
   * the target is a copy of the source.
   */
-    assert(ret < 0 ||
-   (s->common.job.force_cancel &&
-    job_is_cancelled(&s->common.job)));
+    assert(ret < 0 || job_is_cancelled(&s->common.job));


(As a note, I hope this does the job regarding your suggestions for patch 4. :))


  assert(need_drain);
  mirror_wait_for_all_io(s);
  }
diff --git a/job.c b/job.c
index e78d893a9c..dba17a680f 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
  }
    bool job_is_cancelled(Job *job)
+{
+    return job->cancelled && job->force_cancel;


can job->cancelled be false when job->force_cancel is true ? I think not and 
worth an assertion here. Something like

if (job->force_cancel) {
   assert(job->cancelled);
   return true;
}

return false;


Sounds good, why not.




+}
+
+bool job_cancel_requested(Job *job)
  {
  return job->cancelled;
  }
@@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
  if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
  return;
  }
-    if (job_is_cancelled(job) || !job->driver->complete) {
+    if (job_cancel_requested(job) || !job->driver->complete) {

Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier

2021-07-27 Thread Max Reitz

On 27.07.21 15:13, Vladimir Sementsov-Ogievskiy wrote:

26.07.2021 17:46, Max Reitz wrote:

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
  block/mirror.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)

  mirror_wait_for_any_operation(s, true);
  }
  -    if (s->ret < 0) {
+    if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
  ret = s->ret;
  goto immediate_exit;
  }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)

  break;
  }
  -    ret = 0;
-


That's just a cleanup, that statement is useless pre-patch, yes?


I think it was intended for if we left this loop via the 
job_is_cancelled() condition below.  Since it’s removed, this statement 
seems meaningless, so I removed it along with the `break`.


Max




  if (job_is_ready(&s->common.job) && !should_complete) {
  delay_ns = (s->in_flight == 0 &&
  cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)
  trace_mirror_before_sleep(s, cnt, 
job_is_ready(&s->common.job),

    delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-    if (job_is_cancelled(&s->common.job)) {
-    break;
-    }
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
  }



Reviewed-by: Vladimir Sementsov-Ogievskiy 






Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()

2021-07-27 Thread Max Reitz

On 27.07.21 15:04, Vladimir Sementsov-Ogievskiy wrote:

26.07.2021 17:46, Max Reitz wrote:

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any


job_cancel_requested()


jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
  include/qemu/job.h |  8 +++-
  block/mirror.c | 10 --
  job.c  |  7 ++-
  3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
  /** Returns true if the job should not be visible to the management 
layer. */

  bool job_is_internal(Job *job);
  -/** Returns whether the job is scheduled for cancellation. */
+/** Returns whether the job is being cancelled. */
  bool job_is_cancelled(Job *job);
  +/**
+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
  /** Returns whether the job is in a completed state. */
  bool job_is_completed(Job *job);
  diff --git a/block/mirror.c b/block/mirror.c
index e93631a9f6..72e02fa34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)

  /* Transition to the READY state and wait for complete. */
  job_transition_to_ready(&s->common.job);
  s->actively_synced = true;
-    while (!job_is_cancelled(&s->common.job) && 
!s->should_complete) {
+    while (!job_cancel_requested(&s->common.job) && 
!s->should_complete) {

  job_yield(&s->common.job);
  }
  s->common.job.cancelled = false;
@@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)

  }
    should_complete = s->should_complete ||
-    job_is_cancelled(&s->common.job);
+    job_cancel_requested(&s->common.job);
  cnt = bdrv_get_dirty_count(s->dirty_bitmap);
  }
  @@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, 
Error **errp)
  trace_mirror_before_sleep(s, cnt, 
job_is_ready(&s->common.job),

    delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-    if (job_is_cancelled(&s->common.job) && 
s->common.job.force_cancel) {

+    if (job_is_cancelled(&s->common.job)) {
  break;
  }
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1099,9 +1099,7 @@ immediate_exit:
   * or it was cancelled prematurely so that we do not 
guarantee that

   * the target is a copy of the source.
   */
-    assert(ret < 0 ||
-   (s->common.job.force_cancel &&
-    job_is_cancelled(&s->common.job)));
+    assert(ret < 0 || job_is_cancelled(&s->common.job));


(As a note, I hope this does the job regarding your suggestions for 
patch 4. :))



  assert(need_drain);
  mirror_wait_for_all_io(s);
  }
diff --git a/job.c b/job.c
index e78d893a9c..dba17a680f 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
  }
    bool job_is_cancelled(Job *job)
+{
+    return job->cancelled && job->force_cancel;


can job->cancelled be false when job->force_cancel is true ? I think 
not and worth an assertion here. Something like


if (job->force_cancel) {
   assert(job->cancelled);
   return true;
}

return false;


Sounds good, why not.




+}
+
+bool job_cancel_requested(Job *job)
  {
  return job->cancelled;
  }
@@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
  if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
  return;
  }
-    if (job_is_cancelled(job) || !job->driver->complete) {
+    if (job_cancel_requested(job) || !job->driver->complete) {
  error_se

Re: [PATCH for-6.1? 4/6] job: Add job_cancel_requested()

2021-07-27 Thread Max Reitz

On 27.07.21 16:47, Vladimir Sementsov-Ogievskiy wrote:

26.07.2021 10:09, Max Reitz wrote:



  job->ret = -ECANCELED;
  }
  if (job->ret) {
@@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
    /* Emit events only if we actually started */
  if (job_started(job)) {
-    if (job_is_cancelled(job)) {
+    if (job_cancel_requested(job)) {
  job_event_cancelled(job);


Same question here.. Shouldn't mirror report COMPLETED event in case 
of not-force cancelled in READY state?


Same here, I thought this is user-visible, nothing internal, so I 
should leave it as-is.


Now I see that cancelling mirror post-READY indeed should result in a 
COMPLETED event.  So I’m actually not exactly sure how mirror does 
that, despite this code here



Hmm. Now looking at mirror code, I see that it does 
"s->common.job.cancelled = false"


*lol*, that’s nice.

So since we’ve missed the rc1 boat now, I think this is 6.2 material.  
I’ll look into whether we can drop that then, that would be nice.


Max




Re: [PATCH for-6.1? 4/6] job: Add job_cancel_requested()

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 10:09, Max Reitz wrote:



  job->ret = -ECANCELED;
  }
  if (job->ret) {
@@ -704,7 +709,7 @@ static int job_finalize_single(Job *job)
    /* Emit events only if we actually started */
  if (job_started(job)) {
-    if (job_is_cancelled(job)) {
+    if (job_cancel_requested(job)) {
  job_event_cancelled(job);


Same question here.. Shouldn't mirror report COMPLETED event in case of 
not-force cancelled in READY state?


Same here, I thought this is user-visible, nothing internal, so I should leave 
it as-is.

Now I see that cancelling mirror post-READY indeed should result in a COMPLETED 
event.  So I’m actually not exactly sure how mirror does that, despite this 
code here



Hmm. Now looking at mirror code, I see that it does "s->common.job.cancelled = 
false"

--
Best regards,
Vladimir



Re: [PULL for-6.1 00/11] hw/nvme fixes

2021-07-27 Thread Peter Maydell
On Mon, 26 Jul 2021 at 20:19, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> The following changes since commit 1d6f147f043bece029a795c6eb9d43c1abd909b6:
>
>   Merge remote-tracking branch 'remotes/quic/tags/pull-hex-20210725' into 
> staging (2021-07-26 13:36:51 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request
>
> for you to fetch changes up to 9631a8ab21679e3d605f7f540dd8c692b9593e02:
>
>   tests/qtest/nvme-test: add mmio read test (2021-07-26 21:09:39 +0200)
>
> 
> hw/nvme fixes
>
> * new PMR test (Gollu Appalanaidu)
> * pmr/sgl mapping fix (Padmakar Kalghatgi)
> * hotplug fixes (me)
> * mmio out-of-bound read fix (me)
> * big-endian host fixes (me)


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: Transient fail of iotests 215 and 197

2021-07-27 Thread Thomas Huth

On 21/07/2021 19.22, Daniel P. Berrangé wrote:

Peter caught the following transient fail on the staging tree:

   https://gitlab.com/qemu-project/qemu/-/jobs/1438817749

--- /builds/qemu-project/qemu/tests/qemu-iotests/197.out
+++ 197.out.bad
@@ -12,13 +12,12 @@
  128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  read 0/0 bytes at offset 0
  0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
  read 1024/1024 bytes at offset 3221226496
  1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on 
read-only device
-2 GiB (0x8001) bytes allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff) bytes not allocated at offset 2 GiB (0x8001)
+2 GiB (0x8000) bytes allocated at offset 0 bytes (0x0)
+1 GiB (0x4000) bytes not allocated at offset 2 GiB (0x8000)
  64 KiB (0x1) bytes allocated at offset 3 GiB (0xc000)
  1023.938 MiB (0x3fff) bytes not allocated at offset 3 GiB (0xc001)
  No errors were found on the image.


--- /builds/qemu-project/qemu/tests/qemu-iotests/215.out
+++ 215.out.bad
@@ -12,13 +12,12 @@
  128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  read 0/0 bytes at offset 0
  0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-read 2147483136/2147483136 bytes at offset 1024
-2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+./common.rc: Killed  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
  read 1024/1024 bytes at offset 3221226496
  1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Block node is read-only
-2 GiB (0x8001) bytes allocated at offset 0 bytes (0x0)
-1023.938 MiB (0x3fff) bytes not allocated at offset 2 GiB (0x8001)
+2 GiB (0x8000) bytes allocated at offset 0 bytes (0x0)
+1 GiB (0x4000) bytes not allocated at offset 2 GiB (0x8000)
  64 KiB (0x1) bytes allocated at offset 3 GiB (0xc000)
  1023.938 MiB (0x3fff) bytes not allocated at offset 3 GiB (0xc001)
  No errors were found on the image.


Looks like the process might have been killed off by the OS part way
through.

Interestingly both test cases have a comment:

   #Since a 2G read may exhaust
   # memory on some machines (particularly 32-bit), we skip the test if
   # that fails due to memory pressure.


I'm wondering if the logic for handling this failure is flawed, as being
killed by the OS for exhuasting memory limits for the CI container looks
like a plausible scenario to explain the failure.

The CI shared runners supposedly have 3.75 GB of RAM for the VM as a whole.
If the tests are run in parallel this could still be an issue.

Maybe we need to skip these tests by default if they are known to require
a significant amount of memory to run ?


The tests are not in the "auto" group, so they are not running by default - 
but I once added them to the build-tcg-disabled job since they were working 
fine in the gitlab-CI.


If they are now dying because of out-of-memory issues, that means that 
either they are using more memory now, or that the containers changed and 
provide less free memory now. Anyway, it sounds like the tests are not 
suited for the gitlab-CI anymore, and since they are not in the "auto" group 
anyway, I'd suggest to simply disable them in the build-tcg-disabled job 
again. I'll send a patch...


 Thomas




Re: [PATCH for-6.1? v2 7/7] iotests: Add mirror-ready-cancel-error test

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 17:46, Max Reitz wrote:

Test what happens when there is an I/O error after a mirror job in the
READY phase has been cancelled.

Signed-off-by: Max Reitz



Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 17:46, Max Reitz wrote:

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
  block/mirror.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  mirror_wait_for_any_operation(s, true);
  }
  
-if (s->ret < 0) {

+if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
  ret = s->ret;
  goto immediate_exit;
  }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  break;
  }
  
-ret = 0;

-


That's just a cleanup, that statement is useless pre-patch, yes?


  if (job_is_ready(&s->common.job) && !should_complete) {
  delay_ns = (s->in_flight == 0 &&
  cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-if (job_is_cancelled(&s->common.job)) {
-break;
-}
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
  }
  



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 17:46, Max Reitz wrote:

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_request() as the general variant, which returns true for any


job_cancel_requested()


jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz 
---
  include/qemu/job.h |  8 +++-
  block/mirror.c | 10 --
  job.c  |  7 ++-
  3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 8aa90f7395..032edf3c5f 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -436,9 +436,15 @@ const char *job_type_str(const Job *job);
  /** Returns true if the job should not be visible to the management layer. */
  bool job_is_internal(Job *job);
  
-/** Returns whether the job is scheduled for cancellation. */

+/** Returns whether the job is being cancelled. */
  bool job_is_cancelled(Job *job);
  
+/**

+ * Returns whether the job is scheduled for cancellation (at an
+ * indefinite point).
+ */
+bool job_cancel_requested(Job *job);
+
  /** Returns whether the job is in a completed state. */
  bool job_is_completed(Job *job);
  
diff --git a/block/mirror.c b/block/mirror.c

index e93631a9f6..72e02fa34e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -936,7 +936,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  /* Transition to the READY state and wait for complete. */
  job_transition_to_ready(&s->common.job);
  s->actively_synced = true;
-while (!job_is_cancelled(&s->common.job) && !s->should_complete) {
+while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
  job_yield(&s->common.job);
  }
  s->common.job.cancelled = false;
@@ -1043,7 +1043,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  }
  
  should_complete = s->should_complete ||

-job_is_cancelled(&s->common.job);
+job_cancel_requested(&s->common.job);
  cnt = bdrv_get_dirty_count(s->dirty_bitmap);
  }
  
@@ -1087,7 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)

  trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {
+if (job_is_cancelled(&s->common.job)) {
  break;
  }
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1099,9 +1099,7 @@ immediate_exit:
   * or it was cancelled prematurely so that we do not guarantee that
   * the target is a copy of the source.
   */
-assert(ret < 0 ||
-   (s->common.job.force_cancel &&
-job_is_cancelled(&s->common.job)));
+assert(ret < 0 || job_is_cancelled(&s->common.job));
  assert(need_drain);
  mirror_wait_for_all_io(s);
  }
diff --git a/job.c b/job.c
index e78d893a9c..dba17a680f 100644
--- a/job.c
+++ b/job.c
@@ -216,6 +216,11 @@ const char *job_type_str(const Job *job)
  }
  
  bool job_is_cancelled(Job *job)

+{
+return job->cancelled && job->force_cancel;


can job->cancelled be false when job->force_cancel is true ? I think not and 
worth an assertion here. Something like

if (job->force_cancel) {
   assert(job->cancelled);
   return true;
}

return false;


+}
+
+bool job_cancel_requested(Job *job)
  {
  return job->cancelled;
  }
@@ -1015,7 +1020,7 @@ void job_complete(Job *job, Error **errp)
  if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
  return;
  }
-if (job_is_cancelled(job) || !job->driver->complete) {
+if (job_cancel_requested(job) || !job->driver->complete) {
  error_setg(errp, "The active block job '%s' cannot be completed",
 job->id);
  return;



I think it's a correct change, although there may be unexpected 

Re: [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 17:46, Max Reitz wrote:

We largely have two cancel modes for jobs:

First, there is actual cancelling.  The job is terminated as soon as
possible, without trying to reach a consistent result.

Second, we have mirror in the READY state.  Technically, the job is not
really cancelled, but it just is a different completion mode.  The job
can still run for an indefinite amount of time while it tries to reach a
consistent result.

We want to be able to clearly distinguish which cancel mode a job is in
(when it has been cancelled).  We can use Job.force_cancel for this, but
right now it only reflects cancel requests from the user with
force=true, but clearly, jobs that do not even distinguish between
force=false and force=true are effectively always force-cancelled.

So this patch has Job.force_cancel signify whether the job will
terminate as soon as possible (force_cancel=true) or whether it will
effectively remain running despite being "cancelled"
(force_cancel=false).

To this end, we let jobs that provide JobDriver.cancel() tell the
generic job code whether they will terminate as soon as possible or not,
and for jobs that do not provide that method we assume they will.

Signed-off-by: Max Reitz 
---
  include/qemu/job.h | 11 ++-
  block/backup.c |  3 ++-
  block/mirror.c | 24 ++--
  job.c  |  6 +-
  4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5e8edbc2c8..8aa90f7395 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -253,8 +253,17 @@ struct JobDriver {
  
  /**

   * If the callback is not NULL, it will be invoked in job_cancel_async
+ *
+ * This function must return true if the job will be cancelled
+ * immediately without any further I/O (mandatory if @force is
+ * true), and false otherwise.  This lets the generic job layer
+ * know whether a job has been truly (force-)cancelled, or whether
+ * it is just in a special completion mode (like mirror after
+ * READY).
+ * (If the callback is NULL, the job is assumed to terminate
+ * without I/O.)
   */
-void (*cancel)(Job *job, bool force);
+bool (*cancel)(Job *job, bool force);
  
  
  /** Called when the job is freed */

diff --git a/block/backup.c b/block/backup.c
index bd3614ce70..513e1c8a0b 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -331,11 +331,12 @@ static void coroutine_fn backup_set_speed(BlockJob *job, 
int64_t speed)
  }
  }
  
-static void backup_cancel(Job *job, bool force)

+static bool backup_cancel(Job *job, bool force)
  {
  BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
  
  bdrv_cancel_in_flight(s->target_bs);

+return true;
  }
  
  static const BlockJobDriver backup_job_driver = {

diff --git a/block/mirror.c b/block/mirror.c
index fcb7b65f93..e93631a9f6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1087,9 +1087,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
delay_ns);
  job_sleep_ns(&s->common.job, delay_ns);
-if (job_is_cancelled(&s->common.job) &&
-(!job_is_ready(&s->common.job) || s->common.job.force_cancel))
-{
+if (job_is_cancelled(&s->common.job) && s->common.job.force_cancel) {


Seems, it could it be reduced to

if (s->common.job.force_cancel) {



  break;
  }
  s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -1102,7 +1100,7 @@ immediate_exit:
   * the target is a copy of the source.
   */
  assert(ret < 0 ||
-   ((s->common.job.force_cancel || !job_is_ready(&s->common.job)) 
&&
+   (s->common.job.force_cancel &&


and here


  job_is_cancelled(&s->common.job)));
  assert(need_drain);
  mirror_wait_for_all_io(s);
@@ -1188,14 +1186,27 @@ static bool mirror_drained_poll(BlockJob *job)
  return !!s->in_flight;
  }
  


anyway:

Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir



Re: [PATCH for-6.1? v2 2/7] mirror: Drop s->synced

2021-07-27 Thread Vladimir Sementsov-Ogievskiy

26.07.2021 17:46, Max Reitz wrote:

As of HEAD^, there is no meaning to s->synced other than whether the job
is READY or not.  job_is_ready() gives us that information, too.

Suggested-by: Vladimir Sementsov-Ogievskiy
Signed-off-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir