Re: [Qemu-block] [PATCH v3 1/1] block/mirror: change the semantic of 'force' of block-job-cancel

2018-03-13 Thread Kevin Wolf
Am 13.03.2018 um 13:12 hat Jeff Cody geschrieben:
> From: Liang Li 
> 
> When doing drive mirror to a low speed shared storage, if there was heavy
> BLK IO write workload in VM after the 'ready' event, drive mirror block job
> can't be canceled immediately, it would keep running until the heavy BLK IO
> workload stopped in the VM.
> 
> Libvirt depends on the current block-job-cancel semantics, which is that
> when used without a flag after the 'ready' event, the command blocks
> until data is in sync.  However, these semantics are awkward in other
> situations, for example, people may use drive mirror for realtime
> backups while still wanting to use block live migration.  Libvirt cannot
> start a block live migration while another drive mirror is in progress,
> but the user would rather abandon the backup attempt as broken and
> proceed with the live migration than be stuck waiting for the current
> drive mirror backup to finish.
> 
> The drive-mirror command already includes a 'force' flag, which libvirt
> does not use, although it documented the flag as only being useful to
> quit a job which is paused.  However, since quitting a paused job has
> the same effect as abandoning a backup in a non-paused job (namely, the
> destination file is not in sync, and the command completes immediately),
> we can just improve the documentation to make the force flag obviously
> useful.
> 
> Cc: Paolo Bonzini 
> Cc: Jeff Cody 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: Eric Blake 
> Cc: John Snow 
> Reported-by: Huaitong Han 
> Signed-off-by: Huaitong Han 
> Signed-off-by: Liang Li 
> Signed-off-by: Jeff Cody 
> ---
> 
> N.B.: This was rebased on top of Kevin's block branch,
>   and the 'force' flag added to block_job_user_cancel

Thanks, applied to the block branch.

Kevin



[Qemu-block] [PATCH v3 1/1] block/mirror: change the semantic of 'force' of block-job-cancel

2018-03-13 Thread Jeff Cody
From: Liang Li 

When doing drive mirror to a low speed shared storage, if there was heavy
BLK IO write workload in VM after the 'ready' event, drive mirror block job
can't be canceled immediately, it would keep running until the heavy BLK IO
workload stopped in the VM.

Libvirt depends on the current block-job-cancel semantics, which is that
when used without a flag after the 'ready' event, the command blocks
until data is in sync.  However, these semantics are awkward in other
situations, for example, people may use drive mirror for realtime
backups while still wanting to use block live migration.  Libvirt cannot
start a block live migration while another drive mirror is in progress,
but the user would rather abandon the backup attempt as broken and
proceed with the live migration than be stuck waiting for the current
drive mirror backup to finish.

The drive-mirror command already includes a 'force' flag, which libvirt
does not use, although it documented the flag as only being useful to
quit a job which is paused.  However, since quitting a paused job has
the same effect as abandoning a backup in a non-paused job (namely, the
destination file is not in sync, and the command completes immediately),
we can just improve the documentation to make the force flag obviously
useful.

Cc: Paolo Bonzini 
Cc: Jeff Cody 
Cc: Kevin Wolf 
Cc: Max Reitz 
Cc: Eric Blake 
Cc: John Snow 
Reported-by: Huaitong Han 
Signed-off-by: Huaitong Han 
Signed-off-by: Liang Li 
Signed-off-by: Jeff Cody 
---

N.B.: This was rebased on top of Kevin's block branch,
  and the 'force' flag added to block_job_user_cancel

 block/mirror.c| 10 --
 blockdev.c|  4 ++--
 blockjob.c| 16 +---
 hmp-commands.hx   |  3 ++-
 include/block/blockjob.h  | 12 ++--
 qapi/block-core.json  |  5 +++--
 tests/test-blockjob-txn.c |  8 
 7 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 76fddb3838..820f512c7b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
 
 ret = 0;
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
-if (!s->synced) {
-block_job_sleep_ns(>common, delay_ns);
-if (block_job_is_cancelled(>common)) {
-break;
-}
+if (block_job_is_cancelled(>common) && s->common.force) {
+break;
 } else if (!should_complete) {
 delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
 block_job_sleep_ns(>common, delay_ns);
@@ -887,7 +884,8 @@ immediate_exit:
  * or it was cancelled prematurely so that we do not guarantee that
  * the target is a copy of the source.
  */
-assert(ret < 0 || (!s->synced && block_job_is_cancelled(>common)));
+assert(ret < 0 || ((s->common.force || !s->synced) &&
+   block_job_is_cancelled(>common)));
 assert(need_drain);
 mirror_wait_for_all_io(s);
 }
diff --git a/blockdev.c b/blockdev.c
index 809adbe7f9..6ac4467ac4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -150,7 +150,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 aio_context_acquire(aio_context);
 
 if (bs->job) {
-block_job_cancel(bs->job);
+block_job_cancel(bs->job, false);
 }
 
 aio_context_release(aio_context);
@@ -3831,7 +3831,7 @@ void qmp_block_job_cancel(const char *device,
 }
 
 trace_qmp_block_job_cancel(job);
-block_job_user_cancel(job, errp);
+block_job_user_cancel(job, force, errp);
 out:
 aio_context_release(aio_context);
 }
diff --git a/blockjob.c b/blockjob.c
index ba538c93dd..885197abf6 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -487,7 +487,7 @@ static int block_job_finalize_single(BlockJob *job)
 return 0;
 }
 
-static void block_job_cancel_async(BlockJob *job)
+static void block_job_cancel_async(BlockJob *job, bool force)
 {
 if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
 block_job_iostatus_reset(job);
@@ -498,6 +498,8 @@ static void block_job_cancel_async(BlockJob *job)
 job->pause_count--;
 }
 job->cancelled = true;
+/* To prevent 'force == false' overriding a previous 'force == true' */
+job->force |= force;
 }
 
 static int block_job_txn_apply(BlockJobTxn *txn, int fn(BlockJob *), bool lock)
@@ -581,7 +583,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
  * on the caller, so leave it. */
 QLIST_FOREACH(other_job, >jobs, txn_list) {
 if (other_job != job) {
-block_job_cancel_async(other_job);
+