Re: [PATCH v2 17/20] backup: move to block-copy

2020-11-04 Thread Max Reitz
On 26.10.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
> 23.07.2020 12:47, Max Reitz wrote:
>>> +static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
>>> +{
>>> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>>> +
>>> +    if (s->bcs) {
>>> +    /* In block_job_create we yet don't have bcs */
>> Shouldn’t hurt to make it conditional, but how can we get here in
>> block_job_create()?
>>
> 
> block_job_set_speed is called from block_job_create.

Ah, right.

Max




Re: [PATCH v2 17/20] backup: move to block-copy

2020-10-26 Thread Vladimir Sementsov-Ogievskiy

23.07.2020 12:47, Max Reitz wrote:

>> +static void coroutine_fn backup_set_speed(BlockJob *job, int64_t speed)
>> +{
>> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>> +
>> +    if (s->bcs) {
>> +    /* In block_job_create we yet don't have bcs */
> 
> Shouldn’t hurt to make it conditional, but how can we get here in
> block_job_create()?

block_job_set_speed is called from block_job_create.
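
Roughly: block_job_create() allocates the driver's job struct (so all its fields,
including bcs, are still zero) and applies the initial speed before returning, so
backup's set_speed callback runs once before bcs exists. A tiny stand-alone
illustration of that ordering (toy code, not the actual QEMU functions):

#include <stdio.h>
#include <stdlib.h>

/* Toy model: "create" applies the initial speed before the caller has had a
 * chance to set up bcs, so the set_speed hook must tolerate bcs == NULL. */
typedef struct Job { void *bcs; } Job;

static void set_speed(Job *job, long speed)
{
    if (job->bcs) {
        printf("apply limit %ld to existing bcs\n", speed);
    } else {
        printf("no bcs yet, just remember %ld\n", speed);
    }
}

static Job *create_job(long speed)
{
    Job *job = calloc(1, sizeof(*job)); /* bcs is still NULL here */
    set_speed(job, speed);              /* hook fires during creation */
    return job;
}

int main(void)
{
    Job *job = create_job(1024);
    job->bcs = job;        /* caller creates bcs only after create_job() */
    set_speed(job, 2048);  /* later calls see a valid bcs */
    free(job);
    return 0;
}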

--
Best regards,
Vladimir



Re: [PATCH v2 17/20] backup: move to block-copy

2020-09-21 Thread Vladimir Sementsov-Ogievskiy

23.07.2020 12:47, Max Reitz wrote:

> On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
>> This brings async request handling and block-status driven chunk sizes
>> to backup out of the box, which improves backup performance.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   include/block/block-copy.h |   9 +--
>>   block/backup.c | 145 +++--
>>   block/block-copy.c |  21 ++
>>   3 files changed, 83 insertions(+), 92 deletions(-)
> 
> This patch feels like it should be multiple ones.  I don’t see why a
> patch that lets backup use block-copy (more) should have to modify the
> block-copy code.
> 
> More specifically: I think that block_copy_set_progress_callback()
> should be removed in a separate patch.  Also, moving
> @cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
> seems like a separate patch to me, too.
> 
> (I presume if the cb had had its own opaque object from patch 5 on,
> there wouldn’t be this problem now.  We’d just stop using the progress
> callback in backup, and remove it in one separate patch.)
> 
> [...]


>> diff --git a/block/backup.c b/block/backup.c
>> index ec2676abc2..59c00f5293 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
>>   BlockdevOnError on_source_error;
>>   BlockdevOnError on_target_error;
>>   uint64_t len;
>> -uint64_t bytes_read;
>>   int64_t cluster_size;
>>   int max_workers;
>>   int64_t max_chunk;
>>  
>>   BlockCopyState *bcs;
>> +
>> +BlockCopyCallState *bcs_call;
> 
> Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?
> 
>> +int ret;
>> +bool error_is_read;
>>  } BackupBlockJob;
>>  
>>  static const BlockJobDriver backup_job_driver;
>>  
> 
> [...]


>>   static int coroutine_fn backup_loop(BackupBlockJob *job)
>>   {
>> -bool error_is_read;
>> -int64_t offset;
>> -BdrvDirtyBitmapIter *bdbi;
>> -int ret = 0;
>> +while (true) { /* retry loop */
>> +assert(!job->bcs_call);
>> +job->bcs_call = block_copy_async(job->bcs, 0,
>> + QEMU_ALIGN_UP(job->len,
>> +   job->cluster_size),
>> + true, job->max_workers, job->max_chunk,
>> + backup_block_copy_callback, job);
>>  
>> -bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
>> -while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
>> -do {
>> -if (yield_and_check(job)) {
>> -goto out;
>> +while (job->bcs_call && !job->common.job.cancelled) {
>> +/* wait and handle pauses */
> 
> Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
> is unpaused?  I can’t see anyone doing that.
> 
> Well, or even just reentering the block-copy operation after
> backup_pause() has cancelled it.  Is there some magic I’m missing?
> 
> Does pausing (which leads to cancelling the block-copy operation) just
> yield to the CB being invoked, so the copy operation is considered done,
> and then we just enter the next iteration of the loop and try again?


Yes, that's my idea: we cancel on pause and just run a new block_copy operation
on resume.


> But cancelling the block-copy operation leads to it returning 0, AFAIR,
> so...


Looks like it should be fixed, either by returning ECANCELED or by checking the bitmap.
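
E.g. the success check could consult the dirty bitmap before treating ret == 0
as completion. A rough sketch only (untested; block_copy_dirty_bitmap() and
bdrv_get_dirty_count() used as they exist today, the exact condition may differ):

    /* in backup_loop(), after the wait loop: */
    if (!job->bcs_call && job->ret == 0) {
        if (bdrv_get_dirty_count(block_copy_dirty_bitmap(job->bcs)) == 0) {
            return 0;   /* all clusters copied: real success */
        }
        /*
         * ret == 0 but dirty bits remain: the async call was cancelled for
         * a pause, so fall through and start a new block_copy_async() on
         * the next iteration of the retry loop.
         */
    }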




>> +
>> +job_pause_point(&job->common.job);
>> +
>> +if (job->bcs_call && !job->common.job.cancelled) {
>> +job_yield(&job->common.job);
>>  }
>> -ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>> -if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>> -   BLOCK_ERROR_ACTION_REPORT)
>> -{
>> -goto out;
>> +}
>> +
>> +if (!job->bcs_call && job->ret == 0) {
>> +/* Success */
>> +return 0;
> 
> ...I would assume we return here when the job is paused.
> 
>> +}
>> +
>> +if (job->common.job.cancelled) {
>> +if (job->bcs_call) {
>> +block_copy_cancel(job->bcs_call);
>>  }
>> -} while (ret < 0);
>> +return 0;
>> +}
>> +
>> +if (!job->bcs_call && job->ret < 0 &&
> 
> Is it even possible for bcs_call to be non-NULL here?
> 
>> +(backup_error_action(job, job->error_is_read, -job->ret) ==
>> + BLOCK_ERROR_ACTION_REPORT))
>> +{
>> +return job->ret;
>> +}
> 
> So if we get an error, and the error action isn’t REPORT, we just try
> the whole operation again?  That doesn’t sound very IGNORE-y to me.


Not the whole operation. We have the dirty bitmap: clusters that were already
copied are not touched again.
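
To illustrate (a toy model in plain C, not the QEMU code): each retry walks the
whole range, but only clusters whose dirty bit is still set are copied, so an
IGNOREd error only re-attempts the failed clusters:

#include <stdbool.h>
#include <stdio.h>

#define CLUSTERS 8

/* Toy model of bitmap-driven copying: fail_at simulates one I/O error. */
static void copy_pass(bool *dirty, int fail_at)
{
    for (int i = 0; i < CLUSTERS; i++) {
        if (!dirty[i]) {
            continue;       /* already copied on an earlier pass */
        }
        if (i == fail_at) {
            continue;       /* simulated error: leave the bit set */
        }
        dirty[i] = false;   /* copied: clear the dirty bit */
    }
}

int main(void)
{
    bool dirty[CLUSTERS] = { true, true, true, true, true, true, true, true };

    copy_pass(dirty, 5);    /* first pass: everything except cluster 5 */
    copy_pass(dirty, -1);   /* retry: only cluster 5 is copied again */

    for (int i = 0; i < CLUSTERS; i++) {
        printf("cluster %d: %s\n", i, dirty[i] ? "still dirty" : "copied");
    }
    return 0;
}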



--
Best regards,
Vladimir



Re: [PATCH v2 17/20] backup: move to block-copy

2020-07-23 Thread Max Reitz
On 01.06.20 20:11, Vladimir Sementsov-Ogievskiy wrote:
> This brings async request handling and block-status driven chunk sizes
> to backup out of the box, which improves backup performance.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/block-copy.h |   9 +--
>  block/backup.c | 145 +++--
>  block/block-copy.c |  21 ++
>  3 files changed, 83 insertions(+), 92 deletions(-)

This patch feels like it should be multiple ones.  I don’t see why a
patch that lets backup use block-copy (more) should have to modify the
block-copy code.

More specifically: I think that block_copy_set_progress_callback()
should be removed in a separate patch.  Also, moving
@cb_opaque/@progress_opaque from BlockCopyState into BlockCopyCallState
seems like a separate patch to me, too.

(I presume if the cb had had its own opaque object from patch 5 on,
there wouldn’t be this problem now.  We’d just stop using the progress
callback in backup, and remove it in one separate patch.)

[...]

> diff --git a/block/backup.c b/block/backup.c
> index ec2676abc2..59c00f5293 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
>  BlockdevOnError on_source_error;
>  BlockdevOnError on_target_error;
>  uint64_t len;
> -uint64_t bytes_read;
>  int64_t cluster_size;
>  int max_workers;
>  int64_t max_chunk;
>  
>  BlockCopyState *bcs;
> +
> +BlockCopyCallState *bcs_call;

Can this be more descriptive?  E.g. background_bcs?  bg_bcs_call?  bg_bcsc?

> +int ret;
> +bool error_is_read;
>  } BackupBlockJob;
>  
>  static const BlockJobDriver backup_job_driver;
>  

[...]

>  static int coroutine_fn backup_loop(BackupBlockJob *job)
>  {
> -bool error_is_read;
> -int64_t offset;
> -BdrvDirtyBitmapIter *bdbi;
> -int ret = 0;
> +while (true) { /* retry loop */
> +assert(!job->bcs_call);
> +job->bcs_call = block_copy_async(job->bcs, 0,
> + QEMU_ALIGN_UP(job->len,
> +   job->cluster_size),
> + true, job->max_workers, job->max_chunk,
> + backup_block_copy_callback, job);
>  
> -bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
> -while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
> -do {
> -if (yield_and_check(job)) {
> -goto out;
> +while (job->bcs_call && !job->common.job.cancelled) {
> +/* wait and handle pauses */

Doesn’t someone need to reset BlockCopyCallState.cancelled when the job
is unpaused?  I can’t see anyone doing that.

Well, or even just reentering the block-copy operation after
backup_pause() has cancelled it.  Is there some magic I’m missing?

Does pausing (which leads to cancelling the block-copy operation) just
yield to the CB being invoked, so the copy operation is considered done,
and then we just enter the next iteration of the loop and try again?
But cancelling the block-copy operation leads to it returning 0, AFAIR,
so...

> +
> +job_pause_point(&job->common.job);
> +
> +if (job->bcs_call && !job->common.job.cancelled) {
> +job_yield(&job->common.job);
>  }
> -ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
> -if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> -   BLOCK_ERROR_ACTION_REPORT)
> -{
> -goto out;
> +}
> +
> +if (!job->bcs_call && job->ret == 0) {
> +/* Success */
> +return 0;

...I would assume we return here when the job is paused.

> +}
> +
> +if (job->common.job.cancelled) {
> +if (job->bcs_call) {
> +block_copy_cancel(job->bcs_call);
>  }
> -} while (ret < 0);
> +return 0;
> +}
> +
> +if (!job->bcs_call && job->ret < 0 &&

Is it even possible for bcs_call to be non-NULL here?

> +(backup_error_action(job, job->error_is_read, -job->ret) ==
> + BLOCK_ERROR_ACTION_REPORT))
> +{
> +return job->ret;
> +}

So if we get an error, and the error action isn’t REPORT, we just try
the whole operation again?  That doesn’t sound very IGNORE-y to me.

>  }
>  
> - out:
> -bdrv_dirty_iter_free(bdbi);
> -return ret;
> +g_assert_not_reached();
>  }
>  
>  static void backup_init_bcs_bitmap(BackupBlockJob *job)
> @@ -246,9 +227,14 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>  int64_t count;
>  
>  for (offset = 0; offset < s->len; ) {
> -if (yield_and_check(s)) {
> -ret = -ECANCELED;
> -goto out;
> +if (job_is_cancelled(job)) {
> +   

[PATCH v2 17/20] backup: move to block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |   9 +--
 block/backup.c | 145 +++--
 block/block-copy.c |  21 ++
 3 files changed, 83 insertions(+), 92 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 370a194d3c..a3e11d3fa2 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -18,7 +18,6 @@
 #include "block/block.h"
 #include "qemu/co-shared-resource.h"
 
-typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
 typedef void (*BlockCopyAsyncCallbackFunc)(int ret, bool error_is_read,
                                            void *opaque);
 typedef struct BlockCopyState BlockCopyState;
@@ -29,11 +28,6 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  BdrvRequestFlags write_flags,
  Error **errp);
 
-void block_copy_set_progress_callback(
-BlockCopyState *s,
-ProgressBytesCallbackFunc progress_bytes_callback,
-void *progress_opaque);
-
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
@@ -57,7 +51,8 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
  int64_t offset, int64_t bytes,
  bool ratelimit, int max_workers,
  int64_t max_chunk,
- BlockCopyAsyncCallbackFunc cb);
+ BlockCopyAsyncCallbackFunc cb,
+ void *cb_opaque);
 
 /*
 * Set speed limit for block-copy instance. All block-copy operations related to
diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
 uint64_t len;
-uint64_t bytes_read;
 int64_t cluster_size;
 int max_workers;
 int64_t max_chunk;
 
 BlockCopyState *bcs;
+
+BlockCopyCallState *bcs_call;
+int ret;
+bool error_is_read;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
 
-static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
-{
-BackupBlockJob *s = opaque;
-
-s->bytes_read += bytes;
-}
-
-static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-  int64_t offset, uint64_t bytes,
-  bool *error_is_read)
-{
-int ret = 0;
-int64_t start, end; /* bytes */
-
-start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
-end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);
-
-trace_backup_do_cow_enter(job, start, offset, bytes);
-
-ret = block_copy(job->bcs, start, end - start, error_is_read);
-
-trace_backup_do_cow_return(job, offset, bytes, ret);
-
-return ret;
-}
-
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 {
 BdrvDirtyBitmap *bm;
@@ -159,54 +136,58 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
 }
 }
 
-static bool coroutine_fn yield_and_check(BackupBlockJob *job)
+static void coroutine_fn backup_block_copy_callback(int ret, bool error_is_read,
+void *opaque)
 {
-uint64_t delay_ns;
-
-if (job_is_cancelled(&job->common.job)) {
-return true;
-}
-
-/*
- * We need to yield even for delay_ns = 0 so that bdrv_drain_all() can
- * return. Without a yield, the VM would not reboot.
- */
-delay_ns = block_job_ratelimit_get_delay(&job->common, job->bytes_read);
-job->bytes_read = 0;
-job_sleep_ns(&job->common.job, delay_ns);
-
-if (job_is_cancelled(&job->common.job)) {
-return true;
-}
+BackupBlockJob *s = opaque;
 
-return false;
+s->bcs_call = NULL;
+s->ret = ret;
+s->error_is_read = error_is_read;
+job_enter(&s->common.job);
 }
 
 static int coroutine_fn backup_loop(BackupBlockJob *job)
 {
-bool error_is_read;
-int64_t offset;
-BdrvDirtyBitmapIter *bdbi;
-int ret = 0;
+while (true) { /* retry loop */
+assert(!job->bcs_call);
+job->bcs_call = block_copy_async(job->bcs, 0,
+ QEMU_ALIGN_UP(job->len,
+   job->cluster_size),
+ true, job->max_workers, job->max_chunk,
+ backup_block_copy_callback, job);
 
-bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
-while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-