Re: [PATCH v2 00/10] mirror: allow switching from background to active mode

2024-03-11 Thread Peter Krempa
On Mon, Mar 11, 2024 at 18:51:18 +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11.03.24 00:07, Peter Krempa wrote:
> > On Thu, Mar 07, 2024 at 22:42:56 +0300, Vladimir Sementsov-Ogievskiy wrote:

[...]

> > Libvirt can adapt to any option that will give us the above semantics
> > (extra parameter at completion time, different completion command or
> > extra command to switch job properties right before completion), but to
> > be honest all of these feel like they would be more hassle than keeping
> > 'block-job-cancel' around from qemu's side.
> > 
> 
> I understand. But still, it would be good to finally resolve the duplication
> between the job-* and block-job-* APIs. We can keep the old quirk working for
> a long time even after making a new, consistent API.

Sure, if you decide to go that way it's okay as long as we can avoid the
graph change at 'completion' time. However you decide to implement it,
just let me know in advance so that I can prepare the libvirt patches
for it.




Re: [PATCH v2 00/10] mirror: allow switching from background to active mode

2024-03-10 Thread Peter Krempa
On Thu, Mar 07, 2024 at 22:42:56 +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 04.03.24 14:09, Peter Krempa wrote:
> > On Mon, Mar 04, 2024 at 11:48:54 +0100, Kevin Wolf wrote:
> > > Am 28.02.2024 um 19:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > On 03.11.23 18:56, Markus Armbruster wrote:
> > > > > Kevin Wolf  writes:

[...]

> 
> I only see the following differences:
> 
> 1. block-job-complete documents that it completes the job synchronously. But
> looking at the mirror code I see it just sets s->should_complete = true, which
> will then be handled asynchronously. So I doubt that the documentation is
> correct.
> 
> 2. block-job-complete will trigger final graph changes. block-job-cancel will 
> not.
> 
> Is [2] really useful? It seems so: in case of some failure before starting the
> migration target, we'd like to continue executing the source. So there is no
> reason to break the block graph on the source; better to keep it unchanged.
> 
> But I think such behavior is better set up by a mirror-job start parameter,
> rather than by a special option for the cancel (or even complete) command,
> useful only for mirror.

Libvirt's API design was based on the qemu quirk and thus we allow users
to make the decision after starting the job; anything you pick therefore
needs to allow us to do this at "completion" time.

Libvirt can adapt to any option that will give us the above semantics
(extra parameter at completion time, different completion command or
extra command to switch job properties right before completion), but to
be honest all of these feel like they would be more hassle than keeping
'block-job-cancel' around from qemu's side.




Re: [PATCH v2 00/10] mirror: allow switching from background to active mode

2024-03-04 Thread Peter Krempa
On Mon, Mar 04, 2024 at 11:48:54 +0100, Kevin Wolf wrote:
> Am 28.02.2024 um 19:07 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > On 03.11.23 18:56, Markus Armbruster wrote:
> > > Kevin Wolf  writes:

[...]

> > > Is the job abstraction a failure?
> > > 
> > > We have
> > > 
> > >  block-job-* command    since   job-* command   since
> > >  -----------------------------------------------------
> > >  block-job-set-speed    1.1
> > >  block-job-cancel       1.1     job-cancel      3.0
> > >  block-job-pause        1.3     job-pause       3.0
> > >  block-job-resume       1.3     job-resume      3.0
> > >  block-job-complete     1.3     job-complete    3.0
> > >  block-job-dismiss      2.12    job-dismiss     3.0
> > >  block-job-finalize     2.12    job-finalize    3.0
> > >  block-job-change       8.2
> > >  query-block-jobs       1.1     query-jobs

[...]

> I consider these strictly optional. We don't really have strong reasons
> to deprecate these commands (they are just thin wrappers), and I think
> libvirt still uses block-job-* in some places.

Libvirt uses 'block-job-cancel' because it has different semantics from
'job-cancel', and libvirt documented those semantics as the behaviour of
the API that uses it (specifically, what is expected to be written to the
destination node at the point when the job is cancelled).

Libvirt also uses 'block-job-set-speed' and 'query-block-jobs', but those
can be replaced easily and, judging by the above table, even without any
feature checks.

Thus the plan to deprecate at least 'block-job-cancel' will not work
unless its semantics are ported into 'job-cancel'.
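
To make the difference concrete, here is a minimal sketch of the two ways
a ready mirror job can be finished over QMP (the job name 'job0' is just
an example):

  # pivot to the mirror destination; triggers the final graph change
  { "execute": "block-job-complete", "arguments": { "device": "job0" } }

  # finish with the destination synchronized but without the graph
  # change, leaving the source configuration intact -- the semantics
  # plain 'job-cancel' does not provide
  { "execute": "block-job-cancel",
    "arguments": { "device": "job0", "force": false } }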




Re: [PATCH v3 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2024-01-03 Thread Peter Krempa
On Tue, Dec 05, 2023 at 18:14:40 +0100, Peter Krempa wrote:
> Please see patches for rationale.
> 
> Libvirt patches using this new flag will be posted soon-ish (after
> cleanup).
> 
> v3:
>  - changed name of flag to 'backing-mask-protocol' (Eric)
>  - decided to keep Vladimir's R-b as he requested a shorter name too
> 
> v2:
>  - fixed mistaken argument order in 'hmp_block_stream'
>  - changed version in docs to 9.0 as getting this into RC 3 probably
>isn't realistic
> 
> Peter Krempa (2):
>   block: commit: Allow users to request only format driver names in
> backing file format
>   block: stream: Allow users to request only format driver names in
> backing file format

Polite ping, now that qemu-8.2 was released.




Re: [PATCH v3 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2023-12-05 Thread Peter Krempa
On Tue, Dec 05, 2023 at 18:14:40 +0100, Peter Krempa wrote:
> Please see patches for rationale.
> 
> Libvirt patches using this new flag will be posted soon-ish (after
> cleanup).

https://lists.libvirt.org/archives/list/de...@lists.libvirt.org/message/GCCZP5ANYTPVXCIEYGSM5NWYCGDL23V6/




[PATCH v3 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-12-05 Thread Peter Krempa
Introduce a new flag 'backing-mask-protocol' for the block-commit QMP
command which instructs the internals to use 'raw' instead of the
protocol driver name in the case when an image is used without a dummy
'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.
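
As a sketch, a management tool would simply always assert the flag when
issuing the command (the node names here are hypothetical):

  { "execute": "block-commit",
    "arguments": { "job-id": "commit0",
                   "device": "disk0",
                   "top-node": "snap1",
                   "base-node": "base0",
                   "backing-mask-protocol": true } }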

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block.c| 37 +-
 block/commit.c |  6 -
 blockdev.c |  6 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   |  8 +-
 tests/unit/test-bdrv-drain.c   |  3 ++-
 8 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..771a00a14d 100644
--- a/block.c
+++ b/block.c
@@ -1309,11 +1309,14 @@ static void bdrv_backing_detach(BdrvChild *c)
 }

 static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
-const char *filename, Error **errp)
+const char *filename,
+bool backing_mask_protocol,
+Error **errp)
 {
 BlockDriverState *parent = c->opaque;
 bool read_only = bdrv_is_read_only(parent);
 int ret;
+const char *format_name;
 GLOBAL_STATE_CODE();

 if (read_only) {
@@ -1323,9 +1326,23 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
BlockDriverState *base,
 }
 }

-ret = bdrv_change_backing_file(parent, filename,
-   base->drv ? base->drv->format_name : "",
-   false);
+if (base->drv) {
+/*
+ * If the new base image doesn't have a format driver layer, which we
+ * detect by the fact that @base is a protocol driver, we record
+ * 'raw' as the format instead of putting the protocol name as the
+ * backing format
+ */
+if (backing_mask_protocol && base->drv->protocol_name) {
+format_name = "raw";
+} else {
+format_name = base->drv->format_name;
+}
+} else {
+format_name = "";
+}
+
+ret = bdrv_change_backing_file(parent, filename, format_name, false);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not update backing file link");
 }
@@ -1479,10 +1496,14 @@ static void GRAPH_WRLOCK bdrv_child_cb_detach(BdrvChild 
*child)
 }

 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
- const char *filename, Error **errp)
+ const char *filename,
+ bool backing_mask_protocol,
+ Error **errp)
 {
 if (c->role & BDRV_CHILD_COW) {
-return bdrv_backing_update_filename(c, base, filename, errp);
+return bdrv_backing_update_filename(c, base, filename,
+backing_mask_protocol,
+errp);
 }
 return 0;
 }
@@ -5961,7 +5982,8 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base)
  *
  */
 int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
-   const char *backing_file_str)
+   const char *backing_file_str,
+   bool backing_mask_protocol)
 {
 BlockDriverState *explicit_top = top;
 bool update_inherits_from;
@@ -6027,6 +6049,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,

 if (c->klass->update_filename) {
 ret = c->klass->update_filename(c, base, backing_file_str,
+backing_mask_protocol,
 _err);
 if (ret < 0) {
 /*
diff --git a/block/commit.c b/block/commit.c
index 69cc75be0c..4c351644c1 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -42,6 +42,7 @@ typedef struct CommitBlockJob {
 bool base_read_only;
 bool chain_frozen;
 char *backing_file_str;
+bool backing_mask_protocol;
 } CommitBlockJob;

 static int commit_prepare(Job *job)
@@ -61,7 +62,

[PATCH v3 2/2] block: stream: Allow users to request only format driver names in backing file format

2023-12-05 Thread Peter Krempa
Introduce a new flag 'backing-mask-protocol' for the block-stream QMP
command which instructs the internals to use 'raw' instead of the
protocol driver name in the case when an image is used without a dummy
'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.
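
As a sketch, the stream counterpart looks like this (node names are
hypothetical):

  { "execute": "block-stream",
    "arguments": { "job-id": "stream0",
                   "device": "disk0",
                   "base-node": "base0",
                   "backing-mask-protocol": true } }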

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 +-
 blockdev.c |  7 +++
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   |  9 -
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..9080e29d4d 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -509,7 +509,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 const char *base = qdict_get_try_str(qdict, "base");
 int64_t speed = qdict_get_try_int(qdict, "speed", 0);

-qmp_block_stream(device, device, base, NULL, NULL, NULL,
+qmp_block_stream(device, device, base, NULL, NULL, false, false, NULL,
  qdict_haskey(qdict, "speed"), speed,
  true, BLOCKDEV_ON_ERROR_REPORT, NULL,
  false, false, false, false, );
diff --git a/block/stream.c b/block/stream.c
index 01fe7c0f16..da62410dde 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -39,6 +39,7 @@ typedef struct StreamBlockJob {
 BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
+bool backing_mask_protocol;
 bool bs_read_only;
 } StreamBlockJob;

@@ -95,7 +96,12 @@ static int stream_prepare(Job *job)
 if (unfiltered_base) {
 base_id = s->backing_file_str ?: unfiltered_base->filename;
 if (unfiltered_base->drv) {
-base_fmt = unfiltered_base->drv->format_name;
+if (s->backing_mask_protocol &&
+unfiltered_base->drv->protocol_name) {
+base_fmt = "raw";
+} else {
+base_fmt = unfiltered_base->drv->format_name;
+}
 }
 }

@@ -247,6 +253,7 @@ static const BlockJobDriver stream_job_driver = {

 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  bool backing_mask_protocol,
   BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
@@ -398,6 +405,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->base_overlay = base_overlay;
 s->above_base = above_base;
 s->backing_file_str = g_strdup(backing_file_str);
+s->backing_mask_protocol = backing_mask_protocol;
 s->cor_filter_bs = cor_filter_bs;
 s->target_bs = bs;
 s->bs_read_only = bs_read_only;
diff --git a/blockdev.c b/blockdev.c
index 9ced07a8c4..a5ca1cf3ad 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2408,6 +2408,8 @@ void qmp_block_stream(const char *job_id, const char 
*device,
   const char *base,
   const char *base_node,
   const char *backing_file,
+  bool has_backing_mask_protocol,
+  bool backing_mask_protocol,
   const char *bottom,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
@@ -2443,6 +2445,10 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 return;
 }

+if (!has_backing_mask_protocol) {
+backing_mask_protocol = false;
+}
+
 if (!has_on_error) {
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
@@ -2531,6 +2537,7 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 }

 stream_start(job_id, bs, base_bs, backing_file,
+ backing_mask_protocol,
  bottom_bs, job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, _err);
 if (local_err) {
diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index 2162269df6..d2201e27f4 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -46,6 +46,8 @@
  * flatten the 

[PATCH v3 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2023-12-05 Thread Peter Krempa
Please see patches for rationale.

Libvirt patches using this new flag will be posted soon-ish (after
cleanup).

v3:
 - changed name of flag to 'backing-mask-protocol' (Eric)
 - decided to keep Vladimir's R-b as he requested a shorter name too

v2:
 - fixed mistaken argument order in 'hmp_block_stream'
 - changed version in docs to 9.0 as getting this into RC 3 probably
   isn't realistic

Peter Krempa (2):
  block: commit: Allow users to request only format driver names in
backing file format
  block: stream: Allow users to request only format driver names in
backing file format

 block.c| 37 +-
 block/commit.c |  6 -
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 ++-
 blockdev.c | 13 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  6 +
 qapi/block-core.json   | 17 ++--
 tests/unit/test-bdrv-drain.c   |  3 ++-
 10 files changed, 86 insertions(+), 15 deletions(-)

-- 
2.43.0




Re: [PATCH v2 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
On Thu, Nov 30, 2023 at 13:24:18 -0600, Eric Blake wrote:
> On Thu, Nov 30, 2023 at 05:06:03PM +0100, Peter Krempa wrote:
> > Introduce a new flag 'backing_file_format_no_protocol' for the
> > block-commit QMP command which instructs the internals to use 'raw'
> > instead of the protocol driver name in the case when an image is used
> > without a dummy 'raw' wrapper.
> > 
> > The flag is designed such that it can always be asserted by management
> > tools, even when there isn't any update to backing files.
> > 
> > The flag will be used by libvirt so that the backing images still
> > reference the proper format even after libvirt stops using the dummy
> > raw driver (a raw driver with no other configuration). Libvirt needs
> > this so that the images stay compatible with older libvirt versions,
> > which didn't expect that a protocol driver name could appear in the
> > backing file format field.
> > 
> > Signed-off-by: Peter Krempa 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > ---
> 
> > +++ b/qapi/block-core.json
> > @@ -1810,6 +1810,14 @@
> >  # Care should be taken when specifying the string, to specify a
> >  # valid filename or protocol.  (Since 2.1)
> >  #
> > +# @backing-file-format-no-protocol: If true always use a 'format' driver
> > +# name for the 'backing file format' field if updating the image header
> > +# of the overlay of 'top'. Otherwise the real name of the driver of the
> > +# backing image may be used which may be a protocol driver.
> > +#
> > +# Can be used also when no image header will be updated.
> > +# (default: false; since: 9.0)


As I've previously stated, I don't really care about the name as long as I
don't have to keep re-sending.

> This is a long name.  What about:

But is the long name really a problem?

> @backing-mask-protocol: If true, replace any protocol mentioned in the
> 'backing file format' with 'raw', rather than storing the protocol
> name as the backing format.  Can be used even when no image header
> will be updated (default false; since 9.0).

Sounds okay to me. In the end, nobody will really see this, as libvirt
will be using it internally.




[PATCH v2 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-commit QMP command which instructs the internals to use 'raw'
instead of the protocol driver name in the case when an image is used
without a dummy 'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block.c| 37 +-
 block/commit.c |  6 -
 blockdev.c |  6 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 +++-
 tests/unit/test-bdrv-drain.c   |  3 ++-
 8 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..986a529941 100644
--- a/block.c
+++ b/block.c
@@ -1309,11 +1309,14 @@ static void bdrv_backing_detach(BdrvChild *c)
 }

 static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
-const char *filename, Error **errp)
+const char *filename,
+bool backing_file_format_no_protocol,
+Error **errp)
 {
 BlockDriverState *parent = c->opaque;
 bool read_only = bdrv_is_read_only(parent);
 int ret;
+const char *format_name;
 GLOBAL_STATE_CODE();

 if (read_only) {
@@ -1323,9 +1326,23 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
BlockDriverState *base,
 }
 }

-ret = bdrv_change_backing_file(parent, filename,
-   base->drv ? base->drv->format_name : "",
-   false);
+if (base->drv) {
+/*
+ * If the new base image doesn't have a format driver layer, which we
+ * detect by the fact that @base is a protocol driver, we record
+ * 'raw' as the format instead of putting the protocol name as the
+ * backing format
+ */
+if (backing_file_format_no_protocol && base->drv->protocol_name) {
+format_name = "raw";
+} else {
+format_name = base->drv->format_name;
+}
+} else {
+format_name = "";
+}
+
+ret = bdrv_change_backing_file(parent, filename, format_name, false);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not update backing file link");
 }
@@ -1479,10 +1496,14 @@ static void GRAPH_WRLOCK bdrv_child_cb_detach(BdrvChild 
*child)
 }

 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
- const char *filename, Error **errp)
+ const char *filename,
+ bool backing_file_format_no_protocol,
+ Error **errp)
 {
 if (c->role & BDRV_CHILD_COW) {
-return bdrv_backing_update_filename(c, base, filename, errp);
+return bdrv_backing_update_filename(c, base, filename,
+backing_file_format_no_protocol,
+errp);
 }
 return 0;
 }
@@ -5961,7 +5982,8 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base)
  *
  */
 int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
-   const char *backing_file_str)
+   const char *backing_file_str,
+   bool backing_file_format_no_protocol)
 {
 BlockDriverState *explicit_top = top;
 bool update_inherits_from;
@@ -6027,6 +6049,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,

 if (c->klass->update_filename) {
 ret = c->klass->update_filename(c, base, backing_file_str,
+backing_file_format_no_protocol,
 _err);
 if (ret < 0) {
 /*
diff --git a/block/commit.c b/block/commit.c
index 69cc75be0c..5a584b712e 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -42,6 +42,7 @@ typedef struct CommitBlockJob {
 bool base_read_only;
 bool chain_frozen;
 char *backing_file_str;
+bool backing_file_forma

[PATCH v2 2/2] block: stream: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-stream QMP command which instructs the internals to use 'raw'
instead of the protocol driver name in the case when an image is used
without a dummy 'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 +-
 blockdev.c |  7 +++
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 ++-
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..9080e29d4d 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -509,7 +509,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 const char *base = qdict_get_try_str(qdict, "base");
 int64_t speed = qdict_get_try_int(qdict, "speed", 0);

-qmp_block_stream(device, device, base, NULL, NULL, NULL,
+qmp_block_stream(device, device, base, NULL, NULL, false, false, NULL,
  qdict_haskey(qdict, "speed"), speed,
  true, BLOCKDEV_ON_ERROR_REPORT, NULL,
  false, false, false, false, );
diff --git a/block/stream.c b/block/stream.c
index 01fe7c0f16..42befd6b1d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -39,6 +39,7 @@ typedef struct StreamBlockJob {
 BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
+bool backing_file_format_no_protocol;
 bool bs_read_only;
 } StreamBlockJob;

@@ -95,7 +96,12 @@ static int stream_prepare(Job *job)
 if (unfiltered_base) {
 base_id = s->backing_file_str ?: unfiltered_base->filename;
 if (unfiltered_base->drv) {
-base_fmt = unfiltered_base->drv->format_name;
+if (s->backing_file_format_no_protocol &&
+unfiltered_base->drv->protocol_name) {
+base_fmt = "raw";
+} else {
+base_fmt = unfiltered_base->drv->format_name;
+}
 }
 }

@@ -247,6 +253,7 @@ static const BlockJobDriver stream_job_driver = {

 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  bool backing_file_format_no_protocol,
   BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
@@ -398,6 +405,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->base_overlay = base_overlay;
 s->above_base = above_base;
 s->backing_file_str = g_strdup(backing_file_str);
+s->backing_file_format_no_protocol = backing_file_format_no_protocol;
 s->cor_filter_bs = cor_filter_bs;
 s->target_bs = bs;
 s->bs_read_only = bs_read_only;
diff --git a/blockdev.c b/blockdev.c
index 038031bb03..dc477c4f7e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2408,6 +2408,8 @@ void qmp_block_stream(const char *job_id, const char 
*device,
   const char *base,
   const char *base_node,
   const char *backing_file,
+  bool has_backing_file_format_no_protocol,
+  bool backing_file_format_no_protocol,
   const char *bottom,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
@@ -2443,6 +2445,10 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 return;
 }

+if (!has_backing_file_format_no_protocol) {
+backing_file_format_no_protocol = false;
+}
+
 if (!has_on_error) {
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
@@ -2531,6 +2537,7 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 }

 stream_start(job_id, bs, base_bs, backing_file,
+ backing_file_format_no_protocol,
  bottom_bs, job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, _err);
 if (local_err) {
diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index 4f253ff362..4301061048 100644
--- a/include/block/

[PATCH v2 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Please see patches for rationale.

Libvirt patches using this new flag will be posted soon-ish (after
cleanup).

v2:
 - fixed mistaken argument order in 'hmp_block_stream'
 - changed version in docs to 9.0 as getting this into RC 3 probably
   isn't realistic

Peter Krempa (2):
  block: commit: Allow users to request only format driver names in
backing file format
  block: stream: Allow users to request only format driver names in
backing file format

 block.c| 37 +-
 block/commit.c |  6 -
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 ++-
 blockdev.c | 13 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  6 +
 qapi/block-core.json   | 22 +--
 tests/unit/test-bdrv-drain.c   |  3 ++-
 10 files changed, 91 insertions(+), 15 deletions(-)

-- 
2.43.0




Re: [PATCH 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-28 Thread Peter Krempa
On Tue, Nov 28, 2023 at 20:10:10 +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 24.11.23 17:52, Peter Krempa wrote:
> > Introduce a new flag 'backing_file_format_no_protocol' for the
> > block-commit QMP command which instructs the internals to use 'raw'
> > instead of the protocol driver name in the case when an image is used
> > without a dummy 'raw' wrapper.
> > 
> > The flag is designed such that it can always be asserted by management
> > tools, even when there isn't any update to backing files.
> > 
> > The flag will be used by libvirt so that the backing images still
> > reference the proper format even after libvirt stops using the dummy
> > raw driver (a raw driver with no other configuration). Libvirt needs
> > this so that the images stay compatible with older libvirt versions,
> > which didn't expect that a protocol driver name could appear in the
> > backing file format field.
> > 
> > Signed-off-by: Peter Krempa 
> > ---

[...]

> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index ca390c5700..367e896905 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -1810,6 +1810,14 @@
> >   # Care should be taken when specifying the string, to specify a
> >   # valid filename or protocol.  (Since 2.1)
> >   #
> > +# @backing-file-format-no-protocol: If true always use a 'format' driver
> > +# name for the 'backing file format' field if updating the image header
> > +# of the overlay of 'top'. Otherwise the real name of the driver of the
> > +# backing image may be used which may be a protocol driver.
> > +#
> > +# Can be used also when no image header will be updated.
> > +# (default: false; since: 8.2)
> > +#
> 
> Hi Peter.

Hi,

Firstly, to be honest, I consider the fact that qemu can put a protocol
driver into the header field named "backing file format" to be a bug.
After discussing it with Kevin I understand the rationale for doing so,
but nevertheless, a backing protocol name is not a format and should have
had its own header field.

> Hmm. Could this just be @backing-file-format ?

I don't really care too deeply about the name.

Calling it just @backing-file-format, though, would imply (as you write
below) that the argument is the string to write into the metadata. More on
that below.

> As I understand, finally, that's just a string which we are going to put into 
> qcow2 metadata.

Yes.

> And from the qcow2 point of view, it's rather strange to set
> backing_file_format="raw" for a backing image which is actually "qcow2".

Indeed, that would be wrong, but this is _NOT_ what this patch
actually does.

This patch ensures that only a *format* driver name is ever written to
the backing file format field. It overrides the fact that a *protocol*
driver name would be written into the field if the tail of the backing
chain is a raw image which was instantiated in qemu without the dummy
'raw' driver on top.

Since a raw driver with no configuration simply passes requests to the
*protocol* driver below it, it's not needed in most configurations. In
fact, as stefanha figured out a long time ago, it's just overhead.

We need a format name in the backing file format field because libvirt
assumed for a very long time that the field would contain only format
driver names, and since I want to remove the overhead of the 'raw' driver
I need to ensure that images won't break for older libvirt.
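
For illustration, a sketch of the two configurations in command-line
syntax (node names hypothetical):

  # tail of the backing chain with the dummy 'raw' format driver on top
  # (what libvirt did historically)
  --blockdev '{"driver":"file","filename":"base.img","node-name":"proto0"}'
  --blockdev '{"driver":"raw","file":"proto0","node-name":"base0"}'

  # tail of the chain without the wrapper; here a commit/stream would
  # record the protocol name ('file') in the backing file format field
  # unless the new flag masks it as 'raw'
  --blockdev '{"driver":"file","filename":"base.img","node-name":"base0"}'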

>"raw" says nothing to the reader and may even be misleading. Same for qemu,
>this seems just a strange thing.
> 
> Also, what I dislike is that the new feature sounds like hardcoded "raw" is
> the only format driver. If we go this way, a more honest name would be
> @backing-file-raw.

Once again, that is not what's happening here. The flag ensures that only
a *format* driver name is used. This is only a problem when the final
image is raw but lacks the dummy 'raw' driver on top of it, which is the
reason the workaround only ever writes 'raw' into the field. Otherwise
the proper format name is already in the 'driver' field and is not
overwritten.

> And, if we allow the user to give any backing-file name to be written into
> qcow2 metadata after the operation, why not just allow setting any
> backing-file-format as well?

This IMO makes no sense to allow, but based on the reimagined design of
the 'backing file /format/' field it might appear that it does.

'block-commit' and 'block-stream' can't change the format of any of the
images; they simply move data, thus for any non-raw image in a chain the
actual format name must still be used. The only place where it makes
sense is in the same cases where the code in this patch triggers.
And then it still doesn't make sense to write anything besides:
1) The actual 

[PATCH 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-24 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-commit QMP command which instructs the internals to use 'raw'
instead of the protocol driver name in the case when an image is used
without a dummy 'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.

Signed-off-by: Peter Krempa 
---
 block.c| 37 +-
 block/commit.c |  6 -
 blockdev.c |  6 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 +++-
 tests/unit/test-bdrv-drain.c   |  3 ++-
 8 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..986a529941 100644
--- a/block.c
+++ b/block.c
@@ -1309,11 +1309,14 @@ static void bdrv_backing_detach(BdrvChild *c)
 }

 static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
-const char *filename, Error **errp)
+const char *filename,
+bool backing_file_format_no_protocol,
+Error **errp)
 {
 BlockDriverState *parent = c->opaque;
 bool read_only = bdrv_is_read_only(parent);
 int ret;
+const char *format_name;
 GLOBAL_STATE_CODE();

 if (read_only) {
@@ -1323,9 +1326,23 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
BlockDriverState *base,
 }
 }

-ret = bdrv_change_backing_file(parent, filename,
-   base->drv ? base->drv->format_name : "",
-   false);
+if (base->drv) {
+/*
+ * If the new base image doesn't have a format driver layer, which we
+ * detect by the fact that @base is a protocol driver, we record
+ * 'raw' as the format instead of putting the protocol name as the
+ * backing format
+ */
+if (backing_file_format_no_protocol && base->drv->protocol_name) {
+format_name = "raw";
+} else {
+format_name = base->drv->format_name;
+}
+} else {
+format_name = "";
+}
+
+ret = bdrv_change_backing_file(parent, filename, format_name, false);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not update backing file link");
 }
@@ -1479,10 +1496,14 @@ static void GRAPH_WRLOCK bdrv_child_cb_detach(BdrvChild 
*child)
 }

 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
- const char *filename, Error **errp)
+ const char *filename,
+ bool backing_file_format_no_protocol,
+ Error **errp)
 {
 if (c->role & BDRV_CHILD_COW) {
-return bdrv_backing_update_filename(c, base, filename, errp);
+return bdrv_backing_update_filename(c, base, filename,
+backing_file_format_no_protocol,
+errp);
 }
 return 0;
 }
@@ -5961,7 +5982,8 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base)
  *
  */
 int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
-   const char *backing_file_str)
+   const char *backing_file_str,
+   bool backing_file_format_no_protocol)
 {
 BlockDriverState *explicit_top = top;
 bool update_inherits_from;
@@ -6027,6 +6049,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,

 if (c->klass->update_filename) {
 ret = c->klass->update_filename(c, base, backing_file_str,
+backing_file_format_no_protocol,
 _err);
 if (ret < 0) {
 /*
diff --git a/block/commit.c b/block/commit.c
index 69cc75be0c..5a584b712e 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -42,6 +42,7 @@ typedef struct CommitBlockJob {
 bool base_read_only;
 bool chain_frozen;
 char *backing_file_str;
+bool backing_file_format_no_protocol;
 } CommitBlockJob;

 sta

[PATCH 2/2] block: stream: Allow users to request only format driver names in backing file format

2023-11-24 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-stream QMP command which instructs the internals to use 'raw'
instead of the protocol driver name in the case when an image is used
without a dummy 'raw' wrapper.

The flag is designed such that it can always be asserted by management
tools, even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even after libvirt stops using the dummy
raw driver (a raw driver with no other configuration). Libvirt needs
this so that the images stay compatible with older libvirt versions,
which didn't expect that a protocol driver name could appear in the
backing file format field.

Signed-off-by: Peter Krempa 
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 +-
 blockdev.c |  7 +++
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 ++-
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..28e708a981 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -509,7 +509,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 const char *base = qdict_get_try_str(qdict, "base");
 int64_t speed = qdict_get_try_int(qdict, "speed", 0);

-qmp_block_stream(device, device, base, NULL, NULL, NULL,
+qmp_block_stream(device, device, base, NULL, NULL, NULL, false, false,
  qdict_haskey(qdict, "speed"), speed,
  true, BLOCKDEV_ON_ERROR_REPORT, NULL,
  false, false, false, false, );
diff --git a/block/stream.c b/block/stream.c
index 01fe7c0f16..42befd6b1d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -39,6 +39,7 @@ typedef struct StreamBlockJob {
 BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
+bool backing_file_format_no_protocol;
 bool bs_read_only;
 } StreamBlockJob;

@@ -95,7 +96,12 @@ static int stream_prepare(Job *job)
 if (unfiltered_base) {
 base_id = s->backing_file_str ?: unfiltered_base->filename;
 if (unfiltered_base->drv) {
-base_fmt = unfiltered_base->drv->format_name;
+if (s->backing_file_format_no_protocol &&
+unfiltered_base->drv->protocol_name) {
+base_fmt = "raw";
+} else {
+base_fmt = unfiltered_base->drv->format_name;
+}
 }
 }

@@ -247,6 +253,7 @@ static const BlockJobDriver stream_job_driver = {

 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  bool backing_file_format_no_protocol,
   BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
@@ -398,6 +405,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->base_overlay = base_overlay;
 s->above_base = above_base;
 s->backing_file_str = g_strdup(backing_file_str);
+s->backing_file_format_no_protocol = backing_file_format_no_protocol;
 s->cor_filter_bs = cor_filter_bs;
 s->target_bs = bs;
 s->bs_read_only = bs_read_only;
diff --git a/blockdev.c b/blockdev.c
index 038031bb03..dc477c4f7e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2408,6 +2408,8 @@ void qmp_block_stream(const char *job_id, const char 
*device,
   const char *base,
   const char *base_node,
   const char *backing_file,
+  bool has_backing_file_format_no_protocol,
+  bool backing_file_format_no_protocol,
   const char *bottom,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
@@ -2443,6 +2445,10 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 return;
 }

+if (!has_backing_file_format_no_protocol) {
+backing_file_format_no_protocol = false;
+}
+
 if (!has_on_error) {
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
@@ -2531,6 +2537,7 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 }

 stream_start(job_id, bs, base_bs, backing_file,
+ backing_file_format_no_protocol,
  bottom_bs, job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, _err);
 if (local_err) {
diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index 4f253ff362..4301061048 100644
--- a/include/block/block_int-global-state.h
+++ b/include

[PATCH 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2023-11-24 Thread Peter Krempa
Please see patches for rationale.

Libvirt patches using this new flag will be posted soon-ish (after
cleanup).

Peter Krempa (2):
  block: commit: Allow users to request only format driver names in
backing file format
  block: stream: Allow users to request only format driver names in
backing file format

 block.c| 37 +-
 block/commit.c |  6 -
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 ++-
 blockdev.c | 13 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  6 +
 qapi/block-core.json   | 22 +--
 tests/unit/test-bdrv-drain.c   |  3 ++-
 10 files changed, 91 insertions(+), 15 deletions(-)

-- 
2.43.0




Re: [PATCH] block/blkio: Fix inclusion of required headers

2023-01-23 Thread Peter Krempa
On Mon, Jan 23, 2023 at 15:01:37 +0100, Markus Armbruster wrote:
> Peter Krempa  writes:
> 
> > After recent header file inclusion rework the build fails when the blkio
> > module is enabled:
> >
> > ../block/blkio.c: In function ‘blkio_detach_aio_context’:
> > ../block/blkio.c:321:24: error: implicit declaration of function 
> > ‘bdrv_get_aio_context’; did you mean ‘qemu_get_aio_context’? 
> > [-Werror=implicit-function-declaration]
> >   321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
> >   |^~~~
> >   |qemu_get_aio_context
> > ../block/blkio.c:321:24: error: nested extern declaration of 
> > ‘bdrv_get_aio_context’ [-Werror=nested-externs]
> > ../block/blkio.c:321:24: error: passing argument 1 of ‘aio_set_fd_handler’ 
> > makes pointer from integer without a cast [-Werror=int-conversion]
> >   321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
> >   |^~~~
> >   ||
> >   |int
> > In file included from /home/pipo/git/qemu.git/include/qemu/job.h:33,
> >  from /home/pipo/git/qemu.git/include/block/blockjob.h:30,
> >  from 
> > /home/pipo/git/qemu.git/include/block/block_int-global-state.h:28,
> >  from /home/pipo/git/qemu.git/include/block/block_int.h:27,
> >  from ../block/blkio.c:13:
> > /home/pipo/git/qemu.git/include/block/aio.h:476:37: note: expected 
> > ‘AioContext *’ but argument is of type ‘int’
> >   476 | void aio_set_fd_handler(AioContext *ctx,
> >   | ^~~
> > ../block/blkio.c: In function ‘blkio_file_open’:
> > ../block/blkio.c:821:34: error: passing argument 2 of 
> > ‘blkio_attach_aio_context’ makes pointer from integer without a cast 
> > [-Werror=int-conversion]
> >   821 | blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
> >   |  ^~~~
> >   |  |
> >   |  int
> >
> 
> My apologies...
> 
> Why are modules disabled by default?

libblkio is too new and is not yet widely packaged. IIUC it's
supposed to be included e.g. in Fedora 38, so you most likely don't have
the dependency.

I actually installed it explicitly since I wanted to give it a try.




[PATCH] block/blkio: Fix inclusion of required headers

2023-01-23 Thread Peter Krempa
After recent header file inclusion rework the build fails when the blkio
module is enabled:

../block/blkio.c: In function ‘blkio_detach_aio_context’:
../block/blkio.c:321:24: error: implicit declaration of function 
‘bdrv_get_aio_context’; did you mean ‘qemu_get_aio_context’? 
[-Werror=implicit-function-declaration]
  321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
  |^~~~
  |qemu_get_aio_context
../block/blkio.c:321:24: error: nested extern declaration of 
‘bdrv_get_aio_context’ [-Werror=nested-externs]
../block/blkio.c:321:24: error: passing argument 1 of ‘aio_set_fd_handler’ 
makes pointer from integer without a cast [-Werror=int-conversion]
  321 | aio_set_fd_handler(bdrv_get_aio_context(bs),
  |^~~~
  ||
  |int
In file included from /home/pipo/git/qemu.git/include/qemu/job.h:33,
 from /home/pipo/git/qemu.git/include/block/blockjob.h:30,
 from 
/home/pipo/git/qemu.git/include/block/block_int-global-state.h:28,
 from /home/pipo/git/qemu.git/include/block/block_int.h:27,
 from ../block/blkio.c:13:
/home/pipo/git/qemu.git/include/block/aio.h:476:37: note: expected ‘AioContext 
*’ but argument is of type ‘int’
  476 | void aio_set_fd_handler(AioContext *ctx,
  | ^~~
../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:821:34: error: passing argument 2 of 
‘blkio_attach_aio_context’ makes pointer from integer without a cast 
[-Werror=int-conversion]
  821 | blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
  |  ^~~~
  |  |
  |  int

Fix it by including 'block/block-io.h' which contains the required
declarations.

Fixes: e2c1c34f139f49ef909bb4322607fb8b39002312
Signed-off-by: Peter Krempa 
---
 block/blkio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blkio.c b/block/blkio.c
index 5eae3adfaf..6ad86b23d1 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -19,6 +19,8 @@
 #include "qemu/module.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */

+#include "block/block-io.h"
+
 /*
  * Keep the QEMU BlockDriver names identical to the libblkio driver names.
  * Using macros instead of typing out the string literals avoids typos.
-- 
2.38.1




Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-08-03 Thread Peter Krempa
On Wed, Jul 27, 2022 at 21:33:40 +0200, Kevin Wolf wrote:
> Am 08.07.2022 um 06:17 hat Stefan Hajnoczi geschrieben:
> > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > high-performance disk I/O. It currently supports io_uring and
> > virtio-blk-vhost-vdpa with additional drivers under development.
> > 
> > One of the reasons for developing libblkio is that other applications
> > besides QEMU can use it. This will be particularly useful for
> > vhost-user-blk which applications may wish to use for connecting to
> > qemu-storage-daemon.
> > 
> > libblkio also gives us an opportunity to develop in Rust behind a C API
> > that is easy to consume from QEMU.
> > 
> > This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
> > using libblkio. It will be easy to add other libblkio drivers since they
> > will share the majority of code.
> > 
> > For now I/O buffers are copied through bounce buffers if the libblkio
> > driver requires it. Later commits add an optimization for
> > pre-registering guest RAM to avoid bounce buffers.
> > 
> > The syntax is:
> > 
> >   --blockdev 
> > io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off
> > 
> > and:
> > 
> >   --blockdev 
> > virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The subject line implies only io_uring, but you actually add vhost-vdpa
> support, too. I think the subject line should be changed.
> 
> I think it would also make sense to already implement support for
> vhost-user-blk on the QEMU side even if support isn't compiled in
> libblkio by default and opening vhost-user-blk images would therefore
> always fail with a default build.
> 
> But then you could run QEMU with a custom build of libblkio to make use
> of it without patching QEMU. This is probably useful for getting libvirt
> support for using a storage daemon implemented without having to wait
> for another QEMU release. (Peter, do you have any opinion on this?)

How will this work in terms of detecting whether that feature is
present?

The issue is that libvirt caches qemu's capabilities, and the cache is
invalidated based on the timestamp of the qemu binary (and a few other,
mostly host kernel and CPU, properties). When a backend library is
updated or changed, this probably means that libvirt will not be able to
detect that qemu gained support.

And if qemu lies about the support even though the backend library
doesn't support it, then we have the problem of not even being able to
see whether we can use it.




Re: [PATCH v2 00/15] qdev: Add JSON -device

2021-10-15 Thread Peter Krempa
On Fri, Oct 08, 2021 at 15:34:27 +0200, Kevin Wolf wrote:
> It's still a long way until we'll have QAPIfied devices, but there are
> some improvements that we can already make now to make the future switch
> easier.
> 
> One important part of this is having code paths without QemuOpts, which
> we want to get rid of and replace with the keyval parser in the long
> run. This series adds support for JSON syntax to -device, which bypasses
> QemuOpts.
> 
> While we're not using QAPI yet, devices are based on QOM, so we already
> do have type checks and an implied schema. JSON syntax supported now can
> be supported by QAPI later and regarding command line compatibility,
> actually switching to it becomes an implementation detail this way (of
> course, it will still add valuable user-visible features like
> introspection and documentation).
> 
> Apart from making things more future proof, this also immediately adds
> a way to do non-scalar properties on the command line. nvme could have
> used list support recently, and the lack of it in -device led to some
> rather unnatural solution in the first version (doing the relationship
> between a device and objects backwards) and loss of features in the
> following. With this series, using a list as a device property should be
> possible without any weird tricks.
> 
> Unfortunately, even QMP device_add goes through QemuOpts before this
> series, which destroys any type safety QOM provides and also can't
> support non-scalar properties. This is a bug, but it turns out that
> libvirt actually relies on it and passes only strings for everything.
> So this series still leaves device_add alone until libvirt is fixed.

I've tested hotplug and standard -device with many (not all) devices
libvirt uses, and it seems that both '-device JSON' and the monitor work
properly.

I'll enable '-device JSON' in libvirt once this hits upstream.
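
For reference, a sketch of the syntax this enables (the device and its
properties are just an example):

  -device '{"driver":"virtio-blk-pci","drive":"disk0","bootindex":1}'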

> 
> v2:
> - Drop type safe QMP device_add because libvirt gets it wrong, too

I've actually applied this patch too, to make sure that libvirt works
properly in both modes. It now works well with this too.

I'm not sure about the compatibility promise here, but libvirt is now
prepared even for the strict version of 'device_add'. Libvirt's
requirements state that new libvirt needs to be used with new qemu, and
thus from our view it's okay to do it even now.

> - More network patches to eliminate dependencies on the legacy QemuOpts
>   data structures which would not contain all devices any more after
>   this series. Fix some bugs there as a side effect.
> - Remove an unnecessary use of ERRP_GUARD()
> - Replaced error handling patch for qdev_set_id() with Damien's
> - Drop the deprecation patch until libvirt uses the new JSON syntax

Series:

Tested-by: Peter Krempa 




Re: [PATCH 09/11] qdev: Avoid QemuOpts in QMP device_add

2021-10-01 Thread Peter Krempa
On Fri, Sep 24, 2021 at 11:04:25 +0200, Kevin Wolf wrote:
> Directly call qdev_device_add_from_qdict() for QMP device_add instead of
> first going through QemuOpts and converting back to QDict.
> 
> Note that this changes the behaviour of device_add, though in ways that
> should be considered bug fixes:
> 
> QemuOpts ignores differences between data types, so you could
> successfully pass a string "123" for an integer property, or a string
> "on" for a boolean property (and vice versa).  After this change, the
> correct data type for the property must be used in the JSON input.
> 
> qemu_opts_from_qdict() also silently ignores any options whose value is
> a QDict, QList or QNull.

Sorry for chiming in a bit late, but preferably this commit should be
postponed to at least the next release so that we decrease the number of
libvirt users broken by this.

Granted, users are supposed to use new libvirt with new qemu, but that's
not always the case.

Anyway, libvirt is currently mangling all the types to strings with
device_add. I'm currently working on fixing it and it will hopefully be
done by next month's release, but preferably we should increase the
window of working combinations by postponing this until the next release.




Re: [PATCH for-6.1? v2 5/7] job: Add job_cancel_requested()

2021-08-04 Thread Peter Krempa
On Wed, Aug 04, 2021 at 12:34:31 +0200, Kevin Wolf wrote:
> We could in theory keep allowing redundant completion requests when the
> completion mode doesn't conflict, but I don't see the point of that.

I don't see either. Especially since ...



> Unless libvirt can actually issue multiple completion requests (again,
> this includes both (block-)job-complete and non-force block-job-cancel
> for mirror) for the same block job - Peter, I hope it doesn't?

... the regular job completion code in libvirt which is meant for user
interaction (qemuDomainBlockJobAbort) has the following interlock:

if (job->state == QEMU_BLOCKJOB_STATE_ABORTING ||
job->state == QEMU_BLOCKJOB_STATE_PIVOTING) {
virReportError(VIR_ERR_OPERATION_INVALID,
   _("block job on disk '%s' is still being ended"),
   disk->dst);
goto endjob;
}

.. the other two uses of block jobs are internal: handling migration
with non-shared storage, where we also issue exactly one cancel request,
and backup jobs, where we too make sure to cancel just once.

As such it's okay to forbid the case you are mentioning.




Arbitrary truncating of SCSI disk serial number

2021-06-03 Thread Peter Krempa
Hi,

recently I got a report that an upgrade of libvirt (and qemu) caused
a guest-visible change in the SCSI disk identification when a very long
serial number is used.

I've traced it back to the point where libvirt started using the
'device_id=' property of the SCSI disk to pass in the disk's alias when
no serial is configured, and the serial when it is.

https://gitlab.com/libvirt/libvirt/-/commit/a1dce96236f6d35167924fa7e6a70f58f394b23c

The change is caused by the fact that when serial is configured via the
'serial=' property it's being silently truncated.

Now there are two distinct VPD pages which report the serial number:

0x83 - device identification

 In the beginning this one reported only the device alias, but starting
 from qemu commit:

   commit fd9307912d0a2ffa0310f9e20935d96d5af0a1ca
   Author: Paolo Bonzini 
   Date:   Fri Mar 16 19:12:43 2012 +0100

   scsi: copy serial number into VPD page 0x83

   Currently QEMU passes the qdev device id to the guest in an ASCII-string
   designator in page 0x83.  While this is fine, it does not match what
   real hardware does; usually the ASCII-string designator there hosts
   another copy of the serial number (there can be other designators,
   for example with a world-wide name).  Do the same for QEMU SCSI
   disks.

   ATAPI does not support VPD pages, so it does not matter there.
   Signed-off-by: Paolo Bonzini 

 it reports the serial number instead of the device alias when the
 serial is configured. Now this historically copied the IDE(?) limit of
 20 characters.

 Now, with the change to use 'device_id', which overrides that behaviour,
 the length of the reported value is limited only by the technical limit
 of 255-8, which creates the problem.

 Libvirt uses 'device_id' because when -blockdev is used, the disk alias
 that was configured via -drive is no longer present and thus would be
 missing.

 Libvirt also (unfortunately in this case I'd say) started to pass the
 serial number via this property.

0x80 - device serial (optional, only when serial is configured)

 Similarly to the 0x83 page, this one started out reporting the serial
 truncated to 20 characters, but later, in commit:

   commit 48b6206305b8d56524ac2ee347b68e6e0a528559
   Author: Rony Weng 
   Date:   Mon Aug 29 15:52:18 2016 +0800

   scsi-disk: change disk serial length from 20 to 36

   Openstack Cinder assigns volume a 36 characters uuid as serial.
   QEMU will shrinks the uuid to 20 characters, which does not match
   the original uuid.

   Note that there is no limit to the length of the serial number in
   the SCSI spec.  20 was copy-pasted from virtio-blk which in turn was
   copy-pasted from ATA; 36 is even more arbitrary.  However, bumping it
   up too much might cause issues (e.g. 252 seems to make sense because
   then the maximum amount of returned data is 256; but who knows there's
   no off-by-one somewhere for such a nicely rounded number).

   Signed-off-by: Rony Weng 
   Message-Id: <1472457138-23386-1-git-send-email-ronyw...@synology.com>
   Cc: qemu-sta...@nongnu.org
   Signed-off-by: Paolo Bonzini 

  qemu actually changed the silent truncation to another arbitrary value
  of 36, which is the length of a UUID.

  Thus qemu isn't innocent either in this regard.

Now, based on the fact that the above-mentioned libvirt commit is
contained in libvirt-5.1 (and qemu-4.0 added support for 'device_id'),
reverting to truncation to 20 characters would IMO also be considered a
regression, given that there are users who got qemu changed to lessen
the truncation.

As such I don't think libvirt should revert to using the truncated
serial despite the ABI change.

On the other hand QEMU should IMO:
1) unify the truncation to a single length, preferably the technical
   limit
2) add a possibility to report an error when the serial is too long
   (libvirt can accept a new property, for example)

I'm open to other suggestions though.
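
To make it concrete, a sketch of where the two values end up (property
values elided):

  # serial= feeds the VPD pages and is currently silently truncated
  # (20, later 36 characters, as described above); device_id= overrides
  # the ASCII-string designator in VPD page 0x83 (limit 255-8)
  -device scsi-hd,drive=disk0,serial=...,device_id=...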

Thanks.

Peter




Re: [PATCH] block: Drop the sheepdog block driver

2021-05-03 Thread Peter Krempa
On Sat, May 01, 2021 at 09:57:47 +0200, Markus Armbruster wrote:
> It was deprecated in commit e1c4269763, v5.2.0.  See that commit
> message for rationale.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  docs/system/deprecated.rst |9 -
>  docs/system/device-url-syntax.rst.inc  |   18 -
>  docs/system/qemu-block-drivers.rst.inc |   69 -
>  docs/system/removed-features.rst   |7 +
>  configure  |   10 -
>  meson.build|1 -
>  qapi/block-core.json   |   93 +-
>  qapi/transaction.json  |8 +-
>  block/sheepdog.c   | 3356 
>  .gitlab-ci.yml |1 -
>  MAINTAINERS|6 -
>  block/meson.build  |1 -
>  block/trace-events |   14 -
>  tests/qemu-iotests/005 |5 -
>  tests/qemu-iotests/025 |2 +-
>  tests/qemu-iotests/check   |3 +-
>  tests/qemu-iotests/common.rc   |4 -
>  17 files changed, 14 insertions(+), 3593 deletions(-)
>  delete mode 100644 block/sheepdog.c

Libvirt will need to adjust one test case (lock it to qemu-6.0 test
data), but other than that, this change is okay with us.

ACKed-by: Peter Krempa 




Re: [PATCH] virtio-blk: drop deprecated scsi=on|off property

2021-04-30 Thread Peter Krempa
On Fri, Apr 30, 2021 at 09:42:05 +0200, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > On Thu, Apr 29, 2021 at 04:52:21PM +0100, Stefan Hajnoczi wrote:
> >> The scsi=on|off property was deprecated in QEMU 5.0 and can be removed
> >> completely at this point.
> >> 
> >> Drop the scsi=on|off option. It was only available on Legacy virtio-blk
> >> devices. Linux v5.6 already dropped support for it.
> >> 
> >> Remove the hw_compat_2_4[] property assignment since scsi=on|off no
> >> longer exists. Old guests with Legacy virtio-blk devices no longer see
> >> the SCSI host features bit.
> >> 
> >
> > This means pc-2.4 will now break guest ABI if using virtio-blk
> > devices, correct?
> >
> > This looks like a sign we should have deprecated pc-2.4 a long
> > time ago.
> 
> The last batch of PC machine type retiring was pc-1.0 to pc-1.3:
> deprecated in 5.0 (commit 30d2a17b4, Dec 2019), dropped in 6.0 (commit
> f862ddbb1, just weeks ago).  pc-1.3 was a bit over seven years old when
> we released 5.0.  pc-2.4 will be six years old by the time we release
> 6.1.  Fair game?

As a data-point, libvirt will be dropping support for 

Re: [PATCH] virtio-blk: drop deprecated scsi=on|off property

2021-04-29 Thread Peter Krempa
On Thu, Apr 29, 2021 at 16:52:21 +0100, Stefan Hajnoczi wrote:
> The scsi=on|off property was deprecated in QEMU 5.0 and can be removed
> completely at this point.
> 
> Drop the scsi=on|off option. It was only available on Legacy virtio-blk
> devices. Linux v5.6 already dropped support for it.
> 
> Remove the hw_compat_2_4[] property assignment since scsi=on|off no
> longer exists. Old guests with Legacy virtio-blk devices no longer see
> the SCSI host features bit.

Does this mean that qemu rejects it if it's explicitly enabled on the
commandline? 

> Live migrating old guests from an old QEMU with the SCSI feature bit
> enabled will fail with "Features 0x... unsupported. Allowed features:
> 0x...". We've followed the QEMU deprecation policy so users have been
> warned...
> 
> I have tested that libvirt still works when the property is absent. It
> no longer adds scsi=on|off to the command-line.

Yup, we deliberately don't format it unless the user requested it since:

https://gitlab.com/libvirt/libvirt/-/commit/ec69f0190be731d12faeac08dbf63325836509a9

Depending on your answer above I might need to dig through the code
again to see whether we do the correct thing if it's no longer
available.
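
For reference, the availability check would boil down to probing the
device's property list, roughly like this (a sketch in the style of the
iotests helpers; 'vm.qmp' is assumed to be a synchronous QMP wrapper):

# Sketch: detect whether virtio-blk-pci still exposes the 'scsi' property.
def virtio_blk_has_scsi(vm) -> bool:
    result = vm.qmp('device-list-properties', typename='virtio-blk-pci')
    return 'scsi' in {prop['name'] for prop in result['return']}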

> 
> Cc: Markus Armbruster 
> Cc: Christoph Hellwig 
> Cc: Peter Krempa 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/specs/tpm.rst   |   2 +-
>  docs/system/deprecated.rst   |  13 ---
>  docs/pci_expander_bridge.txt |   2 +-
>  hw/block/virtio-blk.c| 192 +--
>  hw/core/machine.c|   2 -
>  5 files changed, 3 insertions(+), 208 deletions(-)




Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Peter Krempa
On Fri, Apr 23, 2021 at 15:59:00 +0300, Vladimir Sementsov-Ogievskiy wrote:
> Modern way is using blockdev-add + blockdev-backup, which provides a
> lot more control on how target is opened.
> 
> As example of drive-backup problems consider the following:
> 
> User of drive-backup expects that target will be opened in the same
> cache and aio mode as source. Corresponding logic is in
> drive_backup_prepare(), where we take bs->open_flags of source.
> 
> It works rather bad if source was added by blockdev-add. Assume source
> is qcow2 image. On blockdev-add we should specify aio and cache options
> for file child of qcow2 node. What happens next:
> 
> drive_backup_prepare() looks at bs->open_flags of qcow2 source node.
> But there no BDRV_O_NOCAHE neither BDRV_O_NATIVE_AIO: BDRV_O_NOCAHE is
> places in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
> as file-posix parse options and simply set s->use_linux_aio.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all! I remember, I suggested to deprecate drive-backup some time ago,
> and nobody complain.. But that old patch was inside the series with
> other more questionable deprecations and it did not landed.
> 
> Let's finally deprecate what should be deprecated long ago.
> 
> We now faced a problem in our downstream, described in commit message.
> In downstream I've fixed it by simply enabling O_DIRECT and linux_aio
> unconditionally for drive_backup target. But actually this just shows
> that using drive-backup in blockdev era is a bad idea. So let's motivate
> everyone (including Virtuozzo of course) to move to new interfaces and
> avoid problems with all that outdated option inheritance.

libvirt never used 'drive-backup' thus

Reviewed-by: Peter Krempa 




Re: issuing [block-]job-complete to jobs in STANDBY state

2021-04-06 Thread Peter Krempa
On Thu, Apr 01, 2021 at 15:02:35 -0400, John Snow wrote:
> Hi; downstream we've run into an issue where VMs under heavy load with many
> simultaneously concurrent block jobs running might occasionally flicker into
> the STANDBY state, during which time they will be unable to receive JOB
> COMPLETE commands. I assume this flicker is due to
> child_job_drained_begin().
> 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> 
> It's safe to just retry this operation again, but it may be difficult to
> understand WHY the job is paused at the application level, since the flush
> event may be asynchronous and unpredictable.
> 
> We could define a transition to allow COMPLETE to be applied to STANDBY
> jobs, but is there any risk or drawback to doing so? On QMP's side, we do
> know the difference between a temporary pause and a user pause/error pause
> (Both use the user_pause flag.)

I think qemu should allow that, because otherwise it's like playing
whack-a-mole. Libvirt could delay the completion until the 'standby'
state is over and the job is running again, but that would be racy.

Additionally, we could even hit the standby state before the state
transition would be processed internally, and the error message is too
generic to attempt to decide what to do based on it. (We do want to
report real errors)
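
For illustration, this is roughly the retry dance a client is forced
into today (a minimal sketch; the helper name, retry count and delay
are made up, and 'vm.qmp' is assumed to be a synchronous QMP wrapper in
the style of the iotests):

import time

# Sketch: retry 'job-complete' while the job flickers through 'standby'.
def complete_job(vm, job_id: str, retries: int = 10) -> None:
    for _ in range(retries):
        result = vm.qmp('job-complete', id=job_id)
        if 'error' not in result:
            return
        # The error is too generic to tell a transient drain-induced
        # pause apart from a real failure -- which is the problem.
        time.sleep(0.1)
    raise RuntimeError(f"job '{job_id}' could not be completed")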




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-12 Thread Peter Krempa
On Fri, Mar 12, 2021 at 09:46:54 +0100, Paolo Bonzini wrote:
> On 12/03/21 09:14, Markus Armbruster wrote:
> > Paolo Bonzini  writes:
> > 
> > > On 11/03/21 15:08, Markus Armbruster wrote:
> > > > > I would rather keep the OptsVisitor here.  Do the same check for JSON
> > > > > syntax that you have in qobject_input_visitor_new_str, and whenever
> > > > > you need to walk all -object arguments, use something like this:
> > > > > 
> > > > >   typedef struct ObjectArgument {
> > > > >   const char *id;
> > > > >   QDict *json;/* or NULL for QemuOpts */
> > > > >   QSIMPLEQ_ENTRY(ObjectArgument) next;
> > > > >   }
> > > > > 
> > > > > I already had patches in my queue to store -object in a GSList of
> > > > > dictionaries, changing it to use the above is easy enough.
> > > > 
> > > > I think I'd prefer following -display's precedence.  See my reply to
> > > > Kevin for details.
> > > 
> > > Yeah, I got independently to the same conclusion and posted patches
> > > for that.  I was scared that visit_type_ObjectOptions was too much for
> > > OptsVisitor but it seems to work...
> > 
> > We have reason to be scared.  I'll try to cover this in my review.
> 
> Yes, it's a good reason to possibly even delay those 3 patches to 6.1.

Is there a chance we could get the JSON syntax for -object for now? I
think it would simplify libvirt's code a bit and sidestep the issue of
converting the parameters we already have in JSON form into the
command-line form.




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-11 Thread Peter Krempa
On Thu, Mar 11, 2021 at 12:41:42 +0100, Kevin Wolf wrote:
> Am 11.03.2021 um 12:24 hat Peter Krempa geschrieben:
> > On Thu, Mar 11, 2021 at 09:37:11 +0100, Kevin Wolf wrote:
> > > Am 11.03.2021 um 08:47 hat Peter Krempa geschrieben:
> > > > On Wed, Mar 10, 2021 at 18:30:44 +0100, Kevin Wolf wrote:
> > > > > Am 10.03.2021 um 15:31 hat Paolo Bonzini geschrieben:
> > > > > > On 10/03/21 15:22, Peter Krempa wrote:
> > 
> > [...]
> > 
> > > > -object 
> > > > memory-backend-ram,id=ram-node2,size=24578621440,host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind
> > > 
> > > Oh, we have ranges, too... That makes the compatibility code even
> > > nastier to write. I doubt that we can implement this in the keyval
> > > parser alone without touching the visitor. :-/
> > > 
> > > > If any of the above is to be deprecated we'll need to adjust our
> > > > JSON->commandline generator accordignly.
> > > > 
> > > > Luckily the 'host-nodes' is storeable as a bitmap and the generator is
> > > > actually modular to allow plugging an array interpretor which actually
> > > > does the above conversion from a JSON array.
> > > > 
> > > > So, what is the preferred syntax here? Additionally we might need a
> > > > witness property to detect when to use the new syntax if basing it on
> > > > the object-add qapification will not be enough.
> > > 
> > > The list syntax supported by the keyval parser is the one you know from
> > > -blockdev: host-nodes.0=3,host-nodes.1=7, ...
> > 
> > I think that should be easy enough to convert to.
> 
> We could also support JSON syntax in QEMU. That would probably be even
> more convenient for libvirt?

Definitely yes. We already have the JSON internal representation, so
outputting JSON directly just skips the commandlinificator.


> > Is it safe to do right away (with the old parser?). Otherwise we need
> > to agree on something which will let us detect when it's safe to
> > change.
> 
> Neither keyval nor JSON syntax work with the old QemuOpts parser.
> 
> What is the usual way to do this for command line options? If we don't
> have a good way there, we can always tie it to something in the QAPI
> schema. If we still get this solved in time for 6.0, we could use the
> existence of ObjectOptions. Or all else failing, we can use a feature
> flag.

Yup, in this release I'd use what I have for detecting qapification of
-object. If we can do JSON with this, it would be ideal.
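
For illustration, with -object accepting JSON the libvirt side would
reduce to serializing the internal representation (a sketch only; the
exact accepted syntax, including the 'qom-type' key, was still to be
decided at this point):

import json

# Sketch: emit an -object argument straight from the JSON-ish internal
# representation, assuming -object accepts JSON the way -blockdev does.
def object_arg(props: dict) -> list:
    return ['-object', json.dumps(props)]

# e.g. for the memory backend from the regression report in this thread:
print(object_arg({'qom-type': 'memory-backend-ram', 'id': 'pc.ram',
                  'size': 1048576000, 'host-nodes': [0],
                  'policy': 'bind'}))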




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-11 Thread Peter Krempa
On Thu, Mar 11, 2021 at 09:37:11 +0100, Kevin Wolf wrote:
> Am 11.03.2021 um 08:47 hat Peter Krempa geschrieben:
> > On Wed, Mar 10, 2021 at 18:30:44 +0100, Kevin Wolf wrote:
> > > Am 10.03.2021 um 15:31 hat Paolo Bonzini geschrieben:
> > > > On 10/03/21 15:22, Peter Krempa wrote:

[...]

> > -object 
> > memory-backend-ram,id=ram-node2,size=24578621440,host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind
> 
> Oh, we have ranges, too... That makes the compatibility code even
> nastier to write. I doubt that we can implement this in the keyval
> parser alone without touching the visitor. :-/
> 
> > If any of the above is to be deprecated we'll need to adjust our
> > JSON->commandline generator accordignly.
> > 
> > Luckily the 'host-nodes' is storeable as a bitmap and the generator is
> > actually modular to allow plugging an array interpretor which actually
> > does the above conversion from a JSON array.
> > 
> > So, what is the preferred syntax here? Additionally we might need a
> > witness property to detect when to use the new syntax if basing it on
> > the object-add qapification will not be enough.
> 
> The list syntax supported by the keyval parser is the one you know from
> -blockdev: host-nodes.0=3,host-nodes.1=7, ...

I think that should be easy enough to convert to. Is it safe to do
right away (with the old parser)? Otherwise we need to agree on
something which will let us detect when it's safe to change.




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-10 Thread Peter Krempa
On Wed, Mar 10, 2021 at 18:30:44 +0100, Kevin Wolf wrote:
> Am 10.03.2021 um 15:31 hat Paolo Bonzini geschrieben:
> > On 10/03/21 15:22, Peter Krempa wrote:

[...]

> The keyval parser would create a list if multiple values are given for
> the same key. Some care needs to be taken to avoid mixing the magic
> list feature with the normal indexed list options.
> 
> The QAPI schema would then use an alternate between 'int' and ['int'],
> with the the memory-backend-ram implementation changed accordingly.
> 
> We could consider immediately deprecating the syntax and printing a
> warning in the keyval parser when it automatically creates a list from
> two values for a key, so that users don't start using this syntax

By 'creating a list from two values for a key' you mean:

host-nodes=0,host-nodes=1

to be converted into [0, 1] ?

> instead of the normal list syntax in other places. We'd probably still
> leave the implementation around for -device and other users of the same
> magic.

There are actually three options that libvirt uses, visible in one of
our test files [1].

For a single value we format:

-object memory-backend-ram,id=ram-node0,size=20971520,host-nodes=3,policy=preferred

For a contiguous list:

-object memory-backend-ram,id=ram-node1,size=676331520,host-nodes=0-7,policy=bind

And for an interleaved list:

-object memory-backend-ram,id=ram-node2,size=24578621440,host-nodes=1-2,host-nodes=5,host-nodes=7,policy=bind

If any of the above is to be deprecated we'll need to adjust our
JSON->commandline generator accordingly.

Luckily 'host-nodes' is storable as a bitmap, and the generator is
modular enough to allow plugging in an array interpreter which does the
above conversion from a JSON array.

So, what is the preferred syntax here? Additionally we might need a
witness property to detect when to use the new syntax if basing it on
the object-add qapification will not be enough.

[1] https://gitlab.com/libvirt/libvirt/-/blob/master/tests/qemuxml2argvdata/numatune-memnode.args
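
For completeness, the conversion to the dotted-index keyval syntax that
our generator would have to emit is trivial (a sketch; it assumes the
host-nodes set was already expanded from the single/contiguous/interleaved
forms above):

# Sketch: turn a set of host node indices into the keyval list syntax
# host-nodes.0=N,host-nodes.1=M,... mentioned earlier in the thread.
def host_nodes_keyval(nodes: set) -> str:
    return ','.join(f'host-nodes.{i}={n}'
                    for i, n in enumerate(sorted(nodes)))

# {1, 2, 5, 7} -> 'host-nodes.0=1,host-nodes.1=2,host-nodes.2=5,host-nodes.3=7'
print(host_nodes_keyval({1, 2, 5, 7}))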




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-10 Thread Peter Krempa
On Wed, Mar 10, 2021 at 15:31:57 +0100, Paolo Bonzini wrote:
> On 10/03/21 15:22, Peter Krempa wrote:
> > I've stumbled upon a regression with this patchset applied:
> > 
> > error: internal error: process exited while connecting to monitor: 
> > qemu-system-x86_64: -object 
> > memory-backend-ram,id=pc.ram,size=1048576000,host-nodes=0,policy=bind: 
> > Invalid parameter type for 'host-nodes', expected: array
> 
> This is the magic conversion of "string containing a list of integers" to
> "list of integers".
> 
> The relevant code is in qapi/string-input-visitor.c, but I'm not sure where
> to stick it in the keyval-based parsing flow (i.e.
> qobject_input_visitor_new_keyval).  Markus, any ideas?

I've got a slightly off-topic question/idea.

Would it be reasonably easy/straightforward to run qemu's init code,
which parses arguments and possibly validates them, but have it quit
before actually starting to initialize resources?

The use case would be to plug it (optionally?) into libvirt's
qemuxml2argvtest to prevent such a thing from happening.

It's not feasible to run all the configurations we have with a real VM
but a partial validation would possibly make sense if it's easy enough.




Re: [PATCH v3 00/30] qapi/qom: QAPIfy --object and object-add

2021-03-10 Thread Peter Krempa
On Mon, Mar 08, 2021 at 17:54:10 +0100, Kevin Wolf wrote:
> This series adds a QAPI type for the properties of all user creatable
> QOM types and finally makes the --object command line option (in all
> binaries) and the object-add monitor commands (in QMP and HMP) use the
> new ObjectOptions union.
> 
> This change improves things in more than just one way:
> 
> 1. Documentation for QOM object types has always been lacking. Adding
>the schema, we get documentation for every property.
> 
> 2. It prevents bugs by performing parts of the input validation (e.g.
>checking presence of mandatory properties) already in QAPI instead of
>relying on separate manual implementations in each class.
> 
> 3. It provides QAPI introspection for user creatable objects.
> 
> 4. Non-scalar properties are now supported everywhere because the
>command line parsers (including HMP) use the keyval parser now.
> 
> 
> If you are in the CC list and didn't expect this series, it's probably
> because you're the maintainer of one of the objects for which I'm adding
> a QAPI schema description. Please just have a look at the specific patch
> for your object and check whether the schema and its documentation make
> sense to you. You can ignore all other patches.
> 
> 
> In a next step after this series, we can add make use of the QAPI
> structs in the implementation of the object and separate their
> configuration from the runtime state. Specifically, the plan is to
> add a .configure() callback to ObjectClass that allows configuring the
> object in one place at creation time and keeping QOM property setters
> only for properties that can actually be changed at runtime. Paolo made
> an example of what the state could look like after this:
> 
> https://wiki.qemu.org/Features/QOM-QAPI_integration
> 
> Finally, the intention is to extend the QAPI schema to have separate
> 'object' entities and generate some of the code that was written
> manually in the intermediate state before.
> 
> 
> This series is available as a git tag at:
> 
> https://repo.or.cz/qemu/kevin.git qapi-object-v3
> 
> 
> v3:
> - Removed now useless QAuthZListRuleListHack
> - Made some more ObjectOptions branches conditional
> - Improved documentation for some properties
> - Fixed 'qemu-img compare' exit code for option parsing failure
> 
> v2:
> - Convert not only object-add, but all external interfaces so that the
>   schema will always be enforced and mismatch between implementation and
>   schema can't go unnoticed.
> - Rebased, covering properties and object types added since v1 (yes,
>   things do become outdated rather quickly when you touch all user
>   creatable objects)
> - Changed the "Since:" version number in the schema documentation to
>   refer to the version when the object was introduced rather than 6.0
>   where the schema will (hopefully) be added
> - Probably some other minor changes

I've stumbled upon a regression with this patchset applied:

error: internal error: process exited while connecting to monitor: 
qemu-system-x86_64: -object 
memory-backend-ram,id=pc.ram,size=1048576000,host-nodes=0,policy=bind: Invalid 
parameter type for 'host-nodes', expected: array

Full commandline is:

LC_ALL=C \
PATH=/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-15-cd \
USER=root \
LOGNAME=root \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-15-cd/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-15-cd/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-15-cd/.config \
/home/pipo/git/qemu.git/build/x86_64-softmmu/qemu-system-x86_64 \
-name guest=cd,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-15-cd/master-key.aes \
-machine pc-i440fx-2.9,accel=kvm,usb=off,vmport=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu EPYC-Rome,x2apic=on,tsc-deadline=on,hypervisor=on,tsc-adjust=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,cmp-legacy=on,amd-ssbd=on,virt-ssbd=on,rdctl-no=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on \
-m 1000 \
-object memory-backend-ram,id=pc.ram,size=1048576000,host-nodes=0,policy=bind \
-overcommit mem-lock=off \
-smp 2,maxcpus=8,sockets=8,cores=1,threads=1 \
-uuid 8e70100a-64b4-4186-aff9-e055c3075cb0 \
-no-user-config \
-nodefaults \
-device sga \
-chardev socket,id=charmonitor,fd=42,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot menu=on,strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x9 \

Re: [PATCH v2 1/3] fdc: Drop deprecated floppy configuration

2021-03-04 Thread Peter Krempa
On Thu, Mar 04, 2021 at 17:23:05 +0100, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Thu, Mar 04, 2021 at 03:26:55PM +0100, Markus Armbruster wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > On Thu, Mar 04, 2021 at 11:00:57AM +0100, Markus Armbruster wrote:
> >> >> Drop the crap deprecated in commit 4a27a638e7 "fdc: Deprecate
> >> >> configuring floppies with -global isa-fdc" (v5.1.0).
> >> >> 
> >> >> Signed-off-by: Markus Armbruster 
> >> >> ---
> >> >>  docs/system/deprecated.rst   |  26 --
> >> >>  docs/system/removed-features.rst |  26 ++
> >> >>  hw/block/fdc.c   |  54 +--
> >> >>  tests/qemu-iotests/172   |  31 +-
> >> >>  tests/qemu-iotests/172.out   | 562 +--
> >> >>  5 files changed, 30 insertions(+), 669 deletions(-)

[...]

> >> 
> >> Correct.
> >> 
> >> This was deprecated in commit 4a27a638e7 "fdc: Deprecate configuring
> >> floppies with -global isa-fdc" (v5.1.0).  Since then, its use triggers a
> >> warning:
> >> 
> >> $ qemu-system-x86_64 -nodefaults -M q35 -display none -drive 
> >> if=none,id=drive-fdc0-0-0 -device 
> >> isa-fdc,driveA=drive-fdc0-0-0,bootindexA=1
> >> qemu-system-x86_64: -device 
> >> isa-fdc,driveA=drive-fdc0-0-0,bootindexA=1: warning: warning: property 
> >> isa-fdc.driveA is deprecated
> >> Use -device floppy,unit=0,drive=... instead.
> >> 
> >> Note the -M q35.  Needed because the default machine type has an onboard
> >> isa-fdc, which cannot be configured this way.
> >> 
> >> Sadly, the commit's update of docs/system/deprecated.rst neglects to
> >> cover this use.  Looks the series overtaxed my capacity to juggle
> >> details; my apologies.
> >> 
> >> Is libvirt still using these properties?
> >
> > Unfortunately yes, but it seems like it ought to be fairly easy to
> > change the syntax. Just need to figure out what the right way to
> > detect the availability of the new syntax is. Presumably just look
> > for existance of the 'floppy' device type ?
> 
> Yes.  The device type was added in merge commit fd209e4a7, v2.8.0.
> 
> > Can you confirm that switching from -global to the new -device floppy
> > does /not/ have any live migration impact ?
> 
> Yes, it must not affect migration.
> 
> When Kevin split the floppy device type off the floppy controller, he
> had to add some moderately ugly hackery to keep the old qdev properties
> working.  Think propagate property values to floppy from controller,
> which otherwise ignores them.
> 
> The way you get the values into the floppy device cannot affect the
> migration data.  Only different values can.
> 
> This patch removes a deprecated way.

Note that when QEMU_CAPS_BLOCKDEV is asserted we format floppies as:

-blockdev '{"driver":"file","filename":"/tmp/firmware.img",\
"node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"raw",\
"file":"libvirt-2-storage"}' \
-device floppy,unit=0,drive=libvirt-2-format,id=fdc0-0-0 \
-blockdev '{"driver":"file","filename":"/tmp/data.img",\
"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2",\
"file":"libvirt-1-storage"}' \
-device floppy,unit=1,drive=libvirt-1-format,id=fdc0-0-1 \

as visible in the test file:

tests/qemuxml2argvdata/disk-floppy-q35-2_11.x86_64-latest.args

So libvirt should be in the clear. isa-fdc with driveA/driveB is
formatted only when the blockdev capability isn't present.




Re: [RFC PATCH v2 3/4] block: Support multiple reopening with x-blockdev-reopen

2021-03-01 Thread Peter Krempa
On Mon, Mar 01, 2021 at 12:07:26 +0100, Kevin Wolf wrote:
> Am 25.02.2021 um 18:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 24.02.2021 15:33, Kevin Wolf wrote:
> > > Am 09.02.2021 um 09:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > 08.02.2021 21:44, Alberto Garcia wrote:
> > > > > Signed-off-by: Alberto Garcia 
> > > > > ---
> > > > >qapi/block-core.json   |  2 +-
> > > > >include/block/block.h  |  1 +
> > > > >block.c| 16 +--
> > > > >blockdev.c | 85 
> > > > > +-
> > > > >tests/qemu-iotests/155 |  9 ++--
> > > > >tests/qemu-iotests/165 |  4 +-
> > > > >tests/qemu-iotests/245 | 27 +++-
> > > > >tests/qemu-iotests/248 |  2 +-
> > > > >tests/qemu-iotests/248.out |  2 +-
> > > > >tests/qemu-iotests/298 |  4 +-
> > > > >10 files changed, 89 insertions(+), 63 deletions(-)
> > > > > 
> > > > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > > > index c0e7c23331..b9fcf20a81 100644
> > > > > --- a/qapi/block-core.json
> > > > > +++ b/qapi/block-core.json
> > > > > @@ -4177,7 +4177,7 @@
> > > > ># Since: 4.0
> > > > >##
> > > > >{ 'command': 'x-blockdev-reopen',
> > > > > -  'data': 'BlockdevOptions', 'boxed': true }
> > > > > +  'data': { 'options': ['BlockdevOptions'] } }
> > > > 
> > > > Do we also want to drop x- prefix?
> > > 
> > > libvirt really wants to have a stable blockdev-reopen interface in 6.0
> > > because enabling the incremental backup code depends on this (they just
> > > toggle the readonly flag if I understand correctly, so most of the work
> > > we're currently doing isn't even relevant at this moment for libvirt).
> > 
> > Do you know what is the case exactly? If they do it to remove dirty bitmap
> > from backing image after snapshot operation, probably we'd better improve
> > block-dirty-bitmap-remove command to be able to reopen r-o image?
> > 
> > (I just recently faced such a task)
> 
> I think it was to switch nodes between read-only and read-write, but I
> don't remember the exact context.
> 
> Peter, can you add the details?

Libvirt indeed has code to drive blockdev-reopen to use it to toggle the
read-write state for block dirty bitmap manipulation.

Few of the cases where we use it:

- after blockjobs, to turn the destination image writable (the
  blockjob turns it RO when it finishes) so that we can merge the
  appropriate bitmaps

- for deletion of bitmaps in backing images in the checkpoint APIs

Both of these operations really require blockdev-reopen and are
required for the incremental backup feature as a whole, since bitmap
manipulation is crucial for it and doesn't happen automatically.

The blockdev code has the facilities to also use blockdev-reopen to
change the RO/RW state for blockjobs if that is needed in the future.
It could later even replace the auto-read-only feature of the protocol
node drivers, which was actually added to make -blockdev work properly
in libvirt's sVirt environment.
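
As a concrete example, the bitmap-deletion case boils down to a pair of
reopens around the bitmap command (a sketch against the list-based
x-blockdev-reopen proposed in this series; the node names are made up,
the option set is abbreviated, and 'vm.qmp' is the usual synchronous
wrapper):

# Sketch: temporarily reopen a qcow2 backing node read-write to delete
# a persistent dirty bitmap, then turn it read-only again.
def delete_bitmap_in_backing(vm, node: str, bitmap: str) -> None:
    # a real reopen must re-specify the node's full option set
    opts = {'driver': 'qcow2', 'node-name': node,
            'file': node + '-storage', 'read-only': False}
    vm.qmp('x-blockdev-reopen', options=[opts])
    vm.qmp('block-dirty-bitmap-remove', node=node, name=bitmap)
    opts['read-only'] = True
    vm.qmp('x-blockdev-reopen', options=[opts])  # back to read-only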




Re: [PATCH v2 00/31] qapi/qom: QAPIfy --object and object-add

2021-02-24 Thread Peter Krempa
On Wed, Feb 24, 2021 at 14:52:24 +0100, Kevin Wolf wrote:
> This series adds a QAPI type for the properties of all user creatable
> QOM types and finally makes the --object command line option (in all
> binaries) and the object-add monitor commands (in QMP and HMP) use the
> new ObjectOptions union.
> 
> This change improves things in more than just one way:
> 
> 1. Documentation for QOM object types has always been lacking. Adding
>the schema, we get documentation for every property.
> 
> 2. It prevents bugs by performing parts of the input validation (e.g.
>checking presence of mandatory properties) already in QAPI instead of
>relying on separate manual implementations in each class.
> 
> 3. It provides QAPI introspection for user creatable objects.
> 
> 4. Non-scalar properties are now supported everywhere because the
>command line parsers (including HMP) use the keyval parser now.

I've updated and posted another version of the libvirt patches, which
add testing that our generated props conform to the schema and also
deal with the dropped 'props' wrapper:

https://listman.redhat.com/archives/libvir-list/2021-February/msg01212.html

Libvirt's test pass after it without any change, so on behalf of libvirt

ACKed-by: Peter Krempa 




Re: [PATCH v3 1/3] migration: dirty-bitmap: Convert alias map inner members to BitmapMigrationBitmapAlias

2021-02-12 Thread Peter Krempa
On Fri, Feb 12, 2021 at 12:38:10 -0600, Eric Blake wrote:
> On 2/12/21 11:34 AM, Peter Krempa wrote:
> 
> Long subject line; if it's okay with you, I'd prefer to use:
> 
> migration: dirty-bitmap: Use struct for alias map inner members
> 
> > Currently the alias mapping hash stores just strings of the target
> > objects internally. In further patches we'll be adding another member
> > which will need to be stored in the map so pass a copy of the whole
> > BitmapMigrationBitmapAlias QAPI struct into the map.
> > 
> > Signed-off-by: Peter Krempa 
> > ---
> >  migration/block-dirty-bitmap.c | 30 +++---
> >  1 file changed, 19 insertions(+), 11 deletions(-)
> > 
> > Note that there's a very long line but there doesn't seem to be a
> > sensible point where to break it.
> 
> In other words, the patchew warning can be ignored if I can't reformat
> the line.
> 
> > +++ b/migration/block-dirty-bitmap.c
> > @@ -75,6 +75,8 @@
> >  #include "qemu/id.h"
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-commands-migration.h"
> > +#include "qapi/qapi-visit-migration.h"
> > +#include "qapi/clone-visitor.h"
> >  #include "trace.h"
> > 
> >  #define CHUNK_SIZE (1 << 10)
> > @@ -263,8 +265,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
> >  node_map_to = bmna->node_name;
> >  }
> > 
> > -bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal,
> > -g_free, g_free);
> > +bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal, g_free,
> > +(GDestroyNotify) qapi_free_BitmapMigrationBitmapAlias);
> 
> A possible fix: declare a temporary variable of type GDestroyNotify, so
> that assigning the variable uses a shorter line, then use that variable
> here.
> 
> > @@ -312,7 +312,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
> >  }
> > 
> >  g_hash_table_insert(bitmaps_map,
> > -g_strdup(bmap_map_from), g_strdup(bmap_map_to));
> > +g_strdup(bmap_map_from),
> 
> This line could be wrapped with the previous, now.
> 
> Reviewed-by: Eric Blake 
> 
> If you like, I can squash the following in before queuing.

Anything that gets the patches in :).

But honestly, nowadays an 80-column hard limit seems very prehistoric.
I understand that shorter lines are better, but if you need to hack
around the limit it IMO defeats the readability of the code anyway.




[PATCH v3 3/3] qemu-iotests: 300: Add test case for modifying persistence of bitmap

2021-02-12 Thread Peter Krempa
Verify that the modification of bitmap persistence over migration,
which is controlled via BitmapMigrationBitmapAliasTransform, works
properly.

Based on TestCrossAliasMigration

Signed-off-by: Peter Krempa 
---
 tests/qemu-iotests/300 | 91 ++
 tests/qemu-iotests/300.out |  4 +-
 2 files changed, 93 insertions(+), 2 deletions(-)

v3: new in series

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 43264d883d..9d4ec6a381 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -600,6 +600,97 @@ class TestCrossAliasMigration(TestDirtyBitmapMigration):
         self.verify_dest_has_all_bitmaps()
         self.verify_dest_error(None)
 
+class TestAliasTransformMigration(TestDirtyBitmapMigration):
+    """
+    Tests the 'transform' option which modifies bitmap persistence on
+    migration.
+    """
+
+    src_node_name = 'node-a'
+    dst_node_name = 'node-b'
+    src_bmap_name = 'bmap-a'
+    dst_bmap_name = 'bmap-b'
+
+    def setUp(self) -> None:
+        TestDirtyBitmapMigration.setUp(self)
+
+        # Now create another block device and let both have two bitmaps each
+        result = self.vm_a.qmp('blockdev-add',
+                               node_name='node-b', driver='null-co')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm_b.qmp('blockdev-add',
+                               node_name='node-a', driver='null-co')
+        self.assert_qmp(result, 'return', {})
+
+        bmaps_to_add = (('node-a', 'bmap-b'),
+                        ('node-b', 'bmap-a'),
+                        ('node-b', 'bmap-b'))
+
+        for (node, bmap) in bmaps_to_add:
+            result = self.vm_a.qmp('block-dirty-bitmap-add',
+                                   node=node, name=bmap)
+            self.assert_qmp(result, 'return', {})
+
+    @staticmethod
+    def transform_mapping() -> BlockBitmapMapping:
+        return [
+            {
+                'node-name': 'node-a',
+                'alias': 'node-a',
+                'bitmaps': [
+                    {
+                        'name': 'bmap-a',
+                        'alias': 'bmap-a',
+                        'transform':
+                        {
+                            'persistent': True
+                        }
+                    },
+                    {
+                        'name': 'bmap-b',
+                        'alias': 'bmap-b'
+                    }
+                ]
+            },
+            {
+                'node-name': 'node-b',
+                'alias': 'node-b',
+                'bitmaps': [
+                    {
+                        'name': 'bmap-a',
+                        'alias': 'bmap-a'
+                    },
+                    {
+                        'name': 'bmap-b',
+                        'alias': 'bmap-b'
+                    }
+                ]
+            }
+        ]
+
+    def verify_dest_bitmap_state(self) -> None:
+        bitmaps = self.vm_b.query_bitmaps()
+
+        for node in bitmaps:
+            bitmaps[node] = sorted(((bmap['name'], bmap['persistent'])
+                                    for bmap in bitmaps[node]))
+
+        self.assertEqual(bitmaps,
+                         {'node-a': [('bmap-a', True), ('bmap-b', False)],
+                          'node-b': [('bmap-a', False), ('bmap-b', False)]})
+
+    def test_transform_on_src(self) -> None:
+        self.set_mapping(self.vm_a, self.transform_mapping())
+
+        self.migrate()
+        self.verify_dest_bitmap_state()
+        self.verify_dest_error(None)
+
+    def test_transform_on_dst(self) -> None:
+        self.set_mapping(self.vm_b, self.transform_mapping())
+
+        self.migrate()
+        self.verify_dest_bitmap_state()
+        self.verify_dest_error(None)
 
 if __name__ == '__main__':
     iotests.main(supported_protocols=['file'])
diff --git a/tests/qemu-iotests/300.out b/tests/qemu-iotests/300.out
index cafb8161f7..12e9ab7d57 100644
--- a/tests/qemu-iotests/300.out
+++ b/tests/qemu-iotests/300.out
@@ -1,5 +1,5 @@
-.....................................
+.......................................
 ----------------------------------------------------------------------
-Ran 37 tests
+Ran 39 tests
 
 OK
-- 
2.29.2




[PATCH v3 0/3] migration: dirty-bitmap: Allow control of bitmap persistence

2021-02-12 Thread Peter Krempa
See 2/3 for explanation.

Peter Krempa (3):
  migration: dirty-bitmap: Convert alias map inner members to
BitmapMigrationBitmapAlias
  migration: dirty-bitmap: Allow control of bitmap persistence
  qemu-iotests: 300: Add test case for modifying persistence of bitmap

 migration/block-dirty-bitmap.c | 60 --
 qapi/migration.json| 19 ++-
 tests/qemu-iotests/300 | 91 ++
 tests/qemu-iotests/300.out |  4 +-
 4 files changed, 157 insertions(+), 17 deletions(-)

-- 
2.29.2




[PATCH v3 1/3] migration: dirty-bitmap: Convert alias map inner members to BitmapMigrationBitmapAlias

2021-02-12 Thread Peter Krempa
Currently the alias mapping hash stores just strings of the target
objects internally. In further patches we'll be adding another member
which will need to be stored in the map, so pass a copy of the whole
BitmapMigrationBitmapAlias QAPI struct into the map.

Signed-off-by: Peter Krempa 
---
 migration/block-dirty-bitmap.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

Note that there's a very long line but there doesn't seem to be a
sensible point where to break it.

v3:
 - use a copy of the BitmapMigrationBitmapAlias QAPI struct to do the
   mapping
 - dropped R-b's because the above is a significant change

v2:
 - NULL-check in freeing function (Eric)
 - style problems (Vladimir)


diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c61d382be8..0244f9bb1d 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -75,6 +75,8 @@
 #include "qemu/id.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
+#include "qapi/qapi-visit-migration.h"
+#include "qapi/clone-visitor.h"
 #include "trace.h"

 #define CHUNK_SIZE (1 << 10)
@@ -263,8 +265,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 node_map_to = bmna->node_name;
 }

-bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal,
-g_free, g_free);
+bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal, g_free,
+(GDestroyNotify) qapi_free_BitmapMigrationBitmapAlias);

 amin = g_new(AliasMapInnerNode, 1);
 *amin = (AliasMapInnerNode){
@@ -276,7 +278,7 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,

 for (bmbal = bmna->bitmaps; bmbal; bmbal = bmbal->next) {
 const BitmapMigrationBitmapAlias *bmba = bmbal->value;
-const char *bmap_map_from, *bmap_map_to;
+const char *bmap_map_from;

 if (strlen(bmba->alias) > UINT8_MAX) {
 error_setg(errp,
@@ -293,7 +295,6 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,

 if (name_to_alias) {
 bmap_map_from = bmba->name;
-bmap_map_to = bmba->alias;

 if (g_hash_table_contains(bitmaps_map, bmba->name)) {
 error_setg(errp, "The bitmap '%s'/'%s' is mapped twice",
@@ -302,7 +303,6 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 }
 } else {
 bmap_map_from = bmba->alias;
-bmap_map_to = bmba->name;

 if (g_hash_table_contains(bitmaps_map, bmba->alias)) {
 error_setg(errp, "The bitmap alias '%s'/'%s' is used 
twice",
@@ -312,7 +312,8 @@ static GHashTable *construct_alias_map(const 
BitmapMigrationNodeAliasList *bbm,
 }

 g_hash_table_insert(bitmaps_map,
-g_strdup(bmap_map_from), g_strdup(bmap_map_to));
+g_strdup(bmap_map_from),
+QAPI_CLONE(BitmapMigrationBitmapAlias, bmba));
 }
 }

@@ -538,11 +539,15 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 if (bitmap_aliases) {
-bitmap_alias = g_hash_table_lookup(bitmap_aliases, bitmap_name);
-if (!bitmap_alias) {
+BitmapMigrationBitmapAlias *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_aliases, bitmap_name);
+if (!bmap_inner) {
 /* Skip bitmaps with no alias */
 continue;
 }
+
+bitmap_alias = bmap_inner->alias;
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
 error_report("Cannot migrate bitmap '%s' on node '%s': "
@@ -1074,13 +1079,16 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,

 bitmap_name = s->bitmap_alias;
 if (!s->cancelled && bitmap_alias_map) {
-bitmap_name = g_hash_table_lookup(bitmap_alias_map,
-  s->bitmap_alias);
-if (!bitmap_name) {
+BitmapMigrationBitmapAlias *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_alias_map, s->bitmap_alias);
+if (!bmap_inner) {
 error_report("Error: Unknown bitmap alias '%s' on node "
  "'%s' (alias '%s')", s->bitmap_alias,
  s->bs->node_name, s->node_alias);
 cancel_incoming_locked(s);
+} else {
+bitmap_name = bmap_inner->name;
 }
 }

-- 
2.29.2




[PATCH v3 2/3] migration: dirty-bitmap: Allow control of bitmap persistence

2021-02-12 Thread Peter Krempa
Bitmap's source persistence is transported over the migration stream and
the destination mirrors it. In some cases the destination might want to
persist bitmaps which are not persistent on the source (e.g. the result
of merge of bitmaps from a number of layers on the source when migrating
into a squashed image) but currently it would need to create another set
of persistent bitmaps and merge them.

This patch adds a 'transform' property to the alias map which allows
overriding the persistence of migrated bitmaps on both the source and
destination sides.

Signed-off-by: Peter Krempa 
---
 migration/block-dirty-bitmap.c | 30 +++---
 qapi/migration.json| 19 ++-
 2 files changed, 45 insertions(+), 4 deletions(-)

v3:
 - adapted to full passthrough of BitmapMigrationBitmapAlias into the
   map hash table (Vladimir)
 - schema and commit message language fixes (Eric)

v2:
 - grammar fixes (Eric)
 - added 'transform' object to group other possible transformations (Vladimir)
 - transformation can also be used on source (Vladimir)
 - put bmap_inner directly into DBMLoadState for deduplication  (Vladimir)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 0244f9bb1d..80781de5d8 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -150,6 +150,7 @@ typedef struct DBMLoadState {
 BdrvDirtyBitmap *bitmap;

 bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+BitmapMigrationBitmapAlias *bmap_inner;

 /*
  * cancelled
@@ -528,6 +529,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
+BitmapMigrationBitmapAliasTransform *bitmap_transform = NULL;
 bitmap_name = bdrv_dirty_bitmap_name(bitmap);
 if (!bitmap_name) {
 continue;
@@ -548,6 +550,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 bitmap_alias = bmap_inner->alias;
+if (bmap_inner->has_transform) {
+bitmap_transform = bmap_inner->transform;
+}
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
 error_report("Cannot migrate bitmap '%s' on node '%s': "
@@ -573,8 +578,15 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 if (bdrv_dirty_bitmap_enabled(bitmap)) {
 dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
 }
-if (bdrv_dirty_bitmap_get_persistence(bitmap)) {
-dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+if (bitmap_transform &&
+bitmap_transform->has_persistent) {
+if (bitmap_transform->persistent) {
+dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
+} else {
+if (bdrv_dirty_bitmap_get_persistence(bitmap)) {
+dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
 }

 QSIMPLEQ_INSERT_TAIL(>dbms_list, dbms, entry);
@@ -782,6 +794,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 uint32_t granularity = qemu_get_be32(f);
 uint8_t flags = qemu_get_byte(f);
 LoadBitmapState *b;
+bool persistent;

 if (s->cancelled) {
 return 0;
@@ -806,7 +819,16 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 return -EINVAL;
 }

-if (flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT) {
+if (s->bmap_inner &&
+s->bmap_inner->has_transform &&
+s->bmap_inner->transform &&
+s->bmap_inner->transform->has_persistent) {
+persistent = s->bmap_inner->transform->persistent;
+} else {
+persistent = flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
+
+if (persistent) {
 bdrv_dirty_bitmap_set_persistence(s->bitmap, true);
 }

@@ -1090,6 +1112,8 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,
 } else {
 bitmap_name = bmap_inner->name;
 }
+
+s->bmap_inner = bmap_inner;
 }

 if (!s->cancelled) {
diff --git a/qapi/migration.json b/qapi/migration.json
index ce14d78071..8a85a6d041 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -536,6 +536,19 @@
   'data': [ 'none', 'zlib',
 { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }

+##
+# @BitmapMigrationBitmapAliasTransform:
+#
+# @persistent: If present, the bitmap will be made persistent
+#  or transient depending on this parameter.
+#
+# Since: 6.0
+##
+{ 'struct': 'BitmapMigrationBitmapAliasTransform',
+  'data': {
+  '*persistent': 'bool'
+  } }
+
 ##
 # @BitmapMigrationBitmapAlias:
 #
@@ -544,12 +557,16 @@
 # @alias: An al

Re: [RFC PATCH v2 0/4] Allow changing bs->file on reopen

2021-02-12 Thread Peter Krempa
On Wed, Feb 10, 2021 at 18:26:57 +0100, Kevin Wolf wrote:
> Am 08.02.2021 um 19:44 hat Alberto Garcia geschrieben:
> > Hi,
> > 
> > this series allows changing bs->file using x-blockdev-reopen. Read
> > here for more details:
> > 
> >https://lists.gnu.org/archive/html/qemu-block/2021-01/msg00437.html
> > 
> > Version 2 of the series introduces a very significant change:
> > x-blockdev-reopen now receives a list of BlockdevOptions instead of
> > just one, so it is possible to reopen multiple block devices using a
> > single transaction.
> 
> Adding Peter to Cc for this one.

For now I've only used blockdev-reopen for turning backing files
read-write and back for bitmap manipulation. In such a case I presume
it doesn't really make sense to use the batching even if I ever need
to reopen multiple images.

Other than that, the blockdev-reopen code is still dormant in libvirt so
it's easy to add the extra required wrapping without any problem.




[PATCH v2 2/2] migration: dirty-bitmap: Allow control of bitmap persistence

2021-02-10 Thread Peter Krempa
Bitmap's source persistence is transported over the migration stream and
the destination mirrors it. In some cases the destination might want to
persist bitmaps which are not persistent on the source (e.g. the result
of merge of bitmaps from a number of layers on the source when migrating
into a squashed image) but currently it would need to create another set
of persistent bitmaps and merge them.

This patch adds a 'transform' property to the alias map which allows
overriding the persistence of migrated bitmaps on both the source and
destination sides.

Signed-off-by: Peter Krempa 
---

v2:
 - grammar fixes (Eric)
 - added 'transform' object to group other possible transformations (Vladimir)
 - transformation can also be used on source (Vladimir)
 - put bmap_inner directly into DBMLoadState for deduplication  (Vladimir)

 migration/block-dirty-bitmap.c | 38 +++---
 qapi/migration.json| 20 +-
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 0169f672df..a05bf74073 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -138,6 +138,13 @@ typedef struct LoadBitmapState {
 bool enabled;
 } LoadBitmapState;

+typedef struct AliasMapInnerBitmap {
+char *string;
+
+/* 'transform' properties borrowed from QAPI */
+BitmapMigrationBitmapAliasTransform *transform;
+} AliasMapInnerBitmap;
+
 /* State of the dirty bitmap migration (DBM) during load process */
 typedef struct DBMLoadState {
 uint32_t flags;
@@ -148,6 +155,7 @@ typedef struct DBMLoadState {
 BdrvDirtyBitmap *bitmap;

 bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */
+AliasMapInnerBitmap *bmap_inner;

 /*
  * cancelled
@@ -169,10 +177,6 @@ typedef struct DBMState {

 static DBMState dbm_state;

-typedef struct AliasMapInnerBitmap {
-char *string;
-} AliasMapInnerBitmap;
-
 static void free_alias_map_inner_bitmap(void *amin_ptr)
 {
 AliasMapInnerBitmap *amin = amin_ptr;
@@ -330,6 +334,7 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,

 bmap_inner = g_new0(AliasMapInnerBitmap, 1);
 bmap_inner->string = g_strdup(bmap_map_to);
+bmap_inner->transform = bmba->transform;

 g_hash_table_insert(bitmaps_map,
 g_strdup(bmap_map_from), bmap_inner);
@@ -547,6 +552,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
+BitmapMigrationBitmapAliasTransform *bitmap_transform = NULL;
 bitmap_name = bdrv_dirty_bitmap_name(bitmap);
 if (!bitmap_name) {
 continue;
@@ -567,6 +573,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 bitmap_alias = bmap_inner->string;
+bitmap_transform = bmap_inner->transform;
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
 error_report("Cannot migrate bitmap '%s' on node '%s': "
@@ -592,8 +599,15 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 if (bdrv_dirty_bitmap_enabled(bitmap)) {
 dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
 }
-if (bdrv_dirty_bitmap_get_persistence(bitmap)) {
-dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+if (bitmap_transform &&
+bitmap_transform->has_persistent) {
+if (bitmap_transform->persistent) {
+dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
+} else {
+if (bdrv_dirty_bitmap_get_persistence(bitmap)) {
+dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
 }

 QSIMPLEQ_INSERT_TAIL(>dbms_list, dbms, entry);
@@ -801,6 +815,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 uint32_t granularity = qemu_get_be32(f);
 uint8_t flags = qemu_get_byte(f);
 LoadBitmapState *b;
+bool persistent;

 if (s->cancelled) {
 return 0;
@@ -825,7 +840,15 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 return -EINVAL;
 }

-if (flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT) {
+if (s->bmap_inner &&
+s->bmap_inner->transform &&
+s->bmap_inner->transform->has_persistent) {
+persistent = s->bmap_inner->transform->persistent;
+} else {
+persistent = flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
+
+if (persistent) {
 bdrv_dirty_bitmap_set_persistence(s->bitmap, true);
 }

@@ -1109,6 +1132,7 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,
 }

 bitmap_

[PATCH v2 0/2] migration: dirty-bitmap: Allow control of bitmap persistence

2021-02-10 Thread Peter Krempa
See 2/2 for explanation.

Peter Krempa (2):
  migration: dirty-bitmap: Convert alias map inner members to a struct
  migration: dirty-bitmap: Allow control of bitmap persistence

 migration/block-dirty-bitmap.c | 73 +-
 qapi/migration.json| 20 +-
 2 files changed, 81 insertions(+), 12 deletions(-)

-- 
2.29.2




[PATCH v2 1/2] migration: dirty-bitmap: Convert alias map inner members to a struct

2021-02-10 Thread Peter Krempa
Currently the alias mapping hash stores just strings of the target
objects internally. In further patches we'll be adding another member
which will need to be stored in the map, so convert the members to a
struct.

Signed-off-by: Peter Krempa 
Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v2:
 - NULL-check in freeing function (Eric)
 - style problems (Vladimir)

 migration/block-dirty-bitmap.c | 43 +++---
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c61d382be8..0169f672df 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -169,6 +169,22 @@ typedef struct DBMState {

 static DBMState dbm_state;

+typedef struct AliasMapInnerBitmap {
+char *string;
+} AliasMapInnerBitmap;
+
+static void free_alias_map_inner_bitmap(void *amin_ptr)
+{
+AliasMapInnerBitmap *amin = amin_ptr;
+
+if (!amin_ptr) {
+return;
+}
+
+g_free(amin->string);
+g_free(amin);
+}
+
 /* For hash tables that map node/bitmap names to aliases */
 typedef struct AliasMapInnerNode {
 char *string;
@@ -263,8 +279,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 node_map_to = bmna->node_name;
 }

-bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal,
-g_free, g_free);
+bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal, g_free,
+free_alias_map_inner_bitmap);

 amin = g_new(AliasMapInnerNode, 1);
 *amin = (AliasMapInnerNode){
@@ -277,6 +293,7 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 for (bmbal = bmna->bitmaps; bmbal; bmbal = bmbal->next) {
 const BitmapMigrationBitmapAlias *bmba = bmbal->value;
 const char *bmap_map_from, *bmap_map_to;
+AliasMapInnerBitmap *bmap_inner;

 if (strlen(bmba->alias) > UINT8_MAX) {
 error_setg(errp,
@@ -311,8 +328,11 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 }
 }

+bmap_inner = g_new0(AliasMapInnerBitmap, 1);
+bmap_inner->string = g_strdup(bmap_map_to);
+
 g_hash_table_insert(bitmaps_map,
-g_strdup(bmap_map_from), g_strdup(bmap_map_to));
+g_strdup(bmap_map_from), bmap_inner);
 }
 }

@@ -538,11 +558,15 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 if (bitmap_aliases) {
-bitmap_alias = g_hash_table_lookup(bitmap_aliases, bitmap_name);
-if (!bitmap_alias) {
+AliasMapInnerBitmap *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_aliases, bitmap_name);
+if (!bmap_inner) {
 /* Skip bitmaps with no alias */
 continue;
 }
+
+bitmap_alias = bmap_inner->string;
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
 error_report("Cannot migrate bitmap '%s' on node '%s': "
@@ -1074,14 +1098,17 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,

 bitmap_name = s->bitmap_alias;
 if (!s->cancelled && bitmap_alias_map) {
-bitmap_name = g_hash_table_lookup(bitmap_alias_map,
-  s->bitmap_alias);
-if (!bitmap_name) {
+AliasMapInnerBitmap *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_alias_map, s->bitmap_alias);
+if (!bmap_inner) {
 error_report("Error: Unknown bitmap alias '%s' on node "
  "'%s' (alias '%s')", s->bitmap_alias,
  s->bs->node_name, s->node_alias);
 cancel_incoming_locked(s);
 }
+
+bitmap_name = bmap_inner->string;
 }

 if (!s->cancelled) {
-- 
2.29.2




Re: [PATCH 2/2] migration: dirty-bitmap: Allow control of bitmap persistence on destination

2021-02-03 Thread Peter Krempa
On Wed, Feb 03, 2021 at 14:27:49 +0100, Peter Krempa wrote:
> On Wed, Feb 03, 2021 at 16:23:21 +0300, Vladimir Sementsov-Ogievskiy wrote:
> > 03.02.2021 16:00, Peter Krempa wrote:
> > > Bitmap's source persistence is transported over the migration stream and
> > > the destination mirrors it. In some cases the destination might want to
> > > persist bitmaps which are not persistent on the source (e.g. the result
> > > of merge of bitmaps from a number of layers on the source when migrating
> > > into a squashed image)
> > 
> > Why not make merge target on source be persistent itself? Then it will be 
> > persistent on migration destination.
> 
> Because they are temporary on the source. I don't want to make it
> persistent in case of a failure so that it doesn't get written to the
> disk e.g. in case of VM shutdown.

To be a bit more specific, I don't want the bitmaps to stay in the
image; that means I'd also have to delete them on the source after a
successful migration, before qemu is terminated, which might not even
be possible since the source deactivates storage after migration.

So making them persistent on the source is impossible.

> 
> > 
> > > but currently it would need to create another set
> > > of persistent bitmaps and merge them.
> > > 
> > > This adds 'dest-persistent' optional property to
> > > 'BitmapMigrationBitmapAlias' which when present overrides the bitmap
> > > presence state from the source.
> > 
> > It's seems simpler to make a separate qmp command 
> > block-dirty-bitmap-make-persistent.. Didn't you consider this way?
> 
> I'm not sure how the internals work entirely. In my case it's way
> simpler to do this setup when generating the mapping which I need to do
> anyways rather than calling separate commands.

Similarly here, after a successful migration I'd have to go and make
all the bitmaps persistent, which is an extra step and also a vector
for possible failures after migration, which doesn't seem appealing
either.




Re: [PATCH 2/2] migration: dirty-bitmap: Allow control of bitmap persistence on destination

2021-02-03 Thread Peter Krempa
On Wed, Feb 03, 2021 at 16:23:21 +0300, Vladimir Sementsov-Ogievskiy wrote:
> 03.02.2021 16:00, Peter Krempa wrote:
> > Bitmap's source persistence is transported over the migration stream and
> > the destination mirrors it. In some cases the destination might want to
> > persist bitmaps which are not persistent on the source (e.g. the result
> > of merge of bitmaps from a number of layers on the source when migrating
> > into a squashed image)
> 
> Why not make merge target on source be persistent itself? Then it will be 
> persistent on migration destination.

Because they are temporary on the source. I don't want to make them
persistent, so that in case of a failure they don't get written to the
disk, e.g. on VM shutdown.

> 
> > but currently it would need to create another set
> > of persistent bitmaps and merge them.
> > 
> > This adds 'dest-persistent' optional property to
> > 'BitmapMigrationBitmapAlias' which when present overrides the bitmap
> > presence state from the source.
> 
> It's seems simpler to make a separate qmp command 
> block-dirty-bitmap-make-persistent.. Didn't you consider this way?

I'm not sure how the internals work entirely. In my case it's way
simpler to do this setup when generating the mapping, which I need to
do anyway, rather than calling separate commands.




[PATCH 2/2] migration: dirty-bitmap: Allow control of bitmap persistence on destination

2021-02-03 Thread Peter Krempa
Bitmap's source persistence is transported over the migration stream and
the destination mirrors it. In some cases the destination might want to
persist bitmaps which are not persistent on the source (e.g. the result
of merge of bitmaps from a number of layers on the source when migrating
into a squashed image) but currently it would need to create another set
of persistent bitmaps and merge them.

This adds 'dest-persistent' optional property to
'BitmapMigrationBitmapAlias' which when present overrides the bitmap
presence state from the source.

Signed-off-by: Peter Krempa 
---
 migration/block-dirty-bitmap.c | 30 +-
 qapi/migration.json|  7 ++-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index b0403dd00c..1876c94c45 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -149,6 +149,9 @@ typedef struct DBMLoadState {

 bool before_vm_start_handled; /* set in dirty_bitmap_mig_before_vm_start */

+bool has_dest_persistent;
+bool dest_persistent;
+
 /*
  * cancelled
  * Incoming migration is cancelled for some reason. That means that we
@@ -171,6 +174,10 @@ static DBMState dbm_state;

 typedef struct AliasMapInnerBitmap {
 char *string;
+
+/* for destination's properties setting bitmap state */
+bool has_dest_persistent;
+bool dest_persistent;
 } AliasMapInnerBitmap;

 static void free_alias_map_inner_bitmap(void *amin_ptr)
@@ -289,6 +296,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 for (bmbal = bmna->bitmaps; bmbal; bmbal = bmbal->next) {
 const BitmapMigrationBitmapAlias *bmba = bmbal->value;
 const char *bmap_map_from, *bmap_map_to;
+bool bmap_has_dest_persistent = false;
+bool bmap_dest_persistent = false;
 AliasMapInnerBitmap *bmap_inner;

 if (strlen(bmba->alias) > UINT8_MAX) {
@@ -317,6 +326,9 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 bmap_map_from = bmba->alias;
 bmap_map_to = bmba->name;

+bmap_has_dest_persistent = bmba->has_dest_persistent;
+bmap_dest_persistent = bmba->dest_persistent;
+
 if (g_hash_table_contains(bitmaps_map, bmba->alias)) {
                error_setg(errp, "The bitmap alias '%s'/'%s' is used twice",
                           bmna->alias, bmba->alias);
@@ -326,6 +338,8 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,

 bmap_inner = g_new0(AliasMapInnerBitmap, 1);
 bmap_inner->string = g_strdup(bmap_map_to);
+bmap_inner->has_dest_persistent = bmap_has_dest_persistent;
+bmap_inner->dest_persistent = bmap_dest_persistent;

 g_hash_table_insert(bitmaps_map,
 g_strdup(bmap_map_from), bmap_inner);
@@ -798,6 +812,7 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 uint32_t granularity = qemu_get_be32(f);
 uint8_t flags = qemu_get_byte(f);
 LoadBitmapState *b;
+bool persistent;

 if (s->cancelled) {
 return 0;
@@ -822,7 +837,13 @@ static int dirty_bitmap_load_start(QEMUFile *f, DBMLoadState *s)
 return -EINVAL;
 }

-if (flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT) {
+if (s->has_dest_persistent) {
+persistent = s->dest_persistent;
+} else {
+persistent = flags & DIRTY_BITMAP_MIG_START_FLAG_PERSISTENT;
+}
+
+if (persistent) {
 bdrv_dirty_bitmap_set_persistence(s->bitmap, true);
 }

@@ -1087,6 +1108,8 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,

 if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
 const char *bitmap_name;
+bool bitmap_has_dest_persistent = false;
+bool bitmap_dest_persistent = false;

 if (!qemu_get_counted_string(f, s->bitmap_alias)) {
 error_report("Unable to read bitmap alias string");
@@ -1107,12 +1130,17 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,
 }

 bitmap_name = bmap_inner->string;
+bitmap_has_dest_persistent = bmap_inner->has_dest_persistent;
+bitmap_dest_persistent = bmap_inner->dest_persistent;
 }

 if (!s->cancelled) {
 g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name));
 s->bitmap = bdrv_find_dirty_bitmap(s->bs, s->bitmap_name);

+s->has_dest_persistent = bitmap_has_dest_persistent;
+s->dest_persistent = bitmap_dest_persistent;
+
 /*
  * bitmap may be NULL here, it wouldn't be an error if it is the
 

[PATCH 1/2] migration: dirty-bitmap: Convert alias map inner members to a struct

2021-02-03 Thread Peter Krempa
Currently the alias mapping hash stores just strings of the target
objects internally. In further patches we'll be adding another member
which will need to be stored in the map so convert the members to a
struct.

Signed-off-by: Peter Krempa 
---
 migration/block-dirty-bitmap.c | 37 --
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c61d382be8..b0403dd00c 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -169,6 +169,18 @@ typedef struct DBMState {

 static DBMState dbm_state;

+typedef struct AliasMapInnerBitmap {
+char *string;
+} AliasMapInnerBitmap;
+
+static void free_alias_map_inner_bitmap(void *amin_ptr)
+{
+AliasMapInnerBitmap *amin = amin_ptr;
+
+g_free(amin->string);
+g_free(amin);
+}
+
 /* For hash tables that map node/bitmap names to aliases */
 typedef struct AliasMapInnerNode {
 char *string;
@@ -264,7 +276,7 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 }

 bitmaps_map = g_hash_table_new_full(g_str_hash, g_str_equal,
-g_free, g_free);
+g_free, free_alias_map_inner_bitmap);

 amin = g_new(AliasMapInnerNode, 1);
 *amin = (AliasMapInnerNode){
@@ -277,6 +289,7 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 for (bmbal = bmna->bitmaps; bmbal; bmbal = bmbal->next) {
 const BitmapMigrationBitmapAlias *bmba = bmbal->value;
 const char *bmap_map_from, *bmap_map_to;
+AliasMapInnerBitmap *bmap_inner;

 if (strlen(bmba->alias) > UINT8_MAX) {
 error_setg(errp,
@@ -311,8 +324,11 @@ static GHashTable *construct_alias_map(const BitmapMigrationNodeAliasList *bbm,
 }
 }

+bmap_inner = g_new0(AliasMapInnerBitmap, 1);
+bmap_inner->string = g_strdup(bmap_map_to);
+
 g_hash_table_insert(bitmaps_map,
-g_strdup(bmap_map_from), g_strdup(bmap_map_to));
+g_strdup(bmap_map_from), bmap_inner);
 }
 }

@@ -538,11 +554,16 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
 }

 if (bitmap_aliases) {
-bitmap_alias = g_hash_table_lookup(bitmap_aliases, bitmap_name);
-if (!bitmap_alias) {
+AliasMapInnerBitmap *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_aliases, bitmap_name);
+
+if (!bmap_inner) {
 /* Skip bitmaps with no alias */
 continue;
 }
+
+bitmap_alias = bmap_inner->string;
 } else {
 if (strlen(bitmap_name) > UINT8_MAX) {
 error_report("Cannot migrate bitmap '%s' on node '%s': "
@@ -1074,14 +1095,18 @@ static int dirty_bitmap_load_header(QEMUFile *f, DBMLoadState *s,

 bitmap_name = s->bitmap_alias;
 if (!s->cancelled && bitmap_alias_map) {
-bitmap_name = g_hash_table_lookup(bitmap_alias_map,
+AliasMapInnerBitmap *bmap_inner;
+
+bmap_inner = g_hash_table_lookup(bitmap_alias_map,
   s->bitmap_alias);
-if (!bitmap_name) {
+if (!bmap_inner) {
 error_report("Error: Unknown bitmap alias '%s' on node "
  "'%s' (alias '%s')", s->bitmap_alias,
  s->bs->node_name, s->node_alias);
 cancel_incoming_locked(s);
 }
+
+bitmap_name = bmap_inner->string;
 }

 if (!s->cancelled) {
-- 
2.29.2




[PATCH 0/2] migration: dirty-bitmap: Allow control of bitmap persistence on destination

2021-02-03 Thread Peter Krempa
See 2/2 for explanation.

Please let me know if I need to add any tests for this.

Peter Krempa (2):
  migration: dirty-bitmap: Convert alias map inner members to a struct
  migration: dirty-bitmap: Allow control of bitmap persistence on
destination

 migration/block-dirty-bitmap.c | 67 ++
 qapi/migration.json|  7 +++-
 2 files changed, 66 insertions(+), 8 deletions(-)

-- 
2.29.2




Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-15 Thread Peter Krempa
On Thu, Jan 07, 2021 at 16:32:59 -0500, Masayoshi Mizuma wrote:
> On Thu, Jan 07, 2021 at 09:05:42AM +0100, Peter Krempa wrote:
> > On Tue, Jan 05, 2021 at 15:12:55 +0100, Peter Krempa wrote:
> > > On Mon, Jan 04, 2021 at 15:30:19 -0500, Masayoshi Mizuma wrote:
> > > > On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:

[...]

> Guest0 QMP:
> 
>   {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest0","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}
> 
> Guest1 QMP:
> 
>   {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT-Guest1","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}

Anyway, if you want, you can try to implement this using the
frontend-hotplug approach for 'virtio' and 'scsi' disks, but I can't
guarantee that it will be accepted upstream in case the
implementation turns out to be too hairy.




Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-07 Thread Peter Krempa
On Tue, Jan 05, 2021 at 15:12:55 +0100, Peter Krempa wrote:
> On Mon, Jan 04, 2021 at 15:30:19 -0500, Masayoshi Mizuma wrote:
> > On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:
 
 [...]
 
 >   {"execute":"cont"}
> 
> So that is a no-go. Some disk buses such as IDE don't support hotplug:
> 
> https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_hotplug.c#L1074
> 
> You could try to just instantiate the backend of the disk as read-only,
> and then create a writable overlay. You just need to make sure that the
> disk will be writable and that it works even for IDE/SATA which doesn't
> support read-only:
> 
> https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_validate.c#L2634

Okay, I've actually tried implementing my suggestion and that doesn't
work. With scsi disks qemu crashes on an assertion failure when I try
writing to the disk after setting it up as suggested and virtio disk
returns an I/O error as it remembers that it was read-only when
starting.

I'm considering whether it's worth implementing this via hotplug then.

Arguably simple deployments don't need it and complex ones have some
kind of orchestration which can do it externally.

This specifically comes with a caveat of the above, as currently the
overlay used to discard writes is created under the same path as the
image and can't be controlled, which might be a problem for complex
deployments.

Also the above algorithm with a constant suffix added to the image
prevents using it with multiple VMs anyway, since the overlay file
name will collide (it's generated based on the same rules).

Didn't you run into the collision?




Re: To start multiple KVM guests from one qcow2 image with transient disk option

2021-01-05 Thread Peter Krempa
On Mon, Jan 04, 2021 at 15:30:19 -0500, Masayoshi Mizuma wrote:
> On Sat, Dec 19, 2020 at 11:30:39PM -0500, Masayoshi Mizuma wrote:

[...]

> I think following qemu command line options and QMP commands work for sharing
> the qcow2 disks. The following uses disk hotplug instead of snapshot overlay.
> Does that make sense for libvirt...?
> 
> qemu command line options:

So you are proposing to ...

> 
>   qemu-system-x86_64 \
>   -M q35,accel=kvm,usb=off,vmport=off,smm=on,dump-guest-core=off \
>   -smp 1 \
>   -m 4096 \
>   -blockdev '{"driver":"file","filename":"/home/mmizuma/debug/guest.qcow2","node-name":"storage1","auto-read-only":true,"discard":"unmap"}' \
>   -blockdev '{"node-name":"format1","read-only":true,"driver":"qcow2","file":"storage1","backing":null}' \

... start with the disk already in 'read-only' mode _and_ skip addition
of the disk ...

>   -nographic \
>   -nodefaults \
>   -no-user-config \
>   -serial telnet::1,server,nowait \
>   -qmp tcp::10001,server,nowait \
>   -S \
>   -device pcie-root-port,id=pci.1
> 
> QMP commands:
> 
>   {"execute":"qmp_capabilities"}
>   {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/var/lib/libvirt/images/guest.TRANSIENT","node-name":"storage2","auto-read-only":true,"discard":"unmap"}}
>   {"execute":"blockdev-create","arguments":{"job-id":"create","options":{"driver":"qcow2","file":"storage2","size":4294967296,"cluster-size":65536,"backing-file":"/var/lib/libvirt/images/guest.TRANSIENT","backing-fmt":"qcow2"}}}
>   {"execute":"blockdev-add","arguments":{"node-name":"format2","read-only":false,"driver":"qcow2","file":"storage2"}}

... and then add a writable overlay ...

>   {"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"format2","id":"transient-disk","bootindex":1,"bus":"pci.1","addr":0}}

... and hotplug the disk.
>   {"execute":"cont"}

So that is a no-go. Some disk buses such as IDE don't support hotplug:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_hotplug.c#L1074

You could try to just instantiate the backend of the disk as read-only,
and then create a writable overlay. You just need to make sure that the
disk will be writable and that it works even for IDE/SATA which doesn't
support read-only:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_validate.c#L2634
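
A minimal QMP sketch of that suggestion (untested; node names and
paths are just examples):

  {"execute":"blockdev-add","arguments":{"driver":"file","filename":"/path/guest.qcow2","node-name":"storage1","read-only":true}}
  {"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"format1","file":"storage1","read-only":true}}
  ... create the overlay file, e.g. via blockdev-create as in your example ...
  {"execute":"blockdev-add","arguments":{"driver":"qcow2","node-name":"format2","file":"storage2","backing":null}}
  {"execute":"blockdev-snapshot","arguments":{"node":"format1","overlay":"format2"}}

The disk frontend then refers to 'format2' from the start, so no
hotplug is needed.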




Re: RFC: don't store backing filename in qcow2 image

2020-12-11 Thread Peter Krempa
On Thu, Dec 10, 2020 at 17:26:52 +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!

Hi,

> 
> I have an idea that not storing the backing filename in the qcow2 image at
> all may be a good thing. I'll give some reasons and want to know what you
> think about it.
> 
> 1. Libvirt has to manage and keep in mind backing chains anyway.
> 
> This means, that storing this information in qcow2 header is a source of bugs 
> when we update it in one place but failed/forget to update in another. Of 
> course, Libvirt is not the only user of qemu.. But we are moving to 
> "blockdev" anyway, when management tool should control all node-names at 
> least. It would be strange to not control the relations between images in the 
> same time.

At the same time many users depend on this. If you move in images from
another host, you'd have to remember the dependencies/order.

> 2. backing file name specified in qcow2 metadata doesn't relate to any other 
> thing, and nothing rely on it.
> 
> 3. calculating and updating backing file name in Qemu is a headache:
> - with some options specified or with filters we risk writing json
> filenames into qcow2 metadata, which is almost never what the user wants.
> Also, json may exceed the qcow2 limitation of backing_file_size to be <= 1023

As long as it works (libvirt and qemu have parsers for json:) I don't
think the user cares.

>- updating it in transactional way for read-only image during reopen, when 
> another transactional permission update is ongoing is difficult (who know, 
> how to do it?) (remember recent d669ed6ab02849 "block: make 
> bdrv_drop_intermediate() less wrong")
> 
> 4. Moving qcow2 files to another directory is a problem: you should care to 
> update backing file names in all dependent qcow2 images.

Or alternatively use relative names.
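
For instance (file names made up), the backing name can be recorded
relative to the overlay:

  qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2

and moving both files together to another directory keeps the chain
intact.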

> So, what about moving libvirt (at least) to not rely on backing file name 
> stored in qcow2 image? Backing chain then should be in xml? Is it hard or 
> not? Finally, will it make the code simpler, or more difficult?
> 
> 
> Then, if the idea is good in general, what to do on Qemu part? If we want to 
> finally get rid of problem code (see [3.]) we should deprecate something.. 
> Just deprecate support for qcow2 images with backing file specified, 
> requiring user always specify backing chain by hand? I don't see anything 
> that should be changed in qcow2 format itself: no reason to add some kind of 
> restricted bits, etc..

I think this will create headaches for many users. Libvirt does support
specification of the chain manually, but doesn't mandate it.

It's also a fairly recent addition to libvirt so I doubt that any other
project which uses libvirt only for a part of the functionality (such as
oVirt or openstack) picked up the full specification of chain in the
XML. The problem here is that libvirt isn't used for the whole knowledge
state here. Rather projects like oVirt feed us a new XML every single
time. This means that they'd need to start keeping the chain info
internally too.

Rather they currently rely on our detection code and the proper setting
of paths in the image, and thus removing it would be a rather serious
regression in behaviour, which would be visible beyond libvirt without
any way for us to make it opaque to higher levels.




Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-11-30 Thread Peter Krempa
On Mon, Nov 30, 2020 at 13:25:20 +0100, Kevin Wolf wrote:
> This series adds a QAPI type for the properties of all user creatable
> QOM types and finally makes QMP object-add use the new ObjectOptions
> union so that QAPI introspection can be used for user creatable objects.

FYI, here's a libvirt patchset which (apart from refactors) adds
compatibility with the deprecated/removed 'props' sub-object and also
adds testing to see whether all objects used by libvirt conform to the
new schema:

https://www.redhat.com/archives/libvir-list/2020-November/msg01625.html

Spoiler alert: surprisingly they do match so no updates are necessary!




Re: [PATCH v6 04/11] nbd: Update qapi to support exporting multiple bitmaps

2020-10-27 Thread Peter Krempa
On Tue, Oct 27, 2020 at 00:05:49 -0500, Eric Blake wrote:
> Since 'block-export-add' is new to 5.2, we can still tweak the
> interface; there, allowing 'bitmaps':['str'] is nicer than
> 'bitmap':'str'.  This wires up the qapi and qemu-nbd changes to permit
> passing multiple bitmaps as distinct metadata contexts that the NBD
> client may request, but the actual support for more than one will
> require a further patch to the server.
> 
> Note that there are no changes made to the existing deprecated
> 'nbd-server-add' command; this required splitting the QAPI type
> BlockExportOptionsNbd, which fortunately does not affect QMP
> introspection.
> 
> Signed-off-by: Eric Blake 
> ---
>  docs/system/deprecated.rst |  3 ++-
>  qapi/block-export.json | 41 +++---
>  blockdev-nbd.c |  6 +-
>  nbd/server.c   | 19 --
>  qemu-nbd.c | 18 -
>  5 files changed, 58 insertions(+), 29 deletions(-)

Reviewed-by: Peter Krempa 




Re: [PATCH v5 06/12] nbd: Update qapi to support exporting multiple bitmaps

2020-10-26 Thread Peter Krempa
On Fri, Oct 23, 2020 at 13:36:46 -0500, Eric Blake wrote:
> Since 'nbd-server-add' is deprecated, and 'block-export-add' is new to
> 5.2, we can still tweak the interface.  Allowing 'bitmaps':['str'] is
> nicer than 'bitmap':'str'.  This wires up the qapi and qemu-nbd
> changes to permit passing multiple bitmaps as distinct metadata
> contexts that the NBD client may request, but the actual support for
> more than one will require a further patch to the server.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  docs/system/deprecated.rst |  4 +++-
>  qapi/block-export.json | 18 --
>  blockdev-nbd.c | 13 +
>  nbd/server.c   | 19 +--
>  qemu-nbd.c | 10 +-
>  5 files changed, 46 insertions(+), 18 deletions(-)
> 
> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> index 905628f3a0cb..d6cd027ac740 100644
> --- a/docs/system/deprecated.rst
> +++ b/docs/system/deprecated.rst
> @@ -268,7 +268,9 @@ the 'wait' field, which is only applicable to sockets in server mode
>  
> 
>  Use the more generic commands ``block-export-add`` and ``block-export-del``
> -instead.
> +instead.  As part of this deprecation, it is now preferred to export a
> +list of dirty bitmaps via ``bitmaps``, rather than a single bitmap via
> +``bitmap``.
> 
>  Human Monitor Protocol (HMP) commands
>  -
> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 893d5cde5dfe..c7c749d61097 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -74,10 +74,10 @@
>  # @description: Free-form description of the export, up to 4096 bytes.
>  #   (Since 5.0)
>  #
> -# @bitmap: Also export the dirty bitmap reachable from @device, so the
> -#  NBD client can use NBD_OPT_SET_META_CONTEXT with the
> -#  metadata context name "qemu:dirty-bitmap:NAME" to inspect the
> -#  bitmap. (since 4.0)
> +# @bitmaps: Also export each of the named dirty bitmaps reachable from
> +#   @device, so the NBD client can use NBD_OPT_SET_META_CONTEXT with
> +#   the metadata context name "qemu:dirty-bitmap:BITMAP" to inspect
> +#   each bitmap. (since 5.2)

Given unsynchronised release cycles between qemu and management apps
it's not cool to deprecate an interface without having at least one
release where the replacement interface is already stable.

This means that any project wanting to stay up to date will either have
to use deprecated interfaces for at least one release and develop the
replacement only when there's a stable interface or hope that they don't
have to change the interfaces too often.

This specifically impacts libvirt as we have validators which notify us
that a deprecated interface is used and we want to stay in sync, so that
there's one less group of users to worry about at the point qemu will
want to delete the interface.

>  # @allocation-depth: Also export the allocation depth map for @device, so
>  #the NBD client can use NBD_OPT_SET_META_CONTEXT with
> @@ -88,7 +88,7 @@
>  ##
>  { 'struct': 'BlockExportOptionsNbd',
>'data': { '*name': 'str', '*description': 'str',
> -'*bitmap': 'str', '*allocation-depth': 'bool' } }
> +'*bitmaps': ['str'], '*allocation-depth': 'bool' } }

This adds 'bitmaps' also to nbd-server-add, which should not happen. You
probably want to stop using 'base' for 'NbdServerAddOptions' and just
duplicate everything else.

>  # @NbdServerAddOptions:
> @@ -100,12 +100,18 @@
>  # @writable: Whether clients should be able to write to the device via the
>  #NBD connection (default false).
>  #
> +# @bitmap: Also export a single dirty bitmap reachable from @device, so the
> +#  NBD client can use NBD_OPT_SET_META_CONTEXT with the metadata
> +#  context name "qemu:dirty-bitmap:BITMAP" to inspect the bitmap
> +#  (since 4.0).  Mutually exclusive with @bitmaps, and newer
> +#  clients should use that instead.

This doesn't make sense, nbd-server-add never had @bitmaps. Also adding
features to a deprecated interface doesn't IMO make sense if you want to
motivate users to switch to the new one.

> +#
>  # Since: 5.0
>  ##
>  { 'struct': 'NbdServerAddOptions',
>'base': 'BlockExportOptionsNbd',
>'data': { 'device': 'str',
> -'*writable': 'bool' } }
> +'*writable': 'bool', '*bitmap': 'str' } }
> 
>  ##
>  # @nbd-server-add:




Re: Attached disk blockpull

2020-10-13 Thread Peter Krempa
This is a libvirt question, so asking it on the qemu-block might not get
you an answer that quick, ... or ever if I didn't notice your other
question also addressed incorrectly.

[adding libvirt-us...@redhat.com to cc]

On Tue, Sep 01, 2020 at 10:57:34 -0400, Yoonho Park wrote:
> I am trying to perform a blockpull on an attached disk, but I cannot do
> this using "virsh attach-disk" specifying a new, empty overlay file created
> with "qemu-img create". The VM does not pick up the backing_file path from

Could you please elaborate on the exact steps? specifically the
arguments for qemu-img create.

My feeling is that you didn't specify the backing image format (-F) when
creating the overlay, but it's hard to tell from this very sparse
question.

> the qcow2 overlay file, and there seems to be no way to specify the backing
> path through the attach-disk command itself. Instead the backing path has

virsh attach-disk is a syntax-sugar wrapper which generates the XML
document which is in fact used to attach disk. You can use virsh
attach-disk --print-xml to see what would be used.


> to be specified in the xml file used to create the VM, which precludes
> attaching a new disk to a running VM. Is this a bug? Is it possible to

using virsh attach-device allows you to pass in a whole device
subelement such as <disk> with any configuration you are able to specify
when defining the XML.

The virsh attach-disk is limited to just "basic" config options.
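
For the record, a sequence that should work (sketch; paths made up):

  qemu-img create -f qcow2 -b /path/base.qcow2 -F qcow2 /path/overlay.qcow2
  virsh attach-device VM disk.xml --live

with disk.xml carrying a complete <disk> element:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/overlay.qcow2'/>
    <target dev='vdb' bus='virtio'/>
  </disk>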

> attach a disk to a running VM, and specify its backing path, using qemu
> directly? This is with qemu 4.2.0 and libvirt 6.0.0.




Re: Overlay limit bug

2020-10-13 Thread Peter Krempa
Adding libvir-list to cc. If you come across stuff that involves both
libvirt and qemu it's best to crosspost it to libvir-list too as I've
noticed this only accidentally.

On Mon, Oct 12, 2020 at 15:03:10 -0400, Yoonho Park wrote:
> I stumbled on a bug in qemu 4.2.0 (virsh 6.0.0) with a large number of
> overlays. I am using "qemu-img create" and "virsh snapshot-create-as" to
> create each overlay. When I run "virsh snapshot-create-as" for the 42nd
> overlay, I get "error: No complete monitor response found in 10485760

This error comes from libvirt's DoS protection. The response from qemu
is too large. This happens when 'query-named-block-nodes' is called on
too-deep backing chains as the reply contains both nested and linear
entries. Thus for an N-deep backing chain you get N + (N-1) + (N-2) +
... + 1 = N(N+1)/2 entries in the returned data.

Libvirt stops reading after 1MiB of json to prevent buffer overflows
from a rogue qemu.

> bytes: Numerical result out of range". However, I pulled down qemu 5.1.50
> (still using virsh 6.0.0), and it looks like the problem has disappeared
> which is great. Does anyone know the patch set that addressed this bug?

qemu patch:

commit facda5443f5a8676fb635b82ac1046ac6b6a67ce
Author: Peter Krempa 
Date:   Mon Jan 20 09:50:49 2020 +0100

qapi: Allow getting flat output from 'query-named-block-nodes'

When a management application manages node names there's no reason to
recurse into backing images in the output of query-named-block-nodes.

Add a parameter to the command which will return just the top level
structs.

v4.2.0-1584-gfacda5443f
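
With that patch the linear list can be requested like this:

  {"execute":"query-named-block-nodes","arguments":{"flat":true}}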

libvirt patches:

commit 95080cc8b470c977d53043d4eff3e30781f472eb
Author: Peter Krempa 
Date:   Tue Jan 21 16:51:40 2020 +0100

qemu: Don't request nested entries in qemuBlockGetNamedNodeData

Use the 'flat' flag for 'query-named-block-nodes' if qemu supports
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT in qemuBlockGetNamedNodeData.

We don't need the data so plumb in whether qemu supports the
'flat' output.

Signed-off-by: Peter Krempa 
Reviewed-by: Ján Tomko 

commit 855211bbf3ed45a73159f45eba1b15f05d771b76
Author: Peter Krempa 
Date:   Tue Jan 21 16:42:49 2020 +0100

qemu: monitor: Add 'flat' parameter for qemuMonitorJSONQueryNamedBlockNodes

Modern qemu allows to skip the nested redundant data in the output of
query-named-block-nodes. Plumb in the support for the argument that
enables it.

Signed-off-by: Peter Krempa 
Reviewed-by: Ján Tomko 

commit 63610bd5fbe67ba9eb4f22f67c4bdab6eda8c041
Author: Peter Krempa 
Date:   Wed Feb 26 12:50:53 2020 +0100

qemuCheckpointDiscardBitmaps: Use qemuBlockGetNamedNodeData

Replace qemuMonitorBlockGetNamedNodeData by qemuBlockGetNamedNodeData.

Signed-off-by: Peter Krempa 
Reviewed-by: Ján Tomko 

commit f886c9f33051fc03dd6c78134c92d31e7caf0c81
Author: Peter Krempa 
Date:   Tue Jan 21 16:33:12 2020 +0100

qemu: monitor: Refactor variable cleanup in qemuMonitorJSONQueryNamedBlockNodes

Use g_autoptr to get rid of the cleanup section.

Signed-off-by: Peter Krempa 
Reviewed-by: Ján Tomko 

commit b7991c903cdd5b3c8b79a206584a4db81a6d4eaa
Author: Peter Krempa 
Date:   Tue Jan 21 15:13:47 2020 +0100

qemu: capabilities: Add capability for the 'flat' argument of 'query-named-block-nodes'

Detect the presence of the flag and make it available internally as
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT.

Signed-off-by: Peter Krempa 
Reviewed-by: Ján Tomko 

v6.1.0-29-g95080cc8b4

> Also, does anyone know the "official" limit on the number of overlays that
> can be created and is there a qemu test that exercises this? I could not

Libvirt's official limit is 200 layers.

This is the documentation for the validation function explaining the
rationale:

/**
 * qemuDomainStorageSourceValidateDepth:
 * @src: storage source chain to validate
 * @add: offsets the calculated number of images
 * @diskdst: optional disk target to use in error message
 *
 * The XML parser limits the maximum element nesting to 256 layers. As libvirt
 * reports the chain into the status and in some cases the config XML we must
 * validate that any user-provided chains will not exceed the XML nesting limit
 * when formatted to the XML.
 *
 * This function validates that the storage source chain starting @src is at
 * most 200 layers deep. @add modifies the calculated value to offset the number
 * to allow checking cases when new layers are going to be added to the chain.
 *
 * Returns 0 on success and -1 if the chain is too deep. Error is reported.
 */
int
qemuDomainStorageSourceValidateDepth(virStorageSourcePtr src,
 int add,
 const char *diskdst)


> find an overlay limit test in tests/qemu-iotests.

I'd guess qemu doesn't really limit that, but at some point you'll get
too much of a performance penalty from traversing all the structs.




Re: [PATCH v2 0/2] block: deprecate the sheepdog driver

2020-09-23 Thread Peter Krempa
On Tue, Sep 22, 2020 at 18:42:52 +0100, Daniel Berrange wrote:
> On Tue, Sep 22, 2020 at 01:09:06PM -0400, Neal Gompa wrote:
> > On Tue, Sep 22, 2020 at 12:16 PM Daniel P. Berrangé  
> > wrote:
> > >
> > > 2 years back I proposed dropping the sheepdog mailing list from the
> > > MAINTAINERS file, but somehow the patch never got picked up:
> > >
> > >   https://lists.gnu.org/archive/html/qemu-block/2018-03/msg01048.html
> > >
> > > So here I am with the same patch again.
> > >
> > > This time I go further and deprecate the sheepdog driver entirely.
> > > See the rationale in the second patch commit message.
> > >
> > > Daniel P. Berrangé (2):
> > >   block: drop moderated sheepdog mailing list from MAINTAINERS file
> > >   block: deprecate the sheepdog block driver
> > >
> > >  MAINTAINERS|  1 -
> > >  block/sheepdog.c   | 15 +++
> > >  configure  |  5 +++--
> > >  docs/system/deprecated.rst |  9 +
> > >  4 files changed, 27 insertions(+), 3 deletions(-)
> > >
> > > --
> > > 2.26.2
> > >
> > >
> > 
> > I don't know of anyone shipping this other than Fedora, and I
> > certainly don't use it there.
> > 
> > Upstream looks like it's unmaintained now, with no commits in over two
> > years: https://github.com/sheepdog/sheepdog/commits/master
> > 
> > Can we also get a corresponding change to eliminate this from libvirt?
> 
> This is only deprecation in QEMU, the feature still exists and is
> intended to be as fully functional as in previous releases.
> 
> Assuming QEMU actually deletes the feature at end of the deprecation
> cycle, then libvirt would look at removing its own support.

There are two sheepdog-related bits in libvirt which are IMO completely
separate:

1) the blockdev backend handling for the 'sheepdog' protocol driver

 This is the one connected to qemu's deprecation, but until there is a
 qemu version where sheepdog wasn't deprecated/removed supported by
 libvirt we can't really do much about it.

 On the other hand it's just a few generators of arguments for
 -drive/-blockdev so the burden is very low.

2) the sheepdog storage driver

 This one is completely separate from qemu and we can decide to
 deprecate/delete it at our own schedule as it will not influence the
 ability to start VMs.

 The last non-housekeeping commit in that driver seems to be dated:
 Thu Jun 18 16:20:42 2015

 Similarly the burden of keeping this around is low, but I would not bat
 an eye if we remove it right away even.




Re: [PATCH v3 5/6] qemu: Introduce to handle transient disk

2020-09-18 Thread Peter Krempa
On Thu, Sep 17, 2020 at 09:30:44 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Here is the implementation of transient option for qcow2 and raw format
> disk. This makes the <transient/> directive available in the domain xml
> file, like:
> 
> <disk type='file' device='disk'>
>   ...
>   <transient/>
> </disk>
> 
> When the qemu command line options are built, a new qcow2 image is
> created with backing qcow2 by using blockdev-snapshot command.
> The backing image is the qcow2 file which is set as <source>.
> The filename of the new qcow2 image is original-source-file.TRANSIENT.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_snapshot.c  | 103 +++---
>  src/qemu/qemu_snapshot.h  |   5 ++
>  src/util/virstoragefile.c |   2 +
>  src/util/virstoragefile.h |   4 ++
>  4 files changed, 107 insertions(+), 7 deletions(-)
> 
> diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
> index 1e8ea80b22..67fdc488e0 100644
> --- a/src/qemu/qemu_snapshot.c
> +++ b/src/qemu/qemu_snapshot.c
> @@ -1158,7 +1158,8 @@ qemuSnapshotCreateDiskActive(virQEMUDriverPtr driver,
>   virHashTablePtr blockNamedNodeData,
>   unsigned int flags,
>   virQEMUDriverConfigPtr cfg,
> - qemuDomainAsyncJob asyncJob)
> + qemuDomainAsyncJob asyncJob,
> + bool domSave)
>  {
>  qemuDomainObjPrivatePtr priv = vm->privateData;
>  g_autoptr(virJSONValue) actions = NULL;
> @@ -1201,17 +1202,26 @@ qemuSnapshotCreateDiskActive(virQEMUDriverPtr driver,
>  
>  virDomainAuditDisk(vm, dd->disk->src, dd->src, "snapshot", rc >= 0);
>  
> -if (rc == 0)
> +if (rc == 0) {
>  qemuSnapshotDiskUpdateSource(driver, vm, dd, blockdev);
> +
> +if (dd->disk->transient) {
> +/* Mark the transient working is completed to make sure we can */
> +/* remove the transient disk when the guest is shutdown. */
> +dd->disk->src->transientEstablished = true;
> +}
> +}
>  }
>  
>  if (rc < 0)
>  goto cleanup;
>  
> -if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0 ||
> -(vm->newDef && virDomainDefSave(vm->newDef, driver->xmlopt,
> -cfg->configDir) < 0))
> -goto cleanup;
> +if (domSave) {
> +if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0 ||
> +(vm->newDef && virDomainDefSave(vm->newDef, driver->xmlopt,
> +cfg->configDir) < 0))
> +goto cleanup;
> +}
>  
>  ret = 0;
>  
> @@ -1349,7 +1359,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriverPtr driver,
>  
>  if ((ret = qemuSnapshotCreateDiskActive(driver, vm, snap,
>  blockNamedNodeData, flags, cfg,
> -QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
> +QEMU_ASYNC_JOB_SNAPSHOT, true)) < 0)
>  goto cleanup;
>  
>  /* the snapshot is complete now */
> @@ -2264,3 +2274,82 @@ qemuSnapshotDelete(virDomainObjPtr vm,
>   cleanup:
>  return ret;
>  }
> +
> +
> +static int
> +qemuSnapshotSetupTransientDisk(virDomainSnapshotDiskDefPtr def,
> +   virStorageSourcePtr src)
> +{
> +g_autoptr(virStorageSource) trans = NULL;
> +
> +if (!(trans = virStorageSourceNew()))
> +return -1;
> +
> +trans->path = g_strdup_printf("%s.TRANSIENT", src->path);
> +if (virFileExists(trans->path)) {
> +virReportError(VIR_ERR_INVALID_ARG,
> +   _("Transient disk '%s' for '%s' exists"),
> +   trans->path, src->path);
> +return -1;
> +}
> +
> +trans->type = VIR_STORAGE_TYPE_FILE;
> +trans->format = VIR_STORAGE_FILE_QCOW2;
> +
> +def->src = g_steal_pointer(&trans);
> +
> +return 0;
> +}
> +
> +
> +int
> +qemuSnapshotCreateTransientDisk(virQEMUDriverPtr driver,
> +virDomainObjPtr vm,
> +int asyncJob)
> +{
> +int rc;
> +size_t i;
> +virDomainMomentObjPtr snap = NULL;
> +g_autoptr(virDomainSnapshotDef) snapdef = NULL;
> +g_autoptr(virHashTable) blockNamedNodeData = NULL;
> +qemuDomainObjPrivatePtr priv = vm->privateData;
> +g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
> +
> +if (!(snapdef = virDomainSnapshotDefNew()))
> +return -1;

This shouldn't be necessary if you factor out stuff out of
qemuSnapshotCreateDiskActive. I'll send a prerequisite patch doing the
refactor as I've requested last time.

> +
> +snapdef->parent.name = g_strdup_printf("transient");
> +
> +snapdef->ndisks = vm->def->ndisks;
> +if (VIR_ALLOC_N(snapdef->disks, snapdef->ndisks) < 0)
> +return -1;

Re: [PATCH v3 4/6] qemu: update the validation for transient disk option

2020-09-18 Thread Peter Krempa
On Thu, Sep 17, 2020 at 09:30:43 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Update validation of transient disk option. The option for qemu is supported
> under the following conditions:
> 
> - qemu has blockdev feature 
> - the type is file and the format is qcow2 and raw
> - writable disk
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_validate.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> index 070f1c962b..9cf78ca0c9 100644
> --- a/src/qemu/qemu_validate.c
> +++ b/src/qemu/qemu_validate.c
> @@ -2002,6 +2002,28 @@ qemuValidateDomainDeviceDefDiskSerial(const char 
> *value)
>  }
>  
>  
> +static int

This is declared as returning 'int'

> +qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,

If this function is called qemuValidateDomainDeviceDefDiskTransient it
should also do all of the validation including reporting errors.

> + virQEMUCapsPtr qemuCaps)
> +{
> +if ((!qemuCaps) || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_BLOCKDEV))

Parentheses are not required around negation

> +return false;

but returns booleans instead

> +
> +if (disk->src->readonly)
> +return false;
> +
> +if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
> +(disk->src->format != VIR_STORAGE_FILE_RAW) &&
> +(disk->src->type != VIR_STORAGE_TYPE_FILE)) {
> +return false;
> +}
> +
> +if (virStorageSourceIsEmpty(disk->src))
> +return false;
> +
> +return true;
> +}
> +
>  static int
>  qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk,
>  virQEMUCapsPtr qemuCaps)
> @@ -2186,7 +2208,8 @@ qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk,
>  }
>  }
>  
> -if (disk->transient) {
> +if ((disk->transient) &&
> +!qemuValidateDomainDeviceDefDiskTransient(disk, qemuCaps)) {
>  virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> _("transient disks not supported yet"));

The error messages this produces are really suboptimal. You don't know
whether your qemu is old, whether incorrect format is used or whether
it's a network disk.

>  return -1;

I'll rewrite this patch to produce better error messages and fix the
problems above.
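
Roughly what I have in mind (untested sketch; the error strings are
just examples):

static int
qemuValidateDomainDeviceDefDiskTransient(const virDomainDiskDef *disk,
                                         virQEMUCapsPtr qemuCaps)
{
    /* pre-blockdev qemu can't set up the transient overlay properly */
    if (!qemuCaps || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_BLOCKDEV)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                       _("transient disks are not supported by this QEMU binary"));
        return -1;
    }

    if (disk->src->readonly) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                       _("transient disk '%s' must not be read-only"),
                       disk->dst);
        return -1;
    }

    /* note the ||: reject network/block sources, not just bad formats */
    if (disk->src->type != VIR_STORAGE_TYPE_FILE ||
        (disk->src->format != VIR_STORAGE_FILE_QCOW2 &&
         disk->src->format != VIR_STORAGE_FILE_RAW)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                       _("transient disk '%s' must be a local 'qcow2' or 'raw' file"),
                       disk->dst);
        return -1;
    }

    if (virStorageSourceIsEmpty(disk->src)) {
        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
                       _("transient disk '%s' must have a source image"),
                       disk->dst);
        return -1;
    }

    return 0;
}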




Re: [PATCH v3 3/6] qemu: Block migration when transient disk option is enabled

2020-09-18 Thread Peter Krempa
On Thu, Sep 17, 2020 at 09:30:42 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Block migration when transient disk option is enabled because migration
> requires some blockjobs.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_migration.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index a530c17582..7316d74677 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -1397,6 +1397,16 @@ qemuMigrationSrcIsAllowed(virQEMUDriverPtr driver,
> _("cannot migrate this domain without 
> dbus-vmstate support"));
>  return false;
>  }
> +
> +for (i = 0; i < vm->def->ndisks; i++) {
> +virDomainDiskDefPtr disk = vm->def->disks[i];
> +
> +if (disk->transient) {
> +virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> +   _("migration with transient disk is not supported"));

transient disk '%s' is not supported

and use disk->dst as '%s'

> +return false;
> +}
> +}

Reviewed-by: Peter Krempa 




Re: [PATCH v3 1/6] qemu: Block blockjobs when transient disk option is enabled

2020-09-18 Thread Peter Krempa
On Fri, Sep 18, 2020 at 14:53:53 +0200, Peter Krempa wrote:
> On Thu, Sep 17, 2020 at 09:30:40 -0400, Masayoshi Mizuma wrote:
> > From: Masayoshi Mizuma 
> > 
> > Signed-off-by: Masayoshi Mizuma 
> > ---
> >  src/qemu/qemu_domain.c | 7 +++
> >  1 file changed, 7 insertions(+)
> 
> Reviewed-by: Peter Krempa 

Actually, I'd prefer some explanation in the commit message. No need to
re-send for now as I'll probably be fixing some things in other patches
as well.




Re: [PATCH v3 2/6] qemu: Block disk hotplug when transient disk option is enabled

2020-09-18 Thread Peter Krempa
On Thu, Sep 17, 2020 at 09:30:41 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_hotplug.c | 6 ++
>  1 file changed, 6 insertions(+)

Similarly to previous commit, the code is good but the commit message
could actually describe the reasons.

Reviewed-by: Peter Krempa 




Re: [PATCH v3 1/6] qemu: Block blockjobs when transient disk option is enabled

2020-09-18 Thread Peter Krempa
On Thu, Sep 17, 2020 at 09:30:40 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_domain.c | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Peter Krempa 




Re: [PATCH v2 6/7] qemu: Block disk hotplug when transient disk option is enabled

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:36 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Block disk hotplug when transient disk option is enabled, for now.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_hotplug.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
> index 2c6c30ce03..1c1b6c3acf 100644
> --- a/src/qemu/qemu_hotplug.c
> +++ b/src/qemu/qemu_hotplug.c
> @@ -1031,6 +1031,12 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr driver,
>  return -1;
>  }
>  
> +if (disk->transient) {
> +virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
> +   _("transient disk hotplug isn't supported"));
> +return -1;
> +}

Please move this up in the series so that it's in place before enabling
the feature.




Re: [PATCH v2 5/7] qemu: Block migration when transient disk option is enabled

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:35 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Block migration when transient disk option is enabled because migration
> requires some blockjobs.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_migration.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> index 0f2f92b211..6fcf5a3a07 100644
> --- a/src/qemu/qemu_migration.c
> +++ b/src/qemu/qemu_migration.c
> @@ -2949,6 +2949,22 @@ qemuMigrationDstPrepareDirect(virQEMUDriverPtr driver,
>  }
>  
>  
> +static bool
> +qemuMigrationTransientDiskExists(virDomainDefPtr def)
> +{
> +size_t i;
> +
> +for (i = 0; i < def->ndisks; i++) {
> +   virDomainDiskDefPtr disk = def->disks[i];
> +
> +   if (disk->transient)
> +return true;
> +}
> +
> +return false;
> +}
> +
> +
>  virDomainDefPtr
>  qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
> virQEMUCapsPtr qemuCaps,
> @@ -2971,6 +2987,12 @@ qemuMigrationAnyPrepareDef(virQEMUDriverPtr driver,
>  VIR_DOMAIN_DEF_PARSE_SKIP_VALIDATE)))
>  goto cleanup;
>  
> +/*
> + * transient disk option is a blocker for migration
> + */
> +if (qemuMigrationTransientDiskExists(def))
> +   goto cleanup;

This should really be placed into qemuMigrationSrcIsAllowed() rather
than open-coded. Migration is used in other places too and doesn't use
this API to trigger it.




Re: [PATCH v2 7/7] qemu: Block blockjobs when transient disk option is enabled

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:34 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Block blockjobs when transient disk option is enabled, for now.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_domain.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index e28f704dba..98a52e5476 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -10678,6 +10678,13 @@ qemuDomainDiskBlockJobIsSupported(virDomainObjPtr vm,
>  return false;
>  }
>  
> +if (disk->transient) {
> +virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
> +   _("block jobs are not supported on transient disk '%s'"),
> +   disk->dst);
> +return false;
> +}

Please move this patch a bit sooner in the series so that all checks are
in place before enabling the feature.




Re: [PATCH v2 4/7] qemu: Transient option gets available for qcow2 and raw format disk

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:33 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_validate.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> index 488f258d00..82818a4fdc 100644
> --- a/src/qemu/qemu_validate.c
> +++ b/src/qemu/qemu_validate.c
> @@ -2166,9 +2166,12 @@ qemuValidateDomainDeviceDefDiskFrontend(const virDomainDiskDef *disk,
>  }
>  
>  if (disk->transient) {
> -virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> -   _("transient disks not supported yet"));
> -return -1;
> +if ((disk->src->format != VIR_STORAGE_FILE_QCOW2) &&
> +(disk->src->format != VIR_STORAGE_FILE_RAW)) {

This needs a check that QEMU_CAPS_BLOCKDEV is available as it won't
really work properly in pre-blockdev config. Also here you
should reject any other unsupported configuration rather than in the
code.

> +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> +   _("transient disks not supported yet"));
> +return -1;
> +}
>  }
>  
>  if (disk->iomode == VIR_DOMAIN_DISK_IO_NATIVE &&
> -- 
> 2.27.0
> 




Re: [PATCH v2 1/7] qemuSnapshotDiskPrepareOne: Get available even if snapdisk is NULL

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:30 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Get available even if snapdisk argument is NULL at 
> qemuSnapshotDiskPrepareOne()
> so that the caller can setup dd->src.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_snapshot.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
> index 1e8ea80b22..d310e6ff02 100644
> --- a/src/qemu/qemu_snapshot.c
> +++ b/src/qemu/qemu_snapshot.c
> @@ -953,8 +953,9 @@ qemuSnapshotDiskPrepareOne(virQEMUDriverPtr driver,
>  if (qemuDomainStorageSourceValidateDepth(disk->src, 1, disk->dst) < 0)
>  return -1;
>  
> -if (!(dd->src = virStorageSourceCopy(snapdisk->src, false)))
> -return -1;
> +if (snapdisk)
> +if (!(dd->src = virStorageSourceCopy(snapdisk->src, false)))
> +return -1;

NACK, you can pass in a 'snapdisk' with the correct data to create the
overlay which will be a cleaner solution.




Re: [PATCH v2 2/7] qemu: Introduce functions to handle transient disk

2020-09-08 Thread Peter Krempa
On Fri, Aug 28, 2020 at 10:08:31 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Here is the implementation of transient option for qcow2 and raw format
> disk. This makes the <transient/> directive available in the domain xml
> file, like:
> 
> <disk type='file' device='disk'>
>   ...
>   <transient/>
> </disk>
> 
> When the qemu command line options are built, a new qcow2 image is
> created with backing qcow2 by using blockdev-snapshot command.
> The backing image is the qcow2 file which is set as <source>.
> The filename of the new qcow2 image is original-source-file.TRANSIENT.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_snapshot.c  | 134 ++
>  src/qemu/qemu_snapshot.h  |   8 +++
>  src/util/virstoragefile.h |   2 +
>  3 files changed, 144 insertions(+)
> 
> diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
> index d310e6ff02..5c61d19f26 100644
> --- a/src/qemu/qemu_snapshot.c
> +++ b/src/qemu/qemu_snapshot.c
> @@ -2265,3 +2265,137 @@ qemuSnapshotDelete(virDomainObjPtr vm,
>   cleanup:
>  return ret;
>  }
> +
> +static int
> +qemuTransientDiskPrepareOne(virQEMUDriverPtr driver,

This should be named with a qemuSnapshot... prefix:

qemuSnapshotDiskTransientPrepareOne for example.

> +virDomainObjPtr vm,
> +qemuSnapshotDiskDataPtr data,
> +virHashTablePtr blockNamedNodeData,
> +int asyncJob,
> +virJSONValuePtr actions)
> +{
> +int rc = -1;
> +virStorageSourcePtr dest;

You can use g_autoptr here.

> +virStorageSourcePtr src = data->disk->src;
> +qemuDomainObjPrivatePtr priv = vm->privateData;
> +g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(priv->driver);
> +
> +if (!strlen(src->path))

This will not work as expected. You must make sure that the source disk
is VIR_STORAGE_TYPE_FILE. presence of src->path does not guarantee it.

Also you can use virStorageSourceIsEmpty() here if you want to skip
empty cdroms, but transient cdrom/readonly disks should be rejected in
the config validator beforehand.

> +return rc;

Please always return constants right away. (rc is a constant at this
point).

> +
> +if (!(dest = virStorageSourceNew()))
> +return rc;
> +
> +dest->path = g_strdup_printf("%s.TRANSIENT", src->path);
> +
> +if (virFileExists(dest->path)) {
> +virReportError(VIR_ERR_INVALID_ARG,
> +   _("Transient disk '%s' for '%s' exists"),
> +   dest->path, src->path);
> +goto cleanup;
> +}
> +
> +dest->type = VIR_STORAGE_TYPE_FILE;
> +dest->format = VIR_STORAGE_FILE_QCOW2;
> +data->src = dest;
> +
> +if (qemuSnapshotDiskPrepareOne(driver, vm, cfg, data->disk,
> + NULL, data, blockNamedNodeData,

Create a virDomainSnapshotDiskDefPtr with the correct data from above
and pass it in here, rather than working around the addition to the data
structure.

> + false, true, asyncJob, actions) < 0)
> +goto cleanup;
> +
> +rc = 0;
> + cleanup:
> +if (rc < 0)
> +g_free(dest->path);
> +
> +return rc;
> +}
> +
> +static int
> +qemuWaitTransaction(virQEMUDriverPtr driver,
> +virDomainObjPtr vm,
> +int asyncJob,
> +virJSONValuePtr *actions)

This function isn't IMO necessary.

> +{
> +qemuDomainObjPrivatePtr priv = vm->privateData;
> +
> +if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
> +return -1;
> +
> +if (qemuMonitorTransaction(priv->mon, actions) < 0)
> +return -1;
> +
> +if (qemuDomainObjExitMonitor(driver, vm) < 0)
> +return -1;
> +
> +return 0;
> +}
> +
> +int
> +qemuTransientCreatetDisk(virQEMUDriverPtr driver,
> + virDomainObjPtr vm,
> + int asyncJob)
> +{
> +size_t i;
> +int rc = -1;
> +size_t ndata = 0;
> +qemuSnapshotDiskDataPtr data = NULL;
> +g_autoptr(virJSONValue) actions = NULL;
> +qemuDomainObjPrivatePtr priv = vm->privateData;
> +g_autoptr(virHashTable) blockNamedNodeData = NULL;
> +bool blockdev =  virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV);

Imo it will be better to factor out the common parts from this and
qemuSnapshotCreateDiskActive and then just use the disjunct logic in
this function.

> +
> +if (!blockdev)
> +return rc;

This breaks startup of VMs with pre-blockdev qemu as it returns -1 here
directly when blockdev is not supported.

> +if (VIR_ALLOC_N(data, vm->def->ndisks) < 0)
> +return rc;
> +
> +if (!(blockNamedNodeData = qemuBlockGetNamedNodeData(vm, asyncJob)))
> +goto cleanup;
> +
> +if (!(actions = virJSONValueNewArray()))
> +goto cleanup;
> +
> +for (i = 0; i < vm->def->ndisks; i++) {
> +

Re: qcow2 overlay performance

2020-08-26 Thread Peter Krempa
On Wed, Aug 26, 2020 at 15:30:03 +0200, Peter Krempa wrote:
> On Wed, Aug 26, 2020 at 15:18:32 +0200, Kevin Wolf wrote:
> > Am 26.08.2020 um 02:46 hat Yoonho Park geschrieben:
> > > Another issue I hit is that I cannot set or change the cluster size of
> > > overlays. Is this possible with "virsh snapshot-create-as"?
> > 
> > That's a libvirt question. Peter, can you help?

[...]

> If you figure out that cluster size matters a lot and qemu will not want
> to change the default, please file a feature request for libvirt.

After sending the previous mail I thought about it for a bit and IMO a
majority of cases would be fixed by using the cluster size of the
original image in the newly created image. I've created patches which do
this with libvirt. You can get them here:

https://www.redhat.com/archives/libvir-list/2020-August/msg00995.html




Re: qcow2 overlay performance

2020-08-26 Thread Peter Krempa
On Wed, Aug 26, 2020 at 15:18:32 +0200, Kevin Wolf wrote:
> Am 26.08.2020 um 02:46 hat Yoonho Park geschrieben:
> > Another issue I hit is that I cannot set or change the cluster size of
> > overlays. Is this possible with "virsh snapshot-create-as"?
> 
> That's a libvirt question. Peter, can you help?

Currently the libvirt snapshot API doesn't support configuration of the
cluster size of the new image (it should be straightforward to
implement it now that we create images via blockdev-create, so the only
open question is how to expose it in the XML).

There's a reasonably straightforward workaround (at least for testing),
depending on how you want to approach it. You can create your own
overlay file using qemu-img create (make sure to properly specify
-F and -b) using your desired cluster size. You then use 'virsh
snapshot-create --reuse-external' which expects that the images
described by the snapshot XML are already existing files and uses those
(without touching the metadata at all) rather than creating new files.
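
For example (paths and the cluster size are arbitrary):

  qemu-img create -f qcow2 -b /images/base.qcow2 -F qcow2 \
      -o cluster_size=2M /images/overlay.qcow2
  virsh snapshot-create VM snap.xml --disk-only --reuse-external

where snap.xml names /images/overlay.qcow2 as the new source of the
disk.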

If you figure out that cluster size matters a lot and qemu will not want
to change the default, please file a feature request for libvirt.

Either way, the best performing option should be the default though.




Re: [RFC PATCH 07/22] block/export: Remove magic from block-export-add

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 09:41:14 -0500, Eric Blake wrote:
> On 8/20/20 6:05 AM, Kevin Wolf wrote:
> 
> > As long as we can keep the compatibility code local to qmp_nbd_*(), I
> > don't think it's too bad. In particular because it's already written.
> > 
> > Instead of adjusting libvirt to changes in the nbd-* commands, I'd
> > rather have it change over to block-export-*. I would like to see the
> > nbd-server-add/remove commands deprecated soon after we have the
> > replacements.
> 
> Makes sense to me. So deprecate nbd-server-add in 5.2, and require
> block-export in 6.1.
> 
> 
> > > > +/*
> > > > + * nbd-server-add doesn't complain when a read-only device should be
> > > > + * exported as writable, but simply downgrades it. This is an error with
> > > > + * block-export-add.
> > > 
> > > I'd be happy with either marking this deprecated now (and fixing it in two
> > > releases), or declaring it a bug in nbd-server-add now (and fixing it
> > > outright).
> > 
> > How about deprecating nbd-server-add completely?
> 
> Works for me. Keeping the warts backwards-compatible in nbd-server-add is
> more palatable if we know we are going to drop it wholesale down the road.
> 
> > > > +/*
> > > > + * nbd-server-add removes the export when the named BlockBackend 
> > > > used for
> > > > + * @device goes away.
> > > > + */
> > > > +on_eject_blk = blk_by_name(arg->device);
> > > > +if (on_eject_blk) {
> > > > +nbd_export_set_on_eject_blk(export, on_eject_blk);
> > > > +}
> > > 
> > > Wait - is the magic export removal tied only to exporting a drive name, and
> > > not a node name?  So as long as libvirt is using only node names whwen
> > > adding exports, a drive being unplugged won't interfere?
> > 
> > Yes, seems so. It's the existing behaviour, I'm only moving the code
> > around.
> > 
> > > Overall, the change makes sense to me, although I'd love to see if we could
> > > go further on the writable vs. read-only issue.
> > 
> > If nbd-server-add will be going away relatively soon, it's probably not
> > worth the trouble. But if you have reasons to keep it, maybe we should
> > consider it.
> 
> No, I'm fine with the idea of getting rid of nbd-server-add, at which point
> changing it before removal is pointless.

I agree that this is a better approach. While it's technically possible
to retrofit old commands, since QMP schema introspection allows granular
detection of what's happening in this regard, using a new command is IMO
better, specifically for apps which might not use QMP introspection to
the extent libvirt does.




Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-08-06 Thread Peter Krempa
On Sun, Jul 19, 2020 at 22:56:50 -0400, Masayoshi Mizuma wrote:
> On Sat, Jul 18, 2020 at 08:06:00AM +0200, Peter Krempa wrote:
> > On Thu, Jul 16, 2020 at 20:55:29 -0400, Masayoshi Mizuma wrote:
> > > Thank you for your review.
> > > 
> > > On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> > > > On 7/7/20 2:12 AM, Peter Krempa wrote:
> > > > 
> > > > > You can install a qcow2 overlay on top of a raw file too. IMO the
> > > > > implications of using  allow that.
> > > > > 
> > > > > As said above I'd strongly prefer if the overlay is created in qemu
> > > > > using the blockdev-create blockjob (there is already infrastructure in
> > > > > libvirt to achieve that).
> > > > 
> > > > Agreed.  At this point, any time we call out to qemu-img as a separate
> > > > process, we are probably doing it wrong.
> > > 
> > > Got it. I'm thinking about the procedure such as followings.
> > > Does that make sense?
> > > 
> > >   1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
> > >  and connect it.
> > 
> > Starting a new qemu process just to format an image is extreme overkill
> > and definitely not what we want to do.
> 
> I see, thanks.
> 
> > 
> > >   2) Setup the transient disk with qemuDomainPrepareStorageSourceBlockdev(),
> > >  qemuBlockStorageSourceAttachApplyStorage(),
> > >  qemuBlockStorageSourceCreateGetFormatProps() and something...
> > > 
> > >   3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
> > >  close the monitor.
> > 
> > These two steps should be executed in the qemu process which already
> > will run the VM prior to starting the guest CPUs.
> > 
> > >   4) Switch the original disk to the transient disk.
> > > 
> > >   5) Build the blockdev argument for qemu.
> > 
> > And instead of this step, you use the external snapshot infrastructure
> > to install the overlays via 'blockdev-snapshot' QMP command
> 
> OK. I suppose qemuDomainSnapshotDiskPrepare() and
> qemuDomainSnapshotDiskUpdateSource() may be helpful to implement the
> setup steps of the transient disk.
> 
> > 
> > > 
> > >   6) Run qemu
> > 
> > And instead of this the VM cpus will be started.
> 
> Got it, I think the appropriate place is after virCommandRun() at 
> qemuProcessLaunch(),
> and before qemuProcessFinishStartup().
> 
> > 
> > 
> > The above steps require factoring out snapshot code a bit. I have a few
> > patches in that direction so I'll try posting them next week hopefully.

Sorry this took longer than expected, but we were dealing with the build
system change.

The refactor is here:

https://www.redhat.com/archives/libvir-list/2020-August/msg00299.html

You can now create an equivalent of 'qemuSnapshotDiskPrepare' which will
go through the disks and find the 'transient' ones. It will then create
snapshot data by a call to 'qemuSnapshotDiskPrepareOne' with a faked
snapshot disk object.

'qemuSnapshotDiskPrepareOne' ensures that the files are created and
formatted properly.

You then use same algorithm as 'qemuSnapshotCreateDiskActive'
(e.g. by extracting the common internals (basically everything except
call to 'qemuSnapshotDiskPrepare') into a separate function) and reuse
it when starting the VM as you've described above.

Note that all of the above can work only when QEMU_CAPS_BLOCKDEV is
supported.
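
For illustration, the rough QMP shape of the sequence described above
(node names, paths and the size are hypothetical; waiting for the create
jobs to finish and dismissing them is omitted):

  { "execute": "blockdev-create",
    "arguments": { "job-id": "cr-file",
                   "options": { "driver": "file",
                                "filename": "/path/overlay.TRANSIENT",
                                "size": 0 } } }

  { "execute": "blockdev-add",
    "arguments": { "driver": "file",
                   "filename": "/path/overlay.TRANSIENT",
                   "node-name": "transient-storage" } }

  { "execute": "blockdev-create",
    "arguments": { "job-id": "cr-fmt",
                   "options": { "driver": "qcow2",
                                "file": "transient-storage",
                                "size": 10737418240,
                                "backing-file": "/path/base.qcow2",
                                "backing-fmt": "qcow2" } } }

  { "execute": "blockdev-add",
    "arguments": { "driver": "qcow2",
                   "node-name": "transient-format",
                   "file": "transient-storage",
                   "backing": null } }

  { "execute": "blockdev-snapshot",
    "arguments": { "node": "base-format",
                   "overlay": "transient-format" } }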

The caveats/limitations with blockjobs and snapshots still apply though.
It depends on how you approach it. It's okay to limit/block the features
if transient disk is used. Alternatively you can implement some form of
private data to mark which image was transient and allow those
operations.




Re: [PATCH 1/2] qcow2: Release read-only bitmaps when inactivated

2020-07-30 Thread Peter Krempa
On Thu, Jul 30, 2020 at 14:02:33 +0200, Max Reitz wrote:
> During migration, we release all bitmaps after storing them on disk, as
> long as they are (1) stored on disk, (2) not read-only, and (3)
> consistent.
> 
> (2) seems arbitrary, though.  The reason we do not release them is
> because we do not write them, as there is no need to; and then we just
> forget about all bitmaps that we have not written to the file.  However,
> read-only persistent bitmaps are still in the file and in sync with
> their in-memory representation, so we may as well release them just like
> any R/W bitmap that we have updated.
> 
> It leads to actual problems, too: After migration, letting the source
> continue may result in an error if there were any bitmaps on read-only
> nodes (such as backing images), because those have not been released by
> bdrv_inactive_all(), but bdrv_invalidate_cache_all() attempts to reload
> them (which fails, because they are still present in memory).
> 

I've tested it with the same commands as I've used before and now the 'cont'
succeeds; the bitmaps after the cont call are also loaded and active,
at least according to 'query-named-block-nodes'.

Tested-by: Peter Krempa 




Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-18 Thread Peter Krempa
On Thu, Jul 16, 2020 at 20:55:29 -0400, Masayoshi Mizuma wrote:
> Thank you for your review.
> 
> On Tue, Jul 07, 2020 at 06:36:23AM -0500, Eric Blake wrote:
> > On 7/7/20 2:12 AM, Peter Krempa wrote:
> > 
> > > You can install a qcow2 overlay on top of a raw file too. IMO the
> > > implications of using <transient/> allow that.
> > > 
> > > As said above I'd strongly prefer if the overlay is created in qemu
> > > using the blockdev-create blockjob (there is already infrastructure in
> > > libvirt to achieve that).
> > 
> > Agreed.  At this point, any time we call out to qemu-img as a separate
> > process, we are probably doing it wrong.
> 
> Got it. I'm thinking about the procedure such as followings.
> Does that make sense?
> 
>   1) Open the monitor with qemuProcessQMPNew()/qemuProcessQMPStart(), 
>  and connect it.

Starting a new qemu process just to format an image is extreme overkill
and definitely not what we want to do.

>   2) Setup the transient disk with qemuDomainPrepareStorageSourceBlockdev(),
>  qemuBlockStorageSourceAttachApplyStorage(),
>  qemuBlockStorageSourceCreateGetFormatProps() and something...
> 
>   3) Run blockdev-create command with qemuMonitorBlockdevCreate(), then
>  close the monitor.

These two steps should be executed in the qemu process which already
will run the VM prior to starting the guest CPUs.

>   4) Switch the original disk to the transient disk.
> 
>   5) Build the blockdev argument for qemu.

And instead of this step, you use the external snapshot infrastructure
to install the overlays via 'blockdev-snapshot' QMP command

> 
>   6) Run qemu

And instead of this the VM cpus will be started.


The above steps require factoring out snapshot code a bit. I have a few
patches in that direction so I'll try posting them next week hopefully.





Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-07 Thread Peter Krempa
On Tue, Jul 07, 2020 at 06:36:23 -0500, Eric Blake wrote:
> On 7/7/20 2:12 AM, Peter Krempa wrote:
> > 
> > 1) the virDomainBlockCopy operation flattens the backing chain into the
> > top level only. This means that  must be stripped or the
> > operation rejected, as otherwise shutting down the VM would end up
> > removing the disk image completely.
> 
> If you have marked a disk transient, does it still make sense to allow
> copying that disk?  Rejecting the operation is easiest, as permitting it
> implies that even though you already said you don't care about changes to
> your disk, you still want to be able to back up that disk.

Back up? This is more about moving to new storage.

> > 2) the same as above is used also for non-shared-storage migration where
> > we use block-copy internally to transport the disks, same as above
> > applies. Here again it requires careful consideration of the semantics,
> > e.g whether to reject it or e.g. copy it into the original filename and
> > strip <transient/> (we can't currently copy multiple layers).
> 
> The easiest solution is to make a transient disk a migration-blocker. Of
> course, this may annoy people, so migration properly creating a transient
> destination on top of the original base, to preserve the fact that when the
> migrated guest shuts down it reverts to original contents, is a nicer (but
> more complex) goal.

To be fair, our docs already point out that <transient/> prevents
migration. It still needs to be implemented.

Same appliest to snapshots.

> > 3) active-layer virDomainBlockCommit moves the data from the transient
> > overlay into the original (now backing image). The <transient/> flag is
> > stored in the disk struct though, so that would mean that the original
> > disk source would be removed after stopping the VM. block commit must
> > clear the <transient/> flag.
> 
> Why should commit be permitted, when you declared that disk contents
> shouldn't change?  For that matter, external snapshots should be blocked if
> there is a transient disk.

I might have read it in the qemu documentation, but commit is meant to
actually ... commit ... the changes the user might want to keep.

I'm okay with forbidding it though in libvirt.




Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands

2020-07-07 Thread Peter Krempa
On Tue, Jul 07, 2020 at 12:33:31 +0200, Kevin Wolf wrote:
> Am 07.07.2020 um 08:38 hat Peter Krempa geschrieben:
> > On Mon, Jul 06, 2020 at 18:15:55 +0200, Kevin Wolf wrote:
> > > Am 03.07.2020 um 18:02 hat Daniel P. Berrangé geschrieben:

[...]

> > IMO we really want this also for external snapshots. Driving the
> > migration as standard migration is really suboptimal especially when the
> > user wants minimal downtime. Transactioning a post-copy style copy-on
> > write migration would simplify this a lot. I agree though that this is
> > for a different conversation.
> 
> This is an interesting point actually. And while the implementation of
> the post-copy style live snapshotting is for a different conversation, I
> think the implications it has on the API are relevant for us now.
> 
> But even if we have an all-in-one snapshot job instead of a transaction
> to group all the individual operations together, I think you could still
> represent that by just specifying an empty list of nodes to be
> snapshotted. (I feel this is another argument for passing the nodes to
> include rather than nodes to exclude from the disk snapshot.)

Definitely. From libvirt's POV it's IMO simpler and more future-proof to
enumerate everything rather than keep a database of what to skip.




Re: [PATCH 3/3] qemublocktest: add test of transient option for qcow2 disk

2020-07-07 Thread Peter Krempa
On Mon, Jul 06, 2020 at 14:20:25 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Add a unit test for transient option for qcow2 file.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  tests/qemublocktest.c   | 10 ++
>  .../xml2json/qcow2-transient-srconly.json   |  9 +
>  .../qemublocktestdata/xml2json/qcow2-transient.json | 13 +
>  .../qemublocktestdata/xml2json/qcow2-transient.xml  | 13 +
>  4 files changed, 45 insertions(+)
>  create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient-srconly.json
>  create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.json
>  create mode 100644 tests/qemublocktestdata/xml2json/qcow2-transient.xml
> 
> diff --git a/tests/qemublocktest.c b/tests/qemublocktest.c
> index 0cdedb9..1294c18 100644
> --- a/tests/qemublocktest.c
> +++ b/tests/qemublocktest.c
> @@ -266,6 +266,7 @@ testQemuDiskXMLToProps(const void *opaque)
>  g_autoptr(virJSONValue) formatProps = NULL;
>  g_autoptr(virJSONValue) storageProps = NULL;
>  g_autoptr(virJSONValue) storageSrcOnlyProps = NULL;
> +qemuDomainObjPrivate priv;
>  g_autofree char *xmlpath = NULL;
>  g_autofree char *xmlstr = NULL;
>  
> @@ -288,6 +289,13 @@ testQemuDiskXMLToProps(const void *opaque)
>  return -1;
>  }
>  
> +if (disk->transient) {
> +priv.driver = data->driver;
> +if (qemuBlockCreateTransientDisk(disk->src, &priv) < 0)

NACK, this would create files on the system running the test suite in
random paths according to the disk config. The test-suite must never do
that.




Re: [PATCH 1/3] qemu: implementation of transient option for qcow2 file

2020-07-07 Thread Peter Krempa
On Mon, Jul 06, 2020 at 14:20:23 -0400, Masayoshi Mizuma wrote:
> From: Masayoshi Mizuma 
> 
> Here is the implementation of transient option for qcow2 file.
> This makes the <transient/> directive available in the domain xml file,
> like:
> 
> <disk type='file' device='disk'>
>   <driver name='qemu' type='qcow2'/>
>   <source file='/var/lib/libvirt/images/guest.qcow2'/>
>   <target dev='vda' bus='virtio'/>
>   <transient/>
> </disk>
> 
> The internal procedure is as follows.
> When the qemu command line options are built, a new qcow2 image is created
> with backing qcow2 image by using qemu-img command. The backing image is the
> qcow2 file which is set as <source>.
> The filename of the new qcow2 image is original-source-file.TRANSIENT.
> The qemu-img command will be:
> 
> qemu-img create -f qcow2 -F qcow2 \
> -b /var/lib/libvirt/images/guest.qcow2 \
> /var/lib/libvirt/images/guest.qcow2.TRANSIENT
> 
> Then, it switches the disk path, virStorageSourcePtr src->path, to
> the new qcow2 image. The new image and the backing image is handled and
> the blockdev option of qemu will be built like as:
> 
> -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2",
>   "node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
> -blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2",
>   "file":"libvirt-2-storage","backing":null}' \
> -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/guest.qcow2.TRANSIENT",
>   "node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
> -blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2",
>   "file":"libvirt-1-storage","backing":"libvirt-2-format"}'
> 
> The new qcow2 image is removed when the guest is shut down.
> 
> Signed-off-by: Masayoshi Mizuma 
> ---
>  src/qemu/qemu_block.c| 71 
>  src/qemu/qemu_block.h|  7 
>  src/qemu/qemu_domain.c   |  4 +++
>  src/qemu/qemu_process.c  |  3 ++
>  src/qemu/qemu_validate.c |  2 +-
>  5 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
> index 6f9c707..5eb0225 100644
> --- a/src/qemu/qemu_block.c
> +++ b/src/qemu/qemu_block.c
> @@ -27,6 +27,7 @@
>  #include "viralloc.h"
>  #include "virstring.h"
>  #include "virlog.h"
> +#include "virqemu.h"
>  
>  #define VIR_FROM_THIS VIR_FROM_QEMU
>  
> @@ -3438,3 +3439,73 @@ qemuBlockUpdateRelativeBacking(virDomainObjPtr vm,
>  
>  return 0;
>  }
> +
> +int
> +qemuBlockCreateTransientDisk(virStorageSourcePtr src,
> + qemuDomainObjPrivatePtr priv)
> +{
> +virBuffer buf = VIR_BUFFER_INITIALIZER;
> +virCommandPtr cmd = NULL;
> +g_autofree char *filename = NULL;
> +g_autofree char *dirname = NULL;
> +const char *qemuImgPath;
> +char *newpath;
> +int err = -1;
> +
> +if ((src->format != VIR_STORAGE_FILE_QCOW2)) {
> +virReportError(VIR_ERR_OPERATION_INVALID,
> +   "%s", _("transient option supports qcow2"));
> +return -1;
> +}
> +
> +if (!(qemuImgPath = qemuFindQemuImgBinary(priv->driver)))
> +return -1;
> +
> +newpath = g_strdup_printf("%s.TRANSIENT", src->path);

This assumes that 'src' is a local disk, but the code doesn't check it.
If e.g. an NBD disk which can have 'path' NULL is used it would crash.

Specifically, since we can't even create a new NBD disk this won't even
work.

> +
> +if (virFileExists(newpath)) {
> +virReportError(VIR_ERR_INVALID_ARG,
> +   _(" '%s' is already exists. Please remove it."), 
> newpath);
> +goto cleanup;
> +}
> +
> +if (!(cmd = virCommandNewArgList(qemuImgPath,
> + "create",
> + "-f",
> + "qcow2",
> + "-F",
> + "qcow2",
> + "-b",
> + NULL)))
> +goto cleanup;

I'd strongly suggest you implement this after qemu starts: use the new
blockdev-create blockjob before starting the CPUs and install the
overlays using the snapshot command.

The above will not work, or will require many adjustments, e.g. for
luks-encrypted qcow2 files, which would require substantial changes.


> +virQEMUBuildBufferEscapeComma(&buf, src->path);
> +virCommandAddArgBuffer(cmd, &buf);
> +
> +virQEMUBuildBufferEscapeComma(&buf, newpath);
> +virCommandAddArgBuffer(cmd, &buf);
> +
> +if (virCommandRun(cmd, NULL) < 0)
> +goto cleanup;
> +
> +VIR_DEBUG("Original disk: %s Transient disk: %s", src->path, newpath);
> +
> +g_free(src->path);
> +src->path = newpath;

This must add a new virStorageSource as 'src' and move the original to
src->backingStore in the live configuration object. We should not try to
detect the files which we know are there.
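
Roughly something like the following sketch (illustrative only, not the
actual patch; the names follow libvirt's virStorageSource and error
handling is omitted):

  /* Create the overlay source and push the original below it so that
   * the live config reflects the real chain: overlay -> original. */
  virStorageSourcePtr overlay = virStorageSourceNew();

  overlay->type = VIR_STORAGE_TYPE_FILE;
  overlay->format = VIR_STORAGE_FILE_QCOW2;
  overlay->path = g_strdup(newpath);
  overlay->backingStore = disk->src; /* original becomes the backing image */
  disk->src = overlay;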


> +
> +err = 0;
> + cleanup:
> +

Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands

2020-07-07 Thread Peter Krempa
On Mon, Jul 06, 2020 at 18:15:55 +0200, Kevin Wolf wrote:
> Am 03.07.2020 um 18:02 hat Daniel P. Berrangé geschrieben:
> > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:

[...]

> > migration only does vmstate, not disks. The current blockdev commands
> > are all related to external snapshots, nothing for internal snapshots
> > AFAIK. So we still need commands to load/save internal snapshots of
> > the disk data in the qcow2 files.
> > 
> > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > a migration transport that targets disk images, and blockdev QMP command
> > that can do internal snapshots. Neither of these exist though and feel
> > like a significantly larger amount of work than using existing functionality
> > that is currently working.
> 
> There is blockdev-snapshot-internal-sync, which does a disk-only
> snapshot of a single node. A snapshot of multiple nodes can be taken by
> putting multiple blockdev-snapshot-internal-sync inside a 'transaction'
> command.

Libvirt never implemented support for disk-only internal snapshots as I
didn't think they were worth it. We also made a mistake by using the
VIR_DOMAIN_SNAPSHOT_DISK_ONLY flag to switch to an external snapshot, so
while the XML can modify the snapshot to be internal it's not very clear
nor user-friendly to force an internal disk-only snapshot.
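
For reference, grouping such snapshots would look roughly like this
(the device/node names and the snapshot name are made up):

  { "execute": "transaction",
    "arguments": { "actions": [
        { "type": "blockdev-snapshot-internal-sync",
          "data": { "device": "disk0-format", "name": "snap1" } },
        { "type": "blockdev-snapshot-internal-sync",
          "data": { "device": "disk1-format", "name": "snap1" } } ] } }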

> If we want to build on top of this, we'd have to implement a
> transactionable command for storing only the VM state to a specific
> node. This would probably still be a long-running job.

IMO we really want this also for external snapshots. Driving the
migration as standard migration is really suboptimal especially when the
user wants minimal downtime. Transactioning a post-copy style copy-on-write
migration would simplify this a lot. I agree though that this is
for a different conversation.




Re: [PATCH 2/6] migration: introduce savevm, loadvm, delvm QMP commands

2020-07-03 Thread Peter Krempa
On Fri, Jul 03, 2020 at 17:10:12 +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > On Fri, Jul 03, 2020 at 04:49:33PM +0100, Dr. David Alan Gilbert wrote:
> > > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > > On Thu, Jul 02, 2020 at 01:12:52PM -0500, Eric Blake wrote:
> > > > > On 7/2/20 12:57 PM, Daniel P. Berrangé wrote:

[...]

> > > Remind me, what was the problem with just making a block: migration
> > > channel, and then we can migrate to it?
> > 
> > migration only does vmstate, not disks. The current blockdev commands
> > are all related to external snapshots, nothing for internal snapshots
> > AFAIK. So we still need commands to load/save internal snapshots of
> > the disk data in the qcow2 files.
> > 
> > So we could look at loadvm/savevm conceptually as a syntax sugar above
> > a migration transport that targets disk images, and blockdev QMP command
> > that can do internal snapshots. Neither of these exist though and feel
> > like a significantly larger amount of work than using existing functionality
> > that is currently working.
> 
> I think that's what we should aim for; adding this wrapper isn't gaining
> that much without moving a bit towards that; so I would stick with the
> x- for now.

Relying on the HMP variants is IMO even worse. Error handling is
terrible there. I'd vote even for a straight wrapper without any logic
at this point. IMO it's just necessary to document that it's an
intermediate solution which WILL be deprecated and removed as soon as a
suitable replacement is in place.

Not doing anything is the argument we've heard for multiple years now and
savevm/delvm/loadvm are now the only 3 commands used via the HMP wrapper
in libvirt.

Since deprecation is now a thing I think we can add a disposable
interface. In the end HMP may or may not need to remain anyway, and the
deprecation there is IMO less clear.




Re: [PATCH RFC v2 0/5] block: add block-dirty-bitmap-populate job

2020-06-09 Thread Peter Krempa
On Mon, May 18, 2020 at 16:52:45 +0200, Peter Krempa wrote:
> On Wed, May 13, 2020 at 23:49:17 -0400, John Snow wrote:
> > Hi,
> > 
> > This is a new (very small) block job that writes a pattern into a
> > bitmap. The only pattern implemented is the top allocation information.
> > 
> > This can be used to "recover" an incremental bitmap chain if an external
> > snapshot was taken without creating a new bitmap first: any writes made
> > to the image will be reflected by the allocation status and can be
> > written back into a bitmap.
> > 
> > This is useful for e.g. libvirt managing backup chains if a user creates
> > an external snapshot outside of libvirt.
> 
> I've dusted-off my patches for using this blockjob for this very
> specific case and it works for me.
> 
> Tested-by: Peter Krempa 
> 
> For now I'll continue the integration with other blockjobs where we
> merge bitmaps.

I've posted the libvirt patches which make use of this blockjob as RFC
on the libvir-list:

https://www.redhat.com/archives/libvir-list/2020-June/msg00292.html

I also have a branch with the patchset rebased to master (except for one
of the test commits IIRC) linked from the cover-letter.




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-08 Thread Peter Krempa
On Mon, Jun 08, 2020 at 13:30:48 +0300, Vladimir Sementsov-Ogievskiy wrote:
> 08.06.2020 12:38, Peter Krempa wrote:
> > On Sat, Jun 06, 2020 at 09:55:13 +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > 05.06.2020 13:59, Peter Krempa wrote:

[...]

> > 
> > It's not only a "user forgot" thing, but more that a systemic change
> > would be required.
> > 
> > Additionally until _very_ recently it wasn't possible to create bitmaps
> > using qemu-img, which made it impossible to create overlays for inactive
> 
> Didn't you consider to use qemu started in stopped mode to do block
> operations in same manner as for running vm? What's wrong with it?
> Also, there is qemu-storage-daemon, which is developed as separated
> binary, where block-layer is compiled in together with QMP interface.

It's still considered, but I didn't have time to implement anything
related to it and nobody else implemented it either.

Additionally the problem isn't in libvirt's handling but more in other
apps' handling. For historical reasons we still support users creating
overlays outside of libvirt.

This would mean that either backups break if the overlay is not done
properly or we have to have a fallback to use
'block-dirty-bitmap-populate'. At this point for both my sanity and
actually finishing all the details in regards to incremental backup

> > VMs. Arguably this has changed so we could require it. It still adds a
> > moving part which can break if the user doesn't add the bitmap or
> > requires yet another special case handling if we want to compensate for
> > that.
> > 
> > As such, in libvirt's tech-preview implementation that is present
> > currently upstream, if you create a qcow2 overlay without adding the
> > appropriate bitmaps, the backups simply won't work.
> > 
> > > What do you think of granularity? We in Virtuozzo use 1M cluster as a 
> > > default for qcow2 images. But we use 64k granularity (default) for 
> > > bitmaps, to have smaller incremental backups. So, this is advantage of 
> > > creating bitmap over relaying on block-status capturing by 
> > > block-dirty-bitmap-populate: you don't control dirtiness granularity. So, 
> > > I think that bitmap propagation, or just creating new dirty bitmap to 
> > > track dirtiness from start of new snapshot is better.
> > 
> > This is a valid argument. Backups in this scenario will be bigger. I
> > still don't feel like the code needs to be made much more complex
> > because of it though.
> 
> May be, there is a simple solution? an option for blockdev-snapshot-sync to 
> create a bitmap in a new image (or if you create image by qemu-img, just 
> create bitmap by qemu-img as well, using new functionality).

You mean like we do now?:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_driver.c#L15020

That is slated to be deleted with my patchset though.

Good thing is that we can theoretically re-add it later.

For now I'd go with the simpler option that has fewer side effects.
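
In other words, a transaction roughly like this one (node and bitmap
names are made up), which installs the overlay and creates the new
active bitmap atomically:

  { "execute": "transaction",
    "arguments": { "actions": [
        { "type": "blockdev-snapshot",
          "data": { "node": "base-format", "overlay": "overlay-format" } },
        { "type": "block-dirty-bitmap-add",
          "data": { "node": "overlay-format",
                    "name": "checkpoint-1",
                    "persistent": true } } ] } }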

> Isn't it simpler than to just use existing block-status-bitmap, than run a 
> job?

No, because we'd still need to support cases where a user added an overlay
without the appropriate bitmap. As said, I'd prefer to keep the code
simple and then work on optimizations. The good thing is that with 'one
active bitmap per point in time' this is simpler to achieve.

[...]

> > > 
> > > What is 'merge' step?
> > 
> > In some previous replies to Kevin, we discussed that it might be worth
> > optimizing 'block-dirty-bitmap-populate' to directly populate the bits
> > in the target bitmap rather than after the job is complete, so
> > effectively directly merging it. I probably described it wrong here.
> > 
> > > Do you mean that populating directly into target bitmap is not really 
> > > needed?
> > 
> > I need the bitmap populated by 'block-dirty-bitmap-populate' to be
> > merged into multiple bitmaps in the new semantics. If the job itself
> > doesn't support that semantics, changing it just to directly populate
> > one bitmap doesn't seem to be worth it since I'll be using intermediate
> > bitmaps anyways.
> > 
> 
> Hmm, if the main use case of populating job is to merge changes since 
> snapshot to several bitmaps (all active bitmaps?), than I think it's correct 
> to implement exactly this semantics, allowing a list of targets as well as 
> list of source bitmaps. We even can reuse same structure for target-list 
> which we use for source-list. And it's simple to implement in Qemu.

I'd mainly like for the design to converge. I actually have almost
complete patches which use the current semantics. I'm okay with
reworking it but at some point there should be a line when it won't
change.




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-08 Thread Peter Krempa
On Sat, Jun 06, 2020 at 09:55:13 +0300, Vladimir Sementsov-Ogievskiy wrote:
> 05.06.2020 13:59, Peter Krempa wrote:
> > On Fri, Jun 05, 2020 at 12:07:47 +0200, Kevin Wolf wrote:
> > > Am 05.06.2020 um 11:58 hat Peter Krempa geschrieben:
> > > > On Fri, Jun 05, 2020 at 11:44:07 +0200, Kevin Wolf wrote:
> > 
> > [...]
> > 
> > > > The above was actually inspired by a very recent problem I have in my
> > > > attempt to use the dirty bitmap populate job to refactor how libvirt
> > > > handles bitmaps. I've just figured out that I need to shuffle around
> > > > some stuff as I can't run the dirty-bitmap-populate job while an active
> > > > layer commit is in synchronised phase and I wanted to do the merging at
> > > > that point. That reminded me of a possible gotcha in having to sequence
> > > > the blockjobs which certainly would be more painful.
> > > 
> > > It would probably be good to have not only an iotests case that tests
> > > the low-level functionality of the block job, but also one that
> > > resembles the way libvirt would actually use it in combination with
> > > other jobs.
> > 
> 
> Hi! Sorry me missing the discussion for a long time.
> 
> About new job semantics: if you create temporary bitmaps anyway, I do think 
> that we should allow to populate into target bitmap directly, without 
> creating any internal temporary bitmaps. I suggested it when reviewing v1, 
> John argued for more transaction-like semantics to look like other jobs. 
> Still, we can support both modes if we want.
> 
> Allowing to use one target for several populating job is an interesting idea. 
> Current series does "bdrv_dirty_bitmap_set_busy(target_bitmap, true);", which 
> forbids it.. Hmm. If we just drop it, nothing prevents user just remove 
> target bitmap during the job. So, we'll need something like collective-busy 
> bitmap..
> 
> > I certainly can document the way we'll use it but that in turn depends
> > on how the job behaves.
> > 
> > With the current state of the job I plan to use it in two scenarios:
> > 
> > Preface: I'm currently changing libvirt to use one active bitmap per
> > checkpoint (checkpoint is name for the point in time we want to take
> > backup from). This means that a layer of the backing chain can have
> > multiple active bitmaps depending on how many checkpoints were created
> > in the current top layer. (previously we've tried to optimize things by
> > having just one bitmap active, but the semantics were getting too crazy
> > to be maintainable long-term)
> 
> Hmm. I had a plan of creating "lazy" disabled bitmaps, to optimize scenario 
> with one active bitmap, so that disabled bitmaps are not loaded into RAM on 
> start, but only on demand. But how to do it with "many active bitmaps" 
> scenario? I don't think that's a good idea.. Possibly, we can implement 
> laziness by internally make only one active bitmap and merge it here and 
> there when you request some active bitmap which we actually didn't load yet..
> 
> Could you describe, what is the exact problem with "several disabled - one 
> active" scheme, and how is it solved by "several active"?

The 'several disabled, one active' semantics _heavily_ depend on metadata
which must be tracked outside of qemu and are more prone to break. If any
of the intermediate bitmaps is broken or missing, everything breaks.

Then there's the complexity of the code which handles merging of the
bitmaps during block jobs. Jobs such as blockdev-mirror in full mode and
block-commit squash together the data and we need to do something about
the bitmaps for the backups to work properly afterwards.

Without considering overlays which were created without propagating
bitmaps, the code was already getting hairy, especially in the case of
backups where we needed to stitch together all the bitmaps
corresponding to the given point in time the backup is taken from.
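
The stitching itself boils down to cross-node merges such as the
following sketch (node and bitmap names are made up), repeated for every
layer above the one holding the checkpoint's bitmap:

  { "execute": "block-dirty-bitmap-merge",
    "arguments": { "node": "top-format",
                   "target": "backup-tmp",
                   "bitmaps": [ "chk-1",
                                { "node": "mid-format", "name": "chk-1" } ] } }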

When we add overlays without any bitmaps into the mix, the code for
resolving which bitmaps to merge becomes very unpleasant, hard to
understand and maintain, and that is the main point for the switch.

I don't want to add unnecessary complexity to the libvirt code which
will make it more fragile or hard to understand and fix in the future.

Both points I've heard so far (performance, and backup granularity
in case of non-default qcow2 block size) don't seem compelling enough to
me to make my life implementing the feature in libvirt so much
harder.

Also, users really can just remove the point in time they wish to back up
from after a successful backup, which will also remove the corresponding
active bitmap.

Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-05 Thread Peter Krempa
On Fri, Jun 05, 2020 at 12:07:47 +0200, Kevin Wolf wrote:
> Am 05.06.2020 um 11:58 hat Peter Krempa geschrieben:
> > On Fri, Jun 05, 2020 at 11:44:07 +0200, Kevin Wolf wrote:

[...]

> > The above was actually inspired by a very recent problem I have in my
> > attempt to use the dirty bitmap populate job to refactor how libvirt
> > handles bitmaps. I've just figured out that I need to shuffle around
> > some stuff as I can't run the dirty-bitmap-populate job while an active
> > layer commit is in synchronised phase and I wanted to do the merging at
> > that point. That reminded me of a possible gotcha in having to sequence
> > the blockjobs which certainly would be more painful.
> 
> It would probably be good to have not only an iotests case that tests
> the low-level functionality of the block job, but also one that
> resembles the way libvirt would actually use it in combination with
> other jobs.

I certainly can document the way we'll use it but that in turn depends
on how the job behaves.

With the current state of the job I plan to use it in two scenarios:

Preface: I'm currently changing libvirt to use one active bitmap per
checkpoint (checkpoint is name for the point in time we want to take
backup from). This means that a layer of the backing chain can have
multiple active bitmaps depending on how many checkpoints were created
in the current top layer. (previously we've tried to optimize things by
having just one bitmap active, but the semantics were getting too crazy
to be maintainable long-term)

Bitmaps are no longer propagated over to upper layers when creating
snapshots as we can use block-dirty-bitmap-populate instead.

1) backup

Prior to doing the backup I'm figuring out the final backup bitmap. This
involves creating a temporary bitmap populated by the job for every
layer of the backing chain above the one which contains the bitmap we
want to take a backup from, and then merging all of them together as a
base for the backup.
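
In QMP terms the per-layer step would look roughly like this, using the
interface proposed in this series (node and bitmap names are made up, and
the argument names follow the RFC so they may still change):

  { "execute": "block-dirty-bitmap-add",
    "arguments": { "node": "overlay-format", "name": "tmp-populate" } }

  { "execute": "block-dirty-bitmap-populate",
    "arguments": { "job-id": "populate0",
                   "node": "overlay-format",
                   "name": "tmp-populate",
                   "pattern": "allocation-top" } }

followed, once the job finishes, by a block-dirty-bitmap-merge of
'tmp-populate' into the bitmap used as the base for the backup.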

2) blockjobs

Note: This is currently an outline of how things should be, as I've hit
a snag with attempting to run the population jobs during the 'ready' state
of an active-layer block-commit/blockdev-mirror job only an hour ago and
I need to change a few things.

2.1) active layer block-commit/blockdev-mirror

When the job reaches 'ready' state I'll create bitmaps in the
destination/base image of the job for every bitmap in the images
dropped/merged (we use blockdev-mirror in full-sync mode) by the
blockjob. This will capture the writes that happen after 'job-complete'.

The job will then be completed and the 2.2. will be executed as well.

2.2) non-active commit and also continuation of active layer 
block-commit/blockdev-mirror

After the job is completed successfully I'll create temporary
non-persistent bitmaps for/in the images removed by the blockjob and
merge them into the destination image's bitmaps depending on their
original location in the backing chain so that the bitmap state still
properly describes which blocks have changed.

After that the original images will be deleted via blockdev-del. The
above is partially in use today and, since the job is already completed,
also requires blockdev-reopen to successfully write to the bitmaps.



While writing the above down I've actually realized that controlling the
destination of the bitmap might not be as useful as I originally
thought, as in step 2.2 I might need the allocation bitmap merged
into multiple bitmaps, so I'd either need a temporary bitmap anyway or
would have to re-run the job multiple times, which seems wasteful. I'm no
longer fully persuaded that adding the 'merge' step to the dirty
populate blockjob will be the best thing since sliced bread.




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-05 Thread Peter Krempa
On Fri, Jun 05, 2020 at 11:44:07 +0200, Kevin Wolf wrote:
> Am 05.06.2020 um 11:24 hat Peter Krempa geschrieben:
> > On Fri, Jun 05, 2020 at 11:01:23 +0200, Kevin Wolf wrote:
> > > Am 04.06.2020 um 18:22 hat Peter Krempa geschrieben:
> > > > On Thu, Jun 04, 2020 at 13:31:45 +0200, Kevin Wolf wrote:
> > > > > Am 04.06.2020 um 11:16 hat Peter Krempa geschrieben:
> > > > > > On Thu, Jun 04, 2020 at 11:12:31 +0200, Kevin Wolf wrote:
> > > > > > > Am 18.05.2020 um 22:49 hat Eric Blake geschrieben:
> > > > > > > > > +
> > > > > > > > > +/* NB: new bitmap is anonymous and enabled */
> > > > > > > > > +cluster_size = bdrv_dirty_bitmap_granularity(target_bitmap);
> > > > > > > > > +new_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> > > > > > > > > +if (!new_bitmap) {
> > > > > > > > > +return NULL;
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > This means if the guest writes to the disk while the job is ongoing,
> > > > > > > > the bitmap will be updated to mark that portion of the bitmap as set,
> > > > > > > > even if it was not allocated at the time the job started.  But then
> > > > > > > > again, the guest writes are causing allocation, so this seems like the
> > > > > > > > right thing to do.
> > > > > > > 
> > > > > > > Is the target bitmap active at the same time, i.e. will it get the
> > > > > > > correct information only from new_bitmap or are the bits already set
> > > > > > > in it anyway?
> > > > > > 
> > > > > > Yes, libvirt plans to use it with an active non-persistent bitmap which
> > > > > > will in subsequent steps be merged into others. The bitmap is added in
> > > > > > the same transaction. The bitmap must be active, because we need to wait
> > > > > > for the block jobs to finish before it becomes usable and thus can't
> > > > > > sequence in other operations until later.
> > > > > 
> > > > > A lot of bitmap merging then, because the block job in this series
> > > > > already creates a temporary internal bitmap that is merged into the
> > > > > target bitmap on completion. But if the target bitmap is only libvirt's
> > > > > temporary bitmap to be merged to yet another bitmap, I wonder if this
> > > > > process shouldn't be simplified.
> > > > 
> > > > Possibly yes, but I'll leave that for later. All of this is done when
> > > > executing very expensive operations anyway so for our first
> > > > implementation it IMO won't matter that much.
> > > 
> > > I'm not necessarily saying that the change is needed on the libvirt
> > > side. It could also be that the block job should directly work with the
> > > given bitmap instead of having its internal temporary bitmap. Changing
> > > this later would mean changing the semantics of the block job, so it
> > > would be somewhat problematic.
> > > 
> > > It would be good to have a clear picture of what we want the final
> > > result to look like.
> > 
> > Well with current semantics of the 'nodename' argument controlling both
> > where the populated bitmap is located and also which node's allocation
> > bitmap to take I don't think we can optimize it further in libvirt.
> > 
> > Current usage scenario is that we use a temporary bitmap populated with
> > the job to merge with bitmaps present in nodes which are removed by
> > blockjobs into the destination node of the block job. This means that
> > the real destination of the bits populated is in a different node than
> > it was originally and the above job semantics don't allow that.
> 
> So does this mean that a better API wouldn't only take a node-name and
> bitmap name (where the node identified by node-name is not only where
> the target bitmap is, but also the node whose allocation status is
> [...]

Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-05 Thread Peter Krempa
On Fri, Jun 05, 2020 at 11:01:23 +0200, Kevin Wolf wrote:
> Am 04.06.2020 um 18:22 hat Peter Krempa geschrieben:
> > On Thu, Jun 04, 2020 at 13:31:45 +0200, Kevin Wolf wrote:
> > > Am 04.06.2020 um 11:16 hat Peter Krempa geschrieben:
> > > > On Thu, Jun 04, 2020 at 11:12:31 +0200, Kevin Wolf wrote:
> > > > > Am 18.05.2020 um 22:49 hat Eric Blake geschrieben:
> > > > > > > +
> > > > > > > +/* NB: new bitmap is anonymous and enabled */
> > > > > > > +cluster_size = bdrv_dirty_bitmap_granularity(target_bitmap);
> > > > > > > +new_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> > > > > > > +if (!new_bitmap) {
> > > > > > > +return NULL;
> > > > > > > +}
> > > > > > 
> > > > > > This means if the guest writes to the disk while the job is ongoing,
> > > > > > the bitmap will be updated to mark that portion of the bitmap as set,
> > > > > > even if it was not allocated at the time the job started.  But then
> > > > > > again, the guest writes are causing allocation, so this seems like the
> > > > > > right thing to do.
> > > > > 
> > > > > Is the target bitmap active at the same time, i.e. will it get the
> > > > > correct information only from new_bitmap or are the bits already set in
> > > > > it anyway?
> > > > 
> > > > Yes, libvirt plans to use it with an active non-persistent bitmap which
> > > > will in subsequent steps be merged into others. The bitmap is added in
> > > > the same transaction. The bitmap must be active, because we need to wait
> > > > for the block jobs to finish before it becomes usable and thus can't
> > > > sequence in other operations until later.
> > > 
> > > A lot of bitmap merging then, because the block job in this series
> > > already creates a temporary internal bitmap that is merged into the
> > > target bitmap on completion. But if the target bitmap is only libvirt's
> > > temporary bitmap to be merged to yet another bitmap, I wonder if this
> > > process shouldn't be simplified.
> > 
> > Possibly yes, but I'll leave that for later. All of this is done when
> > executing very expensive operations anyway so for our first
> > implementation it IMO won't matter that much.
> 
> I'm not necessarily saying that the change is needed on the libvirt
> side. It could also be that the block job should directly work with the
> given bitmap instead of having its internal temporary bitmap. Changing
> this later would mean changing the semantics of the block job, so it
> would be somewhat problematic.
> 
> It would be good to have a clear picture of what we want the final
> result to look like.

Well with current semantics of the 'nodename' argument controlling both
where the populated bitmap is located and also which node's allocation
bitmap to take I don't think we can optimize it further in libvirt.

Current usage scenario is that we use a temporary bitmap populated with
the job to merge with bitmaps present in nodes which are removed by
blockjobs into the destination node of the block job. This means that
the real destination of the bits populated is in a different node than
it was originally and the above job semantics don't allow that.

Either way I'd strongly prefer to be able to kick off all the populate
jobs at once rather than having to sequence them, so any semantic change
towards making it possible to target bitmaps in a different node would
also require that multiple jobs can run in parallel with a single bitmap
as destination. I'm not sure if that doesn't overcomplicate things
though.




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-04 Thread Peter Krempa
On Thu, Jun 04, 2020 at 13:31:45 +0200, Kevin Wolf wrote:
> Am 04.06.2020 um 11:16 hat Peter Krempa geschrieben:
> > On Thu, Jun 04, 2020 at 11:12:31 +0200, Kevin Wolf wrote:
> > > Am 18.05.2020 um 22:49 hat Eric Blake geschrieben:
> > > > > +
> > > > > +/* NB: new bitmap is anonymous and enabled */
> > > > > +cluster_size = bdrv_dirty_bitmap_granularity(target_bitmap);
> > > > > +new_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> > > > > +if (!new_bitmap) {
> > > > > +return NULL;
> > > > > +}
> > > > 
> > > > This means if the guest writes to the disk while the job is ongoing, the
> > > > bitmap will be updated to mark that portion of the bitmap as set, even if it
> > > > was not allocated at the time the job started.  But then again, the guest
> > > > writes are causing allocation, so this seems like the right thing to do.
> > > 
> > > Is the target bitmap active at the same time, i.e. will it get the
> > > correct information only from new_bitmap or are the bits already set in
> > > it anyway?
> > 
> > Yes, libvirt plans to use it with an active non-persistent bitmap which
> > will in subsequent steps be merged into others. The bitmap is added in
> > the same transaction. The bitmap must be active, because we need to wait
> > for the block jobs to finish before it becomes usable and thus can't
> > sequence in other operations until later.
> 
> A lot of bitmap merging then, because the block job in this series
> already creates a temporary internal bitmap that is merged into the
> target bitmap on completion. But if the target bitmap is only libvirt's
> temporary bitmap to be merged to yet another bitmap, I wonder if this
> process shouldn't be simplified.

Possibly yes, but I'll leave that for later. All of this is done when
executing very expensive operations anyway so for our first
implementation it IMO won't matter that much.




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-06-04 Thread Peter Krempa
On Thu, Jun 04, 2020 at 11:12:31 +0200, Kevin Wolf wrote:
> Am 18.05.2020 um 22:49 hat Eric Blake geschrieben:
> > > +
> > > +/* NB: new bitmap is anonymous and enabled */
> > > +cluster_size = bdrv_dirty_bitmap_granularity(target_bitmap);
> > > +new_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> > > +if (!new_bitmap) {
> > > +return NULL;
> > > +}
> > 
> > This means if the guest writes to the disk while the job is ongoing, the
> > bitmap will be updated to mark that portion of the bitmap as set, even if it
> > was not allocated at the time the job started.  But then again, the guest
> > writes are causing allocation, so this seems like the right thing to do.
> 
> Is the target bitmap active at the same time, i.e. will it get the
> correct information only from new_bitmap or are the bits already set in
> it anyway?

Yes, libvirt plans to use it with an active non-persistent bitmap which
will in subsequent steps be merged into others. The bitmap is added in
the same transaction. The bitmap must be active, because we need to wait
for the block jobs to finish before it becomes usable and thus can't
sequence in other operations until later.




Re: [RFC v2] migration: Add migrate-set-bitmap-node-mapping

2020-06-02 Thread Peter Krempa
On Tue, Jun 02, 2020 at 12:56:32 +0200, Max Reitz wrote:
> On 18.05.20 18:26, Peter Krempa wrote:
> > On Wed, May 13, 2020 at 16:56:10 +0200, Max Reitz wrote:
> >> This command allows mapping block node names to aliases for the purpose
> >> of block dirty bitmap migration.
> >>
> >> This way, management tools can use different node names on the source
> >> and destination and pass the mapping of how bitmaps are to be
> >> transferred to qemu (on the source, the destination, or even both with
> >> arbitrary aliases in the migration stream).
> >>
> >> Suggested-by: Vladimir Sementsov-Ogievskiy 
> >> Signed-off-by: Max Reitz 
> >> ---

[...]

> >> +# @migrate-set-bitmap-node-mapping:
> > 
> > qemu-5.0 deprecated a bunch of migrate-set- specific commands in favor
> > of migrate-set-parameters. Is it worth/necessary to add a new command
> > here?
> 
> I wasn’t aware of that.  It would probably indeed make sense from a
> user’s perspective.
> 
> It would make the implementation rather different, though, because
> instead of putting the mapping locally (and statically) into
> migration/block-dirty-bitmap.c, it would require putting it into the
> central MigrationState.  Which isn’t to say it’d be worse.  I suppose
> it’d be better, actually, but I’ll have to try to say for sure.
> 
> You also suggested “setting nothing will clear the mapping” in your
> second mail.  That would be a weird default.  Right now, the default for
> all migration parameters is to leave them as-is, so it would be different.
> 
> The first question would be: What do you mean by “clear the mapping”?
> Reset it to the original identity mapping?  Or actually clear it, so
> that no bitmap is migrated?  I presume the former, because the latter
> can easily be achieved by passing an empty array.

Yeah, luckily this is easy with JSON:

default mapping:

"mapping": null

empty mapping:

"mapping": [ ]

> I understand that it seems to make sense to be able to return to the
> original identity mapping, but is there actually a use for this?  After
> you have started using a custom mapping, wouldn’t you always use custom
> mappings?

Libvirt mostly doesn't care. Downgrade of libvirt version is unsupported
so this should not be a problem.

> If there is a use for it, I think the better way to do it would be to
> use an alternate type where an explicit null resets the mapping to the
> identity mapping.

yes :)




Re: [PATCH RFC v2 1/5] block: add bitmap-populate job

2020-05-19 Thread Peter Krempa
On Mon, May 18, 2020 at 15:49:02 -0500, Eric Blake wrote:
> On 5/13/20 10:49 PM, John Snow wrote:

[...]

> > +
> > +/* NB: new bitmap is anonymous and enabled */
> > +cluster_size = bdrv_dirty_bitmap_granularity(target_bitmap);
> > +new_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp);
> > +if (!new_bitmap) {
> > +return NULL;
> > +}
> 
> This means if the guest writes to the disk while the job is ongoing, the
> bitmap will be updated to mark that portion of the bitmap as set, even if it
> was not allocated at the time the job started.  But then again, the guest
> writes are causing allocation, so this seems like the right thing to do.

Well, this could be made a problem of the caller by skipping any newly
allocated sectors when writing to the bitmap. The caller then can
decide whether a snapshot of the allocation map is needed and thus a new
inactive bitmap should be used as the destination, or whether new writes
should be tracked by using an active bitmap.

> Do we need to worry about the converse case where the job started with
> something allocated but runs in parallel with the guest trimming, such that
> our bitmap marks something as set even though at the conclusion of our job
> it is no longer allocated?

Given the semantics above this would conveniently not be a problem of
the population job. If you create a snapshot of the allocation map at
any point we'd care about that state.

Anyway, from the point of view of the bitmap code any write to a sector
sets the bit, so trimming should not be treated differently.

Specifically, libvirt plans to use it on overlay (snapshot) images where
the bitmaps are not present, so in that case even trimmed sectors need to
mask the data in the backing image and can technically be considered
as allocated too.




Re: [RFC v2] migration: Add migrate-set-bitmap-node-mapping

2020-05-18 Thread Peter Krempa
On Mon, May 18, 2020 at 20:52:32 +0300, Vladimir Sementsov-Ogievskiy wrote:
> [add Nikolay]
> 
> 18.05.2020 19:26, Peter Krempa wrote:
> > On Wed, May 13, 2020 at 16:56:10 +0200, Max Reitz wrote:

[...]

> > Is there any difference in the handling of persistent and non-persistent
> > bitmaps? Specifically I'm curious what's the correct approach to the
> > shared storage migration case when the source and destination image is
> > the same one. Should we also instruct qemu to migrate the active ones? Or are
> > they migrated by writing them to the image and reloading them?
> 
> About migration of persistent bitmaps with shared storage: both variants are 
> possible:
> 
> 1. if you do nothing (i.e. not enable bitmaps migration capability), 
> persistent bitmaps are stored on inactivation of source Qemu and loaded on 
> activation of target Qemu
> 
> 2. if you enable bitmap migration capability, then bitmaps are _NOT_ stored, 
> they migrate through migration channel
> 
> The difference is in downtime: if we have a lot of bitmap data (big disks, 
> small bitmap granularity, a lot of bitmaps) and/or slow shared storage, then 
> with [1] downtime will increase, as we'll have to store all bitmaps to the 
> disk and load them during migration downtime. With [2] bitmaps migrate in 
> postcopy time, when target is already running, so downtime is not increased.
> 
> So, in general [2] is better, and I think we should always use it, not making 
> extra difference between shared and non-shared migration.

Thanks for the explanation! What about the inactive bitmaps though? Are
they treated the same when opened? Should we consider them for migration
through the migration stream?

> 
> ==
> 
> And in this way, no difference between persistent and non-persistent bitmaps 
> migration, let's always migrate them by postcopy [and we go this way (I hope 
> ;) in Virtuozzo]

Yeah, that's probably going to be what libvirt does as well.
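
For completeness, enabling the postcopy-style bitmap migration described
above means turning the capability on, on both sides, before migrating:

  { "execute": "migrate-set-capabilities",
    "arguments": { "capabilities": [
        { "capability": "dirty-bitmaps", "state": true } ] } }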


> > > +# @migrate-set-bitmap-node-mapping:
> > 
> > qemu-5.0 deprecated a bunch of migrate-set- specific commands in favor
> > of migrate-set-parameters. Is it worth/necessary to add a new command
> > here?
> 
> Hmm probably, we should include mapping into MigrateSetParameters structure?

IMO yes. I think it also conveniently sidesteps some of the issues that
were discussed in the other threads regarding addition/multiple calls
etc. The mapping will be set at once and any other invocation sets a new
mapping and that's it. Setting nothing will clear the mapping.
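
To sketch what that could look like (a hypothetical parameter name and
shape, not something that exists today):

  { "execute": "migrate-set-parameters",
    "arguments": { "block-bitmap-mapping": [
        { "node-name": "libvirt-1-format",
          "alias": "disk0",
          "bitmaps": [ { "name": "chk-a", "alias": "chk-a" } ] } ] } }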




Re: [RFC v2] migration: Add migrate-set-bitmap-node-mapping

2020-05-18 Thread Peter Krempa
On Wed, May 13, 2020 at 16:56:10 +0200, Max Reitz wrote:
> This command allows mapping block node names to aliases for the purpose
> of block dirty bitmap migration.
> 
> This way, management tools can use different node names on the source
> and destination and pass the mapping of how bitmaps are to be
> transferred to qemu (on the source, the destination, or even both with
> arbitrary aliases in the migration stream).
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---

[...]

> ---
>  qapi/migration.json| 36 
>  migration/block-dirty-bitmap.c | 60 --
>  2 files changed, 94 insertions(+), 2 deletions(-)

I just started to write some quick and dirty hacks to test use of this
infrastructure in libvirt. I have 2 quick observations for now:

> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index d5000558c6..97037ea635 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1621,3 +1621,39 @@
>  ##
>  { 'event': 'UNPLUG_PRIMARY',
>'data': { 'device-id': 'str' } }
> +
> +##
> +# @MigrationBlockNodeMapping:
> +#
> +# Maps a block node name to an alias for migration.
> +#
> +# @node-name: A block node name.
> +#
> +# @alias: An alias name for migration (for example the node name on
> +# the opposite site).
> +#
> +# Since: 5.1
> +##
> +{ 'struct': 'MigrationBlockNodeMapping',
> +  'data': {
> +  'node-name': 'str',
> +  'alias': 'str'
> +  } }

We'd probably like a
'nodename:bitmapname' -> 'alias' mapping so that we can select which
bitmaps to migrate and where to migrate them to. The specific use case
is the following:

Libvirt supports migration without shared storage (NFS/etc) where we
copy the disk images prior to the memory migration using qemu's NBD
server and the blockdev-mirror job. The default and simplest mode,
which doesn't require users to fiddle with the disk images and copy
backing images themselves, is to flatten the whole backing chain
during the copy ("sync":"full"). In this mode we'll need to do some
merging of the bitmaps prior to finalizing the copy.

It's not a problem to do it ourselves, but in the end we'll need to copy
only certain bitmaps, created temporarily on the source, over to the
destination where they'll be persisted.

For now (until I implement the use of the dirty-bitmap-populate blockjob,
which I'm also working on in parallel with this), when creating a
snapshot we create a new active bitmap in the overlay for every active
bitmap in the snapshotted image.

When flattening we'll then need to merge the appropriate ones. As said
it's not a problem to prepare all the bitmaps beforehand, but we then don't
need to migrate all of them.

By the way, that brings me to another question:

Is there any difference in the handling of persistent and non-persistent
bitmaps? Specifically I'm curious what's the correct approach to the
shared storage migration case when the source and destination image is
the same one. Should we also instruct qemu to migrate the active ones? Or are
they migrated by writing them to the image and reloading them?

> +##
> +# @migrate-set-bitmap-node-mapping:

qemu-5.0 deprecated a bunch of migrate-set- specific commands in favor
of migrate-set-parameters. Is it worth/necessary to add a new command
here?

> +#
> +# Maps block node names to arbitrary aliases for the purpose of dirty
> +# bitmap migration.  Such aliases may for example be the corresponding
> +# node names on the opposite site.
> +#
> +# By default, every node name is mapped to itself.
> +#
> +# @mapping: The mapping; must be one-to-one, but not necessarily
> +#   complete.  Any mapping not given will be reset to the
> +#   default (i.e. the identity mapping).
> +#
> +# Since: 5.1
> +##





Re: [PATCH RFC v2 0/5] block: add block-dirty-bitmap-populate job

2020-05-18 Thread Peter Krempa
On Wed, May 13, 2020 at 23:49:17 -0400, John Snow wrote:
> Hi,
> 
> This is a new (very small) block job that writes a pattern into a
> bitmap. The only pattern implemented is the top allocation information.
> 
> This can be used to "recover" an incremental bitmap chain if an external
> snapshot was taken without creating a new bitmap first: any writes made
> to the image will be reflected by the allocation status and can be
> written back into a bitmap.
> 
> This is useful for e.g. libvirt managing backup chains if a user creates
> an external snapshot outside of libvirt.

I've dusted off my patches for using this blockjob for this very
specific case and it works for me.

Tested-by: Peter Krempa 

For now I'll continue the integration with other blockjobs where we
merge bitmaps.




Re: [PATCH v5 7/7] qemu-img: Deprecate use of -b without -F

2020-05-05 Thread Peter Krempa
On Tue, May 05, 2020 at 10:11:03 +0200, Kevin Wolf wrote:
> Am 03.04.2020 um 19:58 hat Eric Blake geschrieben:
> > Creating an image that requires format probing of the backing image is
> > inherently unsafe (we've had several CVEs over the years based on
> > probes leaking information to the guest on a subsequent boot, although
> > these days tools like libvirt are aware of the issue enough to prevent
> > the worst effects).  However, if our probing algorithm ever changes,
> > or if other tools like libvirt determine a different probe result than
> > we do, then subsequent use of that backing file under a different
> > format will present corrupted data to the guest.  Start a deprecation
> > clock so that future qemu-img can refuse to create unsafe backing
> > chains that would rely on probing.  Most warnings are intentionally
> > emitted from bdrv_img_create() in the block layer, but qemu-img
> > convert uses bdrv_create() which cannot emit its own warning without
> > causing spurious warnings on other code paths.  In the end, all
> > command-line image creation or backing file rewriting now performs a
> > check.
> > 
> > However, there is one time where probing is safe: if we probe raw,
> > then it is safe to record that implicitly in the image (but we still
> > warn, as it's better to teach the user to supply -F always than to
> > make them guess when it is safe).
> 
> You're not stating it explicitly, but I guess the thing that you mean
> that is actually unsafe is if you have a raw image, always pass
> format=raw to QEMU (so the guest could write e.g. a qcow2 header), but
> then create a backing file without -F, so it will be probed. This is as
> bad as specifying format=raw only sometimes.
> 
> I don't like the idea of responding to this by making raw images more
> convenient to use than actual image formats.
> 
> How about we approach it the other way around: The troublemaker is raw,
> so let's require specifying raw explicitly, and record the probed format
> implicitly in other cases. This is a bit weaker in the immediate effect
> in that it doesn't protect you when you actually deal with a malicious
> image, but in normal use it will point out where your scripts or
> management software is too careless. The final result should be that
> management tools are fixed and you'll be safe, while manual users who
> can usually trust their guests aren't inconvenienced too much.

That is definitely better than just probing. In my opinion, though, the
format should always be specified explicitly, with no favoritism towards
'raw' or any other format. In libvirt we fill in 'raw' as the image
format unless the user specifies something else, rather than reporting
an error. This causes continuous grief for users who forget to set the
format. (One example: if you create a fully-allocated qcow2 image but
use it as raw and install your VM, you'll never notice until you want to
use qcow2 features. Users usually do notice once they try to create a
sparse image, though, due to the size difference.)
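
To illustrate with qemu-img (the -F option gives the backing format
explicitly; file names are made up):

  # explicit backing format, no probing involved:
  qemu-img create -f qcow2 -F raw -b base.img overlay.qcow2

  # relies on probing the backing file; this is what the deprecation
  # targets:
  qemu-img create -f qcow2 -b base.img overlay.qcow2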




Re: Backup of vm disk images

2020-05-04 Thread Peter Krempa
On Fri, May 01, 2020 at 16:05:47 +0100, Stefan Hajnoczi wrote:
> On Wed, Apr 22, 2020 at 07:51:09AM +0200, Anders Östling wrote:
> > I am fighting to understand the difference between backing up a VM by
> > using a regular copy vs using the virsh blockcopy command.
> > What I want to do is to suspend the vm, copy the XML and .QCOW2 files
> > and then resume the vm again. What are your thoughts? What are the
> > drawbacks compared to other methods?

The approaches differ in the kind of data integrity they provide and
the downtime they require.

Assuming from the above that you don't want to shut down the OS in the
VM, you've got the following options:

1) pause VM and copy images as described above
  I definitely don't recommend this approach at all. The drawback is
  that the QCOW2 file on the disk may have inconsistent metadata. Even
  if you ensure that the guest OS state is consistent and written to
  disk, it's not guaranteed that qemu's buffers were flushed.

  Also the VM needs to be suspended during the whole copy, unless you
  have the image on a filesystem which supports reflinks (cp --reflink),
  as pointed out by Stefan.

2) 'virsh blockcopy'
 The original idea of blockcopy is to move the storage of an active VM
 to a different location. It can, though, be used to "copy" the active
 disk when combined with 'virsh blockjob --abort' to finish it, which
 ensures that the image metadata is correct (see the example commands
 after this list). This still requires the guest OS in the VM to ensure
 that the filesystems on the backed-up disk are consistent.

 Also the API has one very severe limitation if your VM has multiple
 disks: There is no way to ensure that you cancel all the copies at the
 same time, so the 'backup' done this way is not taken at a single point
 in time. It's also worth noting that the point in time the backup is
 taken is when the job is --abort-ed.

3) virsh backup
 This is the newest set of APIs specifically designed to do disk backups
 of the VM, offers consistency of the image metadata, and taking of the
 backups of multiple disks at the same point in time. Also the point in
 time is when the API is started, regardless of how long the actual data
 handling takes.

 Your guest OS still needs to ensure filesystem consistency though.

 Additionally as mentioned by Stefan below you can also do incremental
 backups as well.

 One thing to note though is that the backup integration is not entirely
 finished in libvirt and thus in a 'tech-preview' state. Some
 interactions corrupt the state for incremental backups.

 If you are interested, I can give you specific info how to enable
 support for backups as well as the specifics of the current state of
 implementation.

4) snapshots
 Libvirt's snapshot implementation supports taking full VM snapshots
 including memory and disk image state. This sidesteps the problem of
 inconsistent filesystem state, as the memory state also contains all
 the buffers.

 When an external snapshot is created, we add a new set of overlay files
 on top of the original disk images. This means that the originals
 become effectively read-only. You can then copy them aside if you want
 to. The memory image taken along can then be used to fully restore the
 state of the VM.

 There are a few caveats here as well. If the image chain created this
 way becomes too long it may negatively impact performance. Also
 reverting the memory image is a partially manual operation for now. I
 can give specifics if you want.
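
For illustration, the commands could look roughly like this (domain,
disk and path names are made up):

  # option 1 on a reflink-capable filesystem:
  cp --reflink=always /var/lib/libvirt/images/vm1.qcow2 /backup/vm1.qcow2

  # option 2: copy 'vda' of domain 'vm1' aside, wait until the job is
  # ready, then abort it to get a point-in-time copy:
  virsh blockcopy vm1 vda /backup/vda-copy.qcow2 --wait --verbose
  virsh blockjob vm1 vda --abort

  # option 3: a full backup of all disks via the backup API:
  virsh backup-begin vm1

  # option 4: external snapshot including the memory state:
  virsh snapshot-create-as vm1 backup-snap \
      --memspec file=/backup/vm1.mem --diskspec vda,snapshot=external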

> 
> Hi Anders,
> The k...@vger.kernel.org mailing list is mostly for the discussion and
> development of the KVM kernel module so you may not get replies.  I have
> CCed libvir-list and developers who have been involved in libvirt backup
> features.
> 
> A naive cp(1) command will be very slow because the entire disk image is
> copied to a new file.  The fastest solution with cp(1) is the --reflink
> flag which basically takes a snapshot of the file and shares the disk
> blocks (only available when the host file system supports it and not
> available across mounts).
> 
> Libvirt's backup commands are more powerful.  They can do things like
> copy out a point-in-time snapshot of the disk while the guest is
> running.  They also support incremental backup so you don't need to
> store a full copy of the disk image each time you take a backup.
> 
> I hope others will join the discussion and give examples of some of the
> available features.
> 
> Stefan




