Re: [PATCH 0/3] Implement support for QCOW2 data files

2024-08-19 Thread Denis V. Lunev via Devel

On 8/16/24 17:00, Peter Krempa wrote:

On Wed, Aug 14, 2024 at 15:28:57 +0200, Denis V. Lunev wrote:

On 8/12/24 16:46, Peter Krempa wrote:

On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:

On 8/12/24 10:36, Peter Krempa wrote:

On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:

On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:

There are use cases when the existing disks (i.e. LVM) are wanted
to be used with advanced features. For this purpose QEMU allows
data-file feature for qcow2 files: metadata is kept in the qcow2
file like usual, but guest data is written to an external file.
These patches enable support for this feature in libvirt.
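
As an illustration, creating such an image on top of an existing
block device would look roughly like this (a sketch only; the LV
path, size and file names below are made up):

  # the qcow2 file holds only metadata; guest data goes to the raw LV
  qemu-img create -f qcow2 \
      -o data_file=/dev/vg0/guest-data,data_file_raw=on \
      /var/lib/libvirt/images/guest-meta.qcow2 100G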

So this feature was attempted once before but was never finished,
and the committed bits were eventually reverted. (I've purged my
local archive so I don't have the link handy, but I can look up the
old posting if you want.)

It was deemed that this doesn't really add any performance benefit
over storing the data inside the qcow2 itself. A qcow2 with data can
be stored inside an LV, or any other block device for that matter,
and thus can provide all the features that are necessary. The
data-file feature also makes the management of the metadata and data
much more complex, for a very bad trade-off. At this point, with
'qemu-img measure' it's easy to query the size needed for a fully
allocated qcow2 inside the block device.
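
For instance, sizing and formatting an LV directly as qcow2 might
look something like this (a sketch; device names and sizes are
hypothetical):

  # how much space a fully allocated 100G qcow2 would need
  qemu-img measure -O qcow2 --size 100G
  # format the block device itself as qcow2, no filesystem involved
  qemu-img create -f qcow2 /dev/vg0/vm-disk 100G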

Based on the history of this I'd like to ask you to summarize
justifications and reasons for adding this feature before continuing.

Based on the current state of the series and what would be required to
make it viable to be accepted I very strongly suggest re-thinking if you
really need this feature, especially based on the caveats above.


Let me clarify a bit.

A QCOW2 image with a data file uses QCOW2 purely as metadata storage
for an ordinary block device. This is a QEMU feature, and it would
be quite natural to have it represented in libvirt, since without
libvirt's help QEMU could not even start with such a configuration
due to namespaces.

LVM or not LVM: how is it better compared to plain QCOW2?

Historically, when this was considered for use with the incremental
backup feature in oVirt, a similar set of advantages was given as the
justification. After discussing it for a bit it became obvious that
the advantages are not great enough to justify the extra effort:

   - extra management complexity (need to carry around the qcow2 as
     well as the data file)
   - possibility of desynchronizing the state (a write into the
     'data_file' invalidates the qcow2 "overlay") without being able
     to see that it was desync'd (writing into the data file doesn't
     touch the overlay, so the qcow2 driver doesn't see it; see the
     sketch after this list)
   - basically no performance benefit on top of qcow2
   - (perhaps others on the oVirt side ... it was a long time ago so I
     don't remember any more)
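
A concrete illustration of that desync risk (a sketch, reusing the
made-up paths from the example above):

  # writing to the data file behind QEMU's back...
  dd if=/dev/zero of=/dev/vg0/guest-data bs=1M count=1 conv=notrunc
  # ...is not detected: only qcow2 metadata is checked, so this still
  # reports no errors even though the guest-visible data changed
  qemu-img check /var/lib/libvirt/images/guest-meta.qcow2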

Yes. And we will definitely have this extra complexity in exchange
for extra functionality. Right now, for our product, we have to
support backups of VM data residing on LVM volumes. This is already
shipped in production, and my options are to keep this downstream
only or to submit it upstream.

Fair enough. As long as you are willing to properly implement backing
file support I'm fine with that. I just wanted to note that one prior
attempt was deemed not worth it and abandoned, so that you can
understand what you are taking on.


The problem is that if we say that libvirt is not going this way, we
should clearly indicate in
* the QCOW2 documentation
* the qemu-img man page
that the option of using a data file for VM metadata is deprecated
and will not see further development. That would be only fair.

I have no idea how the qemu project approaches this and what they think
about the status of backing files.

For libvirt, if qemu supports it, it's fair game to add the feature
if somebody is willing to spend the effort.


We have taken the decision that this scenario should be supported,
based on the availability of this option and its presence in the
public docs.


First of all, there are historical setups where the customer uses
LVM for virtual machines and does not want to reformat the disks
with a filesystem. This makes sense, as we

Yes, such setups would not be possible. Users would need to convert
to qcow2, but that can be done transparently to the VM (though it
briefly requires twice the storage).

That is a BIG problem. Customers do not want to change their disk
layout. Every time we try to force them, we get problems. Really big
problems.


are avoiding two levels of metadata, i.e. QCOW2 metadata over

local FS metadata. This makes the setup a little bit more

As stated above, you don't really need to use a filesystem. You can
make a block device (whether a real disk, partition, or LV) into a
qcow2 image simply by formatting it with qemu-img.

We use LVM for big data (TBs in size) and QCOW2 for metadata, namely
CBT. This is how libvirt now distinguishes EFI QCOW2 and disk QCOW2.

Re: [PATCH 0/3] Implement support for QCOW2 data files

2024-08-14 Thread Denis V. Lunev via Devel

On 8/12/24 16:46, Peter Krempa wrote:

On Mon, Aug 12, 2024 at 12:04:01 +0200, Denis V. Lunev wrote:

On 8/12/24 10:36, Peter Krempa wrote:

On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:

On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:

There are use cases when the existing disks (i.e. LVM) are wanted
to be used with advanced features. For this purpose QEMU allows
data-file feature for qcow2 files: metadata is kept in the qcow2
file like usual, but guest data is written to an external file.
These patches enable support for this feature in libvirt.

So this feature was attempted once before but was never finished,
and the committed bits were eventually reverted. (I've purged my
local archive so I don't have the link handy, but I can look up the
old posting if you want.)

It was deemed that this doesn't really add any performance benefit
over storing the data inside the qcow2 itself. A qcow2 with data can
be stored inside an LV, or any other block device for that matter,
and thus can provide all the features that are necessary. The
data-file feature also makes the management of the metadata and data
much more complex, for a very bad trade-off. At this point, with
'qemu-img measure' it's easy to query the size needed for a fully
allocated qcow2 inside the block device.

Based on the history of this I'd like to ask you to summarize
justifications and reasons for adding this feature before continuing.

Based on the current state of the series and what would be required to
make it viable to be accepted I very strongly suggest re-thinking if you
really need this feature, especially based on the caveats above.


Let me clarify a bit.

A QCOW2 image with a data file uses QCOW2 purely as metadata storage
for an ordinary block device. This is a QEMU feature, and it would
be quite natural to have it represented in libvirt, since without
libvirt's help QEMU could not even start with such a configuration
due to namespaces.

LVM or not LVM: how is it better compared to plain QCOW2?

Historically, when this was considered for use with the incremental
backup feature in oVirt, a similar set of advantages was given as the
justification. After discussing it for a bit it became obvious that
the advantages are not great enough to justify the extra effort:

  - extra management complexity (need to carry around the qcow2 as
    well as the data file)
  - possibility of desynchronizing the state (a write into the
    'data_file' invalidates the qcow2 "overlay") without being able
    to see that it was desync'd (writing into the data file doesn't
    touch the overlay, so the qcow2 driver doesn't see it)
  - basically no performance benefit on top of qcow2
  - (perhaps others on the oVirt side ... it was a long time ago so I
    don't remember any more)


Yes. And we will definitely have this extra complexity in exchange
for extra functionality. Right now, for our product, we have to
support backups of VM data residing on LVM volumes. This is already
shipped in production, and my options are to keep this downstream
only or to submit it upstream.

The problem is that if we say that libvirt is not going this way, we
should clearly indicate in
* the QCOW2 documentation
* the qemu-img man page
that the option of using a data file for VM metadata is deprecated
and will not see further development. That would be only fair.

We have taken the decision that this scenario should be supported,
based on the availability of this option and its presence in the
public docs.


First of all, there are historical setups where the customer uses
LVM for virtual machines and does not want to reformat the disks
with a filesystem. This makes sense, as we

Yes, such setups would not be possible. Users would need to convert
to qcow2, but that can be done transparently to the VM (though it
briefly requires twice the storage).


That is a BIG problem. Customers do not want to change their disk
layout. Every time we try to force them, we get problems. Really big
problems.


are avoiding two levels of metadata, i.e. QCOW2 metadata over

local FS metadata. This makes the setup a little bit more

As stated above, you don't really need to use a filesystem. You can
make a block device (whether a real disk, partition, or LV) into a
qcow2 image simply by formatting it with qemu-img.

We use LVM for big data (TBs in size) and QCOW2 for metadata, namely
CBT. This is how libvirt now distinguishes EFI QCOW2 and disk QCOW2.



reliable. It should also be noted that QCOW2 in very specific setups
is at least 2 times slower (when the L2 cache does not fit into the
dedicated amount of RAM while the disk itself is quite big; I would
admit that this problem can be seen even for not-so-big 1T disks).
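
For context, a rough back-of-the-envelope sketch of the L2 cache
issue (the numbers assume the default 64k cluster size; the device
path is made up):

  # full L2 coverage needs disk_size / cluster_size * 8 bytes of cache,
  # i.e. roughly 128 MiB for a 1 TiB image with 64k clusters, well above
  # the default qcow2 L2 cache. One mitigation is a larger cluster size
  # at image creation time, which shrinks the L2 metadata accordingly:
  qemu-img create -f qcow2 -o cluster_size=2M /dev/vg0/vm-disk 1T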

Doesn't a QCOW2 with a 'data file' behave the same as a fully-allocated
qcow2? I don't really see how this is more reliable/performant than
plain fully-allocated qcow2.


No real disk data resides in the QCOW2. It is metadata storage only:
CBT, and memory state in the case of snapshot

Re: [PATCH 0/3] Implement support for QCOW2 data files

2024-08-12 Thread Denis V. Lunev via Devel

On 8/12/24 10:36, Peter Krempa wrote:

On Mon, Aug 12, 2024 at 09:26:08 +0200, Peter Krempa wrote:

On Sun, Aug 11, 2024 at 17:34:45 +0300, Nikolai Barybin via Devel wrote:

There are use cases when the existing disks (i.e. LVM) are wanted
to be used with advanced features. For this purpose QEMU allows
data-file feature for qcow2 files: metadata is kept in the qcow2
file like usual, but guest data is written to an external file.
These patches enable support for this feature in libvirt.

So this feature was attempted once before but was never finished,
and the committed bits were eventually reverted. (I've purged my
local archive so I don't have the link handy, but I can look up the
old posting if you want.)

It was deemed that this doesn't really add any performance benefit
over storing the data inside the qcow2 itself. A qcow2 with data can
be stored inside an LV, or any other block device for that matter,
and thus can provide all the features that are necessary. The
data-file feature also makes the management of the metadata and data
much more complex, for a very bad trade-off. At this point, with
'qemu-img measure' it's easy to query the size needed for a fully
allocated qcow2 inside the block device.

Based on the history of this I'd like to ask you to summarize
justifications and reasons for adding this feature before continuing.

Based on the current state of the series and what would be required to
make it viable to be accepted I very strongly suggest re-thinking if you
really need this feature, especially based on the caveats above.


Let me clarify a bit.

A QCOW2 image with a data file uses QCOW2 purely as metadata storage
for an ordinary block device. This is a QEMU feature, and it would
be quite natural to have it represented in libvirt, since without
libvirt's help QEMU could not even start with such a configuration
due to namespaces.

LVM or not LVM: how is it better compared to plain QCOW2?
First of all, there are historical setups where the customer uses
LVM for virtual machines and does not want to reformat the disks
with a filesystem. This makes sense, as we are avoiding two levels
of metadata, i.e. QCOW2 metadata over local FS metadata. This makes
the setup a little bit more reliable. It should also be noted that
QCOW2 in very specific setups is at least 2 times slower (when the
L2 cache does not fit into the dedicated amount of RAM while the
disk itself is quite big; I would admit that this problem can be
seen even for not-so-big 1T disks).

On top of that, LVM is quite useful once we start talking about
clustered LVM setups. Clustered LVM is a good alternative to Ceph,
at least for some users.

Thus this data-file setup is a way to provide backups, VM state
snapshots and other features.

Den


Re: [PATCH 0/4] Rework qemu internal active snapshots to use QMP

2024-07-26 Thread Denis V. Lunev via Devel

On 7/26/24 12:35, Nikolai Barybin wrote:

Hello everyone! This is a ping email

On 7/16/24 00:42, Nikolai Barybin wrote:

These patches make use of the recently added QMP
snapshot-save/delete commands and reap the HMP savevm/deletevm code.
The usage of HMP commands is highly discouraged by QEMU.

Nikolai Barybin (4):
   qemu monitor: add snapshot-save/delete QMP commands
   qemu blockjob: add snapshot-save/delete job types
   qemu snapshot: use QMP snapshot-save/delete for internal snapshots
   qemu monitor: reap qemu_monitor_text

  po/POTFILES  |   1 -
  po/libvirt.pot   |  18 
  src/qemu/meson.build |   1 -
  src/qemu/qemu_block.c    |   2 +
  src/qemu/qemu_blockjob.c |   6 +-
  src/qemu/qemu_blockjob.h |   2 +
  src/qemu/qemu_domain.c   |   4 +
  src/qemu/qemu_monitor.c  |  23 +++--
  src/qemu/qemu_monitor.h  |  16 +++-
  src/qemu/qemu_monitor_json.c |  66 +++
  src/qemu/qemu_monitor_json.h |  13 +++
  src/qemu/qemu_monitor_text.c |  88 ---
  src/qemu/qemu_monitor_text.h |  29 ---
  src/qemu/qemu_snapshot.c | 158 ---
  14 files changed, 267 insertions(+), 160 deletions(-)
  delete mode 100644 src/qemu/qemu_monitor_text.c
  delete mode 100644 src/qemu/qemu_monitor_text.h


1. Please do not top-post.
2. I believe you are pinging the wrong thread; you should do that in
   the v2 thread.


Re: [PATCH 3/4] qemu snapshot: use QMP snapshot-save/delete for internal snapshots

2024-07-16 Thread Denis V. Lunev via Devel

On 7/16/24 00:42, Nikolai Barybin wrote:

The usage of HMP commands is highly discouraged by qemu. Moreover,
the current snapshot creation routine does not provide flexibility
in choosing the target device for the VM state snapshot.

This patch makes use of the QMP commands snapshot-save/delete and by
default chooses the first writable disk (if present) as the target
for the VM state, and the NVRAM otherwise.

Signed-off-by: Nikolai Barybin 
---
  src/qemu/qemu_snapshot.c | 158 ---
  1 file changed, 148 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index f5260c4a22..83949a9a27 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -308,6 +308,96 @@ qemuSnapshotCreateInactiveExternal(virQEMUDriver *driver,
     return ret;
 }
 
+static int
+qemuSnapshotActiveInternalGetWrdevListHelper(virDomainObj *vm,
+                                             char **wrdevs)
+{
+    size_t wrdevCount = 0;
+    size_t i = 0;
+
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virDomainDiskDef *disk = vm->def->disks[i];
+        if (!disk->src->readonly) {
+            wrdevs[wrdevCount] = g_strdup(disk->src->nodenameformat);
+            wrdevCount++;
+        }
+    }
+
+    if (wrdevCount == 0) {
+        if (vm->def->os.loader->nvram) {
+            wrdevs[0] = g_strdup(vm->def->os.loader->nvram->nodenameformat);
+        } else {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("no writable device for internal snapshot creation/deletion"));
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+
+static int
+qemuSnapshotCreateActiveInternalDone(virDomainObj *vm)
+{
+    qemuBlockJobData *job = NULL;
+    qemuDomainObjPrivate *priv = vm->privateData;
+
+    if (!(job = virHashLookup(priv->blockjobs, g_strdup_printf("snapsave%d", vm->def->id)))) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("failed to lookup blockjob 'snapsave%1$d'"), vm->def->id);
+        return -1;
+    }
+
+    qemuBlockJobUpdate(vm, job, VIR_ASYNC_JOB_NONE);
+    if (job->state == VIR_DOMAIN_BLOCK_JOB_FAILED) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("snapshot-save job failed: %1$s"), NULLSTR(job->errmsg));
+        return -1;
+    }
+
+    return job->state == VIR_DOMAIN_BLOCK_JOB_COMPLETED ? 1 : 0;
+}
+
+
+static int
+qemuSnapshotCreateActiveInternalStart(virDomainObj *vm,
+                                      const char *name)
+{
+    qemuBlockJobData *job = NULL;
+    g_autofree char** wrdevs = NULL;
+    int ret = -1;
+    int rc = 0;
+
+    wrdevs = g_new0(char *, vm->def->ndisks + 1);
+    if (qemuSnapshotActiveInternalGetWrdevListHelper(vm, wrdevs) < 0)
+        return -1;
+
+    if (!(job = qemuBlockJobDiskNew(vm, NULL, QEMU_BLOCKJOB_TYPE_SNAPSHOT_SAVE,
+                                    g_strdup_printf("snapsave%d", vm->def->id)))) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to create new blockjob"));
+        return -1;
+    }
+
+    qemuBlockJobSyncBegin(job);
+    if (qemuDomainObjEnterMonitorAsync(vm, VIR_ASYNC_JOB_SNAPSHOT) < 0) {
+        ret = -1;
+        goto cleanup;
+    }
+
+    rc = qemuMonitorSnapshotSave(qemuDomainGetMonitor(vm), job->name,
+                                 name, wrdevs[0], wrdevs);
+    qemuDomainObjExitMonitor(vm);
+    if (rc == 0) {
+        qemuBlockJobStarted(job, vm);
+        ret = 0;
+    }
+
+ cleanup:
+    qemuBlockJobStartupFinalize(vm, job);
+    return ret;
+}
+
 
 /* The domain is expected to be locked and active. */
 static int
@@ -316,11 +406,11 @@ qemuSnapshotCreateActiveInternal(virQEMUDriver *driver,
                                  virDomainMomentObj *snap,
                                  unsigned int flags)
 {
-    qemuDomainObjPrivate *priv = vm->privateData;
     virObjectEvent *event = NULL;
     bool resume = false;
     virDomainSnapshotDef *snapdef = virDomainSnapshotObjGetDef(snap);
     int ret = -1;
+    int rv = 0;
 
     if (!qemuMigrationSrcIsAllowed(vm, false, VIR_ASYNC_JOB_SNAPSHOT, 0))
         goto cleanup;
@@ -342,15 +432,17 @@ qemuSnapshotCreateActiveInternal(virQEMUDriver *driver,
         }
     }
 
-    if (qemuDomainObjEnterMonitorAsync(vm, VIR_ASYNC_JOB_SNAPSHOT) < 0) {
+    if ((ret = qemuSnapshotCreateActiveInternalStart(vm, snap->def->name)) < 0) {
         resume = false;
         goto cleanup;
     }
 
-    ret = qemuMonitorCreateSnapshot(priv->mon, snap->def->name);
-    qemuDomainObjExitMonitor(vm);
-    if (ret < 0)
-        goto cleanup;

'snapshot-save' has been added in QEMU 6.0.
Right now we are at 8.2.

Should we still support pre-6.0 versions, i.e. check the
availability of the command via a proper capability?

Den
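
For reference, the QMP command the series switches to (available
since QEMU 6.0) can be issued by hand roughly as follows; a sketch
only, with a made-up domain name and node names:

  virsh qemu-monitor-command vm1 '{"execute": "snapshot-save",
      "arguments": {"job-id": "snapsave0", "tag": "snap1",
                    "vmstate": "libvirt-1-format",
                    "devices": ["libvirt-1-format", "libvirt-2-format"]}}'
  # snapshot-save runs as a background job; poll it with:
  virsh qemu-monitor-command vm1 '{"execute": "query-jobs"}'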


Re: [PATCH 3/4] qemu snapshot: use QMP snapshot-save/delete for internal snapshots

2024-07-16 Thread Denis V. Lunev via Devel

On 7/16/24 00:42, Nikolai Barybin wrote:

The usage of HMP commands is highly discouraged by qemu. Moreover,
the current snapshot creation routine does not provide flexibility
in choosing the target device for the VM state snapshot.

This patch makes use of the QMP commands snapshot-save/delete and by
default chooses the first writable disk (if present) as the target
for the VM state, and the NVRAM otherwise.

Signed-off-by: Nikolai Barybin 
---
  src/qemu/qemu_snapshot.c | 158 ---
  1 file changed, 148 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index f5260c4a22..83949a9a27 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -308,6 +308,96 @@ qemuSnapshotCreateInactiveExternal(virQEMUDriver *driver,
  return ret;
  }
  
+static int
+qemuSnapshotActiveInternalGetWrdevListHelper(virDomainObj *vm,
+                                             char **wrdevs)
+{
+    size_t wrdevCount = 0;
+    size_t i = 0;
+
+    for (i = 0; i < vm->def->ndisks; i++) {
+        virDomainDiskDef *disk = vm->def->disks[i];
+        if (!disk->src->readonly) {
+            wrdevs[wrdevCount] = g_strdup(disk->src->nodenameformat);
+            wrdevCount++;
+        }
+    }
+
+    if (wrdevCount == 0) {
+        if (vm->def->os.loader->nvram) {
+            wrdevs[0] = g_strdup(vm->def->os.loader->nvram->nodenameformat);
+        } else {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("no writable device for internal snapshot creation/deletion"));
+            return -1;
+        }
+    }
+


Should we select the NVRAM at all?

IMHO that would be very problematic.
I think that we should generate an error in this case.

Den


Re: [PATCH] Check snapshot disk is not NULL when searching for it in the VM config

2024-05-17 Thread Denis V. Lunev via Devel

On 4/29/24 14:43, Fima Shevrin wrote:

When creating a snapshot of a VM with multiple hard disks, the
snapshot records all disks present in the system. If one of the
disks is later deleted, the snapshot still keeps knowledge of the
deleted disk. As a result, when the snapshot is deleted, the
validation stage looks up a disk from the snapshot that is no longer
in the VM configuration, so the vmdisk variable ends up NULL.
Dereferencing this null pointer when calling
virStorageSourceIsSameLocation(vmdisk->src, disk->src) results in
SIGSEGV.

Signed-off-by: Fima Shevrin 
---
  src/qemu/qemu_snapshot.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 09ec959f10..bf93cd485e 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -3806,7 +3806,7 @@ qemuSnapshotDeleteValidate(virDomainObj *vm,
         vmdisk = qemuDomainDiskByName(vm->def, snapDisk->name);
         disk = qemuDomainDiskByName(snapdef->parent.dom, snapDisk->name);
 
-        if (!virStorageSourceIsSameLocation(vmdisk->src, disk->src)) {
+        if (vmdisk != NULL && !virStorageSourceIsSameLocation(vmdisk->src, disk->src)) {
             virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
                            _("disk image '%1$s' for internal snapshot '%2$s' is not the same as disk image currently used by VM"),
                            snapDisk->name, snap->def->name);

ping