Re: device compatibility interface for live migration with assigned devices

2020-07-20 Thread Jason Wang



On 2020/7/20 下午6:39, Sean Mooney wrote:

On Mon, 2020-07-20 at 11:41 +0800, Jason Wang wrote:

On 2020/7/18 上午12:12, Alex Williamson wrote:

On Thu, 16 Jul 2020 16:32:30 +0800
Yan Zhao  wrote:


On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote:

On 2020/7/14 上午7:29, Yan Zhao wrote:

hi folks,
we are defining a device migration compatibility interface that helps upper
layer stack like openstack/ovirt/libvirt to check if two devices are
live migration compatible.
The "devices" here could be MDEVs, physical devices, or hybrid of the two.
e.g. we could use it to check whether
- a src MDEV can migrate to a target MDEV,
- a src VF in SRIOV can migrate to a target VF in SRIOV,
- a src MDEV can migration to a target VF in SRIOV.
 (e.g. SIOV/SRIOV backward compatibility case)

The upper layer stack could use this interface as the last step to check
if one device is able to migrate to another device before triggering a real
live migration procedure.
we are not sure if this interface is of value or help to you. please don't
hesitate to drop your valuable comments.


(1) interface definition
The interface is defined in below way:

__userspace
 /\  \
/ \write
   / read  \
  /__   ___\|/_
 | migration_version | | migration_version |-->check migration
 - -   compatibility
device Adevice B


a device attribute named migration_version is defined under each device's
sysfs node. e.g. 
(/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).

Are you aware of the devlink based device management interface that is
proposed upstream? I think it has many advantages over sysfs, do you
consider to switch to that?

Advantages, such as?


My understanding for devlink(netlink) over sysfs (some are mentioned at
the time of vDPA sysfs mgmt API discussion) are:

i tought netlink was used more a as a configuration protocoal to qurry and 
confire nic and i guess
other devices in its devlink form requireint a tool to be witten that can speak 
the protocal to interact with.
the primary advantate of sysfs is that everything is just a file. there are no 
addtional depleenceis
needed



Well, if you try to build logic like introspection on top for a 
sophisticated hardware, you probably need to have library on top. And 
it's attribute per file is pretty inefficient.




  and unlike netlink there are not interoperatblity issues in a coanitnerised 
env. if you are using diffrenet
version of libc and gcc in the contaienr vs the host my understanding is tools 
like ethtool from ubuntu deployed
in a container on a centos host can have issue communicating with the host 
kernel.



Kernel provides stable ABI for userspace, so it's not something that we 
can't fix.




if its jsut a file unless
the format the data is returnin in chagnes or the layout of sysfs changes its 
compatiable regardless of what you
use to read it.



I believe you can't change sysfs layout which is part of uABI. But as I 
mentioned below, sysfs has several drawbacks. It's not harm to compare 
between different approach when you start a new device management API.


Thanks



- existing users (NIC, crypto, SCSI, ib), mature and stable
- much better error reporting (ext_ack other than string or errno)
- namespace aware
- do not couple with kobject

Thanks





Re: device compatibility interface for live migration with assigned devices

2020-07-20 Thread Yan Zhao
On Fri, Jul 17, 2020 at 10:12:58AM -0600, Alex Williamson wrote:
<...>
> > yes, in another reply, Alex proposed to use an interface in json format.
> > I guess we can define something like
> > 
> > { "self" :
> >   [
> > { "pciid" : "8086591d",
> >   "driver" : "i915",
> >   "gvt-version" : "v1",
> >   "mdev_type"   : "i915-GVTg_V5_2",
> >   "aggregator"  : "1",
> >   "pv-mode" : "none",
> > }
> >   ],
> >   "compatible" :
> >   [
> > { "pciid" : "8086591d",
> >   "driver" : "i915",
> >   "gvt-version" : "v1",
> >   "mdev_type"   : "i915-GVTg_V5_2",
> >   "aggregator"  : "1"
> >   "pv-mode" : "none",
> > },
> > { "pciid" : "8086591d",
> >   "driver" : "i915",
> >   "gvt-version" : "v1",
> >   "mdev_type"   : "i915-GVTg_V5_4",
> >   "aggregator"  : "2"
> >   "pv-mode" : "none",
> > },
> > { "pciid" : "8086591d",
> >   "driver" : "i915",
> >   "gvt-version" : "v2",
> >   "mdev_type"   : "i915-GVTg_V5_4",
> >   "aggregator"  : "2"
> >   "pv-mode" : "none, ppgtt, context",
> > }
> > ...
> >   ]
> > }
> > 
> > But as those fields are mostly vendor specific, the userspace can
> > only do simple string comparing, I guess the list would be very long as
> > it needs to enumerate all possible targets.
> 
> 
> This ignores so much of what I tried to achieve in my example :(
> 
sorry, I just was eager to show and confirm the way to list all compatible
combination of mdev_type and mdev attributes.

> 
> > also, in some fileds like "gvt-version", is there a simple way to express
> > things like v2+?
> 
> 
> That's not a reasonable thing to express anyway, how can you be certain
> that v3 won't break compatibility with v2?  Sean proposed a versioning
> scheme that accounts for this, using an x.y.z version expressing the
> major, minor, and bugfix versions, where there is no compatibility
> across major versions, minor versions have forward compatibility (ex. 1
> -> 2 is ok, 2 -> 1 is not) and bugfix version number indicates some
> degree of internal improvement that is not visible to the user in terms
> of features or compatibility, but provides a basis for preferring
> equally compatible candidates.
>
right. if self version is v1, it can't know its compatible version is
v2. it can only be done in reverse. i.e.
when self version is v2, it can list its compatible version is v1 and
v2.
and maybe later when self version is v3, there's no v1 in its compatible
list.

In this way, do you think we still need the complex x.y.z versioning scheme?

>  
> > If the userspace can read this interface both in src and target and
> > check whether both src and target are in corresponding compatible list, I
> > think it will work for us.
> > 
> > But still, kernel should not rely on userspace's choice, the opaque
> > compatibility string is still required in kernel. No matter whether
> > it would be exposed to userspace as an compatibility checking interface,
> > vendor driver would keep this part of code and embed the string into the
> > migration stream. so exposing it as an interface to be used by libvirt to
> > do a safety check before a real live migration is only about enabling
> > the kernel part of check to happen ahead.
> 
> As you indicate, the vendor driver is responsible for checking version
> information embedded within the migration stream.  Therefore a
> migration should fail early if the devices are incompatible.  Is it
but as I know, currently in VFIO migration protocol, we have no way to
get vendor specific compatibility checking string in migration setup stage
(i.e. .save_setup stage) before the device is set to _SAVING state.
In this way, for devices who does not save device data in precopy stage,
the migration compatibility checking is as late as in stop-and-copy
stage, which is too late.
do you think we need to add the getting/checking of vendor specific
compatibility string early in save_setup stage?

> really libvirt's place to second guess what it has been directed to do?
if libvirt uses the scheme of reading compatibility string at source and
writing for checking at the target, it can not be called "a second guess".
It's not a guess, but a confirmation.

> Why would we even proceed to design a user parse-able version interface
> if we still have a dependency on an opaque interface?  Thanks,
one reason is that libvirt can't trust the parsing result from
openstack.
Another reason is that libvirt can use this opaque interface easier than
another parsing by itself, in the fact that it would not introduce more
burden to kernel who would write this part of code anyway, no matter
libvirt uses it or not.
 
Thanks
Yan



Re: [libvirt PATCH 0/4] storage: support controlling COW attribute for pool

2020-07-20 Thread Neal Gompa
On Mon, Jul 20, 2020 at 1:33 PM Daniel P. Berrangé  wrote:
>
> We already support a "nocow" flag for storage volumes, but this requires
> extra work by the mgmt app or user when creating images on btrfs. We
> want to "do the right thing" out of the box for btrfs.
>
> We achieve this by changint the storage pool code so that when creating
> a storage pool we always try to disable COW on btrfs filesystems. We
> then add an  feature in the pool XML to let apps
> override the default logic.
>
> The COW setting on the pool is inherited by any volumes.
>
> The main thing not solved here is that the default directory at
> /var/lib/libvirt/images is created by the RPM itself, not by a
> normal "pool-build" command.
>
> Fortunately it appears that virt-install  will explicitly invoke a
> storage pool build even if the directory already exists.
>
> Daniel P. Berrangé (4):
>   util: add a helper method for controlling the COW flag on btrfs
>   storage: convert to use virFileSetCOW
>   storage: attempt to disable COW by default
>   conf: add control over COW for storage pool directories
>
>  docs/formatstorage.html.in   | 25 +++
>  docs/schemas/storagepool.rng | 30 
>  src/conf/storage_conf.c  | 49 +
>  src/conf/storage_conf.h  |  8 +++
>  src/libvirt_private.syms |  1 +
>  src/storage/storage_util.c   | 46 +---
>  src/util/virfile.c   | 76 
>  src/util/virfile.h   |  3 +
>  tests/storagepoolxml2xmlin/pool-dir-cow.xml  | 10 +++
>  tests/storagepoolxml2xmlout/pool-dir-cow.xml | 15 
>  tests/storagepoolxml2xmltest.c   |  1 +
>  11 files changed, 237 insertions(+), 27 deletions(-)
>  create mode 100644 tests/storagepoolxml2xmlin/pool-dir-cow.xml
>  create mode 100644 tests/storagepoolxml2xmlout/pool-dir-cow.xml
>

Patch series looks fine. Tested it and it seemed to automatically do
what I expected on a fresh Fedora Rawhide system.

Reviewed-by: Neal Gompa 


-- 
真実はいつも一つ!/ Always, there's only one truth!




Re: [libvirt PATCH v2 08/15] nwfilter: remove unnecessary code from ebtablesGetSubChainInsts()

2020-07-20 Thread Laine Stump

On 7/20/20 5:04 PM, Ján Tomko wrote:

On a Saturday in 2020, Laine Stump wrote:

On 7/15/20 11:30 AM, Ján Tomko wrote:

On a Tuesday in 2020, Laine Stump wrote:


Signed-off-by: Laine Stump 



My S-o-b stands. I still think this is the right thing to do.



S-o-b merely means that you are the author and/or have the author's
permission to use the code. I don't think you can revoke a S-o-b,
even if you don't think the code is right. 



Yeah, I know that a misuse of S-o-b, I just like the idea of putting 
something at the end of the commit message that's equivalent to a 
politician at the end of their election ad: "I'm Zaphod Beeblebrox, and 
I approve this message". :-)




Re: [PATCH 10/10] virDomainSetBlockThreshold: Mention that the event can be registered for

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

The infrastructure supports setting the threshold also for the .
Mention it in the docs.

https://bugzilla.redhat.com/show_bug.cgi?id=1807741

Signed-off-by: Peter Krempa 
---
  src/libvirt-domain.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 6c5ff5b0db..a4e73d5480 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12381,6 +12381,9 @@ int virDomainGetGuestInfo(virDomainPtr domain,
   * live VM XML for 'backingStore' or 'source' elements of a disk. If index is
   * given the threshold is set for the corresponding image.
   *
+ * Note that the threshold event can be registered also for destinations of a
+ * 'virDomainBlockCopy' destination by using the 'index' of the 'mirror' 
source.
+ *
   * Hypervisors report the last written sector of an image in the bulk stats 
API


Reviewed-by: Eric Blake 


   * (virConnectGetAllDomainStats/virDomainListGetStats) as
   * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 09/10] qemuDomainGetStorageSourceByDevstr: Look also in 'mirror' chain

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

A disk can have a mirror, look also in it's backing chain.


its

(it's is only valid when you can replace with 'it is')



Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c | 13 +
  1 file changed, 9 insertions(+), 4 deletions(-)



Reviewed-by: Eric Blake 

Hmm, when doing a pull-mode backup, do we ever want a write-threshold on 
the temporary image?  Or is this only for actual block-copy mirroring, 
and not backups?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [libvirt PATCH v2 08/15] nwfilter: remove unnecessary code from ebtablesGetSubChainInsts()

2020-07-20 Thread Ján Tomko

On a Saturday in 2020, Laine Stump wrote:

On 7/15/20 11:30 AM, Ján Tomko wrote:

On a Tuesday in 2020, Laine Stump wrote:

On failure, this function would clear out and free the list of
subchains it had been called with. This is unnecessary, because the
*only* caller of this function will also clear out and free the list
of subchains if it gets a failure from ebtablesGetSubChainInsts().

(It also makes more logical sense for the function that is creating
the entire list to be the one freeing the entire list, rather than
having a function whose purpose is only to create *one item* on the
list freeing the entire list).


This is the function creating the list,



I disagree with that characterization. The list is created, with 0 
elements, when the caller (ebiptablesApplyNewRules()) defines it. Then 
each time ebtablesGetSubChainInsts() is called, it doesn't create the 
list anew, it just adds to whatever is already on the existing list - 
as a matter of fact it is called multiple times and each time it adds 
more items to the list without re=initializing it.




Oh, I missed that it's called twice - I thought it was either one or the
other call.



This is very much like what happens with a virBuffer - some function 
creates a virBuffer by defining it and initializing it to empty, then 
each time a virBuffer function is called, it adds more text to the 
buffer. But if there is an error in a virBuffer function, it doesn't 
clear out the buffer before returning, it just returns an error 
leaving the buffer in whatever state it was in when the error 
occurred; it is then up to the caller, who is the owner of the 
virBuffer, to clear it out.




I think it makes sense
to not leave anything allocated in case of failure.



Aside from making the code simpler and cleaner, I think it doesn't 
make sense for one invocation of the function to clear out anything 
that was put into the list by *a different* invocation of the 
function. If you're going to be a purist about it, then a failed 
ebtablesGetSubChainInsts() should remove from the list *only those 
items that were added during the current call* and nothing else.




Yeah, that would be wrong.



But that's just pedantic nitpicking (Hey, *you* started the nitpicking 
though :-P)



(Also, there is only one caller of ebtablesGetSubChainInsts(), and 
whenever ebtablesGetSubChainInsts() fails, the *very next thing* that 
caller does is to clear out the entire list. So in fact, "nothing is 
left allocated in case of failure".)





Jano



Signed-off-by: Laine Stump 



My S-o-b stands. I still think this is the right thing to do.



S-o-b merely means that you are the author and/or have the author's
permission to use the code. I don't think you can revoke a S-o-b,
even if you don't think the code is right.





---
src/nwfilter/nwfilter_ebiptables_driver.c | 6 --
1 file changed, 6 deletions(-)





Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature


Re: [PATCH 08/10] qemuDomainDiskLookupByNodename: Look also for 'mirror' node names

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

When doing a block copy, there is another chain of images attached to a
disk. Consider them as well when looking up a disk using nodename.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c | 8 
  1 file changed, 8 insertions(+)


Reviewed-by: Eric Blake 



diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index ed7ec77ed4..18fd445e30 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11492,6 +11492,14 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def,

  return def->disks[i];
  }
+
+if (def->disks[i]->mirror &&
+(tmp = virStorageSourceFindByNodeName(def->disks[i]->mirror, 
nodename))) {
+if (src)
+*src = tmp;
+
+return def->disks[i];
+}
  }

  return NULL;



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 07/10] virStorageSourceFindByNodeName: Remove unused 'idx' argument

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

None of the callers actually use it.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c|  9 -
  src/util/virstoragefile.c | 16 +++-
  src/util/virstoragefile.h |  3 +--
  3 files changed, 8 insertions(+), 20 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 06/10] qemuDomainDiskLookupByNodename: Remove unused 'idx'

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

All callers pass NULL as the value. Remove the argument.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c  | 12 ++--
  src/qemu/qemu_domain.h  |  3 +--
  src/qemu/qemu_process.c |  4 ++--
  3 files changed, 5 insertions(+), 14 deletions(-)


Another fun code cleanup.  All this refactoring work is getting somewhere...

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 05/10] virDomainSetBlockThreshold: Clarify values of @dev the event is fired for

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

Top level image may get two events, one with the disk target (vda) and
one with disk target with index (vda[3]) if the top level image has an
index.

Signed-off-by: Peter Krempa 
---
  src/libvirt-domain.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index ba30d18f65..6c5ff5b0db 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12370,6 +12370,10 @@ int virDomainGetGuestInfo(virDomainPtr domain,
   * described by @dev is written beyond the set threshold level. The threshold
   * level is unset once the event fires. The event might not be delivered at 
all
   * if libvirtd was not running at the moment when the threshold was reached.
+ * Note that if the threshold level is reached for a top level image the event


s/image/image,/


+ * is emitted for @dev corresponding to the disk target, and may also be 
reported
+ * with @dev corresponding to the disk target with an index corresponding to 
the
+ * 'index' attribute of 'source' in the live VM XML if the atribute is present.


attribute


   *
   * @dev can either be a disk target name (vda, sda) or disk target with index 
(
   * vda[4]). Without the index the top image in the backing chain will have the



With fixes,
Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 04/10] qemuProcessHandleBlockThreshold: Report correct indexes

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

The index returned by qemuDomainDiskLookupByNodename is the position in
the backing chain rather than the index we report in the XML.

Since with -blockdev they differ now and additionally the disk source
also has an index we need to fix the 'threshold' evens we report:


events



1) If it's the top level image we must always trigger the event without
any suffix as we did until now

2) We must report the correct index

3) We must report the correct index also for the top level image, when
blockdev is used.

This means that we need to potentially emit 2 events, one for the device
without the index and then when blockdev is used and the top level image
has an idex we must do it also with the index.


index



This will fix it for blockdev cases, while also not removing previous
semantics.

https://bugzilla.redhat.com/show_bug.cgi?id=1857204

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_process.c | 26 ++
  1 file changed, 18 insertions(+), 8 deletions(-)



With typos fixed,
Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 02/10] qemuDomainDiskBackingStoreGetName: Remove unused argument

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c  | 1 -
  src/qemu/qemu_domain.h  | 1 -
  src/qemu/qemu_process.c | 2 +-
  3 files changed, 1 insertion(+), 3 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 03/10] qemuDomainDiskBackingStoreGetName: Eliminate temp variable

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

We can return the formatted string directly.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_domain.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 3d136a6b8a..cfdd9270da 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11517,14 +11517,10 @@ char *
  qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk,
unsigned int idx)
  {
-char *ret = NULL;
-
  if (idx)
-ret = g_strdup_printf("%s[%d]", disk->dst, idx);
+return g_strdup_printf("%s[%d]", disk->dst, idx);
  else
-ret = g_strdup(disk->dst);
-
-return ret;
+return g_strdup(disk->dst);


You could even get rid of the 'else', and less indentation on this line. 
 Whichever way is fine.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



[PATCH 3/5] util: log an error if virXMLNodeContentString will return NULL

2020-07-20 Thread Laine Stump
Many of our calls to xmlNodeGetContent() (which are now all via
virXMLNodeContentString() are failing to check for a NULL return. We
need to remedy that, but in order to make the remedy simpler, let's
log an error in virXMLNodeContentString(), so that the callers don't
all individually need to (since it would be the same error message for
all of them anyway).

Signed-off-by: Laine Stump 
---
 src/util/virxml.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/util/virxml.c b/src/util/virxml.c
index 27d22598ee..5315d4ff6f 100644
--- a/src/util/virxml.c
+++ b/src/util/virxml.c
@@ -538,7 +538,23 @@ virXMLPropStringLimit(xmlNodePtr node,
 char *
 virXMLNodeContentString(xmlNodePtr node)
 {
-return (char *)xmlNodeGetContent(node);
+char *ret = (char *)xmlNodeGetContent(node);
+
+if (node->type !=  XML_ELEMENT_NODE) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("node '%s' has unexpected type %d"),
+   node->name, node->type);
+return NULL;
+}
+
+if (!ret) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("node '%s' has unexpected NULL content. This could be 
caused by malformed input, or a memory allocation failure"),
+   node->name);
+return NULL;
+}
+
+return ret;
 }
 
 
-- 
2.25.4



[PATCH 1/5] conf: refactor virDomainBlkioDeviceParseXML to reduce calls to xmlNodeGetContent

2020-07-20 Thread Laine Stump
virDomainBlkioDeviceParseXML() calls xmlNodeGetContent() multiple
times in a loop, but can easily be refactored to call it once for all
element nodes, and then use the result of that one call in each of the
(mutually exclusive) blocks that previously each had their own call to
xmlNodeGetContent.

This is being done in order to reduce the number of changes needed in
an upcoming patch that will eliminate the lack of checking for NULL on
return from xmlNodeGetContent().

As part of the simplification, the while() loop has been changed into
a for() so that we can use "continue" without bypassing the "node =
node->next".

Signed-off-by: Laine Stump 

Change from V1: turned into for() loop and log error rather than
ignoring NULL. Jano had suggested we might be able to set dev->path
directly instead of using a temporary var, but doing that would
require keeping the error: label and its cleanup of dev->path, rather
than just relying on g_autofree.

Signed-off-by: Laine Stump 
---
 src/conf/domain_conf.c | 128 ++---
 1 file changed, 70 insertions(+), 58 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 7ecd2818b9..ade8c13914 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1635,73 +1635,85 @@ virDomainBlkioDeviceParseXML(xmlNodePtr root,
  virBlkioDevicePtr dev)
 {
 xmlNodePtr node;
-g_autofree char *c = NULL;
-
-node = root->children;
-while (node) {
-if (node->type == XML_ELEMENT_NODE) {
-if (virXMLNodeNameEqual(node, "path") && !dev->path) {
-dev->path = (char *)xmlNodeGetContent(node);
-} else if (virXMLNodeNameEqual(node, "weight")) {
-c = (char *)xmlNodeGetContent(node);
-if (virStrToLong_ui(c, NULL, 10, >weight) < 0) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("could not parse weight %s"),
-   c);
-goto error;
-}
-VIR_FREE(c);
-} else if (virXMLNodeNameEqual(node, "read_bytes_sec")) {
-c = (char *)xmlNodeGetContent(node);
-if (virStrToLong_ull(c, NULL, 10, >rbps) < 0) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("could not parse read bytes sec %s"),
-   c);
-goto error;
-}
-VIR_FREE(c);
-} else if (virXMLNodeNameEqual(node, "write_bytes_sec")) {
-c = (char *)xmlNodeGetContent(node);
-if (virStrToLong_ull(c, NULL, 10, >wbps) < 0) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("could not parse write bytes sec %s"),
-   c);
-goto error;
-}
-VIR_FREE(c);
-} else if (virXMLNodeNameEqual(node, "read_iops_sec")) {
-c = (char *)xmlNodeGetContent(node);
-if (virStrToLong_ui(c, NULL, 10, >riops) < 0) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("could not parse read iops sec %s"),
-   c);
-goto error;
-}
-VIR_FREE(c);
-} else if (virXMLNodeNameEqual(node, "write_iops_sec")) {
-c = (char *)xmlNodeGetContent(node);
-if (virStrToLong_ui(c, NULL, 10, >wiops) < 0) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("could not parse write iops sec %s"),
-   c);
-goto error;
-}
-VIR_FREE(c);
+g_autofree char *path = NULL;
+
+for (node = root->children; node != NULL; node = node->next) {
+g_autofree char *c = NULL;
+
+if (node->type != XML_ELEMENT_NODE)
+continue;
+
+c = (char *)xmlNodeGetContent(node);
+
+if (virXMLNodeNameEqual(node, "path")) {
+/* To avoid the need for explicit cleanup on failure,
+ * don't set dev->path until we're assured of
+ * success. Until then, store it in an autofree pointer.
+ */
+if (!path)
+path = g_steal_pointer();
+continue;
+}
+
+if (virXMLNodeNameEqual(node, "weight")) {
+if (virStrToLong_ui(c, NULL, 10, >weight) < 0) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("could not parse weight %s"),
+   c);
+return -1;
 }
+continue;
+}
+
+if (virXMLNodeNameEqual(node, "read_bytes_sec")) {
+if (virStrToLong_ull(c, 

[FYI PATCH 5/5] util: open code virXMLNodeContentString to access the node object directly

2020-07-20 Thread Laine Stump
(I am *NOT* advocating that we apply this patch. Just providing it for
informational purposes, since we had previously discussed this
possibility on the list)

Since it's impossible to determine whether xmlNodeContent has returned
a NULL due to OOM, or due to badly formed / evil XML, this patch open
codes virXMLNodeContentString to get the content string directly from
the node.

This turns out to not be so easy as it seemed at first glance when it
was suggested - the "content" member of the element node itself does
not contain the content string for the node. The content string that
we want can be found (at least for our uses of libxml) by looking for
a child node of the element node - if that child node is of type
XML_TEXT_NODE, then the content member of *that* node is the string
we're looking for. If there is no child node, then the element has no
content, so we return "". Likewise, if the child node is type
XML_TEXT_NODE but has no content, we also return "". In all other
cases, we log an error and return because this is some case that
hasn't been encountered in our test cases, so either someone is
sending bad XML, or our assumptions about the layout of the XML node
object list are incorrect.

Note that while calling virXMLNodeContentString() would return NULL
from an OOM situation, this new code will exit the process on OOM
(since it is calling glib for memory allocation).

Signed-off-by: Laine Stump 
---
 src/util/virxml.c | 43 ++-
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/src/util/virxml.c b/src/util/virxml.c
index 5315d4ff6f..b2298d74c8 100644
--- a/src/util/virxml.c
+++ b/src/util/virxml.c
@@ -538,7 +538,17 @@ virXMLPropStringLimit(xmlNodePtr node,
 char *
 virXMLNodeContentString(xmlNodePtr node)
 {
-char *ret = (char *)xmlNodeGetContent(node);
+/* We specifically avoid using virXMLNodeContentString() here, because
+ * when NULL is returned, it is difficult/impossible to
+ * distinguish between 1) OOM, 2) NULL content, 3) some other error.
+ */
+
+/* for elements used the way libvirt uses them, the xmlNode object
+ * for an element will have a type of XML_ELEMENT_NODE, and if the
+ * node has any content, it will be in the content field of a
+ * child node of that object which is itself of type
+ * XML_TEXT_NODE.
+ */
 
 if (node->type !=  XML_ELEMENT_NODE) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -547,15 +557,38 @@ virXMLNodeContentString(xmlNodePtr node)
 return NULL;
 }
 
-if (!ret) {
+/* no children --> empty element node */
+if (!node->children)
+return g_strdup("");
+
+/* if the child isn't text, or there is more than a single node
+ * hanging off "children", our assumptions have been wrong
+ */
+if (node->children->type != XML_TEXT_NODE) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("child of element node '%s' has unexpected name '%s', 
type %d"),
+   node->name, node->children->name, node->children->type);
+return NULL;
+}
+if (node->children->next) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("node '%s' has unexpected NULL content. This could be 
caused by malformed input, or a memory allocation failure"),
+   _("child of element node '%s' is type XML_TEXT_NODE, 
but is a list"),
+   node->name);
+return NULL;
+}
+if (node->children->children) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("child of element node '%s' is type XML_TEXT_NODE, 
but has children"),
node->name);
 return NULL;
 }
 
-return ret;
-}
+/* if content is NULL, return "" instead */
+if (!node->children->content)
+return g_strdup("");
+
+return g_strdup((char *)node->children->content);
+ }
 
 
 /**
-- 
2.25.4



[PATCH 2/5] util: replace all calls to xmlNodeGetContent with virXMLNodeContentString

2020-07-20 Thread Laine Stump
No functional change

Signed-off-by: Laine Stump 
---
 src/conf/domain_conf.c  | 20 ++--
 src/conf/network_conf.c |  2 +-
 src/conf/node_device_conf.c |  4 ++--
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index ade8c13914..cb69c97a8e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1643,7 +1643,7 @@ virDomainBlkioDeviceParseXML(xmlNodePtr root,
 if (node->type != XML_ELEMENT_NODE)
 continue;
 
-c = (char *)xmlNodeGetContent(node);
+c = virXMLNodeContentString(node);
 
 if (virXMLNodeNameEqual(node, "path")) {
 /* To avoid the need for explicit cleanup on failure,
@@ -9373,10 +9373,10 @@ virDomainLeaseDefParseXML(xmlNodePtr node)
 while (cur != NULL) {
 if (cur->type == XML_ELEMENT_NODE) {
 if (!key && virXMLNodeNameEqual(cur, "key")) {
-key = (char *)xmlNodeGetContent(cur);
+key = virXMLNodeContentString(cur);
 } else if (!lockspace &&
virXMLNodeNameEqual(cur, "lockspace")) {
-lockspace = (char *)xmlNodeGetContent(cur);
+lockspace = virXMLNodeContentString(cur);
 } else if (!path &&
virXMLNodeNameEqual(cur, "target")) {
 path = virXMLPropString(cur, "path");
@@ -10595,16 +10595,16 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt,
 
 } else if (!serial &&
virXMLNodeNameEqual(cur, "serial")) {
-serial = (char *)xmlNodeGetContent(cur);
+serial = virXMLNodeContentString(cur);
 } else if (!wwn &&
virXMLNodeNameEqual(cur, "wwn")) {
-wwn = (char *)xmlNodeGetContent(cur);
+wwn = virXMLNodeContentString(cur);
 
 if (!virValidateWWN(wwn))
 goto error;
 } else if (!vendor &&
virXMLNodeNameEqual(cur, "vendor")) {
-vendor = (char *)xmlNodeGetContent(cur);
+vendor = virXMLNodeContentString(cur);
 
 if (strlen(vendor) > VENDOR_LEN) {
 virReportError(VIR_ERR_XML_ERROR, "%s",
@@ -10619,7 +10619,7 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt,
 }
 } else if (!product &&
virXMLNodeNameEqual(cur, "product")) {
-product = (char *)xmlNodeGetContent(cur);
+product = virXMLNodeContentString(cur);
 
 if (strlen(product) > PRODUCT_LEN) {
 virReportError(VIR_ERR_XML_ERROR, "%s",
@@ -13513,7 +13513,7 @@ virDomainSmartcardDefParseXML(virDomainXMLOptionPtr 
xmlopt,
  "exactly three certificates"));
 goto error;
 }
-def->data.cert.file[i] = (char *)xmlNodeGetContent(cur);
+def->data.cert.file[i] = virXMLNodeContentString(cur);
 if (!def->data.cert.file[i]) {
 virReportOOMError();
 goto error;
@@ -13522,7 +13522,7 @@ virDomainSmartcardDefParseXML(virDomainXMLOptionPtr 
xmlopt,
 } else if (cur->type == XML_ELEMENT_NODE &&
virXMLNodeNameEqual(cur, "database") &&
!def->data.cert.database) {
-def->data.cert.database = (char *)xmlNodeGetContent(cur);
+def->data.cert.database = virXMLNodeContentString(cur);
 if (!def->data.cert.database) {
 virReportOOMError();
 goto error;
@@ -19875,7 +19875,7 @@ virDomainLoaderDefParseXML(xmlNodePtr node,
 if (!fwAutoSelect) {
 readonly_str = virXMLPropString(node, "readonly");
 type_str = virXMLPropString(node, "type");
-loader->path = (char *) xmlNodeGetContent(node);
+loader->path = virXMLNodeContentString(node);
 if (STREQ_NULLABLE(loader->path, ""))
 VIR_FREE(loader->path);
 }
diff --git a/src/conf/network_conf.c b/src/conf/network_conf.c
index 0fd68a7d66..0a32f57188 100644
--- a/src/conf/network_conf.c
+++ b/src/conf/network_conf.c
@@ -720,7 +720,7 @@ virNetworkDNSHostDefParseXML(const char *networkName,
 if (cur->type == XML_ELEMENT_NODE &&
 virXMLNodeNameEqual(cur, "hostname")) {
   if (cur->children != NULL) {
-  g_autofree char *name = (char *) xmlNodeGetContent(cur);
+  g_autofree char *name = virXMLNodeContentString(cur);
 
   if (!name) {
   virReportError(VIR_ERR_XML_DETAIL,
diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c
index c54015336a..7bbe709c5d 100644
--- a/src/conf/node_device_conf.c
+++ b/src/conf/node_device_conf.c
@@ -1988,10 +1988,10 @@ virNodeDeviceDefParseXML(xmlXPathContextPtr ctxt,
 
 switch ((virNodeDevDevnodeType)val) {
 

[PATCH 0/5] be consistent about error checking xmlNodeGetContent() return

2020-07-20 Thread Laine Stump
Awhile back I noticed that calls to xmlNodeGetContent() from libvirt
code were inconsistent in their handling of the returned
pointer. Sometimes we would assume the return was always non-NULL
(dereferencing with wild abandon without concern for the
consequences), sometimes we would interpret NULL as "", and sometimes
as OOM. I sent mail about this to the list last week, wondering (and
doubting) if we could assume that a NULL return would always mean OOM:

https://www.redhat.com/archives/libvir-list/2020-July/msg00333.html

After looking at the libxml code, danpb's determination was that a
NULL return from xmlNodeGetContent *might* mean OOM, but it might also
mean some odd XML that we weren't expecting, so we can't just always
exit on a NULL return. He also agreed with (and expanded on) my
suspicion that there really is no reliable way to tell the reason for
a NULL return from xmlNodeGetContent, and suggested that possibly we
could just examing the xmlNode directly to learn the content instead
of calling xmlNodeGetContent().

This series is a followup to that discussion. The first 4 patches
clean up the code with the result that:

1) a libvirt wrapper function is always called instead of calling
xmlNodeGetContent() directly.

2) that wrapper function logs an error when it gets back NULL from
xmlNodeGetContent().

3) All the callers check for a NULL return, and do a "silent error
return" themselves when there is a NULL.

In the final patch, I try out Dan's idea of looking at the xmlNode
object directly to get the content. It turns out it's not as
straightforward as you would think from just looking at the layout of
the object - a full explanation is in patch 5. I'm mainly sending that
patch as an "FYI" (one step back from an "RFC"), since really all it
changes is that libvirt will exit on OOM, and log (different, but not
any more informative) error messages when the problem isn't
OOM. Unless someone has a strong opinion otherwise, I think just the
first 4 patches should be applied, and users can just "deal" with the
ambiguity in case of error.


Laine Stump (5):
  conf: refactor virDomainBlkioDeviceParseXML to reduce calls to
xmlNodeGetContent
  util: replace all calls to xmlNodeGetContent with
virXMLNodeContentString
  util: log an error if virXMLNodeContentString will return NULL
  treat all NULL returns from virXMLNodeContentString() as an error
  util: open code virXMLNodeContentString to access the node object
directly

 src/conf/domain_conf.c  | 194 
 src/conf/network_conf.c |   7 +-
 src/conf/node_device_conf.c |   6 +-
 src/util/virxml.c   |  53 +-
 4 files changed, 169 insertions(+), 91 deletions(-)

-- 
2.25.4



[PATCH 4/5] treat all NULL returns from virXMLNodeContentString() as an error

2020-07-20 Thread Laine Stump
and stop erroneously equating NULL with "". The latter means that the
element has empty content, while the former means there was an error
during parsing (either internal with the parser, or the content of the
XML was bad).

Signed-off-by: Laine Stump 
---
 src/conf/domain_conf.c  | 68 ++---
 src/conf/network_conf.c |  5 ++-
 src/conf/node_device_conf.c |  6 ++--
 3 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index cb69c97a8e..c377fd74aa 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1643,7 +1643,8 @@ virDomainBlkioDeviceParseXML(xmlNodePtr root,
 if (node->type != XML_ELEMENT_NODE)
 continue;
 
-c = virXMLNodeContentString(node);
+if (!(c = virXMLNodeContentString(node)))
+return -1;
 
 if (virXMLNodeNameEqual(node, "path")) {
 /* To avoid the need for explicit cleanup on failure,
@@ -9373,10 +9374,12 @@ virDomainLeaseDefParseXML(xmlNodePtr node)
 while (cur != NULL) {
 if (cur->type == XML_ELEMENT_NODE) {
 if (!key && virXMLNodeNameEqual(cur, "key")) {
-key = virXMLNodeContentString(cur);
+if (!(key = virXMLNodeContentString(cur)))
+goto error;
 } else if (!lockspace &&
virXMLNodeNameEqual(cur, "lockspace")) {
-lockspace = virXMLNodeContentString(cur);
+if (!(lockspace = virXMLNodeContentString(cur)))
+goto error;
 } else if (!path &&
virXMLNodeNameEqual(cur, "target")) {
 path = virXMLPropString(cur, "path");
@@ -10595,16 +10598,19 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt,
 
 } else if (!serial &&
virXMLNodeNameEqual(cur, "serial")) {
-serial = virXMLNodeContentString(cur);
+if (!(serial = virXMLNodeContentString(cur)))
+goto error;
 } else if (!wwn &&
virXMLNodeNameEqual(cur, "wwn")) {
-wwn = virXMLNodeContentString(cur);
+if (!(wwn = virXMLNodeContentString(cur)))
+goto error;
 
 if (!virValidateWWN(wwn))
 goto error;
 } else if (!vendor &&
virXMLNodeNameEqual(cur, "vendor")) {
-vendor = virXMLNodeContentString(cur);
+if (!(vendor = virXMLNodeContentString(cur)))
+goto error;
 
 if (strlen(vendor) > VENDOR_LEN) {
 virReportError(VIR_ERR_XML_ERROR, "%s",
@@ -10619,7 +10625,8 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt,
 }
 } else if (!product &&
virXMLNodeNameEqual(cur, "product")) {
-product = virXMLNodeContentString(cur);
+if (!(product = virXMLNodeContentString(cur)))
+goto error;
 
 if (strlen(product) > PRODUCT_LEN) {
 virReportError(VIR_ERR_XML_ERROR, "%s",
@@ -13513,20 +13520,16 @@ virDomainSmartcardDefParseXML(virDomainXMLOptionPtr 
xmlopt,
  "exactly three certificates"));
 goto error;
 }
-def->data.cert.file[i] = virXMLNodeContentString(cur);
-if (!def->data.cert.file[i]) {
-virReportOOMError();
+if (!(def->data.cert.file[i] = virXMLNodeContentString(cur)))
 goto error;
-}
+
 i++;
 } else if (cur->type == XML_ELEMENT_NODE &&
virXMLNodeNameEqual(cur, "database") &&
!def->data.cert.database) {
-def->data.cert.database = virXMLNodeContentString(cur);
-if (!def->data.cert.database) {
-virReportOOMError();
+if (!(def->data.cert.database = virXMLNodeContentString(cur)))
 goto error;
-}
+
 if (*def->data.cert.database != '/') {
 virReportError(VIR_ERR_XML_ERROR,
_("expecting absolute path: %s"),
@@ -15638,8 +15641,10 @@ virSysinfoOEMStringsParseXML(xmlNodePtr node,
 goto cleanup;
 
 def->nvalues = nstrings;
-for (i = 0; i < nstrings; i++)
-def->values[i] = virXMLNodeContentString(strings[i]);
+for (i = 0; i < nstrings; i++) {
+if (!(def->values[i] = virXMLNodeContentString(strings[i])))
+goto cleanup;
+}
 
 *oem = g_steal_pointer();
 ret = 0;
@@ -15767,7 +15772,9 @@ virSysinfoParseFWCfgDef(virSysinfoDefPtr def,
 return -1;
 }
 
-value = virXMLNodeContentString(nodes[i]);
+if (!(value = virXMLNodeContentString(nodes[i])))
+return -1;
+
 file = virXMLPropString(nodes[i], 

Re: [PATCH 01/10] virDomainSetBlockThreshold: Document values of '@dev' better

2020-07-20 Thread Eric Blake

On 7/15/20 8:10 AM, Peter Krempa wrote:

Mention where to obtain the index and how it's treated.

Signed-off-by: Peter Krempa 
---
  src/libvirt-domain.c | 6 ++
  1 file changed, 6 insertions(+)


Reviewed-by: Eric Blake 



diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index fe4dab7cdf..ba30d18f65 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12371,6 +12371,12 @@ int virDomainGetGuestInfo(virDomainPtr domain,
   * level is unset once the event fires. The event might not be delivered at 
all
   * if libvirtd was not running at the moment when the threshold was reached.
   *
+ * @dev can either be a disk target name (vda, sda) or disk target with index (
+ * vda[4]). Without the index the top image in the backing chain will have the
+ * threshold set. The index corresponds to the 'index' attribute reported in 
the
+ * live VM XML for 'backingStore' or 'source' elements of a disk. If index is
+ * given the threshold is set for the corresponding image.
+ *
   * Hypervisors report the last written sector of an image in the bulk stats 
API
   * (virConnectGetAllDomainStats/virDomainListGetStats) as
   * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 5/5] qemuDomainBlockPivot: Ignore failures of creating active layer bitmap

2020-07-20 Thread Eric Blake

On 7/16/20 9:20 AM, Peter Krempa wrote:

Ignore errors from creating "libvirt-tmp-activewrite" bitmap. This
prevents failures of finishing blockjobs if the bitmap already exists.

Note that if the bitmap exists, the worst case that can happen is that
more bits are marked as dirty in the resulting merge.


In turn, your incremental backup might be larger than it needs to be, 
but you have not lost any data.  I agree this is safe.




Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_driver.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Eric Blake 



diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 348ef17141..64ddc8dce9 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -17414,7 +17414,7 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver,
  }

  if (bitmapactions && rc == 0)
-rc = qemuMonitorTransaction(priv->mon, );
+ignore_value(qemuMonitorTransaction(priv->mon, ));

  if (rc == 0)
  ret = qemuMonitorJobComplete(priv->mon, job->name);



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 4/5] qemuDomainBlockPivot: Rename 'actions' to 'bitmapactions'

2020-07-20 Thread Eric Blake

On 7/16/20 9:20 AM, Peter Krempa wrote:

There are two possible 'transaction' command arguments in the function.
Rename 'actions' as they deal with creating bitmaps only.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_driver.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 1/5] qemu: blockjob: Don't base bitmap handling of active-layer block commit on QEMU_CAPS_BLOCKDEV_REOPEN

2020-07-20 Thread Eric Blake

On 7/16/20 9:20 AM, Peter Krempa wrote:

The handler finalizing the active layer block commit doesn't actually
reopen the file for active layer block commit, so the comment and check
are invalid.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_blockjob.c | 3 ++-
  src/qemu/qemu_driver.c   | 6 +-
  2 files changed, 3 insertions(+), 6 deletions(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 2/5] qemu: blockjob: Actually delete temporary bitmap on failed active commit

2020-07-20 Thread Eric Blake

On 7/16/20 9:20 AM, Peter Krempa wrote:

Commit 20a7abc2d2d tried to delete the possibly leftover bitmap but
neglected to call the actual monitor to do so.

Signed-off-by: Peter Krempa 
---
  src/qemu/qemu_blockjob.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)


Yeah, building up the command structure doesn't help if you don't run it ;)

Reviewed-by: Eric Blake 


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [PATCH 3/5] qemu: block: Remove 'active-write' bitmap even if there are no bitmaps to merge

2020-07-20 Thread Eric Blake

On 7/16/20 9:20 AM, Peter Krempa wrote:

The 'libvirt-tmp-activewrite' bitmap is added during the 'pivot'
operation of block copy and active layer block commit operations
regardless of whether there are any bitmaps to merge, but was not
removed unless a bitmap was merged. This meant that subsequent attempts
to merge into the same image would fail.

Fix it by checking whether the 'libvirt-tmp-activewrite' would be used
by the code and don't skip the code which would delete it.

This is a regression introduced when we switched to the new code for
block commit in <20a7abc2d2d> and for block copy in <7bfff40fdfe5>. The
actual bug originates from <4fa8654ece>.

https://bugzilla.redhat.com/show_bug.cgi?id=1857735

Signed-off-by: Peter Krempa 
---


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-20 Thread Chris Murphy
On Mon, Jul 20, 2020 at 12:11 PM Daniel P. Berrangé  wrote:
>
> On Mon, Jul 13, 2020 at 08:06:22AM -0600, Chris Murphy wrote:
> > On Mon, Jul 13, 2020 at 6:20 AM Daniel P. Berrangé  
> > wrote:
> > >
> > > On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:
> >
> > > > Also, the location for GNOME Boxes doesn't exist at install time, so
> > > > the installer doing it with a post-install script isn't a complete
> > > > solution.
> > > >
> > > > While Boxes can use 'qemu-img create -o nocow=on' there is an
> > > > advantage of 'chattr +C' on the enclosing directory: files copied into
> > > > it, as well as new files created, inherit the attribute. Meanwhile, it
> > > > can't be set after the file has non-zero size.
> > >
> > > Boxes will use libvirt storage pools APIs to create the directory in
> > > the first place. So if we add nocow support to the pool APIs, we dont
> > > need to worry about per-file usage in most cases.
> > >
> > >
> > > Is there a good way to determine whether a given filesystem supports
> > > copy-on-write, other than to simply try to set the +C attribute on a
> > > directory ?  Ideally we would check for this feature, not check
> > > whether the filesystem is btrfs, as that makes it future proof to
> > > other cow supporting filesystems.
> >
> > It's a messy history. lsattr and chattr come from e2fsprogs. Not all
> > file systems support file attributes. And not all COW file systems (or
> > layers) support an option for nocow, in fact Btrfs is the only one I'm
> > aware of that has it. Both as a file attribute, 'chattr +C' but also
> > with a mount option 'nodatacow' - the metadata is always updated per
> > copy-on-write semantic.
> >
> > ZFS is always copy-on-write.
> >
> > XFS is overwrite in place, but supports shared extents (reflink
> > copies). Any overwrite to a shared extent becomes temporarily COW
> > behavior. It's the same behavior on Btrfs if the file is nodatacow and
> > reflinked (or snapshot).
> >
> > For device-mapper layered approaches (thin provisioning, vdo) I'm not
> > certain whether this information is available to higher layers or can
> > be inhibited.
>
> Given this mess, I've taken the simple option and just checked the
> filesystem magic == btrfs.
>
> With this series:
>
>   https://www.redhat.com/archives/libvir-list/2020-July/msg01377.html
>
> any application which builds storage pools in libvirt will automatically
> get "nocow" attribute set on btrfs, unless they take explicit steps to
> override this default.
>
> This will work with virt-manager and GNOME Boxes out of the box IIUC.
>
> Despite the fact that /var/lib/libvirt/images is created by RPM, this
> should still get +C  attribute set by virt-manager, and virt-install
> *provided* using this syntax:
>
>   virt-install --disk size=10
>
> It won't get +C if using the explicit path:
>
>   virt-install --disk size=10,path=/var/lib/libvirt/images/demo.img
>
> I'm thinking most people will use the simpler syntax, so not a big
> deal.
>
> IOW, this should give us reasonable behaviour on btrfs out of the
> box with +C set on directories, and thus auto-inherited by images.


Awesome. Thanks for this work. I've updated the RFE associated with
the 'btrfs by default' feature tracking bug.

-- 
Chris Murphy




Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-20 Thread Daniel P . Berrangé
On Mon, Jul 20, 2020 at 08:20:12PM +0200, Andrea Bolognani wrote:
> On Mon, 2020-07-20 at 18:21 +0100, Daniel P. Berrangé wrote:
> > On Mon, Jul 20, 2020 at 07:18:55PM +0200, Andrea Bolognani wrote:
> > > The other day I randomly realized the ssh-based transports already
> > > accept a 'netcat' URI parameter which can be used to point libvirt
> > > to a non-standard nc stand-in. With that in mind, is it really
> > > necessary to introduce another URI parameter? Can't we just reuse
> > > the existing one?
> > 
> > Any binary provided there needs to implement netcat syntax. The whole
> > point of this work is that we don't want to use the netcat syntax since
> > it is impossible to know the correct socket path.
> 
> We could special-case binaries called 'virt-nc' and use our internal
> syntax for those. Having two separate URI parameters to control the
> same knob sounds like trouble, especially since you can mix and
> match: if you try to connect to
> 
>   qemu+ssh://host/system?proxy=virt-nc=my-cool-nc
> 
> for example, what happens? As far as I can tell virt-nc will be used,
> but it's certainly not as obvious as it would be if everything was
> controlled by a single URI parameter.

No, I really don't want to do magic based on the name of the binary.
That is a recipe for long term pain. It never turns out well when we
try to overload two distinct concepts onto a single tunable.

That URL you illustrate should be reported as an error since it
is using mutually exclusive args.

> Actually, and I might be very well missing something because I looked
> at the code rather quickly, from what I can tell the default value
> for proxy will cause libvirt to always prefer virt-nc when available,
> which means that the URI
> 
>   qemu+ssh://host/system?netcat=my-cool-nc
> 
> will suddenly stop using my-cool-nc and start using virt-nc after
> libvirt has been upgraded - a breaking change.

It will only stop using my-cool-nc if you have upgraded the remote
host to have virt-nc installed, and your local host also has the
libvirt supporting virt-nc. I'd consider that desirable, as netcat
is redundant once both sides are upgraded.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-20 Thread Andrea Bolognani
On Mon, 2020-07-20 at 18:21 +0100, Daniel P. Berrangé wrote:
> On Mon, Jul 20, 2020 at 07:18:55PM +0200, Andrea Bolognani wrote:
> > The other day I randomly realized the ssh-based transports already
> > accept a 'netcat' URI parameter which can be used to point libvirt
> > to a non-standard nc stand-in. With that in mind, is it really
> > necessary to introduce another URI parameter? Can't we just reuse
> > the existing one?
> 
> Any binary provided there needs to implement netcat syntax. The whole
> point of this work is that we don't want to use the netcat syntax since
> it is impossible to know the correct socket path.

We could special-case binaries called 'virt-nc' and use our internal
syntax for those. Having two separate URI parameters to control the
same knob sounds like trouble, especially since you can mix and
match: if you try to connect to

  qemu+ssh://host/system?proxy=virt-nc=my-cool-nc

for example, what happens? As far as I can tell virt-nc will be used,
but it's certainly not as obvious as it would be if everything was
controlled by a single URI parameter.

Actually, and I might be very well missing something because I looked
at the code rather quickly, from what I can tell the default value
for proxy will cause libvirt to always prefer virt-nc when available,
which means that the URI

  qemu+ssh://host/system?netcat=my-cool-nc

will suddenly stop using my-cool-nc and start using virt-nc after
libvirt has been upgraded - a breaking change.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: Setting 'nodatacow' on VM image when on Btrfs

2020-07-20 Thread Daniel P . Berrangé
On Mon, Jul 13, 2020 at 08:06:22AM -0600, Chris Murphy wrote:
> On Mon, Jul 13, 2020 at 6:20 AM Daniel P. Berrangé  
> wrote:
> >
> > On Sat, Jul 11, 2020 at 07:28:43PM -0600, Chris Murphy wrote:
> 
> > > Also, the location for GNOME Boxes doesn't exist at install time, so
> > > the installer doing it with a post-install script isn't a complete
> > > solution.
> > >
> > > While Boxes can use 'qemu-img create -o nocow=on' there is an
> > > advantage of 'chattr +C' on the enclosing directory: files copied into
> > > it, as well as new files created, inherit the attribute. Meanwhile, it
> > > can't be set after the file has non-zero size.
> >
> > Boxes will use libvirt storage pools APIs to create the directory in
> > the first place. So if we add nocow support to the pool APIs, we dont
> > need to worry about per-file usage in most cases.
> >
> >
> > Is there a good way to determine whether a given filesystem supports
> > copy-on-write, other than to simply try to set the +C attribute on a
> > directory ?  Ideally we would check for this feature, not check
> > whether the filesystem is btrfs, as that makes it future proof to
> > other cow supporting filesystems.
> 
> It's a messy history. lsattr and chattr come from e2fsprogs. Not all
> file systems support file attributes. And not all COW file systems (or
> layers) support an option for nocow, in fact Btrfs is the only one I'm
> aware of that has it. Both as a file attribute, 'chattr +C' but also
> with a mount option 'nodatacow' - the metadata is always updated per
> copy-on-write semantic.
> 
> ZFS is always copy-on-write.
> 
> XFS is overwrite in place, but supports shared extents (reflink
> copies). Any overwrite to a shared extent becomes temporarily COW
> behavior. It's the same behavior on Btrfs if the file is nodatacow and
> reflinked (or snapshot).
> 
> For device-mapper layered approaches (thin provisioning, vdo) I'm not
> certain whether this information is available to higher layers or can
> be inhibited.

Given this mess, I've taken the simple option and just checked the
filesystem magic == btrfs.

With this series:

  https://www.redhat.com/archives/libvir-list/2020-July/msg01377.html

any application which builds storage pools in libvirt will automatically
get "nocow" attribute set on btrfs, unless they take explicit steps to
override this default.

This will work with virt-manager and GNOME Boxes out of the box IIUC.

Despite the fact that /var/lib/libvirt/images is created by RPM, this
should still get +C  attribute set by virt-manager, and virt-install
*provided* using this syntax:

  virt-install --disk size=10

It won't get +C if using the explicit path:

  virt-install --disk size=10,path=/var/lib/libvirt/images/demo.img

I'm thinking most people will use the simpler syntax, so not a big
deal.

IOW, this should give us reasonable behaviour on btrfs out of the
box with +C set on directories, and thus auto-inherited by images.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[libvirt PATCH 1/4] util: add a helper method for controlling the COW flag on btrfs

2020-07-20 Thread Daniel P . Berrangé
btrfs defaults to performing copy-on-write for files. This is often
undesirable for VM images, so we need to be able to control whether this
behaviour is used.

The virFileSetCOW() will allow for this. We use a tristate, since out of
the box, we want the default behaviour attempt to disable cow, but only
on btrfs, silently do nothing on non-btrfs. If someone explicitly asks
to disable/enable cow, then we want to raise a hard error on non-btrfs.

Signed-off-by: Daniel P. Berrangé 
---
 src/libvirt_private.syms |  1 +
 src/util/virfile.c   | 76 
 src/util/virfile.h   |  3 ++
 3 files changed, 80 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 73b72c9e10..eea31a736d 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2125,6 +2125,7 @@ virFileRewrite;
 virFileRewriteStr;
 virFileSanitizePath;
 virFileSetACLs;
+virFileSetCOW;
 virFileSetupDev;
 virFileSetXAttr;
 virFileTouch;
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 213acdbcaa..5b169b3d11 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -71,6 +71,7 @@
 # endif
 # include 
 # include 
+# include 
 #endif
 
 #if HAVE_LIBATTR
@@ -4504,3 +4505,78 @@ virFileDataSync(int fd)
 return fdatasync(fd);
 #endif
 }
+
+
+int
+virFileSetCOW(const char *path,
+  virTristateBool state)
+{
+#if __linux__
+int val = 0;
+struct statfs buf;
+VIR_AUTOCLOSE fd = -1;
+
+VIR_DEBUG("Setting COW flag on '%s' to '%s'",
+  path, virTristateBoolTypeToString(state));
+
+fd = open(path, O_RDONLY|O_NONBLOCK|O_LARGEFILE);
+if (fd < 0) {
+virReportSystemError(errno, _("unable to open '%s'"),
+ path);
+return -1;
+}
+
+if (fstatfs(fd, ) < 0)  {
+virReportSystemError(errno, _("unable query filesystem type on '%s'"),
+ path);
+return -1;
+}
+
+if (buf.f_type != BTRFS_SUPER_MAGIC) {
+if (state == VIR_TRISTATE_BOOL_ABSENT) {
+virReportSystemError(ENOSYS,
+ _("unable to control COW flag on '%s', not 
btrfs"),
+ path);
+return -1;
+}
+return 0;
+}
+
+if (ioctl(fd, FS_IOC_GETFLAGS, ) < 0) {
+virReportSystemError(errno, _("unable get directory flags on '%s'"),
+ path);
+return -1;
+}
+
+VIR_DEBUG("Current flags on '%s' are 0x%x", path, val);
+if (state == VIR_TRISTATE_BOOL_YES) {
+val &= ~FS_NOCOW_FL;
+} else {
+val |= FS_NOCOW_FL;
+}
+
+VIR_DEBUG("New flags on '%s' will be 0x%x", path, val);
+if (ioctl(fd, FS_IOC_SETFLAGS, ) < 0) {
+int saved_err = errno;
+VIR_DEBUG("Failed to set flags on '%s': %s", path, 
g_strerror(saved_err));
+if (state != VIR_TRISTATE_BOOL_ABSENT) {
+virReportSystemError(saved_err,
+ _("unable control COW flag on '%s'"),
+ path);
+return -1;
+} else {
+VIR_DEBUG("Ignoring failure to set COW");
+}
+}
+
+return 0;
+#else /* ! __linux__ */
+if (state != VIR_TRISTATE_BOOL_ABSENT) {
+virReportSystemError(ENOSYS,
+ _("Unable to set copy-on-write state on '%s' to 
'%s'"),
+ path, virTristateBoolTypeToString(state));
+return -1;
+}
+return 0;
+#endif /* ! __linux__ */
+}
diff --git a/src/util/virfile.h b/src/util/virfile.h
index 7a92364a5c..cb0e586a7d 100644
--- a/src/util/virfile.h
+++ b/src/util/virfile.h
@@ -374,3 +374,6 @@ int virFileRemoveXAttr(const char *path,
 G_GNUC_NO_INLINE;
 
 int virFileDataSync(int fd);
+
+int virFileSetCOW(const char *path,
+  virTristateBool state);
-- 
2.26.2



[libvirt PATCH 4/4] conf: add control over COW for storage pool directories

2020-07-20 Thread Daniel P . Berrangé
The storage pool code now attempts to disable COW by default on btrfs,
but management applications may wish to override this behaviour. Thus we
introduce a concept of storage pool features:

  

  

If the  feature policy is set, it will be enforced. It will always
return an hard error if COW cannot be explicitly set or unset.

Signed-off-by: Daniel P. Berrangé 
---
 docs/formatstorage.html.in   | 25 ++
 docs/schemas/storagepool.rng | 30 
 src/conf/storage_conf.c  | 49 
 src/conf/storage_conf.h  |  8 
 src/storage/storage_util.c   |  2 +-
 tests/storagepoolxml2xmlin/pool-dir-cow.xml  | 10 
 tests/storagepoolxml2xmlout/pool-dir-cow.xml | 15 ++
 tests/storagepoolxml2xmltest.c   |  1 +
 8 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100644 tests/storagepoolxml2xmlin/pool-dir-cow.xml
 create mode 100644 tests/storagepoolxml2xmlout/pool-dir-cow.xml

diff --git a/docs/formatstorage.html.in b/docs/formatstorage.html.in
index 2a7604d136..7493714c5c 100644
--- a/docs/formatstorage.html.in
+++ b/docs/formatstorage.html.in
@@ -67,6 +67,31 @@
 pool. Since 0.4.1
 
 
+Features
+
+
+  Some pools support optional features:
+
+
+
+...
+features
+  cow state='no'
+/features
+...
+
+
+  Valid features are:
+
+
+  cow
+  Controls whether the filesystem performs copy-on-write (COW) for
+images in the pool. This may only be set for directory / filesystem
+pools on the btrfs filesystem. If not set then libvirt
+will attempt to disable COW on any btrfs filesystems.
+Since 6.6.0.
+
+
 Source elements
 
 
diff --git a/docs/schemas/storagepool.rng b/docs/schemas/storagepool.rng
index ff0d3c836c..f5cf6769c8 100644
--- a/docs/schemas/storagepool.rng
+++ b/docs/schemas/storagepool.rng
@@ -37,6 +37,7 @@
 
   
   
+  
   
   
 
@@ -49,6 +50,7 @@
 
   
   
+  
   
   
 
@@ -64,6 +66,7 @@
 
   
   
+  
   
   
 
@@ -79,6 +82,7 @@
 
   
   
+  
   
   
 
@@ -91,6 +95,7 @@
 
   
   
+  
   
   
 
@@ -103,6 +108,7 @@
 
   
   
+  
   
   
 
@@ -117,6 +123,7 @@
   
 
   
+  
   
 
   
@@ -128,6 +135,7 @@
 
   
   
+  
   
   
 
@@ -140,6 +148,7 @@
 
   
   
+  
   
 
   
@@ -154,6 +163,7 @@
 
   
   
+  
   
   
 
@@ -169,6 +179,7 @@
 
   
   
+  
   
 
   
@@ -180,6 +191,7 @@
 
   
   
+  
   
 
   
@@ -191,6 +203,7 @@
 
   
   
+  
   
   
 
@@ -205,6 +218,7 @@
 
   
   
+  
   
   
 
@@ -277,6 +291,22 @@
 
   
 
+  
+
+  
+
+  
+
+  
+
+
+
+  
+
+  
+
+  
+
   
 
   
diff --git a/src/conf/storage_conf.c b/src/conf/storage_conf.c
index 65d9b33049..4e63865b39 100644
--- a/src/conf/storage_conf.c
+++ b/src/conf/storage_conf.c
@@ -839,6 +839,33 @@ virStoragePoolDefRefreshFormat(virBufferPtr buf,
 }
 
 
+static int
+virStoragePoolDefParseFeatures(virStoragePoolDefPtr def,
+   xmlXPathContextPtr ctxt)
+{
+g_autofree char *cow = virXPathString("string(./features/cow/@state)", 
ctxt);
+
+if (cow) {
+int val;
+if (def->type != VIR_STORAGE_POOL_FS &&
+def->type != VIR_STORAGE_POOL_DIR) {
+virReportError(VIR_ERR_NO_SUPPORT, "%s",
+   _("cow feature may only be used for 'fs' and 'dir' 
pools"));
+return -1;
+}
+if ((val = virTristateBoolTypeFromString(cow)) < 0) {
+virReportError(VIR_ERR_XML_ERROR,
+   _("invalid storage pool cow feature state '%s'"),
+   cow);
+return -1;
+}
+def->features.cow = val;
+}
+
+return 0;
+}
+
+
 virStoragePoolDefPtr
 virStoragePoolDefParseXML(xmlXPathContextPtr ctxt)
 {
@@ -910,6 +937,9 @@ virStoragePoolDefParseXML(xmlXPathContextPtr ctxt)
 }
 }
 
+if (virStoragePoolDefParseFeatures(def, ctxt) < 0)
+return NULL;
+
 if (options->flags & VIR_STORAGE_POOL_SOURCE_HOST) {
 if (!def->source.nhost) {
 virReportError(VIR_ERR_XML_ERROR, "%s",
@@ -1131,6 +1161,23 @@ virStoragePoolSourceFormat(virBufferPtr buf,
 }
 
 
+static void
+virStoragePoolDefFormatFeatures(virBufferPtr buf,
+virStoragePoolDefPtr def)
+{
+if (!def->features.cow)
+return;
+
+virBufferAddLit(buf, "\n");
+virBufferAdjustIndent(buf, 2);
+if (def->features.cow)
+ 

[libvirt PATCH 3/4] storage: attempt to disable COW by default

2020-07-20 Thread Daniel P . Berrangé
This calls virFileSetCOW when building a pool with a request to attempt,
but not require, COW to be disabled. The effect is that nothing changes
on non-btrfs filesystems, but btrfs will get COW disabled on the
directory. This setting is then inherited by all newly created files in
the pool, avoiding the need for mgmt app to set "nocow" on a per-volume
basis.

Signed-off-by: Daniel P. Berrangé 
---
 src/storage/storage_util.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/storage/storage_util.c b/src/storage/storage_util.c
index 2595162a61..80b49bd1cf 100644
--- a/src/storage/storage_util.c
+++ b/src/storage/storage_util.c
@@ -2712,6 +2712,7 @@ virStorageBackendBuildLocal(virStoragePoolObjPtr pool)
 bool needs_create_as_uid;
 unsigned int dir_create_flags;
 g_autofree char *parent = NULL;
+int ret;
 
 parent = g_strdup(def->target.path);
 if (!(p = strrchr(parent, '/'))) {
@@ -2745,11 +2746,19 @@ virStorageBackendBuildLocal(virStoragePoolObjPtr pool)
 /* Now create the final dir in the path with the uid/gid/mode
  * requested in the config. If the dir already exists, just set
  * the perms. */
-return virDirCreate(def->target.path,
-mode,
-def->target.perms.uid,
-def->target.perms.gid,
-dir_create_flags);
+ret = virDirCreate(def->target.path,
+   mode,
+   def->target.perms.uid,
+   def->target.perms.gid,
+   dir_create_flags);
+if (ret < 0)
+return -1;
+
+if (virFileSetCOW(def->target.path,
+  VIR_TRISTATE_BOOL_ABSENT) < 0)
+return -1;
+
+return 0;
 }
 
 
-- 
2.26.2



[libvirt PATCH 2/4] storage: convert to use virFileSetCOW

2020-07-20 Thread Daniel P . Berrangé
When disabling COW on individual files, we now use the virFileSetCOW
method. Note that this change has a slight semantic difference to the
old implementation.

The original code reported errors but returned success when disabling
COW failed.

With this new code, we will always report an error if the user requested
disabling of COW and we could not honour it, either because btrfs
returned an error, or because the filesystem is not btrfs.

Signed-off-by: Daniel P. Berrangé 
---
 src/storage/storage_util.c | 27 +--
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/src/storage/storage_util.c b/src/storage/storage_util.c
index ee048f02fe..2595162a61 100644
--- a/src/storage/storage_util.c
+++ b/src/storage/storage_util.c
@@ -28,9 +28,6 @@
 #ifdef __linux__
 # include 
 # include 
-# ifndef FS_NOCOW_FL
-#  define FS_NOCOW_FL 0x0080 /* Do not cow file */
-# endif
 # define default_mount_opts "nodev,nosuid,noexec"
 #elif defined(__FreeBSD__)
 # define default_mount_opts "nosuid,noexec"
@@ -456,25 +453,11 @@ storageBackendCreateRaw(virStoragePoolObjPtr pool,
 }
 created = true;
 
-if (vol->target.nocow) {
-#ifdef __linux__
-int attr;
-
-/* Set NOCOW flag. This is an optimisation for btrfs.
- * The FS_IOC_SETFLAGS ioctl return value will be ignored since any
- * failure of this operation should not block the volume creation.
- */
-if (ioctl(fd, FS_IOC_GETFLAGS, ) < 0) {
-virReportSystemError(errno, "%s", _("Failed to get fs flags"));
-} else {
-attr |= FS_NOCOW_FL;
-if (ioctl(fd, FS_IOC_SETFLAGS, ) < 0) {
-virReportSystemError(errno, "%s",
- _("Failed to set NOCOW flag"));
-}
-}
-#endif
-}
+/* NB, COW flag can only be toggled when the file is zero-size,
+ * so must go before the createRawFile call allocates payload */
+if (vol->target.nocow &&
+virFileSetCOW(vol->target.path, VIR_TRISTATE_BOOL_NO) < 0)
+goto cleanup;
 
 if ((ret = createRawFile(fd, vol, inputvol, reflink_copy)) < 0)
 /* createRawFile already reported the exact error. */
-- 
2.26.2



[libvirt PATCH 0/4] storage: support controlling COW attribute for pool

2020-07-20 Thread Daniel P . Berrangé
We already support a "nocow" flag for storage volumes, but this requires
extra work by the mgmt app or user when creating images on btrfs. We
want to "do the right thing" out of the box for btrfs.

We achieve this by changint the storage pool code so that when creating
a storage pool we always try to disable COW on btrfs filesystems. We
then add an  feature in the pool XML to let apps
override the default logic.

The COW setting on the pool is inherited by any volumes.

The main thing not solved here is that the default directory at
/var/lib/libvirt/images is created by the RPM itself, not by a
normal "pool-build" command.

Fortunately it appears that virt-install  will explicitly invoke a
storage pool build even if the directory already exists.

Daniel P. Berrangé (4):
  util: add a helper method for controlling the COW flag on btrfs
  storage: convert to use virFileSetCOW
  storage: attempt to disable COW by default
  conf: add control over COW for storage pool directories

 docs/formatstorage.html.in   | 25 +++
 docs/schemas/storagepool.rng | 30 
 src/conf/storage_conf.c  | 49 +
 src/conf/storage_conf.h  |  8 +++
 src/libvirt_private.syms |  1 +
 src/storage/storage_util.c   | 46 +---
 src/util/virfile.c   | 76 
 src/util/virfile.h   |  3 +
 tests/storagepoolxml2xmlin/pool-dir-cow.xml  | 10 +++
 tests/storagepoolxml2xmlout/pool-dir-cow.xml | 15 
 tests/storagepoolxml2xmltest.c   |  1 +
 11 files changed, 237 insertions(+), 27 deletions(-)
 create mode 100644 tests/storagepoolxml2xmlin/pool-dir-cow.xml
 create mode 100644 tests/storagepoolxml2xmlout/pool-dir-cow.xml

-- 
2.26.2




Re: [libvirt PATCH] docs: virConnectGetCapabilities do not provide pool types

2020-07-20 Thread John Ferlan



On 7/20/20 3:50 AM, Pino Toscano wrote:
> Remove the paragraph in the storage pool page that mentions
> virConnectGetCapabilities, as virConnectGetCapabilities does not return
> any information about pools.
> 
> Signed-off-by: Pino Toscano 
> ---
>  docs/formatstoragecaps.html.in | 6 --
>  1 file changed, 6 deletions(-)
> 

I was asked by Pino to review to be "ok" with removal, so yeah:

Reviewed-by: John Ferlan 

John

If anyone cares to dredge up history back in Jan/Feb 2019 I believe it
was a true statement in at least a patches I had locally, but reviews,
conversations, and other adjustments to the code changed that. Been too
long to recall all those details.

> diff --git a/docs/formatstoragecaps.html.in b/docs/formatstoragecaps.html.in
> index ee3888f44d..d8a1cacd96 100644
> --- a/docs/formatstoragecaps.html.in
> +++ b/docs/formatstoragecaps.html.in
> @@ -13,12 +13,6 @@
>  supported, and if relevant the source format types, the required
>  source elements, and the target volume format types. 
>  
> -The Storage Pool Capabilities XML provides more information than the
> -  
> -virConnectGetCapabilities
> -  
> -which only provides an enumerated list of supported pool types.
> -
>  Element and attribute overview
>  
>  A query interface was added to the virConnect API's to retrieve the
> 



Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-20 Thread Daniel P . Berrangé
On Mon, Jul 20, 2020 at 07:18:55PM +0200, Andrea Bolognani wrote:
> On Wed, 2020-07-15 at 16:11 +0200, Andrea Bolognani wrote:
> > On Wed, 2020-07-15 at 14:25 +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote:
> > > > Mh, that makes sense but I'm still wary of using "proxy" due to the
> > > > potential for confusion, since in this case the proxy is on the
> > > > opposite side of the connection than one would probably expect it
> > > > to be. Something like "remoteproxy" or "serverproxy", perhaps?
> > > 
> > > I don't think there's really any problem of confusion here unless
> > > someone doesn't read the docs at all, in which case they won't
> > > even know about this parameter. So I don't think using a more
> > > verbose term is any benefit.
> > 
> > Okay.
> 
> The other day I randomly realized the ssh-based transports already
> accept a 'netcat' URI parameter which can be used to point libvirt
> to a non-standard nc stand-in. With that in mind, is it really
> necessary to introduce another URI parameter? Can't we just reuse
> the existing one?

Any binary provided there needs to implement netcat syntax. The whole
point of this work is that we don't want to use the netcat syntax since
it is impossible to know the correct socket path.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-20 Thread Andrea Bolognani
On Wed, 2020-07-15 at 16:11 +0200, Andrea Bolognani wrote:
> On Wed, 2020-07-15 at 14:25 +0100, Daniel P. Berrangé wrote:
> > On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote:
> > > Mh, that makes sense but I'm still wary of using "proxy" due to the
> > > potential for confusion, since in this case the proxy is on the
> > > opposite side of the connection than one would probably expect it
> > > to be. Something like "remoteproxy" or "serverproxy", perhaps?
> > 
> > I don't think there's really any problem of confusion here unless
> > someone doesn't read the docs at all, in which case they won't
> > even know about this parameter. So I don't think using a more
> > verbose term is any benefit.
> 
> Okay.

The other day I randomly realized the ssh-based transports already
accept a 'netcat' URI parameter which can be used to point libvirt
to a non-standard nc stand-in. With that in mind, is it really
necessary to introduce another URI parameter? Can't we just reuse
the existing one?

-- 
Andrea Bolognani / Red Hat / Virtualization



[PATCH 1/1] formatdomain.html.in: mention pSeries NVDIMM 'align down' mechanic

2020-07-20 Thread Daniel Henrique Barboza
The reason why we align down the guest area (total-size - label-size) is
explained in the body of qemuDomainNVDimmAlignSizePseries(). This
behavior must also be documented in the user docs.

Signed-off-by: Daniel Henrique Barboza 
---
 docs/formatdomain.html.in | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index f5ee97de81..af6c809ddd 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -9412,8 +9412,10 @@ qemu-kvm -net nic,model=? /dev/null
 
 
   the minimum label size is 128KiB,
-  the remaining size (total-size - label-size) will be aligned
-to 4KiB as default.
+  the remaining size (total-size - label-size), also called 
guest
+area, will be aligned to 4KiB as default. For pSeries guests, 
the
+guest area will be aligned down to 256MiB, and the minimum size
+of the guest area must be at least 256MiB plus the 
label-size.
 
   
 
-- 
2.26.2



[libvirt PATCH 3/3] Add a check attribute on the mac address element

2020-07-20 Thread Daniel P . Berrangé
From: Bastien Orivel 

This is only used in the ESX driver where, when set to "no", it will
ignore all the checks libvirt does about the origin of the MAC address
(whether or not it's in a VMWare OUI) and forward the original one to
the ESX server telling it not to check it either.

This allows keeping a deterministic MAC address which can be useful for
licensed software which might dislike changes.

Signed-off-by: Bastien Orivel 

VMX conversion parts rewritten to apply on top of previously merged
support for type='generated|static'

Signed-off-by: Daniel P. Berrangé 
---
 docs/schemas/domaincommon.rng |  5 
 src/conf/domain_conf.c| 14 +
 src/conf/domain_conf.h|  1 +
 src/vmx/vmx.c | 11 +++
 .../network-interface-mac-check.xml   | 29 +++
 tests/genericxml2xmltest.c|  2 ++
 tests/vmx2xmldata/vmx2xml-ethernet-other.xml  |  2 +-
 .../xml2vmxdata/xml2vmx-ethernet-mac-type.vmx |  2 ++
 .../xml2vmxdata/xml2vmx-ethernet-mac-type.xml |  4 +--
 9 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 tests/genericxml2xmlindata/network-interface-mac-check.xml

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index a810f569c6..8cbbd7e6e9 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -3187,6 +3187,11 @@
   
 
   
+  
+
+  
+
+  
   
 
   
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 7ecd2818b9..6c05244c6e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -11928,6 +11928,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
 int rv, val;
 g_autofree char *macaddr = NULL;
 g_autofree char *macaddr_type = NULL;
+g_autofree char *macaddr_check = NULL;
 g_autofree char *type = NULL;
 g_autofree char *network = NULL;
 g_autofree char *portgroup = NULL;
@@ -12009,6 +12010,7 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
 if (!macaddr && virXMLNodeNameEqual(cur, "mac")) {
 macaddr = virXMLPropString(cur, "address");
 macaddr_type = virXMLPropString(cur, "type");
+macaddr_check = virXMLPropString(cur, "check");
 } else if (!network &&
def->type == VIR_DOMAIN_NET_TYPE_NETWORK &&
virXMLNodeNameEqual(cur, "source")) {
@@ -12209,6 +12211,16 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
 }
 def->mac_type = tmp;
 }
+if (macaddr_check) {
+int tmpCheck;
+if ((tmpCheck = virTristateBoolTypeFromString(macaddr_check)) < 0) {
+virReportError(VIR_ERR_XML_ERROR,
+   _("invalid mac address check value: '%s'"),
+   macaddr_check);
+goto error;
+}
+def->mac_check = tmpCheck;
+}
 
 if (virDomainDeviceInfoParseXML(xmlopt, node, >info,
 flags | VIR_DOMAIN_DEF_PARSE_ALLOW_BOOT
@@ -26561,6 +26573,8 @@ virDomainNetDefFormat(virBufferPtr buf,
   virMacAddrFormat(>mac, macstr));
 if (def->mac_type)
 virBufferAsprintf(buf, " type='%s'", 
virDomainNetMacTypeTypeToString(def->mac_type));
+if (def->mac_check != VIR_TRISTATE_BOOL_ABSENT)
+virBufferAsprintf(buf, " check='%s'", 
virTristateBoolTypeToString(def->mac_check));
 virBufferAddLit(buf, "/>\n");
 
 if (publicActual) {
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 241149af24..6e9da298b4 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -982,6 +982,7 @@ struct _virDomainNetDef {
 virMacAddr mac;
 bool mac_generated; /* true if mac was *just now* auto-generated by 
libvirt */
 virDomainNetMacType mac_type;
+virTristateBool mac_check;
 int model; /* virDomainNetModelType */
 char *modelstr;
 union {
diff --git a/src/vmx/vmx.c b/src/vmx/vmx.c
index 72f6a7d8dd..a123a8807c 100644
--- a/src/vmx/vmx.c
+++ b/src/vmx/vmx.c
@@ -2638,6 +2638,14 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
 goto cleanup;
 }
 
+if (checkMACAddress) {
+if (STREQ(checkMACAddress, "true")) {
+(*def)->mac_check = VIR_TRISTATE_BOOL_YES;
+} else {
+(*def)->mac_check = VIR_TRISTATE_BOOL_NO;
+}
+}
+
 /* vmx:virtualDev, vmx:features -> def:model */
 if (virVMXGetConfigString(conf, virtualDev_name, , true) < 0 ||
 virVMXGetConfigLong(conf, features_name, , 0, true) < 0) {
@@ -3865,6 +3873,9 @@ virVMXFormatEthernet(virDomainNetDefPtr def, int 
controller,
 mac_check = VIR_TRISTATE_BOOL_ABSENT;
 }
 
+if (def->mac_check != VIR_TRISTATE_BOOL_ABSENT)
+mac_check = 

[libvirt PATCH 2/3] vmx: support outputing the type attribute for MAC addresses

2020-07-20 Thread Daniel P . Berrangé
When support for MAC addresses having a type='static|generated'
attribute was added in:

  commit 454e5961abf40c14f8b6d7ee216229e68fd170bf
  Author: Bastien Orivel 
  Date:   Mon Jul 13 16:28:53 2020 +0200

Add a type attribute on the mac address element

the VMX -> XML parser was not updated. As a result while we
accept the 'type' attribute on input, we never show it again
on 'output', so we loose information during the roundtrip.

Signed-off-by: Daniel P. Berrangé 
---
 src/vmx/vmx.c | 11 +-
 .../vmx2xml-case-insensitive-1.xml|  2 +-
 .../vmx2xml-case-insensitive-2.xml|  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-1.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-2.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-3.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-4.xml |  4 ++--
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-5.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-6.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-7.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-8.xml | 20 +--
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-9.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ethernet-bridged.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-custom.xml |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-e1000.xml  |  2 +-
 .../vmx2xml-ethernet-generated.xml|  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-nat.xml|  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-other.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-static.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ethernet-vmxnet2.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-vpx.xml|  2 +-
 .../vmx2xml-fusion-in-the-wild-1.xml  |  4 ++--
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-1.xml |  2 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-2.xml |  2 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-3.xml |  4 ++--
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-4.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ws-in-the-wild-1.xml  |  2 +-
 .../vmx2xmldata/vmx2xml-ws-in-the-wild-2.xml  |  2 +-
 28 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/src/vmx/vmx.c b/src/vmx/vmx.c
index f0a45089cc..72f6a7d8dd 100644
--- a/src/vmx/vmx.c
+++ b/src/vmx/vmx.c
@@ -2535,6 +2535,9 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
 char generatedAddress_name[48] = "";
 char *generatedAddress = NULL;
 
+char checkMACAddress_name[48] = "";
+char *checkMACAddress = NULL;
+
 char address_name[48] = "";
 char *address = NULL;
 
@@ -2564,6 +2567,7 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
 VMX_BUILD_NAME(connectionType);
 VMX_BUILD_NAME(addressType);
 VMX_BUILD_NAME(generatedAddress);
+VMX_BUILD_NAME(checkMACAddress);
 VMX_BUILD_NAME(address);
 VMX_BUILD_NAME(virtualDev);
 VMX_BUILD_NAME(features);
@@ -2598,7 +2602,9 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
   true) < 0 ||
 virVMXGetConfigString(conf, generatedAddress_name, ,
   true) < 0 ||
-virVMXGetConfigString(conf, address_name, , true) < 0) {
+virVMXGetConfigString(conf, address_name, , true) < 0 ||
+virVMXGetConfigString(conf, checkMACAddress_name, ,
+  true) < 0) {
 goto cleanup;
 }
 
@@ -2613,6 +2619,8 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
 goto cleanup;
 }
 }
+if (addressType != NULL)
+(*def)->mac_type = VIR_DOMAIN_NET_MAC_TYPE_GENERATED;
 } else if (STRCASEEQ(addressType, "static")) {
 if (address != NULL) {
 if (virMacAddrParse(address, &(*def)->mac) < 0) {
@@ -2622,6 +2630,7 @@ virVMXParseEthernet(virConfPtr conf, int controller, 
virDomainNetDefPtr *def)
 goto cleanup;
 }
 }
+(*def)->mac_type = VIR_DOMAIN_NET_MAC_TYPE_STATIC;
 } else {
 virReportError(VIR_ERR_INTERNAL_ERROR,
_("Expecting VMX entry '%s' to be 'generated' or 
'static' or "
diff --git a/tests/vmx2xmldata/vmx2xml-case-insensitive-1.xml 
b/tests/vmx2xmldata/vmx2xml-case-insensitive-1.xml
index fd38cfd67f..7cb6413941 100644
--- a/tests/vmx2xmldata/vmx2xml-case-insensitive-1.xml
+++ b/tests/vmx2xmldata/vmx2xml-case-insensitive-1.xml
@@ -22,7 +22,7 @@
 
 
 
-  
+  
   
 
 
diff --git a/tests/vmx2xmldata/vmx2xml-case-insensitive-2.xml 
b/tests/vmx2xmldata/vmx2xml-case-insensitive-2.xml
index eb81691456..188c3f3cd5 100644
--- a/tests/vmx2xmldata/vmx2xml-case-insensitive-2.xml
+++ b/tests/vmx2xmldata/vmx2xml-case-insensitive-2.xml
@@ -22,7 +22,7 @@
 
 
 
-  
+  
   
 
 
diff --git a/tests/vmx2xmldata/vmx2xml-esx-in-the-wild-1.xml 
b/tests/vmx2xmldata/vmx2xml-esx-in-the-wild-1.xml
index 906cfe0648..c15275ccb9 100644
--- 

[libvirt PATCH 0/3] vmx: fix handling of VMX MAC address options

2020-07-20 Thread Daniel P . Berrangé
We first had a proposal to add a "check" attribute for MAC addresses

  https://www.redhat.com/archives/libvir-list/2020-July/msg00617.html

For reasons I don't understand this was then replaced by a "type"
attribute

  https://www.redhat.com/archives/libvir-list/2020-July/msg00656.html

with the "type" attribute having the side-effect of changing the
VMX checkMACAddress config. See the first patch for more detailed
description of the flaws.

The core problem with the original VMX code before either of these
patches was that we have multiple distinct VMX config settings and
they were all being overloaded into a single MAC address attribute
in the XML. This overloading is inherantly loosing information so
cannot be reliably round-trippped.

The only way to solve this is to actually have explicit attributes
for the config parameters from VMX and stop overloading things.

IOW, we needed *both* the "check" and "type" attributes.

That is what this series does. It also adds the missing VMX -> XML
conversion step so we round-trip everything.

There are still some pieces that are not perfect.

 - libvirt has type=static|generated, but VMX has
   type=static|generated|vpx.  "vpx" is just a synonym for
   "generated", but using a different MAC addr range. We
   use the range to do the mapping, and can probablylive
   with that.

 - VMX has a a address offset field - I've not found any
   info on what this is needed for, but we hardcode it
   to 0 for XML -> VMX config, and ignore it entirely
   for VMX -> XML config. This means we won't roundtrip
   this setting.  This needs fixing if anyone expects
   we'll see  offset != 0 in the real world.

Bastien Orivel (1):
  Add a check attribute on the mac address element

Daniel P. Berrangé (2):
  vmx: fix logic handling mac address type
  vmx: support outputing the type attribute for MAC addresses

 docs/schemas/domaincommon.rng |  5 ++
 src/conf/domain_conf.c| 14 
 src/conf/domain_conf.h|  1 +
 src/vmx/vmx.c | 77 ++-
 .../network-interface-mac-check.xml   | 29 +++
 tests/genericxml2xmltest.c|  2 +
 .../vmx2xml-case-insensitive-1.xml|  2 +-
 .../vmx2xml-case-insensitive-2.xml|  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-1.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-2.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-3.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-4.xml |  4 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-5.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-6.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-7.xml |  2 +-
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-8.xml | 20 ++---
 .../vmx2xmldata/vmx2xml-esx-in-the-wild-9.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ethernet-bridged.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-custom.xml |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-e1000.xml  |  2 +-
 .../vmx2xml-ethernet-generated.xml|  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-nat.xml|  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-other.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-static.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ethernet-vmxnet2.xml  |  2 +-
 tests/vmx2xmldata/vmx2xml-ethernet-vpx.xml|  2 +-
 .../vmx2xml-fusion-in-the-wild-1.xml  |  4 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-1.xml |  2 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-2.xml |  2 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-3.xml |  4 +-
 .../vmx2xmldata/vmx2xml-gsx-in-the-wild-4.xml |  2 +-
 .../vmx2xmldata/vmx2xml-ws-in-the-wild-1.xml  |  2 +-
 .../vmx2xmldata/vmx2xml-ws-in-the-wild-2.xml  |  2 +-
 .../xml2vmxdata/xml2vmx-ethernet-mac-type.vmx |  7 +-
 .../xml2vmxdata/xml2vmx-ethernet-mac-type.xml |  4 +-
 35 files changed, 154 insertions(+), 63 deletions(-)
 create mode 100644 tests/genericxml2xmlindata/network-interface-mac-check.xml

-- 
2.24.1



[libvirt PATCH 1/3] vmx: fix logic handling mac address type

2020-07-20 Thread Daniel P . Berrangé
With the current formatter, the XML snippets:


  
  


  
  


result in

 ethernet1.present = "true"
 ethernet1.networkName = "br1"
 ethernet1.connectionType = "bridged"
 ethernet1.addressType = "static"
 ethernet1.address = "00:0c:29:dd:ee:fe"
 ethernet1.checkMACAddress = "false"

 ethernet2.present = "true"
 ethernet2.networkName = "br2"
 ethernet2.connectionType = "bridged"
 ethernet2.addressType = "static"
 ethernet2.address = "aa:bb:cc:dd:ee:fd"
 ethernet2.checkMACAddress = "false"

which is flawed, as both type='static' and type='generated' in the XML
turn into 'static' in the VMX config.

The existence of the 'static' attribute is further overriding whether
the checkMACAddress config option is set as a side effect.

Both these pieces of flawed logic were introduced in

  commit 454e5961abf40c14f8b6d7ee216229e68fd170bf
  Author: Bastien Orivel 
  Date:   Mon Jul 13 16:28:53 2020 +0200

Add a type attribute on the mac address element

which intentionally added the 'checkMACAddress' side effect based on
the 'type' attribute.

With this change, we're reverting the handling of checkMACAddress
to match what existed historically. The 'type' attribute now directly
maps to the addressType attribute, so the above config becomes:

 ethernet1.present = "true"
 ethernet1.networkName = "br1"
 ethernet1.connectionType = "bridged"
 ethernet1.addressType = "static"
 ethernet1.address = "00:0c:29:dd:ee:fe"

 ethernet2.present = "true"
 ethernet2.networkName = "br2"
 ethernet2.connectionType = "bridged"
 ethernet2.addressType = "generated"
 ethernet2.generatedAddress = "aa:bb:cc:dd:ee:fd"
 ethernet2.generatedAddressOffset = "0"

Signed-off-by: Daniel P. Berrangé 
---
 src/vmx/vmx.c | 55 +--
 .../xml2vmxdata/xml2vmx-ethernet-mac-type.vmx |  7 +--
 2 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/src/vmx/vmx.c b/src/vmx/vmx.c
index 97ec84446a..f0a45089cc 100644
--- a/src/vmx/vmx.c
+++ b/src/vmx/vmx.c
@@ -3732,7 +3732,9 @@ virVMXFormatEthernet(virDomainNetDefPtr def, int 
controller,
  virBufferPtr buffer, int virtualHW_version)
 {
 char mac_string[VIR_MAC_STRING_BUFLEN];
-const bool staticMac = def->mac_type == VIR_DOMAIN_NET_MAC_TYPE_STATIC;
+virDomainNetMacType mac_type = VIR_DOMAIN_NET_MAC_TYPE_DEFAULT;
+virTristateBool mac_check = VIR_TRISTATE_BOOL_ABSENT;
+bool mac_vpx = false;
 unsigned int prefix, suffix;
 
 /*
@@ -3830,31 +3832,48 @@ virVMXFormatEthernet(virDomainNetDefPtr def, int 
controller,
 prefix = (def->mac.addr[0] << 16) | (def->mac.addr[1] << 8) | 
def->mac.addr[2];
 suffix = (def->mac.addr[3] << 16) | (def->mac.addr[4] << 8) | 
def->mac.addr[5];
 
-if (prefix == 0x000c29 && !staticMac) {
-virBufferAsprintf(buffer, "ethernet%d.addressType = \"generated\"\n",
-  controller);
-virBufferAsprintf(buffer, "ethernet%d.generatedAddress = \"%s\"\n",
-  controller, mac_string);
-virBufferAsprintf(buffer, "ethernet%d.generatedAddressOffset = 
\"0\"\n",
-  controller);
-} else if (prefix == 0x005056 && suffix <= 0x3f && !staticMac) {
-virBufferAsprintf(buffer, "ethernet%d.addressType = \"static\"\n",
-  controller);
-virBufferAsprintf(buffer, "ethernet%d.address = \"%s\"\n",
-  controller, mac_string);
-} else if (prefix == 0x005056 && suffix >= 0x80 && suffix <= 0xbf 
&& !staticMac) {
-virBufferAsprintf(buffer, "ethernet%d.addressType = \"vpx\"\n",
-  controller);
+/*
+ * Historically we've not stored all the MAC related properties
+ * explicitly in the XML, so we must figure out some defaults
+ * based on the address ranges.
+ */
+if (prefix == 0x000c29) {
+mac_type = VIR_DOMAIN_NET_MAC_TYPE_GENERATED;
+} else if (prefix == 0x005056 && suffix <= 0x3f) {
+mac_type = VIR_DOMAIN_NET_MAC_TYPE_STATIC;
+} else if (prefix == 0x005056 && suffix >= 0x80 && suffix <= 0xbf) 
{
+mac_type = VIR_DOMAIN_NET_MAC_TYPE_GENERATED;
+mac_vpx = true;
+} else {
+mac_type = VIR_DOMAIN_NET_MAC_TYPE_STATIC;
+mac_check = VIR_TRISTATE_BOOL_NO;
+}
+
+/* If explicit MAC type is set, ignore the above defaults */
+if (def->mac_type != VIR_DOMAIN_NET_MAC_TYPE_DEFAULT) {
+mac_type = def->mac_type;
+if (mac_type == VIR_DOMAIN_NET_MAC_TYPE_GENERATED)
+mac_check = VIR_TRISTATE_BOOL_ABSENT;
+}
+
+if (mac_type == VIR_DOMAIN_NET_MAC_TYPE_GENERATED) {
+virBufferAsprintf(buffer, "ethernet%d.addressType = \"%s\"\n",
+  controller, mac_vpx ? "vpx" : "generated");
 virBufferAsprintf(buffer, "ethernet%d.generatedAddress = \"%s\"\n",
   controller, mac_string);
+if (!mac_vpx)
+  

Re: [libvirt PATCH] network: split out networkSetIPv6Sysctl

2020-07-20 Thread Andrea Bolognani
On Wed, 2020-07-15 at 23:27 +0200, Ján Tomko wrote:
> Refactor networkSetIPv6Sysctls to remove repetition and reuse
> of the 'field' variable.
> 
> Signed-off-by: Ján Tomko 
> ---
>  src/network/bridge_driver.c | 71 ++---
>  1 file changed, 35 insertions(+), 36 deletions(-)

Reviewed-by: Andrea Bolognani 

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: The issue about adding multipath device's targets into qemu-pr-helper's namespace

2020-07-20 Thread Lin Ma

On 2020-07-17 07:05, Michal Prívozník wrote:

On 7/15/20 5:35 AM, Lin Ma wrote:

On 2020-07-14 14:38, Michal Privoznik wrote:

On 7/14/20 3:41 PM, Lin Ma wrote:

Hi all,

I have a namespace question about passthrough disk(multipath 
device).
In case of enabling namespace and cgroups in qemu.conf, The 
target(s) of the
multipath device won't be added into qemu-pr-helper's namespace 
under certain
situation, It causes the PERSISTENT RESERVE command failure in 
guest.




Yeah, this is expected. I mean, the failure is expected if not all
devices are added to the namespace.


While user starts a vm,
To build namespace, The qemuDomainSetupDisk() will be invoked via 
threadA(this

thread id will be the qemu's pid),
To build cgroup, The qemuSetupImagePathCgroup() will be invoked via 
threadB.


Both of the functions invoke the virDevMapperGetTargets() trying to 
parse a
multipath device to target paths string, Then fill the 
targetPaths[].


The issue I experienced is:
After libvirtd started, Everything works well for the first booted 
vm which has

the passthrough multipath device.
But If I shut it down & start it again, OR keep it running & start 
another vm
which has other passthrough multipath device, Then the target(s) of 
the fresh
started vm won't be added into the related qemu-pr-helper's 
namespace and it
causes PERSISTENT RESERVE command failure in the corresponding 
guest.
I digged into code, In this situation, The targetPaths[] in 
qemuDomainSetupDisk()
won't be filled, it keeps NULL after virDevMapperGetTargets() 
returns.
The virDevMapperGetTargets doesn't fill targetPaths[] because the 
dm_task_run()

of libdevmapper returns 0 with errno 9(Bad file descriptor).
So far, I don't understand why the dm_task_run() return 0 in this 
situation.
BTW, The virDevMapperGetTargets() can always successfully fill the 
targetPaths[]

in qemuSetupImagePathCgroup().


What is your libvirt version? I've merged a fix for something similar
not a long ago: v6.5.0-rc1~190

Can you please check v6.5.0?


The libvirt version I used in the tests is git master.
The sure thing is that the multipath devices I used are valid and 
functional.


In the test A or the test B, Everytime we restart libvirtd, The issue 
won't happen for

the first booted vm.

In other words, The target of multipath device can be parsed & added 
to qemu-pr-helper's

namespace for the first booted vm.

As long as libvirtd once invoked 
qemuSetupImagePathCgroup()->virDevMapperGetTargetsImpl()

to successfully parse any valid multipath device,
 From then on, The 
qemuDomainSetupDisk()->virDevMapperGetTargetsImpl()->dm_task_run()

returns 0 with errno=9 when parsing any valid multipath device.


At this point I'm wondering whether we are using the API correctly.
If you disable namespaces, then CGroups still need to know the targets
and will query dm. Does that work?
Only disable namespaces or only disable CGroups, The process of querying 
dm deps works well.



The aim is to do what 'dmsetup deps' does. Once you reach the error
state - does that command work?

'dmsetup deps' works well while I hit the error state.



errno=9 is EBADF so maybe this is a dm issue after all?
Emm...So far I lean towards to an issue in libvirt code, rather than in 
dm.




BTW:

Maybe we will need to merge the following patch to log error messages 
from dm?


https://gitlab.com/MichalPrivoznik/libvirt/-/commit/ed3a7ae5d0cd3f492fe219dbd6b4820dde55841f

After merged this commit, I pasted part of log outputs here:

* libvirtd log:
2020-07-20 08:19:27.970+: 4621: info : libvirt version: 6.6.0
2020-07-20 08:19:27.970+: 4621: info : hostname: tw
..
2020-07-20 08:19:27.970+: 4636: debug : virThreadJobSetWorker:75 : 
Thread 4636 is running worker prio-rpc-worker
2020-07-20 08:19:27.971+: 4631: debug : virThreadJobSetWorker:75 : 
Thread 4631 is running worker rpc-worker
2020-07-20 08:19:27.971+: 4637: debug : virThreadJobSetWorker:75 : 
Thread 4637 is running worker prio-rpc-worker
2020-07-20 08:19:27.971+: 4635: debug : virThreadJobSetWorker:75 : 
Thread 4635 is running worker rpc-worker
2020-07-20 08:19:27.971+: 4634: debug : virThreadJobSetWorker:75 : 
Thread 4634 is running worker rpc-worker
2020-07-20 08:19:27.971+: 4639: debug : virThreadJobSetWorker:75 : 
Thread 4639 is running worker prio-rpc-worker
2020-07-20 08:19:27.971+: 4638: debug : virThreadJobSetWorker:75 : 
Thread 4638 is running worker prio-rpc-worker

..
2020-07-20 08:19:27.971+: 4632: debug : virThreadJobSetWorker:75 : 
Thread 4632 is running worker rpc-worker

..
2020-07-20 08:19:33.293+: 4632: debug : qemuSetupImagePathCgroup:73 
: Allow path /opt/vms/guest01/disk0.qcow2, perms: rw
2020-07-20 08:19:33.293+: 4632: debug : qemuSetupImagePathCgroup:73 
: Allow path /dev/mapper/control, perms: rw
2020-07-20 08:19:33.293+: 4632: debug : virCgroupSetValueRaw:454 : 
Set value 

Re: [GSoC][PATCH v5] qemu_domainjob: introduce `privateData` for `qemuDomainJob`

2020-07-20 Thread Michal Privoznik

On 7/16/20 1:48 PM, Prathamesh Chavan wrote:

To remove dependecy of `qemuDomainJob` on job specific
paramters, a `privateData` pointer is introduced.
To handle it, structure of callback functions is
also introduced.

Signed-off-by: Prathamesh Chavan 
---
Previous version of this patch can be found here[1].

This patch mainly improvises on the changes suggested in the review
of the previous version by Martin and Michal.

Currently, the structure `qemuDomainJobInfo` is kept untouched.

  src/qemu/qemu_domain.c   | 70 ++-
  src/qemu/qemu_domain.h   | 10 +
  src/qemu/qemu_domainjob.c| 71 ++--
  src/qemu/qemu_domainjob.h| 38 -
  src/qemu/qemu_driver.c   |  3 +-
  src/qemu/qemu_migration.c| 28 +
  src/qemu/qemu_migration_params.c |  9 ++--
  src/qemu/qemu_process.c  | 15 +--
  8 files changed, 185 insertions(+), 59 deletions(-)




diff --git a/src/qemu/qemu_domainjob.c b/src/qemu/qemu_domainjob.c
index d96d5334a3..7cbfe3801e 100644
--- a/src/qemu/qemu_domainjob.c
+++ b/src/qemu/qemu_domainjob.c
@@ -159,24 +159,6 @@ qemuDomainEventEmitJobCompleted(virQEMUDriverPtr driver,
  virObjectEventStateQueue(driver->domainEventState, event);
  }
  
-

-int
-qemuDomainObjInitJob(qemuDomainJobObjPtr job)
-{
-memset(job, 0, sizeof(*job));
-
-if (virCondInit(>cond) < 0)
-return -1;
-
-if (virCondInit(>asyncCond) < 0) {
-virCondDestroy(>cond);
-return -1;
-}
-
-return 0;
-}
-
-




+
+int
+qemuDomainObjInitJob(qemuDomainJobObjPtr job,
+ qemuDomainObjPrivateJobCallbacksPtr cb)
+{
+memset(job, 0, sizeof(*job));
+job->cb = cb;
+
+if (!(job->privateData = job->cb->allocJobPrivate()))
+return -1;
+
+if (virCondInit(>cond) < 0) {
+job->cb->freeJobPrivate(job->privateData);
  return -1;
+}
+
+if (virCondInit(>asyncCond) < 0) {
+job->cb->freeJobPrivate(job->privateData);
+virCondDestroy(>cond);
+return -1;
+}
  
  return 0;

  }



There is no need to move this function. It can live where it is.

Reviewed-by: Michal Privoznik 

As promised on our meeting earlier today, I looked into turning this 
into a virObject/GObject. It's not going to be that easy (read trivial) 
and thus let's go with this patch for now. We can always turn it into an 
object if we want.


Michal



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-20 Thread Peter Xu
On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote:
> Right, so there's no need to deal with unmap in vtd's replay implementation
> (as what generic one did).

We don't even for now; see vtd_page_walk_info.notify_unmap.  Thanks,

-- 
Peter Xu



Re: [PATCH] qemu_driver.c: Fix domfsinfo for non-PCI device information from guest agent

2020-07-20 Thread Thomas Huth
On 20/07/2020 12.32, Daniel P. Berrangé wrote:
> On Mon, Jul 20, 2020 at 12:22:33PM +0200, Thomas Huth wrote:
>> qemuAgentFSInfoToPublic() currently only sets the devAlias for PCI devices.
>> However, the QEMU guest agent could also provide the device name in the
>> "dev" field of the response for other devices instead (well, at least after
>> fixing another problem in the current QEMU guest agent...). So if creating
>> the devAlias from the PCI information failed, let's fall back to the name
>> provided by the guest agent. This helps to fix the empty "Target" fields
>> that occur when running "virsh domfsinfo" on s390x where CCW devices are
>> used for the guest instead of PCI devices.
>>
>> Also add a proper debug message here in case we completely failed to set the
>> device alias, since this problem here was very hard to debug: The only two
>> error messages that I've seen were "Unable to get filesystem information"
>> and "Unable to encode message payload" - which only indicates that something
>> went wrong in the RPC call. No debug message indicated the real problem, so
>> I had to learn the hard way why the RPC call failed (it apparently does not
>> like devAlias left to be NULL) and where the real problem comes from.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1755075
>> Signed-off-by: Thomas Huth 
>> ---
>>  src/qemu/qemu_driver.c | 19 +++
>>  1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index d185666ed8..e45c61ee20 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -21935,14 +21935,17 @@ qemuAgentFSInfoToPublic(qemuAgentFSInfoPtr agent,
>>  qemuAgentDiskInfoPtr agentdisk = agent->disks[i];
>>  virDomainDiskDefPtr diskDef;
>>  
>> -if (!(diskDef = virDomainDiskByAddress(vmdef,
>> -   >pci_controller,
>> -   agentdisk->bus,
>> -   agentdisk->target,
>> -   agentdisk->unit)))
>> -continue;
>> -
>> -ret->devAlias[i] = g_strdup(diskDef->dst);
>> +diskDef = virDomainDiskByAddress(vmdef,
>> + >pci_controller,
>> + agentdisk->bus,
>> + agentdisk->target,
>> + agentdisk->unit);
> 
> Hmmm, even on x86 I think the original code is broken, or at least fragile.
> The libvirt "bus" number is not guaranteed to match the guest kernel PCI
> "bus" number. I wonder if this has been validated for complex PCI-Express
> topologies.

I also thought already that maybe I should revert the logic and try
agentdisk->devnode first and only fall back to virDomainDiskByAddress()
if it is not set? ... but that might result in different behavior in
some cases, so I did not use that approach in the this version of my patch.

>> +if (diskDef != NULL)
>> +ret->devAlias[i] = g_strdup(diskDef->dst);
>> +else if (agentdisk->devnode != NULL)
>> +ret->devAlias[i] = g_strdup(agentdisk->devnode);
> 
> In the linked BZ the "devnode" field is not provided either. You mention
> a related QEMU patch, which I guess fixes that issue in QEMU ? Can you point
> to that fix.

I just sent the related patches to qemu-devel (look for the "Allow
guest-get-fsinfo also for non-PCI devices" patch series ... I've also
put you on CC:).

 Thomas



Re: device compatibility interface for live migration with assigned devices

2020-07-20 Thread Sean Mooney
On Mon, 2020-07-20 at 11:41 +0800, Jason Wang wrote:
> On 2020/7/18 上午12:12, Alex Williamson wrote:
> > On Thu, 16 Jul 2020 16:32:30 +0800
> > Yan Zhao  wrote:
> > 
> > > On Thu, Jul 16, 2020 at 12:16:26PM +0800, Jason Wang wrote:
> > > > On 2020/7/14 上午7:29, Yan Zhao wrote:
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface that helps 
> > > > > upper
> > > > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > > > live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the 
> > > > > two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > > (e.g. SIOV/SRIOV backward compatibility case)
> > > > > 
> > > > > The upper layer stack could use this interface as the last step to 
> > > > > check
> > > > > if one device is able to migrate to another device before triggering 
> > > > > a real
> > > > > live migration procedure.
> > > > > we are not sure if this interface is of value or help to you. please 
> > > > > don't
> > > > > hesitate to drop your valuable comments.
> > > > > 
> > > > > 
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > > 
> > > > >__userspace
> > > > > /\  \
> > > > >/ \write
> > > > >   / read  \
> > > > >  /__   ___\|/_
> > > > > | migration_version | | migration_version |-->check migration
> > > > > - -   compatibility
> > > > >device Adevice B
> > > > > 
> > > > > 
> > > > > a device attribute named migration_version is defined under each 
> > > > > device's
> > > > > sysfs node. e.g. 
> > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > 
> > > > Are you aware of the devlink based device management interface that is
> > > > proposed upstream? I think it has many advantages over sysfs, do you
> > > > consider to switch to that?
> > 
> > Advantages, such as?
> 
> 
> My understanding for devlink(netlink) over sysfs (some are mentioned at 
> the time of vDPA sysfs mgmt API discussion) are:
i tought netlink was used more a as a configuration protocoal to qurry and 
confire nic and i guess
other devices in its devlink form requireint a tool to be witten that can speak 
the protocal to interact with.
the primary advantate of sysfs is that everything is just a file. there are no 
addtional depleenceis
needed and unlike netlink there are not interoperatblity issues in a 
coanitnerised env. if you are using diffrenet
version of libc and gcc in the contaienr vs the host my understanding is tools 
like ethtool from ubuntu deployed
in a container on a centos host can have issue communicating with the host 
kernel. if its jsut a file unless
the format the data is returnin in chagnes or the layout of sysfs changes its 
compatiable regardless of what you
use to read it.
> 
> - existing users (NIC, crypto, SCSI, ib), mature and stable
> - much better error reporting (ext_ack other than string or errno)
> - namespace aware
> - do not couple with kobject
> 
> Thanks
> 



Re: [PATCH] qemu_driver.c: Fix domfsinfo for non-PCI device information from guest agent

2020-07-20 Thread Daniel P . Berrangé
On Mon, Jul 20, 2020 at 12:22:33PM +0200, Thomas Huth wrote:
> qemuAgentFSInfoToPublic() currently only sets the devAlias for PCI devices.
> However, the QEMU guest agent could also provide the device name in the
> "dev" field of the response for other devices instead (well, at least after
> fixing another problem in the current QEMU guest agent...). So if creating
> the devAlias from the PCI information failed, let's fall back to the name
> provided by the guest agent. This helps to fix the empty "Target" fields
> that occur when running "virsh domfsinfo" on s390x where CCW devices are
> used for the guest instead of PCI devices.
> 
> Also add a proper debug message here in case we completely failed to set the
> device alias, since this problem here was very hard to debug: The only two
> error messages that I've seen were "Unable to get filesystem information"
> and "Unable to encode message payload" - which only indicates that something
> went wrong in the RPC call. No debug message indicated the real problem, so
> I had to learn the hard way why the RPC call failed (it apparently does not
> like devAlias left to be NULL) and where the real problem comes from.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1755075
> Signed-off-by: Thomas Huth 
> ---
>  src/qemu/qemu_driver.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index d185666ed8..e45c61ee20 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -21935,14 +21935,17 @@ qemuAgentFSInfoToPublic(qemuAgentFSInfoPtr agent,
>  qemuAgentDiskInfoPtr agentdisk = agent->disks[i];
>  virDomainDiskDefPtr diskDef;
>  
> -if (!(diskDef = virDomainDiskByAddress(vmdef,
> -   >pci_controller,
> -   agentdisk->bus,
> -   agentdisk->target,
> -   agentdisk->unit)))
> -continue;
> -
> -ret->devAlias[i] = g_strdup(diskDef->dst);
> +diskDef = virDomainDiskByAddress(vmdef,
> + >pci_controller,
> + agentdisk->bus,
> + agentdisk->target,
> + agentdisk->unit);

Hmmm, even on x86 I think the original code is broken, or at least fragile.
The libvirt "bus" number is not guaranteed to match the guest kernel PCI
"bus" number. I wonder if this has been validated for complex PCI-Express
topologies.

> +if (diskDef != NULL)
> +ret->devAlias[i] = g_strdup(diskDef->dst);
> +else if (agentdisk->devnode != NULL)
> +ret->devAlias[i] = g_strdup(agentdisk->devnode);

In the linked BZ the "devnode" field is not provided either. You mention
a related QEMU patch, which I guess fixes that issue in QEMU ? Can you point
to that fix.

> +else
> +VIR_DEBUG("Missing devnode name for '%s'.", ret->mountpoint);
>  }

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[PATCH] qemu_driver.c: Fix domfsinfo for non-PCI device information from guest agent

2020-07-20 Thread Thomas Huth
qemuAgentFSInfoToPublic() currently only sets the devAlias for PCI devices.
However, the QEMU guest agent could also provide the device name in the
"dev" field of the response for other devices instead (well, at least after
fixing another problem in the current QEMU guest agent...). So if creating
the devAlias from the PCI information failed, let's fall back to the name
provided by the guest agent. This helps to fix the empty "Target" fields
that occur when running "virsh domfsinfo" on s390x where CCW devices are
used for the guest instead of PCI devices.

Also add a proper debug message here in case we completely failed to set the
device alias, since this problem here was very hard to debug: The only two
error messages that I've seen were "Unable to get filesystem information"
and "Unable to encode message payload" - which only indicates that something
went wrong in the RPC call. No debug message indicated the real problem, so
I had to learn the hard way why the RPC call failed (it apparently does not
like devAlias left to be NULL) and where the real problem comes from.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1755075
Signed-off-by: Thomas Huth 
---
 src/qemu/qemu_driver.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d185666ed8..e45c61ee20 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -21935,14 +21935,17 @@ qemuAgentFSInfoToPublic(qemuAgentFSInfoPtr agent,
 qemuAgentDiskInfoPtr agentdisk = agent->disks[i];
 virDomainDiskDefPtr diskDef;
 
-if (!(diskDef = virDomainDiskByAddress(vmdef,
-   >pci_controller,
-   agentdisk->bus,
-   agentdisk->target,
-   agentdisk->unit)))
-continue;
-
-ret->devAlias[i] = g_strdup(diskDef->dst);
+diskDef = virDomainDiskByAddress(vmdef,
+ >pci_controller,
+ agentdisk->bus,
+ agentdisk->target,
+ agentdisk->unit);
+if (diskDef != NULL)
+ret->devAlias[i] = g_strdup(diskDef->dst);
+else if (agentdisk->devnode != NULL)
+ret->devAlias[i] = g_strdup(agentdisk->devnode);
+else
+VIR_DEBUG("Missing devnode name for '%s'.", ret->mountpoint);
 }
 
 return ret;
-- 
2.18.1



Re: [libvirt PATCH] spec: Drop explicit dependency on ncurses

2020-07-20 Thread Erik Skultety
On Sun, Jul 19, 2020 at 12:54:43AM +0200, Andrea Bolognani wrote:
> We don't actually use ncurses directly: readline needs it, but
> that's a readline implementation detail and not something that we
> should concern ourselves with.
>
> Signed-off-by: Andrea Bolognani 
> ---
> Test pipeline where I've tweaked all CI environments so that ncurses
> is not explicitly installed, and in the case of macOS pkg-config is
> not configured in a way that allows it to locate ncurses.pc:
>
>   https://gitlab.com/abologna/libvirt/-/pipelines/168175622
>
>  libvirt.spec.in | 2 --
>  1 file changed, 2 deletions(-)

Reviewed-by: Erik Skultety 



[PATCH] qemu: pre-create the dbus directory in qemuStateInitialize

2020-07-20 Thread Bihong Yu
>From 187323ce736dcd9b1a177d552630b0c6859a4798 Mon Sep 17 00:00:00 2001
From: Bihong Yu 
Date: Tue, 14 Jul 2020 15:44:05 +0800
Subject: [PATCH] qemu: pre-create the dbus directory in qemuStateInitialize

There are races condiction to make '/run/libvirt/qemu/dbus' directory in
virDirCreateNoFork() while concurrent start VMs, and get "failed to create
directory '/run/libvirt/qemu/dbus': File exists" error message. pre-create the
dbus directory in qemuStateInitialize.

Signed-off-by:Bihong Yu 
---
 src/qemu/qemu_dbus.c| 4 +---
 src/qemu/qemu_dbus.h| 2 +-
 src/qemu/qemu_driver.c  | 4 
 src/qemu/qemu_process.c | 3 ---
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_dbus.c b/src/qemu/qemu_dbus.c
index 51f6c94..0e0306a 100644
--- a/src/qemu/qemu_dbus.c
+++ b/src/qemu/qemu_dbus.c
@@ -34,10 +34,8 @@ VIR_LOG_INIT("qemu.dbus");


 int
-qemuDBusPrepareHost(virQEMUDriverPtr driver)
+qemuDBusPreparePath(virQEMUDriverConfigPtr cfg)
 {
-g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
-
 return virDirCreate(cfg->dbusStateDir, 0770, cfg->user, cfg->group,
 VIR_DIR_CREATE_ALLOW_EXIST);
 }
diff --git a/src/qemu/qemu_dbus.h b/src/qemu/qemu_dbus.h
index 474eb10..6ce9f7b 100644
--- a/src/qemu/qemu_dbus.h
+++ b/src/qemu/qemu_dbus.h
@@ -21,7 +21,7 @@
 #include "qemu_conf.h"
 #include "qemu_domain.h"

-int qemuDBusPrepareHost(virQEMUDriverPtr driver);
+int qemuDBusPreparePath(virQEMUDriverConfigPtr cfg);

 char *qemuDBusGetAddress(virQEMUDriverPtr driver,
  virDomainObjPtr vm);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d185666..52b68c9 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -50,6 +50,7 @@
 #include "qemu_security.h"
 #include "qemu_checkpoint.h"
 #include "qemu_backup.h"
+#include "qemu_dbus.h"

 #include "virerror.h"
 #include "virlog.h"
@@ -790,6 +791,9 @@ qemuStateInitialize(bool privileged,
   cfg->migrationPortMax)) == NULL)
 goto error;

+if (qemuDBusPreparePath(cfg) < 0)
+goto error;
+
 if (qemuSecurityInit(qemu_driver) < 0)
 goto error;

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index eba14ed..46620ca 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -6449,9 +6449,6 @@ qemuProcessPrepareHost(virQEMUDriverPtr driver,
 qemuDomainObjPrivatePtr priv = vm->privateData;
 g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);

-if (qemuDBusPrepareHost(driver) < 0)
-return -1;
-
 if (qemuPrepareNVRAM(cfg, vm) < 0)
 return -1;

-- 
1.8.3.1



[libvirt PATCH] docs: virConnectGetCapabilities do not provide pool types

2020-07-20 Thread Pino Toscano
Remove the paragraph in the storage pool page that mentions
virConnectGetCapabilities, as virConnectGetCapabilities does not return
any information about pools.

Signed-off-by: Pino Toscano 
---
 docs/formatstoragecaps.html.in | 6 --
 1 file changed, 6 deletions(-)

diff --git a/docs/formatstoragecaps.html.in b/docs/formatstoragecaps.html.in
index ee3888f44d..d8a1cacd96 100644
--- a/docs/formatstoragecaps.html.in
+++ b/docs/formatstoragecaps.html.in
@@ -13,12 +13,6 @@
 supported, and if relevant the source format types, the required
 source elements, and the target volume format types. 
 
-The Storage Pool Capabilities XML provides more information than the
-  
-virConnectGetCapabilities
-  
-which only provides an enumerated list of supported pool types.
-
 Element and attribute overview
 
 A query interface was added to the virConnect API's to retrieve the
-- 
2.26.2



Re: [libvirt PATCH] spec: Don't require mdevctl on RHEL 7

2020-07-20 Thread Erik Skultety
On Sun, Jul 19, 2020 at 12:46:05AM +0200, Andrea Bolognani wrote:
> mdevctl is a relatively new tool that's packaged for Fedora and
> RHEL 8, but not for RHEL 7. Make the dependency conditional to
> avoid the libvirt-daemon-driver-nodedev package becoming
> uninstallable on that platform.
>
> Fixes: 9691440ecbc7d9383a1410f1067a4f9221f2de2c
> Signed-off-by: Andrea Bolognani 
> ---

Oops...
Reviewed-by: Erik Skultety