[PATCH] libxl: Fix driver reload

2021-09-13 Thread Jim Fehlig
On reload, the libxl driver calls virDomainObjListLoadAllConfigs to load
all configs from /etc/libvirt/libxl/ but incorrectly passes 'true' for
the liveStatus parameter, resulting in error messages such as

libvirtd[21053]: XML error: unexpected root element , expecting 

libvirtd[21053]: Failed to load config for domain 'sles15sp3'

Fix by not requesting live status when re-reading the persistent VM config
files.

Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 6a3938ead4..c5dbcaafa5 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -821,7 +821,7 @@ libxlStateReload(void)
 virDomainObjListLoadAllConfigs(libxl_driver->domains,
cfg->configDir,
cfg->autostartDir,
-   true,
+   false,
libxl_driver->xmlopt,
NULL, libxl_driver);
 
-- 
2.33.0




[PATCH] libxl: Improve reporting of die_id in capabilities

2021-09-13 Thread Jim Fehlig
On Xen, libvirt runs in a VM (typically dom0) and does not have an accurate
picture of numa and cpu topology of the underlying physical machine using
the "usual" mechanisms. numa info and cpu toplogy are retrieved from libxl
and used to populate the libvirt conterparts. Commit 7b79ee2f78b introduced
support for reporting die_id in capabilities, but did not account for
special handling of numa and cpu topology in libxl.

Currently, Xen does not report die_id in the libxl_cputopology structure.
In the meantime, set die_id to 0, which was suggested by the Xen developers
and is slightly better than random garbage such as



Signed-off-by: Jim Fehlig 
---
 src/libxl/libxl_capabilities.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/libxl/libxl_capabilities.c b/src/libxl/libxl_capabilities.c
index b4bd1d7e62..4afed71436 100644
--- a/src/libxl/libxl_capabilities.c
+++ b/src/libxl/libxl_capabilities.c
@@ -292,6 +292,8 @@ libxlCapsInitNuma(libxl_ctx *ctx, virCaps *caps)
 cpus[node][nr_cpus_node[node]-1].id = i;
 cpus[node][nr_cpus_node[node]-1].socket_id = cpu_topo[i].socket;
 cpus[node][nr_cpus_node[node]-1].core_id = cpu_topo[i].core;
+/* Until Xen reports die_id, 0 is better than random garbage */
+cpus[node][nr_cpus_node[node]-1].die_id = 0;
 /* Allocate the siblings maps. We will be filling them later */
 cpus[node][nr_cpus_node[node]-1].siblings = virBitmapNew(nr_cpus);
 }
-- 
2.33.0




Re: question on vhost, limiting kernel threads and NPROC

2021-09-13 Thread Michael S. Tsirkin
On Mon, Sep 13, 2021 at 12:04:04PM -0500, Mike Christie wrote:
> I just realized I forgot to cc the virt list so adding now.
> 
> Christian see the very bottom for a different fork patch.
> 
> On 7/12/21 7:05 AM, Stefan Hajnoczi wrote:
> > On Fri, Jul 09, 2021 at 11:25:37AM -0500, Mike Christie wrote:
> >> Hi,
> >>
> >> The goal of this email is to try and figure how we want to track/limit the
> >> number of kernel threads created by vhost devices.
> >>
> >> Background:
> >> ---
> >> For vhost-scsi, we've hit a issue where the single vhost worker thread 
> >> can't
> >> handle all IO the being sent from multiple queues. IOPs is stuck at around
> >> 500K. To fix this, we did this patchset:
> >>
> >> https://lore.kernel.org/linux-scsi/20210525180600.6349-1-michael.chris...@oracle.com/
> >>
> >> which allows userspace to create N threads and map them to a dev's 
> >> virtqueues.
> >> With this we can get around 1.4M IOPs.
> >>
> >> Problem:
> >> 
> >> While those patches were being reviewed, a concern about tracking all these
> >> new possible threads was raised here:
> >>
> >> https://lore.kernel.org/linux-scsi/YL45CfpHyzSEcAJv@stefanha-x1.localdomain/
> >>
> >> To save you some time, the question is what does other kernel code using 
> >> the
> >> kthread API do to track the number of kernel threads created on behalf of
> >> a userspace thread. The answer is they don't do anything so we will have to
> >> add that code.
> >>
> >> I started to do that here:
> >>
> >> https://lkml.org/lkml/2021/6/23/1233
> >>
> >> where those patches would charge/check the vhost device owner's 
> >> RLIMIT_NPROC
> >> value. But, the question of if we really want to do this has come up which 
> >> is
> >> why I'm bugging lists like libvirt now.
> >>
> >> Question/Solution:
> >> --
> >> I'm bugging everyone so we can figure out:
> >>
> >> If we need to specifically track the number of kernel threads being made
> >> for the vhost kernel use case by the RLIMIT_NPROC limit?
> >>
> >> Or, is it ok to limit the number of devices with the RLIMIT_NOFILE limit.
> >> Then each device has a limit on the number of threads it can create.
> > 
> > Do we want to add an interface where an unprivileged userspace process
> > can create large numbers of kthreads? The number is indirectly bounded
> > by RLIMIT_NOFILE * num_virtqueues, but there is no practical way to
> > use that rlimit since num_virtqueues various across vhost devices and
> > RLIMIT_NOFILE might need to have a specific value to control file
> > descriptors.
> > 
> > io_uring worker threads are limited by RLIMIT_NPROC. I think it makes
> > sense in vhost too where the device instance is owned by a specific
> > userspace process and can be accounted against that process' rlimit.
> > 
> > I don't have a specific use case other than that I think vhost should be
> > safe and well-behaved.
> > 
> 
> Sorry for the late reply. I finally got to go on PTO and used like 2
> years worth in one super long vacation :)
> 
> I still don't have a RLIMIT_NPROC use case and it wasn't not clear to
> me if that has to be handled before merging. However, I might have got
> lucky and found a bug where the fix will handle your request too.
> 
> It looks like cgroup v2 is supposed to work, but for vhost threads
> it doesn't because the kernel functions we use just support v1. If
> we change the vhost layer to create threads like how io_uring does
> then we get the RLIMIT_NPROC checks and also cgroup v2 support.
> 
> Christian, If you didn't like this patch
> 
> https://lkml.org/lkml/2021/6/23/1233
> 
> then I'm not sure how much you will like what is needed to support the
> above. Here is a patch which includes what we would need from the fork
> related code. On one hand, it's nicer because it fits into the PF FLAG
> code like you requested. But, I have to add a no_files arg. See below:
> 
> 
> --
> 
> 
> >From 351d476e8db0a78b9bdf22d77dd1abe66c0eac40 Mon Sep 17 00:00:00 2001
> From: Mike Christie 
> Date: Mon, 13 Sep 2021 11:20:20 -0500
> Subject: [PATCH] fork: allow cloning of userspace procs from kernel
> 
> Userspace apps/processes like Qemu call into the vhost layer to create
> worker threads which execute IO on behalf of VMs. If users set RIMIT
> or cgroup limits or setup v2 cgroups or namespaces, the worker thread
> is not accounted for or even setup correctly. The reason is that vhost
> uses the kthread api which inherits those attributes/values from the
> kthreadd thread. This patch allows kernel modules to work like the
> io_uring code which can call kernel_clone from the userspace thread's
> context and directly inherit its attributes like cgroups from and will
> check limits like RLIMIT_NPROC against that userspace thread.
> 
> Note: this patch combines 2 changes that should be separate patches. I'm
> including both in one patch to just make it easier to get an idea of what
> needs to be done. If we are ok with this 

Re: RFC: revert to external snapshot API

2021-09-13 Thread Nikolay Shirokovskiy
ср, 8 сент. 2021 г. в 12:27, Peter Krempa :

> On Tue, Aug 31, 2021 at 16:46:25 +0300, Nikolay Shirokovskiy wrote:
> > Hi, all.
>
> Hi, sorry for the late reply I was on PTO.
>
>
> > I want to implement reverting to external snapshot functionality.
> > I've checked the mailing list history and found some previous attempts
> > (not sure if this is a complete list of them).
> >
> > [1] was done in 2012 by the Redhat team itself. [2] and [3] was done in
> > 2018.
> > Looks like it was not clear how the API should look like.
>
> One additional thing is that me and phrdina started discussing this (in
> person so I can't point you to a discussion) 2 weeks ago.
>
> I'll summarize the points we agreed upon.
>
> > For example we have disk.qcow2 <- disk.snap1 chain <- disk.snap2 chain
> > (after
> > having snap1 and snap2 snapshots). Now we want to revert to snap1
> snapshot.
>
> There's one implementation snag we currently have which complicates
> stuff. Let's expand your above scenario with the snapshot states:
>
>
>s  s
>n  n
>a  a
>p  p
>1  2
>   p
> o  │  │   r
> rbase.qcow2│snap1.qcow2   │snap2.qcow2e
> i ───► │ ►│ ► s
> g  │  │   e
> i  │  │   n
> n t
>
> A rather big set of problems which we will necessarily encounter when
> implementing this comes from the following:
>
> 1) users can modify the VM betwen snapshots or between the last snapshot
> and the abandoned state and this modification (e.g. changing a disk
> image) must be considered to prevent data loss when manipulating the
> images.
>

Hi, Peter!

Can you please explain in more detail what the issue is? For example if
after snap1 I changed disk image to some base2.qcow2 then snap1.qcow2 is
expected to be managed by mgmt (expected to be deleted I guess). Then I made
a snap2. Now I have base.qcow2 (keeping snap1 state) and  base2.qcow2
(keeping snap2 state) <-snap2.qcow2
chain. Thanks to metadata I can revert to both snap1 and snap2 as I know
the domain
xml in those states and these xmls references right images (snap1
references base.qcow2
and snap2 references base2.qcow2).


> 2) libvirt doesn't record the XML used to create the snapshot (e.g.
>

It is not quite so. According to
https://libvirt.org/formatsnapshot.html#example
xml given to create API saved in snapshot metadata. Or I miss something.



> snap1.xm) thus we don't actually know what the (historical) snapshot was
> actually recording. Coupled with the fact that the user could have
> changed the VM definition between 'snap1' and 'snap2' we can't even
> infer what the state was.
>

Here too I cannot follow what the issue is. Can you please give an example?


>
>
> > The
> > snapshot state is held by disk.qcow2 image. We can run reverted domain on
> > disk.qcow2 itself but then we lose snap1 (held by disk.qcow2) and snap2
> > (held by disk.snap1). So we definitely should run on some overlay over
> > disk.qcow2. But then what name should be given to overlay? We should have
> > an option for mgmt to specify this name like in case of snapshots itself.
>
> Exactly. Reversion of external snapshots will necessarily require a new
> API, which will take a new "snapshot" XML describing the new overlays as
> you describe below.
>
> In the simple case such as with local files we can use the same
> algorithm for creating overlay filenames as we do when creating
> snapshots but generally we need to give the MGMT the ability to specify
> the overlay name.
>
>
> > The [1] has some discussion on adding extra options to reverting
> > functionality.
> > Like for example if we revert back to snap2 then having the ability to
> run
> > from
> > state in disk.snap2 rather than disk.snap1. My opinion is there is no
> need
> > to
> > as if one wants to revert to the state of disk2.snap2 it can take a
> > snapshot (some
> > snap3).
>
> It's possible to avoid doing a combined "take snapshot" operation
> as long as we have the possibility to take a snapshot and destroy the
> VM as running it after the point you want to return to is undesirable
> and pointless.
>
> One thing that you need to consider here is that when you are reverting
> to an arbitrary snapshot, the overlay files after the last snapshot
> (e.g. snap2.qcow2 (in my diagram), which is the state between snap2 and
> present) are abandoned and will never be used again.
>
> At this point we 

Re: question on vhost, limiting kernel threads and NPROC

2021-09-13 Thread Mike Christie
I just realized I forgot to cc the virt list so adding now.

Christian see the very bottom for a different fork patch.

On 7/12/21 7:05 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 09, 2021 at 11:25:37AM -0500, Mike Christie wrote:
>> Hi,
>>
>> The goal of this email is to try and figure how we want to track/limit the
>> number of kernel threads created by vhost devices.
>>
>> Background:
>> ---
>> For vhost-scsi, we've hit a issue where the single vhost worker thread can't
>> handle all IO the being sent from multiple queues. IOPs is stuck at around
>> 500K. To fix this, we did this patchset:
>>
>> https://lore.kernel.org/linux-scsi/20210525180600.6349-1-michael.chris...@oracle.com/
>>
>> which allows userspace to create N threads and map them to a dev's 
>> virtqueues.
>> With this we can get around 1.4M IOPs.
>>
>> Problem:
>> 
>> While those patches were being reviewed, a concern about tracking all these
>> new possible threads was raised here:
>>
>> https://lore.kernel.org/linux-scsi/YL45CfpHyzSEcAJv@stefanha-x1.localdomain/
>>
>> To save you some time, the question is what does other kernel code using the
>> kthread API do to track the number of kernel threads created on behalf of
>> a userspace thread. The answer is they don't do anything so we will have to
>> add that code.
>>
>> I started to do that here:
>>
>> https://lkml.org/lkml/2021/6/23/1233
>>
>> where those patches would charge/check the vhost device owner's RLIMIT_NPROC
>> value. But, the question of if we really want to do this has come up which is
>> why I'm bugging lists like libvirt now.
>>
>> Question/Solution:
>> --
>> I'm bugging everyone so we can figure out:
>>
>> If we need to specifically track the number of kernel threads being made
>> for the vhost kernel use case by the RLIMIT_NPROC limit?
>>
>> Or, is it ok to limit the number of devices with the RLIMIT_NOFILE limit.
>> Then each device has a limit on the number of threads it can create.
> 
> Do we want to add an interface where an unprivileged userspace process
> can create large numbers of kthreads? The number is indirectly bounded
> by RLIMIT_NOFILE * num_virtqueues, but there is no practical way to
> use that rlimit since num_virtqueues various across vhost devices and
> RLIMIT_NOFILE might need to have a specific value to control file
> descriptors.
> 
> io_uring worker threads are limited by RLIMIT_NPROC. I think it makes
> sense in vhost too where the device instance is owned by a specific
> userspace process and can be accounted against that process' rlimit.
> 
> I don't have a specific use case other than that I think vhost should be
> safe and well-behaved.
> 

Sorry for the late reply. I finally got to go on PTO and used like 2
years worth in one super long vacation :)

I still don't have a RLIMIT_NPROC use case and it wasn't not clear to
me if that has to be handled before merging. However, I might have got
lucky and found a bug where the fix will handle your request too.

It looks like cgroup v2 is supposed to work, but for vhost threads
it doesn't because the kernel functions we use just support v1. If
we change the vhost layer to create threads like how io_uring does
then we get the RLIMIT_NPROC checks and also cgroup v2 support.

Christian, If you didn't like this patch

https://lkml.org/lkml/2021/6/23/1233

then I'm not sure how much you will like what is needed to support the
above. Here is a patch which includes what we would need from the fork
related code. On one hand, it's nicer because it fits into the PF FLAG
code like you requested. But, I have to add a no_files arg. See below:


--


>From 351d476e8db0a78b9bdf22d77dd1abe66c0eac40 Mon Sep 17 00:00:00 2001
From: Mike Christie 
Date: Mon, 13 Sep 2021 11:20:20 -0500
Subject: [PATCH] fork: allow cloning of userspace procs from kernel

Userspace apps/processes like Qemu call into the vhost layer to create
worker threads which execute IO on behalf of VMs. If users set RIMIT
or cgroup limits or setup v2 cgroups or namespaces, the worker thread
is not accounted for or even setup correctly. The reason is that vhost
uses the kthread api which inherits those attributes/values from the
kthreadd thread. This patch allows kernel modules to work like the
io_uring code which can call kernel_clone from the userspace thread's
context and directly inherit its attributes like cgroups from and will
check limits like RLIMIT_NPROC against that userspace thread.

Note: this patch combines 2 changes that should be separate patches. I'm
including both in one patch to just make it easier to get an idea of what
needs to be done. If we are ok with this then I'll break it up into a
proper patchset.

This patch does the following:

1. Separates the PF_IO_WORKER flag behavior that controls signals and exit
cleanup into a new flag PF_USER_WORKER, so the vhost layer can use it
without the PF_IO_WORKER scheduling/IO behavior.

2. It adds a new no_files 

Re: [PATCH v4 for v7.6.0 00/14] Introduce virtio-mem model

2021-09-13 Thread David Hildenbrand

On 13.09.21 08:53, Michal Prívozník wrote:

On 7/7/21 12:30 PM, David Hildenbrand wrote:

On 23.06.21 12:12, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html

diff to v3:
- Rebased code on the top of master
- Tried to work in all Peter's review suggestions
- Fixed a bug where adjusting  was viewed as hotplug of new
     by XML validator and thus if  was close
enough to
     the validator reported an error (this was reported by
    David).



Hi Michal,





6. 4k source results in an error

     
   4096
   0-1
     

"error: internal error: Unable to find any usable hugetlbfs mount for
4096 KiB"

This example is taken from https://libvirt.org/formatdomain.html for
DIMMs. Not sure what the expected behavior is.


Just realized that this IS expected behavior. 4096KiB pages (=4MiB) are
not regular system pages (4KiB). Thus libvirt is trying to find
hugetlbfs mount point that's serving 4MiB pages, unsuccessfully. I'll
post a patch that fixes the example though.


Ah, very right. I blindly copied the example ... make sense to me and 
the error message is correct.


--
Thanks,

David / dhildenb



Re: [RFC REPOST 0/7] introduce support for live appid updates

2021-09-13 Thread Pavel Hrdina
On Fri, Sep 10, 2021 at 02:37:49PM +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 10, 2021 at 03:26:20PM +0200, Pavel Hrdina wrote:
> > On Fri, Sep 10, 2021 at 12:57:30PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Sep 09, 2021 at 06:13:16PM +0200, Pavel Hrdina wrote:
> > > > Rebased on top of current master.
> > > > 
> > > > I'm posting this as an RFC mainly because I'm not sure how to model
> > > > the new API. This patches introduce a new naive API that will change
> > > > only the APPID and nothing else.
> > > > 
> > > > Currently there are no other known features related to Fibre Channel
> > > > resources so this non-extendable API will be sufficient, however the
> > > > appid lives in  element in the XML where we currently have
> > > > root cgroup partition. Even though changing the partition will not be
> > > > supported and we don't know about anything else that could be placed
> > > > here it doesn't mean it will not happen in the future. In that case
> > > > we would have to add new API as well.
> > > 
> > > I'm curious why we need to even support updates for the APPID in
> > > the first place. I understood it to be a unique identifier for
> > > the VM from POV of the storage, and unique identifier are not
> > > something you would typically change on the fly.
> > 
> > Good point, when implementing this feature I also thought that there is
> > no need to have live changes, but to be sure I asked in the BZ (no
> > public) and got an answer that they would like to have the live changes
> > as well.
> 
> Reading the bug report I'm not convinced they are asking for live
> changes. They are asking how to set the value using virsh, but
> that doesn't imply live change. It is entirely possible, maybe even
> likely, they just want to use 'virsh edit' to set the value in an
> existing guest before booting it.

So we had a meeting today and yes there was some noise in the
communication. There is no need to change the VMID while the VM is
running so this whole patch series can be dropped.

They wanted to have a virsh command to edit it but I explained that it
can be already done by virsh edit and that we will add support to
virt-install/virt-xml as well to make it easier.

Thanks Michal for the review but I'm not going to push this series.

Pavel


signature.asc
Description: PGP signature


Re: device-specific VFIO drivers

2021-09-13 Thread Daniel P . Berrangé
On Mon, Sep 13, 2021 at 10:20:08AM -0400, Laine Stump wrote:
> The Linux kernel has recently added support for device-specific VFIO drivers
> to be used (instead of the current generic, one-size-fits-all vfio_pci
> driver) when assigning a device to a guest with VFIO. The intent of this is
> to (for example) support APIs for device-specific setup that needs to be
> done before the device is handed over to the guest, and to support migration
> of guests using these devices. More details can be found in the comments of
> the patch emails here:
> 
>  https://lore.kernel.org/kvm/20210826103912.128972-1-yish...@nvidia.com/
> 
> There are a couple of issues that need to be resolved before that will work
> well with libvirt-managed guests:
> 
> 
> 1) Although they've outlined a system for determining the best / most
> specific driver for a device by extending the existing modules aliases, the
> kernel people have left it up to userspace to actually parse the
> module.aliases info to determine the best driver for a device. They've even
> "thrown us a bone" in the form of an example python script to do just that:
> 
> https://github.com/maxgurtovoy/linux_tools/blob/main/vfio/bind_vfio_pci_driver.py
> 
> Currently, if a device is assigned to a guest with 
> (the default) then libvirt will unbind the device from whatever host driver
> its bound to, and bind it to the vfio_pci driver (since, up to now, that has
> been the only driver available). In the future, if we want to be able to
> take advantage of the extended capabilities provided by the device-specific
> vfio drivers, we will need to figure out which driver to load.
> 
> I personally don't like the idea of embedding the logic from the above
> example program into libvirt - this really seems to me like something that
> should be implemented in one place for use by many consumers (libvirt being
> one of the consumers). One suggestion made by Alex Williamson (in a private
> discussion, not on a mailing list) was that possibly we could get the
> driverctl utility to add this functionality, and then call driverctl to
> learn the name of the best-fit vfio driver for a device. I haven't yet
> looked at it, but I've been told that driverctl is a bash script :-O. If we
> decide that's a good route, perhaps we could also convince someone to
> convert drivertctl to rust, similar to what jjongsma did with mdevctl (nudge
> nudge, hint hint).
> 
> Or maybe I'm making it into a bigger deal than it needs to be, and we should
> just implement the logic of the python script up above directly in libvirt.
> Does anyone have an opinion?

The python script does a whole lot more than what we would want,
so isn't usable in any case. eg it loads drivers, and rebinds
them itself. All we want is to know the name of the vfio_BLAH
module that is applicable, as we already take care of everything
else.

So we have to parse the module.alias file to extract the vfio lines
that contain the glob match rule.

If we then format the info about the device we have in the matching
structure, potentially all we need do is invoke g_pattern_match_simple()
to compare the two.

So I'd ignore the demo program referenced above. I don't particularly
see a need to wait for some new tool to be written either, especially
if it is going to be in bash.

> 2) There may be cases where, in spite of a device-specific driver being
> available, the user prefers to use the generic vfio_pci driver. To support
> that we will need to have a place in our config to set the driver name.
> 
> We already have a  subelement of  that was originally added
> to allow choosing between VFIO device assignment and legacy KVM device
> assignment:
> 
> 
> 
> All support for KVM device assignment was removed a few years ago, so in
> practice the driver name is always "vfio". The most natural looking way to
> support device-specific drivers would be to use this name attribute to
> specify the driver name, e.g. if you wanted to let libvirt select the best
> driver, you would use:
> 
>
> 
> (what it's currently set to in everyone's configs). But if you wanted to
> force use of the generic driver, you'd use:
> 
>
> 
> or if you wanted to force use of another driver that wasn't the 'best fit'
> according to the module aliases, you could use, e.g.:
> 
>
> 
> I'm uncomfortable with the fact that we're effectively "re-using" the name
> attribute for a new purpose though - up until now it has been "which device
> assignment method?", but this changes it to "which vfio driver should be
> loaded?".

Yes, that definitely isn't right, as this existing attribute is
describing the assignment type. That it happens to match the kmod
name was just co-incidence.

>   So maybe we need a new element. I'm not sure what to call it
> though. How about this?
> 
>
> 
> Does anyone have a better idea for the name?

That name seems fine to me.


Regards,
Daniel
-- 
|: https://berrange.com  -o-

Re: [libvirt PATCH 2/2] conf: log error on attempts to modify ACPI index of active device

2021-09-13 Thread Peter Krempa
On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
> On 9/10/21 3:30 AM, Peter Krempa wrote:
> > On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
> > > The ACPI index of a device in a running guest can't be modified, and
> > > libvirt doesn't actually attempt to modify it, but it was possible for
> > > a user to request such a modification, and libvirt wouldn't complain,
> > > thus misleading the user into thinking that it had actually been changed.
> > > 
> > > Resolves: https://bugzilla.redhat.com/1998920
> > > 
> > > Signed-off-by: Laine Stump 
> > > ---
> > >   src/conf/domain_conf.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> > > index fefc527901..7ff890d8b7 100644
> > > --- a/src/conf/domain_conf.c
> > > +++ b/src/conf/domain_conf.c
> > > @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def,
> > >  _("changing device alias is not allowed"));
> > >   return -1;
> > >   }
> > > +
> > > +if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) {
> > > +virReportError(VIR_ERR_OPERATION_DENIED, "%s",
> > > +   _("changing device 'acpi index' is not 
> > > allowed"));
> > > +return -1;
> > > +}
> > 
> > I don't remember any more what the intended semantics of the 'update'
> > API are in regards to asking the user to provide a complete definition
> > of the device, but according to the 'alias' check above we seem to be
> > specifically ignoring the situation when the new definition didn't
> > provide an alias. This seems to be hinting that we ignore stuff that the
> > used didn't provide.
> > 
> > If that's the case you'll need to specifically exclude index '0' from
> > the above check.
> 
> The difference here is that alias is something that normally is not set by
> the user, but is instead auto-generated by libvirt when the guest starts and
> not stored/reported in the persistent config. So in the common case of "grab

I strongly disagree with this statement. Firstly we now have user
aliases and secondly there's always runtime related stuff in the XML.

> config XML, extract device element, modify it, send it to update-device" the
> alias wouldn't be in the XML, but if the user instead grabs the *status* XML
> then the alias *would* be there. (However, if the alias had been set by the
> user (with the prefix "ua-"), the alias would be present in the config XML.)
> So we allow for either blank (in the case that config XML was used) or
> no-change (in case alias was user-set and/or the status XML is used).

Picking config XML is semantically as wrong as generating a partial XML.
It can simply be a completely different device.

If we want the users to use the full XML when updating the device it
_must_ be the full XML snippet from the running config. Otherwise we are
back at a "partial XML" and all the problems connected to that.



Re: [libvirt PATCH 2/2] conf: log error on attempts to modify ACPI index of active device

2021-09-13 Thread Michal Prívozník
On 9/10/21 5:18 PM, Laine Stump wrote:
> On 9/10/21 3:30 AM, Peter Krempa wrote:
>> On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:
>>> The ACPI index of a device in a running guest can't be modified, and
>>> libvirt doesn't actually attempt to modify it, but it was possible for
>>> a user to request such a modification, and libvirt wouldn't complain,
>>> thus misleading the user into thinking that it had actually been
>>> changed.
>>>
>>> Resolves: https://bugzilla.redhat.com/1998920
>>>
>>> Signed-off-by: Laine Stump 
>>> ---
>>>   src/conf/domain_conf.c | 6 ++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>>> index fefc527901..7ff890d8b7 100644
>>> --- a/src/conf/domain_conf.c
>>> +++ b/src/conf/domain_conf.c
>>> @@ -28485,6 +28485,12 @@ virDomainDefCompatibleDevice(virDomainDef *def,
>>>  _("changing device alias is not allowed"));
>>>   return -1;
>>>   }
>>> +
>>> +    if (data.newInfo->acpiIndex != data.oldInfo->acpiIndex) {
>>> +    virReportError(VIR_ERR_OPERATION_DENIED, "%s",
>>> +   _("changing device 'acpi index' is not
>>> allowed"));
>>> +    return -1;
>>> +    }
>>
>> I don't remember any more what the intended semantics of the 'update'
>> API are in regards to asking the user to provide a complete definition
>> of the device, but according to the 'alias' check above we seem to be
>> specifically ignoring the situation when the new definition didn't
>> provide an alias. This seems to be hinting that we ignore stuff that the
>> used didn't provide.
>>
>> If that's the case you'll need to specifically exclude index '0' from
>> the above check.
> 
> The difference here is that alias is something that normally is not set
> by the user, but is instead auto-generated by libvirt when the guest
> starts and not stored/reported in the persistent config. So in the
> common case of "grab config XML, extract device element, modify it, send
> it to update-device" the alias wouldn't be in the XML, but if the user
> instead grabs the *status* XML then the alias *would* be there.
> (However, if the alias had been set by the user (with the prefix "ua-"),
> the alias would be present in the config XML.) So we allow for either
> blank (in the case that config XML was used) or no-change (in case alias
> was user-set and/or the status XML is used).
> 
> The acpi index, on the other hand, is always unset unless the user
> specifies it, and so is never auto-generated and always matches in both
> the config and status XMLs. So whichever way the user gets the original
> XML (status or config) they are always going to have the actual setting.
> 
> There are many other attributes that are checked to assure they match in
> (e.g.) qemuDomainChangeNet() (for example, network device model name,
> virtio options, upscript name,...) and (with one exception) none of
> those check for "not specified or not changed" - they all just check for
> "not changed". The one exception is ifname, which is another attribute
> that (like alias) is autogenerated every time the guest starts, and is
> never output in config XML (unless it was user-specified); for ifname if
> it's been left unspecified in the update XML, then the value from the
> status is copied over, making it also effectively ("unspecified or
> unchanged"). But again, this special handling isn't done for any
> attribute that isn't both auto-generated and ephemeral (as alias and
> ifname are).
> 
> So in the end, I think it's correct that validation of acpi index
> *shouldn't* ignore an "apparently missing" value in the update.
> 
> ==
> 
> BTW, something forgot to mention in the commit log - in the case of,
> e.g., network interfaces which is the only type of device where an alias
> is useful, most of the checks are in qemuDomainChangeNet() rather than
> in this virDomainDefCompatibleDevice() function. I was originally going
> to add this check into qemuDomainChangeNet() as well, but it is an
> attribute that's technically valid for any PCI device, not just
> interfaces; searching around, I found that
> virDomainDefCompatibleDevice() is called on update for all devices, and
> is already used to pretect against alias change (another attribute
> that's valid for all devices), so in the interest of consistency I put
> the check there.
> 
> That said, although I used it based on precendent, I can't say that I
> really like virDomainDefCompatibleDevice(). It is used for multiple
> unrelated things - the bit at the beginning of the function is only
> executed if action is UPDATE and live is true, and all the rest of the
> function is only useful if action is ATTACH - it's checking things that
> could verifiably never happen during a live device update (e.g. checking
> if the device is USB but the domain doesn't support USB devices, or
> checking attributes for a device type that isn't allowed to be 

[PATCH v5 16/16] kbase: Document virtio-mem

2021-09-13 Thread Michal Privoznik
This commit adds new memorydevices.rst page which should serve
all models of memory devices. Yet, I'm documenting virtio-mem
quirks only.

Signed-off-by: Michal Privoznik 
---
 docs/kbase/index.rst |   4 +
 docs/kbase/memorydevices.rst | 150 +++
 docs/kbase/meson.build   |   1 +
 3 files changed, 155 insertions(+)
 create mode 100644 docs/kbase/memorydevices.rst

diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index 351ea2c93b..09b19ed1af 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -52,6 +52,10 @@ Usage
 `PCI topology <../pci-addresses.html>`__
Addressing schemes for PCI devices
 
+`Memory devices `__
+   Memory devices and their use
+
+
 Internals / Debugging
 -
 
diff --git a/docs/kbase/memorydevices.rst b/docs/kbase/memorydevices.rst
new file mode 100644
index 00..7f58a4247b
--- /dev/null
+++ b/docs/kbase/memorydevices.rst
@@ -0,0 +1,150 @@
+==
+Memory devices
+==
+
+.. contents::
+
+Basics
+==
+
+Memory devices can be divided into two families: volatile and non-volatile.
+The former is typical RAM memory: it's volatile and thus its contents doesn't
+survive guest reboots or power cycles. The latter retains its contents across
+reboots or power outages.
+
+In Libvirt, there are two models for volatile memory:
+
+* ``dimm`` model:
+
+  ::
+
+
+  
+523264
+0
+  
+  
+
+
+* ``virtio-mem`` model:
+
+  ::
+
+
+  
+1048576
+0
+2048
+524288
+  
+  
+
+
+Then there are two models for non-volatile memory:
+
+* ``nvdimm`` model:
+
+  ::
+
+
+  
+/tmp/nvdimm
+  
+  
+523264
+0
+  
+  
+
+
+* ``virtio-pmem`` model:
+
+  ::
+
+
+  
+/tmp/virtio_pmem
+  
+  
+524288
+  
+  
+
+
+
+Please note that (maybe somewhat surprisingly) virtio models go onto PCI bus
+instead of DIMM slots.
+
+Furthermore, DIMMs can have  element which configures backend for
+devices. For NVDIMMs the element is mandatory and reflects where the content
+is saved.
+
+See `memory devices documentation <../formatdomain.html#elementsMemory>`_.
+
+``virtio-mem`` model
+
+
+The ``virtio-mem`` model can be viewed as revised memory balloon. It offers
+adding and removing memory (without the actual hotplug of the device). It
+solves problems that memory balloon can't solve on its own and thus is more
+flexible than DIMM + balloon solution. ``virtio-mem`` is NUMA aware, and thus
+memory can be inflated/deflated only for a subset of guest NUMA nodes.  Also,
+it works with chunks that are either exposed to guest or reclaimed from it.
+
+See https://virtio-mem.gitlab.io/
+
+Under the hood, ``virtio-mem`` device is split into chunks of equal size which
+are then exposed to the guest. Either all of them or only a portion depending
+on user's request. Therefore there are three important sizes for
+``virtio-mem``. All are to be found under  element:
+
+#. The maximum size the device can ever offer, exposed under 
+#. The size of a single block, exposed under 
+#. The current size exposed to the guest, exposed under 
+
+For instance, in the following example the maximum size is 4GiB, the block size
+is 2MiB and only 1GiB should be exposed to the guest:
+
+  ::
+
+
+  
+4194304
+2048
+1048576
+  
+
+
+Please note that  must be an integer multiple of 
+size or zero (no blocks exposed to the guest) and has to be less or equal to
+ (all blocks exposed to the guest). Furthermore, QEMU recommends the
+ size to be as big as a Transparent Huge Page (usually 2MiB).
+
+To change the size exposed to the guest, users should pass memory device XML
+with nothing but  changed into the
+``virDomainUpdateDeviceFlags()`` API. For user's convenience this can be done
+via virsh too:
+
+ ::
+
+   # virsh update-memory-device $dom --requested-size 2GiB
+
+If there are two or more  devices then ``--alias`` shall be used
+to tell virsh which memory device should be updated.
+
+For running guests there is fourth size that can be found under :
+
+  ::
+
+2097152
+
+The  reflects the current size used by the guest. In general it
+can differ from . Reasons include guest kernel missing
+``virtio-mem`` module and thus being unable to take offered memory, or guest
+kernel being unable to free memory.  Since  only reports size to
+users, the element is never parsed. It is formatted only into live XML.
+
+Since changing  allocation requires cooperation with guest
+kernel, requests for change are not instant. Therefore, libvirt emits
+``VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE`` event whenever current
+allocation changed.
diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build
index 73dc11837f..af067db453 100644
--- a/docs/kbase/meson.build
+++ 

[PATCH v5 08/16] Introduce property to virtio-mem

2021-09-13 Thread Michal Privoznik
The virtio-mem has another property that isn't exposed yet:
current size exposed to the guest. Please note, that this is
different to  because esp. on sizing the memory
down guest may refuse to release some blocks. Therefore, let's
have another size to report in the XML. But because of its
nature, the  won't be parsed and is report only (for
live XMLs).

Signed-off-by: Michal Privoznik 
---
 docs/formatdomain.rst | 7 +++
 docs/schemas/domaincommon.rng | 5 +
 src/conf/domain_conf.c| 9 +++--
 src/conf/domain_conf.h| 3 +++
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index c327e1deae..7dc93d6abe 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -7800,6 +7800,7 @@ Example: usage of the memory devices
  0
  2048
  1048576
+ 524288

  

@@ -7915,6 +7916,12 @@ Example: usage of the memory devices
  The total size exposed to the guest. Must respect ``block`` granularity
  and be smaller than or equal to ``size``.
 
+   ``current``
+ Active XML for ``virtio-mem`` model may contain ``current`` element that
+ reflects the current size of the corresponding virtio memory device. The
+ element is formatted into live XML and never parsed, i.e. it is
+ output-only element.
+
 :anchor:``
 
 IOMMU devices
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 82fdc7c8b5..a68f794262 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -6693,6 +6693,11 @@
 
   
 
+
+  
+
+  
+
 
   
 
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9562065499..c91dd9d81c 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -25905,7 +25905,8 @@ virDomainMemorySourceDefFormat(virBuffer *buf,
 
 static void
 virDomainMemoryTargetDefFormat(virBuffer *buf,
-   virDomainMemoryDef *def)
+   virDomainMemoryDef *def,
+   unsigned int flags)
 {
 g_auto(virBuffer) childBuf = VIR_BUFFER_INIT_CHILD(buf);
 
@@ -25927,6 +25928,10 @@ virDomainMemoryTargetDefFormat(virBuffer *buf,
 
 virBufferAsprintf(, "%llu\n",
   def->requestedsize);
+if (!(flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE)) {
+virBufferAsprintf(, "%llu\n",
+  def->currentsize);
+}
 }
 
 virXMLFormatElement(buf, "target", NULL, );
@@ -25959,7 +25964,7 @@ virDomainMemoryDefFormat(virBuffer *buf,
 if (virDomainMemorySourceDefFormat(buf, def) < 0)
 return -1;
 
-virDomainMemoryTargetDefFormat(buf, def);
+virDomainMemoryTargetDefFormat(buf, def, flags);
 
 virDomainDeviceInfoFormat(buf, >info, flags);
 
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index e9d8e0c7ae..91bf52ff3b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2495,6 +2495,9 @@ struct _virDomainMemoryDef {
 unsigned long long labelsize; /* kibibytes; valid only for NVDIMM */
 unsigned long long blocksize; /* kibibytes; valid only for VIRTIO_MEM */
 unsigned long long requestedsize; /* kibibytes; valid only for VIRTIO_MEM 
*/
+unsigned long long currentsize; /* kibibytes, valid for VIRTIO_MEM and
+   active domain only, only to report never
+   parse */
 bool readonly; /* valid only for NVDIMM */
 
 /* required for QEMU NVDIMM ppc64 support */
-- 
2.32.0



[PATCH v5 15/16] news: document recent virtio memory addition

2021-09-13 Thread Michal Privoznik
Signed-off-by: Michal Privoznik 
---
 NEWS.rst | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index 4521499db7..68afc01633 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -17,6 +17,13 @@ v7.8.0 (unreleased)
 
 * **New features**
 
+  * Introduce virtio-mem  model
+
+New virtio-mem model is introduced for  device which is a
+paravirtualized mechanism of adding/removing memory to/from a VM. Use
+``virDomainUpdateDeviceFlags()`` API to adjust amount of memory or ``virsh
+update-memory-device`` for convenience.
+
 * **Improvements**
 
 * **Bug fixes**
-- 
2.32.0



[PATCH v5 13/16] qemuDomainSetMemoryFlags: Take virtio-mem into consideration

2021-09-13 Thread Michal Privoznik
The qemuDomainSetMemoryFlags() allows for memballoon
() changes for both active and inactive guests.
And just before doing any change, we have to make sure that the
new size is not greater than the total memory ().

However, the total memory includes not only the regular guest
memory, but also sum of maximum sizes of all virtio-mems (in fact
all memory devices for that matter). But virtio-mem devices are
modified differently (via virDomainUpdateDevice()) and thus the
upper limit for new balloon size has to be lowered.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_driver.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 84adeec1c2..171a5217a4 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2388,12 +2388,28 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, 
unsigned long newmem,
 } else {
 /* resize the current memory */
 unsigned long oldmax = 0;
+size_t i;
 
-if (def)
+if (def) {
 oldmax = virDomainDefGetMemoryTotal(def);
+
+/* While virtio-mem is regular mem from guest POV, it can't be
+ * modified through this API. */
+for (i = 0; i < def->nmems; i++) {
+if (def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
+oldmax -= def->mems[i]->size;
+}
+}
+
 if (persistentDef) {
-if (!oldmax || oldmax > virDomainDefGetMemoryTotal(persistentDef))
+if (!def || oldmax > virDomainDefGetMemoryTotal(persistentDef)) {
 oldmax = virDomainDefGetMemoryTotal(persistentDef);
+
+for (i = 0; i < persistentDef->nmems; i++) {
+if (persistentDef->mems[i]->model == 
VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
+oldmax -= persistentDef->mems[i]->size;
+}
+}
 }
 
 if (newmem > oldmax) {
-- 
2.32.0



[PATCH v5 10/16] qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event

2021-09-13 Thread Michal Privoznik
As advertised in previous commit, this event is delivered to us
when virtio-mem module changes the allocation inside the guest.
It comes with one attribute - size - which holds the new size of
the virtio-mem (well, allocated size), in bytes.
Mind you, this is not necessarily the same number as 'requested
size'. It almost certainly will be when sizing the memory up, but
it might not be when sizing the memory down - the guest kernel
might be unable to free some blocks.

This current size is reported in the domain XML as an output
element only.

Signed-off-by: Michal Privoznik 
---
 examples/c/misc/event-test.c| 17 ++
 include/libvirt/libvirt-domain.h| 24 +
 src/conf/domain_event.c | 84 +
 src/conf/domain_event.h | 10 
 src/libvirt_private.syms|  2 +
 src/qemu/qemu_domain.c  |  3 ++
 src/qemu/qemu_domain.h  |  1 +
 src/qemu/qemu_driver.c  | 37 +
 src/qemu/qemu_monitor.c | 21 
 src/qemu/qemu_monitor.h | 20 +++
 src/qemu/qemu_monitor_json.c| 24 +
 src/qemu/qemu_process.c | 42 +++
 src/remote/remote_daemon_dispatch.c | 30 +++
 src/remote/remote_driver.c  | 32 +++
 src/remote/remote_protocol.x| 15 +-
 src/remote_protocol-structs |  7 +++
 tools/virsh-domain.c| 20 +++
 17 files changed, 388 insertions(+), 1 deletion(-)

diff --git a/examples/c/misc/event-test.c b/examples/c/misc/event-test.c
index 10c707e66b..1eec76c79d 100644
--- a/examples/c/misc/event-test.c
+++ b/examples/c/misc/event-test.c
@@ -982,6 +982,22 @@ myDomainEventMemoryFailureCallback(virConnectPtr conn 
G_GNUC_UNUSED,
 }
 
 
+static int
+myDomainEventMemoryDeviceSizeChangeCallback(virConnectPtr conn G_GNUC_UNUSED,
+virDomainPtr dom,
+const char *alias,
+unsigned long long size,
+void *opaque G_GNUC_UNUSED)
+{
+/* Casts to uint64_t to work around mingw not knowing %lld */
+printf("%s EVENT: Domain %s(%d) memory device size change: "
+   "alias: '%s' new size %" PRIu64 "'\n",
+   __func__, virDomainGetName(dom), virDomainGetID(dom),
+   alias, (uint64_t)size);
+return 0;
+}
+
+
 static int
 myDomainEventMigrationIterationCallback(virConnectPtr conn G_GNUC_UNUSED,
 virDomainPtr dom,
@@ -1113,6 +1129,7 @@ struct domainEventData domainEvents[] = {
 DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_METADATA_CHANGE, 
myDomainEventMetadataChangeCallback),
 DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD, 
myDomainEventBlockThresholdCallback),
 DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE, 
myDomainEventMemoryFailureCallback),
+DOMAIN_EVENT(VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE, 
myDomainEventMemoryDeviceSizeChangeCallback),
 };
 
 struct storagePoolEventData {
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 7ef8ac51e5..eaafcc6b29 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -4655,6 +4655,29 @@ typedef void 
(*virConnectDomainEventMemoryFailureCallback)(virConnectPtr conn,
void *opaque);
 
 
+/**
+ * virConnectDomainEventMemoryDeviceSizeChangeCallback:
+ * @conn: connection object
+ * @dom: domain on which the event occurred
+ * @alias: memory device alias
+ * @size: new current size of memory device (in KiB)
+ * @opaque: application specified data
+ *
+ * The callback occurs when the guest acknowledges request to change size of
+ * memory device (so far only virtio-mem model supports this). The @size then
+ * reflects the new amount of guest visible memory (in kibibytes).
+ *
+ * The callback signature to use when registering for an event of type
+ * VIR_DOMAIN_EVENT_ID_MEMORY_DEVICE_SIZE_CHANGE with
+ * virConnectDomainEventRegisterAny().
+ */
+typedef void 
(*virConnectDomainEventMemoryDeviceSizeChangeCallback)(virConnectPtr conn,
+
virDomainPtr dom,
+const char 
*alias,
+unsigned 
long long size,
+void 
*opaque);
+
+
 /**
  * VIR_DOMAIN_EVENT_CALLBACK:
  *
@@ -4698,6 +4721,7 @@ typedef enum {
 VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* 
virConnectDomainEventMetadataChangeCallback */
 VIR_DOMAIN_EVENT_ID_BLOCK_THRESHOLD = 24, /* 
virConnectDomainEventBlockThresholdCallback */
 VIR_DOMAIN_EVENT_ID_MEMORY_FAILURE = 25,  /* 
virConnectDomainEventMemoryFailureCallback */
+

[PATCH v5 01/16] virhostmem: Introduce virHostMemGetTHPSize()

2021-09-13 Thread Michal Privoznik
New virHostMemGetTHPSize() is introduced which allows caller to
obtain THP PMD (Page Middle Directory) size, which is equal to
the minimal size that THP can use, taken from kernel doc
(Documentation/admin-guide/mm/transhuge.rst):

  Some userspace (such as a test program, or an optimized memory allocation
  library) may want to know the size (in bytes) of a transparent hugepage::

cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size

Since this size depends on the host architecture and the kernel
it won't change whilst libvirtd is running. Therefore, we can use
virOnce() and cache the value. Of course, we can be running under
kernel that has THP disabled or has no notion of THP at all. In
that case a negative value is returned to signal error.

Signed-off-by: Michal Privoznik 
---
 src/libvirt_private.syms |  1 +
 src/util/virhostmem.c| 54 
 src/util/virhostmem.h|  3 +++
 tests/domaincapsmock.c   |  9 +++
 4 files changed, 67 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 2778fe7f8f..d42d3abcce 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2408,6 +2408,7 @@ virHostMemGetFreePages;
 virHostMemGetInfo;
 virHostMemGetParameters;
 virHostMemGetStats;
+virHostMemGetTHPSize;
 virHostMemSetParameters;
 
 
diff --git a/src/util/virhostmem.c b/src/util/virhostmem.c
index b13c3fe38e..5984dfab3f 100644
--- a/src/util/virhostmem.c
+++ b/src/util/virhostmem.c
@@ -45,11 +45,14 @@
 #include "virstring.h"
 #include "virnuma.h"
 #include "virlog.h"
+#include "virthread.h"
 
 #define VIR_FROM_THIS VIR_FROM_NONE
 
 VIR_LOG_INIT("util.hostmem");
 
+static unsigned long long virHostTHPPMDSize; /* in kibibytes */
+static virOnceControl virHostMemGetTHPSizeOnce = VIR_ONCE_CONTROL_INITIALIZER;
 
 #ifdef __FreeBSD__
 # define BSD_MEMORY_STATS_ALL 4
@@ -930,3 +933,54 @@ virHostMemAllocPages(unsigned int npages,
 
 return ncounts;
 }
+
+#if defined(__linux__)
+# define HPAGE_PMD_SIZE_PATH 
"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
+static void
+virHostMemGetTHPSizeSysfs(unsigned long long *size)
+{
+if (virFileReadValueUllong(size, "%s", HPAGE_PMD_SIZE_PATH) < 0) {
+VIR_WARN("unable to get THP PMD size: %s", g_strerror(errno));
+return;
+}
+
+/* Size is now in bytes. Convert to KiB. */
+*size >>= 10;
+}
+#endif /* defined(__linux__) */
+
+
+static void
+virHostMemGetTHPSizeOnceInit(void)
+{
+#if defined(__linux__)
+virHostMemGetTHPSizeSysfs();
+#else /* !defined(__linux__) */
+VIR_WARN("Getting THP size not ported yet");
+#endif /* !defined(__linux__) */
+}
+
+
+/**
+ * virHostMemGetTHPSize:
+ * @size: returned size of THP in kibibytes
+ *
+ * Obtain Transparent Huge Page size in kibibytes. The size
+ * depends on host architecture and kernel. Because of virOnce(),
+ * do not rely on errno in case of failure.
+ *
+ * Returns: 0 on success,
+ * -1 on failure.
+ */
+int
+virHostMemGetTHPSize(unsigned long long *size)
+{
+if (virOnce(, virHostMemGetTHPSizeOnceInit) < 0)
+return -1;
+
+if (virHostTHPPMDSize == 0)
+return -1;
+
+*size = virHostTHPPMDSize;
+return 0;
+}
diff --git a/src/util/virhostmem.h b/src/util/virhostmem.h
index 3265215d84..c36de94f0f 100644
--- a/src/util/virhostmem.h
+++ b/src/util/virhostmem.h
@@ -55,3 +55,6 @@ int virHostMemAllocPages(unsigned int npages,
  unsigned int cellCount,
  int lastCell,
  bool add);
+
+int virHostMemGetTHPSize(unsigned long long *size)
+G_GNUC_NO_INLINE;
diff --git a/tests/domaincapsmock.c b/tests/domaincapsmock.c
index 0a6c541f77..d382d06e27 100644
--- a/tests/domaincapsmock.c
+++ b/tests/domaincapsmock.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "virhostcpu.h"
+#include "virhostmem.h"
 
 #if WITH_QEMU
 # include "virmock.h"
@@ -51,3 +52,11 @@ virQEMUCapsGetKVMSupportsSecureGuest(virQEMUCaps *qemuCaps)
 return real_virQEMUCapsGetKVMSupportsSecureGuest(qemuCaps);
 }
 #endif
+
+int
+virHostMemGetTHPSize(unsigned long long *size)
+{
+/* Pretend Transparent Huge Page size is 2MiB. */
+*size = 2048;
+return 0;
+}
-- 
2.32.0



[PATCH v5 04/16] conf: Introduce virtio-mem model

2021-09-13 Thread Michal Privoznik
The virtio-mem is paravirtualized mechanism of adding/removing
memory to/from a VM. A virtio-mem-pci device is split into blocks
of equal size which are then exposed (all or only a requested
portion of them) to the guest kernel to use as regular memory.
Therefore, the device has two important attributes:

  1) block-size, which defines the size of a block
  2) requested-size, which defines how much memory (in bytes)
 is the device requested to expose to the guest.

The 'block-size' is configured on command line and immutable
throughout device's lifetime. The 'requested-size' can be set on
the command line too, but also is adjustable via monitor. In
fact, that is how management software places its requests to
change the memory allocation. If it wants to give more memory to
the guest it changes 'requested-size' to a bigger value, and if it
wants to shrink guest memory it changes the 'requested-size' to a
smaller value. Note, value of zero means that guest should
release all memory offered by the device. Of course, guest has to
cooperate. Therefore, there is a third attribute 'size' which is
read only and reflects how much memory the guest still has. This
can be different to 'requested-size', obviously. Because of name
clash, I've named it 'current' and it is dealt with in future
commits (it is a runtime information anyway).

In the backend, memory for virtio-mem is backed by usual objects:
memory-backend-{ram,file,memfd} and their size puts the cap on
the amount of memory that a virtio-mem device can offer to a
guest. But we are already able to express this info using 
under .

Therefore, we need only two more elements to cover 'block-size'
and 'requested-size' attributes. This is the XML I've came up
with:

  

  1-3
  2048


  2097152
  0
  2048
  1048576


  

I hope by now it is obvious that:

  1) 'requested-size' must be an integer multiple of
 'block-size', and
  2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
 address.

Then there is a limitation that the minimal 'block-size' is
transparent huge page size (I'll leave this without explanation).

Signed-off-by: Michal Privoznik 
---
 docs/formatdomain.rst | 38 +--
 docs/schemas/domaincommon.rng | 11 +++
 src/conf/domain_conf.c| 53 ++-
 src/conf/domain_conf.h|  3 +
 src/conf/domain_validate.c| 39 +++
 src/qemu/qemu_alias.c |  1 +
 src/qemu/qemu_command.c   |  1 +
 src/qemu/qemu_domain.c| 10 +++
 src/qemu/qemu_domain_address.c| 38 ---
 src/qemu/qemu_process.c   |  2 +
 src/qemu/qemu_validate.c  |  8 +++
 src/security/security_apparmor.c  |  1 +
 src/security/security_dac.c   |  2 +
 src/security/security_selinux.c   |  2 +
 .../memory-hotplug-virtio-mem.xml | 67 +++
 ...emory-hotplug-virtio-mem.x86_64-latest.xml |  1 +
 tests/qemuxml2xmltest.c   |  1 +
 17 files changed, 261 insertions(+), 17 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
 create mode 12 
tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 9173d9f1d6..c327e1deae 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -7790,6 +7790,18 @@ Example: usage of the memory devices
  524288

  
+ 
+   
+ 1-3
+ 2048
+   
+   
+ 2097152
+ 0
+ 2048
+ 1048576
+   
+ 

...
 
@@ -7797,7 +7809,8 @@ Example: usage of the memory devices
Provide ``dimm`` to add a virtual DIMM module to the guest. :since:`Since
1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
:since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a paravirtualized
-   persistent memory device. :since:`Since 7.1.0`
+   persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` model
+   to add paravirtualized memory device. :since:`Since 7.4.0`
 
 ``access``
An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
@@ -7820,10 +7833,11 @@ Example: usage of the memory devices
allowed only for ``model='nvdimm'`` for pSeries guests. :since:`Since 6.2.0`
 
 ``source``
-   For model ``dimm`` this element is optional and allows to fine tune the
-   source of the memory used for the given memory device. If the element is not
-   provided defaults configured via ``numatune`` are used. If ``dimm`` is
-   provided, then the following optional elements can be provided as well:
+   For model ``dimm`` and model ``virtio-mem`` this element is optional and
+   allows to fine tune the source of the memory used for the given memory
+   

[PATCH v5 11/16] qemu: Refresh the current size of virtio-mem on monitor reconnect

2021-09-13 Thread Michal Privoznik
If the QEMU driver restarts it loses the track of the current size
of virtio-mem (because it's runtime type of information and thus
not stored in XML) and therefore, we have to refresh it when
reconnecting to the domain monitor.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_domain.c   | 20 +++--
 src/qemu/qemu_monitor.h  |  3 ++
 src/qemu/qemu_monitor_json.c | 58 ++--
 src/qemu/qemu_process.c  |  3 ++
 4 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 325e45d009..00c6ae877c 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -8375,9 +8375,23 @@ qemuDomainUpdateMemoryDeviceInfo(virQEMUDriver *driver,
 if (!(dimm = virHashLookup(meminfo, mem->info.alias)))
 continue;
 
-mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM;
-mem->info.addr.dimm.slot = dimm->slot;
-mem->info.addr.dimm.base = dimm->address;
+switch (mem->model) {
+case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+mem->currentsize = VIR_DIV_UP(dimm->size, 1024);
+break;
+
+case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
+mem->info.type = VIR_DOMAIN_DEVICE_ADDRESS_TYPE_DIMM;
+mem->info.addr.dimm.slot = dimm->slot;
+mem->info.addr.dimm.base = dimm->address;
+break;
+
+case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+case VIR_DOMAIN_MEMORY_MODEL_NONE:
+case VIR_DOMAIN_MEMORY_MODEL_LAST:
+break;
+}
 }
 
 virHashFree(meminfo);
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 962b516542..648fe293ed 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1385,10 +1385,13 @@ int qemuMonitorSetIOThread(qemuMonitor *mon,
 
 typedef struct _qemuMonitorMemoryDeviceInfo qemuMonitorMemoryDeviceInfo;
 struct _qemuMonitorMemoryDeviceInfo {
+/* For pc-dimm */
 unsigned long long address;
 unsigned int slot;
 bool hotplugged;
 bool hotpluggable;
+/* For virtio-mem */
+unsigned long long size; /* in bytes */
 };
 
 int qemuMonitorGetMemoryDeviceInfo(qemuMonitor *mon,
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index ddce0fd2cd..a4e60b4ee9 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -7984,7 +7984,6 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
 virJSONValue *cmd;
 virJSONValue *reply = NULL;
 virJSONValue *data = NULL;
-qemuMonitorMemoryDeviceInfo *meminfo = NULL;
 size_t i;
 
 if (!(cmd = qemuMonitorJSONMakeCommand("query-memory-devices", NULL)))
@@ -8005,6 +8004,9 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
 
 for (i = 0; i < virJSONValueArraySize(data); i++) {
 virJSONValue *elem = virJSONValueArrayGet(data, i);
+g_autofree qemuMonitorMemoryDeviceInfo *meminfo = NULL;
+virJSONValue *dimminfo;
+const char *devalias;
 const char *type;
 
 if (!(type = virJSONValueObjectGetString(elem, "type"))) {
@@ -8014,26 +8016,26 @@ qemuMonitorJSONGetMemoryDeviceInfo(qemuMonitor *mon,
 goto cleanup;
 }
 
+if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) {
+virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+   _("query-memory-devices reply data doesn't "
+ "contain enum data"));
+goto cleanup;
+}
+
+/* While 'id' attribute is marked as optional in QEMU's QAPI
+ * specification, Libvirt always sets it. Thus we can fail if not
+ * present. */
+if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) {
+virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+   _("dimm memory info data is missing 'id'"));
+goto cleanup;
+}
+
+meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1);
+
 /* dimm memory devices */
 if (STREQ(type, "dimm")) {
-virJSONValue *dimminfo;
-const char *devalias;
-
-if (!(dimminfo = virJSONValueObjectGetObject(elem, "data"))) {
-virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-   _("query-memory-devices reply data doesn't "
- "contain enum data"));
-goto cleanup;
-}
-
-if (!(devalias = virJSONValueObjectGetString(dimminfo, "id"))) {
-virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-   _("dimm memory info data is missing 'id'"));
-goto cleanup;
-}
-
-meminfo = g_new0(qemuMonitorMemoryDeviceInfo, 1);
-
 if (virJSONValueObjectGetNumberUlong(dimminfo, "addr",
  >address) < 0) {
   

[PATCH v5 07/16] qemu: Wire up offline update

2021-09-13 Thread Michal Privoznik
Updating offline XML of  devices might come handy when
dealing with virtio-mem devices. But it's implemented to just
replace one virDomainMemoryDef with another so it can be used to
change almost anything.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_driver.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d05052a3d7..10ff8519c5 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7764,6 +7764,7 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef,
 virDomainDiskDef *newDisk;
 virDomainGraphicsDef *newGraphics;
 virDomainNetDef *net;
+virDomainMemoryDef *mem;
 virDomainDeviceDef oldDev = { .type = dev->type };
 int pos;
 
@@ -7826,6 +7827,23 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef,
 dev->data.net = NULL;
 break;
 
+case VIR_DOMAIN_DEVICE_MEMORY:
+mem = virDomainMemoryFindByDeviceInfo(vmdef, >data.memory->info, 
);
+if (!mem) {
+virReportError(VIR_ERR_INVALID_ARG, "%s", _("memory not found"));
+return -1;
+}
+
+oldDev.data.memory = mem;
+if (virDomainDefCompatibleDevice(vmdef, dev, ,
+ VIR_DOMAIN_DEVICE_ACTION_UPDATE,
+ false) < 0)
+return -1;
+
+virDomainMemoryDefFree(vmdef->mems[pos]);
+vmdef->mems[pos] = g_steal_pointer(>data.memory);
+break;
+
 case VIR_DOMAIN_DEVICE_FS:
 case VIR_DOMAIN_DEVICE_INPUT:
 case VIR_DOMAIN_DEVICE_SOUND:
@@ -7842,7 +7860,6 @@ qemuDomainUpdateDeviceConfig(virDomainDef *vmdef,
 case VIR_DOMAIN_DEVICE_CONTROLLER:
 case VIR_DOMAIN_DEVICE_REDIRDEV:
 case VIR_DOMAIN_DEVICE_CHR:
-case VIR_DOMAIN_DEVICE_MEMORY:
 case VIR_DOMAIN_DEVICE_NONE:
 case VIR_DOMAIN_DEVICE_TPM:
 case VIR_DOMAIN_DEVICE_PANIC:
-- 
2.32.0



[PATCH v5 09/16] conf: Introduce virDomainMemoryFindByDeviceAlias()

2021-09-13 Thread Michal Privoznik
This function will be needed in the next commit where we will
want to find virtio-mem given its alias by QEMU on the monitor.

Signed-off-by: Michal Privoznik 
---
 src/conf/domain_conf.c   | 17 +
 src/conf/domain_conf.h   |  5 +
 src/libvirt_private.syms |  1 +
 3 files changed, 23 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index c91dd9d81c..ae0218a674 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16618,6 +16618,23 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *def,
 }
 
 
+virDomainMemoryDef *
+virDomainMemoryFindByDeviceAlias(virDomainDef *def,
+ const char *alias)
+{
+size_t i;
+
+for (i = 0; i < def->nmems; i++) {
+virDomainMemoryDef *tmp = def->mems[i];
+
+if (STREQ_NULLABLE(tmp->info.alias, alias))
+return tmp;
+}
+
+return NULL;
+}
+
+
 /**
  * virDomainMemoryInsert:
  *
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 91bf52ff3b..2ce32934d5 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -3815,6 +3815,11 @@ virDomainMemoryFindByDeviceInfo(virDomainDef *dev,
 int *pos)
 ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
 
+virDomainMemoryDef *
+virDomainMemoryFindByDeviceAlias(virDomainDef *def,
+ const char *alias)
+ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
+
 int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem)
 ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
 bool virDomainShmemDefEquals(virDomainShmemDef *src, virDomainShmemDef *dst)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 3f9035c25b..dfb1e29ca0 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -508,6 +508,7 @@ virDomainMemballoonModelTypeFromString;
 virDomainMemballoonModelTypeToString;
 virDomainMemoryDefFree;
 virDomainMemoryFindByDef;
+virDomainMemoryFindByDeviceAlias;
 virDomainMemoryFindByDeviceInfo;
 virDomainMemoryFindInactiveByDef;
 virDomainMemoryInsert;
-- 
2.32.0



[PATCH v5 03/16] qemu_capabilities: Introduce QEMU_CAPS_MEMORY_BACKEND_RESERVE

2021-09-13 Thread Michal Privoznik
This capability tracks whether memory-backend-* supports .reserve
attribute which is going to be important for backends associated
with virtio-mem devices.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_capabilities.c | 4 
 src/qemu/qemu_capabilities.h | 3 +++
 tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 +
 3 files changed, 8 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 711dc2f248..f92ef069d1 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -639,6 +639,9 @@ VIR_ENUM_IMPL(virQEMUCaps,
   "s390-pv-guest", /* QEMU_CAPS_S390_PV_GUEST */
   "set-action", /* QEMU_CAPS_SET_ACTION */
   "virtio-mem-pci", /* QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI */
+
+  /* 410 */
+  "memory-backend-file.reserve", /* 
QEMU_CAPS_MEMORY_BACKEND_RESERVE */
 );
 
 
@@ -1719,6 +1722,7 @@ static struct virQEMUCapsStringFlags 
virQEMUCapsObjectPropsMemoryBackendFile[] =
  * supported. The 'x-' prefix was kept for compatibility with already
  * released qemu versions. */
 { "x-use-canonical-path-for-ramblock-id", 
QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID },
+{ "reserve", QEMU_CAPS_MEMORY_BACKEND_RESERVE },
 };
 
 static struct virQEMUCapsStringFlags 
virQEMUCapsObjectPropsMemoryBackendMemfd[] = {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 1e88591975..79a7df4f7d 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -620,6 +620,9 @@ typedef enum { /* virQEMUCapsFlags grouping marker for 
syntax-check */
 QEMU_CAPS_SET_ACTION, /* 'set-action' QMP command */
 QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */
 
+/* 410 */
+QEMU_CAPS_MEMORY_BACKEND_RESERVE, /* -object memory-backend-*.reserve= */
+
 QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
 
diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
index 20ec221deb..ba4b65f04d 100644
--- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
@@ -257,6 +257,7 @@
   
   
   
+  
   6001000
   0
   43100243
-- 
2.32.0



[PATCH v5 05/16] qemu: Build command line for virtio-mem

2021-09-13 Thread Michal Privoznik
Nothing special is happening here. All important changes were
done when for 'virtio-pmem' (adjusting the code to put virtio
memory on PCI bus, generating alias using
qemuDomainDeviceAliasIndex(). The only bit that might look
suspicious is no prealloc for virtio-mem. But if you think about
it, the whole purpose of this device is to change amount of
memory exposed to guest on the fly. There is no point in locking
the whole backend in memory.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_alias.c |  9 +++-
 src/qemu/qemu_command.c   | 24 +--
 ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++
 tests/qemuxml2argvtest.c  |  1 +
 4 files changed, 70 insertions(+), 5 deletions(-)
 create mode 100644 
tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args

diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c
index 79e8953b2f..81a1e7eeed 100644
--- a/src/qemu/qemu_alias.c
+++ b/src/qemu/qemu_alias.c
@@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def,
 size_t i;
 int maxidx = 0;
 
-/* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */
-if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM)
+/* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not
+ * valid */
+if (!oldAlias &&
+mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM &&
+mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
 return mem->info.addr.dimm.slot;
 
 for (i = 0; i < def->nmems; i++) {
@@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def,
 prefix = "virtiopmem";
 break;
 case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+prefix = "virtiomem";
+break;
 case VIR_DOMAIN_MEMORY_MODEL_NONE:
 case VIR_DOMAIN_MEMORY_MODEL_LAST:
 default:
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 594e5643b1..91f3094ac8 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3136,9 +3136,19 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps,
 virJSONValueObjectAdd(props, "b:x-use-canonical-path-for-ramblock-id", 
false, NULL) < 0)
 return -1;
 
-if (!priv->memPrealloc &&
-virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0)
-return -1;
+if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) {
+/* Explicitly disable prealloc for virtio-mem */
+if (priv->memPrealloc &&
+virJSONValueObjectAppendBoolean(props, "prealloc", 0) < 0)
+return -1;
+if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MEMORY_BACKEND_RESERVE) &&
+virJSONValueObjectAppendBoolean(props, "reserve", 0) < 0)
+return -1;
+} else {
+if (!priv->memPrealloc &&
+virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0)
+return -1;
+}
 
 if (virJSONValueObjectAdd(props, "U:size", mem->size * 1024, NULL) < 0)
 return -1;
@@ -3318,6 +3328,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def,
 break;
 
 case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+device = "virtio-mem-pci";
+break;
+
 case VIR_DOMAIN_MEMORY_MODEL_NONE:
 case VIR_DOMAIN_MEMORY_MODEL_LAST:
 default:
@@ -3334,6 +3347,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def,
 if (mem->labelsize)
 virBufferAsprintf(, "label-size=%llu,", mem->labelsize * 1024);
 
+if (mem->blocksize) {
+virBufferAsprintf(, "block-size=%llu,", mem->blocksize * 1024);
+virBufferAsprintf(, "requested-size=%llu,", mem->requestedsize * 
1024);
+}
+
 if (mem->uuid) {
 char uuidstr[VIR_UUID_STRING_BUFLEN];
 
diff --git 
a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args 
b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
new file mode 100644
index 00..22ee1bc459
--- /dev/null
+++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
@@ -0,0 +1,41 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+/usr/bin/qemu-system-i386 \
+-name guest=QEMUGuest1,debug-threads=on \
+-S \
+-object 
'{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/tmp/lib/domain--1-QEMUGuest1/master-key.aes"}'
 \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu qemu64 \
+-m size=2095104k,slots=16,maxmem=1099511627776k \
+-overcommit mem-lock=off \
+-smp 2,sockets=2,dies=1,cores=1,threads=1 \
+-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":2145386496}' 
\
+-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
+-object 
'{"qom-type":"memory-backend-ram","id":"memvirtiomem0","reserve":false,"size":1073741824}'
 \
+-device 

[PATCH v5 14/16] virsh: Introduce update-memory-device command

2021-09-13 Thread Michal Privoznik
New 'update-memory-device' command is introduced which aims on
making it user friendly to change  device. So far I just
need to change  so I'm introducing --requested-size
only; but the idea is that this is extensible for other cases
too. For instance, want to change ? A new
--my-element argument can be easily introduced.

Signed-off-by: Michal Privoznik 
---
 docs/manpages/virsh.rst |  30 
 tools/virsh-domain.c| 161 
 2 files changed, 191 insertions(+)

diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst
index 9561b3f59d..c57937a55a 100644
--- a/docs/manpages/virsh.rst
+++ b/docs/manpages/virsh.rst
@@ -4919,6 +4919,36 @@ results as some fields may be autogenerated and thus 
match devices other than
 expected.
 
 
+update-memory-device
+
+
+**Syntax:**
+
+::
+
+   update-memory-device domain [--print-xml] [[--alias alias] | [--node node]]
+ [[--live] [--config] | [--current]]
+ [--requested-size size]
+
+This command finds  device inside given *domain*, changes
+requested values and passes updated device XML to daemon. If *--print-xml* is
+specified then the device is not changed, but the updated device XML is printed
+to stdout.  If there are more than one  devices in *domain* use
+*--alias* or *--node* to select the desired one.
+
+If *--live* is specified, affect a running domain.
+If *--config* is specified, affect the next startup of a persistent guest.
+If *--current* is specified, it is equivalent to either *--live* or
+*--config*, depending on the current state of the guest.
+Both *--live* and *--config* flags may be given, but *--current* is
+exclusive. Not specifying any flag is the same as specifying *--current*.
+
+If *--requested-size* is specified then  under memory target is
+changed to requested *size* (as scaled integer, see ``NOTES`` above). It
+defaults to kibibytes if no suffix is provided. The option is valid only for
+``virtio-mem`` memory device model.
+
+
 change-media
 
 
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index b06f651c13..0c02ce5ae0 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -8910,6 +8910,161 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd)
 return true;
 }
 
+
+/*
+ * "update-memory-device" command
+ */
+static const vshCmdInfo info_update_memory_device[] = {
+{.name = "help",
+ .data = N_("update memory device of a domain")
+},
+{.name = "desc",
+ .data = N_("Update values of a memory device of a domain")
+},
+{.name = NULL}
+};
+
+static const vshCmdOptDef opts_update_memory_device[] = {
+VIRSH_COMMON_OPT_DOMAIN_FULL(0),
+VIRSH_COMMON_OPT_DOMAIN_CONFIG,
+VIRSH_COMMON_OPT_DOMAIN_LIVE,
+VIRSH_COMMON_OPT_DOMAIN_CURRENT,
+{.name = "print-xml",
+ .type = VSH_OT_BOOL,
+ .help = N_("print updated memory device XML instead of executing the 
change")
+},
+{.name = "alias",
+ .type = VSH_OT_STRING,
+ .completer = virshDomainDeviceAliasCompleter,
+ .help = N_("memory device alias")
+},
+{.name = "node",
+ .type = VSH_OT_INT,
+ .help = N_("memory device target node")
+},
+{.name = "requested-size",
+ .type = VSH_OT_INT,
+ .help = N_("new value of  size, as scaled integer (default 
KiB)")
+},
+{.name = NULL}
+};
+
+static int
+virshGetUpdatedMemoryXML(char **updatedMemoryXML,
+ vshControl *ctl,
+ const vshCmd *cmd,
+ virDomainPtr dom,
+ unsigned int flags)
+{
+const char *alias = NULL;
+bool nodeOpt = false;
+unsigned int node = 0;
+g_autoptr(xmlDoc) doc = NULL;
+g_autoptr(xmlXPathContext) ctxt = NULL;
+g_autofree char *xpath = NULL;
+int nmems;
+g_autofree xmlNodePtr *mems = NULL;
+unsigned int domainXMLFlags = 0;
+
+if (flags & VIR_DOMAIN_AFFECT_CONFIG)
+domainXMLFlags |= VIR_DOMAIN_XML_INACTIVE;
+
+if (virshDomainGetXMLFromDom(ctl, dom, domainXMLFlags, , ) < 0)
+return -1;
+
+nodeOpt = vshCommandOptBool(cmd, "node");
+if (vshCommandOptStringReq(ctl, cmd, "alias", ) < 0 ||
+vshCommandOptUInt(ctl, cmd, "node", ) < 0) {
+return -1;
+}
+
+if (nodeOpt) {
+xpath = g_strdup_printf("/domain/devices/memory[./target/node='%u']", 
node);
+} else if (alias) {
+xpath = g_strdup_printf("/domain/devices/memory[./alias/@name='%s']", 
alias);
+} else {
+xpath = g_strdup("/domain/devices/memory");
+}
+
+nmems = virXPathNodeSet(xpath, ctxt, );
+if (nmems < 0) {
+vshSaveLibvirtError();
+return -1;
+} else if (nmems == 0) {
+vshError(ctl, _("no memory device found"));
+return -1;
+} else if (nmems > 1) {
+vshError(ctl, _("multiple memory devices found, use --alias or --node 
to select one"));
+return -1;
+}
+
+ctxt->node = mems[0];
+
+

[PATCH v5 12/16] qemu: Account for both memballoon and virtio-mem

2021-09-13 Thread Michal Privoznik
Reporting how much memory is exposed to the guest happens under
 which is taken from def->mem.cur_balloon. The
reported amount should account for both balloon size and the sum
of @currentsize of all virtio-mems. For instance, if domain has
4GiB via balloon and additional 2GiB via virtio-mem, then the
domain XML should report 6GiB. The same applies for domain
statistics.

The way to achieve this is to account for either balloon or
virtio-mem when the size of the other is changed, e.g. on balloon
change we have to add all @currentsize (for non virtio-mem these
will be zero, so the check for memory model is needless, but
makes it more obvious what's happening), and vice versa.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_driver.c  |  9 +
 src/qemu/qemu_process.c | 25 +
 2 files changed, 34 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 712ec68f7c..84adeec1c2 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4261,6 +4261,7 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver,
 {
 virDomainMemoryDef *mem = NULL;
 virObjectEvent *event = NULL;
+unsigned long long balloon;
 
 if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
 return;
@@ -4276,7 +4277,15 @@ processMemoryDeviceSizeChange(virQEMUDriver *driver,
 goto endjob;
 }
 
+/* If this looks weird it's because it is. The balloon size
+ * as reported by QEMU does not include any of @currentsize.
+ * It really contains just the balloon size. But in domain
+ * definition we want to report also sum of @currentsize. Do
+ * a bit of math to fix the domain definition. */
+balloon = vm->def->mem.cur_balloon - mem->currentsize;
 mem->currentsize = VIR_DIV_UP(info->size, 1024);
+balloon += mem->currentsize;
+vm->def->mem.cur_balloon = balloon;
 
 event = virDomainEventMemoryDeviceSizeChangeNewFromObj(vm,
info->devAlias,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index b2ac5937ec..f8e7345329 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1213,10 +1213,22 @@ qemuProcessHandleBalloonChange(qemuMonitor *mon 
G_GNUC_UNUSED,
 virQEMUDriver *driver = opaque;
 virObjectEvent *event = NULL;
 g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+size_t i;
 
 virObjectLock(vm);
 event = virDomainEventBalloonChangeNewFromObj(vm, actual);
 
+/* We want the balloon size stored in domain definition to
+ * account for the actual size of virtio-mem too. But the
+ * balloon size as reported by QEMU (@actual) contains just
+ * the balloon size without any virtio-mem. Do a wee bit of
+ * math to fix it. */
+VIR_DEBUG("balloon size before fix is %lld", actual);
+for (i = 0; i < vm->def->nmems; i++) {
+if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
+actual += vm->def->mems[i]->currentsize;
+}
+
 VIR_DEBUG("Updating balloon from %lld to %lld kb",
   vm->def->mem.cur_balloon, actual);
 vm->def->mem.cur_balloon = actual;
@@ -2343,6 +2355,7 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver,
int asyncJob)
 {
 unsigned long long balloon;
+size_t i;
 int rc;
 
 /* if no ballooning is available, the current size equals to the current
@@ -2359,6 +2372,18 @@ qemuProcessRefreshBalloonState(virQEMUDriver *driver,
 if (qemuDomainObjExitMonitor(driver, vm) < 0 || rc < 0)
 return -1;
 
+/* We want the balloon size stored in domain definition to
+ * account for the actual size of virtio-mem too. But the
+ * balloon size as reported by QEMU (@balloon) contains just
+ * the balloon size without any virtio-mem. Do a wee bit of
+ * math to fix it. */
+VIR_DEBUG("balloon size before fix is %lld", balloon);
+for (i = 0; i < vm->def->nmems; i++) {
+if (vm->def->mems[i]->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
+balloon += vm->def->mems[i]->currentsize;
+}
+VIR_DEBUG("Updating balloon from %lld to %lld kb",
+  vm->def->mem.cur_balloon, balloon);
 vm->def->mem.cur_balloon = balloon;
 
 return 0;
-- 
2.32.0



[PATCH v5 00/16] Introduce virtio-mem model

2021-09-13 Thread Michal Privoznik
v4 of:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html

diff to v4:
- Rebased onto current master
- Worked in David's suggestions, e.g. rename from  to
  , implemented offline memory update, implemented --node
  argument to virsh update-memory-device, prealloc is OFF and reserve is
  ON for virtio-mem

Some suggestions are left as future work. For instance:
- Don't require memory slots because virtio-mem lives on PCI bus anyway
- Allow path backed backend for virtio-mem
- support .prealloc for virtio-mem object (not memory-backend-* !)


I keep occasionally rebased version on my gitlab:

https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/

Michal Prívozník (16):
  virhostmem: Introduce virHostMemGetTHPSize()
  qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI
  qemu_capabilities: Introduce QEMU_CAPS_MEMORY_BACKEND_RESERVE
  conf: Introduce virtio-mem  model
  qemu: Build command line for virtio-mem
  qemu: Wire up  live update
  qemu: Wire up  offline update
  Introduce  property to virtio-mem
  conf: Introduce virDomainMemoryFindByDeviceAlias()
  qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event
  qemu: Refresh the current size of virtio-mem on monitor reconnect
  qemu: Account for both memballoon and virtio-mem
  qemuDomainSetMemoryFlags: Take virtio-mem into consideration
  virsh: Introduce update-memory-device command
  news: document recent virtio memory addition
  kbase: Document virtio-mem

 NEWS.rst  |   7 +
 docs/formatdomain.rst |  45 ++-
 docs/kbase/index.rst  |   4 +
 docs/kbase/memorydevices.rst  | 150 ++
 docs/kbase/meson.build|   1 +
 docs/manpages/virsh.rst   |  30 ++
 docs/schemas/domaincommon.rng |  16 ++
 examples/c/misc/event-test.c  |  17 ++
 include/libvirt/libvirt-domain.h  |  24 ++
 src/conf/domain_conf.c| 126 -
 src/conf/domain_conf.h|  16 ++
 src/conf/domain_event.c   |  84 ++
 src/conf/domain_event.h   |  10 +
 src/conf/domain_validate.c|  39 +++
 src/libvirt_private.syms  |   5 +
 src/qemu/qemu_alias.c |  10 +-
 src/qemu/qemu_capabilities.c  |   6 +
 src/qemu/qemu_capabilities.h  |   4 +
 src/qemu/qemu_command.c   |  25 +-
 src/qemu/qemu_domain.c|  33 ++-
 src/qemu/qemu_domain.h|   1 +
 src/qemu/qemu_domain_address.c|  38 ++-
 src/qemu/qemu_driver.c| 259 +-
 src/qemu/qemu_hotplug.c   |  18 ++
 src/qemu/qemu_hotplug.h   |   5 +
 src/qemu/qemu_monitor.c   |  34 +++
 src/qemu/qemu_monitor.h   |  28 ++
 src/qemu/qemu_monitor_json.c  |  97 +--
 src/qemu/qemu_monitor_json.h  |   5 +
 src/qemu/qemu_process.c   |  72 +
 src/qemu/qemu_validate.c  |   8 +
 src/remote/remote_daemon_dispatch.c   |  30 ++
 src/remote/remote_driver.c|  32 +++
 src/remote/remote_protocol.x  |  15 +-
 src/remote_protocol-structs   |   7 +
 src/security/security_apparmor.c  |   1 +
 src/security/security_dac.c   |   2 +
 src/security/security_selinux.c   |   2 +
 src/util/virhostmem.c |  54 
 src/util/virhostmem.h |   3 +
 tests/domaincapsmock.c|   9 +
 .../caps_5.1.0.x86_64.xml |   1 +
 .../caps_5.2.0.x86_64.xml |   1 +
 .../caps_6.0.0.x86_64.xml |   1 +
 .../caps_6.1.0.x86_64.xml |   2 +
 ...mory-hotplug-virtio-mem.x86_64-latest.args |  41 +++
 .../memory-hotplug-virtio-mem.xml |  67 +
 tests/qemuxml2argvtest.c  |   1 +
 ...emory-hotplug-virtio-mem.x86_64-latest.xml |   1 +
 tests/qemuxml2xmltest.c   |   1 +
 tools/virsh-domain.c  | 181 
 51 files changed, 1613 insertions(+), 56 deletions(-)
 create mode 100644 docs/kbase/memorydevices.rst
 create mode 100644 
tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
 create mode 12 
tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml

-- 
2.32.0



[PATCH v5 06/16] qemu: Wire up live update

2021-09-13 Thread Michal Privoznik
As advertised in one of previous commits, we want to be able to
change 'requested-size' attribute of virtio-mem on the fly. This
commit does exactly that. Changing anything else is checked for
and forbidden.

Once guest has changed the allocation, QEMU emits an event which
we will use to track the allocation. In the next commit.

Signed-off-by: Michal Privoznik 
---
 src/conf/domain_conf.c   |  47 +-
 src/conf/domain_conf.h   |   5 +
 src/libvirt_private.syms |   1 +
 src/qemu/qemu_driver.c   | 174 ++-
 src/qemu/qemu_hotplug.c  |  18 
 src/qemu/qemu_hotplug.h  |   5 +
 src/qemu/qemu_monitor.c  |  13 +++
 src/qemu/qemu_monitor.h  |   5 +
 src/qemu/qemu_monitor_json.c |  15 +++
 src/qemu/qemu_monitor_json.h |   5 +
 10 files changed, 286 insertions(+), 2 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index c4cac65a7b..9562065499 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -16574,6 +16574,50 @@ virDomainMemoryFindInactiveByDef(virDomainDef *def,
 }
 
 
+/**
+ * virDomainMemoryFindByDeviceInfo:
+ * @def: domain defintion
+ * @info: device info to match
+ * @pos: store position within array
+ *
+ * For given domain definition @def find  device with
+ * matching address and matching device alias (if set in @info,
+ * otherwise ignored).
+ *
+ * If @pos is not NULL then the position of the matched device
+ * within the array is stored there.
+ *
+ * Returns: device if found,
+ *  NULL otherwise.
+ */
+virDomainMemoryDef *
+virDomainMemoryFindByDeviceInfo(virDomainDef *def,
+virDomainDeviceInfo *info,
+int *pos)
+{
+size_t i;
+
+for (i = 0; i < def->nmems; i++) {
+virDomainMemoryDef *tmp = def->mems[i];
+
+if (!virDomainDeviceInfoAddressIsEqual(>info, info))
+continue;
+
+/* alias, if present */
+if (info->alias &&
+STRNEQ_NULLABLE(tmp->info.alias, info->alias))
+continue;
+
+if (pos)
+*pos = i;
+
+return tmp;
+}
+
+return NULL;
+}
+
+
 /**
  * virDomainMemoryInsert:
  *
@@ -28567,7 +28611,8 @@ virDomainDefCompatibleDevice(virDomainDef *def,
 return -1;
 }
 
-if ((virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) {
+if (action == VIR_DOMAIN_DEVICE_ACTION_ATTACH &&
+(virDomainDefGetMemoryTotal(def) + sz) > def->mem.max_memory) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("Attaching memory device with size '%llu' would "
  "exceed domain's maxMemory config size '%llu'"),
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 3bdfb66a16..e9d8e0c7ae 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -3806,6 +3806,11 @@ int virDomainMemoryFindByDef(virDomainDef *def, 
virDomainMemoryDef *mem)
 int virDomainMemoryFindInactiveByDef(virDomainDef *def,
  virDomainMemoryDef *mem)
 ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
+virDomainMemoryDef *
+virDomainMemoryFindByDeviceInfo(virDomainDef *dev,
+virDomainDeviceInfo *info,
+int *pos)
+ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
 
 int virDomainShmemDefInsert(virDomainDef *def, virDomainShmemDef *shmem)
 ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index d42d3abcce..3f9035c25b 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -508,6 +508,7 @@ virDomainMemballoonModelTypeFromString;
 virDomainMemballoonModelTypeToString;
 virDomainMemoryDefFree;
 virDomainMemoryFindByDef;
+virDomainMemoryFindByDeviceInfo;
 virDomainMemoryFindInactiveByDef;
 virDomainMemoryInsert;
 virDomainMemoryModelTypeToString;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index dfc27572c4..d05052a3d7 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7087,6 +7087,175 @@ qemuDomainChangeDiskLive(virDomainObj *vm,
 return 0;
 }
 
+
+static bool
+qemuDomainChangeMemoryLiveValidateChange(const virDomainMemoryDef *oldDef,
+ const virDomainMemoryDef *newDef)
+{
+/* The only thing that is allowed to change is 'requestedsize' for
+ * virtio-mem model. Check if user isn't trying to sneak in change for
+ * something else. */
+
+switch (oldDef->model) {
+case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+break;
+
+case VIR_DOMAIN_MEMORY_MODEL_NONE:
+case VIR_DOMAIN_MEMORY_MODEL_DIMM:
+case VIR_DOMAIN_MEMORY_MODEL_NVDIMM:
+case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM:
+case VIR_DOMAIN_MEMORY_MODEL_LAST:
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ 

[PATCH v5 02/16] qemu_capabilities: Introduce QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI

2021-09-13 Thread Michal Privoznik
This commit introduces a new capability that reflects virtio-mem-pci
device support in QEMU:

  QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */

The virtio-mem-pci device was introduced in QEMU 5.1.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_capabilities.c | 2 ++
 src/qemu/qemu_capabilities.h | 1 +
 tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 +
 tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 +
 tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 +
 tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 +
 6 files changed, 7 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index f27a621f8c..711dc2f248 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -638,6 +638,7 @@ VIR_ENUM_IMPL(virQEMUCaps,
   "query-display-options", /* QEMU_CAPS_QUERY_DISPLAY_OPTIONS */
   "s390-pv-guest", /* QEMU_CAPS_S390_PV_GUEST */
   "set-action", /* QEMU_CAPS_SET_ACTION */
+  "virtio-mem-pci", /* QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI */
 );
 
 
@@ -1354,6 +1355,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = {
 { "virtio-gpu-gl-pci", QEMU_CAPS_VIRTIO_GPU_GL_PCI },
 { "virtio-vga-gl", QEMU_CAPS_VIRTIO_VGA_GL },
 { "s390-pv-guest", QEMU_CAPS_S390_PV_GUEST },
+{ "virtio-mem-pci", QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI },
 };
 
 
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index f3379f556c..1e88591975 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -618,6 +618,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for 
syntax-check */
 QEMU_CAPS_QUERY_DISPLAY_OPTIONS, /* 'query-display-options' qmp command 
present */
 QEMU_CAPS_S390_PV_GUEST, /* -object s390-pv-guest,... */
 QEMU_CAPS_SET_ACTION, /* 'set-action' QMP command */
+QEMU_CAPS_DEVICE_VIRTIO_MEM_PCI, /* -device virtio-mem-pci */
 
 QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml
index fc0b502ef9..ff984afaa4 100644
--- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml
@@ -245,6 +245,7 @@
   
   
   
+  
   5001000
   0
   43100242
diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml
index cb1226fc04..1d6ac320b3 100644
--- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml
@@ -246,6 +246,7 @@
   
   
   
+  
   5002000
   0
   43100243
diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml
index d03b8aa726..8be83614ad 100644
--- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml
@@ -254,6 +254,7 @@
   
   
   
+  
   600
   0
   43100242
diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
index 8239f4266a..20ec221deb 100644
--- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml
@@ -256,6 +256,7 @@
   
   
   
+  
   6001000
   0
   43100243
-- 
2.32.0



[PATCH v3 1/2] qemu: move temp file of screenshot and memorypeek to per-domain dir

2021-09-13 Thread Peng Liang
The temp files of screenshot and memory peek, which are created by QEMU,
are put in the cache directory.  However, the caches of domain
capabilities, which are created and used by libvirtd, are also put in
the cache directory.  In order to make the cache directory more secure,
move the temp files of screenshot and memory peek to per-domain
directory.

Since the temp files are just temporary files and are only used by
libvirtd (libvirtd will delete them after use), the use of screenshot
and memory peek will be affected.

Signed-off-by: Peng Liang 
---
 src/qemu/qemu_driver.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index dfc27572c461..7ffe5f1e3856 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3431,7 +3431,7 @@ qemuDomainScreenshot(virDomainPtr dom,
 }
 }
 
-tmp = g_strdup_printf("%s/qemu.screendump.XX", cfg->cacheDir);
+tmp = g_strdup_printf("%s/qemu.screendump.XX", priv->libDir);
 
 if ((tmp_fd = g_mkstemp_full(tmp, O_RDWR | O_CLOEXEC, S_IRUSR | S_IWUSR)) 
== -1) {
 virReportSystemError(errno, _("g_mkstemp(\"%s\") failed"), tmp);
@@ -10675,6 +10675,7 @@ qemuDomainMemoryPeek(virDomainPtr dom,
 if (!(vm = qemuDomainObjFromDomain(dom)))
 goto cleanup;
 
+priv = vm->privateData;
 cfg = virQEMUDriverGetConfig(driver);
 
 if (virDomainMemoryPeekEnsureACL(dom->conn, vm->def) < 0)
@@ -10692,7 +10693,7 @@ qemuDomainMemoryPeek(virDomainPtr dom,
 if (virDomainObjCheckActive(vm) < 0)
 goto endjob;
 
-tmp = g_strdup_printf("%s/qemu.mem.XX", cfg->cacheDir);
+tmp = g_strdup_printf("%s/qemu.mem.XX", priv->libDir);
 
 /* Create a temporary filename. */
 if ((fd = g_mkstemp_full(tmp, O_RDWR | O_CLOEXEC, S_IRUSR | S_IWUSR)) == 
-1) {
@@ -10703,7 +10704,6 @@ qemuDomainMemoryPeek(virDomainPtr dom,
 
 qemuSecurityDomainSetPathLabel(driver, vm, tmp, false);
 
-priv = vm->privateData;
 qemuDomainObjEnterMonitor(driver, vm);
 if (flags == VIR_MEMORY_VIRTUAL) {
 if (qemuMonitorSaveVirtualMemory(priv->mon, offset, size, tmp) < 0) {
-- 
2.31.1




[PATCH v3 2/2] qemu: don't change ownership of cache directory

2021-09-13 Thread Peng Liang
Commit 6bcf25017bc6 ("virDomainMemoryPeek API") introduced memory peek
and commit 9936aecfd1b4 ("qemu: Implement the driver methods")
introduced screenshot.  Both of them will put temporary files in
/var/cache/libvirt/qemu, and the temporary files are created by QEMU.
Therefore, the ownership of /var/cache/libvirt/qemu should be changed to
user and group configured in qemu.conf to make sure that QEMU process
can create and write files in the cache directory.

Libvirt will only put the temporary files in /var/cache/libvirt/qemu
until commit cbde35899b90 ("Cache result of QEMU capabilities
extraction"), which will put the cache of QEMU capabilities in
'capabilities' subdir of the cache directory.  Because the capabilities
is used by libvirt, the ownership of both 'capabilities' subdir and
capabilitie files are root.  However, when QEMU process runs as a
regular user (e.g. qemu user), the ownership of /var/cache/libvirt/qemu
will be changed to qemu:qemu while that of
/var/cache/libvirt/qemu/capabilities will be still root:root.  Then the
regular user could spoof different capabilities, which maybe lead to
denial of service.

Since the previous patch has move the temp files of screenshot and
memory peek to per-domain directory, no one except domain capabilities
uses cacheDir currently.  And since domain capabilities are used by
libvirtd instead of QEMU, no need to change the ownership of cacheDir to
qemu:qemu explicitly.

Signed-off-by: Peng Liang 
---
 src/qemu/qemu_driver.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 7ffe5f1e3856..818889219b11 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -748,13 +748,6 @@ qemuStateInitialize(bool privileged,
  (int)cfg->group);
 goto error;
 }
-if (chown(cfg->cacheDir, cfg->user, cfg->group) < 0) {
-virReportSystemError(errno,
- _("unable to set ownership of '%s' to %d:%d"),
- cfg->cacheDir, (int)cfg->user,
- (int)cfg->group);
-goto error;
-}
 if (chown(cfg->saveDir, cfg->user, cfg->group) < 0) {
 virReportSystemError(errno,
  _("unable to set ownership of '%s' to %d:%d"),
-- 
2.31.1




[PATCH v3 0/2] qemu: don't change ownership of cache directory

2021-09-13 Thread Peng Liang
See path 2 for detail.

v2 -> v3:
- Change the directory to put the temp files of screenshot and
memorypeek from autoDumpPath to per-domain directory [Daniel]

v1 -> v2:
- Add commit message for path1 [Peter]

Peng Liang (2):
  qemu: move temp file of screenshot and memorypeek to per-domain dir
  qemu: don't change ownership of cache directory

 src/qemu/qemu_driver.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

-- 
2.31.1




device-specific VFIO drivers

2021-09-13 Thread Laine Stump
The Linux kernel has recently added support for device-specific VFIO 
drivers to be used (instead of the current generic, one-size-fits-all 
vfio_pci driver) when assigning a device to a guest with VFIO. The 
intent of this is to (for example) support APIs for device-specific 
setup that needs to be done before the device is handed over to the 
guest, and to support migration of guests using these devices. More 
details can be found in the comments of the patch emails here:


 https://lore.kernel.org/kvm/20210826103912.128972-1-yish...@nvidia.com/

There are a couple of issues that need to be resolved before that will 
work well with libvirt-managed guests:



1) Although they've outlined a system for determining the best / most 
specific driver for a device by extending the existing modules aliases, 
the kernel people have left it up to userspace to actually parse the 
module.aliases info to determine the best driver for a device. They've 
even "thrown us a bone" in the form of an example python script to do 
just that:


https://github.com/maxgurtovoy/linux_tools/blob/main/vfio/bind_vfio_pci_driver.py

Currently, if a device is assigned to a guest with managed='yes'> (the default) then libvirt will unbind the device from 
whatever host driver its bound to, and bind it to the vfio_pci driver 
(since, up to now, that has been the only driver available). In the 
future, if we want to be able to take advantage of the extended 
capabilities provided by the device-specific vfio drivers, we will need 
to figure out which driver to load.


I personally don't like the idea of embedding the logic from the above 
example program into libvirt - this really seems to me like something 
that should be implemented in one place for use by many consumers 
(libvirt being one of the consumers). One suggestion made by Alex 
Williamson (in a private discussion, not on a mailing list) was that 
possibly we could get the driverctl utility to add this functionality, 
and then call driverctl to learn the name of the best-fit vfio driver 
for a device. I haven't yet looked at it, but I've been told that 
driverctl is a bash script :-O. If we decide that's a good route, 
perhaps we could also convince someone to convert drivertctl to rust, 
similar to what jjongsma did with mdevctl (nudge nudge, hint hint).


Or maybe I'm making it into a bigger deal than it needs to be, and we 
should just implement the logic of the python script up above directly 
in libvirt. Does anyone have an opinion?


(NB: If a device is assigned to a guest by libvirt with managed='no'> then libvirt will just blindly assume that it is already 
bound to the correct driver, so until we get internal support for using 
the device-specific drivers, users can still get the same functionality 
by binding their devices to the device-specific vfio driver externally 
to libvirt).


=

2) There may be cases where, in spite of a device-specific driver being 
available, the user prefers to use the generic vfio_pci driver. To 
support that we will need to have a place in our config to set the 
driver name.


We already have a  subelement of  that was originally 
added to allow choosing between VFIO device assignment and legacy KVM 
device assignment:




All support for KVM device assignment was removed a few years ago, so in 
practice the driver name is always "vfio". The most natural looking way 
to support device-specific drivers would be to use this name attribute 
to specify the driver name, e.g. if you wanted to let libvirt select the 
best driver, you would use:


   

(what it's currently set to in everyone's configs). But if you wanted to 
force use of the generic driver, you'd use:


   

or if you wanted to force use of another driver that wasn't the 'best 
fit' according to the module aliases, you could use, e.g.:


   

I'm uncomfortable with the fact that we're effectively "re-using" the 
name attribute for a new purpose though - up until now it has been 
"which device assignment method?", but this changes it to "which vfio 
driver should be loaded?". So maybe we need a new element. I'm not sure 
what to call it though. How about this?


   

Does anyone have a better idea for the name?



Re: [PATH v2 1/2] qemu: move temp file of screenshot and memorypeek to autoDumpPath

2021-09-13 Thread Peng Liang
On 9/13/2021 7:23 PM, Daniel P. Berrangé wrote:
> On Mon, Sep 13, 2021 at 07:11:04PM +0800, Peng Liang wrote:
>> The temp files of screenshot and memory peek, which are created by QEMU,
>> are put in the cache directory.  However, the caches of domain
>> capabilities, which are created and used by libvirtd, are also put in
>> the cache directory.  In order to make the cache directory more secure,
>> move the temp files of screenshot and memory peek to autoDumpPath.
>>
>> Since the temp files are just temporary files and are only used by
>> libvirtd (libvirtd will delete them after use), the use of screenshot
>> and memory peek will be affected.
> 
> autoDumpPath does nt look like the right thing to be using here.
> Why don't we just put these files in a subdirectory of the cache
> dir to avoid the problem with capabilities ?
> 

Ah, I just find that autoDumpPath is for watchdog event to auto-dump a
guest.  But I think the files libvirtd put in the cache directory
(except capabilites) are just temporary files instead of cache files.
So IMHO, a subdir in the cache directory is also not the perfect path
for these files.  How about putting these files in the pre-domain dir
(e.g. /var/lib/libvirt/qemu/domain-1-test)?

> 
> Regards,
> Daniel
> 

Thanks,
Peng




Re: [libvirt PATCH 2/2] ci: Avoid use of magic constants

2021-09-13 Thread Beraldo Leal
On Fri, Sep 10, 2021 at 03:59:55PM +0200, Andrea Bolognani wrote:
> The value 3 is the length of the "ci-" prefix, which is present
> in the items returned by get_registry_images() but not in those
> returned by get_dockerfiles().
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  ci/util.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/ci/util.py b/ci/util.py
> index d8be6631eb..f9f3c550db 100644
> --- a/ci/util.py
> +++ b/ci/util.py
> @@ -68,10 +68,11 @@ def get_registry_stale_images(registry_uri: str, 
> base_dir: str) -> Dict[str, int
>  
>  dockerfiles = get_dockerfiles(base_dir)
>  images = get_registry_images(registry_uri)
> +name_prefix = "ci-"
>  
>  stale_images = {}
>  for img in images:
> -if img["name"][3:] not in dockerfiles:
> +if img["name"][len(name_prefix):] not in dockerfiles:
>  stale_images[img["name"]] = img["id"]
>  
>  return stale_images
> -- 
> 2.31.1
> 

Reviewed-by: Beraldo Leal 

--
Beraldo



Re: [PATCH] qemuDomainChangeDiskLive: Modify 'startupPolicy' before changing source

2021-09-13 Thread Peter Krempa
On Mon, Sep 13, 2021 at 13:56:45 +0200, Vojtech Juranek wrote:
> On Monday, 13 September 2021 13:30:45 CEST Nir Soffer wrote:
> > On Mon, Sep 13, 2021 at 11:04 AM Peter Krempa  wrote:
> > > On Fri, Sep 10, 2021 at 22:04:01 +0300, Nir Soffer wrote:
> > > > On Fri, Sep 10, 2021 at 4:35 PM Peter Krempa  wrote:

[...]

> > > On another note, I don't quite understand though why oVirt actually even
> > > uses startup policy in the first place. Since oVirt is already doing a
> > > pretty complicated storage management, checking whether the image file
> > > exists is a trivial operation.
> > 
> > In oVirt we know that the image exists since we prepare it before
> > starting the VM
> > (e.g. activate logical volume), and we will fail the operation if the
> > disk does not
> > exists, so I think this attribute is not needed for our use case and
> > it should be
> > removed.
> 
> +1
> 
> > > Similarly, when migrating you can use the destination XML (which can be
> > > optionally used with the migration API) allows you to update storage
> > > source and even provide empty storage for a cdrom. This can be used to
> > > update paths to storage too. This allows superior flexibility when
> > > compared to startup policy.
> > 
> > Even if we remove startupPolicy in new oVirt version, we have to
> > support migration
> > from an older version using startupPolicy in the xml.
> 
> What is the problem here? The issue happens only when you try to change the 
> CD, not during the migration, so there shouldn't be any problem (I also tried 
> it now and I can migrate VM with block-based CD without any issue). The issue 
> is there only when VM is started with empty CD or file-based CD, as in these 
> cases we added startupPolicy into the XML.

I assume that the problem is that removing 'startupPolicy' being added
to such configs would not fix the problem for older VMs which were
started with 'startupPolicy' being present thus you'd have to have the
code removing it anyways.

Live migration isn't a problem though as I've pointed out in a different
reply as you can drop it from the destination XML as it's not guest ABI.



Re: [PATCH] qemuDomainChangeDiskLive: Modify 'startupPolicy' before changing source

2021-09-13 Thread Peter Krempa
On Mon, Sep 13, 2021 at 14:30:45 +0300, Nir Soffer wrote:
> On Mon, Sep 13, 2021 at 11:04 AM Peter Krempa  wrote:
> >
> > On Fri, Sep 10, 2021 at 22:04:01 +0300, Nir Soffer wrote:
> > > On Fri, Sep 10, 2021 at 4:35 PM Peter Krempa  wrote:

[...]

> > > Do we have a way to switch from file to block with current libvirt
> > > (centos 8 stream, rhel 8.4) without this patch, or do we need to wait
> > > until this fix is available? (rhel 8.5?)
> >
> > You can issue two separate update calls. One updating the startup policy
> > first and then the second one updating the source.
> 
> If the second call fails, this leave us with half of the change.

Based on what you write below, where you state that you actually make
sure that the image exists it seems that it should not be a problem at
all.

Specifically just removing startupPolicy has no guest visible change so
it's allowed on migration and other cases.

Just removing it from an empty drive has no effect other than recording
it in the libvirt metadata.

Removing it from a full cdrom could have semantic effects on migration
but they are removed by oVirt actually checking before. Same for start
of a new VM where you always re-define the XML anyways.

> > On another note, I don't quite understand though why oVirt actually even
> > uses startup policy in the first place. Since oVirt is already doing a
> > pretty complicated storage management, checking whether the image file
> > exists is a trivial operation.
> 
> In oVirt we know that the image exists since we prepare it before
> starting the VM
> (e.g. activate logical volume), and we will fail the operation if the
> disk does not
> exists, so I think this attribute is not needed for our use case and
> it should be
> removed.
> 
> > Similarly, when migrating you can use the destination XML (which can be
> > optionally used with the migration API) allows you to update storage
> > source and even provide empty storage for a cdrom. This can be used to
> > update paths to storage too. This allows superior flexibility when
> > compared to startup policy.
> 
> Even if we remove startupPolicy in new oVirt version, we have to
> support migration
> from an older version using startupPolicy in the xml.

Speaking of live migration you can remove it from the destination XML,
since startup policy is exempt from the ABI stability check ... well
technically it's not even ABI.

The only case where it could be a problem is when upgrading existing
host as the running VMs would still have it.



Re: [PATCH] qemuDomainChangeDiskLive: Modify 'startupPolicy' before changing source

2021-09-13 Thread Nir Soffer
On Mon, Sep 13, 2021 at 11:04 AM Peter Krempa  wrote:
>
> On Fri, Sep 10, 2021 at 22:04:01 +0300, Nir Soffer wrote:
> > On Fri, Sep 10, 2021 at 4:35 PM Peter Krempa  wrote:
> > >
> > > We don't support all startup policies with all source types so to
> > > correctly allow switching from a 'file' based cdrom with 'optional'
> > > startup policy to a 'block' based one which doesn't support optional we
> > > must update the startup policy field first. Obviously we need to have
> > > fallback if the update fails.
> >
> > Why is there a difference between file and block for startup policy?
>
> I'm not sure about the historic reasons for why 'block' was not
> considered when this was implemented originally.
>
> > Is this documented?
>
> (citation from 'formatdomain.rst' the source for 
> https://www.libvirt.org/formatdomain.html )
>
>For a "file" or "volume" disk type which represents a cdrom or floppy (the
>``device`` attribute), it is possible to define policy what to do with the
>disk if the source file is not accessible. (NB, ``startupPolicy`` is not
>valid for "volume" disk unless the specified storage volume is of "file"
>type). This is done by the ``startupPolicy`` attribute ( :since:`since 
> 0.9.7`
>), accepting these values:
>
>= 
> =
>mandatory fail if missing for any reason (the default)
>requisite fail if missing on boot up, drop if missing on 
> migrate/restore/revert
>optional  drop if missing at any start attempt
>= 
> =
>
>:since:`Since 1.1.2` the ``startupPolicy`` is extended to support hard 
> disks
>besides cdrom and floppy. On guest cold bootup, if a certain disk is not
>accessible or its disk chain is broken, with startupPolicy 'optional' the
>guest will drop this disk. This feature doesn't support migration 
> currently.
>
> As you can notice only "file" and "volume" (pointing to a file) are
> allowed.

Looks like we missed this detail when adding support for cdrom on
block storage.

> > Do we have a way to switch from file to block with current libvirt
> > (centos 8 stream, rhel 8.4) without this patch, or do we need to wait
> > until this fix is available? (rhel 8.5?)
>
> You can issue two separate update calls. One updating the startup policy
> first and then the second one updating the source.

If the second call fails, this leave us with half of the change.

> On another note, I don't quite understand though why oVirt actually even
> uses startup policy in the first place. Since oVirt is already doing a
> pretty complicated storage management, checking whether the image file
> exists is a trivial operation.

In oVirt we know that the image exists since we prepare it before
starting the VM
(e.g. activate logical volume), and we will fail the operation if the
disk does not
exists, so I think this attribute is not needed for our use case and
it should be
removed.

> Similarly, when migrating you can use the destination XML (which can be
> optionally used with the migration API) allows you to update storage
> source and even provide empty storage for a cdrom. This can be used to
> update paths to storage too. This allows superior flexibility when
> compared to startup policy.

Even if we remove startupPolicy in new oVirt version, we have to
support migration
from an older version using startupPolicy in the xml.

Nir



Re: [PATH v2 1/2] qemu: move temp file of screenshot and memorypeek to autoDumpPath

2021-09-13 Thread Daniel P . Berrangé
On Mon, Sep 13, 2021 at 07:11:04PM +0800, Peng Liang wrote:
> The temp files of screenshot and memory peek, which are created by QEMU,
> are put in the cache directory.  However, the caches of domain
> capabilities, which are created and used by libvirtd, are also put in
> the cache directory.  In order to make the cache directory more secure,
> move the temp files of screenshot and memory peek to autoDumpPath.
> 
> Since the temp files are just temporary files and are only used by
> libvirtd (libvirtd will delete them after use), the use of screenshot
> and memory peek will be affected.

autoDumpPath does nt look like the right thing to be using here.
Why don't we just put these files in a subdirectory of the cache
dir to avoid the problem with capabilities ?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[PATH v2 2/2] qemu: don't change ownership of cache directory

2021-09-13 Thread Peng Liang
Commit 6bcf25017bc6 ("virDomainMemoryPeek API") introduced memory peek
and commit 9936aecfd1b4 ("qemu: Implement the driver methods")
introduced screenshot.  Both of them will put temporary files in
/var/cache/libvirt/qemu, and the temporary files are created by QEMU.
Therefore, the ownership of /var/cache/libvirt/qemu should be changed to
user and group configured in qemu.conf to make sure that QEMU process
can create and write files in the cache directory.

Libvirt will only put the temporary files in /var/cache/libvirt/qemu
until commit cbde35899b90 ("Cache result of QEMU capabilities
extraction"), which will put the cache of QEMU capabilities in
'capabilities' subdir of the cache directory.  Because the capabilities
is used by libvirt, the ownership of both 'capabilities' subdir and
capabilitie files are root.  However, when QEMU process runs as a
regular user (e.g. qemu user), the ownership of /var/cache/libvirt/qemu
will be changed to qemu:qemu while that of
/var/cache/libvirt/qemu/capabilities will be still root:root.  Then the
regular user could spoof different capabilities, which maybe lead to
denial of service.

Since the previous patch has move the temp files of screenshot and
memory peek to autoDumpPath, no one except domain capabilities uses
cacheDir currently.  And since domain capabilities are used by libvirtd
instead of QEMU, no need to change the ownership of cacheDir to
qemu:qemu explicitly.

Signed-off-by: Peng Liang 
---
 src/qemu/qemu_driver.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index e929e950e848..8a6fd5767893 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -748,13 +748,6 @@ qemuStateInitialize(bool privileged,
  (int)cfg->group);
 goto error;
 }
-if (chown(cfg->cacheDir, cfg->user, cfg->group) < 0) {
-virReportSystemError(errno,
- _("unable to set ownership of '%s' to %d:%d"),
- cfg->cacheDir, (int)cfg->user,
- (int)cfg->group);
-goto error;
-}
 if (chown(cfg->saveDir, cfg->user, cfg->group) < 0) {
 virReportSystemError(errno,
  _("unable to set ownership of '%s' to %d:%d"),
-- 
2.31.1




[PATH v2 1/2] qemu: move temp file of screenshot and memorypeek to autoDumpPath

2021-09-13 Thread Peng Liang
The temp files of screenshot and memory peek, which are created by QEMU,
are put in the cache directory.  However, the caches of domain
capabilities, which are created and used by libvirtd, are also put in
the cache directory.  In order to make the cache directory more secure,
move the temp files of screenshot and memory peek to autoDumpPath.

Since the temp files are just temporary files and are only used by
libvirtd (libvirtd will delete them after use), the use of screenshot
and memory peek will be affected.

Signed-off-by: Peng Liang 
---
 src/qemu/qemu_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index dfc27572c461..e929e950e848 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -3431,7 +3431,7 @@ qemuDomainScreenshot(virDomainPtr dom,
 }
 }
 
-tmp = g_strdup_printf("%s/qemu.screendump.XX", cfg->cacheDir);
+tmp = g_strdup_printf("%s/qemu.screendump.XX", cfg->autoDumpPath);
 
 if ((tmp_fd = g_mkstemp_full(tmp, O_RDWR | O_CLOEXEC, S_IRUSR | S_IWUSR)) 
== -1) {
 virReportSystemError(errno, _("g_mkstemp(\"%s\") failed"), tmp);
@@ -10692,7 +10692,7 @@ qemuDomainMemoryPeek(virDomainPtr dom,
 if (virDomainObjCheckActive(vm) < 0)
 goto endjob;
 
-tmp = g_strdup_printf("%s/qemu.mem.XX", cfg->cacheDir);
+tmp = g_strdup_printf("%s/qemu.mem.XX", cfg->autoDumpPath);
 
 /* Create a temporary filename. */
 if ((fd = g_mkstemp_full(tmp, O_RDWR | O_CLOEXEC, S_IRUSR | S_IWUSR)) == 
-1) {
-- 
2.31.1




[PATH v2 0/2] qemu: don't change ownership of cache directory

2021-09-13 Thread Peng Liang
See path 2 for detail.

v1 -> v2:
- Add commit message for path1 [Peter]

Peng Liang (2):
  qemu: move temp file of screenshot and memorypeek to autoDumpPath
  qemu: don't change ownership of cache directory

 src/qemu/qemu_driver.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

-- 
2.31.1




Re: [PATCH] docs: Fix dimm example

2021-09-13 Thread Pavel Hrdina
On Mon, Sep 13, 2021 at 11:23:17AM +0200, Michal Privoznik wrote:
> In the example for  we show how to
> configure hugepages as backend. In the example we show 4MiB
> hugepages which are non-standard and thus at the first glance may
> mislead users thinking that a regular sized pages (4K) will be
> used. Use 2MiB as the value instead.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  docs/formatdomain.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Pavel Hrdina 


signature.asc
Description: PGP signature


Re: [libvirt PATCH 0/4] qemu: replace -device sga with -M graphics=off

2021-09-13 Thread Richard W.M. Jones
On Thu, Sep 09, 2021 at 12:25:12PM +0100, Daniel P. Berrangé wrote:
> SeaBIOS >= 1.11 / QEMU  >= 2.11 no longer requires the 'sga'
> device for serial console output from the BIOS. It can be
> done directly with graphics=off machine option.
> 
> This appears to be live migration compatible, despite
> changing the number of option ROMS loaded, though I need a
> little more testing to fully confirm this.
> 
> Daniel P. Berrangé (4):
>   qemu: prevent use of  on non-x86 arches
>   qemu: tweak error message to be more general purpose
>   qemu: switch to use -M graphics=off instead of -device sga
>   qemu: stop probing for '-device sga' support
> 
>  src/qemu/qemu_capabilities.c  |  3 +--
>  src/qemu/qemu_capabilities.h  |  2 +-
>  src/qemu/qemu_command.c   | 25 ---
>  src/qemu/qemu_validate.c  | 15 ---
>  .../caps_2.11.0.x86_64.xml|  1 -
>  .../caps_2.12.0.x86_64.xml|  1 -
>  .../caps_3.0.0.x86_64.xml |  1 -
>  .../caps_3.1.0.x86_64.xml |  1 -
>  .../caps_4.0.0.x86_64.xml |  1 -
>  .../caps_4.1.0.x86_64.xml |  1 -
>  .../caps_4.2.0.x86_64.xml |  1 -
>  .../caps_5.0.0.x86_64.xml |  1 -
>  .../caps_5.1.0.x86_64.xml |  1 -
>  .../caps_5.2.0.x86_64.xml |  1 -
>  .../caps_6.0.0.x86_64.xml |  1 -
>  .../caps_6.1.0.x86_64.xml |  1 -
>  tests/qemuxml2argvdata/bios.args  |  3 +--
>  tests/qemuxml2argvtest.c  |  3 +--
>  18 files changed, 25 insertions(+), 38 deletions(-)

I didn't test it, but the changes look reasonable so:

Reviewed-by: Richard W.M. Jones 

Equivalent (non-)change to libguestfs is:

https://github.com/libguestfs/libguestfs/commit/e14ff937422115e23094ca4cce80ec9fb01c10b3

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top



[PATCH] docs: Fix dimm example

2021-09-13 Thread Michal Privoznik
In the example for  we show how to
configure hugepages as backend. In the example we show 4MiB
hugepages which are non-standard and thus at the first glance may
mislead users thinking that a regular sized pages (4K) will be
used. Use 2MiB as the value instead.

Signed-off-by: Michal Privoznik 
---
 docs/formatdomain.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 67cb934881..7dc93d6abe 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -7745,7 +7745,7 @@ Example: usage of the memory devices
  
  

- 4096
+ 2048
  1-3


-- 
2.32.0



Re: [PATCH v3 0/5] Add support for two i386 pm options which control acpi hotplug

2021-09-13 Thread Ani Sinha



On Sun, 12 Sep 2021, Ani Sinha wrote:

> Hi all:
>
> This patchset introduces libvirt xml support for the following two pm conf
> options:
>
> 
>   
>   
> 
>

Another option is to create a new xml tag and add these under that tag.
Something like:

+   
+ 
+ 
+   

These are not exactly power management options. So putting them under 
may not be correct.
If this is a better approach I will resend the patchset with the changes
along with addressing other review comments.


> The above two options are only available for qemu driver and that too for x86
> guests only. Both of them are global options.
>
> ``acpi-hotplug-bridge`` option enables or disables ACPI hotplug support for 
> cold
> plugged bridges. Examples of cold plugged bridges include PCI-PCI bridge
> (pci-bridge controller) for pc machines and pcie-root-port controller for q35
> machines. The corresponding commandline options to qemu for x86 guests are:
>
> (pc machines): -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=
> (q35 machines): -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=
>
> Being global options, no other bridge specific options for pci-bridge
> controller or pcie-root-port controllers are required. For pc machine type in
> x86, this option is available in qemu for a long time, from version 2.1.
> Please see the changes in qemu.git:
>
> 9e047b982452c6 ("piix4: add acpi pci hotplug support")
> 133a2da488062e ("pc: acpi: generate AML only for PCI0 devices if PCI bridge 
> hotplug is disabled")
>
> For q35 machine type, this was introduced in qemu 6.1 with the following
> changes in qemu.git:
>
> (a) c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
> (b) 17858a16950860 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>
> The reasons for enabling ACPI based hotplug for PCIe (q35) based machines (as
> opposed to native hotplug) for bridges are outlined in (b). It is possible 
> that
> some users might still want to use native hotplug on PCIe [1]. Therefore,
> this conf option enables users to choose either ACPI based hotplug or native
> hotplug for cold plugged bridges (for example for pcie root port controller
> in q35 machines).
>
> ``acpi-root-hotplug`` option enables or disables ACPI based hotplug for PCI 
> root
> bus (pci-root controller). This option is only available for pc machine type.
> The corresponding commandline option to qemu for x86 guests is:
>
> -global PIIX4_PM.acpi-root-pci-hotplug=
>
> This additional option enables users to disable hotplug for all devices in the
> system without adding an additional PCI-PCI bridge, putting the devices behind
> the bridge and using the existing ``acpi-hotplug-bridge`` option to disable
> hotplug on that bridge. This feature was introduced from qemu version 5.2 with
> the following change in qemu.git:
>
> 3d7e78aaf ("Introduce a new flag for i440fx to disable PCI hotplug on the 
> root bus")
>
> The above qemu commit describes some compelling reasons why users might to
> disable hotplug on PCI root buses [2].
>
> A brief summary of the patches:
>
> >[PATCH v3 1/5] qemu: capablities: detect presence of
> >[PATCH v3 2/5] qemu: capablities: detect presence of
> Patches 1 and 2 implement support for qemu capability checks for the above
> config options.
>
> >[PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and
> Patch 3 actually adds the config option to the schema and adds related unit
> tests.
>
> >[PATCH v3 4/5] qemu: command: add support for qemu options that
> Patch 4 adds the backend qemu commandline support for the options. It also 
> adds
> relevant unit tests for the same.
>
> >[PATCH v3 5/5] NEWS: add new acpi pci hotplug options in the release
> Patch 5 adds the release notes for the next libvirt release.
>
>
> Changelog:
> v1: initial implementation. Had some bugs and missed some unit tests.
> v2: fixed bugs and added additional missing unit tests.
> v3: reorganized the patches as per Laine's suggestion. Added more
> details in commit messages. Added conf description in formatdomain.rst.
> Added changelog for next release.
>
>
> Notes:
>
> [1] One concrete example of why one might still want to use native hotplug 
> with
> pcie-root-port controller is the fact that we are still discovering issues 
> with
> acpi hotplug on PCIE. One such issue is:
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
> Another reason is that users have been using native hotplug on pcie root ports
> up until now. They have built and tested their systems based on native 
> hotplug.
> They may not want to suddenly move to acpi based hotplug just because it is 
> now
> the default in qemu. Supporting the option to chose one or the other through
> libvirt makes things simpler for end users.
>
> [2] The use case scenario described by Laine in
> https://listman.redhat.com/archives/libvir-list/2020-February/msg00110.html
> intentionally does not discuss i440fx and focusses solely on q35. I do realize
> that redhat has moved on 

Re: [PATCH 1/2] qemu: move temp file of screenshot and memorypeek to autoDumpPath

2021-09-13 Thread Peter Krempa
On Mon, Sep 13, 2021 at 10:10:58 +0800, Peng Liang wrote:

Please elaborate in the commit message why you are making this change.

> Signed-off-by: Peng Liang 
> ---
>  src/qemu/qemu_driver.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)



Re: [PATCH] qemuDomainChangeDiskLive: Modify 'startupPolicy' before changing source

2021-09-13 Thread Peter Krempa
On Fri, Sep 10, 2021 at 22:04:01 +0300, Nir Soffer wrote:
> On Fri, Sep 10, 2021 at 4:35 PM Peter Krempa  wrote:
> >
> > We don't support all startup policies with all source types so to
> > correctly allow switching from a 'file' based cdrom with 'optional'
> > startup policy to a 'block' based one which doesn't support optional we
> > must update the startup policy field first. Obviously we need to have
> > fallback if the update fails.
> 
> Why is there a difference between file and block for startup policy?

I'm not sure about the historic reasons for why 'block' was not
considered when this was implemented originally.

> Is this documented?

(citation from 'formatdomain.rst' the source for 
https://www.libvirt.org/formatdomain.html )

   For a "file" or "volume" disk type which represents a cdrom or floppy (the
   ``device`` attribute), it is possible to define policy what to do with the
   disk if the source file is not accessible. (NB, ``startupPolicy`` is not
   valid for "volume" disk unless the specified storage volume is of "file"
   type). This is done by the ``startupPolicy`` attribute ( :since:`since 0.9.7`
   ), accepting these values:

   = 
=
   mandatory fail if missing for any reason (the default)
   requisite fail if missing on boot up, drop if missing on 
migrate/restore/revert
   optional  drop if missing at any start attempt
   = 
=

   :since:`Since 1.1.2` the ``startupPolicy`` is extended to support hard disks
   besides cdrom and floppy. On guest cold bootup, if a certain disk is not
   accessible or its disk chain is broken, with startupPolicy 'optional' the
   guest will drop this disk. This feature doesn't support migration currently.

As you can notice only "file" and "volume" (pointing to a file) are
allowed.

> Do we have a way to switch from file to block with current libvirt
> (centos 8 stream, rhel 8.4) without this patch, or do we need to wait
> until this fix is available? (rhel 8.5?)

You can issue two separate update calls. One updating the startup policy
first and then the second one updating the source.

On another note, I don't quite understand though why oVirt actually even
uses startup policy in the first place. Since oVirt is already doing a
pretty complicated storage management, checking whether the image file
exists is a trivial operation.

Similarly, when migrating you can use the destination XML (which can be
optionally used with the migration API) allows you to update storage
source and even provide empty storage for a cdrom. This can be used to
update paths to storage too. This allows superior flexibility when
compared to startup policy.



[PATCH 1/1] qemu_tpm: Start swtpm(8) daemon with --terminate switch

2021-09-13 Thread Nick Chevsky
Launch swtpm(8) with the --terminate switch, which guarantees that
the daemon will shut itself down when QEMU dies (current behavior).
We had so far been getting this "for free" (i.e. without --terminate)
due to a defect in upstream's connection handling logic [1], on which
libvirt should not rely since it will eventually be fixed. Adding
--terminate preserves and guarantees the current behavior.

[1] https://github.com/stefanberger/swtpm/pull/509

Signed-off-by: Nick Chevsky 
---
 src/qemu/qemu_tpm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/qemu/qemu_tpm.c b/src/qemu/qemu_tpm.c
index 477a26dc69..100481503c 100644
--- a/src/qemu/qemu_tpm.c
+++ b/src/qemu/qemu_tpm.c
@@ -576,6 +576,8 @@ qemuTPMEmulatorBuildCommand(virDomainTPMDef *tpm,
 virCommandAddArg(cmd, "--log");
 virCommandAddArgFormat(cmd, "file=%s", tpm->data.emulator.logfile);
 
+virCommandAddArg(cmd, "--terminate");
+
 virCommandSetUID(cmd, swtpm_user);
 virCommandSetGID(cmd, swtpm_group);
 
-- 
2.30.2




[PATCH 0/1] qemu_tpm: Start swtpm(8) daemon with --terminate switch

2021-09-13 Thread Nick Chevsky
libvirt expects the swtpm(8) daemon to auto-terminate along with QEMU.
While that's already the case, it's currently happening for the wrong
reason: swtpm's documented way of achieving this behavior is via the
--terminate switch (which causes the daemon to shut down when the
data channel connection drops), but libvirt isn't currently using
this switch--and it should.

The reason this currently works anyway, even without the --terminate
switch, is two-fold:

(1) When QEMU terminates gracefully, it sends command CMD_SHUTDOWN to
swtpm which triggers a shutdown. Nothing wrong with this one.
(2) When QEMU dies abruptly (e.g. SIGKILL, SIGSEGV) without issuing
CMD_SHUTDOWN, swtpm should (a) shut down if the --terminate switch
was given OR (b) stay alive if --terminate wasn't given. At the
moment this isn't being respected, and swtpm unconditionally shuts
down (regardless of whether --terminate was given or not) due to a
bug in swtpm's connection handling logic [1]. libvirt currently
relies on this incorrect and undocumented upstream behavior,
trusting swtpm to shut itself down even when --terminate wasn't
given, which is wrong and bound to break.

The discussion [1] between swtpm's author and I shows that --terminate
(a) is the proper way to achieve--and guarantee--the current behavior,
(b) is innocuous to add since it won't alter existing behavior, (c)
should've been used by libvirt all along, and (d) should be enforced
by swtpm going forward.

Since libvirt presently relies on swtpm's current (incorrect) behavior
and we don't want to break libvirt, we need libvirt to start invoking
swtpm with the --terminate switch ASAP so that the upstream bug can
be fixed as soon as it's safe. Fixing the bug is the first step toward
eventually enabling non-libvirt swtpm users to optionally run swtpm as
a persistent service, allowing a VM to connect to and disconnect from
it without the daemon dying.

Proxmox VE, to which I also contribute, is already using --terminate
in its (WIP) swtpm implementation.

[1] https://github.com/stefanberger/swtpm/pull/509 -- Note that this
already-merged PR addresses only one half of the bug; the other
half (which will actually effect the change) remains on hold until
libvirt implements --terminate.

Nick Chevsky (1):
  qemu_tpm: Start swtpm(8) daemon with --terminate switch

 src/qemu/qemu_tpm.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.30.2




Re: [PATCH v4 for v7.6.0 00/14] Introduce virtio-mem model

2021-09-13 Thread Michal Prívozník
On 7/7/21 12:30 PM, David Hildenbrand wrote:
> On 23.06.21 12:12, Michal Privoznik wrote:
>> v4 of:
>>
>> https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html
>>
>> diff to v3:
>> - Rebased code on the top of master
>> - Tried to work in all Peter's review suggestions
>> - Fixed a bug where adjusting  was viewed as hotplug of new
>>     by XML validator and thus if  was close
>> enough to
>>     the validator reported an error (this was reported by
>>    David).
>>
> 
> Hi Michal,
> 


> 6. 4k source results in an error
> 
>     
>   4096
>   0-1
>     
> 
> "error: internal error: Unable to find any usable hugetlbfs mount for
> 4096 KiB"
> 
> This example is taken from https://libvirt.org/formatdomain.html for
> DIMMs. Not sure what the expected behavior is.

Just realized that this IS expected behavior. 4096KiB pages (=4MiB) are
not regular system pages (4KiB). Thus libvirt is trying to find
hugetlbfs mount point that's serving 4MiB pages, unsuccessfully. I'll
post a patch that fixes the example though.

Michal