Re: [libvirt-users] Determining domain job kind from job stats?

2017-02-20 Thread Milan Zamazal
Jiri Denemark <jdene...@redhat.com> writes:

> On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote:
>> Jiri Denemark <jdene...@redhat.com> writes:
>> 
>> > On Fri, Feb 10, 2017 at 21:50:19 +0100, Milan Zamazal wrote:
>> >> Hi, is there a reliable way to find out what kind of job the
>> >> information returned from virDomainGetJobStats or provided in the
>> >> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback belongs to?
>> >
>> > No, libvirt expects that the caller knows what job it started. All jobs
>> > currently reported using virDomainGetJobStats API or
>> > VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event are internally implemented as
>> > migration in QEMU driver (either to a file or to a network socket),
>> > which may confuse any heuristics for detecting the job type from the set
>> > of fields returned by libvirt.
>> 
>> I see, thank you for explanation.
>> 
>> > What is the problem you are trying to solve?
>> 
>> There are basically two problems:
>> 
>> - When the job completion callback is called, I need to distinguish what
>>   kind of job it was in order to perform the appropriate actions.  It would be
>>   easier if I knew the job type directly in the callback (no need to
>>   coordinate anything), but "external" job tracking is also possible.
>
> An immediate answer would be: "don't rely on the completion callback and
> just check the return value of the API which started the job", but I
> guess you want it because checking the return value is not possible when
> the process which started the job is not running anymore as described
> below.

Well, avoiding using the completion callback is probably OK for me.
(In case of the process restart, I don't expect having everything
perfectly working, just some basic sanity.)

>> - If I lost track of my jobs (e.g. because of a crash and restart), I'd
>>   like to find out whether a given VM is migrating.  Examining the job
>>   looked like a good candidate to get the information, but apparently
>>   it's not.  Again, I can probably arrange things to handle that, but to
>>   get the information directly from libvirt (not necessarily via job
>>   info) would be easier and more reliable.
>
> Apparently you are talking about peer-to-peer migration, 

Yes.

> otherwise the migration would be automatically canceled when the
> process which started it disappears. I'm afraid this is not currently
> possible in general. You might be able to get something by checking
> the domain's status, but it won't work in all cases.

Too bad.  Could some future libvirt version provide that information?

Thank you for clarification,
Milan
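
A minimal sketch of the "external" job tracking mentioned above, assuming
the application records the job kind itself right before it calls the
libvirt API that starts the job.  The JobRegistry class, its method names
and the UUID strings are hypothetical and not part of libvirt; the
completion callback would only look up what the application stored.

```python
import threading
from typing import Dict, Optional


class JobRegistry:
    """Remembers which kind of job the application started per domain."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, str] = {}

    def started(self, domain_uuid: str, kind: str) -> None:
        # Call this right before invoking the API that starts the job.
        with self._lock:
            self._jobs[domain_uuid] = kind

    def finished(self, domain_uuid: str) -> Optional[str]:
        # Call this from the job-completed callback (or after the API
        # returns).  None means the job is unknown to this process.
        with self._lock:
            return self._jobs.pop(domain_uuid, None)


registry = JobRegistry()
registry.started("hypothetical-uuid-1", "migration")
kind = registry.finished("hypothetical-uuid-1")
unknown = registry.finished("hypothetical-uuid-2")
```

The registry lives in memory only, so it does not survive a restart of the
tracking process; that case still needs another source of information, as
discussed in the thread.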

___
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users


[libvirt-users] Determining domain job kind from job stats?

2017-02-10 Thread Milan Zamazal
Hi, is there a reliable way to find out what kind of job the
information returned from virDomainGetJobStats or provided in the
VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback belongs to?

I'm specifically interested in distinguishing host-to-host migration
jobs (e.g. those started by virDomainMigrateToUri* functions) from other
jobs.  If there is no better way, I'm thinking about examining presence
or values of certain fields in the stats.  I'd be fine with that as long
as I can be sure it's a reliable way to identify the job kind.

Thanks,
Milan



Re: [libvirt-users] Determining domain job kind from job stats?

2017-02-17 Thread Milan Zamazal
Jiri Denemark <jdene...@redhat.com> writes:

> On Fri, Feb 10, 2017 at 21:50:19 +0100, Milan Zamazal wrote:
>> Hi, is there a reliable way to find out what kind of job the
>> information returned from virDomainGetJobStats or provided in the
>> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback belongs to?
>
> No, libvirt expects that the caller knows what job it started. All jobs
> currently reported using virDomainGetJobStats API or
> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event are internally implemented as
> migration in QEMU driver (either to a file or to a network socket),
> which may confuse any heuristics for detecting the job type from the set
> of fields returned by libvirt.

I see, thank you for explanation.

> What is the problem you are trying to solve?

There are basically two problems:

- When the job completion callback is called, I need to distinguish what
  kind of job it was in order to perform the appropriate actions.  It would be
  easier if I knew the job type directly in the callback (no need to
  coordinate anything), but "external" job tracking is also possible.

- If I lost track of my jobs (e.g. because of a crash and restart), I'd
  like to find out whether a given VM is migrating.  Examining the job
  looked like a good candidate to get the information, but apparently
  it's not.  Again, I can probably arrange things to handle that, but to
  get the information directly from libvirt (not necessarily via job
  info) would be easier and more reliable.

Thanks,
Milan



Re: [libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call

2017-03-17 Thread Milan Zamazal
"Daniel P. Berrange" <berra...@redhat.com> writes:

> On Fri, Mar 17, 2017 at 11:55:13AM +0100, Milan Zamazal wrote:
>> Hi, we experienced a strange, non-reproducible error after a successful
>> migration to another host.  When we called virDomainDestroyFlags with
>> VIR_DOMAIN_DESTROY_GRACEFUL flag after the migration on the source host,
>> we got VIR_ERR_OPERATION_INVALID (code 55) error.  The same with
>> repeated virDomainDestroyFlags calls.  Normally, we would expect either
>> success or VIR_ERR_NO_DOMAIN error.  `virsh list' didn't show the VM.
>
> What about 'virsh list --all' - I expect you have an inactive guest
> present, as calling destroy on an inactive guest triggers OPERATION_INVALID

I see.  It's interesting, since we use transient domains.  Are there
known circumstances when OPERATION_INVALID could be returned for a
transient domain?  Can we assume that we never receive that error when
trying to destroy a running domain?

Thanks,
Milan



[libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call

2017-03-17 Thread Milan Zamazal
Hi, we experienced a strange, non-reproducible error after a successful
migration to another host.  When we called virDomainDestroyFlags with
VIR_DOMAIN_DESTROY_GRACEFUL flag after the migration on the source host,
we got VIR_ERR_OPERATION_INVALID (code 55) error.  The same with
repeated virDomainDestroyFlags calls.  Normally, we would expect either
success or VIR_ERR_NO_DOMAIN error.  `virsh list' didn't show the VM.

Can anybody please explain to us when this can happen and what the error
means in this context?  When we have good reasons to believe that the VM
is down (e.g. after a migration call successfully finishes) and we
receive such an error from virDomainDestroyFlags, is it safe to assume
the VM is basically gone and can we perform standard cleanup actions
(like removing related files from the host file system)?

Thank you,
Milan



Re: [libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call

2017-03-17 Thread Milan Zamazal
"Daniel P. Berrange" <berra...@redhat.com> writes:

> On Fri, Mar 17, 2017 at 02:07:11PM +0100, Milan Zamazal wrote:
>> "Daniel P. Berrange" <berra...@redhat.com> writes:
>> 
>> > On Fri, Mar 17, 2017 at 11:55:13AM +0100, Milan Zamazal wrote:
>> >> Hi, we experienced a strange, non-reproducible error after a successful
>> >> migration to another host.  When we called virDomainDestroyFlags with
>> >> VIR_DOMAIN_DESTROY_GRACEFUL flag after the migration on the source host,
>> >> we got VIR_ERR_OPERATION_INVALID (code 55) error.  The same with
>> >> repeated virDomainDestroyFlags calls.  Normally, we would expect either
>> >> success or VIR_ERR_NO_DOMAIN error.  `virsh list' didn't show the VM.
>> >
>> > What about 'virsh list --all' - I expect you have an inactive guest
>> > present, as calling destroy on an inactive guest triggers OPERATION_INVALID
>> 
>> I see.  It's interesting, since we use transient domains.  Are there
>> known circumstances when OPERATION_INVALID could be returned for a
>> transient domain?  Can we assume that we never receive that error when
>> trying to destroy a running domain?
>
> Cleanup & destruction of domains is an area where there is relatively
> high level of concurrency in libvirt. So it is conceivable that you
> would see OPERATION_INVALID for a transient guest if libvirt is part
> way through cleaning it up - it shouldn't be in that state for very
> long though

We had the state returning OPERATION_INVALID for "infinite" time.  That
could be caused by some bug or maybe problems with storage or whatever,
we don't know.

> You'll never see OPERATION_INVALID if the guest is truly running - it
> will either be shutoff, or in the process of becoming shutoff very soon.

OK, thank you for explanation and clarification.

Regards,
Milan
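
The error handling discussed in this thread could be sketched as follows.
The numeric values are assumed to match libvirt's virErrorNumber enum (55
is the code quoted above); with the Python bindings the libvirt.VIR_ERR_*
constants would be used instead.  The function name and the returned
labels are hypothetical.

```python
# Error codes assumed to mirror libvirt's virErrorNumber enum.
VIR_ERR_NO_DOMAIN = 42
VIR_ERR_OPERATION_INVALID = 55


def classify_destroy_result(error_code):
    """Map the outcome of a destroy call to a coarse domain state.

    error_code is None on success, otherwise the libvirt error code.
    """
    if error_code is None:
        return "destroyed"
    if error_code == VIR_ERR_NO_DOMAIN:
        return "gone"
    if error_code == VIR_ERR_OPERATION_INVALID:
        # Per the reply above: the guest is not running; it is shut off
        # or in the process of becoming shut off.
        return "not-running"
    return "unknown"


results = [classify_destroy_result(code) for code in (None, 42, 55, 1)]
```

Whether cleanup is safe in the "not-running" case remains a decision for
the caller; the thread only establishes that the guest is not running.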



[libvirt-users] Device lease hot unplug and events

2018-10-12 Thread Milan Zamazal
Hi, when working on hot unplugs of various devices, I've found out that
hot unplugging a <lease> device doesn't generate a
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event.  <lease> also doesn't have an
alias, so it wouldn't be identifiable in the corresponding callback.

Is this difference from other hotpluggable devices intentional?  If yes,
is there any better way of checking that <lease> removal is completed
than querying and examining the domain XML?  From user's point of view,
it would be best if I could simply handle the device removal event the
same way as with other devices.

Thanks,
Milan



Re: [libvirt-users] Device lease hot unplug and events

2018-10-15 Thread Milan Zamazal
Peter Krempa writes:

> On Mon, Oct 15, 2018 at 09:56:39 +0200, Milan Zamazal wrote:
>> Peter Krempa writes:
>> 
>> > On Fri, Oct 12, 2018 at 19:33:54 +0200, Milan Zamazal wrote:
>> >> Hi, when working on hot unplugs of various devices, I've found out that
>> >> hot unplugging a <lease> device doesn't generate a
>> >> VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event.  <lease> also doesn't have an
>> >> alias, so it wouldn't be identifiable in the corresponding callback.
>> >> 
>> >> Is this difference from other hotpluggable devices intentional?  If yes,
>> >
>> > Well a "lease" is not a device per-se. It's just libvirt putting it with
>> > devices. Currently the "lease" is always successfully removed/unplugged
>> > if the API returns success as there is no cooperation with qemu
>> > necessary so the semantics of asking the guest OS to do something don't
>> > apply.
>> 
>> I see, thank you for explanation.  Can we rely on the fact that lease
>> removal is and remains synchronous or is it a property that can change
>> in the future?
>
> In case of the current implementation I don't see a reason why we'd have
> to change it. If so it would be for a different "model".
>
> Generally we can't guarantee that some usage will not eventually need it
> but we should not change this behaviour, or at least I don't expect us
> to.
>
>> 
>> >> is there any better way of checking that <lease> removal is completed
>> >> than querying and examining the domain XML?  From user's point of view,
>> >> it would be best if I could simply handle the device removal event the
>> >> same way as with other devices.
>> >
>> > Yes, we probably should add the event and synthesize it for "lease"
>> > since we will not get one from qemu. Also we'll need to add alias for
>> > the lease so that the event can be used.
>> 
>> Since this is what an uninformed user expects (and I believe libvirt
>> documentation doesn't contradict), I'd like to have the event + alias.
>> Should I file a corresponding bug or RFE?
>
> Yes please.

OK, done: https://bugzilla.redhat.com/1639228
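
Until such an event exists, the "querying and examining the domain XML"
approach from the original question could look roughly like this.  It is
only a sketch: it assumes the XML string was already obtained (for example
via virDomainGetXMLDesc) and that leases appear as <lease> elements with
<lockspace> and <key> children under <devices>.  The function name and the
sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def lease_present(domain_xml: str, lockspace: str, key: str) -> bool:
    """Return True if the domain XML still lists the given lease."""
    root = ET.fromstring(domain_xml)
    for lease in root.findall("./devices/lease"):
        if (lease.findtext("lockspace") == lockspace
                and lease.findtext("key") == key):
            return True
    return False


EXAMPLE_XML = """<domain type='kvm'>
  <name>example</name>
  <devices>
    <lease>
      <lockspace>somearea</lockspace>
      <key>somekey</key>
      <target path='/some/lease/path' offset='1024'/>
    </lease>
  </devices>
</domain>"""

still_there = lease_present(EXAMPLE_XML, "somearea", "somekey")
other_key = lease_present(EXAMPLE_XML, "somearea", "otherkey")
```

Since lease removal is synchronous according to the reply above, a single
check after the detach call returns should be enough; no polling loop is
shown for that reason.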



Re: [libvirt-users] Device lease hot unplug and events

2018-10-15 Thread Milan Zamazal
Peter Krempa writes:

> On Fri, Oct 12, 2018 at 19:33:54 +0200, Milan Zamazal wrote:
>> Hi, when working on hot unplugs of various devices, I've found out that
>> hot unplugging a <lease> device doesn't generate a
>> VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event.  <lease> also doesn't have an
>> alias, so it wouldn't be identifiable in the corresponding callback.
>> 
>> Is this difference from other hotpluggable devices intentional?  If yes,
>
> Well a "lease" is not a device per-se. It's just libvirt putting it with
> devices. Currently the "lease" is always successfully removed/unplugged
> if the API returns success as there is no cooperation with qemu
> necessary so the semantics of asking the guest OS to do something don't
> apply.

I see, thank you for explanation.  Can we rely on the fact that lease
removal is and remains synchronous or is it a property that can change
in the future?

>> is there any better way of checking that <lease> removal is completed
>> than querying and examining the domain XML?  From user's point of view,
>> it would be best if I could simply handle the device removal event the
>> same way as with other devices.
>
> Yes, we probably should add the event and synthesize it for "lease"
> since we will not get one from qemu. Also we'll need to add alias for
> the lease so that the event can be used.

Since this is what an uninformed user expects (and I believe libvirt
documentation doesn't contradict), I'd like to have the event + alias.
Should I file a corresponding bug or RFE?

Thanks,
Milan



Re: [libvirt-users] Which objects does dynamic_ownership apply to?

2018-09-20 Thread Milan Zamazal
Michal Prívozník writes:

> On 09/19/2018 12:39 PM, Milan Zamazal wrote:
>> Hi, I'm playing with dynamic ownership and not all objects have their
>> owners changed.
>> 
>> Is dynamic_ownership and its scope documented somewhere, besides the
>> comment in qemu.conf?
>> 
>> And what kinds of objects are handled by dynamic ownership?  While some
>> objects seem to be handled, other objects are apparently unaffected.
>> For instance /dev/hwrng or a USB host device keep their root owners and
>> are inaccessible to the VM.  Is that expected or do I have anything
>> wrong?
>
> Basically, if a file is used solely by a domain we can relabel it.
> However, if a file can be used by other processes (not only qemu) then
> we must not change its label as we would be effectively cutting off the
> other processes we know nothing about. In this case, /dev/hwrng might be
> used by some other process in the system. Also the fact that it's owned
> by root:root and not readable by anybody except the root user, tells me
> that we might not want to pass the file to any domain?

Well, /dev/hwrng may be arguable, although oVirt permits passing it to a
VM, of course only on explicit user's request.

But how about host devices such as USB and PCI devices?  For example

  
  
  
  
  
  
  

doesn't change the owner of /dev/bus/usb/003/002 (the same for
managed="yes").  Similarly for a PCI hostdev device /dev/vfio/* owners
are not changed.  Does the same argument apply?

OTOH, a CD-ROM image, which can be shared across domains and at least in
theory can be accessed by other processes, gets its owner changed.

My primary concern right now is what exactly is handled.  We can deal
with manual ownership changes of certain devices as we have done so far.
But I'm looking for a more reliable source of information than my
experiments, to prevent future breakages.  Is it documented anywhere
what is handled by libvirt and what is not?  Or can it be defined in
less ambiguous terms than above?

Thanks,
Milan


Re: [libvirt-users] Which objects does dynamic_ownership apply to?

2018-09-20 Thread Milan Zamazal
Michal Privoznik writes:

> On 09/20/2018 12:31 PM, Milan Zamazal wrote:
>> Michal Prívozník writes:
>> 
>>> On 09/19/2018 12:39 PM, Milan Zamazal wrote:
>>>> Hi, I'm playing with dynamic ownership and not all objects have their
>>>> owners changed.
>>>
>>>>
>>>> Is dynamic_ownership and its scope documented somewhere, besides the
>>>> comment in qemu.conf?
>>>>
>>>> And what kinds of objects are handled by dynamic ownership?  While some
>>>> objects seem to be handled, other objects are apparently unaffected.
>>>> For instance /dev/hwrng or a USB host device keep their root owners and
>>>> are inaccessible to the VM.  Is that expected or do I have anything
>>>> wrong?
>>>
>>> Basically, if a file is used solely by a domain we can relabel it.
>>> However, if a file can be used by other processes (not only qemu) then
>>> we must not change its label as we would be effectively cutting off the
>>> other processes we know nothing about. In this case, /dev/hwrng might be
>>> used by some other process in the system. Also the fact that it's owned
>>> by root:root and not readable by anybody except the root user, tells me
>>> that we might not want to pass the file to any domain?
>> 
>> Well, /dev/hwrng may be arguable, although oVirt permits passing it to a
>> VM, of course only on explicit user's request.
>> 
>> But how about host devices such as USB and PCI devices?  For example
>> 
>>   
>>   
>>   
>>   
>>   
>>   
>>   
>> 
>> doesn't change the owner of /dev/bus/usb/003/002 (the same for
>> managed="yes"). 
>
> Are you perhaps using namespaces and looking into the parent namespace
> rather than into qemu namespace?

Ah, that's the trick, thank you!

>  Similarly for a PCI hostdev device /dev/vfio/* owners
>> are not changed.  Does the same argument apply?
>
> Again, try looking into the namespace.
>
>> 
>> OTOH, a CD-ROM image, which can be shared across domains and at least in
>> theory can be accessed by other processes, gets its owner changed.
>
> Well, this is arguable. Firstly, if you want CD-ROM image to be shared,
> it needs to have  tag, and you may want to either disable
> relabelling by  or ensure by other ways that all
> qemu processes are able to access it.

OK, makes sense.

> Libvirt should not get involved into coming up with a seclabel that
> would fit all. In terms of unix uid:gid - libvirt should not try to
> figure out which users belong to which groups and try to find such
> combination that would fit all. This is sysadmin's responsibility.

Sure.

>> My primary concern right now is what exactly is handled.  We can deal
>> with manual ownership changes of certain devices as we have done so far.
>> But I'm looking for a more reliable source of information than my
>> experiments, to prevent future breakages.  Is it documented anywhere
>> what is handled by libvirt and what is not?  Or can it be defined in
>> less ambiguous terms than above?
>
> What devices are you changing yourself? We definitely need to go through
> the list and evaluate every item.

USB/SCSI/PCI/mediated host devices – I assume it was just my confusion
of not being aware of the namespace and they should work.

And then hwrng, which definitely doesn't work (QEMU fails on start due
to not being able to access it).  We can change the owner in oVirt,
since the host is not supposed to be used for purposes other than
running VMs.  I can understand that doesn't apply in all situations.
But if someone passes /dev/hwrng to a VM, it's intended to be accessible
there.  Should libvirt or sysadmin be responsible for that?  (I'd vote
for libvirt in order to get rid of the udev rule based workaround in
oVirt, but maybe there are more important arguments to consider.)

Thanks,
Milan
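
One way to follow the namespace hint is to run ls inside the mount
namespace of the QEMU process.  The sketch below only builds the command
line; it assumes the util-linux nsenter tool is available on the host and
that the QEMU process ID is already known.  The function name and the PID
are hypothetical; the device path is the one from the message above.

```python
from typing import List


def namespace_ls_command(qemu_pid: int, device_path: str) -> List[str]:
    """Build an nsenter command that lists a device node as QEMU sees it."""
    return [
        "nsenter",
        "--target", str(qemu_pid),  # enter the namespaces of this process
        "--mount",                  # only the mount namespace is needed
        "ls", "-l", device_path,
    ]


command = namespace_ls_command(12345, "/dev/bus/usb/003/002")
```

The resulting list can be passed to subprocess.run() by a sufficiently
privileged user; the owner shown there, not the one in the parent
namespace, is what QEMU actually gets.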


Re: [libvirt-users] Usable and non-usable CPU models in nested virtualization

2018-12-12 Thread Milan Zamazal
Thank you for explanation, it's clear to me now.

One last question: Is there a way to get supported compatibility modes
on POWER?  For instance, I get the following from domcapabilities on a
POWER9 machine:

  


  POWER9
  IBM


  

How can I find out that POWER8 guests are also supported on the machine?

Thanks,
Milan



Re: [libvirt-users] Usable and non-usable CPU models in nested virtualization

2018-12-11 Thread Milan Zamazal
Jiri Denemark writes:

> On Fri, Dec 07, 2018 at 11:52:38 +0100, Milan Zamazal wrote:
>> Hi, some custom CPU models are reported from
>> virConnectGetDomainCapabilities as usable='yes' on a physical machine
>> while as usable='no' inside a VM running on the same machine.  That's
>> not completely surprising.
>> 
>> But what surprises me is that those models are still reported from
>> virConnectCompareCPU as supported (VIR_CPU_COMPARE_SUPERSET) in the
>
> virConnectCompareCPU uses CPUID data for comparison, which is not the
> same as a list of features QEMU/KVM can provide on the host. You should
> use virConnectCompareHypervisorCPU to check whether a given CPU can be
> used on the host.
>
>> nested environment and VMs can be started happily with them.
>> 
>> For instance, virConnectGetDomainCapabilities reports
>> 
>>   Skylake-Client
>> 
>> but when I try to use that model anyway, the VM starts fine with it:
>> 
>>   
>> Skylake-Client
>> 
>> 
>> 
>
> This is not the same as Skylake-Client, it's Skylake-Client without
> invpcid. The usable='no' attribute says the Skylake-Client CPU model is
> not usable unless you disable some features. You did that and it works.
> If you asked for just Skylake-Client without any <feature> element, the
> domain should fail to start.

Thank you for explanation.  However the behavior I observe is still not
clear to me.  The <cpu> snippet above is from a running domain,
successfully started from this definition:

  
Skylake-Client




  

When this definition is fed to compare CPU, I get:

  # virsh hypervisor-cpu-compare cpu.xml
  CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host

  # virsh cpu-compare cpu.xml
  Host CPU is a superset of CPU described in cpu.xml

It's not clear to me:

- Why is the domain successfully started even though
  hypervisor-cpu-compare rejects it?

- Why is `invpcid' disabled when `invpcid' is present in /proc/cpuinfo?

- What's the basic difference between virConnectCompareCPU and
  virConnectCompareHypervisorCPU?  Does "specific hypervisor and its
  abilities" (as stated in the documentation) mean that the hypervisor
  may extend CPU capabilities (by emulation), restrict CPU capabilities,
  or both (depending on the particular feature etc.)?

> Actually QEMU even reports what features need to be disabled to run each
> CPU model, but I don't think that's really useful. You don't want to
> disable all of them mechanically anyway since that can result in strange
> CPU models which would confuse guests. That's why we only report the
> usable=yes/no attribute.
>
> Jirka



[libvirt-users] Usable and non-usable CPU models in nested virtualization

2018-12-07 Thread Milan Zamazal
Hi, some custom CPU models are reported from
virConnectGetDomainCapabilities as usable='yes' on a physical machine
while as usable='no' inside a VM running on the same machine.  That's
not completely surprising.

But what surprises me is that those models are still reported from
virConnectCompareCPU as supported (VIR_CPU_COMPARE_SUPERSET) in the
nested environment and VMs can be started happily with them.

For instance, virConnectGetDomainCapabilities reports

  Skylake-Client

but when I try to use that model anyway, the VM starts fine with it:

  
Skylake-Client




  

  

That's actually good news, but unexpected.  Am I missing something?

Thanks,
Milan
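
For readers who want to inspect the usable attribute programmatically, the
following sketch extracts it from a domain capabilities document.  It
assumes the CPU models are listed as <model usable='...'> elements inside
<mode name='custom'> under <cpu>; the sample document is trimmed, the
Broadwell entry is made up for illustration, and the function name is
hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def custom_cpu_models(domcaps_xml: str) -> Dict[str, str]:
    """Map each custom CPU model name to its 'usable' attribute value."""
    root = ET.fromstring(domcaps_xml)
    models = {}
    for model in root.findall("./cpu/mode[@name='custom']/model"):
        name = (model.text or "").strip()
        # Fall back to "unknown" if the attribute is missing.
        models[name] = model.get("usable", "unknown")
    return models


SAMPLE = """<domainCapabilities>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='custom' supported='yes'>
      <model usable='no'>Skylake-Client</model>
      <model usable='yes'>Broadwell</model>
    </mode>
  </cpu>
</domainCapabilities>"""

models = custom_cpu_models(SAMPLE)
```

As explained in the reply, usable='no' does not rule a model out
completely; it says the model cannot be used unless some of its features
are disabled.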



[libvirt-users] How to get list of CPUs compatible with the host CPU and vendor?

2018-12-03 Thread Milan Zamazal
Hi, I'm trying to use virConnectGetDomainCapabilities to get the list of
CPUs compatible with the host CPU.  I would like to further limit the
list to CPUs of the same vendor as the host CPU.  How can I do that?

I tried to use virConnectBaselineCPU with a <vendor> element and checking
whether I obtain the same CPU, but that doesn't filter out CPUs without
any vendor such as `kvm64' or `pentium'.

Thanks,
Milan



[libvirt-users] Which objects does dynamic_ownership apply to?

2018-09-19 Thread Milan Zamazal
Hi, I'm playing with dynamic ownership and not all objects have their
owners changed.

Is dynamic_ownership and its scope documented somewhere, besides the
comment in qemu.conf?

And what kinds of objects are handled by dynamic ownership?  While some
objects seem to be handled, other objects are apparently unaffected.
For instance /dev/hwrng or a USB host device keep their root owners and
are inaccessible to the VM.  Is that expected or do I have anything
wrong?

Thanks,
Milan



[libvirt-users] Certificate checking on TLS migrations to an IP address

2019-09-04 Thread Milan Zamazal
Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem
with certificate checking.

oVirt uses the destination host IP address, rather than the host name,
in the migration URI passed to virDomainMigrateToURI3.  One reason for
doing that is that a separate migration network may be used for
migrations, while the host name resolves to the management network
interface.

But it causes a problem with certificate checking.  The destination IP
address is checked against the name, which is a host name, given in the
destination certificate.  That means there is a mismatch and the migration
fails.  I don't think it'd be a very good idea to avoid the problem by
putting IP addresses into server certificates.

Is there any way to make TLS migrations work under these
circumstances?  For instance, SPICE remote-viewer allows the client to
specify the certificate subject to expect on the host when connecting to
it using an IP address.  Can (or could) libvirt do something similar?
Or is there any other mechanism to handle this problem?

Thanks,
Milan



Re: [libvirt-users] Certificate checking on TLS migrations to an IP address

2019-09-04 Thread Milan Zamazal
Daniel P. Berrangé writes:

> On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote:
>> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem
>> with certificate checking.
>> 
>> oVirt uses the destination host IP address, rather than the host name,
>> in the migration URI passed to virDomainMigrateToURI3.  One reason for
>> doing that is that a separate migration network may be used for
>> migrations, while the host name resolves to the management network
>> interface.
>> 
>> But it causes a problem with certificate checking.  The destination IP
>> address is checked against the name, which is a host name, given in the
>> destination certificate.  That means there is a mismatch and the migration
>> fails.  I don't think it'd be a very good idea to avoid the problem by
>> putting IP addresses into server certificates.
>
> In fact that is *exactly* what you should be doing.

OK, thank you for explanation and the doc reference.

Regards,
Milan

> Traditionally certificates were created with the 'common name' field
> holding the fully qualified DNS based hostname for the server.
>
> This was long known to be a problem because it is very common for
> servers to have multiple DNS names, or for clients to use the
> unqualified hostname, or use the IP address(es).
>
> Thus, the "Subject alt name" extension was created. This allows
> certificates to be created containing multiple hostnames and
> multiple IP addresses. The certificate will be validated correctly
> if any one of those data items matches. When 'subject alt name' is
> present in a certificate, the 'common name' field should be completely
> ignored by compliant TLS clients, so you are free to put whatever
> you want in the common name - hostname or IP address or blah...
>
> If you look at our docs, we updated them to illustrate how to
> issue certs containing hostnames + IP addresses:
>
> https://libvirt.org/remote.html#Remote_TLS_server_certificates
>
>> 
>> Is there any way to make TLS migrations work under these
>> circumstances?  For instance, SPICE remote-viewer allows the client to
>> specify the certificate subject to expect on the host when connecting to
>> it using an IP address.  Can (or could) libvirt do something similar?
>> Or is there any other mechanism to handle this problem?
>
> Regards,
> Daniel
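
The matching rule described in the reply can be sketched as below: if the
certificate carries any Subject Alt Name entries, only those are
consulted, and the common name is used only when no such entries exist.
This is an illustration of the rule, not the validation code used by
libvirt, QEMU or GnuTLS; wildcard names are ignored, and the function
name, host name and address are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def certificate_matches(target: str,
                        san_dns: Iterable[str],
                        san_ips: Iterable[str],
                        common_name: Optional[str] = None) -> bool:
    """Return True if target (host name or IP) is covered by the cert."""
    dns_names = [name.lower() for name in san_dns]
    ip_addrs = [ipaddress.ip_address(ip) for ip in san_ips]

    if dns_names or ip_addrs:
        # SAN present: the common name is ignored entirely.
        try:
            target_ip = ipaddress.ip_address(target)
        except ValueError:
            # Not an IP literal, so compare against the DNS entries.
            return target.lower() in dns_names
        return target_ip in ip_addrs

    # No SAN at all: fall back to the legacy common-name comparison.
    return common_name is not None and target.lower() == common_name.lower()


with_ip_san = certificate_matches(
    "192.0.2.10", ["host.example.com"], ["192.0.2.10"])
without_ip_san = certificate_matches(
    "192.0.2.10", ["host.example.com"], [], common_name="192.0.2.10")
legacy_cn = certificate_matches(
    "host.example.com", [], [], common_name="host.example.com")
```

The second call shows why the migration in the original report fails: the
certificate names only a host name, so connecting by IP address finds no
matching entry.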


Re: [libvirt-users] Certificate checking on TLS migrations to an IP address

2019-09-19 Thread Milan Zamazal
Daniel P. Berrangé writes:

> On Wed, Sep 18, 2019 at 12:18:32PM +0200, Milan Zamazal wrote:
>> Daniel P. Berrangé writes:
>> 
>> > On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote:
>> >> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem
>> >> with certificate checking.
>> >
>> >> 
>> >> oVirt uses the destination host IP address, rather than the host name,
>> >> in the migration URI passed to virDomainMigrateToURI3.  One reason for
>> >> doing that is that a separate migration network may be used for
>> >> migrations, while the host name resolves to the management network
>> >> interface.
>> >> 
>> >> But it causes a problem with certificate checking.  The destination IP
>> >> address is checked against the name, which is a host name, given in the
>> >> destination certificate.  That means there is a mismatch and the migration
>> >> fails.  I don't think it'd be a very good idea to avoid the problem by
>> >> putting IP addresses into server certificates.
>> >
>> > In fact that is *exactly* what you should be doing.
>> >
>> > Traditionally certificates were created with the 'common name' field
>> > holding the fully qualified DNS based hostname for the server.
>> >
>> > This was long known to be a problem because it is very common for
>> > servers to have multiple DNS names, or for clients to use the
>> > unqualified hostname, or use the IP address(es).
>> 
>> The problem with putting IP addresses into certificates is that the
>> certificate must be updated each time an IP address changes, is added or
>> is removed.  Doing this in oVirt would be complicated and error-prone.
>> While host names are stable, host networks and the related IP addresses
>> may change.
>> 
>> > Thus, the "Subject alt name" extension was created. This allows
>> > certificates to be created containing multiple hostnames and
>> > multiple IP addresses. The certificate will be validated correctly
>> > if any one of those data items matches. When 'subject alt name' is
>> > present in a certificate, the 'common name' field should be completely
>> > ignored by compliant TLS clients, so you are free to put whatever
>> > you want in the common name - hostname or IP address or blah...
>> 
>> We can switch to using Subject Alt Name and we have a patch for that now
>> based on your advice, but it doesn't solve the problem with tracking IP
>> address changes and updating the corresponding certificates whenever a
>> change occurs.
>> 
>> > If you look at our docs, we updated them to illustrate how to
>> > issue certs containing hostnames + IP addresses:
>> >
>> > https://libvirt.org/remote.html#Remote_TLS_server_certificates
>> >
>> >> 
>> >> Is there any way to make TLS migrations working under these
>> >> circumstances?  For instance, SPICE remote-viewer allows the client to
>> >> specify the certificate subject to expect on the host when connecting to
>> >> it using an IP address.  Can (or could) libvirt do something similar?
>> 
>> Would it be possible?  We have host names in the certificates under our
>> control and we know which host name to expect in the certificate
>> regardless the IP address used for the given connection.  Checking the
>> certificate against a given host name would solve the problem easily and
>> robustly for us.
>
> There's two options that could make it work
>
>  - Define a new migration parameter which lets apps pass in the hostname
>    to use for TLS cert validation to libvirt, which would have to then
>    pass it into QEMU

I think this is the best option.  We know the destination host name,
while we need to use an IP address to connect to it in order to use a
particular network.
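
A minimal sketch of the general idea behind this option, using only Python's
standard ssl module rather than anything libvirt or QEMU provides: the TCP
connection goes to an IP address, while the certificate is validated against
a host name supplied separately.  The function names and parameters are
hypothetical and exist only for illustration.

```python
import socket
import ssl


def make_context(cafile=None):
    """Build a client context that verifies the peer certificate and its name."""
    ctx = ssl.create_default_context(cafile=cafile)
    # Both settings are already the defaults; they are set explicitly here
    # only to make the intent visible.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def connect_by_ip(ip, port, expected_hostname, ctx):
    """Connect to `ip`, but validate the certificate against `expected_hostname`."""
    sock = socket.create_connection((ip, port))
    try:
        # server_hostname is the name the certificate is checked against
        # (it is also sent as SNI); it need not match the address used above.
        return ctx.wrap_socket(sock, server_hostname=expected_hostname)
    except Exception:
        sock.close()
        raise
```

With this split, the caller chooses the network by picking the IP address and
still gets a certificate check against the name it actually expects.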

>  - The source host libvirtd has a connection to the dest host libvirtd.
>    It can thus ask dest host what its primary hostname is, and then
>    automatically tell QEMU to use that for TLS cert validation. This
>    could cause problems though for people already using TLS certs
>    with IP addresses in.

This doesn't look very good from the security point of view, since then
the source doesn't check it really connects to the host it expects, just
that the destination host has a valid certificate signed by the right CA
(I suppose).  It may be good enough or even useful for some scenarios,
but not for others.

Thanks,
Milan

___
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users

Re: [libvirt-users] Descriptions of mdev types?

2019-11-20 Thread Milan Zamazal
Erik Skultety  writes:

> On Tue, Nov 19, 2019 at 05:38:54PM +0100, Milan Zamazal wrote:
>> Hi, when retrieving an mdev device info using `virsh nodedev-dumpxml' or
>> the libvirt API, something like the following is returned:
>
>>
>> 
>>   
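>> <capability type='mdev_types'>
>>   <type id='...'>
>>     <name>GRID M60-2B4</name>
>>     <deviceAPI>vfio-pci</deviceAPI>
>>     <availableInstances>4</availableInstances>
>>   </type>
>>   ...
>> </capability>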
>> 
>>
>> Besides device_api, available_instances and the optional `name', a
>> `description' of the given mdev type may optionally be provided in
>> /sys/.../mdev_supported_types/... for each of the available mdev types.
>> I can see in the sources that libvirt doesn't try to retrieve it -- is
>> that intentional or is it just an omission?  If the latter, could it be
>> added, please?  It looks like a useful piece of information for the user
>> to get an idea of what the given mdev type means.
>
> The reasoning we had about not including the "description" attribute when we
> introduced mdevs to libvirt's nodedev driver was that there was no way for
> NVIDIA and Intel to agree on the values to be exposed by the attribute -
> especially the data NVIDIA puts in there (like you already said) is useful, but
> there was also no agreement on extracting the data into a different attribute
> (a set of attributes) and make them structured within the XML.
>
> Which brings me to the actual content of the "description" attribute, it
> contains unstructured free-form text and we didn't want to expose that kind of
> thing in the XML even though it just so happens that NVIDIA put some
> interesting data in it - since the attribute is optional and free-form, one
> day, you find the useful data you're interested in now, but tomorrow that may
> not be the case anymore and can easily change. I've got no problem with the
> idea of exposing some kind of description as part of the XML per se, the
> problem I see is that someone will try and start parsing the description field
> because of the potentially useful data, and if it changes, I'm afraid
> complaints will head our way even though we cannot guarantee anything wrt
> that specific field (I'm still open to a discussion though).

I agree that it would not be a good idea to rely on the content of the
description data or to try to parse it.  But if we can accept that the
description is basically free-form, more or less informative text, then
it is still useful for some purposes.  I think your worries about users
relying on the content of the "data" can be handled in the documentation,
by clearly stating that the item may contain anything and may change at
any time.

Our use case in oVirt is to display some accompanying information when a
user selects one of the many mdev types in the UI.  If we provided the
description (as plain text) next to each of the mdev types, it would
definitely be helpful and a significant improvement over just providing
the cryptic mdev type numbers + available instances.  Currently, we have
no better choice than to read the `description' files from /sys, which
makes little sense when all the other info is already available from
libvirt.  This is why I think having a <description> element with text
content copied 1:1 from the `description' file would be useful.
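
For illustration, the /sys workaround mentioned above could look roughly like
the following sketch.  It assumes the caller already knows the path of an
mdev_supported_types directory (the elided /sys/... part is not filled in
here); the function name is hypothetical, and the text is returned verbatim,
with no attempt to parse it.

```python
from pathlib import Path


def read_mdev_types(types_dir):
    """Return {type_id: {"name": ..., "description": ...}} for each mdev type.

    `types_dir` is the path of an mdev_supported_types directory.
    """
    result = {}
    for type_dir in sorted(Path(types_dir).iterdir()):
        if not type_dir.is_dir():
            continue
        info = {}
        for attr in ("name", "description"):
            path = type_dir / attr
            # Both files are optional; keep None when a vendor omits them.
            info[attr] = path.read_text().strip() if path.is_file() else None
        result[type_dir.name] = info
    return result
```

The returned description is only suitable for showing to a user as is, which
matches the free-form nature of the attribute discussed above.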

Thanks,
Milan





[libvirt-users] Descriptions of mdev types?

2019-11-19 Thread Milan Zamazal
Hi, when retrieving an mdev device info using `virsh nodedev-dumpxml' or
the libvirt API, something like the following is returned:


  
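<capability type='mdev_types'>
  <type id='...'>
    <name>GRID M60-2B4</name>
    <deviceAPI>vfio-pci</deviceAPI>
    <availableInstances>4</availableInstances>
  </type>
  ...
</capability>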


Besides device_api, available_instances and the optional `name', a
`description' of the given mdev type may optionally be provided in
/sys/.../mdev_supported_types/... for each of the available mdev types.
I can see in the sources that libvirt doesn't try to retrieve it -- is
that intentional or is it just an omission?  If the latter, could it be
added, please?  It looks like a useful piece of information for the user
to get an idea of what the given mdev type means.

Thanks,
Milan





Re: [libvirt-users] Certificate checking on TLS migrations to an IP address

2019-09-23 Thread Milan Zamazal
Milan Zamazal  writes:

> Daniel P. Berrangé  writes:
>
>> On Wed, Sep 18, 2019 at 12:18:32PM +0200, Milan Zamazal wrote:
>>> Daniel P. Berrangé  writes:
>>> 
>>
>>> > On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote:
>>> >> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem
>>> >> with certificate checking.
>>> >
>>> >> 
>>> >> oVirt uses the destination host IP address, rather than the host name,
>>> >> in the migration URI passed to virDomainMigrateToURI3.  One reason for
>>> >> doing that is that a separate migration network may be used for
>>> >> migrations, while the host name resolves to the management network
>>> >> interface.
>>> >> 
>>> >> But it causes a problem with certificate checking.  The destination IP
>>> >> address is checked against the name, which is a host name, given in the
>>> >> destination certificate.  That means there is mismatch and the migration
>>> >> fails.  I don't think it'd be a very good idea to avoid the problem by
>>> >> putting IP addresses into server certificates.
>>> >
>>> > In fact that is *exactly* what you should be doing.
>>> >
>>> > Traditionally certificates were created with the 'common name' field
>>> > holding the fully qualified DNS based hostname for the server.
>>> >
>>> > This was long known to be a problem because it is very common for
>>> > servers to have multiple DNS names, or for clients to use the
>>> > unqualified hostname, or use the IP address(es).
>>> 
>>> The problem with putting IP addresses into certificates is that the
>>> certificate must be updated each time an IP address changes, is added or
>>> is removed.  Doing this in oVirt would be complicated and error-prone.
>>> While host names are stable, host networks and the related IP addresses
>>> may change.
>>> 
>>> > Thus, the "Subject alt name" extension was created. This allows
>>> > certificates to be created containing multiple hostnames and
>>> > multiple IP addresses. The certificate will be validated correctly
>>> > if any one of those data items matches. When 'subject alt name' is
>>> > present in a certificate, the 'common name' field should be completely
>>> > ignored by compliant TLS clients, so you are free to put whatever
>>> > you want in the common name - hostname or IP address or blah...
>>> 
>>> We can switch to using Subject Alt Name and we have a patch for that now
>>> based on your advice, but it doesn't solve the problem with tracking IP
>>> address changes and updating the corresponding certificates whenever a
>>> change occurs.
>>> 
>>> > If you look at our docs, we updated them to illustrate how to
>>> > issue certs containing hostnames + IP addresses:
>>> >
>>> > https://libvirt.org/remote.html#Remote_TLS_server_certificates
>>> >
>>> >> 
>>> >> Is there any way to make TLS migrations working under these
>>> >> circumstances?  For instance, SPICE remote-viewer allows the client to
>>> >> specify the certificate subject to expect on the host when connecting to
>>> >> it using an IP address.  Can (or could) libvirt do something similar?
>>> 
>>> Would it be possible?  We have host names in the certificates under our
>>> control and we know which host name to expect in the certificate
>>> regardless the IP address used for the given connection.  Checking the
>>> certificate against a given host name would solve the problem easily and
>>> robustly for us.
>>
>> There's two options that could make it work
>>
>>  - Define a new migration parameter which lets apps pass in the hostname
>>    to use for TLS cert validation to libvirt, which would have to then
>>    pass it into QEMU
>
> I think this is the best option.  We know the destination host name,
> while we need to use an IP address to connect to it in order to use a
> particular network.

If we can agree on this, should I file a corresponding RFE for libvirt?

Thanks,
Milan

>>  - The source host libvirtd has a connection to the dest host libvirtd.
>>    It can thus ask dest host what its primary hostname is, and then
>>    automatically tell QEMU to use that for TLS cert validation. This
>>    could cause problems though for people already using TLS certs
>>    with IP addresses in.
>
> This doesn't look very good from the security point of view, since then
> the source doesn't check it really connects to the host it expects, just
> that the destination host has a valid certificate signed by the right CA
> (I suppose).  It may be good enough or even useful for some scenarios,
> but not for others.


Re: How to detect completion of a paused VM migration on the destination?

2020-01-22 Thread Milan Zamazal
Michal Privoznik  writes:

> On 1/21/20 3:28 PM, Milan Zamazal wrote:
>> Hi,
>>
>
>> when a normally running VM is migrated, libvirt sends
>> VIR_DOMAIN_EVENT_RESUMED_MIGRATED event on the destination once the
>> migration completes.  I can see that when a paused VM is migrated,
>> libvirt sends VIR_DOMAIN_EVENT_SUSPENDED_PAUSED instead.
>>
>> Since there seems to be nothing migration specific about
>> VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event, my question is: Is it safe to
>> assume on the destination that this event signals completion of the
>> incoming migration (unless VIR_DOMAIN_EVENT_RESUMED_MIGRATED is received
>> before)?
>
> Yes. This is the code that handles the finish phase of migration:
>
> https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_migration.c;h=29d228a8d9345ec8e2853571444614008a95e914;hb=HEAD#l5105
>
> which can be read as the following pseudo code:
>
> if (postCopy)
>   sendEvent(VIR_DOMAIN_EVENT_RESUMED_MIGRATED);
>
> if (domain.paused)
>   sendEvent(VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);

OK, thank you for clarification.
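
On the receiving side, the rule confirmed above could be tracked roughly as in
the sketch below.  It deliberately avoids the libvirt bindings: the Lifecycle
enum is a hypothetical stand-in for the two event details named in this
thread, and the class name is made up for illustration.

```python
from enum import Enum


class Lifecycle(Enum):
    # Hypothetical stand-ins for the libvirt event details named above.
    RESUMED_MIGRATED = "VIR_DOMAIN_EVENT_RESUMED_MIGRATED"
    SUSPENDED_PAUSED = "VIR_DOMAIN_EVENT_SUSPENDED_PAUSED"
    OTHER = "other"


class IncomingMigration:
    """Tracks whether an incoming migration has been seen to complete."""

    def __init__(self):
        self.completed = False

    def on_event(self, event):
        """Return True exactly once, for the event that marks completion."""
        if self.completed:
            # A later pause after completion is an ordinary pause.
            return False
        if event in (Lifecycle.RESUMED_MIGRATED, Lifecycle.SUSPENDED_PAUSED):
            self.completed = True
            return True
        return False
```

Whichever of the two events arrives first is treated as the completion
signal, so a SUSPENDED_PAUSED that follows RESUMED_MIGRATED is not counted
again.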

Regards,
Milan




How to detect completion of a paused VM migration on the destination?

2020-01-21 Thread Milan Zamazal
Hi,

when a normally running VM is migrated, libvirt sends
VIR_DOMAIN_EVENT_RESUMED_MIGRATED event on the destination once the
migration completes.  I can see that when a paused VM is migrated,
libvirt sends VIR_DOMAIN_EVENT_SUSPENDED_PAUSED instead.

Since there seems to be nothing migration-specific about the
VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event, my question is: Is it safe to
assume on the destination that this event signals completion of the
incoming migration (unless VIR_DOMAIN_EVENT_RESUMED_MIGRATED has been
received before)?

Thanks,
Milan




Re: Two questions about NVDIMM devices

2020-09-10 Thread Milan Zamazal
Daniel P. Berrangé  writes:

> On Thu, Sep 10, 2020 at 04:26:40PM +0200, Milan Zamazal wrote:
>> Daniel P. Berrangé  writes:
>> 
>
>> > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
>> >> Hi,
>> >> 
>> >
>> >> I've met two situations with NVDIMM support in libvirt where I'm not
>> >> sure all the parties (libvirt & I) do the things correctly.
>> >> 
>> >> The first problem is with memory alignment and size changes.  In
>> >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also
>> >> makes some NVDIMM size changes for better alignments, in
>> >> qemuDomainMemoryDeviceAlignSize.  This can lead to the size being
>> >> rounded up, exceeding the size of the backing device and QEMU failing to
>> >> start the VM for that reason (I've experienced that actually).  I work
>> >> with emulated NVDIMM devices, not a bare metal hardware, so one might
>> >> argue that in practice the device sizes should already be aligned, but
>> >> I'm not sure it must be always the case considering labels or whatever
>> >> else the user decides to set up.  And I still don't feel very
>> >> comfortable that I have to count with two internal size adjustments
>> >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal
>> >> of getting the VM started and having the NVDIMM aligned properly to make
>> >> (non-NVDIMM) memory hot plug working.  Is the size alignment performed
>> >> by libvirt, especially rounding up, completely correct for NVDIMMs?
>> >
>> > The comment on the function says QEMU aligns to "page size", which
>> > is something that can vary depending not only on architecture, and
>> > also the build config options for the kernel on that architecture.
>> > eg aarch64 has different page size in RHEL than other distros because
>> > of different choice of page size in kernel config. 
>> >
>> > Libvirt rounds up to 1 MB, essentially so that the size works no matter
>> > what architecture or build options were used. I think this is quite
>> > compelling as I don't think mgmt apps are likely to care enough about
>> > non-x86 architectures to pick the right rounded sizes. 
>> >
>> > If we're enforcing this 1 MB rounding though, we really should be
>> > documenting it clearly, so that apps can pick the right backing file
>> > size. I think we dropped the ball on docs.
>> 
>> I still can't see it in the documentation, would it be possible to be
>> clear about it in the docs, please?  For first, it's not very intuitive
>> to figure out that (if I've figured out it correctly) on POWER one
>> *must* specify the NVDIMM size S as
>> 
>>   S == aligned_size + label_size
>> 
>> and that size is used for the QEMU device; while on x86_64 one can
>> specify any size S and
>> 
>>   align_up(S)
>> 
>> will be used for the QEMU device (and label size doesn't influence the
>> value).  And additional alignment may be required for having any memory
>> hot plug working.
>> 
>> For second, and more importantly, I'm afraid that without documenting
>> it, future changes may break the current behavior without warning.  For
>> example, the recent changes regarding POWER alignment in 6.7.0 are for
>> good IMHO and one can use the same size with both 6.7 and 6.6 versions,
>> but they could still cause pre-6.7 sizes stop working.
>
> I don't know what changes you are referring to here, but if they were
> in libvirt I'd consider that a bug - we shouldn't break a previously
> working configuration by increasing required alignment.

I mean disabling the auto alignment in
https://gitlab.com/libvirt/libvirt/-/commit/07de813924caf37e535855541c0c1183d9d382e2
and replacing it with validation in
https://gitlab.com/libvirt/libvirt/-/commit/0ccceaa57c50e5ee528f7073fa8723afd62b88b7

That change can cause a VM to fail to start, but after (manually)
adjusting the device size, everything should work all right.  Changes
that would actually change sizes would be more dangerous.

Regards,
Milan




Re: Two questions about NVDIMM devices

2020-09-10 Thread Milan Zamazal
Daniel P. Berrangé  writes:

> On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> I've met two situations with NVDIMM support in libvirt where I'm not
>> sure all the parties (libvirt & I) do the things correctly.
>> 
>> The first problem is with memory alignment and size changes.  In
>> addition to the size changes applied to NVDIMMs by QEMU, libvirt also
>> makes some NVDIMM size changes for better alignments, in
>> qemuDomainMemoryDeviceAlignSize.  This can lead to the size being
>> rounded up, exceeding the size of the backing device and QEMU failing to
>> start the VM for that reason (I've experienced that actually).  I work
>> with emulated NVDIMM devices, not a bare metal hardware, so one might
>> argue that in practice the device sizes should already be aligned, but
>> I'm not sure it must be always the case considering labels or whatever
>> else the user decides to set up.  And I still don't feel very
>> comfortable that I have to count with two internal size adjustments
>> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal
>> of getting the VM started and having the NVDIMM aligned properly to make
>> (non-NVDIMM) memory hot plug working.  Is the size alignment performed
>> by libvirt, especially rounding up, completely correct for NVDIMMs?
>
> The comment on the function says QEMU aligns to "page size", which
> is something that can vary depending not only on architecture, and
> also the build config options for the kernel on that architecture.
> eg aarch64 has different page size in RHEL than other distros because
> of different choice of page size in kernel config. 
>
> Libvirt rounds up to 1 MB, essentially so that the size works no matter
> what architecture or build options were used. I think this is quite
> compelling as I don't think mgmt apps are likely to care enough about
> non-x86 architectures to pick the right rounded sizes. 
>
> If we're enforcing this 1 MB rounding though, we really should be
> documenting it clearly, so that apps can pick the right backing file
> size. I think we dropped the ball on docs.

I still can't see it in the documentation; would it be possible to make
it clear in the docs, please?  First, it's not very intuitive to figure
out that (if I've figured it out correctly) on POWER one
*must* specify the NVDIMM size S as

  S == aligned_size + label_size

and that size is used for the QEMU device; while on x86_64 one can
specify any size S and

  align_up(S)

will be used for the QEMU device (and label size doesn't influence the
value).  And additional alignment may be required for having any memory
hot plug working.
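
The two rules above can be written down as a small sketch.  This is only my
reading of the behavior described in this message, not libvirt code; the
function names are hypothetical, and the alignment values used in the example
(256 MiB on POWER, 2 MiB on x86_64) are assumptions taken from elsewhere in
this thread.

```python
KiB = 1024
MiB = 1024 * KiB


def align_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return -(-size // alignment) * alignment


def is_valid_power_size(size, label_size, alignment):
    """POWER rule from the text: S == aligned_size + label_size."""
    data = size - label_size
    return data > 0 and data % alignment == 0


def x86_device_size(size, alignment):
    """x86_64 rule from the text: align_up(S); the label size plays no role."""
    return align_up(size, alignment)


# On x86_64 a 1023 MiB request is rounded up to 1024 MiB, which no longer
# fits into a 1023 MiB backing device.
requested = 1023 * MiB
rounded = x86_device_size(requested, 2 * MiB)
exceeds_backing = rounded > requested
```

The last three lines show how rounding up can push the device size past the
size of the backing device, which is the startup failure described earlier in
the thread.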

Second, and more importantly, I'm afraid that without documenting
it, future changes may break the current behavior without warning.  For
example, the recent changes regarding POWER alignment in 6.7.0 are for
the better IMHO and one can use the same size with both 6.7 and 6.6
versions, but they could still cause pre-6.7 sizes to stop working.

Thanks,
Milan




Distinguishing between host and guest initiated VM shutdown

2020-08-26 Thread Milan Zamazal
Hi,

we have a problem in oVirt that highly available VMs don't restart after
host poweroff because Vdsm identifies the case as a user initiated
shutdown (https://bugzilla.redhat.com/1800966).

When poweroff is run on the host, the libvirt-guests service takes
action.  `virsh shutdown' is run on the VM, the guest OS is shut down
cleanly and libvirt reports a shutdown event with
VIR_DOMAIN_EVENT_SHUTDOWN_GUEST detail, although it is actually a host
initiated shutdown.

Does libvirt provide any means to distinguish this case from a regular
user shutdown?

Thanks,
Milan



Re: Distinguishing between host and guest initiated VM shutdown

2020-08-27 Thread Milan Zamazal
Daniel P. Berrangé  writes:

> On Thu, Aug 27, 2020 at 10:06:25AM +0200, Milan Zamazal wrote:
>> "Daniel P. Berrange"  writes:
>> 
>
>> > On Wed, Aug 26, 2020 at 10:35:22PM +0200, Milan Zamazal wrote:
>> >> Hi,
>> >> 
>> >
>> >> we have a problem in oVirt that highly available VMs don't restart after
>> >> host poweroff because Vdsm identifies the case as a user initiated
>> >> shutdown (https://bugzilla.redhat.com/1800966).
>> >> 
>> >> When poweroff is run on the host, libvirt-guests service takes an
>> >> action.
>> >
>> > If oVirt is initiating a graceful host shutdown, 
>> 
>> I meant host shutdown not initiated by oVirt.
>
> Well oVirt still knows at any point in time what VMs are currently
> running on a host, so if it sees the host shutdown, it already
> knows that needs restarting.

libvirt-guests ensures graceful shutdown of the VMs (which is a good
thing), so at the moment when the host goes down, there are no VMs
running there.  We need that info at the moment when a VM is shut down,
and I think we can get it by examining some systemd service (ignoring
some not so important timing issues).
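
One possible way to make such a check, sketched under assumptions: instead of
examining a particular systemd service as suggested above, this variant asks
systemd for the overall system state with `systemctl is-system-running',
which reports "stopping" while the host is shutting down.  The function names
are hypothetical and the timing issues mentioned above are not addressed.

```python
import subprocess


def is_shutting_down(state):
    """Interpret the output of `systemctl is-system-running`."""
    return state.strip() == "stopping"


def host_is_shutting_down():
    """Ask systemd whether the host is currently shutting down."""
    # The command exits non-zero for any state other than "running",
    # so the exit status is intentionally not checked.
    proc = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True,
        text=True,
    )
    return is_shutting_down(proc.stdout)
```

A handler for the VM shutdown event could call host_is_shutting_down() before
deciding whether the shutdown counts as user initiated.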

Regards,
Milan




Re: Distinguishing between host and guest initiated VM shutdown

2020-08-27 Thread Milan Zamazal
"Daniel P. Berrange"  writes:

> On Wed, Aug 26, 2020 at 10:35:22PM +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> we have a problem in oVirt that highly available VMs don't restart after
>> host poweroff because Vdsm identifies the case as a user initiated
>> shutdown (https://bugzilla.redhat.com/1800966).
>> 
>> When poweroff is run on the host, libvirt-guests service takes an
>> action.
>
> If oVirt is initiating a graceful host shutdown, 

I meant host shutdown not initiated by oVirt.

> then surely it already knows what VMs it has running on the host at
> that time, and so has enough info to restart them later.
>
>> `virsh shutdown' is run on the VM, the guest OS is shut down
>> cleanly and libvirt reports a shutdown event with
>> VIR_DOMAIN_EVENT_SHUTDOWN_GUEST detail.  Although it is a host initiated
>> shutdown actually.
>> 
>> Does libvirt provide any means to distinguish this case from a regular
>> user shutdown?
>
> A "virsh shutdown" merely triggers a request to the guest OS to start
> a guest initiated shutdown. As such it is indistinguishable from an
> administrator initiating the same thing inside the guest.

OK, so on VM shutdown we will have to check ourselves whether the host
is shutting down.

Thank you for the clarification,
Milan



Two questions about NVDIMM devices

2020-07-02 Thread Milan Zamazal
Hi,

I've run into two situations with NVDIMM support in libvirt where I'm not
sure all the parties (libvirt & I) do things correctly.

The first problem is with memory alignment and size changes.  In
addition to the size changes applied to NVDIMMs by QEMU, libvirt also
makes some NVDIMM size changes for better alignments, in
qemuDomainMemoryDeviceAlignSize.  This can lead to the size being
rounded up, exceeding the size of the backing device and QEMU failing to
start the VM for that reason (I've experienced that actually).  I work
with emulated NVDIMM devices, not a bare metal hardware, so one might
argue that in practice the device sizes should already be aligned, but
I'm not sure it must be always the case considering labels or whatever
else the user decides to set up.  And I still don't feel very
comfortable that I have to reckon with two internal size adjustments
(libvirt & QEMU) to the `size' value I specify, with the ultimate goal
of getting the VM started and having the NVDIMM aligned properly to make
(non-NVDIMM) memory hot plug work.  Is the size alignment performed
by libvirt, especially rounding up, completely correct for NVDIMMs?

The second problem is that a VM fails to start with a backing NVDIMM in
devdax mode due to SELinux preventing access to the /dev/dax* device (it
doesn't happen with any other NVDIMM modes).  Who should be responsible
for handling the SELinux label appropriately in that case?  libvirt, the
system administrator, anybody else?  Using <seclabel> in NVDIMM's source
doesn't seem to be accepted by the domain XML schema.

Thanks,
Milan



Re: Two questions about NVDIMM devices

2020-07-02 Thread Milan Zamazal
Daniel P. Berrangé  writes:

> On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> I've met two situations with NVDIMM support in libvirt where I'm not
>> sure all the parties (libvirt & I) do the things correctly.
>> 
>> The first problem is with memory alignment and size changes.  In
>> addition to the size changes applied to NVDIMMs by QEMU, libvirt also
>> makes some NVDIMM size changes for better alignments, in
>> qemuDomainMemoryDeviceAlignSize.  This can lead to the size being
>> rounded up, exceeding the size of the backing device and QEMU failing to
>> start the VM for that reason (I've experienced that actually).  I work
>> with emulated NVDIMM devices, not a bare metal hardware, so one might
>> argue that in practice the device sizes should already be aligned, but
>> I'm not sure it must be always the case considering labels or whatever
>> else the user decides to set up.  And I still don't feel very
>> comfortable that I have to count with two internal size adjustments
>> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal
>> of getting the VM started and having the NVDIMM aligned properly to make
>> (non-NVDIMM) memory hot plug working.  Is the size alignment performed
>> by libvirt, especially rounding up, completely correct for NVDIMMs?
>
> The comment on the function says QEMU aligns to "page size", which
> is something that can vary depending not only on architecture, and
> also the build config options for the kernel on that architecture.
> eg aarch64 has different page size in RHEL than other distros because
> of different choice of page size in kernel config. 
>
> Libvirt rounds up to 1 MB, 

Actually 2 MB, at least in my case, apparently in
qemuDomainGetMemoryModuleSizeAlignment.  But it's just a detail.

> essentially so that the size works no matter what architecture or
> build options were used. I think this is quite compelling as I don't
> think mgmt apps are likely to care enough about non-x86 architectures
> to pick the right rounded sizes.
>
> If we're enforcing this 1 MB rounding though, we really should be
> documenting it clearly, so that apps can pick the right backing file
> size. I think we dropped the ball on docs.

Yes, OK.  I also wonder how exactly label size is counted in.  It's
added to the aligned value in qemuDomainNVDimmAlignSizePseries with the
argument that label size is mandatory on ppc.  But it's also permitted
on other architectures and I can't see a similar adjustment for them.  I
think QEMU handles it fine in either case (by subtracting label size
from the overall size and aligning the result down) and I guess the
special handling of ppc in libvirt is just not to waste 256 MB
unnecessarily.  Still, all the size shuffling scares me and I can only
hope that I compute my target sizes for the domain XML correctly to make
everything work well...
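
To make the 256 MB remark concrete, here is a small sketch of the QEMU-side
computation as I understand it from the description above (subtract the label
size, then align the result down).  It is not taken from QEMU sources; the
function name is hypothetical, and the 256 MiB alignment and 128 KiB label
size are example values only.

```python
KiB = 1024
MiB = 1024 * KiB


def guest_visible_size(total_size, label_size, alignment):
    """Subtract the label area, then align the remainder down."""
    return (total_size - label_size) // alignment * alignment


ALIGN = 256 * MiB
LABEL = 128 * KiB

# Total size aligned *including* the label: almost a full alignment unit
# is lost once the label is subtracted and the rest is aligned down.
wasteful = guest_visible_size(1024 * MiB, LABEL, ALIGN)

# Total size chosen as aligned data size + label size: nothing is lost.
exact = guest_visible_size(1024 * MiB + LABEL, LABEL, ALIGN)
```

Under these assumptions the first case leaves 768 MiB visible to the guest and
the second 1024 MiB, which is presumably why the label size is added on top of
the aligned value on ppc.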

>> The second problem is that a VM fails to start with a backing NVDIMM in
>> devdax mode due to SELinux preventing access to the /dev/dax* device (it
>> doesn't happen with any other NVDIMM modes).  Who should be responsible
>> for handling the SELinux label appropriately in that case?  libvirt, the
>> system administrator, anybody else?  Using <seclabel> in NVDIMM's source
>> doesn't seem to be accepted by the domain XML schema.
>
> The expectation is that out of the box SELinux will "just work". So
> anything that is broken is a bug in either libvirt or selinux policy.
>
> There is no expectation/requirement to use <seclabel> unless you want
> to setup non-default behaviour which isn't the case here.
>
> IOW this sounds like a genuine bug.

OK, I'll try to find out what exactly the problem is and where.

Thank you for the clarifications,
Milan




NVDIMM in devdax mode and SELinux (was: Two questions about NVDIMM devices)

2020-07-09 Thread Milan Zamazal
Milan Zamazal  writes:

> Daniel P. Berrangé  writes:
>
>> On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
>>> The second problem is that a VM fails to start with a backing NVDIMM in
>>> devdax mode due to SELinux preventing access to the /dev/dax* device (it
>>> doesn't happen with any other NVDIMM modes).  Who should be responsible
>>> for handling the SELinux label appropriately in that case?  libvirt, the
>>> system administrator, anybody else?  Using <seclabel> in NVDIMM's source
>>> doesn't seem to be accepted by the domain XML schema.
>>
>> The expectation is that out of the box SELinux will "just work". So
>> anything that is broken is a bug in either libvirt or selinux policy.
>>
>> There is no expectation/requirement to use <seclabel> unless you want
>> to setup non-default behaviour which isn't the case here.
>>
>> IOW this sounds like a genuine bug.
>
> OK, I'll try to find out what and where is the problem exactly.

The problem apparently is that /dev/dax* is a character device rather
than a block device (such as /dev/pmem*), which is not expected by
SELinux policy rules.

This is an NVDIMM in fsdax mode:

  # ls -lZ /dev/pmem0
  brw-rw----. 1 root disk system_u:object_r:device_t:s0 259, 0 Jul  9 11:39 /dev/pmem0

This is the same NVDIMM reconfigured as devdax:

  # ls -lZ /dev/dax0.0 
  crw-------. 1 root root system_u:object_r:device_t:s0 252, 5 Jul  9 11:43 /dev/dax0.0

(Unix permissions are different, but when I change them to `disk' group
and 660, the same problem still occurs.)

audit.log reports the following when starting a VM with an NVDIMM device
in devdax mode:

  type=AVC msg=audit(1594144691.758:913): avc:  denied  { map } for  pid=21659 comm="qemu-kvm" path="/dev/dax0.0" dev="tmpfs" ino=1521557 scontext=system_u:system_r:svirt_t:s0:c216,c981 tcontext=system_u:object_r:svirt_image_t:s0:c216,c981 tclass=chr_file permissive=0
  type=AVC msg=audit(1594144691.758:914): avc:  denied  { map } for  pid=21659 comm="qemu-kvm" path="/dev/dax0.0" dev="tmpfs" ino=1521557 scontext=system_u:system_r:svirt_t:s0:c216,c981 tcontext=system_u:object_r:svirt_image_t:s0:c216,c981 tclass=chr_file permissive=0

Indeed, svirt_t map access to svirt_image_t is allowed only for files
and block devices:

  # sesearch -A -p map -s svirt_t -t svirt_image_t
  ...
  allow svirt_t svirt_image_t:blk_file map;
  allow svirt_t svirt_image_t:file map;
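
The comparison made above (denied object class versus the classes the policy
allows) can be reproduced with a short sketch.  The parser is a simplified
regular expression written for single-line records like the ones quoted here,
not a complete audit log parser; the function name is hypothetical, and the
set of allowed classes is copied from the sesearch output above.

```python
import re

AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}"
    r".*?\bscontext=(?P<scontext>\S+)"
    r".*?\btcontext=(?P<tcontext>\S+)"
    r".*?\btclass=(?P<tclass>\S+)"
)

# Object classes for which the sesearch output above shows an allow rule.
ALLOWED_MAP_CLASSES = {"file", "blk_file"}


def parse_avc(record):
    """Extract permissions, SELinux types and object class from an AVC denial."""
    match = AVC_RE.search(record)
    if match is None:
        return None
    return {
        "perms": match.group("perms").split(),
        # A context is user:role:type:level; the type is the third field.
        "source_type": match.group("scontext").split(":")[2],
        "target_type": match.group("tcontext").split(":")[2],
        "tclass": match.group("tclass"),
    }


record = (
    'type=AVC msg=audit(1594144691.758:913): avc:  denied  { map } for  '
    'pid=21659 comm="qemu-kvm" path="/dev/dax0.0" dev="tmpfs" ino=1521557 '
    'scontext=system_u:system_r:svirt_t:s0:c216,c981 '
    'tcontext=system_u:object_r:svirt_image_t:s0:c216,c981 '
    'tclass=chr_file permissive=0'
)
denial = parse_avc(record)
covered_by_policy = denial["tclass"] in ALLOWED_MAP_CLASSES
```

For the record quoted above, the denied class is chr_file, which is not among
the classes covered by the existing map rules.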

What to do about it?  Am I handling the NVDIMM in a wrong way, or should
the sVirt policies be fixed?

Thanks,
Milan




Re: Emulated TPM devices and snapshots of running VMs

2020-07-09 Thread Milan Zamazal
Milan Zamazal  writes:

> Hi,
>
> I would like to clarify how to make snapshots of running VMs with
> emulated TPM devices.  As far as I understand QEMU documentation, it's
> possible to make snapshots of running VMs with TPM, but it's important
> to retain the state of swtpm.  Does libvirt assist with that in any way
> or is it completely user's responsibility?  libvirt pauses the VM
> internally when making a snapshot, which should be the right moment to
> copy the swtpm data, but the user doesn't have control over it.  Is
> there a way to make a copy of swtpm data that is guaranteed to be
> consistent with the snapshot?

No idea?

> Thank you,
> Milan



Re: Emulated TPM devices and snapshots of running VMs

2020-07-09 Thread Milan Zamazal
Peter Krempa  writes:

> On Thu, Jul 09, 2020 at 14:14:32 +0200, Milan Zamazal wrote:
>> Milan Zamazal  writes:
>> 
>
>> > Hi,
>> >
>> > I would like to clarify how to make snapshots of running VMs with
>> > emulated TPM devices.  As far as I understand QEMU documentation, it's
>> > possible to make snapshots of running VMs with TPM, but it's important
>> > to retain the state of swtpm.  Does libvirt assist with that in any way
>> > or is it completely user's responsibility?  libvirt pauses the VM
>> > internally when making a snapshot, which should be the right moment to
>> > copy the swtpm data, but the user doesn't have control over it.  Is
>> > there a way to make a copy of swtpm data that is guaranteed to be
>> > consistent with the snapshot?
>> 
>> No idea?
>
> I can comment only on the fact that libvirt doesn't do anything
> regarding snapshots on a VM with TPM.

Thank you for the confirmation.

Can anybody confirm there is no way to perform custom actions while a VM
is frozen by libvirt when making a memory snapshot, before we start
thinking about workarounds and/or filing an RFE?

Thanks,
Milan



Re: Emulated TPM devices and snapshots of running VMs

2020-07-09 Thread Milan Zamazal
Peter Krempa  writes:

> On Thu, Jul 09, 2020 at 17:54:23 +0200, Milan Zamazal wrote:
>> Peter Krempa  writes:
>> 
>
>> > On Thu, Jul 09, 2020 at 14:14:32 +0200, Milan Zamazal wrote:
>> >> Milan Zamazal  writes:
>> >> 
>> >
>> >> > Hi,
>> >> >
>> >> > I would like to clarify how to make snapshots of running VMs with
>> >> > emulated TPM devices.  As far as I understand QEMU documentation, it's
>> >> > possible to make snapshots of running VMs with TPM, but it's important
>> >> > to retain the state of swtpm.  Does libvirt assist with that in any way
>> >> > or is it completely user's responsibility?  libvirt pauses the VM
>> >> > internally when making a snapshot, which should be the right moment to
>> >> > copy the swtpm data, but the user doesn't have control over it.  Is
>> >> > there a way to make a copy of swtpm data that is guaranteed to be
>> >> > consistent with the snapshot?
>> >> 
>> >> No idea?
>> >
>> > I can comment only on the fact that libvirt doesn't do anything
>> > regarding snapshots on a VM with TPM.
>> 
>> Thank you for the confirmation.
>> 
>> Can anybody confirm there is no way to perform custom actions while a VM
>> is frozen by libvirt when making a memory snapshot, before we start
>> thinking about workarounds and/or filing an RFE?
>
> No, currently we don't support any custom actions at the point when the
> external memory snapshot is finalized prior to continuing the VM.
>
> Please file a generic RFE for snapshotting including TPM rather than a
> partial one where you'll request a way to do your hack.

OK, thanks, done: https://bugzilla.redhat.com/1855367



Emulated TPM devices and snapshots of running VMs

2020-07-03 Thread Milan Zamazal
Hi,

I would like to clarify how to make snapshots of running VMs with
emulated TPM devices.  As far as I understand QEMU documentation, it's
possible to make snapshots of running VMs with TPM, but it's important
to retain the state of swtpm.  Does libvirt assist with that in any way
or is it completely user's responsibility?  libvirt pauses the VM
internally when making a snapshot, which should be the right moment to
copy the swtpm data, but the user doesn't have control over it.  Is
there a way to make a copy of swtpm data that is guaranteed to be
consistent with the snapshot?

Thank you,
Milan



Re: NVDIMM sizes and DIMM hot plug

2020-06-18 Thread Milan Zamazal
Peter Krempa  writes:

> On Tue, Jun 16, 2020 at 12:54:29 +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> I've found out that NVDIMM size and label size matter for regular
>> (non-NV) DIMM hot plug.  If the NVDIMM is not aligned correctly, the
>> guest OS will not accept the hot plugged memory and will complain with
>> messages such as
>> 
>>   Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000
>> 
>> The start address above is also reported within  element of the
>> hot plugged memory in the domain XML:
>> 
>>   
>> 
>> Apparently, in order to make memory hot plug work in the guest OS,
>> the inserted memory must be aligned to the platform memory alignment
>> (128 MB on x86_64).
>> 
>> I'd like to clarify how libvirt determines the DIMM address above.  How is
>
> If the address isn't provided in the device XML of the attached device,
> libvirt attaches the device without any address at all and then
> refreshes the address from qemu in 'qemuDomainUpdateMemoryDeviceInfo'.

OK, I can look into the QEMU source code, but I'd still like to have
some official confirmation, especially regarding possible pitfalls or
future changes.  We can't risk data loss.

>> the NVDIMM memory range determined?  According to my experiments, it
>> seems the NVDIMM specified  is taken, NVDIMM  size is
>> subtracted from it and the resulting value is reduced to the nearest
>> multiple of NVDIMM .  Is this observation correct?  Is it
>> guaranteed to be stable in future versions?  I need to determine the
>> right NVDIMM size to make the subsequent memory modules correctly
>> aligned and then I can't change the NVDIMM size, to not damage data
>> stored in the NVDIMM.
>
>
> Unfortunately I didn't implement NVDIMM support so I don't know the
> intricacies. I've cc'd Martin Kletzander who did that part.

Martin, do you know how the QEMU part is supposed to work?  I haven't
received any response on the QEMU list; do you know whom I could ask
directly?

>> Additionally, when adjusting maxMemory due to NVDIMM presence, should I
>> increase it by the specified NVDIMM  or a different value?
>> 
>> Thank you,
>> Milan
>> 



NVDIMM sizes and DIMM hot plug

2020-06-16 Thread Milan Zamazal
Hi,

I've found out that NVDIMM size and label size matter for regular
(non-NV) DIMM hot plug.  If the NVDIMM is not aligned correctly, the
guest OS will not accept the hot plugged memory and will complain with
messages such as

  Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000

The start address above is also reported within  element of the
hot plugged memory in the domain XML:

  

Apparently, in order to make memory hot plug work in the guest OS,
the inserted memory must be aligned to the platform memory alignment
(128 MB on x86_64).
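
A minimal Python sketch of that alignment check, assuming the 128 MB
block size mentioned above and assuming that both the start address and
the size of the hot plugged range must be multiples of it.  The
addresses are made up for illustration and the function names are
hypothetical.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # platform memory alignment on x86_64, as stated above


def is_aligned(start, size, block=BLOCK_SIZE):
    """True if both the start address and the size are multiples of block."""
    return start % block == 0 and size % block == 0


def align_up(addr, block=BLOCK_SIZE):
    """Round addr up to the next multiple of block."""
    return (addr + block - 1) // block * block


# Made-up example: a 256 MiB DIMM placed 32 MiB past a block boundary.
start = 0x142000000
size = 256 * MIB
print(is_aligned(start, size))        # the start address is not block aligned
print(hex(align_up(start)))           # nearest aligned address above it
print(is_aligned(align_up(start), size))
```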

I'd like to clarify how libvirt determines the DIMM address above.  How is
the NVDIMM memory range determined?  According to my experiments, it
seems the NVDIMM specified  is taken, NVDIMM  size is
subtracted from it and the resulting value is reduced to the nearest
multiple of NVDIMM .  Is this observation correct?  Is it
guaranteed to be stable in future versions?  I need to determine the
right NVDIMM size to make the subsequent memory modules correctly
aligned and then I can't change the NVDIMM size, to not damage data
stored in the NVDIMM.
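
To state the observation precisely, here is a small Python sketch of
the computation as I understand it from the experiments.  It is only a
restatement of the guess above, not confirmed libvirt or QEMU
behaviour; the parameter names `size', `label_size' and `align' are
placeholders and all values are in bytes.

```python
def observed_nvdimm_range(size, label_size, align):
    """Guess at the guest-visible NVDIMM range size (unconfirmed heuristic).

    The label size is subtracted from the specified size and the result
    is rounded down to a multiple of the alignment.
    """
    if align <= 0:
        raise ValueError("alignment must be positive")
    return (size - label_size) // align * align


GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

# Made-up example: 1 GiB NVDIMM, 128 KiB label, 2 MiB alignment.
usable = observed_nvdimm_range(1 * GIB, 128 * KIB, 2 * MIB)
print(usable, usable // MIB)
```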

Additionally, when adjusting maxMemory due to NVDIMM presence, should I
increase it by the specified NVDIMM  or a different value?

Thank you,
Milan



Re: NVDIMM sizes and DIMM hot plug

2020-06-16 Thread Milan Zamazal
Daniel P. Berrangé  writes:

> On Tue, Jun 16, 2020 at 12:54:29PM +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> I've found out that NVDIMM size and label size matter for regular
>> (non-NV) DIMM hot plug.  If the NVDIMM is not aligned correctly, the
>> guest OS will not accept the hot plugged memory and will complain with
>> messages such as
>> 
>>   Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000
>> 
>> The start address above is also reported within  element of the
>> hot plugged memory in the domain XML:
>> 
>>   
>> 
>> Apparently, in order to make memory hot plug work in the guest OS,
>> the inserted memory must be aligned to the platform memory alignment
>> (128 MB on x86_64).
>> 
>> I'd like to clarify how libvirt determines the DIMM address above.  How is
>> the NVDIMM memory range determined?  According to my experiments, it
>> seems the NVDIMM specified  is taken, NVDIMM  size is
>> subtracted from it and the resulting value is reduced to the nearest
>> multiple of NVDIMM .  Is this observation correct?  Is it
>> guaranteed to be stable in future versions?  I need to determine the
>> right NVDIMM size to make the subsequent memory modules correctly
>> aligned and then I can't change the NVDIMM size, to not damage data
>> stored in the NVDIMM.
>
> Libvirt doesn't ever assign a "base" address value itself. We just
> start QEMU, and then fill in the XML "base" with the value that QEMU
> has assigned.

I see, then I'll ask about it on the QEMU list.

>> Additionally, when adjusting maxMemory due to NVDIMM presence, should I
>> increase it by the specified NVDIMM  or a different value?
>
> IIRC, maxMemory has to allow for the sum of the basic RAM amount, plus
> RAM intended to be used for all possible future (NV)DIMMS that will be
> hotplugged.

OK.

Thanks,
Milan




Re: Memory locking limit and zero-copy migrations

2022-08-18 Thread Milan Zamazal
Fangge Jin  writes:

> On Thu, Aug 18, 2022 at 2:46 PM Milan Zamazal  wrote:
>
>> Fangge Jin  writes:
>>
>> > I can share some test results with you:
>> > 1. If no memtune->hard_limit is set when starting a VM, the default memlock
>> > hard limit is 64MB
>> > 2. If memtune->hard_limit is set when starting a VM, the memlock hard limit will
>> > be set to the value of memtune->hard_limit
>> > 3. If memtune->hard_limit is updated at run time, the memlock hard limit won't
>> > be changed accordingly
>> >
>> > And some additional knowledge:
>> > 1. memlock hard limit can be shown by ‘prlimit -p  -l’
>> > 2. The default value of memlock hard limit can be changed by setting
>> > LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service
>>
>> Ah, that explains it to me, thank you.  And since in the default case
>> the systemd limit is not reported in  of a running VM, I assume
>> libvirt takes it as "not set" and sets the higher limit when setting up
>> a zero-copy migration.  Good.
>>
> Not sure whether you already know this, but I had a hard time
> differentiating the two concepts:
> 1. memlock hard limit (shown by prlimit): the hard limit for locked host
> memory
> 2. memtune hard limit (memtune->hard_limit): the hard limit for in-use host
> memory; this memory can be swapped out.

No, I didn't know it, thank you for pointing this out.  Indeed, 2. is
what both the libvirt and kernel documentation seem to say, although not
so clearly.

But when I add  with  to the domain XML and then
start the VM, I can see the limit shown by `prlimit -l' is increased
accordingly.  This is good for my use case, but does it match what you
say about the two concepts?



Re: Memory locking limit and zero-copy migrations

2022-08-19 Thread Milan Zamazal
Fangge Jin  writes:

> On Fri, Aug 19, 2022 at 4:08 AM Milan Zamazal  wrote:
>
>> > Not sure whether you already know this, but I had a hard time
>> > differentiating the two concepts:
>> > 1. memlock hard limit (shown by prlimit): the hard limit for locked host
>> > memory
>> > 2. memtune hard limit (memtune->hard_limit): the hard limit for in-use host
>> > memory; this memory can be swapped out.
>>
>> No, I didn't know it, thank you for pointing this out.  Indeed, 2. is
>> what both the libvirt and kernel documentation seem to say, although not
>> so clearly.
>>
>> But when I add  with  to the domain XML and then
>> start the VM, I can see the limit shown by `prlimit -l' is increased
>> accordingly.  This is good for my use case, but does it match what you
>> say about the two concepts?
>
> memtune->hard_limit (the hard limit of in-use memory) actually takes effect via
> cgroup; you can check the value by:
> # virsh memtune uefi1
> hard_limit : 134217728
> soft_limit : unlimited
> swap_hard_limit: unlimited
> # cat /sys/fs/cgroup/memory/machine.slice/machine-qemu\\x2d6\\x2duefi1.scope/libvirt/memory.limit_in_bytes
> 137438953472
>
> When a VM starts with memtune->hard_limit set in the domain XML, the memlock
> hard limit (the hard limit of locked memory, shown by 'prlimit -l') will be
> set to the value of memtune->hard_limit. This is probably because the
> memlock hard limit must be less than memtune->hard_limit.

Well, increasing the memlock limit to keep it within memtune->hard_limit
wouldn't make much sense, but thank you for confirming that setting
memtune->hard_limit adjusts both the limits to the requested value.
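
For the record, the two numbers quoted above agree once units are taken
into account.  A short Python sketch of the arithmetic, assuming that
`virsh memtune' prints kibibytes while memory.limit_in_bytes is in
bytes, which is what the two values suggest:

```python
KIB = 1024

virsh_hard_limit_kib = 134217728    # hard_limit printed by virsh memtune above
cgroup_limit_bytes = 137438953472   # memory.limit_in_bytes printed above


def kib_to_bytes(kib):
    """Convert a value in kibibytes to bytes."""
    return kib * KIB


# Both values describe the same limit of 128 GiB.
print(kib_to_bytes(virsh_hard_limit_kib) == cgroup_limit_bytes)
print(cgroup_limit_bytes // 1024 ** 3, "GiB")
```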



Re: NUMA node - Memory Only

2022-08-10 Thread Milan Zamazal
Michal Prívozník  writes:

> On 8/9/22 12:55, Jin Huang wrote:
>> Hi, everyone
>> I built the libvirt 8.6.0 on my Ubuntu 20 system with the options like this:
>
>> meson build -Dsystem=true -Ddriver_interface=enabled
>> -Ddriver_libvirtd=enabled -Ddriver_network=enabled -Ddriver_qemu=enabled
>> -Ddriver_remote=enabled -Dnumactl=enabled -Dnumad=enabled
>> -Dstorage_disk=enabled
>> 
>> (1)After installation, when I tried to start the libvirtd, I get this
>> error message:
>> error : virNetworkObjAssignDefLocked:576 : operation failed: network
>> 'default' already exists with uuid 7477a9f5-02d3-4fbc-b0e8-d7229d39a6a2
>> 
>> (2)When try the virsh command, I get this error message:
>> virsh: /lib/x86_64-linux-gnu/libvirt-qemu.so.0: version
>> `LIBVIRT_QEMU_8.2.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_8.0.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_8.5.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_6.10.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.7.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.8.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.2.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.1.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.3.0' not
>> found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version
>> `LIBVIRT_PRIVATE_8.6.0' not found (required by virsh)
>> 
>> Could anyone give me some suggestions to fix these issues?
>
> This is pretty much expected if you had libvirt installed from your
> package manager (which I believe is the case because of the network
> error). I don't know what the correct way to build a .deb package is,

The easiest way is to use a package from a newer Ubuntu version.  I'd
suggest using the oldest good enough version available to avoid problems
with dependencies.  You may still be forced to use a newer libc or similar.
In such a case, you can either configure your system (in /etc/apt/) to
use some packages from a newer Ubuntu version, or rebuild the newer
libvirt package using the instructions from
https://wiki.debian.org/BuildingTutorial#The_packaging_workflow

> but on rpm based distros I usually build a .tar.xz (meson dist) from
> which I build a .rpm (rpmbuild -ta) and then install it.
>
> Michal



Re: Memory locking limit and zero-copy migrations

2022-08-17 Thread Milan Zamazal
Peter Krempa  writes:

> On Wed, Aug 17, 2022 at 10:56:54 +0200, Milan Zamazal wrote:
>> Hi,
>> 
>
>> do I read libvirt sources right that when  is not used in the
>> libvirt domain then libvirt takes proper care about setting memory
>> locking limits when zero-copy is requested for a migration?
>
> Well yes, for a definition of "proper". In this instance qemu can lock
> up to the guest-visible memory size of memory for the migration, thus we
> set the lockable size to the guest memory size. This is a simple upper
> bound which is supposed to work in all scenarios. Qemu is also unlikely
> to ever use up all the allowed locking.

Great, thank you for the confirmation.

>> I also wonder whether there are any other situations where memory limits
>> could be set by libvirt or QEMU automatically rather than having no
>> memory limits?  We had oVirt bugs in the past where certain VMs with
>> VFIO devices couldn't be started due to extra requirements on the amount
>> of locked memory and adding  to the domain apparently
>> helped.
>
>  is not only an amount of memory qemu can lock into ram, but
> an upper bound of all memory the qemu process can consume. This includes
> any qemu overhead e.g. used for the emulation layer.
>
> Guessing the correct size of overhead still has the same problems it had
> and libvirt is not going to be in the business of doing that.

To clarify, my point was not whether libvirt should, but whether libvirt
or any related component possibly does (or did in the past) impose
memory limits.  Because as I was looking around it seems there are no
real memory limits by default, at least in libvirt, but some limit had
been apparently hit in the reported bugs.



Memory locking limit and zero-copy migrations

2022-08-17 Thread Milan Zamazal
Hi,

do I read libvirt sources right that when  is not used in the
libvirt domain then libvirt takes proper care about setting memory
locking limits when zero-copy is requested for a migration?

I also wonder whether there are any other situations where memory limits
could be set by libvirt or QEMU automatically rather than having no
memory limits?  We had oVirt bugs in the past where certain VMs with
VFIO devices couldn't be started due to extra requirements on the amount
of locked memory and adding  to the domain apparently
helped.

Thanks,
Milan



Re: Memory locking limit and zero-copy migrations

2022-08-18 Thread Milan Zamazal
Fangge Jin  writes:

> I can share some test results with you:
> 1. If no memtune->hard_limit is set when starting a VM, the default memlock
> hard limit is 64MB
> 2. If memtune->hard_limit is set when starting a VM, the memlock hard limit will
> be set to the value of memtune->hard_limit
> 3. If memtune->hard_limit is updated at run time, the memlock hard limit won't
> be changed accordingly
>
> And some additional knowledge:
> 1. memlock hard limit can be shown by ‘prlimit -p  -l’
> 2. The default value of memlock hard limit can be changed by setting
> LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service

Ah, that explains it to me, thank you.  And since in the default case
the systemd limit is not reported in  of a running VM, I assume
libvirt takes it as "not set" and sets the higher limit when setting up
a zero-copy migration.  Good.
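
The same limit can be read programmatically.  A minimal Python sketch
using the standard `resource' module on Linux; it queries the current
process only as a stand-in, and for a VM one would pass the PID of the
QEMU process instead.  The helper names are hypothetical.

```python
import os
import resource


def memlock_limits(pid):
    """Return the (soft, hard) RLIMIT_MEMLOCK limits of a process, in bytes."""
    return resource.prlimit(pid, resource.RLIMIT_MEMLOCK)


def fmt(value):
    """Render a limit the way prlimit does for the unlimited case."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


# Stand-in PID: the current process; replace with the QEMU PID of the VM.
soft, hard = memlock_limits(os.getpid())
print("memlock soft:", fmt(soft))
print("memlock hard:", fmt(hard))
```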

Regards,
Milan

> BR,
> Fangge Jin
>
> On Wed, Aug 17, 2022 at 19:25 Milan Zamazal  wrote:
>
>> Peter Krempa  writes:
>>
>> > On Wed, Aug 17, 2022 at 10:56:54 +0200, Milan Zamazal wrote:
>> >> Hi,
>> >>
>> >
>> >> do I read libvirt sources right that when  is not used in the
>> >> libvirt domain then libvirt takes proper care about setting memory
>> >> locking limits when zero-copy is requested for a migration?
>> >
>> > Well yes, for a definition of "proper". In this instance qemu can lock
>> > up to the guest-visible memory size of memory for the migration, thus we
>> > set the lockable size to the guest memory size. This is a simple upper
>> > bound which is supposed to work in all scenarios. Qemu is also unlikely
>> > to ever use up all the allowed locking.
>>
>> Great, thank you for confirmation.
>>
>> >> I also wonder whether there are any other situations where memory limits
>> >> could be set by libvirt or QEMU automatically rather than having no
>> >> memory limits?  We had oVirt bugs in the past where certain VMs with
>> >> VFIO devices couldn't be started due to extra requirements on the amount
>> >> of locked memory and adding  to the domain apparently
>> >> helped.
>> >
>> >  is not only an amount of memory qemu can lock into ram, but
>> > an upper bound of all memory the qemu process can consume. This includes
>> > any qemu overhead e.g. used for the emulation layer.
>> >
>> > Guessing the correct size of overhead still has the same problems it had
>> > and libvirt is not going to be in the business of doing that.
>>
>> To clarify, my point was not whether libvirt should, but whether libvirt
>> or any related component possibly does (or did in the past) impose
>> memory limits.  Because as I was looking around it seems there are no
>> real memory limits by default, at least in libvirt, but some limit had
>> been apparently hit in the reported bugs.
>>
>>