Re: [libvirt-users] Determining domain job kind from job stats?
Jiri Denemark <jdene...@redhat.com> writes: > On Fri, Feb 17, 2017 at 12:38:24 +0100, Milan Zamazal wrote: >> Jiri Denemark <jdene...@redhat.com> writes: >> >> > On Fri, Feb 10, 2017 at 21:50:19 +0100, Milan Zamazal wrote: >> >> Hi, is there a reliable way to find out to what kind of job does the >> >> information returned from virDomainGetJobStats or provided in >> >> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback belong to? >> > >> > No, libvirt expects that the caller knows what job it started. All jobs >> > currently reported using virDomainGetJobStats API or >> > VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event are internally implemented as >> > migration in QEMU driver (either to a file or to a network socket), >> > which may confuse any heuristics for detecting the job type from the set >> > of fields returned by libvirt. >> >> I see, thank you for explanation. >> >> > What is the problem you are trying to solve? >> >> There are basically two problems: >> >> - When the job completion callback is called, I need to distinguish what >> kind of job was it to perform the appropriate actions. It would be >> easier if I knew the job type directly in the callback (no need to >> coordinate anything), but "external" job tracking is also possible. > > An immediate answer would be: "don't rely on the completion callback and > just check the return value of the API which started the job", but I > guess you want it because checking the return value is not possible when > the process which started the job is not running anymore as described > below. Well, avoiding using the completion callback is probably OK for me. (In case of the process restart, I don't expect having everything perfectly working, just some basic sanity.) >> - If I lost track of my jobs (e.g. because of a crash and restart), I'd >> like to find out whether a given VM is migrating. Examining the job >> looked like a good candidate to get the information, but apparently >> it's not. 
Again, I can probably arrange things to handle that, but to >> get the information directly from libvirt (not necessarily via job >> info) would be easier and more reliable. > > Apparently you are talking about peer-to-peer migration, Yes. > otherwise the migration would be automatically canceled when the > process which started it disappears. I'm afraid this is not currently > possible in general. You might be able to get something by checking > the domain's status, but it won't work in all cases. Too bad. Could some future libvirt version provide that information? Thank you for clarification, Milan ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
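Since libvirt expects the caller to know which job it started, the "external" job tracking mentioned in this thread could look roughly like the sketch below. It is only an illustration: JobKind, JobRegistry and the string used as a domain identifier are hypothetical names, not part of libvirt, and nothing here survives a restart of the management process.

```python
from enum import Enum


class JobKind(Enum):
    # Hypothetical job kinds a management application might start.
    MIGRATION = "migration"
    SAVE = "save"
    DUMP = "dump"


class JobRegistry:
    """Remembers which kind of job the caller started for each domain."""

    def __init__(self):
        self._jobs = {}  # domain identifier -> JobKind

    def started(self, domain_id, kind):
        # Record the kind right before calling the API that starts the job.
        self._jobs[domain_id] = kind

    def completed(self, domain_id):
        # Look the kind up from the job completion callback. Returns None
        # when the job is unknown, e.g. after the application restarted.
        return self._jobs.pop(domain_id, None)


registry = JobRegistry()
registry.started("vm-1", JobKind.MIGRATION)
print(registry.completed("vm-1"))  # the kind recorded above
print(registry.completed("vm-1"))  # nothing recorded any more
```

The None result is the case discussed above where the job kind cannot be recovered from libvirt and has to be handled by other means.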
[libvirt-users] Determining domain job kind from job stats?
Hi,

is there a reliable way to find out which kind of job the information returned from virDomainGetJobStats, or provided in the VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback, belongs to? I'm specifically interested in distinguishing host-to-host migration jobs (e.g. those started by the virDomainMigrateToURI* functions) from other jobs.

If there is no better way, I'm thinking about examining the presence or values of certain fields in the stats. I'd be fine with that as long as I can be sure it's a reliable way to identify the job kind.

Thanks,
Milan
Re: [libvirt-users] Determining domain job kind from job stats?
Jiri Denemark <jdene...@redhat.com> writes:

> On Fri, Feb 10, 2017 at 21:50:19 +0100, Milan Zamazal wrote:
>> Hi, is there a reliable way to find out to what kind of job does the
>> information returned from virDomainGetJobStats or provided in
>> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event callback belong to?
>
> No, libvirt expects that the caller knows what job it started. All jobs
> currently reported using virDomainGetJobStats API or
> VIR_DOMAIN_EVENT_ID_JOB_COMPLETED event are internally implemented as
> migration in QEMU driver (either to a file or to a network socket),
> which may confuse any heuristics for detecting the job type from the set
> of fields returned by libvirt.

I see, thank you for the explanation.

> What is the problem you are trying to solve?

There are basically two problems:

- When the job completion callback is called, I need to know what kind of
  job it was in order to perform the appropriate actions. It would be
  easier if I knew the job type directly in the callback (no need to
  coordinate anything), but "external" job tracking is also possible.

- If I lose track of my jobs (e.g. because of a crash and restart), I'd
  like to find out whether a given VM is migrating. Examining the job
  looked like a good candidate to get the information, but apparently
  it's not. Again, I can probably arrange things to handle that, but
  getting the information directly from libvirt (not necessarily via job
  info) would be easier and more reliable.

Thanks,
Milan
Re: [libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call
"Daniel P. Berrange" <berra...@redhat.com> writes: > On Fri, Mar 17, 2017 at 11:55:13AM +0100, Milan Zamazal wrote: >> Hi, we experienced a strange, non-reproducible error after a successful >> migration to another host. When we called virDomainDestroyFlags with >> VIR_DOMAIN_DESTROY_GRACEFUL flag after the migration on the source host, >> we got VIR_ERR_OPERATION_INVALID (code 55) error. The same with >> repeated virDomainDestroyFlags calls. Normally, we would expect either >> success or VIR_ERR_NO_DOMAIN error. `virsh list' didn't show the VM. > > What about 'virsh list --all' - i expect you have an inactive guest > present, as calling destory on an inactive guest triggers OPERATION_INVALID I see. It's interesting, since we use transient domains. Are there known circumstances when OPERATION_INVALID could be returned for a transient domain? Can we assume that we never receive that error when trying to destroy a running domain? Thanks, Milan ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
[libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call
Hi,

we experienced a strange, non-reproducible error after a successful migration to another host. When we called virDomainDestroyFlags with the VIR_DOMAIN_DESTROY_GRACEFUL flag on the source host after the migration, we got a VIR_ERR_OPERATION_INVALID (code 55) error. Repeated virDomainDestroyFlags calls returned the same error. Normally, we would expect either success or a VIR_ERR_NO_DOMAIN error. `virsh list' didn't show the VM.

Can anybody please explain to us when this can happen and what the error means in this context? When we have good reasons to believe that the VM is down (e.g. after a migration call successfully finishes) and we receive such an error from virDomainDestroyFlags, is it safe to assume the VM is basically gone, and can we perform standard cleanup actions (like removing related files from the host file system)?

Thank you,
Milan
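Going by the replies in this thread (OPERATION_INVALID is what destroying an inactive guest returns, and it is not expected for a guest that is truly running), a caller could interpret the result of a destroy call roughly as sketched below. This is only an illustration under assumptions: classify_destroy_result and its return strings are hypothetical, the error number is assumed to be obtained by the caller (with the Python bindings, from libvirtError.get_error_code()), and the numeric values mirror the libvirt error enum, with 55 being the code quoted above.

```python
VIR_ERR_NO_DOMAIN = 42          # domain not found
VIR_ERR_OPERATION_INVALID = 55  # operation not valid in the current state


def classify_destroy_result(error_code):
    """Map the outcome of a destroy call to a coarse state.

    error_code is None when the call succeeded, otherwise the libvirt
    error number reported for the failure.
    """
    if error_code is None:
        return "destroyed"
    if error_code == VIR_ERR_NO_DOMAIN:
        return "no-domain"
    if error_code == VIR_ERR_OPERATION_INVALID:
        # Not running: shut off, or part way through being cleaned up.
        return "not-running"
    return "failed"


for code in (None, 42, 55, 1):
    print(code, classify_destroy_result(code))
```

Whether cleanup is safe in the "not-running" case is a policy decision for the caller; the thread notes that the domain may still be part way through libvirt's own cleanup.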
Re: [libvirt-users] VIR_ERR_OPERATION_INVALID from virDomainDestroyFlags call
"Daniel P. Berrange" <berra...@redhat.com> writes: > On Fri, Mar 17, 2017 at 02:07:11PM +0100, Milan Zamazal wrote: >> "Daniel P. Berrange" <berra...@redhat.com> writes: >> >> > On Fri, Mar 17, 2017 at 11:55:13AM +0100, Milan Zamazal wrote: >> >> Hi, we experienced a strange, non-reproducible error after a successful >> >> migration to another host. When we called virDomainDestroyFlags with >> >> VIR_DOMAIN_DESTROY_GRACEFUL flag after the migration on the source host, >> >> we got VIR_ERR_OPERATION_INVALID (code 55) error. The same with >> >> repeated virDomainDestroyFlags calls. Normally, we would expect either >> >> success or VIR_ERR_NO_DOMAIN error. `virsh list' didn't show the VM. >> > >> > What about 'virsh list --all' - i expect you have an inactive guest >> > present, as calling destory on an inactive guest triggers OPERATION_INVALID >> >> I see. It's interesting, since we use transient domains. Are there >> known circumstances when OPERATION_INVALID could be returned for a >> transient domain? Can we assume that we never receive that error when >> trying to destroy a running domain? > > Cleanup & destruction of domains is an area where there is relatively > high level of concurrency in libvirt. So it is conceivable that you > would see OPERATION_INVALID for a transient guest if libvirt is part > way through cleaning it up - it shouldn't be in that state for very > long though We had the state returning OPERATION_INVALID for "infinite" time. That could be caused by some bug or maybe problems with storage or whatever, we don't know. > You'll never see OPERATION_INVALID if the guest is truely running - it > will either be shutoff, or in the process of becoming shutoff very soon. OK, thank you for explanation and clarification. Regards, Milan ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
[libvirt-users] Device lease hot unplug and events
Hi,

when working on hot unplugs of various devices, I've found out that hot unplugging a <lease> device doesn't generate a VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. <lease> also doesn't have an alias, so it wouldn't be identifiable in the corresponding callback.

Is this difference from other hotpluggable devices intentional? If yes, is there any better way of checking that the removal is completed than querying and examining the domain XML? From the user's point of view, it would be best if I could simply handle the device removal event the same way as with other devices.

Thanks,
Milan
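The "querying and examining the domain XML" approach mentioned above might be sketched as follows. It assumes the domain XML has already been fetched as a string (for instance with virDomainGetXMLDesc) and that a lease is identified by its lockspace and key; lease_present and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up domain XML fragment with a single lease, for illustration only.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example</name>
  <devices>
    <lease>
      <lockspace>somearea</lockspace>
      <key>somekey</key>
      <target path='/some/lease/path' offset='1024'/>
    </lease>
  </devices>
</domain>
"""


def lease_present(domain_xml, lockspace, key):
    """Return True if the domain XML still contains the given lease."""
    root = ET.fromstring(domain_xml)
    for lease in root.findall("./devices/lease"):
        if (lease.findtext("lockspace") == lockspace
                and lease.findtext("key") == key):
            return True
    return False


print(lease_present(SAMPLE_DOMAIN_XML, "somearea", "somekey"))
print(lease_present(SAMPLE_DOMAIN_XML, "somearea", "otherkey"))
```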
Re: [libvirt-users] Device lease hot unplug and events
Peter Krempa writes: > On Mon, Oct 15, 2018 at 09:56:39 +0200, Milan Zamazal wrote: >> Peter Krempa writes: >> > >> > On Fri, Oct 12, 2018 at 19:33:54 +0200, Milan Zamazal wrote: >> >> Hi, when working on hot unplugs of various devices, I've found out that >> >> hot unplugging device doesn't generate >> >> VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. also doesn't have an >> >> alias, so it wouldn't be identifiable in the corresponding callback. >> >> >> >> Is this difference from other hotpluggable devices intentional? If yes, >> > >> > Well a "lease" is not a device per-se. It's just libvirt putting it with >> > devices. Currently the "lease" is always successfully removed/unplugged >> > if the API returns success as there is no cooperation with qemu >> > necessary so the semantics of asking the guest OS to do something don't >> > apply. >> >> I see, thank you for explanation. Can we rely on the fact that lease >> removal is and remains synchronous or is it a property that can change >> in the future? > > In case of the current implementation I don't see a reason why we'd have > to change it. If so it would be for a different "model". > > Generally we can't guarantee that some usage will not eventually need it > but we should not change this behaviour, or at least I don't expect us > to. > >> >> >> is there any better way of checking that removal is completed >> >> than querying and examining the domain XML? From user's point of view, >> >> it would be best if I could simply handle the device removal event the >> >> same way as with other devices. >> > >> > Yes, we probably should add the event and synthetize it for "lease" >> > since we will not get one from qemu. Also we'll need to add alias for >> > the lease so that the event can be used. >> >> Since this is what an uninformed user expects (and I believe libvirt >> documentation doesn't contradict), I'd like to have the event + alias. >> Should I file a corresponding bug or RFE? > > Yes please. 
OK, done: https://bugzilla.redhat.com/1639228
Re: [libvirt-users] Device lease hot unplug and events
Peter Krempa writes:

> On Fri, Oct 12, 2018 at 19:33:54 +0200, Milan Zamazal wrote:
>> Hi, when working on hot unplugs of various devices, I've found out that
>> hot unplugging <lease> device doesn't generate
>> VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event. <lease> also doesn't have an
>> alias, so it wouldn't be identifiable in the corresponding callback.
>>
>> Is this difference from other hotpluggable devices intentional? If yes,
>
> Well a "lease" is not a device per-se. It's just libvirt putting it with
> devices. Currently the "lease" is always successfully removed/unplugged
> if the API returns success as there is no cooperation with qemu
> necessary so the semantics of asking the guest OS to do something don't
> apply.

I see, thank you for the explanation. Can we rely on the fact that lease removal is and remains synchronous, or is it a property that can change in the future?

>> is there any better way of checking that removal is completed
>> than querying and examining the domain XML? From user's point of view,
>> it would be best if I could simply handle the device removal event the
>> same way as with other devices.
>
> Yes, we probably should add the event and synthesize it for "lease"
> since we will not get one from qemu. Also we'll need to add alias for
> the lease so that the event can be used.

Since this is what an uninformed user expects (and I believe the libvirt documentation doesn't contradict it), I'd like to have the event + alias. Should I file a corresponding bug or RFE?

Thanks,
Milan
Re: [libvirt-users] Which objects does dynamic_ownership apply to?
Michal Prívozník writes: > On 09/19/2018 12:39 PM, Milan Zamazal wrote: >> Hi, I'm playing with dynamic ownership and not all objects have their >> owners changed. > >> >> Is dynamic_ownership and its scope documented somewhere, besides the >> comment in qemu.conf? >> >> And what kinds of objects are handled by dynamic ownership? While some >> objects seem to be handled, other objects are apparently unaffected. >> For instance /dev/hwrng or a USB host device keep their root owners and >> are inaccessible to the VM. Is that expected or do I have anything >> wrong? > > Basically, if a file is used solely by a domain we can relabel it. > However, if a file can be used by other processes (not only qemu) then > we must not change its label as we would be effectively cutting of the > other processes we know nothing about. In this case, /dev/hwrng might be > used by some other process in the system. Also the fact that it's owned > by root:root and not readable by anybody except the root user, tells me > that we might not want to pass the file to any domain? Well, /dev/hwrng may be arguable, although oVirt permits passing it to a VM, of course only on explicit user's request. But how about host devices such as USB and PCI devices? For example doesn't change the owner of /dev/bus/usb/003/002 (the same for managed="yes"). Similarly for a PCI hostdev device /dev/vfio/* owners are not changed. Does the same argument apply? OTOH, a CD-ROM image, which can be shared across domains and at least in theory can be accessed by other processes, gets its owner changed. My primary concern right now is what exactly is handled. We can deal with manual ownership changes of certain devices as we have done so far. But I'm looking for a more reliable source of information than my experiments, to prevent future breakages. Is it documented anywhere what is handled by libvirt and what is not? Or can it be defined in less ambiguous terms than above? 
Thanks, Milan
Re: [libvirt-users] Which objects does dynamic_ownership apply to?
Michal Privoznik writes: > On 09/20/2018 12:31 PM, Milan Zamazal wrote: >> Michal Prívozník writes: >> > >>> On 09/19/2018 12:39 PM, Milan Zamazal wrote: >>>> Hi, I'm playing with dynamic ownership and not all objects have their >>>> owners changed. >>> >>>> >>>> Is dynamic_ownership and its scope documented somewhere, besides the >>>> comment in qemu.conf? >>>> >>>> And what kinds of objects are handled by dynamic ownership? While some >>>> objects seem to be handled, other objects are apparently unaffected. >>>> For instance /dev/hwrng or a USB host device keep their root owners and >>>> are inaccessible to the VM. Is that expected or do I have anything >>>> wrong? >>> >>> Basically, if a file is used solely by a domain we can relabel it. >>> However, if a file can be used by other processes (not only qemu) then >>> we must not change its label as we would be effectively cutting of the >>> other processes we know nothing about. In this case, /dev/hwrng might be >>> used by some other process in the system. Also the fact that it's owned >>> by root:root and not readable by anybody except the root user, tells me >>> that we might not want to pass the file to any domain? >> >> Well, /dev/hwrng may be arguable, although oVirt permits passing it to a >> VM, of course only on explicit user's request. >> >> But how about host devices such as USB and PCI devices? For example >> >> >> >> >> >> >> >> >> >> doesn't change the owner of /dev/bus/usb/003/002 (the same for >> managed="yes"). > > Are you perhaps using namespaces and looking into the parent namespace > rather than into qemu namespace? Ah, that's the trick, thank you! > Similarly for a PCI hostdev device /dev/vfio/* owners >> are not changed. Does the same argument apply? > > Again, try looking into the namespace. > >> >> OTOH, a CD-ROM image, which can be shared across domains and at least in >> theory can be accessed by other processes, gets its owner changed. > > Well, this is arguable. 
Firstly, if you want CD-ROM image to be shared, > it needs to have tag, and you may want to either disable > relabelling by or ensure by other ways that all > qemu processes are able to access it. OK, makes sense. > Libvirt should not get involved into coming up with a seclabel that > would fit all. In terms of unix uid:gid - libvirt should not try to > figure out which users belong to which groups and try to find such > combination that would fit all. This is sysadmin's responsibility. Sure. >> My primary concern right now is what exactly is handled. We can deal >> with manual ownership changes of certain devices as we have done so far. >> But I'm looking for a more reliable source of information than my >> experiments, to prevent future breakages. Is it documented anywhere >> what is handled by libvirt and what is not? Or can it be defined in >> less ambiguous terms than above? > > What devices are you changing yourself? We definitely need to go through > the list and evaluate every item. USB/SCSI/PCI/mediated host devices – I assume it was just my confusion of not being aware of the namespace and they should work. And then hwrng, which definitely doesn't work (QEMU fails on start due to not being able to access it). We can change the owner in oVirt, since the host is not supposed to be used for purposes other than running VMs. I can understand that doesn't apply in all situations. But if someone passes /dev/hwrng to a VM, it's intended to be accessible there. Should libvirt or sysadmin be responsible for that? (I'd vote for libvirt in order to get rid of the udev rule based workaround in oVirt, but maybe there are more important arguments to consider.) Thanks, Milan ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
Re: [libvirt-users] Usable and non-usable CPU models in nested virtualization
Thank you for the explanation, it's clear to me now.

One last question: is there a way to get the supported compatibility modes on POWER? For instance, I get the following from domcapabilities on a POWER9 machine:

POWER9 IBM

How can I find out that POWER8 guests are also supported on the machine?

Thanks,
Milan
Re: [libvirt-users] Usable and non-usable CPU models in nested virtualization
Jiri Denemark writes: > On Fri, Dec 07, 2018 at 11:52:38 +0100, Milan Zamazal wrote: >> Hi, some custom CPU models are reported from >> virConnectGetDomainCapabilities as usable='yes' on a physical machine > >> while as usable='no' inside a VM running on the same machine. That's >> not completely surprising. >> >> But what surprises me is that those models are still reported from >> virConnectCompareCPU as supported (VIR_CPU_COMPARE_SUPERSET) in the > > virConnectCompareCPU uses CPUID data for comparison, which is not the > same as a list of features QEMU/KVM can provide on the host. You should > use virConnectCompareHypervisorCPU to check whether a given CPU can be > used on the host. > >> nested environment and VMs can be started happily with them. >> >> For instance, virConnectGetDomainCapabilities reports >> >> Skylake-Client >> >> but when I try to use that model anyway, the VM starts fine with it: >> >> >> Skylake-Client >> >> >> > > This is not the same as Skylake-Client, it's Skylake-Client without > invpcid. The usable='no' attribute says the Skylake-Client CPU model is > not usable unless you disable some features. You did that and it works. > If you asked for just Skylake-Client without any element, the > domain should fail to start. Thank you for explanation. However the behavior I observe is still not clear to me. The snippet above is from a running domain, successfully started from this definition: Skylake-Client When this definition is fed to compare CPU, I get: # virsh hypervisor-cpu-compare cpu.xml CPU described in cpu.xml is incompatible with the CPU provided by hypervisor on the host # virsh cpu-compare cpu.xml Host CPU is a superset of CPU described in cpu.xml It's not clear to me: - Why is the domain successfully started despite hypervisor-cpu-compare rejects it? - Why is `invpcid' disabled when `invpcid' is present in /proc/cpuinfo? - What's the basic difference between virConnectCompareCPU and virConnectCompareHypervisorCPU? 
Does "specific hypervisor and its abilities" (as stated in the documentation) mean that the hypervisor may extend CPU capabilities (by emulation), restrict CPU capabilities, or both (depending or particular feature etc.)? > Actually QEMU even reports what features need to be disabled to run each > CPU model, but I don't think that's really useful. You don't want to > disable all of them mechanically anyway since that can result in strange > CPU models which would confuse guests. That's why we only report the > usable=yes/no attribute. > > Jirka ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
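For reference, the usable='yes'/'no' attribute discussed in this thread can be read from the domain capabilities XML roughly as sketched below. The XML fragment is a made-up sample that only mirrors the layout of the custom CPU mode; custom_models is a hypothetical helper, and the real document would come from virConnectGetDomainCapabilities.

```python
import xml.etree.ElementTree as ET

# Made-up sample mirroring the custom CPU mode section of domcapabilities.
SAMPLE_DOMCAPS_XML = """
<domainCapabilities>
  <cpu>
    <mode name='custom' supported='yes'>
      <model usable='yes'>Haswell-noTSX</model>
      <model usable='no'>Skylake-Client</model>
    </mode>
  </cpu>
</domainCapabilities>
"""


def custom_models(domcaps_xml):
    """Return a dict mapping each custom CPU model name to its usability."""
    root = ET.fromstring(domcaps_xml)
    models = {}
    for model in root.findall("./cpu/mode[@name='custom']/model"):
        models[model.text] = model.get("usable") == "yes"
    return models


print(custom_models(SAMPLE_DOMCAPS_XML))
```

As explained above, usable='no' only says that the model cannot be used as is; it may still work once some features are disabled.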
[libvirt-users] Usable and non-usable CPU models in nested virtualization
Hi,

some custom CPU models are reported from virConnectGetDomainCapabilities as usable='yes' on a physical machine but as usable='no' inside a VM running on the same machine. That's not completely surprising.

But what surprises me is that those models are still reported from virConnectCompareCPU as supported (VIR_CPU_COMPARE_SUPERSET) in the nested environment and VMs can be started happily with them.

For instance, virConnectGetDomainCapabilities reports

Skylake-Client

but when I try to use that model anyway, the VM starts fine with it:

Skylake-Client

That's actually good news, but unexpected. Am I missing something?

Thanks,
Milan
[libvirt-users] How to get list of CPUs compatible with the host CPU and vendor?
Hi,

I'm trying to use virConnectGetDomainCapabilities to get the list of CPUs compatible with the host CPU. I would like to further limit the list to CPUs of the same vendor as the host CPU. How can I do that?

I tried using virConnectBaselineCPU with a <vendor> element and checking whether I obtain the same CPU, but that doesn't filter out CPUs without any vendor, such as `kvm64' or `pentium'.

Thanks,
Milan
[libvirt-users] Which objects does dynamic_ownership apply to?
Hi,

I'm playing with dynamic ownership and not all objects have their owners changed.

Is dynamic_ownership and its scope documented somewhere, besides the comment in qemu.conf?

And what kinds of objects are handled by dynamic ownership? While some objects seem to be handled, other objects are apparently unaffected. For instance, /dev/hwrng or a USB host device keep their root owners and are inaccessible to the VM. Is that expected or am I doing something wrong?

Thanks,
Milan
[libvirt-users] Certificate checking on TLS migrations to an IP address
Hi,

I'm trying to add TLS migrations to oVirt, but I've hit a problem with certificate checking.

oVirt uses the destination host IP address, rather than the host name, in the migration URI passed to virDomainMigrateToURI3. One reason for doing that is that a separate migration network may be used for migrations, while the host name resolves to the management network interface.

But it causes a problem with certificate checking. The destination IP address is checked against the name given in the destination certificate, which is a host name. That means there is a mismatch and the migration fails. I don't think it'd be a very good idea to avoid the problem by putting IP addresses into server certificates.

Is there any way to make TLS migrations work under these circumstances? For instance, SPICE remote-viewer allows the client to specify the certificate subject to expect on the host when connecting to it using an IP address. Can (or could) libvirt do something similar? Or is there any other mechanism to handle this problem?

Thanks,
Milan
Re: [libvirt-users] Certificate checking on TLS migrations to an IP address
Daniel P. Berrangé writes:

> On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote:
>> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem
>> with certificate checking.
>>
>> oVirt uses the destination host IP address, rather than the host name,
>> in the migration URI passed to virDomainMigrateToURI3. One reason for
>> doing that is that a separate migration network may be used for
>> migrations, while the host name resolves to the management network
>> interface.
>>
>> But it causes a problem with certificate checking. The destination IP
>> address is checked against the name, which is a host name, given in the
>> destination certificate. That means there is mismatch and the migration
>> fails. I don't think it'd be a very good idea to avoid the problem by
>> putting IP addresses into server certificates.
>
> In fact that is *exactly* what you should be doing.

OK, thank you for the explanation and the doc reference.

Regards,
Milan

> Traditionally certificates were created with the 'common name' field
> holding the fully qualified DNS based hostname for the server.
>
> This was long known to be a problem because it is very common for
> servers to have multiple DNS names, or for clients to use the
> unqualified hostname, or use the IP address(es).
>
> Thus, the "Subject alt name" extension was created. This allows
> certificates to be created containing multiple hostnames and
> multiple IP addresses. The certificate will be validated correctly
> if any one of those data items matches. When 'subject alt name' is
> present in a certificate, the 'common name' field should be completely
> ignored by compliant TLS clients, so you are free to put whatever
> you want in the common name - hostname or IP address or blah...
>
> If you look at our docs, we updated them to illustrate how to
> issue certs containing hostnames + IP addresses:
>
> https://libvirt.org/remote.html#Remote_TLS_server_certificates
>
>> Is there any way to make TLS migrations working under these
>> circumstances? For instance, SPICE remote-viewer allows the client to
>> specify the certificate subject to expect on the host when connecting to
>> it using an IP address. Can (or could) libvirt do something similar?
>> Or is there any other mechanism to handle this problem?
>
> Regards,
> Daniel
Re: [libvirt-users] Certificate checking on TLS migrations to an IP address
Daniel P. Berrangé writes: > On Wed, Sep 18, 2019 at 12:18:32PM +0200, Milan Zamazal wrote: >> Daniel P. Berrangé writes: >> > >> > On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote: >> >> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem >> >> with certificate checking. >> > >> >> >> >> oVirt uses the destination host IP address, rather than the host name, >> >> in the migration URI passed to virDomainMigrateToURI3. One reason for >> >> doing that is that a separate migration network may be used for >> >> migrations, while the host name resolves to the management network >> >> interface. >> >> >> >> But it causes a problem with certificate checking. The destination IP >> >> address is checked against the name, which is a host name, given in the >> >> destination certificate. That means there is mismatch and the migration >> >> fails. I don't think it'd be a very good idea to avoid the problem by >> >> putting IP addresses into server certificates. >> > >> > In fact that is *exactly* what you should be doing. >> > >> > Traditionally certificates were created with the 'common name' field >> > holding the fully qualified DNS based hostname for the server. >> > >> > This was long known to be a problem because it is very common for >> > servers to have multiple DNS names, or for clients to use the >> > unqualified hostname, or use the IP address(es). >> >> The problem with putting IP addresses into certificates is that the >> certificate must be updated each time an IP address changes, is added or >> is removed. Doing this in oVirt would be complicated and error-prone. >> While host names are stable, host networks and the related IP addresses >> may change. >> >> > Thus, the "Subject alt name" extension was created. This allows >> > certificates to be created containing multiple hostnames and >> > multiple IP addresses. The certificate will be validated correctly >> > if any one of those data items matches. 
When 'subject alt name' is >> > present in a certificate, the 'common name' field should be completely >> > ignored by compliant TLS clients, so you are free to put whatever >> > you want in the common name - hostname or IP address or blah... >> >> We can switch to using Subject Alt Name and we have a patch for that now >> based on your advice, but it doesn't solve the problem with tracking IP >> address changes and updating the corresponding certificates whenever a >> change occurs. >> >> > If you look at our docs, we updated them to illustrate how to >> > issue certs containing hostnames + IP addresses: >> > >> > https://libvirt.org/remote.html#Remote_TLS_server_certificates >> > >> >> >> >> Is there any way to make TLS migrations working under these >> >> circumstances? For instance, SPICE remote-viewer allows the client to >> >> specify the certificate subject to expect on the host when connecting to >> >> it using an IP address. Can (or could) libvirt do something similar? >> >> Would it be possible? We have host names in the certificates under our >> control and we know which host name to expect in the certificate >> regardless the IP address used for the given connection. Checking the >> certificate against a given host name would solve the problem easily and >> robustly for us. > > There's two options that could make it work > > - Define a new migration parameter which lets apps pass in the hostname >to use for TLS cert validation to libvirt, which would have to then >pass it into QEMU I think this is the best option. We know the destination host name, while we need to use an IP address to connect to it in order to use a particular network. > - The source host libvirtd has a connection to the dest host libvirtd. >It can thus ask dest host what its primary hostname is, and then >automatically tell QEMu to use that for TLS cert validation. This >could cause problems though for people already using TLS certs >with IP addresses in. 
This doesn't look very good from the security point of view, since then the source doesn't check it really connects to the host it expects, just that the destination host has a valid certificate signed by the right CA (I suppose). It may be good enough or even useful for some scenarios, but not for others. Thanks, Milan ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
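The first option above (connect to an IP address, but validate the certificate against a host name the caller already knows) is the same idea that generic TLS client libraries expose as a separate "server name" setting. A minimal sketch in plain Python follows. It is not libvirt or QEMU code, and the function names `make_context` and `connect_by_ip` are made up for illustration.

```python
import socket
import ssl


def make_context(ca_file=None):
    """Build a client context that verifies the peer certificate and its name."""
    ctx = ssl.create_default_context(cafile=ca_file)
    # Both settings are already the defaults of create_default_context();
    # they are spelled out because they are the point of the example.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def connect_by_ip(ip, port, expected_hostname, ca_file=None):
    """Connect to `ip`, but validate the certificate against `expected_hostname`."""
    ctx = make_context(ca_file)
    sock = socket.create_connection((ip, port))
    try:
        # server_hostname is the name the certificate is checked against
        # (it is also sent as SNI); the address used to connect is ignored.
        return ctx.wrap_socket(sock, server_hostname=expected_hostname)
    except Exception:
        sock.close()
        raise
```

With this shape the caller can pick the network by IP address and still learn that it reached the host it expected, which is the property the second option would give up.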
Re: [libvirt-users] Descriptions of mdev types?
Erik Skultety writes: > On Tue, Nov 19, 2019 at 05:38:54PM +0100, Milan Zamazal wrote: >> Hi, when retrieving an mdev device info using `virsh nodedev-dumpxml' or >> the libvirt API, something like the following is returned: > >> >> >> >> GRID M60-2B4 >> vfio-pci >> 4 >> >> ... >> >> >> Besides device_api, available_instances and (optional) `name', >> `description' of the given mdev type may be optionally provided in >> /sys/.../mdev_supported_types/... for each of the available mdev types. >> I can see in the sources that libvirt doesn't try to retrieve it -- is >> it intentionally or is it just an omission? If the latter, could it be >> added, please? It looks like a useful piece of information for the user >> to get an idea what the given mdev type means. > > The reasoning we had about not including the "description" attribute when we > introduced mdevs to libvirt's nodedev driver was that there was no way for > NVIDIA and Intel to agree on the values to be exposed by the attribute - > especially the data NVIDIA puts in there (like you already said) is useful, > but > there was also no agreement on extracting the data into a different attribute > (a set of attributes) and make them structured within the XML. > > Which brings me to the actual content of the "description" attribute, it > contains unstructured free-form text and we didn't want to expose that kind of > thing in the XML even though it just so happens that NVIDIA put some > interesting data in it - since the attribute is optional and free-form, one > day, you find the useful data you're interested in now, but tomorrow that may > not be the case anymore and can easily change. 
> I've got no problem with the
> idea of exposing some kind of description as part of the XML per se, the
> problem I see is that someone will try and start parsing the description field
> because of the potentially useful data, and if it changes, I'm afraid
> complaints will head our way even though we cannot guarantee anything wrt
> that specific field (I'm still open to a discussion though).

I agree that it would not be a good idea to rely on the content of the
description data or to try to parse it. But if we can accept that the
description is basically free-form, more or less informative text, then it
is still useful for some purposes. I think your worries about users
relying on the content of the "data" can be handled in documentation, by
clearly stating that the item may contain anything, which may change at any
time.

Our use case in oVirt is to display some accompanying information when a
user selects one of the many mdev types in the UI. If we provided the
description (as normal text) next to each of the mdev types, it would
definitely be helpful and a significant improvement over just providing the
cryptic mdev type numbers + available instances.

Currently, we have no better choice than to read `description' files from
/sys, which makes little sense when all the other info is already available
from libvirt. This is why I think having an element with text content
copied 1:1 from the `description' file would be useful.

Thanks,
Milan
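The workaround mentioned above, reading the per-type files from sysfs directly, can be sketched as follows. The function name `read_mdev_types` is hypothetical, and the caller must pass the actual `mdev_supported_types` directory of a device, since the full path is not spelled out in this thread. The `description' text is returned as opaque free-form text and is deliberately not parsed.

```python
from pathlib import Path


def read_mdev_types(types_dir):
    """Return {type_id: {attribute: text}} for one mdev_supported_types directory."""
    result = {}
    for type_dir in sorted(Path(types_dir).iterdir()):
        if not type_dir.is_dir():
            continue
        info = {}
        # `name' and `description' are optional, so missing files are skipped.
        for attr in ("name", "description", "device_api", "available_instances"):
            attr_file = type_dir / attr
            if attr_file.is_file():
                info[attr] = attr_file.read_text().strip()
        result[type_dir.name] = info
    return result
```

A UI would show `info.get("description", "")` next to the type as plain text only, which matches the "may contain anything, may change at any time" contract discussed above.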
[libvirt-users] Descriptions of mdev types?
Hi,

when retrieving mdev device info using `virsh nodedev-dumpxml' or the
libvirt API, something like the following is returned:

  GRID M60-2B4 vfio-pci 4 ...

Besides device_api, available_instances and (optional) `name',
`description' of the given mdev type may be optionally provided in
/sys/.../mdev_supported_types/... for each of the available mdev types.
I can see in the sources that libvirt doesn't try to retrieve it -- is
it intentional or is it just an omission? If the latter, could it be
added, please? It looks like a useful piece of information for the user
to get an idea what the given mdev type means.

Thanks,
Milan
Re: [libvirt-users] Certificate checking on TLS migrations to an IP address
Milan Zamazal writes: > Daniel P. Berrangé writes: > >> On Wed, Sep 18, 2019 at 12:18:32PM +0200, Milan Zamazal wrote: >>> Daniel P. Berrangé writes: >>> >> >>> > On Wed, Sep 04, 2019 at 03:38:25PM +0200, Milan Zamazal wrote: >>> >> Hi, I'm trying to add TLS migrations to oVirt, but I've hit a problem >>> >> with certificate checking. >>> > >>> >> >>> >> oVirt uses the destination host IP address, rather than the host name, >>> >> in the migration URI passed to virDomainMigrateToURI3. One reason for >>> >> doing that is that a separate migration network may be used for >>> >> migrations, while the host name resolves to the management network >>> >> interface. >>> >> >>> >> But it causes a problem with certificate checking. The destination IP >>> >> address is checked against the name, which is a host name, given in the >>> >> destination certificate. That means there is mismatch and the migration >>> >> fails. I don't think it'd be a very good idea to avoid the problem by >>> >> putting IP addresses into server certificates. >>> > >>> > In fact that is *exactly* what you should be doing. >>> > >>> > Traditionally certificates were created with the 'common name' field >>> > holding the fully qualified DNS based hostname for the server. >>> > >>> > This was long known to be a problem because it is very common for >>> > servers to have multiple DNS names, or for clients to use the >>> > unqualified hostname, or use the IP address(es). >>> >>> The problem with putting IP addresses into certificates is that the >>> certificate must be updated each time an IP address changes, is added or >>> is removed. Doing this in oVirt would be complicated and error-prone. >>> While host names are stable, host networks and the related IP addresses >>> may change. >>> >>> > Thus, the "Subject alt name" extension was created. This allows >>> > certificates to be created containing multiple hostnames and >>> > multiple IP addresses. 
The certificate will be validated correctly >>> > if any one of those data items matches. When 'subject alt name' is >>> > present in a certificate, the 'common name' field should be completely >>> > ignored by compliant TLS clients, so you are free to put whatever >>> > you want in the common name - hostname or IP address or blah... >>> >>> We can switch to using Subject Alt Name and we have a patch for that now >>> based on your advice, but it doesn't solve the problem with tracking IP >>> address changes and updating the corresponding certificates whenever a >>> change occurs. >>> >>> > If you look at our docs, we updated them to illustrate how to >>> > issue certs containing hostnames + IP addresses: >>> > >>> > https://libvirt.org/remote.html#Remote_TLS_server_certificates >>> > >>> >> >>> >> Is there any way to make TLS migrations working under these >>> >> circumstances? For instance, SPICE remote-viewer allows the client to >>> >> specify the certificate subject to expect on the host when connecting to >>> >> it using an IP address. Can (or could) libvirt do something similar? >>> >>> Would it be possible? We have host names in the certificates under our >>> control and we know which host name to expect in the certificate >>> regardless the IP address used for the given connection. Checking the >>> certificate against a given host name would solve the problem easily and >>> robustly for us. >> >> There's two options that could make it work >> >> - Define a new migration parameter which lets apps pass in the hostname >>to use for TLS cert validation to libvirt, which would have to then >>pass it into QEMU > > I think this is the best option. We know the destination host name, > while we need to use an IP address to connect to it in order to use a > particular network. If we can agree on this, should I file a corresponding RFE on libvirt? Thanks, Milan >> - The source host libvirtd has a connection to the dest host libvirtd. 
>>It can thus ask dest host what its primary hostname is, and then >>automatically tell QEMu to use that for TLS cert validation. This >>could cause problems though for people already using TLS certs >>with IP addresses in. > > This doesn't look very good from the security point of view, since then > the source doesn't check it really connects to the host it expects, just > that the destination host has a valid certificate signed by the right CA > (I suppose). It may be good enough or even useful for some scenarios, > but not for others. ___ libvirt-users mailing list libvirt-users@redhat.com https://www.redhat.com/mailman/listinfo/libvirt-users
Re: How to detect completion of a paused VM migration on the destination?
Michal Privoznik writes:

> On 1/21/20 3:28 PM, Milan Zamazal wrote:
>> Hi,
>>
>> when a normally running VM is migrated, libvirt sends
>> VIR_DOMAIN_EVENT_RESUMED_MIGRATED event on the destination once the
>> migration completes. I can see that when a paused VM is migrated,
>> libvirt sends VIR_DOMAIN_EVENT_SUSPENDED_PAUSED instead.
>>
>> Since there seems to be nothing migration specific about
>> VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event, my question is: Is it safe to
>> assume on the destination that this event signals completion of the
>> incoming migration (unless VIR_DOMAIN_EVENT_RESUMED_MIGRATED is received
>> before)?
>
> Yes. This is the code that handles the finish phase of migration:
>
> https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/qemu/qemu_migration.c;h=29d228a8d9345ec8e2853571444614008a95e914;hb=HEAD#l5105
>
> which can be read as the following pseudo code:
>
>   if (postCopy)
>       sendEvent(VIR_DOMAIN_EVENT_RESUMED_MIGRATED);
>
>   if (domain.paused)
>       sendEvent(VIR_DOMAIN_EVENT_SUSPENDED_PAUSED);

OK, thank you for clarification.

Regards,
Milan
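On the destination side, the rule confirmed above ("the first of RESUMED_MIGRATED or SUSPENDED_PAUSED marks the end of the incoming migration") might be tracked like this. It is only a sketch: the class name is hypothetical, and the numeric constants are assumptions mirroring libvirt's public headers. Real code should take them from the `libvirt` Python module instead of redefining them.

```python
# Assumed values of the libvirt lifecycle event constants; with the libvirt
# Python bindings installed, use libvirt.VIR_DOMAIN_EVENT_* instead.
VIR_DOMAIN_EVENT_SUSPENDED = 3
VIR_DOMAIN_EVENT_RESUMED = 4
VIR_DOMAIN_EVENT_SUSPENDED_PAUSED = 0
VIR_DOMAIN_EVENT_RESUMED_MIGRATED = 1


class IncomingMigrationTracker:
    """Tracks one VM that is known to be the target of an incoming migration."""

    def __init__(self):
        self.finished = False

    def on_lifecycle_event(self, event, detail):
        """Return True exactly once, for the event that completes the migration."""
        if self.finished:
            # A later pause or resume is ordinary VM activity, not a migration end.
            return False
        resumed_migrated = (event == VIR_DOMAIN_EVENT_RESUMED
                            and detail == VIR_DOMAIN_EVENT_RESUMED_MIGRATED)
        suspended_paused = (event == VIR_DOMAIN_EVENT_SUSPENDED
                            and detail == VIR_DOMAIN_EVENT_SUSPENDED_PAUSED)
        if resumed_migrated or suspended_paused:
            self.finished = True
            return True
        return False
```

The method would be called from a lifecycle callback registered with `domainEventRegisterAny` for `VIR_DOMAIN_EVENT_ID_LIFECYCLE`. Since SUSPENDED_PAUSED is not migration specific, the tracker should only exist for VMs the application already knows are being migrated in.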
How to detect completion of a paused VM migration on the destination?
Hi,

when a normally running VM is migrated, libvirt sends
VIR_DOMAIN_EVENT_RESUMED_MIGRATED event on the destination once the
migration completes. I can see that when a paused VM is migrated,
libvirt sends VIR_DOMAIN_EVENT_SUSPENDED_PAUSED instead.

Since there seems to be nothing migration specific about
VIR_DOMAIN_EVENT_SUSPENDED_PAUSED event, my question is: Is it safe to
assume on the destination that this event signals completion of the
incoming migration (unless VIR_DOMAIN_EVENT_RESUMED_MIGRATED is received
before)?

Thanks,
Milan
Re: Two questions about NVDIMM devices
Daniel P. Berrangé writes: > On Thu, Sep 10, 2020 at 04:26:40PM +0200, Milan Zamazal wrote: >> Daniel P. Berrangé writes: >> > >> > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: >> >> Hi, >> >> >> > >> >> I've met two situations with NVDIMM support in libvirt where I'm not >> >> sure all the parties (libvirt & I) do the things correctly. >> >> >> >> The first problem is with memory alignment and size changes. In >> >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also >> >> makes some NVDIMM size changes for better alignments, in >> >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being >> >> rounded up, exceeding the size of the backing device and QEMU failing to >> >> start the VM for that reason (I've experienced that actually). I work >> >> with emulated NVDIMM devices, not a bare metal hardware, so one might >> >> argue that in practice the device sizes should already be aligned, but >> >> I'm not sure it must be always the case considering labels or whatever >> >> else the user decides to set up. And I still don't feel very >> >> comfortable that I have to count with two internal size adjustments >> >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal >> >> of getting the VM started and having the NVDIMM aligned properly to make >> >> (non-NVDIMM) memory hot plug working. Is the size alignment performed >> >> by libvirt, especially rounding up, completely correct for NVDIMMs? >> > >> > The comment on the function says QEMU aligns to "page size", which >> > is something that can vary depending not only on architecture, and >> > also the build config options for the kernel on that architecture. >> > eg aarch64 has different page size in RHEL than other distros because >> > of different choice of page size in kernel config. >> > >> > Libvirt rounds up to 1 MB, essentially so that the size works no matter >> > what architecture or build options were used. 
I think this is quite >> > compelling as I don't think mgmt apps are likely to care enough about >> > non-x86 architectures to pick the right rounded sizes. >> > >> > If we're enforcing this 1 MB rounding though, we really should be >> > documenting it clearly, so that apps can pick the right backing file >> > size. I think we dropped the ball on docs. >> >> I still can't see it in the documentation, would it be possible to be >> clear about it in the docs, please? For first, it's not very intuitive >> to figure out that (if I've figured out it correctly) on POWER one >> *must* specify the NVDIMM size S as >> >> S == aligned_size + label_size >> >> and that size is used for the QEMU device; while on x86_64 one can >> specify any size S and >> >> align_up(S) >> >> will be used for the QEMU device (and label size doesn't influence the >> value). And additional alignment may be required for having any memory >> hot plug working. >> >> For second, and more importantly, I'm afraid that without documenting >> it, future changes may break the current behavior without warning. For >> example, the recent changes regarding POWER alignment in 6.7.0 are for >> good IMHO and one can use the same size with both 6.7 and 6.6 versions, >> but they could still cause pre-6.7 sizes stop working. > > I don't know what changes you are referring to here, but if they were > in libvirt I'd consider that a bug - we shouldn't break a previously > working configuration by increasing required alignment. I mean disabling the auto alignment in https://gitlab.com/libvirt/libvirt/-/commit/07de813924caf37e535855541c0c1183d9d382e2 and replacing it with validation in https://gitlab.com/libvirt/libvirt/-/commit/0ccceaa57c50e5ee528f7073fa8723afd62b88b7 That change can cause a VM fail to start but after (manually) adjusting the device size, all should work all right. Changes that would actually change sizes would be more dangerous. Regards, Milan
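The two size rules described above can be written down as plain arithmetic. This is a sketch of the behaviour as understood in this thread, not of libvirt's actual code. The function names are made up, and the alignment and label values in the usage note are illustrative, since the real alignment depends on architecture and version.

```python
MiB = 1024 * 1024
KiB = 1024


def align_up(value, alignment):
    """Round value up to the nearest multiple of alignment."""
    return -(-value // alignment) * alignment


def x86_effective_size(size, alignment):
    """x86_64 as described above: any size S is accepted and align_up(S) is used."""
    return align_up(size, alignment)


def power_size_is_valid(size, label_size, alignment):
    """POWER as described above: S must equal aligned_size + label_size."""
    return size > label_size and (size - label_size) % alignment == 0
```

For instance, with a 2 MiB alignment a requested size of 1001 MiB becomes `x86_effective_size(1001 * MiB, 2 * MiB) == 1002 * MiB`, which is larger than a 1001 MiB backing device and reproduces the start failure from the original report. On POWER, with an illustrative 128 KiB label and 256 MiB alignment, `256 * MiB + 128 * KiB` passes the check while a bare `256 * MiB` does not.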
Re: Two questions about NVDIMM devices
Daniel P. Berrangé writes: > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: >> Hi, >> > >> I've met two situations with NVDIMM support in libvirt where I'm not >> sure all the parties (libvirt & I) do the things correctly. >> >> The first problem is with memory alignment and size changes. In >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also >> makes some NVDIMM size changes for better alignments, in >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being >> rounded up, exceeding the size of the backing device and QEMU failing to >> start the VM for that reason (I've experienced that actually). I work >> with emulated NVDIMM devices, not a bare metal hardware, so one might >> argue that in practice the device sizes should already be aligned, but >> I'm not sure it must be always the case considering labels or whatever >> else the user decides to set up. And I still don't feel very >> comfortable that I have to count with two internal size adjustments >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal >> of getting the VM started and having the NVDIMM aligned properly to make >> (non-NVDIMM) memory hot plug working. Is the size alignment performed >> by libvirt, especially rounding up, completely correct for NVDIMMs? > > The comment on the function says QEMU aligns to "page size", which > is something that can vary depending not only on architecture, and > also the build config options for the kernel on that architecture. > eg aarch64 has different page size in RHEL than other distros because > of different choice of page size in kernel config. > > Libvirt rounds up to 1 MB, essentially so that the size works no matter > what architecture or build options were used. I think this is quite > compelling as I don't think mgmt apps are likely to care enough about > non-x86 architectures to pick the right rounded sizes. 
> > If we're enforcing this 1 MB rounding though, we really should be > documenting it clearly, so that apps can pick the right backing file > size. I think we dropped the ball on docs. I still can't see it in the documentation, would it be possible to be clear about it in the docs, please? For first, it's not very intuitive to figure out that (if I've figured out it correctly) on POWER one *must* specify the NVDIMM size S as S == aligned_size + label_size and that size is used for the QEMU device; while on x86_64 one can specify any size S and align_up(S) will be used for the QEMU device (and label size doesn't influence the value). And additional alignment may be required for having any memory hot plug working. For second, and more importantly, I'm afraid that without documenting it, future changes may break the current behavior without warning. For example, the recent changes regarding POWER alignment in 6.7.0 are for good IMHO and one can use the same size with both 6.7 and 6.6 versions, but they could still cause pre-6.7 sizes stop working. Thanks, Milan
Distinguishing between host and guest initiated VM shutdown
Hi,

we have a problem in oVirt that highly available VMs don't restart after
host poweroff because Vdsm identifies the case as a user initiated
shutdown (https://bugzilla.redhat.com/1800966).

When poweroff is run on the host, the libvirt-guests service takes
action: `virsh shutdown' is run on the VM, the guest OS is shut down
cleanly and libvirt reports a shutdown event with
VIR_DOMAIN_EVENT_SHUTDOWN_GUEST detail, although it is actually a host
initiated shutdown.

Does libvirt provide any means to distinguish this case from a regular
user shutdown?

Thanks,
Milan
Re: Distinguishing between host and guest initiated VM shutdown
Daniel P. Berrangé writes:

> On Thu, Aug 27, 2020 at 10:06:25AM +0200, Milan Zamazal wrote:
>> "Daniel P. Berrange" writes:
>>
>> > On Wed, Aug 26, 2020 at 10:35:22PM +0200, Milan Zamazal wrote:
>> >> Hi,
>> >>
>> >> we have a problem in oVirt that highly available VMs don't restart after
>> >> host poweroff because Vdsm identifies the case as a user initiated
>> >> shutdown (https://bugzilla.redhat.com/1800966).
>> >>
>> >> When poweroff is run on the host, libvirt-guests service takes an
>> >> action.
>> >
>> > If oVirt is initiating a graceful host shutdown,
>>
>> I meant host shutdown not initiated by oVirt.
>
> Well oVirt still knows at any point in time what VMs are currently
> running on a host, so if it sees the host shutdown, it already
> knows that needs restarting.

libvirt-guests ensures graceful shutdown of the VMs (which is a good
thing), so by the time the host goes down, no VMs are running there.
We need that information at the moment a VM is shut down, and I think we
can get it by examining some systemd service (ignoring some not so
important timing issues).

Regards,
Milan
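One possible way to do the systemd check mentioned above is to ask `systemctl is-system-running`, which prints `stopping` while the manager is shutting down. The sketch below is an assumption about how such a check could look, not what Vdsm does. Both function names are hypothetical, and it inherits the timing caveats noted above.

```python
import subprocess


def state_means_shutdown(state):
    """Interpret the text printed by `systemctl is-system-running`."""
    return state.strip() == "stopping"


def host_is_shutting_down():
    """Best-effort check whether systemd reports that the host is going down."""
    try:
        proc = subprocess.run(
            ["systemctl", "is-system-running"],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # No systemctl available or no answer in time: report "not shutting down".
        return False
    return state_means_shutdown(proc.stdout)
```

The function would be called when the shutdown event with VIR_DOMAIN_EVENT_SHUTDOWN_GUEST detail arrives. A `True` result suggests treating the event as host initiated rather than as a user shutdown.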
Re: Distinguishing between host and guest initiated VM shutdown
"Daniel P. Berrange" writes: > On Wed, Aug 26, 2020 at 10:35:22PM +0200, Milan Zamazal wrote: >> Hi, >> > >> we have a problem in oVirt that highly available VMs don't restart after >> host poweroff because Vdsm identifies the case as a user initiated >> shutdown (https://bugzilla.redhat.com/1800966). >> >> When poweroff is run on the host, libvirt-guests service takes an >> action. > > If oVirt is initiating a graceful host shutdown, I meant host shutdown not initiated by oVirt. > then surely it already knows what VMs it has running on the host at > that time, and so has enough info to restart them later. > >>`virsh shutdown' is run on the VM, the guest OS is shut down >> cleanly and libvirt reports a shutdown event with >> VIR_DOMAIN_EVENT_SHUTDOWN_GUEST detail. Although it is a host initiated >> shutdown actually. >> >> Does libvirt provide any means to distinguish this case from a regular >> user shutdown? > > A "virsh shutdown" merely triggers a request to the guest OS to start > a guest initiated shutdown. As such it is indistinguishable from an > administrator initiating the same thing inside the guest. OK, so we will have to check on VM shutdown whether the host is in shutdown or not ourselves. Thank you for clarification, Milan
Two questions about NVDIMM devices
Hi,

I've met two situations with NVDIMM support in libvirt where I'm not
sure all the parties (libvirt & I) do the things correctly.

The first problem is with memory alignment and size changes. In
addition to the size changes applied to NVDIMMs by QEMU, libvirt also
makes some NVDIMM size changes for better alignments, in
qemuDomainMemoryDeviceAlignSize. This can lead to the size being
rounded up, exceeding the size of the backing device and QEMU failing to
start the VM for that reason (I've actually experienced that). I work
with emulated NVDIMM devices, not bare metal hardware, so one might
argue that in practice the device sizes should already be aligned, but
I'm not sure that must always be the case considering labels or whatever
else the user decides to set up. And I still don't feel very
comfortable that I have to reckon with two internal size adjustments
(libvirt & QEMU) to the `size' value I specify, with the ultimate goal
of getting the VM started and having the NVDIMM aligned properly to make
(non-NVDIMM) memory hot plug work. Is the size alignment performed
by libvirt, especially rounding up, completely correct for NVDIMMs?

The second problem is that a VM fails to start with a backing NVDIMM in
devdax mode due to SELinux preventing access to the /dev/dax* device (it
doesn't happen with any other NVDIMM modes). Who should be responsible
for handling the SELinux label appropriately in that case? libvirt, the
system administrator, anybody else? Using in NVDIMM's source
doesn't seem to be accepted by the domain XML schema.

Thanks,
Milan
Re: Two questions about NVDIMM devices
Daniel P. Berrangé writes: > On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote: >> Hi, >> > >> I've met two situations with NVDIMM support in libvirt where I'm not >> sure all the parties (libvirt & I) do the things correctly. >> >> The first problem is with memory alignment and size changes. In >> addition to the size changes applied to NVDIMMs by QEMU, libvirt also >> makes some NVDIMM size changes for better alignments, in >> qemuDomainMemoryDeviceAlignSize. This can lead to the size being >> rounded up, exceeding the size of the backing device and QEMU failing to >> start the VM for that reason (I've experienced that actually). I work >> with emulated NVDIMM devices, not a bare metal hardware, so one might >> argue that in practice the device sizes should already be aligned, but >> I'm not sure it must be always the case considering labels or whatever >> else the user decides to set up. And I still don't feel very >> comfortable that I have to count with two internal size adjustments >> (libvirt & QEMU) to the `size' value I specify, with the ultimate goal >> of getting the VM started and having the NVDIMM aligned properly to make >> (non-NVDIMM) memory hot plug working. Is the size alignment performed >> by libvirt, especially rounding up, completely correct for NVDIMMs? > > The comment on the function says QEMU aligns to "page size", which > is something that can vary depending not only on architecture, and > also the build config options for the kernel on that architecture. > eg aarch64 has different page size in RHEL than other distros because > of different choice of page size in kernel config. > > Libvirt rounds up to 1 MB, Actually 2 MB, at least in my case, apparently in qemuDomainGetMemoryModuleSizeAlignment. But it's just a detail. > essentially so that the size works no matter what architecture or > build options were used. 
> I think this is quite compelling as I don't
> think mgmt apps are likely to care enough about non-x86 architectures
> to pick the right rounded sizes.
>
> If we're enforcing this 1 MB rounding though, we really should be
> documenting it clearly, so that apps can pick the right backing file
> size. I think we dropped the ball on docs.

Yes, OK.

I also wonder how exactly label size is counted in. It's added to the
aligned value in qemuDomainNVDimmAlignSizePseries with the argument that
label size is mandatory on ppc. But it's also permitted on other
architectures and I can't see a similar adjustment for them. I think
QEMU handles it fine in either case (by subtracting label size from the
overall size and aligning the result down) and I guess the special
handling of ppc in libvirt is just not to waste 256 MB unnecessarily.

Still, all the size shuffling scares me and I can only hope that I
compute my target sizes for the domain XML correctly to make everything
work well...

>> The second problem is that a VM fails to start with a backing NVDIMM in
>> devdax mode due to SELinux preventing access to the /dev/dax* device (it
>> doesn't happen with any other NVDIMM modes). Who should be responsible
>> for handling the SELinux label appropriately in that case? libvirt, the
>> system administrator, anybody else? Using in NVDIMM's source
>> doesn't seem to be accepted by the domain XML schema.
>
> The expectation is that out of the box SELinux will "just work". So
> anything that is broken is a bug in either libvirt or selinux policy.
>
> There is no expectation/requirement to use unless you want
> to setup non-default behaviour which isn't the case here.
>
> IOW this sounds like a genuine bug.

OK, I'll try to find out what and where the problem is exactly.

Thank you for the clarifications,
Milan
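The guess above about label handling can be checked with a few numbers. The sketch assumes, as the message does, that QEMU subtracts the label size from the overall size and aligns the result down. The function names and the 128 KiB label are illustrative only, and the 256 MiB alignment is the ppc figure mentioned above.

```python
MiB = 1024 * 1024
KiB = 1024


def align_down(value, alignment):
    """Round value down to the nearest multiple of alignment."""
    return value // alignment * alignment


def guest_visible_size(size, label_size, alignment):
    """(size - label_size) aligned down, per the assumption about QEMU above."""
    return align_down(size - label_size, alignment)


def size_with_label(wanted_size, label_size, alignment):
    """Aligned value plus label, i.e. the ppc-style adjustment described above."""
    aligned = -(-wanted_size // alignment) * alignment
    return aligned + label_size
```

Under these assumptions, a plain 1024 MiB device with a 128 KiB label leaves `guest_visible_size(1024 * MiB, 128 * KiB, 256 * MiB) == 768 * MiB`, so 256 MiB is lost. Specifying `size_with_label(1024 * MiB, 128 * KiB, 256 * MiB)` instead keeps the full 1024 MiB visible, which would explain the special handling of ppc.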
NVDIMM in devdax mode and SELinux (was: Two questions about NVDIMM devices)
Milan Zamazal writes:

> Daniel P. Berrangé writes:
>
>> On Thu, Jul 02, 2020 at 01:21:15PM +0200, Milan Zamazal wrote:
>>> The second problem is that a VM fails to start with a backing NVDIMM in
>>> devdax mode due to SELinux preventing access to the /dev/dax* device (it
>>> doesn't happen with any other NVDIMM modes). Who should be responsible
>>> for handling the SELinux label appropriately in that case? libvirt, the
>>> system administrator, anybody else? Using in NVDIMM's source
>>> doesn't seem to be accepted by the domain XML schema.
>>
>> The expectation is that out of the box SELinux will "just work". So
>> anything that is broken is a bug in either libvirt or selinux policy.
>>
>> There is no expectation/requirement to use unless you want
>> to setup non-default behaviour which isn't the case here.
>>
>> IOW this sounds like a genuine bug.
>
> OK, I'll try to find out what and where is the problem exactly.

The problem apparently is that /dev/dax* is a character device rather
than a block device (such as /dev/pmem*), which is not expected by
SELinux policy rules.

This is an NVDIMM in fsdax mode:

  # ls -lZ /dev/pmem0
  brw-rw. 1 root disk system_u:object_r:device_t:s0 259, 0 Jul 9 11:39 /dev/pmem0

This is the same NVDIMM reconfigured as devdax:

  # ls -lZ /dev/dax0.0
  crw---. 1 root root system_u:object_r:device_t:s0 252, 5 Jul 9 11:43 /dev/dax0.0

(Unix permissions are different, but when I change them to `disk' group
and 660, the same problem still occurs.)
audit.log reports the following when starting a VM with an NVDIMM device
in devdax mode:

  type=AVC msg=audit(1594144691.758:913): avc: denied { map } for pid=21659 comm="qemu-kvm" path="/dev/dax0.0" dev="tmpfs" ino=1521557 scontext=system_u:system_r:svirt_t:s0:c216,c981 tcontext=system_u:object_r:svirt_image_t:s0:c216,c981 tclass=chr_file permissive=0
  type=AVC msg=audit(1594144691.758:914): avc: denied { map } for pid=21659 comm="qemu-kvm" path="/dev/dax0.0" dev="tmpfs" ino=1521557 scontext=system_u:system_r:svirt_t:s0:c216,c981 tcontext=system_u:object_r:svirt_image_t:s0:c216,c981 tclass=chr_file permissive=0

Indeed, svirt_t map access to svirt_image_t is allowed only for files
and block devices:

  # sesearch -A -p map -s svirt_t -t svirt_image_t
  ...
  allow svirt_t svirt_image_t:blk_file map;
  allow svirt_t svirt_image_t:file map;

What to do about it? Do I handle the NVDIMM in a wrong way or should
sVirt policies be fixed?

Thanks,
Milan
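To compare denials like the ones above against `sesearch` output, it helps to pull out the permission, the object class and the two SELinux types. A small sketch follows, assuming the record layout shown above. The helper names are made up and this is not a complete audit log parser.

```python
import re

_DENIED = re.compile(r"avc:\s+denied\s+\{\s*([^}]*?)\s*\}")
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Extract the interesting parts of one AVC denial, or None if it is not one."""
    match = _DENIED.search(line)
    if match is None:
        return None
    # Collect key=value pairs; quoted values lose their surrounding quotes.
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    return {
        "permissions": match.group(1).split(),
        "path": fields.get("path"),
        "scontext": fields.get("scontext"),
        "tcontext": fields.get("tcontext"),
        "tclass": fields.get("tclass"),
    }


def selinux_type(context):
    """Return the type field (third component) of a user:role:type:level context."""
    return context.split(":")[2]
```

For the records above this yields permission `map`, class `chr_file`, source type `svirt_t` and target type `svirt_image_t`, i.e. exactly the combination that has no matching `allow` rule in the `sesearch` listing.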
Re: Emulated TPM devices and snapshots of running VMs
Milan Zamazal writes:

> Hi,
>
> I would like to clarify how to make snapshots of running VMs with
> emulated TPM devices. As far as I understand QEMU documentation, it's
> possible to make snapshots of running VMs with TPM, but it's important
> to retain the state of swtpm. Does libvirt assist with that in any way
> or is it completely user's responsibility? libvirt pauses the VM
> internally when making a snapshot, which should be the right moment to
> copy the swtpm data, but the user doesn't have control over it. Is
> there a way to make a copy of swtpm data that is guaranteed to be
> consistent with the snapshot?

No idea?

> Thank you,
> Milan
Re: Emulated TPM devices and snapshots of running VMs
Peter Krempa writes:

> On Thu, Jul 09, 2020 at 14:14:32 +0200, Milan Zamazal wrote:
>> Milan Zamazal writes:
>>
>> > Hi,
>> >
>> > I would like to clarify how to make snapshots of running VMs with
>> > emulated TPM devices. As far as I understand QEMU documentation, it's
>> > possible to make snapshots of running VMs with TPM, but it's important
>> > to retain the state of swtpm. Does libvirt assist with that in any way
>> > or is it completely user's responsibility? libvirt pauses the VM
>> > internally when making a snapshot, which should be the right moment to
>> > copy the swtpm data, but the user doesn't have control over it. Is
>> > there a way to make a copy of swtpm data that is guaranteed to be
>> > consistent with the snapshot?
>>
>> No idea?
>
> I can comment only on the fact that libvirt doesn't do anything
> regarding snapshots on a VM with TPM.

Thank you for the confirmation.

Can anybody confirm there is no way to perform custom actions while a VM
is frozen by libvirt when making a memory snapshot, before we start
thinking about workarounds and/or filing an RFE?

Thanks,
Milan
Re: Emulated TPM devices and snapshots of running VMs
Peter Krempa writes:

> On Thu, Jul 09, 2020 at 17:54:23 +0200, Milan Zamazal wrote:
>> Peter Krempa writes:
>>
>> > On Thu, Jul 09, 2020 at 14:14:32 +0200, Milan Zamazal wrote:
>> >> Milan Zamazal writes:
>> >>
>> >> > Hi,
>> >> >
>> >> > I would like to clarify how to make snapshots of running VMs with
>> >> > emulated TPM devices. As far as I understand QEMU documentation, it's
>> >> > possible to make snapshots of running VMs with TPM, but it's important
>> >> > to retain the state of swtpm. Does libvirt assist with that in any way
>> >> > or is it completely user's responsibility? libvirt pauses the VM
>> >> > internally when making a snapshot, which should be the right moment to
>> >> > copy the swtpm data, but the user doesn't have control over it. Is
>> >> > there a way to make a copy of swtpm data that is guaranteed to be
>> >> > consistent with the snapshot?
>> >>
>> >> No idea?
>> >
>> > I can comment only on the fact that libvirt doesn't do anything
>> > regarding snapshots on a VM with TPM.
>>
>> Thank you for the confirmation.
>>
>> Can anybody confirm there is no way to perform custom actions while a VM
>> is frozen by libvirt when making a memory snapshot, before we start
>> thinking about workarounds and/or filing an RFE?
>
> No, currently we don't support any custom actions at the point when the
> external memory snapshot is finalized prior to continuing the VM.
>
> Please file a generic RFE for snapshotting including TPM rather than a
> partial one where you'll request a way to do your hack.

OK, thanks, done: https://bugzilla.redhat.com/1855367
Emulated TPM devices and snapshots of running VMs
Hi,

I would like to clarify how to make snapshots of running VMs with
emulated TPM devices. As far as I understand QEMU documentation, it's
possible to make snapshots of running VMs with TPM, but it's important
to retain the state of swtpm. Does libvirt assist with that in any way
or is it completely user's responsibility? libvirt pauses the VM
internally when making a snapshot, which should be the right moment to
copy the swtpm data, but the user doesn't have control over it. Is
there a way to make a copy of swtpm data that is guaranteed to be
consistent with the snapshot?

Thank you,
Milan
Re: NVDIMM sizes and DIMM hot plug
Peter Krempa writes:

> On Tue, Jun 16, 2020 at 12:54:29 +0200, Milan Zamazal wrote:
>> Hi,
>>
>> I've found out that NVDIMM size and label size matter for regular
>> (non-NV) DIMM hot plug. If the NVDIMM is not aligned correctly, the
>> guest OS will not accept the hot plugged memory and will complain with
>> messages such as
>>
>>   Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000
>>
>> The start address above is also reported within element of the
>> hot plugged memory in the domain XML:
>>
>>
>>
>> Apparently, in order to make memory hot plug working in the guest OS,
>> the inserted memory must be aligned to the platform memory alignment
>> (128 MB on x86_64).
>>
>> I'd like to clarify, how libvirt makes the DIMM address above. How is
>
> If the address isn't provided in the device XML of the attached device,
> libvirt attaches the device without any address at all and then
> refreshes the address from qemu in 'qemuDomainUpdateMemoryDeviceInfo'.

OK, I can look into the QEMU source code, but I'd still like to have
some official confirmation, especially regarding possible pitfalls or
future changes. We can't risk data loss.

>> the NVDIMM memory range determined? According to my experiments, it
>> seems the NVDIMM specified is taken, NVDIMM size is
>> subtracted from it and the resulting value is reduced to the nearest
>> multiple of NVDIMM . Is this observation correct? Is it
>> guaranteed to be stable in future versions? I need to determine the
>> right NVDIMM size to make the subsequent memory modules correctly
>> aligned and then I can't change the NVDIMM size, to not damage data
>> stored in the NVDIMM.
>
> Unfortunately I didn't implement NVDIMM support so I don't know the
> intricacies. I've cc'd Martin Kletzander who did that part.

Martin, do you know how the QEMU part is supposed to work? I haven't
received any response on the QEMU list, do you know who I could ask
directly?

>> Additionally, when adjusting maxMemory due to NVDIMM presence, should I
>> increase it by the specified NVDIMM or a different value?
>>
>> Thank you,
>> Milan
NVDIMM sizes and DIMM hot plug
Hi,

I've found out that NVDIMM size and label size matter for regular
(non-NV) DIMM hot plug. If the NVDIMM is not aligned correctly, the
guest OS will not accept the hot plugged memory and will complain with
messages such as

  Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000

The start address above is also reported within element of the
hot plugged memory in the domain XML:

Apparently, in order to make memory hot plug working in the guest OS,
the inserted memory must be aligned to the platform memory alignment
(128 MB on x86_64).

I'd like to clarify, how libvirt makes the DIMM address above. How is
the NVDIMM memory range determined? According to my experiments, it
seems the NVDIMM specified is taken, NVDIMM size is
subtracted from it and the resulting value is reduced to the nearest
multiple of NVDIMM . Is this observation correct? Is it
guaranteed to be stable in future versions? I need to determine the
right NVDIMM size to make the subsequent memory modules correctly
aligned and then I can't change the NVDIMM size, to not damage data
stored in the NVDIMM.

Additionally, when adjusting maxMemory due to NVDIMM presence, should I
increase it by the specified NVDIMM or a different value?

Thank you,
Milan
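
For illustration only, the alignment requirement described in this message can be checked with a few lines of Python. This is a minimal sketch: the 128 MiB block size is the x86_64 figure mentioned above, while the function names and the sample addresses are hypothetical and are not the values from the kernel message.

```python
# Block size the guest expects hot-plugged memory to be aligned to.
# 128 MiB is the x86_64 figure quoted in the message; other platforms differ.
BLOCK_SIZE = 128 * 1024 * 1024


def is_aligned(address: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if address is a whole multiple of block_size."""
    return address % block_size == 0


def align_down(address: int, block_size: int = BLOCK_SIZE) -> int:
    """Round address down to the nearest multiple of block_size."""
    return address - (address % block_size)


def align_up(address: int, block_size: int = BLOCK_SIZE) -> int:
    """Round address up to the nearest multiple of block_size."""
    return -(-address // block_size) * block_size


if __name__ == "__main__":
    # Hypothetical start addresses, chosen only to show both outcomes.
    for start in (0x100000000, 0x101000000):
        print(hex(start), "aligned" if is_aligned(start) else "unaligned",
              "next aligned address:", hex(align_up(start)))
```

A check like this only tells whether a given start address would be accepted; how QEMU chooses that address is the open question of the thread.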
Re: NVDIMM sizes and DIMM hot plug
Daniel P. Berrangé writes:

> On Tue, Jun 16, 2020 at 12:54:29PM +0200, Milan Zamazal wrote:
>> Hi,
>>
>> I've found out that NVDIMM size and label size matter for regular
>> (non-NV) DIMM hot plug. If the NVDIMM is not aligned correctly, the
>> guest OS will not accept the hot plugged memory and will complain with
>> messages such as
>>
>>   Block size [0x800] unaligned hotplug range: start 0x22500, size 0x1000
>>
>> The start address above is also reported within element of the
>> hot plugged memory in the domain XML:
>>
>>
>>
>> Apparently, in order to make memory hot plug working in the guest OS,
>> the inserted memory must be aligned to the platform memory alignment
>> (128 MB on x86_64).
>>
>> I'd like to clarify, how libvirt makes the DIMM address above. How is
>> the NVDIMM memory range determined? According to my experiments, it
>> seems the NVDIMM specified is taken, NVDIMM size is
>> subtracted from it and the resulting value is reduced to the nearest
>> multiple of NVDIMM . Is this observation correct? Is it
>> guaranteed to be stable in future versions? I need to determine the
>> right NVDIMM size to make the subsequent memory modules correctly
>> aligned and then I can't change the NVDIMM size, to not damage data
>> stored in the NVDIMM.
>
> Libvirt doesn't ever assign a "base" address value itself. We just
> start QEMU, and then fill in the XML "base" with the value that QEMU
> has assigned.

I see, then I'll ask about it on the QEMU list.

>> Additionally, when adjusting maxMemory due to NVDIMM presence, should I
>> increase it by the specified NVDIMM or a different value?
>
> IIRC, maxMemory has to allow for the sum of the basic RAM amount, plus
> RAM intended to be used for all possible future (NV)DIMMS that will be
> hotplugged.

OK.

Thanks,
Milan
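
The rule of thumb for maxMemory given in the reply above can be written down as simple arithmetic. The following Python sketch assumes that all sizes are expressed in KiB; the function name and the module sizes are hypothetical. Whether an NVDIMM label area has to be counted separately is not answered in this thread, so the sketch leaves it out.

```python
GIB_IN_KIB = 1024 * 1024  # one GiB expressed in KiB


def required_max_memory_kib(base_ram_kib, dimm_sizes_kib, nvdimm_sizes_kib):
    """Sum the base RAM and every (NV)DIMM that may ever be plugged in."""
    return base_ram_kib + sum(dimm_sizes_kib) + sum(nvdimm_sizes_kib)


if __name__ == "__main__":
    # Hypothetical VM: 4 GiB of base RAM, room for two 1 GiB DIMMs
    # and one 2 GiB NVDIMM.
    total = required_max_memory_kib(
        4 * GIB_IN_KIB,
        [1 * GIB_IN_KIB, 1 * GIB_IN_KIB],
        [2 * GIB_IN_KIB],
    )
    print(total, "KiB")
```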
Re: Memory locking limit and zero-copy migrations
Fangge Jin writes:

> On Thu, Aug 18, 2022 at 2:46 PM Milan Zamazal wrote:
>
>> Fangge Jin writes:
>>
>> > I can share some test results with you:
>> > 1. If no memtune->hard_limit is set when start a vm, the default memlock
>> > hard limit is 64MB
>> > 2. If memtune->hard_limit is set when start a vm, memlock hard limit will
>> > be set to the value of memtune->hard_limit
>> > 3. If memtune->hard_limit is updated at run-time, memlock hard limit won't
>> > be changed accordingly
>> >
>> > And some additional knowledge:
>> > 1. memlock hard limit can be shown by ‘prlimit -p -l’
>> > 2. The default value of memlock hard limit can be changed by setting
>> > LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service
>>
>> Ah, that explains it to me, thank you. And since in the default case
>> the systemd limit is not reported in of a running VM, I assume
>> libvirt takes it as "not set" and sets the higher limit when setting up
>> a zero-copy migration. Good.
>>
> Not sure whether you already know this, but I had a hard time
> differentiating the two concepts:
> 1. memlock hard limit(shown by prlimit): the hard limit for locked host
> memory
> 2. memtune hard limit(memtune->hard_limit): the hard limit for in-use host
> memory, this memory can be swapped out.

No, I didn't know it, thank you for pointing this out. Indeed, 2. is
what both the libvirt and kernel documentation seem to say, although not
so clearly.

But when I add with to the domain XML and then
start the VM, I can see the limit shown by `prlimit -l' is increased
accordingly. This is good for my use case, but does it match what you
say about the two concepts?
Re: Memory locking limit and zero-copy migrations
Fangge Jin writes:

> On Fri, Aug 19, 2022 at 4:08 AM Milan Zamazal wrote:
>
>> > Not sure whether you already know this, but I had a hard time
>> > differentiating the two concepts:
>> > 1. memlock hard limit(shown by prlimit): the hard limit for locked host
>> > memory
>> > 2. memtune hard limit(memtune->hard_limit): the hard limit for in-use host
>> > memory, this memory can be swapped out.
>>
>> No, I didn't know it, thank you for pointing this out. Indeed, 2. is
>> what both the libvirt and kernel documentation seem to say, although not
>> so clearly.
>>
>> But when I add with to the domain XML and then
>> start the VM, I can see the limit shown by `prlimit -l' is increased
>> accordingly. This is good for my use case, but does it match what you
>> say about the two concepts?
>
> memtune->hard_limit (hard limit of in-use memory) actually takes effect via
> cgroup, you can check the value by:
> # virsh memtune uefi1
> hard_limit     : 134217728
> soft_limit     : unlimited
> swap_hard_limit: unlimited
> # cat /sys/fs/cgroup/memory/machine.slice/machine-qemu\\x2d6\\x2duefi1.scope/libvirt/memory.limit_in_bytes
> 137438953472
>
> When vm starts with memtune->hard_limit set in domain XML, memlock
> hard limit (hard_limit of locked memory, shown by 'prlimit -l') will be
> set to the value of memtune->hard_limit. This is probably because
> memlock hard limit must be less than memtune->hard_limit.

Well, increasing the memlock limit to keep it within memtune->hard_limit
wouldn't make much sense, but thank you for confirming that setting
memtune->hard_limit adjusts both the limits to the requested value.
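
As a rough illustration of the two limits discussed in this exchange, the Python sketch below reads the memlock limit of a process (the value `prlimit -l` shows) and converts a `virsh memtune` figure to bytes for comparison with the cgroup file. It assumes Linux and Python 3.4 or newer, where `resource.prlimit()` is available, and it assumes that `virsh memtune` reports KiB, which is consistent with the two numbers quoted above. The helper names are made up for this example, and the script inspects its own process rather than a real QEMU PID.

```python
import os
import resource


def memlock_limits(pid: int):
    """Return the (soft, hard) RLIMIT_MEMLOCK of a process, in bytes."""
    return resource.prlimit(pid, resource.RLIMIT_MEMLOCK)


def kib_to_bytes(kib: int) -> int:
    """Convert a KiB value (as assumed for virsh memtune) to bytes."""
    return kib * 1024


def describe(limit: int) -> str:
    """Render a limit the way prlimit does: a number or 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


if __name__ == "__main__":
    # A QEMU PID would go here; the current process is used as a stand-in.
    soft, hard = memlock_limits(os.getpid())
    print("memlock soft:", describe(soft), "hard:", describe(hard))
    # hard_limit from the quoted virsh output, converted for comparison
    # with memory.limit_in_bytes.
    print("memtune hard_limit:", kib_to_bytes(134217728), "bytes")
```

Reading the limits of another user's process may require root privileges.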
Re: NUMA node - Memory Only
Michal Prívozník writes:

> On 8/9/22 12:55, Jin Huang wrote:
>> Hi, everyone
>> I built the libvirt 8.6.0 on my Ubuntu 20 system with the options like this:
>
>> meson build -Dsystem=true -Ddriver_interface=enabled
>> -Ddriver_libvirtd=enabled -Ddriver_network=enabled -Ddriver_qemu=enabled
>> -Ddriver_remote=enabled -Dnumactl=enabled -Dnumad=enabled
>> -Dstorage_disk=enabled
>>
>> (1)After installation, when I tried to start the libvirtd, I get this
>> error message:
>> error : virNetworkObjAssignDefLocked:576 : operation failed: network
>> 'default' already exists with uuid 7477a9f5-02d3-4fbc-b0e8-d7229d39a6a2
>>
>> (2)When try the virsh command, I get this error message:
>> virsh: /lib/x86_64-linux-gnu/libvirt-qemu.so.0: version `LIBVIRT_QEMU_8.2.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_8.0.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_8.5.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_6.10.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.7.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.8.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.2.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.1.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_7.3.0' not found (required by virsh)
>> virsh: /lib/x86_64-linux-gnu/libvirt.so.0: version `LIBVIRT_PRIVATE_8.6.0' not found (required by virsh)
>>
>> Could anyone give me some suggestions to fix these issues?
>
> This is pretty much expected if you had libvirt installed from your
> package manager (which I believe is the case because of the network
> error). I don't know what the correct way to build a .deb package is,

The easiest way is to use a package from a newer Ubuntu version. I'd
suggest using the oldest good enough version available to avoid problems
with dependencies. You may still be forced to use a newer libc or so.
In such a case, you can either configure your system (in /etc/apt/) to
use some packages from a newer Ubuntu version, or rebuild the newer
libvirt package using instructions from
https://wiki.debian.org/BuildingTutorial#The_packaging_workflow

> but on rpm based distros I usually build a .tar.xz (meson dist) from
> which I build a .rpm (rpmbuild -ta) and then install it.
>
> Michal
Re: Memory locking limit and zero-copy migrations
Peter Krempa writes:

> On Wed, Aug 17, 2022 at 10:56:54 +0200, Milan Zamazal wrote:
>> Hi,
>>
>> do I read libvirt sources right that when is not used in the
>> libvirt domain then libvirt takes proper care about setting memory
>> locking limits when zero-copy is requested for a migration?
>
> Well yes, for a definition of "proper". In this instance qemu can lock
> up to the guest-visible memory size of memory for the migration, thus we
> set the lockable size to the guest memory size. This is a simple upper
> bound which is supposed to work in all scenarios. Qemu is also unlikely
> to ever use up all the allowed locking.

Great, thank you for confirmation.

>> I also wonder whether there are any other situations where memory limits
>> could be set by libvirt or QEMU automatically rather than having no
>> memory limits? We had oVirt bugs in the past where certain VMs with
>> VFIO devices couldn't be started due to extra requirements on the amount
>> of locked memory and adding to the domain apparently
>> helped.
>
> is not only an amount of memory qemu can lock into ram, but
> an upper bound of all memory the qemu process can consume. This includes
> any qemu overhead e.g. used for the emulation layer.
>
> Guessing the correct size of overhead still has the same problems it had
> and libvirt is not going to be in the business of doing that.

To clarify, my point was not whether libvirt should, but whether libvirt
or any related component possibly does (or did in the past) impose
memory limits. Because as I was looking around it seems there are no
real memory limits by default, at least in libvirt, but some limit had
been apparently hit in the reported bugs.
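
The behaviour described in this exchange can be summarised in a small Python sketch. It is a simplified model of what the thread says, not libvirt's actual implementation: it assumes the limit in question is the `<hard_limit>` child of `<memtune>` in the domain XML (referred to as memtune->hard_limit elsewhere in the thread), that values are in KiB, and that without such a limit the lockable size for a zero-copy migration equals the guest memory size. The function names and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def configured_hard_limit_kib(domain_xml: str):
    """Return the memtune hard_limit value from a domain XML, or None.

    The value is assumed to be in KiB; a unit attribute is ignored here.
    """
    node = ET.fromstring(domain_xml).find("./memtune/hard_limit")
    if node is None or not (node.text or "").strip():
        return None
    return int(node.text.strip())


def expected_lock_limit_kib(domain_xml: str, guest_memory_kib: int) -> int:
    """Model of the thread's description: hard_limit if set, else guest RAM."""
    hard_limit = configured_hard_limit_kib(domain_xml)
    return guest_memory_kib if hard_limit is None else hard_limit


# Hypothetical domain fragments, one with and one without a hard limit.
WITH_LIMIT = "<domain><memtune><hard_limit>8388608</hard_limit></memtune></domain>"
WITHOUT_LIMIT = "<domain><memory>4194304</memory></domain>"

if __name__ == "__main__":
    print(expected_lock_limit_kib(WITH_LIMIT, 4194304))
    print(expected_lock_limit_kib(WITHOUT_LIMIT, 4194304))
```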
Memory locking limit and zero-copy migrations
Hi,

do I read libvirt sources right that when is not used in the
libvirt domain then libvirt takes proper care about setting memory
locking limits when zero-copy is requested for a migration?

I also wonder whether there are any other situations where memory limits
could be set by libvirt or QEMU automatically rather than having no
memory limits? We had oVirt bugs in the past where certain VMs with
VFIO devices couldn't be started due to extra requirements on the amount
of locked memory and adding to the domain apparently
helped.

Thanks,
Milan
Re: Memory locking limit and zero-copy migrations
Fangge Jin writes:

> I can share some test results with you:
> 1. If no memtune->hard_limit is set when start a vm, the default memlock
> hard limit is 64MB
> 2. If memtune->hard_limit is set when start a vm, memlock hard limit will
> be set to the value of memtune->hard_limit
> 3. If memtune->hard_limit is updated at run-time, memlock hard limit won't
> be changed accordingly
>
> And some additional knowledge:
> 1. memlock hard limit can be shown by ‘prlimit -p -l’
> 2. The default value of memlock hard limit can be changed by setting
> LimitMEMLOCK in /usr/lib/systemd/system/virtqemud.service

Ah, that explains it to me, thank you. And since in the default case
the systemd limit is not reported in of a running VM, I assume
libvirt takes it as "not set" and sets the higher limit when setting up
a zero-copy migration. Good.

Regards,
Milan

> BR,
> Fangge Jin
>
> On Wed, Aug 17, 2022 at 19:25 Milan Zamazal wrote:
>
>> Peter Krempa writes:
>>
>> > On Wed, Aug 17, 2022 at 10:56:54 +0200, Milan Zamazal wrote:
>> >> Hi,
>> >>
>> >> do I read libvirt sources right that when is not used in the
>> >> libvirt domain then libvirt takes proper care about setting memory
>> >> locking limits when zero-copy is requested for a migration?
>> >
>> > Well yes, for a definition of "proper". In this instance qemu can lock
>> > up to the guest-visible memory size of memory for the migration, thus we
>> > set the lockable size to the guest memory size. This is a simple upper
>> > bound which is supposed to work in all scenarios. Qemu is also unlikely
>> > to ever use up all the allowed locking.
>>
>> Great, thank you for confirmation.
>>
>> >> I also wonder whether there are any other situations where memory limits
>> >> could be set by libvirt or QEMU automatically rather than having no
>> >> memory limits? We had oVirt bugs in the past where certain VMs with
>> >> VFIO devices couldn't be started due to extra requirements on the amount
>> >> of locked memory and adding to the domain apparently
>> >> helped.
>> >
>> > is not only an amount of memory qemu can lock into ram, but
>> > an upper bound of all memory the qemu process can consume. This includes
>> > any qemu overhead e.g. used for the emulation layer.
>> >
>> > Guessing the correct size of overhead still has the same problems it had
>> > and libvirt is not going to be in the business of doing that.
>>
>> To clarify, my point was not whether libvirt should, but whether libvirt
>> or any related component possibly does (or did in the past) impose
>> memory limits. Because as I was looking around it seems there are no
>> real memory limits by default, at least in libvirt, but some limit had
>> been apparently hit in the reported bugs.