Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On 10/13/20 7:57 PM, Peter Krempa wrote: On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote: On 10/13/20 5:10 PM, Michal Privoznik wrote: On 10/13/20 1:35 AM, Nico Pache wrote: gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 The virtio-balloon device now has the ability to report free pages back to the hypervisor for reuse by other programs. Is this something that we might want to report? I mean, we have 'virsh dommemstat $dom' which under the hood calls: {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"} As I've pointed out in earlier review, I think that the feature name is a bit misleading. It sounds like a statistic, not something that actually returns memory to the host. The docs are now better but still leave a lot of room for imagination. IMO, if the feature is mainly for returning memory to the host, the 'reporting' word should not have been used. Oh sorry I missed that. I a penance I will post a cleanup patch. How does "free-pages" or "return-pages" or even "discard-pages" sound? Michal
Re: [libvirt][PATCH v1 1/1] introduce an attribute "migratable" to numatune memory element
On 10/6/2020 3:15 PM, Peter Krempa wrote: On Tue, Oct 06, 2020 at 13:47:25 +0800, Luyao Zhong wrote: Attribute ``migratable`` will be 'no' by default, and 'yes' indicates that it allows operating system or hypervisor migrating the memory pages between different memory nodes, that also means we will not rely on hypervisor to set the memory policy or memory affinity, we only use cgroups to restrict the memory nodes, so if ``migratable`` is 'yes', the ``mode`` which indicates memory policy will be ignored. Note that I'm not reviewing whether this is justified and makes sense. I'm commenting purely on the code. --- We require that patches are split into logical blocks. You need to split this commit at least into following patches: docs/formatdomain.rst | 8 +++- docs/schemas/domaincommon.rng | 5 +++ src/conf/numa_conf.c | 45 +++ src/conf/numa_conf.h | 3 ++ src/libvirt_private.syms | 1 + Docs, schema and parser changes should be separated Got it. src/qemu/qemu_command.c | 6 ++- src/qemu/qemu_process.c | 21 + .../numatune-memory-migratable.args | 34 ++ .../numatune-memory-migratable.err| 1 + .../numatune-memory-migratable.xml| 33 ++ tests/qemuxml2argvtest.c | 5 +++ Implementation in qemu can be all together with tests. Please also add a qemuxml2xmltest case. I've provided some more feedback in a recent review: https://www.redhat.com/archives/libvir-list/2020-October/msg00231.html specifically some guidance on tests: https://www.redhat.com/archives/libvir-list/2020-October/msg00395.html Got it. Thanks. 11 files changed, 160 insertions(+), 2 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.args create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.err create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.xml diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index cc4f91d4ea..4e386a0c3d 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst [...] @@ -1097,6 +1097,12 @@ NUMA Node Tuning will be ignored if it's specified. If ``placement`` of ``vcpu`` is 'auto', and ``numatune`` is not specified, a default ``numatune`` with ``placement`` 'auto' and ``mode`` 'strict' will be added implicitly. :since:`Since 0.9.3` + Attribute ``migratable`` is 'no' by default, and 'yes' indicates that it + allows operating system or hypervisor migrating the memory pages between + different memory nodes, that also means we will not rely on hypervisor to + set the memory policy or memory affinity, we only use cgroups to restrict + the memory nodes, so if ``migratable`` is 'yes', the ``mode`` which indicates + memory policy will be ignored. So ... does this make us ignore the per NUMA-node set 'mode'? If yes, the code should actually reject any configs where we ignore some settings rather than just relying on the docs. Yes, it will ignore the per NUMA-node set 'mode'. So let me saying in this way, if 'migratable' is set, per NUMA-node mode should not be set, otherwise the config will be rejected, so the mode will have 'None' as its default value but not 'strict'. ``memnode`` Optional ``memnode`` elements can specify memory allocation policies per each guest NUMA node. For those nodes having no corresponding ``memnode`` element, [...] diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index 6653ba05a6..c14ba1295c 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c [...] @@ -292,6 +294,15 @@ virDomainNumatuneParseXML(virDomainNumaPtr numa, placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO; if (node) { +if ((tmp = virXMLPropString(node, "migratable")) && +(migratable = virTristateBoolTypeFromString(tmp)) <= 0) { +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, VIR_ERR_XML_ERROR Got it. + _("Invalid 'migratable' attribute value '%s'"), tmp); +goto cleanup; +} +numa->memory.migratable = migratable; +VIR_FREE(tmp); + if ((tmp = virXMLPropString(node, "mode")) && (mode = virDomainNumatuneMemModeTypeFromString(tmp)) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -369,6 +380,11 @@ virDomainNumatuneFormatXML(virBufferPtr buf, tmp = virDomainNumatuneMemModeTypeToString(numatune->memory.mode); virBufferAsprintf(buf, " +if (numatune->memory.migratable) { Use an explicit condition here: if (numatune->memory.migratable != VIR_TRISTATE_BOOL_ABSENT) Got it. +tmp = virTristateBoolTypeToString(numatune->memory.migratable); +
Re: [libvirt PATCH] qemu: remove some unnecessary local variables
On 10/13/20 5:14 PM, Jonathon Jongsma wrote: These variables seem to be left over from a previous refactoring and they don't add anything to the code. Certainly the proof is in the compiling :-) (it looks like it was probably *me* that left these useless variables in there about 1.5 years ago when I split all the unplug stuff into two pieces, and factored out some common code used by all different device types. See commit dd60bd62d and surrounding commits if you're interested...) Reviewed-by: Laine Stump and pushed. Signed-off-by: Jonathon Jongsma --- src/qemu/qemu_hotplug.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 2c184b9ba0..79fc8baa5c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -5502,7 +5502,6 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm, virDomainRedirdevDefPtr match, virDomainRedirdevDefPtr *detach) { -virDomainRedirdevDefPtr redirdev; ssize_t idx; if ((idx = virDomainRedirdevDefFind(vm->def, match)) < 0) { @@ -5511,7 +5510,7 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm, return -1; } -*detach = redirdev = vm->def->redirdevs[idx]; +*detach = vm->def->redirdevs[idx]; return 0; } @@ -5523,12 +5522,11 @@ qemuDomainDetachPrepNet(virDomainObjPtr vm, virDomainNetDefPtr *detach) { int detachidx; -virDomainNetDefPtr net = NULL; if ((detachidx = virDomainNetFindIdx(vm->def, match)) < 0) return -1; -*detach = net = vm->def->nets[detachidx]; +*detach = vm->def->nets[detachidx]; return 0; } @@ -5598,7 +5596,6 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm, virDomainRNGDefPtr *detach) { ssize_t idx; -virDomainRNGDefPtr rng; if ((idx = virDomainRNGFind(vm->def, match)) < 0) { virReportError(VIR_ERR_DEVICE_MISSING, @@ -5608,7 +5605,7 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm, return -1; } -*detach = rng = vm->def->rngs[idx]; +*detach = vm->def->rngs[idx]; return 0; } @@ -5619,7 +5616,6 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm, virDomainMemoryDefPtr match, virDomainMemoryDefPtr *detach) { -virDomainMemoryDefPtr mem; int idx; if (qemuDomainMemoryDeviceAlignSize(vm->def, match) < 0) @@ -5633,7 +5629,7 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm, return -1; } -*detach = mem = vm->def->mems[idx]; +*detach = vm->def->mems[idx]; return 0; }
Re: [External] Re: [PATCH v2 1/4] API: introduce memory failure
On 10/13/20 8:31 PM, Daniel Henrique Barboza wrote: This patch failed to compile in my env: FAILED: tools/virsh.p/virsh-domain.c.o [] -D_FUNCTION_DEF -MD -MQ tools/virsh.p/virsh-domain.c.o -MF tools/virsh.p/virsh-domain.c.o.d -o tools/virsh.p/virsh-domain.c.o -c ../tools/virsh-domain.c In file included from /usr/lib64/glib-2.0/include/glibconfig.h:9, from /usr/include/glib-2.0/glib/gtypes.h:32, from /usr/include/glib-2.0/glib/galloca.h:32, from /usr/include/glib-2.0/glib.h:30, from ../src/util/glibcompat.h:21, from ../src/internal.h:30, from ../tools/virsh.h:25, from ../tools/virsh-domain.h:23, from ../tools/virsh-domain.c:22: /usr/include/glib-2.0/glib/gmacros.h:745:53: error: size of array ‘_GStaticAssertCompileTimeAssertion_185’ is negative 745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE (_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] G_GNUC_UNUSED | ^~~ /usr/include/glib-2.0/glib/gmacros.h:735:47: note: in definition of macro ‘G_PASTE_ARGS’ 735 | #define G_PASTE_ARGS(identifier1,identifier2) identifier1 ## identifier2 | ^~~ /usr/include/glib-2.0/glib/gmacros.h:745:44: note: in expansion of macro ‘G_PASTE’ 745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE (_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] G_GNUC_UNUSED | ^~~ ../tools/virsh-domain.c:13643:1: note: in expansion of macro ‘G_STATIC_ASSERT’ 13643 | G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); | ^~~ [505/984] Compiling C object src/virtqemud.p/remote_remote_daemon_dispatch.c.o ninja: build stopped: subcommand failed. $ I didn't verify if the following patches fixes it. Thanks, DHB I described it in '[PATCH v2 4/4] virsh: implement memory failure event' Notice: The full patch set includes 4 patches: virsh: implement memory failure event (current patch) qemu: monitor: handle memory failure event qemu: process: implement domainMemoryFailure API: introduce memory failure To avoid build/test errors, the 4 patches should be merged/removed together. Suggested by Peter, separate a 'all in one' patch into 4 patches (described in cover letter '[PATCH v2 0/4] support memory failure'). I forked a repo and pushed the 4 patches(https://gitlab.com/pacepi/libvirt/-/tree/memory-failure-v2), CI worked fine. On 10/12/20 9:31 AM, zhenwei pi wrote: Introduce memory failure event. Libvirt should monitor domain's event, then posts it to uplayer. According to the hardware memory corrupted message, the cloud scheduler could migrate domain to another health physical server. Signed-off-by: zhenwei pi --- include/libvirt/libvirt-domain.h | 82 + src/conf/domain_event.c | 80 src/conf/domain_event.h | 12 ++ src/libvirt_private.syms | 2 + src/remote/remote_daemon_dispatch.c | 32 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x | 16 +++- src/remote_protocol-structs | 8 8 files changed, 263 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 77f9116675..5138843a56 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -3196,6 +3196,64 @@ typedef enum { } virDomainEventCrashedDetailType; /** + * virDomainMemoryFailureRecipientType: + * + * Recipient of a memory failure event. + */ +typedef enum { + /* memory failure at hypersivor memory address space */ + VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_HYPERVISOR = 0, + + /* memory failure at guest memory address space */ + VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_GUEST = 1, + +# ifdef VIR_ENUM_SENTINELS + VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_LAST +# endif +} virDomainMemoryFailureRecipientType; + + +/** + * virDomainMemoryFailureActionType: + * + * Action of a memory failure event. + */ +typedef enum { + /* the memory failure could be ignored. This will only be the case for + * action-optional failures. */ + VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_IGNORE = 0, + + /* memory failure occurred in guest memory, the guest enabled MCE handling + * mechanism, and hypervisor could inject the MCE into the guest + * successfully. */ + VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_INJECT = 1, + + /* the failure is unrecoverable. This occurs for action-required failures + * if the recipient is the hypervisor; hypervisor will exit. */ +
Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
Hi Sean, Thanks much for your detailed replies. It's clear to me why GPAs are different from HVAs in QEM/KVM. Thanks! I appreciate it if you could help with the following two more questions. On Tue, Oct 13, 2020 at 3:03 AM Sean Christopherson wrote: > > This is where memslots come in. Think of memslots as a one-level page tablea > that translate GPAs to HVAs. A memslot, set by userspace, tells KVM the > corresponding HVA for a given GPA. > > Before the guest is running (assuming host userspace isn't broken), the > userspace VMM will first allocate virtual memory (HVA) for all physical > memory it wants to map into the guest (GPA). It then tells KVM how to > translate a given GPA to its HVA by creating a memslot. > > To avoid getting lost in a tangent about page offsets, let's assume array[0]'s > GPA = 0xa000. For KVM to create a GPA->HPA mapping for the guest, there > _must_ > be a memslot that translates GPA 0xa000 to an HVA[*]. Let's say HVA = 0xb000. > > On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) > to > its HVA (0xb000), and then walks the host page tables to translate the HVA > into > a HPA (let's say that ends up being 0xc000). KVM then stuffs 0xc000 into the > EPT tables, which yields: > > GPA-> HVA(KVM memslots) > 0xa0000xb000 > > HVA-> HPA(host page tables) > 0xb0000xc000 > > GPA-> HPA(extended page tables) > 0xa0000xc000 > > To keep the EPT tables synchronized with the host page tables, if HVA->HPA > changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified > by the host kernel that the HVA has been unmapped and will find and unmap > the corresponding GPA (again via memslots) to HPA translations. > > Ditto for the case where userspace moves a memslot, e.g. if HVA is changed > to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses > to GPA 0xa000 from the guest will take an EPT violation and see the new HVA > (and presumably a new HPA). Q1: Is there any file like ``/proc/pid/pagemap'' to record the mappings between GPAs and HVAs in the host OS? Q2: Seems that there might be extra overhead (e.g., synchronization between EPT tables and host regular page tables; maintaining extra regular page tables and data structures), which is caused by the extra translation between GPAs to HVAs via memslots. Why doesn't KVM directly use GPAs as HVAs and leverage extended/nested page tables to translate HVAs (i.e., GPAs) to HPAs? Thanks, Harry
[libvirt PATCH] qemu: remove some unnecessary local variables
These variables seem to be left over from a previous refactoring and they don't add anything to the code. Signed-off-by: Jonathon Jongsma --- src/qemu/qemu_hotplug.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 2c184b9ba0..79fc8baa5c 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -5502,7 +5502,6 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm, virDomainRedirdevDefPtr match, virDomainRedirdevDefPtr *detach) { -virDomainRedirdevDefPtr redirdev; ssize_t idx; if ((idx = virDomainRedirdevDefFind(vm->def, match)) < 0) { @@ -5511,7 +5510,7 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm, return -1; } -*detach = redirdev = vm->def->redirdevs[idx]; +*detach = vm->def->redirdevs[idx]; return 0; } @@ -5523,12 +5522,11 @@ qemuDomainDetachPrepNet(virDomainObjPtr vm, virDomainNetDefPtr *detach) { int detachidx; -virDomainNetDefPtr net = NULL; if ((detachidx = virDomainNetFindIdx(vm->def, match)) < 0) return -1; -*detach = net = vm->def->nets[detachidx]; +*detach = vm->def->nets[detachidx]; return 0; } @@ -5598,7 +5596,6 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm, virDomainRNGDefPtr *detach) { ssize_t idx; -virDomainRNGDefPtr rng; if ((idx = virDomainRNGFind(vm->def, match)) < 0) { virReportError(VIR_ERR_DEVICE_MISSING, @@ -5608,7 +5605,7 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm, return -1; } -*detach = rng = vm->def->rngs[idx]; +*detach = vm->def->rngs[idx]; return 0; } @@ -5619,7 +5616,6 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm, virDomainMemoryDefPtr match, virDomainMemoryDefPtr *detach) { -virDomainMemoryDefPtr mem; int idx; if (qemuDomainMemoryDeviceAlignSize(vm->def, match) < 0) @@ -5633,7 +5629,7 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm, return -1; } -*detach = mem = vm->def->mems[idx]; +*detach = vm->def->mems[idx]; return 0; } -- 2.26.2
Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
Hi Paolo and Sean, Thanks much for your prompt replies and clear explanations. On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini wrote: > > No, the logic to find the HPA with a given HVA is the same as the > hardware logic to translate HVA -> HPA. That is it uses the host > "regular" page tables, not the nested page tables. > > In order to translate GPA to HPA, instead, KVM does not use the nested > page tables. I am curious why KVM does not directly use GPAs as HVAs and leverage nested page tables to translate HVAs (i.e., GPAs) to HPAs? Is that because 1) the hardware logic of ``GPA -> [extended/nested page tables] -> HPA[*]'' is different[**] from the hardware logic of ``HVA -> [host regular page tables] -> HPA''; 2) if 1) is true, it is natural to reuse Linux's original functionality to translate HVAs to HPAs through regular page tables. [*]: Here, the translation means the last step for MMU to translate a GVA's corresponding GPA to an HPA through the extended/nested page tables. [**]: To my knowledge, the hardware logic of ``GPA -> [extended/nested page tables] -> HPA'' seems to be the same as the hardware logic of ``HVA -> [host regular page tables] -> HPA''. I appreciate it if you could point out the differences I ignored. Thanks! Best, Harry
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote: > On 10/13/20 5:10 PM, Michal Privoznik wrote: > > On 10/13/20 1:35 AM, Nico Pache wrote: > > > gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 > > > > > > The virtio-balloon device now has the ability to report free pages > > > back to the hypervisor for reuse by other programs. > > Is this something that we might want to report? I mean, we have 'virsh > dommemstat $dom' which under the hood calls: > > {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"} As I've pointed out in earlier review, I think that the feature name is a bit misleading. It sounds like a statistic, not something that actually returns memory to the host. The docs are now better but still leave a lot of room for imagination. IMO, if the feature is mainly for returning memory to the host, the 'reporting' word should not have been used.
Re: [PATCH 0/2] virtiofs can be used without NUMA
On Tue, Oct 13, 2020 at 07:10 PM +0200, Michal Privoznik wrote: > On 10/13/20 6:53 PM, Marc Hartmayer wrote: >> Halil Pasic (1): >>Reflect in virtiofs.rst that virtiofs can be used without NUMA >> >> Marc Hartmayer (1): >>qemu: virtiofs can be used without NUMA nodes >> >> docs/kbase/virtiofs.rst | 17 - >> src/qemu/qemu_validate.c | 13 + >> 2 files changed, 21 insertions(+), 9 deletions(-) >> > > Reviewed-by: Michal Privoznik > > and pushed. > > Michal > Thanks. -- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH 0/2] virtiofs can be used without NUMA
On 10/13/20 6:53 PM, Marc Hartmayer wrote: Halil Pasic (1): Reflect in virtiofs.rst that virtiofs can be used without NUMA Marc Hartmayer (1): qemu: virtiofs can be used without NUMA nodes docs/kbase/virtiofs.rst | 17 - src/qemu/qemu_validate.c | 13 + 2 files changed, 21 insertions(+), 9 deletions(-) Reviewed-by: Michal Privoznik and pushed. Michal
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On 13.10.20 19:00, Nico Pache wrote: > I don't believe so, but David might have some more insight into that > question. > > Once the feature is enabled the guest will report pages on its freelist > to the hypervisor, and the host/hypervisor will then release them. We report (some, not all) free memory of our guest to the hypervisor. So whatever the guest reported shows up in "stat-free-memory" in the stats already. The hypervisor will "discard" backing storage of reported free memory, resulting in your hypervisor having more free memory available (and the QEMU process consuming less memory). > > I'm not exactly sure what information we can or would want to 'stat' out > of this process... A counter for number of pages returned? As the guest can reuse memory any time without coordination with the hypervisor that wouldn't be of too much help. A counter would only tell you if free page reporting was once performed successfully. You would really have to compare the QEMU process memory footprint with the guest stats to figure out if free page reporting is doing what it's supposed to do. -- Thanks, David / dhildenb
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
I don't believe so, but David might have some more insight into that question. Once the feature is enabled the guest will report pages on its freelist to the hypervisor, and the host/hypervisor will then release them. I'm not exactly sure what information we can or would want to 'stat' out of this process... A counter for number of pages returned? Cheers, Nico On Tue, Oct 13, 2020, 12:47 PM Michal Privoznik wrote: > On 10/13/20 5:10 PM, Michal Privoznik wrote: > > On 10/13/20 1:35 AM, Nico Pache wrote: > >> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 > >> > >> The virtio-balloon device now has the ability to report free pages > >> back to the hypervisor for reuse by other programs. > > Is this something that we might want to report? I mean, we have 'virsh > dommemstat $dom' which under the hood calls: > > > {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"} > > Michal > >
[PATCH 2/2] Reflect in virtiofs.rst that virtiofs can be used without NUMA
From: Halil Pasic Reflect in the virtiofs documentation that virtiofs can now be used even without NUMA. While at it, be more precise where and why shared memory is required. Signed-off-by: Halil Pasic Signed-off-by: Marc Hartmayer --- docs/kbase/virtiofs.rst | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/docs/kbase/virtiofs.rst b/docs/kbase/virtiofs.rst index dea8e79f833c..01440420d76d 100644 --- a/docs/kbase/virtiofs.rst +++ b/docs/kbase/virtiofs.rst @@ -16,10 +16,17 @@ See https://virtio-fs.gitlab.io/ Host setup == -The host-side virtiofsd daemon, like other vhost-user backed devices, -requires shared memory between the host and the guest. As of QEMU 4.2, this -requires specifying a NUMA topology for the guest and explicitly specifying -a memory backend. Multiple options are available: +Almost all virtio devices (all that use virtqueues) require access to +at least certain portions of guest RAM (possibly policed by DMA). In +case of virtiofsd, much like in case of other vhost-user (see +https://www.qemu.org/docs/master/interop/vhost-user.html) virtio +devices that are realized by an userspace process, this in practice +means that QEMU needs to allocate the backing memory for all the guest +RAM as shared memory. As of QEMU 4.2, it is possible to explicitly +specify a memory backend when specifying the NUMA topology. This +method is however only viable for machine types that do support +NUMA. As of QEMU 5.0.0, it is possible to specify the memory backend +without NUMA (using the so called memobject interface). Either of the following: @@ -46,7 +53,7 @@ Either of the following: Guest setup === -#. Specify the NUMA topology +#. Specify the NUMA topology (this step is only required for the NUMA case) in the domain XML of the guest. For the simplest one-node topology for a guest with 2GiB of RAM and 8 vCPUs: -- 2.25.4
[PATCH 1/2] qemu: virtiofs can be used without NUMA nodes
...if a machine memory-backend using shared memory is configured for the guest. This is especially important for QEMU machine types that don't have NUMA but virtiofs support. An example snippet: test 2097152 ... ... and the corresponding QEMU command line: /usr/bin/qemu-system-s390x \ -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \ -m 2048 \ -object memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648 \ ... Signed-off-by: Marc Hartmayer --- src/qemu/qemu_validate.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 27e10d59fd25..bc3043bb3f0d 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -3470,14 +3470,19 @@ qemuValidateDomainDeviceDefGraphics(const virDomainGraphicsDef *graphics, static int -qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def) +qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def, + virQEMUCapsPtr qemuCaps) { +const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps, + def->virtType, + def->os.machine); size_t numa_nodes = virDomainNumaGetNodeCount(def->numa); size_t i; -if (numa_nodes == 0) { +if (numa_nodes == 0 && +!(defaultRAMId && def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("virtiofs requires one or more NUMA nodes")); + _("virtiofs requires shared memory")); return -1; } @@ -3591,7 +3596,7 @@ qemuValidateDomainDeviceDefFS(virDomainFSDefPtr fs, _("virtiofs does not support multidevs")); return -1; } -if (qemuValidateDomainDefVirtioFSSharedMemory(def) < 0) +if (qemuValidateDomainDefVirtioFSSharedMemory(def, qemuCaps) < 0) return -1; break; -- 2.25.4
[PATCH 0/2] virtiofs can be used without NUMA
Halil Pasic (1): Reflect in virtiofs.rst that virtiofs can be used without NUMA Marc Hartmayer (1): qemu: virtiofs can be used without NUMA nodes docs/kbase/virtiofs.rst | 17 - src/qemu/qemu_validate.c | 13 + 2 files changed, 21 insertions(+), 9 deletions(-) -- 2.25.4
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On 10/13/20 5:10 PM, Michal Privoznik wrote: On 10/13/20 1:35 AM, Nico Pache wrote: gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 The virtio-balloon device now has the ability to report free pages back to the hypervisor for reuse by other programs. Is this something that we might want to report? I mean, we have 'virsh dommemstat $dom' which under the hood calls: {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"} Michal
Re: [PATCH v2 0/9] docs: Fix migration.html generation and report such errors next time
On Tue, Oct 13, 2020 at 18:18:04 +0200, Peter Krempa wrote: > This version refactors the XSLT transformation command a bit more > carefully and thoroughly. A semantic difference in v2 is that the output > HTMLs are no longer reformatted. The output can be verified here: https://gitlab.com/pipo.sk/libvirt/-/jobs/788658848/artifacts/browse/website/
[PATCH v2 6/9] docs/internals/meson.build: Use template code for XSLT processing
Replace the reimplementation of the XSLT processing custom target with an identical copy form docs/meson.build and an comment to keep them in sync. Signed-off-by: Peter Krempa --- docs/internals/meson.build | 55 -- 1 file changed, 35 insertions(+), 20 deletions(-) diff --git a/docs/internals/meson.build b/docs/internals/meson.build index 5d008dec5b..5aa67ded5c 100644 --- a/docs/internals/meson.build +++ b/docs/internals/meson.build @@ -5,33 +5,48 @@ internals_in_files = [ 'rpc', ] +html_xslt_gen_xslt = subsite_xsl +html_xslt_gen_install_dir = docs_html_dir / 'internals' +html_xslt_gen = [] + foreach name : internals_in_files - html_in_file = '@0...@.html.in'.format(name) - html_file = '@0@.html'.format(name) + html_xslt_gen += { +'name': name, +'source': 'docs/internals' / name + '.html.in', + } +endforeach + +# keep the XSLT processing code block in sync with docs/meson.build + +# --- begin of XSLT processing --- - out_file = custom_target( -html_file, -input: html_in_file, -output: html_file, +foreach data : html_xslt_gen + html_filename = data['name'] + '.html' + + html_file = custom_target( +html_filename, +input: data.get('file', data['name'] + '.html.in'), +output: html_filename, command: [ - meson_python_prog, - python3_prog.path(), - meson_html_gen_prog.path(), - xsltproc_prog.path(), - xmllint_prog.path(), - meson.build_root(), - docs_timestamp, - subsite_xsl, + xsltproc_prog, + '--stringparam', 'pagesrc', data.get('source', ''), + '--stringparam', 'builddir', meson.build_root(), + '--stringparam', 'timestamp', docs_timestamp, + '--nonet', + html_xslt_gen_xslt, '@INPUT@', - '@OUTPUT@', - 'docs/internals' / html_in_file, ], -depends: [ aclperms_gen ], +depends: data.get('depends', []), depend_files: [ page_xsl ], +capture: true, install: true, -install_dir: docs_html_dir / 'internals', +install_dir: html_xslt_gen_install_dir, ) - install_web_deps += out_file - install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 'internals') + install_web_deps += html_file + install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir endforeach + +html_xslt_gen = [] + +# --- end of XSLT processing --- -- 2.26.2
[PATCH v2 8/9] docs/manpages/meson.build: Use template code for XSLT processing
Replace the reimplementation of the XSLT processing custom target with an identical copy form docs/meson.build and an comment to keep them in sync. Signed-off-by: Peter Krempa --- docs/manpages/meson.build | 55 ++- 1 file changed, 37 insertions(+), 18 deletions(-) diff --git a/docs/manpages/meson.build b/docs/manpages/meson.build index 7ed1d304a4..ecc517e80e 100644 --- a/docs/manpages/meson.build +++ b/docs/manpages/meson.build @@ -1,3 +1,7 @@ +html_xslt_gen_xslt = subsite_xsl +html_xslt_gen_install_dir = docs_html_dir / 'manpages' +html_xslt_gen = [] + # docs_man_files # each entry is a dictionary with following items: # name - man page name (required) @@ -104,29 +108,44 @@ foreach data : docs_man_files capture: true, ) - out_file = custom_target( -html_file, -input: html_in, -output: html_file, + html_xslt_gen += { +'name': data['name'], +'file': html_in, +'source': 'docs/manpages' / rst_in_file, + } +endforeach + +# keep the XSLT processing code block in sync with docs/meson.build + +# --- begin of XSLT processing --- + +foreach data : html_xslt_gen + html_filename = data['name'] + '.html' + + html_file = custom_target( +html_filename, +input: data.get('file', data['name'] + '.html.in'), +output: html_filename, command: [ - meson_python_prog, - python3_prog.path(), - meson_html_gen_prog.path(), - xsltproc_prog.path(), - xmllint_prog.path(), - meson.build_root(), - docs_timestamp, - subsite_xsl, + xsltproc_prog, + '--stringparam', 'pagesrc', data.get('source', ''), + '--stringparam', 'builddir', meson.build_root(), + '--stringparam', 'timestamp', docs_timestamp, + '--nonet', + html_xslt_gen_xslt, '@INPUT@', - '@OUTPUT@', - 'docs/manpages' / rst_in_file, ], -depends: [ aclperms_gen ], +depends: data.get('depends', []), depend_files: [ page_xsl ], +capture: true, install: true, -install_dir: docs_html_dir / 'manpages', +install_dir: html_xslt_gen_install_dir, ) - install_web_deps += out_file - install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 'manpages') + install_web_deps += html_file + install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir endforeach + +html_xslt_gen = [] + +# --- end of XSLT processing --- -- 2.26.2
[PATCH v2 4/9] docs: meson.build: Generate HTML files directly by meson
Since we no longer reformat the XSLT-transformed files, there's no need to use an external script any more. Unfortunately this hid errors from 'xsltproc' as return value was not checked and the stderr was piped into xmllints stdin. The result was that any invalid input file would result into an empty output file. Since the script's only purpose was to prevent additional temporary files at the time we were reformatting the output in a pipeline we no longer need this. Moving the generation directly into the meson definition makes it more obvious what's happening and saves readers from having to parse what's going on. A free bonus is that errors are now properly caught and reported. This patch converts the main docs/ directory for now with cleanup of other comming later. Signed-off-by: Peter Krempa --- docs/meson.build | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/docs/meson.build b/docs/meson.build index d18e5c1feb..8b7c63bc6f 100644 --- a/docs/meson.build +++ b/docs/meson.build @@ -267,20 +267,17 @@ foreach data : docs_html_in_gen input: data['file'], output: html_file, command: [ - meson_python_prog, - python3_prog.path(), - meson_html_gen_prog.path(), - xsltproc_prog.path(), - xmllint_prog.path(), - meson.build_root(), - docs_timestamp, + xsltproc_prog, + '--stringparam', 'pagesrc', data.get('source', ''), + '--stringparam', 'builddir', meson.build_root(), + '--stringparam', 'timestamp', docs_timestamp, + '--nonet', site_xsl, '@INPUT@', - '@OUTPUT@', - data.get('source', []), ], depends: data.get('depends', []), depend_files: [ page_xsl ], +capture: true, install: true, install_dir: docs_html_dir, ) -- 2.26.2
[PATCH v2 0/9] docs: Fix migration.html generation and report such errors next time
This version refactors the XSLT transformation command a bit more carefully and thoroughly. A semantic difference in v2 is that the output HTMLs are no longer reformatted. Don't apply/revert first patch to see the build failure: [59/144] Generating migration.html with a meson_exe.py custom command FAILED: docs/migration.html /usr/bin/meson --internal exe --capture docs/migration.html -- /bin/xsltproc --stringparam pagesrc docs/migration.html.in --stringparam builddir /home/pipo/build/libvirt/gcc --stringparam timestamp 'Tue Oct 13 16:16:15 2020 UTC' --nonet ../../../libvirt/docs/site.xsl ../../../libvirt/docs/migration.html.in ../../../libvirt/docs/migration.html.in:664: parser error : Opening and ending tag mismatch: p line 649 and body ^ ../../../libvirt/docs/migration.html.in:665: parser error : Opening and ending tag mismatch: body line 649 and html ^ ../../../libvirt/docs/migration.html.in:666: parser error : EndTag: '
[PATCH v2 9/9] scripts: meson-html-gen: Remove
The script was obscuring what's happening and not reporting errors properly. Remove it since it's no longer used now. Signed-off-by: Peter Krempa --- scripts/meson-html-gen.py | 30 -- scripts/meson.build | 1 - 2 files changed, 31 deletions(-) delete mode 100755 scripts/meson-html-gen.py diff --git a/scripts/meson-html-gen.py b/scripts/meson-html-gen.py deleted file mode 100755 index dcc11f37cf..00 --- a/scripts/meson-html-gen.py +++ /dev/null @@ -1,30 +0,0 @@ -#!/usr/bin/env python3 - -import argparse -import subprocess - -parser = argparse.ArgumentParser() -parser.add_argument("xsltproc", type=str, help="path to xsltproc bin") -parser.add_argument("xmllint", type=str, help="path to xmllint bin") -parser.add_argument("builddir", type=str, help="build root dir path") -parser.add_argument("timestamp", type=str, help="docs timestamp") -parser.add_argument("style", type=str, help="XSL stile file") -parser.add_argument("infile", type=str, help="path to source HTML file") -parser.add_argument("htmlfile", type=str, help="path to generated HTML file") -parser.add_argument("pagesrc", type=str, default="", nargs='?', help="(optional) path to source file used for edit this page") -args = parser.parse_args() - -html = subprocess.run( -[ -args.xsltproc, -'--stringparam', 'pagesrc', args.pagesrc, -'--stringparam', 'builddir', args.builddir, -'--stringparam', 'timestamp', args.timestamp, -'--nonet', args.style, args.infile, -], -stdout=subprocess.PIPE, -stderr=subprocess.PIPE, -) - -with open(args.htmlfile, 'wb') as outfile: -outfile.write(html.stdout) diff --git a/scripts/meson.build b/scripts/meson.build index 59b3c9bacd..655ec0e0e2 100644 --- a/scripts/meson.build +++ b/scripts/meson.build @@ -22,7 +22,6 @@ scripts = [ 'meson-gen-authors.py', 'meson-gen-def.py', 'meson-gen-sym.py', - 'meson-html-gen.py', 'meson-install-dirs.py', 'meson-install-symlink.py', 'meson-install-web.py', -- 2.26.2
[PATCH v2 2/9] scripts/meson-html-gen.py: Don't rereformat output files
The output HTML files (especially those generated from rST files) don't look good even after reformatting. Skip the extra step and accept that no matter what we do HTMLs will not look great. This additionally makes it way simpler to remove meson-html-gen.py in the future (thus I've neglected to remove passing of xmllint). Signed-off-by: Peter Krempa --- scripts/meson-html-gen.py | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/scripts/meson-html-gen.py b/scripts/meson-html-gen.py index 2731d734a7..dcc11f37cf 100755 --- a/scripts/meson-html-gen.py +++ b/scripts/meson-html-gen.py @@ -14,7 +14,7 @@ parser.add_argument("htmlfile", type=str, help="path to generated HTML file") parser.add_argument("pagesrc", type=str, default="", nargs='?', help="(optional) path to source file used for edit this page") args = parser.parse_args() -html_tmp = subprocess.run( +html = subprocess.run( [ args.xsltproc, '--stringparam', 'pagesrc', args.pagesrc, @@ -26,12 +26,5 @@ html_tmp = subprocess.run( stderr=subprocess.PIPE, ) -html = subprocess.run( -[args.xmllint, '--nonet', '--format', '-'], -input=html_tmp.stdout, -stdout=subprocess.PIPE, -stderr=subprocess.PIPE, -) - with open(args.htmlfile, 'wb') as outfile: outfile.write(html.stdout) -- 2.26.2
[PATCH v2 7/9] docs/kbase/meson.build: Use template code for XSLT processing
Replace the reimplementation of the XSLT processing custom target with an identical copy form docs/meson.build and an comment to keep them in sync. Signed-off-by: Peter Krempa --- docs/kbase/meson.build | 56 +++--- 1 file changed, 36 insertions(+), 20 deletions(-) diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build index b1d1d7610b..c0fa72ff35 100644 --- a/docs/kbase/meson.build +++ b/docs/kbase/meson.build @@ -17,35 +17,51 @@ docs_kbase_files = [ 'virtiofs', ] +html_xslt_gen_xslt = subsite_xsl +html_xslt_gen_install_dir = docs_html_dir / 'kbase' +html_xslt_gen = [] + foreach name : docs_kbase_files rst_file = '@0@.rst'.format(name) - html_file = '@0@.html'.format(name) - html_in = docs_rst2html_gen.process(rst_file) + html_xslt_gen += { +'name': name, +'file': docs_rst2html_gen.process(rst_file), +'source': 'docs/kbase' / rst_file, + } +endforeach + +# keep the XSLT processing code block in sync with docs/meson.build + +# --- begin of XSLT processing --- - out_file = custom_target( -html_file, -input: html_in, -output: html_file, +foreach data : html_xslt_gen + html_filename = data['name'] + '.html' + + html_file = custom_target( +html_filename, +input: data.get('file', data['name'] + '.html.in'), +output: html_filename, command: [ - meson_python_prog, - python3_prog.path(), - meson_html_gen_prog.path(), - xsltproc_prog.path(), - xmllint_prog.path(), - meson.build_root(), - docs_timestamp, - subsite_xsl, + xsltproc_prog, + '--stringparam', 'pagesrc', data.get('source', ''), + '--stringparam', 'builddir', meson.build_root(), + '--stringparam', 'timestamp', docs_timestamp, + '--nonet', + html_xslt_gen_xslt, '@INPUT@', - '@OUTPUT@', - 'docs/kbase' / rst_file, ], -depends: [ aclperms_gen ], +depends: data.get('depends', []), depend_files: [ page_xsl ], +capture: true, install: true, -install_dir: docs_html_dir / 'kbase', +install_dir: html_xslt_gen_install_dir, ) - install_web_deps += out_file - install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 'kbase') + install_web_deps += html_file + install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir endforeach + +html_xslt_gen = [] + +# --- end of XSLT processing --- -- 2.26.2
[PATCH v2 3/9] docs: meson.build: Limit html files depending on 'aclperms.htmlinc'
Only 'acl.html' output file includes that file so there's no need to make everything depend on it. Signed-off-by: Peter Krempa --- docs/meson.build | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/docs/meson.build b/docs/meson.build index 400c1ca955..d18e5c1feb 100644 --- a/docs/meson.build +++ b/docs/meson.build @@ -32,7 +32,6 @@ docs_assets = [ docs_html_in_files = [ '404', - 'acl', 'aclpolkit', 'api_extension', 'api', @@ -199,6 +198,7 @@ docs_rst2html_gen = generator( # name - base file name (required) # file - generated file (required) # source - source filename relative to repository root (optional, if there is no source) +# depends - explicit dependency on other input (optional) docs_html_in_gen = [] foreach name : docs_html_in_files @@ -219,6 +219,13 @@ foreach name : docs_rst_files } endforeach +docs_html_in_gen += { + 'name': 'acl.html', + 'file': 'acl.html.in', + 'source': 'docs' / 'acl.html.in', + 'depends': aclperms_gen, +} + hvsupport_html_in = custom_target( 'hvsupport.html.in', output: 'hvsupport.html.in', @@ -272,7 +279,7 @@ foreach data : docs_html_in_gen '@OUTPUT@', data.get('source', []), ], -depends: [ aclperms_gen ], +depends: data.get('depends', []), depend_files: [ page_xsl ], install: true, install_dir: docs_html_dir, -- 2.26.2
[PATCH v2 1/9] docs: migration: Fix syntax
One of the paragraphs added in f51cbe92c0d was not terminated thus making it invalid XML/XHTML. This was not caught by the build system as 'scripts/meson-html-gen.py' unnecessarily obscures and hides errors from 'xsltproc'. Signed-off-by: Peter Krempa --- docs/migration.html.in | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/migration.html.in b/docs/migration.html.in index 162c202227..dd5eddd6f4 100644 --- a/docs/migration.html.in +++ b/docs/migration.html.in @@ -653,6 +653,7 @@ virsh migrate --p2p --tunnelled web1 qemu+ssh://desthost/system qemu+ssh://10.0. daemons or forwarding connections to these sockets manually (using socat, netcat or a custom piece of software): + virsh migrate web1 [--p2p] --copy-storage-all 'qemu+unix:///system?socket=/tmp/migdir/test-sock-driver' 'unix:///tmp/migdir/test-sock-qemu' --disks-uri unix:///tmp/migdir/test-sock-nbd -- 2.26.2
[PATCH v2 5/9] docs: meson.build: Prepare for use of identical code for XSLT processing of htmls
Meson unfortunately doesn't give us any means to share the code using xsltproc to output HTMLs processed by our template. This means we will have to resort to copy engineering. To make things simpler, let's use the same block of code in docs/meson.build but also any of the subdirs which generate htmls. This will be achieved by making it configurable and wrapping it in a comment that instructs anybody editing it to keep it identical. We need to be able to configure the template file used and installation directory. The rest of the processing is same as we do in docs/meson.build. This code will then be copied to subdirs to refactor the current approach used there. Signed-off-by: Peter Krempa --- docs/meson.build | 60 +--- 1 file changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/meson.build b/docs/meson.build index 8b7c63bc6f..a915d6252a 100644 --- a/docs/meson.build +++ b/docs/meson.build @@ -193,35 +193,37 @@ docs_rst2html_gen = generator( ) -# docs_html_in_gen: +# html_xslt_gen config + +html_xslt_gen_xslt = site_xsl +html_xslt_gen_install_dir = docs_html_dir + +html_xslt_gen = [] +# html_xslt_gen: # each entry is a dictionary with following items: -# name - base file name (required) -# file - generated file (required) +# name - base file name (required), output file will become 'name.html' +# file - input file (optional, 'name.html.in' assumed if missing) # source - source filename relative to repository root (optional, if there is no source) # depends - explicit dependency on other input (optional) -docs_html_in_gen = [] foreach name : docs_html_in_files - html_in_file = '@0...@.html.in'.format(name) - docs_html_in_gen += { + html_xslt_gen += { 'name': name, -'file': html_in_file, -'source': 'docs' / html_in_file, +'source': 'docs' / name + '.html.in', } endforeach foreach name : docs_rst_files rst_file = '@0@.rst'.format(name) - docs_html_in_gen += { + html_xslt_gen += { 'name': name, 'file': docs_rst2html_gen.process(rst_file), 'source': 'docs' / rst_file, } endforeach -docs_html_in_gen += { - 'name': 'acl.html', - 'file': 'acl.html.in', +html_xslt_gen += { + 'name': 'acl', 'source': 'docs' / 'acl.html.in', 'depends': aclperms_gen, } @@ -247,45 +249,55 @@ hvsupport_html_in = custom_target( docs_api_generated, ], ) -docs_html_in_gen += { +html_xslt_gen += { 'name': 'hvsupport', 'file': hvsupport_html_in, } news_html_in = docs_rst2html_gen.process(meson.source_root() / 'NEWS.rst') -docs_html_in_gen += { +html_xslt_gen += { 'name': 'news', 'file': news_html_in, 'source': 'NEWS.rst', } -foreach data : docs_html_in_gen - html_file = '@0@.html'.format(data['name']) +# The following code between the markers must be kept identical with the other +# copies of the code in various subdirs, since meson doesn't support any kind +# of functions. + +# --- begin of XSLT processing --- - out_file = custom_target( -html_file, -input: data['file'], -output: html_file, +foreach data : html_xslt_gen + html_filename = data['name'] + '.html' + + html_file = custom_target( +html_filename, +input: data.get('file', data['name'] + '.html.in'), +output: html_filename, command: [ xsltproc_prog, '--stringparam', 'pagesrc', data.get('source', ''), '--stringparam', 'builddir', meson.build_root(), '--stringparam', 'timestamp', docs_timestamp, '--nonet', - site_xsl, + html_xslt_gen_xslt, '@INPUT@', ], depends: data.get('depends', []), depend_files: [ page_xsl ], capture: true, install: true, -install_dir: docs_html_dir, +install_dir: html_xslt_gen_install_dir, ) - install_web_deps += out_file - install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir) + install_web_deps += html_file + install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir endforeach +html_xslt_gen = [] + +# --- end of XSLT processing --- + subdir('fonts') subdir('html') subdir('internals') -- 2.26.2
Re: [PATCH v2 1/4] Document and parser support for the Virtio free page reporting feature.
On 10/13/20 1:35 AM, Nico Pache wrote: This will add the proper documentation and parser support for the free page reporting feature that is introduced in QEMU 5.1. Signed-off-by: Nico Pache --- docs/formatdomain.rst | 6 ++ docs/schemas/domaincommon.rng | 5 + src/conf/domain_conf.c| 21 + src/conf/domain_conf.h| 1 + 4 files changed, 33 insertions(+) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index df5ac28028..e75b360b32 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -6755,6 +6755,12 @@ Example: manually added device with static PCI slot 2 requested release some memory at the last moment before a guest's process get killed by Out of Memory killer. :since:`Since 1.3.1, QEMU and KVM only` +``free-page-reporting`` + The optional ``free-page-reporting`` attribute allows to enable/disable + ("on"/"off", respectively) the ability of the QEMU virtio memory balloon to + return unused pages back to the hypervisor to be used by other guests or + processes. :since:`Since 5.1, QEMU and KVM only` This "Since" attribute should refer to Libvirt version and not qemu. The idea is that when I am reading libvirt docs, how to enable something in domain XML I want to know whether my libvirt supports it. We could also record QEMU version but in general we don't do that. Michal
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On 10/13/20 1:35 AM, Nico Pache wrote: gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 The virtio-balloon device now has the ability to report free pages back to the hypervisor for reuse by other programs. This kernel feature allows for more stable and resource friendly systems. This feature is available in QEMU and is enabled with free-page-reporting=on default virt setting should be off but the user should be able to enable it. Nico Pache (4): Document and parser support for the Virtio free page reporting feature. QEMU: declare qemu capabilities for the Virtio Free page reporting feature QEMU: introduce Virtio free page reporting feature provide testing for free-page-reporting feature in QEMU docs/formatdomain.rst | 6 docs/schemas/domaincommon.rng | 5 +++ src/conf/domain_conf.c| 21 +++ src/conf/domain_conf.h| 1 + src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 5 +++ src/qemu/qemu_validate.c | 7 .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + ...-options-memballoon-freepage-reporting.err | 1 + ...loon-freepage-reporting.x86_64-latest.args | 36 +++ ...-options-memballoon-freepage-reporting.xml | 22 tests/qemuxml2argvtest.c | 2 ++ ...-options-memballoon-freepage-reporting.xml | 22 15 files changed, 133 insertions(+) create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml Couple of white space problems, but nothing I couldn't fix before pushing. Reviewed-by: Michal Privoznik Congratulations on your first libvirt contribution! Michal
Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices
On 10/13/20 11:06 AM, Michal Privoznik wrote: On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote: On 10/8/20 3:10 PM, Michal Privoznik wrote: See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains. Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices Reviewed-by: Daniel Henrique Barboza Thanks. Just before I merged these I realized that it will be a good idea to mock virNetDevSetRootQDisc() so that we don't run tc from the test suite. But the diff is really trivial: diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h index 82943b8e08..dfef49938f 100644 --- i/src/util/virnetdev.h +++ w/src/util/virnetdev.h @@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, const char *script) G_GNUC_NO_INLINE; int virNetDevSetRootQDisc(const char *ifname, - const char *qdisc); + const char *qdisc) + G_GNUC_NO_INLINE; G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree); diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c index 9bf4357b66..b9322f4f2a 100644 --- i/tests/qemuxml2argvmock.c +++ w/tests/qemuxml2argvmock.c @@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev G_GNUC_UNUSED, *cancelfd = 1731; return 0; } + + +int +virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED, + const char *qdisc G_GNUC_UNUSED) +{ + return 0; +} Hopefully, you are okay if I squash this to 2/2. Yep, go ahead. Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it. I didn't, but I had this on mind when writing patches and I made the whole thing best effort. Note that qemuDomainInterfaceSetDefaultQDisc() retval is not checked for => if setting noqueue fails (for whatever reason) then worst case scenario an error is printed into logs. I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"? I'm not sure what you mean. This perhaps? https://libvirt.org/platforms.html#linux-freebsd-and-macos Thanks, this is what I was looking for. DHB Michal
Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices
On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote: On 10/8/20 3:10 PM, Michal Privoznik wrote: See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains. Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices Reviewed-by: Daniel Henrique Barboza Thanks. Just before I merged these I realized that it will be a good idea to mock virNetDevSetRootQDisc() so that we don't run tc from the test suite. But the diff is really trivial: diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h index 82943b8e08..dfef49938f 100644 --- i/src/util/virnetdev.h +++ w/src/util/virnetdev.h @@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, const char *script) G_GNUC_NO_INLINE; int virNetDevSetRootQDisc(const char *ifname, - const char *qdisc); + const char *qdisc) +G_GNUC_NO_INLINE; G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree); diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c index 9bf4357b66..b9322f4f2a 100644 --- i/tests/qemuxml2argvmock.c +++ w/tests/qemuxml2argvmock.c @@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev G_GNUC_UNUSED, *cancelfd = 1731; return 0; } + + +int +virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED, + const char *qdisc G_GNUC_UNUSED) +{ +return 0; +} Hopefully, you are okay if I squash this to 2/2. Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it. I didn't, but I had this on mind when writing patches and I made the whole thing best effort. Note that qemuDomainInterfaceSetDefaultQDisc() retval is not checked for => if setting noqueue fails (for whatever reason) then worst case scenario an error is printed into logs. I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"? I'm not sure what you mean. This perhaps? https://libvirt.org/platforms.html#linux-freebsd-and-macos Michal
Re: [PATCH] qemu_slirp: Check if helper exists before fetching its capabilities
On 10/13/20 9:32 AM, Michal Privoznik wrote: I've noticed when running libvirtd in the session mode that whenever I start a virtual machine the following error is printed into logs: error : cannot execute binary /usr/bin/slirp-helper: No such file or directory The error message is produced in qemuSlirpNewForHelper() which does attempt to be a NO-OP if the helper doesn't exist by checking if its path is NULL, but that's not usually the case because in the default config (in virQEMUDriverConfigNew()) its path is initialized to QEMU_SLIRP_HELPER. And while it is true that during configure we try to get the actual path of the helper we fallback to the path above if not found. Fix the check so that the function checks whether the helper exists and is executable. Signed-off-by: Michal Privoznik --- src/qemu/qemu_slirp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c index d2e4ed79be..4e5ab12727 100644 --- a/src/qemu/qemu_slirp.c +++ b/src/qemu/qemu_slirp.c @@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper) virJSONValuePtr featuresJSON; size_t i, nfeatures; -if (!helper) +if (!helper || +!virFileIsExecutable(helper)) I would prefer if either then entire expression was on a single line (since it's so short), or if you're going to put it on multiple lines, that you surroung the "return NULL;" with { } (I think the coding standards say to do this, but the syntax-check doesn't check it) Reviewed-by: Laine Stump return NULL; slirp = qemuSlirpNew();
Re: [PATCH] qemu_slirp: Check if helper exists before fetching its capabilities
On Tue, Oct 13, 2020 at 03:32:28PM +0200, Michal Privoznik wrote: > I've noticed when running libvirtd in the session mode that > whenever I start a virtual machine the following error is printed > into logs: > > error : cannot execute binary /usr/bin/slirp-helper: No such file or > directory > > The error message is produced in qemuSlirpNewForHelper() which > does attempt to be a NO-OP if the helper doesn't exist by > checking if its path is NULL, but that's not usually the case > because in the default config (in virQEMUDriverConfigNew()) its > path is initialized to QEMU_SLIRP_HELPER. And while it is true > that during configure we try to get the actual path of the helper > we fallback to the path above if not found. > > Fix the check so that the function checks whether the helper > exists and is executable. > > Signed-off-by: Michal Privoznik > --- > src/qemu/qemu_slirp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c > index d2e4ed79be..4e5ab12727 100644 > --- a/src/qemu/qemu_slirp.c > +++ b/src/qemu/qemu_slirp.c > @@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper) > virJSONValuePtr featuresJSON; > size_t i, nfeatures; > > -if (!helper) > +if (!helper || > +!virFileIsExecutable(helper)) > return NULL; This API has way bigger problems than this. It is reporting errors using virReportError, but the callers ignore them and can't distinguish real errors from expected errors. I'm working on a more comprehensive fix and will send patch shortly Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] bhyve: implement virtio-9p support
Daniel P. Berrangé wrote: > On Thu, Oct 08, 2020 at 05:06:16PM +0400, Roman Bogorodskiy wrote: > > Recently virtio-9p support was added to bhyve. > > > > On the host side it looks this way: > > > > bhyve -s 25:0,virtio-9p,sharename=/path/to/shared/dir > > > > It could also have ",ro" suffix to make share read-only. > > > > In the Linux guest, this share is mounted with: > > > > mount -t 9p sharename /mnt/sharename > > > > In the guest user will see the same permissions and ownership > > information for this directory as on the host. No uid/gid remapping is > > supported, so those could resolve to wrong user or group names. > > > > The same applies to the other side: chowning/chmodding in the guest will > > set specified ownership and permissions on the host. > > > > In libvirt domain XML it's modeled using the 'filesystem' element: > > > > > > > > > > > > > > diff --git a/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > > b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > > new file mode 100644 > > index 00..6341236654 > > --- /dev/null > > +++ b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > > @@ -0,0 +1,28 @@ > > + > > + bhyve > > + df3be7e7-a104-11e3-aeb0-50e5492bd3dc > > + 219136 > > + 1 > > + > > +hvm > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > > + > function='0x0'/> > > + > > + > > This is missing the type="mount" attribute which should be mandatory. > It suggests we're not validating the type in the driver, before accessing > the element, which is dangerous. > > > + > > + > > + > > + > > + > > + > > The other demo XML files are the same. Hm, as I can see in the schema, type="mount" is default. That's what I see in virDomainFSDefParseXML() @ src/conf/domain_conf.c as well. I also check that in the driver, and there's a test for it: tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-unsupported-type.xml Are you referring to something different? > Regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| > Roman Bogorodskiy signature.asc Description: PGP signature
[PATCH] qemu_slirp: Check if helper exists before fetching its capabilities
I've noticed when running libvirtd in the session mode that whenever I start a virtual machine the following error is printed into logs: error : cannot execute binary /usr/bin/slirp-helper: No such file or directory The error message is produced in qemuSlirpNewForHelper() which does attempt to be a NO-OP if the helper doesn't exist by checking if its path is NULL, but that's not usually the case because in the default config (in virQEMUDriverConfigNew()) its path is initialized to QEMU_SLIRP_HELPER. And while it is true that during configure we try to get the actual path of the helper we fallback to the path above if not found. Fix the check so that the function checks whether the helper exists and is executable. Signed-off-by: Michal Privoznik --- src/qemu/qemu_slirp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c index d2e4ed79be..4e5ab12727 100644 --- a/src/qemu/qemu_slirp.c +++ b/src/qemu/qemu_slirp.c @@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper) virJSONValuePtr featuresJSON; size_t i, nfeatures; -if (!helper) +if (!helper || +!virFileIsExecutable(helper)) return NULL; slirp = qemuSlirpNew(); -- 2.26.2
Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting
On 10/12/20 8:35 PM, Nico Pache wrote: gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79 The virtio-balloon device now has the ability to report free pages back to the hypervisor for reuse by other programs. This kernel feature allows for more stable and resource friendly systems. This feature is available in QEMU and is enabled with free-page-reporting=on default virt setting should be off but the user should be able to enable it. Nico Pache (4): Document and parser support for the Virtio free page reporting feature. QEMU: declare qemu capabilities for the Virtio Free page reporting feature QEMU: introduce Virtio free page reporting feature provide testing for free-page-reporting feature in QEMU Reviewed-by: Daniel Henrique Barboza docs/formatdomain.rst | 6 docs/schemas/domaincommon.rng | 5 +++ src/conf/domain_conf.c| 21 +++ src/conf/domain_conf.h| 1 + src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_command.c | 5 +++ src/qemu/qemu_validate.c | 7 .../caps_5.1.0.x86_64.xml | 1 + .../caps_5.2.0.x86_64.xml | 1 + ...-options-memballoon-freepage-reporting.err | 1 + ...loon-freepage-reporting.x86_64-latest.args | 36 +++ ...-options-memballoon-freepage-reporting.xml | 22 tests/qemuxml2argvtest.c | 2 ++ ...-options-memballoon-freepage-reporting.xml | 22 15 files changed, 133 insertions(+) create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args create mode 100644 tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml create mode 100644 tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml
Re: [libvirt PATCH 0/4] vmx: start parsing SATA disks
On 10/12/20 5:13 PM, Pino Toscano wrote: Try to parse SATA disks in VMware guests from their configs in VMX files. There are a couple of helper/cleanup commits to ease a bit the actual patches. Pino Toscano (4): vmx: hide private helpers vmx: shortcut 'cdrom-image' as CD-ROM earlier vmx: expand the disk array vmx: start parsing SATA disks src/libvirt_vmx.syms | 12 - src/vmx/vmx.c | 212 -- src/vmx/vmx.h | 44 .../vmx2xml-esx-in-the-wild-10.vmx| 101 + .../vmx2xml-esx-in-the-wild-10.xml| 36 +++ .../vmx2xmldata/vmx2xml-esx-in-the-wild-8.xml | 6 + tests/vmx2xmltest.c | 1 + 7 files changed, 335 insertions(+), 77 deletions(-) create mode 100644 tests/vmx2xmldata/vmx2xml-esx-in-the-wild-10.vmx create mode 100644 tests/vmx2xmldata/vmx2xml-esx-in-the-wild-10.xml Reviewed-by: Michal Privoznik Michal
Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices
On 10/8/20 3:10 PM, Michal Privoznik wrote: See 2/2 for detailed explanation. Long story short - we can squeeze more bandwidth from TAP devices we create for domains. Michal Prívozník (2): virnetdev: Introduce virNetDevSetRootQDisc() qemu: Set noqueue qdisc for TAP devices Reviewed-by: Daniel Henrique Barboza Have you tried it out with older kernels? Reading the bug I understood that 4.2 and older (2015 kernels) does not have qdisc support and that you can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to verify it. I consider this to be a 'nice to have' that can be added in a follow up patch though, if applicable. By the way, do we have any documentation about "the latest Libvirt release will not care about N+ years old kernel/QEMU"? Thanks, DHB src/libvirt_private.syms | 1 + src/qemu/qemu_command.c | 2 ++ src/qemu/qemu_domain.c | 36 +++ src/qemu/qemu_domain.h | 4 src/qemu/qemu_hotplug.c | 2 ++ src/util/virnetdev.c | 46 src/util/virnetdev.h | 3 +++ 7 files changed, 94 insertions(+)
Re: [RFC] qemu: virtiofs can be used without NUMA nodes
On 10/6/20 6:20 PM, Marc Hartmayer wrote: ...if a machine memory-backend using shared memory is configured for the guest. This is especially important for QEMU machine types that don't have NUMA but virtiofs support. An example snippet: test 2097152 ... ... and the corresponding QEMU command line: /usr/bin/qemu-system-s390x \ -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \ -m 2048 \ -object memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648 \ ... Signed-off-by: Marc Hartmayer --- Note: There are still some TODOs left... e.g. adapt the virtiofs documentation of libvirt. Yep, but looks good. --- src/qemu/qemu_validate.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index a212605579d2..077a85b30802 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -3470,14 +3470,21 @@ qemuValidateDomainDeviceDefGraphics(const virDomainGraphicsDef *graphics, static int -qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def) +qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def, + virQEMUCapsPtr qemuCaps) { +const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps, + def->virtType, + def->os.machine); size_t numa_nodes = virDomainNumaGetNodeCount(def->numa); size_t i; -if (numa_nodes == 0) { +if (numa_nodes == 0 && +!(defaultRAMId && def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED)) { +/* TODO do we need further checks here (e.g. check whether + * memory backend is supported by the QEMU binary)? */ I don't think we need that. memory backends can't be compiled out (well, unless a distro has a patch on the top of qemu which would do exactly that), can defaultRAMId exposed is strictly newer than memory backends. I think this comment can be removed and the rest can be kept as is (plus the docs). Michal
Re: [PATCH] bhyve: implement virtio-9p support
On Thu, Oct 08, 2020 at 05:06:16PM +0400, Roman Bogorodskiy wrote: > Recently virtio-9p support was added to bhyve. > > On the host side it looks this way: > > bhyve -s 25:0,virtio-9p,sharename=/path/to/shared/dir > > It could also have ",ro" suffix to make share read-only. > > In the Linux guest, this share is mounted with: > > mount -t 9p sharename /mnt/sharename > > In the guest user will see the same permissions and ownership > information for this directory as on the host. No uid/gid remapping is > supported, so those could resolve to wrong user or group names. > > The same applies to the other side: chowning/chmodding in the guest will > set specified ownership and permissions on the host. > > In libvirt domain XML it's modeled using the 'filesystem' element: > > > > > > diff --git a/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > new file mode 100644 > index 00..6341236654 > --- /dev/null > +++ b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml > @@ -0,0 +1,28 @@ > + > + bhyve > + df3be7e7-a104-11e3-aeb0-50e5492bd3dc > + 219136 > + 1 > + > +hvm > + > + > + > + > + > + > + > + > + > + > + > + > + function='0x0'/> > + > + This is missing the type="mount" attribute which should be mandatory. It suggests we're not validating the type in the driver, before accessing the element, which is dangerous. > + > + > + > + > + > + The other demo XML files are the same. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v2 1/4] API: introduce memory failure
This patch failed to compile in my env: FAILED: tools/virsh.p/virsh-domain.c.o [] -D_FUNCTION_DEF -MD -MQ tools/virsh.p/virsh-domain.c.o -MF tools/virsh.p/virsh-domain.c.o.d -o tools/virsh.p/virsh-domain.c.o -c ../tools/virsh-domain.c In file included from /usr/lib64/glib-2.0/include/glibconfig.h:9, from /usr/include/glib-2.0/glib/gtypes.h:32, from /usr/include/glib-2.0/glib/galloca.h:32, from /usr/include/glib-2.0/glib.h:30, from ../src/util/glibcompat.h:21, from ../src/internal.h:30, from ../tools/virsh.h:25, from ../tools/virsh-domain.h:23, from ../tools/virsh-domain.c:22: /usr/include/glib-2.0/glib/gmacros.h:745:53: error: size of array ‘_GStaticAssertCompileTimeAssertion_185’ is negative 745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE (_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] G_GNUC_UNUSED | ^~~ /usr/include/glib-2.0/glib/gmacros.h:735:47: note: in definition of macro ‘G_PASTE_ARGS’ 735 | #define G_PASTE_ARGS(identifier1,identifier2) identifier1 ## identifier2 | ^~~ /usr/include/glib-2.0/glib/gmacros.h:745:44: note: in expansion of macro ‘G_PASTE’ 745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE (_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] G_GNUC_UNUSED |^~~ ../tools/virsh-domain.c:13643:1: note: in expansion of macro ‘G_STATIC_ASSERT’ 13643 | G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == G_N_ELEMENTS(virshDomainEventCallbacks)); | ^~~ [505/984] Compiling C object src/virtqemud.p/remote_remote_daemon_dispatch.c.o ninja: build stopped: subcommand failed. $ I didn't verify if the following patches fixes it. Thanks, DHB On 10/12/20 9:31 AM, zhenwei pi wrote: Introduce memory failure event. Libvirt should monitor domain's event, then posts it to uplayer. According to the hardware memory corrupted message, the cloud scheduler could migrate domain to another health physical server. Signed-off-by: zhenwei pi --- include/libvirt/libvirt-domain.h| 82 + src/conf/domain_event.c | 80 src/conf/domain_event.h | 12 ++ src/libvirt_private.syms| 2 + src/remote/remote_daemon_dispatch.c | 32 +++ src/remote/remote_driver.c | 32 +++ src/remote/remote_protocol.x| 16 +++- src/remote_protocol-structs | 8 8 files changed, 263 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 77f9116675..5138843a56 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -3196,6 +3196,64 @@ typedef enum { } virDomainEventCrashedDetailType; /** + * virDomainMemoryFailureRecipientType: + * + * Recipient of a memory failure event. + */ +typedef enum { +/* memory failure at hypersivor memory address space */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_HYPERVISOR = 0, + +/* memory failure at guest memory address space */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_GUEST = 1, + +# ifdef VIR_ENUM_SENTINELS +VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_LAST +# endif +} virDomainMemoryFailureRecipientType; + + +/** + * virDomainMemoryFailureActionType: + * + * Action of a memory failure event. + */ +typedef enum { +/* the memory failure could be ignored. This will only be the case for + * action-optional failures. */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_IGNORE = 0, + +/* memory failure occurred in guest memory, the guest enabled MCE handling + * mechanism, and hypervisor could inject the MCE into the guest + * successfully. */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_INJECT = 1, + +/* the failure is unrecoverable. This occurs for action-required failures + * if the recipient is the hypervisor; hypervisor will exit. */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_FATAL = 2, + +/* the failure is unrecoverable but confined to the guest. This occurs if + * the recipient is a guest which is not ready to handle memory failures. */ +VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_RESET = 3, + +# ifdef VIR_ENUM_SENTINELS +VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_LAST +# endif +} virDomainMemoryFailureActionType; + + +typedef enum { +/* whether a memory failure event is action-required or action-optional + * (e.g. a failure during memory scrub). */ +VIR_DOMAIN_MEMORY_FAILURE_ACTION_REQUIRED = (1 << 0), + +/* whether the failure occurred while the previous failure was still in + * progress. */ +VIR_DOMAIN_MEMORY_FAILURE_RECURSIVE = (1
[PATCH v11 02/13] copy-on-read: add filter append/drop functions
Provide API for the COR-filter insertion/removal. Also, drop the filter child permissions for an inactive state when the filter node is being removed. Signed-off-by: Andrey Shinkevich Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/copy-on-read.c | 88 block/copy-on-read.h | 35 + 2 files changed, 123 insertions(+) create mode 100644 block/copy-on-read.h diff --git a/block/copy-on-read.c b/block/copy-on-read.c index cb03e0f..bcccf0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -23,11 +23,21 @@ #include "qemu/osdep.h" #include "block/block_int.h" #include "qemu/module.h" +#include "qapi/error.h" +#include "qapi/qmp/qdict.h" +#include "block/copy-on-read.h" + + +typedef struct BDRVStateCOR { +bool active; +} BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { +BDRVStateCOR *state = bs->opaque; + bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false, errp); @@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); +state->active = true; + +/* + * We don't need to call bdrv_child_refresh_perms() now as the permissions + * will be updated later when the filter node gets its parent. + */ + return 0; } @@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild *c, uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared) { +BDRVStateCOR *s = bs->opaque; + +if (!s->active) { +/* + * While the filter is being removed + */ +*nperm = 0; +*nshared = BLK_PERM_ALL; +return; +} + *nperm = perm & PERM_PASSTHROUGH; *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED; @@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked) static BlockDriver bdrv_copy_on_read = { .format_name= "copy-on-read", +.instance_size = sizeof(BDRVStateCOR), .bdrv_open = cor_open, .bdrv_child_perm= cor_child_perm, @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void) bdrv_register(_copy_on_read); } + +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs, + QDict *node_options, + int flags, Error **errp) +{ +BlockDriverState *cor_filter_bs; +Error *local_err = NULL; + +cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp); +if (cor_filter_bs == NULL) { +error_prepend(errp, "Could not create COR-filter node: "); +return NULL; +} + +if (!qdict_get_try_str(node_options, "node-name")) { +cor_filter_bs->implicit = true; +} + +bdrv_drained_begin(bs); +bdrv_replace_node(bs, cor_filter_bs, _err); +bdrv_drained_end(bs); + +if (local_err) { +bdrv_unref(cor_filter_bs); +error_propagate(errp, local_err); +return NULL; +} + +return cor_filter_bs; +} + + +void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs) +{ +BdrvChild *child; +BlockDriverState *bs; +BDRVStateCOR *s = cor_filter_bs->opaque; + +child = bdrv_filter_child(cor_filter_bs); +if (!child) { +return; +} +bs = child->bs; + +/* Retain the BDS until we complete the graph change. */ +bdrv_ref(bs); +/* Hold a guest back from writing while permissions are being reset. */ +bdrv_drained_begin(bs); +/* Drop permissions before the graph change. */ +s->active = false; +bdrv_child_refresh_perms(cor_filter_bs, child, _abort); +bdrv_replace_node(cor_filter_bs, bs, _abort); + +bdrv_drained_end(bs); +bdrv_unref(bs); +bdrv_unref(cor_filter_bs); +} + + block_init(bdrv_copy_on_read_init); diff --git a/block/copy-on-read.h b/block/copy-on-read.h new file mode 100644 index 000..d6f2422 --- /dev/null +++ b/block/copy-on-read.h @@ -0,0 +1,35 @@ +/* + * Copy-on-read filter block driver + * + * The filter driver performs Copy-On-Read (COR) operations + * + * Copyright (c) 2018-2020 Virtuozzo International GmbH. + * + * Author: + * Andrey Shinkevich + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of
[PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
We are going to use the COR-filter for a block-stream job. To limit COR operations by the base node in the backing chain during stream job, pass the name of overlay base node to the copy-on-read driver as base node itself may change due to possible concurrent jobs. The rest of the functionality will be implemented in the patch that follows. Signed-off-by: Andrey Shinkevich --- block/copy-on-read.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index bcccf0f..c578b1b 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -24,19 +24,24 @@ #include "block/block_int.h" #include "qemu/module.h" #include "qapi/error.h" +#include "qapi/qmp/qerror.h" #include "qapi/qmp/qdict.h" #include "block/copy-on-read.h" typedef struct BDRVStateCOR { bool active; +BlockDriverState *base_overlay; } BDRVStateCOR; static int cor_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { +BlockDriverState *base_overlay = NULL; BDRVStateCOR *state = bs->opaque; +/* We need the base overlay node rather than the base itself */ +const char *base_overlay_node = qdict_get_try_str(options, "base"); bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds, BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) & bs->file->bs->supported_zero_flags); +if (base_overlay_node) { +qdict_del(options, "base"); +base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp); +if (!base_overlay) { +error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node); +return -EINVAL; +} +} state->active = true; +state->base_overlay = base_overlay; /* * We don't need to call bdrv_child_refresh_perms() now as the permissions -- 1.8.3.1
[PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions
Add support for the recently introduced functions bdrv_co_preadv_part() and bdrv_co_pwritev_part() to the COR-filter driver. Signed-off-by: Andrey Shinkevich Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/copy-on-read.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index 2816e61..cb03e0f 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs) } -static int coroutine_fn cor_co_preadv(BlockDriverState *bs, - uint64_t offset, uint64_t bytes, - QEMUIOVector *qiov, int flags) +static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, + uint64_t offset, uint64_t bytes, + QEMUIOVector *qiov, + size_t qiov_offset, + int flags) { -return bdrv_co_preadv(bs->file, offset, bytes, qiov, - flags | BDRV_REQ_COPY_ON_READ); +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); } -static int coroutine_fn cor_co_pwritev(BlockDriverState *bs, - uint64_t offset, uint64_t bytes, - QEMUIOVector *qiov, int flags) +static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs, +uint64_t offset, +uint64_t bytes, +QEMUIOVector *qiov, +size_t qiov_offset, int flags) { - -return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags); +return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset, +flags); } @@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = { .bdrv_getlength = cor_getlength, -.bdrv_co_preadv = cor_co_preadv, -.bdrv_co_pwritev= cor_co_pwritev, +.bdrv_co_preadv_part= cor_co_preadv_part, +.bdrv_co_pwritev_part = cor_co_pwritev_part, .bdrv_co_pwrite_zeroes = cor_co_pwrite_zeroes, .bdrv_co_pdiscard = cor_co_pdiscard, .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed, -- 1.8.3.1
[PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
If the flag BDRV_REQ_PREFETCH was set, pass it further to the COR-driver to skip unneeded reading. It can be taken into account for the COR-algorithms optimization. That check is being made during the block stream job by the moment. Signed-off-by: Andrey Shinkevich --- block/copy-on-read.c | 13 + block/io.c | 3 ++- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index b136895..278a11a 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, } } -ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, - local_flags); -if (ret < 0) { -return ret; +if (!!(flags & BDRV_REQ_PREFETCH) & +!(local_flags & BDRV_REQ_COPY_ON_READ)) { +/* Skip non-guest reads if no copy needed */ +} else { +ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); +if (ret < 0) { +return ret; +} } offset += n; diff --git a/block/io.c b/block/io.c index 11df188..bff1808 100644 --- a/block/io.c +++ b/block/io.c @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { -ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0); +ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, + flags & bs->supported_read_flags); goto out; } -- 1.8.3.1
[PATCH v11 07/13] block: include supported_read_flags into BDS structure
Add the new member supported_read_flags to BlockDriverState structure. It will control the BDRV_REQ_PREFETCH flag set for copy-on-read operations. Signed-off-by: Andrey Shinkevich --- include/block/block_int.h | 4 1 file changed, 4 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index f782737..a142867 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -873,6 +873,10 @@ struct BlockDriverState { /* I/O Limits */ BlockLimits bl; +/* + * Flags honored during pread (so far: BDRV_REQ_PREFETCH) + */ +unsigned int supported_read_flags; /* Flags honored during pwrite (so far: BDRV_REQ_FUA, * BDRV_REQ_WRITE_UNCHANGED). * If a driver does not support BDRV_REQ_WRITE_UNCHANGED, those -- 1.8.3.1
[PATCH v11 11/13] stream: mark backing-file argument as deprecated
Whereas the block-stream job starts using a backing file name of the base node overlay after the block-stream job completes, mark the QMP 'backing-file' argument as deprecated. Signed-off-by: Andrey Shinkevich --- docs/system/deprecated.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 8b3ab5b..7491fcf 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -285,6 +285,12 @@ details. The ``query-events`` command has been superseded by the more powerful and accurate ``query-qmp-schema`` command. +``block-stream`` argument ``backing-file`` (since 5.2) +' + +The argument ``backing-file`` is deprecated. QEMU uses a backing file +name of the base node overlay after the block-stream job completes. + chardev client socket with ``wait`` option (since 4.0) '' -- 1.8.3.1
[PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags of the COR-filter. Signed-off-by: Andrey Shinkevich --- block/copy-on-read.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index dfbd6ad..b136895 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; } +bs->supported_read_flags = BDRV_REQ_PREFETCH; bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED | (BDRV_REQ_FUA & bs->file->bs->supported_write_flags); -- 1.8.3.1
[PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag
Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to use it alone and pass it to the COR-filter driver for further processing. Signed-off-by: Andrey Shinkevich --- include/block/block.h | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 981ab5b..2b7efd1 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -71,9 +71,10 @@ typedef enum { BDRV_REQ_NO_FALLBACK= 0x100, /* - * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ - * on read request and means that caller doesn't really need data to be - * written to qiov parameter which may be NULL. + * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ + * flag or when the COR-filter applied to read operations and means that + * caller doesn't really need data to be written to qiov parameter which + * may be NULL. */ BDRV_REQ_PREFETCH = 0x200, /* Mask of valid flags */ -- 1.8.3.1
[PATCH v11 00/13] Apply COR-filter to the block-stream permanently
The iotest case test_stream_parallel still does not pass after the COR-filter is inserted into the backing chain. As the test case may not be initialized, it does not make a sense and was removed again. v11: 04: Base node overlay is used instead of base. 05: Base node overlay is used instead of base. 06: New. 07: New. 08: New. 09: The new BDS-member 'supported_read_flags' is applied. 10: The 'base_metadata' variable renamed to 'base_unfiltered'. 11: New. 12: The backing-file argument is left in the QMP interface. Warning added. 13: The BDRV_REQ_COPY_ON_READ removed from the stream_populate(); The 'implicit' initialization moved back to COR-filter driver. Base node overlay is used instead of base. The v8 Message-Id: <1601383109-110988-1-git-send-email-andrey.shinkev...@virtuozzo.com> Andrey Shinkevich (13): copy-on-read: Support preadv/pwritev_part functions copy-on-read: add filter append/drop functions qapi: add filter-node-name to block-stream copy-on-read: pass overlay base node name to COR driver copy-on-read: limit COR operations to base in COR driver block: modify the comment for BDRV_REQ_PREFETCH flag block: include supported_read_flags into BDS structure copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter copy-on-read: skip non-guest reads if no copy needed stream: skip filters when writing backing file name to QCOW2 header stream: mark backing-file argument as deprecated stream: remove unused backing-file name parameter block: apply COR-filter to block-stream jobs block/copy-on-read.c | 171 ++--- block/copy-on-read.h | 35 + block/io.c | 3 +- block/monitor/block-hmp-cmds.c | 4 +- block/stream.c | 112 --- blockdev.c | 25 +++--- docs/system/deprecated.rst | 6 ++ include/block/block.h | 7 +- include/block/block_int.h | 13 +++- qapi/block-core.json | 6 ++ tests/qemu-iotests/030 | 51 ++-- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++-- 14 files changed, 324 insertions(+), 134 deletions(-) create mode 100644 block/copy-on-read.h -- 1.8.3.1
[PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
Limit COR operations by the base node in the backing chain when the overlay base node name is given. It will be useful for a block stream job when the COR-filter is applied. The overlay base node is passed as the base itself may change due to concurrent commit jobs on the same backing chain. Signed-off-by: Andrey Shinkevich --- block/copy-on-read.c | 39 +-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index c578b1b..dfbd6ad 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs, size_t qiov_offset, int flags) { -return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, - flags | BDRV_REQ_COPY_ON_READ); +int64_t n = 0; +int64_t size = offset + bytes; +int local_flags; +int ret; +BDRVStateCOR *state = bs->opaque; + +if (!state->base_overlay) { +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset, + flags | BDRV_REQ_COPY_ON_READ); +} + +while (offset < size) { +local_flags = flags; + +/* In case of failure, try to copy-on-read anyway */ +ret = bdrv_is_allocated(bs->file->bs, offset, bytes, ); +if (!ret) { +ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs), + state->base_overlay, true, offset, + n, ); +if (ret) { +local_flags |= BDRV_REQ_COPY_ON_READ; +} +} + +ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset, + local_flags); +if (ret < 0) { +return ret; +} + +offset += n; +qiov_offset += n; +bytes -= n; +} + +return 0; } -- 1.8.3.1
[PATCH v11 12/13] stream: remove unused backing-file name parameter
The 'backing-file' argument is not used by the block-stream job. It designates a backing file name to set in QCOW2 image header after the block-stream job finished. A backing file name of the node above base is used instead. Signed-off-by: Andrey Shinkevich --- block/stream.c| 6 +- blockdev.c| 21 ++--- include/block/block_int.h | 2 +- 3 files changed, 8 insertions(+), 21 deletions(-) diff --git a/block/stream.c b/block/stream.c index 51462bd..d3e1812 100644 --- a/block/stream.c +++ b/block/stream.c @@ -34,7 +34,6 @@ typedef struct StreamBlockJob { BlockDriverState *base_overlay; /* COW overlay (stream from this) */ BlockDriverState *above_base; /* Node directly above the base */ BlockdevOnError on_error; -char *backing_file_str; bool bs_read_only; bool chain_frozen; } StreamBlockJob; @@ -103,8 +102,6 @@ static void stream_clean(Job *job) blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort); bdrv_reopen_set_read_only(bs, true, NULL); } - -g_free(s->backing_file_str); } static int coroutine_fn stream_run(Job *job, Error **errp) @@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = { }; void stream_start(const char *job_id, BlockDriverState *bs, - BlockDriverState *base, const char *backing_file_str, + BlockDriverState *base, int creation_flags, int64_t speed, BlockdevOnError on_error, const char *filter_node_name, @@ -295,7 +292,6 @@ void stream_start(const char *job_id, BlockDriverState *bs, s->base_overlay = base_overlay; s->above_base = above_base; -s->backing_file_str = g_strdup(backing_file_str); s->bs_read_only = bs_read_only; s->chain_frozen = true; diff --git a/blockdev.c b/blockdev.c index d719c47..019b6e0 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2498,7 +2498,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, BlockDriverState *base_bs = NULL; AioContext *aio_context; Error *local_err = NULL; -const char *base_name = NULL; int job_flags = JOB_DEFAULT; if (!has_on_error) { @@ -2526,7 +2525,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, goto out; } assert(bdrv_get_aio_context(base_bs) == aio_context); -base_name = base; } if (has_base_node) { @@ -2541,7 +2539,11 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } assert(bdrv_get_aio_context(base_bs) == aio_context); bdrv_refresh_filename(base_bs); -base_name = base_bs->filename; +} + +if (has_backing_file) { +warn_report("Use of \"backing-file\" argument is deprecated; " +"a backing file of the node above base is used instead"); } /* Check for op blockers in the whole chain between bs and base */ @@ -2553,17 +2555,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } } -/* if we are streaming the entire chain, the result will have no backing - * file, and specifying one is therefore an error */ -if (base_bs == NULL && has_backing_file) { -error_setg(errp, "backing file specified, but streaming the " - "entire chain"); -goto out; -} - -/* backing_file string overrides base bs filename */ -base_name = has_backing_file ? backing_file : base_name; - if (has_auto_finalize && !auto_finalize) { job_flags |= JOB_MANUAL_FINALIZE; } @@ -2571,7 +2562,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, job_flags |= JOB_MANUAL_DISMISS; } -stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name, +stream_start(has_job_id ? job_id : NULL, bs, base_bs, job_flags, has_speed ? speed : 0, on_error, filter_node_name, _err); if (local_err) { diff --git a/include/block/block_int.h b/include/block/block_int.h index a142867..4f523c3 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -1151,7 +1151,7 @@ int is_windows_drive(const char *filename); * BlockDriverState. */ void stream_start(const char *job_id, BlockDriverState *bs, - BlockDriverState *base, const char *backing_file_str, + BlockDriverState *base, int creation_flags, int64_t speed, BlockdevOnError on_error, const char *filter_node_name, -- 1.8.3.1
[PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header
Avoid writing a filter JSON-name to QCOW2 image when the backing file is changed after the block stream job. Signed-off-by: Andrey Shinkevich --- block/stream.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/block/stream.c b/block/stream.c index e0540ee..51462bd 100644 --- a/block/stream.c +++ b/block/stream.c @@ -65,6 +65,7 @@ static int stream_prepare(Job *job) BlockDriverState *bs = blk_bs(bjob->blk); BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); +BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0; @@ -73,10 +74,10 @@ static int stream_prepare(Job *job) if (bdrv_cow_child(unfiltered_bs)) { const char *base_id = NULL, *base_fmt = NULL; -if (base) { -base_id = s->backing_file_str; -if (base->drv) { -base_fmt = base->drv->format_name; +if (base_unfiltered) { +base_id = base_unfiltered->filename; +if (base_unfiltered->drv) { +base_fmt = base_unfiltered->drv->format_name; } } bdrv_set_backing_hd(unfiltered_bs, base, _err); -- 1.8.3.1
[PATCH v11 13/13] block: apply COR-filter to block-stream jobs
This patch completes the series with the COR-filter insertion for block-stream operations. Adding the filter makes it possible for copied regions to be discarded in backing files during the block-stream job, what will reduce the disk overuse. The COR-filter insertion incurs changes in the iotests case 245:test_block_stream_4 that reopens the backing chain during a block-stream job. There are changes in the iotests #030 as well. The iotests case 030:test_stream_parallel was deleted due to multiple conflicts between the concurrent job operations over the same backing chain. The base backing node for one job is the top node for another job. It may change due to the filter node inserted into the backing chain while both jobs are running. Another issue is that the parts of the backing chain are being frozen by the running job and may not be changed by the concurrent job when needed. The concept of the parallel jobs with common nodes is considered vital no more. Signed-off-by: Andrey Shinkevich --- block/stream.c | 93 +- tests/qemu-iotests/030 | 51 +++-- tests/qemu-iotests/030.out | 4 +- tests/qemu-iotests/141.out | 2 +- tests/qemu-iotests/245 | 19 +++--- 5 files changed, 81 insertions(+), 88 deletions(-) diff --git a/block/stream.c b/block/stream.c index d3e1812..93564db 100644 --- a/block/stream.c +++ b/block/stream.c @@ -17,8 +17,10 @@ #include "block/blockjob_int.h" #include "qapi/error.h" #include "qapi/qmp/qerror.h" +#include "qapi/qmp/qdict.h" #include "qemu/ratelimit.h" #include "sysemu/block-backend.h" +#include "block/copy-on-read.h" enum { /* @@ -33,6 +35,8 @@ typedef struct StreamBlockJob { BlockJob common; BlockDriverState *base_overlay; /* COW overlay (stream from this) */ BlockDriverState *above_base; /* Node directly above the base */ +BlockDriverState *cor_filter_bs; +BlockDriverState *target_bs; BlockdevOnError on_error; bool bs_read_only; bool chain_frozen; @@ -43,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk, { assert(bytes < SIZE_MAX); -return blk_co_preadv(blk, offset, bytes, NULL, - BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH); +return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH); } static void stream_abort(Job *job) @@ -52,23 +55,20 @@ static void stream_abort(Job *job) StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); if (s->chain_frozen) { -BlockJob *bjob = >common; -bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base); +bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base); } } static int stream_prepare(Job *job) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); -BlockJob *bjob = >common; -BlockDriverState *bs = blk_bs(bjob->blk); -BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); +BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs); BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base); BlockDriverState *base_unfiltered = bdrv_skip_filters(base); Error *local_err = NULL; int ret = 0; -bdrv_unfreeze_backing_chain(bs, s->above_base); +bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base); s->chain_frozen = false; if (bdrv_cow_child(unfiltered_bs)) { @@ -94,13 +94,14 @@ static void stream_clean(Job *job) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); BlockJob *bjob = >common; -BlockDriverState *bs = blk_bs(bjob->blk); + +bdrv_cor_filter_drop(s->cor_filter_bs); /* Reopen the image back in read-only mode if necessary */ if (s->bs_read_only) { /* Give up write permissions before making it read-only */ blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort); -bdrv_reopen_set_read_only(bs, true, NULL); +bdrv_reopen_set_read_only(s->target_bs, true, NULL); } } @@ -108,9 +109,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp) { StreamBlockJob *s = container_of(job, StreamBlockJob, common.job); BlockBackend *blk = s->common.blk; -BlockDriverState *bs = blk_bs(blk); -BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs); -bool enable_cor = !bdrv_cow_child(s->base_overlay); +BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs); int64_t len; int64_t offset = 0; uint64_t delay_ns = 0; @@ -122,21 +121,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp) return 0; } -len = bdrv_getlength(bs); +len = bdrv_getlength(s->target_bs); if (len < 0) { return len; } job_progress_set_remaining(>common.job, len); -/* Turn on copy-on-read for the whole block device so that guest read - * requests help us make progress. Only do this when copying the entire - *
[PATCH v11 03/13] qapi: add filter-node-name to block-stream
Provide the possibility to pass the 'filter-node-name' parameter to the block-stream job as it is done for the commit block job. Signed-off-by: Andrey Shinkevich Reviewed-by: Vladimir Sementsov-Ogievskiy --- block/monitor/block-hmp-cmds.c | 4 ++-- block/stream.c | 4 +++- blockdev.c | 4 +++- include/block/block_int.h | 7 ++- qapi/block-core.json | 6 ++ 5 files changed, 20 insertions(+), 5 deletions(-) diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c index 4d3db5e..4e66775 100644 --- a/block/monitor/block-hmp-cmds.c +++ b/block/monitor/block-hmp-cmds.c @@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict) qmp_block_stream(true, device, device, base != NULL, base, false, NULL, false, NULL, qdict_haskey(qdict, "speed"), speed, true, - BLOCKDEV_ON_ERROR_REPORT, false, false, false, false, - ); + BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false, + false, ); hmp_handle_error(mon, error); } diff --git a/block/stream.c b/block/stream.c index 8ce6729..e0540ee 100644 --- a/block/stream.c +++ b/block/stream.c @@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = { void stream_start(const char *job_id, BlockDriverState *bs, BlockDriverState *base, const char *backing_file_str, int creation_flags, int64_t speed, - BlockdevOnError on_error, Error **errp) + BlockdevOnError on_error, + const char *filter_node_name, + Error **errp) { StreamBlockJob *s; BlockDriverState *iter; diff --git a/blockdev.c b/blockdev.c index bebd3ba..d719c47 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2489,6 +2489,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, bool has_backing_file, const char *backing_file, bool has_speed, int64_t speed, bool has_on_error, BlockdevOnError on_error, + bool has_filter_node_name, const char *filter_node_name, bool has_auto_finalize, bool auto_finalize, bool has_auto_dismiss, bool auto_dismiss, Error **errp) @@ -2571,7 +2572,8 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, } stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name, - job_flags, has_speed ? speed : 0, on_error, _err); + job_flags, has_speed ? speed : 0, on_error, + filter_node_name, _err); if (local_err) { error_propagate(errp, local_err); goto out; diff --git a/include/block/block_int.h b/include/block/block_int.h index 38cad9d..f782737 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -1134,6 +1134,9 @@ int is_windows_drive(const char *filename); * See @BlockJobCreateFlags * @speed: The maximum speed, in bytes per second, or 0 for unlimited. * @on_error: The action to take upon error. + * @filter_node_name: The node name that should be assigned to the filter + * driver that the commit job inserts into the graph above @bs. NULL means + * that a node name should be autogenerated. * @errp: Error object. * * Start a streaming operation on @bs. Clusters that are unallocated @@ -1146,7 +1149,9 @@ int is_windows_drive(const char *filename); void stream_start(const char *job_id, BlockDriverState *bs, BlockDriverState *base, const char *backing_file_str, int creation_flags, int64_t speed, - BlockdevOnError on_error, Error **errp); + BlockdevOnError on_error, + const char *filter_node_name, + Error **errp); /** * commit_start: diff --git a/qapi/block-core.json b/qapi/block-core.json index 3c16f1e..32fb097 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2533,6 +2533,11 @@ #'stop' and 'enospc' can only be used if the block device #supports io-status (see BlockInfo). Since 1.3. # +# @filter-node-name: the node name that should be assigned to the +#filter driver that the stream job inserts into the graph +#above @device. If this option is not given, a node name is +#autogenerated. (Since: 5.2) +# # @auto-finalize: When false, this job will wait in a PENDING state after it has # finished its work, waiting for @block-job-finalize before # making any block graph changes. @@ -2563,6 +2568,7 @@ 'data': { '*job-id': 'str', 'device': 'str', '*base': 'str', '*base-node': 'str', '*backing-file': 'str', '*speed': 'int', '*on-error': 'BlockdevOnError', +
Re: Remove rpath from RPMs
Signed-off-by Daniel Letai d...@letai.org.il On 13/10/2020 13:14:49, Daniel P. Berrangé wrote: On Tue, Oct 13, 2020 at 11:03:28AM +0100, Daniel P. Berrangé wrote: On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote: Hi, Attached see simple patch to remove rpath during meson build, so rpmbuild on redhat/fedora based distros don't complain. Seeing as all rpaths are in standard locations, this seems safe to me. This was supposed to have been fixed by commit 69980ab798e240923ef12f86a92665b6c9ff7292 Author: Andrea Bolognani Date: Wed Aug 19 11:15:35 2020 +0200 however that overlooked that %meson macro adds --auto-features=enabled, which caused the rpath to be force enabled in the RPM build. >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001 From: Daniel Letai Date: Sat, 10 Oct 2020 18:02:13 +0300 Subject: [PATCH] Remove rpath from rpms --- libvirt.spec.in | 1 + 1 file changed, 1 insertion(+) diff --git a/libvirt.spec.in b/libvirt.spec.in index edf919d7ba..7e356bb843 100644 --- a/libvirt.spec.in +++ b/libvirt.spec.in @@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' %{_specdir}/%{name}.spec) -Dinit_script=systemd \ -Ddocs=enabled \ -Dtests=enabled \ + -Drpath=disabled \ %{?arg_login_shell} so yeah, we need this. Reviewed-by: Daniel P. Berrangé BTW, we require contributions to be signed-off, to assert that the author is in compliance with the Developer Certificate of Origin https://developercertificate.org/ Assuming you're fine with this, just reply to this mail with a Signed-Off-By line that has your name+email addr Regards, Daniel -- Regards, Daniel Letai +972 (0)505 870 456
Re: Remove rpath from RPMs
On Tue, Oct 13, 2020 at 11:03:28AM +0100, Daniel P. Berrangé wrote: > On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote: > >Hi, > > > >Attached see simple patch to remove rpath during meson build, so rpmbuild > >on redhat/fedora based distros don't complain. > > > >Seeing as all rpaths are in standard locations, this seems safe to me. > > This was supposed to have been fixed by > > commit 69980ab798e240923ef12f86a92665b6c9ff7292 > Author: Andrea Bolognani > Date: Wed Aug 19 11:15:35 2020 +0200 > > however that overlooked that %meson macro adds --auto-features=enabled, > which caused the rpath to be force enabled in the RPM build. > > > >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001 > > From: Daniel Letai > > Date: Sat, 10 Oct 2020 18:02:13 +0300 > > Subject: [PATCH] Remove rpath from rpms > > > > --- > > libvirt.spec.in | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/libvirt.spec.in b/libvirt.spec.in > > index edf919d7ba..7e356bb843 100644 > > --- a/libvirt.spec.in > > +++ b/libvirt.spec.in > > @@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' > > %{_specdir}/%{name}.spec) > > -Dinit_script=systemd \ > > -Ddocs=enabled \ > > -Dtests=enabled \ > > + -Drpath=disabled \ > > %{?arg_login_shell} > > so yeah, we need this. > > Reviewed-by: Daniel P. Berrangé BTW, we require contributions to be signed-off, to assert that the author is in compliance with the Developer Certificate of Origin https://developercertificate.org/ Assuming you're fine with this, just reply to this mail with a Signed-Off-By line that has your name+email addr Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: Remove rpath from RPMs
On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote: >Hi, > >Attached see simple patch to remove rpath during meson build, so rpmbuild >on redhat/fedora based distros don't complain. > >Seeing as all rpaths are in standard locations, this seems safe to me. This was supposed to have been fixed by commit 69980ab798e240923ef12f86a92665b6c9ff7292 Author: Andrea Bolognani Date: Wed Aug 19 11:15:35 2020 +0200 however that overlooked that %meson macro adds --auto-features=enabled, which caused the rpath to be force enabled in the RPM build. > >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001 > From: Daniel Letai > Date: Sat, 10 Oct 2020 18:02:13 +0300 > Subject: [PATCH] Remove rpath from rpms > > --- > libvirt.spec.in | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libvirt.spec.in b/libvirt.spec.in > index edf919d7ba..7e356bb843 100644 > --- a/libvirt.spec.in > +++ b/libvirt.spec.in > @@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' > %{_specdir}/%{name}.spec) > -Dinit_script=systemd \ > -Ddocs=enabled \ > -Dtests=enabled \ > +-Drpath=disabled \ > %{?arg_login_shell} so yeah, we need this. Reviewed-by: Daniel P. Berrangé Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH] logging: allow max_len=0 to disable log rollover
On Tue, Oct 13, 2020 at 10:21:58 +0100, Daniel Berrange wrote: > Currently setting max_len=0 causes virtlogd to spin in a busy loop. It > is natural to allow this to disable log rollover which can be useful for > developers debugging things. > > Note disabling rollover exposes the host to denial of service from a > malicious guest, so must be used with care. > > Closes https://gitlab.com/libvirt/libvirt/-/issues/85 > Signed-off-by: Daniel P. Berrangé > --- > src/logging/virtlogd.conf | 4 > src/util/virrotatingfile.c | 48 +- > 2 files changed, 30 insertions(+), 22 deletions(-) Reviewed-by: Peter Krempa
Re: [libvirt PATCH 0/2] fixes allocation issues leading to crashes
On a Monday in 2020, Pavel Hrdina wrote: Our libvirt-dbus tests discovered allocation issues introduced by recent rewrite to use g_new0 instead of VIR_ALLOC_N by allocating array one element shorter in some places which can lead to crashes of client applications counting on a fact that the returned arrays are NULL terminated. Oops, I did not notice that my vim macro [0] dropped these (due to looking for the ')' from the left side). There seem to be no other cases like this in my changes where the expression contains a parenthesis. Thanks for fixing this. Jano Pavel Hrdina (2): conf: fix g_new0 allocation conf: virsecretobj: fix g_new0 allocation src/conf/virinterfaceobj.c | 2 +- src/conf/virnetworkobj.c| 4 ++-- src/conf/virnodedeviceobj.c | 2 +- src/conf/virsecretobj.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) [0] I was not able to replace everything with Coccinelle signature.asc Description: PGP signature
[libvirt PATCH] logging: allow max_len=0 to disable log rollover
Currently setting max_len=0 causes virtlogd to spin in a busy loop. It is natural to allow this to disable log rollover which can be useful for developers debugging things. Note disabling rollover exposes the host to denial of service from a malicious guest, so must be used with care. Closes https://gitlab.com/libvirt/libvirt/-/issues/85 Signed-off-by: Daniel P. Berrangé --- src/logging/virtlogd.conf | 4 src/util/virrotatingfile.c | 48 +- 2 files changed, 30 insertions(+), 22 deletions(-) diff --git a/src/logging/virtlogd.conf b/src/logging/virtlogd.conf index 8b1ff0156f..c53a1112bd 100644 --- a/src/logging/virtlogd.conf +++ b/src/logging/virtlogd.conf @@ -87,6 +87,10 @@ # Maximum file size before rolling over. Defaults to 2 MB # +# Setting max_size to zero will disable rollover entirely. +# NOTE: disabling rollover exposes the host filesystem to +# denial of service from a malicious guest. +# # Beware that a logrotate config file might be installed too, # to handle cases where virtlogd is disabled. To ensure that # the logrotate config is a no-op when virtlogd is running, diff --git a/src/util/virrotatingfile.c b/src/util/virrotatingfile.c index a88c332cf4..9f1ef17c3e 100644 --- a/src/util/virrotatingfile.c +++ b/src/util/virrotatingfile.c @@ -225,7 +225,8 @@ virRotatingFileWriterDelete(virRotatingFileWriterPtr file) * * The files will never exceed @maxlen bytes in size, * but may be rolled over before they reach this size - * in order to avoid splitting lines + * in order to avoid splitting lines. If @maxlen is + * zero then no rollover will be performed. */ virRotatingFileWriterPtr virRotatingFileWriterNew(const char *path, @@ -430,25 +431,27 @@ virRotatingFileWriterAppend(virRotatingFileWriterPtr file, size_t towrite = len; bool forceRollover = false; -if (file->entry->pos > file->maxlen) { -/* If existing file is for some reason larger then max length we - * won't write to this file anymore, but we rollover this file.*/ -forceRollover = true; -towrite = 0; -} else if ((file->entry->pos + towrite) > file->maxlen) { -towrite = file->maxlen - file->entry->pos; - -/* - * If there's a newline in the last 80 chars - * we're about to write, then break at that - * point to avoid splitting lines across - * separate files - */ -for (i = 0; i < towrite && i < 80; i++) { -if (buf[towrite - i - 1] == '\n') { -towrite -= i; -forceRollover = true; -break; +if (file->maxlen != 0) { +if (file->entry->pos > file->maxlen) { +/* If existing file is for some reason larger then max length we + * won't write to this file anymore, but we rollover this file.*/ +forceRollover = true; +towrite = 0; +} else if ((file->entry->pos + towrite) > file->maxlen) { +towrite = file->maxlen - file->entry->pos; + +/* + * If there's a newline in the last 80 chars + * we're about to write, then break at that + * point to avoid splitting lines across + * separate files + */ +for (i = 0; i < towrite && i < 80; i++) { +if (buf[towrite - i - 1] == '\n') { +towrite -= i; +forceRollover = true; +break; +} } } } @@ -468,8 +471,9 @@ virRotatingFileWriterAppend(virRotatingFileWriterPtr file, file->entry->len += towrite; } -if ((file->entry->pos == file->maxlen && len) || -forceRollover) { +if (file->maxlen != 0 && +((file->entry->pos == file->maxlen && len) || + forceRollover)) { virRotatingFileWriterEntryPtr tmp; VIR_DEBUG("Hit max size %zu on %s (force=%d)", file->maxlen, file->basepath, forceRollover); -- 2.26.2
[PATCH v2 6/7] hyperv: fix domainManagedSave on Hyper-V V2
Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c | 8 ++-- src/hyperv/hyperv_wmi_classes.h | 1 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index aeee5b2b9f..fb46f42631 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -1644,6 +1644,10 @@ hypervDomainManagedSave(virDomainPtr domain, unsigned int flags) hypervPrivate *priv = domain->conn->privateData; Msvm_ComputerSystem *computerSystem = NULL; bool in_transition = false; +int requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED; + +if (priv->wmiVersion == HYPERV_WMI_VERSION_V2) +requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED_BUT_OFFLINE; virCheckFlags(0, -1); @@ -1657,8 +1661,8 @@ hypervDomainManagedSave(virDomainPtr domain, unsigned int flags) goto cleanup; } -result = hypervInvokeMsvmComputerSystemRequestStateChange - (domain, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED); +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, + requestedState); cleanup: hypervFreeObject(priv, (hypervObject *)computerSystem); diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h index 0074d8889e..a5213901c8 100644 --- a/src/hyperv/hyperv_wmi_classes.h +++ b/src/hyperv/hyperv_wmi_classes.h @@ -73,6 +73,7 @@ enum _Msvm_ComputerSystem_EnabledState { enum _Msvm_ComputerSystem_RequestedState { MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3, +MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED_BUT_OFFLINE = 6, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE = 9, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11, -- 2.27.0
[PATCH v2 7/7] news: more Hyper-V APIs
Signed-off-by: Matt Coleman --- NEWS.rst | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/NEWS.rst b/NEWS.rst index bc35458f38..d0454b7840 100644 --- a/NEWS.rst +++ b/NEWS.rst @@ -22,8 +22,11 @@ v6.9.0 (unreleased) * hyperv: implement new APIs The ``virConnectGetCapabilities()``, ``virConnectGetMaxVcpus()``, -``virConnectGetVersion()``, and ``virDomainGetAutostart()`` APIs have been -implemented in the Hyper-V driver. +``virConnectGetVersion()``, ``virDomainGetAutostart()``, +``virDomainSetAutostart()``, ``virNodeGetFreeMemory()``, +``virDomainReboot()``, ``virDomainReset()``, ``virDomainShutdown()``, and +``virDomainShutdownFlags()`` APIs have been implemented in the Hyper-V +driver. * bhyve: implement virtio-9p filesystem support -- 2.27.0
[PATCH v2 4/7] hyperv: implement domainShutdown and domainShutdownFlags
Co-authored-by: Sri Ramanujam Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c| 77 ++ src/hyperv/hyperv_wmi_generator.input | 78 +++ 2 files changed, 155 insertions(+) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index 7182340f74..024ba7c6f4 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -917,6 +917,81 @@ hypervDomainResume(virDomainPtr domain) +static int +hypervDomainShutdownFlags(virDomainPtr domain, unsigned int flags) +{ +int result = -1; +hypervPrivate *priv = domain->conn->privateData; +Msvm_ComputerSystem *computerSystem = NULL; +Msvm_ShutdownComponent *shutdown = NULL; +bool in_transition = false; +char uuid[VIR_UUID_STRING_BUFLEN]; +g_auto(virBuffer) query = VIR_BUFFER_INITIALIZER; +hypervInvokeParamsListPtr params = NULL; +g_autofree char *selector = NULL; + +virCheckFlags(0, -1); + +virUUIDFormat(domain->uuid, uuid); + +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) +goto cleanup; + +if (!hypervIsMsvmComputerSystemActive(computerSystem, _transition) || +in_transition) { +virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Domain is not active or in state transition")); +goto cleanup; +} + +virBufferEscapeSQL(, + MSVM_SHUTDOWNCOMPONENT_WQL_SELECT + "WHERE SystemName = '%s'", uuid); + +if (hypervGetWmiClass(Msvm_ShutdownComponent, ) < 0 || +!shutdown) { +virReportError(VIR_ERR_OPERATION_FAILED, + _("Could not get Msvm_ShutdownComponent for domain with UUID '%s'"), + uuid); +goto cleanup; +} + +selector = g_strdup_printf( +"CreationClassName=\"Msvm_ShutdownComponent\"=\"%s\"&" + "SystemCreationClassName=\"Msvm_ComputerSystem\"=\"%s\"", +shutdown->data.common->DeviceID, uuid); + +params = hypervCreateInvokeParamsList(priv, "InitiateShutdown", selector, + Msvm_ShutdownComponent_WmiInfo); + +hypervAddSimpleParam(params, "Force", "False"); +hypervAddSimpleParam(params, "Reason", _("Planned shutdown via libvirt")); + +if (hypervInvokeMethod(priv, params, NULL) < 0) { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not shutdown domain with UUID '%s'"), uuid); +goto cleanup; +} + +result = 0; + + cleanup: +hypervFreeObject(priv, (hypervObject *) computerSystem); +hypervFreeObject(priv, (hypervObject *) shutdown); + +return result; +} + + + +static int +hypervDomainShutdown(virDomainPtr domain) +{ +return hypervDomainShutdownFlags(domain, 0); +} + + + static int hypervDomainReboot(virDomainPtr domain, unsigned int flags) { @@ -2013,6 +2088,8 @@ static virHypervisorDriver hypervHypervisorDriver = { .domainLookupByName = hypervDomainLookupByName, /* 0.9.5 */ .domainSuspend = hypervDomainSuspend, /* 0.9.5 */ .domainResume = hypervDomainResume, /* 0.9.5 */ +.domainShutdown = hypervDomainShutdown, /* 6.9.0 */ +.domainShutdownFlags = hypervDomainShutdownFlags, /* 6.9.0 */ .domainReboot = hypervDomainReboot, /* 6.9.0 */ .domainReset = hypervDomainReset, /* 6.9.0 */ .domainDestroy = hypervDomainDestroy, /* 0.9.5 */ diff --git a/src/hyperv/hyperv_wmi_generator.input b/src/hyperv/hyperv_wmi_generator.input index bbca550790..1377138a12 100644 --- a/src/hyperv/hyperv_wmi_generator.input +++ b/src/hyperv/hyperv_wmi_generator.input @@ -1072,3 +1072,81 @@ class v2/Msvm_Keyboard uint16 Password boolean UnicodeSupported end + + +class Msvm_ShutdownComponent +string Caption +string Description +string ElementName +datetime InstallDate +string Name +uint16 OperationalStatus[] +string StatusDescriptions[] +string Status +uint16 HealthState +uint16 EnabledState +string OtherEnabledState +uint16 RequestedState +uint16 EnabledDefault +datetime TimeOfLastStateChange +string SystemCreationClassName +string SystemName +string CreationClassName +string DeviceID +boolean PowerManagementSupported +uint16 PowerManagementCapabilities[] +uint16 Availability +uint16 StatusInfo +uint32 LastErrorCode +string ErrorDescription +boolean ErrorCleared +string OtherIdentifyingInfo[] +uint64 PowerOnHours +uint64 TotalPowerOnHours +string IdentifyingDescriptions[] +uint16 AdditionalAvailability[] +uint64 MaxQuiesceTime +uint16 LocationIndicator +end + + +class v2/Msvm_ShutdownComponent +string InstanceID +string Caption +string Description +string ElementName +datetime InstallDate +string Name +uint16 OperationalStatus[] +string
[PATCH v2 5/7] hyperv: fix domainSuspend and domainResume on Hyper-V V2
Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c | 15 +++ src/hyperv/hyperv_wmi_classes.h | 1 + 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index 024ba7c6f4..aeee5b2b9f 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -867,6 +867,10 @@ hypervDomainSuspend(virDomainPtr domain) int result = -1; hypervPrivate *priv = domain->conn->privateData; Msvm_ComputerSystem *computerSystem = NULL; +int requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED; + +if (priv->wmiVersion == HYPERV_WMI_VERSION_V2) +requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE; if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) goto cleanup; @@ -878,8 +882,8 @@ hypervDomainSuspend(virDomainPtr domain) goto cleanup; } -result = hypervInvokeMsvmComputerSystemRequestStateChange - (domain, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED); +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, + requestedState); cleanup: hypervFreeObject(priv, (hypervObject *)computerSystem); @@ -895,12 +899,15 @@ hypervDomainResume(virDomainPtr domain) int result = -1; hypervPrivate *priv = domain->conn->privateData; Msvm_ComputerSystem *computerSystem = NULL; +int expectedState = MSVM_COMPUTERSYSTEM_ENABLEDSTATE_PAUSED; + +if (priv->wmiVersion == HYPERV_WMI_VERSION_V2) +expectedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE; if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) goto cleanup; -if (computerSystem->data.common->EnabledState != -MSVM_COMPUTERSYSTEM_ENABLEDSTATE_PAUSED) { +if (computerSystem->data.common->EnabledState != expectedState) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("Domain is not paused")); goto cleanup; diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h index 7f4159dd8e..0074d8889e 100644 --- a/src/hyperv/hyperv_wmi_classes.h +++ b/src/hyperv/hyperv_wmi_classes.h @@ -73,6 +73,7 @@ enum _Msvm_ComputerSystem_EnabledState { enum _Msvm_ComputerSystem_RequestedState { MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3, +MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE = 9, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED = 32768, -- 2.27.0
[PATCH v2 2/7] hyperv: implement nodeGetFreeMemory
Co-authored-by: Sri Ramanujam Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c | 28 1 file changed, 28 insertions(+) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index 79b48a9dff..c4fca4685e 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -1400,6 +1400,33 @@ hypervDomainSetAutostart(virDomainPtr domain, int autostart) +static unsigned long long +hypervNodeGetFreeMemory(virConnectPtr conn) +{ +unsigned long long res = 0; +hypervPrivate *priv = conn->privateData; +Win32_OperatingSystem *operatingSystem = NULL; +g_auto(virBuffer) query = { g_string_new(WIN32_OPERATINGSYSTEM_WQL_SELECT), 0 }; + +if (hypervGetWmiClass(Win32_OperatingSystem, ) < 0 || +!operatingSystem) { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Could not get free memory for host %s"), + conn->uri->server); +goto cleanup; +} + +/* Return free memory in bytes */ +res = operatingSystem->data.common->FreePhysicalMemory * 1024; + +cleanup: +hypervFreeObject(priv, (hypervObject *) operatingSystem); + +return res; +} + + + static int hypervConnectIsEncrypted(virConnectPtr conn) { @@ -1952,6 +1979,7 @@ static virHypervisorDriver hypervHypervisorDriver = { .domainCreateWithFlags = hypervDomainCreateWithFlags, /* 0.9.5 */ .domainGetAutostart = hypervDomainGetAutostart, /* 6.9.0 */ .domainSetAutostart = hypervDomainSetAutostart, /* 6.9.0 */ +.nodeGetFreeMemory = hypervNodeGetFreeMemory, /* 6.9.0 */ .connectIsEncrypted = hypervConnectIsEncrypted, /* 0.9.5 */ .connectIsSecure = hypervConnectIsSecure, /* 0.9.5 */ .domainIsActive = hypervDomainIsActive, /* 0.9.5 */ -- 2.27.0
[PATCH v2 3/7] hyperv: implement domainReboot and domainReset
Co-authored-by: Sri Ramanujam Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c | 48 + src/hyperv/hyperv_wmi_classes.h | 1 + 2 files changed, 49 insertions(+) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index c4fca4685e..7182340f74 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -917,6 +917,52 @@ hypervDomainResume(virDomainPtr domain) +static int +hypervDomainReboot(virDomainPtr domain, unsigned int flags) +{ +int result = -1; +hypervPrivate *priv = domain->conn->privateData; +Msvm_ComputerSystem *computerSystem = NULL; + +virCheckFlags(0, -1); + +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) +goto cleanup; + +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, +MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT); + + cleanup: +hypervFreeObject(priv, (hypervObject *)computerSystem); + +return result; +} + + + +static int +hypervDomainReset(virDomainPtr domain, unsigned int flags) +{ +int result = -1; +hypervPrivate *priv = domain->conn->privateData; +Msvm_ComputerSystem *computerSystem = NULL; + +virCheckFlags(0, -1); + +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) +goto cleanup; + +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, +MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET); + + cleanup: +hypervFreeObject(priv, (hypervObject *)computerSystem); + +return result; +} + + + static int hypervDomainDestroyFlags(virDomainPtr domain, unsigned int flags) { @@ -1967,6 +2013,8 @@ static virHypervisorDriver hypervHypervisorDriver = { .domainLookupByName = hypervDomainLookupByName, /* 0.9.5 */ .domainSuspend = hypervDomainSuspend, /* 0.9.5 */ .domainResume = hypervDomainResume, /* 0.9.5 */ +.domainReboot = hypervDomainReboot, /* 6.9.0 */ +.domainReset = hypervDomainReset, /* 6.9.0 */ .domainDestroy = hypervDomainDestroy, /* 0.9.5 */ .domainDestroyFlags = hypervDomainDestroyFlags, /* 0.9.5 */ .domainGetOSType = hypervDomainGetOSType, /* 0.9.5 */ diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h index d32711589a..7f4159dd8e 100644 --- a/src/hyperv/hyperv_wmi_classes.h +++ b/src/hyperv/hyperv_wmi_classes.h @@ -74,6 +74,7 @@ enum _Msvm_ComputerSystem_RequestedState { MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10, +MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED = 32768, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED = 32769, }; -- 2.27.0
[PATCH v2 0/7] more Hyper-V APIs
This set of patches adds several new APIs and fixes several others. Changes since v1: * 'enabledValue' and 'disabledValue' are now static strings in hypervDomainSetAutostart() * a long virReportError() call has been split into two lines Matt Coleman (7): hyperv: implement domainSetAutostart hyperv: implement nodeGetFreeMemory hyperv: implement domainReboot and domainReset hyperv: implement domainShutdown and domainShutdownFlags hyperv: fix domainSuspend and domainResume on Hyper-V V2 hyperv: fix domainManagedSave on Hyper-V V2 news: more Hyper-V APIs NEWS.rst | 7 +- src/hyperv/hyperv_driver.c| 267 +- src/hyperv/hyperv_wmi_classes.h | 3 + src/hyperv/hyperv_wmi_generator.input | 78 4 files changed, 347 insertions(+), 8 deletions(-) -- 2.27.0
[PATCH v2 1/7] hyperv: implement domainSetAutostart
Co-authored-by: Sri Ramanujam Signed-off-by: Matt Coleman --- src/hyperv/hyperv_driver.c | 91 ++ 1 file changed, 91 insertions(+) diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c index 2ac30fa4c6..79b48a9dff 100644 --- a/src/hyperv/hyperv_driver.c +++ b/src/hyperv/hyperv_driver.c @@ -1310,6 +1310,96 @@ hypervDomainGetAutostart(virDomainPtr domain, int *autostart) +static int +hypervDomainSetAutostart(virDomainPtr domain, int autostart) +{ +int result = -1; +char uuid_string[VIR_UUID_STRING_BUFLEN]; +hypervPrivate *priv = domain->conn->privateData; +Msvm_VirtualSystemSettingData *vssd = NULL; +hypervInvokeParamsListPtr params = NULL; +g_auto(virBuffer) eprQuery = VIR_BUFFER_INITIALIZER; +virHashTablePtr autostartParam = NULL; +hypervWmiClassInfoListPtr embeddedParamClass = NULL; +const char *methodName = NULL, *embeddedParamName = NULL; +char enabledValue[] = "2", disabledValue[] = "0"; + +if (priv->wmiVersion == HYPERV_WMI_VERSION_V1) { +methodName = "ModifyVirtualSystem"; +embeddedParamName = "SystemSettingData"; +embeddedParamClass = Msvm_VirtualSystemGlobalSettingData_WmiInfo; +} else if (priv->wmiVersion == HYPERV_WMI_VERSION_V2) { +methodName = "ModifySystemSettings"; +embeddedParamName = "SystemSettings"; +embeddedParamClass = Msvm_VirtualSystemSettingData_WmiInfo; +enabledValue[0] = '4'; +disabledValue[0] = '2'; +} + +virUUIDFormat(domain->uuid, uuid_string); + +if (hypervGetVSSDFromUUID(priv, uuid_string, ) < 0) +goto cleanup; + +params = hypervCreateInvokeParamsList(priv, methodName, +MSVM_VIRTUALSYSTEMMANAGEMENTSERVICE_SELECTOR, +Msvm_VirtualSystemManagementService_WmiInfo); + +if (!params) { +virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not create params")); +goto cleanup; +} + +if (priv->wmiVersion == HYPERV_WMI_VERSION_V1) { +virBufferEscapeSQL(, + MSVM_COMPUTERSYSTEM_WQL_SELECT "WHERE Name = '%s'", + uuid_string); + +if (hypervAddEprParam(params, "ComputerSystem", priv, , + Msvm_ComputerSystem_WmiInfo) < 0) +goto params_cleanup; +} + +autostartParam = hypervCreateEmbeddedParam(priv, embeddedParamClass); + +if (hypervSetEmbeddedProperty(autostartParam, "AutomaticStartupAction", + autostart ? enabledValue : disabledValue) < 0) { +hypervFreeEmbeddedParam(autostartParam); +goto params_cleanup; +} + +if (hypervSetEmbeddedProperty(autostartParam, "InstanceID", + vssd->data.common->InstanceID) < 0) { +hypervFreeEmbeddedParam(autostartParam); +goto params_cleanup; +} + +if (hypervAddEmbeddedParam(params, priv, embeddedParamName, autostartParam, + embeddedParamClass) < 0) { +hypervFreeEmbeddedParam(autostartParam); +goto params_cleanup; +} + +if (hypervInvokeMethod(priv, params, NULL) < 0) { +virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not set autostart")); +goto cleanup; +} + +result = 0; +goto cleanup; + + params_cleanup: +hypervFreeInvokeParams(params); + cleanup: +hypervFreeObject(priv, (hypervObject *) vssd); + +return result; +} + + + static int hypervConnectIsEncrypted(virConnectPtr conn) { @@ -1861,6 +1951,7 @@ static virHypervisorDriver hypervHypervisorDriver = { .domainCreate = hypervDomainCreate, /* 0.9.5 */ .domainCreateWithFlags = hypervDomainCreateWithFlags, /* 0.9.5 */ .domainGetAutostart = hypervDomainGetAutostart, /* 6.9.0 */ +.domainSetAutostart = hypervDomainSetAutostart, /* 6.9.0 */ .connectIsEncrypted = hypervConnectIsEncrypted, /* 0.9.5 */ .connectIsSecure = hypervConnectIsSecure, /* 0.9.5 */ .domainIsActive = hypervDomainIsActive, /* 0.9.5 */ -- 2.27.0
Re: Overlay limit bug
Adding libvir-list to cc. If you come across stuff that involves both libvirt and qemu it's best to crosspost it to libvir-list too as I've noticed this only accidentally. On Mon, Oct 12, 2020 at 15:03:10 -0400, Yoonho Park wrote: > I stumbled on a bug in qemu 4.2.0 (virsh 6.0.0) with a large number of > overlays. I am using "qemu-img create" and "virsh snapshot-create-as" to > create each overlay. When I run "virsh snapshot-create-as" for the 42nd > overlay, I get "error: No complete monitor response found in 10485760 This error comes from libvirt's DoS protection. The response from qemu is too large. This happens when 'query-named-block-nodes' is called on too-deep backing chains as the reply contains both nested and linear entries. Thus for a N deep backing chain, you get N + N-1 + N-2 ... entries in the returned data. Libvirt stops reading after 1MiB of json to prevent buffer overflows from a rogue qemu. > bytes: Numerical result out of range". However, I pulled down qemu 5.1.50 > (still using virsh 6.0.0), and it looks like the problem has disappeared > which is great. Does anyone know the patch set that addressed this bug? qemu patch: commit facda5443f5a8676fb635b82ac1046ac6b6a67ce Author: Peter Krempa Date: Mon Jan 20 09:50:49 2020 +0100 qapi: Allow getting flat output from 'query-named-block-nodes' When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. v4.2.0-1584-gfacda5443f libvirt patches: commit 95080cc8b470c977d53043d4eff3e30781f472eb Author: Peter Krempa Date: Tue Jan 21 16:51:40 2020 +0100 qemu: Don't request nested entries in qemuBlockGetNamedNodeData Use the 'flat' flag for 'query-named-block-nodes' if qemu supports QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT in qemuBlockGetNamedNodeData. We don't need the data so plumb in whether qemu supports the 'flat' output. Signed-off-by: Peter Krempa Reviewed-by: Ján Tomko commit 855211bbf3ed45a73159f45eba1b15f05d771b76 Author: Peter Krempa Date: Tue Jan 21 16:42:49 2020 +0100 qemu: monitor: Add 'flat' parameter for qemuMonitorJSONQueryNamedBlockNodes Modern qemu allows to skip the nested redundant data in the output of query-named-block-nodes. Plumb in the support for the argument that enables it. Signed-off-by: Peter Krempa Reviewed-by: Ján Tomko commit 63610bd5fbe67ba9eb4f22f67c4bdab6eda8c041 Author: Peter Krempa Date: Wed Feb 26 12:50:53 2020 +0100 qemuCheckpointDiscardBitmaps: Use qemuBlockGetNamedNodeData Replace qemuMonitorBlockGetNamedNodeData by qemuBlockGetNamedNodeData. Signed-off-by: Peter Krempa Reviewed-by: Ján Tomko commit f886c9f33051fc03dd6c78134c92d31e7caf0c81 Author: Peter Krempa Date: Tue Jan 21 16:33:12 2020 +0100 qemu: monitor: Refactor variable cleanup in qemuMonitorJSONQueryNamedBlockNodes Use g_autoptr to get rid of the cleanup section. Signed-off-by: Peter Krempa Reviewed-by: Ján Tomko commit b7991c903cdd5b3c8b79a206584a4db81a6d4eaa Author: Peter Krempa Date: Tue Jan 21 15:13:47 2020 +0100 qemu: capabilities: Add capability for the 'flat' argument of 'query-named-block-nodes' Detect the presence of the flag and make it available internally as QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT. Signed-off-by: Peter Krempa Reviewed-by: Ján Tomko v6.1.0-29-g95080cc8b4 > Also, does anyone know the "official" limit on the number of overlays that > can be created and is there a qemu test that exercises this? I could not Libvirt's official limit is 200 layers. This is the documentation for the validation function explaining the rationale: /** * qemuDomainStorageSourceValidateDepth: * @src: storage source chain to validate * @add: offsets the calculated number of images * @diskdst: optional disk target to use in error message * * The XML parser limits the maximum element nesting to 256 layers. As libvirt * reports the chain into the status and in some cases the config XML we must * validate that any user-provided chains will not exceed the XML nesting limit * when formatted to the XML. * * This function validates that the storage source chain starting @src is at * most 200 layers deep. @add modifies the calculated value to offset the number * to allow checking cases when new layers are going to be added to the chain. * * Returns 0 on success and -1 if the chain is too deep. Error is reported. */ int qemuDomainStorageSourceValidateDepth(virStorageSourcePtr src, int add, const char *diskdst) > find an overlay limit test in tests/qemu-iotests. I'd guess qemu doesn't really limit that, but at some point you'll get too much of a performance penalty form traversing all the
Re: [RFC] qemu: virtiofs can be used without NUMA nodes
On Tue, Oct 06, 2020 at 06:20 PM +0200, Marc Hartmayer wrote: > ...if a machine memory-backend using shared memory is configured for > the guest. This is especially important for QEMU machine types that > don't have NUMA but virtiofs support. > > An example snippet: > > > test > 2097152 > > > > > > > > > > ... > > ... > > > and the corresponding QEMU command line: > > /usr/bin/qemu-system-s390x \ > -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \ > -m 2048 \ > -object > > memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648 > \ > ... > > Signed-off-by: Marc Hartmayer > --- > Note: There are still some TODOs left... e.g. adapt the virtiofs > documentation of libvirt. > --- > src/qemu/qemu_validate.c | 15 +++ > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c > index a212605579d2..077a85b30802 100644 > --- a/src/qemu/qemu_validate.c > +++ b/src/qemu/qemu_validate.c > @@ -3470,14 +3470,21 @@ qemuValidateDomainDeviceDefGraphics(const > virDomainGraphicsDef *graphics, > > > static int > -qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def) > +qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def, > + virQEMUCapsPtr qemuCaps) > { > +const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps, > + > def->virtType, > + > def->os.machine); > size_t numa_nodes = virDomainNumaGetNodeCount(def->numa); > size_t i; > > -if (numa_nodes == 0) { > +if (numa_nodes == 0 && > +!(defaultRAMId && def->mem.access == > VIR_DOMAIN_MEMORY_ACCESS_SHARED)) { > +/* TODO do we need further checks here (e.g. check whether > + * memory backend is supported by the QEMU binary)? */ > virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > - _("virtiofs requires one or more NUMA nodes")); > + _("virtiofs requires shared memory")); > return -1; > } > > @@ -3591,7 +3598,7 @@ qemuValidateDomainDeviceDefFS(virDomainFSDefPtr fs, > _("virtiofs does not support multidevs")); > return -1; > } > -if (qemuValidateDomainDefVirtioFSSharedMemory(def) < 0) > +if (qemuValidateDomainDefVirtioFSSharedMemory(def, qemuCaps) < 0) > return -1; > break; > > -- > 2.25.4 > Gentle ping :) -- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Gregor Pillen Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH v2 4/7] hyperv: implement domainShutdown and domainShutdownFlags
On Tuesday, 13 October 2020 07:14:01 CEST Matt Coleman wrote: > +hypervAddSimpleParam(params, "Force", "False"); > +hypervAddSimpleParam(params, "Reason", _("Planned shutdown via > libvirt")); Is this message logged in the Hyper-V logs? If so, then I'd not translate it: imagine an user running libvirt on an e.g. a French locale, sending this query to an Hyper-V on e.g. German. While English is not that different in this situation (still potentially not the Hyper-V language), at least IMHO it gives something understandable. Of course, in case the message is changed to an untranslated text, then please leave a comment there, so noone accidentally makes it translatable. -- Pino Toscano signature.asc Description: This is a digitally signed message part.
Re: [PATCH v2 3/7] hyperv: implement domainReboot and domainReset
On Tuesday, 13 October 2020 07:14:00 CEST Matt Coleman wrote: > Co-authored-by: Sri Ramanujam > Signed-off-by: Matt Coleman > --- > src/hyperv/hyperv_driver.c | 48 + > src/hyperv/hyperv_wmi_classes.h | 1 + > 2 files changed, 49 insertions(+) > > diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c > index c4fca4685e..7182340f74 100644 > --- a/src/hyperv/hyperv_driver.c > +++ b/src/hyperv/hyperv_driver.c > @@ -917,6 +917,52 @@ hypervDomainResume(virDomainPtr domain) > > > > +static int > +hypervDomainReboot(virDomainPtr domain, unsigned int flags) > +{ > +int result = -1; > +hypervPrivate *priv = domain->conn->privateData; > +Msvm_ComputerSystem *computerSystem = NULL; > + > +virCheckFlags(0, -1); > + > +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) > +goto cleanup; > + > +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, > + > MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT); > + > + cleanup: > +hypervFreeObject(priv, (hypervObject *)computerSystem); > + > +return result; > +} > + > + > + > +static int > +hypervDomainReset(virDomainPtr domain, unsigned int flags) > +{ > +int result = -1; > +hypervPrivate *priv = domain->conn->privateData; > +Msvm_ComputerSystem *computerSystem = NULL; > + > +virCheckFlags(0, -1); > + > +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0) > +goto cleanup; > + > +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain, > + > MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET); > + > + cleanup: > +hypervFreeObject(priv, (hypervObject *)computerSystem); > + > +return result; > +} What about making a common helper function that calls hypervMsvmComputerSystemFromDomain + hypervInvokeMsvmComputerSystemRequestStateChange? Note that virDomainReboot() can take various flags, however none is used ATM. -- Pino Toscano signature.asc Description: This is a digitally signed message part.
Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
On Tue, Oct 13, 2020 at 01:33:28AM -0400, harry harry wrote: > > > Do you mean that GPAs are different from their corresponding HVAs when > > > KVM does the walks (as you said above) in software? > > > > What do you mean by "different"? GPAs and HVAs are two completely > different > > address spaces. > > Let me give you one concrete example as follows to explain the meaning of > ``different''. > > Suppose a program is running in a single-vCPU VM. The program allocates and > references one page (e.g., array[1024*4]). Assume that allocating and > referencing the page in the guest OS triggers a page fault and host OS > allocates a machine page to back it. > > Assume that GVA of array[0] is 0x0021 and its corresponding GPA is > 0x0081. I think array[0]'s corresponding HVA should also be > 0x0081, which is the same as array[0]'s GPA. If array[0]'s HVA > is not 0x0081, array[0]'s GPA is* different* from its > corresponding HVA. > > Now, let's assume array[0]'s GPA is different from its corresponding HVA. I > think there might be one issue like this: I think MMU's hardware logic to > translate ``GPA ->[extended/nested page tables] -> HPA''[1] should be the > same as ``VA-> [page tables] -> PA"[2]; if true, how does KVM find the > correct HPA with the different HVA (e.g., array[0]'s HVA is not > 0x0081) when there are EPT violations? This is where memslots come in. Think of memslots as a one-level page tablea that translate GPAs to HVAs. A memslot, set by userspace, tells KVM the corresponding HVA for a given GPA. Before the guest is running (assuming host userspace isn't broken), the userspace VMM will first allocate virtual memory (HVA) for all physical memory it wants to map into the guest (GPA). It then tells KVM how to translate a given GPA to its HVA by creating a memslot. To avoid getting lost in a tangent about page offsets, let's assume array[0]'s GPA = 0xa000. For KVM to create a GPA->HPA mapping for the guest, there _must_ be a memslot that translates GPA 0xa000 to an HVA[*]. Let's say HVA = 0xb000. On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to its HVA (0xb000), and then walks the host page tables to translate the HVA into a HPA (let's say that ends up being 0xc000). KVM then stuffs 0xc000 into the EPT tables, which yields: GPA-> HVA(KVM memslots) 0xa0000xb000 HVA-> HPA(host page tables) 0xb0000xc000 GPA-> HPA(extended page tables) 0xa0000xc000 To keep the EPT tables synchronized with the host page tables, if HVA->HPA changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified by the host kernel that the HVA has been unmapped and will find and unmap the corresponding GPA (again via memslots) to HPA translations. Ditto for the case where userspace moves a memslot, e.g. if HVA is changed to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses to GPA 0xa000 from the guest will take an EPT violation and see the new HVA (and presumably a new HPA). [*] If there is no memslot, KVM will exit to userspace on the EPT violation, with some information about what GPA the guest was accessing. This is how emulated MMIO is implemented, e.g. userspace intentionally doesn't back a GPA with a memslot so that it can trap guest accesses to said GPA for the purpose of emulating a device. > [1] Please note that this hardware walk is the last step, which only > translates the guest physical address to the host physical address through > the four-level nested page table. > [2] Please note that this hardware walk assumes translating the VA to the > PA without virtualization involvement. > > Please note that the above addresses are not real and just use for > explanations. > > Thanks, > Harry
Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
On 13/10/20 07:46, harry harry wrote: > Now, let's assume array[0]'s GPA is different from its corresponding > HVA. I think there might be one issue like this: I think MMU's hardware > logic to translate ``GPA ->[extended/nested page tables] -> HPA''[1] > should be the same as ``VA-> [page tables] -> PA"[2]; if true, how does > KVM find the correct HPA with the different HVA (e.g., array[0]'s HVA is > not 0x0081) when there are EPT violations? It has separate data structures that help with the translation. These data structures are specific to KVM for GPA to HVA translation, while for HVA to HPA the Linux functionality is reused. > BTW, I assume the software logic for KVM to find the HPA with a given > HVA (as you said like below) should be the same as the hardware logic in > MMU to translate ``GPA -> [extended/nested page tables] -> HPA''. No, the logic to find the HPA with a given HVA is the same as the hardware logic to translate HVA -> HPA. That is it uses the host "regular" page tables, not the nested page tables. In order to translate GPA to HPA, instead, KVM does not use the nested page tables. It performs instead two steps, from GPA to HVA and from HVA to HPA: * for GPA to HVA it uses a custom data structure. * for HVA to HPA it uses the host page tables as mentioned above. This is because: * the GPA to HVA translation is the one that is almost always sufficient, and the nested page tables do not provide this information * even if GPA to HPA is needed, the nested page tables are built lazily and therefore may not always contain the requested mapping. In addition using HPA requires special steps (such as calling get_page/put_page) and often these steps need an HVA anyway. Paolo