date:20201013

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Michal Privoznik


On 10/13/20 7:57 PM, Peter Krempa wrote:

On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote:

On 10/13/20 5:10 PM, Michal Privoznik wrote:

On 10/13/20 1:35 AM, Nico Pache wrote:

gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79

The virtio-balloon device now has the ability to report free pages
back to the hypervisor for reuse by other programs.


Is this something that we might want to report? I mean, we have 'virsh
dommemstat $dom' which under the hood calls:

{"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}


As I've pointed out in earlier review, I think that the feature name is
a bit misleading. It sounds like a statistic, not something that
actually returns memory to the host. The docs are now better but still
leave a lot of room for imagination.

IMO, if the feature is mainly for returning memory to the host, the
'reporting' word should not have been used.



Oh sorry I missed that. I a penance I will post a cleanup patch. How 
does "free-pages" or "return-pages" or even "discard-pages" sound?


Michal

Re: [libvirt][PATCH v1 1/1] introduce an attribute "migratable" to numatune memory element

2020-10-13 Thread Zhong, Luyao





On 10/6/2020 3:15 PM, Peter Krempa wrote:

On Tue, Oct 06, 2020 at 13:47:25 +0800, Luyao Zhong wrote:

   
 
 
 
   

Attribute ``migratable`` will be 'no' by default, and 'yes' indicates
that it allows operating system or hypervisor migrating the memory
pages between different memory nodes, that also means we will not
rely on hypervisor to set the memory policy or memory affinity, we only
use cgroups to restrict the memory nodes, so if ``migratable`` is 'yes',
the ``mode`` which indicates memory policy will be ignored.


Note that I'm not reviewing whether this is justified and makes sense.
I'm commenting purely on the code.


---

We require that patches are split into logical blocks. You need to split
this commit at least into following patches:

  docs/formatdomain.rst |  8 +++-
  docs/schemas/domaincommon.rng |  5 +++
  src/conf/numa_conf.c  | 45 +++
  src/conf/numa_conf.h  |  3 ++
  src/libvirt_private.syms  |  1 +


Docs, schema and parser changes should be separated

Got it.



  src/qemu/qemu_command.c   |  6 ++-
  src/qemu/qemu_process.c   | 21 +
  .../numatune-memory-migratable.args   | 34 ++
  .../numatune-memory-migratable.err|  1 +
  .../numatune-memory-migratable.xml| 33 ++
  tests/qemuxml2argvtest.c  |  5 +++


Implementation in qemu can be all together with tests. Please also add a
qemuxml2xmltest case.

I've provided some more feedback in a recent review:

https://www.redhat.com/archives/libvir-list/2020-October/msg00231.html

specifically some guidance on tests:

https://www.redhat.com/archives/libvir-list/2020-October/msg00395.html


Got it. Thanks.



  11 files changed, 160 insertions(+), 2 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.args
  create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.err
  create mode 100644 tests/qemuxml2argvdata/numatune-memory-migratable.xml

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index cc4f91d4ea..4e386a0c3d 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst


[...]


@@ -1097,6 +1097,12 @@ NUMA Node Tuning
 will be ignored if it's specified. If ``placement`` of ``vcpu`` is 'auto',
 and ``numatune`` is not specified, a default ``numatune`` with 
``placement``
 'auto' and ``mode`` 'strict' will be added implicitly. :since:`Since 0.9.3`
+   Attribute ``migratable`` is 'no' by default, and 'yes' indicates that it
+   allows operating system or hypervisor migrating the memory pages between
+   different memory nodes, that also means we will not rely on hypervisor to
+   set the memory policy or memory affinity, we only use cgroups to restrict
+   the memory nodes, so if ``migratable`` is 'yes', the ``mode`` which 
indicates
+   memory policy will be ignored.


So ... does this make us ignore the per NUMA-node set 'mode'? If yes,
the code should actually reject any configs where we ignore some
settings rather than just relying on the docs.

Yes, it will ignore the per NUMA-node set 'mode'. So let me saying in 
this way, if 'migratable' is set, per NUMA-node mode should not be set, 
otherwise the config will be rejected, so the mode will have 'None' as 
its default value but not 'strict'.



  ``memnode``
 Optional ``memnode`` elements can specify memory allocation policies per 
each
 guest NUMA node. For those nodes having no corresponding ``memnode`` 
element,


[...]


diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 6653ba05a6..c14ba1295c 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c


[...]


@@ -292,6 +294,15 @@ virDomainNumatuneParseXML(virDomainNumaPtr numa,
  placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO;
  
  if (node) {

+if ((tmp = virXMLPropString(node, "migratable")) &&
+(migratable = virTristateBoolTypeFromString(tmp)) <= 0) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,


VIR_ERR_XML_ERROR

Got it.



+   _("Invalid 'migratable' attribute value '%s'"), 
tmp);
+goto cleanup;
+}
+numa->memory.migratable = migratable;
+VIR_FREE(tmp);
+
  if ((tmp = virXMLPropString(node, "mode")) &&
  (mode = virDomainNumatuneMemModeTypeFromString(tmp)) < 0) {
  virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
@@ -369,6 +380,11 @@ virDomainNumatuneFormatXML(virBufferPtr buf,
  tmp = virDomainNumatuneMemModeTypeToString(numatune->memory.mode);
  virBufferAsprintf(buf, "  
+if (numatune->memory.migratable) {


Use an explicit condition here:

if (numatune->memory.migratable != VIR_TRISTATE_BOOL_ABSENT)


Got it.

+tmp = virTristateBoolTypeToString(numatune->memory.migratable);
+

Re: [libvirt PATCH] qemu: remove some unnecessary local variables

2020-10-13 Thread Laine Stump


On 10/13/20 5:14 PM, Jonathon Jongsma wrote:

These variables seem to be left over from a previous refactoring and
they don't add anything to the code.



Certainly the proof is in the compiling :-)

(it looks like it was probably *me* that left these useless variables in 
there about 1.5 years ago when I split all the unplug stuff into two 
pieces, and factored out some common code used by all different device 
types. See commit dd60bd62d and surrounding commits if you're interested...)



Reviewed-by: Laine Stump  and pushed.




Signed-off-by: Jonathon Jongsma 
---
  src/qemu/qemu_hotplug.c | 12 
  1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 2c184b9ba0..79fc8baa5c 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5502,7 +5502,6 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm,
   virDomainRedirdevDefPtr match,
   virDomainRedirdevDefPtr *detach)
  {
-virDomainRedirdevDefPtr redirdev;
  ssize_t idx;
  
  if ((idx = virDomainRedirdevDefFind(vm->def, match)) < 0) {

@@ -5511,7 +5510,7 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm,
  return -1;
  }
  
-*detach = redirdev = vm->def->redirdevs[idx];

+*detach = vm->def->redirdevs[idx];
  
  return 0;

  }
@@ -5523,12 +5522,11 @@ qemuDomainDetachPrepNet(virDomainObjPtr vm,
  virDomainNetDefPtr *detach)
  {
  int detachidx;
-virDomainNetDefPtr net = NULL;
  
  if ((detachidx = virDomainNetFindIdx(vm->def, match)) < 0)

  return -1;
  
-*detach = net = vm->def->nets[detachidx];

+*detach = vm->def->nets[detachidx];
  
  return 0;

  }
@@ -5598,7 +5596,6 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm,
  virDomainRNGDefPtr *detach)
  {
  ssize_t idx;
-virDomainRNGDefPtr rng;
  
  if ((idx = virDomainRNGFind(vm->def, match)) < 0) {

  virReportError(VIR_ERR_DEVICE_MISSING,
@@ -5608,7 +5605,7 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm,
  return -1;
  }
  
-*detach = rng = vm->def->rngs[idx];

+*detach = vm->def->rngs[idx];
  
  return 0;

  }
@@ -5619,7 +5616,6 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm,
 virDomainMemoryDefPtr match,
 virDomainMemoryDefPtr *detach)
  {
-virDomainMemoryDefPtr mem;
  int idx;
  
  if (qemuDomainMemoryDeviceAlignSize(vm->def, match) < 0)

@@ -5633,7 +5629,7 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm,
  return -1;
  }
  
-*detach = mem = vm->def->mems[idx];

+*detach = vm->def->mems[idx];
  
  return 0;

  }

Re: [External] Re: [PATCH v2 1/4] API: introduce memory failure

2020-10-13 Thread zhenwei pi





On 10/13/20 8:31 PM, Daniel Henrique Barboza wrote:

This patch failed to compile in my env:


FAILED: tools/virsh.p/virsh-domain.c.o
[]
-D_FUNCTION_DEF -MD -MQ tools/virsh.p/virsh-domain.c.o -MF 
tools/virsh.p/virsh-domain.c.o.d -o tools/virsh.p/virsh-domain.c.o -c 
../tools/virsh-domain.c

In file included from /usr/lib64/glib-2.0/include/glibconfig.h:9,
  from /usr/include/glib-2.0/glib/gtypes.h:32,
  from /usr/include/glib-2.0/glib/galloca.h:32,
  from /usr/include/glib-2.0/glib.h:30,
  from ../src/util/glibcompat.h:21,
  from ../src/internal.h:30,
  from ../tools/virsh.h:25,
  from ../tools/virsh-domain.h:23,
  from ../tools/virsh-domain.c:22:
/usr/include/glib-2.0/glib/gmacros.h:745:53: error: size of array 
‘_GStaticAssertCompileTimeAssertion_185’ is negative
   745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE 
(_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] 
G_GNUC_UNUSED
   | 
^~~
/usr/include/glib-2.0/glib/gmacros.h:735:47: note: in definition of 
macro ‘G_PASTE_ARGS’
   735 | #define G_PASTE_ARGS(identifier1,identifier2) identifier1 ## 
identifier2

   |   ^~~
/usr/include/glib-2.0/glib/gmacros.h:745:44: note: in expansion of macro 
‘G_PASTE’
   745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE 
(_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] 
G_GNUC_UNUSED

   |    ^~~
../tools/virsh-domain.c:13643:1: note: in expansion of macro 
‘G_STATIC_ASSERT’
13643 | G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == 
G_N_ELEMENTS(virshDomainEventCallbacks));

   | ^~~
[505/984] Compiling C object 
src/virtqemud.p/remote_remote_daemon_dispatch.c.o

ninja: build stopped: subcommand failed.
$


I didn't verify if the following patches fixes it.


Thanks,


DHB



I described it in '[PATCH v2 4/4] virsh: implement memory failure event'

Notice:
The full patch set includes 4 patches:
virsh: implement memory failure event (current patch)
qemu: monitor: handle memory failure event
qemu: process: implement domainMemoryFailure
API: introduce memory failure

To avoid build/test errors, the 4 patches should be merged/removed
together.


Suggested by Peter, separate a 'all in one' patch into 4 patches 
(described in cover letter '[PATCH v2 0/4] support memory failure').
I forked a repo and pushed the 4 
patches(https://gitlab.com/pacepi/libvirt/-/tree/memory-failure-v2), CI 
worked fine.





On 10/12/20 9:31 AM, zhenwei pi wrote:

Introduce memory failure event. Libvirt should monitor domain's
event, then posts it to uplayer. According to the hardware memory
corrupted message, the cloud scheduler could migrate domain to another
health physical server.

Signed-off-by: zhenwei pi 
---
  include/libvirt/libvirt-domain.h    | 82 
+
  src/conf/domain_event.c | 80 


  src/conf/domain_event.h | 12 ++
  src/libvirt_private.syms    |  2 +
  src/remote/remote_daemon_dispatch.c | 32 +++
  src/remote/remote_driver.c  | 32 +++
  src/remote/remote_protocol.x    | 16 +++-
  src/remote_protocol-structs |  8 
  8 files changed, 263 insertions(+), 1 deletion(-)

diff --git a/include/libvirt/libvirt-domain.h 
b/include/libvirt/libvirt-domain.h

index 77f9116675..5138843a56 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -3196,6 +3196,64 @@ typedef enum {
  } virDomainEventCrashedDetailType;
  /**
+ * virDomainMemoryFailureRecipientType:
+ *
+ * Recipient of a memory failure event.
+ */
+typedef enum {
+    /* memory failure at hypersivor memory address space */
+    VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_HYPERVISOR = 0,
+
+    /* memory failure at guest memory address space */
+    VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_GUEST = 1,
+
+# ifdef VIR_ENUM_SENTINELS
+    VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_LAST
+# endif
+} virDomainMemoryFailureRecipientType;
+
+
+/**
+ * virDomainMemoryFailureActionType:
+ *
+ * Action of a memory failure event.
+ */
+typedef enum {
+    /* the memory failure could be ignored. This will only be the 
case for

+ * action-optional failures. */
+    VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_IGNORE = 0,
+
+    /* memory failure occurred in guest memory, the guest enabled MCE 
handling

+ * mechanism, and hypervisor could inject the MCE into the guest
+ * successfully. */
+    VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_INJECT = 1,
+
+    /* the failure is unrecoverable.  This occurs for action-required 
failures

+ * if the recipient is the hypervisor; hypervisor will exit. */
+

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-13 Thread harry harry

Hi Sean,

Thanks much for your detailed replies. It's clear to me why GPAs are
different from HVAs in QEM/KVM. Thanks! I appreciate it if you could
help with the following two more questions.

On Tue, Oct 13, 2020 at 3:03 AM Sean Christopherson
 wrote:
>
> This is where memslots come in.  Think of memslots as a one-level page tablea
> that translate GPAs to HVAs.  A memslot, set by userspace, tells KVM the
> corresponding HVA for a given GPA.
>
> Before the guest is running (assuming host userspace isn't broken), the
> userspace VMM will first allocate virtual memory (HVA) for all physical
> memory it wants to map into the guest (GPA).  It then tells KVM how to
> translate a given GPA to its HVA by creating a memslot.
>
> To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
> GPA = 0xa000.  For KVM to create a GPA->HPA mapping for the guest, there 
> _must_
> be a memslot that translates GPA 0xa000 to an HVA[*].  Let's say HVA = 0xb000.
>
> On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) 
> to
> its HVA (0xb000), and then walks the host page tables to translate the HVA 
> into
> a HPA (let's say that ends up being 0xc000).  KVM then stuffs 0xc000 into the
> EPT tables, which yields:
>
>   GPA-> HVA(KVM memslots)
>   0xa0000xb000
>
>   HVA-> HPA(host page tables)
>   0xb0000xc000
>
>   GPA-> HPA(extended page tables)
>   0xa0000xc000
>
> To keep the EPT tables synchronized with the host page tables, if HVA->HPA
> changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
> by the host kernel that the HVA has been unmapped and will find and unmap
> the corresponding GPA (again via memslots) to HPA translations.
>
> Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
> to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
> to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
> (and presumably a new HPA).

Q1: Is there any file like ``/proc/pid/pagemap'' to record the
mappings between GPAs and HVAs in the host OS?

Q2: Seems that there might be extra overhead (e.g., synchronization
between EPT tables and host regular page tables; maintaining extra
regular page tables and data structures), which is caused by the extra
translation between GPAs to HVAs via memslots. Why doesn't KVM
directly use GPAs as HVAs and leverage extended/nested page tables to
translate HVAs (i.e., GPAs) to HPAs?

Thanks,
Harry

[libvirt PATCH] qemu: remove some unnecessary local variables

2020-10-13 Thread Jonathon Jongsma

These variables seem to be left over from a previous refactoring and
they don't add anything to the code.

Signed-off-by: Jonathon Jongsma 
---
 src/qemu/qemu_hotplug.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 2c184b9ba0..79fc8baa5c 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -5502,7 +5502,6 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm,
  virDomainRedirdevDefPtr match,
  virDomainRedirdevDefPtr *detach)
 {
-virDomainRedirdevDefPtr redirdev;
 ssize_t idx;
 
 if ((idx = virDomainRedirdevDefFind(vm->def, match)) < 0) {
@@ -5511,7 +5510,7 @@ qemuDomainDetachPrepRedirdev(virDomainObjPtr vm,
 return -1;
 }
 
-*detach = redirdev = vm->def->redirdevs[idx];
+*detach = vm->def->redirdevs[idx];
 
 return 0;
 }
@@ -5523,12 +5522,11 @@ qemuDomainDetachPrepNet(virDomainObjPtr vm,
 virDomainNetDefPtr *detach)
 {
 int detachidx;
-virDomainNetDefPtr net = NULL;
 
 if ((detachidx = virDomainNetFindIdx(vm->def, match)) < 0)
 return -1;
 
-*detach = net = vm->def->nets[detachidx];
+*detach = vm->def->nets[detachidx];
 
 return 0;
 }
@@ -5598,7 +5596,6 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm,
 virDomainRNGDefPtr *detach)
 {
 ssize_t idx;
-virDomainRNGDefPtr rng;
 
 if ((idx = virDomainRNGFind(vm->def, match)) < 0) {
 virReportError(VIR_ERR_DEVICE_MISSING,
@@ -5608,7 +5605,7 @@ qemuDomainDetachPrepRNG(virDomainObjPtr vm,
 return -1;
 }
 
-*detach = rng = vm->def->rngs[idx];
+*detach = vm->def->rngs[idx];
 
 return 0;
 }
@@ -5619,7 +5616,6 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm,
virDomainMemoryDefPtr match,
virDomainMemoryDefPtr *detach)
 {
-virDomainMemoryDefPtr mem;
 int idx;
 
 if (qemuDomainMemoryDeviceAlignSize(vm->def, match) < 0)
@@ -5633,7 +5629,7 @@ qemuDomainDetachPrepMemory(virDomainObjPtr vm,
 return -1;
 }
 
-*detach = mem = vm->def->mems[idx];
+*detach = vm->def->mems[idx];
 
 return 0;
 }
-- 
2.26.2

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-13 Thread harry harry

Hi Paolo and Sean,

Thanks much for your prompt replies and clear explanations.

On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini  wrote:
>
> No, the logic to find the HPA with a given HVA is the same as the
> hardware logic to translate HVA -> HPA.  That is it uses the host
> "regular" page tables, not the nested page tables.
>
> In order to translate GPA to HPA, instead, KVM does not use the nested
> page tables.

I am curious why KVM does not directly use GPAs as HVAs and leverage
nested page tables to translate HVAs (i.e., GPAs) to HPAs? Is that
because 1) the hardware logic of ``GPA -> [extended/nested page
tables] -> HPA[*]'' is different[**] from the hardware logic of ``HVA
-> [host regular page tables] -> HPA''; 2) if 1) is true, it is
natural to reuse Linux's original functionality to translate HVAs to
HPAs through regular page tables.

[*]: Here, the translation means the last step for MMU to translate a
GVA's corresponding GPA to an HPA through the extended/nested page
tables.
[**]: To my knowledge, the hardware logic of ``GPA -> [extended/nested
page tables] -> HPA'' seems to be the same as the hardware logic of
``HVA -> [host regular page tables] -> HPA''. I appreciate it if you
could point out the differences I ignored. Thanks!

Best,
Harry

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Peter Krempa

On Tue, Oct 13, 2020 at 18:47:39 +0200, Michal Privoznik wrote:
> On 10/13/20 5:10 PM, Michal Privoznik wrote:
> > On 10/13/20 1:35 AM, Nico Pache wrote:
> > > gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> > > 
> > > The virtio-balloon device now has the ability to report free pages
> > > back to the hypervisor for reuse by other programs.
> 
> Is this something that we might want to report? I mean, we have 'virsh
> dommemstat $dom' which under the hood calls:
> 
> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}

As I've pointed out in earlier review, I think that the feature name is
a bit misleading. It sounds like a statistic, not something that
actually returns memory to the host. The docs are now better but still
leave a lot of room for imagination.

IMO, if the feature is mainly for returning memory to the host, the
'reporting' word should not have been used.

Re: [PATCH 0/2] virtiofs can be used without NUMA

2020-10-13 Thread Marc Hartmayer

On Tue, Oct 13, 2020 at 07:10 PM +0200, Michal Privoznik  
wrote:
> On 10/13/20 6:53 PM, Marc Hartmayer wrote:
>> Halil Pasic (1):
>>Reflect in virtiofs.rst that virtiofs can be used without NUMA
>> 
>> Marc Hartmayer (1):
>>qemu: virtiofs can be used without NUMA nodes
>> 
>>   docs/kbase/virtiofs.rst  | 17 -
>>   src/qemu/qemu_validate.c | 13 +
>>   2 files changed, 21 insertions(+), 9 deletions(-)
>> 
>
> Reviewed-by: Michal Privoznik 
>
> and pushed.
>
> Michal
>

Thanks.

-- 
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

Re: [PATCH 0/2] virtiofs can be used without NUMA

2020-10-13 Thread Michal Privoznik


On 10/13/20 6:53 PM, Marc Hartmayer wrote:

Halil Pasic (1):
   Reflect in virtiofs.rst that virtiofs can be used without NUMA

Marc Hartmayer (1):
   qemu: virtiofs can be used without NUMA nodes

  docs/kbase/virtiofs.rst  | 17 -
  src/qemu/qemu_validate.c | 13 +
  2 files changed, 21 insertions(+), 9 deletions(-)



Reviewed-by: Michal Privoznik 

and pushed.

Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread David Hildenbrand

On 13.10.20 19:00, Nico Pache wrote:
> I don't believe so, but David might have some more insight into that
> question. 
> 
> Once the feature is enabled the guest will report pages on its freelist
> to the hypervisor, and the host/hypervisor will then release them. 

We report (some, not all) free memory of our guest to the hypervisor. So
whatever the guest reported shows up in "stat-free-memory" in the stats
already.

The hypervisor will "discard" backing storage of reported free memory,
resulting in your hypervisor having more free memory available (and the
QEMU process consuming less memory).

> 
> I'm not exactly sure what information we can or would want to 'stat' out
> of this process... A counter for number of pages returned?

As the guest can reuse memory any time without coordination with the
hypervisor that wouldn't be of too much help. A counter would only tell
you if free page reporting was once performed successfully.

You would really have to compare the QEMU process memory footprint with
the guest stats to figure out if free page reporting is doing what it's
supposed to do.

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Nico Pache

I don't believe so, but David might have some more insight into that
question.

Once the feature is enabled the guest will report pages on its freelist to
the hypervisor, and the host/hypervisor will then release them.

I'm not exactly sure what information we can or would want to 'stat' out of
this process... A counter for number of pages returned?

Cheers,
Nico

On Tue, Oct 13, 2020, 12:47 PM Michal Privoznik  wrote:

> On 10/13/20 5:10 PM, Michal Privoznik wrote:
> > On 10/13/20 1:35 AM, Nico Pache wrote:
> >> gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79
> >>
> >> The virtio-balloon device now has the ability to report free pages
> >> back to the hypervisor for reuse by other programs.
>
> Is this something that we might want to report? I mean, we have 'virsh
> dommemstat $dom' which under the hood calls:
>
>
> {"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}
>
> Michal
>
>

[PATCH 2/2] Reflect in virtiofs.rst that virtiofs can be used without NUMA

2020-10-13 Thread Marc Hartmayer

From: Halil Pasic 

Reflect in the virtiofs documentation that virtiofs can now be used
even without NUMA. While at it, be more precise where and why shared
memory is required.

Signed-off-by: Halil Pasic 
Signed-off-by: Marc Hartmayer 
---
 docs/kbase/virtiofs.rst | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/docs/kbase/virtiofs.rst b/docs/kbase/virtiofs.rst
index dea8e79f833c..01440420d76d 100644
--- a/docs/kbase/virtiofs.rst
+++ b/docs/kbase/virtiofs.rst
@@ -16,10 +16,17 @@ See https://virtio-fs.gitlab.io/
 Host setup
 ==
 
-The host-side virtiofsd daemon, like other vhost-user backed devices,
-requires shared memory between the host and the guest. As of QEMU 4.2, this
-requires specifying a NUMA topology for the guest and explicitly specifying
-a memory backend. Multiple options are available:
+Almost all virtio devices (all that use virtqueues) require access to
+at least certain portions of guest RAM (possibly policed by DMA). In
+case of virtiofsd, much like in case of other vhost-user (see
+https://www.qemu.org/docs/master/interop/vhost-user.html) virtio
+devices that are realized by an userspace process, this in practice
+means that QEMU needs to allocate the backing memory for all the guest
+RAM as shared memory. As of QEMU 4.2, it is possible to explicitly
+specify a memory backend when specifying the NUMA topology. This
+method is however only viable for machine types that do support
+NUMA. As of QEMU 5.0.0, it is possible to specify the memory backend
+without NUMA (using the so called memobject interface).
 
 Either of the following:
 
@@ -46,7 +53,7 @@ Either of the following:
 Guest setup
 ===
 
-#. Specify the NUMA topology
+#. Specify the NUMA topology (this step is only required for the NUMA case)
 
in the domain XML of the guest.
For the simplest one-node topology for a guest with 2GiB of RAM and 8 vCPUs:
-- 
2.25.4

[PATCH 1/2] qemu: virtiofs can be used without NUMA nodes

2020-10-13 Thread Marc Hartmayer

...if a machine memory-backend using shared memory is configured for
the guest. This is especially important for QEMU machine types that
don't have NUMA but virtiofs support.

An example snippet:

  
test
2097152

  






  
  ...

...
  

and the corresponding QEMU command line:

  /usr/bin/qemu-system-s390x \
  -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \
  -m 2048 \
  -object
  
memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648
 \
  ...

Signed-off-by: Marc Hartmayer 
---
 src/qemu/qemu_validate.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index 27e10d59fd25..bc3043bb3f0d 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -3470,14 +3470,19 @@ qemuValidateDomainDeviceDefGraphics(const 
virDomainGraphicsDef *graphics,
 
 
 static int
-qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def)
+qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def,
+  virQEMUCapsPtr qemuCaps)
 {
+const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps,
+ def->virtType,
+ 
def->os.machine);
 size_t numa_nodes = virDomainNumaGetNodeCount(def->numa);
 size_t i;
 
-if (numa_nodes == 0) {
+if (numa_nodes == 0 &&
+!(defaultRAMId && def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED)) 
{
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-   _("virtiofs requires one or more NUMA nodes"));
+   _("virtiofs requires shared memory"));
 return -1;
 }
 
@@ -3591,7 +3596,7 @@ qemuValidateDomainDeviceDefFS(virDomainFSDefPtr fs,
_("virtiofs does not support multidevs"));
 return -1;
 }
-if (qemuValidateDomainDefVirtioFSSharedMemory(def) < 0)
+if (qemuValidateDomainDefVirtioFSSharedMemory(def, qemuCaps) < 0)
 return -1;
 break;
 
-- 
2.25.4

[PATCH 0/2] virtiofs can be used without NUMA

2020-10-13 Thread Marc Hartmayer

Halil Pasic (1):
  Reflect in virtiofs.rst that virtiofs can be used without NUMA

Marc Hartmayer (1):
  qemu: virtiofs can be used without NUMA nodes

 docs/kbase/virtiofs.rst  | 17 -
 src/qemu/qemu_validate.c | 13 +
 2 files changed, 21 insertions(+), 9 deletions(-)

-- 
2.25.4

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Michal Privoznik


On 10/13/20 5:10 PM, Michal Privoznik wrote:

On 10/13/20 1:35 AM, Nico Pache wrote:

gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79

The virtio-balloon device now has the ability to report free pages
back to the hypervisor for reuse by other programs.


Is this something that we might want to report? I mean, we have 'virsh 
dommemstat $dom' which under the hood calls:


{"execute":"qom-get","arguments":{"path":"/machine/peripheral/balloon0","property":"guest-stats"},"id":"libvirt-400"}

Michal

Re: [PATCH v2 0/9] docs: Fix migration.html generation and report such errors next time

2020-10-13 Thread Peter Krempa

On Tue, Oct 13, 2020 at 18:18:04 +0200, Peter Krempa wrote:
> This version refactors the XSLT transformation command a bit more
> carefully and thoroughly. A semantic difference in v2 is that the output
> HTMLs are no longer reformatted.

The output can be verified here:

https://gitlab.com/pipo.sk/libvirt/-/jobs/788658848/artifacts/browse/website/

[PATCH v2 6/9] docs/internals/meson.build: Use template code for XSLT processing

2020-10-13 Thread Peter Krempa

Replace the reimplementation of the XSLT processing custom target with
an identical copy form docs/meson.build and an comment to keep them in
sync.

Signed-off-by: Peter Krempa 
---
 docs/internals/meson.build | 55 --
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/docs/internals/meson.build b/docs/internals/meson.build
index 5d008dec5b..5aa67ded5c 100644
--- a/docs/internals/meson.build
+++ b/docs/internals/meson.build
@@ -5,33 +5,48 @@ internals_in_files = [
   'rpc',
 ]

+html_xslt_gen_xslt = subsite_xsl
+html_xslt_gen_install_dir = docs_html_dir / 'internals'
+html_xslt_gen = []
+
 foreach name : internals_in_files
-  html_in_file = '@0...@.html.in'.format(name)
-  html_file = '@0@.html'.format(name)
+  html_xslt_gen += {
+'name': name,
+'source': 'docs/internals' / name + '.html.in',
+  }
+endforeach
+
+# keep the XSLT processing code block in sync with docs/meson.build
+
+# --- begin of XSLT processing ---

-  out_file = custom_target(
-html_file,
-input: html_in_file,
-output: html_file,
+foreach data : html_xslt_gen
+  html_filename = data['name'] + '.html'
+
+  html_file = custom_target(
+html_filename,
+input: data.get('file', data['name'] + '.html.in'),
+output: html_filename,
 command: [
-  meson_python_prog,
-  python3_prog.path(),
-  meson_html_gen_prog.path(),
-  xsltproc_prog.path(),
-  xmllint_prog.path(),
-  meson.build_root(),
-  docs_timestamp,
-  subsite_xsl,
+  xsltproc_prog,
+  '--stringparam', 'pagesrc', data.get('source', ''),
+  '--stringparam', 'builddir', meson.build_root(),
+  '--stringparam', 'timestamp', docs_timestamp,
+  '--nonet',
+  html_xslt_gen_xslt,
   '@INPUT@',
-  '@OUTPUT@',
-  'docs/internals' / html_in_file,
 ],
-depends: [ aclperms_gen ],
+depends: data.get('depends', []),
 depend_files: [ page_xsl ],
+capture: true,
 install: true,
-install_dir: docs_html_dir / 'internals',
+install_dir: html_xslt_gen_install_dir,
   )

-  install_web_deps += out_file
-  install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 
'internals')
+  install_web_deps += html_file
+  install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir
 endforeach
+
+html_xslt_gen = []
+
+# --- end of XSLT processing ---
-- 
2.26.2

[PATCH v2 8/9] docs/manpages/meson.build: Use template code for XSLT processing

2020-10-13 Thread Peter Krempa

Replace the reimplementation of the XSLT processing custom target with
an identical copy form docs/meson.build and an comment to keep them in
sync.

Signed-off-by: Peter Krempa 
---
 docs/manpages/meson.build | 55 ++-
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/docs/manpages/meson.build b/docs/manpages/meson.build
index 7ed1d304a4..ecc517e80e 100644
--- a/docs/manpages/meson.build
+++ b/docs/manpages/meson.build
@@ -1,3 +1,7 @@
+html_xslt_gen_xslt = subsite_xsl
+html_xslt_gen_install_dir = docs_html_dir / 'manpages'
+html_xslt_gen = []
+
 # docs_man_files
 #   each entry is a dictionary with following items:
 #   name - man page name (required)
@@ -104,29 +108,44 @@ foreach data : docs_man_files
 capture: true,
   )

-  out_file = custom_target(
-html_file,
-input: html_in,
-output: html_file,
+  html_xslt_gen += {
+'name': data['name'],
+'file': html_in,
+'source': 'docs/manpages' / rst_in_file,
+  }
+endforeach
+
+# keep the XSLT processing code block in sync with docs/meson.build
+
+# --- begin of XSLT processing ---
+
+foreach data : html_xslt_gen
+  html_filename = data['name'] + '.html'
+
+  html_file = custom_target(
+html_filename,
+input: data.get('file', data['name'] + '.html.in'),
+output: html_filename,
 command: [
-  meson_python_prog,
-  python3_prog.path(),
-  meson_html_gen_prog.path(),
-  xsltproc_prog.path(),
-  xmllint_prog.path(),
-  meson.build_root(),
-  docs_timestamp,
-  subsite_xsl,
+  xsltproc_prog,
+  '--stringparam', 'pagesrc', data.get('source', ''),
+  '--stringparam', 'builddir', meson.build_root(),
+  '--stringparam', 'timestamp', docs_timestamp,
+  '--nonet',
+  html_xslt_gen_xslt,
   '@INPUT@',
-  '@OUTPUT@',
-  'docs/manpages' / rst_in_file,
 ],
-depends: [ aclperms_gen ],
+depends: data.get('depends', []),
 depend_files: [ page_xsl ],
+capture: true,
 install: true,
-install_dir: docs_html_dir / 'manpages',
+install_dir: html_xslt_gen_install_dir,
   )

-  install_web_deps += out_file
-  install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 
'manpages')
+  install_web_deps += html_file
+  install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir
 endforeach
+
+html_xslt_gen = []
+
+# --- end of XSLT processing ---
-- 
2.26.2

[PATCH v2 4/9] docs: meson.build: Generate HTML files directly by meson

2020-10-13 Thread Peter Krempa

Since we no longer reformat the XSLT-transformed files, there's no need
to use an external script any more.

Unfortunately this hid errors from 'xsltproc' as return value was not
checked and the stderr was piped into xmllints stdin. The result was
that any invalid input file would result into an empty output file.

Since the script's only purpose was to prevent additional temporary
files at the time we were reformatting the output in a pipeline we no
longer need this.

Moving the generation directly into the meson definition makes it more
obvious what's happening and saves readers from having to parse what's
going on. A free bonus is that errors are now properly caught and
reported.

This patch converts the main docs/ directory for now with cleanup of
other comming later.

Signed-off-by: Peter Krempa 
---
 docs/meson.build | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/docs/meson.build b/docs/meson.build
index d18e5c1feb..8b7c63bc6f 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -267,20 +267,17 @@ foreach data : docs_html_in_gen
 input: data['file'],
 output: html_file,
 command: [
-  meson_python_prog,
-  python3_prog.path(),
-  meson_html_gen_prog.path(),
-  xsltproc_prog.path(),
-  xmllint_prog.path(),
-  meson.build_root(),
-  docs_timestamp,
+  xsltproc_prog,
+  '--stringparam', 'pagesrc', data.get('source', ''),
+  '--stringparam', 'builddir', meson.build_root(),
+  '--stringparam', 'timestamp', docs_timestamp,
+  '--nonet',
   site_xsl,
   '@INPUT@',
-  '@OUTPUT@',
-  data.get('source', []),
 ],
 depends: data.get('depends', []),
 depend_files: [ page_xsl ],
+capture: true,
 install: true,
 install_dir: docs_html_dir,
   )
-- 
2.26.2

[PATCH v2 0/9] docs: Fix migration.html generation and report such errors next time

2020-10-13 Thread Peter Krempa

This version refactors the XSLT transformation command a bit more
carefully and thoroughly. A semantic difference in v2 is that the output
HTMLs are no longer reformatted.

Don't apply/revert first patch to see the build failure:

[59/144] Generating migration.html with a meson_exe.py custom command
FAILED: docs/migration.html
/usr/bin/meson --internal exe --capture docs/migration.html -- /bin/xsltproc 
--stringparam pagesrc docs/migration.html.in --stringparam builddir 
/home/pipo/build/libvirt/gcc --stringparam timestamp 'Tue Oct 13 16:16:15 2020 
UTC' --nonet ../../../libvirt/docs/site.xsl 
../../../libvirt/docs/migration.html.in
../../../libvirt/docs/migration.html.in:664: parser error : Opening and ending 
tag mismatch: p line 649 and body
  
 ^
../../../libvirt/docs/migration.html.in:665: parser error : Opening and ending 
tag mismatch: body line 649 and html

   ^
../../../libvirt/docs/migration.html.in:666: parser error : EndTag: '

[PATCH v2 9/9] scripts: meson-html-gen: Remove

2020-10-13 Thread Peter Krempa

The script was obscuring what's happening and not reporting errors
properly. Remove it since it's no longer used now.

Signed-off-by: Peter Krempa 
---
 scripts/meson-html-gen.py | 30 --
 scripts/meson.build   |  1 -
 2 files changed, 31 deletions(-)
 delete mode 100755 scripts/meson-html-gen.py

diff --git a/scripts/meson-html-gen.py b/scripts/meson-html-gen.py
deleted file mode 100755
index dcc11f37cf..00
--- a/scripts/meson-html-gen.py
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/usr/bin/env python3
-
-import argparse
-import subprocess
-
-parser = argparse.ArgumentParser()
-parser.add_argument("xsltproc", type=str, help="path to xsltproc bin")
-parser.add_argument("xmllint", type=str, help="path to xmllint bin")
-parser.add_argument("builddir", type=str, help="build root dir path")
-parser.add_argument("timestamp", type=str, help="docs timestamp")
-parser.add_argument("style", type=str, help="XSL stile file")
-parser.add_argument("infile", type=str, help="path to source HTML file")
-parser.add_argument("htmlfile", type=str, help="path to generated HTML file")
-parser.add_argument("pagesrc", type=str, default="", nargs='?', 
help="(optional) path to source file used for edit this page")
-args = parser.parse_args()
-
-html = subprocess.run(
-[
-args.xsltproc,
-'--stringparam', 'pagesrc', args.pagesrc,
-'--stringparam', 'builddir', args.builddir,
-'--stringparam', 'timestamp', args.timestamp,
-'--nonet', args.style, args.infile,
-],
-stdout=subprocess.PIPE,
-stderr=subprocess.PIPE,
-)
-
-with open(args.htmlfile, 'wb') as outfile:
-outfile.write(html.stdout)
diff --git a/scripts/meson.build b/scripts/meson.build
index 59b3c9bacd..655ec0e0e2 100644
--- a/scripts/meson.build
+++ b/scripts/meson.build
@@ -22,7 +22,6 @@ scripts = [
   'meson-gen-authors.py',
   'meson-gen-def.py',
   'meson-gen-sym.py',
-  'meson-html-gen.py',
   'meson-install-dirs.py',
   'meson-install-symlink.py',
   'meson-install-web.py',
-- 
2.26.2

[PATCH v2 2/9] scripts/meson-html-gen.py: Don't rereformat output files

2020-10-13 Thread Peter Krempa

The output HTML files (especially those generated from rST files) don't
look good even after reformatting. Skip the extra step and accept that
no matter what we do HTMLs will not look great.

This additionally makes it way simpler to remove meson-html-gen.py in
the future (thus I've neglected to remove passing of xmllint).

Signed-off-by: Peter Krempa 
---
 scripts/meson-html-gen.py | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/scripts/meson-html-gen.py b/scripts/meson-html-gen.py
index 2731d734a7..dcc11f37cf 100755
--- a/scripts/meson-html-gen.py
+++ b/scripts/meson-html-gen.py
@@ -14,7 +14,7 @@ parser.add_argument("htmlfile", type=str, help="path to 
generated HTML file")
 parser.add_argument("pagesrc", type=str, default="", nargs='?', 
help="(optional) path to source file used for edit this page")
 args = parser.parse_args()

-html_tmp = subprocess.run(
+html = subprocess.run(
 [
 args.xsltproc,
 '--stringparam', 'pagesrc', args.pagesrc,
@@ -26,12 +26,5 @@ html_tmp = subprocess.run(
 stderr=subprocess.PIPE,
 )

-html = subprocess.run(
-[args.xmllint, '--nonet', '--format', '-'],
-input=html_tmp.stdout,
-stdout=subprocess.PIPE,
-stderr=subprocess.PIPE,
-)
-
 with open(args.htmlfile, 'wb') as outfile:
 outfile.write(html.stdout)
-- 
2.26.2

[PATCH v2 7/9] docs/kbase/meson.build: Use template code for XSLT processing

2020-10-13 Thread Peter Krempa

Replace the reimplementation of the XSLT processing custom target with
an identical copy form docs/meson.build and an comment to keep them in
sync.

Signed-off-by: Peter Krempa 
---
 docs/kbase/meson.build | 56 +++---
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/docs/kbase/meson.build b/docs/kbase/meson.build
index b1d1d7610b..c0fa72ff35 100644
--- a/docs/kbase/meson.build
+++ b/docs/kbase/meson.build
@@ -17,35 +17,51 @@ docs_kbase_files = [
   'virtiofs',
 ]

+html_xslt_gen_xslt = subsite_xsl
+html_xslt_gen_install_dir = docs_html_dir / 'kbase'
+html_xslt_gen = []
+
 foreach name : docs_kbase_files
   rst_file = '@0@.rst'.format(name)
-  html_file = '@0@.html'.format(name)

-  html_in = docs_rst2html_gen.process(rst_file)
+  html_xslt_gen += {
+'name': name,
+'file': docs_rst2html_gen.process(rst_file),
+'source': 'docs/kbase' / rst_file,
+  }
+endforeach
+
+# keep the XSLT processing code block in sync with docs/meson.build
+
+# --- begin of XSLT processing ---

-  out_file = custom_target(
-html_file,
-input: html_in,
-output: html_file,
+foreach data : html_xslt_gen
+  html_filename = data['name'] + '.html'
+
+  html_file = custom_target(
+html_filename,
+input: data.get('file', data['name'] + '.html.in'),
+output: html_filename,
 command: [
-  meson_python_prog,
-  python3_prog.path(),
-  meson_html_gen_prog.path(),
-  xsltproc_prog.path(),
-  xmllint_prog.path(),
-  meson.build_root(),
-  docs_timestamp,
-  subsite_xsl,
+  xsltproc_prog,
+  '--stringparam', 'pagesrc', data.get('source', ''),
+  '--stringparam', 'builddir', meson.build_root(),
+  '--stringparam', 'timestamp', docs_timestamp,
+  '--nonet',
+  html_xslt_gen_xslt,
   '@INPUT@',
-  '@OUTPUT@',
-  'docs/kbase' / rst_file,
 ],
-depends: [ aclperms_gen ],
+depends: data.get('depends', []),
 depend_files: [ page_xsl ],
+capture: true,
 install: true,
-install_dir: docs_html_dir / 'kbase',
+install_dir: html_xslt_gen_install_dir,
   )

-  install_web_deps += out_file
-  install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir / 
'kbase')
+  install_web_deps += html_file
+  install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir
 endforeach
+
+html_xslt_gen = []
+
+# --- end of XSLT processing ---
-- 
2.26.2

[PATCH v2 3/9] docs: meson.build: Limit html files depending on 'aclperms.htmlinc'

2020-10-13 Thread Peter Krempa

Only 'acl.html' output file includes that file so there's no need to
make everything depend on it.

Signed-off-by: Peter Krempa 
---
 docs/meson.build | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/meson.build b/docs/meson.build
index 400c1ca955..d18e5c1feb 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -32,7 +32,6 @@ docs_assets = [

 docs_html_in_files = [
   '404',
-  'acl',
   'aclpolkit',
   'api_extension',
   'api',
@@ -199,6 +198,7 @@ docs_rst2html_gen = generator(
 #   name - base file name (required)
 #   file - generated file (required)
 # source - source filename relative to repository root (optional, if there is 
no source)
+# depends - explicit dependency on other input (optional)
 docs_html_in_gen = []

 foreach name : docs_html_in_files
@@ -219,6 +219,13 @@ foreach name : docs_rst_files
   }
 endforeach

+docs_html_in_gen += {
+  'name': 'acl.html',
+  'file': 'acl.html.in',
+  'source': 'docs' / 'acl.html.in',
+  'depends': aclperms_gen,
+}
+
 hvsupport_html_in = custom_target(
   'hvsupport.html.in',
   output: 'hvsupport.html.in',
@@ -272,7 +279,7 @@ foreach data : docs_html_in_gen
   '@OUTPUT@',
   data.get('source', []),
 ],
-depends: [ aclperms_gen ],
+depends: data.get('depends', []),
 depend_files: [ page_xsl ],
 install: true,
 install_dir: docs_html_dir,
-- 
2.26.2

[PATCH v2 1/9] docs: migration: Fix syntax

2020-10-13 Thread Peter Krempa

One of the paragraphs added in f51cbe92c0d was not terminated thus
making it invalid XML/XHTML.

This was not caught by the build system as 'scripts/meson-html-gen.py'
unnecessarily obscures and hides errors from 'xsltproc'.

Signed-off-by: Peter Krempa 
---
 docs/migration.html.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/migration.html.in b/docs/migration.html.in
index 162c202227..dd5eddd6f4 100644
--- a/docs/migration.html.in
+++ b/docs/migration.html.in
@@ -653,6 +653,7 @@ virsh migrate --p2p --tunnelled web1 
qemu+ssh://desthost/system qemu+ssh://10.0.
   daemons or forwarding connections to these sockets manually
   (using socat, netcat or a custom piece of
   software):
+
 
 virsh migrate web1 [--p2p] --copy-storage-all 
'qemu+unix:///system?socket=/tmp/migdir/test-sock-driver' 
'unix:///tmp/migdir/test-sock-qemu' --disks-uri unix:///tmp/migdir/test-sock-nbd
 
-- 
2.26.2

[PATCH v2 5/9] docs: meson.build: Prepare for use of identical code for XSLT processing of htmls

2020-10-13 Thread Peter Krempa

Meson unfortunately doesn't give us any means to share the code using
xsltproc to output HTMLs processed by our template. This means we will
have to resort to copy engineering.

To make things simpler, let's use the same block of code in
docs/meson.build but also any of the subdirs which generate htmls.

This will be achieved by making it configurable and wrapping it in a
comment that instructs anybody editing it to keep it identical.

We need to be able to configure the template file used and installation
directory. The rest of the processing is same as we do in
docs/meson.build.

This code will then be copied to subdirs to refactor the current
approach used there.

Signed-off-by: Peter Krempa 
---
 docs/meson.build | 60 +---
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/docs/meson.build b/docs/meson.build
index 8b7c63bc6f..a915d6252a 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -193,35 +193,37 @@ docs_rst2html_gen = generator(
 )


-# docs_html_in_gen:
+# html_xslt_gen config
+
+html_xslt_gen_xslt = site_xsl
+html_xslt_gen_install_dir = docs_html_dir
+
+html_xslt_gen = []
+# html_xslt_gen:
 #   each entry is a dictionary with following items:
-#   name - base file name (required)
-#   file - generated file (required)
+#   name - base file name (required), output file will become 'name.html'
+#   file - input file (optional, 'name.html.in' assumed if missing)
 # source - source filename relative to repository root (optional, if there is 
no source)
 # depends - explicit dependency on other input (optional)
-docs_html_in_gen = []

 foreach name : docs_html_in_files
-  html_in_file = '@0...@.html.in'.format(name)
-  docs_html_in_gen += {
+  html_xslt_gen += {
 'name': name,
-'file': html_in_file,
-'source': 'docs' / html_in_file,
+'source': 'docs' / name + '.html.in',
   }
 endforeach

 foreach name : docs_rst_files
   rst_file = '@0@.rst'.format(name)
-  docs_html_in_gen += {
+  html_xslt_gen += {
 'name': name,
 'file': docs_rst2html_gen.process(rst_file),
 'source': 'docs' / rst_file,
   }
 endforeach

-docs_html_in_gen += {
-  'name': 'acl.html',
-  'file': 'acl.html.in',
+html_xslt_gen += {
+  'name': 'acl',
   'source': 'docs' / 'acl.html.in',
   'depends': aclperms_gen,
 }
@@ -247,45 +249,55 @@ hvsupport_html_in = custom_target(
 docs_api_generated,
   ],
 )
-docs_html_in_gen += {
+html_xslt_gen += {
   'name': 'hvsupport',
   'file': hvsupport_html_in,
 }

 news_html_in = docs_rst2html_gen.process(meson.source_root() / 'NEWS.rst')
-docs_html_in_gen += {
+html_xslt_gen += {
   'name': 'news',
   'file': news_html_in,
   'source': 'NEWS.rst',
 }

-foreach data : docs_html_in_gen
-  html_file = '@0@.html'.format(data['name'])
+# The following code between the markers must be kept identical with the other
+# copies of the code in various subdirs, since meson doesn't support any kind
+# of functions.
+
+# --- begin of XSLT processing ---

-  out_file = custom_target(
-html_file,
-input: data['file'],
-output: html_file,
+foreach data : html_xslt_gen
+  html_filename = data['name'] + '.html'
+
+  html_file = custom_target(
+html_filename,
+input: data.get('file', data['name'] + '.html.in'),
+output: html_filename,
 command: [
   xsltproc_prog,
   '--stringparam', 'pagesrc', data.get('source', ''),
   '--stringparam', 'builddir', meson.build_root(),
   '--stringparam', 'timestamp', docs_timestamp,
   '--nonet',
-  site_xsl,
+  html_xslt_gen_xslt,
   '@INPUT@',
 ],
 depends: data.get('depends', []),
 depend_files: [ page_xsl ],
 capture: true,
 install: true,
-install_dir: docs_html_dir,
+install_dir: html_xslt_gen_install_dir,
   )

-  install_web_deps += out_file
-  install_web_files += '@0@:@1@'.format(out_file.full_path(), docs_html_dir)
+  install_web_deps += html_file
+  install_web_files += html_file.full_path() + ':' + html_xslt_gen_install_dir
 endforeach

+html_xslt_gen = []
+
+# --- end of XSLT processing ---
+
 subdir('fonts')
 subdir('html')
 subdir('internals')
-- 
2.26.2

Re: [PATCH v2 1/4] Document and parser support for the Virtio free page reporting feature.

2020-10-13 Thread Michal Privoznik


On 10/13/20 1:35 AM, Nico Pache wrote:

This will add the proper documentation and parser support for the free page
reporting feature that is introduced in QEMU 5.1.

Signed-off-by: Nico Pache 
---
  docs/formatdomain.rst |  6 ++
  docs/schemas/domaincommon.rng |  5 +
  src/conf/domain_conf.c| 21 +
  src/conf/domain_conf.h|  1 +
  4 files changed, 33 insertions(+)

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index df5ac28028..e75b360b32 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -6755,6 +6755,12 @@ Example: manually added device with static PCI slot 2 
requested
 release some memory at the last moment before a guest's process get killed 
by
 Out of Memory killer. :since:`Since 1.3.1, QEMU and KVM only`
  
+``free-page-reporting``

+   The optional ``free-page-reporting`` attribute allows to enable/disable
+   ("on"/"off", respectively) the ability of the QEMU virtio memory balloon to
+   return unused pages back to the hypervisor to be used by other guests or
+   processes. :since:`Since 5.1, QEMU and KVM only`


This "Since" attribute should refer to Libvirt version and not qemu. The 
idea is that when I am reading libvirt docs, how to enable something in 
domain XML I want to know whether my libvirt supports it. We could also 
record QEMU version but in general we don't do that.


Michal

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Michal Privoznik


On 10/13/20 1:35 AM, Nico Pache wrote:

gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79

The virtio-balloon device now has the ability to report free pages
back to the hypervisor for reuse by other programs.

This kernel feature allows for more stable and resource friendly systems.

This feature is available in QEMU and is enabled with free-page-reporting=on
default virt setting should be off but the user should be able to enable it.

Nico Pache (4):
   Document and parser support for the Virtio free page reporting
 feature.
   QEMU: declare qemu capabilities for the Virtio Free page reporting
 feature
   QEMU: introduce Virtio free page reporting feature
   provide testing for free-page-reporting feature in QEMU

  docs/formatdomain.rst |  6 
  docs/schemas/domaincommon.rng |  5 +++
  src/conf/domain_conf.c| 21 +++
  src/conf/domain_conf.h|  1 +
  src/qemu/qemu_capabilities.c  |  2 ++
  src/qemu/qemu_capabilities.h  |  1 +
  src/qemu/qemu_command.c   |  5 +++
  src/qemu/qemu_validate.c  |  7 
  .../caps_5.1.0.x86_64.xml |  1 +
  .../caps_5.2.0.x86_64.xml |  1 +
  ...-options-memballoon-freepage-reporting.err |  1 +
  ...loon-freepage-reporting.x86_64-latest.args | 36 +++
  ...-options-memballoon-freepage-reporting.xml | 22 
  tests/qemuxml2argvtest.c  |  2 ++
  ...-options-memballoon-freepage-reporting.xml | 22 
  15 files changed, 133 insertions(+)
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
  create mode 100644 
tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml



Couple of white space problems, but nothing I couldn't fix before pushing.

Reviewed-by: Michal Privoznik 

Congratulations on your first libvirt contribution!

Michal

Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices

2020-10-13 Thread Daniel Henrique Barboza





On 10/13/20 11:06 AM, Michal Privoznik wrote:

On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote:



On 10/8/20 3:10 PM, Michal Privoznik wrote:

See 2/2 for detailed explanation.
Long story short - we can squeeze more bandwidth from TAP devices we
create for domains.

Michal Prívozník (2):
   virnetdev: Introduce virNetDevSetRootQDisc()
   qemu: Set noqueue qdisc for TAP devices



Reviewed-by: Daniel Henrique Barboza 



Thanks. Just before I merged these I realized that it will be a good idea to 
mock virNetDevSetRootQDisc() so that we don't run tc from the test suite. But 
the diff is really trivial:

diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h
index 82943b8e08..dfef49938f 100644
--- i/src/util/virnetdev.h
+++ w/src/util/virnetdev.h
@@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, const 
char *script)
  G_GNUC_NO_INLINE;

  int virNetDevSetRootQDisc(const char *ifname,
-  const char *qdisc);
+  const char *qdisc)
+    G_GNUC_NO_INLINE;

  G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree);
diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c
index 9bf4357b66..b9322f4f2a 100644
--- i/tests/qemuxml2argvmock.c
+++ w/tests/qemuxml2argvmock.c
@@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev 
G_GNUC_UNUSED,
  *cancelfd = 1731;
  return 0;
  }
+
+
+int
+virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED,
+  const char *qdisc G_GNUC_UNUSED)
+{
+    return 0;
+}


Hopefully, you are okay if I squash this to 2/2.



Yep, go ahead.






Have you tried it out with older kernels? Reading the bug I understood
that 4.2 and older (2015 kernels) does not have qdisc support and that you
can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to
verify it.



I didn't, but I had this on mind when writing patches and I made the whole thing 
best effort. Note that qemuDomainInterfaceSetDefaultQDisc() retval is not checked 
for => if setting noqueue fails (for whatever reason) then worst case scenario 
an error is printed into logs.


I consider this to be a 'nice to have' that can be added in a follow up
patch though, if applicable. By the way, do we have any documentation about
"the latest Libvirt release will not care about N+ years old kernel/QEMU"?


I'm not sure what you mean. This perhaps?

https://libvirt.org/platforms.html#linux-freebsd-and-macos



Thanks, this is what I was looking for.



DHB




Michal

Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices

2020-10-13 Thread Michal Privoznik


On 10/13/20 3:09 PM, Daniel Henrique Barboza wrote:



On 10/8/20 3:10 PM, Michal Privoznik wrote:

See 2/2 for detailed explanation.
Long story short - we can squeeze more bandwidth from TAP devices we
create for domains.

Michal Prívozník (2):
   virnetdev: Introduce virNetDevSetRootQDisc()
   qemu: Set noqueue qdisc for TAP devices



Reviewed-by: Daniel Henrique Barboza 



Thanks. Just before I merged these I realized that it will be a good 
idea to mock virNetDevSetRootQDisc() so that we don't run tc from the 
test suite. But the diff is really trivial:


diff --git i/src/util/virnetdev.h w/src/util/virnetdev.h
index 82943b8e08..dfef49938f 100644
--- i/src/util/virnetdev.h
+++ w/src/util/virnetdev.h
@@ -313,6 +313,7 @@ int virNetDevRunEthernetScript(const char *ifname, 
const char *script)

 G_GNUC_NO_INLINE;

 int virNetDevSetRootQDisc(const char *ifname,
-  const char *qdisc);
+  const char *qdisc)
+G_GNUC_NO_INLINE;

 G_DEFINE_AUTOPTR_CLEANUP_FUNC(virNetDevRxFilter, virNetDevRxFilterFree);
diff --git i/tests/qemuxml2argvmock.c w/tests/qemuxml2argvmock.c
index 9bf4357b66..b9322f4f2a 100644
--- i/tests/qemuxml2argvmock.c
+++ w/tests/qemuxml2argvmock.c
@@ -286,3 +286,11 @@ qemuBuildTPMOpenBackendFDs(const char *tpmdev 
G_GNUC_UNUSED,

 *cancelfd = 1731;
 return 0;
 }
+
+
+int
+virNetDevSetRootQDisc(const char *ifname G_GNUC_UNUSED,
+  const char *qdisc G_GNUC_UNUSED)
+{
+return 0;
+}


Hopefully, you are okay if I squash this to 2/2.



Have you tried it out with older kernels? Reading the bug I understood
that 4.2 and older (2015 kernels) does not have qdisc support and that you
can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to
verify it.



I didn't, but I had this on mind when writing patches and I made the 
whole thing best effort. Note that qemuDomainInterfaceSetDefaultQDisc() 
retval is not checked for => if setting noqueue fails (for whatever 
reason) then worst case scenario an error is printed into logs.



I consider this to be a 'nice to have' that can be added in a follow up
patch though, if applicable. By the way, do we have any documentation about
"the latest Libvirt release will not care about N+ years old kernel/QEMU"?


I'm not sure what you mean. This perhaps?

https://libvirt.org/platforms.html#linux-freebsd-and-macos

Michal

Re: [PATCH] qemu_slirp: Check if helper exists before fetching its capabilities

2020-10-13 Thread Laine Stump


On 10/13/20 9:32 AM, Michal Privoznik wrote:

I've noticed when running libvirtd in the session mode that
whenever I start a virtual machine the following error is printed
into logs:

   error : cannot execute binary /usr/bin/slirp-helper: No such file or 
directory

The error message is produced in qemuSlirpNewForHelper() which
does attempt to be a NO-OP if the helper doesn't exist by
checking if its path is NULL, but that's not usually the case
because in the default config (in virQEMUDriverConfigNew()) its
path is initialized to QEMU_SLIRP_HELPER. And while it is true
that during configure we try to get the actual path of the helper
we fallback to the path above if not found.

Fix the check so that the function checks whether the helper
exists and is executable.

Signed-off-by: Michal Privoznik 
---
  src/qemu/qemu_slirp.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c
index d2e4ed79be..4e5ab12727 100644
--- a/src/qemu/qemu_slirp.c
+++ b/src/qemu/qemu_slirp.c
@@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper)
  virJSONValuePtr featuresJSON;
  size_t i, nfeatures;
  
-if (!helper)

+if (!helper ||
+!virFileIsExecutable(helper))



I would prefer if either then entire expression was on a single line 
(since it's so short), or if you're going to put it on multiple lines, 
that you surroung the "return NULL;" with {  } (I think the coding 
standards say to do this, but the syntax-check doesn't check it)


Reviewed-by: Laine Stump 



  return NULL;
  
  slirp = qemuSlirpNew();

Re: [PATCH] qemu_slirp: Check if helper exists before fetching its capabilities

2020-10-13 Thread Daniel P . Berrangé

On Tue, Oct 13, 2020 at 03:32:28PM +0200, Michal Privoznik wrote:
> I've noticed when running libvirtd in the session mode that
> whenever I start a virtual machine the following error is printed
> into logs:
> 
>   error : cannot execute binary /usr/bin/slirp-helper: No such file or 
> directory
> 
> The error message is produced in qemuSlirpNewForHelper() which
> does attempt to be a NO-OP if the helper doesn't exist by
> checking if its path is NULL, but that's not usually the case
> because in the default config (in virQEMUDriverConfigNew()) its
> path is initialized to QEMU_SLIRP_HELPER. And while it is true
> that during configure we try to get the actual path of the helper
> we fallback to the path above if not found.
> 
> Fix the check so that the function checks whether the helper
> exists and is executable.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/qemu/qemu_slirp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c
> index d2e4ed79be..4e5ab12727 100644
> --- a/src/qemu/qemu_slirp.c
> +++ b/src/qemu/qemu_slirp.c
> @@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper)
>  virJSONValuePtr featuresJSON;
>  size_t i, nfeatures;
>  
> -if (!helper)
> +if (!helper ||
> +!virFileIsExecutable(helper))
>  return NULL;

This API has way bigger problems than this. It is reporting errors using
virReportError, but the callers ignore them and can't distinguish real
errors from expected errors.

I'm working on a more comprehensive fix and will send patch shortly


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] bhyve: implement virtio-9p support

2020-10-13 Thread Roman Bogorodskiy

  Daniel P. Berrangé wrote:

> On Thu, Oct 08, 2020 at 05:06:16PM +0400, Roman Bogorodskiy wrote:
> > Recently virtio-9p support was added to bhyve.
> > 
> > On the host side it looks this way:
> > 
> >   bhyve  -s 25:0,virtio-9p,sharename=/path/to/shared/dir
> > 
> > It could also have ",ro" suffix to make share read-only.
> > 
> > In the Linux guest, this share is mounted with:
> > 
> >   mount -t 9p sharename /mnt/sharename
> > 
> > In the guest user will see the same permissions and ownership
> > information for this directory as on the host. No uid/gid remapping is
> > supported, so those could resolve to wrong user or group names.
> > 
> > The same applies to the other side: chowning/chmodding in the guest will
> > set specified ownership and permissions on the host.
> > 
> > In libvirt domain XML it's modeled using the 'filesystem' element:
> > 
> >   
> > 
> > 
> >   
> 
> 
> > diff --git a/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml 
> > b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml
> > new file mode 100644
> > index 00..6341236654
> > --- /dev/null
> > +++ b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml
> > @@ -0,0 +1,28 @@
> > +
> > +  bhyve
> > +  df3be7e7-a104-11e3-aeb0-50e5492bd3dc
> > +  219136
> > +  1
> > +  
> > +hvm
> > +  
> > +  
> > +
> > +  
> > +  
> > +  
> > +  
> > +
> > +
> > +  
> > +  
> > +  
> > +   > function='0x0'/>
> > +
> > +
> 
> This is missing the  type="mount"  attribute which should be mandatory.
> It suggests we're not validating the type in the driver, before accessing
> the  element, which is dangerous.
> 
> > +  
> > +  
> > +  
> > +
> > +  
> > +
> 
> The other demo XML files are the same.

Hm, as I can see in the schema, type="mount" is default. That's what I
see in virDomainFSDefParseXML() @ src/conf/domain_conf.c as well.

I also check that in the driver, and there's a test for it:

tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-unsupported-type.xml

Are you referring to something different?

> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 

Roman Bogorodskiy


signature.asc
Description: PGP signature

[PATCH] qemu_slirp: Check if helper exists before fetching its capabilities

2020-10-13 Thread Michal Privoznik

I've noticed when running libvirtd in the session mode that
whenever I start a virtual machine the following error is printed
into logs:

  error : cannot execute binary /usr/bin/slirp-helper: No such file or directory

The error message is produced in qemuSlirpNewForHelper() which
does attempt to be a NO-OP if the helper doesn't exist by
checking if its path is NULL, but that's not usually the case
because in the default config (in virQEMUDriverConfigNew()) its
path is initialized to QEMU_SLIRP_HELPER. And while it is true
that during configure we try to get the actual path of the helper
we fallback to the path above if not found.

Fix the check so that the function checks whether the helper
exists and is executable.

Signed-off-by: Michal Privoznik 
---
 src/qemu/qemu_slirp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_slirp.c b/src/qemu/qemu_slirp.c
index d2e4ed79be..4e5ab12727 100644
--- a/src/qemu/qemu_slirp.c
+++ b/src/qemu/qemu_slirp.c
@@ -101,7 +101,8 @@ qemuSlirpNewForHelper(const char *helper)
 virJSONValuePtr featuresJSON;
 size_t i, nfeatures;
 
-if (!helper)
+if (!helper ||
+!virFileIsExecutable(helper))
 return NULL;
 
 slirp = qemuSlirpNew();
-- 
2.26.2

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread Daniel Henrique Barboza





On 10/12/20 8:35 PM, Nico Pache wrote:

gitlab issue: https://gitlab.com/libvirt/libvirt/-/issues/79

The virtio-balloon device now has the ability to report free pages
back to the hypervisor for reuse by other programs.

This kernel feature allows for more stable and resource friendly systems.

This feature is available in QEMU and is enabled with free-page-reporting=on
default virt setting should be off but the user should be able to enable it.

Nico Pache (4):
   Document and parser support for the Virtio free page reporting
 feature.
   QEMU: declare qemu capabilities for the Virtio Free page reporting
 feature
   QEMU: introduce Virtio free page reporting feature
   provide testing for free-page-reporting feature in QEMU



Reviewed-by: Daniel Henrique Barboza 



  docs/formatdomain.rst |  6 
  docs/schemas/domaincommon.rng |  5 +++
  src/conf/domain_conf.c| 21 +++
  src/conf/domain_conf.h|  1 +
  src/qemu/qemu_capabilities.c  |  2 ++
  src/qemu/qemu_capabilities.h  |  1 +
  src/qemu/qemu_command.c   |  5 +++
  src/qemu/qemu_validate.c  |  7 
  .../caps_5.1.0.x86_64.xml |  1 +
  .../caps_5.2.0.x86_64.xml |  1 +
  ...-options-memballoon-freepage-reporting.err |  1 +
  ...loon-freepage-reporting.x86_64-latest.args | 36 +++
  ...-options-memballoon-freepage-reporting.xml | 22 
  tests/qemuxml2argvtest.c  |  2 ++
  ...-options-memballoon-freepage-reporting.xml | 22 
  15 files changed, 133 insertions(+)
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.err
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.x86_64-latest.args
  create mode 100644 
tests/qemuxml2argvdata/virtio-options-memballoon-freepage-reporting.xml
  create mode 100644 
tests/qemuxml2xmloutdata/virtio-options-memballoon-freepage-reporting.xml

Re: [libvirt PATCH 0/4] vmx: start parsing SATA disks

2020-10-13 Thread Michal Privoznik


On 10/12/20 5:13 PM, Pino Toscano wrote:

Try to parse SATA disks in VMware guests from their configs in VMX
files. There are a couple of helper/cleanup commits to ease a bit the
actual patches.

Pino Toscano (4):
   vmx: hide private helpers
   vmx: shortcut 'cdrom-image' as CD-ROM earlier
   vmx: expand the disk array
   vmx: start parsing SATA disks

  src/libvirt_vmx.syms  |  12 -
  src/vmx/vmx.c | 212 --
  src/vmx/vmx.h |  44 
  .../vmx2xml-esx-in-the-wild-10.vmx| 101 +
  .../vmx2xml-esx-in-the-wild-10.xml|  36 +++
  .../vmx2xmldata/vmx2xml-esx-in-the-wild-8.xml |   6 +
  tests/vmx2xmltest.c   |   1 +
  7 files changed, 335 insertions(+), 77 deletions(-)
  create mode 100644 tests/vmx2xmldata/vmx2xml-esx-in-the-wild-10.vmx
  create mode 100644 tests/vmx2xmldata/vmx2xml-esx-in-the-wild-10.xml



Reviewed-by: Michal Privoznik 

Michal

Re: [PATCH 0/2] qemu: Set noqueue qdisc for TAP devices

2020-10-13 Thread Daniel Henrique Barboza





On 10/8/20 3:10 PM, Michal Privoznik wrote:

See 2/2 for detailed explanation.
Long story short - we can squeeze more bandwidth from TAP devices we
create for domains.

Michal Prívozník (2):
   virnetdev: Introduce virNetDevSetRootQDisc()
   qemu: Set noqueue qdisc for TAP devices



Reviewed-by: Daniel Henrique Barboza 


Have you tried it out with older kernels? Reading the bug I understood
that 4.2 and older (2015 kernels) does not have qdisc support and that you
can, for example, test for NETIF_F_LLTX of the ETHTOOL_GFEATURES ioctl to
verify it.

I consider this to be a 'nice to have' that can be added in a follow up
patch though, if applicable. By the way, do we have any documentation about
"the latest Libvirt release will not care about N+ years old kernel/QEMU"?


Thanks,

DHB






  src/libvirt_private.syms |  1 +
  src/qemu/qemu_command.c  |  2 ++
  src/qemu/qemu_domain.c   | 36 +++
  src/qemu/qemu_domain.h   |  4 
  src/qemu/qemu_hotplug.c  |  2 ++
  src/util/virnetdev.c | 46 
  src/util/virnetdev.h |  3 +++
  7 files changed, 94 insertions(+)

Re: [RFC] qemu: virtiofs can be used without NUMA nodes

2020-10-13 Thread Michal Privoznik


On 10/6/20 6:20 PM, Marc Hartmayer wrote:

...if a machine memory-backend using shared memory is configured for
the guest. This is especially important for QEMU machine types that
don't have NUMA but virtiofs support.

An example snippet:

   
 test
 2097152
 
   
 
 
   



   
   ...
 
 ...
   

and the corresponding QEMU command line:

   /usr/bin/qemu-system-s390x \
   -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \
   -m 2048 \
   -object
   
memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648
 \
   ...

Signed-off-by: Marc Hartmayer 
---
Note: There are still some TODOs left... e.g. adapt the virtiofs
documentation of libvirt.


Yep, but looks good.


---
  src/qemu/qemu_validate.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
index a212605579d2..077a85b30802 100644
--- a/src/qemu/qemu_validate.c
+++ b/src/qemu/qemu_validate.c
@@ -3470,14 +3470,21 @@ qemuValidateDomainDeviceDefGraphics(const 
virDomainGraphicsDef *graphics,
  
  
  static int

-qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def)
+qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def,
+  virQEMUCapsPtr qemuCaps)
  {
+const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps,
+ def->virtType,
+ 
def->os.machine);
  size_t numa_nodes = virDomainNumaGetNodeCount(def->numa);
  size_t i;
  
-if (numa_nodes == 0) {

+if (numa_nodes == 0 &&
+!(defaultRAMId && def->mem.access == VIR_DOMAIN_MEMORY_ACCESS_SHARED)) 
{
+/* TODO do we need further checks here (e.g. check whether
+ * memory backend is supported by the QEMU binary)? */


I don't think we need that. memory backends can't be compiled out (well, 
unless a distro has a patch on the top of qemu which would do exactly 
that), can defaultRAMId exposed is strictly newer than memory backends.


I think this comment can be removed and the rest can be kept as is (plus 
 the docs).


Michal

Re: [PATCH] bhyve: implement virtio-9p support

2020-10-13 Thread Daniel P . Berrangé

On Thu, Oct 08, 2020 at 05:06:16PM +0400, Roman Bogorodskiy wrote:
> Recently virtio-9p support was added to bhyve.
> 
> On the host side it looks this way:
> 
>   bhyve  -s 25:0,virtio-9p,sharename=/path/to/shared/dir
> 
> It could also have ",ro" suffix to make share read-only.
> 
> In the Linux guest, this share is mounted with:
> 
>   mount -t 9p sharename /mnt/sharename
> 
> In the guest user will see the same permissions and ownership
> information for this directory as on the host. No uid/gid remapping is
> supported, so those could resolve to wrong user or group names.
> 
> The same applies to the other side: chowning/chmodding in the guest will
> set specified ownership and permissions on the host.
> 
> In libvirt domain XML it's modeled using the 'filesystem' element:
> 
>   
> 
> 
>   


> diff --git a/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml 
> b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml
> new file mode 100644
> index 00..6341236654
> --- /dev/null
> +++ b/tests/bhyvexml2argvdata/bhyvexml2argv-fs-9p-readonly.xml
> @@ -0,0 +1,28 @@
> +
> +  bhyve
> +  df3be7e7-a104-11e3-aeb0-50e5492bd3dc
> +  219136
> +  1
> +  
> +hvm
> +  
> +  
> +
> +  
> +  
> +  
> +  
> +
> +
> +  
> +  
> +  
> +   function='0x0'/>
> +
> +

This is missing the  type="mount"  attribute which should be mandatory.
It suggests we're not validating the type in the driver, before accessing
the  element, which is dangerous.

> +  
> +  
> +  
> +
> +  
> +

The other demo XML files are the same.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2 1/4] API: introduce memory failure

2020-10-13 Thread Daniel Henrique Barboza


This patch failed to compile in my env:


FAILED: tools/virsh.p/virsh-domain.c.o
[]
-D_FUNCTION_DEF -MD -MQ tools/virsh.p/virsh-domain.c.o -MF 
tools/virsh.p/virsh-domain.c.o.d -o tools/virsh.p/virsh-domain.c.o -c 
../tools/virsh-domain.c
In file included from /usr/lib64/glib-2.0/include/glibconfig.h:9,
 from /usr/include/glib-2.0/glib/gtypes.h:32,
 from /usr/include/glib-2.0/glib/galloca.h:32,
 from /usr/include/glib-2.0/glib.h:30,
 from ../src/util/glibcompat.h:21,
 from ../src/internal.h:30,
 from ../tools/virsh.h:25,
 from ../tools/virsh-domain.h:23,
 from ../tools/virsh-domain.c:22:
/usr/include/glib-2.0/glib/gmacros.h:745:53: error: size of array 
‘_GStaticAssertCompileTimeAssertion_185’ is negative
  745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE 
(_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] 
G_GNUC_UNUSED
  | 
^~~
/usr/include/glib-2.0/glib/gmacros.h:735:47: note: in definition of macro 
‘G_PASTE_ARGS’
  735 | #define G_PASTE_ARGS(identifier1,identifier2) identifier1 ## identifier2
  |   ^~~
/usr/include/glib-2.0/glib/gmacros.h:745:44: note: in expansion of macro 
‘G_PASTE’
  745 | #define G_STATIC_ASSERT(expr) typedef char G_PASTE 
(_GStaticAssertCompileTimeAssertion_, __COUNTER__)[(expr) ? 1 : -1] 
G_GNUC_UNUSED
  |^~~
../tools/virsh-domain.c:13643:1: note: in expansion of macro ‘G_STATIC_ASSERT’
13643 | G_STATIC_ASSERT(VIR_DOMAIN_EVENT_ID_LAST == 
G_N_ELEMENTS(virshDomainEventCallbacks));
  | ^~~
[505/984] Compiling C object src/virtqemud.p/remote_remote_daemon_dispatch.c.o
ninja: build stopped: subcommand failed.
$


I didn't verify if the following patches fixes it.


Thanks,


DHB


On 10/12/20 9:31 AM, zhenwei pi wrote:

Introduce memory failure event. Libvirt should monitor domain's
event, then posts it to uplayer. According to the hardware memory
corrupted message, the cloud scheduler could migrate domain to another
health physical server.

Signed-off-by: zhenwei pi 
---
  include/libvirt/libvirt-domain.h| 82 +
  src/conf/domain_event.c | 80 
  src/conf/domain_event.h | 12 ++
  src/libvirt_private.syms|  2 +
  src/remote/remote_daemon_dispatch.c | 32 +++
  src/remote/remote_driver.c  | 32 +++
  src/remote/remote_protocol.x| 16 +++-
  src/remote_protocol-structs |  8 
  8 files changed, 263 insertions(+), 1 deletion(-)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 77f9116675..5138843a56 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -3196,6 +3196,64 @@ typedef enum {
  } virDomainEventCrashedDetailType;
  
  /**

+ * virDomainMemoryFailureRecipientType:
+ *
+ * Recipient of a memory failure event.
+ */
+typedef enum {
+/* memory failure at hypersivor memory address space */
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_HYPERVISOR = 0,
+
+/* memory failure at guest memory address space */
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_GUEST = 1,
+
+# ifdef VIR_ENUM_SENTINELS
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_LAST
+# endif
+} virDomainMemoryFailureRecipientType;
+
+
+/**
+ * virDomainMemoryFailureActionType:
+ *
+ * Action of a memory failure event.
+ */
+typedef enum {
+/* the memory failure could be ignored. This will only be the case for
+ * action-optional failures. */
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_IGNORE = 0,
+
+/* memory failure occurred in guest memory, the guest enabled MCE handling
+ * mechanism, and hypervisor could inject the MCE into the guest
+ * successfully. */
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_INJECT = 1,
+
+/* the failure is unrecoverable.  This occurs for action-required failures
+ * if the recipient is the hypervisor; hypervisor will exit. */
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_FATAL = 2,
+
+/* the failure is unrecoverable but confined to the guest. This occurs if
+ * the recipient is a guest which is not ready to handle memory failures. 
*/
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_RESET = 3,
+
+# ifdef VIR_ENUM_SENTINELS
+VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_LAST
+# endif
+} virDomainMemoryFailureActionType;
+
+
+typedef enum {
+/* whether a memory failure event is action-required or action-optional
+ * (e.g. a failure during memory scrub). */
+VIR_DOMAIN_MEMORY_FAILURE_ACTION_REQUIRED = (1 << 0),
+
+/* whether the failure occurred while the previous failure was still in
+ * progress. */
+VIR_DOMAIN_MEMORY_FAILURE_RECURSIVE = (1

[PATCH v11 02/13] copy-on-read: add filter append/drop functions

2020-10-13 Thread Andrey Shinkevich

Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 88 
 block/copy-on-read.h | 35 +
 2 files changed, 123 insertions(+)
 create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..bcccf0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BDRVStateCOR *state = bs->opaque;
+
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
false, errp);
@@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+state->active = true;
+
+/*
+ * We don't need to call bdrv_child_refresh_perms() now as the permissions
+ * will be updated later when the filter node gets its parent.
+ */
+
 return 0;
 }
 
@@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild 
*c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVStateCOR *s = bs->opaque;
+
+if (!s->active) {
+/*
+ * While the filter is being removed
+ */
+*nperm = 0;
+*nshared = BLK_PERM_ALL;
+return;
+}
+
 *nperm = perm & PERM_PASSTHROUGH;
 *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool 
locked)
 
 static BlockDriver bdrv_copy_on_read = {
 .format_name= "copy-on-read",
+.instance_size  = sizeof(BDRVStateCOR),
 
 .bdrv_open  = cor_open,
 .bdrv_child_perm= cor_child_perm,
@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
 bdrv_register(_copy_on_read);
 }
 
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+ QDict *node_options,
+ int flags, Error **errp)
+{
+BlockDriverState *cor_filter_bs;
+Error *local_err = NULL;
+
+cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
+if (cor_filter_bs == NULL) {
+error_prepend(errp, "Could not create COR-filter node: ");
+return NULL;
+}
+
+if (!qdict_get_try_str(node_options, "node-name")) {
+cor_filter_bs->implicit = true;
+}
+
+bdrv_drained_begin(bs);
+bdrv_replace_node(bs, cor_filter_bs, _err);
+bdrv_drained_end(bs);
+
+if (local_err) {
+bdrv_unref(cor_filter_bs);
+error_propagate(errp, local_err);
+return NULL;
+}
+
+return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+BdrvChild *child;
+BlockDriverState *bs;
+BDRVStateCOR *s = cor_filter_bs->opaque;
+
+child = bdrv_filter_child(cor_filter_bs);
+if (!child) {
+return;
+}
+bs = child->bs;
+
+/* Retain the BDS until we complete the graph change. */
+bdrv_ref(bs);
+/* Hold a guest back from writing while permissions are being reset. */
+bdrv_drained_begin(bs);
+/* Drop permissions before the graph change. */
+s->active = false;
+bdrv_child_refresh_perms(cor_filter_bs, child, _abort);
+bdrv_replace_node(cor_filter_bs, bs, _abort);
+
+bdrv_drained_end(bs);
+bdrv_unref(bs);
+bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 000..d6f2422
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,35 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of

[PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-13 Thread Andrey Shinkevich

We are going to use the COR-filter for a block-stream job.
To limit COR operations by the base node in the backing chain during
stream job, pass the name of overlay base node to the copy-on-read
driver as base node itself may change due to possible concurrent jobs.
The rest of the functionality will be implemented in the patch that
follows.

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index bcccf0f..c578b1b 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,24 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qdict.h"
 #include "block/copy-on-read.h"
 
 
 typedef struct BDRVStateCOR {
 bool active;
+BlockDriverState *base_overlay;
 } BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
+BlockDriverState *base_overlay = NULL;
 BDRVStateCOR *state = bs->opaque;
+/* We need the base overlay node rather than the base itself */
+const char *base_overlay_node = qdict_get_try_str(options, "base");
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
 bs->file->bs->supported_zero_flags);
 
+if (base_overlay_node) {
+qdict_del(options, "base");
+base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
+if (!base_overlay) {
+error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
+return -EINVAL;
+}
+}
 state->active = true;
+state->base_overlay = base_overlay;
 
 /*
  * We don't need to call bdrv_child_refresh_perms() now as the permissions
-- 
1.8.3.1

[PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions

2020-10-13 Thread Andrey Shinkevich

Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-on-read.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61..cb03e0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-  uint64_t offset, uint64_t bytes,
-  QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+   uint64_t offset, uint64_t bytes,
+   QEMUIOVector *qiov,
+   size_t qiov_offset,
+   int flags)
 {
-return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-  flags | BDRV_REQ_COPY_ON_READ);
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-   uint64_t offset, uint64_t bytes,
-   QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+size_t qiov_offset, int flags)
 {
-
-return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
 .bdrv_getlength = cor_getlength,
 
-.bdrv_co_preadv = cor_co_preadv,
-.bdrv_co_pwritev= cor_co_pwritev,
+.bdrv_co_preadv_part= cor_co_preadv_part,
+.bdrv_co_pwritev_part   = cor_co_pwritev_part,
 .bdrv_co_pwrite_zeroes  = cor_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = cor_co_pdiscard,
 .bdrv_co_pwritev_compressed = cor_co_pwritev_compressed,
-- 
1.8.3.1

[PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed

2020-10-13 Thread Andrey Shinkevich

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 13 +
 block/io.c   |  3 ++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b136895..278a11a 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -148,10 +148,15 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
 }
 }
 
-ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
-  local_flags);
-if (ret < 0) {
-return ret;
+if (!!(flags & BDRV_REQ_PREFETCH) &
+!(local_flags & BDRV_REQ_COPY_ON_READ)) {
+/* Skip non-guest reads if no copy needed */
+} else {
+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
 }
 
 offset += n;
diff --git a/block/io.c b/block/io.c
index 11df188..bff1808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild 
*child,
 
 max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
 if (bytes <= max_bytes && bytes <= max_transfer) {
-ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
+ flags & bs->supported_read_flags);
 goto out;
 }
 
-- 
1.8.3.1

[PATCH v11 07/13] block: include supported_read_flags into BDS structure

2020-10-13 Thread Andrey Shinkevich

Add the new member supported_read_flags to BlockDriverState structure.
It will control the BDRV_REQ_PREFETCH flag set for copy-on-read
operations.

Signed-off-by: Andrey Shinkevich 
---
 include/block/block_int.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index f782737..a142867 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -873,6 +873,10 @@ struct BlockDriverState {
 /* I/O Limits */
 BlockLimits bl;
 
+/*
+ * Flags honored during pread (so far: BDRV_REQ_PREFETCH)
+ */
+unsigned int supported_read_flags;
 /* Flags honored during pwrite (so far: BDRV_REQ_FUA,
  * BDRV_REQ_WRITE_UNCHANGED).
  * If a driver does not support BDRV_REQ_WRITE_UNCHANGED, those
-- 
1.8.3.1

[PATCH v11 11/13] stream: mark backing-file argument as deprecated

2020-10-13 Thread Andrey Shinkevich

Whereas the block-stream job starts using a backing file name of the
base node overlay after the block-stream job completes, mark the QMP
'backing-file' argument as deprecated.

Signed-off-by: Andrey Shinkevich 
---
 docs/system/deprecated.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 8b3ab5b..7491fcf 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -285,6 +285,12 @@ details.
 The ``query-events`` command has been superseded by the more powerful
 and accurate ``query-qmp-schema`` command.
 
+``block-stream`` argument ``backing-file`` (since 5.2)
+'
+
+The argument ``backing-file`` is deprecated. QEMU uses a backing file
+name of the base node overlay after the block-stream job completes.
+
 chardev client socket with ``wait`` option (since 4.0)
 ''
 
-- 
1.8.3.1

[PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter

2020-10-13 Thread Andrey Shinkevich

Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags
of the COR-filter.

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index dfbd6ad..b136895 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int 
flags,
 return -EINVAL;
 }
 
+bs->supported_read_flags = BDRV_REQ_PREFETCH;
 bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
 (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
 
-- 
1.8.3.1

[PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag

2020-10-13 Thread Andrey Shinkevich

Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich 
---
 include/block/block.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b..2b7efd1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,9 +71,10 @@ typedef enum {
 BDRV_REQ_NO_FALLBACK= 0x100,
 
 /*
- * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
- * on read request and means that caller doesn't really need data to be
- * written to qiov parameter which may be NULL.
+ * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
+ * flag or when the COR-filter applied to read operations and means that
+ * caller doesn't really need data to be written to qiov parameter which
+ * may be NULL.
  */
 BDRV_REQ_PREFETCH  = 0x200,
 /* Mask of valid flags */
-- 
1.8.3.1

[PATCH v11 00/13] Apply COR-filter to the block-stream permanently

2020-10-13 Thread Andrey Shinkevich

The iotest case test_stream_parallel still does not pass after the
COR-filter is inserted into the backing chain. As the test case may not
be initialized, it does not make a sense and was removed again.

v11:
  04: Base node overlay is used instead of base.
  05: Base node overlay is used instead of base.
  06: New.
  07: New.
  08: New.
  09: The new BDS-member 'supported_read_flags' is applied.
  10: The 'base_metadata' variable renamed to 'base_unfiltered'.
  11: New.
  12: The backing-file argument is left in the QMP interface. Warning added.
  13: The BDRV_REQ_COPY_ON_READ removed from the stream_populate();
  The 'implicit' initialization moved back to COR-filter driver.
  Base node overlay is used instead of base.

The v8 Message-Id:
<1601383109-110988-1-git-send-email-andrey.shinkev...@virtuozzo.com>

Andrey Shinkevich (13):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  copy-on-read: pass overlay base node name to COR driver
  copy-on-read: limit COR operations to base in COR driver
  block: modify the comment for BDRV_REQ_PREFETCH flag
  block: include supported_read_flags into BDS structure
  copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
  copy-on-read: skip non-guest reads if no copy needed
  stream: skip filters when writing backing file name to QCOW2 header
  stream: mark backing-file argument as deprecated
  stream: remove unused backing-file name parameter
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c   | 171 ++---
 block/copy-on-read.h   |  35 +
 block/io.c |   3 +-
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c | 112 ---
 blockdev.c |  25 +++---
 docs/system/deprecated.rst |   6 ++
 include/block/block.h  |   7 +-
 include/block/block_int.h  |  13 +++-
 qapi/block-core.json   |   6 ++
 tests/qemu-iotests/030 |  51 ++--
 tests/qemu-iotests/030.out |   4 +-
 tests/qemu-iotests/141.out |   2 +-
 tests/qemu-iotests/245 |  19 +++--
 14 files changed, 324 insertions(+), 134 deletions(-)
 create mode 100644 block/copy-on-read.h

-- 
1.8.3.1

[PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-13 Thread Andrey Shinkevich

Limit COR operations by the base node in the backing chain when the
overlay base node name is given. It will be useful for a block stream
job when the COR-filter is applied. The overlay base node is passed as
the base itself may change due to concurrent commit jobs on the same
backing chain.

Signed-off-by: Andrey Shinkevich 
---
 block/copy-on-read.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index c578b1b..dfbd6ad 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -122,8 +122,43 @@ static int coroutine_fn 
cor_co_preadv_part(BlockDriverState *bs,
size_t qiov_offset,
int flags)
 {
-return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
-   flags | BDRV_REQ_COPY_ON_READ);
+int64_t n = 0;
+int64_t size = offset + bytes;
+int local_flags;
+int ret;
+BDRVStateCOR *state = bs->opaque;
+
+if (!state->base_overlay) {
+return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+   flags | BDRV_REQ_COPY_ON_READ);
+}
+
+while (offset < size) {
+local_flags = flags;
+
+/* In case of failure, try to copy-on-read anyway */
+ret = bdrv_is_allocated(bs->file->bs, offset, bytes, );
+if (!ret) {
+ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
+  state->base_overlay, true, offset,
+  n, );
+if (ret) {
+local_flags |= BDRV_REQ_COPY_ON_READ;
+}
+}
+
+ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+  local_flags);
+if (ret < 0) {
+return ret;
+}
+
+offset += n;
+qiov_offset += n;
+bytes -= n;
+}
+
+return 0;
 }
 
 
-- 
1.8.3.1

[PATCH v11 12/13] stream: remove unused backing-file name parameter

2020-10-13 Thread Andrey Shinkevich

The 'backing-file' argument is not used by the block-stream job. It
designates a backing file name to set in QCOW2 image header after the
block-stream job finished. A backing file name of the node above base
is used instead.

Signed-off-by: Andrey Shinkevich 
---
 block/stream.c|  6 +-
 blockdev.c| 21 ++---
 include/block/block_int.h |  2 +-
 3 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 51462bd..d3e1812 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -34,7 +34,6 @@ typedef struct StreamBlockJob {
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
 BlockdevOnError on_error;
-char *backing_file_str;
 bool bs_read_only;
 bool chain_frozen;
 } StreamBlockJob;
@@ -103,8 +102,6 @@ static void stream_clean(Job *job)
 blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort);
 bdrv_reopen_set_read_only(bs, true, NULL);
 }
-
-g_free(s->backing_file_str);
 }
 
 static int coroutine_fn stream_run(Job *job, Error **errp)
@@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
 };
 
 void stream_start(const char *job_id, BlockDriverState *bs,
-  BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *base,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
   const char *filter_node_name,
@@ -295,7 +292,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 
 s->base_overlay = base_overlay;
 s->above_base = above_base;
-s->backing_file_str = g_strdup(backing_file_str);
 s->bs_read_only = bs_read_only;
 s->chain_frozen = true;
 
diff --git a/blockdev.c b/blockdev.c
index d719c47..019b6e0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2498,7 +2498,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 BlockDriverState *base_bs = NULL;
 AioContext *aio_context;
 Error *local_err = NULL;
-const char *base_name = NULL;
 int job_flags = JOB_DEFAULT;
 
 if (!has_on_error) {
@@ -2526,7 +2525,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 assert(bdrv_get_aio_context(base_bs) == aio_context);
-base_name = base;
 }
 
 if (has_base_node) {
@@ -2541,7 +2539,11 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 }
 assert(bdrv_get_aio_context(base_bs) == aio_context);
 bdrv_refresh_filename(base_bs);
-base_name = base_bs->filename;
+}
+
+if (has_backing_file) {
+warn_report("Use of \"backing-file\" argument is deprecated; "
+"a backing file of the node above base is used instead");
 }
 
 /* Check for op blockers in the whole chain between bs and base */
@@ -2553,17 +2555,6 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 }
 }
 
-/* if we are streaming the entire chain, the result will have no backing
- * file, and specifying one is therefore an error */
-if (base_bs == NULL && has_backing_file) {
-error_setg(errp, "backing file specified, but streaming the "
- "entire chain");
-goto out;
-}
-
-/* backing_file string overrides base bs filename */
-base_name = has_backing_file ? backing_file : base_name;
-
 if (has_auto_finalize && !auto_finalize) {
 job_flags |= JOB_MANUAL_FINALIZE;
 }
@@ -2571,7 +2562,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 job_flags |= JOB_MANUAL_DISMISS;
 }
 
-stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
+stream_start(has_job_id ? job_id : NULL, bs, base_bs,
  job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, _err);
 if (local_err) {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a142867..4f523c3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1151,7 +1151,7 @@ int is_windows_drive(const char *filename);
  * BlockDriverState.
  */
 void stream_start(const char *job_id, BlockDriverState *bs,
-  BlockDriverState *base, const char *backing_file_str,
+  BlockDriverState *base,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
   const char *filter_node_name,
-- 
1.8.3.1

[PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header

2020-10-13 Thread Andrey Shinkevich

Avoid writing a filter JSON-name to QCOW2 image when the backing file
is changed after the block stream job.

Signed-off-by: Andrey Shinkevich 
---
 block/stream.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index e0540ee..51462bd 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
 BlockDriverState *bs = blk_bs(bjob->blk);
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
 Error *local_err = NULL;
 int ret = 0;
 
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
 
 if (bdrv_cow_child(unfiltered_bs)) {
 const char *base_id = NULL, *base_fmt = NULL;
-if (base) {
-base_id = s->backing_file_str;
-if (base->drv) {
-base_fmt = base->drv->format_name;
+if (base_unfiltered) {
+base_id = base_unfiltered->filename;
+if (base_unfiltered->drv) {
+base_fmt = base_unfiltered->drv->format_name;
 }
 }
 bdrv_set_backing_hd(unfiltered_bs, base, _err);
-- 
1.8.3.1

[PATCH v11 13/13] block: apply COR-filter to block-stream jobs

2020-10-13 Thread Andrey Shinkevich

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich 
---
 block/stream.c | 93 +-
 tests/qemu-iotests/030 | 51 +++--
 tests/qemu-iotests/030.out |  4 +-
 tests/qemu-iotests/141.out |  2 +-
 tests/qemu-iotests/245 | 19 +++---
 5 files changed, 81 insertions(+), 88 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index d3e1812..93564db 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -17,8 +17,10 @@
 #include "block/blockjob_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
 /*
@@ -33,6 +35,8 @@ typedef struct StreamBlockJob {
 BlockJob common;
 BlockDriverState *base_overlay; /* COW overlay (stream from this) */
 BlockDriverState *above_base;   /* Node directly above the base */
+BlockDriverState *cor_filter_bs;
+BlockDriverState *target_bs;
 BlockdevOnError on_error;
 bool bs_read_only;
 bool chain_frozen;
@@ -43,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
 {
 assert(bytes < SIZE_MAX);
 
-return blk_co_preadv(blk, offset, bytes, NULL,
- BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
+return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH);
 }
 
 static void stream_abort(Job *job)
@@ -52,23 +55,20 @@ static void stream_abort(Job *job)
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
 if (s->chain_frozen) {
-BlockJob *bjob = >common;
-bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 }
 }
 
 static int stream_prepare(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
-BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
 BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
 BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
 Error *local_err = NULL;
 int ret = 0;
 
-bdrv_unfreeze_backing_chain(bs, s->above_base);
+bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
 s->chain_frozen = false;
 
 if (bdrv_cow_child(unfiltered_bs)) {
@@ -94,13 +94,14 @@ static void stream_clean(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockJob *bjob = >common;
-BlockDriverState *bs = blk_bs(bjob->blk);
+
+bdrv_cor_filter_drop(s->cor_filter_bs);
 
 /* Reopen the image back in read-only mode if necessary */
 if (s->bs_read_only) {
 /* Give up write permissions before making it read-only */
 blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, _abort);
-bdrv_reopen_set_read_only(bs, true, NULL);
+bdrv_reopen_set_read_only(s->target_bs, true, NULL);
 }
 }
 
@@ -108,9 +109,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockBackend *blk = s->common.blk;
-BlockDriverState *bs = blk_bs(blk);
-BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
-bool enable_cor = !bdrv_cow_child(s->base_overlay);
+BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
 int64_t len;
 int64_t offset = 0;
 uint64_t delay_ns = 0;
@@ -122,21 +121,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 return 0;
 }
 
-len = bdrv_getlength(bs);
+len = bdrv_getlength(s->target_bs);
 if (len < 0) {
 return len;
 }
 job_progress_set_remaining(>common.job, len);
 
-/* Turn on copy-on-read for the whole block device so that guest read
- * requests help us make progress.  Only do this when copying the entire
- *

[PATCH v11 03/13] qapi: add filter-node-name to block-stream

2020-10-13 Thread Andrey Shinkevich

Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c | 4 +++-
 blockdev.c | 4 +++-
 include/block/block_int.h  | 7 ++-
 qapi/block-core.json   | 6 ++
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4d3db5e..4e66775 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
 qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
  false, NULL, qdict_haskey(qdict, "speed"), speed, true,
- BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
- );
+ BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, 
false,
+ false, );
 
 hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index 8ce6729..e0540ee 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp)
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp)
 {
 StreamBlockJob *s;
 BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index bebd3ba..d719c47 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2489,6 +2489,7 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
+  bool has_filter_node_name, const char *filter_node_name,
   bool has_auto_finalize, bool auto_finalize,
   bool has_auto_dismiss, bool auto_dismiss,
   Error **errp)
@@ -2571,7 +2572,8 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
 }
 
 stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
- job_flags, has_speed ? speed : 0, on_error, _err);
+ job_flags, has_speed ? speed : 0, on_error,
+ filter_node_name, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d..f782737 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1134,6 +1134,9 @@ int is_windows_drive(const char *filename);
  *  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the commit job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1146,7 +1149,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
   int creation_flags, int64_t speed,
-  BlockdevOnError on_error, Error **errp);
+  BlockdevOnError on_error,
+  const char *filter_node_name,
+  Error **errp);
 
 /**
  * commit_start:
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e..32fb097 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2533,6 +2533,11 @@
 #'stop' and 'enospc' can only be used if the block device
 #supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#filter driver that the stream job inserts into the graph
+#above @device. If this option is not given, a node name is
+#autogenerated. (Since: 5.2)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it 
has
 # finished its work, waiting for @block-job-finalize before
 # making any block graph changes.
@@ -2563,6 +2568,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
 '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
 '*on-error': 'BlockdevOnError',
+

Re: Remove rpath from RPMs

2020-10-13 Thread Daniel Letai


  
  
Signed-off-by Daniel Letai d...@letai.org.il

On 13/10/2020 13:14:49, Daniel P.
  Berrangé wrote:


  On Tue, Oct 13, 2020 at 11:03:28AM +0100, Daniel P. Berrangé wrote:

  
On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote:


 Hi,

   Attached see simple patch to remove rpath during meson build, so rpmbuild
   on redhat/fedora based distros don't complain.

   Seeing as all rpaths are in standard locations, this seems safe to me.



This was supposed to have been fixed by

  commit 69980ab798e240923ef12f86a92665b6c9ff7292
  Author: Andrea Bolognani 
  Date:   Wed Aug 19 11:15:35 2020 +0200

however that overlooked that %meson macro adds --auto-features=enabled,
which caused the rpath to be force enabled in the RPM build.



  >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001
From: Daniel Letai 
Date: Sat, 10 Oct 2020 18:02:13 +0300
Subject: [PATCH] Remove rpath from rpms

---
 libvirt.spec.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libvirt.spec.in b/libvirt.spec.in
index edf919d7ba..7e356bb843 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' %{_specdir}/%{name}.spec)
-Dinit_script=systemd \
-Ddocs=enabled \
-Dtests=enabled \
+	   -Drpath=disabled \
%{?arg_login_shell}



so yeah, we need this. 

Reviewed-by: Daniel P. Berrangé 

  
  
BTW, we require contributions to be signed-off, to assert that the
author is in compliance with the Developer Certificate of Origin

   https://developercertificate.org/

Assuming you're fine with this, just reply to this mail with a
Signed-Off-By line that has your name+email addr


Regards,
Daniel


-- 
Regards,

Daniel Letai
+972 (0)505 870 456

Re: Remove rpath from RPMs

2020-10-13 Thread Daniel P . Berrangé

On Tue, Oct 13, 2020 at 11:03:28AM +0100, Daniel P. Berrangé wrote:
> On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote:
> >Hi,
> > 
> >Attached see simple patch to remove rpath during meson build, so rpmbuild
> >on redhat/fedora based distros don't complain.
> > 
> >Seeing as all rpaths are in standard locations, this seems safe to me.
> 
> This was supposed to have been fixed by
> 
>   commit 69980ab798e240923ef12f86a92665b6c9ff7292
>   Author: Andrea Bolognani 
>   Date:   Wed Aug 19 11:15:35 2020 +0200
> 
> however that overlooked that %meson macro adds --auto-features=enabled,
> which caused the rpath to be force enabled in the RPM build.
> 
> > >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001
> > From: Daniel Letai 
> > Date: Sat, 10 Oct 2020 18:02:13 +0300
> > Subject: [PATCH] Remove rpath from rpms
> > 
> > ---
> >  libvirt.spec.in | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/libvirt.spec.in b/libvirt.spec.in
> > index edf919d7ba..7e356bb843 100644
> > --- a/libvirt.spec.in
> > +++ b/libvirt.spec.in
> > @@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' 
> > %{_specdir}/%{name}.spec)
> > -Dinit_script=systemd \
> > -Ddocs=enabled \
> > -Dtests=enabled \
> > +  -Drpath=disabled \
> > %{?arg_login_shell}
> 
> so yeah, we need this. 
> 
> Reviewed-by: Daniel P. Berrangé 

BTW, we require contributions to be signed-off, to assert that the
author is in compliance with the Developer Certificate of Origin

   https://developercertificate.org/

Assuming you're fine with this, just reply to this mail with a
Signed-Off-By line that has your name+email addr


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: Remove rpath from RPMs

2020-10-13 Thread Daniel P . Berrangé

On Sat, Oct 10, 2020 at 06:11:53PM +0300, Daniel Letai wrote:
>Hi,
> 
>Attached see simple patch to remove rpath during meson build, so rpmbuild
>on redhat/fedora based distros don't complain.
> 
>Seeing as all rpaths are in standard locations, this seems safe to me.

This was supposed to have been fixed by

  commit 69980ab798e240923ef12f86a92665b6c9ff7292
  Author: Andrea Bolognani 
  Date:   Wed Aug 19 11:15:35 2020 +0200

however that overlooked that %meson macro adds --auto-features=enabled,
which caused the rpath to be force enabled in the RPM build.

> >From 6622771b2f595b293e8872a44bfceb7f2097e4a5 Mon Sep 17 00:00:00 2001
> From: Daniel Letai 
> Date: Sat, 10 Oct 2020 18:02:13 +0300
> Subject: [PATCH] Remove rpath from rpms
> 
> ---
>  libvirt.spec.in | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index edf919d7ba..7e356bb843 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -1184,6 +1184,7 @@ export SOURCE_DATE_EPOCH=$(stat --printf='%Y' 
> %{_specdir}/%{name}.spec)
> -Dinit_script=systemd \
> -Ddocs=enabled \
> -Dtests=enabled \
> +-Drpath=disabled \
> %{?arg_login_shell}

so yeah, we need this. 

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH] logging: allow max_len=0 to disable log rollover

2020-10-13 Thread Peter Krempa

On Tue, Oct 13, 2020 at 10:21:58 +0100, Daniel Berrange wrote:
> Currently setting max_len=0 causes virtlogd to spin in a busy loop. It
> is natural to allow this to disable log rollover which can be useful for
> developers debugging things.
> 
> Note disabling rollover exposes the host to denial of service from a
> malicious guest, so must be used with care.
> 
> Closes https://gitlab.com/libvirt/libvirt/-/issues/85
> Signed-off-by: Daniel P. BerrangÃ© 
> ---
>  src/logging/virtlogd.conf  |  4 
>  src/util/virrotatingfile.c | 48 +-
>  2 files changed, 30 insertions(+), 22 deletions(-)

Reviewed-by: Peter Krempa

Re: [libvirt PATCH 0/2] fixes allocation issues leading to crashes

2020-10-13 Thread Ján Tomko


On a Monday in 2020, Pavel Hrdina wrote:

Our libvirt-dbus tests discovered allocation issues introduced by recent
rewrite to use g_new0 instead of VIR_ALLOC_N by allocating array one
element shorter in some places which can lead to crashes of client
applications counting on a fact that the returned arrays are NULL
terminated.



Oops, I did not notice that my vim macro [0] dropped these (due to looking
for the ')' from the left side).
There seem to be no other cases like this in my changes where the
expression contains a parenthesis.

Thanks for fixing this.

Jano


Pavel Hrdina (2):
 conf: fix g_new0 allocation
 conf: virsecretobj: fix g_new0 allocation

src/conf/virinterfaceobj.c  | 2 +-
src/conf/virnetworkobj.c| 4 ++--
src/conf/virnodedeviceobj.c | 2 +-
src/conf/virsecretobj.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)



[0] I was not able to replace everything with Coccinelle


signature.asc
Description: PGP signature

[libvirt PATCH] logging: allow max_len=0 to disable log rollover

2020-10-13 Thread Daniel P . Berrangé

Currently setting max_len=0 causes virtlogd to spin in a busy loop. It
is natural to allow this to disable log rollover which can be useful for
developers debugging things.

Note disabling rollover exposes the host to denial of service from a
malicious guest, so must be used with care.

Closes https://gitlab.com/libvirt/libvirt/-/issues/85
Signed-off-by: Daniel P. Berrangé 
---
 src/logging/virtlogd.conf  |  4 
 src/util/virrotatingfile.c | 48 +-
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/src/logging/virtlogd.conf b/src/logging/virtlogd.conf
index 8b1ff0156f..c53a1112bd 100644
--- a/src/logging/virtlogd.conf
+++ b/src/logging/virtlogd.conf
@@ -87,6 +87,10 @@
 
 # Maximum file size before rolling over. Defaults to 2 MB
 #
+# Setting max_size to zero will disable rollover entirely.
+# NOTE: disabling rollover exposes the host filesystem to
+# denial of service from a malicious guest.
+#
 # Beware that a logrotate config file might be installed too,
 # to handle cases where virtlogd is disabled. To ensure that
 # the logrotate config is a no-op when virtlogd is running,
diff --git a/src/util/virrotatingfile.c b/src/util/virrotatingfile.c
index a88c332cf4..9f1ef17c3e 100644
--- a/src/util/virrotatingfile.c
+++ b/src/util/virrotatingfile.c
@@ -225,7 +225,8 @@ virRotatingFileWriterDelete(virRotatingFileWriterPtr file)
  *
  * The files will never exceed @maxlen bytes in size,
  * but may be rolled over before they reach this size
- * in order to avoid splitting lines
+ * in order to avoid splitting lines. If @maxlen is
+ * zero then no rollover will be performed.
  */
 virRotatingFileWriterPtr
 virRotatingFileWriterNew(const char *path,
@@ -430,25 +431,27 @@ virRotatingFileWriterAppend(virRotatingFileWriterPtr file,
 size_t towrite = len;
 bool forceRollover = false;
 
-if (file->entry->pos > file->maxlen) {
-/* If existing file is for some reason larger then max length we
- * won't write to this file anymore, but we rollover this file.*/
-forceRollover = true;
-towrite = 0;
-} else if ((file->entry->pos + towrite) > file->maxlen) {
-towrite = file->maxlen - file->entry->pos;
-
-/*
- * If there's a newline in the last 80 chars
- * we're about to write, then break at that
- * point to avoid splitting lines across
- * separate files
- */
-for (i = 0; i < towrite && i < 80; i++) {
-if (buf[towrite - i - 1] == '\n') {
-towrite -= i;
-forceRollover = true;
-break;
+if (file->maxlen != 0) {
+if (file->entry->pos > file->maxlen) {
+/* If existing file is for some reason larger then max length 
we
+ * won't write to this file anymore, but we rollover this 
file.*/
+forceRollover = true;
+towrite = 0;
+} else if ((file->entry->pos + towrite) > file->maxlen) {
+towrite = file->maxlen - file->entry->pos;
+
+/*
+ * If there's a newline in the last 80 chars
+ * we're about to write, then break at that
+ * point to avoid splitting lines across
+ * separate files
+ */
+for (i = 0; i < towrite && i < 80; i++) {
+if (buf[towrite - i - 1] == '\n') {
+towrite -= i;
+forceRollover = true;
+break;
+}
 }
 }
 }
@@ -468,8 +471,9 @@ virRotatingFileWriterAppend(virRotatingFileWriterPtr file,
 file->entry->len += towrite;
 }
 
-if ((file->entry->pos == file->maxlen && len) ||
-forceRollover) {
+if (file->maxlen != 0 &&
+((file->entry->pos == file->maxlen && len) ||
+ forceRollover)) {
 virRotatingFileWriterEntryPtr tmp;
 VIR_DEBUG("Hit max size %zu on %s (force=%d)",
   file->maxlen, file->basepath, forceRollover);
-- 
2.26.2

[PATCH v2 6/7] hyperv: fix domainManagedSave on Hyper-V V2

2020-10-13 Thread Matt Coleman

Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c  | 8 ++--
 src/hyperv/hyperv_wmi_classes.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index aeee5b2b9f..fb46f42631 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -1644,6 +1644,10 @@ hypervDomainManagedSave(virDomainPtr domain, unsigned 
int flags)
 hypervPrivate *priv = domain->conn->privateData;
 Msvm_ComputerSystem *computerSystem = NULL;
 bool in_transition = false;
+int requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED;
+
+if (priv->wmiVersion == HYPERV_WMI_VERSION_V2)
+requestedState = 
MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED_BUT_OFFLINE;
 
 virCheckFlags(0, -1);
 
@@ -1657,8 +1661,8 @@ hypervDomainManagedSave(virDomainPtr domain, unsigned int 
flags)
 goto cleanup;
 }
 
-result = hypervInvokeMsvmComputerSystemRequestStateChange
-   (domain, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED);
+result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
+  requestedState);
 
  cleanup:
 hypervFreeObject(priv, (hypervObject *)computerSystem);
diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h
index 0074d8889e..a5213901c8 100644
--- a/src/hyperv/hyperv_wmi_classes.h
+++ b/src/hyperv/hyperv_wmi_classes.h
@@ -73,6 +73,7 @@ enum _Msvm_ComputerSystem_EnabledState {
 enum _Msvm_ComputerSystem_RequestedState {
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3,
+MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED_BUT_OFFLINE = 6,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE = 9,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11,
-- 
2.27.0

[PATCH v2 7/7] news: more Hyper-V APIs

2020-10-13 Thread Matt Coleman

Signed-off-by: Matt Coleman 
---
 NEWS.rst | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/NEWS.rst b/NEWS.rst
index bc35458f38..d0454b7840 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -22,8 +22,11 @@ v6.9.0 (unreleased)
   * hyperv: implement new APIs
 
 The ``virConnectGetCapabilities()``, ``virConnectGetMaxVcpus()``,
-``virConnectGetVersion()``, and ``virDomainGetAutostart()`` APIs have been
-implemented in the Hyper-V driver.
+``virConnectGetVersion()``, ``virDomainGetAutostart()``,
+``virDomainSetAutostart()``, ``virNodeGetFreeMemory()``,
+``virDomainReboot()``, ``virDomainReset()``, ``virDomainShutdown()``, and
+``virDomainShutdownFlags()`` APIs have been implemented in the Hyper-V
+driver.
 
   * bhyve: implement virtio-9p filesystem support
 
-- 
2.27.0

[PATCH v2 4/7] hyperv: implement domainShutdown and domainShutdownFlags

2020-10-13 Thread Matt Coleman

Co-authored-by: Sri Ramanujam 
Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c| 77 ++
 src/hyperv/hyperv_wmi_generator.input | 78 +++
 2 files changed, 155 insertions(+)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index 7182340f74..024ba7c6f4 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -917,6 +917,81 @@ hypervDomainResume(virDomainPtr domain)
 
 
 
+static int
+hypervDomainShutdownFlags(virDomainPtr domain, unsigned int flags)
+{
+int result = -1;
+hypervPrivate *priv = domain->conn->privateData;
+Msvm_ComputerSystem *computerSystem = NULL;
+Msvm_ShutdownComponent *shutdown = NULL;
+bool in_transition = false;
+char uuid[VIR_UUID_STRING_BUFLEN];
+g_auto(virBuffer) query = VIR_BUFFER_INITIALIZER;
+hypervInvokeParamsListPtr params = NULL;
+g_autofree char *selector = NULL;
+
+virCheckFlags(0, -1);
+
+virUUIDFormat(domain->uuid, uuid);
+
+if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
+goto cleanup;
+
+if (!hypervIsMsvmComputerSystemActive(computerSystem, _transition) ||
+in_transition) {
+virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+   _("Domain is not active or in state transition"));
+goto cleanup;
+}
+
+virBufferEscapeSQL(,
+   MSVM_SHUTDOWNCOMPONENT_WQL_SELECT
+   "WHERE SystemName = '%s'", uuid);
+
+if (hypervGetWmiClass(Msvm_ShutdownComponent, ) < 0 ||
+!shutdown) {
+virReportError(VIR_ERR_OPERATION_FAILED,
+   _("Could not get Msvm_ShutdownComponent for domain with 
UUID '%s'"),
+   uuid);
+goto cleanup;
+}
+
+selector = g_strdup_printf(
+"CreationClassName=\"Msvm_ShutdownComponent\"=\"%s\"&"
+
"SystemCreationClassName=\"Msvm_ComputerSystem\"=\"%s\"",
+shutdown->data.common->DeviceID, uuid);
+
+params = hypervCreateInvokeParamsList(priv, "InitiateShutdown", selector,
+  Msvm_ShutdownComponent_WmiInfo);
+
+hypervAddSimpleParam(params, "Force", "False");
+hypervAddSimpleParam(params, "Reason", _("Planned shutdown via libvirt"));
+
+if (hypervInvokeMethod(priv, params, NULL) < 0) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("Could not shutdown domain with UUID '%s'"), uuid);
+goto cleanup;
+}
+
+result = 0;
+
+ cleanup:
+hypervFreeObject(priv, (hypervObject *) computerSystem);
+hypervFreeObject(priv, (hypervObject *) shutdown);
+
+return result;
+}
+
+
+
+static int
+hypervDomainShutdown(virDomainPtr domain)
+{
+return hypervDomainShutdownFlags(domain, 0);
+}
+
+
+
 static int
 hypervDomainReboot(virDomainPtr domain, unsigned int flags)
 {
@@ -2013,6 +2088,8 @@ static virHypervisorDriver hypervHypervisorDriver = {
 .domainLookupByName = hypervDomainLookupByName, /* 0.9.5 */
 .domainSuspend = hypervDomainSuspend, /* 0.9.5 */
 .domainResume = hypervDomainResume, /* 0.9.5 */
+.domainShutdown = hypervDomainShutdown, /* 6.9.0 */
+.domainShutdownFlags = hypervDomainShutdownFlags, /* 6.9.0 */
 .domainReboot = hypervDomainReboot, /* 6.9.0 */
 .domainReset = hypervDomainReset, /* 6.9.0 */
 .domainDestroy = hypervDomainDestroy, /* 0.9.5 */
diff --git a/src/hyperv/hyperv_wmi_generator.input 
b/src/hyperv/hyperv_wmi_generator.input
index bbca550790..1377138a12 100644
--- a/src/hyperv/hyperv_wmi_generator.input
+++ b/src/hyperv/hyperv_wmi_generator.input
@@ -1072,3 +1072,81 @@ class v2/Msvm_Keyboard
 uint16   Password
 boolean  UnicodeSupported
 end
+
+
+class Msvm_ShutdownComponent
+string   Caption
+string   Description
+string   ElementName
+datetime InstallDate
+string   Name
+uint16   OperationalStatus[]
+string   StatusDescriptions[]
+string   Status
+uint16   HealthState
+uint16   EnabledState
+string   OtherEnabledState
+uint16   RequestedState
+uint16   EnabledDefault
+datetime TimeOfLastStateChange
+string   SystemCreationClassName
+string   SystemName
+string   CreationClassName
+string   DeviceID
+boolean  PowerManagementSupported
+uint16   PowerManagementCapabilities[]
+uint16   Availability
+uint16   StatusInfo
+uint32   LastErrorCode
+string   ErrorDescription
+boolean  ErrorCleared
+string   OtherIdentifyingInfo[]
+uint64   PowerOnHours
+uint64   TotalPowerOnHours
+string   IdentifyingDescriptions[]
+uint16   AdditionalAvailability[]
+uint64   MaxQuiesceTime
+uint16   LocationIndicator
+end
+
+
+class v2/Msvm_ShutdownComponent
+string   InstanceID
+string   Caption
+string   Description
+string   ElementName
+datetime InstallDate
+string   Name
+uint16   OperationalStatus[]
+string

[PATCH v2 5/7] hyperv: fix domainSuspend and domainResume on Hyper-V V2

2020-10-13 Thread Matt Coleman

Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c  | 15 +++
 src/hyperv/hyperv_wmi_classes.h |  1 +
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index 024ba7c6f4..aeee5b2b9f 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -867,6 +867,10 @@ hypervDomainSuspend(virDomainPtr domain)
 int result = -1;
 hypervPrivate *priv = domain->conn->privateData;
 Msvm_ComputerSystem *computerSystem = NULL;
+int requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED;
+
+if (priv->wmiVersion == HYPERV_WMI_VERSION_V2)
+requestedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE;
 
 if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
 goto cleanup;
@@ -878,8 +882,8 @@ hypervDomainSuspend(virDomainPtr domain)
 goto cleanup;
 }
 
-result = hypervInvokeMsvmComputerSystemRequestStateChange
-   (domain, MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED);
+result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
+  requestedState);
 
  cleanup:
 hypervFreeObject(priv, (hypervObject *)computerSystem);
@@ -895,12 +899,15 @@ hypervDomainResume(virDomainPtr domain)
 int result = -1;
 hypervPrivate *priv = domain->conn->privateData;
 Msvm_ComputerSystem *computerSystem = NULL;
+int expectedState = MSVM_COMPUTERSYSTEM_ENABLEDSTATE_PAUSED;
+
+if (priv->wmiVersion == HYPERV_WMI_VERSION_V2)
+expectedState = MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE;
 
 if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
 goto cleanup;
 
-if (computerSystem->data.common->EnabledState !=
-MSVM_COMPUTERSYSTEM_ENABLEDSTATE_PAUSED) {
+if (computerSystem->data.common->EnabledState != expectedState) {
 virReportError(VIR_ERR_OPERATION_INVALID, "%s",
_("Domain is not paused"));
 goto cleanup;
diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h
index 7f4159dd8e..0074d8889e 100644
--- a/src/hyperv/hyperv_wmi_classes.h
+++ b/src/hyperv/hyperv_wmi_classes.h
@@ -73,6 +73,7 @@ enum _Msvm_ComputerSystem_EnabledState {
 enum _Msvm_ComputerSystem_RequestedState {
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3,
+MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_QUIESCE = 9,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED = 32768,
-- 
2.27.0

[PATCH v2 2/7] hyperv: implement nodeGetFreeMemory

2020-10-13 Thread Matt Coleman

Co-authored-by: Sri Ramanujam 
Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index 79b48a9dff..c4fca4685e 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -1400,6 +1400,33 @@ hypervDomainSetAutostart(virDomainPtr domain, int 
autostart)
 
 
 
+static unsigned long long
+hypervNodeGetFreeMemory(virConnectPtr conn)
+{
+unsigned long long res = 0;
+hypervPrivate *priv = conn->privateData;
+Win32_OperatingSystem *operatingSystem = NULL;
+g_auto(virBuffer) query = { 
g_string_new(WIN32_OPERATINGSYSTEM_WQL_SELECT), 0 };
+
+if (hypervGetWmiClass(Win32_OperatingSystem, ) < 0 ||
+!operatingSystem) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("Could not get free memory for host %s"),
+   conn->uri->server);
+goto cleanup;
+}
+
+/* Return free memory in bytes */
+res = operatingSystem->data.common->FreePhysicalMemory * 1024;
+
+cleanup:
+hypervFreeObject(priv, (hypervObject *) operatingSystem);
+
+return res;
+}
+
+
+
 static int
 hypervConnectIsEncrypted(virConnectPtr conn)
 {
@@ -1952,6 +1979,7 @@ static virHypervisorDriver hypervHypervisorDriver = {
 .domainCreateWithFlags = hypervDomainCreateWithFlags, /* 0.9.5 */
 .domainGetAutostart = hypervDomainGetAutostart, /* 6.9.0 */
 .domainSetAutostart = hypervDomainSetAutostart, /* 6.9.0 */
+.nodeGetFreeMemory = hypervNodeGetFreeMemory, /* 6.9.0 */
 .connectIsEncrypted = hypervConnectIsEncrypted, /* 0.9.5 */
 .connectIsSecure = hypervConnectIsSecure, /* 0.9.5 */
 .domainIsActive = hypervDomainIsActive, /* 0.9.5 */
-- 
2.27.0

[PATCH v2 3/7] hyperv: implement domainReboot and domainReset

2020-10-13 Thread Matt Coleman

Co-authored-by: Sri Ramanujam 
Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c  | 48 +
 src/hyperv/hyperv_wmi_classes.h |  1 +
 2 files changed, 49 insertions(+)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index c4fca4685e..7182340f74 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -917,6 +917,52 @@ hypervDomainResume(virDomainPtr domain)
 
 
 
+static int
+hypervDomainReboot(virDomainPtr domain, unsigned int flags)
+{
+int result = -1;
+hypervPrivate *priv = domain->conn->privateData;
+Msvm_ComputerSystem *computerSystem = NULL;
+
+virCheckFlags(0, -1);
+
+if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
+goto cleanup;
+
+result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
+MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT);
+
+ cleanup:
+hypervFreeObject(priv, (hypervObject *)computerSystem);
+
+return result;
+}
+
+
+
+static int
+hypervDomainReset(virDomainPtr domain, unsigned int flags)
+{
+int result = -1;
+hypervPrivate *priv = domain->conn->privateData;
+Msvm_ComputerSystem *computerSystem = NULL;
+
+virCheckFlags(0, -1);
+
+if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
+goto cleanup;
+
+result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
+MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET);
+
+ cleanup:
+hypervFreeObject(priv, (hypervObject *)computerSystem);
+
+return result;
+}
+
+
+
 static int
 hypervDomainDestroyFlags(virDomainPtr domain, unsigned int flags)
 {
@@ -1967,6 +2013,8 @@ static virHypervisorDriver hypervHypervisorDriver = {
 .domainLookupByName = hypervDomainLookupByName, /* 0.9.5 */
 .domainSuspend = hypervDomainSuspend, /* 0.9.5 */
 .domainResume = hypervDomainResume, /* 0.9.5 */
+.domainReboot = hypervDomainReboot, /* 6.9.0 */
+.domainReset = hypervDomainReset, /* 6.9.0 */
 .domainDestroy = hypervDomainDestroy, /* 0.9.5 */
 .domainDestroyFlags = hypervDomainDestroyFlags, /* 0.9.5 */
 .domainGetOSType = hypervDomainGetOSType, /* 0.9.5 */
diff --git a/src/hyperv/hyperv_wmi_classes.h b/src/hyperv/hyperv_wmi_classes.h
index d32711589a..7f4159dd8e 100644
--- a/src/hyperv/hyperv_wmi_classes.h
+++ b/src/hyperv/hyperv_wmi_classes.h
@@ -74,6 +74,7 @@ enum _Msvm_ComputerSystem_RequestedState {
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_ENABLED = 2,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_DISABLED = 3,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT = 10,
+MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET = 11,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_PAUSED = 32768,
 MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_SUSPENDED = 32769,
 };
-- 
2.27.0

[PATCH v2 0/7] more Hyper-V APIs

2020-10-13 Thread Matt Coleman

This set of patches adds several new APIs and fixes several others.

Changes since v1:
* 'enabledValue' and 'disabledValue' are now static strings in
  hypervDomainSetAutostart()
* a long virReportError() call has been split into two lines

Matt Coleman (7):
  hyperv: implement domainSetAutostart
  hyperv: implement nodeGetFreeMemory
  hyperv: implement domainReboot and domainReset
  hyperv: implement domainShutdown and domainShutdownFlags
  hyperv: fix domainSuspend and domainResume on Hyper-V V2
  hyperv: fix domainManagedSave on Hyper-V V2
  news: more Hyper-V APIs

 NEWS.rst  |   7 +-
 src/hyperv/hyperv_driver.c| 267 +-
 src/hyperv/hyperv_wmi_classes.h   |   3 +
 src/hyperv/hyperv_wmi_generator.input |  78 
 4 files changed, 347 insertions(+), 8 deletions(-)

-- 
2.27.0

[PATCH v2 1/7] hyperv: implement domainSetAutostart

2020-10-13 Thread Matt Coleman

Co-authored-by: Sri Ramanujam 
Signed-off-by: Matt Coleman 
---
 src/hyperv/hyperv_driver.c | 91 ++
 1 file changed, 91 insertions(+)

diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
index 2ac30fa4c6..79b48a9dff 100644
--- a/src/hyperv/hyperv_driver.c
+++ b/src/hyperv/hyperv_driver.c
@@ -1310,6 +1310,96 @@ hypervDomainGetAutostart(virDomainPtr domain, int 
*autostart)
 
 
 
+static int
+hypervDomainSetAutostart(virDomainPtr domain, int autostart)
+{
+int result = -1;
+char uuid_string[VIR_UUID_STRING_BUFLEN];
+hypervPrivate *priv = domain->conn->privateData;
+Msvm_VirtualSystemSettingData *vssd = NULL;
+hypervInvokeParamsListPtr params = NULL;
+g_auto(virBuffer) eprQuery = VIR_BUFFER_INITIALIZER;
+virHashTablePtr autostartParam = NULL;
+hypervWmiClassInfoListPtr embeddedParamClass = NULL;
+const char *methodName = NULL, *embeddedParamName = NULL;
+char enabledValue[] = "2", disabledValue[] = "0";
+
+if (priv->wmiVersion == HYPERV_WMI_VERSION_V1) {
+methodName = "ModifyVirtualSystem";
+embeddedParamName = "SystemSettingData";
+embeddedParamClass = Msvm_VirtualSystemGlobalSettingData_WmiInfo;
+} else if (priv->wmiVersion == HYPERV_WMI_VERSION_V2) {
+methodName = "ModifySystemSettings";
+embeddedParamName = "SystemSettings";
+embeddedParamClass = Msvm_VirtualSystemSettingData_WmiInfo;
+enabledValue[0] = '4';
+disabledValue[0] = '2';
+}
+
+virUUIDFormat(domain->uuid, uuid_string);
+
+if (hypervGetVSSDFromUUID(priv, uuid_string, ) < 0)
+goto cleanup;
+
+params = hypervCreateInvokeParamsList(priv, methodName,
+MSVM_VIRTUALSYSTEMMANAGEMENTSERVICE_SELECTOR,
+Msvm_VirtualSystemManagementService_WmiInfo);
+
+if (!params) {
+virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+   _("Could not create params"));
+goto cleanup;
+}
+
+if (priv->wmiVersion == HYPERV_WMI_VERSION_V1) {
+virBufferEscapeSQL(,
+   MSVM_COMPUTERSYSTEM_WQL_SELECT "WHERE Name = '%s'",
+   uuid_string);
+
+if (hypervAddEprParam(params, "ComputerSystem", priv, ,
+  Msvm_ComputerSystem_WmiInfo) < 0)
+goto params_cleanup;
+}
+
+autostartParam = hypervCreateEmbeddedParam(priv, embeddedParamClass);
+
+if (hypervSetEmbeddedProperty(autostartParam, "AutomaticStartupAction",
+  autostart ? enabledValue : disabledValue) < 0) {
+hypervFreeEmbeddedParam(autostartParam);
+goto params_cleanup;
+}
+
+if (hypervSetEmbeddedProperty(autostartParam, "InstanceID",
+  vssd->data.common->InstanceID) < 0) {
+hypervFreeEmbeddedParam(autostartParam);
+goto params_cleanup;
+}
+
+if (hypervAddEmbeddedParam(params, priv, embeddedParamName, autostartParam,
+   embeddedParamClass) < 0) {
+hypervFreeEmbeddedParam(autostartParam);
+goto params_cleanup;
+}
+
+if (hypervInvokeMethod(priv, params, NULL) < 0) {
+virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+   _("Could not set autostart"));
+goto cleanup;
+}
+
+result = 0;
+goto cleanup;
+
+ params_cleanup:
+hypervFreeInvokeParams(params);
+ cleanup:
+hypervFreeObject(priv, (hypervObject *) vssd);
+
+return result;
+}
+
+
+
 static int
 hypervConnectIsEncrypted(virConnectPtr conn)
 {
@@ -1861,6 +1951,7 @@ static virHypervisorDriver hypervHypervisorDriver = {
 .domainCreate = hypervDomainCreate, /* 0.9.5 */
 .domainCreateWithFlags = hypervDomainCreateWithFlags, /* 0.9.5 */
 .domainGetAutostart = hypervDomainGetAutostart, /* 6.9.0 */
+.domainSetAutostart = hypervDomainSetAutostart, /* 6.9.0 */
 .connectIsEncrypted = hypervConnectIsEncrypted, /* 0.9.5 */
 .connectIsSecure = hypervConnectIsSecure, /* 0.9.5 */
 .domainIsActive = hypervDomainIsActive, /* 0.9.5 */
-- 
2.27.0

Re: Overlay limit bug

2020-10-13 Thread Peter Krempa

Adding libvir-list to cc. If you come across stuff that involves both
libvirt and qemu it's best to crosspost it to libvir-list too as I've
noticed this only accidentally.

On Mon, Oct 12, 2020 at 15:03:10 -0400, Yoonho Park wrote:
> I stumbled on a bug in qemu 4.2.0 (virsh 6.0.0) with a large number of
> overlays. I am using "qemu-img create" and "virsh snapshot-create-as" to
> create each overlay. When I run "virsh snapshot-create-as" for the 42nd
> overlay, I get "error: No complete monitor response found in 10485760

This error comes from libvirt's DoS protection. The response from qemu
is too large. This happens when 'query-named-block-nodes' is called on
too-deep backing chains as the reply contains both nested and linear
entries. Thus for a N deep backing chain, you get N + N-1 + N-2 ...
entries in the returned data.

Libvirt stops reading after 1MiB of json to prevent buffer overflows
from a rogue qemu.

> bytes: Numerical result out of range". However, I pulled down qemu 5.1.50
> (still using virsh 6.0.0), and it looks like the problem has disappeared
> which is great. Does anyone know the patch set that addressed this bug?

qemu patch:

commit facda5443f5a8676fb635b82ac1046ac6b6a67ce
Author: Peter Krempa 
Date:   Mon Jan 20 09:50:49 2020 +0100

qapi: Allow getting flat output from 'query-named-block-nodes'

When a management application manages node names there's no reason to
recurse into backing images in the output of query-named-block-nodes.

Add a parameter to the command which will return just the top level
structs.

v4.2.0-1584-gfacda5443f

libvirt patches:

commit 95080cc8b470c977d53043d4eff3e30781f472eb
Author: Peter Krempa 
Date:   Tue Jan 21 16:51:40 2020 +0100

qemu: Don't request nested entries in qemuBlockGetNamedNodeData

Use the 'flat' flag for 'query-named-block-nodes' if qemu supports
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT in qemuBlockGetNamedNodeData.

We don't need the data so plumb in whether qemu supports the
'flat' output.

Signed-off-by: Peter Krempa 
Reviewed-by: JÃ¡n Tomko 

commit 855211bbf3ed45a73159f45eba1b15f05d771b76
Author: Peter Krempa 
Date:   Tue Jan 21 16:42:49 2020 +0100

qemu: monitor: Add 'flat' parameter for qemuMonitorJSONQueryNamedBlockNodes

Modern qemu allows to skip the nested redundant data in the output of
query-named-block-nodes. Plumb in the support for the argument that
enables it.

Signed-off-by: Peter Krempa 
Reviewed-by: JÃ¡n Tomko 

commit 63610bd5fbe67ba9eb4f22f67c4bdab6eda8c041
Author: Peter Krempa 
Date:   Wed Feb 26 12:50:53 2020 +0100

qemuCheckpointDiscardBitmaps: Use qemuBlockGetNamedNodeData

Replace qemuMonitorBlockGetNamedNodeData by qemuBlockGetNamedNodeData.

Signed-off-by: Peter Krempa 
Reviewed-by: JÃ¡n Tomko 

commit f886c9f33051fc03dd6c78134c92d31e7caf0c81
Author: Peter Krempa 
Date:   Tue Jan 21 16:33:12 2020 +0100

qemu: monitor: Refactor variable cleanup in 
qemuMonitorJSONQueryNamedBlockNodes

Use g_autoptr to get rid of the cleanup section.

Signed-off-by: Peter Krempa 
Reviewed-by: JÃ¡n Tomko 

commit b7991c903cdd5b3c8b79a206584a4db81a6d4eaa
Author: Peter Krempa 
Date:   Tue Jan 21 15:13:47 2020 +0100

qemu: capabilities: Add capability for the 'flat' argument of 
'query-named-block-nodes'

Detect the presence of the flag and make it available internally as
QEMU_CAPS_QMP_QUERY_NAMED_BLOCK_NODES_FLAT.

Signed-off-by: Peter Krempa 
Reviewed-by: JÃ¡n Tomko 

v6.1.0-29-g95080cc8b4

> Also, does anyone know the "official" limit on the number of overlays that
> can be created and is there a qemu test that exercises this? I could not

Libvirt's official limit is 200 layers.

This is the documentation for the validation function explaining the
rationale:

/**
 * qemuDomainStorageSourceValidateDepth:
 * @src: storage source chain to validate
 * @add: offsets the calculated number of images
 * @diskdst: optional disk target to use in error message
 *
 * The XML parser limits the maximum element nesting to 256 layers. As libvirt
 * reports the chain into the status and in some cases the config XML we must
 * validate that any user-provided chains will not exceed the XML nesting limit
 * when formatted to the XML.
 *
 * This function validates that the storage source chain starting @src is at
 * most 200 layers deep. @add modifies the calculated value to offset the number
 * to allow checking cases when new layers are going to be added to the chain.
 *
 * Returns 0 on success and -1 if the chain is too deep. Error is reported.
 */
int
qemuDomainStorageSourceValidateDepth(virStorageSourcePtr src,
 int add,
 const char *diskdst)

> find an overlay limit test in tests/qemu-iotests.

I'd guess qemu doesn't really limit that, but at some point you'll get
too much of a performance penalty form traversing all the

Re: [RFC] qemu: virtiofs can be used without NUMA nodes

2020-10-13 Thread Marc Hartmayer

On Tue, Oct 06, 2020 at 06:20 PM +0200, Marc Hartmayer  
wrote:
> ...if a machine memory-backend using shared memory is configured for
> the guest. This is especially important for QEMU machine types that
> don't have NUMA but virtiofs support.
>
> An example snippet:
>
>   
> test
> 2097152
> 
>   
> 
> 
>   
>   
>   
>   
>   
>   ...
> 
> ...
>   
>
> and the corresponding QEMU command line:
>
>   /usr/bin/qemu-system-s390x \
>   -machine s390-ccw-virtio-5.2,memory-backend=s390.ram \
>   -m 2048 \
>   -object
>   
> memory-backend-file,id=s390.ram,mem-path=/var/lib/libvirt/qemu/ram/46-test/s390.ram,share=yes,size=2147483648
>  \
>   ...
>
> Signed-off-by: Marc Hartmayer 
> ---
> Note: There are still some TODOs left... e.g. adapt the virtiofs
> documentation of libvirt.
> ---
>  src/qemu/qemu_validate.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c
> index a212605579d2..077a85b30802 100644
> --- a/src/qemu/qemu_validate.c
> +++ b/src/qemu/qemu_validate.c
> @@ -3470,14 +3470,21 @@ qemuValidateDomainDeviceDefGraphics(const 
> virDomainGraphicsDef *graphics,
>  
>  
>  static int
> -qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def)
> +qemuValidateDomainDefVirtioFSSharedMemory(const virDomainDef *def,
> +  virQEMUCapsPtr qemuCaps)
>  {
> +const char *defaultRAMId = virQEMUCapsGetMachineDefaultRAMid(qemuCaps,
> + 
> def->virtType,
> + 
> def->os.machine);
>  size_t numa_nodes = virDomainNumaGetNodeCount(def->numa);
>  size_t i;
>  
> -if (numa_nodes == 0) {
> +if (numa_nodes == 0 &&
> +!(defaultRAMId && def->mem.access == 
> VIR_DOMAIN_MEMORY_ACCESS_SHARED)) {
> +/* TODO do we need further checks here (e.g. check whether
> + * memory backend is supported by the QEMU binary)? */
>  virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> -   _("virtiofs requires one or more NUMA nodes"));
> +   _("virtiofs requires shared memory"));
>  return -1;
>  }
>  
> @@ -3591,7 +3598,7 @@ qemuValidateDomainDeviceDefFS(virDomainFSDefPtr fs,
> _("virtiofs does not support multidevs"));
>  return -1;
>  }
> -if (qemuValidateDomainDefVirtioFSSharedMemory(def) < 0)
> +if (qemuValidateDomainDefVirtioFSSharedMemory(def, qemuCaps) < 0)
>  return -1;
>  break;
>  
> -- 
> 2.25.4
>

Gentle ping :)

-- 
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen 
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

Re: [PATCH v2 4/7] hyperv: implement domainShutdown and domainShutdownFlags

2020-10-13 Thread Pino Toscano

On Tuesday, 13 October 2020 07:14:01 CEST Matt Coleman wrote:
> +hypervAddSimpleParam(params, "Force", "False");
> +hypervAddSimpleParam(params, "Reason", _("Planned shutdown via 
> libvirt"));

Is this message logged in the Hyper-V logs? If so, then I'd not
translate it: imagine an user running libvirt on an e.g. a French
locale, sending this query to an Hyper-V on e.g. German. While English
is not that different in this situation (still potentially not the
Hyper-V language), at least IMHO it gives something understandable.

Of course, in case the message is changed to an untranslated text,
then please leave a comment there, so noone accidentally makes it
translatable.

-- 
Pino Toscano

signature.asc
Description: This is a digitally signed message part.

Re: [PATCH v2 3/7] hyperv: implement domainReboot and domainReset

2020-10-13 Thread Pino Toscano

On Tuesday, 13 October 2020 07:14:00 CEST Matt Coleman wrote:
> Co-authored-by: Sri Ramanujam 
> Signed-off-by: Matt Coleman 
> ---
>  src/hyperv/hyperv_driver.c  | 48 +
>  src/hyperv/hyperv_wmi_classes.h |  1 +
>  2 files changed, 49 insertions(+)
> 
> diff --git a/src/hyperv/hyperv_driver.c b/src/hyperv/hyperv_driver.c
> index c4fca4685e..7182340f74 100644
> --- a/src/hyperv/hyperv_driver.c
> +++ b/src/hyperv/hyperv_driver.c
> @@ -917,6 +917,52 @@ hypervDomainResume(virDomainPtr domain)
>  
>  
>  
> +static int
> +hypervDomainReboot(virDomainPtr domain, unsigned int flags)
> +{
> +int result = -1;
> +hypervPrivate *priv = domain->conn->privateData;
> +Msvm_ComputerSystem *computerSystem = NULL;
> +
> +virCheckFlags(0, -1);
> +
> +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
> +goto cleanup;
> +
> +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
> +
> MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_REBOOT);
> +
> + cleanup:
> +hypervFreeObject(priv, (hypervObject *)computerSystem);
> +
> +return result;
> +}
> +
> +
> +
> +static int
> +hypervDomainReset(virDomainPtr domain, unsigned int flags)
> +{
> +int result = -1;
> +hypervPrivate *priv = domain->conn->privateData;
> +Msvm_ComputerSystem *computerSystem = NULL;
> +
> +virCheckFlags(0, -1);
> +
> +if (hypervMsvmComputerSystemFromDomain(domain, ) < 0)
> +goto cleanup;
> +
> +result = hypervInvokeMsvmComputerSystemRequestStateChange(domain,
> +
> MSVM_COMPUTERSYSTEM_REQUESTEDSTATE_RESET);
> +
> + cleanup:
> +hypervFreeObject(priv, (hypervObject *)computerSystem);
> +
> +return result;
> +}

What about making a common helper function that calls
hypervMsvmComputerSystemFromDomain +
hypervInvokeMsvmComputerSystemRequestStateChange?

Note that virDomainReboot() can take various flags, however none is
used ATM.

-- 
Pino Toscano

signature.asc
Description: This is a digitally signed message part.

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-13 Thread Sean Christopherson

On Tue, Oct 13, 2020 at 01:33:28AM -0400, harry harry wrote:
> > > Do you mean that GPAs are different from their corresponding HVAs when
> > > KVM does the walks (as you said above) in software?
> >
> > What do you mean by "different"?  GPAs and HVAs are two completely
> different
> > address spaces.
> 
> Let me give you one concrete example as follows to explain the meaning of
> ``different''.
> 
> Suppose a program is running in a single-vCPU VM. The program allocates and
> references one page (e.g., array[1024*4]). Assume that allocating and
> referencing the page in the guest OS triggers a page fault and host OS
> allocates a machine page to back it.
> 
> Assume that GVA of array[0] is 0x0021 and its corresponding GPA is
> 0x0081. I think array[0]'s corresponding HVA should also be
> 0x0081, which is the same as array[0]'s GPA. If array[0]'s HVA
> is not 0x0081, array[0]'s GPA is* different* from its
> corresponding HVA.
> 
> Now, let's assume array[0]'s GPA is different from its corresponding HVA. I
> think there might be one issue like this: I think MMU's hardware logic to
> translate ``GPA ->[extended/nested page tables] -> HPA''[1] should be the
> same as ``VA-> [page tables] -> PA"[2]; if true, how does KVM find the
> correct HPA with the different HVA (e.g., array[0]'s HVA is not
> 0x0081) when there are EPT violations?

This is where memslots come in.  Think of memslots as a one-level page tablea
that translate GPAs to HVAs.  A memslot, set by userspace, tells KVM the
corresponding HVA for a given GPA.

Before the guest is running (assuming host userspace isn't broken), the
userspace VMM will first allocate virtual memory (HVA) for all physical
memory it wants to map into the guest (GPA).  It then tells KVM how to
translate a given GPA to its HVA by creating a memslot.

To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
GPA = 0xa000.  For KVM to create a GPA->HPA mapping for the guest, there _must_
be a memslot that translates GPA 0xa000 to an HVA[*].  Let's say HVA = 0xb000.

On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
its HVA (0xb000), and then walks the host page tables to translate the HVA into
a HPA (let's say that ends up being 0xc000).  KVM then stuffs 0xc000 into the
EPT tables, which yields:

  GPA-> HVA(KVM memslots)
  0xa0000xb000

  HVA-> HPA(host page tables)
  0xb0000xc000

  GPA-> HPA(extended page tables)
  0xa0000xc000

To keep the EPT tables synchronized with the host page tables, if HVA->HPA
changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
by the host kernel that the HVA has been unmapped and will find and unmap
the corresponding GPA (again via memslots) to HPA translations.

Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
(and presumably a new HPA).

[*] If there is no memslot, KVM will exit to userspace on the EPT violation,
with some information about what GPA the guest was accessing.  This is how
emulated MMIO is implemented, e.g. userspace intentionally doesn't back a
GPA with a memslot so that it can trap guest accesses to said GPA for the
purpose of emulating a device.

> [1] Please note that this hardware walk is the last step, which only
> translates the guest physical address to the host physical address through
> the four-level nested page table.
> [2] Please note that this hardware walk assumes translating the VA to the
> PA without virtualization involvement.
> 
> Please note that the above addresses are not real and just use for
> explanations.
> 
> Thanks,
> Harry

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-13 Thread Paolo Bonzini

On 13/10/20 07:46, harry harry wrote:
> Now, let's assume array[0]'s GPA is different from its corresponding
> HVA. I think there might be one issue like this: I think MMU's hardware
> logic to translate ``GPA ->[extended/nested page tables] -> HPA''[1]
> should be the same as ``VA-> [page tables] -> PA"[2]; if true, how does
> KVM find the correct HPA with the different HVA (e.g., array[0]'s HVA is
> not  0x0081) when there are EPT violations?

It has separate data structures that help with the translation.  These
data structures are specific to KVM for GPA to HVA translation, while
for HVA to HPA the Linux functionality is reused.

> BTW, I assume the software logic for KVM to find the HPA with a given
> HVA (as you said like below) should be the same as the hardware logic in
> MMU to translate ``GPA -> [extended/nested page tables] -> HPA''.

No, the logic to find the HPA with a given HVA is the same as the
hardware logic to translate HVA -> HPA.  That is it uses the host
"regular" page tables, not the nested page tables.

In order to translate GPA to HPA, instead, KVM does not use the nested
page tables.  It performs instead two steps, from GPA to HVA and from
HVA to HPA:

* for GPA to HVA it uses a custom data structure.

* for HVA to HPA it uses the host page tables as mentioned above.

This is because:

* the GPA to HVA translation is the one that is almost always
sufficient, and the nested page tables do not provide this information

* even if GPA to HPA is needed, the nested page tables are built lazily
and therefore may not always contain the requested mapping.  In addition
using HPA requires special steps (such as calling get_page/put_page) and
often these steps need an HVA anyway.

Paolo

75 matches

Mail list logo