Re: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices

2020-08-20 Thread Jason Wang



On 2020/8/20 8:27 PM, Cornelia Huck wrote:

On Wed, 19 Aug 2020 17:28:38 +0800
Jason Wang  wrote:


On 2020/8/19 4:13 PM, Yan Zhao wrote:

On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote:

On 2020/8/19 2:59 PM, Yan Zhao wrote:

On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote:

On 2020/8/19 11:30 AM, Yan Zhao wrote:

hi All,
could we decide that sysfs is the interface that every VFIO vendor driver
needs to provide in order to support vfio live migration, otherwise the
userspace management tool would not list the device into the compatible
list?

if that's true, let's move on to standardizing the sysfs interface.
(1) content
common part: (must)
   - software_version: (in major.minor.bugfix scheme)

This cannot work for devices whose features can be negotiated/advertised
independently (e.g. virtio devices).

I thought the 'software_version' was supposed to describe kind of a
'protocol version' for the data we transmit? I.e., you add a new field,
you bump the version number.



OK, but since we mandate backward compatibility of the uABI, is it
really worth having a version for sysfs? (Searching sysfs shows no
examples like this.)

Sorry, I don't understand: why do virtio devices need to use the vfio interface?

I don't see any reason that virtio devices can't be used by VFIO. Do you?

Actually, virtio devices have been used by VFIO for many years:

- passing a hardware virtio device through to userspace (VM) drivers
- using the virtio PMD inside the guest

So, what's different about it vs. passing through a physical hardware device via VFIO?


The difference is that in the guest, the device could be either real
hardware or an emulated one.



even though the features are negotiated dynamically, could you explain
why that would cause software_version not to work?


Virtio device 1 supports feature A, B, C
Virtio device 2 supports feature B, C, D

So you can't migrate a guest from device 1 to device 2. And it's
impossible to model the features with versions.

We're talking about the features offered by the device, right? Would it
be sufficient to mandate that the target device supports the same
features or a superset of the features supported by the source device?



Yes.






  

I think this thread is discussing VFIO-related devices.

   - device_api: vfio-pci or vfio-ccw ...
   - type: mdev type for mdev device or
   a signature for physical device which is a counterpart for
   mdev type.

device api specific part: (must)
  - pci id: pci id of the mdev parent device, or pci id of the physical
pci device (when device_api is vfio-pci).

So this assumes a PCI device which is probably not true.
  

for a device_api of vfio-pci, why is it not true?

for vfio-ccw, it's subchannel_type.

OK, but having two different attributes for the same file is not a good idea.
How does mgmt know there will be a 3rd type?

that's why some attributes need to be common, e.g.:
device_api: it's common because mgmt needs to know whether it's a pci
  device or a ccw device, and the api type is already defined in vfio.h.
(The field was agreed on, and actually suggested, by Alex in a previous
mail.)
type: mdev_type for mdev. if mgmt does not understand it, it would not
be able to create a compatible mdev device.
software_version: mgmt can compare the major and minor if it understands
this field.


I think it would be helpful if you could describe, step by step, how
mgmt is expected to work with the proposed sysfs API. That would help
people understand.

My proposal would be:
- check that device_api matches
- check possible device_api specific attributes
- check that type matches [I don't think the combination of mdev types
   and another attribute to determine compatibility is a good idea;



Any reason for this? Actually, if we only use the mdev type to detect
compatibility, it would be much easier. Otherwise, we are actually
re-inventing mdev types.


E.g. can we have the same mdev type with different device_api and other
attributes?




   actually, the current proposal confuses me every time I look at it]
- check that software_version is compatible, assuming semantic
   versioning
- check possible type-specific attributes



I'm not sure whether this is too complicated. And I suspect there will be
vendor-specific attributes:


- for compatibility checks: I think we should either model everything
via mdev types or make it totally vendor-specific. Having something in
the middle will bring a lot of burden.
- for provisioning: it's still not clear. As shown in this proposal, for
NVMe we may need to set remote_url, but unless there is a subclass
(NVMe) in mdev (which I guess there is not), we can't prevent a vendor
from using another attribute name; in that case, tricks like attribute
iteration in some subdirectory won't work. So even if we had a common
API for compatibility checks, the provisioning API would still be
vendor-specific ...


Thanks






[PATCH] Modify virCPUarmCompare to perform compare actions

2020-08-20 Thread Zhenyu Zheng
Modify virCPUarmCompare in cpu_arm.c to perform
actual compare actions. Compare host cpu vendor
and model info with guest cpu as an initial implementation,
as most ARM clouds use host-passthrough mode.

Signed-off-by: Zhenyu Zheng 
---
 src/cpu/cpu_arm.c | 188 +-
 1 file changed, 184 insertions(+), 4 deletions(-)

diff --git a/src/cpu/cpu_arm.c b/src/cpu/cpu_arm.c
index 374a4d6f6c..98c5e551f5 100644
--- a/src/cpu/cpu_arm.c
+++ b/src/cpu/cpu_arm.c
@@ -118,6 +118,36 @@ virCPUarmMapNew(void)
 return map;
 }
 
+static int
+virCPUarmDataCopy(virCPUarmData *dst, const virCPUarmData *src)
+{
+if (VIR_ALLOC(dst->features) < 0)
+return -1;
+
+dst->pvr = src->pvr;
+dst->vendor_id = src->vendor_id;
+dst->features = src->features;
+
+return 0;
+}
+
+static virCPUDataPtr
+virCPUarmMakeCPUData(virArch arch,
+ virCPUarmData *data)
+{
+virCPUDataPtr cpuData;
+
+if (VIR_ALLOC(cpuData) < 0)
+return NULL;
+
+cpuData->arch = arch;
+
+if (virCPUarmDataCopy(&cpuData->data.arm, data) < 0)
+VIR_FREE(cpuData);
+
+return cpuData;
+}
+
 static void
 virCPUarmDataClear(virCPUarmData *data)
 {
@@ -376,6 +406,42 @@ virCPUarmModelParse(xmlXPathContextPtr ctxt,
 return 0;
 }
 
+static virCPUarmModelPtr
+virCPUarmModelCopy(virCPUarmModelPtr model)
+{
+virCPUarmModelPtr copy;
+
+copy = g_new0(virCPUarmModel, 1);
+copy->name = g_strdup(model->name);
+virCPUarmDataCopy(&copy->data, &model->data);
+copy->vendor = model->vendor;
+
+return g_steal_pointer(&copy);
+}
+
+static virCPUarmModelPtr
+virCPUarmModelFromCPU(const virCPUDef *cpu,
+  virCPUarmMapPtr map)
+{
+g_autoptr(virCPUarmModel) model = NULL;
+
+if (!cpu->model) {
+virReportError(VIR_ERR_INVALID_ARG, "%s",
+   _("no CPU model specified"));
+return NULL;
+}
+
+if (!(model = virCPUarmModelFind(map, cpu->model))) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+_("Unknown CPU model %s"), cpu->model);
+return NULL;
+}
+
+model = virCPUarmModelCopy(model);
+
+return g_steal_pointer(&model);
+}
+
 static virCPUarmMapPtr
 virCPUarmLoadMap(void)
 {
@@ -463,11 +529,125 @@ virCPUarmBaseline(virCPUDefPtr *cpus,
 }
 
 static virCPUCompareResult
-virCPUarmCompare(virCPUDefPtr host G_GNUC_UNUSED,
- virCPUDefPtr cpu G_GNUC_UNUSED,
- bool failMessages G_GNUC_UNUSED)
+armCompute(virCPUDefPtr host,
+   virCPUDefPtr cpu,
+   virCPUDataPtr *guestData,
+   char **message)
 {
-return VIR_CPU_COMPARE_IDENTICAL;
+virCPUarmMapPtr map = NULL;
+virCPUarmModelPtr host_model = NULL;
+virCPUarmModelPtr guest_model = NULL;
+virCPUCompareResult ret = VIR_CPU_COMPARE_ERROR;
+virArch arch;
+size_t i;
+
+if (cpu->arch != VIR_ARCH_NONE) {
+bool found = false;
+
+for (i = 0; i < G_N_ELEMENTS(archs); i++) {
+if (archs[i] == cpu->arch) {
+found = true;
+break;
+}
+}
+
+if (!found) {
+VIR_DEBUG("CPU arch %s does not match host arch",
+  virArchToString(cpu->arch));
+if (message)
+*message = g_strdup_printf(_("CPU arch %s does not match host arch"),
+   virArchToString(cpu->arch));
+
+ret = VIR_CPU_COMPARE_INCOMPATIBLE;
+goto cleanup;
+}
+arch = cpu->arch;
+} else {
+arch = host->arch;
+}
+
+if (cpu->vendor &&
+(!host->vendor || STRNEQ(cpu->vendor, host->vendor))) {
+VIR_DEBUG("host CPU vendor does not match required CPU vendor %s",
+  cpu->vendor);
+if (message) {
+*message = g_strdup_printf(_("host CPU vendor does not match required "
+ "CPU vendor %s"),
+   cpu->vendor);
+}
+
+ret = VIR_CPU_COMPARE_INCOMPATIBLE;
+goto cleanup;
+}
+
+if (!(map = virCPUarmLoadMap()))
+goto cleanup;
+
+/* Host CPU information */
+if (!(host_model = virCPUarmModelFromCPU(host, map)))
+goto cleanup;
+
+/* Guest CPU information */
+if (!(guest_model = virCPUarmModelFromCPU(cpu, map)))
+goto cleanup;
+
+if (STRNEQ(guest_model->name, host_model->name)) {
+VIR_DEBUG("host CPU model does not match required CPU model %s",
+  guest_model->name);
+if (message) {
+*message = g_strdup_printf(_("host CPU model does not match required "
+ "CPU model %s"),
+   guest_model->name);
+}
+
+ret = VIR_CPU_COMPARE_INCOMPATIBLE;
+goto cleanup;
+}
+
+if (guestData)
+if (!(*guestData = virCPUarmMakeCPUData(arch, &guest_model->data)))
+

Re: [PATCH] Xen: Improve parsing of PCI addresses in config converter

2020-08-20 Thread Daniel Henrique Barboza




On 8/10/20 8:14 PM, Jim Fehlig wrote:

There was a report on libvirt-users [1] about the domxml to/from
native converter in the Xen driver not handling PCI addresses
without a domain specification. This patch improves parsing of PCI
addresses in the converter and allows PCI addresses with only
bb:ss.f. xl.cfg(5) also allows either the :bb:ss.f or bb:ss.f
format. A test has been added to check the conversion from xl.cfg
to domXML.

[1] https://www.redhat.com/archives/libvirt-users/2020-August/msg00040.html

Signed-off-by: Jim Fehlig 
---



Reviewed-by: Daniel Henrique Barboza 



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Jim Fehlig

On 8/20/20 2:07 PM, Jim Fehlig wrote:

On 8/20/20 12:01 PM, Daniel P. Berrangé wrote:

On Thu, Aug 20, 2020 at 11:42:45AM -0600, Jim Fehlig wrote:

On 8/20/20 10:54 AM, Mark Mielke wrote:

On Thu, Aug 20, 2020 at 11:48 AM Christian Ehrhardt
<christian.ehrha...@canonical.com> wrote:

 On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke <mark.mie...@gmail.com> wrote:
  > On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt
  > <christian.ehrha...@canonical.com> wrote:
  >> This was meant for upgrades, but if libvirt would define a known path in
  >> there like /var/run/qemu/last_packaging_change the packages could easily
  >> touch it on any install/remove/update as Daniel suggested and libvirt could
  >> check this path like it does with the date of the qemu binary already.
  > Earlier in this thread - I think one or two of us had asked about the
 timestamp on the directory that contains the modules.
  > I'm wondering if a "last_packaging_change" provides any value over and
 above the timestamp of the directory itself? Wouldn't the directory be best
 - as it would work automatically for both distro packaging as well as custom
 installs?
 Sure, if
 - "list of files in module dir"
 - "stat dates of files in module dir"
 would be checked by libvirt that would also be a no-op for packaging
 and thereby land faster.


Why is the list of files and stat dates needed? Any change done by RPM
to add or remove a file from the directory, should cause the mtime for
the directory to be updated. It doesn't really matter what the change is
- it only matters that the change is detected.

The case for needing to check the files *in* the directory, would be a
concern about the modules keeping the same name, but being updated in
place. I'm doubtful that this ever occurs for real, as it would cause
breakage for running programs. Existing running programs would mmap() or
similar the binaries into memory, and cannot be updated in place.
Instead, the inode remains alive, and a new file is created with the
same name, to replace the old file, and once all file descriptors are
closed - the inode is deleted.

So, unless there is a hierarchical directory structure - I believe
checking the modules directory timestamp is near 100% accuracy for
whether modules were added, removed, or updated using "safe" deployment
practices either by RPM or "make install".


I wrote a small test program that checks mtime (also tried ctime) and it
only changes on the directory when files are added and deleted. There is no
change to either if a file in the directory has been updated. It would be
really convenient to check only the directory to see if its' contents have
changed but I'm not aware of how to do that other than with something like
inotify. Suggestions welcomed.


IIUC, Mark's point is that an RPM update won't replace the file in-place.
It will write out a new temporary file and then rename over the top of the
old file, which should trigger an update on the directory mtime.


Ah, right. I wasn't considering the behavior of package managers. My simple test 
of appending to an existing file doesn't simulate that. Indeed checking mtime on 
the directory should work.


I can work on a patch, but one last question is the location of the directory? 
It looks like qemu searches all over the place for modules: QEMU_MODULE_DIR env 
var, CONFIG_QEMU_MODDIR, directory up from the executable location, directory of 
executable location, var/run/qemu. I suppose downstreams will go with 
CONFIG_QEMU_MODDIR for module location. Should libvirt have a config option to 
specify the path, so downstream can coordinate that between the packages?


I've sent a patch that includes a 'qemu_moddir' configure option

https://www.redhat.com/archives/libvir-list/2020-August/msg00688.html

Regards,
Jim




[RFC] qemu: Check for changes in qemu modules directory

2020-08-20 Thread Jim Fehlig
Add a configuration option for specifying location of the qemu modules
directory, defaulting to /usr/lib64/qemu. Then use this location to
check for changes in the directory, indicating that a qemu module has
changed and capabilities need to be reprobed.

Signed-off-by: Jim Fehlig 
---

This is a mildly tested impl of the idea discussed here (and elsewhere
in the thread)

https://www.redhat.com/archives/libvir-list/2020-August/msg00684.html

I've checked that both system and session caches are updated when a
qemu modules package is updated. I'll do some more testing but wanted
to send the patch for early comments. Testing in other environments
would be much appreciated. Thanks!

 meson.build  |  6 ++
 meson_options.txt|  1 +
 src/qemu/qemu_capabilities.c | 35 +++
 3 files changed, 42 insertions(+)

diff --git a/meson.build b/meson.build
index a72d0c0e85..16928482aa 100644
--- a/meson.build
+++ b/meson.build
@@ -1745,6 +1745,12 @@ if not get_option('driver_qemu').disabled()
   if use_qemu
 conf.set('WITH_QEMU', 1)
 
+qemu_moddir = get_option('qemu_moddir')
+if qemu_moddir == ''
+  qemu_moddir = '/usr/lib64/qemu'
+endif
+conf.set_quoted('QEMU_MODDIR', qemu_moddir)
+
 if host_machine.system() in ['freebsd', 'darwin']
   default_qemu_user = 'root'
   default_qemu_group = 'wheel'
diff --git a/meson_options.txt b/meson_options.txt
index c538d323c1..cedbd79491 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -59,6 +59,7 @@ option('driver_openvz', type: 'feature', value: 'auto', description: 'OpenVZ driver')
 option('driver_qemu', type: 'feature', value: 'auto', description: 'QEMU/KVM driver')
 option('qemu_user', type: 'string', value: '', description: 'username to run QEMU system instance as')
 option('qemu_group', type: 'string', value: '', description: 'groupname to run QEMU system instance as')
+option('qemu_moddir', type: 'string', value: '', description: 'set the directory where QEMU modules are located')
 option('driver_remote', type: 'feature', value: 'enabled', description: 'remote driver')
 option('remote_default_mode', type: 'combo', choices: ['legacy', 'direct'], value: 'legacy', description: 'remote driver default mode')
 option('driver_secrets', type: 'feature', value: 'auto', description: 'local secrets management driver')
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index ff6ba8c9e9..a00853c753 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -677,6 +677,7 @@ struct _virQEMUCaps {
 char *binary;
 time_t ctime;
 time_t libvirtCtime;
+time_t modDirMtime;
 bool invalidation;
 
 virBitmapPtr flags;
@@ -4194,6 +4195,7 @@ virQEMUCapsParseSEVInfo(virQEMUCapsPtr qemuCaps, xmlXPathContextPtr ctxt)
 * <qemuCaps>
 *   <emulator>/some/path</emulator>
 *   <qemuctime>234235253</qemuctime>
+ *   <qemumoddirmtime>234235253</qemumoddirmtime>
 *   <selfctime>234235253</selfctime>
 *   <selfvers>1002016</selfvers>
  *   
@@ -4283,6 +4285,9 @@ virQEMUCapsLoadCache(virArch hostArch,
 }
 qemuCaps->ctime = (time_t)l;
 
+if (virXPathLongLong("string(./qemumoddirmtime)", ctxt, &l) == 0)
+qemuCaps->modDirMtime = (time_t)l;
+
 if ((n = virXPathNodeSet("./flag", ctxt, &nodes)) < 0) {
 virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
_("failed to parse qemu capabilities flags"));
@@ -4615,6 +4620,10 @@ virQEMUCapsFormatCache(virQEMUCapsPtr qemuCaps)
   qemuCaps->binary);
 virBufferAsprintf(&buf, "<qemuctime>%llu</qemuctime>\n",
   (long long)qemuCaps->ctime);
+if (qemuCaps->modDirMtime > 0) {
+virBufferAsprintf(&buf, "<qemumoddirmtime>%llu</qemumoddirmtime>\n",
+  (long long)qemuCaps->modDirMtime);
+}
 virBufferAsprintf(&buf, "<selfctime>%llu</selfctime>\n",
   (long long)qemuCaps->libvirtCtime);
 virBufferAsprintf(&buf, "<selfvers>%lu</selfvers>\n",
@@ -4881,6 +4890,23 @@ virQEMUCapsIsValid(void *data,
 if (!qemuCaps->binary)
 return true;
 
+if (virFileExists(QEMU_MODDIR)) {
+if (stat(QEMU_MODDIR, &sb) < 0) {
+VIR_DEBUG("Failed to stat QEMU module directory '%s': %s",
+  QEMU_MODDIR,
+  g_strerror(errno));
+return false;
+}
+
+if (sb.st_mtime != qemuCaps->modDirMtime) {
+VIR_DEBUG("Outdated capabilities for '%s': QEMU modules "
+  "directory '%s' changed (%lld vs %lld)",
+  qemuCaps->binary, QEMU_MODDIR,
+  (long long)sb.st_mtime, (long long)qemuCaps->modDirMtime);
+return false;
+}
+}
+
 if (qemuCaps->libvirtCtime != virGetSelfLastChanged() ||
 qemuCaps->libvirtVersion != LIBVIR_VERSION_NUMBER) {
 VIR_DEBUG("Outdated capabilities for '%s': libvirt changed "
@@ -5463,6 +5489,15 @@ virQEMUCapsNewForBinaryInternal(virArch hostArch,
 goto error;
 }
 
+if (virFileExists(QEMU_MODDIR)) {
+if (stat(QEMU_MODDIR, &sb) < 0) {
+virReportSystemError(errno, _("Cannot check QEMU 

Re: [libvirt PATCH] RFC: Add support for vDPA network devices

2020-08-20 Thread Laine Stump

On 8/18/20 2:37 PM, Jonathon Jongsma wrote:

vDPA network devices allow high-performance networking in a virtual
machine by providing a wire-speed data path. These devices require a
vendor-specific host driver but the data path follows the virtio
specification.

The support for vDPA devices was recently added to qemu. This allows
libvirt to support these devices. It requires that the device is
configured on the host with the appropriate vendor-specific driver.
This will create a chardev on the host at e.g. /dev/vhost-vdpa-0. That
chardev path can then be used to define a new interface with
type='vdpa'.
---
  docs/formatdomain.rst | 20 +
  docs/schemas/domaincommon.rng | 15 +++
  src/conf/domain_conf.c| 41 +++
  src/conf/domain_conf.h|  4 ++
  src/conf/netdev_bandwidth_conf.c  |  1 +
  src/libxl/libxl_conf.c|  1 +
  src/libxl/xen_common.c|  1 +
  src/lxc/lxc_controller.c  |  1 +
  src/lxc/lxc_driver.c  |  3 ++
  src/lxc/lxc_process.c |  1 +
  src/qemu/qemu_command.c   | 29 -
  src/qemu/qemu_command.h   |  3 +-
  src/qemu/qemu_domain.c|  6 ++-
  src/qemu/qemu_hotplug.c   | 15 ---
  src/qemu/qemu_interface.c | 25 +++
  src/qemu/qemu_interface.h |  2 +
  src/qemu/qemu_process.c   |  1 +
  src/qemu/qemu_validate.c  |  1 +
  src/vmx/vmx.c |  1 +
  .../net-vdpa.x86_64-latest.args   | 37 +
  tests/qemuxml2argvdata/net-vdpa.xml   | 28 +
  tests/qemuxml2argvmock.c  | 11 -
  tests/qemuxml2argvtest.c  |  1 +
  tests/qemuxml2xmloutdata/net-vdpa.xml | 34 +++
  tests/qemuxml2xmltest.c   |  1 +
  tools/virsh-domain.c  |  1 +
  26 files changed, 274 insertions(+), 10 deletions(-)
  create mode 100644 tests/qemuxml2argvdata/net-vdpa.x86_64-latest.args
  create mode 100644 tests/qemuxml2argvdata/net-vdpa.xml
  create mode 100644 tests/qemuxml2xmloutdata/net-vdpa.xml



I would have had fewer excuses to procrastinate in looking at this if it 
was broken up into smaller patches. At least one patch for the change to 
the XML schema/parser/formatter, and xml2xml test case, and a bit of 
docs in formatdomain, then another putting the support into qemu for 
that bit of config.




diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 8365fc8bbb..1356485504 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4632,6 +4632,26 @@ or stopping the guest.
 
 ...
  
+:anchor:``

+
+vDPA devices
+
+
+A vDPA device can be used to provide wire speed network performance within a
+domain. The host device must already be configured with the appropriate
+device-specific vDPA driver. This creates a vDPA char device (e.g.
+/dev/vhost-vdpa-0) that can be used to assign the device to a libvirt domain.



Maybe at least mention here that this only works with certain models of 
SR-IOV NICs, and that each guest vdpa uses up one SR-IOV VF on the host. 
Otherwise we'll get people seeing the "wirespeed performance" part, then 
trying to figure out how to set it up using their ISA bus NE2000 NIC or 
something :-)




+
+::
+
+   ...
+   <interface type='vdpa'>
+     <source dev='/dev/vhost-vdpa-0'/>
+   </interface>



(The above device is created (I just learned this from you in IRC!) by 
unbinding a VF from its NIC driver on the host, and re-binding it to a 
special VDPA-VF driver.)



As we were just discussing online, on one hand it could be nice if 
libvirt could automatically handle rebinding the VF to the vdpa host 
driver (given the PCI address of the VF), to make it easier to use 
(because no advance setup would be needed), similar to what's already 
done with hostdev devices (and <interface type='hostdev'>) when 
managed='yes' (which is the default setting).



On the other hand, it is exactly that managed='yes' functionality that 
has created more "libvirt-but-not-really-libvirt" bug reports than any 
other aspect of vfio device assignment, because the process of unbinding 
and rebinding drivers is timing-sensitive and causes code that's usually 
run only once at host boot-time to be run hundreds of times thus making 
it more likely to expose infrequently-hit bugs.



I just bring this up in advance of someone suggesting the addition of 
managed='yes' here to put in my vote for *not* doing it, and instead 
using that same effort to provide some sort of API in the node-device 
driver for easily creating one or more VDPA devices from VFs, which 
could be done once at host boot time, and thus avoid the level of 
"libvirt-not-libvirt" bug reports for VDPA. (and after that maybe even 

Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Mark Mielke
On Thu, Aug 20, 2020 at 2:01 PM Daniel P. Berrangé wrote:

> IIUC, Mark's point is that an RPM update won't replace the file in-placec.
> It will write out a new temporary file and then rename over the top of the
> old file, which should trigger an update on the directory mtime.
>

Yep, correct.


> Not sure what a "make install" will do with QEMU, but since QEMU is about
> to switch to meson, we might get lucky in having the same temp+rename
> behaviour.
>

I think this is the relevant section of the Makefile:

ifneq ($(CONFIG_MODULES),)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_moddir)"
for s in $(modules-m:.mo=$(DSOSUF)); do \
t="$(DESTDIR)$(qemu_moddir)/$$(echo $$s | tr / -)"; \
$(INSTALL_LIB) $$s "$$t"; \
test -z "$(STRIP)" || $(STRIP) "$$t"; \
done
endif


The INSTALL_LIB is detected by configure, and will normally be:

config-host.mak:INSTALL_LIB=install -c -m 0644


You can see here that "install" results in a new inode, which indicates a
new file:

$ cd /tmp
$ install -c -m 0644 /etc/motd motd_temp
$ ls -li motd_temp
3544372 -rw-r--r-- 1 mark mark 0 Aug 20 17:25 motd_temp
$ install -c -m 0644 /etc/motd motd_temp
$ ls -li motd_temp
3561572 -rw-r--r-- 1 mark mark 0 Aug 20 17:25 motd_temp


Once upon a time, this wasn't done - and the consequence is that things
like "make install" would break running programs with various undefined
behaviour. Now it is done "correctly" for so long, that people may have
forgotten the old days. :-)

-- 
Mark Mielke 


Re: [PATCH] lxc: Add TPM passthrough option for LXC driver

2020-08-20 Thread Daniel Henrique Barboza




On 8/16/20 1:16 PM, Julio Faracco wrote:

There is no support to use TPM for passthrough for LXC libvirt driver
this commit adds the option to use host TPM inside containers.



" for LXC libvirt driver. This commit adds the option ..."



Signed-off-by: Julio Faracco 
---
  src/lxc/lxc_cgroup.c | 27 +++
  src/lxc/lxc_controller.c | 56 
  2 files changed, 83 insertions(+)

diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
index d13f2adde5..955d2b4fc1 100644
--- a/src/lxc/lxc_cgroup.c
+++ b/src/lxc/lxc_cgroup.c
@@ -374,6 +374,33 @@ static int virLXCCgroupSetupDeviceACL(virDomainDefPtr def,
  return -1;
  }
  
+for (i = 0; i < def->ntpms; i++) {

+virDomainTPMDefPtr tpm = def->tpms[i];
+const char *dev = NULL;
+
+switch (tpm->type) {
+case VIR_DOMAIN_TPM_TYPE_EMULATOR:
+case VIR_DOMAIN_TPM_TYPE_LAST:
+break;
+case VIR_DOMAIN_TPM_TYPE_PASSTHROUGH:
+dev = "/dev/tpm0";
+break;
+}
+
+if (!dev)
+continue;



You're making an assumption here that the host TPM will always be in /dev/tpm0.
This is most likely to be the case, but it has some potential for backfiring.
I don't have access to an LXC env to see how much of a disaster this could
cause, so I'll take your word for it.

That said, since 'dev' doesn't change value and you only care about TPM
PASSTHROUGH, you can roll something like this (ps: I didn't compile it):


for (i = 0; i < def->ntpms; i++) {
virDomainTPMDefPtr tpm = def->tpms[i];
const char *dev = "/dev/tpm0";

   if (tpm->type != VIR_DOMAIN_TPM_TYPE_PASSTHROUGH)
   continue;

   // if /dev/tpm0 does not exist the whole loop can be skipped
   if (!virFileExists(dev))
   break;

   if (virCgroupAllowDevicePath(cgroup, dev,
VIR_CGROUP_DEVICE_READ,
 false) < 0)
return -1;

// even if there are more than one TPM passthrough dev, you just
// care about /dev/tpm0 and you already set the cgroup for it,
// so break out the loop.
break;
}





+
+if (!virFileExists(dev)) {
+VIR_DEBUG("Ignoring non-existent device %s", dev);
+continue;
+}
+
+if (virCgroupAllowDevicePath(cgroup, dev,
+ VIR_CGROUP_DEVICE_READ,
+ false) < 0)
+return -1;
+}
+
  VIR_DEBUG("Device ACL setup complete");
  
  return 0;

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index ae6b737b60..70ca773bbf 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -1644,6 +1644,59 @@ virLXCControllerSetupHostdevSubsysUSB(virDomainDefPtr vmDef,
  }
  
  
+static int

+virLXCControllerSetupTPM(virLXCControllerPtr ctrl)
+{
+virDomainDefPtr def = ctrl->def;
+size_t i;
+
+for (i = 0; i < def->ntpms; i++) {
+virDomainTPMDefPtr tpm = def->tpms[i];
+g_autofree char *path = NULL;
+const char *tpm_dev = NULL;
+struct stat sb;
+dev_t dev;
+
+switch (tpm->type) {
+case VIR_DOMAIN_TPM_TYPE_EMULATOR:
+case VIR_DOMAIN_TPM_TYPE_LAST:
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("unsupported timer type (name) '%s'"),



I believe you meant "unsupported TPM device" instead of timer.



+   virDomainTPMBackendTypeToString(tpm->type));
+return -1;
+case VIR_DOMAIN_TPM_TYPE_PASSTHROUGH:
+tpm_dev = "/dev/tpm0";
+path = g_strdup_printf("/%s/%s.dev/%s", LXC_STATE_DIR,
+   def->name, "/rtc");
+break;
+}
+
+if (!tpm_dev)
+continue;



I believe this code can be simplified in a similar fashion like I mentioned
up there for virLXCCgroupSetupDeviceACL.


Thanks,


DHB



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Jim Fehlig

On 8/20/20 12:01 PM, Daniel P. Berrangé wrote:

On Thu, Aug 20, 2020 at 11:42:45AM -0600, Jim Fehlig wrote:

On 8/20/20 10:54 AM, Mark Mielke wrote:

On Thu, Aug 20, 2020 at 11:48 AM Christian Ehrhardt
<christian.ehrha...@canonical.com> wrote:

 On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke <mark.mie...@gmail.com> wrote:
  > On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt
  > <christian.ehrha...@canonical.com> wrote:
  >> This was meant for upgrades, but if libvirt would define a known path in
  >> there like /var/run/qemu/last_packaging_change the packages could easily
  >> touch it on any install/remove/update as Daniel suggested and libvirt could
  >> check this path like it does with the date of the qemu binary already.
  > Earlier in this thread - I think one or two of us had asked about the
 timestamp on the directory that contains the modules.
  > I'm wondering if a "last_packaging_change" provides any value over and
 above the timestamp of the directory itself? Wouldn't the directory be best
 - as it would work automatically for both distro packaging as well as custom
 installs?
 Sure, if
 - "list of files in module dir"
 - "stat dates of files in module dir"
 would be checked by libvirt that would also be a no-op for packaging
 and thereby land faster.


Why is the list of files and stat dates needed? Any change done by RPM
to add or remove a file from the directory, should cause the mtime for
the directory to be updated. It doesn't really matter what the change is
- it only matters that the change is detected.

The case for needing to check the files *in* the directory, would be a
concern about the modules keeping the same name, but being updated in
place. I'm doubtful that this ever occurs for real, as it would cause
breakage for running programs. Existing running programs would mmap() or
similar the binaries into memory, and cannot be updated in place.
Instead, the inode remains alive, and a new file is created with the
same name, to replace the old file, and once all file descriptors are
closed - the inode is deleted.

So, unless there is a hierarchical directory structure - I believe
checking the modules directory timestamp is near 100% accuracy for
whether modules were added, removed, or updated using "safe" deployment
practices either by RPM or "make install".


I wrote a small test program that checks mtime (also tried ctime) and it
only changes on the directory when files are added and deleted. There is no
change to either if a file in the directory has been updated. It would be
really convenient to check only the directory to see if its contents have
changed but I'm not aware of how to do that other than with something like
inotify. Suggestions welcomed.


IIUC, Mark's point is that an RPM update won't replace the file in-place.
It will write out a new temporary file and then rename over the top of the
old file, which should trigger an update on the directory mtime.


Ah, right. I wasn't considering the behavior of package managers. My simple test 
of appending to an existing file doesn't simulate that. Indeed checking mtime on 
the directory should work.


I can work on a patch, but one last question is the location of the directory? 
It looks like qemu searches all over the place for modules: QEMU_MODULE_DIR env 
var, CONFIG_QEMU_MODDIR, directory up from the executable location, directory of 
executable location, var/run/qemu. I suppose downstreams will go with 
CONFIG_QEMU_MODDIR for module location. Should libvirt have a config option to 
specify the path, so downstream can coordinate that between the packages?


Regards,
Jim




Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 11:42:45AM -0600, Jim Fehlig wrote:
> On 8/20/20 10:54 AM, Mark Mielke wrote:
> > On Thu, Aug 20, 2020 at 11:48 AM Christian Ehrhardt
> > <christian.ehrha...@canonical.com> wrote:
> > 
> > On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke <mark.mie...@gmail.com> wrote:
> >  > On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt
> >  > <christian.ehrha...@canonical.com> wrote:
> >  >> This was meant for upgrades, but if libvirt would define a known path in
> >  >> there like /var/run/qemu/last_packaging_change the packages could easily
> >  >> touch it on any install/remove/update as Daniel suggested and libvirt could
> >  >> check this path like it does with the date of the qemu binary already.
> >  > Earlier in this thread - I think one or two of us had asked about the
> > timestamp on the directory that contains the modules.
> >  > I'm wondering if a "last_packaging_change" provides any value over 
> > and
> > above the timestamp of the directory itself? Wouldn't the directory be 
> > best
> > - as it would work automatically for both distro packaging as well as 
> > custom
> > installs?
> > Sure, if
> > - "list of files in module dir"
> > - "stat dates of files in module dir"
> > would be checked by libvirt that would also be a no-op for packaging
> > and thereby land faster.
> > 
> > 
> > Why is the list of files and stat dates needed? Any change done by RPM
> > to add or remove a file from the directory, should cause the mtime for
> > the directory to be updated. It doesn't really matter what the change is
> > - it only matters that the change is detected.
> > 
> > The case for needing to check the files *in* the directory, would be a
> > concern about the modules keeping the same name, but being updated in
> > place. I'm doubtful that this ever occurs for real, as it would cause
> > breakage for running programs. Existing running programs would mmap() or
> > similar the binaries into memory, and cannot be updated in place.
> > Instead, the inode remains alive, and a new file is created with the
> > same name, to replace the old file, and once all file descriptors are
> > closed - the inode is deleted.
> > 
> > So, unless there is a hierarchical directory structure - I believe
> > checking the modules directory timestamp is near 100% accuracy for
> > whether modules were added, removed, or updated using "safe" deployment
> > practices either by RPM or "make install".
> 
> I wrote a small test program that checks mtime (also tried ctime) and it
> only changes on the directory when files are added and deleted. There is no
> change to either if a file in the directory has been updated. It would be
really convenient to check only the directory to see if its contents have
> changed but I'm not aware of how to do that other than with something like
> inotify. Suggestions welcomed.

IIUC, Mark's point is that an RPM update won't replace the file in-place.
It will write out a new temporary file and then rename over the top of the
old file, which should trigger an update on the directory mtime.

Not sure what a "make install" will do with QEMU, but since QEMU is about
to switch to meson, we might get lucky in having the same temp+rename
behaviour.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Jim Fehlig

On 8/20/20 10:54 AM, Mark Mielke wrote:
On Thu, Aug 20, 2020 at 11:48 AM Christian Ehrhardt 
mailto:christian.ehrha...@canonical.com>> wrote:


On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke mailto:mark.mie...@gmail.com>> wrote:
 > On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt
mailto:christian.ehrha...@canonical.com>>
wrote:
 >> This was meant for upgrades, but if libvirt would define a known path in
 >> there like /var/run/qemu/last_packaging_change the packages could easily
 >> touch it on any install/remove/update as Daniel suggested and libvirt 
could
 >> check this path like it does with the date of the qemu binary already.
 > Earlier in this thread - I think one or two of us had asked about the
timestamp on the directory that contains the modules.
 > I'm wondering if a "last_packaging_change" provides any value over and
above the timestamp of the directory itself? Wouldn't the directory be best
- as it would work automatically for both distro packaging as well as custom
installs?
Sure, if
- "list of files in module dir"
- "stat dates of files in module dir"
would be checked by libvirt that would also be a no-op for packaging
and thereby land faster.


Why is the list of files and stat dates needed? Any change done by RPM to add or 
remove a file from the directory, should cause the mtime for the directory to be 
updated. It doesn't really matter what the change is - it only matters that the 
change is detected.


The case for needing to check the files *in* the directory, would be a concern 
about the modules keeping the same name, but being updated in place. I'm 
doubtful that this ever occurs for real, as it would cause breakage for running 
programs. Existing running programs would mmap() or similar the binaries into 
memory, and cannot be updated in place. Instead, the inode remains alive, and a 
new file is created with the same name, to replace the old file, and once all 
file descriptors are closed - the inode is deleted.


So, unless there is a hierarchical directory structure - I believe checking the 
modules directory timestamp is near 100% accuracy for whether modules were 
added, removed, or updated using "safe" deployment practices either by RPM or 
"make install".


I wrote a small test program that checks mtime (also tried ctime) and it only 
changes on the directory when files are added and deleted. There is no change to 
either if a file in the directory has been updated. It would be really 
convenient to check only the directory to see if its contents have changed but 
I'm not aware of how to do that other than with something like inotify. 
Suggestions welcomed.


Regards,
Jim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
struct stat sb;
time_t curtime;

if (stat("/home/jfehlig/virt/temp", &sb) < 0) {
printf("stat failed with error %s\n", strerror(errno));
exit(1);
}

curtime = sb.st_mtime;
printf("Got initial mtime on directory, looking for changes...\n");
while (1) {
sleep(5);
if (stat("/home/jfehlig/virt/temp", &sb) < 0) {
printf("stat failed with error %s\n", strerror(errno));
exit(1);
}

if (curtime != sb.st_mtime) {
printf("The directory has changed!\n");
curtime = sb.st_mtime;
}
}

exit(0);
}


Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Mark Mielke
On Thu, Aug 20, 2020 at 11:48 AM Christian Ehrhardt <
christian.ehrha...@canonical.com> wrote:

> On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke  wrote:
> > On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt <
> christian.ehrha...@canonical.com> wrote:
> >> This was meant for upgrades, but if libvirt would define a known path in
> >> there like /var/run/qemu/last_packaging_change the packages could easily
> >> touch it on any install/remove/update as Daniel suggested and libvirt
> could
> >> check this path like it does with the date of the qemu binary already.
> > Earlier in this thread - I think one or two of us had asked about the
> timestamp on the directory that contains the modules.
> > I'm wondering if a "last_packaging_change" provides any value over and
> above the timestamp of the directory itself? Wouldn't the directory be best
> - as it would work automatically for both distro packaging as well as
> custom installs?
> Sure, if
> - "list of files in module dir"
> - "stat dates of files in module dir"
> would be checked by libvirt that would also be a no-op for packaging
> and thereby land faster.
>

Why is the list of files and stat dates needed? Any change done by RPM to
add or remove a file from the directory, should cause the mtime for the
directory to be updated. It doesn't really matter what the change is - it
only matters that the change is detected.

The case for needing to check the files *in* the directory, would be a
concern about the modules keeping the same name, but being updated in
place. I'm doubtful that this ever occurs for real, as it would cause
breakage for running programs. Existing running programs would mmap() or
similar the binaries into memory, and cannot be updated in place. Instead,
the inode remains alive, and a new file is created with the same name, to
replace the old file, and once all file descriptors are closed - the inode
is deleted.

So, unless there is a hierarchical directory structure - I believe checking
the modules directory timestamp is near 100% accuracy for whether modules
were added, removed, or updated using "safe" deployment practices either by
RPM or "make install".


> But I guess there were reasons against it in the discussion before?
>

I don't think there was a follow up discussion on the idea. I think that is
here. :-)


-- 
Mark Mielke 


Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Christian Ehrhardt
On Thu, Aug 20, 2020 at 5:15 PM Mark Mielke  wrote:
>
> On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt 
>  wrote:
>>
>> On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:
>> > The simplest approach is to touch the qemu binaries. We discussed this
>> > already. It has the drawback that it makes "rpm -V" complain about
>> > wrong timestamps. It might also confuse backup software. Still, it
>> > might be a viable short-term workaround if nothing else is available.
>>
>> Qemu already allows to save modules in /var/run/qemu/ [1] to better handle
>> module upgrades which is already used in Debian and Ubuntu to avoid
>> late module load errors after upgrades.
>>
>> This was meant for upgrades, but if libvirt would define a known path in
>> there like /var/run/qemu/last_packaging_change the packages could easily
>> touch it on any install/remove/update as Daniel suggested and libvirt could
>> check this path like it does with the date of the qemu binary already.
>>
>> [1]: 
>> https://github.com/qemu/qemu/commit/bd83c861c0628a64997b7bd95c3bcc2e916baf2e
>
>
> Earlier in this thread - I think one or two of us had asked about the 
> timestamp on the directory that contains the modules.
>
> I'm wondering if a "last_packaging_change" provides any value over and above 
> the timestamp of the directory itself? Wouldn't the directory be best - as it 
> would work automatically for both distro packaging as well as custom installs?

Sure, if
- "list of files in module dir"
- "stat dates of files in module dir"
would be checked by libvirt that would also be a no-op for packaging
and thereby land faster.

But I guess there were reasons against it in the discussion before?

> --
> Mark Mielke 
>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd



Re: [libvirt PATCH v2 03/16] nodedev: Add ability to filter by active state

2020-08-20 Thread Erik Skultety
On Tue, Aug 18, 2020 at 09:47:53AM -0500, Jonathon Jongsma wrote:
> Add two flag values for virConnectListAllNodeDevices() so that we can
> list only node devices that are active or inactive.
>
> Signed-off-by: Jonathon Jongsma 
> ---
...

> diff --git a/src/conf/virnodedeviceobj.c b/src/conf/virnodedeviceobj.c
> index 88a7746b27..9838513b69 100644
> --- a/src/conf/virnodedeviceobj.c
> +++ b/src/conf/virnodedeviceobj.c
> @@ -865,6 +865,16 @@ virNodeDeviceObjMatch(virNodeDeviceObjPtr obj,
>  return false;
>  }
>
> +#undef MATCH
> +#define MATCH(FLAG) (flags & (FLAG))
> +
> +if (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_FILTERS_ACTIVE) &&
> +!((MATCH(VIR_CONNECT_LIST_NODE_DEVICES_ACTIVE) &&
> +   virNodeDeviceObjIsActive(obj)) ||
> +  (MATCH(VIR_CONNECT_LIST_NODE_DEVICES_INACTIVE) &&
> +   !virNodeDeviceObjIsActive(obj
> +return false;
> +
>  return true;
>  }
>  #undef MATCH

^This produces a significantly smaller diff, but I'm not really a fan of
re-defining the same macro in a different way; it can end up being confusing,
since it's a different macro. I suggest renaming the original MATCH
macro to something like MATCH_CAP(CAP) and defining this new one as MATCH.

ACK to the rest.

Erik



Re: [libvirt PATCH v2 02/16] nodedev: introduce concept of 'active' node devices

2020-08-20 Thread Erik Skultety
On Tue, Aug 18, 2020 at 09:47:52AM -0500, Jonathon Jongsma wrote:
> we will be able to define mediated devices that can be started or
> stopped, so we need to be able to indicate whether the device is active
> or not, similar to other resources (storage pools, domains, etc.)
>
> Signed-off-by: Jonathon Jongsma 
> ---
Reviewed-by: Erik Skultety 



Re: [libvirt PATCH v2 01/16] tests: remove extra trailing semicolon

2020-08-20 Thread Erik Skultety
On Tue, Aug 18, 2020 at 09:47:51AM -0500, Jonathon Jongsma wrote:
> The macro should not have a trailing semicolon so that when the macro is
> used, the user can add a semicolon themselves.
>
> Signed-off-by: Jonathon Jongsma 
> ---
Reviewed-by: Erik Skultety 

I'll merge this right away.



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Mark Mielke
On Thu, Aug 20, 2020 at 8:55 AM Christian Ehrhardt <
christian.ehrha...@canonical.com> wrote:

> On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:
> > The simplest approach is to touch the qemu binaries. We discussed this
> > already. It has the drawback that it makes "rpm -V" complain about
> > wrong timestamps. It might also confuse backup software. Still, it
> > might be a viable short-term workaround if nothing else is available.
>
> Qemu already allows to save modules in /var/run/qemu/ [1] to better handle
> module upgrades which is already used in Debian and Ubuntu to avoid
> late module load errors after upgrades.
>
> This was meant for upgrades, but if libvirt would define a known path in
> there like /var/run/qemu/last_packaging_change the packages could easily
> touch it on any install/remove/update as Daniel suggested and libvirt could
> check this path like it does with the date of the qemu binary already.
>
> [1]:
> https://github.com/qemu/qemu/commit/bd83c861c0628a64997b7bd95c3bcc2e916baf2e
>

Earlier in this thread - I think one or two of us had asked about the
timestamp on the directory that contains the modules.

I'm wondering if a "last_packaging_change" provides any value over and
above the timestamp of the directory itself? Wouldn't the directory be best
- as it would work automatically for both distro packaging as well as
custom installs?

-- 
Mark Mielke 


Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 09:05:56AM -0600, Jim Fehlig wrote:
> On 8/20/20 8:56 AM, Jim Fehlig wrote:
> > On 8/20/20 6:54 AM, Christian Ehrhardt wrote:
> > > On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:
> > > > 
> > > > On Thu, 2020-08-20 at 11:00 +0100, Daniel P. Berrangé wrote:
> > > > > On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:
> > > > > > contents change.
> > > > > > 
> > > > > > That said, if upgrading QEMU results in losing features, even
> > > > > > though
> > > > > > you can recover them through additional steps I would argue that's
> > > > > > a
> > > > > > bug in the packaging that should be addressed on the QEMU side.
> > > > > 
> > > > > Potentially we can consider this a  distro packaging problem and fix
> > > > > it at the RPM level.
> > > > > 
> > > > > eg the libvirt RPM can define a trigger that runs when *any* of the
> > > > > qemu-device*  RPMs is installed/updated. This trigger can simply
> > > > > touch a file on disk somewhere, and libvirtd can monitor this one
> > > > > file, instead of having to monitor every module.
> > > > 
> > > > The simplest approach is to touch the qemu binaries. We discussed this
> > > > already. It has the drawback that it makes "rpm -V" complain about
> > > > wrong timestamps. It might also confuse backup software. Still, it
> > > > might be a viable short-term workaround if nothing else is available.
> > > 
> > > Qemu already allows to save modules in /var/run/qemu/ [1] to better handle
> > > module upgrades which is already used in Debian and Ubuntu to avoid
> > > late module load errors after upgrades.
> > > 
> > > This was meant for upgrades, but if libvirt would define a known path in
> > > there like /var/run/qemu/last_packaging_change the packages could easily
> > > touch it on any install/remove/update as Daniel suggested and libvirt 
> > > could
> > > check this path like it does with the date of the qemu binary already.
> > 
> > That would require changes in libvirt and the various module packages
> > right? Are you fine with Daniel's suggestion of handling this all within
> > libvirt? E.g. on the packaging side libvirt could define a trigger to be
> > notified when the modules dir is updated and touch a well-known file
> > 
> > %filetriggerin -- %{_libdir}/qemu
> > touch /var/cache/libvirt/qemu/capabilities/module-changed
> > %filetriggerun -- %{_libdir}/qemu
> > touch /var/cache/libvirt/qemu/capabilities/module-changed
> 
> Hmm, this is still problematic for session mode. I think your suggestion
> covers both system and session modes nicely.

Hmm, yeah, session mode ought not to be looking at the
/var/cache directory tree ideally, as we like to keep
the daemons fully isolated.

I wonder if we really do just need to traverse the module
directory. readdir+stat'ing 10-20 files is still way
faster than not having any caching at all.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Jim Fehlig

On 8/20/20 8:56 AM, Jim Fehlig wrote:

On 8/20/20 6:54 AM, Christian Ehrhardt wrote:

On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:


On Thu, 2020-08-20 at 11:00 +0100, Daniel P. Berrangé wrote:

On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:

contents change.

That said, if upgrading QEMU results in losing features, even
though
you can recover them through additional steps I would argue that's
a
bug in the packaging that should be addressed on the QEMU side.


Potentially we can consider this a  distro packaging problem and fix
it at the RPM level.

eg the libvirt RPM can define a trigger that runs when *any* of the
qemu-device*  RPMs is installed/updated. This trigger can simply
touch a file on disk somewhere, and libvirtd can monitor this one
file, instead of having to monitor every module.


The simplest approach is to touch the qemu binaries. We discussed this
already. It has the drawback that it makes "rpm -V" complain about
wrong timestamps. It might also confuse backup software. Still, it
might be a viable short-term workaround if nothing else is available.


Qemu already allows to save modules in /var/run/qemu/ [1] to better handle
module upgrades which is already used in Debian and Ubuntu to avoid
late module load errors after upgrades.

This was meant for upgrades, but if libvirt would define a known path in
there like /var/run/qemu/last_packaging_change the packages could easily
touch it on any install/remove/update as Daniel suggested and libvirt could
check this path like it does with the date of the qemu binary already.


That would require changes in libvirt and the various module packages right? Are 
you fine with Daniel's suggestion of handling this all within libvirt? E.g. on 
the packaging side libvirt could define a trigger to be notified when the 
modules dir is updated and touch a well-known file


%filetriggerin -- %{_libdir}/qemu
touch /var/cache/libvirt/qemu/capabilities/module-changed
%filetriggerun -- %{_libdir}/qemu
touch /var/cache/libvirt/qemu/capabilities/module-changed


Hmm, this is still problematic for session mode. I think your suggestion covers 
both system and session modes nicely.


If we can agree on an approach I'd be happy to take a stab at implementing it.

Regards,
Jim




Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Jim Fehlig

On 8/20/20 6:54 AM, Christian Ehrhardt wrote:

On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:


On Thu, 2020-08-20 at 11:00 +0100, Daniel P. Berrangé wrote:

On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:

contents change.

That said, if upgrading QEMU results in losing features, even
though
you can recover them through additional steps I would argue that's
a
bug in the packaging that should be addressed on the QEMU side.


Potentially we can consider this a  distro packaging problem and fix
it at the RPM level.

eg the libvirt RPM can define a trigger that runs when *any* of the
qemu-device*  RPMs is installed/updated. This trigger can simply
touch a file on disk somewhere, and libvirtd can monitor this one
file, instead of having to monitor every module.


The simplest approach is to touch the qemu binaries. We discussed this
already. It has the drawback that it makes "rpm -V" complain about
wrong timestamps. It might also confuse backup software. Still, it
might be a viable short-term workaround if nothing else is available.


Qemu already allows to save modules in /var/run/qemu/ [1] to better handle
module upgrades which is already used in Debian and Ubuntu to avoid
late module load errors after upgrades.

This was meant for upgrades, but if libvirt would define a known path in
there like /var/run/qemu/last_packaging_change the packages could easily
touch it on any install/remove/update as Daniel suggested and libvirt could
check this path like it does with the date of the qemu binary already.


That would require changes in libvirt and the various module packages right? Are 
you fine with Daniel's suggestion of handling this all within libvirt? E.g. on 
the packaging side libvirt could define a trigger to be notified when the 
modules dir is updated and touch a well-known file


%filetriggerin -- %{_libdir}/qemu
touch /var/cache/libvirt/qemu/capabilities/module-changed
%filetriggerun -- %{_libdir}/qemu
touch /var/cache/libvirt/qemu/capabilities/module-changed

The daemon could then check changes to this file when validating the cache. I'm 
not familiar enough with deb packaging to know if there is a similar mechanism 
to filetrigger.
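For the record, dpkg does have a roughly comparable mechanism: file triggers, declared in a package's triggers control file, which activate when files under a watched path are installed, changed, or removed. A sketch only, with hypothetical packaging file names and a guessed watched path (the real module directory is multiarch-dependent):

```
# debian/libvirt-daemon-system.triggers (hypothetical file):
interest-noawait /usr/lib/x86_64-linux-gnu/qemu

# and in debian/libvirt-daemon-system.postinst (sketch):
case "$1" in
    triggered)
        touch /var/cache/libvirt/qemu/capabilities/module-changed
        ;;
esac
```

The `triggered` branch runs in the watching package's postinst whenever another package touches the watched directory, analogous to RPM's %filetriggerin.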


Regards,
Jim




Re: [PATCH v2 07/17] libvirt-storage: Document volume upload/download stream format

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 14:29:21 +0200, Michal Privoznik wrote:
> On 8/20/20 10:35 AM, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:25 +0200, Michal Privoznik wrote:
> > > For libvirt, the volume is just a binary blob and it doesn't
> > > interpret data on volume upload/download. But as it turns out,
> > > this unspoken assumption is not clear to our users. Document it
> > > explicitly.
> > > 
> > > Suggested in: https://bugzilla.redhat.com/show_bug.cgi?id=1851023#c17
> > > 
> > > Signed-off-by: Michal Privoznik 
> > > ---
> > >   src/libvirt-storage.c | 8 ++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/src/libvirt-storage.c b/src/libvirt-storage.c
> > > index a45c8b98c1..8738f6dd14 100644
> > > --- a/src/libvirt-storage.c
> > > +++ b/src/libvirt-storage.c
> > > @@ -1590,7 +1590,9 @@ virStorageVolCreateXMLFrom(virStoragePoolPtr pool,
> > >*
> > >* Download the content of the volume as a stream. If @length
> > >* is zero, then the remaining contents of the volume after
> > > - * @offset will be downloaded.
> > > + * @offset will be downloaded. Please note, that the data is
> > > + * not interpreted and therefore data received by stream
> > > + * client are in the very same format the volume is in.
> > 
> > I don't think this wording clarifies it that much as it's not obvious
> > what's meant by 'interpreted'.
> > 
> > How about:
> > 
> > Please note that the stream transports the volume itself as is, so the
> > downloaded data may not correspond to guest OS visible state in cases
> > when a complex storage format such as qcow2 or vmdk is used.
> > 
> 
> Fine by me.

Reviewed-by: Peter Krempa 

> 
> Michal



Re: [PATCH v2 13/17] virfdstream: Allow sparse stream vol-download

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 15:31:28 +0200, Michal Privoznik wrote:
> On 8/20/20 1:57 PM, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:31 +0200, Michal Privoznik wrote:
> > > When handling a sparse stream, a thread is executed. This thread
> > > runs a read() or write() loop (depending on which API is called; in
> > > this case it's virStorageVolDownload(), and thus the thread runs a
> > > read() loop). The read() is handled in virFDStreamThreadDoRead()
> > > which is then data/hole section aware, meaning it uses
> > > virFileInData() to detect data and hole sections and sends
> > > TYPE_DATA or TYPE_HOLE virStream messages accordingly.
> > > 
> > > However, virFileInData() does not work with block devices. Simply
> > > because block devices don't have data and hole sections. But we
> > > can use new virFileInDataDetectZeroes() which is block device
> > > friendly for that.
> > > 
> > > Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528
> > > 
> > > Signed-off-by: Michal Privoznik 
> > > ---
> > >   src/util/virfdstream.c | 15 ---
> > >   1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > IMO this goes against the semantics of the _SPARSE_STREAM flag. A block
> > device by definition is not sparse, so there are no holes to send.
> > 
> > What you've implemented is a way to sparsify a block device, but that
> > IMO should not be considered by default when a block device is used.
> > If a file is not sparse, the previous code doesn't actually transmit
> > holes either.
> > 
> > If you want to achieve sparsification on the source side of the
> > transmission, this IMO needs an explicit flag to opt-in and then we
> > should sparsify also regular files using the same algorithm.
> > 
> 
> Fair enough. So how about I'll send v3 where:
> 
> a) in the first patches I make our stream read/write functions handle block
> devices for _SPARSE_STREAM without any zero block detection. Only thing that
> will happen is that if the source is a sparse regular file and thus the
> stream receiver gets a HOLE packet and it is writing the data into a block
> device it will have to emulate the hole by writing block of zeroes. However,
> if the stream source is a block device then no HOLE shall ever be sent.

AFAIK I've R-b'd enough patches to fix this portion and provided that
there aren't any merge conflicts you can already commit those.

I'm completely fine with that portion as-is.

> b) in next patches I'll introduce _DETECT_ZEROES flag (and possibly make it
> require _SPARSE_STREAM too) which will handle the case where the stream
> source is a block device with zero blocks, at which point it will try to
> detect them and be allowed to send HOLE down the stream.

On this topic, I agree that it's a sensible approach for the rest of the
series and it at least unifies the behaviour.

I'm unsure, though, whether it's worth doing the _DETECT_ZEROES feature
at all. To me it feels that the users are better off using other
tools rather than re-implementing yet another thing in libvirt.

If possible provide some additional justification here.



Re: [PATCH v2 13/17] virfdstream: Allow sparse stream vol-download

2020-08-20 Thread Michal Privoznik

On 8/20/20 1:57 PM, Peter Krempa wrote:

On Tue, Jul 07, 2020 at 21:46:31 +0200, Michal Privoznik wrote:

When handling a sparse stream, a thread is executed. This thread
runs a read() or write() loop (depending on which API is called; in
this case it's virStorageVolDownload(), and thus the thread runs a
read() loop). The read() is handled in virFDStreamThreadDoRead()
which is then data/hole section aware, meaning it uses
virFileInData() to detect data and hole sections and sends
TYPE_DATA or TYPE_HOLE virStream messages accordingly.

However, virFileInData() does not work with block devices. Simply
because block devices don't have data and hole sections. But we
can use new virFileInDataDetectZeroes() which is block device
friendly for that.

Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528

Signed-off-by: Michal Privoznik 
---
  src/util/virfdstream.c | 15 ---
  1 file changed, 12 insertions(+), 3 deletions(-)


IMO this goes against the semantics of the _SPARSE_STREAM flag. A block
device by definition is not sparse, so there are no holes to send.

What you've implemented is a way to sparsify a block device, but that
IMO should not be considered by default when a block device is used.
If a file is not sparse, the previous code doesn't actually transmit
holes either.

If you want to achieve sparsification on the source side of the
transmission, this IMO needs an explicit flag to opt-in and then we
should sparsify also regular files using the same algorithm.



Fair enough. So how about I'll send v3 where:

a) in the first patches I make our stream read/write functions handle 
block devices for _SPARSE_STREAM without any zero block detection. Only 
thing that will happen is that if the source is a sparse regular file 
and thus the stream receiver gets a HOLE packet and it is writing the 
data into a block device it will have to emulate the hole by writing 
block of zeroes. However, if the stream source is a block device then no 
HOLE shall ever be sent.


b) in next patches I'll introduce _DETECT_ZEROES flag (and possibly make 
it require _SPARSE_STREAM too) which will handle the case where the 
stream source is a block device with zero blocks, at which point it will 
try to detect them and be allowed to send HOLE down the stream.


Does this sound reasonable?

BTW: I've pushed the first N patches which were just switching FDStream 
to glib.


Michal



Re: device compatibility interface for live migration with assigned devices

2020-08-20 Thread Sean Mooney
On Thu, 2020-08-20 at 14:27 +0800, Yan Zhao wrote:
> On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote:
> > On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote:
> > > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote:
> > > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote:
> > > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote:
> > > > > > On Tue, 18 Aug 2020 10:16:28 +0100
> > > > > > Daniel P. Berrangé  wrote:
> > > > > > 
> > > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote:
> > > > > > > >On 2020/8/18 下午4:55, Daniel P. Berrangé wrote:
> > > > > > > > 
> > > > > > > >  On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
> > > > > > > > 
> > > > > > > >  On 2020/8/14 下午1:16, Yan Zhao wrote:
> > > > > > > > 
> > > > > > > >  On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
> > > > > > > > 
> > > > > > > >  On 2020/8/10 下午3:46, Yan Zhao wrote:  
> > > > > > > >  we actually can also retrieve the same information through 
> > > > > > > > sysfs, .e.g
> > > > > > > > 
> > > > > > > >  |- [path to device]
> > > > > > > > |--- migration
> > > > > > > > | |--- self
> > > > > > > > | |   |---device_api
> > > > > > > > ||   |---mdev_type
> > > > > > > > ||   |---software_version
> > > > > > > > ||   |---device_id
> > > > > > > > ||   |---aggregator
> > > > > > > > | |--- compatible
> > > > > > > > | |   |---device_api
> > > > > > > > ||   |---mdev_type
> > > > > > > > ||   |---software_version
> > > > > > > > ||   |---device_id
> > > > > > > > ||   |---aggregator
> > > > > > > > 
> > > > > > > > 
> > > > > > > >  Yes but:
> > > > > > > > 
> > > > > > > >  - You need one file per attribute (one syscall for one 
> > > > > > > > attribute)
> > > > > > > >  - Attribute is coupled with kobject
> > > > > > 
> > > > > > Is that really that bad? You have the device with an embedded 
> > > > > > kobject
> > > > > > anyway, and you can just put things into an attribute group?
> > > > > > 
> > > > > > [Also, I think that self/compatible split in the example makes 
> > > > > > things
> > > > > > needlessly complex. Shouldn't semantic versioning and matching 
> > > > > > already
> > > > > > cover nearly everything? I would expect very few cases that are more
> > > > > > complex than that. Maybe the aggregation stuff, but I don't think we
> > > > > > need that self/compatible split for that, either.]
> > > > > 
> > > > > Hi Cornelia,
> > > > > 
> > > > > The reason I want to declare compatible list of attributes is that
> > > > > sometimes it's not a simple 1:1 matching of source attributes and 
> > > > > target attributes
> > > > > as I demonstrated below,
> > > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is 
> > > > > compatible to
> > > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2),
> > > > >(mdev_type i915-GVTg_V5_8 + aggregator 4)
> > > > 
> > > > the way you are doing the naming is still really confusing, by the way.
> > > > if this has not already been merged in the kernel, can you change the
> > > > mdev so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1
> > > > instead of half the device?
> > > > 
> > > > currently you need to divide the aggregator by the number at the end of
> > > > the mdev type to figure out how much of the physical device is being
> > > > used, which is a very unfriendly api convention
> > > > 
> > > > the way aggregators are being proposed in general is not really
> > > > something i like, but i think this at least is something that should be
> > > > possible to correct.
> > > > 
> > > > with the complexity in the mdev type name + aggregator i suspect that
> > > > this will never be supported in openstack nova directly, requiring
> > > > integration via cyborg, unless we can pre-partition the device into
> > > > mdevs statically and just ignore this.
> > > > 
> > > > this is way too vendor specific to integrate into something like
> > > > openstack nova unless we can guarantee that how aggregators work will
> > > > be portable across vendors generically.
> > > > 
> > > > > 
> > > > > and aggragator may be just one of such examples that 1:1 matching 
> > > > > does not
> > > > > fit.
> > > > 
> > > > for openstack nova i don't see us supporting anything beyond the 1:1
> > > > case where the mdev type does not change.
> > > > 
> > > 
> > > hi Sean,
> > > I understand it's hard for openstack. but 1:N is always meaningful.
> > > e.g.
> > > if source device 1 has cap A, it is compatible to
> > > device 2: cap A,
> > > device 3: cap A+B,
> > > device 4: cap A+B+C
> > > 
> > > to allow openstack to detect it correctly, in compatible list of
> > > device 2, we would say compatible cap is A;
> > > device 3, compatible cap is A or A+B;
> > > device 4, compatible cap is A or A+B, or A+B+C;
> > > 
> > > then if openstack finds device A's self cap A is contained in 

Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Christian Ehrhardt
On Thu, Aug 20, 2020 at 12:43 PM Martin Wilck  wrote:
>
> On Thu, 2020-08-20 at 11:00 +0100, Daniel P. Berrangé wrote:
> > On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:
> > > contents change.
> > >
> > > That said, if upgrading QEMU results in losing features, even
> > > though
> > > you can recover them through additional steps I would argue that's
> > > a
> > > bug in the packaging that should be addressed on the QEMU side.
> >
> > Potentially we can consider this a distro packaging problem and fix
> > it at the RPM level.
> >
> > eg the libvirt RPM can define a trigger that runs when *any* of the
> > qemu-device*  RPMs is installed/updated. This trigger can simply
> > touch a file on disk somewhere, and libvirtd can monitor this one
> > file, instead of having to monitor every module.
>
> The simplest approach is to touch the qemu binaries. We discussed this
> already. It has the drawback that it makes "rpm -V" complain about
> wrong timestamps. It might also confuse backup software. Still, it
> might be a viable short-term workaround if nothing else is available.

QEMU already allows saving modules in /var/run/qemu/ [1] to better handle
module upgrades which is already used in Debian and Ubuntu to avoid
late module load errors after upgrades.

This was meant for upgrades, but if libvirt defined a known path in
there, like /var/run/qemu/last_packaging_change, the packages could easily
touch it on any install/remove/update as Daniel suggested, and libvirt
could check this path like it already does with the date of the qemu binary.

[1]: 
https://github.com/qemu/qemu/commit/bd83c861c0628a64997b7bd95c3bcc2e916baf2e


> Cheers,
> Martin
>
>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd
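The check Christian describes — libvirt comparing the timestamp of a single sentinel file instead of monitoring every module — could be sketched like this. This is only an illustration: the /var/run/qemu/last_packaging_change path is the one proposed above, while the cache path and the function name are invented for the example.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Sketch: the capability cache is stale if the sentinel file that the
 * packages touch on install/remove/update is newer than the cache.
 * Both paths are illustrative, not actual libvirt configuration. */
static bool
cacheIsStale(const char *cachePath, const char *sentinelPath)
{
    struct stat cacheSb, sentinelSb;

    if (stat(sentinelPath, &sentinelSb) < 0)
        return false; /* no sentinel => nothing to invalidate on */

    if (stat(cachePath, &cacheSb) < 0)
        return true; /* no cache yet => (re)generate */

    return sentinelSb.st_mtime > cacheSb.st_mtime;
}
```

The appeal of this design is that the RPM file trigger only ever touches one path, so libvirt needs exactly one extra stat() per cache validation instead of one per module.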




Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 14:36:15 +0200, Michal Privoznik wrote:
> On 8/20/20 1:02 PM, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > > This function will be used to detect zero buffers (which are
> > > going to be interpreted as hole in virStream later).
> > > 
> > > I shamelessly took inspiration from coreutils.
> > 
> > Coreutils is proudly GPLv3 ...
> > 
> 
> Sure. But it was discussed in v1 and I think we agreed that the algorithm is
> generic enough that it can be used in Libvirt too:
> 
> https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html
> 
> > > Signed-off-by: Michal Privoznik 
> > > ---
> > >   src/libvirt_private.syms |  1 +
> > >   src/util/virstring.c | 38 
> > >   src/util/virstring.h |  2 ++
> > >   tests/virstringtest.c| 47 
> > >   4 files changed, 88 insertions(+)
> > 
> > [...]
> > 
> > > diff --git a/src/util/virstring.c b/src/util/virstring.c
> > > index e9e792f3bf..c26bc770d4 100644
> > > --- a/src/util/virstring.c
> > > +++ b/src/util/virstring.c
> > > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool 
> > > *result)
> > >   return 0;
> > >   }
> > > +
> > > +
> > > +/**
> > > + * virStringIsNull:
> > 
> > IsNull might indicate that this does a check if the pointer is NULL. You
> > are checking for NUL bytes.
> 
> Fair enough. I'm out of ideas though. Do you have a suggestion?
> 
> > 
> > > + * @buf: buffer to check
> > > + * @len: the length of the buffer
> > > + *
> > > + * For given buffer @buf and its size @len determine whether
> > > + * it contains only zero bytes (NUL) or not.
> > 
> > Given the semantics of C strings being terminated by the NUL byte I
> > don't think this function qualifies as a string helper and thus should
> > probably reside somewhere outside of virstring.h
> 
> Alright. I will try to find better location. But since I want to use this
> function from virsh too I'd like to have it in utils.
> 
> > 
> > > + *
> > > + * Returns: true if buffer is full of zero bytes,
> > > + *  false otherwise.
> > > + */
> > > +bool virStringIsNull(const char *buf, size_t len)
> > > +{
> > > +const char *p = buf;
> > > +
> > > +if (!len)
> > > +return true;
> > > +
> > > +/* Check up to 16 first bytes. */
> > > +for (;;) {
> > > +if (*p)
> > > +return false;
> > > +
> > > +p++;
> > > +len--;
> > > +
> > > +if (!len)
> > > +return true;
> > > +
> > > +if ((len & 0xf) == 0)
> > > +break;
> > > +}
> > 
> > Do we really need to do this optimization? We could arguably simplify
> > this to:
> > 
> >  if (*buf != '\0')
> >  return false;
> > 
> >  return memcmp(buf, buf + 1, len - 1);
> > 
> > You can then use the saved lines to explain that comparing a piece of
> > memory with itself shifted by some offset ensures that the buffer is a
> > repeating sequence; shifting by 1 in particular means every byte equals
> > its predecessor, i.e. the buffer is a single repeated byte. The check
> > above then ensures that the one byte is NUL.
> > 
> 
> The idea is to pass aligned address to memcmp(). But I guess we can let
> memcmp() deal with that.

Well, this explanation might justify the algorithm above, but it's
certainly not obvious, so please add a comment.
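For reference, the simplification discussed above can be sketched as a standalone helper. Note that memcmp() returns 0 on equality, so its result has to be compared against 0 to yield the bool the function is meant to return; the name bufferIsAllZero is illustrative, not the libvirt API.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch of the suggested simplification: if buf[0] is NUL and the
 * buffer equals itself shifted by one byte, every byte must equal its
 * predecessor, so the whole buffer consists of NUL bytes. */
static bool
bufferIsAllZero(const char *buf, size_t len)
{
    if (len == 0)
        return true;

    if (*buf != '\0')
        return false;

    /* memcmp() == 0 means "equal"; returning the bare memcmp() value,
     * as in the review comment, would invert the result. */
    return memcmp(buf, buf + 1, len - 1) == 0;
}
```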



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Michal Privoznik

On 8/20/20 1:02 PM, Peter Krempa wrote:

On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:

This function will be used to detect zero buffers (which are
going to be interpreted as hole in virStream later).

I shamelessly took inspiration from coreutils.


Coreutils is proudly GPLv3 ...



Sure. But it was discussed in v1 and I think we agreed that the 
algorithm is generic enough that it can be used in Libvirt too:


https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html


Signed-off-by: Michal Privoznik 
---
  src/libvirt_private.syms |  1 +
  src/util/virstring.c | 38 
  src/util/virstring.h |  2 ++
  tests/virstringtest.c| 47 
  4 files changed, 88 insertions(+)


[...]


diff --git a/src/util/virstring.c b/src/util/virstring.c
index e9e792f3bf..c26bc770d4 100644
--- a/src/util/virstring.c
+++ b/src/util/virstring.c
@@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result)
  
  return 0;

  }
+
+
+/**
+ * virStringIsNull:


IsNull might indicate that this does a check if the pointer is NULL. You
are checking for NUL bytes.


Fair enough. I'm out of ideas though. Do you have a suggestion?




+ * @buf: buffer to check
+ * @len: the length of the buffer
+ *
+ * For given buffer @buf and its size @len determine whether
+ * it contains only zero bytes (NUL) or not.


Given the semantics of C strings being terminated by the NUL byte I
don't think this function qualifies as a string helper and thus should
probably reside somewhere outside of virstring.h


Alright. I will try to find better location. But since I want to use 
this function from virsh too I'd like to have it in utils.





+ *
+ * Returns: true if buffer is full of zero bytes,
+ *  false otherwise.
+ */
+bool virStringIsNull(const char *buf, size_t len)
+{
+const char *p = buf;
+
+if (!len)
+return true;
+
+/* Check up to 16 first bytes. */
+for (;;) {
+if (*p)
+return false;
+
+p++;
+len--;
+
+if (!len)
+return true;
+
+if ((len & 0xf) == 0)
+break;
+}


Do we really need to do this optimization? We could arguably simplify
this to:

 if (*buf != '\0')
 return false;

 return memcmp(buf, buf + 1, len - 1);

You can then use the saved lines to explain that comparing a piece of
memory with itself shifted by some offset ensures that the buffer is a
repeating sequence; shifting by 1 in particular means every byte equals
its predecessor, i.e. the buffer is a single repeated byte. The check
above then ensures that the one byte is NUL.



The idea is to pass aligned address to memcmp(). But I guess we can let 
memcmp() deal with that.


Michal



Re: [PATCH v2 07/17] libvirt-storage: Document volume upload/download stream format

2020-08-20 Thread Michal Privoznik

On 8/20/20 10:35 AM, Peter Krempa wrote:

On Tue, Jul 07, 2020 at 21:46:25 +0200, Michal Privoznik wrote:

For libvirt, the volume is just a binary blob and it doesn't
interpret data on volume upload/download. But as it turns out,
this unspoken assumption is not clear to our users. Document it
explicitly.

Suggested in: https://bugzilla.redhat.com/show_bug.cgi?id=1851023#c17

Signed-off-by: Michal Privoznik 
---
  src/libvirt-storage.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/libvirt-storage.c b/src/libvirt-storage.c
index a45c8b98c1..8738f6dd14 100644
--- a/src/libvirt-storage.c
+++ b/src/libvirt-storage.c
@@ -1590,7 +1590,9 @@ virStorageVolCreateXMLFrom(virStoragePoolPtr pool,
   *
   * Download the content of the volume as a stream. If @length
   * is zero, then the remaining contents of the volume after
- * @offset will be downloaded.
+ * @offset will be downloaded. Please note, that the data is
+ * not interpreted and therefore data received by stream
+ * client are in the very same format the volume is in.


I don't think this wording clarifies it that much as it's not obvious
what's meant by 'interpreted'.

How about:

Please note that the stream transports the volume itself as is, so the
downloaded data may not correspond to guest OS visible state in cases
when a complex storage format such as qcow2 or vmdk is used.



Fine by me.

Michal



Re: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices

2020-08-20 Thread Cornelia Huck
On Wed, 19 Aug 2020 17:28:38 +0800
Jason Wang  wrote:

> On 2020/8/19 4:13 PM, Yan Zhao wrote:
> > On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote:
> >> On 2020/8/19 2:59 PM, Yan Zhao wrote:
> >>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote:
>  On 2020/8/19 11:30 AM, Yan Zhao wrote:
> > hi All,
> > could we decide that sysfs is the interface that every VFIO vendor 
> > driver
> > needs to provide in order to support vfio live migration, otherwise the
> > userspace management tool would not list the device into the compatible
> > list?
> >
> > if that's true, let's move to the standardizing of the sysfs interface.
> > (1) content
> > common part: (must)
> >   - software_version: (in major.minor.bugfix scheme)  
>  This can not work for devices whose features can be negotiated/advertised
>  independently. (E.g virtio devices)

I thought the 'software_version' was supposed to describe kind of a
'protocol version' for the data we transmit? I.e., you add a new field,
you bump the version number.

>   
> >>> sorry, I don't understand here, why virtio devices need to use vfio 
> >>> interface?  
> >>
> >> I don't see any reason that virtio devices can't be used by VFIO. Do you?
> >>
> >> Actually, virtio devices have been used by VFIO for many years:
> >>
> >> - passing through hardware virtio devices to userspace (VM) drivers
> >> - using virtio PMD inside guest
> >>  
> > So, what's different for it vs passing through a physical hardware via 
> > VFIO?  
> 
> 
> The difference is that in the guest, the device could be either real
> hardware or an emulated one.
> 
> 
> > even though the features are negotiated dynamically, could you explain
> > why it would cause software_version not work?  
> 
> 
> Virtio device 1 supports feature A, B, C
> Virtio device 2 supports feature B, C, D
> 
> So you can't migrate a guest from device 1 to device 2. And it's 
> impossible to model the features with versions.

We're talking about the features offered by the device, right? Would it
be sufficient to mandate that the target device supports the same
features or a superset of the features supported by the source device?

> 
> 
> >
> >  
> >>> I think this thread is discussing about vfio related devices.
> >>>  
> >   - device_api: vfio-pci or vfio-ccw ...
> >   - type: mdev type for mdev device or
> >   a signature for physical device which is a counterpart for
> >mdev type.
> >
> > device api specific part: (must)
> >  - pci id: pci id of mdev parent device or pci id of physical pci
> >device (device_api is vfio-pci).  
>  So this assumes a PCI device which is probably not true.
>   
> >>> for device_api of vfio-pci, why it's not true?
> >>>
> >>> for vfio-ccw, it's subchannel_type.  
> >>
> >> Ok but having two different attributes for the same file is not good idea.
> >> How mgmt know there will be a 3rd type?  
> > that's why some attributes need to be common. e.g.
> > device_api: it's common because mgmt needs to know whether it's a pci
> >  device or a ccw device, and the api type is already defined in vfio.h.
> > (The field is agreed on and was actually suggested by Alex in a
> > previous mail)
> > type: mdev_type for mdev. if mgmt does not understand it, it would not
> >be able to create a compatible mdev device.
> > software_version: mgmt can compare the major and minor if it understands
> >these fields.
> 
> 
> I think it would be helpful if you can describe how mgmt is expected to 
> work step by step with the proposed sysfs API. This can help people to 
> understand.

My proposal would be:
- check that device_api matches
- check possible device_api specific attributes
- check that type matches [I don't think the combination of mdev types
  and another attribute to determine compatibility is a good idea;
  actually, the current proposal confuses me every time I look at it]
- check that software_version is compatible, assuming semantic
  versioning
- check possible type-specific attributes

> 
> Thanks for the patience. Since sysfs is uABI, when accepted, we need 
> support it forever. That's why we need to be careful.

Nod.

(...)
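Cornelia's step-by-step proposal could look roughly like the sketch below on the management side. This is only an illustration under the assumptions that features can be modeled as a set and that software_version follows semantic versioning; the struct layout and all names are invented for the example, not part of the proposed sysfs ABI.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative only: a flattened view of the proposed sysfs attributes. */
struct mdevCompatInfo {
    const char *device_api;   /* e.g. "vfio-pci" */
    const char *type;         /* mdev type, or counterpart for physical dev */
    int major, minor, bugfix; /* software_version, semantic versioning */
    unsigned int features;    /* bitmask standing in for features A/B/C/D */
};

/* Source and target are compatible if the device APIs and types match,
 * the target speaks a compatible protocol version, and the target
 * offers at least the source's feature set (the superset rule
 * discussed above). */
static bool
devicesCompatible(const struct mdevCompatInfo *src,
                  const struct mdevCompatInfo *tgt)
{
    if (strcmp(src->device_api, tgt->device_api) != 0)
        return false;
    if (strcmp(src->type, tgt->type) != 0)
        return false;
    /* Semantic versioning: same major; target minor must not be older. */
    if (src->major != tgt->major || tgt->minor < src->minor)
        return false;
    /* Superset check: every source feature must be present on the target. */
    return (src->features & tgt->features) == src->features;
}
```

With this model, Jason's example falls out naturally: a device offering features {A,B,C} cannot migrate to one offering {B,C,D}, but can migrate to one offering {A,B,C,D}.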



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 13:17:42 +0100, Daniel Berrange wrote:
> On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > > This function will be used to detect zero buffers (which are
> > > going to be interpreted as hole in virStream later).
> > > 
> > > I shamelessly took inspiration from coreutils.
> > 
> > Coreutils is proudly GPLv3 ...
> > 
> > > Signed-off-by: Michal Privoznik 
> > > ---

[...]

> > Do we really need to do this optimization? We could arguably simplify
> > this to:
> > 
> > if (*buf != '\0')
> > return false;
> > 
> > return memcmp(buf, buf + 1, len - 1);
> 
> Depends on whether we care about this sparsification having good
> performance or not. As a point of reference, QEMU has invested
> tonnes of effort in its impl, using highly tuned impls for
> SSE, AVX, etc switched at runtime.

Well, comments to other patches in this series question whether we
actually want to do explicit sparsification at all.

Users always can explicitly sparsify their storage using the qemu tools
and the implementation this series adds (the sparsification on read) is
IMO actually stretching the semantics of the _SPARSE_STREAM flag too
much and creating different behaviour depending on whether a block
device is used (full sparsification) or a file is used (sparseness of
the file is preserved, explicitly allocated but zeroed blocks are
transported in full).



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote:
> On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > This function will be used to detect zero buffers (which are
> > going to be interpreted as hole in virStream later).
> > 
> > I shamelessly took inspiration from coreutils.
> 
> Coreutils is proudly GPLv3 ...
> 
> > Signed-off-by: Michal Privoznik 
> > ---
> >  src/libvirt_private.syms |  1 +
> >  src/util/virstring.c | 38 
> >  src/util/virstring.h |  2 ++
> >  tests/virstringtest.c| 47 
> >  4 files changed, 88 insertions(+)
> 
> [...]
> 
> > diff --git a/src/util/virstring.c b/src/util/virstring.c
> > index e9e792f3bf..c26bc770d4 100644
> > --- a/src/util/virstring.c
> > +++ b/src/util/virstring.c
> > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool 
> > *result)
> >  
> >  return 0;
> >  }
> > +
> > +
> > +/**
> > + * virStringIsNull:
> 
> IsNull might indicate that this does a check if the pointer is NULL. You
> are checking for NUL bytes.
> 
> > + * @buf: buffer to check
> > + * @len: the length of the buffer
> > + *
> > + * For given buffer @buf and its size @len determine whether
> > + * it contains only zero bytes (NUL) or not.
> 
> Given the semantics of C strings being terminated by the NUL byte I
> don't think this function qualifies as a string helper and thus should
> probably reside somewhere outside of virstring.h
> 
> > + *
> > + * Returns: true if buffer is full of zero bytes,
> > + *  false otherwise.
> > + */
> > +bool virStringIsNull(const char *buf, size_t len)
> > +{
> > +const char *p = buf;
> > +
> > +if (!len)
> > +return true;
> > +
> > +/* Check up to 16 first bytes. */
> > +for (;;) {
> > +if (*p)
> > +return false;
> > +
> > +p++;
> > +len--;
> > +
> > +if (!len)
> > +return true;
> > +
> > +if ((len & 0xf) == 0)
> > +break;
> > +}
> 
> Do we really need to do this optimization? We could arguably simplify
> this to:
> 
> if (*buf != '\0')
> return false;
> 
> return memcmp(buf, buf + 1, len - 1);

Depends on whether we care about this sparsification having good
performance or not. As a point of reference, QEMU has invested
tonnes of effort in its impl, using highly tuned impls for
SSE, AVX, etc switched at runtime.



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH v2 15/17] virfdstream: Emulate skip for block devices

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:33 +0200, Michal Privoznik wrote:
> This is similar to one of the previous patches.
> 
> When receiving a stream (on virStorageVolUpload() and subsequent
> virStreamSparseSendAll()) we may receive a hole. If the volume we
> are saving the incoming data into is a regular file we just
> lseek() and ftruncate() to create the hole. But this won't work
> if the file is a block device. If that is the case we must write
> zeroes so that any subsequent reader reads nothing but zeroes
> (just like they would from a hole in a regular file).
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 59 +++---
>  1 file changed, 44 insertions(+), 15 deletions(-)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 13/17] virfdstream: Allow sparse stream vol-download

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:31 +0200, Michal Privoznik wrote:
> When handling a sparse stream, a thread is executed. This thread
> runs a read() or write() loop (depending on what API is called; in
> this case it's virStorageVolDownload(), and thus the thread runs the
> read() loop). The read() is handled in virFDStreamThreadDoRead()
> which is then data/hole section aware, meaning it uses
> virFileInData() to detect data and hole sections and sends
> TYPE_DATA or TYPE_HOLE virStream messages accordingly.
> 
> However, virFileInData() does not work with block devices. Simply
> because block devices don't have data and hole sections. But we
> can use new virFileInDataDetectZeroes() which is block device
> friendly for that.
> 
> Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)

IMO this goes against the semantics of the _SPARSE_STREAM flag. A block
device by definition is not sparse, so there are no holes to send.

What you've implemented is a way to sparsify a block device, but that
IMO should not be considered by default when a block device is used.
If a file is not sparse, the previous code doesn't actually transmit
holes either.

If you want to achieve sparsification on the source side of the
transmission, this IMO needs an explicit flag to opt-in and then we
should sparsify also regular files using the same algorithm.
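For context, the data/hole distinction at issue here comes from lseek() with SEEK_HOLE/SEEK_DATA, which regular files on most filesystems support but block devices effectively do not (at best they report the single implicit hole at end-of-device). A Linux-specific sketch, with an illustrative helper name:

```c
#define _GNU_SOURCE /* SEEK_HOLE */
#include <stdbool.h>
#include <unistd.h>

/* Illustrative helper: report whether @fd has a hole starting before
 * offset @end. On filesystems without hole support, SEEK_HOLE points
 * at end-of-file (the implicit hole every file has), so a fully
 * allocated region never reports a hole. */
static bool
fileHasHoleBefore(int fd, off_t end)
{
    off_t hole = lseek(fd, 0, SEEK_HOLE);

    if (hole < 0)
        return false; /* SEEK_HOLE unsupported: treat as fully allocated */

    return hole < end;
}
```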



Re: [PATCH v2 12/17] virshStreamSkip: Emulate skip for block devices

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:30 +0200, Michal Privoznik wrote:
> This callback is called when the server sends us STREAM_HOLE
> meaning there is no real data, only zeroes. For regular files
> we would just seek() beyond EOF and ftruncate() to create the
> hole. But for block devices this won't work. Not only we can't
> seek() beyond EOF, and ftruncate() will fail, this approach won't
> fill the device with zeroes. We have to do it manually.
> 
> Partially resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1852528
> 
> Signed-off-by: Michal Privoznik 
> ---
>  tools/virsh-util.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)

Reviewed-by: Peter Krempa 
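The zero-writing fallback described in the commit message can be sketched as follows; this is a minimal illustration, not the actual virsh implementation (the buffer size and function name are arbitrary choices for the example):

```c
#include <string.h>
#include <unistd.h>

/* Sketch: emulate "skip @len bytes" on a block device by writing
 * zeroes, since seeking past the data would leave whatever the device
 * previously contained in place. */
static int
writeZeroes(int fd, size_t len)
{
    char buf[4096];

    memset(buf, 0, sizeof(buf));

    while (len > 0) {
        size_t want = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t got = write(fd, buf, want);

        if (got < 0)
            return -1;
        len -= got;
    }
    return 0;
}
```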



Re: [PATCH v2 11/17] virsh: Track if vol-upload or vol-download work over a block device

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:29 +0200, Michal Privoznik wrote:
> We can't use virFileInData() with block devices, but we could use
> new virFileInDataDetectZeroes(). But to decide we need to know if
> the FD we are reading data from / writing data to is a block
> device. Store this information in _virshStreamCallbackData.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  tools/virsh-util.h   |  1 +
>  tools/virsh-volume.c | 14 ++
>  2 files changed, 15 insertions(+)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 10/17] virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip()

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:28 +0200, Michal Privoznik wrote:
> These callbacks will need to know more than the FD they are
> working on. Pass the structure that is passed to other stream
> callbacks (e.g. virshStreamSource() or virshStreamSourceSkip())
> instead of inventing a new one.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  tools/virsh-util.c   | 10 +-
>  tools/virsh-volume.c |  6 +-
>  2 files changed, 10 insertions(+), 6 deletions(-)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> This function will be used to detect zero buffers (which are
> going to be interpreted as hole in virStream later).
> 
> I shamelessly took inspiration from coreutils.

Coreutils is proudly GPLv3 ...

> Signed-off-by: Michal Privoznik 
> ---
>  src/libvirt_private.syms |  1 +
>  src/util/virstring.c | 38 
>  src/util/virstring.h |  2 ++
>  tests/virstringtest.c| 47 
>  4 files changed, 88 insertions(+)

[...]

> diff --git a/src/util/virstring.c b/src/util/virstring.c
> index e9e792f3bf..c26bc770d4 100644
> --- a/src/util/virstring.c
> +++ b/src/util/virstring.c
> @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result)
>  
>  return 0;
>  }
> +
> +
> +/**
> + * virStringIsNull:

IsNull might indicate that this does a check if the pointer is NULL. You
are checking for NUL bytes.

> + * @buf: buffer to check
> + * @len: the length of the buffer
> + *
> + * For given buffer @buf and its size @len determine whether
> + * it contains only zero bytes (NUL) or not.

Given the semantics of C strings being terminated by the NUL byte I
don't think this function qualifies as a string helper and thus should
probably reside somewhere outside of virstring.h

> + *
> + * Returns: true if buffer is full of zero bytes,
> + *  false otherwise.
> + */
> +bool virStringIsNull(const char *buf, size_t len)
> +{
> +const char *p = buf;
> +
> +if (!len)
> +return true;
> +
> +/* Check up to 16 first bytes. */
> +for (;;) {
> +if (*p)
> +return false;
> +
> +p++;
> +len--;
> +
> +if (!len)
> +return true;
> +
> +if ((len & 0xf) == 0)
> +break;
> +}

Do we really need to do this optimization? We could arguably simplify
this to:

if (*buf != '\0')
return false;

return memcmp(buf, buf + 1, len - 1);

You can then use the saved lines to explain that comparing a piece of
memory with itself shifted by some offset ensures that the buffer is a
repeating sequence; shifting by 1 in particular means every byte equals
its predecessor, i.e. the buffer is a single repeated byte. The check
above then ensures that the one byte is NUL.



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Martin Wilck
On Thu, 2020-08-20 at 11:00 +0100, Daniel P. Berrangé wrote:
> On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:
> > contents change.
> > 
> > That said, if upgrading QEMU results in losing features, even
> > though
> > you can recover them through additional steps I would argue that's
> > a
> > bug in the packaging that should be addressed on the QEMU side.
> 
> Potentially we can consider this a distro packaging problem and fix
> it at the RPM level.
> 
> eg the libvirt RPM can define a trigger that runs when *any* of the
> qemu-device*  RPMs is installed/updated. This trigger can simply
> touch a file on disk somewhere, and libvirtd can monitor this one
> file, instead of having to monitor every module.

The simplest approach is to touch the qemu binaries. We discussed this
already. It has the drawback that it makes "rpm -V" complain about
wrong timestamps. It might also confuse backup software. Still, it
might be a viable short-term workaround if nothing else is available.

Cheers,
Martin




Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 12:31:15PM +0200, Martin Wilck wrote:
> On Thu, 2020-08-20 at 10:57 +0100, Daniel P. Berrangé wrote:
> > On Thu, Aug 20, 2020 at 11:32:03AM +0200, Martin Wilck wrote:
> > > On Tue, 2020-08-18 at 15:15 -0600, Jim Fehlig wrote:
> > > > On 8/5/20 2:19 AM, Andrea Bolognani wrote:
> > > > > I guess we need to start checking the modules directory in
> > > > > addition
> > > > > to the main QEMU binary, and regenerate capabilities every time
> > > > > its
> > > > > contents change.
> > > > 
> > > > We recently received reports of this issue on Tumbleweed, which
> > > > just
> > > > got the 
> > > > modularized qemu 5.1
> > > > 
> > > > https://bugzilla.opensuse.org/show_bug.cgi?id=1175320
> > > > 
> > > > Mark, are you working on a patch to invalidate the cache on
> > > > changes
> > > > to the qemu 
> > > > modules directory? I suppose it needs to be handled similar to
> > > > the
> > > > qemu 
> > > > binaries. E.g. when building the cache include a list of all qemu
> > > > modules found. 
> > > > When validating the cache see if any modules have disappeared, if
> > > > any
> > > > have been 
> > > > added, and if the ctime of any have changed. Yikes, sounds a
> > > > little
> > > > more complex 
> > > > than the binaries :-).
> > > 
> > > I'd like to question whether this effort is justified for an
> > > optimization that matters only at libvirtd startup time, and even
> > > there
> > > saves no more than a few seconds.
> > > 
> > > I propose to simply disable the caching of qemu capabilities (or
> > > provide a build-time option to do so). Optimizations that produce
> > > wrong
> > > results should be avoided.
> > 
> > Whether the time matters depends on your use case for QEMU. For heavy
> > data center apps like OpenStack you won't notice it because OpenStack
> > itself adds so much overhead to the system. For cases where the VM
> > is used as "embedded" infrastructure startup time can be critical.
> > Not caching capabilities adds easily 300-500 ms to startup of a
> > single
> > VM, which is very significant when the current minimum startup time
> > of
> > a VM can be as low as 150 ms.
> > 
> > IOW removing caching is not a viable option.
> 
> Capability caching could be turned into a build-time option, optimized
> for the target audience.

It is rare at build time to know what your target audience's apps are
going to be doing when you are a OS distro. So it isn't a decision that
can usefully be made at build time.

> Or we could enable caching in general, but always rescan capabilities at
> libvirtd startup. That way startup of VMs wouldn't be slowed down. No?

Scanning at libvirtd startup is something we work very hard to avoid.
When you have 20 QEMU system emulators installed, it makes libvirtd
startup incredibly slow which is a big problem when we are using
auto-start + auto-shutdown for libvirtd.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Martin Wilck
On Thu, 2020-08-20 at 10:57 +0100, Daniel P. Berrangé wrote:
> On Thu, Aug 20, 2020 at 11:32:03AM +0200, Martin Wilck wrote:
> > On Tue, 2020-08-18 at 15:15 -0600, Jim Fehlig wrote:
> > > On 8/5/20 2:19 AM, Andrea Bolognani wrote:
> > > > I guess we need to start checking the modules directory in
> > > > addition
> > > > to the main QEMU binary, and regenerate capabilities every time
> > > > its
> > > > contents change.
> > > 
> > > We recently received reports of this issue on Tumbleweed, which
> > > just
> > > got the 
> > > modularized qemu 5.1
> > > 
> > > https://bugzilla.opensuse.org/show_bug.cgi?id=1175320
> > > 
> > > Mark, are you working on a patch to invalidate the cache on
> > > changes
> > > to the qemu 
> > > modules directory? I suppose it needs to be handled similar to
> > > the
> > > qemu 
> > > binaries. E.g. when building the cache include a list of all qemu
> > > modules found. 
> > > When validating the cache see if any modules have disappeared, if
> > > any
> > > have been 
> > > added, and if the ctime of any have changed. Yikes, sounds a
> > > little
> > > more complex 
> > > than the binaries :-).
> > 
> > I'd like to question whether this effort is justified for an
> > optimization that matters only at libvirtd startup time, and even
> > there
> > saves no more than a few seconds.
> > 
> > I propose to simply disable the caching of qemu capabilities (or
> > provide a build-time option to do so). Optimizations that produce
> > wrong
> > results should be avoided.
> 
> Whether the time matters depends on your use case for QEMU. For heavy
> data center apps like OpenStack you won't notice it because OpenStack
> itself adds so much overhead to the system.  For cases where the VM
> is used as "embedded" infrastructure, startup time can be critical.
> Not caching capabilities adds easily 300-500 ms to startup of a
> single
> VM, which is very significant when the current minimum startup time
> of
> a VM can be as low as 150 ms.
> 
> IOW removing caching is not a viable option.

Capability caching could be turned into a build-time option, optimized
for the target audience.

Or we could enable caching in general, but always rescan capabilities at
libvirtd startup. That way startup of VMs wouldn't be slowed down. No?

The latter would have the additional advantage that people who
(de)install qemu modules would simply need to restart libvirtd in order
to get a working system back. I suppose that would make sense to most
users, and probably even occur to them as a workaround. Currently users
have to locate and manually delete the cache files, not to mention that
they need to figure this out first, which is far from easy.

Regards
Martin





[PATCH] docs: add kbase entry for migrationinternals

2020-08-20 Thread Fangge Jin
Commit c051e56d27 added migrationinternals.rst in kbase, but the
entry was missing.

Signed-off-by: Fangge Jin 
---
 docs/kbase.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/docs/kbase.rst b/docs/kbase.rst
index 78daaa5989..546be6785e 100644
--- a/docs/kbase.rst
+++ b/docs/kbase.rst
@@ -44,3 +44,7 @@ Knowledge base
 
`Incremental backup internals `__
   Incremental backup implementation details relevant for users
+
+   `VM migration internals `__
+  VM migration implementation details, complementing the info in
+  `migration `__
-- 
2.20.1



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Daniel P . Berrangé
On Wed, Aug 05, 2020 at 10:19:26AM +0200, Andrea Bolognani wrote:
> On Wed, 2020-08-05 at 02:09 -0400, Mark Mielke wrote:
> > Hi all:
> > 
> > In testing qemu-5.1rc2 on my Fedora 32 home system, I found that
> > the Fedora rawhide package has broken out both the QXL display
> > device and the USB redirect device into separate RPM modules:
> > 
> > qemu-device-display-qxl.x86_64   2:5.1.0-0.1.rc2.fc32
> > qemu-device-usb-redirect.x86_64  2:5.1.0-0.1.rc2.fc32
> > 
> > The upgrade from 5.0.0 to 5.1.0 does not treat these sub-packages
> > as mandatory packages, therefore a straight upgrade of packages
> > causes these domcapabilities to disappear.
> > 
> > If the user tries to use libvirt after this, then existing domains
> > using QXL display device or USB redirect device will fail to start.
> > If the user then investigates and realizes that they now need to
> > install the above packages separately, they find that qemu-kvm
> > recognizes the modules right away, but libvirt does not. This looks
> > to be due to the libvirt domcapabilities cache?
> 
> I guess we need to start checking the modules directory in addition
> to the main QEMU binary, and regenerate capabilities every time its
> contents change.
> 
> That said, if upgrading QEMU results in losing features, even though
> you can recover them through additional steps I would argue that's a
> bug in the packaging that should be addressed on the QEMU side.

Potentially we can consider this a distro packaging problem and fix
it at the RPM level.

e.g. the libvirt RPM can define a trigger that runs when *any* of the
qemu-device* RPMs is installed/updated. This trigger can simply
touch a file on disk somewhere, and libvirtd can monitor this one
file, instead of having to monitor every module.
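The libvirtd side of this idea could be sketched as follows (a purely illustrative Python sketch with assumed file paths, not libvirt's actual C implementation): the cached capabilities remain valid as long as the cache file is newer than the stamp file touched by the RPM trigger.

```python
# Illustrative sketch (hypothetical paths, not libvirt code): decide whether a
# capabilities cache is still usable by comparing it against a single "stamp"
# file that an RPM trigger touches whenever any qemu-device* package changes.
import os


def cache_is_valid(cache_path: str, stamp_path: str) -> bool:
    """Return True if the cached capabilities are still usable."""
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except FileNotFoundError:
        return False  # no cache yet: qemu must be probed
    try:
        stamp_mtime = os.stat(stamp_path).st_mtime
    except FileNotFoundError:
        return True   # no module change recorded since the cache was built
    # stale if modules changed (stamp touched) after the cache was written
    return cache_mtime >= stamp_mtime
```

The appeal of the scheme is that libvirtd watches exactly one path regardless of how many module packages exist.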


Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 11:32:03AM +0200, Martin Wilck wrote:
> On Tue, 2020-08-18 at 15:15 -0600, Jim Fehlig wrote:
> > On 8/5/20 2:19 AM, Andrea Bolognani wrote:
> > > 
> > > I guess we need to start checking the modules directory in addition
> > > to the main QEMU binary, and regenerate capabilities every time its
> > > contents change.
> > 
> > We recently received reports of this issue on Tumbleweed, which just
> > got the 
> > modularized qemu 5.1
> > 
> > https://bugzilla.opensuse.org/show_bug.cgi?id=1175320
> > 
> > Mark, are you working on a patch to invalidate the cache on changes
> > to the qemu 
> > modules directory? I suppose it needs to be handled similar to the
> > qemu 
> > binaries. E.g. when building the cache include a list of all qemu
> > modules found. 
> > When validating the cache see if any modules have disappeared, if any
> > have been 
> > added, and if the ctime of any have changed. Yikes, sounds a little
> > more complex 
> > than the binaries :-).
> 
> I'd like to question whether this effort is justified for an
> optimization that matters only at libvirtd startup time, and even there
> saves no more than a few seconds.
> 
> I propose to simply disable the caching of qemu capabilities (or
> provide a build-time option to do so). Optimizations that produce wrong
> results should be avoided.

Whether the time matters depends on your use case for QEMU. For heavy
data center apps like OpenStack you won't notice it because OpenStack
itself adds so much overhead to the system.  For cases where the VM
is used as "embedded" infrastructure, startup time can be critical.
Not caching capabilities adds easily 300-500 ms to startup of a single
VM, which is very significant when the current minimum startup time of
a VM can be as low as 150 ms.

IOW removing caching is not a viable option.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: qemu modularization of qemu-5.1 vs libvirt domcapabilities cache?

2020-08-20 Thread Martin Wilck
On Tue, 2020-08-18 at 15:15 -0600, Jim Fehlig wrote:
> On 8/5/20 2:19 AM, Andrea Bolognani wrote:
> > 
> > I guess we need to start checking the modules directory in addition
> > to the main QEMU binary, and regenerate capabilities every time its
> > contents change.
> 
> We recently received reports of this issue on Tumbleweed, which just
> got the 
> modularized qemu 5.1
> 
> https://bugzilla.opensuse.org/show_bug.cgi?id=1175320
> 
> Mark, are you working on a patch to invalidate the cache on changes
> to the qemu 
> modules directory? I suppose it needs to be handled similar to the
> qemu 
> binaries. E.g. when building the cache include a list of all qemu
> modules found. 
> When validating the cache see if any modules have disappeared, if any
> have been 
> added, and if the ctime of any have changed. Yikes, sounds a little
> more complex 
> than the binaries :-).
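The cache-manifest scheme Jim describes above (record every module with its ctime when building the cache, then treat the cache as stale if any module disappeared, appeared, or changed) could be sketched like this. This is a hedged Python illustration, not libvirt's C implementation; the `*.so` glob pattern is an assumption.

```python
# Sketch of the validation scheme described above (assumed module layout,
# not libvirt code): snapshot the module directory at cache-build time and
# compare against the current state when validating.
import glob
import os


def build_manifest(modules_dir: str) -> dict:
    """Map each module path to its ctime, taken at cache-build time."""
    return {p: os.stat(p).st_ctime
            for p in glob.glob(os.path.join(modules_dir, "*.so"))}


def manifest_is_current(saved: dict, modules_dir: str) -> bool:
    """False if any module disappeared, appeared, or has a new ctime."""
    return saved == build_manifest(modules_dir)
```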

I'd like to question whether this effort is justified for an
optimization that matters only at libvirtd startup time, and even there
saves no more than a few seconds. 

I propose to simply disable the caching of qemu capabilities (or
provide a build-time option to do so). Optimizations that produce wrong
results should be avoided.

> I wonder if it is possible to use inotify to receive notification of
> changes to 
> the modules directory?

That would certainly be possible, but it'd be a new feature (becoming
aware of qemu capability changes at runtime), while we're currently
discussing an existing feature (getting qemu capabilities on startup)
which is broken. I believe these issues should be discussed separately.

Regards,
Martin




Re: Please revert f4be03b3 (libvirtaio: Drop object(*args, **kwargs)) for theoretical reasons

2020-08-20 Thread Daniel P . Berrangé
On Wed, Aug 19, 2020 at 11:32:10PM +0200, Wojtek Porczyk wrote:
> Hi Philipp,
> (Cc: Daniel, because IIUC you reviewed !16 which got this merged),
> 
> I'm sorry I didn't notice this earlier, but the commit f4be03b3 dated
> 2020-04-20 [0] is wrong. The super().__init__(*args, **kwargs) in
> Callback.__init__ was there on purpose, because of how Python's inheritance in
> new-style classes works.
> 
> Let me explain this a bit, because it is not obvious.
> 
> Suppose you had diamond inheritance like this:
> 
> class A(object): pass
> class B(A): pass
> class C(A): pass
> class D(B,C): pass
> 
> And those classes needed a common function with varying arguments:
> 
> class A(object):
> def spam(self, a): print(f'A: {a}')
> class B(A):
> def spam(self, b): print(f'B: {b}')
> class C(A):
> def spam(self, c): print(f'C: {c}')
> class D(B,C):
> def spam(self, d): print(f'D: {d}')
> 
> The way to call all parent's functions exactly once (as per MRO) and accept
> all arguments and also forbid unknown arguments is to accept **kwargs
> everywhere and pass them to super().spam():
> 
> class A:
> def spam(self, a):
> print(f'A: {a}')
> class B(A):
> def spam(self, b, **kwargs):
> print(f'B: {b}')
> super().spam(**kwargs)
> class C(A):
> def spam(self, c, **kwargs):
> print(f'C: {c}')
> super().spam(**kwargs)
> class D(B, C):
> def spam(self, d, **kwargs):
> print(f'D: {d}')
> super().spam(**kwargs)
> 
> Let's run this:
> 
> >>> B().spam(a=1, b=2)
> B: 2
> A: 1
> >>> D().spam(a=1, b=2, c=3, d=4)
> D: 4
> B: 2
> C: 3
> A: 1
> 
> You may notice that super() in B.spam refers to two different classes, either
> A or C, depending on inheritance order in yet-undefined classes (as of B's
> definition).
> 
> That's why the conclusion that super() in Callback.__init__ refers to object
> is wrong. In this example, spam=__init__, A=object, B=Callback and C and D are
> not yet written, but theoretically possible classes that could be written by
> someone else. Why would they be needed, I don't know, but if someone written
> them, s/he would be out of options to invent new arguments to C.__init__.
> 
> Note that super().__init__(*args, **kwargs) when super() refers to object
> isn't harmful, and just ensures that args and kwargs are empty (i.e. no
> unknown arguments were passed). In fact, this is exactly why object.__init__()
> takes no arguments since Python 2.6 [1][2], as you correctly point out in the
> commit message.
> 
> I don't think this breaks anything (I very much doubt anyone would need to
> write code that would trigger this), nevertheless, as the commit is both
> pointless and wrong, and as the original author of libvirtaio I'd like to ask
> for this commit to be reverted. If this breaks some static analysis tool,
> could you just suppress it for this particular line?
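The cooperative pattern Wojtek shows with spam() applies to __init__ chains in exactly the same way. A minimal sketch with hypothetical classes (not the actual libvirtaio code) shows why a class like Callback must forward to super().__init__() even though its direct base is object:

```python
# Hypothetical classes illustrating the point above: in cooperative multiple
# inheritance, Callback's super().__init__() call may reach a *sibling*
# class (per the MRO), not object, so dropping it would silently skip that
# sibling's initialization.
class Callback:
    def __init__(self, cb, *args, **kwargs):
        self.cb = cb
        super().__init__(*args, **kwargs)   # may reach Tagged, not object


class Tagged:
    def __init__(self, tag, *args, **kwargs):
        self.tag = tag
        super().__init__(*args, **kwargs)


class TaggedCallback(Callback, Tagged):
    pass


t = TaggedCallback(print, tag="x")
# MRO: TaggedCallback -> Callback -> Tagged -> object. Callback's super()
# call forwarded tag="x" to Tagged.__init__, so both attributes are set.
assert t.cb is print and t.tag == "x"
```

Had Callback.__init__ not called super().__init__(*args, **kwargs), Tagged.__init__ would never run and the tag keyword would be silently swallowed.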

Could you open a merge request providing the revert along with your
description of why the change was wrong and I'll review & approve it.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [PATCH] tools: fix libvirt-guests.sh text assignments

2020-08-20 Thread Michael Chapman
On Thu, 20 Aug 2020, Christian Ehrhardt wrote:
> On Wed, Aug 19, 2020 at 12:15 PM Christian Ehrhardt
>  wrote:
> >
> > In libvirt 6.6 stopping guests with libvirt-guests.sh is broken.
> > As soon as there is more than one guest one can see
> > `systemctl stop libvirt-guests` failing, and in the log we see:
> >   libvirt-guests.sh[2455]: Running guests on default URI:
> >   libvirt-guests.sh[2457]: /usr/lib/libvirt/libvirt-guests.sh: 120:
> >   local: 2a49cb0f-1ff8-44b5-a61d-806b9e52dae2: bad variable name
> >   libvirt-guests.sh[2462]: no running guests.
> >
> > That is due to multiple guests becoming a list of UUIDs. Without
> > recognizing this as one single string, the assignment breaks when using
> > 'local' (which was recently added in 6.3.0). This is because local is
> > defined as
> >   local [option] [name[=value] ... | - ]
> > which makes the shell try to handle the rest of the string as variable
> > names. In the error above, that string isn't a valid variable name,
> > triggering the issue that is seen.
> >
> > To resolve that, 'textify' all assignments that are strings or could
> > potentially become such lists (even if they are not using the local qualifier).
> 
> Just to illustrate the problem: this never was great and could cause
> warnings/errors, but amplified by the 'local' it now makes the script break.

Arguably the big problem here is that 'local' isn't actually specified by 
POSIX, so can not be used in a portable /bin/sh script. (It might end up 
in POSIX eventually, see [1].)

If this were a Bash script, then all of those variable assignments 
(whether they're local or not) would work as expected:

$ echo $BASH_VERSION 
5.0.17(1)-release
$ uuid='2a49cb0f-1ff8-44b5-a61d-806b9e52dae2 
2a49cb0f-1ff8-44b5-a61d-806b9e52dae3'
$ foo() { x=$uuid; echo "<$x>"; }
$ bar() { local x=$uuid; echo "<$x>"; }
$ foo
<2a49cb0f-1ff8-44b5-a61d-806b9e52dae2 2a49cb0f-1ff8-44b5-a61d-806b9e52dae3>
$ bar
<2a49cb0f-1ff8-44b5-a61d-806b9e52dae2 2a49cb0f-1ff8-44b5-a61d-806b9e52dae3>

This works even when Bash is running in POSIX mode.

But POSIX shells that do not support 'local' seem to be rare, and your 
suggested change would make these assignments work even on shells that do 
not apply special parsing rules for it.

[1] https://www.austingroupbugs.net/bug_view_page.php?bug_id=767



Re: [PATCH v2 07/17] libvirt-storage: Document volume upload/download stream format

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:25 +0200, Michal Privoznik wrote:
> For libvirt, the volume is just a binary blob and it doesn't
> interpret data on volume upload/download. But as it turns out,
> this unspoken assumption is not clear to our users. Document it
> explicitly.
> 
> Suggested in: https://bugzilla.redhat.com/show_bug.cgi?id=1851023#c17
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/libvirt-storage.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/libvirt-storage.c b/src/libvirt-storage.c
> index a45c8b98c1..8738f6dd14 100644
> --- a/src/libvirt-storage.c
> +++ b/src/libvirt-storage.c
> @@ -1590,7 +1590,9 @@ virStorageVolCreateXMLFrom(virStoragePoolPtr pool,
>   *
>   * Download the content of the volume as a stream. If @length
>   * is zero, then the remaining contents of the volume after
> - * @offset will be downloaded.
> + * @offset will be downloaded. Please note, that the data is
> + * not interpreted and therefore data received by stream
> + * client are in the very same format the volume is in.

I don't think this wording clarifies it that much as it's not obvious
what's meant by 'interpreted'.

How about:

Please note that the stream transports the volume itself as is, so the
downloaded data may not correspond to guest OS visible state in cases
when a complex storage format such as qcow2 or vmdk is used.



Re: [PATCH v2 06/17] virfdstream: Drop some needless labels

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:24 +0200, Michal Privoznik wrote:
> After previous cleanups, some labels in some functions have
> nothing but 'return' statement in them. Drop the labels and
> replace 'goto'-s with respective return statements.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 24 +---
>  1 file changed, 9 insertions(+), 15 deletions(-)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 04/17] virfdstream: Use g_new0() instead of VIR_ALLOC()

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:22 +0200, Michal Privoznik wrote:
> This switch allows us to save a few lines of code.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 05/17] virfdstream: Use VIR_AUTOCLOSE()

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:23 +0200, Michal Privoznik wrote:
> Again, instead of closing FDs explicitly, we can automatically
> close them when they go out of their respective scopes.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

[...]

> @@ -1160,9 +1158,10 @@ int virFDStreamConnectUNIX(virStreamPtr st,
>  {
>  struct sockaddr_un sa;
>  virTimeBackOffVar timeout;
> +VIR_AUTOCLOSE fd = -1;
>  int ret;
>  
> -int fd = socket(AF_UNIX, SOCK_STREAM, 0);
> +fd = socket(AF_UNIX, SOCK_STREAM, 0);
>  if (fd < 0) {
>  virReportSystemError(errno, "%s", _("Unable to open UNIX socket"));
>  goto error;
> @@ -1197,10 +1196,11 @@ int virFDStreamConnectUNIX(virStreamPtr st,
>  
>  if (virFDStreamOpenInternal(st, fd, NULL, 0) < 0)
>  goto error;
> +
> +fd = -1;
>  return 0;

Please move the clearing of 'fd' closer to the call to
'virFDStreamOpenInternal', which consumes it, rather than to the return
statement, so that it's visually clearer that it's consumed by the call.


Reviewed-by: Peter Krempa 



Re: [PATCH v2 03/17] virfdstream: Use autoptr for virFDStreamMsg

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:21 +0200, Michal Privoznik wrote:
> A cleanup function can be declared for virFDStreamMsg type so
> that the structure doesn't have to be freed explicitly.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)

Reviewed-by: Peter Krempa 



Re: [PATCH v2 02/17] virFDStreamMsgQueuePush: Clear pointer to passed message

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:20 +0200, Michal Privoznik wrote:
> All callers of virFDStreamMsgQueuePush() have the same pattern:
> they explicitly set @msg passed to NULL to avoid freeing it later
> on. Well, the function can take address of the pointer and clear
> it for them.
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/util/virfdstream.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)

Reviewed-by: Peter Krempa  



Re: device compatibility interface for live migration with assigned devices

2020-08-20 Thread Yan Zhao
On Thu, Aug 20, 2020 at 06:16:28AM +0100, Sean Mooney wrote:
> On Thu, 2020-08-20 at 12:01 +0800, Yan Zhao wrote:
> > On Thu, Aug 20, 2020 at 02:29:07AM +0100, Sean Mooney wrote:
> > > On Thu, 2020-08-20 at 08:39 +0800, Yan Zhao wrote:
> > > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote:
> > > > > On Tue, 18 Aug 2020 10:16:28 +0100
> > > > > Daniel P. Berrangé  wrote:
> > > > > 
> > > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote:
> > > > > > >On 2020/8/18 下午4:55, Daniel P. Berrangé wrote:
> > > > > > > 
> > > > > > >  On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
> > > > > > > 
> > > > > > >  On 2020/8/14 下午1:16, Yan Zhao wrote:
> > > > > > > 
> > > > > > >  On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
> > > > > > > 
> > > > > > >  On 2020/8/10 下午3:46, Yan Zhao wrote:  
> > > > > > >  we actually can also retrieve the same information through
> > > > > > > sysfs, e.g.
> > > > > > > 
> > > > > > >  |- [path to device]
> > > > > > >     |--- migration
> > > > > > >     |    |--- self
> > > > > > >     |    |    |--- device_api
> > > > > > >     |    |    |--- mdev_type
> > > > > > >     |    |    |--- software_version
> > > > > > >     |    |    |--- device_id
> > > > > > >     |    |    |--- aggregator
> > > > > > >     |    |--- compatible
> > > > > > >     |    |    |--- device_api
> > > > > > >     |    |    |--- mdev_type
> > > > > > >     |    |    |--- software_version
> > > > > > >     |    |    |--- device_id
> > > > > > >     |    |    |--- aggregator
> > > > > > > 
> > > > > > > 
> > > > > > >  Yes but:
> > > > > > > 
> > > > > > >  - You need one file per attribute (one syscall for one attribute)
> > > > > > >  - Attribute is coupled with kobject
> > > > > 
> > > > > Is that really that bad? You have the device with an embedded kobject
> > > > > anyway, and you can just put things into an attribute group?
> > > > > 
> > > > > [Also, I think that self/compatible split in the example makes things
> > > > > needlessly complex. Shouldn't semantic versioning and matching already
> > > > > cover nearly everything? I would expect very few cases that are more
> > > > > complex than that. Maybe the aggregation stuff, but I don't think we
> > > > > need that self/compatible split for that, either.]
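The semantic-versioning match Cornelia refers to could be sketched as follows, assuming the major.minor scheme proposed earlier in the thread (the helper name and the exact rule, same major plus target minor no older than source minor, are illustrative assumptions, not a defined kernel interface):

```python
# Illustrative sketch (assumed major.minor semantics, hypothetical helper):
# same major version required; the target must support at least the source's
# minor version for the migration stream to be understood.
def versions_compatible(source: str, target: str) -> bool:
    s_major, s_minor = (int(x) for x in source.split(".")[:2])
    t_major, t_minor = (int(x) for x in target.split(".")[:2])
    return s_major == t_major and t_minor >= s_minor


assert versions_compatible("2.1", "2.3")      # newer minor on target: ok
assert not versions_compatible("2.1", "3.0")  # major mismatch: refuse
assert not versions_compatible("2.3", "2.1")  # target too old: refuse
```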
> > > > 
> > > > Hi Cornelia,
> > > > 
> > > > The reason I want to declare a compatible list of attributes is that
> > > > sometimes it's not a simple 1:1 matching of source attributes and
> > > > target attributes, as I demonstrated below:
> > > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible 
> > > > to
> > > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2),
> > > >(mdev_type i915-GVTg_V5_8 + aggregator 4)
> > > 
> > > the way you are doing the naming is still really confusing, by the way.
> > > if this has not already been merged in the kernel, can you change the mdev
> > > so that mdev_type i915-GVTg_V5_2 is 2 of mdev_type i915-GVTg_V5_1 instead
> > > of half the device?
> > > 
> > > currently you need to divide the aggregator by the number at the end of
> > > the mdev type to figure out how much of the physical device is being
> > > used, which is a very unfriendly api convention.
> > > 
> > > the way aggregators are being proposed in general is not really something
> > > i like, but i think this at least is something that should be possible to
> > > correct.
> > > 
> > > with the complexity in the mdev type name + aggregator i suspect that this
> > > will never be supported in openstack nova directly, requiring integration
> > > via cyborg, unless we can pre-partition the device into mdevs statically
> > > and just ignore this.
> > > 
> > > this is way too vendor specific to integrate into something like openstack
> > > nova unless we can guarantee that how aggregators work will be portable
> > > across vendors generically.
> > > 
> > > > 
> > > > and aggregator may be just one such example where 1:1 matching does
> > > > not fit.
> > > 
> > > for openstack nova i don't see us supporting anything beyond the 1:1 case
> > > where the mdev type does not change.
> > > 
> > 
> > hi Sean,
> > I understand it's hard for openstack, but 1:N is always meaningful.
> > e.g.
> > if source device 1 has cap A, it is compatible to
> > device 2: cap A,
> > device 3: cap A+B,
> > device 4: cap A+B+C
> > 
> > to allow openstack to detect it correctly, in the compatible list of
> > device 2, we would say the compatible cap is A;
> > device 3, the compatible cap is A or A+B;
> > device 4, the compatible cap is A, A+B, or A+B+C;
> > 
> > then if openstack finds device 1's self cap A is contained in the
> > compatible caps of devices 2/3/4, it can migrate device 1 to device 2, 3, or 4.
> > 
> > conversely, device 1's compatible cap is only A,
> > so it is possible to migrate device 2 to device 1, but not possible to
> > migrate device 3/4 to device 1.
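The 1:N matching rule Yan describes can be sketched as a simple set-containment check (hypothetical data structures, not a real sysfs layout or management-tool API):

```python
# Illustrative sketch of the matching above: a source device can migrate to
# a target if the source's own capability set appears in the target's list
# of compatible capability sets.
def can_migrate(source_caps: frozenset, target_compatible: list) -> bool:
    return source_caps in (frozenset(s) for s in target_compatible)


dev1_self = frozenset({"A"})
dev2_compatible = [{"A"}]
dev3_compatible = [{"A"}, {"A", "B"}]
dev4_compatible = [{"A"}, {"A", "B"}, {"A", "B", "C"}]

# device 1 (cap A) can migrate to devices 2, 3 and 4:
assert all(can_migrate(dev1_self, c)
           for c in (dev2_compatible, dev3_compatible, dev4_compatible))
# but device 3 (caps A+B) cannot migrate to device 1, whose compatible
# list contains only {A}:
assert not can_migrate(frozenset({"A", "B"}), [{"A"}])
```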
> 
> yes we built the placement service around the idea of capabilities as traits on
>