[libvirt] question about syntax of storage volume element

2018-09-27 Thread Jim Fehlig
I've attempted to use virt-manager to create a new VM that uses a volume from an 
rbd-based network pool, but have not been able to progress past step 4/5 where 
VM storage is selected. It appears virt-manager has problems properly detecting 
the volume as network-based storage, but before investigating those further I 
have a question about the syntax of the <target> element of a storage volume. 
The storage management page [0] of the website describing rbd volumes claims 
that the <path> subelement contains an 'rbd:rbd/' prefix in the volume path. But 
the page describing pool and volume format [1] syntax does not contain any info 
wrt specifying network URLs in the <path> subelement.


What is the expectation wrt the <path> subelement of the <target> element within 
rbd volume config? In general, should the <path> subelement encode the scheme 
(e.g. rbd://) of a network-based volume? And if so, should it use the 
traditional 'rbd://' syntax or the 'rbd:rbd/' syntax?
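
For reference, a sketch of the two candidate forms (the pool and image names
are illustrative, not taken from an actual config):

    <volume type='network'>
      <name>image1</name>
      <target>
        <!-- form shown on the storage management page [0] -->
        <path>rbd:rbd/image1</path>
        <!-- versus a conventional URL scheme -->
        <!-- <path>rbd://rbd/image1</path> -->
      </target>
    </volume>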


Regards,
Jim

[0] https://libvirt.org/storage.html#StorageBackendRBD
[1] https://libvirt.org/formatstorage.html#StorageVolTarget



[libvirt] [PATCH] uml: Fix weird logic inside umlConnectOpen() function.

2018-09-27 Thread Julio Faracco
The uml_driver pointer needs to be checked for NULL before it is used
inside the function. With the current logic, some attributes of the driver
are accessed while the pointer may still be NULL.

Signed-off-by: Julio Faracco 
---
 src/uml/uml_driver.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/uml/uml_driver.c b/src/uml/uml_driver.c
index fcd468243e..d1c71d8521 100644
--- a/src/uml/uml_driver.c
+++ b/src/uml/uml_driver.c
@@ -1193,6 +1193,13 @@ static virDrvOpenStatus umlConnectOpen(virConnectPtr conn,
 {
     virCheckFlags(VIR_CONNECT_RO, VIR_DRV_OPEN_ERROR);
 
+    /* URI was good, but driver isn't active */
+    if (uml_driver == NULL) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("uml state driver is not active"));
+        return VIR_DRV_OPEN_ERROR;
+    }
+
     /* Check path and tell them correct path if they made a mistake */
     if (uml_driver->privileged) {
         if (STRNEQ(conn->uri->path, "/system") &&
@@ -1211,13 +1218,6 @@ static virDrvOpenStatus umlConnectOpen(virConnectPtr conn,
         }
     }
 
-    /* URI was good, but driver isn't active */
-    if (uml_driver == NULL) {
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("uml state driver is not active"));
-        return VIR_DRV_OPEN_ERROR;
-    }
-
     if (virConnectOpenEnsureACL(conn) < 0)
         return VIR_DRV_OPEN_ERROR;
 
-- 
2.17.1



Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Friday, September 28, 2018 4:01 AM
> 
> On Fri, 28 Sep 2018 00:53:00 +0530
> Kirti Wankhede  wrote:
> 
> > On 9/27/2018 9:29 PM, Alex Williamson wrote:
> > > On Thu, 27 Sep 2018 06:48:27 +
> > > "Tian, Kevin"  wrote:
> > >
> > >>> From: Kirti Wankhede
> > >>> Sent: Thursday, September 27, 2018 2:22 PM
> > >>>
> > >>> Generally a single instance of mdev device, a share of physical device, is
> > >>> assigned to user space application or a VM. There are cases when multiple
> > >>> instances of mdev devices of same or different types are required by User
> > >>> space application or VM. For example in case of vGPU, multiple mdev
> > >>> devices
> > >>> of type which represents whole GPU can be assigned to one instance of
> > >>> application or VM.
> > >>>
> > >>> All types of mdev devices may not support assigning multiple mdev devices
> > >>> to a user space application. In that case vendor driver can fail open()
> > >>> call of mdev device. But there is no way to know User space application
> > >>> about the configuration supported by vendor driver.
> > >>>
> > >>> To expose supported configuration, vendor driver should add
> > >>> 'multiple_mdev_support' attribute to type-id directory if vendor driver
> > >>> supports assigning multiple mdev devices of a particular type-id to one
> > >>> instance of user space application or a VM.
> > >>>
> > >>> User space application should read if 'multiple_mdev_support' attribute is
> > >>> present in type-id directory of all mdev devices which are going to be
> > >>> used. If all read 1 then user space application can proceed with multiple
> > >>> mdev devices.
> > >>>
> > >>> This is optional and readonly attribute.
> > >>
> > >> I didn't get what is the exact problem from above description. Isn't it the
> > >> basic point behind mdev concept that parent driver can create multiple
> > >> mdev instances on a physical device? VFIO usually doesn't care whether
> > >> multiple devices (pci or mdev) are assigned to same process/VM or not.
> > >> Can you give some example of actual problem behind this change?
> > >
> > > The situation here is that mdev imposes no restrictions regarding how
> > > mdev devices can be used by the user.  Multiple mdevs, regardless of
> > > type, can be combined and used in any way the user sees fit, at least
> > > that's the theory.  However, in practice, certain vendor drivers impose
> > > a limitation that prevents multiple mdev devices from being used in the
> > > same VM.  This is done by returning an error when the user attempts to
> > > open those additional devices.  Now we're in the very predictable
> > > situation that we want to consider that some of the vendor's mdev types
> > > might be combined in the same userspace instance, while others still
> > > impose a restriction, and how can we optionally expose such per mdev
> > > type restrictions to the userspace so we have something better than
> > > "try it and see if it works".
> > >
> > > The below proposal assumes that a single instance per VM is the norm
> > > and that we add something to the API to indicate multiple instances are
> > > supported.  However, that's really never been the position of the mdev
> > > core, where there's no concept of limiting devices per user instance;
> > > it's a vendor driver restriction.  So I think the question is then why
> > > should we burden vendor drivers who do not impose a restriction with an
> > > additional field?  Does the below extension provide a better backwards
> > > compatibility scenario?
> >
> > The vendor driver who do not want to impose restriction, doesn't need to
> > provide this attribute. In that case, behavior would remain same as it
> > is now, i.e. multiple mdev instances can be used by one instance of
> > application.
> >
> >
> > > With the current code, I think it's reasonable for userspace to assume
> > > there are no mdev limits per userspace instance, those that exist are
> > > vendor specific.  The below proposal is for an optional field, what
> > > does the management tool assume when it's not supplied?  We have both
> > > cases currently, mdev devices that allow multiple instances per VM and
> > > those that do not, so I don't see that the user is more informed with
> > > this addition and no attempt is made here to synchronously update all
> > > drivers to expose this new attribute.
> > >
> >
> > When this attribute is not provided, behavior should remain same as it
> > is now. But if this attribute is provided then management tool should
> > read this attribute to verify that such combination is supported by
> > vendor driver.
> >
> > > Does it make more sense then to add an attribute to expose the
> > > restriction rather than the lack of restriction.  Perhaps something
> > > like "singleton_usage_restriction".
> >
> > I would prefer to expose what is supported than what is restricted.
>
> Above, it's stated 
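
For reference, a minimal userspace sketch of the check proposed in this
thread (the attribute name follows the RFC; the type directory path is
illustrative, and a missing attribute means the vendor advertises nothing,
so today's "try it and see" behavior applies):

    #include <stdio.h>
    #include <stdbool.h>

    /* type_dir: e.g. /sys/.../<device>/mdev_supported_types/<type-id> */
    static bool
    mdev_type_advertises_multiple(const char *type_dir)
    {
        char path[4096];
        char buf[4] = "";
        FILE *fp;

        snprintf(path, sizeof(path), "%s/multiple_mdev_support", type_dir);
        if (!(fp = fopen(path, "r")))
            return false;  /* attribute absent: vendor makes no claim */
        if (!fgets(buf, sizeof(buf), fp))
            buf[0] = '\0';
        fclose(fp);
        return buf[0] == '1' || buf[0] == 'Y' || buf[0] == 'y';
    }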

[libvirt] [PATCH v3 5/6] qemu_driver: BaselineHypervisorCPU support for S390

2018-09-27 Thread Chris Venteicher
Libvirt can compute a baseline hypervisor CPU from a list of S390 CPU models
by using QEMU QMP messages to compute the baseline within QEMU.

QEMU only baselines one pair of CPU models per query, so a sequence of
multiple QMP queries is needed if more than two CPU models are
baselined.
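
A sketch of the resulting reduction (baseline_pair_fn is a hypothetical
stand-in for one query-cpu-model-baseline exchange, not a function from
this series):

    #include <stddef.h>

    /* Hypothetical stand-in for a single query-cpu-model-baseline QMP
     * exchange; a real implementation talks to the QEMU monitor. */
    typedef const char *(*baseline_pair_fn)(const char *a, const char *b);

    /* Reduce N model names to one baseline using N-1 pairwise queries. */
    static const char *
    baseline_all(const char **models, size_t nmodels, baseline_pair_fn pair)
    {
        const char *baseline = models[0];
        size_t i;

        for (i = 1; i < nmodels; i++)
            baseline = pair(baseline, models[i]);  /* one QMP query each */
        return baseline;
    }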

Full expansion of the baseline S390 model is supported, using QEMU QMP
messages to compute the expansion within QEMU.

Introduces qemuMonitorCPUModelInfoRemovePropByBoolValue to remove props by
boolean value, as required to convert between the form of CPU model property
lists that enumerates every property with a true / false value indicating
whether it is included in the model, and the form that only enumerates
properties actually included in the model.
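
Illustratively (a sketch, not the patch's implementation), the conversion
compacts the list by dropping boolean props whose value matches the value
being removed, e.g. {a:true, b:false, c:true} with false dropped becomes
{a, c}:

    #include <stdbool.h>
    #include <stddef.h>

    struct boolprop { const char *name; bool value; };

    static size_t
    squash_props(struct boolprop *props, size_t nprops, bool drop)
    {
        size_t kept = 0, i;

        for (i = 0; i < nprops; i++)
            if (props[i].value != drop)
                props[kept++] = props[i];  /* compact the list in place */
        return kept;                       /* number of props remaining */
    }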

Introduces functions for virCPUDef / qemuMonitorCPUModelInfo conversion.

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_capabilities.c | 128 +--
 src/qemu/qemu_capabilities.h |   4 +
 src/qemu/qemu_driver.c   | 143 +--
 src/qemu/qemu_monitor.c  |  41 ++
 src/qemu/qemu_monitor.h  |  10 +++
 src/qemu/qemu_monitor_json.c |  60 +++
 src/qemu/qemu_monitor_json.h |   7 ++
 7 files changed, 365 insertions(+), 28 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 5b1e4f1bbd..c5bf04993b 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2785,7 +2785,8 @@ virQEMUCapsInitCPUModelS390(virQEMUCapsPtr qemuCaps,
                             virCPUDefPtr cpu,
                             bool migratable)
 {
-    size_t i;
+    virCPUDefPtr tmp = NULL;
+    int ret = -1;
 
     if (!modelInfo) {
         if (type == VIR_DOMAIN_VIRT_KVM) {
@@ -2798,32 +2799,18 @@ virQEMUCapsInitCPUModelS390(virQEMUCapsPtr qemuCaps,
         return 2;
     }
 
-    if (VIR_STRDUP(cpu->model, modelInfo->name) < 0 ||
-        VIR_ALLOC_N(cpu->features, modelInfo->nprops) < 0)
-        return -1;
-
-    cpu->nfeatures_max = modelInfo->nprops;
-    cpu->nfeatures = 0;
-
-    for (i = 0; i < modelInfo->nprops; i++) {
-        virCPUFeatureDefPtr feature = cpu->features + cpu->nfeatures;
-        qemuMonitorCPUPropertyPtr prop = modelInfo->props + i;
+    if (!(tmp = virQEMUCapsCPUModelInfoToCPUDef(migratable, modelInfo)))
+        goto cleanup;
 
-        if (prop->type != QEMU_MONITOR_CPU_PROPERTY_BOOLEAN)
-            continue;
+    /* Free original then copy over model, vendor, vendor_id and features */
+    virCPUDefStealModel(cpu, tmp, true);
 
-        if (VIR_STRDUP(feature->name, prop->name) < 0)
-            return -1;
+    ret = 0;
 
-        if (!prop->value.boolean ||
-            (migratable && prop->migratable == VIR_TRISTATE_BOOL_NO))
-            feature->policy = VIR_CPU_FEATURE_DISABLE;
-        else
-            feature->policy = VIR_CPU_FEATURE_REQUIRE;
-        cpu->nfeatures++;
-    }
+ cleanup:
+    virCPUDefFree(tmp);
 
-    return 0;
+    return ret;
 }
 
 
@@ -3620,6 +3607,101 @@ virQEMUCapsLoadCache(virArch hostArch,
     return ret;
 }
 
+/* virCPUDef model    => qemuMonitorCPUModelInfo name
+ * virCPUDef features => qemuMonitorCPUModelInfo boolean properties */
+qemuMonitorCPUModelInfoPtr
+virQEMUCapsCPUModelInfoFromCPUDef(const virCPUDef *cpuDef)
+{
+    size_t i;
+    qemuMonitorCPUModelInfoPtr cpuModel = NULL;
+    qemuMonitorCPUModelInfoPtr ret = NULL;
+
+    if (!cpuDef || (VIR_ALLOC(cpuModel) < 0))
+        goto cleanup;
+
+    VIR_DEBUG("cpuDef->model = %s", NULLSTR(cpuDef->model));
+
+    if (VIR_STRDUP(cpuModel->name, cpuDef->model) < 0 ||
+        VIR_ALLOC_N(cpuModel->props, cpuDef->nfeatures) < 0)
+        goto cleanup;
+
+    cpuModel->nprops = 0;
+
+    for (i = 0; i < cpuDef->nfeatures; i++) {
+        qemuMonitorCPUPropertyPtr prop = &(cpuModel->props[cpuModel->nprops]);
+        virCPUFeatureDefPtr feature = &(cpuDef->features[i]);
+
+        if (!(feature->name) ||
+            VIR_STRDUP(prop->name, feature->name) < 0)
+            goto cleanup;
+
+        prop->type = QEMU_MONITOR_CPU_PROPERTY_BOOLEAN;
+
+        prop->value.boolean = feature->policy == -1 || /* policy undefined */
+                              feature->policy == VIR_CPU_FEATURE_FORCE ||
+                              feature->policy == VIR_CPU_FEATURE_REQUIRE;
+
+        cpuModel->nprops++;
+    }
+
+    VIR_STEAL_PTR(ret, cpuModel);
+
+ cleanup:
+    qemuMonitorCPUModelInfoFree(cpuModel);
+    return ret;
+}
+
+
+/* qemuMonitorCPUModelInfo name               => virCPUDef model
+ * qemuMonitorCPUModelInfo boolean properties => virCPUDef features
+ *
+ * migratable true:  mark non-migratable features as disabled
+ *            false: allow all features as required
+ */
+virCPUDefPtr
+virQEMUCapsCPUModelInfoToCPUDef(bool migratable, qemuMonitorCPUModelInfoPtr model)
+{
+    virCPUDefPtr cpu = NULL;
+    virCPUDefPtr ret = NULL;
+    size_t i;
+
+    if (!model || VIR_ALLOC(cpu) < 0)
+        goto cleanup;
+
+    VIR_DEBUG("model->name= %s", 

[libvirt] [PATCH v3 4/6] qemu_process: Use common processes mgmt funcs for all QMP query types

2018-09-27 Thread Chris Venteicher
Generalize the QEMU process management functions to support all forms of
QMP queries, including but not limited to capabilities queries.

QEMU process instances can be maintained and used for multiple different QMP
queries, of the same or different types, as required to complete a
specific libvirt task.

Concurrent QEMU process instances for QMP queries are supported,
using the same or different qemu binaries.

All process mgmt functions are removed from qemu_capabilities and
re-implemented in qemu_process in a generic, non capabilities specific,
manner consistent with the existing qemu_process mgmt functions.

The concept of a qmp_query domain is used to contain process state through
an entire process lifecycle, similar to how VM domains contain process
state in the existing process functions.

Monitor callbacks were not used in the original process management code for
capabilities and are eliminated in the new generalized process code for QMP
queries.

The QMP exchange to exit capabilities negotiation mode and enter command mode
is the sole responsibility of the QMP query process code.
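
For context, entering command mode is a single exchange after QEMU's QMP
greeting (greeting body elided in this sketch):

    <- { "QMP": { "version": { }, "capabilities": [] } }
    -> { "execute": "qmp_capabilities" }
    <- { "return": {} }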

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_capabilities.c | 304 +-
 src/qemu/qemu_monitor.c  |   4 +-
 src/qemu/qemu_process.c  | 398 +++
 src/qemu/qemu_process.h  |  35 +++
 tests/qemucapabilitiestest.c |   7 +
 5 files changed, 489 insertions(+), 259 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index d38530ca80..5b1e4f1bbd 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -47,6 +47,7 @@
 #define __QEMU_CAPSPRIV_H_ALLOW__
 #include "qemu_capspriv.h"
 #include "qemu_qapi.h"
+#include "qemu_process.h"
 
 #include 
 #include 
@@ -3960,18 +3961,6 @@ virQEMUCapsIsValid(void *data,
 }
 
 
-static void virQEMUCapsMonitorNotify(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
-                                     virDomainObjPtr vm ATTRIBUTE_UNUSED,
-                                     void *opaque ATTRIBUTE_UNUSED)
-{
-}
-
-static qemuMonitorCallbacks callbacks = {
-    .eofNotify = virQEMUCapsMonitorNotify,
-    .errorNotify = virQEMUCapsMonitorNotify,
-};
-
-
 /**
  * virQEMUCapsInitQMPArch:
  * @qemuCaps: QEMU capabilities
@@ -4067,13 +4056,6 @@ virQEMUCapsInitQMPMonitor(virQEMUCapsPtr qemuCaps,
 
     /* @mon is supposed to be locked by callee */
 
-    if (qemuMonitorSetCapabilities(mon) < 0) {
-        VIR_DEBUG("Failed to set monitor capabilities %s",
-                  virGetLastErrorMessage());
-        ret = 0;
-        goto cleanup;
-    }
-
     if (qemuMonitorGetVersion(mon,
                               &major, &minor, &micro,
                               &package) < 0) {
@@ -4247,13 +4229,6 @@ virQEMUCapsInitQMPMonitorTCG(virQEMUCapsPtr qemuCaps ATTRIBUTE_UNUSED,
 {
     int ret = -1;
 
-    if (qemuMonitorSetCapabilities(mon) < 0) {
-        VIR_DEBUG("Failed to set monitor capabilities %s",
-                  virGetLastErrorMessage());
-        ret = 0;
-        goto cleanup;
-    }
-
     if (virQEMUCapsProbeQMPCPUDefinitions(qemuCaps, mon, true) < 0)
         goto cleanup;
 
@@ -4266,251 +4241,75 @@ virQEMUCapsInitQMPMonitorTCG(virQEMUCapsPtr qemuCaps ATTRIBUTE_UNUSED,
 {
 }
 
-typedef struct _virQEMUCapsInitQMPCommand virQEMUCapsInitQMPCommand;
-typedef virQEMUCapsInitQMPCommand *virQEMUCapsInitQMPCommandPtr;
-struct _virQEMUCapsInitQMPCommand {
-    char *binary;
-    uid_t runUid;
-    gid_t runGid;
-    char **qmperr;
-    char *monarg;
-    char *monpath;
-    char *pidfile;
-    virCommandPtr cmd;
-    qemuMonitorPtr mon;
-    virDomainChrSourceDef config;
-    pid_t pid;
-    virDomainObjPtr vm;
-};
-
-
-static void
-virQEMUCapsInitQMPCommandAbort(virQEMUCapsInitQMPCommandPtr cmd)
-{
-    if (cmd->mon)
-        virObjectUnlock(cmd->mon);
-    qemuMonitorClose(cmd->mon);
-    cmd->mon = NULL;
-
-    virCommandAbort(cmd->cmd);
-    virCommandFree(cmd->cmd);
-    cmd->cmd = NULL;
-
-    if (cmd->monpath)
-        unlink(cmd->monpath);
-
-    virDomainObjEndAPI(&cmd->vm);
-
-    if (cmd->pid != 0) {
-        char ebuf[1024];
-
-        VIR_DEBUG("Killing QMP caps process %lld", (long long)cmd->pid);
-        if (virProcessKill(cmd->pid, SIGKILL) < 0 && errno != ESRCH)
-            VIR_ERROR(_("Failed to kill process %lld: %s"),
-                      (long long)cmd->pid,
-                      virStrerror(errno, ebuf, sizeof(ebuf)));
-
-        VIR_FREE(*cmd->qmperr);
-    }
-    if (cmd->pidfile)
-        unlink(cmd->pidfile);
-    cmd->pid = 0;
-}
-
-
-static void
-virQEMUCapsInitQMPCommandFree(virQEMUCapsInitQMPCommandPtr cmd)
-{
-    if (!cmd)
-        return;
-
-    virQEMUCapsInitQMPCommandAbort(cmd);
-    VIR_FREE(cmd->binary);
-    VIR_FREE(cmd->monpath);
-    VIR_FREE(cmd->monarg);
-    VIR_FREE(cmd->pidfile);
-    VIR_FREE(cmd);
-}
-
-
-static virQEMUCapsInitQMPCommandPtr
-virQEMUCapsInitQMPCommandNew(char *binary,
-                             const char *libDir,
-                             uid_t runUid,
-  

[libvirt] [PATCH v3 6/6] qemu_monitor: Default props to migratable when expanding cpu model

2018-09-27 Thread Chris Venteicher
QEMU only returns migratable props when expanding a CPU model, unless
explicitly told to also include non-migratable props.

Props will be marked migratable when we are certain QEMU returned only
migratable props, resulting in consistent information and expansion output
for s390 that is consistent with x86.

After this change, prop->migratable immediately defaults to _YES for all
props when we know QEMU only included migratable props in the CPU model.

Set model->migratability = true when we have set prop->migratable.

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_monitor.c   | 53 ++-
 src/qemu/qemu_monitor.h   |  7 +-
 .../caps_2.10.0.s390x.xml | 60 -
 .../caps_2.11.0.s390x.xml | 58 -
 .../caps_2.12.0.s390x.xml | 56 
 .../qemucapabilitiesdata/caps_2.8.0.s390x.xml | 32 +-
 .../qemucapabilitiesdata/caps_2.9.0.s390x.xml | 34 +-
 .../qemucapabilitiesdata/caps_3.0.0.s390x.xml | 64 +--
 8 files changed, 210 insertions(+), 154 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 7045186c4e..7a7c175f0a 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -3698,12 +3698,31 @@ qemuMonitorGetCPUModelExpansion(qemuMonitorPtr mon,
                                 qemuMonitorCPUModelInfoPtr *expansion)
 {
+    int ret = -1;
+    qemuMonitorCPUModelInfoPtr tmp;
+
     VIR_DEBUG("type=%d model_name=%s migratable=%d",
               type, input->name, migratable);
 
+    *expansion = NULL;
+
     QEMU_CHECK_MONITOR(mon);
 
-    return qemuMonitorJSONGetCPUModelExpansion(mon, type, migratable, input, expansion);
+    if ((ret = qemuMonitorJSONGetCPUModelExpansion(mon, type, migratable, input, &tmp)) < 0)
+        goto cleanup;
+
+    if (migratable) {
+        /* Only migratable props were included in expanded CPU model */
+        *expansion = qemuMonitorCPUModelInfoCopyDefaultMigratable(tmp);
+    } else {
+        VIR_STEAL_PTR(*expansion, tmp);
+    }
+
+    ret = 0;
+
+ cleanup:
+    qemuMonitorCPUModelInfoFree(tmp);
+    return ret;
 }
 
 
@@ -3801,6 +3820,38 @@ qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig)
 }
 
 
+qemuMonitorCPUModelInfoPtr
+qemuMonitorCPUModelInfoCopyDefaultMigratable(const qemuMonitorCPUModelInfo *orig)
+{
+    qemuMonitorCPUModelInfoPtr ret = NULL;
+    qemuMonitorCPUModelInfoPtr tmp = NULL;
+    qemuMonitorCPUPropertyPtr prop = NULL;
+    size_t i;
+
+    if (!(tmp = qemuMonitorCPUModelInfoCopy(orig)))
+        goto cleanup;
+
+    for (i = 0; i < tmp->nprops; i++) {
+        prop = tmp->props + i;
+
+        /* Default prop thats in cpu model (true) to migratable (_YES)
+         * unless prop already explicitly set not migratable (_NO)
+         */
+        if (prop->type == QEMU_MONITOR_CPU_PROPERTY_BOOLEAN &&
+            prop->value.boolean &&
+            prop->migratable != VIR_TRISTATE_BOOL_NO)
+            prop->migratable = VIR_TRISTATE_BOOL_YES;
+    }
+
+    tmp->migratability = true; /* prop->migratable = YES/NO for all CPU props */
+
+    VIR_STEAL_PTR(ret, tmp);
+
+ cleanup:
+    return ret;
+}
+
+
 /* Squash CPU Model Info property list
  * removing props of type boolean matching value */
 void
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index fad39945d3..7268bd7977 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1025,7 +1025,7 @@ struct _qemuMonitorCPUModelInfo {
     char *name;
     size_t nprops;
    qemuMonitorCPUPropertyPtr props;
-    bool migratability;
+    bool migratability; /* true if prop->migratable is YES/NO for all CPU props */
 };
 
 typedef enum {
@@ -1048,6 +1048,11 @@ qemuMonitorCPUModelInfoPtr qemuMonitorCPUModelInfoNew(const char *name);
 qemuMonitorCPUModelInfoPtr
 qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig);
 
+qemuMonitorCPUModelInfoPtr
+qemuMonitorCPUModelInfoCopyDefaultMigratable(const qemuMonitorCPUModelInfo *orig)
+    ATTRIBUTE_NONNULL(1);
+
+
 int qemuMonitorCPUModelInfoBoolPropAdd(qemuMonitorCPUModelInfoPtr model,
                                        const char *prop_name,
                                        bool prop_value)
diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
index e000aac384..391bee4f06 100644
--- a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
+++ b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
@@ -118,36 +118,36 @@
   306247
   
   s390x
   [30 lines removed and 30 lines added here; the XML element markup was lost in the archive]
   
   
   
diff --git 

[libvirt] [PATCH v3 1/6] qemu_monitor: Introduce qemuMonitorCPUModelInfoNew

2018-09-27 Thread Chris Venteicher
A helper function allocates and initializes the model name in CPU Model
Info structs.

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_capabilities.c |  2 +-
 src/qemu/qemu_monitor.c  | 32 +++-
 src/qemu/qemu_monitor.h  |  2 ++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e228f52ec0..e60a4b369e 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -3022,7 +3022,7 @@ virQEMUCapsLoadHostCPUModelInfo(virQEMUCapsPtr qemuCaps,
         goto cleanup;
     }
 
-    if (VIR_ALLOC(hostCPU) < 0)
+    if (!(hostCPU = qemuMonitorCPUModelInfoNew(NULL)))
         goto cleanup;
 
     if (!(hostCPU->name = virXMLPropString(hostCPUNode, "model"))) {
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 7f7013e115..45a4568fcc 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -3670,6 +3670,31 @@ qemuMonitorGetCPUModelExpansion(qemuMonitorPtr mon,
 }
 
 
+qemuMonitorCPUModelInfoPtr
+qemuMonitorCPUModelInfoNew(const char *name)
+{
+    qemuMonitorCPUModelInfoPtr ret = NULL;
+    qemuMonitorCPUModelInfoPtr model;
+
+    if (VIR_ALLOC(model) < 0)
+        return NULL;
+
+    model->name = NULL;
+    model->nprops = 0;
+    model->props = NULL;
+    model->migratability = false;
+
+    if (VIR_STRDUP(model->name, name) < 0)
+        goto cleanup;
+
+    VIR_STEAL_PTR(ret, model);
+
+ cleanup:
+    qemuMonitorCPUModelInfoFree(model);
+    return ret;
+}
+
+
 void
 qemuMonitorCPUModelInfoFree(qemuMonitorCPUModelInfoPtr model_info)
 {
@@ -3693,18 +3718,15 @@ qemuMonitorCPUModelInfoFree(qemuMonitorCPUModelInfoPtr model_info)
 qemuMonitorCPUModelInfoPtr
 qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig)
 {
-    qemuMonitorCPUModelInfoPtr copy;
+    qemuMonitorCPUModelInfoPtr copy = NULL;
     size_t i;
 
-    if (VIR_ALLOC(copy) < 0)
+    if (!orig || !(copy = qemuMonitorCPUModelInfoNew(orig->name)))
         goto error;
 
     if (VIR_ALLOC_N(copy->props, orig->nprops) < 0)
         goto error;
 
-    if (VIR_STRDUP(copy->name, orig->name) < 0)
-        goto error;
-
     copy->migratability = orig->migratability;
     copy->nprops = orig->nprops;
 
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 48b142a4f4..d87b5a4ec0 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1042,6 +1042,8 @@ int qemuMonitorGetCPUModelExpansion(qemuMonitorPtr mon,
 
 void qemuMonitorCPUModelInfoFree(qemuMonitorCPUModelInfoPtr model_info);
 
+qemuMonitorCPUModelInfoPtr qemuMonitorCPUModelInfoNew(const char *name);
+
 qemuMonitorCPUModelInfoPtr
 qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig);
 
-- 
2.17.1



[libvirt] [PATCH v3 3/6] qemu_monitor: qemuMonitorGetCPUModelExpansion inputs and outputs CPUModelInfo

2018-09-27 Thread Chris Venteicher
A full CPUModelInfo structure with props is sent to QEMU for expansion.

The virQEMUCapsProbeQMPHostCPU migratability logic is partitioned into a new
function for clarity.

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_capabilities.c | 125 ---
 src/qemu/qemu_monitor.c  |  47 +++--
 src/qemu/qemu_monitor.h  |   5 +-
 src/qemu/qemu_monitor_json.c |  20 --
 src/qemu/qemu_monitor_json.h |   6 +-
 tests/cputest.c  |  11 ++-
 6 files changed, 156 insertions(+), 58 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e60a4b369e..d38530ca80 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2352,15 +2352,82 @@ virQEMUCapsProbeQMPCPUDefinitions(virQEMUCapsPtr qemuCaps,
     return 0;
 }
 
+/* virQEMUCapsMigratablePropsDiff
+ * @migratable: migratable props=true, non-migratable & unsupported props=false
+ * @nonMigratable: migratable & non-migratable props = true, unsupported props = false
+ * @augmented: prop->migratable = VIR_TRISTATE_BOOL_{YES/NO} base on diff
+ *
+ * Use differences in Expanded CPUModelInfo inputs
+ * to augment with prop->migratable in CPUModelInfo output
+ */
+static int
+virQEMUCapsMigratablePropsDiff(qemuMonitorCPUModelInfoPtr migratable,
+                               qemuMonitorCPUModelInfoPtr nonMigratable,
+                               qemuMonitorCPUModelInfoPtr *augmented)
+{
+    int ret = -1;
+    qemuMonitorCPUModelInfoPtr tmp;
+    qemuMonitorCPUPropertyPtr prop;
+    qemuMonitorCPUPropertyPtr mProp;
+    qemuMonitorCPUPropertyPtr nmProp;
+    virHashTablePtr hash = NULL;
+    size_t i;
+
+    *augmented = NULL;
+
+    if (!(tmp = qemuMonitorCPUModelInfoCopy(migratable)))
+        goto cleanup;
+
+    if (!nonMigratable)
+        goto done;
+
+    if (!(hash = virHashCreate(0, NULL)))
+        goto cleanup;
+
+    for (i = 0; i < tmp->nprops; i++) {
+        prop = tmp->props + i;
+
+        if (virHashAddEntry(hash, prop->name, prop) < 0)
+            goto cleanup;
+    }
+
+    for (i = 0; i < nonMigratable->nprops; i++) {
+        nmProp = nonMigratable->props + i;
+
+        if (!(mProp = virHashLookup(hash, nmProp->name)) ||
+            mProp->type != QEMU_MONITOR_CPU_PROPERTY_BOOLEAN ||
+            mProp->type != nmProp->type)
+            continue;  /* In non-migratable list but not in migratable list */
+
+        if (mProp->value.boolean) {
+            mProp->migratable = VIR_TRISTATE_BOOL_YES;
+        } else if (nmProp->value.boolean) {
+            mProp->value.boolean = true;
+            mProp->migratable = VIR_TRISTATE_BOOL_NO;
+        }
+    }
+
+    tmp->migratability = true;
+
+ done:
+    VIR_STEAL_PTR(*augmented, tmp);
+    ret = 0;
+
+ cleanup:
+    qemuMonitorCPUModelInfoFree(tmp);
+    virHashFree(hash);
+    return ret;
+}
 
 static int
 virQEMUCapsProbeQMPHostCPU(virQEMUCapsPtr qemuCaps,
                            qemuMonitorPtr mon,
                            bool tcg)
 {
-    qemuMonitorCPUModelInfoPtr modelInfo = NULL;
+    qemuMonitorCPUModelInfoPtr input;
+    qemuMonitorCPUModelInfoPtr migratable = NULL;
     qemuMonitorCPUModelInfoPtr nonMigratable = NULL;
-    virHashTablePtr hash = NULL;
+    qemuMonitorCPUModelInfoPtr augmented = NULL;
     const char *model;
     qemuMonitorCPUModelExpansionType type;
     virDomainVirtType virtType;
@@ -2380,6 +2447,8 @@ virQEMUCapsProbeQMPHostCPU(virQEMUCapsPtr qemuCaps,
 
     cpuData = virQEMUCapsGetHostCPUData(qemuCaps, virtType);
 
+    cpuData->info = NULL;
+
     /* Some x86_64 features defined in cpu_map.xml use spelling which differ
      * from the one preferred by QEMU. Static expansion would give us only the
      * preferred spelling, thus we need to do a full expansion on the result of
@@ -2390,54 +2459,30 @@ virQEMUCapsProbeQMPHostCPU(virQEMUCapsPtr qemuCaps,
     else
         type = QEMU_MONITOR_CPU_MODEL_EXPANSION_STATIC;
 
-    if (qemuMonitorGetCPUModelExpansion(mon, type, model, true, &modelInfo) < 0)
+    if (!(input = qemuMonitorCPUModelInfoNew(model)) ||
+        qemuMonitorGetCPUModelExpansion(mon, type, true, input, &migratable) < 0)
         goto cleanup;
 
-    /* Try to check migratability of each feature. */
-    if (modelInfo &&
-        qemuMonitorGetCPUModelExpansion(mon, type, model, false,
-                                        &nonMigratable) < 0)
+    if (!migratable) {
+        ret = 0;  /* Qemu can't expand the model name, exit without error */
         goto cleanup;
+    }
 
-    if (nonMigratable) {
-        qemuMonitorCPUPropertyPtr prop;
-        qemuMonitorCPUPropertyPtr nmProp;
-        size_t i;
-
-        if (!(hash = virHashCreate(0, NULL)))
-            goto cleanup;
-
-        for (i = 0; i < modelInfo->nprops; i++) {
-            prop = modelInfo->props + i;
-            if (virHashAddEntry(hash, prop->name, prop) < 0)
-                goto cleanup;
-        }
-
-        for (i = 0; i < nonMigratable->nprops; i++) {
-            nmProp = 

[libvirt] [PATCH v3 0/6] BaselineHypervisorCPU using QEMU QMP exchanges

2018-09-27 Thread Chris Venteicher
Some architectures (S390) depend on QEMU to compute a baseline CPU model and
expand a model's feature set.

Interacting with QEMU requires starting the QEMU process and completing one or
more query-cpu-model-baseline QMP exchanges with QEMU in addition to a
query-cpu-model-expansion QMP exchange to expand all features in the model.
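
For example, a single baseline exchange looks roughly like this (the model
names are illustrative, and the props bodies are elided; the exact wire
format is governed by QEMU, not this series):

    -> { "execute": "query-cpu-model-baseline",
         "arguments": { "modela": { "name": "z13" },
                        "modelb": { "name": "z14" } } }
    <- { "return": { "model": { "name": "z13-base", "props": { } } } }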

See "s390x CPU models: exposing features" patch set on Qemu-devel for discussion
of QEMU aspects.

This is part of resolution of: 
https://bugzilla.redhat.com/show_bug.cgi?id=1511999

Version 3 attempts to address all issues from V1 and V2 including making the 
QEMU process activation for QMP Queries generic (not specific to capabilities)
and moving that code from qemu_capabilities to qemu_process. 

I attempted to make the new qemu_process functions consistent with the existing
qemu_process functions, but mostly ended up reusing the guts of the previous
functions from qemu_capabilities.

Of interest may be the choice to reuse a domain structure to hold the QMP query
process state and connection information.  Also, pay attention to the use of a
large random number to uniquely identify decoupled processes in terms of sockets
and file system footprint.  If you believe it's worth the effort, I think there
are some pre-existing library functions to establish files with unique
identifiers in a thread-safe way.

The last patch may also be interesting and is based on past discussion of QEMU
CPU expansion only returning migratable features, except for one x86 case where
non-migratable features are explicitly requested.  The patch records that
features are migratable based on QEMU only returning migratable features.  The
main result is that the CPU XML for S390 CPUs contains the same migratability
info that x86 currently does.  The testcases were updated to reflect the
change.  Is this ok?

Unlike the previous versions every patch should compile independently if applied
in sequence.

Thanks,
Chris


Chris Venteicher (6):
  qemu_monitor: Introduce qemuMonitorCPUModelInfoNew
  qemu_monitor: Introduce qemuMonitorCPUModelInfo / JSON conversion
  qemu_monitor: qemuMonitorGetCPUModelExpansion inputs and outputs
CPUModelInfo
  qemu_process: Use common processes mgmt funcs for all QMP query types
  qemu_driver: BaselineHypervisorCPU support for S390
  qemu_monitor: Default props to migratable when expanding cpu model

 src/qemu/qemu_capabilities.c  | 559 --
 src/qemu/qemu_capabilities.h  |   4 +
 src/qemu/qemu_driver.c| 143 -
 src/qemu/qemu_monitor.c   | 199 ++-
 src/qemu/qemu_monitor.h   |  29 +-
 src/qemu/qemu_monitor_json.c  | 210 +--
 src/qemu/qemu_monitor_json.h  |  13 +-
 src/qemu/qemu_process.c   | 398 +
 src/qemu/qemu_process.h   |  35 ++
 tests/cputest.c   |  11 +-
 .../caps_2.10.0.s390x.xml |  60 +-
 .../caps_2.11.0.s390x.xml |  58 +-
 .../caps_2.12.0.s390x.xml |  56 +-
 .../qemucapabilitiesdata/caps_2.8.0.s390x.xml |  32 +-
 .../qemucapabilitiesdata/caps_2.9.0.s390x.xml |  34 +-
 .../qemucapabilitiesdata/caps_3.0.0.s390x.xml |  64 +-
 tests/qemucapabilitiestest.c  |   7 +
 17 files changed, 1375 insertions(+), 537 deletions(-)

-- 
2.17.1



[libvirt] [PATCH v3 2/6] qemu_monitor: Introduce qemuMonitorCPUModelInfo / JSON conversion

2018-09-27 Thread Chris Venteicher
Conversion functions are used to convert CPUModelInfo structs into
QMP JSON and the reverse.

QMP JSON is of the form:
{"model": {"name": "IvyBridge", "props": {}}}

qemuMonitorCPUModelInfoBoolPropAdd is used to add boolean properties to a
CPUModelInfo struct.

qemuMonitorJSONGetCPUModelExpansion makes full use of the conversions and
propAdd in preparation for supporting input of a full CPU model in the future.
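
A usage sketch of the helpers introduced by this series (the model and
property names are illustrative; error handling trimmed, fragment only):

    qemuMonitorCPUModelInfoPtr model;

    if (!(model = qemuMonitorCPUModelInfoNew("z13-base")))
        return -1;
    if (qemuMonitorCPUModelInfoBoolPropAdd(model, "aen", true) < 0) {
        qemuMonitorCPUModelInfoFree(model);
        return -1;
    }
    /* model now serializes to {"name": "z13-base", "props": {"aen": true}} */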

Signed-off-by: Chris Venteicher 
---
 src/qemu/qemu_monitor.c  |  24 ++
 src/qemu/qemu_monitor.h  |   5 ++
 src/qemu/qemu_monitor_json.c | 154 +--
 3 files changed, 138 insertions(+), 45 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 45a4568fcc..801c072eff 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -3764,6 +3764,30 @@ qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig)
 }
 
 
+int
+qemuMonitorCPUModelInfoBoolPropAdd(qemuMonitorCPUModelInfoPtr model,
+                                   const char *prop_name,
+                                   bool prop_value)
+{
+    int ret = -1;
+    qemuMonitorCPUProperty prop;
+    prop.type = QEMU_MONITOR_CPU_PROPERTY_BOOLEAN;
+    prop.value.boolean = prop_value;
+
+    if (VIR_STRDUP(prop.name, prop_name) < 0)
+        goto cleanup;
+
+    if (VIR_APPEND_ELEMENT(model->props, model->nprops, prop) < 0)
+        goto cleanup;
+
+    ret = 0;
+
+ cleanup:
+    VIR_FREE(prop.name);
+    return ret;
+}
+
+
 int
 qemuMonitorGetCommands(qemuMonitorPtr mon,
                        char ***commands)
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index d87b5a4ec0..0a09590ed1 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -1047,6 +1047,11 @@ qemuMonitorCPUModelInfoPtr qemuMonitorCPUModelInfoNew(const char *name);
 qemuMonitorCPUModelInfoPtr
 qemuMonitorCPUModelInfoCopy(const qemuMonitorCPUModelInfo *orig);
 
+int qemuMonitorCPUModelInfoBoolPropAdd(qemuMonitorCPUModelInfoPtr model,
+                                       const char *prop_name,
+                                       bool prop_value)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+
 int qemuMonitorGetCommands(qemuMonitorPtr mon,
                            char ***commands);
 int qemuMonitorGetEvents(qemuMonitorPtr mon,
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 3de298c9e2..f20e9e9379 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -5505,6 +5505,101 @@ qemuMonitorJSONParseCPUModelProperty(const char *key,
     return 0;
 }
 
+
+/* model_json: {"name": "z13-base", "props": {}}
+ */
+static virJSONValuePtr
+qemuMonitorJSONBuildCPUModelInfoToJSON(qemuMonitorCPUModelInfoPtr model)
+{
+    virJSONValuePtr cpu_props = NULL;
+    virJSONValuePtr model_json = NULL;
+    size_t i;
+
+    if (!model)
+        goto cleanup;
+
+    if (model->nprops > 0 && !(cpu_props = virJSONValueNewObject()))
+        goto cleanup;
+
+    for (i = 0; i < model->nprops; i++) {
+        qemuMonitorCPUPropertyPtr prop = &(model->props[i]);
+
+        switch (prop->type) {
+        case QEMU_MONITOR_CPU_PROPERTY_BOOLEAN:
+            if (virJSONValueObjectAppendBoolean(cpu_props, prop->name,
+                                                prop->value.boolean) < 0)
+                goto cleanup;
+            break;
+
+        case QEMU_MONITOR_CPU_PROPERTY_STRING:
+            if (virJSONValueObjectAppendString(cpu_props, prop->name,
+                                               prop->value.string) < 0)
+                goto cleanup;
+            break;
+
+        case QEMU_MONITOR_CPU_PROPERTY_NUMBER:
+            if (virJSONValueObjectAppendNumberLong(cpu_props, prop->name,
+                                                   prop->value.number) < 0)
+                goto cleanup;
+            break;
+
+        case QEMU_MONITOR_CPU_PROPERTY_LAST:
+        default:
+            virReportEnumRangeError(qemuMonitorCPUPropertyPtr, prop->type);
+            goto cleanup;
+        }
+    }
+
+    ignore_value(virJSONValueObjectCreate(&model_json, "s:name", model->name,
+                                          "A:props", &cpu_props, NULL));
+
+ cleanup:
+    virJSONValueFree(cpu_props);
+    return model_json;
+}
+
+
+/* model_json: {"name": "IvyBridge", "props": {}}
+ */
+static qemuMonitorCPUModelInfoPtr
+qemuMonitorJSONBuildCPUModelInfoFromJSON(virJSONValuePtr cpu_model)
+{
+    virJSONValuePtr cpu_props;
+    qemuMonitorCPUModelInfoPtr model = NULL;
+    qemuMonitorCPUModelInfoPtr ret = NULL;
+    char const *cpu_name;
+
+    if (!(cpu_name = virJSONValueObjectGetString(cpu_model, "name"))) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("Parsed JSON reply missing 'name'"));
+        goto cleanup;
+    }
+
+    if (VIR_ALLOC(model) < 0)
+        goto cleanup;
+
+    if (VIR_STRDUP(model->name, cpu_name) < 0)
+        goto cleanup;
+
+    if ((cpu_props = virJSONValueObjectGetObject(cpu_model, "props"))) {
+

Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Alex Williamson
On Fri, 28 Sep 2018 00:53:00 +0530
Kirti Wankhede  wrote:

> On 9/27/2018 9:29 PM, Alex Williamson wrote:
> > On Thu, 27 Sep 2018 06:48:27 +
> > "Tian, Kevin"  wrote:
> >   
> >>> From: Kirti Wankhede
> >>> Sent: Thursday, September 27, 2018 2:22 PM
> >>>
> >>> Generally a single instance of mdev device, a share of physical device, is
> >>> assigned to user space application or a VM. There are cases when multiple
> >>> instances of mdev devices of same or different types are required by User
> >>> space application or VM. For example in case of vGPU, multiple mdev
> >>> devices
> >>> of type which represents whole GPU can be assigned to one instance of
> >>> application or VM.
> >>>
> >>> All types of mdev devices may not support assigning multiple mdev devices
> >>> to a user space application. In that case vendor driver can fail open()
> >>> call of mdev device. But there is no way to know User space application
> >>> about the configuration supported by vendor driver.
> >>>
> >>> To expose supported configuration, vendor driver should add
> >>> 'multiple_mdev_support' attribute to type-id directory if vendor driver
> >>> supports assigning multiple mdev devices of a particular type-id to one
> >>> instance of user space application or a VM.
> >>>
> >>> User space application should read if 'multiple_mdev_support' attribute is
> >>> present in type-id directory of all mdev devices which are going to be
> >>> used. If all read 1 then user space application can proceed with multiple
> >>> mdev devices.
> >>>
> >>> This is optional and readonly attribute.
> >>
> >> I didn't get what is the exact problem from above description. Isn't it the
> >> basic point behind mdev concept that parent driver can create multiple
> >> mdev instances on a physical device? VFIO usually doesn't care whether
> >> multiple devices (pci or mdev) are assigned to same process/VM or not.
> >> Can you give some example of actual problem behind this change?  
> > 
> > The situation here is that mdev imposes no restrictions regarding how
> > mdev devices can be used by the user.  Multiple mdevs, regardless of
> > type, can be combined and used in any way the user sees fit, at least
> > that's the theory.  However, in practice, certain vendor drivers impose
> > a limitation that prevents multiple mdev devices from being used in the
> > same VM.  This is done by returning an error when the user attempts to
> > open those additional devices.  Now we're in the very predictable
> > situation that we want to consider that some of the vendor's mdev types
> > might be combined in the same userspace instance, while others still
> > impose a restriction, and how can we optionally expose such per mdev
> > type restrictions to the userspace so we have something better than
> > "try it and see if it works".
> > 
> > The below proposal assumes that a single instance per VM is the norm
> > and that we add something to the API to indicate multiple instances are
> > supported.  However, that's really never been the position of the mdev
> > core, where there's no concept of limiting devices per user instance;
> > it's a vendor driver restriction.  So I think the question is then why
> > should we burden vendor drivers who do not impose a restriction with an
> > additional field?  Does the below extension provide a better backwards
> > compatibility scenario?  
> 
> The vendor driver who do not want to impose restriction, doesn't need to
> provide this attribute. In that case, behavior would remain same as it
> is now, i.e. multiple mdev instances can be used by one instance of
> application.
> 
> 
> > With the current code, I think it's reasonable for userspace to assume
> > there are no mdev limits per userspace instance, those that exist are
> > vendor specific.  The below proposal is for an optional field, what
> > does the management tool assume when it's not supplied?  We have both
> > cases currently, mdev devices that allow multiple instances per VM and
> > those that do not, so I don't see that the user is more informed with
> > this addition and no attempt is made here to synchronously update all
> > drivers to expose this new attribute.
> >   
> 
> When this attribute is not provided, behavior should remain same as it
> is now. But if this attribute is provided then management tool should
> read this attribute to verify that such combination is supported by
> vendor driver.
> 
> > Does it make more sense then to add an attribute to expose the
> > restriction rather than the lack of restriction.  Perhaps something
> > like "singleton_usage_restriction".  
> 
> I would prefer to expose what is supported than what is restricted.

Above, it's stated that vendors who do not impose a restriction do not
need to implement this.  So it's designed to expose a restriction.
We're stating that exposing multiple_mdev_support=1/Y is the equivalent
of not reporting the attribute at all, so the only reason to implement
it would be because there is a 

Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Kirti Wankhede


On 9/27/2018 9:29 PM, Alex Williamson wrote:
> On Thu, 27 Sep 2018 06:48:27 +
> "Tian, Kevin"  wrote:
> 
>>> From: Kirti Wankhede
>>> Sent: Thursday, September 27, 2018 2:22 PM
>>>
>>> Generally a single instance of mdev device, a share of physical device, is
>>> assigned to user space application or a VM. There are cases when multiple
>>> instances of mdev devices of same or different types are required by User
>>> space application or VM. For example in case of vGPU, multiple mdev
>>> devices
>>> of type which represents whole GPU can be assigned to one instance of
>>> application or VM.
>>>
>>> All types of mdev devices may not support assigning multiple mdev devices
>>> to a user space application. In that case vendor driver can fail open()
>>> call of mdev device. But there is no way to know User space application
>>> about the configuration supported by vendor driver.
>>>
>>> To expose supported configuration, vendor driver should add
>>> 'multiple_mdev_support' attribute to type-id directory if vendor driver
>>> supports assigning multiple mdev devices of a particular type-id to one
>>> instance of user space application or a VM.
>>>
>>> User space application should read if 'multiple_mdev_support' attribute is
>>> present in type-id directory of all mdev devices which are going to be
>>> used. If all read 1 then user space application can proceed with multiple
>>> mdev devices.
>>>
>>> This is optional and readonly attribute.  
>>
>> I didn't get what is the exact problem from above description. Isn't it the
>> basic point behind mdev concept that parent driver can create multiple
>> mdev instances on a physical device? VFIO usually doesn't care whether
>> multiple devices (pci or mdev) are assigned to same process/VM or not.
>> Can you give some example of actual problem behind this change?
> 
> The situation here is that mdev imposes no restrictions regarding how
> mdev devices can be used by the user.  Multiple mdevs, regardless of
> type, can be combined and used in any way the user sees fit, at least
> that's the theory.  However, in practice, certain vendor drivers impose
> a limitation that prevents multiple mdev devices from being used in the
> same VM.  This is done by returning an error when the user attempts to
> open those additional devices.  Now we're in the very predictable
> situation that we want to consider that some of the vendor's mdev types
> might be combined in the same userspace instance, while others still
> impose a restriction, and how can we optionally expose such per mdev
> type restrictions to the userspace so we have something better than
> "try it and see if it works".
> 
> The below proposal assumes that a single instance per VM is the norm
> and that we add something to the API to indicate multiple instances are
> supported.  However, that's really never been the position of the mdev
> core, where there's no concept of limiting devices per user instance;
> it's a vendor driver restriction.  So I think the question is then why
> should we burden vendor drivers who do not impose a restriction with an
> additional field?  Does the below extension provide a better backwards
> compatibility scenario?

The vendor driver who do not want to impose restriction, doesn't need to
provide this attribute. In that case, behavior would remain same as it
is now, i.e. multiple mdev instances can be used by one instance of
application.


> With the current code, I think it's reasonable for userspace to assume
> there are no mdev limits per userspace instance, those that exist are
> vendor specific.  The below proposal is for an optional field, what
> does the management tool assume when it's not supplied?  We have both
> cases currently, mdev devices that allow multiple instances per VM and
> those that do not, so I don't see that the user is more informed with
> this addition and no attempt is made here to synchronously update all
> drivers to expose this new attribute.
> 

When this attribute is not provided, behavior should remain same as it
is now. But if this attribute is provided then management tool should
read this attribute to verify that such combination is supported by
vendor driver.

> Does it make more sense then to add an attribute to expose the
> restriction rather than the lack of restriction.  Perhaps something
> like "singleton_usage_restriction".

I would prefer to expose what is supported than what is restricted.

>  Also if the type is meant to be a
> boolean, I think we have Y/N support for that in sysfs rather than
> using integers which raise the question what >1 implies.
> 

Ok. Makes sense. I'll change it to Y/N

Thanks,
Kirti

> Is there a more compelling backwards compatibility story for the below
> proposal than what I'm thinking?  Thanks,
> 
> Alex
> 
>>> Signed-off-by: Neo Jia 
>>> Signed-off-by: Kirti Wankhede 
>>> ---
>>>  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 13 +
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git 

[libvirt] [PATCH 4/4] security: Always spawn process for transactions

2018-09-27 Thread Michal Privoznik
On 09/27/2018 03:33 PM, John Ferlan wrote:
> 
> 
> On 9/27/18 4:51 AM, Michal Privoznik wrote:
>> On 09/26/2018 11:14 PM, John Ferlan wrote:
>>>
> 
> [...]
> 
>>>>> I read the cover, it's not simpler than the above. 
>>>>
>>>> Yeah, this is not that simple bug. I'll try to explain it differently:
>>>>
>>>> On fork(), all FDs of the parent are duplicated to child. So even if
>>>> parent closes one of them, the connection is not closed until child
>>>> closes the same FD. And since metadata locking is very fragile when it
>>>> comes to connection to virtlockd, this fork() behaviour actually plays
>>>> against us because it violates our assumption that connection is closed
>>>> at the end of metadataUnlock(). In fact it is not - child has it still
>>>> open.
>>>>
>>>> If there was a flag like DONT_DUP_ON_FORK I could just set it on
>>>> virtlockd connection in metadataLock() and be done.
>>>
>>> Avoid the forking exec's works too ;-}
>>>
>>> OK - instead of talking in generalities - which forking dup'd fd is in
>>> play here? I don't want to assume or guess any more.
>>
>> The connection FD.
>>
>> Maybe I'm misreading your comments or the other way around. I'll try to
>> explain what is the problem one last time, in the simplest way I can.
>>
>> 1) Thread A wants to start a domain so it calls
>> virSecurityManagerMetadataLock().
>>
>> 2) The function opens a connection to virtlockd (represented by FD) and
>> duplicates the FD so that connection is kept open until unlock.
> 
> Yep, virLockManagerLockDaemonAcquire calls virNetClientDupFD(false)
> meaning I don't want one with the "FD_CLOEXEC" flag set.
> 
> While normally that would be OK - is it OK for this path? Does anything
> change if the argument was true here (e.g. cloexec = priv->type ==
> VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON)?
> 
> It seems to me that the 3 patches together dance around two things -
> that the CLOEXEC is set and that TransactionCommit does the fork to
> avoid the parent/child duplication problem.

Let's forget about CLOEXEC for a moment. I'll wait for Dan to return (or
find somebody else) for those patches. This is the only patch left for
consideration.

> 
>>
>> 3) Now that locks are acquired, the thread does chown() and setfcon().
>>
>> 4) the virSecurityManagerMetadataUnlock() function is called, which
>> causes virtlockd to release all the locks. Subsequently to that, the
>> function closes the duplicated FD obtained in step 2).
>>
>> The duplication is done because both virLockManagerAcquire() and
>> virLockManagerRelease() open new connection, do the RPC talk and close
>> the connection. That's just the way the functions are written. Now,
>> because of FD duplication happening in step 2) the connection from
>> virLockManagerAcquire() is not really closed until step 4) where the
>> duplicated FD is closed.
>>
>> This is important to realize: the connection is closed only when the
>> last FD representing it is closed. If I were to duplicate the connection
>> FD a hundred times and then close 99 of those duplicates the connection
>> would still be kept open.
> 
> I've understood all this.
> 
>>
>> Now, lets put some threads into the picture. Imagine thread B which
>> called fork() just in between steps 2) and 3). On fork, ALL partent FDs
>> are duplicated. So the connection FD is duplicated into the child
>> process too. Therefore, in step 4) where we think the connection is
>> closed - well it is not really because there is still one FD around - in
>> the child.
> 
> This is exactly where the story gets cloudy for me. Is ThreadB related
> to ThreadA? It must be, because how would it get the duplicated fd, right?

They are related in a sense that they run in the same process. But they
don't have to be related at all for fork() to duplicate all the FDs.
fork() does not ask if two threads are related, it just duplicates ALL
opened FDs into the child.
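
A minimal standalone illustration of that point (not libvirt code): the
read() below only sees EOF once the child's inherited copy of the socket
is gone too.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    int main(void)
    {
        int sv[2];
        char c;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;

        if (fork() == 0) {        /* child inherits duplicates of both FDs */
            sleep(2);             /* connection stays open ...             */
            _exit(0);             /* ... until the child's copies close    */
        }

        close(sv[0]);             /* parent closes its copy of one end     */
        /* Blocks ~2s: EOF arrives only once the child's sv[0] is gone too */
        printf("read returned %zd\n", read(sv[1], &c, 1));
        return 0;
    }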

> 
> But ironically we save priv->clientfd, so if we know we've forked a
> child can we immediately close it?  Knowing that our parent still has it
> open? Off in left field again?

I'm failing to see how to achieve that. fork() is occurring more than
you might think - every time virCommandRun() is called, every time we
try to open a file on NFS, and million of other places. Now for every
place we would have to put close() as the first thing that child does
(and even then that would be unsafe because you are not guaranteed when
the child is scheduled to run), and there might be places where we are
unable to put close(), e.g. if some library we link with fork()-s too.
Also, requiring close() would mean making the FD globally available
which goes against all our code separation efforts.

> 
>>
>> And this is huge problem, because we do not know when child is going to
>> close it. It may be (unintentionally) an evil child and close the FD at
>> the worst time possible - when libvirtd holds some locks.
>>
>> Important thing to realize #2: whenever a connection closes, virtlockd
>> looks 

Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Alex Williamson
On Thu, 27 Sep 2018 06:48:27 +
"Tian, Kevin"  wrote:

> > From: Kirti Wankhede
> > Sent: Thursday, September 27, 2018 2:22 PM
> > 
> > Generally a single instance of mdev device, a share of physical device, is
> > assigned to user space application or a VM. There are cases when multiple
> > instances of mdev devices of same or different types are required by User
> > space application or VM. For example in case of vGPU, multiple mdev
> > devices
> > of type which represents whole GPU can be assigned to one instance of
> > application or VM.
> > 
> > All types of mdev devices may not support assigning multiple mdev devices
> > to a user space application. In that case vendor driver can fail open()
> > call of mdev device. But there is no way for the user space application
> > to know about the configuration supported by the vendor driver.
> > 
> > To expose supported configuration, vendor driver should add
> > 'multiple_mdev_support' attribute to type-id directory if vendor driver
> > supports assigning multiple mdev devices of a particular type-id to one
> > instance of user space application or a VM.
> > 
> > User space application should read if 'multiple_mdev_support' attribute is
> > present in type-id directory of all mdev devices which are going to be
> > used. If all read 1 then user space application can proceed with multiple
> > mdev devices.
> > 
> > This is optional and readonly attribute.  
> 
> I didn't get what is the exact problem from above description. Isn't it the
> basic point behind mdev concept that parent driver can create multiple
> mdev instances on a physical device? VFIO usually doesn't care whether
> multiple devices (pci or mdev) are assigned to same process/VM or not.
> Can you give some example of actual problem behind this change?

The situation here is that mdev imposes no restrictions regarding how
mdev devices can be used by the user.  Multiple mdevs, regardless of
type, can be combined and used in any way the user sees fit, at least
that's the theory.  However, in practice, certain vendor drivers impose
a limitation that prevents multiple mdev devices from being used in the
same VM.  This is done by returning an error when the user attempts to
open those additional devices.  Now we're in the very predictable
situation that we want to consider that some of the vendor's mdev types
might be combined in the same userspace instance, while others still
impose a restriction, and how can we optionally expose such per mdev
type restrictions to the userspace so we have something better than
"try it and see if it works".

The below proposal assumes that a single instance per VM is the norm
and that we add something to the API to indicate multiple instances are
supported.  However, that's really never been the position of the mdev
core, where there's no concept of limiting devices per user instance;
it's a vendor driver restriction.  So I think the question is then why
should we burden vendor drivers who do not impose a restriction with an
additional field?  Does the below extension provide a better backwards
compatibility scenario?

With the current code, I think it's reasonable for userspace to assume
there are no mdev limits per userspace instance; those that exist are
vendor specific.  The below proposal is for an optional field: what
does the management tool assume when it's not supplied?  We have both
cases currently, mdev devices that allow multiple instances per VM and
those that do not, so I don't see that the user is more informed with
this addition and no attempt is made here to synchronously update all
drivers to expose this new attribute.

Does it make more sense, then, to add an attribute to expose the
restriction rather than the lack of restriction?  Perhaps something
like "singleton_usage_restriction".  Also if the type is meant to be a
boolean, I think we have Y/N support for that in sysfs rather than
using integers, which raises the question of what >1 implies.
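
From the management side, probing would presumably look something like
this (sketch only; the attribute name is from the proposal below and
the absent-attribute fallback is exactly the open question):

    #include <limits.h>
    #include <stdio.h>

    /* 1 = multiple supported, 0 = not supported, -1 = attribute absent */
    static int mdev_type_allows_multiple(const char *type_dir)
    {
        char path[PATH_MAX], buf[8] = "";
        FILE *fp;

        snprintf(path, sizeof(path), "%s/multiple_mdev_support", type_dir);
        if (!(fp = fopen(path, "r")))
            return -1;          /* vendor driver never exposed it */
        if (!fgets(buf, sizeof(buf), fp))
            buf[0] = '\0';
        fclose(fp);
        return buf[0] == '1' || buf[0] == 'Y';
    }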

Is there a more compelling backwards compatibility story for the below
proposal than what I'm thinking?  Thanks,

Alex

> > Signed-off-by: Neo Jia 
> > Signed-off-by: Kirti Wankhede 
> > ---
> >  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 13 +
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> > b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> > index 452dbe39270e..69e1291479ce 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> > +++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> > @@ -85,6 +85,19 @@ Users:
> > a particular <type-id> that can help in understanding the
> > features provided by that type of mediated device.
> > 
> > +What:   /sys/.../mdev_supported_types/<type-id>/multiple_mdev_support
> > +Date:   September 2018
> > +Contact:Kirti Wankhede 
> > +Description:
> > +   Reading this attribute will return 0 or 1. Returning 1
> > indicates
> > +   multiple mdev devices 

[libvirt] [PATCH 5/8] qemu: hotplug: Add wrapper for disk hotplug code

2018-09-27 Thread Peter Krempa
The disk hotplug code also overloads media changing, which is not ideal.
This will allow splitting out the media change code.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_hotplug.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 1ec696e2aa..6227a130da 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1061,10 +1061,10 @@ qemuDomainAttachUSBMassStorageDevice(virQEMUDriverPtr 
driver,
 }


-int
-qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
-   virDomainObjPtr vm,
-   virDomainDeviceDefPtr dev)
+static int
+qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr driver,
+   virDomainObjPtr vm,
+   virDomainDeviceDefPtr dev)
 {
 size_t i;
 virDomainDiskDefPtr disk = dev->data.disk;
@@ -1157,6 +1157,25 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
 }


+/**
+ * qemuDomainAttachDeviceDiskLive:
+ * @driver: qemu driver struct
+ * @vm: domain object
+ * @dev: device to attach (expected type is DISK)
+ *
+ * Attach a new disk or in case of cdroms/floppies change the media in the 
drive.
+ * This function handles all the necessary steps to attach a new storage source
+ * to the VM.
+ */
+int
+qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
+   virDomainObjPtr vm,
+   virDomainDeviceDefPtr dev)
+{
+return qemuDomainAttachDeviceDiskLiveInternal(driver, vm, dev);
+}
+
+
 static void
 qemuDomainNetDeviceVportRemove(virDomainNetDefPtr net)
 {
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 8/8] qemu: hotplug: Refactor qemuDomainAttachDeviceDiskLiveInternal

2018-09-27 Thread Peter Krempa
We now explicitly handle media change elsewhere, so we can drop the
switch statement. This will also make it more intuitive if CDROM
device hotplug is ever supported.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_hotplug.c | 70 ++---
 1 file changed, 30 insertions(+), 40 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index ef3f0cd8b3..d5e1e2a039 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -1098,52 +1098,42 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 if (qemuDomainDetermineDiskChain(driver, vm, disk, true) < 0)
 goto cleanup;

-switch ((virDomainDiskDevice) disk->device)  {
-case VIR_DOMAIN_DISK_DEVICE_DISK:
-case VIR_DOMAIN_DISK_DEVICE_LUN:
-for (i = 0; i < vm->def->ndisks; i++) {
-if (virDomainDiskDefCheckDuplicateInfo(vm->def->disks[i], disk) < 
0)
-goto cleanup;
-}
-
-switch ((virDomainDiskBus) disk->bus) {
-case VIR_DOMAIN_DISK_BUS_USB:
-if (disk->device == VIR_DOMAIN_DISK_DEVICE_LUN) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-   _("disk device='lun' is not supported for usb 
bus"));
-break;
-}
-ret = qemuDomainAttachUSBMassStorageDevice(driver, vm, disk);
-break;
-
-case VIR_DOMAIN_DISK_BUS_VIRTIO:
-ret = qemuDomainAttachVirtioDiskDevice(driver, vm, disk);
-break;
+for (i = 0; i < vm->def->ndisks; i++) {
+if (virDomainDiskDefCheckDuplicateInfo(vm->def->disks[i], disk) < 0)
+goto cleanup;
+}

-case VIR_DOMAIN_DISK_BUS_SCSI:
-ret = qemuDomainAttachSCSIDisk(driver, vm, disk);
+switch ((virDomainDiskBus) disk->bus) {
+case VIR_DOMAIN_DISK_BUS_USB:
+if (disk->device == VIR_DOMAIN_DISK_DEVICE_LUN) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _("disk device='lun' is not supported for usb 
bus"));
 break;
-
-case VIR_DOMAIN_DISK_BUS_IDE:
-case VIR_DOMAIN_DISK_BUS_FDC:
-case VIR_DOMAIN_DISK_BUS_XEN:
-case VIR_DOMAIN_DISK_BUS_UML:
-case VIR_DOMAIN_DISK_BUS_SATA:
-case VIR_DOMAIN_DISK_BUS_SD:
-/* Note that SD card hotplug support should be added only once
- * they support '-device' (don't require -drive only).
- * See also: qemuDiskBusNeedsDriveArg */
-case VIR_DOMAIN_DISK_BUS_LAST:
-virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
-   _("disk bus '%s' cannot be hotplugged."),
-   virDomainDiskBusTypeToString(disk->bus));
 }
+ret = qemuDomainAttachUSBMassStorageDevice(driver, vm, disk);
+break;
+
+case VIR_DOMAIN_DISK_BUS_VIRTIO:
+ret = qemuDomainAttachVirtioDiskDevice(driver, vm, disk);
 break;

-case VIR_DOMAIN_DISK_DEVICE_CDROM:
-case VIR_DOMAIN_DISK_DEVICE_FLOPPY:
-case VIR_DOMAIN_DISK_DEVICE_LAST:
+case VIR_DOMAIN_DISK_BUS_SCSI:
+ret = qemuDomainAttachSCSIDisk(driver, vm, disk);
 break;
+
+case VIR_DOMAIN_DISK_BUS_IDE:
+case VIR_DOMAIN_DISK_BUS_FDC:
+case VIR_DOMAIN_DISK_BUS_XEN:
+case VIR_DOMAIN_DISK_BUS_UML:
+case VIR_DOMAIN_DISK_BUS_SATA:
+case VIR_DOMAIN_DISK_BUS_SD:
+/* Note that SD card hotplug support should be added only once
+ * they support '-device' (don't require -drive only).
+ * See also: qemuDiskBusNeedsDriveArg */
+case VIR_DOMAIN_DISK_BUS_LAST:
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
+   _("disk bus '%s' cannot be hotplugged."),
+   virDomainDiskBusTypeToString(disk->bus));
 }

  cleanup:
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 7/8] qemu: hotplug: Split out media change code from disk hotplug

2018-09-27 Thread Peter Krempa
Disk hotplug has slightly different semantics from media changing. Move
the media change code out and add proper initialization of the new
source object.
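
In essence, the attach entry point then becomes (paraphrasing the
hunks below, error handling omitted):

    if ((disk->device == VIR_DOMAIN_DISK_DEVICE_CDROM ||
         disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY) &&
        (orig_disk = virDomainDiskFindByBusAndDst(vm->def,
                                                  disk->bus, disk->dst))) {
        /* existing removable drive: this is a media change */
        return qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
                                              disk->src, false);
    }

    /* everything else is a real disk hotplug */
    return qemuDomainAttachDeviceDiskLiveInternal(driver, vm, dev);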

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_driver.c  | 15 +--
 src/qemu/qemu_hotplug.c | 56 +
 2 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 64f0ad33e8..f609151152 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7851,12 +7851,6 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm,
 virDomainDeviceDef oldDev = { .type = dev->type };
 int ret = -1;

-if (virDomainDiskTranslateSourcePool(disk) < 0)
-goto cleanup;
-
-if (qemuDomainDetermineDiskChain(driver, vm, disk, true) < 0)
-goto cleanup;
-
 if (!(orig_disk = virDomainDiskFindByBusAndDst(vm->def,
disk->bus, disk->dst))) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -7885,16 +7879,9 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm,
 goto cleanup;
 }

-/* Add the new disk src into shared disk hash table */
-if (qemuAddSharedDevice(driver, dev, vm->def->name) < 0)
-goto cleanup;
-
 if (qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
-   dev->data.disk->src, force) < 0) {
-ignore_value(qemuRemoveSharedDisk(driver, dev->data.disk,
-  vm->def->name));
+   dev->data.disk->src, force) < 0)
 goto cleanup;
-}

 dev->data.disk->src = NULL;
 }
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 6227a130da..ef3f0cd8b3 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -737,6 +737,15 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,

 disk->src = newsrc;

+if (virDomainDiskTranslateSourcePool(disk) < 0)
+goto cleanup;
+
+if (qemuAddSharedDisk(driver, disk, vm->def->name) < 0)
+goto cleanup;
+
+if (qemuDomainDetermineDiskChain(driver, vm, disk, true) < 0)
+goto cleanup;
+
 if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0)
 goto cleanup;

@@ -1068,9 +1077,15 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 {
 size_t i;
 virDomainDiskDefPtr disk = dev->data.disk;
-virDomainDiskDefPtr orig_disk = NULL;
 int ret = -1;

+if (disk->device == VIR_DOMAIN_DISK_DEVICE_CDROM ||
+disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY) {
+virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+   _("cdrom/floppy device hotplug isn't supported"));
+return -1;
+}
+
 if (virDomainDiskTranslateSourcePool(disk) < 0)
 goto cleanup;

@@ -1084,27 +1099,6 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 goto cleanup;

 switch ((virDomainDiskDevice) disk->device)  {
-case VIR_DOMAIN_DISK_DEVICE_CDROM:
-case VIR_DOMAIN_DISK_DEVICE_FLOPPY:
-if (!(orig_disk = virDomainDiskFindByBusAndDst(vm->def,
-   disk->bus, disk->dst))) 
{
-virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("No device with bus '%s' and target '%s'. "
- "cdrom and floppy device hotplug isn't supported "
- "by libvirt"),
-   virDomainDiskBusTypeToString(disk->bus),
-   disk->dst);
-goto cleanup;
-}
-
-if (qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
-   disk->src, false) < 0)
-goto cleanup;
-
-disk->src = NULL;
-ret = 0;
-break;
-
 case VIR_DOMAIN_DISK_DEVICE_DISK:
 case VIR_DOMAIN_DISK_DEVICE_LUN:
 for (i = 0; i < vm->def->ndisks; i++) {
@@ -1146,6 +1140,8 @@ qemuDomainAttachDeviceDiskLiveInternal(virQEMUDriverPtr 
driver,
 }
 break;

+case VIR_DOMAIN_DISK_DEVICE_CDROM:
+case VIR_DOMAIN_DISK_DEVICE_FLOPPY:
 case VIR_DOMAIN_DISK_DEVICE_LAST:
 break;
 }
@@ -1172,6 +1168,22 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
virDomainObjPtr vm,
virDomainDeviceDefPtr dev)
 {
+virDomainDiskDefPtr disk = dev->data.disk;
+virDomainDiskDefPtr orig_disk = NULL;
+
+/* this API overloads media change semantics on disk hotplug
+ * for devices supporting media changes */
+if ((disk->device == VIR_DOMAIN_DISK_DEVICE_CDROM ||
+ disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY) &&
+(orig_disk = virDomainDiskFindByBusAndDst(vm->def, disk->bus, 
disk->dst))) {
+if (qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
+

[libvirt] [PATCH 2/8] Revert "qemu: hotplug: consolidate media change code paths"

2018-09-27 Thread Peter Krempa
While the idea was good, the implementation was not, as we need to
take into account both the old disk data and the new source. The code
will be consolidated later in a different way.

This reverts commit 663b1d55de652201b19d875f0eff730dc28e689e.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_driver.c  | 20 ++--
 src/qemu/qemu_hotplug.c | 18 +++---
 src/qemu/qemu_hotplug.h |  9 +++--
 tests/qemuhotplugtest.c |  2 +-
 4 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b238309852..64f0ad33e8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -7600,7 +7600,7 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm,
 switch ((virDomainDeviceType)dev->type) {
 case VIR_DOMAIN_DEVICE_DISK:
 qemuDomainObjCheckDiskTaint(driver, vm, dev->data.disk, NULL);
-ret = qemuDomainAttachDeviceDiskLive(driver, vm, dev, false);
+ret = qemuDomainAttachDeviceDiskLive(driver, vm, dev);
 if (!ret) {
 alias = dev->data.disk->info.alias;
 dev->data.disk = NULL;
@@ -7851,6 +7851,12 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm,
 virDomainDeviceDef oldDev = { .type = dev->type };
 int ret = -1;

+if (virDomainDiskTranslateSourcePool(disk) < 0)
+goto cleanup;
+
+if (qemuDomainDetermineDiskChain(driver, vm, disk, true) < 0)
+goto cleanup;
+
 if (!(orig_disk = virDomainDiskFindByBusAndDst(vm->def,
disk->bus, disk->dst))) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -7879,8 +7885,18 @@ qemuDomainChangeDiskLive(virDomainObjPtr vm,
 goto cleanup;
 }

-if (qemuDomainAttachDeviceDiskLive(driver, vm, dev, force) < 0)
+/* Add the new disk src into shared disk hash table */
+if (qemuAddSharedDevice(driver, dev, vm->def->name) < 0)
 goto cleanup;
+
+if (qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
+   dev->data.disk->src, force) < 0) {
+ignore_value(qemuRemoveSharedDisk(driver, dev->data.disk,
+  vm->def->name));
+goto cleanup;
+}
+
+dev->data.disk->src = NULL;
 }

 orig_disk->startupPolicy = dev->data.disk->startupPolicy;
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index ccf3336633..86afda636e 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -722,7 +722,7 @@ qemuDomainChangeMediaBlockdev(virQEMUDriverPtr driver,
  *
  * Returns 0 on success, -1 on error and reports libvirt error
  */
-static int
+int
 qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
virDomainObjPtr vm,
virDomainDiskDefPtr disk,
@@ -1049,22 +1049,10 @@ qemuDomainAttachUSBMassStorageDevice(virQEMUDriverPtr 
driver,
 }


-/**
- * qemuDomainAttachDeviceDiskLive:
- * @driver: qemu driver struct
- * @vm: domain object
- * @dev: device to attach (expected type is DISK)
- * @forceMediaChange: Forcibly open the drive if changing media
- *
- * Attach a new disk or in case of cdroms/floppies change the media in the 
drive.
- * This function handles all the necessary steps to attach a new storage source
- * to the VM. If @forceMediaChange is true the drive is opened forcibly.
- */
 int
 qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
virDomainObjPtr vm,
-   virDomainDeviceDefPtr dev,
-   bool forceMediaChange)
+   virDomainDeviceDefPtr dev)
 {
 size_t i;
 virDomainDiskDefPtr disk = dev->data.disk;
@@ -1098,7 +1086,7 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
 }

 if (qemuDomainChangeEjectableMedia(driver, vm, orig_disk,
-   disk->src, forceMediaChange) < 0)
+   disk->src, false) < 0)
 goto cleanup;

 disk->src = NULL;
diff --git a/src/qemu/qemu_hotplug.h b/src/qemu/qemu_hotplug.h
index c085c45082..0297e42a98 100644
--- a/src/qemu/qemu_hotplug.h
+++ b/src/qemu/qemu_hotplug.h
@@ -28,6 +28,12 @@
 # include "qemu_domain.h"
 # include "domain_conf.h"

+int qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
+   virDomainObjPtr vm,
+   virDomainDiskDefPtr disk,
+   virStorageSourcePtr newsrc,
+   bool force);
+
 void qemuDomainDelTLSObjects(virQEMUDriverPtr driver,
  virDomainObjPtr vm,
  qemuDomainAsyncJob asyncJob,
@@ -54,8 +60,7 @@ int qemuDomainAttachControllerDevice(virQEMUDriverPtr driver,
  virDomainControllerDefPtr controller);
 int 

[libvirt] [PATCH 4/8] qemu: hotplug: Prepare disk source for media changing

2018-09-27 Thread Peter Krempa
The disk storage source needs to be prepared if we want to use -blockdev
or secrets for the new media image.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_hotplug.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 3043b3257b..1ec696e2aa 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -729,6 +729,7 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
virStorageSourcePtr newsrc,
bool force)
 {
+virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
 qemuDomainObjPrivatePtr priv = vm->privateData;
 virStorageSourcePtr oldsrc = disk->src;
 int ret = -1;
@@ -736,6 +737,9 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,

 disk->src = newsrc;

+if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0)
+goto cleanup;
+
 if (qemuHotplugPrepareDiskAccess(driver, vm, disk, newsrc, false) < 0)
 goto cleanup;

@@ -770,6 +774,7 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
 if (oldsrc)
 disk->src = oldsrc;

+virObjectUnref(cfg);
 return ret;
 }

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 6/8] qemu: conf: Export qemuAddSharedDisk

2018-09-27 Thread Peter Krempa
In cases where we know the device is a disk we can avoid using the full
device definition.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_conf.c | 2 +-
 src/qemu/qemu_conf.h | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 33508174cb..6cdc0f407a 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1411,7 +1411,7 @@ qemuSharedDeviceEntryInsert(virQEMUDriverPtr driver,
  * records all the domains that use the shared device if the entry
  * already exists, otherwise add a new entry.
  */
-static int
+int
 qemuAddSharedDisk(virQEMUDriverPtr driver,
   virDomainDiskDefPtr disk,
   const char *name)
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index c227ac72cc..f876f9117c 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -339,6 +339,11 @@ char *qemuGetSharedDeviceKey(const char *disk_path)

 void qemuSharedDeviceEntryFree(void *payload, const void *name);

+int qemuAddSharedDisk(virQEMUDriverPtr driver,
+  virDomainDiskDefPtr disk,
+  const char *name)
+ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
+
 int qemuAddSharedDevice(virQEMUDriverPtr driver,
 virDomainDeviceDefPtr dev,
 const char *name)
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 3/8] qemu: hotplug: Use new source when preparing/translating for media change

2018-09-27 Thread Peter Krempa
Some internal helpers use only a virDomainDiskDef to do their job. If we
want to change the source for the disk we need to update the disk
definition to point to the new source for those helpers to work correctly.
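
The pattern used below, in short:

    virStorageSourcePtr oldsrc = disk->src;

    disk->src = newsrc;        /* helpers now operate on the new source */

    /* ... prepare access, change the medium ... */

 cleanup:
    if (oldsrc)
        disk->src = oldsrc;    /* on failure, put the original back */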

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_hotplug.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 86afda636e..3043b3257b 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -730,9 +730,12 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
bool force)
 {
 qemuDomainObjPrivatePtr priv = vm->privateData;
+virStorageSourcePtr oldsrc = disk->src;
 int ret = -1;
 int rc;

+disk->src = newsrc;
+
 if (qemuHotplugPrepareDiskAccess(driver, vm, disk, newsrc, false) < 0)
 goto cleanup;

@@ -744,7 +747,7 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
 else
 rc = qemuDomainChangeMediaLegacy(driver, vm, disk, newsrc, force);

-virDomainAuditDisk(vm, disk->src, newsrc, "update", rc >= 0);
+virDomainAuditDisk(vm, oldsrc, newsrc, "update", rc >= 0);

 if (rc < 0) {
 ignore_value(qemuHotplugPrepareDiskAccess(driver, vm, disk, newsrc, 
true));
@@ -755,7 +758,8 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
 ignore_value(qemuRemoveSharedDisk(driver, disk, vm->def->name));
 ignore_value(qemuHotplugPrepareDiskAccess(driver, vm, disk, NULL, true));

-virStorageSourceFree(disk->src);
+virStorageSourceFree(oldsrc);
+oldsrc = NULL;
 VIR_STEAL_PTR(disk->src, newsrc);

 ignore_value(qemuHotplugRemoveManagedPR(driver, vm, QEMU_ASYNC_JOB_NONE));
@@ -763,6 +767,9 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
 ret = 0;

  cleanup:
+if (oldsrc)
+disk->src = oldsrc;
+
 return ret;
 }

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1/8] Revert "qemu: hotplug: Prepare disk source in qemuDomainAttachDeviceDiskLive"

2018-09-27 Thread Peter Krempa
Preparing the storage source prior to assigning the alias will not work,
as the names of certain objects depend on the alias in the legacy
hotplug case: we generate the object names for the secrets based on
the alias.
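
That is, the ordering restored by the hunk below:

    if (qemuAssignDeviceDiskAlias(vm->def, disk, priv->qemuCaps) < 0)
        goto error;                         /* 1. alias first */

    if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0)
        goto error;                         /* 2. only then prepare the source */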

This reverts commit 192fdaa614e3800255048a8a70c1292ccf18397a.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_hotplug.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 4558a3c02d..ccf3336633 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -781,6 +781,7 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver,
 qemuDomainObjPrivatePtr priv = vm->privateData;
 qemuHotplugDiskSourceDataPtr diskdata = NULL;
 char *devstr = NULL;
+virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);

 if (qemuHotplugPrepareDiskAccess(driver, vm, disk, NULL, false) < 0)
 goto cleanup;
@@ -788,6 +789,9 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver,
 if (qemuAssignDeviceDiskAlias(vm->def, disk, priv->qemuCaps) < 0)
 goto error;

+if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0)
+goto error;
+
 if (!(diskdata = qemuHotplugDiskSourceAttachPrepare(disk, priv->qemuCaps)))
 goto error;

@@ -822,6 +826,7 @@ qemuDomainAttachDiskGeneric(virQEMUDriverPtr driver,
 qemuHotplugDiskSourceDataFree(diskdata);
 qemuDomainSecretDiskDestroy(disk);
 VIR_FREE(devstr);
+virObjectUnref(cfg);
 return ret;

  exit_monitor:
@@ -1062,8 +1067,6 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
bool forceMediaChange)
 {
 size_t i;
-virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
-qemuDomainObjPrivatePtr priv = vm->privateData;
 virDomainDiskDefPtr disk = dev->data.disk;
 virDomainDiskDefPtr orig_disk = NULL;
 int ret = -1;
@@ -1080,9 +1083,6 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
 if (qemuDomainDetermineDiskChain(driver, vm, disk, true) < 0)
 goto cleanup;

-if (qemuDomainPrepareDiskSource(disk, priv, cfg) < 0)
-goto cleanup;
-
 switch ((virDomainDiskDevice) disk->device)  {
 case VIR_DOMAIN_DISK_DEVICE_CDROM:
 case VIR_DOMAIN_DISK_DEVICE_FLOPPY:
@@ -1153,7 +1153,6 @@ qemuDomainAttachDeviceDiskLive(virQEMUDriverPtr driver,
  cleanup:
 if (ret != 0)
 ignore_value(qemuRemoveSharedDevice(driver, dev, vm->def->name));
-virObjectUnref(cfg);
 return ret;
 }

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 0/8] qemu: Fix disk hotplug/media change regression

2018-09-27 Thread Peter Krempa
A few of my recent patches (see the first two reverts) introduced a
regression when adding disks. The disk alias is needed when creating
names of certain backend objects (secrets/TLS). The code preparing those
was moved prior to alias formatting.

Revert those patches and fix it in a different way.

Peter Krempa (8):
  Revert "qemu: hotplug: Prepare disk source in
qemuDomainAttachDeviceDiskLive"
  Revert "qemu: hotplug: consolidate media change code paths"
  qemu: hotplug: Use new source when preparing/translating for media
change
  qemu: hotplug: Prepare disk source for media changing
  qemu: hotplug: Add wrapper for disk hotplug code
  qemu: conf: Export qemuAddSharedDisk
  qemu: hotplug: Split out media change code from disk hotplug
  qemu: hotplug: Refactor qemuDomainAttachDeviceDiskLiveInternal

 src/qemu/qemu_conf.c|   2 +-
 src/qemu/qemu_conf.h|   5 ++
 src/qemu/qemu_driver.c  |   7 +-
 src/qemu/qemu_hotplug.c | 188 ++--
 src/qemu/qemu_hotplug.h |   9 +-
 tests/qemuhotplugtest.c |   2 +-
 6 files changed, 123 insertions(+), 90 deletions(-)

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] security: dac: also label listen UNIX sockets

2018-09-27 Thread Ján Tomko
We switched to opening mode='bind' sockets ourselves:
commit 30fb2276d88b275dc2aad6ddd28c100d944b59a5
qemu: support passing pre-opened UNIX socket listen FD
in v4.5.0-rc1~251

Then fixed qemuBuildChrChardevStr to change libvirtd's label
while creating the socket:
commit b0c6300fc42bbc3e5eb0b236392f7344581c5810
qemu: ensure FDs passed to QEMU for chardevs have correct SELinux labels
v4.5.0-rc1~52

Also add labeling of these sockets to the DAC driver.
Instead of trying to figure out which one was created by libvirt,
label it if it exists.

https://bugzilla.redhat.com/show_bug.cgi?id=1633389

Signed-off-by: Ján Tomko 
---
 src/security/security_dac.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/security/security_dac.c b/src/security/security_dac.c
index 62442745dd..da4a6c72fe 100644
--- a/src/security/security_dac.c
+++ b/src/security/security_dac.c
@@ -1308,7 +1308,12 @@ virSecurityDACSetChardevLabel(virSecurityManagerPtr mgr,
 break;
 
 case VIR_DOMAIN_CHR_TYPE_UNIX:
-if (!dev_source->data.nix.listen) {
+if (!dev_source->data.nix.listen ||
+(dev_source->data.nix.path &&
+ virFileExists(dev_source->data.nix.path))) {
+/* Also label mode='bind' sockets if they exist,
+ * e.g. because they were created by libvirt
+ * and passed via FD */
 if (virSecurityDACSetOwnership(mgr, NULL,
dev_source->data.nix.path,
user, group) < 0)
-- 
2.16.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Revert "vircgroup: cleanup controllers not managed by systemd on error"

2018-09-27 Thread Pavel Hrdina
On Thu, Sep 27, 2018 at 04:32:37PM +0200, Fabiano Fidêncio wrote:
> On Thu, 2018-09-27 at 16:13 +0200, Pavel Hrdina wrote:
> > This reverts commit 1602aa28f820ada66f707cef3e536e8572fbda1e.
> > 
> > There is no need to call virCgroupRemove() nor virCgroupFree() if
> > virCgroupEnableMissingControllers() fails because it will not modify
> > 'group' at all.  The cleanup is done in virCgroupMakeGroup().
> 
> I wouldn't mention "The cleanup is done in virCgroupMakeGroup()" because
> there's no cleanup needed from virCgroupEnableMissingControllers(), as
> group is only modified in case of success.

It's not a cleanup of the 'group' variable but a cleanup of the directories
created on the host.

> I'd wait for John's ACK as well, but:
> Reviewed-by: Fabiano Fidêncio 

I can modify the commit message to make it more obvious.

Pavel


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Revert "vircgroup: cleanup controllers not managed by systemd on error"

2018-09-27 Thread Fabiano Fidêncio
On Thu, 2018-09-27 at 16:13 +0200, Pavel Hrdina wrote:
> This reverts commit 1602aa28f820ada66f707cef3e536e8572fbda1e.
> 
> There is no need to call virCgroupRemove() nor virCgroupFree() if
> virCgroupEnableMissingControllers() fails because it will not modify
> 'group' at all.  The cleanup is done in virCgroupMakeGroup().

I wouldn't mention "The cleanup is done in virCgroupMakeGroup()" because
there's no cleanup needed from virCgroupEnableMissingControllers(), as
group is only modified in case of success.

I'd wait for John's ACK as well, but:
Reviewed-by: Fabiano Fidêncio 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] Revert "vircgroup: cleanup controllers not managed by systemd on error"

2018-09-27 Thread John Ferlan



On 9/27/18 10:13 AM, Pavel Hrdina wrote:
> This reverts commit 1602aa28f820ada66f707cef3e536e8572fbda1e.
> 
> There is no need to call virCgroupRemove() nor virCgroupFree() if
> virCgroupEnableMissingControllers() fails because it will not modify
> 'group' at all.  The cleanup is done in virCgroupMakeGroup().
> 
> Signed-off-by: Pavel Hrdina 
> ---
>  src/util/vircgroup.c | 25 ++---
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 

For this patch,

Reviewed-by: John Ferlan 

However, of course this tickles Coverity (that's what happens when
someone makes a change in code).

Anyway, it seems calling virCgroupSetOwner with a NULL cgroup (as could
happen in virLXCCgroupCreate if virCgroupNewMachine returns 0 and cgroup
== NULL) will cause a problem.

Yes, this is pre-existing, and no, I'm not knowledgeable enough (nor is
Coverity for that matter) to know if having def->idmap.uidmap is "related".

John

It's like an onion, peel one layer and we find something else.

> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> index f90906e4ad..548c873da8 100644
> --- a/src/util/vircgroup.c
> +++ b/src/util/vircgroup.c
> @@ -1055,7 +1055,6 @@ virCgroupNewMachineSystemd(const char *name,
>  int rv;
>  virCgroupPtr init;
>  VIR_AUTOFREE(char *) path = NULL;
> -virErrorPtr saved = NULL;
>  
>  VIR_DEBUG("Trying to setup machine '%s' via systemd", name);
>  if ((rv = virSystemdCreateMachine(name,
> @@ -1088,24 +1087,20 @@ virCgroupNewMachineSystemd(const char *name,
>  
>  if (virCgroupEnableMissingControllers(path, pidleader,
>controllers, group) < 0) {
> -goto error;
> +return -1;
>  }
>  
> -if (virCgroupAddProcess(*group, pidleader) < 0)
> -goto error;
> +if (virCgroupAddProcess(*group, pidleader) < 0) {
> +virErrorPtr saved = virSaveLastError();
> +virCgroupRemove(*group);
> +virCgroupFree(group);
> +if (saved) {
> +virSetError(saved);
> +virFreeError(saved);
> +}
> +}
>  
>  return 0;
> -
> - error:
> -saved = virSaveLastError();
> -virCgroupRemove(*group);
> -virCgroupFree(group);
> -if (saved) {
> -virSetError(saved);
> -virFreeError(saved);
> -}
> -
> -return -1;
>  }
>  
>  
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] tests: reintroduce tests for libxl's legacy nested setting

2018-09-27 Thread Jim Fehlig

On 9/27/18 3:29 AM, Erik Skultety wrote:

On Wed, Sep 26, 2018 at 11:31:19AM -0600, Jim Fehlig wrote:

The preferred location for setting the nested CPU flag changed in
Xen 4.10 and is advertised via the LIBXL_HAVE_BUILDINFO_NESTED_HVM
define.  Commit 95d19cd0 changed libxl to use the new preferred
location but unconditionally changed the tests, causing 'make check'
failures against Xen < 4.10 that do not contain the new location.

Commit e94415d5 fixed the failures by only running the tests when
LIBXL_HAVE_BUILDINFO_NESTED_HVM is defined. Since libvirt supports
several versions of Xen that use the old nested location, it is
prudent to test the flag is set correctly. This patch reintroduces
the tests for the legacy location of the nested setting.

Signed-off-by: Jim Fehlig 
---

We could probably get by with one test for the old nested location,
in which case I'd drop vnuma-hvm-legacy-nest. Any opinions on that?


I verified with a few different platforms. I don't have a better idea on what
to do about the legacy tests, we either add more (even identical) test files
or we figure out some black magic to do the same thing (not preferred).
Anyway, to answer your question, even though it might be enough, I'd like to
stay consistent and keep both, so that if one day someone is looking at the
source they don't wonder why only one of them is being run in the legacy mode.
I hope that makes sense.


Yep, no problem. Should I push now or after release?

Regards,
Jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 0/2] A few spec file fixes

2018-09-27 Thread Pavel Hrdina
On Thu, Sep 27, 2018 at 04:14:46PM +0200, Jiri Denemark wrote:
> Jiri Denemark (2):
>   spec: Set correct TLS priority
>   spec: Build ceph and gluster support everywhere

Reviewed-by: Pavel Hrdina 


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 2/2] spec: Build ceph and gluster support everywhere

2018-09-27 Thread Jiri Denemark
Both ceph and gluster have been built on RHEL on all architectures for
some time, so there's no need to limit them to x86_64.

Signed-off-by: Jiri Denemark 
---
 libvirt.spec.in | 14 --
 1 file changed, 14 deletions(-)

diff --git a/libvirt.spec.in b/libvirt.spec.in
index dbfee0ba10..667f89c0b9 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -105,20 +105,6 @@
 %define with_numactl 0
 %endif
 
-# libgfapi is built only on x86_64 on rhel
-%ifnarch x86_64
-%if 0%{?rhel}
-%define with_storage_gluster 0
-%endif
-%endif
-
-# librados and librbd are built only on x86_64 on rhel
-%ifnarch x86_64
-%if 0%{?rhel}
-%define with_storage_rbd 0
-%endif
-%endif
-
 # zfs-fuse is not available on some architectures
 %ifarch s390 s390x aarch64
 %define with_storage_zfs 0
-- 
2.19.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1/2] spec: Set correct TLS priority

2018-09-27 Thread Jiri Denemark
RHEL-7 is the only system where gnutls is too old to support the
@LIBVIRT specifier.

Signed-off-by: Jiri Denemark 
---
 libvirt.spec.in | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libvirt.spec.in b/libvirt.spec.in
index b31947b0f4..dbfee0ba10 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -214,10 +214,10 @@
 %define enable_werror --disable-werror
 %endif
 
-%if 0%{?fedora}
-%define tls_priority "@LIBVIRT,SYSTEM"
-%else
+%if 0%{?rhel} == 7
 %define tls_priority "NORMAL"
+%else
+%define tls_priority "@LIBVIRT,SYSTEM"
 %endif
 
 
-- 
2.19.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 0/2] A few spec file fixes

2018-09-27 Thread Jiri Denemark
Jiri Denemark (2):
  spec: Set correct TLS priority
  spec: Build ceph and gluster support everywhere

 libvirt.spec.in | 20 +++-
 1 file changed, 3 insertions(+), 17 deletions(-)

-- 
2.19.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] Revert "vircgroup: cleanup controllers not managed by systemd on error"

2018-09-27 Thread Pavel Hrdina
This reverts commit 1602aa28f820ada66f707cef3e536e8572fbda1e.

There is no need to call virCgroupRemove() nor virCgroupFree() if
virCgroupEnableMissingControllers() fails because it will not modify
'group' at all.  The cleanup is done in virCgroupMakeGroup().

Signed-off-by: Pavel Hrdina 
---
 src/util/vircgroup.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index f90906e4ad..548c873da8 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1055,7 +1055,6 @@ virCgroupNewMachineSystemd(const char *name,
 int rv;
 virCgroupPtr init;
 VIR_AUTOFREE(char *) path = NULL;
-virErrorPtr saved = NULL;
 
 VIR_DEBUG("Trying to setup machine '%s' via systemd", name);
 if ((rv = virSystemdCreateMachine(name,
@@ -1088,24 +1087,20 @@ virCgroupNewMachineSystemd(const char *name,
 
 if (virCgroupEnableMissingControllers(path, pidleader,
   controllers, group) < 0) {
-goto error;
+return -1;
 }
 
-if (virCgroupAddProcess(*group, pidleader) < 0)
-goto error;
+if (virCgroupAddProcess(*group, pidleader) < 0) {
+virErrorPtr saved = virSaveLastError();
+virCgroupRemove(*group);
+virCgroupFree(group);
+if (saved) {
+virSetError(saved);
+virFreeError(saved);
+}
+}
 
 return 0;
-
- error:
-saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-
-return -1;
 }
 
 
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/9] vircgroup: cleanup controllers not managed by systemd on error

2018-09-27 Thread Pavel Hrdina
On Thu, Sep 27, 2018 at 08:40:03AM -0400, John Ferlan wrote:
> 
> 
> On 9/20/18 4:54 AM, Pavel Hrdina wrote:
> > If virCgroupEnableMissingControllers() fails it could already create
> > some directories, we should clean it up as well.
> > 
> > Reviewed-by: Fabiano Fidêncio 
> > Signed-off-by: Pavel Hrdina 
> > ---
> >  src/util/vircgroup.c | 25 +++--
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> 
> Rather than usurp Marc's patch to fix a bug introduced here, I figured
> I'd go directly to this patch...
> 
> > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > index c825f7c77e..f9e387c86d 100644
> > --- a/src/util/vircgroup.c
> > +++ b/src/util/vircgroup.c
> > @@ -1551,6 +1551,7 @@ virCgroupNewMachineSystemd(const char *name,
> >  int rv;
> >  virCgroupPtr init;
> >  VIR_AUTOFREE(char *) path = NULL;
> > +virErrorPtr saved = NULL;
> >  
> >  VIR_DEBUG("Trying to setup machine '%s' via systemd", name);
> >  if ((rv = virSystemdCreateMachine(name,
> > @@ -1584,20 +1585,24 @@ virCgroupNewMachineSystemd(const char *name,
> >  
> >  if (virCgroupEnableMissingControllers(path, pidleader,
> >controllers, group) < 0) {
> > -return -1;
> > +goto error;
> >  }
> >  
> > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > -virErrorPtr saved = virSaveLastError();
> > -virCgroupRemove(*group);
> > -virCgroupFree(group);
> > -if (saved) {
> > -virSetError(saved);
> > -virFreeError(saved);
> > -}
> > -}
> > +if (virCgroupAddTask(*group, pidleader) < 0)
> > +goto error;
> 
> This code changes from returning 0 with *group == NULL to returning -1.
> Which perhaps isn't a problem per se...
> 
> However, I note other return paths from virCgroupNewMachineManual and
> qemuExtTPMSetupCgroup when processing a failed virCgroupAddProcess (what
> virCgroupAddTask turns into) which still return 0 and a NULL group.

If virCgroupAddProcess fails virCgroupNewMachineManual returns 0 and
NULL, however, qemuExtTPMSetupCgroup returns -1 and NULL.

> For qemuExtTPMSetupCgroup one must go back through to
> qemuSetupCgroupForExtDevices via qemuExtDevicesSetupCgroup to see that
> the virCgroupRemove is not made on the error path.

The difference with qemuExtDevicesSetupCgroup is adding another process
to existing cgroups that were created for the domain itself.  The
cleanup is done for the whole domain.  If you follow the cleanup path:

qemuSetupCgroupForExtDevices fails -> goto cleanup, there is no
qemuRemoveCgroup or virCgroupRemove; this means that
qemuProcessLaunch fails -> goto stop, where we will call qemuProcessStop,
which in turn calls qemuRemoveCgroup to remove the cgroups.

So that part of the code is correct: we don't want to remove cgroups
while the qemu process is running; we will do it later once we stop the
qemu process.

> For some additional data, if one goes back through history of this hunk
> of code it wasn't added with returning an error (commit 2fe2470181), but
> there was a change (commit 71ce47596) that got reverted (commit
> d41bd0959) that had changed to returning an error and I have a faint
> recollection of that being a problem...
> 
> Looking at the archive of the revert patches:
> 
> v2:
> https://www.redhat.com/archives/libvir-list/2016-January/msg00619.html
> 
> v1:
> https://www.redhat.com/archives/libvir-list/2016-January/msg00511.html
> 
> There's quite a bit of "history" in the v1 patches that may help show
> the depth of the issue.
> 
> I'm not really sure of the best course of action right now, but I
> figured I should at least note it and think about it now while we still
> have a chance before the release.

Before this patch:

- when virCgroupEnableMissingControllers failed we returned -1 and
  cgroup ptr or NULL

- when virCgroupAddTask failed we returned 0 and NULL

After this patch:

- when virCgroupEnableMissingControllers fails we return -1 and NULL

- when virCgroupAddTask fails we return -1 and NULL

The change for virCgroupEnableMissingControllers is not an issue because
we always return -1; therefore starting the guest always fails and we
always call virCgroupRemove.

The change for virCgroupAddTask changed the behavior: previously we
continued with starting the guest, but no cgroups were used for that
guest; after this patch starting the guest fails.

So there are two issues after this patch, we need to return 0 if
virCgroupAddTask fails and we need to modify virCgroupRemove to accept
NULL as well.
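
For the latter, roughly (sketch only):

    int virCgroupRemove(virCgroupPtr group)
    {
        if (!group)
            return 0;
        /* ... existing removal logic ... */
    }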

Marc's patch fixes the second issue, but is incorrect since we tend to
have the NULL check inside the function.

I'll post new patches to address both issues.

Pavel

> John
> 
> 
> >  
> >  return 0;
> > +
> > + error:
> > +saved = virSaveLastError();
> > +virCgroupRemove(*group);
> > +virCgroupFree(group);
> > +if (saved) {
> > +

Re: [libvirt] [PATCH for 4.8.0] qemu: Temporarily disable metadata locking

2018-09-27 Thread Jiri Denemark
On Thu, Sep 27, 2018 at 15:14:49 +0200, Michal Privoznik wrote:
> Turns out, there are a couple of bugs that prevent this feature
> from being operational. Given how close to the release we are,
> disable the feature temporarily. Hopefully, it can be enabled
> again after all the bugs are fixed.
> 
> Signed-off-by: Michal Privoznik 

Reviewed-by: Jiri Denemark 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 4/4] security: Always spawn process for transactions

2018-09-27 Thread John Ferlan



On 9/27/18 4:51 AM, Michal Privoznik wrote:
> On 09/26/2018 11:14 PM, John Ferlan wrote:
>>

[...]

 I read the cover, it's not simpler than the above. 
>>>
>>> Yeah, this is not that simple bug. I'll try to explain it differently:
>>>
>>> On fork(), all FDs of the parent are duplicated to child. So even if
>>> parent closes one of them, the connection is not closed until child
>>> closes the same FD. And since metadata locking is very fragile when it
>>> comes to connection to virtlockd, this fork() behaviour actually plays
>>> against us because it violates our assumption that connection is closed
>>> at the end of metadataUnlock(). In fact it is not - child has it still open.
>>>
>>> If there was a flag like DONT_DUP_ON_FORK I could just set it on
>>> virtlockd connection in metadataLock() and be done.
>>
>> Avoid the forking exec's works too ;-}
>>
>> OK - instead of talking in generalities - which forking dup'd fd is in
>> play here? I don't want to assume or guess any more.
> 
> The connection FD.
> 
> Maybe I'm misreading your comments or the other way around. I'll try to
> explain what is the problem one last time, in the simplest way I can.
> 
> 1) Thread A wants to start a domain so it calls
> virSecurityManagerMetadataLock().
> 
> 2) The function opens a connection to virtlockd (represented by FD) and
> duplicates the FD so that connection is kept open until unlock.

Yep, virLockManagerLockDaemonAcquire calls virNetClientDupFD(false)
meaning I don't want one with the "FD_CLOEXEC" flag set.

While normally that would be OK - is it OK for this path? Does anything
change if the argument was true here (e.g. cloexec = priv->type ==
VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON)?

It seems to me that the 3 patches together dance around two things -
that the CLOEXEC is set and that TransactionCommit does the fork to
avoid the parent/child duplication problem.
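
(For reference, setting the flag is plain fcntl() - a fragment, not the
libvirt helper; note FD_CLOEXEC only closes the fd across exec(), a
bare fork() still duplicates it:)

    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        /* report error */;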

> 
> 3) Now that locks are acquired, the thread does chown() and setfcon().
> 
> 4) the virSecurityManagerMetadataUnlock() function is called, which
> causes virtlockd to release all the locks. Subsequently to that, the
> function closes the duplicated FD obtained in step 2).
> 
> The duplication is done because both virLockManagerAcquire() and
> virLockManagerRelease() open new connection, do the RPC talk and close
> the connection. That's just the way the functions are written. Now,
> because of FD duplication happening in step 2) the connection from
> virLockManagerAcquire() is not really closed until step 4) where the
> duplicated FD is closed.
> 
> This is important to realize: the connection is closed only when the
> last FD representing it is closed. If I were to duplicate the connection
> FD a hundred times and then close 99 of those duplicates the connection
> would still be kept open.

I've understood all this.

> 
> Now, lets put some threads into the picture. Imagine thread B which
> called fork() just in between steps 2) and 3). On fork, ALL partent FDs
> are duplicated. So the connection FD is duplicated into the child
> process too. Therefore, in step 4) where we think the connection is
> closed - well it is not really because there is still one FD around - in
> the child.

This is exactly where the story gets cloudy for me. Is ThreadB related
to ThreadA? It must be, because how would it get the duplicated fd, right?

But ironically we save priv->clientfd, so if we know we've forked a
child can we immediately close it?  Knowing that our parent still has it
open? Off in left field again?

> 
> And this is a huge problem, because we do not know when the child is
> going to close it. It may be (unintentionally) an evil child and close
> the FD at the worst time possible - when libvirtd holds some locks.
> 
> Important thing to realize #2: whenever a connection closes, virtlockd
> looks into its internal records and if the PID on the other side held
> some locks then: a) the locks are released, b) the PID is killed.
> 
> I'm sorry, I can't explain this any simpler. I'll try to find somebody
> who understands this and have them review the patch.

I think sometimes it's better to see actual calls and traces rather than
using "imagine" if type scenarios. I'm sure this is *a lot* easier when
you have a couple of debug sessions running and see things happening in
real time.

If someone else reviews this it doesn't matter to me, but I think this
patch is wrong for a couple of reasons. That reasoning doesn't seem to
be well received. If this patch was accepted as is, then you'd be
calling virProcessRunInMountNamespace even when namespaces weren't being
used, even for domain locks. I saw that in the traces that Marc
posted this morning and started thinking, hey wait, this code is only
supposed to be run when namespaces are enabled.
that's being used in his environment. Would the same problem exist if
namespaces weren't being used?

> 
> As an alternative approach, if we did not close the connection in 2) and

Did you mean clone or 

[libvirt] [PATCH for 4.8.0] qemu: Temporarily disable metadata locking

2018-09-27 Thread Michal Privoznik
Turns out, there are a couple of bugs that prevent this feature
from being operational. Given how close to the release we are,
disable the feature temporarily. Hopefully, it can be enabled
again after all the bugs are fixed.

Signed-off-by: Michal Privoznik 
---

There are two major problems:

1) when using regular locking the lockspace code gets confused and
starting a domain is denied after hitting a 60 second timeout.

2) the virtlockd connection FD might leak into child processes,
resulting in libvirtd suddenly being killed.

I posted a patch for 2), and I'm working on patch for 1).

 src/qemu/libvirtd_qemu.aug |  1 -
 src/qemu/qemu.conf |  9 -
 src/qemu/qemu_conf.c   | 11 ---
 src/qemu/test_libvirtd_qemu.aug.in |  1 -
 4 files changed, 22 deletions(-)

diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug
index 42e325d4fb..ddc4bbfd1d 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug
@@ -98,7 +98,6 @@ module Libvirtd_qemu =
  | bool_entry "relaxed_acs_check"
  | bool_entry "allow_disk_format_probing"
  | str_entry "lock_manager"
- | str_entry "metadata_lock_manager"
 
let rpc_entry = int_entry "max_queued"
  | int_entry "keepalive_interval"
diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 84492719c4..8391332cb4 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -659,15 +659,6 @@
 #lock_manager = "lockd"
 
 
-# To serialize two or more daemons trying to change metadata on a
-# file (e.g. a file on NFS share), libvirt offers a locking
-# mechanism. Currently, only "lockd" is supported (or no locking
-# at all if unset). Note that this is independent of lock_manager
-# described above.
-#
-#metadata_lock_manager = "lockd"
-
-
 # Set limit of maximum APIs queued on one domain. All other APIs
 # over this threshold will fail on acquiring job lock. Specially,
 # setting to zero turns this feature off.
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index 33508174cb..fc84186a7e 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -838,17 +838,6 @@ int virQEMUDriverConfigLoadFile(virQEMUDriverConfigPtr cfg,
 if (virConfGetValueString(conf, "lock_manager", >lockManagerName) < 0)
 goto cleanup;
 
-if (virConfGetValueString(conf, "metadata_lock_manager",
-  &cfg->metadataLockManagerName) < 0)
-goto cleanup;
-if (cfg->metadataLockManagerName &&
-STRNEQ(cfg->metadataLockManagerName, "lockd")) {
-virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-   _("unknown metadata lock manager name %s"),
-   cfg->metadataLockManagerName);
-goto cleanup;
-}
-
 if (virConfGetValueString(conf, "stdio_handler", ) < 0)
 goto cleanup;
 if (stdioHandler) {
diff --git a/src/qemu/test_libvirtd_qemu.aug.in 
b/src/qemu/test_libvirtd_qemu.aug.in
index 451e73126e..f1e8806ad2 100644
--- a/src/qemu/test_libvirtd_qemu.aug.in
+++ b/src/qemu/test_libvirtd_qemu.aug.in
@@ -81,7 +81,6 @@ module Test_libvirtd_qemu =
 { "mac_filter" = "1" }
 { "relaxed_acs_check" = "1" }
 { "lock_manager" = "lockd" }
-{ "metadata_lock_manager" = "lockd" }
 { "max_queued" = "0" }
 { "keepalive_interval" = "5" }
 { "keepalive_count" = "5" }
-- 
2.16.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/9] vircgroup: cleanup controllers not managed by systemd on error

2018-09-27 Thread John Ferlan


On 9/20/18 4:54 AM, Pavel Hrdina wrote:
> If virCgroupEnableMissingControllers() fails it could already create
> some directories, we should clean it up as well.
> 
> Reviewed-by: Fabiano Fidêncio 
> Signed-off-by: Pavel Hrdina 
> ---
>  src/util/vircgroup.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
> 

Rather than usurp Marc's patch to fix a bug introduced here, I figured
I'd go directly to this patch...

> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> index c825f7c77e..f9e387c86d 100644
> --- a/src/util/vircgroup.c
> +++ b/src/util/vircgroup.c
> @@ -1551,6 +1551,7 @@ virCgroupNewMachineSystemd(const char *name,
>  int rv;
>  virCgroupPtr init;
>  VIR_AUTOFREE(char *) path = NULL;
> +virErrorPtr saved = NULL;
>  
>  VIR_DEBUG("Trying to setup machine '%s' via systemd", name);
>  if ((rv = virSystemdCreateMachine(name,
> @@ -1584,20 +1585,24 @@ virCgroupNewMachineSystemd(const char *name,
>  
>  if (virCgroupEnableMissingControllers(path, pidleader,
>controllers, group) < 0) {
> -return -1;
> +goto error;
>  }
>  
> -if (virCgroupAddTask(*group, pidleader) < 0) {
> -virErrorPtr saved = virSaveLastError();
> -virCgroupRemove(*group);
> -virCgroupFree(group);
> -if (saved) {
> -virSetError(saved);
> -virFreeError(saved);
> -}
> -}
> +if (virCgroupAddTask(*group, pidleader) < 0)
> +goto error;

This code changes from returning 0 with *group == NULL to returning -1.
Which perhaps isn't a problem per se...

However, I note other return paths from virCgroupNewMachineManual and
qemuExtTPMSetupCgroup when processing a failed virCgroupAddProcess (what
virCgroupAddTask turns into) which still return 0 and a NULL group.

For qemuExtTPMSetupCgroup one must go back through to
qemuSetupCgroupForExtDevices via qemuExtDevicesSetupCgroup to see that
the virCgroupRemove is not made on the error path.

For some additional data, if one goes back through history of this hunk
of code it wasn't added with returning an error (commit 2fe2470181), but
there was a change (commit 71ce47596) that got reverted (commit
d41bd0959) that had changed to returning an error and I have a faint
recollection of that being a problem...

Looking at the archive of the revert patches:

v2:
https://www.redhat.com/archives/libvir-list/2016-January/msg00619.html

v1:
https://www.redhat.com/archives/libvir-list/2016-January/msg00511.html

There's quite a bit of "history" in the v1 patches that may help show
the depth of the issue.

I'm not really sure of the best course of action right now, but I
figured I should at least note it and think about it now while we still
have a chance before the release.

John


>  
>  return 0;
> +
> + error:
> +saved = virSaveLastError();
> +virCgroupRemove(*group);
> +virCgroupFree(group);
> +if (saved) {
> +virSetError(saved);
> +virFreeError(saved);
> +}
> +
> +return -1;
>  }
>  
>  
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] vircgroup: fix NULL pointer dereferencing

2018-09-27 Thread John Ferlan
$subj:

util: Fix cgroup processing NULL pointer dereferencing


On 9/26/18 11:53 AM, Marc Hartmayer wrote:
> When virCgroupEnableMissingControllers fails it's possible that *group
> is still set to NULL. Therefore let's add a guard and an attribute for
> this.

Prefix the paragraph with the commit reference below rather than the "Fixes:" tag at the bottom:

Introduced by commit 1602aa28f,

> 
>  [#0] virCgroupRemove(group=0x0)
>  [#1] virCgroupNewMachineSystemd
>  [#2] virCgroupNewMachine
>  [#3] qemuInitCgroup
>  [#4] qemuSetupCgroup
>  [#5] qemuProcessLaunch
>  [#6] qemuProcessStart
>  [#7] qemuDomainObjStart
>  [#8] qemuDomainCreateWithFlags
>  [#9] qemuDomainCreate
>  ...
> 
> Fixes: 1602aa28f820ada66f707cef3e536e8572fbda1e

I think it's safe to remove the stack trace...

> Reviewed-by: Boris Fiuczynski 
> Reviewed-by: Bjoern Walk 
> Signed-off-by: Marc Hartmayer 
> ---
>  src/util/vircgroup.c | 3 ++-
>  src/util/vircgroup.h | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)

While this patch is correct to remove the NULL deref, there "may" be a
problem with the patch that introduced this. Rather than usurp this
thread, I'll respond to the other one and see where it takes us.

John

> 
> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> index 23957c82c7fa..06e1d158febb 100644
> --- a/src/util/vircgroup.c
> +++ b/src/util/vircgroup.c
> @@ -1104,7 +1104,8 @@ virCgroupNewMachineSystemd(const char *name,
>  
>   error:
>  saved = virSaveLastError();
> -virCgroupRemove(*group);
> +if (*group)
> +virCgroupRemove(*group);
>  virCgroupFree(group);
>  if (saved) {
>  virSetError(saved);
> diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
> index 1f676f21c380..9e1ae3706b1e 100644
> --- a/src/util/vircgroup.h
> +++ b/src/util/vircgroup.h
> @@ -268,7 +268,8 @@ int virCgroupGetCpusetMemoryMigrate(virCgroupPtr group, bool *migrate);
>  int virCgroupSetCpusetCpus(virCgroupPtr group, const char *cpus);
>  int virCgroupGetCpusetCpus(virCgroupPtr group, char **cpus);
>  
> -int virCgroupRemove(virCgroupPtr group);
> +int virCgroupRemove(virCgroupPtr group)
> +ATTRIBUTE_NONNULL(1);

FWIW: This only helps if someone literally writes virCgroupRemove(NULL); it
doesn't help if the value passed in happens to be NULL at run time. One
could add an "if (!group) return 0" to virCgroupRemove to avoid that.

>  
>  int virCgroupKillRecursive(virCgroupPtr group, int signum);
>  int virCgroupKillPainfully(virCgroupPtr group);
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Michal Privoznik
On 09/27/2018 01:16 PM, Bjoern Walk wrote:
> Michal Privoznik  [2018-09-27, 12:07PM +0200]:
>> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
>>> I still don't understand why we need a timeout at all. If virtlockd is
>>> unable to get the lock, just bail and continue with what you did after
>>> the timeout runs out. Is this some kind of safety-measure? Against what?
>>
>> When there are two threads fighting over the lock. Imagine two threads
>> trying to set security label over the same file. Or imagine two daemons
>> on two distant hosts trying to set label on the same file on NFS.
>> virtlockd implements try-or-fail approach. So we need to call lock()
>> repeatedly until we succeed (or timeout).
> 
> One thread wins and the other fails with lock contention? I don't see
> how repeatedly trying to acquire the lock will improve things, but maybe
> I am not getting it right now.

The point of metadata locking is not to prevent metadata change, but to
be able to atomically change it. As I said in other thread, there is not
much point in this feature alone since chown() and setfcon() are atomic
themselves. But this is preparing the code for original label
remembering which will be stored in XATTRs at which point the whole
operation is not going to be atomic.

https://www.redhat.com/archives/libvir-list/2018-September/msg01349.html

Long story short, at the first chown() libvirt will record the original
owner of the file into XATTRs and then on the last restore it will use
that owner to return the file to instead of root:root. The locks are
there because reading XATTR, parsing it, increasing/decreasing
refcounter and possibly doing chown() is not atomic. But with locks it is.
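To make that concrete, here is a rough sketch of the first-chown
bookkeeping (illustration only: the attribute name and the helper are
invented for this example and are not taken from the patch series):

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* assumed attribute name, not the real one */
#define ATTR_UID "trusted.libvirt.security.dac.uid"

static int
rememberOwnerAndChown(const char *path, uid_t uid, gid_t gid)
{
    char val[32];
    struct stat sb;

    /* If no original owner is recorded yet, record it now.  This
     * read-check-write sequence plus the chown() below is exactly
     * the part the metadata lock keeps atomic. */
    if (getxattr(path, ATTR_UID, val, sizeof(val)) < 0) {
        if (stat(path, &sb) < 0)
            return -1;
        snprintf(val, sizeof(val), "%u", (unsigned int)sb.st_uid);
        if (setxattr(path, ATTR_UID, val, strlen(val), 0) < 0)
            return -1;
    }
    return chown(path, uid, gid);
}

Without the lock, two daemons could both see the attribute missing and
race on recording the "original" owner.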

> 
>>> There IS a lock held on the image, from the FIRST domain that we
>>> started. The second domain, which is using the SAME image, unshared,
>>> runs into the locking timeout. Sorry if I failed to describe this setup
>>> appropriately. I am starting to believe that this is expected behaviour,
>>> although it is not intuitive.
>>>
>>> # lslocks -u
>>> COMMAND    PID   TYPE SIZE MODE  M START END PATH
>>> ...
>>> virtlockd   199062  POSIX 1.5G WRITE 0 0  0 
>>> /var/lib/libvirt/images/u1604.qcow2
>>
>> But this should not clash, because the metadata lock uses different bytes:
>> the regular locking uses offset=0, the metadata lock uses offset=1. Both
>> locks lock just one byte in length.
> 
> It's the only lock in the output for the image. Should I see the
> metadata lock at start=1 as well?

You should see it if you run lslocks at the right moment. But since
metadata locking happens only for a fraction of a second, it is very
unlikely that you'll be able to run it at the right moment.

> 
>> However, from the logs it looks like virtlockd doesn't try to actually
>> acquire the lock because it found in internal records that the path is
>> locked (even though it is locked with different offset). Anyway, now I
>> am finally able to reproduce the issue and I'll look into this too.
> 
> Good.
> 
>> Quick and dirty patch might look something like this:
>>
>> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
>> index 79fa48d365..3c51d7926b 100644
>> --- i/src/util/virlockspace.c
>> +++ w/src/util/virlockspace.c
>> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr
>> lockspace,
>>  virMutexLock(&lockspace->lock);
>>
>>  if ((res = virHashLookup(lockspace->resources, resname))) {
>> -if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
>> -(flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
>> +if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
>> +flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
>> +start == 1) {
>>
>>  if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>>  goto cleanup;
> 
> Had a quick test with this, but now it seems that the whole
> metadata locking does nothing. When starting the second domain, I get an
> internal error from QEMU, failing to get the write lock. The SELinux
> labels of the image are changed as well. This is the same behaviour as
> with metadata locking disabled. Not entirely sure what should happen or
> if this is an error in my setup or not. I will have to think about this.

Metadata locking is not supposed to prevent you from running two domains
with the same disk. Nor to prevent chown() on the file.
But since you are using virtlockd to lock the disk contents, it should
be virtlockd that prevents running such domains, not qemu. I wonder if
this is a misconfiguration or something. Perhaps one domain has disk RW and the
other has it RO + shared?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] qemu: agent: Reset agentError when qemuConnectAgent success

2018-09-27 Thread Wang Yechao
qemuAgentClose and qemuAgentIO have a race condition,
as follows:

main thread:                         second thread:

virEventPollRunOnce                  processSerialChangedEvent
  virEventPollDispatchHandles
    virMutexUnlock()
                                       qemuAgentClose
                                         virObjectLock(mon)
                                         virEventRemoveHandle
                                         VIR_FORCE_CLOSE(mon->fd)
                                         virObjectUnlock(mon)
                                       priv->agentError = false
    qemuAgentIO
      virObjectLock(mon)
      mon->fd != fd --> error = true
      qemuProcessHandleAgentError
        priv->agentError = true
      virObjectUnlock(mon)
    virMutexLock()

qemuAgentClose sets mon->fd to '-1', and then qemuAgentIO
finds that mon->fd no longer equals the fd registered before.
qemuProcessHandleAgentError is therefore called and sets
priv->agentError to 'true'. After that, priv->agentError stays
'true' until libvirtd is restarted or the qemu-guest-agent
process in the guest is restarted; we can't send any
qemu-agent-command anymore, even if qemuConnectAgent returns
success later.

This occasionally happens when hot-adding vcpus to a Windows 2012 guest:
  virsh setvcpus ...
  virsh qemu-agent-command $vm '{"execute":"guest-get-vcpus"}'

Reset priv->agentError to 'false' when qemuConnectAgent succeeds
to fix this problem.

Signed-off-by: Wang Yechao 
---
 src/qemu/qemu_process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 29b0ba1..4fbb955 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -269,6 +269,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
 virResetLastError();
 }
 
+priv->agentError = false;
 return 0;
 }
 
-- 
1.8.3.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Bjoern Walk
Michal Privoznik  [2018-09-27, 12:07PM +0200]:
> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
> > I still don't understand why we need a timeout at all. If virtlockd is
> > unable to get the lock, just bail and continue with what you did after
> > the timeout runs out. Is this some kind of safety-measure? Against what?
> 
> When there are two threads fighting over the lock. Imagine two threads
> trying to set security label over the same file. Or imagine two daemons
> on two distant hosts trying to set label on the same file on NFS.
> virtlockd implements try-or-fail approach. So we need to call lock()
> repeatedly until we succeed (or timeout).

One thread wins and the other fails with lock contention? I don't see
how repeatedly trying to acquire the lock will improve things, but maybe
I am not getting it right now.

> > There IS a lock held on the image, from the FIRST domain that we
> > started. The second domain, which is using the SAME image, unshared,
> > runs into the locking timeout. Sorry if I failed to describe this setup
> > appropriately. I am starting to believe that this is expected behaviour,
> > although it is not intuitive.
> > 
> > # lslocks -u
> > COMMAND    PID   TYPE SIZE MODE  M START END PATH
> > ...
> > virtlockd   199062  POSIX 1.5G WRITE 0 0  0 
> > /var/lib/libvirt/images/u1604.qcow2
> 
> But this should not clash, because the metadata lock uses different bytes:
> the regular locking uses offset=0, the metadata lock uses offset=1. Both
> locks lock just one byte in length.

It's the only lock in the output for the image. Should I see the
metadata lock at start=1 as well?

> However, from the logs it looks like virtlockd doesn't try to actually
> acquire the lock because it found in internal records that the path is
> locked (even though it is locked with different offset). Anyway, now I
> am finally able to reproduce the issue and I'll look into this too.

Good.

> Quick and dirty patch might look something like this:
> 
> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
> index 79fa48d365..3c51d7926b 100644
> --- i/src/util/virlockspace.c
> +++ w/src/util/virlockspace.c
> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr
> lockspace,
>  virMutexLock(&lockspace->lock);
> 
>  if ((res = virHashLookup(lockspace->resources, resname))) {
> -if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
> -(flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
> +if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
> +flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
> +start == 1) {
> 
>  if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>  goto cleanup;

Had a quick test with this, but now it seems that the whole
metadata locking does nothing. When starting the second domain, I get an
internal error from QEMU, failing to get the write lock. The SELinux
labels of the image are changed as well. This is the same behaviour as
with metadata locking disabled. Not entirely sure what should happen or
if this is an error in my setup or not. I will have to think about this.

> But as I say, this is very dirty hack. I need to find a better solution.
> At least you might have something to test.
> 
> Michal
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 

-- 
IBM Systems
Linux on Z & Virtualization Development
--
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v3 0/3] fix build failure in vircgroup code

2018-09-27 Thread Ján Tomko

On Thu, Sep 27, 2018 at 12:31:08PM +0200, Pavel Hrdina wrote:

Changes in v3:
   - removed VIR_CGROUP_SUPPORTED
   - include system headers only on linux

Pavel Hrdina (3):
 vircgroup: remove VIR_CGROUP_SUPPORTED
 vircgroup: include system headers only on linux
 vircgroupv1: fix build on non-linux OSes

src/util/vircgroup.c   | 38 ++++++++++++++++----------------------
src/util/vircgroupv1.c | 20 +++++++++++++++-----
2 files changed, 31 insertions(+), 27 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v3 0/3] fix build failure in vircgroup code

2018-09-27 Thread Pavel Hrdina
Changes in v3:
- removed VIR_CGROUP_SUPPORTED
- include system headers only on linux

Pavel Hrdina (3):
  vircgroup: remove VIR_CGROUP_SUPPORTED
  vircgroup: include system headers only on linux
  vircgroupv1: fix build on non-linux OSes

 src/util/vircgroup.c   | 38 ++++++++++++++++----------------------
 src/util/vircgroupv1.c | 20 +++++++++++++++-----
 2 files changed, 31 insertions(+), 27 deletions(-)

-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v3 1/3] vircgroup: remove VIR_CGROUP_SUPPORTED

2018-09-27 Thread Pavel Hrdina
tests/vircgrouptest.c has used #ifdef __linux__ for a long time and no
failure was reported so far, so it's safe to assume that __linux__ is
good enough to guard cgroup code.

Signed-off-by: Pavel Hrdina 
---
 src/util/vircgroup.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 23957c82c7..7700a9f7a7 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -66,11 +66,6 @@ VIR_LOG_INIT("util.cgroup");
 #define CGROUP_NB_TOTAL_CPU_STAT_PARAM 3
 #define CGROUP_NB_PER_CPU_STAT_PARAM   1
 
-#if defined(__linux__) && defined(HAVE_GETMNTENT_R) && \
-defined(_DIRENT_HAVE_D_TYPE) && defined(_SC_CLK_TCK)
-# define VIR_CGROUP_SUPPORTED
-#endif
-
 VIR_ENUM_IMPL(virCgroupController, VIR_CGROUP_CONTROLLER_LAST,
   "cpu", "cpuacct", "cpuset", "memory", "devices",
   "freezer", "blkio", "net_cls", "perf_event",
@@ -115,7 +110,7 @@ virCgroupGetDevicePermsString(int perms)
 }
 
 
-#ifdef VIR_CGROUP_SUPPORTED
+#ifdef __linux__
 bool
 virCgroupAvailable(void)
 {
@@ -2639,7 +2634,7 @@ virCgroupControllerAvailable(int controller)
 return ret;
 }
 
-#else /* !VIR_CGROUP_SUPPORTED */
+#else /* !__linux__ */
 
 bool
 virCgroupAvailable(void)
@@ -3400,7 +3395,7 @@ virCgroupControllerAvailable(int controller ATTRIBUTE_UNUSED)
 {
 return false;
 }
-#endif /* !VIR_CGROUP_SUPPORTED */
+#endif /* !__linux__ */
 
 
 int
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v3 3/3] vircgroupv1: fix build on non-linux OSes

2018-09-27 Thread Pavel Hrdina
Cgroups are Linux-specific and we need to make sure that the code is
compiled only on Linux.  On other OSes the compilation fails:

../../src/util/vircgroupv1.c:65:19: error: variable has incomplete type 'struct mntent'
    struct mntent entry;
                  ^
../../src/util/vircgroupv1.c:65:12: note: forward declaration of 'struct mntent'
    struct mntent entry;
           ^
../../src/util/vircgroupv1.c:74:12: error: implicit declaration of function 'getmntent_r' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
    while (getmntent_r(mounts, &entry, buf, sizeof(buf)) != NULL) {
           ^
../../src/util/vircgroupv1.c:814:39: error: use of undeclared identifier 'MS_NOSUID'
    if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
                                      ^
../../src/util/vircgroupv1.c:814:49: error: use of undeclared identifier 'MS_NODEV'
    if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
                                                ^
../../src/util/vircgroupv1.c:814:58: error: use of undeclared identifier 'MS_NOEXEC'
    if (mount("tmpfs", root, "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, opts) < 0) {
                                                         ^
../../src/util/vircgroupv1.c:841:65: error: use of undeclared identifier 'MS_BIND'
    if (mount(src, group->legacy[i].mountPoint, "none", MS_BIND,
                                                                ^

Signed-off-by: Pavel Hrdina 
---
 src/util/vircgroupv1.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c
index 62a6e5c448..28a74474ee 100644
--- a/src/util/vircgroupv1.c
+++ b/src/util/vircgroupv1.c
@@ -20,13 +20,11 @@
  */
 #include <config.h>
 
-#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R
+#ifdef __linux__
 # include <mntent.h>
-#endif
-#include <sys/stat.h>
-#if defined HAVE_SYS_MOUNT_H
+# include <sys/stat.h>
 # include <sys/mount.h>
-#endif
+#endif /* __linux__ */
 
 #include "internal.h"
 
@@ -55,6 +53,8 @@ VIR_ENUM_IMPL(virCgroupV1Controller, 
VIR_CGROUP_CONTROLLER_LAST,
   "name=systemd");
 
 
+#ifdef __linux__
+
 /* We're looking for at least one 'cgroup' fs mount,
  * which is *not* a named mount. */
 static bool
@@ -2099,3 +2099,13 @@ virCgroupV1Register(void)
 {
 virCgroupBackendRegister();
 }
+
+#else /* !__linux__ */
+
+void
+virCgroupV1Register(void)
+{
+VIR_INFO("Control groups not supported on this platform");
+}
+
+#endif /* !__linux__ */
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v3 2/3] vircgroup: include system headers only on linux

2018-09-27 Thread Pavel Hrdina
All the system headers are used only if we are compiling on Linux, and
there they all are present; otherwise we would have seen build errors,
because in our tests/vircgrouptest.c we use only __linux__ to check
whether to skip the cgroup tests or not.

Signed-off-by: Pavel Hrdina 
---
 src/util/vircgroup.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 7700a9f7a7..f90906e4ad 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -23,24 +23,23 @@
  */
 #include <config.h>
 
-#if defined HAVE_MNTENT_H && defined HAVE_SYS_MOUNT_H \
-    && defined HAVE_GETMNTENT_R
+#ifdef __linux__
 # include <mntent.h>
 # include <sys/mount.h>
-#endif
-#include <fcntl.h>
-#include <sys/stat.h>
+# include <fcntl.h>
+# include <sys/stat.h>
 
-#ifdef MAJOR_IN_MKDEV
-# include <sys/mkdev.h>
-#elif MAJOR_IN_SYSMACROS
-# include <sys/sysmacros.h>
-#endif
+# ifdef MAJOR_IN_MKDEV
+#  include <sys/mkdev.h>
+# elif MAJOR_IN_SYSMACROS
+#  include <sys/sysmacros.h>
+# endif
 
-#include <sys/types.h>
-#include <signal.h>
-#include <dirent.h>
-#include <unistd.h>
+# include <sys/types.h>
+# include <signal.h>
+# include <dirent.h>
+# include <unistd.h>
+#endif /* __linux__ */
 
 #define __VIR_CGROUP_ALLOW_INCLUDE_PRIV_H__
 #include "vircgrouppriv.h"
-- 
2.17.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Michal Privoznik
On 09/27/2018 11:11 AM, Bjoern Walk wrote:
> Michal Privoznik  [2018-09-27, 10:15AM +0200]:
>> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
>>> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
 On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> Bjoern Walk  [2018-09-12, 01:17PM +0200]:
>> Michal Privoznik  [2018-09-12, 11:32AM
>> +0200]:

>>
>
> Still seeing the same timeout. Is this expected behaviour?
>

 Nope. I wonder if something has locked the path and forgot to
 unlock it (however, virtlockd should have unlocked all the paths
 owned by PID on connection close), or something is still holding
 the lock and connection opened.

 Can you see the timeout even when you turn off the selinux driver 
 (security_driver="none' in qemu.conf)? I tried to reproduce the
 issue yesterday and was unsuccessful. Do you have any steps to
 reproduce?
>>>
>>> So, I haven't been able to actually dig into the debugging but we
>>> have been able to reproduce this behaviour on x86 (both with SELinux
>>> and DAC). Looks like a general problem, if a problem at all, because
>>> from what I can see in the code, the 60 seconds timeout is actually
>>> intended, or not? The security manager does try for 60 seconds to
>>> acquire the lock and only then bails out. Why is this?
>>
>> The ideal solution would be to just tell virlockd "these are the paths I
>> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
>> the moment all paths are unlocked virtlockd will lock them and libvirtd
>> can continue its execution (i.e. chown() and setfcon()). However, we
>> can't do this because virtlockd runs single threaded [1] and therefore
>> if one thread is waiting for lock to be acquired there is no other
>> thread to unlock the path.
>>
>> Therefore I had to move the logic into libvirtd which tries repeatedly
>> to lock all the paths needed. And this is where the timeout steps in -
>> the lock acquiring attempts are capped at 60 seconds. This is an
>> arbitrary chosen timeout. We can make it smaller, but that will not
>> solve your problem - only mask it.
> 
> I still don't understand why we need a timeout at all. If virtlockd is
> unable to get the lock, just bail and continue with what you did after
> the timeout runs out. Is this some kind of safety-measure? Against what?

When there are two threads fighting over the lock. Imagine two threads
trying to set security label over the same file. Or imagine two daemons
on two distant hosts trying to set label on the same file on NFS.
virtlockd implements try-or-fail approach. So we need to call lock()
repeatedly until we succeed (or timeout).

> 
>>
>>>
>>> However, an actual bug is that while the security manager waits for
>>> the lock acquire the whole libvirtd hangs, but from what I understood
>>> and Marc explained to me, this is more of a pathological error in
>>> libvirt behaviour with various domain locks.
>>>
>>
>> Whole libvirtd shouldn't hang. Only those threads which try to acquire
>> domain object lock. IOW it should be possible to run APIs over different
>> domains (or other objects for that matter). For instance dumpxml over
>> different domain works just fine.
> 
> Yes, but from a user perspective, this is still pretty bad and
> unexpected. libvirt should continue to operate as usual while virtlockd
> is figuring out the locking.

Agreed. I will look into this.

> 
>> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
>> timeout is hit and whole operation fails. Do you mind running 'lslocks
>> -u' when starting a domain and before the timeout is hit?
> 
> There IS a lock held on the image, from the FIRST domain that we
> started. The second domain, which is using the SAME image, unshared,
> runs into the locking timeout. Sorry if I failed to describe this setup
> appropriately. I am starting to believe that this is expected behaviour,
> although it is not intuitive.
> 
>   # lslocks -u
>   COMMAND    PID   TYPE SIZE MODE  M START END PATH
>   ...
>   virtlockd   199062  POSIX 1.5G WRITE 0 0  0 
> /var/lib/libvirt/images/u1604.qcow2

But this should not clash, because the metadata lock uses different bytes:
the regular locking uses offset=0, the metadata lock uses offset=1. Both
locks lock just one byte in length.
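In fcntl() terms the two locks look roughly like this (a minimal
illustration only, not the actual virtlockd code):

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Lock a single byte at the given offset.  Two such locks on the same
 * file do not conflict as long as their l_start values differ. */
static int
lockOneByte(int fd, off_t start)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = start,    /* 0 = content lock, 1 = metadata lock */
        .l_len = 1,          /* exactly one byte */
    };
    return fcntl(fd, F_SETLK, &fl);   /* try-or-fail, non-blocking */
}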

However, from the logs it looks like virtlockd doesn't try to actually
acquire the lock because it found in internal records that the path is
locked (even though it is locked with different offset). Anyway, now I
am finally able to reproduce the issue and I'll look into this too.

Quick and dirty patch might look something like this:

diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
index 79fa48d365..3c51d7926b 100644
--- i/src/util/virlockspace.c
+++ w/src/util/virlockspace.c
@@ -630,8 +630,9 @@ int 

Re: [libvirt] [PATCH] tests: reintroduce tests for libxl's legacy nested setting

2018-09-27 Thread Erik Skultety
On Wed, Sep 26, 2018 at 11:31:19AM -0600, Jim Fehlig wrote:
> The preferred location for setting the nested CPU flag changed in
> Xen 4.10 and is advertised via the LIBXL_HAVE_BUILDINFO_NESTED_HVM
> define.  Commit 95d19cd0 changed libxl to use the new preferred
> location but unconditionally changed the tests, causing 'make check'
> failures against Xen < 4.10 that do not contain the new location.
>
> Commit e94415d5 fixed the failures by only running the tests when
> LIBXL_HAVE_BUILDINFO_NESTED_HVM is defined. Since libvirt supports
> several versions of Xen that use the old nested location, it is
> prudent to test the flag is set correctly. This patch reintroduces
> the tests for the legacy location of the nested setting.
>
> Signed-off-by: Jim Fehlig 
> ---
>
> We could probably get by with one test for the old nested location,
> in which case I'd drop vnuma-hvm-legacy-nest. Any opinions on that?

I verified with a few different platforms. I don't have a better idea on what
to do about the legacy tests, we either add more (even identical) test files
or we figure out some black magic to do the same thing (not preferred).
Anyway, to answer your question, even though it might be enough, I'd like to
stay consistent and keep both, so that if one day someone is looking at the
source they don't wonder why only one of them is being run in the legacy mode.
I hope that makes sense.

Reviewed-by: Erik Skultety 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Bjoern Walk
Michal Privoznik  [2018-09-27, 10:15AM +0200]:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> > Michal Privoznik  [2018-09-19, 11:45AM +0200]:
> >> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> >>> Bjoern Walk  [2018-09-12, 01:17PM +0200]:
>  Michal Privoznik  [2018-09-12, 11:32AM
>  +0200]:
> >> 
>  
> >>> 
> >>> Still seeing the same timeout. Is this expected behaviour?
> >>> 
> >> 
> >> Nope. I wonder if something has locked the path and forgot to
> >> unlock it (however, virtlockd should have unlocked all the paths
> >> owned by PID on connection close), or something is still holding
> >> the lock and connection opened.
> >> 
> >> Can you see the timeout even when you turn off the selinux driver 
> >> (security_driver="none' in qemu.conf)? I tried to reproduce the
> >> issue yesterday and was unsuccessful. Do you have any steps to
> >> reproduce?
> > 
> > So, I haven't been able to actually dig into the debugging but we
> > have been able to reproduce this behaviour on x86 (both with SELinux
> > and DAC). Looks like a general problem, if a problem at all, because
> > from what I can see in the code, the 60 seconds timeout is actually
> > intended, or not? The security manager does try for 60 seconds to
> > acquire the lock and only then bails out. Why is this?
> 
> The ideal solution would be to just tell virlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfcon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
> 
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrary chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.

I still don't understand why we need a timeout at all. If virtlockd is
unable to get the lock, just bail and continue with what you did after
the timeout runs out. Is this some kind of safety-measure? Against what?

> 
> > 
> > However, an actual bug is that while the security manager waits for
> > the lock acquire the whole libvirtd hangs, but from what I understood
> > and Marc explained to me, this is more of a pathological error in
> > libvirt behaviour with various domain locks.
> > 
> 
> Whole libvirtd shouldn't hang. Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Yes, but from a user perspective, this is still pretty bad and
unexpected. libvirt should continue to operate as usual while virtlockd
is figuring out the locking.

> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?

There IS a lock held on the image, from the FIRST domain that we
started. The second domain, which is using the SAME image, unshared,
runs into the locking timeout. Sorry if I failed to describe this setup
appropriately. I am starting to believe that this is expected behaviour,
although it is not intuitive.

# lslocks -u
COMMAND    PID   TYPE SIZE MODE  M START END PATH
...
virtlockd   199062  POSIX 1.5G WRITE 0 0  0 
/var/lib/libvirt/images/u1604.qcow2
...

> 
> Michal
> 
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.

Thanks for the explanation, I will have some reading to do to get a
better overview of the locking process in Linux.

> 
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 

-- 
IBM Systems
Linux on Z & Virtualization Development
--
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--

Re: [libvirt] [PATCH 4/4] security: Always spawn process for transactions

2018-09-27 Thread Michal Privoznik
On 09/26/2018 11:14 PM, John Ferlan wrote:
> 
> 
> On 9/26/18 5:46 AM, Michal Privoznik wrote:
>> On 09/25/2018 10:52 PM, John Ferlan wrote:
>>>
>>>
>>> On 9/21/18 5:29 AM, Michal Privoznik wrote:
 There is this latent bug which can result in virtlockd killing
 libvirtd. The problem is when the following steps occur:

Parent |  Child
------------------------------------------------+------------------
 1) virSecurityManagerMetadataLock(path);  |
This results in connection FD to virtlockd to be   |
dup()-ed and saved for later use.  |
   |
 2) Another thread calling fork(); |
This results in the FD from step 1) to be cloned   |
   |
 3) virSecurityManagerMetadataUnlock(path);|
Here, the FD is closed, but the connection is  |
still kept open because child process has  |
duplicated FD  |
   |
 4) virSecurityManagerMetadataLock(differentPath); |
Again, new connection is opened, FD is dup()-ed|
   |
 5)| exec() or exit()

 Step 5) results in closing the connection from 1) to be closed,
 finally. However, at this point virtlockd looks at its internal
 records if PID from 1) does not still own any resources. Well it
 does - in step 4) it locked differentPath. This results in
 virtlockd killing libvirtd.

 Note that this happens because we do some relabelling even before
 fork() at which point vm->pid is still -1. When this value is
 passed down to transaction code it simply runs the transaction
 in parent process (=libvirtd), which means the owner of metadata
 locks is the parent process.

 Signed-off-by: Michal Privoznik 

 Signed-off-by: Michal Privoznik 
>>>
>>> Always good to double up on the S-o-b's when there's locks and forks
>>> involved. That makes sure the fork gets one too ;-)
>>>
>>>
 ---
  src/security/security_dac.c | 12 ++--
  src/security/security_selinux.c | 12 ++--
  2 files changed, 12 insertions(+), 12 deletions(-)

>>>
>>> I read the cover, it's not simpler than the above. 
>>
>> Yeah, this is not that simple bug. I'll try to explain it differently:
>>
>> On fork(), all FDs of the parent are duplicated to child. So even if
>> parent closes one of them, the connection is not closed until child
>> closes the same FD. And since metadata locking is very fragile when it
>> comes to connection to virtlockd, this fork() behaviour actually plays
>> against us because it violates our assumption that connection is closed
>> at the end of metadataUnlock(). In fact it is not - child has it still open.
>>
>> If there was a flag like DONT_DUP_ON_FORK I could just set it on
>> virtlockd connection in metadataLock() and be done.
> 
> Avoid the forking exec's works too ;-}
> 
> OK - instead of talking in generalities - which forking dup'd fd is in
> play here? I don't want to assume or guess any more.

The connection FD.

Maybe I'm misreading your comments or the other way around. I'll try to
explain what is the problem one last time, in the simplest way I can.

1) Thread A wants to start a domain so it calls
virSecurityManagerMetadataLock().

2) The function opens a connection to virtlockd (represented by FD) and
duplicates the FD so that connection is kept open until unlock.

3) Now that locks are acquired, the thread does chown() and setfcon().

4) the virSecurityManagerMetadataUnlock() function is called, which
causes virtlockd to release all the locks. Subsequently to that, the
function closes the duplicated FD obtained in step 2).

The duplication is done because both virLockManagerAcquire() and
virLockManagerRelease() open new connection, do the RPC talk and close
the connection. That's just the way the functions are written. Now,
because of FD duplication happening in step 2) the connection from
virLockManagerAcquire() is not really closed until step 4) where the
duplicated FD is closed.

This is important to realize: the connection is closed only when the
last FD representing it is closed. If I were to duplicate the connection
FD a hundred times and then close 99 of those duplicates the connection
would still be kept open.
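A toy demonstration of that rule, plain POSIX and nothing
libvirt-specific:

#include <sys/types.h>
#include <unistd.h>

static void
demo(int connfd)
{
    int keep = dup(connfd);   /* step 2): two FDs, one connection */
    close(connfd);            /* connection still open via 'keep' */

    pid_t child = fork();     /* child inherits a copy of 'keep' */
    if (child == 0) {
        /* until the child exec()s or exits, its inherited copy
         * keeps the connection alive */
        _exit(0);
    }

    close(keep);              /* parent thinks the connection is closed,
                               * but the kernel closes it only once the
                               * child's copy is gone too */
}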

Now, let's put some threads into the picture. Imagine thread B which
called fork() just in between steps 2) and 3). On fork, ALL parent FDs
are duplicated. So the connection FD is duplicated into the child
process too. Therefore, in step 4) where we think the connection is
closed - well it is not really because there is still one FD around - in
the 

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Bjoern Walk
Michal Privoznik  [2018-09-27, 10:14AM +0200]:
> Right, so this is just waiting for virtlockd to lock the paths.
> virtlockd is obviously unable to do that (as I suggested in my previous
> e-mail - is perhaps some other process holding the lock?).

It cannot lock the paths, because those paths are already locked by the
first domain. This is intentional. But it should not take 60 seconds to
resolve this and block the whole libvirt daemon in the process.

> Can you please enable debug logs for virtlockd, reproduce and share the
> log with me? Setting this in /etc/libvirt/virtlockd.conf should be enough:
> 
> log_level=1
> log_filters="4:event 3:rpc"
> log_outputs="1:file:/var/log/virtlockd.log"

See attached.


virtlockd.log.gz
Description: application/gzip


signature.asc
Description: PGP signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Marc Hartmayer
On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik  
wrote:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
>> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
 Bjoern Walk  [2018-09-12, 01:17PM +0200]:
> Michal Privoznik  [2018-09-12, 11:32AM
> +0200]:
>>>
>

 Still seeing the same timeout. Is this expected behaviour?

>>>
>>> Nope. I wonder if something has locked the path and forgot to
>>> unlock it (however, virtlockd should have unlocked all the paths
>>> owned by PID on connection close), or something is still holding
>>> the lock and connection opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>> (security_driver="none' in qemu.conf)? I tried to reproduce the
>>> issue yesterday and was unsuccessful. Do you have any steps to
>>> reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we
>> have been able to reproduce this behaviour on x86 (both with SELinux
>> and DAC). Looks like a general problem, if a problem at all, because
>> from what I can see in the code, the 60 seconds timeout is actually
>> intended, or not? The security manager does try for 60 seconds to
>> acquire the lock and only then bails out. Why is this?
>
> The ideal solution would be to just tell virlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfcon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
>
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrary chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.
>
>>
>> However, an actual bug is that while the security manager waits for
>> the lock acquire the whole libvirtd hangs, but from what I understood
>> and Marc explained to me, this is more of a pathological error in
>> libvirt behaviour with various domain locks.
>>
>
> Whole libvirtd shouldn't hang.

The main loop doesn’t hang.

> Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Oh sorry, my assumption that thread A is also holding the
virDomainObjList lock is wrong!

Here is the actual backtrace of a waiting thread (in this case the
“worker thread for the "virsh list" command”)

(gdb) bt
#0  0x7f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x7f850c26cfd9 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3  0x7f850c2455df in virObjectLock (anyobj=<optimized out>) at util/virobject.c:429
#4  0x7f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=flags@entry=1) at conf/virdomainobjlist.c:919
#5  0x7f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=1) at conf/virdomainobjlist.c:961
#6  0x7f850c2d01a6 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=<optimized out>, flags=<optimized out>) at conf/virdomainobjlist.c:1042
#7  0x7f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493
#8  0x55a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:1372
#9  remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=<optimized out>, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348
#10 0x7f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
…

virDomainObjListFilter needs the lock of every domain… That’s the actual
problem.

>
> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock if for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?
>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> 

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Marc Hartmayer
On Thu, Sep 27, 2018 at 10:14 AM +0200, Michal Privoznik  
wrote:
> On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
>> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk  
>> wrote:
>>> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
 On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> Bjoern Walk  [2018-09-12, 01:17PM +0200]:
>> Michal Privoznik  [2018-09-12, 11:32AM +0200]:

>>
>
> Still seeing the same timeout. Is this expected behaviour?
>

 Nope. I wonder if something has locked the path and forgot to unlock it
 (however, virtlockd should have unlocked all the paths owned by PID on
 connection close), or something is still holding the lock and connection
 opened.

 Can you see the timeout even when you turn off the selinux driver
 (security_driver="none' in qemu.conf)? I tried to reproduce the issue
 yesterday and was unsuccessful. Do you have any steps to reproduce?
>>>
>>> So, I haven't been able to actually dig into the debugging but we have
>>> been able to reproduce this behaviour on x86 (both with SELinux and
>>> DAC). Looks like a general problem, if a problem at all, because from
>>> what I can see in the code, the 60 seconds timeout is actually intended,
>>> or not? The security manager does try for 60 seconds to acquire the lock
>>> and only then bails out. Why is this?
>>
>> Backtrace of libvirtd where it’s hanging (in thread A)
>>
>> (gdb) bt
>> #0 read () from /lib64/libpthread.so.0
>> #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
>> #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
>> #3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
>> #4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
>> #5 virProcessRunInMountNamespace () at util/virprocess.c:1149
>> #6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
>> #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
>> #8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
>> #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
>> #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
>> #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
>> #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
>
>
> This is just waiting for forked child to finish. Nothing suspicious here.
>
>>
>> …
>>
>> #25 in start_thread () from /lib64/libpthread.so.0
>> #26 in clone () from /lib64/libc.so.6
>>
>>
>> Backtrace of the forked process where the process is trying to get the
>> metadata lock for 60 seconds.
>> #0  0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6
>> #1  0x7f850859dff4 in usleep () from target:/lib64/libc.so.6
>> #2  0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
>> #3  0x7f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
>> #4  0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
>> #5  0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at util/virprocess.c:1100
>> #6  virProcessRunInMountNamespace () at util/virprocess.c:1140
>> #7  0x7f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_dac.c:565
>> #8  0x7f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
>> #9  0x7f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
>> #10 0x7f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
>> #11 0x7f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Michal Privoznik
On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>> Bjoern Walk  [2018-09-12, 01:17PM +0200]:
 Michal Privoznik  [2018-09-12, 11:32AM
 +0200]:
>> 
 
>>> 
>>> Still seeing the same timeout. Is this expected behaviour?
>>> 
>> 
>> Nope. I wonder if something has locked the path and forgot to
>> unlock it (however, virtlockd should have unlocked all the paths
>> owned by PID on connection close), or something is still holding
>> the lock and connection opened.
>> 
>> Can you see the timeout even when you turn off the selinux driver 
>> (security_driver="none' in qemu.conf)? I tried to reproduce the
>> issue yesterday and was unsuccessful. Do you have any steps to
>> reproduce?
> 
> So, I haven't been able to actually dig into the debugging but we
> have been able to reproduce this behaviour on x86 (both with SELinux
> and DAC). Looks like a general problem, if a problem at all, because
> from what I can see in the code, the 60 seconds timeout is actually
> intended, or not? The security manager does try for 60 seconds to
> acquire the lock and only then bails out. Why is this?

The ideal solution would be to just tell virlockd "these are the paths I
want you to lock on my behalf" and virtlockd would use F_SETLKW so that
the moment all paths are unlocked virtlockd will lock them and libvirtd
can continue its execution (i.e. chown() and setfcon()). However, we
can't do this because virtlockd runs single threaded [1] and therefore
if one thread is waiting for lock to be acquired there is no other
thread to unlock the path.

Therefore I had to move the logic into libvirtd which tries repeatedly
to lock all the paths needed. And this is where the timeout steps in -
the lock acquiring attempts are capped at 60 seconds. This is an
arbitrarily chosen timeout. We can make it smaller, but that will not
solve your problem - only mask it.
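The shape of that retry loop is roughly this (simplified and trimmed
for illustration, not the verbatim code from security_manager.c):

virTimeBackOffVar timebackoff;
bool locked = false;

if (virTimeBackOffStart(&timebackoff, 1, 60 * 1000) == 0) {  /* the 60s cap */
    while (virTimeBackOffWait(&timebackoff)) {
        /* virtlockd is try-or-fail, so keep polling until it succeeds */
        if (virLockManagerAcquire(lock, NULL, 0,
                                  VIR_DOMAIN_LOCK_FAILURE_DEFAULT, NULL) == 0) {
            locked = true;
            break;
        }
    }
}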

> 
> However, an actual bug is that while the security manager waits for
> the lock acquire the whole libvirtd hangs, but from what I understood
> and Marc explained to me, this is more of a pathological error in
> libvirt behaviour with various domain locks.
> 

Whole libvirtd shouldn't hang. Only those threads which try to acquire
domain object lock. IOW it should be possible to run APIs over different
domains (or other objects for that matter). For instance dumpxml over
different domain works just fine.

Anyway, we need to get to the bottom of this. Looks like something keeps
the file locked and then when libvirt wants to lock it for metadata the
timeout is hit and whole operation fails. Do you mind running 'lslocks
-u' when starting a domain and before the timeout is hit?

Michal

1: The reason that virtlockd has to run single threaded is stupidity of
POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
...) and entering critical section. If another thread does open() +
close() on the same file the file is unlocked. Because we can't
guarantee this will not happen in multithreaded libvirt we had to
introduce a separate process to take care of that. And this process has
to be single threaded so there won't ever be the second thread to call
close() and unintentionally release the lock.

Perhaps we could use OFD locks but those are not part of POSIX and are
available on Linux only.
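In code, the pitfall from [1] looks like this (toy example, not
libvirt source):

#include <fcntl.h>
#include <unistd.h>

static void
threadA(const char *path)
{
    int fd = open(path, O_RDWR);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

    fcntl(fd, F_SETLKW, &fl);   /* lock acquired, critical section */
    /* ... */
}

static void
threadB(const char *path)
{
    int fd = open(path, O_RDONLY);  /* same file, unrelated code path */
    close(fd);                      /* releases ALL POSIX locks this
                                     * process holds on the file,
                                     * including thread A's */
}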

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Michal Privoznik
On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk  
> wrote:
>> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
 Bjoern Walk  [2018-09-12, 01:17PM +0200]:
> Michal Privoznik  [2018-09-12, 11:32AM +0200]:
>>>
>

 Still seeing the same timeout. Is this expected behaviour?

>>>
>>> Nope. I wonder if something has locked the path and forgot to unlock it
>>> (however, virtlockd should have unlocked all the paths owned by PID on
>>> connection close), or something is still holding the lock and connection
>>> opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>> (security_driver="none' in qemu.conf)? I tried to reproduce the issue
>>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we have
>> been able to reproduce this behaviour on x86 (both with SELinux and
>> DAC). Looks like a general problem, if a problem at all, because from
>> what I can see in the code, the 60 seconds timeout is actually intended,
>> or not? The security manager does try for 60 seconds to acquire the lock
>> and only then bails out. Why is this?
> 
> Backtrace of libvirtd where it’s hanging (in thread A)
> 
> (gdb) bt
> #0 read () from /lib64/libpthread.so.0
> #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
> #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
> #3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
> #4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
> #5 virProcessRunInMountNamespace () at util/virprocess.c:1149
> #6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
> #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
> #8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
> #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
> #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
> #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
> #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816


This is just waiting for the forked child to finish. Nothing suspicious here.

> 
> …
> 
> #25 in start_thread () from /lib64/libpthread.so.0
> #26 in clone () from /lib64/libc.so.6
> 
> 
> Backtrace of the forked process where the process is trying to get the
> meta data lock for 60 seconds.>
> #0  0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6
> #1  0x7f850859dff4 in usleep () from target:/lib64/libc.so.6
> #2  0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) 
> at util/virtime.c:453
> #3  0x7f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, 
> paths=, npaths=) at 
> security/security_manager.c:1345
> #4  0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, 
> opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
> #5  0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, 
> cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at 
> util/virprocess.c:1100
> #6  virProcessRunInMountNamespace () at util/virprocess.c:1140
> #7  0x7f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized 
> out>, pid=23977) at security/security_dac.c:565
> #8  0x7f850c30eeca in virSecurityManagerTransactionCommit 
> (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
> #9  0x7f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized 
> out>, pid=23977) at security/security_stack.c:166
> #10 0x7f850c30eeca in virSecurityManagerTransactionCommit 
> (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
> #11 0x7f84e0586bf2 in qemuSecuritySetAllLabel 
> (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, 
> stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
> #12 0x7f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, 
> driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, 
> 

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Marc Hartmayer
On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk  wrote:
> Michal Privoznik  [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>> > Bjoern Walk  [2018-09-12, 01:17PM +0200]:
>> >> Michal Privoznik  [2018-09-12, 11:32AM +0200]:
>>
>> >>
>> >
>> > Still seeing the same timeout. Is this expected behaviour?
>> >
>>
>> Nope. I wonder if something has locked the path and forgot to unlock it
>> (however, virtlockd should have unlocked all the paths owned by PID on
>> connection close), or something is still holding the lock and connection
>> opened.
>>
>> Can you see the timeout even when you turn off the selinux driver
>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>
> So, I haven't been able to actually dig into the debugging, but we have
> been able to reproduce this behaviour on x86 (both with SELinux and
> DAC). Looks like a general problem, if a problem at all, because from
> what I can see in the code, the 60-second timeout is actually intended,
> or not? The security manager does try for 60 seconds to acquire the lock
> and only then bails out. Why is this?

Backtrace of libvirtd where it’s hanging (in thread A)

(gdb) bt
#0 read () from /lib64/libpthread.so.0
#1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at 
/usr/include/bits/unistd.h:44
#2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at 
util/virfile.c:1061
#3 saferead_lim (fd=31, max_len=max_len@entry=1024, 
length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
#4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, 
buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
#5 virProcessRunInMountNamespace () at util/virprocess.c:1149
#6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at 
security/security_selinux.c:1106
#7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, 
pid=pid@entry=23977) at security/security_manager.c:324
#8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at 
security/security_stack.c:166
#9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, 
pid=pid@entry=23977) at security/security_manager.c:324
#10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, 
vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, 
driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, 
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, 
snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at 
qemu/qemu_process.c:6329
#12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40,
driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50,
updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START,
migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0,
snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at
qemu/qemu_process.c:6816

…

#25 in start_thread () from /lib64/libpthread.so.0
#26 in clone () from /lib64/libc.so.6


Backtrace of the forked process where the process is trying to get the
metadata lock for 60 seconds.

#0  0x7f8508572730 in nanosleep () from target:/lib64/libc.so.6
#1  0x7f850859dff4 in usleep () from target:/lib64/libc.so.6
#2  0x7f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at 
util/virtime.c:453
#3  0x7f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, 
paths=, npaths=) at 
security/security_manager.c:1345
#4  0x7f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, 
opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
#5  0x7f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, 
cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at 
util/virprocess.c:1100
#6  virProcessRunInMountNamespace () at util/virprocess.c:1140
#7  0x7f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized 
out>, pid=23977) at security/security_dac.c:565
#8  0x7f850c30eeca in virSecurityManagerTransactionCommit 
(mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
#9  0x7f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized 
out>, pid=23977) at security/security_stack.c:166
#10 0x7f850c30eeca in virSecurityManagerTransactionCommit 
(mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
#11 0x7f84e0586bf2 in qemuSecuritySetAllLabel 
(driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, 
stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#12 0x7f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, 
driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, 
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, 
snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) 
at qemu/qemu_process.c:6329
#13 0x7f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, 
driver=driver@entry=0x7f848c108c60, 

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking

2018-09-27 Thread Bjoern Walk
Michal Privoznik  [2018-09-19, 11:45AM +0200]:
> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> > Bjoern Walk  [2018-09-12, 01:17PM +0200]:
> >> Michal Privoznik  [2018-09-12, 11:32AM +0200]:
> 
> >>
> > 
> > Still seeing the same timeout. Is this expected behaviour?
> > 
> 
> Nope. I wonder if something has locked the path and forgot to unlock it
> (however, virtlockd should have unlocked all the paths owned by PID on
> connection close), or something is still holding the lock and connection
> opened.
> 
> Can you see the timeout even when you turn off the selinux driver
> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
> yesterday and was unsuccessful. Do you have any steps to reproduce?

So, I haven't been able to actually dig into the debugging, but we have
been able to reproduce this behaviour on x86 (both with SELinux and
DAC). Looks like a general problem, if a problem at all, because from
what I can see in the code, the 60-second timeout is actually intended,
or not? The security manager does try for 60 seconds to acquire the lock
and only then bails out. Why is this?

However, an actual bug is that while the security manager waits to
acquire the lock, the whole libvirtd hangs, but from what I understood
and what Marc explained to me, this is more of a pathological error in
libvirt's behaviour with various domain locks.
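
For reference, the acquisition logic in virSecurityManagerMetadataLock()
boils down to a retry loop of the following shape. This is a simplified
plain-POSIX sketch rather than the actual libvirt code (which waits via
virTimeBackOffWait(), as visible in Marc's backtrace of the forked
process):

  #include <errno.h>
  #include <fcntl.h>
  #include <unistd.h>

  /* try to take a write lock on fd, retrying until timeout_ms elapse */
  static int lock_with_timeout(int fd, unsigned int timeout_ms)
  {
      struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
      unsigned int waited_ms = 0, step_ms = 1;

      while (waited_ms < timeout_ms) {
          if (fcntl(fd, F_SETLK, &fl) == 0)
              return 0;                /* lock acquired */
          if (errno != EACCES && errno != EAGAIN)
              return -1;               /* real error, don't retry */
          usleep(step_ms * 1000);      /* back off, then retry */
          waited_ms += step_ms;
          if (step_ms < 1000)
              step_ms *= 2;            /* grow the back-off, capped */
      }
      errno = ETIMEDOUT;
      return -1;                       /* bail out, here after 60 seconds */
  }

With a 60 * 1000 ms timeout this gives exactly the observed one-minute
stall whenever another process holds the lock and never releases it.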

-- 
IBM Systems
Linux on Z & Virtualization Development
--
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Tian, Kevin
> From: Kirti Wankhede
> Sent: Thursday, September 27, 2018 2:22 PM
> 
> Generally a single mdev device instance, a share of a physical device, is
> assigned to a user space application or a VM. There are cases when multiple
> instances of mdev devices of the same or different types are required by a
> user space application or VM. For example, in case of vGPU, multiple mdev
> devices of a type which represents the whole GPU can be assigned to one
> instance of an application or VM.
> 
> Not all types of mdev devices support assigning multiple mdev devices
> to a user space application. In that case the vendor driver can fail the
> open() call on the mdev device. But there is no way for the user space
> application to know in advance which configuration the vendor driver
> supports.
> 
> To expose the supported configuration, the vendor driver should add a
> 'multiple_mdev_support' attribute to the type-id directory if the vendor
> driver supports assigning multiple mdev devices of a particular type-id
> to one instance of a user space application or a VM.
> 
> A user space application should check whether the 'multiple_mdev_support'
> attribute is present in the type-id directory of all mdev devices which
> are going to be used. If all of them read 1, the user space application
> can proceed with multiple mdev devices.
> 
> This is an optional and read-only attribute.

I didn't get what the exact problem is from the above description. Isn't it
the basic point behind the mdev concept that a parent driver can create
multiple mdev instances on a physical device? VFIO usually doesn't care
whether multiple devices (PCI or mdev) are assigned to the same process/VM
or not. Can you give some example of the actual problem behind this change?

Thanks
Kevin

> 
> Signed-off-by: Neo Jia 
> Signed-off-by: Kirti Wankhede 
> ---
>  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> index 452dbe39270e..69e1291479ce 100644
> --- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> +++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> @@ -85,6 +85,19 @@ Users:
>   a particular <type-id> that can help in understanding the
>   features provided by that type of mediated device.
> 
> +What:   /sys/.../mdev_supported_types/<type-id>/multiple_mdev_support
> +Date:   September 2018
> +Contact:Kirti Wankhede 
> +Description:
> + Reading this attribute will return 0 or 1. Returning 1 indicates
> + that the vendor driver supports assigning multiple mdev devices
> + of a particular <type-id> to one user space application. This is
> + an optional and read-only attribute.
> +Users:
> + Userspace applications interested in knowing if multiple mdev
> + devices of a particular <type-id> can be assigned to one
> + instance of an application.
> +
>  What:   /sys/.../<device>/<UUID>/
>  Date:   October 2016
>  Contact:Kirti Wankhede 
> --
> 2.7.0


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH v1 1/1] Add attribute multiple_mdev_support for mdev type-id

2018-09-27 Thread Kirti Wankhede
Generally a single mdev device instance, a share of a physical device, is
assigned to a user space application or a VM. There are cases when multiple
instances of mdev devices of the same or different types are required by a
user space application or VM. For example, in case of vGPU, multiple mdev
devices of a type which represents the whole GPU can be assigned to one
instance of an application or VM.

Not all types of mdev devices support assigning multiple mdev devices
to a user space application. In that case the vendor driver can fail the
open() call on the mdev device. But there is no way for the user space
application to know in advance which configuration the vendor driver
supports.

To expose the supported configuration, the vendor driver should add a
'multiple_mdev_support' attribute to the type-id directory if the vendor
driver supports assigning multiple mdev devices of a particular type-id
to one instance of a user space application or a VM.

A user space application should check whether the 'multiple_mdev_support'
attribute is present in the type-id directory of all mdev devices which
are going to be used. If all of them read 1, the user space application
can proceed with multiple mdev devices.

This is an optional and read-only attribute.
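
For example, a user space application could check the attribute with
something like the following sketch; the parent device and type-id paths
below are made up for illustration, only the attribute name is what this
patch proposes:

  #include <stdio.h>

  /* returns 1 if the type's multiple_mdev_support attribute reads 1 */
  static int multiple_mdev_supported(const char *type_dir)
  {
      char path[512];
      char buf[4] = "";
      FILE *fp;

      snprintf(path, sizeof(path), "%s/multiple_mdev_support", type_dir);
      fp = fopen(path, "r");
      if (!fp)
          return 0;    /* attribute absent: assume unsupported */
      if (!fgets(buf, sizeof(buf), fp))
          buf[0] = '\0';
      fclose(fp);
      return buf[0] == '1';
  }

  int main(void)
  {
      /* hypothetical parent device and type-id */
      const char *type = "/sys/class/mdev_bus/0000:00:02.0/"
                         "mdev_supported_types/vendor-type0";

      if (multiple_mdev_supported(type))
          printf("multiple instances per application supported\n");
      else
          printf("only a single instance per application\n");
      return 0;
  }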

Signed-off-by: Neo Jia 
Signed-off-by: Kirti Wankhede 
---
 Documentation/ABI/testing/sysfs-bus-vfio-mdev | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev 
b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
index 452dbe39270e..69e1291479ce 100644
--- a/Documentation/ABI/testing/sysfs-bus-vfio-mdev
+++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
@@ -85,6 +85,19 @@ Users:
a particular <type-id> that can help in understanding the
features provided by that type of mediated device.
 
+What:   /sys/.../mdev_supported_types/<type-id>/multiple_mdev_support
+Date:   September 2018
+Contact:Kirti Wankhede 
+Description:
+   Reading this attribute will return 0 or 1. Returning 1 indicates
+   that the vendor driver supports assigning multiple mdev devices
+   of a particular <type-id> to one user space application. This is
+   an optional and read-only attribute.
+Users:
+   Userspace applications interested in knowing if multiple mdev
+   devices of a particular <type-id> can be assigned to one
+   instance of an application.
+
 What:   /sys/.../<device>/<UUID>/
 Date:   October 2016
 Contact:Kirti Wankhede 
-- 
2.7.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list