Re: [libvirt] libvirt-guests.sh without or with failing ACPI support

2019-12-10 Thread Henning Schild
On Tue, 10 Dec 2019 12:01:24 +0100, Christian Ehrhardt wrote:

> On Tue, Dec 10, 2019 at 10:46 AM Henning Schild
>  wrote:
> 
> > Hi all,
> >
> > the systemd shutdown scripts work sequentially with a 300s timeout
> > (seen on Debian). If a VM does not have ACPI support, or the ACPI
> > support failed for some reason, you are looking at a 300s timeout
> > per instance for a host shutdown/reboot.
> > i.e. 10 instances without working ACPI = 3000s to shut down
> >
> > I think the systemd scripting should be parallel instead of
> > sequential. So if you have many VMs without working ACPI you just
> > have to wait 300s in total for the host to shut down.
> >  
> 
> Hi Henning,
> this is configurable in /etc/default/libvirt-guests
> For example Ubuntu (otherwise using the same bits) changes that to run
> PARALLEL_SHUTDOWN=10
> SHUTDOWN_TIMEOUT=120

Sweet. I went for PARALLEL_SHUTDOWN=10 and left the 300. Maybe the
default PARALLEL_SHUTDOWN value should not be 0?
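For reference, a Debian /etc/default/libvirt-guests reflecting the values discussed above could look like this (an illustrative sketch, not the shipped file; ON_SHUTDOWN is shown only for context):

```shell
# /etc/default/libvirt-guests (excerpt, illustrative values)
ON_SHUTDOWN=shutdown      # attempt a clean guest shutdown on host shutdown
PARALLEL_SHUTDOWN=10      # shut down up to 10 guests concurrently
SHUTDOWN_TIMEOUT=300      # seconds to wait before guests are killed
```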

> I never got bugs about that config being too aggressive.
> The change is old and as easy as:
> https://git.launchpad.net/ubuntu/+source/libvirt/tree/debian/patches/ubuntu/parallel-shutdown.patch?h=ubuntu/focal-devel
> Maybe you just want to open a bug with Debian to change the default
> config there as well?

No, it is a bug in libvirt having the "wrong" defaults. And a bug in
Ubuntu not fixing it upstream ;).

Thanks,
Henning

> Steps to reproduce:
> >  - start a VM that does not support ACPI
> >  - reboot the host and wait 300s for the VM to be shut down
> >  - now start it multiple times
> >  - wait multiples of 300s for the shutdown
> >
> > Expected behaviour:
> >  - no matter how many instances do not support ACPI, make it 300s
> > max because we shut them down in parallel
> >
> >
> > regards,
> > Henning
> >
> >
> > --
> > libvir-list mailing list
> > libvir-list@redhat.com
> > https://www.redhat.com/mailman/listinfo/libvir-list
> >
> >  
> 





[libvirt] libvirt-guests.sh without or with failing ACPI support

2019-12-10 Thread Henning Schild
Hi all,

the systemd shutdown scripts work sequentially with a 300s timeout
(seen on Debian). If a VM does not have ACPI support, or the ACPI
support failed for some reason, you are looking at a 300s timeout per
instance for a host shutdown/reboot.
i.e. 10 instances without working ACPI = 3000s to shut down

I think the systemd scripting should be parallel instead of
sequential. So if you have many VMs without working ACPI you just
have to wait 300s in total for the host to shut down.

Steps to reproduce:
 - start a VM that does not support ACPI
 - reboot the host and wait 300s for the VM to be shut down
 - now start it multiple times
 - wait multiples of 300s for the shutdown

Expected behaviour:
 - no matter how many instances do not support ACPI, make it 300s max
   because we shut them down in parallel 


regards,
Henning





[libvirt] libvirt-guests.sh stop

2019-12-10 Thread Henning Schild
Hi all,

the shell script causing VMs to eventually stop when the host shuts
down/reboots has a window where it can lose ACPI shutdown events.

I have seen VMs not shutting down because the event arrived before
they became ACPI aware, causing the systemd magic to time out for a
couple of minutes and eventually killing the VM.

Steps to reproduce:
 - start a new VM
 - reboot the host while the VM is still booting up

What happens:
 - the ACPI power-button event will get lost
 - the systemd loop on the host will go and wait
 - the VM will get killed hard eventually

What should happen (probably):
 - the retry loop should inject an ACPI power-button event with every
   retry (5s?)
 - the VM will eventually have ACPI support and pick up any of the
   many button events
 - the VM will shut down on any of the many power-button events, if
   there was enough time

I could try to come up with a patch, but I hope the description of the
problem is clear enough that someone else will give it a try.
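A minimal sketch of the proposed behaviour (not the actual libvirt-guests.sh code; the function name and intervals are illustrative) could look like this, re-sending the ACPI event on every poll so a guest that was not yet ACPI aware at the first attempt still picks one up later:

```shell
# Sketch of the proposed retry loop. `virsh shutdown --mode acpi` is
# idempotent enough here: re-sending the event is harmless if the guest
# already saw an earlier one.
shutdown_with_retry() {
    dom=$1 timeout=$2 interval=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # re-inject the ACPI power-button event on every iteration
        virsh shutdown "$dom" --mode acpi >/dev/null 2>&1
        # stop waiting as soon as the domain is no longer running
        [ "$(virsh domstate "$dom" 2>/dev/null)" != "running" ] && return 0
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1   # caller would now fall back to destroying the domain
}
```

The per-iteration re-send is the whole point: the current script sends the event once and then only polls.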

regards,
Henning





Re: [libvirt] [Qemu-devel] pci-assign fails with read error on config-space file

2016-11-02 Thread Henning Schild
On Wed, 2 Nov 2016 09:54:16 +, "Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Oct 28, 2016 at 01:28:19PM +0200, Henning Schild wrote:
> > Hey,
> > 
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
> > 
> > I know i should probably not be doing this, it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.
> > 
> > In both cases qemu will crash with the following output:
> >   
> > > qemu: hardware error: pci read failed, ret = 0 errno = 22  
> > 
> > followed by the usual machine state dump. With strace i found it to
> > be a failing read on the config space file of my device.
> > /sys/bus/pci/devices/:xx:xx.x/config
> > A few reads out of that file succeeded, as well as accesses on
> > vendor etc.  
> 
> errno == 22, means EINVAL, so it feels unlikely to be a permissions
> problem unless the kernel or QEMU is reporting the wrong errno.
> 
> > Manually launching a qemu with the pci-assign works without a
> > problem, so i "blame" libvirt and the cgroup environment the qemu
> > ends up in.  
> 
> The 'config' file is a plain file, so not affected by cgroups - that
> only affects block devices.
> 
> When libvirt runs QEMU, it runs unprivileged qemu:qemu user/group,
> so perhaps it is a permissions thing, despite the fact that you're
> getting EINVAL, not EACCESS.

If the wrapper qemu decides to assign a PCI device it will use a
suid-root qemu to do so. So it is not EACCES; as I said, other reads
worked fine.

> It would be interesting to know just what part of the config space
> QEMU was trying to read I guess, to better understand why it might
> be failing

I should have said that before: it is a one-byte read at offset 64, so
just past the standard 64-byte config-space header.
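That offset is suggestive. An assumption worth checking (not stated in the thread): the kernel's sysfs code only exposes the first 64 bytes of config space to processes without CAP_SYS_ADMIN, so an unprivileged reader sees EOF (a 0-byte read) at offset 64, which would match qemu's "ret = 0" with the errno being stale. A quick probe, with a placeholder device path:

```shell
# Probe one byte of a PCI config-space file at a given offset and print
# how many bytes were actually read. If the unprivileged view really is
# truncated at 64 bytes, this prints 0 for offsets >= 64 as non-root.
probe_config_byte() {
    file=$1 offset=$2
    dd if="$file" bs=1 skip="$offset" count=1 2>/dev/null | wc -c
}
# e.g. probe_config_byte /sys/bus/pci/devices/0000:xx:xx.x/config 64
# (device address is a placeholder)
```

Running it once as root and once as the qemu user would distinguish a privilege-dependent short read from a genuine cgroup effect.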

regards,
Henning



Re: [libvirt] [libvirt-users] pci-assign fails with read error on config-space file

2016-11-02 Thread Henning Schild
On Fri, 28 Oct 2016 11:25:55 -0400, Laine Stump <la...@redhat.com> wrote:

> On 10/28/2016 07:28 AM, Henning Schild wrote:
> > Hey,
> >
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'  
> 
> With any reasonably modern version of Linux/qemu/libvirt, you should
> not be using pci-assign, but should use vfio-pci instead. pci-assign
> is old, unmaintained, and deprecated (and any other bad words you can
> think of).
> 
> Also, have you done anything to lock the guest's memory in host RAM? 
> This is necessary so that the source/destination of DMA reads/writes
> is always present. It is done automatically by libvirt as required
> *when libvirt knows that a device is being assigned to the guest*,
> but if you're going behind libvirt's back, you need to take care of
> that yourself (or alternately, don't go behind libvirt's back, which
> is the greatly preferred alternative!)

Memory locking is taken care of with "-realtime mlock=on".
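When libvirt itself manages the guest, the same effect comes from the <memoryBacking> element in the domain XML, which is what makes libvirt emit -realtime mlock=on on the qemu command line (sketch of the relevant fragment only):

```xml
<domain type='kvm'>
  <!-- ... -->
  <memoryBacking>
    <locked/>
  </memoryBacking>
</domain>
```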

> >
> > I know i should probably not be doing this,  
> 
> 
> Yes, that is a serious understatement :-) And I suspect that it isn't 
> necessary.

I know, but that was never the question ;).

> >   it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.  
> 
> libvirt selects the device according to its PCI address, not vendor
> and device id. Is that not "fine-grained" enough? (And does OpenStack
> not let you select devices based on their PCI address?)

The workaround is indeed for the version of OpenStack we are using.
Recent versions might have support for more fine-grained assignment,
but updating OpenStack is not something I would like to do right now.
Another item on the TODO list that I would like to keep separate from
the problem at hand.

> >
> > In both cases qemu will crash with the following output:
> >  
> >> qemu: hardware error: pci read failed, ret = 0 errno = 22  
> > followed by the usual machine state dump. With strace i found it to
> > be a failing read on the config space file of my device.
> > /sys/bus/pci/devices/:xx:xx.x/config
> > A few reads out of that file succeeded, as well as accesses on
> > vendor etc.
> >
> > Manually launching a qemu with the pci-assign works without a
> > problem, so i "blame" libvirt and the cgroup environment the qemu
> > ends up in. So i put a bash into the exact same cgroup setup - next
> > to a running qemu, expecting a dd or hexdump on the config-space
> > file to fail. But from that bash i can read the file without a
> > problem.
> >
> > Has anyone seen that problem before?  
> 
> No, because nobody else (that I've ever heard) is doing what you are 
> doing. You're going around behind the back of libvirt  (and
> OpenStack) to do device assignment with a method that was replaced
> with something newer/better/etc about 3 years ago, and in the process
> are likely missing a lot of the details that would otherwise be
> automatically handled by libvirt.

Sure, and my question was aiming at what exactly I could be missing.
That is just to fix a system that used to work, and to get a better
understanding of "a lot of the details that would otherwise be
automatically handled by libvirt".

> 
> > Right now i do not know what i
> > am missing, maybe qemu is hitting some limits configured for the
> > cgroups or whatever. I can not use pci-assign from libvirt, but if i
> > did would it configure cgroups in a different way or relax some
> > limits?
> >
> > What would be a good next step to debug that? Right now i am
> > looking at kernel event traces, but the machine is pretty big and
> > so is the trace.  
> 
> 
> My recommendation would be this:
> 
> 1) look at OpenStack to see if it allows selecting the device to
> assign by PCI address. If so, use that (it will just tell libvirt
> "assign this device", and libvirt will automatically use VFIO for the
> device assignment if it's available (which it will be))

The version currently in use does not allow that.

> 2) if (1) is a deadend (i.e. OpenStack doesn't allow you to select
> based on PCI address), use your "sneaky backdoor method" to do "virsh 
> attach-device somexmlfile.xml", where somexmlfile.xml has a proper 
> <hostdev> element to select and assign the host device you want.
> Aga
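For completeness, a minimal hostdev-only somexmlfile.xml for that virsh attach-device approach might look like this (the PCI address is a placeholder):

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```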

Re: [libvirt] [Qemu-devel] pci-assign fails with read error on config-space file

2016-11-02 Thread Henning Schild
On Fri, 28 Oct 2016 17:22:41 +0200, Laszlo Ersek <ler...@redhat.com> wrote:

> On 10/28/16 13:28, Henning Schild wrote:
> > Hey,
> > 
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
> > 
> > I know i should probably not be doing this, it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.  
> 
> (1) The libvirt domain XML identifies the host PCI device to assign by
> full PCI address (see the  element:
> <http://libvirt.org/formatdomain.html#elementsHostDev>); it does not
> filter with vendor/device ID.
> 
> So, I believe your comment refers to the pci-stub host kernel driver
> not being flexible enough for binding vs. not binding different
> instances of the same vendor/device ID.

My comment referred to OpenStack. The version we are using assigns PCI
devices purely by device and vendor ID. pci-stub is no problem at
all; you can always bind/unbind by address.
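The bind/unbind-by-address dance mentioned here is a couple of sysfs writes. A sketch (the device address and vendor/device ID are placeholders; SYSFS is only parameterised so the function can be exercised outside a real host):

```shell
SYSFS=${SYSFS:-/sys}

# Detach a device from its current driver and hand it to pci-stub,
# selecting the one instance by full PCI address. new_id is needed so
# pci-stub accepts the vendor/device ID at all; the unbind/bind pair
# then moves just the one device.
stub_bind() {
    addr=$1 vendor_device=$2      # e.g. 0000:03:00.0 "8086 10f5"
    echo "$vendor_device" > "$SYSFS/bus/pci/drivers/pci-stub/new_id"
    echo "$addr" > "$SYSFS/bus/pci/devices/$addr/driver/unbind"
    echo "$addr" > "$SYSFS/bus/pci/drivers/pci-stub/bind"
}
```

Other instances with the same ID stay bound to their original driver, which is exactly the fine-grained selection the vendor/device-ID scheme lacks.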

> If that's the case, would you be helped by the following host kernel
> patch?
> 
> [PATCH] PCI: pci-stub: accept exceptions to the ID- and class-based
> matching
> 
> <http://www.spinics.net/lists/linux-pci/msg55497.html>
> 
> (2) Is there any reason (other than (1)) that you are using the
> legacy / deprecated pci-assign method, rather than VFIO?
> 
> I suggest to evaluate whether the "pci-stub.except=..." kernel
> parameter helped your use case, and if (consequently) you could move
> to a fully libvirt + VFIO based config.

I would like to do that in the long run and will look into the options.
But for now I was hoping for a quick answer to make the hacky version
work again.

Thanks,
Henning

> Thanks
> Laszlo
> 
> > 
> > In both cases qemu will crash with the following output:
> >   
> >> qemu: hardware error: pci read failed, ret = 0 errno = 22  
> > 
> > followed by the usual machine state dump. With strace i found it to
> > be a failing read on the config space file of my device.
> > /sys/bus/pci/devices/:xx:xx.x/config
> > A few reads out of that file succeeded, as well as accesses on
> > vendor etc.
> > 
> > Manually launching a qemu with the pci-assign works without a
> > problem, so i "blame" libvirt and the cgroup environment the qemu
> > ends up in. So i put a bash into the exact same cgroup setup - next
> > to a running qemu, expecting a dd or hexdump on the config-space
> > file to fail. But from that bash i can read the file without a
> > problem.
> > 
> > Has anyone seen that problem before? Right now i do not know what i
> > am missing, maybe qemu is hitting some limits configured for the
> > cgroups or whatever. I can not use pci-assign from libvirt, but if i
> > did would it configure cgroups in a different way or relax some
> > limits?
> > 
> > What would be a good next step to debug that? Right now i am
> > looking at kernel event traces, but the machine is pretty big and
> > so is the trace.
> > 
> > That assignment used to work and i do not know how it broke, i have
> > tried combinations of several kernels, versions of libvirt and qemu.
> > (kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and
> > 2.7) All combinations show the same problem, even the ones that
> > work on other machines. So when it comes to software versions the
> > problem could well be caused by a software update of another
> > component, that i got with the package manager and did not compile
> > myself. It is a debian 8.6 with all recent updates installed. My
> > guess would be that systemd could have an influence on cgroups or
> > limits causing such a problem.
> > 
> > regards,
> > Henning
> >   
> 



[libvirt] pci-assign fails with read error on config-space file

2016-10-30 Thread Henning Schild
Hey,

I am running an unusual setup where I assign PCI devices behind the
back of libvirt. I have two options to do that:
1. a wrapper script for qemu that takes care of suid-root and appends
arguments for pci-assign
2. virsh qemu-monitor-command ... 'device_add pci-assign...'

I know I should probably not be doing this; it is a workaround to
introduce fine-grained PCI assignment in an OpenStack setup, where
vendor and device ID are not enough to pick the right device for a VM.

In both cases qemu will crash with the following output:

> qemu: hardware error: pci read failed, ret = 0 errno = 22

followed by the usual machine state dump. With strace I found it to be
a failing read on the config-space file of my device,
/sys/bus/pci/devices/:xx:xx.x/config
A few reads out of that file succeeded, as did accesses on vendor
etc.

Manually launching a qemu with pci-assign works without a problem,
so I "blame" libvirt and the cgroup environment the qemu ends up in.
So I put a bash into the exact same cgroup setup - next to a running
qemu - expecting a dd or hexdump on the config-space file to fail. But
from that bash I can read the file without a problem.

Has anyone seen that problem before? Right now I do not know what I
am missing; maybe qemu is hitting some limits configured for the
cgroups or whatever. I cannot use pci-assign from libvirt, but if I
did, would it configure cgroups in a different way or relax some
limits?

What would be a good next step to debug that? Right now I am looking at
kernel event traces, but the machine is pretty big and so is the trace.

That assignment used to work and I do not know how it broke; I have
tried combinations of several kernels, versions of libvirt and qemu
(kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and 2.7).
All combinations show the same problem, even the ones that work on
other machines. So when it comes to software versions, the problem
could well be caused by an update of another component that I got with
the package manager and did not compile myself. It is a Debian 8.6 with
all recent updates installed. My guess would be that systemd could have
an influence on cgroups or limits, causing such a problem.

regards,
Henning



[libvirt] [PATCH] qemu: rename QEMU_CAPS_MLOCK to QEMU_CAPS_REALTIME_MLOCK

2016-07-25 Thread Henning Schild
Purely cosmetic change to be consistent with the other names.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
Something I found in an old staging queue that might be considered
useful.

 src/qemu/qemu_capabilities.c | 2 +-
 src/qemu/qemu_capabilities.h | 2 +-
 src/qemu/qemu_command.c  | 4 ++--
 tests/qemuxml2argvtest.c | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index f600ce9..d5b73e6 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -2655,7 +2655,7 @@ static struct virQEMUCapsCommandLineProps virQEMUCapsCommandLine[] = {
 { "machine", "vmport", QEMU_CAPS_MACHINE_VMPORT_OPT },
 { "drive", "discard", QEMU_CAPS_DRIVE_DISCARD },
 { "drive", "detect-zeroes", QEMU_CAPS_DRIVE_DETECT_ZEROES },
-{ "realtime", "mlock", QEMU_CAPS_MLOCK },
+{ "realtime", "mlock", QEMU_CAPS_REALTIME_MLOCK },
 { "boot-opts", "strict", QEMU_CAPS_BOOT_STRICT },
 { "boot-opts", "reboot-timeout", QEMU_CAPS_REBOOT_TIMEOUT },
 { "boot-opts", "splash-time", QEMU_CAPS_SPLASH_TIMEOUT },
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index ca84f27..bd5c6d9 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -254,7 +254,7 @@ typedef enum {
 QEMU_CAPS_MEM_MERGE, /* -machine mem-merge */
 QEMU_CAPS_VNC_WEBSOCKET, /* -vnc x:y,websocket */
 QEMU_CAPS_DRIVE_DISCARD, /* -drive discard=off(ignore)|on(unmap) */
-QEMU_CAPS_MLOCK, /* -realtime mlock=on|off */
+QEMU_CAPS_REALTIME_MLOCK, /* -realtime mlock=on|off */
 
 /* 150 */
 QEMU_CAPS_VNC_SHARE_POLICY, /* set display sharing policy */
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 4558b9f..3dc131b 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -7022,12 +7022,12 @@ qemuBuildMemCommandLine(virCommandPtr cmd,
 qemuBuildMemPathStr(cfg, def, qemuCaps, cmd) < 0)
 return -1;
 
-if (def->mem.locked && !virQEMUCapsGet(qemuCaps, QEMU_CAPS_MLOCK)) {
+if (def->mem.locked && !virQEMUCapsGet(qemuCaps, QEMU_CAPS_REALTIME_MLOCK)) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
_("memory locking not supported by QEMU binary"));
 return -1;
 }
-if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_MLOCK)) {
+if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_REALTIME_MLOCK)) {
 virCommandAddArg(cmd, "-realtime");
 virCommandAddArgFormat(cmd, "mlock=%s",
def->mem.locked ? "on" : "off");
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 356f843..a5d51a8 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1750,9 +1750,9 @@ mymain(void)
 QEMU_CAPS_VIRTIO_SCSI, QEMU_CAPS_VIRTIO_SCSI,
 QEMU_CAPS_DEVICE_SCSI_GENERIC);
 
-DO_TEST("mlock-on", QEMU_CAPS_MLOCK);
+DO_TEST("mlock-on", QEMU_CAPS_REALTIME_MLOCK);
 DO_TEST_FAILURE("mlock-on", NONE);
-DO_TEST("mlock-off", QEMU_CAPS_MLOCK);
+DO_TEST("mlock-off", QEMU_CAPS_REALTIME_MLOCK);
 DO_TEST("mlock-unsupported", NONE);
 
 DO_TEST_PARSE_ERROR("pci-bridge-negative-index-invalid",
-- 
2.7.3



[libvirt] github and gitlab mirrors are far behind

2016-03-14 Thread Henning Schild
Hey,

As the title says, the mirrors are pretty far behind; probably
something is broken there.

Henning



Re: [libvirt] [PATCHv2 2/3] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-03-01 Thread Henning Schild
On Tue, 1 Mar 2016 12:34:44 +0100
Peter Krempa <pkre...@redhat.com> wrote:

> On Tue, Mar 01, 2016 at 11:20:18 +, Daniel Berrange wrote:
> > On Fri, Feb 26, 2016 at 04:34:23PM +0100, Henning Schild wrote:  
> > > Move qemuProcessSetupEmulator up under qemuSetupCgroup. That way
> > > we move the one main thread right into the emulator cgroup,
> > > instead of moving multiple threads later on. And we do not
> > > actually want any threads running in the parent cgroups (cpu
> > > cpuacct cpuset).
> > > 
> > > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > > ---
> > >  src/qemu/qemu_process.c | 8 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > > index 0c43183..7725a5f 100644
> > > --- a/src/qemu/qemu_process.c
> > > +++ b/src/qemu/qemu_process.c
> > > @@ -5087,6 +5087,10 @@ qemuProcessLaunch(virConnectPtr conn,
> > >  qemuProcessInitCpuAffinity(vm) < 0)
> > >  goto cleanup;
> > >  
> > > +VIR_DEBUG("Setting emulator tuning/settings");
> > > +if (qemuProcessSetupEmulator(vm) < 0)
> > > +goto cleanup;
> > > +
> > >  VIR_DEBUG("Setting domain security labels");
> > >  if (virSecurityManagerSetAllLabel(driver->securityManager,
> > >vm->def,
> > > @@ -5129,10 +5133,6 @@ qemuProcessLaunch(virConnectPtr conn,
> > >  if (rv == -1) /* The VM failed to start */
> > >  goto cleanup;
> > >  
> > > -VIR_DEBUG("Setting emulator tuning/settings");
> > > -if (qemuProcessSetupEmulator(vm) < 0)
> > > -goto cleanup;
> > > -
> > >  VIR_DEBUG("Waiting for monitor to show up");
> > >  if (qemuProcessWaitForMonitor(driver, vm, asyncJob,
> > > priv->qemuCaps, logCtxt) < 0) goto cleanup;  
> > 
> > Do you have some other local patches applied to your git ?  I just
> > went to apply this and realized that qemuProcessSetupEmulator()
> > does not actually exist. It git master the function is
> > qemuSetupCgroupForEmulator and there is another function call
> > qemuProcessSetEmulatorAffinity just after it too.  So I can't apply
> > this patch  
> 
> This was based on top of my refactor that creates
> qemuProcessSetupEmulator. I didn't realize Henning based that on top
> of my patch. I wanted to wait until this gets sorted and then re-do my
> patch, but I can push it so that you can apply that.

Peter, could you please look into the affinity setting code of qemu
after my patch 2?
In the cgroups affinity setting code the main thread (qemuProcess) is
considered an emulator thread right away. The manual affinity setting
code should apply the same scheme.
I think qemuProcessInitCpuAffinity is obsolete. If corner cases remain
they should become part of qemuProcessSetupEmulator.
Depending on how qemuProcessSetupEmulator has to change, we will see
about the patch ordering later.

> Peter



Re: [libvirt] [PATCHv2 2/3] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-03-01 Thread Henning Schild
On Tue, 1 Mar 2016 11:20:18 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Feb 26, 2016 at 04:34:23PM +0100, Henning Schild wrote:
> > Move qemuProcessSetupEmulator up under qemuSetupCgroup. That way
> > we move the one main thread right into the emulator cgroup, instead
> > of moving multiple threads later on. And we do not actually want any
> > threads running in the parent cgroups (cpu cpuacct cpuset).
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_process.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 0c43183..7725a5f 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -5087,6 +5087,10 @@ qemuProcessLaunch(virConnectPtr conn,
> >  qemuProcessInitCpuAffinity(vm) < 0)
> >  goto cleanup;
> >  
> > +VIR_DEBUG("Setting emulator tuning/settings");
> > +if (qemuProcessSetupEmulator(vm) < 0)
> > +goto cleanup;
> > +
> >  VIR_DEBUG("Setting domain security labels");
> >  if (virSecurityManagerSetAllLabel(driver->securityManager,
> >vm->def,
> > @@ -5129,10 +5133,6 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (rv == -1) /* The VM failed to start */
> >  goto cleanup;
> >  
> > -VIR_DEBUG("Setting emulator tuning/settings");
> > -if (qemuProcessSetupEmulator(vm) < 0)
> > -goto cleanup;
> > -
> >  VIR_DEBUG("Waiting for monitor to show up");
> >  if (qemuProcessWaitForMonitor(driver, vm, asyncJob,
> > priv->qemuCaps, logCtxt) < 0) goto cleanup;  
> 
> Do you have some other local patches applied to your git ?  I just
> went to apply this and realized that qemuProcessSetupEmulator() does
> not actually exist. It git master the function is
> qemuSetupCgroupForEmulator and there is another function call
> qemuProcessSetEmulatorAffinity just after it too.  So I can't apply
> this patch

Yes, I have two other patches that John also wanted to merge. They
would have conflicted with mine; see the cover letter of this series.

> Regards,
> Daniel



[libvirt] [PATCHv2 1/3] vircgroup: one central point for adding tasks to cgroups

2016-02-26 Thread Henning Schild
Use virCgroupAddTaskController in virCgroupAddTask so we have one
single point where we add tasks to cgroups.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/util/vircgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 6ce208e..ec59150 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1183,7 +1183,7 @@ virCgroupAddTask(virCgroupPtr group, pid_t pid)
 if (i == VIR_CGROUP_CONTROLLER_SYSTEMD)
 continue;
 
-if (virCgroupSetValueU64(group, i, "tasks", pid) < 0)
+if (virCgroupAddTaskController(group, pid, i) < 0)
 goto cleanup;
 }
 
-- 
2.4.10



[libvirt] [PATCHv2 0/3] reorder qemu cgroups operations

2016-02-26 Thread Henning Schild
This is a much shorter series focusing on the key point, the second patch.
The first patch is something that was found when looking at the code and is
just a cosmetic change. The third patch just cleans up. They were both
already ACKed.

Patch 2 was also already ACKed but conflicted with another pending change.
It should be reviewed in its new context. Note the new order with the
"manual" affinity setting code.

@Peter:
qemuProcessInitCpuAffinity and qemuProcessSetupEmulator have a lot
in common. I guess there is potential for further simplification.

The series is based on 92ec2e5e9b79b7df4d575040224bd606ab0b6dd8 with
these two patches on top:
http://www.redhat.com/archives/libvir-list/2016-February/msg01211.html

Henning Schild (3):
  vircgroup: one central point for adding tasks to cgroups
  qemu_cgroup: put qemu right into emulator sub-cgroup
  qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask

 src/libvirt_private.syms |   1 -
 src/qemu/qemu_process.c  |  10 ++---
 src/util/vircgroup.c | 105 +--
 src/util/vircgroup.h |   3 --
 4 files changed, 6 insertions(+), 113 deletions(-)

-- 
2.4.10



[libvirt] [PATCHv2 3/3] qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask

2016-02-26 Thread Henning Schild
qemuProcessSetupEmulator runs at a point in time where there is only
the qemu main thread. Use virCgroupAddTask to put just that one task
into the emulator cgroup. This patch makes virCgroupMoveTask and
virCgroupAddTaskStrController obsolete.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/libvirt_private.syms |   1 -
 src/qemu/qemu_process.c  |   2 +-
 src/util/vircgroup.c | 103 ---
 src/util/vircgroup.h |   3 --
 4 files changed, 1 insertion(+), 108 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 4cfaed5..a318cb2 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1233,7 +1233,6 @@ virCgroupHasEmptyTasks;
 virCgroupKill;
 virCgroupKillPainfully;
 virCgroupKillRecursive;
-virCgroupMoveTask;
 virCgroupNewDetect;
 virCgroupNewDetectMachine;
 virCgroupNewDomainPartition;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 7725a5f..08a9eb6 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2233,7 +2233,7 @@ qemuProcessSetupEmulator(virDomainObjPtr vm)
 true, &cgroup_emulator) < 0)
 goto cleanup;
 
-if (virCgroupMoveTask(priv->cgroup, cgroup_emulator) < 0)
+if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
 goto cleanup;
 
 
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index ec59150..42276ca 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1224,99 +1224,6 @@ virCgroupAddTaskController(virCgroupPtr group, pid_t pid, int controller)
 
 
 static int
-virCgroupAddTaskStrController(virCgroupPtr group,
-  const char *pidstr,
-  int controller)
-{
-char *str = NULL, *cur = NULL, *next = NULL;
-unsigned long long p = 0;
-int rc = 0;
-char *endp;
-
-if (VIR_STRDUP(str, pidstr) < 0)
-return -1;
-
-cur = str;
-while (*cur != '\0') {
-if (virStrToLong_ull(cur, &endp, 10, &p) < 0) {
-virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("Cannot parse '%s' as an integer"), cur);
-goto cleanup;
-}
-
-if (virCgroupAddTaskController(group, p, controller) < 0) {
-/* A thread that exits between when we first read the source
- * tasks and now is not fatal.  */
-if (virLastErrorIsSystemErrno(ESRCH))
-virResetLastError();
-else
-goto cleanup;
-}
-
-next = strchr(cur, '\n');
-if (next) {
-cur = next + 1;
-*next = '\0';
-} else {
-break;
-}
-}
-
- cleanup:
-VIR_FREE(str);
-return rc;
-}
-
-
-/**
- * virCgroupMoveTask:
- *
- * @src_group: The source cgroup where all tasks are removed from
- * @dest_group: The destination where all tasks are added to
- *
- * Returns: 0 on success or -1 on failure
- */
-int
-virCgroupMoveTask(virCgroupPtr src_group, virCgroupPtr dest_group)
-{
-int ret = -1;
-char *content = NULL;
-size_t i;
-
-for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
-if (!src_group->controllers[i].mountPoint ||
-!dest_group->controllers[i].mountPoint)
-continue;
-
-/* We must never move tasks in systemd's hierarchy */
-if (i == VIR_CGROUP_CONTROLLER_SYSTEMD)
-continue;
-
-/* New threads are created in the same group as their parent;
- * but if a thread is created after we first read we aren't
- * aware that it needs to move.  Therefore, we must iterate
- * until content is empty.  */
-while (1) {
-VIR_FREE(content);
-if (virCgroupGetValueStr(src_group, i, "tasks", &content) < 0)
-return -1;
-
-if (!*content)
-break;
-
-if (virCgroupAddTaskStrController(dest_group, content, i) < 0)
-goto cleanup;
-}
-}
-
-ret = 0;
- cleanup:
-VIR_FREE(content);
-return ret;
-}
-
-
-static int
 virCgroupSetPartitionSuffix(const char *path, char **res)
 {
 char **tokens;
@@ -4356,16 +4263,6 @@ virCgroupAddTaskController(virCgroupPtr group ATTRIBUTE_UNUSED,
 
 
 int
-virCgroupMoveTask(virCgroupPtr src_group ATTRIBUTE_UNUSED,
-  virCgroupPtr dest_group ATTRIBUTE_UNUSED)
-{
-virReportSystemError(ENXIO, "%s",
- _("Control groups not supported on this platform"));
-return -1;
-}
-
-
-int
 virCgroupGetBlkioIoServiced(virCgroupPtr group ATTRIBUTE_UNUSED,
 long long *bytes_read ATTRIBUTE_UNUSED,
 long long *bytes_write ATTRIBUTE_UNUSED,
diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
index aeb641c..76ecf06 100644
--- a/src/util/vircgroup.h
+++ b/src/u

[libvirt] [PATCHv2 2/3] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-02-26 Thread Henning Schild
Move qemuProcessSetupEmulator up under qemuSetupCgroup. That way
we move the one main thread right into the emulator cgroup, instead
of moving multiple threads later on. And we do not actually want any
threads running in the parent cgroups (cpu cpuacct cpuset).

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_process.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 0c43183..7725a5f 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5087,6 +5087,10 @@ qemuProcessLaunch(virConnectPtr conn,
 qemuProcessInitCpuAffinity(vm) < 0)
 goto cleanup;
 
+VIR_DEBUG("Setting emulator tuning/settings");
+if (qemuProcessSetupEmulator(vm) < 0)
+goto cleanup;
+
 VIR_DEBUG("Setting domain security labels");
 if (virSecurityManagerSetAllLabel(driver->securityManager,
   vm->def,
@@ -5129,10 +5133,6 @@ qemuProcessLaunch(virConnectPtr conn,
 if (rv == -1) /* The VM failed to start */
 goto cleanup;
 
-VIR_DEBUG("Setting emulator tuning/settings");
-if (qemuProcessSetupEmulator(vm) < 0)
-goto cleanup;
-
 VIR_DEBUG("Waiting for monitor to show up");
 if (qemuProcessWaitForMonitor(driver, vm, asyncJob, priv->qemuCaps, logCtxt) < 0)
 goto cleanup;
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 5/9] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-02-26 Thread Henning Schild
On Thu, 25 Feb 2016 17:53:30 -0500
John Ferlan <jfer...@redhat.com> wrote:

> On 02/23/2016 10:58 AM, Henning Schild wrote:
> > Move qemuSetupCgroupForEmulator up under qemuSetupCgroup. That way
> > we move the one main thread right into the emulator cgroup, instead
> > of moving multiple threads later on. And we do not actually want any
> > threads running in the parent cgroups (cpu cpuacct cpuset).
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_process.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >   
> 
> This is where things are going to get messy.  Peter Krempa posted a
> series after yours:
> 
> http://www.redhat.com/archives/libvir-list/2016-February/msg01211.html

Where can I get a raw copy of this to "git am"? Is there something like
patchwork for people who are subscribed write-only or not at all?

> which conflicts with this and the followup patch. Hopefully between
> you, Peter, and Dan something can be worked out.
> 
> Also, it seems starting at patch 7 there's more conflicts with the top
> of the upstream, so I couldn't 'git am -3' them into a local branch.
> 
> John
> 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 05cbda2..65f718c 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -4895,6 +4895,10 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (qemuSetupCgroup(driver, vm, nnicindexes, nicindexes) < 0)
> >  goto cleanup;
> >  
> > +VIR_DEBUG("Setting cgroup for emulator (if required)");
> > +if (qemuSetupCgroupForEmulator(vm) < 0)
> > +goto cleanup;
> > +
> >  /* This must be done after cgroup placement to avoid resetting
> > CPU
> >   * affinity */
> >  if (!vm->def->cputune.emulatorpin &&
> > @@ -4943,10 +4947,6 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (rv == -1) /* The VM failed to start */
> >  goto cleanup;
> >  
> > -VIR_DEBUG("Setting cgroup for emulator (if required)");
> > -if (qemuSetupCgroupForEmulator(vm) < 0)
> > -goto cleanup;
> > -
> >  VIR_DEBUG("Setting affinity of emulator threads");
> >  if (qemuProcessSetEmulatorAffinity(vm) < 0)
> >  goto cleanup;
> >   



Re: [libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 13:26:36 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Feb 26, 2016 at 02:17:35PM +0100, Henning Schild wrote:
> > On Fri, 26 Feb 2016 13:00:04 +
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> >   
> > > On Fri, Feb 26, 2016 at 01:43:05PM +0100, Henning Schild wrote:
> > > IIUC, the original problem you wanted to address was that vCPU
> > > pids could run the wrong CPU for a short time. ie The original
> > > code did
> > > 
> > >   1. Libvirtd forks child pid
> > >  ... this pid runs on whatever pCPUs that libvirt is
> > >  permitted to use...
> > >   2. Libvirtd creates root cgroup for VM
> > >  ... this pid runs on whatever pCPUs the root cgroup inherited...
> > >   3. Child pid execs QEMU
> > >  ... QEMU pid runs on whatever pCPUs the root cgroup inherited...
> > >   4. QEMU creates vCPU pids
> > >  ... vCPU pids run on whatever pCPUs the root cgroup inherited...
> > >   5. Libvirtd moves emulator PIDs and vCPU PIDs
> > >  ... emulator PIDs are running on assigned pCPUs for emulator...
> > >  ... vCPU PIDs are running on assigned pCPUs for vCPUs
> > > With the important change in patch 5 this now looks like
> > > 
> > >   1. Libvirtd forks child pid
> > >  ... this pid runs on whatever pCPUs that libvirt is
> > >  permitted to use...
> > >   2. Libvirtd creates root cgroup for VM  
> >   
> > >  ... this pid runs on whatever pCPUs the root cgroup inherited...  
> > 
> > I am trying to come up with a solution that eliminates the above
> > line from the whole bringup. I.e never allow a pid belonging to the
> > VM outside the pinnings of libvirtd and the VM configuration.  
> 
> That's impossible because you can't stop systemd placing the pid leader
> 
> > But until step 4 it should probably be
> > "... this pid *just sits* on whatever pCPUs the root cgroup
> > inherited..."
> > If we are sure that it does not "run" before 4. patch 5 does the
> > trick already  
> 
> Yes the pid *runs* - it has to run in order to do the setup tasks
> before exec'ing QEMU. Indeed even invoking 'execve()' syscall
> requires that it run.

I am saying it should not run while in the parent cgroup, i.e.
between steps 2 and 3. If we cannot stop the pid from getting into the
parent cgroup, we have to rely on it not causing disturbance by
"running". Otherwise the whole series is not a solution to the
disturbance problem, it is just a mitigation.
That being said, I think it is still good enough and we should stop
that discussion here. I will send a v2 series.

> > >   3. Libvirtd moves pid into emulator group
> > >  ... this pid runs on assigned pCPUs for emulator...

From now on it can run all it wants, because it is in the correct cpuset.

> > >   4. Child pid execs QEMU
> > >  ... QEMU pid runs on assigned pCPUs for emulator...
> > >   5. QEMU creates vCPU pids
> > >  ... vCPU pids are running on assigned pCPUS for emulator...
> > >   6. Libvirtd moves vCPU PIDs
> > >  ... emulator PIDs are running on assigned pCPUs for emulator...
> > >  ... vCPU PIDs are running on assigned pCPUs for vCPUs
> > > 
> > > Which is good, because vCPU pids don't ever run on un-restricted
> > > pCPUs.
> > > 
> > > 
> > > Your patch 4 here is attempting to change step 2 only so that it
> > > looks like
> > > 
> > > 
> > >   1. Libvirtd forks child pid
> > >  ... this pid runs on whatever pCPUs that libvirt is
> > >  permitted to use...
> > >   2. Libvirtd creates root cgroup for VM
> > >  ... this pid runs on whatever pCPUs that libvirt is
> > >  permitted to use... or, depending on what controllers systemd added,
> > >  ... this pid runs on whatever pCPUs the root cgroup inherited...
> > >   3. Libvirtd adds pid into emulator group
> > >  ... this pid runs on assigned pCPUs for emulator...
> > >   4. Child pid execs QEMU
> > >  ... QEMU pid runs on assigned pCPUs for emulator...
> > >   5. QEMU creates vCPU pids
> > >  ... vCPU pids are running on assigned pCPUs for emulator...
> > >   6. Libvirtd moves vCPU PIDs
> > >  ... emulator PIDs are running on assigned pCPUs for emulator...
> > >  ... vCPU PIDs are running on assigned pCPUs for vCPUs
> > > 
> > > At the time we exec QEMU in step 4 the situation is exactly the
> > > same as before. The vCPU pids are still created in the right place
> > > straight away.
> > > 
> > > So this patch 4 doesn't achieve anything useful
> > > 
> > > Regards,
> > > Daniel  
> >   
> 
> Regards,
> Daniel



Re: [libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 13:00:04 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Feb 26, 2016 at 01:43:05PM +0100, Henning Schild wrote:
> > On Fri, 26 Feb 2016 12:21:02 +
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> >   
> > > On Fri, Feb 26, 2016 at 01:16:15PM +0100, Henning Schild wrote:  
> > > > On Fri, 26 Feb 2016 11:13:13 +
> > > > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> > > > 
> > > > > On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild
> > > > > wrote:
> > > > > > virCgroupNewMachine used to add the pidleader to the newly
> > > > > > created machine cgroup. Do not do this implicitly anymore.
> > > > > > 
> > > > > > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > > > > > ---
> > > > > >  src/lxc/lxc_cgroup.c   | 11 +++
> > > > > >  src/qemu/qemu_cgroup.c | 11 +++
> > > > > >  src/util/vircgroup.c   | 22 --
> > > > > >  3 files changed, 22 insertions(+), 22 deletions(-)  
> > > > > 
> > > > > NACK to this patch once again.
> > > > > 
> > > > > This does not actually work as you think it does.
> > > > > 
> > > > > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > > > > > index 11f33ab..aef8e8c 100644
> > > > > > --- a/src/util/vircgroup.c
> > > > > > +++ b/src/util/vircgroup.c
> > > > > > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char
> > > > > > *name, }
> > > > > >  }
> > > > > >  
> > > > > > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > > > > > -virErrorPtr saved = virSaveLastError();
> > > > > > -virCgroupRemove(*group);
> > > > > > -virCgroupFree(group);
> > > > > > -if (saved) {
> > > > > > -virSetError(saved);
> > > > > > -virFreeError(saved);
> > > > > > -}
> > > > > > -}  
> > > > > 
> > > > > Just above this we called virSystemdCreateMachine.  Systemd
> > > > > will create the cgroup and add the pidleader to those
> > > > > cgroups. Systemd may add the pidleader to just the 'systemd'
> > > > > controller, or it may add the pidleader to *ALL* controllers.
> > > > > We have no way of knowing.
> > > > > 
> > > > > This virCgroupAddTask call deals with whatever systemd chose
> > > > > not todo, so we can guarantee consistent behaviour with the
> > > > > pidleader in all cgroups.
> > > > > 
> > > > > By removing this you make this method non-deterministic - the
> > > > > pid may or may not be in the cpu controller now. THis is bad
> > > > > because it can lead to QEMU/LXC driver code working in some
> > > > > cases but failing in other cases.
> > > > > 
> > > > > Furthermore, this existing does not cause any problems for the
> > > > > scenario you care about. THis cgroup placement is being set
> > > > > in between the time libvirtd calls fork() and exec(). With
> > > > > your later patch 5, we ensure that the PID is moved across
> > > > > into the emulator cgroup, before we call exec(). When we call
> > > > > exec all memory mappings will be replaced, so QEMU will stil
> > > > > start with the correct vCPU placement and memory allocation
> > > > > placement.
> > > > 
> > > > I agree having the task in the wrong cgroup before the exec()
> > > > seems harmless. But i am not sure all the fiddling with cgroups
> > > > is indeed harmless and does not cause i.e. kernel work on cores
> > > > that should be left alone. I have the feeling allowing the task
> > > > in the parent cgroup is a bad idea, no matter how short the
> > > > window seems to be.
> > > > 
> > > > Right now the parent cgroup contains all cpus found in
> > > > machine.slice, which for pinned VMs is too much. How about we
> > > > calculate the size of the child cgroups before and make the
> > > > parent the union of them. Or

Re: [libvirt] [PATCH 5/9] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-02-26 Thread Henning Schild
On Thu, 25 Feb 2016 17:53:30 -0500
John Ferlan <jfer...@redhat.com> wrote:

> On 02/23/2016 10:58 AM, Henning Schild wrote:
> > Move qemuSetupCgroupForEmulator up under qemuSetupCgroup. That way
> > we move the one main thread right into the emulator cgroup, instead
> > of moving multiple threads later on. And we do not actually want any
> > threads running in the parent cgroups (cpu cpuacct cpuset).
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_process.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >   
> 
> This is where things are going to get messy.  Peter Krempa posted a
> series after yours:
> 
> http://www.redhat.com/archives/libvir-list/2016-February/msg01211.html
> 
> which conflicts with this and the followup patch. Hopefully between
> you, Peter, and Dan something can be worked out.

Peter's patch 1 seems like a semantic noop. It just merges cgroup
creation and affinity setting. I can probably rebase on top of that
without a problem.

> Also, it seems starting at patch 7 there's more conflicts with the top
> of the upstream, so I couldn't 'git am -3' them into a local branch.
> 
> John
> 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 05cbda2..65f718c 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -4895,6 +4895,10 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (qemuSetupCgroup(driver, vm, nnicindexes, nicindexes) < 0)
> >  goto cleanup;
> >  
> > +VIR_DEBUG("Setting cgroup for emulator (if required)");
> > +if (qemuSetupCgroupForEmulator(vm) < 0)
> > +goto cleanup;
> > +
> >  /* This must be done after cgroup placement to avoid resetting
> > CPU
> >   * affinity */
> >  if (!vm->def->cputune.emulatorpin &&
> > @@ -4943,10 +4947,6 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (rv == -1) /* The VM failed to start */
> >  goto cleanup;
> >  
> > -VIR_DEBUG("Setting cgroup for emulator (if required)");
> > -if (qemuSetupCgroupForEmulator(vm) < 0)
> > -goto cleanup;
> > -
> >  VIR_DEBUG("Setting affinity of emulator threads");
> >  if (qemuProcessSetEmulatorAffinity(vm) < 0)
> >  goto cleanup;
> >   



Re: [libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 12:21:02 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Feb 26, 2016 at 01:16:15PM +0100, Henning Schild wrote:
> > On Fri, 26 Feb 2016 11:13:13 +
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> >   
> > > On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild wrote:  
> > > > virCgroupNewMachine used to add the pidleader to the newly
> > > > created machine cgroup. Do not do this implicitly anymore.
> > > > 
> > > > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > > > ---
> > > >  src/lxc/lxc_cgroup.c   | 11 +++
> > > >  src/qemu/qemu_cgroup.c | 11 +++
> > > >  src/util/vircgroup.c   | 22 --
> > > >  3 files changed, 22 insertions(+), 22 deletions(-)
> > > 
> > > NACK to this patch once again.
> > > 
> > > This does not actually work as you think it does.
> > >   
> > > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > > > index 11f33ab..aef8e8c 100644
> > > > --- a/src/util/vircgroup.c
> > > > +++ b/src/util/vircgroup.c
> > > > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char
> > > > *name, }
> > > >  }
> > > >  
> > > > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > > > -virErrorPtr saved = virSaveLastError();
> > > > -virCgroupRemove(*group);
> > > > -virCgroupFree(group);
> > > > -if (saved) {
> > > > -virSetError(saved);
> > > > -virFreeError(saved);
> > > > -}
> > > > -}
> > > 
> > > Just above this we called virSystemdCreateMachine.  Systemd will
> > > create the cgroup and add the pidleader to those cgroups. Systemd
> > > may add the pidleader to just the 'systemd' controller, or it may
> > > add the pidleader to *ALL* controllers. We have no way of knowing.
> > > 
> > > This virCgroupAddTask call deals with whatever systemd chose not
> > > todo, so we can guarantee consistent behaviour with the pidleader
> > > in all cgroups.
> > > 
> > > By removing this you make this method non-deterministic - the pid
> > > may or may not be in the cpu controller now. THis is bad because
> > > it can lead to QEMU/LXC driver code working in some cases but
> > > failing in other cases.
> > > 
> > > Furthermore, this existing does not cause any problems for the
> > > scenario you care about. THis cgroup placement is being set
> > > in between the time libvirtd calls fork() and exec(). With your
> > > later patch 5, we ensure that the PID is moved across into the
> > > emulator cgroup, before we call exec(). When we call exec all
> > > memory mappings will be replaced, so QEMU will still start with
> > > the correct vCPU placement and memory allocation placement.  
> > 
> > I agree having the task in the wrong cgroup before the exec() seems
> > harmless. But i am not sure all the fiddling with cgroups is indeed
> > harmless and does not cause i.e. kernel work on cores that should be
> > left alone. I have the feeling allowing the task in the parent
> > cgroup is a bad idea, no matter how short the window seems to be.
> > 
> > Right now the parent cgroup contains all cpus found in
> > machine.slice, which for pinned VMs is too much. How about we
> > calculate the size of the child cgroups before and make the parent
> > the union of them. Or give the parent the emulator pinning and
> > extend it for the vcpus later. But that might turn out pretty
> > complicated as well, getting the order right with the mix of
> > cpusets and sched_setaffinity(). 
> > > Just just drop this patch please.  
> 
> The point is though that we have *no* choice. Systemd can put the task
> in the cpu controller and we've no way to prevent that. So the code
> *has* to be able to cope with that happening. Therefore this patch is
> wrong; it just makes behaviour non-deterministic, increasing the
> chances that we don't correctly handle the case where systemd adds
> the task to the cpu controllers.

Understood! I was suggesting a "growing on demand" policy instead of
"shrinking after inheriting all".

If we can not control what systemd does we have to give it harmless
cpus to mess around with. That assumes we can control the size of the
cpuset before systemd puts anything in. Once back in control we grow
the parent group before deriving more child groups.

Would that be possible?

I have no objections to keeping the shrinking approach, especially
since the controlled growing is harder to implement in the given
codebase. It just feels like it should be the other way around.

> Regards,
> Daniel



Re: [libvirt] [PATCH 9/9] qemu_cgroup: assert threading cgroup layout for machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 12:02:52 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Fri, Feb 26, 2016 at 12:57:38PM +0100, Henning Schild wrote:
> > On Fri, 26 Feb 2016 11:22:07 +
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> >   
> > > On Tue, Feb 23, 2016 at 04:58:44PM +0100, Henning Schild wrote:  
> > > > Make sure the thread related controls of the machine cgroup
> > > > never get any tasks assigned.
> > > > 
> > > > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > > > ---
> > > >  src/qemu/qemu_cgroup.c | 14 --
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > 
> > > NACK This also won't work for same reason as previous patch  
> > 
> > Having that in place can still be useful after we have sorted out
> > the random result of what systemd gave us.
> > Is the general idea of such an assertion a good idea, and should i
> > adopt it according to comments?
> > At the moment i just used the assertion mask in the only code-path
> > that adds tasks within libvirt. If we have to deal with
> > manipulation from the outside, it might be a good idea to introduce
> > more assertions based on the mask.  
> 
> Without patch 4 though, there's nowhere you can put this afaict.

After moving the pid to the emulator cgroup, I can assert that the
parent is now empty and then put the assertion in place. It would
ensure libvirt itself does not start using the parent group at some
point in the future.

> 
> 
> Regards,
> Daniel



Re: [libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 11:13:13 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Tue, Feb 23, 2016 at 04:58:39PM +0100, Henning Schild wrote:
> > virCgroupNewMachine used to add the pidleader to the newly created
> > machine cgroup. Do not do this implicitly anymore.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/lxc/lxc_cgroup.c   | 11 +++
> >  src/qemu/qemu_cgroup.c | 11 +++
> >  src/util/vircgroup.c   | 22 --
> >  3 files changed, 22 insertions(+), 22 deletions(-)  
> 
> NACK to this patch once again.
> 
> This does not actually work as you think it does.
> 
> > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > index 11f33ab..aef8e8c 100644
> > --- a/src/util/vircgroup.c
> > +++ b/src/util/vircgroup.c
> > @@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char *name,
> >  }
> >  }
> >  
> > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > -virErrorPtr saved = virSaveLastError();
> > -virCgroupRemove(*group);
> > -virCgroupFree(group);
> > -if (saved) {
> > -virSetError(saved);
> > -virFreeError(saved);
> > -}
> > -}  
> 
> Just above this we called virSystemdCreateMachine.  Systemd will
> create the cgroup and add the pidleader to those cgroups. Systemd
> may add the pidleader to just the 'systemd' controller, or it may
> add the pidleader to *ALL* controllers. We have no way of knowing.
> 
> This virCgroupAddTask call deals with whatever systemd chose not
> to do, so we can guarantee consistent behaviour with the pidleader
> in all cgroups.
> 
> By removing this you make this method non-deterministic - the pid
> may or may not be in the cpu controller now. This is bad because
> it can lead to QEMU/LXC driver code working in some cases but
> failing in other cases.
> 
> Furthermore, this existing code does not cause any problems for the
> scenario you care about. This cgroup placement is being set
> in between the time libvirtd calls fork() and exec(). With your
> later patch 5, we ensure that the PID is moved across into the
> emulator cgroup, before we call exec(). When we call exec all
> memory mappings will be replaced, so QEMU will still start with
> the correct vCPU placement and memory allocation placement.

I agree having the task in the wrong cgroup before the exec() seems
harmless. But I am not sure all the fiddling with cgroups is indeed
harmless and does not cause e.g. kernel work on cores that should be
left alone. I have the feeling allowing the task in the parent cgroup
is a bad idea, no matter how short the window seems to be.

Right now the parent cgroup contains all cpus found in machine.slice,
which for pinned VMs is too much. How about we calculate the size of the
child cgroups beforehand and make the parent the union of them? Or give
the parent the emulator pinning and extend it for the vcpus later.
But that might turn out pretty complicated as well, getting the order
right with the mix of cpusets and sched_setaffinity().

> Just drop this patch, please.
> 
> Regards,
> Daniel



Re: [libvirt] [PATCH 9/9] qemu_cgroup: assert threading cgroup layout for machine cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 11:22:07 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Tue, Feb 23, 2016 at 04:58:44PM +0100, Henning Schild wrote:
> > Make sure the thread related controls of the machine cgroup never
> > get any tasks assigned.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_cgroup.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)  
> 
> NACK This also won't work for same reason as previous patch

Having that in place can still be useful after we have sorted out the
random result of what systemd gave us.
Is such an assertion a good idea in general, and should I adopt it
according to the comments?
At the moment I just used the assertion mask in the only code-path that
adds tasks within libvirt. If we have to deal with manipulation from
the outside, it might be a good idea to introduce more assertions based
on the mask.

Henning



Re: [libvirt] [PATCH 5/9] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-02-26 Thread Henning Schild
On Fri, 26 Feb 2016 11:14:31 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Tue, Feb 23, 2016 at 04:58:40PM +0100, Henning Schild wrote:
> > Move qemuSetupCgroupForEmulator up under qemuSetupCgroup. That way
> > we move the one main thread right into the emulator cgroup, instead
> > of moving multiple threads later on. And we do not actually want any
> > threads running in the parent cgroups (cpu cpuacct cpuset).
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_process.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)  
> 
> ACK, this is the key part of the fix.  With the old code the QEMU
> pids are only moved /after/ exec(), with this change, the pids are
> moved /before/ exec(), fixing the core problem of threads runing
> int the wrong place between 'exec()' and libvirt querying vCPUs.

If the asserts won't work and we have to live with the task being in the
parent cgroup between fork() and exec(), we need to make sure the new
process is truly inactive. We need to make sure we are not just making
the window smaller.

> 
> > 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 05cbda2..65f718c 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -4895,6 +4895,10 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (qemuSetupCgroup(driver, vm, nnicindexes, nicindexes) < 0)
> >  goto cleanup;
> >  
> > +VIR_DEBUG("Setting cgroup for emulator (if required)");
> > +if (qemuSetupCgroupForEmulator(vm) < 0)
> > +goto cleanup;
> > +
> >  /* This must be done after cgroup placement to avoid resetting
> > CPU
> >   * affinity */
> >  if (!vm->def->cputune.emulatorpin &&
> > @@ -4943,10 +4947,6 @@ qemuProcessLaunch(virConnectPtr conn,
> >  if (rv == -1) /* The VM failed to start */
> >  goto cleanup;
> >  
> > -VIR_DEBUG("Setting cgroup for emulator (if required)");
> > -if (qemuSetupCgroupForEmulator(vm) < 0)
> > -goto cleanup;
> > -
> >  VIR_DEBUG("Setting affinity of emulator threads");
> >  if (qemuProcessSetEmulatorAffinity(vm) < 0)
> >  goto cleanup;
> > -- 
> > 2.4.10
> >   
> 
> Regards,
> Daniel



Re: [libvirt] [PATCH 6/9] qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask

2016-02-26 Thread Henning Schild
Is it OK to drop unused code in the same patch that makes the code
obsolete, or should I split that up?

On Tue, 23 Feb 2016 16:58:41 +0100
Henning Schild <henning.sch...@siemens.com> wrote:

> qemuSetupCgroupForEmulator runs at a point in time where there is only
> the qemu main thread. Use virCgroupAddTask to put just that one task
> into the emulator cgroup. That patch makes virCgroupMoveTask and
> virCgroupAddTaskStrController obsolete.
> 
> Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> ---
>  src/libvirt_private.syms |   1 -
>  src/qemu/qemu_cgroup.c   |   4 +-
>  src/util/vircgroup.c     | 102 ----
>  src/util/vircgroup.h     |   3 ---
>  4 files changed, 2 insertions(+), 108 deletions(-)
> 
> diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> index cf93d06..c5e57bf 100644
> --- a/src/libvirt_private.syms
> +++ b/src/libvirt_private.syms
> @@ -1234,7 +1234,6 @@ virCgroupIsolateMount;
>  virCgroupKill;
>  virCgroupKillPainfully;
>  virCgroupKillRecursive;
> -virCgroupMoveTask;
>  virCgroupNewDetect;
>  virCgroupNewDetectMachine;
>  virCgroupNewDomainPartition;
> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
> index 66dc782..d410a66 100644
> --- a/src/qemu/qemu_cgroup.c
> +++ b/src/qemu/qemu_cgroup.c
> @@ -1145,8 +1145,8 @@ qemuSetupCgroupForEmulator(virDomainObjPtr vm)
> true, &cgroup_emulator) < 0)
>  goto cleanup;
>  
> -if (virCgroupMoveTask(priv->cgroup, cgroup_emulator) < 0)
> -goto cleanup;
> +if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
> +   goto cleanup;
>  
>  if (def->cputune.emulatorpin)
>  cpumask = def->cputune.emulatorpin;
> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> index aef8e8c..c31c83b 100644
> --- a/src/util/vircgroup.c
> +++ b/src/util/vircgroup.c
> @@ -1209,50 +1209,6 @@ virCgroupAddTaskController(virCgroupPtr group,
> pid_t pid, int controller) }
>  
>  
> -static int
> -virCgroupAddTaskStrController(virCgroupPtr group,
> -  const char *pidstr,
> -  int controller)
> -{
> -char *str = NULL, *cur = NULL, *next = NULL;
> -unsigned long long p = 0;
> -int rc = 0;
> -char *endp;
> -
> -if (VIR_STRDUP(str, pidstr) < 0)
> -return -1;
> -
> -cur = str;
> -while (*cur != '\0') {
> -if (virStrToLong_ull(cur, &endp, 10, &p) < 0) {
> -virReportError(VIR_ERR_INTERNAL_ERROR,
> -   _("Cannot parse '%s' as an integer"),
> cur);
> -goto cleanup;
> -}
> -
> -if (virCgroupAddTaskController(group, p, controller) < 0) {
> -/* A thread that exits between when we first read the
> source
> - * tasks and now is not fatal.  */
> -if (virLastErrorIsSystemErrno(ESRCH))
> -virResetLastError();
> -else
> -goto cleanup;
> -}
> -
> -next = strchr(cur, '\n');
> -if (next) {
> -cur = next + 1;
> -*next = '\0';
> -} else {
> -break;
> -}
> -}
> -
> - cleanup:
> -VIR_FREE(str);
> -return rc;
> -}
> -
>  void
>  virCgroupSetAssertEmpty(virCgroupPtr group, int mask) {
>  group->assert_empty = mask;
> @@ -1264,54 +1220,6 @@ virCgroupGetAssertEmpty(virCgroupPtr group) {
>  }
>  
>  
> -/**
> - * virCgroupMoveTask:
> - *
> - * @src_group: The source cgroup where all tasks are removed from
> - * @dest_group: The destination where all tasks are added to
> - *
> - * Returns: 0 on success or -1 on failure
> - */
> -int
> -virCgroupMoveTask(virCgroupPtr src_group, virCgroupPtr dest_group)
> -{
> -int ret = -1;
> -char *content = NULL;
> -size_t i;
> -
> -for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
> -if (!src_group->controllers[i].mountPoint ||
> -!dest_group->controllers[i].mountPoint)
> -continue;
> -
> -/* We must never move tasks in systemd's hierarchy */
> -if (i == VIR_CGROUP_CONTROLLER_SYSTEMD)
> -continue;
> -
> -/* New threads are created in the same group as their parent;
> - * but if a thread is created after we first read we aren't
> - * aware that it needs to move.  Therefore, we must iterate
> - * until content is empty.  */
> -while (1) {
> -VIR_FREE(content);
> -if (virCgroupGetValueStr(src_group, i, "

Re: [libvirt] [PATCH 0/9] fix thread related controllers in cgroups

2016-02-26 Thread Henning Schild
John,

thanks for the review so far. Since I will have to rebase and check for
conflicts with other queued patches, I would like to also get a review
of the semantics of the overall series.
Or do I first have to get the changes past a CI test suite?

Henning

On Tue, 23 Feb 2016 16:58:33 +0100
Henning Schild <henning.sch...@siemens.com> wrote:

> This series picks up the cgroups work I started earlier. My initial
> patches got in and were later reverted before 1.3.1.
> 
> The problem the series is solving is about qemu-threads becoming
> runnable on pcpus outside the pinning masks configured for the
> machine. That only happens for a short time before the thread is
> moved to its final cpuset. But it can disturb other load on the
> system or can lead to qemu never starting (the qemu main thread
> ends up on a pcpu with a busy high-priority RT task).
> 
> The problem in the original series was the lack of understanding 
> that one virCgroup can cover all controllers. Instead of just touching
> cpusets the patches had side effects on all the other controllers
> (memory, blkio, etc.). Again, the general idea is to put all threads
> right into the correct cgroups and to not move them around. But this
> series touches only the cpu, cpuset, and cpuacct controllers. Those
> are the ones relevant to threads, and the controllers that the
> threading sub-groups have mounted.
> 
> Patches 1, 2, and 9 deal with asserting correct behaviour. They are
> optional. But given the complexity of the "bringup" and the importance
> of getting that right, I think they should go in as well!
> 
> The tricky bits are in patches 5 and 8; I kept them as simple as
> possible.
> 
> The series is based on v1.3.1.
> 
> Henning Schild (9):
>   vircgroup: one central point for adding tasks to cgroups
>   vircgroup: add assertion to allow cgroup controllers to stay empty
>   vircgroup: introduce controller mask for threads
>   util: cgroups do not implicitly add task to new machine cgroup
>   qemu_cgroup: put qemu right into emulator sub-cgroup
>   qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask
>   vircgroup: add controller mask to virCgroupAddTask
>   qemu_cgroup: don't put qemu main thread into wrong cgroup
>   qemu_cgroup: assert threading cgroup layout for machine cgroup
> 
>  src/libvirt_private.syms |   3 +-
>  src/lxc/lxc_cgroup.c |  11 
>  src/lxc/lxc_controller.c |   4 +-
>  src/qemu/qemu_cgroup.c   |  30 +++--
>  src/qemu/qemu_driver.c   |   2 +-
>  src/qemu/qemu_process.c  |   8 +--
>  src/util/vircgroup.c | 155 ---
>  src/util/vircgroup.h |  13 +++-
>  src/util/vircgrouppriv.h |   1 +
>  9 files changed, 81 insertions(+), 146 deletions(-)
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2/9] vircgroup: add assertion to allow cgroup controllers to stay empty

2016-02-26 Thread Henning Schild
OK, I will reorder, fix style and docs, etc.

On Thu, 25 Feb 2016 17:52:55 -0500
John Ferlan <jfer...@redhat.com> wrote:

> On 02/23/2016 10:58 AM, Henning Schild wrote:
> > When using a hierarchy of cgroups we might want to add tasks just to
> > the children cgroups but never to the parent. To make sure we do not
> > use a parent cgroup by accident add a mechanism that lets us assert
> > a correct implementation in cases we want such a hierarchy.
> > 
> > i.e. for qemu cpusets we want all tasks in /vcpuX or /emulator, not
> > in /.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/libvirt_private.syms |  2 ++
> >  src/util/vircgroup.c | 19 +++
> >  src/util/vircgroup.h |  3 +++
> >  src/util/vircgrouppriv.h |  1 +
> >  4 files changed, 25 insertions(+)
> >   
> 
> These aren't used until patch 9 - I think this should be closer to
> that patch...  That is introduce it just before you use it..
> 
> > diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> > index 83f6e2c..cf93d06 100644
> > --- a/src/libvirt_private.syms
> > +++ b/src/libvirt_private.syms
> > @@ -1201,6 +1201,7 @@ virCgroupDenyDeviceMajor;
> >  virCgroupDenyDevicePath;
> >  virCgroupDetectMountsFromFile;
> >  virCgroupFree;
> > +virCgroupGetAssertEmpty;
> >  virCgroupGetBlkioDeviceReadBps;
> >  virCgroupGetBlkioDeviceReadIops;
> >  virCgroupGetBlkioDeviceWeight;
> > @@ -1245,6 +1246,7 @@ virCgroupNewThread;
> >  virCgroupPathOfController;
> >  virCgroupRemove;
> >  virCgroupRemoveRecursively;
> > +virCgroupSetAssertEmpty;
> >  virCgroupSetBlkioDeviceReadBps;
> >  virCgroupSetBlkioDeviceReadIops;
> >  virCgroupSetBlkioDeviceWeight;
> > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > index 0b65238..ad46dfc 100644
> > --- a/src/util/vircgroup.c
> > +++ b/src/util/vircgroup.c
> > @@ -1197,6 +1197,15 @@ virCgroupAddTaskController(virCgroupPtr
> > group, pid_t pid, int controller) return -1;
> >  }
> >  
> > +if(group->assert_empty & (1 << controller)) {  
> 
> should be :
> 
> if (group...
> 
> > +virReportError(VIR_ERR_INTERNAL_ERROR,
> > +   _("Controller '%s' is not supposed to
> > contain any"
> > + " tasks. group=%s pid=%d\n"),
> > +   virCgroupControllerTypeToString(controller),
> > +   group->path, pid);
> > +return -1;
> > +}
> > +
> >  return virCgroupSetValueU64(group, controller, "tasks",
> >  (unsigned long long)pid);
> >  }
> > @@ -1246,6 +1255,16 @@ virCgroupAddTaskStrController(virCgroupPtr group,
> >  return rc;
> >  }
> >
> 
> Need to have some code comments regarding input and what these do...
> 
> > +void
> > +virCgroupSetAssertEmpty(virCgroupPtr group, int mask) {
> > +group->assert_empty = mask;
> > +}
> > +
> > +int
> > +virCgroupGetAssertEmpty(virCgroupPtr group) {
> > +return group->assert_empty;
> > +}
> > +
> >
> 
> You'll need to add the corresponding API in the "#else /*
> !VIR_CGROUP_SUPPORTED */" area... Search on
> virCgroupAddTaskController - you'll find a second entry in the module
> which reports a system error. That's what you'll need to add for these
> 
> John
> >  /**
> >   * virCgroupMoveTask:
> > diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
> > index 63a9e1c..f244c24 100644
> > --- a/src/util/vircgroup.h
> > +++ b/src/util/vircgroup.h
> > @@ -131,6 +131,9 @@ int virCgroupAddTaskController(virCgroupPtr
> > group, pid_t pid,
> > int controller);
> >  
> > +void virCgroupSetAssertEmpty(virCgroupPtr group, int mask);
> > +int virCgroupGetAssertEmpty(virCgroupPtr group);
> > +
> >  int virCgroupMoveTask(virCgroupPtr src_group,
> >virCgroupPtr dest_group);
> >  
> > diff --git a/src/util/vircgrouppriv.h b/src/util/vircgrouppriv.h
> > index 722863e..944d6ae 100644
> > --- a/src/util/vircgrouppriv.h
> > +++ b/src/util/vircgrouppriv.h
> > @@ -44,6 +44,7 @@ struct virCgroupController {
> >  
> >  struct virCgroup {
> >  char *path;
> > +int assert_empty;
> >  
> >  struct virCgroupController
> > controllers[VIR_CGROUP_CONTROLLER_LAST]; };
> >   

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/9] vircgroup: introduce controller mask for threads

2016-02-26 Thread Henning Schild
On Thu, 25 Feb 2016 17:53:07 -0500
John Ferlan <jfer...@redhat.com> wrote:

> On 02/23/2016 10:58 AM, Henning Schild wrote:
> > When using a cgroups hierarchy threads have child cgroups for
> > certain controllers. Introduce an enum for later reuse.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/util/vircgroup.c | 12 +++-
> >  src/util/vircgroup.h |  7 +++
> >  2 files changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > index ad46dfc..11f33ab 100644
> > --- a/src/util/vircgroup.c
> > +++ b/src/util/vircgroup.c
> > @@ -277,9 +277,7 @@ virCgroupValidateMachineGroup(virCgroupPtr group,
> >  goto cleanup;
> >  
> >  if (stripEmulatorSuffix &&
> > -(i == VIR_CGROUP_CONTROLLER_CPU ||
> > - i == VIR_CGROUP_CONTROLLER_CPUACCT ||
> > - i == VIR_CGROUP_CONTROLLER_CPUSET)) {
> > +(i & VIR_CGROUP_THREAD_CONTROLLER_MASK)) {  
> 
> Not sure this works as expected because 'i' is not a mask - it's just
> an int... On entry, VIR_CGROUP_THREAD_CONTROLLER_MASK is 7...
> 
> The loop goes from 0 to VIR_CGROUP_CONTROLLER_LAST (11).
> 
> If you logically go through the values of 'i':
> 
> 0 & 7
> 1 & 7
> 2 & 7
> 3 & 7
> 4 & 7
> ...
> etc
> 
> You'd find 0 & 8 fail, but 1 -> 7, 9, & 10 succeed
> 
> So what "would" work is :
> 
> mask = (1 << i);
> if (stripEmulatorSuffix &&
> mask & VIR_CGROUP_THREAD_CONTROLLER_MASK))

Sure, stupid mistake, will fix.

> John
> 
> >  if (STREQ(tmp, "/emulator"))
> >  *tmp = '\0';
> >  tmp = strrchr(group->controllers[i].placement, '/');
> > @@ -1518,7 +1516,6 @@ virCgroupNewThread(virCgroupPtr domain,
> >  {
> >  int ret = -1;
> >  char *name = NULL;
> > -int controllers;
> >  
> >  switch (nameval) {
> >  case VIR_CGROUP_THREAD_VCPU:
> > @@ -1539,11 +1536,8 @@ virCgroupNewThread(virCgroupPtr domain,
> >  goto cleanup;
> >  }
> >  
> > -controllers = ((1 << VIR_CGROUP_CONTROLLER_CPU) |
> > -   (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
> > -   (1 << VIR_CGROUP_CONTROLLER_CPUSET));
> > -
> > -if (virCgroupNew(-1, name, domain, controllers, group) < 0)
> > +if (virCgroupNew(-1, name, domain,
> > VIR_CGROUP_THREAD_CONTROLLER_MASK,
> > +group) < 0)
> >  goto cleanup;
> >  
> >  if (virCgroupMakeGroup(domain, *group, create,
> > VIR_CGROUP_NONE) < 0) { diff --git a/src/util/vircgroup.h
> > b/src/util/vircgroup.h index f244c24..f71aed5 100644
> > --- a/src/util/vircgroup.h
> > +++ b/src/util/vircgroup.h
> > @@ -52,6 +52,13 @@ VIR_ENUM_DECL(virCgroupController);
> >   * Make sure we will not overflow */
> >  verify(VIR_CGROUP_CONTROLLER_LAST < 8 * sizeof(int));
> >  
> > +enum {
> > +VIR_CGROUP_THREAD_CONTROLLER_MASK =
> > +((1 << VIR_CGROUP_CONTROLLER_CPU) |
> > + (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
> > + (1 << VIR_CGROUP_CONTROLLER_CPUSET))
> > +};
> > +
> >  typedef enum {
> >  VIR_CGROUP_THREAD_VCPU = 0,
> >  VIR_CGROUP_THREAD_EMULATOR,
> >   

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] make-kpkg: add support for "make olddefconfig"

2016-02-23 Thread Henning Schild
Sorry, that one is unrelated. Ignore it.

On Tue, 23 Feb 2016 16:58:35 +0100
Henning Schild <henning.sch...@siemens.com> wrote:

> Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> ---
>  make-kpkg | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/make-kpkg b/make-kpkg
> index 5cb8ec3..ba663c9 100755
> --- a/make-kpkg
> +++ b/make-kpkg
> @@ -662,9 +662,10 @@ sub main () {
>}
>  
>if ( $config_target
> -!~ /^(?:|silentold|old|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/
> ) {
> +!~ 
> /^(?:|silentold|old|olddef|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/
> ) { print
> -  "Config type must be one of
> {config,silentoldconfig,oldconfig,menuconfig,xconfig,\n";
> +  "Config type must be one of
> {config,silentoldconfig,oldconfig,olddefconfig,menuconfig,"
> +  . "xconfig,\n";
>  print
>"
> nconfig,gconfig,randconfig,defconfig,allmodconfigallyesconfig,allnoconfig}.\n";
> print "use --help to display command line syntax help.\n";

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add support for "make olddefconfig"

2016-02-23 Thread Henning Schild
Sorry, that one is unrelated. Ignore it.

On Tue, 23 Feb 2016 16:58:34 +0100
Henning Schild <henning.sch...@siemens.com> wrote:

> Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> ---
>  make-kpkg | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/make-kpkg b/make-kpkg
> index 5cb8ec3..ba663c9 100755
> --- a/make-kpkg
> +++ b/make-kpkg
> @@ -662,9 +662,10 @@ sub main () {
>}
>  
>if ( $config_target
> -!~ /^(?:|silentold|old|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/
> ) {
> +!~ 
> /^(?:|silentold|old|olddef|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/
> ) { print
> -  "Config type must be one of
> {config,silentoldconfig,oldconfig,menuconfig,xconfig,\n";
> +  "Config type must be one of
> {config,silentoldconfig,oldconfig,olddefconfig,menuconfig,"
> +  . "xconfig,\n";
>  print
>"
> nconfig,gconfig,randconfig,defconfig,allmodconfigallyesconfig,allnoconfig}.\n";
> print "use --help to display command line syntax help.\n";

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 8/9] qemu_cgroup: don't put qemu main thread into wrong cgroup

2016-02-23 Thread Henning Schild
Eventually all qemu threads are supposed to be in the final cgroup
structure in one of the leaf cgroups (vcpuX, emulator). They should
never be in the main machine cgroup. That applies to all thread-related
controllers like cpuset, cpu, and cpuacct.
By excluding the thread-related controllers we do not put the
task into the machine cgroup controllers just to move it out to
the emulator subgroup later.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_cgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 41a583c..99fb5bf 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -789,7 +789,8 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
-if (virCgroupAddTask(priv->cgroup, vm->pid, -1) < 0) {
+if (virCgroupAddTask(priv->cgroup, vm->pid,
+ ~VIR_CGROUP_THREAD_CONTROLLER_MASK) < 0) {
 virErrorPtr saved = virSaveLastError();
 virCgroupRemove(priv->cgroup);
virCgroupFree(&priv->cgroup);
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 4/9] util: cgroups do not implicitly add task to new machine cgroup

2016-02-23 Thread Henning Schild
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicitly anymore.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/lxc/lxc_cgroup.c   | 11 +++
 src/qemu/qemu_cgroup.c | 11 +++
 src/util/vircgroup.c   | 22 --
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
index ad254e4..609e9ea 100644
--- a/src/lxc/lxc_cgroup.c
+++ b/src/lxc/lxc_cgroup.c
@@ -504,6 +504,17 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def,
 ) < 0)
 goto cleanup;
 
+if (virCgroupAddTask(cgroup, initpid) < 0) {
+virErrorPtr saved = virSaveLastError();
+virCgroupRemove(cgroup);
+virCgroupFree(&cgroup);
+if (saved) {
+virSetError(saved);
+virFreeError(saved);
+}
+goto cleanup;
+}
+
 /* setup control group permissions for user namespace */
 if (def->idmap.uidmap) {
 if (virCgroupSetOwner(cgroup,
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index e41f461..66dc782 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -789,6 +789,17 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
+if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
+virErrorPtr saved = virSaveLastError();
+virCgroupRemove(priv->cgroup);
+virCgroupFree(&priv->cgroup);
+if (saved) {
+virSetError(saved);
+virFreeError(saved);
+}
+goto cleanup;
+}
+
  done:
 ret = 0;
  cleanup:
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 11f33ab..aef8e8c 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1682,16 +1682,6 @@ virCgroupNewMachineSystemd(const char *name,
 }
 }
 
-if (virCgroupAddTask(*group, pidleader) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-}
-
 ret = 0;
  cleanup:
 virCgroupFree();
@@ -1714,7 +1704,6 @@ int virCgroupTerminateMachine(const char *name,
 static int
 virCgroupNewMachineManual(const char *name,
   const char *drivername,
-  pid_t pidleader,
   const char *partition,
   int controllers,
   virCgroupPtr *group)
@@ -1740,16 +1729,6 @@ virCgroupNewMachineManual(const char *name,
 group) < 0)
 goto cleanup;
 
-if (virCgroupAddTask(*group, pidleader) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-}
-
  done:
 ret = 0;
 
@@ -1796,7 +1775,6 @@ virCgroupNewMachine(const char *name,
 
 return virCgroupNewMachineManual(name,
  drivername,
- pidleader,
  partition,
  controllers,
  group);
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 6/9] qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask

2016-02-23 Thread Henning Schild
qemuSetupCgroupForEmulator runs at a point in time where there is only
the qemu main thread. Use virCgroupAddTask to put just that one task
into the emulator cgroup. That patch makes virCgroupMoveTask and
virCgroupAddTaskStrController obsolete.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/libvirt_private.syms |   1 -
 src/qemu/qemu_cgroup.c   |   4 +-
 src/util/vircgroup.c | 102 ---
 src/util/vircgroup.h |   3 --
 4 files changed, 2 insertions(+), 108 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index cf93d06..c5e57bf 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1234,7 +1234,6 @@ virCgroupIsolateMount;
 virCgroupKill;
 virCgroupKillPainfully;
 virCgroupKillRecursive;
-virCgroupMoveTask;
 virCgroupNewDetect;
 virCgroupNewDetectMachine;
 virCgroupNewDomainPartition;
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 66dc782..d410a66 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -1145,8 +1145,8 @@ qemuSetupCgroupForEmulator(virDomainObjPtr vm)
true, &cgroup_emulator) < 0)
 goto cleanup;
 
-if (virCgroupMoveTask(priv->cgroup, cgroup_emulator) < 0)
-goto cleanup;
+if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
+   goto cleanup;
 
 if (def->cputune.emulatorpin)
 cpumask = def->cputune.emulatorpin;
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index aef8e8c..c31c83b 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1209,50 +1209,6 @@ virCgroupAddTaskController(virCgroupPtr group, pid_t pid, int controller)
 }
 
 
-static int
-virCgroupAddTaskStrController(virCgroupPtr group,
-  const char *pidstr,
-  int controller)
-{
-char *str = NULL, *cur = NULL, *next = NULL;
-unsigned long long p = 0;
-int rc = 0;
-char *endp;
-
-if (VIR_STRDUP(str, pidstr) < 0)
-return -1;
-
-cur = str;
-while (*cur != '\0') {
-if (virStrToLong_ull(cur, &endp, 10, &p) < 0) {
-virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("Cannot parse '%s' as an integer"), cur);
-goto cleanup;
-}
-
-if (virCgroupAddTaskController(group, p, controller) < 0) {
-/* A thread that exits between when we first read the source
- * tasks and now is not fatal.  */
-if (virLastErrorIsSystemErrno(ESRCH))
-virResetLastError();
-else
-goto cleanup;
-}
-
-next = strchr(cur, '\n');
-if (next) {
-cur = next + 1;
-*next = '\0';
-} else {
-break;
-}
-}
-
- cleanup:
-VIR_FREE(str);
-return rc;
-}
-
 void
 virCgroupSetAssertEmpty(virCgroupPtr group, int mask) {
 group->assert_empty = mask;
@@ -1264,54 +1220,6 @@ virCgroupGetAssertEmpty(virCgroupPtr group) {
 }
 
 
-/**
- * virCgroupMoveTask:
- *
- * @src_group: The source cgroup where all tasks are removed from
- * @dest_group: The destination where all tasks are added to
- *
- * Returns: 0 on success or -1 on failure
- */
-int
-virCgroupMoveTask(virCgroupPtr src_group, virCgroupPtr dest_group)
-{
-int ret = -1;
-char *content = NULL;
-size_t i;
-
-for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
-if (!src_group->controllers[i].mountPoint ||
-!dest_group->controllers[i].mountPoint)
-continue;
-
-/* We must never move tasks in systemd's hierarchy */
-if (i == VIR_CGROUP_CONTROLLER_SYSTEMD)
-continue;
-
-/* New threads are created in the same group as their parent;
- * but if a thread is created after we first read we aren't
- * aware that it needs to move.  Therefore, we must iterate
- * until content is empty.  */
-while (1) {
-VIR_FREE(content);
-if (virCgroupGetValueStr(src_group, i, "tasks", &content) < 0)
-return -1;
-
-if (!*content)
-break;
-
-if (virCgroupAddTaskStrController(dest_group, content, i) < 0)
-goto cleanup;
-}
-}
-
-ret = 0;
- cleanup:
-VIR_FREE(content);
-return ret;
-}
-
-
 static int
 virCgroupSetPartitionSuffix(const char *path, char **res)
 {
@@ -4309,16 +4217,6 @@ virCgroupAddTaskController(virCgroupPtr group ATTRIBUTE_UNUSED,
 
 
 int
-virCgroupMoveTask(virCgroupPtr src_group ATTRIBUTE_UNUSED,
-  virCgroupPtr dest_group ATTRIBUTE_UNUSED)
-{
-virReportSystemError(ENXIO, "%s",
- _("Control groups not supported on this platform"));
-return -1;
-}
-
-
-int
 virCgroupGetBlkioIoServiced(virCgroupPtr group ATTRIBUTE_UNUSED,
  

[libvirt] [PATCH 7/9] vircgroup: add controller mask to virCgroupAddTask

2016-02-23 Thread Henning Schild
Add a way to exclude controllers from virCgroupAddTask. In a cgroup
hierarchy the parent might have controllers just to allow children
cgroups to inherit them, not necessarily to put any tasks in them.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/lxc/lxc_cgroup.c | 2 +-
 src/lxc/lxc_controller.c | 4 ++--
 src/qemu/qemu_cgroup.c   | 8 
 src/qemu/qemu_driver.c   | 2 +-
 src/util/vircgroup.c | 8 ++--
 src/util/vircgroup.h | 2 +-
 6 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
index 609e9ea..9b91dd2 100644
--- a/src/lxc/lxc_cgroup.c
+++ b/src/lxc/lxc_cgroup.c
@@ -504,7 +504,7 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def,
 ) < 0)
 goto cleanup;
 
-if (virCgroupAddTask(cgroup, initpid) < 0) {
+if (virCgroupAddTask(cgroup, initpid, -1) < 0) {
 virErrorPtr saved = virSaveLastError();
 virCgroupRemove(cgroup);
 virCgroupFree(&cgroup);
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 438103a..b1fe8fa 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -863,12 +863,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCControllerPtr ctrl)
 ctrl->nicindexes)))
 goto cleanup;
 
-if (virCgroupAddTask(ctrl->cgroup, getpid()) < 0)
+if (virCgroupAddTask(ctrl->cgroup, getpid(), -1) < 0)
 goto cleanup;
 
 /* Add all qemu-nbd tasks to the cgroup */
 for (i = 0; i < ctrl->nnbdpids; i++) {
-if (virCgroupAddTask(ctrl->cgroup, ctrl->nbdpids[i]) < 0)
+if (virCgroupAddTask(ctrl->cgroup, ctrl->nbdpids[i], -1) < 0)
 goto cleanup;
 }
 
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index d410a66..41a583c 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -789,7 +789,7 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
-if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
+if (virCgroupAddTask(priv->cgroup, vm->pid, -1) < 0) {
 virErrorPtr saved = virSaveLastError();
 virCgroupRemove(priv->cgroup);
virCgroupFree(&priv->cgroup);
@@ -1096,7 +1096,7 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm)
 
 /* move the thread for vcpu to sub dir */
 if (virCgroupAddTask(cgroup_vcpu,
- qemuDomainGetVcpuPid(vm, i)) < 0)
+ qemuDomainGetVcpuPid(vm, i), -1) < 0)
 goto cleanup;
 
 }
@@ -1145,7 +1145,7 @@ qemuSetupCgroupForEmulator(virDomainObjPtr vm)
true, &cgroup_emulator) < 0)
 goto cleanup;
 
-if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
+if (virCgroupAddTask(cgroup_emulator, vm->pid, -1) < 0)
goto cleanup;
 
 if (def->cputune.emulatorpin)
@@ -1255,7 +1255,7 @@ qemuSetupCgroupForIOThreads(virDomainObjPtr vm)
 
 /* move the thread for iothread to sub dir */
 if (virCgroupAddTask(cgroup_iothread,
- def->iothreadids[i]->thread_id) < 0)
+ def->iothreadids[i]->thread_id, -1) < 0)
 goto cleanup;
 
virCgroupFree(&cgroup_iothread);
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 8ccf68b..c0b840b 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4583,7 +4583,7 @@ qemuDomainAddCgroupForThread(virCgroupPtr cgroup,
 goto error;
 
 /* Add pid/thread to the cgroup */
-rv = virCgroupAddTask(new_cgroup, pid);
+rv = virCgroupAddTask(new_cgroup, pid, -1);
 if (rv < 0) {
 virCgroupRemove(new_cgroup);
 goto error;
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index c31c83b..bbc88f3 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1142,16 +1142,19 @@ virCgroupNew(pid_t pid,
  *
  * @group: The cgroup to add a task to
  * @pid: The pid of the task to add
+ * @controllers: mask of controllers to operate on
  *
  * Returns: 0 on success, -1 on error
  */
 int
-virCgroupAddTask(virCgroupPtr group, pid_t pid)
+virCgroupAddTask(virCgroupPtr group, pid_t pid, int controllers)
 {
 int ret = -1;
 size_t i;
 
 for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) {
+if (((controllers & (1 << i)) == 0))
+continue;
 /* Skip over controllers not mounted */
 if (!group->controllers[i].mountPoint)
 continue;
@@ -4197,7 +4200,8 @@ virCgroupPathOfController(virCgroupPtr group ATTRIBUTE_UNUSED,
 
 int
 virCgroupAddTask(virCgroupPtr group ATTRIBUTE_UNUSED,
- pid_t pid ATTRIBUTE_UNUSED)
+ pid_t pid ATTRIBUTE_UNUSED,
+ int controllers ATTRIBUTE_UNUSED)
 {
 virReportSystemError(ENXIO, "%s",
 

[libvirt] [PATCH 9/9] qemu_cgroup: assert threading cgroup layout for machine cgroup

2016-02-23 Thread Henning Schild
Make sure the thread-related controllers of the machine cgroup never
get any tasks assigned.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_cgroup.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 99fb5bf..c827787 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -740,6 +740,7 @@ qemuInitCgroup(virQEMUDriverPtr driver,
int *nicindexes)
 {
 int ret = -1;
+int assert_empty, controllers;
 qemuDomainObjPrivatePtr priv = vm->privateData;
 virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
 
@@ -789,8 +790,17 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
-if (virCgroupAddTask(priv->cgroup, vm->pid,
- ~VIR_CGROUP_THREAD_CONTROLLER_MASK) < 0) {
+/*
+ * The child cgroups for emulator, vcpu, and io-threads contain
+ * all qemu threads for the following controllers; the parent
+ * group has to stay empty.
+ */
+controllers = VIR_CGROUP_THREAD_CONTROLLER_MASK;
+assert_empty = virCgroupGetAssertEmpty(priv->cgroup);
+assert_empty |= controllers;
+virCgroupSetAssertEmpty(priv->cgroup, assert_empty);
+
+if (virCgroupAddTask(priv->cgroup, vm->pid, ~controllers) < 0) {
 virErrorPtr saved = virSaveLastError();
 virCgroupRemove(priv->cgroup);
virCgroupFree(&priv->cgroup);
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 3/9] vircgroup: introduce controller mask for threads

2016-02-23 Thread Henning Schild
When using a cgroups hierarchy threads have child cgroups for
certain controllers. Introduce an enum for later reuse.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/util/vircgroup.c | 12 +++-
 src/util/vircgroup.h |  7 +++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index ad46dfc..11f33ab 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -277,9 +277,7 @@ virCgroupValidateMachineGroup(virCgroupPtr group,
 goto cleanup;
 
 if (stripEmulatorSuffix &&
-(i == VIR_CGROUP_CONTROLLER_CPU ||
- i == VIR_CGROUP_CONTROLLER_CPUACCT ||
- i == VIR_CGROUP_CONTROLLER_CPUSET)) {
+(i & VIR_CGROUP_THREAD_CONTROLLER_MASK)) {
 if (STREQ(tmp, "/emulator"))
 *tmp = '\0';
 tmp = strrchr(group->controllers[i].placement, '/');
@@ -1518,7 +1516,6 @@ virCgroupNewThread(virCgroupPtr domain,
 {
 int ret = -1;
 char *name = NULL;
-int controllers;
 
 switch (nameval) {
 case VIR_CGROUP_THREAD_VCPU:
@@ -1539,11 +1536,8 @@ virCgroupNewThread(virCgroupPtr domain,
 goto cleanup;
 }
 
-controllers = ((1 << VIR_CGROUP_CONTROLLER_CPU) |
-   (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
-   (1 << VIR_CGROUP_CONTROLLER_CPUSET));
-
-if (virCgroupNew(-1, name, domain, controllers, group) < 0)
+if (virCgroupNew(-1, name, domain, VIR_CGROUP_THREAD_CONTROLLER_MASK,
+group) < 0)
 goto cleanup;
 
 if (virCgroupMakeGroup(domain, *group, create, VIR_CGROUP_NONE) < 0) {
diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
index f244c24..f71aed5 100644
--- a/src/util/vircgroup.h
+++ b/src/util/vircgroup.h
@@ -52,6 +52,13 @@ VIR_ENUM_DECL(virCgroupController);
  * Make sure we will not overflow */
 verify(VIR_CGROUP_CONTROLLER_LAST < 8 * sizeof(int));
 
+enum {
+VIR_CGROUP_THREAD_CONTROLLER_MASK =
+((1 << VIR_CGROUP_CONTROLLER_CPU) |
+ (1 << VIR_CGROUP_CONTROLLER_CPUACCT) |
+ (1 << VIR_CGROUP_CONTROLLER_CPUSET))
+};
+
 typedef enum {
 VIR_CGROUP_THREAD_VCPU = 0,
 VIR_CGROUP_THREAD_EMULATOR,
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1/9] vircgroup: one central point for adding tasks to cgroups

2016-02-23 Thread Henning Schild
Use virCgroupAddTaskController in virCgroupAddTask so we have one
single point where we add tasks to cgroups.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/util/vircgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 7584ee4..0b65238 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1162,7 +1162,7 @@ virCgroupAddTask(virCgroupPtr group, pid_t pid)
 if (i == VIR_CGROUP_CONTROLLER_SYSTEMD)
 continue;
 
-if (virCgroupSetValueU64(group, i, "tasks", pid) < 0)
+if (virCgroupAddTaskController(group, pid, i) < 0)
 goto cleanup;
 }
 
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 2/9] vircgroup: add assertion to allow cgroup controllers to stay empty

2016-02-23 Thread Henning Schild
When using a hierarchy of cgroups we might want to add tasks just to
the children cgroups but never to the parent. To make sure we do not
use a parent cgroup by accident, add a mechanism that lets us assert
a correct implementation in cases we want such a hierarchy.

i.e. for qemu cpusets we want all tasks in /vcpuX or /emulator, not
in /.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/libvirt_private.syms |  2 ++
 src/util/vircgroup.c | 19 +++
 src/util/vircgroup.h |  3 +++
 src/util/vircgrouppriv.h |  1 +
 4 files changed, 25 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 83f6e2c..cf93d06 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -1201,6 +1201,7 @@ virCgroupDenyDeviceMajor;
 virCgroupDenyDevicePath;
 virCgroupDetectMountsFromFile;
 virCgroupFree;
+virCgroupGetAssertEmpty;
 virCgroupGetBlkioDeviceReadBps;
 virCgroupGetBlkioDeviceReadIops;
 virCgroupGetBlkioDeviceWeight;
@@ -1245,6 +1246,7 @@ virCgroupNewThread;
 virCgroupPathOfController;
 virCgroupRemove;
 virCgroupRemoveRecursively;
+virCgroupSetAssertEmpty;
 virCgroupSetBlkioDeviceReadBps;
 virCgroupSetBlkioDeviceReadIops;
 virCgroupSetBlkioDeviceWeight;
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 0b65238..ad46dfc 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1197,6 +1197,15 @@ virCgroupAddTaskController(virCgroupPtr group, pid_t pid, int controller)
 return -1;
 }
 
+if(group->assert_empty & (1 << controller)) {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("Controller '%s' is not supposed to contain any"
+ " tasks. group=%s pid=%d\n"),
+   virCgroupControllerTypeToString(controller),
+   group->path, pid);
+return -1;
+}
+
 return virCgroupSetValueU64(group, controller, "tasks",
 (unsigned long long)pid);
 }
@@ -1246,6 +1255,16 @@ virCgroupAddTaskStrController(virCgroupPtr group,
 return rc;
 }
 
+void
+virCgroupSetAssertEmpty(virCgroupPtr group, int mask) {
+group->assert_empty = mask;
+}
+
+int
+virCgroupGetAssertEmpty(virCgroupPtr group) {
+return group->assert_empty;
+}
+
 
 /**
  * virCgroupMoveTask:
diff --git a/src/util/vircgroup.h b/src/util/vircgroup.h
index 63a9e1c..f244c24 100644
--- a/src/util/vircgroup.h
+++ b/src/util/vircgroup.h
@@ -131,6 +131,9 @@ int virCgroupAddTaskController(virCgroupPtr group,
pid_t pid,
int controller);
 
+void virCgroupSetAssertEmpty(virCgroupPtr group, int mask);
+int virCgroupGetAssertEmpty(virCgroupPtr group);
+
 int virCgroupMoveTask(virCgroupPtr src_group,
   virCgroupPtr dest_group);
 
diff --git a/src/util/vircgrouppriv.h b/src/util/vircgrouppriv.h
index 722863e..944d6ae 100644
--- a/src/util/vircgrouppriv.h
+++ b/src/util/vircgrouppriv.h
@@ -44,6 +44,7 @@ struct virCgroupController {
 
 struct virCgroup {
 char *path;
+int assert_empty;
 
 struct virCgroupController controllers[VIR_CGROUP_CONTROLLER_LAST];
 };
-- 
2.4.10



[libvirt] [PATCH 5/9] qemu_cgroup: put qemu right into emulator sub-cgroup

2016-02-23 Thread Henning Schild
Move qemuSetupCgroupForEmulator up under qemuSetupCgroup. That way
we move the one main thread right into the emulator cgroup, instead
of moving multiple threads later on. And we do not actually want any
threads running in the parent cgroups (cpu, cpuacct, cpuset).

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_process.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 05cbda2..65f718c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4895,6 +4895,10 @@ qemuProcessLaunch(virConnectPtr conn,
 if (qemuSetupCgroup(driver, vm, nnicindexes, nicindexes) < 0)
 goto cleanup;
 
+VIR_DEBUG("Setting cgroup for emulator (if required)");
+if (qemuSetupCgroupForEmulator(vm) < 0)
+goto cleanup;
+
 /* This must be done after cgroup placement to avoid resetting CPU
  * affinity */
 if (!vm->def->cputune.emulatorpin &&
@@ -4943,10 +4947,6 @@ qemuProcessLaunch(virConnectPtr conn,
 if (rv == -1) /* The VM failed to start */
 goto cleanup;
 
-VIR_DEBUG("Setting cgroup for emulator (if required)");
-if (qemuSetupCgroupForEmulator(vm) < 0)
-goto cleanup;
-
 VIR_DEBUG("Setting affinity of emulator threads");
 if (qemuProcessSetEmulatorAffinity(vm) < 0)
 goto cleanup;
-- 
2.4.10



[libvirt] [PATCH] make-kpkg: add support for "make olddefconfig"

2016-02-23 Thread Henning Schild
Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 make-kpkg | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/make-kpkg b/make-kpkg
index 5cb8ec3..ba663c9 100755
--- a/make-kpkg
+++ b/make-kpkg
@@ -662,9 +662,10 @@ sub main () {
   }
 
   if ( $config_target
-!~ /^(?:|silentold|old|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/ ) {
+!~ /^(?:|silentold|old|olddef|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/ ) {
 print
-  "Config type must be one of {config,silentoldconfig,oldconfig,menuconfig,xconfig,\n";
+  "Config type must be one of {config,silentoldconfig,oldconfig,olddefconfig,menuconfig,"
+  . "xconfig,\n";
 print
  " nconfig,gconfig,randconfig,defconfig,allmodconfigallyesconfig,allnoconfig}.\n";
 print "use --help to display command line syntax help.\n";
-- 
2.4.10



[libvirt] [PATCH 0/9] fix thread related controllers in cgroups

2016-02-23 Thread Henning Schild
This series picks up the cgroups work I started earlier. My initial
patches got in and were later reverted before 1.3.1.

The problem the series is solving is qemu threads becoming
runnable on pcpus outside the pinning masks configured for the
machine. That only happens for a short time before each thread is
moved to its final cpuset, but it can disturb other load on the
system or can lead to qemu never starting (e.g. the qemu main thread
ends up on a pcpu with a busy high-priority rt-task).

The problem with the original series was the lack of understanding
that one virCgroup can cover all controllers. Instead of just touching
cpusets, the patches had side effects on all the other controllers
(memory, blkio etc.). Again, the general idea is to put all threads
right into the correct cgroups and not to move them around. But this
series touches only the cpu, cpuset, and cpuacct controllers. Those are
the ones relevant to threads, and the controllers the threading
sub-groups have mounted.

Patches 1, 2, and 9 deal with asserting correct behaviour. They are
optional. But given the complexity of the "bringup" and the importance
of getting that right, I think they should go in as well!

The tricky bits are in patches 5 and 8; I kept them as simple as
possible.

The series is based on v1.3.1.

Henning Schild (9):
  vircgroup: one central point for adding tasks to cgroups
  vircgroup: add assertion to allow cgroup controllers to stay empty
  vircgroup: introduce controller mask for threads
  util: cgroups do not implicitly add task to new machine cgroup
  qemu_cgroup: put qemu right into emulator sub-cgroup
  qemu_cgroup: use virCgroupAddTask instead of virCgroupMoveTask
  vircgroup: add controller mask to virCgroupAddTask
  qemu_cgroup: dont put qemu main thread into wrong cgroup
  qemu_cgroup: assert threading cgroup layout for machine cgroup

 src/libvirt_private.syms |   3 +-
 src/lxc/lxc_cgroup.c |  11 
 src/lxc/lxc_controller.c |   4 +-
 src/qemu/qemu_cgroup.c   |  30 +++--
 src/qemu/qemu_driver.c   |   2 +-
 src/qemu/qemu_process.c  |   8 +--
 src/util/vircgroup.c | 155 ---
 src/util/vircgroup.h |  13 +++-
 src/util/vircgrouppriv.h |   1 +
 9 files changed, 81 insertions(+), 146 deletions(-)

-- 
2.4.10



[libvirt] [PATCH] add support for "make olddefconfig"

2016-02-23 Thread Henning Schild
Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 make-kpkg | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/make-kpkg b/make-kpkg
index 5cb8ec3..ba663c9 100755
--- a/make-kpkg
+++ b/make-kpkg
@@ -662,9 +662,10 @@ sub main () {
   }
 
   if ( $config_target
-!~ /^(?:|silentold|old|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/ ) {
+!~ /^(?:|silentold|old|olddef|menu|n|x|g|rand|def|all(mod|yes|no))(config)?$/ ) {
 print
-  "Config type must be one of {config,silentoldconfig,oldconfig,menuconfig,xconfig,\n";
+  "Config type must be one of {config,silentoldconfig,oldconfig,olddefconfig,menuconfig,"
+  . "xconfig,\n";
 print
  " nconfig,gconfig,randconfig,defconfig,allmodconfigallyesconfig,allnoconfig}.\n";
 print "use --help to display command line syntax help.\n";
-- 
2.4.10



Re: [libvirt] [PATCH v2 0/4] Adjustment to recent cgroup/cpuset changes (for 1.3.1)

2016-01-15 Thread Henning Schild
On Thu, 14 Jan 2016 17:20:04 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Thu, Jan 14, 2016 at 06:14:45PM +0100, Henning Schild wrote:
> > On Thu, 14 Jan 2016 16:42:12 +
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> >   
> > > On Thu, Jan 14, 2016 at 11:21:25AM -0500, John Ferlan wrote:  
> > > > v1:
> > > > http://www.redhat.com/archives/libvir-list/2016-January/msg00511.html
> > > > 
> > > > As discussed during the replies of the v1 - revert Henning's
> > > > first two patches, plus the one I made as a result of those.
> > > > 
> > > > Patch 4/4 is already ACK'd 
> > > > 
> > > > John Ferlan (4):
> > > >   Revert "qemu: do not put a task into machine cgroup"
> > > >   Revert "util: cgroups do not implicitly add task to new
> > > > machine cgroup"
> > > >   Revert "lxc_cgroup: Add check for NULL cgroup before AddTask
> > > > call" cgroup: Fix possible bug as a result of code motion for
> > > > vcpu cgroup setup
> > > 
> > > ACK to all  
> > 
> > Same here!
> > 
> > Daniel, do you want to fix the "first qemu thread is no emulator"
> > issue, or should I give it another try?
> 
> If you send another patch I'll review it.  As mentioned before, I
> think i'd suggest something as simple as calling
> qemuCgroupSetupEmulator from qemuInitCgroup will probably work ok

I will send one, but bear with me... other projects and vacation ;).

Henning



Re: [libvirt] [REPOST 0/4] Adjustment to recent cgroup/cpuset changes (for 1.3.1)

2016-01-14 Thread Henning Schild
On Wed, 13 Jan 2016 17:53:16 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Wed, Jan 13, 2016 at 05:51:34PM +0100, Martin Kletzander wrote:
> > On Wed, Jan 13, 2016 at 07:29:46AM -0500, John Ferlan wrote:  
> > >Reposting my cgroup fixes series:
> > >
> > >http://www.redhat.com/archives/libvir-list/2016-January/msg00236.html
> > >
> > >partially because I originally forgot to CC the author (Henning
> > >Schild) of the original series for which these patch fix a couple
> > >of issues discovered during regression testing (virt-test memtune
> > >failures in Red Hat regression environment), but also to bring
> > >them up to date with the top of libvirt git.
> > >
> > >NB: I did send Henning the changes after the fact, but my
> > >message-id skills are lacking, so the replies are not threaded
> > >with the on-list series.  Henning has looked at the first patch
> > >- with a response here:
> > >
> > >http://www.redhat.com/archives/libvir-list/2016-January/msg00443.html
> > >
> > >Finally, I think these changes should go into 1.3.1 since that's
> > >when the regression was introduced.
> > >  
> > 
> > It would be nice to have them in, I really tried reviewing them,
> > but I can't wrap my head around last two of them.  Maybe because
> > I'm already late for an appointment I have.
> > 
> > So unfortunately I have to leave you without the review for those
> > two as I really need to go, but anyone else feel free to continue.
> > And even re-check my reviews for 1 and 2 if you want.  It would be
> > a pity not to fix a regression when we could.  
> 
> I agree we need to get these into the next release, but please hold
> off from merging them. I want to re-examine Henning's original patches
> in more detail, as I have a bad feeling we might need to simply revert
> all of them and start again.

By when do these patches need to be reviewed? Patch 1/4 is obvious, but
the other ones need a closer look. I can just say that they do not seem
right.
I pulled virCgroupAddTask out of virCgroupNewMachine* and that should
be done. But it seems to me that the way virCgroupAddTask was called
contained important error handling that should remain in
virCgroupNewMachine.
If we only have a couple of days until the next release, I would suggest
reverting my changes to give us time to figure the whole thing out.
I will have limited time to look into that until the end of the month.

But I certainly want those changes merged, or something that helps
solve/mitigate the realtime problems of the current implementation. My
changes address only a fraction of the problem. My suggestion for a
proper solution would be using the exclusive flags of cgroups, which
will require a totally different cgroup layout. Something like a
"machine.slice_excl" next to "machine.slice" with the current structure
replicated in there. Or a "top-level" exclusive cgroup per instance. So
far I have only looked at the implications for cpu_exclusive, where such
a setup can work and will properly "protect" the reserved cpus from
accidental use.

Regards,
Henning



Re: [libvirt] [REPOST 0/4] Adjustment to recent cgroup/cpuset changes (for 1.3.1)

2016-01-14 Thread Henning Schild
On Thu, 14 Jan 2016 11:57:44 +
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Wed, Jan 13, 2016 at 07:29:46AM -0500, John Ferlan wrote:
> > Reposting my cgroup fixes series:
> > 
> > http://www.redhat.com/archives/libvir-list/2016-January/msg00236.html
> > 
> > partially because I originally forgot to CC the author (Henning
> > Schild) of the original series for which these patch fix a couple
> > of issues discovered during regression testing (virt-test memtune
> > failures in Red Hat regression environment), but also to bring them
> > up to date with the top of libvirt git.
> > 
> > NB: I did send Henning the changes after the fact, but my
> > message-id skills are lacking, so the replies are not threaded
> > with the on-list series.  Henning has looked at the first patch -
> > with a response here:
> > 
> > http://www.redhat.com/archives/libvir-list/2016-January/msg00443.html
> > 
> > Finally, I think these changes should go into 1.3.1 since that's
> > when the regression was introduced.  
> 
> Since this has been puzzling us for a while, let me recap on the
> cgroup setup in general.
> 
> First, I'll describe how it used to work *before* Henning's patches
> were merged, on a systemd based host.
> 
>  - The QEMU driver forks a child process, but does *not* exec QEMU yet
>The cgroup placement at this point is inherited from libvirtd. It
>may look like this:
> 
>  10:freezer:/
>  9:cpuset:/
>  8:perf_event:/
>  7:hugetlb:/
>  6:blkio:/system.slice
>  5:memory:/system.slice
>  4:net_cls,net_prio:/
>  3:devices:/system.slice/libvirtd.service
>  2:cpu,cpuacct:/system.slice
>  1:name=systemd:/system.slice/libvirtd.service
> 
>  - The QEMU driver calls virCgroupNewMachine()
> 
>   - We call virSystemdCreateMachine with pidleader=$child
> 
>- Systemd creates the initial machine scope unit under
>the machine slice unit, for the "systemd" controller.
>It may also add the PID to *zero* or more other
>resource controllers. So at this point the cgroup
>placement may look like this:
> 
>   10:freezer:/
>   9:cpuset:/
>   8:perf_event:/
>   7:hugetlb:/
>   6:blkio:/
>   5:memory:/
>   4:net_cls,net_prio:/
>   3:devices:/
>   2:cpu,cpuacct:/
>   1:name=systemd:/machine.slice/machine-qemu\x2dserial.scope
> 
>  Or may look like this:
> 
>   10:freezer:/machine.slice/machine-qemu\x2dserial.scope
>   9:cpuset:/machine.slice/machine-qemu\x2dserial.scope
>   8:perf_event:/machine.slice/machine-qemu\x2dserial.scope
>   7:hugetlb:/machine.slice/machine-qemu\x2dserial.scope
>   6:blkio:/machine.slice/machine-qemu\x2dserial.scope
>   5:memory:/machine.slice/machine-qemu\x2dserial.scope
>   4:net_cls,net_prio:/machine.slice/machine-qemu\x2dserial.scope
>   3:devices:/machine.slice/machine-qemu\x2dserial.scope
>   2:cpu,cpuacct:/machine.slice/machine-qemu\x2dserial.scope
>   1:name=systemd:/machine.slice/machine-qemu\x2dserial.scope
> 
>  Or anywhere in between. We have *ZERO* guarantee about
> what other resource controllers we may have been placed in by
>systemd. There is some fairly complex logic that
> determines this, based on what other tasks currently exist in sibling
>cgroups, and what tasks have *previously* existed in the
>cgroups. IOW, you should consider the list of extra
> resource controllers essentially non-deterministic
> 
>   - We call virCgroupAddTask with pid=$child
> 
> This places the pid in any resource controllers we need, which
>   systemd has not already setup. IOW, it guarantees that we now
>   have placement that should look like this, regardless of what
>   systemd has done:
> 
>   10:freezer:/machine.slice/machine-qemu\x2dserial.scope
>   9:cpuset:/machine.slice/machine-qemu\x2dserial.scope
>   8:perf_event:/machine.slice/machine-qemu\x2dserial.scope
>   7:hugetlb:/machine.slice/machine-qemu\x2dserial.scope
>   6:blkio:/machine.slice/machine-qemu\x2dserial.scope
>   5:memory:/machine.slice/machine-qemu\x2dserial.scope
>   4:net_cls,net_prio:/machine.slice/machine-qemu\x2dserial.scope
>   3:devices:/machine.slice/machine-qemu\x2dserial.scope
>   2:cpu,cpuacct:/ma

Re: [libvirt] [REPOST 0/4] Adjustment to recent cgroup/cpuset changes (for 1.3.1)

2016-01-14 Thread Henning Schild
On Thu, 14 Jan 2016 12:37:18 +
"Daniel P. Berrange"  wrote:

> On Thu, Jan 14, 2016 at 11:57:44AM +, Daniel P. Berrange wrote:
> > Since this has been puzzling us for a while, let me recap on the
> > cgroup setup in general.
> > 
> > First, I'll describe how it used to work *before* Henning's patches
> > were merged, on a systemd based host.
> > 
> >  - The QEMU driver forks a child process, but does *not* exec QEMU
> > yet The cgroup placement at this point is inherited from libvirtd.
> > It may look like this:
> > 
> >  10:freezer:/
> >  9:cpuset:/
> >  8:perf_event:/
> >  7:hugetlb:/
> >  6:blkio:/system.slice
> >  5:memory:/system.slice
> >  4:net_cls,net_prio:/
> >  3:devices:/system.slice/libvirtd.service
> >  2:cpu,cpuacct:/system.slice
> >  1:name=systemd:/system.slice/libvirtd.service
> > 
> >  - The QEMU driver calls virCgroupNewMachine()
> > 
> >   - We call virSystemdCreateMachine with pidleader=$child
> > 
> >- Systemd creates the initial machine scope unit under
> >  the machine slice unit, for the "systemd" controller.
> >  It may also add the PID to *zero* or more other
> >  resource controllers. So at this point the cgroup
> >  placement may look like this:
> > 
> >   10:freezer:/
> >   9:cpuset:/
> >   8:perf_event:/
> >   7:hugetlb:/
> >   6:blkio:/
> >   5:memory:/
> >   4:net_cls,net_prio:/
> >   3:devices:/
> >   2:cpu,cpuacct:/
> >   1:name=systemd:/machine.slice/machine-qemu\x2dserial.scope
> > 
> >  Or may look like this:
> > 
> >   10:freezer:/machine.slice/machine-qemu\x2dserial.scope
> >   9:cpuset:/machine.slice/machine-qemu\x2dserial.scope
> >   8:perf_event:/machine.slice/machine-qemu\x2dserial.scope
> >   7:hugetlb:/machine.slice/machine-qemu\x2dserial.scope
> >   6:blkio:/machine.slice/machine-qemu\x2dserial.scope
> >   5:memory:/machine.slice/machine-qemu\x2dserial.scope
> >   4:net_cls,net_prio:/machine.slice/machine-qemu\x2dserial.scope
> >   3:devices:/machine.slice/machine-qemu\x2dserial.scope
> >   2:cpu,cpuacct:/machine.slice/machine-qemu\x2dserial.scope
> >   1:name=systemd:/machine.slice/machine-qemu\x2dserial.scope
> > 
> >  Or anywhere in between. We have *ZERO* guarantee about
> > what other resource controllers we may have been placed in by
> >  systemd. There is some fairly complex logic that
> > determines this, based on what other tasks currently exist in sibling
> >  cgroups, and what tasks have *previously* existed in
> > the cgroups. IOW, you should consider the list of extra resource
> >  controllers essentially non-deterministic
> > 
> >   - We call virCgroupAddTask with pid=$child
> > 
> > This places the pid in any resource controllers we need,
> > which systemd has not already setup. IOW, it guarantees that we now
> > have placement that should look like this, regardless of
> > what systemd has done:
> > 
> >   10:freezer:/machine.slice/machine-qemu\x2dserial.scope
> >   9:cpuset:/machine.slice/machine-qemu\x2dserial.scope
> >   8:perf_event:/machine.slice/machine-qemu\x2dserial.scope
> >   7:hugetlb:/machine.slice/machine-qemu\x2dserial.scope
> >   6:blkio:/machine.slice/machine-qemu\x2dserial.scope
> >   5:memory:/machine.slice/machine-qemu\x2dserial.scope
> >   4:net_cls,net_prio:/machine.slice/machine-qemu\x2dserial.scope
> >   3:devices:/machine.slice/machine-qemu\x2dserial.scope
> >   2:cpu,cpuacct:/machine.slice/machine-qemu\x2dserial.scope
> >   1:name=systemd:/machine.slice/machine-qemu\x2dserial.scope
> > 
> >  - The QEMU driver now lets the child process exec QEMU. QEMU
> > creates its vCPU threads at this point. All QEMU threads (emulator,
> > vcpu and I/O threads) now have the cgroup placement shown above.
> > 
> >  - We create the emulator cgroup for the cpuset, cpu, cpuacct
> > controllers move all threads into this new cgroup. All threads
> > (emulator, vcpu and I/O threads) thus now have placement of:
> > 
> >10:freezer:/machine.slice/machine-qemu\x2dserial.scope
> >9:cpuset:/machine.slice/machine-qemu\x2dserial.scope/emulator
> >8:perf_event:/machine.slice/machine-qemu\x2dserial.scope
> >7:hugetlb:/machine.slice/machine-qemu\x2dserial.scope
> >6:blkio:/machine.slice/machine-qemu\x2dserial.scope
> >5:memory:/machine.slice/machine-qemu\x2dserial.scope
> >4:net_cls,net_prio:/machine.slice/machine-qemu\x2dserial.scope
> >3:devices:/machine.slice/machine-qemu\x2dserial.scope
> >
> > 

Re: [libvirt] [PATCH v2 0/4] Adjustment to recent cgroup/cpuset changes (for 1.3.1)

2016-01-14 Thread Henning Schild
On Thu, 14 Jan 2016 16:42:12 +
"Daniel P. Berrange"  wrote:

> On Thu, Jan 14, 2016 at 11:21:25AM -0500, John Ferlan wrote:
> > v1:
> > http://www.redhat.com/archives/libvir-list/2016-January/msg00511.html
> > 
> > As discussed during the replies of the v1 - revert Henning's first
> > two patches, plus the one I made as a result of those.
> > 
> > Patch 4/4 is already ACK'd 
> > 
> > John Ferlan (4):
> >   Revert "qemu: do not put a task into machine cgroup"
> >   Revert "util: cgroups do not implicitly add task to new machine
> > cgroup"
> >   Revert "lxc_cgroup: Add check for NULL cgroup before AddTask call"
> >   cgroup: Fix possible bug as a result of code motion for vcpu
> > cgroup setup  
> 
> ACK to all

Same here!

Daniel, do you want to fix the "first qemu thread is no emulator" issue,
or should I give it another try?

Henning



Re: [libvirt] [PATCH 1/4] cgroup: Fix possible bug as a result of code motion for vcpu cgroup setup

2016-01-12 Thread Henning Schild
On Mon, 11 Jan 2016 13:50:32 -0500
John Ferlan  wrote:

> Commit id '90b721e43' moved where the virCgroupAddTask was made until
> after the check for the vcpupin checks. However, in doing so it missed
> an option where if the cpumap didn't exist, then the code would
> continue back to the top of the current vcpu loop. The results was
> that the virCgroupAddTask wouldn't be called.
> 
> Signed-off-by: John Ferlan 
> ---
>  src/qemu/qemu_cgroup.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
> index 1c406ce..91b3328 100644
> --- a/src/qemu/qemu_cgroup.c
> +++ b/src/qemu/qemu_cgroup.c
> @@ -1079,10 +1079,7 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm)
>  }
>  }
>  
> -if (!cpumap)
> -continue;
> -
> -if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
> +if (cpumap && qemuSetupCgroupCpusetCpus(cgroup_vcpu,
> cpumap) < 0) goto cleanup;
>  }
>  

Good catch, should be applied!

Henning



Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2016-01-11 Thread Henning Schild
On Fri, 8 Jan 2016 11:05:59 -0500
John Ferlan  wrote:

> 
> >>
> >> I'm leaning towards something in the test. I'll check if reverting
> >> these changes alters the results. I don't imagine it will.
> > 
> > The real question is which thread it fails on and at what point in
> > time. My patches only changed the order of operations where threads
> > enter the cpuset cgroups at a slightly different time. And the qemu
> > main thread never enters the parent group, it becomes an
> > emulator-thread. Maybe you can point to exactly the assertion that
> > fails. Including a link to the test code. And yes if you can
> > confirm that the patches are to blame that would be a good first
> > step ;).
> > 
> > Thanks,
> > Henning
> > 
> 
> Update:
> 
> I have found that if I revert patch 2...
> 
> Then modify qemuInitCgroup() to modify the virCgroupNewMachine check
> to also ensure "|| !priv->cgroup)

I see the check for the parent cgroup should probably go back into
virCgroupNewMachine, including the cleanup stuff in case of failure.

> Then modify qemuSetupCgroupForEmulator() to make the
> virCgroupAddTask() call like was in patch 2
> 
> Then modify patch 3 (qemuSetupCgroupForVcpu) to change the call:
> 
> 
>  if (!cpumap)
>  continue;
> 
>  if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>  goto cleanup;
> 
> to
> 
>  if (cpumap &&
>  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>  goto cleanup;
> 

Well that is not a syntactical change, maybe easier to read and in line
with the other places where qemuSetupCgroupCpusetCpus is called.

> Then retest and the test passes again.
>
> Note that taking this route, I found that when I start the guest, I
> have the following in 'tasks':
> 
> # cat /sys/fs/cgroup/memory/machine.slice/tasks
> # cat /sys/fs/cgroup/memory/machine.slice/*/tasks
> 15007
> 15008
> 15010
> 15011
> 15013
> #
> 
> Where '15007' is the virt-tests-vm1 process (eg, /proc/$pid/cgroup).
> If I read the intentions you had, this follows that...
> 
> I'll post a couple of patches in a bit...
> 
> John



Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2016-01-11 Thread Henning Schild
On Mon, 11 Jan 2016 07:05:11 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 01/11/2016 06:38 AM, Henning Schild wrote:
> > On Fri, 8 Jan 2016 11:05:59 -0500
> > John Ferlan <jfer...@redhat.com> wrote:
> > 
> >>
> >>>>
> >>>> I'm leaning towards something in the test. I'll check if
> >>>> reverting these changes alters the results. I don't imagine it
> >>>> will.
> >>>
> >>> The real question is which thread it fails on and at what point in
> >>> time. My patches only changed the order of operations where
> >>> threads enter the cpuset cgroups at a slightly different time.
> >>> And the qemu main thread never enters the parent group, it
> >>> becomes an emulator-thread. Maybe you can point to exactly the
> >>> assertion that fails. Including a link to the test code. And yes
> >>> if you can confirm that the patches are to blame that would be a
> >>> good first step ;).
> >>>
> >>> Thanks,
> >>> Henning
> >>>
> >>
> >> Update:
> >>
> >> I have found that if I revert patch 2...
> >>
> >> Then modify qemuInitCgroup() to modify the virCgroupNewMachine
> >> check to also ensure "|| !priv->cgroup)
> > 
> > I see the check for the parent cgroup should probably go back into
> > virCgroupNewMachine, including the cleanup stuff in case of failure.
> > 
> 
> Forgot to CC you (and Jan) on the 4 patch series I sent:
> 
> http://www.redhat.com/archives/libvir-list/2016-January/msg00236.html
> 
> Patches 2, 3, & 4 are related to above while patch 1 is for below.

If you are subscribed, could you please send me a copy of the mails - as
received on the list - for review?
 
> John
> 
> >> Then modify qemuSetupCgroupForEmulator() to make the
> >> virCgroupAddTask() call like was in patch 2
> >>
> >> Then modify patch 3 (qemuSetupCgroupForVcpu) to change the call:
> >>
> >>
> >>  if (!cpumap)
> >>  continue;
> >>
> >>  if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) <
> >> 0) goto cleanup;
> >>
> >> to
> >>
> >>  if (cpumap &&
> >>  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) <
> >> 0) goto cleanup;
> >>
> > 
> > Well that is not a syntactical change, maybe easier to read and in
> > line with the other places where qemuSetupCgroupCpusetCpus is
> > called.
> > 
> >> Then retest and the test passes again.
> >>
> >> Note that taking this route, I found that when I start the guest, I
> >> have the following in 'tasks':
> >>
> >> # cat /sys/fs/cgroup/memory/machine.slice/tasks
> >> # cat /sys/fs/cgroup/memory/machine.slice/*/tasks
> >> 15007
> >> 15008
> >> 15010
> >> 15011
> >> 15013
> >> #
> >>
> >> Where '15007' is the virt-tests-vm1 process
> >> (eg, /proc/$pid/cgroup). If I read the intentions you had, this
> >> follows that...
> >>
> >> I'll post a couple of patches in a bit...
> >>
> >> John
> > 



Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2016-01-08 Thread Henning Schild
On Thu, 7 Jan 2016 19:56:33 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 01/07/2016 02:01 PM, Henning Schild wrote:
> > On Thu, 7 Jan 2016 11:20:23 -0500
> > John Ferlan <jfer...@redhat.com> wrote:
> > 
> >>
> >> [...]
> >>
> >>>> No problem - although it seems they've generated a regression in
> >>>> the virttest memtune test suite.  I'm 'technically' on vacation
> >>>> for the next couple of weeks; however, I think/perhaps the
> >>>> problem is a result of this patch and the change to adding the
> >>>> task to the cgroup at the end of the for loop, but perhaps the
> >>>> following code causes the control to jump back to the top of the
> >>>> loop:
> >>>>
> >>>>  if (!cpumap)
> >>>>  continue;
> >>>>
> >>>>   if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap)
> >>>> < 0) goto cleanup;
> >>>>
> >>>> not allowing the
> >>>>
> >>>>
> >>>> /* move the thread for vcpu to sub dir */
> >>>> if (virCgroupAddTask(cgroup_vcpu,
> >>>>  qemuDomainGetVcpuPid(vm, i)) < 0)
> >>>> goto cleanup;
> >>>>
> >>>> to be executed.
> >>>>
> >>>> The code should probably change to be (like IOThreads):
> >>>>
> >>>>  if (cpumap &&
> >>>>  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) <
> >>>> 0) goto cleanup;
> >>>>
> >>>>
> >>>> As for the rest, I suspect things will be quite quiet around here
> >>>> over the next couple of weeks. A discussion to perhaps start in
> >>>> the new year.
> >>>
> >>> Same here. I will have a look at that regression after my
> >>> vacation, should it still be there.
> >>>
> >>> Henning
> >>>
> >>
> >> More data from the issue...  While the above mentioned path is an
> >> issue, I don't believe it's what's causing the test failure.
> >>
> >> I haven't quite figured out why yet, but it seems
> >> the /proc/#/cgroup file isn't getting the proper path for the
> >> 'memory' slice and thus the test fails because it's looking at the:
> >>
> >>/sys/fs/cgroup/memory/machine.slice/memory.*
> >>
> >> files instead of the
> >>
> >> /sys/fs/cgroup/memory/machine.slice/$path/memory.*
> > 
> > To be honest I had only looked at the cgroup/cpuset/ hierarchy, but I
> > just browsed cgroup/memory/ as well.
> > 
> > The target of my patch series was to get
> > cgroup/cpuset/machine.slice/tasks to be empty; all tasks should be
> > in their sub-cgroup under the machine.slice. And the ordering
> > patches make sure the file is always empty.
> > 
> > In the memory cgroups all tasks are in the parent group (all in
> > machine.slice/tasks); machine.slice/*/tasks are empty. I am not sure
> > whether that is intended; I can just assume it is a bug in the
> > memory cgroup subsystem. Why are the groups created and tuned when
> > the tasks stay in the big superset?
> 
> TBH - there's quite a bit of this that mystifies me... Use of cgroups
> is not something I've spent a whole lot of time looking at...
> 
> I guess I've been working under the assumption that when the
> machine.slice/$path is created, the domain would use that for all
> cgroup specific file adjustments for that domain. Not sure how the
> /proc/$pid/cgroup is related to this.
> 
> My f23 system seems to generate the /proc/$pid/cgroup with the
> machine.slice/$path/ for each of the cgroups libvirt cares about while
> the f20 system with the test only has that path for cpuset and
> cpu,cpuacct. Since that's what the test uses to find the memory
> path for validation, that's why it fails.
> 
> I've been looking through the libvirtd debug logs to see if anything
> jumps out at me, but it seems both the systems I've looked at will
> build the path for the domain using the machine.slice/$path as seen
> during domain startup.
> 
> Very odd - perhaps looking at it too long right now though!
> 
> 
> > /proc/#/cgroup is showing the correct path, libvirt seems to fail to
> > migrate tasks into memory subgroups. (i am talking about a patched
> > 1.2.19 where vms do not have any special memory tuning)

Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2016-01-07 Thread Henning Schild
On Thu, 7 Jan 2016 11:20:23 -0500
John Ferlan  wrote:

> 
> [...]
> 
> >> No problem - although it seems they've generated a regression in
> >> the virttest memtune test suite.  I'm 'technically' on vacation
> >> for the next couple of weeks; however, I think/perhaps the problem
> >> is a result of this patch and the change to adding the task to the
> >> cgroup at the end of the for loop, but perhaps the following code
> >> causes the control to jump back to the top of the loop:
> >>
> >>  if (!cpumap)
> >>  continue;
> >>
> >>   if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) <
> >> 0) goto cleanup;
> >>
> >> not allowing the
> >>
> >>
> >> /* move the thread for vcpu to sub dir */
> >> if (virCgroupAddTask(cgroup_vcpu,
> >>  qemuDomainGetVcpuPid(vm, i)) < 0)
> >> goto cleanup;
> >>
> >> to be executed.
> >>
> >> The code should probably change to be (like IOThreads):
> >>
> >>  if (cpumap &&
> >>  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) <
> >> 0) goto cleanup;
> >>
> >>
> >> As for the rest, I suspect things will be quite quiet around here
> >> over the next couple of weeks. A discussion to perhaps start in
> >> the new year.
> > 
> > Same here. I will have a look at that regression after my vacation,
> > should it still be there.
> > 
> > Henning
> > 
> 
> More data from the issue...  While the above mentioned path is an
> issue, I don't believe it's what's causing the test failure.
>
> I haven't quite figured out why yet, but it seems the /proc/#/cgroup
> file isn't getting the proper path for the 'memory' slice and thus the
> test fails because it's looking at the:
> 
>/sys/fs/cgroup/memory/machine.slice/memory.*
> 
> files instead of the
> 
> /sys/fs/cgroup/memory/machine.slice/$path/memory.*

To be honest i did just look at the cgroup/cpuset/ hierarchy, but i
just browsed cgroup/memory/ as well.

The target of my patch series was to get
cgroup/cpuset/machine.slice/tasks to be emtpy, all tasks should be in
their sub-cgroup under the machine.slice. And the ordering patches make
sure the file is always empty.

In the memory cgroups all tasks are in the parent group (all in
machine.slice/tasks). machine.slice/*/tasks are empty. I am not sure
whether that is intended, i can just assume it is a bug in the memory
cgroup subsystem. Why are the groups created and tuned when the tasks
stay in the big superset?
/proc/#/cgroup is showing the correct path, libvirt seems to fail to
migrate tasks into memory subgroups. (i am talking about a patched
1.2.19 where vms do not have any special memory tuning)

Without my patches the first qemu thread was in
"2:cpuset:/machine.slice" and the name did match
"4:memory:/machine.slice". Now if the test wants matching names the
test might just be wrong. Or as indicated before there might be a bug
in the memory cgroups.

> Where $path is "machine-qemu\x2dvirt\x2dtests\x2dvm1.scope"
> 
> This affects the virsh memtune $dom command test suite which uses the
> /proc/$pid/cgroup file in order to find the path for the 'memory' or
> 'cpuset' or 'cpu,cpuacct' cgroup paths.
> 
> Seems to be some interaction with systemd that I haven't quite figured
> out.
> 
> I'm assuming this is essentially the issue you were trying to fix -
> that is changes to values should be done to the machine-qemu*
> specific files rather than the machine.slice files.
> 
> The good news is I can see the changes occurring in the machine-qemu*
> specific files, so it seems libvirt is doing the right thing.
> 
> However, there's something strange with perhaps previously
> existing/running domains where that /proc/$pid/cgroup file doesn't get
> the $path for the memory entry, thus causing the test validation to
> look in the wrong place.
> 
> Hopefully this makes sense. What's really strange (for me at least) is
> that it's only occurring on one test system. I can set up the same
> test on another system and things work just fine.  I'm not quite sure
> what interaction generates that /proc/$pid/cgroup file - hopefully
> someone else understands it and can help me make sense of it.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2015-12-21 Thread Henning Schild
On Mon, 14 Dec 2015 16:27:54 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 11/13/2015 11:56 AM, Henning Schild wrote:
> > Hi,
> > 
> > i already explained some of the cgroup problems in some detail so i
> > will not do that again.
> > https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
> > 
> > I managed to solve some of the problems in the current codebase, and
> > am now sharing the patches. But they are really just half of what i 
> > had to change to get libvirt to behave in a system with isolated
> > cpus.
> > 
> > Other changes/hacks i am not sending here because they do not work
> > for the general case:
> > - create machine.slice before starting libvirtd (smaller than root)
> >   ... and hope it won't grow
> > - disabling cpuset.cpus inheritance in libvirtd
> > - allowing only xml with fully specified cputune
> > - set machine cpuset to (vcpupins | emulatorpin)
> > 
> > I am not sure how useful the individual fixes are, i am sending them
> > as concrete examples for the problems i described earlier. And i am
> > hoping that will start a discussion.
> > 
> > Henning
> > 
> > Henning Schild (3):
> >   util: cgroups do not implicitly add task to new machine cgroup
> >   qemu: do not put a task into machine cgroup
> >   qemu cgroups: move new threads to new cgroup after cpuset is set
> > up
> > 
> >  src/lxc/lxc_cgroup.c   |  6 ++
> >  src/qemu/qemu_cgroup.c | 23 ++-
> >  src/util/vircgroup.c   | 22 --
> >  3 files changed, 20 insertions(+), 31 deletions(-)
> > 
> 
> 
> The updated code looks fine to me - although it didn't directly git am
> -3 to top of tree - I was able to make a few adjustments to get things
> merged...  Since no one has objected to this ordering change - I've
> pushed.

Sorry the patches were still based on v1.2.19. Thanks for the merge
and accepting them!

Wrong operation ordering within libvirt cgroups (like the ones
fixed by the patches) could still push tasks onto dedicated cpus. And
more importantly other cgroups users can still grab the dedicated cpus
as well. The only reliable solution to prevent that seems to be making
use of the "exclusive" feature of cpusets. And that would imply
changing the cgroups layout of libvirt again. Because sets can not be
partially exclusive and libvirt deals with dedicated cpus and shared
ones.
How to deal with these problems is a discussion that i wanted to get
started with this patch-series. It would be nice to receive general
comments on that. How should we proceed here? I could maybe write an
RFC mail describing the problems again and suggesting changes to
libvirt on a conceptual basis.

But until then maybe people responsible for cgroups in libvirt (Paul
and Martin?) can again look at
https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
There i described how naive use of cgroups can place tasks on cpus that
are supposed to be isolated/dedicated/exclusive. Even if libvirt does
not make these mistakes it should protect itself against docker,
systemd, ...

Henning




Re: [libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2015-12-21 Thread Henning Schild
On Mon, 21 Dec 2015 12:44:32 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 12/21/2015 03:36 AM, Henning Schild wrote:
> > On Mon, 14 Dec 2015 16:27:54 -0500
> > John Ferlan <jfer...@redhat.com> wrote:
> > 
> >>
> >>
> >> On 11/13/2015 11:56 AM, Henning Schild wrote:
> >>> Hi,
> >>>
> >>> i already explained some of the cgroup problems in some detail so
> >>> i will not do that again.
> >>> https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
> >>>
> >>> I managed to solve some of the problems in the current codebase,
> >>> and am now sharing the patches. But they are really just half of
> >>> what i had to change to get libvirt to behave in a system with
> >>> isolated cpus.
> >>>
> >>> Other changes/hacks i am not sending here because they do not work
> >>> for the general case:
> >>> - create machine.slice before starting libvirtd (smaller than
> >>> root) ... and hope it won't grow
> >>> - disabling cpuset.cpus inheritance in libvirtd
> >>> - allowing only xml with fully specified cputune
> >>> - set machine cpuset to (vcpupins | emulatorpin)
> >>>
> >>> I am not sure how useful the individual fixes are, i am sending
> >>> them as concrete examples for the problems i described earlier.
> >>> And i am hoping that will start a discussion.
> >>>
> >>> Henning
> >>>
> >>> Henning Schild (3):
> >>>   util: cgroups do not implicitly add task to new machine cgroup
> >>>   qemu: do not put a task into machine cgroup
> >>>   qemu cgroups: move new threads to new cgroup after cpuset is set
> >>> up
> >>>
> >>>  src/lxc/lxc_cgroup.c   |  6 ++
> >>>  src/qemu/qemu_cgroup.c | 23 ++-
> >>>  src/util/vircgroup.c   | 22 --
> >>>  3 files changed, 20 insertions(+), 31 deletions(-)
> >>>
> >>
> >>
> >> The updated code looks fine to me - although it didn't directly
> >> git am -3 to top of tree - I was able to make a few adjustments to
> >> get things merged...  Since no one has objected to this ordering
> >> change - I've pushed.
> > 
> > Sorry the patches were still based on v1.2.19. Thanks for the merge
> > and accepting them!
> > 
> 
> No problem - although it seems they've generated a regression in the
> virttest memtune test suite.  I'm 'technically' on vacation for the
> next couple of weeks; however, I think/perhaps the problem is a
> result of this patch and the change to adding the task to the cgroup
> at the end of the for loop, but perhaps the following code causes the
> control to jump back to the top of the loop:
> 
>  if (!cpumap)
>  continue;
> 
>   if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>  goto cleanup;
> 
> not allowing the
> 
> 
> /* move the thread for vcpu to sub dir */
> if (virCgroupAddTask(cgroup_vcpu,
>  qemuDomainGetVcpuPid(vm, i)) < 0)
> goto cleanup;
> 
> to be executed.
> 
> The code should probably change to be (like IOThreads):
> 
>  if (cpumap &&
>  qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
>  goto cleanup;
> 
> 
> As for the rest, I suspect things will be quite quiet around here over
> the next couple of weeks. A discussion to perhaps start in the new
> year.

Same here. I will have a look at that regression after my vacation,
should it still be there.

Henning

> John
> 
> 
> > Wrong operation ordering within libvirt cgroups (like the ones
> > fixed by the patches) could still push tasks onto dedicated cpus.
> > And more importantly other cgroups users can still grab the
> > dedicated cpus as well. The only reliable solution to prevent that
> > seems to be making use of the "exclusive" feature of cpusets. And
> > that would imply changing the cgroups layout of libvirt again.
> > Because sets can not be partially exclusive and libvirt deals with
> > dedicated cpus and shared ones.
> > How to deal with these problems is a discussion that i wanted to get
> > started with this patch-series. It would be nice to receive general
> > comments on that. How should we proceed here? I could maybe write an
> > RFC mail describing the problems again and suggesting changes to
> > libvirt on a conceptual basis.
> > 
> > But until then maybe people responsible for cgroups in libvirt (Paul
> > and Martin?) can again look at
> > https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html
> > There i described how naive use of cgroups can place tasks on cpus
> > that are supposed to be isolated/dedicated/exclusive. Even if
> > libvirt does not make these mistakes it should protect itself
> > against docker, systemd, ...
> > 
> > Henning
> > 
> > 



[libvirt] [PATCHv2 1/3] util: cgroups do not implicitly add task to new machine cgroup

2015-12-13 Thread Henning Schild
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicitly anymore.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/lxc/lxc_cgroup.c   | 11 +++
 src/qemu/qemu_cgroup.c | 11 +++
 src/util/vircgroup.c   | 22 --
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
index ad254e4..609e9ea 100644
--- a/src/lxc/lxc_cgroup.c
+++ b/src/lxc/lxc_cgroup.c
@@ -504,6 +504,17 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def,
 ) < 0)
 goto cleanup;
 
+if (virCgroupAddTask(cgroup, initpid) < 0) {
+virErrorPtr saved = virSaveLastError();
+virCgroupRemove(cgroup);
+virCgroupFree(&cgroup);
+if (saved) {
+virSetError(saved);
+virFreeError(saved);
+}
+goto cleanup;
+}
+
 /* setup control group permissions for user namespace */
 if (def->idmap.uidmap) {
 if (virCgroupSetOwner(cgroup,
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 0da6c02..7320046 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -770,6 +770,17 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
+if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
+virErrorPtr saved = virSaveLastError();
+virCgroupRemove(priv->cgroup);
+virCgroupFree(&priv->cgroup);
+if (saved) {
+virSetError(saved);
+virFreeError(saved);
+}
+goto cleanup;
+}
+
  done:
 ret = 0;
  cleanup:
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 0379c2e..a07f3c2 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1669,16 +1669,6 @@ virCgroupNewMachineSystemd(const char *name,
 }
 }
 
-if (virCgroupAddTask(*group, pidleader) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-}
-
 ret = 0;
  cleanup:
 virCgroupFree();
@@ -1701,7 +1691,6 @@ int virCgroupTerminateMachine(const char *name,
 static int
 virCgroupNewMachineManual(const char *name,
   const char *drivername,
-  pid_t pidleader,
   const char *partition,
   int controllers,
   virCgroupPtr *group)
@@ -1727,16 +1716,6 @@ virCgroupNewMachineManual(const char *name,
 group) < 0)
 goto cleanup;
 
-if (virCgroupAddTask(*group, pidleader) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-}
-
  done:
 ret = 0;
 
@@ -1783,7 +1762,6 @@ virCgroupNewMachine(const char *name,
 
 return virCgroupNewMachineManual(name,
  drivername,
- pidleader,
  partition,
  controllers,
  group);
-- 
2.4.10



Re: [libvirt] [PATCH 2/3] qemu: do not put a task into machine cgroup

2015-12-13 Thread Henning Schild
On Tue, 8 Dec 2015 12:23:19 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 11/13/2015 11:57 AM, Henning Schild wrote:
> > The machine cgroup is a superset, a parent to the emulator and vcpuX
> > cgroups. The parent cgroup should never have any tasks directly in
> > it. In fact the parent cpuset might contain way more cpus than the
> > sum of emulatorpin and vcpupins. So putting tasks in the superset
> > will allow them to run outside of the pinned cpus.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/qemu/qemu_cgroup.c | 10 --
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
> > index 28d2ca2..2c74a22 100644
> > --- a/src/qemu/qemu_cgroup.c
> > +++ b/src/qemu/qemu_cgroup.c
> > @@ -769,12 +769,6 @@ qemuInitCgroup(virQEMUDriverPtr driver,
> >  goto cleanup;
> >  }
> >  
> > -if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
> > -virCgroupRemove(priv->cgroup);
> > -virCgroupFree(&priv->cgroup);
> > -goto cleanup;
> > -}
> > -
> 
> Moving this to later would also seem to imply that the code after the
> qemuSetupCgroup (which calls qemuInitCgroup) from qemuProcessLaunch
> would need some movement too, e.g.:
> 
> /* This must be done after cgroup placement to avoid resetting CPU
>  * affinity */
> if (!vm->def->cputune.emulatorpin &&
> qemuProcessInitCpuAffinity(vm) < 0)
> goto cleanup;
>
> Theoretically that would then need to be between the following:
> 
>VIR_DEBUG("Setting cgroup for emulator (if required)");
> if (qemuSetupCgroupForEmulator(vm) < 0)
> goto cleanup;
> 
> <<<... right here, I believe  ...>>>

Good catch! That code is confusing. I will try and merge
qemuProcessInitCpuAffinity with qemuProcessSetEmulatorAffinity.

> VIR_DEBUG("Setting affinity of emulator threads");
> if (qemuProcessSetEmulatorAffinity(vm) < 0)
> goto cleanup;
> 
> 
> Again, weak ACK - hopefully Peter/Martin can take a look. In any case
> a v2 probably should be done.
> 
> John
> >   done:
> >  ret = 0;
> >   cleanup:
> > @@ -1145,6 +1139,10 @@ qemuSetupCgroupForEmulator(virDomainObjPtr
> > vm) goto cleanup;
> >  }
> >  
> > +/* consider the first thread an emulator-thread */
> > +if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
> > +goto cleanup;
> > +
> >  virCgroupFree(&cgroup_emulator);
> >  return 0;
> >  
> > 



Re: [libvirt] [PATCH 1/3] util: cgroups do not implicitly add task to new machine cgroup

2015-12-09 Thread Henning Schild
On Tue, 8 Dec 2015 12:23:14 -0500
John Ferlan <jfer...@redhat.com> wrote:

> 
> 
> On 11/13/2015 11:56 AM, Henning Schild wrote:
> > virCgroupNewMachine used to add the pidleader to the newly created
> > machine cgroup. Do not do this implicitly anymore.
> > 
> > Signed-off-by: Henning Schild <henning.sch...@siemens.com>
> > ---
> >  src/lxc/lxc_cgroup.c   |  6 ++
> >  src/qemu/qemu_cgroup.c |  6 ++
> >  src/util/vircgroup.c   | 22 --
> >  3 files changed, 12 insertions(+), 22 deletions(-)
> > 
> > diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
> > index ad254e4..e5ac893 100644
> > --- a/src/lxc/lxc_cgroup.c
> > +++ b/src/lxc/lxc_cgroup.c
> > @@ -504,6 +504,12 @@ virCgroupPtr
> > virLXCCgroupCreate(virDomainDefPtr def, ) < 0)
> >  goto cleanup;
> >  
> > +if (virCgroupAddTask(cgroup, initpid) < 0) {
> > +virCgroupRemove(cgroup);
> > +virCgroupFree(&cgroup);
> > +goto cleanup;
> > +}
> > +
> 
> For both this and qemu, the store/restore last error:
> 
>  virErrorPtr saved = virSaveLastError();
> ...
> 
>  if (saved) {
>virSetError(saved);
>virFreeError(saved);
>   }
> 
> Is "lost". I realize no other call to virCgroupRemove saves the error,
> but as I found in a different review:

Yes that was lost and i will get it back in. Further discussions on
where it should be are out of the scope of this series.

> http://www.redhat.com/archives/libvir-list/2015-October/msg00823.html
> 
> the call to virCgroupPathOfController from virCgroupRemove could
> overwrite the last error.
> 
> Even though others don't have it, I think perhaps we should ensure it
> still exists here. Or perhaps a patch prior to this one that would
> adjust the virCgroupRemove to "save/restore" the last error around the
> virCgroupPathOfController call...
> 
> 
> >  /* setup control group permissions for user namespace */
> >  if (def->idmap.uidmap) {
> >  if (virCgroupSetOwner(cgroup,
> > diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
> > index a8e0b8c..28d2ca2 100644
> > --- a/src/qemu/qemu_cgroup.c
> > +++ b/src/qemu/qemu_cgroup.c
> > @@ -769,6 +769,12 @@ qemuInitCgroup(virQEMUDriverPtr driver,
> >  goto cleanup;
> >  }
> >  
> > +if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
> > +virCgroupRemove(priv->cgroup);
> > +virCgroupFree(&priv->cgroup);
> > +goto cleanup;
> > +}
> > +
> >   done:
> >  ret = 0;
> >   cleanup:
> > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > index 0379c2e..a07f3c2 100644
> > --- a/src/util/vircgroup.c
> > +++ b/src/util/vircgroup.c
> > @@ -1669,16 +1669,6 @@ virCgroupNewMachineSystemd(const char *name,
> >  }
> >  }
> >  
> > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > -virErrorPtr saved = virSaveLastError();
> > -virCgroupRemove(*group);
> > -virCgroupFree(group);
> > -if (saved) {
> > -virSetError(saved);
> > -virFreeError(saved);
> > -}
> > -}
> > -
> >  ret = 0;
> >   cleanup:
> >  virCgroupFree();
> > @@ -1701,7 +1691,6 @@ int virCgroupTerminateMachine(const char
> > *name, static int
> >  virCgroupNewMachineManual(const char *name,
> >const char *drivername,
> > -  pid_t pidleader,
> >const char *partition,
> >int controllers,
> >virCgroupPtr *group)
> > @@ -1727,16 +1716,6 @@ virCgroupNewMachineManual(const char *name,
> >  group) < 0)
> >  goto cleanup;
> >  
> > -if (virCgroupAddTask(*group, pidleader) < 0) {
> > -virErrorPtr saved = virSaveLastError();
> > -virCgroupRemove(*group);
> > -virCgroupFree(group);
> > -if (saved) {
> > -virSetError(saved);
> > -virFreeError(saved);
> > -}
> > -}
> > -
> >   done:
> >  ret = 0;
> >  
> > @@ -1783,7 +1762,6 @@ virCgroupNewMachine(const char *name,
> >  
> >  return virCgroupNewMachineManual(name,
> >   drivername,
> > - pidleader,
> >   partition,
> >   controllers,
> >   group);
> > 
> 
> Beyond that - things seem reasonable. I usually defer to Martin or
> Peter for cgroup stuff though...
> 
> Another thought/addition/change would be to have virCgroupNewMachine
> return 'cgroup' rather than have it as the last parameter and then
> check vs. NULL for success/failure rather than 0/-1...
> 
> Weak ACK - hopefully either Peter/Martin can look. I think Peter in
> particular may be interested due to upcoming vCpu changes.
> 
> John



[libvirt] [PATCHv2 2/3] qemu: do not put a task into machine cgroup

2015-12-09 Thread Henning Schild
The machine cgroup is a superset, a parent to the emulator and vcpuX
cgroups. The parent cgroup should never have any tasks directly in it.
In fact the parent cpuset might contain way more cpus than the sum of
emulatorpin and vcpupins. So putting tasks in the superset will allow
them to run outside of the pinned cpus.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_cgroup.c  | 15 ---
 src/qemu/qemu_process.c | 12 ++--
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 7320046..85b8e4e 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -770,17 +770,6 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
-if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(priv->cgroup);
-virCgroupFree(&priv->cgroup);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-goto cleanup;
-}
-
  done:
 ret = 0;
  cleanup:
@@ -1151,6 +1140,10 @@ qemuSetupCgroupForEmulator(virDomainObjPtr vm)
 goto cleanup;
 }
 
+/* consider the first thread an emulator-thread */
+if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
+goto cleanup;
+
 virCgroupFree(&cgroup_emulator);
 return 0;
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f7eb2b6..cfe1da8 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4881,12 +4881,6 @@ int qemuProcessStart(virConnectPtr conn,
 if (qemuSetupCgroup(driver, vm, nnicindexes, nicindexes) < 0)
 goto cleanup;
 
-/* This must be done after cgroup placement to avoid resetting CPU
- * affinity */
-if (!vm->def->cputune.emulatorpin &&
-qemuProcessInitCpuAffinity(vm) < 0)
-goto cleanup;
-
 VIR_DEBUG("Setting domain security labels");
 if (virSecurityManagerSetAllLabel(driver->securityManager,
   vm->def, stdin_path) < 0)
@@ -4934,6 +4928,12 @@ int qemuProcessStart(virConnectPtr conn,
 if (qemuSetupCgroupForEmulator(vm) < 0)
 goto cleanup;
 
+/* This must be done after cgroup placement to avoid resetting CPU
+ * affinity */
+if (!vm->def->cputune.emulatorpin &&
+qemuProcessInitCpuAffinity(vm) < 0)
+goto cleanup;
+
 VIR_DEBUG("Setting affinity of emulator threads");
 if (qemuProcessSetEmulatorAffinity(vm) < 0)
 goto cleanup;
-- 
2.4.10



[libvirt] [PATCH 0/3] several cgroups/cpuset fixes

2015-11-13 Thread Henning Schild
Hi,

i already explained some of the cgroup problems in some detail so i
will not do that again.
https://www.redhat.com/archives/libvir-list/2015-October/msg00876.html

I managed to solve some of the problems in the current codebase, and
am now sharing the patches. But they are really just half of what i 
had to change to get libvirt to behave in a system with isolated cpus.

Other changes/hacks i am not sending here because they do not work for
the general case:
- create machine.slice before starting libvirtd (smaller than root)
  ... and hope it won't grow
- disabling cpuset.cpus inheritance in libvirtd
- allowing only xml with fully specified cputune
- set machine cpuset to (vcpupins | emulatorpin)

I am not sure how useful the individual fixes are, i am sending them
as concrete examples for the problems i described earlier. And i am
hoping that will start a discussion.

Henning

Henning Schild (3):
  util: cgroups do not implicitly add task to new machine cgroup
  qemu: do not put a task into machine cgroup
  qemu cgroups: move new threads to new cgroup after cpuset is set up

 src/lxc/lxc_cgroup.c   |  6 ++
 src/qemu/qemu_cgroup.c | 23 ++-
 src/util/vircgroup.c   | 22 --
 3 files changed, 20 insertions(+), 31 deletions(-)

-- 
2.4.10



[libvirt] [PATCH 3/3] qemu cgroups: move new threads to new cgroup after cpuset is set up

2015-11-13 Thread Henning Schild
Moving tasks to cgroups implied sched_setaffinity. Changing the cpus in
a set implies the same for all tasks in the group.
The old code put the thread into the cpuset inherited from the
machine cgroup, which allowed it to run outside of vcpupin for a short
while.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_cgroup.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 2c74a22..ab61a09 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -1030,10 +1030,6 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm)
true, &cgroup_vcpu) < 0)
 goto cleanup;
 
-/* move the thread for vcpu to sub dir */
-if (virCgroupAddTask(cgroup_vcpu, priv->vcpupids[i]) < 0)
-goto cleanup;
-
 if (period || quota) {
 if (qemuSetupCgroupVcpuBW(cgroup_vcpu, period, quota) < 0)
 goto cleanup;
@@ -1067,6 +1063,11 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm)
 if (qemuSetupCgroupCpusetCpus(cgroup_vcpu, cpumap) < 0)
 goto cleanup;
 }
+
+/* move the thread for vcpu to sub dir */
+if (virCgroupAddTask(cgroup_vcpu, priv->vcpupids[i]) < 0)
+goto cleanup;
+
 }
 virCgroupFree(&cgroup_vcpu);
 VIR_FREE(mem_mask);
@@ -1208,11 +1209,6 @@ qemuSetupCgroupForIOThreads(virDomainObjPtr vm)
true, &cgroup_iothread) < 0)
 goto cleanup;
 
-/* move the thread for iothread to sub dir */
-if (virCgroupAddTask(cgroup_iothread,
- def->iothreadids[i]->thread_id) < 0)
-goto cleanup;
-
 if (period || quota) {
 if (qemuSetupCgroupVcpuBW(cgroup_iothread, period, quota) < 0)
 goto cleanup;
@@ -1239,6 +1235,11 @@ qemuSetupCgroupForIOThreads(virDomainObjPtr vm)
 goto cleanup;
 }
 
+/* move the thread for iothread to sub dir */
+if (virCgroupAddTask(cgroup_iothread,
+ def->iothreadids[i]->thread_id) < 0)
+goto cleanup;
+
 virCgroupFree(&cgroup_iothread);
 }
 VIR_FREE(mem_mask);
-- 
2.4.10



[libvirt] [PATCH 2/3] qemu: do not put a task into machine cgroup

2015-11-13 Thread Henning Schild
The machine cgroup is a superset, a parent to the emulator and vcpuX
cgroups. The parent cgroup should never have any tasks directly in it.
In fact the parent cpuset might contain way more cpus than the sum of
emulatorpin and vcpupins. So putting tasks in the superset will allow
them to run outside of the pinned cpus.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/qemu/qemu_cgroup.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 28d2ca2..2c74a22 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -769,12 +769,6 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
-if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
-virCgroupRemove(priv->cgroup);
> > -virCgroupFree(&priv->cgroup);
-goto cleanup;
-}
-
  done:
 ret = 0;
  cleanup:
@@ -1145,6 +1139,10 @@ qemuSetupCgroupForEmulator(virDomainObjPtr vm)
 goto cleanup;
 }
 
+/* consider the first thread an emulator-thread */
+if (virCgroupAddTask(cgroup_emulator, vm->pid) < 0)
+goto cleanup;
+
 virCgroupFree(&cgroup_emulator);
 return 0;
 
-- 
2.4.10



[libvirt] [PATCH 1/3] util: cgroups do not implicitly add task to new machine cgroup

2015-11-13 Thread Henning Schild
virCgroupNewMachine used to add the pidleader to the newly created
machine cgroup. Do not do this implicitly anymore.

Signed-off-by: Henning Schild <henning.sch...@siemens.com>
---
 src/lxc/lxc_cgroup.c   |  6 ++
 src/qemu/qemu_cgroup.c |  6 ++
 src/util/vircgroup.c   | 22 --
 3 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/lxc/lxc_cgroup.c b/src/lxc/lxc_cgroup.c
index ad254e4..e5ac893 100644
--- a/src/lxc/lxc_cgroup.c
+++ b/src/lxc/lxc_cgroup.c
@@ -504,6 +504,12 @@ virCgroupPtr virLXCCgroupCreate(virDomainDefPtr def,
 ) < 0)
 goto cleanup;
 
+if (virCgroupAddTask(cgroup, initpid) < 0) {
+virCgroupRemove(cgroup);
+virCgroupFree(&cgroup);
+goto cleanup;
+}
+
 /* setup control group permissions for user namespace */
 if (def->idmap.uidmap) {
 if (virCgroupSetOwner(cgroup,
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index a8e0b8c..28d2ca2 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -769,6 +769,12 @@ qemuInitCgroup(virQEMUDriverPtr driver,
 goto cleanup;
 }
 
+if (virCgroupAddTask(priv->cgroup, vm->pid) < 0) {
+virCgroupRemove(priv->cgroup);
+virCgroupFree(&priv->cgroup);
+goto cleanup;
+}
+
  done:
 ret = 0;
  cleanup:
diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
index 0379c2e..a07f3c2 100644
--- a/src/util/vircgroup.c
+++ b/src/util/vircgroup.c
@@ -1669,16 +1669,6 @@ virCgroupNewMachineSystemd(const char *name,
 }
 }
 
-if (virCgroupAddTask(*group, pidleader) < 0) {
-virErrorPtr saved = virSaveLastError();
-virCgroupRemove(*group);
-virCgroupFree(group);
-if (saved) {
-virSetError(saved);
-virFreeError(saved);
-}
-}
-
 ret = 0;
  cleanup:
 virCgroupFree();
@@ -1701,7 +1691,6 @@ int virCgroupTerminateMachine(const char *name,
 static int
 virCgroupNewMachineManual(const char *name,
   const char *drivername,
-  pid_t pidleader,
   const char *partition,
   int controllers,
   virCgroupPtr *group)
@@ -1727,16 +1716,6 @@ virCgroupNewMachineManual(const char *name,
 group) < 0)
 goto cleanup;
 
-    if (virCgroupAddTask(*group, pidleader) < 0) {
-        virErrorPtr saved = virSaveLastError();
-        virCgroupRemove(*group);
-        virCgroupFree(group);
-        if (saved) {
-            virSetError(saved);
-            virFreeError(saved);
-        }
-    }
-
  done:
 ret = 0;
 
@@ -1783,7 +1762,6 @@ virCgroupNewMachine(const char *name,
 
 return virCgroupNewMachineManual(name,
  drivername,
- pidleader,
  partition,
  controllers,
  group);
-- 
2.4.10

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Again cgroups and isolcpus

2015-10-29 Thread Henning Schild
Hi folks,

I already started a discussion on the interaction of cgroups and
isolcpus a while ago. But now I believe I have a better understanding
of how the two interact, and I can describe problems that arise from
that.

The scenario: A machine that runs realtime tasks on pcpus reserved with
isolcpus. It also runs VMs with the help of libvirt. It might also run
realtime VMs with the help of libvirt.

Moving a task into a new cgroup/cpuset and some modifications of the
cpus in that set imply a setaffinity by the kernel. That affinity
setting will ignore isolcpus. The result is possible "interference by"
or "starvation of" these tasks.

Now let me describe one scenario where that implicit setaffinity
becomes a problem for our realtime system.
libvirt creates a superset called machine.slice and subsets called
emulator and vcpuX. By default machine.slice inherits from the root,
which contains all pcpus, including the isolated ones. Moving a task
into that superset will therefore place it on isolcpus, where it might
interfere or simply starve.
It turns out that a fresh qemu actually is put into that superset. That
is a bug that should be fixed, but let me address that one in another
mail.

My current point of view is that we need a strong mechanism to isolate
cpus. isolcpus just is not good enough. The measure of choice probably
is cpusets as well, and this time with the exclusive flag turned on.
That will stop every other cpuset user from messing around with
those cpus by accident.

I am thinking of one or more cpusets where isolated cpus are
parked and not used within this cgroup. Anyone wanting to use one of
them will have to take it out of there and explicitly put it into a
new set. Now if libvirt makes the mistake of having tasks running in
supersets, these tasks will spread to the newly added rt-cpu. Or new
tasks that run in supersets will end up on rt-cpus already in use. But
at least we have containment in libvirt and the VMs it spawned.
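The parking idea above can be sketched with plain cpuset filesystem
operations. This is only a sketch, assuming cgroup v1 with the cpuset
controller mounted at /sys/fs/cgroup/cpuset; the "rt-parked" name and
the helper function are made up for illustration, and on a live system
the kernel rejects cpu_exclusive with EINVAL for as long as any sibling
cpuset still overlaps the cpus being parked:

```shell
# Park a list of isolated cpus in an exclusive cpuset so that no sibling
# cpuset can claim them by accident.
#   $1 = cpu list (e.g. "2-3")
#   $2 = cpuset mount point (normally /sys/fs/cgroup/cpuset, needs root)
park_isolated_cpus() {
    cpus=$1
    root=$2
    mkdir -p "$root/rt-parked"
    # a cpuset needs both mems and cpus assigned before flags can be set;
    # inherit the memory nodes from the parent set
    cat "$root/cpuset.mems" > "$root/rt-parked/cpuset.mems"
    echo "$cpus" > "$root/rt-parked/cpuset.cpus"
    # cpu_exclusive is the "assert": sibling cpusets may no longer
    # overlap these cpus
    echo 1 > "$root/rt-parked/cpuset.cpu_exclusive"
}

# Example invocation (as root):
#   park_isolated_cpus 2-3 /sys/fs/cgroup/cpuset
```

Taking a cpu out for a VM would then be the mirror image: remove it from
rt-parked's cpuset.cpus and add it to the per-domain set.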

For alloc and free of rt-cpus I am planning to use libvirt hooks to
begin with; from what I read they should enable me to do what I need.
What do you guys think about the general idea to address the described
problem?

I will implement a prototype of the alloc/free of rt-cpus. My current
hope is that libvirt hooks can be abused for that.

I am thinking that at some point libvirt should be able to do that
without hooks. It should get a notion of reserved resources that are
currently parked in other cgroups. My current suspicion is that the
cpusets might just be the tip of the iceberg. -- for now I am running
libvirt without cgroups to keep my isolcpus free
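A qemu hook could drive that alloc/free. libvirt really does invoke
/etc/libvirt/hooks/qemu with the domain name and an operation argument
("prepare" before the domain starts, "release" once it is fully gone),
but the body below is hypothetical placeholder logic, not a tested
hook:

```shell
#!/bin/sh
# Sketch of /etc/libvirt/hooks/qemu.
# libvirt invokes the hook as: qemu <domain> <operation> <sub-op> <extra>
# This sketch only decides *when* to act; the actual cpu move between
# the parked cpuset and a per-domain cpuset is left as an echo
# placeholder.
handle_hook() {
    domain=$1
    op=$2
    case "$op" in
        prepare)
            # before the domain starts: take one cpu out of the parked set
            echo "alloc rt cpu for $domain"
            ;;
        release)
            # after the domain is gone: return its cpus to the parked set
            echo "free rt cpus of $domain"
            ;;
        *)
            : # start/started/stopped/migrate: nothing to do in this sketch
            ;;
    esac
}

handle_hook "$@"
```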

cgroups/cpusets offer a switch to make a cpu exclusive to a set. That
switch is great because it will act as an assert, a second line of
defense. Having seen how cpusets and migration mess around with
affinities, I guess realtime people have to insist on that second
line of defense, especially in times where cgroups are all over the
place.

In openstack one would actually say that a pcpu should be "dedicated".
That will result in a vcpupin on exactly one pcpu. Unfortunately one
meaning of "dedicated" gets lost in translation. It could otherwise be
used by libvirtd to set cpuset.cpu_exclusive in the vcpu cgroup.
And I am bringing that up here because I do not think libvirt allows
me to influence the cpu_exclusive flag for my vcpu cgroups.

Henning 



Re: [libvirt] cpu affinity, isolcpus and cgroups

2015-10-14 Thread Henning Schild
On Thu, 2 Jul 2015 17:27:21 +0100
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Thu, Jul 02, 2015 at 04:42:47PM +0200, Henning Schild wrote:
> > On Thu, 2 Jul 2015 15:18:46 +0100
> > "Daniel P. Berrange" <berra...@redhat.com> wrote:
> > 
> > > On Thu, Jul 02, 2015 at 04:02:58PM +0200, Henning Schild wrote:
> > > > Hi,
> > > > 
> > > > i am currently looking into realtime VMs using libvirt. My first
> > > > starting point was reserving a couple of cores using isolcpus
> > > > and later tuning the affinity to place my vcpus on the reserved
> > > > pcpus.
> > > > 
> > > > My first observation was that libvirt ignores isolcpus. Affinity
> > > > masks of new qemus will default to all cpus and will not be
> > > > inherited from libvirtd. A comment in the code suggests that
> > > > this is done on purpose.
> > > 
> > > Ignore realtime + isolcpus for a minute. It is not unreasonable
> > > for the system admin to decide system services should be
> > > restricted to run on a certain subset of CPUs. If we let VMs
> > > inherit the CPU pinning on libvirtd, we'd be accidentally
> > > confining VMs to a subset of CPUs too. With new cgroups layout,
> > > libvirtd lives in a separate cgroups tree /system.slice, while
> > > VMs live in /machine.slice. So for both these reasons, when
> > > starting VMs, we explicitly ignore any affinity libvirtd has and
> > > set VMs mask to allow any CPU.

Since I started making heavy use of realtime priorities on 100% busy
threads I have been running into starvation problems.
I just found a stuck qemu that still had an affinity of all 'f's and no
high priority yet. It got unlucky and ended up in the scheduling queue
of one of my busy cores ... that qemu never came to life.

I do not remember the details of the last time we discussed the topic;
the take-away was that libvirt itself does not do policy. The policy
(affinity and prio) comes from nova, but there should be no time where
the qemu is already running with the policy not yet applied. That can
cause starvation and disturbance of realtime workloads.
To me it seems there is such a time window. If there is, I need a way
to limit such new-born hypervisors to a cpuset; actually they should
just inherit it from libvirtd ... isolcpus.

> > Sure, that was my first guess as well. Still i wanted to raise the
> > topic again from the realtime POV.
> > I am using a pretty recent libvirt from git but did not come across
> > the system.slice yet. Might be a matter of configuration/invocation
> > of libvirtd.
> 
> Oh, I should mention that I'm referring to OS that use systemd
> for their init system here, not legacy sysvinit
> 
> FWIW our cgroups layout is described here
> 
>   http://libvirt.org/cgroups.html

The system.slice does not have a libvirtd.service in my case but my
libvirtd is running in a screen and not started using systemd. Might
that be causing the problem?

> > 
> > > > After that i changed the code to use only the available cpus by
> > > > default. But taskset was still showing all 'f's on my qemus.
> > > > Then i traced my change down to sched_setaffinity assuming that
> > > > some other mechanism might have reverted my hack, but it is
> > > > still in place.
> > > 
> > > From the libvirt POV, we can't tell whether the admin set isolcpus
> > > because they want to reserve those CPUs only for VMs, or because
> > > they want to stop VMs using those CPUs by default. As such libvirt
> > > does not try to interpret isolcpus at all, it leaves it upto a
> > > higher level app to decide on this policy.
> > 
> > I know, you have to tell libvirt that the reservation is actually
> > for libvirt. My idea was to introduce a config option in libvirt
> > and maybe sanity check it by looking at whether the pcpus are
> > actually reserved. Rik recently posted a patch to allow easy
> > programmatic checking of isolcpus via sysfs.
> 
> In libvirt we try to have a general principle that libvirt will
> provide the mechanism but not implement usage policy. So if we
> follow a strict interpretation here, then applying CPU mask
> based on isolcpus would be out of scope for libvirt, since we
> expose a sufficiently flexible mechanism to implement any
> desired policy at a higher level.
> 
> > > In the case of OpenStack, the /etc/nova/nova.conf allows a config
> > > setting  'vcpu_pin_set' to say what set of CPUs VMs should be
> > > allowed to run on, and nova will then update the libvirt XML when
> > > star

[libvirt] cpu affinity, isolcpus and cgroups

2015-07-02 Thread Henning Schild
Hi,

I am currently looking into realtime VMs using libvirt. My first
starting point was reserving a couple of cores using isolcpus and later
tuning the affinity to place my vcpus on the reserved pcpus.

My first observation was that libvirt ignores isolcpus. Affinity masks
of new qemus will default to all cpus and will not be inherited from
libvirtd. A comment in the code suggests that this is done on purpose.

After that I changed the code to use only the available cpus by
default. But taskset was still showing all 'f's on my qemus. Then I
traced my change down to sched_setaffinity, assuming that some other
mechanism might have reverted my hack, but it is still in place.

Libvirt is setting up cgroups and now my suspicion is that cgroups and
taskset might not work well together.
 /sys/fs/cgroup/cpu/machine.slice/machine-qemu\x2dvm1.scope/vcpu0#
 cpuacct.usage_percpu
 247340587 50851635 89631114 23383025 412639264 1241965 55442753 19923
 14093629 15863859 27403280 1292195745 82031088 53690508 135826421
 124915000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Looks like the last 16 cores are not used.

But if I use taskset to ask for the affinity mask I get all 32 cpus.

  taskset -p `cat tasks`
 pid 12905's current affinity mask: 

I know that is not strictly a libvirt question but also a kernel one;
still, you guys are probably able to point me to what I am missing
here.

 Linux 3.18.11+ #4 SMP PREEMPT RT
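One way to read the numbers above: the affinity mask and the cpuset
cgroup are two independent mechanisms, and the kernel only runs a task
on their intersection, so taskset can report all 32 cpus while the
cpuset confines the task to 16. A quick way to inspect both for a pid
(the pid stand-in and the commented cgroup path are illustrative):

```shell
pid=$$   # stand-in for the qemu/vcpu thread pid, e.g. 12905 above

# what taskset reports: the task's mask from sched_getaffinity(),
# as a cpu list rather than a hex mask
affinity=$(awk '/^Cpus_allowed_list/ {print $2}' "/proc/$pid/status")
echo "affinity list: $affinity"

# what the cpuset cgroup allows (cgroup v1 layout, as in the path above):
#   cat /proc/$pid/cpuset
#   cat "/sys/fs/cgroup/cpuset$(cat /proc/$pid/cpuset)/cpuset.cpus"
# the task is scheduled only on the intersection of the two sets
```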

regards,
Henning



Re: [libvirt] cpu affinity, isolcpus and cgroups

2015-07-02 Thread Henning Schild
On Thu, 2 Jul 2015 15:18:46 +0100
"Daniel P. Berrange" <berra...@redhat.com> wrote:

> On Thu, Jul 02, 2015 at 04:02:58PM +0200, Henning Schild wrote:
> > Hi,
> >
> > i am currently looking into realtime VMs using libvirt. My first
> > starting point was reserving a couple of cores using isolcpus and
> > later tuning the affinity to place my vcpus on the reserved pcpus.
> >
> > My first observation was that libvirt ignores isolcpus. Affinity
> > masks of new qemus will default to all cpus and will not be
> > inherited from libvirtd. A comment in the code suggests that this
> > is done on purpose.
>
> Ignore realtime + isolcpus for a minute. It is not unreasonable for
> the system admin to decide system services should be restricted to
> run on a certain subset of CPUs. If we let VMs inherit the CPU
> pinning on libvirtd, we'd be accidentally confining VMs to a subset
> of CPUs too. With new cgroups layout, libvirtd lives in a separate
> cgroups tree /system.slice, while VMs live in /machine.slice. So
> for both these reasons, when starting VMs, we explicitly ignore
> any affinity libvirtd has and set VMs mask to allow any CPU.

Sure, that was my first guess as well. Still I wanted to raise the
topic again from the realtime POV.
I am using a pretty recent libvirt from git but did not come across the
system.slice yet. Might be a matter of configuration/invocation of
libvirtd.

> > After that i changed the code to use only the available cpus by
> > default. But taskset was still showing all 'f's on my qemus. Then i
> > traced my change down to sched_setaffinity assuming that some other
> > mechanism might have reverted my hack, but it is still in place.
>
> From the libvirt POV, we can't tell whether the admin set isolcpus
> because they want to reserve those CPUs only for VMs, or because
> they want to stop VMs using those CPUs by default. As such libvirt
> does not try to interpret isolcpus at all, it leaves it upto a
> higher level app to decide on this policy.

I know, you have to tell libvirt that the reservation is actually for
libvirt. My idea was to introduce a config option in libvirt and maybe
sanity check it by looking at whether the pcpus are actually reserved.
Rik recently posted a patch to allow easy programmatic checking of
isolcpus via sysfs.
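Assuming that patch landed, the programmatic check presumably boils
down to reading a single sysfs attribute; the attribute may be absent
on older kernels, hence the guard in this sketch:

```shell
# the cpu list reserved via isolcpus=, exported read-only by the kernel
isolated_file=/sys/devices/system/cpu/isolated
if [ -r "$isolated_file" ]; then
    isolated=$(cat "$isolated_file")   # e.g. "2-5" when booted with isolcpus=2-5
else
    isolated=""                        # kernel without the sysfs attribute
fi
echo "isolated cpus: '$isolated'"      # empty means no isolcpus reservation
```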

> In the case of OpenStack, the /etc/nova/nova.conf allows a config
> setting 'vcpu_pin_set' to say what set of CPUs VMs should be allowed
> to run on, and nova will then update the libvirt XML when starting
> each guest.

I see, would it not still make sense to have that setting centrally in
libvirt? I am thinking about people not using nova but virsh or
virt-manager.
 
> > Libvirt is setting up cgroups and now my suspicion is that cgroups
> > and taskset might not work well together.
> >  /sys/fs/cgroup/cpu/machine.slice/machine-qemu\x2dvm1.scope/vcpu0#
> >  cpuacct.usage_percpu
> >  247340587 50851635 89631114 23383025 412639264 1241965 55442753
> >  19923 14093629 15863859 27403280 1292195745 82031088 53690508
> >  135826421 124915000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> >
> > Looks like the last 16 cores are not used.
> >
> > But if i use taskset to ask for the affinity mask i get all 32 cpus.
> >
> >   taskset -p `cat tasks`
> >  pid 12905's current affinity mask: 
> >
> > I know that is not strictly libvirt but also a kernel question,
> > still you guys are probably able to point me to what i am missing
> > here.
> >
> >  Linux 3.18.11+ #4 SMP PREEMPT RT
>
> BTW, I dropped Osier from the CC list, since he no longer works
> at Red Hat.

Yeah, the reply from my mailserver suggested that.

Henning
