Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-05 Thread Lennart Poettering
On Fri, 05.08.16 12:33, Dr. Werner Fink (wer...@suse.de) wrote:

> > Yeah, to make this clear: I do not blame libvirt for this borkedness
> > at all. I blame the kernel.
> 
> Hmmm ... IMHO it is useless to pass the buck from kernel to user space
> as well do the same from user space back to kernel.  I've an open bug
> from a customer and this bug requires a solution.  AFAICS libvirt can
> not do this but machined could do.

machined certainly can't. It doesn't do cgroup stuff at all. It just
keeps track of local containers and VMs. cgroup management is done by
systemd in PID1 itself. So no, machined certainly can't.

And in systemd itself we are very conservative on working around
broken kernel behaviour. Note that Tejun (the kernel's cgroup
maintainer) actually acknowledges that cpuset has broken semantics
there.

If you need a quick solution: for many cases, though not for all,
CPUAffinity= is actually enough.
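
As an illustration of that quick route, here is a minimal sketch of a
helper that prints a drop-in setting CPUAffinity= for a service. The
function name affinity_dropin and the unit name example.service are made
up for this example and are not part of systemd. As noted elsewhere in
this thread, the resulting affinity mask is still constrained by any
cpuset the process lives in.

```sh
# Hypothetical helper: print a drop-in that pins a service to the given CPUs.
# Possible usage (the unit name is an assumption; run as root):
#   mkdir -p /etc/systemd/system/example.service.d
#   affinity_dropin 0 1 2 3 > /etc/systemd/system/example.service.d/affinity.conf
#   systemctl daemon-reload && systemctl restart example.service
affinity_dropin() {
    # "$*" joins the CPU numbers with single spaces, a list form that
    # CPUAffinity= accepts.
    printf '[Service]\nCPUAffinity=%s\n' "$*"
}
```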

If you require the correct solution, please work with Tejun to fix the
cpuset semantics, and once that has happened we can start making use of
this in systemd.

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-05 Thread Dr. Werner Fink
On Fri, Aug 05, 2016 at 11:07:50AM +0200, Lennart Poettering wrote:
> On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:
> 
> > Hi Lennart and Werner,
> > 
> > On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > > current git) that repeating CPU hotplug events (offline/online). The
> > > > root cause is that cpuset.cpus become not restored by machined.
> > > > Please note that libvirt can not do this as it is not allowed to do
> > > > so.
> > > 
> > > This is a limitation of the kernel cpuset interface, and it's one of
> > > the reasons we do not expose cpusets at all in systemd right
> > > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > > affinity controls exposed via CPUAffinity= in systemd, which do much
> > > of the same, but have less borked semantics.
> > > 
> > > We'd like to support cpusets directly in systemd, but we don't do this
> > > as long as the kernel interfaces are as borked as they are. For
> > > example, cpusets are flushed out entirely currently when the system
> > > goes through a suspend/resume cycle.
> > > 
> > > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > > that.
> > 
> > I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> > the case, then this is constrained by cpuset too as mentioned here:
> > 
> > http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
> > 
> > As long as the machine.slice cpuset isn't restored after onlining a CPU again,
> > then libvirt won't be able to set either the affinity or the cpuset if it
> > contains that CPU.
> > 
> > May be the kernel's behaviour is weird and can be discussed, but libvirt can't
> > do anything on that bug.
> 
> Yeah, to make this clear: I do not blame libvirt for this borkedness
> at all. I blame the kernel.

Hmmm ... IMHO it is useless to pass the buck from kernel to user space
and then pass it back from user space to the kernel.  I have an open bug
from a customer and this bug requires a solution.  AFAICS libvirt cannot
do this but machined could.

Werner

-- 
  "Having a smoking section in a restaurant is like having
  a peeing section in a swimming pool." -- Edward Burr




Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-05 Thread Daniel P. Berrange
On Fri, Aug 05, 2016 at 12:33:21PM +0200, Dr. Werner Fink wrote:
> On Fri, Aug 05, 2016 at 11:07:50AM +0200, Lennart Poettering wrote:
> > On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:
> > 
> > > Hi Lennart and Werner,
> > > 
> > > On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > > > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > > > current git) that repeating CPU hotplug events (offline/online). The
> > > > > root cause is that cpuset.cpus become not restored by machined.
> > > > > Please note that libvirt can not do this as it is not allowed to do
> > > > > so.
> > > > 
> > > > This is a limitation of the kernel cpuset interface, and it's one of
> > > > the reasons we do not expose cpusets at all in systemd right
> > > > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > > > affinity controls exposed via CPUAffinity= in systemd, which do much
> > > > of the same, but have less borked semantics.
> > > > 
> > > > We'd like to support cpusets directly in systemd, but we don't do this
> > > > as long as the kernel interfaces are as borked as they are. For
> > > > example, cpusets are flushed out entirely currently when the system
> > > > goes through a suspend/resume cycle.
> > > > 
> > > > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > > > that.
> > > 
> > > I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> > > the case, then this is constrained by cpuset too as mentioned here:
> > > 
> > > http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
> > > 
> > > As long as the machine.slice cpuset isn't restored after onlining a CPU again,
> > > then libvirt won't be able to set either the affinity or the cpuset if it
> > > contains that CPU.
> > > 
> > > May be the kernel's behaviour is weird and can be discussed, but libvirt can't
> > > do anything on that bug.
> > 
> > Yeah, to make this clear: I do not blame libvirt for this borkedness
> > at all. I blame the kernel.
> 
> Hmmm ... IMHO it is useless to pass the buck from kernel to user space
> as well do the same from user space back to kernel.  I've an open bug
> from a customer and this bug requires a solution.  AFAICS libvirt can
> not do this but machined could do.

This is not simply a problem with virtual machines; it affects any application
that uses the cpuset controller, and VMs are just one such user. So it would
be inappropriate to do it in machined.

Fixing it in userspace is complicated by the fact that different levels or
branches in the cgroup hierarchy are managed by different applications, with
no single application having a global view. Even if systemd itself did
have support for the cpuset controller, it would still not have a global
view of all cgroups, as applications can create further child cgroups
below the groups managed by systemd, which systemd doesn't track.

Trying to restore the correct CPU affinity after hotplug would thus require that
multiple userspace applications all be aware of the problem and contain
logic to fix their part of the hierarchy. This is further complicated by
the ordering constraints that would require top levels to be fixed before
child levels.
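
To illustrate only the ordering constraint, here is a minimal sketch that
lists the directories of a cpuset hierarchy with every parent before its
children, which is the order in which cpuset.cpus would have to be
rewritten. It assumes a cgroup v1 cpuset mount like the one in the
original report; the function name cpuset_dirs_top_down is made up for
this example, and the sketch does nothing about the races or the
multiple-manager problem described here.

```sh
# Hypothetical helper: print all cgroup directories below (and including)
# the given root, parents before children.
# Possible usage:
#   cpuset_dirs_top_down /sys/fs/cgroup/cpuset/machine.slice
cpuset_dirs_top_down() {
    # A parent path is a strict prefix of each child path, so a bytewise
    # sort always places the parent first.
    find "$1" -type d | LC_ALL=C sort
}
```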

Bearing all this in mind, fixing it in userspace is an incredibly hard
problem which will always be liable to race conditions between applications.

The only practical choices are a) not to use the cpuset controller
at all, or b) to fix the kernel so that it maintains 2 distinct bitmaps,
one for the set of online CPUs, and one for the configured affinity in the
cpuset, and thus avoids throwing away data on CPU unplug/plug.
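
The idea behind option b) can be sketched as a small simulation: keep the
configured CPU list untouched and derive the effective set as its
intersection with the online set. This is an illustration in shell, not
kernel code; the function names expand_cpulist and effective_cpus are
made up for this example.

```sh
# Expand a kernel-style CPU list such as "0-2,5" into one CPU per line.
expand_cpulist() (
    IFS=,
    for part in $1; do
        [ -n "$part" ] || continue
        cpu=${part%-*}   # start of the range (or the single CPU)
        end=${part#*-}   # end of the range (or the single CPU)
        while [ "$cpu" -le "$end" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
)

# effective_cpus CONFIGURED ONLINE
# Print the CPUs that are both configured and online. The configured list
# itself is never modified, so nothing is lost when a CPU goes offline.
effective_cpus() (
    online=" $(expand_cpulist "$2" | tr '\n' ' ')"
    result=
    for cpu in $(expand_cpulist "$1"); do
        case $online in
            *" $cpu "*) result="$result${result:+ }$cpu" ;;
        esac
    done
    echo "$result"
)
```

With a configured list of 0-3, effective_cpus 0-3 0-2 yields the CPUs
0 1 2 while CPU 3 is offline, and effective_cpus 0-3 0-3 yields 0 1 2 3
again once it is back, without anyone having to rewrite the
configuration.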

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-05 Thread Lennart Poettering
On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:

> Hi Lennart and Werner,
> 
> On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > current git) that repeating CPU hotplug events (offline/online). The
> > > root cause is that cpuset.cpus become not restored by machined.
> > > Please note that libvirt can not do this as it is not allowed to do
> > > so.
> > 
> > This is a limitation of the kernel cpuset interface, and it's one of
> > the reasons we do not expose cpusets at all in systemd right
> > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > affinity controls exposed via CPUAffinity= in systemd, which do much
> > of the same, but have less borked semantics.
> > 
> > We'd like to support cpusets directly in systemd, but we don't do this
> > as long as the kernel interfaces are as borked as they are. For
> > example, cpusets are flushed out entirely currently when the system
> > goes through a suspend/resume cycle.
> > 
> > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > that.
> 
> I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> the case, then this is constrained by cpuset too as mentioned here:
> 
> http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
> 
> As long as the machine.slice cpuset isn't restored after onlining a CPU again,
> then libvirt won't be able to set either the affinity or the cpuset if it
> contains that CPU.
> 
> May be the kernel's behaviour is weird and can be discussed, but libvirt can't
> do anything on that bug.

Yeah, to make this clear: I do not blame libvirt for this borkedness
at all. I blame the kernel.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-04 Thread Cedric Bosdonnat
Hi Lennart and Werner,

On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > problem with v228 (and I guess this is also later AFAICS from logs of
> > current git) that repeating CPU hotplug events (offline/online). The
> > root cause is that cpuset.cpus become not restored by machined.
> > Please note that libvirt can not do this as it is not allowed to do
> > so.
> 
> This is a limitation of the kernel cpuset interface, and it's one of
> the reasons we do not expose cpusets at all in systemd right
> now. Thankfully, there's an alternative to cpusets, which is the CPU
> affinity controls exposed via CPUAffinity= in systemd, which do much
> of the same, but have less borked semantics.
> 
> We'd like to support cpusets directly in systemd, but we don't do this
> as long as the kernel interfaces are as borked as they are. For
> example, cpusets are flushed out entirely currently when the system
> goes through a suspend/resume cycle.
> 
> If libvirt has hook-ups with cpuset, then it bypasses systemd for
> that.

I guess by CPU affinity you mean sched_setaffinity and friends. If that is
the case, then this is constrained by cpuset too as mentioned here:

http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53

As long as the machine.slice cpuset isn't restored after onlining a CPU again,
then libvirt won't be able to set either the affinity or the cpuset if it
contains that CPU.

Maybe the kernel's behaviour is weird and can be discussed, but libvirt can't
do anything about that bug.

--
Cedric


Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-03 Thread Lennart Poettering
On Wed, 03.08.16 14:46, Dr. Werner Fink (wer...@suse.de) wrote:

> Hi,
> 
> problem with v228 (and I guess this is also later AFAICS from logs of
> current git) that repeating CPU hotplug events (offline/online). The
> root cause is that cpuset.cpus become not restored by machined.
> Please note that libvirt can not do this as it is not allowed to do
> so.

This is a limitation of the kernel cpuset interface, and it's one of
the reasons we do not expose cpusets at all in systemd right
now. Thankfully, there's an alternative to cpusets, which is the CPU
affinity controls exposed via CPUAffinity= in systemd, which do much
of the same, but have less borked semantics.

We'd like to support cpusets directly in systemd, but we don't do this
as long as the kernel interfaces are as borked as they are. For
example, cpusets are flushed out entirely currently when the system
goes through a suspend/resume cycle.

If libvirt has hook-ups with cpuset, then it bypasses systemd for
that.

Either way, this is not a systemd issue at all.

> PS: Using https://github.com/systemd/systemd/issues/new seems to be very limited
> with > *NOTE: Do not submit bug reports about anything but the two most recently
>  > released systemd versions upstream!*

Yes, we do upstream maintenance upstream only. Downstream
maintenance needs to happen downstream.

Lennart

-- 
Lennart Poettering, Red Hat


[systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start

2016-08-03 Thread Dr. Werner Fink
Hi,

there is a problem with v228 (and, AFAICS from the logs of current git,
with later versions as well) after repeated CPU hotplug events
(offline/online): a KVM guest with vCPU pinning fails to start. The
root cause is that cpuset.cpus is not restored by machined.
Please note that libvirt cannot do this as it is not allowed to do so.

Steps to reproduce:

  1. Configure vCPU pinning.
     # virsh vcpupin guest-os 0 0-3 --config
  2. Boot the guest.
     # virsh start guest-os
  3. Shutdown the guest.
     # virsh shutdown guest-os
  4. Offline one of host CPUs.
     # echo 0 > /sys/devices/system/cpu/cpu3/online
  5. Online the host CPU again.
     # echo 1 > /sys/devices/system/cpu/cpu3/online
  6. Boot the guest again.
     # virsh start guest-os

Actual result:

  error: Failed to start domain guest-os
  error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d2\x2dsles12sp2\x2dbeta3.scope/vcpu0/cpuset.cpus': Permission denied

Expected result:

  The KVM guest boots without errors.

This can be worked around by hand with

  # echo 0-31 > /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus

since libvirt can't touch the cpuset of machine.slice, as
this one is owned by machined.
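
Before applying such a manual fix, it can help to see which online CPUs
the slice's cpuset has lost. The following is a minimal sketch of such a
check; the function names expand_cpulist and missing_cpus are made up
for this example, and the paths in the usage comment assume the cgroup
v1 layout shown in the error message above.

```sh
# Expand a kernel-style CPU list such as "0-2,5" into one CPU per line.
expand_cpulist() (
    IFS=,
    for part in $1; do
        [ -n "$part" ] || continue
        cpu=${part%-*}   # start of the range (or the single CPU)
        end=${part#*-}   # end of the range (or the single CPU)
        while [ "$cpu" -le "$end" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
)

# missing_cpus ONLINE_FILE CPUSET_FILE
# Print the CPUs listed in ONLINE_FILE that are absent from CPUSET_FILE.
# Possible usage:
#   missing_cpus /sys/devices/system/cpu/online \
#                /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus
missing_cpus() (
    allowed=" $(expand_cpulist "$(cat "$2")" | tr '\n' ' ')"
    result=
    for cpu in $(expand_cpulist "$(cat "$1")"); do
        case $allowed in
            *" $cpu "*) ;;                                # still in the cpuset
            *) result="$result${result:+ }$cpu" ;;        # dropped from it
        esac
    done
    echo "$result"
)
```

An empty result means the cpuset still covers every online CPU; after
the offline/online cycle from the steps above one would expect the
re-onlined CPU to be reported.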

This problem has also been discussed upstream in libvirt

  https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html

and seems to be a well-known problem elsewhere too:

  https://bugzilla.redhat.com/show_bug.cgi?id=838070

From the kernel's side this seems to be behavior by design; the comment right
before cpuset_hotplug_workfn() says:

  /*
   * [...]
   * Non-root cpusets are only affected by offlining.  If any CPUs or memory
   * nodes have been taken down, cpuset_hotplug_update_tasks() is invoked on
   * all descendants.
   * [...]
   */

Werner

PS: Using https://github.com/systemd/systemd/issues/new seems to be very limited
with > *NOTE: Do not submit bug reports about anything but the two most recently
 > released systemd versions upstream!*
-- 
  "Having a smoking section in a restaurant is like having
  a peeing section in a swimming pool." -- Edward Burr

