Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
On Fri, 05.08.16 12:33, Dr. Werner Fink (wer...@suse.de) wrote:

> > Yeah, to make this clear: I do not blame libvirt for this borkedness
> > at all. I blame the kernel.
>
> Hmmm ... IMHO it is useless to pass the buck from kernel to user space
> as well do the same from user space back to kernel. I've an open bug
> from a customer and this bug requires a solution. AFAICS libvirt can
> not do this but machined could do.

machined certainly can't. It doesn't do cgroup stuff at all. It just
keeps track of local containers and VMs. cgroup management is done by
systemd in PID 1 itself. So no, machined certainly can't.

And in systemd itself we are very conservative about working around
broken kernel behaviour. Note that Tejun (the kernel's cgroup
maintainer) actually acknowledges that cpuset has broken semantics
there.

If you need a quick solution: CPUAffinity= is actually enough for many
cases, though not for all. If you require the correct solution, please
work with Tejun to fix the cpuset semantics, and once that has happened
we can start making use of it in systemd.

Lennart

--
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
On Fri, Aug 05, 2016 at 11:07:50AM +0200, Lennart Poettering wrote:
> On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:
>
> > Hi Lennart and Werner,
> >
> > On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > > current git) that repeating CPU hotplug events (offline/online). The
> > > > root cause is that cpuset.cpus become not restored by machined.
> > > > Please note that libvirt can not do this as it is not allowed to do
> > > > so.
> > >
> > > This is a limitation of the kernel cpuset interface, and it's one of
> > > the reasons we do not expose cpusets at all in systemd right
> > > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > > affinity controls exposed via CPUAffinity= in systemd, which do much
> > > of the same, but have less borked semantics.
> > >
> > > We'd like to support cpusets directly in systemd, but we don't do this
> > > as long as the kernel interfaces are as borked as they are. For
> > > example, cpusets are flushed out entirely currently when the system
> > > goes through a suspend/resume cycle.
> > >
> > > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > > that.
> >
> > I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> > the case, then this is constrained by cpuset too as mentioned here:
> >
> > http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
> >
> > As long as the machine.slice cpuset isn't restored after onlining a CPU
> > again, then libvirt won't be able to set either the affinity or the
> > cpuset if it contains that CPU.
> >
> > May be the kernel's behaviour is weird and can be discussed, but libvirt
> > can't do anything on that bug.
>
> Yeah, to make this clear: I do not blame libvirt for this borkedness
> at all. I blame the kernel.

Hmmm ... IMHO it is useless to pass the buck from kernel to user space,
and just as useless to pass it back from user space to kernel. I have an
open bug from a customer and this bug requires a solution. AFAICS
libvirt cannot do this, but machined could.

Werner

--
"Having a smoking section in a restaurant is like having
 a peeing section in a swimming pool." -- Edward Burr
Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
On Fri, Aug 05, 2016 at 12:33:21PM +0200, Dr. Werner Fink wrote:
> On Fri, Aug 05, 2016 at 11:07:50AM +0200, Lennart Poettering wrote:
> > On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:
> >
> > > Hi Lennart and Werner,
> > >
> > > On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > > > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > > > current git) that repeating CPU hotplug events (offline/online). The
> > > > > root cause is that cpuset.cpus become not restored by machined.
> > > > > Please note that libvirt can not do this as it is not allowed to do
> > > > > so.
> > > >
> > > > This is a limitation of the kernel cpuset interface, and it's one of
> > > > the reasons we do not expose cpusets at all in systemd right
> > > > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > > > affinity controls exposed via CPUAffinity= in systemd, which do much
> > > > of the same, but have less borked semantics.
> > > >
> > > > We'd like to support cpusets directly in systemd, but we don't do this
> > > > as long as the kernel interfaces are as borked as they are. For
> > > > example, cpusets are flushed out entirely currently when the system
> > > > goes through a suspend/resume cycle.
> > > >
> > > > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > > > that.
> > >
> > > I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> > > the case, then this is constrained by cpuset too as mentioned here:
> > >
> > > http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
> > >
> > > As long as the machine.slice cpuset isn't restored after onlining a CPU
> > > again, then libvirt won't be able to set either the affinity or the
> > > cpuset if it contains that CPU.
> > >
> > > May be the kernel's behaviour is weird and can be discussed, but libvirt
> > > can't do anything on that bug.
> >
> > Yeah, to make this clear: I do not blame libvirt for this borkedness
> > at all. I blame the kernel.
>
> Hmmm ... IMHO it is useless to pass the buck from kernel to user space
> as well do the same from user space back to kernel. I've an open bug
> from a customer and this bug requires a solution. AFAICS libvirt can
> not do this but machined could do.

It is not simply a problem wrt virtual machines; it affects any
application which is using the cpuset controller. VMs are just one such
user. So it would be inappropriate to do it in machined.

Fixing it in userspace is complicated by the fact that different levels
or branches in the cgroup hierarchy are managed by different
applications, with no single application having a single world view.
Even if systemd itself did have support for the cpuset controller, it
would still not have a global view of all cgroups, as applications can
create further child cgroups below the groups managed by systemd, which
systemd doesn't track.

Trying to restore the correct CPU affinity after hotplug would thus
require that multiple userspace applications all be aware of the problem
and contain logic to fix their part of the hierarchy. This is further
complicated by the ordering constraints that would require top levels to
be fixed before child levels.

Bearing all this in mind, fixing it in userspace is an incredibly hard
problem which will always be liable to race conditions between
applications.

The only choices that are practical are a) not use the cpuset controller
at all, or b) fix the kernel so that it maintains 2 distinct bitmaps,
one for the set of online CPUs, and one for the configured affinity in
the cpuset, and thus avoids throwing away data on CPU unplug/plug.

Regards,
Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
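Daniel's option b) can be illustrated with a small model. This is only a sketch of the idea, not kernel code: the class names, method names and the use of Python sets in place of CPU bitmaps are assumptions made for the example. The first class models the behaviour complained about in this thread (one mask that is shrunk on offline and never grown again); the second keeps the configured mask separate from the online mask and derives the effective mask on demand.

```python
class SingleMaskCpuset:
    """Hypothetical model of the reported behaviour: one mask, shrunk on offline."""

    def __init__(self, cpus):
        self.cpus = set(cpus)

    def on_offline(self, cpu):
        # The configured value itself is overwritten, so the CPU is forgotten.
        self.cpus.discard(cpu)

    def on_online(self, cpu):
        # Nothing is restored: the model has no record of the old setting.
        pass

    def effective(self):
        return set(self.cpus)


class TwoMaskCpuset:
    """Hypothetical model of option b): configured and online masks kept apart."""

    def __init__(self, cpus, online):
        self.configured = set(cpus)  # what was requested for this cpuset
        self.online = set(online)    # what the host currently offers

    def on_offline(self, cpu):
        self.online.discard(cpu)

    def on_online(self, cpu):
        self.online.add(cpu)

    def effective(self):
        # Derived on demand, so the configuration is never thrown away.
        return self.configured & self.online


if __name__ == "__main__":
    old = SingleMaskCpuset({0, 1, 2, 3})
    new = TwoMaskCpuset({0, 1, 2, 3}, online=range(4))
    for model in (old, new):
        model.on_offline(3)
        model.on_online(3)
        print(type(model).__name__, sorted(model.effective()))
```

In the single-mask model CPU 3 stays lost after the offline/online cycle, while the two-mask model returns to the full configured set without any userspace component having to repair the hierarchy.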
Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
On Thu, 04.08.16 16:19, Cedric Bosdonnat (cbosdon...@suse.com) wrote:

> Hi Lennart and Werner,
>
> On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> > On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > > problem with v228 (and I guess this is also later AFAICS from logs of
> > > current git) that repeating CPU hotplug events (offline/online). The
> > > root cause is that cpuset.cpus become not restored by machined.
> > > Please note that libvirt can not do this as it is not allowed to do
> > > so.
> >
> > This is a limitation of the kernel cpuset interface, and it's one of
> > the reasons we do not expose cpusets at all in systemd right
> > now. Thankfully, there's an alternative to cpusets, which is the CPU
> > affinity controls exposed via CPUAffinity= in systemd, which do much
> > of the same, but have less borked semantics.
> >
> > We'd like to support cpusets directly in systemd, but we don't do this
> > as long as the kernel interfaces are as borked as they are. For
> > example, cpusets are flushed out entirely currently when the system
> > goes through a suspend/resume cycle.
> >
> > If libvirt has hook-ups with cpuset, then it bypasses systemd for
> > that.
>
> I guess by CPU affinity you mean sched_setaffinity and friends. If that is
> the case, then this is constrained by cpuset too as mentioned here:
>
> http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53
>
> As long as the machine.slice cpuset isn't restored after onlining a CPU again,
> then libvirt won't be able to set either the affinity or the cpuset if it
> contains that CPU.
>
> May be the kernel's behaviour is weird and can be discussed, but libvirt can't
> do anything on that bug.

Yeah, to make this clear: I do not blame libvirt for this borkedness
at all. I blame the kernel.

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
Hi Lennart and Werner,

On Wed, 2016-08-03 at 16:56 +0200, Lennart Poettering wrote:
> On Wed, 03.08.16 14:46, Dr. Werner Fink (werner at suse.de) wrote:
> > problem with v228 (and I guess this is also later AFAICS from logs of
> > current git) that repeating CPU hotplug events (offline/online). The
> > root cause is that cpuset.cpus become not restored by machined.
> > Please note that libvirt can not do this as it is not allowed to do
> > so.
>
> This is a limitation of the kernel cpuset interface, and it's one of
> the reasons we do not expose cpusets at all in systemd right
> now. Thankfully, there's an alternative to cpusets, which is the CPU
> affinity controls exposed via CPUAffinity= in systemd, which do much
> of the same, but have less borked semantics.
>
> We'd like to support cpusets directly in systemd, but we don't do this
> as long as the kernel interfaces are as borked as they are. For
> example, cpusets are flushed out entirely currently when the system
> goes through a suspend/resume cycle.
>
> If libvirt has hook-ups with cpuset, then it bypasses systemd for
> that.

I guess by CPU affinity you mean sched_setaffinity and friends. If that
is the case, then this is constrained by cpuset too, as mentioned here:

http://www.mjmwired.net/kernel/Documentation/cpusets.txt#53

As long as the machine.slice cpuset isn't restored after onlining a CPU
again, libvirt won't be able to set either the affinity or the cpuset
if it contains that CPU.

Maybe the kernel's behaviour is weird and can be discussed, but libvirt
can't do anything about that bug.

--
Cedric
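Cedric's point, that affinity requests only take effect within the set of CPUs the process is already allowed to use, can be made visible with a short sketch. It assumes Linux and Python 3, where `os.sched_getaffinity` and `os.sched_setaffinity` wrap the corresponding system calls. The helper names `allowed_cpus` and `pin_to` are hypothetical, and the explicit intersection is done in user space purely for illustration; the mask the kernel reports is already kept within the cpuset, and the kernel rejects a request that contains no permitted CPU.

```python
import os


def allowed_cpus():
    """Return the CPUs the calling process may currently run on (Linux only)."""
    return set(os.sched_getaffinity(0))


def pin_to(requested):
    """Pin the calling process to the requested CPUs that are currently allowed.

    Hypothetical helper: it intersects the request with the current
    affinity mask first, so the narrowing is explicit instead of implicit.
    """
    usable = set(requested) & allowed_cpus()
    if not usable:
        # Raised here rather than letting the system call fail.
        raise ValueError("none of the requested CPUs are currently allowed")
    os.sched_setaffinity(0, usable)
    return usable


if __name__ == "__main__":
    print("allowed:", sorted(allowed_cpus()))
    print("pinned to:", sorted(pin_to(allowed_cpus())))
```

If a CPU has dropped out of the parent cpuset after a hotplug cycle, it no longer appears in the allowed set, so a request that names it is narrowed or refused regardless of which affinity API is used.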
Re: [systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
On Wed, 03.08.16 14:46, Dr. Werner Fink (wer...@suse.de) wrote:

> Hi,
>
> problem with v228 (and I guess this is also later AFAICS from logs of
> current git) that repeating CPU hotplug events (offline/online). The
> root cause is that cpuset.cpus become not restored by machined.
> Please note that libvirt can not do this as it is not allowed to do
> so.

This is a limitation of the kernel cpuset interface, and it's one of
the reasons we do not expose cpusets at all in systemd right
now. Thankfully, there's an alternative to cpusets, which is the CPU
affinity controls exposed via CPUAffinity= in systemd, which do much
of the same, but have less borked semantics.

We'd like to support cpusets directly in systemd, but we don't do this
as long as the kernel interfaces are as borked as they are. For
example, cpusets are currently flushed out entirely when the system
goes through a suspend/resume cycle.

If libvirt has hook-ups with cpuset, then it bypasses systemd for
that.

Either way, this is not a systemd issue at all.

> PS: Using https://github.com/systemd/systemd/issues/new seems to be very
> limited with
> > *NOTE: Do not submit bug reports about anything but the two most recently
> > released systemd versions upstream!*

Yes, we do upstream maintenance upstream only. Downstream maintenance
needs to happen downstream.

Lennart

--
Lennart Poettering, Red Hat
[systemd-devel] machined: after CPU offline then online, vcpupin KVM guest failed to start
Hi,

there is a problem with v228 (and, AFAICS from the logs of current git,
probably also with later versions) after CPU hotplug events (offline,
then online). The root cause is that cpuset.cpus is not restored by
machined. Please note that libvirt cannot do this as it is not allowed
to do so.

Steps to reproduce:

1. Configure vCPU pinning.
   # virsh vcpupin guest-os 0 0-3 --config
2. Boot the guest.
   # virsh start guest-os
3. Shut down the guest.
   # virsh shutdown guest-os
4. Offline one of the host CPUs.
   # echo 0 > /sys/devices/system/cpu/cpu3/online
5. Online the host CPU again.
   # echo 1 > /sys/devices/system/cpu/cpu3/online
6. Boot the guest again.
   # virsh start guest-os

Actual result:

   error: Failed to start domain guest-os
   error: Unable to write to '/sys/fs/cgroup/cpuset/machine.slice/machine-qemu\x2d2\x2dsles12sp2\x2dbeta3.scope/vcpu0/cpuset.cpus': Permission denied

Expected result:

   The KVM guest boots without errors.

This can be achieved by hand with

   # echo 0-31 > /sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus

as libvirt can't touch the cpuset in the machine.slice scope, since this
one is owned by machined. This problem has also been discussed upstream
in libvirt

   https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html

and seems to be a well-known problem, not only here:

   https://bugzilla.redhat.com/show_bug.cgi?id=838070

From the kernel's side this seems to be behaviour by design; see the
comment right before cpuset_hotplug_workfn():

   /*
    * [...]
    * Non-root cpusets are only affected by offlining. If any CPUs or memory
    * nodes have been taken down, cpuset_hotplug_update_tasks() is invoked on
    * all descendants.
    * [...]
    */

Werner

PS: Using https://github.com/systemd/systemd/issues/new seems to be very
limited with

> *NOTE: Do not submit bug reports about anything but the two most recently
> released systemd versions upstream!*

--
"Having a smoking section in a restaurant is like having
 a peeing section in a swimming pool." -- Edward Burr
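The failing write in the report can be reasoned about with a small check, sketched here under two assumptions that are not stated in the report itself: that a child cpuset must be a subset of its parent's cpuset.cpus, and that after the offline/online cycle machine.slice is left with "0-2,4-31" (inferred from the 0-31 value used in the manual fix). The function names are hypothetical; only the CPU list syntax ("0-3", "0-2,4-31") is taken from the thread.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as '0-2,4-31' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_from_parent(requested, parent):
    """Return the requested CPUs that the parent cpuset does not contain."""
    return parse_cpu_list(requested) - parse_cpu_list(parent)


if __name__ == "__main__":
    # Before the hotplug cycle: the vcpupin request fits into the parent.
    print(missing_from_parent("0-3", "0-31"))
    # After the cycle (assumed parent value): CPU 3 is missing from the parent.
    print(missing_from_parent("0-3", "0-2,4-31"))
```

With the assumed values the first call yields an empty set and the second yields {3}, which matches the observation that the guest starts again once 0-31 is written back to machine.slice by hand.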