Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
On 11/16/2017 02:19 PM, Hans van Kranenburg wrote:
>
> Latest work on this:
>
> https://patchwork.kernel.org/patch/10035835/
>
> "Applied to for-linus-4.15."

Ok, I just built a 4.9.65 kernel with this patch on top and the config from Debian (config-4.9.0-4-amd64). It applies without complaints:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e25f5db6abb96ca8ee2aaedcb863daa6dfcc07a

With 4.9.51-1 from Stretch I can reliably reproduce the broken CPU counters after doing a live migration with Xen once or maybe twice. The 4.9.65 + steal time patch kernel has survived being thrown around more than 20 times now, and there's no sign of any weird behaviour any more. Counters in /proc/stat keep showing values that still make sense, instead of suddenly jumping to values like 1174983480817 or 1753913027832...

How do we proceed from here?

-- 
Hans van Kranenburg
Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Hi,

I ran into the same issue with 4.9.51-1 in the guest. My dom0 in this case is still Jessie with its Xen 4.4 version. Users (rightfully) worry about what's suddenly wrong with their virtual machine, since the steal values mess up the cpu graphs.

I see that all the discussions linked above went silent quickly without a solution. A post to the Xen mailing list on Aug 31st has not received any answer yet:
https://lists.xen.org/archives/html/xen-users/2017-08/msg00092.html

I see that the issue got fixed upstream by replacing the code with a new implementation of the same functionality. I guess this is a scenario that sometimes happens: no ready-to-go fix is available for the previous LTS kernel.

So, what would be the best way forward here? Should we poke the Xen people a bit more to find out how to approach this, or get an opinion on the best and smallest patch to go with? I'm not an expert in the cpu time accounting area, but I can help with testing etc...

Thanks,

-- 
Hans van Kranenburg
Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
There is currently a discussion on LKML and xen-devel about this issue: https://lkml.org/lkml/2017/10/10/182 Maybe this will result in some backportable fix.
Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
> On 25.09.2017 at 02:07, Ben Hutchings wrote:
>
> I agree that the kernel ought to work around this, but I'm hesitant to
> add a fix that doesn't look like any upstream change. Why and how do
> you think this was fixed in 4.11?

Indeed, this patch is not included upstream and won’t be, as it does not apply to the current development branch. See the following brief discussion of this on the Linux stable list:
http://www.spinics.net/lists/stable/msg186915.html

The main issue here is the conversion between nsecs and cputime and how it behaves when the calculated difference in steal time overflows, i.e. when the steal clock decreases. In this case cpustat does not overflow (and thereby decrease) as well, but instead increases by a large amount.

In Linux 4.11 the conversion between nsecs and cputime is gone entirely, which should result merely in a backwards-running cpustat counter without further issues (I have not actually tested this):

2b1f967d80e8e5d7361f0e1654c842869570f573 sched/cputime: Complete nsec conversion of tick based accounting

This change seems to be only the last commit of a larger series, though, and I’m not sure whether that larger change is suitable for backporting.

I agree that it would be nice to have a patch which has undergone a good review process. Unfortunately I cannot provide that. I also cannot estimate how many people are affected by this issue, considering that it did not attract attention over the last year.

Michael
Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Control: severity -1 important
Control: tag -1 moreinfo

On Wed, 2017-08-09 at 22:46 +0200, Michael Lass wrote:
> Package: src:linux
> Version: 4.9.30-2+deb9u3
> Severity: normal
> Tags: patch
>
> Dear Maintainer,
>
> running Debian Stretch as a paravirtualized guest under Xen, the kernel
> obtains its cpu steal time counter from the virtualization host. On some
> hosts, occasionally a slight decrease in the cpu steal time is returned
> which leads to an overflow of unsigned variables in the kernel and
> subsequent errors in steal time accounting (such as backwards running
> counters). This renders tools like "top" or "vmstat" broken in a way
> that the cpu utilization cannot be determined anymore.
>
> While this is likely a bug in the virtualization environment, the kernel
> running as a guest should deal with this gracefully. I attached a patch
> to this report which fixes the errors caused by this on the guest.
> Kernel versions 4.7 and older, as well as 4.11 and newer should not be
> affected by this issue.
[...]

I agree that the kernel ought to work around this, but I'm hesitant to
add a fix that doesn't look like any upstream change. Why and how do
you think this was fixed in 4.11?

Ben.

-- 
Ben Hutchings
If the facts do not conform to your theory, they must be disposed of.
Processed: Re: Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Processing control commands:

> severity -1 important
Bug #871608 [src:linux] linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Severity set to 'important' from 'normal'
> tag -1 moreinfo
Bug #871608 [src:linux] linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Added tag(s) moreinfo.

-- 
871608: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871608
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#871608: linux-image-4.9.0-3-amd64: Linux kernel should handle decreasing cpu steal clock counter gracefully
Package: src:linux
Version: 4.9.30-2+deb9u3
Severity: normal
Tags: patch

Dear Maintainer,

running Debian Stretch as a paravirtualized guest under Xen, the kernel obtains its cpu steal time counter from the virtualization host. On some hosts, occasionally a slight decrease in the cpu steal time is returned, which leads to an overflow of unsigned variables in the kernel and subsequent errors in steal time accounting (such as backwards-running counters). This breaks tools like "top" or "vmstat" to the point that cpu utilization can no longer be determined.

While this is likely a bug in the virtualization environment, the kernel running as a guest should deal with it gracefully. I attached a patch to this report which fixes the resulting errors on the guest. Kernel versions 4.7 and older, as well as 4.11 and newer, should not be affected by this issue.

Bug #785557 shows that behavior like this is caused by some broken KVM hosts. I myself experience this on a Xen host about which I unfortunately have no further information.

A more detailed description of the issue can be found in the patch header and in the following blog post:
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest/

I would appreciate inclusion of this patch in Debian, as this issue may affect other people running on buggy virtualization hosts and the patch should not influence other systems.

Note that the system I am reporting this from already runs a custom-patched kernel, which may influence some of the information below.
-- Package-specific info:
** Version:
Linux version 4.9.0-3-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u3+lass1 (2017-08-08)

** Command line:
root=/dev/xvda ro

** Not tainted

** Kernel log: Unable to read kernel log; any relevant messages should be attached

** Model information

** Loaded modules:
ipt_REJECT nf_reject_ipv4 binfmt_misc xt_multiport iptable_filter intel_rapl sb_edac edac_core evdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr intel_rapl_perf ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto ecb mbcache btrfs crc32c_generic xor raid6_pq crc32c_intel xen_netfront xen_blkfront aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd

** PCI devices:
not available

** USB devices:
not available

-- System Information:
Debian Release: 9.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-3-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-4.9.0-3-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.130
ii  kmod                                    23-2
ii  linux-base                              4.5

Versions of packages linux-image-4.9.0-3-amd64 recommends:
ii  firmware-linux-free  3.4
ii  irqbalance           1.1.0-2.3

Versions of packages linux-image-4.9.0-3-amd64 suggests:
pn  debian-kernel-handbook
pn  grub-pc | grub-efi-amd64 | extlinux
pn  linux-doc-4.9

Versions of packages linux-image-4.9.0-3-amd64 is related to:
pn  firmware-amd-graphics
pn  firmware-atheros
pn  firmware-bnx2
pn  firmware-bnx2x
pn  firmware-brcm80211
pn  firmware-cavium
pn  firmware-intel-sound
pn  firmware-intelwimax
pn  firmware-ipw2x00
pn  firmware-ivtv
pn  firmware-iwlwifi
pn  firmware-libertas
pn  firmware-linux-nonfree
pn  firmware-misc-nonfree
pn  firmware-myricom
pn  firmware-netxen
pn  firmware-qlogic
pn  firmware-realtek
pn  firmware-samsung
pn  firmware-siano
pn  firmware-ti-connectivity
pn  xen-hypervisor

-- no debconf information

From 4b66621a06a94d22629661a9262f92b8cf5b7ca9 Mon Sep 17 00:00:00 2001
From: Michael Lass
Date: Sun, 6 Aug 2017 18:09:21 +0200
Subject: [PATCH] sched/cputime: handle decreasing steal clock

On some flaky Xen hosts, the steal clock returned by paravirt_steal_clock is not monotonically increasing but can slightly decrease. Currently this results in an overflow of u64 steal. Before this number is passed to account_steal_time(), it is converted into cputime, so the target cpustat counter cpustat[CPUTIME_STEAL] does not overflow as well but is instead increased by a large amount. Due to the conversion to cputime and back into nanoseconds, this_rq()->prev_steal_time does not correctly reflect the latest reported steal clock afterwards, resulting in erratic behavior such as a backwards-running cpustat[CPUTIME_STEAL].

The following is a trace from userspace of the value for steal time reported in /proc/stat:

timestolen diff --