[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-11-07 Thread Martin Pitt
I propose to close this. This is clearly fixed with 4.4 on the host, and rolling that out is covered by bug 1602577. It can be closed for auto-package-testing either way as our arm64 nova compute nodes now run 4.4.23. ** Changed in: linux (Ubuntu) Status: Confirmed => Fix Released **

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-09-30 Thread Martin Pitt
For the record, I now use two arm64 xenial (4.4) instances on a host with kernel 4.8, and things are looking really good. See latest posts to bug 1602577. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-21 Thread dann frazier
@cjwatson: Ah, ok. I may have misread the history here. I had gleaned that the xenial kernel (as a host) was more unstable - but for different reasons. Regardless, I have pulled the wily backport build I prepared, because it was frequently triggering a WARN() condition. Looks like my backport

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-21 Thread Colin Watson
@dannf, this bug seems to be *worse* in xenial than in wily, so I don't think backporting a change from xenial to wily is going to help matters?

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-20 Thread dann frazier
I wonder if this might be a dupe of LP: #1549494? We fixed that in xenial, but haven't backported the fix to wily. I haven't been able to reproduce this issue myself, but I uploaded a wily kernel w/ a backported fix to ppa:dannf/test, in case someone else can test it. It corresponds to the git

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-14 Thread Colin Ian King
OK, ignore the last two messages: it eventually booted, the host just seems to be rather slow. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1531768 Title: [arm64] lockups some

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-14 Thread Colin Ian King
I'm reproducing rcu_sched timeouts all the time with a 4.4 kernel on a far slower ARM64 host with the same cloud images.
[ 157.555837] INFO: rcu_sched self-detected stall on CPU
[ 157.561551] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 157.562669] 2-...: (14960 ticks this GP)
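To make it easier to compare console logs across hosts, the stall reports quoted above can be picked out mechanically. What follows is only a minimal sketch, not something used in this thread: the function name `find_rcu_stalls` and the regular expression are assumptions, and the pattern covers only the two message forms that appear in this bug.

```python
import re

# Covers only the two rcu_sched messages quoted in this thread:
#   "INFO: rcu_sched self-detected stall on CPU"
#   "INFO: rcu_sched detected stalls on CPUs/tasks:"
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: rcu_sched "
    r"(?P<kind>self-detected stall on CPU|detected stalls on CPUs/tasks)"
)


def find_rcu_stalls(console_log):
    """Return a list of (timestamp_seconds, kind) for each stall report."""
    return [
        (float(m.group("ts")), m.group("kind"))
        for m in STALL_RE.finditer(console_log)
    ]


# Sample taken from the log excerpt in the message above.
sample = """\
[ 157.555837] INFO: rcu_sched self-detected stall on CPU
[ 157.561551] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 157.562669] 2-...: (14960 ticks this GP)
"""

stalls = find_rcu_stalls(sample)
for ts, kind in stalls:
    print(f"{ts:.6f}s: {kind}")
```

Because the pattern is searched with `finditer`, it also works on a console log that has been flattened onto a single line.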

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-14 Thread Colin Ian King
..and for one more datapoint, QEMU seems to be hung spinning on a futex:
futex(0xb05520, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xb054f4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xb05520, 1104376) = 1

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-13 Thread Colin Watson
We're seeing rather similar symptoms on Launchpad builders after upgrading the guests from wily to xenial (console-log not very informative, e.g. https://pastebin.canonical.com/160898/plain/; build output appears hung; I can't tell for sure that it's the same thing, this is just a guess). These

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-13 Thread Junien Fridrick
Filed LP#1602577 for the host instability issue on 4.4

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Colin Ian King
yep, file a separate bug, the perf data will be useful. Thanks.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Junien Fridrick
Hi, I'm sorry but 4.4 is too unstable on the hosts. We have to reboot and/or power cycle them multiple times a day. We're back on 4.2 everywhere. Haw gathered some perf data on a failing 4.4 host, perhaps we can start digging into the issue from here? Perhaps it should be a separate bug as well.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Martin Pitt
> can you try using the following kernel parameters on the VM and see if this
> helps:
> rcu_nocb_poll rcutree.kthread_prio=90 rcuperf.verbose=1
The instance on swirlix16 (on 4.2 kernel) hung again (twice), with the attached console log. This now has the above kernel parameters, but I'm afraid
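When testing a workaround like this, it helps to confirm that the guest really booted with the requested parameters. A minimal sketch, assuming the guest's command line has been read from /proc/cmdline into a string; the helper name `missing_params` and the sample command line are hypothetical, and only the three parameter strings come from the message above.

```python
# Parameters quoted in the message above.
EXPECTED_PARAMS = [
    "rcu_nocb_poll",
    "rcutree.kthread_prio=90",
    "rcuperf.verbose=1",
]


def missing_params(cmdline, expected=EXPECTED_PARAMS):
    """Return the expected parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


# Hypothetical command line for illustration; on a real guest one would use
# open("/proc/cmdline").read() instead.
sample_cmdline = "root=LABEL=cloudimg-rootfs ro rcu_nocb_poll rcutree.kthread_prio=90"

print(missing_params(sample_cmdline))
```

The comparison is an exact token match, so a parameter set to a different value (for example another `kthread_prio`) is reported as missing.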

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Martin Pitt
hloeung | pitti: yeah, I believe work was done to get swirlix01-09 to 4.4

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Martin Pitt
[hloeung@ragnar tmp]$ for i in {01..09} 16; do ssh swirlix${i}.bos01.scalingstack "uname -a"; done
Linux swirlix01 4.4.0-30-generic #49~14.04.1-Ubuntu SMP Thu Jun 30 22:20:09 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux
Linux swirlix02 4.4.0-30-generic #49~14.04.1-Ubuntu SMP Thu Jun 30 22:20:09
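Since the lockups appear to track the host kernel series, the `uname -a` lines above can be reduced to a host and kernel pair for comparison. This is a rough sketch only: the helper names `parse_uname` and `kernel_series` are assumptions, and it relies on `uname -a` printing the kernel name, node name and kernel release as its first three fields.

```python
def parse_uname(line):
    """Split one line of `uname -a` output into (hostname, kernel_release)."""
    fields = line.split()
    # Field order: kernel name, node name, kernel release, ...
    return fields[1], fields[2]


def kernel_series(release):
    """Reduce a release such as '4.4.0-30-generic' to a (major, minor) tuple."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


# First output line from the message above.
sample = (
    "Linux swirlix01 4.4.0-30-generic #49~14.04.1-Ubuntu SMP "
    "Thu Jun 30 22:20:09 UTC 2016 aarch64 aarch64 aarch64 GNU/Linux"
)

host, release = parse_uname(sample)
print(host, release, kernel_series(release))
```

Comparing the tuples (for example `kernel_series(release) >= (4, 4)`) avoids string comparisons on version numbers.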

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-12 Thread Martin Pitt
lxd-armhf1 (on swirlix01) has run without any lockup since the host kernel update to 4.4. I created a new lxd-armhf2 yesterday (on swirlix08) which also survived without any workaround. At the same time I created a new lxd-armhf3 (on swirlix16) which has locked up pretty well every < 15 minutes (I

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-08 Thread Martin Pitt
Thanks Colin, great work! I'll deploy this ASAP. FYI, at least some of the VM hosts in scalingstack got updated to a 4.4 kernel. Not sure how much that changes your investigations.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-08 Thread Colin Ian King
Also, can we clarify something. Do the ARM hosts provide kvm? If not, one should really run the VMs with just one CPU.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-08 Thread Colin Ian King
This article throws some light onto things: https://lwn.net/Articles/518953/ "Second, the greater the number of idle CPUs, the more work RCU must do when forcing quiescent states. Yes, the busier the system, the less work RCU needs to do! The reason for the extra work is that RCU is not

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-07 Thread Colin Ian King
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/274251.html

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-07 Thread Colin Ian King
Bit more digging, I see that the CPU goes into idle either by a single WFI (wait for an interrupt) shallow sleep or a deeper arm_cpuidle_suspend() - the latter is akin to turning off the CPU. I wonder if we're seeing some issues with the wakeup latency taking a long time inside QEMU when the host

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-07 Thread Colin Ian King
On an idle Xenial cloud image I'm seeing:
[ 1485.236760] [] __switch_to+0x90/0xa8
[ 1485.236772] [] __tick_nohz_idle_enter+0x50/0x3f0
[ 1485.236776] [] tick_nohz_idle_enter+0x40/0x70
[ 1485.236785] [] cpu_startup_entry+0x288/0x2d8
[ 1485.236791] [] secondary_start_kernel+0x120/0x130
[

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-07-07 Thread Colin Ian King
Bisecting is proving problematic as 4.3 kernels don't boot.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
I wonder if it is possible to test with a recent 4.4 Xenial kernel on the host to see if that helps.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
Can't repro the bug on 4.4 kernel on host. Will try 4.3 now

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
Testing with 4.4 on the host and the VM is showing:
[ 335.699014] sched: RT throttling activated
[ 337.600831] hrtimer: interrupt took 2939683820 ns
..which shows us that the host is suffering from some very large scheduling latency issues that are causing the VM some grief.
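To put the hrtimer figure in perspective: 2939683820 ns is roughly 2.94 seconds spent in a single timer interrupt. A small sketch for pulling such values out of a console log and converting them; the helper name `hrtimer_delays_seconds` is an assumption, and the pattern matches only the message form quoted above.

```python
import re

# Matches the kernel message "hrtimer: interrupt took <N> ns".
HRTIMER_RE = re.compile(r"hrtimer: interrupt took (\d+) ns")


def hrtimer_delays_seconds(log_text):
    """Return every reported hrtimer interrupt duration, converted to seconds."""
    return [int(ns) / 1e9 for ns in HRTIMER_RE.findall(log_text)]


# Line taken from the log excerpt in the message above.
sample = "[ 337.600831] hrtimer: interrupt took 2939683820 ns"

delays = hrtimer_delays_seconds(sample)
for delay in delays:
    print(f"{delay:.2f} s")
```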

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
Can trip it with stress-ng context switching with 4.2.0-38-generic

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
Finally able to trip an RCU timeout. 3.19.0-61-generic kernel on host, xenial on server, host busy on async i/o requests (via stress-ng):
[ 825.195520] systemd[1]: Started Journal Service.
[ 900.108730] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 900.110254] 0-...: (4 GPs behind)

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
Thanks William, I'm going to soak test with those older kernels and see if I can trip the hang on these.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread William Grant
The production hardware is mcdivitt as well, running trusty with lts-vivid or lts-wily.

[Kernel-packages] [Bug 1531768] Re: [arm64] lockups some time after booting

2016-06-21 Thread Colin Ian King
I've been running Xenial host + Xenial VM on an 8-core mcdivitt box and have not been able to reproduce this issue. I'm going to keep it running for one more day. Do we have any idea of what the host hardware is? I'm starting to wonder if it is a host/VM interaction issue.