[Kernel-packages] [Bug 1531768] Re: lxd and other commands get stuck on arm64 kernel and multiple CPUs
This very much looks like it's related to threading and futexes somehow. Forcing Go to use a single thread rather than one per container made things more stable in a very simple test (an infinite loop of "lxc list"), though starting containers still caused the hang to happen.

I've seen a similar futex hang when running lxc-test-concurrent from the lxc-tests package:

  lxc-test-concurrent -j 8 -i 50

This creates and starts 8 containers in parallel using threads, and repeats that 50 times in a row. It is done entirely in C, so it doesn't touch Go at all.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1531768

Title:
  lxd and other commands get stuck on arm64 kernel and multiple CPUs

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I created an 8 CPU arm64 instance on Canonical's Scalingstack (which I want to use for armhf autopkgtesting in LXD). I started with wily as that has lxd available (it's not yet available in trusty nor the PPA for arm64).

  However, pretty much any LXD task that I do (I haven't tried much else) on this machine takes unbearably long. A simple "lxc profile set default raw.lxc lxc.seccomp=" or "lxc list" takes several minutes. I see tons of

  [ 1020.971955] rcu_sched kthread starved for 6000 jiffies! g1095 c1094 f0x0
  [ 1121.166926] INFO: task fsnotify_mark:69 blocked for more than 120 seconds.

  in dmesg (the attached apport info has the complete dmesg).

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-22-generic 4.2.0-22.27
  ProcVersionSignature: User Name 4.2.0-22.27-generic 4.2.6
  Uname: Linux 4.2.0-22-generic aarch64
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 Jan 7 09:18 seq
   crw-rw 1 root audio 116, 33 Jan 7 09:18 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Thu Jan 7 09:24:01 2016
  IwConfig:
   eth0      no wireless extensions.
   lo        no wireless extensions.
   lxcbr0    no wireless extensions.
  Lspci:
   00:00.0 Host bridge [0600]: Red Hat, Inc. Device [1b36:0008]
     Subsystem: Red Hat, Inc Device [1af4:1100]
     Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
     Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-22-generic root=LABEL=cloudimg-rootfs earlyprintk
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-22-generic N/A
   linux-backports-modules-4.2.0-22-generic  N/A
   linux-firmware                            1.149.3
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1531768/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
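A minimal sketch of the "infinite loop of lxc list" test from the first comment, extended so that a hang is detected automatically instead of by watching the terminal. The function name run_until_hang and any time limits passed to it are hypothetical; it assumes GNU coreutils "timeout", which exits with status 124 when it had to kill the command.

```shell
# Source this file, then call: run_until_hang MAX_ITERATIONS TIMEOUT_SECONDS COMMAND [ARGS...]
# (run_until_hang is a hypothetical helper, not part of LXD or lxc-tests.)
# Prints "hang at iteration N" and returns 1 on the first run that exceeds
# the time limit, or prints "ok MAX_ITERATIONS" and returns 0 otherwise.
run_until_hang() {
    max=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        status=0
        # GNU timeout kills the command after $limit seconds and exits 124.
        # "|| status=$?" keeps this safe under "set -e".
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            echo "hang at iteration $i"
            return 1
        fi
        # Any other exit status (including failures) counts as "finished":
        # only hangs are of interest here.
        i=$((i + 1))
    done
    echo "ok $max"
    return 0
}
```

For example, "run_until_hang 100000 300 lxc list" approximates the endless "lxc list" loop, and "run_until_hang 10 3600 lxc-test-concurrent -j 8 -i 50" repeats the C-only reproducer; the limits in both are arbitrary choices.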
[Kernel-packages] [Bug 1531768] Re: lxd and other commands get stuck on arm64 kernel and multiple CPUs
Reducing the number of threads that Go uses seems to help a bit:

  $ cat /etc/systemd/system/lxd.service.d/override.conf
  [Service]
  Environment=GOMAXPROCS=1

(GOMAXPROCS defaults to the number of CPUs.) But Stéphane is still able to lock up LXD pretty fast even with that.
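A minimal sketch that writes the override.conf drop-in quoted above into a given directory with a given GOMAXPROCS value. The function name write_gomaxprocs_override is hypothetical, and the function only creates the file; it does not touch the running service.

```shell
# Source this file, then call: write_gomaxprocs_override DROPIN_DIR N
# (hypothetical helper; the file contents match the override.conf above).
write_gomaxprocs_override() {
    dropin_dir=$1
    procs=$2
    # Reject empty or non-numeric values.
    case $procs in
        ''|*[!0-9]*) echo "invalid GOMAXPROCS: $procs" >&2; return 1 ;;
    esac
    # Reject zero (and values like "00").
    if [ "$procs" -lt 1 ]; then
        echo "invalid GOMAXPROCS: $procs" >&2
        return 1
    fi
    mkdir -p "$dropin_dir" || return 1
    # Same two lines as the drop-in shown in the comment.
    printf '[Service]\nEnvironment=GOMAXPROCS=%s\n' "$procs" > "$dropin_dir/override.conf"
}
```

To apply it as in the comment, call it as root with /etc/systemd/system/lxd.service.d and 1, then run "systemctl daemon-reload" and "systemctl restart lxd" so the daemon picks up the new environment; "systemctl show -p Environment lxd" should then list GOMAXPROCS=1.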
[Kernel-packages] [Bug 1531768] Re: lxd and other commands get stuck on arm64 kernel and multiple CPUs
I have now managed to get the 4-CPU instance into the same locked-up state, so AFAICS the problem isn't fundamentally different between 4 and 8 cores.
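One way to compare the 4-CPU and 8-CPU instances is to count the two kernel warnings quoted in the bug description. This is a minimal sketch; the function name count_lockup_signatures and its match patterns are assumptions derived from those two dmesg lines, not an official lockup detector.

```shell
# Source this file, then pipe kernel log text into: count_lockup_signatures
# (hypothetical helper). Counts lines matching the RCU-stall and
# hung-task warnings quoted in the bug description.
count_lockup_signatures() {
    # grep -c prints the number of matching lines, or 0 when none match.
    # grep exits 1 when nothing matches, so ignore that status.
    grep -c -E 'rcu_sched kthread starved|blocked for more than [0-9]+ seconds' || true
}
```

Running "dmesg | count_lockup_signatures" on both instances next to the output of "nproc" gives a quick side-by-side; a non-zero count on the 4-CPU instance would be consistent with it being in the same state as the 8-CPU one.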
[Kernel-packages] [Bug 1531768] Re: lxd and other commands get stuck on arm64 kernel and multiple CPUs
Retitling. The "unusably slow" part was fixed by installing haveged, so what remains is that the 8-CPU instance gets into this lockup state after some time. On the 4-CPU instance I'm now running adt-run in a loop; so far it has gone through ~10 iterations. I'll let it run overnight and see how it holds up.

** Summary changed:

- arm64 kernel and multiple CPUs is unusably slow
+ lxd and other commands get stuck on arm64 kernel and multiple CPUs
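The comment does not say why installing haveged cured the slowness. haveged is a daemon that feeds the kernel's random-number entropy pool, so one hedged way to check whether another instance is in the same state is to read /proc/sys/kernel/random/entropy_avail before and after installing it. The function name check_entropy and its 200-bit default threshold are hypothetical.

```shell
# Source this file, then call: check_entropy [FILE] [THRESHOLD]
# (hypothetical helper; the 200-bit default threshold is an arbitrary choice).
# Prints "low N" and returns 1 if the pool is below THRESHOLD,
# otherwise prints "ok N" and returns 0. Returns 2 if FILE is unreadable.
check_entropy() {
    file=${1:-/proc/sys/kernel/random/entropy_avail}
    threshold=${2:-200}
    bits=$(cat "$file") || return 2
    if [ "$bits" -lt "$threshold" ]; then
        echo "low $bits"
        return 1
    fi
    echo "ok $bits"
}
```

With no arguments it reads the live kernel value. A persistently low number that recovers once haveged is running would be consistent with entropy starvation, though the report itself does not confirm that cause.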