[Kernel-packages] [Bug 1866730] Re: Need patch for post 5.5 low-latency kernels
Thanks for the update on this compat fix. I've tested this on: upstream 5.6-rc5 (lowlatency + generic), upstream 5.5 (lowlatency + generic), and Ubuntu 5.4.0-18. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1866730 Title: Need patch for post 5.5 low-latency kernels Status in zfs-linux package in Ubuntu: In Progress Bug description: CONFIG_PREEMPT_RCU=y enabled post 5.4.x kernels have __rcu_read_lock exposed as GPL-ONLY, which breaks zfs compilation on kernels with that enabled. (Ubuntu low-latency kernels have that enabled IIRC.) The patch for the .8 series implementing this inside zfs to circumvent this issue is here: https://github.com/openzfs/zfs/commit/2fcab8795c7c493845bfa277d44bc443802000b8 This is from this comment in the relevant issue: https://github.com/openzfs/zfs/issues/9745#issuecomment-592617605 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1866730/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
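For anyone wanting to check whether a particular kernel is likely to be affected, here is a minimal sketch that scans kernel config text for CONFIG_PREEMPT_RCU=y. The function name has_preempt_rcu and the two sample config fragments are made up for illustration; in practice the text would probably come from a file such as /boot/config-<release>, which is an assumption about where the distribution installs it.

```python
def has_preempt_rcu(config_text: str) -> bool:
    """Return True if the kernel config text enables CONFIG_PREEMPT_RCU."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#"):
            continue
        if line == "CONFIG_PREEMPT_RCU=y":
            return True
    return False


# Hypothetical config fragments, for illustration only.
cfg_enabled = "CONFIG_PREEMPT=y\nCONFIG_PREEMPT_RCU=y\n"
cfg_disabled = "CONFIG_PREEMPT_VOLUNTARY=y\n# CONFIG_PREEMPT_RCU is not set\n"

print(has_preempt_rcu(cfg_enabled))
print(has_preempt_rcu(cfg_disabled))
```

This only reports the config option; it does not prove that a given zfs build fails or succeeds on that kernel.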
[Kernel-packages] [Bug 1866730] Re: Need patch for post 5.5 low-latency kernels
** Changed in: zfs-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: zfs-linux (Ubuntu) Importance: Undecided => Medium ** Changed in: zfs-linux (Ubuntu) Status: New => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1866730 Title: Need patch for post 5.5 low-latency kernels Status in zfs-linux package in Ubuntu: In Progress
[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM
Can this be re-tested to see if this now fails after I cleaned up kernel03? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1863989 Title: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM Status in Stress-ng: Invalid Status in ubuntu-kernel-tests: Fix Committed Status in linux package in Ubuntu: Incomplete Bug description: Issue found on Eoan zVM node kernel03 Test hung at bad-altstack test. Reproducible rate: 4 out of 4 attempts 02:36:12 DEBUG| [stdout] aiol STARTING 02:36:17 DEBUG| [stdout] aiol RETURNED 0 02:36:17 DEBUG| [stdout] aiol PASSED 02:36:17 DEBUG| [stdout] bad-altstack STARTING + ARCHIVE=/var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive + scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@kernel03:kernel-test-results /var/lib/jenkins/jobs/smoke__E_s390x.zVM-generic__using_kernel03__for_kernel/builds/3/archive dmesg only shows: [ 102.352136] Adding 1048572k swap on /home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng/swap.img. Priority:-3 extents:95 across:26763272k SSFS [ 122.402895] NET: Registered protocol family 38 It looks like this is caused by OOM issue, x3270 console flushed with OOM error messages. 
ProblemType: Bug DistroRelease: Ubuntu 19.10 Package: linux-image-5.3.0-41-generic 5.3.0-41.33 ProcVersionSignature: Ubuntu 5.3.0-41.33-generic 5.3.18 Uname: Linux 5.3.0-41-generic s390x NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.11-0ubuntu8.4 Architecture: s390x ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Thu Feb 20 06:06:04 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_GB.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=/dev/mapper/kl03vg01-kl03root crashkernel=196M BOOT_IMAGE=0 RelatedPackageVersions: linux-restricted-modules-5.3.0-41-generic N/A linux-backports-modules-5.3.0-41-generic N/A linux-firmware1.183.4 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to eoan on 2019-09-30 (142 days ago) To manage notifications about this bug go to: https://bugs.launchpad.net/stress-ng/+bug/1863989/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1852119] Re: Please add zfs modules to linux-raspi2
The Ubuntu kernel team recommends having at least 4GB of free memory to run ZFS on slow backing store devices for nominal performance. Since there is OS overhead (kernel, userspace processes, etc.), a 4GB Raspberry Pi will perform sub-optimally. Note that the document you referenced in comment #1 states: "Computers that have less than 2 GiB of memory run ZFS slowly. 4 GiB of memory is recommended for normal performance in basic workloads." Once you start to add in ZFS options such as compression and/or run scrubs on a slow device, it is likely you will start to see high memory pressure issues. Hence we do not support ZFS unless you have at least 4GB of memory free. ** Changed in: linux-raspi2 (Ubuntu) Status: Confirmed => Won't Fix ** Changed in: linux-raspi2 (Ubuntu) Importance: Undecided => Wishlist -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-raspi2 in Ubuntu. https://bugs.launchpad.net/bugs/1852119 Title: Please add zfs modules to linux-raspi2 Status in linux-raspi2 package in Ubuntu: Won't Fix Bug description: The 4gb RPI4 is more than capable of handling zfs. ( Even zfs root can be enabled manually with arm64 eoan builds using a variant of the steps at https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS . ) Currently one has to install zfs-dkms for zfs support, but ideally one would have zfs modules come with the standard arm64 kernel so that one does not have to recompile zfs on the pi. Example: uname -a Linux rpi4 5.3.0-1011-raspi2 #12-Ubuntu SMP Fri Nov 1 09:07:06 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux @rpi4:~$ zpool status pool: bpool state: ONLINE status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(5) for details. 
scan: scrub repaired 0B in 0 days 00:00:00 with 0 errors on Mon Nov 11 13:50:14 2019 config: NAME STATE READ WRITE CKSUM bpool ONLINE 0 0 0 usb-Samsung_Flash_Drive_FIT_0309318110004882-0:0-part3 ONLINE 0 0 0 errors: No known data errors pool: rpool state: ONLINE scan: scrub repaired 0B in 0 days 00:01:21 with 0 errors on Mon Nov 11 13:51:39 2019 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 sda4 ONLINE 0 0 0 errors: No known data errors dkms status zfs, 0.8.1, 5.3.0-1011-raspi2, aarch64: installed @rpi4:~$ cat /proc/cpuinfo processor : 0 BogoMIPS: 108.00 Features: fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd08 CPU revision: 3 processor : 1 BogoMIPS: 108.00 Features: fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd08 CPU revision: 3 processor : 2 BogoMIPS: 108.00 Features: fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd08 CPU revision: 3 processor : 3 BogoMIPS: 108.00 Features: fp asimd evtstrm crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part: 0xd08 CPU revision: 3 Hardware: BCM2835 Revision: c03111 Serial : --- Model : Raspberry Pi 4 Model B Rev 1.1 @rpi4:~$ free -m total used free shared buff/cache available Mem: 3791 883 2612 18 295 2836 Swap: 4095 0 4095 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-raspi2/+bug/1852119/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM
I found this was failing on kernel03 because there was very little space for the test to enable a large swap file. I cleaned the machine up and was unable to reproduce the failure. I'm assuming the tests were failing on kernel03, if not what machine were they being run on? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1863989 Title: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM Status in Stress-ng: Invalid Status in ubuntu-kernel-tests: Fix Committed Status in linux package in Ubuntu: Incomplete
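Since the failure came down to too little disk space for the swap file, a pre-flight check along the following lines might catch it before the test hangs. This is only a sketch: the helper name enough_space_for_swap and the headroom parameter are hypothetical, and it is not what the committed fix in the test suite does.

```python
import shutil


def enough_space_for_swap(path: str, swap_bytes: int, headroom_bytes: int = 0) -> bool:
    """Return True if the filesystem holding `path` can fit the swap file."""
    free = shutil.disk_usage(path).free
    return free >= swap_bytes + headroom_bytes


# 1048572k is the swap size reported by dmesg in this bug's description.
swap_bytes = 1048572 * 1024
print(enough_space_for_swap(".", swap_bytes))
```

A test harness could skip or shrink the swap file when this returns False rather than running into OOM.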
[Kernel-packages] [Bug 1814983] Re: zfs poor sustained read performance from ssd pool
** Changed in: zfs-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: zfs-linux (Ubuntu) Importance: Undecided => High ** Changed in: zfs-linux (Ubuntu) Status: New => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1814983 Title: zfs poor sustained read performance from ssd pool Status in zfs-linux package in Ubuntu: Confirmed Bug description: Hello, I'm seeing substantially slower read performance from an ssd pool than I expected. I have two pools on this computer; one ('fst') is four sata ssds, the other ('srv') is nine spinning metal drives. With a long-running ripgrep process on the fst pool, performance started out really good and grew to astonishingly good (iirc ~30kiops, as measured by zpool iostat -v 1). However after a few hours the performance has dropped to 30-40 iops. top reports an arc_reclaim and many arc_prune processes to be consuming most of the CPU time. I've included a screenshot of top, some output from zpool iostat -v 1, and arc_summary, with "===" to indicate the start of the next command's output: === top (memory in gigabytes): top - 16:27:53 up 70 days, 16:03, 3 users, load average: 35.67, 35.81, 35.58 Tasks: 809 total, 19 running, 612 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.0 us, 58.1 sy, 0.0 ni, 39.2 id, 2.6 wa, 0.0 hi, 0.0 si, 0.0 st GiB Mem : 125.805 total,0.620 free, 96.942 used, 28.243 buff/cache GiB Swap:5.694 total,5.688 free,0.006 used. 
27.840 avail Mem

  PID USER  PR NI VIRT RES  SHR  S  %CPU %MEM     TIME+ COMMAND
 1523 root  20  0 0.0m 0.0m 0.0m R 100.0  0.0 290:52.26 arc_reclaim
 4484 root  20  0 0.0m 0.0m 0.0m R  56.2  0.0   1:18.79 arc_prune
 6225 root  20  0 0.0m 0.0m 0.0m R  56.2  0.0   1:11.92 arc_prune
 7601 root  20  0 0.0m 0.0m 0.0m S  56.2  0.0   2:50.25 arc_prune
30891 root  20  0 0.0m 0.0m 0.0m S  56.2  0.0   1:33.08 arc_prune
 3057 root  20  0 0.0m 0.0m 0.0m S  55.9  0.0   9:00.95 arc_prune
 3259 root  20  0 0.0m 0.0m 0.0m R  55.9  0.0   3:16.84 arc_prune
24008 root  20  0 0.0m 0.0m 0.0m S  55.9  0.0   1:55.71 arc_prune
 1285 root  20  0 0.0m 0.0m 0.0m R  55.6  0.0   3:20.52 arc_prune
 5345 root  20  0 0.0m 0.0m 0.0m R  55.6  0.0   1:15.99 arc_prune
30121 root  20  0 0.0m 0.0m 0.0m S  55.6  0.0   1:35.50 arc_prune
31192 root  20  0 0.0m 0.0m 0.0m S  55.6  0.0   6:17.16 arc_prune
32287 root  20  0 0.0m 0.0m 0.0m S  55.6  0.0   1:28.02 arc_prune
32625 root  20  0 0.0m 0.0m 0.0m R  55.6  0.0   1:27.34 arc_prune
22572 root  20  0 0.0m 0.0m 0.0m S  55.3  0.0  10:02.92 arc_prune
31989 root  20  0 0.0m 0.0m 0.0m R  55.3  0.0   1:28.03 arc_prune
 3353 root  20  0 0.0m 0.0m 0.0m R  54.9  0.0   8:58.81 arc_prune
10252 root  20  0 0.0m 0.0m 0.0m R  54.9  0.0   2:36.37 arc_prune
 1522 root  20  0 0.0m 0.0m 0.0m S  53.9  0.0 158:42.45 arc_prune
 3694 root  20  0 0.0m 0.0m 0.0m R  53.9  0.0   1:20.79 arc_prune
13394 root  20  0 0.0m 0.0m 0.0m R  53.9  0.0  10:35.78 arc_prune
24592 root  20  0 0.0m 0.0m 0.0m R  53.9  0.0   1:54.19 arc_prune
25859 root  20  0 0.0m 0.0m 0.0m S  53.9  0.0   1:51.71 arc_prune
 8194 root  20  0 0.0m 0.0m 0.0m S  53.6  0.0   0:54.51 arc_prune
18472 root  20  0 0.0m 0.0m 0.0m R  53.6  0.0   2:08.73 arc_prune
29525 root  20  0 0.0m 0.0m 0.0m R  53.6  0.0   1:35.81 arc_prune
32291 root  20  0 0.0m 0.0m 0.0m S  53.6  0.0
[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade
I've uploaded a fixed package, it's now going to proceed via the normal SRU process. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1860182 Title: zpool scrub malfunction after kernel upgrade Status in zfs-linux package in Ubuntu: In Progress Bug description: == SRU Request [BIONIC] == The HWE kernel on bionic provides the zfs 0.8.1 driver, which includes an improved scrub; however, the progress stats reported by the kernel are incompatible with the 0.7.x zfs driver. == Fix == Use the new zfs 0.8.x pool_scan_stat_t extra fields to calculate the scan progress when using zfs 0.8.x kernel drivers. Add detection of the kernel module version and use an approximation to the zfs 0.8.0 progress and rate reporting for newer kernels. For 0.7.5 we can pass the larger 0.8.x pool_scan_stat_t to 0.7.5 zfs w/o problems and ignore these new fields and continue to use the 0.7.5 rate calculations. == Test == Install the HWE kernel on Bionic, create some large ZFS pools and populate with a lot of data. Issue: sudo zpool scrub poolname and then look at the progress using sudo zpool status Without the fix, the progress stats are incorrect. With the fix, the duration and rate stats are a fairly good approximation of the progress. Since the newer 0.8.x zfs now does scanning in two phases, the older zfs tools will only report accurate stats for phase #2 of the scan to keep it roughly compatible with the 0.7.x zfs utils output. == Regression Potential == This is a userspace reporting fix so the zpool status output is only affected by this fix when doing a scrub, so the impact of this fix is very small and limited. 
I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly: eric@eric-8700K:~$ zpool status pool: storagepool1 state: ONLINE scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020 config: NAME STATE READ WRITE CKSUM storagepool1 ONLINE 0 0 0 mirror-0ONLINE 0 0 0 ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 mirror-1ONLINE 0 0 0 ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE 0 0 0 ata-ST2000DM006-2DM164_Z4ZA3ENE ONLINE 0 0 0 I ran zpool scrub after upgrading the kernel and rebooting, and now it fails to work properly. It appeared to finish in about 5 minutes but did not, and says it is going slow: eric@eric-8700K:~$ sudo zpool status pool: storagepool1 state: ONLINE scan: scrub in progress since Fri Jan 17 15:32:07 2020 1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time) 0B repaired, 100.00% done config: NAME STATE READ WRITE CKSUM storagepool1 ONLINE 0 0 0 mirror-0ONLINE 0 0 0 ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 mirror-1ONLINE 0 0 0 ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE 0 0 0 ata-ST2000DM006-2DM164_Z4ZA3ENE ONLINE 0 0 0 errors: No known data errors ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: zfsutils-linux 0.7.5-1ubuntu16.7 ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13 Uname: Linux 5.3.0-26-generic x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.9-0ubuntu7.9 Architecture: amd64 CurrentDesktop: ubuntu:GNOME Date: Fri Jan 17 16:22:01 2020 InstallationDate: Installed on 2018-03-07 (681 days ago) InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1) SourcePackage: zfs-linux UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago) modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs'] To 
manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1860182/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
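To illustrate the "detect the kernel module version, then pick the rate calculation" step from the SRU description, here is a rough sketch. It is not the code in the uploaded package (that fix lives in the zfs userspace tools); the function names are hypothetical, and reading the version string from /sys/module/zfs/version is an assumption about where it can be found.

```python
import re


def parse_zfs_version(text: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '0.7.5-1ubuntu16.7'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised ZFS version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def uses_new_scan_stats(module_version: str) -> bool:
    """True when the kernel module is 0.8.0 or newer (two-phase scrub)."""
    return parse_zfs_version(module_version) >= (0, 8, 0)


# Version strings taken from this bug report.
for version in ("0.7.5-1ubuntu16.7", "0.8.1"):
    mode = "0.8.x approximation" if uses_new_scan_stats(version) else "0.7.5 calculation"
    print(version, "->", mode)
```

Comparing parsed tuples rather than raw strings avoids misordering versions such as 0.10.x against 0.8.x.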
[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade
** Description changed: + == SRU Request [BIONIC] == + + The HWE kernel on bionic provides zfs 0.8.1 driver which includes an + improved scrub however, the progress stats reported by the kernel are + incompatible to the 0.7.x zfs driver. + + == Fix == + + Use the new zfs 8.x pool_scan_stat_t extra fields to calculate + the scan progress when using zfs 8.x kernel drivers. Add detection of the kernel module version and use an approximation to the zfs 0.8.0 progress and rate reporting for newer kernels. + + For 0.7.5 we can pass the larger 8.x port_scan_stat_t to 0.7.5 + zfs w/o problems and ignore these new fields and continue + to use the 0.7.5 rate calculations. + + == Test == + + Install the HWE kernel on Bionic, create some large ZFS pools and + populate with a lot of data. Issue: + + sudo zpool scrub poolname + and then look at the progress using + + sudo zpool status + + Without the fix, the progress stats are incorrect. With the fix the + duration and rate stats as a fairly good approximation of the progress. + Since the newer 0.8.x zfs does scanning now in two phases the older zfs + tools will only report accurate stats for phase #2 of the scan to keep + it roughly compatible with the 0.7.x zfs utils output. + + == Regression Potential == + + This is a userspace reporting fix so the zpool status output is only + affected by this fix when doing a scrub, so the impact of this fix is + very small and limited. 
+ + + I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly: eric@eric-8700K:~$ zpool status - pool: storagepool1 - state: ONLINE - scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020 + pool: storagepool1 + state: ONLINE + scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020 config: - NAME STATE READ WRITE CKSUM - storagepool1 ONLINE 0 0 0 - mirror-0ONLINE 0 0 0 - ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 - ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 - mirror-1ONLINE 0 0 0 - ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE 0 0 0 - ata-ST2000DM006-2DM164_Z4ZA3ENE ONLINE 0 0 0 - + NAME STATE READ WRITE CKSUM + storagepool1 ONLINE 0 0 0 + mirror-0ONLINE 0 0 0 + ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 + ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 + mirror-1ONLINE 0 0 0 + ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE 0 0 0 + ata-ST2000DM006-2DM164_Z4ZA3ENE ONLINE 0 0 0 I ran zpool scrub after upgrading the kernel and rebooting, and now it fails to work properly. 
It appeared to finish in about 5 minutes but did not, and says it is going slow: - eric@eric-8700K:~$ sudo zpool status - pool: storagepool1 - state: ONLINE - scan: scrub in progress since Fri Jan 17 15:32:07 2020 - 1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time) - 0B repaired, 100.00% done + pool: storagepool1 + state: ONLINE + scan: scrub in progress since Fri Jan 17 15:32:07 2020 + 1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time) + 0B repaired, 100.00% done config: - NAME STATE READ WRITE CKSUM - storagepool1 ONLINE 0 0 0 - mirror-0ONLINE 0 0 0 - ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 - ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 - mirror-1ONLINE 0 0 0 - ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE 0 0 0 - ata-ST2000DM006-2DM164_Z4ZA3ENE ONLINE 0 0 0 + NAME STATE READ WRITE CKSUM + storagepool1 ONLINE 0 0 0 + mirror-0ONLINE 0 0 0 + ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3 ONLINE 0 0 0 + ata-ST2000DM001-1CH164_Z1E285A4 ONLINE 0 0 0 + mirror-1ONLINE 0 0 0 + ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD ONLINE
[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot
Hi Ryan, We would like to reproduce this bug to debug it further. Can you answer the questions below relating to your initial comments in the bug: "Booting some Bionic instances in Azure (gen1 machines).." Q: What is a gen1 machine? What instance type is this? "..I see some large delays during kernel/userspace boot that it would be good to understand what's going on. Additionally, there areas during boot that see delays is different for an image that's been created from a template vs. stock images." Q: I don't know what these are. Can you explain how these are created? Do you have any exact examples of a template and stock image? Thanks, Colin ** Changed in: linux-signed-azure (Ubuntu) Status: In Progress => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-azure in Ubuntu. https://bugs.launchpad.net/bugs/1858495 Title: multiple long delays during kernel and userspace boot Status in linux-signed-azure package in Ubuntu: Incomplete Bug description: Booting some Bionic instances in Azure (gen1 machines), I see some large delays during kernel/userspace boot that it would be good to understand what's going on. Additionally, there areas during boot that see delays is different for an image that's been created from a template vs. stock images. I'm attaching some data, 10 runs of the same image in a scaling set that run the initial boot. Processing the journal output, looking at delays of over 2.0 shows some concern. [1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug, * this clock source is slow. Consider trying other clock sources [3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global" If you want to keep using the local clock, then add: "trace_clock=local" on the kernel command line [6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. 
Opts: (null) [ 15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard After capturing bionic image as a template, and creating a new VM, we see new hot spots we didn't see before. # HotSpot maximum delta between kernel messages: 2.0 # [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled # [5.919313] localhost.localdomain kernel: raid6: avx2x4 gen() 21512 MB/s # # [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) # [9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid) # # [ 13.773554] localhost.localdomain sh[871]: + exit 0 # [ 21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 'UDF Volume', timestamp 2019/12/17 00:00 (1000) # # [ 24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com). # [ 29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 +. Up 25.20 seconds. The easiest comparison kernel-side is the systemd-analyze value: Grepping in the debug data: % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s. Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s. Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s. Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s. Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s. Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s. Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s. Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s. 
Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s. Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s. foofoo % grep "Startup finished.*kernel" bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s. Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s. Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s. Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s. Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s.
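The "HotSpot" listings in the description can be approximated with a small script that looks for gaps between consecutive journal timestamps. The sketch below is an assumption about how such processing might look, not the reporter's actual tooling; find_hotspots and the abbreviated sample lines are illustrative only, and the 2.0 second threshold is the one quoted in the report.

```python
import re

# Matches the "[   seconds.micro]" prefix used in the journal excerpts.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_hotspots(lines, threshold=2.0):
    """Return (previous_line, line, delta) for gaps larger than `threshold` seconds."""
    hotspots = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            continue  # skip lines without a timestamp
        now = float(match.group(1))
        if prev_time is not None and now - prev_time > threshold:
            hotspots.append((prev_line, line, now - prev_time))
        prev_time, prev_line = now, line
    return hotspots


# Two lines abbreviated from the bug description.
sample = [
    "[    6.401684] kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode.",
    "[   15.280390] kernel: EXT4-fs (sda1): re-mounted. Opts: discard",
]
for before, after, delta in find_hotspots(sample):
    print(f"{delta:.3f}s gap before: {after}")
```

Running it over each boot's journal.log would give a per-boot list of delays that can be compared between the stock and templated images.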
[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM
This is probably fixed with commit: https://kernel.ubuntu.com/git/ubuntu/autotest-client-tests.git/commit/?id=4db07fef60449c786364638d7978b239676624eb I've run this a few times with the fix above and can't reproduce this issue. ** Changed in: ubuntu-kernel-tests Status: New => Fix Committed ** Changed in: stress-ng Status: New => Invalid ** Changed in: ubuntu-kernel-tests Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1863989 Title: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM Status in Stress-ng: Invalid Status in ubuntu-kernel-tests: Fix Committed Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM
@Sam. Can you re-run the test and if it's OK then I no longer require the instance kernel03. -- https://bugs.launchpad.net/bugs/1863989 Status in Stress-ng: Invalid Status in ubuntu-kernel-tests: Fix Committed Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1863989] Re: bad-altstack test from ubuntu_stress_smoke_test failed on Eoan zVM
** Changed in: stress-ng Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: stress-ng Importance: Undecided => High -- https://bugs.launchpad.net/bugs/1863989 Status in Stress-ng: New Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
I confirm in my testing I get a hard kernel lockup with no log output. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1864063 Title: vm-segv from ubuntu_stress_smoke_test failed on B Status in Stress-ng: Incomplete Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Bug description: Issue found on node onibi, with kernel 4.15.0-89.89/ 4.15.0-89.89~16.04.1 Reproduce rate: 2/2 on generic kernel, 2/2 on lowlatency kernel, 2/2 on X-hwe generic kernel Test hang with vm-segv: 05:58:36 DEBUG| [stdout] vm-addr PASSED 05:58:36 DEBUG| [stdout] vm-rw STARTING 05:58:41 DEBUG| [stdout] vm-rw RETURNED 0 05:58:41 DEBUG| [stdout] vm-rw PASSED 05:58:41 DEBUG| [stdout] vm-segv STARTING + ARCHIVE=/var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive + scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -r ubuntu@onibi:kernel-test-results /var/lib/jenkins/jobs/smoke__B_amd64-generic__using_onibi__for_kernel/builds/2/archive To manage notifications about this bug go to: https://bugs.launchpad.net/stress-ng/+bug/1864063/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
Please ignore the above. Apparently the issue needs a little more digging and the workaround is insufficient. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1861235 Title: zfs recv PANIC at range_tree.c:304:range_tree_find_impl() Status in Linux: Unknown Status in zfs-linux package in Ubuntu: Incomplete Status in zfs-linux source package in Bionic: New Bug description: Same as bug 1861228 but with a newer kernel installed. [ 790.702566] VERIFY(size != 0) failed [ 790.702590] PANIC at range_tree.c:304:range_tree_find_impl() [ 790.702611] Showing stack for process 28685 [ 790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015 [ 790.702616] Call Trace: [ 790.702626] dump_stack+0x6d/0x8e [ 790.702637] spl_dumpstack+0x42/0x50 [spl] [ 790.702640] spl_panic+0xc8/0x110 [spl] [ 790.702645] ? __switch_to_asm+0x41/0x70 [ 790.702714] ? arc_prune_task+0x1a/0x40 [zfs] [ 790.702740] ? dbuf_dirty+0x43d/0x850 [zfs] [ 790.702745] ? getrawmonotonic64+0x43/0xd0 [ 790.702746] ? getrawmonotonic64+0x43/0xd0 [ 790.702775] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702778] ? getrawmonotonic64+0x43/0xd0 [ 790.702805] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702807] ? mutex_lock+0x12/0x40 [ 790.702833] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs] [ 790.702866] range_tree_find_impl+0x88/0x90 [zfs] [ 790.702870] ? spl_kmem_zalloc+0xdc/0x1a0 [spl] [ 790.702902] range_tree_clear+0x4f/0x60 [zfs] [ 790.702930] dnode_free_range+0x11f/0x5a0 [zfs] [ 790.702957] dmu_object_free+0x53/0x90 [zfs] [ 790.702983] dmu_free_long_object+0x9f/0xc0 [zfs] [ 790.703010] receive_freeobjects.isra.12+0x7a/0x100 [zfs] [ 790.703036] receive_writer_thread+0x6d2/0xa60 [zfs] [ 790.703040] ? set_curr_task_fair+0x2b/0x60 [ 790.703043] ? spl_kmem_free+0x33/0x40 [spl] [ 790.703048] ? 
kfree+0x165/0x180 [ 790.703073] ? receive_free.isra.13+0xc0/0xc0 [zfs] [ 790.703078] thread_generic_wrapper+0x74/0x90 [spl] [ 790.703081] kthread+0x121/0x140 [ 790.703084] ? __thread_exit+0x20/0x20 [spl] [ 790.703085] ? kthread_create_worker_on_cpu+0x70/0x70 [ 790.703088] ret_from_fork+0x35/0x40 [ 967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds. [ 967.636979] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637076] txg_quiesce D0 14810 2 0x8000 [ 967.637080] Call Trace: [ 967.637089] __schedule+0x24e/0x880 [ 967.637092] schedule+0x2c/0x80 [ 967.637106] cv_wait_common+0x11e/0x140 [spl] [ 967.637114] ? wait_woken+0x80/0x80 [ 967.637122] __cv_wait+0x15/0x20 [spl] [ 967.637210] txg_quiesce_thread+0x2cb/0x3d0 [zfs] [ 967.637278] ? txg_delay+0x1b0/0x1b0 [zfs] [ 967.637286] thread_generic_wrapper+0x74/0x90 [spl] [ 967.637291] kthread+0x121/0x140 [ 967.637297] ? __thread_exit+0x20/0x20 [spl] [ 967.637299] ? kthread_create_worker_on_cpu+0x70/0x70 [ 967.637304] ret_from_fork+0x35/0x40 [ 967.637326] INFO: task zfs:28590 blocked for more than 120 seconds. [ 967.637371] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637467] zfs D0 28590 28587 0x8080 [ 967.637470] Call Trace: [ 967.637474] __schedule+0x24e/0x880 [ 967.637477] schedule+0x2c/0x80 [ 967.637486] cv_wait_common+0x11e/0x140 [spl] [ 967.637491] ? wait_woken+0x80/0x80 [ 967.637498] __cv_wait+0x15/0x20 [spl] [ 967.637554] dmu_recv_stream+0xa51/0xef0 [zfs] [ 967.637630] zfs_ioc_recv_impl+0x306/0x1100 [zfs] [ 967.637679] ? dbuf_read+0x34a/0x920 [zfs] [ 967.637725] ? dbuf_rele+0x36/0x40 [zfs] [ 967.637728] ? _cond_resched+0x19/0x40 [ 967.637798] zfs_ioc_recv_new+0x33d/0x410 [zfs] [ 967.637809] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl] [ 967.637816] ? spl_vmem_alloc+0x19/0x20 [spl] [ 967.637828] ? 
nv_alloc_sleep_spl+0x1f/0x30 [znvpair] [ 967.637834] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair] [ 967.637840] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair] [ 967.637905] zfsdev_ioctl+0x451/0x610 [zfs] [ 967.637913] do_vfs_ioctl+0xa8/0x630 [ 967.637917] ? __audit_syscall_entry+0xbc/0x110 [ 967.637924] ? syscall_trace_enter+0x1da/0x2d0 [ 967.637927] SyS_ioctl+0x79/0x90 [ 967.637930] do_syscall_64+0x73/0x130 [ 967.637935] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 967.637938] RIP: 0033:0x7fc305a905d7 [ 967.637940] RSP: 002b:7ffc45e39618 EFLAGS: 0246 ORIG_RAX: 0010 [ 967.637943] RAX: ffda
[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
I've uploaded a potential fix to a PPA, do you mind testing this using the zfs-dkms kernel modules as follows:

  sudo add-apt-repository ppa:colin-king/zfs-sru-1861235
  sudo apt-get update
  sudo apt-get install zfs-dkms

and reboot. Then check the correct ZFS module is being used by:

  dmesg | grep ZFS

It should be the 0.7.5-1ubuntu16.9~lp1861235 version. And see if this helps avoid this issue.

** Changed in: linux (Ubuntu) Status: Confirmed => Incomplete
** Also affects: zfs-linux (Ubuntu) Importance: Undecided Status: New
** Changed in: zfs-linux (Ubuntu) Status: New => Incomplete
** Changed in: zfs-linux (Ubuntu) Importance: Undecided => High
** Changed in: zfs-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king)
** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New
** Also affects: zfs-linux (Ubuntu Bionic) Importance: Undecided Status: New
** No longer affects: linux (Ubuntu)
** No longer affects: linux (Ubuntu Bionic)
-- https://bugs.launchpad.net/bugs/1861235 Status in Linux: Unknown Status in zfs-linux package in Ubuntu: Incomplete Status in zfs-linux source package in Bionic: New
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
** Changed in: stress-ng Assignee: Colin Ian King (colin-king) => Kleber Sacilotto de Souza (kleber-souza) -- https://bugs.launchpad.net/bugs/1864063 Status in Stress-ng: Incomplete Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
To reproduce (on an 8 CPU VM):

  sudo apt-get update && sudo apt-get dist-upgrade
  sudo apt-get build-dep stress-ng
  git clone git://kernel.ubuntu.com/cking/stress-ng
  cd stress-ng
  make
  sudo ./stress-ng --vm-segv 0 -t 10 -v

With the ptrace line below commented out, a rebuild and re-run no longer hangs, so the problem is ptrace related.

diff --git a/stress-vm-segv.c b/stress-vm-segv.c
index 39e4cbeb..54d590cd 100644
--- a/stress-vm-segv.c
+++ b/stress-vm-segv.c
@@ -129,7 +129,7 @@ kill_child:
 		stress_process_dumpable(false);
 #if defined(HAVE_PTRACE)
-		(void)ptrace(PTRACE_TRACEME);
+		//(void)ptrace(PTRACE_TRACEME);
 		kill(getpid(), SIGSTOP);
 #endif
 		(void)sigemptyset();

-- https://bugs.launchpad.net/bugs/1864063 Status in Stress-ng: Incomplete Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
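For readers unfamiliar with the two calls touched by the diff above, here is a minimal sketch of the PTRACE_TRACEME plus SIGSTOP handshake on Linux. It is not the stress-ng code and it does not try to reproduce the lockup. The function name traceme_stop_handshake() is made up for illustration, and the parent's waitpid()/SIGKILL handling is an assumption about how a caller might observe and clean up the stopped child.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from stress-ng): fork a child that asks to be
 * traced and then stops itself, as in the diff above. Returns 0 if the
 * parent saw the child stop on SIGSTOP, -1 otherwise.
 */
static int
traceme_stop_handshake(void)
{
	int status = 0;
	int stopped;
	pid_t pid = fork();

	if (pid < 0)
		return (-1);

	if (pid == 0) {
		/* Child: request tracing by the parent, then stop itself. */
		(void)ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		(void)kill(getpid(), SIGSTOP);
		_exit(0);
	}

	/*
	 * Parent: WUNTRACED makes waitpid() report the stop even if the
	 * PTRACE_TRACEME request was refused (for example by a sandbox).
	 */
	if (waitpid(pid, &status, WUNTRACED) < 0) {
		(void)kill(pid, SIGKILL);
		(void)waitpid(pid, &status, 0);
		return (-1);
	}
	stopped = WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;

	/* SIGKILL terminates the child even while it is stopped; then reap it. */
	(void)kill(pid, SIGKILL);
	(void)waitpid(pid, &status, 0);

	return (stopped ? 0 : -1);
}
```

Calling traceme_stop_handshake() from a small main() and checking for a 0 return is enough to see the stop being reported. The real stressor does considerably more in the child after this point.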
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
I spoke too soon. I was able to trip this on 4.15.0-89.89 but not on 4.15.0-88. So this looks like a regression. -- https://bugs.launchpad.net/bugs/1864063 Status in Stress-ng: Incomplete Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
I can't reproduce this on the systems I'm using. Can I get access to onibi to try and reproduce this issue? ** Changed in: stress-ng Status: In Progress => Incomplete -- https://bugs.launchpad.net/bugs/1864063 Status in Stress-ng: Incomplete Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1864063] Re: vm-segv from ubuntu_stress_smoke_test failed on B
Do you have any info on the number of CPUs, memory and swap size of onibi? I can then see if I can reproduce the issue. Or better, access to onibi would be most helpful to see if I can repro this issue. ** Changed in: stress-ng Status: New => In Progress ** Changed in: stress-ng Importance: Undecided => Medium ** Changed in: stress-ng Assignee: (unassigned) => Colin Ian King (colin-king) -- https://bugs.launchpad.net/bugs/1864063 Status in Stress-ng: In Progress Status in ubuntu-kernel-tests: New Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
What is interesting is that the following commit modifies range_tree_clear() so it performs a zero size check and returns before calling range_tree_find_impl(). This commit is not in 18.10 and 19.04 Ubuntu ZFS releases.

commit a1d477c
Author: Matthew Ahrens mahr...@delphix.com
Date:   Thu Sep 22 09:30:13 2016 -0700

    OpenZFS 7614, 9064 - zfs device evacuation/removal

    OpenZFS 7614 - zfs device evacuation/removal
    OpenZFS 9064 - remove_mirror should wait for device removal to complete

The specific change is:

@@ -560,6 +536,9 @@ range_tree_clear(range_tree_t *rt, uint64_t start, uint64_t size)
 {
 	range_seg_t *rs;
 
+	if (size == 0)
+		return;
+
 	while ((rs = range_tree_find_impl(rt, start, size)) != NULL) {
 		uint64_t free_start = MAX(rs->rs_start, start);
 		uint64_t free_end = MIN(rs->rs_end, start + size);

I'm not sure why this check was added, but I guess it handles the cases where zero sized allocations are allowed: it stops these from doing any unnecessary clearing and avoids the assertion. The change in semantics is not explained in the commit message.

-- https://bugs.launchpad.net/bugs/1861235 Status in Linux: Unknown Status in linux package in Ubuntu: Confirmed
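To make the effect of that zero size check concrete, here is a small self-contained sketch. It is not ZFS code: toy_tree_t, toy_seg_t, toy_add(), toy_total(), toy_find() and toy_clear() are all hypothetical stand-ins, with a fixed array in place of the real range tree and assert() in place of VERIFY. It only illustrates why returning early in the clear path keeps a zero sized request away from a lookup that insists on size != 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_MAX_SEGS	8

/* Hypothetical stand-in for a range segment: the half-open range [start, end). */
typedef struct toy_seg {
	uint64_t start;
	uint64_t end;
	int used;
} toy_seg_t;

/* Hypothetical stand-in for a range tree: a small fixed array of segments. */
typedef struct toy_tree {
	toy_seg_t segs[TOY_MAX_SEGS];
} toy_tree_t;

/* Store [start, end) in the first free slot; returns -1 if the array is full. */
static int
toy_add(toy_tree_t *t, uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < TOY_MAX_SEGS; i++) {
		if (!t->segs[i].used) {
			t->segs[i].start = start;
			t->segs[i].end = end;
			t->segs[i].used = 1;
			return (0);
		}
	}
	return (-1);
}

/* Sum of the lengths of all stored segments. */
static uint64_t
toy_total(const toy_tree_t *t)
{
	uint64_t total = 0;

	for (size_t i = 0; i < TOY_MAX_SEGS; i++) {
		if (t->segs[i].used)
			total += t->segs[i].end - t->segs[i].start;
	}
	return (total);
}

/* Find a segment overlapping [start, start + size); a zero size is a caller bug. */
static toy_seg_t *
toy_find(toy_tree_t *t, uint64_t start, uint64_t size)
{
	assert(size != 0);
	for (size_t i = 0; i < TOY_MAX_SEGS; i++) {
		toy_seg_t *s = &t->segs[i];

		if (s->used && s->start < start + size && s->end > start)
			return (s);
	}
	return (NULL);
}

/* Remove [start, start + size) from the tree; a zero size is a no-op. */
static void
toy_clear(toy_tree_t *t, uint64_t start, uint64_t size)
{
	toy_seg_t *rs;

	if (size == 0)
		return;	/* the guard: never reach toy_find() with size 0 */

	while ((rs = toy_find(t, start, size)) != NULL) {
		uint64_t seg_start = rs->start;
		uint64_t seg_end = rs->end;

		/* Drop the segment, then keep whatever lies outside the range. */
		rs->used = 0;
		if (seg_start < start)
			(void) toy_add(t, seg_start, start);
		if (seg_end > start + size)
			(void) toy_add(t, start + size, seg_end);
	}
}
```

With the guard in place, toy_clear(&t, 15, 0) simply returns. Without it, the same call would reach toy_find() and abort on the assertion, which is the toy analogue of the VERIFY(size != 0) panic in the trace. The toy ignores a full array when splitting a segment, which a real implementation could not do.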
[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support
** Changed in: zfs-linux (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: spl-linux (Ubuntu)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856704

Title:
  backport 5.3 zfs support to bionic for HWE kernel support

Status in spl-linux package in Ubuntu:
  Fix Released
Status in zfs-linux package in Ubuntu:
  Fix Released
Status in spl-linux source package in Bionic:
  Fix Released
Status in zfs-linux source package in Bionic:
  Fix Released

Bug description:
  == SRU Justification Bionic ==

  The HWE 5.3 kernel requires ZFS + SPL to support dkms module build
  functionality for kernels 4.15 through to 5.3. Basically, the ZFS+SPL
  compat commits between 4.15 and 5.3 are required to allow the modules
  to build on kernels up to and including the HWE 5.3 kernel.

  == The Fix ==

  Backport of upstream commits:

  SPL:
   - 0002-fix-spl-build-shrinker-callback-check.patch
   - 0003-remove-deprecated-set-fs-pwd-check.patch
   - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
   - 0005-Linux-4.20-compat-current_kernel_time.patch
   - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
   - 0007-Linux-5.0-compat-Use-totalram_pages.patch
   - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
   - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
   - 0010-Linux-5.1-compat-get_ds-removed.patch
   - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
   - 0012-Linux-5.2-compat-rw_tryupgrade.patch
   - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
   - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
   - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
   - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
   - 0017-Linux-compat-spl-timespec_sub.patch
   - 0018-deprecate-splat-rwlock-test6.patch

  ZFS:
   - 3300-Linux-4.16-compat-inode_set_iversion.patch
   - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
   - 3302-Linux-4.16-compat-get_disk_and_module.patch
   - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
   - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
   - 3305-Linux-4.14-compat-blk_queue_stackable.patch
   - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
   - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
   - 3308-Linux-5.0-compat-Use-totalram_pages.patch
   - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
   - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
   - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
   - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
   - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
   - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
   - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
   - 3316-Linux-5.2-compat-rw_tryupgrade.patch
   - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
   - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
   - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
   - 3320-zpios-deprecate-current-kernel-time.patch
   - 3321-add-compat-check-disk-size-change.patch

  == Testcase ==

  Without these commits, users who install kernels and kernel headers
  from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in
  Bionic because of the lack of the kernel compat fixes.

  With the commits, zfs + spl dkms modules can build cleanly and pass the
  ubuntu ZFS regression tests found in the kernel team autotests git
  repository.

  == Risk ==

  This is a sizeable backport that touches a fair amount of spl + zfs
  kernel interfacing code. There is a risk that the backport may cause a
  regression in functionality that has not been exercised by the ZFS
  regression tests. The backport has been run through the ZFS regression
  tests and no regression in core zfs functionality has been found.

  It must be noted that most of the patches are upstream compat fixes
  that are known to be working with the latest ZFS that is being used in
  focal, so we are confident the original compat changes work.

  Note that these updates have all been build tested on x86-64, arm64 and
  s390x systems with kernels from 4.16 to 5.3 and regression tested with
  the ubuntu zfs regression tests.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
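Most of the patch file names listed in the SRU above carry the kernel version they target (for example "Linux-5.0-compat" or "Linux-compat-4.16"). As a rough illustration only, the following Python sketch groups such names by that version. The helper names `kernel_version` and `group_by_kernel` and the regular expression are assumptions made for this example; they are not part of the Ubuntu packaging.

```python
import re
from collections import defaultdict

# Matches "Linux-4.18-compat-..." as well as "Linux-compat-4.16-...".
# This pattern is an assumption based only on the names listed above.
_VERSION = re.compile(r"Linux-(?:compat-)?(\d+\.\d+)")


def kernel_version(patch_name):
    """Return the kernel version embedded in a patch name, or None."""
    match = _VERSION.search(patch_name)
    return match.group(1) if match else None


def group_by_kernel(patch_names):
    """Group patch names by the kernel version found in each name.

    Patches without a recognisable version are grouped under None.
    """
    groups = defaultdict(list)
    for name in patch_names:
        groups[kernel_version(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "0004-Linux-4.18-compat-inode-timespec-timespec64.patch",
        "0016-Linux-compat-4.16-SECTOR_SIZE.patch",
        "0018-deprecate-splat-rwlock-test6.patch",
    ]
    for version, names in group_by_kernel(sample).items():
        print(version, names)
```

A grouping like this can help when checking which compat fixes matter for a particular kernel in the 4.16 to 5.3 range.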
[Kernel-packages] [Bug 1863136] Re: zfs-dkms will not compile on kernel 5.6rc1
Unfortunately we require some more 5.6 compat fixes as 5.6-rc3 fails to
build with the current 2 compat upstream fixes:

make[3]: Entering directory '/usr/src/linux-headers-5.6.0-050600rc3-generic'
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/avl/avl.o
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/icp/illumos-crypto.o
  LD [M]  /var/lib/dkms/zfs/0.8.3/build/module/avl/zavl.o
  CC [M]  /var/lib/dkms/zfs/0.8.3/build/module/lua/lapi.o
In file included from /var/lib/dkms/zfs/0.8.3/build/include/spl/sys/condvar.h:33,
                 from /var/lib/dkms/zfs/0.8.3/build/include/sys/zfs_context.h:38,
                 from /var/lib/dkms/zfs/0.8.3/build/include/sys/crypto/common.h:39,
                 from /var/lib/dkms/zfs/0.8.3/build/module/icp/illumos-crypto.c:35:
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:88:15: error: unknown type name ‘time_t’
   88 | static inline time_t
      | ^~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h: In function ‘gethrtime’:
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:108:18: error: storage size of ‘ts’ isn’t known
  108 | struct timespec ts;
      | ^~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:109:2: error: implicit declaration of function ‘getrawmonotonic’ [-Werror=implicit-function-declaration]
  109 | getrawmonotonic();
      | ^~~
/var/lib/dkms/zfs/0.8.3/build/include/spl/sys/time.h:108:18: warning: unused variable ‘ts’ [-Wunused-variable]
  108 | struct timespec ts;
      | ^~
cc1: some warnings being treated as errors

I'll revisit this once the appropriate 5.6 compat fixes have landed.

** Changed in: zfs-linux (Ubuntu)
       Status: Confirmed => In Progress

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863136

Title:
  zfs-dkms will not compile on kernel 5.6rc1

Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  Bug mentioned here: https://github.com/zfsonlinux/zfs/issues/10001

  Patch for zfs master branch here:
  https://github.com/zfsonlinux/zfs/pull/9961

  Patch modified for zfs-dkms_0.8.3-1ubuntu3 here:
  https://paste.ubuntu.com/p/wsS9GFHjyv/

  This is working for me in getting a 5.5.x kernel to access zpools on
  arm64 and 5.5.2 mainline & 5.6rc1 mainline kernels to access zpools on
  amd64.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1863136/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Soak tested the -proposed kernel for 2 hours with no hang occurring.
Verified OK.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Fix Committed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  == SRU Justification ==

  When using zram (as installed and configured with the zram-config
  package) systems can lock up after about a week of use. This occurs
  because of a hang in a lock in zram.

  == Test Case ==

  Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1GB
  of memory, 16 CPU threads and zram-config installed. Without the fix
  the kernel will hang in a spinlock after 1-2 hours of run time. With
  the fix, the hang does not occur. Testing shows that with the fix,
  5 x 16 CPU hours of stress testing with stress-ng works fine without
  the lockup occurring.

  == The fix ==

  Upstream commit c4d6c4cc7bfd ("zram: correct flag name of
  ZRAM_ACCESS") as a prerequisite, followed by a minor context wiggle
  backport of the fix with commit 3c9959e02547 ("zram: fix lockdep
  warning of free block handling").

  == Regression Potential ==

  This touches the zram locking, so the core zram driver is affected.
  However the fixes are backports from 5.0, so the fixes have had a fair
  amount of testing in later kernels.

  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  As this felt like a potential hardware issue, I had my hosting
  provider give me a completely different system: different motherboard,
  different CPU, different RAM and different storage. I installed that
  system on 18.04 and moved my data over; a week later, I hit the issue
  again.

  We've since also had a LXD user reporting similar symptoms here also
  on varying hardware:
  https://github.com/lxc/lxd/issues/5197

  My system doesn't have a lot of memory pressure with about 50% of free
  memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
  Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 Oct 23 16:12 seq
   crw-rw 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
   Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice: RESUME=none CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic N/A
   linux-firmware 1.173.1
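The test case in the SRU justification above (stress-ng --brk 0 --stack 0, soaked for a couple of hours) could be wrapped in a small script. The following Python sketch is only an illustration: the function names and the 5 minute grace period are assumptions, and stress-ng's --timeout option is used to bound the run. Note that a hard lockup would stop this script as well, so it can only report the case where the run completes.

```python
import subprocess


def stress_ng_command(duration_seconds):
    """Build the argv for the zram reproducer with a bounded run time."""
    return [
        "stress-ng",
        "--brk", "0",      # one brk stressor per online CPU
        "--stack", "0",    # one stack stressor per online CPU
        "--timeout", f"{duration_seconds}s",
    ]


def run_soak_test(duration_seconds=2 * 60 * 60):
    """Run the reproducer and return stress-ng's exit code.

    The wall-clock limit is set a little above stress-ng's own timeout
    (the 300 second margin is an arbitrary choice for this sketch), so a
    stuck stress-ng process raises subprocess.TimeoutExpired.
    """
    cmd = stress_ng_command(duration_seconds)
    completed = subprocess.run(cmd, timeout=duration_seconds + 300)
    return completed.returncode


if __name__ == "__main__":
    # Two hours matches the soak test mentioned in the verification comment.
    print("stress-ng exit code:", run_soak_test())
```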
[Kernel-packages] [Bug 1863136] Re: zfs-dkms will not compile on kernel 5.6rc1
Thanks for confirming that 5.6-rc1 won't yet build with zfs-dkms. I
will roll in all the necessary compat fixes required for 5.5 and 5.6 to
build once we get to a later release candidate of the kernel, to avoid
the extra uploading and regression testing that we run before making a
release.

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

** Changed in: zfs-linux (Ubuntu)
       Status: New => In Progress

** Changed in: zfs-linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
       Status: In Progress => Confirmed

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1863136

Title:
  zfs-dkms will not compile on kernel 5.6rc1

Status in zfs-linux package in Ubuntu:
  Confirmed

Bug description:
  Bug mentioned here: https://github.com/zfsonlinux/zfs/issues/10001

  Patch for zfs master branch here:
  https://github.com/zfsonlinux/zfs/pull/9961

  Patch modified for zfs-dkms_0.8.3-1ubuntu3 here:
  https://paste.ubuntu.com/p/wsS9GFHjyv/

  This is working for me in getting a 5.5.x kernel to access zpools on
  arm64 and 5.5.2 mainline & 5.6rc1 mainline kernels to access zpools on
  amd64.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1863136/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1832384] Re: Unable to unmount apparently unused filesystem
** Changed in: linux (Ubuntu)
       Status: Incomplete => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832384

Title:
  Unable to unmount apparently unused filesystem

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  We periodically see an issue where unmounting a ZFS filesystem fails
  with EBUSY, even though there appears to be no one using it.

  # cat /proc/self/mounts | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive zfs rw,nosuid,nodev,noexec,relatime,xattr,noacl 0 0

  'lsof' and 'fuser' show no processes using any of the files in the
  problematic filesystem:

  # ls -l /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
  total 221
  -rw-r- 1 500 500 52736 May 22 11:01 1_19_1008904362.dbf
  -rw-r- 1 500 500 541696 May 22 11:03 1_20_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_20_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/1_19_1008904362.dbf
  # fuser /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive/
  # lsof | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  #

  The filesystem was shared over NFS, but has since been unshared:

  # showmount -e | grep /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  #

  Since no one appears to be using the filesystem, our expectation is
  that it should be possible to unmount the filesystem. However,
  attempts to unmount the filesystem fail with EBUSY:

  # zfs destroy domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.
  cannot unmount '/domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive': umount failed
  # umount /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
  umount: /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive: target is busy.

  Using bpftrace, we can see that the unmount is failing in
  'propagate_mount_busy()' in the kernel. Using a live kernel debugger,
  we can look at the 'mount' struct for this particular mount and see
  that the 'mnt_count' refcount summed across all CPUs is 2. For
  filesystems that are eligible for unmounting, the refcount is 1.

  The only way to work around this issue that we have found is to
  reboot, at which point the filesystem can be unmounted and destroyed.

  So far, we have only been able to reproduce this using a workload
  driven by our application. The application manages ZFS filesystems in
  groups, and the lifecycle of each group looks something like:

  - Create and mount a group of filesystems, 1 parent and 4 children:
      /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370
      /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/datafile
      /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/external
      /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive
      /domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/temp
  - Share all 5 filesystems over NFS
  - A client mounts all 5 shares using NFSv3
  - For a few hours, the client does NFS operations on the filesystems
    and the server occasionally takes ZFS snapshots of them
  - Unshare filesystems
  - Unmount filesystems
  - Delete filesystems

  These groups of filesystems are constantly being created and
  destroyed. At any given time, we have ~30k filesystems on the system,
  about 5k of which are shared. On average, one out of ~200-300k
  unmounts fails with this EBUSY error. To create and destroy this many
  filesystems takes us about a week or so.

  Note that we are using ZFS built from https://github.com/delphix/zfs,
  which is essentially master ZFS on Linux.

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: linux-image-4.15.0-50-generic 4.15.0-50.54
  ProcVersionSignature: Ubuntu 4.15.0-50.54-generic 4.15.18
  Uname: Linux 4.15.0-50-generic x86_64
  NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 May 20 19:10 seq
   crw-rw 1 root audio 116, 33 May 20 19:10 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.6
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v',
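The check done above with cat /proc/self/mounts | grep can also be scripted. The Python sketch below is an illustration only; the function name `find_mounts` and the returned dictionary layout are assumptions for this example. It matches the mount point field exactly rather than by substring, and it does not decode the octal escapes (such as \040 for a space) that the kernel uses in this file.

```python
def find_mounts(mounts_text, mount_point):
    """Return entries from /proc/self/mounts-style text for a mount point.

    Each line has the form:
        device mount_point fstype options dump pass
    Only lines whose second field equals mount_point are returned.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            hits.append({
                "device": fields[0],
                "mount_point": fields[1],
                "fstype": fields[2],
                "options": fields[3].split(","),
            })
    return hits


if __name__ == "__main__":
    target = "/domain0/group-38/oracle_db_container-202/oracle_timeflow-16370/archive"
    with open("/proc/self/mounts") as mounts:
        for entry in find_mounts(mounts.read(), target):
            print(entry["device"], entry["fstype"], ",".join(entry["options"]))
```

An exact match on the mount point avoids also listing child filesystems whose paths merely start with the same prefix, which a plain grep would include.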
[Kernel-packages] [Bug 1832384] Re: Unable to unmount apparently unused filesystem
@John, I was wondering what to do about this bug report. Is it still an issue or shall I close it? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1832384 Title: Unable to unmount apparently unused filesystem Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
AppleTalk is disabled on focal s390x 5.4.0-12 kernels so this bug cannot
be tripped. Marking this as Fix Released: it's not a direct fix, but it
does stop the issue.

** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854968

Title:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT
  regression testing:

  https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-focal-canonical-kernel-team-unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz

  14:44:30 DEBUG| [stdout] sctp STARTING
  14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256)
  14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds.
  14:47:44 DEBUG| [stdout] [ 3684.814345] Tainted: P OE 5.4.0-7-generic #8-Ubuntu
  14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  14:47:44 DEBUG| [stdout] [ 3684.814348] modprobe D 0 2063628 2063618 0x0800
  14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace:
  14:47:44 DEBUG| [stdout] [ 3684.814360] ([] __schedule+0x304/0x7b0)
  14:47:44 DEBUG| [stdout] [ 3684.814362] [ ] schedule+0x4a/0xe0
  14:47:44 DEBUG| [stdout] [ 3684.814366] [ ] rwsem_down_write_slowpath+0x22c/0x530
  14:47:44 DEBUG| [stdout] [ 3684.814370] [ ] register_pernet_subsys+0x2c/0x60
  14:47:44 DEBUG| [stdout] [ 3684.814411] [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp]
  14:47:44 DEBUG| [stdout] [ 3684.814414] [ ] do_one_initcall+0x40/0x200
  14:47:44 DEBUG| [stdout] [ 3684.814416] [ ] do_init_module+0x70/0x270
  14:47:44 DEBUG| [stdout] [ 3684.814418] [ ] load_module+0x1142/0x1440
  14:47:44 DEBUG| [stdout] [ 3684.814419] [ ] __do_sys_finit_module+0xa4/0xf0
  14:47:44 DEBUG| [stdout] [ 3684.814421] [ ] system_call+0x2aa/0x2c8
  14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1
  14:48:27 DEBUG| [stdout] [ 3728.214335]
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
Just to say, I did retry the reproducer test and also re-ran the adt tests to double check that this no longer fails on 5.4.0-12. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854968 Title: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x Status in linux package in Ubuntu: Fix Released
[Kernel-packages] [Bug 1822118] Re: Kernel Panic while rebooting cloud instance
@Finom, that's a good observation, much appreciated. ** Changed in: systemd (Ubuntu) Importance: Undecided => High ** Changed in: systemd (Ubuntu) Assignee: (unassigned) => Dimitri John Ledkov (xnox) ** Description changed: - Description: In the event a particular Azure cloud instance is - rebooted it's possible that it may never recover and the instance will - break indefinitely. + Very occasionally systemd panics on reboots of an azure instance. A + workaround to this issue is described in comment #20 + + + + + + Description: In the event a particular Azure cloud instance is rebooted it's possible that it may never recover and the instance will break indefinitely. In My case, it was a kernel panic. See specifics below.. - Series: Disco Instance Size: Basic_A3 Region: (Default) US-WEST-2 Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux - - I had a simple script to reboot an instance (X) amount of times, I chose 50, so the machine would power cycle by issuing a "reboot" from the terminal prompt just as a user would. Once the machine came up, it captured dmesg and other bits then rebooted again until it reached 50. + I had a simple script to reboot an instance (X) amount of times, I chose + 50, so the machine would power cycle by issuing a "reboot" from the + terminal prompt just as a user would. Once the machine came up, it + captured dmesg and other bits then rebooted again until it reached 50. After the 4th attempt, my script timed out, I took a look at the instance console log and the following displayed on the console. - [ OK ] Reached target Reboot. /shutdown: error while loading shared libra[ 89.498980] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x7f00 [ 89.498980] [ 89.500042] CPU: 0 PID: 1 Comm: shutdown Not tainted 4.18.0-1013-azure #13-Ubuntu [ 89.508026] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017 [ 89.508026] Call Trace: [ 89.508026] dump_stack+0x63/0x8a [ 89.508026] panic+0xe7/0x247 [ 89.508026] do_exit.cold.23+0x26/0x75 [ 89.508026] do_group_exit+0x43/0xb0 [ 89.508026] __x64_sys_exit_group+0x18/0x20 [ 89.508026] do_syscall_64+0x5a/0x110 [ 89.508026] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 89.508026] RIP: 0033:0x7f7bf0154d86 [ 89.508026] Code: Bad RIP value. [ 89.508026] RSP: 002b:7ffd6be693b8 EFLAGS: 0206 ORIG_RAX: 00e7 [ 89.508026] RAX: ffda RBX: 7f7bf015e420 RCX: 7f7bf0154d86 [ 89.508026] RDX: 007f RSI: 003c RDI: 007f [ 89.508026] RBP: 7f7bef9449c0 R08: 00e7 R09: [ 89.508026] R10: 7ffd6be6974c R11: 0206 R12: 0018 [ 89.508026] R13: 7f7bef944ac8 R14: 7f7bef944a00 R15: [ 89.508026] Kernel Offset: 0x1600 from 0x8100 (relocation range: 0x8000-0xbfff) [ 89.508026] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x7f00 [ 89.508026] ]--- - this only occurred once in my testing. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1822118 Title: Kernel Panic while rebooting cloud instance Status in linux-azure package in Ubuntu: Incomplete Status in systemd package in Ubuntu: Confirmed Bug description: Very occasionally systemd panics on reboots of an azure instance. A workaround to this issue is described in comment #20 Description: In the event a particular Azure cloud instance is rebooted it's possible that it may never recover and the instance will break indefinitely. In My case, it was a kernel panic. See specifics below.. 
Series: Disco Instance Size: Basic_A3 Region: (Default) US-WEST-2 Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux I had a simple script to reboot an instance (X) amount of times, I chose 50, so the machine would power cycle by issuing a "reboot" from the terminal prompt just as a user would. Once the machine came up, it captured dmesg and other bits then rebooted again until it reached 50. After the 4th attempt, my script timed out, I took a look at the instance console log and the following displayed on the console. [ OK ] Reached target Reboot. /shutdown: error while loading shared libra[ 89.498980] Kernel panic - not syncing: Attempted to kill init! exitcode=0x7f00 [ 89.498980] [ 89.500042] CPU: 0 PID: 1 Comm: shutdown Not tainted 4.18.0-1013-azure #13-Ubuntu [ 89.508026] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007
[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system
..and on a power9 box too. Marking as fix committed for 5.4.0-12 ** Changed in: linux (Ubuntu) Status: Incomplete => Fix Released -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855100 Title: bpf self tests break 5.4.0-7-generic on power8 system Status in linux package in Ubuntu: Fix Released Bug description: Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of the bare metal system. Last output seen while ssh'd into the box: 11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls 11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running socket test 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # [PASS] 11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running psock_fanout test 11:52:34 DEBUG| [stdout] # client_loop: send disconnect: Broken pipe last output in (truncated) nohup output: f -emit-llvm -c progs/pyperf180.c -o - || \ 11:52:15 DEBUG| [stdout]echo "clang failed") | \ 11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe \ 11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o this suggests the bpf selftests are causing the breakage. 
last output logged in /var/log/dmesg.log : Dec 4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE next entries on dmesg.log show machine had rebooted. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system
I've re-run this on a power8 VM with 5.4.0-12 and cannot trigger this failure. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855100 Title: bpf self tests break 5.4.0-7-generic on power8 system Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1861228] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
*** This bug is a duplicate of bug 1861235 *** https://bugs.launchpad.net/bugs/1861235 ** Bug watch added: Github Issue Tracker for ZFS #8637 https://github.com/zfsonlinux/zfs/issues/8637 ** Also affects: linux via https://github.com/zfsonlinux/zfs/issues/8637 Importance: Unknown Status: Unknown -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1861228 Title: zfs recv PANIC at range_tree.c:304:range_tree_find_impl() Status in Linux: Unknown Status in linux package in Ubuntu: Confirmed Bug description: Hello, I believe these errors happened due to a zfs recv command that was executing at the time: [10823702.582392] VERIFY(size != 0) failed [10823702.582428] PANIC at range_tree.c:304:range_tree_find_impl() [10823702.582463] Showing stack for process 693172 [10823702.582466] CPU: 7 PID: 693172 Comm: receive_writer Tainted: P O 4.15.0-60-generic #67-Ubuntu [10823702.582466] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015 [10823702.582467] Call Trace: [10823702.582475] dump_stack+0x63/0x8b [10823702.582489] spl_dumpstack+0x42/0x50 [spl] [10823702.582494] spl_panic+0xc8/0x110 [spl] [10823702.582539] ? dbuf_dirty+0x43d/0x850 [zfs] [10823702.582542] ? getrawmonotonic64+0x43/0xd0 [10823702.582544] ? getrawmonotonic64+0x43/0xd0 [10823702.582581] ? dmu_zfetch+0x49a/0x500 [zfs] [10823702.582583] ? getrawmonotonic64+0x43/0xd0 [10823702.582619] ? dmu_zfetch+0x49a/0x500 [zfs] [10823702.582621] ? mutex_lock+0x12/0x40 [10823702.582654] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs] [10823702.582697] range_tree_find_impl+0x88/0x90 [zfs] [10823702.582702] ? 
spl_kmem_zalloc+0xdc/0x1a0 [spl] [10823702.582743] range_tree_clear+0x4f/0x60 [zfs] [10823702.582780] dnode_free_range+0x11f/0x5a0 [zfs] [10823702.582815] dmu_object_free+0x53/0x90 [zfs] [10823702.582850] dmu_free_long_object+0x9f/0xc0 [zfs] [10823702.582885] receive_freeobjects.isra.12+0x7a/0x100 [zfs] [10823702.582918] receive_writer_thread+0x6d2/0xa60 [zfs] [10823702.582920] ? set_curr_task_fair+0x2b/0x60 [10823702.582925] ? spl_kmem_free+0x33/0x40 [spl] [10823702.582928] ? kfree+0x165/0x180 [10823702.582961] ? receive_free.isra.13+0xc0/0xc0 [zfs] [10823702.582967] thread_generic_wrapper+0x74/0x90 [spl] [10823702.582969] kthread+0x121/0x140 [10823702.582974] ? __thread_exit+0x20/0x20 [spl] [10823702.582975] ? kthread_create_worker_on_cpu+0x70/0x70 [10823702.582978] ret_from_fork+0x35/0x40 [10823907.445420] INFO: task txg_quiesce:4485 blocked for more than 120 seconds. [10823907.445486] Tainted: P O 4.15.0-60-generic #67-Ubuntu [10823907.445535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [10823907.445589] txg_quiesce D0 4485 2 0x8000 [10823907.445594] Call Trace: [10823907.445608] __schedule+0x24e/0x880 [10823907.445613] schedule+0x2c/0x80 [10823907.445629] cv_wait_common+0x11e/0x140 [spl] [10823907.445638] ? wait_woken+0x80/0x80 [10823907.445647] __cv_wait+0x15/0x20 [spl] [10823907.445766] txg_quiesce_thread+0x2cb/0x3d0 [zfs] [10823907.445835] ? txg_delay+0x1b0/0x1b0 [zfs] [10823907.445843] thread_generic_wrapper+0x74/0x90 [spl] [10823907.445848] kthread+0x121/0x140 [10823907.445854] ? __thread_exit+0x20/0x20 [spl] [10823907.445857] ? kthread_create_worker_on_cpu+0x70/0x70 [10823907.445861] ret_from_fork+0x35/0x40 [10823907.445916] INFO: task zfs:688217 blocked for more than 120 seconds. [10823907.445962] Tainted: P O 4.15.0-60-generic #67-Ubuntu [10823907.446010] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[10823907.446063] zfs D0 688217 688214 0x8080 [10823907.446066] Call Trace: [10823907.446071] __schedule+0x24e/0x880 [10823907.446075] schedule+0x2c/0x80 [10823907.446084] cv_wait_common+0x11e/0x140 [spl] [10823907.446088] ? wait_woken+0x80/0x80 [10823907.446095] __cv_wait+0x15/0x20 [spl] [10823907.446151] dmu_recv_stream+0xa51/0xef0 [zfs] [10823907.446227] zfs_ioc_recv_impl+0x306/0x1100 [zfs] [10823907.446232] ? ttwu_do_activate+0x77/0x80 [10823907.446303] zfs_ioc_recv_new+0x33d/0x410 [zfs] [10823907.446312] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl] [10823907.446320] ? spl_vmem_alloc+0x19/0x20 [spl] [10823907.446332] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair] [10823907.446338] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair] [10823907.446344] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair] [10823907.446409] zfsdev_ioctl+0x451/0x610 [zfs] [10823907.446415] do_vfs_ioctl+0xa8/0x630 [10823907.446419] ? __audit_syscall_entry+0xbc/0x110 [10823907.446424] ? syscall_trace_enter+0x1da/0x2d0 [10823907.446426] SyS_ioctl+0x79/0x90 [10823907.446430]
[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
Can you describe the zfs environment and the command that was being actioned that triggered this issue? ** Bug watch added: Github Issue Tracker for ZFS #8637 https://github.com/zfsonlinux/zfs/issues/8637 ** Also affects: linux via https://github.com/zfsonlinux/zfs/issues/8637 Importance: Unknown Status: Unknown -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1861235 Title: zfs recv PANIC at range_tree.c:304:range_tree_find_impl() Status in Linux: Unknown Status in linux package in Ubuntu: Confirmed Bug description: Same as bug 1861228 but with a newer kernel installed. [ 790.702566] VERIFY(size != 0) failed [ 790.702590] PANIC at range_tree.c:304:range_tree_find_impl() [ 790.702611] Showing stack for process 28685 [ 790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015 [ 790.702616] Call Trace: [ 790.702626] dump_stack+0x6d/0x8e [ 790.702637] spl_dumpstack+0x42/0x50 [spl] [ 790.702640] spl_panic+0xc8/0x110 [spl] [ 790.702645] ? __switch_to_asm+0x41/0x70 [ 790.702714] ? arc_prune_task+0x1a/0x40 [zfs] [ 790.702740] ? dbuf_dirty+0x43d/0x850 [zfs] [ 790.702745] ? getrawmonotonic64+0x43/0xd0 [ 790.702746] ? getrawmonotonic64+0x43/0xd0 [ 790.702775] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702778] ? getrawmonotonic64+0x43/0xd0 [ 790.702805] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702807] ? mutex_lock+0x12/0x40 [ 790.702833] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs] [ 790.702866] range_tree_find_impl+0x88/0x90 [zfs] [ 790.702870] ? 
spl_kmem_zalloc+0xdc/0x1a0 [spl] [ 790.702902] range_tree_clear+0x4f/0x60 [zfs] [ 790.702930] dnode_free_range+0x11f/0x5a0 [zfs] [ 790.702957] dmu_object_free+0x53/0x90 [zfs] [ 790.702983] dmu_free_long_object+0x9f/0xc0 [zfs] [ 790.703010] receive_freeobjects.isra.12+0x7a/0x100 [zfs] [ 790.703036] receive_writer_thread+0x6d2/0xa60 [zfs] [ 790.703040] ? set_curr_task_fair+0x2b/0x60 [ 790.703043] ? spl_kmem_free+0x33/0x40 [spl] [ 790.703048] ? kfree+0x165/0x180 [ 790.703073] ? receive_free.isra.13+0xc0/0xc0 [zfs] [ 790.703078] thread_generic_wrapper+0x74/0x90 [spl] [ 790.703081] kthread+0x121/0x140 [ 790.703084] ? __thread_exit+0x20/0x20 [spl] [ 790.703085] ? kthread_create_worker_on_cpu+0x70/0x70 [ 790.703088] ret_from_fork+0x35/0x40 [ 967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds. [ 967.636979] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637076] txg_quiesce D0 14810 2 0x8000 [ 967.637080] Call Trace: [ 967.637089] __schedule+0x24e/0x880 [ 967.637092] schedule+0x2c/0x80 [ 967.637106] cv_wait_common+0x11e/0x140 [spl] [ 967.637114] ? wait_woken+0x80/0x80 [ 967.637122] __cv_wait+0x15/0x20 [spl] [ 967.637210] txg_quiesce_thread+0x2cb/0x3d0 [zfs] [ 967.637278] ? txg_delay+0x1b0/0x1b0 [zfs] [ 967.637286] thread_generic_wrapper+0x74/0x90 [spl] [ 967.637291] kthread+0x121/0x140 [ 967.637297] ? __thread_exit+0x20/0x20 [spl] [ 967.637299] ? kthread_create_worker_on_cpu+0x70/0x70 [ 967.637304] ret_from_fork+0x35/0x40 [ 967.637326] INFO: task zfs:28590 blocked for more than 120 seconds. [ 967.637371] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637467] zfs D0 28590 28587 0x8080 [ 967.637470] Call Trace: [ 967.637474] __schedule+0x24e/0x880 [ 967.637477] schedule+0x2c/0x80 [ 967.637486] cv_wait_common+0x11e/0x140 [spl] [ 967.637491] ? 
wait_woken+0x80/0x80 [ 967.637498] __cv_wait+0x15/0x20 [spl] [ 967.637554] dmu_recv_stream+0xa51/0xef0 [zfs] [ 967.637630] zfs_ioc_recv_impl+0x306/0x1100 [zfs] [ 967.637679] ? dbuf_read+0x34a/0x920 [zfs] [ 967.637725] ? dbuf_rele+0x36/0x40 [zfs] [ 967.637728] ? _cond_resched+0x19/0x40 [ 967.637798] zfs_ioc_recv_new+0x33d/0x410 [zfs] [ 967.637809] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl] [ 967.637816] ? spl_vmem_alloc+0x19/0x20 [spl] [ 967.637828] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair] [ 967.637834] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair] [ 967.637840] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair] [ 967.637905] zfsdev_ioctl+0x451/0x610 [zfs] [ 967.637913] do_vfs_ioctl+0xa8/0x630 [ 967.637917] ? __audit_syscall_entry+0xbc/0x110 [ 967.637924] ? syscall_trace_enter+0x1da/0x2d0 [ 967.637927] SyS_ioctl+0x79/0x90 [ 967.637930] do_syscall_64+0x73/0x130 [ 967.637935] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [
[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot
** Changed in: linux-signed-azure (Ubuntu) Importance: Undecided => Medium ** Changed in: linux-signed-azure (Ubuntu) Status: New => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-signed-azure in Ubuntu. https://bugs.launchpad.net/bugs/1858495 Title: multiple long delays during kernel and userspace boot Status in linux-signed-azure package in Ubuntu: In Progress Bug description: Booting some Bionic instances in Azure (gen1 machines), I see some large delays during kernel/userspace boot, and it would be good to understand what's going on. Additionally, the areas of boot that see delays differ between an image that's been created from a template and the stock images. I'm attaching some data: 10 runs of the same image in a scaling set, each running the initial boot. Processing the journal output and looking for gaps of over 2.0 seconds between messages shows some areas of concern. [1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug, * this clock source is slow. Consider trying other clock sources [3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global" If you want to keep using the local clock, then add: "trace_clock=local" on the kernel command line [6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) [ 15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard After capturing bionic image as a template, and creating a new VM, we see new hot spots we didn't see before. # HotSpot maximum delta between kernel messages: 2.0 # [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled # [5.919313] localhost.localdomain kernel: raid6: avx2x4 gen() 21512 MB/s # # [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. 
Opts: (null) # [9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid) # # [ 13.773554] localhost.localdomain sh[871]: + exit 0 # [ 21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 'UDF Volume', timestamp 2019/12/17 00:00 (1000) # # [ 24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com). # [ 29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 +. Up 25.20 seconds. The easiest comparison kernel-side is the systemd-analyze value: Grepping in the debug data: % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s. Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s. Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s. Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s. Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s. Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s. Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s. Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s. Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s. Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s. foofoo % grep "Startup finished.*kernel" bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s. Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s. Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s. Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s. 
Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s. Startup finished in 6.731s (kernel) + 35.484s (userspace) = 42.216s. Startup finished in 7.152s (kernel) + 32.664s (userspace) = 39.817s. Startup finished in 7.429s (kernel) + 36.177s (userspace) = 43.606s. Startup finished in 9.075s (kernel) + 32.494s (userspace) = 41.570s. Startup finished in 7.281s (kernel) + 32.732s (userspace) = 40.013s. ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-5.0.0-1027-azure 5.0.0-1027.29~18.04.1 ProcVersionSignature: User Name 5.0.0-1027.29~18.04.1-azure 5.0.21 Uname: Linux 5.0.0-1027-azure x86_64
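The two sets of systemd-analyze figures above are easier to compare once each set is averaged. A minimal sketch, assuming the journal lines keep the plain "N.NNNs" form shown in the report (longer boots that systemd prints with minutes are not handled); the function name summarize_startup is hypothetical and not part of any tool mentioned in the bug:

```python
import re
from statistics import mean

# Matches lines such as:
#   Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.
STARTUP_RE = re.compile(
    r"Startup finished in (?P<kernel>\d+\.\d+)s \(kernel\) \+ "
    r"(?P<userspace>\d+\.\d+)s \(userspace\) = (?P<total>\d+\.\d+)s"
)


def summarize_startup(lines):
    """Return the mean kernel/userspace/total seconds over matching lines."""
    rows = [m.groupdict() for m in map(STARTUP_RE.search, lines) if m]
    if not rows:
        raise ValueError("no 'Startup finished' lines found")
    return {
        key: mean(float(row[key]) for row in rows)
        for key in ("kernel", "userspace", "total")
    }
```

Feeding it the output of each grep command separately gives one summary per image, so the shift of time from userspace to kernel boot in the templated image can be read off directly.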
[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support
Tested these updates with the kernel team ZFS autotest regression tests on the following architectures: arm64 - PASSED amd64 - PASSED s390x - PASSED ppc64el - PASSED I re-ran the failed lxd test as referenced in comment #5 and it passed, so I believe the original failure was an artifact of the test system and not a problem with ZFS per se. ** Tags added: verification-done-bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1856704 Title: backport 5.3 zfs support to bionic for HWE kernel support Status in spl-linux package in Ubuntu: Fix Committed Status in zfs-linux package in Ubuntu: Fix Committed Status in spl-linux source package in Bionic: Fix Committed Status in zfs-linux source package in Bionic: Fix Committed Bug description: == SRU Justification Bionic == The HWE 5.3 kernel requires ZFS + SPL to support dkms module build functionality for kernels 4.15 through to 5.3. Basically, the ZFS+SPL compat commits between 4.15 and 5.3 are required to allow the modules to build on kernels up to and including the HWE 5.3 kernel. 
== The Fix == Backport of upstream commits: SPL: - 0002-fix-spl-build-shrinker-callback-check.patch - 0003-remove-deprecated-set-fs-pwd-check.patch - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch - 0005-Linux-4.20-compat-current_kernel_time.patch - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch - 0007-Linux-5.0-compat-Use-totalram_pages.patch - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch - 0010-Linux-5.1-compat-get_ds-removed.patch - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch - 0012-Linux-5.2-compat-rw_tryupgrade.patch - 0013-Linux-5.3-compat-rw_semaphore-owner.patch - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch - 0016-Linux-compat-4.16-SECTOR_SIZE.patch - 0017-Linux-compat-spl-timespec_sub.patch - 0018-deprecate-splat-rwlock-test6.patch ZFS: - 3300-Linux-4.16-compat-inode_set_iversion.patch - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch - 3302-Linux-4.16-compat-get_disk_and_module.patch - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch - 3305-Linux-4.14-compat-blk_queue_stackable.patch - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch - 3308-Linux-5.0-compat-Use-totalram_pages.patch - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch - 3315-Linux-5.0-compat-ASM_BUG-macro.patch - 3316-Linux-5.2-compat-rw_tryupgrade.patch - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch - 
3320-zpios-deprecate-current-kernel-time.patch - 3321-add-compat-check-disk-size-change.patch == Testcase == Without these commits users who install kernels and kernel headers from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in Bionic because of the lack of the kernel compat fixes. With the commits, zfs + spl dkms modules can build cleanly and pass the ubuntu ZFS regression tests found in the kernel team autotests git repository. == Risk == This is a sizeable backport that touches a fair amount of spl + zfs kernel interfacing code. There is a risk that the backport may cause a regression in functionality that has not been exercised by the ZFS regression tests. This backport with the zfs regression testing ensures that no regression in core zfs functionality has been found. It must be noted that most of the patches are upstream compat fixes that are known to be working with the latest ZFS that is being used in focal, so we are confident the original compat changes work. Note that these updates have all been build tested on x86-64, arm64 and s390x systems with kernels from 4.16 to 5.3 and regression tested with the ubuntu zfs regression tests. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade
OK, I'll look into this sometime this week. Thanks for the information.

** Changed in: zfs-linux (Ubuntu) Importance: Medium => High
** Changed in: zfs-linux (Ubuntu) Status: Triaged => In Progress

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1860182

Title: zpool scrub malfunction after kernel upgrade

Status in zfs-linux package in Ubuntu: In Progress

Bug description:

I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:

eric@eric-8700K:~$ zpool status
  pool: storagepool1
 state: ONLINE
  scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
config:

    NAME                                          STATE   READ WRITE CKSUM
    storagepool1                                  ONLINE     0     0     0
      mirror-0                                    ONLINE     0     0     0
        ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE     0     0     0
        ata-ST2000DM001-1CH164_Z1E285A4           ONLINE     0     0     0
      mirror-1                                    ONLINE     0     0     0
        ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE     0     0     0
        ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE     0     0     0

I ran zpool scrub after upgrading the kernel and rebooting, and now it fails to work properly. It appeared to finish in about 5 minutes but did not, and says it is going slow:

eric@eric-8700K:~$ sudo zpool status
  pool: storagepool1
 state: ONLINE
  scan: scrub in progress since Fri Jan 17 15:32:07 2020
        1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
        0B repaired, 100.00% done
config:

    NAME                                          STATE   READ WRITE CKSUM
    storagepool1                                  ONLINE     0     0     0
      mirror-0                                    ONLINE     0     0     0
        ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE     0     0     0
        ata-ST2000DM001-1CH164_Z1E285A4           ONLINE     0     0     0
      mirror-1                                    ONLINE     0     0     0
        ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE     0     0     0
        ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE     0     0     0

errors: No known data errors

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: zfsutils-linux 0.7.5-1ubuntu16.7
ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13
Uname: Linux 5.3.0-26-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Fri Jan 17 16:22:01 2020
InstallationDate: Installed on 2018-03-07 (681 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1860182/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
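The symptom reported above, a scrub that claims to be in progress while already at 100.00% done, can be detected mechanically. A rough sketch, assuming the `zpool status` text has the same wording as the output quoted in this report (other ZFS releases may phrase the scan line differently); scrub_looks_stuck is a hypothetical helper name:

```python
import re


def scrub_looks_stuck(status_text):
    """Return True if the status text shows a scrub still 'in progress'
    even though it reports 100% done, as in the report above."""
    in_progress = re.search(r"scan:\s+scrub in progress", status_text) is not None
    done = re.search(r"(\d+(?:\.\d+)?)% done", status_text)
    return in_progress and done is not None and float(done.group(1)) >= 100.0
```

It only inspects text that has already been captured, so it can be run against saved output from an affected machine without touching the pool.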
[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade
** Changed in: zfs-linux (Ubuntu) Importance: High => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1860182 Title: zpool scrub malfunction after kernel upgrade Status in zfs-linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Cornered this to zswap and not an issue with mm or I/O. Figured out
that 3 hours of soak testing on each bisect step is the only reliable
way to do a bisect. Bisected between 4.20 and 5.0, finally cornered the
issue and hence the commits required to fix this.

** Description changed:

+ == SRU Justification ==
+
+ When using zram (as installed and configured with the zram-config package)
+ systems can lock up after about a week of use. This occurs because of
+ a hang in a lock in zram.
+
+ == Test Case ==
+
+ Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1GB of
+ memory, 16 CPU threads and zram-config installed. Without the fix the
+ kernel will hang in a spinlock after 1-2 hours of run time. With the fix,
+ the hang does not occur. Testing shows that with the fix, 5 x 16 CPU hours
+ of stress testing with stress-ng works fine without the lockup occurring.
+
+ == The fix ==
+
+ Upstream commit c4d6c4cc7bfd ("zram: correct flag name of ZRAM_ACCESS") as
+ a prerequisite followed by a minor context wiggle backport of the fix with
+ commit 3c9959e02547 ("zram: fix lockdep warning of free block handling").
+
+ == Regression Potential ==
+
+ This touches the zram locking, so the core zram driver is affected. However
+ the fixes are backports from 5.0, so the fixes have had a fair amount of
+ testing in later kernels.
+
+
  My main server has been running into hard lockups about once a week ever
  since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens,
  nothing is printed to the console, it's effectively stuck showing a login
  prompt. The system is running with panic=1 on the cmdline but isn't
  rebooting so the kernel isn't even processing this as a kernel panic.
-
- As this felt like a potential hardware issue, I had my hosting provider give me a completely different system, different motherboard, different CPU, different RAM and different storage, I installed that system on 18.04 and moved my data over, a week later, I hit the issue again.
+ As this felt like a potential hardware issue, I had my hosting provider + give me a completely different system, different motherboard, different + CPU, different RAM and different storage, I installed that system on + 18.04 and moved my data over, a week later, I hit the issue again. We've since also had a LXD user reporting similar symptoms here also on varying hardware: - https://github.com/lxc/lxd/issues/5197 + https://github.com/lxc/lxd/issues/5197 - - My system doesn't have a lot of memory pressure with about 50% of free memory: + My system doesn't have a lot of memory pressure with about 50% of free + memory: root@vorash:~# free -m - totalusedfree shared buff/cache available + totalusedfree shared buff/cache available Mem: 31819 17574 402 513 13842 13292 Swap: 159092687 13222 I will now try to increase console logging as much as possible on the system in the hopes that next time it hangs we can get a better idea of what happened but I'm not too hopeful given the complete silence on the console when this occurs. System is currently on: - Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux + Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux But I've seen this since the GA kernel on 4.15 so it's not a recent regression. 
- --- + --- ProblemType: Bug AlsaDevices: - total 0 - crw-rw 1 root audio 116, 1 Oct 23 16:12 seq - crw-rw 1 root audio 116, 33 Oct 23 16:12 timer + total 0 + crw-rw 1 root audio 116, 1 Oct 23 16:12 seq + crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: - Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied - Cannot stat file /proc/22831/fd/10: Permission denied + Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied + Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: - RESUME=none - CRYPTSETUP=n + RESUME=none + CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: - Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub - Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse - Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub + Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub +
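For anyone wanting to rerun the test case from the SRU justification above, it can be sketched roughly as below. The helper names `zram_swap_active` and `run_soak` and the 3h default duration are assumptions made for illustration; only the stress-ng stressor options (`--brk 0 --stack 0`) come from the report.

```shell
#!/bin/sh
# Sketch only: helper names and the default duration are hypothetical.

# Succeeds if the swap table (default: /proc/swaps) lists a zram device,
# i.e. compressed swap of the kind zram-config sets up is in use.
zram_swap_active() {
    grep -q '^/dev/zram' "${1:-/proc/swaps}"
}

# Run the brk and stack stressors from the test case.
# An instance count of 0 asks stress-ng for one instance per online CPU.
run_soak() {
    duration="${1:-3h}"
    if ! zram_swap_active; then
        echo "no zram swap device active; install zram-config first" >&2
        return 1
    fi
    stress-ng --brk 0 --stack 0 --timeout "$duration"
}
```

In a disposable Bionic VM this would be started with `run_soak 3h`. Per the report, an unfixed kernel is expected to hang within 1-2 hours, so a run that completes is only weak evidence on its own.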
[Kernel-packages] [Bug 1862101] Re: ubuntu_zfs_fstest / ubuntu_zfs_xfs_generic failed to build on Focal 5.4
** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: zfs-linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
       Status: New => In Progress

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1862101

Title:
  ubuntu_zfs_fstest / ubuntu_zfs_xfs_generic failed to build on Focal 5.4

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Incomplete
Status in zfs-linux package in Ubuntu:
  In Progress

Bug description:
  The test build fails because of unmet dependencies between the
  zfsutils-linux and zfs-dkms packages:

  apt-get install --yes --force-yes build-essential gdb git gcc zfsutils-linux

  stdout:
  Reading package lists...
  Building dependency tree...
  Reading state information...
  build-essential is already the newest version (12.8ubuntu1).
  gcc is already the newest version (4:9.2.1-3.1ubuntu1).
  gdb is already the newest version (9.0.90.20200117-0ubuntu1).
  git is already the newest version (1:2.25.0-1ubuntu1).
  Some packages could not be installed. This may mean that you have
  requested an impossible situation or if you are using the unstable
  distribution that some required packages have not yet been created
  or been moved out of Incoming.
  The following information may help to resolve the situation:

  The following packages have unmet dependencies:
   zfsutils-linux : Breaks: zfs-dkms (< 0.8.3-1ubuntu2)

  stderr:
  W: --force-yes is deprecated, use one of the options starting with --allow instead.
  E: Unable to correct problems, you have held broken packages.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1862101/+subscriptions
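The `Breaks:` relation in the apt output above means zfsutils-linux cannot be installed alongside any zfs-dkms older than 0.8.3-1ubuntu2. A minimal sketch of checking a version against that threshold with `dpkg --compare-versions` follows. The function name `breaks_zfsutils` is hypothetical, and the threshold is simply copied from the apt output in the report.

```shell
#!/bin/sh
# Sketch only: breaks_zfsutils is a hypothetical helper, not part of any package.

# Threshold copied from the report: zfsutils-linux Breaks zfs-dkms (< 0.8.3-1ubuntu2).
BREAKS_BELOW="0.8.3-1ubuntu2"

# Succeeds (exit 0) if the given zfs-dkms version is older than the
# threshold, i.e. it falls inside the Breaks range.
breaks_zfsutils() {
    dpkg --compare-versions "$1" lt "$BREAKS_BELOW"
}
```

On a Debian or Ubuntu system the installed version could be fed in with something like `breaks_zfsutils "$(dpkg-query -W -f='${Version}' zfs-dkms)"`, which succeeds only when the installed zfs-dkms is too old for the new zfsutils-linux.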
[Kernel-packages] [Bug 1861359] Re: swap storms kills interactive use
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: linux (Ubuntu) Importance: Undecided => High -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1861359 Title: swap storms kills interactive use Status in linux package in Ubuntu: Confirmed Bug description: Hello, several times since upgrading to focal from 19.04 I've found my computer entirely unresponsive for periods of twenty or thirty seconds. No mouse movement, no keyboard input, the screen output does not change. My computer was using swap space and despite very slow writeout speeds well below what the NVME drive can handle, the computer was unusable. I've captured some vmstat 1 output and top output that I started collecting during the event. (Normally one very long painful period is followed by several shorter periods of uselessness.) Thanks ProblemType: Bug DistroRelease: Ubuntu 20.04 Package: linux-image-5.4.0-12-generic 5.4.0-12.15 ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8 Uname: Linux 5.4.0-12-generic x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu15 Architecture: amd64 Date: Wed Jan 29 23:44:05 2020 ProcEnviron: TERM=rxvt-unicode-256color PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash SourcePackage: linux-signed-5.4 UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago) --- ProblemType: Bug AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-12-generic. 
ApportVersion: 2.20.11-0ubuntu16 Architecture: amd64 AudioDevicesInUse: USERPID ACCESS COMMAND /dev/snd/controlC0: sarnold2734 F pulseaudio /dev/snd/controlC1: sarnold2734 F pulseaudio Card0.Amixer.info: Card hw:0 'PCH'/'HDA Intel PCH at 0x2fe1028000 irq 145' Mixer name : 'Realtek ALC285' Components : 'HDA:10ec0285,17aa225c,0012 HDA:8086280b,80860101,0010' Controls : 53 Simple ctrls : 15 Card1.Amixer.info: Card hw:1 'Audio'/'Generic ThinkPad Dock USB Audio at usb-:00:14.0-4.2.4, high speed' Mixer name : 'USB Mixer' Components : 'USB17ef:306f' Controls : 9 Simple ctrls : 4 DistroRelease: Ubuntu 20.04 HibernationDevice: RESUME=none IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: LENOVO 20KHCTO1WW NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) ProcEnviron: TERM=rxvt-unicode-256color PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 i915drmfb ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu@/vmlinuz-5.4.0-12-generic root=ZFS=rpool/ROOT/ubuntu ro root=ZFS=rpool/ROOT/ubuntu quiet splash acpi_osi=! 
"acpi_osi=Windows 2015" vt.handoff=1 ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8 RelatedPackageVersions: linux-restricted-modules-5.4.0-12-generic N/A linux-backports-modules-5.4.0-12-generic N/A linux-firmware1.185 Tags: focal Uname: Linux 5.4.0-12-generic x86_64 UpgradeStatus: Upgraded to focal on 2020-01-24 (5 days ago) UserGroups: adm cdrom libvirt lpadmin plugdev sambashare sbuild sudo _MarkForUpload: True dmi.bios.date: 11/25/2019 dmi.bios.vendor: LENOVO dmi.bios.version: N23ET69W (1.44 ) dmi.board.asset.tag: Not Available dmi.board.name: 20KHCTO1WW dmi.board.vendor: LENOVO dmi.board.version: SDK0J40709 WIN dmi.chassis.asset.tag: No Asset Information dmi.chassis.type: 10 dmi.chassis.vendor: LENOVO dmi.chassis.version: None dmi.modalias: dmi:bvnLENOVO:bvrN23ET69W(1.44):bd11/25/2019:svnLENOVO:pn20KHCTO1WW:pvrThinkPadX1Carbon6th:rvnLENOVO:rn20KHCTO1WW:rvrSDK0J40709WIN:cvnLENOVO:ct10:cvrNone: dmi.product.family: ThinkPad X1 Carbon 6th dmi.product.name: 20KHCTO1WW dmi.product.sku: LENOVO_MT_20KH_BU_Think_FM_ThinkPad X1 Carbon 6th dmi.product.version: ThinkPad X1 Carbon 6th dmi.sys.vendor: LENOVO --- ProblemType: Bug AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-12-generic. ApportVersion: 2.20.11-0ubuntu16 Architecture: amd64 AudioDevicesInUse: USERPID ACCESS COMMAND /dev/snd/controlC0: sarnold2734 F pulseaudio /dev/snd/controlC1: sarnold2734 F pulseaudio Card0.Amixer.info: Card hw:0 'PCH'/'HDA Intel PCH at 0x2fe1028000 irq 145' Mixer name : 'Realtek ALC285' Components : 'HDA:10ec0285,17aa225c,0012 HDA:8086280b,80860101,0010' Controls : 53 Simple ctrls : 15 Card1.Amixer.info: Card hw:1 'Audio'/'Generic ThinkPad Dock USB Audio at usb-:00:14.0-4.2.4, high speed' Mixer name : 'USB Mixer' Components : 'USB17ef:306f' Controls : 9 Simple ctrls : 4
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Running w/o swapfile and zswap and just stress-ng brk and stack
stressors with NO file I/O can also lock the system.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed

Bug description:
  My main server has been running into hard lockups about once a week
  ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console, it's effectively
  stuck showing a login prompt. The system is running with panic=1 on
  the cmdline but isn't rebooting so the kernel isn't even processing
  this as a kernel panic.

  As this felt like a potential hardware issue, I had my hosting provider
  give me a completely different system, different motherboard, different
  CPU, different RAM and different storage, I installed that system on
  18.04 and moved my data over, a week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms here also on varying hardware:
  https://github.com/lxc/lxd/issues/5197

  My system doesn't have a lot of memory pressure with about 50% of free
  memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the
  system in the hopes that next time it hangs we can get a better idea
  of what happened but I'm not too hopeful given the complete silence on
  the console when this occurs.

  System is currently on:
  Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent
  regression.
--- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Oct 23 16:12 seq crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: RESUME=none CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: Intel Corporation S1200SP NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8 ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18 RelatedPackageVersions: linux-restricted-modules-4.15.0-38-generic N/A linux-backports-modules-4.15.0-38-generic N/A linux-firmware 1.173.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic Uname: Linux 4.15.0-38-generic x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False dmi.bios.date: 01/25/2018 dmi.bios.vendor: Intel Corporation dmi.bios.version: S1200SP.86B.03.01.1029.012520180838 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: S1200SP dmi.board.vendor: Intel Corporation dmi.board.version: H57532-271 dmi.chassis.asset.tag: dmi.chassis.type: 23 dmi.chassis.vendor: ... dmi.chassis.version: .. dmi.modalias: dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..: dmi.product.family: Family dmi.product.name: S1200SP dmi.product.version: dmi.sys.vendor: Intel
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Couple more notes:

1. Disable file based swap on /swapfile - can reproduce issue
2. Use partition based swap on 2nd disk - can reproduce issue

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed
[Kernel-packages] [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: linux (Ubuntu) Importance: Undecided => Medium ** Changed in: linux (Ubuntu) Importance: Medium => High -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1861235 Title: zfs recv PANIC at range_tree.c:304:range_tree_find_impl() Status in linux package in Ubuntu: Confirmed Bug description: Same as bug 1861228 but with a newer kernel installed. [ 790.702566] VERIFY(size != 0) failed [ 790.702590] PANIC at range_tree.c:304:range_tree_find_impl() [ 790.702611] Showing stack for process 28685 [ 790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015 [ 790.702616] Call Trace: [ 790.702626] dump_stack+0x6d/0x8e [ 790.702637] spl_dumpstack+0x42/0x50 [spl] [ 790.702640] spl_panic+0xc8/0x110 [spl] [ 790.702645] ? __switch_to_asm+0x41/0x70 [ 790.702714] ? arc_prune_task+0x1a/0x40 [zfs] [ 790.702740] ? dbuf_dirty+0x43d/0x850 [zfs] [ 790.702745] ? getrawmonotonic64+0x43/0xd0 [ 790.702746] ? getrawmonotonic64+0x43/0xd0 [ 790.702775] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702778] ? getrawmonotonic64+0x43/0xd0 [ 790.702805] ? dmu_zfetch+0x49a/0x500 [zfs] [ 790.702807] ? mutex_lock+0x12/0x40 [ 790.702833] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs] [ 790.702866] range_tree_find_impl+0x88/0x90 [zfs] [ 790.702870] ? spl_kmem_zalloc+0xdc/0x1a0 [spl] [ 790.702902] range_tree_clear+0x4f/0x60 [zfs] [ 790.702930] dnode_free_range+0x11f/0x5a0 [zfs] [ 790.702957] dmu_object_free+0x53/0x90 [zfs] [ 790.702983] dmu_free_long_object+0x9f/0xc0 [zfs] [ 790.703010] receive_freeobjects.isra.12+0x7a/0x100 [zfs] [ 790.703036] receive_writer_thread+0x6d2/0xa60 [zfs] [ 790.703040] ? set_curr_task_fair+0x2b/0x60 [ 790.703043] ? spl_kmem_free+0x33/0x40 [spl] [ 790.703048] ? 
kfree+0x165/0x180 [ 790.703073] ? receive_free.isra.13+0xc0/0xc0 [zfs] [ 790.703078] thread_generic_wrapper+0x74/0x90 [spl] [ 790.703081] kthread+0x121/0x140 [ 790.703084] ? __thread_exit+0x20/0x20 [spl] [ 790.703085] ? kthread_create_worker_on_cpu+0x70/0x70 [ 790.703088] ret_from_fork+0x35/0x40 [ 967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds. [ 967.636979] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637076] txg_quiesce D0 14810 2 0x8000 [ 967.637080] Call Trace: [ 967.637089] __schedule+0x24e/0x880 [ 967.637092] schedule+0x2c/0x80 [ 967.637106] cv_wait_common+0x11e/0x140 [spl] [ 967.637114] ? wait_woken+0x80/0x80 [ 967.637122] __cv_wait+0x15/0x20 [spl] [ 967.637210] txg_quiesce_thread+0x2cb/0x3d0 [zfs] [ 967.637278] ? txg_delay+0x1b0/0x1b0 [zfs] [ 967.637286] thread_generic_wrapper+0x74/0x90 [spl] [ 967.637291] kthread+0x121/0x140 [ 967.637297] ? __thread_exit+0x20/0x20 [spl] [ 967.637299] ? kthread_create_worker_on_cpu+0x70/0x70 [ 967.637304] ret_from_fork+0x35/0x40 [ 967.637326] INFO: task zfs:28590 blocked for more than 120 seconds. [ 967.637371] Tainted: P O 4.15.0-76-generic #86-Ubuntu [ 967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 967.637467] zfs D0 28590 28587 0x8080 [ 967.637470] Call Trace: [ 967.637474] __schedule+0x24e/0x880 [ 967.637477] schedule+0x2c/0x80 [ 967.637486] cv_wait_common+0x11e/0x140 [spl] [ 967.637491] ? wait_woken+0x80/0x80 [ 967.637498] __cv_wait+0x15/0x20 [spl] [ 967.637554] dmu_recv_stream+0xa51/0xef0 [zfs] [ 967.637630] zfs_ioc_recv_impl+0x306/0x1100 [zfs] [ 967.637679] ? dbuf_read+0x34a/0x920 [zfs] [ 967.637725] ? dbuf_rele+0x36/0x40 [zfs] [ 967.637728] ? _cond_resched+0x19/0x40 [ 967.637798] zfs_ioc_recv_new+0x33d/0x410 [zfs] [ 967.637809] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl] [ 967.637816] ? spl_vmem_alloc+0x19/0x20 [spl] [ 967.637828] ? 
nv_alloc_sleep_spl+0x1f/0x30 [znvpair] [ 967.637834] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair] [ 967.637840] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair] [ 967.637905] zfsdev_ioctl+0x451/0x610 [zfs] [ 967.637913] do_vfs_ioctl+0xa8/0x630 [ 967.637917] ? __audit_syscall_entry+0xbc/0x110 [ 967.637924] ? syscall_trace_enter+0x1da/0x2d0 [ 967.637927] SyS_ioctl+0x79/0x90 [ 967.637930] do_syscall_64+0x73/0x130 [ 967.637935] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 967.637938] RIP: 0033:0x7fc305a905d7 [ 967.637940] RSP: 002b:7ffc45e39618 EFLAGS: 0246 ORIG_RAX: 00
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
Captured the hard lock on the following:

(gdb) stepi
0x8c4e29e5 in ?? ()
=> 0x8c4e29e5:  eb ec           jmp    0x8c4e29d3
(gdb) stepi
0x8c4e29d3 in ?? ()
=> 0x8c4e29d3:  8b 07           mov    (%rdi),%eax
(gdb) stepi
0x8c4e29d5 in ?? ()
=> 0x8c4e29d5:  85 c0           test   %eax,%eax
(gdb) stepi
0x8c4e29d7 in ?? ()
=> 0x8c4e29d7:  75 0a           jne    0x8c4e29e3
(gdb) stepi
0x8c4e29e3 in ?? ()
=> 0x8c4e29e3:  f3 90           pause
(gdb) stepi
0x8c4e29e5 in ?? ()
=> 0x8c4e29e5:  eb ec           jmp    0x8c4e29d3

This maps to:

810e29c0 :
810e29d3:       8b 07           mov    (%rdi),%eax
810e29d5:       85 c0           test   %eax,%eax
810e29d7:       75 0a           jne    810e29e3
810e29d9:       f0 0f b1 17     lock cmpxchg %edx,(%rdi)
810e29dd:       85 c0           test   %eax,%eax
810e29df:       75 f2           jne    810e29d3
810e29e1:       5d              pop    %rbp
810e29e2:       c3              retq
810e29e3:       f3 90           pause
810e29e5:       eb ec           jmp    810e29d3

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in zram-config package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Confirmed
Status in zram-config source package in Bionic:
  Confirmed
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
Dann, tested on my 6640 on an older kernel, now get:

sudo dmidecode
# dmidecode 3.1
# No SMBIOS nor DMI entry point found, sorry.

I guess that's expected. I'd like to see what Ethan gets on his H/W as
I'm not running a cloud installation on my dev board.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Fix Released
Status in dmidecode source package in Bionic:
  In Progress
Status in dmidecode source package in Eoan:
  In Progress
Status in dmidecode source package in Focal:
  Fix Released
Status in dmidecode package in Debian:
  Unknown

Bug description:
  [Impact]
  Running 'sudo dmidecode' on non-UEFI ARM systems can cause them to
  crash/reboot. cloud-init apparently runs dmidecode as root, so it
  breaks any cloud-init based installation.

  [Test Case]
  sudo dmidecode

  [Fix]
  Upstream has the following fix:

  commit e12ec26e19e02281d3e7258c3aabb88a5cf5ec1d
  Author: Jean Delvare
  Date:   Mon Aug 26 14:20:15 2019 +0200

      dmidecode: Only scan /dev/mem for entry point on x86

  [Regression Risk]
  In Ubuntu, dmidecode only builds on amd64, arm64, armhf & i386. The fix
  is to disable code on !x86, so the regression risk is restricted to ARM
  platforms, where we know /dev/mem trolling is bad news.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
It needs backporting to eoan, disco, bionic. I was just about to upload
a fix to my PPA so I could get it sponsored. Do you want to take it from
here, Dann?

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to dmidecode in Ubuntu.
https://bugs.launchpad.net/bugs/1858615

Title:
  dmidecode triggers system reboot on Inforce 6640

Status in cloud-init:
  Invalid
Status in dmidecode package in Ubuntu:
  Fix Released
Status in dmidecode source package in Bionic:
  In Progress
Status in dmidecode source package in Eoan:
  In Progress
Status in dmidecode source package in Focal:
  Fix Released
Status in dmidecode package in Debian:
  Unknown

Bug description:
  Device: Inforce 6640
  https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc
  SoC: Snapdragon 820

  sysname='Linux', nodename='ubuntu', release='4.15.0-1069-snapdragon',
  version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019',
  machine='aarch64'

  The issue is caused by the following commit. Inforce 6640 doesn't have
  a functional dmidecode. The system will reboot when executing dmidecode.

  commit 3416e2ee7f65defdb15aab861a85767d13e8c34c
  Author: Robert Schweikert
  Date:   Sat Oct 29 09:29:53 2016 -0400

      dmidecode: Allow dmidecode to be used on aarch64

      aarch64 systems have functional dmidecode, so allow that to be used.
      - aarch64 has support for dmidecode as well

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
Upstream has a fix like the one I was hinting at in comment #9; I'll SRU this fix.

commit e12ec26e19e02281d3e7258c3aabb88a5cf5ec1d
Author: Jean Delvare
Date: Mon Aug 26 14:20:15 2019 +0200

    dmidecode: Only scan /dev/mem for entry point on x86

    x86 is the only architecture which can have a DMI entry point scanned from /dev/mem. Do not attempt it on other architectures, because not only it can't work, but it can even cause the system to reboot.

    This fixes support request #109697:
    https://savannah.nongnu.org/support/?109697

** Changed in: dmidecode (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: dmidecode (Ubuntu)
       Status: Triaged => In Progress

--
https://bugs.launchpad.net/bugs/1858615
Title: dmidecode triggers system reboot on Inforce 6640
Status in cloud-init: Invalid
Status in dmidecode package in Ubuntu: In Progress
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
I guess the next question is why cloud-init needs to run dmidecode as root. What happens on architectures that don't have DMI data?

--
https://bugs.launchpad.net/bugs/1858615
Title: dmidecode triggers system reboot on Inforce 6640
Status in cloud-init: Invalid
Status in dmidecode package in Ubuntu: In Progress
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
dmidecode.c directly accesses memory and assumes it is running on x86, without checking that the architecture really is x86. Randomly scanning arbitrary hunks of memory on non-x86 as root will lead to all sorts of woe:

memory_scan:
        if (!(opt.flags & FLAG_QUIET))
                printf("Scanning %s for entry point.\n", opt.devmem);
        /* Fallback to memory scan (x86, x86_64) */
        if ((buf = mem_chunk(0xF0000, 0x10000, opt.devmem)) == NULL)
        {
                ret = 1;
                goto exit_free;
        }

It probably needs wrapping with:

#if defined(__x86_64__) || defined(__x86_64) || \
    defined(__i386__) || defined(__i386)
...
#endif

Anyhow, I don't think this is a kernel-specific issue. I can trigger this with various kernels; we just don't protect users with CAP_SYS_ADMIN rights from doing crazy probing on /dev/mem.

** Changed in: dmidecode (Ubuntu)
     Assignee: Colin Ian King (colin-king) => (unassigned)

--
https://bugs.launchpad.net/bugs/1858615
Title: dmidecode triggers system reboot on Inforce 6640
Status in cloud-init: Invalid
Status in dmidecode package in Ubuntu: Triaged
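The architecture guard proposed in the comment above can be sketched in isolation as follows. This is only a minimal illustration, not the upstream dmidecode change: HAVE_LEGACY_DMI_SCAN, legacy_scan_stub() and find_entry_point() are hypothetical names, and the stub merely prints a message instead of touching /dev/mem.

```c
#include <assert.h>
#include <stdio.h>

/* Compile-time switch: 1 on x86/x86_64 builds, 0 everywhere else. */
#if defined(__x86_64__) || defined(__x86_64) || \
    defined(__i386__) || defined(__i386)
#define HAVE_LEGACY_DMI_SCAN 1
#else
#define HAVE_LEGACY_DMI_SCAN 0
#endif

#if HAVE_LEGACY_DMI_SCAN
/* Hypothetical stand-in for the real memory scan; it reads no memory. */
static int legacy_scan_stub(const char *devmem)
{
    printf("Scanning %s for entry point.\n", devmem);
    return 0;
}
#endif

/*
 * Returns 0 if the (stubbed) legacy scan ran, 1 if it was skipped
 * because the build target is not x86.
 */
int find_entry_point(const char *devmem)
{
    assert(devmem != NULL);
#if HAVE_LEGACY_DMI_SCAN
    return legacy_scan_stub(devmem);
#else
    (void)devmem; /* never touched on non-x86 builds */
    fprintf(stderr, "No legacy entry point scan on this architecture.\n");
    return 1;
#endif
}
```

Because the decision is made by the preprocessor, the scanning code is not even compiled into non-x86 binaries, so no runtime option can re-enable it by accident.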
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
So, dmidecode directly mmaps /dev/mem and does some probing based on the belief that the system is an x86 architecture, even on ARM architectures:

  openat(AT_FDCWD, "/dev/mem", O_RDONLY) = 3
  fstat(3, {st_mode=S_IFCHR|0640, st_rdev=makedev(0x1, 0x1), ...}) = 0
  mmap(NULL, 65536, PROT_READ, MAP_SHARED, 3, 0xf0000) = 0x7f9f6fd000
  etc.

So that's kind of intrusive, and as root one can read any physical address through /dev/mem, which may cause breakage.

--
https://bugs.launchpad.net/bugs/1858615
Title: dmidecode triggers system reboot on Inforce 6640
Status in cloud-init: Invalid
Status in dmidecode package in Ubuntu: Triaged
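A less intrusive first step than mapping /dev/mem is to check whether the kernel already exports the SMBIOS entry point through sysfs. The sketch below assumes the usual sysfs location, /sys/firmware/dmi/tables/smbios_entry_point. The enum and function names are hypothetical, and this is not dmidecode's actual code. It never opens /dev/mem.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed sysfs location of the kernel-exported SMBIOS entry point. */
#define SYSFS_SMBIOS_ENTRY "/sys/firmware/dmi/tables/smbios_entry_point"

enum dmi_source {
    DMI_SOURCE_SYSFS,  /* kernel-provided tables are available */
    DMI_SOURCE_NONE    /* nothing safe to read; report and give up */
};

/* Return 1 if the calling process may read the given path, 0 otherwise. */
int path_is_readable(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}

/* Decide where DMI data may come from without probing physical memory. */
enum dmi_source pick_dmi_source(const char *sysfs_entry)
{
    if (path_is_readable(sysfs_entry))
        return DMI_SOURCE_SYSFS;
    fprintf(stderr, "No SMBIOS entry point at %s\n", sysfs_entry);
    return DMI_SOURCE_NONE;
}
```

A caller would pass SYSFS_SMBIOS_ENTRY and, on DMI_SOURCE_NONE, simply report that no entry point was found rather than fall back to a raw memory scan on a non-x86 machine.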
[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support
@ubuntu stable folks - can this be uploaded sometime soon?

--
https://bugs.launchpad.net/bugs/1856704
Title: backport 5.3 zfs support to bionic for HWE kernel support
Status in spl-linux package in Ubuntu: Fix Committed
Status in zfs-linux package in Ubuntu: Fix Committed
Status in spl-linux source package in Bionic: Fix Committed
Status in zfs-linux source package in Bionic: Fix Committed

Bug description:
== SRU Justification Bionic ==

The HWE 5.3 kernel requires ZFS + SPL to support dkms module build functionality for kernels 4.15 through to 5.3. Basically, the ZFS+SPL compat commits between 4.15 and 5.3 are required to allow the modules to build on kernels up to and including the HWE 5.3 kernel.

== The Fix ==

Backport of upstream commits:

SPL:
 - 0002-fix-spl-build-shrinker-callback-check.patch
 - 0003-remove-deprecated-set-fs-pwd-check.patch
 - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch
 - 0005-Linux-4.20-compat-current_kernel_time.patch
 - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch
 - 0007-Linux-5.0-compat-Use-totalram_pages.patch
 - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch
 - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch
 - 0010-Linux-5.1-compat-get_ds-removed.patch
 - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch
 - 0012-Linux-5.2-compat-rw_tryupgrade.patch
 - 0013-Linux-5.3-compat-rw_semaphore-owner.patch
 - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch
 - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
 - 0016-Linux-compat-4.16-SECTOR_SIZE.patch
 - 0017-Linux-compat-spl-timespec_sub.patch
 - 0018-deprecate-splat-rwlock-test6.patch

ZFS:
 - 3300-Linux-4.16-compat-inode_set_iversion.patch
 - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch
 - 3302-Linux-4.16-compat-get_disk_and_module.patch
 - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch
 - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch
 - 3305-Linux-4.14-compat-blk_queue_stackable.patch
 - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch
 - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch
 - 3308-Linux-5.0-compat-Use-totalram_pages.patch
 - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch
 - 3310-Linux-5.0-compat-Fix-SUBDIRs.patch
 - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch
 - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch
 - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch
 - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch
 - 3315-Linux-5.0-compat-ASM_BUG-macro.patch
 - 3316-Linux-5.2-compat-rw_tryupgrade.patch
 - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch
 - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch
 - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch
 - 3320-zpios-deprecate-current-kernel-time.patch
 - 3321-add-compat-check-disk-size-change.patch

== Testcase ==

Without these commits, users who install kernels and kernel headers from 4.16 through to 5.3 inclusive won't be able to build spl + zfs in Bionic because of the lack of the kernel compat fixes. With the commits, zfs + spl dkms modules can build cleanly and pass the ubuntu ZFS regression tests found in the kernel team autotests git repository.

== Risk ==

This is a sizeable backport that touches a fair amount of spl + zfs kernel interfacing code. There is a risk that the backport may cause a regression in functionality that has not been exercised by the ZFS regression tests. Testing this backport with the zfs regression tests has found no regression in core zfs functionality. It must be noted that most of the patches are upstream compat fixes that are known to be working with the latest ZFS that is being used in focal, so we are confident the original compat changes work.

Note that these updates have all been build tested on x86-64, arm64 and s390x systems with kernels from 4.16 to 5.3 and regression tested with the ubuntu zfs regression tests.
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
Hi, can you provide me instructions on how to get and install the image for this board? I'd like to reproduce this issue and get a suitable fix for this.

--
https://bugs.launchpad.net/bugs/1858615
Title: dmidecode triggers system reboot on Inforce 6640
Status in cloud-init: Invalid
Status in dmidecode package in Ubuntu: Triaged
[Kernel-packages] [Bug 1855100] Re: bpf self tests break 5.4.0-7-generic on power8 system
** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

--
https://bugs.launchpad.net/bugs/1855100
Title: bpf self tests break 5.4.0-7-generic on power8 system
Status in linux package in Ubuntu: Incomplete

Bug description:
Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of the bare metal system.

Last output seen while ssh'd into the box:

11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls
11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests
11:52:34 DEBUG| [stdout] #
11:52:34 DEBUG| [stdout] # running socket test
11:52:34 DEBUG| [stdout] #
11:52:34 DEBUG| [stdout] # [PASS]
11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests
11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests
11:52:34 DEBUG| [stdout] #
11:52:34 DEBUG| [stdout] # running psock_fanout test
11:52:34 DEBUG| [stdout] #
client_loop: send disconnect: Broken pipe

Last output in (truncated) nohup output:

f -emit-llvm -c progs/pyperf180.c -o - || \
11:52:15 DEBUG| [stdout] echo "clang failed") | \
11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe \
11:52:15 DEBUG| [stdout] -filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o

This suggests the bpf selftests are causing the breakage.

Last output logged in /var/log/dmesg.log:

Dec 4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE
Dec 4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE
Dec 4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE
Dec 4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE
Dec 4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE

The next entries in dmesg.log show the machine had rebooted.
[Kernel-packages] [Bug 1860182] Re: zpool scrub malfunction after kernel upgrade
** Changed in: zfs-linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
       Status: New => Triaged

--
https://bugs.launchpad.net/bugs/1860182
Title: zpool scrub malfunction after kernel upgrade
Status in zfs-linux package in Ubuntu: Triaged

Bug description:
I ran a zpool scrub prior to upgrading my 18.04 to the latest HWE kernel (5.3.0-26-generic #28~18.04.1-Ubuntu) and it ran properly:

eric@eric-8700K:~$ zpool status
  pool: storagepool1
 state: ONLINE
  scan: scrub repaired 1M in 4h21m with 0 errors on Fri Jan 17 07:01:24 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        storagepool1                                  ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
            ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
            ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

I ran zpool scrub after upgrading the kernel and rebooting, and now it fails to work properly. It appeared to finish in about 5 minutes but did not, and says it is going slow:

eric@eric-8700K:~$ sudo zpool status
  pool: storagepool1
 state: ONLINE
  scan: scrub in progress since Fri Jan 17 15:32:07 2020
        1.89T scanned out of 1.89T at 589M/s, (scan is slow, no estimated time)
        0B repaired, 100.00% done
config:

        NAME                                          STATE     READ WRITE CKSUM
        storagepool1                                  ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M3YFRVJ3  ONLINE       0     0     0
            ata-ST2000DM001-1CH164_Z1E285A4           ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            ata-WDC_WD20EZRZ-00Z5HB0_WD-WCC4M1DSASHD  ONLINE       0     0     0
            ata-ST2000DM006-2DM164_Z4ZA3ENE           ONLINE       0     0     0

errors: No known data errors

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: zfsutils-linux 0.7.5-1ubuntu16.7
ProcVersionSignature: Ubuntu 5.3.0-26.28~18.04.1-generic 5.3.13
Uname: Linux 5.3.0-26-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Fri Jan 17 16:22:01 2020
InstallationDate: Installed on 2018-03-07 (681 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to bionic on 2018-08-02 (533 days ago)
modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']
[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support
** Description changed: - 5.3 kernel functionality back through to 4.15 is required for 5.3 HWE - kernel support in ZFS and SPL modules. + == SRU Justification Bionic == + + The HWE 5.3 kernel requires ZFS + SPL to support dkms module build + functionality for kernels 4.15 through to 5.3. Basically, the ZFS+SPL + compat commits between 4.15 and 5.3 are required to allow the modules to + build on kernels upto and include the HWE 5.3 kernel. + + == The Fix == + + Backport of upstream commits: + + SPL: + - 0002-fix-spl-build-shrinker-callback-check.patch + - 0003-remove-deprecated-set-fs-pwd-check.patch + - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch + - 0005-Linux-4.20-compat-current_kernel_time.patch + - 0006-Linux-4.18-compat-Use-ktime_get_coarse_real_ts64.patch + - 0007-Linux-5.0-compat-Use-totalram_pages.patch + - 0008-Linux-5.0-compat-Fix-SUBDIRs.patch + - 0009-Linux-4.20-compat-Fix-VERIFY-RW_READ_HELD-hash-mh_co.patch + - 0010-Linux-5.1-compat-get_ds-removed.patch + - 0011-Linux-5.0-compat-Use-totalhigh_pages.patch + - 0012-Linux-5.2-compat-rw_tryupgrade.patch + - 0013-Linux-5.3-compat-rw_semaphore-owner.patch + - 0014-Linux-5.3-compat-retire-rw_tryupgrade.patch + - 0015-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch + - 0016-Linux-compat-4.16-SECTOR_SIZE.patch + - 0017-Linux-compat-spl-timespec_sub.patch + - 0018-deprecate-splat-rwlock-test6.patch + + ZFS: + - 3300-Linux-4.16-compat-inode_set_iversion.patch + - 3301-Linux-4.16-compat-use-correct-_dec_and_test.patch + - 3302-Linux-4.16-compat-get_disk_and_module.patch + - 3303-Linux-compat-4.16-blk_queue_flag_-set-clear.patch + - 3304-Linux-4.18-compat-inode-timespec-timespec64.patch + - 3305-Linux-4.14-compat-blk_queue_stackable.patch + - 3306-Linux-4.19-rc3-compat-Remove-refcount_t-compat.patch + - 3307-Linux-5.0-compat-access_ok-drops-type-parameter.patch + - 3308-Linux-5.0-compat-Use-totalram_pages.patch + - 3309-Linux-5.0-compat-Convert-MS_-macros-to-SB_.patch + - 
3310-Linux-5.0-compat-Fix-SUBDIRs.patch + - 3311-Linux-5.0-compat-Disable-vector-instructions-on-5.0-.patch + - 3312-Linux-5.0-compat-Fix-bio_set_dev.patch + - 3313-Linux-5.0-compat-Remove-incorrect-ASSERT.patch + - 3314-Linux-5.0-compat-Use-totalhigh_pages.patch + - 3315-Linux-5.0-compat-ASM_BUG-macro.patch + - 3316-Linux-5.2-compat-rw_tryupgrade.patch + - 3317-Linux-5.2-compat-Directly-call-wait_on_page_bit.patch + - 3318-Linux-5.3-compat-Makefile-subdir-m-no-longer-support.patch + - 3319-Linux-5.3-Fix-switch-fall-though-compiler-errors.patch + - 3320-zpios-deprecate-current-kernel-time.patch + - 3321-add-compat-check-disk-size-change.patch + + == Testcase == + + Without these commits users who install kernels and kernel headers from + 4.16 through to 5.3 inclusive won't be able to build spl + zfs in Bionic + because of the lack of the kernel compat fixes. With the commits, zfs + + spl dkms modules can build cleanly and pass the ubuntu ZFS regression + tests found in the kernel team autotests git repository. + + == Risk == + + This is a sizeable backport that touches a fair amount of spl + zfs + kernel interfacing code. There is a risk that the backport may cause a + regression in functionality that has not been exercised by the ZFS + regression tests. This backport with the zfs regression testing ensures + that no regression in core zfs functionality has been found. It must + be noted that most of the patches are upstream compat fixes that are + known to be working with the latest ZFS that is being used in focal, so + we are confident the original compat changes work. + + Note that these updates have all been build tested on x86-64, arm64 and + s390x systems with kernels from 4.16 to 5.3 and regression tested with + the ubuntu zfs regression tests. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1856704 Title: backport 5.3 zfs support to bionic for HWE kernel support Status in spl-linux package in Ubuntu: Fix Committed Status in zfs-linux package in Ubuntu: Fix Committed Status in spl-linux source package in Bionic: Fix Committed Status in zfs-linux source package in Bionic: Fix Committed Bug description: == SRU Justification Bionic == The HWE 5.3 kernel requires ZFS + SPL to support dkms module build functionality for kernels 4.15 through to 5.3. Basically, the ZFS+SPL compat commits between 4.15 and 5.3 are required to allow the modules to build on kernels upto and include the HWE 5.3 kernel. == The Fix == Backport of upstream commits: SPL: - 0002-fix-spl-build-shrinker-callback-check.patch - 0003-remove-deprecated-set-fs-pwd-check.patch - 0004-Linux-4.18-compat-inode-timespec-timespec64.patch -
[Kernel-packages] [Bug 1856704] Re: backport 5.3 zfs support to bionic for HWE kernel support
** Changed in: spl-linux (Ubuntu)
       Status: In Progress => Fix Committed

** Changed in: spl-linux (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Changed in: zfs-linux (Ubuntu)
       Status: In Progress => Fix Committed

** Changed in: zfs-linux (Ubuntu Bionic)
       Status: In Progress => Fix Committed

--
https://bugs.launchpad.net/bugs/1856704
Title: backport 5.3 zfs support to bionic for HWE kernel support
Status in spl-linux package in Ubuntu: Fix Committed
Status in zfs-linux package in Ubuntu: Fix Committed
Status in spl-linux source package in Bionic: Fix Committed
Status in zfs-linux source package in Bionic: Fix Committed

Bug description:
5.3 kernel functionality back through to 4.15 is required for 5.3 HWE kernel support in ZFS and SPL modules.
[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption
** Also affects: zfs-linux (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: zfs-linux (Ubuntu)
       Status: New => Fix Committed

** Changed in: zfs-linux (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => High

** Changed in: zfs-linux (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

--
https://bugs.launchpad.net/bugs/1857040
Title: zfs: upstream support for hardware-accelerated encryption
Status in linux package in Ubuntu: In Progress
Status in zfs-linux package in Ubuntu: Fix Released

Bug description:
I understand that in Linux 5.0+, certain encryption-related symbols have been marked GPL-only, making them unavailable for use by zfs. As a result, using encryption in zfs pools increases cpu load / decreases disk throughput.

There are a pair of upstream pull requests that should improve the performance (with performance measurement done on x86-64). Can these be pulled into the Ubuntu kernel?

https://github.com/zfsonlinux/zfs/pull/9515
https://github.com/zfsonlinux/zfs/pull/9296
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
After quite a bit of experimentation I found that I can reproduce the bug if I have zram *and* also swap on the filesystem enabled while exercising the brk stressors and aiol (to cause lots of I/O). Eventually the system grinds to a halt, we lose interactivity and we eventually get lockups as follows: [ 2012.040006] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [stress-ng-brk:1632] [ 2012.040922] Modules linked in: zram(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) pcbc(E) aesni_intel(E) aes_x86_64(E) crypto_simd(E) glue_helper(E) cryptd(E) psmouse(E) input_leds(E) floppy(E) virtio_scsi(E) serio_raw(E) i2c_piix4(E) mac_hid(E) pata_acpi(E) qemu_fw_cfg(E) 9pnet_virtio(E) 9p(E) 9pnet(E) fscache(E) [ 2012.044655] CPU: 2 PID: 1632 Comm: stress-ng-brk Tainted: GEL 4.15.18 #1 [ 2012.045581] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 2012.046555] RIP: 0010:__raw_callee_save___pv_queued_spin_unlock+0x10/0x17 [ 2012.047340] RSP: 0018:b73382083718 EFLAGS: 0246 ORIG_RAX: ff11 [ 2012.048238] RAX: 0001 RBX: RCX: 0002 [ 2012.049078] RDX: RSI: 9d327c2f6918 RDI: a3269978 [ 2012.049909] RBP: b73382083720 R08: 9d327c2f6918 R09: 9d327c0a5328 [ 2012.050746] R10: 9d327c1e2310 R11: 9d327c1e2328 R12: 9d327c2f6800 [ 2012.051574] R13: 9d327c1e2328 R14: 9d327c1e2310 R15: 9d327c1e2200 [ 2012.052436] FS: 7f89f2ccd740() GS:9d327f28() knlGS: [ 2012.053382] CS: 0010 DS: ES: CR0: 80050033 [ 2012.054058] CR2: 7f1350a8dd90 CR3: 311a4004 CR4: 00160ee0 [ 2012.054889] Call Trace: [ 2012.055192] get_swap_pages+0x193/0x360 [ 2012.055652] get_swap_page+0x13f/0x1e0 [ 2012.056123] add_to_swap+0x14/0x70 [ 2012.056530] shrink_page_list+0x81d/0xbc0 [ 2012.057013] shrink_inactive_list+0x242/0x590 [ 2012.057523] shrink_node_memcg+0x364/0x770 [ 2012.058012] shrink_node+0xf7/0x300 [ 2012.058432] ? 
shrink_node+0xf7/0x300 [ 2012.058863] do_try_to_free_pages+0xc9/0x330 [ 2012.059368] try_to_free_pages+0xee/0x1b0 [ 2012.059842] __alloc_pages_slowpath+0x3fc/0xe00 [ 2012.060424] __alloc_pages_nodemask+0x29a/0x2c0 [ 2012.060963] alloc_pages_vma+0x88/0x1f0 [ 2012.061414] __handle_mm_fault+0x8b7/0x12e0 [ 2012.061909] handle_mm_fault+0xb1/0x210 [ 2012.062375] __do_page_fault+0x281/0x4b0 [ 2012.062848] do_page_fault+0x2e/0xe0 [ 2012.063274] ? async_page_fault+0x2f/0x50 [ 2012.063751] do_async_page_fault+0x51/0x80 [ 2012.064262] async_page_fault+0x45/0x50 [ 2012.064719] RIP: 0033:0x55ec1997bd0a [ 2012.065147] RSP: 002b:7ffeacd21600 EFLAGS: 00010246 [ 2012.065754] RAX: 55ec28601000 RBX: 0005 RCX: 7f89f2de956b [ 2012.066580] RDX: 55ec28601000 RSI: 7ffeacd216d0 RDI: 55ec28602000 [ 2012.067410] RBP: 7ffeacd216c0 R08: R09: 7f89f3d0c2f0 [ 2012.068290] R10: R11: 0246 R12: [ 2012.069129] R13: 0002 R14: 0001 R15: 7ffeacd216d0 [ 2012.069965] Code: 50 41 51 41 52 41 53 e8 3b 05 00 00 41 5b 41 5a 41 59 41 58 5f 5e 5a 59 5d c3 90 55 48 89 e5 52 b8 01 00 00 00 31 d2 f0 0f b0 17 <3c> 01 75 03 5a 5d c3 56 0f b6 f0 e8 bc ff ff ff 5e 5a 5d c3 0f -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1799497 Title: 4.15 kernel hard lockup about once a week Status in linux package in Ubuntu: Incomplete Status in zram-config package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in zram-config source package in Bionic: Confirmed Bug description: My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens, nothing is printed to the console, it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting so the kernel isn't even processing this as a kernel panic. 
As this felt like a potential hardware issue, I had my hosting provider give me a completely different system: different motherboard, different CPU, different RAM and different storage. I installed that system on 18.04 and moved my data over. A week later, I hit the issue again.

We've since also had a LXD user reporting similar symptoms, also on varying hardware:
https://github.com/lxc/lxd/issues/5197

My system doesn't have a lot of memory pressure with about 50% of free memory:

root@vorash:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          31819       17574         402         513       13842       13292
Swap:         15909        2687       13222

I will now try to increase
[Kernel-packages] [Bug 1858495] Re: multiple long delays during kernel and userspace boot
** Changed in: linux-signed-azure (Ubuntu)
     Assignee: (unassigned) => Colin Ian King (colin-king)

--
https://bugs.launchpad.net/bugs/1858495
Title: multiple long delays during kernel and userspace boot
Status in linux-signed-azure package in Ubuntu: New

Bug description:
Booting some Bionic instances in Azure (gen1 machines), I see some large delays during kernel/userspace boot, and it would be good to understand what's going on. Additionally, the areas of boot that see delays are different for an image that's been created from a template vs. stock images.

I'm attaching some data: 10 runs of the same image in a scaling set that run the initial boot. Processing the journal output and looking at delays of over 2.0 seconds shows some areas of concern.

[1.788581] localhost.localdomain kernel: * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
           * this clock source is slow. Consider trying other clock sources
[3.545974] localhost.localdomain kernel: Unstable clock detected, switching default tracing clock to "global"
           If you want to keep using the local clock, then add:
           "trace_clock=local"
           on the kernel command line
[6.401684] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 15.280390] localhost.localdomain kernel: EXT4-fs (sda1): re-mounted. Opts: discard

After capturing the bionic image as a template and creating a new VM, we see new hot spots we didn't see before.

# HotSpot maximum delta between kernel messages: 2.0
# [2.846188] localhost.localdomain kernel: AES CTR mode by8 optimization enabled
# [5.919313] localhost.localdomain kernel: raid6: avx2x4 gen() 21512 MB/s
#
# [6.591530] localhost.localdomain kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
# [9.031051] localhost.localdomain systemd[1]: systemd 237 running in system mode.
(+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid) # # [ 13.773554] localhost.localdomain sh[871]: + exit 0 # [ 21.625467] localhost.localdomain kernel: UDF-fs: INFO Mounting volume 'UDF Volume', timestamp 2019/12/17 00:00 (1000) # # [ 24.919359] bugbif2be01 systemd-timesyncd[771]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com). # [ 29.787339] bugbif2be01 cloud-init[1026]: Cloud-init v. 19.2-36-g059d049c-0ubuntu2~18.04.1 running 'init' at Mon, 16 Dec 2019 18:14:47 +. Up 25.20 seconds. The easiest comparison kernel-side is the systemd-analyze value: Grepping in the debug data: % grep "Startup finished.*kernel" bug-bionic-baseline-no*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s. Startup finished in 3.355s (kernel) + 51.732s (userspace) = 55.088s. Startup finished in 3.287s (kernel) + 51.747s (userspace) = 55.035s. Startup finished in 3.129s (kernel) + 50.066s (userspace) = 53.195s. Startup finished in 3.350s (kernel) + 50.682s (userspace) = 54.032s. Startup finished in 3.355s (kernel) + 49.322s (userspace) = 52.678s. Startup finished in 3.219s (kernel) + 51.124s (userspace) = 54.343s. Startup finished in 3.128s (kernel) + 49.226s (userspace) = 52.354s. Startup finished in 3.193s (kernel) + 53.197s (userspace) = 56.390s. Startup finished in 3.118s (kernel) + 46.203s (userspace) = 49.322s. foofoo % grep "Startup finished.*kernel" bug-bionic-baseline-after*.debug/*/journal.log | cut -d" " -f 7- Startup finished in 7.685s (kernel) + 32.463s (userspace) = 40.148s. Startup finished in 7.041s (kernel) + 35.998s (userspace) = 43.040s. Startup finished in 7.808s (kernel) + 35.444s (userspace) = 43.253s. Startup finished in 7.206s (kernel) + 37.952s (userspace) = 45.159s. Startup finished in 8.426s (kernel) + 36.976s (userspace) = 45.403s. 
Startup finished in 6.731s (kernel) + 35.484s (userspace) = 42.216s. Startup finished in 7.152s (kernel) + 32.664s (userspace) = 39.817s. Startup finished in 7.429s (kernel) + 36.177s (userspace) = 43.606s. Startup finished in 9.075s (kernel) + 32.494s (userspace) = 41.570s. Startup finished in 7.281s (kernel) + 32.732s (userspace) = 40.013s. ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-5.0.0-1027-azure 5.0.0-1027.29~18.04.1 ProcVersionSignature: User Name 5.0.0-1027.29~18.04.1-azure 5.0.21 Uname: Linux 5.0.0-1027-azure x86_64 ApportVers
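The two sets of systemd-analyze figures above are easier to compare as averages. The following is only a rough sketch, not part of the original report: the function names are made up for illustration, and it assumes every line has the plain "Xs (kernel) + Ys (userspace) = Zs" form shown in the grep output (boots that take over a minute are printed in a different format and would not match).

```python
import re
from statistics import mean

# Matches lines such as:
#   Startup finished in 3.209s (kernel) + 49.305s (userspace) = 52.515s.
STARTUP_RE = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
)


def parse_startup_times(text):
    """Return a list of (kernel, userspace, total) tuples, in seconds."""
    return [tuple(float(v) for v in m.groups()) for m in STARTUP_RE.finditer(text)]


def summarize(times):
    """Return the mean kernel, userspace and total boot times, in seconds."""
    kernel, userspace, total = zip(*times)
    return mean(kernel), mean(userspace), mean(total)
```

Feeding the output of each grep command into parse_startup_times() and then summarize() gives one (kernel, userspace, total) triple per image, which makes the shift from userspace time to kernel time between the two sets easier to see.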
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
Oh, stupid me, I've just read the info in comment #1 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to dmidecode in Ubuntu. https://bugs.launchpad.net/bugs/1858615 Title: dmidecode triggers system reboot on Inforce 6640 Status in cloud-init: Invalid Status in dmidecode package in Ubuntu: Triaged Bug description: Device: Inforce 6640 https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc SoC: Snapdragon 820 sysname='Linux', nodename='ubuntu', release='4.15.0-1069-snapdragon', version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019', machine='aarch64' The issue is caused by the following commit. Inforce 6640 doesn't have functional dmidecode. The system will reboot when executing dmidecode. commit 3416e2ee7f65defdb15aab861a85767d13e8c34c Author: Robert Schweikert Date: Sat Oct 29 09:29:53 2016 -0400 dmidecode: Allow dmidecode to be used on aarch64 aarch64 systems have functional dmidecode, so allow that to be used. - aarch64 has support for dmidecode as well To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1858615] Re: dmidecode triggers system reboot on Inforce 6640
Does the kernel expose the DMI tables via the following sysfs file: /sys/firmware/dmi/tables/DMI ? If so, can you run the following: sudo cat /sys/firmware/dmi/tables/DMI > dmi.raw and attach the resulting file to the bug report? Also, a dump of the kernel dmesg log after it boots may be useful to see if it's a broken firmware DMI table or a kernel issue. ** Changed in: dmidecode (Ubuntu) Status: New => Triaged ** Changed in: dmidecode (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: dmidecode (Ubuntu) Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to dmidecode in Ubuntu. https://bugs.launchpad.net/bugs/1858615 Title: dmidecode triggers system reboot on Inforce 6640 Status in cloud-init: Invalid Status in dmidecode package in Ubuntu: Triaged Bug description: Device: Inforce 6640 https://www.inforcecomputing.com/products/single-board-computers-sbc/qualcomm-snapdragon-820-inforce-6640-sbc SoC: Snapdragon 820 sysname='Linux', nodename='ubuntu', release='4.15.0-1069-snapdragon', version='#76-Ubuntu SMP Tue Nov 26 16:10:14 UTC 2019', machine='aarch64' The issue is caused by the following commit. Inforce 6640 doesn't have functional dmidecode. The system will reboot when executing dmidecode. commit 3416e2ee7f65defdb15aab861a85767d13e8c34c Author: Robert Schweikert Date: Sat Oct 29 09:29:53 2016 -0400 dmidecode: Allow dmidecode to be used on aarch64 aarch64 systems have functional dmidecode, so allow that to be used. - aarch64 has support for dmidecode as well To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1858615/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
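Once a dmi.raw file has been captured, one quick way to check whether the firmware table is at least structurally sane is to walk the structure headers. This is a minimal sketch only, assuming the file holds just the raw SMBIOS structure table (no entry point) as read from the sysfs file above; the function name walk_dmi_structures is made up for illustration and is not part of dmidecode.

```python
import struct


def walk_dmi_structures(raw):
    """Yield (type, length, handle) for each structure in a raw SMBIOS table.

    Each structure starts with a 4-byte header (type, length, handle). The
    length covers the header plus the formatted area; after that comes a
    string set that is terminated by two NUL bytes.
    """
    offset = 0
    while offset + 4 <= len(raw):
        stype, length, handle = struct.unpack_from("<BBH", raw, offset)
        if length < 4:
            raise ValueError(f"bad structure length {length} at offset {offset}")
        yield stype, length, handle
        if stype == 127:  # type 127 is the end-of-table marker
            break
        # Skip the formatted area, then look for the double-NUL terminator.
        end = raw.find(b"\x00\x00", offset + length)
        if end == -1:
            break  # table is truncated; stop rather than guess
        offset = end + 2
```

For example, opening dmi.raw in binary mode and printing each tuple from walk_dmi_structures(data) should end with a type 127 entry if the table is intact; a ValueError or a missing end marker would point towards a broken firmware table rather than a kernel issue.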
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
I can reproduce this with stress-ng exercising a high memory pressure scenario using: stress-ng --brk 0 -v --aiol 0 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1799497 Title: 4.15 kernel hard lockup about once a week Status in linux package in Ubuntu: Incomplete Status in zram-config package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in zram-config source package in Bionic: Confirmed Bug description: My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens, nothing is printed to the console; it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic. As this felt like a potential hardware issue, I had my hosting provider give me a completely different system (different motherboard, different CPU, different RAM and different storage). I installed 18.04 on that system and moved my data over; a week later, I hit the issue again. We've since also had an LXD user report similar symptoms, also on varying hardware: https://github.com/lxc/lxd/issues/5197 My system doesn't have a lot of memory pressure, with about 50% of memory free: root@vorash:~# free -m total used free shared buff/cache available Mem: 31819 17574 402 513 13842 13292 Swap: 15909 2687 13222 I will now try to increase console logging as much as possible on the system in the hope that next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs. System is currently on: Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux But I've seen this since the GA kernel on 4.15, so it's not a recent regression.
--- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Oct 23 16:12 seq crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: RESUME=none CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: Intel Corporation S1200SP NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8 ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18 RelatedPackageVersions: linux-restricted-modules-4.15.0-38-generic N/A linux-backports-modules-4.15.0-38-generic N/A linux-firmware 1.173.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic Uname: Linux 4.15.0-38-generic x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False dmi.bios.date: 01/25/2018 dmi.bios.vendor: Intel Corporation dmi.bios.version: S1200SP.86B.03.01.1029.012520180838 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: S1200SP dmi.board.vendor: Intel Corporation dmi.board.version: H57532-271 dmi.chassis.asset.tag: dmi.chassis.type: 23 dmi.chassis.vendor: ... dmi.chassis.version: .. dmi.modalias: dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..: dmi.product.family: Family dmi.product.name: S1200SP dmi.product.version: dmi.sys.vendor: Intel
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
I'm assuming the defaults are being used for the moment; this means 50% of total memory is used for zram swap in total, distributed across the number of CPUs, as defined in /usr/bin/init-zram-swapping -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1799497 Title: 4.15 kernel hard lockup about once a week Status in linux package in Ubuntu: Incomplete Status in zram-config package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in zram-config source package in Bionic: Confirmed Bug description: My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens, nothing is printed to the console; it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic. As this felt like a potential hardware issue, I had my hosting provider give me a completely different system (different motherboard, different CPU, different RAM and different storage). I installed 18.04 on that system and moved my data over; a week later, I hit the issue again. We've since also had an LXD user report similar symptoms, also on varying hardware: https://github.com/lxc/lxd/issues/5197 My system doesn't have a lot of memory pressure, with about 50% of memory free: root@vorash:~# free -m total used free shared buff/cache available Mem: 31819 17574 402 513 13842 13292 Swap: 15909 2687 13222 I will now try to increase console logging as much as possible on the system in the hope that next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs.
System is currently on: Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux But I've seen this since the GA kernel on 4.15 so it's not a recent regression. --- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Oct 23 16:12 seq crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: RESUME=none CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: Intel Corporation S1200SP NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8 ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18 RelatedPackageVersions: linux-restricted-modules-4.15.0-38-generic N/A linux-backports-modules-4.15.0-38-generic N/A linux-firmware 1.173.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic Uname: Linux 4.15.0-38-generic x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False dmi.bios.date: 01/25/2018 dmi.bios.vendor: Intel Corporation dmi.bios.version: S1200SP.86B.03.01.1029.012520180838 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: S1200SP dmi.board.vendor: Intel Corporation dmi.board.version: H57532-271 dmi.chassis.asset.tag: dmi.chassis.type: 23 dmi.chassis.vendor: ... dmi.chassis.version: .. dmi.modalias: dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..: dmi.product.family: Family dmi.product.name: S1200SP
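To make the default sizing described in the comment above concrete, here is a small sketch of the arithmetic: half of total memory, split across one zram device per CPU. It is an illustration under assumptions, not a copy of /usr/bin/init-zram-swapping; the function name and the fraction parameter are hypothetical, and the real script should be checked for the exact rounding it uses.

```python
def zram_device_size_bytes(total_mem_kib, num_cpus, fraction=0.5):
    """Size in bytes of each zram device when `fraction` of RAM is split per CPU.

    total_mem_kib is the total memory in KiB (as reported by `free`),
    num_cpus is the number of zram devices (assumed to be one per CPU).
    """
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    per_device_kib = int(total_mem_kib * fraction) // num_cpus
    return per_device_kib * 1024
```

With 8 GiB of RAM (8388608 KiB) and 4 CPUs, for instance, this works out to four devices of 1 GiB each, i.e. 4 GiB of zram swap in total.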
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
It would be useful to know if you have made any specific zram config changes and, if so, what your current config is, just to help with debugging this issue. ** Changed in: linux (Ubuntu) Status: Confirmed => Incomplete ** Changed in: zram-config (Ubuntu) Status: Confirmed => Incomplete -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1799497 Title: 4.15 kernel hard lockup about once a week Status in linux package in Ubuntu: Incomplete Status in zram-config package in Ubuntu: Incomplete Status in linux source package in Bionic: Confirmed Status in zram-config source package in Bionic: Confirmed Bug description: My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens, nothing is printed to the console; it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic. As this felt like a potential hardware issue, I had my hosting provider give me a completely different system (different motherboard, different CPU, different RAM and different storage). I installed 18.04 on that system and moved my data over; a week later, I hit the issue again. We've since also had an LXD user report similar symptoms, also on varying hardware: https://github.com/lxc/lxd/issues/5197 My system doesn't have a lot of memory pressure, with about 50% of memory free: root@vorash:~# free -m total used free shared buff/cache available Mem: 31819 17574 402 513 13842 13292 Swap: 15909 2687 13222 I will now try to increase console logging as much as possible on the system in the hope that next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs.
System is currently on: Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux But I've seen this since the GA kernel on 4.15 so it's not a recent regression. --- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Oct 23 16:12 seq crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: RESUME=none CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: Intel Corporation S1200SP NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8 ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18 RelatedPackageVersions: linux-restricted-modules-4.15.0-38-generic N/A linux-backports-modules-4.15.0-38-generic N/A linux-firmware 1.173.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic Uname: Linux 4.15.0-38-generic x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False dmi.bios.date: 01/25/2018 dmi.bios.vendor: Intel Corporation dmi.bios.version: S1200SP.86B.03.01.1029.012520180838 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: S1200SP dmi.board.vendor: Intel Corporation dmi.board.version: H57532-271 dmi.chassis.asset.tag: dmi.chassis.type: 23 dmi.chassis.vendor: ... dmi.chassis.version: .. dmi.modalias:
[Kernel-packages] [Bug 1853044] Re: 5.3.0-23-generic causes fans to spin when idle
I'll get a kernel sorted out for testing by EOD. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1853044 Title: 5.3.0-23-generic causes fans to spin when idle Status in linux package in Ubuntu: Confirmed Bug description: After upgrading to 5.3.0-23-generic the fans in my machine don't stop running. They always sound like something is utilizing CPU - even with no applications running after boot. If I boot back to 5.3.0-19-generic it's fine. My microcode version is reported as 0xd4 and iucode-tool reports: iucode-tool: system has processor(s) with signature 0x000506e3 Let me know if you need anything else. ProblemType: Bug DistroRelease: Ubuntu 19.10 Package: linux-image-5.3.0-23-generic 5.3.0-23.25 ProcVersionSignature: Ubuntu 5.3.0-23.25-generic 5.3.7 Uname: Linux 5.3.0-23-generic x86_64 NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair ApportVersion: 2.20.11-0ubuntu8.2 Architecture: amd64 AudioDevicesInUse: USER PID ACCESS COMMAND /dev/snd/controlC2: dean 2898 F pulseaudio /dev/snd/pcmC2D0p: dean 2898 F...m pulseaudio /dev/snd/controlC0: dean 2898 F pulseaudio /dev/snd/controlC1: dean 2898 F pulseaudio CurrentDesktop: ubuntu:GNOME Date: Mon Nov 18 13:03:34 2019 HibernationDevice: RESUME=UUID=55a42c82-50bf-4e75-a133-dbd3aa93611b InstallationDate: Installed on 2018-07-24 (482 days ago) InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180724) ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 i915drmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.3.0-23-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7 RelatedPackageVersions: linux-restricted-modules-5.3.0-23-generic N/A linux-backports-modules-5.3.0-23-generic N/A linux-firmware 1.183.2 SourcePackage: linux UpgradeStatus: Upgraded to eoan on 2019-07-19 (121 days ago) dmi.bios.date: 05/16/2018 dmi.bios.vendor:
Intel Corp. dmi.bios.version: KYSKLi70.86A.0055.2018.0516.1629 dmi.board.name: NUC6i7KYB dmi.board.vendor: Intel Corporation dmi.board.version: H90766-406 dmi.chassis.type: 3 dmi.chassis.vendor: Intel Corporation dmi.chassis.version: 1.0 dmi.modalias: dmi:bvnIntelCorp.:bvrKYSKLi70.86A.0055.2018.0516.1629:bd05/16/2018:svn:pn:pvr:rvnIntelCorporation:rnNUC6i7KYB:rvrH90766-406:cvnIntelCorporation:ct3:cvr1.0: To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853044/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption
The next spin of the focal kernel will pick this up when it is built with the new zfs-dkms driver. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1857040 Title: zfs: upstream support for hardware-accelerated encryption Status in linux package in Ubuntu: In Progress Bug description: I understand that in Linux 5.0+, certain encryption-related symbols have been marked GPL-only, making them unavailable for use by zfs. As a result, using encryption in zfs pools increases cpu load / decreases disk throughput. There are a pair of upstream pull requests that should improve the performance (with performance measurement done on x86-64). Can these be pulled into the Ubuntu kernel? https://github.com/zfsonlinux/zfs/pull/9515 https://github.com/zfsonlinux/zfs/pull/9296 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1799497] Re: 4.15 kernel hard lockup about once a week
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1799497 Title: 4.15 kernel hard lockup about once a week Status in linux package in Ubuntu: Confirmed Status in zram-config package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Status in zram-config source package in Bionic: Confirmed Bug description: My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel. When this happens, nothing is printed to the console; it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic. As this felt like a potential hardware issue, I had my hosting provider give me a completely different system (different motherboard, different CPU, different RAM and different storage). I installed 18.04 on that system and moved my data over; a week later, I hit the issue again. We've since also had an LXD user report similar symptoms, also on varying hardware: https://github.com/lxc/lxd/issues/5197 My system doesn't have a lot of memory pressure, with about 50% of memory free: root@vorash:~# free -m total used free shared buff/cache available Mem: 31819 17574 402 513 13842 13292 Swap: 15909 2687 13222 I will now try to increase console logging as much as possible on the system in the hope that next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs. System is currently on: Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux But I've seen this since the GA kernel on 4.15, so it's not a recent regression.
--- ProblemType: Bug AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Oct 23 16:12 seq crw-rw 1 root audio 116, 33 Oct 23 16:12 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.4 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied Cannot stat file /proc/22831/fd/10: Permission denied DistroRelease: Ubuntu 18.04 HibernationDevice: RESUME=none CRYPTSETUP=n IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: Intel Corporation S1200SP NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=xterm PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 mgadrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8 ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18 RelatedPackageVersions: linux-restricted-modules-4.15.0-38-generic N/A linux-backports-modules-4.15.0-38-generic N/A linux-firmware 1.173.1 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' Tags: bionic Uname: Linux 4.15.0-38-generic x86_64 UnreportableReason: This report is about a package that is not installed. 
UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: False dmi.bios.date: 01/25/2018 dmi.bios.vendor: Intel Corporation dmi.bios.version: S1200SP.86B.03.01.1029.012520180838 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: S1200SP dmi.board.vendor: Intel Corporation dmi.board.version: H57532-271 dmi.chassis.asset.tag: dmi.chassis.type: 23 dmi.chassis.vendor: ... dmi.chassis.version: .. dmi.modalias: dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...:ct23:cvr..: dmi.product.family: Family dmi.product.name: S1200SP dmi.product.version: dmi.sys.vendor: Intel Corporation To man
[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption
Also should apply: commit 10fa254539ec41c6b043785d4e7ab34bce383b9f Author: Brian Behlendorf Date: Thu Oct 24 10:17:33 2019 -0700 Linux 4.14, 4.19, 5.0+ compat: SIMD save/restore but this also requires a rather tricky backport of: commit 006e9a40882468be68f276c946bae812b74ac35c Author: Matthew Macy Date: Thu Sep 5 09:34:54 2019 -0700 OpenZFS restructuring - move platform specific headers and also we are dependent on a backport of: commit 608f8749a1055e6769899788e11bd51fd396f9e5 Author: Brian Behlendorf Date: Tue Oct 1 12:50:34 2019 -0700 Perform KABI checks in parallel -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1857040 Title: zfs: upstream support for hardware-accelerated encryption Status in linux package in Ubuntu: In Progress Bug description: I understand that in Linux 5.0+, certain encryption-related symbols have been marked GPL-only, making them unavailable for use by zfs. As a result, using encryption in zfs pools increases cpu load / decreases disk throughput. There are a pair of upstream pull requests that should improve the performance (with performance measurement done on x86-64). Can these be pulled into the Ubuntu kernel? https://github.com/zfsonlinux/zfs/pull/9515 https://github.com/zfsonlinux/zfs/pull/9296 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
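For readers who want to check how a particular kernel symbol is exported on their own system, a minimal sketch follows. It assumes the installed kernel headers provide a tab-separated Module.symvers file (the default path below is an assumption about the local install), and `export_type` is a hypothetical helper name, not part of any kernel or ZFS tooling. The column order of Module.symvers has varied between kernel releases, so the sketch scans for whichever field begins with EXPORT_SYMBOL instead of relying on a fixed column.

```shell
#!/bin/bash
# export_type SYMBOL [SYMVERS_FILE]
# Print the export type (for example EXPORT_SYMBOL or EXPORT_SYMBOL_GPL)
# recorded for SYMBOL in a Module.symvers file.
# Hypothetical helper for illustration only.
export_type() {
    local sym=$1
    # Default path is an assumption; pass the file explicitly if it differs.
    local symvers=${2:-/lib/modules/$(uname -r)/build/Module.symvers}
    awk -F'\t' -v sym="$sym" '
        $2 == sym {
            # Column order varies by kernel version, so scan every field
            # after the symbol name for the export type.
            for (i = 3; i <= NF; i++)
                if ($i ~ /^EXPORT_SYMBOL/) { print $i; exit }
        }' "$symvers"
}
```

Calling `export_type some_symbol` prints nothing if the symbol is not listed at all; a result of EXPORT_SYMBOL_GPL means a module with a non-GPL-compatible license, such as zfs, cannot link against it.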
[Kernel-packages] [Bug 1858650] Re: package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade: installed zfsutils-linux package post-installation script subprocess returned error exit status
ZFS kernel modules are not supported on small-memory ARM platforms such as the Raspberry Pi, as ZFS requires at least 4GB of memory to perform without causing memory pressure issues. ** Changed in: zfs-linux (Ubuntu) Status: New => Won't Fix -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1858650 Title: package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade: installed zfsutils-linux package post-installation script subprocess returned error exit status 1 Status in zfs-linux package in Ubuntu: Won't Fix Bug description: can't install zfsutils-linux on headless nor desktop Jan 07 15:06:06 ubuntu zfs[3775]: The ZFS modules are not loaded. Jan 07 15:06:06 ubuntu zfs[3775]: Try running '/sbin/modprobe zfs' as root to load them. Jan 07 15:06:06 ubuntu systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE Jan 07 15:06:06 ubuntu systemd[1]: zfs-mount.service: Failed with result 'exit-code'. Jan 07 15:06:06 ubuntu systemd[1]: Failed to start Mount ZFS filesystems.
ProblemType: Package DistroRelease: Ubuntu 19.10 Package: zfsutils-linux 0.8.1-1ubuntu14.2 ProcVersionSignature: Ubuntu 5.3.0-1015.17-raspi2 5.3.13 Uname: Linux 5.3.0-1015-raspi2 aarch64 ApportVersion: 2.20.11-0ubuntu8.2 Architecture: arm64 Date: Tue Jan 7 15:06:06 2020 ErrorMessage: installed zfsutils-linux package post-installation script subprocess returned error exit status 1 Python3Details: /usr/bin/python3.7, Python 3.7.5, python3-minimal, 3.7.5-1 PythonDetails: N/A RelatedPackageVersions: dpkg 1.19.7ubuntu2 apt 1.9.4 SourcePackage: zfs-linux Title: package zfsutils-linux 0.8.1-1ubuntu14.2 failed to install/upgrade: installed zfsutils-linux package post-installation script subprocess returned error exit status 1 UpgradeStatus: No upgrade log present (probably fresh install) To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1858650/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
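The 4GB guidance above can be checked before attempting an install. The following is a minimal sketch, assuming a Linux system that exposes MemTotal in /proc/meminfo; `mem_at_least` is a hypothetical helper name and the threshold is left to the caller, since the comment above does not say exactly how the 4GB figure should be measured.

```shell
#!/bin/bash
# mem_at_least MIN_KIB [MEMINFO_FILE]
# Succeed (exit status 0) if MemTotal in MEMINFO_FILE is at least MIN_KIB.
# MEMINFO_FILE defaults to /proc/meminfo. Hypothetical helper for illustration.
mem_at_least() {
    local min_kib=$1
    local meminfo=${2:-/proc/meminfo}
    local total_kib
    # MemTotal is reported in kB, e.g. "MemTotal:   3884096 kB".
    total_kib=$(awk '$1 == "MemTotal:" { print $2; exit }' "$meminfo")
    [ -n "$total_kib" ] && [ "$total_kib" -ge "$min_kib" ]
}
```

For example, `mem_at_least 3500000 || echo "probably too little memory for ZFS"`. MemTotal is normally somewhat lower than the installed RAM because the kernel reserves part of it, so a board sold as 4GB will report less than 4194304 kB; the 3500000 figure here is only an illustrative margin, not a documented limit.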
[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption
** Changed in: linux (Ubuntu) Status: Confirmed => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1857040 Title: zfs: upstream support for hardware-accelerated encryption Status in linux package in Ubuntu: In Progress Bug description: I understand that in Linux 5.0+, certain encryption-related symbols have been marked GPL-only, making them unavailable for use by zfs. As a result, using encryption in zfs pools increases cpu load / decreases disk throughput. There are a pair of upstream pull requests that should improve the performance (with performance measurement done on x86-64). Can these be pulled into the Ubuntu kernel? https://github.com/zfsonlinux/zfs/pull/9515 https://github.com/zfsonlinux/zfs/pull/9296 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1857040] Re: zfs: upstream support for hardware-accelerated encryption
** Changed in: linux (Ubuntu) Importance: Undecided => High ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1857040 Title: zfs: upstream support for hardware-accelerated encryption Status in linux package in Ubuntu: Confirmed Bug description: I understand that in Linux 5.0+, certain encryption-related symbols have been marked GPL-only, making them unavailable for use by zfs. As a result, using encryption in zfs pools increases cpu load / decreases disk throughput. There are a pair of upstream pull requests that should improve the performance (with performance measurement done on x86-64). Can these be pulled into the Ubuntu kernel? https://github.com/zfsonlinux/zfs/pull/9515 https://github.com/zfsonlinux/zfs/pull/9296 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857040/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1856900] Re: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12
I believe this is because a FUSE based file system is being used in the prior ADT testing and sysinfo is breaking on the FUSE filesystem, so it may be a problem with the fuse fs itself or the fuse file system that is using the kernel fuse core. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1856900 Title: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12 Status in linux package in Ubuntu: Incomplete Bug description: During autopkgtest testing the sysinfo stressor failed, causing the kernel to oops. 16:20:34 DEBUG| [stdout] sysinfo STARTING 16:20:39 DEBUG| [stdout] sysinfo RETURNED 0 16:20:39 DEBUG| [stdout] sysinfo FAILED (kernel oopsed) 16:20:39 DEBUG| [stdout] [ 6521.203448] kernel tried to execute exec-protected page (c000c25ffce0) - exploit attempt? (uid: 0) 16:20:39 DEBUG| [stdout] [ 6521.207260] BUG: Unable to handle kernel instruction fetch 16:20:39 DEBUG| [stdout] [ 6521.207307] Faulting instruction address: 0xc000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.207367] Oops: Kernel access of bad area, sig: 11 [#1] 16:20:39 DEBUG| [stdout] [ 6521.207416] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 16:20:39 DEBUG| [stdout] [ 6521.207481] Modules linked in: unix_diag sctp vhost_vsock vmw_vsock_virtio_transport_common vsock zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) userio zcommon(PO) znvpair(PO) cuse spl(O) kvm_pr kvm snd_seq snd_seq_device snd_timer snd soundcore hci_vhci bluetooth ecdh_generic ecc uhid hid vhost_net vhost tap atm algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea crc32_generic md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic blowfish_generic blowfish_common cast5_generic des_generic libdes salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic 
twofish_common algif_skcipher aufs sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp ip6table_nat ip6_tables iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 algif_hash af_alg ip_vti ip6_vti fou6 sit ipip tunnel4 fou geneve act_mirred cls_basic esp6 authenc echainiv 16:20:39 DEBUG| [stdout] [ 6521.208045] iptable_filter xt_policy veth esp4_offload esp4 xfrm_user xfrm_algo macsec vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy tls binfmt_misc af_packet_diag tcp_diag udp_diag raw_diag inet_diag iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_blk virtio_net net_failover failover [last unloaded: trace_printk] 16:20:39 DEBUG| [stdout] [ 6521.209360] CPU: 1 PID: 2647099 Comm: fuse_mnt Tainted: P OE 5.4.0-9-generic #12-Ubuntu 16:20:39 DEBUG| [stdout] [ 6521.209457] NIP: c000c25ffce0 LR: c063f058 CTR: c000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.209528] REGS: c00109703810 TRAP: 0400 Tainted: P OE (5.4.0-9-generic) 16:20:39 DEBUG| [stdout] [ 6521.209608] MSR: 800010009033 CR: 88002440 XER: 2000 16:20:39 DEBUG| [stdout] [ 6521.209681] CFAR: c063f054 IRQMASK: 0 16:20:39 DEBUG| [stdout]GPR00: c063f034 c00109703aa0 c1a4bb00 c0007cef3000 16:20:39 DEBUG| [stdout]GPR04: c000c25ffc18 16:20:39 DEBUG| [stdout]GPR08: 16:20:39 DEBUG| [stdout]GPR12: c000c25ffce0 c0003fffee00 79b6987b4410 16:20:39 DEBUG| [stdout]GPR16: 79b698b3 79b6987b0320 79b69771f240 79b6987b4420 16:20:39 DEBUG| [stdout]GPR20: 79b6880010a0 79b698a4d3a0 16:20:39 DEBUG| [stdout]GPR24: c00109d56cc0 c001fde0cd8c c000c25ffce0 c00109d56ca0 16:20:39 DEBUG| [stdout]GPR28: c00109d56cc0 c0007cef3000 c00109d56c90 16:20:39 DEBUG| [stdout] [ 6521.210276] NIP [c000c25ffce0] 0xc000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.210355] LR [c063f058] 
fuse_request_end+0x128/0x2f0 16:20:39 DEBUG| [stdout] [ 6521.210423] Call Trace: 16:20:39 DEBUG| [stdout] [ 6521.210448] [c00109703aa0] [c063f034] fuse_request_end+0x104/0x2f0 (unreliable) 16:20:39 DEBUG| [stdout] [ 6521.210520] [c00109703af0] [c0642ebc] fuse_dev_do_write+0x2cc/0x5c0 16:20:39 DEBUG| [stdout] [ 6521.210591] [c00109703b70] [c0643654]
[Kernel-packages] [Bug 1856900] Re: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12
I've seen something very similar to this on this platform and I believe it's a combination of previous regression tests and the stress-ng sysinfo test that triggers this. Running the stress-ng stressor after a clean boot won't trigger this issue. ** Changed in: linux (Ubuntu) Importance: Undecided => Medium ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1856900 Title: stress-ng sysinfo stressor fails on ppc64el with linux 5.4.0-9.12 Status in linux package in Ubuntu: Incomplete Bug description: During autopkgtest testing the sysinfo stressor failed, causing the kernel to oops. 16:20:34 DEBUG| [stdout] sysinfo STARTING 16:20:39 DEBUG| [stdout] sysinfo RETURNED 0 16:20:39 DEBUG| [stdout] sysinfo FAILED (kernel oopsed) 16:20:39 DEBUG| [stdout] [ 6521.203448] kernel tried to execute exec-protected page (c000c25ffce0) - exploit attempt? 
(uid: 0) 16:20:39 DEBUG| [stdout] [ 6521.207260] BUG: Unable to handle kernel instruction fetch 16:20:39 DEBUG| [stdout] [ 6521.207307] Faulting instruction address: 0xc000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.207367] Oops: Kernel access of bad area, sig: 11 [#1] 16:20:39 DEBUG| [stdout] [ 6521.207416] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 16:20:39 DEBUG| [stdout] [ 6521.207481] Modules linked in: unix_diag sctp vhost_vsock vmw_vsock_virtio_transport_common vsock zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) userio zcommon(PO) znvpair(PO) cuse spl(O) kvm_pr kvm snd_seq snd_seq_device snd_timer snd soundcore hci_vhci bluetooth ecdh_generic ecc uhid hid vhost_net vhost tap atm algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea crc32_generic md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic blowfish_generic blowfish_common cast5_generic des_generic libdes salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common algif_skcipher aufs sch_etf sch_fq dccp_ipv6 dccp_ipv4 dccp ip6table_nat ip6_tables iptable_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 algif_hash af_alg ip_vti ip6_vti fou6 sit ipip tunnel4 fou geneve act_mirred cls_basic esp6 authenc echainiv 16:20:39 DEBUG| [stdout] [ 6521.208045] iptable_filter xt_policy veth esp4_offload esp4 xfrm_user xfrm_algo macsec vxlan ip6_udp_tunnel udp_tunnel vrf 8021q garp mrp bridge stp llc ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre cls_u32 sch_htb dummy tls binfmt_misc af_packet_diag tcp_diag udp_diag raw_diag inet_diag iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_blk virtio_net net_failover failover [last unloaded: trace_printk] 
16:20:39 DEBUG| [stdout] [ 6521.209360] CPU: 1 PID: 2647099 Comm: fuse_mnt Tainted: P OE 5.4.0-9-generic #12-Ubuntu 16:20:39 DEBUG| [stdout] [ 6521.209457] NIP: c000c25ffce0 LR: c063f058 CTR: c000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.209528] REGS: c00109703810 TRAP: 0400 Tainted: P OE (5.4.0-9-generic) 16:20:39 DEBUG| [stdout] [ 6521.209608] MSR: 800010009033 CR: 88002440 XER: 2000 16:20:39 DEBUG| [stdout] [ 6521.209681] CFAR: c063f054 IRQMASK: 0 16:20:39 DEBUG| [stdout]GPR00: c063f034 c00109703aa0 c1a4bb00 c0007cef3000 16:20:39 DEBUG| [stdout]GPR04: c000c25ffc18 16:20:39 DEBUG| [stdout]GPR08: 16:20:39 DEBUG| [stdout]GPR12: c000c25ffce0 c0003fffee00 79b6987b4410 16:20:39 DEBUG| [stdout]GPR16: 79b698b3 79b6987b0320 79b69771f240 79b6987b4420 16:20:39 DEBUG| [stdout]GPR20: 79b6880010a0 79b698a4d3a0 16:20:39 DEBUG| [stdout]GPR24: c00109d56cc0 c001fde0cd8c c000c25ffce0 c00109d56ca0 16:20:39 DEBUG| [stdout]GPR28: c00109d56cc0 c0007cef3000 c00109d56c90 16:20:39 DEBUG| [stdout] [ 6521.210276] NIP [c000c25ffce0] 0xc000c25ffce0 16:20:39 DEBUG| [stdout] [ 6521.210355] LR [c063f058] fuse_request_end+0x128/0x2f0 16:20:39 DEBUG| [stdout] [ 6521.210423] Call Trace: 16:20:39 DEBUG| [stdout] [ 6521.210448] [c00109703aa0] [c063f034] fuse_request_end+0x104/0x2f0 (unreliable) 16:20:39 DEBUG| [
[Kernel-packages] [Bug 1856704] [NEW] backport 5.3 zfs support to bionic for HWE kernel support
Public bug reported: 5.3 kernel functionality back through to 4.15 is required for 5.3 HWE kernel support in ZFS and SPL modules. ** Affects: spl-linux (Ubuntu) Importance: High Assignee: Colin Ian King (colin-king) Status: In Progress ** Affects: zfs-linux (Ubuntu) Importance: High Assignee: Colin Ian King (colin-king) Status: In Progress ** Also affects: spl-linux (Ubuntu) Importance: Undecided Status: New ** Changed in: spl-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: zfs-linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: spl-linux (Ubuntu) Status: New => Incomplete ** Changed in: spl-linux (Ubuntu) Importance: Undecided => High ** Changed in: zfs-linux (Ubuntu) Importance: Undecided => High ** Changed in: spl-linux (Ubuntu) Status: Incomplete => In Progress ** Changed in: zfs-linux (Ubuntu) Status: New => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1856704 Title: backport 5.3 zfs support to bionic for HWE kernel support Status in spl-linux package in Ubuntu: In Progress Status in zfs-linux package in Ubuntu: In Progress Bug description: 5.3 kernel functionality back through to 4.15 is required for 5.3 HWE kernel support in ZFS and SPL modules. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/spl-linux/+bug/1856704/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads
I've tested zfs from the -proposed pockets with the ubuntu ZFS autotest regression tests: ubuntu_zfs_fstest ubuntu_zfs_smoke_test ubuntu_zfs_stress ubuntu_zfs_xfs_generic All the following passed the regression testing. bionic: 0.7.5-1ubuntu16.7 disco: 0.7.12-1ubuntu5.1 eoan: 0.8.1-1ubuntu14.3 I was unable to trip and lockups, so as far as I'm concerned I'm happy for these updates to be released. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1856084 Title: Livelock between ZFS evict and writeback threads Status in zfs-linux package in Ubuntu: Fix Released Status in zfs-linux source package in Bionic: Fix Committed Status in zfs-linux source package in Disco: Fix Committed Status in zfs-linux source package in Eoan: Fix Committed Status in zfs-linux source package in Focal: Fix Released Status in zfs-linux package in Debian: Unknown Bug description: Livelock between ZFS evict and writeback threads [Impact] ZIO pipeline stalls, causing ZFS workloads to hang indefinitely [Description] For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows: [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [72742.070429] mysqld D0 5713 2881 0x0320 [72742.073220] Call Trace: [72742.075305] __schedule+0x24e/0x880 [72742.090436] schedule+0x2c/0x80 [72742.090438] schedule_preempt_disabled+0xe/0x10 [72742.090441] __mutex_lock.isra.5+0x276/0x4e0 [72742.090547] ? dmu_tx_destroy+0x105/0x130 [zfs] [72742.090555] __mutex_lock_slowpath+0x13/0x20 [72742.115374] ? 
__mutex_lock_slowpath+0x13/0x20 [72742.132266] mutex_lock+0x2f/0x40 [72742.134207] zil_commit_impl+0x1b0/0x1b30 [zfs] [72742.150428] ? spl_kmem_alloc+0x115/0x180 [spl] [72742.152622] ? mutex_lock+0x12/0x40 [72742.154819] ? zfs_refcount_add_many+0x9a/0x100 [zfs] [72742.171450] zil_commit+0xde/0x150 [zfs] [72742.173687] zfs_fsync+0x77/0xe0 [zfs] [72742.175044] zpl_fsync+0x80/0x110 [zfs] [72742.191690] vfs_fsync_range+0x51/0xb0 [72742.193876] do_fsync+0x3d/0x70 [72742.195126] SyS_fsync+0x10/0x20 [72742.211059] do_syscall_64+0x73/0x130 [72742.214078] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 It's possible to hit this issue due to a race between the ZFS evict and writeback threads. If the z_iput task is trying to evict a znode that's currently sitting in the writeback thread, both will "livelock" each other and stall the ZIO pipeline, causing other ZFS operations (such as zil_commit) to hang indefinitely. This has been documented and fixed upstream in PR#9583 [0]. We need to pull two fixes from upstream: the first one fixes the zfs_zget() issue in the writeback thread, while the second fixes a regression on O_TMPFILE descriptors caused by the first one. Upstream patches: - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8) - Check for unlinked znodes after igrab() (0c46813805f4) [Test Case] Being a race condition, this issue has been hard to reproduce consistently. The racing window between evict() and the ZFS writeback thread is quite strict, but users have reported this to show up after some hours of running LXD-containerized mySQL workloads. [Regression Potential] These patches have been tested both in the ZFS test suite and in production environments, so the potential for further regressions should be low. Additional regressions would likely cause issues with the ZFS writeback/commit and IO pipeline, so they should be spotted fairly quickly. 
[0] https://github.com/zfsonlinux/zfs/pull/9583 [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8 [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
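The retry-loop behaviour described in the bug can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names, the two-flag "znode" representation and the order of the checks are all hypothetical and are not taken from the ZFS source; the linked commits remain the authority on the actual change. The loops are also bounded by MAX_TRIES so the example terminates, which the behaviour described above does not.

```shell
#!/bin/bash
# Toy model only: none of these functions are real ZFS code.
# A "znode" is described by two flags passed as arguments:
#   UNLINKED  - 1 if the znode has been marked for deletion
#   GRABBABLE - 1 if taking a reference to its inode would succeed

# lookup_retry_only UNLINKED GRABBABLE MAX_TRIES
# Models a lookup that ignores the unlinked flag and simply retries
# whenever the reference cannot be taken.
lookup_retry_only() {
    local unlinked=$1 grabbable=$2 max_tries=$3 tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if [ "$grabbable" -eq 1 ]; then
            echo "found"
            return 0
        fi
        tries=$((tries + 1))   # bound added for the demo only
    done
    echo "still retrying after $tries tries"
    return 1
}

# lookup_with_unlinked_check UNLINKED GRABBABLE MAX_TRIES
# Same loop, but it gives up with ENOENT as soon as it sees that the
# znode it cannot grab has already been unlinked.
lookup_with_unlinked_check() {
    local unlinked=$1 grabbable=$2 max_tries=$3 tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if [ "$grabbable" -eq 1 ]; then
            echo "found"
            return 0
        fi
        if [ "$unlinked" -eq 1 ]; then
            echo "ENOENT"
            return 2
        fi
        tries=$((tries + 1))
    done
    echo "still retrying after $tries tries"
    return 1
}
```

With an unlinked znode that can never be grabbed (`1 0 5`), the first function exhausts its retries while the second returns ENOENT on the first pass, which is the difference between a caller spinning and a caller being able to move on.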
[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads
*I was unable to trip any lockups ** Tags added: verification-done-bionic verification-done-disco verification-done-eoan -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1856084 Title: Livelock between ZFS evict and writeback threads Status in zfs-linux package in Ubuntu: Fix Released Status in zfs-linux source package in Bionic: Fix Committed Status in zfs-linux source package in Disco: Fix Committed Status in zfs-linux source package in Eoan: Fix Committed Status in zfs-linux source package in Focal: Fix Released Status in zfs-linux package in Debian: Unknown Bug description: Livelock between ZFS evict and writeback threads [Impact] ZIO pipeline stalls, causing ZFS workloads to hang indefinitely [Description] For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows: [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [72742.070429] mysqld D0 5713 2881 0x0320 [72742.073220] Call Trace: [72742.075305] __schedule+0x24e/0x880 [72742.090436] schedule+0x2c/0x80 [72742.090438] schedule_preempt_disabled+0xe/0x10 [72742.090441] __mutex_lock.isra.5+0x276/0x4e0 [72742.090547] ? dmu_tx_destroy+0x105/0x130 [zfs] [72742.090555] __mutex_lock_slowpath+0x13/0x20 [72742.115374] ? __mutex_lock_slowpath+0x13/0x20 [72742.132266] mutex_lock+0x2f/0x40 [72742.134207] zil_commit_impl+0x1b0/0x1b30 [zfs] [72742.150428] ? spl_kmem_alloc+0x115/0x180 [spl] [72742.152622] ? mutex_lock+0x12/0x40 [72742.154819] ? 
zfs_refcount_add_many+0x9a/0x100 [zfs] [72742.171450] zil_commit+0xde/0x150 [zfs] [72742.173687] zfs_fsync+0x77/0xe0 [zfs] [72742.175044] zpl_fsync+0x80/0x110 [zfs] [72742.191690] vfs_fsync_range+0x51/0xb0 [72742.193876] do_fsync+0x3d/0x70 [72742.195126] SyS_fsync+0x10/0x20 [72742.211059] do_syscall_64+0x73/0x130 [72742.214078] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 It's possible to hit this issue due to a race between the ZFS evict and writeback threads. If the z_iput task is trying to evict a znode that's currently sitting in the writeback thread, both will "livelock" each other and stall the ZIO pipeline, causing other ZFS operations (such as zil_commit) to hang indefinitely. This has been documented and fixed upstream in PR#9583 [0]. We need to pull two fixes from upstream: the first one fixes the zfs_zget() issue in the writeback thread, while the second fixes a regression on O_TMPFILE descriptors caused by the first one. Upstream patches: - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8) - Check for unlinked znodes after igrab() (0c46813805f4) [Test Case] Being a race condition, this issue has been hard to reproduce consistently. The racing window between evict() and the ZFS writeback thread is quite strict, but users have reported this to show up after some hours of running LXD-containerized mySQL workloads. [Regression Potential] These patches have been tested both in the ZFS test suite and in production environments, so the potential for further regressions should be low. Additional regressions would likely cause issues with the ZFS writeback/commit and IO pipeline, so they should be spotted fairly quickly. 
[0] https://github.com/zfsonlinux/zfs/pull/9583 [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8 [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads
I've checked that the zfs kernel driver builds and it passes the ZFS regression tests. Patches look good, so I've uploaded these packages. ** Changed in: zfs-linux (Ubuntu Bionic) Importance: Undecided => Medium ** Changed in: zfs-linux (Ubuntu Disco) Importance: Undecided => Medium ** Changed in: zfs-linux (Ubuntu Eoan) Importance: Undecided => Medium -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu. https://bugs.launchpad.net/bugs/1856084 Title: Livelock between ZFS evict and writeback threads Status in zfs-linux package in Ubuntu: Confirmed Status in zfs-linux source package in Bionic: Confirmed Status in zfs-linux source package in Disco: Confirmed Status in zfs-linux source package in Eoan: Confirmed Status in zfs-linux source package in Focal: Confirmed Status in zfs-linux package in Debian: Unknown Bug description: Livelock between ZFS evict and writeback threads [Impact] ZIO pipeline stalls, causing ZFS workloads to hang indefinitely [Description] For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows: [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [72742.070429] mysqld D0 5713 2881 0x0320 [72742.073220] Call Trace: [72742.075305] __schedule+0x24e/0x880 [72742.090436] schedule+0x2c/0x80 [72742.090438] schedule_preempt_disabled+0xe/0x10 [72742.090441] __mutex_lock.isra.5+0x276/0x4e0 [72742.090547] ? dmu_tx_destroy+0x105/0x130 [zfs] [72742.090555] __mutex_lock_slowpath+0x13/0x20 [72742.115374] ? __mutex_lock_slowpath+0x13/0x20 [72742.132266] mutex_lock+0x2f/0x40 [72742.134207] zil_commit_impl+0x1b0/0x1b30 [zfs] [72742.150428] ? 
spl_kmem_alloc+0x115/0x180 [spl] [72742.152622] ? mutex_lock+0x12/0x40 [72742.154819] ? zfs_refcount_add_many+0x9a/0x100 [zfs] [72742.171450] zil_commit+0xde/0x150 [zfs] [72742.173687] zfs_fsync+0x77/0xe0 [zfs] [72742.175044] zpl_fsync+0x80/0x110 [zfs] [72742.191690] vfs_fsync_range+0x51/0xb0 [72742.193876] do_fsync+0x3d/0x70 [72742.195126] SyS_fsync+0x10/0x20 [72742.211059] do_syscall_64+0x73/0x130 [72742.214078] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 It's possible to hit this issue due to a race between the ZFS evict and writeback threads. If the z_iput task is trying to evict a znode that's currently sitting in the writeback thread, both will "livelock" each other and stall the ZIO pipeline, causing other ZFS operations (such as zil_commit) to hang indefinitely. This has been documented and fixed upstream in PR#9583 [0]. We need to pull two fixes from upstream: the first one fixes the zfs_zget() issue in the writeback thread, while the second fixes a regression on O_TMPFILE descriptors caused by the first one. Upstream patches: - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8) - Check for unlinked znodes after igrab() (0c46813805f4) [Test Case] Being a race condition, this issue has been hard to reproduce consistently. The racing window between evict() and the ZFS writeback thread is quite strict, but users have reported this to show up after some hours of running LXD-containerized mySQL workloads. [Regression Potential] These patches have been tested both in the ZFS test suite and in production environments, so the potential for further regressions should be low. Additional regressions would likely cause issues with the ZFS writeback/commit and IO pipeline, so they should be spotted fairly quickly. 
[0] https://github.com/zfsonlinux/zfs/pull/9583 [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8 [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1856084/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1856084] Re: Livelock between ZFS evict and writeback threads
** Changed in: zfs-linux (Ubuntu)
   Importance: Undecided => Medium

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to zfs-linux in Ubuntu.
https://bugs.launchpad.net/bugs/1856084

Title:
  Livelock between ZFS evict and writeback threads

Status in zfs-linux package in Ubuntu: Confirmed
Status in zfs-linux package in Debian: Unknown

Bug description:
  Livelock between ZFS evict and writeback threads

  [Impact]
  ZIO pipeline stalls, causing ZFS workloads to hang indefinitely

  [Description]
  For certain ZFS workloads, we start seeing hung task timeouts in the kernel logs due to zil_commit() stalling. This is due to zfs_zget() not detecting whether a znode has been marked for deletion before attempting to access it, causing a constant "retry loop" in zfs_get_data() if that znode has been unlinked already. An example of the stack traces follows:

  [72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [72742.070429] mysqld D0 5713 2881 0x0320
  [72742.073220] Call Trace:
  [72742.075305] __schedule+0x24e/0x880
  [72742.090436] schedule+0x2c/0x80
  [72742.090438] schedule_preempt_disabled+0xe/0x10
  [72742.090441] __mutex_lock.isra.5+0x276/0x4e0
  [72742.090547] ? dmu_tx_destroy+0x105/0x130 [zfs]
  [72742.090555] __mutex_lock_slowpath+0x13/0x20
  [72742.115374] ? __mutex_lock_slowpath+0x13/0x20
  [72742.132266] mutex_lock+0x2f/0x40
  [72742.134207] zil_commit_impl+0x1b0/0x1b30 [zfs]
  [72742.150428] ? spl_kmem_alloc+0x115/0x180 [spl]
  [72742.152622] ? mutex_lock+0x12/0x40
  [72742.154819] ? zfs_refcount_add_many+0x9a/0x100 [zfs]
  [72742.171450] zil_commit+0xde/0x150 [zfs]
  [72742.173687] zfs_fsync+0x77/0xe0 [zfs]
  [72742.175044] zpl_fsync+0x80/0x110 [zfs]
  [72742.191690] vfs_fsync_range+0x51/0xb0
  [72742.193876] do_fsync+0x3d/0x70
  [72742.195126] SyS_fsync+0x10/0x20
  [72742.211059] do_syscall_64+0x73/0x130
  [72742.214078] entry_SYSCALL_64_after_hwframe+0x3d/0xa2

  It's possible to hit this issue due to a race between the ZFS evict and writeback threads. If the z_iput task is trying to evict a znode that's currently sitting in the writeback thread, both will "livelock" each other and stall the ZIO pipeline, causing other ZFS operations (such as zil_commit) to hang indefinitely. This has been documented and fixed upstream in PR#9583 [0].

  We need to pull two fixes from upstream: the first one fixes the zfs_zget() issue in the writeback thread, while the second fixes a regression on O_TMPFILE descriptors caused by the first one.

  Upstream patches:
  - Break out of zfs_zget early if unlinked znode (41e1aa2a06f8)
  - Check for unlinked znodes after igrab() (0c46813805f4)

  [Test Case]
  Being a race condition, this issue has been hard to reproduce consistently. The racing window between evict() and the ZFS writeback thread is quite strict, but users have reported this to show up after some hours of running LXD-containerized mySQL workloads.

  [Regression Potential]
  These patches have been tested both in the ZFS test suite and in production environments, so the potential for further regressions should be low. Additional regressions would likely cause issues with the ZFS writeback/commit and IO pipeline, so they should be spotted fairly quickly.

  [0] https://github.com/zfsonlinux/zfs/pull/9583
  [1] https://github.com/zfsonlinux/zfs/commit/41e1aa2a06f8
  [2] https://github.com/zfsonlinux/zfs/commit/0c46813805f4
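To make the "retry loop" from the description above more concrete, here is a toy model in Python. It is only a sketch under stated assumptions: ToyZnode, toy_zget and the max_spins bound are hypothetical names invented for illustration, not ZFS code, and the choice of ENOENT as the early-exit error is an assumption. The model assumes that taking a reference keeps failing while eviction is in progress, and shows how checking the unlinked flag lets the lookup return an error instead of spinning.

```python
import errno
from dataclasses import dataclass


@dataclass
class ToyZnode:
    """Hypothetical stand-in for a znode; not the real ZFS structure."""
    unlinked: bool       # znode has been marked for deletion
    being_evicted: bool  # eviction in progress, so taking a reference fails


def toy_zget(znode, check_unlinked, max_spins=1000):
    """Return (error_code, attempts) for a simulated lookup.

    max_spins only bounds the simulation; the scenario in the bug
    report would keep retrying indefinitely.
    """
    for attempt in range(1, max_spins + 1):
        if not znode.being_evicted:
            return 0, attempt             # reference taken, lookup succeeds
        if check_unlinked and znode.unlinked:
            return errno.ENOENT, attempt  # give up early: the file is gone
        # Otherwise fall through and try again: this is the retry loop.
    return errno.EAGAIN, max_spins        # simulation cut-off, still spinning


stuck = ToyZnode(unlinked=True, being_evicted=True)
print(toy_zget(stuck, check_unlinked=False))  # spins until the cut-off
print(toy_zget(stuck, check_unlinked=True))   # bails out on the first attempt
```

In this model the only difference between the two calls is the extra check, which mirrors the intent stated in the patch titles rather than their actual implementation.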
[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files
Tested with 5.3.0-25-generic #27-Ubuntu with the regression test and it now works fine. Marking bug as verification-done for eoan.

** Tags removed: verification-needed-eoan
** Tags added: verification-done-eoan

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
  remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

Status in linux package in Ubuntu: In Progress
Status in linux-hwe package in Ubuntu: Invalid
Status in linux-hwe source package in Bionic: In Progress
Status in linux source package in Disco: Fix Committed
Status in linux source package in Eoan: Fix Committed
Status in linux source package in Focal: In Progress

Bug description:
  == SRU Justification Disco, Eoan, Focal ==

  Multiple squashfs filesystems with overlayfs cause file corruption issues when modifying zero sized files

  == Fix ==

  The current fix is pending in https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a

  == Test case ==

  With an Ubuntu ISO on the cdrom drive, use:

  #!/bin/bash -x
  mkdir -p /cdrom
  mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom
  sleep 1
  mkdir -p /cow
  mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow
  sleep 1
  mkdir -p /cow/upper
  mkdir -p /cow/work
  modprobe -q -b overlay
  sleep 1
  modprobe -q -b loop
  sleep 1
  dev=$(losetup -f)
  mkdir -p /filesystem.squashfs
  losetup $dev /cdrom/casper/filesystem.squashfs
  mount -t squashfs -o ro,noatime $dev /filesystem.squashfs
  sleep 1
  dev=$(losetup -f)
  mkdir -p /installer.squashfs
  losetup $dev /cdrom/casper/installer.squashfs
  mount -t squashfs -o ro,noatime $dev /installer.squashfs
  sleep 1
  mkdir -p /root-tmp
  mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp
  FILE=/root-tmp/etc/.pwd.lock
  echo foo > $FILE
  cat $FILE
  sync
  #
  # dropping caches or remounting causes the bug
  #
  echo 3 > /proc/sys/vm/drop_caches
  cat $FILE

  Without the fix the cat of the file will produce an error. With the fix the cat will work correctly.

  == Regression Potential ==

  There is an unhandled corner case:
  - two filesystems, A and B, both have null uuid
  - upper layer is on A
  - lower layer 1 is also on A
  - lower layer 2 is on B

  However, since this is an issue without the fix and will be addressed later with subsequent fixes once they are OK with upstream, I think the risk is minimal considering nobody is complaining about these corner cases with the current broken overlayfs squashfs layering.

  ---

  1) Download focal subiquity pending image, or eoan release image
  2) Boot, press ESC, and edit the boot command line (F6 in BIOS, e in UEFI)
  3) After --- insert the following options:
       break=top debug init=/bin/bash
  4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
  5) In the initramfs execute:
       rm /scripts/casper-bottom/25adduser
       exit
  6) You will be dropped into the pivoted root filesystem, before systemd is execed as pid one
  7) /run/initramfs/ will contain a debug log showing how everything was mounted, i.e. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience.
  8) At this point, modifying zero-byte length files that exist in the lowest layer but not the middle one, in certain ways, will result in them being corrupted after / is remounted.
  9) Corruption examples

  (On both focal & eoan)
  cat /etc/.pwd.lock
  systemd-sysusers
  cat /etc/.pwd.lock
  mount -o remount /
  cat /etc/.pwd.lock
  overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
  cat: /etc/.pwd.lock: Input/output error

  (Only on eoan)
  cat /etc/machine-id
  systemd-machine-id-setup
  cat /etc/machine-id
  mount -o remount /
  cat /etc/machine-id
  overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
  cat: /etc/machine-id: Input/output error

  Lots of things break once machine-id and .pwd.lock are corrupted, e.g. unable to dhcp, connect to dbus, add/remove/change users or groups, etc.

  We were unable to recreate the issue outside of booting things with casper, i.e. statically on a regular host machine without pivot-root. But hopefully booting to a quiet state with nothing running is sufficient to reproduce this.

  Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service`; this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that remount of / will cause IO errors on those files.

  Currently, we are shipping two hacks in casper's 25adduser script to "rm" the offending files, and
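The ftype=8000 and origin ftype=4000 values in the overlayfs message above can be decoded with a short Python sketch. It assumes the two numbers are hexadecimal file-type bits of an inode mode (the S_IFMT portion); that reading is an assumption about the kernel message format, and describe_ftype is a hypothetical helper, not part of any tool mentioned in the report.

```python
import stat

# Map of file-type bit patterns to readable names, using constants from
# the standard library's stat module.
FTYPE_NAMES = {
    stat.S_IFREG: "regular file",
    stat.S_IFDIR: "directory",
    stat.S_IFLNK: "symlink",
    stat.S_IFCHR: "character device",
    stat.S_IFBLK: "block device",
    stat.S_IFIFO: "fifo",
    stat.S_IFSOCK: "socket",
}


def describe_ftype(hex_value):
    """Decode a hex string such as '8000' as S_IFMT file-type bits."""
    fmt = stat.S_IFMT(int(hex_value, 16))
    return FTYPE_NAMES.get(fmt, "unknown (0x%x)" % fmt)


print("ftype=8000        ->", describe_ftype("8000"))
print("origin ftype=4000 ->", describe_ftype("4000"))
```

Under that assumption, the modified file is a regular file while the origin it resolved to is a directory, which would be consistent with the "invalid origin" complaint and the resulting I/O error.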
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
** Changed in: linux (Ubuntu) Importance: High => Low ** Changed in: linux (Ubuntu) Status: Incomplete => Triaged -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854968 Title: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x Status in linux package in Ubuntu: Triaged Bug description: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT regression testing: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac /autopkgtest-focal-canonical-kernel-team- unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz 14:44:30 DEBUG| [stdout] sctp STARTING 14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256) 14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. 
Usage count = 1 14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds. 14:47:44 DEBUG| [stdout] [ 3684.814345] Tainted: P OE 5.4.0-7-generic #8-Ubuntu 14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800 14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace: 14:47:44 DEBUG| [stdout] [ 3684.814360] ([] __schedule+0x304/0x7b0) 14:47:44 DEBUG| [stdout] [ 3684.814362] [ ] schedule+0x4a/0xe0 14:47:44 DEBUG| [stdout] [ 3684.814366] [ ] rwsem_down_write_slowpath+0x22c/0x530 14:47:44 DEBUG| [stdout] [ 3684.814370] [ ] register_pernet_subsys+0x2c/0x60 14:47:44 DEBUG| [stdout] [ 3684.814411] [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp] 14:47:44 DEBUG| [stdout] [ 3684.814414] [ ] do_one_initcall+0x40/0x200 14:47:44 DEBUG| [stdout] [ 3684.814416] [ ] do_init_module+0x70/0x270 14:47:44 DEBUG| [stdout] [ 3684.814418] [ ] load_module+0x1142/0x1440 14:47:44 DEBUG| [stdout] [ 3684.814419] [ ] __do_sys_finit_module+0xa4/0xf0 14:47:44 DEBUG| [stdout] [ 3684.814421] [ ] system_call+0x2aa/0x2c8 14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:37 DEBUG| [stdout] [ 3738.474354] unregister_netdevice: waiting
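The repeated unregister_netdevice lines in a log like the one above can be summarised with a small script. This is a sketch and not part of the bug report: summarise_waits is a hypothetical helper, and the regular expression assumes the message format shown in the log.

```python
import re

# Matches e.g. "[ 3494.694285] unregister_netdevice: waiting for lo to
# become free. Usage count = 1" and captures timestamp, device and count.
WAIT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+unregister_netdevice: waiting for (\S+) "
    r"to become free\. Usage count = (\d+)"
)


def summarise_waits(log_text):
    """Return {device: (message_count, first_timestamp, last_timestamp)}."""
    summary = {}
    for match in WAIT_RE.finditer(log_text):
        ts, dev = float(match.group(1)), match.group(2)
        count, first, _ = summary.get(dev, (0, ts, ts))
        summary[dev] = (count + 1, first, ts)
    return summary


sample = (
    "14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: "
    "waiting for lo to become free. Usage count = 1\n"
    "14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: "
    "waiting for lo to become free. Usage count = 1\n"
    "14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: "
    "waiting for lo to become free. Usage count = 1\n"
)

for dev, (count, first, last) in summarise_waits(sample).items():
    print("%s: %d messages between %.6f and %.6f" % (dev, count, first, last))
```

Because the pattern does not rely on line breaks, it should also work on a log that has been flattened onto a single line.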
[Kernel-packages] [Bug 1822133] Re: Azure Instance never recovered during series of instance reboots.
Indeed, the commit is in 4.15.0-1057 and has been released. Marking this bug as Fix Released.

commit b502cfeffec81be8564189e5498fd3f252b27900
Author: Taehee Yoo
Date:   Wed Sep 4 14:40:49 2019 -0300

    ip: frags: fix crash in ip_do_fragment()

    BugLink: https://bugs.launchpad.net/bugs/1842447

    commit 5d407b071dc369c26a38398326ee2be53651cfe4 upstream

    A kernel crash occurrs when defragmented packet is fragmented in ip_do_fragment(). In defragment routine, skb_orphan() is called and skb->ip_defrag_offset is set. but skb->sk and skb->ip_defrag_offset are same union member. so that frag->sk is not NULL. Hence crash occurrs in skb->sk check routine in ip_do_fragment() when defragmented packet is fragmented.

** Changed in: linux-azure (Ubuntu)
   Status: Incomplete => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1822133

Title:
  Azure Instance never recovered during series of instance reboots.

Status in linux-azure package in Ubuntu: Fix Released

Bug description:
  Description: During SRU Testing of various Azure Instances, there will be some cases where the instance will not respond following a system reboot. SRU Testing only restarts a given instance once, after it preps all of the necessary files to be tested.

  Series: Disco
  Instance Size: Basic_A3
  Region: (Default) US-WEST-2
  Kernel Version: 4.18.0-1013-azure #13-Ubuntu SMP Thu Feb 28 22:54:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  I initiated a series of tests which rebooted Azure Cloud instances 50 times. During the 49th reboot, an instance failed to return from a reboot. Upon grabbing the console output, the following was seen scrolling endlessly. I have seen this failure in cases where the instance only restarted a handful of times >5

  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus
  [84.247704]hyperv_fb: unable to send packet via vmbus

  In another test attempt I saw the following failure:

  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes
  ERROR ExtHandler /proc/net/route contains no routes

  Both of these failures broke networking. Both were seen at least two to three times, which may explain why in some cases we never recover from an instance reboot.
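The aliasing described in the commit message quoted above can be illustrated with a small Python sketch using ctypes. It is only a model: ToySkbUnion is a hypothetical two-member union, not the real struct sk_buff layout, and the offset value 1480 is arbitrary. It shows why a pointer member reads as non-NULL once the integer member sharing its storage has been written.

```python
import ctypes


class ToySkbUnion(ctypes.Union):
    """Hypothetical model of two fields sharing the same storage."""
    _fields_ = [
        ("sk", ctypes.c_void_p),          # stands in for skb->sk
        ("ip_defrag_offset", ctypes.c_int),  # stands in for skb->ip_defrag_offset
    ]


def sk_looks_set_after_defrag(offset):
    """Write the offset member, then report whether sk reads as non-NULL."""
    u = ToySkbUnion()            # ctypes zero-initialises the storage
    u.ip_defrag_offset = offset  # overwrites bytes that sk also occupies
    return u.sk is not None      # c_void_p reads back as None only when NULL


print(ToySkbUnion().sk)                 # None: nothing written yet
print(sk_looks_set_after_defrag(1480))  # True: sk now appears to be set
```

Any code that tests the pointer member for NULL after the offset has been stored would take the "socket present" branch in this model, which is the kind of confusion the commit message describes.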
[Kernel-packages] [Bug 1854959] Re: stress-ng sysinfo stressor trips kernel oops on ppc64el with 5.4.0.7-8
*** This bug is a duplicate of bug 1854968 *** https://bugs.launchpad.net/bugs/1854968 Same root corruption issue as bug 1854968 ** This bug has been marked a duplicate of bug 1854968 stress-ng sctp stressor breaks 5.4.0.7-8 on s390x -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854959 Title: stress-ng sysinfo stressor trips kernel oops on ppc64el with 5.4.0.7-8 Status in linux package in Ubuntu: In Progress Bug description: stress-ng on ppc64el with 5.4.0.7-8, sysinfo stressor seems to tickle a bug: 06:26:02 DEBUG| [stdout] sysinfo FAILED (kernel oopsed) 06:26:02 DEBUG| [stdout] [ 7262.965483] kernel tried to execute exec-protected page (c00017407ce0) - exploit attempt? (uid: 0) 06:26:02 DEBUG| [stdout] [ 7262.968030] BUG: Unable to handle kernel instruction fetch 06:26:02 DEBUG| [stdout] [ 7262.968121] Faulting instruction address: 0xc00017407ce0 06:26:02 DEBUG| [stdout] [ 7262.968224] Oops: Kernel access of bad area, sig: 11 [#1] 06:26:02 DEBUG| [stdout] [ 7262.968292] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 06:26:02 DEBUG| [stdout] [ 7262.968403] Modules linked in: unix_diag sctp zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) zcommon(PO) znvpair(PO) spl(O) snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock kvm_pr kvm hci_vhci bluetooth ecdh_generic ecc userio uhid hid vhost_net vhost tap cuse dccp_ipv4 dccp psnap llc algif_rng aegis128 algif_aead anubis fcrypt khazad seed sm4_generic tea crc32_generic md4 michael_mic nhpoly1305 poly1305_generic rmd128 rmd160 rmd256 rmd320 sha3_generic sm3_generic streebog_generic tgr192 wp512 xxhash_generic algif_hash blowfish_generic blowfish_common cast5_generic des_generic libdes salsa20_generic chacha_generic camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common algif_skcipher af_alg aufs binfmt_misc af_packet_diag 
tcp_diag udp_diag raw_diag inet_diag iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables 06:26:02 DEBUG| [stdout] [ 7262.969078] x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk net_failover failover [last unloaded: trace_printk] 06:26:02 DEBUG| [stdout] [ 7262.970416] CPU: 1 PID: 2613531 Comm: fuse_mnt Tainted: P OE 5.4.0-7-generic #8-Ubuntu 06:26:02 DEBUG| [stdout] [ 7262.970532] NIP: c00017407ce0 LR: c063e968 CTR: c00017407ce0 06:26:02 DEBUG| [stdout] [ 7262.970623] REGS: c001d8393810 TRAP: 0400 Tainted: P OE (5.4.0-7-generic) 06:26:02 DEBUG| [stdout] [ 7262.970737] MSR: 800010009033 CR: 88002440 XER: 2000 06:26:02 DEBUG| [stdout] [ 7262.970850] CFAR: c063e964 IRQMASK: 0 06:26:02 DEBUG| [stdout]GPR00: c063e944 c001d8393aa0 c1a5bf00 c0003d95ec00 06:26:02 DEBUG| [stdout]GPR04: c00017407c18 06:26:02 DEBUG| [stdout]GPR08: 06:26:02 DEBUG| [stdout]GPR12: c00017407ce0 c0003fffee00 7c8ab4814410 06:26:02 DEBUG| [stdout]GPR16: 7c8ab4b9 7c8ab4810320 7c8ab2f6f240 7c8ab4814420 06:26:02 DEBUG| [stdout]GPR20: 7c8aa8000b60 7c8ab4aad3a0 06:26:02 DEBUG| [stdout]GPR24: c001f38f7da0 c001fbb81e4c c00017407ce0 c001f38f7d80 06:26:02 DEBUG| [stdout]GPR28: c001f38f7da0 c0003d95ec00 c001f38f7d70 06:26:02 DEBUG| [stdout] [ 7262.971713] NIP [c00017407ce0] 0xc00017407ce0 06:26:02 DEBUG| [stdout] [ 7262.971804] LR [c063e968] fuse_request_end+0x128/0x2f0 06:26:02 DEBUG| [stdout] [ 7262.971893] Call Trace: 06:26:02 DEBUG| [stdout] [ 7262.971930] [c001d8393aa0] [c063e944] fuse_request_end+0x104/0x2f0 (unreliable) 06:26:02 DEBUG| [stdout] [ 7262.972035] [c001d8393af0] [c06427cc] fuse_dev_do_write+0x2cc/0x5c0 06:26:02 DEBUG| [stdout] [ 7262.972138] [c001d8393b70] [c0642f64] fuse_dev_write+0x74/0xd0 06:26:02 DEBUG| [stdout] [ 7262.972221] [c001d8393c00] [c04702b0] do_iter_readv_writev+0x240/0x290 06:26:02 DEBUG| [stdout] [ 7262.972334] [c001d8393c70] 
[c0472bc8] do_iter_write+0xc8/0x280 06:26:02 DEBUG| [stdout] [ 7262.972424] [c001d8393cc0] [c0472e90] vfs_writev+0xe0/0x180 06:26:02 DEBUG| [stdout] [ 7262.972508] [c001d8393dc0] [c0472fcc] do_writev+0x9c/0x1a0 06:26:02 DEBUG| [stdout] [ 7262.972588] [c001d8393e20]
[Kernel-packages] [Bug 1855151] Re: adt bpf tests crash 5.4.0-7 on ppc64el on power box
*** This bug is a duplicate of bug 1854968 *** https://bugs.launchpad.net/bugs/1854968 Same root issue as bug 1854968 ** This bug has been marked a duplicate of bug 1854968 stress-ng sctp stressor breaks 5.4.0.7-8 on s390x -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855151 Title: adt bpf tests crash 5.4.0-7 on ppc64el on power box Status in linux package in Ubuntu: In Progress Bug description: Running the ADT tests on a power box, the bpf tests crash the kernel as follows: [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?) [ 2745.079808] Faulting instruction address: 0x [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1] [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci drm_panel_orientation_quirks [last unloaded: notifier_error_inject] [ 2745.080195] CPU: 0 PID: 366 Comm: reuseport_bpf_c Not tainted 5.4.0-7-generic #8 [ 2745.080214] NIP: LR: c0ce8710 CTR: [ 2745.080233] REGS: c007ff6eb550 TRAP: 0400 Not tainted (5.4.0-7-generic) [ 2745.080250] MSR: 900040009033 CR: 24002282 XER: 2000 [ 2745.080272] CFAR: c000de44 IRQMASK: 0 [ 2745.080272] GPR00: c0d67c9c c007ff6eb7e0 c1a5bf00 c004258e10e0 [ 2745.080272] GPR04: c00802830038
c004258e10e0 0028 e3c2 [ 2745.080272] GPR08: [ 2745.080272] GPR12: c1cf 0001 [ 2745.080272] GPR16: 22b8 017f e3c2 017f [ 2745.080272] GPR20: c198c100 22b8 [ 2745.080272] GPR24: 0028 0080 017f [ 2745.080272] GPR28: c0080283 18ed5e01 c004258e10e0 c0075f0ff000 [ 2745.080409] NIP [] 0x0 [ 2745.080423] LR [c0ce8710] reuseport_select_sock+0x100/0x400 [ 2745.080439] Call Trace: [ 2745.080448] [c007ff6eb7e0] [c007ff6eb8a0] 0xc007ff6eb8a0 (unreliable) [ 2745.080469] [c007ff6eb880] [c0d67c9c] inet_lhash2_lookup+0x1ec/0x220 [ 2745.080490] [c007ff6eb900] [c0d6849c] __inet_lookup_listener+0x1ec/0x1f0 [ 2745.080509] [c007ff6eb9d0] [c0d96608] tcp_v4_rcv+0x6e8/0xe70 [ 2745.080527] [c007ff6ebb00] [c0d5a480] ip_protocol_deliver_rcu+0x60/0x2b0 [ 2745.080547] [c007ff6ebb50] [c0d5a740] ip_local_deliver_finish+0x70/0x90 [ 2745.080566] [c007ff6ebb70] [c0d5a7ec] ip_local_deliver+0x8c/0x140 [ 2745.080585] [c007ff6ebbe0] [c0d59aec] ip_rcv_finish+0xbc/0xf0 [ 2745.080602] [c007ff6ebc20] [c0d5a9a0] ip_rcv+0x100/0x110 [ 2745.080619] [c007ff6ebca0] [c0cab220] __netif_receive_skb_one_core+0x70/0xb0 [ 2745.080638] [c007ff6ebce0] [c0cac4f0] process_backlog+0xd0/0x230 [ 2745.080657] [c007ff6ebd50] [c0cadc68] net_rx_action+0x1e8/0x520 [ 2745.080674] [c007ff6ebe70] [c0ee2a7c] __do_softirq+0x15c/0x3b8 [ 2745.080692] [c007ff6ebf90] [c0030678] call_do_softirq+0x14/0x24 [ 2745.080709] [c0070656f7c0] [c001bf58] do_softirq_own_stack+0x38/0x50 [ 2745.080729] [c0070656f7e0] [c0143d60] do_softirq.part.0+0x80/0xb0 [ 2745.080914] [c0070656f810] [c0143e54] __local_bh_enable_ip+0xc4/0xf0 [ 2745.080933] [c0070656f830] [c0d5f8fc] ip_finish_output2+0x1fc/0x740 [ 2745.080953] [c0070656f8d0] [c0d61fe4] ip_output+0xd4/0x190 [ 2745.080971] [c0070656f960] [c0d61444] ip_local_out+0x64/0x90 [ 2745.080988] [c0070656f9a0] [c0d61838] __ip_queue_xmit+0x168/0x4d0 [ 2745.081007] [c0070656fa30] [c0d90a3c] ip_queue_xmit+0x1c/0x30 [ 2745.081024] [c0070656fa50] [c0d887e4] __tcp_transmit_skb+0x574/0xda0 [ 2745.081044] [c0070656fb00] 
[c0d89a88] tcp_connect+0x4b8/0x600 [ 2745.081060] [c0070656fbb0]
[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files
Hrm, I can't see the fix in the Ubuntu-5.3.0-24.26 kernel, so I think comment #34 was a premature SRU test request. As it stands, I tested Ubuntu-5.3.0-24.26 and the issue still exists, and looking at the source the fix isn't present, so that correlates with my test observations.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
  remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

Status in linux package in Ubuntu: In Progress
Status in linux-hwe package in Ubuntu: Invalid
Status in linux-hwe source package in Bionic: In Progress
Status in linux source package in Disco: Fix Committed
Status in linux source package in Eoan: Fix Committed
Status in linux source package in Focal: In Progress
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
See: https://lkml.org/lkml/2019/12/5/476 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854968 Title: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x Status in linux package in Ubuntu: Incomplete Bug description: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT regression testing: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac /autopkgtest-focal-canonical-kernel-team- unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz 14:44:30 DEBUG| [stdout] sctp STARTING 14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256) 14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. 
Usage count = 1 14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds. 14:47:44 DEBUG| [stdout] [ 3684.814345] Tainted: P OE 5.4.0-7-generic #8-Ubuntu 14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800 14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace: 14:47:44 DEBUG| [stdout] [ 3684.814360] ([] __schedule+0x304/0x7b0) 14:47:44 DEBUG| [stdout] [ 3684.814362] [ ] schedule+0x4a/0xe0 14:47:44 DEBUG| [stdout] [ 3684.814366] [ ] rwsem_down_write_slowpath+0x22c/0x530 14:47:44 DEBUG| [stdout] [ 3684.814370] [ ] register_pernet_subsys+0x2c/0x60 14:47:44 DEBUG| [stdout] [ 3684.814411] [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp] 14:47:44 DEBUG| [stdout] [ 3684.814414] [ ] do_one_initcall+0x40/0x200 14:47:44 DEBUG| [stdout] [ 3684.814416] [ ] do_init_module+0x70/0x270 14:47:44 DEBUG| [stdout] [ 3684.814418] [ ] load_module+0x1142/0x1440 14:47:44 DEBUG| [stdout] [ 3684.814419] [ ] __do_sys_finit_module+0xa4/0xf0 14:47:44 DEBUG| [stdout] [ 3684.814421] [ ] system_call+0x2aa/0x2c8 14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:37 DEBUG| [stdout] [ 3738.474354] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:48 DEBUG| [stdout] [ 3748.734396]
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
This goes right back to 4.6.x:

4.6.7    crash (see below)
4.7.10   crash in xfrm6_dst_ifdown
4.8.17   crash in xfrm6_dst_ifdown
4.12.14  crash (see below)
4.13.16  reports "unregister_netdevice: waiting for eth0 to become free. Usage count = 2"
4.14.157 reports "unregister_netdevice: waiting for eth0 to become free. Usage count = 2"
4.15.18 .. 5.4 hangs on socket() call

4.6.7:
[ 34.457967] BUG: scheduling while atomic: kworker/u8:0/6/0x0200 [ 34.458021] Modules linked in: esp6 xfrm6_mode_transport drbg ansi_cprng seqiv esp4 xfrm4_mode_transport xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip l2tp_netlink veth l2tp_core ip6_udp_tunnel udp_tunnel squashfs binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev kvm_intel kvm irqbypass joydev input_leds snd_hda_codec_generic serio_raw snd_hda_intel snd_hda_codec parport_pc 8250_fintek parport snd_hda_core qemu_fw_cfg snd_hwdep snd_pcm snd_timer mac_hid snd soundcore sch_fq_codel virtio_rng ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel qxl ttm drm_kms_helper syscopyarea sysfillrect aesni_intel sysimgblt [ 34.458086] fb_sys_fops aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 drm psmouse pata_acpi floppy [ 34.458100] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 4.6.7-040607-generic #201608160432 [ 34.458103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 34.458131] Workqueue: netns cleanup_net [ 34.458135] 0286 2fa171e7 88007c8e7ab8 813f7594 [ 34.458139] 88007fc96b80 7fff 88007c8e7ac8 810a8f6b [ 34.458143] 88007c8e7b18 8184905b 00ff88007c8e7ae8 8106463e [ 34.458147] Call Trace: [ 34.458161] [] dump_stack+0x63/0x8f [ 34.458166] [] __schedule_bug+0x4b/0x60 [ 34.458185] [] __schedule+0x5eb/0x7a0 [ 34.458191] [] ?
kvm_sched_clock_read+0x1e/0x30 [ 34.458195] [] schedule+0x35/0x80 [ 34.458203] [] schedule_timeout+0x1b2/0x270 [ 34.458207] [] ? __schedule+0x304/0x7a0 [ 34.458212] [] wait_for_completion+0xb3/0x140 [ 34.458217] [] ? wake_up_q+0x70/0x70 [ 34.458226] [] __wait_rcu_gp+0xc8/0xf0 [ 34.458231] [] synchronize_sched.part.58+0x38/0x50 [ 34.458235] [] ? call_rcu_bh+0x20/0x20 [ 34.458239] [] ? trace_raw_output_rcu_utilization+0x60/0x60 [ 34.458244] [] synchronize_sched+0x33/0x40 [ 34.458251] [] __l2tp_session_unhash+0xd1/0xe0 [l2tp_core] [ 34.458256] [] l2tp_tunnel_closeall+0x9e/0x140 [l2tp_core] [ 34.458261] [] l2tp_tunnel_delete+0x19/0x70 [l2tp_core] [ 34.458265] [] l2tp_exit_net+0x4b/0x80 [l2tp_core] [ 34.458269] [] ops_exit_list.isra.4+0x38/0x60 [ 34.458273] [] cleanup_net+0x1c4/0x2a0 [ 34.458281] [] process_one_work+0x1fc/0x490 [ 34.458285] [] worker_thread+0x4b/0x500 [ 34.458290] [] ? process_one_work+0x490/0x490 [ 34.458293] [] kthread+0xd8/0xf0 [ 34.458298] [] ret_from_fork+0x22/0x40 [ 34.458302] [] ? kthread_create_on_node+0x1b0/0x1b0 [ 34.514067] [ cut here ] 4.12.14: [ 20.760253] [ cut here ] [ 20.760256] kernel BUG at /home/kernel/COD/linux/net/ipv6/xfrm6_policy.c:265! 
[ 20.760299] invalid opcode: [#1] SMP [ 20.760320] Modules linked in: appletalk psnap llc esp6 xfrm6_mode_transport esp4 xfrm4_mode_transport xfrm_user xfrm_algo l2tp_ip6 l2tp_eth l2tp_ip l2tp_netlink veth l2tp_core ip6_udp_tunnel udp_tunnel binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev ppdev snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel snd_hda_codec snd_hda_core input_leds snd_hwdep serio_raw snd_pcm snd_timer hid_generic snd soundcore parport_pc parport mac_hid qemu_fw_cfg sch_fq_codel virtio_rng ip_tables x_tables autofs4 usbhid hid btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd qxl glue_helper ttm cryptd drm_kms_helper psmouse [ 20.760677] syscopyarea sysfillrect virtio_blk sysimgblt fb_sys_fops drm floppy virtio_net i2c_piix4 pata_acpi [ 20.760731] CPU: 3 PID: 49 Comm: kworker/u8:1 Not tainted 4.12.14-041214-generic #201709200843 [ 20.760772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 20.760814] Workqueue: netns cleanup_net [ 20.760836] task: 8aa4bcbbad00 task.stack: 9dc5804c [ 20.760867] RIP: 0010:xfrm6_dst_ifdown+0xa0/0xb0 [ 20.760890] RSP: 0018:9dc5804c3be0 EFLAGS: 00010246 [ 20.760916] RAX: 8aa4b6e6a000 RBX: 8aa4bc1b3500 RCX: [ 20.760950] RDX: 0001 RSI: 8aa4b6f39000 RDI: 8aa4bc1b3500 [ 20.760984] RBP: 9dc5804c3c08 R08:
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
Ah, fails on 5.2.0-15-generic, 5.3.0-18 generic too. Appears that the regression test was enabled quite recently: commit b5b9181c2403025b2c7ae7ea44333fd8fe6dbb54 (between 5.4-rc3 and 5.4-rc4) Author: David Ahern Date: Mon Oct 21 19:02:43 2019 -0600 selftests: Make l2tp.sh executable commit e858ef1cd4bc1bdfcd18114a8195236e336cee42 (between 5.4-rc3 and 5.4-rc4) Author: David Ahern Date: Mon Aug 5 15:41:37 2019 -0700 Since this breaks in 5.3 then this issue is in eoan and hence is not a focal regression. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854968 Title: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x Status in linux package in Ubuntu: Incomplete Bug description: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT regression testing: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac /autopkgtest-focal-canonical-kernel-team- unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz 14:44:30 DEBUG| [stdout] sctp STARTING 14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256) 14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. 
Usage count = 1 14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds. 14:47:44 DEBUG| [stdout] [ 3684.814345] Tainted: P OE 5.4.0-7-generic #8-Ubuntu 14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800 14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace: 14:47:44 DEBUG| [stdout] [ 3684.814360] ([] __schedule+0x304/0x7b0) 14:47:44 DEBUG| [stdout] [ 3684.814362] [ ] schedule+0x4a/0xe0 14:47:44 DEBUG| [stdout] [ 3684.814366] [ ] rwsem_down_write_slowpath+0x22c/0x530 14:47:44 DEBUG| [stdout] [ 3684.814370] [ ] register_pernet_subsys+0x2c/0x60 14:47:44 DEBUG| [stdout] [ 3684.814411] [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp] 14:47:44 DEBUG| [stdout] [ 3684.814414] [ ] do_one_initcall+0x40/0x200 14:47:44 DEBUG| [stdout] [ 3684.814416] [ ] do_init_module+0x70/0x270 14:47:44 DEBUG| [stdout] [ 3684.814418] [ ] load_module+0x1142/0x1440 14:47:44 DEBUG| [stdout] [ 3684.814419] [ ] __do_sys_finit_module+0xa4/0xf0 14:47:44 DEBUG| [stdout] [ 3684.814421] [ ] system_call+0x2aa/0x2c8 14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
Occurs between 5.3 and 5.4-rc1.
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
Easy steps to reproduce this issue:

    sudo modprobe l2tp_core
    sudo ./linux-5.4.0/tools/testing/selftests/net/l2tp.sh
    ./close

where close is compiled from:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd;

        printf("calling socket..\n");
        fd = socket(AF_APPLETALK, SOCK_STREAM, 0);
        printf("socket returned: %d\n", fd);
        return 0;
    }

When running the above program we just see "calling socket.." and it blocks forever in the socket() call. After a couple of minutes we get the kernel hung task warning. We also see repeated messages:

    unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I'll bisect the kernel next.
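Because the reproducer blocks forever in socket(), it is awkward to use from a bisect script. The following is a minimal sketch, not part of the original report, of one way to wrap the same call so that a hang is reported rather than waited on: the call runs in a child process and the parent polls for its exit. The names probe_appletalk_socket, run_with_timeout, the probe_result values and the HANG_PROBE_MAIN build switch are all hypothetical, and the 10 second timeout is an arbitrary assumption.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum probe_result { PROBE_RETURNED = 0, PROBE_HUNG = 1, PROBE_ERROR = -1 };

/* Same call as the reproducer; all that matters is whether it returns. */
int probe_appletalk_socket(void)
{
    int fd = socket(AF_APPLETALK, SOCK_STREAM, 0);

    printf("socket returned: %d\n", fd);
    fflush(stdout);
    if (fd >= 0)
        close(fd);
    return fd;
}

/*
 * Run fn() in a child process and poll for its exit for up to
 * timeout_secs seconds. The return value of fn() itself is ignored.
 */
enum probe_result run_with_timeout(int (*fn)(void), int timeout_secs)
{
    const struct timespec poll_interval = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    pid_t pid;

    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return PROBE_ERROR;
    if (pid == 0) {
        fn();
        _exit(0);
    }

    for (int waited_ms = 0; waited_ms < timeout_secs * 1000; waited_ms += 100) {
        pid_t done = waitpid(pid, NULL, WNOHANG);

        if (done == pid)
            return PROBE_RETURNED;
        if (done < 0)
            return PROBE_ERROR;
        nanosleep(&poll_interval, NULL);
    }

    /*
     * Best effort only: a task stuck in uninterruptible sleep does not
     * act on SIGKILL until it wakes, so the child is deliberately not
     * waited for again here.
     */
    kill(pid, SIGKILL);
    return PROBE_HUNG;
}

#ifdef HANG_PROBE_MAIN /* build with -DHANG_PROBE_MAIN for a standalone tool */
int main(void)
{
    enum probe_result r = run_with_timeout(probe_appletalk_socket, 10);

    printf("%s\n", r == PROBE_HUNG ? "HUNG" : r == PROBE_RETURNED ? "ok" : "error");
    return r == PROBE_RETURNED ? 0 : 1;
}
#endif
```

Built with -DHANG_PROBE_MAIN, the exit status could be used to mark a kernel good or bad in a bisect run; whether the stuck child can actually be cleaned up afterwards depends on where in the kernel it is blocked.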
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
The unregister_netdevice issue occurs when running the kernel selftest tools/testing/selftests/net/l2tp.sh after modprobing the l2tp driver. A hang can then be produced by running the stress-ng close stressor; this just expedites an eventual hang caused by this test.
[Kernel-packages] [Bug 1855151] Re: adt bpf tests crash 5.4.0-7 on ppc64el on power box
17:59:24 DEBUG| [stdout] # send cpu 63, receive socket 63 17:59:24 DEBUG| [stdout] # send cpu 65, receive socket 65 17:59:24 DEBUG| [stdout] # send cpu 67, receive socket 67 17:59:24 DEBUG| [stdout] # send cpu 69, receive socket 69 17:59:24 DEBUG| [stdout] # send cpu 71, receive socket 71 17:59:24 DEBUG| [stdout] # send cpu 73, receive socket 73 [ 3269.552837] test_bpf: #0 TAX jited:1 [ 3269.552885] Oops: Exception in kernel mode, sig: 4 [#1] [ 3269.552916] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 3269.552928] Modules linked in: test_bpf(+) tls af_packet_diag tcp_diag udp_diag raw_diag inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart cmdlinepart powernv_flash mtd at24 opal_prd uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler ibmpowernv vmx_crypto powernv_rng sch_fq_codel ip_tables x_t ables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm ahci tg3 libahci drm_panel_orientation_quirks [l ast unloaded: notifier_error_inject] [ 3269.847244] CPU: 55 PID: 137 Comm: modprobe Not tainted 5.4.0-7-generic #8 [ 3269.926547] NIP: c008029f80b4 LR: c0080465106c CTR: c008029f80b4 [ 3269.927427] REGS: c00712eb3410 TRAP: 0e40 Not tainted (5.4.0-7-generic) [ 3269.928286] MSR: 9288b033 CR: 28222422 XER: 2000 [ 3270.036372] CFAR: c000de44 IRQMASK: 0 [ 3270.036372] GPR00: c00804651044 c00712eb36a0 c0080465dd00 c00415ee1600 [ 3270.036372] GPR04: c00802850038 01f401dc 00025a599268f4d4 [ 3270.036372] GPR08: 0018 018acb48de01 0018f194 c00804651ac0 [ 3270.036372] GPR12: c008029f80b4 c007ff741c80 0008 007b [ 3270.036372] GPR16: 00081234aaab 0241 024c 20c49ba5e353f7cf [ 3270.036372] GPR20: c00415ee1600 c00804656dc9 c00804656e74 03e8 [ 3270.036372] 
GPR24: c00802850038 1234 c00804656e50 c0080285 [ 3270.036372] GPR28: 02f94279bb09 c00804655dc0 [ 3270.306180] NIP [c008029f80b4] 0xc008029f80b4 [ 3270.307006] LR [c0080465106c] run_one+0x2b0/0x41c [test_bpf] [ 3270.307912] Call Trace: [ 3270.307923] [c00712eb36a0] [c00804651044] run_one+0x288/0x41c [test_bpf] (unreliable) [ 3270.415622] [c00712eb37b0] [c00804651474] test_bpf+0x29c/0x3d8 [test_bpf] [ 3270.416485] [c00712eb38a0] [c00804651714] test_bpf_init+0x164/0x468 [test_bpf] [ 3270.505901] [c00712eb3990] [c00100c4] do_one_initcall+0x64/0x2b0 [ 3270.506777] [c00712eb3a60] [c0225bec] do_init_module+0x7c/0x2e0 [ 3270.507674] [c00712eb3af0] [c0228e88] load_module+0x1628/0x1a40 [ 3270.606197] [c00712eb3d00] [c02295a8] __do_sys_finit_module+0xc8/0x150 [ 3270.607134] [c00712eb3e20] [c000b278] system_call+0x5c/0x68 [ 3270.608814] Instruction dump: [ 3270.608857] [ 3270.713687] [ 3270.716164] ---[ end trace fd593383c9195849 ]--- 17:59:24 DEBUG| [stdout] # send c[ 3270.826052] pu 75, receive socket 75 17:59:24 DEBUG| [stdout] # send cpu 77, receive socket 77 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855151 Title: adt bpf tests crash 5.4.0-7 on ppc64el on power box Status in linux package in Ubuntu: In Progress Bug description: Running the ADT tests on a power box, the bpf tests crash the kernel as follows: [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?) 
[ 2745.079808] Faulting instruction address: 0x [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1] [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto sch_fq_codel ip_tables x_tables autofs4 bt rfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast drm_vram_he lper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
I added a background task to dump out new dmesg messages, and I now see messages such as the following *before* any stress-ng tests run:

    11:02:46 DEBUG| [stdout] [ 3093.210307] unregister_netdevice: waiting for lo to become free. Usage count = 1

I think we can therefore assume the damage to the kernel occurred in prior ADT tests. The current hypothesis is that the corruption is happening in the bpf kernel regression tests.
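The background dmesg watcher mentioned in this message is not shown in the report. As a rough sketch of the matching step only, the function below picks the device name and usage count out of a log line containing the unregister_netdevice message quoted above. The names struct netdev_wait, parse_unregister_wait and the WATCH_MAIN build switch are hypothetical, and it is an assumption that lines arrive one per read on stdin (for example piped from `dmesg --follow`).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "waiting for <dev> to become free" message. */
struct netdev_wait {
    char dev[32];     /* interface name, e.g. "lo" or "eth0" */
    int usage_count;  /* the reported "Usage count" */
};

/*
 * Return 1 and fill *out if the line contains the unregister_netdevice
 * wait message, 0 otherwise. Any prefix (timestamps, "DEBUG| [stdout]")
 * before the message is skipped.
 */
int parse_unregister_wait(const char *line, struct netdev_wait *out)
{
    static const char marker[] = "unregister_netdevice: waiting for ";
    const char *msg = strstr(line, marker);

    if (msg == NULL)
        return 0;
    msg += sizeof(marker) - 1;
    if (sscanf(msg, "%31s to become free. Usage count = %d",
               out->dev, &out->usage_count) != 2)
        return 0;
    return 1;
}

#ifdef WATCH_MAIN /* build with -DWATCH_MAIN to filter log lines from stdin */
int main(void)
{
    char line[1024];
    struct netdev_wait w;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        if (parse_unregister_wait(line, &w))
            printf("leaked reference: dev=%s usage_count=%d\n",
                   w.dev, w.usage_count);
    }
    return 0;
}
#endif
```

Running such a filter alongside the ADT tests and noting the first test after which a match appears would narrow down which earlier test leaves the reference behind.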
[Kernel-packages] [Bug 1855151] [NEW] adt bpf tests crash 5.4.0-7 on ppc64el on power box
[ 2745.096394] ---[ end trace d347ca85a257c66f ]--- [ 2745.208020] [ 2746.208219] Kernel panic - not syncing: Aiee, killing interrupt handler! [ 274[ 2796.226294116,5] OPAL: Reboot request... 6.316857] Rebooting in 10 seconds.. The final ADT test output recorded was: 17:03:13 DEBUG| [stdout] # IPv6 TCP 17:03:13 DEBUG| [stdout] # Testing EBPF mod 10... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 17:03:13 DEBUG| [stdout] # Socket 2: 2 17:03:13 DEBUG| [stdout] # Socket 3: 3 ... etc ... 17:03:13 DEBUG| [stdout] # Socket 4: 4 17:03:13 DEBUG| [stdout] # Socket 5: 5 17:03:13 DEBUG| [stdout] # Socket 9: 19 17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5... 17:03:13 DEBUG| [stdout] # Socket 0: 0 ... 17:03:13 DEBUG| [stdout] # Socket 3: 18 17:03:13 DEBUG| [stdout] # Socket 4: 19 ... 17:03:13 DEBUG| [stdout] # Testing CBPF mod 10... 17:03:13 DEBUG| [stdout] # Socket 0: 0 ... 17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 ... 17:03:13 DEBUG| [stdout] # Socket 4: 19 17:03:13 DEBUG| [stdout] # Testing too many filters... 17:03:13 DEBUG| [stdout] # Testing filters on non-SO_REUSEPORT socket... 17:03:13 DEBUG| [stdout] # IPv6 TCP w/ mapped IPv4 17:03:13 DEBUG| [stdout] # Testing EBPF mod 10... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 ... 17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 ... 17:03:13 DEBUG| [stdout] # Testing CBPF mod 10... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 ... 17:03:13 DEBUG| [stdout] # Reprograming, testing mod 5... 17:03:13 DEBUG| [stdout] # Socket 0: 0 17:03:13 DEBUG| [stdout] # Socket 1: 1 ... 17:03:13 DEBUG| [stdout] # Testing filter add without bind... 
17:03:13 DEBUG| [stdout] # SUCCESS 17:03:13 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf 17:03:13 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu 17:03:13 DEBUG| [stdout] # IPv4 UDP 17:03:13 DEBUG| [stdout] # send cpu 0, receive socket 0 17:03:13 DEBUG| [stdout] # send cpu 1, receive socket 1 ... 17:03:13 DEBUG| [stdout] # send cpu 125, receive socket 125 17:03:13 DEBUG| [stdout] # send cpu 127, receive socket 127 17:03:13 DEBUG| [stdout] # IPv4 TCP [ end of output as machine panics ] ... so it occurred sometime around or after this. I'll re-run this with the ipmi tool on the console to see how far it got before the kernel panicked. ** Affects: linux (Ubuntu) Importance: High Assignee: Colin Ian King (colin-king) Status: In Progress ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) ** Changed in: linux (Ubuntu) Status: New => In Progress ** Changed in: linux (Ubuntu) Importance: Undecided => High -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855151 Title: adt bpf tests crash 5.4.0-7 on ppc64el on power box Status in linux package in Ubuntu: In Progress Bug description: Running the ADT tests on a power box, the bpf tests crash the kernel as follows: [ 2745.079592] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[ 2745.079808] Faulting instruction address: 0x [ 2745.079824] Oops: Kernel access of bad area, sig: 11 [#1] [ 2745.079993] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 2745.080011] Modules linked in: af_packet_diag tcp_diag udp_diag raw_diag inet_diag binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds mac_hid ofpart cmdlinepart powernv_flash mtd ibmpowernv at24 uio_pdrv_genirq uio ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd powernv_rng vmx_crypto sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid ast drm_vram_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm tg3 ahci libahci drm_panel_orientation_quirks [last unloaded: notifier_error_inject] [ 2745.080195] CPU: 0 PID: 366 Comm: reuseport_bpf_c Not tainted 5.4.0-7-generic #8 [ 2745.080214] NIP: LR: c0ce8710 CTR: [ 2745.080233] REGS: c007ff6eb550 TRAP: 0400 Not tainted (5.4.0-7-generic) [ 2745.080250] MSR: 900040009033 CR: 24002282 XER: 2000 [ 2745.080272] CFAR: c000de44 IRQMASK: 0 [ 2745.080272] GPR00: c0d67c9c c007ff6eb7e0 c1a5bf00 c004258e10e0 [ 2745.080272] GPR04: c00802830038 c004258e10e0 0028 e3c2 [ 2745.080272] GPR08: 000
[Kernel-packages] [Bug 1855143] Re: 5.4.0-7 kernel crash on boot on power box
Re-installed the kernel, it's booting fine now. I wonder if I had some kind of corruption from a previous test crash. Can't reproduce this now. Marking it as Invalid. ** Changed in: linux (Ubuntu) Status: New => Invalid ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1855143 Title: 5.4.0-7 kernel crash on boot on power box Status in linux package in Ubuntu: Invalid Bug description: boot failures with 5.4.0-7-generic on OPAL power box: I was running ADT tests and the machine hung/rebooted. I was unable to log in. After I rebooted the machine with the ipmi tool the machine crashed with the following kernel output: [ 51.081421774,5] SkiBoot skiboot-5.4.8-5787ad3 starting... [ 51.081426316,5] initial console log level: memory 7, driver 5 [ 51.081429224,6] CPU: P8 generation processor(max 8 threads/core) [ 51.081432044,7] CPU: Boot CPU PIR is 0x0470 PVR is 0x004d0200 [ 51.081435009,7] CPU: Initial max PIR set to 0x1fff [ 51.082535316,5] OPAL table: 0x300bfc40 .. 
0x300c0110, branch table: 0x30002000 [ 51.082543101,5] FDT: Parsing fdt @0xff0 [ 51.087692296,5] XSCOM: chip 0x0 at 0x3fc00 [P8 DD2.0] [ 51.087702232,5] XSCOM: chip 0x8 at 0x3fc40 [P8 DD2.0] [ 51.087709775,6] XSTOP: XSCOM addr = 0x2010c82, FIR bit = 31 [ 51.087713185,6] MFSI 0:0: Initialized [ 51.087715462,6] MFSI 0:2: Initialized [ 51.087717669,6] MFSI 0:1: Initialized [ 51.087720203,6] MFSI 8:0: Initialized [ 51.087722365,6] MFSI 8:2: Initialized [ 51.087724518,6] MFSI 8:1: Initialized [ 51.088044434,5] LPC: LPC[000]: Initialized, access via XSCOM @0xb0020 [ 51.088162270,5] LPC: LPC: Default bus on chip 0x0 [ 51.088303476,6] MEM: parsing reserved memory from node /ibm,hostboot/reserved-memory [ 51.088313438,7] HOMER: Init chip 0 [ 51.088316406,7] PBA BAR0 : 0x0007fd80 [ 51.088319108,7] PBA MASK0: 0x0030 [ 51.088321761,7] HOMER Image at 0x7fd80 size 4MB [ 51.088325579,7] PBA BAR2 : 0x4007fda0 [ 51.088328358,7] PBA MASK2: 0x [ 51.088330928,7] SLW Image at 0x7fda0 size 1MB [ 51.088334409,7] PBA BAR3 : 0x0007ff80 [ 51.088337060,7] PBA MASK3: 0x0070 [ 51.088339732,7] OCC Common Area at 0x7ff80 size 8MB [ 51.088342594,7] HOMER: Init chip 8 [ 51.088345257,7] PBA BAR0 : 0x0007fdc0 [ 51.088347872,7] PBA MASK0: 0x0030 [ 51.088350519,7] HOMER Image at 0x7fdc0 size 4MB [ 51.088354173,7] PBA BAR2 : 0x4007fde0 [ 51.088356860,7] PBA MASK2: 0x [ 51.088359365,7] SLW Image at 0x7fde0 size 1MB [ 51.088362788,7] PBA BAR3 : 0x0007ff80 [ 51.088365419,7] PBA MASK3: 0x0070 [ 51.088367946,7] OCC Common Area at 0x7ff80 size 8MB [ 51.088387526,7] CPU idle state device tree init [ 51.088391002,4] SLW: HB-provided idle states property found [ 51.088567406,7] AST: PNOR LPC offset: 0x0c00 [ 51.088650577,5] PLAT: Using virtual UART [ 51.088977615,7] UART: Using LPC IRQ 4 [ 51.203625382,5] PLAT: Detected Firestone platform [ 51.219765305,5] PLAT: Detected BMC platform AMI [ 51.239417466,5] CENTAUR: Found centaur for chip 0x0 channel 4 [ 51.239524825,5] CENTAUR: FSI host: 0x0 cMFSI0 port 7 [ 
51.241283553,5] CENTAUR: Found centaur for chip 0x0 channel 5 [ 51.241759761,5] CENTAUR: FSI host: 0x0 cMFSI0 port 6 [ 51.242362656,5] PSI[0x000]: Found PSI bridge [active=0] [ 51.242690427,5] PSI[0x008]: Found PSI bridge [active=0] [ 51.245117930,5] CPU: All 128 processors called in... [2.472212005,5] FLASH: Found system flash: Macronix MXxxL51235F id:0 [2.472354468,5] BT: Interface initialized, IO 0x00e4 [3.421491873,5] NVRAM: Size is 576 KB [4.095942958,5] STB: secure mode off [4.096004331,5] STB: trusted mode off [4.096965839,5] CAPI: Preloading ucode 200ea [4.097023615,5] FLASH: Queueing preload of 2/200ea [4.097202595,5] FLASH: Queueing preload of 0/0 [4.097723471,5] FLASH: Queueing preload of 1/0 [4.097739635,7] FFS: Partition map size: 0x1000 [4.101069429,7] FLASH: CAPP partition has ECC [4.117588444,5] STB: sb_verify skipped resource 2, secure_mode=0 [4.117607170,5] Chip 0 Found PBCQ0 at /xscom@3fc00/pbcq@2012000 [4.117610665,7] PHB3[0:0]: X[PE]=0x02012000 X[PCI]=0x09012000 X[SPCI]=0x09013c00 [4.117690635,7] PHB3[0:0] REGS = 0x0003fffe4000 [4k] [4.124862367,7] PHB3[0:0] PCIBAR = 0x0003fffe4000 [4.144741905,7] PHB3[0:0] MMIO0= 0x2000 [0x0100] [4.147663099,7] PHB3[0:0]
[Kernel-packages] [Bug 1855143] [NEW] 5.4.0-7 kernel crash on boot on power box
Public bug reported: boot failures with 5.4.0-7-generic on OPAL power box: I was running ADT tests and the machine hung/rebooted. I was unable to log in. After I rebooted the machine with the ipmi tool the machine crashed with the following kernel output: [ 51.081421774,5] SkiBoot skiboot-5.4.8-5787ad3 starting... [ 51.081426316,5] initial console log level: memory 7, driver 5 [ 51.081429224,6] CPU: P8 generation processor(max 8 threads/core) [ 51.081432044,7] CPU: Boot CPU PIR is 0x0470 PVR is 0x004d0200 [ 51.081435009,7] CPU: Initial max PIR set to 0x1fff [ 51.082535316,5] OPAL table: 0x300bfc40 .. 0x300c0110, branch table: 0x30002000 [ 51.082543101,5] FDT: Parsing fdt @0xff0 [ 51.087692296,5] XSCOM: chip 0x0 at 0x3fc00 [P8 DD2.0] [ 51.087702232,5] XSCOM: chip 0x8 at 0x3fc40 [P8 DD2.0] [ 51.087709775,6] XSTOP: XSCOM addr = 0x2010c82, FIR bit = 31 [ 51.087713185,6] MFSI 0:0: Initialized [ 51.087715462,6] MFSI 0:2: Initialized [ 51.087717669,6] MFSI 0:1: Initialized [ 51.087720203,6] MFSI 8:0: Initialized [ 51.087722365,6] MFSI 8:2: Initialized [ 51.087724518,6] MFSI 8:1: Initialized [ 51.088044434,5] LPC: LPC[000]: Initialized, access via XSCOM @0xb0020 [ 51.088162270,5] LPC: LPC: Default bus on chip 0x0 [ 51.088303476,6] MEM: parsing reserved memory from node /ibm,hostboot/reserved-memory [ 51.088313438,7] HOMER: Init chip 0 [ 51.088316406,7] PBA BAR0 : 0x0007fd80 [ 51.088319108,7] PBA MASK0: 0x0030 [ 51.088321761,7] HOMER Image at 0x7fd80 size 4MB [ 51.088325579,7] PBA BAR2 : 0x4007fda0 [ 51.088328358,7] PBA MASK2: 0x [ 51.088330928,7] SLW Image at 0x7fda0 size 1MB [ 51.088334409,7] PBA BAR3 : 0x0007ff80 [ 51.088337060,7] PBA MASK3: 0x0070 [ 51.088339732,7] OCC Common Area at 0x7ff80 size 8MB [ 51.088342594,7] HOMER: Init chip 8 [ 51.088345257,7] PBA BAR0 : 0x0007fdc0 [ 51.088347872,7] PBA MASK0: 0x0030 [ 51.088350519,7] HOMER Image at 0x7fdc0 size 4MB [ 51.088354173,7] PBA BAR2 : 0x4007fde0 [ 51.088356860,7] PBA MASK2: 0x [ 51.088359365,7] SLW Image at 0x7fde0 
size 1MB [ 51.088362788,7] PBA BAR3 : 0x0007ff80 [ 51.088365419,7] PBA MASK3: 0x0070 [ 51.088367946,7] OCC Common Area at 0x7ff80 size 8MB [ 51.088387526,7] CPU idle state device tree init [ 51.088391002,4] SLW: HB-provided idle states property found [ 51.088567406,7] AST: PNOR LPC offset: 0x0c00 [ 51.088650577,5] PLAT: Using virtual UART [ 51.088977615,7] UART: Using LPC IRQ 4 [ 51.203625382,5] PLAT: Detected Firestone platform [ 51.219765305,5] PLAT: Detected BMC platform AMI [ 51.239417466,5] CENTAUR: Found centaur for chip 0x0 channel 4 [ 51.239524825,5] CENTAUR: FSI host: 0x0 cMFSI0 port 7 [ 51.241283553,5] CENTAUR: Found centaur for chip 0x0 channel 5 [ 51.241759761,5] CENTAUR: FSI host: 0x0 cMFSI0 port 6 [ 51.242362656,5] PSI[0x000]: Found PSI bridge [active=0] [ 51.242690427,5] PSI[0x008]: Found PSI bridge [active=0] [ 51.245117930,5] CPU: All 128 processors called in... [2.472212005,5] FLASH: Found system flash: Macronix MXxxL51235F id:0 [2.472354468,5] BT: Interface initialized, IO 0x00e4 [3.421491873,5] NVRAM: Size is 576 KB [4.095942958,5] STB: secure mode off [4.096004331,5] STB: trusted mode off [4.096965839,5] CAPI: Preloading ucode 200ea [4.097023615,5] FLASH: Queueing preload of 2/200ea [4.097202595,5] FLASH: Queueing preload of 0/0 [4.097723471,5] FLASH: Queueing preload of 1/0 [4.097739635,7] FFS: Partition map size: 0x1000 [4.101069429,7] FLASH: CAPP partition has ECC [4.117588444,5] STB: sb_verify skipped resource 2, secure_mode=0 [4.117607170,5] Chip 0 Found PBCQ0 at /xscom@3fc00/pbcq@2012000 [4.117610665,7] PHB3[0:0]: X[PE]=0x02012000 X[PCI]=0x09012000 X[SPCI]=0x09013c00 [4.117690635,7] PHB3[0:0] REGS = 0x0003fffe4000 [4k] [4.124862367,7] PHB3[0:0] PCIBAR = 0x0003fffe4000 [4.144741905,7] PHB3[0:0] MMIO0= 0x2000 [0x0100] [4.147663099,7] PHB3[0:0] MMIO1= 0x3fe0 [0x8000] [4.151015049,7] PHB3[0:0] BAREN= 0xf800 [4.151018735,7] PHB3[0:0] NEWBAREN = 0xf800 [4.152491015,7] PHB3[0:0] IRSNC= 0x0100 [4.177266431,5] STB: tb_measure skipped resource 2, 
trusted_mode=0 [4.177266472,7] PHB3[0:0] IRSNM= 0xff00 [4.177269336,7] PHB3[0:0] LSI = 0xff00 [4.177278668,5] Chip 0 Found PBCQ1 at /xscom@3fc00/pbcq@2012400 [4.177282022,7] PHB3[0:1]: X[PE]=0x02012400 X[PCI]=0x09012400 X[SPCI]=0x09013c40 [4.178715842,7] PHB3[0:1] REGS = 0x0003fffe4010 [4k] [4.183043807,7] PHB3[0:1] PCIBAR = 0x0003fffe4010 [4.190163295,5] Chip 8 Found PBCQ0 at
[Kernel-packages] [Bug 1855100] [NEW] bpf self tests break 5.4.0-7-generic on power8 system
Public bug reported: Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of the bare metal system. Last output seen while ssh'd into the box: 11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls 11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running socket test 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # [PASS] 11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running psock_fanout test 11:52:34 DEBUG| [stdout] # client_loop: send disconnect: Broken pipe last output in (truncated) nohup output: f -emit-llvm -c progs/pyperf180.c -o - || \ 11:52:15 DEBUG| [stdout]echo "clang failed") | \ 11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe \ 11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o this suggests the bpf selftests are causing the breakage. last output logged in /var/log/dmesg.log : Dec 4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE next entries on dmesg.log show machine had rebooted. ** Affects: linux (Ubuntu) Importance: High Status: New ** Changed in: linux (Ubuntu) Importance: Undecided => High -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1855100 Title: bpf self tests break 5.4.0-7-generic on power8 system Status in linux package in Ubuntu: New Bug description: Running ADT tests on POWER8 5.4.0-7-generic (gulpin) causes reboot of the bare metal system. Last output seen while ssh'd into the box: 11:52:34 DEBUG| [stdout] ok 6 selftests: net: tls 11:52:34 DEBUG| [stdout] # selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running socket test 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # [PASS] 11:52:34 DEBUG| [stdout] ok 7 selftests: net: run_netsocktests 11:52:34 DEBUG| [stdout] # selftests: net: run_afpackettests 11:52:34 DEBUG| [stdout] # 11:52:34 DEBUG| [stdout] # running psock_fanout test 11:52:34 DEBUG| [stdout] # client_loop: send disconnect: Broken pipe last output in (truncated) nohup output: f -emit-llvm -c progs/pyperf180.c -o - || \ 11:52:15 DEBUG| [stdout]echo "clang failed") | \ 11:52:15 DEBUG| [stdout] llc -march=bpf -mattr=+alu32 -mcpu=probe \ 11:52:15 DEBUG| [stdout]-filetype=obj -o /home/ubuntu/autotest/client/tmp/ubuntu_kernel_selftests/src/linux/tools/testing/selftests/bpf/alu32/pyperf180.o this suggests the bpf selftests are causing the breakage. last output logged in /var/log/dmesg.log : Dec 4 11:50:17 gulpin kernel: [ 5031.966277] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.975298] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.984300] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5031.993389] Injecting error (-12) to MEM_GOING_OFFLINE Dec 4 11:50:17 gulpin kernel: [ 5032.002407] Injecting error (-12) to MEM_GOING_OFFLINE next entries on dmesg.log show machine had rebooted. 
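The reasoning above (the last test that started but never reported a result is the likely culprit) can be sketched as a small log filter. This is only an illustration, not part of the ADT tooling: the function name unfinished_selftests is hypothetical, and it assumes the autotest log has one record per line in the "# selftests: <suite>: <name>" / "ok N selftests: <suite>: <name>" format seen in the excerpt.

```shell
#!/bin/bash
# Hypothetical helper (not part of autotest or kselftest): list selftests
# that were started in a log but never reported an "ok"/"not ok" result.
# Assumes one log record per line, in the format shown in the bug report.
unfinished_selftests() {
    awk '
        # Result line, e.g. "... ok 7 selftests: net: run_netsocktests"
        # (also matches "not ok N selftests: ...").
        /ok [0-9]+ selftests: / {
            name = $0
            sub(/^.*ok [0-9]+ selftests: /, "", name)
            sub(/[ \t\r]+$/, "", name)
            finished[name] = 1
            next
        }
        # Start line, e.g. "... # selftests: net: run_afpackettests"
        /# selftests: / {
            name = $0
            sub(/^.*# selftests: /, "", name)
            sub(/[ \t\r]+$/, "", name)
            if (!(name in seen)) {
                seen[name] = 1
                order[++n] = name
            }
        }
        END {
            # Print started tests with no result, in the order they began.
            for (i = 1; i <= n; i++)
                if (!(order[i] in finished))
                    print order[i]
        }
    ' "$@"
}

# Example call (the log path is a placeholder):
#   unfinished_selftests /path/to/autotest-debug.log
```

Run against the ssh output quoted above, this would print "net: run_afpackettests", the test that was in progress when the connection dropped. It says nothing about tests whose output never reached the log, so the truncated nohup output still has to be checked separately.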
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855100/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1824407] Re: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files
Verified for disco:

Run reproducer script with old kernel 5.0.0-37-generic, results:
cat /root-tmp/etc/.pwd.lock
cat: /root-tmp/etc/.pwd.lock: Input/output error

Run with -proposed kernel 5.0.0-38-generic:
cat /root-tmp/etc/.pwd.lock
foo

Marking as verification-done-disco

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1824407

Title: remount of multilower moved pivoted-root overlayfs root, results in I/O errors on some modified files

Status in linux package in Ubuntu: In Progress
Status in linux-hwe package in Ubuntu: Invalid
Status in linux-hwe source package in Bionic: In Progress
Status in linux source package in Disco: Fix Committed
Status in linux source package in Eoan: Fix Committed
Status in linux source package in Focal: In Progress

Bug description:

== SRU Justification Disco, Eoan, Focal ==

Multiple squashfs filesystems with overlayfs cause file corruption issues when modifying zero-sized files.

== Fix ==

The current fix is pending in https://github.com/amir73il/linux/commit/b2d4f0ea5af42e16e154254de99da064f3ac551a

== Test case ==

With an Ubuntu ISO on the cdrom drive, use:

#!/bin/bash -x
mkdir -p /cdrom
mount -t iso9660 -o ro,noatime /dev/sr0 /cdrom
sleep 1
mkdir -p /cow
mount -t tmpfs -o 'rw,noatime,mode=755' tmpfs /cow
sleep 1
mkdir -p /cow/upper
mkdir -p /cow/work
modprobe -q -b overlay
sleep 1
modprobe -q -b loop
sleep 1
dev=$(losetup -f)
mkdir -p /filesystem.squashfs
losetup $dev /cdrom/casper/filesystem.squashfs
mount -t squashfs -o ro,noatime $dev /filesystem.squashfs
sleep 1
dev=$(losetup -f)
mkdir -p /installer.squashfs
losetup $dev /cdrom/casper/installer.squashfs
mount -t squashfs -o ro,noatime $dev /installer.squashfs
sleep 1
mkdir -p /root-tmp
mount -t overlay -o 'upperdir=/cow/upper,lowerdir=/installer.squashfs:/filesystem.squashfs,workdir=/cow/work' /cow /root-tmp
FILE=/root-tmp/etc/.pwd.lock
echo foo > $FILE
cat $FILE
sync
#
# dropping caches or remounting causes the bug
#
echo 3 > /proc/sys/vm/drop_caches
cat $FILE

Without the fix the cat of the file will produce an error. With the fix the cat will work correctly.

== Regression Potential ==

There is an unhandled corner case:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B

However, since this is an issue without the fix and will be addressed later with subsequent fixes once they are OK with upstream, I think the risk is minimal considering nobody is complaining about these corner cases with the current broken overlayfs squashfs layering.

---

1) Download focal subiquity pending image, or eoan release image
2) boot, and press ESC and edit boot command line (F6 in bios, e in UEFI)
3) After --- insert the following options: break=top debug init=/bin/bash
4) Continue boot (Enter in BIOS, ctrl+x in UEFI)
5) in the initramfs execute:
rm /scripts/casper-bottom/25adduser
exit
6) you will be dropped into pivoted root filesystem, before systemd is execed as pid one
7) /run/initramfs/ will contain a debug log, showing how everything was mounted. I.e. cdrom mounted, squashfs losetup from there, then multilower overlay setup from them, moved to /root, and then pivot-root to /root done to finally end up as /. Underlying layers are moved into /cow for your convenience.
8) At this point modifying zero-byte length files, that exist in the lowest layer, but not the middle one, in certain ways, will result in them being corrupted after / is remounted.
9) Corruption examples

(On both focal & eoan)
cat /etc/.pwd.lock
systemd-sysusers
cat /etc/.pwd.lock
mount -o remount /
cat /etc/.pwd.lock
overlayfs: invalid origin (etc/.pwd.lock, ftype=8000, origin ftype=4000)
cat: /etc/.pwd.lock: Input/output error

(Only on eoan)
cat /etc/machine-id
systemd-machine-id-setup
cat /etc/machine-id
mount -o remount /
cat /etc/machine-id
overlayfs: invalid origin (etc/machine-id, ftype=8000, origin ftype=4000)
cat: /etc/machine-id: Input/output error

Lots of things break once machine-id and .pwd.lock are corrupted, e.g. unable to dhcp, connect to dbus, add/remove/change users or groups, etc.

We were unable to recreate the issue outside of booting things with casper, i.e. statically on a regular host machine without pivot-root. But hopefully booting to a quiet state with nothing running is sufficient to reproduce this.

Instead of booting with `bebroken init=/bin/bash` you can boot with `bebroken systemd.mask=systemd-remount-fs.service`; this will complete the boot, with /etc/machine-id & .pwd.lock modified, meaning that
[Kernel-packages] [Bug 1854968] Re: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x
** Changed in: linux (Ubuntu) Assignee: (unassigned) => Colin Ian King (colin-king) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1854968 Title: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x Status in linux package in Ubuntu: Incomplete Bug description: stress-ng sctp stressor breaks 5.4.0.7-8 on s390x during ADT regression testing: https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac /autopkgtest-focal-canonical-kernel-team- unstable/focal/s390x/l/linux/20191203_153629_d7a41@/log.gz 14:44:30 DEBUG| [stdout] sctp STARTING 14:44:30 DEBUG| [stdout] [ 3491.098762] sctp: Hash tables configured (bind 256/256) 14:44:33 DEBUG| [stdout] [ 3494.694285] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:43 DEBUG| [stdout] [ 3504.714324] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:44:54 DEBUG| [stdout] [ 3514.974288] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:04 DEBUG| [stdout] [ 3525.234306] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:14 DEBUG| [stdout] [ 3535.494291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:25 DEBUG| [stdout] [ 3545.754323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:35 DEBUG| [stdout] [ 3556.014294] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:45 DEBUG| [stdout] [ 3566.034317] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:45:55 DEBUG| [stdout] [ 3576.054296] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:05 DEBUG| [stdout] [ 3586.324332] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:15 DEBUG| [stdout] [ 3596.334306] unregister_netdevice: waiting for lo to become free. 
Usage count = 1 14:46:25 DEBUG| [stdout] [ 3606.594337] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:36 DEBUG| [stdout] [ 3616.854305] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:46 DEBUG| [stdout] [ 3627.124323] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:46:56 DEBUG| [stdout] [ 3637.154313] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:06 DEBUG| [stdout] [ 3647.414304] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:16 DEBUG| [stdout] [ 3657.674353] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:27 DEBUG| [stdout] [ 3667.734297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:37 DEBUG| [stdout] [ 3677.994396] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:44 DEBUG| [stdout] [ 3684.814335] INFO: task modprobe:2063628 blocked for more than 122 seconds. 14:47:44 DEBUG| [stdout] [ 3684.814345] Tainted: P OE 5.4.0-7-generic #8-Ubuntu 14:47:44 DEBUG| [stdout] [ 3684.814346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
14:47:44 DEBUG| [stdout] [ 3684.814348] modprobeD0 2063628 2063618 0x0800 14:47:44 DEBUG| [stdout] [ 3684.814351] Call Trace: 14:47:44 DEBUG| [stdout] [ 3684.814360] ([<be310914>] __schedule+0x304/0x7b0) 14:47:44 DEBUG| [stdout] [ 3684.814362] [<be310e0a>] schedule+0x4a/0xe0 14:47:44 DEBUG| [stdout] [ 3684.814366] [<bdb071cc>] rwsem_down_write_slowpath+0x22c/0x530 14:47:44 DEBUG| [stdout] [ 3684.814370] [<be14d66c>] register_pernet_subsys+0x2c/0x60 14:47:44 DEBUG| [stdout] [ 3684.814411] [<03ff80766638>] sctp_init+0x2f0/0x520 [sctp] 14:47:44 DEBUG| [stdout] [ 3684.814414] [<bda288c0>] do_one_initcall+0x40/0x200 14:47:44 DEBUG| [stdout] [ 3684.814416] [<bdb594a0>] do_init_module+0x70/0x270 14:47:44 DEBUG| [stdout] [ 3684.814418] [<bdb5b892>] load_module+0x1142/0x1440 14:47:44 DEBUG| [stdout] [ 3684.814419] [<bdb5bdc4>] __do_sys_finit_module+0xa4/0xf0 14:47:44 DEBUG| [stdout] [ 3684.814421] [<be315fc6>] system_call+0x2aa/0x2c8 14:47:47 DEBUG| [stdout] [ 3688.014291] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:47:57 DEBUG| [stdout] [ 3698.064370] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:07 DEBUG| [stdout] [ 3708.084328] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:17 DEBUG| [stdout] [ 3718.134297] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:27 DEBUG| [stdout] [ 3728.214335] unregister_netdevice: waiting for lo to become free. Usage count = 1 14:48:37 DEBUG| [stdout