[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-10-11 03:36 EDT---
IBM Bugzilla status -> closed, Fix Released within Xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634

Title:
  blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Carsten Jacobi - 2017-03-07 03:35:31 ==
  I'm evaluating Ubuntu Xenial on z for development purposes. The test
  system is installed in an LPAR with one FCP LUN that is accessible via
  4 paths (all paths are configured).
  The system hangs regularly when I build packages with "pdebuild" from
  the pbuilder packaging suite.
  The local Linux development team helped me out with a pre-analysis
  that I can post here (thanks a lot for that):

  With the default settings and under a certain workload, blk_mq seems
  to get into a presumed "deadlock".
  Possibly this happens on CPU hot(un)plug.

  After the I/O stalled, a dump was pulled manually.
  The following information is from the crash dump pre-analysis.

  $ zgetdump -i dump.0
  General dump info:
    Dump format: elf
    Version: 1
    UTS node name..: mclint
    UTS kernel release.: 4.4.0-65-generic
    UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
    System arch: s390x (64 bit)
    CPU count (online).: 2
    Dump memory range..: 8192 MB
  Memory map:
     - 0001b831afff (7043 MB)
    0001b831b000 - 0001 (1149 MB)

  Things look similar with HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.

  KERNEL: vmlinux.full
  DUMPFILE: dump.0
  CPUS: 2
  DATE: Fri Mar 3 14:31:07 2017
  UPTIME: 02:11:20
  LOAD AVERAGE: 13.00, 12.92, 11.37
  TASKS: 411
  NODENAME: mclint
  RELEASE: 4.4.0-65-generic
  VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
  MACHINE: s390x (unknown Mhz)
  MEMORY: 7.8 GB
  PANIC: ""
  PID: 0
  COMMAND: "swapper/0"
  TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
  CPU: 0
  STATE: TASK_RUNNING (ACTIVE)
  INFO: no panic task found

  crash> dev -d
  MAJOR GENDISK   NAME REQUEST_QUEUE TOTAL ASYNC SYNC       DRV
  ...
  8     1e1d6d800 sda  1e1d51210     0     23151 4294944145 N/A(MQ)
  8     1e4e06800 sdc  2081b180            23148 4294944148 N/A(MQ)
  8     1f07800   sdb  20c75680            23195 4294944101 N/A(MQ)
  8     1e4e06000 sdd  1e4e31210     0     23099 4294944197 N/A(MQ)
  252   1e1d6c800 dm-0 1e1d51b18     9     1     8          N/A(MQ)
  ...

  So both dm-mpath and sd have requests pending in their block
  multiqueue.
  The large numbers for sd look strange and seem to be the unsigned
  formatting of the values shown for async multiplied by -1.

  [0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
  [0.798262] setup: Linux is running natively in 64-bit mode
  [0.798290] setup: Max memory size: 8192MB
  [0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
  [0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0
  [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
  [ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
  [ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
  [ 5281.179444] 0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8 0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70 0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20 0001ea805c00 0001e15795f0 0001ea805c00 007d3978 007bc9f8 0001e6f9b9d8 0001e6f9ba40
  [ 5281.179454] Call Trace:
  [ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
  [ 5281.179462] [<007bcf52>] schedule+0x4a/0xb0
  [ 5281.179465] [<007c02aa>] schedule_timeout+0x232/0x2a8
  [ 5281.179466] [<007bde50>] wait_for_common+0x110/0x1c8
  [ 5281.179472] [<0017b602>] flush_work+0x42/0x58
  [ 5281.179564] [<03ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
  [ 5281.179589] [<03ff805dee82>] _xfs_log_force+0x9a/0x2e8 [xfs]
  [ 5281.179615]
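The remark in the pre-analysis that the large SYNC numbers look like the ASYNC values multiplied by -1 and printed as unsigned can be checked with a little arithmetic. The following is only a sketch: it assumes the counters are 32 bits wide (the report does not state the width), and the helper name `as_unsigned32` is made up for illustration.

```python
# Assumption: the counters are 32 bits wide; the bug report does not say so.
MASK32 = 0xFFFFFFFF


def as_unsigned32(value: int) -> int:
    """Return a signed value as it reads when printed as an unsigned 32-bit integer."""
    return value & MASK32


# (ASYNC, SYNC) pairs for the four sd devices, copied from the `dev -d` output.
rows = {
    "sda": (23151, 4294944145),
    "sdc": (23148, 4294944148),
    "sdb": (23195, 4294944101),
    "sdd": (23099, 4294944197),
}

for name, (async_count, sync_count) in rows.items():
    # Compare the unsigned view of -ASYNC with the SYNC value from the dump.
    matches = as_unsigned32(-async_count) == sync_count
    print(f"{name}: unsigned32(-{async_count}) == {sync_count}? {matches}")
```

Under that assumption, ASYNC plus SYNC comes out as exactly 2^32 for each sd device, which is consistent with a negative count being formatted as unsigned.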
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-09-26 07:41 EDT---
Ok, just checked kernel 4.4.0-97 ... that looks much better:

root@x:~# uname -a
Linux mclint 4.4.0-97-generic #120-Ubuntu SMP Tue Sep 19 17:27:01 UTC 2017 s390x s390x s390x GNU/Linux

root@x:~# systool -v -m scsi_mod
Module = "scsi_mod"

  Attributes:
    uevent =

  Parameters:
    default_dev_flags = "0"
    eh_deadline = "-1"
    inq_timeout = "20"
    max_luns = "512"
    scan = "async"
    scsi_logging_level = "0"
    use_blk_mq = "N"

root@x:~# systool -v -m dm_mod
Module = "dm_mod"

  Attributes:
    uevent =

  Parameters:
    reserved_bio_based_ios = "16"
    reserved_rq_based_ios = "256"
    stats_current_allocated_bytes = "0"
    use_blk_mq = "N"

blk_mq is now turned off by default, and that was our main concern!

On top of that, I also started two "big" pdebuild processes (firefox)
that so far had the potential to drive the system right into the hang
scenario that was the original reason for this ticket. The build did
not succeed, but the point is that the system did not run into the
typical hang, so I think you can consider this problem solved!

-- 
Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
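The `use_blk_mq` check done with systool in this comment can also be scripted by reading the module parameters from sysfs. This is a minimal sketch, not what was run on the test system: it assumes the parameters are exposed under `/sys/module/<module>/parameters/use_blk_mq`, and the function name `read_use_blk_mq` and its `sysfs_root` argument are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Modules whose use_blk_mq parameter appears in the systool output.
MODULES = ("scsi_mod", "dm_mod")


def read_use_blk_mq(module: str, sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the module's use_blk_mq value (e.g. "Y" or "N"), or None if unavailable."""
    param = Path(sysfs_root) / module / "parameters" / "use_blk_mq"
    try:
        return param.read_text().strip()
    except OSError:
        # Module not loaded, or this kernel does not expose the parameter.
        return None


if __name__ == "__main__":
    for module in MODULES:
        print(f"{module}.use_blk_mq = {read_use_blk_mq(module)}")
```

The `sysfs_root` argument only exists so the function can be pointed at a scratch directory instead of the live system.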
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-09-22 04:36 EDT---
Just checked kernel 4.4.0-96. blk_mq is still enabled by default:

root@x:~# systool -v -m dm_mod
Module = "dm_mod"

  Attributes:
    uevent =

  Parameters:
    reserved_bio_based_ios = "16"
    reserved_rq_based_ios = "256"
    stats_current_allocated_bytes = "0"
    use_blk_mq = "Y"

root@x:~# systool -v -m scsi_mod
Module = "scsi_mod"

  Attributes:
    uevent =

  Parameters:
    default_dev_flags = "0"
    eh_deadline = "-1"
    inq_timeout = "20"
    max_luns = "512"
    scan = "async"
    scsi_logging_level = "0"
    use_blk_mq = "Y"

-- 
Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-09-11 10:46 EDT---
Ok, I think the discussion is then back at the point I addressed in my
comment from June 6th (comment 36 or 38)!

If the multi-queue feature doesn't work with kernel 4.4 yet, then you
must not deliver it (that's exactly why SUSE did not activate that
feature in their 4.4 kernel package)!

With kernel 4.11 or 4.12 you can turn that option on again for those
kernel packages, since the feature finally seems to be mature there and
can be rolled out.

-- 
Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-09-11 09:52 EDT---
No specific requirement is known for s390 to have these config options
enabled. They can be set to N.

-- 
Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-09-04 04:52 EDT---
No specific CPU (un)plug tests were executed.

-- 
Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-08-30 05:15 EDT---
This function worked with SLES 12 SP2 kernel 4.4.74.

-- 
Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-08-22 03:42 EDT---
@Juerg. Now you should have access to the box folder...

-- 
Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-07-17 12:10 EDT---
@jsalisbury: If I understand your point correctly: if 4.11rc8 is a "good kernel" and 4.11rc7 the last "bad kernel", you could isolate the diff that fixes this problem and apply it to the stable 4.4 kernel line. I agree that's a feasible approach.

However, we must first make sure that 4.11rc8 really is a "good kernel", and therefore someone must look into the dumps I uploaded. In addition, the arch-specific linux-headers-x.x.x-x-generic packages must be fixed for s390x. Without those packages I can't build my DKMS packages.

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-07-13 10:26 EDT---
I just wanted to try the 4.11rc7 kernel, but my DKMS OpenAFS packages don't build on it. I have already experienced this once with a package from the kernel-ppa repository:

root@mclint:~# file /usr/src/linux-headers-4.11.0-041100rc7-generic/arch/s390/tools/gen_facilities
/usr/src/linux-headers-4.11.0-041100rc7-generic/arch/s390/tools/gen_facilities: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4590a42ae8b1bf9b2bd8d05e332e59bc7f47aa93, not stripped

You put x86 ELF binaries in your s390x linux-headers packages. I need appropriate linux-headers packages to make repros.

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
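The architecture mismatch reported in the 2017-07-13 comment (an x86-64 gen_facilities binary inside an s390x linux-headers package) can also be checked without the file utility. The following is only a minimal sketch: the function names are hypothetical, and the machine table covers just the two architectures mentioned in this thread. It reads the e_machine field from the ELF header, which sits at byte offset 18 in both 32-bit and 64-bit ELF files.

```python
import struct

# Subset of e_machine values from the ELF specification; only the two
# architectures discussed in this thread are listed.
ELF_MACHINES = {22: "s390", 62: "x86-64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named in an ELF header (hypothetical helper)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown ELF byte order")
    # e_ident is 16 bytes, e_type is 2 bytes, so e_machine starts at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown (%d)" % machine)


def file_machine(path: str) -> str:
    """Read the first 20 bytes of a file and report its ELF architecture."""
    with open(path, "rb") as handle:
        return elf_machine(handle.read(20))
```

On an s390x system, a correctly built host tool would be expected to report "s390" here; the binary quoted in the comment would report "x86-64".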
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-07-12 10:45 EDT---
@jsalisbury: Pardon me for objecting, but I don't see why I should try another 4.11rc kernel, or what you mean by a "bad kernel" and a "good kernel". The 4.11rc8 kernel seemed to be fine; I only ran into a hang when I shut the system down. I just want to know whether that hang is related to blk_mq() or not, and one glance into the crash dump I uploaded to Box will answer that question. I could also look into the crash dump myself, but I don't have the debug symbols for kernel 4.11rc8-s390x.

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-07-10 04:34 EDT---
@jsalisbury: I tried the kernel and didn't experience a hang while the system was running, but I ran into a hang when I shut the system down after my test. I uploaded the dump of that hang to our Box account, and Frank Heimes downloaded it to forward it to the kernel team. Before we close this item, I'd like to have the kernel team's statement on whether they see a relation to blk_mq() in the dump or not!

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-06-08 10:28 EDT---
Yesterday I shut down the test system I used for the repros here, and during the shutdown process the system suddenly ran into a hang again. I don't know whether that hang is related to the problem addressed here (blk_mq()), but as a precaution I took a dump and uploaded it to Box:
mclint_20170607_kernel_4_11_0-041100rc8_without_openafs.dump.bz2

General dump info:
Dump format: s390mv
Version: 5
Dump created...: Wed, 07 Jun 2017 16:56:28 +0200
Dump ended.: Wed, 07 Jun 2017 16:57:06 +0200
Dump CPU ID: 9efc729648000
UTS node name..: mclint
UTS kernel release.: 4.11.0-041100rc8-generic
UTS kernel version.: #201704232131 SMP Mon Apr 24 02:10:15 UTC 2017
Build arch.: s390x (64 bit)
System arch: s390x (64 bit)
CPU count (online).: 2
CPU count (real)...: 4
Dump memory range..: 8192 MB
Real memory range..: 8192 MB

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-06-07 07:59 EDT---
Hello Benjamin, you're right, so let me rephrase in more detail: it is the multiqueue feature that is in question here, but it affects multipathing, because Ubuntu Xenial boots with the multiqueue feature turned on by default(!!). You can run multipathing, but you have to turn off the multiqueue feature explicitly if you don't want to risk running into the hang scenario described here. My recommendation would be to turn the multiqueue feature off by default for all kernel versions prior to 4.11. That's what I wanted to express with my previous post.

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
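The 2017-06-07 comment recommends turning the multiqueue feature off explicitly but does not say how. As a rough sketch only: kernels of this generation expose a use_blk_mq parameter for the scsi_mod and dm_mod modules. These parameter names and sysfs paths do not appear anywhere in this thread and are an assumption here, as are the function names. The code reports the current settings and prints the corresponding kernel command line arguments.

```python
from pathlib import Path

# Modules assumed (not stated in this thread) to carry a use_blk_mq parameter.
MODULES = ("scsi_mod", "dm_mod")


def blk_mq_state(sysfs_root="/sys"):
    """Return {module: raw parameter value, or None if it cannot be read}."""
    state = {}
    for module in MODULES:
        path = Path(sysfs_root) / "module" / module / "parameters" / "use_blk_mq"
        try:
            state[module] = path.read_text().strip()
        except OSError:
            # Module not loaded, or this kernel has no such parameter.
            state[module] = None
    return state


def boot_args_to_disable():
    """Kernel command line arguments that would switch the assumed parameters off."""
    return " ".join("%s.use_blk_mq=0" % module for module in MODULES)


if __name__ == "__main__":
    for module, value in blk_mq_state().items():
        print("%s.use_blk_mq = %s" % (module, value))
    print("to disable at boot:", boot_args_to_disable())
```

The sysfs root is a parameter so the check can be pointed at a copied directory tree instead of the live system. Whether disabling these parameters actually avoids the hang discussed here is not confirmed by this sketch.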
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From bbl...@de.ibm.com 2017-06-06 12:46 EDT---
@jac...@de.ibm.com Just to prevent confusion: it's probably just a typo, but I think you mean multi-queue (blk-mq and scsi-mq). That's a different feature. Multipathing (dm-multipath) should certainly work regardless.

Status in Ubuntu on IBM z Systems: Triaged
Status in linux package in Ubuntu: Triaged
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-06-06 12:31 EDT--- @jsalisbury Sorry for the late late late answer, but finally I've found time and resources to test the 4.11-rc8 Kernel. And yes, this one looks promising :-) I was able to build OpenAFS with the pbuilder environment and also trying to build the firefox packages with pbuilder did not drive me into the hang scenario I usually face. I'll do some more tests but this looks promising. Hmm, if the solution for this bug is Kernel 4.11 than we may have to speak out a warning to whoever is utilizing multipathing on Kernels with an earlier version. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1670634 Title: blk-mq: possible deadlock on CPU hot(un)plug Status in Ubuntu on IBM z Systems: New Status in linux package in Ubuntu: Triaged Bug description: == Comment: #0 - Carsten Jacobi- 2017-03-07 03:35:31 == I'm evaluating Ubuntu-Xenial on z for development purposes, the test system is installed in an LPAR with one FCP-LUN which is accessable by 4 pathes (all pathes are configured). The system hangs regularly when I make packages with "pdebuild" using the pbuilder packaging suit. The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that): With the default settings and under a certain workload, blk_mq seems to get into a presumed "deadlock". Possibly this happens on CPU hot(un)plug. After the I/O stalled, a dump was pulled manually. The following information is from the crash dump pre-analysis. 
$ zgetdump -i dump.0 General dump info: Dump format: elf Version: 1 UTS node name..: mclint UTS kernel release.: 4.4.0-65-generic UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 System arch: s390x (64 bit) CPU count (online).: 2 Dump memory range..: 8192 MB Memory map: - 0001b831afff (7043 MB) 0001b831b000 - 0001 (1149 MB) Things look similarly with HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1. KERNEL: vmlinux.full DUMPFILE: dump.0 CPUS: 2 DATE: Fri Mar 3 14:31:07 2017 UPTIME: 02:11:20 LOAD AVERAGE: 13.00, 12.92, 11.37 TASKS: 411 NODENAME: mclint RELEASE: 4.4.0-65-generic VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 MACHINE: s390x (unknown Mhz) MEMORY: 7.8 GB PANIC: "" PID: 0 COMMAND: "swapper/0" TASK: bad528 (1 of 2) [THREAD_INFO: b78000] CPU: 0 STATE: TASK_RUNNING (ACTIVE) INFO: no panic task found crash> dev -d MAJOR GENDISKNAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV ... 8 1e1d6d800 sda1e1d51210 0 23151 4294944145 N/A(MQ) 8 1e4e06800 sdc2081b180 23148 4294944148 N/A(MQ) 8 1f07800sdb20c75680 23195 4294944101 N/A(MQ) 8 1e4e06000 sdd1e4e31210 0 23099 4294944197 N/A(MQ) 252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ) ... So both dm-mpath and sd have requests pending in their block multiqueue. The large numbers of sd look strange and seem to be the unsigned formatting of the values shown for async multiplied by -1. [0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49) [0.798262] setup: Linux is running natively in 64-bit mode [0.798290] setup: Max memory size: 8192MB [0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB) [0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0 [ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds. 
[ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
[ 5281.179444] 0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8 0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70 0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20 0001ea805c00 0001e15795f0 0001ea805c00 007d3978 007bc9f8 0001e6f9b9d8 0001e6f9ba40
[ 5281.179454] Call Trace:
[ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
[ 5281.179462]
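The bug description's remark about the SYNC column in the `dev -d` output can be checked with a little arithmetic. The sketch below is only an illustration, not part of the original analysis: it assumes the counters are 32 bits wide (an assumption, the crash output does not say so) and reinterprets each SYNC value as a signed number. The helper name `as_signed32` is made up for this example; the numbers are copied from the table in the description.

```python
# ASYNC and SYNC values copied from the `crash> dev -d` output in the bug description.
ASYNC_SYNC = {
    "sda": (23151, 4294944145),
    "sdc": (23148, 4294944148),
    "sdb": (23195, 4294944101),
    "sdd": (23099, 4294944197),
}


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer.

    Hypothetical helper for illustration; the 32-bit width is an assumption.
    """
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


for disk, (async_count, sync_count) in ASYNC_SYNC.items():
    signed_sync = as_signed32(sync_count)
    # If the observation in the description holds, signed_sync == -async_count.
    print(f"{disk}: async={async_count} sync={sync_count} signed_sync={signed_sync}")
```

Under that assumption every SYNC value comes out as exactly the negated ASYNC value, which matches the reporter's reading that the SYNC counter went negative and was printed as unsigned.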
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-04-06 11:53 EDT---
Same procedure as every kernel.

/var/log/syslog:
Apr 6 16:17:29 mclint multipathd[881]: mpatha: sda - tur checker timed out
Apr 6 16:17:29 mclint multipathd[881]: 8:0: reinstated
Apr 6 16:17:29 mclint multipathd[881]: mpatha: sdb - tur checker timed out
Apr 6 16:17:29 mclint multipathd[881]: 8:16: reinstated
Apr 6 16:17:29 mclint multipathd[881]: mpatha: sdd - tur checker timed out
Apr 6 16:17:29 mclint multipathd[881]: 8:48: reinstated
Apr 6 16:17:29 mclint multipathd[881]: mpatha: sdc - tur checker timed out
Apr 6 16:17:29 mclint multipathd[881]: 8:32: reinstated

/dev/sclp_line0:
? 361.418628! INFO: task kworker/1:4:860 blocked for more than 120 seconds.
? 361.418635! Not tainted 4.4.0-72-generic #93-Ubuntu
? 361.418637! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 361.418722! INFO: task cpuplugd:2310 blocked for more than 120 seconds.
? 361.418723! Not tainted 4.4.0-72-generic #93-Ubuntu
? 361.418723! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 361.418766! INFO: task irqbalance:2416 blocked for more than 120 seconds.
? 361.418767! Not tainted 4.4.0-72-generic #93-Ubuntu
? 361.418768! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 361.418806! INFO: task kworker/0:2H:3403 blocked for more than 120 seconds.
? 361.418807! Not tainted 4.4.0-72-generic #93-Ubuntu
? 361.418808! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 361.418990! INFO: task kworker/0:9:4449 blocked for more than 120 seconds.
? 361.418991! Not tainted 4.4.0-72-generic #93-Ubuntu
? 361.418992! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.420013! INFO: task kworker/1:4:860 blocked for more than 120 seconds.
? 481.420020! Not tainted 4.4.0-72-generic #93-Ubuntu
? 481.420021! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.420091! INFO: task systemd-timesyn:1766 blocked for more than 120 seconds.
? 481.420093! Not tainted 4.4.0-72-generic #93-Ubuntu
? 481.420093! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.420136! INFO: task rs:main Q:Reg:2023 blocked for more than 120 seconds.
? 481.420137! Not tainted 4.4.0-72-generic #93-Ubuntu
? 481.420138! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.420250! INFO: task cpuplugd:2310 blocked for more than 120 seconds.
? 481.420251! Not tainted 4.4.0-72-generic #93-Ubuntu
? 481.420252! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.420291! INFO: task irqbalance:2416 blocked for more than 120 seconds.
? 481.420292! Not tainted 4.4.0-72-generic #93-Ubuntu
? 481.420293! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Dump:
KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-72-generic
DUMPFILE: mclint_20170406_kernel_4_4_0-72_without_openafs.dump
CPUS: 1
DATE: Thu Apr 6 16:43:18 2017
UPTIME: 00:29:55
LOAD AVERAGE: 9.99, 9.56, 7.51
TASKS: 403
NODENAME: mclint
RELEASE: 4.4.0-72-generic
VERSION: #93-Ubuntu SMP Fri Mar 31 14:06:48 UTC 2017
MACHINE: s390x (unknown Mhz)
MEMORY: 7.8 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: bad528 [THREAD_INFO: b78000]
CPU: 0
STATE: TASK_RUNNING
INFO: no panic task found

The bz2-compressed dump is already uploaded to Box.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-30 08:50 EDT---
New kernel, new hang ...

? 961.242228! INFO: task kworker/1:1:38 blocked for more than 120 seconds.
? 961.242235! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.242236! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.242480! INFO: task xfsaild/dm-11:1742 blocked for more than 120 seconds.
? 961.242481! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.242482! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.242933! INFO: task rs:main Q:Reg:2043 blocked for more than 120 seconds.
? 961.242934! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.242935! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.243407! INFO: task cpuplugd:2355 blocked for more than 120 seconds.
? 961.243409! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.243410! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.243544! INFO: task irqbalance:2447 blocked for more than 120 seconds.
? 961.243546! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.243546! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.243617! INFO: task kworker/0:2H:3385 blocked for more than 120 seconds.
? 961.243618! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.243619! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.243911! INFO: task kworker/0:1H:6035 blocked for more than 120 seconds.
? 961.243912! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.243913! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.244405! INFO: task kworker/0:2:22440 blocked for more than 120 seconds.
? 961.244406! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.244407! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.244543! INFO: task kworker/0:4:22938 blocked for more than 120 seconds.
? 961.244545! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.244545! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 961.245404! INFO: task dpkg:24617 blocked for more than 120 seconds.
? 961.245405! Not tainted 4.4.0-71-generic #92-Ubuntu
? 961.245406! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

And the dump:
KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-71-generic
DUMPFILE: mclint_20170330_kernel_4_4_0-71_without_openafs.dump
CPUS: 1
DATE: Thu Mar 30 12:14:27 2017
UPTIME: 00:24:42
LOAD AVERAGE: 14.32, 12.51, 7.23
TASKS: 407
NODENAME: mclint
RELEASE: 4.4.0-71-generic
VERSION: #92-Ubuntu SMP Fri Mar 24 13:03:47 UTC 2017
MACHINE: s390x (unknown Mhz)
MEMORY: 7.8 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: bad528 [THREAD_INFO: b78000]
CPU: 0
STATE: TASK_RUNNING
INFO: no panic task found

The dump is already uploaded to Box.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
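The console captures in these comments repeat the same hung-task warning for many tasks, and it helps to see which tasks show up in every hang (cpuplugd and irqbalance appear in each capture). A minimal sketch of how such a capture could be tallied, assuming the text has been saved to a string; the function name `tally_hung_tasks` and the sample are illustrative only, with three lines copied from the 4.4.0-71 console output:

```python
import re
from collections import Counter

# Matches the kernel's hung-task warning. The task name may itself contain
# colons (e.g. "kworker/1:1" or "rs:main Q:Reg"), so the name is matched
# non-greedily and the PID is the last ":<digits>" before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def tally_hung_tasks(log_text: str) -> Counter:
    """Count hung-task warnings per task name (hypothetical helper)."""
    counts = Counter()
    for match in HUNG_TASK_RE.finditer(log_text):
        counts[match.group("comm")] += 1
    return counts


SAMPLE = """\
? 961.242228! INFO: task kworker/1:1:38 blocked for more than 120 seconds.
? 961.242933! INFO: task rs:main Q:Reg:2043 blocked for more than 120 seconds.
? 961.243407! INFO: task cpuplugd:2355 blocked for more than 120 seconds.
"""

for name, count in sorted(tally_hung_tasks(SAMPLE).items()):
    print(f"{name}: {count}")
```

Running the same tally over each kernel version's console log makes it easier to compare the hangs across 4.4.0-66 through 4.4.0-72.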
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-28 12:23 EDT---
Just tried kernel 4.4.0-70.

/var/log/syslog:
Mar 28 18:07:46 mclint multipathd[888]: mpatha: sda - tur checker timed out
Mar 28 18:07:46 mclint multipathd[888]: 8:0: reinstated
Mar 28 18:07:46 mclint multipathd[888]: mpatha: sdb - tur checker timed out
Mar 28 18:07:46 mclint multipathd[888]: 8:16: reinstated
Mar 28 18:07:46 mclint multipathd[888]: mpatha: sdc - tur checker timed out
Mar 28 18:07:46 mclint multipathd[888]: 8:32: reinstated
Mar 28 18:07:46 mclint multipathd[888]: mpatha: sdd - tur checker timed out
Mar 28 18:07:46 mclint multipathd[888]: 8:48: reinstated

/dev/sclp_line0:
? 459.779353! BTRFS error (device dm-1): bdev /dev/dm-1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
? 459.779503! BTRFS error (device dm-1): bdev /dev/dm-1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
? 481.287452! INFO: task xfsaild/dm-11:1727 blocked for more than 120 seconds.
? 481.287459! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287461! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.287647! INFO: task cpuplugd:2402 blocked for more than 120 seconds.
? 481.287648! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287649! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.287696! INFO: task irqbalance:2508 blocked for more than 120 seconds.
? 481.287697! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287698! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.287740! INFO: task kworker/0:19:22353 blocked for more than 120 seconds.
? 481.287741! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287742! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.287769! INFO: task kworker/0:21:22355 blocked for more than 120 seconds.
? 481.287770! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287771! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 481.287875! INFO: task tar:22484 blocked for more than 120 seconds.
? 481.287876! Not tainted 4.4.0-70-generic #91-Ubuntu
? 481.287877! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 601.288111! INFO: task systemd:1 blocked for more than 120 seconds.
? 601.288118! Not tainted 4.4.0-70-generic #91-Ubuntu
? 601.288119! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 601.288207! INFO: task xfsaild/dm-11:1727 blocked for more than 120 seconds.
? 601.288208! Not tainted 4.4.0-70-generic #91-Ubuntu
? 601.288209! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 601.288372! INFO: task rs:main Q:Reg:2002 blocked for more than 120 seconds.
? 601.288374! Not tainted 4.4.0-70-generic #91-Ubuntu
? 601.288374! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 601.288496! INFO: task cpuplugd:2402 blocked for more than 120 seconds.
? 601.288497! Not tainted 4.4.0-70-generic #91-Ubuntu
? 601.288497! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

crashdump-info:
KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-70-generic
DUMPFILE: mclint_20170328_kernel_4_4_0-70_without_openafs.dump
CPUS: 3
DATE: Tue Mar 28 18:15:25 2017
UPTIME: 00:14:23
LOAD AVERAGE: 13.22, 10.22, 5.28
TASKS: 408
NODENAME: mclint
RELEASE: 4.4.0-70-generic
VERSION: #91-Ubuntu SMP Wed Mar 22 12:48:02 UTC 2017
MACHINE: s390x (unknown Mhz)
MEMORY: 7.8 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: bad528 (1 of 3) [THREAD_INFO: b78000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
INFO: no panic task found

I will upload the compressed dump to Box soon ...

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-14 12:07 EDT---
I was now able to run into the hang without OpenAFS as well.

/var/log/syslog:
Mar 14 15:10:46 mclint multipathd[887]: mpatha: sda - tur checker timed out
Mar 14 15:10:46 mclint multipathd[887]: 8:0: reinstated
Mar 14 15:10:46 mclint multipathd[887]: mpatha: sdb - tur checker timed out
Mar 14 15:10:46 mclint multipathd[887]: 8:16: reinstated
Mar 14 15:10:46 mclint multipathd[887]: mpatha: sdc - tur checker timed out
Mar 14 15:10:46 mclint multipathd[887]: 8:32: reinstated
Mar 14 15:10:46 mclint multipathd[887]: mpatha: sdd - tur checker timed out
Mar 14 15:10:46 mclint multipathd[887]: 8:48: reinstated

On the sclp_line console:
? 9841.149452! INFO: task btrfs-transacti:634 blocked for more than 120 seconds.
? 9841.149459! Not tainted 4.4.0-67-generic #88
? 9841.149461! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9841.149627! INFO: task cpuplugd:2409 blocked for more than 120 seconds.
? 9841.149628! Not tainted 4.4.0-67-generic #88
? 9841.149629! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9841.149674! INFO: task irqbalance:2515 blocked for more than 120 seconds.
? 9841.149675! Not tainted 4.4.0-67-generic #88
? 9841.149676! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9841.149715! INFO: task kworker/0:2:3661 blocked for more than 120 seconds.
? 9841.149716! Not tainted 4.4.0-67-generic #88
? 9841.149717! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9841.149752! INFO: task tar:16648 blocked for more than 120 seconds.
? 9841.149753! Not tainted 4.4.0-67-generic #88
? 9841.149754! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9961.149482! INFO: task btrfs-transacti:634 blocked for more than 120 seconds.
? 9961.149489! Not tainted 4.4.0-67-generic #88
? 9961.149490! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9961.149640! INFO: task rs:main Q:Reg:1995 blocked for more than 120 seconds.
? 9961.149642! Not tainted 4.4.0-67-generic #88
? 9961.149642! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9961.149727! INFO: task cpuplugd:2409 blocked for more than 120 seconds.
? 9961.149729! Not tainted 4.4.0-67-generic #88
? 9961.149729! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9961.149772! INFO: task irqbalance:2515 blocked for more than 120 seconds.
? 9961.149773! Not tainted 4.4.0-67-generic #88
? 9961.149773! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
? 9961.149811! INFO: task kworker/0:2:3661 blocked for more than 120 seconds.
? 9961.149812! Not tainted 4.4.0-67-generic #88
? 9961.149812! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I just made a new dump:
General dump info:
  Dump format: s390mv
  Version: 5
  Dump created...: Tue, 14 Mar 2017 15:45:29 +0100
  Dump ended.: Tue, 14 Mar 2017 15:46:07 +0100
  Dump CPU ID: 9efc729648000
  UTS node name..: mclint
  UTS kernel release.: 4.4.0-67-generic
  UTS kernel version.: #88 SMP Wed Mar 8 14:48:51 UTC 2017
  Build arch.: s390x (64 bit)
  System arch: s390x (64 bit)
  CPU count (online).: 3
  CPU count (real)...: 4
  Dump memory range..: 8192 MB
  Real memory range..: 8192 MB

The dump is currently being uploaded to the Box folder -> mclint_20170314_kernel_4_4_0-67_without_openafs.dump.bz2

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-13 09:57 EDT---
Next iteration: I was able to fix the OpenAFS build for the proposed kernel, so I could start the next pdebuild job, and again I ran into a hang scenario:

Mar 13 14:30:32 mclint multipathd[881]: mpatha: sda - tur checker timed out
Mar 13 14:30:32 mclint multipathd[881]: 8:0: reinstated
Mar 13 14:30:32 mclint multipathd[881]: mpatha: sdc - tur checker timed out
Mar 13 14:30:32 mclint multipathd[881]: 8:32: reinstated
Mar 13 14:30:33 mclint multipathd[881]: mpatha: sdb - tur checker timed out
Mar 13 14:30:33 mclint multipathd[881]: 8:16: reinstated
Mar 13 14:30:36 mclint multipathd[881]: mpatha: sdd - tur checker timed out
Mar 13 14:30:36 mclint rsyslogd-2007: action 'action 10' suspended, next retry is Mon Mar 13 14:32:06 2017 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
Mar 13 14:30:36 mclint multipathd[881]: 8:48: reinstated

I just pulled a dump and am compressing it; I'll upload it to Box eventually. I'll go on and try to reproduce this without OpenAFS. I'm very confident that this hang is not related to AFS at all ...

General dump info:
  Dump format: s390mv
  Version: 5
  Dump created...: Mon, 13 Mar 2017 14:42:39 +0100
  Dump ended.: Mon, 13 Mar 2017 14:43:17 +0100
  Dump CPU ID: 9efc729648000
  UTS node name..: mclint
  UTS kernel release.: 4.4.0-67-generic
  UTS kernel version.: #88 SMP Wed Mar 8 14:48:51 UTC 2017
  Build arch.: s390x (64 bit)
  System arch: s390x (64 bit)
  CPU count (online).: 2
  CPU count (real)...: 4
  Dump memory range..: 8192 MB
  Real memory range..: 8192 MB
Memory map:
  - 0001b831afff (7043 MB)
  0001b831b000 - 0001 (1149 MB)
Dump device info:
  Volume 0: 0.0.8409 (online/active)
  Volume 1: 0.0.840a (online/active)

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-09 11:28 EDT---
OK, I just tested the kernel from http://kernel.ubuntu.com/~rtg/lp1670634/ and so far this looks good! I was able to build the OpenAFS Ubuntu package three times with pdebuild without running into a hang, and this was a very good candidate for running into the hang scenario.

I'd like to activate OpenAFS on that system so that I can again run jobs as users other than root and build more packages for Ubuntu. At the moment there is only one subtle obstacle:

root@mclint:/var/lib/dkms/openafs/1.6.20.1/build# file /usr/src/linux-headers-4.4.0-67-generic/scripts/basic/fixdep
/usr/src/linux-headers-4.4.0-67-generic/scripts/basic/fixdep: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4f746ae15cb57aa0f264c965a8061844f5f21fa2, not stripped
root@mclint:/var/lib/dkms/openafs/1.6.20.1/build# dpkg -S /usr/src/linux-headers-4.4.0-67-generic/scripts/basic/fixdep
linux-headers-4.4.0-67-generic: /usr/src/linux-headers-4.4.0-67-generic/scripts/basic/fixdep

Somehow the x86_64 version of "fixdep" ended up in the linux-headers package in the ~rtg/lp1670634 folder. I checked the same file in the other linux-headers packages on my system, and there it is present and built for the s390x architecture. I'm a little puzzled here: as the linux-headers package is an "_all.deb", I thought those packages should be free of architecture-specific binaries. I'll try to replace the fixdep program and carry on ...

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634
Title: blk-mq: possible deadlock on CPU hot(un)plug
Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New
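The `file` check in the 2017-03-09 comment can also be done by reading the ELF header directly, which is handy when many helper binaries in a headers tree need checking. This is only a sketch: the function names are made up for illustration, and only the two machine types relevant here are mapped (22 is EM_S390 and 62 is EM_X86_64 in the ELF specification).

```python
import struct

# e_machine values from the ELF specification; only the two that matter here.
EM_NAMES = {22: "s390", 62: "x86-64"}


def elf_machine(header: bytes) -> str:
    """Return the machine type named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) is 1 for little-endian, 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def elf_machine_of_file(path: str) -> str:
    """Convenience wrapper (hypothetical): read the header from a file on disk."""
    with open(path, "rb") as handle:
        return elf_machine(handle.read(20))
```

On the affected system, `elf_machine_of_file("/usr/src/linux-headers-4.4.0-67-generic/scripts/basic/fixdep")` would be expected to report x86-64 for the test package, in line with the `file` output quoted in the comment.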
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From jac...@de.ibm.com 2017-03-08 05:56 EDT--- Just tried the newest Kernel 4.4.0-66, and I'm still running into the hang. Here the final statements in /var/log/syslog (the lines, that never make it out onto the disk): Mar 8 11:26:31 mclint multipathd[955]: mpatha: sdb - tur checker timed out Mar 8 11:26:31 mclint multipathd[955]: 8:16: reinstated Mar 8 11:26:31 mclint multipathd[955]: mpatha: sdd - tur checker timed out Mar 8 11:26:31 mclint rsyslogd-2007: action 'action 10' suspended, next retry is Wed Mar 8 11:27:01 2017 [v8.16.0 try http://www.rsyslog.com/e/2007 ] Mar 8 11:26:31 mclint multipathd[955]: 8:48: reinstated Mar 8 11:26:31 mclint multipathd[955]: mpatha: sdc - tur checker timed out Mar 8 11:26:31 mclint multipathd[955]: 8:32: reinstated Mar 8 11:26:32 mclint multipathd[955]: mpatha: sda - tur checker timed out Mar 8 11:26:32 mclint multipathd[955]: 8:0: reinstated And this here shows up on the sclp_line console: ? 961.419327! INFO: task cpuplugd:2604 blocked for more than 120 seconds. ? 961.419337! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419338! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419404! INFO: task irqbalance:2651 blocked for more than 120 seconds. ? 961.419406! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419407! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419450! INFO: task kworker/0:4:3801 blocked for more than 120 seconds. ? 961.419451! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419452! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419494! INFO: task kworker/1:1:4548 blocked for more than 120 seconds. ? 961.419495! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419496! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419539! INFO: task kworker/0:0H:20302 blocked for more than 120 seconds. ? 961.419540! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419541! 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419764! INFO: task kworker/0:0:66641 blocked for more than 120 seconds. ? 961.419766! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419767! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 961.419895! INFO: task rm:81710 blocked for more than 120 seconds. ? 961.419896! Not tainted 4.4.0-66-generic #87-Ubuntu ? 961.419897! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 1081.419024! INFO: task systemd:1 blocked for more than 120 seconds. ? 1081.419033! Not tainted 4.4.0-66-generic #87-Ubuntu ? 1081.419035! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 1081.419148! INFO: task cpuplugd:2604 blocked for more than 120 seconds. ? 1081.419150! Not tainted 4.4.0-66-generic #87-Ubuntu ? 1081.419151! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ? 1081.419186! INFO: task irqbalance:2651 blocked for more than 120 seconds. ? 1081.419187! Not tainted 4.4.0-66-generic #87-Ubuntu ? 1081.419188! "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. I pulled a DASD-Dump from the system: KERNEL: /usr/lib/debug/boot/vmlinux-4.4.0-66-generic DUMPFILE: mclint_20170308_kernel_4_4_0-66_without_openafs.dump CPUS: 3 DATE: Wed Mar 8 11:37:56 2017 UPTIME: 00:25:30 LOAD AVERAGE: 12.99, 11.25, 6.55 TASKS: 422 NODENAME: mclint RELEASE: 4.4.0-66-generic VERSION: #87-Ubuntu SMP Fri Mar 3 15:32:53 UTC 2017 MACHINE: s390x (unknown Mhz) MEMORY: 7.8 GB PANIC: "" PID: 0 COMMAND: "swapper/0" TASK: bb1538 (1 of 3) [THREAD_INFO: b7c000] CPU: 0 STATE: TASK_RUNNING (ACTIVE) INFO: no panic task found And again I see 10 multipath-Daemons in the process list, this is my typical hang scenario. 
crash> ps | grep multipathd
955 1 0 1e49115f0 IN 0.1 335364 8316 multipathd
971 1 0 7e8b2be0 IN 0.1 335364 8316 multipathd
972 1 0 7e8b6db0 IN 0.1 335364 8316 multipathd
977 1 1 7e8b36d8 IN 0.1 335364 8316 multipathd
978 1 0 7e8b62b8 IN 0.1 335364 8316 multipathd
979 1 2 7e8b4cc8 IN 0.1 335364 8316 multipathd
81714 1 1 7cdc8000 IN 0.1 335364 8316 multipathd
81715 1 1 7cdc95f0 IN 0.1 335364 8316 multipathd
81716 1 1 7cdcc1d0 IN 0.1 335364 8316 multipathd
81717 1 1 1e6c595f0 IN 0.1 335364 8316 multipathd

I'll compress the dump and try to find ways to make it available to you ...

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634

Title: blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New

Bug
[Kernel-packages] [Bug 1670634] Comment bridged from LTC Bugzilla
--- Comment From heinz-werner_se...@de.ibm.com 2017-03-07 11:44 EDT---
Please provide the debug info to this IBM Box folder: https://ibm.box.com/s/y10o4u7bcvc6nk7rgk2gfmk039ii5d1i
After debugging, this folder will be deleted.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1670634

Title: blk-mq: possible deadlock on CPU hot(un)plug

Status in Ubuntu on IBM z Systems: New
Status in linux package in Ubuntu: New

Bug description:
== Comment: #0 - Carsten Jacobi- 2017-03-07 03:35:31 ==
I'm evaluating Ubuntu-Xenial on z for development purposes. The test system is installed in an LPAR with one FCP-LUN which is accessible by 4 paths (all paths are configured). The system hangs regularly when I make packages with "pdebuild" using the pbuilder packaging suite.

The local Linux development team helped me out with a pre-analysis that I can post here (thanks a lot for that):

With the default settings and under a certain workload, blk_mq seems to get into a presumed "deadlock". Possibly this happens on CPU hot(un)plug.

After the I/O stalled, a dump was pulled manually. The following information is from the crash dump pre-analysis.

$ zgetdump -i dump.0
General dump info:
Dump format: elf
Version: 1
UTS node name..: mclint
UTS kernel release.: 4.4.0-65-generic
UTS kernel version.: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
System arch: s390x (64 bit)
CPU count (online).: 2
Dump memory range..: 8192 MB
Memory map:
- 0001b831afff (7043 MB)
0001b831b000 - 0001 (1149 MB)

Things look similar with the HWE kernel ubuntu16.04-4.8.0-34.36~16.04.1.
KERNEL: vmlinux.full
DUMPFILE: dump.0
CPUS: 2
DATE: Fri Mar 3 14:31:07 2017
UPTIME: 02:11:20
LOAD AVERAGE: 13.00, 12.92, 11.37
TASKS: 411
NODENAME: mclint
RELEASE: 4.4.0-65-generic
VERSION: #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017
MACHINE: s390x (unknown Mhz)
MEMORY: 7.8 GB
PANIC: ""
PID: 0
COMMAND: "swapper/0"
TASK: bad528 (1 of 2) [THREAD_INFO: b78000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
INFO: no panic task found

crash> dev -d
MAJOR GENDISKNAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
...
8 1e1d6d800 sda1e1d51210 0 23151 4294944145 N/A(MQ)
8 1e4e06800 sdc2081b180 23148 4294944148 N/A(MQ)
8 1f07800sdb20c75680 23195 4294944101 N/A(MQ)
8 1e4e06000 sdd1e4e31210 0 23099 4294944197 N/A(MQ)
252 1e1d6c800 dm-0 1e1d51b18 9 1 8 N/A(MQ)
...

So both dm-mpath and sd have requests pending in their block multiqueue. The large numbers of sd look strange and seem to be the unsigned formatting of the values shown for async multiplied by -1.

[0.798256] Linux version 4.4.0-65-generic (buildd@z13-011) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #86-Ubuntu SMP Thu Feb 23 17:54:37 UTC 2017 (Ubuntu 4.4.0-65.86-generic 4.4.49)
[0.798262] setup: Linux is running natively in 64-bit mode
[0.798290] setup: Max memory size: 8192MB
[0.798298] setup: Reserving 196MB of memory at 7996MB for crashkernel (System RAM: 7996MB)
[0.836923] Kernel command line: root=/dev/mapper/mclint_vg-root rootflags=subvol=@ crashkernel=196M BOOT_IMAGE=0

[ 5281.179428] INFO: task xfsaild/dm-11:1604 blocked for more than 120 seconds.
[ 5281.179437] Not tainted 4.4.0-65-generic #86-Ubuntu
[ 5281.179438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5281.179440] xfsaild/dm-11 D 007bcf52 0 1604 2 0x
[ 5281.179444] 0001e931c230 001a6964 0001e6f9b958 0001e6f9b9d8 0001e15795f0 0001e6f9b988 00ce8c00 0001ea805c70 0001ea805c00 00ba5ed0 0001e931c1d0 0001e1579b20 0001ea805c00 0001e15795f0 0001ea805c00 007d3978 007bc9f8 0001e6f9b9d8 0001e6f9ba40
[ 5281.179454] Call Trace:
[ 5281.179461] ([<007bc9f8>] __schedule+0x300/0x810)
[ 5281.179462] [<007bcf52>] schedule+0x4a/0xb0
[ 5281.179465] [<007c02aa>] schedule_timeout+0x232/0x2a8
[ 5281.179466] [<007bde50>] wait_for_common+0x110/0x1c8
[ 5281.179472] [<0017b602>] flush_work+0x42/0x58
[ 5281.179564] [<03ff805e14ba>] xlog_cil_force_lsn+0x7a/0x238 [xfs]
[ 5281.179589] [<03ff805dee82>] _xfs_log_force+0x9a/0x2e8 [xfs]
[