[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
Hello Max,
glad to read that. That's what I hoped, after the significant patch set of LP 1887124 landed.
I'm closing this bug on our side, too. Thx

** Changed in: linux (Ubuntu)
       Status: New => Fix Released

** Changed in: ubuntu-z-systems
       Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1881109

Title:
  [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released

Bug description:
  We can reproduce a crash in the block layer with lots of stress on lots of SCSI disks (on an XIV storage server).
  We seem to have several scsi stalls in the logs/errors (needs to be analyzed further) but in the end we do crash with this call trace.

  [20832.901147] Failing address: 7fe00dea8000 TEID: 7fe00dea8403
  [20832.901159] Fault in home space mode while using kernel ASCE.
  [20832.901171] AS:01d3cccf400b R2:03fd0020800b R3:03fd0020c007 S:03fc1cc78800 P:0400
  [20832.901264] Oops: 0011 ilc:2 [#1] SMP
  [20832.901280] Modules linked in: vhost_net vhost macvtap macvlan tap xfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge aufs overlay dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio 8021q garp mrp stp llc sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_mirror dm_region_hash dm_log qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup
  [20832.901516] CPU: 29 PID: 389709 Comm: CPU 0/KVM Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
  [20832.901530] Hardware name: IBM 8561 T01 708 (LPAR)
  [20832.901542] Krnl PSW : 0404e0018000 01d3cbd559be (try_to_wake_up+0x4e/0x700)
  [20832.901575]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  [20832.901744] Krnl GPRS: 03fc917cd988 7fe0 7fe0001e 0003
  [20832.901750]            004c 040003fd005a4600 0003
  [20832.901753]            0003 7fe00dea8454 7fe00dea7b00
  [20832.901754]            03f44bd9b300 01d3cc587088 7fe0021c7ae0 7fe0021c7a60
  [20832.901767] Krnl Code: 01d3cbd559b2: 41902954 la %r9,2388(%r2)
                            01d3cbd559b6: 582003ac l %r2,940
                           #01d3cbd559ba: a718 lhi %r1,0
                           >01d3cbd559be: ba129000 cs %r1,%r2,0(%r9)
                            01d3cbd559c2: a77401c9 brc 7,01d3cbd55d54
                            01d3cbd559c6: e310b0080004 lg %r1,8(%r11)
                            01d3cbd559cc: b9800018 ngr %r1,%r8
                            01d3cbd559d0: a774001f brc 7,01d3cbd55a0e
  [20832.901784] Call Trace:
  [20832.901816] ([<01d3cc57e0ac>] cleanup_critical+0x0/0x474)
  [20832.901823] [<01d3cc1d16ba>] rq_qos_wake_function+0x8a/0xa0
  [20832.901827] [<01d3cbd74bde>] __wake_up_common+0x9e/0x1b0
  [20832.901829] [<01d3cbd750e4>] __wake_up_common_lock+0x94/0xe0
  [20832.901830] [<01d3cbd7515a>] __wake_up+0x2a/0x40
  [20832.901835] [<01d3cc1e8640>] wbt_done+0x90/0xe0
  [20832.901837] [<01d3cc1d17be>] __rq_qos_done+0x3e/0x60
  [20832.901841] [<01d3cc1bd5b0>] blk_mq_free_request+0xe0/0x140
  [20832.901848] [<01d3cc35fc60>] dm_softirq_done+0x140/0x230
  [20832.901849] [<01d3cc1bbfbc>] blk_done_softirq+0xbc/0xe0
  [20832.901850] [<01d3cc57e710>] __do_softirq+0x100/0x360
  [20832.901853] [<01d3cbd2525e>] irq_exit+0x9e/0xc0
  [20832.901856] [<01d3cbcb0b18>] do_IRQ+0x78/0xb0
  [20832.901859] [<01d3cc57dc28>] ext_int_handler+0x128/0x12c
  [20832.901860] [<01d3cc57d306>] sie_exit+0x0/0x46
  [20832.901866] ([<01d3cbce944a>] __vcpu_run+0x27a/0xc30)
  [20832.901870] [<01d3cbcf29a8>] kvm_arch_vcpu_ioctl_run+0x2d8/0x840
  [20832.901875] [<01d3cbcdd242>] kvm_vcpu_ioctl+0x282/0x770
  [20832.901880] [<01d3cbf85f66>] do_vfs_ioctl+0x376/0x690
  [20832.901881] [<01d3cbf86304>] ksys_ioctl+0x84/0xb0
  [20832.901883] [<01d3cbf8639a>] __s390x_sys_ioctl+0x2a/0x40
  [20832.901885] [<01d3cc57d5f2>] system_call+0x2a6/0x2c8
  [20832.901885] Last Breaking-Event-Address:
  [20832.901889] [<01d3cbd5607e>]
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
** Changed in: ubuntu-z-systems
       Status: New => Incomplete
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
In other words, it's reasonable to retry on the latest Ubuntu 20.04 kernel (after sudo apt update && sudo apt full-upgrade and a reboot).
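
The retest suggested above could be scripted roughly as follows. This is only a sketch: the helper name `is_newer_kernel` is made up for illustration, the comparison relies on GNU `sort -V`, and 5.4.0-29-generic is simply the release shown in the oops of the bug description.

```shell
#!/bin/sh
# Upgrade steps quoted in the comment above (run manually, then reboot):
#   sudo apt update && sudo apt full-upgrade
#   sudo reboot

# is_newer_kernel RUNNING REFERENCE  (hypothetical helper)
# Succeeds when RUNNING sorts strictly after REFERENCE in version order.
is_newer_kernel() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# 5.4.0-29-generic is the kernel release reported in the oops.
if is_newer_kernel "$(uname -r)" "5.4.0-29-generic"; then
    echo "running kernel is newer than the one that crashed"
else
    echo "running kernel is not newer; upgrade and reboot first"
fi
```

A newer release string alone does not prove that the zfcp patches are included; the changelog of the installed kernel package remains the authority for that.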
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
Yes, the kernels to which the significant set of (about 30) zFCP-related patches was applied have already landed in focal (-updates) and in groovy, as indicated by the Fix Released status of LP 1887124:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
I'm wondering if it would make sense (on top of comment #7: https://bugs.launchpad.net/ubuntu-z-systems/+bug/1881109/comments/7) to test this again with an updated kernel that got patched to fix DIF and DIX, where the fix effectively updates the scsi/zfcp driver to the kernel 5.8 level?

I'm having this bug in mind:
LP 1887124 "[UBUNTU 20.04] DIF and DIX support in zfcp (s390x) is broken and the kernel crashes unconditionally"
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124

A patched kernel to test is referenced here:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124/comments/1
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
Well, we do drive our storage sub-system to its limits from time to time - especially if we do parallel LPAR deployments for OpenStack environments. But that's on a z13 and a DS8k - and so far we never saw such issues in this environment.

Further investigation in Launchpad did not turn up any similar reports involving SCSI / wbt (or wbt in general) on focal.

However, I found that there were wbt, respectively blk-wbt, issues in the past with kernels > v4.10 and < v4.19 that partially led to CPU hard lockups on heavy writes (largely reported on NVMe drives). But those bugs were only reported on bionic (and cosmic) - which fits the kernel range above - and got fixed quite some time ago.
The bionic (and cosmic) kernels were patched via backports of:
2887e41b910b - "blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"
38cfb5a45ee0 - "blk-wbt: improve waking of tasks"
I just double checked that the fixes from those tickets are (still) in, and they are.

Since this bug is the only place where I have heard about this problem, I agree that recommending to turn WBT off in general would not be good - even when preferring stability over performance.
(I still have the suspicion that it could be XIV related, rather than general block or SCSI layer...)

However, for now we may add a statement to the s390x section of the release notes pointing to WBT and the udev rule for disabling it for the block devices, in case one hits such issues under high disk I/O stress.
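
For readers who hit this under heavy disk I/O, disabling WBT per block device could look roughly like the sketch below. It assumes the standard sysfs attribute `queue/wbt_lat_usec`, where writing 0 turns writeback throttling off. The function name `disable_wbt` and the `SYSFS_ROOT` override are made up for illustration, and the udev rule in the comment is an assumed equivalent, not the rule from the earlier comment in this bug.

```shell
#!/bin/sh
# Sketch: turn off writeback throttling (WBT) on every block device that
# exposes queue/wbt_lat_usec. Writing 0 to that attribute disables WBT.
#
# SYSFS_ROOT is a hypothetical override so the loop can be tried on a
# scratch directory; on a real system it defaults to /sys (needs root).
#
# Assumed persistent equivalent as a udev rule (file name is arbitrary,
# e.g. /etc/udev/rules.d/60-disable-wbt.rules):
#   ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/wbt_lat_usec}="0"
disable_wbt() {
    root="${SYSFS_ROOT:-/sys}"
    for attr in "$root"/block/*/queue/wbt_lat_usec; do
        # Skip when the glob matched nothing or the device has no WBT knob.
        [ -w "$attr" ] || continue
        echo 0 > "$attr"
        echo "WBT disabled: $attr"
    done
}
```

Calling `disable_wbt` as root applies the setting until the next reboot or device re-add; a udev rule is what makes it stick. As said above, this is a workaround for affected setups, not a general recommendation.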
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
** Information type changed from Public Security to Public
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
@Corbin, may I ask for the rationale why you changed this from Public → Public Security? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1881109 Title: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors. Status in Ubuntu on IBM z Systems: New Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
** Information type changed from Public to Public Security -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1881109 Title: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors. Status in Ubuntu on IBM z Systems: New Status in linux package in Ubuntu: New Bug description: We can reproduce a crash in the block layer with lots of stress on lots of SCSI disks (on an XIV storage server). We seem to have several scsi stalls in the logs/errors (needs to be analyzed further) but in the end we do crash with this call trace. [20832.901147] Failing address: 7fe00dea8000 TEID: 7fe00dea8403 [20832.901159] Fault in home space mode while using kernel ASCE. [20832.901171] AS:01d3cccf400b R2:03fd0020800b R3:03fd0020c007 S:03fc1cc78800 P:0400 [20832.901264] Oops: 0011 ilc:2 [#1] SMP [20832.901280] Modules linked in: vhost_net vhost macvtap macvlan tap xfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge aufs overlay dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio 8021q garp mrp stp llc sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_mirror dm_region_hash dm_log qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup [20832.901516] CPU: 29 PID: 389709 Comm: CPU 0/KVM Kdump: loaded Not tainted 
5.4.0-29-generic #33-Ubuntu [20832.901530] Hardware name: IBM 8561 T01 708 (LPAR) [20832.901542] Krnl PSW : 0404e0018000 01d3cbd559be (try_to_wake_up+0x4e/0x700) [20832.901575]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 [20832.901744] Krnl GPRS: 03fc917cd988 7fe0 7fe0001e 0003 [20832.901750] 004c 040003fd005a4600 0003 [20832.901753]0003 7fe00dea8454 7fe00dea7b00 [20832.901754]03f44bd9b300 01d3cc587088 7fe0021c7ae0 7fe0021c7a60 [20832.901767] Krnl Code: 01d3cbd559b2: 41902954la %r9,2388(%r2) 01d3cbd559b6: 582003acl %r2,940 #01d3cbd559ba: a718lhi %r1,0 >01d3cbd559be: ba129000cs %r1,%r2,0(%r9) 01d3cbd559c2: a77401c9brc 7,01d3cbd55d54 01d3cbd559c6: e310b0080004lg %r1,8(%r11) 01d3cbd559cc: b9800018ngr %r1,%r8 01d3cbd559d0: a774001fbrc 7,01d3cbd55a0e [20832.901784] Call Trace: [20832.901816] ([<01d3cc57e0ac>] cleanup_critical+0x0/0x474) [20832.901823] [<01d3cc1d16ba>] rq_qos_wake_function+0x8a/0xa0 [20832.901827] [<01d3cbd74bde>] __wake_up_common+0x9e/0x1b0 [20832.901829] [<01d3cbd750e4>] __wake_up_common_lock+0x94/0xe0 [20832.901830] [<01d3cbd7515a>] __wake_up+0x2a/0x40 [20832.901835] [<01d3cc1e8640>] wbt_done+0x90/0xe0 [20832.901837] [<01d3cc1d17be>] __rq_qos_done+0x3e/0x60 [20832.901841] [<01d3cc1bd5b0>] blk_mq_free_request+0xe0/0x140 [20832.901848] [<01d3cc35fc60>] dm_softirq_done+0x140/0x230 [20832.901849] [<01d3cc1bbfbc>] blk_done_softirq+0xbc/0xe0 [20832.901850] [<01d3cc57e710>] __do_softirq+0x100/0x360 [20832.901853] [<01d3cbd2525e>] irq_exit+0x9e/0xc0 [20832.901856] [<01d3cbcb0b18>] do_IRQ+0x78/0xb0 [20832.901859] [<01d3cc57dc28>] ext_int_handler+0x128/0x12c [20832.901860] [<01d3cc57d306>] sie_exit+0x0/0x46 [20832.901866] ([<01d3cbce944a>] __vcpu_run+0x27a/0xc30) [20832.901870] [<01d3cbcf29a8>] kvm_arch_vcpu_ioctl_run+0x2d8/0x840 [20832.901875] [<01d3cbcdd242>] kvm_vcpu_ioctl+0x282/0x770 [20832.901880] [<01d3cbf85f66>] do_vfs_ioctl+0x376/0x690 [20832.901881] [<01d3cbf86304>] ksys_ioctl+0x84/0xb0 [20832.901883] [<01d3cbf8639a>] 
__s390x_sys_ioctl+0x2a/0x40 [20832.901885] [<01d3cc57d5f2>] system_call+0x2a6/0x2c8 [20832.901885] Last Breaking-Event-Address: [20832.901889] [<01d3cbd5607e>]
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
Hi Benjamin, if it's an issue somewhere in scsi-midlayer/block-layer/wbt, wouldn't it then also happen with zFCP on DS8k and on other platforms? So far we have done some testing with zFCP on DS8k (the only storage subsystem we have) as part of the release testing and server certification, and on top of that we constantly have several zFCP systems running on 20.04 (probably smaller systems and/or with less load), but so far we haven't faced a single crash. So I'm rather assuming that it is XIV related, no? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1881109 Title: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors. Status in Ubuntu on IBM z Systems: New Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1881109] Re: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.
** Also affects: ubuntu-z-systems Importance: Undecided Status: New ** Changed in: ubuntu-z-systems Assignee: (unassigned) => Skipper Bug Screeners (skipper-screen-team) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1881109 Title: [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors. Status in Ubuntu on IBM z Systems: New Status in linux package in Ubuntu: New