[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
sorry s/Juery/Juerg/!

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860013

Title:
  [thunderx] Synchronous External Abort: synchronous parity or ECC error

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Won't Fix
Status in linux source package in Eoan:
  Triaged
Status in linux source package in Focal:
  Triaged

Bug description:
  [Impact]
  Under load, ThunderX systems eventually fail with:

  [ 282.360376] Synchronous External Abort: synchronous parity or ECC error (0x9618) at 0xa6eb7000
  [ 282.372351] Internal error: : 9618 [#1] SMP
  [ 282.379152] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip shpchp cavium_rng_vf cavium_rng gpio_keys uio_pdrv_genirq uio ipmi_ssif ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt aes_ce_blk fb_sys_fops aes_ce_cipher drm crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci libahci thunder_bgx thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
  [ 282.467284] Process cc1 (pid: 39700, stack limit = 0xe0c44146)
  [ 282.477172] CPU: 25 PID: 39700 Comm: cc1 Not tainted 4.15.0-75-generic #85+lp1857074.1
  [ 282.488379] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
  [ 282.500121] pstate: 8005 (Nzcv daif -PAN -UAO)
  [ 282.508297] pc : __arch_copy_to_user+0x13c/0x248
  [ 282.516430] lr : cp_new_stat+0x140/0x178
  [ 282.523768] sp : 2e4d3d40
  [ 282.530369] x29: 2e4d3d40 x28: 801f51fa2d00
  [ 282.538988] x27: 08b52000 x26: 0050
  [ 282.548031] x25: 0124 x24: 0015
  [ 282.556872] x23: x22: 2e4d3d88
  [ 282.565449] x21: 801f51fa2d00 x20: 09588000
  [ 282.574109] x19: 2e4d3e30 x18: a87e7a70
  [ 282.582790] x17: a8756110 x16: 082f4448
  [ 282.591433] x15: x14: 0012
  [ 282.599986] x13: 00682e6c746e6366 x12: 2f78756e696c2f69
  [ 282.608730] x11: x10: 0cf0
  [ 282.617283] x9 : 1000 x8 : 000181a4
  [ 282.625839] x7 : 01001a2b x6 : 2e4d3da0
  [ 282.634238] x5 : 2e4d3e08 x4 : 0008
  [ 282.642754] x3 : 0802 x2 : fff8
  [ 282.651250] x1 : 2e4d3d90 x0 : 2e4d3d88
  [ 282.660013] Call trace:
  [ 282.665421] __arch_copy_to_user+0x13c/0x248
  [ 282.672979] SyS_newfstat+0x58/0x88
  [ 282.679272] el0_svc_naked+0x30/0x34
  [ 282.685605] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829)
  [ 282.694411] ---[ end trace 863693cf0c3fd297 ]---

  [Test Case]
  We found this by doing a reboot/kernel build loop. (The reboot may be unnecessary.)
  Code to automate this setup is at:
  https://code.launchpad.net/~dannf/+git/kernel-build-reboot-loop

  [Fix]

  [Regression Risk]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
@Juery: I have no reason to believe this is related to the boot regression we fixed (bug 1857074). I haven't re-tested lately, but as of 4.15.0-76 it was still reproducible.
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
Is this related to the boot regression that we fixed or a different problem? I.e., can you still reproduce this with latest Bionic?
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Changed in: linux (Ubuntu Disco)
       Status: Triaged => Won't Fix
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
The patch I highlighted in Comment #9 appears to be unrelated - 4.15.0-76 still fails even though it has the patch. A test build of 4.15.0-76 w/ the patch reverted also fails.
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
Looking at the git log - I wonder if this could be related?

commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e
Author: Pavel Tatashin
Date:   Tue Nov 19 17:10:06 2019 -0500

    arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault

It's interesting because ThunderX is somewhat unique in our test cluster as not having HW PAN. We also only recently merged this into our 4.15 tree - Ubuntu-4.15.0-73.82 was the first tree to have it. I'll restart testing on our latest 4.15 (w/ this patch) to see if the issue persists.
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
I attempted to bisect this, using the following process:
 - Run the kernel-build-reboot-loop test on 3 machines in parallel.
   I used 2 CRB1S systems (anuchin, bestovius) and 1 R120-T33 (seidel).
 - If any machine crashes w/ the parity error message, consider it failed.
 - If all machines survive overnight, consider it "OK".

Unfortunately, the commit it landed on looks bogus:

# first bad commit: [852643165aea0999bb862b36511c5b9f6b11449f] fs/binfmt_elf.c: move variables initialization closer to their usage

(Reverse bisect - this would in theory be the commit that *fixed* it)

Just in case, I tried reverting that commit from 5.5-rc6. As noted in comment #2, 5.5-rc6 seems immune to this problem. Reverting the commit didn't change that - 5.5-rc6 still survived overnight.

Note: Of the 3 systems, anuchin was usually the one that failed during the bisect. It could be that this is a generic hw issue, and anuchin is just more severely impacted than the others. It could also be that this symptom can be caused by both a sw and a hw issue, and anuchin is impacted by the hw part, making it a bad choice for a bisect. Either way, bisection seems like a poor strategy for identifying the issue.

** Attachment added: "bisect.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5323904/+files/bisect.log
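
For anyone repeating the bisect, the pass/fail rule described in that comment (any machine showing the parity error fails the run, all machines surviving overnight counts as OK) could be sketched roughly as below. This is only an illustration, not the actual kernel-build-reboot-loop code: the function names and the idea of feeding in captured console logs as strings are assumptions, and only the error signature is taken from the oops in the bug description.

```python
import re
from typing import List, Mapping

# Signature copied from the oops in the bug description.
ABORT_RE = re.compile(
    r"Synchronous External Abort: synchronous parity or ECC error"
)


def crashed_machines(logs: Mapping[str, str]) -> List[str]:
    """Return the (sorted) names of machines whose console log shows the abort.

    `logs` maps a machine name to its captured console output; how the
    output is captured is outside the scope of this sketch.
    """
    return sorted(name for name, text in logs.items() if ABORT_RE.search(text))


def overnight_verdict(logs: Mapping[str, str]) -> str:
    """Apply the rule from the comment: any crash means 'failed', else 'ok'."""
    return "failed" if crashed_machines(logs) else "ok"
```

How "failed" and "ok" map onto git bisect's good/bad terms depends on the direction of the bisect. For the reverse bisect described above, the mapping would be inverted.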
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Attachment added: "lspci.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321216/+files/lspci.txt
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Attachment added: "dmidecode_-t_bios.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321214/+files/dmidecode_-t_bios.txt
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Attachment added: "dmidecode_-t_memory.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321215/+files/dmidecode_-t_memory.txt
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Attachment added: "dmidecode_-t_processor.txt" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321213/+files/dmidecode_-t_processor.txt
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
** Attachment added: "Full console log of host seidel oops w/ error 5.0.0-37.40" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321212/+files/seidel.log
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
All 3 of my machines survived overnight testing on the 5.5-rc6 mainline build[*]. Next step is to try 5.3. 5.3 mainline doesn't boot on these systems, so I'll use Ubuntu's 5.3.0-24. [*] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5-rc6/
[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error
Also reproducible w/ the 5.0.0-37.40 kernel. I'll try a mainline 5.5-rc6 build next.

[ 602.796765] Internal error: synchronous parity or ECC error: 9618 [#1] SMP
[ 602.803994] Modules linked in: nls_iso8859_1 cavium_rng_vf ipmi_ssif ipmi_devintf input_leds joydev ipmi_msghandler thunderx_edac cavium_rng sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear aes_ce_blk aes_ce_cipher nicvf cavium_ptp ast i2c_algo_bit ttm drm_kms_helper crct10dif_ce ghash_ce syscopyarea sysfillrect sha2_ce sysimgblt uas hid_generic nicpf fb_sys_fops sha256_arm64 drm sha1_ce usbhid usb_storage hid thunder_bgx ahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 602.872414] Process cc1 (pid: 40126, stack limit = 0x90887c2f)
[ 602.878949] CPU: 10 PID: 40126 Comm: cc1 Not tainted 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 602.887040] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
[ 602.893921] pstate: 8005 (Nzcv daif -PAN -UAO)
[ 602.898724] pc : __arch_copy_to_user+0x13c/0x248
[ 602.903353] lr : cp_new_stat+0x140/0x178
[ 602.907277] sp : 2599bcc0
[ 602.910594] x29: 2599bcc0 x28: 800ed0538ec0
[ 602.915912] x27: x26:
[ 602.921229] x25: 5600 x24: 0015
[ 602.926547] x23: 10c716d8 x22: 2599bd08
[ 602.931865] x21: 800ed0538ec0 x20: 1170c000
[ 602.937181] x19: 2599bdb0 x18:
[ 602.942498] x17: x16:
[ 602.947818] x15: x14:
[ 602.953134] x13: x12:
[ 602.958452] x11: x10: 152f
[ 602.963769] x9 : 1000 x8 : 000181a4
[ 602.969087] x7 : 00a60da3 x6 : 2599bd20
[ 602.974405] x5 : 2599bd88 x4 : 0008
[ 602.979721] x3 : 0802 x2 : fff8
[ 602.985038] x1 : 2599bd10 x0 : 2599bd08
[ 602.990356] Call trace:
[ 602.992821] __arch_copy_to_user+0x13c/0x248
[ 602.997107] __se_sys_newfstat+0x58/0x88
[ 603.001045] __arm64_sys_newfstat+0x20/0x30
[ 603.005243] el0_svc_common+0x88/0x180
[ 603.009005] el0_svc_handler+0x38/0x78
[ 603.012770] el0_svc+0x8/0xc
[ 603.015664] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829)
[ 603.021765] ---[ end trace 08068f2978fb8211 ]---