[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-03-19 Thread dann frazier
sorry s/Juery/Juerg/!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860013

Title:
  [thunderx] Synchronous External Abort: synchronous parity or ECC error

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Won't Fix
Status in linux source package in Eoan:
  Triaged
Status in linux source package in Focal:
  Triaged

Bug description:
  [Impact]
  Under load, ThunderX systems eventually fail with:

  [  282.360376] Synchronous External Abort: synchronous parity or ECC error (0x9618) at 0xa6eb7000
  [  282.372351] Internal error: : 9618 [#1] SMP
  [  282.379152] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip shpchp cavium_rng_vf cavium_rng gpio_keys uio_pdrv_genirq uio ipmi_ssif ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt aes_ce_blk fb_sys_fops aes_ce_cipher drm crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci libahci thunder_bgx thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
  [  282.467284] Process cc1 (pid: 39700, stack limit = 0xe0c44146)
  [  282.477172] CPU: 25 PID: 39700 Comm: cc1 Not tainted 4.15.0-75-generic #85+lp1857074.1
  [  282.488379] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
  [  282.500121] pstate: 8005 (Nzcv daif -PAN -UAO)
  [  282.508297] pc : __arch_copy_to_user+0x13c/0x248
  [  282.516430] lr : cp_new_stat+0x140/0x178
  [  282.523768] sp : 2e4d3d40
  [  282.530369] x29: 2e4d3d40 x28: 801f51fa2d00 
  [  282.538988] x27: 08b52000 x26: 0050 
  [  282.548031] x25: 0124 x24: 0015 
  [  282.556872] x23:  x22: 2e4d3d88 
  [  282.565449] x21: 801f51fa2d00 x20: 09588000 
  [  282.574109] x19: 2e4d3e30 x18: a87e7a70 
  [  282.582790] x17: a8756110 x16: 082f4448 
  [  282.591433] x15:  x14: 0012 
  [  282.599986] x13: 00682e6c746e6366 x12: 2f78756e696c2f69 
  [  282.608730] x11:  x10: 0cf0 
  [  282.617283] x9 : 1000 x8 : 000181a4 
  [  282.625839] x7 : 01001a2b x6 : 2e4d3da0 
  [  282.634238] x5 : 2e4d3e08 x4 : 0008 
  [  282.642754] x3 : 0802 x2 : fff8 
  [  282.651250] x1 : 2e4d3d90 x0 : 2e4d3d88 
  [  282.660013] Call trace:
  [  282.665421]  __arch_copy_to_user+0x13c/0x248
  [  282.672979]  SyS_newfstat+0x58/0x88
  [  282.679272]  el0_svc_naked+0x30/0x34
  [  282.685605] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829) 
  [  282.694411] ---[ end trace 863693cf0c3fd297 ]---

  [Test Case]
  We found this by doing a reboot/kernel build loop. (The reboot may be
  unnecessary.) Code to automate this setup is at:
  https://code.launchpad.net/~dannf/+git/kernel-build-reboot-loop

  [Fix]
  [Regression Risk]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-03-19 Thread dann frazier
@Juery: I have no reason to believe this is related to the boot
regression we fixed (bug 1857074). I haven't re-tested lately, but as of
4.15.0-76 it was still reproducible.



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-03-18 Thread Juerg Haefliger
Is this related to the boot regression that we fixed or a different
problem? I.e., can you still reproduce this with latest Bionic?



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-02-04 Thread dann frazier
** Changed in: linux (Ubuntu Disco)
   Status: Triaged => Won't Fix



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-02-04 Thread dann frazier
The patch I highlighted in Comment #9 appears to be unrelated -
4.15.0-76 still fails even though it has the patch. A test build of
4.15.0-76 w/ the patch reverted also fails.



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-29 Thread dann frazier
Looking at the git log - I wonder if this could be related?


commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e
Author: Pavel Tatashin
Date:   Tue Nov 19 17:10:06 2019 -0500

    arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault


It's interesting because ThunderX is somewhat unique in our test cluster
in not having HW PAN. We also only recently merged this into our 4.15
tree - Ubuntu-4.15.0-73.82 was the first tree to have it. I'll restart
testing on our latest 4.15 (w/ this patch) to see if the issue persists.



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-29 Thread dann frazier
I attempted to bisect this, using the following process:
  - Run the kernel-build-reboot-loop test on 3 machines in parallel.
    I used 2 CRB1S systems (anuchin, bestovius) and 1 R120-T33 (seidel).
  - If any machine crashes w/ the parity error message, consider it failed.
  - If all machines survive overnight, consider it "OK".
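
The pass/fail rule above can be sketched in a few lines of Python. This is
only an illustration, not the actual kernel-build-reboot-loop code: the
function names, and the idea of collecting each machine's console output as
a string, are assumptions made for the example. How "failed" and "ok" map
onto git bisect's terms is deliberately left out.

```python
from typing import Dict, List

# Error text taken from the oops in the bug description.
ERROR_MARKER = "Synchronous External Abort: synchronous parity or ECC error"


def crashed_machines(console_logs: Dict[str, str]) -> List[str]:
    """Return the (sorted) names of machines whose log shows the parity error.

    console_logs is a hypothetical mapping of machine name -> captured
    console output for one overnight run.
    """
    return sorted(
        name for name, log in console_logs.items() if ERROR_MARKER in log
    )


def classify_step(console_logs: Dict[str, str]) -> str:
    """Apply the rule: any crash means "failed", otherwise "ok"."""
    return "failed" if crashed_machines(console_logs) else "ok"


if __name__ == "__main__":
    # Placeholder logs for illustration only, not real test results.
    example = {
        "machine-a": "... " + ERROR_MARKER + " ...",
        "machine-b": "",
        "machine-c": "",
    }
    print(classify_step(example), crashed_machines(example))
```

A single crashing machine is enough to mark the step as failed, so one
machine with a hardware problem can skew every step of the bisect.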

Unfortunately, the commit it landed on looks bogus:

# first bad commit: [852643165aea0999bb862b36511c5b9f6b11449f] fs/binfmt_elf.c: move variables initialization closer to their usage
(Reverse bisect - this would in theory be the commit that *fixed* it)

Just in case, I tried reverting that commit from 5.5-rc6. As noted in
comment #2, 5.5-rc6 seems immune to this problem. Reverting the commit
didn't change that - 5.5-rc6 still survived overnight.

Note: Of the 3 systems, anuchin was usually the one that failed during
the bisect. It could be that this is a generic hw issue, and anuchin is
just more severely impacted than the others. It could also be that this
symptom can be caused by both a sw and a hw issue, and anuchin is
impacted by the hw part, making it a bad choice for a bisect. Either
way, bisection seems like a poor strategy for identifying the issue.


** Attachment added: "bisect.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5323904/+files/bisect.log


[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
** Attachment added: "lspci.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321216/+files/lspci.txt



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
** Attachment added: "dmidecode_-t_bios.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321214/+files/dmidecode_-t_bios.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860013

Title:
  [thunderx] Synchronous External Abort: synchronous parity or ECC error

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Triaged
Status in linux source package in Eoan:
  Triaged
Status in linux source package in Focal:
  Triaged



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
** Attachment added: "dmidecode_-t_memory.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321215/+files/dmidecode_-t_memory.txt



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
** Attachment added: "dmidecode_-t_processor.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321213/+files/dmidecode_-t_processor.txt



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
** Attachment added: "Full console log of host seidel oops w/ error 5.0.0-37.40"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1860013/+attachment/5321212/+files/seidel.log



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-17 Thread dann frazier
All 3 of my machines survived overnight testing on the 5.5-rc6 mainline 
build[*].
Next step is to try 5.3. 5.3 mainline doesn't boot on these systems, so I'll 
use Ubuntu's 5.3.0-24.

[*] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5-rc6/



[Kernel-packages] [Bug 1860013] Re: [thunderx] Synchronous External Abort: synchronous parity or ECC error

2020-01-16 Thread dann frazier
Also reproducible w/ the 5.0.0-37.40 kernel. I'll try a mainline 5.5-rc6
build next.

[  602.796765] Internal error: synchronous parity or ECC error: 9618 [#1] 
SMP
[  602.803994] Modules linked in: nls_iso8859_1 cavium_rng_vf ipmi_ssif 
ipmi_devintf input_leds joydev ipmi_msghandler thunderx_edac cavium_rng 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear aes_ce_blk 
aes_ce_cipher nicvf cavium_ptp ast i2c_algo_bit ttm drm_kms_helper crct10dif_ce 
ghash_ce syscopyarea sysfillrect sha2_ce sysimgblt uas hid_generic nicpf 
fb_sys_fops sha256_arm64 drm sha1_ce usbhid usb_storage hid thunder_bgx ahci 
thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs 
aes_neon_blk crypto_simd cryptd aes_arm64
[  602.872414] Process cc1 (pid: 40126, stack limit = 0x90887c2f)
[  602.878949] CPU: 10 PID: 40126 Comm: cc1 Not tainted 5.0.0-37-generic 
#40~18.04.1-Ubuntu
[  602.887040] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
[  602.893921] pstate: 8005 (Nzcv daif -PAN -UAO)
[  602.898724] pc : __arch_copy_to_user+0x13c/0x248
[  602.903353] lr : cp_new_stat+0x140/0x178
[  602.907277] sp : 2599bcc0
[  602.910594] x29: 2599bcc0 x28: 800ed0538ec0 
[  602.915912] x27:  x26:  
[  602.921229] x25: 5600 x24: 0015 
[  602.926547] x23: 10c716d8 x22: 2599bd08 
[  602.931865] x21: 800ed0538ec0 x20: 1170c000 
[  602.937181] x19: 2599bdb0 x18:  
[  602.942498] x17:  x16:  
[  602.947818] x15:  x14:  
[  602.953134] x13:  x12:  
[  602.958452] x11:  x10: 152f 
[  602.963769] x9 : 1000 x8 : 000181a4 
[  602.969087] x7 : 00a60da3 x6 : 2599bd20 
[  602.974405] x5 : 2599bd88 x4 : 0008 
[  602.979721] x3 : 0802 x2 : fff8 
[  602.985038] x1 : 2599bd10 x0 : 2599bd08 
[  602.990356] Call trace:
[  602.992821]  __arch_copy_to_user+0x13c/0x248
[  602.997107]  __se_sys_newfstat+0x58/0x88
[  603.001045]  __arm64_sys_newfstat+0x20/0x30
[  603.005243]  el0_svc_common+0x88/0x180
[  603.009005]  el0_svc_handler+0x38/0x78
[  603.012770]  el0_svc+0x8/0xc
[  603.015664] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829) 
[  603.021765] ---[ end trace 08068f2978fb8211 ]---
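
For anyone reading the trace: the word in parentheses on the "Code:" line is the instruction at pc. Below is a minimal Python sketch, not part of the original report, that decodes the 64-bit LDP/STP post-index encoding used by the words in that line. The function name is made up for illustration, and it handles only this one instruction form.

```python
def decode_pair_post_index(word: int) -> str:
    """Decode a 64-bit AArch64 LDP/STP (post-index, general registers) word."""
    # Bits 31..23 are fixed for this form: opc=10, 101, V=0, 001.
    if (word >> 23) & 0x1FF != 0b101010001:
        raise ValueError(f"{word:#010x} is not a 64-bit LDP/STP post-index word")

    # Bit 22 (L) selects load (1) or store (0).
    mnemonic = "ldp" if (word >> 22) & 1 else "stp"

    # imm7 is a signed 7-bit field, scaled by 8 for 64-bit registers.
    imm7 = (word >> 15) & 0x7F
    if imm7 & 0x40:
        imm7 -= 0x80

    rt2 = (word >> 10) & 0x1F
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F

    def reg(n: int) -> str:
        # Register 31 means xzr for the data registers.
        return "xzr" if n == 31 else f"x{n}"

    # Register 31 means sp when used as the base register.
    base = "sp" if rn == 31 else f"x{rn}"
    return f"{mnemonic} {reg(rt)}, {reg(rt2)}, [{base}], #{imm7 * 8}"


if __name__ == "__main__":
    # Words taken from the "Code:" line; the last one is the faulting instruction.
    for w in (0xA8C12027, 0xA88120C7, 0xA8C12829):
        print(f"{w:08x}  {decode_pair_post_index(w)}")
```

Under this decoding the faulting word a8c12829 is a load pair through x1, which is the source pointer argument of __arch_copy_to_user. That would put the abort on the read of the kernel-side buffer rather than on the store to user space, but take that as a reading of the encoding only, not a diagnosis.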

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1860013

Title:
  [thunderx] Synchronous External Abort: synchronous parity or ECC error

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Triaged
Status in linux source package in Eoan:
  Triaged
Status in linux source package in Focal:
  Triaged

Bug description:
  [Impact]
  Under load, ThunderX systems eventually fail with:

  [  282.360376] Synchronous External Abort: synchronous parity or ECC error 
(0x9618) at 0xa6eb7000
  [  282.372351] Internal error: : 9618 [#1] SMP
  [  282.379152] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip 
shpchp cavium_rng_vf cavium_rng gpio_keys uio_pdrv_genirq uio ipmi_ssif 
ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf 
nicpf uas usb_storage ast i2c_algo_bit ttm drm_kms_helper syscopyarea 
sysfillrect sysimgblt aes_ce_blk fb_sys_fops aes_ce_cipher drm crc32_ce 
crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci libahci thunder_bgx 
thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs 
aes_neon_blk crypto_simd cryptd aes_arm64
  [  282.467284] Process cc1 (pid: 39700, stack limit = 0xe0c44146)
  [  282.477172] CPU: 25 PID: 39700 Comm: cc1 Not tainted 4.15.0-75-generic 
#85+lp1857074.1
  [  282.488379] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., 
BIOS 5.11 12/12/2012
  [  282.500121] pstate: 8005 (Nzcv daif -PAN -UAO)
  [  282.508297] pc : __arch_copy_to_user+0x13c/0x248
  [  282.516430] lr : cp_new_stat+0x140/0x178
  [  282.523768] sp : 2e4d3d40
  [  282.530369] x29: 2e4d3d40 x28: 801f51fa2d00 
  [  282.538988] x27: 08b52000 x26: 0050 
  [  282.548031] x25: 0124 x24: 0015 
  [  282.556872] x23:  x22: 2e4d3d88 
  [  282.565449] x21: