On Thu, 16 Jan 2020 02:14:16 -0000 dann frazier <[email protected]> wrote:
> I built a kernel with the proposed patches[*] and ran a reboot/kernel
> compile test on 4 systems. The tests survived 46 total iterations
> (~12/system) before I interrupted. Two systems failed with "Synchronous
> External Abort: synchronous parity or ECC error" errors.
>
> I've reverted the systems back to 4.15.0-70 - the kernel before the
> cpufeature/errata patches that caused this - to see if these SEA errors
> are a regression.
>
> [*] https://lists.ubuntu.com/archives/kernel-team/2020-January/106909.html
>

I've run 75 iterations of reboot/compile-kernel and encountered 3 gcc
segmentation faults. Unfortunately, my test didn't capture the dmesg log,
but it's likely that these are due to the ECC problems we're (still?)
seeing.

There was also another issue during one of the reboots, which is probably
unrelated and due to a flaky BMC:

[   33.896320] ipmi_ssif 0-0012: IPMI message handler: device id demangle failed: -22
[   33.896354] ipmi_ssif 0-0012: Unable to get the device id: -5
[   33.987825] ipmi_ssif 0-0012: Found new BMC (man_id: 0x000000, prod_id: 0xaabb, dev_id: 0x20)
[   33.987858] Unable to handle kernel read from unreadable memory at virtual address 00000018
[   33.999300] Mem abort info:
[   34.005475]   ESR = 0x96000004
[   34.011454]   Exception class = DABT (current EL), IL = 32 bits
[   34.020168]   SET = 0, FnV = 0
[   34.025893]   EA = 0, S1PTW = 0
[   34.031617] Data abort info:
[   34.037060]   ISV = 0, ISS = 0x00000004
[   34.043448]   CM = 0, WnR = 0
[   34.048949] user pgtable: 4k pages, 48-bit VAs, pgd = 000000002799ee91
[   34.058063] [0000000000000018] *pgd=0000000000000000
[   34.065624] Internal error: Oops: 96000004 [#1] SMP
[   34.073090] Modules linked in: nls_iso8859_1 sch_fq_codel thunderx_zip thunderx_edac ib_iser cavium_rng_vf rdma_cm ipmi_ssif(+) ipmi_devintf shpchp cavium_rng iw_cm ipmi_msghandler ib_cm gpio_keys uio_pdrv_genirq uio ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear ast i2c_algo_bit drm_kms_helper nicvf syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nicpf drm aes_ce_blk aes_ce_cipher crc32_ce crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci i2c_thunderx thunder_xcv mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[   34.161807] Process kworker/64:1 (pid: 651, stack limit = 0x00000000b0697881)
[   34.172016] CPU: 64 PID: 651 Comm: kworker/64:1 Not tainted 4.15.18+ #40
[   34.181723] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[   34.193113] Workqueue: events redo_bmc_reg [ipmi_msghandler]
[   34.201840] pstate: 80400005 (Nzcv daif +PAN -UAO)
[   34.209589] pc : smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[   34.218275] lr : smi_send.isra.4+0x150/0x158 [ipmi_msghandler]
[   34.227046] sp : ffff0000128c3b10
[   34.233209] x29: ffff0000128c3b10 x28: 0000000000000020
[   34.241305] x27: 0000000000000002 x26: 0000000000000000
[   34.249437] x25: ffff0000128c3c40 x24: ffff0000128c3c38
[   34.257455] x23: 0000000000000000 x22: 0000000000000018
[   34.265500] x21: 0000000000000000 x20: ffff810fb16c8800
[   34.273558] x19: ffff800faffb0000 x18: ffffffffffffffff
[   34.281643] x17: 0000000000000005 x16: 0000000000000000
[   34.289758] x15: ffff000009578c08 x14: ffff810fb0d20187
[   34.297899] x13: ffff810fb0d20186 x12: 0000000000000030
[   34.305997] x11: 0101010101010101 x10: ffff7f7f7f7f7f7f
[   34.314069] x9 : fefdfefefefefeff x8 : ffff810fb16c8800
[   34.322166] x7 : 0000000000001138 x6 : 000000000000125c
[   34.330300] x5 : 00000000000000dc x4 : ffff810fbc8f1340
[   34.338456] x3 : 0000000000000000 x2 : 0000000000000000
[   34.346633] x1 : ffff810fb16c8800 x0 : ffff810fae4ff800
[   34.354839] Call trace:
[   34.360207]  smi_send.isra.4+0x80/0x158 [ipmi_msghandler]
[   34.368450]  i_ipmi_request+0x2ac/0x980 [ipmi_msghandler]
[   34.376716]  send_channel_info_cmd+0xac/0xd8 [ipmi_msghandler]
[   34.385396]  __scan_channels.isra.20+0x84/0x180 [ipmi_msghandler]
[   34.394341]  __bmc_get_device_id+0x424/0x8c8 [ipmi_msghandler]
[   34.402994]  redo_bmc_reg+0x6c/0x70 [ipmi_msghandler]
[   34.410840]  process_one_work+0x1e0/0x420
[   34.417640]  worker_thread+0x4c/0x478
[   34.420416] IPv6: ADDRCONF(NETDEV_UP): enP2p1s0f2: link is not ready
[   34.424073]  kthread+0x134/0x138
[   34.424081]  ret_from_fork+0x10/0x18
[   34.424089] Code: f908aa74 b4ffff74 f9424e60 aa1403e1 (f94002c2)
[   34.454826] ---[ end trace b54ad269f357375f ]---
[   34.467956] ipmi_ssif: Unable to register device: error -5
[   34.476380] ipmi_ssif 0-0012: Unable to start IPMI SSIF: -5
[   34.484925] ipmi_ssif: probe of 0-0012 failed with error -5

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1857074

Title:
  Cavium ThunderX CN88XX Panic : Unknown reason

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857074/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
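As a cross-check of the "Mem abort info" and "Data abort info" lines in the oops above, the ESR value can be decoded by hand. The following is a minimal sketch, assuming the standard arm64 ESR_ELx layout for data aborts. The names `decode_esr`, `EC_NAMES` and `DFSC_NAMES` are hypothetical helpers, not kernel or library APIs, and the lookup tables are deliberately partial.

```python
# Sketch: decode an arm64 ESR_ELx value as printed in a kernel oops.
# Assumption: the exception is a data abort, so the ISS bits below
# (ISV, SET, FnV, EA, CM, S1PTW, WnR, DFSC) are meaningful.

# Partial table of exception classes (ESR bits [31:26]).
EC_NAMES = {
    0x24: "DABT (lower EL)",
    0x25: "DABT (current EL)",
}

# Partial table of data fault status codes (ISS bits [5:0]).
DFSC_NAMES = {
    0x04: "level 0 translation fault",
    0x05: "level 1 translation fault",
    0x06: "level 2 translation fault",
    0x07: "level 3 translation fault",
    0x10: "synchronous external abort",
    0x18: "synchronous parity or ECC error",
}


def decode_esr(esr):
    """Split an ESR value into the fields shown in the oops output."""
    ec = (esr >> 26) & 0x3F        # exception class
    iss = esr & 0x1FFFFFF          # instruction specific syndrome, bits [24:0]
    dfsc = iss & 0x3F              # data fault status code
    return {
        "EC": ec,
        "class": EC_NAMES.get(ec, "not in this partial table"),
        "IL": 32 if (esr >> 25) & 1 else 16,   # instruction length in bits
        "ISS": iss,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,     # 0 = read, 1 = write
        "DFSC": dfsc,
        "fault": DFSC_NAMES.get(dfsc, "not in this partial table"),
    }


if __name__ == "__main__":
    # ESR value taken from the oops above.
    for name, value in decode_esr(0x96000004).items():
        print(f"{name:6} = {value}")
```

For 0x96000004 this decodes to a current-EL data abort on a read (WnR = 0) with a level 0 translation fault, which agrees with the `*pgd=0000000000000000` line and with a read from the near-NULL address 0x18. The fault status is not one of the external abort or ECC codes, which fits the guess that this oops is unrelated to the SEA errors.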
