[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

Sean Bruno changed:

           What    |Removed |Added
         Resolution|---     |Overcome By Events
             Status|New     |Closed

--- Comment #29 from Sean Bruno ---
(In reply to Ed Maste from comment #28)

Yeah, it's probably moot.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #28 from Ed Maste ---
The cluster ThunderX systems are now retired; there is a 2S ThunderX at Sentex that is working acceptably well (being used by mhorne@ and the Moritz developers doing lldb work). Is this issue now OBE?
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #27 from Glen Barber ---
The build is still broken with r358081 and r357800 cherry-picked... ¯\_(ツ)_/¯
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #26 from Glen Barber ---
It looks like r358081 is needed, too.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #25 from Glen Barber ---
The build is broken on that revision... :(

/usr/src/sys/dev/smc/if_smc.c:398:2: error: implicit declaration of function 'NET_TASK_INIT' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        NET_TASK_INIT(>smc_rx, SMC_RX_PRIORITY, smc_task_rx, ifp);
        ^
1 error generated.
--- if_smc.o ---
*** [if_smc.o] Error code 1
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #24 from Mark Johnston ---
(In reply to Glen Barber from comment #23)

Can you try cherry-picking r357772 as well? I believe that commit fixes the regression.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #23 from Glen Barber ---
Unfortunately, the machine insta-panics with r357460.

panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/net/netisr.c:1091
cpuid = 0

Would it be worthwhile to upgrade to a more recent -CURRENT?
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

Mark Linimon changed:

           What    |Removed |Added
 Attachment #213916|diff    |cpu_errata.c.diff
           filename|        |
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #22 from Mark Johnston ---
Sean, were you able to test an update to r357460?
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #21 from Mark Millard ---
(In reply to Mark Johnston from comment #20)

Comment #19 lists: r359745M (unlike comment 17's listing r356207M).
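Since the mix-up here comes from reading the revision out of pasted `uname -a` output, a minimal sketch of extracting it mechanically may help. It assumes the `rNNNNNN` / `rNNNNNNM` form that appears in this thread, where a trailing `M` is svnversion's marker for a working copy with local modifications. The names `KernelRev` and `parse_kernel_revision` are hypothetical, not part of any FreeBSD tool.

```python
import re
from typing import NamedTuple, Optional


class KernelRev(NamedTuple):
    revision: int   # numeric SVN revision, e.g. 359745
    modified: bool  # True when the revision carries a trailing "M"


# "r" at a word boundary, digits, optional "M", then ":" / whitespace / end.
_REV_RE = re.compile(r"\br(\d+)(M?)(?=[:\s]|$)")


def parse_kernel_revision(uname_output: str) -> Optional[KernelRev]:
    """Return the first rNNN[M] token found in uname output, or None."""
    match = _REV_RE.search(uname_output)
    if match is None:
        return None
    return KernelRev(int(match.group(1)), match.group(2) == "M")
```

Applied to the strings quoted in comments #17 and #19, this would distinguish r356207M from r359745M without relying on someone spotting the difference by eye.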
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #20 from Mark Johnston ---
(In reply to Sean Bruno from comment #19)

This is on r356207? I guess we can just continue the bisection then. r357460 seems like another good place to test.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #19 from Sean Bruno ---
(In reply to Sean Bruno from comment #18)

Or ... the machine will never panic again and is doing a complete rebuild of freebsd.org ports/pkgs.

http://thunderx1.nyi.freebsd.org/index.html

sbr...@thunderx1.nyi:~ % uname -a
FreeBSD thunderx1.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r359745M: Wed Apr 29 17:29:11 UTC 2020 sbr...@build-13.freebsd.org:/usr/obj/arm64.aarch64/usr/src/sys/CLUSTER13 arm64
sbr...@thunderx1.nyi:~ % uptime
3:10PM up 4 days, 20 hrs, 1 user, load averages: 25.78, 26.62, 26.31
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #18 from Sean Bruno ---
Just to make sure that this failure isn't induced by actually/factually building pkgs, I built a statically compiled jail to match the kernel to make sure that I'm not skipping over a failure case.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #17 from Sean Bruno ---
(In reply to Mark Johnston from comment #16)

FreeBSD thunderx1.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT #3 r356207M: Fri May 1 15:16:03 UTC 2020 sbr...@thunderx1.nyi.freebsd.org:/var/tmp/home/sbruno/fbsd_head/arm64.aarch64/sys/CLUSTER13 arm64

Starting things up. I'll let you know how it goes.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #16 from Mark Johnston ---
(In reply to Sean Bruno from comment #15)

It looks like my guess was wrong. Let's try r356207 next.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #15 from Sean Bruno ---
(In reply to Sean Bruno from comment #14)

No panics overnight, which is a record at this point. No pkgs built because of a bug in jail that was fixed in -current recently. :-)

Any revision you'd like to see next?
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #14 from Sean Bruno ---
(In reply to Mark Johnston from comment #13)

Test in progress.

FreeBSD 13.0-CURRENT #2 r355427M: Thu Apr 30 21:06:45 UTC 2020 sbr...@thunderx1.nyi.freebsd.org:/var/tmp/home/sbruno/fbsd_head/arm64.aarch64/sys/CLUSTER13 arm64
FreeBSD clang version 10.0.0 (g...@github.com:llvm/llvm-project.git llvmorg-10.0.0-0-gd32170dbd5b)
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #13 from Mark Johnston ---
(In reply to Sean Bruno from comment #12)

Thanks, that's useful information. Since we are trying to determine if the bug is in the ASID implementation, let's try updating to r355427 and see if the SError exceptions start happening again. I think a plain bisection would take longer and there are some revs in between that are known to be unstable, so we might hit unrelated issues.
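For contrast with the targeted jump proposed above, here is a rough sketch of what a "plain bisection" over revision numbers looks like when some revisions are known not to build or boot. It is an illustration only: `first_bad_revision`, the `panics` callback and the `untestable` set are hypothetical names, and the sketch assumes the bug, once introduced, is present in every later revision.

```python
from typing import Callable, Iterable


def first_bad_revision(good: int, bad: int,
                       panics: Callable[[int], bool],
                       untestable: Iterable[int] = ()) -> int:
    """Narrow (good, bad] down to the earliest revision observed to panic.

    Assumes `good` does not panic and `bad` does. Revisions in `untestable`
    (e.g. ones that do not build) are never passed to `panics`.
    """
    skip = set(untestable)
    lo, hi = good, bad  # invariant: lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probe = mid
        # Walk upward off untestable revisions first...
        while probe in skip and probe < hi:
            probe += 1
        if probe == hi:
            # ...and if none is left above mid, walk downward instead.
            probe = mid
            while probe in skip and probe > lo:
                probe -= 1
            if probe == lo:
                break  # everything between lo and hi is untestable
        if panics(probe):
            hi = probe
        else:
            lo = probe
    return hi
```

Each call to `panics` stands for a full build, reboot and soak test, which on these machines means hours per step; that cost, plus the unstable revisions in the range, is the argument for testing a hand-picked revision such as r355427 first. If the loop stops because of untestable revisions, the returned value is only an upper bound on the culprit.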
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #12 from Sean Bruno ---
(In reply to Sean Bruno from comment #11)

We have built all three jails on the thunderx host and have now proceeded to the building ports part of portmgr's process. So, we now know of a "working" version of the kernel.

Should I start bisecting from here to -current or is this enough information to begin diagnostics?
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #11 from Sean Bruno ---
(In reply to Sean Bruno from comment #10)

The machine has survived more than 30 minutes of high load (building world for multiple jails). I'll let it run for the rest of the day, but I doubt at this point it crashes.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #10 from Sean Bruno ---
(In reply to Mark Johnston from comment #9)

And needed a -Wno-misleading-indentation to get past a couple other nits. Test in progress.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #9 from Mark Johnston ---
(In reply to Sean Bruno from comment #8)

Thanks. You might also try cherry-picking r354325.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #8 from Sean Bruno ---
(In reply to Mark Johnston from comment #5)

> - Can you run r354285 (r354286 is the commit in question) on one of the
>   systems and see if the panic is reproducible?

Bah, it doesn't appear that this version is buildable. (world)

In file included from /usr/src/lib/libutil/kinfo_getfile.c:6:
In file included from /usr/obj/arm64.aarch64/usr/src/tmp/usr/include/sys/user.h:40:
/usr/obj/arm64.aarch64/usr/src/tmp/usr/include/machine/pcb.h:71:29: error: field has incomplete type 'struct debug_monitor_state'
        struct debug_monitor_state pcb_dbg_regs;
                                   ^
/usr/obj/arm64.aarch64/usr/src/tmp/usr/include/machine/pcb.h:71:9: note: forward declaration of 'struct debug_monitor_state'
        struct debug_monitor_state pcb_dbg_regs;
               ^
1 error generated.
--- kinfo_getfile.o ---

Let me try some sequential reverts and see if I get lucky with a list of svn revs that I can back out.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #7 from Sean Bruno ---
(In reply to Mark Johnston from comment #5)

Definitely hitting that code:

r...@thunderx1.nyi:~ # dmesg|grep installed
installed Cavium erratum 27456 workaround
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #6 from Sean Bruno ---
Created attachment 213917
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=213917&action=edit
trace from the db> prompt

Here's what I got on the command line. It's not super informative.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #5 from Mark Johnston ---
Created attachment 213916
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=213916&action=edit
print a message when a workaround is applied

(In reply to Sean Bruno from comment #4)

So they were stable before updating in December? Do you know what they were running before?

A bit of a shot in the dark, but the ASID allocator was committed around that time, and contains a workaround for an erratum in the ThunderX. A couple of things to try:

- Can you run r354285 (r354286 is the commit in question) on one of the systems and see if the panic is reproducible?
- Could you apply the attached patch and grab the dmesg? First let's see if the erratum workaround is getting installed.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #4 from Sean Bruno ---
(In reply to Mark Johnston from comment #3)

We only have 2 thunderx v1 in the freebsd cluster so I'm unable to comment on thunderx v2 stability. We have experienced this failure on both machines since upgrading them from December -current. Both machines run -current to allow us to build packages for all releases.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #3 from Mark Johnston ---
(In reply to Sean Bruno from comment #0)

Hmm, there is no real information in the ISS, and the FAR doesn't tell us anything. I guess some general questions would help to start:

- Do you see this only on -CURRENT? Have you seen it on stable/12?
- This happens on multiple machines? How many? Are they all ThunderXs? What about ThunderX2s?
- Next time it happens on any system, could you try running "acttrace" at the DDB prompt and paste the output here?
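For readers unfamiliar with the ISS remark above, a small sketch of how an AArch64 ESR value splits into its fields may be useful. The bit layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]) and the exception class codes are from the Armv8-A architecture. The function name `decode_esr` is hypothetical. The example input `0xBE000000` is an assumption as well: the `esr: be00` printed in the original report looks truncated in this archive, so the full-width value is a guess, not something the report states.

```python
# A few ESR_ELx exception class (EC) values from the Armv8-A architecture.
EC_NAMES = {
    0x15: "SVC instruction (AArch64)",
    0x20: "Instruction Abort from a lower EL",
    0x21: "Instruction Abort from the same EL",
    0x24: "Data Abort from a lower EL",
    0x25: "Data Abort from the same EL",
    0x2F: "SError interrupt",
}


def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR value into EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F    # exception class, bits [31:26]
    il = (esr >> 25) & 0x1     # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF      # instruction specific syndrome, bits [24:0]
    return {"ec": ec, "ec_name": EC_NAMES.get(ec, "other"), "il": il, "iss": iss}


if __name__ == "__main__":
    # Assumed full-width value; see the note above.
    print(decode_esr(0xBE000000))
```

Under that assumption the exception class decodes to an SError interrupt with an all-zero ISS, which would match both the "Unhandled System Error" panic string and the observation that the syndrome carries no further detail.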
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

--- Comment #2 from Sean Bruno ---
(In reply to Mark Millard from comment #1)

Specifically, these are Cavium Thunderx servers running this build of -current:

thunderx2.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r359745
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

Mark Millard changed:

           What    |Removed |Added
                 CC|        |marklmi26-f...@yahoo.com

--- Comment #1 from Mark Millard ---
Any idea what specific system version (combinations?) have shown at least one such failure?

If nothing else, it might give a hint what others might want to avoid as a context. But it may be of help to anyone trying to help identify the details of what is wrong in the code.
[Bug 246004] Kernel Panic after moderate amount of activity
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246004

            Bug ID: 246004
           Summary: Kernel Panic after moderate amount of activity
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: sbr...@freebsd.org

The freebsd.org pkg builders and ref machines are unable to stay online for more than a day at this point whenever they start any moderate amount of activity.

I've run the onboard hardware memory test for 3 days straight with no errors and no lockups.

Looking to start some reasonable amount of debugging to get these machines back into service. Any advice is welcome.

The panics all seem to look like the following:

  x0: 40b32400   x1: 40a0b3d8   x2: 0          x3: 0
  x4: 40100401   x5: 8808       x6: 25e440     x7: 26ced
  x8: 2          x9: 0         x10: 40a0b818  x11: 25e000
 x12: 25e000    x13: 25e000    x14: 63        x15: 0
 x16: 25a870    x17: 4048f880  x18: 35        x19: 204b9e
 x20: 25e450    x21: 0         x22: 0         x23: 40a0b810
 x24: 25e000    x25: 25e000    x26: 0         x27: 0
 x28: 40bf3360  x29: d180
  sp: cec0
  lr: 21db5c
 elr: 2290fc
spsr: 6200
 far: 40b32400
 esr: be00
timeout stopping cpus
panic: Unhandled System Error
cpuid = 8
time = 1588034425
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0x008f1754  lr = 0x0023fcac
         sp = 0x00015d9de520  fp = 0x00015d9de720

db_trace_self_wrapper() at vpanic+0x194
         pc = 0x0023fcac  lr = 0x0055c718
         sp = 0x00015d9de730  fp = 0x00015d9de780

vpanic() at panic+0x44
         pc = 0x0055c718  lr = 0x0055c580
         sp = 0x00015d9de790  fp = 0x00015d9de840

panic() at do_serror+0x40
         pc = 0x0055c580  lr = 0x009106c0
         sp = 0x00015d9de850  fp = 0x00015d9de850

do_serror() at handle_serror+0x88
         pc = 0x009106c0  lr = 0x008f449c
         sp = 0x00015d9de860  fp = 0x00015d9de980

handle_serror() at 0x21db58
         pc = 0x008f449c  lr = 0x0021db58
         sp = 0x00015d9de990  fp = 0xd180

KDB: enter: panic
[ thread pid 17906 tid 101021 ]
Stopped at 0x2290fc