Re: TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)
On 2017-May-19, at 9:42 PM, Mark Millard wrote:

> On 2017-May-9, at 2:00 PM, Mark Millard wrote:
>
> . . .
>> fatal kernel trap:
>> exception = 0x903a64e (unknown)
>> srr0 = 0x7ff760
>> srr1 = 0xc1007c
>> lr = 0x907f
>> curthread = 0x147d6c0
>> pid = 11, comm = idle: cpu0
>> [ thread pid 11 tid 13 ]
>> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>>
>> 1 contains (cpu1 instead of cpu0, so different tid):
>>
>> fatal kernel trap:
>> exception = 0x903a64e (unknown)
>> srr0 = 0x7ff760
>> srr1 = 0xc1007c
>> lr = 0x907f
>> curthread = 0x147d360
>> pid = 11, comm = idle: cpu1
>> [ thread pid 11 tid 14 ]
>> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>>
>> 1 contains:
>
> I've discovered where to find the trapframe
> in the vmcore.* files for these specific
> examples with 0x903a64e as the exception
> and such.
>
> In the vmcore the memory image starts at
> byte offset 0x1000.
>
> The only starting place in the image file
> that produces the reported values at the
> offsets inside the powerpc trapframe is:
>
> offset 0x1001 in the vmcore.* file.
>
> So memory address 0x1 is being used
> as the trapframe address when that
> odd exception information is being
> displayed. Yep: misaligned.
>
> The decoding is not of the actual
> trapframe: it is garbage that is
> not to be believed.
>
>
> Note: I lucked out after the above and
> got somewhat different odd trap information
> that led to actually getting a backtrace
> that included the actual pid 11 cpu 1 kernel
> thread stack bt associated with that odd
> information display.

Typo: That should have been "cpu 2".

> I'll send a separate reply for that information
> as it will take some transcribing from camera
> pictures and such.

As indicated, I got a different odd trap report that gave a
backtrace. . .
fatal user trap
exception = 0x421 (unknown)
srr0 = 0xc1007c09
srr1 = 0x3a64e80
lr = 0xc0807fc9
curthread = 0x147d000
pid = 11, comm = idle: cpu 2

At this point it attempted db_print_loc_and_inst and got another
exception (at offset +0x60 in the routine). So the backtrace shows
both the consequences of that and what led up to it: an EXI trap
was attempting to report trap frame information but was using a bad
address for the supposed frame.

The details of the backtrace:

panic: data storage interrupt trap
cpuid = 2
time = 145187154
KDB: stack backtrace
0xdf5ef2c0: at kdb_backtrace+0x5c
0xdf5ef3a0: at panic+0x54
0xdf5ef3f0: at trap_fatal+0x1cc
0xdf5ef420: at powerpc_interrupt+0x180
0xdf5ef5c0: kernel DSI read trap @ 0xc1007c09 by db_disasm+0x30: srr1=0x1032 r1=0xdf5ef6b0 cr=0x24009022 xer=0 ctr=0x1852cc sr=0x4000
0xdf5ef6b0: at 0x1007480
0xdf5ef6d0: at db_print_loc_and_inst+0x60
0xdf5ef700: at db_trap+0x104
0xdf5ef790: at kdb_trap+0x1bc
0xdf5ef810: at trap_fatal+0x1b0
0xdf5ef840: at trap+0x1184
0xdf5ef870: kernel EXI trap by cpu_idle_60x+0x88: srr1=0x1032 r1=0xdf5ef930 cr=0x4042 xer=0x2000 ctr=0x8e3bd8
saved LR(0x2) is invalid.

So an EXI trap was attempting to report a trap frame.

(Note: the LRs for the pid 11 cpu threads normally report as
invalid in ddb.)
The actual EXI trapframe starts at 013f0878 in vmcore.5:

013f0870 df 5e f9 30 00 10 08 f8 00 04 90 32 df 5e f9 30 |.^.0.......2.^.0|
013f0880 01 47 d0 00 00 00 00 00 25 94 48 3f 00 00 00 00 |.G......%.H?....|
013f0890 25 94 48 3f 00 4a a9 c8 00 00 00 00 00 00 00 44 |%.H?.J.........D|
013f08a0 01 fc a0 55 00 00 90 32 df 5d 1d 00 00 00 00 00 |...U...2.]......|
013f08b0 00 d4 bd ec 00 cb 98 98 00 c9 66 bc 00 c4 5d 08 |..........f...].|
013f08c0 00 c9 66 bc 00 d4 c5 3c df 5e f9 e0 00 eb a7 80 |..f....<.^......|
013f08d0 00 c9 66 bc 01 47 d0 00 df 5e f9 8c 00 00 00 06 |..f..G...^......|
013f08e0 00 00 00 06 00 eb b5 80 00 00 00 00 00 8e 3b d8 |..............;.|
013f08f0 00 d2 6b f0 df 5e f9 30 00 8e 3b f4 40 00 00 42 |..k..^.0..;.@..B|
013f0900 20 00 00 00 00 8e 3b d8 00 8e 3c 60 00 00 90 32 | .....;...<`...2|
013f0910 00 00 05 00 41 a1 d5 d4 42 00 00 00 00 00 00 00 |....A...B.......|

So:

r0  = 0x00049032
r1  = 0xdf5ef930
r2  = 0x0147d000
r3  = 0x00000000
r4  = 0x2594483f
r5  = 0x00000000
r6  = 0x2594483f
r7  = 0x004aa9c8
r8  = 0x00000000
r9  = 0x00000044
r10 = 0x01fca055
r11 = 0x00009032
r12 = 0xdf5d1d00
r13 = 0x00000000
r14 = 0x00d4bdec
r15 = 0x00cb9898
r16 = 0x00c966bc
r17 = 0x00c45d08
r18 = 0x00c966bc
r19 = 0x00d4c53c
r20 = 0xdf5ef9e0
r21 = 0x00eba780
r22 = 0x00c966bc
r23 = 0x0147d000
r24 = 0xdf5ef98c
r25 = 0x00000006 (this value shows up later in a bad spot)
r26
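The hexdump-to-register step above can be reproduced mechanically. What follows is only a minimal sketch: it assumes 32-bit big-endian words and general-purpose registers stored first in the frame (which is how the values above were read off), and the helper names parse_rows and decode_gprs are made up for illustration. Only the first two hexdump rows are used, which is enough for r0 through r5.

```python
import struct

# First two rows of the vmcore.5 hexdump, copied from the message above.
ROWS = """
013f0870 df 5e f9 30 00 10 08 f8 00 04 90 32 df 5e f9 30
013f0880 01 47 d0 00 00 00 00 00 25 94 48 3f 00 00 00 00
"""

# File offset where the message says the EXI trapframe starts.
FRAME_START = 0x013F0878


def parse_rows(text):
    """Return (base_offset, data) for contiguous 16-byte hexdump rows."""
    base = None
    data = bytearray()
    for line in text.strip().splitlines():
        fields = line.split()
        offset = int(fields[0], 16)
        if base is None:
            base = offset
        # Each row must start where the previous one ended.
        assert offset == base + len(data), "rows are not contiguous"
        data += bytes(int(b, 16) for b in fields[1:17])
    return base, bytes(data)


def decode_gprs(text, frame_start, count):
    """Decode `count` big-endian 32-bit words starting at frame_start.

    Assumption: the frame begins with r0, r1, ... as 32-bit words.
    """
    base, data = parse_rows(text)
    return struct.unpack_from(">%dI" % count, data, frame_start - base)


if __name__ == "__main__":
    for i, value in enumerate(decode_gprs(ROWS, FRAME_START, 6)):
        print("r%-2d = 0x%08x" % (i, value))
```

Feeding in the remaining rows and raising the count would cover the rest of the frame in the same way.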
Re: TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)
On 2017-May-9, at 2:00 PM, Mark Millard wrote:

. . .
> fatal kernel trap:
> exception = 0x903a64e (unknown)
> srr0 = 0x7ff760
> srr1 = 0xc1007c
> lr = 0x907f
> curthread = 0x147d6c0
> pid = 11, comm = idle: cpu0
> [ thread pid 11 tid 13 ]
> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>
> 1 contains (cpu1 instead of cpu0, so different tid):
>
> fatal kernel trap:
> exception = 0x903a64e (unknown)
> srr0 = 0x7ff760
> srr1 = 0xc1007c
> lr = 0x907f
> curthread = 0x147d360
> pid = 11, comm = idle: cpu1
> [ thread pid 11 tid 14 ]
> Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)
>
> 1 contains:

I've discovered where to find the trapframe
in the vmcore.* files for these specific
examples with 0x903a64e as the exception
and such.

In the vmcore the memory image starts at
byte offset 0x1000.

The only starting place in the image file
that produces the reported values at the
offsets inside the powerpc trapframe is:

offset 0x1001 in the vmcore.* file.

So memory address 0x1 is being used
as the trapframe address when that
odd exception information is being
displayed. Yep: misaligned.

The decoding is not of the actual
trapframe: it is garbage that is
not to be believed.


Note: I lucked out after the above and
got somewhat different odd trap information
that led to actually getting a backtrace
that included the actual pid 11 cpu 1 kernel
thread stack bt associated with that odd
information display.

I'll send a separate reply for that information
as it will take some transcribing from camera
pictures and such.

===
Mark Millard
markmi at dsl-only.net
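The address-to-offset reasoning above (memory image at file offset 0x1000, so memory address 0x1 lands at file offset 0x1001) can be sketched as follows. This is an illustration only: the names IMAGE_START, file_offset and read_words are hypothetical, the words are assumed to be 32-bit big-endian, and the sample bytes are a synthetic stand-in for a vmcore, not real dump contents.

```python
import struct

# Per the message: the memory image begins at this byte offset in vmcore.*.
IMAGE_START = 0x1000


def file_offset(mem_addr):
    """Map a memory address to its byte offset in the vmcore file."""
    return IMAGE_START + mem_addr


def read_words(image, mem_addr, count):
    """Read `count` big-endian 32-bit words at a memory address.

    `image` is the raw contents of the vmcore file as bytes.
    """
    return list(struct.unpack_from(">%dI" % count, image, file_offset(mem_addr)))


if __name__ == "__main__":
    # Synthetic stand-in for a vmcore: 0x1000 bytes of header, then memory.
    memory = bytes.fromhex("00049032df5ef930") + b"\x00" * 4
    image = b"\x00" * IMAGE_START + memory

    print(hex(file_offset(0x1)))
    # Aligned read at address 0 gives the words as stored.
    print([hex(w) for w in read_words(image, 0x0, 2)])
    # A read at address 0x1 straddles word boundaries, so every value is
    # byte-shifted garbage, which is the effect described above.
    print([hex(w) for w in read_words(image, 0x1, 2)])
```

The misaligned read is the point of the demo: starting one byte late shifts every field, so none of the decoded "registers" can be trusted.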
TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)
kgdb is not working for powerpc, neither the system one nor the one
from ports. I've used "strings" to extract the failure information
shown below.

The time frames to failure are widely variable, minutes to hours.

I've never seen the below with a debug kernel, only with
production-style.

I have not seen any such problems for powerpc64, aarch64 (with
-mcpu=cortex-a53 ), armv6 (with -mcpu=cortex-a7 ), or amd64. Just
powerpc. The powerpc and powerpc64 hardware is (e.g.) the same old
PowerMac G5 so-called "Quad Core", used with two different boot SSDs.

Note: This reproduces for me for pure gcc 4.2.1 based builds. My
usual clang-targeting-powerpc experiments are not involved here. I'd
not updated for a long time before this because the status of the
clang compiler was not changing and its powerpc stack
code-generation problems are difficult to work around.

My kernels are unusual in having both sc and vt in the build and ps3
disabled. I happen to be using sc because it works with the
2560x1440 display that is currently connected; with vt it fails to
boot at that size.

Of 7 example vmcore.* files. . .
(Note that all are pid 11 Idle-process thread failures)

3 contain:

fatal kernel trap:
exception = 0x903a64e (unknown)
srr0 = 0x7ff760
srr1 = 0xc1007c
lr = 0x907f
curthread = 0x147d6c0
pid = 11, comm = idle: cpu0
[ thread pid 11 tid 13 ]
Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)

1 contains (cpu1 instead of cpu0, so different tid):

fatal kernel trap:
exception = 0x903a64e (unknown)
srr0 = 0x7ff760
srr1 = 0xc1007c
lr = 0x907f
curthread = 0x147d360
pid = 11, comm = idle: cpu1
[ thread pid 11 tid 14 ]
Stopped at ffs_truncate+0x1080: stw r11, 0xf8(r31)

1 contains:

fatal kernel trap:
exception = 0x2100 (unknown)
srr0 = 0x7c0903
srr1 = 0xa64e8004
lr = 0x807fc9e7
curthread = 0x147d000
pid = 11, comm = idle: cpu2
[ thread pid 11 tid 15 ]
Stopped at audit_commit+0x24f: illegal instruction 4915f00

1 contains:

fatal kernel trap:
exception = 0x300 (data storage interrupt)
virtual address = 0x7ff76000
dsisr = 0x4000
srr0 = 0x8e3cf8
srr1 = 0x1032
lr = 0x8e3ce8
curthread = 0x147d6c0
pid = 11, comm = idle: cpu0
panic: data storage interrupt trap
cpuid = 0
time = 1494057319
KDB: stack backtrace:
0xdf5e52c0: at kdb_backtrace+0x5c
0xdf5e5330: at vpanic+0x1ec
0xdf5e53a0: at panic+0x54
0xdf5e53f0: at trap_fatal+0x1cc
0xdf5e5420: at trap+0x122c
0xdf5e55c0: at powerpc_interrupt+0x180
0xdf5e55f0: kernel DSI read trap @ 0x7ff76000 by db_disasm+0x30: srr1=0x1032 r1=0xdf5e56b0 cr=0x24009022 xer=0 ctr=0x1852cc sr=0x4000
0xdf5e56b0: at 0x1007460
0xdf5e56d0: at db_print_loc_and_inst+0x60
0xdf5e5700: at db_trap+0x104
0xdf5e5790: at kdb_trap+0x1bc
0xdf5e5810: at trap_fatal+0x1b0
0xdf5e5840: at trap+0x1184
0xdf5e5870: kernel DECR trap by cpu_idle_60x+0x88: srr1=0x9032 r1=0xdf5e5930 cr=0x4042 xer=0x2000 ctr=0x8e3bd8
saved LR(0xfffe) is invalid

And 1 contains:

fatal kernel trap:
exception = 0x0 (unknown)
srr0 = 0x903a64e
srr1 = 0x80042100
lr = 0xc9e7c800
curthread = 0x147d360
pid = 11, comm = idle: cpu1
[ thread pid 11 tid 14 ]
Stopped at 0x903a64e:
fatal kernel trap:
exception = 0x300 (data storage interrupt)
virtual address = 0x903a64e
dsisr = 0x4000
srr0 = 0x8e3cf8
srr1 = 0x1032
lr = 0x8e3ce8
curthread = 0x147d360
pid = 11, comm = idle: cpu1
panic: data storage interrupt trap
cpuid = 1
time = 1494132014
KDB: stack backtrace:
0xdf5ea2c0: at kdb_backtrace+0x5c
0xdf5ea330: at vpanic+0x1ec
0xdf5ea3a0: at panic+0x54
0xdf5ea3f0: at trap_fatal+0x1cc
0xdf5ea420: at trap+0x122c
0xdf5ea5c0: at powerpc_interrupt+0x180
0xdf5ea5f0: kernel DSI read trap @ 0x903a64e by db_disasm+0x30: srr1=0x1032 r1=0xdf5ea6b0 cr=0x24009022 xer=0 ctr=0x1852cc sr=0x4000
0xdf5ea6b0: at 0x1007460
0xdf5ea6d0: at db_print_loc_and_inst+0x60
0xdf5ea700: at db_trap+0x104
0xdf5ea790: at kdb_trap+0x1bc
0xdf5ea810: at trap_fatal+0x1b0
0xdf5ea840: at trap+0x122c
0xdf5ea870: kernel EXI trap by cpu_idle_60x+0x88: srr1=0x9032 r1=0xdf5ea930 cr=0x4042 xer=0x2000 ctr=0x8e3bd8
saved LR(0x5) is invalid

Most (but not all) of the above were while the old PowerMac was
sitting unused.

The pid 11 Idle thread commonality suggests to me some sort of
interrupt oddity messing up when the idle threads were put to use
for the interrupt.

The /usr/src/sys/powerpc/conf/* files in use are (-NODBG for
production style and -DBG for debug style):

# more