Hello,

Recently I found a crash issue on my SPARC Sun Fire V215 server running nv76. I suspect it is related to writing debug messages to the log file /var/adm/messages, because the crash seems to happen more often when my driver writes more debug messages to the log. I am not sure yet, though.
All of these crashes are fatal errors reported in the "PCIe root complex", with the panic raised from px_err_panic. Below is my analysis using mdb. One or more other threads may be running when the panic happens, but one thread that is writing messages appears in every crash dump; see the output of "30001418080::findstack -v" below. In the dump shown here, that thread is syslogd on CPU 1. Its write() goes through sysmwrite and iwscnwrite into the console terminal emulator (tem_write, tem_scroll) and down into the pfb framebuffer driver. Meanwhile, the thread on CPU 0 at the time of the panic, 3000201ec20 (Xsun), is in mutex_vector_enter called from pfb_ioctl.

Is this really a bug in the OS's log message writing?

Tom

::msgbuf
panic[cpu0]/thread=3000201ec20: Fatal error has occured in: PCIe root complex.

000002a10045fd50 px:px_err_panic+164 (11, 1, 13a0400, 2a10045fe00, 2a10045fe01, 0)
  %l0-3: 0000000000000001 0000000000000034 00000000018f6400 0000060010817000
  %l4-7: 00000000000ffc00 0000000000000000 000000000183d800 ffffffffffffffff
000002a10045fe60 px:px_err_dmc_pec_intr+cc (300000b5cb0, 0, 300000b5d78, 1, 300003c6688, 300000b5c40)
  %l0-3: 0000009882001a02 0000000000004000 0000000000000000 0000000000000003
  %l4-7: 0000000000000000 00000000004b0918 0000000000000000 0000000000000011
000002a10045ff50 unix:current_thread+1b8 (fa000000, 0, 4000000, 0, 39f1600, fa60ea00)
  %l0-3: 00000000010074cc 000002a100c952e1 000000000000000e 0000000070008440
  %l4-7: 000000000000001a 0000000000000068 000003000201ec20 000002a100c95b90

syncing file systems... 3 3 done
dumping to /dev/dsk/c1t0d0s1, offset 318898176, content: kernel

> ::cpuinfo -v
 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  0 00001838610  1b    0    0  59   no    no t-0    3000201ec20 Xsun
                  |
       RUNNING <--+
         READY
        EXISTS
        ENABLE

 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  1 0000180c000  1d    5    0   0  yes    no t-90   30001418080 syslogd
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
      QUIESCED                99 2a100247ca0 sched
        EXISTS                59 3000201e5a0 java
        ENABLE                59 300020e6080 java
                              59 3000201e260 java
                              59 3000145b4e0 java

The message-writing thread (syslogd, 30001418080) can be seen in all of these crash dumps:
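Because the syslogd stack below is long, it can help to reduce it to a plain call chain. The Python sketch that follows is a hypothetical helper, not an mdb feature: it assumes each frame line has the form "address function+0xoffset(args)" as printed by ::findstack -v, and the names FRAME_RE, call_chain and SAMPLE are illustrative only. Applied to a few frames from thread 30001418080, it shows the path from the write() system call through sysmwrite and iwscnwrite to the console code (tem_write) and the framebuffer copy (bcopy_more).

```python
import re

# Assumed shape of one frame line of mdb "::findstack -v" output, e.g.
#   000002a10046dea1 bcopy_more+0x454(3, 16ec, 1400, 0, ffffffffffffffff, 0)
# The frame holding the saved stack pointer is wrapped in "[ ... ]".
# Groups: 1 = stack pointer, 2 = function (optionally "module:func"),
#         3 = hex offset, 4 = raw argument list.
FRAME_RE = re.compile(
    r"^\s*\[?\s*([0-9a-f]+)\s+([\w:]+)\+(0x[0-9a-f]+)\((.*)\)\s*\]?\s*$"
)


def call_chain(findstack_output):
    """Return function names ordered from outermost caller to innermost frame.

    Lines that are not frames (the mdb prompt, the "stack pointer for
    thread ..." header) do not match FRAME_RE and are skipped.
    """
    names = []
    for line in findstack_output.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(2))
    # findstack lists the innermost frame first; reverse to get caller -> callee.
    return names[::-1]


# A few frames copied from the syslogd (30001418080) stack; others omitted.
SAMPLE = """\
stack pointer for thread 30001418080: 2a10046dca1
[ 000002a10046dca1 panic_idle+0x1c() ]
  000002a10046dea1 bcopy_more+0x454(3, 16ec, 1400, 0, ffffffffffffffff, 0)
  000002a10046e671 tem_write+0x30(600109a2f70, 6001160a170, 24, 60010803e38, 1, 600109a2f88)
  000002a10046eed1 iwscnwrite+0x18(e00000000, 2a10046fa98, 60010802458, e, e, 60010a91938)
  000002a10046f031 sysmwrite+0xe4(480, 2a10046fa98, 60010802458, 2a10046f8f8, 2a10046f8e8, 701b3800)
  000002a10046f1e1 write+0x178(4, 280a, 4b, 280a, 6001247ccc0, 0)
"""

if __name__ == "__main__":
    # Prints: write -> sysmwrite -> iwscnwrite -> tem_write -> bcopy_more -> panic_idle
    print(" -> ".join(call_chain(SAMPLE)))
```

The same function could be pointed at the full saved output of each ::findstack -v command to compare the writing thread across several crash dumps.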
> 30001418080::findstack -v
stack pointer for thread 30001418080: 2a10046dca1
[ 000002a10046dca1 panic_idle+0x1c() ]
  000002a10046dd51 ktl0+0x48(ffffffffffffffff, 7f904a00070, 0, 0, 7f9, 30001418080)
  000002a10046dea1 bcopy_more+0x454(3, 16ec, 1400, 0, ffffffffffffffff, 0)
  000002a10046dff1 pfb_setup_cmap32+0x74(600108d4000, 0, 14ec, 0, 600108d4aec, 600108d48ec)
  000002a10046e0c1 pfb_vis_consdisplay+0x138(600108d4000, 580, 1400, 600108d5740, 1800, 1400)
  000002a10046e1b1 tem_display_layered+0x20(600109a2f70, 2a10046eb30, 60010803e38, 0, 7ffffc00, 5400)
  000002a10046e271 tem_pix_cls_range+0xf0(600109a2f70, 0, 1, 360, a0, 50)
  000002a10046e361 tem_pix_cls+0x3c(0, 50, 21, 0, 60010803e38, 0)
  000002a10046e431 tem_scroll+0xe0(600109a2f70, 300003d4b50, 21, 0, 0, 60010803e38)
  000002a10046e501 tem_lf+0x50(600109a2f70, 60010803e38, 0, 0, 21, 300003d4ac0)
  000002a10046e5c1 tem_terminal_emulate+0x40(600109a2f70, 6001160a193, 60010803e38, 300003d4ac0, 0, 7bb7d15c)
  000002a10046e671 tem_write+0x30(600109a2f70, 6001160a170, 24, 60010803e38, 1, 600109a2f88)
  000002a10046e721 wcstart+0xfc(600109c4620, 600122abcc0, 38, 0, 18bc400, 600122abcc0)
  000002a10046e7d1 wcuwput+0x370(600109c4620, 600122abcc0, 1388, 1388, 2a10046f130, 2a10046a000)
  000002a10046e881 putnext+0x208(600109c1c28, 600109c4620, 600122abcc0, 0, 1815800, 0)
  000002a10046e931 ldtermwmsg+0x130(60010ab1068, 600122abcc0, 1388, 0, 0, 2a10046a000)
  000002a10046e9f1 putnext+0x208(60010ab1160, 60010ab1068, 6001110f040, 0, 1815800, 0)
  000002a10046eaa1 qdrain_syncq+0x6c(60010ab0e40, 60010ab0dd8, 7005fe90, fffe, 60010ab0ed0, 61)
  000002a10046eb51 drain_syncq+0x2fc(60010ab0ed0, 11, 11, fffe, 0, fc00)
  000002a10046ec01 strput+0x1a0(600109c7878, 0, 2a10046fa98, 2a10046f6c0, 0, 0)
  000002a10046ee01 strwrite_common+0x1f0(600109c2840, 2a10046fa98, 10000, 1, 600109c78f8, 600109c7878)
  000002a10046eed1 iwscnwrite+0x18(e00000000, 2a10046fa98, 60010802458, e, e, 60010a91938)
  000002a10046ef81 fop_write+0x48(60010b4c300, 2a10046fa98, 0, 60010802458, 0, 4b)
  000002a10046f031 sysmwrite+0xe4(480, 2a10046fa98, 60010802458, 2a10046f8f8, 2a10046f8e8, 701b3800)
  000002a10046f131 fop_write+0x48(6001247ccc0, 2a10046fa98, 8, 60010802458, 0, 4b)
  000002a10046f1e1 write+0x178(4, 280a, 4b, 280a, 6001247ccc0, 0)
  000002a10046f2e1 syscall_trap32+0xcc(4, 3ac5a0, 4b, 3ac601, 3ac600, 3ac600)

> 3000201ec20::findstack -v
stack pointer for thread 3000201ec20: 2a100c94e11
  000002a100c94ec1 mutex_vector_enter+0x458(0, 30001418080, 600108d5740, 5fcf527e6e, 0, 9a)
  000002a100c94f81 pfb_ioctl+0xcc0(600108d4000, 16ec, 4607, 100003, 0, 2a100c95adc)
  000002a100c950e1 fop_ioctl+0x48(6001270d640, 4607, ffbffb8c, 100003, 6001084d9c0, 2a100c95adc)
  000002a100c95191 ioctl+0x164(8, 4607, ffbffb8c, e6cd4, 600113b80b8, 31)
  000002a100c952e1 syscall_trap32+0xcc(8, 4607, ffbffb8c, e6cd4, 414fe0, 31)

This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org