Saurabh Misra wrote:
> The problem is that the level6 interrupt handler (network interrupt) is
> blocked (base spl of CPU 0 is 6) which blocked level1 as well. That's
> the reason softcall state is still SOFT_PEND (2).
>
> This problem is being worked as part of bug :-
>
> 6292092 callout should not be blocked by interrupts from executing
> realtime timeouts
> 6540436 kpreempt() needs a more reliable way to generate level1 intr
>
> We have seen this problem happening very often with e1000g.
>
Hi Saurabh,
My concern is this: CPU 0's BSPL is 6, as you said, but CPU 1 was idle at
the time. Why can't those pending softcalls be dispatched to CPU 1?

> Eric Saxe wrote:
>
>> FYI, question posed on opensolaris forums, I figured you probably have
>> an answer. :)
>>
>> -Eric
>>
>> ------------------------------------------------------------------------
>>
>> Subject:
>> [osol-code] Questions about overdue callout_execute
>> From:
>> Oliver Yang <[EMAIL PROTECTED]>
>> Date:
>> Mon, 27 Aug 2007 16:48:36 +0800
>> To:
>> [EMAIL PROTECTED], [email protected]
>>
>>
>> Hi All,
>>
>> I have several questions about overdue callout_execute:
>>
>> 1. How does the system handle an overdue callout_execute?
>>
>> 2. As far as I know, a setrun entry in the callout table is handled as a
>> softcall, which is a software interrupt (softlevel1) with PIL 1, right?
>>
>> 3. If we find many overdue setrun entries in the callout table, what
>> should we check using mdb?
>>
>> Here is some background information, along with my other questions:
>>
>> I encountered a system hang in my test environment and successfully
>> forced a crash dump.
>>
>> In this crash dump, I found an e1000g driver thread blocked in
>> cv_timedwait. We expected it to return after 1 tick, but it never
>> returned.
>>
>> > ::callout ! grep fffffe80044e5c80
>> setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) ---> it was overdue by about 33730 ticks.
>>
>> stack pointer for thread fffffe80044e5c80: fffffe80044e5880
>> [ fffffe80044e5880 _resume_from_idle+0xf8() ]
>> fffffe80044e58c0 swtch+0x167()
>> fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388, 1036d)
>> fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390, ffffffff82f76388, 1036d)
>> fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40)
>> fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180)
>> fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180)
>> fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70)
>> fffffe80044e5b50 runservice+0x42(ffffffff894acb70)
>> fffffe80044e5b80 queue_service+0x42(ffffffff894acb70)
>> fffffe80044e5bc0 stream_service+0x73(ffffffff83905740)
>> fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820)
>> fffffe80044e5c70 thread_start+8()
>>
>> After checking the callout table, I found the callout entry for this
>> thread, and it was overdue for some reason:
>>
>> > ::callout ! grep fffffe80044e5c80
>> setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) --------> This indicates that it is overdue.
>>
>> I also found about 2573 overdue entries in the callout table, which is
>> a really big number:
>>
>> > ::callout
>> FUNCTION      ARGUMENT         ID               TIME
>> sigalarm2proc ffffffff9569aae0 7fffffffffffc010 144a1 (T-17038) -----> This indicates that it was overdue.
>> sigalarm2proc ffffffff91bb7510 7fffffffffffe010 14484 (T-17067)
>> sigalarm2proc ffffffff9569c380 7fffffffffffc020 144a1 (T-17038)
>> sigalarm2proc ffffffff95428d48 7fffffffffffc030 144a1 (T-17038)
>> sigalarm2proc ffffffff91bb8db0 7fffffffffffe030 14483 (T-17068)
>> sigalarm2proc ffffffff9542b238 7fffffffffffc040 144a1 (T-17038)
>> .....................[snipped].................................................................... 
>>
>> > ::callout ! grep "T-" | wc -l
>> 2573
>>
>> Why are there so many overdue entries in the callout table?
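
To put those (T-n) values in perspective, here is a minimal sketch that counts overdue entries in saved ::callout output and converts the worst one to seconds. It assumes the relative-time field always has the "(T-n)" / "(T+n)" shape seen above, and it assumes a tick rate of 100 Hz, which this thread does not state. The helper names overdue_ticks and ticks_to_seconds are hypothetical and not part of mdb.

```python
import re

# Matches the relative-time field that ::callout prints, e.g. "(T-33730)".
# A minus sign means the expiry tick is already in the past.
REL_TIME = re.compile(r"\(T([+-])(\d+)\)")


def overdue_ticks(callout_text):
    """Return the overdue tick counts found in ::callout output text."""
    return [int(n) for sign, n in REL_TIME.findall(callout_text) if sign == "-"]


def ticks_to_seconds(ticks, hz=100):
    """Convert clock ticks to seconds; hz=100 is an assumed tick rate."""
    return ticks / hz


# A few rows copied from the output quoted in this message.
sample = """\
setrun        fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730)
sigalarm2proc ffffffff9569aae0 7fffffffffffc010 144a1 (T-17038)
sigalarm2proc ffffffff91bb7510 7fffffffffffe010 14484 (T-17067)
"""

ticks = overdue_ticks(sample)
worst = max(ticks)
print(len(ticks), "overdue entries; worst is", worst, "ticks,",
      ticks_to_seconds(worst), "seconds at the assumed tick rate")
```

If the assumed tick rate holds, the e1000g thread's one-tick cv_timedwait was overdue by several minutes, which fits a soft interrupt level that was never serviced better than a merely slow system.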
>>
>> Since the setrun entry will be processed by softcall, I tried to print
>> the global softcall list through softhead.
>> We did find two callout_execute entries pending to be handled by
>> softint:
>>
>> > *softhead::list softcall_t sc_next|::print softcall_t
>> {
>>     sc_func = callout_execute
>>     sc_arg = 0xffffffff80219000
>>     sc_next = softcalls+0x1290
>> }
>> {
>>     sc_func = callout_execute
>>     sc_arg = 0xffffffff80216000
>>     sc_next = 0
>> }
>>
>> > softcall_state/J
>> softcall_state:
>> softcall_state: 2
>>
>> #define SOFT_PEND 0x02 /* softcall list needs processing */
>>
>> But when I checked the soft interrupts with the following command, I
>> found no pending count on the softlevel1 line. Why?
>>
>> I think softlevel1 should call into softint(), and then
>> callout_execute would be executed.
>>
>> But it seems that never happened. Can anyone give me some ideas about it?
>>
>>
>> > ::softint
>> ADDR             PEND PIL ARG1             ARG2 ISR(s)
>> ffffffff8277a5c0    0   1 ffffffff8048da80    0 errorq_intr
>> fffffffffbc05ae8    0   1 0                   0 softlevel1 ------------> no pending count.
>> ffffffff8277a4c0    0   2 ffffffff8048dd00    0 errorq_intr
>> fffffffffbc00070    0   2 0                   0 cbe_low_level
>> ffffffff83a706c0    0   4 ffffffff90004d18    0 ghd_doneq_process
>> ffffffff8ed31880    0   4 ffffffff90004d18    0 ghd_timeout_softintr
>> ffffffff83946e00    0   4 ffffffff8f046c40    0 power_soft_intr
>> ffffffff83a70c00    0   4 ffffffff83b1b000    0 bge_chip_factotum
>> ffffffff83a70cc0    0   4 ffffffff83b1b000    0 bge_reschedule
>> ffffffff8277a2c0    0   4 0                   0 asysoftintr
>> ffffffff82f4d100    0   9 ffffffff82f76370    0 e1000g_tx_softint_worker
>> ffffffff82f4df00    0   9 ffffffff82f86370    0 e1000g_tx_softint_worker
>> ffffffff833b3e80    0   9 ffffffff801af7e8    0 hcdi_soft_intr
>> ffffffff8277a000    0   9 ffffffff801afb68    0 hcdi_soft_intr
>> fffffffffbc00030    0  10 0                   0 cbe_softclock
>>
>>
>> Actually, the system still had 1 idle CPU at that time, so I think it
>> shouldn't have had so many overdue callout entries.
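
Saurabh's point at the top of this message, that a base spl of 6 on CPU 0 also holds off level 1, can be illustrated with a toy model. This is only a sketch of the usual priority-masking rule (an interrupt is taken only when its PIL is above the CPU's current level), not kernel code. The function name can_take_interrupt is hypothetical, and the BSPL and PIL values are read from the ::cpuinfo -v and ::softint output in this message.

```python
def can_take_interrupt(pil, cpu_base_spl):
    """Simplified rule: deliverable only if the PIL is above the base spl."""
    return pil > cpu_base_spl


# BSPL per CPU, as reported by ::cpuinfo -v in this message.
cpu_base_spl = {0: 6, 1: 0}

# A few soft interrupts and their PILs, as reported by ::softint.
softints = {
    "softlevel1": 1,
    "e1000g_tx_softint_worker": 9,
    "cbe_softclock": 10,
}

for cpu, bspl in cpu_base_spl.items():
    for name, pil in softints.items():
        state = "deliverable" if can_take_interrupt(pil, bspl) else "masked"
        print(f"CPU {cpu} (base spl {bspl}): {name} at PIL {pil} is {state}")
```

Under this rule softlevel1 is masked on CPU 0 while the PIL 9 and PIL 10 soft interrupts are not. The model says nothing about which CPU a pending softcall is posted to, which is the open question about CPU 1.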
>>
>> > ::cpuinfo -v
>>  ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
>>   0 fffffffffbc27730  1f    1    6 169   no    no t-3    fffffe80000bfc80 sched
>>                        |    |    |
>>             RUNNING <--+    |    +--> PIL THREAD
>>               READY         |          10 fffffe80000bfc80
>>            QUIESCED         |           6 fffffe80000b9c80
>>              EXISTS         |
>>              ENABLE         +--> PRI THREAD           PROC
>>                                   99 fffffe80000d1c80 sched
>>
>>  ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
>>   1 fffffffffbc2f260  1b    1    0  -1   no    no t-17   fffffe8000401c80 (idle)
>>                        |    |
>>             RUNNING <--+    +--> PRI THREAD           PROC
>>               READY               60 fffffe80044d9c80 sched
>>              EXISTS
>>              ENABLE
>>
>> What else should I check?
>>
> _______________________________________________
> mdb-discuss mailing list
> [EMAIL PROTECTED]
>

--
Cheers,

------------------------------------------------------------
Oliver Yang | Work from office | http://blog.csdn.net/yayong
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
