Hi All,
I have several questions about overdue callout_execute
1. How does the system handle a overdue callout_execute?
2. As far as I know, setrun entry in callout table will be handled as a
softcall, which is a software interrupt(softlevel1) with PIL 1, right?
3. If we found there are so many overdue setrun entries in callout
table, what we should check by using mdb?
Here are background information with my other questions:
I had encountered a system hang issue in my test env and I forced a
crash dump file successfully.
In this crash dump file, I found a e1000g driver thread blocked on
cv_timedwait, but we supposed it should return after 1 tick, but it had
never returned.
> ::callout ! grep fffffe80044e5c80
setrun fffffe80044e5c80 3ffffffffffe1a80
1036d (T-33730) ---> it was overdue about 33730 ticks.
stack pointer for thread fffffe80044e5c80: fffffe80044e5880
[ fffffe80044e5880 _resume_from_idle+0xf8() ]
fffffe80044e58c0 swtch+0x167()
fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388,
1036d)
fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390,
ffffffff82f76388, 1036d)
fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40)
fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180)
fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180)
fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70)
fffffe80044e5b50 runservice+0x42(ffffffff894acb70)
fffffe80044e5b80 queue_service+0x42(ffffffff894acb70)
fffffe80044e5bc0 stream_service+0x73(ffffffff83905740)
fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820)
fffffe80044e5c70 thread_start+8()
After I check the callout table, I found the relevant callout entry for
this thread, and it was overdue for some reason:
> ::callout ! grep fffffe80044e5c80
setrun fffffe80044e5c80 3ffffffffffe1a80
1036d (T-33730) --------> This indicate it is overdue.
I also find about 2573 overdue entries in callout table, it was really a
big number:
> ::callout
FUNCTION ARGUMENT ID TIME
sigalarm2proc ffffffff9569aae0 7fffffffffffc010
144a1 (T-17038) -----> This indicated the it was overdue.
sigalarm2proc ffffffff91bb7510 7fffffffffffe010
14484 (T-17067)
sigalarm2proc ffffffff9569c380 7fffffffffffc020
144a1 (T-17038)
sigalarm2proc ffffffff95428d48 7fffffffffffc030
144a1 (T-17038)
sigalarm2proc ffffffff91bb8db0 7fffffffffffe030
14483 (T-17068)
sigalarm2proc ffffffff9542b238 7fffffffffffc040
144a1 (T-17038)
.....................[snipped]....................................................................
> ::callout ! grep "T-" | wc -l
2573
Why we could find so many overdue entries in callout table?
Since the setrun entry will be processed by softcall, I tried to print
the global softcall list by softhead.
And we did find there are two callout_execute was pending to handled by
softint:
> *softhead::list softcall_t sc_next|::print softcall_t
{
sc_func = callout_execute
sc_arg = 0xffffffff80219000
sc_next = softcalls+0x1290
}
{
sc_func = callout_execute
sc_arg = 0xffffffff80216000
sc_next = 0
}
> softcall_state/J
softcall_state:
softcall_state: 2
#define SOFT_PEND 0x02 /* softcall list needs processing */
But when I checked the softint by following command, I found no pending
count in softlevel1 line, why?
I think the softlevel1 should call into softint(), then, the
callout_execute would be executed.
But it seemed it had never happened, who can give me some ideas about it?
> ::softint
ADDR PEND PIL ARG1 ARG2 ISR(s)
ffffffff8277a5c0 0 1 ffffffff8048da80 0 errorq_intr
fffffffffbc05ae8 0 1 0 0
softlevel1 ------------> no pending count.
ffffffff8277a4c0 0 2 ffffffff8048dd00 0 errorq_intr
fffffffffbc00070 0 2 0 0 cbe_low_level
ffffffff83a706c0 0 4 ffffffff90004d18 0 ghd_doneq_process
ffffffff8ed31880 0 4 ffffffff90004d18 0
ghd_timeout_softintr
ffffffff83946e00 0 4 ffffffff8f046c40 0 power_soft_intr
ffffffff83a70c00 0 4 ffffffff83b1b000 0 bge_chip_factotum
ffffffff83a70cc0 0 4 ffffffff83b1b000 0 bge_reschedule
ffffffff8277a2c0 0 4 0 0 asysoftintr
ffffffff82f4d100 0 9 ffffffff82f76370 0
e1000g_tx_softint_worker
ffffffff82f4df00 0 9 ffffffff82f86370 0
e1000g_tx_softint_worker
ffffffff833b3e80 0 9 ffffffff801af7e8 0 hcdi_soft_intr
ffffffff8277a000 0 9 ffffffff801afb68 0 hcdi_soft_intr
fffffffffbc00030 0 10 0 0 cbe_softclock
Actually, at that time, system still had 1 IDLE CPU, I think the system
shouldn't have so many overdue callout entries.
> ::cpuinfo -v
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH
THREAD PROC
0 fffffffffbc27730 1f 1 6 169 no no t-3
fffffe80000bfc80 sched
| | |
RUNNING <--+ | +--> PIL THREAD
READY | 10 fffffe80000bfc80
QUIESCED | 6 fffffe80000b9c80
EXISTS |
ENABLE +--> PRI THREAD PROC
99 fffffe80000d1c80 sched
ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH
THREAD PROC
1 fffffffffbc2f260 1b 1 0 -1 no no t-17
fffffe8000401c80 (idle)
| |
RUNNING <--+ +--> PRI THREAD PROC
READY 60 fffffe80044d9c80 sched
EXISTS
ENABLE
What else should I check?
--
Cheers,
------------------------------------------------------------
Oliver Yang | Work from office | http://blog.csdn.net/yayong
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code