Hi
On 05/21/07 19:05, Thomas De Schampheleire wrote:
[cut]
> In parts of our log which 'seem' normal, we see:
> 1. a bunch of lwp_create calls (equal to the number of threads used)
> 2. a lot of calls to dispdeq() (which seems to be the actual execution)
> 3. lwp_exits for each thread (one less than the actual number of
> threads; I suppose the last one is the main thread?)
> 4. proc_exit() of the last thread.
> However, lwp_create doesn't seem to call thread_create, even though
> according to my understanding of the source code it should do that every
> time. Is this correct?

See the bit of lwp_create that begins with this comment:
/*
* Try to reclaim a <lwp,stack> from 'deathrow'
*/
That may return a cached lwp without calling thread_create.
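
To make the effect concrete, here is a toy Python model of "reuse a cached entry before creating a new one". It is not the kernel code: the class name, the list used as deathrow, and the rule that every exiting lwp is cached are all assumptions made for illustration.

```python
class LwpAllocator:
    """Toy model: reuse a cached <lwp, stack> before creating a new thread."""

    def __init__(self):
        self.deathrow = []            # cached entries left behind by exited lwps
        self.thread_create_calls = 0  # how often the slow path actually ran

    def thread_create(self):
        # Stand-in for the real allocation path (hypothetical, simplified).
        self.thread_create_calls += 1
        return {"id": self.thread_create_calls}

    def lwp_create(self):
        if self.deathrow:
            # Reclaimed from the cache: thread_create is never reached.
            return self.deathrow.pop()
        return self.thread_create()

    def lwp_exit(self, lwp):
        # Assumption: every exiting lwp is kept for later reuse.
        self.deathrow.append(lwp)


alloc = LwpAllocator()
first = alloc.lwp_create()   # cache empty, so thread_create runs
alloc.lwp_exit(first)
second = alloc.lwp_create()  # served from the cache
```

In this model the second lwp_create hands back the cached entry, so a trace would show two lwp_create calls but only one thread_create.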

> Furthermore, there appear to be no resume*, swtch, ... functions
> called during dispdeq(). Is this normal? Surely the different threads
> get switched using these functions?

dispdeq just removes the specified thread from the dispatch queue it
is on - the caller may be intending to run it, or perhaps to
requeue it elsewhere (e.g., if offlining a cpu). So look at the
callers of dispdeq.
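
As a rough illustration of why no switch shows up inside dispdeq, here is a toy Python model in which removing a thread from a queue touches only the queue. The class, the per-priority dict, and the boolean return value are conventions of this sketch, not the kernel's data structures.

```python
from collections import deque


class DispQueue:
    """Toy per-cpu dispatch queue: one FIFO of thread names per priority."""

    def __init__(self):
        self.queues = {}  # priority -> deque of thread names

    def enqueue(self, thread, pri):
        self.queues.setdefault(pri, deque()).append(thread)

    def dispdeq(self, thread):
        """Remove thread from whichever queue holds it.

        Only the queue changes here; nothing is switched onto the cpu.
        Returns True if the thread was found, False otherwise.
        """
        for q in self.queues.values():
            if thread in q:
                q.remove(thread)
                return True
        return False


running = "t0"             # whatever is on the cpu stays there
dq = DispQueue()
dq.enqueue("t1", 10)
dq.enqueue("t2", 10)
found = dq.dispdeq("t1")   # t1 leaves the queue; "running" is untouched
```

What happens to t1 next (run it, requeue it elsewhere) is up to the caller, which is why the callers are the place to look.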

> Other questions I have are:
> - On which occasions is preempt() called? For example, in our logs I
> see that it is called after current_thread() and intr_thread(), which
> handle interrupts. But at the moment preempt() is called, the
> interrupt has already been handled, so I don't see why the preempt is
> there.

A running thread being interrupted (by whatever PIL interrupt)
is not the same as being preempted. In the interrupt case
the interrupted thread is not preempted - it remains the
dispatched thread on this cpu (cpu_dispthread in cpu_t) -
but might not be the cpu_thread (current thread) for a short
time (low PIL interrupts change cpu_thread, high PIL interrupts
do not).
Being "preempted" means being kicked off the cpu, as in no longer being
the dispatched thread there - i.e. cpu_dispthread changes.
Generally preemption decisions are made in terms of relative
thread priority, i.e. a higher-priority thread becomes runnable
on this cpu (i.e. is enqueued to a dispatch queue of this cpu
at higher priority than the running thread) and the running thread
is preempted in favour of that.
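
The distinction can be sketched with a toy Python model of the two cpu_t fields named above. Only cpu_thread and cpu_dispthread come from the kernel; the class, the method names, and the string thread names are made up for this illustration, and the real transitions involve far more state.

```python
class Cpu:
    """Toy cpu_t with only the two fields discussed above."""

    def __init__(self, thread):
        self.cpu_thread = thread      # thread currently executing
        self.cpu_dispthread = thread  # thread the dispatcher last put on cpu

    def low_pil_interrupt(self, intr_thread):
        # The handler runs in its own thread, so cpu_thread changes,
        # but the interrupted thread is still the dispatched one.
        interrupted = self.cpu_thread
        self.cpu_thread = intr_thread
        return interrupted

    def interrupt_return(self, interrupted):
        self.cpu_thread = interrupted

    def high_pil_interrupt(self):
        # The handler runs on top of the interrupted thread:
        # neither field changes.
        pass

    def preempt(self, new_thread):
        # Preemption replaces the dispatched thread itself.
        old = self.cpu_dispthread
        self.cpu_thread = new_thread
        self.cpu_dispthread = new_thread
        return old


cpu = Cpu("T")
saved = cpu.low_pil_interrupt("intr")
during_low = (cpu.cpu_thread, cpu.cpu_dispthread)
cpu.interrupt_return(saved)
cpu.high_pil_interrupt()
after_interrupts = (cpu.cpu_thread, cpu.cpu_dispthread)
kicked_off = cpu.preempt("U")
after_preempt = (cpu.cpu_thread, cpu.cpu_dispthread)
```

In this model T stays the dispatched thread through both kinds of interrupt, and only preempt() changes cpu_dispthread.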

> - I also see a lot of sequences of current_thread() followed
> immediately by intr_thread(). Both handle interrupts, only the first
> one uses pinning, right? Do they necessarily follow each other?

intr_thread handles lower PIL (<= 10) interrupts, while current_thread
handles high PIL interrupts (>= 11). The difference is that
low PIL interrupts are handled within their own dedicated thread
(which becomes the cpu_thread for the duration of the interrupt
service) while high PIL interrupts squat on top of the thread
structure of the thread they have interrupted. High PIL
interrupts may not block for this reason: there is no
thread structure to enqueue anywhere.
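
The split can be written down as a small Python helper. The thresholds (<= 10 and >= 11) and the 1 to 15 PIL range are the ones quoted in this thread; the constant name and function names are invented for this sketch.

```python
MAX_THREAD_PIL = 10  # hypothetical name: highest PIL served by an interrupt thread


def handler_for(pil):
    """Return which handler a level interrupt at this PIL is routed to."""
    if not 1 <= pil <= 15:
        raise ValueError("PIL must be in 1..15")
    return "intr_thread" if pil <= MAX_THREAD_PIL else "current_thread"


def may_block(pil):
    """Only handlers with their own thread structure can be enqueued, so only they may block."""
    return handler_for(pil) == "intr_thread"


routing = {pil: handler_for(pil) for pil in (1, 10, 11, 15)}
```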

> - Here's another example sequence which I don't understand:
> cpu0 [ cycle 4959000299L ]: [ magic 5330 ] current_thread(): entry
> cpu0 [ cycle 4959000316L ]: [ magic 5333 ] current_thread(): not interrupting another interrupt
> cpu0 [ cycle 4959000332L ]: [ magic 5334 ] current_thread(): before handler
> cpu0 [ cycle 4959002773L ]: [ magic 5337 ] current_thread(): exit
> cpu0 [ cycle 4959007958L ]: [ magic 5324 ] intr_thread(): exit
> cpu0 [ cycle 4959008316L ]: [ magic 1008 ] preempt() | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4959008548L ]: [ magic 1016 ] setfrontdq(): entry | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4959009366L ]: [ magic 1017 ] setfrontdq(): exit | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4959750279L ]: [ magic 5330 ] current_thread(): entry
> cpu0 [ cycle 4959750296L ]: [ magic 5333 ] current_thread(): not interrupting another interrupt
> cpu0 [ cycle 4959750312L ]: [ magic 5334 ] current_thread(): before handler
> cpu0 [ cycle 4959752520L ]: [ magic 5337 ] current_thread(): exit
> cpu0 [ cycle 4959757705L ]: [ magic 5324 ] intr_thread(): exit
> cpu0 [ cycle 4959758063L ]: [ magic 1008 ] preempt() | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4959758295L ]: [ magic 1016 ] setfrontdq(): entry | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4959758996L ]: [ magic 1017 ] setfrontdq(): exit | pid 100006 [tid 7 | threadp 0x 30002dc3720]
> cpu0 [ cycle 4960500279L ]: [ magic 5330 ] current_thread(): entry
> cpu0 [ cycle 4960500296L ]: [ magic 5333 ] current_thread(): not interrupting another interrupt
> cpu0 [ cycle 4960500312L ]: [ magic 5334 ] current_thread(): before handler
> cpu0 [ cycle 4960502520L ]: [ magic 5337 ] current_thread(): exit
> cpu0 [ cycle 4960507705L ]: [ magic 5324 ] intr_thread(): exit
> ............. (this goes on for a long time)..........
> First, how come there is only an exit of intr_thread(), even though I
> have put a magic instruction at the entry as well (and I have seen that
> instruction before)? Could this be due to a massive interrupting of
> interrupts?

Not sure what you mean. But from my recollection I think I'd always
expect to see intr_thread entry and exit in pairs. When the level
interrupt (pil 1 to 15) fires we land up in pil_interrupt() which
decides where we go from there: intr_thread or current_thread;
we get to both via sys_trap which takes us from traplevel 1 back
to traplevel 0 running at a specified pil and in a chosen handler
and with return linkage to the interrupted thread (*_rtt).
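
If it helps to check the pairing mechanically, below is a small Python sketch that scans a log in the format quoted above and reports every exit record with no preceding entry on the same cpu for the same function. The regular expression is inferred from the quoted lines only (with wrapped lines rejoined), and the function name is made up; it says nothing about why an entry record might be missing.

```python
import re
from collections import defaultdict

# Matches e.g. "cpu0 [ cycle 4959000299L ]: [ magic 5330 ] current_thread(): entry"
LINE_RE = re.compile(
    r"^cpu(\d+) \[ cycle (\d+)L \]: \[ magic \d+ \] (\w+)\(\)(?:: (\w+))?"
)


def unmatched_exits(lines):
    """Return (cycle, function) for each exit seen without an open entry."""
    depth = defaultdict(int)  # (cpu, function) -> currently open entries
    unmatched = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a log record in the expected format
        cpu, cycle, func, event = m.groups()
        key = (cpu, func)
        if event == "entry":
            depth[key] += 1
        elif event == "exit":
            if depth[key] == 0:
                unmatched.append((int(cycle), func))
            else:
                depth[key] -= 1
    return unmatched


sample = [
    "cpu0 [ cycle 4959000299L ]: [ magic 5330 ] current_thread(): entry",
    "cpu0 [ cycle 4959000332L ]: [ magic 5334 ] current_thread(): before handler",
    "cpu0 [ cycle 4959002773L ]: [ magic 5337 ] current_thread(): exit",
    "cpu0 [ cycle 4959007958L ]: [ magic 5324 ] intr_thread(): exit",
]
result = unmatched_exits(sample)
```

On these four quoted lines the only unmatched record is the intr_thread exit at cycle 4959007958, which is the oddity being asked about.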

> Second: thread 100006:7 seems to be preempted, but is then immediately
> put at the front of the queue. Why then the preempt? Or is the
> preempt called anyway after an interrupt? Or is preempt called anyway
> after each OS tick (10 ms), even without an interrupt?

As above, "preempt" != "interrupt"

> I have looked in the source code, but the kernel and especially the
> scheduler are quite complex, so it's easy to overlook things (at least
> for me).

It's a thing of beauty, but takes a lot of staring at before it all
makes sense; I know more than most about the dispatcher/scheduler but I'm
still way short of expert. A simulator is a great place to study parts of it,
but I'd also encourage using dtrace on a live system for exploring
the code and its logic.
Gavin