On Tue, 19 Oct 2010, Alexander Motin wrote:

Marius Strobl wrote:
On Mon, Oct 18, 2010 at 05:05:24PM -0400, John Baldwin wrote:
On Monday, October 18, 2010 4:52:24 pm Marius Strobl wrote:
AFAICT this is not true; intr_event_handle() in sys/kern/kern_intr.c
is what enters a critical section and f.e. on amd64 I don't see where
anywhere in the path from ISR_VEC() to intr_execute_handlers()
calling intr_event_handle() a critical section would be entered,
which also means that in intr_execute_handlers() td_intr_nesting_level
is incremented outside of a critical section.
Not all of the clock interrupts use intr_event_handle().  The local APIC
timer uses its own interrupt entry point on x86 for example and uses an
explicit critical section as a result.  I suspect the sparc64 tick interrupt
is closer to the local APIC timer case and doesn't use intr_event_handle().

Correct; but still you can't say that the MD interrupt code enters a
critical section in general, neither is incrementing td_intr_nesting_level
in intr_execute_handlers() protected by a critical section.

The fact that some clock interrupts do use intr_event_handle() (e.g. the
atrtc driver on x86 now) does indicate that the low-level interrupt code
probably does not belong in the time events code but in the caller.

Well, I agree that entering a critical section in the time events
code would mean entering a nested critical section unnecessarily in
case the clock driver uses a regular "fast" interrupt handler and
that should be avoided. Still I don't think the event time front-end
actually should need to worry about wrapping the callback in a
critical section.

I think it belongs in the caller, and the front end shouldn't do any more
than assert it.

Interrupt frame, required for hard-/stat-/profclock() operation is
stored in curthread. So critical section is effectively mandatory there
now. Correct td_intr_nesting_level value is also important for proper
interrupt threads scheduling - one more reason to have critical section
there. It is indeed strange that td_intr_nesting_level in
intr_event_handle() is not covered by critical section, but probably it
should.

td_intr_nesting_level is garbage, as is the style of the use of it in
sched_ule.c (non-boolean used in boolean context).  I introduced it
(as a global named intr_nesting_level) to count interrupt nesting with
spls.  It was an i386 implemention detail for limiting interrupt nesting
to about 4, but a few places abused it to detect things like M_WAITOK
malloc()s in interrupt context.  There was 1 non-abusive MI reference
to it, via the MD CLKF_INTR(framep) macro, and the most of the abuses
wouldn't have been abuses if there had been a MD macro for them.  See
RELENG_4 for all this.  (spls allow interrupt nesting to either the
number of bits in the mask or the number of high-level spls, but the
kernel stack is not large enough for this, so there was an even more
arbitary limit of about 4.)

Now with ithreads for interrupts, interrupts can't nest (an interrupt
can interrupt an ithread, but that is no more nesting than an interrupt
interrupting a user thread).  td_intr_nesting_level is 0 in ithreads,
so it is almost useless for its original abuse for detecting things
like M_WAITOK malloc()s in interrupt context, but it is still abused
for that :-(.  td_intr_nesting level is != 0 mainly in low-level interrupt
handling code and fast interrupt handlers, and neither of these should
be so broken as to call malloc(), so the check in malloc() is useless;
the correct check (for TDP_ITHREAD) is missing.  Similar checks in
uipc_mbuf.c were just removed, and similar but more-bogus checks in
vinum went away with vinum.

This leaves 3 possibly-useful uses of td_intr_nesting_level in -current:

- {amd64,i386}/trap.c: td_intr_nesting_level  is checked to be zero to
  makes certain traps more fatal when it is nonzero.  Other arches
  don't bother with such a test.  amd64 and i386 don't bother with
  testing the more likely error of a trap that should be fatal in an
  ithread.  This can be handled in the same way as in malloc() -- don't
  bother testing for the very unlikely case, especially if you don't
  bother testing for the unlikely case.

- kern_clock.c: this tests for td_intr_nesting_level >= 2.  That is
  almost exactly the test that used to be hidden in the MD CLKF_INTR().
  I used to maintain patches with an assertion that this never happens
  with ithreads.  It never happens unless there is a bug in the low-level
  interrupt handling code or in the management of td_intr_nesting_level.
  On entry to an interrupt, td_intr_nesting level is incremented to
  1, and another thread should be switched to without allowing any
  interrupt that will increment it again; then on switch back, interrupts
  should remain masked so that it is not incremented from 1 to 2 just
  before it is decremented to 0.  I would have removed td_intr_nesting_level
  from my version, except my version actually needs it, for this use
  in hardclock() only :-(.  In my version, the low-level interrupt
  masking is partially in software, so clock interrupts can interrupt
  the low-level interrupt handling.  I just remembered that sparce64
  also has or had some interrupt masking in software; perhaps it needs
  a similar treatment.

- sched_ule.c: releatively recently, this scheduler started using
  td_intr_nesting_level.  More recently td_intr_nesting_level was
  incremented in clock code to keep this scheduler happy, so something
  is clearly needed.  But sched_4bsd.c doesn't need it, and the
  increment being outside of the critical section in
  intr_execute_handlers() shows how uncritical this variable is.
  Perhaps sched_ule.c can use td_critnest instead.  You have to be
  carefully with not calling critical_enter() too much, so that
  the scheduler cannot tell the real level.  Levels near mi_switch()
  have to have levels of precisely 0, 1 or 2, so that the level
  drops to precisely 0 when the switch is complete, and this is related
  to deciding whether to schedule.

and 2 other nonsense abuses:
- dev/sound/isa/sb16.c: abuses td_intr_nesting_level to avoid doing a
  diagnostic interrupt if (td_intr_nesting_level != 0).  I think this
  has no effect, as for malloc() except it is easier to inspect the
  code to see that it cannot be reached in fast interrupt handler
  context, so the check is null.
- vm_page.c: exactly the same as in malloc(), with a check for unlikely
  fast interrupt handler context but none for ordinary ithread context.

Most arch-specific code that manages td_intr_nesting_level does it in a
critical section.  powerpc also uses atomic ops.

I only grepped for td_intr_nesting_level, and would have missed any
different spelling of it in asm code where most of the original i386
code for it was.

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to