Re: ULE can leak TDQ_LOCK() if statclock() called outside of critical_enter()

2013-01-20 Thread Konstantin Belousov
On Fri, Jan 18, 2013 at 11:56:27AM -0500, Ryan Stone wrote:
 I have been experiencing occasional deadlocks on FreeBSD 8.2 systems using
 the ULE scheduler.  The root cause in every case has been that ULE's
 TDQ_LOCK for cpu 0 is owned by a thread that is not running.  I have been
 investigating the issue, and I believe that I see the issue.  The problem
 occurs if the interrupt that drives statclock does not call
 critical_enter() upon calling into statclock().  The lapic timer does use
 critical_enter(), so default configurations would not see this.  I have
 local patches to use the RTC to drive statclock, and from a quick reading
 of the eventtimer code in -CURRENT the same thing is possible there.  The
 RTC code does not call statclock within a critical section.  So here's the
 bug:
 
 1) A thread with interrupts enabled, running on CPU 0, with td_owepreempt=1
 and td_critnest=1 calls critical_exit():
 
 void
 critical_exit(void)
 {
// ...
 if (td-td_critnest == 1) {
 td-td_critnest = 0;
 if (td-td_owepreempt  !kdb_active) {
 // Irrelevant bits snipped
 
 2) td_critnest is set to 0, and then the RTC interrupt fires.
 
 3) rtcintr calls into statclock (8.2) or statclock_cnt(head) with
 td_critnest still 0 (on head it goes through the eventtimer code, but it
 ends up in statclock eventually).
 
 4) statclock takes the thread_lock on curthread, then calls sched_clock().
 sched_clock calls sched_balance();
 
 static void
 sched_balance(void)
 {
 // snip...
 tdq = TDQ_SELF();
 TDQ_UNLOCK(tdq);
 sched_balance_group(cpu_top);
 TDQ_LOCK(tdq);
 }
I think that current code deserves at least CRITICAL_ASSERT() on
the entry to sched_balance(). The same assert should be added to
kern_clocksource.c::timercb().

 
 TDQ_UNLOCK does a spinlock_exit which does a critical_exit.  td_critnest
 will be decremented back to 0 and td_owepreempt is still 1, so this
 triggers a preemption.  Note that this TDQ_UNLOCK is (intentionally)
 unlocking the thread_lock done by statclock.
 
 5) thread migrates to any other CPU, call it CPU n.  tdq is now stale.
 TDQ_LOCK takes the lock for CPU 0 (but really it's intending to re-take the
 thread_lock, although a thread_lock() here would be equally incorrect --
 sched_balance's caller is going to be mucking around with the TDQ when
 sched_balance returns).
 
 6) The thread returns to statclock.  statclock does a thread_unlock(). The
 td_lock is TDQ_LOCK(n), which we don't hold.  We mangle the stat of
 TDQ_LOCK(n) and leave TDQ_LOCK(0) held.
 
 
 The simplest solution would be to do a critical_enter() in sched_balance,
 although that would be superfluous in the normal case where the lapic timer
 is driving statclock.  I'm not sure if there's other code in the
 eventtimers infrastructure that's assuming that a preemption or migration
 is impossible while handling an event.  A quick look at kern_clocksource.c
 turns up worrying comments like Handle all events for specified time on
 this CPU and uses of curcpu, so there may well be other issues lurking
 here.
 
 It looks to me that the safest thing to do would be to push the
 critical_enter() into the eventtimer code or even all the way back to the
 interrupt handlers (mirroring what the lapic code already does).

Both atrtc and hpet register the interrupt handler as the filter.
The filters call loop enters critical section around handlers, see
kern_intr.c:intr_event_handle(). At least on HEAD it is so, and I see
the same code in the 8.


pgpL1S4JSDJau.pgp
Description: PGP signature


Re: ULE can leak TDQ_LOCK() if statclock() called outside of critical_enter()

2013-01-20 Thread Ryan Stone
On Sun, Jan 20, 2013 at 5:29 AM, Konstantin Belousov kostik...@gmail.comwrote:

 Both atrtc and hpet register the interrupt handler as the filter.
 The filters call loop enters critical section around handlers, see
 kern_intr.c:intr_event_handle(). At least on HEAD it is so, and I see
 the same code in the 8.


Huh, I missed that.  However, on 8.2 ipi_bitmap_handler does not do a
critical_enter() (while HEAD does), so if CPU 0 gets an IPI_STATCLOCK, we
have my bug.  I have DTrace data (from 8.2) showing a thread entering
sched_switch() from sched_balance() when called through an IPI_STATCLOCK.

I'll poke around some more in HEAD to see if there are any entry points
(maybe on other architectures) that don't do a critical section, and then
add the assertions that you suggested.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


ULE can leak TDQ_LOCK() if statclock() called outside of critical_enter()

2013-01-18 Thread Ryan Stone
I have been experiencing occasional deadlocks on FreeBSD 8.2 systems using
the ULE scheduler.  The root cause in every case has been that ULE's
TDQ_LOCK for cpu 0 is owned by a thread that is not running.  I have been
investigating the issue, and I believe that I see the issue.  The problem
occurs if the interrupt that drives statclock does not call
critical_enter() upon calling into statclock().  The lapic timer does use
critical_enter(), so default configurations would not see this.  I have
local patches to use the RTC to drive statclock, and from a quick reading
of the eventtimer code in -CURRENT the same thing is possible there.  The
RTC code does not call statclock within a critical section.  So here's the
bug:

1) A thread with interrupts enabled, running on CPU 0, with td_owepreempt=1
and td_critnest=1 calls critical_exit():

void
critical_exit(void)
{
   // ...
if (td-td_critnest == 1) {
td-td_critnest = 0;
if (td-td_owepreempt  !kdb_active) {
// Irrelevant bits snipped

2) td_critnest is set to 0, and then the RTC interrupt fires.

3) rtcintr calls into statclock (8.2) or statclock_cnt(head) with
td_critnest still 0 (on head it goes through the eventtimer code, but it
ends up in statclock eventually).

4) statclock takes the thread_lock on curthread, then calls sched_clock().
sched_clock calls sched_balance();

static void
sched_balance(void)
{
// snip...
tdq = TDQ_SELF();
TDQ_UNLOCK(tdq);
sched_balance_group(cpu_top);
TDQ_LOCK(tdq);
}

TDQ_UNLOCK does a spinlock_exit which does a critical_exit.  td_critnest
will be decremented back to 0 and td_owepreempt is still 1, so this
triggers a preemption.  Note that this TDQ_UNLOCK is (intentionally)
unlocking the thread_lock done by statclock.

5) thread migrates to any other CPU, call it CPU n.  tdq is now stale.
TDQ_LOCK takes the lock for CPU 0 (but really it's intending to re-take the
thread_lock, although a thread_lock() here would be equally incorrect --
sched_balance's caller is going to be mucking around with the TDQ when
sched_balance returns).

6) The thread returns to statclock.  statclock does a thread_unlock(). The
td_lock is TDQ_LOCK(n), which we don't hold.  We mangle the stat of
TDQ_LOCK(n) and leave TDQ_LOCK(0) held.


The simplest solution would be to do a critical_enter() in sched_balance,
although that would be superfluous in the normal case where the lapic timer
is driving statclock.  I'm not sure if there's other code in the
eventtimers infrastructure that's assuming that a preemption or migration
is impossible while handling an event.  A quick look at kern_clocksource.c
turns up worrying comments like Handle all events for specified time on
this CPU and uses of curcpu, so there may well be other issues lurking
here.

It looks to me that the safest thing to do would be to push the
critical_enter() into the eventtimer code or even all the way back to the
interrupt handlers (mirroring what the lapic code already does).
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org