On Mon, Nov 8, 2010 at 1:01 AM, Philip Guenther wrote:
> On Sunday, November 7, 2010, Mark Kettenis wrote:
>>> Date: Fri, 5 Nov 2010 17:52:23 +0100
>>> From: Mike Belopuhov
>>
>> Mike, you might want to take a look at PR 6508. I think the
>> "sched_lock" panic:
>>
>>> ddb{0}> show panic
>>> kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed:
>>
>> is actually a side effect of trapping in the middle of a context
>> switch when we're doing the sched_lock/kernel_lock dance. In PR 6508
>> it is almost certainly a page fault that happened because we
>> overflowed the stack. That also might be the cause of your panic. At
>> least judging from the traceback, your stack is seriously hosed:
>
> Since i386 and amd64 put the kernel stack above the PCB, stack overrun
> means the PCB has already been overwritten. At that point, trying to
> save the process context will probably blow up trying to follow some
> pointer therein.
>
> I had chatted some with Theo about putting a guard page below the
> kernel stack to catch this sort of thing. Would want to move the PCB
> to above the stack at the same time to save most of a page. Would the
> result help isolate these problems enough to be worth the effort?
>

Indeed. Most of the time you don't get a nice stack trace like the one
in PR 6508, and most of the time these panics are not reproducible. So
I see real value in having this sort of early detection.

>
> Philip Guenther
