Re: SCHED_ASSERT_UNLOCKED is considered harmful in the _kernel_lock()

Mike Belopuhov Mon, 08 Nov 2010 01:57:09 -0800

On Mon, Nov 8, 2010 at 1:01 AM, Philip Guenther <[email protected]> wrote:
> On Sunday, November 7, 2010, Mark Kettenis <[email protected]> wrote:
>>> Date: Fri, 5 Nov 2010 17:52:23 +0100
>>> From: Mike Belopuhov <[email protected]>
>>
>> Mike, you might want to take a look at PR 6508.  I think the
>> "sched_lock" panic:
>>
>>> ddb{0}> show panic
>>> kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" failed:
>>
>> is actually a side effect of trapping in the middle of a context
>> switch when we're doing the sched_lock/kernel_lock dance.  In PR 6508
>> it is almost certainly a page fault that happened because we
>> overflowed the stack.  That also might be the cause of your panic.  At
>> least judging from the traceback, your stack is seriously hosed:
>
> Since i386 and amd64 put the kernel stack above the PCB, stack overrun
> means the PCB has already been overwritten.  At that point, trying to
> save the process context will probably blow up trying to follow some
> pointer therein.
>
> I had chatted some with Theo about putting a guard page below the
> kernel stack to catch this sort of thing.  Would want to move the PCB
> to above the stack at the same time to save most of a page.  Would the
> result help isolate these problems enough to be worth the effort?
>


indeed.  most of the time you don't get a nice stack trace like the one
in pr 6508, and most of the time these panics are not reproducible.  so
i see real value in having this sort of early detection.
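
to illustrate the guard page idea, here is a minimal userland sketch.
it is not kernel code and not a proposed diff: the names
(guarded_stack, guarded_stack_alloc, guarded_stack_overrun) are made up
for this example, and it assumes a system with mmap(2), mprotect(2) and
MAP_ANON.  the lowest page of the mapping is made inaccessible, so a
stack growing down past its limit faults right away instead of silently
overwriting whatever sits below it.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userland stand-in for a stack with a guard page below it. */
struct guarded_stack {
	void	*base;		/* start of mapping, i.e. the guard page */
	void	*low;		/* lowest usable stack address */
	void	*top;		/* one past the highest usable address */
	size_t	 maplen;	/* total mapping length, guard included */
};

/* Map npages of stack plus one inaccessible guard page underneath. */
int
guarded_stack_alloc(struct guarded_stack *gs, size_t npages)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t pg, len;
	void *p;

	if (pgsz <= 0 || npages == 0)
		return -1;
	pg = (size_t)pgsz;
	if (npages > SIZE_MAX / pg - 1)
		return -1;	/* (npages + 1) * pg would overflow */
	len = (npages + 1) * pg;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Any access to the lowest page now faults immediately. */
	if (mprotect(p, pg, PROT_NONE) == -1) {
		munmap(p, len);
		return -1;
	}

	gs->base = p;
	gs->low = (char *)p + pg;
	gs->top = (char *)p + len;
	gs->maplen = len;
	return 0;
}

/* Return 1 if addr lies in the guard page, i.e. the stack was overrun. */
int
guarded_stack_overrun(const struct guarded_stack *gs, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	return a >= (uintptr_t)gs->base && a < (uintptr_t)gs->low;
}

void
guarded_stack_free(struct guarded_stack *gs)
{
	munmap(gs->base, gs->maplen);
}
```

a fault handler could call guarded_stack_overrun() on the faulting
address to tell "ran off the end of the stack" apart from any other
fault, which is the kind of early detection being asked for here.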

>
> Philip Guenther
