Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-17 Thread John Baldwin
On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote: > > > The crash was a "page fault while in kernel mode" with the current process > > > being the interrupt service routine for the bce0 GigE. Things progressed > > > reasonably until partway through the dump, when the system locked up with > >

Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-17 Thread John Baldwin
On Friday 14 May 2010 11:42:44 am Matthew Fleming wrote: > > As an aside, this is a quad-core in one package CPU (an X3363). On both > > this box and a similar one with an X5470, console messages continue to > > print out after "the system has been halted - press any key to reboot" - > > in parti

RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy
> Oops, you're right that other CPUs are running. > > The stop_cpus() call is only made if kdb is entered. doadump() is called > out of boot() which comes later. At Isilon we've been running with a patch > that does stop_cpus() pretty close to the front of panic(9). This is interesting, and ch

Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Jacob
Matthew Fleming wrote: As an aside, this is a quad-core in one package CPU (an X3363). On both this box and a similar one with an X5470, console messages continue to print out after "the system has been halted - press any key to reboot" - in particular, the shutdown makes a bunch of the "beh

RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Fleming
> As an aside, this is a quad-core in one package CPU (an X3363). On both > this box and a similar one with an X5470, console messages continue to > print out after "the system has been halted - press any key to reboot" - > in particular, the shutdown makes a bunch of the "behind the scenes" man-

Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Jeremy Chadwick
On Fri, May 14, 2010 at 09:56:47AM -0400, Terry Kennedy wrote: > As an aside, this is a quad-core in one package CPU (an X3363). On both > this box and a similar one with an X5470, console messages continue to > print out after "the system has been halted - press any key to reboot" - > in particu

RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy
> > Hmm. You could try changing the code to not do a nested panic in that > > case. You would update subr_turnstile.c to just return if panicstr is > > not NULL rather than calling panic. However, there is still a good > > chance you will end up deadlocking in that case. I have another patch I

RE: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Matthew Fleming
> > The crash was a "page fault while in kernel mode" with the current process > > being the interrupt service routine for the bce0 GigE. Things progressed > > reasonably until partway through the dump, when the system locked up with a > > "Sleeping thread (tid 100028, pid 12) owns a non-sleepab

Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread Terry Kennedy
> The crash was a "page fault while in kernel mode" with the current process > being the interrupt service routine for the bce0 GigE. Things progressed > reasonably until partway through the dump, when the system locked up with a > "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock".

Re: Crash dump problem - sleeping thread owns a non-sleepable lock during crash dump write

2010-05-14 Thread John Baldwin
Terry Kennedy wrote: I'm reposting this over here at the suggestion of the Forums moderator. The original post is at http://forums.freebsd.org/showthread.php?t=14163 Got an interesting crash just now (well, as interesting as a crash on a soon-to-be production system can be). This is 8-STABL