
Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton


On 08/20/11 19:19, Steven Hartland wrote:

- Original Message - From: Andriy Gapon a...@freebsd.org


on 20/08/2011 23:24 Steven Hartland said the following:

- Original Message - From: Steven Hartland

Looking through the code I believe I may have noticed a scenario which
could trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

Take a scenario of a simple one-level prison setup running a single
process, where the prison has just been stopped.

In the above code pr_uref of the process's prison is decremented. As this
is the last process, pr_uref will hit 0 and the loop continues instead of
breaking early.

Now at the end of the loop iteration the mtx is unlocked so other
processes can now manipulate the jail; this is where I think the problem
may be.

If we now have another process come in and attach to the jail but then
instantly exit, this process may allow another kernel thread to hit this
same bit of code, and so two processes for the same prison get into the
section which decrements prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref
is happening too early, while the jail can still be used and without the
lock on the child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref
down so it only takes place if the jail is really being removed. Either
that or change the locking semantics so that once the lock is acquired
in prison_deref it's not unlocked until the function completes.

What do people think?


After reviewing the changes to prison_deref in the commit which added
hierarchical jails, the removal of the lock by the initial loop on the
passed-in prison may be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h



If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
             if (--tpr->pr_uref > 0)
                 break;
             KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-            mtx_unlock(&tpr->pr_mtx);
+            if (tpr != pr)
+                mtx_unlock(&tpr->pr_mtx);
         }
         /* Done if there were only user references to remove. */
         if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block
where pr->pr_mtx is re-locked.


You're right, and it's actually more complex than that. Although changing
it to not unlock in the middle of prison_deref fixes that race condition,
it doesn't prevent pr_uref being incorrectly decremented each time the
jail gets into the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem
where the counters of all prisons in the hierarchy are decremented, but as
far as I can tell only the immediate parent is ever incremented, so there
is another reference problem there as well, I think.

The following patch I believe fixes both of these issues.

I've tested with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits the dying state multiple times.

It essentially reverts the changes to the if (flags & PD_DEUREF) block made
by 192895 and moves it to after the jail has actually been removed.

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-21 01:56:58.429894825 +0100
@@ -2449,27 +2449,16 @@
         mtx_lock(&pr->pr_mtx);
     /* Decrement the user references in a separate loop. */
     if (flags & PD_DEUREF) {
-        for (tpr = pr;; tpr = tpr->pr_parent)

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Steven Hartland

- Original Message - 
From: Jamie Gritton ja...@freebsd.org

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)


First off, thanks for the feedback Jamie, most appreciated :)


The problem isn't with the conditional locking of tpr in prison_deref.
That locking is actually correct, and there's no race condition.


Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that the pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at any one time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.


The trouble lies in the resurrection of dead jails, as Andriy has noted
(though not just attaching, but also by setting its persist flag causes
the same problem).


I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment so would
never hit this case unless they have been actively removed and hence
unpersisted first.



There are two possible fixes to this. One is the patch you've given,
which only decrements a parent jail's pr_uref when the child jail
completely goes away (as opposed to when it loses its last uref). This
provides symmetry with the current way pr_uref is incremented on the
parent, which is only when a jail is created.

The other fix is to increment a parent's pr_uref when a jail is
resurrected, which will match the current logic in prison_deref. I like
the external semantics of this solution: a jail isn't visible if it is
not persistent and has no processes and no *visible* sub-jails, as
opposed to having no sub-jails at all. But this solution ends up pretty
complicated - there are a few places where pr_uref is incremented, where
I might need to increment parent jails' pr_uref as well, much like the
current tpr loop in prison_deref decrements them.


Ahh yes, in the hierarchical case my patch would indeed mean that
non-persistent parent jails would remain visible even when their last
child jail is in a dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref should the prison's pr_uref be 0
when called.


Your solution removes code instead of adding it, which is generally a
good thing. While it does change the semantics of pr_uref in the
hierarchical case at least from what I thought it was, those semantics
haven't been working properly anyway.


Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)


Bjoern, I'm adding you to the CC list for this because the whole pr_uref
thing was your idea (though it was pr_nprocs at the time), so you might
care about the hierarchical semantics of it - or you may not. Also, this
is a panic-inducing bug in current and may interest you for that reason.



From an admin perspective the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop
and it appears to have completed that request, but really hasn't,
generally due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse
where a whole series of jails could disappear from normal view only to
be resurrected shortly after. Something to bear in mind when deciding
which solution of the two presented to use.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Jamie Gritton


On 08/21/11 05:01, Steven Hartland wrote:

- Original Message - From: Jamie Gritton ja...@freebsd.org

The problem isn't with the conditional locking of tpr in prison_deref.
That locking is actually correct, and there's no race condition.


Are you sure? I do think that unlocking the mtx halfway through the
call allows the above scenario to create a race condition, albeit
very briefly, when ignoring the overriding issue.

In addition, if the code were changed so that the pr_uref++ also
maintained the parent's uref, this would definitely lead to potential
problems in my mind, especially if you had more than one child prison
of a given parent entering the dying state at any one time.

In this case I believe you would have to acquire the locks of all
the parent prisons before it would be safe to proceed.


Lock order requires that I unlock the child if I want to lock the
parent. While that does allow periods where neither is locked, it's safe
in this case. There may be multiple processes dying in one jail, or in
multiple children of a single jail. But as long as a parent jail is
locked while decrementing pr_uref, then only one of these simultaneous
prison_deref calls would set pr_uref to zero and continue in the loop to
that prison's parent. This might be mixed with pr_uref being incremented
elsewhere, but that's not a problem either as long as the jail in
question is locked.


The trouble lies in the resurrection of dead jails, as Andriy has noted
(though not just attaching, but also by setting its persist flag causes
the same problem).


I'm not sure that persistent prisons actually suffer from this in any
different way tbh, as they have an additional uref increment so would
never hit this case unless they have been actively removed and hence
unpersisted first.


Right - both the attach and persist cases are only a problem when a jail
has disappeared. There are various ways for a jail to be removed,
potentially to be kept around but in the dying state, but only two
related ways for it to be resurrected: attaching a new process or
setting the persist flag, both via jail_set with the JAIL_DYING flag passed.


There are two possible fixes to this. One is the patch you've given,
which only decrements a parent jail's pr_uref when the child jail
completely goes away (as opposed to when it loses its last uref). This
provides symmetry with the current way pr_uref is incremented on the
parent, which is only when a jail is created.

The other fix is to increment a parent's pr_uref when a jail is
resurrected, which will match the current logic in prison_deref. I like
the external semantics of this solution: a jail isn't visible if it is
not persistent and has no processes and no *visible* sub-jails, as
opposed to having no sub-jails at all. But this solution ends up pretty
complicated - there are a few places where pr_uref is incremented, where
I might need to increment parent jails' pr_uref as well, much like the
current tpr loop in prison_deref decrements them.


Ahh yes, in the hierarchical case my patch would indeed mean that
non-persistent parent jails would remain visible even when their last
child jail is in a dying state.

As you say, making this not the case would likely require replacing all
instances of pr_uref++ with a prison_uref method that implements the
opposite of the loop in prison_deref should the prison's pr_uref be 0
when called.


Yes, that's the problem. Maybe not all instances, but at least most have
enough times a jail is unlocked that we can't assume the pr_uref hasn't
been set to zero somewhere else, and so we need to do that loop.
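The "opposite of the loop" idea can be sketched in user space. This is only a model under stated assumptions: struct toy_prison, toy_uref_drop() and toy_uref_hold() are made-up names, not kernel interfaces, and all locking is left out (real code would still have to take each prison's mutex in the lock order described earlier in the thread).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct prison; only the fields needed here. */
struct toy_prison {
    int uref;                   /* models pr_uref */
    struct toy_prison *parent;  /* models pr_parent, NULL above prison0 */
};

/* Models the PD_DEUREF loop: a count reaching zero also drops the parent. */
void
toy_uref_drop(struct toy_prison *pr)
{
    struct toy_prison *tpr;

    for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
        if (--tpr->uref > 0)
            break;
    }
}

/*
 * Hypothetical prison_uref-style helper: the mirror image of the loop
 * above. A count going from 0 to 1 means the jail is being resurrected,
 * so the parent regains the reference it lost earlier.
 */
void
toy_uref_hold(struct toy_prison *pr)
{
    struct toy_prison *tpr;

    for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
        if (tpr->uref++ > 0)
            break;
    }
}

/*
 * Repeatedly let the last process exit and then resurrect the jail,
 * returning the prison0 count afterwards.
 */
int
toy_prison0_uref_balanced(int nonjailed_procs, int cycles)
{
    struct toy_prison prison0 = { nonjailed_procs + 1, NULL };
    struct toy_prison jail = { 1, &prison0 };
    int i;

    for (i = 0; i < cycles; i++) {
        toy_uref_drop(&jail);
        assert(jail.uref == 0);
        toy_uref_hold(&jail);
    }
    return (prison0.uref);
}
```

With the hold and drop loops paired like this, the model's prison0 count stays at nonjailed_procs + 1 however many times the jail is resurrected.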


Your solution removes code instead of adding it, which is generally a
good thing. While it does change the semantics of pr_uref in the
hierarchical case at least from what I thought it was, those semantics
haven't been working properly anyway.


Good to know my interpretation was correct, even if I was missing the
visibility factor in the hierarchical case :)


Bjoern, I'm adding you to the CC list for this because the whole pr_uref
thing was your idea (though it was pr_nprocs at the time), so you might
care about the hierarchical semantics of it - or you may not. Also, this
is a panic-inducing bug in current and may interest you for that reason.


From an admin perspective the current jail dying state does cause
confusion when you're not aware of its existence. You ask a jail to stop
and it appears to have completed that request, but really hasn't,
generally due to just a lingering TCP connection.

With the introduction of hierarchical jails that gets a little worse
where a whole series of jails could disappear from normal view only to
be resurrected shortly after. Something to bear in mind when deciding
which solution of the two presented to use.


The good news is that a jail (or perhaps a whole set of
jails) can only come back from the dead when the administrator makes a
concerted effort to do so. So it at least shouldn't surprise the

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Roger Marquis


On Sat, 20 Aug 2011, Steven Hartland wrote:

Are you seeing a double fault panic?


We're seeing both.  At least one double (or more) fault finishing with
Fatal Trap 12: page fault while in kernel mode.  Subsequent panics have
been single fault (all visible on the IPMI console) Fatal Trap 9:
general protection fault while in kernel mode.

Could well be unrelated.  The system is undergoing hardware diags now.

Roger Marquis

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon

on 18/08/2011 02:15 Steven Hartland said the following:
 In a nutshell the jail manager we're using will attempt to resurrect the jail
 from a dying state in a few specific scenarios.
 
 Here's an example:-
 1. jail restart requested
 2. jail is stopped, so the java process is killed off, but active tcp
 sessions may prevent the timely full shutdown of the jail.
 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
 starting a new jail we attach to the old one and exec the new java process.
 4. if an existing jail isn't detected, i.e. where there were no hanging tcp
 sessions and #2 cleanly shut down the jail, a new jail is created, attached to
 and the java exec'ed.
 
 The system uses static jailids so it's possible to determine if an existing
 jail for this service exists or not. This prevents duplicate services as
 well as making services easy to identify by their jailid.
 
 So what we could be seeing is a race between the jail shutdown and the attach
 of the new process?

Not a jail expert at all, but a few suggestions...

First, wouldn't the 'persist' jail option simplify your life a little bit?

Second, you may want to try to monitor value of prison0.pr_uref variable (e.g.
via kgdb) while executing various scenarios of what you do now.  If after
finishing a certain scenario you end up with a value lower than at the start of
scenario, then this is the troublesome one.
Please note that prison0.pr_uref is composed from a number of non-jailed
processes plus a number of top-level jails.  So take this into account when
comparing prison0.pr_uref values - it's better to record the initial value when
no jails are started and it's important to keep the number of non-jailed
processes the same (or to account for its changes).
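The comparison suggested above comes down to simple arithmetic. A minimal sketch, assuming prison0.pr_uref really is just the sum described here (one reference per non-jailed process plus one per top-level jail); the function names are invented for illustration and the inputs would have to be collected by hand, e.g. from a process listing and kgdb:

```c
#include <assert.h>

/*
 * Illustrative helper, not kernel code: the value prison0.pr_uref should
 * have if it is one reference per non-jailed process plus one per
 * top-level jail.
 */
int
expected_prison0_uref(int nonjailed_procs, int toplevel_jails)
{
    assert(nonjailed_procs >= 0 && toplevel_jails >= 0);
    return (nonjailed_procs + toplevel_jails);
}

/*
 * Number of references that appear to have leaked: positive when the
 * observed prison0.pr_uref is lower than the accounting says it should be.
 */
int
prison0_uref_deficit(int observed_uref, int nonjailed_procs,
    int toplevel_jails)
{
    return (expected_prison0_uref(nonjailed_procs, toplevel_jails) -
        observed_uref);
}
```

A deficit that grows after a particular scenario would point at that scenario as the troublesome one.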

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon

on 20/08/2011 13:02 Andriy Gapon said the following:
 on 18/08/2011 02:15 Steven Hartland said the following:
 In a nutshell the jail manager we're using will attempt to resurrect the jail
 from a dying state in a few specific scenarios.

 Here's an example:-
 1. jail restart requested
 2. jail is stopped, so the java process is killed off, but active tcp
 sessions may prevent the timely full shutdown of the jail.
 3. if an existing jail is detected, i.e. a dying jail from #2, instead of
 starting a new jail we attach to the old one and exec the new java process.
 4. if an existing jail isn't detected, i.e. where there were no hanging tcp
 sessions and #2 cleanly shut down the jail, a new jail is created, attached to
 and the java exec'ed.

 The system uses static jailids so it's possible to determine if an existing
 jail for this service exists or not. This prevents duplicate services as
 well as making services easy to identify by their jailid.

 So what we could be seeing is a race between the jail shutdown and the attach
 of the new process?
 
 Not a jail expert at all, but a few suggestions...
 
 First, wouldn't the 'persist' jail option simplify your life a little bit?
 
 Second, you may want to try to monitor value of prison0.pr_uref variable (e.g.
 via kgdb) while executing various scenarios of what you do now.  If after
 finishing a certain scenario you end up with a value lower than at the start 
 of
 scenario, then this is the troublesome one.
 Please note that prison0.pr_uref is composed from a number of non-jailed
 processes plus a number of top-level jails.  So take this into account when
 comparing prison0.pr_uref values - it's better to record the initial value 
 when
 no jails are started and it's important to keep the number of non-jailed
 processes the same (or to account for its changes).

BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.
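The suspected sequence can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the kernel code: struct toy_prison and the toy_* functions are invented names, locking is omitted, and the loop stops at a NULL parent where the kernel would instead hit its KASSERT on prison0.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct prison; only the fields needed here. */
struct toy_prison {
    int uref;                   /* models pr_uref */
    struct toy_prison *parent;  /* models pr_parent, NULL above prison0 */
};

/*
 * Models the user-reference loop in prison_deref(): drop one reference
 * and, whenever a count reaches zero, carry on to the parent.
 */
void
toy_deuref(struct toy_prison *pr)
{
    struct toy_prison *tpr;

    for (tpr = pr; tpr != NULL; tpr = tpr->parent) {
        if (--tpr->uref > 0)
            break;
    }
}

/*
 * Models attaching a process to a jail: only the jail's own count is
 * raised, even when that brings a dying jail back from zero.
 */
void
toy_attach(struct toy_prison *pr)
{
    assert(pr->uref >= 0);
    pr->uref++;
}

/*
 * Run `cycles` rounds of "last process exits, then a new process attaches
 * to the dying jail" and return the resulting prison0 count.
 */
int
toy_prison0_uref_after(int nonjailed_procs, int cycles)
{
    struct toy_prison prison0 = { nonjailed_procs + 1, NULL };
    struct toy_prison jail = { 1, &prison0 };
    int i;

    for (i = 0; i < cycles; i++) {
        toy_deuref(&jail);  /* jail 1 -> 0, prison0 loses a reference */
        toy_attach(&jail);  /* jail 0 -> 1, prison0 is not touched */
    }
    return (prison0.uref);
}
```

In this model, five non-jailed processes and one jail start prison0 at 6; each stop-and-resurrect cycle leaks one reference, so the count reaches zero on the sixth cycle although nothing has really gone away.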

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



BTW, I suspect the following scenario, but I am not able to
verify it either via testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


Ahh now that explains all of our experienced panic scenarios:-
1. jail stop / start causing the panic but only after at least a
few days' worth of uptime.

Here what we're seeing is enough leakage of pr_uref from the restarted
jails to decrement prison0.pr_uref to 0 even with all the standard
unjailed processes still running.

2. A machine reboot, after all jails have been stopped but after
less time than #1.

In this case we haven't seen enough leakage to decrement
prison0.pr_uref to 0 given the number of prison0 processes, but
it has been incorrectly decremented, so as soon as the reboot kicks
in and prison0 processes start exiting, prison0.pr_uref gets
further decremented and again hits 0 when it shouldn't.


Now if this is the case, we should be able to confirm it with a little
more info.

1. What exactly does pr_uref represent?
2. Can the value it should have be calculated by examining other
details of the system, i.e. number of running processes, number of
running jails?

If we can calculate the value that prison0.pr_uref should be, then
by examining the machines we have which have been up for a while,
we should be able to confirm if an incorrect value is present on
them and hence prove this is the case.

Ideally a little script to run in kgdb to test this would be the
best way to go.

   Regards
   Steve




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



BTW, I suspect the following scenario, but I am not able to verify it either via
testing or in the code:
- last process in a dying jail exits
- pr_uref of the jail reaches zero
- pr_uref of prison0 gets decremented
- you attach to the jail and resurrect it
- but pr_uref of prison0 stays decremented

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


I've just checked across a number of the panic dumps from the
past few days and they all have prison0.pr_uref = 0 which confirms
the cause of the panic.

I've tried scripting continuous jail start / stops, but even after 1000s
of iterations have been unable to trigger this on my test machine, so
I'm going to dig into the jail code to see if I can find out how it's
incorrectly decrementing prison0 via inspection.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon

on 20/08/2011 18:51 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 BTW, I suspect the following scenario, but I am not able to verify it either 
 via
 testing or in the code:
 - last process in a dying jail exits
 - pr_uref of the jail reaches zero
 - pr_uref of prison0 gets decremented
 - you attach to the jail and resurrect it
 - but pr_uref of prison0 stays decremented

 Repeat this enough times and prison0.pr_uref reaches zero.
 To reach zero even sooner just kill enough of non-jailed processes.
 
 I've just checked across a number of the panic dumps from the
 past few days and they all have prison0.pr_uref = 0 which confirms
 the cause of the panic.
 
 I've tried scripting continuous jail start / stops, but even after 1000s
 of iterations have been unable to trigger this on my test machine, so
 I'm going to dig into the jail code to see if I can find out how it's
 incorrectly decrementing prison0 via inspection.

Steve,

thanks for doing this!  I'll reiterate my suspicion just in case - I think that
you should look for the cases where you stop a jail, but then re-attach and
resurrect the jail before it's completely dead.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Roger Marquis


Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.


Interesting.  We've been getting kernel panics in -stable but with only
one jail started at boot without being restarted.

Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
How about softupdates, gmirror, and/or anything in sysctl.conf?

Roger Marquis

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Roger Marquis marq...@roble.com

To: freebsd-j...@freebsd.org; freebsd-stable@FreeBSD.org
Sent: Saturday, August 20, 2011 7:10 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

Repeat this enough times and prison0.pr_uref reaches zero.
To reach zero even sooner just kill enough of non-jailed processes.

Interesting.  We've been getting kernel panics in -stable but with only
one jail started at boot without being restarted.

Are you using SAS drives by any chance?  Setting ethernet polling and HZ?
How about softupdates, gmirror, and/or anything in sysctl.conf?

If you're not restarting things it may be unrelated. No SAS; polling is
compiled in but no devices have it active, and we're using ZFS only.

Are you seeing a double fault panic?

   Regards
   Steve


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



thanks for doing this!  I'll reiterate my suspicion just in case - I think that
you should look for the cases where you stop a jail, but then re-attach and
resurrect the jail before it's completely dead.


Yeah, that's where I think it's happening too, but I also suspect it's not
just a dying jail that's needed; I think it's a dying jail in the final
stages of cleanup.

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
    struct prison *ppr, *tpr;
    int vfslocked;

    if (!(flags & PD_LOCKED))
        mtx_lock(&pr->pr_mtx);
    /* Decrement the user references in a separate loop. */
    if (flags & PD_DEUREF) {
        for (tpr = pr;; tpr = tpr->pr_parent) {
            if (tpr != pr)
                mtx_lock(&tpr->pr_mtx);
            if (--tpr->pr_uref > 0)
                break;
            KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
            mtx_unlock(&tpr->pr_mtx);
        }
        /* Done if there were only user references to remove. */
        if (!(flags & PD_DEREF)) {
            mtx_unlock(&tpr->pr_mtx);
            if (flags & PD_LIST_SLOCKED)
                sx_sunlock(&allprison_lock);
            else if (flags & PD_LIST_XLOCKED)
                sx_xunlock(&allprison_lock);
            return;
        }
        if (tpr != pr) {
            mtx_unlock(&tpr->pr_mtx);
            mtx_lock(&pr->pr_mtx);
        }
    }

Take a scenario of a simple one-level prison setup running a single
process, where the prison has just been stopped.

In the above code pr_uref of the process's prison is decremented. As this
is the last process, pr_uref will hit 0 and the loop continues instead of
breaking early.

Now at the end of the loop iteration the mtx is unlocked so other
processes can now manipulate the jail; this is where I think the problem
may be.

If we now have another process come in and attach to the jail but then
instantly exit, this process may allow another kernel thread to hit this
same bit of code, and so two processes for the same prison get into the
section which decrements prison0's pr_uref, instead of only one.

In essence I think we can get the following flow where 1# = process1
and 2# = process2
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by prison1)

It seems like the action on the parent prison to decrement the pr_uref is
happening too early, while the jail can still be used and without the lock on
the child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref down
so it only takes place if the jail is really being removed. Either that or
change the locking semantics so that once the lock is acquired in
prison_deref it's not unlocked until the function completes.

What do people think?

   Regards
   Steve








Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Steven Hartland kill...@multiplay.co.uk

Looking through the code I believe I may have noticed a scenario which could
trigger the problem.

Given the following code:-

static void
prison_deref(struct prison *pr, int flags)
{
	struct prison *ppr, *tpr;
	int vfslocked;

	if (!(flags & PD_LOCKED))
		mtx_lock(&pr->pr_mtx);
	/* Decrement the user references in a separate loop. */
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}

Take the scenario of a simple one-level prison setup running a single
process, where the prison has just been stopped.

In the above code pr_uref of the process's prison is decremented. As this is the
last process, pr_uref will hit 0 and the loop continues instead of breaking
early.

Now at the end of the loop iteration the mtx is unlocked so other processes can
now manipulate the jail; this is where I think the problem may be.

If another process now comes in and attaches to the jail but then instantly
exits, it may allow another kernel thread to hit this same bit of code,
so two processes for the same prison get into the section which decrements
prison0's pr_uref, instead of only one.

In essence I think we can get the following flow, where 1# = process1
and 2# = process2:
1#1. prison1.pr_uref = 1 (single process jail)
1#2. prison_deref( prison1,...
1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
1#4. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
1#5. prison0.pr_uref--
2#1. process2.attach( prison1 ) (prison1.pr_uref = 1)
2#2. process2.exit
2#3. prison_deref( prison1,...
2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
2#5. prison1.mtx_unlock -- this now allows others to change prison1.pr_uref
2#6. prison0.pr_uref-- (prison0.pr_uref has now been decremented twice by
prison1)

It seems like the decrement of the parent prison's pr_uref is happening
too early, while the jail can still be used and without the lock on the
child jail's mtx, so causing a race condition.

I think the fix is to move the decrement of the parent prison's pr_uref down
so it only takes place if the jail is really being removed. Either that or
change the locking semantics so that once the lock is acquired in
prison_deref it's not unlocked until the function completes.

What do people think?


After reviewing the changes to prison_deref in the commit which added
hierarchical jails, the removal of the lock by the initial loop on the
passed-in prison may be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Andriy Gapon

on 20/08/2011 23:24 Steven Hartland said the following:
 - Original Message - From: Steven Hartland kill...@multiplay.co.uk
 If so the following may be all that's needed to fix this issue:-
 
 diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
 --- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
 +++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
 @@ -2455,7 +2455,8 @@
  			if (--tpr->pr_uref > 0)
  				break;
  			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
 -			mtx_unlock(&tpr->pr_mtx);
 +			if (tpr != pr)
 +				mtx_unlock(&tpr->pr_mtx);
  		}
  		/* Done if there were only user references to remove. */
  		if (!(flags & PD_DEREF)) {

Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.


Will do. I'm now 99.9% sure this is the problem, and even better, I now have a
reproducible scenario :)

Something else you may be more interested in, Andriy:-
I added the debugging options DDB & INVARIANTS to see if I can get more
useful info, and the panic results in a looping panic constantly scrolling up
the console. Not sure if this is a side effect of the patches we've been
trying.

Going to see if I can confirm that; lmk if there's something you want me
to try.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland



- Original Message - 
From: Steven Hartland kill...@multiplay.co.uk



Something else you may be more interested in, Andriy:-
I added the debugging options DDB & INVARIANTS to see if I can get more
useful info, and the panic results in a looping panic constantly scrolling up
the console. Not sure if this is a side effect of the patches we've been
trying.

Going to see if I can confirm that; lmk if there's something you want me
to try.


Seems the stop_scheduler_on_panic.8.x.patch is the cause of this.

Removing it allows me to drop to ddb when the panic due to the KASSERT
happens.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



on 20/08/2011 23:24 Steven Hartland said the following:

If so the following may be all that's needed to fix this issue:-

diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig   2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c        2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
 			if (--tpr->pr_uref > 0)
 				break;
 			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
-			mtx_unlock(&tpr->pr_mtx);
+			if (tpr != pr)
+				mtx_unlock(&tpr->pr_mtx);
 		}
 		/* Done if there were only user references to remove. */
 		if (!(flags & PD_DEREF)) {


Not sure if this would fly as is - please double check the later block where
pr->pr_mtx is re-locked.


You're right, and it's actually more complex than that. Although changing it to
not unlock in the middle of prison_deref fixes that race condition, it doesn't
prevent pr_uref being incorrectly decremented each time the jail gets into
the dying state, which is really the problem we are seeing.

If hierarchical prisons are used there seems to be an additional problem
where the counters of all prisons in the hierarchy are decremented, but as
far as I can tell only the immediate parent is ever incremented, so there is
another reference problem there as well, I think.

I believe the following patch fixes both of these issues.

I've tested with debug added and confirmed prison0's pr_uref is maintained
correctly even when a jail hits the dying state multiple times.

It essentially reverts the changes to the if (flags & PD_DEUREF) by

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-19 Thread John Baldwin

On Thursday, August 18, 2011 4:09:35 pm Andriy Gapon wrote:
 BTW, does anyone have an idea why the thread in question would disappear from
 kgdb's point of view?
 
 (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
 $3 = 102057
 (kgdb) tid 102057
 invalid tid
 
 info threads also doesn't list the thread.
 
 Is it because the panic happened while the thread was somewhere in exit1()?

Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try this:

Index: kthr.c
===================================================================
--- kthr.c	(revision 224879)
+++ kthr.c	(working copy)
@@ -73,11 +73,52 @@ kgdb_thr_first(void)
 	return (first);
 }
 
+static void
+kgdb_thr_add_procs(uintptr_t paddr)
+{
+	struct proc p;
+	struct thread td;
+	struct kthr *kt;
+	CORE_ADDR addr;
+
+	while (paddr != 0) {
+		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
+			warnx("kvm_read: %s", kvm_geterr(kvm));
+			break;
+		}
+		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
+		while (addr != 0) {
+			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
+			    sizeof(td)) {
+				warnx("kvm_read: %s", kvm_geterr(kvm));
+				break;
+			}
+			kt = malloc(sizeof(*kt));
+			kt->next = first;
+			kt->kaddr = addr;
+			if (td.td_tid == dumptid)
+				kt->pcb = dumppcb;
+			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
+			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
+				kt->pcb = (uintptr_t)stoppcbs +
+				    sizeof(struct pcb) * td.td_oncpu;
+			else
+				kt->pcb = (uintptr_t)td.td_pcb;
+			kt->kstack = td.td_kstack;
+			kt->tid = td.td_tid;
+			kt->pid = p.p_pid;
+			kt->paddr = paddr;
+			kt->cpu = td.td_oncpu;
+			first = kt;
+			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
+		}
+		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
+	}
+}
+
 struct kthr *
 kgdb_thr_init(void)
 {
-	struct proc p;
-	struct thread td;
 	long cpusetsize;
 	struct kthr *kt;
 	CORE_ADDR addr;
@@ -113,37 +154,11 @@ kgdb_thr_init(void)
 
 	stoppcbs = kgdb_lookup("stoppcbs");
 
-	while (paddr != 0) {
-		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
-			warnx("kvm_read: %s", kvm_geterr(kvm));
-			break;
-		}
-		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
-		while (addr != 0) {
-			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
-			    sizeof(td)) {
-				warnx("kvm_read: %s", kvm_geterr(kvm));
-				break;
-			}
-			kt = malloc(sizeof(*kt));
-			kt->next = first;
-			kt->kaddr = addr;
-			if (td.td_tid == dumptid)
-				kt->pcb = dumppcb;
-			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
-			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
-				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
-			else
-

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-19 Thread Andriy Gapon

on 19/08/2011 15:14 John Baldwin said the following:
 Yes, it is a bug in kgdb that it only walks allproc and not zombproc.  Try this:

The patch worked perfectly well for me, thank you!
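
The effect John describes can be shown with a small C sketch: a lookup that
walks only the live process list misses a thread whose process has already
moved to the zombie list. The names toy_proc, toy_find and toy_find_any are
hypothetical, and plain singly linked lists stand in for allproc and zombproc;
this is a model of the idea, not kgdb code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node: one entry per process, carrying one thread id. */
struct toy_proc {
	struct toy_proc *next;
	int tid;
};

/* Walk a single list and return the entry with the given tid, if any. */
static const struct toy_proc *
toy_find(const struct toy_proc *head, int tid)
{
	for (; head != NULL; head = head->next) {
		if (head->tid == tid)
			return (head);
	}
	return (NULL);
}

/* Walk the live list first, then fall back to the zombie list. */
static const struct toy_proc *
toy_find_any(const struct toy_proc *allproc, const struct toy_proc *zombproc,
    int tid)
{
	const struct toy_proc *p;

	p = toy_find(allproc, tid);
	return (p != NULL ? p : toy_find(zombproc, tid));
}
```

Searching only the live list for a tid that sits on the zombie list returns
NULL, which corresponds to the "invalid tid" answer seen in kgdb.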

 Index: kthr.c
 ===================================================================
 --- kthr.c	(revision 224879)
 +++ kthr.c	(working copy)
 @@ -73,11 +73,52 @@ kgdb_thr_first(void)
  	return (first);
  }
  
 +static void
 +kgdb_thr_add_procs(uintptr_t paddr)
 +{
 +	struct proc p;
 +	struct thread td;
 +	struct kthr *kt;
 +	CORE_ADDR addr;
 +
 +	while (paddr != 0) {
 +		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
 +			warnx("kvm_read: %s", kvm_geterr(kvm));
 +			break;
 +		}
 +		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
 +		while (addr != 0) {
 +			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
 +			    sizeof(td)) {
 +				warnx("kvm_read: %s", kvm_geterr(kvm));
 +				break;
 +			}
 +			kt = malloc(sizeof(*kt));
 +			kt->next = first;
 +			kt->kaddr = addr;
 +			if (td.td_tid == dumptid)
 +				kt->pcb = dumppcb;
 +			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
 +			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
 +				kt->pcb = (uintptr_t)stoppcbs +
 +				    sizeof(struct pcb) * td.td_oncpu;
 +			else
 +				kt->pcb = (uintptr_t)td.td_pcb;
 +			kt->kstack = td.td_kstack;
 +			kt->tid = td.td_tid;
 +			kt->pid = p.p_pid;
 +			kt->paddr = paddr;
 +			kt->cpu = td.td_oncpu;
 +			first = kt;
 +			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
 +		}
 +		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
 +	}
 +}
 +
  struct kthr *
  kgdb_thr_init(void)
  {
 -	struct proc p;
 -	struct thread td;
  	long cpusetsize;
  	struct kthr *kt;
  	CORE_ADDR addr;
 @@ -113,37 +154,11 @@ kgdb_thr_init(void)
  
  	stoppcbs = kgdb_lookup("stoppcbs");
  
 -	while (paddr != 0) {
 -		if (kvm_read(kvm, paddr, &p, sizeof(p)) != sizeof(p)) {
 -			warnx("kvm_read: %s", kvm_geterr(kvm));
 -			break;
 -		}
 -		addr = (uintptr_t)TAILQ_FIRST(&p.p_threads);
 -		while (addr != 0) {
 -			if (kvm_read(kvm, addr, &td, sizeof(td)) !=
 -			    sizeof(td)) {
 -				warnx("kvm_read: %s", kvm_geterr(kvm));
 -				break;
 -			}
 -			kt = malloc(sizeof(*kt));
 -			kt->next = first;
 -			kt->kaddr = addr;
 -			if (td.td_tid == dumptid)
 -				kt->pcb = dumppcb;
 -			else if (td.td_state == TDS_RUNNING && stoppcbs != 0 &&
 -			    CPU_ISSET(td.td_oncpu, &stopped_cpus))
 -				kt->pcb = (uintptr_t) stoppcbs + sizeof(struct pcb) * td.td_oncpu;
 -			else
 -				kt->pcb = (uintptr_t)td.td_pcb;
 -			kt->kstack = td.td_kstack;
 -			kt->tid = td.td_tid;
 -			kt->pid = p.p_pid;
 -			kt->paddr = paddr;
 -			kt->cpu = td.td_oncpu;
 -			first = kt;
 -			addr = (uintptr_t)TAILQ_NEXT(&td, td_plist);
 -		}
 -		paddr = (uintptr_t)LIST_NEXT(&p, p_list);
 +	kgdb_thr_add_procs(paddr);
 +	addr = kgdb_lookup("zombproc");
 +	if (addr != 0) {
 +		kvm_read(kvm, addr, &paddr, sizeof(paddr));
 +		kgdb_thr_add_procs(paddr);
  	}
  	curkthr = kgdb_thr_lookup_tid(dumptid);
  	if (curkthr == NULL)
 
 is there an easy way to examine its stack in this case?
 
 Hmm, you can use something like this from my kgdb macros.

Oh, I completely forgot about them.
I hope I will remember where to search for the tricks next time I need them :-)
Thank you again!

 For amd64:
 
 # Do a backtrace given %rip and %rbp as args
 define bt
     set $_rip = $arg0
     set $_rbp = $arg1
     set $i = 0
     while ($_rbp != 0 || $_rip != 0)
         printf "%2d: pc ", $i
         if ($_rip != 0)
             x/1i $_rip
         else
             printf "\n"
         end
         if ($_rbp == 0)
             set $_rip = 0
         else
             set $fr = (struct amd64_frame *)$_rbp
             set $_rbp = $fr->f_frame
             set $_rip = $fr->f_retaddr
             set $i = $i + 1
         end
     end
 end
 
 document bt
 Given values for %rip and %rbp, perform a manual backtrace.
 end
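
The loop in the bt macro above can be mirrored in plain C to see what it
walks: a chain of saved frame pointers, each paired with a return address.
This is a sketch under assumptions, not the macro itself: toy_frame and
toy_backtrace are hypothetical names, the frame layout is a simplified
stand-in for struct amd64_frame, and the program counters are collected into
an array instead of being disassembled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: caller's frame pointer, then return address. */
struct toy_frame {
	struct toy_frame *f_frame;	/* saved frame pointer of the caller */
	uintptr_t f_retaddr;		/* return address into the caller */
};

/*
 * Starting from an instruction pointer and a frame pointer, record each
 * program counter and step to the caller until both values are zero or
 * the output array is full.  Returns the number of entries stored.
 */
static size_t
toy_backtrace(uintptr_t rip, const struct toy_frame *rbp, uintptr_t *out,
    size_t max)
{
	size_t n = 0;

	while ((rbp != NULL || rip != 0) && n < max) {
		out[n++] = rip;
		if (rbp == NULL) {
			rip = 0;	/* no more frames: stop after this one */
		} else {
			rip = rbp->f_retaddr;
			rbp = rbp->f_frame;
		}
	}
	return (n);
}
```

With a starting pc and a chain of three frames, the walk yields four entries:
the starting pc plus one return address per frame.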
 
 define btf

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon

on 18/08/2011 02:15 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 Thanks to the debug that Steven provided and to the help that I received from
 Kostik, I think that now I understand the basic mechanics of this panic, but,
 unfortunately, not the details of its root cause.

 It seems like everything starts with some kind of a race between terminating
 processes in a jail and termination of the jail itself.  This is where the
 details are very thin so far.  What we see is that a process (http) is in
 exit(2) syscall, in exit1() function actually, and past the place where
 P_WEXIT flag is set and even past the place where p_limit is freed and reset
 to NULL.  At that place the thread calls prison_proc_free(), which calls
 prison_deref().  Then, we see that in prison_deref() the thread gets a page
 fault because of what seems like a NULL pointer dereference.  That's just the
 start of the problem and its root cause.
 
 That's interesting, are you using http as an example or is that something
 that's been gleaned from the debugging of our output? I ask as there's only
 one process running in each of our jails and that's a single java process.


It's from the debug data: p_comm = httpd
I would also like to ask you to revert the last patch that I sent you (with
tf_rip comparisons) and try the patch from Kostik instead.
Given what we suspect about the problem, can you please also try to provoke the
problem by e.g. doing frequent jail restarts or something else that supposedly
should hit the bug?

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org

That's interesting, are you using http as an example or is that something that's
been gleaned from the debugging of our output? I ask as there's only one process
running in each of our jails and that's a single java process.



It's from the debug data: p_comm = httpd


Hmm, there's only one httpd that's ever run on the machine and that's not in the
jail, it's on the raw machine.


I would also like to ask you to revert the last patch that I sent you (with
tf_rip comparisons) and try the patch from Kostik instead.


Sure.


Given what we suspect about the problem, can you please also try to provoke the
problem by e.g. doing frequent jail restarts or something else that supposedly
should hit the bug?


I've tried doing this for quite some days on the test machine, but I've been
unable to provoke it. Will continue to try.

   Regards
   Steve





Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon

on 18/08/2011 13:35 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 That's interesting, are you using http as an example or is that something
 that's been gleaned from the debugging of our output? I ask as there's only
 one process running in each of our jails and that's a single java process.


 It's from the debug data: p_comm = httpd
 
 Hmm, there's only one httpd that's ever run on the machine and that's not in
 the jail, it's on the raw machine.

Probably I have mistakenly assumed that the 'prison' in prison_deref() has
something to do with an actual jail, while it could have been just prison0 where
all non-jailed processes belong.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



Probably I have mistakenly assumed that the 'prison' in prison_deref() has
something to do with an actual jail, while it could have been just prison0 where
all non-jailed processes belong.


That makes sense as this particular panic was caused by a machine reboot,
which is slightly different from the more common jail panic we're seeing.

Doesn't help with our reproduction scenario though unfortunately. If we
don't have any joy reproducing on our single test machine I'll have this
kernel rolled out across a portion of the farm, which should mean we
see the panic results in a few days time.

I understand there's a risk involved in this, but it's important for us
to determine the cause and get a confirmed fix, as well as being able
to prove that the panic fix works, which will help everyone in the long
run.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon

on 18/08/2011 14:11 Andriy Gapon said the following:
 Probably I have mistakenly assumed that the 'prison' in prison_deref() has
 something to do with an actual jail, while it could have been just prison0
 where all non-jailed processes belong.

So, indeed:
(kgdb) p $2->p_ucred->cr_prison
$10 = (struct prison *) 0x807d5080
(kgdb) p &prison0
$11 = (struct prison *) 0x807d5080
(kgdb) p *$2->p_ucred->cr_prison
$12 = {pr_list = {tqe_next = 0x0, tqe_prev = 0x0}, pr_id = 0, pr_ref = 398,
pr_uref = 0, pr_flags = 386, pr_children = {lh_first = 0x0}, pr_sibling =
{le_next = 0x0, le_prev = 0x0}, pr_parent = 0x0,
  pr_mtx = {lock_object = {lo_name = 0x8063007c "jail mutex", lo_flags =
16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, pr_task = {ta_link =
{stqe_next = 0x0}, ta_pending = 0,
ta_priority = 0, ta_func = 0, ta_context = 0x0}, pr_osd = {osd_nslots = 0,
osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, pr_cpuset =
0xff0012d65dc8, pr_vnet = 0x0,
  pr_root = 0xff00166ebce8, pr_ip4s = 0, pr_ip6s = 0, pr_ip4 = 0x0, pr_ip6 =
0x0, pr_sparep = {0x0, 0x0, 0x0, 0x0}, pr_childcount = 0, pr_childmax = 99,
pr_allow = 127, pr_securelevel = -1,
  pr_enforce_statfs = 0, pr_spare = {0, 0, 0, 0, 0}, pr_hostid = 3251597242,
pr_name = "0", '\0' <repeats 254 times>, pr_path = "/", '\0' <repeats 1022 times>,
  pr_hostname = "censored", '\0' <repeats 231 times>, pr_domainname = '\0'
<repeats 255 times>, pr_hostuuid = "54443842-0054-2500-902c-0025902c3cb0", '\0'
<repeats 27 times>}

Also, let's consider this code:
	if (flags & PD_DEUREF) {
		for (tpr = pr;; tpr = tpr->pr_parent) {
			if (tpr != pr)
				mtx_lock(&tpr->pr_mtx);
			if (--tpr->pr_uref > 0)
				break;
			KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
			mtx_unlock(&tpr->pr_mtx);
		}
		/* Done if there were only user references to remove. */
		if (!(flags & PD_DEREF)) {
			mtx_unlock(&tpr->pr_mtx);
			if (flags & PD_LIST_SLOCKED)
				sx_sunlock(&allprison_lock);
			else if (flags & PD_LIST_XLOCKED)
				sx_xunlock(&allprison_lock);
			return;
		}
		if (tpr != pr) {
			mtx_unlock(&tpr->pr_mtx);
			mtx_lock(&pr->pr_mtx);
		}
	}

The most suspicious thing is that pr_uref is zero in the debug data.
With INVARIANTS we would hit the "prison0 pr_uref=0" KASSERT.

Then, because this is prison0 and because pr_uref reached zero, tpr gets
assigned NULL (prison0's pr_parent).  And then, because tpr != pr, we try to
execute mtx_unlock(&tpr->pr_mtx).  That's where the NULL pointer deref happens.

So, now the big question is how/why we reached pr_uref == 0.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Andriy Gapon


on 17/08/2011 23:21 Andriy Gapon said the following:

It seems like everything starts with some kind of a race between terminating
processes in a jail and termination of the jail itself.  This is where the
details are very thin so far.  What we see is that a process (http) is in
exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
flag is set and even past the place where p_limit is freed and reset to NULL.
At that place the thread calls prison_proc_free(), which calls prison_deref().
Then, we see that in prison_deref() the thread gets a page fault because of what
seems like a NULL pointer dereference.  That's just the start of the problem and
its root cause.

Then, trap_pfault() gets invoked and, because addresses close to NULL look like
userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
to lim_cur(), but because p_limit is already NULL, that call results in a NULL
pointer dereference and a page fault.  Goto the beginning of this paragraph.

So we get this recursion of sorts, which only ends when a stack is exhausted and
a CPU generates a double-fault.


BTW, does anyone have an idea why the thread in question would disappear from
kgdb's point of view?

(kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
$3 = 102057
(kgdb) tid 102057
invalid tid

"info threads" also doesn't list the thread.

Is it because the panic happened while the thread was somewhere in exit1()?
Is there an easy way to examine its stack in this case?

--
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Attilio Rao

2011/8/18 Andriy Gapon a...@freebsd.org:
> on 17/08/2011 23:21 Andriy Gapon said the following:
>
>> It seems like everything starts with some kind of a race between terminating
>> processes in a jail and termination of the jail itself.  This is where the
>> details are very thin so far.  What we see is that a process (http) is in
>> exit(2) syscall, in exit1() function actually, and past the place where
>> P_WEXIT flag is set and even past the place where p_limit is freed and reset
>> to NULL.  At that place the thread calls prison_proc_free(), which calls
>> prison_deref().  Then, we see that in prison_deref() the thread gets a page
>> fault because of what seems like a NULL pointer dereference.  That's just the
>> start of the problem and its root cause.
>>
>> Then, trap_pfault() gets invoked and, because addresses close to NULL look
>> like userspace addresses, vm_fault/vm_fault_hold gets called, which in its
>> turn goes on to call vm_map_growstack.  First thing that vm_map_growstack
>> does is a call to lim_cur(), but because p_limit is already NULL, that call
>> results in a NULL pointer dereference and a page fault.  Goto the beginning
>> of this paragraph.
>>
>> So we get this recursion of sorts, which only ends when a stack is exhausted
>> and a CPU generates a double-fault.
>
> BTW, does anyone have an idea why the thread in question would disappear from
> kgdb's point of view?
>
> (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
> $3 = 102057
> (kgdb) tid 102057
> invalid tid
>
> "info threads" also doesn't list the thread.
>
> Is it because the panic happened while the thread was somewhere in exit1()?
> Is there an easy way to examine its stack in this case?

Yes, that is likely the reason.

The 'tid' command should look up the thread in the tid_to_thread() table (or
similarly named), and the lookup returns NULL, which means the thread has
already passed the point where it was removed from the lookup table.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Andriy Gapon

on 16/08/2011 23:43 Steven Hartland said the following:
 
> - Original Message - From: Andriy Gapon a...@freebsd.org
> To: Steven Hartland kill...@multiplay.co.uk
> Cc: freebsd-stable@FreeBSD.org
> Sent: Tuesday, August 16, 2011 9:30 PM
> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
>
>> on 15/08/2011 17:56 Steven Hartland said the following:
>>> (kgdb) x/512a 0xff8d8f357210
>> [snip]
>>
>> Can you please also provide the following for this core?
>> list *vm_map_growstack+93
>> list *lim_cur+17
>> list *lim_rlimit+18
>>
>> Also, it would be interesting to get panic output with DDB option.
>
> Here's the info:-
>
> (kgdb) list *vm_map_growstack+93
> 0x80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
> 3300            struct uidinfo *uip;
> 3301
> 3302    Retry:
> 3303            PROC_LOCK(p);
> 3304            stacklim = lim_cur(p, RLIMIT_STACK);
> 3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
> 3306            PROC_UNLOCK(p);
> 3307
> 3308            vm_map_lock_read(map);
> 3309
> (kgdb) list *lim_cur+17
> 0x80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
> 1145    rlim_t
> 1146    lim_cur(struct proc *p, int which)
> 1147    {
> 1148            struct rlimit rl;
> 1149
> 1150            lim_rlimit(p, which, &rl);
> 1151            return (rl.rlim_cur);
> 1152    }
> 1153
> 1154    /*
> (kgdb) list *lim_rlimit+18
> 0x80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
> 1160    {
> 1161
> 1162            PROC_LOCK_ASSERT(p, MA_OWNED);
> 1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
> 1164                ("request for invalid resource limit"));
> 1165            *rlp = p->p_limit->pl_rlimit[which];
> 1166            if (p->p_sysent->sv_fixlimit != NULL)
> 1167                    p->p_sysent->sv_fixlimit(rlp, which);
> 1168    }
> 1169
>
> I've yet to have the machine with DDB + expanded stack panic.
>
> I plan to leave it a day or so more then try a reboot to see if that
> triggers it. If not I'll drop the stack back down to 4 and see if that
> enables us to get another panic.

OK, thank you for continuing to debug this!
Another request: could you please execute the following commands in kgdb on the
above core file?

define allpcpu
set $i = 0
while ($i <= mp_maxid)
p *cpuid_to_pcpu[$i]
set $i = $i + 1
end
end
allpcpu


A little bit later I will send you another patch that, I hope, will produce
better diagnostics for this crash (without DDB in kernel).

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Andriy Gapon

on 17/08/2011 14:12 Andriy Gapon said the following:
> A little bit later I will send you another patch that, I hope, will produce
> better diagnostics for this crash (without DDB in kernel).

The patch:
Index: sys/amd64/amd64/trap.c
===================================================================
--- sys/amd64/amd64/trap.c  (revision 224782)
+++ sys/amd64/amd64/trap.c  (working copy)
@@ -198,6 +198,10 @@
         PCPU_INC(cnt.v_trap);
         type = frame->tf_trapno;
 
+        if ((uintptr_t)frame->tf_rip >= (uintptr_t)lim_rlimit &&
+            (uintptr_t)frame->tf_rip < (uintptr_t)lim_rlimit + 40)
+                panic("trap in lim_rlimit");
+
 #ifdef SMP
         /* Handler for NMI IPIs used for stopping CPUs. */
         if (type == T_NMI) {

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland



- Original Message - 
From: Andriy Gapon a...@freebsd.org

To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Wednesday, August 17, 2011 12:12 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 16/08/2011 23:43 Steven Hartland said the following:


- Original Message - From: Andriy Gapon a...@freebsd.org
To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Tuesday, August 16, 2011 9:30 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 15/08/2011 17:56 Steven Hartland said the following:

(kgdb) x/512a 0xff8d8f357210

[snip]

Can you please also provide the following for this core?
list *vm_map_growstack+93
list *lim_cur+17
list *lim_rlimit+18

Also, it would be interesting to get panic output with DDB option.


Here's the info:-

(kgdb) list *vm_map_growstack+93
0x80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
3300            struct uidinfo *uip;
3301
3302    Retry:
3303            PROC_LOCK(p);
3304            stacklim = lim_cur(p, RLIMIT_STACK);
3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
3306            PROC_UNLOCK(p);
3307
3308            vm_map_lock_read(map);
3309
(kgdb) list *lim_cur+17
0x80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
1145    rlim_t
1146    lim_cur(struct proc *p, int which)
1147    {
1148            struct rlimit rl;
1149
1150            lim_rlimit(p, which, &rl);
1151            return (rl.rlim_cur);
1152    }
1153
1154    /*
(kgdb) list *lim_rlimit+18
0x80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
1160    {
1161
1162            PROC_LOCK_ASSERT(p, MA_OWNED);
1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
1164                ("request for invalid resource limit"));
1165            *rlp = p->p_limit->pl_rlimit[which];
1166            if (p->p_sysent->sv_fixlimit != NULL)
1167                    p->p_sysent->sv_fixlimit(rlp, which);
1168    }
1169

I've yet to have the machine with DDB + expanded stack panic.

I plan to leave it a day or so more then try a reboot to see if that
triggers it. If not I'll drop the stack back down to 4 and see if that
enables us to get another panic.


OK, thank you for continuing to debug this!


No, thank you for the help :)


Another request: could you please execute the following commands in kgdb on the
above core file?

define allpcpu
set $i = 0
while ($i <= mp_maxid)
p *cpuid_to_pcpu[$i]
set $i = $i + 1
end
end
allpcpu


Here's the output.

$1 = {pc_curthread = 0xff0012d708c0, pc_idlethread = 0xff0012d838c0, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xff8000149d00, pc_switchtime = 564139965450231, pc_switchticks = 247796551, pc_cpuid = 0,
 pc_cpumask = 1, pc_other_cpus = 16777214, pc_allcpu = {sle_next = 0x0}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1246344506, 
v_trap = 121031682, v_syscall = 2590785278, v_intr = 866415, v_soft = 174249227,
   v_vm_faults = 24640099, v_cow_faults = 2606934, v_cow_optim = 678, v_zfod = 19177479, v_ozfod = 0, v_swapin = 0, v_swapout = 
0, v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 24007, v_vnodeout = 41, v_vnodepgsin = 24007,
   v_vnodepgsout = 322, v_intrans = 7300, v_reactivated = 0, v_pdwakeups = 0, v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree 
= 0, v_tfree = 25056637, v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
   v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0, v_active_count = 0, v_inactive_target = 0, 
v_inactive_count = 0, v_cache_count = 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
   v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 35906, v_vforks = 21218, v_rforks = 0, v_kthreads = 20, v_forkpages = 
9357854, v_vforkpages = 4445028, v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {9035196,
   1438, 426481, 1091491, 22402335}, pc_device = 0xff0012da2700, pc_netisr = 0xff0012cfe500, pc_rm_queue = {rmq_next = 
0x808af550, rmq_prev = 0x808af550}, pc_dynamic = 3737856,
 pc_monitorbuf = '\0' repeats 127 times, pc_prvspace = 0x808af400, pc_curpmap = 0xff0012d74ef8, pc_tssp = 
0x808ae700, pc_commontssp = 0x808ae700, pc_rsp0 = -549754462976,
 pc_scratch_rsp = 140737488348968, pc_apic_id = 0, pc_acpi_id = 1, pc_fs32p = 0x808ad530, pc_gs32p = 0x808ad538, 
pc_ldt = 0x808ad578, pc_tss = 0x808ad568, pc_cmci_mask = 364}
$2 = {pc_curthread = 0xff0012d85000, pc_idlethread = 0xff0012d85000, pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb 
= 0xff80001bcd00, pc_switchtime = 564139964769035, pc_switchticks = 247796551, pc_cpuid = 1,
 pc_cpumask = 2, pc_other_cpus = 16777213, pc_allcpu = {sle_next = 0x808af400}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 
457697994, v_trap = 61700571, v_syscall = 670428238, v_intr = 298981, v_soft = 58852682,
   v_vm_faults = 7228810, v_cow_faults = 442573, v_cow_optim = 116, v_zfod

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Andriy Gapon

on 17/08/2011 15:15 Steven Hartland said the following:
 define allpcpu
 set $i = 0
 while ($i <= mp_maxid)
 p *cpuid_to_pcpu[$i]
 set $i = $i + 1
 end
 end
 allpcpu
 
 Here's the output.
[snip]
 $3 = {pc_curthread = 0xff06b7f9c000, pc_idlethread = 0xff0012d85460,
 pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xff8d8f35ad00,
 pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
  pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next =
 0x808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, 
 v_trap =
 95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod =
 11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, 
 v_swappgsout
 = 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0,
 v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380,
 v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0,
 v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, 
 v_cache_count =
 0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 
 16857,
 v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842,
 v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
693, 594838, 24425, 23707811}, pc_device = 0xff0012da2500, pc_netisr = 
 0x0,
 pc_rm_queue = {rmq_next = 0x808afa50, rmq_prev = 0x808afa50},
 pc_dynamic = 18446743526093326592,
  pc_monitorbuf = '\0' repeats 127 times, pc_prvspace = 0x808af900,
 pc_curpmap = 0x8083ea50, pc_tssp = 0x808ae7d0, pc_commontssp =
 0x808ae7d0, pc_rsp0 = -491518579456,
  pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p =
 0x808ad600, pc_gs32p = 0x808ad608, pc_ldt = 
 0x808ad648,
 pc_tss = 0x808ad638, pc_cmci_mask = 8}
[snip]

Thank you.
A few more questions:
1. more kgdb info for the core:
p *(cpuid_to_pcpu[2]->pc_curthread)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)

2. do you have any additional patches in your source tree besides the
debugging patches that I provided to you?

3. do you have any third-party/out-of-tree kernel modules?

4. could you please send me your kernel config?

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland



- Original Message - 
From: Andriy Gapon a...@freebsd.org

To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Wednesday, August 17, 2011 1:56 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 17/08/2011 15:15 Steven Hartland said the following:

define allpcpu
set $i = 0
while ($i <= mp_maxid)
p *cpuid_to_pcpu[$i]
set $i = $i + 1
end
end
allpcpu


Here's the output.

[snip]

$3 = {pc_curthread = 0xff06b7f9c000, pc_idlethread = 0xff0012d85460,
pc_fpcurthread = 0x0, pc_deadthread = 0x0, pc_curpcb = 0xff8d8f35ad00,
pc_switchtime = 564139963042291, pc_switchticks = 247796550, pc_cpuid = 2,
 pc_cpumask = 4, pc_other_cpus = 16777211, pc_allcpu = {sle_next =
0x808af680}, pc_spinlocks = 0x0, pc_cnt = {v_swtch = 1005391948, v_trap 
=
95927887, v_syscall = 2033274537, v_intr = 137253, v_soft = 151981308,
   v_vm_faults = 14199910, v_cow_faults = 1468132, v_cow_optim = 533, v_zfod =
11032593, v_ozfod = 0, v_swapin = 0, v_swapout = 0, v_swappgsin = 0, 
v_swappgsout
= 0, v_vnodein = 17238, v_vnodeout = 48, v_vnodepgsin = 17238,
   v_vnodepgsout = 378, v_intrans = 6753, v_reactivated = 0, v_pdwakeups = 0,
v_pdpages = 0, v_tcached = 0, v_dfree = 0, v_pfree = 0, v_tfree = 15435380,
v_page_size = 0, v_page_count = 0, v_free_reserved = 0,
   v_free_target = 0, v_free_min = 0, v_free_count = 0, v_wire_count = 0,
v_active_count = 0, v_inactive_target = 0, v_inactive_count = 0, v_cache_count =
0, v_cache_min = 0, v_cache_max = 0, v_pageout_free_min = 0,
   v_interrupt_free_min = 0, v_free_severe = 0, v_forks = 24041, v_vforks = 
16857,
v_rforks = 0, v_kthreads = 0, v_forkpages = 6281292, v_vforkpages = 3606842,
v_rforkpages = 0, v_kthreadpages = 0}, pc_cp_time = {8629094,
   693, 594838, 24425, 23707811}, pc_device = 0xff0012da2500, pc_netisr = 
0x0,
pc_rm_queue = {rmq_next = 0x808afa50, rmq_prev = 0x808afa50},
pc_dynamic = 18446743526093326592,
 pc_monitorbuf = '\0' repeats 127 times, pc_prvspace = 0x808af900,
pc_curpmap = 0x8083ea50, pc_tssp = 0x808ae7d0, pc_commontssp =
0x808ae7d0, pc_rsp0 = -491518579456,
 pc_scratch_rsp = 140737488347240, pc_apic_id = 2, pc_acpi_id = 2, pc_fs32p =
0x808ad600, pc_gs32p = 0x808ad608, pc_ldt = 0x808ad648,
pc_tss = 0x808ad638, pc_cmci_mask = 8}

[snip]

Thank you.
A few more questions:
1. more kgdb info for the core:
p *(cpuid_to_pcpu[2]->pc_curthread)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc)
p *(cpuid_to_pcpu[2]->pc_curthread->td_proc->p_limit)



(kgdb) p *(cpuid_to_pcpu[2]->pc_curthread)
$1 = {td_lock = 0x8084a440, td_proc = 0xff070b5a48c0, td_plist = {tqe_next = 0x0, tqe_prev = 0xff070b5a48d0}, 
td_runq = {tqe_next = 0x0, tqe_prev = 0x8084a688}, td_slpq = {tqe_next = 0x0,
   tqe_prev = 0xff0296460900}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xff8d8fb5c8b0}, td_cpuset = 0xff0012d65dc8, 
td_sel = 0xff0a1b76c700, td_sleepqueue = 0xff0296460900,
 td_turnstile = 0xff05f31d8000, td_umtxq = 0xff05513d9780, td_tid = 102057, td_sigqueue = {sq_signals = {__bits = {0, 0, 
0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0,
 tqh_last = 0xff06b7f9c0a0}, sq_proc = 0xff070b5a48c0, sq_flags = 1}, td_flags = 6, td_inhibitors = 0, td_pflags = 0, 
td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 2 '\002',
 td_oncpu = 2 '\002', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 998, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked 
= 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
 td_intr_nesting_level = 0, td_pinned = 1, td_ucred = 0xff0551cf9900, td_estcpu = 0, td_slptick = 0, td_blktick = 0, td_ru = 
{ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 2068,
   ru_ixrss = 5280, ru_idrss = 19296, ru_isrss = 6144, ru_minflt = 5015, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock 
= 0, ru_msgsnd = 241, ru_msgrcv = 2076, ru_nsignals = 1, ru_nvcsw = 2264, ru_nivcsw = 159},
 td_incruntime = 4257692, td_runtime = 487523210, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 4, 
td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {16384, 0, 0, 0}},
 td_generation = 2423, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 
0, td_name = "httpd", '\0' <repeats 14 times>, td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {
   ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, 
si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0,
   sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = 
{_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 0,
   ksi_sigq = 0x0}, td_ng_outbound = 0

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Andriy Gapon


Thanks to the debug that Steven provided and to the help that I received from
Kostik, I think that now I understand the basic mechanics of this panic, but,
unfortunately, not the details of its root cause.

It seems like everything starts with some kind of a race between terminating
processes in a jail and termination of the jail itself.  This is where the
details are very thin so far.  What we see is that a process (http) is in
exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
flag is set and even past the place where p_limit is freed and reset to NULL.
At that place the thread calls prison_proc_free(), which calls prison_deref().
Then, we see that in prison_deref() the thread gets a page fault because of what
seems like a NULL pointer dereference.  That's just the start of the problem and
its root cause.

Then, trap_pfault() gets invoked and, because addresses close to NULL look like
userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn goes
on to call vm_map_growstack.  First thing that vm_map_growstack does is a call
to lim_cur(), but because p_limit is already NULL, that call results in a NULL
pointer dereference and a page fault.  Goto the beginning of this paragraph.

So we get this recursion of sorts, which only ends when a stack is exhausted and
a CPU generates a double-fault.

So, of course, Steven is interested in finding and fixing the root cause.  I
hope we will get to that with some help from the prison guards :-)

But I would also like to use this opportunity to discuss how we can make it
easier to debug issues such as this one.  I think that this problem demonstrates
that when we treat junk in a kernel pointer as a userland address, we pile
additional heaps of irrelevant stuff on top of the actual problem.  One
solution could be to use a special flag that would mark all genuine attempts to
access a userland address (e.g. setting the flag on entrance to copyin and
clearing it upon return), so that in the page fault handler we could distinguish
actual faults on userland addresses from faults on garbage kernel addresses.  I
am sure that there could be other clever techniques to catch such garbage
addresses early.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Kostik Belousov

On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:
[skip]

> But I also would like to use this opportunity to discuss how we can
> make it easier to debug such issue as this. I think that this problem
> demonstrates that when we treat certain junk in kernel address value
> as a userland address value, we throw additional heaps of irrelevant
> stuff on top of an actual problem. One solution could be to use a
> special flag that would mark all actual attempts to access userland
> address (e.g. setting the flag on entrance to copyin and clearing it
> upon return), so that in the page fault handler we could distinguish
> actual faults on userland addresses from faults on garbage kernel
> addresses. I am sure that there could be other clever techniques to
> catch such garbage addresses early.

We already have such a mechanism: kernel code that is aware of a usermode
page access sets pcb_onfault.  See the end of the trap_pfault() handler.
In fact, we can catch it earlier, before even calling vm_fault().

BTW, I think this is especially useful in combination with the support
for SMEP in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov kos...@pooma.home
Date:   Thu Aug 18 00:08:50 2011 +0300

Assert that the exiting process does not return to usermode.
On x86, do not call vm_fault() when the kernel is not prepared
to handle unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
                         goto nogo;
 
                 map = &vm->vm_map;
+
+                /*
+                 * When accessing a usermode address, kernel must be
+                 * ready to accept the page fault, and provide a
+                 * handling routine.  Since accessing the address
+                 * without the handler is a bug, do not try to handle
+                 * it normally, and panic immediately.
+                 */
+                if (!usermode && (td->td_intr_nesting_level != 0 ||
+                    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+                        trap_fatal(frame, eva);
+                        return (-1);
+                }
         }
 
         /*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
                         goto nogo;
 
                 map = &vm->vm_map;
+                if (!usermode && (td->td_intr_nesting_level != 0 ||
+                    PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+                        trap_fatal(frame, eva);
+                        return (-1);
+                }
         }
 
         /*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
         CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
             td->td_name);
+        KASSERT((p->p_flag & P_WEXIT) == 0,
+            ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
         /* Check that we called signotify() enough. */



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



Thanks to the debug that Steven provided and to the help that I received from
Kostik, I think that now I understand the basic mechanics of this panic, but,
unfortunately, not the details of its root cause.

It seems like everything starts with some kind of a race between terminating
processes in a jail and termination of the jail itself.  This is where the
details are very thin so far.  What we see is that a process (http) is in
exit(2) syscall, in exit1() function actually, and past the place where P_WEXIT
flag is set and even past the place where p_limit is freed and reset to NULL.
At that place the thread calls prison_proc_free(), which calls prison_deref().
Then, we see that in prison_deref() the thread gets a page fault because of what
seems like a NULL pointer dereference.  That's just the start of the problem and
its root cause.


That's interesting, are you using http as an example or is that something that's
been gleaned from the debugging of our output? I ask as there's only one process
running in each of our jails, and that's a single java process.

Now given your description there may be something I can add that may help
clarify what the cause could be.

In a nutshell, the jail manager we're using will attempt to resurrect the jail
from a dying state in a few specific scenarios.

Here's an example:-
1. jail restart requested
2. jail is stopped, so the java process is killed off, but active tcp sessions
may prevent the timely full shutdown of the jail.
3. if an existing jail is detected, i.e. a dying jail from #2, instead of
starting a new jail we attach to the old one and exec the new java process.
4. if an existing jail isn't detected, i.e. where there were no hanging tcp
sessions and #2 cleanly shut down the jail, a new jail is created, attached to
and the java process exec'ed.

The system uses static jailids, so it's possible to determine if an existing
jail for this service exists or not. This prevents duplicate services as
well as making services easy to identify by their jailid.

So what we could be seeing is a race between the jail shutdown and the attach
of the new process?

Now man 2 jail seems to indicate this is a valid use case for jail_set, as
it documents its support for JAIL_DYING as a valid option for flags, but I
suspect it's something quite out of the ordinary to actually do, which may be
why this panic hasn't been seen before now.

As some background, the reason we use static jailids is to ensure only one
instance of the jailed service is running, and the reason we re-attach to
the dying jail is so that jails can be restarted in a timely manner. Without
using the re-attach we would need to wait for all tcp sessions which have
been aborted to time out.


So, of course, Steven is interested in finding and fixing the root cause.  I
hope we will get to that with some help from the prison guards :-)


Does the above potentially explain how we're getting to the situation
which generates the panic?

If so we can certainly look at using alternatives to the current design to
work around this issue. Flagging the jail as permanent and using manual process
management and additional external locking to prevent duplicates is what
instantly springs to mind.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-16 Thread Andriy Gapon

on 15/08/2011 17:56 Steven Hartland said the following:
 (kgdb) x/512a 0xff8d8f357210
[snip]

Can you please also provide the following for this core?
list *vm_map_growstack+93
list *lim_cur+17
list *lim_rlimit+18

Also, it would be interesting to get panic output with DDB option.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-16 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org

To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Tuesday, August 16, 2011 9:30 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE

on 15/08/2011 17:56 Steven Hartland said the following:

(kgdb) x/512a 0xff8d8f357210

[snip]

Can you please also provide the following for this core?
list *vm_map_growstack+93
list *lim_cur+17
list *lim_rlimit+18

Also, it would be interesting to get panic output with DDB option.

Here's the info:-

(kgdb) list *vm_map_growstack+93
0x80543ffd is in vm_map_growstack (/usr/src/sys/vm/vm_map.c:3305).
3300            struct uidinfo *uip;
3301
3302    Retry:
3303            PROC_LOCK(p);
3304            stacklim = lim_cur(p, RLIMIT_STACK);
3305            vmemlim = lim_cur(p, RLIMIT_VMEM);
3306            PROC_UNLOCK(p);
3307
3308            vm_map_lock_read(map);
3309
(kgdb) list *lim_cur+17
0x80384681 is in lim_cur (/usr/src/sys/kern/kern_resource.c:1150).
1145    rlim_t
1146    lim_cur(struct proc *p, int which)
1147    {
1148            struct rlimit rl;
1149
1150            lim_rlimit(p, which, &rl);
1151            return (rl.rlim_cur);
1152    }
1153
1154    /*
(kgdb) list *lim_rlimit+18
0x80384632 is in lim_rlimit (/usr/src/sys/kern/kern_resource.c:1165).
1160    {
1161
1162            PROC_LOCK_ASSERT(p, MA_OWNED);
1163            KASSERT(which >= 0 && which < RLIM_NLIMITS,
1164                ("request for invalid resource limit"));
1165            *rlp = p->p_limit->pl_rlimit[which];
1166            if (p->p_sysent->sv_fixlimit != NULL)
1167                    p->p_sysent->sv_fixlimit(rlp, which);
1168    }
1169

I've yet to have the machine with DDB + expanded stack panic.

I plan to leave it a day or so more then try a reboot to see if that
triggers it. If not I'll drop the stack back down to 4 and see if that
enables us to get another panic.

   Regards
   Steve


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Andriy Gapon

on 14/08/2011 17:43 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org

 Maybe test it on a couple of machines first just in case I overlooked something
 essential, although I have a report from another user that the patch didn't
 break anything for him (it was tested for an unrelated issue).
 
 We've got this running on ~40 machines and just had the first panic
 since the update. Unfortunately it doesn't seem to have changed anything :(
 
 We have 352 thread entries starting with:-
 #0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0,
 flags=Variable flags is not available.
 23 with:-
 cpustop_handler () at atomic.h:285
 and 16 with:-
 #0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

I would like to get a full output of thread apply all bt.

 The main message being:-
 panic: double fault
 
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd"...
 
 Unread portion of the kernel message buffer:
 <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

So this line, does it indicate a shutdown of a jail or of the whole system?

 Fatal double fault
 rip = 0x8053b691

Can you please provide output of 'list *0x8053b691' in kgdb?

 rsp = 0xff8d8f356fb0
 rbp = 0xff8d8f357210
 cpuid = 2; apic id = 02
 panic: double fault
 cpuid = 2
 KDB: stack backtrace:
 #0 0x803bb75e at kdb_backtrace+0x5e
 #1 0x8038956e at panic+0x2ae
 #2 0x805802b6 at dblfault_handler+0x96
 #3 0x8056900d at Xdblfault+0xad

I think (not 100% sure) that with DDB in kernel we could get a better backtrace
here, possibly with pre-dblfault stack frames, because DDB backend is a bit
smarter than the trivial stack(9) printer.

 stack: 0xff8d8f357000, 4

One thing I can say is that this looks like a double-fault because of stack
exhaustion (the most typical cause): rsp value is below td_kstack.
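
The comparison behind that remark can be sketched in a few lines of C. This is only an illustration, not kernel code: kstack_overrun is a made-up helper, kstack_base stands in for td_kstack, and the addresses are used exactly as they are printed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many bytes rsp lies below the lowest address of a
 * downward-growing kernel stack, or 0 if it is still within bounds.
 * kstack_base stands in for td_kstack; both names here are illustrative.
 */
uint64_t
kstack_overrun(uint64_t rsp, uint64_t kstack_base)
{
	assert(kstack_base != 0);
	return (rsp < kstack_base ? kstack_base - rsp : 0);
}
```

With the values from the panic message (rsp = 0xff8d8f356fb0, stack: 0xff8d8f357000) the stack pointer comes out 0x50 bytes below the bottom of the stack.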

Can you please also provide the following information:
p *((struct pcb *)((char *)0xff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is a value of KSTACK_PAGES option (amd64 default is 4) and
PAGE_SIZE is 4096.
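
For readers unfamiliar with the pointer arithmetic in that kgdb expression, here is a hedged sketch of what it computes. The helper names are hypothetical, and pcb_size is a parameter because sizeof(struct pcb) is not stated anywhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One past the highest byte of a stack of kstack_pages pages. */
uint64_t
kstack_top(uint64_t kstack_base, unsigned kstack_pages, uint64_t page_size)
{
	assert(kstack_pages > 0 && page_size > 0);
	return (kstack_base + (uint64_t)kstack_pages * page_size);
}

/*
 * Address of a structure of pcb_size bytes that ends exactly at the stack
 * top; this is what "(struct pcb *)(top) - 1" computes in the debugger.
 */
uint64_t
pcb_address(uint64_t kstack_base, unsigned kstack_pages, uint64_t page_size,
    uint64_t pcb_size)
{
	return (kstack_top(kstack_base, kstack_pages, page_size) - pcb_size);
}
```

For the stack reported here (base 0xff8d8f357000, 4 pages of 4096 bytes) the top works out to 0xff8d8f35b000; the pcb address then depends on the real structure size in the running kernel.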

 rsp = 0xff89ae10

[snip]

 There are some indications that stopping jails could be the
 cause of the panics so on one test box I've added in invariants
 to see if anything shows up from that.

OK.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org

We have 352 thread entries starting with:-
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0,
flags=Variable flags is not available.
23 with:-
cpustop_handler () at atomic.h:285
and 16 with:-
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562


I would like to get a full output of thread apply all bt.


http://blog.multplay.co.uk/dropzone/freebsd/panic-2011-08-14-1524.txt


The main message being:-
panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15


So this line, does it indicate a shutdown of a jail or of the whole system?


This specific panic was caused by me running reboot after all jails (~40)
were shut down, which is slightly different from what my colleague was seeing
last Friday, where the machines were panicking when the jails themselves
were stopped.

I may have a crash from one of these if needed.


Fatal double fault
rip = 0x8053b691


Can you please provide output of 'list *0x8053b691' in kgdb?


(kgdb) list *0x8053b691
0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234 /*
235  * Find the backing store object and offset into it to begin the
236  * search.
237  */
238 fs.map = map;
239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240 &fs.first_object, &fs.first_pindex, &prot, &wired);
241 if (result != KERN_SUCCESS) {
242 if (result != KERN_PROTECTION_FAILURE ||
243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {




rsp = 0xff8d8f356fb0
rbp = 0xff8d8f357210
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
#0 0x803bb75e at kdb_backtrace+0x5e
#1 0x8038956e at panic+0x2ae
#2 0x805802b6 at dblfault_handler+0x96
#3 0x8056900d at Xdblfault+0xad


I think (not 100% sure) that with DDB in kernel we could get a better backtrace
here, possibly with pre-dblfault stack frames, because DDB backend is a bit
smarter than the trivial stack(9) printer.


I've added this into the kernel on my test machine and will try
to get it to panic over the next few days. It seems to need a few days of
uptime before the panics start happening. This is in addition to increasing
KSTACK_PAGES to 12; if you believe this may be stack exhaustion, do
you want me to remove this increase?



stack: 0xff8d8f357000, 4


One thing I can say is that this looks like a double-fault because of stack
exhaustion (the most typical cause): rsp value is below td_kstack.

Can you please also provide the following information:
p *((struct pcb *)((char *)0xff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
where KSTACK_PAGES is a value of KSTACK_PAGES option (amd64 default is 4) and
PAGE_SIZE is 4096.


(kgdb) p *((struct pcb *)((char *)0xff8d8f357000 + 4 * 4096) - 1)
$1 = {pcb_r15 = -2138686968, pcb_r14 = -1070655224792, pcb_r13 = 0, pcb_r12 = -1070655225856, pcb_rbp = -491518580864, pcb_rsp 
= -491518580952, pcb_rbx = -1099195460512, pcb_rip = -2143622375, pcb_fsbase = 34365428376,
 pcb_gsbase = 0, pcb_kgsbase = 0, pcb_cr0 = 0, pcb_cr2 = 0, pcb_cr3 = 12406784, pcb_cr4 = 0, pcb_dr0 = 0, pcb_dr1 = 0, pcb_dr2 = 
0, pcb_dr3 = 0, pcb_dr6 = 0, pcb_dr7 = 0, pcb_flags = 0, pcb_initial_fpucw = 895,
 pcb_onfault = 0x0, pcb_gs32sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx = 0, 
sd_long = 0, sd_def32 = 0, sd_gran = 0, sd_hibase = 0}, pcb_tssp = 0x0,
 pcb_save = 0xff8d8f35ae00, pcb_full_iret = 0 '\0', pcb_gdt = {rd_limit = 0, rd_base = 0}, pcb_idt = {rd_limit = 0, rd_base = 
0}, pcb_ldt = {rd_limit = 0, rd_base = 0}, pcb_tr = 0, pcb_user_save = {sv_env = {en_cw = 895,
 en_sw = 0, en_tw = 0 '\0', en_zero = 0 '\0', en_opcode = 0, en_rip = 0, en_rdp = 0, en_mxcsr = 8096, en_mxcsr_mask = 65535}, 
sv_fp = {{fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000},
   fp_pad = \000\000\000\000\000}, {fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000}, fp_pad = 
\000\000\000\000\000}, {fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000},
   fp_pad = \000\000\000\000\000}, {fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000}, fp_pad = 
\000\000\000\000\000}, {fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000},
   fp_pad = \000\000\000\000\000}, {fp_acc = {fp_bytes = \000\000\000\000\000\000\000\000\000}, fp_pad =

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Andriy Gapon

on 15/08/2011 13:34 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 I think (not 100% sure) that with DDB in kernel we could get a better backtrace
 here, possibly with pre-dblfault stack frames, because DDB backend is a bit
 smarter than the trivial stack(9) printer.
 
 I've added this into the kernel on my test machine and will try
 to get it to panic over the next few days. It seems to need a few days of
 uptime before the panics start happening. This is in addition to increasing
 KSTACK_PAGES to 12; if you believe this may be stack exhaustion, do
 you want me to remove this increase?

Yes, I think it would make sense to change KSTACK_PAGES to the default value.
But, OTOH, if you can afford to have DDB in a few more machines, then it would
be interesting to compare behavior with different stack sizes.

BTW, if you don't want your machines to sit at the ddb prompt after a panic, then
you'd also need either the KDB_UNATTENDED option or to set
debug.debugger_on_panic=0.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Andriy Gapon

on 15/08/2011 13:34 Steven Hartland said the following:
 (kgdb) list *0x8053b691
 0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
 234 /*
 235  * Find the backing store object and offset into it to begin the
 236  * search.
 237  */
 238 fs.map = map;
 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
 240 &fs.first_object, &fs.first_pindex, &prot, &wired);
 241 if (result != KERN_SUCCESS) {
 242 if (result != KERN_PROTECTION_FAILURE ||
 243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
 

Interesting... thanks!
Can you please also additionally provide (lengthy) output of x/512a
0xff8d8f356fb0 ?

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org




on 15/08/2011 13:34 Steven Hartland said the following:

(kgdb) list *0x8053b691
0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234 /*
235  * Find the backing store object and offset into it to begin the
236  * search.
237  */
238 fs.map = map;
239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240 &fs.first_object, &fs.first_pindex, &prot, &wired);
241 if (result != KERN_SUCCESS) {
242 if (result != KERN_PROTECTION_FAILURE ||
243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {



Interesting... thanks!
Can you please also additionally provide (lengthy) output of x/512a
0xff8d8f356fb0 ?


Sorry, I'm not sure I follow you there?

Do you mean any of the following:-
(kgdb) x/512a
0xff8d8f35b000: Cannot access memory at address 0xff8d8f35b000

(kgdb) list *0xff8d8f356fb0
No source file for address 0xff8d8f356fb0.

or:
(kgdb) x/512a 0xff8d8f356fb0
0xff8d8f356fb0: Cannot access memory at address 0xff8d8f356fb0

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Andriy Gapon

on 15/08/2011 15:51 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 
 on 15/08/2011 13:34 Steven Hartland said the following:
 (kgdb) list *0x8053b691
 0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
 234 /*
 235  * Find the backing store object and offset into it to begin the
 236  * search.
 237  */
 238 fs.map = map;
 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
 240 &fs.first_object, &fs.first_pindex, &prot, &wired);
 241 if (result != KERN_SUCCESS) {
 242 if (result != KERN_PROTECTION_FAILURE ||
 243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {


 Interesting... thanks!
 Can you please also additionally provide (lengthy) output of x/512a
 0xff8d8f356fb0 ?
 
 Sorry, I'm not sure I follow you there?

It seems that you got me correctly :)

 Do you mean any of the following:-
 (kgdb) x/512a
 0xff8d8f35b000: Cannot access memory at address 0xff8d8f35b000

 (kgdb) list *0xff8d8f356fb0
 No source file for address 0xff8d8f356fb0.
 
 or:
 (kgdb) x/512a 0xff8d8f356fb0
 0xff8d8f356fb0: Cannot access memory at address 0xff8d8f356fb0

Can you please try this (the last command) with 0xff8d8f357210 instead?

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland



- Original Message - 
From: Andriy Gapon a...@freebsd.org

To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Monday, August 15, 2011 2:20 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 15/08/2011 15:51 Steven Hartland said the following:

- Original Message - From: Andriy Gapon a...@freebsd.org



on 15/08/2011 13:34 Steven Hartland said the following:

(kgdb) list *0x8053b691
0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234 /*
235  * Find the backing store object and offset into it to begin the
236  * search.
237  */
238 fs.map = map;
239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240 &fs.first_object, &fs.first_pindex, &prot, &wired);
241 if (result != KERN_SUCCESS) {
242 if (result != KERN_PROTECTION_FAILURE ||
243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {



Interesting... thanks!
Can you please also additionally provide (lengthy) output of x/512a
0xff8d8f356fb0 ?


Sorry, I'm not sure I follow you there?


It seems that you got me correctly :)


Do you mean any of the following:-
(kgdb) x/512a
0xff8d8f35b000: Cannot access memory at address 0xff8d8f35b000

(kgdb) list *0xff8d8f356fb0
No source file for address 0xff8d8f356fb0.

or:
(kgdb) x/512a 0xff8d8f356fb0
0xff8d8f356fb0: Cannot access memory at address 0xff8d8f356fb0


Can you please try this (the last command) with 0xff8d8f357210 instead?


(kgdb) x/512a 0xff8d8f357210
0xff8d8f357210: 0xff8d8f357280  0x805807d3 trap_pfault+307
0xff8d8f357220: 0x0 0xff8d8f357370
0xff8d8f357230: 0xff06b7f9c000  0x30
0xff8d8f357240: 0x1 0x0
0xff8d8f357250: 0x0 0x9
0xff8d8f357260: 0xc 0xff8d8f357370
0xff8d8f357270: 0xff06b7f9c000  0x0
0xff8d8f357280: 0xff8d8f357360  0x80580e0f trap+991
0xff8d8f357290: 0x0 0x0
0xff8d8f3572a0: 0x80074e49e 0x2
0xff8d8f3572b0: 0x80071cba0 0x80071cdc0
0xff8d8f3572c0: 0x80071c9a0 0x0
0xff8d8f3572d0: 0x0 0x0
0xff8d8f3572e0: 0x0 0x0
0xff8d8f3572f0: 0x0 0x0
0xff8d8f357300: 0x80074e49e 0x1
0xff8d8f357310: 0x80071cba0 0x80071cdc0
0xff8d8f357320: 0x80071c9a0 0x0
0xff8d8f357330: 0x0 0x4
0xff8d8f357340: 0xff070b5a48c0  0xff06b7f9c000
0xff8d8f357350: 0x0 0x8083e920 vmspace0
0xff8d8f357360: 0xff8d8f357430  0x80568f04 calltrap+8
0xff8d8f357370: 0xff070b5a48c0  0x3
0xff8d8f357380: 0xff8d8f357440  0x0
0xff8d8f357390: 0xff8d8f357440  0x30
0xff8d8f3573a0: 0xff06b7f9c000  0x4
0xff8d8f3573b0: 0xff8d8f357430  0x8083e920 vmspace0
0xff8d8f3573c0: 0xff06b7f9c000  0xff070b5a48c0
0xff8d8f3573d0: 0xff06b7f9c000  0x0
0xff8d8f3573e0: 0x8083e920 vmspace0   0x1b0013000c
0xff8d8f3573f0: 0x30    0x3b003b0001
0xff8d8f357400: 0x0 0x80384632 lim_rlimit+18
0xff8d8f357410: 0x20    0x10206
0xff8d8f357420: 0xff8d8f357430  0x28
0xff8d8f357430: 0xff8d8f357450  0x80384681 lim_cur+17
0xff8d8f357440: 0x4 0xff070b5a48c0
0xff8d8f357450: 0xff8d8f357500  0x80543ffd vm_map_growstack+93
0xff8d8f357460: 0xff8d8f357470  0xff8d8f3576d8
0xff8d8f357470: 0xff8d8f357500  0x80544ef8 vm_map_lookup+808
0xff8d8f357480: 0xff070b5a49b8  0x0
0xff8d8f357490: 0x8 0xff06b7f9c000
0xff8d8f3574a0: 0xff06b7f9c000  0xff8d8f3576d8
0xff8d8f3574b0: 0xff8d8f3576d0  0xff8d8f3576e8
0xff8d8f3574c0: 0x0 0xff8d8f3576e0
0xff8d8f3574d0: 0x10001 0x1
0xff8d8f3574e0: 0xff06b7f9c000  0x1
0xff8d8f3574f0: 0x0 0x8083e920 vmspace0
0xff8d8f357500: 0xff8d8f357770  0x8053c723 vm_fault+4355
0xff8d8f357510: 0xff8d8f35773f  0xff8d8f357738
0xff8d8f357520: 0x80085e4f9 0x80085e4f8
0xff8d8f357530: 0xff06b7f9c000  0xff8d8f3576e0
0xff8d8f357540: 0xff8d8f3576e8  0xff8d8f3576d0
0xff8d8f357550: 0xff8d8f3576d8  0x80085e4f9
0xff8d8f357560: 0x80085e4f9 0x80085e4f9
0xff8d8f357570: 0x80085e4f9 0x80085e4f9
0xff8d8f357580: 0x80085e4f9 0x80085e4f9
0xff8d8f357590: 0x80085e4f9 0x80085e4f9
0xff8d8f3575a0: 0x80085e4f9 0x734210
0xff8d8f3575b0: 0x101   0x80073ada0

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Andriy Gapon

on 15/08/2011 17:56 Steven Hartland said the following:
 
 - Original Message - From: Andriy Gapon a...@freebsd.org
 To: Steven Hartland kill...@multiplay.co.uk
 Cc: freebsd-stable@FreeBSD.org
 Sent: Monday, August 15, 2011 2:20 PM
 Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
 
 
 on 15/08/2011 15:51 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org


 on 15/08/2011 13:34 Steven Hartland said the following:
 (kgdb) list *0x8053b691
 0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
 234 /*
 235  * Find the backing store object and offset into it to begin the
 236  * search.
 237  */
 238 fs.map = map;
 239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
 240 &fs.first_object, &fs.first_pindex, &prot, &wired);
 241 if (result != KERN_SUCCESS) {
 242 if (result != KERN_PROTECTION_FAILURE ||
 243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {


 Interesting... thanks!
[snip]
 (kgdb) x/512a 0xff8d8f357210

This is not conclusive, but that stack looks like the following recursive chain:
vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault
So I suspect that increasing kernel stack size won't help here much.
Where does this chain come from?  I have no answer at the moment, maybe other
developers could help here.  I suspect that we shouldn't be getting that trap in
vm_map_growstack or should handle it in a different way.
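
The chain can be read off the dump by following saved frame pointers. The sketch below shows the idea in user-space C; it assumes the kernel was built with frame pointers, so that [rbp] holds the caller's rbp and [rbp + 8] the return address. struct dump_word, dump_read and walk_frames are illustrative names, not anything from kgdb or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-byte word of a memory dump: where it lives and what it holds. */
struct dump_word {
	uint64_t addr;
	uint64_t value;
};

/* Find the word stored at addr; return 1 and fill *out if present. */
static int
dump_read(const struct dump_word *dump, size_t n, uint64_t addr, uint64_t *out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (dump[i].addr == addr) {
			*out = dump[i].value;
			return (1);
		}
	}
	return (0);
}

/*
 * Follow the saved-rbp chain starting at rbp.  Stores up to max return
 * addresses in ret[] and returns how many were found.  The walk stops when
 * a needed word is missing from the dump or when the chain stops moving
 * toward higher addresses.
 */
size_t
walk_frames(const struct dump_word *dump, size_t n, uint64_t rbp,
    uint64_t *ret, size_t max)
{
	size_t count = 0;
	uint64_t next, ra;

	assert(dump != NULL && ret != NULL);
	while (count < max &&
	    dump_read(dump, n, rbp, &next) &&
	    dump_read(dump, n, rbp + 8, &ra)) {
		ret[count++] = ra;
		if (next <= rbp)
			break;
		rbp = next;
	}
	return (count);
}
```

Fed the words at 0xff8d8f357210, ...280, ...360, ...430, ...450 and ...500 from the x/512a output, starting at rbp = 0xff8d8f357210, the walk yields the return addresses that kgdb labels trap_pfault+307, trap+991, calltrap+8, lim_cur+17, vm_map_growstack+93 and vm_fault+4355, in that order.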

 0xff8d8f357210: 0xff8d8f357280  0x805807d3 trap_pfault+307
 0xff8d8f357220: 0x0 0xff8d8f357370
 0xff8d8f357230: 0xff06b7f9c000  0x30
 0xff8d8f357240: 0x1 0x0
 0xff8d8f357250: 0x0 0x9
 0xff8d8f357260: 0xc 0xff8d8f357370
 0xff8d8f357270: 0xff06b7f9c000  0x0
 0xff8d8f357280: 0xff8d8f357360  0x80580e0f trap+991
 0xff8d8f357290: 0x0 0x0
 0xff8d8f3572a0: 0x80074e49e 0x2
 0xff8d8f3572b0: 0x80071cba0 0x80071cdc0
 0xff8d8f3572c0: 0x80071c9a0 0x0
 0xff8d8f3572d0: 0x0 0x0
 0xff8d8f3572e0: 0x0 0x0
 0xff8d8f3572f0: 0x0 0x0
 0xff8d8f357300: 0x80074e49e 0x1
 0xff8d8f357310: 0x80071cba0 0x80071cdc0
 0xff8d8f357320: 0x80071c9a0 0x0
 0xff8d8f357330: 0x0 0x4
 0xff8d8f357340: 0xff070b5a48c0  0xff06b7f9c000
 0xff8d8f357350: 0x0 0x8083e920 vmspace0
 0xff8d8f357360: 0xff8d8f357430  0x80568f04 calltrap+8
 0xff8d8f357370: 0xff070b5a48c0  0x3
 0xff8d8f357380: 0xff8d8f357440  0x0
 0xff8d8f357390: 0xff8d8f357440  0x30
 0xff8d8f3573a0: 0xff06b7f9c000  0x4
 0xff8d8f3573b0: 0xff8d8f357430  0x8083e920 vmspace0
 0xff8d8f3573c0: 0xff06b7f9c000  0xff070b5a48c0
 0xff8d8f3573d0: 0xff06b7f9c000  0x0
 0xff8d8f3573e0: 0x8083e920 vmspace0   0x1b0013000c
 0xff8d8f3573f0: 0x30    0x3b003b0001
 0xff8d8f357400: 0x0 0x80384632 lim_rlimit+18
 0xff8d8f357410: 0x20    0x10206
 0xff8d8f357420: 0xff8d8f357430  0x28
 0xff8d8f357430: 0xff8d8f357450  0x80384681 lim_cur+17
 0xff8d8f357440: 0x4 0xff070b5a48c0
 0xff8d8f357450: 0xff8d8f357500  0x80543ffd vm_map_growstack+93
 0xff8d8f357460: 0xff8d8f357470  0xff8d8f3576d8
 0xff8d8f357470: 0xff8d8f357500  0x80544ef8 vm_map_lookup+808
[trim]

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland



- Original Message - 
From: Andriy Gapon a...@freebsd.org

To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Monday, August 15, 2011 4:36 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 15/08/2011 17:56 Steven Hartland said the following:


- Original Message - From: Andriy Gapon a...@freebsd.org
To: Steven Hartland kill...@multiplay.co.uk
Cc: freebsd-stable@FreeBSD.org
Sent: Monday, August 15, 2011 2:20 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE



on 15/08/2011 15:51 Steven Hartland said the following:

- Original Message - From: Andriy Gapon a...@freebsd.org



on 15/08/2011 13:34 Steven Hartland said the following:

(kgdb) list *0x8053b691
0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234 /*
235  * Find the backing store object and offset into it to begin the
236  * search.
237  */
238 fs.map = map;
239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240 &fs.first_object, &fs.first_pindex, &prot, &wired);
241 if (result != KERN_SUCCESS) {
242 if (result != KERN_PROTECTION_FAILURE ||
243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {



Interesting... thanks!

[snip]

(kgdb) x/512a 0xff8d8f357210


This is not conclusive, but that stack looks like the following recursive chain:
vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault
So I suspect that increasing kernel stack size won't help here much.
Where does this chain come from?  I have no answer at the moment, maybe other
developers could help here.  I suspect that we shouldn't be getting that trap in
vm_map_growstack or should handle it in a different way.
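
To put a rough number on why a larger stack would only delay the panic: in the x/512a dump posted earlier in this thread, the faulting vm_fault frame has rbp 0xff8d8f357210, and the frame that returns to vm_fault+4355 saves rbp 0xff8d8f357770. If those are two successive vm_fault frames, one trip around the chain costs about 0x560 bytes. The following is only a sketch under that assumption; max_recursion_cycles and its parameters are illustrative names, and anything the kernel keeps at the top of the stack has to be passed in as reserved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how many times a recursive call chain that consumes cycle_bytes
 * of stack per iteration fits into a kernel stack of kstack_pages pages,
 * after setting aside reserved bytes (for example a pcb at the stack top).
 */
uint64_t
max_recursion_cycles(unsigned kstack_pages, uint64_t page_size,
    uint64_t reserved, uint64_t cycle_bytes)
{
	uint64_t total;

	assert(cycle_bytes > 0);
	total = (uint64_t)kstack_pages * page_size;
	if (reserved >= total)
		return (0);
	return ((total - reserved) / cycle_bytes);
}
```

With 4 KB pages and nothing reserved, 4 pages allow about 11 such cycles and 12 pages about 35, so an unbounded recursion runs out either way.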



Just in case it's relevant, I've checked other crashes and all rip entries
point to: vm_fault (/usr/src/sys/vm/vm_fault.c:239).

A more typical layout, from a selection of machines, is:-

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0x8053b061
rsp = 0xff86ccf8ffb0
rbp = 0xff86ccf90210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0x803bb28e at kdb_backtrace+0x5e
#1 0x80389187 at panic+0x187
#2 0x8057fc86 at dblfault_handler+0x96
#3 0x805689dd at Xdblfault+0xad
Uptime: 2d21h25m4s
Physical memory: 24555 MB
Dumping 4184 MB:...


Unread portion of the kernel message buffer:

Fatal double fault
rip = 0x8053b061
rsp = 0xff86cc742fb0
rbp = 0xff86cc743210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0x803bb28e at kdb_backtrace+0x5e
#1 0x80389187 at panic+0x187
#2 0x8057fc86 at dblfault_handler+0x96
#3 0x805689dd at Xdblfault+0xad
Uptime: 2d4h30m58s
Physical memory: 24555 MB
Dumping 5088 MB:...


Fatal double fault
rip = 0x8053b061
rsp = 0xff86caeabfb0
rbp = 0xff86caeac210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0x803bb28e at kdb_backtrace+0x5e
#1 0x80389187 at panic+0x187
#2 0x8057fc86 at dblfault_handler+0x96
#3 0x805689dd at Xdblfault+0xad
Uptime: 3d1h56m45s
Physical memory: 24555 MB
Dumping 4690 MB:...


Fatal double fault
rip = 0x8053b061
rsp = 0xff86cb1c7fb0
rbp = 0xff86cb1c8210
cpuid = 4; apic id = 04
panic: double fault
cpuid = 4
KDB: stack backtrace:
#0 0x803bb28e at kdb_backtrace+0x5e
#1 0x80389187 at panic+0x187
#2 0x8057fc86 at dblfault_handler+0x96
#3 0x805689dd at Xdblfault+0xad
Uptime: 1d13h41m19s
Physical memory: 24555 MB
Dumping 3626 MB:...

And in case any of the changes to loader.conf or sysctl.conf are
relevant here they are:-
[loader.conf]
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
# fix swap zone exhausted, increase kern.maxswzone
kern.maxswzone="67108864"
# Reduce the minimum arc level; we want our apps to have the memory
vfs.zfs.arc_min="512M"
[/loader.conf]

[sysctl.conf]
vfs.read_max=32
net.inet.tcp.inflight.enable=0
net.inet.tcp.sendspace=65536
kern.ipc.maxsockbuf=524288
kern.maxfiles=5
kern.ipc.nmbclusters=51200
[/sysctl.conf]

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org


Maybe test it on a couple of machines first just in case I overlooked something
essential, although I have a report from another user that the patch didn't break
anything for him (it was tested for an unrelated issue).


We've got this running on ~40 machines and just had the first panic
since the update. Unfortunately it doesn't seem to have changed anything :(

We have 352 thread entries starting with:-
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable 
flags is not available.
23 with:-
cpustop_handler () at atomic.h:285
and 16 with:-
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562

The main message being:-
panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Fatal double fault
rip = 0x8053b691
rsp = 0xff8d8f356fb0
rbp = 0xff8d8f357210
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
#0 0x803bb75e at kdb_backtrace+0x5e
#1 0x8038956e at panic+0x2ae
#2 0x805802b6 at dblfault_handler+0x96
#3 0x8056900d at Xdblfault+0xad
stack: 0xff8d8f357000, 4
rsp = 0xff89ae10
Uptime: 2d21h6m18s
Physical memory: 49132 MB
Dumping 17080 MB: 17065...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from 
/boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from 
/boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable 
flags is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
1858cpuid = PCPU_GET(cpuid);
(kgdb) #0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable 
flags is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
#1  0x80391a99 in mi_switch (flags=260, newtd=0x0)
   at /usr/src/sys/kern/kern_synch.c:451
#2  0x803c5112 in sleepq_timedwait (wchan=0x8083e080, pri=68)
   at /usr/src/sys/kern/subr_sleepqueue.c:644
#3  0x80391efb in _sleep (ident=0x8083e080, lock=0x0,
   priority=Variable priority is not available.) at 
/usr/src/sys/kern/kern_synch.c:230
#4  0x8053ebc9 in scheduler (dummy=Variable dummy is not available.)
   at /usr/src/sys/vm/vm_glue.c:807
#5  0x80341767 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6  0x8016efdc in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7  0x80863dc8 in sleepq_chains ()
#8  0x80848ae0 in cpu_top ()
#9  0x in ?? ()
#10 0x8083e4e0 in proc0 ()
#11 0x80bb3b90 in ?? ()
#12 0x80bb3b38 in ?? ()
#13 0xff0012d838c0 in ?? ()
#14 0x803aeb19 in sched_switch (td=0x0, newtd=0x0, flags=Variable 
flags is not available.)
   at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)

There are some indications that stopping jails could be the
cause of the panics so on one test box I've added in invariants
to see if anything shows up from that.

   Regards
   Steve




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland



- Original Message - 
From: Attilio Rao atti...@freebsd.org

Anyway, we really would need much more information in order to take a
proactive action.



Would it be possible to get access to one of the panicking machines? Is it
always the same panic that is happening, or does it vary (e.g. once a
page fault, once a fatal double fault, once a fatal trap, etc.)?


They are always double faults, 99% of the time with no additional info.
We've seen one mention of java on one of the machines, but the vmcore
didn't seem to mention anything to do with that after the dump.

My colleague informs me that when he did the upgrade to add in the scheduler
stop patch, pretty much every machine panicked when shutting the
java servers down, which is essentially a jail stop.

I've also had two panics when rebooting my test machine to change
kernel settings, although this could be a side effect of the scheduler
patch?

This single test machine is now running with the following non-standard
settings:-
options INVARIANTS
options INVARIANT_SUPPORT
options DDB
options KSTACK_PAGES=12

I've got several vmcores from a number of different machines but none
seem to be of any use, as they don't seem to list any thread that caused
the panic, i.e. no mention of dump or fault.

Is there something else in particular I should be looking for?

Circumstantial evidence seems to indicate uptime may be a factor;
machines with under 2 days of uptime seem much less likely to panic.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Andriy Gapon

on 10/08/2011 18:35 Steven Hartland said the following:
 Fatal double fault
 rip = 0x8052f6f1
 rsp = 0xff86ce600fb0
 rbp = 0xff86ce601210
 cpuid = 0; apic id = 00
 panic: double fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0x803af91e at kdb_backtrace+0x5e
 #1 0x8037d817 at panic+0x187
 #2 0x80574316 at dblfault_handler+0x96
 #3 0x8055d06d at Xdblfault+0xad
[snip]
 #0  sched_switch (td=0x80830bc0, newtd=0xff000a73f8c0, 
 flags=Variable
 flags is not available.)
at /usr/src/sys/kern/sched_ule.c:1858
 1858cpuid = PCPU_GET(cpuid);
 (kgdb)
 #0  sched_switch (td=0x80830bc0, newtd=0xff000a73f8c0, 
 flags=Variable
 flags is not available.)
at /usr/src/sys/kern/sched_ule.c:1858
 #1  0x80385c86 in mi_switch (flags=260, newtd=0x0)
at /usr/src/sys/kern/kern_synch.c:449
 #2  0x803b92d2 in sleepq_timedwait (wchan=0x80830760, pri=68)
at /usr/src/sys/kern/subr_sleepqueue.c:644
 #3  0x803861e1 in _sleep (ident=0x80830760, lock=0x0,
priority=Variable priority is not available.
 ) at /usr/src/sys/kern/kern_synch.c:230
 #4  0x80532c29 in scheduler (dummy=Variable dummy is not available.
 ) at /usr/src/sys/vm/vm_glue.c:807
 #5  0x80335d67 in mi_startup () at /usr/src/sys/kern/init_main.c:254
 #6  0x8016efac in btext () at /usr/src/sys/amd64/amd64/locore.S:81
 #7  0x808556e0 in sleepq_chains ()
 #8  0x8083b1e0 in cpu_top ()
 #9  0x in ?? ()
 #10 0x80830bc0 in proc0 ()
 #11 0x80ba4b90 in ?? ()
 #12 0x80ba4b38 in ?? ()
 #13 0xff000a73f8c0 in ?? ()
 #14 0x803a2cc9 in sched_switch (td=0x0, newtd=0x0, flags=Variable 
 flags
 is not available.
 )
at /usr/src/sys/kern/sched_ule.c:1852
 Previous frame inner to this frame (corrupt stack?)
 (kgdb)

Looks like this is just the first thread in the kernel.
Perhaps 'thread apply all bt' could help to find the culprit.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Attilio Rao

I'd really point the finger to faulty hw.

Please run all the necessary diagnostic tools for catching it.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland


That's not the issue, as it's happening across the board on over 130 machines :(

   Regards
   Steve

- Original Message - 
From: Attilio Rao atti...@freebsd.org



I'd really point the finger to faulty hw.

Please run all the necessary diagnostic tools for catching it.

Attilio




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Jeremy Chadwick

On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:
 That's not the issue, as it's happening across the board on over 130 machines :(

Agreed, bad hardware sounds unlikely here.  I could believe some strange
incompatibility (e.g. BIOS quirk or the like[1]) that might cause problems
en masse across many servers, but hardware issues are unlikely in this
situation.

[1]: I mention this because we had something similar happen at my
workplace.  For months we used a specific model of system from our
vendor which worked reliably, zero issues.  Then we got a new shipment
of boxes (same model as prior) which started acting very odd (often AHCI
timeout issues or MCEs which when decoded would usually turn out to be
nonsensical).  It took weeks to determine the cause given how slow the
vendor was to respond: root cause turned out to be that the vendor
decided, on a whim, to start shipping a newer BIOS version which wasn't
as compatible with Solaris as previous BIOSes.  Downgrading all the
systems to the older BIOS fixed the problem.

In Steve's case this is unlikely to be the situation, but I thought I'd
share the story anyway.  SKU ABCXYZ-1 from August 2009 is not
necessarily the same thing as SKU ABCXYZ-1 from May 2010.  ;-)  This
is also why I prefer to buy/build my own systems, since I cannot trust
vendors to not mess about with settings w/out changing SKUs, P/Ns, or
revision numbers.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Attilio Rao

2011/8/11 Jeremy Chadwick free...@jdc.parodius.com:
 On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:
 That's not the issue, as it's happening across the board on over 130 machines :(

 Agreed, bad hardware sounds unlikely here.  I could believe some strange
 incompatibility (e.g. BIOS quirk or the like[1]) that might cause problems
 en masse across many servers, but hardware issues are unlikely in this
 situation.

 [1]: I mention this because we had something similar happen at my
 workplace.  For months we used a specific model of system from our
 vendor which worked reliably, zero issues.  Then we got a new shipment
 of boxes (same model as prior) which started acting very odd (often AHCI
 timeout issues or MCEs which when decoded would usually turn out to be
 nonsensical).  It took weeks to determine the cause given how slow the
 vendor was to respond: root cause turned out to be that the vendor
 decided, on a whim, to start shipping a newer BIOS version which wasn't
 as compatible with Solaris as previous BIOSes.  Downgrading all the
 systems to the older BIOS fixed the problem.

That falls in the hw problem category for me.

Anyway, we would really need much more information in order to take any
proactive action.

Would it be possible to get access to one of the panicking machines? Is it
always the same panic that is happening, or does it vary (like: once a
page fault, once a fatal double fault, once a fatal trap, etc.)?

Whatever information you can provide may be valuable here.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland

- Original Message - 
From: Jeremy Chadwick free...@jdc.parodius.com




On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote:

That's not the issue, as it's happening across the board on over 130 machines :(


Agreed, bad hardware sounds unlikely here.  I could believe some strange
incompatibility (e.g. BIOS quirk or the like[1]) that might cause problems
en masse across many servers, but hardware issues are unlikely in this
situation.


It's affecting a range of hardware, from Supermicro blades / 2U's to
Dell blades, so it seems more like a software bug.


[1]: I mention this because we had something similar happen at my
workplace.  For months we used a specific model of system from our
vendor which worked reliably, zero issues.  Then we got a new shipment
of boxes (same model as prior) which started acting very odd (often AHCI
timeout issues or MCEs which when decoded would usually turn out to be
nonsensical).  It took weeks to determine the cause given how slow the
vendor was to respond: root cause turned out to be that the vendor
decided, on a whim, to start shipping a newer BIOS version which wasn't
as compatible with Solaris as previous BIOSes.  Downgrading all the
systems to the older BIOS fixed the problem.


The machines have been working fine for months; the panics only started
last week.

We've been looking at the changes made last week to see if we can identify
the cause. The only change made in that time frame was the rollout
of the change to kern.ipc.nmbclusters to work around the TCP reassembly
issue.

In this case we raised the value from the default of 25600 to 262144.

We've used this value for a long time on our core webservers, which are
also running 8.2 so I'd be very surprised if this was the cause. That said
we're looking to roll out kern.ipc.nmbclusters=51200 to try and rule it
out.
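When a tunable like this has to be changed on many machines, it helps if the change can be applied repeatedly without leaving duplicate lines behind. The following is a minimal sketch only: the function name `set_loader_tunable` is hypothetical, and it assumes the value is kept in /boot/loader.conf, which this thread does not state.

```sh
#!/bin/sh
# Set NAME="VALUE" in a loader.conf-style file, replacing any earlier
# assignment of the same name. Hypothetical helper, not a FreeBSD tool.
# $1 = file path, $2 = tunable name, $3 = value
set_loader_tunable() {
    conf=$1 name=$2 value=$3
    [ -f "$conf" ] || : > "$conf"
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with "NAME=".
    awk -v n="$name" 'index($0, n "=") != 1' "$conf" > "$tmp"
    # Append the new assignment.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (assumed path; takes effect at the next boot):
#   set_loader_tunable /boot/loader.conf kern.ipc.nmbclusters 51200
# The current limit and mbuf usage can be inspected with:
#   sysctl kern.ipc.nmbclusters
#   netstat -m
```

Running the helper twice leaves a single assignment in the file, so it is safe to include in a repeated rollout script.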

Prior to this, 1-2 weeks earlier, we rolled out a significant update which
included:-
1. Adding IPv6 to the kernel (although no machines are configured with it yet)
2. Adding the ipmi module to the kernel, although it is not loaded.
3. Rebuilding ALL ports to the latest version
4. Restructuring the server layout to be one jail per java server (~60
servers per machine)
5. Restructuring the filesystem to be a base nullfs mount + devfs +
zfs volume per server

This update had been in testing for 2 weeks prior to that, so in total it was
3-4 weeks before any panics were seen, but that doesn't mean the issue
didn't exist at that time.

Currently we're seeing 1-4 panics a day across all machines.

So currently the most likely suspects are:-
1. kern.ipc.nmbclusters
2. nullfs
3. ipv6
4. a package update, most likely being openjdk6-b23
5. jail


In Steve's case this is unlikely to be the situation, but I thought I'd
share the story anyway.  SKU ABCXYZ-1 from August 2009 is not
necessarily the same thing as SKU ABCXYZ-1 from May 2010.  ;-)  This
is also why I prefer to buy/build my own systems, since I cannot trust
vendors to not mess about with settings w/out changing SKUs, P/Ns, or
revision numbers.


This caused us much scratching of heads when looking for that TCP issue
the other day, as it seemed to be affecting the newer machines more than
the old. We even found two machines with the same version of the BIOS
that are clearly different builds, as the date and available options
were different. Quite frustrating!

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org




on 10/08/2011 18:35 Steven Hartland said the following:

Fatal double fault

...

#14 0x803a2cc9 in sched_switch (td=0x0, newtd=0x0, flags=Variable 
flags
is not available.
)
   at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
(kgdb)


Looks like this is just the first thread in the kernel.
Perhaps 'thread apply all bt' could help to find the culprit.


The trimmed-down output, with the tens of thousands of ?? lines removed, is here:-
http://blog.multiplay.co.uk/dropzone/freebsd/panic-2011-08-11-1402.txt

The raw output is here:-
http://blog.multiplay.co.uk/dropzone/freebsd/panic-full-2011-08-11-1402.txt.bz2
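For anyone wanting to produce the same two files from their own dump, a rough sketch follows. The function names are hypothetical, the kernel and vmcore paths in the example are assumptions for a default install, and it assumes kgdb reads commands from standard input the way the gdb it is built on does.

```sh
#!/bin/sh
# Print backtraces for every thread in a crash dump.
# $1 = kernel image with symbols, $2 = vmcore file (both caller-supplied).
all_backtraces() {
    # "set height 0" turns off output paging before the long listing.
    printf 'set height 0\nthread apply all bt\nquit\n' | kgdb "$1" "$2"
}

# Drop frames that could not be resolved to a symbol,
# e.g. "#11 0x80ba4b90 in ?? ()". -F matches the text literally.
strip_unknown_frames() {
    grep -F -v ' in ?? ()'
}

# Example (assumed paths):
#   all_backtraces /boot/kernel/kernel /var/crash/vmcore.0 > panic-full.txt 2>&1
#   strip_unknown_frames < panic-full.txt > panic-trimmed.txt
```

Keeping the raw output alongside the trimmed copy, as done above, means nothing is lost if one of the unresolved frames later turns out to matter.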

I'm not sure how useful it's going to be, as pretty much all of it seems
to be just:-
#0  sched_tswitch (td=0xff00194d4460, newtd=0xff000a74a000, flags=Variable 
flags is not available.
#1  0x80385c86 in mi_switch (flags=260, newtd=0x0) at 
/usr/src/sys/kern/kern_synch.c:449
#2  0x803b8a0c in sleepq_catch_signals (wchan=0xff02f27c48c0, 
pri=92) at /usr/src/sys/kern/subr_sleepqueue.c:418
#3  0x803b9326 in sleepq_wait_sig (wchan=Variable wchan is not 
available.
#4  0x80386149 in _sleep (ident=0xff02f27c48c0, lock=0xff02f27c49b8, priority=Variable priority is not 
available.
#5  0x8035079d in kern_wait (td=0xff00194d4460, pid=91362, status=0xff86cdbffabc, options=Variable options is 
not available.

#6  0x80350e95 in wait4 (td=Variable td is not available.
#7  0x803bb8e5 in syscallenter (td=0xff00194d4460, 
sa=0xff86cdbffba0) at /usr/src/sys/kern/subr_trap.c:315
#8  0x80574a0b in syscall (frame=0xff86cdbffc40) at 
/usr/src/sys/amd64/amd64/trap.c:888
#9  0x8055d242 in Xfast_syscall () at 
/usr/src/sys/amd64/amd64/exception.S:377

On one machine we had a little more info on console which may indicate
java as the problem.

http://blog.multiplay.co.uk/dropzone/freebsd/panic-java.jpg

   Regards
   Steve 





Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Andriy Gapon

on 11/08/2011 14:39 Steven Hartland said the following:
 The trimmed-down output, with the tens of thousands of ?? lines removed, is here:-
 http://blog.multiplay.co.uk/dropzone/freebsd/panic-2011-08-11-1402.txt
 
 The raw output is here:-
 http://blog.multiplay.co.uk/dropzone/freebsd/panic-full-2011-08-11-1402.txt.bz2
 
 I'm not sure how useful it's going to be, as pretty much all of it seems
 to be just:-
 #0  sched_tswitch (td=0xff00194d4460, newtd=0xff000a74a000, 
 flags=Variable
 flags is not available.
 #1  0x80385c86 in mi_switch (flags=260, newtd=0x0) at
 /usr/src/sys/kern/kern_synch.c:449
 #2  0x803b8a0c in sleepq_catch_signals (wchan=0xff02f27c48c0, 
 pri=92)
 at /usr/src/sys/kern/subr_sleepqueue.c:418
 #3  0x803b9326 in sleepq_wait_sig (wchan=Variable wchan is not 
 available.
 #4  0x80386149 in _sleep (ident=0xff02f27c48c0,
 lock=0xff02f27c49b8, priority=Variable priority is not available.
 #5  0x8035079d in kern_wait (td=0xff00194d4460, pid=91362,
 status=0xff86cdbffabc, options=Variable options is not available.
 #6  0x80350e95 in wait4 (td=Variable td is not available.
 #7  0x803bb8e5 in syscallenter (td=0xff00194d4460,
 sa=0xff86cdbffba0) at /usr/src/sys/kern/subr_trap.c:315
 #8  0x80574a0b in syscall (frame=0xff86cdbffc40) at
 /usr/src/sys/amd64/amd64/trap.c:888
 #9  0x8055d242 in Xfast_syscall () at
 /usr/src/sys/amd64/amd64/exception.S:377
 
 On one machine we had a little more info on console which may indicate
 java as the problem.
 
 http://blog.multiplay.co.uk/dropzone/freebsd/panic-java.jpg

I would really appreciate it if you could try to reproduce the problem with the
patch that I sent earlier.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org




I would really appreciate it if you could try to reproduce the problem with the
patch that I sent earlier.


Hi Andriy, what's the risk of this patch causing other issues?

I ask because, to get results from this, we're going to have to roll it
out to 130+ production machines, so I'd like to be clear on
the risks before I sign that off.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Andriy Gapon

on 11/08/2011 19:37 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 

 I would really appreciate it if you could try to reproduce the problem with the
 patch that I sent earlier.
 
 Hi Andriy, what's the risk of this patch causing other issues?

I can not estimate.
The code is supposed to affect only things that happen after panic, so make your
guess.

 I ask because, to get results from this, we're going to have to roll it
 out to 130+ production machines, so I'd like to be clear on
 the risks before I sign that off.

I will be happy if you try the patch on a single machine provided the problem is
that reproducible.

-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland

- Original Message - 
From: Andriy Gapon a...@freebsd.org



I would really appreciate if you could try to reproduce the
problem with the patch that I sent earlier.


Hi Andriy, what's the risk of this patch causing other issues?


I can not estimate.
The code is supposed to affect only things that happen after panic,
so make your guess.


So in theory should be good.


I ask because, to get results from this, we're going to have to roll it
out to 130+ production machines, so I'd like to be clear on
the risks before I sign that off.


I will be happy if you try the patch on a single machine
provided the problem is that reproducible.


Unfortunately, although it's happening a lot, it's taking the
large number of machines to make it that way.

Over the 130+ machines we're seeing between 3 and 8 panics
a day, so based on that we could be waiting quite some time
for a specific machine to panic :(
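To put a rough number on "quite some time", here is a back-of-the-envelope sketch. It assumes panics are spread evenly across the machines, which the uptime observation elsewhere in this thread suggests may not hold, so treat the result as an order of magnitude only. The function name is hypothetical.

```sh
#!/bin/sh
# Mean days between panics on any one machine, given the fleet size and
# the fleet-wide panic rate. Assumes every machine is equally likely to panic.
# $1 = number of machines, $2 = panics per day across all machines
days_between_panics() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.0f\n", m / p }'
}

days_between_panics 130 8   # at the high end of the observed rate
days_between_panics 130 3   # at the low end of the observed rate
```

With the figures above this works out to roughly 16 to 43 days of waiting on a single machine, which is why a one-box test is unlikely to produce a result quickly.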

Don't think we're going to make any progress on this in the current
state so I think we'll give it a shot.

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Andriy Gapon

on 11/08/2011 20:14 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 I would really appreciate if you could try to reproduce the
 problem with the patch that I sent earlier.

 Hi Andriy, what's the risk of this patch causing other issues?

 I can not estimate.
 The code is supposed to affect only things that happen after panic,
 so make your guess.
 
 So in theory should be good.
 
 I ask because, to get results from this, we're going to have to roll it
 out to 130+ production machines, so I'd like to be clear on
 the risks before I sign that off.

 I will be happy if you try the patch on a single machine
 provided the problem is that reproducible.
 
 Unfortunately, although it's happening a lot, it's taking the
 large number of machines to make it that way.
 
 Over the 130+ machines we're seeing between 3 and 8 panics
 a day, so based on that we could be waiting quite some time
 for a specific machine to panic :(
 
 Don't think we're going to make any progress on this in the current
 state so I think we'll give it a shot.

Maybe test it on a couple of machines first, just in case I overlooked something
essential, although I have a report from another user that the patch didn't break
anything for him (it was tested for an unrelated issue).


-- 
Andriy Gapon

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Rick Macklem

Steven Hartland wrote:
 - Original Message -
 From: Andriy Gapon a...@freebsd.org
 
  I would really appreciate if you could try to reproduce the
  problem with the patch that I sent earlier.
 
  Hi Andriy, what's the risk of this patch causing other issues?
 
  I can not estimate.
  The code is supposed to affect only things that happen after panic,
  so make your guess.
 
 So in theory should be good.
 
  I ask because, to get results from this, we're going to have to roll it
  out to 130+ production machines, so I'd like to be clear on
  the risks before I sign that off.
 
  I will be happy if you try the patch on a single machine
  provided the problem is that reproducible.
 
 Unfortunately, although it's happening a lot, it's taking the
 large number of machines to make it that way.
 
 Over the 130+ machines we're seeing between 3 and 8 panics
 a day, so based on that we could be waiting quite some time
 for a specific machine to panic :(
 
 Don't think we're going to make any progress on this in the current
 state so I think we'll give it a shot.
 
Just a random thought that is probably not relevant, but...
Is it possible that some change for the upgrade is making the machines
run hotter and they're failing when they overheat?

rick

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland

- Original Message - 
From: Rick Macklem rmack...@uoguelph.ca



Just a random thought that is probably not relevant, but...
Is it possible that some change for the upgrade is making the machines
run hotter and they're failing when they overheat?


The machines have full HW monitoring and we've not seen any reports of
temperature issues. Add to that, quite a few are L series so run really
cool anyway, so I very much doubt it.


   Regards
   Steve



debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland


We're currently experiencing a large number of kernel panics
on FreeBSD 8.2-RELEASE across a large number of machines here.

The base stack reported is a double fault with no additional
details, and CTRL+ALT+ESC fails to break to the debugger, as
does an NMI, even though it at least tries printing the
following many times, some quite jumbled:-
NMI ... going to debugger

We've configured the dump device but that also seems to fail
to capture any details just sitting there after panic with
Dumping 4465MB:

The machines are single disk ZFS root install and the dump
device is configured using the gptid, could this be what's
preventing the dump happening?

The kernel is compiled with:-
options KDB # Kernel debugger related code
options KDB_TRACE   # Print a stack trace for a panic

We have remote KVM but not remote serial on most of the
machines.

Any advice on how to debug this issue?

   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Jeremy Chadwick

On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
 The base stack reported is a double fault with no additional
 details, and CTRL+ALT+ESC fails to break to the debugger, as
 does an NMI, even though it at least tries printing the
 following many times, some quite jumbled:-
 NMI ... going to debugger

You may be interested in these system tunables (not sysctls).  These
come from sys/amd64/amd64/trap.c (i386 has the same):

machdep.kdb_on_nmi    (defaults to 1)
machdep.panic_on_nmi  (defaults to 1)

If what you're seeing is a hardware NMI that fires, followed by the
machine panicking, the above tunables are probably doing that.  A
hardware NMI could indicate an actual hardware issue of sorts, depending
on how the motherboard vendor implemented what they did.  For example,
on a series of mainboards we have at my workplace, the BIOS can be
configured to generate either an NMI or SMI# when different kinds of ECC
RAM errors happen (either single-bit or multi-bit parity errors).  I
don't know if that's what you're seeing.
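As an illustration of where such tunables go, a loader.conf fragment might look like the following. This is a sketch, not a recommendation: the values shown are simply the defaults quoted above, and the assumption that the file is /boot/loader.conf is the usual location rather than anything confirmed for these machines.

```sh
# /boot/loader.conf fragment (read by the loader at boot, not a script to run)
machdep.kdb_on_nmi="1"      # enter the kernel debugger when an NMI arrives
machdep.panic_on_nmi="1"    # panic when an NMI arrives
# Setting either to "0" turns that behaviour off. Whether that is
# appropriate for these machines is not established in this thread.
```

Writing the defaults out explicitly changes nothing by itself, but it makes the behaviour visible to whoever next reads the file.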

If you're generating the NMI yourself (possibly via the KVM, etc.) then
okay, that's different.  I'm trying to discern whether or not *you're*
generating the NMI, or if the NMI just happens and causes a panic for
you and that's what you're worried about.

Now to discuss the jumbled console output:

The interspersing of kernel text output has plagued FreeBSD for a very
long time (since approximately 6.x).  There have been statements from
kernel coders that you can decrease the likelihood of it happening by
increasing the PRINTF_BUFR_SIZE (not a typo) option in your kernel
configuration.  The issue is exacerbated by use of SMP (either
multi-core or multi-CPU).

The default (assuming your kernel configs are based off of GENERIC
within the past 4-5 years) is 128.  However, the same developers stated
that they have great reservations over increasing this number
dramatically (meaning, something like 256 will probably work, but larger
may have repercussions which are unknown at this time).
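To see what a given kernel configuration file currently sets, something like the following could be used. It is a small sketch: the function name is hypothetical, and the GENERIC path in the example assumes an amd64 source tree under /usr/src.

```sh
#!/bin/sh
# Print the PRINTF_BUFR_SIZE value from a kernel config file, if one is set.
# Prints nothing when the option is absent.
# $1 = path to the kernel config file
printf_bufr_size() {
    sed -n 's/^options[[:space:]]*PRINTF_BUFR_SIZE=\([0-9]*\).*/\1/p' "$1"
}

# Example (assumed path):
#   printf_bufr_size /usr/src/sys/amd64/conf/GENERIC
# To try a larger buffer, a custom config would carry a line such as:
#   options PRINTF_BUFR_SIZE=256
```

The 256 in the comment is the figure mentioned above as probably workable; larger values are the ones the developers had reservations about.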

I have stated publicly then, and will do so again now, that this option
does not solve the problem.  I acknowledge it may make it less likely
to happen or may decrease the amount of interspersed output, but in my
experience neither of those prove true; and more importantly, said
option does not solve the problem.  I've talked on-list with John
Baldwin about this problem in the past, who had some pretty good ideas
of how to solve it.

I should point out that Solaris 10 and OpenSolaris (not sure about
present-day releases) both have this problem as well, especially during
kernel panics or MCEs.  Linux addressed this issue by implementing a
ring-based cyclic buffer for its kernel messages (syslog/klogd), and the
model is extremely well-documented (quite clever too):

http://www.mjmwired.net/kernel/Documentation/trace/ring-buffer-design.txt

I'm still surprised not a single GSoC project has attempted to solve
this for FreeBSD.  It really is a serious matter, as it makes getting
kernel backtraces and crash data a serious pain in the butt.  It can
also impact real-time debugging.  These are the *worst* times to have to
tolerate something like this.

I can point you to old threads about this, and my old FreeBSD wiki page
(Commonly reported issues) touches on this as well.  The point I want
to get across is that PRINTF_BUFR_SIZE does not solve the problem.

 We've configured the dump device but that also seems to fail
 to capture any details just sitting there after panic with
 Dumping 4465MB:
 
 The machines are single disk ZFS root install and the dump
 device is configured using the gptid, could this be what's
 preventing the dump happening?

I can tell you that others have reported this problem where the kernel
panic/dump begins but either locks up after showing the first progress
metre/amount, or during the dumping itself.

I give everyone the same advice: please make sure that you have a swap
partition that's large enough to fit your entire memory contents
(preferably a swap that's 2x or 1.5x the amount of physical RAM), and
please make sure it's on a dedicated slice (e.g. ada0s1b).  I do not
advise any sort of abstraction layer between swap and the rest of the
system.  It might seem like a great/fun/awesome idea followed by
"whatever jdc, it works!" but when a crash happens -- which is when you
need it most -- and it doesn't work, I won't sympathise.  :-)
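
The sizing rule above can be checked mechanically.  What follows is only
a minimal sketch: dump_fits is a hypothetical helper name, both sizes are
passed in by hand in MB, and the 32768 MB swap partition is a made-up
example (the 24555 MB figure is the physical memory reported in this
thread).

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): succeed when the dump/swap
# device is at least as large as physical memory.  Both sizes are in MB.
dump_fits() {
    physmem_mb=$1
    dumpdev_mb=$2
    if [ "$dumpdev_mb" -ge "$physmem_mb" ]; then
        echo "ok: ${dumpdev_mb} MB device can hold ${physmem_mb} MB of RAM"
        return 0
    fi
    echo "too small: ${dumpdev_mb} MB device, ${physmem_mb} MB of RAM"
    return 1
}

# 24555 MB of RAM against an assumed 32768 MB swap partition.
dump_fits 24555 32768
```

On a live FreeBSD system the two numbers could be taken from
"sysctl -n hw.physmem" (bytes) and swapinfo(8); converting those to MB
is left out of the sketch.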

As for the GPT aspects of things: I'm still not familiar with GPT (as a
technology I am, but when it comes to actual usability I am not).

 The kernel is compiled with:-
 options KDB # Kernel debugger related code
 options KDB_TRACE   # Print a stack trace for a panic
 
 We have remote KVM but not remote serial on most of the
 machines.

As long as remote KVM provides actual VGA-level redirection, then that's
sufficient (though makes copy-pasting output basically impossible).  We
use serial console and tend to use these options;

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Andriy Gapon

on 10/08/2011 17:22 Steven Hartland said the following:
 The kernel is compiled with:-
 options KDB # Kernel debugger related code
 options KDB_TRACE   # Print a stack trace for a panic

You also have to provide an actual debugger backend like built-in DDB or a stub
for remote GDB to get online debugging.  No guarantees that that would help you
to get the debugging information, but without that the chances are even slimmer.

You may also try this patch and see if it provides any improvements for the
post-panic environment (dumping etc):
http://people.freebsd.org/~avg/stop_scheduler_on_panic.8.x.diff

It might also be a good idea to at least capture a screenshot of whatever
information you get on console when the panic happens.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland

- Original Message - 
From: Steven Hartland kill...@multiplay.co.uk

To: freebsd-stable@freebsd.org
Sent: Wednesday, August 10, 2011 3:22 PM
Subject: debugging frequent kernel panics on 8.2-RELEASE

We're currently experiencing a large number of kernel panics
on FreeBSD 8.2-RELEASE across a large number of machines here.

The base stack reported is a double fault with no additional
details, and CTRL+ALT+ESC fails to break to the debugger, as
does an NMI, even though it at least tries, printing the
following many times, some of them quite jumbled:-
NMI ... going to debugger

We've configured the dump device but that also seems to fail
to capture any details, just sitting there after the panic with
Dumping 4465MB:

The machines are single disk ZFS root install and the dump
device is configured using the gptid, could this be what's
preventing the dump happening?

The kernel is compiled with:-
options KDB # Kernel debugger related code
options KDB_TRACE   # Print a stack trace for a panic

We have remote KVM but not remote serial on most of the
machines.

Any advice on how to debug this issue?

ldn32.multiplay.co.uk dumped core - see /var/crash/vmcore.0

Wed Aug 10 14:02:07 UTC 2011

FreeBSD crash 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Jul 21 11:05:52 BST 2011
root@crash:/usr/obj/usr/src/sys/MULTIPLAY  amd64

panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0x8052f6f1
rsp = 0xff86ce600fb0
rbp = 0xff86ce601210
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
#0 0x803af91e at kdb_backtrace+0x5e
#1 0x8037d817 at panic+0x187
#2 0x80574316 at dblfault_handler+0x96
#3 0x8055d06d at Xdblfault+0xad
Uptime: 13d20h53m31s
Physical memory: 24555 MB
Dumping 3283 MB:
3268 3252 3236 3220 3204 3188 3172 3156 3140 3124 3108 3092 3076 3060 3044 3028
3012 2996 2980 2964 2948 2932 2916 2900 2884 2868 2852 2836 2820 2804 2788 2772
2756 2740 2724 2708 2692 2676 2660 2644 2628 2612 2596 2580 2564 2548 2532 2516
2500 2484 2468 2452 2436 2420 2404 2388 2372 2356 2340 2324 2308 2292 2276 2260
2244 2228 2212 2196 2180 2164 2148 2132 2116 2100 2084 2068 2052 2036 2020 2004
1988 1972 1956 1940 1924 1908 1892 1876 1860 1844 1828 1812 1796 1780 1764 1748
1732 1716 1700 1684 1668 1652 1636 1620 1604 1588 1572 1556 1540 1524 1508 1492
1476 1460 1444 1428 1412 1396 1380 1364 1348 1332 1316 1300 1284 1268 1252 1236
1220 1204 1188 1172 1156 1140 1124 1108 1092 1076 1060 1044 1028 1012 996 980
964 948 932 916 900 884 868 852 836 820 804 788 772 756 740 724
708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468
452 436 420 404 388 372 356 340 324 308 292 276 260 244 228 212
196 180 164 148 132 116 100 84 68 52 36 20 4

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko

One of the machines has managed to dump where all the others
have failed to do so; here's the stack from core.txt.0:

#0  sched_switch (td=0x80830bc0, newtd=0xff000a73f8c0, flags=Variable "flags" is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
1858            cpuid = PCPU_GET(cpuid);
(kgdb)
#0  sched_switch (td=0x80830bc0, newtd=0xff000a73f8c0, flags=Variable "flags" is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
#1  0x80385c86 in mi_switch (flags=260, newtd=0x0)
   at /usr/src/sys/kern/kern_synch.c:449
#2  0x803b92d2 in sleepq_timedwait (wchan=0x80830760, pri=68)
   at /usr/src/sys/kern/subr_sleepqueue.c:644
#3  0x803861e1 in _sleep (ident=0x80830760, lock=0x0,
   priority=Variable "priority" is not available.
) at /usr/src/sys/kern/kern_synch.c:230
#4  0x80532c29 in scheduler (dummy=Variable "dummy" is not available.
) at /usr/src/sys/vm/vm_glue.c:807
#5  0x80335d67 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6  0x8016efac in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7  0x808556e0 in sleepq_chains ()
#8  0x8083b1e0

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland

- Original Message - 
From: Jeremy Chadwick free...@jdc.parodius.com




On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:

The base stack reported is a double fault with no additional
details, and CTRL+ALT+ESC fails to break to the debugger, as
does an NMI, even though it at least tries, printing the
following many times, some of them quite jumbled:-
NMI ... going to debugger



If you're generating the NMI yourself (possibly via the KVM, etc.) then
okay, that's different.  I'm trying to discern whether or not *you're*
generating the NMI, or if the NMI just happens and causes a panic for
you and that's what you're worried about.


Yes, generating it after the panic to try to get to the debugger :)


Now to discuss the jumbled console output:

...

The default (assuming your kernel configs are based off of GENERIC
within the past 4-5 years) is 128.  However, the same developers stated
that they have great reservations over increasing this number
dramatically (meaning, something like 256 will probably work, but larger
may have repercussions which are unknown at this time).


Might try that if it will help, but with so many production machines to
action I'd like to avoid it if possible.


The machines are single disk ZFS root install and the dump
device is configured using the gptid, could this be what's
preventing the dump happening?


I can tell you that others have reported this problem where the kernel
panic/dump begins but either locks up after showing the first progress
metre/amount, or during the dumping itself.


Ahh, so possibly not a gptid issue


I give everyone the same advice: please make sure that you have a swap
partition that's large enough to fit your entire memory contents
(preferably a swap that's 2x or 1.5x the amount of physical RAM), and
please make sure it's on a dedicated slice (e.g. ada0s1b).  I do not
advise any sort of abstraction layer between swap and the rest of the
system.  It might seem like a great/fun/awesome idea followed by
"whatever jdc, it works!" but when a crash happens -- which is when you
need it most -- and it doesn't work, I won't sympathise.  :-)

As for the GPT aspects of things: I'm still not familiar with GPT (as a
technology I am, but when it comes to actual usability I am not).


Just managed to get a crash dump from one machine, so hopefully I will be able
to make some progress if someone can point me in the right direction.


# Debugging options
options BREAK_TO_DEBUGGER       # Sending a serial BREAK drops to DDB
options ALT_BREAK_TO_DEBUGGER   # Permit <CR>~<Ctrl-b> to drop to DDB
options KDB                     # Enable kernel debugger support
options KDB_TRACE               # Print stack trace automatically on panic
options DDB                     # Support DDB
options GDB                     # Support remote GDB


Cheers 


In combination with this, we use the following in /etc/rc.conf (the
dumpdev line is important, else savecore won't pick up anything):

dumpdev=auto


I thought this was meant to be the default from back in the 6.x days, but
it didn't seem to work, so I added the gptid device from /etc/fstab


ddb_enable=yes


Thanks :)

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Jeremy Chadwick

On Wed, Aug 10, 2011 at 04:46:17PM +0100, Steven Hartland wrote:
 On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote:
 The base stack reported is a double fault with no additional
 details, and CTRL+ALT+ESC fails to break to the debugger, as
 does an NMI, even though it at least tries, printing the
 following many times, some of them quite jumbled:-
 NMI ... going to debugger
 
 If you're generating the NMI yourself (possibly via the KVM, etc.) then
 okay, that's different.  I'm trying to discern whether or not *you're*
 generating the NMI, or if the NMI just happens and causes a panic for
 you and that's what you're worried about.
 
 Yes, generating it after the panic to try to get to the debugger :)

Understood, thanks for clarifying.

 Now to discuss the jumbled console output:
 ...
 The default (assuming your kernel configs are based off of GENERIC
 within the past 4-5 years) is 128.  However, the same developers stated
 that they have great reservations over increasing this number
 dramatically (meaning, something like 256 will probably work, but larger
 may have repercussions which are unknown at this time).
 
 Might try that if it will help, but with so many production machines to
 action I'd like to avoid it if possible.

I've used PRINTF_BUFR_SIZE=256 with success on our systems, but since it
doesn't actually *solve* the problem, I just use the default 128 and
just grit my teeth when we experience it.  It's larger values (e.g.
512/1024, etc.) which there is concern over.

 In combination with this, we use the following in /etc/rc.conf (the
 dumpdev line is important, else savecore won't pick up anything):
 
 dumpdev=auto
 
 I thought this was meant to be the default from back in the 6.x days, but
 it didn't seem to work, so I added the gptid device from /etc/fstab

/etc/defaults/rc.conf has dumpdev=NO, which affects two things: both
/etc/rc.d/dumpon (this script is a little tricky, you really have to
read it slowly/pay close attention to what's going on), and
/etc/rc.d/savecore.

I've always wondered why dumpdev=NO is the default, not auto, since
on a system with no swap devices in /etc/fstab dumpdev=auto should
behave the same.  Possibly the idea of the default is to ensure that
savecore(8) never gets run (e.g. there's no guarantee someone has
/var/crash, or a /var that's big enough to hold a crash dump; possibly
embedded systems or NFS-only systems, for example).
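
As a rough illustration of the three cases (NO, AUTO, or an explicit
device), here is a simplified sketch.  It is NOT the real
/etc/rc.d/dumpon: pick_dumpdev is a hypothetical name, it only prints
the device that would be chosen, and it assumes AUTO means "first swap
entry in fstab" without checking that the device exists.

```shell
#!/bin/sh
# Simplified sketch, not the real /etc/rc.d/dumpon.
# Prints the device a given dumpdev setting would select, or fails.
#   $1 = dumpdev value (NO, AUTO, or a device path)
#   $2 = path to an fstab-format file
pick_dumpdev() {
    setting=$1
    fstab=$2
    case "$setting" in
    [Nn][Oo]|'')
        # Dumps disabled: nothing to select.
        return 1 ;;
    [Aa][Uu][Tt][Oo])
        # Take the first entry whose filesystem type is "swap".
        while read -r dev mp fstype rest; do
            case "$dev" in \#*) continue ;; esac
            if [ "$fstype" = "swap" ]; then
                echo "$dev"
                return 0
            fi
        done < "$fstab"
        return 1 ;;
    *)
        # Anything else is treated as an explicit device path.
        echo "$setting"
        return 0 ;;
    esac
}
```

For example, "pick_dumpdev AUTO /etc/fstab" would print the first swap
device listed, and fail on a system with no swap entries, which is the
case where AUTO and NO end up behaving the same.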

Touchy subject I guess.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland

- Original Message - 
From: Jeremy Chadwick free...@jdc.parodius.com



In combination with this, we use the following in /etc/rc.conf (the
dumpdev line is important, else savecore won't pick up anything):

dumpdev=auto

I thought this was meant to be the default from back in the 6.x days, but
it didn't seem to work, so I added the gptid device from /etc/fstab


/etc/defaults/rc.conf has dumpdev=NO, which affects two things: both
/etc/rc.d/dumpon (this script is a little tricky, you really have to
read it slowly/pay close attention to what's going on), and
/etc/rc.d/savecore.


Hmm, someone might want to correct the docs then:-
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

"AUTO is the default as of FreeBSD 6.0"


   Regards
   Steve



Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Jeremy Chadwick

On Wed, Aug 10, 2011 at 05:26:27PM +0100, Steven Hartland wrote:
 - Original Message - From: Jeremy Chadwick
 free...@jdc.parodius.com
 
 In combination with this, we use the following in /etc/rc.conf (the
 dumpdev line is important, else savecore won't pick up anything):
 
 dumpdev=auto
 
 I thought this was meant to be the default from back in the 6.x days, but
 it didn't seem to work, so I added the gptid device from /etc/fstab
 
 /etc/defaults/rc.conf has dumpdev=NO, which affects two things: both
 /etc/rc.d/dumpon (this script is a little tricky, you really have to
 read it slowly/pay close attention to what's going on), and
 /etc/rc.d/savecore.
 
 Hmm, someone might want to correct the docs then:-
 http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
 
 "AUTO is the default as of FreeBSD 6.0"

It used to be "auto", and was changed to "NO" in this commit back in
September 2009, which was reviewed by two separate people:

http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/defaults/rc.conf#rev1.358.2.2

Prior to that, it was "auto", as confirmed here (circa June 2005):

http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/defaults/rc.conf#rev1.250

So basically the documentation is both correct and incorrect.  For
anyone running FreeBSD later than September 2009 (I would need to spend
some time figuring out what releases that was), dumpdev will not be
enabled by default.  Prior to that (which includes 6.x), it will be.
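
Rather than working out which releases carry the change, the effective
value on a given machine can be read directly.  A minimal sketch,
assuming the usual layering of /etc/defaults/rc.conf overridden by
/etc/rc.conf; effective_dumpdev is a hypothetical helper name, and it
ignores other override files such as rc.conf.local:

```shell
#!/bin/sh
# Sketch: print the dumpdev value that results from sourcing the defaults
# file and then the local override.  Runs in a subshell so the caller's
# variables are left untouched.
#   $1 = defaults file (default: /etc/defaults/rc.conf)
#   $2 = override file (default: /etc/rc.conf)
effective_dumpdev() {
    defaults=${1:-/etc/defaults/rc.conf}
    override=${2:-/etc/rc.conf}
    (
        dumpdev=
        [ -r "$defaults" ] && . "$defaults"
        [ -r "$override" ] && . "$override"
        echo "${dumpdev:-unset}"
    )
}
```

Calling effective_dumpdev with no arguments on an affected system
should print NO unless /etc/rc.conf sets dumpdev explicitly.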

The documentation needs to be updated to reflect reality (specifically
the commit that was done in September 2009).  I'll file a PR for this,
but won't have the PR number until later today.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
