Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-24 Thread Ellis H. Wilson III

On 11/22/2014 03:51 PM, Benjamin Kaduk wrote:

On Fri, 21 Nov 2014, Ellis H. Wilson III wrote:


Before I start, and this is mainly geared to my responder Benjamin Kaduk,
based on your response, are you suggesting that the cnputc WITNESS panic you
expected to happen is now completely unavoidable in FreeBSD 10?  I.E., is this
a spinlock that WITNESS falls over each time but that is provably deadlock
free that the developers have decided cannot be BLESSED for some reason?


https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html
looks to be a better explanation than the previous link I sent ... in
short, console output is hard.


I guess I just can't wrap my head around why we would ever move to a regime
where SKIPSPIN is the default for testing...  That just seems like an open
invitation for introducing spinlock regressions.


I don't think anyone made the conscious decision to do that, it just
happened by default as no one spent the time to fix the aforementioned
issue.


Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS
debugging is how exactly to interpret the output if I see a stacktrace and
then a LOR output like the following:

lock order reversal:
   1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @
/usr/src/sys/dev/random/random_harvestq.c:198
   2nd 0x813b6208 scrlock (scrlock) @
/usr/src/sys/dev/syscons/syscons.c:2682

Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then
#2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without
first getting harvest_mtx?  Or the reverse (WITNESS previously recorded
scrlock and then harvest and the lines it spit out were the offenders?)


I believe it is the latter (the ordering being printed is the bad one
which caused WITNESS to complain).


Thanks so much for the additional info Ben.  This fleshes out the 
history of this issue for me significantly.  I have filed a bug on this at:


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195262

Xin Li was able to identify the ordering that caused the problem and 
proposed a possible patch to fix it.  I can confirm that now I'm booting 
with solely WITNESS (i.e., not WITNESS_SKIPSPIN) without panic.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-22 Thread Benjamin Kaduk
On Fri, 21 Nov 2014, Ellis H. Wilson III wrote:

 Before I start, and this is mainly geared to my responder Benjamin Kaduk,
 based on your response, are you suggesting that the cnputc WITNESS panic you
 expected to happen is now completely unavoidable in FreeBSD 10?  I.E., is this
 a spinlock that WITNESS falls over each time but that is provably deadlock
 free that the developers have decided cannot be BLESSED for some reason?

https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html
looks to be a better explanation than the previous link I sent ... in
short, console output is hard.

 I guess I just can't wrap my head around why we would ever move to a regime
 where SKIPSPIN is the default for testing...  That just seems like an open
 invitation for introducing spinlock regressions.

I don't think anyone made the conscious decision to do that, it just
happened by default as no one spent the time to fix the aforementioned
issue.

 Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS
 debugging is how exactly to interpret the output if I see a stacktrace and
 then a LOR output like the following:

 lock order reversal:
   1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @
 /usr/src/sys/dev/random/random_harvestq.c:198
   2nd 0x813b6208 scrlock (scrlock) @
 /usr/src/sys/dev/syscons/syscons.c:2682

 Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then
 #2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without
 first getting harvest_mtx?  Or the reverse (WITNESS previously recorded
 scrlock and then harvest and the lines it spit out were the offenders?)

I believe it is the latter (the ordering being printed is the bad one
which caused WITNESS to complain).

-Ben
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-21 Thread Ellis H. Wilson III

On 11/18/2014 08:35 PM, Ellis H. Wilson III wrote:

If nobody has seen these before, I'll try and put together fixes for
 them.  Please somebody speak up if you have seen them or have useful
 information for me to go on in my patches.


I've started to dig into the relevant code in 
sys/dev/random/random_harvestq.c, sys/dev/syscons/syscons.c, and 
sys/kern/subr_sleepqueue.c.


Before I start, and this is mainly geared to my responder Benjamin 
Kaduk, based on your response, are you suggesting that the cnputc 
WITNESS panic you expected to happen is now completely unavoidable in 
FreeBSD 10?  I.E., is this a spinlock that WITNESS falls over each time 
but that is provably deadlock free that the developers have decided 
cannot be BLESSED for some reason?


I guess I just can't wrap my head around why we would ever move to a 
regime where SKIPSPIN is the default for testing...  That just seems 
like an open invitation for introducing spinlock regressions.


Moving onto the LORs I'm seeing, a question I have as a newbie to 
WITNESS debugging is how exactly to interpret the output if I see a 
stacktrace and then a LOR output like the following:


lock order reversal:
  1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) 
@ /usr/src/sys/dev/random/random_harvestq.c:198
  2nd 0x813b6208 scrlock (scrlock) @ 
/usr/src/sys/dev/syscons/syscons.c:2682


Does this mean WITNESS has already stored an ordering of #1 harvest_mtx 
then #2 scp-scr_lock, and somewhere somebody tried to lock 
scp-scr_lock without first getting harvest_mtx?  Or the reverse 
(WITNESS previously recorded scrlock and then harvest and the lines it 
spit out were the offenders?)


Along those lines, in 10.0 and 10.1 releases I get two LORs showing up 
almost on-top of each other, with the other LOR showing up as:


lock order reversal:
 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81424bb8 sleepq chain (sleepq chain) @ 
/usr/src/sys/kern/subr_sleepqueue.c:240


This seems like maybe two LORs are detected at the same time, which 
perhaps suggests that the harvest_mtx should have been taken /after/ 
both of the other locks mentioned (scrlock and sleepq).


I'm happy to do the legwork implementing, testing, and submitting a 
patch for this, but I would really appreciate a pointer in the right 
direction from somebody who already has handled some LORs before.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Benjamin Kaduk
On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

 This is on a virtual machine via VirtualBox, but we have reproduced it on real
 hardware.  Regarding the latter reversal, I have dug in the code and see the
 calls go about in the following order:

 In sys/dev/random/random_harvestq.c:
  198 mtx_lock_spin(harvest_mtx);
  209 msleep_spin_sbt(random_kthread_control, harvest_mtx,
 Now in sys/kern/kern_synch.c:
  297 sleepq_lock(ident);
 Now in sys/kern/subr_sleepqueue.c
  237 mtx_lock_spin(sc-sc_lock);

 The above line numbers might be a hair off from head since I'm looking at code
 synced last week.

 I'm happy to open a bug on this if that's the desirable course of action, or
 to even assist in fixing it, but I wanted to first see if anybody knew about
 these already (they didn't show up on the known LORs list on quick perusal) or
 if this was simply a case of WITNESS being overly conservative and throwing
 false positives.  If this belongs on a different list just let me know.

I don't believe it is known, and this list is fine.

 I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
 and I was tracking current, hoping they would go away by 10.1, but it seems
 they persist as shown below.  I suspect this is because current is being built
 with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
 show up for any devs but those who specifically enable WITNESS and disable
 SKIPSPIN like myself.  At my work we would greatly like to do our debugging
 with checking of spin-locking order included and panicing upon LOR detection.
 That's not possible with these in existence.

However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.

-Ben
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Ellis H. Wilson III

On 11/18/2014 05:39 PM, Benjamin Kaduk wrote:

On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
and I was tracking current, hoping they would go away by 10.1, but it seems
they persist as shown below.  I suspect this is because current is being built
with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
show up for any devs but those who specifically enable WITNESS and disable
SKIPSPIN like myself.  At my work we would greatly like to do our debugging
with checking of spin-locking order included and panicing upon LOR detection.
That's not possible with these in existence.


However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.


That's a very interesting thread.  I've seen another where a fellow 
developer suggested just throwing on the WITNESS_SKIPSPIN flag to 
solve the issue.  I can't say that I agree with the approach, but I 
understand.  I'd be willing to tackle a bit of WITNESS massaging to help 
it be instructed about known false positives better, if that's desirable.


Why I'm able to boot however, is simple: I haven't enabled a full suite 
of debugging flags, KDB/DDB being the key ones that cause a panic to 
occur on a failure like the one I've seen.  We originally were seeing 
the panic so WITNESS in its entirety was shut off.  I was asked to try 
and get that back on-track, so to start I at least wanted to see how 
many LORs we were dealing with on boot.  2 is apparently that magic 
number, and maybe 3 if the cputs issue you refer to is still possible to 
be hit in the 10.1 wild.


If nobody has seen these before, I'll try and put together fixes for 
them.  Please somebody speak up if you have seen them or have useful 
information for me to go on in my patches.


Thanks,

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org