WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Ellis H. Wilson III

Hi all,

I'm observing the following two WITNESS LORs being thrown upon boot-up 
of 10.0 and I was tracking current, hoping they would go away by 10.1, 
but it seems they persist as shown below.  I suspect this is because 
current is being built with WITNESS on but also with SKIPSPIN on.  So 
these issues are unlikely to show up for any devs but those who 
specifically enable WITNESS and disable SKIPSPIN like myself.  At my 
work we would greatly like to do our debugging with checking of 
spin-locking order included and panicing upon LOR detection.  That's not 
possible with these in existence.


Specifically:

snip
Event timer RTC frequency 32768 Hz quality 0
ppc0: cannot reserve I/O port range
Timecounters tick every 10.000 msec
pcm0: measured ac97 link rate at 29212 Hz
lock order reversal:
lock order reversal:
 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81475108 scrlock (scrlock) @ 
/usr/src/sys/dev/syscons/syscons.c:2694

KDB: stack backtrace:
#0 0x8092f020 at kdb_backtrace+0x60
#1 0x80944204 at witness_checkorder+0xc04
#2 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#3 0x80738b80 at sc_puts+0xb0
#4 0x8073c4e5 at sc_cnputc+0xe5
#5 0x808b6cf0 at cnputc+0x80
#6 0x808b6fa8 at cnputs+0x58
#7 0x80934011 at putchar+0x151
#8 0x80932c36 at kvprintf+0xf6
#9 0x809344ad at _vprintf+0x8d
#10 0x809346c3 at printf+0x53
#11 0x80943fe0 at witness_checkorder+0x9e0
#12 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#13 0x8090019b at msleep_spin_sbt+0x5b
#14 0x80698198 at random_kthread+0x78
#15 0x808cd941 at fork_exit+0x71
#16 0x80caee7e at fork_trampoline+0xe
 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81568390 sleepq chain (sleepq chain) @ 
/usr/src/sys/kern/subr_sleepqueue.c:237

KDB: stack backtrace:
#0 0x8092f020 at kdb_backtrace+0x60
#1 0x80944204 at witness_checkorder+0xc04
#2 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#3 0x8090019b at msleep_spin_sbt+0x5b
#4 0x80698198 at random_kthread+0x78
#5 0x808cd941 at fork_exit+0x71
#6 0x80caee7e at fork_trampoline+0xe
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: Apple at usbus0
uhub0: Apple OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1 on usbus0
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: VBOX HARDDISK 1.0 ATA-6 device
snip

This is on a virtual machine via VirtualBox, but we have reproduced it 
on real hardware.  Regarding the latter reversal, I have dug in the code 
and see the calls go about in the following order:


In sys/dev/random/random_harvestq.c:
 198 mtx_lock_spin(harvest_mtx);
 209 msleep_spin_sbt(random_kthread_control, harvest_mtx,
Now in sys/kern/kern_synch.c:
 297 sleepq_lock(ident);
Now in sys/kern/subr_sleepqueue.c
 237 mtx_lock_spin(sc-sc_lock);

The above line numbers might be a hair off from head since I'm looking 
at code synced last week.


I'm happy to open a bug on this if that's the desirable course of 
action, or to even assist in fixing it, but I wanted to first see if 
anybody knew about these already (they didn't show up on the known LORs 
list on quick perusal) or if this was simply a case of WITNESS being 
overly conservative and throwing false positives.  If this belongs on a 
different list just let me know.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Benjamin Kaduk
On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

 This is on a virtual machine via VirtualBox, but we have reproduced it on real
 hardware.  Regarding the latter reversal, I have dug in the code and see the
 calls go about in the following order:

 In sys/dev/random/random_harvestq.c:
  198 mtx_lock_spin(harvest_mtx);
  209 msleep_spin_sbt(random_kthread_control, harvest_mtx,
 Now in sys/kern/kern_synch.c:
  297 sleepq_lock(ident);
 Now in sys/kern/subr_sleepqueue.c
  237 mtx_lock_spin(sc-sc_lock);

 The above line numbers might be a hair off from head since I'm looking at code
 synced last week.

 I'm happy to open a bug on this if that's the desirable course of action, or
 to even assist in fixing it, but I wanted to first see if anybody knew about
 these already (they didn't show up on the known LORs list on quick perusal) or
 if this was simply a case of WITNESS being overly conservative and throwing
 false positives.  If this belongs on a different list just let me know.

I don't believe it is known, and this list is fine.

 I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
 and I was tracking current, hoping they would go away by 10.1, but it seems
 they persist as shown below.  I suspect this is because current is being built
 with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
 show up for any devs but those who specifically enable WITNESS and disable
 SKIPSPIN like myself.  At my work we would greatly like to do our debugging
 with checking of spin-locking order included and panicing upon LOR detection.
 That's not possible with these in existence.

However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.

-Ben
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Ellis H. Wilson III

On 11/18/2014 05:39 PM, Benjamin Kaduk wrote:

On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
and I was tracking current, hoping they would go away by 10.1, but it seems
they persist as shown below.  I suspect this is because current is being built
with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
show up for any devs but those who specifically enable WITNESS and disable
SKIPSPIN like myself.  At my work we would greatly like to do our debugging
with checking of spin-locking order included and panicing upon LOR detection.
That's not possible with these in existence.


However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.


That's a very interesting thread.  I've seen another where a fellow 
developer suggested just throwing on the WITNESS_SKIPSPIN flag to 
solve the issue.  I can't say that I agree with the approach, but I 
understand.  I'd be willing to tackle a bit of WITNESS massaging to help 
it be instructed about known false positives better, if that's desirable.


Why I'm able to boot however, is simple: I haven't enabled a full suite 
of debugging flags, KDB/DDB being the key ones that cause a panic to 
occur on a failure like the one I've seen.  We originally were seeing 
the panic so WITNESS in its entirety was shut off.  I was asked to try 
and get that back on-track, so to start I at least wanted to see how 
many LORs we were dealing with on boot.  2 is apparently that magic 
number, and maybe 3 if the cputs issue you refer to is still possible to 
be hit in the 10.1 wild.


If nobody has seen these before, I'll try and put together fixes for 
them.  Please somebody speak up if you have seen them or have useful 
information for me to go on in my patches.


Thanks,

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


reboot on wheel

2014-11-18 Thread Bruno Lauzé
Is there a reason for this:

@wheeluser: rebootOperation not permitted@wheeluser: shutdown -r nowSystem halt
What's the difference?
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: reboot on wheel

2014-11-18 Thread Kevin Oberman
On Tue, Nov 18, 2014 at 7:24 PM, Bruno Lauzé brunola...@msn.com wrote:

 Is there a reason for this:

 @wheeluser: rebootOperation not permitted@wheeluser: shutdown -r
 nowSystem halt
 What's the difference?


shutdown(8) does a clean shutdown; i.e. it runs all rc.d stop routines.
Cleanly stopping some things can be critical things like database shutdown
(and many other things).

reboot(8) and halt(8) don't run these. They are intended as emergency tools
and require root. membership inoperator is all that is required for
shutdown. (Didn't realize wheel would do the trick. Or maybe your wheeladm
is in the operator group, too.)

It frequently surprises me how common it is for even long time FreeBSD
users seem to be unaware of this and the danger of reboot/halt(8) for some
systems. Of course, for many systems it will make no real difference.
--
R. Kevin Oberman, Network Engineer, Retired
E-mail: rkober...@gmail.com
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org