Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-03 Thread Grzegorz Junka



On 03/05/2020 15:13, Gary Jennejohn wrote:

On Sun, 3 May 2020 14:11:09 +0100
Grzegorz Junka  wrote:


I don't have a partition that I could use for swap. I have two whole
disks added to ZFS. Maybe on the boot drive but that would require
repartitioning and I have Windows/FreeBSD there, so not so straightforward.


As the dumpon man page states, by the time a crash dump is needed the
file systems are dead.  No way to dump to a ZFS file system.  That's
why a raw partition is required.

The other option would be netdump.  See the dumpon man page.
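Gary's point can be checked mechanically against swapinfo(8) output such as the one quoted elsewhere in this thread. The following is a minimal Python sketch, assuming the column layout shown there; the helper name `non_zvol_swap_devices` is hypothetical, and the rule it encodes (a swap device under /dev/zvol/ lives on ZFS and so is not a raw dump partition) is taken from this message, not from any FreeBSD tool.

```python
def non_zvol_swap_devices(swapinfo_output):
    """Return swap devices from swapinfo(8) output that are not ZFS volumes.

    Assumption (from the message above): a swap zvol sits on ZFS and so
    cannot take a crash dump; only raw partitions are candidates.
    """
    devices = []
    for line in swapinfo_output.splitlines()[1:]:  # skip the column header
        fields = line.split()
        # swapinfo adds a "Total" row when several devices are configured.
        if not fields or fields[0] == "Total":
            continue
        if not fields[0].startswith("/dev/zvol/"):
            devices.append(fields[0])
    return devices


if __name__ == "__main__":
    sample = (
        "Device  1K-blocks Used    Avail Capacity\n"
        "/dev/zvol/tank3/swap  33554432    0 33554432 0%\n"
    )
    print(non_zvol_swap_devices(sample))  # []
```

On the affected machine the input could come from `subprocess.run(["swapinfo"], capture_output=True, text=True).stdout`; an empty list is exactly the situation described in this thread, where netdump is the remaining option.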



I will consider a separate partition next time I partition my disk. For 
now I will have to ignore panics and dumps. I tried netdump and it 
didn't work: it couldn't ARP the netdumpd server.


--GrzegorzJ

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: lock order reversal and poudriere

2020-05-03 Thread Grzegorz Junka


On 03/05/2020 15:00, Niclas Zeising wrote:

On 2020-05-02 20:36, Kurt Jaeger wrote:


I don't know, either 8-} bz@ is in Cc:, so he'll probably know what
to do.


How do I know if I have got a backtrace?

Are those errors:

pid 43297 (conftest), jid 5, uid 0: exited on signal 11

related, or is it a different issue?


I think that's a different issue.



conftest is when configure scripts do things.  Configure works a lot 
by compiling (and sometimes running) small snippets of code to figure 
out what's going on.  Sometimes those snippets core dump. It's all 
normal.




Good to know. It's mostly conftest, but sometimes other processes too:

pid 37407 (cc), jid 9, uid 0: exited on signal 6
pid 95358 (conftest), jid 3, uid 0: exited on signal 11
pid 70242 (conftest), jid 9, uid 0: exited on signal 11
pid 27480 (ngc27183), jid 3, uid 0: exited on signal 11
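Because the same lines recur throughout a long poudriere run, it can help to tally them by process and signal before deciding whether anything other than configure probes is crashing. A minimal Python sketch, assuming the messages were saved from dmesg(8) or /var/log/messages in exactly the format quoted above; the regular expression and the `tally_exits` helper are hypothetical, not part of any FreeBSD tool. On FreeBSD, signal 6 is SIGABRT and signal 11 is SIGSEGV.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   pid 43297 (conftest), jid 5, uid 0: exited on signal 11
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<sig>\d+)"
)

# FreeBSD signal numbers for the two signals seen in this thread.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}


def tally_exits(log_text):
    """Count (process name, signal name) pairs in dmesg-style text."""
    counts = Counter()
    for match in EXIT_RE.finditer(log_text):
        sig = int(match.group("sig"))
        name = SIGNAL_NAMES.get(sig, f"signal {sig}")
        counts[(match.group("comm"), name)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "pid 37407 (cc), jid 9, uid 0: exited on signal 6\n"
        "pid 95358 (conftest), jid 3, uid 0: exited on signal 11\n"
        "pid 70242 (conftest), jid 9, uid 0: exited on signal 11\n"
        "pid 27480 (ngc27183), jid 3, uid 0: exited on signal 11\n"
    )
    for (comm, sig), n in tally_exits(sample).most_common():
        print(f"{n:3d}  {comm:10s} {sig}")
```

A cc abort stands out from the conftest segfaults here, which is the kind of outlier worth checking in the poudriere build logs.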

Regards

--GrzegorzJ



Re: lock order reversal and poudriere

2020-05-03 Thread Grzegorz Junka


On 02/05/2020 10:08, Grzegorz Junka wrote:
I am compiling some packages with poudriere on a 13-CURRENT kernel. I 
noticed some strange messages printed to the terminal and to dmesg:


lock order reversal:
 1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf8010cd37250 devfs (devfs) @ 
/usr/src-13/sys/kern/vfs_mount.c:1016

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 17216 (conftest), jid 6, uid 0: exited on signal 11
pid 51159 (conftest), jid 6, uid 0: exited on signal 11
pid 23833 (conftest), jid 3, uid 0: exited on signal 11
pid 4916 (conftest), jid 3, uid 0: exited on signal 11

(... then there is a bunch of similar ones, then ...)

pid 14504 (conftest), jid 3, uid 0: exited on signal 11
pid 27466 (conftest), jid 6, uid 0: exited on signal 11
pid 43297 (conftest), jid 5, uid 0: exited on signal 11
lock order reversal:
 1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ 
/usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf803baeddbd8 tmpfs (tmpfs) @ 
/usr/src-13/sys/kern/vfs_vnops.c:1553

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c340df at sys_ppoll+0x6f
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 37533 (conftest), jid 5, uid 0: exited on signal 11
pid 43474 (conftest), jid 5, uid 0: exited on signal 11




I restarted the compilation and am again seeing similar LORs:

lock order reversal:
 1st 0xf80115d32068 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf800243d6808 devfs (devfs) @ 
/usr/src-13/sys/kern/vfs_mount.c:1016

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101

lock order reversal:
 1st 0xfe00a7aa49b0 filedesc structure (filedesc structure) @ 
/usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf800aa2cdbd8 zfs (zfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c339f0 at sys_poll+0x50
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
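Before reporting, it can help to reduce a long dmesg to the distinct lock pairs, since the second run above repeats the same zfs/devfs reversal with different addresses. A minimal Python sketch, assuming the WITNESS output format shown in this message (including the mail-client line wrap after "@"); `unique_lors` is a hypothetical helper, and its result still has to be compared by hand against the list of known LORs.

```python
import re

# One "1st"/"2nd" line of a WITNESS report. "\s*" after "@" tolerates the
# line wrap that mail clients introduce before the source path.
LOCK_RE = re.compile(
    r"(?P<order>1st|2nd) 0x[0-9a-f]+ (?P<name>[^(\n]+?) \((?P<type>[^)]+)\) "
    r"@\s*(?P<path>\S+):(?P<line>\d+)"
)


def unique_lors(log_text):
    """Return the distinct ((lock, file:line), (lock, file:line)) pairs.

    Lock addresses are ignored, so repeated reports of the same reversal
    collapse to one entry; first-seen order is preserved.
    """
    seen = []
    for chunk in log_text.split("lock order reversal:")[1:]:
        locks = {m["order"]: m for m in LOCK_RE.finditer(chunk)}
        if "1st" not in locks or "2nd" not in locks:
            continue  # truncated report, skip it
        pair = tuple(
            (locks[o]["name"], f"{locks[o]['path']}:{locks[o]['line']}")
            for o in ("1st", "2nd")
        )
        if pair not in seen:
            seen.append(pair)
    return seen
```

Run over both poudriere sessions, this would leave one zfs/devfs pair and the filedesc-versus-vnode pairs (tmpfs in the first run, zfs in the second), which are the entries to look up.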


The page for reporting LORs still returns a 404 :)

--

GrzegorzJ



Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-03 Thread Grzegorz Junka


On 03/05/2020 08:05, Gary Jennejohn wrote:

On Sat, 02 May 2020 16:28:46 -0700
Chris  wrote:






Another thing is that I don't quite understand why the crash couldn't
be dumped.

root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?

How about your rc.conf(5) ?

You need to define a dumpdev within it as:

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="YES"

Which defaults to the location of:

/var/crash
  

Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?

Yes, it should of course be AUTO. I was distracted at the time of writing.
Sorry.
Does /var/crash exist?

That _should_ be enough. Assuming /var/crash is writable.


Sorry, but read the man page for rc.conf.

This is the entry for dumpdev:

  dumpdev (str) Indicates the device (usually a swap partition) to
  which a crash dump should be written in the event of a system
  crash.  If the value of this variable is "AUTO", the first
  suitable swap device listed in /etc/fstab will be used as
  dump device.  Otherwise, the value of this variable is passed
  as the argument to dumpon(8).  To disable crash dumps, set
  this variable to "NO".

If there are no swap devices in /etc/fstab then "AUTO" will not work.  But
a partition can be specified.  I have dumpdev="/dev/ada0p5" in my rc.conf.

/var/crash is the target for crash dumps after the system is re-booted.
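The rc.conf(5) rule quoted above accounts for the "no dump device specified" error: with dumpdev="AUTO" and no swap line in /etc/fstab, there is nothing for rc to pick. A simplified Python model of that rule, based only on the man-page text quoted above (the real rc scripts may apply further "suitable device" checks that are not modeled here); `resolve_dumpdev` is a hypothetical name.

```python
def resolve_dumpdev(dumpdev, fstab_text):
    """Return the device rc would pass to dumpon(8), or None for no dumps.

    Simplified model of the rc.conf(5) dumpdev description quoted above.
    """
    if dumpdev == "NO":
        return None  # crash dumps explicitly disabled
    if dumpdev != "AUTO":
        return dumpdev  # any other value is handed to dumpon(8) as-is
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        # fstab(5) columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            return fields[0]  # the first swap device listed wins
    return None  # AUTO but no swap entry in /etc/fstab: nothing to dump to


if __name__ == "__main__":
    # Root-on-ZFS setup from this thread: swap is a zvol with no fstab line.
    print(resolve_dumpdev("AUTO", ""))  # None
    print(resolve_dumpdev("/dev/ada0p5", ""))  # explicit partition, as Gary uses
```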



/var/crash existed but might not have had the right permissions. I think 
it was 755 whereas the handbook recommends 700. Shouldn't matter though.


I don't have anything about swap in fstab since I am using Root on ZFS. 
swapinfo correctly recognizes the swap zvol and uses it. This is the 
typical usage while I am compiling ports:


last pid: 85116;  load averages:  8.95,  8.50, 8.34 up 0+18:06:31  13:02:32
72 processes:  14 running, 57 sleeping, 1 zombie
CPU:  0.0% user, 90.5% nice,  9.5% system,  0.0% interrupt,  0.0% idle
Mem: 993M Active, 594M Inact, 6400K Laundry, 12G Wired, 2225M Free
ARC: 6160M Total, 3093M MFU, 2657M MRU, 214M Anon, 100M Header, 193M Other
 5300M Compressed, 5861M Uncompressed, 1.11:1 Ratio
Swap: 32G Total, 61M Used, 32G Free

The crash happened in similar conditions so there should be nothing 
preventing dumping the crash to the zfs swap, unless dumpon isn't smart 
enough to use zfs swap.


I don't have a partition that I could use for swap. I have two whole 
disks added to ZFS. Maybe on the boot drive but that would require 
repartitioning and I have Windows/FreeBSD there, so not so straightforward.


--GrzegorzJ





Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 21:18, Mark Johnston wrote:

OK, I found this handbook
https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug

Obviously something must have been misconfigured, since I can't dump the
core now. Is there anything I can fetch from the system while I am in
db>, or should I just forget it and restart?

It would be useful to see the output of "bt", "show lockchain" and
"alltrace" if possible.  The latter command will produce a lot of output
though.


Sorry, had to restart. I tried "netdump -s someIP -g someGateway", which 
forced netdump into a loop (of requesting ARP for someIP and failing) 
that I couldn't stop.


I only have a photo of the crash itself, which ends at sleepq_add 
before going into panic. I can transcribe it by hand if it might be of any use.


--GrzegorzJ



Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka


On 02/05/2020 20:43, Chris wrote:

On Sat, 2 May 2020 20:19:56 +0100 Grzegorz Junka li...@gjunka.com said


On 02/05/2020 14:56, Grzegorz Junka wrote:
>
> On 02/05/2020 14:15, Grzegorz Junka wrote:
>> cpuid = 3
>>
>> time = 1588422616
>>
>> KDB: stack backtrace:
>>
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b27e86b0
>>
>> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>>
>> panic() at panic+0x43/frame ...
>>
>> sleepq_add()
>>
>> ...
>>
>> I see
>>
>> db>
>>
>> in the terminal. I tried "dump" but it says, Cannot dump: no dump 
>> device specified.
>>
>> Is there a guide on how to deal with those, i.e. to gather the information 
>> required to investigate issues?
>

Another thing is that I don't quite understand why the crash couldn't 
be dumped.


root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?


How about your rc.conf(5) ?

You need to define a dumpdev within it as:

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="YES"

Which defaults to the location of:

/var/crash



Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?




Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka


On 02/05/2020 14:56, Grzegorz Junka wrote:


On 02/05/2020 14:15, Grzegorz Junka wrote:

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump 
device specified.


Is there a guide on how to deal with those, i.e. to gather the information 
required to investigate issues?




Another thing is that I don't quite understand why the crash couldn't be 
dumped.


root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?

--

GrzegorzJ




Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 15:40, Conrad Meyer wrote:

Hi Grzegorz,

If you have another machine connected by network that you can install
and start netdumpd on; IPv4 configured on a supported network device
before the machine panicked; and a recent CURRENT, you should be
able to initiate a kernel dump over the network with 'netdump -s
server-ip' in DDB.  In more complicated situations you might also need
to specify '-g gateway-ip -c client-ip -i interface', but for servers
on the LAN or available via the default gateway route, the former
ought to work.
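Conrad's description maps onto a small set of DDB arguments. As a hedged illustration only, the Python helper below assembles the command line to type at the db> prompt from those same flags (-s, -g, -c, -i); the function name, the example addresses and the interface name are hypothetical, and it rejects non-IPv4 addresses because the message above asks for IPv4 to be configured. It does not talk to netdumpd or to DDB.

```python
import ipaddress


def netdump_command(server_ip, gateway_ip=None, client_ip=None, interface=None):
    """Assemble a DDB netdump command line from the flags described above."""
    # The message above requires IPv4, so reject anything else up front.
    for addr in (server_ip, gateway_ip, client_ip):
        if addr is not None:
            ipaddress.IPv4Address(addr)  # raises ValueError if not IPv4
    args = ["netdump", "-s", server_ip]
    for flag, value in (("-g", gateway_ip), ("-c", client_ip), ("-i", interface)):
        if value is not None:
            args += [flag, value]
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical addresses (TEST-NET-1) and interface name.
    print(netdump_command("192.0.2.10"))
    print(netdump_command("192.0.2.10", gateway_ip="192.0.2.1",
                          client_ip="192.0.2.20", interface="em0"))
```

The short form covers a server on the same LAN; the extra flags only matter when routing or interface selection is ambiguous.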



Thanks Conrad. That doesn't seem to work. netdump -s reports "Failed to 
ARP server" then "failed to locate MAC address". Both systems are on the 
same local network, and the system that crashed did have networking 
configured prior to the crash. In fact, I was logged in over ssh in one of 
the terminals. I tried both through a switch and with the two machines 
connected directly. I also tried specifying the interface and the client IP.


Is there a way to specify MAC directly?

GrzegorzJ



Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 14:15, Grzegorz Junka wrote:

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump 
device specified.


Is there a guide on how to deal with those, i.e. to gather the information 
required to investigate issues?





OK, I found this handbook 
https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug


Obviously something must have been misconfigured, since I can't dump the 
core now. Is there anything I can fetch from the system while I am in 
db>, or should I just forget it and restart?


GrzegorzJ



panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump device 
specified.


Is there a guide on how to deal with those, i.e. to gather the information 
required to investigate issues?




Re: lock order reversal and poudriere

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 10:54, Kurt Jaeger wrote:

Hi!


I am compiling some packages with poudriere on a 13-CURRENT kernel. I
noticed some strange messages printed to the terminal and to dmesg:

lock order reversal:

[...]

Are those the debug messages that aren't visible on non-CURRENT kernels,
and should they be reported?

Yes, they should be checked and reported.

For more details see:

http://sources.zabbadoz.net/freebsd/lor.html

There's a webpage with a list of all known LORs and a way to
report new LORs.



Thanks Kurt. I can't find those two specific LORs in the list on that 
page. The page also says to report them using a link, which leads to a 404 
:-), or on this mailing list, which I did. I am not sure what else 
I should do. How do I know if I have got a backtrace?


Are those errors:

pid 43297 (conftest), jid 5, uid 0: exited on signal 11

related, or is it a different issue?

GrzegorzJ



lock order reversal and poudriere

2020-05-02 Thread Grzegorz Junka
I am compiling some packages with poudriere on a 13-CURRENT kernel. I 
noticed some strange messages printed to the terminal and to dmesg:


lock order reversal:
 1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf8010cd37250 devfs (devfs) @ 
/usr/src-13/sys/kern/vfs_mount.c:1016

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 17216 (conftest), jid 6, uid 0: exited on signal 11
pid 51159 (conftest), jid 6, uid 0: exited on signal 11
pid 23833 (conftest), jid 3, uid 0: exited on signal 11
pid 4916 (conftest), jid 3, uid 0: exited on signal 11

(... then there is a bunch of similar ones, then ...)

pid 14504 (conftest), jid 3, uid 0: exited on signal 11
pid 27466 (conftest), jid 6, uid 0: exited on signal 11
pid 43297 (conftest), jid 5, uid 0: exited on signal 11
lock order reversal:
 1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ 
/usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf803baeddbd8 tmpfs (tmpfs) @ 
/usr/src-13/sys/kern/vfs_vnops.c:1553

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c340df at sys_ppoll+0x6f
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 37533 (conftest), jid 5, uid 0: exited on signal 11
pid 43474 (conftest), jid 5, uid 0: exited on signal 11


Poudriere doesn't really report any problems:

# poudriere status
SET  PORTS JAIL BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
kde5 gui   13   2020-05-01_10h17m52s parallel_build 2040  792   0    0    0      1248   22:48:00 /usr/local/poudriere/data/logs/bulk/13-gui-kde5/2020-05-01_10h17m52s



Are those the debug messages that aren't visible on non-CURRENT kernels, 
and should they be reported?


GrzegorzJ
