Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

2020-05-02 Thread Mark Millard
[I'm only claiming the new jemalloc is involved and that
reverting avoids the problem.]

I've been reporting to some lists problems with:

dhclient
sendmail
rpcbind
mountd
nfsd

getting SIGSEGV (signal 11) crashes and some core
dumps on the old 2-socket (1 core per socket) 32-bit
PowerMac G4 running head -r360311.

Mikaël Urankar sent a note suggesting that I try
testing reverting head -r360233 for my head -r360311
context. He got it right . . .


Context:

The problem was noticed by an inability to have
other machines do a:

mount -onoatime,soft OLDPOWERMAC-LOCAL-IP:/... /mnt

sort of operation and have it succeed. By contrast, on
the old PowerMac G4 I could initiate mounts against
other machines just fine.

I do not see any such problems on any of (all based
on head -r360311):

powerpc64 (old PowerMac G5 2-sockets with 2 cores each)
armv7 (OrangePi+ 2ed)
aarch64 (Rock64, RPi4, RPi3,
 OverDrive 1000,
 Macchiatobin Double Shot)
amd64 (ThreadRipper 1950X)

So I expect something 32-bit powerpc specific
is somehow involved, even if jemalloc is only
using whatever it is.

(A kyua run with a debug kernel did not find other
unexpected signal 11 sources on the 32-bit PowerMac
compared to past kyua runs, at least that I noticed.
There were a few lock order reversals; I do not
know whether they are expected or known-safe.
I've reported those reversals to the lists as well.)


Recent experiments based on the suggestion:

Doing the buildworld, buildkernel and installing just
the new kernel and rebooting made no difference.

But then installing the new world and rebooting did
make things work again: I no longer get core files
for the likes of (old cores from before the update):

# find / -name "*.core" -print
/var/spool/clientmqueue/sendmail.core
/rpcbind.core
/mountd.core
/nfsd.core

Nor do I see the various notices for sendmail
signal 11's that did not leave behind a core file
--or for dhclient (no core file left behind).
And I can mount the old PowerMac's drive from
other machines just fine.
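
For reference, a minimal sketch of how the find output above could be
reduced to a list of affected programs. It assumes the default core
naming of <program>.core; core_programs is a hypothetical helper name,
not an existing tool.

```shell
# Hypothetical helper: map core-file paths to program names.
# Reads one path per line on stdin, e.g. the output of:
#   find / -name "*.core" -print
core_programs() {
    while IFS= read -r path; do
        # Strip the directory part and the ".core" suffix
        basename "$path" .core
    done | sort -u
}

# Example using the paths quoted above
printf '%s\n' \
    /var/spool/clientmqueue/sendmail.core \
    /rpcbind.core /mountd.core /nfsd.core | core_programs
```

Programs that crashed without leaving a core (dhclient here) would not
show up in such a list, so it is only a partial view.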


Other notes:

I do not actively use sendmail but it was left
to do its default things, partially to test if
such default things are working. Unfortunately,
PowerMacs have a problematical status under
FreeBSD and my context has my historical
experiments with avoiding various problems.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: beadm no longer able to destroy, maybe since using OpenZFS

2020-05-02 Thread Graham Perrin
After reverting from OpenZFS to ZFS, I was able to destroy two of the 
boot environments that previously (below, 1st May, OpenZFS) could not be 
destroyed.


I'm left with at least one BE 'r360237c' that can not be destroyed. Is 
it ever normal to find a snapshot described as '-'?


root@momh167-gjp4-8570p:~ # kldstat | grep zfs
 2    1 0x82109000   3a8b40 zfs.ko
root@momh167-gjp4-8570p:~ # date
Sun May  3 05:59:06 BST 2020
root@momh167-gjp4-8570p:~ # beadm list -as
BE/Dataset/Snapshot                            Active Mountpoint  Space Created

Waterfox
  copperbowl/ROOT/Waterfox                     -      -          314.0M 2020-03-10 18:24
    r360237c@2020-03-20-06:19:45               -      -            1.0G 2020-03-20 06:19

r357746h
  copperbowl/ROOT/r357746h                     -      -            3.4M 2020-04-08 09:28
    r360237c@2020-04-09-17:59:32               -      -            1.1G 2020-04-09 17:59

r360237c
  copperbowl/ROOT/r360237c@2020-03-20-06:19:45 -      -            1.0G 2020-03-20 06:19
  copperbowl/ROOT/r360237c@2020-04-09-17:59:32 -      -            1.1G 2020-04-09 17:59
  copperbowl/ROOT/r360237c@2020-04-29-13:24:52 -      -           11.2M 2020-04-29 13:24
  copperbowl/ROOT/r360237c                     -      -           80.4G 2020-04-29 13:24

r360237e
  copperbowl/ROOT/r360237e@2020-05-01-17:58:33 -      -           96.6M 2020-05-01 17:58
  copperbowl/ROOT/r360237e                     NR     /            1.3G 2020-05-01 17:58
    r360237c@2020-04-29-13:24:52               -      -           11.2M 2020-04-29 13:24


root@momh167-gjp4-8570p:~ # beadm destroy r360237c
Are you sure you want to destroy 'r360237c'?
This action cannot be undone (y/[n]): y
Boot environment 'r360237c' was created from existing snapshot
Destroy '-' snapshot? (y/[n]): y
cannot destroy 'copperbowl/ROOT/r360237c': filesystem has dependent clones
use '-R' to destroy the following datasets:
copperbowl/ROOT/r360237e@2020-05-01-17:58:33
copperbowl/ROOT/r360237e
copperbowl/ROOT/r357746h
copperbowl/ROOT/Waterfox
root@momh167-gjp4-8570p:~ # zfs list -t snapshot
NAME                                                     USED  AVAIL  REFER  MOUNTPOINT
copperbowl/ROOT/r360237c@2020-03-20-06:19:45            1.02G      -  59.2G  -
copperbowl/ROOT/r360237c@2020-04-09-17:59:32            1.15G      -  60.0G  -
copperbowl/ROOT/r360237c@2020-04-29-13:24:52            11.2M      -  62.2G  -
copperbowl/ROOT/r360237e@2020-05-01-17:58:33            96.6M      -  62.3G  -
copperbowl/iocage/releases/12.0-RELEASE/root@jbrowsers     8K      -  1.24G  -
copperbowl/poudriere/jails/head@clean                    376K      -  1.91G  -
copperbowl/usr/home@2020-05-03-05:55-r360237            57.7M      -   171G  -
root@momh167-gjp4-8570p:~ # uname -v
FreeBSD 13.0-CURRENT #54 r360237: Fri Apr 24 09:10:37 BST 2020 root@momh167-gjp4-8570p:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG

root@momh167-gjp4-8570p:~ #
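
A minimal sketch of how the clone dependencies behind the "filesystem
has dependent clones" message could be listed. It assumes input in the
tab-separated form printed by `zfs list -H -o name,origin`; the helper
name dependents_of is hypothetical, and the sample origins below are
read off the beadm listing above rather than taken from real zfs
output.

```shell
# Hypothetical helper: list datasets whose origin is a snapshot of "$1".
# Input on stdin: tab-separated "name<TAB>origin" lines, as printed by
#   zfs list -H -o name,origin -r copperbowl/ROOT
dependents_of() {
    awk -F '\t' -v parent="$1" '
        # origin looks like "parent@snapshot"; "-" means no origin
        index($2, parent "@") == 1 { print $1 }
    '
}

# Illustrative input only (origins inferred from the beadm listing)
printf '%s\t%s\n' \
    copperbowl/ROOT/Waterfox 'copperbowl/ROOT/r360237c@2020-03-20-06:19:45' \
    copperbowl/ROOT/r357746h 'copperbowl/ROOT/r360237c@2020-04-09-17:59:32' \
    copperbowl/ROOT/r360237c - \
    copperbowl/ROOT/r360237e 'copperbowl/ROOT/r360237c@2020-04-29-13:24:52' |
    dependents_of copperbowl/ROOT/r360237c
```

An origin of '-' means the dataset is not a clone, which may be where
the '-' in the beadm prompt comes from. `zfs promote` on one of the
dependent clones reverses the dependency, which is the usual way to
make the parent destroyable.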



On 01/05/2020 18:57, Graham Perrin wrote:

root@momh167-gjp4-8570p:~ # date ; uname -v
Fri May  1 18:52:31 BST 2020
FreeBSD 13.0-CURRENT #54 r360237: Fri Apr 24 09:10:37 BST 2020 root@momh167-gjp4-8570p:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG

root@momh167-gjp4-8570p:~ # beadm list
BE   Active Mountpoint  Space Created
Waterfox -  -    1.3G 2020-03-10 18:24
r357746h -  -  928.4M 2020-04-08 09:28
r360237b -  -  213.6M 2020-04-28 20:17
r360237c -  -   92.0G 2020-04-29 13:24
r360237d -  -  153.2M 2020-04-30 13:08
r360237e NR /  720.4M 2020-05-01 17:58
root@momh167-gjp4-8570p:~ # beadm destroy r360237b
Are you sure you want to destroy 'r360237b'?
This action cannot be undone (y/[n]): y
cannot promote 'copperbowl/ROOT/r360237d': not a cloned filesystem
root@momh167-gjp4-8570p:~ # beadm destroy r360237c
Are you sure you want to destroy 'r360237c'?
This action cannot be undone (y/[n]): y
Boot environment 'r360237c' was created from existing snapshot
Destroy '-' snapshot? (y/[n]): y
cannot destroy 'copperbowl/ROOT/r360237c': filesystem has dependent clones
use '-R' to destroy the following datasets:
copperbowl/ROOT/r360237e
copperbowl/ROOT/r360237d@2020-05-01-17:58:54
copperbowl/ROOT/r360237d@2020-05-01-17:58:33
copperbowl/ROOT/r360237d
copperbowl/ROOT/r360237b@2020-04-30-13:08:36
copperbowl/ROOT/r360237b
copperbowl/ROOT/r357746h
copperbowl/ROOT/Waterfox
root@momh167-gjp4-8570p:~ # beadm destroy r360237d
Are you sure you want to destroy 'r360237d'?
This action cannot be undone (y/[n]): y
cannot promote 'copperbowl/ROOT/r360237e': not a cloned filesystem
root@momh167-gjp4-8570p:~ # beadm list -as
BE/Dataset/Snapshot                            Active Mountpoint  Space Created

Waterfox
  copperbowl/ROOT/Waterfox                     -      -          314.0M 2020-03-10 18:24
    r360237c@2020-03-20-06:19:45               -      -            1.0G 2020-03-20 06:19

r357746h
  copperbowl/ROOT/r357746h                     -      -            3.4M 2020-04-08 09:28
    r360237c@2020-04-09-17:59:32               -      -          925.0M 2020-04-09 17:59


r360237b
  copperbowl/ROOT/r360237b 

Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Chris

On Sun, 3 May 2020 00:15:48 +0100 Grzegorz Junka li...@gjunka.com said


On 02/05/2020 20:43, Chris wrote:
> On Sat, 2 May 2020 20:19:56 +0100 Grzegorz Junka li...@gjunka.com said
>
>> On 02/05/2020 14:56, Grzegorz Junka wrote:
>> >
>> > On 02/05/2020 14:15, Grzegorz Junka wrote:
>> >> cpuid = 3
>> >>
>> >> time = 1588422616
>> >>
>> >> KDB: stack backtrace:
>> >>
>> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> >> 0xfe00b27e86b0
>> >>
>> >> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>> >>
>> >> panic() at panic+0x43/frame ...
>> >>
>> >> sleepq_add()
>> >>
>> >> ...
>> >>
>> >> I see
>> >>
>> >> db>
>> >>
>> >> in the terminal. I tried "dump" but it says, Cannot dump: no dump
>> >> device specified.
>> >>
>> >> Is there a guide how to deal with those, i.e. to gather information
>> >> required to investigate issues?
>> >
>>
>> Another thing is that I don't quite understand why the crash couldn't
>> be dumped.
>>
>> root@crayon2:~ # swapinfo
>> Device  1K-blocks Used    Avail Capacity
>> /dev/zvol/tank3/swap  33554432    0 33554432 0%
>>
>> There is no entry in /etc/fstab though, should it be there too?
>
> How about your rc.conf(5) ?
>
> You need to define a dumpdev within it as:
>
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="YES"
>
> Which defaults to the location of:
>
> /var/crash
>

Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?

Yes, it should of course be AUTO. I was distracted at the time of writing.
Sorry.
Does /var/crash exist?

That _should_ be enough. Assuming /var/crash is writable.
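
A minimal sketch of a check for the situation discussed above. As far
as rc.conf(5) describes it, dumpdev="AUTO" uses the first suitable
swap device listed in /etc/fstab, so a swap device that was enabled
without an fstab entry may not be picked up. The function name
check_dumpdev and its messages are hypothetical; it only inspects the
two files and does not call dumpon(8). Whether a zvol is usable as a
dump device at all is a separate question this sketch does not answer.

```shell
# Hypothetical check: would dumpdev="AUTO" find a candidate dump device?
# $1 = rc.conf-style file, $2 = fstab-style file (use paths containing a "/")
check_dumpdev() {
    # Read dumpdev from the rc.conf-style file inside a subshell
    dumpdev="$( . "$1" >/dev/null 2>&1; printf '%s' "${dumpdev:-}" )"
    # First swap device listed in the fstab-style file, if any
    swapdev="$(awk '$1 !~ /^#/ && $3 == "swap" { print $1; exit }' "$2")"
    case "$dumpdev" in
        [Aa][Uu][Tt][Oo])
            if [ -n "$swapdev" ]; then
                echo "AUTO: would try $swapdev"
            else
                echo "AUTO: no swap entry in fstab"
            fi ;;
        ""|[Nn][Oo]) echo "dumps disabled" ;;
        *) echo "explicit device: $dumpdev" ;;
    esac
}

# Typical use on the affected machine:
#   check_dumpdev /etc/rc.conf /etc/fstab
```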






Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 21:18, Mark Johnston wrote:

OK, I found this handbook
https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug

Obviously something must have been misconfigured that I can't dump the
core now. Is there anything I can fetch from the system while I am in
db> or I should just forget and restart?

It would be useful to see the output of "bt", "show lockchain" and
"alltrace" if possible.  The latter command will produce a lot of output
though.


Sorry, I had to restart. I tried "netdump -s someIP -g someGateway", which 
forced netdump into a loop (requesting ARP for someIP and failing) 
that I couldn't stop.


I only have a photo of the crash itself, which ends at sleepq_add 
before going to panic. I can hand-transcribe it if it might be of any use.


--GrzegorzJ



Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka


On 02/05/2020 20:43, Chris wrote:

On Sat, 2 May 2020 20:19:56 +0100 Grzegorz Junka li...@gjunka.com said


On 02/05/2020 14:56, Grzegorz Junka wrote:
>
> On 02/05/2020 14:15, Grzegorz Junka wrote:
>> cpuid = 3
>>
>> time = 1588422616
>>
>> KDB: stack backtrace:
>>
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe00b27e86b0
>>
>> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>>
>> panic() at panic+0x43/frame ...
>>
>> sleepq_add()
>>
>> ...
>>
>> I see
>>
>> db>
>>
>> in the terminal. I tried "dump" but it says, Cannot dump: no dump
>> device specified.
>>
>> Is there a guide how to deal with those, i.e. to gather information
>> required to investigate issues?
>

Another thing is that I don't quite understand why the crash couldn't 
be dumped.


root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?


How about your rc.conf(5) ?

You need to define a dumpdev within it as:

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="YES"

Which defaults to the location of:

/var/crash



Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?




Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Mark Johnston
On Sat, May 02, 2020 at 02:56:27PM +0100, Grzegorz Junka wrote:
> 
> On 02/05/2020 14:15, Grzegorz Junka wrote:
> > cpuid = 3
> >
> > time = 1588422616
> >
> > KDB: stack backtrace:
> >
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe00b27e86b0
> >
> > vpanic() at vpanic+0x182/frame 0xfe00b27e8700
> >
> > panic() at panic+0x43/frame ...
> >
> > sleepq_add()
> >
> > ...
> >
> > I see
> >
> > db>
> >
> > in the terminal. I tried "dump" but it says, Cannot dump: no dump 
> > device specified.
> >
> > Is there a guide how to deal with those, i.e. to gather information 
> > required to investigate issues?
> >
> 
> 
> OK, I found this handbook 
> https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug
> 
> Obviously something must have been misconfigured that I can't dump the 
> core now. Is there anything I can fetch from the system while I am in 
> db> or I should just forget and restart?

It would be useful to see the output of "bt", "show lockchain" and
"alltrace" if possible.  The latter command will produce a lot of output
though.


Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Chris

On Sat, 2 May 2020 20:19:56 +0100 Grzegorz Junka li...@gjunka.com said


On 02/05/2020 14:56, Grzegorz Junka wrote:
>
> On 02/05/2020 14:15, Grzegorz Junka wrote:
>> cpuid = 3
>>
>> time = 1588422616
>>
>> KDB: stack backtrace:
>>
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
>> 0xfe00b27e86b0

>>
>> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>>
>> panic() at panic+0x43/frame ...
>>
>> sleepq_add()
>>
>> ...
>>
>> I see
>>
>> db>
>>
>> in the terminal. I tried "dump" but it says, Cannot dump: no dump
>> device specified.
>>
>> Is there a guide how to deal with those, i.e. to gather information
>> required to investigate issues?
>

Another thing is that I don't quite understand why the crash couldn't be 
dumped.


root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?


How about your rc.conf(5) ?

You need to define a dumpdev within it as:

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="YES"

Which defaults to the location of:

/var/crash




--

GrzegorzJ




Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka


On 02/05/2020 14:56, Grzegorz Junka wrote:


On 02/05/2020 14:15, Grzegorz Junka wrote:

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump 
device specified.


Is there a guide how to deal with those, i.e. to gather information 
required to investigate issues?




Another thing is that I don't quite understand why the crash couldn't be 
dumped.


root@crayon2:~ # swapinfo
Device  1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap  33554432    0 33554432 0%

There is no entry in /etc/fstab though, should it be there too?

--

GrzegorzJ




Re: lock order reversal and poudriere

2020-05-02 Thread Kurt Jaeger
Hi!

> > > I am compiling some packages with poudriere on 13-current kernel. I
> > > noticed some strange messages printed into the terminal and dmesg:
> > > 
> > > lock order reversal:
> > [...]
> > > Are those the debug messages that aren't visible on non-current kernel
> > > and should they be reported?
> > Yes, they should be checked and reported.
> > 
> > For more details see:
> > 
> > http://sources.zabbadoz.net/freebsd/lor.html
> > 
> > There's a webpage with a list of all known LORs and a way to
> > report new LORs.

> Thanks Kurt. I can't find those two specific LORs in the list on that
> page. The page also says to report them using a link, which leads to 404
> :-), or on this mailing list, which I did. I am not sure what else I
> should do.

I don't know, either 8-} bz@ is in Cc:, so he'll probably know what
to do.

> How do I know if I have got a backtrace?
> 
> Are those errors:
> 
> pid 43297 (conftest), jid 5, uid 0: exited on signal 11
> 
> related or it's a different issue?

I think that's a different issue.

-- 
p...@freebsd.org +49 171 3101372  Now what ?


Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 15:40, Conrad Meyer wrote:

Hi Grzegorz,

If you have another machine connected by network that you can install
and start netdumpd on; IPv4 configured on a supported network
device before the machine panicked; and a recent CURRENT; you should be
able to initiate a kernel dump over the network with 'netdump -s
server-ip' in DDB.  In more complicated situations you might also need
to specify '-g gateway-ip -c client-ip -i interface', but for servers
on the LAN or available via the default gateway route, the former
ought to work.



Thanks Conrad. That doesn't seem to work. netdump -s reports "Failed to 
ARP server" then "failed to locate MAC address". Both systems are in the 
same local network and the system that crashed did have a network 
configured prior to crash. In fact, I was logged in over ssh in one of 
the terminals. I tried through a switch and when the network is 
connected directly. I tried to specify the interface and the client IP.


Is there a way to specify MAC directly?

GrzegorzJ



Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Conrad Meyer
Hi Grzegorz,

If you have another machine connected by network that you can install
and start netdumpd on; IPv4 configured on a supported network
device before the machine panicked; and a recent CURRENT; you should be
able to initiate a kernel dump over the network with 'netdump -s
server-ip' in DDB.  In more complicated situations you might also need
to specify '-g gateway-ip -c client-ip -i interface', but for servers
on the LAN or available via the default gateway route, the former
ought to work.
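
As a small sketch of the two forms described above, the helper below
only assembles the command line to type at the db> prompt; it does not
run anything in DDB. The name netdump_cmd, the addresses (taken from
the documentation range 192.0.2.0/24) and the interface name em0 are
placeholders, not values from this thread.

```shell
# Hypothetical helper: print the DDB command line for a netdump.
# Usage: netdump_cmd server-ip [gateway-ip] [client-ip] [interface]
netdump_cmd() {
    cmd="netdump -s $1"
    # Optional arguments are appended only when given
    if [ -n "${2:-}" ]; then cmd="$cmd -g $2"; fi
    if [ -n "${3:-}" ]; then cmd="$cmd -c $3"; fi
    if [ -n "${4:-}" ]; then cmd="$cmd -i $4"; fi
    printf '%s\n' "$cmd"
}

# Simple case: server on the LAN
netdump_cmd 192.0.2.10
# More complicated case: gateway, client address and interface given
netdump_cmd 192.0.2.10 192.0.2.1 192.0.2.20 em0
```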

Best,
Conrad

On Sat, May 2, 2020 at 6:56 AM Grzegorz Junka  wrote:
>
>
> On 02/05/2020 14:15, Grzegorz Junka wrote:
> > cpuid = 3
> >
> > time = 1588422616
> >
> > KDB: stack backtrace:
> >
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfe00b27e86b0
> >
> > vpanic() at vpanic+0x182/frame 0xfe00b27e8700
> >
> > panic() at panic+0x43/frame ...
> >
> > sleepq_add()
> >
> > ...
> >
> > I see
> >
> > db>
> >
> > in the terminal. I tried "dump" but it says, Cannot dump: no dump
> > device specified.
> >
> > Is there a guide how to deal with those, i.e. to gather information
> > required to investigate issues?
> >
>
>
> OK, I found this handbook
> https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug
>
> Obviously something must have been misconfigured that I can't dump the
> core now. Is there anything I can fetch from the system while I am in
> db> or I should just forget and restart?
>
> GrzegorzJ
>


Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 14:15, Grzegorz Junka wrote:

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump 
device specified.


Is there a guide how to deal with those, i.e. to gather information 
required to investigate issues?





OK, I found this handbook 
https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug


Obviously something must have been misconfigured that I can't dump the 
core now. Is there anything I can fetch from the system while I am in 
db> or I should just forget and restart?


GrzegorzJ



panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371

2020-05-02 Thread Grzegorz Junka

cpuid = 3

time = 1588422616

KDB: stack backtrace:

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe00b27e86b0


vpanic() at vpanic+0x182/frame 0xfe00b27e8700

panic() at panic+0x43/frame ...

sleepq_add()

...

I see

db>

in the terminal. I tried "dump" but it says, Cannot dump: no dump device 
specified.


Is there a guide how to deal with those, i.e. to gather information 
required to investigate issues?




Re: lock order reversal and poudriere

2020-05-02 Thread Grzegorz Junka



On 02/05/2020 10:54, Kurt Jaeger wrote:

Hi!


I am compiling some packages with poudriere on 13-current kernel. I
noticed some strange messages printed into the terminal and dmesg:

lock order reversal:

[...]

Are those the debug messages that aren't visible on non-current kernel
and should they be reported?

Yes, they should be checked and reported.

For more details see:

http://sources.zabbadoz.net/freebsd/lor.html

There's a webpage with a list of all known LORs and a way to
report new LORs.



Thanks Kurt. I can't find those two specific LORs in the list on that 
page. The page also says to report them using a link, which leads to 404 
:-), or on this mailing list, which I did. I am not sure what else 
I should do. How do I know if I have got a backtrace?


Are those errors:

pid 43297 (conftest), jid 5, uid 0: exited on signal 11

related or it's a different issue?
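
To help judge whether these crashes cluster in particular jails, a
minimal sketch that tallies the "exited on signal 11" lines per jail
ID. It assumes the message format shown in this thread; segv_by_jail
is a hypothetical helper name, and the sample lines are copied from
the original report.

```shell
# Hypothetical helper: count "exited on signal 11" messages per jid.
# Reads kernel messages on stdin (e.g. piped from dmesg).
segv_by_jail() {
    awk '/exited on signal 11/ {
            for (i = 1; i < NF; i++)
                if ($i == "jid") {
                    j = $(i + 1)
                    sub(/,$/, "", j)   # strip the trailing comma
                    count[j]++
                }
        }
        END { for (j in count) print "jid " j ": " count[j] }' | sort
}

# Sample lines from the report in this thread
printf '%s\n' \
    'pid 17216 (conftest), jid 6, uid 0: exited on signal 11' \
    'pid 51159 (conftest), jid 6, uid 0: exited on signal 11' \
    'pid 23833 (conftest), jid 3, uid 0: exited on signal 11' \
    'pid 4916 (conftest), jid 3, uid 0: exited on signal 11' | segv_by_jail
```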

GrzegorzJ



Re: lock order reversal and poudriere

2020-05-02 Thread Kurt Jaeger
Hi!

> I am compiling some packages with poudriere on 13-current kernel. I
> noticed some strange messages printed into the terminal and dmesg:
> 
> lock order reversal:
[...]
> Are those the debug messages that aren't visible on non-current kernel
> and should they be reported?

Yes, they should be checked and reported.

For more details see:

http://sources.zabbadoz.net/freebsd/lor.html

There's a webpage with a list of all known LORs and a way to
report new LORs.

-- 
p...@opsec.eu  +49 171 3101372  Now what ?


lock order reversal and poudriere

2020-05-02 Thread Grzegorz Junka
I am compiling some packages with poudriere on 13-current kernel. I 
noticed some strange messages printed into the terminal and dmesg:


lock order reversal:
 1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf8010cd37250 devfs (devfs) @ /usr/src-13/sys/kern/vfs_mount.c:1016

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 17216 (conftest), jid 6, uid 0: exited on signal 11
pid 51159 (conftest), jid 6, uid 0: exited on signal 11
pid 23833 (conftest), jid 3, uid 0: exited on signal 11
pid 4916 (conftest), jid 3, uid 0: exited on signal 11

(... then there is a bunch of similar ones, then ...)

pid 14504 (conftest), jid 3, uid 0: exited on signal 11
pid 27466 (conftest), jid 6, uid 0: exited on signal 11
pid 43297 (conftest), jid 5, uid 0: exited on signal 11
lock order reversal:
 1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ /usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf803baeddbd8 tmpfs (tmpfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553

stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c340df at sys_ppoll+0x6f
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 37533 (conftest), jid 5, uid 0: exited on signal 11
pid 43474 (conftest), jid 5, uid 0: exited on signal 11
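
When comparing against a list of known LORs, it can help to reduce the
messages to lock-name pairs. Below is a minimal sketch, assuming the
" 1st ..." / " 2nd ..." line format shown above; lor_pairs is a
hypothetical helper name. It ignores the file and line information
after the "@".

```shell
# Hypothetical helper: print unique "first -> second" lock pairs.
# Reads kernel messages on stdin (e.g. piped from dmesg).
lor_pairs() {
    awk '
        # Lock name is the text between the address and the first " ("
        function lockname(line,    s) {
            s = line
            sub(/^ *(1st|2nd) 0x[0-9a-f]+ /, "", s)
            sub(/ \(.*/, "", s)
            return s
        }
        /^ *1st 0x/ { first = lockname($0) }
        /^ *2nd 0x/ { print first " -> " lockname($0) }
    ' | sort -u
}

# Sample lines from the two reversals reported above
printf '%s\n' \
    ' 1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005' \
    ' 2nd 0xf8010cd37250 devfs (devfs) @ /usr/src-13/sys/kern/vfs_mount.c:1016' \
    ' 1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ /usr/src-13/sys/kern/sys_generic.c:1557' \
    ' 2nd 0xf803baeddbd8 tmpfs (tmpfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553' |
    lor_pairs
```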


Poudriere doesn't really report any problems:

# poudriere status
SET  PORTS JAIL BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
kde5 gui   13   2020-05-01_10h17m52s parallel_build  2040   792    0    0      0   1248 22:48:00 /usr/local/poudriere/data/logs/bulk/13-gui-kde5/2020-05-01_10h17m52s



Are those the debug messages that aren't visible on non-current kernel 
and should they be reported?


GrzegorzJ
