CURRENT 220ee18f1964 memstick kernel panic, MacBookPro8,3

2024-03-25 Thread Graham Perrin
Originally posted to 



Photograph: 



USB flash drive written from 
FreeBSD-15.0-CURRENT-amd64-20240314-220ee18f1964-268793-memstick.img.xz


Broadcom Wi-Fi-related, maybe? 





Reproducible in safe mode.


Kernel Panic sys/netinet/tcp_subr.c 2386

2024-02-12 Thread Wolfram Schneider
Hi,

I just got a kernel panic on my 15.0-CURRENT machine, with an
assertion failure in sys/netinet/tcp_subr.c, line 2386

full log:
https://people.freebsd.org/~wosch/tmp/kernel-panic-tcp_subr-line-2386.png

OS: 15.0-CURRENT main-3e9515846f (10-Feb-2024, github.com/freebsd/freebsd-src)

Should I worry?

-Wolfram

-- 
Wolfram Schneider  https://wolfram.schneider.org



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-03 Thread Gary Jennejohn
On Sun, 03 Sep 2023 15:17:36 +0200
"Herbert J. Skuhra"  wrote:

[SNIP]
> Probably best to file a PR: https://bugs.freebsd.org/bugzilla/
>

Bugzilla 273543

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-03 Thread Herbert J. Skuhra
On Sat, 02 Sep 2023 18:02:03 +0200, Gary Jennejohn wrote:
> 
> On Sat, 02 Sep 2023 15:36:36 +0200
> "Herbert J. Skuhra"  wrote:
> 
> > On Fri, 01 Sep 2023 18:05:34 +0200, Gary Jennejohn wrote:
> > >
> > > On Fri, 1 Sep 2023 14:43:21 +
> > > Gary Jennejohn  wrote:
> > >
> > > > A git-bisect is probably required.
> > > >
> > >
> > > I did a bisect and the result was commit
> > > 9a7add6d01f3c5f7eba811e794cf860d2bce131d.
> > >
> > > However, that can't be correct because this commit was made on
> > > Mon Jul 17 19:29:20 2023 and my FBSD-14 kernel from August 13th
> > > boots successfully :(
> >
> > Commit date is August 19th, 2023(!):
> >
> > commit 9a7add6d01f3c5f7eba811e794cf860d2bce131d
> > Author: Colin Percival
> > AuthorDate: Mon Jul 17 19:29:20 2023 -0700
> > Commit: Colin Percival
> > CommitDate: Sat Aug 19 22:04:56 2023 -0700
> >
> >
> > Reverting this commit seems to resolve the issue for me:
> >
> > FreeBSD 15.0-CURRENT amd64 150 #0 main-n265137-2ad756a6bbb3
> >
> > $ git status
> > On branch main
> > Your branch is up to date with 'freebsd/main'.
> >
> > You are currently reverting commit 9a7add6d01f3.
> >   (all conflicts fixed: run "git revert --continue")
> >   (use "git revert --skip" to skip this patch)
> >   (use "git revert --abort" to cancel the revert operation)
> >
> > Changes to be committed:
> >   (use "git restore --staged ..." to unstage)
> >   modified:   sys/kern/init_main.c
> >
> > # dmesg |egrep "(amdsmn|amdtemp)"
> > amdsmn0:  on hostb0
> > amdtemp0:  on hostb0
> >
> > $ sysctl kern.conftxt |grep amdt
> > device  amdtemp
> >
> 
> Really?  I did a git log and July 17 is what pops out for this commit.
> 
> Ah, I see that git log doesn't show the commit date.
> 
> So I guess that the git bisect really did find the commit which caused
> all our problems.
> 
> If reverting it fixes things then this requires some action from Colin
> Percival.
> 
> This would also explain why my FBSD-14 kernel from August 13 was
> OK.

Probably best to file a PR: https://bugs.freebsd.org/bugzilla/

--
Herbert



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-02 Thread Gary Jennejohn
On Sat, 02 Sep 2023 15:36:36 +0200
"Herbert J. Skuhra"  wrote:

> On Fri, 01 Sep 2023 18:05:34 +0200, Gary Jennejohn wrote:
> >
> > On Fri, 1 Sep 2023 14:43:21 +
> > Gary Jennejohn  wrote:
> >
> > > A git-bisect is probably required.
> > >
> >
> > I did a bisect and the result was commit
> > 9a7add6d01f3c5f7eba811e794cf860d2bce131d.
> >
> > However, that can't be correct because this commit was made on
> > Mon Jul 17 19:29:20 2023 and my FBSD-14 kernel from August 13th
> > boots successfully :(
>
> Commit date is August 19th, 2023(!):
>
> commit 9a7add6d01f3c5f7eba811e794cf860d2bce131d
> Author: Colin Percival
> AuthorDate: Mon Jul 17 19:29:20 2023 -0700
> Commit: Colin Percival
> CommitDate: Sat Aug 19 22:04:56 2023 -0700
>
>
> Reverting this commit seems to resolve the issue for me:
>
> FreeBSD 15.0-CURRENT amd64 150 #0 main-n265137-2ad756a6bbb3
>
> $ git status
> On branch main
> Your branch is up to date with 'freebsd/main'.
>
> You are currently reverting commit 9a7add6d01f3.
>   (all conflicts fixed: run "git revert --continue")
>   (use "git revert --skip" to skip this patch)
>   (use "git revert --abort" to cancel the revert operation)
>
> Changes to be committed:
>   (use "git restore --staged ..." to unstage)
>   modified:   sys/kern/init_main.c
>
> # dmesg |egrep "(amdsmn|amdtemp)"
> amdsmn0:  on hostb0
> amdtemp0:  on hostb0
>
> $ sysctl kern.conftxt |grep amdt
> device  amdtemp
>

Really?  I did a git log and July 17 is what pops out for this commit.

Ah, I see that git log doesn't show the commit date.
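
The difference between the two dates can be reproduced on a throwaway
repository. This is only a sketch: the repository, the file name and the
"Demo" identity are made up, and only the two timestamps are taken from the
commit discussed in this thread. Plain "git log" prints the author date as
"Date:", while "--format=fuller" prints both AuthorDate and CommitDate.

```shell
#!/bin/sh
set -e

# Throwaway repository; nothing here touches a real source tree.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo demo > file.txt
git add file.txt

# Author date and committer date are recorded separately. They differ
# when a change is written at one time and committed or pushed later.
GIT_AUTHOR_DATE="2023-07-17 19:29:20 -0700" \
GIT_COMMITTER_DATE="2023-08-19 22:04:56 -0700" \
git -c user.name="Demo" -c user.email="demo@example.org" \
    commit -q -m "demo commit"

# Default output: a single "Date:" line, which is the author date.
git log -1

# Fuller output: AuthorDate and CommitDate on separate lines.
git log -1 --format=fuller

# The same two values, extracted individually.
author_date=$(git log -1 --format=%ad --date=short)
commit_date=$(git log -1 --format=%cd --date=short)
echo "author: $author_date  commit: $commit_date"
```

When judging whether a commit can be in a given kernel, the commit date is
the one to compare against the build date.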

So I guess that the git bisect really did find the commit which caused
all our problems.

If reverting it fixes things then this requires some action from Colin
Percival.

This would also explain why my FBSD-14 kernel from August 13 was
OK.

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-02 Thread Herbert J. Skuhra
On Fri, 01 Sep 2023 18:05:34 +0200, Gary Jennejohn wrote:
> 
> On Fri, 1 Sep 2023 14:43:21 +
> Gary Jennejohn  wrote:
> 
> > A git-bisect is probably required.
> >
> 
> I did a bisect and the result was commit
> 9a7add6d01f3c5f7eba811e794cf860d2bce131d.
> 
> However, that can't be correct because this commit was made on
> Mon Jul 17 19:29:20 2023 and my FBSD-14 kernel from August 13th
> boots successfully :(

Commit date is August 19th, 2023(!):

commit 9a7add6d01f3c5f7eba811e794cf860d2bce131d
Author: Colin Percival
AuthorDate: Mon Jul 17 19:29:20 2023 -0700
Commit: Colin Percival 
CommitDate: Sat Aug 19 22:04:56 2023 -0700


Reverting this commit seems to resolve the issue for me:

FreeBSD 15.0-CURRENT amd64 150 #0 main-n265137-2ad756a6bbb3

$ git status
On branch main
Your branch is up to date with 'freebsd/main'.

You are currently reverting commit 9a7add6d01f3.
  (all conflicts fixed: run "git revert --continue")
  (use "git revert --skip" to skip this patch)
  (use "git revert --abort" to cancel the revert operation)

Changes to be committed:
  (use "git restore --staged ..." to unstage)
  modified:   sys/kern/init_main.c

# dmesg |egrep "(amdsmn|amdtemp)"
amdsmn0:  on hostb0
amdtemp0:  on hostb0

$ sysctl kern.conftxt |grep amdt
device  amdtemp

--
Herbert



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-02 Thread Gary Jennejohn
On Fri, 1 Sep 2023 16:00:03 -0600
Warner Losh  wrote:

> I think that the problem is that amdsmn has probed, but not attached (or
> failed to attach for some reason), so we find the device, but it's not
> initialized yet, so when we call amdsmn_read, it tries to lock a mutex
> that's not yet initialized.
>
> Not sure why this is happening, or why loading it as modules fixes it...
>
> But since I don't have the hardware, I can't help more. Sorry.
>

Might it be worth adding an entry to /usr/src/UPDATING for users with
older Zen versions like mine (Zen 1 and Zen 2), recommending kldload'ing
of amdsmn and amdtemp?

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Warner Losh
I think that the problem is that amdsmn has probed, but not attached (or
failed to attach for some reason), so we find the device, but it's not
initialized yet, so when we call amdsmn_read, it tries to lock a mutex
that's not yet initialized.

Not sure why this is happening, or why loading it as modules fixes it...

But since I don't have the hardware, I can't help more. Sorry.

Warner

On Fri, Sep 1, 2023 at 10:21 AM Gary Jennejohn  wrote:

> On Fri, 01 Sep 2023 17:14:02 +0200
> "Herbert J. Skuhra"  wrote:
>
> > On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> > >
> > > On Fri, 01 Sep 2023 14:15:20 +0200
> > > "Herbert J. Skuhra"  wrote:
> > >
> > > > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > > > >
> > > > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> > > > > >
> > > > > > These are respectively Zen 1 and Zen 2 CPUs.
> > > > > >
> > > > > > I built a kernel on both computers using the FreeBSD-15 source tree.
> > > > > >
> > > > > > If I include the amdtemp device in my kernel file BOTH computers end up
> > > > > > with a kernel panic while trying to attach the amdtemp device.
> > > > >
> > > > > If I remove amdtemp both computers boot without any issues.
> > > > >
> > > > > I suspect that this commit is the cause:
> > > > >
> > > > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > > > Author: Akio Morita 
> > > > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > > > >
> > > > > amdsmn(4), amdtemp(4): add support for Zen 4
> > > > >
> > > > > Zen 4 support, tested on Ryzen 9 7900
> > > > >
> > > > > Reviewed by:imp (previous version), mhorne
> > > > > Approved by:mhorne
> > > > > > Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > > > Differential Revision:  https://reviews.freebsd.org/D41049
> > > >
> > > > Thanks for sharing your findings.
> > > >
> > > > Now I probably know why my old kernel from stable/13 no longer booted
> > > > > after updating to stable/14. I've created a new kernel config and
> > > > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > > > issue. After removing only "device amdtemp" from my old kernel config
> > > > it boots again.
> > > >
> > > > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > > > doesn't resolve this issue. Old kernel does not boot if "device
> > > > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > > > wrong!?
> > > >
> > >
> > > > Strange.  My FreeBSD-14 kernel boots with device amdtemp (which automatically
> > > > results in amdsmn being included).  It's FreeBSD-15 which fails for me.
> >
> > 1. 'kldload amdtemp' works:
> >    121 0x81e7c000 3160 amdtemp.ko
> >    131 0x81e8 2138 amdsmn.ko
> >
> >    amdsmn0:  on hostb0
> >    amdtemp0:  on hostb0
> >
> > 2. GENERIC boots fine. The following kernel does not:
> >
> >    include GENERIC
> >
> >    ident  TEST
> >    device amdtemp
> >
> > 3. Unfortunately this is a remote server without a serial console. I
> > don't get a crashdump and I can't find anything in /var/log/messages.
> >
> > 4. I have no good revision for stable/14 and main. On main I always
> > use GENERIC-NODEBUG. :-(
> >
>
> Thanks, Herbert!  kldload'ing amdsmn and amdtemp really does work!
>
> Now I can run FBSD-15 :)
>
> --
> Gary Jennejohn
>
>


Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
On Fri, 01 Sep 2023 17:14:02 +0200
"Herbert J. Skuhra"  wrote:

> On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> >
> > On Fri, 01 Sep 2023 14:15:20 +0200
> > "Herbert J. Skuhra"  wrote:
> >
> > > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > > >
> > > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7
> > > > 3700X.
> > > >
> > > > These are respectively Zen 1 and Zen 2 CPUs.
> > > >
> > > > I built a kernel on both computers using the FreeBSD-15 source tree.
> > > >
> > > > If I include the amdtemp device in my kernel file BOTH computers end up
> > > > with a kernel panic while trying to attach the amdtemp device.
> > > >
> > > > If I remove amdtemp both computers boot without any issues.
> > > >
> > > > I suspect that this commit is the cause:
> > > >
> > > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > > Author: Akio Morita 
> > > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > > >
> > > > amdsmn(4), amdtemp(4): add support for Zen 4
> > > >
> > > > Zen 4 support, tested on Ryzen 9 7900
> > > >
> > > > Reviewed by:imp (previous version), mhorne
> > > > Approved by:mhorne
> > > > Obtained from:  
> > > > http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > > Differential Revision:  https://reviews.freebsd.org/D41049
> > >
> > > Thanks for sharing your findings.
> > >
> > > Now I probably know why my old kernel from stable/13 no longer booted
> > > > after updating to stable/14. I've created a new kernel config and
> > > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > > issue. After removing only "device amdtemp" from my old kernel config
> > > it boots again.
> > >
> > > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > > doesn't resolve this issue. Old kernel does not boot if "device
> > > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > > wrong!?
> > >
> >
> > Strange.  My FreeBSD-14 kernel boots with device amdtemp (which 
> > automatically
> > results in amdsmn being included).  It's FreeBSD-15 which fails for me.
>
> 1. 'kldload amdtemp' works:
>    121 0x81e7c000 3160 amdtemp.ko
>    131 0x81e8 2138 amdsmn.ko
>
>    amdsmn0:  on hostb0
>    amdtemp0:  on hostb0
>
> 2. GENERIC boots fine. The following kernel does not:
>
>    include GENERIC
>
>    ident  TEST
>    device amdtemp
>
> 3. Unfortunately this is a remote server without a serial console. I
> don't get a crashdump and I can't find anything in /var/log/messages.
>
> 4. I have no good revision for stable/14 and main. On main I always
> use GENERIC-NODEBUG. :-(
>

Thanks, Herbert!  kldload'ing amdsmn and amdtemp really does work!

Now I can run FBSD-15 :)
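
For reference, the workaround amounts to leaving "device amdtemp" out of the
kernel config and loading the drivers as modules once the system is up. A
minimal sketch, assuming a root shell on FreeBSD: the module names come from
this thread, while persisting them through kld_list in rc.conf is an
assumption that was not tested here. On any other system the script only
prints what it would do.

```shell
#!/bin/sh
# Modules named in this thread. amdtemp pulls in amdsmn as a dependency,
# but listing both keeps the intent explicit.
modules="amdsmn amdtemp"

if [ "$(uname -s)" = "FreeBSD" ] && [ "$(id -u)" -eq 0 ]; then
    for m in $modules; do
        kldload -n "$m"          # -n: do nothing if already loaded
        # Assumption: have rc(8) load the module late at every boot.
        sysrc kld_list+="$m"
    done
    # Confirm that both modules are present.
    kldstat | grep -E 'amdsmn|amdtemp'
else
    echo "not FreeBSD or not root; would load: $modules"
fi
```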

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
On Fri, 1 Sep 2023 14:43:21 +
Gary Jennejohn  wrote:

> A git-bisect is probably required.
>

I did a bisect and the result was commit
9a7add6d01f3c5f7eba811e794cf860d2bce131d.

However, that can't be correct because this commit was made on
Mon Jul 17 19:29:20 2023 and my FBSD-14 kernel from August 13th
boots successfully :(

However, "Herbert J. Skuhra"  says that kldload'ing
amdtemp works.  So I'll give that a try.

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Herbert J. Skuhra
On Fri, 01 Sep 2023 16:04:41 +0200, Gary Jennejohn wrote:
> 
> On Fri, 01 Sep 2023 14:15:20 +0200
> "Herbert J. Skuhra"  wrote:
> 
> > On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> > >
> > > > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> > >
> > > These are respectively Zen 1 and Zen 2 CPUs.
> > >
> > > I built a kernel on both computers using the FreeBSD-15 source tree.
> > >
> > > If I include the amdtemp device in my kernel file BOTH computers end up
> > > with a kernel panic while trying to attach the amdtemp device.
> > >
> > > If I remove amdtemp both computers boot without any issues.
> > >
> > > I suspect that this commit is the cause:
> > >
> > > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > > Author: Akio Morita 
> > > Date:   Tue Aug 1 22:32:12 2023 +0200
> > >
> > > amdsmn(4), amdtemp(4): add support for Zen 4
> > >
> > > Zen 4 support, tested on Ryzen 9 7900
> > >
> > > Reviewed by:imp (previous version), mhorne
> > > Approved by:mhorne
> > > Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > > Differential Revision:  https://reviews.freebsd.org/D41049
> >
> > Thanks for sharing your findings.
> >
> > Now I probably know why my old kernel from stable/13 no longer booted
> > > after updating to stable/14. I've created a new kernel config and
> > forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> > issue. After removing only "device amdtemp" from my old kernel config
> > it boots again.
> >
> > Unfortunately reverting this commit (git revert -n 323a94afb623)
> > doesn't resolve this issue. Old kernel does not boot if "device
> > > amdtemp" is enabled. Probably wrong commit or I am doing something
> > wrong!?
> >
> 
> Strange.  My FreeBSD-14 kernel boots with device amdtemp (which automatically
> results in amdsmn being included).  It's FreeBSD-15 which fails for me.

1. 'kldload amdtemp' works:
   121 0x81e7c000 3160 amdtemp.ko
   131 0x81e8 2138 amdsmn.ko

   amdsmn0:  on hostb0 
   amdtemp0:  on hostb0

2. GENERIC boots fine. The following kernel does not:

   include GENERIC

   ident    TEST
   device   amdtemp

3. Unfortunately this is a remote server without a serial console. I
don't get a crashdump and I can't find anything in /var/log/messages.

4. I have no good revision for stable/14 and main. On main I always
use GENERIC-NODEBUG. :-( 

-- 
Herbert



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
On Fri, 1 Sep 2023 14:23:36 +
Gary Jennejohn  wrote:

> Now that I look at the date of my FreeBSD-14 kernel I see that it's from
> August 13, so this commit is perhaps not the cause of my FreeBSD-15
> kernel panicking at boot time, since FBSD-14 boots OK.
>
> Nonetheless, amdtemp or maybe amdsmn seems to be related.
>

Since my FBSD-14 kernel is from August 13 and FBSD-15 appeared on August
24 there were 10 to 11 days of commits in between.  That makes it much
more difficult to pinpoint the cause.

A git-bisect is probably required.
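
A bisect of this kind can be sketched on a throwaway repository. Everything
in the sketch is made up for illustration (the repository, the file named
"status", the five commits, the "Demo" identity). In the real case each step
means building and booting a kernel and then marking the result by hand with
"git bisect good" or "git bisect bad", rather than using "git bisect run".

```shell
#!/bin/sh
set -e

# Throwaway repository with five commits; the fourth introduces the "bug".
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name="Demo" -c user.email="demo@example.org" "$@"; }

for i in 1 2 3 4 5; do
    if [ "$i" -ge 4 ]; then echo BAD > status; else echo OK > status; fi
    echo "$i" > counter
    g add status counter
    g commit -q -m "commit $i"
done
culprit=$(git rev-parse HEAD~1)    # commit 4, the first one with BAD

# Known-bad revision first, then known-good revision.
git bisect start HEAD HEAD~4

# The test command exits 0 for a good revision, non-zero for a bad one.
git bisect run sh -c 'grep -q OK status'

# refs/bisect/bad now names the first bad commit.
found=$(git rev-parse refs/bisect/bad)
echo "first bad commit: $found"

git bisect reset
```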

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
On Fri, 1 Sep 2023 11:03:14 +
Gary Jennejohn  wrote:

> I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
>
> These are respectively Zen 1 and Zen 2 CPUs.
>
> I built a kernel on both computers using the FreeBSD-15 source tree.
>
> If I include the amdtemp device in my kernel file BOTH computers end up
> with a kernel panic while trying to attach the amdtemp device.
>
> If I remove amdtemp both computers boot without any issues.
>
> I suspect that this commit is the cause:
>
> commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> Author: Akio Morita 
> Date:   Tue Aug 1 22:32:12 2023 +0200
>
> amdsmn(4), amdtemp(4): add support for Zen 4
>
> Zen 4 support, tested on Ryzen 9 7900
>
> Reviewed by:imp (previous version), mhorne
> Approved by:mhorne
> Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> Differential Revision:  https://reviews.freebsd.org/D41049
>

Now that I look at the date of my FreeBSD-14 kernel I see that it's from
August 13, so this commit is perhaps not the cause of my FreeBSD-15
kernel panicking at boot time, since FBSD-14 boots OK.

Nonetheless, amdtemp or maybe amdsmn seems to be related.

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
On Fri, 01 Sep 2023 14:15:20 +0200
"Herbert J. Skuhra"  wrote:

> On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> >
> > I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> >
> > These are respectively Zen 1 and Zen 2 CPUs.
> >
> > I built a kernel on both computers using the FreeBSD-15 source tree.
> >
> > If I include the amdtemp device in my kernel file BOTH computers end up
> > with a kernel panic while trying to attach the amdtemp device.
> >
> > If I remove amdtemp both computers boot without any issues.
> >
> > I suspect that this commit is the cause:
> >
> > commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> > Author: Akio Morita 
> > Date:   Tue Aug 1 22:32:12 2023 +0200
> >
> > amdsmn(4), amdtemp(4): add support for Zen 4
> >
> > Zen 4 support, tested on Ryzen 9 7900
> >
> > Reviewed by:imp (previous version), mhorne
> > Approved by:mhorne
> > Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> > Differential Revision:  https://reviews.freebsd.org/D41049
>
> Thanks for sharing your findings.
>
> Now I probably know why my old kernel from stable/13 no longer booted
> after updating to stable/14. I've created a new kernel config and
> forgot to add "device amdtemp" & "device amdsmn" and forgot about the
> issue. After removing only "device amdtemp" from my old kernel config
> it boots again.
>
> Unfortunately reverting this commit (git revert -n 323a94afb623)
> doesn't resolve this issue. Old kernel does not boot if "device
> amdtemp" is enabled. Probably wrong commit or I am doing something
> wrong!?
>

Strange.  My FreeBSD-14 kernel boots with device amdtemp (which automatically
results in amdsmn being included).  It's FreeBSD-15 which fails for me.

--
Gary Jennejohn



Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Warner Losh
On Fri, Sep 1, 2023, 5:03 AM Gary Jennejohn  wrote:

> I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
>
> These are respectively Zen 1 and Zen 2 CPUs.
>
> I built a kernel on both computers using the FreeBSD-15 source tree.
>
> If I include the amdtemp device in my kernel file BOTH computers end up
> with a kernel panic while trying to attach the amdtemp device.
>

Traceback?

Warner

> If I remove amdtemp both computers boot without any issues.
>
> I suspect that this commit is the cause:
>
> commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> Author: Akio Morita 
> Date:   Tue Aug 1 22:32:12 2023 +0200
>
> amdsmn(4), amdtemp(4): add support for Zen 4
>
> Zen 4 support, tested on Ryzen 9 7900
>
> Reviewed by:imp (previous version), mhorne
> Approved by:mhorne
> Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> Differential Revision:  https://reviews.freebsd.org/D41049
>
> --
> Gary Jennejohn
>
>


Re: FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Herbert J. Skuhra
On Fri, 01 Sep 2023 13:03:14 +0200, Gary Jennejohn wrote:
> 
> I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.
> 
> These are respectively Zen 1 and Zen 2 CPUs.
> 
> I built a kernel on both computers using the FreeBSD-15 source tree.
> 
> If I include the amdtemp device in my kernel file BOTH computers end up
> with a kernel panic while trying to attach the amdtemp device.
> 
> If I remove amdtemp both computers boot without any issues.
> 
> I suspect that this commit is the cause:
> 
> commit 323a94afb6236bcec3a07721566aec6f2ea2b209
> Author: Akio Morita 
> Date:   Tue Aug 1 22:32:12 2023 +0200
> 
> amdsmn(4), amdtemp(4): add support for Zen 4
> 
> Zen 4 support, tested on Ryzen 9 7900
> 
> Reviewed by:imp (previous version), mhorne
> Approved by:mhorne
> Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
> Differential Revision:  https://reviews.freebsd.org/D41049

Thanks for sharing your findings.

Now I probably know why my old kernel from stable/13 no longer booted
after updating to stable/14. I've created a new kernel config and
forgot to add "device amdtemp" & "device amdsmn" and forgot about the
issue. After removing only "device amdtemp" from my old kernel config
it boots again.

Unfortunately reverting this commit (git revert -n 323a94afb623)
doesn't resolve this issue. Old kernel does not boot if "device
amdtemp" is enabled. Probably wrong commit or I am doing something
wrong!?

--
Herbert



FreeBSD-15 kernel panic when the amdtemp device is in the kernel

2023-09-01 Thread Gary Jennejohn
I have a laptop with an AMD Ryzen 5 and a tower with an AMD Ryzen 7 3700X.

These are respectively Zen 1 and Zen 2 CPUs.

I built a kernel on both computers using the FreeBSD-15 source tree.

If I include the amdtemp device in my kernel file BOTH computers end up
with a kernel panic while trying to attach the amdtemp device.

If I remove amdtemp both computers boot without any issues.

I suspect that this commit is the cause:

commit 323a94afb6236bcec3a07721566aec6f2ea2b209
Author: Akio Morita 
Date:   Tue Aug 1 22:32:12 2023 +0200

amdsmn(4), amdtemp(4): add support for Zen 4

Zen 4 support, tested on Ryzen 9 7900

Reviewed by:imp (previous version), mhorne
Approved by:mhorne
Obtained from:  http://jyurai.ddo.jp/~amorita/diary/?date=20221102#p01
Differential Revision:  https://reviews.freebsd.org/D41049

--
Gary Jennejohn



Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-22 Thread Kevin Bowling
On Sat, Jul 22, 2023 at 1:21 AM Yasuhiro Kimura  wrote:
>
> From: Kevin Bowling 
> Subject: Re: Kernel panic after updating 14-CURRENT amd64 to 
> main-n264268-ff4633d9f89
> Date: Fri, 21 Jul 2023 21:44:13 -0700
>
> > Thanks, I have reverted for now.  Can you tell me which NIC is
> > implemented there?
>
> Output of `pciconf -lv` says as following.
>
> em0@pci0:0:3:0: class=0x02 rev=0x02 hdr=0x00 vendor=0x8086 device=0x100e 
> subvendor=0x8086 subdevice=0x001e
> vendor = 'Intel Corporation'
> device = '82540EM Gigabit Ethernet Controller'
> class  = network
> subclass   = ethernet
>
> Regards.

Thanks for the report, I've identified the errors and recommitted.

> ---
> Yasuhiro Kimura



Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-22 Thread Yasuhiro Kimura
From: Kevin Bowling 
Subject: Re: Kernel panic after updating 14-CURRENT amd64 to 
main-n264268-ff4633d9f89
Date: Fri, 21 Jul 2023 21:44:13 -0700

> Thanks, I have reverted for now.  Can you tell me which NIC is
> implemented there?

Output of `pciconf -lv` says as following.

em0@pci0:0:3:0: class=0x02 rev=0x02 hdr=0x00 vendor=0x8086 device=0x100e 
subvendor=0x8086 subdevice=0x001e
vendor = 'Intel Corporation'
device = '82540EM Gigabit Ethernet Controller'
class  = network
subclass   = ethernet

Regards.

---
Yasuhiro Kimura



Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-21 Thread Kevin Bowling
Thanks, I have reverted for now.  Can you tell me which NIC is
implemented there?

On Fri, Jul 21, 2023 at 12:45 PM Yasuhiro Kimura  wrote:
>
> From: Yasuhiro Kimura 
> Subject: Kernel panic after updating 14-CURRENT amd64 to 
> main-n264268-ff4633d9f89
> Date: Sat, 22 Jul 2023 02:50:23 +0900 (JST)
>
> > After updating my 14.0-CURRENT amd64 system from
> > main-n264162-f58378393fb to main-n264268-ff4633d9f89, kernel crashes
> > with panic as following.
> >
> > https://people.freebsd.org/~yasu/FreeBSD-14-CURRENT-amd64-main-n264268-ff4633d9f89.20230721.panic.png
>
> According to the result of bisect, kernel panic starts with following
> commit.
>
> --
> commit 95f7b36e8fac45092b9a4eea5e32732e979989f0
> Author: Kevin Bowling 
> Date:   Thu Jul 20 20:30:00 2023 -0700
>
> e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes
>
> * em(4) obey administrative ifcaps for using hwcsum offload
> * em(4) obey administrative ifcaps for hw vlan receive tagging
> * em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
> * lem(4) obey administrative ifcaps for using hwcsum offload
> * lem(4) add support for hw vlan receive tagging
> * lem(4) Add ifcaps for TSO offload experimentation, but disabled by
>   default due to errata and possibly missing txrx code.
> * lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
>   full duplex links.  It may still be administratively enabled.
>
> Reviewed by:markj (previous version)
> MFC after:  2 weeks
> Differential Revision:  https://reviews.freebsd.org/D30072
> --
>
> Cc-ing to its committer.
>
> ---
> Yasuhiro Kimura



Re: Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-21 Thread Yasuhiro Kimura
From: Yasuhiro Kimura 
Subject: Kernel panic after updating 14-CURRENT amd64 to 
main-n264268-ff4633d9f89
Date: Sat, 22 Jul 2023 02:50:23 +0900 (JST)

> After updating my 14.0-CURRENT amd64 system from
> main-n264162-f58378393fb to main-n264268-ff4633d9f89, kernel crashes
> with panic as following.
> 
> https://people.freebsd.org/~yasu/FreeBSD-14-CURRENT-amd64-main-n264268-ff4633d9f89.20230721.panic.png

According to the result of bisect, kernel panic starts with following
commit.

--
commit 95f7b36e8fac45092b9a4eea5e32732e979989f0
Author: Kevin Bowling 
Date:   Thu Jul 20 20:30:00 2023 -0700

e1000: lem(4)/em(4) ifcaps, TSO and hwcsum fixes

* em(4) obey administrative ifcaps for using hwcsum offload
* em(4) obey administrative ifcaps for hw vlan receive tagging
* em(4) add additional TSO6 ifcap, but disabled by default as is TSO4
* lem(4) obey administrative ifcaps for using hwcsum offload
* lem(4) add support for hw vlan receive tagging
* lem(4) Add ifcaps for TSO offload experimentation, but disabled by
  default due to errata and possibly missing txrx code.
* lem(4) disable HWCSUM ifcaps by default on 82547 due to errata around
  full duplex links.  It may still be administratively enabled.

Reviewed by:markj (previous version)
MFC after:  2 weeks
Differential Revision:  https://reviews.freebsd.org/D30072
--

Cc-ing to its committer.

---
Yasuhiro Kimura



Kernel panic after updating 14-CURRENT amd64 to main-n264268-ff4633d9f89

2023-07-21 Thread Yasuhiro Kimura
After updating my 14.0-CURRENT amd64 system from
main-n264162-f58378393fb to main-n264268-ff4633d9f89, kernel crashes
with panic as following.

https://people.freebsd.org/~yasu/FreeBSD-14-CURRENT-amd64-main-n264268-ff4633d9f89.20230721.panic.png

---
Yasuhiro Kimura



Re: Kernel panic on jail start

2023-04-04 Thread Dmitry Chagin
On Tue, Apr 04, 2023 at 02:31:20AM +0200, Goran Mekić wrote:
> > > > >   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" 
> > > > > >/etc/rc.conf.d/network";
> > > > 
> > > > ah, I see where the problem is,
> > > > until it's fixed you can try to set compat.linux.use_real_ifnames to 1, or
> > > > s/eth0/some other ifname/
> > > 
> > > You are correct, that was the problem. Sorry for long delay, but I'm not
> > > the only user of this machine, swap is too little for core dump, I 
> > > couldn't
> > > make dumping to ZVOL work nor using USB key as a swap device. I don't
> > > > know what I'm doing wrong with the core dumps as it works like a charm
> > > on a laptop. Thank you for looking into this.
> > > 
> > 
> > Hi, could you please try 7ae0972c7b ?
> 
> Hello,
> 
> I confirm it works on my machine. Thank you very much for working on
> this with little info I could provide!
Thank you!

> 
> Regards,
> meka





Re: Kernel panic on jail start

2023-04-03 Thread Goran Mekić
> > > >   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" 
> > > > >/etc/rc.conf.d/network";
> > > 
> > > ah, I see where the problem is,
> > > until it's fixed you can try to set compat.linux.use_real_ifnames to 1, or
> > > s/eth0/some other ifname/
> > 
> > You are correct, that was the problem. Sorry for long delay, but I'm not
> > the only user of this machine, swap is too little for core dump, I couldn't
> > make dumping to ZVOL work nor using USB key as a swap device. I don't
> > know what I'm doing wrong with the core dumps as it works like a charm
> > on a laptop. Thank you for looking into this.
> > 
> 
> Hi, could you please try 7ae0972c7b ?

Hello,

I confirm it works on my machine. Thank you very much for working on
this with little info I could provide!

Regards,
meka




Re: Kernel panic on jail start

2023-04-03 Thread Dmitry Chagin
On Mon, Apr 03, 2023 at 02:08:09PM +0200, Goran Mekić wrote:
> On Fri, Mar 31, 2023 at 12:20:47PM +0300, Dmitry Chagin wrote:
> > On Thu, Mar 30, 2023 at 07:08:28PM +0200, Goran Mekić wrote:
> > >   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" 
> > > >/etc/rc.conf.d/network";
> > 
> > ah, I see where the problem is,
> > until it's fixed you can try to set compat.linux.use_real_ifnames to 1, or
> > s/eth0/some other ifname/
> 
> You are correct, that was the problem. Sorry for long delay, but I'm not
> the only user of this machine, swap is too little for core dump, I couldn't
> make dumping to ZVOL work nor using USB key as a swap device. I don't
> know what I'm doing wrong with the core dumps as it works like a charm
> on a laptop. Thank you for looking into this.
> 

Hi, could you please try 7ae0972c7b ?

> Regards,
> meka





Re: Kernel panic on jail start

2023-04-03 Thread Goran Mekić
On Fri, Mar 31, 2023 at 12:20:47PM +0300, Dmitry Chagin wrote:
> On Thu, Mar 30, 2023 at 07:08:28PM +0200, Goran Mekić wrote:
> >   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" 
> > >/etc/rc.conf.d/network";
> 
> ah, I see where the problem is,
> until it's fixed you can try to set compat.linux.use_real_ifnames to 1, or
> s/eth0/some other if name/

You are correct, that was the problem. Sorry for the long delay, but I'm not
the only user of this machine, swap is too small for a core dump, and I couldn't
make dumping to a ZVOL work, nor using a USB key as a swap device. I don't
know what I'm doing wrong with the core dumps, as it works like a charm
on a laptop. Thank you for looking into this.

Regards,
meka




Re: Kernel panic on jail start

2023-03-31 Thread Dmitry Chagin
On Thu, Mar 30, 2023 at 07:08:28PM +0200, Goran Mekić wrote:
> Hello,
> 
> I get the kernel panic when starting jail. With git bisect I found out
> the offending commit is 0b56641cfcda30d06243223f37781ccc18455bef. After
> reverting it, everything is back to normal. For completeness, this is my
> jail.conf:
> 
> network {
>   $id = 1;
>   $base = /var/jails;
>   persist;
>   vnet;
>   path = "${base}/${name}";
>   mount.devfs;
>   host.domainname = "example.com";
>   host.hostname = "${name}.${host.domainname}";
>   vnet.interface = "epair${id}b";
>   devfs_ruleset = 8;
>   allow.raw_sockets;
> 
>   mount += "/var/run/reggae ${path}/var/run/reggae nullfs ro 0 0";
> 
>   exec.prepare  = "[ ! -e ${path}/var/run/reggae ] && mkdir ${path}/var/run/reggae || true";
>   exec.prepare += "ifconfig epair${id}a && ifconfig epair${id}a destroy || true";
>   exec.prestart  = "ifconfig epair${id} create up group $(echo ${name} | cut -b 1-15) || (ifconfig epair${id}a destroy && false)";
>   exec.prestart += "ifconfig jails addm epair${id}a";
>   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" >/etc/rc.conf.d/network";

ah, I see where the problem is,
until it's fixed you can try to set compat.linux.use_real_ifnames to 1, or
s/eth0/some other if name/

>   exec.start += "/bin/sh /etc/rc";
>   exec.stop = "/bin/sh /etc/rc.shutdown";
>   exec.poststop = "ifconfig epair${id}a destroy";
>   exec.clean;
>   exec.consolelog = "/var/log/jails/${host.hostname}";
> }
> 
> The jail root is created with bsdinstall disinstall/distfetch and
> 14-CURRENT.
> 
> Regards,
> meka





Re: Kernel panic on jail start

2023-03-30 Thread Dmitry Chagin
On Thu, Mar 30, 2023 at 07:08:28PM +0200, Goran Mekić wrote:
> Hello,
> 
> I get the kernel panic when starting jail. With git bisect I found out
> the offending commit is 0b56641cfcda30d06243223f37781ccc18455bef. After
> reverting it, everything is back to normal. For completeness, this is my
> jail.conf:
> 

It would be better to see a backtrace at least, thanks.

> network {
>   $id = 1;
>   $base = /var/jails;
>   persist;
>   vnet;
>   path = "${base}/${name}";
>   mount.devfs;
>   host.domainname = "example.com";
>   host.hostname = "${name}.${host.domainname}";
>   vnet.interface = "epair${id}b";
>   devfs_ruleset = 8;
>   allow.raw_sockets;
> 
>   mount += "/var/run/reggae ${path}/var/run/reggae nullfs ro 0 0";
> 
>   exec.prepare  = "[ ! -e ${path}/var/run/reggae ] && mkdir ${path}/var/run/reggae || true";
>   exec.prepare += "ifconfig epair${id}a && ifconfig epair${id}a destroy || true";
>   exec.prestart  = "ifconfig epair${id} create up group $(echo ${name} | cut -b 1-15) || (ifconfig epair${id}a destroy && false)";
>   exec.prestart += "ifconfig jails addm epair${id}a";
>   exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" >/etc/rc.conf.d/network";
>   exec.start += "/bin/sh /etc/rc";
>   exec.stop = "/bin/sh /etc/rc.shutdown";
>   exec.poststop = "ifconfig epair${id}a destroy";
>   exec.clean;
>   exec.consolelog = "/var/log/jails/${host.hostname}";
> }
> 
> The jail root is created with bsdinstall disinstall/distfetch and
> 14-CURRENT.
> 
> Regards,
> meka





Kernel panic on jail start

2023-03-30 Thread Goran Mekić
Hello,

I get the kernel panic when starting jail. With git bisect I found out
the offending commit is 0b56641cfcda30d06243223f37781ccc18455bef. After
reverting it, everything is back to normal. For completeness, this is my
jail.conf:

network {
  $id = 1;
  $base = /var/jails;
  persist;
  vnet;
  path = "${base}/${name}";
  mount.devfs;
  host.domainname = "example.com";
  host.hostname = "${name}.${host.domainname}";
  vnet.interface = "epair${id}b";
  devfs_ruleset = 8;
  allow.raw_sockets;

  mount += "/var/run/reggae ${path}/var/run/reggae nullfs ro 0 0";

  exec.prepare  = "[ ! -e ${path}/var/run/reggae ] && mkdir ${path}/var/run/reggae || true";
  exec.prepare += "ifconfig epair${id}a && ifconfig epair${id}a destroy || true";
  exec.prestart  = "ifconfig epair${id} create up group $(echo ${name} | cut -b 1-15) || (ifconfig epair${id}a destroy && false)";
  exec.prestart += "ifconfig jails addm epair${id}a";
  exec.start  = "echo ifconfig_${vnet.interface}_name=\\"eth0\\" >/etc/rc.conf.d/network";
  exec.start += "/bin/sh /etc/rc";
  exec.stop = "/bin/sh /etc/rc.shutdown";
  exec.poststop = "ifconfig epair${id}a destroy";
  exec.clean;
  exec.consolelog = "/var/log/jails/${host.hostname}";
}

The jail root is created with bsdinstall disinstall/distfetch and
14-CURRENT.

Regards,
meka




Re: kernel panic during zfs pool import

2022-08-22 Thread Santiago Martinez

Still the same with current (from today).

Opening a PR for it.

Santi


On 8/22/22 10:43, Santiago Martinez wrote:

Thanks Toomas,
I am compiling current right now and will give it a try; hopefully I will
be able to mount and recover the machine.
I will also open a PR, as even if there is any corruption/pool damage it
should not panic the machine.


Santi


On 8/18/22 21:45, Toomas Soome wrote:



On 18. Aug 2022, at 18:46, Santiago Martinez  
wrote:


Hi everyone,

I have a server running 13.1-stable that was powered off
(gracefully) and has now been powered on again, and we have the
following problem.


The server boots almost properly: the kernel loads, and when ZFS imports
the other pools it panics with the following message.


"panic : Solaris (panic) zfs: adding existent segment to range tree
(offset=4af2ca9000 size=a)".


If I boot into single-user mode and the pools (most specifically pool01)
are not imported, then there is no panic, but as soon as we try to
import pool01 we get that panic. It is worth mentioning that the
pool was imported/online before.


I have two questions:

*    How can I tell it not to import the pool automatically during boot,
so the server is not stuck in an infinite boot/panic/reboot loop?


removing zpool.cache should do. unfortunately, you would need to boot 
alternate root (usb/cd/net) for that.




*    Anybody know what this panic means?



in short - bug. I’d try to boot current and import pool there - maybe 
the issue is fixed in current…


rgds,
toomas


Thanks in advance!

Santi





Re: kernel panic during zfs pool import

2022-08-22 Thread Santiago Martinez

Thanks Toomas,
I am compiling current right now and will give it a try; hopefully I will be
able to mount and recover the machine.
I will also open a PR, as even if there is any corruption/pool damage it
should not panic the machine.


Santi


On 8/18/22 21:45, Toomas Soome wrote:




On 18. Aug 2022, at 18:46, Santiago Martinez  wrote:

Hi everyone,

I have a server running 13.1-stable that was powered off (gracefully)
and has now been powered on again, and we have the following problem.


The server boots almost properly: the kernel loads, and when ZFS imports
the other pools it panics with the following message.


"panic : Solaris (panic) zfs: adding existent segment to range tree
(offset=4af2ca9000 size=a)".


If I boot into single-user mode and the pools (most specifically pool01)
are not imported, then there is no panic, but as soon as we try to
import pool01 we get that panic. It is worth mentioning that the
pool was imported/online before.


I have two questions:

*    How can I tell it not to import the pool automatically during boot,
so the server is not stuck in an infinite boot/panic/reboot loop?


removing zpool.cache should do. unfortunately, you would need to boot 
alternate root (usb/cd/net) for that.




*    Anybody know what this panic means?



in short - bug. I’d try to boot current and import pool there - maybe 
the issue is fixed in current…


rgds,
toomas


Thanks in advance!

Santi





Re: kernel panic during zfs pool import

2022-08-18 Thread Toomas Soome


> On 18. Aug 2022, at 18:46, Santiago Martinez  wrote:
> 
> Hi everyone,
> 
> I have a server running 13.1-stable that was powered off (gracefully) and has
> now been powered on again, and we have the following problem.
>
> The server boots almost properly: the kernel loads, and when ZFS imports the
> other pools it panics with the following message.
>
> "panic : Solaris (panic) zfs: adding existent segment to range tree
> (offset=4af2ca9000 size=a)".
>
> If I boot into single-user mode and the pools (most specifically pool01) are
> not imported, then there is no panic, but as soon as we try to import pool01
> we get that panic. It is worth mentioning that the pool was imported/online
> before.
>
> I have two questions:
>
> * How can I tell it not to import the pool automatically during boot, so the
> server is not stuck in an infinite boot/panic/reboot loop?

removing zpool.cache should do. unfortunately, you would need to boot alternate 
root (usb/cd/net) for that.

> 
> *Anybody know what this panic means?
> 

in short - bug. I’d try to boot current and import pool there - maybe the issue 
is fixed in current…

rgds,
toomas

> Thanks in advance!
> 
> Santi
> 
> 
> 



kernel panic during zfs pool import

2022-08-18 Thread Santiago Martinez

Hi everyone,

I have a server running 13.1-stable that was powered off (gracefully)
and has now been powered on again, and we have the following problem.


The server boots almost properly: the kernel loads, and when ZFS imports
the other pools it panics with the following message.


"panic : Solaris (panic) zfs: adding existent segment to range tree
(offset=4af2ca9000 size=a)".


If I boot into single-user mode and the pools (most specifically pool01)
are not imported, then there is no panic, but as soon as we try to import
pool01 we get that panic. It is worth mentioning that the pool was
imported/online before.


I have two questions:

*    How can I tell it not to import the pool automatically during boot, so
the server is not stuck in an infinite boot/panic/reboot loop?


*    Anybody know what this panic means?

Thanks in advance!

Santi





Re: Kernel panic on armv7 when PF is enabled

2022-05-03 Thread Marek Zarychta

On 2.05.2022 at 11:02, Kristof Provost wrote:

On 1 May 2022, at 5:13, qroxana wrote:

After git bisecting, the panic started with this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
Author: Kristof Provost <k...@freebsd.org>
Date: Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after: 1 week

sys/net/if_ethersubr.c | 9 +
sys/net/if_vlan.c | 5 +++--
2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from an NFS root;
it can boot without any problem if PF is disabled.

Any help?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @
/usr/src/sys/netinet/tcp_usrreq.c:1008
stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10


The commit you point at is entirely unrelated to the code where the 
panic occurred, so I’m pretty sure something went wrong in your bisect.


I was also experiencing this panic running FreeBSD 14.0-CURRENT #1
main-n253028-2ec9a427c85: Tue Feb  8 17:49:25 CET 2022 on armv7, so it's
unrelated to the aforementioned commit, which is dated 2022-02-14.


It's a very interesting and weird bug: loading pf.ko, enabling PF and loading
the rules all work as expected, but PF processing the filtered traffic
triggers the panic.




The backtrace would suggest the issue occurs in the 
pf_syncookie_validate() function, and likely in the line `if
(atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
== 0)`


The obvious way for that to panic would be to call it without the 
curvnet context set, but pf_test() uses it earlier, so that’s going to 
be fine.


Given that this is unique to armv7 I’d recommend talking to the armv7 
maintainer about 64 bit atomic operations.


You can probably avoid the atomic load with this patch (and not enabling 
syncookie support):


diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
    ack = ntohl(pd->hdr.tcp.th_ack) - 1;
    cookie.cookie = (ack & 0xff) ^ (ack >> 24);

+   if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+       return (0);
+
    /* we don't know oddeven before setting the cookie (union) */
    if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
        == 0)


That shouldn’t be required though.

Br,
Kristof




--
Marek Zarychta




Re: Kernel panic on armv7 when PF is enabled

2022-05-03 Thread qroxana


On Monday, May 2nd, 2022 at 9:02 AM, Kristof Provost  wrote:


> On 1 May 2022, at 5:13, qroxana wrote:
>
> > After git bisecting, the panic started with this commit.
> >
> > commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> > Author: Kristof Provost <k...@freebsd.org>
> > Date: Mon Feb 14 20:09:54 2022 +0100
> >
> > vlan: allow net.link.vlan.mtag_pcp to be set per vnet
> >
> > The primary reason for this change is to facilitate testing.
> >
> > MFC after: 1 week
> >
> > sys/net/if_ethersubr.c | 9 +
> > sys/net/if_vlan.c | 5 +++--
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > The armv7 board boots from an NFS root;
> > it can boot without any problem if PF is disabled.
> >
> > Any help?
> >
> > add host ::1: gateway lo0 fib 0: route already in table
> > add net fe80::: gateway ::1
> > add net ff02::: gateway ::1
> > add net :::0.0.0.0: gateway ::1
> > add net ::0.0.0.0: gateway ::1
> > Enabling pf.
> > Kernel page fault with the following non-sleepable locks held:
> > shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
> > /usr/src/sys/netpfil/pf/pf.c:6493
> > exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
> > /usr/src/sys/netinet/tcp_usrreq.c:1008
> > stack backtrace:
> > #0 0xc0355cac at witness_debugger+0x7c
> > #1 0xc0356ef0 at witness_warn+0x3fc
> > #2 0xc05ec048 at abort_handler+0x1d8
> > #3 0xc05cb5ac at exception_exit+0
> > #4 0xe3083c10 at pf_syncookie_validate+0x60
> > #5 0xe30496a8 at pf_test+0x518
> > #6 0xe306d768 at pf_check_out+0x30
> > #7 0xc0415b44 at pfil_run_hooks+0xbc
> > #8 0xc0445cfc at ip_output+0xce8
> > #9 0xc045bc9c at tcp_default_output+0x20ac
> > #10 0xc0471eb4 at tcp_usr_send+0x1ac
> > #11 0xc0389464 at sosend_generic+0x490
> > #12 0xc0389790 at sosend+0x64
> > #13 0xc0502888 at clnt_vc_call+0x560
> > #14 0xc05009d8 at clnt_reconnect_call+0x170
> > #15 0xc01e7b14 at newnfs_request+0xb20
> > #16 0xc0230218 at nfscl_request+0x60
> > #17 0xc020d9bc at nfsrpc_getattr+0xb0
> > Fatal kernel mode data abort: 'Alignment Fault' on read
> > trapframe: 0xdf1f1c90
> > FSR=0001, FAR=d7840264, spsr=4013
> > r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
> > r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
> > r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
> > r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
>
> The commit you point at is entirely unrelated to the code where the panic 
> occurred, so I’m pretty sure something went wrong in your bisect.
>
> The backtrace would suggest the issue occurs in the pf_syncookie_validate() 
> function, and likely in the line `if 
> (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`
>
> The obvious way for that to panic would be to call it without the curvnet 
> context set, but pf_test() uses it earlier, so that’s going to be fine.
>
> Given that this is unique to armv7 I’d recommend talking to the armv7 
> maintainer about 64 bit atomic operations.
>
> You can probably avoid the atomic load with this patch (and not enabling 
> syncookie support):
>
> diff --git a/sys/netpfil/pf/pf_syncookies.c 
> b/sys/netpfil/pf/pf_syncookies.c
> index 5230502be30c..c86d469d3cef 100644
> --- a/sys/netpfil/pf/pf_syncookies.c
> +++ b/sys/netpfil/pf/pf_syncookies.c
> @@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
> ack = ntohl(pd->hdr.tcp.th_ack) - 1;
> cookie.cookie = (ack & 0xff) ^ (ack >> 24);
>
> +   if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
> +   return (0);
> +
> /* we don't know oddeven before setting the cookie (union) */
>     if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
>         == 0)
>
>
> That shouldn’t be required though.
>
> Br,
> Kristof

Thank you, sir. You were right.
I tested the patch with the latest kernel.
It boots successfully with the patch,
and still panics without it.







Re: Kernel panic on armv7 when PF is enabled

2022-05-02 Thread Kristof Provost

On 1 May 2022, at 5:13, qroxana wrote:

After git bisecting, the panic started with this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
Author: Kristof Provost <k...@freebsd.org>
Date:   Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after:  1 week

sys/net/if_ethersubr.c | 9 +
sys/net/if_vlan.c  | 5 +++--
2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from an NFS root;
it can boot without any problem if PF is disabled.

Any help?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
/usr/src/sys/netinet/tcp_usrreq.c:1008

stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10


The commit you point at is entirely unrelated to the code where the 
panic occurred, so I’m pretty sure something went wrong in your 
bisect.


The backtrace would suggest the issue occurs in the  
pf_syncookie_validate() function, and likely in the line `if 
(atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
== 0)`


The obvious way for that to panic would be to call it without the 
curvnet context set, but pf_test() uses it earlier, so that’s going to 
be fine.


Given that this is unique to armv7 I’d recommend talking to the armv7 
maintainer about 64 bit atomic operations.


You can probably avoid the atomic load with this patch (and not enabling 
syncookie support):


diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
    ack = ntohl(pd->hdr.tcp.th_ack) - 1;
    cookie.cookie = (ack & 0xff) ^ (ack >> 24);

+   if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+       return (0);
+
    /* we don't know oddeven before setting the cookie (union) */
    if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
        == 0)

That shouldn’t be required though.

Br,
Kristof
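
A note on the fault address: the trapframe in the report gives FAR=d7840264 for the 'Alignment Fault' on read. As a minimal sketch in Python, the address can be checked against 4- and 8-byte alignment. The helper name is_aligned is made up for illustration, and the idea that the 64-bit atomic load wants an 8-byte aligned address on armv7 is an assumption here, not something confirmed in the thread.

```python
# Fault address taken from the trapframe line "FAR=d7840264" in the report.
FAR = 0xD7840264


def is_aligned(address: int, size: int) -> bool:
    """Return True if address is a multiple of size (size must be a power of two)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return address & (size - 1) == 0


if __name__ == "__main__":
    # Assumption: a 64-bit atomic load needs 8-byte alignment on this platform.
    for size in (4, 8):
        print(f"{FAR:#010x} aligned to {size} bytes: {is_aligned(FAR, size)}")
```

If that assumption holds, an address that is 4-byte aligned but not 8-byte aligned would fit the suggestion to look at 64-bit atomic operations on armv7.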


Re: Kernel panic on armv7 when PF is enabled

2022-05-02 Thread qroxana
On Sun, 01 May 2022 03:13:43 +, qroxana  wrote:

> After git bisecting, the panic started with this commit.
>
> commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> Author: Kristof Provost 
> Date:   Mon Feb 14 20:09:54 2022 +0100
>
> vlan: allow net.link.vlan.mtag_pcp to be set per vnet
>
> The primary reason for this change is to facilitate testing.
>
> MFC after:  1 week
>
> sys/net/if_ethersubr.c | 9 +
> sys/net/if_vlan.c  | 5 +++--
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> The armv7 board boots from a NFS root,
>
> it can boot without any problem if PF is disabled.

It appears this only occurs when the rootfs is on NFS;
I also tried booting from a micro SD card, and there was no kernel panic.

Another workaround to avoid the panic is to delay
starting /etc/rc.d/pf until after SERVERS:

--- pf.orig 2022-03-12 12:26:47.0 +
+++ pf  2022-05-02 02:59:28.131026862 +
@@ -4,7 +4,7 @@
 #

 # PROVIDE: pf
-# REQUIRE: FILESYSTEMS netif pflog pfsync routing
+# REQUIRE: SERVERS netif pflog pfsync routing
 # KEYWORD: nojailvnet

 . /etc/rc.subr

Thanks,

Kernel panic on armv7 when PF is enabled

2022-04-30 Thread qroxana
After git bisecting, the panic started with this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
Author: Kristof Provost <k...@freebsd.org>
Date:   Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after:  1 week

sys/net/if_ethersubr.c | 9 +
sys/net/if_vlan.c  | 5 +++--
2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from an NFS root;
it can boot without any problem if PF is disabled.

Any help?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
/usr/src/sys/netinet/tcp_usrreq.c:1008
stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10

panic: Fatal abort
cpuid = 1
time = 1651366089
KDB: stack backtrace:
db_trace_self() at db_trace_self
 pc = 0xc05c8c00  lr = 0xc007ac8c (db_trace_self_wrapper+0x30)
 sp = 0xdf1f1a68  fp = 0xdf1f1b80
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
 pc = 0xc007ac8c  lr = 0xc02e289c (vpanic+0x170)
 sp = 0xdf1f1b88  fp = 0xdf1f1ba8
 r4 = 0x0100  r5 = 0x
 r6 = 0xc0780529  r7 = 0xc090ea10
vpanic() at vpanic+0x170
 pc = 0xc02e289c  lr = 0xc02e264c (doadump)
 sp = 0xdf1f1bb0  fp = 0xdf1f1bb4
 r4 = 0xdf1f1c90  r5 = 0x0013
 r6 = 0xd7840264  r7 = 0x0001
 r8 = 0x0001  r9 = 0xdb5527c0
r10 = 0xd7840264
doadump() at doadump
 pc = 0xc02e264c  lr = 0xc05ec698 (abort_align)
 sp = 0xdf1f1bbc  fp = 0xdf1f1be8
 r4 = 0xd7840264  r5 = 0xdf1f1bb4
 r6 = 0xc02e264c r10 = 0xdf1f1bbc
abort_align() at abort_align
 pc = 0xc05ec698  lr = 0xc05ec198 (abort_handler+0x328)
 sp = 0xdf1f1bf0  fp = 0xdf1f1c88
 r4 = 0x0013  r5 = 0xd7840264
abort_handler() at abort_handler+0x328
 pc = 0xc05ec198  lr = 0xc05cb5ac (exception_exit)
 sp = 0xdf1f1c90  fp = 0xdf1f1d38
 r4 = 0xdf1f1e00  r5 = 0xdac0d75f
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0xc093e5e4  r9 = 0x0001
r10 = 0xdf1f1f5c
exception_exit() at exception_exit
 pc = 0xc05cb5ac  lr = 0xe3083bdc (pf_syncookie_validate+0x2c)
 sp = 0xdf1f1d20  fp = 0xdf1f1d38
 r0 = 0x6a228eda  r1 = 0xdac0d785
 r2 = 0xd7840264  r3 = 0xdb5527c0
 r4 = 0xdf1f1e00  r5 = 0xdac0d75f
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0xc093e5e4  r9 = 0x0001
r10 = 0xdf1f1f5c r12 = 0xe3098dd0
pf_syncookie_validate() at pf_syncookie_validate+0x60
 pc = 0xe3083c10  lr = 0xe30496a8 (pf_test+0x518)
 sp = 0xdf1f1d40  fp = 0xdf1f1ea8
 r4 = 0x0002  r5 = 0xdb4a6100
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0x0002  r9 = 0x0001
pf_test() at pf_test+0x518
 pc = 0xe30496a8  lr = 0xe306d768 (pf_check_out+0x30)
 sp = 0xdf1f1eb0  fp = 0xdf1f1ec0
 r4 = 0xdf1f1f5c  r5 = 0xe306d738
 r6 = 0xdb6ba660  r7 = 0x
 r8 = 0xd9422c00  r9 = 0xdb748d80
r10 = 0xfff7
pf_check_out() at pf_check_out+0x30
 pc = 0xe306d768  lr = 0xc0415b44 (pfil_run_hooks+0xbc)
 sp = 0xdf1f1ec8  fp = 0xdf1f1ef0
 r4 = 0x0002  r5 = 0xe306d738
pfil_run_hooks() at pfil_run_hooks+0xbc
 pc = 0xc0415b44  lr = 0xc0445cfc (ip_output+0xce8)
 sp = 0xdf1f1ef8  fp = 0xdf1f1fa8
 r4 = 0x010a  r5 = 0x0a0a
 r6 = 0xdb4a6158  r7 = 0xc0946908
 r8 = 0xdb5bec00  r9 = 0xd9422c00
r10 = 0x05dc
ip_output() at ip_output+0xce8
 pc = 0xc0445cfc  lr = 0xc045bc9c (tcp_default_output+0x20ac)
 sp = 0xdf1f1fb0  

Re: Kernel panic for if_epair

2022-02-16 Thread Kristof Provost
On 16 Feb 2022, at 11:31, qroxana wrote:
> It's running 14.0-CURRENT armv7 main-n252983-d21e71efce39
>
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex epairidx (epairidx) r = 0 (0xe2fe9160) locked @ 
> /usr/src/sys/net/if_epair.c:165
> stack backtrace:
> #0 0xc03558f8 at witness_debugger+0x7c
> #1 0xc0356b3c at witness_warn+0x3fc
> #2 0xc05eb3c8 at abort_handler+0x1d8
> #3 0xc05ca8e0 at exception_exit+0
> #4 0xc0475928 at udp_input+0x1c0
> #5 0xc0441884 at ip_input+0xa18
> #6 0xc041426c at netisr_dispatch_src+0x100
> #7 0xc040b9a0 at ether_demux+0x1c8
> #8 0xc040d22c at ether_nh_input+0x514
> #9 0xc041426c at netisr_dispatch_src+0x100
> #10 0xc040be94 at ether_input+0x8c
> #11 0xe2fd8130 at $a.8+0x128
> #12 0xc02a1ee0 at ithread_loop+0x268
> #13 0xc029e088 at fork_exit+0xa0
> #14 0xc05ca870 at swi_exit+0
> Fatal kernel mode data abort: 'Alignment Fault' on read
> trapframe: 0xe2a0baf0
> FSR=0001, FAR=e3f02a56, spsr=2013
> r0 =, r1 =0001, r2 =0001, r3 =0a0a
> r4 =, r5 =e3f02a6a, r6 =e3f02a56, r7 =0044
> r8 =0044, r9 =c0af955c, r10=0014, r11=e2a0bc10
> r12=, ssp=e2a0bb80, slr=c0441884, pc =c0475928
>
> panic: Fatal abort

That backtrace suggests an alignment fault in udp_input(), not an issue with 
if_epair.
There’s not even any mention of if_epair in that backtrace, but I suppose it’s 
remotely possible that it’s in epair_intr(), calling epair_sintr() in #11. That 
would explain why the epair lock is held, at least.

Note that the epair code has been substantially reworked recently so if you 
retry with a recent (post 24f0bfbad57b9c3cb9b543a60b2ba00e4812c286) build you 
won’t see the epair lock mentioned (assuming you can reproduce the panic), but 
again, it doesn’t look to be involved here anyway.

Kristof
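
The alignment fault can be looked at with a little arithmetic. This is a minimal sketch in Python using only addresses from the panic output: FAR=e3f02a56 from the trapframe and r6 = 0xe3f02a48 from the ether_nh_input frame. Treating r6 as the start of the received Ethernet frame is an assumption, and natural_alignment is a name chosen for illustration; 14 bytes is the length of an untagged Ethernet header.

```python
ETHER_HDR_LEN = 14  # length in bytes of an untagged Ethernet header

FAR = 0xE3F02A56          # fault address from the trapframe
FRAME_START = 0xE3F02A48  # r6 in the ether_nh_input frame (assumed frame start)


def natural_alignment(address: int) -> int:
    """Return the largest power of two that divides a positive address."""
    if address <= 0:
        raise ValueError("address must be positive")
    # The lowest set bit of the address is its natural alignment.
    return address & -address


if __name__ == "__main__":
    print(f"fault address alignment: {natural_alignment(FAR)} bytes")
    print(f"frame start alignment:   {natural_alignment(FRAME_START)} bytes")
    print(f"frame start + header == fault address: "
          f"{FRAME_START + ETHER_HDR_LEN == FAR}")
```

The fault address is only 2-byte aligned, and it sits exactly 14 bytes past the assumed frame start. That would be consistent with a header that follows the Ethernet header being read by an instruction that needs stricter alignment, but this reading is a guess from the register dump, not something established in the thread.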



Kernel panic for if_epair

2022-02-16 Thread qroxana
It's running 14.0-CURRENT armv7 main-n252983-d21e71efce39

Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex epairidx (epairidx) r = 0 (0xe2fe9160) locked @ 
/usr/src/sys/net/if_epair.c:165
stack backtrace:
#0 0xc03558f8 at witness_debugger+0x7c
#1 0xc0356b3c at witness_warn+0x3fc
#2 0xc05eb3c8 at abort_handler+0x1d8
#3 0xc05ca8e0 at exception_exit+0
#4 0xc0475928 at udp_input+0x1c0
#5 0xc0441884 at ip_input+0xa18
#6 0xc041426c at netisr_dispatch_src+0x100
#7 0xc040b9a0 at ether_demux+0x1c8
#8 0xc040d22c at ether_nh_input+0x514
#9 0xc041426c at netisr_dispatch_src+0x100
#10 0xc040be94 at ether_input+0x8c
#11 0xe2fd8130 at $a.8+0x128
#12 0xc02a1ee0 at ithread_loop+0x268
#13 0xc029e088 at fork_exit+0xa0
#14 0xc05ca870 at swi_exit+0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xe2a0baf0
FSR=0001, FAR=e3f02a56, spsr=2013
r0 =, r1 =0001, r2 =0001, r3 =0a0a
r4 =, r5 =e3f02a6a, r6 =e3f02a56, r7 =0044
r8 =0044, r9 =c0af955c, r10=0014, r11=e2a0bc10
r12=, ssp=e2a0bb80, slr=c0441884, pc =c0475928

panic: Fatal abort
cpuid = 0
time = 1645004889
KDB: stack backtrace:
db_trace_self() at db_trace_self
 pc = 0xc05c7f34  lr = 0xc007ac48 (db_trace_self_wrapper+0x30)
 sp = 0xe2a0b8c8  fp = 0xe2a0b9e0
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
 pc = 0xc007ac48  lr = 0xc02e259c (vpanic+0x170)
 sp = 0xe2a0b9e8  fp = 0xe2a0ba08
 r4 = 0x0100  r5 = 0x
 r6 = 0xc077f670  r7 = 0xc090d910
vpanic() at vpanic+0x170
 pc = 0xc02e259c  lr = 0xc02e234c (doadump)
 sp = 0xe2a0ba10  fp = 0xe2a0ba14
 r4 = 0xe2a0baf0  r5 = 0x0013
 r6 = 0xe3f02a56  r7 = 0x0001
 r8 = 0x0001  r9 = 0xe294e000
r10 = 0xe3f02a56
doadump() at doadump
 pc = 0xc02e234c  lr = 0xc05eba18 (abort_align)
 sp = 0xe2a0ba1c  fp = 0xe2a0ba48
 r4 = 0xe3f02a56  r5 = 0xe2a0ba14
 r6 = 0xc02e234c r10 = 0xe2a0ba1c
abort_align() at abort_align
 pc = 0xc05eba18  lr = 0xc05eb518 (abort_handler+0x328)
 sp = 0xe2a0ba50  fp = 0xe2a0bae8
 r4 = 0x0013  r5 = 0xe3f02a56
abort_handler() at abort_handler+0x328
 pc = 0xc05eb518  lr = 0xc05ca8e0 (exception_exit)
 sp = 0xe2a0baf0  fp = 0xe2a0bc10
 r4 = 0x  r5 = 0xe3f02a6a
 r6 = 0xe3f02a56  r7 = 0x0044
 r8 = 0x0044  r9 = 0xc0af955c
r10 = 0x0014
exception_exit() at exception_exit
 pc = 0xc05ca8e0  lr = 0xc0441884 (ip_input+0xa18)
 sp = 0xe2a0bb80  fp = 0xe2a0bc10
 r0 = 0x  r1 = 0x0001
 r2 = 0x0001  r3 = 0x0a0a
 r4 = 0x  r5 = 0xe3f02a6a
 r6 = 0xe3f02a56  r7 = 0x0044
 r8 = 0x0044  r9 = 0xc0af955c
r10 = 0x0014 r12 = 0x
udp_input() at udp_input+0x1c0
 pc = 0xc0475928  lr = 0xc0441884 (ip_input+0xa18)
 sp = 0xe2a0bc18  fp = 0xe2a0bc50
 r4 = 0x00022e75  r5 = 0x
 r6 = 0x0014  r7 = 0x00fb
 r8 = 0xc090d910  r9 = 0xc09457fc
r10 = 0xe3f02a56
ip_input() at ip_input+0xa18
 pc = 0xc0441884  lr = 0xc041426c (netisr_dispatch_src+0x100)
 sp = 0xe2a0bc58  fp = 0xe2a0bc80
 r4 = 0x0003a73b  r5 = 0xe3f02a00
 r6 = 0x  r7 = 0xc0b5b4b4
 r8 = 0xe4417ac0  r9 = 0x5e4a6f28
r10 = 0x0008
netisr_dispatch_src() at netisr_dispatch_src+0x100
 pc = 0xc041426c  lr = 0xc040b9a0 (ether_demux+0x1c8)
 sp = 0xe2a0bc88  fp = 0xe2a0bca0
 r4 = 0xe3244400  r5 = 0xe3f02a00
 r6 = 0x0800  r7 = 0xe3244400
 r8 = 0xe4417ac0  r9 = 0x5e4a6f28
r10 = 0x0008
ether_demux() at ether_demux+0x1c8
 pc = 0xc040b9a0  lr = 0xc040d22c (ether_nh_input+0x514)
 sp = 0xe2a0bca8  fp = 0xe2a0bd10
 r4 = 0xe3244400  r5 = 0xe3f02a00
 r6 = 0xe3f02a48  r7 = 0x
ether_nh_input() at ether_nh_input+0x514
 pc = 0xc040d22c  lr = 0xc041426c (netisr_dispatch_src+0x100)
 sp = 0xe2a0bd18  fp = 0xe2a0bd40
 r4 = 0x00050e88  r5 = 0xe3f02a00
 r6 = 0x  r7 = 0xc0b5b534
 r8 = 0x5e4a6f28  r9 = 0x0020
r10 = 0x
netisr_dispatch_src() at netisr_dispatch_src+0x100
 pc = 0xc041426c  lr = 0xc040be94 (ether_input+0x8c)
 sp = 0xe2a0bd48  fp = 0xe2a0bd80
 r4 = 0xe3244400  r5 = 0x
 r6 = 0xe3f02a00  r7 = 0x
 r8 = 0x5e4a6f28  r9 = 0x0020
r10 = 0x
ether_input() at ether_input+0x8c
 pc = 0xc040be94  lr = 0xe2fd8130 ($a.8+0x128)
 sp = 0xe2a0bd88  fp = 0xe2a0bdb8
 r4 = 0xc57f5c8c  r5 = 0x
 r6 = 0xe3244400  r7 = 0xc57f5c80
 r8 = 0xe2fe9170  r9 = 0xc0938328
r10 = 0xe2a0bd88
$a.8() at $a.8+0x128
 pc = 0xe2fd8130  lr = 0xc02a1ee0 (ithread_loop+0x268)
 sp = 
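
Note on the trap report above: the alignment fault can be checked by hand from the printed registers. The sketch below only does arithmetic on values copied from the report (FAR, r5, r6). Reading r6 as the start of the IP header and r5 as the start of the UDP header is an assumption, based on their difference matching the 0x14 shown in r10; it is not something the trace states.

```python
# Values copied from the trap report (FAR, r5, r6).
FAR = 0xE3F02A56  # fault address register
R5 = 0xE3F02A6A
R6 = 0xE3F02A56   # same value as FAR


def misalignment(addr, width=4):
    """Return how many bytes addr sits past the previous width-byte boundary."""
    return addr % width


# A non-zero result means a 32-bit load from this address is unaligned.
print(hex(FAR), "mod 4 =", misalignment(FAR))

# The two pointers differ by 20 bytes, the length of an IPv4 header
# without options (assumed interpretation, see the note above).
print("r5 - r6 =", R5 - R6)
```

The address is 2-byte aligned but not 4-byte aligned, which is what one would expect if a 14-byte Ethernet header precedes the IP header in a buffer that itself starts on a word boundary.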

Re: Kernel Panic main-n252492-ad15eeeaba3 - Solved

2022-01-18 Thread Thomas Laus

On 1/17/22 16:33, Thomas Laus wrote:

I just updated today to main-n252492-ad15eeeaba3 from:

main-n252313-ae13829ddce on 3 PC's.

I tried bisecting the 87 changes between 
main-n252508-aac52f94ea5 and main-n252492-ad15eeeaba3, starting at the 
halfway point and working forward.  After 25 kernel builds, I was 
still able to boot on this laptop.  I gave up and did a 'git pull' to 
the latest, and everything came up OK.  I never isolated my problem, but 
all is good today.


Tom


--
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF
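
Note on the bisect described above: stepping forward from the halfway point one build at a time is linear in the number of commits, whereas halving the range each time needs far fewer builds. The sketch below is a generic binary search, not the git-bisect implementation; the function name, the commit numbering and the is_bad callback are all hypothetical, and it assumes that once a commit is bad every later one is bad too.

```python
def first_bad(n, is_bad):
    """Find the first bad commit among indices 0..n-1.

    Assumes commit n-1 is known bad and that badness is monotonic.
    Returns (index of first bad commit, number of builds tested).
    """
    lo, hi = 0, n - 1  # invariant: the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build and boot per probe
        if is_bad(mid):
            hi = mid       # first bad commit is mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return lo, tests


# Hypothetical example: 87 commits, regression introduced at index 40.
index, builds = first_bad(87, lambda i: i >= 40)
print(index, builds)
```

For 87 commits this never needs more than 7 builds. It cannot help when the failure is not reproducible on the final commit, which seems to have been the case here.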



Kernel Panic main-n252492-ad15eeeaba3

2022-01-17 Thread Thomas Laus

I just updated today to main-n252492-ad15eeeaba3 from:

main-n252313-ae13829ddce on 3 PC's.

The update went well on two of them, but my Dell Inspiron 1545 panicked 
after booting.  This laptop doesn't have a scroll buffer and the entire 
panic message scrolls off the screen, so I can't take a photograph of it. 
Comparing the logged messages, I see this difference:


ERROR :intel_cpu_fifo_underrun_irq_handler] CPU pipe B FIFO underrun

This appears in /var/log/messages for the laptop startup that panicked.  None 
of the other, successful startups on this PC contain this error 
message.  I tried both GENERIC and GENERIC-DEBUG kernels with the same 
results.  The other computers running main-n252492-ad15eeeaba3 don't 
have this message and all of them boot successfully after the update. 
It looks like my project tomorrow will be an exercise in revert and 
bisect.


Tom

--
Public Keys:
PGP KeyID = 0x5F22FDC1
GnuPG KeyID = 0x620836CF



Re: Kernel panic in networking code

2021-12-21 Thread Dustin Marquess
On Thu, Dec 9, 2021 at 12:35 PM Shawn Webb  wrote:
>
> On Thu, Dec 09, 2021 at 12:05:30PM -0500, Mark Johnston wrote:
> > On Thu, Dec 09, 2021 at 10:20:10AM -0500, Shawn Webb wrote:
> > > Hey all,
> > >
> > > It looks like there's a potential deadlock in some networking code,
> > > specifically with ipv4 jails. I can reproduce by running Poudriere on
> > > 14-CURRENT.
> > >
> > > I am using HardenedBSD 14-CURRENT, but we don't have any changes to
> > > any point in the code paths that would trigger/cause this kind of
> > > kernel panic.
> > >
> > > I've uploaded the crash.txt file here:
> > > https://hardenedbsd.org/~shawn/2021-12-09_crash-01.txt
> >
> > There is some WIP to address this in https://reviews.freebsd.org/D9
> > and its followup revision.
>
> Awesome. Thanks for the response! I'll follow along. I'm happy to test
> out the patch before it lands if needed/wanted.

I've been running glebius's revised D9 patch from Friday on my
HardenedBSD -CURRENT box since he posted it, and I haven't had any
jail-related issues since. Granted, I'm not running poudriere builds
either, but I guess I could kick one off...


-Dustin



Re: Kernel panic in networking code

2021-12-09 Thread Shawn Webb
On Thu, Dec 09, 2021 at 12:05:30PM -0500, Mark Johnston wrote:
> On Thu, Dec 09, 2021 at 10:20:10AM -0500, Shawn Webb wrote:
> > Hey all,
> > 
> > It looks like there's a potential deadlock in some networking code,
> > specifically with ipv4 jails. I can reproduce by running Poudriere on
> > 14-CURRENT.
> > 
> > I am using HardenedBSD 14-CURRENT, but we don't have any changes to
> > any point in the code paths that would trigger/cause this kind of
> > kernel panic.
> > 
> > I've uploaded the crash.txt file here:
> > https://hardenedbsd.org/~shawn/2021-12-09_crash-01.txt
> 
> There is some WIP to address this in https://reviews.freebsd.org/D9
> and its followup revision.

Awesome. Thanks for the response! I'll follow along. I'm happy to test
out the patch before it lands if needed/wanted.

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc




Re: Kernel panic in networking code

2021-12-09 Thread Mark Johnston
On Thu, Dec 09, 2021 at 10:20:10AM -0500, Shawn Webb wrote:
> Hey all,
> 
> It looks like there's a potential deadlock in some networking code,
> specifically with ipv4 jails. I can reproduce by running Poudriere on
> 14-CURRENT.
> 
> I am using HardenedBSD 14-CURRENT, but we don't have any changes to
> any point in the code paths that would trigger/cause this kind of
> kernel panic.
> 
> I've uploaded the crash.txt file here:
> https://hardenedbsd.org/~shawn/2021-12-09_crash-01.txt

There is some WIP to address this in https://reviews.freebsd.org/D9
and its followup revision.



Kernel panic in networking code

2021-12-09 Thread Shawn Webb
Hey all,

It looks like there's a potential deadlock in some networking code,
specifically with ipv4 jails. I can reproduce by running Poudriere on
14-CURRENT.

I am using HardenedBSD 14-CURRENT, but we don't have any changes to
any point in the code paths that would trigger/cause this kind of
kernel panic.

I've uploaded the crash.txt file here:
https://hardenedbsd.org/~shawn/2021-12-09_crash-01.txt

`uname -a`: FreeBSD ci-08 14.0-CURRENT-HBSD FreeBSD 14.0-CURRENT-HBSD #0  
hardened/current/master-n191216-7474f245a83: Wed Dec  8 22:44:04 EST 2021 
shawn@ci-08:/usr/obj/usr/src/amd64.amd64/sys/HARDENEDBSD  amd64

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc




Re: Kernel panic by executing `poudriere bulk`

2021-11-26 Thread Yasuhiro Kimura
From: Mateusz Guzik 
Subject: Re: Kernel panic by executing `poudriere bulk`
Date: Fri, 26 Nov 2021 20:33:22 +0100

> On 11/26/21, Yasuhiro Kimura  wrote:
>> yasu@rolling-vm-freebsd1[1015]% uname -a
>> ~
>> FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD
>> 14.0-CURRENT #0 main-n251115-ae92ace05fd: Sat Nov 27 01:47:15 JST 2021
>> ro...@rolling-vm-freebsd1.home.utahime.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>>  amd64
>> yasu@rolling-vm-freebsd1[1016]%
>>
>> After regular weekly update of my 14-current amd64 system, kernel
>> panic happens when I execute `poudriere bulk`.
>>
>> Snapshot of console:
>> https://www.utahime.org/FreeBSD/FreeBSD-14-CURRENT-amd64-main-n251115-ae92ace05fd.panic.png
>>
> 
> Should be fixed by
> https://cgit.freebsd.org/src/commit?id=1879021942f56c8b264f4aeb1966b3733908ef62

Confirmed. Thanks for quick fix!

---
Yasuhiro Kimura



Re: Kernel panic by executing `poudriere bulk`

2021-11-26 Thread Mateusz Guzik
On 11/26/21, Yasuhiro Kimura  wrote:
> yasu@rolling-vm-freebsd1[1015]% uname -a
> ~
> FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD
> 14.0-CURRENT #0 main-n251115-ae92ace05fd: Sat Nov 27 01:47:15 JST 2021
> ro...@rolling-vm-freebsd1.home.utahime.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>  amd64
> yasu@rolling-vm-freebsd1[1016]%
>
> After regular weekly update of my 14-current amd64 system, kernel
> panic happens when I execute `poudriere bulk`.
>
> Snapshot of console:
> https://www.utahime.org/FreeBSD/FreeBSD-14-CURRENT-amd64-main-n251115-ae92ace05fd.panic.png
>

Should be fixed by
https://cgit.freebsd.org/src/commit?id=1879021942f56c8b264f4aeb1966b3733908ef62

-- 
Mateusz Guzik 



Kernel panic by executing `poudriere bulk`

2021-11-26 Thread Yasuhiro Kimura
yasu@rolling-vm-freebsd1[1015]% uname -a
 ~
FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD 14.0-CURRENT 
#0 main-n251115-ae92ace05fd: Sat Nov 27 01:47:15 JST 2021 
ro...@rolling-vm-freebsd1.home.utahime.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
  amd64
yasu@rolling-vm-freebsd1[1016]%

After regular weekly update of my 14-current amd64 system, kernel
panic happens when I execute `poudriere bulk`.

Snapshot of console:
https://www.utahime.org/FreeBSD/FreeBSD-14-CURRENT-amd64-main-n251115-ae92ace05fd.panic.png

---
Yasuhiro Kimura



Kernel panic on Lenovo Thinkpad T450

2021-11-08 Thread Maurizio Vairani
On this laptop I've been using FreeBSD 14 for a few months now and
sometimes it panics, but after upgrading to:

uname -a

FreeBSD NomadBSD 14.0-CURRENT FreeBSD 14.0-CURRENT #0 e2157cd00: Sat Nov  6
03:21:26 CET 2021 root@NomadBSD:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64

It always panics, usually when I run Firefox. The backtraces of these dumps
show these lines:

#10 0x80c2b578 in vpanic (fmt=0x811ff8df "%s",
ap=, ap@entry=0xfe01246b3860) at
/usr/src/sys/kern/kern_shutdown.c:908

#11 0x80c2b303 in panic (fmt=0x81e9f1e0 
"\033\300*\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:844

#12 0x810f4f07 in trap_fatal (frame=0xfe01246b3a60,
eva=491328337975) at /usr/src/sys/amd64/amd64/trap.c:946

#13 0x810f4fa9 in trap_pfault (frame=frame@entry=0xfe01246b3a60,
usermode=false, signo=, signo@entry=0x0, ucode=,

ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:765

#14 0x810f45a7 in trap (frame=0xfe01246b3a60) at
/usr/src/sys/amd64/amd64/trap.c:443

#15 

#16 0x837692c9 in drm_prime_handle_to_fd_ioctl () from
/boot/modules/drm.ko

#17 0x8375cc52 in drm_ioctl_kernel () from /boot/modules/drm.ko

#18 0x8375cfaf in drm_ioctl () from /boot/modules/drm.ko

#19 0x80e92727 in linux_file_ioctl_sub (fp=,
filp=0x837692a0 , fop=, cmd=,

data=, td=) at
/usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:993

#20 linux_file_ioctl (fp=, cmd=,
data=, cred=, td=0xf802aef6)

at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1610

#21 0x80ca1f52 in fo_ioctl (fp=, com=3222037549,
data=0x1, active_cred=0x0, td=0xfe0124afa1e0) at
/usr/src/sys/sys/file.h:360

#22 kern_ioctl (td=, td@entry=0xfe0124afa1e0,
fd=, com=, com@entry=3222037549,

data=0x1 ,
data@entry=0xfe01246b3d50
"\n") at /usr/src/sys/kern/sys_generic.c:803

#23 0x80ca1ca4 in sys_ioctl (td=0xfe0124afa1e0,
uap=0xfe0124afa5d0) at /usr/src/sys/kern/sys_generic.c:711

#24 0x810f58de in syscallenter (td=) at
/usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189

#25 amd64_syscall (td=0xfe0124afa1e0, traced=0) at
/usr/src/sys/amd64/amd64/trap.c:1191

#26 

#27 0x00080ce80f0a in ?? ()

Backtrace stopped: Cannot access memory at address 0x7fffda48


What can I do?

I can share the /var/crash directory content if necessary.

Thanks

--

Maurizio


VMMR0InitVM … kernel panic: fatal trap 9: general protection fault while in kernel mode

2021-10-17 Thread Graham Perrin

Is it worth opening a bug for what's below?

GENERIC-NODEBUG, main-n249988-2c614481fd5



Gut feeling: it might be very difficult to reproduce.

From :

…
Unread portion of the kernel message buffer:
VMMR0InitVM: eflags=246 fKernelFeatures=0x0 (SUPKERNELFEATURES_SMAP=0)


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer = 0x20:0x810bc0a6
stack pointer   = 0x28:0xfe00c5303ba0
frame pointer   = 0x28:0xfe00c5303ba0
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 19 (arc_reap)
trap number = 9
panic: general protection fault
cpuid = 3
time = 1634464447
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c53038a0
vpanic() at vpanic+0x187/frame 0xfe00c5303900
panic() at panic+0x43/frame 0xfe00c5303960
trap_fatal() at trap_fatal+0x387/frame 0xfe00c53039c0
trap() at trap+0x8b/frame 0xfe00c5303ad0
calltrap() at calltrap+0x8/frame 0xfe00c5303ad0
--- trap 0x9, rip = 0x810bc0a6, rsp = 0xfe00c5303ba0, rbp = 0xfe00c5303ba0 ---
pmap_invalidate_all_pcid_noinvpcid_cb() at pmap_invalidate_all_pcid_noinvpcid_cb+0x36/frame 0xfe00c5303ba0
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x2b7/frame 0xfe00c5303c20
pmap_invalidate_all() at pmap_invalidate_all+0x117/frame 0xfe00c5303c90
pmap_remove() at pmap_remove+0x5ae/frame 0xfe00c5303d10
_kmem_unback() at _kmem_unback+0x32/frame 0xfe00c5303d60
kmem_free() at kmem_free+0x2d/frame 0xfe00c5303d80
keg_free_slab() at keg_free_slab+0xdc/frame 0xfe00c5303dc0
keg_drain_domain() at keg_drain_domain+0x1c1/frame 0xfe00c5303e00
zone_reclaim() at zone_reclaim+0x1aa/frame 0xfe00c5303e50
arc_kmem_reap_soon() at arc_kmem_reap_soon+0x61/frame 0xfe00c5303e80
arc_reap_cb() at arc_reap_cb+0x9/frame 0xfe00c5303e90
zthr_procedure() at zthr_procedure+0xba/frame 0xfe00c5303ef0
fork_exit() at fork_exit+0x8a/frame 0xfe00c5303f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5303f30
--- trap 0x4dda280, rip = 0x1, rsp = 0, rbp = 0x1a99c090 ---
KDB: enter: panic
…
fstat

USER CMD  PID   FD MOUNT  INUM MODE SZ|DV R/W
grahampe VirtualBoxVM  3085 root / 4 drwxr-xr-x 37  r
…


Context:  lines 1285–1372.




Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-07 Thread Gary Jennejohn
On Wed, 7 Jul 2021 09:38:05 +0100
Edward Tomasz Napierała wrote:

> On 0705T1833, Gary Jennejohn wrote:
> > On Mon, 5 Jul 2021 15:04:48 +0100
> > Edward Tomasz Napierała wrote:
> >   
> > > On 0701T1330, Gary Jennejohn wrote:  
> > > > Gary Jennejohn  wrote:
> > > > > I noticed that the value of vm.debug.divisor affects what value is
> > > > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > > > different values.
> > > > > 
> > > > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > > > 
> > > > > The default is vm.debug.divisor=1.
> > > > > 
> > > > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > > > 
> > > > > kskipdbg eventually affects the value of freei.
> > > > > 
> > > > > With these values:
> > > > > vm.debug.divisor: 0
> > > > > kern.cam.da.enable_uma_ccbs: 1
> > > > > I can turn on the disk and it comes up without a panic!
> > > > > 
> > > > > However, I didn't try to do any large data transfers to the disk.
> > > > > 
> > > > > So, it appears that at least vm.debug.divisor is a big factor in
> > > > > whether or not a panic happens with INVARIANTS.
> > > > > 
> > > > 
> > > > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > > > installed it to /boot/test.
> > > > 
> > > > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > > > and put a filesystem on it.
> > > > 
> > > > Then I booted the new kernel from /boot/test and set the sysctls so:
> > > > kern.cam.da.enable_uma_ccbs: 1
> > > > kern.cam.ada.enable_uma_ccbs: 1
> > > > 
> > > > After that I plugged in the external USB3 enclosure and copied about
> > > > 114GiB of data from an internal SSD to it - without a kernel panic:
> > > > Filesystem    Size    Used   Avail Capacity  Mounted on
> > > > /dev/da0p1    144G    114G     18G    86%    /mnt
> > > > 
> > > > I'm pretty sure that's more than I could copy without a kernel panic
> > > > prior to the recent changes made in cam and umass.
> > > > 
> > > > My test may not be real proof that all bugs have been squashed, but it
> > > > certainly seems to be a better situation than we had before.
> > > 
> > > I think the vm.debug.divisor simply masks the problem; the underlying
> > > bug is still there.
> > > 
> > > Could you go back to the setup which panics, and then test the patch
> > > at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> > > by Warner.
> > >   
> > 
> > It looks like this patch fixes things.
> > 
> > I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
> > (which are now the default values on my system).
> > 
> > I used the 8TiB disk, which spins up very slowly and usually resulted very
> > quickly in a panic - no panic with the patch.
> > 
> > Then using dd to /dev/null (bs=1m) I transferred:
> > 
> > 308755+0 records in
> > 308755+0 records out
> > 323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)
> > 
> > from the disk, so about 324GiB without a panic.  
> 
> Perfect, I've committed the fix.  Thank you!
> 

Thanks to you!  I built a new kernel as soon as I saw the commit and
have been running it since yesterday.

-- 
Gary Jennejohn



Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-07 Thread Edward Tomasz Napierała
On 0705T1833, Gary Jennejohn wrote:
> On Mon, 5 Jul 2021 15:04:48 +0100
> Edward Tomasz Napierała wrote:
> 
> > On 0701T1330, Gary Jennejohn wrote:
> > > Gary Jennejohn  wrote:  
> > > > I noticed that the value of vm.debug.divisor affects what value is
> > > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > > different values.
> > > > 
> > > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > > 
> > > > The default is vm.debug.divisor=1.
> > > > 
> > > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > > 
> > > > kskipdbg eventually affects the value of freei.
> > > > 
> > > > With these values:
> > > > vm.debug.divisor: 0
> > > > kern.cam.da.enable_uma_ccbs: 1
> > > > I can turn on the disk and it comes up without a panic!
> > > > 
> > > > However, I didn't try to do any large data transfers to the disk.
> > > > 
> > > > So, it appears that at least vm.debug.divisor is a big factor in
> > > > whether or not a panic happens with INVARIANTS.
> > > >   
> > > 
> > > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > > installed it to /boot/test.
> > > 
> > > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > > and put a filesystem on it.
> > > 
> > > Then I booted the new kernel from /boot/test and set the sysctls so:
> > > kern.cam.da.enable_uma_ccbs: 1
> > > kern.cam.ada.enable_uma_ccbs: 1
> > > 
> > > After that I plugged in the external USB3 enclosure and copied about
> > > 114GiB of data from an internal SSD to it - without a kernel panic:
> > > Filesystem    Size    Used   Avail Capacity  Mounted on
> > > /dev/da0p1    144G    114G     18G    86%    /mnt
> > > 
> > > I'm pretty sure that's more than I could copy without a kernel panic
> > > prior to the recent changes made in cam and umass.
> > > 
> > > My test may not be real proof that all bugs have been squashed, but it
> > > certainly seems to be a better situation than we had before.  
> > 
> > I think the vm.debug.divisor simply masks the problem; the underlying
> > bug is still there.
> > 
> > Could you go back to the setup which panics, and then test the patch
> > at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> > by Warner.
> > 
> 
> It looks like this patch fixes things.
> 
> I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
> (which are now the default values on my system).
> 
> I used the 8TiB disk, which spins up very slowly and usually resulted very
> quickly in a panic - no panic with the patch.
> 
> Then using dd to /dev/null (bs=1m) I transferred:
> 
> 308755+0 records in
> 308755+0 records out
> 323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)
> 
> from the disk, so about 324GiB without a panic.

Perfect, I've committed the fix.  Thank you!




Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-05 Thread Gary Jennejohn
On Mon, 5 Jul 2021 15:04:48 +0100
Edward Tomasz Napierała wrote:

> On 0701T1330, Gary Jennejohn wrote:
> > Gary Jennejohn  wrote:  
> > > I noticed that the value of vm.debug.divisor affects what value is
> > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > different values.
> > > 
> > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > 
> > > The default is vm.debug.divisor=1.
> > > 
> > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > 
> > > kskipdbg eventually affects the value of freei.
> > > 
> > > With these values:
> > > vm.debug.divisor: 0
> > > kern.cam.da.enable_uma_ccbs: 1
> > > I can turn on the disk and it comes up without a panic!
> > > 
> > > However, I didn't try to do any large data transfers to the disk.
> > > 
> > > So, it appears that at least vm.debug.divisor is a big factor in
> > > whether or not a panic happens with INVARIANTS.
> > >   
> > 
> > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > installed it to /boot/test.
> > 
> > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > and put a filesystem on it.
> > 
> > Then I booted the new kernel from /boot/test and set the sysctls so:
> > kern.cam.da.enable_uma_ccbs: 1
> > kern.cam.ada.enable_uma_ccbs: 1
> > 
> > After that I plugged in the external USB3 enclosure and copied about
> > 114GiB of data from an internal SSD to it - without a kernel panic:
> > Filesystem    Size    Used   Avail Capacity  Mounted on
> > /dev/da0p1    144G    114G     18G    86%    /mnt
> > 
> > I'm pretty sure that's more than I could copy without a kernel panic
> > prior to the recent changes made in cam and umass.
> > 
> > My test may not be real proof that all bugs have been squashed, but it
> > certainly seems to be a better situation than we had before.  
> 
> I think the vm.debug.divisor simply masks the problem; the underlying
> bug is still there.
> 
> Could you go back to the setup which panics, and then test the patch
> at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> by Warner.
> 

It looks like this patch fixes things.

I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
(which are now the default values on my system).

I used the 8TiB disk, which spins up very slowly and usually resulted very
quickly in a panic - no panic with the patch.

Then using dd to /dev/null (bs=1m) I transferred:

308755+0 records in
308755+0 records out
323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)

from the disk, so about 324GiB without a panic.

-- 
Gary Jennejohn
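
Note on the dd figures above: they can be cross-checked with a few lines of arithmetic. The sketch assumes bs=1m means 1 MiB (1048576 bytes) per record; the record count and elapsed time are copied from the dd output, and everything else is derived.

```python
RECORDS = 308755         # "308755+0 records out"
BLOCK = 1024 * 1024      # bs=1m, assumed to be 1 MiB per record
SECONDS = 1366.162410    # elapsed time reported by dd

total_bytes = RECORDS * BLOCK
rate = total_bytes / SECONDS     # bytes per second
size_gib = total_bytes / 2**30   # binary gigabytes (GiB)
size_gb = total_bytes / 10**9    # decimal gigabytes (GB)

print(total_bytes)
print(round(rate))
print(round(size_gib, 1), "GiB")
print(round(size_gb, 1), "GB")
```

The byte count matches the dd output exactly. The "about 324" figure corresponds to decimal gigabytes; in binary units the same transfer is roughly 301.5 GiB.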



Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-05 Thread Edward Tomasz Napierała
On 0701T1330, Gary Jennejohn wrote:
> Gary Jennejohn  wrote:
> > I noticed that the value of vm.debug.divisor affects what value is
> > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > different values.
> > 
> > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > 
> > The default is vm.debug.divisor=1.
> > 
> > vm.debug.divisor is only present when INVARIANTS is defined.
> > 
> > kskipdbg eventually affects the value of freei.
> > 
> > With these values:
> > vm.debug.divisor: 0
> > kern.cam.da.enable_uma_ccbs: 1
> > I can turn on the disk and it comes up without a panic!
> > 
> > However, I didn't try to do any large data transfers to the disk.
> > 
> > So, it appears that at least vm.debug.divisor is a big factor in
> > whether or not a panic happens with INVARIANTS.
> > 
> 
> I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> installed it to /boot/test.
> 
> Then I stuck a 160GB disk I had around into an external USB3 enclosure
> and put a filesystem on it.
> 
> Then I booted the new kernel from /boot/test and set the sysctls so:
> kern.cam.da.enable_uma_ccbs: 1
> kern.cam.ada.enable_uma_ccbs: 1
> 
> After that I plugged in the external USB3 enclosure and copied about
> 114GiB of data from an internal SSD to it - without a kernel panic:
> Filesystem    Size    Used   Avail Capacity  Mounted on
> /dev/da0p1    144G    114G     18G    86%    /mnt
> 
> I'm pretty sure that's more than I could copy without a kernel panic
> prior to the recent changes made in cam and umass.
> 
> My test may not be real proof that all bugs have been squashed, but it
> certainly seems to be a better situation than we had before.

I think the vm.debug.divisor simply masks the problem; the underlying
bug is still there.

Could you go back to the setup which panics, and then test the patch
at https://reviews.freebsd.org/D31054?  It fixes the scenario described
by Warner.




Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-01 Thread Gary Jennejohn
0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > > #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> > > #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> > > at /usr/src/sys/cam/cam_xpt.c:4676
> > > #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> > > done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > > #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> > > at /usr/src/sys/cam/cam_xpt.c:5493
> > > #20 0x802e68e0 in xpt_done_td (arg=0x81143700 
> > > )
> > > at /usr/src/sys/cam/cam_xpt.c:5548
> > > #21 0x807673c7 in fork_exit (callout=0x802e6720
> > > ,
> > > arg=0x81143700 , frame=0xfe00c6268c00)
> > > at /usr/src/sys/kern/kern_fork.c:1083
> > > #22 
> > >
> > > [kgdb stuff removed]
> > >
> > > (kgdb) down
> > > #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > > 4374item_dtor(zone, item, cache_uz_size(cache), udata,
> > > SKIP_NONE);
> > > (kgdb) down
> > > #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> > > at /usr/src/sys/vm/uma_core.c:3418
> > > 3418uma_dbg_free(zone, NULL, item);
> > > (kgdb) p/x skipdbg
> > > $26 = 0x0
> > > (kgdb) p/x zone->uz_flags
> > > $27 = 0x4100 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR)
> > > (kgdb) down
> > > #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> > > slab=0xf800259e2fd8, item=0xf800259e2800)
> > > at /usr/src/sys/vm/uma_core.c:5659
> > > 5659panic("Unaligned free of %p from zone %p(%s) slab
> > > %p(%d)",
> > >
> > > Note that item is the same as saved_ccb.
> > >
> > > (kgdb) p/x *zone->uz_keg
> > > $28 = {uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0x0, uh_hashmask =
> > > 0x0},
> > >   uk_zones = {lh_first = 0xfe00dc9d2000}, uk_dr = {
> > > dr_policy = 0x810010e0, dr_iter = 0x0}, uk_align = 0x7,
> > >   uk_reserve = 0x0, uk_size = 0x220, uk_rsize = 0x220,
> > >   uk_init = 0x80c17d50, uk_fini = 0x80c17d80,
> > >   uk_allocf = 0x80d342f0, uk_freef = 0x80d346a0,
> > >   uk_offset = 0x0, uk_kva = 0x0, uk_pgoff = 0xfd8, uk_ppera = 0x1,
> > >   uk_ipers = 0x7, uk_flags = 0x0, uk_name = 0x80debbac, uk_link = 
> > > {
> > > le_next = 0xf80005968000, le_prev = 0xf80005968380},
> > >   uk_domain = 0xf80005968240}
> > > (kgdb) p/x *slab
> > > $29 = {us_link = {le_next = 0xdeadc0dedeadc0de, le_prev =
> > > 0xdeadc0dedeadc0de},
> > >   us_freecount = 0xc0de, us_flags = 0xad, us_domain = 0xde, us_free = {
> > > __bits = 0xf800259e2ff0}}
> > > (kgdb) p/x *0xf800259e2ff0
> > > $30 = 0xdeadc0de
> > > Don't know whether this matters, but slab seems to be uninitialized.
> > > (kgdb) p/x freei
> > > $31 = 0x3
> > >
> > > In any case, saved_ccb has an address which lies outside the range
> > > covered by slab, i.e. freei is bigger than the number of entries in
> > > slab.
> > >
> > > I suspect that the only way to really figure out what's going on is to
> > > run the kernel in kgdb and set lots of breakpoints,
> > >
> > 
> > What's happening is this, I think.
> > 
> > (1) We send a request.
> > (2) It fails, so we send a start unit
> > BUT we do weird things to copy the CCBs around, and the request from (1)
> > and (2) have different allocations. Error recovery overwrites the original
> > request with a new request after saving it off
> > (3) start unit succeeds, we go to free one of the CCBs and it's marked
> > incorrectly, triggering either this panic or the prior one we saw.
> > 
> > These actions were fine when there was one allocator, but now that there
> > are two, more care must be taken, and that extra care hasn't been taken yet,
> > so kern.cam.da.enable_uma_ccbs=1 is unsafe for now and should not be used.
> > kern.cam.ada.enable_uma_c
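
Note on the kgdb session quoted above: the numbers it prints are enough to see why the "Unaligned free" check fires. The sketch below is a simplified model, not the code in uma_core.c. It assumes the slab header sits uk_pgoff bytes into the same page as the item, that items are laid out back to back from the start of that page in steps of uk_rsize, and it uses the addresses as they appear in this archive (the leading digits look truncated, which does not affect the offsets).

```python
UK_RSIZE = 0x220   # uk_rsize from "p/x *zone->uz_keg" (544 bytes)
UK_PGOFF = 0xFD8   # uk_pgoff from the same dump
ITEM = 0xF800259E2800  # item address as printed in the panic message
SLAB = 0xF800259E2FD8  # slab address as printed in the panic message


def check_free(item, slab, rsize, pgoff):
    """Model of the alignment check: does item sit on an item boundary?

    Returns (item index, address where that index starts, aligned flag).
    """
    base = slab - pgoff              # assumed start of the slab's page
    index = (item - base) // rsize   # which item slot the address falls in
    expected = base + index * rsize  # where that slot actually begins
    return index, expected, expected == item


index, expected, aligned = check_free(ITEM, SLAB, UK_RSIZE, UK_PGOFF)
print(index, hex(expected), aligned)
```

Under these assumptions the item falls in slot 3, matching freei = 0x3 and the "(3)" in the panic string, but slot 3 begins 0x1a0 bytes before the address being freed. That is consistent with the explanation that the CCB was not allocated from this zone in the first place.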

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Gary Jennejohn
On Wed, 30 Jun 2021 10:35:14 -0600
Warner Losh  wrote:

> On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn  wrote:
> 
> > On Wed, 30 Jun 2021 06:02:59 +0100
> > Graham Perrin  wrote:
> >  
> > > On 29/06/2021 10:42, Gary Jennejohn wrote:  
> > > > … panic is now the result of an unaligned free.
> > > >
> > > > panic: Unaligned free of 0xf800259e2800 from zone
> > > >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > > >
> > > > I have the crash dump and a debug kernel in case anyone wants more  
> > info.  
> > > Can you post the backtrace etc. here? Thanks
> > >  
> >
> > Sure.  As can be seen from the uma zone being da_ccb, the panic
> > resulted from setting kern.cam.da.enable_uma_ccbs=1.
> >
> > Unread portion of the kernel message buffer:
> > panic: Unaligned free of 0xf800259e2800 from zone
> > 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > cpuid = 2
> > time = 1624958650
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> > 0xfe00c62687a0
> > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
> > vpanic() at vpanic+0x227/frame 0xfe00c62688f0
> > panic() at panic+0x4e/frame 0xfe00c6268950
> > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
> > item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
> > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
> > uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
> > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
> > camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
> > xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
> > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
> > fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > 399 dumptid = curthread->td_tid;
> > (kgdb) bt
> > #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > #1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> > dummy3=-1,
> > dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> > #2  0x804d5bf4 in db_command (
> > last_cmdp=0x8114ce80 , cmd_table=0x0,
> > dopager=1)
> > at /usr/src/sys/ddb/db_command.c:482
> > #3  0x804d583c in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:535
> > #4  0x804da27c in db_trap (type=3, code=0)
> > at /usr/src/sys/ddb/db_main.c:270
> > #5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
> > at /usr/src/sys/kern/subr_kdb.c:727
> > #6  0x80d31494 in trap (frame=0xfe00c6268770)
> > at /usr/src/sys/amd64/amd64/trap.c:604
> > #7  0x80d32628 in trap_check (frame=0xfe00c6268770)
> > at /usr/src/sys/amd64/amd64/trap.c:664
> > #8  
> > #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> > #10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
> > msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> > #11 0x807d1725 in vpanic (
> > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > %p(%d)", ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> > #12 0x807d120e in panic (
> > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > %p(%d)")
> > at /usr/src/sys/kern/kern_shutdown.c:843
> > #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> > slab=0xf800259e2fd8, item=0xf800259e2800)
> > at /usr/src/sys/vm/uma_core.c:5659
> > #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> > item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> > at /usr/src/sys/vm/uma_core.c:3418
> > #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> > item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> > item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> > #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> > at /usr/src/sys/cam/cam_xpt.c:4676
> > #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> > done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> > at /usr/src/sys/cam/cam_xpt.c:5493
> > #20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
> > at /usr/src/sys/cam/cam_xpt.c:5548
> > #21 0x807673c7 in fork_exit (callout=0x802e6720
> > ,
> > arg=0x81143700 , frame=0xfe00c6268c00)
> > at /usr/src/sys/kern/kern_fork.c:1083
> > #22 
> >
> > [kgdb stuff removed]
> >
> > (kgdb) down
> > #15 0x80c0ba60 in uma_zfree_arg 

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Warner Losh
On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn  wrote:

> On Wed, 30 Jun 2021 06:02:59 +0100
> Graham Perrin  wrote:
>
> > On 29/06/2021 10:42, Gary Jennejohn wrote:
> > > … panic is now the result of an unaligned free.
> > >
> > > panic: Unaligned free of 0xf800259e2800 from zone
> > >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > >
> > > > I have the crash dump and a debug kernel in case anyone wants more
> > > > info.
> > Can you post the backtrace etc. here? Thanks
> >
>
> Sure.  As can be seen from the uma zone being da_ccb, the panic
> resulted from setting kern.cam.da.enable_uma_ccbs=1.
>
> Unread portion of the kernel message buffer:
> panic: Unaligned free of 0xf800259e2800 from zone
> 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> cpuid = 2
> time = 1624958650
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> 0xfe00c62687a0
> kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
> vpanic() at vpanic+0x227/frame 0xfe00c62688f0
> panic() at panic+0x4e/frame 0xfe00c6268950
> uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
> item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
> uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
> uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
> xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
> camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
> xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
> xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
> fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
> doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> 399 dumptid = curthread->td_tid;
> (kgdb) bt
> #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> #1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> dummy3=-1,
> dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0x804d5bf4 in db_command (
> last_cmdp=0x8114ce80 , cmd_table=0x0,
> dopager=1)
> at /usr/src/sys/ddb/db_command.c:482
> #3  0x804d583c in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:535
> #4  0x804da27c in db_trap (type=3, code=0)
> at /usr/src/sys/ddb/db_main.c:270
> #5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
> at /usr/src/sys/kern/subr_kdb.c:727
> #6  0x80d31494 in trap (frame=0xfe00c6268770)
> at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0x80d32628 in trap_check (frame=0xfe00c6268770)
> at /usr/src/sys/amd64/amd64/trap.c:664
> #8  
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
> msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0x807d1725 in vpanic (
> fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)", ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0x807d120e in panic (
> fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)")
> at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> slab=0xf800259e2fd8, item=0xf800259e2800)
> at /usr/src/sys/vm/uma_core.c:5659
> #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> at /usr/src/sys/vm/uma_core.c:3418
> #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> at /usr/src/sys/cam/cam_xpt.c:4676
> #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> at /usr/src/sys/cam/cam_xpt.c:5493
> #20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
> at /usr/src/sys/cam/cam_xpt.c:5548
> #21 0x807673c7 in fork_exit (callout=0x802e6720
> ,
> arg=0x81143700 , frame=0xfe00c6268c00)
> at /usr/src/sys/kern/kern_fork.c:1083
> #22 
>
> [kgdb stuff removed]
>
> (kgdb) down
> #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> 4374    item_dtor(zone, item, cache_uz_size(cache), udata,
> SKIP_NONE);
> (kgdb) down
> #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> 

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Gary Jennejohn
On Wed, 30 Jun 2021 06:02:59 +0100
Graham Perrin  wrote:

> On 29/06/2021 10:42, Gary Jennejohn wrote:
> > … panic is now the result of an unaligned free.
> >
> > panic: Unaligned free of 0xf800259e2800 from zone
> >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> >
> > I have the crash dump and a debug kernel in case anyone wants more info.  
> Can you post the backtrace etc. here? Thanks
> 

Sure.  As can be seen from the uma zone being da_ccb, the panic
resulted from setting kern.cam.da.enable_uma_ccbs=1.

Unread portion of the kernel message buffer:
panic: Unaligned free of 0xf800259e2800 from zone 
0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
cpuid = 2
time = 1624958650
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe00c62687a0
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
vpanic() at vpanic+0x227/frame 0xfe00c62688f0
panic() at panic+0x4e/frame 0xfe00c6268950
uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
399 dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
#1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, dummy3=-1,
dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
#2  0x804d5bf4 in db_command (
last_cmdp=0x8114ce80 , cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:482
#3  0x804d583c in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:535
#4  0x804da27c in db_trap (type=3, code=0)
at /usr/src/sys/ddb/db_main.c:270
#5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
at /usr/src/sys/kern/subr_kdb.c:727
#6  0x80d31494 in trap (frame=0xfe00c6268770)
at /usr/src/sys/amd64/amd64/trap.c:604
#7  0x80d32628 in trap_check (frame=0xfe00c6268770)
at /usr/src/sys/amd64/amd64/trap.c:664
#8  
#9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
#10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
#11 0x807d1725 in vpanic (
fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)", 
ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
#12 0x807d120e in panic (
fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)")
at /usr/src/sys/kern/kern_shutdown.c:843
#13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
slab=0xf800259e2fd8, item=0xf800259e2800)
at /usr/src/sys/vm/uma_core.c:5659
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
at /usr/src/sys/vm/uma_core.c:3418
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
#16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
#17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
at /usr/src/sys/cam/cam_xpt.c:4676
#18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
#19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
at /usr/src/sys/cam/cam_xpt.c:5493
#20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
at /usr/src/sys/cam/cam_xpt.c:5548
#21 0x807673c7 in fork_exit (callout=0x802e6720 ,
arg=0x81143700 , frame=0xfe00c6268c00)
at /usr/src/sys/kern/kern_fork.c:1083
#22 

[kgdb stuff removed]

(kgdb) down
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
4374    item_dtor(zone, item, cache_uz_size(cache), udata,
SKIP_NONE);
(kgdb) down
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
at /usr/src/sys/vm/uma_core.c:3418
3418    uma_dbg_free(zone, NULL, item);
(kgdb) p/x skipdbg
$26 = 0x0
(kgdb) p/x zone->uz_flags
$27 = 0x4100 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR)
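
A rough check of the numbers in the panic message, as a sketch only.
Assumptions not confirmed in this thread: 4 KiB pages with one page per
slab, items laid out from the page base, and an item stride equal to the
size=544 shown in frame #14. The helper names are hypothetical. Under
those assumptions the freed address sits 2048 bytes into its page, which
falls inside slot 3 rather than at the start of a slot; that appears to
agree with the "(3)" printed after the slab address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4 KiB pages, one page per slab */

/* Offset of an item from the start of its page. */
static size_t
item_offset(uint64_t item)
{
	return ((size_t)(item & (PAGE_SIZE_ASSUMED - 1)));
}

/* Index of the slot that contains the item, for a given item stride. */
static size_t
item_index(uint64_t item, size_t stride)
{
	return (item_offset(item) / stride);
}

/* True when the item address is exactly the start of a slot. */
static bool
item_is_aligned(uint64_t item, size_t stride)
{
	return (item_offset(item) % stride == 0);
}
```

With the address from this panic, item_offset() gives 2048 and
item_index(..., 544) gives 3, but 3 * 544 is 1632, so the address is not
a slot boundary. The address from the earlier "Duplicate free" panic
(0xf800356b9000) has offset 0, which is slot 0 and aligned.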

panic: Unaligned free (was: kernel panic while copying files)

2021-06-29 Thread Graham Perrin

On 29/06/2021 10:42, Gary Jennejohn wrote:

… panic is now the result of an unaligned free.

panic: Unaligned free of 0xf800259e2800 from zone
 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)

I have the crash dump and a debug kernel in case anyone wants more info.

Can you post the backtrace etc. here? Thanks



Re: kernel panic while copying files

2021-06-29 Thread Gary Jennejohn
I was sort of hoping that all the recent changes made by imp@ in cam and
umass may have fixed the cause of the kernel crash.

Unfortunately not.

But there is a change - instead of a duplicate free the panic is now the
result of an unaligned free.

panic: Unaligned free of 0xf800259e2800 from zone
0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)

I have the crash dump and a debug kernel in case anyone wants more info.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-12 Thread Gary Jennejohn
On Sat, 12 Jun 2021 14:10:36 +0100
Edward Tomasz Napierała  wrote:

> On 0610T1150, Gary Jennejohn wrote:
> > On Tue, 8 Jun 2021 17:54:05 +0200
> > Gary Jennejohn  wrote:
> > 
> > [big snip]  
> 
> [..]
> 
> > So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
> > which was the penultimate commit made by trasz to clear CCBs on the stack
> > after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
> > to allocate CCBs in UMA.
> > 
> > Note that I only built the kernel and not world.
> > 
> > I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
> > but without the following commits for CCBs on the stack the kernel
> > panicked during startup in AHCI.
> > 
> > Anyway, this is the minimum set of changes relevant to the uma_ccbs
> > story and also results in a panic identical to the one listed above
> > when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
> > disk.
> > 
> > So, Warner is probably right and at least the da_uma_ccbs commits
> > should be reverted until more research can be done on why the panic
> > happens.
> > 
> > The ada_uma_ccbs commits do not cause any problems in my experience and
> > could probably be left in the kernel.  
> 
> Thank you, I'm working on a fix.  Meanwhile - does the current code
> cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
> If it doesn't, it probably doesn't require backing off, since 0 is
> the default, and will keep being the default until bugs such as this
> one are fixed.
> 

No, with the sysctl set to 0 it works really well.  I've been running
it that way for several days and have transferred large amounts of
data to an external USB3 disk with no problems.

I didn't mention it, but I also tested the reset kernel (with INVARIANTS)
with the sysctl set to 0 and the kernel did not panic.

I've had ada_enable_uma_ccbs set to 1 the whole time and never saw any
problems.

I agree, as long as the default is 0 all the code can stay in the tree.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-12 Thread Edward Tomasz Napierała
On 0610T1150, Gary Jennejohn wrote:
> On Tue, 8 Jun 2021 17:54:05 +0200
> Gary Jennejohn  wrote:
> 
> [big snip]

[..]

> So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
> which was the penultimate commit made by trasz to clear CCBs on the stack
> after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
> to allocate CCBs in UMA.
> 
> Note that I only built the kernel and not world.
> 
> I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
> but without the following commits for CCBs on the stack the kernel
> panicked during startup in AHCI.
> 
> Anyway, this is the minimum set of changes relevant to the uma_ccbs
> story and also results in a panic identical to the one listed above
> when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
> disk.
> 
> So, Warner is probably right and at least the da_uma_ccbs commits
> should be reverted until more research can be done on why the panic
> happens.
> 
> The ada_uma_ccbs commits do not cause any problems in my experience and
> could probably be left in the kernel.

Thank you, I'm working on a fix.  Meanwhile - does the current code
cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
If it doesn't, it probably doesn't require backing off, since 0 is
the default, and will keep being the default until bugs such as this
one are fixed.




Re: kernel panic while copying files

2021-06-10 Thread Gary Jennejohn
On Tue, 8 Jun 2021 17:54:05 +0200
Gary Jennejohn  wrote:

[big snip]
> Here's the kgdb backtrace with the -O0 kernel:
> 
> (kgdb) bt
> #0  0x8081d706 in doadump (textdump=0)
> at /usr/src/sys/kern/kern_shutdown.c:398
> #1  0x804ef15a in db_dump (dummy=-2138500043, dummy2=false, dummy3=-1,
> dummy4=0xfe00c62a11b0 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0x804eef5f in db_command (
> last_cmdp=0x8114d380 , cmd_table=0x0, dopager=1)
> at /usr/src/sys/ddb/db_command.c:482
> #3  0x804eeb38 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:535
> #4  0x804f38ef in db_trap (type=3, code=0)
> at /usr/src/sys/ddb/db_main.c:270
> #5  0x80891d02 in kdb_trap (type=3, code=0, tf=0xfe00c62a1680)
> at /usr/src/sys/kern/subr_kdb.c:727
> #6  0x80dd53c3 in trap (frame=0xfe00c62a1680)
> at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0x80dd6718 in trap_check (frame=0xfe00c62a1680)
> at /usr/src/sys/amd64/amd64/trap.c:664
> #8  
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0x808910d0 in kdb_enter (why=0x80eaaf0b "panic",
> msg=0x80eaaf0b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0x8081dbfe in vpanic (
> fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab 
> %p(%d)", ap=0xfe00c62a1850) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0x8081d6b0 in panic (
> fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab 
> %p(%d)")
> at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0x80caaec5 in uma_dbg_free (zone=0xfe00dc9d9800,
> slab=0xf80007ee0fd8, item=0xf80007ee)
> at /usr/src/sys/vm/uma_core.c:5664
> #14 0x80c9faf5 in item_dtor (zone=0xfe00dc9d9800,
> item=0xf80007ee, size=544, udata=0x0, skip=SKIP_NONE)
> at /usr/src/sys/vm/uma_core.c:3418
> #15 0x80c9eec7 in uma_zfree_arg (zone=0xfe00dc9d9800,
> item=0xf80007ee, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0x802e5a89 in uma_zfree (zone=0xfe00dc9d9800,
> item=0xf80007ee) at /usr/src/sys/vm/uma.h:404
> #17 0x802dcfa6 in xpt_free_ccb (free_ccb=0xf80007ee)
> at /usr/src/sys/cam/cam_xpt.c:4674
> #18 0x802db639 in camperiphdone (periph=0xf8005d68bd00,
> done_ccb=0xf80007797cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0x802e59b6 in xpt_done_process (ccb_h=0xf80007797cc0)
> at /usr/src/sys/cam/cam_xpt.c:5491
> #20 0x802e811e in xpt_done_td (arg=0x81143c00 )
> at /usr/src/sys/cam/cam_xpt.c:5546
> #21 0x807ac0ea in fork_exit (callout=0x802e7f20 ,
> arg=0x81143c00 , frame=0xfe00c62a1c00)
> at /usr/src/sys/kern/kern_fork.c:1083
> #22 
> 
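
For readers comparing this -O0 trace with the optimized one posted on
2021-06-08: the xpt_free_ccb frame (#17) is present here but missing
there. A minimal sketch of how a sibling (tail) call can produce that
effect follows. All names are hypothetical stand-ins, not the CAM or UMA
code, and whether a given compiler actually turns the call into a jump
depends on its optimization settings.

```c
#include <stdlib.h>

static int frees_seen;	/* counts calls that reach the innermost function */

/* Innermost function, standing in for the allocator's free routine. */
static void
toy_zfree(void *item)
{
	frees_seen++;
	free(item);
}

/*
 * Thin wrapper whose last action is a call with the same argument.
 * With sibling-call optimization the compiler may emit a jump instead
 * of a call here, so this frame need not appear in a backtrace taken
 * inside toy_zfree().
 */
static void
toy_free_ccb(void *ccb)
{
	toy_zfree(ccb);
}

/* Caller, standing in for the completion handler. */
static int
toy_done(void *ccb)
{
	toy_free_ccb(ccb);
	return (frees_seen);
}
```

The program behaves the same either way; only the set of frames visible
to a debugger changes, which is why building with -O0 or
-fno-optimize-sibling-calls gives a more complete call stack.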

So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
which was the penultimate commit made by trasz to clear CCBs on the stack
after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
to allocate CCBs in UMA.

Note that I only built the kernel and not world.

I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
but without the following commits for CCBs on the stack the kernel
panicked during startup in AHCI.

Anyway, this is the minimum set of changes relevant to the uma_ccbs
story and also results in a panic identical to the one listed above
when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
disk.

So, Warner is probably right and at least the da_uma_ccbs commits
should be reverted until more research can be done on why the panic
happens.

The ada_uma_ccbs commits do not cause any problems in my experience and
could probably be left in the kernel.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 06:27:04 -0600
Warner Losh  wrote:

> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn  wrote:
> 
[snip old stuff]
> > Here is the kgdb backtrace:
> >
> > Unread portion of the kernel message buffer:
> > panic: Duplicate free of 0xf800356b9000 from zone
> > 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> > cpuid = 8
> > time = 1623140519
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfe00c5f398c0
> > vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> > panic() at panic+0x43/frame 0xfe00c5f39970
> > uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> > uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> > camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> > xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> > xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> > (offsetof(struct pcpu,
> > (kgdb) bt
> > #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > #1  doadump (textdump=textdump@entry=0)
> > at /usr/src/sys/kern/kern_shutdown.c:399
> > #2  0x8040c39a in db_dump (dummy=,
> > dummy2=, dummy3=, dummy4=)
> > at /usr/src/sys/ddb/db_command.c:575
> > #3  0x8040c192 in db_command (last_cmdp=,
> > cmd_table=, dopager=dopager@entry=1)
> > at /usr/src/sys/ddb/db_command.c:482
> > #4  0x8040beed in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:535
> > #5  0x8040f616 in db_trap (type=, code= > out>)  
> > at /usr/src/sys/ddb/db_main.c:270
> > #6  0x8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0,
> > tf=, tf@entry=0xfe00c5f397f0)
> > at /usr/src/sys/kern/subr_kdb.c:727
> > #7  0x809a4e96 in trap (frame=0xfe00c5f397f0)
> > at /usr/src/sys/amd64/amd64/trap.c:604
> > #8  
> > #9  kdb_enter (why=0x80a61a23 "panic", msg=)
> > at /usr/src/sys/kern/subr_kdb.c:506
> > #10 0x806207a2 in vpanic (fmt=, ap=,
> > ap@entry=0xfe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907
> > #11 0x80620533 in panic (
> > fmt=0x80d635c8  ".\024\244\200\377\377\377\377")
> > at /usr/src/sys/kern/kern_shutdown.c:843
> > #12 0x808e12b1 in uma_dbg_free (zone=0xfe00dcbdd800,
> > slab=0xf800356b9fd8, item=0xf800356b9000)
> > at /usr/src/sys/vm/uma_core.c:5664
> > #13 0x808d9de7 in item_dtor (zone=0xfe00dcbdd800,
> > item=0xf800356b9000, size=544, udata=0x0, skip=SKIP_NONE)
> > at /usr/src/sys/vm/uma_core.c:3418
> > #14 uma_zfree_arg (zone=0xfe00dcbdd800, item=0xf800356b9000,
> > udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > #15 0x802da503 in uma_zfree (zone=0x80d635c8 ,
> > item=0x200) at /usr/src/sys/vm/uma.h:404
> >  
> 
> OK. This is a bad stack trace. camperiphdone doesn't call uma_zfree()...
> It does call xpt_free_ccb, though, and that's likely what's going wrong.
> And that matches the line numbers. Most likely this is llvm's tail call
> optimizations...  Can you compile the kernel either -O0 or with
> -fno-optimize-sibling-calls? That will give a better call stack.
> 
> However, it's likely the new UMA stuff trasz committed (or it's providing
> better diagnostics than the old malloc based code which seems more likely)
> that can be disabled by the tunable kern.cam.da.enable_uma_ccbs=0.
> 
> The lines in question:
> saved_ccb = (union ccb *)done_ccb->ccb_h.saved_ccb_ptr;
> bcopy(saved_ccb, done_ccb, sizeof(*done_ccb));
> xpt_free_ccb(saved_ccb);
> 
> So we overwrite the done_ccb with the saved_ccb's contents and then free
> the saved ccb. That's likely OKish, though.
> 
> We copy entire CCBs around in this code a lot, and I've not traced through
> it. But we're sending a scsi start unit in response to some error that is
> being reported via cam_periph_error()
> 
> #16 0x802d9117 in camperiphdone (periph=0xf800061e2c00,
> > done_ccb=0xf800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > #17 0x802dfebd in xpt_done_process (ccb_h=0xf800355d6cc0)
> > at /usr/src/sys/cam/cam_xpt.c:5491
> > #18 0x802e1ec5 in xpt_done_td (
> > arg=arg@entry=0x80d33d80 )
> > at /usr/src/sys/cam/cam_xpt.c:5546
> > #19 0x805dad80 in fork_exit (callout=0x802e1dd0
> > ,
> > arg=0x80d33d80 , frame=0xfe00c5f39c00)
> > at /usr/src/sys/kern/kern_fork.c:1083
> > #20 
> >
> > Apparently caused by recent changes to CAM.
> >
> > Let me know if you want more information.
> >  
> 
> what's 

Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
On Tue, Jun 8, 2021 at 8:42 AM Gary Jennejohn  wrote:

> On Tue, 8 Jun 2021 06:48:19 -0600
> Warner Losh  wrote:
>
> > Sorry to reply to myself... had a thought as my brain rested while making
> > tea...
> >
> > I think we may need to consider reverting (or at least not yet enabling)
> > the uma stuff.
> >
>
> I tested and enabled the UMA CCB stuff immediately after trasz had
> committed it.  I was able to copy files panic-free over USB until
> recently AFAICR.
>
> I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
> then and have never seen a problem there.  Only with USB.
>

Yes. This specific bug only affects SCSI. And it only affects it when
there's an error that requires a restart. I've not yet had the time to do
an audit for where else the copying is done...
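
To illustrate why whole-structure copying deserves an audit, here is a
minimal sketch. It assumes, purely for illustration, that each object
carries a header field recording which allocator it came from; the
thread does not confirm that this is the mechanism behind the panic.
Every name below is hypothetical and none of it is the CAM code.

```c
#include <string.h>

#define TOY_FROM_ZONE 0x1	/* hypothetical flag: object came from a zone */

struct toy_hdr {
	int alloc_flags;	/* hypothetical: which allocator owns the object */
};

struct toy_ccb {
	struct toy_hdr hdr;
	char payload[32];
};

/* Whole-object copy: the destination inherits the source's owner flag. */
static void
toy_copy_all(struct toy_ccb *dst, const struct toy_ccb *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Copy that keeps the destination's own owner flag intact. */
static void
toy_copy_keep_owner(struct toy_ccb *dst, const struct toy_ccb *src)
{
	int owner = dst->hdr.alloc_flags;

	memcpy(dst, src, sizeof(*dst));
	dst->hdr.alloc_flags = owner;
}
```

After toy_copy_all() a heap-allocated destination would claim to be a
zone object, so a free routine that trusts the flag would hand it to the
wrong allocator. toy_copy_keep_owner() is one possible guard, shown only
to make the hazard concrete.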


> I'll try booting a new kernel with the uma_ccb sysctls set to 0
> and see what happens.
>
> BTW I now have a kernel compiled with -O0 ready to test.
>

Great!

Warner


Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 06:48:19 -0600
Warner Losh  wrote:

> Sorry to reply to myself... had a thought as my brain rested while making
> tea...
> 
> I think we may need to consider reverting (or at least not yet enabling)
> the uma stuff.
> 

I tested and enabled the UMA CCB stuff immediately after trasz had
committed it.  I was able to copy files panic-free over USB until
recently AFAICR.

I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
then and have never seen a problem there.  Only with USB.

I'll try booting a new kernel with the uma_ccb sysctls set to 0
and see what happens.

BTW I now have a kernel compiled with -O0 ready to test.

[snip lots of extraneous stuff]

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
Sorry to reply to myself... had a thought as my brain rested while making
tea...

I think we may need to consider reverting (or at least not yet enabling)
the uma stuff.

On Tue, Jun 8, 2021 at 6:27 AM Warner Losh  wrote:

>
>
> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn 
> wrote:
>
>> On Mon, 7 Jun 2021 16:54:11 -0400
>> Mark Johnston  wrote:
>>
>> > On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
>> > > I've seen this panic three times in the last two days:
>> > >
>> > > [first panic]
>> > > Unread portion of the kernel message buffer:
>> > >
>> > >
>> > > Fatal trap 12: page fault while in kernel mode
>> > > cpuid = 3; apic id = 03
>> > > fault virtual address   = 0x801118000
>> > > fault code  = supervisor write data, page not present
>> > > instruction pointer = 0x20:0x808d2212
>> > > stack pointer   = 0x28:0xfe00dbc8c760
>> > > frame pointer   = 0x28:0xfe00dbc8c7a0
>> > > code segment= base 0x0, limit 0xf, type 0x1b
>> > > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > > processor eflags= interrupt enabled, resume, IOPL = 0
>> > > current process = 28 (dom0)
>> > > trap number = 12
>> > > panic: page fault
>> > > cpuid = 3
>> > > time = 1622963058
>> > > KDB: stack backtrace:
>> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe00dbc8c410
>> > > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
>> > > panic() at panic+0x43/frame 0xfe00dbc8c4c0
>> > > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
>> > > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
>> > > trap() at trap+0x253/frame 0xfe00dbc8c690
>> > > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
>> > > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp
>> = 0xfe00dbc8c7a0 ---
>> > > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
>> > > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
>> > > bucket_cache_reclaim_domain() at
>> bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
>> > > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
>> > > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame
>> 0xfe00dbc8c8b0
>> > > vm_pageout_worker() at vm_pageout_worker+0x41e/frame
>> 0xfe00dbc8cb70
>> > > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
>> > > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
>> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
>> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> > > KDB: enter: panic
>> > >
>> > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> > > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
>> pcpu,
>> > >pc_curthread)));
>> > >
>> > > One difference was that in the second and third panics the fault
>> virtual
>> > > address was 0x0.  But the backtrace was the same.
>> > >
>> > > Relevant info from the info.x files:
>> > > Architecture: amd64
>> > > Architecture Version: 2
>> > > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039:
>> Sat Jun
>> > > 5 09:58:55 CEST 2021
>> > >
>> > > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz
>> K8-class CPU)
>> > >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1
>> Stepping=1
>> > >   AMD Features=0x2e500800
>> > >   AMD
>> Features2=0x35c233ff
>> > >   AMD Extended Feature Extensions ID
>> EBX=0x1007
>> > >
>> > > I have 16GiB of memory in the box.
>> > >
>> > > The panic occurred while copying files from an internal SATA SSD to a
>> > > SATA 8TB disk in an external USB3 docking station.  The panic seems to
>> > > occur quite quickly, after only a few files have been copied.
>> > >
>> > > swap is on a different internal disk.
>> > >
>> > > I can poke around in the crash dumps with kgdb if anyone wants more
>> > > information.
>> >
>> > Are you running with invariants configured in the kernel?  If not,
>> > please try to reproduce this in a kernel with
>> >
>> > options INVARIANT_SUPPORT
>> > options INVARIANTS
>> >
>> > configured.
>> >
>> > A stack trace with line numbers would also be helpful.
>>
>> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
>> soon as I turn on the external USB3 disk.  No user disk access required.
>>
>> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
>> 8 09:34:32 CEST 2021
>>
>> Here is the kgdb backtrace:
>>
>> Unread portion of the kernel message buffer:
>> panic: Duplicate free of 0xf800356b9000 from zone
>> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
>> cpuid = 8
>> time = 1623140519
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe00c5f398c0
>> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
>> panic() at panic+0x43/frame 0xfe00c5f39970
>> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
>> uma_zfree_arg() at 

Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn  wrote:

> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston  wrote:
>
> > On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> > > I've seen this panic three times in the last two days:
> > >
> > > [first panic]
> > > Unread portion of the kernel message buffer:
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 3; apic id = 03
> > > fault virtual address   = 0x801118000
> > > fault code  = supervisor write data, page not present
> > > instruction pointer = 0x20:0x808d2212
> > > stack pointer   = 0x28:0xfe00dbc8c760
> > > frame pointer   = 0x28:0xfe00dbc8c7a0
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 28 (dom0)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 3
> > > time = 1622963058
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00dbc8c410
> > > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> > > panic() at panic+0x43/frame 0xfe00dbc8c4c0
> > > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> > > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> > > trap() at trap+0x253/frame 0xfe00dbc8c690
> > > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> > > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp
> = 0xfe00dbc8c7a0 ---
> > > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> > > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> > > bucket_cache_reclaim_domain() at
> bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
> > > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> > > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame
> 0xfe00dbc8c8b0
> > > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> > > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> > > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > > KDB: enter: panic
> > >
> > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
> pcpu,
> > >pc_curthread)));
> > >
> > > One difference was that in the second and third panics the fault
> virtual
> > > address was 0x0.  But the backtrace was the same.
> > >
> > > Relevant info from the info.x files:
> > > Architecture: amd64
> > > Architecture Version: 2
> > > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat
> Jun
> > > 5 09:58:55 CEST 2021
> > >
> > > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz
> K8-class CPU)
> > >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1
> Stepping=1
> > >   AMD Features=0x2e500800
> > >   AMD
> Features2=0x35c233ff
> > >   AMD Extended Feature Extensions ID
> EBX=0x1007
> > >
> > > I have 16GiB of memory in the box.
> > >
> > > The panic occurred while copying files from an internal SATA SSD to a
> > > SATA 8TB disk in an external USB3 docking station.  The panic seems to
> > > occur quite quickly, after only a few files have been copied.
> > >
> > > swap is on a different internal disk.
> > >
> > > I can poke around in the crash dumps with kgdb if anyone wants more
> > > information.
> >
> > Are you running with invariants configured in the kernel?  If not,
> > please try to reproduce this in a kernel with
> >
> > options INVARIANT_SUPPORT
> > options INVARIANTS
> >
> > configured.
> >
> > A stack trace with line numbers would also be helpful.
>
> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
> soon as I turn on the external USB3 disk.  No user disk access required.
>
> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here is the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xf800356b9000 from zone
> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> panic() at panic+0x43/frame 0xfe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> fork_trampoline() at fork_trampoline+0xe/frame 

Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 11:04:33 +0200
Mateusz Guzik  wrote:

> Given how easy it is to reproduce perhaps you can spend a little bit
> of time narrowing it down to a specific commit. You can do it with
> git-bisect.
> 

Ok, I'll give it a try.

-- 
Gary Jennejohn
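
The bisect suggested above can be rehearsed on a throwaway repository before
spending hours on kernel builds.  What follows is only a sketch: the
repository, the file named "state" and the choice of commit 6 as the culprit
are made up for the demonstration.  In the real case the test step would be
"build, install, boot, switch on the USB3 dock", and the good and bad
endpoints would be whatever revisions are known to work and to panic.

```shell
#!/bin/sh
# Rehearsal of an automated git-bisect run on a scratch repository.
# Everything here is hypothetical demo data, not taken from the thread.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email bisect@example.invalid
git config user.name "bisect demo"

# Ten commits; the pretend regression arrives with commit 6.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 6 ]; then echo bad > state; else echo good > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)

# Endpoints: HEAD is known bad, the first commit is known good.
git bisect start HEAD "$first" > /dev/null

# The test command exits 0 for "good" and non-zero for "bad".
git bisect run sh -c 'grep -q good state' > /dev/null

# refs/bisect/bad now names the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $culprit"

git bisect reset > /dev/null 2>&1
```

When each step needs a reboot, "git bisect run" is not practical; the same
search is then driven by hand with "git bisect good" or "git bisect bad" after
every test boot.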



Re: kernel panic while copying files

2021-06-08 Thread Hans Petter Selasky

On 6/8/21 1:34 PM, Gary Jennejohn wrote:

> Fields in the ccb like periph_name, unit_number and dev_name are filled
> with zeroes.


Smells like a double free, like the panic message indicates, but would 
be nice to know exactly which driver is doing this, if it is "ATA" or 
"UMASS", so to speak.


Maybe you need to do a quick bisect, like suggested.

--HPS



Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 11:20:37 +0200
Hans Petter Selasky  wrote:

> On 6/8/21 11:04 AM, Mateusz Guzik wrote:
> > Apparently caused by recent changes to CAM.
> > 
> > Let me know if you want more information.  
> 
> Maybe you can print the *ccb being freed and figure out which device
> it belongs to.
> 

I'm now running a kernel without INVARIANTS, so I can check:

Jun  8 13:23:52 ernst kernel: ugen2.4:  at usbus2
Jun  8 13:23:52 ernst kernel: umass0 on uhub5
Jun  8 13:23:52 ernst kernel: umass0:  on usbus2
Jun  8 13:23:52 ernst kernel: umass0:  SCSI over Bulk-Only; quirks = 0xc101
Jun  8 13:23:52 ernst kernel: umass0:6:0: Attached to scbus6
Jun  8 13:24:37 ernst kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Jun  8 13:24:37 ernst kernel: da0:  Fixed Direct Access SPC-4 SCSI device
Jun  8 13:24:37 ernst kernel: da0: Serial Number 0001
Jun  8 13:24:37 ernst kernel: da0: 400.000MB/s transfers
Jun  8 13:24:37 ernst kernel: da0: 7630885MB (15628053168 512 byte sectors)
Jun  8 13:24:37 ernst kernel: da0: quirks=0x2

The only USB device which I turned on.

Fields in the ccb like periph_name, unit_number and dev_name are filled
with zeroes.

The ccb is enormous and really hard to parse.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Hans Petter Selasky

On 6/8/21 11:04 AM, Mateusz Guzik wrote:

> Apparently caused by recent changes to CAM.
>
> Let me know if you want more information.


Maybe you can print the *ccb being freed and figure out which device it 
belongs to.


--HPS



Re: kernel panic while copying files

2021-06-08 Thread Mateusz Guzik
Given how easy it is to reproduce perhaps you can spend a little bit
of time narrowing it down to a specific commit. You can do it with
git-bisect.

On 6/8/21, Gary Jennejohn  wrote:
> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston  wrote:
>
>> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
>> > I've seen this panic three times in the last two days:
>> >
>> > [first panic]
>> > Unread portion of the kernel message buffer:
>> >
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 3; apic id = 03
>> > fault virtual address   = 0x801118000
>> > fault code  = supervisor write data, page not present
>> > instruction pointer = 0x20:0x808d2212
>> > stack pointer   = 0x28:0xfe00dbc8c760
>> > frame pointer   = 0x28:0xfe00dbc8c7a0
>> > code segment= base 0x0, limit 0xf, type 0x1b
>> > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags= interrupt enabled, resume, IOPL = 0
>> > current process = 28 (dom0)
>> > trap number = 12
>> > panic: page fault
>> > cpuid = 3
>> > time = 1622963058
>> > KDB: stack backtrace:
>> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> > 0xfe00dbc8c410
>> > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
>> > panic() at panic+0x43/frame 0xfe00dbc8c4c0
>> > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
>> > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
>> > trap() at trap+0x253/frame 0xfe00dbc8c690
>> > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
>> > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp =
>> > 0xfe00dbc8c7a0 ---
>> > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
>> > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
>> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame
>> > 0xfe00dbc8c830
>> > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
>> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame
>> > 0xfe00dbc8c8b0
>> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
>> > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
>> > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
>> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
>> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> > KDB: enter: panic
>> >
>> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
>> > pcpu,
>> >   pc_curthread)));
>> >
>> > One difference was that in the second and third panics the fault
>> > virtual
>> > address was 0x0.  But the backtrace was the same.
>> >
>> > Relevant info from the info.x files:
>> > Architecture: amd64
>> > Architecture Version: 2
>> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat
>> > Jun
>> > 5 09:58:55 CEST 2021
>> >
>> > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz
>> > K8-class CPU)
>> >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1
>> > Stepping=1
>> >   AMD Features=0x2e500800
>> >   AMD
>> > Features2=0x35c233ff
>> >   AMD Extended Feature Extensions ID
>> > EBX=0x1007
>> >
>> > I have 16GiB of memory in the box.
>> >
>> > The panic occurred while copying files from an internal SATA SSD to a
>> > SATA 8TB disk in an external USB3 docking station.  The panic seems to
>> > occur quite quickly, after only a few files have been copied.
>> >
>> > swap is on a different internal disk.
>> >
>> > I can poke around in the crash dumps with kgdb if anyone wants more
>> > information.
>>
>> Are you running with invariants configured in the kernel?  If not,
>> please try to reproduce this in a kernel with
>>
>> options INVARIANT_SUPPORT
>> options INVARIANTS
>>
>> configured.
>>
>> A stack trace with line numbers would also be helpful.
>
> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
> soon as I turn on the external USB3 disk.  No user disk access required.
>
> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here is the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xf800356b9000 from zone
> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> panic() at panic+0x43/frame 0xfe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 

Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Mon, 7 Jun 2021 16:54:11 -0400
Mark Johnston  wrote:

> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> > I've seen this panic three times in the last two days:
> > 
> > [first panic]
> > Unread portion of the kernel message buffer:
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 3; apic id = 03
> > fault virtual address   = 0x801118000
> > fault code  = supervisor write data, page not present
> > instruction pointer = 0x20:0x808d2212
> > stack pointer   = 0x28:0xfe00dbc8c760
> > frame pointer   = 0x28:0xfe00dbc8c7a0
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 28 (dom0)
> > trap number = 12
> > panic: page fault
> > cpuid = 3
> > time = 1622963058
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe00dbc8c410
> > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> > panic() at panic+0x43/frame 0xfe00dbc8c4c0
> > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> > trap() at trap+0x253/frame 0xfe00dbc8c690
> > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 
> > 0xfe00dbc8c7a0 ---
> > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 
> > 0xfe00dbc8c830
> > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > 
> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> >pc_curthread)));
> > 
> > One difference was that in the second and third panics the fault virtual
> > address was 0x0.  But the backtrace was the same.
> > 
> > Relevant info from the info.x files:
> > Architecture: amd64
> > Architecture Version: 2
> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
> > 5 09:58:55 CEST 2021
> > 
> > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class 
> > CPU)
> >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
> >   AMD Features=0x2e500800
> >   AMD 
> > Features2=0x35c233ff
> >   AMD Extended Feature Extensions ID 
> > EBX=0x1007
> > 
> > I have 16GiB of memory in the box.
> > 
> > The panic occurred while copying files from an internal SATA SSD to a
> > SATA 8TB disk in an external USB3 docking station.  The panic seems to
> > occur quite quickly, after only a few files have been copied.
> > 
> > swap is on a different internal disk.
> > 
> > I can poke around in the crash dumps with kgdb if anyone wants more
> > information.  
> 
> Are you running with invariants configured in the kernel?  If not,
> please try to reproduce this in a kernel with
> 
> options INVARIANT_SUPPORT
> options INVARIANTS
> 
> configured.
> 
> A stack trace with line numbers would also be helpful.

Thanks for the hint.  After enabling INVARIANTS the kernel panics as
soon as I turn on the external USB3 disk.  No user disk access required.

Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
8 09:34:32 CEST 2021

Here is the kgdb backtrace:

Unread portion of the kernel message buffer:
panic: Duplicate free of 0xf800356b9000 from zone 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
cpuid = 8
time = 1623140519
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c5f398c0
vpanic() at vpanic+0x181/frame 0xfe00c5f39910
panic() at panic+0x43/frame 0xfe00c5f39970
uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
pcpu,
(kgdb) bt
#0  __curthread () at 

Re: kernel panic while copying files

2021-06-07 Thread Mark Johnston
On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> I've seen this panic three times in the last two days:
> 
> [first panic]
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address   = 0x801118000
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x808d2212
> stack pointer   = 0x28:0xfe00dbc8c760
> frame pointer   = 0x28:0xfe00dbc8c7a0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 28 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1622963058
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
> vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> panic() at panic+0x43/frame 0xfe00dbc8c4c0
> trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> trap() at trap+0x253/frame 0xfe00dbc8c690
> calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 
> 0xfe00dbc8c7a0 ---
> zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 
> 0xfe00dbc8c830
> zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
> vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
>  pc_curthread)));
> 
> One difference was that in the second and third panics the fault virtual
> address was 0x0.  But the backtrace was the same.
> 
> Relevant info from the info.x files:
> Architecture: amd64
> Architecture Version: 2
> Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
> 5 09:58:55 CEST 2021
> 
> CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class 
> CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   AMD Features=0x2e500800
>   AMD 
> Features2=0x35c233ff
>   AMD Extended Feature Extensions ID EBX=0x1007
> 
> I have 16GiB of memory in the box.
> 
> The panic occurred while copying files from an internal SATA SSD to a
> SATA 8TB disk in an external USB3 docking station.  The panic seems to
> occur quite quickly, after only a few files have been copied.
> 
> swap is on a different internal disk.
> 
> I can poke around in the crash dumps with kgdb if anyone wants more
> information.

Are you running with invariants configured in the kernel?  If not,
please try to reproduce this in a kernel with

options INVARIANT_SUPPORT
options INVARIANTS

configured.

A stack trace with line numbers would also be helpful.
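
For anyone following along with a custom kernel config, the two options can
be added along these lines.  This is a sketch under assumptions: MYKERNEL and
the scratch file are placeholders, and the add_option helper is invented for
the example.  GENERIC on -CURRENT typically enables both options already, so
this mostly matters for trimmed-down configs.

```shell
#!/bin/sh
# Sketch: make sure a kernel config carries the two debugging options.
# The config file is a scratch stand-in; MYKERNEL is a hypothetical name.
set -eu

conf=$(mktemp)
printf 'ident\tMYKERNEL\n' > "$conf"

add_option() {
    # $1: config file, $2: option name.
    # Append "options NAME" unless the file already sets that exact option.
    grep -Eq "^options[[:space:]]+$2([[:space:]]|\$)" "$1" ||
        printf 'options\t%s\n' "$2" >> "$1"
}

add_option "$conf" INVARIANT_SUPPORT
add_option "$conf" INVARIANTS
add_option "$conf" INVARIANTS    # repeated call changes nothing

cat "$conf"

# Not run here: on a real system the file would live under
# /usr/src/sys/amd64/conf and the kernel would be rebuilt as root, e.g.
#   make -C /usr/src buildkernel KERNCONF=MYKERNEL
#   make -C /usr/src installkernel KERNCONF=MYKERNEL
```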



kernel panic while copying files

2021-06-07 Thread Gary Jennejohn
I've seen this panic three times in the last two days:

[first panic]
Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x801118000
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0x808d2212
stack pointer           = 0x28:0xfe00dbc8c760
frame pointer           = 0x28:0xfe00dbc8c7a0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28 (dom0)
trap number             = 12
panic: page fault
cpuid = 3
time = 1622963058
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
panic() at panic+0x43/frame 0xfe00dbc8c4c0
trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
trap() at trap+0x253/frame 0xfe00dbc8c690
calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
--- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
   pc_curthread)));

One difference was that in the second and third panics the fault virtual
address was 0x0.  But the backtrace was the same.

Relevant info from the info.x files:
Architecture: amd64
Architecture Version: 2
Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
5 09:58:55 CEST 2021

CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  AMD Extended Feature Extensions ID EBX=0x1007

I have 16GiB of memory in the box.

The panic occurred while copying files from an internal SATA SSD to a
SATA 8TB disk in an external USB3 docking station.  The panic seems to
occur quite quickly, after only a few files have been copied.

swap is on a different internal disk.

I can poke around in the crash dumps with kgdb if anyone wants more
information.

-- 
Gary Jennejohn



Re: kernel panic and fun debugging

2020-10-17 Thread Warner Losh
On Sat, Oct 17, 2020 at 1:17 PM Steve Kargl <
s...@troutmask.apl.washington.edu> wrote:

> On Sat, Oct 17, 2020 at 12:58:48PM -0600, Warner Losh wrote:
> > On Sat, Oct 17, 2020, 12:00 PM Andrey V. Elsukov wrote:
> >
> > > On 15.10.2020 09:56, Steve Kargl wrote:
> > > > Just had a kernel panic.  Best info I give you is
> > > >
> > > > % uname -a
> > > > FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26
> > > > 10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE
> > > > i386
> > > >
> > > > % kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> > > > ...
> > > > Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
> > > > /usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283:
> > > > internal-error: struct inferior *find_inferior_pid(int): Assertion `pid !=
> > > > 0' failed.
> > > > A problem internal to GDB has been detected,
> > > > further debugging may prove unreliable.
> > >
> > > Hi,
> > >
> > > do you have /var/crash/core.txt.1 file?
> > > It may have some useful info. Also did you try an old version
> > > /usr/libexec/kgdb ?
> > >
> >
> > I got this same error, btw, when I pointed kgdb at the wrong kernel for the
> > core file.
> >
> > But you can 'more' the uncompressed vmcore to get at least a traceback if
> > the proper kernel is gone...
> >
> > Warner
> >
>
> There are only 2 kernel.debug on the system.
>
> % locate kernel.debug
> /usr/lib/debug/boot/kernel/kernel.debug
> /usr/lib/debug/boot/kernel.old/kernel.debug
>
> Using
>
> % kgdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> % kgdb /usr/lib/debug/boot/kernel.old/kernel.debug vmcore.1
>
> Both have the same result.


In a pinch, you can use /boot/kernel/kernel if you suspect a difference...


> I didn't realize the panic text
> was recorded in vmcore.1.  If I extracted everything correctly,
> here's what happened
>
> WARNING !drm_modeset_is_locked(>mutex) failed at
> /usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:621
> #0 0x2318d4b9 at linux_dump_stack+0x19
> #1 0x23219ec2 at drm_atomic_helper_check_modeset+0x92
> #2 0x23091d90 at intel_atomic_check+0x70
> #3 0x2321913e at drm_atomic_check_only+0x38e
> #4 0x23219461 at drm_atomic_commit+0x11
> #5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
> #6 0x2322458e at drm_client_modeset_commit_force+0x5e
> #7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
> #8 0x2325b092 at vt_kms_postswitch+0x132
> #9 0xa8bec5 at vt_fb_postswitch+0x15
> #10 0xa9153d at vt_window_switch+0xfd
> #11 0xa8f59c at vtterm_cngrab+0x1c
> #12 0xbec5bf at termcn_cngrab+0xf
> #13 0xb4b6f6 at cngrab+0x16
> #14 0xb9fce2 at vpanic+0xd2
> #15 0xb9fc04 at panic+0x14
> #16 0xe2c06a at vm_fault_lookup+0x13a
> #17 0xe2b8e7 at vm_fault+0x77
> WARNING !drm_modeset_is_locked(>mutex) failed at
> /usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:621
> #0 0x2318d4b9 at linux_dump_stack+0x19
> #1 0x23219ec2 at drm_atomic_helper_check_modeset+0x92
> #2 0x23091d90 at intel_atomic_check+0x70
> #3 0x2321913e at drm_atomic_check_only+0x38e
> #4 0x23219461 at drm_atomic_commit+0x11
> #5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
> #6 0x2322458e at drm_client_modeset_commit_force+0x5e
> #7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
> #8 0x2325b092 at vt_kms_postswitch+0x132
> #9 0xa8bec5 at vt_fb_postswitch+0x15
> #10 0xa9153d at vt_window_switch+0xfd
> #11 0xa8f59c at vtterm_cngrab+0x1c
> #12 0xbec5bf at termcn_cngrab+0xf
> #13 0xb4b6f6 at cngrab+0x16
> #14 0xb9fce2 at vpanic+0xd2
> #15 0xb9fc04 at panic+0x14
> #16 0xe2c06a at vm_fault_lookup+0x13a
> #17 0xe2b8e7 at vm_fault+0x77
> WARNING !drm_modeset_is_locked(>mode_config.connection_mutex) failed
> at
> /usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:666
> #0 0x2318d4b9 at linux_dump_stack+0x19
> #1 0x2321a005 at drm_atomic_helper_check_modeset+0x1d5
> #2 0x23091d90 at intel_atomic_check+0x70
> #3 0x2321913e at drm_atomic_check_only+0x38e
> #4 0x23219461 at drm_atomic_commit+0x11
> #5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
> #6 0x2322458e at drm_client_modeset_commit_force+0x5e
> #7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
> #8 0x2325b092 at vt_kms_postswitch+0x132
> #9 0xa8bec5 at vt_fb_postswitch+0x15
> #10 0xa9153d at vt_window_switch+0

Re: kernel panic and fun debugging

2020-10-17 Thread Steve Kargl
On Sat, Oct 17, 2020 at 12:58:48PM -0600, Warner Losh wrote:
> On Sat, Oct 17, 2020, 12:00 PM Andrey V. Elsukov  wrote:
> 
> > On 15.10.2020 09:56, Steve Kargl wrote:
> > > Just had a kernel panic.  Best info I give you is
> > >
> > > % uname -a
> > > > FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26
> > > > 10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE
> > > > i386
> > >
> > > % kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> > > ...
> > > Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
> > > > /usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283:
> > > > internal-error: struct inferior *find_inferior_pid(int): Assertion `pid !=
> > > > 0' failed.
> > > A problem internal to GDB has been detected,
> > > further debugging may prove unreliable.
> >
> > Hi,
> >
> > do you have /var/crash/core.txt.1 file?
> > It may have some useful info. Also did you try an old version
> > /usr/libexec/kgdb ?
> >
> 
> I got this same error, btw, when I pointed kgdb at the wrong kernel for the
> core file.
> 
> But you can 'more' the uncompressed vmcore to get at least a traceback if
> the proper kernel is gone...
> 
> Warner
> 

There are only 2 kernel.debug on the system.

% locate kernel.debug
/usr/lib/debug/boot/kernel/kernel.debug
/usr/lib/debug/boot/kernel.old/kernel.debug

Using

% kgdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
% kgdb /usr/lib/debug/boot/kernel.old/kernel.debug vmcore.1

Both have the same result.  I didn't realize the panic text
was recorded in vmcore.1.  If I extracted everything correctly,
here's what happened

WARNING !drm_modeset_is_locked(>mutex) failed at 
/usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:621
#0 0x2318d4b9 at linux_dump_stack+0x19
#1 0x23219ec2 at drm_atomic_helper_check_modeset+0x92
#2 0x23091d90 at intel_atomic_check+0x70
#3 0x2321913e at drm_atomic_check_only+0x38e
#4 0x23219461 at drm_atomic_commit+0x11
#5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
#6 0x2322458e at drm_client_modeset_commit_force+0x5e
#7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
#8 0x2325b092 at vt_kms_postswitch+0x132
#9 0xa8bec5 at vt_fb_postswitch+0x15
#10 0xa9153d at vt_window_switch+0xfd
#11 0xa8f59c at vtterm_cngrab+0x1c
#12 0xbec5bf at termcn_cngrab+0xf
#13 0xb4b6f6 at cngrab+0x16
#14 0xb9fce2 at vpanic+0xd2
#15 0xb9fc04 at panic+0x14
#16 0xe2c06a at vm_fault_lookup+0x13a
#17 0xe2b8e7 at vm_fault+0x77
WARNING !drm_modeset_is_locked(>mutex) failed at 
/usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:621
#0 0x2318d4b9 at linux_dump_stack+0x19
#1 0x23219ec2 at drm_atomic_helper_check_modeset+0x92
#2 0x23091d90 at intel_atomic_check+0x70
#3 0x2321913e at drm_atomic_check_only+0x38e
#4 0x23219461 at drm_atomic_commit+0x11
#5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
#6 0x2322458e at drm_client_modeset_commit_force+0x5e
#7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
#8 0x2325b092 at vt_kms_postswitch+0x132
#9 0xa8bec5 at vt_fb_postswitch+0x15
#10 0xa9153d at vt_window_switch+0xfd
#11 0xa8f59c at vtterm_cngrab+0x1c
#12 0xbec5bf at termcn_cngrab+0xf
#13 0xb4b6f6 at cngrab+0x16
#14 0xb9fce2 at vpanic+0xd2
#15 0xb9fc04 at panic+0x14
#16 0xe2c06a at vm_fault_lookup+0x13a
#17 0xe2b8e7 at vm_fault+0x77
WARNING !drm_modeset_is_locked(>mode_config.connection_mutex) failed at 
/usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:666
#0 0x2318d4b9 at linux_dump_stack+0x19
#1 0x2321a005 at drm_atomic_helper_check_modeset+0x1d5
#2 0x23091d90 at intel_atomic_check+0x70
#3 0x2321913e at drm_atomic_check_only+0x38e
#4 0x23219461 at drm_atomic_commit+0x11
#5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
#6 0x2322458e at drm_client_modeset_commit_force+0x5e
#7 0x23260851 at drm_fb_helper_restore_fbdev_mode_unlocked+0x71
#8 0x2325b092 at vt_kms_postswitch+0x132
#9 0xa8bec5 at vt_fb_postswitch+0x15
#10 0xa9153d at vt_window_switch+0xfd
#11 0xa8f59c at vtterm_cngrab+0x1c
#12 0xbec5bf at termcn_cngrab+0xf
#13 0xb4b6f6 at cngrab+0x16
#14 0xb9fce2 at vpanic+0xd2
#15 0xb9fc04 at panic+0x14
#16 0xe2c06a at vm_fault_lookup+0x13a
#17 0xe2b8e7 at vm_fault+0x77
WARNING !drm_modeset_is_locked(>mutex) failed at 
/usr/ports/graphics/drm-current-kmod/work/drm-kmod-drm_v5.4.62_1/drivers/gpu/drm/drm_atomic_helper.c:871
#0 0x2318d4b9 at linux_dump_stack+0x19
#1 0x2321adbd at drm_atomic_helper_check_planes+0x8d
#2 0x23092da7 at intel_atomic_check+0x1087
#3 0x2321913e at drm_atomic_check_only+0x38e
#4 0x23219461 at drm_atomic_commit+0x11
#5 0x23224793 at drm_client_modeset_commit_atomic+0xb3
#6 0x2322458e at drm_client_modeset_commit_force+0x5e
#7

Re: kernel panic and fun debugging

2020-10-17 Thread Steve Kargl
On Sat, Oct 17, 2020 at 08:57:31PM +0300, Andrey V. Elsukov wrote:
> On 15.10.2020 09:56, Steve Kargl wrote:
> > Just had a kernel panic.  Best info I give you is
> > 
> > % uname -a
> > FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26 
> > 10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE  
> > i386
> > 
> > % kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> > ...
> > Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
> > /usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283: internal-error: 
> > struct inferior *find_inferior_pid(int): Assertion `pid != 0' failed.
> > A problem internal to GDB has been detected,
> > further debugging may prove unreliable.
> 
> 
> do you have /var/crash/core.txt.1 file?
> It may have some useful info. Also did you try an old version
> /usr/libexec/kgdb ?
> 

The only additional info in that file is 

mobile dumped core - see /var/crash/vmcore.1

Wed Oct 14 23:47:49 PDT 2020

FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26 
10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE  i386

panic: vm_fault_lookup: fault on nofault entry, addr: 0


It's an older laptop, so I'm not ruling out memory showing its age.
I'll also note that the laptop will panic once a week or so, when
the swapper decides to swap out something drm.  



-- 
Steve
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: kernel panic and fun debugging

2020-10-17 Thread Warner Losh
On Sat, Oct 17, 2020, 12:00 PM Andrey V. Elsukov  wrote:

> On 15.10.2020 09:56, Steve Kargl wrote:
> > Just had a kernel panic.  Best info I give you is
> >
> > % uname -a
> > FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26
> > 10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE
> > i386
> >
> > % kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> > ...
> > Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
> > /usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283:
> internal-error: struct inferior *find_inferior_pid(int): Assertion `pid !=
> 0' failed.
> > A problem internal to GDB has been detected,
> > further debugging may prove unreliable.
>
> Hi,
>
> Do you have a /var/crash/core.txt.1 file?
> It may have some useful info. Also, did you try the old version,
> /usr/libexec/kgdb?
>

I got this same error, btw, when I pointed kgdb at the wrong kernel for the
core file.

But you can 'more' the uncompressed vmcore to get at least a traceback if
the proper kernel is gone...

Warner
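
As a rough illustration of the fallback mentioned above (paging through the
uncompressed vmcore when the matching kernel is gone), the following Python
sketch pulls printable strings out of a dump and keeps the ones that mention
"panic". It is only a sketch under stated assumptions: the helper names
printable_runs and panic_lines are made up here, the 8-character minimum is
arbitrary, and the caller is assumed to have read the dump into memory, which
may be impractical for a large vmcore.

```python
import re


def printable_runs(data: bytes, min_len: int = 8):
    """Return runs of printable ASCII at least min_len long, like strings(1)."""
    # min_len is an arbitrary threshold chosen for this sketch.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def panic_lines(data: bytes):
    """Keep only the printable runs that mention 'panic' (case-insensitive)."""
    return [s for s in printable_runs(data) if "panic" in s.lower()]
```

The intended use would be something like
panic_lines(open("/var/crash/vmcore.1", "rb").read()); running strings(1) on
the dump and filtering with grep does much the same from the shell.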

-- 
> WBR, Andrey V. Elsukov
>
>


Re: kernel panic and fun debugging

2020-10-17 Thread Andrey V. Elsukov
On 15.10.2020 09:56, Steve Kargl wrote:
> Just had a kernel panic.  Best info I can give you is
> 
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26 
> 10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE  i386
> 
> % kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
> ...
> Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
> /usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283: internal-error: 
> struct inferior *find_inferior_pid(int): Assertion `pid != 0' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.

Hi,

Do you have a /var/crash/core.txt.1 file?
It may have some useful info. Also, did you try the old version,
/usr/libexec/kgdb?

-- 
WBR, Andrey V. Elsukov





kernel panic and fun debugging

2020-10-15 Thread Steve Kargl
Just had a kernel panic.  Best info I can give you is

% uname -a
FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT #1 r366176M: Sat Sep 26 
10:35:23 PDT 2020 kargl@mobile:/usr/obj/usr/src/i386.i386/sys/MOBILE  i386

% kgdb gdb /usr/lib/debug/boot/kernel/kernel.debug vmcore.1
...
Reading symbols from /usr/lib/debug/boot/kernel/kernel.debug...
/usr/ports/devel/gdb/work-py37/gdb-9.2/gdb/inferior.c:283: internal-error: 
struct inferior *find_inferior_pid(int): Assertion `pid != 0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

Whoops.

-- 
Steve


Re: iflib/bridge kernel panic

2020-10-06 Thread Shawn Webb
On Tue, Sep 29, 2020 at 05:36:15PM -0400, Shawn Webb wrote:
> On Tue, Sep 29, 2020 at 11:20:44PM +0200, Kristof Provost wrote:
> > 
> > 
> > On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
> > 
> > > Quoting Kristof Provost  (from Mon, 28 Sep 2020 13:53:16
> > > +0200):
> > > 
> > > > On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> > > > > Quoting Kristof Provost  (from Sun, 27 Sep 2020
> > > > > 17:51:32 +0200):
> > > > > > > Here’s an early version of a task queue based approach: 
> > > > > > http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
> > > > > > 
> > > > > > That still needs to be cleaned up, but this should resolve
> > > > > > the sleep issue and the LOR.
> > > > > 
> > > > > There are some issues... seems like inside a jail I can't ping
> > > > > systems outside of the hardware.
> > > > > 
> > > > > Bridge setup:
> > > > > >   - member jail A
> > > > > >   - member jail B
> > > > > >   - member external_if of host
> > > > > 
> > > > > If I ping the router from the host, it works. If I ping from one
> > > > > jail to another, it works. If I ping from the jail to the IP of
> > > > > the external_if, it works. If I ping from a jail to the router,
> > > > > I do not get a response.
> > > > > 
> > > > Can you check for 'failed ifpromisc' error messages in dmesg? And
> > > > verify that all bridge member interfaces are in promiscuous mode?
> > > 
> > > I have a panic for you...:
> > >  - startup still in progress = 22 jails in startup, somewhere after a
> > > few jails started the panic happened
> > >  - tcpdump was running on the external interface
> > >  - a ping to a jail IP from another system was running, the first ping
> > > went through, then it panicked
> > > 
> > > First regarding your questions about promisc mode: no error, but the
> > > promisc mode is directly disabled again on all interfaces.
> > > 
> > I think I see why you had issues with the promiscuous setting. I’ve
> > updated the patch to be even more horrific than it was before.
> > 
> > I can’t explain the panic, and the backtrace also doesn’t appear to be
> > directly related to this patch. Not sure what’s going on with that.
> 
> I should have time to test the new patch this weekend. ${LIFE} is
> keeping me busy the past few weeks. I'm gonna add an event in my
> calendar to remind me to test the patch. heh.

Sorry for the delay. I rebuilt with the new patch this morning.
Looking good on all fronts, including LORs.

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:  0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc




Re: iflib/bridge kernel panic

2020-10-05 Thread Dustin Marquess
On Sat, Oct 3, 2020 at 2:54 PM Felix Kronlage-Dammers  
wrote:
>
> Alexander Leidinger wrote on 03.10.20 17:37:
>
> > Quoting Kristof Provost  (from Sat, 03 Oct 2020 16:06:43
> > +0200):
>
> >> Okay, let’s abandon that patch. It’s ugly and it doesn’t work.
> >>
> >> Here’s a different approach that I’m much happier with.
> >> https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch
> >>
> >>
> >> It passes the regression tests with WITNESS and INVARIANTS enabled,
> >> and a hack in the epair ioctl() handler to make it sleep (to look a
> >> bit like the Intel ioctl() handler that currently trips up if_bridge).
> > Works for me.
> > No crash, no LOR, promisc-mode stays enabled, jails are reachable.
>
> indeed! I can second that. Works nicely, my machine does not panic
> anymore and machines (bhyve vms) behind the bridge are reachable.

I third that, it works great for me!

-Dustin


Re: iflib/bridge kernel panic

2020-10-03 Thread Felix Kronlage-Dammers
Alexander Leidinger wrote on 03.10.20 17:37:

> Quoting Kristof Provost  (from Sat, 03 Oct 2020 16:06:43
> +0200):

>> Okay, let’s abandon that patch. It’s ugly and it doesn’t work.
>>
>> Here’s a different approach that I’m much happier with.
>> https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch
>>
>>
>> It passes the regression tests with WITNESS and INVARIANTS enabled,
>> and a hack in the epair ioctl() handler to make it sleep (to look a
>> bit like the Intel ioctl() handler that currently trips up if_bridge).
> Works for me.
> No crash, no LOR, promisc-mode stays enabled, jails are reachable.

indeed! I can second that. Works nicely, my machine does not panic
anymore and machines (bhyve vms) behind the bridge are reachable.


felix

-- 
GPG/PGP: 7A0B612C / 5F4D 9B06 C240 3250 35BF 66ED 1AD3 A9B8 7A0B 612C
https://hazardous.org/ - f...@hazardous.org - fkr@irc - @felixkronlage


Re: iflib/bridge kernel panic

2020-10-03 Thread Alexander Leidinger
Quoting Kristof Provost (from Sat, 03 Oct 2020 16:06:43 +0200):

> Okay, let’s abandon that patch. It’s ugly and it doesn’t work.
>
> Here’s a different approach that I’m much happier with.
> https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch
>
> It passes the regression tests with WITNESS and INVARIANTS enabled,
> and a hack in the epair ioctl() handler to make it sleep (to look a
> bit like the Intel ioctl() handler that currently trips up if_bridge).


Works for me.

No crash, no LOR, promisc-mode stays enabled, jails are reachable.

Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: iflib/bridge kernel panic

2020-10-03 Thread Kristof Provost

On 30 Sep 2020, at 13:52, Alexander Leidinger wrote:
> Quoting Kristof Provost (from Tue, 29 Sep 2020 23:20:44 +0200):
>
>> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>>
>>> Quoting Kristof Provost (from Mon, 28 Sep 2020 13:53:16 +0200):
>>>
>>>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>>>> Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):
>>>>>> Here’s an early version of a task queue based approach:
>>>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>>>
>>>>>> That still needs to be cleaned up, but this should resolve the
>>>>>> sleep issue and the LOR.
>>>>>
>>>>> There are some issues... seems like inside a jail I can't ping
>>>>> systems outside of the hardware.
>>>>>
>>>>> Bridge setup:
>>>>>   - member jail A
>>>>>   - member jail B
>>>>>   - member external_if of host
>>>>>
>>>>> If I ping the router from the host, it works. If I ping from one
>>>>> jail to another, it works. If I ping from the jail to the IP of
>>>>> the external_if, it works. If I ping from a jail to the router, I
>>>>> do not get a response.
>>>>
>>>> Can you check for 'failed ifpromisc' error messages in dmesg? And
>>>> verify that all bridge member interfaces are in promiscuous mode?
>>>
>>> I have a panic for you...:
>>> - startup still in progress = 22 jails in startup, somewhere after a
>>>   few jails started the panic happened
>>> - tcpdump was running on the external interface
>>> - a ping to a jail IP from another system was running, the first
>>>   ping went through, then it panicked
>>>
>>> First regarding your questions about promisc mode: no error, but the
>>> promisc mode is directly disabled again on all interfaces.
>>
>> I think I see why you had issues with the promiscuous setting. I’ve
>> updated the patch to be even more horrific than it was before.
>
> Hmmm same behavior as before.
> I haven't kept the old version of the patch, so I can't compare if I
> somehow downloaded the old version again, or if I got the updated
> one...

Okay, let’s abandon that patch. It’s ugly and it doesn’t work.

Here’s a different approach that I’m much happier with.
https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch

It passes the regression tests with WITNESS and INVARIANTS enabled, and
a hack in the epair ioctl() handler to make it sleep (to look a bit like
the Intel ioctl() handler that currently trips up if_bridge).


Best,
Kristof


Re: iflib/bridge kernel panic

2020-10-02 Thread Dustin Marquess
On Tue, Sep 29, 2020 at 4:21 PM Kristof Provost  wrote:
>
> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>
> > Quoting Kristof Provost  (from Mon, 28 Sep 2020
> > 13:53:16 +0200):
> >
> >> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> >>> Quoting Kristof Provost  (from Sun, 27 Sep 2020
> >>> 17:51:32 +0200):
> >>>> Here’s an early version of a task queue based approach:
> >>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
> >>>>
> >>>> That still needs to be cleaned up, but this should resolve the
> >>>> sleep issue and the LOR.
> >>>
> >>> There are some issues... seems like inside a jail I can't ping
> >>> systems outside of the hardware.

So my results are similar to the others, more or less.  Using the original
https://reviews.freebsd.org/D26418 patch, everything seems to work
fine.  Using the newer
http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
patch, bhyve VMs on the bridge attached to the igb/em(5) interfaces
don't pass traffic.  The bhyve VMs on the bridge attached to the
cxgbe(4) interfaces, however, work fine.

-Dustin


Re: iflib/bridge kernel panic

2020-09-30 Thread Alexander Leidinger


Quoting Kristof Provost (from Tue, 29 Sep 2020 23:20:44 +0200):

> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>
>> Quoting Kristof Provost (from Mon, 28 Sep 2020 13:53:16 +0200):
>>
>>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>>> Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):
>>>>> Here’s an early version of a task queue based approach:
>>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>>
>>>>> That still needs to be cleaned up, but this should resolve the
>>>>> sleep issue and the LOR.
>>>>
>>>> There are some issues... seems like inside a jail I can't ping
>>>> systems outside of the hardware.
>>>>
>>>> Bridge setup:
>>>>   - member jail A
>>>>   - member jail B
>>>>   - member external_if of host
>>>>
>>>> If I ping the router from the host, it works. If I ping from one
>>>> jail to another, it works. If I ping from the jail to the IP of
>>>> the external_if, it works. If I ping from a jail to the router, I
>>>> do not get a response.
>>>
>>> Can you check for 'failed ifpromisc' error messages in dmesg? And
>>> verify that all bridge member interfaces are in promiscuous mode?
>>
>> I have a panic for you...:
>> - startup still in progress = 22 jails in startup, somewhere after
>>   a few jails started the panic happened
>> - tcpdump was running on the external interface
>> - a ping to a jail IP from another system was running, the first
>>   ping went through, then it panicked
>>
>> First regarding your questions about promisc mode: no error, but
>> the promisc mode is directly disabled again on all interfaces.
>
> I think I see why you had issues with the promiscuous setting. I’ve
> updated the patch to be even more horrific than it was before.

Hmmm same behavior as before.
I haven't kept the old version of the patch, so I can't compare if I
somehow downloaded the old version again, or if I got the updated one...


# md5 0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
MD5 (0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch) =  
9f107739e29fad5c9bb5e75e2dae7bcc
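
The question of whether the old or the updated patch was downloaded comes down
to comparing checksums. A minimal Python sketch of that comparison follows; the
function names file_md5 and same_patch are invented for illustration, and the
checksum of the updated patch would have to come from its author, since none is
given in this thread.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_patch(path, expected_md5):
    """True if the file at path has the expected MD5 (hypothetical helper)."""
    return file_md5(path) == expected_md5.strip().lower()
```

With a known digest for each revision of the patch, same_patch() would tell the
two downloads apart; md5(1) as used above gives the same digest.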


> I can’t explain the panic, and the backtrace also doesn’t appear to
> be directly related to this patch. Not sure what’s going on with that.


Then let's hope for now it is some kind of defect which is not showing  
up when it works as it should... we can have a look at it again in  
case it reproduces with the final patch.


Bye,
Alexander.


--
http://www.Leidinger.net  alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: iflib/bridge kernel panic

2020-09-29 Thread Shawn Webb
On Tue, Sep 29, 2020 at 11:20:44PM +0200, Kristof Provost wrote:
> 
> 
> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
> 
> > Quoting Kristof Provost  (from Mon, 28 Sep 2020 13:53:16
> > +0200):
> > 
> > > On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> > > > Quoting Kristof Provost  (from Sun, 27 Sep 2020
> > > > 17:51:32 +0200):
> > > > > Here’s an early version of a task queue based approach: 
> > > > > http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
> > > > > 
> > > > > That still needs to be cleaned up, but this should resolve
> > > > > the sleep issue and the LOR.
> > > > 
> > > > There are some issues... seems like inside a jail I can't ping
> > > > systems outside of the hardware.
> > > > 
> > > > Bridge setup:
> > > >   - member jail A
> > > >   - member jail B
> > > >   - member external_if of host
> > > > 
> > > > If I ping the router from the host, it works. If I ping from one
> > > > jail to another, it works. If I ping from the jail to the IP of
> > > > the external_if, it works. If I ping from a jail to the router,
> > > > I do not get a response.
> > > > 
> > > Can you check for 'failed ifpromisc' error messages in dmesg? And
> > > verify that all bridge member interfaces are in promiscuous mode?
> > 
> > I have a panic for you...:
> >  - startup still in progress = 22 jails in startup, somewhere after a
> > few jails started the panic happened
> >  - tcpdump was running on the external interface
> >  - a ping to a jail IP from another system was running, the first ping
> > went through, then it panicked
> > 
> > First regarding your questions about promisc mode: no error, but the
> > promisc mode is directly disabled again on all interfaces.
> > 
> I think I see why you had issues with the promiscuous setting. I’ve
> updated the patch to be even more horrific than it was before.
> 
> I can’t explain the panic, and the backtrace also doesn’t appear to be
> directly related to this patch. Not sure what’s going on with that.

I should have time to test the new patch this weekend. ${LIFE} is
keeping me busy the past few weeks. I'm gonna add an event in my
calendar to remind me to test the patch. heh.

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:  0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc




Re: iflib/bridge kernel panic

2020-09-29 Thread Kristof Provost



On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:

> Quoting Kristof Provost (from Mon, 28 Sep 2020 13:53:16 +0200):
>
>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>> Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):
>>>> Here’s an early version of a task queue based approach:
>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>
>>>> That still needs to be cleaned up, but this should resolve the
>>>> sleep issue and the LOR.
>>>
>>> There are some issues... seems like inside a jail I can't ping
>>> systems outside of the hardware.
>>>
>>> Bridge setup:
>>>    - member jail A
>>>    - member jail B
>>>    - member external_if of host
>>>
>>> If I ping the router from the host, it works. If I ping from one
>>> jail to another, it works. If I ping from the jail to the IP of the
>>> external_if, it works. If I ping from a jail to the router, I do not
>>> get a response.
>>
>> Can you check for 'failed ifpromisc' error messages in dmesg? And
>> verify that all bridge member interfaces are in promiscuous mode?
>
> I have a panic for you...:
>  - startup still in progress = 22 jails in startup, somewhere after a
>    few jails started the panic happened
>  - tcpdump was running on the external interface
>  - a ping to a jail IP from another system was running, the first ping
>    went through, then it panicked
>
> First regarding your questions about promisc mode: no error, but the
> promisc mode is directly disabled again on all interfaces.

I think I see why you had issues with the promiscuous setting. I’ve
updated the patch to be even more horrific than it was before.

I can’t explain the panic, and the backtrace also doesn’t appear to
be directly related to this patch. Not sure what’s going on with that.


Kristof


Re: iflib/bridge kernel panic

2020-09-28 Thread Alexander Leidinger


Quoting Kristof Provost (from Mon, 28 Sep 2020 13:53:16 +0200):

> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>> Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):
>>> Here’s an early version of a task queue based approach:
>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>
>>> That still needs to be cleaned up, but this should resolve the
>>> sleep issue and the LOR.
>>
>> There are some issues... seems like inside a jail I can't ping
>> systems outside of the hardware.
>>
>> Bridge setup:
>>    - member jail A
>>    - member jail B
>>    - member external_if of host
>>
>> If I ping the router from the host, it works. If I ping from one
>> jail to another, it works. If I ping from the jail to the IP of the
>> external_if, it works. If I ping from a jail to the router, I do
>> not get a response.
>
> Can you check for 'failed ifpromisc' error messages in dmesg? And
> verify that all bridge member interfaces are in promiscuous mode?

I have a panic for you...:
 - startup still in progress = 22 jails in startup, somewhere after a
   few jails started the panic happened
 - tcpdump was running on the external interface
 - a ping to a jail IP from another system was running, the first
   ping went through, then it panicked

First regarding your questions about promisc mode: no error, but the
promisc mode is directly disabled again on all interfaces.


Data (external_if = igb0, jail epairs are j_X_Yif with X the ID of the  
jail and Y either h like host-side or j like jail-side):

---snip---
Host:

# ifconfig -a
igb0: flags=8863 metric 0 mtu 1500
        options=4a520b9
        ether [...]:a4
        inet 192.168.1.x netmask 0xff00 broadcast 192.168.1.255
        inet6 fe80::[...]a4%igb0 prefixlen 64 scopeid 0x1
        inet6 fd73:[...] prefixlen 64
        inet6 2003:[...] prefixlen 64 autoconf
        inet6 fd73:[...] prefixlen 64 autoconf
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=23
igb1: flags=8822 metric 0 mtu 1500
        options=4e527bb
        ether [...]:a5
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29
lo0: flags=8049 metric 0 mtu 16384
        options=680003
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff00
        groups: lo
        nd6 options=21
vswitch0: flags=8843 metric 0 mtu 1500
        ether [...]:a3
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: j_weather_hif flags=143
                ifmaxaddr 0 port 9 priority 128 path cost 2000
        member: j_web_hif flags=143
                ifmaxaddr 0 port 8 priority 128 path cost 2000
        member: j_commit_hif flags=143
                ifmaxaddr 0 port 7 priority 128 path cost 2000
        member: j_video_hif flags=143
                ifmaxaddr 0 port 6 priority 128 path cost 2000
        member: j_dns_hif flags=143
                ifmaxaddr 0 port 5 priority 128 path cost 2000
        member: igb0 flags=143
                ifmaxaddr 0 port 1 priority 128 path cost 2
        groups: bridge
        nd6 options=9
j_dns_hif: flags=8843 metric 0 mtu 1500
        options=8
        ether [...]:0a
        hwaddr [...]:0a
        inet6 fe80::[...]0a%j_dns_hif prefixlen 64 scopeid 0x5
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T )
        status: active
        nd6 options=21
[... some more jail interfaces ...]

# dmesg | grep promis
igb0: promiscuous mode enabled
igb0: promiscuous mode disabled
j_dns_hif: promiscuous mode enabled
j_dns_hif: promiscuous mode disabled
[... some more like this ...]

# jexec 2 ifconfig -a
lo0: flags=8049 metric 0 mtu 16384
        options=680003
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        inet 127.0.0.1 netmask 0xff00
        groups: lo
        nd6 options=21
j_dns_jif: flags=8843 metric 0 mtu 1500
        options=8
        ether [...]:0b
        hwaddr [...]:0b
        inet 192.168.1.y netmask 0xff00 broadcast 192.168.1.255
        inet6 fe80::[...]0b%j_dns_jif prefixlen 64 scopeid 0x2
        inet6 fd73:[...]:y prefixlen 64
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T )
        status: active
        nd6 options=21
---snip---
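
The "promiscuous mode enabled" / "promiscuous mode disabled" pairs in the dmesg
excerpt above are what "directly disabled again" refers to. The following
Python sketch reduces such a log to the last state seen per interface. It
assumes the message format is exactly as shown in the excerpt; the function
name final_promisc_state is made up for this illustration.

```python
import re

# Matches lines such as "igb0: promiscuous mode enabled" (format taken
# from the dmesg excerpt in this message).
PROMISC_RE = re.compile(r"^(\S+): promiscuous mode (enabled|disabled)$")


def final_promisc_state(dmesg_text):
    """Map each interface name to the last promiscuous state logged for it."""
    state = {}
    for line in dmesg_text.splitlines():
        match = PROMISC_RE.match(line.strip())
        if match:
            # Later lines overwrite earlier ones, so the last message wins.
            state[match.group(1)] = match.group(2) == "enabled"
    return state
```

Fed the four lines quoted above, this would report both igb0 and j_dns_hif as
not promiscuous, which matches the described behavior.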

And here the backtrace of the panic:
---snip---
panic: if_setflag: decrement non-positive refcount 0 for flag 256
cpuid = 4
time = 1601300532
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0378ea3920
vpanic() at vpanic+0x182/frame 0xfe0378ea3970
panic() at panic+0x43/frame 0xfe0378ea39d0
if_setflag() at if_setflag+0x137/frame 0xfe0378ea3a30
ifpromisc() at ifpromisc+0x2a/frame 0xfe0378ea3a60
bpf_detachd_locked() at bpf_detachd_locked+0x280/frame 0xfe0378ea3ab0
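
The panic string reports a reference-count check failing for flag 256, which is
0x100, the value of IFF_PROMISC. The Python toy model below (not the kernel
code; the Interface class and set_promisc method are invented for illustration)
shows how a single unmatched "disable" produces this kind of failure. Which
caller dropped the extra reference is not established in this thread, so the
sequence in the usage note is only one possible reading of the backtrace.

```python
IFF_PROMISC = 0x100  # decimal 256, the flag named in the panic message


class Interface:
    """Toy stand-in for an interface with a counted promiscuous flag."""

    def __init__(self, name):
        self.name = name
        self.flags = 0
        self.promisc_refs = 0

    def set_promisc(self, on):
        if on:
            # Every enable takes a reference and sets the flag.
            self.promisc_refs += 1
            self.flags |= IFF_PROMISC
            return
        if self.promisc_refs <= 0:
            # Stands in for the kernel assertion seen in the backtrace.
            raise RuntimeError(
                "decrement non-positive refcount %d for flag %d"
                % (self.promisc_refs, IFF_PROMISC))
        self.promisc_refs -= 1
        if self.promisc_refs == 0:
            # Last reference gone: clear the flag.
            self.flags &= ~IFF_PROMISC
```

In this model, one enable followed by two disables raises on the second
disable. If the count on the real interface had already reached zero when
tcpdump's BPF descriptor detached, the result would look like the panic above.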

Re: iflib/bridge kernel panic

2020-09-28 Thread Alexander Leidinger


Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):

> On 21 Sep 2020, at 14:16, Shawn Webb wrote:
>
>> On Mon, Sep 21, 2020 at 09:57:40AM +0200, Kristof Provost wrote:
>>
>>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>>
>>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>>
>>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>>
>>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>>> bisect commits, I'll try to figure out the culprit.
>>>
>>> Try https://reviews.freebsd.org/D26418
>>
>> That seems to fix the kernel panic. dmesg gets spammed with a freak
>> ton of these LOR messages now:
>
> Here’s an early version of a task queue based approach:
> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>
> That still needs to be cleaned up, but this should resolve the sleep
> issue and the LOR.

There are some issues... seems like inside a jail I can't ping systems
outside of the hardware.

Bridge setup:
- member jail A
- member jail B
- member external_if of host

If I ping the router from the host, it works. If I ping from one jail
to another, it works. If I ping from the jail to the IP of the
external_if, it works. If I ping from a jail to the router, I do not
get a response.


Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: iflib/bridge kernel panic

2020-09-28 Thread Kristof Provost

On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> Quoting Kristof Provost (from Sun, 27 Sep 2020 17:51:32 +0200):
>> Here’s an early version of a task queue based approach:
>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>
>> That still needs to be cleaned up, but this should resolve the sleep
>> issue and the LOR.
>
> There are some issues... seems like inside a jail I can't ping systems
> outside of the hardware.
>
> Bridge setup:
> - member jail A
> - member jail B
> - member external_if of host
>
> If I ping the router from the host, it works. If I ping from one jail
> to another, it works. If I ping from the jail to the IP of the
> external_if, it works. If I ping from a jail to the router, I do not
> get a response.


Can you check for 'failed ifpromisc' error messages in dmesg? And verify 
that all bridge member interfaces are in promiscuous mode?
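
One way to carry out the second check is to look at the hexadecimal flags=
value that ifconfig prints for each interface and test the IFF_PROMISC bit
(0x100). The Python sketch below does that for captured ifconfig output. The
function name promisc_by_interface is invented for this illustration, and the
sketch assumes each interface starts on a line of the form "name: flags=HEX".

```python
import re

IFF_PROMISC = 0x100  # promiscuous-mode bit in the interface flags word

# Interface header lines look like "igb0: flags=8863 ... mtu 1500".
FLAGS_RE = re.compile(r"^(\S+): flags=([0-9a-fA-F]+)")


def promisc_by_interface(ifconfig_text):
    """Map interface name to True if its flags word has IFF_PROMISC set."""
    result = {}
    for line in ifconfig_text.splitlines():
        match = FLAGS_RE.match(line)
        if match:
            flags = int(match.group(2), 16)  # ifconfig prints flags in hex
            result[match.group(1)] = bool(flags & IFF_PROMISC)
    return result
```

For example, a header line with flags=8863 decodes to not promiscuous, while a
value such as 8963 (a made-up example) has the 0x100 bit set.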


Kristof


Re: iflib/bridge kernel panic

2020-09-27 Thread Kristof Provost

On 21 Sep 2020, at 14:16, Shawn Webb wrote:
> On Mon, Sep 21, 2020 at 09:57:40AM +0200, Kristof Provost wrote:
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>
>> Try https://reviews.freebsd.org/D26418
>
> That seems to fix the kernel panic. dmesg gets spammed with a freak
> ton of these LOR messages now:

Here’s an early version of a task queue based approach:
http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch

That still needs to be cleaned up, but this should resolve the sleep
issue and the LOR.


Best regards,
Kristof


Re: iflib/bridge kernel panic

2020-09-26 Thread S.N. Trigub

Hi!

There is a serious issue in the kernel related to network interfaces.
See my message "speedtest.net in multi connections mode causes the
FreeBSD 13-CURRENT router to crash" from 28 Aug 2020.

I noticed that the kernel panics every time users on client computers
run speedtest.net in multi-connection mode in a browser.
If my external network interface uses a VLAN, the panic occurs when the
uplink speed is 100 Mbit/s.
Without a VLAN, the speedtest passes without any problems on a 100 Mbit/s
link, but the kernel panics every time on a 1 Gbit/s uplink.
During the crash, the console screen goes blank and the server (router)
stops responding to the keyboard.

Can anyone run this test on their machine?

Sergei.


From: xt
Sent: Friday, September 25, 2020 8:46 PM
To: Sergey V. Dyatko ; Kristof Provost
Cc: FreeBSD Current
Subject: Re: iflib/bridge kernel panic

Sergey V. Dyatko wrote:
> On Mon, 21 Sep 2020 09:57:40 +0200
> "Kristof Provost" wrote:
>
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>
>> Try https://reviews.freebsd.org/D26418
>>
>> Best regards,
>> Kristof
>
> I'm not sure, but doesn't this panic have the same root cause as mine?
> Sorry, but I don't have a text console and can only post screenshot[s]
> from IP-KVM:
> https://gyazo.com/fee41c5267e9fc543d43901e498b7c94
>
> rc.conf has something like:
> cloned_interfaces="lagg0 vlan101"
> ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 x.x.x.x/mask"
> ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"
>
> Without the VLAN part everything works fine.
> Installed from FreeBSD-13.0-CURRENT-amd64-20200924-3c514403bef-disc1.iso

Yes, same panic.


Re: iflib/bridge kernel panic

2020-09-25 Thread xt

Sergey V. Dyatko wrote:
> On Mon, 21 Sep 2020 09:57:40 +0200
> "Kristof Provost" wrote:
>
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>
>> Try https://reviews.freebsd.org/D26418
>>
>> Best regards,
>> Kristof
>
> I'm not sure, but doesn't this panic have the same root cause as mine?
> Sorry, but I don't have a text console and can only post screenshot[s]
> from IP-KVM:
> https://gyazo.com/fee41c5267e9fc543d43901e498b7c94
>
> rc.conf has something like:
> cloned_interfaces="lagg0 vlan101"
> ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 x.x.x.x/mask"
> ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"
>
> Without the VLAN part everything works fine.
> Installed from FreeBSD-13.0-CURRENT-amd64-20200924-3c514403bef-disc1.iso

Yes, same panic.


Re: iflib/bridge kernel panic

2020-09-25 Thread Sergey V. Dyatko
On Mon, 21 Sep 2020 09:57:40 +0200
"Kristof Provost"  wrote: 

> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
> > From latest HEAD on a Dell Precision 7550 laptop:
> >
> > https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
> >
> > The last working boot environment was 14 Aug 2020. If I get some time to
> > bisect commits, I'll try to figure out the culprit.
> >  
> Try https://reviews.freebsd.org/D26418
> 
> Best regards,
> Kristof

I'm not sure, but doesn't this panic have the same root cause as mine?
Sorry, but I don't have a text console and can only post screenshot[s]
from IP-KVM:
https://gyazo.com/fee41c5267e9fc543d43901e498b7c94

rc.conf has something like:
cloned_interfaces="lagg0 vlan101"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 x.x.x.x/mask"
ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"

Without the VLAN part everything works fine.
Installed from FreeBSD-13.0-CURRENT-amd64-20200924-3c514403bef-disc1.iso




--
wbr, Sergey



Re: iflib/bridge kernel panic

2020-09-23 Thread Kristof Provost
On 23 Sep 2020, at 19:37, xto...@hotmail.com wrote:
> Kristof Provost wrote:
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>>
>> Try https://reviews.freebsd.org/D26418
>
> Anything stopping this from being integrated?

Yes, it’s not correct.

I’ve got this on my todo list. I think I know how to fix it better.

Best regards,
Kristof

