Re: Kernel panic on armv7 when PF is enabled

2022-05-03 Thread Marek Zarychta

On 2.05.2022 at 11:02, Kristof Provost wrote:

On 1 May 2022, at 5:13, qroxana wrote:

After git bisecting the panic started since this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113

Author: Kristof Provost <k...@freebsd.org>

Date: Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after: 1 week

sys/net/if_ethersubr.c | 9 +

sys/net/if_vlan.c | 5 +++--

2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from a NFS root,

it can boot without any problem if PF is disabled.

Any helps?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @
/usr/src/sys/netinet/tcp_usrreq.c:1008
stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10


The commit you point at is entirely unrelated to the code where the 
panic occurred, so I’m pretty sure something went wrong in your bisect.


I was also seeing this panic on armv7 running FreeBSD 14.0-CURRENT #1 
main-n253028-2ec9a427c85: Tue Feb  8 17:49:25 CET 2022, so it is 
unrelated to the aforementioned commit, which is dated 2022-02-14.


It's a very interesting and weird bug: loading pf.ko, enabling PF, and 
loading the rules all work as expected, but the panic is triggered once 
PF starts processing filtered traffic.




The backtrace would suggest the issue occurs in the 
pf_syncookie_validate() function, and likely in the line
`if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`


The obvious way for that to panic would be to call it without the 
curvnet context set, but pf_test() uses it earlier, so that’s going to 
be fine.


Given that this is unique to armv7 I’d recommend talking to the armv7 
maintainer about 64 bit atomic operations.


You can probably avoid the atomic load with this patch (and not enabling 
syncookie support):


diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
 	ack = ntohl(pd->hdr.tcp.th_ack) - 1;
 	cookie.cookie = (ack & 0xff) ^ (ack >> 24);
 
+	if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+		return (0);
+
 	/* we don't know oddeven before setting the cookie (union) */
 	if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
 	    == 0)


That shouldn’t be required though.

Br,
Kristof




--
Marek Zarychta




Re: Kernel panic on armv7 when PF is enabled

2022-05-03 Thread qroxana


On Monday, May 2nd, 2022 at 9:02 AM, Kristof Provost  wrote:


> On 1 May 2022, at 5:13, qroxana wrote:
>
> > After git bisecting the panic started since this commit.
> >
> > commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> >
> > Author: Kristof Provost <k...@freebsd.org>
> >
> > Date: Mon Feb 14 20:09:54 2022 +0100
> >
> > vlan: allow net.link.vlan.mtag_pcp to be set per vnet
> >
> > The primary reason for this change is to facilitate testing.
> >
> > MFC after: 1 week
> >
> > sys/net/if_ethersubr.c | 9 +
> >
> > sys/net/if_vlan.c | 5 +++--
> >
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > The armv7 board boots from a NFS root,
> >
> > it can boot without any problem if PF is disabled.
> >
> > Any helps?
> >
> > add host ::1: gateway lo0 fib 0: route already in table
> > add net fe80::: gateway ::1
> > add net ff02::: gateway ::1
> > add net :::0.0.0.0: gateway ::1
> > add net ::0.0.0.0: gateway ::1
> > Enabling pf.
> > Kernel page fault with the following non-sleepable locks held:
> > shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
> > /usr/src/sys/netpfil/pf/pf.c:6493
> > exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
> > /usr/src/sys/netinet/tcp_usrreq.c:1008
> > stack backtrace:
> > #0 0xc0355cac at witness_debugger+0x7c
> > #1 0xc0356ef0 at witness_warn+0x3fc
> > #2 0xc05ec048 at abort_handler+0x1d8
> > #3 0xc05cb5ac at exception_exit+0
> > #4 0xe3083c10 at pf_syncookie_validate+0x60
> > #5 0xe30496a8 at pf_test+0x518
> > #6 0xe306d768 at pf_check_out+0x30
> > #7 0xc0415b44 at pfil_run_hooks+0xbc
> > #8 0xc0445cfc at ip_output+0xce8
> > #9 0xc045bc9c at tcp_default_output+0x20ac
> > #10 0xc0471eb4 at tcp_usr_send+0x1ac
> > #11 0xc0389464 at sosend_generic+0x490
> > #12 0xc0389790 at sosend+0x64
> > #13 0xc0502888 at clnt_vc_call+0x560
> > #14 0xc05009d8 at clnt_reconnect_call+0x170
> > #15 0xc01e7b14 at newnfs_request+0xb20
> > #16 0xc0230218 at nfscl_request+0x60
> > #17 0xc020d9bc at nfsrpc_getattr+0xb0
> > Fatal kernel mode data abort: 'Alignment Fault' on read
> > trapframe: 0xdf1f1c90
> > FSR=0001, FAR=d7840264, spsr=4013
> > r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
> > r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
> > r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
> > r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
>
> The commit you point at is entirely unrelated to the code where the panic 
> occurred, so I’m pretty sure something went wrong in your bisect.
>
> The backtrace would suggest the issue occurs in the pf_syncookie_validate() 
> function, and likely in the line
> `if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`
>
> The obvious way for that to panic would be to call it without the curvnet 
> context set, but pf_test() uses it earlier, so that’s going to be fine.
>
> Given that this is unique to armv7 I’d recommend talking to the armv7 
> maintainer about 64 bit atomic operations.
>
> You can probably avoid the atomic load with this patch (and not enabling 
> syncookie support):
>
> diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
> index 5230502be30c..c86d469d3cef 100644
> --- a/sys/netpfil/pf/pf_syncookies.c
> +++ b/sys/netpfil/pf/pf_syncookies.c
> @@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
>  	ack = ntohl(pd->hdr.tcp.th_ack) - 1;
>  	cookie.cookie = (ack & 0xff) ^ (ack >> 24);
>
> +	if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
> +		return (0);
> +
>  	/* we don't know oddeven before setting the cookie (union) */
>  	if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
>  	    == 0)
>
>
> That shouldn’t be required though.
>
> Br,
> Kristof

Thank you, sir. You were right.
I tested the patch with the latest kernel.
It boots successfully with the patch
and still panics without it.







Re: Kernel panic on armv7 when PF is enabled

2022-05-02 Thread Kristof Provost

On 1 May 2022, at 5:13, qroxana wrote:

After git bisecting the panic started since this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113

Author: Kristof Provost <k...@freebsd.org>

Date:   Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after:  1 week

sys/net/if_ethersubr.c | 9 +

sys/net/if_vlan.c  | 5 +++--

2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from a NFS root,

it can boot without any problem if PF is disabled.

Any helps?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
/usr/src/sys/netinet/tcp_usrreq.c:1008

stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10


The commit you point at is entirely unrelated to the code where the 
panic occurred, so I’m pretty sure something went wrong in your 
bisect.


The backtrace would suggest the issue occurs in the 
pf_syncookie_validate() function, and likely in the line
`if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`


The obvious way for that to panic would be to call it without the 
curvnet context set, but pf_test() uses it earlier, so that’s going to 
be fine.


Given that this is unique to armv7 I’d recommend talking to the armv7 
maintainer about 64 bit atomic operations.
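
One way to read the trapframe in light of this, offered as a sketch rather than a confirmed diagnosis: the faulting address (FAR=d7840264) is 4-byte aligned but not 8-byte aligned. ARMv7's 64-bit exclusive load (LDREXD) requires a doubleword-aligned address, so if atomic_load_64() is built on it on this platform (an assumption, not something established in this thread), a counter at such an address would be consistent with the reported 'Alignment Fault' on read. The helper name `misalignment` below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the byte offset of addr within an
 * align-sized window. align must be a power of two. A result of 0
 * means addr is aligned to that boundary.
 */
static uint32_t
misalignment(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1));
}
```

Calling `misalignment(0xd7840264u, 8)` with the FAR value from the trapframe returns 4, while `misalignment(0xd7840264u, 4)` returns 0: the address is fine for a 32-bit access but sits in the middle of an 8-byte window.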


You can probably avoid the atomic load with this patch (and not enabling 
syncookie support):


diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
 	ack = ntohl(pd->hdr.tcp.th_ack) - 1;
 	cookie.cookie = (ack & 0xff) ^ (ack >> 24);
 
+	if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+		return (0);
+
 	/* we don't know oddeven before setting the cookie (union) */
 	if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
 	    == 0)

That shouldn’t be required though.
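
The effect of the early return in the patch above can be illustrated with a small user-space sketch. Everything here is a hypothetical stand-in (the `sketch_*` names, the two-value mode enum, the `loads` counter, and passing `oddeven` in as a parameter); it is not the pf code. It only shows that when the mode is "never", the function returns before the 64-bit load is reached.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the pf types; not the kernel definitions. */
enum sketch_mode {
	SKETCH_SYNCOOKIES_NEVER,
	SKETCH_SYNCOOKIES_ALWAYS
};

struct sketch_status {
	enum sketch_mode mode;		/* plays the role of syncookies_mode */
	uint64_t	 inflight[2];	/* plays the role of syncookies_inflight */
	unsigned int	 loads;		/* counts 64-bit loads, for illustration */
};

/* Stand-in for atomic_load_64(): a plain load that records it was called. */
static uint64_t
sketch_load_64(struct sketch_status *st, unsigned int idx)
{
	st->loads++;
	return (st->inflight[idx]);
}

/*
 * Returns 0 when no cookie can be valid, 1 when the remaining checks
 * (omitted here) would have to run.
 */
static int
sketch_validate(struct sketch_status *st, unsigned int oddeven)
{
	/* Early return as in the patch: skip the 64-bit load entirely. */
	if (st->mode == SKETCH_SYNCOOKIES_NEVER)
		return (0);

	if (sketch_load_64(st, oddeven & 1u) == 0)
		return (0);

	return (1);
}
```

With the mode set to `SKETCH_SYNCOOKIES_NEVER`, `sketch_validate()` returns 0 and `loads` stays at 0, which mirrors why the patched kernel avoids the faulting instruction only as long as syncookie support is not enabled.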

Br,
Kristof


Re: Kernel panic on armv7 when PF is enabled

2022-05-02 Thread qroxana
On Sun, 01 May 2022 03:13:43 +, qroxana  wrote:

> After git bisecting the panic started since this commit.
>
> commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> Author: Kristof Provost 
> Date:   Mon Feb 14 20:09:54 2022 +0100
>
> vlan: allow net.link.vlan.mtag_pcp to be set per vnet
>
> The primary reason for this change is to facilitate testing.
>
> MFC after:  1 week
>
> sys/net/if_ethersubr.c | 9 +
> sys/net/if_vlan.c  | 5 +++--
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> The armv7 board boots from a NFS root,
>
> it can boot without any problem if PF is disabled.

It appears this only occurs when the root filesystem is on NFS.
I also tried booting from a micro SD card, and there was no kernel panic.

Another workaround to avoid the panic is to delay
starting /etc/rc.d/pf until SERVERS:

--- pf.orig 2022-03-12 12:26:47.0 +
+++ pf  2022-05-02 02:59:28.131026862 +
@@ -4,7 +4,7 @@
 #

 # PROVIDE: pf
-# REQUIRE: FILESYSTEMS netif pflog pfsync routing
+# REQUIRE: SERVERS netif pflog pfsync routing
 # KEYWORD: nojailvnet

 . /etc/rc.subr

Thanks,

Kernel panic on armv7 when PF is enabled

2022-04-30 Thread qroxana
According to git bisect, the panic started with this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113

Author: Kristof Provost <k...@freebsd.org>

Date:   Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after:  1 week

sys/net/if_ethersubr.c | 9 +

sys/net/if_vlan.c  | 5 +++--

2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from an NFS root;

it boots without any problem if PF is disabled.

Any help?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
/usr/src/sys/netinet/tcp_usrreq.c:1008
stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10

panic: Fatal abort
cpuid = 1
time = 1651366089
KDB: stack backtrace:
db_trace_self() at db_trace_self
 pc = 0xc05c8c00  lr = 0xc007ac8c (db_trace_self_wrapper+0x30)
 sp = 0xdf1f1a68  fp = 0xdf1f1b80
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
 pc = 0xc007ac8c  lr = 0xc02e289c (vpanic+0x170)
 sp = 0xdf1f1b88  fp = 0xdf1f1ba8
 r4 = 0x0100  r5 = 0x
 r6 = 0xc0780529  r7 = 0xc090ea10
vpanic() at vpanic+0x170
 pc = 0xc02e289c  lr = 0xc02e264c (doadump)
 sp = 0xdf1f1bb0  fp = 0xdf1f1bb4
 r4 = 0xdf1f1c90  r5 = 0x0013
 r6 = 0xd7840264  r7 = 0x0001
 r8 = 0x0001  r9 = 0xdb5527c0
r10 = 0xd7840264
doadump() at doadump
 pc = 0xc02e264c  lr = 0xc05ec698 (abort_align)
 sp = 0xdf1f1bbc  fp = 0xdf1f1be8
 r4 = 0xd7840264  r5 = 0xdf1f1bb4
 r6 = 0xc02e264c r10 = 0xdf1f1bbc
abort_align() at abort_align
 pc = 0xc05ec698  lr = 0xc05ec198 (abort_handler+0x328)
 sp = 0xdf1f1bf0  fp = 0xdf1f1c88
 r4 = 0x0013  r5 = 0xd7840264
abort_handler() at abort_handler+0x328
 pc = 0xc05ec198  lr = 0xc05cb5ac (exception_exit)
 sp = 0xdf1f1c90  fp = 0xdf1f1d38
 r4 = 0xdf1f1e00  r5 = 0xdac0d75f
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0xc093e5e4  r9 = 0x0001
r10 = 0xdf1f1f5c
exception_exit() at exception_exit
 pc = 0xc05cb5ac  lr = 0xe3083bdc (pf_syncookie_validate+0x2c)
 sp = 0xdf1f1d20  fp = 0xdf1f1d38
 r0 = 0x6a228eda  r1 = 0xdac0d785
 r2 = 0xd7840264  r3 = 0xdb5527c0
 r4 = 0xdf1f1e00  r5 = 0xdac0d75f
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0xc093e5e4  r9 = 0x0001
r10 = 0xdf1f1f5c r12 = 0xe3098dd0
pf_syncookie_validate() at pf_syncookie_validate+0x60
 pc = 0xe3083c10  lr = 0xe30496a8 (pf_test+0x518)
 sp = 0xdf1f1d40  fp = 0xdf1f1ea8
 r4 = 0x0002  r5 = 0xdb4a6100
 r6 = 0x0018  r7 = 0xd9422c00
 r8 = 0x0002  r9 = 0x0001
pf_test() at pf_test+0x518
 pc = 0xe30496a8  lr = 0xe306d768 (pf_check_out+0x30)
 sp = 0xdf1f1eb0  fp = 0xdf1f1ec0
 r4 = 0xdf1f1f5c  r5 = 0xe306d738
 r6 = 0xdb6ba660  r7 = 0x
 r8 = 0xd9422c00  r9 = 0xdb748d80
r10 = 0xfff7
pf_check_out() at pf_check_out+0x30
 pc = 0xe306d768  lr = 0xc0415b44 (pfil_run_hooks+0xbc)
 sp = 0xdf1f1ec8  fp = 0xdf1f1ef0
 r4 = 0x0002  r5 = 0xe306d738
pfil_run_hooks() at pfil_run_hooks+0xbc
 pc = 0xc0415b44  lr = 0xc0445cfc (ip_output+0xce8)
 sp = 0xdf1f1ef8  fp = 0xdf1f1fa8
 r4 = 0x010a  r5 = 0x0a0a
 r6 = 0xdb4a6158  r7 = 0xc0946908
 r8 = 0xdb5bec00  r9 = 0xd9422c00
r10 = 0x05dc
ip_output() at ip_output+0xce8
 pc = 0xc0445cfc  lr = 0xc045bc9c (tcp_default_output+0x20ac)
 sp = 0xdf1f1fb0