Re: Kernel panic on armv7 when PF is enabled
On 2.05.2022 at 11:02, Kristof Provost wrote:
> On 1 May 2022, at 5:13, qroxana wrote:
> > After git bisecting the panic started since this commit.
> >
> > commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> > Author: Kristof Provost < k...@freebsd.org >
> > Date: Mon Feb 14 20:09:54 2022 +0100
> >
> > vlan: allow net.link.vlan.mtag_pcp to be set per vnet
> >
> > The primary reason for this change is to facilitate testing.
> >
> > MFC after: 1 week
> >
> > sys/net/if_ethersubr.c | 9 +
> > sys/net/if_vlan.c | 5 +++--
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > The armv7 board boots from a NFS root,
> > it can boot without any problem if PF is disabled.
> >
> > Any helps?
> >
> > add host ::1: gateway lo0 fib 0: route already in table
> > add net fe80::: gateway ::1
> > add net ff02::: gateway ::1
> > add net :::0.0.0.0: gateway ::1
> > add net ::0.0.0.0: gateway ::1
> > Enabling pf.
> > Kernel page fault with the following non-sleepable locks held:
> > shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ /usr/src/sys/netpfil/pf/pf.c:6493
> > exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1008
> > stack backtrace:
> > #0 0xc0355cac at witness_debugger+0x7c
> > #1 0xc0356ef0 at witness_warn+0x3fc
> > #2 0xc05ec048 at abort_handler+0x1d8
> > #3 0xc05cb5ac at exception_exit+0
> > #4 0xe3083c10 at pf_syncookie_validate+0x60
> > #5 0xe30496a8 at pf_test+0x518
> > #6 0xe306d768 at pf_check_out+0x30
> > #7 0xc0415b44 at pfil_run_hooks+0xbc
> > #8 0xc0445cfc at ip_output+0xce8
> > #9 0xc045bc9c at tcp_default_output+0x20ac
> > #10 0xc0471eb4 at tcp_usr_send+0x1ac
> > #11 0xc0389464 at sosend_generic+0x490
> > #12 0xc0389790 at sosend+0x64
> > #13 0xc0502888 at clnt_vc_call+0x560
> > #14 0xc05009d8 at clnt_reconnect_call+0x170
> > #15 0xc01e7b14 at newnfs_request+0xb20
> > #16 0xc0230218 at nfscl_request+0x60
> > #17 0xc020d9bc at nfsrpc_getattr+0xb0
> > Fatal kernel mode data abort: 'Alignment Fault' on read
> > trapframe: 0xdf1f1c90
> > FSR=0001, FAR=d7840264, spsr=4013
> > r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
> > r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
> > r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
> > r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
>
> The commit you point at is entirely unrelated to the code where the panic
> occurred, so I’m pretty sure something went wrong in your bisect.

I was also seeing this panic while running FreeBSD 14.0-CURRENT #1 main-n253028-2ec9a427c85: Tue Feb 8 17:49:25 CET 2022 on armv7, so it is unrelated to the aforementioned commit, which is dated 2022-02-14.

It's a very interesting and weird bug: loading pf.ko, enabling PF, and loading the rules all work as expected, but as soon as PF processes filtered traffic, the panic is triggered.

> The backtrace would suggest the issue occurs in the pf_syncookie_validate()
> function, and likely in the line
> if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)
>
> The obvious way for that to panic would be to call it without the curvnet
> context set, but pf_test() uses it earlier, so that’s going to be fine.
>
> Given that this is unique to armv7 I’d recommend talking to the armv7
> maintainer about 64 bit atomic operations.
>
> You can probably avoid the atomic load with this patch (and not enabling
> syncookie support):
>
> diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
> index 5230502be30c..c86d469d3cef 100644
> --- a/sys/netpfil/pf/pf_syncookies.c
> +++ b/sys/netpfil/pf/pf_syncookies.c
> @@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
>         ack = ntohl(pd->hdr.tcp.th_ack) - 1;
>         cookie.cookie = (ack & 0xff) ^ (ack >> 24);
>
> +       if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
> +               return (0);
> +
>         /* we don't know oddeven before setting the cookie (union) */
>         if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
>             == 0)
>
> That shouldn’t be required though.
>
> Br,
> Kristof

--
Marek Zarychta
Re: Kernel panic on armv7 when PF is enabled
On Monday, May 2nd, 2022 at 9:02 AM, Kristof Provost wrote:
> On 1 May 2022, at 5:13, qroxana wrote:
> > After git bisecting the panic started since this commit.
> >
> > commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> > Author: Kristof Provost < k...@freebsd.org >
> > Date: Mon Feb 14 20:09:54 2022 +0100
> >
> > vlan: allow net.link.vlan.mtag_pcp to be set per vnet
> >
> > The primary reason for this change is to facilitate testing.
> >
> > MFC after: 1 week
> >
> > sys/net/if_ethersubr.c | 9 +
> > sys/net/if_vlan.c | 5 +++--
> > 2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > The armv7 board boots from a NFS root,
> > it can boot without any problem if PF is disabled.
> >
> > Any helps?
> >
> > add host ::1: gateway lo0 fib 0: route already in table
> > add net fe80::: gateway ::1
> > add net ff02::: gateway ::1
> > add net :::0.0.0.0: gateway ::1
> > add net ::0.0.0.0: gateway ::1
> > Enabling pf.
> > Kernel page fault with the following non-sleepable locks held:
> > shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ /usr/src/sys/netpfil/pf/pf.c:6493
> > exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1008
> > stack backtrace:
> > #0 0xc0355cac at witness_debugger+0x7c
> > #1 0xc0356ef0 at witness_warn+0x3fc
> > #2 0xc05ec048 at abort_handler+0x1d8
> > #3 0xc05cb5ac at exception_exit+0
> > #4 0xe3083c10 at pf_syncookie_validate+0x60
> > #5 0xe30496a8 at pf_test+0x518
> > #6 0xe306d768 at pf_check_out+0x30
> > #7 0xc0415b44 at pfil_run_hooks+0xbc
> > #8 0xc0445cfc at ip_output+0xce8
> > #9 0xc045bc9c at tcp_default_output+0x20ac
> > #10 0xc0471eb4 at tcp_usr_send+0x1ac
> > #11 0xc0389464 at sosend_generic+0x490
> > #12 0xc0389790 at sosend+0x64
> > #13 0xc0502888 at clnt_vc_call+0x560
> > #14 0xc05009d8 at clnt_reconnect_call+0x170
> > #15 0xc01e7b14 at newnfs_request+0xb20
> > #16 0xc0230218 at nfscl_request+0x60
> > #17 0xc020d9bc at nfsrpc_getattr+0xb0
> > Fatal kernel mode data abort: 'Alignment Fault' on read
> > trapframe: 0xdf1f1c90
> > FSR=0001, FAR=d7840264, spsr=4013
> > r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
> > r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
> > r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
> > r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
>
> The commit you point at is entirely unrelated to the code where the panic
> occurred, so I’m pretty sure something went wrong in your bisect.
>
> The backtrace would suggest the issue occurs in the pf_syncookie_validate()
> function, and likely in the line
> `if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`
>
> The obvious way for that to panic would be to call it without the curvnet
> context set, but pf_test() uses it earlier, so that’s going to be fine.
>
> Given that this is unique to armv7 I’d recommend talking to the armv7
> maintainer about 64 bit atomic operations.
>
> You can probably avoid the atomic load with this patch (and not enabling
> syncookie support):
>
> diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
> index 5230502be30c..c86d469d3cef 100644
> --- a/sys/netpfil/pf/pf_syncookies.c
> +++ b/sys/netpfil/pf/pf_syncookies.c
> @@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
>         ack = ntohl(pd->hdr.tcp.th_ack) - 1;
>         cookie.cookie = (ack & 0xff) ^ (ack >> 24);
>
> +       if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
> +               return (0);
> +
>         /* we don't know oddeven before setting the cookie (union) */
>         if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
>             == 0)
>
> That shouldn’t be required though.
>
> Br,
> Kristof

Thank you, sir. You were right.

I tested the patch with the latest kernel. The board boots successfully with the patch and still panics without it.
Re: Kernel panic on armv7 when PF is enabled
On 1 May 2022, at 5:13, qroxana wrote:
> After git bisecting the panic started since this commit.
>
> commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> Author: Kristof Provost < k...@freebsd.org >
> Date: Mon Feb 14 20:09:54 2022 +0100
>
> vlan: allow net.link.vlan.mtag_pcp to be set per vnet
>
> The primary reason for this change is to facilitate testing.
>
> MFC after: 1 week
>
> sys/net/if_ethersubr.c | 9 +
> sys/net/if_vlan.c | 5 +++--
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> The armv7 board boots from a NFS root,
> it can boot without any problem if PF is disabled.
>
> Any helps?
>
> add host ::1: gateway lo0 fib 0: route already in table
> add net fe80::: gateway ::1
> add net ff02::: gateway ::1
> add net :::0.0.0.0: gateway ::1
> add net ::0.0.0.0: gateway ::1
> Enabling pf.
> Kernel page fault with the following non-sleepable locks held:
> shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ /usr/src/sys/netpfil/pf/pf.c:6493
> exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1008
> stack backtrace:
> #0 0xc0355cac at witness_debugger+0x7c
> #1 0xc0356ef0 at witness_warn+0x3fc
> #2 0xc05ec048 at abort_handler+0x1d8
> #3 0xc05cb5ac at exception_exit+0
> #4 0xe3083c10 at pf_syncookie_validate+0x60
> #5 0xe30496a8 at pf_test+0x518
> #6 0xe306d768 at pf_check_out+0x30
> #7 0xc0415b44 at pfil_run_hooks+0xbc
> #8 0xc0445cfc at ip_output+0xce8
> #9 0xc045bc9c at tcp_default_output+0x20ac
> #10 0xc0471eb4 at tcp_usr_send+0x1ac
> #11 0xc0389464 at sosend_generic+0x490
> #12 0xc0389790 at sosend+0x64
> #13 0xc0502888 at clnt_vc_call+0x560
> #14 0xc05009d8 at clnt_reconnect_call+0x170
> #15 0xc01e7b14 at newnfs_request+0xb20
> #16 0xc0230218 at nfscl_request+0x60
> #17 0xc020d9bc at nfsrpc_getattr+0xb0
> Fatal kernel mode data abort: 'Alignment Fault' on read
> trapframe: 0xdf1f1c90
> FSR=0001, FAR=d7840264, spsr=4013
> r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
> r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
> r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
> r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10

The commit you point at is entirely unrelated to the code where the panic occurred, so I’m pretty sure something went wrong in your bisect.

The backtrace would suggest the issue occurs in the pf_syncookie_validate() function, and likely in the line
`if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven]) == 0)`

The obvious way for that to panic would be to call it without the curvnet context set, but pf_test() uses it earlier, so that’s going to be fine.

Given that this is unique to armv7 I’d recommend talking to the armv7 maintainer about 64 bit atomic operations.

You can probably avoid the atomic load with this patch (and not enabling syncookie support):

diff --git a/sys/netpfil/pf/pf_syncookies.c b/sys/netpfil/pf/pf_syncookies.c
index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
        ack = ntohl(pd->hdr.tcp.th_ack) - 1;
        cookie.cookie = (ack & 0xff) ^ (ack >> 24);

+       if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+               return (0);
+
        /* we don't know oddeven before setting the cookie (union) */
        if (atomic_load_64(&V_pf_status.syncookies_inflight[cookie.flags.oddeven])
            == 0)

That shouldn’t be required though.

Br,
Kristof
Re: Kernel panic on armv7 when PF is enabled
On Sun, 01 May 2022 03:13:43 +, qroxana wrote:
> After git bisecting the panic started since this commit.
>
> commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
> Author: Kristof Provost
> Date: Mon Feb 14 20:09:54 2022 +0100
>
> vlan: allow net.link.vlan.mtag_pcp to be set per vnet
>
> The primary reason for this change is to facilitate testing.
>
> MFC after: 1 week
>
> sys/net/if_ethersubr.c | 9 +
> sys/net/if_vlan.c | 5 +++--
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> The armv7 board boots from a NFS root,
>
> it can boot without any problem if PF is disabled.

It appears this only occurs when the rootfs is on NFS. I also tried booting from a micro SD card, and there was no kernel panic.

Another workaround to avoid the panic is to delay starting /etc/rc.d/pf until SERVERS:

--- pf.orig 2022-03-12 12:26:47.0 +
+++ pf 2022-05-02 02:59:28.131026862 +
@@ -4,7 +4,7 @@
 #

 # PROVIDE: pf
-# REQUIRE: FILESYSTEMS netif pflog pfsync routing
+# REQUIRE: SERVERS netif pflog pfsync routing
 # KEYWORD: nojailvnet

 . /etc/rc.subr

Thanks,
Kernel panic on armv7 when PF is enabled
After git bisecting the panic started since this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113
Author: Kristof Provost < k...@freebsd.org >
Date: Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after: 1 week

sys/net/if_ethersubr.c | 9 +
sys/net/if_vlan.c | 5 +++--
2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from a NFS root,
it can boot without any problem if PF is disabled.

Any helps?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ /usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ /usr/src/sys/netinet/tcp_usrreq.c:1008
stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10
panic: Fatal abort
cpuid = 1
time = 1651366089
KDB: stack backtrace:
db_trace_self() at db_trace_self
    pc = 0xc05c8c00 lr = 0xc007ac8c (db_trace_self_wrapper+0x30)
    sp = 0xdf1f1a68 fp = 0xdf1f1b80
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
    pc = 0xc007ac8c lr = 0xc02e289c (vpanic+0x170)
    sp = 0xdf1f1b88 fp = 0xdf1f1ba8
    r4 = 0x0100 r5 = 0x r6 = 0xc0780529 r7 = 0xc090ea10
vpanic() at vpanic+0x170
    pc = 0xc02e289c lr = 0xc02e264c (doadump)
    sp = 0xdf1f1bb0 fp = 0xdf1f1bb4
    r4 = 0xdf1f1c90 r5 = 0x0013 r6 = 0xd7840264 r7 = 0x0001
    r8 = 0x0001 r9 = 0xdb5527c0 r10 = 0xd7840264
doadump() at doadump
    pc = 0xc02e264c lr = 0xc05ec698 (abort_align)
    sp = 0xdf1f1bbc fp = 0xdf1f1be8
    r4 = 0xd7840264 r5 = 0xdf1f1bb4 r6 = 0xc02e264c r10 = 0xdf1f1bbc
abort_align() at abort_align
    pc = 0xc05ec698 lr = 0xc05ec198 (abort_handler+0x328)
    sp = 0xdf1f1bf0 fp = 0xdf1f1c88
    r4 = 0x0013 r5 = 0xd7840264
abort_handler() at abort_handler+0x328
    pc = 0xc05ec198 lr = 0xc05cb5ac (exception_exit)
    sp = 0xdf1f1c90 fp = 0xdf1f1d38
    r4 = 0xdf1f1e00 r5 = 0xdac0d75f r6 = 0x0018 r7 = 0xd9422c00
    r8 = 0xc093e5e4 r9 = 0x0001 r10 = 0xdf1f1f5c
exception_exit() at exception_exit
    pc = 0xc05cb5ac lr = 0xe3083bdc (pf_syncookie_validate+0x2c)
    sp = 0xdf1f1d20 fp = 0xdf1f1d38
    r0 = 0x6a228eda r1 = 0xdac0d785 r2 = 0xd7840264 r3 = 0xdb5527c0
    r4 = 0xdf1f1e00 r5 = 0xdac0d75f r6 = 0x0018 r7 = 0xd9422c00
    r8 = 0xc093e5e4 r9 = 0x0001 r10 = 0xdf1f1f5c r12 = 0xe3098dd0
pf_syncookie_validate() at pf_syncookie_validate+0x60
    pc = 0xe3083c10 lr = 0xe30496a8 (pf_test+0x518)
    sp = 0xdf1f1d40 fp = 0xdf1f1ea8
    r4 = 0x0002 r5 = 0xdb4a6100 r6 = 0x0018 r7 = 0xd9422c00
    r8 = 0x0002 r9 = 0x0001
pf_test() at pf_test+0x518
    pc = 0xe30496a8 lr = 0xe306d768 (pf_check_out+0x30)
    sp = 0xdf1f1eb0 fp = 0xdf1f1ec0
    r4 = 0xdf1f1f5c r5 = 0xe306d738 r6 = 0xdb6ba660 r7 = 0x
    r8 = 0xd9422c00 r9 = 0xdb748d80 r10 = 0xfff7
pf_check_out() at pf_check_out+0x30
    pc = 0xe306d768 lr = 0xc0415b44 (pfil_run_hooks+0xbc)
    sp = 0xdf1f1ec8 fp = 0xdf1f1ef0
    r4 = 0x0002 r5 = 0xe306d738
pfil_run_hooks() at pfil_run_hooks+0xbc
    pc = 0xc0415b44 lr = 0xc0445cfc (ip_output+0xce8)
    sp = 0xdf1f1ef8 fp = 0xdf1f1fa8
    r4 = 0x010a r5 = 0x0a0a r6 = 0xdb4a6158 r7 = 0xc0946908
    r8 = 0xdb5bec00 r9 = 0xd9422c00 r10 = 0x05dc
ip_output() at ip_output+0xce8
    pc = 0xc0445cfc lr = 0xc045bc9c (tcp_default_output+0x20ac)
    sp = 0xdf1f1fb0