Re: buildkernel is broken

2022-07-07 Thread Kristof Provost

On 7 Jul 2022, at 19:00, Steve Kargl wrote:

On Thu, Jul 07, 2022 at 10:37:40AM -0600, Warner Losh wrote:

On Thu, Jul 7, 2022 at 10:37 AM Steve Kargl <
s...@troutmask.apl.washington.edu> wrote:


Thanks, but

root[216] git cherry-pick -n 37f604b49d4a
fatal: bad revision '37f604b49d4a'
root[217] pwd
/usr/src



git fetch maybe?



A cursory google search suggests that 'git fetch'
works on repositories not single files.

I did look at the diff associated with 37f604b49d4a.
I am surprised that the commit that broke buildkernel
for me was allowed to be committed.


It was posted for review in https://reviews.freebsd.org/D35716

I’ll also point out that this commit works just fine in nearly all of 
our kernel configs, because there are very few (only one powerpc config, 
as far as I can tell) that do not have VIMAGE.
Arguably we should have a non-VIMAGE kernel config around (probably for 
amd64) so it’s more likely we spot these issues prior to commit.
Arbitrary non-default kernel configs are more likely to see issues like 
this one. I don’t think that can be avoided.



The fix in
37f604b49d4a seems rather questionable especially given
that there is no comment about why the macro is expanded
to a zero-trip loop.


I’m not sure how I could have been much more clear than this:

VNET_FOREACH() is a LIST_FOREACH if VIMAGE is set, but empty if 
it's

not. This means that users of the macro couldn't use 'continue' or
'break' as one would expect of a loop.

I welcome suggestions on how to improve my future commit messages.

To rephrase it a bit: VNET_FOREACH() used to be very misleading, in that 
it was only a loop with options VIMAGE, and empty (so any code within 
would be its own block, and be executed exactly once, for the only vnet 
that exists without VIMAGE). That’s fine, unless you want to 
‘continue’ or ‘break’ the loop. That worked with VIMAGE (so the 
issue in the dummynet fix was not seen) but not without it.


Kristof

Re: Kernel panic on armv7 when PF is enabled

2022-05-02 Thread Kristof Provost

On 1 May 2022, at 5:13, qroxana wrote:

After git bisecting the panic started since this commit.

commit 78bc3d5e1712bc1649aa5574d2b8d153f9665113

Author: Kristof Provost <
k...@freebsd.org




Date:   Mon Feb 14 20:09:54 2022 +0100

vlan: allow net.link.vlan.mtag_pcp to be set per vnet

The primary reason for this change is to facilitate testing.

MFC after:  1 week

sys/net/if_ethersubr.c | 9 +

sys/net/if_vlan.c  | 5 +++--

2 files changed, 8 insertions(+), 6 deletions(-)

The armv7 board boots from a NFS root,

it can boot without any problem if PF is disabled.

Any helps?

add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1
add net ff02::: gateway ::1
add net :::0.0.0.0: gateway ::1
add net ::0.0.0.0: gateway ::1
Enabling pf.
Kernel page fault with the following non-sleepable locks held:
shared rm pf rulesets (pf rulesets) r = 0 (0xe3099430) locked @ 
/usr/src/sys/netpfil/pf/pf.c:6493
exclusive rw tcpinp (tcpinp) r = 0 (0xdb748d88) locked @ 
/usr/src/sys/netinet/tcp_usrreq.c:1008

stack backtrace:
#0 0xc0355cac at witness_debugger+0x7c
#1 0xc0356ef0 at witness_warn+0x3fc
#2 0xc05ec048 at abort_handler+0x1d8
#3 0xc05cb5ac at exception_exit+0
#4 0xe3083c10 at pf_syncookie_validate+0x60
#5 0xe30496a8 at pf_test+0x518
#6 0xe306d768 at pf_check_out+0x30
#7 0xc0415b44 at pfil_run_hooks+0xbc
#8 0xc0445cfc at ip_output+0xce8
#9 0xc045bc9c at tcp_default_output+0x20ac
#10 0xc0471eb4 at tcp_usr_send+0x1ac
#11 0xc0389464 at sosend_generic+0x490
#12 0xc0389790 at sosend+0x64
#13 0xc0502888 at clnt_vc_call+0x560
#14 0xc05009d8 at clnt_reconnect_call+0x170
#15 0xc01e7b14 at newnfs_request+0xb20
#16 0xc0230218 at nfscl_request+0x60
#17 0xc020d9bc at nfsrpc_getattr+0xb0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdf1f1c90
FSR=0001, FAR=d7840264, spsr=4013
r0 =6a228eda, r1 =dac0d785, r2 =d7840264, r3 =db5527c0
r4 =df1f1e00, r5 =dac0d75f, r6 =0018, r7 =d9422c00
r8 =c093e5e4, r9 =0001, r10=df1f1f5c, r11=df1f1d38
r12=e3098dd0, ssp=df1f1d20, slr=e3083bdc, pc =e3083c10


The commit you point at is entirely unrelated to the code where the 
panic occurred, so I’m pretty sure something went wrong in your 
bisect.


The backtrace would suggest the issue occurs in the  
pf_syncookie_validate() function, and likely in the line `if 
(atomic_load_64(_pf_status.syncookies_inflight[cookie.flags.oddeven]) 
== 0)`


The obvious way for that to panic would be to call it without the 
curvnet context set, but pf_test() uses it earlier, so that’s going to 
be fine.


Given that this is unique to armv7 I’d recommend talking to the armv7 
maintainer about 64 bit atomic operations.


You can probably avoid the atomic load with this patch (and not enabling 
syncookie support):


	diff --git a/sys/netpfil/pf/pf_syncookies.c 
b/sys/netpfil/pf/pf_syncookies.c

index 5230502be30c..c86d469d3cef 100644
--- a/sys/netpfil/pf/pf_syncookies.c
+++ b/sys/netpfil/pf/pf_syncookies.c
@@ -313,6 +313,9 @@ pf_syncookie_validate(struct pf_pdesc *pd)
ack = ntohl(pd->hdr.tcp.th_ack) - 1;
cookie.cookie = (ack & 0xff) ^ (ack >> 24);

+   if (V_pf_status.syncookies_mode == PF_SYNCOOKIES_NEVER)
+   return (0);
+
/* we don't know oddeven before setting the cookie (union) */
	 if 
(atomic_load_64(_pf_status.syncookies_inflight[cookie.flags.oddeven])

== 0)

That shouldn’t be required though.

Br,
Kristof


<    1   2