Re: panic: unix: lock not held

2024-05-02 Thread Alexander Bluhm
On Fri, May 03, 2024 at 12:04:02AM +0300, Vitaliy Makkoveev wrote:
> On Thu, May 02, 2024 at 10:06:45PM +0200, kir...@korins.ky wrote:
> > >Synopsis:  panic: unix: lock not held
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 7.5
> > Details : OpenBSD 7.5-current (GENERIC.MP) #96: Thu May  2 22:01:31 
> > CEST 2024
> >  
> > ca...@matebook.sa31-home.catap.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Commit "Don't re-lock sockets in uipc_shutdown()." leads to kernel
> > panic on my machine.
> > >How-To-Repeat:
> > Build kernel with this commit and try to boot.
> > >Fix:
> > 
> 
> Sorry, I missed this hunk.

Ah, this is in the other diff, which has not been committed yet.
We have to reduce the number of locking flags.

OK bluhm@

> Index: sys/kern/uipc_socket2.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/uipc_socket2.c,v
> diff -u -p -r1.151 uipc_socket2.c
> --- sys/kern/uipc_socket2.c   30 Apr 2024 17:59:15 -  1.151
> +++ sys/kern/uipc_socket2.c   2 May 2024 20:58:43 -
> @@ -334,7 +334,7 @@ socantsendmore(struct socket *so)
>  void
>  socantrcvmore(struct socket *so)
>  {
> - if ((so->so_rcv.sb_flags & SB_OWNLOCK) == 0)
> + if ((so->so_rcv.sb_flags & SB_MTXLOCK) == 0)
>   soassertlocked(so);
>  
> 	mtx_enter(&so->so_rcv.sb_mtx);



Re: lock order reversal in soreceive and NFS

2024-04-30 Thread Alexander Bluhm
On Tue, Apr 30, 2024 at 05:26:15PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Apr 30, 2024 at 04:06:29PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 30 Apr 2024 16:18:31 +0300
> > > From: Vitaliy Makkoveev 
> > > 
> > > On Tue, Apr 30, 2024 at 11:08:13AM +0200, Martin Pieuchot wrote:
> > > > 
> > > > On the other side, would that make sense to have a NET_LOCK()-free
> > > > sysctl path?
> > > > 
> > > 
> > > To me it's better to remove uvm_vslock() from network related sysctl
> > > paths. uvm_vslock() used to avoid context switch in the uiomove() call
> > > to not break kernel lock protected data. It is not required for netlock
> > > protected network stuff.
> > 
> > I don't think uvm_vslock() plays a role in the lock order reversal
> > being discussed here.
> > 
> 
> copyin() and copyout() don't sleep while called from sysctl() paths. At
> least that is how it is supposed to work.

copyout() does not sleep.  The uvm_vslock() is in the if (SCARG(uap, old)) path.
So copyin() may sleep.  See comment in my sysctl int atomic diff.

bluhm



Re: lock order reversal in soreceive and NFS

2024-04-30 Thread Alexander Bluhm
On Tue, Apr 30, 2024 at 11:08:13AM +0200, Martin Pieuchot wrote:
> > With the patch, the nfsnode-vmmaplk reversal looks like this:
> 
> So the issue here is due to NFS entering the network stack after the
> VFS.  Alexander, Vitaly are we far from a NET_LOCK()-free sosend()?
> Is something we should consider?

The way I am planning is to require shared netlock instead of exclusive
netlock in sosend().  I want to keep exclusive netlock for network
configuration.

Sending and receiving NFS UDP traffic with shared netlock should
be possible.  Switching TCP to shared netlock is much more work.

I don't know if witness knows that two shared rwlocks do not
interfere.

> On the other side, would that make sense to have a NET_LOCK()-free
> sysctl path?

I am working to remove netlock for integer type sysctl.  That is
the majority, others need case by case analysis.

bluhm



Re: [PATCH 2/2] Handle short writes in cp(1)

2024-04-26 Thread Alexander Bluhm
On Thu, Apr 25, 2024 at 09:05:53PM +0200, Piotr Durlej wrote:
> Handle short writes in cp(1)

OK bluhm@

> ---
>  bin/cp/utils.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/bin/cp/utils.c b/bin/cp/utils.c
> index 347081151f2..40265fce7f7 100644
> --- a/bin/cp/utils.c
> +++ b/bin/cp/utils.c
> @@ -149,11 +149,16 @@ copy_file(FTSENT *entp, int exists)
>   wcount = lseek(to_fd, rcount, SEEK_CUR) == -1 ? 
> -1 : rcount;
>   else
>   wcount = write(to_fd, buf, rcount);
> - if (rcount != wcount || wcount == -1) {
> + if (wcount == -1) {
>   warn("%s", to.p_path);
>   rval = 1;
>   break;
>   }
> + if (rcount != wcount) {
> + warnx("%s: short write", to.p_path);
> + rval = 1;
> + break;
> + }
>   }
>   if (skipholes && rcount != -1)
>   rcount = ftruncate(to_fd, lseek(to_fd, 0, SEEK_CUR));
> -- 
> 2.44.0
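The diff above makes cp(1) report a short write as an error instead of mis-reporting it as a generic failure. Callers that would rather retry than abort use a full-write loop; the helper below is a hypothetical sketch of that alternative pattern, not code from the OpenBSD tree:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set.
 */
int
write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}
```

For cp(1) the diff's choice is arguably better: a short write on a regular file usually means the filesystem is full, so retrying would just spin.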



lock order reversal in soreceive and NFS

2024-04-22 Thread Alexander Bluhm
Hi,

I see a witness lock order reversal warning with soreceive.  It
happens during NFS regress tests.  In /var/log/messages is more
context from regress.

Apr 22 03:18:08 ot29 /bsd: uid 0 on 
/mnt/regress-ffs/fstest_49fd035b8230791792326afb0604868b: out of inodes
Apr 22 03:18:21 ot29 mountd[6781]: Bad exports list line /mnt/regress-nfs-server
Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (&map->lock)
Apr 22 03:19:08 ot29 /bsd:  2nd 0x80004c488c78 nfsnode (&np->n_lock)
Apr 22 03:19:08 ot29 /bsd: lock order data w2 -> w1 missing
Apr 22 03:19:08 ot29 /bsd: lock order "&map->lock"(rwlock) -> 
"&np->n_lock"(rrwlock) first seen at:
Apr 22 03:19:08 ot29 /bsd: #0  rw_enter+0x6d
Apr 22 03:19:08 ot29 /bsd: #1  rrw_enter+0x5e
Apr 22 03:19:08 ot29 /bsd: #2  VOP_LOCK+0x5f
Apr 22 03:19:08 ot29 /bsd: #3  vn_lock+0xbc
Apr 22 03:19:08 ot29 /bsd: #4  vn_rdwr+0x83
Apr 22 03:19:08 ot29 /bsd: #5  vndstrategy+0x2ca
Apr 22 03:19:08 ot29 /bsd: #6  physio+0x204
Apr 22 03:19:08 ot29 /bsd: #7  spec_write+0x9e
Apr 22 03:19:08 ot29 /bsd: #8  VOP_WRITE+0x45
Apr 22 03:19:08 ot29 /bsd: #9  vn_write+0x100
Apr 22 03:19:08 ot29 /bsd: #10 dofilewritev+0x14e
Apr 22 03:19:08 ot29 /bsd: #11 sys_pwrite+0x60
Apr 22 03:19:08 ot29 /bsd: #12 syscall+0x588
Apr 22 03:19:08 ot29 /bsd: #13 Xsyscall+0x128
Apr 22 03:19:08 ot29 /bsd: witness: lock order reversal:
Apr 22 03:19:08 ot29 /bsd:  1st 0xfd85c8ae12a8 vmmaplk (&map->lock)
Apr 22 03:19:08 ot29 /bsd:  2nd 0x80002ec41860 sbufrcv (&so->so_rcv.sb_lock)
Apr 22 03:19:08 ot29 /bsd: lock order "&so->so_rcv.sb_lock"(rwlock) -> 
"&map->lock"(rwlock) first seen at:
Apr 22 03:19:08 ot29 /bsd: #0  rw_enter_read+0x50
Apr 22 03:19:08 ot29 /bsd: #1  uvmfault_lookup+0x8a
Apr 22 03:19:08 ot29 /bsd: #2  uvm_fault_check+0x36
Apr 22 03:19:08 ot29 /bsd: #3  uvm_fault+0xfb
Apr 22 03:19:08 ot29 /bsd: #4  kpageflttrap+0x158
Apr 22 03:19:08 ot29 /bsd: #5  kerntrap+0x94
Apr 22 03:19:08 ot29 /bsd: #6  alltraps_kern_meltdown+0x7b
Apr 22 03:19:08 ot29 /bsd: #7  copyout+0x57
Apr 22 03:19:08 ot29 /bsd: #8  soreceive+0x99a
Apr 22 03:19:08 ot29 /bsd: #9  recvit+0x1fd
Apr 22 03:19:08 ot29 /bsd: #10 sys_recvfrom+0xa4
Apr 22 03:19:08 ot29 /bsd: #11 syscall+0x588
Apr 22 03:19:08 ot29 /bsd: #12 Xsyscall+0x128
Apr 22 03:19:08 ot29 /bsd: lock order data w1 -> w2 missing
Apr 22 03:22:27 ot29 /bsd: uid 0 on 
/mnt/regress-nfs-client/fstest_3372ae0ca77c9470440ef577e4f5e16e: file system 
full
Apr 22 03:22:30 ot29 /bsd: uid 0 on 
/mnt/regress-nfs-client/fstest_632a6ba698de06560b4c93617b00808d: out of inodes

According to timestamp it is regress/sys/ffs.
make -C /usr/src/regress/sys/ffs/nfs run-chmod
triggers it.

I already reported in a thread on tech@, but the issue is independent
of the diff over there.  Let's start a fresh discussion.

bluhm



Re: potential unfixed CVE in usr.bin/compress/zopen.c

2024-04-03 Thread Alexander Bluhm
On Wed, Apr 03, 2024 at 03:35:07PM +, Lu ChenHao wrote:
> As CVE-2011-2895 said, the LZW decompressor is vulnerable to an
> infinite loop or a heap-based buffer overflow.  As a mitigation,
> FreeBSD has added checks in zopen.c.  But there seem to be no such
> checks in OpenBSD's zopen.c.  Since this is an old CVE, just wondering
> whether OpenBSD is vulnerable to it, or whether it has been fixed in
> another way in OpenBSD.
> 

According to
https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2011-2895
it was fixed in OpenBSD here
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/compress/zopen.c#rev1.17

Fixes look different in FreeBSD, NetBSD, OpenBSD.  I have not checked
whether they are equivalent.

bluhm



Re: dwqe ifconfig down panic

2024-03-29 Thread Alexander Bluhm
On Thu, Mar 28, 2024 at 11:06:13PM +0100, Stefan Sperling wrote:
> On Wed, Mar 27, 2024 at 02:08:27PM +0100, Stefan Sperling wrote:
> > On Tue, Mar 26, 2024 at 11:05:49PM +0100, Patrick Wildt wrote:
> > > On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > > > Hi,
> > > > 
> > > > When doing flood ping transmit from a machine and simultaneously
> > > > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> >  
> > > * Don't run TX/RX proc in case the interface is down?
> > 
> > The RX path already has a corresponding check. But the Tx path does not.
> > 
> > If the problem is a race involving mbufs freed via dwqe_down() and
> > mbufs freed via dwqe_tx_proc() then this simple tweak might help.
> 
> With this patch bluhm's test machine has survived 30 minutes of
> flood ping + ifconfig down/up in a loop. Without the patch the
> machine crashes within a few seconds.
> 
> I understand that there could be an issue in intr_barrier() which
> gets papered over by this patch. However the patch does avoid the
> crash and it is trivial to revert when testing the effectiveness
> of any potential intr_barrier() fixes.
> 
> ok?

OK bluhm@

> > diff /usr/src
> > commit - 029d0a842cd8a317375b31145383409491d345e7
> > path + /usr/src
> > blob - 97f874d2edf74a009a811455fbf37ca56f725eef
> > file + sys/dev/ic/dwqe.c
> > --- sys/dev/ic/dwqe.c
> > +++ sys/dev/ic/dwqe.c
> > @@ -593,6 +593,9 @@ dwqe_tx_proc(struct dwqe_softc *sc)
> > struct dwqe_buf *txb;
> > int idx, txfree;
> >  
> > +   if ((ifp->if_flags & IFF_RUNNING) == 0)
> > +   return;
> > +
> > bus_dmamap_sync(sc->sc_dmat, DWQE_DMA_MAP(sc->sc_txring), 0,
> > DWQE_DMA_LEN(sc->sc_txring),
> > BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);



ntpd NULL deref

2024-03-19 Thread Alexander Bluhm
Hi,

ntpd crashed on my laptop.  cstr->addr is NULL.  According to
accounting it was running for a while.

ntpd[43355]  -   _ntp  __ 0.06 secs Thu Mar 14 10:57 (41:41:32.00)
ntpd[81566]  -F  root  __ 0.28 secs Thu Mar 14 10:57 (41:39:28.00)
ntpd[5567]   -DXT_ntp  __ 0.02 secs Thu Mar 14 10:57 (41:39:28.00)

-rw-r--r--   1 root  wheel  1583504 Mar 16 03:36 5567.core

constraint.c
   204  cstr->last = now;
   205  cstr->state = STATE_QUERY_SENT;
   206
   207  memset(&am, 0, sizeof(am));
*  208  memcpy(&am.a, cstr->addr, sizeof(am.a));
   209  am.synced = synced;
   210
   211  iov[iov_cnt].iov_base = &am;
   212  iov[iov_cnt++].iov_len = sizeof(am);

Core was generated by `ntpd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x06db7eb7fea0 in memcpy (dst0=0x7b224d08a0e8, src0=, 
length=272) at /usr/src/lib/libc/string/memcpy.c:103
103 TLOOP(*(word *)dst = *(word *)src; src += wsize; dst += wsize);
(gdb) bt
#0  0x06db7eb7fea0 in memcpy (dst0=0x7b224d08a0e8, src0=, 
length=272) at /usr/src/lib/libc/string/memcpy.c:103
#1  0x06d915308864 in constraint_query (cstr=0x6db756f4000, synced=0) at 
/usr/src/usr.sbin/ntpd/constraint.c:208
#2  0x06d9152ff753 in ntp_main (nconf=, pw=, 
argc=, argv=)
at /usr/src/usr.sbin/ntpd/ntp.c:330
#3  0x06d9152fd07a in main (argc=, argv=) at 
/usr/src/usr.sbin/ntpd/ntpd.c:224
(gdb) frame 1
#1  0x06d915308864 in constraint_query (cstr=0x6db756f4000, synced=0) at 
/usr/src/usr.sbin/ntpd/constraint.c:208
208 memcpy(&am.a, cstr->addr, sizeof(am.a));

(gdb) print *cstr
value of type `constraint' requires 65704 bytes, which is more than 
max-value-size
(gdb) print cstr->entry
$3 = {tqe_next = 0x0, tqe_prev = 0x6dba8b72000}
(gdb) print cstr->addr_head
$4 = {name = 0x6db60004850 "www.google.com", path = 0x6db600041c0 "/", a = 0x0, 
pool = 2 '\002'}
(gdb) print cstr->addr
$5 = (struct ntp_addr *) 0x0
(gdb) print cstr->senderrors
$6 = 0
(gdb) print cstr->state
$7 = STATE_QUERY_SENT
(gdb) print cstr->id
$11 = 209
(gdb) print cstr->fd
$12 = -1
(gdb) print cstr->pid
$13 = 0
(gdb) print cstr->ibuf
value of type `imsgbuf' requires 65600 bytes, which is more than max-value-size
(gdb) print cstr->last
$14 = 146373
(gdb) print cstr->constraint
$15 = 0
(gdb) print cstr->dnstries
$16 = 0
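The core dump shows cstr->addr == NULL even though the state machine already advanced to STATE_QUERY_SENT (and addr_head.a is NULL too, suggesting resolution never completed). A guard before the copy would turn the crash into a recoverable error. The sketch below is illustrative only, with simplified stand-in structs and a made-up helper name, not the actual ntpd fix:

```c
#include <string.h>

/* Simplified stand-ins for the ntpd structures involved (illustrative). */
struct ntp_addr { char data[16]; };
struct constraint { struct ntp_addr *addr; };
struct addr_msg { struct ntp_addr a; int synced; };

/*
 * Refuse to build the query message when no address has been resolved
 * yet, instead of dereferencing the NULL pointer as constraint.c:208
 * does in the backtrace above.  Returns 0 on success, -1 otherwise.
 */
int
constraint_fill_msg(struct constraint *cstr, struct addr_msg *am, int synced)
{
	if (cstr->addr == NULL)
		return -1;
	memset(am, 0, sizeof(*am));
	memcpy(&am->a, cstr->addr, sizeof(am->a));
	am->synced = synced;
	return 0;
}
```

The real fix presumably also needs to explain how the state machine reached STATE_QUERY_SENT without an address in the first place.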

bluhm



Re: ICMP6 Type2 with MTU=PrevMTU Packet Flood in specific cornercase scenarios on OpenBSD7.4

2024-03-07 Thread Alexander Bluhm
Hi,

Thanks for the detailed bug report.

Note that I have also written some scapy script to test path MTU
discovery.  /usr/src/regress/sys/netinet/pmtu/tcp_connect.py
and tcp_connect6.py
Sometimes these tests fail, so PMTU may have bugs.  Or my tests are
just unreliable.

How does the route look like where the path MTU is saved?
netstat -rn has a Mtu column.

I see lines like this in the dump.
12:52:50.064789 2a06:d1c0::2.179 > 2a06:d1c0::b.54616: P 69:2147(2078) ack 83 
win 1023 : BGP (UPDATE: (Path 
attributes: (ORIGIN[T] IGP)

Packet size 2078 seems large.  Do you use jumbo frames?
On which machine did you make the tcpdump?  OpenBSD?

You should disable TCP Segmentation Offload.  Otherwise you never
know the packet sizes on the wire.
sysctl net.inet.tcp.tso=0
Note that OpenBSD supports Large Receive Offload only on ix(4).
Other hardware interfaces don't do it.
ifconfig ix0 -tcplro
On Linux you can use ethtool to disable offloading.

Does packet size change in tcpdump when you turn off TSO?

bluhm


On Thu, Mar 07, 2024 at 01:58:05PM +0100, Tobias Fiebig wrote:
> Moin,
> 
> I have run into some issues with v6 PMTUD on OpenBSD 7.4, and am
> somewhat at a loss on how to proceed finding a proper reproducer.
> 
> I first brushed into MTU issues when some of my mailers suddenly
> started to put out ~50mbit of traffic for no apparent reason. Back
> then further debugging led to the following observations:
> 
> - I received connections from a host behind a HE IPv6 tunnel; This 
>   communicated an MSS of 1440 (MTU 1500)
> - Sending return packets, I received Packet-too-Big ICMP messages from 
>   the HE Tunnel Host, indicating an MTU of 1480.
> - OpenBSD reset the MTU to 1480 and resent
> - I would receive another Packet-too-Big ICMP message from the HE 
>   Tunnel Host, indicating an MTU of 1480; OpenBSD would set the MTU to 
>   1480 and resend the packet
> 
> The root cause back then was some form of (legacy?) misconfiguration on
> the HE side, as the link actually had an MTU of 1472, which was
> incorrectly reported in the packet-too-big messages by the HE router.
> 
> However, the additional issue seems to be that OpenBSD re-
> transmits endlessly on packet-too-big if the MTU is the same as the
> already discovered PMTU.
> 
> I had initially benched that issue, putting on my todo to do a proper
> write-up and build a tool to remotely trigger this. There might be some
> amplification potential here by abusing, e.g., high-BW HE tunnel
> endpoints to make some dst. send a large amount of outbound traffic;
> But i could not get this working reliably with scapy. Very scrapy
> cobbled together code for linux based on an example snippet to do an
> HTTP request 'by foot' can be found here; Might need some fixing before
> it works: 
> 
> https://rincewind.home.aperture-labs.org/~tfiebig/pmtud_code/http_reque
> st.py
> https://rincewind.home.aperture-labs.org/~tfiebig/pmtud_code/http_request_v6.py
> 
> Note that this needs additional firewalling on the client so the linux
> kernel does not interfere with the TCP sessions, i.e., preventing the
> client from sending RST.
> 
> Also, this is specific to the IPv6 implementation; For IPv4 OpenBSD
> runs down to a minimal MTU (below min. MTU for v4 btw) when re-
> receiving PTB ICMP messages. For v6 it does not doe this, likely due to
> the logic being different in relation to the higher (1280) min MTU.
> 
> Recently, this then hit me again, when gw02.dus01.as59645.net put
> ~1gbit of traffic on the path to gw01.ams01.as59645.net. This occured
> after I had set up a test setup in a third location; This location is
> connected to gw01.ams01 via a MTU 1400 link (vxlan tunnel over IPv6 due
> to lack of fragmentation for v6).
> 
> When I installed a test-device (gw02.dlft), I connected this via a MTU
> 1500 link to gw01.dlft01, and--to test something unrelated--via a MTU 1500
> link (tunnel over v4 without fragmentation, handled by an additional
> device transparently that just pushes around VLANs).
> 
> All hosts have a BGP underlay using private ASNs (one per host) to
> distribute the global unicast addresses on the direct links. In
> addtion, there is an iBGP setup between the hosts, exchanging
> fulltables and the non-router networks in use. These are handled via
> loopback addresses, which are also distributed via the BGP underlay.
> 
> See the diagram below:
> 
> +---+  +---+
> |gw01.dus01.as59645.net |  |gw01.ams01.as59645.net |
> | JunOS +--+  VyOS (Linux) +---+
> |   lo: 2a06:d1c0::1|  |   lo: 2a06:d1c0::a|   |
> +---+---+  +---+---+   |
> |  /   |
> |  \   |
> | MTU: 1400 -> /   |
> |  \ 

protection fault in amap_wipeout

2024-03-01 Thread Alexander Bluhm
Hi,

An OpenBSD 7.4 machine on KVM running postgress and pagedaemon
crashed in amap_wipeout().

bluhm

kernel: protection fault trap, code=0
Stopped at  amap_wipeout+0x76:  movq	%rcx,0x28(%rax)

ddb{3}> show panic
the kernel did not panic

ddb{3}> trace
amap_wipeout(fd8015b154d0) at amap_wipeout+0x76
uvm_fault_check(8000232d6a20,8000232d6a58,8000232d6a80) at uvm_faul
t_check+0x2ad
uvm_fault(fd811d150748,7d42519fb000,0,1) at uvm_fault+0xfb
upageflttrap(8000232d6b80,7d42519fb3c0) at upageflttrap+0x65
usertrap(8000232d6b80) at usertrap+0x1ee
recall_trap() at recall_trap+0x8
end of kernel
end trace frame: 0x7d42519fb3f0, count: -6

ddb{3}> show register
rdi   0x82473f30amap_list_lock
rsi   0x824f4b50uvm_amap_chunk_pool
rbp   0x8000232d6880
rbx  0xe
rdx   0xfe00
rcx 0x63002f00740069
rax 0x72007200650067
r8 0
r9   0x1
r10   0x8000232d6558
r11   0xa409b3b14c737625
r12   0x8000232d6a80
r13   0x8000232d6a80
r14   0xfd8015b154d0
r15   0x8000232d6a58
rip   0x8132b746amap_wipeout+0x76
cs   0x8
rflags   0x10202__ALIGN_SIZE+0xf202
rsp   0x8000232d6830
ss 0
amap_wipeout+0x76:  movq	%rcx,0x28(%rax)

ddb{3}> x/s version
version:OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023\01
2dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012

ddb{3}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*38518  517051  50847503  7   0postgres
 82624  208679  99182   1001  30x200082  kqreadnode
 82624  119588  99182   1001  3   0x4200082  kqreadnode
 82624  122193  99182   1001  3   0x4200082  fsleepnode
 82624   66095  99182   1001  3   0x4200082  fsleepnode
 82624  317454  99182   1001  3   0x4200082  fsleepnode
 82624  284088  99182   1001  3   0x4200082  fsleepnode
 82624  234719  99182   1001  3   0x4200082  fsleepnode
 31531  190400  99182   1001  30x200082  kqreadnode
 31531  185654  99182   1001  3   0x4200082  kqreadnode
 31531  229761  99182   1001  3   0x4200082  fsleepnode
 31531  117978  99182   1001  3   0x4200082  fsleepnode
 31531  334870  99182   1001  3   0x4200082  fsleepnode
 31531  312488  99182   1001  3   0x4200082  fsleepnode
 31531  167646  99182   1001  3   0x4200082  fsleepnode
 89435  492203  99182   1001  30x200082  kqreadnode
 89435  460828  99182   1001  3   0x4200082  kqreadnode
 89435   83554  99182   1001  3   0x4200082  fsleepnode
 89435  293193  99182   1001  3   0x4200082  fsleepnode
 89435 704  99182   1001  3   0x4200082  fsleepnode
 89435  359138  99182   1001  3   0x4200082  fsleepnode
 89435   19846  99182   1001  3   0x4200082  fsleepnode
  8671  276508  99182   1001  30x200082  fsleepnode
  8671  322042  99182   1001  3   0x4200082  kqreadnode
  8671  351444  99182   1001  3   0x422  flt_noram5node
  8671  260721  99182   1001  3   0x4200082  fsleepnode
  8671   75501  99182   1001  3   0x422  flt_noram5node
  8671  141315  99182   1001  3   0x4200082  fsleepnode
  8671  287176  99182   1001  3   0x4200082  fsleepnode
 95187  370471  99182   1001  30x200082  kqreadnode
 95187  210159  99182   1001  3   0x4200082  kqreadnode
 95187  229193  99182   1001  3   0x4200082  fsleepnode
 95187  300771  99182   1001  3   0x4200082  fsleepnode
 95187  421388  99182   1001  3   0x4200082  fsleepnode
 95187  397767  99182   1001  3   0x4200082  fsleepnode
 95187   40543  99182   1001  3   0x4200082  fsleepnode
  2278  484811  99182   1001  30x200082  kqreadnode
  2278  400491  99182   1001  3   0x4200082  kqreadnode
  2278  371099  99182   1001  3   0x4200082  fsleepnode
  2278  362176  99182   1001  3   0x4200082  fsleepnode
  2278  136151  99182   1001  3   0x4200082  fsleepnode
  2278  319181  99182   1001  3   0x4200082  fsleepnode
  2278  382201  99182   1001  3   0x4200082  fsleepnode
 38833   71526  99182   1001  30x22  flt_noram5node
 38833  499573  99182   1001  3   0x4200082  kqreadnode
 38833  391427  99182   1001  3   0x422  flt_noram5node
 38833  274546  99182   1001  3   0x422  flt_noram5node
 38833   33206  99182   1001  3   0x422  flt_noram5node
 38833  428149  99182   1001  3   0x422  flt_noram5node
 38833  432234  99182   1001  

dwqe ifconfig down panic

2024-02-29 Thread Alexander Bluhm
Hi,

When doing flood ping transmit from a machine and simultaneously
ifconfig down/up in a loop, dwqe(4) interface driver crashes.

dwqe_down() contains an interrupt barrier, but somehow it does not
work.  Immediately after Xspllower() a transmit interrupt is
processed.

bluhm

kernel: protection fault trap, code=0
Stopped at  m_tag_delete_chain+0x30:	movq	0(%rsi),%rax

ddb{0}> trace
m_tag_delete_chain(fd806bfa5300) at m_tag_delete_chain+0x30
m_free(fd806bfa5300) at m_free+0x9e
m_freem(fd806bfa5300) at m_freem+0x38
dwqe_tx_proc(80304800) at dwqe_tx_proc+0x194
dwqe_intr(80304800) at dwqe_intr+0x9b
intr_handler(80003f86e760,805f4f80) at intr_handler+0x72
Xintr_ioapic_edge36_untramp() at Xintr_ioapic_edge36_untramp+0x18f
Xspllower() at Xspllower+0x1d
dwqe_ioctl(80304870,80206910,80003f86e990) at dwqe_ioctl+0x18c
ifioctl(fd81ffabe1e8,80206910,80003f86e990,80003f94e550) at 
ifioctl+0x726
sys_ioctl(80003f94e550,80003f86eb50,80003f86eac0) at sys_ioctl+0x2af
syscall(80003f86eb50) at syscall+0x55b
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x73ef48509270, count: -13

ddb{0}> show register
rdi   0xfd806bfa5300
rsi   0xdeafbeaddeafbead
rbp   0x80003f86e5f0
rbx0xf40
rdx0
rcx0
rax   0xab56__ALIGN_SIZE+0x9b56
r8  0x90
r9 0x24634ac__kernel_rodata_phys+0x3624ac
r10   0xe676ed611cc13e4f
r11   0xd2619954b795f246
r12   0x81110f48
r13   0xfd807282
r14   0xfd806bfa5300
r15   0xfd805f6def00
rip   0x81daae80m_tag_delete_chain+0x30
cs   0x8
rflags   0x10282__ALIGN_SIZE+0xf282
rsp   0x80003f86e5d0
ss  0x10
m_tag_delete_chain+0x30:	movq	0(%rsi),%rax

ddb{0}> x/s version
version:OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024\012   
 r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012

ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*70039   16536  80360  0  7   0x803ifconfig
 41531  214934  36719 51  3   0x8100033  netlock   ping

OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024
r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8038207488 (7665MB)
avail mem = 7773556736 (7413MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x769c7000 (85 entries)
bios0: vendor American Megatrends Inc. version "1.02.10" date 06/27/2022
efi0 at bios0: UEFI 2.7
efi0: American Megatrends rev 0x50013
acpi0 at bios0: ACPI 6.2
acpi0: sleep states S0 S5
acpi0: tables
acpimcfg0: addr 0xc000, bus 0-255
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel Atom(R) x6425RE Processor @ 1.90GHz, 1895.90 MHz, 06-96-01, patch 
0017
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,MISC_PKG_CT,ENERGY_FILT,FB_CLEAR,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
12-way L2 cache, 4MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.2.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel Atom(R) x6425RE Processor @ 1.90GHz, 1895.90 MHz, 06-96-01, patch 
0017
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,MISC_PKG_CT,ENERGY_FILT,FB_CLEAR,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
12-way L2 cache, 4MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: 

Re: TSO em(4) problem

2024-02-01 Thread Alexander Bluhm
On Tue, Jan 30, 2024 at 02:32:24PM +0100, Hrvoje Popovski wrote:
> yes, and forwarding only without pf.
> I'm sending traffic from host connected to vlan/ix0 and forward through
> em5 to other host.
> I'm sending 1Gbps of traffic with cisco t-rex

I cannot reproduce.

ix0 at pci6 dev 0 function 0 "Intel 82599" rev 0x01, msix, 8 queues, address 
90:e2:ba:d6:23:68
em1 at pci7 dev 0 function 1 "Intel I350" rev 0x01: msi, address 
a0:36:9f:0a:4a:c5

root@ot42:.../~# ifconfig ix0 hwfeatures
ix0: flags=2008843 mtu 1500

hwfeatures=71b7
 hardmtu 9198
lladdr 90:e2:ba:d6:23:68
description: Intel 82599
index 5 priority 0 llprio 3
media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
status: active

root@ot42:.../~# ifconfig em1 hwfeatures
em1: flags=8c43 mtu 1500

hwfeatures=31b7
 hardmtu 9216
lladdr a0:36:9f:0a:4a:c5
description: Intel I350
index 8 priority 0 llprio 3
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet 10.10.22.3 netmask 0xff00 broadcast 10.10.22.255

root@ot42:.../~# ifconfig vlan0 hwfeatures
vlan0: flags=8843 mtu 1500

hwfeatures=3187
 hardmtu 9198
lladdr 90:e2:ba:d6:23:68
index 24 priority 0 llprio 3
encap: vnetid 221 parent ix0 txprio packet rxprio outer
groups: vlan
media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
status: active
inet 10.10.21.2 netmask 0xff00 broadcast 10.10.21.255

root@ot42:.../~# pfctl -si
Status: Disabled for 0 days 00:03:42 Debug: err

Running tcpbench -n100 from Linux via OpenBSD forwarding to Linux.
Simultaneous udpbench to create traffic mixture.

root@ot42:.../~# netstat -ss | egrep 'TSO|LRO'
1188 output TSO packets software chopped
33086906 output TSO packets hardware processed
265855748 output TSO packets generated
31090975 input LRO generated packets from hardware
176482178 input LRO coalesced packets by network device

Lot of LRO and TSO.  Running diff below, which reverts em TSO backout
and adds sparc64 fix.

Hrvoje: What is different in your lab?

bluhm

Index: dev/pci/if_em.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -r1.371 if_em.c
--- dev/pci/if_em.c 28 Jan 2024 18:42:58 -  1.371
+++ dev/pci/if_em.c 29 Jan 2024 14:37:36 -
@@ -291,6 +291,8 @@ void em_receive_checksum(struct em_softc
 struct mbuf *);
 u_int  em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int,
u_int32_t *, u_int32_t *);
+u_int  em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
+   u_int32_t *);
 u_int  em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
u_int32_t *);
 void em_iff(struct em_softc *);
@@ -1188,7 +1190,7 @@ em_flowstatus(struct em_softc *sc)
  *
  *  This routine maps the mbufs to tx descriptors.
  *
- *  return 0 on success, positive on failure
+ *  return 0 on failure, positive on success
  **/
 u_int
 em_encap(struct em_queue *que, struct mbuf *m)
@@ -1236,7 +1238,15 @@ em_encap(struct em_queue *que, struct mb
}
 
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
> -   used += em_tx_ctx_setup(que, m, head, &txd_upper, &txd_lower);
+   if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) {
> +   used += em_tso_setup(que, m, head, &txd_upper,
> +   &txd_lower);
+   if (!used)
+   return (used);
+   } else {
> +   used += em_tx_ctx_setup(que, m, head, &txd_upper,
> +   &txd_lower);
+   }
} else if (sc->hw.mac_type >= em_82543) {
used += em_transmit_checksum_setup(que, m, head,
> &txd_upper, &txd_lower);
@@ -1569,6 +1579,21 @@ em_update_link_status(struct em_softc *s
ifp->if_link_state = link_state;
if_link_state_change(ifp);
}
+
+   /* Disable TSO for 10/100 speeds to avoid some hardware issues */
+   switch (sc->link_speed) {
+   case SPEED_10:
+   case SPEED_100:
+   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
+   ifp->if_capabilities &= ~IFCAP_TSOv4;
+   ifp->if_capabilities &= ~IFCAP_TSOv6;
+   }
+   break;
+   case SPEED_1000:
+   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210)
+   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
+   break;
+   }
 }
 
 /*
@@ -1988,6 +2013,7 @@ 

Re: TSO em(4) problem

2024-01-30 Thread Alexander Bluhm
On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote:
> On 30.1.2024. 9:27, Hrvoje Popovski wrote:
> > I will prepare one box for this kind of traffic and will contact you and
> > marcus
> > 
> >> In theory when going through vlan interface it should remove
> >> M_VLANTAG.  But something must be wrong and I wonder what.
> >>
> >> bluhm
> 
> Hi,
> 
> I've managed to trigger watchdog in lab. It couldn't be possible without
> bluhm@ information about ix vlan, thank you.

Great, now we can debug the details.

I have to know how ix and em are connected.

Do you have any bridge or veb?  Where are your vlan trunks?
Any aggr, trunk, carp?

Is my understanding of your setup correct?

ix -> vlan -> forward -> em

Can something more happen, like

ix -> forward -> em

bluhm

> Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123
> Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243
> Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463
> Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413
> Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195
> Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259
> Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333
> Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33
> Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459
> Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447
> 
> 
> em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address
> 00:1b:21:61:8a:94
> em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address
> 00:1b:21:61:8a:95
> em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address
> 00:25:90:5d:c9:98
> em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address
> 00:25:90:5d:c9:99
> em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9a
> em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9b
> em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9c
> em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address
> 00:25:90:5d:c9:9d
> 
> 
> smc4# netstat -sp tcp | grep LRO
> 0 input LRO packets passed through pseudo device
> 4696315 input LRO generated packets from hardware
> 13205047 input LRO coalesced packets by network device
> 0 input bad LRO packets dropped
> smc4# netstat -sp tcp | grep TSO
> 0 output TSO packets software chopped
> 3672 output TSO packets hardware processed
> 0 output TSO packets generated
> 0 output TSO packets dropped
> 
> 
> 
> 
> smc4# ifconfig em5 hwfeatures
> em5: flags=8c43 mtu 1500
>  
> hwfeatures=31b7
>  hardmtu 9216
> lladdr 00:25:90:5d:c9:9b
> index 8 priority 0 llprio 3
> media: Ethernet autoselect (1000baseT
> full-duplex,master,rxpause,txpause)
> status: active
> inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255
> 



Re: TSO em(4) problem

2024-01-29 Thread Alexander Bluhm
On Sun, Jan 28, 2024 at 07:46:29PM +0100, Marcus Glocker wrote:
> Anyway, the TSO support just has been backed out.  Thanks again for all
> your testing!

I am still interested to get em with TSO working if possible.  Most
use cases work fine.  If there is a bug in our driver, we may fix
it.  If it is a hardware bug, we should identify the broken chip
revisions.

Here is the backed out em TSO diff together with the TCP header
diff for sparc64.

Kurt, could you still test this in your next sparc64 build?

bluhm

Index: dev/pci/if_em.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -r1.371 if_em.c
--- dev/pci/if_em.c 28 Jan 2024 18:42:58 -  1.371
+++ dev/pci/if_em.c 29 Jan 2024 14:37:36 -
@@ -291,6 +291,8 @@ void em_receive_checksum(struct em_softc
 struct mbuf *);
 u_int  em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int,
u_int32_t *, u_int32_t *);
+u_int  em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
+   u_int32_t *);
 u_int  em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
u_int32_t *);
 void em_iff(struct em_softc *);
@@ -1188,7 +1190,7 @@ em_flowstatus(struct em_softc *sc)
  *
  *  This routine maps the mbufs to tx descriptors.
  *
- *  return 0 on success, positive on failure
+ *  return 0 on failure, positive on success
  **/
 u_int
 em_encap(struct em_queue *que, struct mbuf *m)
@@ -1236,7 +1238,15 @@ em_encap(struct em_queue *que, struct mb
}
 
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
-   used += em_tx_ctx_setup(que, m, head, &txd_upper, &txd_lower);
+   if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) {
+   used += em_tso_setup(que, m, head, &txd_upper,
+   &txd_lower);
+   if (!used)
+   return (used);
+   } else {
+   used += em_tx_ctx_setup(que, m, head, &txd_upper,
+   &txd_lower);
+   }
} else if (sc->hw.mac_type >= em_82543) {
used += em_transmit_checksum_setup(que, m, head,
&txd_upper, &txd_lower);
@@ -1569,6 +1579,21 @@ em_update_link_status(struct em_softc *s
ifp->if_link_state = link_state;
if_link_state_change(ifp);
}
+
+   /* Disable TSO for 10/100 speeds to avoid some hardware issues */
+   switch (sc->link_speed) {
+   case SPEED_10:
+   case SPEED_100:
+   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
+   ifp->if_capabilities &= ~IFCAP_TSOv4;
+   ifp->if_capabilities &= ~IFCAP_TSOv6;
+   }
+   break;
+   case SPEED_1000:
+   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210)
+   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
+   break;
+   }
 }
 
 /*
@@ -1988,6 +2013,7 @@ em_setup_interface(struct em_softc *sc)
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
ifp->if_capabilities |= IFCAP_CSUM_IPv4;
ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
+   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
}
 
/* 
@@ -2231,9 +2257,9 @@ em_setup_transmit_structures(struct em_s
 
for (i = 0; i < sc->sc_tx_slots; i++) {
pkt = >tx.sc_tx_pkts_ring[i];
-   error = bus_dmamap_create(sc->sc_dmat, MAX_JUMBO_FRAME_SIZE,
+   error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
-   MAX_JUMBO_FRAME_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
+   EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
if (error != 0) {
printf("%s: Unable to create TX DMA map\n",
DEVNAME(sc));
@@ -2403,6 +2429,81 @@ em_free_transmit_structures(struct em_so
0, que->tx.sc_tx_dma.dma_map->dm_mapsize,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
}
+}
+
+u_int
+em_tso_setup(struct em_queue *que, struct mbuf *mp, u_int head,
+u_int32_t *olinfo_status, u_int32_t *cmd_type_len)
+{
+   struct ether_extracted ext;
+   struct e1000_adv_tx_context_desc *TD;
+   uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0;
+   uint32_t paylen = 0;
+   uint8_t iphlen = 0;
+
+   *olinfo_status = 0;
+   *cmd_type_len = 0;
+   TD = (struct e1000_adv_tx_context_desc 

Re: TSO em(4) problem

2024-01-29 Thread Alexander Bluhm
On Sat, Jan 27, 2024 at 08:08:35AM +0100, Hrvoje Popovski wrote:
> On 26.1.2024. 22:47, Alexander Bluhm wrote:
> > On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> >> I've managed to reproduce the TSO em problem on another setup, unfortunately
> >> production.
> > What helped debugging a similar issue with ixl(4) and TSO was to
> > remove all TSO specific code from the driver.  Then only this part
> > remains from the original em(4) TSO diff.
> > 
> > error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
> > EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
> > EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
> > 
> > The parameters that changed when adding TSO are:
> > 
> > bus_size_t size:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
> > bus_size_t maxsegsz:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
> > 
> > I suspect that this is the cause for the regression as disabling
> > TSO did not help.  Would it be possible to run the diff below?  I
> > expect that the problem will still be there.  But then we know it
> > must be the change of one of the bus_dmamap_create() arguments.
> > 
> > bluhm
> 
> Hi,
> 
> with this diff em0 seems happy and em watchdog is gone.

This is very interesting.  That means that the bus_dmamap_create()
argument does not cause the regression.

Did you see "output TSO packets hardware processed" anywhere in
netstat -s?  In some iteration of testing you turned TSO off with
sysctl net.inet.tcp.tso=0, but it did not help.  So no TSO packets
came from the stack.

In another mail you mentioned

> Setup is very simple
> em0 - carp <- uplink
> em1 - pfsync
> ix1 - vlans - carp

ix supports LRO.  If you forward from ix1 to em0 the LRO packets
from ix hardware are split by TSO on em hardware.  And the ix does
vlan offloading + LRO, so em must do vlan offloading properly with
TSO.  Or do you use a vlan interface?

Does it help to disable LRO, ifconfig ix1 -tcplro ?

I see this vlan code with mac_type checks.  Can we end up in a
configuration where we enable TSO but cannot do VLAN offloading?

#if NVLAN > 0
/* Find out if we are in VLAN mode */
if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 ||
sc->hw.mac_type > em_i210)) {
/* Set the VLAN id */
desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag);

/* Tell hardware to add tag */
desc->lower.data |= htole32(E1000_TXD_CMD_VLE);
}
#endif

Hrvoje, I know you do great tests in your lab.  Did you try this
setup:

Send bulk TCP traffic in vlan that will trigger LRO.
Do VLAN + LRO offloading in ix.
Forward it to em with TSO.

In theory when going through vlan interface it should remove
M_VLANTAG.  But something must be wrong and I wonder what.

bluhm



Re: TSO em(4) problem

2024-01-26 Thread Alexander Bluhm
On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> I've managed to reproduce the TSO em problem on another setup, unfortunately
> production.

What helped debugging a similar issue with ixl(4) and TSO was to
remove all TSO specific code from the driver.  Then only this part
remains from the original em(4) TSO diff.

error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);

The parameters that changed when adding TSO are:

bus_size_t size:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
bus_size_t maxsegsz:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096

I suspect that this is the cause for the regression as disabling
TSO did not help.  Would it be possible to run the diff below?  I
expect that the problem will still be there.  But then we know it
must be the change of one of the bus_dmamap_create() arguments.

bluhm

Index: dev/pci/if_em.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -r1.370 if_em.c
--- dev/pci/if_em.c 31 Dec 2023 08:42:33 -  1.370
+++ dev/pci/if_em.c 26 Jan 2024 21:32:08 -
@@ -291,8 +291,6 @@ void em_receive_checksum(struct em_softc
 struct mbuf *);
 u_int  em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int,
u_int32_t *, u_int32_t *);
-u_int  em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
-   u_int32_t *);
 u_int  em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
u_int32_t *);
 void em_iff(struct em_softc *);
@@ -1238,15 +1236,7 @@ em_encap(struct em_queue *que, struct mb
}
 
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
-   if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) {
-   used += em_tso_setup(que, m, head, &txd_upper,
-   &txd_lower);
-   if (!used)
-   return (used);
-   } else {
-   used += em_tx_ctx_setup(que, m, head, &txd_upper,
-   &txd_lower);
-   }
+   used += em_tx_ctx_setup(que, m, head, &txd_upper, &txd_lower);
} else if (sc->hw.mac_type >= em_82543) {
used += em_transmit_checksum_setup(que, m, head,
&txd_upper, &txd_lower);
@@ -1579,21 +1569,6 @@ em_update_link_status(struct em_softc *s
ifp->if_link_state = link_state;
if_link_state_change(ifp);
}
-
-   /* Disable TSO for 10/100 speeds to avoid some hardware issues */
-   switch (sc->link_speed) {
-   case SPEED_10:
-   case SPEED_100:
-   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
-   ifp->if_capabilities &= ~IFCAP_TSOv4;
-   ifp->if_capabilities &= ~IFCAP_TSOv6;
-   }
-   break;
-   case SPEED_1000:
-   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210)
-   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
-   break;
-   }
 }
 
 /*
@@ -2013,7 +1988,6 @@ em_setup_interface(struct em_softc *sc)
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
ifp->if_capabilities |= IFCAP_CSUM_IPv4;
ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
-   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
}
 
/* 
@@ -2429,81 +2403,6 @@ em_free_transmit_structures(struct em_so
0, que->tx.sc_tx_dma.dma_map->dm_mapsize,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
}
-}
-
-u_int
-em_tso_setup(struct em_queue *que, struct mbuf *mp, u_int head,
-u_int32_t *olinfo_status, u_int32_t *cmd_type_len)
-{
-   struct ether_extracted ext;
-   struct e1000_adv_tx_context_desc *TD;
-   uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0;
-   uint32_t paylen = 0;
-   uint8_t iphlen = 0;
-
-   *olinfo_status = 0;
-   *cmd_type_len = 0;
-   TD = (struct e1000_adv_tx_context_desc *)&que->tx.sc_tx_desc_ring[head];
-
-#if NVLAN > 0
-   if (ISSET(mp->m_flags, M_VLANTAG)) {
-   uint32_t vtag = mp->m_pkthdr.ether_vtag;
-   vlan_macip_lens |= vtag << E1000_ADVTXD_VLAN_SHIFT;
-   *cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
-   }
-#endif
-
-   ether_extract_headers(mp, );
-   if (ext.tcp == NULL)
-   goto out;
-
-   vlan_macip_lens |= (sizeof(*ext.eh) << E1000_ADVTXD_MACLEN_SHIFT);
-
-   if (ext.ip4) {
-   iphlen = ext.ip4->ip_hl << 2;
-
-   type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
-   *olinfo_status |= 

Re: OpenBSD 7.4/amd64 on APU4D4 - kernel panic

2024-01-17 Thread Alexander Bluhm
On Wed, Jan 17, 2024 at 11:46:36AM +0100, Radek wrote:
> ddb{0}> show panic
> *cpu0: kernel diagnostic assertion "pkt->pkt_m != NULL" failed: file 
> "/usr/src/
> sys/dev/pci/if_em.c", line 2580

> OpenBSD 7.4 (GENERIC.MP) #0: Fri Jan 12 09:31:37 CET 2024
> r...@krz74.krz:/usr/src/sys/arch/amd64/compile/GENERIC.MP

It looks like you are running the 7.4 release with a self-compiled
kernel.  This diff from -current should fix your problem.

Protect em(4) refill timeout with splnet.

From time to time "pkt->pkt_m == NULL" or "m != NULL" assertions
were hit in the em driver.  Stack trace shows that em refill timeout
was interrupted by em interrupt.  Doing em_rxfill() and em_rxeof()
simultaneously cannot be correct.  Protect softclock in em_rxrefill()
with splnet().

OK mglocker@

Index: dev/pci/if_em.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -r1.368 if_em.c
--- dev/pci/if_em.c 3 Dec 2023 00:19:25 -   1.368
+++ dev/pci/if_em.c 29 Dec 2023 14:44:34 -
@@ -285,6 +285,7 @@ int  em_allocate_transmit_structures(str
 int  em_allocate_desc_rings(struct em_softc *);
 int  em_rxfill(struct em_queue *);
 void em_rxrefill(void *);
+void em_rxrefill_locked(struct em_queue *);
 int  em_rxeof(struct em_queue *);
 void em_receive_checksum(struct em_softc *, struct em_rx_desc *,
 struct mbuf *);
@@ -1022,7 +1023,7 @@ em_intr(void *arg)
if (ifp->if_flags & IFF_RUNNING) {
em_txeof(que);
if (em_rxeof(que))
-   em_rxrefill(que);
+   em_rxrefill_locked(que);
}
 
/* Link status change */
@@ -2958,6 +2959,16 @@ void
 em_rxrefill(void *arg)
 {
struct em_queue *que = arg;
+   int s;
+
+   s = splnet();
+   em_rxrefill_locked(que);
+   splx(s);
+}
+
+void
+em_rxrefill_locked(struct em_queue *que)
+{
struct em_softc *sc = que->sc;
 
if (em_rxfill(que))
@@ -3954,7 +3965,7 @@ em_queue_intr_msix(void *vque)
if (ifp->if_flags & IFF_RUNNING) {
em_txeof(que);
if (em_rxeof(que))
-   em_rxrefill(que);
+   em_rxrefill_locked(que);
}
 
em_enable_queue_intr_msix(que);



Re: kernel panic: ip_output no HDR

2024-01-15 Thread Alexander Bluhm
On Mon, Jan 15, 2024 at 01:42:55PM -0300, K R wrote:
> >Synopsis:  kernel panic: ip_output no HDR
> >Category:  kernel amd64
> >Environment:
> System  : OpenBSD 7.4
> Details : OpenBSD 7.4-stable (GENERIC.MP) #0: Mon Dec 11
> 19:17:55 UTC 2023
> 
> root@server.mydomain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> 
> ...
> pppoe0: received unexpected PADO
> panic: ip_output no HDR
> Starting stack trace...
> panic() at panic+0x130
> ip_output() at ip_output+0x126
udp_output() at udp_output+0x3be
> sosend() at sosend+0x37f
> pflow_output_process() at pflow_output_process+0x67
> taskq_thread() at taskq_thread+0x100
> end trace frame: 0x0, count: 251
> End of stack frame.
> 
> Trace lines attached below.
> 
> >How-To-Repeat:
> 
> The machine export pflows to localhost: those are read by nfcapd, part
> of nfdump, from packages.  This was a very stable setup -- the kernel
> panic started after upgrading to 7.4.
> 
> This machine runs 7.4-stable.  Same panic was also observed on another
> machine, running 7.4-release + all syspatches, also using pflow.

There were some commits to if_pflow.c in -current.  mvs@ has fixed
a bunch of races.  Would it be possible for you to install a -current
snapshot on the machine and test if the problem persists?

bluhm



Re: vmm guest crash in vio

2024-01-09 Thread Alexander Bluhm
On Tue, Jan 09, 2024 at 07:49:16PM +0100, Stefan Fritsch wrote:
> @bluhm: Does the attached patch fix the panic? 

Yes.  My test does not crash the patched guest anymore.

bluhm

> The fdt part is completely untested, testers welcome.
> 
> diff --git a/sys/dev/fdt/virtio_mmio.c b/sys/dev/fdt/virtio_mmio.c
> index 4f1e9eba9b7..27fb17d6102 100644
> --- a/sys/dev/fdt/virtio_mmio.c
> +++ b/sys/dev/fdt/virtio_mmio.c
> @@ -200,11 +200,19 @@ virtio_mmio_set_status(struct virtio_softc *vsc, int 
> status)
>   struct virtio_mmio_softc *sc = (struct virtio_mmio_softc *)vsc;
>   int old = 0;
>  
> - if (status != 0)
> + if (status == 0) {
> + bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> + 0);
> + while (bus_space_read_4(sc->sc_iot, sc->sc_ioh,
> + VIRTIO_MMIO_STATUS) != 0) {
> + CPU_BUSY_CYCLE();
> + }
> + } else  {
>   old = bus_space_read_4(sc->sc_iot, sc->sc_ioh,
> -VIRTIO_MMIO_STATUS);
> - bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> -   status|old);
> + VIRTIO_MMIO_STATUS);
> + bus_space_write_4(sc->sc_iot, sc->sc_ioh, VIRTIO_MMIO_STATUS,
> + status|old);
> + }
>  }
>  
>  int
> diff --git a/sys/dev/pci/virtio_pci.c b/sys/dev/pci/virtio_pci.c
> index 398dc960f6d..ef95c834823 100644
> --- a/sys/dev/pci/virtio_pci.c
> +++ b/sys/dev/pci/virtio_pci.c
> @@ -282,15 +282,29 @@ virtio_pci_set_status(struct virtio_softc *vsc, int 
> status)
>   int old = 0;
>  
>   if (sc->sc_sc.sc_version_1) {
> - if (status != 0)
> + if (status == 0) {
> + CWRITE(sc, device_status, 0);
> + while (CREAD(sc, device_status) != 0) {
> + CPU_BUSY_CYCLE();
> + }
> + } else {
>   old = CREAD(sc, device_status);
> - CWRITE(sc, device_status, status|old);
> + CWRITE(sc, device_status, status|old);
> + }
>   } else {
> - if (status != 0)
> + if (status == 0) {
> + bus_space_write_1(sc->sc_iot, sc->sc_ioh,
> + VIRTIO_CONFIG_DEVICE_STATUS, status|old);
> + while (bus_space_read_1(sc->sc_iot, sc->sc_ioh,
> + VIRTIO_CONFIG_DEVICE_STATUS) != 0) {
> + CPU_BUSY_CYCLE();
> + }
> + } else {
>   old = bus_space_read_1(sc->sc_iot, sc->sc_ioh,
>   VIRTIO_CONFIG_DEVICE_STATUS);
> - bus_space_write_1(sc->sc_iot, sc->sc_ioh,
> - VIRTIO_CONFIG_DEVICE_STATUS, status|old);
> + bus_space_write_1(sc->sc_iot, sc->sc_ioh,
> + VIRTIO_CONFIG_DEVICE_STATUS, status|old);
> + }
>   }
>  }
>  



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-09 Thread Alexander Bluhm
On Tue, Jan 09, 2024 at 12:04:17PM +1000, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 10:14:12AM +0100, Hrvoje Popovski wrote:
> > On 3.1.2024. 7:51, Jonathan Matthew wrote:
> > > On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> > >> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > >>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > >>> interface and ifconfig bnxt0 down/up at the same time I can trigger
> > >>> panic. Panic can be triggered without kettenis@ diff...
> > >> It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> > >> receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> > >> My panic looks different.
> > > It looks like I wasn't trying very hard when I wrote bnxt_down().
> > > I think there's also a problem with bnxt_up() unwinding after failure
> > > in various places, but that's a different issue.
> > > 
> > > This makes it more resilient for me, though it still logs
> > > 'bnxt0: unexpected completion type 3' a lot if I take the interface
> > > down while it's in use.  I'll look at that separately.
> > 
> > Hi,
> > 
> > with this diff I can still panic box with ifconfig up/down but not as
> > fast as without it
> 
> Right, this is the other problem where bnxt_up() wasn't cleaning up properly
> after failing part way through.  This diff should fix that, but I don't think
> it will fix the 'HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error'
> problem, so the interface will still stop working at that point.

OK bluhm@

> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c 10 Nov 2023 15:51:20 -  1.39
> +++ if_bnxt.c 9 Jan 2024 01:59:38 -
> @@ -1073,7 +1081,7 @@ bnxt_up(struct bnxt_softc *sc)
>   if (bnxt_hwrm_vnic_ctx_alloc(sc, >sc_vnic.rss_id) != 0) {
>   printf("%s: failed to allocate vnic rss context\n",
>   DEVNAME(sc));
> - goto down_queues;
> + goto down_all_queues;
>   }
>  
>   sc->sc_vnic.id = (uint16_t)HWRM_NA_SIGNATURE;
> @@ -1139,8 +1147,11 @@ dealloc_vnic:
>   bnxt_hwrm_vnic_free(sc, &sc->sc_vnic);
>  dealloc_vnic_ctx:
>   bnxt_hwrm_vnic_ctx_free(sc, &sc->sc_vnic.rss_id);
> +
> +down_all_queues:
> + i = sc->sc_nqueues;
>  down_queues:
> - for (i = 0; i < sc->sc_nqueues; i++)
> + while (i-- > 0)
>   bnxt_queue_down(sc, &sc->sc_queues[i]);
>  
>   bnxt_dmamem_free(sc, sc->sc_rx_cfg);



vmm guest crash in vio

2024-01-08 Thread Alexander Bluhm
Hi,

When running a guest in vmm and doing ifconfig operations on vio
interface, I can crash the guest.

I run these loops in the guest:

while doas ifconfig vio1 inet 10.188.234.74/24; do :; done
while doas ifconfig vio1 -inet; do :; done
while doas ifconfig vio1 down; do :; done

And from host I ping the guest:

ping -f 10.188.234.74

Then I see various kind of mbuf corruption:

kernel: protection fault trap, code=0
Stopped at  pool_do_put+0xc9:   movq    0x8(%rcx),%rcx
ddb> trace
pool_do_put(82519e30,fd807db89000) at pool_do_put+0xc9
pool_put(82519e30,fd807db89000) at pool_put+0x53
m_extfree(fd807d330300) at m_extfree+0xa5
m_free(fd807d330300) at m_free+0x97
soreceive(fd806f33ac88,0,80002a3e97f8,0,0,80002a3e9724,76299c7990301bf1) at soreceive+0xa3e
soo_read(fd807ed4a168,80002a3e97f8,0) at soo_read+0x4a
dofilereadv(80002a399548,7,80002a3e97f8,0,80002a3e98c0) at dofilereadv+0x143
sys_read(80002a399548,80002a3e9870,80002a3e98c0) at sys_read+0x55
syscall(80002a3e9930) at syscall+0x33a
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7469f8836930, count: -10

pool_do_put(8259a500,fd807e7fa800) at pool_do_put+0xc9
pool_put(8259a500,fd807e7fa800) at pool_put+0x53
m_extfree(fd807f838a00) at m_extfree+0xa5
m_free(fd807f838a00) at m_free+0x97
m_freem(fd807f838a00) at m_freem+0x38
vio_txeof(80030118) at vio_txeof+0x11d
vio_tx_intr(80030118) at vio_tx_intr+0x31
virtio_check_vqs(80024800) at virtio_check_vqs+0x102
virtio_pci_legacy_intr(80024800) at virtio_pci_legacy_intr+0x65
intr_handler(80002a52dae0,80081000) at intr_handler+0x3c
Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
Xspllower() at Xspllower+0x1d
vio_ioctl(800822a8,80206910,80002a52dd00) at vio_ioctl+0x16a
ifioctl(fd807c0ba7a0,80206910,80002a52dd00,80002a41c810) at ifioctl+0x721
sys_ioctl(80002a41c810,80002a52de00,80002a52de50) at sys_ioctl+0x2ab
syscall(80002a52dec0) at syscall+0x33a
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7b3d36d55eb0, count: -17

panic: pool_do_get: mcl2k free list modified: page 0xfd80068bd000; item addr 0xfd80068bf800; offset 0x0=0xa != 0x83dcdb591c6b8bf
Stopped at  db_enter+0x14:  popq    %rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*143851  19121  0 0x3  00  ifconfig
db_enter() at db_enter+0x14
panic(8206e651) at panic+0xb5
pool_do_get(824a1b30,2,80002a4a55d4) at pool_do_get+0x320
pool_get(824a1b30,2) at pool_get+0x7d
m_clget(fd807c4e4f00,2,800) at m_clget+0x18d
rtm_msg1(e,80002a4a56f0) at rtm_msg1+0xde
rtm_ifchg(800822a8) at rtm_ifchg+0x65
if_down(800822a8) at if_down+0xa4
ifioctl(fd8006898978,80206910,80002a4a58c0,80002a474ff0) at ifioctl+0xcd5
sys_ioctl(80002a474ff0,80002a4a59c0,80002a4a5a10) at sys_ioctl+0x2ab
syscall(80002a4a5a80) at syscall+0x33a
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f6c22492130, count: 3

OpenBSD 7.4-current (GENERIC) #3213: Mon Jan  8 22:05:58 CET 2024

bluhm@t430s.bluhm.invalid:/home/bluhm/openbsd/cvs/src/sys/arch/amd64/compile/GENERIC*master
real mem = 2130706432 (2032MB)
avail mem = 2046525440 (1951MB)
random: boothowto does not indicate good seed
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.78 MHz, 06-3a-09
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,LONG,LAHF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 
8-way L2 cache, 4MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
cpu0: using VERW MDS workaround
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address 70:5f:ca:21:8d:74
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Network" rev 0x00
vio1 at virtio2: address 70:5f:ca:21:8d:84
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "Qumranet Virtio Network" rev 0x00
vio2 at virtio3: address 70:5f:ca:21:8d:94
virtio3: irq 7
virtio4 at pci0 dev 5 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio4
scsibus1 at vioblk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 10240MB, 512 bytes/sector, 20971520 sectors
virtio4: irq 9
virtio5 at pci0 dev 6 function 0 "Qumranet Virtio SCSI" rev 0x00
vioscsi0 at virtio5: qsize 128
scsibus2 at vioscsi0: 1 targets
cd0 

Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-03 Thread Alexander Bluhm
On Wed, Jan 03, 2024 at 04:51:39PM +1000, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
> > On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> > > While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> > > interface and ifconfig bnxt0 down/up at the same time I can trigger
> > > panic. Panic can be triggered without kettenis@ diff...
> > 
> > It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
> > receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
> > My panic looks different.
> 
> It looks like I wasn't trying very hard when I wrote bnxt_down().
> I think there's also a problem with bnxt_up() unwinding after failure
> in various places, but that's a different issue.
> 
> This makes it more resilient for me, though it still logs
> 'bnxt0: unexpected completion type 3' a lot if I take the interface
> down while it's in use.  I'll look at that separately.

Should we intr_barrier(sc->sc_queues[0].q_ihc) if sc->sc_intrmap == NULL ?

All these barriers make sense to me.  OK bluhm@

> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c 10 Nov 2023 15:51:20 -  1.39
> +++ if_bnxt.c 3 Jan 2024 06:36:02 -
> @@ -1158,12 +1159,16 @@ bnxt_down(struct bnxt_softc *sc)
>  
>   CLR(ifp->if_flags, IFF_RUNNING);
>  
> + intr_barrier(sc->sc_ih);
> +
>   for (i = 0; i < sc->sc_nqueues; i++) {
>   ifq_clr_oactive(ifp->if_ifqs[i]);
>   ifq_barrier(ifp->if_ifqs[i]);
> - /* intr barrier? */
>  
> - timeout_del(&sc->sc_queues[i].q_rx.rx_refill);
> + timeout_del_barrier(&sc->sc_queues[i].q_rx.rx_refill);
> +
> + if (sc->sc_intrmap != NULL)
> + intr_barrier(sc->sc_queues[i].q_ihc);
>   }
>  
>   bnxt_hwrm_free_filter(sc, &sc->sc_vnic);
> 



Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.

2024-01-02 Thread Alexander Bluhm
On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
> interface and ifconfig bnxt0 down/up at the same time I can trigger
> panic. Panic can be triggered without kettenis@ diff...

It is easy to reproduce.  ifconfig bnxt1 down/up a few times while
receiving TCP traffic with iperf3.  Machine still has kettenis@ diff.
My panic looks different.

root@ot42:.../~# ifconfig bnxt1 down
bnxt1: unexpected completion type 3
...
bnxt1: unexpected completion type 3
uvm_fault(0x8256c0b8, 0x30, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  bnxt_rx_fill+0x5f:  movq    0x30(%rdx),%rdx
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 452275   8801  00x13  0x4003  iperf3
 343849  34751  0 0x14000  0x2002  softnet1
 154248  41240  0 0x14000  0x2001  softnet0
bnxt_rx_fill(802df888) at bnxt_rx_fill+0x5f
bnxt_intr(802df888) at bnxt_intr+0x406
intr_handler(80005c04c040,800a7800) at intr_handler+0x72
Xintr_ioapic_edge1_untramp() at Xintr_ioapic_edge1_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x11f
sched_idle(80005a61fff0) at sched_idle+0x282
end trace frame: 0x0, count: 9
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{7}> show panic
*cpu7: uvm_fault(0x8256c0b8, 0x30, 0, 1) -> e
ddb{7}> trace
bnxt_rx_fill(802df888) at bnxt_rx_fill+0x5f
bnxt_intr(802df888) at bnxt_intr+0x406
intr_handler(80005c04c040,800a7800) at intr_handler+0x72
Xintr_ioapic_edge1_untramp() at Xintr_ioapic_edge1_untramp+0x18f
acpicpu_idle() at acpicpu_idle+0x11f
sched_idle(80005a61fff0) at sched_idle+0x282
end trace frame: 0x0, count: -6
ddb{7}> show register
rdi   0x802df958
rsi   0x802df918
rbp   0x80005c04bf20
rbx   0x802df024
rdx0
rcx0
rax  0x4
r80xcc01
r9   0x1
r10   0x7be05f26dfeb8079
r11   0x81c2c48b86f2e7bd
r12  0x1
r13  0x1
r14   0x802df888
r15   0x802df000
rip   0x81b6180f    bnxt_rx_fill+0x5f
cs   0x8
rflags   0x10202__ALIGN_SIZE+0xf202
rsp   0x80005c04bee0
ss  0x10
bnxt_rx_fill+0x5f:  movq    0x30(%rdx),%rdx

In my case, I would say rx->rx_ring_mem is NULL.
slots = bnxt_rx_fill_slots(sc, &rx->rx_ring,
BNXT_DMA_KVA(rx->rx_ring_mem), rx->rx_slots,
&rx->rx_prod, MCLBYTES,
RX_PROD_PKT_BD_TYPE_RX_PROD_PKT, slots);

For Hrvoje's panic it looks like tx->tx_slots is NULL.
bnxt_free_slots(sc, tx->tx_slots, tx->tx_ring.ring_size,
tx->tx_ring.ring_size);



Re: rw_enter: netlock locking against myself, 7.4+errata007, hyper-v hvn((4)

2024-01-02 Thread Alexander Bluhm
On Tue, Jan 02, 2024 at 09:11:07PM +0300, Vitaliy Makkoveev wrote:
> ifq_task_mtx initialized with IPL_NET priority, so this sequence
> started this.
> 
> THR1 mtx_enter(&ifq->ifq_task_mtx)
> THR2 splnet() /* hv_wait(), just before hv_intr() */
> THR1 mtx_leave(&ifq->ifq_task_mtx)
> THR1 `-> Xspllower()
> THR1 skip
> THR1   `-> hv_intr()
> 
> IMHO the spl*() protection in the network stack doesn't work as
> expected.

I would say it works exactly as I expect.

Xresume_hyperv_upcall is registered in amd64/intr.c with IPL_NET.

#if NHYPERV > 0
isp = malloc(sizeof (struct intrsource), M_DEVBUF, M_NOWAIT|M_ZERO);
if (isp == NULL)
panic("can't allocate fixed interrupt source");
isp->is_recurse = Xrecurse_hyperv_upcall;
isp->is_resume = Xresume_hyperv_upcall;
fake_hyperv_intrhand.ih_level = IPL_NET;
isp->is_handlers = &fake_hyperv_intrhand;
isp->is_pic = _pic;
ci->ci_isources[LIR_HYPERV] = isp;
#endif

mtx_leave(&ifq->ifq_task_mtx) reduces the SPL level below SPL_NET.

So this code in spllower() jumps directly into Xrecurse_hyperv_upcall.

2:  bsrq    %rax,%rax
    btrq    %rax,CPUVAR(IPENDING)
    movq    CPUVAR(ISOURCES)(,%rax,8),%rax
    movq    IS_RECURSE(%rax),%rax
    jmp     retpoline_rax
END(Xspllower)

Xrecurse_hyperv_upcall jumps to Xresume_hyperv_upcall which calls
hv_intr, no hv_wait() involved.



Re: rw_enter: netlock locking against myself, 7.4+errata007, hyper-v hvn((4)

2024-01-02 Thread Alexander Bluhm
On Tue, Jan 02, 2024 at 08:29:50PM +0300, Vitaliy Makkoveev wrote:
> > On 2 Jan 2024, at 20:16, Alexander Bluhm  wrote:
> > 
> > On Tue, Jan 02, 2024 at 03:45:10PM +, Stuart Henderson wrote:
> >> panic: rw_enter: netlock locking against myself
> >> Stopped at db_enter+0x14:  popq    %rbp
> >>TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >> *388963  11506755 0x2  00  snmpbulkwalk
> >> 320140  61046  0 0x14000  0x2001  reaper
> >> db_enter() at db_enter+0x14
> >> panic(820ebaaa) at panic+0xc3
> >> rw_enter(824bd2d0,2) at rw_enter+0x252
> >> hv_kvp(8248d738) at hv_kvp+0x705
> >> hv_event_intr(8008d000) at hv_event_intr+0x1ab
> >> hv_intr() at hv_intr+0x1a
> >> Xresume_hyperv_upcall() at Xresume_hyperv_upcall+0x27
> >> Xspllower() at Xspllower+0x1d
> >> task_add(80036200,801533c0) at task_add+0x83
> >> if_enqueue_ifq(80153060,fd80f6642900) at if_enqueue_ifq+0x6f
> >> ether_output(80153060,fd80f6642900,fd80c8ef5c78,fd811e6414d8) at ether_output+0x82
> >> if_output_tso(80153060,800025395d08,fd80c8ef5c78,fd811e6414d8,5dc) at if_output_tso+0xe1
> >> ip_output(fd80f6642900,0,fd80c8ef5c68,0,0,fd80c8ef5bf0,1da98542b9d8f733) at ip_output+0x817
> >> udp_output(fd80c8ef5bf0,fd80b468b400,fd80ac452c00,0) at udp_output+0x3be
> >> end trace frame: 0x800025395ee0, count: 0
> >> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> >> reports.  Insufficient info makes it difficult to find and fix bugs.
> >> ddb{0}> Stopped at x86_ipi_db+0x16:leave
> >> x86_ipi_db(80002250dff0) at x86_ipi_db+0x16
> >> x86_ipi_handler() at x86_ipi_handler+0x80
> >> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> >> _kernel_lock() at _kernel_lock+0xc2
> >> reaper(80002473c008) at reaper+0x133
> >> end trace frame: 0x0, count: 10
> > 
> > It is caused by this commit.
> > 
> > 
> > revision 1.20
> > date: 2023/09/23 13:01:12;  author: mvs;  state: Exp;  lines: +5 -9;  
> > commitid:
> > Gj5WdkySyube1RkO;
> > Use shared netlock to protect if_list and ifa_list walkthrough and ifnet
> > data access within kvp_get_ip_info().
> > 
> > ok bluhm
> > 
> > 
> > Net lock must not be acquired in interrupt context.  It is wrong
> > that hv tries to perform network operations from interrupt context.
> > Actually it does not do much networking, it is just reads some
> > interface addresses.  The kernel lock that was used before is also
> > questionable.  Maybe some writers do not hold it anymore.  There
> > was much conversion from kernel to net lock.
> > 
> > bluhm
> 
> This is not interrupt context. tsleep_nsec() indicates that. The
> problem lies in Xresume_hyperv_upcall() which can run this
> from interrupt context.

In this case it is a soft interrupt started by spllower().  I think
hv_wait() is not involved.

Xspllower() -> Xresume_hyperv_upcall -> hv_intr() -> hv_event_intr()
-> hv_channel_schedule() -> ch->ch_handler() -> hv_kvp() ->
hv_kvp_process() -> kvp_get_ip_info() -> NET_LOCK_SHARED()

Anyway, converting this to a task makes sense to me.  Let's see if
your CHF_BATCHED diff works as expected.

bluhm

> hv_wait(struct hv_softc *sc, int (*cond)(struct hv_softc *, struct hv_msg *),
> struct hv_msg *msg, void *wchan, const char *wmsg)
> {
> int s;
> 
> KASSERT(cold ? msg->msg_flags & MSGF_NOSLEEP : 1);
> 
> while (!cond(sc, msg)) {
> if (msg->msg_flags & MSGF_NOSLEEP) {
> delay(1000);
> s = splnet();
> hv_intr();
> splx(s);
> } else {
> tsleep_nsec(wchan, PRIBIO, wmsg ? wmsg : "hvwait",
> USEC_TO_NSEC(1000));
> }
> }
> }
> 



Re: rw_enter: netlock locking against myself, 7.4+errata007, hyper-v hvn(4)

2024-01-02 Thread Alexander Bluhm
On Tue, Jan 02, 2024 at 03:45:10PM +, Stuart Henderson wrote:
> panic: rw_enter: netlock locking against myself
> Stopped at db_enter+0x14:  popq    %rbp
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *388963  11506755 0x2  00  snmpbulkwalk
>  320140  61046  0 0x14000  0x2001  reaper
> db_enter() at db_enter+0x14
> panic(820ebaaa) at panic+0xc3
> rw_enter(824bd2d0,2) at rw_enter+0x252
> hv_kvp(8248d738) at hv_kvp+0x705
> hv_event_intr(8008d000) at hv_event_intr+0x1ab
> hv_intr() at hv_intr+0x1a
> Xresume_hyperv_upcall() at Xresume_hyperv_upcall+0x27
> Xspllower() at Xspllower+0x1d
> task_add(80036200,801533c0) at task_add+0x83
> if_enqueue_ifq(80153060,fd80f6642900) at if_enqueue_ifq+0x6f
> ether_output(80153060,fd80f6642900,fd80c8ef5c78,fd811e6414d8)
>  at ether_output+0x82
> if_output_tso(80153060,800025395d08,fd80c8ef5c78,fd811e6414d8,5dc)
>  at if_output_tso+0xe1
> ip_output(fd80f6642900,0,fd80c8ef5c68,0,0,fd80c8ef5bf0,1da98542b9d8f733)
>  at ip_output+0x817
> udp_output(fd80c8ef5bf0,fd80b468b400,fd80ac452c00,0) at 
> udp_output+0x3be
> end trace frame: 0x800025395ee0, count: 0
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> Stopped at x86_ipi_db+0x16:    leave
> x86_ipi_db(80002250dff0) at x86_ipi_db+0x16
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
> _kernel_lock() at _kernel_lock+0xc2
> reaper(80002473c008) at reaper+0x133
> end trace frame: 0x0, count: 10

It is caused by this commit.


revision 1.20
date: 2023/09/23 13:01:12;  author: mvs;  state: Exp;  lines: +5 -9;  commitid:
Gj5WdkySyube1RkO;
Use shared netlock to protect if_list and ifa_list walkthrough and ifnet
data access within kvp_get_ip_info().

ok bluhm


Net lock must not be acquired in interrupt context.  It is wrong
that hv tries to perform network operations from interrupt context.
Actually it does not do much networking, it just reads some
interface addresses.  The kernel lock that was used before is also
questionable.  Maybe some writers do not hold it anymore.  There
was much conversion from kernel to net lock.

bluhm



ntpd crash in constraint_msg_close log_sockaddr

2023-12-18 Thread Alexander Bluhm
Hi,

for some days or weeks I see crashes of ntpd in accounting log on
my laptop.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
159 if (getnameinfo(sa, SA_LEN(sa), buf, sizeof(buf), NULL, 0,
(gdb) bt
#0  log_sockaddr (sa=0x8) at /usr/src/usr.sbin/ntpd/util.c:159
#1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
data=0xb058f8f3770 "\001", len=4)
at /usr/src/usr.sbin/ntpd/constraint.c:714
#2  0x0b02fb575f8a in ntp_dispatch_imsg ()
at /usr/src/usr.sbin/ntpd/ntp.c:516
#3  0x0b02fb5758b8 in ntp_main (nconf=<optimized out>, pw=<optimized out>,
argc=<optimized out>, argv=<optimized out>)
at /usr/src/usr.sbin/ntpd/ntp.c:378
#4  0x0b02fb57304a in main (argc=<optimized out>, argv=<optimized out>)
at /usr/src/usr.sbin/ntpd/ntpd.c:224

(gdb) frame 1
#1  0x0b02fb57fc32 in constraint_msg_close (id=<optimized out>,
data=0xb058f8f3770 "\001", len=4)
at /usr/src/usr.sbin/ntpd/constraint.c:714
714 log_sockaddr((struct sockaddr *)
(gdb) print cstr
$2 = (struct constraint *) 0xb05b96ac000
(gdb) print cstr->addr
$3 = (struct ntp_addr *) 0x0

Logging a null pointer address does not work.

   711  if (fail) {
   712  log_debug("no constraint reply from %s"
   713  " received in time, next query %ds",
   714  log_sockaddr((struct sockaddr *)
   715  &cstr->addr->ss), CONSTRAINT_SCAN_INTERVAL);

bluhm



Re: kernel diagnostic assertion "st->timeout == PFTM_UNLINKED" failed: file "

2023-12-11 Thread Alexander Bluhm
On Mon, Dec 11, 2023 at 10:58:22AM +0100, Alexandr Nedvedicky wrote:
> dlg@ and I are basically trying to remove all NET_LOCK() operations from
> pf(4), because we don't want pf(4) to be playing with global NET_LOCK().
> all callers to pf(4) should either obtain NET_LOCK() in case they need it.
> pf(4) should not care about NET_LOCK() at all. That's the ideal situation
> where we are heading to.

pf(4) neither playing with NET_LOCK() nor relying on net lock is a
good goal.  But when pf calls something in the network stack, it
needs at least shared net lock.  This is the responsibility of the
caller.  As the caller does not know when pf calls what, I think
all entry points into pf should hold net lock.  In general this is
also true for the pf timers.

We should keep simple locking rules how subsystems interact.  Removing
net lock from pf on one end and relying on net lock in IP stack
when called from pf will not work.

> I took a closer look at export_pflow() in current. It seems to me the
> function assumes caller holds a shared NET_LOCK() at least, just to 
> protect
> consistency of `pflowif_list`.

For that we take mvs@ diff.  It solves the current problem.



Re: kernel diagnostic assertion "st->timeout == PFTM_UNLINKED" failed: file "

2023-12-11 Thread Alexander Bluhm
On Mon, Dec 11, 2023 at 02:47:49PM +0300, Vitaliy Makkoveev wrote:
> Hi,
> 
> > On 11 Dec 2023, at 12:58, Alexandr Nedvedicky  wrote:
> > 
> >on the other hand if there is a way to implement pflowif_list as 
> > lock-less
> >(or move it ouf of NET_LOCK() scope), then this is a preferred way
> >forward.
> 
> So, I'm going to commit the diff which turns pflowif_list into SMR list.

OK bluhm@



Re: kernel diagnostic assertion "st->timeout == PFTM_UNLINKED" failed: file "

2023-12-08 Thread Alexander Bluhm
On Sat, Dec 09, 2023 at 02:07:06AM +0300, Vitaliy Makkoveev wrote:
> > >   SLIST_ENTRY(pflow_softc) sc_next;
> > 
> > This list is protected by net lock.  Can you add an [N] here?
> > 
> 
> This is not true. The netlock is not taken while export_pflow() is called
> from pf_purge_states(). I privately shared the diff to fix this, but not
> committed it yet. I will update it and share after committing this diff.

Why not hold the shared net lock while in pf?

My strategy is to convert exclusive net lock to shared net lock step
by step.  When this is complete, we can remove net lock completely.
Until that happens, we hold exclusive net lock in corner cases and
know that nothing can go wrong.

Seems to be easier than to be aware of pflowif_list locking when
working on pf.

bluhm



Re: kernel diagnostic assertion "st->timeout == PFTM_UNLINKED" failed: file "

2023-12-08 Thread Alexander Bluhm
On Thu, Dec 07, 2023 at 06:14:30PM +0300, Vitaliy Makkoveev wrote:
> Here is the diff. It introduces the `sc_mtx' mutex(9) to protect most of 
> the pflow_softc structure. The `send_nam', `sc_flowsrc' and `sc_flowdst' are
> protected by `sc_lock' rwlock(9). `sc_tmpl_ipfix' is immutable.
> 
> Also, the pflow_sendout_ipfix_tmpl() calls pflow_get_mbuf() with NULL
> instead of `sc'. This fix is also included in this diff.
> 
> Please note, this diff does not fix all the problems in the pflow(4).
> The ifconfig create/destroy sequence could still break the kernel. I
> have ok'ed diff to fix this but did not commit it yet for some reason.
> Also the `pflowstats' data left unprotected. I will fix this with
> separate diff.
> 
> Please test this diff and let us know the result. I will continue with
> pflow(4) after committing this.

Your changes make sense.

>   sc->sc_gcounter=pflowstats.pflow_flows;

Could you add some spaces around =
sc_gcounter is set without spaces twice.

>   case SIOCGETPFLOW:
>   bzero(, sizeof(pflowr));
>  
> + /* XXXSMP: enforce lock order */
> + NET_UNLOCK();
> + rw_enter_read(&sc->sc_lock);
> + NET_LOCK();

This looks ugly.  I did not find the sc_lock -> NET_LOCK() sequence.
Where is it?  Why do we need it?  In general I think the global
NET_LOCK() should be taken before the specific sc_lock.  Moving the
net lock to very narrow places within other locks makes things more
complicated.  Better take a shared net lock and hold it for more
code.  Not sure if this is a usable solution in this case.

>   SLIST_ENTRY(pflow_softc) sc_next;

This list is protected by net lock.  Can you add an [N] here?

OK bluhm@

Feel free to address my remarks later.



Re: system panics now & then

2023-12-06 Thread Alexander Bluhm
On Wed, Dec 06, 2023 at 10:17:26AM +0100, Claudio Jeker wrote:
> On Wed, Dec 06, 2023 at 12:57:57AM +0100, Alexander Bluhm wrote:
> > On Wed, Dec 06, 2023 at 01:39:40AM +0300, Vitaliy Makkoveev wrote:
> > > > Diff makes sense in any case.
> > > > 
> > > 
> > > Just checked, socket6_send() is identical to socket_send() and needs
> > > to be reworked in the same way.
> > 
> > New diff for v4 and v6.  The other callers seem to be correct.  I
> > will run this through regress and commit regardless whether it fixes
> > the reported bug.  The current code is wrong anyway.
> > 
> > ok?
>  
> AFAIK there is no ip_mroute.c regress coverage. Diff is OK claudio@

I have written tests in 2019 which use a small dummy multicast
routing daemon.  Fix passes all regress with witness kernel.

/usr/src/regress/sys/netinet/mcast
/usr/src/regress/sys/netinet6/mcast6

Diff is commited.

Jo Geraerts: Please test anyway so we know whether it fixes your bug.

bluhm



Re: system panics now & then

2023-12-05 Thread Alexander Bluhm
On Wed, Dec 06, 2023 at 01:39:40AM +0300, Vitaliy Makkoveev wrote:
> > Diff makes sense in any case.
> > 
> 
> Just checked, socket6_send() is identical to socket_send() and needs
> to be reworked in the same way.

New diff for v4 and v6.  The other callers seem to be correct.  I
will run this through regress and commit regardless whether it fixes
the reported bug.  The current code is wrong anyway.

ok?

bluhm

Index: netinet/ip_mroute.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_mroute.c,v
diff -u -p -r1.139 ip_mroute.c
--- netinet/ip_mroute.c 14 Jun 2023 14:30:08 -  1.139
+++ netinet/ip_mroute.c 5 Dec 2023 19:24:11 -
@@ -1048,11 +1048,18 @@ del_mfc(struct socket *so, struct mbuf *
 }
 
 int
-socket_send(struct socket *s, struct mbuf *mm, struct sockaddr_in *src)
+socket_send(struct socket *so, struct mbuf *mm, struct sockaddr_in *src)
 {
-   if (s != NULL) {
-   if (sbappendaddr(s, &s->so_rcv, sintosa(src), mm, NULL) != 0) {
-   sorwakeup(s);
+   if (so != NULL) {
+   struct inpcb *inp = sotoinpcb(so);
+   int ret;
+
+   mtx_enter(&inp->inp_mtx);
+   ret = sbappendaddr(so, &so->so_rcv, sintosa(src), mm, NULL);
+   mtx_leave(&inp->inp_mtx);
+
+   if (ret != 0) {
+   sorwakeup(so);
return (0);
}
}
Index: netinet6/ip6_mroute.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet6/ip6_mroute.c,v
diff -u -p -r1.137 ip6_mroute.c
--- netinet6/ip6_mroute.c   14 Jun 2023 14:30:08 -  1.137
+++ netinet6/ip6_mroute.c   5 Dec 2023 23:55:54 -
@@ -853,11 +853,18 @@ del_m6fc(struct socket *so, struct mf6cc
 }
 
 int
-socket6_send(struct socket *s, struct mbuf *mm, struct sockaddr_in6 *src)
+socket6_send(struct socket *so, struct mbuf *mm, struct sockaddr_in6 *src)
 {
-   if (s) {
-   if (sbappendaddr(s, &s->so_rcv, sin6tosa(src), mm, NULL) != 0) {
-   sorwakeup(s);
+   if (so != NULL) {
+   struct inpcb *inp = sotoinpcb(so);
+   int ret;
+
+   mtx_enter(&inp->inp_mtx);
+   ret = sbappendaddr(so, &so->so_rcv, sin6tosa(src), mm, NULL);
+   mtx_leave(&inp->inp_mtx);
+
+   if (ret != 0) {
+   sorwakeup(so);
return 0;
}
}



Re: system panics now & then

2023-12-05 Thread Alexander Bluhm
On Tue, Dec 05, 2023 at 08:22:52PM +0100, Jo Geraerts wrote:
> maybe its a good idea to just change 1 thing

Yes, only change 1 thing.  I just wrote down all my ideas.

> > It could be a race or a single packet that crashes the machine.

Found a race when we insert the IGMP packet into the socket buffer.
Unicast takes a mutex, but multicast code does not.

> Other than that, I suspect the issue was introduced in 7.3 because 
> (iirc) I never ran into that issue before 7.3.

The parallel receive as commited in 7.2.

revision 1.148
date: 2022/09/13 09:05:02;  author: mvs;  state: Exp;  lines: +30 -3;  
commitid: 7OEqRrdapIF2uHHb;
Do soreceive() with shared netlock for raw sockets.

ok bluhm@


Please try the diff below.

bluhm

Index: netinet/ip_mroute.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_mroute.c,v
diff -u -p -r1.139 ip_mroute.c
--- netinet/ip_mroute.c 14 Jun 2023 14:30:08 -  1.139
+++ netinet/ip_mroute.c 5 Dec 2023 19:24:11 -
@@ -1048,11 +1048,18 @@ del_mfc(struct socket *so, struct mbuf *
 }
 
 int
-socket_send(struct socket *s, struct mbuf *mm, struct sockaddr_in *src)
+socket_send(struct socket *so, struct mbuf *mm, struct sockaddr_in *src)
 {
-   if (s != NULL) {
-   if (sbappendaddr(s, &s->so_rcv, sintosa(src), mm, NULL) != 0) {
-   sorwakeup(s);
+   if (so != NULL) {
+   struct inpcb *inp = sotoinpcb(so);
+   int ret;
+
+   mtx_enter(&inp->inp_mtx);
+   ret = sbappendaddr(so, &so->so_rcv, sintosa(src), mm, NULL);
+   mtx_leave(&inp->inp_mtx);
+
+   if (ret != 0) {
+   sorwakeup(so);
return (0);
}
}



Re: system panics now & then

2023-12-05 Thread Alexander Bluhm
On Tue, Dec 05, 2023 at 04:22:47PM +0100, Jo Geraerts wrote:
> *92547  388831  1  0  7   0mrouted

Cool, you are running a multicast router.  Unfortunately this code
path is not well tested.

> *cpu0: receive 1: so 0xfd80259ea760, so_type 3, sb_cc 40

When the multicast routing daemon reads from raw socket, the kernel
crashes.  Kernel has no data in the queue, but counter says there
should be 40 bytes.  So it panics.

What kind of multicast traffic are you using mrouted for?
Is the machine stable if you don't start mrouted?
Does it help if you emulate only one virtual CPU?

It could be a race or a single packet that crashes the machine.  Can
you use tcpdump e.g. on the vm host to see what packets are delivered
to mrouted when the crash happens?  It may be an IGMP packet.

netstat -na and netstat -g output would be useful.  fstat -p <pid>
helps to see for which sockets recvfrom(2) may be called.

> wait until the kids come complaining internet is not working anymore.

Sorry, no kids around to reproduce.  I need your help to debug.

bluhm



arm64 panic: malloc: out of space in kmem_map

2023-11-09 Thread Alexander Bluhm
Hi,

During make build my arm64 machine with 32 CPUs crashed.

bluhm

ddb{24}> x/s version
version:OpenBSD 7.4-current (GENERIC.MP) #16: Fri Nov  3 21:38:55 MDT 
2023\012
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP\012

ddb{24}> show panic
 cpu0: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", line 
698
 cpu31: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu30: pool_do_get: pted free list modified: page 0xff81baba8000; item 
addr 0xff81baba8298; offset 0x10=0x19ebd001
 cpu29: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", line 
895
 cpu28: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu27: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
 cpu26: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
 cpu25: pool_do_get: vp: page empty
 cpu24: pool_do_get: vp: page empty
 cpu23: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
 cpu22: pool_do_get: pted: page empty
*cpu21: malloc: out of space in kmem_map
 cpu20: pool_do_get: rwobjpl: page empty
 cpu19: pool_do_get: anonpl: page empty
 cpu18: uvm_fault failed: ff8000373488 esr 9604 far f16be1bd7400ca3a
 cpu17: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu16: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu15: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu14: pool_do_get: vp: page empty
 cpu13: pmap_pte_insert: have a pted, but missing a vp for 4afaaf2c3 va pmap 
0xff81aa0685e8
 cpu12: pool_do_get: vp: page empty
 cpu10: attempt to access user address 0x30 from EL1
 cpu9: pool_do_put: pted: double pool_put: 0xff81afa52f30
 cpu8: pool_do_get: anonpl: page empty
 cpu7: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu6: pool_do_get: anonpl: page empty
 cpu5: pool_do_get: vp: page empty
 cpu4: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu3: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu2: kernel diagnostic assertion "((flags & PGO_LOCKED) != 0 && 
rw_lock_held(uobj->vmobjlock)) || (flags & PGO_LOCKED) == 0" failed: file 
"/usr/src/sys/uvm/uvm_vnode.c", line 953
 cpu1: kernel diagnostic assertion "anon == NULL || anon->an_lock == NULL || 
rw_write_held(anon->an_lock)" failed: file "/usr/src/sys/uvm/uvm_page.c", line 
698


Re: ixl driver - MAC address "reset"?

2023-10-12 Thread Alexander Bluhm
On Fri, Oct 06, 2023 at 12:08:41PM -0700, Joao Pedras wrote:
>   Another odd thing is that I do have another card but with its MACs 
> starting in 3c:ec:ef:b4:40.

If I understand you correctly, this is the broken card.

> ixl0 at pci6 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 
> 8.3.64775 API 1.13, msix, 8 queues, address ac:1f:6b:f5:01:00
> ixl1 at pci6 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 
> 8.3.64775 API 1.13, msix, 8 queues, address ac:1f:6b:f5:01:01 

And this one works as expected.

> ixl0 at pci6 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 0, FW 
> 9.20.71847 API 1.15, msix, 8 queues, address 3c:ec:ef:b4:40:94
> ixl1 at pci6 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 1, FW 
> 9.20.71847 API 1.15, msix, 8 queues, address 3c:ec:ef:b4:40:95

An obvious difference is the firmware version.
8.3.64775 -> 9.20.71847

Have you tried to update the firmware?  Is that possible?

bluhm



Re: PXE install of OpenBSD 7.3 (amd64) fails on Protectli VP4650 & VP2420, with 'igc' Intel I225-V 2.5Gbps NICs

2023-10-02 Thread Alexander Bluhm
On Sun, Oct 01, 2023 at 06:25:41PM -0700, Ian R. wrote:
> The 'bsd' file it's looking for definitely does exist on my pxeboot 
> server, in the tftpd root directory where it's supposed to be. I've 

It is supposed to be in an IP address subdirectory.  At least that's
how my setup looks.

18512 ??  Ip   4:24.00 /usr/sbin/tftpd -4 -i /var/spool/tftp/

bluhm@testmaster:.../~$ ls -la /var/spool/tftp/10.0.1.49/
total 9656
drwxr-xr-x   3 ot29  ot29   512 Apr 12 17:56 .
drwxr-xr-x  45 root  wheel 2048 Jul 11 10:45 ..
-rw-r--r--   1 ot29  ot29151552 Apr 12 17:56 BOOTX64.EFI
lrwxr-xr-x   1 ot29  ot2911 Apr 12 17:56 auto_install -> BOOTX64.EFI
lrwxr-xr-x   1 ot29  ot2911 Mar 27  2023 auto_upgrade -> BOOTX64.EFI
-rw-r--r--   1 ot29  ot29   4656421 Apr 12 17:56 bsd.rd
drwxrwxr-x   2 ot29  ot29   512 Apr 12 17:56 etc
-rw-r--r--   1 ot29  ot29 98652 Feb  3  2022 pxeboot

Run tcpdump with -vvv or -X to see the path name for the next
request after pxeboot has been loaded.

bluhm



Re: pf nat-to doesn't match a crafted packet

2023-09-04 Thread Alexander Bluhm
On Mon, Sep 04, 2023 at 03:58:02PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Mon, Sep 04, 2023 at 03:28:00PM +0200, Alexander Bluhm wrote:
> > On Sun, Sep 03, 2023 at 11:00:56PM +0200, Alexandr Nedvedicky wrote:
> > > Hello,
> > >
> > > On Sun, Sep 03, 2023 at 09:26:29PM +0200, Florian Obser wrote:
> > > > FYI, I'm not using sloppy, and I don't have a network with asymmetric 
> > > > routing
> > > > at the moment. I only remembered that we used sloppy for a while at my
> > > > previous job.  I think we settled on no-state because it was faster than
> > > > sloppy and less hastle.
> > > >
> > >
> > > From my perspective 'no state' vs. 'keep state (sloppy)' are valid 
> > > approaches.
> > > Both are equally good. Perhaps 'no state' option keeps code bit more 
> > > simple.
> > > Because if we will go with sloppy state, then we need to include a 
> > > small
> > > tweak to pf_test_rule() too, so 'keep state' (and nat-to) are not 
> > > ignored.
> > 
> > I do not think that ICMP error messages should create a state.  If
> > the rule is sloppy, just pass them.  nat-to does not work in this
> > case, but I would ignore that.  If you use sloppy states, don't be
> > surprised about strange effects in corner cases.
> > 
> 
> Understood. However, would it not be better to allow 'unsolicited' icmp
> error messages by 'no state' rules only? Such approach makes me
> slightly more comfortable, because the ruleset behavior is bit more
> predictable:
> 
>   keep state
>   keep state (sloppy)
>   don't match ICMP error replies, because those should match
>   existing state.
> 
>   no state
>   option can match any packet (including 'unsolicited' icmp error
>   replies) so firewall admins can deploy workarounds to address some
>   awkward corner cases
> 
> may be I'm overthinking it given the issue has not been noticed for ages.

I think we are bikeshedding the solution for a problem that nobody
had in real life for 20 years.

Peter wants to block the packets and Florian wants sloppy rules.
Just commit the two chunks below.  And never ask again who has a
problem with nat-to and stateless icmp error packets :-)

Maybe write /* icmp error packet must match existing state */ to clarify
that the comments talk about errors and not ping.

> @@ -4148,6 +4148,10 @@ enter_ruleset:
>   (r->rule_flag & PFRULE_STATESLOPPY) == 0 &&
>   ctx->icmp_dir != PF_IN),
>   TAILQ_NEXT(r, entries));
> + /* icmp packet must match existing state */
> + PF_TEST_ATTRIB(r->keep_state && ctx->state_icmp &&
> + (r->rule_flag & PFRULE_STATESLOPPY) == 0,
> + TAILQ_NEXT(r, entries));
>   break;
>
>   case IPPROTO_ICMPV6:
> @@ -4165,6 +4169,10 @@ enter_ruleset:
>   ctx->icmp_dir != PF_IN &&
>   ctx->icmptype != ND_NEIGHBOR_ADVERT),
>   TAILQ_NEXT(r, entries));
> + /* icmp packet must match existing state */
> + PF_TEST_ATTRIB(r->keep_state && ctx->state_icmp &&
> + (r->rule_flag & PFRULE_STATESLOPPY) == 0,
> + TAILQ_NEXT(r, entries));
>   break;
>
>   default:



Re: pf nat-to doesn't match a crafted packet

2023-09-04 Thread Alexander Bluhm
On Sun, Sep 03, 2023 at 11:00:56PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Sun, Sep 03, 2023 at 09:26:29PM +0200, Florian Obser wrote:
> > FYI, I'm not using sloppy, and I don't have a network with asymmetric 
> > routing
> > at the moment. I only remembered that we used sloppy for a while at my
> > previous job.  I think we settled on no-state because it was faster than
> > sloppy and less hastle.
> > 
> 
> From my perspective 'no state' vs. 'keep state (sloppy)' are valid 
> approaches.
> Both are equally good. Perhaps 'no state' option keeps code bit more 
> simple.
> Because if we will go with sloppy state, then we need to include a small
> tweak to pf_test_rule() too, so 'keep state' (and nat-to) are not ignored.

I do not think that ICMP error messages should create a state.  If
the rule is sloppy, just pass them.  nat-to does not work in this
case, but I would ignore that.  If you use sloppy states, don't be
surprised about strange effects in corner cases.

First two chunks are OK bluhm@, but not the third.

> Updated diff is below.
> 
> thanks and
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index 4f0fc3f91a9..6f7b782612c 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -4148,6 +4148,10 @@ enter_ruleset:
>   (r->rule_flag & PFRULE_STATESLOPPY) == 0 &&
>   ctx->icmp_dir != PF_IN),
>   TAILQ_NEXT(r, entries));
> + /* icmp packet must match existing state */
> + PF_TEST_ATTRIB(r->keep_state && ctx->state_icmp &&
> + (r->rule_flag & PFRULE_STATESLOPPY) == 0,
> + TAILQ_NEXT(r, entries));
>   break;
>  
>   case IPPROTO_ICMPV6:
> @@ -4165,6 +4169,10 @@ enter_ruleset:
>   ctx->icmp_dir != PF_IN &&
>   ctx->icmptype != ND_NEIGHBOR_ADVERT),
>   TAILQ_NEXT(r, entries));
> + /* icmp packet must match existing state */
> + PF_TEST_ATTRIB(r->keep_state && ctx->state_icmp &&
> + (r->rule_flag & PFRULE_STATESLOPPY) == 0,
> + TAILQ_NEXT(r, entries));
>   break;
>  
>   default:
> @@ -4469,8 +4477,7 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, 
> struct pf_state **sm,
>  
>   action = PF_PASS;
>  
> - if (pd->virtual_proto != PF_VPROTO_FRAGMENT
> - && !ctx.state_icmp && r->keep_state) {
> + if (pd->virtual_proto != PF_VPROTO_FRAGMENT && r->keep_state) {
>  
>   if (r->rule_flag & PFRULE_SRCTRACK &&
>   pf_insert_src_node([PF_SN_NONE], r, PF_SN_NONE,



Re: pf nat-to doesn't match a crafted packet

2023-09-03 Thread Alexander Bluhm
On Sun, Sep 03, 2023 at 06:17:12PM +0200, Florian Obser wrote:
> On 2023-09-03 18:13 +02, Alexander Bluhm  wrote:
> > On Sun, Sep 03, 2023 at 05:59:18PM +0200, Alexandr Nedvedicky wrote:
> >> Hello,
> >> 
> >> On Sun, Sep 03, 2023 at 05:10:02PM +0200, Alexander Bluhm wrote:
> >> > On Sun, Sep 03, 2023 at 04:12:35AM +0200, Alexandr Nedvedicky wrote:
> >> > > in my opinion is to fix pf_match_rule() function, so ICMP error message
> >> > > will no longer match 'keep state' rule. Diff below is for IPv4. I still
> >> > > need to think of more about IPv6. My gut feeling is it will be very 
> >> > > similar.
> >> > 
> >> > Thanks for the detailed analysis.
> >> > 
> >> > You proposed fix means that our default pf would block icmp error
> >> > packets now.
> >> > 
> >> > block return	# block stateless traffic
> >> > pass		# establish keep-state
> >> > 
> >> > To have the old behaviour one would write
> >> 
> >> I think icmp error message, if legit, is allowed because it matches
> >> state created by 'pass' rule. At least this is my understanding.
> >> 
> >> Or is there something else going on which I'm missing?
> >
> > If icmp packets are legit, they work with the existing pass keep-state
> > rule in default pf.conf.
> >
> > For passing unrelated icmp packets, e.g. with asymmetric routing,
> > one can add a pass no-state rule.
> 
> ... which you would have in place already if you have asymmetric
> routing. Or keep state (sloppy), does it work with sloppy, too?

Passing the icmp packet with keep state sloppy, even in case there
is no matching state, could make sense.  Especially as Florian uses
sloppy for asymmetric routing.

Should we add "&& (r->rule_flag & PFRULE_STATESLOPPY) == 0" to
Sasha's diff?  Then sloppy rules would pass the packet as before.

bluhm



Re: pf nat-to doesn't match a crafted packet

2023-09-03 Thread Alexander Bluhm
On Sun, Sep 03, 2023 at 05:59:18PM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> On Sun, Sep 03, 2023 at 05:10:02PM +0200, Alexander Bluhm wrote:
> > On Sun, Sep 03, 2023 at 04:12:35AM +0200, Alexandr Nedvedicky wrote:
> > > in my opinion is to fix pf_match_rule() function, so ICMP error message
> > > will no longer match 'keep state' rule. Diff below is for IPv4. I still
> > > need to think of more about IPv6. My gut feeling is it will be very 
> > > similar.
> > 
> > Thanks for the detailed analysis.
> > 
> > You proposed fix means that our default pf would block icmp error
> > packets now.
> > 
> > block return	# block stateless traffic
> > pass		# establish keep-state
> > 
> > To have the old behaviour one would write
> 
> I think icmp error message, if legit, is allowed because it matches
> state created by 'pass' rule. At least this is my understanding.
> 
> Or is there something else going on which I'm missing?

If icmp packets are legit, they work with the existing pass keep-state
rule in default pf.conf.

For passing unrelated icmp packets, e.g. with asymmetric routing,
one can add a pass no-state rule.

So I think you change is an improvement.

bluhm



Re: pf nat-to doesn't match a crafted packet

2023-09-03 Thread Alexander Bluhm
On Sun, Sep 03, 2023 at 04:12:35AM +0200, Alexandr Nedvedicky wrote:
> in my opinion is to fix pf_match_rule() function, so ICMP error message
> will no longer match 'keep state' rule. Diff below is for IPv4. I still
> need to think of more about IPv6. My gut feeling is it will be very similar.

Thanks for the detailed analysis.

Your proposed fix means that our default pf would block icmp error
packets now.

block return    # block stateless traffic
pass            # establish keep-state

To have the old behaviour one would write

block return    # block stateless traffic
pass no state   # allow all packets
pass            # establish keep-state if suitable

The default rule, that is used with an empty pf.conf, still passes
all packets, as it does not keep state.

I think this makes sense.  Passing icmp error packets without error
should not be the default.  If someone needs it, it is possible.
The implicit default rule is still as dumb as possible.

OK bluhm@

> 8<---8<---8<--8<
> diff --git a/sys/net/pf.c b/sys/net/pf.c
> index 4f0fc3f91a9..0993aed85fb 100644
> --- a/sys/net/pf.c
> +++ b/sys/net/pf.c
> @@ -4148,6 +4148,9 @@ enter_ruleset:
>   (r->rule_flag & PFRULE_STATESLOPPY) == 0 &&
>   ctx->icmp_dir != PF_IN),
>   TAILQ_NEXT(r, entries));
> + /* icmp packet must match existing state */
> + PF_TEST_ATTRIB(r->keep_state && ctx->state_icmp,
> + TAILQ_NEXT(r, entries));
>   break;
>  
>   case IPPROTO_ICMPV6:



Re: umb(4): splassert: rtable_getsource: want 2 have 0

2023-08-31 Thread Alexander Bluhm
On Thu, Aug 31, 2023 at 04:25:37PM +0300, Vitaliy Makkoveev wrote:
> > NET_UNLOCK() and NET_LOCK_SHARED() just after each other does not
> > make much sense.  Just keep exclusive netlock for the few lines.
> 
> Agreed. Both the cases perform route sockets walkthrough and message
> transmission. No sense for lockless error path only.

OK bluhm@

> Index: sys/dev/usb/if_umb.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_umb.c,v
> retrieving revision 1.54
> diff -u -p -r1.54 if_umb.c
> --- sys/dev/usb/if_umb.c  29 Aug 2023 23:28:38 -  1.54
> +++ sys/dev/usb/if_umb.c  31 Aug 2023 13:20:21 -
> @@ -1851,7 +1851,6 @@ umb_add_inet_config(struct umb_softc *sc
>   info.rti_info[RTAX_GATEWAY] = sintosa(_dstaddr);
>  
>   rv = rtrequest(RTM_ADD, &info, 0, &rt, ifp->if_rdomain);
> - NET_UNLOCK();
>   if (rv) {
>   printf("%s: unable to set IPv4 default route, "
>   "error %d\n", DEVNAM(ifp->if_softc), rv);
> @@ -1862,6 +1861,7 @@ umb_add_inet_config(struct umb_softc *sc
>   rtm_send(rt, RTM_ADD, rv, ifp->if_rdomain);
>   rtfree(rt);
>   }
> + NET_UNLOCK();
>  
>   if (ifp->if_flags & IFF_DEBUG) {
>   char str[3][INET_ADDRSTRLEN];
> @@ -1932,7 +1932,6 @@ umb_add_inet6_config(struct umb_softc *s
>   info.rti_info[RTAX_GATEWAY] = sin6tosa(_dstaddr);
>  
>   rv = rtrequest(RTM_ADD, &info, 0, &rt, ifp->if_rdomain);
> - NET_UNLOCK();
>   if (rv) {
>   printf("%s: unable to set IPv6 default route, "
>   "error %d\n", DEVNAM(ifp->if_softc), rv);
> @@ -1943,6 +1942,7 @@ umb_add_inet6_config(struct umb_softc *s
>   rtm_send(rt, RTM_ADD, rv, ifp->if_rdomain);
>   rtfree(rt);
>   }
> + NET_UNLOCK();
>  
>   if (ifp->if_flags & IFF_DEBUG) {
>   char str[3][INET6_ADDRSTRLEN];



Re: umb(4): splassert: rtable_getsource: want 2 have 0

2023-08-31 Thread Alexander Bluhm
On Thu, Aug 31, 2023 at 01:05:11PM +0300, Vitaliy Makkoveev wrote:
> On Thu, Aug 31, 2023 at 11:26:42AM +0200, Jeremie Courreges-Anglas wrote:
> > 
> > Looks umb(4) triggers the NET_ASSERT_LOCKED() check in
> > rtable_getsource() when the umb(4) interface comes up (here with
> > kern.splassert=2 to get context).  Reproduced with GENERIC.MP from Aug
> > 28 as well with cvs HEAD/if_umb.c rev 1.54.
> > 
> > Something to worry about?
> > 
> > 
> > OpenBSD 7.3-current (GENERIC.MP) #1357: Mon Aug 28 20:14:09 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > [...]
> > umb0 at uhub0 port 3 configuration 1 interface 0 "FIBOCOM L831-EAU-00" rev 
> > 2.00/17.29 addr 2
> > [...]
> > splassert: rtable_getsource: want 2 have 0
> > Starting stack trace...
> > rtable_getsource(0,2) at rtable_getsource+0x58
> > rtm_send(fd83b1a817e0,1,0,0) at rtm_send+0xbc
> > umb_add_inet_config(817c7000,edf0e72e,18,1f0e72e) at 
> > umb_add_inet_config+0x2a8
> > umb_decode_ip_configuration(817c7000,81ccf230,50) at 
> > umb_decode_ip_configuration+0x147
> > umb_get_response_task(817c7000) at umb_get_response_task+0xda
> > usb_task_thread(800022fe0010) at usb_task_thread+0xe5
> > end trace frame: 0x0, count: 251
> > End of stack trace.
> > 
> 
> rtable_getsource() requires at least shared netlock to be held. It can't
> be taken within rtm_send() because we have paths where caller already
> holds it.

I am not sure if rtm_miss() a few lines above should run without
netlock.  Could we just move the NET_UNLOCK() currently above the
if block after the else block?

NET_UNLOCK() and NET_LOCK_SHARED() just after each other does not
make much sense.  Just keep exclusive netlock for the few lines.

bluhm

> Index: sys/dev/usb/if_umb.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_umb.c,v
> retrieving revision 1.54
> diff -u -p -r1.54 if_umb.c
> --- sys/dev/usb/if_umb.c  29 Aug 2023 23:28:38 -  1.54
> +++ sys/dev/usb/if_umb.c  31 Aug 2023 10:03:13 -
> @@ -1859,7 +1859,9 @@ umb_add_inet_config(struct umb_softc *sc
>   ifp->if_rdomain);
>   } else {
>   /* Inform listeners of the new route */
> + NET_LOCK_SHARED();
>   rtm_send(rt, RTM_ADD, rv, ifp->if_rdomain);
> + NET_UNLOCK_SHARED();
>   rtfree(rt);
>   }
>  
> @@ -1940,7 +1942,9 @@ umb_add_inet6_config(struct umb_softc *s
>   ifp->if_rdomain);
>   } else {
>   /* Inform listeners of the new route */
> + NET_LOCK_SHARED();
>   rtm_send(rt, RTM_ADD, rv, ifp->if_rdomain);
> + NET_UNLOCK_SHARED();
>   rtfree(rt);
>   }
>  



Re: macppc panic: pool_do_get: vp: page empty

2023-07-18 Thread Alexander Bluhm
On Tue, Jul 18, 2023 at 11:59:27PM +0300, Alexander Bluhm wrote:
> While booting with 
> OpenBSD 7.3-current (GENERIC.MP) #153: Sat Jul 15 16:24:01 MDT 2023
> dera...@macppc.openbsd.org:/usr/src/sys/arch/macppc/compile/GENERIC.MP
> Machine paniced in rc reordering.

With latest snapshot it works again:
OpenBSD 7.3-current (GENERIC.MP) #154: Mon Jul 17 07:59:01 MDT 2023
dera...@macppc.openbsd.org:/usr/src/sys/arch/macppc/compile/GENERIC.MP
I will keep an eye on this.

bluhm



macppc panic: pool_do_get: vp: page empty

2023-07-18 Thread Alexander Bluhm
Hi,

While booting with 
OpenBSD 7.3-current (GENERIC.MP) #153: Sat Jul 15 16:24:01 MDT 2023
dera...@macppc.openbsd.org:/usr/src/sys/arch/macppc/compile/GENERIC.MP
Machine paniced in rc reordering.

reordering: ld.so libc libcrypto sshdpanic: pool_do_get: vp: page empty
Stopped at  db_enter+0x24:  lwz r11,12(r1)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*286053  66409  00x11  00  sh
db_enter() at db_enter+0x20
panic(908034) at panic+0x158
pool_do_get(0,6099c84b,68969083) at pool_do_get+0x370
pool_get(0,66b85b77) at pool_get+0xc4
pmap_vp_enter(1fff1a4,1fff0d8,e8024cb0,d15639f) at pmap_vp_enter+0xa4
pmap_enter(e8024e38,e8024db4,1f1f738,5f1d38bd,2420d042) at pmap_enter+0x1e0
uvm_fault_lower(e8024e58,0,e8024da0,427814) at uvm_fault_lower+0x900
uvm_fault(0,0,0,0) at uvm_fault+0x20c
trap(0) at trap+0x238
trapagain() at trapagain+0x4
--- trap (type 0x10400) ---
End of kernel: 0xfffe6810
end trace frame: 0xfffe6810, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> [-- MARK -- Tue Jul 18 00:50:00 2023]

ddb{0}> trace
db_enter() at db_enter+0x20
panic(908034) at panic+0x158
pool_do_get(0,6099c84b,68969083) at pool_do_get+0x370
pool_get(0,66b85b77) at pool_get+0xc4
pmap_vp_enter(1fff1a4,1fff0d8,e8024cb0,d15639f) at pmap_vp_enter+0xa4
pmap_enter(e8024e38,e8024db4,1f1f738,5f1d38bd,2420d042) at pmap_enter+0x1e0
uvm_fault_lower(e8024e58,0,e8024da0,427814) at uvm_fault_lower+0x900
uvm_fault(0,0,0,0) at uvm_fault+0x20c
trap(0) at trap+0x238
trapagain() at trapagain+0x4
--- trap (type 0x10400) ---
End of kernel: 0xfffe6810
end trace frame: 0xfffe6810, count: -10

ddb{0}> show panic
*cpu0: pool_do_get: vp: page empty

ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*66409  286053  13174  0  70x11sh
 13174  311294  57279  0  30x100089  sigsusp   sh
 89034  307491  1  0  30x100080  kqreadresolvd
 34301  100707  86871 77  30x100092  kqreaddhcpleased
 54366   69070  86871 77  30x100092  kqreaddhcpleased
 86871  362535  1  0  30x80  kqreaddhcpleased
 98114  186276   3166115  30x100092  kqreadslaacd
 56689   38807   3166115  30x100092  kqreadslaacd
  3166  469092  1  0  30x100080  kqreadslaacd
 57279   55427  76112  0  30x100089  sigsusp   sh
 76112   10156  1  0  30x100083  piperdsh
 22123  127478  0  0  3 0x14200  bored smr
 82693  204230  0  0  3  0x40014200idle1
 68536  190476  0  0  3 0x14200  pgzerozerothread
  4888  454542  0  0  3 0x14200  aiodoned  aiodoned
 82385  475789  0  0  3 0x14200  syncerupdate
 78872   23710  0  0  3 0x14200  cleaner   cleaner
 20471  388019  0  0  3 0x14200  reaperreaper
 20260  360822  0  0  3 0x14200  pgdaemon  pagedaemon
 76712  160794  0  0  3 0x14200  usbtskusbtask
 37718   40013  0  0  3 0x14200  usbatsk   usbatsk
 89295  189330  0  0  3 0x14200  bored sensors
 98074  514874  0  0  3 0x14200  bored softnet3
 90862  520613  0  0  3 0x14200  bored softnet2
 11008   97579  0  0  3 0x14200  bored softnet1
 13848  442973  0  0  3 0x14200  bored softnet0
  6945  155368  0  0  3 0x14200  bored systqmp
 97083  254902  0  0  3 0x14200  bored systq
 27429  306600  0  0  3  0x40014200  bored softclock
 12027  511656  0  0  3  0x40014200idle0
 1  106847  0  0  30x82  wait  init
 0   0 -1  0  3 0x10200  scheduler swapper

ddb{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  500848 VM pages: 1622 active, 17805 inactive, 1 wired, 442406 free (55607 
zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=16694, free-target=22258, inactive-target=0, wired-max=166949
  faults=106581, traps=0, intrs=12014, ctxswitch=42307 fpuswitch=12977
  softint=5524, syscalls=166272, kmapent=10
  fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=19763(19794), anget(retries)=51906(0), amapcopy=21501
neighbor anon/obj pg=16262/31978, gets(lock/unlock)=34379/21189
cases: anon=46026, anoncow=5943, obj=32098, prcopy=3869, przero=20398
  daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=786491, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
objs(kern)=0xb31c78

After reboot fsck made no 

Re: kernel diagnostic assertion "!_kernel_lock_held()" failed

2023-07-06 Thread Alexander Bluhm
On Thu, Jul 06, 2023 at 02:14:09PM +, Valdrin MUJA wrote:
> I've applied your patch but crashed again. Here it is:
> ddb{1}> show panic
> *cpu1: kernel diagnostic assertion "refcnt_read(&rt->rt_refcnt) >= 2" failed:
> file "/usr/src/sys/net/rtable.c", line 828

This kassert I added seems to be wrong.  I copied it from above
without thinking enough.  Just remove it, updated diff below.

I compared your crash 3 and 4 output:

TEST1> uvm_fault(0xfd826717bcc0, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  srp_get_locked+0x11:movq0(%rdi),%rax
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*225335  47125  0   0  01  bgpd
 231752  78299 73   0x1100010  03  syslogd
 344909   6421  0 0x14000  0x2002  wg_handshake
 361415  98860  0 0x14000  0x2000  reaper

SPOKE1> uvm_fault(0xfd81d5995878, 0x8, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  srp_get_locked+0x11:movq0(%rdi),%rax
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 448769  98731  00x12  03  sh
 350289  69698 73   0x1100010  00  syslogd
*114462  84824  0   0  01  bgpd
 256495  50081  0 0x14000  0x2002  wg_handshake

It is interesting that bgpd and wireguard are running in both cases
when it crashes.  Unfortunately your mail does not include this
output for crash 1 and 2.  It is printed immediately when the machine
crashes.  Do you have it in some console history?

I see a lot of different workload on your machine.  That makes it
harder to identify the subsystem that has the bug.  I see bgpd(8)
and wg(2) doing things with network and routing.  Is there something
else?

What has changed to make these crashes happen?  New workload?  New
machine?  Upgrade to 7.3?  Was it stable with 7.2?  ...

Thanks for testing.

bluhm

Index: net/rtable.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtable.c,v
retrieving revision 1.82
diff -u -p -r1.82 rtable.c
--- net/rtable.c19 Apr 2023 17:42:47 -  1.82
+++ net/rtable.c6 Jul 2023 15:56:04 -
@@ -604,6 +604,11 @@ rtable_insert(unsigned int rtableid, str
SRPL_INSERT_HEAD_LOCKED(&rt_rc, &an->an_rtlist, rt, rt_next);
 
prev = art_insert(ar, an, addr, plen);
+   if (prev == an) {
+   rw_exit_write(&ar->ar_lock);
+   /* keep the refcount for rt while it is in an_rtlist */
+   return (0);
+   }
if (prev != an) {
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry,
rt_next);
@@ -689,9 +694,10 @@ rtable_delete(unsigned int rtableid, str
npaths++;
 
if (npaths > 1) {
-   KASSERT(refcnt_read(&rt->rt_refcnt) >= 1);
+   KASSERT(refcnt_read(&rt->rt_refcnt) >= 2);
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry,
rt_next);
+   rtfree(rt);
 
mrt = SRPL_FIRST_LOCKED(&an->an_rtlist);
if (npaths == 2)
@@ -703,8 +709,9 @@ rtable_delete(unsigned int rtableid, str
if (art_delete(ar, an, addr, plen) == NULL)
panic("art_delete failed to find node %p", an);
 
-   KASSERT(refcnt_read(&rt->rt_refcnt) >= 1);
+   KASSERT(refcnt_read(&rt->rt_refcnt) >= 2);
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry, rt_next);
+   rtfree(rt);
art_put(an);
 
 leave:
@@ -821,12 +828,10 @@ rtable_mpath_reprio(unsigned int rtablei
 */
rt->rt_priority = prio;
} else {
-   rtref(rt); /* keep rt alive in between remove and insert */
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist,
rt, rtentry, rt_next);
rt->rt_priority = prio;
rtable_mpath_insert(an, rt);
-   rtfree(rt);
error = EAGAIN;
}
rw_exit_write(&ar->ar_lock);
@@ -839,6 +844,9 @@ rtable_mpath_insert(struct art_node *an,
 {
struct rtentry  *mrt, *prt = NULL;
uint8_t  prio = rt->rt_priority;
+
+   /* increment the refcount for rt while it is in an_rtlist */
+   rtref(rt);
 
if ((mrt = SRPL_FIRST_LOCKED(&an->an_rtlist)) == NULL) {
SRPL_INSERT_HEAD_LOCKED(&rt_rc, &an->an_rtlist, rt, rt_next);



Re: kernel diagnostic assertion "!_kernel_lock_held()" failed

2023-07-06 Thread Alexander Bluhm
On Wed, Jul 05, 2023 at 12:17:15PM +, Valdrin MUJA wrote:
> ddb{3}> show panic
> *cpu3: kernel diagnostic assertion "!ISSET(rt->rt_flags, RTF_UP)" failed: 
> file "
> /usr/src/sys/net/route.c", line 496
> 
> ddb{3}> trace
> db_enter() at db_enter+0x10
> panic(82067518) at panic+0xbf
> __assert(820de23b,8206be5d,1f0,820e901b) at 
> __assert+0x
> 25
> rtfree(fd8275365a90) at rtfree+0x1af
> route_output(fd8065dd1f00,fd821540a920) at route_output+0x413
> route_send(fd821540a920,fd8065dd1f00,0,0) at route_send+0x57
> sosend(fd821540a920,0,80002254d3e0,0,0,80) at sosend+0x37f
> dofilewritev(80002251f390,6,80002254d3e0,0,80002254d4e0) at 
> dofilew
> ritev+0x14d
> sys_writev(80002251f390,80002254d480,80002254d4e0) at 
> sys_writev+0x
> d2
> syscall(80002254d550) at syscall+0x3d4
> Xsyscall() at Xsyscall+0x128

Looks like your routing table is busted.  I just found a bug in
-current.  Maybe this also causes your problem.

Could you apply the diff below and recompile the kernel.  It should
be the same fix for 7.3.

bluhm

Index: net/rtable.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/rtable.c,v
retrieving revision 1.82
diff -u -p -r1.82 rtable.c
--- net/rtable.c19 Apr 2023 17:42:47 -  1.82
+++ net/rtable.c5 Jul 2023 20:05:26 -
@@ -604,6 +604,11 @@ rtable_insert(unsigned int rtableid, str
SRPL_INSERT_HEAD_LOCKED(&rt_rc, &an->an_rtlist, rt, rt_next);
 
prev = art_insert(ar, an, addr, plen);
+   if (prev == an) {
+   rw_exit_write(&ar->ar_lock);
+   /* keep the refcount for rt while it is in an_rtlist */
+   return (0);
+   }
if (prev != an) {
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry,
rt_next);
@@ -689,9 +694,10 @@ rtable_delete(unsigned int rtableid, str
npaths++;
 
if (npaths > 1) {
-   KASSERT(refcnt_read(&rt->rt_refcnt) >= 1);
+   KASSERT(refcnt_read(&rt->rt_refcnt) >= 2);
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry,
rt_next);
+   rtfree(rt);
 
mrt = SRPL_FIRST_LOCKED(&an->an_rtlist);
if (npaths == 2)
@@ -703,8 +709,9 @@ rtable_delete(unsigned int rtableid, str
if (art_delete(ar, an, addr, plen) == NULL)
panic("art_delete failed to find node %p", an);
 
-   KASSERT(refcnt_read(&rt->rt_refcnt) >= 1);
+   KASSERT(refcnt_read(&rt->rt_refcnt) >= 2);
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist, rt, rtentry, rt_next);
+   rtfree(rt);
art_put(an);
 
 leave:
@@ -821,12 +828,11 @@ rtable_mpath_reprio(unsigned int rtablei
 */
rt->rt_priority = prio;
} else {
-   rtref(rt); /* keep rt alive in between remove and insert */
+   KASSERT(refcnt_read(&rt->rt_refcnt) >= 2);
SRPL_REMOVE_LOCKED(&rt_rc, &an->an_rtlist,
rt, rtentry, rt_next);
rt->rt_priority = prio;
rtable_mpath_insert(an, rt);
-   rtfree(rt);
error = EAGAIN;
}
rw_exit_write(&ar->ar_lock);
@@ -839,6 +845,9 @@ rtable_mpath_insert(struct art_node *an,
 {
struct rtentry  *mrt, *prt = NULL;
uint8_t  prio = rt->rt_priority;
+
+   /* increment the refcount for rt while it is in an_rtlist */
+   rtref(rt);
 
if ((mrt = SRPL_FIRST_LOCKED(&an->an_rtlist)) == NULL) {
SRPL_INSERT_HEAD_LOCKED(&rt_rc, &an->an_rtlist, rt, rt_next);



Re: Syslog does not attempt DNS resolution if it previously failed during startup - proposed patch In

2023-07-03 Thread Alexander Bluhm
On Thu, Jun 29, 2023 at 01:40:17PM +, Robert Larsson wrote:
> most every 10 seconds. To do this I've added f_resolvetime to f_forw -
> I chose this approach rather than the TCP evtimer because I don't
> fully understand the concurrency or blocking aspects of evtimer, and

I have changed your diff to use an event for the DNS retry.
Also UDP resolve and TCP connect retry are more similar now.

Does this diff work for you?

bluhm

Index: usr.sbin/syslogd/syslogd.c
===
RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.277
diff -u -p -r1.277 syslogd.c
--- usr.sbin/syslogd/syslogd.c  16 Mar 2023 18:22:08 -  1.277
+++ usr.sbin/syslogd/syslogd.c  2 Jul 2023 17:13:11 -
@@ -156,9 +156,12 @@ struct filed {
struct sockaddr_storage  f_addr;
struct buffertls f_buftls;
struct bufferevent  *f_bufev;
+   struct event f_ev;
struct tls  *f_ctx;
+   char*f_ipproto;
char*f_host;
-   int  f_reconnectwait;
+   char*f_port;
+   int  f_retrywait;
} f_forw;   /* forwarding address */
charf_fname[PATH_MAX];
struct {
@@ -320,6 +323,7 @@ void tcp_writecb(struct bufferevent *, 
 voidtcp_errorcb(struct bufferevent *, short, void *);
 voidtcp_connectcb(int, short, void *);
 voidtcp_connect_retry(struct bufferevent *, struct filed *);
+voidudp_resolvecb(int, short, void *);
 int tcpbuf_countmsg(struct bufferevent *bufev);
 voiddie_signalcb(int, short, void *);
 voidmark_timercb(int, short, void *);
@@ -1380,7 +1384,7 @@ tcp_writecb(struct bufferevent *bufev, v
 * Successful write, connection to server is good, reset wait time.
 */
log_debug("loghost \"%s\" successful write", f->f_un.f_forw.f_loghost);
-   f->f_un.f_forw.f_reconnectwait = 0;
+   f->f_un.f_forw.f_retrywait = 0;
 
if (f->f_dropped > 0 &&
EVBUFFER_LENGTH(f->f_un.f_forw.f_bufev->output) < MAX_TCPBUF) {
@@ -1453,6 +1457,18 @@ tcp_connectcb(int fd, short event, void 
struct bufferevent  *bufev = f->f_un.f_forw.f_bufev;
int  s;
 
+   if (f->f_un.f_forw.f_addr.ss_family == AF_UNSPEC) {
+   if (priv_getaddrinfo(f->f_un.f_forw.f_ipproto,
+   f->f_un.f_forw.f_host, f->f_un.f_forw.f_port,
+   (struct sockaddr*)&f->f_un.f_forw.f_addr,
+   sizeof(f->f_un.f_forw.f_addr)) != 0) {
+   log_warnx("bad hostname \"%s\"",
+   f->f_un.f_forw.f_loghost);
+   tcp_connect_retry(bufev, f);
+   return;
+   }
+   }
+
if ((s = tcp_socket(f)) == -1) {
tcp_connect_retry(bufev, f);
return;
@@ -1511,21 +1527,57 @@ tcp_connect_retry(struct bufferevent *bu
 {
struct timeval   to;
 
-   if (f->f_un.f_forw.f_reconnectwait == 0)
-   f->f_un.f_forw.f_reconnectwait = 1;
+   if (f->f_un.f_forw.f_retrywait == 0)
+   f->f_un.f_forw.f_retrywait = 1;
else
-   f->f_un.f_forw.f_reconnectwait <<= 1;
-   if (f->f_un.f_forw.f_reconnectwait > 600)
-   f->f_un.f_forw.f_reconnectwait = 600;
-   to.tv_sec = f->f_un.f_forw.f_reconnectwait;
+   f->f_un.f_forw.f_retrywait <<= 1;
+   if (f->f_un.f_forw.f_retrywait > 600)
+   f->f_un.f_forw.f_retrywait = 600;
+   to.tv_sec = f->f_un.f_forw.f_retrywait;
to.tv_usec = 0;
+   evtimer_add(&f->f_un.f_forw.f_ev, &to);
 
-   log_debug("tcp connect retry: wait %d",
-   f->f_un.f_forw.f_reconnectwait);
+   log_debug("tcp connect retry: wait %d", f->f_un.f_forw.f_retrywait);
bufferevent_setfd(bufev, -1);
-   /* We can reuse the write event as bufferevent is disabled. */
-   evtimer_set(&bufev->ev_write, tcp_connectcb, f);
-   evtimer_add(&bufev->ev_write, &to);
+}
+
+void
+udp_resolvecb(int fd, short event, void *arg)
+{
+   struct filed*f = arg;
+   struct timeval   to;
+
+   if (priv_getaddrinfo(f->f_un.f_forw.f_ipproto,
+   f->f_un.f_forw.f_host, f->f_un.f_forw.f_port,
+   (struct sockaddr*)&f->f_un.f_forw.f_addr,
+   sizeof(f->f_un.f_forw.f_addr)) == 0) {
+   switch (f->f_un.f_forw.f_addr.ss_family) {
+   case AF_INET:
+   log_debug("resolved \"%s\" to IPv4 address",
+   f->f_un.f_forw.f_loghost);
+   f->f_file = fd_udp;
+   break;
+   

Re: panic: rw_enter: pfioctl_rw locking against myself

2023-06-29 Thread Alexander Bluhm
On Wed, Jun 28, 2023 at 09:41:15PM +0200, Florian Obser wrote:
> Yes, good idea, let's ship 7.4 with the leak!

The backout replaced a crash with a leak.  We don't want to ship
7.4 with a potential kernel crash either.

> I'm getting a bit annoyed with unlocking this and rewriting that and then 
> complete silence when shit breaks.

It took some time to figure out the cause of your hangs.  Sorry for that.

And then the first fix triggered a kernel crash and broke regress.
Kn@ has tested the cause for the crash.  I think a quick backout for
a crashing diff is justified.
Sashan@ has fixed the crash, which was introduced independently but
was triggered by claudio@'s fix.
Now we are working on something that does not crash, fixes the leak,
and passes regress.  But that takes a little time.

bluhm



Re: ifconfig sbar hang

2023-06-28 Thread Alexander Bluhm
On Wed, Jun 28, 2023 at 11:25:56AM +0200, Mark Kettenis wrote:
> > From: Alexander Bluhm 
> > load: 3.00  cmd: ifconfig 52949 [sbar] 0.01u 0.05s 0% 78k
> > ifconfig holds the netlock, I guess this prevents progress.
> 
> What does a WITNESS kernel report?

This is hard to say as I cannot reproduce.  I grepped through my
console logs and found these issues.

Nov 13 2022, starting network ix, ot15 amd64
Nov 18 2022, starting network ix, ot15 amd64
Nov 19 2022, starting network ix, ot15 amd64
Nov 19 2022, starting network ix, ot15 amd64
Nov 21 2022, starting network ix, ot15 amd64
Nov 21 2022, starting network ix, ot14 amd64
Feb  2 2023, ifconfig mcx0 down, ot10 arm64
Jun 25 2023, starting network ix ot31 amd64

The hangs in November were fixed by a few backouts.

In February it happened once with mcx(4) on arm64.

root@ot10:.../~# ifconfig mcx0 down
load: 1.08  cmd: ifconfig 81584 [sbar] 0.00u 0.07s 0% 67k

ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 81584   79693   8043  0  3 0x3  sbar  ifconfig
 15833  336275  93940  0  30x100083  ttyin ksh
 93940  402344  85411  0  30x9a  kqreadsshd
 85411  256105  1  0  30x88  kqreadsshd
  8043  494868  1  0  30x10008b  sigsusp   ksh
 81425  284730  1  0  30x100098  kqreadcron
 98122  214551  1 99  3   0x1100090  kqreadsndiod
 12478  363247  1110  30x100090  kqreadsndiod
 68723  281017  58293 95  3   0x1100092  kqreadsmtpd
 66232  206325  58293103  3   0x1100092  kqreadsmtpd
 11325   62850  58293 95  3   0x1100092  kqreadsmtpd
 20547  416814  58293 95  30x100092  kqreadsmtpd
 56040  129458  58293 95  3   0x1100092  kqreadsmtpd
 17357   62413  58293 95  3   0x1100092  kqreadsmtpd
 58293  210166  1  0  30x100080  kqreadsmtpd
 91251  150204  61674 91  30x92  kqreadsnmpd_metrics
 61674  232064  1  0  30x100080  kqreadsnmpd
 57336  163242  1 91  3   0x1100092  kqreadsnmpd
 852626957  0  0  3 0x14200  acct  acct
 77255  305020  1  0  30x100080  kqreadntpd
 33023   22895  66431 83  30x100092  kqreadntpd
 66431  320304  1 83  3   0x1100012  netlock   ntpd
 23397  480861   4591 74  3   0x1100092  bpf   pflogd
  4591  165046  1  0  30x80  netio pflogd
 33526   16136  38718 73  3   0x1100090  kqreadsyslogd
 38718  363093  1  0  30x100082  netio syslogd
 27296  406775  1  0  30x100080  kqreadresolvd
 48075   26460  29615 77  30x100092  kqreaddhcpleased
 61089   41438  29615 77  30x100092  kqreaddhcpleased
 29615   55499  1  0  30x80  kqreaddhcpleased
 87685  213464  18916115  30x100092  kqreadslaacd
 20348  141000  18916115  30x100092  kqreadslaacd
 18916  288162  1  0  30x100080  kqreadslaacd
 52479  405921  0  0  3 0x14200  bored smr
 98977  128674  0  0  3 0x14200  pgzerozerothread
 47683  136576  0  0  3 0x14200  aiodoned  aiodoned
 31843  499677  0  0  3 0x14200  syncerupdate
 54183  192766  0  0  3 0x14200  cleaner   cleaner
 59290  295494  0  0  3 0x14200  reaperreaper
 57470  163245  0  0  3 0x14200  pgdaemon  pagedaemon
 62972  158489  0  0  3 0x14200  mmctsksdmmc1
 38777  388048  0  0  3 0x14200  usbtskusbtask
 50340  433235  0  0  3 0x14200  usbatsk   usbatsk
 73074  317607  0  0  3 0x14200  bored sensors
 34503  452302  0  0  3 0x14200  mmctsksdmmc0
 31956  271329  0  0  3 0x14200  bored suspend
 80033  244737  0  0  7  0x40014200idle3
 55093  157860  0  0  1 0x14200idle2
 15894  247866  0  0  7  0x40014200idle1
 96587  185531  0  0  3 0x14200  bored softnet
   828  362691  0  0  3 0x14200  bored softnet
 51534  398133  0  0  3 0x14200  bored softnet
 73800   41723  0  0  3 0x14200  bored softnet
 46236  496068  0  0  2  0x40014200systqmp
 31144   27768  0  0  3 0x14200  netlock   systq
 78836   32990  0  0  3  0x40014200  netlock   softclock
*93904  268669  0  0  7  0x40014200idle0
 26354  129736  0  0  3 0x14200  kmalloc   kmthread
 1  147810  0  0  30x82  wait  init
 0   0 -1  0  3 0x10200  scheduler swapper

Re: panic: rw_enter: pfioctl_rw locking against myself

2023-06-28 Thread Alexander Bluhm
On Wed, Jun 28, 2023 at 05:46:36PM +0200, Alexandr Nedvedicky wrote:
> it looks like we need to use goto fail instead of return.
> this is the diff I'm testing now.

OK bluhm@

> 8<---8<---8<--8<
> diff --git a/sys/net/pf_ioctl.c b/sys/net/pf_ioctl.c
> index 36779cfdfd3..a51df9e6089 100644
> --- a/sys/net/pf_ioctl.c
> +++ b/sys/net/pf_ioctl.c
> @@ -1508,11 +1508,15 @@ pfioctl(dev_t dev, u_long cmd, caddr_t addr, int flags, struct proc *p)
>   int  i;
>  
>   t = pf_find_trans(minor(dev), pr->ticket);
> - if (t == NULL)
> - return (ENXIO);
> + if (t == NULL) {
> + error = ENXIO;
> + goto fail;
> + }
>   KASSERT(t->pft_unit == minor(dev));
> - if (t->pft_type != PF_TRANS_GETRULE)
> - return (EINVAL);
> + if (t->pft_type != PF_TRANS_GETRULE) {
> + error = EINVAL;
> + goto fail;
> + }
>  
>       NET_LOCK();
>   PF_LOCK();
> On Wed, Jun 28, 2023 at 02:38:00PM +0200, Alexander Bluhm wrote:
> > Hi,
> > 
> > Since Jun 26 regress tests panic the kernel.
> > 
> > panic: rw_enter: pfioctl_rw locking against myself
> > Stopped at  db_enter+0x14:  popq%rbp
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> > * 19846  58589  0 0x2  01K pfctl
> >  343161  43899  0 0x2  02  perl
> > db_enter() at db_enter+0x14
> > panic(820e7d9d) at panic+0xc3
> > rw_enter(82462c60,1) at rw_enter+0x26f
> > pfioctl(24900,cd504407,80f4b000,1,80002226adc0) at pfioctl+0x2da
> > VOP_IOCTL(fd827bfea6e0,cd504407,80f4b000,1,fd827f7e3bc8,80002226adc0)
> >  at VOP_IOCTL+0x60
> > vn_ioctl(fd823b841d20,cd504407,80f4b000,80002226adc0) at 
> > vn_ioctl+0x79
> > sys_ioctl(80002226adc0,800022458160,8000224581c0) at 
> > sys_ioctl+0x2c4
> > syscall(800022458230) at syscall+0x3d4
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x77becbc54dd0, count: 6
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{1}> 
> > 
> > Triggered by regress/sbin/pfctl
> > 
> >  pfload 
> > ...
> > /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf90.in
> > /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> > 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> > /usr/src/regress/sbin/pfctl/pf90.loaded /dev/stdin
> > /sbin/pfctl -o none -a regress -Fr >/dev/null 2>&1
> > /sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf91.in
> > /sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
> > 's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
> > /usr/src/regress/sbin/pfctl/pf91.loaded /dev/stdin
> > Timeout, server ot6 not responding.
> > 
> > bluhm
> > 



panic: rw_enter: pfioctl_rw locking against myself

2023-06-28 Thread Alexander Bluhm
Hi,

Since Jun 26 regress tests panic the kernel.

panic: rw_enter: pfioctl_rw locking against myself
Stopped at  db_enter+0x14:  popq%rbp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
* 19846  58589  0 0x2  01K pfctl
 343161  43899  0 0x2  02  perl
db_enter() at db_enter+0x14
panic(820e7d9d) at panic+0xc3
rw_enter(82462c60,1) at rw_enter+0x26f
pfioctl(24900,cd504407,80f4b000,1,80002226adc0) at pfioctl+0x2da
VOP_IOCTL(fd827bfea6e0,cd504407,80f4b000,1,fd827f7e3bc8,80002226adc0)
 at VOP_IOCTL+0x60
vn_ioctl(fd823b841d20,cd504407,80f4b000,80002226adc0) at 
vn_ioctl+0x79
sys_ioctl(80002226adc0,800022458160,8000224581c0) at sys_ioctl+0x2c4
syscall(800022458230) at syscall+0x3d4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x77becbc54dd0, count: 6
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{1}> 

Triggered by regress/sbin/pfctl

 pfload 
...
/sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf90.in
/sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
/usr/src/regress/sbin/pfctl/pf90.loaded /dev/stdin
/sbin/pfctl -o none -a regress -Fr >/dev/null 2>&1
/sbin/pfctl -o none -a regress -f - < /usr/src/regress/sbin/pfctl/pf91.in
/sbin/pfctl -o none -a 'regress/*' -gvvsr |  sed -e 
's/__automatic_[0-9a-f]*_/__automatic_/g' |  diff -u 
/usr/src/regress/sbin/pfctl/pf91.loaded /dev/stdin
Timeout, server ot6 not responding.

bluhm



ifconfig sbar hang

2023-06-26 Thread Alexander Bluhm
Hi,

I have an ifconfig on ix(4) that hangs in "sbar" wait queue during
"starting network" while booting.

load: 3.00  cmd: ifconfig 52949 [sbar] 0.01u 0.05s 0% 78k

ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 52949  250855  50082  0  3 0x3  sbar  ifconfig
 50082  468479  32384  0  30x10008b  sigsusp   sh
 52583  256132  23859 77  30x100092  kqreaddhcpleased
 26314 670  23859 77  30x100092  kqreaddhcpleased
 23859  213684  1  0  30x80  kqreaddhcpleased
  1084  413649  97426115  30x100092  kqreadslaacd
 79640  480435  97426115  30x100092  kqreadslaacd
 97426  244636  1  0  30x100080  kqreadslaacd
 32384  389946  1  0  30x10008b  sigsusp   sh
 25127  139046  0  0  3 0x14200  bored smr
 38562   94707  0  0  3 0x14200  pgzerozerothread
 27589   65355  0  0  3 0x14200  aiodoned  aiodoned
 20876  273172  0  0  3 0x14200  syncerupdate
 35865  394897  0  0  3 0x14200  cleaner   cleaner
 89296   37410  0  0  3 0x14200  reaperreaper
  4195   18701  0  0  3 0x14200  pgdaemon  pagedaemon
 70794   65241  0  0  3 0x14200  usbtskusbtask
 42580  105576  0  0  3 0x14200  usbatsk   usbatsk
 969136418  0  0  3  0x40014200  acpi0 acpi0
 43860  163896  0  0  1 0x14200idle7
  9928  477713  0  0  7  0x40014200idle6
 19947  457773  0  0  7  0x40014200idle5
 71017  110610  0  0  7  0x40014200idle4
 73733  294276  0  0  7  0x40014200idle3
 73085  302072  0  0  7  0x40014200idle2
 89634  211435  0  0  7  0x40014200idle1
 45877  221411  0  0  2  0x40014200sensors
 41433  306787  0  0  3 0x14200  bored softnet3
 85227  338038  0  0  3 0x14200  bored softnet2
 72032  215983  0  0  3 0x14200  netlock   softnet1
 32550  351943  0  0  3 0x14200  bored softnet0
 11993  408132  0  0  2  0x40014200systqmp
 58738  210334  0  0  3 0x14200  netlock   systq
 70352  115696  0  0  3  0x40014200  netlock   softclock
*95768  350377  0  0  7  0x40014200idle0
 1  298699  0  0  30x82  wait  init
 0   0 -1  0  3 0x10200  scheduler swapper

ifconfig holds the netlock, I guess this prevents progress.

ddb{0}> trace /t 0t250855
sleep_finish(8000248a3928,1) at sleep_finish+0x102
cond_wait(8000248a39c0,8207c985) at cond_wait+0x64
sched_barrier(80002253fff0) at sched_barrier+0x77
ixgbe_stop(80776000) at ixgbe_stop+0x1f7
ixgbe_init(80776000) at ixgbe_init+0x36
ixgbe_ioctl(80776048,8020690c,80842500) at ixgbe_ioctl+0x13e
in_ifinit(80776048,80842500,8000248a3cf0,1) at in_ifinit+0xf3
in_ioctl_change_ifaddr(8040691a,8000248a3ce0,80776048) at in_ioctl_change_ifaddr+0x390
ifioctl(fd8746c878f8,8040691a,8000248a3ce0,80002487ab00) at ifioctl+0x988
sys_ioctl(80002487ab00,8000248a3df0,8000248a3e50) at sys_ioctl+0x2c4
syscall(8000248a3ec0) at syscall+0x3d4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x74aea7fb4da0, count: -12

systqmp is here, it may wait for the scheduler lock.

ddb{0}> trace /t 0t408132
sched_barrier_task(8000248a39b8) at sched_barrier_task+0x7e
taskq_thread(824ac758) at taskq_thread+0x100
end trace frame: 0x0, count: -2

sensors thread seems to wait for scheduler lock, too.

ddb{0}> trace /t 0t221411
sched_peg_curproc(80002253fff0) at sched_peg_curproc+0x69
cpu_hz_update_sensor(80002253fff0) at cpu_hz_update_sensor+0x21
sensor_task_work(80366700) at sensor_task_work+0x48
taskq_thread(80362100) at taskq_thread+0x100

ddb{0}> show struct __mp_lock sched_lock
struct sched_lock at 0x8250fa54 (520 bytes) {mpl_cpus = 10144565, mpl_ticket = 0x9acb36, mpl_users = 0x9acb35}

systq is blocked by netlock

ddb{0}> trace /t 0t210334
sleep_finish(8000247ab030,1) at sleep_finish+0x102
rw_enter(824a4fe8,1) at rw_enter+0x1cf
pf_purge(824bb760) at pf_purge+0x38
taskq_thread(824ac708) at taskq_thread+0x100
end trace frame: 0x0, count: -4

bluhm



Re: dvmrpd start causes kernel panic: assertion failed

2023-06-12 Thread Alexander Bluhm
On Mon, Jun 12, 2023 at 11:56:43PM +0300, Vitaliy Makkoveev wrote:
> On Mon, Jun 12, 2023 at 09:04:41PM +0200, Why 42? The lists account. wrote:
> > 
> > On Wed, Jun 07, 2023 at 03:50:29PM +0300, Vitaliy Makkoveev wrote:
> > > > ...
> > > > Please, share your dvmrpd.conf.
> > > > 
> > > 
> > > Also, you could try to use ktrace to provide some additional info.
> > 
> > 
> > Hi Again,
> > 
> > On site I had to power cycle the ThinkPad to be able to get control.
> > 
> > The contents of the dvmrpd config file should be visible here:
> > dvmrpd.conf+ifconfig.jpg https://paste.c-net.org/SlimeReply
> > 
> > In order to be able to show progress, I tried using "mrouted" instead.
> > It seems to have resulted in much the same panic.
> > So apparently the problem may not be specific to dvmrpd.
> > Maybe something related to the USB Ethernet adaptor? I see some
> > references to both "ure" and "usb" in the stack traces ...
> > 
> > See for example:
> > mrouted_panic.jpg   https://paste.c-net.org/YolandaSamir
> > ddb_show_panic+trace.jpg https://paste.c-net.org/TrackParent
> > ddbcpu0+1.jpg   https://paste.c-net.org/HansonAinsley
> > ddbcpu3+4+5.jpg https://paste.c-net.org/MidtermsComposer
> > ddbcpu6+7.jpg   https://paste.c-net.org/CostaScratchy
> > 
> > Sorry about all the photos, it was the best I could do. I'm driving the
> > system via a pretty rubbish KVM switch.
> > 
> > Hope this helps with the analysis. In the meantime I'll look around for
> > some other multicast routing solution.
> > 
> > Cheers,
> > Robb.
> > 
> 
> We have missing kernel lock around (*if_sysctl)(). Diff below should fix
> it.

OK bluhm@

> Index: sys/netinet/ip_mroute.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_mroute.c,v
> retrieving revision 1.138
> diff -u -p -r1.138 ip_mroute.c
> --- sys/netinet/ip_mroute.c   19 Apr 2023 20:03:51 -  1.138
> +++ sys/netinet/ip_mroute.c   12 Jun 2023 20:55:05 -
> @@ -718,7 +718,9 @@ add_vif(struct socket *so, struct mbuf *
>   satosin(&ifr.ifr_addr)->sin_len = sizeof(struct sockaddr_in);
>   satosin(&ifr.ifr_addr)->sin_family = AF_INET;
>   satosin(&ifr.ifr_addr)->sin_addr = zeroin_addr;
> + KERNEL_LOCK();
>   error = (*ifp->if_ioctl)(ifp, SIOCADDMULTI, (caddr_t)&ifr);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   }



powerpc64 panic pool_do_get: pted free list modified

2023-06-05 Thread Alexander Bluhm
Hi

During make release my powerpc64 machines paniced.

[-- MARK -- Mon Jun  5 17:55:00 2023]
ppaanniicc::   p o o l _pda on_ ig ce:  t  :  p  t  e  d f r  e e   l  
i  s   t m  o  d i   f  ie  d  :  p  o  o l  _  pd ao _g g ee  t   :
 p t  edf  0r x e ec 0 0  0  0  0 0   0  7  c 5f  6  0 0   0  ;
i  t  e   m a  d   d r0 x c   0 0 0 0  0   0 0 7   c  5 f  6e   e   0  
; of  f   s e  t 0  x 0   =0  x  1  1  5  a !  = 0 x  e 
  9 d  3 3 8  b 8  9  d b  9  a  9  8  9  
   Stopped at  panic+0x134:ori r0,r0,0x0
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
* 70116  54961 21 0x2  00  c++
 145375  23870 21 0x2  01  c++
 398476   3921 21 0x2  02  c++
 147779  92769 21 0x2  03  c++
panic+0x134
pool_do_get+0x3c0
pool_get+0xd4
pmap_enter+0x1a0
uvm_fault_lower+0x8cc
uvm_fault+0x204
trap+0x274
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xb2a4c7ee7180 lr 0x1297f14c
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> 

ddb{0}> show panic
 cpu0: pool_do_get: pted free list modified: page 0xc0007c5f6000; item addr 
0xc0007c5f6ee0; offset 0x0=0x115a != 0xe9d338b89db9a989
*cpu1: pool_do_get: pted free list modified: page 0xc0007c5f6000; item addr 
0xc0007c5f6ee0; offset 0x0=0x115a != 0xe9d338b89db9a989
 cpu2: pool_do_get: pted free list modified: page 0xc0007c5f6000; item addr 
0xc0007c5f6ee0; offset 0x0=0x115a != 0xe9d338b89db9a989

ddb{0}> trace
panic+0x134
pool_do_get+0x3c0
pool_get+0xd4
pmap_enter+0x1a0
uvm_fault_lower+0x8cc
uvm_fault+0x204
trap+0x274
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xb2a4c7ee7180 lr 0x1297f14c

ddb{0}> show register
r0 0x13da458panic+0xbc
r10xc0007c707700
r2 0x1a27000.TOC.
r3   0x1
r4   0x2
r5   0x1
r6 0x1b17000extract_entropy.extract_pool+0xee8
r70x31e80060
r8 0
r90x31e80060
r10   0x31e80060
r110
r120
r13  0x4927ed0b8
r14  0x440de2ea8
r150
r16 0x30
r17  0x1
r18   0x10acfe2c_end+0xefb53bc
r19   0xfffd
r20  0x1
r210x1a52a70kernel_pmap_store
r220x1a72e00pmap_pted_pool
r23  0x4
r24   0x1297f000
r25   0x5380
r26 0x24
r270
r280x1a29de4cpu_info+0x19cc
r290x1a28ffccpu_info+0xbe4
r300x190ca07digits+0x1e750
r31   0x90004200f932
lr 0x13da4d0panic+0x134
cr0x44a0f988
xer   0x2004
ctr   0x3003b1ac
iar0x13da4d0panic+0x134
msr   0x90029032
dar  0x50ab2a000
dsisr 0x4200
panic+0x134:ori r0,r0,0x0

ddb{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  1913516 VM pages: 145927 active, 139575 inactive, 1 wired, 1204052 free 
(112728 zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=63783, free-target=85044, inactive-target=0, wired-max=637838
  faults=1107733853, traps=1112742452, intrs=30838799, ctxswitch=22015968 
fpuswitch=0
  softint=2706667, syscalls=1361340222, kmapent=11
  fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=431248(440714), anget(retries)=930540574(0), 
amapcopy=57022822
neighbor anon/obj pg=163789889/113986966, gets(lock/unlock)=44121808/440775
cases: anon=928337420, anoncow=2203154, obj=36199568, prcopy=7912713, 
przero=133080995
  daemon and swap counts:
woke=0, revs=0, scans=0, obscans=0, anscans=0
busy=0, freed=0, reactivate=0, deactivate=0
pageouts=0, pending=0, nswget=0
nswapdev=1
swpages=2162687, swpginuse=0, swpgonly=0 paging=0
  kernel pointers:
objs(kern)=0x1b133f0

ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*54961   70116  32547 21  7 0x2c++
 32547  278958  15675 21  30x10008a  sigsusp   sh
 23870  145375  62616 21  7 0x2c++
 62616  433348  15675 21  30x10008a  sigsusp   sh
  3921  

powerpc64 panic: kernel diagnostic assertion "pm == pted->pted_pmap"

2023-05-10 Thread Alexander Bluhm
Hi,

During release build my powerpc64 machine crashed.

login: [-- MARK -- Wed May 10 14:40:00 2023]
panic: kernel diagnostic assertion "pm == pted->pted_pmap" failed: file 
"/usr/src/sys/arch/powerpc64/powerpc64/pmap.c", line 865
Stopped at  panic+0x134:ori r0,r0,0x0
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 417455  88036 21 0x2  00  ld
 229563  41537 21 0x2  02  ld
*479393  69358 21 0x2  01  cc
 270251  20895 21 0x2  03  cc
panic+0x134
__assert+0x30
pmap_remove_pted+0x310
pmap_remove+0x134
uvm_map_protect+0x5e4
sys_mprotect+0x1a8
syscall+0x3b8
trap+0x5dc
trapagain+0x4
--- syscall (number 74) ---
End of kernel: 0xbef0963a5a80 lr 0x42e5f9cf4
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> show panic
*cpu1: kernel diagnostic assertion "pm == pted->pted_pmap" failed: file 
"/usr/src/sys/arch/powerpc64/powerpc64/pmap.c", line 865

ddb{1}> x/s version
version:OpenBSD 7.3-current (GENERIC.MP) #0: Wed May 10 14:33:27 CEST 
2023\012
r...@ot27.obsd-lab.genua.de:/usr/src/sys/arch/powerpc64/compile/GENERIC.MP\012

ddb{1}> trace
panic+0x134
__assert+0x30
pmap_remove_pted+0x310
pmap_remove+0x134
uvm_map_protect+0x5e4
sys_mprotect+0x1a8
syscall+0x3b8
trap+0x5dc
trapagain+0x4
--- syscall (number 74) ---
End of kernel: 0xbef0963a5a80 lr 0x42e5f9cf4

ddb{1}> show register
r0  0x109b08panic+0xbc
r10xc00079649800
r2  0x9d6000.TOC.
r3   0x1
r4   0x2
r5   0x1
r6  0xac6000rootonlyports+0x12b0
r70x31ea0060
r8 0
r90x31ea0060
r10   0x31ea0060
r110
r120
r13  0x5098bbaf8
r140
r150
r160
r17   0xfffd
r18  0x3
r19  0x7
r20   0xfffd
r210
r22  0xc
r230
r24   0xc000791cf618
r25 0x9d8f00db_active
r26 0x9e2f50panicstr
r270
r280
r29 0xa59afccpu_info+0x19cc
r30 0x872717etext+0xbd603
r31   0x9200d032
lr  0x109b80panic+0x134
cr0x44804a08
xer   0x2004
ctr   0x3003b1ac
iar 0x109b80panic+0x134
msr   0x90029032
dar   0xc00078ade0e8
dsisr 0x4200
panic+0x134:ori r0,r0,0x0

ddb{1}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 88036  417455  45467 21  7 0x2ld
 41537  229563  45467 21  7 0x2ld
 41537  498264  45467 21  3   0x482  fsleepld
 41537  486844  45467 21  3   0x482  fsleepld
 41537  502796  45467 21  3   0x482  fsleepld
 41537  114120  45467 21  3   0x482  fsleepld
*69358  479393   9016 21  7 0x2cc
  9016   22274  45467 21  30x10008a  sigsusp   sh
 20895  270251  88260 21  7 0x2cc
 88260  481205  45467 21  30x10008a  sigsusp   sh
 45467  282350  90407 21  30x10008a  sigsusp   make
 90407  168373  52666 21  30x10008a  sigsusp   sh
 52666  355563  40131 21  30x10008a  sigsusp   make
 40131  461379  23692  0  30x10008a  sigsusp   sh
 23692  242104  91419  0  30x10008a  sigsusp   make
 91419  346756  75128  0  30x10008a  sigsusp   make
 75128  200705  14536  0  30x10008a  sigsusp   sh
 14536  205518  78111  0  30x82  piperdperl
 78111  128981  56070  0  30x10008a  sigsusp   ksh
 56070  359431  46964  0  30x9a  kqreadsshd
 53097  137120  1  0  30x100083  ttyin getty
 25790  434127  1  0  30x100098  kqreadcron
 14945   42253  1 99  3   0x1100090  kqreadsndiod
 33403  411656  1110  30x100090  kqreadsndiod
 31839  284532  19451 95  3   0x1100092  kqreadsmtpd
 30526   46149  19451103  3   0x1100092  kqreadsmtpd
  7452  111777  19451 95  3   0x1100092  kqreadsmtpd
 63409  442745  19451 95  3

powerpc64 panic: pool_do_get: vmmpepl free list modified

2023-05-05 Thread Alexander Bluhm
Hi,

I got this crash while building clang during make release.

[-- MARK -- Wed May  3 15:40:00 2023]
panic: pool_do_get: vmmpepl free list modified: page 0xc0007ad90590; item 
addr 0x91a2a032; offset 0x0=0xc000 != 0x50007b7ba542
Stopped at  panic+0x134:ori r0,r0,0x0
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*241911  96539 21 0x2  00  c++
 310367  56066 21 0x2  02  c++
 457003  20366 21 0x2  01  c++
 489060  27200 21 0x2  03  c++
panic+0x134
pool_do_get+0x3c0
pool_get+0xd4
uvm_mapent_alloc+0x22c
uvm_mapanon+0x118
sys_mmap+0x4e4
syscall+0x3b8
trap+0x5dc
trapagain+0x4
--- syscall (number 49) ---
End of kernel: 0xb065166138f0 lr 0x4b1426cb0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}> 

Unfortunately cron rebooted the machine the next day and there is no
more information.  I will keep an eye on this.

bluhm

Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.3-current (GENERIC.MP) #0: Wed May  3 14:34:25 CEST 2023
r...@ot27.obsd-lab.genua.de:/usr/src/sys/arch/powerpc64/compile/GENERIC.MP
real mem  = 8589934592 (8192MB)
avail mem = 7837638656 (7474MB)
random: good seed from bootblocks
mainbus0 at root: C1P9S01 REV 1.00
cpu0 at mainbus0 pir 50: IBM POWER9 2.1, 2500 MHz
cpu0: 32KB 128b/line 8-way L1 I-cache, 32KB 128b/line 8-way L1 D-cache
cpu0: 512KB 128b/line 8-way L2 cache
cpu0: 10MB 128b/line 8-way L3 cache
cpu1 at mainbus0 pir 54: IBM POWER9 2.1, 2500 MHz
cpu1: 32KB 128b/line 8-way L1 I-cache, 32KB 128b/line 8-way L1 D-cache
cpu1: 512KB 128b/line 8-way L2 cache
cpu1: 10MB 128b/line 8-way L3 cache
cpu2 at mainbus0 pir 58: IBM POWER9 2.1, 2500 MHz
cpu2: 32KB 128b/line 8-way L1 I-cache, 32KB 128b/line 8-way L1 D-cache
cpu2: 512KB 128b/line 8-way L2 cache
cpu2: 10MB 128b/line 8-way L3 cache
cpu3 at mainbus0 pir 5c: IBM POWER9 2.1, 2500 MHz
cpu3: 32KB 128b/line 8-way L1 I-cache, 32KB 128b/line 8-way L1 D-cache
cpu3: 512KB 128b/line 8-way L2 cache
cpu3: 10MB 128b/line 8-way L3 cache
"bmc" at mainbus0 not configured
"ibm,firmware-versions" at mainbus0 not configured
"ibm,hostboot" at mainbus0 not configured
opal0 at mainbus0: skiboot-852ac62
opal0: idle psscr 300332
opalcons0 at opal0: console
opalsens0 at opal0: "core-temp"
opalsens1 at opal0: "core-temp"
opalsens2 at opal0: "core-temp"
opalsens3 at opal0: "core-temp"
opalsens4 at opal0: "mem-temp"
opalsens5 at opal0: "mem-temp"
opalsens6 at opal0: "mem-temp"
opalsens7 at opal0: "mem-temp"
opalsens8 at opal0: "mem-temp"
opalsens9 at opal0: "mem-temp"
opalsens10 at opal0: "mem-temp"
opalsens11 at opal0: "mem-temp"
opalsens12 at opal0: "mem-temp"
opalsens13 at opal0: "mem-temp"
opalsens14 at opal0: "mem-temp"
opalsens15 at opal0: "mem-temp"
opalsens16 at opal0: "mem-temp"
opalsens17 at opal0: "mem-temp"
opalsens18 at opal0: "mem-temp"
opalsens19 at opal0: "mem-temp"
opalsens20 at opal0: "proc-energy"
opalsens21 at opal0: "proc-energy"
opalsens22 at opal0: "proc-energy"
opalsens23 at opal0: "proc-in"
opalsens24 at opal0: "proc-in"
opalsens25 at opal0: "proc-power"
opalsens26 at opal0: "proc-power"
opalsens27 at opal0: "proc-power"
opalsens28 at opal0: "proc-temp"
opalsens29 at opal0: "vrm-curr"
opalsens30 at opal0: "vrm-curr"
opalsens31 at opal0: "vrm-in"
opalsens32 at opal0: "vrm-in"
opalsens33 at opal0: "vrm-temp"
ipmi0 at opal0: version 2.0 interface OPAL
"ibm,pcie-slots" at mainbus0 not configured
"ibm,secureboot" at mainbus0 not configured
"imc-counters" at mainbus0 not configured
xics0 at mainbus0
xive0 at mainbus0
"ipl-params" at mainbus0 not configured
"lpcm-opb" at mainbus0 not configured
phb0 at mainbus0: chip 0x0
pci0 at phb0
ppb0 at pci0 dev 0 function 0 "IBM POWER9 Host" rev 0x00
pci1 at ppb0 bus 1
phb1 at mainbus0: chip 0x0
pci2 at phb1
ppb1 at pci2 dev 0 function 0 "IBM POWER9 Host" rev 0x00
pci3 at ppb1 bus 1
phb2 at mainbus0: chip 0x0
pci4 at phb2
ppb2 at pci4 dev 0 function 0 "IBM POWER9 Host" rev 0x00
pci5 at ppb2 bus 1
ahci0 at pci5 dev 0 function 0 "Marvell 88SE9235 AHCI" rev 0x11: msi, AHCI 1.0
ahci0: port busy after first PMP probe FIS
ahci0: port busy after first PMP probe FIS
ahci0: port 0: 6.0Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0:  naa.
sd0: 228936MB, 512 bytes/sector, 468862128 sectors, thin
phb3 at mainbus0: chip 0x0
pci6 at phb3
ppb3 at pci6 dev 0 function 0 "IBM POWER9 Host" rev 0x00
pci7 at ppb3 bus 1
xhci0 at pci7 dev 0 function 0 "TI xHCI" rev 0x02: msix, xHCI 0.96
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "TI xHCI root hub" rev 3.00/1.00 addr 1
phb4 at mainbus0: chip 0x0
pci8 at phb4
ppb4 at pci8 dev 0 function 0 "IBM POWER9 Host" rev 0x00
pci9 at ppb4 bus 

Re: Intel Ethernet (?Synopsys based) on Filet3 Elkhart Lake unconfigured on recent snapshot

2023-04-28 Thread Alexander Bluhm
> > On 28 Apr 2023, at 06:06, Ted Ri  wrote:
> > "Intel Elkhart Lake Ethernet" rev 0x11 at pci0 dev 29 function 1 not 
> > configured
> > "Intel Elkhart Lake Ethernet" rev 0x11 at pci0 dev 29 function 2 not 
> > configured

On Fri, Apr 28, 2023 at 07:50:48PM +1000, David Gwynne wrote:
> and one of these machines to do the work with.

If someone is interested in hacking on this, I can provide a machine
with remote serial console, power control and install server.

http://obsd-lab.genua.de/hw/ot50.html

bluhm



Re: PF still blocks IGMP multicast control packets

2023-02-24 Thread Alexander Bluhm
On Fri, Feb 24, 2023 at 08:42:29AM +0100, Luca Di Gregorio wrote:
> I would implement this logic:
> 
> If the IP Destination Address is 224.0.0.0/4, then the TTL should be 1.
> If the IP Destination Address is not 224.0.0.0/4, then no restrictions on
> TTL.
> 
> In your code, I would do this modification:
> 
> -   if ((h->ip_ttl != 1) || !IN_MULTICAST(h->ip_dst.s_addr)) {
> -   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> -   REASON_SET(reason, PFRES_IPOPTIONS);
> -   return (PF_DROP);
> -   }
> 
> +   if ((h->ip_ttl != 1) && IN_MULTICAST(h->ip_dst.s_addr)) {
> +   DPFPRINTF(LOG_NOTICE, "Invalid IGMP");
> +   REASON_SET(reason, PFRES_IPOPTIONS);
> +   return (PF_DROP);
> +   }

Sounds reasonable.

> Anyway, I'm ok if you revert your commit from May 3rd. If this will be the
> case, I expect (please correct me if I'm wrong) that in /etc/pf.conf there
> must be the line:
> pass proto igmp allow-opts
> Otherwise the packet with Router Alert will be discarded.

No, the router alert and ttl commits are distinct.


revision 1.1128
date: 2022/05/03 13:32:47;  author: sashan;  state: Exp;  lines: +22 -2;  
commitid: A2jgIEZZkDP6Qtya;
Make pf(4) more paranoid about IGMP/MLP messages. MLD/IGMP messages
with ttl other than 1 will be discarded. Also MLD messages with
other than link-local source address will be discarded. IGMP
messages with destination address other than multicast class
will be discarded.

feedback and OK bluhm@, cluadio@

revision 1.1127
date: 2022/04/29 08:58:49;  author: bluhm;  state: Exp;  lines: +115 -21;  
commitid: KJ50nLH6n2yUzUKe;
IGMP and ICMP6 MLD packets always have the router alert option set.
pf blocked IPv4 options and IPv6 option header by default.  This
forced users to set allow-opts in pf rules.
Better let multicast work by default.  Detect router alerts by
parsing IP options and hop by hop headers.  If the packet has only
this option and is a multicast control packet, do not block it due
to bad options.
tested by otto@; OK sashan@


Sasha suggested to revert only revision 1.1128.  The automatic
allow-opts is revision 1.1127.  We keep that.

> Regarding MLD, I can't say anything because I've never tested multicast
> routing with IP6.

We should figure out what the RFC says about IPv6 MLD.  If we use Luca's
smarter logic for IPv4, we should also fix IPv6.

bluhm



Re: bbolt can freeze 7.2 from userspace

2023-02-20 Thread Alexander Bluhm
On Mon, Feb 20, 2023 at 09:43:10AM +0100, Martin Pieuchot wrote:
> On 20/02/23(Mon) 03:59, Renato Aguiar wrote:
> > [...] 
> > I can't reproduce it anymore with this patch on 7.2-stable :)
> 
> Thanks a lot for testing!  Here's a better fix from Chuck Silvers.
> That's what I believe we should commit.
> 
> The idea is to prevent siblings from modifying the vm_map by marking
> it as "busy" in msync(2) instead of holding the exclusive lock while
> sleeping.  This lets siblings make progress and stops possible writers.
> 
> Could you all guys confirm this also prevent the deadlock?  Thanks!

This uvm diff survived a full regress run on amd64 and powerpc64.

bluhm

> Index: uvm/uvm_map.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_map.c,v
> retrieving revision 1.312
> diff -u -p -r1.312 uvm_map.c
> --- uvm/uvm_map.c 13 Feb 2023 14:52:55 -  1.312
> +++ uvm/uvm_map.c 20 Feb 2023 08:10:39 -
> @@ -4569,8 +4569,7 @@ fail:
>   * => never a need to flush amap layer since the anonymous memory has
>   *   no permanent home, but may deactivate pages there
>   * => called from sys_msync() and sys_madvise()
> - * => caller must not write-lock map (read OK).
> - * => we may sleep while cleaning if SYNCIO [with map read-locked]
> + * => caller must not have map locked
>   */
>  
>  int
> @@ -4592,25 +4591,27 @@ uvm_map_clean(struct vm_map *map, vaddr_
>   if (start > end || start < map->min_offset || end > map->max_offset)
>   return EINVAL;
>  
> - vm_map_lock_read(map);
> + vm_map_lock(map);
>   first = uvm_map_entrybyaddr(&map->addr, start);
>  
>   /* Make a first pass to check for holes. */
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
>   if (UVM_ET_ISSUBMAP(entry)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EINVAL;
>   }
>   if (UVM_ET_ISSUBMAP(entry) ||
>   UVM_ET_ISHOLE(entry) ||
>   (entry->end < end &&
>   VMMAP_FREE_END(entry) != entry->end)) {
> - vm_map_unlock_read(map);
> + vm_map_unlock(map);
>   return EFAULT;
>   }
>   }
>  
> + vm_map_busy(map);
> + vm_map_unlock(map);
>   error = 0;
>   for (entry = first; entry != NULL && entry->start < end;
>   entry = RBT_NEXT(uvm_map_addr, entry)) {
> @@ -4722,7 +4723,7 @@ flush_object:
>   }
>   }
>  
> - vm_map_unlock_read(map);
> + vm_map_unbusy(map);
>   return error;
>  }
>  



sys_pselect assertion "timo || _kernel_lock_held()" failed

2023-02-13 Thread Alexander Bluhm
Hi,

Today I saw this panic on my i386 regress machine.  

panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file 
"/usr/src/sys/kern/kern_synch.c", line 127

Looks like src/regress/lib/libc/sys/ triggered it.  Kernel was built
from some 2023-02-13 source checkout.  I will keep watching if it
happens again.

 run-t_select 
cc -O2 -pipe  -std=gnu99  -MD -MP  -c /usr/src/regress/lib/libc/sys/t_select.c
cc   -o t_select t_select.o atf-c.o 
ulimit -c unlimited &&  ntests="`./t_select -n`" &&  echo "1..$ntests" &&  
tnumbers="`jot -ns' ' - 1 $ntests`" &&  make -C /usr/src/regress/lib/libc/sys 
PROG=t_select NUMBERS="$tnumbers" regress
1..2
 run-t_select-1 
1 Checks pselect's temporary mask setting when a signal is received (PR 
lib/43625)
./t_select -r 1
Timeout, server ot4 not responding.

panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file 
"/usr/src/sys/kern/kern_synch.c", line 127
Stopped at  db_enter+0x4:   popl%ebp
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*494368  36805  0   00x89  t_select
 223467  53838  0 0x14000 0x42000  softclock
db_enter() at db_enter+0x4
panic(d0c60cae) at panic+0x7a
__assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19
tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117
tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc
dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c
sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9
syscall(f5b0ac10) at syscall+0x301
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{9}> show panic
*cpu9: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: file 
"/usr/src/sys/kern/kern_synch.c", line 127

ddb{9}> trace
db_enter() at db_enter+0x4
panic(d0c60cae) at panic+0x7a
__assert(d0cccada,d0c6daa7,7f,d0ce58fd) at __assert+0x19
tsleep(d0fcefd0,118,d0d0cf14,0) at tsleep+0x117
tsleep_nsec(d0fcefd0,118,d0d0cf14,) at tsleep_nsec+0xcc
dopselect(d621da8c,1,cf7db3f8,0,0,0,f5b0ab74,f5b0abc8) at dopselect+0x49c
sys_pselect(d621da8c,f5b0abd0,f5b0abc8) at sys_pselect+0xa9
syscall(f5b0ac10) at syscall+0x301
Xsyscall_untramp() at Xsyscall_untramp+0xa9
end of kernel

ddb{9}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*36805  494368  81530  0  7 0x8t_select
 81530  403270  40998  0  30x82  nanoslp   t_select
 40998   94085  15069  0  30x10008a  sigsusp   make
 15069  146328  39109  0  30x10008a  sigsusp   sh
 39109  344737  51331  0  30x10008a  sigsusp   make
 51331  138704  34007  0  30x10008a  sigsusp   sh
 34007   93629  19868  0  30x10008a  sigsusp   make
 19868  317272  38049  0  30x10008a  sigsusp   sh
 38049  440823   5232  0  30x10008a  sigsusp   make
 37773  470707   5644  0  30x100082  piperdgzip
  5644   31299   5232  0  30x100082  piperdpax
  5232  392807  48080  0  30x82  piperdperl
 48080  317705   5394  0  30x10008a  sigsusp   ksh
  5394   64950  41772  0  30x9a  kqreadsshd
 21685  469765  1  0  30x100083  ttyin getty
 54886  396067  1  0  30x100083  ttyin getty
 30334   46910  1  0  30x100083  ttyin getty
 32613  460954  1  0  30x100083  ttyin getty
 98276  290762  1  0  30x100083  ttyin getty
 83393  276624  1  0  30x100083  ttyin getty
 58262  416339  1  0  30x100098  kqreadcron
 63115  378072  1 99  3   0x1100090  kqreadsndiod
 16973  434562  1110  30x100090  kqreadsndiod
 31914  352102  1  0  30x100090  kqreadinetd
 29012  259544  75755 95  3   0x1100092  kqreadsmtpd
 75423   90965  75755103  3   0x1100092  kqreadsmtpd
 21079  436365  75755 95  3   0x1100092  kqreadsmtpd
 50673  244082  75755 95  30x100092  kqreadsmtpd
 29210  305613  75755 95  3   0x1100092  kqreadsmtpd
 23149  255222  75755 95  3   0x1100092  kqreadsmtpd
 75755   17291  1  0  30x100080  kqreadsmtpd
 93529  264256  66818 91  30x92  kqreadsnmpd_metrics
 66818  486393  1  0  30x100080  kqreadsnmpd
 44889  252400  1 91  3   0x1100092  kqreadsnmpd
 41772  245133  1  0  30x88  kqreadsshd
 66620  411509  0  0  3 0x14280  nfsidlnfsio
 80915  345110  0  0  3 0x14280  nfsidlnfsio
 10608  431262  0  0  3 0x14280  nfsidlnfsio
 11489  201187  0  0  3 0x14280  nfsidlnfsio

Re: pf.conf bug

2023-02-06 Thread Alexander Bluhm
On Mon, Feb 06, 2023 at 09:37:47PM +0100, Alexandr Nedvedicky wrote:
> if we want to allow firewall administrator to specify a match
> on icmptype 255 then extending type from uint8_t to uint16_t
> is the right change.
> 
> another option is to change logic here to allow matching icmptype in
> range <0, 254>

Although type 255 is reserved and code 255 not assigned, it seems
useful if pf supports them.  This would make it possible to block such crap.
Or someone might filter it in an experimental environment.

> either way is fine for me. however my preference is leaning
> towards a type change.

Not allowing 255 would be confusing.  The special meaning of 0 and
the limited range to 254 is an implementation detail that should
not be exposed to the user.

> note diff also drops pad[2] member to compensate for change
> of uint8_t to uint16_t for type, code members. I'm not sure
> about this bit hence I'd like to bring it up here.

This creates an ABI change.  People have to recompile their pfctl.
I think we never guarantee this level of compatibility.

OK bluhm@

> 8<---8<---8<--8<
> diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
> index 2c5a49772ff..898bded8c24 100644
> --- a/sbin/pfctl/parse.y
> +++ b/sbin/pfctl/parse.y
> @@ -134,8 +134,8 @@ struct node_gid {
>  };
>  
>  struct node_icmp {
> - u_int8_t code;
> - u_int8_t type;
> + u_int16_tcode;  /* aux. value 256 is legit */
> + u_int16_ttype;  /* aux. value 256 is legit */
>   u_int8_t proto;
>   struct node_icmp*next;
>   struct node_icmp*tail;
> diff --git a/sys/net/pfvar.h b/sys/net/pfvar.h
> index 1453bc35c04..1efb1b5221c 100644
> --- a/sys/net/pfvar.h
> +++ b/sys/net/pfvar.h
> @@ -572,8 +572,8 @@ struct pf_rule {
>   u_int8_t keep_state;
>   sa_family_t  af;
>   u_int8_t proto;
> - u_int8_t type;
> - u_int8_t code;
> + u_int16_ttype;  /* aux. value 256 is legit */
> + u_int16_tcode;  /* aux. value 256 is legit */
>   u_int8_t flags;
>   u_int8_t flagset;
>   u_int8_t min_ttl;
> @@ -592,7 +592,6 @@ struct pf_rule {
>   u_int8_t set_prio[2];
>   sa_family_t  naf;
>   u_int8_t rcvifnot;
> - u_int8_t pad[2];
>  
>   struct {
>   struct pf_addr  addr;



Re: kernel panic: diagnostic assertion "inp->inp_laddr.s_addr == INADDR_ANY || inp->inp_lport"

2023-01-12 Thread Alexander Bluhm
On Fri, Dec 16, 2022 at 05:36:04PM +, K R wrote:
> ddb> show panic
> *cpu0: kernel diagnostic assertion "inp->inp_laddr.s_addr ==
> INADDR_ANY || inp->inp_lport" failed: file
> "/usr/src/sys/netinet/in_pcb.c", line 510

This has been fixed in errata and syspatch.
https://ftp.openbsd.org/pub/OpenBSD/patches/7.2/common/013_tcp.patch.sig

> pass in quick log (all) inet6 proto { tcp, udp } to (egress) divert-to
> ::1 port 9000
> pass in quick log (all) inet proto { tcp, udp } to (egress) divert-to
> 127.0.0.1 port 9000

pf divert-to rules that match destination port 0 trigger the bug,
as in your case, where the port is not specified.

pass in quick log (all) inet6 proto { tcp, udp } to (egress) port != 0 divert-to ::1 port 9000
pass in quick log (all) inet proto { tcp, udp } to (egress) port != 0 divert-to 127.0.0.1 port 9000

These rules with destination port not zero would be safe.

bluhm



Re: panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed

2022-12-06 Thread Alexander Bluhm
On Tue, Dec 06, 2022 at 11:33:06PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Dec 06, 2022 at 07:56:13PM +0100, Paul de Weerd wrote:
> > I was playing with the USB NIC that's in my (USB-C) monitor.  As soon
> > as I do traffic over the interface, I get a kernel panic:
> > 
> > panic: kernel diagnostic assertion "timo || _kernel_lock_held()" failed: 
> > file "/usr/src/sys/kern/kern_synch.c", line 127
> > 
> 
> I missed, in{,6}_addmulti() have no kernel lock around (*if_ioctl)().
> But corresponding in{,6}_delmulti() have.

OK bluhm@

> Index: sys/netinet/in.c
> ===
> RCS file: /cvs/src/sys/netinet/in.c,v
> retrieving revision 1.178
> diff -u -p -r1.178 in.c
> --- sys/netinet/in.c  19 Nov 2022 14:26:40 -  1.178
> +++ sys/netinet/in.c  6 Dec 2022 19:47:12 -
> @@ -885,10 +885,13 @@ in_addmulti(struct in_addr *ap, struct i
>*/
>   memset(&ifr, 0, sizeof(ifr));
>   memcpy(&ifr.ifr_addr, &inm->inm_sin, sizeof(inm->inm_sin));
> + KERNEL_LOCK();
>   if ((*ifp->if_ioctl)(ifp, SIOCADDMULTI, (caddr_t)&ifr) != 0) {
> + KERNEL_UNLOCK();
>   free(inm, M_IPMADDR, sizeof(*inm));
>   return (NULL);
>   }
> + KERNEL_UNLOCK();
>  
>   TAILQ_INSERT_HEAD(&ifp->if_maddrlist, &inm->inm_ifma,
>   ifma_list);
> Index: sys/netinet6/in6.c
> ===
> RCS file: /cvs/src/sys/netinet6/in6.c,v
> retrieving revision 1.258
> diff -u -p -r1.258 in6.c
> --- sys/netinet6/in6.c2 Dec 2022 12:56:51 -   1.258
> +++ sys/netinet6/in6.c6 Dec 2022 19:47:12 -
> @@ -1063,7 +1063,9 @@ in6_addmulti(struct in6_addr *maddr6, st
>* filter appropriately for the new address.
>*/
>   memcpy(&ifr.ifr_addr, &in6m->in6m_sin, sizeof(in6m->in6m_sin));
> + KERNEL_LOCK();
>   *errorp = (*ifp->if_ioctl)(ifp, SIOCADDMULTI, (caddr_t)&ifr);
> + KERNEL_UNLOCK();
>   if (*errorp) {
>   free(in6m, M_IPMADDR, sizeof(*in6m));
>   return (NULL);



Re: deadlock in ifconfig

2022-11-25 Thread Alexander Bluhm
On Thu, Nov 24, 2022 at 11:23:51AM +1000, David Gwynne wrote:
> > we're working toward dropping the need for NET_LOCK before PF_LOCK. could
> > we try the diff below as a compromise?
> >
> 
> sashan@ and mvs@ have pushed that forward, so this diff should be enough
> now.

This diff has been reverted due to netlock splassert failures.

So I would like to revert the origin of my deadlock.  Move pf_purge
back to systq.

ok?

bluhm

Index: net/pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1155
diff -u -p -r1.1155 pf.c
--- net/pf.c25 Nov 2022 18:03:53 -  1.1155
+++ net/pf.c25 Nov 2022 18:33:08 -
@@ -120,6 +120,10 @@ u_char  pf_tcp_secret[16];
 int pf_tcp_secret_init;
 int pf_tcp_iss_off;
 
+int pf_npurge;
+struct task pf_purge_task = TASK_INITIALIZER(pf_purge, &pf_npurge);
+struct timeout  pf_purge_to = TIMEOUT_INITIALIZER(pf_purge_timeout, NULL);
+
 enum pf_test_status {
PF_TEST_FAIL = -1,
PF_TEST_OK,
@@ -1516,110 +1520,47 @@ pf_state_import(const struct pfsync_stat
 
 /* END state table stuff */
 
-voidpf_purge_states(void *);
-struct task pf_purge_states_task =
-TASK_INITIALIZER(pf_purge_states, NULL);
-
-voidpf_purge_states_tick(void *);
-struct timeout  pf_purge_states_to =
-TIMEOUT_INITIALIZER(pf_purge_states_tick, NULL);
-
-unsigned intpf_purge_expired_states(unsigned int, unsigned int);
-
-/*
- * how many states to scan this interval.
- *
- * this is set when the timeout fires, and reduced by the task. the
- * task will reschedule itself until the limit is reduced to zero,
- * and then it adds the timeout again.
- */
-unsigned int pf_purge_states_limit;
-
-/*
- * limit how many states are processed with locks held per run of
- * the state purge task.
- */
-unsigned int pf_purge_states_collect = 64;
-
 void
-pf_purge_states_tick(void *null)
+pf_purge_timeout(void *unused)
 {
-   unsigned int limit = pf_status.states;
-   unsigned int interval = pf_default_rule.timeout[PFTM_INTERVAL];
-
-   if (limit == 0) {
-   timeout_add_sec(&pf_purge_states_to, 1);
-   return;
-   }
-
-   /*
-* process a fraction of the state table every second
-*/
-
-   if (interval > 1)
-   limit /= interval;
-
-   pf_purge_states_limit = limit;
-   task_add(systqmp, &pf_purge_states_task);
-}
-
-void
-pf_purge_states(void *null)
-{
-   unsigned int limit;
-   unsigned int scanned;
-
-   limit = pf_purge_states_limit;
-   if (limit < pf_purge_states_collect)
-   limit = pf_purge_states_collect;
-
-   scanned = pf_purge_expired_states(limit, pf_purge_states_collect);
-   if (scanned >= pf_purge_states_limit) {
-   /* we've run out of states to scan this "interval" */
-   timeout_add_sec(&pf_purge_states_to, 1);
-   return;
-   }
-
-   pf_purge_states_limit -= scanned;
-   task_add(systqmp, &pf_purge_states_task);
+   /* XXX move to systqmp to avoid KERNEL_LOCK */
+   task_add(systq, &pf_purge_task);
 }
 
-voidpf_purge_tick(void *);
-struct timeout  pf_purge_to =
-TIMEOUT_INITIALIZER(pf_purge_tick, NULL);
-
-voidpf_purge(void *);
-struct task pf_purge_task =
-TASK_INITIALIZER(pf_purge, NULL);
-
 void
-pf_purge_tick(void *null)
+pf_purge(void *xnloops)
 {
-   task_add(systqmp, &pf_purge_task);
-}
+   int *nloops = xnloops;
 
-void
-pf_purge(void *null)
-{
-   unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
+   /*
+* process a fraction of the state table every second
+* Note:
+* we no longer need PF_LOCK() here, because
+* pf_purge_expired_states() uses pf_state_lock to maintain
+* consistency.
+*/
+   if (pf_default_rule.timeout[PFTM_INTERVAL] > 0)
+   pf_purge_expired_states(1 + (pf_status.states
+   / pf_default_rule.timeout[PFTM_INTERVAL]));
 
-   /* XXX is NET_LOCK necessary? */
NET_LOCK();
 
PF_LOCK();
-
-   pf_purge_expired_src_nodes();
-
+   /* purge other expired types every PFTM_INTERVAL seconds */
+   if (++(*nloops) >= pf_default_rule.timeout[PFTM_INTERVAL])
+   pf_purge_expired_src_nodes();
PF_UNLOCK();
 
/*
 * Fragments don't require PF_LOCK(), they use their own lock.
 */
-   pf_purge_expired_fragments();
+   if ((*nloops) >= pf_default_rule.timeout[PFTM_INTERVAL]) {
+   pf_purge_expired_fragments();
+   *nloops = 0;
+   }
NET_UNLOCK();
 
-   /* interpret the interval as idle time between runs */
-   timeout_add_sec(_purge_to, interval);
+   timeout_add_sec(&pf_purge_to, 1);
 }
 
 

Re: deadlock in ifconfig

2022-11-25 Thread Alexander Bluhm
On Thu, Nov 24, 2022 at 07:09:39PM +0100, Alexandr Nedvedicky wrote:
> Hello,
> 
> 
> On Thu, Nov 24, 2022 at 10:29:37AM -0500, David Hill wrote:
> > > > 
> > 
> > With this diff against -current - my dmesg is spammed with:
> > 
> > splassert: pfsync_delete_state: want 2 have 0
> > Starting stack trace...
> > pfsync_delete_state(fd820af9f940) at pfsync_delete_state+0x58
> > pf_remove_state(fd820af9f940) at pf_remove_state+0x14b
> > pf_purge_expired_states(42,40) at pf_purge_expired_states+0x202
> > pf_purge_states(0) at pf_purge_states+0x1c
> > taskq_thread(822c78f0) at taskq_thread+0x11a
> > 

I also found them on my test machines now.  But I did not check the
logs when I tested the diff.  Stupid me.

> there are forgotten NET_ASSERT_LOCK() which are no longer valid,
> in pfsync. diff below removes those which are either hit
> from purge_thread() or from ioctl().
> 
> I think remaining NET_ASSERT_LOCK() should stay at least for now.
> those belong to path which runs under NET_LOCK()
> 
> can you give a try diff below?

pfsync_undefer() is called without PF_LOCK from pf_test().  Now
that we have neither net nor pf lock, how can CLR(pd->pd_st->state_flags,
PFSTATE_ACK) in pfsync_undefer() be MP safe?

And pfsync_undefer() also calls ip_output().  How can this work
without the net lock?

bluhm

> 8<---8<---8<--8<
> diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
> index f69790ee98d..24963a546de 100644
> --- a/sys/net/if_pfsync.c
> +++ b/sys/net/if_pfsync.c
> @@ -1865,8 +1865,6 @@ pfsync_undefer(struct pfsync_deferral *pd, int drop)
>  {
>   struct pfsync_softc *sc = pfsyncif;
>  
> - NET_ASSERT_LOCKED();
> -
>   if (sc == NULL)
>   return;
>  
> @@ -2128,8 +2126,6 @@ pfsync_delete_state(struct pf_state *st)
>  {
>   struct pfsync_softc *sc = pfsyncif;
>  
> - NET_ASSERT_LOCKED();
> -
>   if (sc == NULL || !ISSET(sc->sc_if.if_flags, IFF_RUNNING))
>   return;
>  
> @@ -2188,8 +2184,6 @@ pfsync_clear_states(u_int32_t creatorid, const char 
> *ifname)
>   struct pfsync_clr clr;
>   } __packed r;
>  
> - NET_ASSERT_LOCKED();
> -
>   if (sc == NULL || !ISSET(sc->sc_if.if_flags, IFF_RUNNING))
>   return;
>  



Re: deadlock in ifconfig

2022-11-24 Thread Alexander Bluhm
On Thu, Nov 24, 2022 at 11:23:51AM +1000, David Gwynne wrote:
> > we're working toward dropping the need for NET_LOCK before PF_LOCK. could
> > we try the diff below as a compromise?
> >
> 
> sashan@ and mvs@ have pushed that forward, so this diff should be enough
> now.

I have tested the previous version of the diff in my performance
setup and this diff in my regress setup.

OK bluhm@

> Index: pf.c
> ===
> RCS file: /cvs/src/sys/net/pf.c,v
> retrieving revision 1.1153
> diff -u -p -r1.1153 pf.c
> --- pf.c  12 Nov 2022 02:48:14 -  1.1153
> +++ pf.c  24 Nov 2022 01:21:48 -
> @@ -1603,9 +1603,6 @@ pf_purge(void *null)
>  {
>   unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
>  
> - /* XXX is NET_LOCK necessary? */
> - NET_LOCK();
> -
>   PF_LOCK();
>  
>   pf_purge_expired_src_nodes();
> @@ -1616,7 +1613,6 @@ pf_purge(void *null)
>* Fragments don't require PF_LOCK(), they use their own lock.
>*/
>   pf_purge_expired_fragments();
> - NET_UNLOCK();
>  
>   /* interpret the interval as idle time between runs */
>   timeout_add_sec(&pf_purge_to, interval);
> @@ -1891,7 +1887,6 @@ pf_purge_expired_states(const unsigned i
>   if (SLIST_EMPTY(&gcl))
>   return (scanned);
>  
> - NET_LOCK();
>   rw_enter_write(&pf_state_list.pfs_rwl);
>   PF_LOCK();
>   PF_STATE_ENTER_WRITE();
> @@ -1904,7 +1899,6 @@ pf_purge_expired_states(const unsigned i
>   PF_STATE_EXIT_WRITE();
>   PF_UNLOCK();
>   rw_exit_write(&pf_state_list.pfs_rwl);
> - NET_UNLOCK();
>  
>   while ((st = SLIST_FIRST(&gcl)) != NULL) {
>   SLIST_REMOVE_HEAD(&gcl, gc_list);



deadlock in ifconfig

2022-11-21 Thread Alexander Bluhm
Hi,

Some of my test machines hang while booting userland.

starting network
-> here it hangs
load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k

ddb shows these two processes.

 81303  375320  89140  0  3 0x3  sbar  ifconfig
 48135  157353  0  0  3 0x14200  netlock   systqmp

ddb{0}> trace /t 0t375320
sleep_finish(800022d31318,1) at sleep_finish+0xfe
cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54
sched_barrier(800022512ff0) at sched_barrier+0x73
ixgbe_stop(80118000) at ixgbe_stop+0x1f7
ixgbe_init(80118000) at ixgbe_init+0x32
ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a
in_ifinit(80118048,8022ec00,800022d31740,1) at in_ifinit+0xef
in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at in_ioctl_change_ifaddr+0x3a4
in_control(fd81901dc740,8040691a,800022d31730,80118048) at in_control+0x75
ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at ifioctl+0x982
sys_ioctl(800022d6,800022d31840,800022d318a0) at sys_ioctl+0x2c4
syscall(800022d31910) at syscall+0x384
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d94a0, count: -13

ddb{0}> trace /t 0t157353
sleep_finish(800022ca8b70,1) at sleep_finish+0xfe
rw_enter(822b4f80,1) at rw_enter+0x1cb
pf_purge(0) at pf_purge+0x1d
taskq_thread(822ac568) at taskq_thread+0x100
end trace frame: 0x0, count: -4

ifconfig waits for the sched_barrier_task() on the systqmp task
queue while holding the netlock.  pf_purge() runs on the systqmp
task queue and is waiting for the netlock.  The netlock has been
taken by ifconfig in in_ioctl_change_ifaddr().

The problem has been introduced when pf_purge() was moved from systq
to systqmp.
https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2

bluhm



Re: macppc panic: vref used where vget required

2022-11-14 Thread Alexander Bluhm
On Wed, Nov 09, 2022 at 04:14:10PM +, Martin Pieuchot wrote:
> On 09/09/22(Fri) 14:41, Martin Pieuchot wrote:
> > On 09/09/22(Fri) 12:25, Theo Buehler wrote:
> > > > Yesterday gnezdo@ fixed a race in uvn_attach() that lead to the same
> > > > assert.  Here's an rebased diff for the bug discussed in this thread,
> > > > could you try again and let us know?  Thanks!
> > > 
> > > This seems to be stable now. It's been running for nearly 5 days.
> > > Without gnezdo's fix it would blow up within at most 2 days.
> > 
> > Thanks!  I'm looking for oks then. 
> 
> Here's an alternative possible fix.  The previous one got reverted
> because it exposes a bug on arm64 machines with Cortex-A72 CPUs.
> 
> The idea of the diff below is to flush data to physical pages that we keep
> around when munmap(2) is called.  I hope that the page daemon does the right
> thing and don't try to grab a reference to the vnode if all pages are 
> PG_CLEAN.
> 
> Could you try that and tell me if this prevents the panic you're seeing?
> 
> Index: uvm/uvm_vnode.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> retrieving revision 1.130
> diff -u -p -r1.130 uvm_vnode.c
> --- uvm/uvm_vnode.c   20 Oct 2022 13:31:52 -  1.130
> +++ uvm/uvm_vnode.c   9 Nov 2022 16:08:57 -
> @@ -329,7 +329,7 @@ uvn_detach(struct uvm_object *uobj)
>*/
>   if (uvn->u_flags & UVM_VNODE_CANPERSIST) {
>   /* won't block */
> - uvn_flush(uobj, 0, 0, PGO_DEACTIVATE|PGO_ALLPAGES);
> + uvn_flush(uobj, 0, 0, PGO_CLEANIT|PGO_DEACTIVATE|PGO_ALLPAGES);
>   goto out;
>   }
>  

With this it does not panic but hangs during the build.  The console
password prompt appears, but login does not work.  Breaking into ddb
shows that both CPUs are idle.

OpenBSD/macppc (ot26.obsd-lab.genua.de) (console)

login: root
Password:
^T^T^TLogin timed out after 300 seconds
[halt sent]
Stopped at  db_enter+0x24:  lwz r11,12(r1)
ddb{0}> trace
db_enter() at db_enter+0x20
zs_abort(b846c9ff) at zs_abort+0x44
zstty_stint(b846c9ff,aa9050) at zstty_stint+0x80
zsc_intr_hard(0) at zsc_intr_hard+0xf8
zshard(eaf94f7b) at zshard+0x94
openpic_ext_intr() at openpic_ext_intr+0x378
extint_call() at extint_call
--- interrupt ---
sched_idle(0) at sched_idle+0x220
proc_trampoline() at proc_trampoline+0x14
end trace frame: 0x0, count: -9
ddb{0}> ps
   PID     TID   PPID  UID  S   FLAGS     WAIT      COMMAND
 82062  108741  35278    0  3  0x10008a  sigsusp   sh
 35278  326141  76529    0  3  0x100090  piperd    cron
 57024  146742  22379    0  3  0x10008a  sigsusp   sh
 22379  111591  76529    0  3  0x100090  piperd    cron
 55048  187511   1391    0  3  0x10008a  sigsusp   sh
  1391  326522  76529    0  3  0x100090  piperd    cron
  8321  169445  33238    0  3  0x10008a  sigsusp   sh
 33238  329691  76529    0  3  0x100090  piperd    cron
 58098  225411   1674    0  3  0x10008a  sigsusp   sh
  1674  496278  76529    0  3  0x100090  piperd    cron
 15523  293279  38464    0  3  0x10008a  sigsusp   sh
 38464  110795  76529    0  3  0x100090  piperd    cron
 74065  410269  92325    0  3  0x10008a  sigsusp   sh
 92325   52659  76529    0  3  0x100090  piperd    cron
 13829  401074  64288    0  3  0x10008a  sigsusp   sh
 64288   98738  76529    0  3  0x100090  piperd    cron
 28303  221986  33521    0  3  0x10008a  sigsusp   sh
 33521  373224  76529    0  3  0x100090  piperd    cron
 82401  100291  60910    0  3  0x10008a  sigsusp   sh
 60910  436047  76529    0  3  0x100090  piperd    cron
 87863   36340  57609    0  3  0x10008a  sigsusp   sh
 57609   73380  76529    0  3  0x100090  piperd    cron
 23934  159239   6695    0  3  0x10008a  sigsusp   sh
  6695  188241  76529    0  3  0x100090  piperd    cron
 62835  385474  59944    0  3  0x10008a  sigsusp   sh
 59944   75639  76529    0  3  0x100090  piperd    cron
 65245  328020  80897    0  3  0x10008a  sigsusp   sh
 89129  427253  76529    0  3  0x100090  piperd    cron
 46467  285305  28388    0  3  0x10008a  sigsusp   sh
 28388   57372  76529    0  3  0x100090  piperd    cron
 12718   74740  85868    0  3  0x10008a  sigsusp   sh
 85868  312799  76529    0  3  0x100090  piperd    cron
 87283  190660  39787    0  3  0x10008a  sigsusp   sh
 39787  154590  76529    0  3  0x100090  piperd    cron
 18854  410749   2504    0  3  0x10008a  sigsusp   sh
  2504   71618  76529    0  3  0x100090  piperd    cron
 79881  316220  58337    0  3  0x10008a  sigsusp   sh
 58337  220546  76529    0  3  0x100090  piperd    cron
 80626  130427  85908    0  3  0x10008a  sigsusp   sh
 85908   83631  76529    0  3  0x100090  piperd

Re: performance regression RDTSCP

2022-10-09 Thread Alexander Bluhm
On Sat, Oct 08, 2022 at 08:41:34AM +0200, Robert Nagy wrote:
> What is the output of sysctl kern.timecounter?

# sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=tsc
kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)

> If you have clock_gettime() showing up in your ktrace, that means
> that usertc is not used, so having a 30% slowdown is expected.

It was my mistake.  I have only updated kernel and not libc.
With new libc iperf3 does not call clock_gettime() anymore.

 97181 iperf3   CALL  select(6,0x7f7effe0,0x7f7eff60,0,0x9706589c690)
 97181 iperf3   CALL  write(5,0x97092d8f000,0x5a8)

Performance is restored.

http://bluhm.genua.de/perform/results/7.1/2022-10-08T08%3A46%3A48Z/perform.html

Sorry for the noise.

bluhm



Re: performance regression RDTSCP

2022-10-07 Thread Alexander Bluhm
On Fri, Oct 07, 2022 at 01:10:14PM -0500, Scott Cheloha wrote:
> Does this machine have the corresponding libc change that went with the kernel
> change?  The RDTSCP option has a distinct userspace implementation.  If libc
> isn't up to date it won't know what to do and it will fall back to the 
> syscall.

It uses an old libc.  I just check out the kernel from every day and
compare it.  I did not know that I have to update libc at this
point, but I can easily do that.

I will test with new libc and report.

bluhm



performance regression RDTSCP

2022-10-07 Thread Alexander Bluhm
Hi,

My monthly UDP benchmarks detect a 30% reduction in iperf3 UDP
throughput between September 22 and 23.

http://bluhm.genua.de/perform/results/7.1/2022-10-01T06%3A17%3A03Z/gnuplot/udp.html

It is this test: iperf3 -c10.3.45.35 -u -b10G -w1m -t10

Per commit checkout shows the relevant change.

http://bluhm.genua.de/perform/results/7.1/2022-10-05T14%3A50%3A25Z/perform.html
http://bluhm.genua.de/perform/results/cvslog/2022/src/sys/2022-09-22T04%3A36%3A38Z--2022-09-22T04%3A57%3A08Z.html


revision 1.29
date: 2022/09/22 04:57:08;  author: robert;  state: Exp;  lines: +29 -13;  
commitid: ru7qpauUJUEE0rnw;
use the always serializing RDTSCP instruction in tsc and usertc if available

tweaks from cheloha@; ok deraadt@, sthen@, cheloha@


Note that despite its purpose, iperf3 -u does not measure UDP
performance.  Per packet it does 6 system calls, and only 1 of them
sends a packet.  Basically clock_gettime() is benchmarked.

 32134 iperf3   CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7cd880)
 32134 iperf3   CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7cd880)
 32134 iperf3   CALL  select(6,0x7f7cd930,0x7f7cd8b0,0,0xea5ed51f690)
 32134 iperf3   CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7cd830)
 32134 iperf3   CALL  clock_gettime(CLOCK_MONOTONIC,0x7f7cd800)
 32134 iperf3   CALL  write(5,0xea5d65d5000,0x5a8)

The new implementation in the kernel seems much slower.

bluhm


OpenBSD 7.2 (GENERIC.MP) #cvs : D2022.09.23.00.00.00: Wed Oct  5 22:17:07 CEST 
2022
r...@ot14.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 6416760832 (6119MB)
avail mem = 6188167168 (5901MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x99c00 (88 entries)
bios0: vendor American Megatrends Inc. version "1.1b" date 03/04/2010
bios0: Supermicro X8DTH-i/6/iF/6F
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG SPMI OEMB HPET DMAR SSDT EINJ BERT ERST HEST
acpi0: wakeup devices NPE1(S4) NPE2(S4) NPE3(S4) NPE4(S4) NPE5(S4) NPE6(S4) 
NPE7(S4) NPE8(S4) NPE9(S4) NPEA(S4) P0P1(S4) USB0(S4) USB1(S4) USB2(S4) 
USB5(S4) EUSB(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz, 2933.59 MHz, 06-1a-05
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,MELTDOWN
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 64b/line 
8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 133MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz, 2933.46 MHz, 06-1a-05
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,MELTDOWN
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 64b/line 
8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz, 2933.46 MHz, 06-1a-05
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,MELTDOWN
cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 64b/line 
8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz, 2933.46 MHz, 06-1a-05
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,MELTDOWN
cpu3: 32KB 64b/line 8-way D-cache, 32KB 64b/line 4-way I-cache, 256KB 64b/line 
8-way L2 cache, 8MB 64b/line 16-way L3 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 24 pins, remapped
ioapic1 at mainbus0: apid 3 pa 0xfec8a000, version 20, 24 pins, remapped
ioapic2 at mainbus0: apid 5 pa 0xfec9a000, version 20, 24 pins, remapped
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpihpet0 at acpi0: 

Re: macppc panic: vref used where vget required

2022-09-06 Thread Alexander Bluhm
On Tue, Sep 06, 2022 at 05:17:56AM +, Miod Vallat wrote:
> > On Thu, Sep 01, 2022 at 02:57:27PM +0200, Martin Pieuchot wrote:
> > > Yesterday gnezdo@ fixed a race in uvn_attach() that lead to the same
> > > assert.  Here's an rebased diff for the bug discussed in this thread,
> > > could you try again and let us know?  Thanks!
> > 
> > Wow!  With this diff I finished make release for the first time on
> > my macppc.  I am starting the next build right now.  Result for
> > this test takes 1.13 days.
> 
> Same here, PowerMac11,2 has been able to bake a build while running
> GENERIC.MP for the first time ever, with this diff.

My macppc survived 3 make release and 1 make regress now.
make release never worked before.

bluhm



Re: em0: problems seen initial neighbor solicitation (ipv6)

2022-09-05 Thread Alexander Bluhm
On Mon, Sep 05, 2022 at 03:23:21PM +0200, Sebastien Marie wrote:
> 15:06:59.567147 c8:be:19:e2:2c:ed 33:33:ff:fc:bf:56 ip6 86: 
> fd00:65c6:f26a:5c::1 > ff02::1:fffc:bf56: icmp6: neighbor sol: who has 
> fd00:65c6:f26a:5c:452e:64ab:3ffc:bf56

Some packets sent by the neighbor solicitation state machine go to
multicast MAC address destination.

who has fd00:65c6:f26a:5c:452e:64ab:3ffc:bf56
all matching hosts ff02::1:fffc:bf56
multicast MAC 33:33:ff:fc:bf:56

They all have the same suffix.  The idea is to reduce broadcasts within
the IP stack by filtering the correct multicast MACs in the network
hardware.

If this packet is not received during normal operation but only seen
with tcpdump in promiscuous mode, this means that the multicast
filter in the network card is not set correctly.  I suspect an em(4)
driver bug.

bluhm



Re: rt_ifa_del NULL deref

2022-09-04 Thread Alexander Bluhm
On Sun, Sep 04, 2022 at 05:42:14PM +0300, Vitaliy Makkoveev wrote:
> Not for commit, just to collect assertions.  Netlock assertions don't
> provide panics.

Better use NET_ASSERT_LOCKED_EXCLUSIVE() ?

But maybe nd6 uses a combination of shared netlock and kernel lock.

Is it possible that we sleep somewhere in nd6 input?  Then we give
up the kernel lock, and shared netlock would not be sufficient.  These
list additions should be safe.  But we might sleep during some list
traversal.

> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.664
> diff -u -p -r1.664 if.c
> --- sys/net/if.c  2 Sep 2022 13:12:31 -   1.664
> +++ sys/net/if.c  4 Sep 2022 14:39:58 -
> @@ -3148,12 +3148,14 @@ ifpromisc(struct ifnet *ifp, int pswitch
>  void
>  ifa_add(struct ifnet *ifp, struct ifaddr *ifa)
>  {
> + NET_ASSERT_LOCKED();
>   TAILQ_INSERT_TAIL(&ifp->if_addrlist, ifa, ifa_list);
>  }
>  
>  void
>  ifa_del(struct ifnet *ifp, struct ifaddr *ifa)
>  {
> + NET_ASSERT_LOCKED();
>   TAILQ_REMOVE(&ifp->if_addrlist, ifa, ifa_list);
>  }
>  



Re: macppc panic: vref used where vget required

2022-09-02 Thread Alexander Bluhm
On Thu, Sep 01, 2022 at 02:57:27PM +0200, Martin Pieuchot wrote:
> Yesterday gnezdo@ fixed a race in uvn_attach() that lead to the same
> assert.  Here's an rebased diff for the bug discussed in this thread,
> could you try again and let us know?  Thanks!

Wow!  With this diff I finished make release for the first time on
my macppc.  I am starting the next build right now.  Result for
this test takes 1.13 days.

bluhm

> Index: uvm/uvm_vnode.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> retrieving revision 1.127
> diff -u -p -r1.127 uvm_vnode.c
> --- uvm/uvm_vnode.c   31 Aug 2022 09:07:35 -  1.127
> +++ uvm/uvm_vnode.c   1 Sep 2022 12:54:27 -
> @@ -163,11 +163,8 @@ uvn_attach(struct vnode *vp, vm_prot_t a
>*/
>   rw_enter(uvn->u_obj.vmobjlock, RW_WRITE);
>   if (uvn->u_flags & UVM_VNODE_VALID) {   /* already active? */
> + KASSERT(uvn->u_obj.uo_refs > 0);
>  
> - /* regain vref if we were persisting */
> - if (uvn->u_obj.uo_refs == 0) {
> - vref(vp);
> - }
>   uvn->u_obj.uo_refs++;   /* bump uvn ref! */
>   rw_exit(uvn->u_obj.vmobjlock);
>  
> @@ -235,7 +232,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a
>   KASSERT(uvn->u_obj.uo_refs == 0);
>   uvn->u_obj.uo_refs++;
>   oldflags = uvn->u_flags;
> - uvn->u_flags = UVM_VNODE_VALID|UVM_VNODE_CANPERSIST;
> + uvn->u_flags = UVM_VNODE_VALID;
>   uvn->u_nio = 0;
>   uvn->u_size = used_vnode_size;
>  
> @@ -248,7 +245,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a
>   /*
>* add a reference to the vnode.   this reference will stay as long
>* as there is a valid mapping of the vnode.   dropped when the
> -  * reference count goes to zero [and we either free or persist].
> +  * reference count goes to zero.
>*/
>   vref(vp);
>   if (oldflags & UVM_VNODE_WANTED)
> @@ -321,16 +318,6 @@ uvn_detach(struct uvm_object *uobj)
>*/
>   vp->v_flag &= ~VTEXT;
>  
> - /*
> -  * we just dropped the last reference to the uvn.   see if we can
> -  * let it "stick around".
> -  */
> - if (uvn->u_flags & UVM_VNODE_CANPERSIST) {
> - /* won't block */
> - uvn_flush(uobj, 0, 0, PGO_DEACTIVATE|PGO_ALLPAGES);
> - goto out;
> - }
> -
>   /* its a goner! */
>   uvn->u_flags |= UVM_VNODE_DYING;
>  
> @@ -380,7 +367,6 @@ uvn_detach(struct uvm_object *uobj)
>   /* wake up any sleepers */
>   if (oldflags & UVM_VNODE_WANTED)
>   wakeup(uvn);
> -out:
>   rw_exit(uobj->vmobjlock);
>  
>   /* drop our reference to the vnode. */
> @@ -496,8 +482,8 @@ uvm_vnp_terminate(struct vnode *vp)
>   }
>  
>   /*
> -  * done.   now we free the uvn if its reference count is zero
> -  * (true if we are zapping a persisting uvn).   however, if we are
> +  * done.   now we free the uvn if its reference count is zero.
> +  * however, if we are
>* terminating a uvn with active mappings we let it live ... future
>* calls down to the vnode layer will fail.
>*/
> @@ -505,14 +491,14 @@ uvm_vnp_terminate(struct vnode *vp)
>   if (uvn->u_obj.uo_refs) {
>   /*
>* uvn must live on it is dead-vnode state until all references
> -  * are gone.   restore flags.clear CANPERSIST state.
> +  * are gone.   restore flags.
>*/
>   uvn->u_flags &= ~(UVM_VNODE_DYING|UVM_VNODE_VNISLOCKED|
> -   UVM_VNODE_WANTED|UVM_VNODE_CANPERSIST);
> +   UVM_VNODE_WANTED);
>   } else {
>   /*
>* free the uvn now.   note that the vref reference is already
> -  * gone [it is dropped when we enter the persist state].
> +  * gone.
>*/
>   if (uvn->u_flags & UVM_VNODE_IOSYNCWANTED)
>   panic("uvm_vnp_terminate: io sync wanted bit set");
> @@ -1349,46 +1335,14 @@ uvm_vnp_uncache(struct vnode *vp)
>   }
>  
>   /*
> -  * we have a valid, non-blocked uvn.   clear persist flag.
> +  * we have a valid, non-blocked uvn.
>* if uvn is currently active we can return now.
>*/
> - uvn->u_flags &= ~UVM_VNODE_CANPERSIST;
>   if (uvn->u_obj.uo_refs) {
>   rw_exit(uobj->vmobjlock);
>   return FALSE;
>   }
>  
> - /*
> -  * uvn is currently persisting!   we have to gain a reference to
> -  * it so that we can call uvn_detach to kill the uvn.
> -  */
> - vref(vp);   /* seems ok, even with VOP_LOCK */
> - uvn->u_obj.uo_refs++;   /* value is now 1 */
> - rw_exit(uobj->vmobjlock);
> -
> -#ifdef VFSLCKDEBUG
> - /*
> -  * carry over sanity check from old vnode pager: the vnode should
> -  * be VOP_LOCK'd, 

Re: OpenBSD 7.1/amd64 on APU4D4 - system drops into ddb few times a week

2022-08-29 Thread Alexander Bluhm
On Mon, Aug 29, 2022 at 04:42:45AM +0200, Radek wrote:
> the same problem occurs on -current.

It is not the same problem.  Traces are different.  But I guess
your setup triggers some sort of race.

Previous crashes with 7.1 were in route and IPsec, now it is in pf.
Unfortunately you missed my pf fragment fix by a couple of hours.
Please try a newer snapshot.

OpenBSD 7.2-beta (GENERIC.MP) #705: Mon Aug 22 12:25:07 MDT 2022
Changes by: bl...@cvs.openbsd.org   2022/08/22 14:35:39

I could not figure out what is wrong with the 7.1-stable crashes.  The
register and ps output are not from the CPU where the crash happened.
You have to run show register and ps before switching CPUs with mach
ddbcpu.

So first run show panic.  Then trace, show register, ps.
Finally inspect the other CPU with mach ddbcpu.

The number in the ddb{2}> prompt shows the CPU you are currently on.
If "show panic" mentions more than one CPU, the one with the * is
the interesting one.  Usually ddb drops to that one initially.  Traces
from the other CPUs help to see if something was running concurrently.

bluhm



Re: rt_ifa_del NULL deref

2022-08-27 Thread Alexander Bluhm
On Sat, Aug 27, 2022 at 03:14:15AM +0300, Vitaliy Makkoveev wrote:
> > On 27 Aug 2022, at 00:04, Alexander Bluhm  wrote:
> > 
> > Anyone willing to test or ok this?
> > 
> 
> This fixes weird `ifa' refcounting. I like this.
> 
> Could the ifaref() and ifafree() names use the same notation? Like
> ifaref() and ifarele() or ifaget() and ifafree() or something else?

Refcount naming is very inconsistent.

ifget(), ifput(), pf_state_key_ref(), pf_state_key_unref(), tdb_ref(),
tdb_unref(), tdb_delete(), tdb_free(), vxlan_take(), vxlan_rele()
all work in subtly different ways.

I want to keep ifafree() as the name is established and called from
many places.  And giving ifaref() another name makes it different
but not better.

It would be easy to change something but hard to make it consistent.
So I prefer to leave the diff as it is.

bluhm

> > I would like to get it in before kn@ bumps ports due to net/if_var.h
> > changes.
> > 
> > On Wed, Aug 24, 2022 at 03:14:35PM +0200, Alexander Bluhm wrote:
> >> On Tue, Aug 23, 2022 at 02:47:11PM +0200, Stefan Sperling wrote:
> >>> ddb{2}> show struct ifaddr 0x804e9400
> >>> struct ifaddr at 0x804e9400 (64 bytes) {ifa_addr = (struct 
> >>> sockaddr *)0
> >>> xdeaf0009deafbead, ifa_dstaddr = (struct sockaddr *)0x20ef1a8af1d895de, 
> >>> ifa_net
> >>> mask = (struct sockaddr *)0xdeafbeaddeafbead, ifa_ifp = (struct ifnet 
> >>> *)0xdeafb
> >>> eaddeafbead, ifa_list = {tqe_next = (struct ifaddr *)0xdeafbeaddeafbead, 
> >>> tqe_pr
> >>> ev = 0xdeafbeaddeafbead}, ifa_flags = 0xdeafbead, ifa_refcnt = 
> >>> 0xdeafbead, ifa_
> >>> metric = 0xdeafbead}
> >> 
> >> The ifaddr has been freed.  That should not happen, as ifafree()
> >> uses reference counting.  But this refcount is a simple integer
> >> increment, that is not MP safe.  In theory all ++ should be protected
> >> by exclusive netlock or shared netlock combined with kernel lock.
> >> But maybe I missed a path.
> >> 
> >> What are you doing with the machine?  Forwarding, IPv6, Stress test?
> >> Do you have network interfaces with multiple network queues?
> >> 
> >> Anyway, instead of adding kernel locks here and there, use a proper
> >> struct refcnt.  The variable ifatrash is never read, it looks like
> >> a ddb debugging feature.  But for debugging refcount leaks we have
> >> dt(4) now.
> >> 
> >> enc and mpls are special, they have interface address as part of
> >> their sc structure.  I use refcount anyway, the new panic prevents
> >> use after free.
> >> 
> >> This has passed a full regres run.
> >> 
> >> ok?
> >> 
> >> bluhm
> >> 
> >> Index: dev/dt/dt_prov_static.c
> >> ===
> >> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/dt/dt_prov_static.c,v
> >> retrieving revision 1.14
> >> diff -u -p -r1.14 dt_prov_static.c
> >> --- dev/dt/dt_prov_static.c28 Jun 2022 09:32:27 -  1.14
> >> +++ dev/dt/dt_prov_static.c23 Aug 2022 17:16:55 -
> >> @@ -88,9 +88,10 @@ DT_STATIC_PROBE0(smr, wakeup);
> >> DT_STATIC_PROBE2(smr, thread, "uint64_t", "uint64_t");
> >> 
> >> /*
> >> - * reference counting
> >> + * reference counting, keep in sync with sys/refcnt.h
> >>  */
> >> DT_STATIC_PROBE0(refcnt, none);
> >> +DT_STATIC_PROBE3(refcnt, ifaddr, "void *", "int", "int");
> >> DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
> >> DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
> >> 
> >> @@ -135,6 +136,7 @@ struct dt_probe *const dtps_static[] = {
> >>&_DT_STATIC_P(smr, thread),
> >>/* refcnt */
> >>&_DT_STATIC_P(refcnt, none),
> >> +  &_DT_STATIC_P(refcnt, ifaddr),
> >>&_DT_STATIC_P(refcnt, inpcb),
> >>&_DT_STATIC_P(refcnt, tdb),
> >> };
> >> Index: net/if_enc.c
> >> ===
> >> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_enc.c,v
> >> retrieving revision 1.78
> >> diff -u -p -r1.78 if_enc.c
> >> --- net/if_enc.c   28 Dec 2020 14:28:50 -  1.78
> >> +++ net/if_enc.c   24 Aug 2022 07:51:49 -
> >> @@ -100,6 +100,7 @@ enc_clone_create(struct if_

Re: rt_ifa_del NULL deref

2022-08-26 Thread Alexander Bluhm
Anyone willing to test or ok this?

I would like to get it in before kn@ bumps ports due to net/if_var.h
changes.

On Wed, Aug 24, 2022 at 03:14:35PM +0200, Alexander Bluhm wrote:
> On Tue, Aug 23, 2022 at 02:47:11PM +0200, Stefan Sperling wrote:
> > ddb{2}> show struct ifaddr 0x804e9400
> > struct ifaddr at 0x804e9400 (64 bytes) {ifa_addr = (struct sockaddr 
> > *)0
> > xdeaf0009deafbead, ifa_dstaddr = (struct sockaddr *)0x20ef1a8af1d895de, 
> > ifa_net
> > mask = (struct sockaddr *)0xdeafbeaddeafbead, ifa_ifp = (struct ifnet 
> > *)0xdeafb
> > eaddeafbead, ifa_list = {tqe_next = (struct ifaddr *)0xdeafbeaddeafbead, 
> > tqe_pr
> > ev = 0xdeafbeaddeafbead}, ifa_flags = 0xdeafbead, ifa_refcnt = 0xdeafbead, 
> > ifa_
> > metric = 0xdeafbead}
> 
> The ifaddr has been freed.  That should not happen, as ifafree()
> uses reference counting.  But this refcount is a simple integer
> increment, that is not MP safe.  In theory all ++ should be protected
> by exclusive netlock or shared netlock combined with kernel lock.
> But maybe I missed a path.
> 
> What are you doing with the machine?  Forwarding, IPv6, Stress test?
> Do you have network interfaces with multiple network queues?
> 
> Anyway, instead of adding kernel locks here and there, use a proper
> struct refcnt.  The variable ifatrash is never read, it looks like
> a ddb debugging feature.  But for debugging refcount leaks we have
> dt(4) now.
> 
> enc and mpls are special, they have interface address as part of
> their sc structure.  I use refcount anyway, the new panic prevents
> use after free.
> 
> This has passed a full regress run.
> 
> ok?
> 
> bluhm
> 
> Index: dev/dt/dt_prov_static.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/dt/dt_prov_static.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 dt_prov_static.c
> --- dev/dt/dt_prov_static.c   28 Jun 2022 09:32:27 -  1.14
> +++ dev/dt/dt_prov_static.c   23 Aug 2022 17:16:55 -
> @@ -88,9 +88,10 @@ DT_STATIC_PROBE0(smr, wakeup);
>  DT_STATIC_PROBE2(smr, thread, "uint64_t", "uint64_t");
>  
>  /*
> - * reference counting
> + * reference counting, keep in sync with sys/refcnt.h
>   */
>  DT_STATIC_PROBE0(refcnt, none);
> +DT_STATIC_PROBE3(refcnt, ifaddr, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
>  DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
>  
> @@ -135,6 +136,7 @@ struct dt_probe *const dtps_static[] = {
>   &_DT_STATIC_P(smr, thread),
>   /* refcnt */
>   &_DT_STATIC_P(refcnt, none),
> + &_DT_STATIC_P(refcnt, ifaddr),
>   &_DT_STATIC_P(refcnt, inpcb),
>   &_DT_STATIC_P(refcnt, tdb),
>  };
> Index: net/if_enc.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_enc.c,v
> retrieving revision 1.78
> diff -u -p -r1.78 if_enc.c
> --- net/if_enc.c  28 Dec 2020 14:28:50 -  1.78
> +++ net/if_enc.c  24 Aug 2022 07:51:49 -
> @@ -100,6 +100,7 @@ enc_clone_create(struct if_clone *ifc, i
>* and empty ifa of type AF_LINK for this purpose.
>*/
>   if_alloc_sadl(ifp);
> + refcnt_init_trace(&sc->sc_ifa.ifa_refcnt, DT_REFCNT_IDX_IFADDR);
>   sc->sc_ifa.ifa_ifp = ifp;
>   sc->sc_ifa.ifa_addr = sdltosa(ifp->if_sadl);
>   sc->sc_ifa.ifa_netmask = NULL;
> @@ -152,6 +153,10 @@ enc_clone_destroy(struct ifnet *ifp)
>   NET_UNLOCK();
>  
>   if_detach(ifp);
> + if (refcnt_rele(&sc->sc_ifa.ifa_refcnt) == 0) {
> + panic("%s: ifa refcnt has %u refs", __func__,
> + sc->sc_ifa.ifa_refcnt.r_refs);
> + }
>   free(sc, M_DEVBUF, sizeof(*sc));
>  
>   return (0);
> Index: net/if_mpe.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_mpe.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 if_mpe.c
> --- net/if_mpe.c  8 Nov 2021 04:50:54 -   1.101
> +++ net/if_mpe.c  24 Aug 2022 07:52:22 -
> @@ -128,6 +128,7 @@ mpe_clone_create(struct if_clone *ifc, i
>   sc->sc_txhprio = 0;
>   sc->sc_rxhprio = IF_HDRPRIO_PACKET;
>   sc->sc_rdomain = 0;
> + refcnt_init_trace(&sc->sc_ifa.ifa_refcnt, DT_REFCNT_IDX_IFADDR);
>   sc->sc_ifa.ifa_ifp = ifp;
>   sc->sc_ifa.ifa_addr = sdltosa(ifp->if_sadl);
>   sc->sc_smpls.smpls_len = sizeof(sc->sc_smpls);
>

Re: rt_ifa_del NULL deref

2022-08-24 Thread Alexander Bluhm
On Tue, Aug 23, 2022 at 02:47:11PM +0200, Stefan Sperling wrote:
> ddb{2}> show struct ifaddr 0x804e9400
> struct ifaddr at 0x804e9400 (64 bytes) {ifa_addr = (struct sockaddr 
> *)0
> xdeaf0009deafbead, ifa_dstaddr = (struct sockaddr *)0x20ef1a8af1d895de, 
> ifa_net
> mask = (struct sockaddr *)0xdeafbeaddeafbead, ifa_ifp = (struct ifnet 
> *)0xdeafb
> eaddeafbead, ifa_list = {tqe_next = (struct ifaddr *)0xdeafbeaddeafbead, 
> tqe_pr
> ev = 0xdeafbeaddeafbead}, ifa_flags = 0xdeafbead, ifa_refcnt = 0xdeafbead, 
> ifa_
> metric = 0xdeafbead}

The ifaddr has been freed.  That should not happen, as ifafree()
uses reference counting.  But this refcount is a simple integer
increment, that is not MP safe.  In theory all ++ should be protected
by exclusive netlock or shared netlock combined with kernel lock.
But maybe I missed a path.

What are you doing with the machine?  Forwarding, IPv6, Stress test?
Do you have network interfaces with multiple network queues?

Anyway, instead of adding kernel locks here and there, use a proper
struct refcnt.  The variable ifatrash is never read, it looks like
a ddb debugging feature.  But for debugging refcount leaks we have
dt(4) now.
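[Editor's note: the lost-update race described above can be sketched in userland C. This is illustrative only; the real kernel API lives in sys/refcnt.h and uses kernel atomics, not C11 <stdatomic.h>, and the names below mirror but do not reproduce it.]

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Minimal sketch of the struct refcnt pattern.  The point of the
 * diff: a plain "ifa_refcnt++" is a read-modify-write that two CPUs
 * can interleave and lose an update; an atomic fetch-add cannot.
 */
struct refcnt {
	atomic_uint r_refs;
};

static void
refcnt_init(struct refcnt *r)
{
	atomic_init(&r->r_refs, 1);	/* creator holds the first reference */
}

static void
refcnt_take(struct refcnt *r)
{
	atomic_fetch_add(&r->r_refs, 1);
}

/* Returns nonzero when the caller dropped the last reference. */
static int
refcnt_rele(struct refcnt *r)
{
	return atomic_fetch_sub(&r->r_refs, 1) == 1;
}
```

With this shape, the panic added in the diff is just a check that the object being destroyed holds exactly the one remaining reference.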

enc and mpls are special, they have interface address as part of
their sc structure.  I use refcount anyway, the new panic prevents
use after free.

This has passed a full regress run.

ok?

bluhm

Index: dev/dt/dt_prov_static.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/dt/dt_prov_static.c,v
retrieving revision 1.14
diff -u -p -r1.14 dt_prov_static.c
--- dev/dt/dt_prov_static.c 28 Jun 2022 09:32:27 -  1.14
+++ dev/dt/dt_prov_static.c 23 Aug 2022 17:16:55 -
@@ -88,9 +88,10 @@ DT_STATIC_PROBE0(smr, wakeup);
 DT_STATIC_PROBE2(smr, thread, "uint64_t", "uint64_t");
 
 /*
- * reference counting
+ * reference counting, keep in sync with sys/refcnt.h
  */
 DT_STATIC_PROBE0(refcnt, none);
+DT_STATIC_PROBE3(refcnt, ifaddr, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, inpcb, "void *", "int", "int");
 DT_STATIC_PROBE3(refcnt, tdb, "void *", "int", "int");
 
@@ -135,6 +136,7 @@ struct dt_probe *const dtps_static[] = {
&_DT_STATIC_P(smr, thread),
/* refcnt */
&_DT_STATIC_P(refcnt, none),
+   &_DT_STATIC_P(refcnt, ifaddr),
&_DT_STATIC_P(refcnt, inpcb),
&_DT_STATIC_P(refcnt, tdb),
 };
Index: net/if_enc.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_enc.c,v
retrieving revision 1.78
diff -u -p -r1.78 if_enc.c
--- net/if_enc.c28 Dec 2020 14:28:50 -  1.78
+++ net/if_enc.c24 Aug 2022 07:51:49 -
@@ -100,6 +100,7 @@ enc_clone_create(struct if_clone *ifc, i
 * and empty ifa of type AF_LINK for this purpose.
 */
if_alloc_sadl(ifp);
+   refcnt_init_trace(&sc->sc_ifa.ifa_refcnt, DT_REFCNT_IDX_IFADDR);
sc->sc_ifa.ifa_ifp = ifp;
sc->sc_ifa.ifa_addr = sdltosa(ifp->if_sadl);
sc->sc_ifa.ifa_netmask = NULL;
@@ -152,6 +153,10 @@ enc_clone_destroy(struct ifnet *ifp)
NET_UNLOCK();
 
if_detach(ifp);
+   if (refcnt_rele(&sc->sc_ifa.ifa_refcnt) == 0) {
+   panic("%s: ifa refcnt has %u refs", __func__,
+   sc->sc_ifa.ifa_refcnt.r_refs);
+   }
free(sc, M_DEVBUF, sizeof(*sc));
 
return (0);
Index: net/if_mpe.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_mpe.c,v
retrieving revision 1.101
diff -u -p -r1.101 if_mpe.c
--- net/if_mpe.c8 Nov 2021 04:50:54 -   1.101
+++ net/if_mpe.c24 Aug 2022 07:52:22 -
@@ -128,6 +128,7 @@ mpe_clone_create(struct if_clone *ifc, i
sc->sc_txhprio = 0;
sc->sc_rxhprio = IF_HDRPRIO_PACKET;
sc->sc_rdomain = 0;
+   refcnt_init_trace(&sc->sc_ifa.ifa_refcnt, DT_REFCNT_IDX_IFADDR);
sc->sc_ifa.ifa_ifp = ifp;
sc->sc_ifa.ifa_addr = sdltosa(ifp->if_sadl);
sc->sc_smpls.smpls_len = sizeof(sc->sc_smpls);
@@ -154,6 +155,10 @@ mpe_clone_destroy(struct ifnet *ifp)
ifq_barrier(&ifp->if_snd);
 
if_detach(ifp);
+   if (refcnt_rele(&sc->sc_ifa.ifa_refcnt) == 0) {
+   panic("%s: ifa refcnt has %u refs", __func__,
+   sc->sc_ifa.ifa_refcnt.r_refs);
+   }
free(sc, M_DEVBUF, sizeof *sc);
return (0);
 }
Index: net/if_mpip.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_mpip.c,v
retrieving revision 1.15
diff -u -p -r1.15 if_mpip.c
--- net/if_mpip.c   26 Mar 2021 19:00:21 -  1.15
+++ net/if_mpip.c   24 Aug 2022 07:52:26 -
@@ -128,6 +128,7 @@ mpip_clone_create(struct if_clone *ifc, 
bpfattach(>if_bpf, ifp, DLT_LOOP, sizeof(uint32_t));
 #endif
 
+   refcnt_init_trace(&sc->sc_ifa.ifa_refcnt, DT_REFCNT_IDX_IFADDR);

Re: rt_ifa_del NULL deref

2022-08-23 Thread Alexander Bluhm
On Tue, Aug 23, 2022 at 12:23:05PM +0200, Stefan Sperling wrote:
> On Tue, Aug 23, 2022 at 11:43:22AM +0200, Alexander Bluhm wrote:
> > On Tue, Aug 23, 2022 at 10:15:22AM +0200, Stefan Sperling wrote:
> > > I found one of my amd64 systems running -current, built on 12th of
> > > August, has crashed as follows.
> > 
> > Is there any chance that the kernel sources are between these commits?
> > August 12th does not fit exactly, do you remember when you did the
> > checkout?  Or is it a snapshot kernel?
> 
> Do you want to me provide anything else from ddb?

The usual:

show panic
show registers
trace
ps
show struct ifaddr 0x804e9400
show all routes
traces from other cpu

I fear that there will be no really useful info.  It looks like a
use after free.  When removing the route, the nd6 timer should have
been deleted.
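[Editor's note: the teardown-order point can be made concrete with a toy model. This is NOT the kernel timeout(9) API; the names are invented for illustration. If an object is freed while a timer callback still holds a raw pointer to it, the callback dereferences poisoned memory, so the timer must be unregistered before the object goes away.]

```c
#include <assert.h>
#include <stddef.h>

struct fake_timeout {
	void (*to_fn)(void *);
	void *to_arg;
	int to_pending;
};

static int fired;			/* counts callback invocations */

static void
expire_cb(void *arg)
{
	(void)arg;
	fired++;
}

static void
fake_timeout_set(struct fake_timeout *to, void (*fn)(void *), void *arg)
{
	to->to_fn = fn;
	to->to_arg = arg;
	to->to_pending = 1;
}

static void
fake_timeout_del(struct fake_timeout *to)
{
	to->to_pending = 0;		/* must happen before freeing to_arg */
}

/* Returns 1 if the callback ran, 0 if it had been cancelled. */
static int
fake_timeout_fire(struct fake_timeout *to)
{
	if (!to->to_pending)
		return 0;
	to->to_fn(to->to_arg);
	return 1;
}
```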

> I can tell you exactly what sources I built once I reboot the machine.

Except for the backout, there was no relevant change around that time.



Re: rt_ifa_del NULL deref

2022-08-23 Thread Alexander Bluhm
On Tue, Aug 23, 2022 at 10:15:22AM +0200, Stefan Sperling wrote:
> I found one of my amd64 systems running -current, built on 12th of
> August, has crashed as follows.

Is there any chance that the kernel sources are between these commits?
August 12th does not fit exactly, do you remember when you did the
checkout?  Or is it a snapshot kernel?


revision 1.246
date: 2022/08/09 21:10:03;  author: kn;  state: Exp;  lines: +10 -10;  
commitid: 7dnmtpMeiy7k6IOQ;
Backout "Call getuptime() just once per function"

This caused stuck ndp cache entries as found by naddy, sorry.

revision 1.244
date: 2022/08/08 15:56:35;  author: kn;  state: Exp;  lines: +10 -10;  
commitid: ILY0HdurUXzwu2qJ;
Call getuptime() just once per function

IPv6 pendant to bluhm's sys/netinet/if_ether.c r1.249:
Instead of calling getuptime() all the time in ARP code, do it only
once per function.  This gives a more consistent time value.
OK claudio@ miod@ mvs@

OK bluhm


> I am not sure if this is still relevant; please excuse the noise if
> this has already been found and fixed.

I am not aware of a fix in this area.  nd6 is not MP safe, so we
have a big kernel lock around it.  I have asked kn@ to look at nd6
locking.

The interaction between rtable SRP locking and MP access to routing
table leaves like nd6 is less than optimal.  I expect bugs there.

> kernel: protection fault trap, code=0
> Stopped at  rt_ifa_del+0x39:movb0x1be(%rax),%bl
> ddb{2}> bt
> rt_ifa_del(804e9400,800100,deaf0009deafbead,0) at rt_ifa_del+0x39
> in6_unlink_ifa(804e9400,800da2a8) at in6_unlink_ifa+0xae
> in6_purgeaddr(804e9400) at in6_purgeaddr+0x127
> nd6_expire(0) at nd6_expire+0x96
> taskq_thread(8002c080) at taskq_thread+0x100
> end trace frame: 0x0, count: -5



Re: Information leakage of IP-layer data on LAN

2022-08-22 Thread Alexander Bluhm
On Mon, Aug 22, 2022 at 08:56:32PM +0200, Peter J. Philipp wrote:
> On Mon, Aug 22, 2022 at 08:15:13PM +0200, Alexander Bluhm wrote:
> > Note that sending an error reply to packets that cannot be processed
> > is not uncommon and sometimes required to make the network behave
> > smoothly.
> 
> Ahh ok, thanks!  So I guess this is configuration mistake on my part.

It is debatable who is doing something wrong.

Due to the correct mac address, your packets are accepted and processed
by the IP layer of the kernel.

The network stack first checks whether the IP address is locally
configured.  As you have chosen a random IP address, it is not
local.

If IP forwarding is disabled (the default), such a packet is silently
dropped.  That happens if you disable pf.

If IP forwarding is enabled, you would see an ICMP redirect as your
fake IP is in the same segment.  This ICMP has the IP of the router,
that is intentional.  IP addresses of a router are not secret.

pf does not know if an IP address is local.  It just processes
rules.  As you send "echo reply" packets, they cannot create states.
So the "block return" rule matches.  pf does what it is told to do.

Why is there a "block return" in default pf.conf?  Returning error
messages improves network responsiveness.  If anything goes wrong
or is configured incorrectly, you immediately get a response.
Otherwise you run into timeouts or things fail silently.  Debugging
would be hell.
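[Editor's note: the three outcomes described above can be condensed into a toy decision function. This is illustrative only, not the real OpenBSD input path; in the report, pf's "block return" rule answered before the forwarding decision could silently drop the packet.]

```c
#include <assert.h>

/* Possible fates of a packet whose destination IP is not local. */
enum verdict { DROP_SILENT, ICMP_REDIRECT, PF_RETURN_ERROR };

static enum verdict
classify_nonlocal(int pf_block_return, int ip_forwarding)
{
	if (pf_block_return)
		return PF_RETURN_ERROR;	/* pf generates the unreachable */
	if (ip_forwarding)
		return ICMP_REDIRECT;	/* destination in the same segment */
	return DROP_SILENT;		/* the default: drop, say nothing */
}
```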

If you don't like that, you can change pf.conf.  I don't recommend
that.  If your network security is based on the fact that nobody
knows your IP, you have different problems.  If an attacker wants
to detect your IP, he will.  Having a silent network device creates
more obscurity than security.  I don't need that kind of privacy.

bluhm



Re: Information leakage of IP-layer data on LAN

2022-08-22 Thread Alexander Bluhm
On Mon, Aug 22, 2022 at 06:04:17PM +0200, p...@delphinusdns.org wrote:
> >Synopsis:IP Information leakage using MAC address
> >Category:system
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1 (GENERIC.MP) #3: Sun May 15 10:27:01 MDT 2022
>
> r...@syspatch-71-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> Pretend you're on a large Wifi network at a conference and someone wants to
> know your IP address but they only have your MAC address.  Here is a quick
> dirty way of finding your IP address:
> 
> ./cb -ddc:a6:32:cc:db:a7,192.168.173.254 -I0.0 
> -PaA
> 
> The program is an ethernet spoofer and allows addressing the MAC address via a
> bpf device.  -I0.0 is icmp type 0 code 0 (echo reply) and -P is just any 
> random
> payload.  (old versions of this program shouldn't work as well).
> 
> leakage of primary IP address:
> 
> 
> 17:16:33.525822 b8:ae:ed:73:a7:6c dc:a6:32:cc:db:a7 0800 112: 192.168.177.13 
> > 192.168.177.254: icmp: echo reply (id: seq:0) [icmp cksum ok] (ttl 64, 
> id 16785, len 98)
> 17:16:33.525887 dc:a6:32:cc:db:a7 b8:ae:ed:73:a7:6c 0800 70: 192.168.177.14 > 
> 192.168.177.13: icmp: 192.168.177.254 protocol 1 port 62483 unreachable [icmp 
> cksum ok] (ttl 255, id 29020, len 56)

Do you have a "block return" in your pf.conf?
Does it work differently if you disable pf with pfctl -d?
How does your pf.conf filter such packets?

Note that sending an error reply to packets that cannot be processed
is not uncommon and sometimes required to make the network behave
smoothly.

bluhm

> 
> Also these two work (different subnet):
> 
> 17:29:51.728721 b8:ae:ed:73:a7:6c dc:a6:32:cc:db:a7 0800 112: 192.168.177.13 
> > 192.168.173.254: icmp: echo reply (id: seq:0) [icmp cksum ok] (ttl 64, 
> id 7720, len 98)
> 17:29:51.728780 dc:a6:32:cc:db:a7 b8:ae:ed:73:a7:6c 0800 70: 192.168.177.14 > 
> 192.168.177.13: icmp: 192.168.173.254 protocol 1 port 62483 unreachable [icmp 
> cksum ok] (ttl 255, id 49018, len 56)
> 
> 
> and this one (notice destination IP is 4.3.2.1):
> 
> 17:30:35.448440 b8:ae:ed:73:a7:6c dc:a6:32:cc:db:a7 0800 112: 192.168.177.13 
> > 4.3.2.1: icmp: echo reply (id: seq:0) [icmp cksum ok] (ttl 64, id 
> 52302, len 98)
> 17:30:35.448513 dc:a6:32:cc:db:a7 b8:ae:ed:73:a7:6c 0800 70: 192.168.177.14 > 
> 192.168.177.13: icmp: 4.3.2.1 protocol 1 port 62483 unreachable [icmp cksum 
> ok] (ttl 255, id 16147, len 56)
> 
> Here is the /etc/sysctl.conf file of 192.168.177.14:
> 
> root@host# more /etc/sysctl.conf
> /etc/sysctl.conf: No such file or directory
> 
> So there is no forwarding going on.  How can this information leakage be 
> stopped?
> >How-To-Repeat:
>   network ethernet spoofers can do this with tcpdump.
> >Fix:
>   None provided, this is difficult.
> 
> 
> dmesg:
> removed.



Re: 7.2-beta crash with bridge Interface

2022-08-06 Thread Alexander Bluhm
On Sat, Aug 06, 2022 at 11:33:46PM +, mgra...@brainfat.net wrote:
> after creating a bridge interface, running an ifconfig command will 
> crash the system.

> panic: netlock: lock not held
> rw_exit_write(822af5e8) at rw_exit_write+0xae
> bridge_ioctl(805e4000,c0406938,8e663820) at bridge_ioctl+0x42
> ifioctl(fd801803b5a8,c0406938,8e663820,8e669268) at 
> ifioctl

SIOCGIFMEDIA is 0xc0406938.  This is fallout of my netlock removal
from media ioctl.  The bridge does not support media parameter, so
just skip it.

Does this diff fix it?

bluhm

Index: net/if_bridge.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.363
diff -u -p -r1.363 if_bridge.c
--- net/if_bridge.c 4 Jan 2022 06:32:39 -   1.363
+++ net/if_bridge.c 6 Aug 2022 23:44:31 -
@@ -262,8 +262,13 @@ bridge_ioctl(struct ifnet *ifp, u_long c
/*
 * bridge(4) data structure aren't protected by the NET_LOCK().
 * Idealy it shouldn't be taken before calling `ifp->if_ioctl'
-* but we aren't there yet.
+* but we aren't there yet.  Media ioctl run without netlock.
 */
+   switch (cmd) {
+   case SIOCSIFMEDIA:
+   case SIOCGIFMEDIA:
+   return (ENOTTY);
+   }
NET_UNLOCK();
 
switch (cmd) {



Re: macppc panic: vref used where vget required

2022-06-01 Thread Alexander Bluhm
On Tue, May 31, 2022 at 04:40:32PM +0200, Martin Pieuchot wrote:
> Any of you got the chance to try this diff?  Could you reproduce the
> panic with it?

With this diff I could build a release.  It worked twice.

Usually it crashes after a day.  This time it finished release after
30 hours.

bluhm

> > Index: uvm/uvm_vnode.c
> > ===
> > RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v
> > retrieving revision 1.124
> > diff -u -p -r1.124 uvm_vnode.c
> > --- uvm/uvm_vnode.c 3 May 2022 21:20:35 -   1.124
> > +++ uvm/uvm_vnode.c 20 May 2022 09:04:08 -
> > @@ -162,12 +162,9 @@ uvn_attach(struct vnode *vp, vm_prot_t a
> >  * add it to the writeable list, and then return.
> >  */
> > if (uvn->u_flags & UVM_VNODE_VALID) {   /* already active? */
> > +   KASSERT(uvn->u_obj.uo_refs > 0);
> >  
> > rw_enter(uvn->u_obj.vmobjlock, RW_WRITE);
> > -   /* regain vref if we were persisting */
> > -   if (uvn->u_obj.uo_refs == 0) {
> > -   vref(vp);
> > -   }
> > uvn->u_obj.uo_refs++;   /* bump uvn ref! */
> > rw_exit(uvn->u_obj.vmobjlock);
> >  
> > @@ -234,7 +231,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a
> > KASSERT(uvn->u_obj.uo_refs == 0);
> > uvn->u_obj.uo_refs++;
> > oldflags = uvn->u_flags;
> > -   uvn->u_flags = UVM_VNODE_VALID|UVM_VNODE_CANPERSIST;
> > +   uvn->u_flags = UVM_VNODE_VALID;
> > uvn->u_nio = 0;
> > uvn->u_size = used_vnode_size;
> >  
> > @@ -247,7 +244,7 @@ uvn_attach(struct vnode *vp, vm_prot_t a
> > /*
> >  * add a reference to the vnode.   this reference will stay as long
> >  * as there is a valid mapping of the vnode.   dropped when the
> > -* reference count goes to zero [and we either free or persist].
> > +* reference count goes to zero.
> >  */
> > vref(vp);
> > if (oldflags & UVM_VNODE_WANTED)
> > @@ -320,16 +317,6 @@ uvn_detach(struct uvm_object *uobj)
> >  */
> > vp->v_flag &= ~VTEXT;
> >  
> > -   /*
> > -* we just dropped the last reference to the uvn.   see if we can
> > -* let it "stick around".
> > -*/
> > -   if (uvn->u_flags & UVM_VNODE_CANPERSIST) {
> > -   /* won't block */
> > -   uvn_flush(uobj, 0, 0, PGO_DEACTIVATE|PGO_ALLPAGES);
> > -   goto out;
> > -   }
> > -
> > /* its a goner! */
> > uvn->u_flags |= UVM_VNODE_DYING;
> >  
> > @@ -379,7 +366,6 @@ uvn_detach(struct uvm_object *uobj)
> > /* wake up any sleepers */
> > if (oldflags & UVM_VNODE_WANTED)
> > wakeup(uvn);
> > -out:
> > rw_exit(uobj->vmobjlock);
> >  
> > /* drop our reference to the vnode. */
> > @@ -495,8 +481,8 @@ uvm_vnp_terminate(struct vnode *vp)
> > }
> >  
> > /*
> > -* done.   now we free the uvn if its reference count is zero
> > -* (true if we are zapping a persisting uvn).   however, if we are
> > +* done.   now we free the uvn if its reference count is zero.
> > +* however, if we are
> >  * terminating a uvn with active mappings we let it live ... future
> >  * calls down to the vnode layer will fail.
> >  */
> > @@ -504,14 +490,14 @@ uvm_vnp_terminate(struct vnode *vp)
> > if (uvn->u_obj.uo_refs) {
> > /*
> >  * uvn must live on it is dead-vnode state until all references
> > -* are gone.   restore flags.clear CANPERSIST state.
> > +* are gone.   restore flags.
> >  */
> > uvn->u_flags &= ~(UVM_VNODE_DYING|UVM_VNODE_VNISLOCKED|
> > - UVM_VNODE_WANTED|UVM_VNODE_CANPERSIST);
> > + UVM_VNODE_WANTED);
> > } else {
> > /*
> >  * free the uvn now.   note that the vref reference is already
> > -* gone [it is dropped when we enter the persist state].
> > +* gone.
> >  */
> > if (uvn->u_flags & UVM_VNODE_IOSYNCWANTED)
> > panic("uvm_vnp_terminate: io sync wanted bit set");
> > @@ -1350,46 +1336,14 @@ uvm_vnp_uncache(struct vnode *vp)
> > }
> >  
> > /*
> > -* we have a valid, non-blocked uvn.   clear persist flag.
> > +* we have a valid, non-blocked uvn.
> >  * if uvn is currently active we can return now.
> >  */
> > -   uvn->u_flags &= ~UVM_VNODE_CANPERSIST;
> > if (uvn->u_obj.uo_refs) {
> > rw_exit(uobj->vmobjlock);
> > return FALSE;
> > }
> >  
> > -   /*
> > -* uvn is currently persisting!   we have to gain a reference to
> > -* it so that we can call uvn_detach to kill the uvn.
> > -*/
> > -   vref(vp);   /* seems ok, even with VOP_LOCK */
> > -   uvn->u_obj.uo_refs++;   /* value is now 1 */
> > -   rw_exit(uobj->vmobjlock);
> > -
> > -#ifdef VFSLCKDEBUG
> > -   /*
> > -* carry over sanity check from old vnode pager: the vnode should
> > -* be VOP_LOCK'd, and 

Re: macppc panic: vref used where vget required

2022-05-19 Thread Alexander Bluhm
On Tue, May 17, 2022 at 05:43:02PM +0200, Martin Pieuchot wrote:
> Andrew, Alexander, could you test this and report back?

Panic "vref used where vget required" is still there.  As usual it
needs a day to reproduce.  This time I was running without the vref
history diff.

bluhm

> Index: kern/vfs_subr.c
> ===
> RCS file: /cvs/src/sys/kern/vfs_subr.c,v
> retrieving revision 1.315
> diff -u -p -r1.315 vfs_subr.c
> --- kern/vfs_subr.c   27 Mar 2022 16:19:39 -  1.315
> +++ kern/vfs_subr.c   17 May 2022 15:28:30 -
> @@ -459,6 +459,10 @@ getnewvnode(enum vtagtype tag, struct mo
>   vp->v_flag = 0;
>   vp->v_socket = NULL;
>   }
> + /*
> +  * Clean out any VM data associated with the vnode.
> +  */
> + uvm_vnp_terminate(vp);
>   cache_purge(vp);
>   vp->v_type = VNON;
>   vp->v_tag = tag;

[-- MARK -- Wed May 18 23:45:00 2022]
panic: vref used where vget required
Stopped at  db_enter+0x24:  lwz r11,12(r1)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 264957  33761 21 0x2  00  c++
*197367  60221  0 0x14000  0x2001K pagedaemon
db_enter() at db_enter+0x20
panic(92218b) at panic+0x158
vref(1e791340) at vref+0xac
uvm_vnp_uncache(e7ec3c50) at uvm_vnp_uncache+0x88
ffs_write(a8b89d11) at ffs_write+0x3b0
VOP_WRITE(1e791340,e7ec3c50,40,1ff3f00) at VOP_WRITE+0x48
uvn_io(a8b89d11,b12744,aa4808,,e401) at uvn_io+0x264
uvn_put(5c9adf98,e7ec3dd4,1f0b8498,4ef8cb0) at uvn_put+0x64
uvm_pager_put(0,0,e7ec3d70,6810fc,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x134
uvm_pageout(a8a37d41) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: 2
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> x/s version
version:OpenBSD 7.1-current (GENERIC.MP) #0: Tue May 17 22:51:12 CEST 
2022\012
r...@ot26.obsd-lab.genua.de:/usr/src/sys/arch/macppc/compile/GENERIC.MP\012

ddb{1}> show panic
*cpu1: vref used where vget required

ddb{1}> trace
db_enter() at db_enter+0x20
panic(92218b) at panic+0x158
vref(1e791340) at vref+0xac
uvm_vnp_uncache(e7ec3c50) at uvm_vnp_uncache+0x88
ffs_write(a8b89d11) at ffs_write+0x3b0
VOP_WRITE(1e791340,e7ec3c50,40,1ff3f00) at VOP_WRITE+0x48
uvn_io(a8b89d11,b12744,aa4808,,e401) at uvn_io+0x264
uvn_put(5c9adf98,e7ec3dd4,1f0b8498,4ef8cb0) at uvn_put+0x64
uvm_pager_put(0,0,e7ec3d70,6810fc,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x134
uvm_pageout(a8a37d41) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: -13

ddb{1}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 33761  264957  92741 21  7 0x2c++
 92741  312390  20513 21  30x10008a  sigsusp   sh
 7   85241  73592 21  2 0x2c++
 73592  395247  20513 21  30x10008a  sigsusp   sh
 20513  387613  86722 21  30x10008a  sigsusp   make
 86722  351373  19613 21  30x10008a  sigsusp   sh
 19613   73018  69411 21  30x10008a  sigsusp   make
 69411  381457   9957 21  30x10008a  sigsusp   sh
  9957   35595   9308 21  30x10008a  sigsusp   make
  9308  439208  50196 21  30x10008a  sigsusp   sh
 50196  261987  21533 21  30x10008a  sigsusp   make
 21533  134353  96184 21  30x10008a  sigsusp   sh
 96184  127071  50665 21  30x10008a  sigsusp   make
 50665  140826  82928  0  30x10008a  sigsusp   sh
 82928  416743  39291  0  30x10008a  sigsusp   make
 39291  450619  53019  0  30x10008a  sigsusp   make
 53019  342216   5152  0  30x10008a  sigsusp   sh
  5152  246225  75604  0  30x82  piperdperl
 75604  321372  45770  0  30x10008a  sigsusp   ksh
 45770  382402  53145  0  30x9a  kqreadsshd
 19061  218044  1  0  30x100083  ttyin getty
 77831  153734  1  0  30x100098  kqreadcron
 97917  207119  1 99  3   0x1100090  kqreadsndiod
 72318  229709  1110  30x100090  kqreadsndiod
 99951  369454  60369 95  3   0x1100092  kqreadsmtpd
 26631  442966  60369103  3   0x1100092  kqreadsmtpd
 29720  146017  60369 95  3   0x1100092  kqreadsmtpd
 80957  314403  60369 95  30x100092  kqreadsmtpd
 17022  361258  60369 95  3   0x1100092  kqreadsmtpd
  7863  144172  60369 95  3   0x1100092  kqreadsmtpd
 60369   51354  1  0  30x100080  kqreadsmtpd
 28817  507320  1  0  3

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-18 Thread Alexander Bluhm
On Mon, May 16, 2022 at 05:06:28PM +0200, Claudio Jeker wrote:
> > In veb configuration we are holding the netlock and sleep in
> > smr_barrier() and refcnt_finalize().  An additional sleep in malloc()
> > is fine here.
> 
> Are you sure about this? smr_barrier() on busy systems with many cpus can
> take 100ms and more. Isn't all processing stopped during that time (IIRC
> the code uses the write exclusive netlock).

Interface ioctls are called with netlock.  veb_ioctl() calls
veb_add_port() and veb_del_port() which call smr_barrier().

I did not notice 100ms network sleep during ifconfig.  I would not
consider it an urgent problem.  If you optimize for one thing, you
pay the price somewhere else.

> Sleeping with a global rwlock held is not a good design.

And here we are back to the net lock design discussion we had with
mpi@ years ago.  More important is progress.  We should not spend
too much time on design discussion.  Better solve problems.

My plan is, in more or less this order:
- use exclusive netlock for configuration
- use shared netlock for parts that are MP safe
- kernel lock together with net lock allows quick fixes
  to make progress without making everything MP perfect
- replace kernel lock with mutex
- use something smarter than mutex to make things fast

Others have other plans, so we have different maturity of MP support
in the kernel.

bluhm



Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Alexander Bluhm
On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> at this point we hold a NET_LOCK(). So basically if there won't
> be enough memory we might start sleeping waiting for memory
> while we will be holding a NET_LOCK.
> 
> This is something we should try to avoid, however this can be
> sorted out later. At this point I just want to point out
> this problem, which can be certainly solved in follow up
> commit. pf(4) also has its homework to be done around
> sleeping mallocs.

I think sleeping with netlock is not a problem in general.

In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
or copyout().  This results in a lock order reversal due to a hack
in uvn_io().  In my opinion we should not sleep within pf lock, so
we can convert it to mutex or something better later.

In veb configuration we are holding the netlock and sleep in
smr_barrier() and refcnt_finalize().  An additional sleep in malloc()
is fine here.

Holding the netlock and sleeping in m_get() is worse.  There is no
recovery after reaching the mbuf limit.  Sleeping rules are
inconsistent and depend on the area of the stack.  Different people
have multiple ideas how it should be done.

bluhm



Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-13 Thread Alexander Bluhm
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?

Thanks for having a look.  I added veb(4) to my setup.  With this
diff, I cannot trigger a crash anymore.

OK bluhm@

> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
> 
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
> 
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
> 
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.
> 
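[Editor's note: the scheme described above — one refcount per map snapshot instead of a take/release pair per port — can be sketched in single-threaded C with the SMR critical section left out. Names are invented for illustration; the kernel version uses struct refcnt plus SMR.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct port { int p_id; };

struct port_map {
	unsigned int m_refs;	/* the kernel uses struct refcnt + SMR */
	unsigned int m_count;
	struct port *m_ports[];	/* flexible array, like veb_ports_array() */
};

static struct port_map *
map_create(struct port **ports, unsigned int n)
{
	struct port_map *m;

	m = malloc(sizeof(*m) + n * sizeof(ports[0]));
	if (m == NULL)
		abort();
	m->m_refs = 1;
	m->m_count = n;
	memcpy(m->m_ports, ports, n * sizeof(ports[0]));
	return m;
}

static void
map_take(struct port_map *m)
{
	m->m_refs++;		/* one ref covers every port in the map */
}

/* Returns 1 when the last reference dropped and the map was freed. */
static int
map_rele(struct port_map *m)
{
	if (--m->m_refs > 0)
		return 0;
	free(m);
	return 1;
}

/* Forwarding-path style walk: one take, visit all ports, one rele. */
static int
map_sum_ids(struct port_map *m)
{
	unsigned int i;
	int sum = 0;

	map_take(m);
	for (i = 0; i < m->m_count; i++)
		sum += m->m_ports[i]->p_id;
	map_rele(m);
	return sum;
}
```

Replacing the whole map on every configuration change keeps readers lock-free; writers pay with malloc/free and a barrier, which is the trade-off discussed in the thread.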
> Index: if_veb.c
> ===
> RCS file: /cvs/src/sys/net/if_veb.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 if_veb.c
> --- if_veb.c  4 Jan 2022 06:32:39 -   1.25
> +++ if_veb.c  13 May 2022 02:01:43 -
> @@ -139,13 +139,13 @@ struct veb_port {
>   struct veb_rule_list p_vr_list[2];
>  #define VEB_RULE_LIST_OUT0
>  #define VEB_RULE_LIST_IN 1
> -
> - SMR_TAILQ_ENTRY(veb_port) p_entry;
>  };
>  
>  struct veb_ports {
> - SMR_TAILQ_HEAD(, veb_port)   l_list;
> - unsigned int l_count;
> + struct refcnt m_refs;
> + unsigned int m_count;
> +
> + /* followed by an array of veb_port pointers */
>  };
>  
>  struct veb_softc {
> @@ -155,8 +155,8 @@ struct veb_softc {
>   struct etherbridge   sc_eb;
>  
>   struct rwlock sc_rule_lock;
> - struct veb_ports sc_ports;
> - struct veb_ports sc_spans;
> + struct veb_ports *sc_ports;
> + struct veb_ports *sc_spans;
>  };
>  
>  #define DPRINTF(_sc, fmt...) do { \
> @@ -184,8 +184,25 @@ static int   veb_p_ioctl(struct ifnet *, u
>  static int   veb_p_output(struct ifnet *, struct mbuf *,
>   struct sockaddr *, struct rtentry *);
>  
> -static void  veb_p_dtor(struct veb_softc *, struct veb_port *,
> - const char *);
> +static inline size_t
> +veb_ports_size(unsigned int n)
> +{
> + /* use of _ALIGN is inspired by CMSGs */
> + return _ALIGN(sizeof(struct veb_ports)) +
> + n * sizeof(struct veb_port *);
> +}
> +
> +static inline struct veb_port **
> +veb_ports_array(struct veb_ports *m)
> +{
> + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m)));
> +}
> +
> +static void  veb_ports_free(struct veb_ports *);
> +
> +static void  veb_p_unlink(struct veb_softc *, struct veb_port *);
> +static void  veb_p_fini(struct veb_port *);
> +static void  veb_p_dtor(struct veb_softc *, struct veb_port *);
>  static int   veb_add_port(struct veb_softc *,
>   const struct ifbreq *, unsigned int);
>  static int   veb_del_port(struct veb_softc *,
> @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
>   return (ENOMEM);
>  
>   rw_init(&sc->sc_rule_lock, "vebrlk");
> - SMR_TAILQ_INIT(&sc->sc_ports.l_list);
> - SMR_TAILQ_INIT(&sc->sc_spans.l_list);
> + sc->sc_ports = NULL;
> + sc->sc_spans = NULL;
>  
>   ifp = &sc->sc_if;
>  
> @@ -314,7 +331,10 @@ static int
>  veb_clone_destroy(struct ifnet *ifp)
>  {
>   struct veb_softc *sc = ifp->if_softc;
> - struct veb_port *p, *np;
> + struct veb_ports *mp, *ms;
> + struct veb_port **ps;
> + struct veb_port *p;
> + unsigned int i;
>  
>   NET_LOCK();
>   sc->sc_dead = 1;
> @@ -326,10 +346,60 @@ veb_clone_destroy(struct ifnet *ifp)
>   if_detach(ifp);
>  
>   NET_LOCK();
> - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_ports.l_list, p_entry, np)
> - veb_p_dtor(sc, p, "destroy");
> - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, &sc->sc_spans.l_list, p_entry, np)
> - veb_p_dtor(sc, p, "destroy");
> +
> + /*
> +  * this is an upside down version of veb_p_dtor() and
> +  * veb_ports_destroy() to avoid a lot of malloc/free and
> +  * smr_barrier calls if we remove ports one by one.
> +  */
> +
> + mp = SMR_PTR_GET_LOCKED(&sc->sc_ports);
> + SMR_PTR_SET_LOCKED(&sc->sc_ports, NULL);
> + if (mp != NULL) {
> + ps = veb_ports_array(mp);
> + for (i = 0; i < mp->m_count; i++) {
> + veb_p_unlink(sc, ps[i]);
> + }
> + }
> +

Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-10 Thread Alexander Bluhm
On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote:
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
> > Can some veb or smr hacker explain how this is supposed to work?
> > 
> > Sleeping in pf is also not ideal as it is in the hot path and slows
> > down packets.  But that is not easy to fix as we have to refactor
> > the memory allocations before converting pf lock to a mutex.  sashan@
> > is working on that.
> 
> 
> Hi,
> 
> isn't that similar or same panic that was talked about in "parallel
> forwarding vs. bridges" mail thread on tech@ started by sashan@
> 
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

Yes.  It is similar.  

I have read the whole mail thread and the final fix got committed.
But it looks incomplete: pf is still sleeping.

Hrvoje, can you run the tests again that triggered the panics a
year ago?

Sasha, I still think the way to go is mutex for pf locks.  I don't
see a performance impact.  Without it, we can run network only on
a single CPU anyway.  And with sleeping locks you have to schedule
in the hot packet path.  Our scheduler was never built for that.

At genua we started with mutex, made it fine grained, and converted
to rcu later.

bluhm



Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-09 Thread Alexander Bluhm
On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> Also I've used 7.1 for a limited time and there were no crash.
> After OpenBSD' NET_TASKQ upgrade to 4 it crashed after 5 days.

For me this looks like a bug in veb(4).

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at 
> __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at 
> ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600)
>  at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at 
> vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

veb_port_input -> veb_broadcast -> smr_read_enter; tp->p_enqueue
-> vport_if_enqueue -> if_vinput -> ifp->if_input -> ether_input ->
ipv4_input -> ip_input_if -> pf_test -> PF_LOCK -> rw_enter_write()

After calling smr_read_enter() sleeping is not allowed, according
to the man page.  pf sleeps because it uses a read-write lock.  It
looks like we have some contention on the pf lock.  With more
forwarding threads, sleeping in pf is more likely.

> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at 
> ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at 
> ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

Can some veb or smr hacker explain how this is supposed to work?

Sleeping in pf is also not ideal as it is in the hot path and slows
down packets.  But that is not easy to fix as we have to refactor
the memory allocations before converting pf lock to a mutex.  sashan@
is working on that.

bluhm



Re: macppc panic: vref used where vget required

2022-05-06 Thread Alexander Bluhm
Same with this diff.

On Wed, May 04, 2022 at 05:58:14PM +0200, Martin Pieuchot wrote:
> Index: nfs/nfs_serv.c
> ===
> RCS file: /cvs/src/sys/nfs/nfs_serv.c,v
> retrieving revision 1.120
> diff -u -p -r1.120 nfs_serv.c
> --- nfs/nfs_serv.c  11 Mar 2021 13:31:35 -  1.120
> +++ nfs/nfs_serv.c  4 May 2022 15:29:06 -
> @@ -1488,6 +1488,9 @@ nfsrv_rename(struct nfsrv_descript *nfsd
>   error = -1;
>  out:
>   if (!error) {
> + if (tvp) {
> + (void)uvm_vnp_uncache(tvp);
> + }
>   error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
>  tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
>   } else {
> Index: ufs/ffs/ffs_inode.c
> ===
> RCS file: /cvs/src/sys/ufs/ffs/ffs_inode.c,v
> retrieving revision 1.81
> diff -u -p -r1.81 ffs_inode.c
> --- ufs/ffs/ffs_inode.c   12 Dec 2021 09:14:59 -  1.81
> +++ ufs/ffs/ffs_inode.c   4 May 2022 15:32:15 -
> @@ -172,11 +172,12 @@ ffs_truncate(struct inode *oip, off_t le
>   if (length > fs->fs_maxfilesize)
>   return (EFBIG);
>  
> - uvm_vnp_setsize(ovp, length);
>   oip->i_ci.ci_lasta = oip->i_ci.ci_clen 
>   = oip->i_ci.ci_cstart = oip->i_ci.ci_lastw = 0;
>  
>   if (DOINGSOFTDEP(ovp)) {
> + uvm_vnp_setsize(ovp, length);
> + (void) uvm_vnp_uncache(ovp);
>   if (length > 0 || softdep_slowdown(ovp)) {
>   /*
>* If a file is only partially truncated, then

http://bluhm.genua.de/release/results/2022-05-05T13%3A16%3A25Z/bsdcons-ot26.txt

[-- MARK -- Fri May  6 17:45:00 2022]
uvn_io: start: 0x1687cc08, type VREG, use 0, write 0, hold 0, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 131483, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 6385624
==> vnode_history_print 0x1687cc08, next=7
 [0] c++[58214] vput, usecount 2>1

 [1] c++[67078] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [2] c++[67078] vget, usecount 2>3
#0  mtx_enter_try+0x5c
 [3] c++[67078] vrele, usecount 3>2

 [4] c++[67078] vput, usecount 2>1

 [5] reaper[33068] uvn_detach (UVM_VNODE_CANPERSIST), usecount 1>1
#0  0xfffc
#1  rv6xx_asic+0x4
#2  uvn_detach+0x13c
#3  uvm_unmap_detach+0x1a4
#4  uvm_map_teardown+0x184
#5  uvmspace_free+0x60
#6  uvm_exit+0x30
#7  reaper+0x138
#8  fork_trampoline+0x14
 [6] reaper[33068] vrele, usecount 1>0

 [7>] c++[58214] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [8] c++[58214] vrele, usecount 2>1

 [9] c++[58214] vref, usecount 1>2

vn_lock: v_usecount == 0: 0x1687cc08, type VREG, use 0, write 0, hold 0, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 131483, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 6385624
==> vnode_history_print 0x1687cc08, next=7
 [0] c++[58214] vput, usecount 2>1

 [1] c++[67078] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [2] c++[67078] vget, usecount 2>3
#0  mtx_enter_try+0x5c
 [3] c++[67078] vrele, usecount 3>2

 [4] c++[67078] vput, usecount 2>1

 [5] reaper[33068] uvn_detach (UVM_VNODE_CANPERSIST), usecount 1>1
#0  0xfffc
#1  rv6xx_asic+0x4
#2  uvn_detach+0x13c
#3  uvm_unmap_detach+0x1a4
#4  uvm_map_teardown+0x184
#5  uvmspace_free+0x60
#6  uvm_exit+0x30
#7  reaper+0x138
#8  fork_trampoline+0x14
 [6] reaper[33068] vrele, usecount 1>0

 [7>] c++[58214] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [8] c++[58214] vrele, usecount 2>1

 [9] c++[58214] vref, usecount 1>2

panic: vn_lock: v_usecount == 0
Stopped at  db_enter+0x24:  lwz r11,12(r1)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 308217  72745 21 0x2  00  c++
*260882  89598  0 0x14000  0x2001K pagedaemon
db_enter() at db_enter+0x20
panic(946dcf) at panic+0x158
vn_lock(7c564249,1b5000) at vn_lock+0x1c4
uvn_io(634fe38c,aa6c04,adc4c0,,e401) at uvn_io+0x254
uvn_put(830225fc,e7ec7dd4,16881d58,4ec0400) at uvn_put+0x64
uvm_pager_put(0,0,e7ec7d70,184f78,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x134
uvm_pageout(6393b378) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> x/s version
version:OpenBSD 7.1-current (GENERIC.MP) #0: Thu May  5 15:42:47 CEST 
2022\012
r...@ot26.obsd-lab.genua.de:/usr/src/sys/arch/macppc/compile/GENERIC.MP\012

ddb{1}> show panic
*cpu1: vn_lock: v_usecount == 0

ddb{1}> trace
db_enter() at db_enter+0x20
panic(946dcf) at panic+0x158
vn_lock(7c564249,1b5000) at vn_lock+0x1c4
uvn_io(634fe38c,aa6c04,adc4c0,,e401) at uvn_io+0x254

Re: macppc panic: vref used where vget required

2022-05-04 Thread Alexander Bluhm
On Wed, May 04, 2022 at 05:58:14PM +0200, Martin Pieuchot wrote:
> I don't understand the mechanism around UVM_VNODE_CANPERSIST.  I looked
> for missing uvm_vnp_uncache() and found the following two.  I doubt
> those are the one triggering the bug because they are in NFS & softdep.

It crashes while compiling clang.

c++ -O2 -pipe  -fno-ret-protector -std=c++14 -fvisibility-inlines-hidden 
-fno-exceptions -fno-rtti -Wall -W -Wno-unused-parameter -Wwrite-strings 
-Wcast-qual  -Wno-missing-field-initializers -pedantic -Wno-long-long  
-Wdelete-non-virtual-dtor -Wno-comment -fPIE  -MD -MP  
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../../../llvm/llvm/include
 -I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../include 
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/obj  
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/obj/../include 
-DNDEBUG -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS  -D__STDC_FORMAT_MACROS 
-DLLVM_PREFIX="/usr" 
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../../../llvm/lldb/include
  
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../../../llvm/lldb/source
 
-I/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../../../llvm/clang/include
 -c 
/usr/src/gnu/usr.bin/clang/liblldbPluginExpressionParser/../../../llvm/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionParser.cpp
 -o ClangExpressionParser.o
Timeout, server ot26 not responding.

No softdep, but NFS client.  I use it to mount cvs mirror read-only.
This file system should not be used during make build.

/dev/wd0a on / type ffs (local)
/dev/wd0l on /home type ffs (local, nodev, nosuid)
/dev/wd0d on /tmp type ffs (local, nodev)
/dev/wd0f on /usr type ffs (local, nodev)
/dev/wd0g on /usr/X11R6 type ffs (local, nodev)
/dev/wd0h on /usr/local type ffs (local, nodev, wxallowed)
/dev/wd0k on /usr/obj type ffs (local, nodev, nosuid, wxallowed)
/dev/wd0j on /usr/src type ffs (local, nodev, nosuid)
/dev/wd0e on /var type ffs (local, nodev, nosuid)
regressmaster:/data/mirror/openbsd/cvs on /mount/openbsd/cvs type nfs (nodev, 
nosuid, read-only, v3, udp, intr, wsize=32768, rsize=32768, rdirsize=32768, 
timeo=100, retrans=101)
regressmaster:/data/mirror/openbsd/ftp on /mount/openbsd/ftp type nfs (nodev, 
nosuid, read-only, v3, udp, intr, wsize=32768, rsize=32768, rdirsize=32768, 
timeo=100, retrans=101)
regressmaster:/data/mirror/openbsd/distfiles on /mount/openbsd/distfiles type 
nfs (nodev, nosuid, read-only, v3, udp, intr, wsize=32768, rsize=32768, 
rdirsize=32768, timeo=100, retrans=101)

Should I test this diff?

> Index: nfs/nfs_serv.c
> ===
> RCS file: /cvs/src/sys/nfs/nfs_serv.c,v
> retrieving revision 1.120
> diff -u -p -r1.120 nfs_serv.c
> --- nfs/nfs_serv.c  11 Mar 2021 13:31:35 -  1.120
> +++ nfs/nfs_serv.c  4 May 2022 15:29:06 -
> @@ -1488,6 +1488,9 @@ nfsrv_rename(struct nfsrv_descript *nfsd
>   error = -1;
>  out:
>   if (!error) {
> + if (tvp) {
> + (void)uvm_vnp_uncache(tvp);
> + }
>   error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
>  tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
>   } else {
> Index: ufs/ffs/ffs_inode.c
> ===
> RCS file: /cvs/src/sys/ufs/ffs/ffs_inode.c,v
> retrieving revision 1.81
> diff -u -p -r1.81 ffs_inode.c
> --- ufs/ffs/ffs_inode.c   12 Dec 2021 09:14:59 -  1.81
> +++ ufs/ffs/ffs_inode.c   4 May 2022 15:32:15 -
> @@ -172,11 +172,12 @@ ffs_truncate(struct inode *oip, off_t le
>   if (length > fs->fs_maxfilesize)
>   return (EFBIG);
>  
> - uvm_vnp_setsize(ovp, length);
>   oip->i_ci.ci_lasta = oip->i_ci.ci_clen 
>   = oip->i_ci.ci_cstart = oip->i_ci.ci_lastw = 0;
>  
>   if (DOINGSOFTDEP(ovp)) {
> + uvm_vnp_setsize(ovp, length);
> + (void) uvm_vnp_uncache(ovp);
>   if (length > 0 || softdep_slowdown(ovp)) {
>   /*
>* If a file is only partially truncated, then



Re: macppc panic: vref used where vget required

2022-05-03 Thread Alexander Bluhm
On Mon, May 02, 2022 at 06:53:08AM +0200, Sebastien Marie wrote:
> New diff, with new iteration on vnode_history_*() functions. I added a label 
> in 
> the record function. I also changed when showing the stacktrace. powerpc has 
> poor backtrace support, but now it will at least print some infos even if no 
> stacktrace recorded (pid, ps_comm, label, v_usecount).
> 
> I added a vnode_history_record() entry inside vclean(). We should have it in 
> both case:
> - if vnode is already inactive (vp->v_usecount == 0) [trace already recorded]
> - if vnode is still active (vp->v_usecount != 0) [trace previously not 
> recorded]

http://bluhm.genua.de/release/results/2022-05-02T12%3A23%3A22Z/bsdcons-ot26.txt

[-- MARK -- Tue May  3 16:45:00 2022]
uvn_io: start: 0x23b5a9c0, type VREG, use 0, write 0, hold 0, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 365327, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x23b5a9c0, next=6
 [0] c++[83314] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [1] c++[83314] vget, usecount 2>3
#0  mtx_enter_try+0x5c
 [2] c++[83314] vrele, usecount 3>2

 [3] c++[83314] vput, usecount 2>1

 [4] reaper[690] uvn_detach (UVM_VNODE_CANPERSIST), usecount 1>1
#0  0xfffc
#1  umidi_quirklist+0x148
#2  uvn_detach+0x13c
#3  uvm_unmap_detach+0x1a4
#4  uvm_map_teardown+0x184
#5  uvmspace_free+0x60
#6  uvm_exit+0x30
#7  reaper+0x138
#8  fork_trampoline+0x14
 [5] reaper[690] vrele, usecount 1>0

 [6>] c++[73867] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [7] c++[73867] vrele, usecount 2>1

 [8] c++[73867] vref, usecount 1>2

 [9] c++[73867] vput, usecount 2>1

vn_lock: v_usecount == 0: 0x23b5a9c0, type VREG, use 0, write 0, hold 0, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 365327, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x23b5a9c0, next=6
 [0] c++[83314] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [1] c++[83314] vget, usecount 2>3
#0  mtx_enter_try+0x5c
 [2] c++[83314] vrele, usecount 3>2

 [3] c++[83314] vput, usecount 2>1

 [4] reaper[690] uvn_detach (UVM_VNODE_CANPERSIST), usecount 1>1
#0  0xfffc
#1  umidi_quirklist+0x148
#2  uvn_detach+0x13c
#3  uvm_unmap_detach+0x1a4
#4  uvm_map_teardown+0x184
#5  uvmspace_free+0x60
#6  uvm_exit+0x30
#7  reaper+0x138
#8  fork_trampoline+0x14
 [5] reaper[690] vrele, usecount 1>0

 [6>] c++[73867] vget, usecount 1>2
#0  mtx_enter_try+0x5c
 [7] c++[73867] vrele, usecount 2>1

 [8] c++[73867] vref, usecount 1>2
< PFLAGS  CPU  COMMAND
 339124  89786 21 0x2  01  c++
*246215  93511  0 0x14000  0x2000K pagedaemon
db_enter() at db_enter+0x20
panic(9466dc) at panic+0x158
vn_lock(7c564249,2b7000) at vn_lock+0x1c4
uvn_io(ae78a0,1,e7ebbc50,8650f8,e401) at uvn_io+0x254
uvn_put(e5274130,e7ebbdd4,23b952a0,4f1f9f0) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,267900,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x15c
uvm_pageout(19350f3) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{0}> x/s version
version:OpenBSD 7.1-current (GENERIC.MP) #0: Mon May  2 14:51:09 CEST 
2022\012
r...@ot26.obsd-lab.genua.de:/usr/src/sys/arch/macppc/compile/GENERIC.MP\012

ddb{0}> show panic
*cpu0: vn_lock: v_usecount == 0

ddb{0}> trace
db_enter() at db_enter+0x20
panic(9466dc) at panic+0x158
vn_lock(7c564249,2b7000) at vn_lock+0x1c4
uvn_io(ae78a0,1,e7ebbc50,8650f8,e401) at uvn_io+0x254
uvn_put(e5274130,e7ebbdd4,23b952a0,4f1f9f0) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,267900,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x15c
uvm_pageout(19350f3) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: -10

ddb{0}> show register
r0  0x5325c0panic+0x15c
r10xe7ebbb60
r2 0
r3  0xaf80e8cpu_info
r4  0xb0extract_entropy.extract_pool+0x1e24
r5   0x1
r6 0
r70xe7bb9000
r8 0
r9 0
r100
r11   0x7986a131
r120xa536dde_end+0x9a21fa6
r130
r14 0xaf4e90bcstats
r15 0xae0f7cuvmexp
r160
r170
r180
r19 0x2b7000vn_lock+0x130
r20   0x2000tlbdsmsize+0x1f18
r21   0x2000tlbdsmsize+0x1f18
r22 0xa887ecnetlock
r23   0xe401
r24   0x23b952b8
r25   0x23b952bc
r26 

Re: macppc panic: vref used where vget required

2022-05-01 Thread Alexander Bluhm
Still panics with the latest uvm fixes.

http://bluhm.genua.de/release/results/2022-04-30T21%3A55%3A03Z/bsdcons-ot26.txt

[-- MARK -- Sun May  1 14:40:00 2022]
uvn_io: start: 0x25452b58, type VREG, use 0, write 0, hold 834, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 469309, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x25452b58, next=6
 [4] reaper[59031] usecount 1>1
#0  spleen12x24_data+0x86c
 [5] reaper[59031] usecount 1>0
#0  splx+0x30
#1  0xfffc
#2  vrele+0x5c
#3  uvn_detach+0x160
#4  uvm_unmap_detach+0x1a4
#5  uvm_map_teardown+0x184
#6  uvmspace_free+0x60
#7  uvm_exit+0x30
#8  reaper+0x138
#9  fork_trampoline+0x14
vn_lock: v_usecount == 0: 0x25452b58, type VREG, use 0, write 0, hold 834, 
flags (VBIOONFREELIST)
tag VT_UFS, ino 469309, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x25452b58, next=6
 [4] reaper[59031] usecount 1>1
#0  spleen12x24_data+0x86c
 [5] reaper[59031] usecount 1>0
#0  splx+0x30
#1  0xfffc
#2  vrele+0x5c
#3  uvn_detach+0x160
#4  uvm_unmap_detach+0x1a4
#5  uvm_map_teardown+0x184
#6  uvmspace_free+0x60
#7  uvm_exit+0x30
#8  reaper+0x138
#9  fork_trampoline+0x14

panic: vn_lock: v_usecount == 0
Stopped at  db_enter+0x24:  lwz r11,12(r1)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 509301  70498 21 0x2  00  c++
*159995   7667  0 0x14000  0x2001K pagedaemon
db_enter() at db_enter+0x20
panic(94c446) at panic+0x158
vn_lock(49535400,657000) at vn_lock+0x1c4
uvn_io(5ef31c8a,a9c80c,b020bc,,e401) at uvn_io+0x254
uvn_put(a557fdd,e7ebbdd4,25099380,5dd7f60) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,7fde08,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x15c
uvm_pageout(5ed01eee) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> x/s version
version:OpenBSD 7.1-current (GENERIC.MP) #0: Sun May  1 00:23:00 CEST 
2022\012
r...@ot26.obsd-lab.genua.de:/usr/src/sys/arch/macppc/compile/GENERIC.MP\012

ddb{1}> show panic
*cpu1: vn_lock: v_usecount == 0

ddb{1}> trace
db_enter() at db_enter+0x20
panic(94c446) at panic+0x158
vn_lock(49535400,657000) at vn_lock+0x1c4
uvn_io(5ef31c8a,a9c80c,b020bc,,e401) at uvn_io+0x254
uvn_put(a557fdd,e7ebbdd4,25099380,5dd7f60) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,7fde08,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x15c
uvm_pageout(5ed01eee) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: -10

ddb{1}> show register
r0  0x827230panic+0x15c
r10xe7ebbb60
r2 0
r3  0xaa4140cpu_info+0x4c0
r4  0xabtimeout_wheel_kc+0x1aa8
r5   0x1
r6 0
r70xe7bb9000
r8 0
r9  0x92059fpppdumpm.digits
r10 0x14
r11   0xf9f449fe
r12   0xe9570a90
r130
r14 0xafc628bcstats
r15 0xb02098uvmexp
r160
r170
r180
r19 0x657000coredump+0x18c
r20   0x1000tlbdsmsize+0xf18
r21   0x1000tlbdsmsize+0xf18
r22 0xa8d76cnetlock
r23   0xe401
r24   0x25099398
r25   0x2509939c
r26 0x92dd83apollo_udma66_tim+0x16fc
r270
r280
r29 0xaa4400cpu_info+0x780
r30 0x94c446cy_pio_rec+0x14e8b
r31 0xb021c4uvmexp+0x12c
lr  0x1837b0db_enter+0x24
cr0x48228204
xer   0x2000
ctr 0x43dea0openpic_splx
iar 0x1837b0db_enter+0x24
msr   0x9032tlbdsmsize+0x8f4a
dar0
dsisr  0
db_enter+0x24:  lwz r11,12(r1)

ddb{1}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 70498  509301  70205 21  7 0x2c++
 70205  121095  94995 21  30x10008a  sigsusp   sh
 31427  144008  46929 21  2 0x2c++
 46929  299794  94995 21  30x10008a  sigsusp   sh
 94995  483101   3785 21  30x10008a  sigsusp   make
  3785  285525  82454 21  30x10008a  sigsusp   sh
 82454   77234   5995 21  30x10008a  sigsusp   make
  5995  283797  77123 21  30x10008a  sigsusp   sh
 77123  466026  

Re: ipsp_ids_gc panic after 7.1 upgrade

2022-04-29 Thread Alexander Bluhm
On Thu, Apr 28, 2022 at 12:52:41AM +0300, Vitaliy Makkoveev wrote:
> On Thu, Apr 28, 2022 at 12:15:25AM +0300, Vitaliy Makkoveev wrote:
> > > On 27 Apr 2022, at 23:24, Kasak  wrote:
> [ skip ]
> > > I'm afraid your patch did not help, it crashed again after three hours 
> > 
> > Did it panic within ipsp_ids_gc() again?
> > 
> 
> I missed that ipsp_ids_lookup() bumps `id_refcount' on a dead `ids'.
> I fixed my previous diff.

OK bluhm@

> Index: sys/netinet/ip_ipsp.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_ipsp.c,v
> retrieving revision 1.269
> diff -u -p -r1.269 ip_ipsp.c
> --- sys/netinet/ip_ipsp.c 10 Mar 2022 15:21:08 -  1.269
> +++ sys/netinet/ip_ipsp.c 27 Apr 2022 21:40:58 -
> @@ -1205,7 +1205,7 @@ ipsp_ids_insert(struct ipsec_ids *ids)
>   found = RBT_INSERT(ipsec_ids_tree, &ipsec_ids_tree, ids);
>   if (found) {
>   /* if refcount was zero, then timeout is running */
> - if (atomic_inc_int_nv(&found->id_refcount) == 1) {
> + if ((++found->id_refcount) == 1) {
>   LIST_REMOVE(found, id_gc_list);
>  
>   if (LIST_EMPTY(&ipsec_ids_gc_list))
> @@ -1248,7 +1248,12 @@ ipsp_ids_lookup(u_int32_t ipsecflowinfo)
>  
>   mtx_enter(&ipsec_flows_mtx);
>   ids = RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
> - atomic_inc_int(&ids->id_refcount);
> + if (ids != NULL) {
> + if (ids->id_refcount != 0)
> + ids->id_refcount++;
> + else
> + ids = NULL;
> + }
>   mtx_leave(&ipsec_flows_mtx);
>  
>   return ids;
> @@ -1290,6 +1295,8 @@ ipsp_ids_free(struct ipsec_ids *ids)
>   if (ids == NULL)
>   return;
>  
> + mtx_enter(&ipsec_flows_mtx);
> +
>   /*
>* If the refcount becomes zero, then a timeout is started. This
>* timeout must be cancelled if refcount is increased from zero.
> @@ -1297,10 +1304,10 @@ ipsp_ids_free(struct ipsec_ids *ids)
>   DPRINTF("ids %p count %d", ids, ids->id_refcount);
>   KASSERT(ids->id_refcount > 0);
>  
> - if (atomic_dec_int_nv(&ids->id_refcount) > 0)
> + if ((--ids->id_refcount) > 0) {
> + mtx_leave(&ipsec_flows_mtx);
>   return;
> -
> - mtx_enter(&ipsec_flows_mtx);
> + }
>  
>   /*
>* Add second for the case ipsp_ids_gc() is already running and
> Index: sys/netinet/ip_ipsp.h
> ===
> RCS file: /cvs/src/sys/netinet/ip_ipsp.h,v
> retrieving revision 1.238
> diff -u -p -r1.238 ip_ipsp.h
> --- sys/netinet/ip_ipsp.h 21 Apr 2022 15:22:50 -  1.238
> +++ sys/netinet/ip_ipsp.h 27 Apr 2022 21:40:59 -
> @@ -241,7 +241,7 @@ struct ipsec_ids {
>   struct ipsec_id *id_local;  /* [I] */
>   struct ipsec_id *id_remote; /* [I] */
>   u_int32_t   id_flow;/* [I] */
> - u_int   id_refcount;/* [a] */
> + u_int   id_refcount;/* [F] */
>   u_int   id_gc_ttl;  /* [F] */
>  };
>  RBT_HEAD(ipsec_ids_flows, ipsec_ids);



Re: macppc panic: vref used where vget required

2022-04-29 Thread Alexander Bluhm
On Thu, Apr 28, 2022 at 08:47:53PM +0200, Martin Pieuchot wrote:
> On 28/04/22(Thu) 16:54, Sebastien Marie wrote:
> > On Thu, Apr 28, 2022 at 04:04:41PM +0200, Alexander Bluhm wrote:
> > > On Wed, Apr 27, 2022 at 09:16:48AM +0200, Sebastien Marie wrote:
> > > > Here a new diff (sorry for the delay) which add a new 
> > > > vnode_history_record()
> > > > point inside uvn_detach() (when 'uvn' object has UVM_VNODE_CANPERSIST 
> > > > flag sets).
> > > 
> > > [-- MARK -- Thu Apr 28 14:10:00 2022]
> > > uvn_io: start: 0x23ae1400, type VREG, use 0, write 0, hold 0, flags 
> > > (VBIOONFREELIST)
> > > tag VT_UFS, ino 495247, on dev 0, 10 flags 0x100, effnlink 1, 
> > > nlink 1
> > > mode 0100660, owner 21, group 21, size 13647873
> > > ==> vnode_history_print 0x23ae1400, next=6
> > >  [3] c++[44194] usecount 2>1
> > > #0  0x626946ec
> > >  [4] reaper[10898] usecount 1>1
> > > #0  entropy_pool0+0xf54
> > 
> > even if the stacktrace is somehow garbage, the "usecount 1>1" is due to 
> > VH_NOP 
> > (no increment neither decrement), so it is the vnode_history_record() newly 
> > added at:
> > 
> > @@ -323,6 +325,10 @@ uvn_detach(struct uvm_object *uobj)
> >  * let it "stick around".
> >  */
> > if (uvn->u_flags & UVM_VNODE_CANPERSIST) {
> > +   extern void vnode_history_record(struct vnode *, int);
> > +
> > +   vnode_history_record(vp, 0);
> > +
> > /* won't block */
> > uvn_flush(uobj, 0, 0, PGO_DEACTIVATE|PGO_ALLPAGES);
> > goto out;
> > 
> > mpi@, it confirms that uvn_flush() is called without PGO_FREE for this uvn.
> 
> Thanks!
> 
> Has vclean() been called for this vnode?  If so the problem might indeed
> be related to the `uo_refs' fix I just committed, if not that might be
> the bug.

I tried with this commit.  Did not help.

Full console log:
http://bluhm.genua.de/release/results/2022-04-28T18%3A58%3A33Z/bsdcons-ot26.txt

[-- MARK -- Fri Apr 29 11:50:00 2022]
uvn_io: start: 0x3c8701d0, type VREG, use 0, write 0, hold 834, flags 
(VBIOONFREELIST)
tag VT_UFS, ino 729103, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x3c8701d0, next=6
 [4] reaper[77961] usecount 1>1
#0  udl_decomp_table+0x428
 [5] reaper[77961] usecount 1>0
#0  splx+0x30
#1  0xfffc
#2  vrele+0x5c
#3  uvn_detach+0x160
#4  uvm_unmap_detach+0x1a4
#5  uvm_map_teardown+0x184
#6  uvmspace_free+0x60
#7  uvm_exit+0x30
#8  reaper+0x138
#9  fork_trampoline+0x14
vn_lock: v_usecount == 0: 0x3c8701d0, type VREG, use 0, write 0, hold 834, 
flags (VBIOONFREELIST)
tag VT_UFS, ino 729103, on dev 0, 10 flags 0x100, effnlink 1, nlink 1
mode 0100660, owner 21, group 21, size 13647873
==> vnode_history_print 0x3c8701d0, next=6
 [4] reaper[77961] usecount 1>1
#0  udl_decomp_table+0x428
 [5] reaper[77961] usecount 1>0
#0  splx+0x30
#1  0xfffc
#2  vrele+0x5c
#3  uvn_detach+0x160
#4  uvm_unmap_detach+0x1a4
#5  uvm_map_teardown+0x184
#6  uvmspace_free+0x60
#7  uvm_exit+0x30
#8  reaper+0x138
#9  fork_trampoline+0x14

panic: vn_lock: v_usecount == 0
Stopped at  db_enter+0x24:  lwz r11,12(r1)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 196635  94068 21 0x2  01  c++
*178725  15770  0 0x14000  0x2000K pagedaemon
db_enter() at db_enter+0x20
panic(948e9d) at panic+0x158
vn_lock(49535400,682000) at vn_lock+0x1c4
uvn_io(ac4198,ae2774,e7ebbc50,75468c,e401) at uvn_io+0x254
uvn_put(3e9be6ca,e7ebbdd4,3c8dc460,5e49200) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,304068,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan_inactive+0x224
uvmpd_scan() at uvmpd_scan+0x158
uvm_pageout(db17175f) at uvm_pageout+0x398
fork_trampoline() at fork_trampoline+0x14
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{0}> x/s version
version:OpenBSD 7.1-current (GENERIC.MP) #0: Thu Apr 28 21:25:23 CEST 
2022\012
r...@ot26.obsd-lab.genua.de:/usr/src/sys/arch/macppc/compile/GENERIC.MP\012

ddb{0}> show panic
*cpu0: vn_lock: v_usecount == 0

ddb{0}> trace
db_enter() at db_enter+0x20
panic(948e9d) at panic+0x158
vn_lock(49535400,682000) at vn_lock+0x1c4
uvn_io(ac4198,ae2774,e7ebbc50,75468c,e401) at uvn_io+0x254
uvn_put(3e9be6ca,e7ebbdd4,3c8dc460,5e49200) at uvn_put+0x64
uvm_pager_put(0,0,e7ebbd70,304068,200,8000,0) at uvm_pager_put+0x15c
uvmpd_scan_inactive(0) at uvmpd_scan
