Re: HA IPSec with AWS - no second flow
On Mon, Mar 11, 2024 at 02:33:26PM +0100, Rafa?? Ramocki wrote: > Hi, > > That thread you've mentioned is almost 10 years old and is about isakmpd not > iked. I've also tried the same with route-based VPN to AWS with BGP but > problem is the same. BGP can negotiate route but traffic will not pass if > there is no approperiate flow set up. the problem you're hitting up against is in the kernel though, so it doesnt matter whether it is isakmpd or iked sets up the flows. > From iekd perspective this looks like this: > > iked_sas: 0x41f72e9e780 rspi 0x0980e96b933f8822 ispi 0x4e425ee4a53d8814 > X.X.X.X:4500->16.170.59.81:4500[] ESTABLISHED r natt > udpecap nexti 0x0 pol 0x41ed13d5000 > sa_childsas: 0x41f72e8f800 ESP 0x8c5e2e0e in 16.170.59.81:4500 -> > X.X.X.X:4500 (LA) B=0x0 P=0x41f72e8ed00 @0x41f72e9e780 > sa_childsas: 0x41f72e8ed00 ESP 0xc643a377 out X.X.X.X:4500 -> > 16.170.59.81:4500 (L) B=0x0 P=0x41f72e8f800 @0x41f72e9e780 > sa_flows: 0x41f670ba800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f72eafc00 ESP in 172.31.0.0/20 -> 10.2.15.0/24 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f670cd000 ESP out 10.2.15.0/24 -> 172.31.16.0/20 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f72ebd000 ESP in 172.31.16.0/20 -> 10.2.15.0/24 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f72eaf800 ESP out 10.2.15.0/24 -> 172.31.32.0/20 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f72ea8000 ESP in 172.31.32.0/20 -> 10.2.15.0/24 [0]@-1 () > @0x41f72e9e780 > sa_flows: 0x41f72ea8400 ESP out 169.254.74.238/32 -> 169.254.74.237/32 > [0]@-1 () @0x41f72e9e780 > sa_flows: 0x41f72eaf000 ESP in 169.254.74.237/32 -> 169.254.74.238/32 > [0]@-1 () @0x41f72e9e780 > iked_sas: 0x41f72eab780 rspi 0x19dfa021335741c8 ispi 0x0d8f2ce39a096333 > X.X.X.X:4500->16.170.59.81:4500[] ESTABLISHED r natt > udpecap nexti 0x0 pol 0x41ed13d5000 > sa_childsas: 0x41f72e9a000 ESP 0xc4c246f8 out X.X.X.X:4500 -> > 16.170.59.81:4500 (L) B=0x0 P=0x41f72eb9a00 @0x41f72eab780 > sa_childsas: 0x41f72eb9a00 ESP 0x1345935a in 16.170.59.81:4500 -> > X.X.X.X:4500 (LA) B=0x0 P=0x41f72e9a000 @0x41f72eab780 > sa_flows: 0x41f72ec7800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72e91800 ESP in 172.31.0.0/20 -> 10.2.15.0/24 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72ead400 ESP out 10.2.15.0/24 -> 172.31.16.0/20 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72eadc00 ESP in 172.31.16.0/20 -> 10.2.15.0/24 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72e91c00 ESP out 10.2.15.0/24 -> 172.31.32.0/20 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72ead000 ESP in 172.31.32.0/20 -> 10.2.15.0/24 [0]@-1 (L) > @0x41f72eab780 > sa_flows: 0x41f72e90400 ESP out 169.254.74.238/32 -> 169.254.74.237/32 > [0]@-1 (L) @0x41f72eab780 > sa_flows: 0x41f72ead800 ESP in 169.254.74.237/32 -> 169.254.74.238/32 > [0]@-1 (L) @0x41f72eab780 > > I think that (L) marking means that this flow is loaded into the kernel and > some of them are missing. It may be some change in iked to fix this, I think. > > > > PS: I'm working with devices of different vendors and I think that in some > way OpenBSD have problem with this. yes, that's why we added support for route based ipsec vpns via the sec(4) interface, as per the links that hrvoje provided and https://marc.info/?l=openbsd-tech=168844868110327=2. i just set up a an openbsd box talking to aws s2s with iked yesterday. your config would look a bit like this: dlg@obsd-aws-gw0:/etc$ sudo cat hostname.em0 inet 192.0.2.51 255.255.255.0 dlg@obsd-aws-gw0:/etc$ sudo cat hostname.sec0 inet 169.254.21.38 255.255.255.252 169.254.21.37 up dlg@obsd-aws-gw0:/etc$ sudo cat hostname.sec1 inet 169.254.74.238 255.255.255.252 169.254.74.237 up dlg@obsd-aws-gw0:/etc$ sudo cat iked.conf h_self="192.0.2.51" h_s2s1="51.21.86.8" h_s2s1_key="0123456789abcdefghijklmnopqrstuv" h_s2s2="16.170.59.81" h_s2s2_key="abcdefghijklmnopqrstuvwxyz012345" ikev2 "s2s1" active \ from any to any \ local $h_self peer $h_s2s1 \ childsa auth hmac-sha2-256 enc aes-256 group ecp256 \ psk $h_s2s1_key \ iface sec0 ikev2 "s2s2" active \ from any to any \ local $h_self peer $h_s2s2 \ childsa auth hmac-sha2-256 enc aes-256 group ecp256 \ psk $h_s2s2_key \ iface sec1 that should be enough to get ipsec up and running so you can talk to aws over the sec(4) interfaces. the next step is to set up routes to your nets in aws over these links. we use bgpd to dynamically learn routes and fail over between the the different sec interfaces, but if you wanted to do ecmp/multipath you could manually add routes over each interface. dlg > - Original Message - > From: "Hrvoje Popovski" > To: "Rafa?? Ramocki" , "bugs" > Sent: Monday, March 11, 2024 1:05:10 PM > Subject: Re: HA IPSec with AWS - no second flow > > On 11.3.2024. 10:22, Rafa?? Ramocki wrote: > >> (...) > > I think IKED may detect
Re: 7.4 hard locks with bgpd on link change
this should be fixed in src/sys/net/if_sec.c r1.10. sorry for the delay :( > On 4 Nov 2023, at 13:01, Jason Tubnor wrote: > > > On 3/11/2023 8:58 pm, Claudio Jeker wrote: >> Do I understand you correctly that bgpd runs over the sec(4) interface >> which routes over em1? > > Correct (also OSPF). Here is the iked.conf and ifconfig sec10: > > ikev2 active esp from any to any peer 192.168.1.1 srcid 172.16.1.1 dstid > 192.168.1.1 iface sec10 > > sec10: flags=8051 mtu 1380 > index 7 priority 0 llprio 3 > groups: sec egress > inet 10.0.0.253 --> 10.0.1.254 netmask 0xfe00 > >> Does bgpd install any routes over em1? `bgpctl show next` should tell you >> which nexthops use which interface. > > See below. Redacted for privacy: > > fwtst06# bgpctl sho nex > Flags: * = nexthop valid > > Nexthop Route Prio Gateway Iface > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/23 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/23 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/304 connected em0 (UP, 1000 Mbps) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/304 connected em0 (UP, 1000 Mbps) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30 32 XXX.XXX.XXX.XXX sec10 (UP, unknown) > >> >> This could well be an issue inside sec(4) since that interface is new. >> If you could give use some example config to rebuild the case it would >> help a lot. > fwtst06# cat /etc/hostname.sec10 > inet 10.0.0.253 255.255.254.0 10.0.1.254 mtu 1380 > up > fwtst06# grep sec /etc/pf.conf > set skip on { lo, sec } > fwtst06# cat /etc/ospfd.conf > router-id $ospf_id > > area 0.0.0.0 { > interface sec10 { > type p2p > } > interface em0 { > type p2p > } > } > > /etc/bgpd.conf > > group "ibgp" { > remote-as $bgpasn > local-address $laif > > neighbor 10.8.8.8 # router reflector 1 ipv4 > neighbor 10.9.9.9 # router reflector 2 ipv4 > > neighbor $em0neighbor { > route-reflector > } > } >
Re: vxlan(4) custom destination UDP port seems not working
On Wed, Nov 15, 2023 at 06:13:15AM -0700, Theo de Raadt wrote: > Luca Di Gregorio wrote: > > > I'm not sure about this, but I think that public cloud datacenters filter > > out > > (or do something with) udp traffic to standard udp vxlan port. > > But that would not be a reason for allowing selection of the pre-standard > port number. > > Rather, it would be a reason for provididing *any non-standard port number* > > Which is perhaps what the code does. But noone would actually want this. > VXLAN on port 54? 80? Noone would want this. > > And if they filter it, then put it inside an underlay. The standard says > nothing about permitting vxlan on any old random stupid port number. from a quick look around it appears that at least linux, juniper and arista allow for the configuration of a non-standard port for vxlan. linux documentation even says it defaults to the pre-iana assigned port because their driver pre dates the standard, which is peak linux. independent of whether our vxlan(4) driver should support it or not, ifconfig should be fixed to handle setting up sockaddrs for these ioctls better anyway. dlg
Re: pf nat-to doesn't match a crafted packet
How are you injecting the crafted packet into the stack? On Tue, 29 Aug 2023, 01:14 , wrote: > >Synopsis: pf nat-to doesn't match a crafted packet > >Category: system > >Environment: > System : OpenBSD 7.3 > Details : OpenBSD 7.3 (GENERIC.MP) #2080: Sat Mar 25 14:20:25 > MDT 2023 > dera...@arm64.openbsd.org: > /usr/src/sys/arch/arm64/compile/GENERIC.MP > > Architecture: OpenBSD.arm64 > Machine : arm64 > >Description: > I was testing a seemingly valid Internet packet going out my > gateway > but the pf firewall doesn't match nat-to to this one for some reason. I'm > possibly overlooking something but every other packet exiting my gateway is > nat'ed. What causes this? How can this be exploited? > > >How-To-Repeat: > Here is the tcpdump from the host 1 hop behind the NAT router: > > 16:59:08.438082 192.168.177.13 > 49.12.42.182: icmp: host 7.198.187.211 > unreachable [icmp cksum ok] for 11.69.44.241.52699 > 7.198.187.211.55672: > udp 51351 [tos 0x9c] (ttl 147, id 17124, len 51419, optlen=40 NOP RR{39}= > RR{#106.155.117.54 233.26.79.111 129.127.249.242 60.117.146.16 > 179.39.29.224 213.65.49.78 0.16.45.109 252.168.188.0 123.108.138.224}) (ttl > 64, id 65443, len 96) > : 4500 0060 ffa3 4001 ad81 c0a8 b10d E..`@... > 0010: 310c 2ab6 0301 55aa 4f9c c8db 1.*...U.O... > 0020: 42e4 9311 c756 0b45 2cf1 07c6 bbd3 B..V.E,. > 0030: 0107 2704 6a9b 7536 e91a 4f6f 817f f9f2 ..'.j.u6..Oo > 0040: 3c75 9210 b327 1de0 d541 314e 0010 2d6d 0050: fca8 bc00 7b6c 8ae0 cddb d978 {l.x > > and here is the tcpdump on the pppoe interface: > > 16:59:08.440403 192.168.177.13 > 49.12.42.182: icmp: host 7.198.187.211 > unreacha > ble [icmp cksum ok] (ttl 63, id 65443, len 96) > > Here is the relevant anchor rules I have: > >match out on $ext_if inet from to any nat-to ($ext_if) > > and: > > table const { 10/8, 172.16/12, 192.168/16 } > > Why did pf not translate this? ... that's kinda kinky. > > >Fix: > Not known. > > > dmesg: > OpenBSD 7.3 (GENERIC.MP) #2080: Sat Mar 25 14:20:25 MDT 2023 > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP > real mem = 8432840704 (8042MB) > avail mem = 8139239424 (7762MB) > random: good seed from bootblocks > mainbus0 at root: ACPI > psci0 at mainbus0: PSCI 1.1, SMCCC 1.2 > cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3 > cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache > cpu0: 1024KB 64b/line 16-way L2 cache > cpu0: CRC32,ASID16 > cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3 > cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache > cpu1: 1024KB 64b/line 16-way L2 cache > cpu1: CRC32,ASID16 > cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3 > cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache > cpu2: 1024KB 64b/line 16-way L2 cache > cpu2: CRC32,ASID16 > cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3 > cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache > cpu3: 1024KB 64b/line 16-way L2 cache > cpu3: CRC32,ASID16 > efi0 at mainbus0: UEFI 2.7 > efi0: https://github.com/pftf/RPi4 rev 0x1 > smbios0 at efi0: SMBIOS 3.3.0 > smbios0: vendor https://github.com/pftf/RPi4 version "UEFI Firmware > v1.21" date 11/13/2020 > smbios0: Raspberry Pi Foundation Raspberry Pi 4 Model B > apm0 at mainbus0 > ampintc0 at mainbus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller" > agtimer0 at mainbus0: 54000 kHz > acpi0 at mainbus0: ACPI 6.3 > acpi0: sleep states > acpi0: tables DSDT FACP CSRT DBG2 GTDT IORT APIC PPTT BGRT > acpi0: wakeup devices > acpiiort0 at acpi0 > "BCM2849" at acpi0 not configured > "BCM2835" at acpi0 not configured > "BCM2854" at acpi0 not configured > "ACPI0004" at acpi0 not configured > xhci0 at acpi0 XHC0 addr 0x6/0x1000 irq 175, xHCI 1.0 > usb0 at xhci0: USB revision 3.0 > uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev > 3.00/1.00 addr 1 > "ACPI0007" at acpi0 not configured > "ACPI0007" at acpi0 not configured > "ACPI0007" at acpi0 not configured > "ACPI0007" at acpi0 not configured > "ACPI0004" at acpi0 not configured > "BCM2848" at acpi0 not configured > "BCM2850" at acpi0 not configured > "BCM2856" at acpi0 not configured > "BCM2845" at acpi0 not configured > "BCM2841" at acpi0 not configured > "BCM2841" at acpi0 not configured > "BCM2838" at acpi0 not configured > "BCM2839" at acpi0 not configured > "BCM2844" at acpi0 not configured > pluart0 at acpi0 URT0 addr 0xfe201000/0x1000 irq 153 > "BCM2836" at acpi0 not configured > "BCM2EA6" at acpi0 not configured > "MSFT8000" at acpi0 not configured > sdhc0 at acpi0 SDC1 addr 0xfe30/0x100 irq 158 > sdhc0: base clock frequency unknown > "BCM2855" at acpi0 not configured > bse0 at acpi0 ETH0 addr 0xfd58/0x1 irq 189: address > dc:a6:32:cc:db:a7 > brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT
Re: Intel Ethernet (?Synopsys based) on Filet3 Elkhart Lake unconfigured on recent snapshot
> On 28 Apr 2023, at 06:06, Ted Ri wrote: > > To: bugs@openbsd.org > Subject: Intel Ethernet (?Synopsys based) on Fitlet3 Elkhart Lake > unconfigured on latest snapshot > From: t26034...@gmail.com > Cc: t26034...@gmail.com > Reply-To: t26034...@gmail.com > Message-ID: <7f7ce828af994...@c1.my.domain> > >> Synopsis: Intel ethernet on Compulab Fitlet3 Elhart Lake unconfigured on >> latest snapshot >> Category: system >> Environment: > System : OpenBSD 7.3 > Details : OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 > 17:40:19 MDT 2023 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 >> Description: >"Intel Elkhart Lake Ethernet" rev 0x11 at pci0 dev 29 function > 1 not configured >> How-To-Repeat: > Boot latest snapshot, ethernet unconfigured >> Fix: > jsg's reply to similar bug report on Feb 4 '23 indicated they were > not supported at that time. this is still true. my understanding is the dwqe driver could be hacked up to support it, but i don't know of anyone with the time, interest, and one of these machines to do the work with. dlg > > > dmesg: > OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 17:40:19 MDT 2023 >dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 8378597376 (7990MB) > avail mem = 8105050112 (7729MB) > random: good seed from bootblocks > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x76a31000 (81 entries) > bios0: vendor American Megatrends International, LLC. version "5.19" > date 01/24/2023 > bios0: Default string Default string > efi0 at bios0: UEFI 2.7 > efi0: American Megatrends rev 0x50013 > acpi0 at bios0: ACPI 6.2 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP MCFG SSDT SSDT SSDT FIDT OEM1 HPET APIC PRAM > RSCI SSDT SSDT SSDT NHLT SSDT SSDT PSDS LPIT SSDT DMAR SSDT TPM2 WSMT > FPDT > acpi0: wakeup devices RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) > PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) > RP07(S4) PXSX(S4) XHCI(S3) XDCI(S4) [...] > acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpimcfg0 at acpi0 > acpimcfg0: addr 0xc000, bus 0-255 > acpihpet0 at acpi0: 1920 Hz > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01 > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB > 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 38MHz > cpu0: mwait min=64, max=64, C-substates=0.2.0.2.2.1.1.1, IBE > cpu1 at mainbus0: apid 2 (application processor) > cpu1: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01 > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES > cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB > 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache > cpu1: smt 0, core 1, package 0 > cpu2 at mainbus0: apid 4 (application processor) > cpu2: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01 > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES > cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB > 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache > cpu2: smt 0, core 2, package 0 > cpu3 at mainbus0: apid 6 (application processor) > cpu3: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01 > cpu3: >
Re: vmt not enabled when running as a VM on ESXi on arm platforms
is there output from eeprom -p? if so, can you share that too? > On 25 Apr 2023, at 22:47, John Jore wrote: > >> Synopsis: vmt not enabled when running as a VM on ESXi on arm platforms >> Category: kernel aarch64 arm >> Environment: > System : OpenBSD 7.3 > Details : OpenBSD 7.3 (GENERIC) #132: Sat Mar 25 13:51:54 MDT 2023 > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC > > Architecture: OpenBSD.arm64 > Machine : arm64 >> Description: > vmt not enabled when OpenBSD is running as a VM on ESXi on ARM > platforms > > From vmt.c: > #if !defined(__i386__) && !defined(__amd64__) > #error vmt(4) is only supported on i386 and amd64 > #endif > > >> How-To-Repeat: > Default installation >> Fix: > Modify vmt.c to allow vmt to load/run on arm platforms? > > > dmesg: > OpenBSD 7.3 (GENERIC) #132: Sat Mar 25 13:51:54 MDT 2023 > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC > real mem = 260857856 (248MB) > avail mem = 218324992 (208MB) > random: good seed from bootblocks > mainbus0 at root: linux,dummy-virt > psci0 at mainbus0: PSCI 0.2 > cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3 > cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache > cpu0: 1024KB 64b/line 16-way L2 cache > cpu0: CRC32,ASID16 > efi0 at mainbus0: UEFI 2.6 > efi0: EDK II rev 0x1 > smbios0 at efi0: SMBIOS 3.0.0 > smbios0: vendor VMware version "VEFI" date 12/31/2020 > smbios0: VMware, Inc. VMware VBSA > apm0 at mainbus0 > ampintc0 at mainbus0 nirq 992, ncpu 1: "interrupt-controller" > ampintcmsi0 at ampintc0: nspi 927 > "hypervisor" at mainbus0 not configured > "dcc" at mainbus0 not configured > agtimer0 at mainbus0: 54000 kHz > pciecam0 at mainbus0 > pci0 at pciecam0 > 0:15:0: rom address conflict 0x8000/0x8000 > vendor "VMware", unknown product 0x1976 (class bridge subclass host, rev > 0x01) at pci0 dev 0 function 0 not configured > "VMware VMCI" rev 0x10 at pci0 dev 0 function 7 not configured > vendor "VMware", unknown product 0x0406 (class display subclass VGA, rev > 0x00) at pci0 dev 15 function 0 not configured > ppb0 at pci0 dev 17 function 0 "VMware PCI" rev 0x02 > pci1 at ppb0 bus 1 > ahci0 at pci1 dev 0 function 0 "VMware AHCI" rev 0x00: msi, AHCI 1.3 > ahci0: port 0: 6.0Gb/s > scsibus0 at ahci0: 32 targets > sd0 at scsibus0 targ 0 lun 0: > naa.5000c29764a8efa2 > sd0: 8192MB, 512 bytes/sector, 16777216 sectors, thin > ppb1 at pci0 dev 21 function 0 "VMware PCIE" rev 0x01: msi > pci2 at ppb1 bus 2 > 2:0:0: io address conflict 0xe000/0x20 > em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address > 00:50:56:be:85:3b > ppb2 at pci0 dev 21 function 1 "VMware PCIE" rev 0x01: msi > pci3 at ppb2 bus 3 > ppb3 at pci0 dev 21 function 2 "VMware PCIE" rev 0x01: msi > pci4 at ppb3 bus 4 > ppb4 at pci0 dev 21 function 3 "VMware PCIE" rev 0x01: msi > pci5 at ppb4 bus 5 > ppb5 at pci0 dev 21 function 4 "VMware PCIE" rev 0x01: msi > pci6 at ppb5 bus 6 > ppb6 at pci0 dev 21 function 5 "VMware PCIE" rev 0x01: msi > pci7 at ppb6 bus 7 > ppb7 at pci0 dev 21 function 6 "VMware PCIE" rev 0x01: msi > pci8 at ppb7 bus 8 > ppb8 at pci0 dev 21 function 7 "VMware PCIE" rev 0x01: msi > pci9 at ppb8 bus 9 > ppb9 at pci0 dev 22 function 0 "VMware PCIE" rev 0x01: msi > pci10 at ppb9 bus 10 > xhci0 at pci10 dev 0 function 0 "VMware xHCI" rev 0x00: msi, xHCI 1.0 > usb0 at xhci0: USB revision 3.0 > uhub0 at usb0 configuration 1 interface 0 "VMware xHCI root hub" rev > 3.00/1.00 addr 1 > ppb10 at pci0 dev 22 function 1 "VMware PCIE" rev 0x01: msi > pci11 at ppb10 bus 11 > ppb11 at pci0 dev 22 function 2 "VMware PCIE" rev 0x01: msi > pci12 at ppb11 bus 12 > ppb12 at pci0 dev 22 function 3 "VMware PCIE" rev 0x01: msi > pci13 at ppb12 bus 13 > ppb13 at pci0 dev 22 function 4 "VMware PCIE" rev 0x01: msi > pci14 at ppb13 bus 14 > ppb14 at pci0 dev 22 function 5 "VMware PCIE" rev 0x01: msi > pci15 at ppb14 bus 15 > ppb15 at pci0 dev 22 function 6 "VMware PCIE" rev 0x01: msi > pci16 at ppb15 bus 16 > ppb16 at pci0 dev 22 function 7 "VMware PCIE" rev 0x01: msi > pci17 at ppb16 bus 17 > ppb17 at pci0 dev 23 function 0 "VMware PCIE" rev 0x01: msi > pci18 at ppb17 bus 18 > 18:0:0: io address conflict 0xd000/0x20 > em1 at pci18 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address > 00:50:56:be:20:0d > ppb18 at pci0 dev 23 function 1 "VMware PCIE" rev 0x01: msi > pci19 at ppb18 bus 19 > ppb19 at pci0 dev 23 function 2 "VMware PCIE" rev 0x01: msi > pci20 at ppb19 bus 20 > ppb20 at pci0 dev 23 function 3 "VMware PCIE" rev 0x01: msi > pci21 at ppb20 bus 21 > ppb21 at pci0 dev 23 function 4 "VMware PCIE" rev 0x01: msi > pci22 at ppb21 bus 22 > ppb22 at pci0 dev 23 function 5 "VMware PCIE" rev 0x01: msi > pci23 at ppb22 bus 23 > ppb23 at pci0 dev 23 function 6 "VMware PCIE" rev 0x01: msi > pci24 at ppb23 bus 24 > ppb24 at pci0 dev 23 function 7 "VMware PCIE" rev 0x01: msi > pci25 at
Re: Hetzner arm64 Cloud
On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote: > You can also simply dd the image to /dev/sda and reboot, but that still > doesn't solve the problem. The bootup is hard to debug because the > console is KVM and uses viogpu. As soon as we exit the EFI bootservices > the framebuffer is shut down for whatever reason. Means we can only get > access to it again through viogpu, which happens pretty late. I wish we > had a serial console, because Qemu/edk2 can do it, they just don't make > it available. This is gonna be "fun" to debug without serial. i dont think the problem here is booting openbsd, but if it were the diff below might help. this diff teaches BOOTAA64.EFI to load files from the EFI System Partition that the boot loader was run from. this means you can go "boot esp0a:bsd.rd" at the boot> prompt and get into the installer. i wrote this cos i wanted another option for getting openbsd installed on machines where the boot loader and driver support arent that great yet. i can imagine it being useful for upgrading the OS on a system where it's difficult to plug install media in, or repartitioning or overwriting the disk is risky. especially if you also just want to check how well the hardware is supported in openbsd before making changes. > > On Sat, Apr 15, 2023 at 11:33:39AM +0100, Chris Narkiewicz wrote: > > > > I asked Hetzner to import install73.img and mounted it as VM CD-ROM, > > but it doesn't boot. I'm not sure if this is a bug either. > > > > Cheers, > > Chris Narkiewic > > > > On Thu, 2023-04-13 at 16:16 +, Mikolaj Kucharski wrote: > > > Hi, > > > > > > I'm not sure does this belong to bugs@ > > > > > > However what I used in the past was Yaifo and I still use it every > > > few > > > years, but it takes too much effort to rebase it to -current, so I > > > didn't touch it for few years now, but for me it worked really > > > nicely. > > > > > > https://github.com/jedisct1/yaifo > > > > > > > > > On Thu, Apr 13, 2023 at 09:00:23AM +0200, Peter J. Philipp wrote: > > > > Hi, > > > > > > > > Yesterday hetzner.com came out with arm64 cloud instances, I tried > > > > one out. > > > > Here is what I found.? The images they give you a choice of does > > > > not include > > > > OpenBSD, so I had to get a ubuntu OS.? That's fine the EFI > > > > partition was > > > > already mounted.? Through trialing this I found the best way of > > > > getting the > > > > OpenBSD loader to boot was the following way: > > > > > > > > 1. place miniroot73.img on the EFI partition root (/boot/efi/) > > > > 2. reboot > > > > 3. press escape to get to the BIOS, there is 3 options one is a > > > > configuration > > > > ?? option under 1, enter it.? I'm working off memory here I didn't > > > > save > > > > ?? anything so take it with a grain of salt on exactness.? In this > > > > option is > > > > ?? an option to create a RAM drive from a file, go there and enter > > > > the > > > > ?? miniroot73.img (45MB).? The down arrows didn't work in this BIOS > > > > so it was > > > > ?? great that it wrapped around going up. > > > > 4. next go back into the main bios screen by pressing escape.? > > > > There is option > > > > ?? 3 for boot options, enter it.? There is a boot from file option > > > > enter it. > > > > ?? Select the RAM drive and manouver your way to the bootaa64.efi > > > > file.? Press > > > > ?? enter. > > > > 5. OpenBSD loader now loads.? ls displays bsd and bsd.rd, the > > > > console is on > > > > ?? comcons0 or something like that.? Switching to fb0 works too.? > > > > Then when > > > > ?? pressing boot a blank screen happens.? Waiting a while no > > > > prompts and I > > > > ?? didn't try to blind type anything.? Doing this again with fb0 > > > > doesn't > > > > ?? work either. > > > > 6. Full stop, I didn't get further. > > > > > > > > I then deleted my instance as ubuntu is not good enough for me.? I > > > > guess we'll > > > > have to wait until the pros get to it.? Thanks! > > > > > > > > Best Regards, > > > > -peter > > > > > > > > > > > -- > > +44 7502 415 180 (Phone, Signal, WhatsApp) > > @ezaquarii:etacassiopeiae.net (Matrix) Index: conf.c === RCS file: /cvs/src/sys/arch/arm64/stand/efiboot/conf.c,v retrieving revision 1.44 diff -u -p -r1.44 conf.c --- conf.c 15 Feb 2023 14:13:38 - 1.44 +++ conf.c 15 Apr 2023 02:22:34 - @@ -58,10 +58,13 @@ struct fs_ops file_system[] = { ufs_stat,ufs_readdir, ufs_fchmod }, { ufs2_open, ufs2_close, ufs2_read, ufs2_write, ufs2_seek, ufs2_stat, ufs2_readdir, ufs2_fchmod }, + { esp_open,esp_close,esp_read,esp_write,esp_seek, + esp_stat,esp_readdir, } }; int nfsys = nitems(file_system); struct devsw devsw[] = { + { "esp", espstrategy, espopen, espclose, espioctl }, { "tftp", tftpstrategy, tftpopen, tftpclose, tftpioctl }, { "sd",
Re: Sierra Wireless MC7750 attaches as ugen(4) on OpenBSD 7.3 #1125 2023-March-25
On Thu, Apr 06, 2023 at 09:13:27AM +, Gerhard Roth wrote: > On Thu, 2023-04-06 at 18:49 +1000, David Gwynne wrote: > > On Wed, Apr 05, 2023 at 11:22:34PM +, Mikolaj Kucharski wrote: > > > > this is almost certainly a qualcomm msm interface (qmi) device. > > making umsm(4) attach to it is a good first start. > > > > hopefully you'll be able to talk AT commands to one of the interfaces. > > > > qmi devices are notoriously inconsistent and complicated, so what to do > > next isnt clear. i would be trying to tell the modem to switch to mbim > > mode and then figure out how to get umb(4) to attach. this is similar to > > the changes i made to umsm and umb for quectel devices, but they > > actually provided a decent manual. > > The Sierra Wireless documentation is available. Alas, switching the > mode seems far too complex and error prone to perform this inside > a driver. agreed. i just want the kernel to attach the right things to what is presented. > When the AT (modem) interface is available, you would have to: > > 1) enter password protected command mode with "AT!ENTERCND=passwd" > > 2) query the list of modes with "AT!UDUSBCOMP=?". Example result: > > 0 - reserved NOT SUPPORTED > 1 - DM AT SUPPORTED > 2 - reserved NOT SUPPORTED > 3 - reserved NOT SUPPORTED > 4 - reserved NOT SUPPORTED > 5 - reserved NOT SUPPORTED > 6 - DM NMEA ATQMI SUPPORTED > 7 - DM NMEA ATRMNET1 RMNET2 RMNET3SUPPORTED > 8 - DM NMEA ATMBIMSUPPORTED > 9 - MBIM SUPPORTED > 10 - NMEA MBIMSUPPORTED > 11 - DM MBIMSUPPORTED > 12 - DM NMEA MBIM SUPPORTED > 13 - Config1: comp6Config2: comp8 NOT SUPPORTED > 14 - Config1: comp6Config2: comp9 SUPPORTED > 15 - Config1: comp6Config2: comp10NOT SUPPORTED > 16 - Config1: comp6Config2: comp11NOT SUPPORTED > 17 - Config1: comp6Config2: comp12NOT SUPPORTED > 18 - Config1: comp7Config2: comp8 NOT SUPPORTED > 19 - Config1: comp7Config2: comp9 SUPPORTED > 20 - Config1: comp7Config2: comp10NOT SUPPORTED > 21 - Config1: comp7Config2: comp11NOT SUPPORTED > 22 - Config1: comp7Config2: comp12NOT SUPPORTED > > There is no guarantee that the table doesn't change. And every > device has a differnt set of supported modes. > > 3) select the desired mode with "AT!UDUSBCOMP=X" > 4) wait for the device to reset itself yep. the linux driver has some clues, so the following should let umb attach once you've reconfigured the modem. Index: umsm.c === RCS file: /cvs/src/sys/dev/usb/umsm.c,v retrieving revision 1.125 diff -u -p -r1.125 umsm.c --- umsm.c 2 Apr 2023 23:57:57 - 1.125 +++ umsm.c 6 Apr 2023 09:21:35 - @@ -101,6 +101,7 @@ struct umsm_type { #defineDEV_NORMAL 0x #defineDEV_HUAWEI 0x0001 #defineDEV_TRUINSTALL 0x0002 +#defineDEV_SIERRA 0x0004 #defineDEV_UMASS1 0x0010 #defineDEV_UMASS2 0x0020 #defineDEV_UMASS3 0x0040 @@ -271,6 +272,7 @@ static const struct umsm_type umsm_devs[ {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_340U}, 0}, {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_770S}, 0}, {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7455}, 0}, + {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7700}, DEV_SIERRA}, {{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM5320}, 0}, {{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM7600E}, 0}, @@ -363,6 +365,17 @@ umsm_match(struct device *parent, void * /* Interface 4 can be used as a network device */ if (uaa->ifaceno >= 4) return UMATCH_NONE; + } else if (flag & DEV_SIERRA) { + /* Sierra Wireless layout */ + switch (uaa->ifaceno) { + case 0: + case 2: + case 3: + /* Only umsm on specific interfaces */ + break; + default: + return UMATCH_NONE; + } } return UMATCH_VENDOR_IFACESUBCLASS;
Re: Sierra Wireless MC7750 attaches as ugen(4) on OpenBSD 7.3 #1125 2023-March-25
On Wed, Apr 05, 2023 at 11:22:34PM +, Mikolaj Kucharski wrote: > On Wed, Apr 05, 2023 at 11:16:55PM +, miko...@kucharski.name wrote: > > >Synopsis: Sierra Wireless MC7750 attaches as ugen(4) > > >Category: kernel > > >Environment: > > System : OpenBSD 7.3 > > Details : OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT > > 2023 > > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > > Machine : amd64 > > >Description: > > Sierra Wireless MC7750 attaches as ugen(4) on PC Engines APU3, > > running OpenBSD. Modem is mini-PCIe device. > > >How-To-Repeat: > > No specific steps. Plug in the device, boot up OpenBSD. > > >Fix: > > Unknown. > > > > Forgot to also add lsusb output. > > # lsusb -vvv -d 1199:68a2 > > Bus 002 Device 003: ID 1199:68a2 Sierra Wireless, Inc. > Device Descriptor: > bLength18 > bDescriptorType 1 > bcdUSB 2.00 > bDeviceClass0 (Defined at Interface level) > bDeviceSubClass 0 > bDeviceProtocol 0 > bMaxPacketSize064 > idVendor 0x1199 Sierra Wireless, Inc. > idProduct 0x68a2 > bcdDevice0.06 > iManufacturer 3 Sierra Wireless, Incorporated > iProduct2 MC7750 > iSerial 4 00102700999 > bNumConfigurations 1 > Configuration Descriptor: > bLength 9 > bDescriptorType 2 > wTotalLength 115 > bNumInterfaces 4 > bConfigurationValue 1 > iConfiguration 1 Sierra Configuration > bmAttributes 0xe0 > Self Powered > Remote Wakeup > MaxPower0mA > Interface Descriptor: > bLength 9 > bDescriptorType 4 > bInterfaceNumber0 > bAlternateSetting 0 > bNumEndpoints 2 > bInterfaceClass 255 Vendor Specific Class > bInterfaceSubClass255 Vendor Specific Subclass > bInterfaceProtocol255 Vendor Specific Protocol > iInterface 0 > Endpoint Descriptor: > bLength 7 > bDescriptorType 5 > bEndpointAddress 0x81 EP 1 IN > bmAttributes2 > Transfer TypeBulk > Synch Type None > Usage Type Data > wMaxPacketSize 0x0200 1x 512 bytes > bInterval 32 > Endpoint Descriptor: > bLength 7 > bDescriptorType 5 > bEndpointAddress 0x01 EP 1 OUT > bmAttributes2 > Transfer TypeBulk > Synch Type None > Usage Type Data > wMaxPacketSize 0x0200 1x 512 bytes > bInterval 32 > Interface Descriptor: > bLength 9 > bDescriptorType 4 > bInterfaceNumber2 > bAlternateSetting 0 > bNumEndpoints 2 > bInterfaceClass 255 Vendor Specific Class > bInterfaceSubClass255 Vendor Specific Subclass > bInterfaceProtocol255 Vendor Specific Protocol > iInterface 0 > Endpoint Descriptor: > bLength 7 > bDescriptorType 5 > bEndpointAddress 0x82 EP 2 IN > bmAttributes2 > Transfer TypeBulk > Synch Type None > Usage Type Data > wMaxPacketSize 0x0200 1x 512 bytes > bInterval 32 > Endpoint Descriptor: > bLength 7 > bDescriptorType 5 > bEndpointAddress 0x02 EP 2 OUT > bmAttributes2 > Transfer TypeBulk > Synch Type None > Usage Type Data > wMaxPacketSize 0x0200 1x 512 bytes > bInterval 32 > Interface Descriptor: > bLength 9 > bDescriptorType 4 > bInterfaceNumber3 > bAlternateSetting 0 > bNumEndpoints 3 > bInterfaceClass 255 Vendor Specific Class > bInterfaceSubClass255 Vendor Specific Subclass > bInterfaceProtocol255 Vendor Specific Protocol > iInterface 0 > Endpoint Descriptor: > bLength 7 > bDescriptorType 5 > bEndpointAddress 0x83 EP 3 IN > bmAttributes3 > Transfer TypeInterrupt > Synch Type None > Usage Type Data > wMaxPacketSize 0x0040 1x 64 bytes > bInterval
Re: Dell Wyse 3040 acpitz vs tipmic
On Sun, Feb 26, 2023 at 01:28:04PM +0100, Mark Kettenis wrote: > > Date: Sun, 26 Feb 2023 18:13:18 +1000 > > From: David Gwynne yeesh, i should have proofread my email before i sent it. sorry about making it harder to read than it should have been. > > i picked a couple of Dell Wyse 3040 boxes, which are very cute, i > > like them a lot. however, i have to disable acpitz to be able to > > use them because the driver gets stuck during attach. > > > > during apcitz_attach does a read of all the temperatures. the read > > of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler(). > > tipmic_thermal_opreg_handler has a loop on line 335 waiting for > > sc->sc_stat_adc to change, but that value is only set from tipmic_intr. > > acpitz_attach is running while the kernel is code, and it appears that > > the interrupt handler never runs, so that value never changes, and > > acpitz blocks. also because it's cold, the timeout on the tsleep doesn't > > do anything. thanks to patrick for helping me on the acpi side of things > > so we could figure this out. > > A better approach might be to make sure that while we're cold, > tipmic_thermal_opreg_handler() polls for completion. Something like: > > while (sc->sc_stat_adc == 0) { > if (cold) { > delay(1000); > tpmic_intr(); > } else { > if (tsleep(>sc_stat_adc, PRIBIO, "tipmic", > SEC_TO_NSEC(1))) { > ... > } > } > } > > > > i tried deferring basically all of acpitz_attach to when kthreads are > > running, and that works well enough to get to userland. > > > > is that reasonable? > > The problem is that you can't really know whether AML accesses the > opregion while cold. good point. the diff below works in this situation and is less intrusive. > > also, shortly after dwiic complains about short reads and the kernel > > locks up again. i'll have to plug it in and transcribe the exact > > errors. i think that's a separate problem though. > > Yes, dwiic(4) has always been somewhat problematic. Transactions seem > to fail randomly on some platforms like the atom system you're looking > at but also on my Ampere eMAG system. fun. i managed to catch some of the dwiic stuff via dmesg before it locked up: dwiic0: timed out waiting for tx_empty intr dwiic0: timed out waiting for rx_full intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x5b dwiic0: timed out waiting for tx_empty intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x01 dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x01 dwiic0: timed out waiting for rx_full intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x5a dwiic0: timed out waiting for rx_full intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x50 dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x01 dwiic0: timed out waiting for bus idle dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out waiting for stop intr dwiic0: timed out reading remaining 1 tipmic0: can't read register 0x01 dwiic0: timed out waiting for bus idle Index: tipmic.c === RCS file: /cvs/src/sys/dev/acpi/tipmic.c,v retrieving revision 1.7 diff -u -p -r1.7 tipmic.c --- tipmic.c6 Apr 2022 18:59:27 - 1.7 +++ tipmic.c26 Feb 2023 23:56:04 - @@ -276,6 +276,25 @@ struct tipmic_regmap tipmic_thermal_regm { 0x18, TIPMIC_SYSTEMP_HI, TIPMIC_SYSTEMP_LO } }; +static int +tipmic_wait_adc(struct tipmic_softc *sc) +{ + int i; + + if (!cold) { + return (tsleep_nsec(>sc_stat_adc, PRIBIO, "tipmic", + SEC_TO_NSEC(1))); + } + + for (i = 0; i < 1000; i++) { + delay(1000); + if (tipmic_intr(sc) == 1) + return (0); + } + + return (EWOULDBLOCK); +} + int tipmic_thermal_opreg_handler(void *cookie, int iodir, uint64_t address, int size, uint64_t *value) @@ -333,8 +352,7 @@ tipmic_thermal_opreg_handler(void *cooki splx(s); while (sc->sc_stat_adc == 0) { - if (tsleep_nsec(>sc_stat_adc, PRIBIO, "tipmic", - SEC_TO_NSEC(1))) { + if (tipmic_wait_adc(sc)) { printf("%s: ADC timeout\n", sc->sc_dev.dv_xname); break; }
Dell Wyse 3040 acpitz vs tipmic
i picked a couple of Dell Wyse 3040 boxes, which are very cute, i like them a lot. however, i have to disable acpitz to be able to use them because the driver gets stuck during attach. during apcitz_attach does a read of all the temperatures. the read of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler(). tipmic_thermal_opreg_handler has a loop on line 335 waiting for sc->sc_stat_adc to change, but that value is only set from tipmic_intr. acpitz_attach is running while the kernel is code, and it appears that the interrupt handler never runs, so that value never changes, and acpitz blocks. also because it's cold, the timeout on the tsleep doesn't do anything. thanks to patrick for helping me on the acpi side of things so we could figure this out. i tried deferring basically all of acpitz_attach to when kthreads are running, and that works well enough to get to userland. is that reasonable? also, shortly after dwiic complains about short reads and the kernel locks up again. i'll have to plug it in and transcribe the exact errors. i think that's a separate problem though. OpenBSD 7.2-current (GENERIC.MP) #1071: Wed Feb 22 17:34:56 MST 2023 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 2018418688 (1924MB) avail mem = 1937928192 (1848MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7a9f4000 (50 entries) bios0: vendor Dell Inc. version "1.2.5" date 08/20/2018 bios0: Dell Inc. Wyse 3040 Thin Client efi0 at bios0: UEFI 2.4 efi0: American Megatrends rev 0x5000b acpi0 at bios0: ACPI 5.0 acpi0: sleep states S0 S4 S5 acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI SSDT HPET SSDT SSDT SSDT LPIT BCFG PRAM CSRT WDAT acpi0: wakeup devices acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.02 MHz, 06-4c-04 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 79MHz cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.03 MHz, 06-4c-04 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu1: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.04 MHz, 06-4c-04 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu2: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.07 MHz, 06-4c-04 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu3: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache cpu3: smt 0, core 3, package 0 ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 115 pins acpimcfg0 at acpi0 acpimcfg0: addr 0xe000, bus 0-255 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 1 (RP01) acpiprt2 at acpi0: bus -1 (RP02) acpiprt3 at acpi0: bus -1 (RP03) acpiprt4 at acpi0: bus -1 (RP04) "INT33A4" at acpi0 not configured dwiic0 at acpi0 I2C7 addr 0x9151e000/0x1000 irq 38 iic0 at dwiic0 chvgpio0 at acpi0 GPO1 uid 2 addr 0xfed88000/0x8000 irq 48, 59 pins tipmic0 at iic0 addr 0x5e gpio 15 acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001 com0 at acpi0 IURT
Re: deadlock in ifconfig
On Tue, Nov 22, 2022 at 10:49:35AM +1000, David Gwynne wrote: > On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote: > > > Date: Mon, 21 Nov 2022 20:28:35 +0100 > > > From: Alexander Bluhm > > > > > > Hi, > > > > > > Some of my test machines hang while booting userland. > > > > > > starting network > > > -> here it hangs > > > load: 0.02 cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k > > > > > > ddb shows these two processes. > > > > > > 81303 375320 89140 0 3 0x3 sbar ifconfig > > > 48135 157353 0 0 3 0x14200 netlock systqmp > > > > > > ddb{0}> trace /t 0t375320 > > > sleep_finish(800022d31318,1) at sleep_finish+0xfe > > > cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54 > > > sched_barrier(800022512ff0) at sched_barrier+0x73 > > > ixgbe_stop(80118000) at ixgbe_stop+0x1f7 > > > ixgbe_init(80118000) at ixgbe_init+0x32 > > > ixgbe_ioctl(80118048,8020690c,8022ec00) at > > > ixgbe_ioctl+0x13a > > > in_ifinit(80118048,8022ec00,800022d31740,1) at > > > in_ifinit+0x > > > ef > > > in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at > > > in_ioct > > > l_change_ifaddr+0x3a4 > > > in_control(fd81901dc740,8040691a,800022d31730,80118048) > > > at in_c > > > ontrol+0x75 > > > ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at > > > ifioctl > > > +0x982 > > > sys_ioctl(800022d6,800022d31840,800022d318a0) at > > > sys_ioctl+0x2c > > > 4 > > > syscall(800022d31910) at syscall+0x384 > > > Xsyscall() at Xsyscall+0x128 > > > end of kernel > > > end trace frame: 0x7f7d94a0, count: -13 > > > > > > ddb{0}> trace /t 0t157353 > > > sleep_finish(800022ca8b70,1) at sleep_finish+0xfe > > > rw_enter(822b4f80,1) at rw_enter+0x1cb > > > pf_purge(0) at pf_purge+0x1d > > > taskq_thread(822ac568) at taskq_thread+0x100 > > > end trace frame: 0x0, count: -4 > > > > > > ifconfig waits for the sched_barrier_task() on the systqmp task > > > queue while holding the netlock. pf_purge() runs on the systqmp > > > task queue and is waiting for the netlock. The netlock has been > > > taken by ifconfig in in_ioctl_change_ifaddr(). > > > > > > The problem has been introduced when pf_purge() was moved from systq > > > to systqmp. > > > https://marc.info/?l=openbsd-cvs=166818274216800=2 > > > > I'd say pfpurge should be moved to itw own taskq. > > we're working toward dropping the need for NET_LOCK before PF_LOCK. could > we try the diff below as a compromise? > sashan@ and mvs@ have pushed that forward, so this diff should be enough now. Index: pf.c === RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.1153 diff -u -p -r1.1153 pf.c --- pf.c12 Nov 2022 02:48:14 - 1.1153 +++ pf.c24 Nov 2022 01:21:48 - @@ -1603,9 +1603,6 @@ pf_purge(void *null) { unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]); - /* XXX is NET_LOCK necessary? */ - NET_LOCK(); - PF_LOCK(); pf_purge_expired_src_nodes(); @@ -1616,7 +1613,6 @@ pf_purge(void *null) * Fragments don't require PF_LOCK(), they use their own lock. */ pf_purge_expired_fragments(); - NET_UNLOCK(); /* interpret the interval as idle time between runs */ timeout_add_sec(_purge_to, interval); @@ -1891,7 +1887,6 @@ pf_purge_expired_states(const unsigned i if (SLIST_EMPTY()) return (scanned); - NET_LOCK(); rw_enter_write(_state_list.pfs_rwl); PF_LOCK(); PF_STATE_ENTER_WRITE(); @@ -1904,7 +1899,6 @@ pf_purge_expired_states(const unsigned i PF_STATE_EXIT_WRITE(); PF_UNLOCK(); rw_exit_write(_state_list.pfs_rwl); - NET_UNLOCK(); while ((st = SLIST_FIRST()) != NULL) { SLIST_REMOVE_HEAD(, gc_list);
Re: deadlock in ifconfig
On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote: > > Date: Mon, 21 Nov 2022 20:28:35 +0100 > > From: Alexander Bluhm > > > > Hi, > > > > Some of my test machines hang while booting userland. > > > > starting network > > -> here it hangs > > load: 0.02 cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k > > > > ddb shows these two processes. > > > > 81303 375320 89140 0 3 0x3 sbar ifconfig > > 48135 157353 0 0 3 0x14200 netlock systqmp > > > > ddb{0}> trace /t 0t375320 > > sleep_finish(800022d31318,1) at sleep_finish+0xfe > > cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54 > > sched_barrier(800022512ff0) at sched_barrier+0x73 > > ixgbe_stop(80118000) at ixgbe_stop+0x1f7 > > ixgbe_init(80118000) at ixgbe_init+0x32 > > ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a > > in_ifinit(80118048,8022ec00,800022d31740,1) at > > in_ifinit+0x > > ef > > in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at > > in_ioct > > l_change_ifaddr+0x3a4 > > in_control(fd81901dc740,8040691a,800022d31730,80118048) at > > in_c > > ontrol+0x75 > > ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at > > ifioctl > > +0x982 > > sys_ioctl(800022d6,800022d31840,800022d318a0) at > > sys_ioctl+0x2c > > 4 > > syscall(800022d31910) at syscall+0x384 > > Xsyscall() at Xsyscall+0x128 > > end of kernel > > end trace frame: 0x7f7d94a0, count: -13 > > > > ddb{0}> trace /t 0t157353 > > sleep_finish(800022ca8b70,1) at sleep_finish+0xfe > > rw_enter(822b4f80,1) at rw_enter+0x1cb > > pf_purge(0) at pf_purge+0x1d > > taskq_thread(822ac568) at taskq_thread+0x100 > > end trace frame: 0x0, count: -4 > > > > ifconfig waits for the sched_barrier_task() on the systqmp task > > queue while holding the netlock. pf_purge() runs on the systqmp > > task queue and is waiting for the netlock. The netlock has been > > taken by ifconfig in in_ioctl_change_ifaddr(). > > > > The problem has been introduced when pf_purge() was moved from systq > > to systqmp. > > https://marc.info/?l=openbsd-cvs=166818274216800=2 > > I'd say pfpurge should be moved to itw own taskq. we're working toward dropping the need for NET_LOCK before PF_LOCK. could we try the diff below as a compromise? > ixgb(4) holding netlock while calling sched_barrier() is probably > wrong too. it's pretty baked in at this point that the SIOCSIFFLAGS ioctl is called while holding the net lock, and we've been saying for ages that you can clear IFF_RUNNING and then call intr_barrier (and lots of other barriers) to wait for things to get off the rings before tearing them down. long term, drivers should protect themselves. we're nowhere near that point though. Index: pf.c === RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.1153 diff -u -p -r1.1153 pf.c --- pf.c12 Nov 2022 02:48:14 - 1.1153 +++ pf.c22 Nov 2022 00:29:30 - @@ -1603,20 +1620,14 @@ pf_purge(void *null) { unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]); - /* XXX is NET_LOCK necessary? */ - NET_LOCK(); - - PF_LOCK(); - + rw_enter_write(_lock); /* PF_LOCK() without NET_LOCK() */ pf_purge_expired_src_nodes(); - - PF_UNLOCK(); + rw_exit_write(_lock); /* PF_UNLOCK() without NET_LOCK() */ /* * Fragments don't require PF_LOCK(), they use their own lock. */ pf_purge_expired_fragments(); - NET_UNLOCK(); /* interpret the interval as idle time between runs */ timeout_add_sec(_purge_to, interval); @@ -1891,9 +1902,8 @@ pf_purge_expired_states(const unsigned i if (SLIST_EMPTY()) return (scanned); - NET_LOCK(); rw_enter_write(_state_list.pfs_rwl); - PF_LOCK(); + rw_enter_write(_lock); /* PF_LOCK() without NET_LOCK() */ PF_STATE_ENTER_WRITE(); SLIST_FOREACH(st, , gc_list) { if (st->timeout != PFTM_UNLINKED) @@ -1902,9 +1912,8 @@ pf_purge_expired_states(const unsigned i pf_free_state(st); } PF_STATE_EXIT_WRITE(); - PF_UNLOCK(); + rw_exit_write(_lock); /* PF_UNLOCK() without NET_LOCK() */ rw_exit_write(_state_list.pfs_rwl); - NET_UNLOCK(); while ((st = SLIST_FIRST()) != NULL) { SLIST_REMOVE_HEAD(, gc_list);
Re: pf panic with clean snapshot (GENERIC.MP) #570
i upgraded one of the work firewalls to -current and added the diff below in, and got what looks like a different panic: ddb{6}> tr db_enter() at db_enter+0x5 panic(81e2cc31) at panic+0xbf __assert(81eae23b,81ee4549,797,81e7fd91) at __assert+0x25 pfsync_insert_state(fd816375b720) at pfsync_insert_state+0xec pf_state_insert(801f,800024cd9cc0,800024cd9cb8,fd816375b720) at pf_state_insert+0x2df pf_test_rule(800024cd9e50,800024cd9e38,800024cd9e30,800024cd9e40,800024cd9e28,800024cd9e4e,1) at pf_test_rule+0xec4 pf_test(2,3,816ab800,800024cd9fe8) at pf_test+0x1126 ip_output(fd8052dae400,0,800024cda0b0,1,0,0,81640801) at ip_out put+0x72a ip_forward(fd8052dae400,81640800,fd81840e2ad0,0) at ip_forward+0x286 ip_input_if(800024cda2e0,800024cda2dc,4,0,81640800) at ip_input_if+0x347 ipv4_input(81640800,fd8052dae400) at ipv4_input+0x37 ether_input(81640800,fd8052dae400) at ether_input+0x394 carp_input(816a2000,fd8052dae400,5e000156) at carp_input+0x186 ether_input(816a2000,fd8052dae400) at ether_input+0x1c5 vlan_input(8161a000,fd8052dae400,800024cda4fc) at vlan_input+0x22d ether_input(8161a000,fd8052dae400) at ether_input+0x83 if_input_process(8019a048,800024cda588) at if_input_process+0x4a ifiq_process(801f0800) at ifiq_process+0x8e taskq_thread(8002c080) at taskq_thread+0xfa end trace frame: 0x0, count: -19 ddb{6}> sh panic *cpu6: kernel diagnostic assertion "st->sync_state == PFSYNC_S_NONE" failed: file "/usr/src/sys/net/if_pfsync.c", line 1943 i'll try and have a look at it. i am probably most responsible for the code :( On Wed, Jun 08, 2022 at 12:42:33AM +0200, Alexandr Nedvedicky wrote: > Hello Hrvoje, > > > > > > Hi, > > > > while booting with this diff I've got this log: > > > > starting early daemons: syslogd pflogd ntpdwitness: lock_object > > uninitialized: 0xfd8785c81a > > 90 > > Starting stack trace... > > witness_checkorder(fd8785c81a90,9,0) at witness_checkorder+0xad > > mtx_enter(fd8785c81a80) at mtx_enter+0x34 > > pf_remove_state(fd8785c81988) at pf_remove_state+0x1da > > pfsync_in_del_c(fd80028977b0,c,2,2) at pfsync_in_del_c+0x9f > > pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c > > ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103 > > ip_local(800020b056e8,800020b056f4,fe007fff0220,0) at > > ip_local+0x1b7 > > ipintr() at ipintr+0x5f > > if_netisr(0) at if_netisr+0xca > > taskq_thread(80036000) at taskq_thread+0x11a > > thanks for quick test with pfsync. it has turned out I've forgot to > initialize > a pf_state::mtx in pfsync_state_import() function. > > below is updated diff, which should fix a stack trace reported by witness. > > thanks and > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c > index c78ca62766e..b039a50a7bb 100644 > --- a/sys/net/if_pfsync.c > +++ b/sys/net/if_pfsync.c > @@ -157,16 +157,16 @@ struct { > }; > > struct pfsync_q { > - void(*write)(struct pf_state *, void *); > + int (*write)(struct pf_state *, void *); > size_t len; > u_int8_taction; > }; > > /* we have one of these for every PFSYNC_S_ */ > -void pfsync_out_state(struct pf_state *, void *); > -void pfsync_out_iack(struct pf_state *, void *); > -void pfsync_out_upd_c(struct pf_state *, void *); > -void pfsync_out_del(struct pf_state *, void *); > +int pfsync_out_state(struct pf_state *, void *); > +int pfsync_out_iack(struct pf_state *, void *); > +int pfsync_out_upd_c(struct pf_state *, void *); > +int pfsync_out_del(struct pf_state *, void *); > > struct pfsync_q pfsync_qs[] = { > { pfsync_out_iack, sizeof(struct pfsync_ins_ack), PFSYNC_ACT_INS_ACK }, > @@ -516,10 +516,10 @@ pfsync_alloc_scrub_memory(struct pfsync_state_peer *s, > return (0); > } > > -void > +int > pfsync_state_export(struct pfsync_state *sp, struct pf_state *st) > { > - pf_state_export(sp, st); > + return (pf_state_export(sp, st)); > } > > int > @@ -680,6 +680,7 @@ pfsync_state_import(struct pfsync_state *sp, int flags) > st->sync_state = PFSYNC_S_NONE; > > refcnt_init(>refcnt); > + mtx_init(>mtx, IPL_NET); > > /* XXX when we have anchors, use STATE_INC_COUNTERS */ > r->states_cur++; > @@ -1529,24 +1530,25 @@ pfsyncioctl(struct ifnet *ifp, u_long cmd, caddr_t > data) > return (0); > } > > -void > +int > pfsync_out_state(struct pf_state *st, void *buf) > { > struct pfsync_state *sp = buf; > > - pfsync_state_export(sp, st); > + return (pfsync_state_export(sp, st)); > } > > -void > +int > pfsync_out_iack(struct pf_state
Re: relayd panic
> On 6 Jun 2022, at 18:08, Claudio Jeker wrote: > > On Mon, Jun 06, 2022 at 12:03:06AM +0200, Alexandr Nedvedicky wrote: >> Hello, >> >> I'll commit one-liner diff on Tuesday morning (Jun 6th) Prague time without >> explicit OK, unless there will be no objection. > > The diff is OK claudio@. ok by me too. > >> regards >> sashan >> >> >> On Sun, Jun 05, 2022 at 09:44:45AM +0100, Stuart Henderson wrote: >>> I don't know this code well enough to give a meaningful OK, but this >>> should probably get committed. >>> >>> >>> On 2022/06/01 09:16, Alexandr Nedvedicky wrote: Hello, > r420-1# rcctl -f start relayd > relayd(ok) > r420-1# uvm_fault(0xfd862f82f990, 0x0, 0, 1) -> e > kernel: page fault trap, code=0 > Stopped at pf_find_or_create_ruleset+0x1c: movb 0(%rdi),%al > TID PID UID PRFLAGS PFLAGS CPU COMMAND > 431388 19003 0 0x2 0 5 relayd > 174608 32253 89 0x112 0 2 relayd > 395415 12468 0 0x2 0 4 relayd > 493579 11904 0 0x2 0 3 relayd > *101082 14967 89 0x1100012 0 0K relayd > pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c > pfr_add_tables(832d7cca800,1,80eaf43c,1000) at > pfr_add_tables+0x6ae > > pfioctl(4900,c450443d,80eaf000,3,80002272e7f0) at > pfioctl+0x1d9f > VOP_IOCTL(fd8551f82dd0,c450443d,80eaf000,3,fd862f7d60c0,800 > 02272e7f0) at VOP_IOCTL+0x5c > vn_ioctl(fd855ecec1e8,c450443d,80eaf000,80002272e7f0) at > vn_ioctl+0x75 > sys_ioctl(80002272e7f0,8000227d9980,8000227d99d0) at > sys_ioctl+0x2c4 > syscall(8000227d9a40) at syscall+0x374 > Xsyscall() at Xsyscall+0x128 > end of kernel it looks like we are dying here at line 239 due to NULL pointer deference: 232 struct pf_ruleset * 233 pf_find_or_create_ruleset(const char *path) 234 { 235 char *p, *aname, *r; 236 struct pf_ruleset *ruleset; 237 struct pf_anchor *anchor; 238 239 if (path[0] == 0) 240 return (_main_ruleset); 241 242 while (*path == '/') 243 path++; 244 I've followed the same steps to reproduce the issue to check if diff below resolves the issue. The bug has been introduced by my recent change to pf_table.c [1] from May 10th: Modified files: sys/net : pf_ioctl.c pf_table.c Log message: move memory allocations in pfr_add_tables() out of NET_LOCK()/PF_LOCK() scope. bluhm@ helped a lot to put this diff into shape. besides using a regression test I've also did simple testing using a 'load anchor': netlock# cat /tmp/anchor.conf load anchor "test" from "/tmp/pf.conf" netlock# netlock# cat /tmp/pf.conf table { 192.168.1.1 } pass from netlock# netlock# pfctl -sA test netlock# pfctl -a test -sT try netlock# pfctl -a test -t try -T show 192.168.1.1 OK to commit fix below? thanks and regards sashan [1] https://urldefense.com/v3/__https://marc.info/?l=openbsd-cvs=16522243003=2__;!!ACWV5N9M2RV99hQ!LsTJPPsMku6N_u9xzJu6Tj6XpZWyLzLWPmbWr-Z-p845Y8r6LH4Ul8PyX8EmqI6alhF0JqadpBBF4mn53v-rQdY$ 8<---8<---8<--8< diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c index 8315ea5dd3a..dfc49de5efe 100644 --- a/sys/net/pf_table.c +++ b/sys/net/pf_table.c @@ -1628,8 +1628,7 @@ pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags) if (r != NULL) continue; - q->pfrkt_rs = pf_find_or_create_ruleset( - q->pfrkt_root->pfrkt_anchor); + q->pfrkt_rs = pf_find_or_create_ruleset(q->pfrkt_anchor); /* * root tables are attached to main ruleset, * because ->pfrkt_anchor[0] == '\0' >> > > -- > :wq Claudio
Re: [External] : Re: ip6 forwarding with pf and pfsync over veb/vport
> On 24 May 2022, at 17:01, Alexandr Nedvedicky > wrote: > > Hello Hrvoje, > > On Mon, May 23, 2022 at 06:34:07PM +0200, Hrvoje Popovski wrote: >> On 23.5.2022. 10:41, Hrvoje Popovski wrote: >>> On 23.5.2022. 8:34, Alexandr Nedvedicky wrote: looks like kind of memory corruption. my bet is use-after-free. will try to get to it later today. does it mean there is no such panic, when we handle IPv4 traffic only? >>> >>> Hi, >>> >>> yes, it seems that i can't trigger panic with ip4 only traffic, at least >>> the same way i can with ip6 traffic >>> >> >> All day I'm trying to trigger panic with ip4 and I just can't > >interesting. I went through mbuf handling in if_veb.c >I just could find a single nit, which is most likely unrelated, >however I think it's still worth to give it a try a diff below. > >basically all calls to veb_pf() read as follows: > m = veb_pf(ifp, ..., m); >except the one in veb_broadcast(), which readsa as: > m = veb_pf(ifp, ..., m0); >I think it is a bug, veb_pf() caller should continue to run >with packet returned by veb_pf(). yes. ok by me. can you fix the same thing in the ipsec handling too? > > thanks and > regards > sashan > > 8<---8<---8<--8< > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c > index 67185fde228..30a002f95a2 100644 > --- a/sys/net/if_veb.c > +++ b/sys/net/if_veb.c > @@ -944,7 +944,7 @@ veb_broadcast(struct veb_softc *sc, struct veb_port *rp, > struct mbuf *m0, >* let pf look at it, but use the veb interface as a proxy. >*/ > if (ISSET(ifp->if_flags, IFF_LINK1) && > - (m = veb_pf(ifp, PF_OUT, m0)) == NULL) > + (m0 = veb_pf(ifp, PF_OUT, m0)) == NULL) > return; > #endif > >
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > >>> I think the diff below may be enough to fix this issue. It drops the SMR > >>> critical secition around the enqueue operation but uses a reference on the > >>> port insteadt to ensure that the device can't be removed during the > >>> enqueue. Once the enqueue is finished we enter the SMR critical section > >>> again and drop the reference. > >>> > >>> To make it clear that the SMR_TAILQ remains intact while a refcount is > >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is > >>> not strictly needed since the next pointer remains valid up until the > >>> smr_barrier() call but I find this a bit easier to understand. > >>> First make sure nobody else holds a reference then remove the port from > >>> the list. > >>> > >>> I currently have no test setup to verify this but I hope someone else can > >>> give this a spin. > >> Hi, > >> > >> for now veb seems stable and i can't panic box although it should, but > >> please give me few more hours to torture it properly. > > > > > > I can trigger panic in veb with this diff. > > > > Thank you .. > > > > > > I can't trigger ... :))) sorry .. sorry i'm late to the party. can you try this diff? this diff replaces the list of ports with an array/map of ports. the map takes references to all the ports, so the forwarding paths just have to hold a reference to the map to be able to use all the ports. the forwarding path uses smr to get hold of a map, takes a map ref, and then leaves the smr crit section before iterating over the map and pushing packets. this means we should only take and release a single refcnt when we're pushing packets out any number of ports. if no span ports are configured, then there's no span port map and we don't try and take a ref, we can just return early. we also only take and release a single refcnt when we forward the actual packet. forwarding to a single port provided by an etherbridge lookup already takes/releases the single port ref. if it falls through that for unknown unicast or broadcast/multicast, then it's a single refcnt for the current map of all ports. Index: if_veb.c === RCS file: /cvs/src/sys/net/if_veb.c,v retrieving revision 1.25 diff -u -p -r1.25 if_veb.c --- if_veb.c4 Jan 2022 06:32:39 - 1.25 +++ if_veb.c13 May 2022 02:01:43 - @@ -139,13 +139,13 @@ struct veb_port { struct veb_rule_list p_vr_list[2]; #define VEB_RULE_LIST_OUT 0 #define VEB_RULE_LIST_IN 1 - - SMR_TAILQ_ENTRY(veb_port)p_entry; }; struct veb_ports { - SMR_TAILQ_HEAD(, veb_port) l_list; - unsigned int l_count; + struct refcntm_refs; + unsigned int m_count; + + /* followed by an array of veb_port pointers */ }; struct veb_softc { @@ -155,8 +155,8 @@ struct veb_softc { struct etherbridge sc_eb; struct rwlocksc_rule_lock; - struct veb_ports sc_ports; - struct veb_ports sc_spans; + struct veb_ports*sc_ports; + struct veb_ports*sc_spans; }; #define DPRINTF(_sc, fmt...)do { \ @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u static int veb_p_output(struct ifnet *, struct mbuf *, struct sockaddr *, struct rtentry *); -static voidveb_p_dtor(struct veb_softc *, struct veb_port *, - const char *); +static inline size_t +veb_ports_size(unsigned int n) +{ + /* use of _ALIGN is inspired by CMSGs */ + return _ALIGN(sizeof(struct veb_ports)) + + n * sizeof(struct veb_port *); +} + +static inline struct veb_port ** +veb_ports_array(struct veb_ports *m) +{ + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); +} + +static voidveb_ports_free(struct veb_ports *); + +static voidveb_p_unlink(struct veb_softc *, struct veb_port *); +static voidveb_p_fini(struct veb_port *); +static voidveb_p_dtor(struct veb_softc *, struct veb_port *); static int veb_add_port(struct veb_softc *, const struct ifbreq *, unsigned int); static int veb_del_port(struct veb_softc *, @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i return (ENOMEM); rw_init(>sc_rule_lock, "vebrlk"); - SMR_TAILQ_INIT(>sc_ports.l_list); - SMR_TAILQ_INIT(>sc_spans.l_list); + sc->sc_ports = NULL; + sc->sc_spans = NULL; ifp = >sc_if; @@ -314,7 +331,10 @@ static int veb_clone_destroy(struct ifnet *ifp) { struct veb_softc *sc = ifp->if_softc; - struct
Re: route sourceaddr and ping/traceroute
On Sat, Mar 19, 2022 at 12:53:54PM +0100, Claudio Jeker wrote: > On Sat, Mar 19, 2022 at 12:27:43PM +0100, Claudio Jeker wrote: > > On Sat, Mar 19, 2022 at 10:51:13AM +, Stuart Henderson wrote: > > > On 2022/03/19 19:14, David Gwynne wrote: > > > > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote: > > > > > This should not be done in applications. The kernel must do it. It > > > > > means > > > > > the current kernel code is worng. > > > > > > > > i think this is the right place for raw ipv4 sockets like what > > > > ping/traceroute uses. > > > > > > > > ipv6 looks like it already does the right thing. > > > > > > Given that this turned out so similar to the existing v6 support that > > > I think you didn't notice before writing the v4 version, this looks > > > like the right place indeed :) > > > > > > Works for me, OK > > > > Did someone try this on an ospf router? > > I think our ospfd(8) always includes the source address and uses IP_HDRINCL > > but not sure about other daemons. Are there other raw IP users that expect > > the IP source to be set to the outgoing interface by default? > > Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to > > set the source IP address. > > > > Looking at the code I think this will break the IP_MULTICAST_IF case. > > In ip_output() the check is like this: > > > > if ((IN_MULTICAST(ip->ip_dst.s_addr) || > > (ip->ip_dst.s_addr == INADDR_BROADCAST)) && > > imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) { > > > > mtu = ifp->if_mtu; > > if (ip->ip_src.s_addr == INADDR_ANY) { > > struct in_ifaddr *ia; > > > > IFP_TO_IA(ifp, ia); > > if (ia != NULL) > > ip->ip_src = ia->ia_addr.sin_addr; > > } > > > > Now raw_ip.c already set ip_src to something so > > if (ip->ip_src.s_addr == INADDR_ANY) > > is never true. > > > > It is possible to skip the in_pcbselsrc() call in rip_output() when > > IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST) > > > > Is that good enough? > > Actually in_pcbselsrc() checks the multicast options but only for the > IN_MULTICAST() case. I guess we could add INADDR_BROADCAST handling in > there as well. Seems like a sensible thing todo. Broadcast is just a very > special version of multicast. i know we talked about this off list, but for the record there are two kinds of IP broadcast. there's 255.255.255.255, and the broadcast address on subnets, eg, 192.168.1.0/24 has a broadcast address of 192.168.1.255. you're talking about 255.255.255.255 here. im honestly surprised in_pcbselsrc doesnt already do what you're talking about. like i said, the impression i get from stevens and other bits of the stack is that 255.255.255.255 is largely treated as a multicast address, which is why im surprised that it isnt handled already and why i think it makes sense. what tests do we need to feel confident with it though? > Also if this is changed is there still a way to have ip->ip_src == > INADDR_ANY in ip_output()? dunno. would slapping a counter on it be useful? Index: in_pcb.c === RCS file: /cvs/src/sys/netinet/in_pcb.c,v retrieving revision 1.261 diff -u -p -r1.261 in_pcb.c --- in_pcb.c14 Mar 2022 22:38:43 - 1.261 +++ in_pcb.c19 Mar 2022 12:54:55 - @@ -893,11 +893,13 @@ in_pcbselsrc(struct in_addr **insrc, str } /* -* If the destination address is multicast and an outgoing -* interface has been set as a multicast option, use the -* address of that interface as our source address. +* If the destination address is multicast or limited +* broadcast (255.255.255.255) and an outgoing interface has +* been set as a multicast option, use the address of that +* interface as our source address. */ - if (IN_MULTICAST(sin->sin_addr.s_addr) && mopts != NULL) { + if ((IN_MULTICAST(sin->sin_addr.s_addr) || + sin->sin_addr.s_addr == INADDR_BROADCAST) && mopts != NULL) { struct ifnet *ifp; ifp = if_get(mopts->imo_ifidx);
Re: route sourceaddr and ping/traceroute
On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote: > This should not be done in applications. The kernel must do it. It means > the current kernel code is worng. i think this is the right place for raw ipv4 sockets like what ping/traceroute uses. ipv6 looks like it already does the right thing. Index: raw_ip.c === RCS file: /cvs/src/sys/netinet/raw_ip.c,v retrieving revision 1.123 diff -u -p -r1.123 raw_ip.c --- raw_ip.c14 Mar 2022 22:38:43 - 1.123 +++ raw_ip.c19 Mar 2022 03:40:44 - @@ -222,6 +222,7 @@ int rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr, struct mbuf *control) { + struct sockaddr_in *dst = satosin(dstaddr); struct ip *ip; struct inpcb *inp; int flags, error; @@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket ip->ip_off = htons(0); ip->ip_p = inp->inp_ip.ip_p; ip->ip_len = htons(m->m_pkthdr.len); - ip->ip_src = inp->inp_laddr; - ip->ip_dst = satosin(dstaddr)->sin_addr; + ip->ip_src.s_addr = INADDR_ANY; + ip->ip_dst = dst->sin_addr; ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL; } else { if (m->m_pkthdr.len > IP_MAXPACKET) { @@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket ip = mtod(m, struct ip *); if (ip->ip_id == 0) ip->ip_id = htons(ip_randomid()); + dst->sin_addr = ip->ip_dst; /* XXX prevent ip_output from overwriting header fields */ flags |= IP_RAWOUTPUT; ipstat_inc(ips_rawout); } + + if (ip->ip_src.s_addr == INADDR_ANY) { + struct in_addr *laddr; + + error = in_pcbselsrc(, dst, inp); + if (error != 0) + return (error); + + ip->ip_src = *laddr; + } + #ifdef INET6 /* * A thought: Even though raw IP shouldn't be able to set IPv6
Re: UDP divert-to rule: getsockname(2) won't show original destination
> On 23 Feb 2022, at 02:12, K R wrote: > > Hi David, > > On Tue, Feb 22, 2022 at 5:27 AM David Gwynne wrote: > > > > On 22 Feb 2022, at 06:31, K R wrote: > > > >> Synopsis: UDP divert-to rule: getsockname(2) won't show original > > destination > >> Category: kernel amd64 > >> Environment: > >System : OpenBSD 7.1-beta > >Details : OpenBSD 7.1-beta (GENERIC) #353: Sun Feb 20 17:14:05 > > MST 2022 > > > >Architecture: OpenBSD.amd64 > >Machine : amd64 > >> Description: > > > > getsockname(2) won't show the original destination address/port for a > > UDP inet packet redirected using a PF divert-to rule to a local > > socket. > > > > This works as expected for TCP. > > > >> How-To-Repeat: > > > > server: > > > > (pf.conf) > > pass in on vio0 inet proto udp from any to 100.64.0.100 divert-to 127.0.0.1 > > port 9000 > > > >>>> import socket > >>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) > >>>> s.bind(("127.0.0.1", 9000)) > >>>> s.recvfrom(1024) > > (b'data\n', ('100.64.0.1', 16079)) > >>>> s.getsockname() > > ('127.0.0.1', 9000) > > > > client: > > > > $ echo data | nc -u 100.64.0.100 12345 > > > >> Fix: > >Unknown. > > This is working as expected for UDP, which is a datagram socket, not a > connected TCP stream socket like what you're trying to compare it to. A > locally bound but not connected UDP socket will not keep information about > received packets on it, you have to get all that from the messages as they're > being received. > > If you want the original destination address for that message, you have to > ask for it as part of receiving the message. For IPv4 you do that by setting > the IP_RECVDSTADDR sockopt on the UDP socket, and then using recvmsg() > instead of recvfrom() with space for a control message set up for it to use. > > Thanks, of course, you are right. Now that you mentioned it, I could find > this information on the ip(4) > > If the IP_RECVDSTADDR option is enabled on a SOCK_DGRAM socket, the > recvmsg(2) call will return the destination IP address for a UDP > datagram. The msg_control field in the msghdr structure points to a > buffer that contains a cmsghdr structure followed by the IP address. > [...] > > and pf(4) manpages: > > getsockname(2). For SOCK_DGRAM sockets, the ip(4) socket > options IP_RECVDSTADDR and IP_RECVDSTPORT can be used to > retrieve the destination address and port. > > What would be nice, IMHO, is to make this clear on the pf.conf(5) > manpage when diverto-to is mentioned: > > divert-to host port port > Used to redirect packets to a local socket bound to host and > port. The packets will not be modified, so getsockname(2) on > the socket will return the original destination address of the > packet. agreed. i put something in. > > > src/usr.bin/tftpd/tftpd.c does this if you want some code to refer to. > > I believe it is src/usr.sbin/tftp-proxy/tftp-proxy.c, correct? even though tftpd isn't used with diver-to, it still uses the sockopts and control messages to get destination addresses for tftp requests. tftp-proxy might be better, but i was in tftpd more recently so it was what i remembered first. dlg
Re: UDP divert-to rule: getsockname(2) won't show original destination
> On 22 Feb 2022, at 06:31, K R wrote: > >> Synopsis: UDP divert-to rule: getsockname(2) won't show original > destination >> Category: kernel amd64 >> Environment: >System : OpenBSD 7.1-beta >Details : OpenBSD 7.1-beta (GENERIC) #353: Sun Feb 20 17:14:05 > MST 2022 > >Architecture: OpenBSD.amd64 >Machine : amd64 >> Description: > > getsockname(2) won't show the original destination address/port for a > UDP inet packet redirected using a PF divert-to rule to a local > socket. > > This works as expected for TCP. > >> How-To-Repeat: > > server: > > (pf.conf) > pass in on vio0 inet proto udp from any to 100.64.0.100 divert-to 127.0.0.1 > port 9000 > import socket s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.bind(("127.0.0.1", 9000)) s.recvfrom(1024) > (b'data\n', ('100.64.0.1', 16079)) s.getsockname() > ('127.0.0.1', 9000) > > client: > > $ echo data | nc -u 100.64.0.100 12345 > >> Fix: >Unknown. This is working as expected for UDP, which is a datagram socket, not a connected TCP stream socket like what you're trying to compare it to. A locally bound but not connected UDP socket will not keep information about received packets on it, you have to get all that from the messages as they're being received. If you want the original destination address for that message, you have to ask for it as part of receiving the message. For IPv4 you do that by setting the IP_RECVDSTADDR sockopt on the UDP socket, and then using recvmsg() instead of recvfrom() with space for a control message set up for it to use. src/usr.bin/tftpd/tftpd.c does this if you want some code to refer to. dlg
Re: vxlan broken
On Fri, Feb 18, 2022 at 02:25:45PM +0100, Anton Lindqvist wrote: > On Thu, Feb 17, 2022 at 10:25:19PM +0100, Anton Lindqvist wrote: > > On Thu, Feb 17, 2022 at 09:50:20PM +0100, Alexander Bluhm wrote: > > > Hi, > > > > > > With this snapshot regress/sys/net/vxlan crashes the kernel > > > OpenBSD 7.0-current (GENERIC.MP) #355: Wed Feb 16 13:44:38 MST 2022 > > > > > > START sys/net/vxlan 2022-02-17T08:07:25Z > > > > > > rm -f a.out [Ee]rrs mklog *.core y.tab.h > > > > > > vxlan_1 > > > ksh /usr/src/regress/sys/net/vxlan/vxlan_1.sh -R "11 12" -I "11 12" > > > ifconfig: SIOCSLIFPHYRTABLE: Device busy > > > ifconfig: SIOCSLIFPHYRTABLE: Device busy > > > Timeout, server ot6 not responding. > > > > > > uvm_fault(0xfd8240699668, 0xc, 0, 2) -> e > > > kernel: page fault trap, code=0 > > > Stopped at in_delmulti+0x54: addl$-0x1,0xc(%r14) > > > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > > > *455775 97669 0 0x2 01K ifconfig > > > in_delmulti(0) at in_delmulti+0x54 > > > vxlan_ioctl(80e03000,80206910,80002236d760) at > > > vxlan_ioctl+0xfce > > > ifioctl(fd82765d8e08,80206910,80002236d760,8000221b8008) at > > > ifioctl+0x92b > > > soo_ioctl(fd81408b7ae0,80206910,80002236d760,8000221b8008) at > > > soo_ioctl+0x161 > > > sys_ioctl(8000221b8008,80002236d870,80002236d8c0) at > > > sys_ioctl+0x2c4 > > > syscall(80002236d930) at syscall+0x374 > > > Xsyscall() at Xsyscall+0x128 > > > end of kernel > > > end trace frame: 0x7f7cb4f0, count: 8 > > > > > > The same happens on i386, powerpc64, armv7, arm64, sparc64. > > > > Same here on my amd64-regress machine. > > No panic this time but the tests are failing. > > > sys/net/vxlan: > Exit: 1 > Duration: 00:04:35 > Log: 151-sys-net-vxlan.log vxlan needs a parent interface specified when using a multicast destination address on the tunnel. the first chunk forces arp to have run before letting pure icmp through. it shouldnt be necessary, but i have to dig around the arp code again. Index: vxlan_2.sh === RCS file: /cvs/src/regress/sys/net/vxlan/vxlan_2.sh,v retrieving revision 1.2 diff -u -p -r1.2 vxlan_2.sh --- vxlan_2.sh 30 Nov 2016 22:21:20 - 1.2 +++ vxlan_2.sh 18 Feb 2022 03:34:20 - @@ -22,6 +22,7 @@ do_ping() { local source="$1" local dest="${VXLAN_NETID}${2}" + $PING -q -c 1 -w 1 -V "$source" "$dest" > /dev/null # warm up arp $PING -q -c 3 -w 1 -V "$source" "$dest" | grep -q ' 0.0% packet loss' && return echo "Failed to ping $dest from vstack $source" STATUS=1 @@ -96,7 +97,7 @@ vstack_add() { $SUDO ifconfig "$vstack_pairname" rdomain "$vstack" $IFCONFIG_OPTS $SUDO ifconfig "$vstack_pairname" "$AF" "${vstack_tunsrc}${PAIR_PREFX}" up $SUDO ifconfig "vxlan$vstack" rdomain "$vstack" tunneldomain "$vstack" $IFCONFIG_OPTS - $SUDO ifconfig "vxlan$vstack" vnetid "$VNETID" tunnel "$vstack_tunsrc" "${VXLAN_TUNDST}${tundst_sufx}" up + $SUDO ifconfig "vxlan$vstack" vnetid "$VNETID" tunnel "$vstack_tunsrc" "${VXLAN_TUNDST}${tundst_sufx}" parent "$vstack_pairname" up [[ -n $DYNAMIC ]] && $SUDO ifconfig "bridge$vstack" rdomain "$vstack" add "vxlan$vstack" $IFCONFIG_OPTS up }
Re: msk(4) not working with trunk(4) (Marvell Yukon 88E8042)
On Tue, Dec 28, 2021 at 04:10:52PM +0100, Alessandro De Laurenzis wrote: > Ciao David, > > > msk is the first port added to the trunk? ie, it's the preferred port? if > > you run tcpdump on msk or watch systat if, do you see packets on msk? > > The network config is pretty standard; an Ethernet port (msk0), a wifi one > (iwn0), trunk0 with failover (using msk0 as "preferred" port): > > > $ cat /etc/hostname.trunk0 trunkproto failover > > trunkport msk0 > > trunkport iwn0 > > autoconf > > up > > Bear with me, tcpdump is a kind of stranger world for me... > > Enclosed please find the output files from the following commands: > > > $ doas tcpdump -i trunk0 -c 50 -w trunk0.dump > > $ doas tcpdump -i msk0 -c 50 -w msk0.dump > > $ doas tcpdump -i iwn0 -c 50 -w iwn0.dump > > I see some "broadcast" packages on both trunk0 and msk0 (trunk0 didn't > receive the inet address from the DHCP server, of course); nothing as > expected on iwn0. > > Hope this answers to your question... it was hard to see the trees because of the forest :( im back in the office today, so i dug out a box with msk and was able to reproduce the problem. it looks like the hardware "remembers" stuff between when an msk port is taken down and goes up again, like what trunk does to an interface when it's added as a port. i think the problem would occur if you just took msk down and up again during normal operation too, but that seems to be a rare event... the diff that borked it did so because of how it accounted for free space on the ring, and was otherwise lucky. can you try this diff? Index: if_msk.c === RCS file: /cvs/src/sys/dev/pci/if_msk.c,v retrieving revision 1.136 diff -u -p -r1.136 if_msk.c --- if_msk.c12 Dec 2020 11:48:53 - 1.136 +++ if_msk.c4 Jan 2022 08:06:11 - @@ -135,7 +135,7 @@ int msk_intr(void *); void msk_intr_yukon(struct sk_if_softc *); static inline int msk_rxvalid(struct sk_softc *, u_int32_t, u_int32_t); void msk_rxeof(struct sk_if_softc *, struct mbuf_list *, uint16_t, uint32_t); -void msk_txeof(struct sk_if_softc *); +void msk_txeof(struct sk_if_softc *, unsigned int); static unsigned int msk_encap(struct sk_if_softc *, struct mbuf *, uint32_t); void msk_start(struct ifnet *); int msk_ioctl(struct ifnet *, u_long, caddr_t); @@ -1561,7 +1561,7 @@ msk_watchdog(struct ifnet *ifp) * Reclaim first as there is a possibility of losing Tx completion * interrupts. */ - msk_txeof(sc_if); + //msk_txeof(sc_if, sc_if->sk_cdata.sk_tx_prod); if (sc_if->sk_cdata.sk_tx_prod != sc_if->sk_cdata.sk_tx_cons) { printf("%s: watchdog timeout\n", sc_if->sk_dev.dv_xname); @@ -1639,26 +1639,19 @@ msk_rxeof(struct sk_if_softc *sc_if, str } void -msk_txeof(struct sk_if_softc *sc_if) +msk_txeof(struct sk_if_softc *sc_if, unsigned int prod) { struct ifnet*ifp = _if->arpcom.ac_if; struct sk_softc *sc = sc_if->sk_softc; - uint32_tprod, cons; + uint32_tcons; struct mbuf *m; bus_dmamap_tmap; - bus_size_t reg; - - if (sc_if->sk_port == SK_PORT_A) - reg = SK_STAT_BMU_TXA1_RIDX; - else - reg = SK_STAT_BMU_TXA2_RIDX; /* * Go through our tx ring and free mbufs for those * frames that have been sent. */ cons = sc_if->sk_cdata.sk_tx_cons; - prod = sk_win_read_2(sc, reg); if (cons == prod) return; @@ -1770,7 +1763,7 @@ msk_intr(void *xsc) }; struct ifnet*ifp0 = NULL, *ifp1 = NULL; int claimed = 0; - u_int32_t status; + u_int32_t status, sk_status; struct msk_status_desc *cur_st; status = CSR_READ_4(sc, SK_Y2_ISSR2); @@ -1812,10 +1805,19 @@ msk_intr(void *xsc) lemtoh32(_st->sk_status)); break; case SK_Y2_STOPC_TXSTAT: + sk_status = lemtoh32(_st->sk_status); if (sc_if0) - msk_txeof(sc_if0); - if (sc_if1) - msk_txeof(sc_if1); + msk_txeof(sc_if0, sk_status & 0xfff); + if (sc_if1) { + /* +* this would be easier as a 64bit +* load of the whole status descriptor, +* a shift, and a mask. +*/ + unsigned int cons = (sk_status >> 24) & 0xff; + cons |= (lemtoh16(_st->sk_len) & 0xf) << 8; + msk_txeof(sc_if1, cons); +
Re: CARP load balancing problems under KVM
Hey Carlos, I've spent a bit of time today trying to figure out what's going on here, and haven't found anything that looks wrong with carp in 6.8. I did have a lot of trouble trying to reproduce it though, but that's because some of the switches involved seem to be "helping" and filtering packets sent from a multicast MAC address. I could see the carp interface get arp requests for the shared IP, and reply to them, but I never saw the replies on any other machine. However, I was able to build a test setup with carp on top of nvgre between a bunch of machines, and that abstracted me enough off the physical network to test with. As expected, it all worked fine. The only thing that's changing in your setup is the openbsd version? You're not upgrading the host machines or using different physical switches at the same time or anything? To debug this further I'd like to look at packet captures. Can you tcpump on the carp hosts and the client machines? If possible, captures from a 6.7 setup too would be nice. Cheers, dlg > On 5 Jan 2021, at 1:59 am, Carlos Lopez wrote: > > Good afternoon, > > Any news about this bug? > > On 21/10/20, 12:37, "owner-b...@openbsd.org on behalf of Carlos Lopez" > wrote: > >Hi all, > >Before upgrade from OpenBSD 6.7 to OpenBSD 6.8, my pair firewalls was > using carp in IP balance mode without problems from several months. These > firewalls are installed in a RHEL 8.2 (fully patched) KVM host. > >After upgrading to OpenBSD 6.8, carp ip balance mode doesn’t works. I have > tested reconfiguring balance mode for ip-stealth and ip-unicast also and the > result is always the same: network packets are not processed by firewalls. > But if I configure CARP using “the simple configuration” and one node is > master and the other is backup all it is working without problems. > >All CARP interfaces are configured as this one: > >carpdev vio0 balancing ip pass 7254e4bc3024e35490e4b9942f919e9b >inet 172.22.55.30 0xffe0 172.22.55.31 >carpnodes 10:0,11:100 >description "Production Network" > >sysctl.conf file: > >net.inet.carp.preempt=1 >net.inet.carp.log=2 >net.inet.ip.forwarding=1 >net.inet.tcp.mssdflt=1440 >net.inet.ip.redirect=0 >net.inet.ip.mtudisc=0 >net.inet.tcp.rfc3390=1 >net.inet.ip.arptimeout=60 >kern.bufcachepercent=70 >net.inet.icmp.tstamprepl=0 >net.inet.udp.sendspace=262144 >net.inet.udp.recvspace=262144 > > >OpenBSD kvm guest config: > > > obsdfw01 > OpenBSD Security Gateway Cluster > 786432 > 786432 > 1 > >/machine > > >hvm > > > > > > > >Broadwell > > > > > > > > > > > > > > destroy > restart > destroy > > > > > >/usr/libexec/qemu-kvm > > > > > > > function='0x0'/> > > > > > > > function='0x0'/> > > > > > > > > > function='0x0' multifunction='on'/> > > > > > > function='0x1'/> > > > > > > function='0x2'/> > > > > > > function='0x3'/> > > > > > > function='0x4'/> > > > > > > function='0x5'/> > > > > > > function='0x6'/> > > > > > > function='0x7'/> > > > > > > function='0x0' multifunction='on'/> > > > > > > function='0x1'/> > > > > > > function='0x2'/> > > > > > > function='0x3'/> > > > > > > function='0x4'/> > > > > function='0x2'/> > > > > > > > > function='0x0'/> > > > > > > > > function='0x0'/> > > > > > > > > function='0x0'/> > > > > > > > > function='0x0'/> > > > > > > > > function='0x0'/> > >
Re: tcpdump 'ip6' filter doesn't work on wg0 (wireguard)
On Mon, Jul 20, 2020 at 05:45:52PM +0200, Klemens Nanni wrote: > On Sun, Jul 19, 2020 at 02:24:36PM +0200, Matthieu Herrb wrote: > > Trying to look at IPv6 traffic on my wireguard VPN with > > > > tcpdump -n -i wg0 ip6 > > > > also shows all IPv4 traffic. Other interfacees seem to filter the v6 > > protocol correctly. > This happens for all interfaces without link-layer, e.g. lo(4) as well; > see `tcpdump -c1 -ilo0 ip & ping6 -qc1 ::1'. > > > Any suggestion before I try to dig into the kernel code (which I'm not > > really familiar with) ? > Not yet, but I'm curiously looking at this. kn@ pointed me at this, and we came up with the following. firstly, we narrowed the problem down to pcap not actually looking at the header to decide if a packet was ipv4 or ipv6: $ sudo tcpdump -i wg0 -d ip (000) ret #116 $ sudo tcpdump -i wg0 -d ip6 (000) ret #116 $ sudo tcpdump -i gre0 -d ip (000) ret #116 $ sudo tcpdump -i gre0 -d ip6 (000) ret #116 our tunnel interfaces pretty much all use DLT_LOOP as their link type, so this behaviour is consistent across all of them. why the filter unconditionally matches these packets is because of this stuff in src/lib/libpcap/gencode.c. im including bits for DLT_NULL for comparison: static void init_linktype(type) int type; { [snip] switch (type) { [snip] case DLT_NULL: off_linktype = 0; off_nl = 4; return; [snip] case DLT_LOOP: off_linktype = -1; off_nl = 4; return; [snip] } the actual filter is generated in gen_linktype: static struct block * gen_linktype(proto) int proto; { struct block *b0, *b1; /* If we're not using encapsulation and checking for IP, we're done */ if ((off_linktype == -1 || mpls_stack > 0) && proto == ETHERTYPE_IP) return gen_true(); #ifdef INET6 /* this isn't the right thing to do, but sometimes necessary */ if ((off_linktype == -1 || mpls_stack > 0) && proto == ETHERTYPE_IPV6) return gen_true(); #endif switch (linktype) { [snip] case DLT_LOOP: case DLT_ENC: case DLT_NULL: /* XXX */ if (proto == ETHERTYPE_IP) return (gen_cmp(0, BPF_W, (bpf_int32)htonl(AF_INET))); #ifdef INET6 else if (proto == ETHERTYPE_IPV6) return (gen_cmp(0, BPF_W, (bpf_int32)htonl(AF_INET6))); #endif /* INET6 */ else return gen_false(); break; cos init_linktype sets off_linktype to -1, gen_linktype thinks that DLT_LOOP has no linktype header, and just assumes everything is both ipv4 or ipv6. DLT_LOOP does have a link type header though, so we should fix init_linktypes. this is backed up by https://www.tcpdump.org/linktypes.html. this diff seems to work ok: $ sudo tcpdump -i gre0 -d ip (000) ld [0] (001) jeq #0x200 jt 2jf 3 (002) ret #116 (003) ret #0 $ sudo tcpdump -i gre0 -d ip6 (000) ld [0] (001) jeq #0x1800 jt 2jf 3 (002) ret #116 (003) ret #0 ok? Index: gencode.c === RCS file: /cvs/src/lib/libpcap/gencode.c,v retrieving revision 1.52 diff -u -p -r1.52 gencode.c --- gencode.c 9 Dec 2018 15:07:06 - 1.52 +++ gencode.c 20 Jul 2020 23:59:44 - @@ -770,7 +770,7 @@ init_linktype(type) return; case DLT_LOOP: - off_linktype = -1; + off_linktype = 0; off_nl = 4; return;
Re: Interfaces errors and latency spikes with Intel 82583V
are there any config options on the switch site relating to flow control you can try turning off? are there any counters for pause frames on the switch side too? dlg > On 12 Jun 2020, at 12:16 pm, Gabri Tofano wrote: > > Apparently it is not: > > #ifconfig em0 hwfeatures > em0: flags=808843 mtu 1500 >hwfeatures=36 hardmtu > 9216 >lladdr XX:XX:XX:XX:XX:XX >index 1 priority 0 llprio 3 >groups: egress >media: Ethernet autoselect (1000baseT full-duplex) >status: active >inet XX:XX:XX:XX netmask 0xff00 broadcast XX:XX:XX:XX > > > On 2020-06-11 21:57, David Gwynne wrote: >> Is flow control enabled? Can you try disabling rxpause and txpause? >>> On 12 Jun 2020, at 10:36 am, Gabri Tofano wrote: >>> Yes, this is today without resetting the interface: >>> #netstat -ie >>> NameMtu Network Address Ipkts IerrsOpkts Oerrs >>> Colls >>> em0 1500XX:XX:XX:XX:XX:XX 5351463 1868 3016695 0 >>> 0 >>> em0 1500 XX:XX:XX:XX XX:XX:XX:XX:XX:XX 5351463 1868 3016695 0 >>> 0 >>> em1 1500XX:XX:XX:XX:XX:XX 2839738 0 5147702 0 >>> 0 >>> em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 2839738 0 5147702 0 >>> 0 >>> em2 1500XX:XX:XX:XX:XX:XX46977 044135 0 >>> 0 >>> em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX46977 044135 0 >>> 0 >>> em3*150000:e0:67:10:9d:970 0 0 0 >>> 0 >>> enc0* 00 00 0 >>> 0 >>> pflog0 331360 0 128982 0 >>> 0 >>> On 2020-06-11 20:29, David Gwynne wrote: >>>> Is it consistently Ierrs? >>>> dlg >>>>> On 11 Jun 2020, at 10:14 pm, Gabri Tofano wrote: >>>>> #netstat -id >>>>> NameMtu Network Address Ipkts IdropOpkts Odrop >>>>> Colls >>>>> em0 1500XX:XX:XX:XX:XX:XX 266894 0 202813 0 >>>>> 0 >>>>> em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 266894 0 202813 0 >>>>> 0 >>>>> em1 1500XX:XX:XX:XX:XX:XX 170280 0 230226 1 >>>>> 0 >>>>> em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 170280 0 230226 1 >>>>> 0 >>>>> em2 1500XX:XX:XX:XX:XX:XX15788 013249 2 >>>>> 0 >>>>> em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15788 013249 2 >>>>> 0 >>>>> em3*1500XX:XX:XX:XX:XX:XX0 00 0 >>>>> 0 >>>>> enc0* 00 00 0 >>>>> 0 >>>>> pflog0 331360 029771 0 >>>>> 0 >>>>> #netstat -ie >>>>> NameMtu Network Address Ipkts IerrsOpkts Oerrs >>>>> Colls >>>>> em0 1500XX:XX:XX:XX:XX:XX 26971372 205469 0 >>>>> 0 >>>>> em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 26971372 205469 0 >>>>> 0 >>>>> em1 1500XX:XX:XX:XX:XX:XX 172137 0 232148 0 >>>>> 0 >>>>> em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 172137 0 232148 0 >>>>> 0 >>>>> em2 1500XX:XX:XX:XX:XX:XX15892 013316 0 >>>>> 0 >>>>> em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15892 013316 0 >>>>> 0 >>>>> em3*1500XX:XX:XX:XX:XX:XX0 00 0 >>>>> 0 >>>>> enc0* 00 00 0 >>>>> 0 >>>>> pflog0 331360 030174 0 >>>>> 0 >>>>> #systat queues >>>>> QUEUE BW/FL SCH PKTSBYTES DROP_P >>>>> DROP_B QLEN BORROW SUSPEN P/S B/S >>>>> main on em0 120M fifo000 >>>>> 00 >>>>> defq 100M fifo 139394 215744110 >>>>> 00 >>>>> voip10M fifo34699 49496350 >>>>> 00 >>>>> games 10M fifo32277 24608070 >>>>> 00 >>>>> Thank you! >>>>> Gabri
Re: Interfaces errors and latency spikes with Intel 82583V
Is flow control enabled? Can you try disabling rxpause and txpause? > On 12 Jun 2020, at 10:36 am, Gabri Tofano wrote: > > Yes, this is today without resetting the interface: > > #netstat -ie > NameMtu Network Address Ipkts IerrsOpkts Oerrs > Colls > em0 1500XX:XX:XX:XX:XX:XX 5351463 1868 3016695 0 > 0 > em0 1500 XX:XX:XX:XX XX:XX:XX:XX:XX:XX 5351463 1868 3016695 0 > 0 > em1 1500XX:XX:XX:XX:XX:XX 2839738 0 5147702 0 > 0 > em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 2839738 0 5147702 0 > 0 > em2 1500XX:XX:XX:XX:XX:XX46977 044135 0 > 0 > em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX46977 044135 0 > 0 > em3*150000:e0:67:10:9d:970 00 0 > 0 > enc0* 00 00 0 > 0 > pflog0 33136 0 0 128982 0 > 0 > > > On 2020-06-11 20:29, David Gwynne wrote: >> Is it consistently Ierrs? >> dlg >>> On 11 Jun 2020, at 10:14 pm, Gabri Tofano wrote: >>> #netstat -id >>> NameMtu Network Address Ipkts IdropOpkts Odrop >>> Colls >>> em0 1500XX:XX:XX:XX:XX:XX 266894 0 202813 0 >>> 0 >>> em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 266894 0 202813 0 >>> 0 >>> em1 1500XX:XX:XX:XX:XX:XX 170280 0 230226 1 >>> 0 >>> em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 170280 0 230226 1 >>> 0 >>> em2 1500XX:XX:XX:XX:XX:XX15788 013249 2 >>> 0 >>> em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15788 013249 2 >>> 0 >>> em3*1500XX:XX:XX:XX:XX:XX0 00 0 >>> 0 >>> enc0* 00 00 0 >>> 0 >>> pflog0 331360 029771 0 >>> 0 >>> #netstat -ie >>> NameMtu Network Address Ipkts IerrsOpkts Oerrs >>> Colls >>> em0 1500XX:XX:XX:XX:XX:XX 26971372 205469 0 >>> 0 >>> em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 26971372 205469 0 >>> 0 >>> em1 1500XX:XX:XX:XX:XX:XX 172137 0 232148 0 >>> 0 >>> em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 172137 0 232148 0 >>> 0 >>> em2 1500XX:XX:XX:XX:XX:XX15892 013316 0 >>> 0 >>> em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15892 013316 0 >>> 0 >>> em3*1500XX:XX:XX:XX:XX:XX0 00 0 >>> 0 >>> enc0* 00 00 0 >>> 0 >>> pflog0 331360 030174 0 >>> 0 >>> #systat queues >>> QUEUE BW/FL SCH PKTSBYTES DROP_P >>> DROP_B QLEN BORROW SUSPEN P/S B/S >>> main on em0 120M fifo000 >>> 00 >>> defq 100M fifo 139394 215744110 >>> 00 >>> voip10M fifo34699 49496350 >>> 00 >>> games 10M fifo32277 24608070 >>> 00 >>> Thank you! >>> Gabri
Re: Interfaces errors and latency spikes with Intel 82583V
Is it consistently Ierrs? dlg > On 11 Jun 2020, at 10:14 pm, Gabri Tofano wrote: > > #netstat -id > NameMtu Network Address Ipkts IdropOpkts Odrop > Colls > em0 1500XX:XX:XX:XX:XX:XX 266894 0 202813 0 > 0 > em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 266894 0 202813 0 > 0 > em1 1500XX:XX:XX:XX:XX:XX 170280 0 230226 1 > 0 > em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 170280 0 230226 1 > 0 > em2 1500XX:XX:XX:XX:XX:XX15788 013249 2 > 0 > em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15788 013249 2 > 0 > em3*1500XX:XX:XX:XX:XX:XX0 00 0 > 0 > enc0* 00 00 0 > 0 > pflog0 331360 029771 0 > 0 > > #netstat -ie > NameMtu Network Address Ipkts IerrsOpkts Oerrs > Colls > em0 1500XX:XX:XX:XX:XX:XX 26971372 205469 0 > 0 > em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX 26971372 205469 0 > 0 > em1 1500XX:XX:XX:XX:XX:XX 172137 0 232148 0 > 0 > em1 1500 172.16.200. XX:XX:XX:XX:XX:XX 172137 0 232148 0 > 0 > em2 1500XX:XX:XX:XX:XX:XX15892 013316 0 > 0 > em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX15892 013316 0 > 0 > em3*1500XX:XX:XX:XX:XX:XX0 00 0 > 0 > enc0* 00 00 0 > 0 > pflog0 331360 030174 0 > 0 > > > #systat queues > QUEUE BW/FL SCH PKTSBYTES DROP_P DROP_B > QLEN BORROW SUSPEN P/S B/S > main on em0 120M fifo0000 >0 > defq 100M fifo 139394 2157441100 > 0 > voip10M fifo34699 494963500 > 0 > games 10M fifo32277 246080700 > 0 > > Thank you! > Gabri
Re: Interfaces errors and latency spikes with Intel 82583V
The Ifail and Ofail columns are a sum of queue drops and errors. Could you run that netstat command with -d and -e so we can see the drops and errors separately? Cheers, dlg > On 11 Jun 2020, at 2:21 pm, Gabri Tofano wrote: > > After extensive testing the latency spikes shown up again: > > To the inside interface of the firewall: > > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time=132ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > Reply from 172.16.200.1: bytes=32 time<1ms TTL=254 > > And to the firewall's next hop (ISP ONT) at the same time: > > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=3ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=3ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=242ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=2ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=1ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=3ms TTL=62 > Reply from 74.215.235.1: bytes=32 time=3ms TTL=62 > > Interface errors are now showing up just on the output: > > #netstat -i > NameMtu Network Address Ipkts IfailOpkts Ofail > Colls > em0 1500XX:XX:XX:XX:XX:XX22655 041589 0 > 0 > em0 1500 XX.XX.XX.XX XX:XX:XX:XX:XX:XX22655 041589 0 > 0 > em1 1500XX:XX:XX:XX:XX:XX39924 020476 1 > 0 > em1 1500 172.16.200. XX:XX:XX:XX:XX:XX39924 020476 1 > 0 > em2 1500XX:XX:XX:XX:XX:XX 427 0 330 2 > 0 > em2 1500 172.16.103/ XX:XX:XX:XX:XX:XX 427 0 330 2 > 0 > em3*1500XX:XX:XX:XX:XX:XX0 00 0 > 0 > enc0* 00 00 0 > 0 > pflog0 331360 0 1294 0 > 0 > > UDP real time traffic is the most affected one as very sensitive and I keep \ > having spikes meanwhile playing online. > > Thank you! > Gabri > > On 2020-06-10 22:50, Gabri Tofano wrote: >> Another user pointed out to me that in the OpenBSD 6.7 release notes >> there is a statement in regards of the em(4) drivers: "Improvements in >> the em(4) driver." and so I have gave it a try and reinstalled with >> OpenBSD 6.6. It looks like that the system is now stable and latency >> spikes/interface errors are not present at all even under heavy >> traffic loads. I am not sure what introduced the issue but maybe one >> of the devs can give it a look? >> Thank you! >> Gabri >> On 2020-06-09 13:01, Gabri Tofano wrote: >>> Hi all, >>> I'm using a "Protectli FW1" with FreeBSD 12.1 amd64 as a firewall >>> which is serving me with great performances and no issues at all. The >>> appliance has 4 Intel Gigabit 82583V Ethernet NIC ports which are >>> working very well. I have used PFsense as well prior to FreeBSD and it >>> worked without issues too. >>> I took the decision to move to OpenBSD 6.7 amd64 in order to benefit >>> of the latest pf (and other) features but unfortunately the OS is >>> giving me an issue which I guess is related to the NIC drivers; When I >>> was connected via ssh I felt some glitches meanwhile I was >>> typing/moving around with the editor, so I started to ping the inside >>> interface from a wired connected pc and found out that time to time >>> the appliance is responding with a 100+/200+ ms response (I have cut >>> some 1ms reply to make it shorter): >>> Reply from 172.16.200.1: bytes=32 time=1ms TTL=254 >>> Reply from 172.16.200.1: bytes=32 time=1ms TTL=254
Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)
On Fri, Jul 05, 2019 at 03:51:31AM +, Adam Steen wrote: > >Synopsis:Packet loss / ENOBUFs with kqueue(2) and tap(4) > >Category:bug > >Environment: > System : OpenBSD 6.5 > Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 > 19:39:46 AWST 2019 > > ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 > >Description: > In Solo5 we have been working towards supporting multiple network > interfaces, implemented this using kqueue(2) and tap(4). > > This involves setting up two Tap interfaces, starting up the program. > In another session flood pinging the first Tap interface, > Solo5 handles this with no packets dropped. > In another session ping the second Tap interface, then for every > ping to the second interface a packet is dropped on the first. If you > switch to a flood ping on the second tab interface, you will observe > massive packet loss on both interfaces, and ping complaining about > No buffer space available (ENOBUFS). > > see https://github.com/Solo5/solo5/issues/374 for more information. > > >How-To-Repeat: > I have been able to reproduct this in a hacked up exampled program, > available here https://github.com/adamsteen/test_net_2if. Please note > this is hacked, generally butchered program, which demonstrates the > problem. (if required i can try and clean up this test case) > > 01. git clone https://github.com/adamsteen/test_net_2if > 02. cd test_net_2if > 03. make > 04. doas setup.sh (Setup up the Tap interfaces) > 05. doas ./test_net_2if > 06. in another seesion start a flood ping > doas ping -f 10.0.0.2 > 07. Observe that the flood ping is functioning correctly, > with no packets dropped. > 08. In another session, start a normal ping > ping 10.1.0.2 > 09. Observe that, for each ping sent to service1, a packet is dropped. > 10. Kill the normal ping > 11. start a flood ping > doas ping -f 10.1.0.2 > 12. Observe massive packet loss on both interfaces, and ping > complaining about No buffer space available (ENOBUFS). > >Fix: > Not Known. Hi Adam, claudio@ and I looked at this during a2k20, and came to the conclusion that the packet loss occurred because an interface queue filled up and it was shedding load. It was annoyingly easy to get to that point though. We also spent a lot of time massaging the tun/tap code to try and unify the semantics of tun and tap going through the network stack, and in particular tried to avoid queuing packets until we finally get to the output side of the stack. I'm not saying we've fixed this problem for you, but hopefully we've mitigated it a bit. Could you try again and let us know if you see any difference? If there's no difference, could you tweak your test to loop on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or whatever the errno is that means there's no packet to read right now? Cheers, dlg
Re: Crash while using ospfd over vxlan
On Fri, Apr 10, 2020 at 09:51:40AM +0200, Martin Pieuchot wrote: > On 09/04/20(Thu) 16:10, Massimiliano Stucchi wrote: > > >Synopsis: Crash while using ospfd over vxlan > > >Category: bug > > >Environment: > > System : OpenBSD 6.6 > > Details : OpenBSD 6.6 (GENERIC.MP) #5: Sun Feb 16 01:56:11 MST 2020 > > > > r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > > > Architecture: OpenBSD.amd64 > > Machine : amd64 > > >Description: > > Setting up an OSPF session over VXLAN leads to a kernel crash > > >How-To-Repeat: > > > > I have setup an ospf session over a vxlan interface. When this is up, > > it takes about 2-3 minutes for the crash to consistently happen. > > > > No other action is necessary. > > > > At this address: > > > > https://max.stucchi.ch/bugreport/ > > > > you can find screenshots from the ddb prompt, including a full trace. > > > > If needed, I can also provide access to the console. > > It's a recursion. I don't know anything about vxlan(4) or how the > encapsulation works but the following happens at least 10 times: > > ... > vxlan_lookup() > udp_input() > ip_deliver() > ip_ours() > ip_input_if() > ipv4_input() > ether_input() > if_vinput() > vxlan_lookup() > ... > > Maybe you can share your setup (vxlan config, ospf config, etc) so > somebody can try to reproduce and fix it. Possible recursion through encap drivers is a problem they all share. The vxlan driver should probably have the following text copied from one of the other manpages into it: For correct operation, encapsulated traffic must not be routed over the interface itself. This can be implemented by adding a distinct or a more specific route to the tunnel destination than the hosts or networks routed via the tunnel interface. Alternatively, the tunnel traffic may be configured in a separate routing table to the encapsulated traffic. Misconfiguration shouldn't result in a panic or fault though, so can you try the following diff? It copies the mechanism used to prevent recursion into vxlan. There's some more drivers that don't do this which I'll try and fix up in the next few days. Cheers, dlg Index: if_vxlan.c === RCS file: /cvs/src/sys/net/if_vxlan.c,v retrieving revision 1.76 diff -u -p -r1.76 if_vxlan.c --- if_vxlan.c 8 Nov 2019 07:16:29 - 1.76 +++ if_vxlan.c 12 Apr 2020 01:36:26 - @@ -82,6 +82,8 @@ struct vxlan_softc { voidvxlanattach(int); int vxlanioctl(struct ifnet *, u_long, caddr_t); +int vxlanoutput(struct ifnet *, struct mbuf *, struct sockaddr *, + struct rtentry *); voidvxlanstart(struct ifnet *); int vxlan_clone_create(struct if_clone *, int); int vxlan_clone_destroy(struct ifnet *); @@ -150,6 +152,7 @@ vxlan_clone_create(struct if_clone *ifc, ifp->if_softc = sc; ifp->if_ioctl = vxlanioctl; + ifp->if_output = vxlanoutput; ifp->if_start = vxlanstart; IFQ_SET_MAXLEN(>if_snd, IFQ_MAXLEN); @@ -294,6 +297,33 @@ vxlan_multicast_join(struct ifnet *ifp, if_detachhook_add(mifp, >sc_dtask); return (0); +} + +int +vxlanoutput(struct ifnet *ifp, struct mbuf *m, struct sockaddr *dst, +struct rtentry *rt) +{ + struct m_tag *mtag; + + /* Try to limit infinite recursion through misconfiguration. */ + for (mtag = m_tag_find(m, PACKET_TAG_GRE, NULL); mtag; + mtag = m_tag_find(m, PACKET_TAG_GRE, mtag)) { + if (memcmp((caddr_t)(mtag + 1), >if_index, + sizeof(ifp->if_index)) == 0) { + m_freem(m); + return (EIO); + } + } + + mtag = m_tag_get(PACKET_TAG_GRE, sizeof(ifp->if_index), M_NOWAIT); + if (mtag == NULL) { + m_freem(m); + return (ENOMEM); + } + memcpy((caddr_t)(mtag + 1), >if_index, sizeof(ifp->if_index)); + m_tag_prepend(m, mtag); + + return (ether_output(ifp, m, dst, rt)); } void
Re: splassert w/ add/del vlan on bridge
On Sat, Apr 11, 2020 at 01:43:20PM +, Visa Hankala wrote: > On Sat, Apr 11, 2020 at 11:09:54PM +1000, David Gwynne wrote: > > On Sat, Apr 11, 2020 at 03:21:49AM +, Visa Hankala wrote: > > > On Fri, Apr 10, 2020 at 01:30:47PM -0600, Theo de Raadt wrote: > > > > Why did it take almost a year to find this? > > > > > > > > Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22? > > > > > > This is not related to ioctl(2) becoming UNLOCKED. Lower-layer ioctl > > > code, soo_ioctl() included, lock the kernel when needed. However, most > > > .if_ioctl backends need NET_LOCK() in addition to KERNEL_LOCK(). In > > > most cases, that is satisfied by ifioctl() which acquires the lock > > > before invoking .if_ioctl(). bridge_ioctl() nullifies this by > > > releasing NET_LOCK(). > > > > yes. > > > > i came up with the following diff before i read the thread here. it's > > largely identical to what you (visa) already came up with, but it adds > > some extra checks to ifpromisc based on the doco in around struct ifnet > > members in src/sys/net/if_var.h. i audited the rest of the ifpromisc > > calls and found another one in if_aggr that i was able to trigger. > > > > i think the only other call to ifpromisc outside src/sys/net is in carp, > > and i managed to convinced myself that all those calls hold NET_LOCK > > already. > > > > Index: if.c > > === > > RCS file: /cvs/src/sys/net/if.c,v > > retrieving revision 1.601 > > diff -u -p -r1.601 if.c > > --- if.c10 Mar 2020 09:11:55 - 1.601 > > +++ if.c11 Apr 2020 13:08:46 - > > @@ -3031,7 +3031,9 @@ ifpromisc(struct ifnet *ifp, int pswitch > > unsigned short oif_flags; > > int oif_pcount, error; > > > > + NET_ASSERT_LOCKED(); /* modifying if_flags */ > > oif_flags = ifp->if_flags; > > + KERNEL_ASSERT_LOCKED(); /* modifying if_pcount */ > > oif_pcount = ifp->if_pcount; > > if (pswitch) { > > if (ifp->if_pcount++ != 0) > > Index: if_aggr.c > > === > > RCS file: /cvs/src/sys/net/if_aggr.c,v > > retrieving revision 1.28 > > diff -u -p -r1.28 if_aggr.c > > --- if_aggr.c 11 Mar 2020 07:01:42 - 1.28 > > +++ if_aggr.c 11 Apr 2020 13:08:46 - > > @@ -589,8 +589,10 @@ aggr_clone_destroy(struct ifnet *ifp) > > if_detach(ifp); > > > > /* last ref, no need to lock. aggr_p_dtor locks anyway */ > > + NET_LOCK(); > > while ((p = TAILQ_FIRST(>sc_ports)) != NULL) > > aggr_p_dtor(sc, p, "destroy"); > > + NET_UNLOCK(); > > > > free(sc, M_DEVBUF, sizeof(*sc)); > > > > Index: if_bridge.c > > === > > RCS file: /cvs/src/sys/net/if_bridge.c,v > > retrieving revision 1.338 > > diff -u -p -r1.338 if_bridge.c > > --- if_bridge.c 6 Nov 2019 03:51:26 - 1.338 > > +++ if_bridge.c 11 Apr 2020 13:08:46 - > > @@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c > > break; > > } > > > > + NET_LOCK(); > > error = ifpromisc(ifs, 1); > > + NET_UNLOCK(); > > if (error != 0) { > > free(bif, M_DEVBUF, sizeof(*bif)); > > break; > > @@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi > > } > > > > bif->ifp->if_bridgeidx = 0; > > + NET_LOCK(); > > error = ifpromisc(bif->ifp, 0); > > + NET_UNLOCK(); > > > > bridge_rtdelete(sc, bif->ifp, 0); > > bridge_flushrule(bif); > > Index: if_tpmr.c > > === > > RCS file: /cvs/src/sys/net/if_tpmr.c,v > > retrieving revision 1.9 > > diff -u -p -r1.9 if_tpmr.c > > --- if_tpmr.c 11 Apr 2020 11:01:03 - 1.9 > > +++ if_tpmr.c 11 Apr 2020 13:08:46 - > > @@ -201,12 +201,14 @@ tpmr_clone_destroy(struct ifnet *ifp) > > > > if_detach(ifp); > > > > + NET_LOCK(); > > for (i = 0; i < nitems(sc->sc_ports); i++) { > > struct tpmr_port *p = SMR_PTR_GET_LOCKED(>sc_ports[i]); > > if (p == NULL) > > continue; > > tpmr_p_dtor(sc, p, &q
Re: splassert w/ add/del vlan on bridge
On Sat, Apr 11, 2020 at 03:35:45PM +0200, Martin Pieuchot wrote: > On 11/04/20(Sat) 23:09, David Gwynne wrote: > > On Sat, Apr 11, 2020 at 03:21:49AM +, Visa Hankala wrote: > > > On Fri, Apr 10, 2020 at 01:30:47PM -0600, Theo de Raadt wrote: > > > > Why did it take almost a year to find this? > > > > > > > > Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22? > > > > > > This is not related to ioctl(2) becoming UNLOCKED. Lower-layer ioctl > > > code, soo_ioctl() included, lock the kernel when needed. However, most > > > .if_ioctl backends need NET_LOCK() in addition to KERNEL_LOCK(). In > > > most cases, that is satisfied by ifioctl() which acquires the lock > > > before invoking .if_ioctl(). bridge_ioctl() nullifies this by > > > releasing NET_LOCK(). > > > > yes. > > > > i came up with the following diff before i read the thread here. it's > > largely identical to what you (visa) already came up with, but it adds > > some extra checks to ifpromisc based on the doco in around struct ifnet > > members in src/sys/net/if_var.h. i audited the rest of the ifpromisc > > calls and found another one in if_aggr that i was able to trigger. > > The documentation says `if_pcount' is protected by the KERNEL_LOCK() but > in fact it is only read & modified in ifpromisc(). > > So I'd suggest fixing the documentation and not add another assert there. Can do. > > i think the only other call to ifpromisc outside src/sys/net is in carp, > > and i managed to convinced myself that all those calls hold NET_LOCK > > already. Index: if_var.h === RCS file: /cvs/src/sys/net/if_var.h,v retrieving revision 1.103 diff -u -p -r1.103 if_var.h --- if_var.h8 Nov 2019 07:16:29 - 1.103 +++ if_var.h11 Apr 2020 13:38:10 - @@ -130,7 +130,7 @@ struct ifnet { /* and the entries */ /* [I] check or clean routes (+ or -)'d */ void(*if_rtrequest)(struct ifnet *, int, struct rtentry *); charif_xname[IFNAMSIZ]; /* [I] external name (name + unit) */ - int if_pcount; /* [k] # of promiscuous listeners */ + int if_pcount; /* [N] # of promiscuous listeners */ unsigned int if_bridgeidx; /* [k] used by bridge ports */ caddr_t if_bpf; /* packet filter structure */ caddr_t if_switchport; /* used by switch ports */ Index: if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.601 diff -u -p -r1.601 if.c --- if.c10 Mar 2020 09:11:55 - 1.601 +++ if.c11 Apr 2020 13:38:10 - @@ -3031,6 +3031,8 @@ ifpromisc(struct ifnet *ifp, int pswitch unsigned short oif_flags; int oif_pcount, error; + NET_ASSERT_LOCKED(); /* modifying if_flags and if_pcount */ + oif_flags = ifp->if_flags; oif_pcount = ifp->if_pcount; if (pswitch) { Index: if_aggr.c === RCS file: /cvs/src/sys/net/if_aggr.c,v retrieving revision 1.28 diff -u -p -r1.28 if_aggr.c --- if_aggr.c 11 Mar 2020 07:01:42 - 1.28 +++ if_aggr.c 11 Apr 2020 13:38:10 - @@ -589,8 +589,10 @@ aggr_clone_destroy(struct ifnet *ifp) if_detach(ifp); /* last ref, no need to lock. aggr_p_dtor locks anyway */ + NET_LOCK(); while ((p = TAILQ_FIRST(>sc_ports)) != NULL) aggr_p_dtor(sc, p, "destroy"); + NET_UNLOCK(); free(sc, M_DEVBUF, sizeof(*sc)); Index: if_bridge.c === RCS file: /cvs/src/sys/net/if_bridge.c,v retrieving revision 1.338 diff -u -p -r1.338 if_bridge.c --- if_bridge.c 6 Nov 2019 03:51:26 - 1.338 +++ if_bridge.c 11 Apr 2020 13:38:10 - @@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c break; } + NET_LOCK(); error = ifpromisc(ifs, 1); + NET_UNLOCK(); if (error != 0) { free(bif, M_DEVBUF, sizeof(*bif)); break; @@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi } bif->ifp->if_bridgeidx = 0; + NET_LOCK(); error = ifpromisc(bif->ifp, 0); + NET_UNLOCK(); bridge_rtdelete(sc, bif->ifp, 0); bridge_flushrule(bif); Index: if_tpmr.c === RCS file: /cvs/src/sys/net/if_tpmr.c,v retrieving revision 1.9 diff -u -p -r1.9 if_tpmr.c --- if_tpmr.c 1
Re: netstart: PROMISC,ALLMULTI not set on parent interface of vlan that joins a bridge until run again
Hi Jon, This should be fixed in current as of r1.199 of src/sys/net/if_vlan.c Sorry for the inconvenience. Cheers, dlg > On 11 Oct 2018, at 06:45, Jon Williams wrote: > >> Synopsis: Running netstart 1x does not set PROMISC,ALLMULTI on parent >> interface of vlan members of bridges >> Category: system >> Environment: > System : OpenBSD 6.3 > Details : OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 2018 > > r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 >> Description: > > em0 is a parent of vlan1 and vlan2. vlan1 is a member of bridge0. vlan2 is a > member of bridge2. > After system boot, em0 does not have PROMISC,ALLMULTI set in ifconfig - i.e. > it cannot forward > traffic correctly, until netstart is manually re-run by root user. > >> How-To-Repeat: > NB: have been unable to replicate inside vmd using a vio interface for vlan > parent. All vio > interfaces appear to be always PROMISC,ALLMULTI. > > (Even a temporary workaround would be helpful as this machine is a > gateway/bgp host.) > > /etc/hostname.em0 > description "nycmesh-lbe-1659" > group trunk > up > > /etc/hostname.vlan1 > description "nycmesh-lbe-1659 mgmt VLAN" > parent em0 vnetid 2 > group lan > group trusted > group bridged > up > > cat /etc/hostname.vlan2 > description "nycmesh-lbe-1659 WAN VLAN" > group nycmesh > group bridged > parent em0 vnetid 3 > up > > cat /etc/hostname.vlan2 > description "nycmesh-lbe-1659 WAN VLAN" > group nycmesh > group bridged > parent em0 vnetid 3 > up > > /etc/hostname.bridge0 > description "Bridged LAN" > group lan > group trusted > group bridge > add vether0 > add em1 > add em2 > add em3 > add em4 > add em5 > add vlan1 > # Try to stop the Airport express from sending weird arps > rule block on em1 src 28:37:37:3f:5:4c arp spa 10.70.145.50 > up > > cat /etc/hostname.bridge2 > description "Bridged WAN" > group wan > group bridge > add vether2 > add vlan2 > up > > SENDBUG: Run sendbug as root if this is an ACPI report! > SENDBUG: dmesg and usbdevs are attached. > SENDBUG: Feel free to delete or use the -D flag if they contain sensitive > information. > > dmesg: > OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 2018 > > r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 8520617984 (8125MB) > avail mem = 8255311872 (7872MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x8e622000 (85 entries) > bios0: vendor American Megatrends Inc. version "5.12" date 07/01/2018 > bios0: Default string Default string > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S3 S5 > acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET SSDT SSDT UEFI > SSDT LPIT SSDT SSDT SSDT SSDT DBGP DBG2 SSDT DMAR ASF! WSMT > acpi0: wakeup devices PXSX(S3) RP09(S3) PXSX(S3) RP10(S3) PXSX(S3) RP11(S3) > PXSX(S3) RP12(S3) PXSX(S3) RP13(S3) PXSX(S3) RP01(S3) PXSX(S3) RP02(S3) > PXSX(S3) RP03(S3) [...] > acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Celeron(R) CPU 3865U @ 1.80GHz, 1696.65 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,PAGE1GB, > cpu0: 256KB 64b/line 8-way L2 cache > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 24MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE > cpu1 at mainbus0: apid 2 (application processor) > cpu1: Intel(R) Celeron(R) CPU 3865U @ 1.80GHz, 1696.06 MHz > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,PAGE1GB, > cpu1: 256KB 64b/line 8-way L2 cache > cpu1: smt 0, core 1, package 0 > ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 120 pins > acpimcfg0 at acpi0 addr 0xe000, bus 0-255 > acpihpet0 at acpi0: 2399 Hz > acpiprt0 at acpi0: bus 0 (PCI0) > acpiprt1 at acpi0: bus -1 (PEG0) > acpiprt2 at acpi0: bus -1 (PEG1) > acpiprt3 at acpi0: bus -1 (PEG2) > acpiprt4 at acpi0: bus -1 (RP09) > acpiprt5 at acpi0: bus -1 (RP10) > acpiprt6 at acpi0: bus -1 (RP11) > acpiprt7 at acpi0: bus -1 (RP12) > acpiprt8 at acpi0: bus -1 (RP13) > acpiprt9 at acpi0: bus 1 (RP01) > acpiprt10 at acpi0: bus 2 (RP02) > acpiprt11 at acpi0: bus 3 (RP03) > acpiprt12 at acpi0: bus 4 (RP04) > acpiprt13 at acpi0: bus 5 (RP05) > acpiprt14 at acpi0: bus 6 (RP06) > acpiprt15 at acpi0: bus -1 (RP07) > acpiprt16 at acpi0: bus -1 (RP08) > acpiprt17 at
Re: Bad TTL when applying multiple MPLS labels
The fix should be in now. Thanks for the report, and the diff pointing right at the problem. On Tue., 27 Aug. 2019, 01:45 Gerrie Roos, wrote: > I just tested your patch and it works! I can see the correct TTL. > > -Original Message- > From: "David Gwynne" > Sent: Monday, 26 August, 2019 4:21am > To: "Gerrie Roos" > Cc: bugs@openbsd.org > Subject: Re: Bad TTL when applying multiple MPLS labels > > On Fri, Aug 23, 2019 at 10:05:25AM -0500, Gerrie Roos wrote: > > Hi David, > > > > Apologies, I haven't used CVS in a while - here you go. Hope it's useful. > > > > Gerrie > > npz, i can read the attachment fine. > > if someone's carrying a non-ip payload over mpls with a nibble that > makes it look like ip, your diff will cause (very) short payloads > to be dropped when m_pullup can't find bytes to copy into the first > mbuf. > > could you try the following? it uses m_getptr to reach into the right > mbuf and byte offset in that mbuf for the ttl. if the packet is > short, it just returns the default ttl. > > i've attached the diff for your convenience too. > > cheers, > dlg > > Index: mpls_output.c > === > RCS file: /cvs/src/sys/netmpls/mpls_output.c,v > retrieving revision 1.26 > diff -u -p -r1.26 mpls_output.c > --- mpls_output.c 2 Dec 2015 08:47:00 - 1.26 > +++ mpls_output.c 26 Aug 2019 09:11:08 - > @@ -162,41 +162,39 @@ mpls_do_cksum(struct mbuf *m) > u_int8_t > mpls_getttl(struct mbuf *m, sa_family_t af) > { > - struct shim_hdr *shim; > - struct ip *ip; > -#ifdef INET6 > - struct ip6_hdr *ip6hdr; > -#endif > + struct mbuf *n; > + int loc, off; > u_int8_t ttl = mpls_defttl; > > /* If the AF is MPLS then inherit the TTL from the present label. > */ > - if (af == AF_MPLS) { > - shim = mtod(m, struct shim_hdr *); > - ttl = ntohl(shim->shim_label & MPLS_TTL_MASK); > - return (ttl); > - } > - /* Else extract TTL from the encapsualted packet. */ > - switch (*mtod(m, u_char *) >> 4) { > - case IPVERSION: > - if (!mpls_mapttl_ip) > + if (af == AF_MPLS) > + loc = 3; > + else { > + switch (*mtod(m, uint8_t *) >> 4) { > + case 4: > + if (!mpls_mapttl_ip) > + return (ttl); > + > + loc = offsetof(struct ip, ip_ttl); > break; > - if (m->m_len < sizeof(*ip)) > - break; /* impossible */ > - ip = mtod(m, struct ip *); > - ttl = ip->ip_ttl; > - break; > #ifdef INET6 > - case IPV6_VERSION >> 4: > - if (!mpls_mapttl_ip6) > + case 6: > + if (!mpls_mapttl_ip6) > + break; > + > + loc = offsetof(struct ip6_hdr, ip6_hlim); > break; > - if (m->m_len < sizeof(struct ip6_hdr)) > - break; /* impossible */ > - ip6hdr = mtod(m, struct ip6_hdr *); > - ttl = ip6hdr->ip6_hlim; > - break; > #endif > - default: > - break; > + default: > + return (ttl); > + } > } > + > + n = m_getptr(m, loc, ); > + if (n == NULL) > + return (ttl); > + > + ttl = *(mtod(n, uint8_t *) + off); > + > return (ttl); > } > > >
Re: Bad TTL when applying multiple MPLS labels
On Fri, Aug 23, 2019 at 10:05:25AM -0500, Gerrie Roos wrote: > Hi David, > > Apologies, I haven't used CVS in a while - here you go. Hope it's useful. > > Gerrie npz, i can read the attachment fine. if someone's carrying a non-ip payload over mpls with a nibble that makes it look like ip, your diff will cause (very) short payloads to be dropped when m_pullup can't find bytes to copy into the first mbuf. could you try the following? it uses m_getptr to reach into the right mbuf and byte offset in that mbuf for the ttl. if the packet is short, it just returns the default ttl. i've attached the diff for your convenience too. cheers, dlg Index: mpls_output.c === RCS file: /cvs/src/sys/netmpls/mpls_output.c,v retrieving revision 1.26 diff -u -p -r1.26 mpls_output.c --- mpls_output.c 2 Dec 2015 08:47:00 - 1.26 +++ mpls_output.c 26 Aug 2019 09:11:08 - @@ -162,41 +162,39 @@ mpls_do_cksum(struct mbuf *m) u_int8_t mpls_getttl(struct mbuf *m, sa_family_t af) { - struct shim_hdr *shim; - struct ip *ip; -#ifdef INET6 - struct ip6_hdr *ip6hdr; -#endif + struct mbuf *n; + int loc, off; u_int8_t ttl = mpls_defttl; /* If the AF is MPLS then inherit the TTL from the present label. */ - if (af == AF_MPLS) { - shim = mtod(m, struct shim_hdr *); - ttl = ntohl(shim->shim_label & MPLS_TTL_MASK); - return (ttl); - } - /* Else extract TTL from the encapsualted packet. */ - switch (*mtod(m, u_char *) >> 4) { - case IPVERSION: - if (!mpls_mapttl_ip) + if (af == AF_MPLS) + loc = 3; + else { + switch (*mtod(m, uint8_t *) >> 4) { + case 4: + if (!mpls_mapttl_ip) + return (ttl); + + loc = offsetof(struct ip, ip_ttl); break; - if (m->m_len < sizeof(*ip)) - break; /* impossible */ - ip = mtod(m, struct ip *); - ttl = ip->ip_ttl; - break; #ifdef INET6 - case IPV6_VERSION >> 4: - if (!mpls_mapttl_ip6) + case 6: + if (!mpls_mapttl_ip6) + break; + + loc = offsetof(struct ip6_hdr, ip6_hlim); break; - if (m->m_len < sizeof(struct ip6_hdr)) - break; /* impossible */ - ip6hdr = mtod(m, struct ip6_hdr *); - ttl = ip6hdr->ip6_hlim; - break; #endif - default: - break; + default: + return (ttl); + } } + + n = m_getptr(m, loc, ); + if (n == NULL) + return (ttl); + + ttl = *(mtod(n, uint8_t *) + off); + return (ttl); } Index: mpls_output.c === RCS file: /cvs/src/sys/netmpls/mpls_output.c,v retrieving revision 1.26 diff -u -p -r1.26 mpls_output.c --- mpls_output.c 2 Dec 2015 08:47:00 - 1.26 +++ mpls_output.c 26 Aug 2019 09:11:08 - @@ -162,41 +162,39 @@ mpls_do_cksum(struct mbuf *m) u_int8_t mpls_getttl(struct mbuf *m, sa_family_t af) { - struct shim_hdr *shim; - struct ip *ip; -#ifdef INET6 - struct ip6_hdr *ip6hdr; -#endif + struct mbuf *n; + int loc, off; u_int8_t ttl = mpls_defttl; /* If the AF is MPLS then inherit the TTL from the present label. */ - if (af == AF_MPLS) { - shim = mtod(m, struct shim_hdr *); - ttl = ntohl(shim->shim_label & MPLS_TTL_MASK); - return (ttl); - } - /* Else extract TTL from the encapsualted packet. */ - switch (*mtod(m, u_char *) >> 4) { - case IPVERSION: - if (!mpls_mapttl_ip) + if (af == AF_MPLS) + loc = 3; + else { + switch (*mtod(m, uint8_t *) >> 4) { + case 4: + if (!mpls_mapttl_ip) + return (ttl); + + loc = offsetof(struct ip, ip_ttl); break; - if (m->m_len < sizeof(*ip)) - break; /* impossible */ - ip = mtod(m, struct ip *); - ttl = ip->ip_ttl; - break; #ifdef INET6 - case IPV6_VERSION >> 4: - if (!mpls_mapttl_ip6) + case 6: + if (!mpls_mapttl_ip6) + break; + + loc = offsetof(struct ip6_hdr, ip6_hlim); break; - if (m->m_len < sizeof(struct ip6_hdr)) - break;
Re: ifiq_input pressure drop too aggressive for msk interfaces
On Mon, Jul 29, 2019 at 12:22:28AM +0200, Olivier Ta??bi wrote: > On one of my machines equipped with two msk(4) gigabit ethernet devices, > the reintroduction (rev. 1.32) of counting backpressure in ifiq_input() in > sys/net/ifq.c causes drops when more than 8 ethernet frames arrive in a short > amount of time. I noticed this because this machine is an NFS server, mounted > with the (client-side) option "-w 32768", but the drops are reproducible > (visible with netstat -I msk*) simply by sending to the machine a large (e.g. > 17kB) UDP packet which gets fragmented into >8 IP packets. > Increasing net.link.ifrxq.pressure_drop (e.g. from the default value 8 to 40) > suppresses this problem, although I am not sure whether this is the correct > solution. i think i see the more fundamental problem. the diff below should fix it so you don't need to tune the ifrxq pressure levels. this basically tweaks msk so it enqueues packets for the stack once per interrupt, rather than once per rx completion event. this helps when you rx multiple packets per interrupt like you are when a packet gets fragmented. while here it uses ifiq_input so it can apply earlier backpressure. i'll commit this soon, so hopefully it will be in the tree for you to test. Index: if_msk.c === RCS file: /cvs/src/sys/dev/pci/if_msk.c,v retrieving revision 1.131 diff -u -p -r1.131 if_msk.c --- if_msk.c6 Jan 2018 03:11:04 - 1.131 +++ if_msk.c30 Jul 2019 03:59:11 - @@ -134,7 +134,7 @@ int mskcprint(void *, const char *); int msk_intr(void *); void msk_intr_yukon(struct sk_if_softc *); static inline int msk_rxvalid(struct sk_softc *, u_int32_t, u_int32_t); -void msk_rxeof(struct sk_if_softc *, uint16_t, uint32_t); +void msk_rxeof(struct sk_if_softc *, struct mbuf_list *, uint16_t, uint32_t); void msk_txeof(struct sk_if_softc *); static unsigned int msk_encap(struct sk_if_softc *, struct mbuf *, uint32_t); void msk_start(struct ifnet *); @@ -1591,11 +1591,11 @@ msk_rxvalid(struct sk_softc *sc, u_int32 } void -msk_rxeof(struct sk_if_softc *sc_if, uint16_t len, uint32_t rxstat) +msk_rxeof(struct sk_if_softc *sc_if, struct mbuf_list *ml, +uint16_t len, uint32_t rxstat) { struct sk_softc *sc = sc_if->sk_softc; struct ifnet*ifp = _if->arpcom.ac_if; - struct mbuf_listml = MBUF_LIST_INITIALIZER(); struct mbuf *m = NULL; int prod, cons, tail; bus_dmamap_tmap; @@ -1640,8 +1640,7 @@ msk_rxeof(struct sk_if_softc *sc_if, uin m->m_pkthdr.len = m->m_len = len; - ml_enqueue(, m); - if_input(ifp, ); + ml_enqueue(ml, m); } void @@ -1770,8 +1769,12 @@ msk_intr(void *xsc) struct sk_if_softc *sc_if; struct sk_if_softc *sc_if0 = sc->sk_if[SK_PORT_A]; struct sk_if_softc *sc_if1 = sc->sk_if[SK_PORT_B]; + struct mbuf_listml[2] = { + MBUF_LIST_INITIALIZER(), + MBUF_LIST_INITIALIZER(), + }; struct ifnet*ifp0 = NULL, *ifp1 = NULL; - int claimed = 0, rx[2] = {0, 0}; + int claimed = 0; u_int32_t status; struct msk_status_desc *cur_st; @@ -1809,8 +1812,8 @@ msk_intr(void *xsc) switch (cur_st->sk_opcode) { case SK_Y2_STOPC_RXSTAT: sc_if = sc->sk_if[cur_st->sk_link & 0x01]; - rx[cur_st->sk_link & 0x01] = 1; - msk_rxeof(sc_if, lemtoh16(_st->sk_len), + msk_rxeof(sc_if, [cur_st->sk_link & 0x01], + lemtoh16(_st->sk_len), lemtoh32(_st->sk_status)); break; case SK_Y2_STOPC_TXSTAT: @@ -1837,12 +1840,16 @@ msk_intr(void *xsc) CSR_WRITE_4(sc, SK_Y2_ICR, 2); - if (rx[0]) { + if (!ml_empty([0])) { + if (ifiq_input(>if_rcv, [0])) + if_rxr_livelocked(_if0->sk_cdata.sk_rx_ring); msk_fill_rx_ring(sc_if0); SK_IF_WRITE_2(sc_if0, 0, SK_RXQ1_Y2_PREF_PUTIDX, sc_if0->sk_cdata.sk_rx_prod); } - if (rx[1]) { + if (!ml_empty([1])) { + if (ifiq_input(>if_rcv, [1])) + if_rxr_livelocked(_if1->sk_cdata.sk_rx_ring); msk_fill_rx_ring(sc_if1); SK_IF_WRITE_2(sc_if1, 0, SK_RXQ1_Y2_PREF_PUTIDX, sc_if1->sk_cdata.sk_rx_prod);
Re: hostname.gre panic. 6.5 RELEASE
> On 26 Jul 2019, at 5:22 am, Alexander Bluhm wrote: > > On Thu, Jul 25, 2019 at 12:40:22PM +, andr...@nullbyte.se wrote: >> # Which results in the following error as can be seen in this screenshot: >> http://gw.nullbyte.se/dump/openbsd/openbsd_65_gre_panic.PNG >> >>> Fix: >> # Make sure the 'tunnel' statement is before the inet/inet6 commands > > The inet6 duplicate address detection packet is sent before the > tunnel is set up. We should reject packets during that time window. > > While there, count errors and use generic unhandled_af(). > > ok? ok > > bluhm > > Index: net/if_gre.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_gre.c,v > retrieving revision 1.151 > diff -u -p -r1.151 if_gre.c > --- net/if_gre.c 17 Jul 2019 16:46:17 - 1.151 > +++ net/if_gre.c 25 Jul 2019 19:03:45 - > @@ -1930,8 +1930,10 @@ mgre_output(struct ifnet *ifp, struct mb > } > > m = gre_l3_encap_dst(>sc_tunnel, addr, m, dest->sa_family); > - if (m == NULL) > + if (m == NULL) { > + ifp->if_oerrors++; > return (ENOBUFS); > + } > > m->m_pkthdr.ph_family = dest->sa_family; > > @@ -2142,6 +2144,10 @@ gre_encap_dst_ip(const struct gre_tunnel > struct mbuf *m, uint8_t ttl, uint8_t tos) > { > switch (tunnel->t_af) { > + case AF_UNSPEC: > + /* packets may arrive before tunnel is set up */ > + m_freem(m); > + return (NULL); > case AF_INET: { > struct ip *ip; > > @@ -2188,8 +2194,7 @@ gre_encap_dst_ip(const struct gre_tunnel > } > #endif /* INET6 */ > default: > - panic("%s: unsupported af %d in %p", __func__, tunnel->t_af, > - tunnel); > + unhandled_af(tunnel->t_af); > } > > return (m); > @@ -2215,8 +2220,7 @@ gre_ip_output(const struct gre_tunnel *t > break; > #endif > default: > - panic("%s: unsupported af %d in %p", __func__, tunnel->t_af, > - tunnel); > + unhandled_af(tunnel->t_af); > } > > return (0); > @@ -4286,7 +4290,7 @@ gre_ip_cmp(int af, const union gre_addr > case AF_INET: > return (memcmp(>in4, >in4, sizeof(a->in4))); > default: > - panic("%s: unsupported af %d\n", __func__, af); > + unhandled_af(af); > } > > return (0); >
Re: panic: Stopped at kqueue_scan
> On 30 Apr 2019, at 03:24, Martin Pieuchot wrote: > > On 29/04/19(Mon) 17:24, David Gwynne wrote: >> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote: >>> On 23/04/19(Tue) 12:16, Olivier Antoine wrote: >>>>> Synopsis:panic: Stopped at kqueue_scan >>>>> Category:kernel i386 >>>>> Environment: >>>>System : OpenBSD 6.5 >>>>Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21 >>>> 19:50:46 MDT 2019 >>>> >>>> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP >>>> >>>>Architecture: OpenBSD.i386 >>>>Machine : i386 >>>>> Description: >>>> Hi, since my last update I have regular panic crashes. 4 in two days. >>>> At least 3 of them, with certainty, occurred while I was accessing the >>>> Internet via my smartphone connected to my OpenBSD WiFi access point >>>> through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor. >>>> This setup works for years. >>>> >>>> The machine then displays something like: >>>> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e >>>> kernel: page fault trap, code=0 >>>> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx) >>>> ddb{1}> >>> >>> So this indicates that the `kqueue' is empty. It should not happen >>> because the caller, in your case npppd, always places a marker in the >>> list. >>> >>> Since the caller is not threaded and the syscall is executed with the >>> KERNEL_LOCK() held, we can supposed that another part of the kernel is >>> removing the marker. That would imply that the other part isn't running >>> with the KERNEL_LOCK() and requires a MP kernel. >>> >>> Could you try *very hard* to reproduce the problem with a kernel built >>> with the diff below? Hopefully you'll make it crash and we'll find the >>> bug. Otherwise we'll look for another possible cause of the marker >>> removal. >>> >>> Index: kern/kern_event.c >>> === >>> RCS file: /cvs/src/sys/kern/kern_event.c,v >>> retrieving revision 1.101 >>> diff -u -p -r1.101 kern_event.c >>> --- kern/kern_event.c 27 Nov 2018 15:52:50 - 1.101 >>> +++ kern/kern_event.c 28 Apr 2019 21:47:25 - >>> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc >>> struct kqueue *kq = kn->kn_kq; >>> struct klist *list; >>> >>> + KERNEL_ASSERT_LOCKED(); >>> + >>> if (kn->kn_fop->f_isfd) >>> list = >kq_knlist[kn->kn_id]; >>> else >>> >> >> i had a similar diff in my tree, and with some clues from this thread >> and the same panic from jmc@, points out that tun(4) calls tun_wakeup >> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in >> the kq code messing up the kn_head list. fun fun. >> >> below is an extremely clever diff to tun to avoid doing the wakeup >> without the kernel lock held. i say extremely clever because tun(4) is >> not marked as MPSAFE, which means the if_start handler gets called with >> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls >> tun_wakeup with that implicit KERNEL_LOCK hold. >> >> i don't know why this hasn't blown up before. the network stack >> hasn't been run with the KERNEL_LOCK for ages now. >> >> tun_output with a custom if_enqueue handler would be a lot smarter, >> but that is a more invasive diff. > > I'd prefer if we could grab the KERNEK_LOCK() around csignal() and > selwakeup() like it is done in the socket code. I believe that we > need to show where the lock is taken in order to help people realize > where it needs to be pushed down. I should have been more clear, but this diff is to fix the tree, it's by no means a final fix. tun(4) itself could do with some further changes, including the ones you describe above. If you prefer I could just add KERNEL_LOCK to tun_wakeup? > Can we solve both issues with a similar fix? How hard can it be to get > selwakeup() out of the KERNEL_LOCK()? Figuring that out was going to go onto my todo. > Should we add a KASSERT in csignal() too? If it currently needs the kernel lock then it should have the assert for it. Otherwise we have issues like these waiting days or weeks for fixes. dlg > >> Index: net/if_tun.c >>
Re: panic: Stopped at kqueue_scan
On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote: > On 23/04/19(Tue) 12:16, Olivier Antoine wrote: > > >Synopsis:panic: Stopped at kqueue_scan > > >Category:kernel i386 > > >Environment: > > System : OpenBSD 6.5 > > Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21 > > 19:50:46 MDT 2019 > > > > dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP > > > > Architecture: OpenBSD.i386 > > Machine : i386 > > >Description: > > Hi, since my last update I have regular panic crashes. 4 in two days. > > At least 3 of them, with certainty, occurred while I was accessing the > > Internet via my smartphone connected to my OpenBSD WiFi access point > > through my Allways-on VPN isakmp/ipsec/nppp relaying traffic in Tor. > > This setup works for years. > > > > The machine then displays something like: > > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e > > kernel: page fault trap, code=0 > > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx) > > ddb{1}> > > So this indicates that the `kqueue' is empty. It should not happen > because the caller, in your case npppd, always places a marker in the > list. > > Since the caller is not threaded and the syscall is executed with the > KERNEL_LOCK() held, we can supposed that another part of the kernel is > removing the marker. That would imply that the other part isn't running > with the KERNEL_LOCK() and requires a MP kernel. > > Could you try *very hard* to reproduce the problem with a kernel built > with the diff below? Hopefully you'll make it crash and we'll find the > bug. Otherwise we'll look for another possible cause of the marker > removal. > > Index: kern/kern_event.c > === > RCS file: /cvs/src/sys/kern/kern_event.c,v > retrieving revision 1.101 > diff -u -p -r1.101 kern_event.c > --- kern/kern_event.c 27 Nov 2018 15:52:50 - 1.101 > +++ kern/kern_event.c 28 Apr 2019 21:47:25 - > @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc > struct kqueue *kq = kn->kn_kq; > struct klist *list; > > + KERNEL_ASSERT_LOCKED(); > + > if (kn->kn_fop->f_isfd) > list = >kq_knlist[kn->kn_id]; > else > i had a similar diff in my tree, and with some clues from this thread and the same panic from jmc@, points out that tun(4) calls tun_wakeup without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in the kq code messing up the kn_head list. fun fun. below is an extremely clever diff to tun to avoid doing the wakeup without the kernel lock held. i say extremely clever because tun(4) is not marked as MPSAFE, which means the if_start handler gets called with KERNEL_LOCK taken by the stack on its behalf. tun_start then calls tun_wakeup with that implicit KERNEL_LOCK hold. i don't know why this hasn't blown up before. the network stack hasn't been run with the KERNEL_LOCK for ages now. tun_output with a custom if_enqueue handler would be a lot smarter, but that is a more invasive diff. ok? Index: net/if_tun.c === RCS file: /cvs/src/sys/net/if_tun.c,v retrieving revision 1.184 diff -u -p -r1.184 if_tun.c --- net/if_tun.c3 Feb 2019 23:04:49 - 1.184 +++ net/if_tun.c29 Apr 2019 07:14:46 - @@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu return (error); } - tun_wakeup(tp); return (0); } Index: kern/kern_event.c === RCS file: /cvs/src/sys/kern/kern_event.c,v retrieving revision 1.101 diff -u -p -r1.101 kern_event.c --- kern/kern_event.c 27 Nov 2018 15:52:50 - 1.101 +++ kern/kern_event.c 29 Apr 2019 07:14:46 - @@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn) struct kqueue *kq = kn->kn_kq; int s = splhigh(); + KERNEL_ASSERT_LOCKED(); KASSERT((kn->kn_status & KN_QUEUED) == 0); TAILQ_INSERT_TAIL(>kq_head, kn, kn_tqe); @@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn) KASSERT(kn->kn_status & KN_QUEUED); + KERNEL_ASSERT_LOCKED(); TAILQ_REMOVE(>kq_head, kn, kn_tqe); kn->kn_status &= ~KN_QUEUED; kq->kq_count--; tun_wakeup
Re: ARP issues when using ldpd(8) and mpw(4)
On Fri, Feb 15, 2019 at 01:12:36PM +1100, Adrian Close wrote: > >Synopsis: Incorrect ARPing for routed LDP peer's loopback IP instead of > ARPing for next-hop router IP on expiry > >Category: system > >Environment: > System : OpenBSD 6.4 > Details : OpenBSD 6.4-current (GENERIC) #688: Wed Feb 13 12:16:06 MST > 2019 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > > Architecture: OpenBSD.amd64 > Machine : amd64 > >Description: > I'm having some issues getting an mpw(4) MPLS "pseudowire" setup working > reliably and it seems to come down to ARP issues between my "PE" and "P" > boxes that occur when ldpd(8) thinks the pseudowire l2vpn tunnels are > actually up. > > My setup looks something like: > >[cust L2] - [vic0]PE1[vic1] -- [vic0]P1[vic1] -- [vic1]P2[vic0] -- > [vic1][PE2][vic0] - [cust L2] > > ... the idea being to provide a layer 2 circuit between the "cust L2" > ports, over an MPLS network, which I've configured along the lines of > Claudio's "Demystifying MPLS" paper and Renato's "VPLS basic test > setup", using LDP and OSPF. This all works fine, except when it doesn't: > > So for example, when the ARP entry on the PE1 box for P1's vic0 IP > address times out (or is manually deleted), PE1 sends out ARP requests > not for that IP address, but for the loopback address of its pseudowire > peer PE2. > > It generally does this for about 40 seconds, and then finally sends an > ARP request for the correct IP, which is answered straight away and > things work again until next time. > > If ldpd(8) thinks the l2vpn tunnels are down ("ldpctl show l2vpn > pseudowires"), ARP requests are sent normally and work as expected. > > > >How-To-Repeat: > I made four VMs in VMWare ESX and connected them together in series, > ie. PE1[vic1]:[vic0]P1[vic1]:[vic1]P2[vic0]:[vic1]PE2. > Configure networking and osfpd/lpd as follows: > > PE1 configuration: > ospfd.conf: > router-id 172.31.211.101 > area 0.0.0.0 { > interface vic1 > interface lo1 > } > ldpd.conf: > router-id 172.31.211.101 > address-family ipv4 { > interface vic1 > } > l2vpn VMVPLS type vpls { > bridge bridge0 > interface vic0 > pseudowire mpw2 { > pw-id 102 > neighbor-id 172.31.221.101 > } > } > vic1: 192.168.11.101/24, MPLS enabled > lo1: 172.31.211.101/32 > bridge0: members vic0 + mpw2 > > P1 configuration: > ospfd.conf: > router-id 172.31.100.11 > area 0.0.0.0 { > interface vic0 > interface vic1 > interface lo1 > } > ldpd.conf: > router-id 172.31.100.11 > address-family ipv4 { > interface vic0 > interface vic1 > } > vic0: 192.168.11.11/24, MPLS enabled > vic1: 192.168.12.11/24, MPLS enabled > lo1: 172.31.100.11/32 > > P2 configuration: > ospfd.conf: > router-id 172.31.100.21 > area 0.0.0.0 { > interface vic0 > interface vic1 > interface lo1 > } > ldpd.conf: > router-id 172.31.100.21 > address-family ipv4 { > interface vic0 > interface vic1 > } > vic0: 192.168.12.21/24, MPLS enabled > vic1: 192.168.22.21/24, MPLS enabled > lo1: 172.31.100.21/32 > > PE2 configuration: > ospfd.conf: > router-id 172.31.221.101 > area 0.0.0.0 { > interface vic1 > interface lo1 > } > ldpd.conf: > router-id 172.31.221.101 > address-family ipv4 { > interface vic1 > } > l2vpn VMVPLS type vpls { > bridge bridge0 > interface vic0 > pseudowire mpw2 { > pw-id 102 > neighbor-id 172.31.211.101 > } > } > vic1: 192.168.22.201/24, MPLS enabled > lo1: 172.31.211.101/32 > bridge0: members vic0 + mpw2 > > > Bring up ospfd and ldpd on each box. Routing and MPLS should converge OK. > The L2VPN should come up (see 'ldpctl show l2vpn pseudowires' output). > Ethernet frames pass between vic0 interfaces on the PE boxes, via MPLS. > > Once the ARP entry on PE1 for P1 (192.168.11.11) and/or on PE2 for P2 > (192.168.22.21) times out, the pseudowire stops working. Similar behaviour > if > the ARP entry is manually deleted. > > The PE box will then start ARPing on vic1 for its pseudowire peer > (eg. 172.31.211.101) instead of its nexthop router for that peer, which of > course doesn't work. After a while it does ARP for the nexthop router > (eg. 192.168.11.11), P responds and connectivity is restored. > > > >Fix: > Configure static ARP entries for P router IP on PE boxes to work > around this. Ew. Hi Adrian, Could you please try the following diff? I was able to reproduce the problem you were seeing, but this makes it go away. Index: if_ethersubr.c === RCS file: /cvs/src/sys/net/if_ethersubr.c,v retrieving revision 1.258 diff -u -p -r1.258 if_ethersubr.c --- if_ethersubr.c 18 Feb 2019 03:41:21 - 1.258 +++ if_ethersubr.c 19 Feb 2019 03:23:15 - @@ -235,19 +235,16
Re: Bridging vlan over mpls. OpenBSD6.3
hey andrew, can i see the mpw0 interface according to ifconfig please? cheers, dlg > On 18 May 2018, at 20:51, and...@unixadmina.net wrote: > > I found strange behavior when tried to bridge vlan from OpenBSD box over > mpls. It seems like BSD box sends untagged packets received from mpls tunnel > instead of adding vlan tag. Is it known bug or am I just missing something? > > OpenBSD running on a PC with two vlans. > vyb-r0# uname -a > OpenBSD vyb-r0.loc 6.3 GENERIC.MP#107 amd64 > > > vlan107 -- mpls enabled vlan: > vyb-r0# ifconfig vlan107 > vlan107: flags=88843mtu 1500 >lladdr 70:71:bc:cc:fb:d4 >description: Kinda uplink interface >index 7 priority 0 llprio 3 >encap: vnetid 107 parent re0 >groups: vlan >media: Ethernet autoselect (1000baseT full-duplex) >status: active >inet 10.150.0.10 netmask 0xff00 broadcast 10.150.0.255 > > vlan2000 -- vlan that got to be bridged over mpls: > vyb-r0# ifconfig vlan2000 > vlan2000: flags=8943 mtu 1500 >lladdr 70:71:bc:cc:fb:d4 >description: local L2 interface >index 9 priority 0 llprio 3 >encap: vnetid 2000 parent re0 >groups: vlan >media: Ethernet autoselect (1000baseT full-duplex) >status: active > > bridging interface: > vyb-r0# ifconfig bridge0 > bridge0: flags=41 >index 4 llprio 3 >groups: bridge >priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp >designated: id 00:00:00:00:00:00 priority 0 >vlan2000 flags=3 >port 9 ifpriority 0 ifcost 0 >mpw0 flags=3 >port 6 ifpriority 0 ifcost 0 >Addresses (max cache: 100, timeout: 240): >e4:6f:13:aa:38:c1 mpw0 1 flags=0<> >e4:6f:13:aa:37:c1 vlan2000 1 flags=0<> > > > MPLS tunnel is up and running and I can see MACs and packets coming in and > out of tunnel. > > vyb-r0# tcpdump -nibridge0 -e > tcpdump: listening on bridge0, link-type EN10MB > 13:02:48.323723 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has > 10.150.2.40 tell 10.150.2.50 > 13:02:48.324253 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply > 10.150.2.40 is-at e4:6f:13:aa:38:c1 > 13:02:49.347673 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has > 10.150.2.40 tell 10.150.2.50 > 13:02:49.348255 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply > 10.150.2.40 is-at e4:6f:13:aa:38:c1 > 13:02:50.371668 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has > 10.150.2.40 tell 10.150.2.50 > 13:02:50.372173 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply > 10.150.2.40 is-at e4:6f:13:aa:38:c1 > 13:02:51.395596 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has > 10.150.2.40 tell 10.150.2.50 > 13:02:51.396143 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply > 10.150.2.40 is-at e4:6f:13:aa:38:c1 > > However, those arp replies don't reach 10.150.2.50 on vlan2000. > > I've mirrored OpenBSD port on a switch. vlan3000 is a target vlan for port > mirroring. Here's tcpdump on another PC that receives mirroring vlan. > > #tcpdump -nivlan3000 -e > 13:29:46.445351 e4:6f:13:aa:37:c1 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q > (0x8100), length 64: vlan 2000, p 0, ethertype ARP, Request who-has > 10.150.2.40 tell 10.150.2.50, length 46 > 13:29:46.445410 70:71:bc:cc:fb:d4 > b8:38:61:1a:8e:a1, ethertype 802.1Q > (0x8100), length 90: vlan 107, p 0, ethertype MPLS unicast, MPLS (label 21, > exp 0, ttl 255) (label 20, exp 0, [S], ttl 255) > 13:29:46.445820 b8:38:61:1a:8e:a1 > 70:71:bc:cc:fb:d4, ethertype 802.1Q > (0x8100), length 86: vlan 107, p 0, ethertype MPLS unicast, MPLS (label 16, > exp 0, [S], ttl 254) > 13:29:46.445870 e4:6f:13:aa:38:c1 > e4:6f:13:aa:37:c1, ethertype ARP > (0x0806), length 60: Reply 10.150.2.40 is-at e4:6f:13:aa:38:c1, length 46 > > First line is arp-request. (vlan2000) > Line two is arp-request sent over MPLS. (vlan107) > Line three is arp-answer sent over MPLS. (vlan107) > But forth line in simple untagged answer that got to be sent over vlan2000. > > I haven't checked if this issue is broadcast only or unicast has same > problem. I haven't checked if it is mpw only problem or any point-to-point > interface strives too. Bridge works ok with bridging two vlans. > > ospfd.conf and ldpd.conf are as simple as it could be. > > ospfd.conf: > router-id 10.128.0.10 > > area 0.0.0.0 { >interface vlan107 >interface lo1 > } > > > ldpd.conf: > router-id 10.128.0.10 > > l2vpn OFFICE type vpls { > bridge bridge0 > interface vlan2000 > pseudowire mpw0 { > neighbor-id 10.128.0.9 > pw-id > } > } > > address-family ipv4 { >interface vlan107 > } > > > Things that confuses me is why vlan bridging over mpls isn't working in my > setup. Bridging client's vlan
Re: pair(4) crashes on strict alignment platforms
On Thu, Mar 08, 2018 at 07:44:13PM +0100, Stefan Sperling wrote: > Running the first set of example commands from the pair(4) man page > crashes the kernel on at least sparc64 and octeon. > > # ifconfig pair1 rdomain 1 10.1.1.1/24 up > # ifconfig pair2 rdomain 2 10.1.1.2/24 up > # ifconfig pair1 patch pair2 > # route -T 1 exec ping 10.1.1.2 > > A netcat<->telnet connection from 10.1.1.1 to 10.1.1.2 works. > > It seems the problem only happens with ping, or short packets in general. > It looks like the crash is happening while processing the icmp echo reply. > This code in ip_input_if() calls m_pullup() which ends up setting m->m_data > to an unaligned address: > > if (m->m_len < sizeof (struct ip) && > (m = *mp = m_pullup(m, sizeof (struct ip))) == NULL) { > ipstat_inc(ips_toosmall); > goto bad; > } > ip = mtod(m, struct ip *); > if (ip->ip_v != IPVERSION) { // we crash here because ip is misaligned > > Note that pair(4) has dequeued this mbuf from its send queue and doesn't > modify it except for resetting the packet header if it exists. > > Trace from sparc64: > > panic: trap type 0x34 (mem address not aligned): pc=11336d4 npc=11336d8 > pstate=44820006> Stopped at db_enter+0x8: nop > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *291733 6703 0 0x14000 0x2000 softnet > trap(40016e6b890, 34, 11336d4, 44820006, 400023ff800, 8848) at trap+0x2e0 > Lslowtrap_reenter(400023ace00, 0, , 1c176d0, 4, 1) at > Lslowtrap_reenter+0xf8 > ip_input_if(40016e6bb48, 40016e6bb54, 4, 0, 400023ff800, 8848) at > ip_input_if+0x120 > ipv4_input(400023ff800, 18184e8, , 1c176d0, 4, 1) at > ipv4_input+0x3c > ether_input(400023ff800, 400023ace00, 0, 16545e8, , 8848) at > ether_input+0xc8 > if_input_process(400021607c0, 40016e6bde0, 131cb20, 1c176d0, 4, 0) at > if_input_process+0x11c > taskq_thread(4000216c080, 40002142fc0, 1758938, 16545e8, 0, 3b9ac800) at > taskq_thread+0x6c > proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x14 > https://www.openbsd.org/ddb.html describes the minimum info required in bug > reports. > Insufficient info makes it difficult to find and fix bugs. > the diff below fixes the panic. the problem is m_adj doesn't keep m_data updated when removing all the data from one mbuf in a chain. in more detail, when we read the ping packet from userland into the kernel, it's put at the front of an mbuf, which is properly aligned for an ip packet. pair is an ethernet interface, so to send it an ethernet header is prepended. because the ip packet is at the start of an mbuf we allocate a new one for the ethernet header. that ends up being 6 bytes at the end of a new mbuf, with the end of the ethernet header properly aligned for an ip packet. pair then sends the packet back into the network stack, which m_adjs the ethernet header off the front of the packet. m_adj never deletes mbufs, it just sets their lengths to 0 if there's no more data left in an mbuf. because the ethernet header is all thats in that first mbuf it sets m_len to 0, but doesn't touch m_data. the ip stack then tries to access the ip header in the mbuf chain. because the first mbuf has 0 bytes in it it uses m_pulldown to get to the ip header. m_pulldown goes to a lot of lengths to maintain the alignment of the payload in an mbuf. because m_adj left m_data where an ethernet header is, ie, 6 bytes with the end aligned for a payload), m_pulldown puts the ip header where the ethernet header was, which is 2 byte aligned. netcat works because non-raw ip sockets have their headers allocated with space at the start for link headers. this diff makes m_adj update m_data in all paths. another option could be to have m_pullup skip over mbufs with m_len == 0 before finding the target alignment. or do both. Index: uipc_mbuf.c === RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v retrieving revision 1.253 diff -u -p -r1.253 uipc_mbuf.c --- uipc_mbuf.c 16 Jan 2018 19:44:34 - 1.253 +++ uipc_mbuf.c 9 Mar 2018 01:34:05 - @@ -812,11 +812,12 @@ m_adj(struct mbuf *mp, int req_len) while (m != NULL && len > 0) { if (m->m_len <= len) { len -= m->m_len; + m->m_data += m->m_len; /* move alignment */ m->m_len = 0; m = m->m_next; } else { - m->m_len -= len; m->m_data += len; + m->m_len -= len; len = 0; } }
Re: m_pullup(9) regression
> On 8 Nov 2016, at 20:01, Martin Pieuchotwrote: > > semarie exposed a bug in m_pullup(9) while testing my diff to > automatically create lo(4) interfaces per rdomain. > > In the block below ``m'' is dereferenced without being previously set. > > Is the diff below correct? yes. ok by me. > > Index: kern/uipc_mbuf.c > === > RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v > retrieving revision 1.237 > diff -u -p -r1.237 uipc_mbuf.c > --- kern/uipc_mbuf.c 27 Oct 2016 03:29:55 - 1.237 > +++ kern/uipc_mbuf.c 8 Nov 2016 09:57:06 - > @@ -896,7 +896,7 @@ m_pullup(struct mbuf *n, int len) > if (len > tail - mtod(n, caddr_t)) { > /* need to memmove to make space at the end */ > memmove(head, mtod(n, caddr_t), n->m_len); > - m->m_data = head; > + n->m_data = head; > } > > len -= n->m_len;
Re: hang at scsibus1 at mpii0: 128 targets
On Mon, Oct 31, 2016 at 02:58:10PM +0100, Simon Mages wrote: > >Synopsis: OpenBSD-current hang at scsibus1 at mpii0: 128 targets > >Category: driver issue > >Environment: > System : OpenBSD 6.0 > Details : OpenBSD 6.0-current (GENERIC.MP) #2: Mon Oct 31 > 14:39:18 CET 2016 > > r...@somebox.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 > >Description: > after rebuilding the current kernel and booting it the machine hangs > at "scsibus1 at mpii0: 128 targets". can you try this? Index: mpii.c === RCS file: /cvs/src/sys/dev/pci/mpii.c,v retrieving revision 1.107 diff -u -p -r1.107 mpii.c --- mpii.c 24 Oct 2016 01:50:09 - 1.107 +++ mpii.c 1 Nov 2016 00:13:56 - @@ -888,6 +888,9 @@ mpii_scsi_probe(struct scsi_link *link) if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED)) return (1); + if (ISSET(flags, MPII_DF_VOLUME)) + return (0); + memset(, 0, sizeof(ehdr)); ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED; ehdr.page_number = 0;
Re: changing lladdr on vlan(4) breaks rx-path
On Tue, Dec 22, 2015 at 10:11:23PM +0100, Paul de Weerd wrote: > Hi all, > > My ISP requires me to use different MAC addresses for internet and TV > access. These services arrive at the demarc as two different ethernet > VLANs, vlan4 (television) and vlan34 (internet), on one copper GE > port. > > I can get a lease on vlan34 just fine, and have internet access. Then > I change the MAC on my vlan4 interface: > > $ ifconfig vlan4 lladdr > > However, dhclient doesn't give me a lease. To debug, I run tcpdump > and try again: lo, an offer arrives. > > Turns out, vlan4 only works when the parent interface (em2, in my > case) is in promiscuous mode. Seems to make sense from a naive > point of view: the interface filters traffic that's destined for it's > own MAC address, dropping traffic for other MACs. Is this a problem > with the em(4) driver, with vlan(4) or elsewhere? i think vlan. the diff below allows a vlan interface to be configured with a custom mac address. if that is done, itll mark itself as having a custom lladdr and will in turn enable promisc on the parent interface so those packets will actually end up coming into the kernel. it supports setting the mac address while the vlan interface is both up and down. if you set the mac address on the vlan interface to 00:00:00:00:00:00, itll treat that as removing the custom mac and will replace it with the parents mac address. lastly, this makes no effort to cope with the mac address of the parent interface being changed at runtime. Index: if_vlan_var.h === RCS file: /cvs/src/sys/net/if_vlan_var.h,v retrieving revision 1.35 diff -u -p -r1.35 if_vlan_var.h --- if_vlan_var.h 15 Apr 2016 04:34:10 - 1.35 +++ if_vlan_var.h 19 Apr 2016 13:30:31 - @@ -81,7 +81,8 @@ structifvlan { #defineifv_tag ifv_mib.ifvm_tag #defineifv_prioifv_mib.ifvm_prio #defineifv_typeifv_mib.ifvm_type -#defineIFVF_PROMISC0x01 +#defineIFVF_PROMISC0x01/* the parent should be made promisc */ +#defineIFVF_LLADDR 0x02/* don't inherit the parents mac */ struct mbuf*vlan_inject(struct mbuf *, uint16_t, uint16_t); #endif /* _KERNEL */ Index: if_vlan.c === RCS file: /cvs/src/sys/net/if_vlan.c,v retrieving revision 1.162 diff -u -p -r1.162 if_vlan.c --- if_vlan.c 15 Apr 2016 04:34:10 - 1.162 +++ if_vlan.c 19 Apr 2016 13:30:31 - @@ -92,8 +92,6 @@ int vlan_up(struct ifvlan *); intvlan_parent_up(struct ifvlan *, struct ifnet *); intvlan_down(struct ifvlan *); -intvlan_promisc(struct ifvlan *, int); - void vlan_ifdetach(void *); void vlan_link_hook(void *); void vlan_link_state(struct ifvlan *, u_char, u_int64_t); @@ -107,6 +105,9 @@ int vlan_multi_del(struct ifvlan *, stru void vlan_multi_apply(struct ifvlan *, struct ifnet *, u_long); void vlan_multi_free(struct ifvlan *); +intvlan_iff(struct ifvlan *); +intvlan_setlladdr(struct ifvlan *, struct ifreq *); + intvlan_set_compat(struct ifnet *, struct ifreq *); intvlan_get_compat(struct ifnet *, struct ifreq *); @@ -432,6 +433,7 @@ vlan_parent_up(struct ifvlan *ifv, struc if_ih_insert(ifp0, vlan_input, NULL); return (0); + } int @@ -470,7 +472,8 @@ vlan_up(struct ifvlan *ifv) /* parent is fine, let's prepare the ifv to handle packets */ ifp->if_hardmtu = hardmtu; SET(ifp->if_flags, ifp0->if_flags & IFF_SIMPLEX); - if_setlladdr(ifp, LLADDR(ifp0->if_sadl)); + if (!ISSET(ifv->ifv_flags, IFVF_LLADDR)) + if_setlladdr(ifp, LLADDR(ifp0->if_sadl)); if (ifv->ifv_type != ETHERTYPE_VLAN) { /* @@ -522,7 +525,8 @@ leave: rw_exit(_tagh_lk); scrub: ifp->if_capabilities = 0; - if_setlladdr(ifp, etheranyaddr); + if (!ISSET(ifv->ifv_flags, IFVF_LLADDR)) + if_setlladdr(ifp, etheranyaddr); CLR(ifp->if_flags, IFF_SIMPLEX); ifp->if_hardmtu = 0x; put: @@ -564,7 +568,8 @@ vlan_down(struct ifvlan *ifv) rw_exit_write(_tagh_lk); ifp->if_capabilities = 0; - if_setlladdr(ifp, etheranyaddr); + if (!ISSET(ifv->ifv_flags, IFVF_LLADDR)) + if_setlladdr(ifp, etheranyaddr); CLR(ifp->if_flags, IFF_SIMPLEX); ifp->if_hardmtu = 0x; @@ -617,29 +622,6 @@ vlan_link_state(struct ifvlan *ifv, u_ch } int -vlan_promisc(struct ifvlan *ifv, int promisc) -{ - struct ifnet *ifp0; - int error = 0; - - if ((ISSET(ifv->ifv_flags, IFVF_PROMISC) ? 1 : 0) == promisc) - return (0); - - ifp0 = if_get(ifv->ifv_ifp0); - if (ifp0 != NULL) { - error = ifpromisc(ifp0, promisc); - } - if_put(ifp0); - - if (error == 0) { - CLR(ifv->ifv_flags,
Re: re(4) driver issues on RTL8168F
hey marc, is this reproducable? if so, next time it happens can you collect ifconfig, systat mb, and vmstat -m output? im looking for IFF_OACTIVE in ifconfig, an ALIVE value that is different to the CWM in systat mb, or really high numbers against the mbuf pools in vmstat -m output. cheers, dlg > On 9 Mar 2016, at 04:23, Marc Espiewrote: > > Well, I've got this on my desktop for a while, didn't work at the time, got > a cheap pci card that worked. > > However, I got a NAS, with two network ports, and I thought "let's try > dedicating one port to the desktop, that way I get 1Gb instead of going > thru the switch". > > > Unfortunately, re(4) is still broken in the exact same way as when I got the > controler. > > Symptom: > during rsync copy, card completely freezes, tcpdump shows no packets. > ^C in the copying window gets the card "back" up to speed. > > /var/log/messages shows this each time: > Mar 3 17:34:44 nausicaa /bsd: re0: watchdog timeout > Mar 3 17:34:44 nausicaa /bsd: re0: stopping TXQ timed out! > > There's probably something missing on that particular make of the controler. > I note that the recent tests DON'T involve any RTL8168F, lucky me. > > Suggestions/fix welcome... > > dmesg: > > OpenBSD 5.9-current (GENERIC.MP) #1905: Sun Mar 6 19:13:08 MST 2016 >dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 8522469376 (8127MB) > avail mem = 8259854336 (7877MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb600 (49 entries) > bios0: vendor American Megatrends Inc. version "0601" date 12/25/2012 > bios0: ASUSTeK COMPUTER INC. CM1435 > acpi0 at bios0: rev 2 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP APIC FPDT MCFG HPET MSDM SSDT SSDT IVRS BGRT > acpi0: wakeup devices SBAZ(S4) PS2K(S4) PS2M(S4) OHC1(S4) EHC1(S4) OHC2(S4) > EHC2(S4) OHC3(S4) EHC3(S4) OHC4(S4) XHC0(S4) XHC1(S4) PE21(S4) RLAN(S4) > PE22(S4) PE23(S4) [...] > acpitimer0 at acpi0: 3579545 Hz, 32 bits > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 16 (boot processor) > cpu0: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.45 MHz > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1 > cpu0: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative > cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges > cpu0: apic clock running at 100MHz > cpu0: mwait min=64, max=64, IBE > cpu1 at mainbus0: apid 17 (application processor) > cpu1: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz > cpu1: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1 > cpu1: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu1: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative > cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative > cpu1: smt 0, core 1, package 0 > cpu2 at mainbus0: apid 18 (application processor) > cpu2: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1 > cpu2: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu2: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative > cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative > cpu2: smt 0, core 2, package 0 > cpu3 at mainbus0: apid 19 (application processor) > cpu3: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz > cpu3: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1 > cpu3: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line > 16-way L2 cache > cpu3: ITLB 48 4KB
Re: alignment fault on armv7 when using carp(4)
On Mon, Feb 08, 2016 at 11:02:06PM +1000, David Gwynne wrote: > On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote: > > >Synopsis: > > > > To me that behavior might suggest the problem is deeper than a > > bookkeeping mistake of aligning memory in mbuf. > > nope, you were right, it's a screwup with alignment. > > the problem is multicast packets that arent to a carp interfaces > mac address have to be duplicated and sent to all carp interfaces > on a parent. the duplication is done with m_copym2, which doesn't > respect the alignment requirements of the ip header inside the 14 > byte ethernet header. > > the following dups the packet inside carp, and makes sure the > ethernet payload is aligned properly. > > i was able to reproduce this on sparc64, and i believe this fixes > it. could you test it and see if it helps? mpi@ pointed out that bridge@ has a special function to do a deep copy of mbufs that get the ip payload alignment right, and that we should share. this moves the functionality in with the rest of the mbuf functions. could a bridge user test this to see if it still works? carp seems fine with this on sparc64 stil. ok? Index: kern/uipc_mbuf.c === RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v retrieving revision 1.218 diff -u -p -r1.218 uipc_mbuf.c --- kern/uipc_mbuf.c31 Jan 2016 00:18:07 - 1.218 +++ kern/uipc_mbuf.c9 Feb 2016 09:49:21 - @@ -1213,6 +1213,40 @@ m_dup_pkthdr(struct mbuf *to, struct mbu return (0); } +struct mbuf * +m_dup_pkt(struct mbuf *m0, unsigned int adj, int wait) +{ + struct mbuf *m; + int len; + + len = m0->m_pkthdr.len + adj; + if (len > MAXMCLBYTES) /* XXX */ + return (NULL); + + m = m_get(m0->m_type, wait); + if (m == NULL) + return (NULL); + + if (m_dup_pkthdr(m, m0, wait) != 0) + goto fail; + + if (len > MHLEN) { + MCLGETI(m, len, NULL, wait); + if (!ISSET(m->m_flags, M_EXT)) + goto fail; + } + + m->m_len = m->m_pkthdr.len = len; + m_adj(m, adj); + m_copydata(m0, 0, m0->m_pkthdr.len, mtod(m, caddr_t)); + + return (m); + +fail: + m_freem(m); + return (NULL); +} + #ifdef DDB void m_print(void *v, Index: net/if_bridge.c === RCS file: /cvs/src/sys/net/if_bridge.c,v retrieving revision 1.275 diff -u -p -r1.275 if_bridge.c --- net/if_bridge.c 5 Dec 2015 10:07:55 - 1.275 +++ net/if_bridge.c 9 Feb 2016 09:49:21 - @@ -137,7 +137,6 @@ int bridge_ipsec(struct bridge_softc *, int bridge_clone_create(struct if_clone *, int); intbridge_clone_destroy(struct ifnet *ifp); intbridge_delete(struct bridge_softc *, struct bridge_iflist *); -struct mbuf *bridge_m_dup(struct mbuf *); #defineETHERADDR_IS_IP_MCAST(a) \ /* struct etheraddr *a; */ \ @@ -800,7 +799,7 @@ bridge_output(struct ifnet *ifp, struct used = 1; mc = m; } else { - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT); if (mc == NULL) { sc->sc_if.if_oerrors++; continue; @@ -1090,7 +1089,7 @@ bridge_process(struct ifnet *ifp, struct (ifl->bif_state == BSTP_IFSTATE_DISCARDING)) goto reenqueue; - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT); if (mc == NULL) goto reenqueue; @@ -1227,7 +1226,7 @@ bridge_broadcast(struct bridge_softc *sc mc = m; used = 1; } else { - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT); if (mc == NULL) { sc->sc_if.if_oerrors++; continue; @@ -1277,7 +1276,7 @@ bridge_localbroadcast(struct bridge_soft return; } - m1 = bridge_m_dup(m); + m1 = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT); if (m1 == NULL) { sc->sc_if.if_oerrors++; return; @@ -2017,37 +2016,4 @@ bridge_copyaddr(struct sockaddr *src, st memcpy(dst, src, src->sa_len); else dst->sa_family = AF_UNSPEC; -} - -/* - * Specialized deep copy to ensure that the payload after the Ethernet - * header is nicely aligned. - */ -struct mbuf * -bridge_m
Re: alignment fault on armv7 when using carp(4)
On Mon, Feb 08, 2016 at 05:45:50PM -0500, Anthony Eden wrote: > Thanks for the quick response. Indeed, the patch makes the alignment > faults go away. But the 'device timeout' messages coming from cpsw(4) > remain. > > To elaborate a bit, I set up three terminals pinging 192.168.123.201, > 192.168.123.202, and 192.168.123.222 (the shared IP). After ~1min I > get no answers from 192.168.123.201 and 192.168.123.201 (although the > times differ). For a few minutes the hosts remain unreachable. dmesg > output looks like > > carp0: state transition: BACKUP -> MASTER > carp0: state transition: MASTER -> BACKUP > carp0: state transition: BACKUP -> MASTER > cpsw0: device timeout > carp0: state transition: MASTER -> BACKUP > carp0: state transition: BACKUP -> MASTER > cpsw0: device timeout > ... > > This bug seems unrelated to the alignment faults issue. I would > investigate given some pointers in the right direction. > > If this is under the purview of cpsw(4), would it be advisable to > submit a new bug report? id mail canacar@, he's been working on cpsw.
Re: alignment fault on armv7 when using carp(4)
> On 9 Feb 2016, at 9:12 PM, Mike Belopuhov <m...@belopuhov.com> wrote: > > On 9 February 2016 at 11:31, David Gwynne <da...@gwynne.id.au> wrote: >> On Mon, Feb 08, 2016 at 11:02:06PM +1000, David Gwynne wrote: >>> On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote: >>>>> Synopsis: >>>> >>>>To me that behavior might suggest the problem is deeper than a >>>>bookkeeping mistake of aligning memory in mbuf. >>> >>> nope, you were right, it's a screwup with alignment. >>> >>> the problem is multicast packets that arent to a carp interfaces >>> mac address have to be duplicated and sent to all carp interfaces >>> on a parent. the duplication is done with m_copym2, which doesn't >>> respect the alignment requirements of the ip header inside the 14 >>> byte ethernet header. >>> >>> the following dups the packet inside carp, and makes sure the >>> ethernet payload is aligned properly. >>> >>> i was able to reproduce this on sparc64, and i believe this fixes >>> it. could you test it and see if it helps? >> >> mpi@ pointed out that bridge@ has a special function to do a deep >> copy of mbufs that get the ip payload alignment right, and that we >> should share. >> >> this moves the functionality in with the rest of the mbuf functions. >> >> could a bridge user test this to see if it still works? carp seems >> fine with this on sparc64 stil. >> >> ok? >> > > m_adj can be done as part of the m_copym2 as well. you want to shove m_adj into m_copym2? or you want m_copym2 callers to m_prepend 2 bytes first? > In the long run I don't think that introducing a new function > makes sense, not sure about 5.9 and right now, though. im not sure using m_copym2 for a deep copy makes that much sense generally. it's not a great implementation, and the vast majority of the callers use it to copy everything.
Re: alignment fault on armv7 when using carp(4)
On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote: > >Synopsis: > >Category: arm > >Environment: > System : OpenBSD 5.9 > Details : OpenBSD 5.9 (DBGGENERIC) #0: Sat Feb 6 12:22:27 EST 2016 > r...@beagle2.mit.edu:/usr/src/sys/arch/armv7/compile/DBGGENERIC > > Architecture: OpenBSD.armv7 > Machine : armv7 > >Description: > With two beaglebone black's running -current, an alignment fault is > encountered at ip_input.c:262 in ipv4_input() when they are > configured to use carp(4) to share the same IP address. > > Source context from ip_input.c (alignment fault occurs when > ip->ip_dst.s_addr is loaded at line 262): > > 258:ip = mtod(m, struct ip *); > 259:} > 260: > 261:/* 127/8 must not appear on wire - RFC1122 */ > 262:if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET > || > 263: (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) { > 264:if ((ifp->if_flags & IFF_LOOPBACK) == 0) { > 265:ipstat.ips_badaddr++; > 266:goto bad; > > ddb(4) output: > > $ Fatal kernel mode data abort: 'Alignment Fault 1' > trapframe: 0xcb2d8e40 > DFSR=0001, DFAR=c4cb401e, spsr=8013 > r0 =c924d400, r1 =0003, r2 =0045, r3 =0038 > r4 =c4cb400e, r5 =c06f2ca4, r6 =0014, r7 =c4d65800 > r8 =c0710e50, r9 =c069294c, r10=c0692918, r11=cb2d8eb8 > r12=6093, ssp=cb2d8e8c, slr=c040bc88, pc =c04616ec > > Stopped at ipv4_input+0x9c:ldrls r3, [r4, #0x010] > ddb> trace > ipv4_input+0xc > scp=0xc046165c rlv=0xc0461ab4 (ipintr+0x24) > rsp=0xcb2d8ebc rfp=0xcb2d8ecc > r10=0xc0692918 r8=0xc0710e50 r7=0xc06edd88 r6=0xc06edd88 > r5=0x r4=0x0004 > ipintr+0xc > scp=0xc0461a9c rlv=0xc041b290 (netintr+0xa0) > rsp=0xcb2d8ed0 rfp=0xcb2d8ef0 > netintr+0xc > scp=0xc041b1fc rlv=0xc053f3d0 (softintr_dispatch+0x84) > rsp=0xcb2d8ef4 rfp=0xcb2d8f10 > r7=0x r6=0xc0710eb4 r5=0xc0710ec0 r4=0xc89e13a0 > softintr_dispatch+0x18 > scp=0xc053f364 rlv=0xc053eef8 (arm_do_pending_intr+0x110) > rsp=0xcb2d8f14 rfp=0xcb2d8f40 > r6=0xc0710190 r5=0x2013 r4=0x0004 > arm_do_pending_intr+0x10 > scp=0xc053edf8 rlv=0xc040d9a8 (if_input_process+0xcc) > rsp=0xcb2d8f44 rfp=0xcb2d8f78 > r10=0xc0692918 r9=0x r8=0x r7=0xcb2d8f44 > r6=0x r5=0xc4d65800 r4=0xc4d57480 > if_input_process+0xc > scp=0xc040d8e8 rlv=0xc03b5c2c (taskq_thread+0x90) > rsp=0xcb2d8f7c rfp=0xcb2d8fb0 > r10=0xc06e643c r8=0xc06e65d8 r7=0xcb2d8f7c r6=0x0001 > r5=0xc89e2040 r4=0xc03b5b04 > taskq_thread+0xc > scp=0xc03b5ba8 rlv=0xc0536c10 (proc_trampoline+0x18) > rsp=0xcb2d8fb4 rfp=0xc07f3edc > r7=0x r6=0x r5=0xc89e2040 r4=0xc03b5b9c > Bad frame pointer: 0xc07f3edc > > this problem has also been encountered with both BB's running -stable. > > >How-To-Repeat: > Install either -current or -stable on two beaglebone black's, with names > beagle1 and beagle2. On a LAN 192.168.123.0/24 with default > gateway 192.168.123.2, set /etc/mygate to 192.168.123.2 on beagle1 and > beagle2, then set /etc/hostname.cpsw0 on beagle1 to be > > inet 192.168.123.201 255.255.255.0 NONE > > and on beagle2 > > inet 192.168.123.202 255.255.255.0 NONE > > then run the following commands on both to use carp(4): > > doas ifconfig carp0 create > doas ifconfig carp0 vhid 1 pass tyrell carpdev cpsw0 192.168.123.222 > netmask 255.255.255.0 > > shortly thereafter a beaglebone will encounter an alignment fault. > > >Fix: > The cause of this problem is unknown to me. I would speculate that the > issue lies in m_pullup mishandling alignment, given that netowkring on > the beaglebone black usually functions normally, and that there are > branches prior to the crash in which m_pullup is used in deriving a > pointer to ip, which when using carp(4) apparently misaligned. > > In investigating this issue further, I replaced offending 32-bit loads > in the kernel with calls to get_unaligned_le32(), defined as (from > linux/unaligned/packed_struct.h): > > struct __una_u32 { u32 x; } __packed; > static inline u32 get_unaligned_le32(const void *p) { > const struct __una_u32 *ptr = (const struct __una_u32 *)p; > return ptr->x; > } > > Other than replacements in ip_input.c, udp_usrreq.c was also changed as > well as the macros IN6_IS_ADDR_UNSPECIFIED, IN6_IS_ADDR_LOOPBACK, > IN6_IS_ADDR_V4COMPAT, and IN6_IS_ADDR_V4MAPPED in in6.h. > > This resulted in carp(4) appearing to function normally, but beagle1 > and beagle2 repeatedly lost networking temporarily and recurrent > 'device timeout's appeared in dmesg (as well as carp(4) messages > informing state changes from master to
Re: vr(4) mbuf leak
> On 25 Jan 2016, at 9:19 AM, Richard Procter> wrote: > > > vr(4) leaks mbufs on vr_encap() failure: > > - neither vr_encap() nor vr_start() call m_free*() > - *m has been dequeued at call to vr_encap() > > Tested by forcing vr_encap() failure and observing 'netstat -m' > > While here, prettify a NULL test. absolutely correct. i just committed these tweaks. thank you, dlg > > best, > Richard. > > vr0 at pci0 dev 9 function 0 "VIA VT6105M RhineIII" rev 0x96: irq 10, > address 00:0d:b9:xx:xx:xx > ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI > 0x004063, model 0x0034 > > Index: sys/dev/pci/if_vr.c > === > --- sys.orig/dev/pci/if_vr.c > +++ sys/dev/pci/if_vr.c > @@ -1324,12 +1324,13 @@ vr_start(struct ifnet *ifp) > } > > IFQ_DEQUEUE(>if_snd, m); > - if (m== NULL) > + if (m == NULL) > break; > > /* Pack the data into the descriptor. */ > head_tx = cur_tx; > if (vr_encap(sc, _tx, m)) { > + m_freem(m); > ifp->if_oerrors++; > continue; > } >
Re: system crash on latest snapshot
> On 12 Jan 2016, at 22:24, Timo Myyräwrote: >>> > > No joy in sight, just got another router crash while streaming video (tablet) > and downloading files (laptop) at the same time through it. Nothing in the > logs > so no details to share. Both clients were connected to the crashing > router/firewall with its athn0 wireless adapter. I quess the increased stress > to > egress em0 interface trigger the crash but as I don't have any real facts to > share. > > I updated kernel this morning as I saw you made some changes to if_em.c to see > it would help. Doesn't seem to be the case for me. do you know if it was the same panic or what the trace was like? dlg
Re: system crash on latest snapshot
Any joy? Out the other one where it panics? On 12 Jan 2016 04:43, "Timo Myyrä" <timo.my...@wickedbsd.net> wrote: > David Gwynne <da...@gwynne.id.au> writes: > > >> On 11 Jan 2016, at 3:59 AM, timo.my...@wickedbsd.net wrote: > >> > >>> Synopsis: server occasional hangs / crashes on latest snapshot > >>> Category: kernel > >>> Environment: > >> System : OpenBSD 5.9 > >> Details : OpenBSD 5.9-beta (GENERIC.MP) #1807: Sat Jan 9 > 13:46:21 MST 2016 > >> dera...@amd64.openbsd.org: > /usr/src/sys/arch/amd64/compile/GENERIC.MP > >> > >> Architecture: OpenBSD.amd64 > >> Machine : amd64 > >>> Description: > >> > >> I updated my router box to 8.1. snapshot. Noticed that the system > >> would hang / crash occasional in the normal use. > >> /var/log/messages has following: > >> Jan 10 11:36:17 charon /bsd: em0: watchdog: cons 0 prod 503 free > 521 TDH 503 TDT 503 > >> Jan 10 18:25:12 charon /bsd: em0: watchdog: cons 0 prod 494 free > 530 TDH 494 TDT 494 > >> Jan 10 18:29:40 charon /bsd: fatal protection fault in supervisor > mode > >> Jan 10 18:29:40 charon /bsd: trap type 4 code 0 rip > 811f2d60 cs 8 rflags 10282 cr2 ef9ef4cf010 cpl 5 rsp > 8000319c38c8 > >> Jan 10 18:29:40 charon /bsd: panic: trap type 4, code=0, > pc=811f2d60 > >> Jan 10 18:29:40 charon /bsd: Starting stack trace... > >> Jan 10 18:29:40 charon /bsd: panic() at panic+0x10b > >> Jan 10 18:29:40 charon /bsd: trap() at trap+0x7b8 > >> Jan 10 18:29:40 charon /bsd: --- trap (number 4) --- > >> Jan 10 18:29:40 charon /bsd: _bpf_mtap() at _bpf_mtap+0x50 > >> Jan 10 18:29:40 charon /bsd: bpf_mtap_ether() at > bpf_mtap_ether+0x39 > >> Jan 10 18:29:40 charon /bsd: em_start() at em_start+0xa7 > >> Jan 10 18:29:40 charon /bsd: ifq_serialize() at ifq_serialize+0xd9 > >> Jan 10 18:29:40 charon /bsd: if_enqueue() at if_enqueue+0x71 > >> Jan 10 18:29:40 charon /bsd: ether_output() at ether_output+0x166 > >> Jan 10 18:29:40 charon /bsd: ip_output() at ip_output+0x71b > >> Jan 10 18:29:40 charon /bsd: ip_forward() at ip_forward+0x1cb > >> Jan 10 18:29:40 charon /bsd: ipv4_input() at ipv4_input+0x309 > >> Jan 10 18:29:40 charon /bsd: ipintr() at ipintr+0x1e > >> Jan 10 18:29:40 charon /bsd: netintr() at netintr+0x64 > >> Jan 10 18:29:40 charon /bsd: softintr_dispatch() at > softintr_dispatch+0x8b > >> Jan 10 18:29:40 charon /bsd: Xsoftnet() at Xsoftnet+0x1f > >> Jan 10 18:29:40 charon /bsd: --- interrupt --- > >> Jan 10 18:29:40 charon /bsd: end trace frame: 0x0, count: 242 > >> Jan 10 18:29:40 charon /bsd: taskq_thread+0x6c: > >> Jan 10 18:29:40 charon /bsd: End of stack trace. > >> > >> > >>> How-To-Repeat: > >> Keep the system running. > >>> Fix: > >> not known > > > > ola, > > > > could you try -current, specifically after src/sys/dev/pci/if_em.c > r1.328, and see if this is still a problem? > > > > dlg > > Ok, > > I updated to the latest snapshot (dated Mon Jan 11 > 01:47:58 MST 2016) and with it system seemed to crash while I was away. > Couldn't see anything in the logs nor was the display or keyboard > responsive so > I haven't got any details out of it. > > Just updated the source tree and compiled kernel from latest sources. > Waiting to see > how it behaves with it. > > Timo > >
Re: system crash on latest snapshot
> On 11 Jan 2016, at 3:59 AM, timo.my...@wickedbsd.net wrote: > >> Synopsis:server occasional hangs / crashes on latest snapshot >> Category:kernel >> Environment: > System : OpenBSD 5.9 > Details : OpenBSD 5.9-beta (GENERIC.MP) #1807: Sat Jan 9 13:46:21 > MST 2016 > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 >> Description: > > I updated my router box to 8.1. snapshot. Noticed that the system > would hang / crash occasional in the normal use. > /var/log/messages has following: > Jan 10 11:36:17 charon /bsd: em0: watchdog: cons 0 prod 503 free 521 > TDH 503 TDT 503 > Jan 10 18:25:12 charon /bsd: em0: watchdog: cons 0 prod 494 free 530 > TDH 494 TDT 494 > Jan 10 18:29:40 charon /bsd: fatal protection fault in supervisor mode > Jan 10 18:29:40 charon /bsd: trap type 4 code 0 rip 811f2d60 cs > 8 rflags 10282 cr2 ef9ef4cf010 cpl 5 rsp 8000319c38c8 > Jan 10 18:29:40 charon /bsd: panic: trap type 4, code=0, > pc=811f2d60 > Jan 10 18:29:40 charon /bsd: Starting stack trace... > Jan 10 18:29:40 charon /bsd: panic() at panic+0x10b > Jan 10 18:29:40 charon /bsd: trap() at trap+0x7b8 > Jan 10 18:29:40 charon /bsd: --- trap (number 4) --- > Jan 10 18:29:40 charon /bsd: _bpf_mtap() at _bpf_mtap+0x50 > Jan 10 18:29:40 charon /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39 > Jan 10 18:29:40 charon /bsd: em_start() at em_start+0xa7 > Jan 10 18:29:40 charon /bsd: ifq_serialize() at ifq_serialize+0xd9 > Jan 10 18:29:40 charon /bsd: if_enqueue() at if_enqueue+0x71 > Jan 10 18:29:40 charon /bsd: ether_output() at ether_output+0x166 > Jan 10 18:29:40 charon /bsd: ip_output() at ip_output+0x71b > Jan 10 18:29:40 charon /bsd: ip_forward() at ip_forward+0x1cb > Jan 10 18:29:40 charon /bsd: ipv4_input() at ipv4_input+0x309 > Jan 10 18:29:40 charon /bsd: ipintr() at ipintr+0x1e > Jan 10 18:29:40 charon /bsd: netintr() at netintr+0x64 > Jan 10 18:29:40 charon /bsd: softintr_dispatch() at > softintr_dispatch+0x8b > Jan 10 18:29:40 charon /bsd: Xsoftnet() at Xsoftnet+0x1f > Jan 10 18:29:40 charon /bsd: --- interrupt --- > Jan 10 18:29:40 charon /bsd: end trace frame: 0x0, count: 242 > Jan 10 18:29:40 charon /bsd: taskq_thread+0x6c: > Jan 10 18:29:40 charon /bsd: End of stack trace. > > >> How-To-Repeat: > Keep the system running. >> Fix: > not known ola, could you try -current, specifically after src/sys/dev/pci/if_em.c r1.328, and see if this is still a problem? dlg
Re: httpd crashes when fetching a hidden file located on a CD
> On 11 Dec 2015, at 9:23 PM, Ted Unangstwrote: > > Ted Unangst wrote: >> Jonathan Gray wrote: There's one thing to add though, it looks like it happens for any file on cd9660, not just dotfiles. >>> >>> It is worth pointing out that httpd has had trouble serving files off >>> specific filesystems in the past due to kqueue issues. >>> >>> cd9660_vops does not currently set .vop_kqfilter, does anything change >>> if you set EVENT_NOKQUEUE before running httpd? >> >> this maybe adds kqueue to cd9660. > > We've confirmed this diff fixes the problem. However, there seems to be a > larger problem that httpd/libevent cannot gracefully handle the condition > where kevent returns an error. We are doomed to see this problem repeat if > that is not addressed. on one hand i agree with you, but on the other i wonder why httpd thinks setting events up on files is useful.
Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR
> On 8 Dec 2015, at 10:17, Steven Chamberlain <ste...@pyro.eu.org> wrote: > > Hi, > > David Gwynne wrote: >> hrm. could you try it without your diff below and see if its still stable? > > I can still reproduce the same crash, if I use your diff without mine. > > I'll try again with latest -CURRENT in light of these recent commits: > http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/vnet.c.diff?r1=1.51=1.52=h > http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/vnet.c.diff?r1=1.52=1.53=h thats a network driver for "hardware" you dont have on an x1. > I wonder, exactly which pools does dc(4) use? As with previous crashes, > dma512 had Pgreq=1, Pgrel=1 meaning it was empty at the time, and had > already been freed. dc uses the mbuf pools. the dma512 use comes from an ata driver from what i can tell. maybe i should try and get an x1 to reproduce this with.
Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR
> On 30 Nov 2015, at 9:01 AM, Steven Chamberlain <ste...@pyro.eu.org> wrote: > > Hi! > > David Gwynne wrote: >> could you guys try this and let me know how it goes? i dont expect >> it to fix the problems, but i also dont expect them to get worse. > > I've never seen those panics with dc(4), but I'm now testing -CURRENT > with your patch applied as well as mine below, and my Netra X1 is stable > so far. Thanks! hrm. could you try it without your diff below and see if its still stable? my theory is dc is (was?) sensitive to the layout of objects in memory, so by moving the pool page headers in or out of the item pages you're moving dc next to something that ends up causing the iommu to fault. cheers, dlg > > --- kern/subr_pool.c11 Sep 2015 09:26:13 - 1.193 > +++ kern/subr_pool.c29 Nov 2015 22:59:21 - > @@ -258,7 +258,7 @@ pool_init(struct pool *pp, size_t size, > */ >if (pgsize - (size * items) > sizeof(struct pool_item_header)) { >off = pgsize - sizeof(struct pool_item_header); > - } else if (sizeof(struct pool_item_header) * 2 >= size) { > + } else if (sizeof(struct pool_item_header) * 8 >= size) { >off = pgsize - sizeof(struct pool_item_header); >items = off / size; >} > > Regards, > -- > Steven Chamberlain > ste...@pyro.eu.org
Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR
> On 29 Nov 2015, at 5:55 AM, Fred <open...@crowsons.com> wrote: > > On 11/27/15 12:43, David Gwynne wrote: >> On Wed, Nov 25, 2015 at 08:32:19PM +, Fred wrote: >>> >>> Well with that diff I hit another panic - which seems to be >>> triggered by the nic: >> >> i also think this is related to the nic. i have started cleaning >> dc(4), but would like some tests before going further. >> >> could you guys try this and let me know how it goes? i dont expect >> it to fix the problems, but i also dont expect them to get worse. >> >> cheers, >> dlg >> > > Hi David, > > I've been running this diff today - and so far the kernel has been more > stable then it has been recently. ok. ill put it in. thank you for testing. ill have a look at another cleanup soon too. cheers, dlg > > Output from ifconfig dc, and dmesg below. > > Thanks > > Fred > > dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 > lladdr 00:03:ba:13:a8:c7 > priority: 0 > media: Ethernet autoselect (none) > status: no carrier > inet 192.168.50.50 netmask 0xff00 broadcast 192.168.50.255 > dc1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500 > lladdr 00:03:ba:13:a8:c8 > priority: 0 > groups: egress > media: Ethernet autoselect (100baseTX full-duplex) > status: active > inet 192.168.50.51 netmask 0xff00 broadcast 192.168.50.255 > > dmesg:OpenBSD 5.8-current (bsddc) #0: Sat Nov 28 14:12:26 GMT 2015 >f...@ultra.crowsons.com:/usr/src/sys/arch/sparc64/compile/bsddc > real mem = 268435456 (256MB) > avail mem = 248209408 (236MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz) > cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz > cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 > b/l) > psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0 > psycho0: bus range 0-0, PCI bus 0 > psycho0: dvma map 6000-7fff > pci0 at psycho0 > ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00 > "dma" at ebus0 addr 0- ivec 0x2a not configured > rtc0 at ebus0 addr 70-71: m5819 > power0 at ebus0 addr 2000-2007 ivec 0x23 > lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11 > com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo > com0: console > com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo > "flashprom" at ebus0 addr 0-7 not configured > alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock > iic0 at alipm0 > "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs > spdmem0 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2 > spdmem1 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2 > dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address > 00:03:ba:13:a8:c7 > amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0 > dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address > 00:03:ba:13:a8:c8 > amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0 > ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, > version 1.0, legacy support > pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, > channel 0 configured to native-PCI, channel 1 configured to native-PCI > pciide0: using ivec 0x7cc for native-PCI interrupt > pciide0: channel 0 disabled (no drives) > wd0 at pciide0 channel 1 drive 0: > wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors > atapiscsi0 at pciide0 channel 1 drive 1 > scsibus1 at atapiscsi0: 2 targets > cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable > wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2 > cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2 > usb0 at ohci0: USB revision 1.0 > uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1 > vscsi0 at root > scsibus2 at vscsi0: 256 targets > softraid0 at root > scsibus3 at softraid0: 256 targets > bootpath: /pci@1f,0/ide@d,0/disk@2,0 > root on wd0a (a0af7098621c5786.a) swap on wd0b dump on wd0b >
Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR
On Wed, Nov 25, 2015 at 08:32:19PM +, Fred wrote: > > Well with that diff I hit another panic - which seems to be > triggered by the nic: i also think this is related to the nic. i have started cleaning dc(4), but would like some tests before going further. could you guys try this and let me know how it goes? i dont expect it to fix the problems, but i also dont expect them to get worse. cheers, dlg Index: dc.c === RCS file: /cvs/src/sys/dev/ic/dc.c,v retrieving revision 1.148 diff -u -p -r1.148 dc.c --- dc.c25 Nov 2015 03:09:58 - 1.148 +++ dc.c27 Nov 2015 12:37:04 - @@ -125,8 +125,7 @@ int dc_intr(void *); struct dc_type *dc_devtype(void *); int dc_newbuf(struct dc_softc *, int, struct mbuf *); -int dc_encap(struct dc_softc *, struct mbuf *, u_int32_t *); -int dc_coal(struct dc_softc *, struct mbuf **); +int dc_encap(struct dc_softc *, bus_dmamap_t, struct mbuf *, u_int32_t *); void dc_pnic_rx_bug_war(struct dc_softc *, int); int dc_rx_resync(struct dc_softc *); @@ -1658,17 +1657,19 @@ hasmac: BUS_DMA_NOWAIT, >sc_rx_sparemap) != 0) { printf(": can't create rx spare map\n"); return; - } + } for (i = 0; i < DC_TX_LIST_CNT; i++) { if (bus_dmamap_create(sc->sc_dmat, MCLBYTES, - DC_TX_LIST_CNT - 5, MCLBYTES, 0, BUS_DMA_NOWAIT, + (sc->dc_flags & DC_TX_COALESCE) ? 1 : DC_TX_LIST_CNT - 5, + MCLBYTES, 0, BUS_DMA_NOWAIT, >dc_cdata.dc_tx_chain[i].sd_map) != 0) { printf(": can't create tx map\n"); return; } } - if (bus_dmamap_create(sc->sc_dmat, MCLBYTES, DC_TX_LIST_CNT - 5, + if (bus_dmamap_create(sc->sc_dmat, MCLBYTES, + (sc->dc_flags & DC_TX_COALESCE) ? 1 : DC_TX_LIST_CNT - 5, MCLBYTES, 0, BUS_DMA_NOWAIT, >sc_tx_sparemap) != 0) { printf(": can't create tx spare map\n"); return; @@ -2488,39 +2489,14 @@ dc_intr(void *arg) * pointers to the fragment pointers. */ int -dc_encap(struct dc_softc *sc, struct mbuf *m_head, u_int32_t *txidx) +dc_encap(struct dc_softc *sc, bus_dmamap_t map, struct mbuf *m, u_int32_t *idx) { struct dc_desc *f = NULL; int frag, cur, cnt = 0, i; - bus_dmamap_t map; - - /* -* Start packing the mbufs in this chain into -* the fragment pointers. Stop when we run out -* of fragments or hit the end of the mbuf chain. -*/ - map = sc->sc_tx_sparemap; - - if (bus_dmamap_load_mbuf(sc->sc_dmat, map, - m_head, BUS_DMA_NOWAIT) != 0) - return (ENOBUFS); - cur = frag = *txidx; + cur = frag = *idx; for (i = 0; i < map->dm_nsegs; i++) { - if (sc->dc_flags & DC_TX_ADMTEK_WAR) { - if (*txidx != sc->dc_cdata.dc_tx_prod && - frag == (DC_TX_LIST_CNT - 1)) { - bus_dmamap_unload(sc->sc_dmat, map); - return (ENOBUFS); - } - } - if ((DC_TX_LIST_CNT - - (sc->dc_cdata.dc_tx_cnt + cnt)) < 5) { - bus_dmamap_unload(sc->sc_dmat, map); - return (ENOBUFS); - } - f = >dc_ldata->dc_tx_list[frag]; f->dc_ctl = htole32(DC_TXCTL_TLINK | map->dm_segs[i].ds_len); if (cnt == 0) { @@ -2535,12 +2511,12 @@ dc_encap(struct dc_softc *sc, struct mbu } sc->dc_cdata.dc_tx_cnt += cnt; - sc->dc_cdata.dc_tx_chain[cur].sd_mbuf = m_head; + sc->dc_cdata.dc_tx_chain[cur].sd_mbuf = m; sc->sc_tx_sparemap = sc->dc_cdata.dc_tx_chain[cur].sd_map; sc->dc_cdata.dc_tx_chain[cur].sd_map = map; sc->dc_ldata->dc_tx_list[cur].dc_ctl |= htole32(DC_TXCTL_LASTFRAG); if (sc->dc_flags & DC_TX_INTR_FIRSTFRAG) - sc->dc_ldata->dc_tx_list[*txidx].dc_ctl |= + sc->dc_ldata->dc_tx_list[*idx].dc_ctl |= htole32(DC_TXCTL_FINT); if (sc->dc_flags & DC_TX_INTR_ALWAYS) sc->dc_ldata->dc_tx_list[cur].dc_ctl |= @@ -2551,43 +2527,9 @@ dc_encap(struct dc_softc *sc, struct mbu bus_dmamap_sync(sc->sc_dmat, map, 0, map->dm_mapsize, BUS_DMASYNC_PREWRITE); - sc->dc_ldata->dc_tx_list[*txidx].dc_status = htole32(DC_TXSTAT_OWN); - - bus_dmamap_sync(sc->sc_dmat, sc->sc_listmap, - offsetof(struct dc_list_data, dc_tx_list[*txidx]), - sizeof(struct dc_desc) * cnt, - BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); - - *txidx = frag; + sc->dc_ldata->dc_tx_list[*idx].dc_status = htole32(DC_TXSTAT_OWN); - return (0); -} - -/* - * Coalesce an mbuf chain into a single mbuf
Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR
> On 20 Nov 2015, at 09:48, Steven Chamberlainwrote: > > Hi, > > Fred wrote: >> I've just updated to: >> OpenBSD 5.8-current (GENERIC) #801: Wed Nov 18 16:37:51 MST 2015 > > I'd used: > OpenBSD 5.8-current (GENERIC) #799: Wed Nov 18 01:34:20 MST 2015 > >> but I still had the following panic: >> >> panic: psycho0: uncorrectable DMA error AFAR 663f8450 (pa=0 tte=0/6218a012) >> AFSR 41ff4080 > > Damn! although this was stable for a couple of hours yesterday > (and that's quite an improvement to how it was) - today I booted > it up and it crashed on my very first SSH login attempt. > > Thank you for re-testing anyway. I notice your machine crashed in > process sshd also. My backtrace looks slightly different to yours so I > share it here: my guess this is something to do with handling of long mbuf chains for transmit. i say that because ssh is very good at generating these long chains, and it has caused problems on various other chips. either that or the chip has alignment or minimum transfer requirements that such packets dont respect. if you're inclined to play with the code, try making the code coalesce (dc_coal) every packet and see if the problem still occurs. dlg > > panic: psycho0: uncorrectable DMA error AFAR 6e868448 (pa=0 tte=0/69a12012) > AFSR 4100ff002080 > Stopped at Debugger+0x8: nop > TIDPIDUID PRFLAGS PFLAGS CPU COMMAND > *13074 13074 00x12 00 sshd > psycho_ue(48a3200, 0, 4000fb13bc8, 0, 0, 0) at psycho_ue+0x7c > intr_handler(e0017ec8, 48a3300, 58c86, 4000fb13cd8, 1, 0) at > intr_handler+0xc > sparc_interrupt(0, 4000fb13db0, 4000fb13df0, 0, 0, 14b) at > sparc_interrupt+0x298 > syscall(4000fb13ed0, 404, 1cd56442e8, 1cd56442ec, 0, 0) at syscall+0x34c > softtrap(3, 1c49412ee4, 54, 0, 0, 0) at softtrap+0x19c > http://www.openbsd.org/ddb.html describes the minimum info required in bug > reports. Insufficient info makes it difficult to find and fix bugs. > ddb> trace > psycho_ue(48a3200, 0, 4000fb13bc8, 0, 0, 0) at psycho_ue+0x7c > intr_handler(e0017ec8, 48a3300, 58c86, 4000fb13cd8, 1, 0) at > intr_handler+0xc > sparc_interrupt(0, 4000fb13db0, 4000fb13df0, 0, 0, 14b) at > sparc_interrupt+0x298 > syscall(4000fb13ed0, 404, 1cd56442e8, 1cd56442ec, 0, 0) at syscall+0x34c > softtrap(3, 1c49412ee4, 54, 0, 0, 0) at softtrap+0x19c > ddb> ps > TID PPID PGRPUID S FLAGS WAIT COMMAND > 4621 13074 4621 0 20x11sshd > *13074 5310 13074 0 70x12sshd > 19768 1 19768 0 30x83 ttyin getty > 9712 1 9712 0 30x80 poll cron > 30895 1 30895 99 30x90 poll sndiod > 24430 1 24430 79 30x90 kqreadtftpd > 15649 18969 18969 95 30x90 kqreadsmtpd > 9170 18969 18969 95 30x90 kqreadsmtpd > 31583 18969 18969 95 30x90 kqreadsmtpd >46 18969 18969 95 30x90 kqreadsmtpd > 32085 18969 18969 95 30x90 kqreadsmtpd > 14545 18969 18969103 30x90 kqreadsmtpd > 18969 1 18969 0 30x80 kqreadsmtpd > 17656 10194 10194 0 30x82 piperdcat > 10194 17726 10194 0 30x8a pause ksh > 24201 1 24201 77 30x90 poll dhcpd > 17726 5310 17726 0 20x12sshd > 5310 1 5310 0 30x80 selectsshd > 11444 22135 26256 83 30x90 poll ntpd > 22135 26256 26256 83 30x90 poll ntpd > 26256 1 26256 0 30x80 poll ntpd > 20693 19390 19390 74 30x90 bpf pflogd > 19390 1 19390 0 30x80 netio pflogd > 18490 1 1 73 30x90 kqreadsyslogd > 1 1 1 0 30x80 netio syslogd > 19396 0 0 0 2 0x14200zerothread > 27403 0 0 0 3 0x14200 aiodoned aiodoned > 30791 0 0 0 3 0x14200 syncerupdate > 5885 0 0 0 3 0x14200 cleaner cleaner > 17162 0 0 0 3 0x14200 reaperreaper > 6922 0 0 0 3 0x14200 pgdaemon pagedaemon > 10422 0 0 0 3 0x14200 bored crypto > 7995 0 0 0 3 0x14200 pftm pfpurge > 28357 0 0 0 3 0x14200 usbtskusbtask > 16205 0 0 0 3 0x14200 usbatsk usbatsk > 9316 0 0 0 3 0x14200 bored sensors > 14821 0 0 0 3 0x14200 bored softnet > 25203 0 0 0 3 0x14200 bored systqmp > 20530 0 0
Re: OpenBSD 5.6 Kernel panic - panic: mtx_enter: locking against myself
On Mon, Mar 30, 2015 at 05:01:57PM -0500, Walter Daugherity wrote: You are right, there were two interrupts. Here is the latest crash trace, ps, show registers, and dmesg. (There was also another crash which just died and never went to ddb.) you could try the following diff to see if it fixes this problem: Index: if_ppp.c === RCS file: /cvs/src/sys/net/if_ppp.c,v retrieving revision 1.83 diff -u -p -r1.83 if_ppp.c --- if_ppp.c13 May 2015 10:42:46 - 1.83 +++ if_ppp.c28 May 2015 05:51:28 - @@ -154,18 +154,10 @@ static void ppp_ifstart(struct ifnet *if intppp_clone_create(struct if_clone *, int); intppp_clone_destroy(struct ifnet *); -/* - * Some useful mbuf macros not in mbuf.h. - */ -#define M_IS_CLUSTER(m)((m)-m_flags M_EXT) - -#define M_DATASTART(m) \ - (M_IS_CLUSTER(m) ? (m)-m_ext.ext_buf : \ - (m)-m_flags M_PKTHDR ? (m)-m_pktdat : (m)-m_dat) - -#define M_DATASIZE(m) \ - (M_IS_CLUSTER(m) ? (m)-m_ext.ext_size : \ - (m)-m_flags M_PKTHDR ? MHLEN: MLEN) +void ppp_pkt_list_init(struct ppp_pkt_list *, u_int); +intppp_pkt_enqueue(struct ppp_pkt_list *, struct ppp_pkt *); +struct ppp_pkt *ppp_pkt_dequeue(struct ppp_pkt_list *); +struct mbuf * ppp_pkt_mbuf(struct ppp_pkt *); /* * We steal two bits in the mbuf m_flags, to mark high-priority packets @@ -234,7 +226,7 @@ ppp_clone_create(struct if_clone *ifc, i IFQ_SET_MAXLEN(sc-sc_if.if_snd, IFQ_MAXLEN); mq_init(sc-sc_inq, IFQ_MAXLEN, IPL_NET); IFQ_SET_MAXLEN(sc-sc_fastq, IFQ_MAXLEN); -IFQ_SET_MAXLEN(sc-sc_rawq, IFQ_MAXLEN); +ppp_pkt_list_init(sc-sc_rawq, IFQ_MAXLEN); IFQ_SET_READY(sc-sc_if.if_snd); if_attach(sc-sc_if); if_alloc_sadl(sc-sc_if); @@ -315,6 +307,7 @@ pppalloc(pid_t pid) void pppdealloc(struct ppp_softc *sc) { +struct ppp_pkt *pkt; struct mbuf *m; splsoftassert(IPL_SOFTNET); @@ -323,12 +316,8 @@ pppdealloc(struct ppp_softc *sc) sc-sc_if.if_flags = ~(IFF_UP|IFF_RUNNING); sc-sc_devp = NULL; sc-sc_xfer = 0; -for (;;) { - IF_DEQUEUE(sc-sc_rawq, m); - if (m == NULL) - break; - m_freem(m); -} +while ((pkt = ppp_pkt_dequeue(sc-sc_rawq)) != NULL) + ppp_pkt_free(pkt); while ((m = mq_dequeue(sc-sc_inq)) != NULL) m_freem(m); for (;;) { @@ -1052,31 +1041,28 @@ void pppintr(void) { struct ppp_softc *sc; -int s, s2; +int s; +struct ppp_pkt *pkt; struct mbuf *m; splsoftassert(IPL_SOFTNET); -s = splsoftnet(); /* XXX - what's the point of this? see comment above */ LIST_FOREACH(sc, ppp_softc_list, sc_list) { if (!(sc-sc_flags SC_TBUSY) (!IFQ_IS_EMPTY(sc-sc_if.if_snd) || !IFQ_IS_EMPTY(sc-sc_fastq))) { - s2 = splnet(); + s = splnet(); sc-sc_flags |= SC_TBUSY; - splx(s2); + splx(s); (*sc-sc_start)(sc); } - while (!IFQ_IS_EMPTY(sc-sc_rawq)) { - s2 = splnet(); - IF_DEQUEUE(sc-sc_rawq, m); - splx(s2); + while ((pkt = ppp_pkt_dequeue(sc-sc_rawq)) != NULL) { + m = ppp_pkt_mbuf(pkt); if (m == NULL) - break; + continue; ppp_inproc(sc, m); } } -splx(s); } #ifdef PPP_COMPRESS @@ -1199,15 +1185,11 @@ ppp_ccp_closed(struct ppp_softc *sc) * were omitted. */ void -ktin(struct ppp_softc *sc, struct mbuf *m, int lost) +ktin(struct ppp_softc *sc, struct ppp_pkt *pkt, int lost) { -int s = splnet(); - -if (lost) - m-m_flags |= M_ERRMARK; -IF_ENQUEUE(sc-sc_rawq, m); -schednetisr(NETISR_PPP); -splx(s); +pkt-p_hdr.ph_errmark = lost; +if (ppp_pkt_enqueue(sc-sc_rawq, pkt) == 0) + schednetisr(NETISR_PPP); } /* @@ -1389,19 +1371,6 @@ ppp_inproc(struct ppp_softc *sc, struct } #endif /* VJC */ -/* - * If the packet will fit in a header mbuf, don't waste a - * whole cluster on it. - */ -if (ilen = MHLEN M_IS_CLUSTER(m)) { - MGETHDR(mp, M_DONTWAIT, MT_DATA); - if (mp != NULL) { - m_copydata(m, 0, ilen, mtod(mp, caddr_t)); - m_freem(m); - m = mp; - m-m_len = ilen; - } -} m-m_pkthdr.len = ilen; m-m_pkthdr.rcvif = ifp; @@ -1542,4 +1511,96 @@ ppp_ifstart(struct ifnet *ifp) sc = ifp-if_softc; (*sc-sc_start)(sc); } + +void +ppp_pkt_list_init(struct ppp_pkt_list *pl, u_int limit) +{ + mtx_init(pl-pl_mtx, IPL_TTY); + pl-pl_head = pl-pl_tail = NULL; + pl-pl_count = 0; + pl-pl_limit = limit; +} + +int +ppp_pkt_enqueue(struct ppp_pkt_list *pl, struct ppp_pkt *pkt) +{ + int drop = 0; + + mtx_enter(pl-pl_mtx); + if (pl-pl_count pl-pl_limit) { + if (pl-pl_tail == NULL) +
Re: Panic: malloc: out of space in kmem_map
On 7 Apr 2015, at 6:38 pm, Evgeniy Sudyr eject.in...@gmail.com wrote: David, yes, there are next changes in sysctl.conf, but kernel options were untouched (again it was GENERIC.MP -stable). $ cat /etc/sysctl.conf net.inet.ip.forwarding=1 net.inet.carp.preempt=1 net.inet6.ip6.forwarding=1 kern.maxfiles=5048026 kern.maxclusters=200 why did you raise those last two values? dlg ps. that last one is the cause of your panics. On Tue, Apr 7, 2015 at 9:37 AM, David Gwynne da...@gwynne.id.au wrote: On 6 Apr 2015, at 05:32, Evgeniy Sudyr eject.in...@gmail.com wrote: Mark, I will dig in to this. Sorry, but can someone give a hint what are unusual values for pools there which can be related to kernel panic Iv'e reported at the very beginning? Current vmstat -m output is: the abbreviated version below is kind of interesting. are you setting the kern.maxclusters sysctl? if so, to what value? Memory Totals: In UseFreeRequests 76695K862K24831415 Memory resource pool statistics NameSize Requests FailInUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle mbpl 256 2741641011 0 346 4789 0 0 4789 1 125000 4767 mcl2k 2048 1108887843 0 183 10052 0 0 10052 4 100 9959 In use 210238K, total allocated 0K; utilization inf% Will update if will find something... On Sun, Apr 5, 2015 at 6:59 PM, Mark Kettenis mark.kette...@xs4all.nl wrote: Date: Sun, 5 Apr 2015 18:44:43 +0200 From: Evgeniy Sudyr eject.in...@gmail.com Stuart, as part of troubleshooting, BIOS was upgraded from R 3.0 to latest R 3.2 http://www.supermicro.com/products/motherboard/Xeon/C600/X9SRW-F.cfm X9SRW5.115 How big chances are it hitted bug which was fixed in latest BIOS relase and this will not occurs again? Did you noticed something we can check with Supermicro support to make sure? So far I've not seen any real evidence that the BIOS is causing problems. Ted noticed the higher-than-usual ACPI memory usage, suggesting a memory leak. This made Stuart suggest that it might be worth updating your BIOS. But we haven't actually established that there is indeed a memory leak. In fact the information you posted earlier suggests that there is no ACPI memory leak, or at least not one directly related to executing AML. You'll really need to do some digging yourself here. Look at the vmstat -m output immediately after booting your machine. Then keep looking at it periodically and identify the memory types and pools that keep growing. For malloc'ed memory look at the MemUse column under Memory statistics by type. For pools, look at the InUse column under Memory resource pool statistics. -- -- With regards, Eugene Sudyr -- -- With regards, Eugene Sudyr
Re: modload issues on armv7
On 29 Oct 2014, at 09:11, Dimitri Sokolyuk de...@dim13.org wrote: On 28 Oct 2014, at 20:15, Stuart Henderson st...@openbsd.org wrote: On 2014/10/28 19:03, Dimitri Sokolyuk wrote: Synopsis: modload (ld) issues on armv7 (beaglebone black) Category: system, ld, kernel Environment: System : OpenBSD 5.6 Details : OpenBSD 5.6-current (GENERIC-OMAP) #5: Thu Oct 9 16:58:24 AEDT 2014 r...@armv7.jsg.id.au:/usr/src/sys/arch/armv7/compile/GENERIC-OMAP Architecture: OpenBSD.armv7 Machine : armv7 Description: ld fails with internel error on lkm load: ld -nopie -Z -R /dev/ksyms -e test_lkmentry -o test -Ttext 0x0 combined.o internal error: aborting at /usr/src/gnu/usr.bin/binutils/ld/ldlang.c line 3835 in lang_place_orphans ld: please report this bug LKM support was removed. Rally sad news. :( Sorry, I’ve missed the announcement (http://www.openbsd.org/faq/current.html#20141013 http://www.openbsd.org/faq/current.html#20141013) It’s a wrong mailing list, but still, which is a recommended way to develop user kernel extensions and new drivers then? i build new code as part of the monolithic kernel, copy it to /bsd, and reboot to try it. i keep a copy of a working kernel as bsd.working in case my changes dont work. LKM didn’t get much love in past years, but still it was a very powerful tool for small and useful things, which will never make into base code. -- Dimitri Sokolyuk — 0x5a7c3054 — http://www.dim13.org/
Re: pfsync over ipsec is broken
On 18 Oct 2014, at 11:42 pm, Stefan Sperling s...@openbsd.org wrote: On Fri, Oct 17, 2014 at 10:51:54AM +1000, David Gwynne wrote: as discussed, a fix for this has been committed in src/sys/net/if_pfsync.c r1.210 thank you for the good bug report. your recipe was easy to follow. There is still another problem with pfsync and IPsec. Using the previously described setup, the pfsync peers don't properly keep their pf states in sync while IPsec is in use. The reason seems to be that once local TDBs used for protecting pfsync traffic are synced to the peer then IPsec replay checks trigger on the peer's side when future pfsync updates are sent: # netstat -p esp -s | grep replay 304 possibly replayed packets received In this state, pfsync updates are being dropped by the peer's IPsec stack. Running tcpdump on enc0 shows that there is no bi-directional pfsync traffic. The peers are sending pfsync updates out but neither is receiving them. To test my theory I've verified that turning pfsync_update_tdb() into a stub that returns immediately allows both peers to keep pf states in sync and tcpdump on enc0 now shows bi-directional pfsync traffic. I'm not sure what the best fix is. Perhaps making pfsync prevent particular TDBs from being synced would do. However, this problem might affect any SAs negotiated between the peers. In case the peers have multiple IPsec-protected links between them we might have to prevent syncing IPsec state not directly related to pfsync. The question then becomes which TDB's can actually be synced without breaking communication between the peers, in general, without knowing what addresses the remote peer is using besides the syncpeer address. I don't know enough about IPsec to come up with an answer I feel confident about. The best answer I could come up with so far was any TDB that uses a source address from any local interface should be exempt from syncing. Is that acceptable? Other ideas? pfsync implicitely adds NO_SYNC on pf states for pfsync traffic. state update messages never occur on the wire for pfsync protocol traffic. id argue something that gives the same result is necessary here. the problem is the flow you set up to protect pfsync traffic also protects all traffic between the hosts, not just pfsync. on my big boxes at work i try to have something along the lines of pass !received-on any keep state (no-sync) to avoid creating states for connections that are locally terminated. i dont know if there was a no-sync flag added for ipsec flows that pfsync could work with, but that would be necessary in this situation. dlg
Re: bge NOT work on Dell R720
i'll try to chase this down, but its hard going by the freebsd bug report cos its lots of vague times and no references to specific revisions of their driver. i have r420s and r520s with 5720s in them which work fine. i do have a r720 i can try, but its hard to pull out of production for this kind of testing. i'll try to get to this soon. cheers, dlg On 05/04/2013, at 10:04 PM, Robert Young yay...@gmail.com wrote: Dell PowerEdge R720 Broadcom Gigabit Ethernet BCM5720 Tested kernel: http://ftp.openbsd.org/pub/OpenBSD/snapshots/amd64/bsd.rd http://ftp.openbsd.org/pub/OpenBSD/snapshots/amd64/bsd.mp dmesg: OpenBSD 5.3-current (RAMDISK_CD) #96: Wed Apr 3 02:19:34 MDT 2013 bge2 at pci2 dev 0 function 0 Broadcom BCM5720 rev 0x00, BCM5720 A0 (0x572), APE firmware NCSI 1.1.7.0: apic 1 int 3, address 90:b1:1c:3a:a8:19 brgphy2 at bge2 phy 1: BCM5720C 10/100/1000baseT PHY, rev. 0 It's OK 156 . ping -s 156 10.2.1.29 164 bytes from ... NO reply =157,(not resetted,can return to normal with small packet test) ping -s 157 10.2.1.29 ... 0 packets received Even larger packet, NIC resetted(wait ... return to normal after reseted) ping -s 275 10.2.1.29 bge2: watchdog timeout -- resetting FreeBSD have encountered same issue,I tested, same issue found in: http://ftp.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/9.1/FreeBSD-9.1-RELEASE-amd64-dvd1.iso This problem was fixed by FreeBSD team: http://www.freebsd.org/cgi/query-pr.cgi?pr=171121 I tested, This version have fixed this problem: http://ftp.freebsd.org/pub/FreeBSD/snapshots/amd64/amd64/ISO-IMAGES/10.0/FreeBSD-10.0-CURRENT-amd64-20130323-r248655-release.iso
Re: OpenBSD crash on an IBM x3550 M3
i agree that mikebs change should go in. On 05/03/2011, at 12:10 AM, Mark Kettenis wrote: Date: Fri, 4 Mar 2011 07:30:24 -0600 From: Marco Peereboom sl...@peereboom.us That is a huge penalty because it is read over the pci bus. The trick with 0x should work just fine per the doco and other os' drivers (on top of my head). The question I have is does Linux only have one device per interrupt? Linux probably does a better job at avoiding shared interrupts than we do, but it on some hardware it can't be avoided so it has to deal with it. If you wantto avoid reading the interrupt status register, you'll have to stop trusting the hardware (or rather the firmware) in make mpi_reply(), and do bounds checks before accessing sc-sc_rcbs[] and sc-sc_ccbs[]. To be honest, that would be a good idea even if we didn't have this bug. In the meantime I think mikeb's fix should be committed. I am going to reference the doco one more time on this. On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote: On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote: On Fri, Feb 04, 2011 at 14:53 +, emeric boit wrote: Hello, After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3, I find the system randomly panics after a period of use. uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) - e kernel: page fault trap, code=0 Stopped at mpi_reply+0x102:movq 0(%r13),%rax ddb{0} ddb{0} trace mpi_reply() at mpi_reply+0x102 mpi_intr() at mpi_intr+0x20 Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec --- interrupt --- Bad frame pointer: 0x8000194e1920 end trace frame: 0x8000194e1920, count: -3 Xspllower+0xe: ddb{0} We've tried different things, but after this hint i realised that what might be happening is that bnx and mpi interrupts are chained (it's bnx0 actually, my initial guess about bnx1 was wrong) and mpi_intr is called first. Currently neither mpi(4) nor mpii(4) don't check the interrupt status register but look directly into the reply post queue. Although, there's not supposed to be any race between host cpu reading from the memory and ioc writing to it, in practice it turns out that in some particular hardware configurations this rule is violated and we read a garbled reply from the controller. If my memory serves, I've considered this for the mpii_intr but never got into the situation where it was needed and thus omitted it. I guess I have to bring it back too. Emeric tortured the machine with this diff and reported that it solves the issue for him. OK to commit? On Wed, Mar 02, 2011 at 17:20 +, emeric boit wrote: hi, This change doesn't solve the issue. I have remarked that the server crash when I use the network. I copy a small file several times without problem. On the IBM I do : scp USER@IP:/tmp/mpi.c . And when I copy a larger file the server crash : scp USER@IP:/bsd . And when I copy th same file (bsd) from an usb key I don't have problem. Emeric. that sounds like an interrupt sharing bug of some sort. is it bnx1 that you're using to reproduce a crash? try the following diff please (on a clean checkout): Index: mpi.c === RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v retrieving revision 1.166 diff -u -p -r1.166 mpi.c --- mpi.c 1 Mar 2011 23:48:33 - 1.166 +++ mpi.c 2 Mar 2011 17:40:13 - @@ -887,6 +887,9 @@ mpi_intr(void *arg) u_int32_t reg; int rv = 0; + if ((mpi_read_intr(sc) MPI_INTR_STATUS_REPLY) == 0) + return (rv); + while ((reg = mpi_pop_reply(sc)) != 0x) { mpi_reply(sc, reg); rv = 1; ok krw@. Ken