from:"David Gwynne"

Re: HA IPSec with AWS - no second flow

2024-03-11 Thread David Gwynne

On Mon, Mar 11, 2024 at 02:33:26PM +0100, Rafał Ramocki wrote:
> Hi,
> 
> That thread you've mentioned is almost 10 years old and is about isakmpd, not
> iked. I've also tried the same with a route-based VPN to AWS with BGP, but the
> problem is the same. BGP can negotiate routes, but traffic will not pass if
> there is no appropriate flow set up.

the problem you're running up against is in the kernel though, so
it doesn't matter whether isakmpd or iked sets up the flows.

> From iked's perspective it looks like this:
> 
> iked_sas: 0x41f72e9e780 rspi 0x0980e96b933f8822 ispi 0x4e425ee4a53d8814 
> X.X.X.X:4500->16.170.59.81:4500[] ESTABLISHED r natt 
> udpecap nexti 0x0 pol 0x41ed13d5000
>   sa_childsas: 0x41f72e8f800 ESP 0x8c5e2e0e in 16.170.59.81:4500 -> 
> X.X.X.X:4500 (LA) B=0x0 P=0x41f72e8ed00 @0x41f72e9e780
>   sa_childsas: 0x41f72e8ed00 ESP 0xc643a377 out X.X.X.X:4500 -> 
> 16.170.59.81:4500 (L) B=0x0 P=0x41f72e8f800 @0x41f72e9e780
>   sa_flows: 0x41f670ba800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f72eafc00 ESP in 172.31.0.0/20 -> 10.2.15.0/24 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f670cd000 ESP out 10.2.15.0/24 -> 172.31.16.0/20 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f72ebd000 ESP in 172.31.16.0/20 -> 10.2.15.0/24 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f72eaf800 ESP out 10.2.15.0/24 -> 172.31.32.0/20 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f72ea8000 ESP in 172.31.32.0/20 -> 10.2.15.0/24 [0]@-1 () 
> @0x41f72e9e780
>   sa_flows: 0x41f72ea8400 ESP out 169.254.74.238/32 -> 169.254.74.237/32 
> [0]@-1 () @0x41f72e9e780
>   sa_flows: 0x41f72eaf000 ESP in 169.254.74.237/32 -> 169.254.74.238/32 
> [0]@-1 () @0x41f72e9e780
> iked_sas: 0x41f72eab780 rspi 0x19dfa021335741c8 ispi 0x0d8f2ce39a096333 
> X.X.X.X:4500->16.170.59.81:4500[] ESTABLISHED r natt 
> udpecap nexti 0x0 pol 0x41ed13d5000
>   sa_childsas: 0x41f72e9a000 ESP 0xc4c246f8 out X.X.X.X:4500 -> 
> 16.170.59.81:4500 (L) B=0x0 P=0x41f72eb9a00 @0x41f72eab780
>   sa_childsas: 0x41f72eb9a00 ESP 0x1345935a in 16.170.59.81:4500 -> 
> X.X.X.X:4500 (LA) B=0x0 P=0x41f72e9a000 @0x41f72eab780
>   sa_flows: 0x41f72ec7800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72e91800 ESP in 172.31.0.0/20 -> 10.2.15.0/24 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72ead400 ESP out 10.2.15.0/24 -> 172.31.16.0/20 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72eadc00 ESP in 172.31.16.0/20 -> 10.2.15.0/24 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72e91c00 ESP out 10.2.15.0/24 -> 172.31.32.0/20 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72ead000 ESP in 172.31.32.0/20 -> 10.2.15.0/24 [0]@-1 (L) 
> @0x41f72eab780
>   sa_flows: 0x41f72e90400 ESP out 169.254.74.238/32 -> 169.254.74.237/32 
> [0]@-1 (L) @0x41f72eab780
>   sa_flows: 0x41f72ead800 ESP in 169.254.74.237/32 -> 169.254.74.238/32 
> [0]@-1 (L) @0x41f72eab780
> 
> I think that (L) marking means that this flow is loaded into the kernel and 
> some of them are missing. It may be some change in iked to fix this, I think.
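The reading of the (L) marker above can be checked mechanically. The following is a minimal sketch, not part of iked: it assumes each sa_flows entry has been rejoined onto a single line, that the parenthesised field carries an L when the flow is loaded (the poster's interpretation), and the name unloaded_flows is made up for illustration.

```python
import re

# One sa_flows entry per line, e.g.
#   sa_flows: 0x... ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 (L) @0x...
FLOW_RE = re.compile(
    r"sa_flows:\s+\S+\s+(?P<proto>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+\S+\s+"
    r"\((?P<flags>[^)]*)\)\s+@(?P<sa>\S+)"
)


def unloaded_flows(text):
    """Return (sa, direction, src, dst) for every flow whose flag field lacks 'L'."""
    missing = []
    for line in text.splitlines():
        m = FLOW_RE.search(line)
        if m and "L" not in m.group("flags"):
            missing.append(
                (m.group("sa"), m.group("dir"), m.group("src"), m.group("dst"))
            )
    return missing


# Two entries taken from the output above, rejoined onto single lines.
SAMPLE = """\
  sa_flows: 0x41f670ba800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 () @0x41f72e9e780
  sa_flows: 0x41f72ec7800 ESP out 10.2.15.0/24 -> 172.31.0.0/20 [0]@-1 (L) @0x41f72eab780
"""

for flow in unloaded_flows(SAMPLE):
    print(flow)
```

Run against the full listing, this would report every flow under the first IKE SA (0x41f72e9e780), which matches the observation that only one of the two tunnels has its flows marked.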
> 
> 
> 
> PS: I'm working with devices from different vendors, and I think that in some
> way OpenBSD has a problem with this.

yes, that's why we added support for route based ipsec vpns via the
sec(4) interface, as per the links that hrvoje provided and
https://marc.info/?l=openbsd-tech&m=168844868110327&w=2.

i just set up an openbsd box talking to aws s2s with iked yesterday.
your config would look a bit like this:

dlg@obsd-aws-gw0:/etc$ sudo cat hostname.em0
inet 192.0.2.51 255.255.255.0
dlg@obsd-aws-gw0:/etc$ sudo cat hostname.sec0
inet 169.254.21.38 255.255.255.252 169.254.21.37
up
dlg@obsd-aws-gw0:/etc$ sudo cat hostname.sec1
inet 169.254.74.238 255.255.255.252 169.254.74.237
up
dlg@obsd-aws-gw0:/etc$ sudo cat iked.conf
h_self="192.0.2.51"
h_s2s1="51.21.86.8"
h_s2s1_key="0123456789abcdefghijklmnopqrstuv"
h_s2s2="16.170.59.81"
h_s2s2_key="abcdefghijklmnopqrstuvwxyz012345"

ikev2 "s2s1" active \
from any to any \
local $h_self peer $h_s2s1 \
childsa auth hmac-sha2-256 enc aes-256 group ecp256 \
psk $h_s2s1_key \
iface sec0

ikev2 "s2s2" active \
from any to any \
local $h_self peer $h_s2s2 \
childsa auth hmac-sha2-256 enc aes-256 group ecp256 \
psk $h_s2s2_key \
iface sec1

that should be enough to get ipsec up and running so you can talk to aws
over the sec(4) interfaces. the next step is to set up routes to your nets
in aws over these links. we use bgpd to dynamically learn routes
and fail over between the different sec interfaces, but if you
wanted to do ecmp/multipath you could manually add routes over each
interface.
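
As a rough illustration of the manual multipath option, the sketch below only prints route(8) commands; it does not run them. It assumes the VPC prefixes from the iked output quoted earlier, the tunnel peer addresses from hostname.sec0 and hostname.sec1 above, and that the -mpath modifier of route(8) is available on your release. The helper name multipath_route_commands is hypothetical.

```python
import ipaddress


def multipath_route_commands(prefixes, gateways):
    """Build one 'route add -mpath' command per (prefix, gateway) pair."""
    commands = []
    for prefix in prefixes:
        # ip_network() raises ValueError on a malformed or non-network prefix.
        net = ipaddress.ip_network(prefix)
        for gateway in gateways:
            ipaddress.ip_address(gateway)  # validate the gateway address too
            commands.append(f"route add -mpath {net} {gateway}")
    return commands


# Assumed values: VPC subnets from the quoted flows, peers from hostname.sec*.
PREFIXES = ["172.31.0.0/20", "172.31.16.0/20", "172.31.32.0/20"]
GATEWAYS = ["169.254.21.37", "169.254.74.237"]

for command in multipath_route_commands(PREFIXES, GATEWAYS):
    print(command)
```

Whether the kernel actually spreads traffic across both routes also depends on the net.inet.ip.multipath sysctl, so check route(8) and sysctl(2) for your release before relying on this.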

dlg

> - Original Message -
> From: "Hrvoje Popovski" 
> To: "Rafał Ramocki" , "bugs" 
> Sent: Monday, March 11, 2024 1:05:10 PM
> Subject: Re: HA IPSec with AWS - no second flow
> 
> On 11.3.2024. 10:22, Rafał Ramocki wrote:
> >> (...)
> > I think IKED may detect

Re: 7.4 hard locks with bgpd on link change

2024-01-23 Thread David Gwynne

this should be fixed in src/sys/net/if_sec.c r1.10.

sorry for the delay :(

> On 4 Nov 2023, at 13:01, Jason Tubnor  wrote:
> 
> 
> On 3/11/2023 8:58 pm, Claudio Jeker wrote:
>> Do I understand you correctly that bgpd runs over the sec(4) interface
>> which routes over em1?
> 
> Correct (also OSPF). Here is the iked.conf and ifconfig sec10:
> 
> ikev2 active esp from any to any peer 192.168.1.1 srcid 172.16.1.1 dstid 
> 192.168.1.1 iface sec10
> 
> sec10: flags=8051 mtu 1380
> index 7 priority 0 llprio 3
> groups: sec egress
> inet 10.0.0.253 --> 10.0.1.254 netmask 0xfe00
> 
>> Does bgpd install any routes over em1? `bgpctl show next` should tell you
>> which nexthops use which interface.
> 
> See below. Redacted for privacy:
> 
> fwtst06# bgpctl sho nex
> Flags: * = nexthop valid
> 
>   Nexthop Route  Prio Gateway Iface
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/32   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/23   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/23   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/26   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30    4 connected       em0 (UP, 1000 Mbps)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30    4 connected       em0 (UP, 1000 Mbps)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> * XXX.XXX.XXX.XXX XXX.XXX.XXX.XXX/30   32 XXX.XXX.XXX.XXX sec10 (UP, unknown)
> 
>> 
>> This could well be an issue inside sec(4) since that interface is new.
> If you could give us some example config to rebuild the case it would
>> help a lot.
> fwtst06# cat /etc/hostname.sec10
> inet 10.0.0.253 255.255.254.0 10.0.1.254 mtu 1380
> up
> fwtst06# grep sec /etc/pf.conf
> set skip on { lo, sec }
> fwtst06# cat /etc/ospfd.conf
> router-id $ospf_id
> 
> area 0.0.0.0 {
> interface sec10 {
> type p2p
> }
> interface em0 {
> type p2p
> }
> }
> 
> /etc/bgpd.conf 
> 
> group "ibgp" {
> remote-as $bgpasn
> local-address $laif
> 
> neighbor 10.8.8.8  # router reflector 1 ipv4
> neighbor 10.9.9.9  # router reflector 2 ipv4
> 
> neighbor $em0neighbor {
> route-reflector
> }
> }
>

Re: vxlan(4) custom destination UDP port seems not working

2023-11-21 Thread David Gwynne

On Wed, Nov 15, 2023 at 06:13:15AM -0700, Theo de Raadt wrote:
> Luca Di Gregorio  wrote:
> 
> > I'm not sure about this, but I think that public cloud datacenters filter
> > out (or do something with) udp traffic to the standard udp vxlan port.
> 
> But that would not be a reason for allowing selection of the pre-standard
> port number.
> 
> Rather, it would be a reason for providing *any non-standard port number*
> 
> Which is perhaps what the code does.  But no one would actually want this.
> VXLAN on port 54?  80?  No one would want this.
> 
> And if they filter it, then put it inside an underlay.  The standard says
> nothing about permitting vxlan on any old random stupid port number.

from a quick look around it appears that at least linux, juniper and
arista allow for the configuration of a non-standard port for vxlan.
linux documentation even says it defaults to the pre-iana assigned port
because their driver predates the standard, which is peak linux.

independent of whether our vxlan(4) driver should support it or not,
ifconfig should be fixed to handle setting up sockaddrs for these
ioctls better anyway.
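
For context, the VXLAN header itself does not depend on the UDP port; only the destination port differs between the IANA assignment (4789) and the pre-standard value Linux defaults to (8472). A small sketch of the RFC 7348 header layout follows. The helper names are hypothetical and the code is unrelated to the ifconfig or vxlan(4) sources.

```python
import struct

VXLAN_PORT_IANA = 4789          # port assigned by IANA, used in RFC 7348
VXLAN_PORT_LINUX_LEGACY = 8472  # pre-standard port Linux defaults to

VNI_VALID_FLAG = 0x08  # the "I" bit in the first header byte


def vxlan_header(vni):
    """Pack the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def parse_vni(header):
    """Return the VNI from a VXLAN header, or raise if the I flag is clear."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not flags_word & (VNI_VALID_FLAG << 24):
        raise ValueError("VNI flag not set")
    return vni_word >> 8


print(vxlan_header(42).hex())
```

The same eight bytes would be sent whichever port the tunnel endpoint is configured with, which is why the port is purely a matter of endpoint configuration.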

dlg

Re: pf nat-to doesn't match a crafted packet

2023-08-28 Thread David Gwynne

How are you injecting the crafted packet into the stack?

On Tue, 29 Aug 2023, 01:14 ,  wrote:

> >Synopsis:  pf nat-to doesn't match a crafted packet
> >Category:  system
> >Environment:
> System  : OpenBSD 7.3
> Details : OpenBSD 7.3 (GENERIC.MP) #2080: Sat Mar 25 14:20:25
> MDT 2023
>  dera...@arm64.openbsd.org:
> /usr/src/sys/arch/arm64/compile/GENERIC.MP
>
> Architecture: OpenBSD.arm64
> Machine : arm64
> >Description:
> I was testing a seemingly valid Internet packet going out my
> gateway
> but the pf firewall doesn't match nat-to to this one for some reason.  I'm
> possibly overlooking something but every other packet exiting my gateway is
> nat'ed.  What causes this?  How can this be exploited?
>
> >How-To-Repeat:
> Here is the tcpdump from the host 1 hop behind the NAT router:
>
> 16:59:08.438082 192.168.177.13 > 49.12.42.182: icmp: host 7.198.187.211
> unreachable [icmp cksum ok] for 11.69.44.241.52699 > 7.198.187.211.55672:
> udp 51351 [tos 0x9c] (ttl 147, id 17124, len 51419, optlen=40 NOP RR{39}=
> RR{#106.155.117.54 233.26.79.111 129.127.249.242 60.117.146.16
> 179.39.29.224 213.65.49.78 0.16.45.109 252.168.188.0 123.108.138.224}) (ttl
> 64, id 65443, len 96)
>   : 4500 0060 ffa3  4001 ad81 c0a8 b10d  E..`@...
>   0010: 310c 2ab6 0301 55aa   4f9c c8db  1.*...U.O...
>   0020: 42e4  9311 c756 0b45 2cf1 07c6 bbd3  B..V.E,.
>   0030: 0107 2704 6a9b 7536 e91a 4f6f 817f f9f2  ..'.j.u6..Oo
>   0040: 3c75 9210 b327 1de0 d541 314e 0010 2d6d
>   0050: fca8 bc00 7b6c 8ae0 cddb d978    {l.x
>
> and here is the tcpdump on the pppoe interface:
>
> 16:59:08.440403 192.168.177.13 > 49.12.42.182: icmp: host 7.198.187.211
> unreachable [icmp cksum ok] (ttl 63, id 65443, len 96)
>
> Here are the relevant anchor rules I have:
>
>match out on $ext_if inet from  to any nat-to ($ext_if)
>
> and:
>
> table  const { 10/8, 172.16/12, 192.168/16 }
>
> Why did pf not translate this?  ... that's kinda kinky.
>
> >Fix:
> Not known.
>
>
> dmesg:
> OpenBSD 7.3 (GENERIC.MP) #2080: Sat Mar 25 14:20:25 MDT 2023
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> real mem  = 8432840704 (8042MB)
> avail mem = 8139239424 (7762MB)
> random: good seed from bootblocks
> mainbus0 at root: ACPI
> psci0 at mainbus0: PSCI 1.1, SMCCC 1.2
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
> cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu0: 1024KB 64b/line 16-way L2 cache
> cpu0: CRC32,ASID16
> cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
> cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu1: 1024KB 64b/line 16-way L2 cache
> cpu1: CRC32,ASID16
> cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
> cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu2: 1024KB 64b/line 16-way L2 cache
> cpu2: CRC32,ASID16
> cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
> cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu3: 1024KB 64b/line 16-way L2 cache
> cpu3: CRC32,ASID16
> efi0 at mainbus0: UEFI 2.7
> efi0: https://github.com/pftf/RPi4 rev 0x1
> smbios0 at efi0: SMBIOS 3.3.0
> smbios0: vendor https://github.com/pftf/RPi4 version "UEFI Firmware
> v1.21" date 11/13/2020
> smbios0: Raspberry Pi Foundation Raspberry Pi 4 Model B
> apm0 at mainbus0
> ampintc0 at mainbus0 nirq 256, ncpu 4 ipi: 0, 1, 2: "interrupt-controller"
> agtimer0 at mainbus0: 54000 kHz
> acpi0 at mainbus0: ACPI 6.3
> acpi0: sleep states
> acpi0: tables DSDT FACP CSRT DBG2 GTDT IORT APIC PPTT BGRT
> acpi0: wakeup devices
> acpiiort0 at acpi0
> "BCM2849" at acpi0 not configured
> "BCM2835" at acpi0 not configured
> "BCM2854" at acpi0 not configured
> "ACPI0004" at acpi0 not configured
> xhci0 at acpi0 XHC0 addr 0x6/0x1000 irq 175, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Generic xHCI root hub" rev
> 3.00/1.00 addr 1
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> "ACPI0007" at acpi0 not configured
> "ACPI0004" at acpi0 not configured
> "BCM2848" at acpi0 not configured
> "BCM2850" at acpi0 not configured
> "BCM2856" at acpi0 not configured
> "BCM2845" at acpi0 not configured
> "BCM2841" at acpi0 not configured
> "BCM2841" at acpi0 not configured
> "BCM2838" at acpi0 not configured
> "BCM2839" at acpi0 not configured
> "BCM2844" at acpi0 not configured
> pluart0 at acpi0 URT0 addr 0xfe201000/0x1000 irq 153
> "BCM2836" at acpi0 not configured
> "BCM2EA6" at acpi0 not configured
> "MSFT8000" at acpi0 not configured
> sdhc0 at acpi0 SDC1 addr 0xfe30/0x100 irq 158
> sdhc0: base clock frequency unknown
> "BCM2855" at acpi0 not configured
> bse0 at acpi0 ETH0 addr 0xfd58/0x1 irq 189: address
> dc:a6:32:cc:db:a7
> brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT

Re: Intel Ethernet (?Synopsys based) on Fitlet3 Elkhart Lake unconfigured on recent snapshot

2023-04-28 Thread David Gwynne




> On 28 Apr 2023, at 06:06, Ted Ri  wrote:
> 
> To: bugs@openbsd.org
> Subject: Intel Ethernet (?Synopsys based) on Fitlet3 Elkhart Lake
> unconfigured on latest snapshot
> From: t26034...@gmail.com
> Cc: t26034...@gmail.com
> Reply-To: t26034...@gmail.com
> Message-ID: <7f7ce828af994...@c1.my.domain>
> 
>> Synopsis: Intel ethernet on Compulab Fitlet3 Elkhart Lake unconfigured on 
>> latest snapshot
>> Category: system
>> Environment:
> System  : OpenBSD 7.3
> Details : OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23
> 17:40:19 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
>> Description:
>"Intel Elkhart Lake Ethernet" rev 0x11 at pci0 dev 29 function
> 1 not configured
>> How-To-Repeat:
> Boot latest snapshot, ethernet unconfigured
>> Fix:
>  jsg's reply to similar bug report on Feb 4 '23 indicated they were
> not supported at that time.

this is still true. my understanding is the dwqe driver could be hacked up to 
support it, but i don't know of anyone with the time, interest, and one of 
these machines to do the work with.

dlg

> 
> 
> dmesg:
> OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 17:40:19 MDT 2023
>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8378597376 (7990MB)
> avail mem = 8105050112 (7729MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x76a31000 (81 entries)
> bios0: vendor American Megatrends International, LLC. version "5.19"
> date 01/24/2023
> bios0: Default string Default string
> efi0 at bios0: UEFI 2.7
> efi0: American Megatrends rev 0x50013
> acpi0 at bios0: ACPI 6.2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP MCFG SSDT SSDT SSDT FIDT OEM1 HPET APIC PRAM
> RSCI SSDT SSDT SSDT NHLT SSDT SSDT PSDS LPIT SSDT DMAR SSDT TPM2 WSMT
> FPDT
> acpi0: wakeup devices RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4)
> PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4)
> RP07(S4) PXSX(S4) XHCI(S3) XDCI(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xc000, bus 0-255
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.2.2.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB
> 64b/line 12-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Celeron(R) J6413 @ 1.80GHz, 1796.12 MHz, 06-96-01
> cpu3: 
>

Re: vmt not enabled when running as a VM on ESXi on arm platforms

2023-04-26 Thread David Gwynne

is there output from eeprom -p? if so, can you share that too?

> On 25 Apr 2023, at 22:47, John Jore  wrote:
> 
>> Synopsis:  vmt not enabled when running as a VM on ESXi on arm platforms
>> Category:  kernel aarch64 arm
>> Environment:
> System  : OpenBSD 7.3
> Details : OpenBSD 7.3 (GENERIC) #132: Sat Mar 25 13:51:54 MDT 2023
>  
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC
> 
> Architecture: OpenBSD.arm64
> Machine : arm64
>> Description:
> vmt not enabled when OpenBSD is running as a VM on ESXi on ARM 
> platforms
> 
> From vmt.c:
> #if !defined(__i386__) && !defined(__amd64__)
> #error vmt(4) is only supported on i386 and amd64
> #endif
> 
> 
>> How-To-Repeat:
> Default installation
>> Fix:
> Modify vmt.c to allow vmt to load/run on arm platforms?
> 
> 
> dmesg:
> OpenBSD 7.3 (GENERIC) #132: Sat Mar 25 13:51:54 MDT 2023
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC
> real mem  = 260857856 (248MB)
> avail mem = 218324992 (208MB)
> random: good seed from bootblocks
> mainbus0 at root: linux,dummy-virt
> psci0 at mainbus0: PSCI 0.2
> cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
> cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
> cpu0: 1024KB 64b/line 16-way L2 cache
> cpu0: CRC32,ASID16
> efi0 at mainbus0: UEFI 2.6
> efi0: EDK II rev 0x1
> smbios0 at efi0: SMBIOS 3.0.0
> smbios0: vendor VMware version "VEFI" date 12/31/2020
> smbios0: VMware, Inc. VMware VBSA
> apm0 at mainbus0
> ampintc0 at mainbus0 nirq 992, ncpu 1: "interrupt-controller"
> ampintcmsi0 at ampintc0: nspi 927
> "hypervisor" at mainbus0 not configured
> "dcc" at mainbus0 not configured
> agtimer0 at mainbus0: 54000 kHz
> pciecam0 at mainbus0
> pci0 at pciecam0
> 0:15:0: rom address conflict 0x8000/0x8000
> vendor "VMware", unknown product 0x1976 (class bridge subclass host, rev 
> 0x01) at pci0 dev 0 function 0 not configured
> "VMware VMCI" rev 0x10 at pci0 dev 0 function 7 not configured
> vendor "VMware", unknown product 0x0406 (class display subclass VGA, rev 
> 0x00) at pci0 dev 15 function 0 not configured
> ppb0 at pci0 dev 17 function 0 "VMware PCI" rev 0x02
> pci1 at ppb0 bus 1
> ahci0 at pci1 dev 0 function 0 "VMware AHCI" rev 0x00: msi, AHCI 1.3
> ahci0: port 0: 6.0Gb/s
> scsibus0 at ahci0: 32 targets
> sd0 at scsibus0 targ 0 lun 0:  
> naa.5000c29764a8efa2
> sd0: 8192MB, 512 bytes/sector, 16777216 sectors, thin
> ppb1 at pci0 dev 21 function 0 "VMware PCIE" rev 0x01: msi
> pci2 at ppb1 bus 2
> 2:0:0: io address conflict 0xe000/0x20
> em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 
> 00:50:56:be:85:3b
> ppb2 at pci0 dev 21 function 1 "VMware PCIE" rev 0x01: msi
> pci3 at ppb2 bus 3
> ppb3 at pci0 dev 21 function 2 "VMware PCIE" rev 0x01: msi
> pci4 at ppb3 bus 4
> ppb4 at pci0 dev 21 function 3 "VMware PCIE" rev 0x01: msi
> pci5 at ppb4 bus 5
> ppb5 at pci0 dev 21 function 4 "VMware PCIE" rev 0x01: msi
> pci6 at ppb5 bus 6
> ppb6 at pci0 dev 21 function 5 "VMware PCIE" rev 0x01: msi
> pci7 at ppb6 bus 7
> ppb7 at pci0 dev 21 function 6 "VMware PCIE" rev 0x01: msi
> pci8 at ppb7 bus 8
> ppb8 at pci0 dev 21 function 7 "VMware PCIE" rev 0x01: msi
> pci9 at ppb8 bus 9
> ppb9 at pci0 dev 22 function 0 "VMware PCIE" rev 0x01: msi
> pci10 at ppb9 bus 10
> xhci0 at pci10 dev 0 function 0 "VMware xHCI" rev 0x00: msi, xHCI 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "VMware xHCI root hub" rev 
> 3.00/1.00 addr 1
> ppb10 at pci0 dev 22 function 1 "VMware PCIE" rev 0x01: msi
> pci11 at ppb10 bus 11
> ppb11 at pci0 dev 22 function 2 "VMware PCIE" rev 0x01: msi
> pci12 at ppb11 bus 12
> ppb12 at pci0 dev 22 function 3 "VMware PCIE" rev 0x01: msi
> pci13 at ppb12 bus 13
> ppb13 at pci0 dev 22 function 4 "VMware PCIE" rev 0x01: msi
> pci14 at ppb13 bus 14
> ppb14 at pci0 dev 22 function 5 "VMware PCIE" rev 0x01: msi
> pci15 at ppb14 bus 15
> ppb15 at pci0 dev 22 function 6 "VMware PCIE" rev 0x01: msi
> pci16 at ppb15 bus 16
> ppb16 at pci0 dev 22 function 7 "VMware PCIE" rev 0x01: msi
> pci17 at ppb16 bus 17
> ppb17 at pci0 dev 23 function 0 "VMware PCIE" rev 0x01: msi
> pci18 at ppb17 bus 18
> 18:0:0: io address conflict 0xd000/0x20
> em1 at pci18 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address 
> 00:50:56:be:20:0d
> ppb18 at pci0 dev 23 function 1 "VMware PCIE" rev 0x01: msi
> pci19 at ppb18 bus 19
> ppb19 at pci0 dev 23 function 2 "VMware PCIE" rev 0x01: msi
> pci20 at ppb19 bus 20
> ppb20 at pci0 dev 23 function 3 "VMware PCIE" rev 0x01: msi
> pci21 at ppb20 bus 21
> ppb21 at pci0 dev 23 function 4 "VMware PCIE" rev 0x01: msi
> pci22 at ppb21 bus 22
> ppb22 at pci0 dev 23 function 5 "VMware PCIE" rev 0x01: msi
> pci23 at ppb22 bus 23
> ppb23 at pci0 dev 23 function 6 "VMware PCIE" rev 0x01: msi
> pci24 at ppb23 bus 24
> ppb24 at pci0 dev 23 function 7 "VMware PCIE" rev 0x01: msi
> pci25 at

Re: Hetzner arm64 Cloud

2023-04-18 Thread David Gwynne

On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote:
> You can also simply dd the image to /dev/sda and reboot, but that still
> doesn't solve the problem.  The bootup is hard to debug because the
> console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
> the framebuffer is shut down for whatever reason.  Means we can only get
> access to it again through viogpu, which happens pretty late.  I wish we
> had a serial console, because Qemu/edk2 can do it, they just don't make
> it available.  This is gonna be "fun" to debug without serial.

i dont think the problem here is booting openbsd, but if it were the
diff below might help.

this diff teaches BOOTAA64.EFI to load files from the EFI System
Partition that the boot loader was run from. this means you can go
"boot esp0a:bsd.rd" at the boot> prompt and get into the installer.

i wrote this cos i wanted another option for getting openbsd installed
on machines where the boot loader and driver support arent that
great yet. i can imagine it being useful for upgrading the OS on a
system where it's difficult to plug install media in, or repartitioning
or overwriting the disk is risky. especially if you also just want to
check how well the hardware is supported in openbsd before making
changes.

> 
> On Sat, Apr 15, 2023 at 11:33:39AM +0100, Chris Narkiewicz wrote:
> > 
> > I asked Hetzner to import install73.img and mounted it as VM CD-ROM,
> > but it doesn't boot. I'm not sure if this is a bug either.
> > 
> > Cheers,
> > > Chris Narkiewicz
> > 
> > On Thu, 2023-04-13 at 16:16 +, Mikolaj Kucharski wrote:
> > > Hi,
> > > 
> > > > I'm not sure whether this belongs on bugs@
> > > 
> > > However what I used in the past was Yaifo and I still use it every
> > > few
> > > years, but it takes too much effort to rebase it to -current, so I
> > > didn't touch it for few years now, but for me it worked really
> > > nicely.
> > > 
> > > https://github.com/jedisct1/yaifo
> > > 
> > > 
> > > On Thu, Apr 13, 2023 at 09:00:23AM +0200, Peter J. Philipp wrote:
> > > > Hi,
> > > > 
> > > > Yesterday hetzner.com came out with arm64 cloud instances, I tried
> > > > one out.
> > > > > Here is what I found.  The images they give you a choice of do not
> > > > > include OpenBSD, so I had to get a ubuntu OS.  That's fine, the EFI
> > > > > partition was already mounted.  Through trialing this I found the best
> > > > > way of getting the OpenBSD loader to boot was the following way:
> > > > > 
> > > > > 1. place miniroot73.img on the EFI partition root (/boot/efi/)
> > > > > 2. reboot
> > > > > 3. press escape to get to the BIOS, there are 3 options, one is a
> > > > >    configuration option under 1, enter it.  I'm working off memory here,
> > > > >    I didn't save anything so take it with a grain of salt on exactness.
> > > > >    In this option is an option to create a RAM drive from a file, go
> > > > >    there and enter the miniroot73.img (45MB).  The down arrows didn't
> > > > >    work in this BIOS so it was great that it wrapped around going up.
> > > > > 4. next go back into the main bios screen by pressing escape.  There is
> > > > >    option 3 for boot options, enter it.  There is a boot from file
> > > > >    option, enter it.  Select the RAM drive and maneuver your way to the
> > > > >    bootaa64.efi file.  Press enter.
> > > > > 5. OpenBSD loader now loads.  ls displays bsd and bsd.rd, the console is
> > > > >    on comcons0 or something like that.  Switching to fb0 works too.
> > > > >    Then when pressing boot a blank screen happens.  Waiting a while no
> > > > >    prompts and I didn't try to blind type anything.  Doing this again
> > > > >    with fb0 doesn't work either.
> > > > > 6. Full stop, I didn't get further.
> > > > > 
> > > > > I then deleted my instance as ubuntu is not good enough for me.  I guess
> > > > > we'll have to wait until the pros get to it.  Thanks!
> > > > 
> > > > Best Regards,
> > > > -peter
> > > > 
> > > 
> > 
> > -- 
> > +44 7502 415 180 (Phone, Signal, WhatsApp)
> > @ezaquarii:etacassiopeiae.net (Matrix)

Index: conf.c
===
RCS file: /cvs/src/sys/arch/arm64/stand/efiboot/conf.c,v
retrieving revision 1.44
diff -u -p -r1.44 conf.c
--- conf.c  15 Feb 2023 14:13:38 -  1.44
+++ conf.c  15 Apr 2023 02:22:34 -
@@ -58,10 +58,13 @@ struct fs_ops file_system[] = {
  ufs_stat,ufs_readdir,  ufs_fchmod },
{ ufs2_open,   ufs2_close,   ufs2_read,   ufs2_write,   ufs2_seek,
  ufs2_stat,   ufs2_readdir, ufs2_fchmod },
+   { esp_open,esp_close,esp_read,esp_write,esp_seek,
+ esp_stat,esp_readdir,  }
 };
 int nfsys = nitems(file_system);
 
 struct devsw   devsw[] = {
+   { "esp", espstrategy, espopen, espclose, espioctl },
{ "tftp", tftpstrategy, tftpopen, tftpclose, tftpioctl },
{ "sd",

Re: Sierra Wireless MC7750 attaches as ugen(4) on OpenBSD 7.3 #1125 2023-March-25

2023-04-06 Thread David Gwynne

On Thu, Apr 06, 2023 at 09:13:27AM +, Gerhard Roth wrote:
> On Thu, 2023-04-06 at 18:49 +1000, David Gwynne wrote:
> > On Wed, Apr 05, 2023 at 11:22:34PM +, Mikolaj Kucharski wrote:
> > 
> > this is almost certainly a qualcomm msm interface (qmi) device.
> > making umsm(4) attach to it is a good first start.
> > 
> > hopefully you'll be able to talk AT commands to one of the interfaces.
> > 
> > qmi devices are notoriously inconsistent and complicated, so what to do
> > next isnt clear. i would be trying to tell the modem to switch to mbim
> > mode and then figure out how to get umb(4) to attach. this is similar to
> > the changes i made to umsm and umb for quectel devices, but they
> > actually provided a decent manual.
> 
> The Sierra Wireless documentation is available. Alas, switching the
> mode seems far too complex and error prone to perform this inside
> a driver.

agreed. i just want the kernel to attach the right things to what is
presented.

> When the AT (modem) interface is available, you would have to:
> 
> 1) enter password protected command mode with "AT!ENTERCND=passwd"
> 
> 2) query the list of modes with "AT!UDUSBCOMP=?". Example result:
> 
> 0  - reserved                               NOT SUPPORTED
> 1  - DM   AT                                SUPPORTED
> 2  - reserved                               NOT SUPPORTED
> 3  - reserved                               NOT SUPPORTED
> 4  - reserved                               NOT SUPPORTED
> 5  - reserved                               NOT SUPPORTED
> 6  - DM   NMEA  AT    QMI                   SUPPORTED
> 7  - DM   NMEA  AT    RMNET1 RMNET2 RMNET3  SUPPORTED
> 8  - DM   NMEA  AT    MBIM                  SUPPORTED
> 9  - MBIM                                   SUPPORTED
> 10 - NMEA MBIM                              SUPPORTED
> 11 - DM   MBIM                              SUPPORTED
> 12 - DM   NMEA  MBIM                        SUPPORTED
> 13 - Config1: comp6    Config2: comp8       NOT SUPPORTED
> 14 - Config1: comp6    Config2: comp9       SUPPORTED
> 15 - Config1: comp6    Config2: comp10      NOT SUPPORTED
> 16 - Config1: comp6    Config2: comp11      NOT SUPPORTED
> 17 - Config1: comp6    Config2: comp12      NOT SUPPORTED
> 18 - Config1: comp7    Config2: comp8       NOT SUPPORTED
> 19 - Config1: comp7    Config2: comp9       SUPPORTED
> 20 - Config1: comp7    Config2: comp10      NOT SUPPORTED
> 21 - Config1: comp7    Config2: comp11      NOT SUPPORTED
> 22 - Config1: comp7    Config2: comp12      NOT SUPPORTED
> 
> There is no guarantee that the table doesn't change. And every
> device has a different set of supported modes.
> 
> 3) select the desired mode with "AT!UDUSBCOMP=X"
> 4) wait for the device to reset itself

yep.
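
Because the composition table varies per device, any tooling around step 2 would have to parse the modem's answer instead of hard-coding a mode number. A minimal userland sketch under stated assumptions: the listing has already been read from the AT interface as text, each row has the "N - interfaces STATUS" shape shown above, and the function name supported_mbim_modes is made up. It does not talk to a modem.

```python
import re

# Row shape assumed from the quoted example: "<number> - <interfaces> <status>".
LINE_RE = re.compile(r"^\s*(\d+)\s+-\s+(.*?)\s*(NOT SUPPORTED|SUPPORTED)\s*$")


def supported_mbim_modes(listing):
    """Return the numbers of compositions that are SUPPORTED and list an MBIM interface."""
    modes = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a table row
        number, interfaces, status = m.groups()
        if status == "SUPPORTED" and "MBIM" in interfaces.split():
            modes.append(int(number))
    return modes


# A few rows from the quoted table, used as sample input.
SAMPLE = """\
1  - DM   AT                         SUPPORTED
6  - DM   NMEA  AT    QMI            SUPPORTED
8  - DM   NMEA  AT    MBIM           SUPPORTED
9  - MBIM                            SUPPORTED
13 - Config1: comp6 Config2: comp8   NOT SUPPORTED
"""

print(supported_mbim_modes(SAMPLE))
```

The sketch deliberately ignores the two-configuration rows (13 to 22), since resolving those would mean following the compN references back into the table.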

the linux driver has some clues, so the following should let umb attach
once you've reconfigured the modem.

Index: umsm.c
===
RCS file: /cvs/src/sys/dev/usb/umsm.c,v
retrieving revision 1.125
diff -u -p -r1.125 umsm.c
--- umsm.c  2 Apr 2023 23:57:57 -   1.125
+++ umsm.c  6 Apr 2023 09:21:35 -
@@ -101,6 +101,7 @@ struct umsm_type {
 #define DEV_NORMAL      0x
 #define DEV_HUAWEI      0x0001
 #define DEV_TRUINSTALL  0x0002
+#define DEV_SIERRA      0x0004
 #define DEV_UMASS1      0x0010
 #define DEV_UMASS2      0x0020
 #define DEV_UMASS3      0x0040
@@ -271,6 +272,7 @@ static const struct umsm_type umsm_devs[
{{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_340U}, 0},
{{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_770S}, 0},
{{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7455}, 0},
+   {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7700}, DEV_SIERRA},
 
{{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM5320}, 0},
{{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM7600E}, 0},
@@ -363,6 +365,17 @@ umsm_match(struct device *parent, void *
/* Interface 4 can be used as a network device */
if (uaa->ifaceno >= 4)
return UMATCH_NONE;
+   } else if (flag & DEV_SIERRA) {
+   /* Sierra Wireless layout */
+   switch (uaa->ifaceno) {
+   case 0:
+   case 2:
+   case 3:
+   /* Only umsm on specific interfaces */
+   break;
+   default:
+   return UMATCH_NONE;
+   }
}
 
return UMATCH_VENDOR_IFACESUBCLASS;
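
for anyone following along, the interface filter in the DEV_SIERRA branch
boils down to the check below. this is only a userland sketch:
sierra_iface_match() and the MATCH_* names are made up for illustration and
stand in for the UMATCH_* values the driver returns.

```c
/* hypothetical stand-ins for the kernel's UMATCH_* return values */
enum match { MATCH_NONE = 0, MATCH_SERIAL = 1 };

/*
 * mirrors the DEV_SIERRA branch of the diff: only interfaces 0, 2 and 3
 * are claimed as serial ports, every other interface is left alone.
 */
static enum match
sierra_iface_match(int ifaceno)
{
	switch (ifaceno) {
	case 0:
	case 2:
	case 3:
		return MATCH_SERIAL;
	default:
		return MATCH_NONE;
	}
}
```

returning "no match" for the remaining interfaces is what leaves them free
for another driver, which is the point of the diff.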

Re: Sierra Wireless MC7750 attaches as ugen(4) on OpenBSD 7.3 #1125 2023-March-25

2023-04-06 Thread David Gwynne

On Wed, Apr 05, 2023 at 11:22:34PM +, Mikolaj Kucharski wrote:
> On Wed, Apr 05, 2023 at 11:16:55PM +, miko...@kucharski.name wrote:
> > >Synopsis:  Sierra Wireless MC7750 attaches as ugen(4)
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 7.3
> > Details : OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 
> > 2023
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Sierra Wireless MC7750 attaches as ugen(4) on PC Engines APU3,
> > running OpenBSD. Modem is mini-PCIe device.
> > >How-To-Repeat:
> > No specific steps. Plug in the device, boot up OpenBSD.
> > >Fix:
> > Unknown.
> > 
> 
> Forgot to also add lsusb output.
> 
> # lsusb -vvv -d 1199:68a2
> 
> Bus 002 Device 003: ID 1199:68a2 Sierra Wireless, Inc. 
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   bDeviceClass            0 (Defined at Interface level)
>   bDeviceSubClass         0
>   bDeviceProtocol         0
>   bMaxPacketSize0        64
>   idVendor           0x1199 Sierra Wireless, Inc.
>   idProduct          0x68a2
>   bcdDevice            0.06
>   iManufacturer           3 Sierra Wireless, Incorporated
>   iProduct                2 MC7750
>   iSerial                 4 00102700999
>   bNumConfigurations      1
>   Configuration Descriptor:
>     bLength                 9
>     bDescriptorType         2
>     wTotalLength          115
>     bNumInterfaces          4
>     bConfigurationValue     1
>     iConfiguration          1 Sierra Configuration
>     bmAttributes         0xe0
>       Self Powered
>       Remote Wakeup
>     MaxPower                0mA
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        0
>       bAlternateSetting       0
>       bNumEndpoints           2
>       bInterfaceClass       255 Vendor Specific Class
>       bInterfaceSubClass    255 Vendor Specific Subclass
>       bInterfaceProtocol    255 Vendor Specific Protocol
>       iInterface              0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x81  EP 1 IN
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0200  1x 512 bytes
>         bInterval              32
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x01  EP 1 OUT
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0200  1x 512 bytes
>         bInterval              32
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        2
>       bAlternateSetting       0
>       bNumEndpoints           2
>       bInterfaceClass       255 Vendor Specific Class
>       bInterfaceSubClass    255 Vendor Specific Subclass
>       bInterfaceProtocol    255 Vendor Specific Protocol
>       iInterface              0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x82  EP 2 IN
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0200  1x 512 bytes
>         bInterval              32
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x02  EP 2 OUT
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0200  1x 512 bytes
>         bInterval              32
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        3
>       bAlternateSetting       0
>       bNumEndpoints           3
>       bInterfaceClass       255 Vendor Specific Class
>       bInterfaceSubClass    255 Vendor Specific Subclass
>       bInterfaceProtocol    255 Vendor Specific Protocol
>       iInterface              0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x83  EP 3 IN
>         bmAttributes            3
>           Transfer Type            Interrupt
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0040  1x 64 bytes
>         bInterval

Re: Dell Wyse 3040 acpitz vs tipmic

2023-02-26 Thread David Gwynne

On Sun, Feb 26, 2023 at 01:28:04PM +0100, Mark Kettenis wrote:
> > Date: Sun, 26 Feb 2023 18:13:18 +1000
> > From: David Gwynne 

yeesh, i should have proofread my email before i sent it. sorry about
making it harder to read than it should have been.

> > i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
> > like them a lot. however, i have to disable acpitz to be able to
> > use them because the driver gets stuck during attach.
> > 
> > acpitz_attach does a read of all the temperatures. the read
> > of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
> > tipmic_thermal_opreg_handler has a loop on line 335 waiting for
> > sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
> > acpitz_attach is running while the kernel is cold, and it appears that
> > the interrupt handler never runs, so that value never changes, and
> > acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
> > do anything. thanks to patrick for helping me on the acpi side of things
> > so we could figure this out.
> 
> A better approach might be to make sure that while we're cold,
> tipmic_thermal_opreg_handler() polls for completion.  Something like:
> 
>       while (sc->sc_stat_adc == 0) {
>               if (cold) {
>                       delay(1000);
>                       tipmic_intr(sc);
>               } else {
>                       if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
>                           SEC_TO_NSEC(1))) {
>                               ...
>                       }
>               }
>       }
> 
> 
> > i tried deferring basically all of acpitz_attach to when kthreads are
> > running, and that works well enough to get to userland.
> > 
> > is that reasonable?
> 
> The problem is that you can't really know whether AML accesses the
> opregion while cold.

good point. the diff below works in this situation and is less
intrusive.

> > also, shortly after dwiic complains about short reads and the kernel
> > locks up again. i'll have to plug it in and transcribe the exact
> > errors. i think that's a separate problem though.
> 
> Yes, dwiic(4) has always been somewhat problematic.  Transactions seem
> to fail randomly on some platforms like the atom system you're looking
> at but also on my Ampere eMAG system.

fun. i managed to catch some of the dwiic stuff via dmesg before
it locked up:

dwiic0: timed out waiting for tx_empty intr
dwiic0: timed out waiting for rx_full intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x5b
dwiic0: timed out waiting for tx_empty intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x01
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x01
dwiic0: timed out waiting for rx_full intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x5a
dwiic0: timed out waiting for rx_full intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x50
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x01
dwiic0: timed out waiting for bus idle
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out waiting for stop intr
dwiic0: timed out reading remaining 1
tipmic0: can't read register 0x01
dwiic0: timed out waiting for bus idle

Index: tipmic.c
===
RCS file: /cvs/src/sys/dev/acpi/tipmic.c,v
retrieving revision 1.7
diff -u -p -r1.7 tipmic.c
--- tipmic.c6 Apr 2022 18:59:27 -   1.7
+++ tipmic.c26 Feb 2023 23:56:04 -
@@ -276,6 +276,25 @@ struct tipmic_regmap tipmic_thermal_regm
{ 0x18, TIPMIC_SYSTEMP_HI, TIPMIC_SYSTEMP_LO }
 };
 
+static int
+tipmic_wait_adc(struct tipmic_softc *sc)
+{
+   int i;
+
+   if (!cold) {
+   return (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
+   SEC_TO_NSEC(1)));
+   }
+
+   for (i = 0; i < 1000; i++) {
+   delay(1000);
+   if (tipmic_intr(sc) == 1)
+   return (0);
+   }
+
+   return (EWOULDBLOCK);
+}
+
 int
 tipmic_thermal_opreg_handler(void *cookie, int iodir, uint64_t address,
 int size, uint64_t *value)
@@ -333,8 +352,7 @@ tipmic_thermal_opreg_handler(void *cooki
splx(s);
 
while (sc->sc_stat_adc == 0) {
-   if (tsleep_nsec(&sc->sc_stat_adc, PRIBIO, "tipmic",
-   SEC_TO_NSEC(1))) {
+   if (tipmic_wait_adc(sc)) {
printf("%s: ADC timeout\n", sc->sc_dev.dv_xname);
break;
}
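
the cold path in the diff is a bounded poll: since interrupts aren't
delivered yet, call the interrupt handler by hand until it reports that it
did something, and give up after a fixed number of tries. a minimal
userland sketch of that pattern follows. struct fake_dev, fake_intr() and
wait_adc_cold() are hypothetical names for illustration only, and the
simulated latency stands in for real hardware.

```c
#include <errno.h>

/* hypothetical device state; the real driver keeps this in its softc */
struct fake_dev {
	int	polls_until_ready;	/* simulated hardware latency */
	int	stat_adc;		/* set once the "interrupt" has fired */
};

/* stands in for the interrupt handler: returns 1 once it handled an event */
static int
fake_intr(struct fake_dev *d)
{
	if (d->polls_until_ready > 0) {
		d->polls_until_ready--;
		return 0;
	}
	d->stat_adc = 1;
	return 1;
}

/* poll at most max_polls times, like the cold branch of the diff */
static int
wait_adc_cold(struct fake_dev *d, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		/* the kernel version sleeps with delay(1000) here */
		if (fake_intr(d) == 1)
			return 0;
	}

	return EWOULDBLOCK;
}
```

the bound matters: without it a device that never answers would hang
attach in exactly the way the original report describes.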

Dell Wyse 3040 acpitz vs tipmic

2023-02-26 Thread David Gwynne

i picked a couple of Dell Wyse 3040 boxes, which are very cute, i
like them a lot. however, i have to disable acpitz to be able to
use them because the driver gets stuck during attach.

acpitz_attach does a read of all the temperatures. the read
of _TMP ends up talking to tipmic(4) via tipmic_thermal_opreg_handler().
tipmic_thermal_opreg_handler has a loop on line 335 waiting for
sc->sc_stat_adc to change, but that value is only set from tipmic_intr.
acpitz_attach is running while the kernel is cold, and it appears that
the interrupt handler never runs, so that value never changes, and
acpitz blocks. also because it's cold, the timeout on the tsleep doesn't
do anything. thanks to patrick for helping me on the acpi side of things
so we could figure this out.

i tried deferring basically all of acpitz_attach to when kthreads are
running, and that works well enough to get to userland.

is that reasonable?

also, shortly after dwiic complains about short reads and the kernel
locks up again. i'll have to plug it in and transcribe the exact
errors. i think that's a separate problem though.

OpenBSD 7.2-current (GENERIC.MP) #1071: Wed Feb 22 17:34:56 MST 2023
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2018418688 (1924MB)
avail mem = 1937928192 (1848MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7a9f4000 (50 entries)
bios0: vendor Dell Inc. version "1.2.5" date 08/20/2018
bios0: Dell Inc. Wyse 3040 Thin Client
efi0 at bios0: UEFI 2.4
efi0: American Megatrends rev 0x5000b
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT SSDT UEFI SSDT HPET SSDT 
SSDT SSDT LPIT BCFG PRAM CSRT WDAT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.02 MHz, 06-4c-04
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
cpu0: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 79MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.03 MHz, 06-4c-04
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
cpu1: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
16-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.04 MHz, 06-4c-04
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
cpu2: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
16-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz, 480.07 MHz, 06-4c-04
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
cpu3: 24KB 64b/line 6-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
16-way L2 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 1 pa 0xfec0, version 20, 115 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus -1 (RP02)
acpiprt3 at acpi0: bus -1 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
"INT33A4" at acpi0 not configured
dwiic0 at acpi0 I2C7 addr 0x9151e000/0x1000 irq 38
iic0 at dwiic0
chvgpio0 at acpi0 GPO1 uid 2 addr 0xfed88000/0x8000 irq 48, 59 pins
tipmic0 at iic0 addr 0x5e gpio 15
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
com0 at acpi0 IURT

Re: deadlock in ifconfig

2022-11-23 Thread David Gwynne

On Tue, Nov 22, 2022 at 10:49:35AM +1000, David Gwynne wrote:
> On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote:
> > > Date: Mon, 21 Nov 2022 20:28:35 +0100
> > > From: Alexander Bluhm 
> > > 
> > > Hi,
> > > 
> > > Some of my test machines hang while booting userland.
> > > 
> > > starting network
> > > -> here it hangs
> > > load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> > > 
> > > ddb shows these two processes.
> > > 
> > >  81303  375320  89140  0  3 0x3  sbar  ifconfig
> > >  48135  157353  0  0  3 0x14200  netlock   systqmp
> > > 
> > > ddb{0}> trace /t 0t375320
> > > sleep_finish(800022d31318,1) at sleep_finish+0xfe
> > > cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54
> > > sched_barrier(800022512ff0) at sched_barrier+0x73
> > > ixgbe_stop(80118000) at ixgbe_stop+0x1f7
> > > ixgbe_init(80118000) at ixgbe_init+0x32
> > > ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a
> > > in_ifinit(80118048,8022ec00,800022d31740,1) at in_ifinit+0xef
> > > in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at in_ioctl_change_ifaddr+0x3a4
> > > in_control(fd81901dc740,8040691a,800022d31730,80118048) at in_control+0x75
> > > ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at ifioctl+0x982
> > > sys_ioctl(800022d6,800022d31840,800022d318a0) at sys_ioctl+0x2c4
> > > syscall(800022d31910) at syscall+0x384
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7f7d94a0, count: -13
> > > 
> > > ddb{0}> trace /t 0t157353
> > > sleep_finish(800022ca8b70,1) at sleep_finish+0xfe
> > > rw_enter(822b4f80,1) at rw_enter+0x1cb
> > > pf_purge(0) at pf_purge+0x1d
> > > taskq_thread(822ac568) at taskq_thread+0x100
> > > end trace frame: 0x0, count: -4
> > > 
> > > ifconfig waits for the sched_barrier_task() on the systqmp task
> > > queue while holding the netlock.  pf_purge() runs on the systqmp
> > > task queue and is waiting for the netlock.  The netlock has been
> > > taken by ifconfig in in_ioctl_change_ifaddr().
> > > 
> > > The problem has been introduced when pf_purge() was moved from systq
> > > to systqmp.
> > > https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2
> > 
> > I'd say pfpurge should be moved to its own taskq.
> 
> we're working toward dropping the need for NET_LOCK before PF_LOCK. could
> we try the diff below as a compromise?
>

sashan@ and mvs@ have pushed that forward, so this diff should be enough
now.

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1153
diff -u -p -r1.1153 pf.c
--- pf.c12 Nov 2022 02:48:14 -  1.1153
+++ pf.c24 Nov 2022 01:21:48 -
@@ -1603,9 +1603,6 @@ pf_purge(void *null)
 {
unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
 
-   /* XXX is NET_LOCK necessary? */
-   NET_LOCK();
-
PF_LOCK();
 
pf_purge_expired_src_nodes();
@@ -1616,7 +1613,6 @@ pf_purge(void *null)
 * Fragments don't require PF_LOCK(), they use their own lock.
 */
pf_purge_expired_fragments();
-   NET_UNLOCK();
 
/* interpret the interval as idle time between runs */
timeout_add_sec(&pf_purge_to, interval);
@@ -1891,7 +1887,6 @@ pf_purge_expired_states(const unsigned i
if (SLIST_EMPTY(&gcl))
return (scanned);
 
-   NET_LOCK();
rw_enter_write(&pf_state_list.pfs_rwl);
PF_LOCK();
PF_STATE_ENTER_WRITE();
@@ -1904,7 +1899,6 @@ pf_purge_expired_states(const unsigned i
PF_STATE_EXIT_WRITE();
PF_UNLOCK();
rw_exit_write(&pf_state_list.pfs_rwl);
-   NET_UNLOCK();
 
while ((st = SLIST_FIRST(&gcl)) != NULL) {
SLIST_REMOVE_HEAD(&gcl, gc_list);

Re: deadlock in ifconfig

2022-11-21 Thread David Gwynne

On Mon, Nov 21, 2022 at 08:53:52PM +0100, Mark Kettenis wrote:
> > Date: Mon, 21 Nov 2022 20:28:35 +0100
> > From: Alexander Bluhm 
> > 
> > Hi,
> > 
> > Some of my test machines hang while booting userland.
> > 
> > starting network
> > -> here it hangs
> > load: 0.02  cmd: ifconfig 81303 [sbar] 0.00u 0.15s 0% 78k
> > 
> > ddb shows these two processes.
> > 
> >  81303  375320  89140  0  3 0x3  sbar  ifconfig
> >  48135  157353  0  0  3 0x14200  netlock   systqmp
> > 
> > ddb{0}> trace /t 0t375320
> > sleep_finish(800022d31318,1) at sleep_finish+0xfe
> > cond_wait(800022d313b0,81f15e9d) at cond_wait+0x54
> > sched_barrier(800022512ff0) at sched_barrier+0x73
> > ixgbe_stop(80118000) at ixgbe_stop+0x1f7
> > ixgbe_init(80118000) at ixgbe_init+0x32
> > ixgbe_ioctl(80118048,8020690c,8022ec00) at ixgbe_ioctl+0x13a
> > in_ifinit(80118048,8022ec00,800022d31740,1) at in_ifinit+0xef
> > in_ioctl_change_ifaddr(8040691a,800022d31730,80118048,1) at in_ioctl_change_ifaddr+0x3a4
> > in_control(fd81901dc740,8040691a,800022d31730,80118048) at in_control+0x75
> > ifioctl(fd81901dc740,8040691a,800022d31730,800022d6) at ifioctl+0x982
> > sys_ioctl(800022d6,800022d31840,800022d318a0) at sys_ioctl+0x2c4
> > syscall(800022d31910) at syscall+0x384
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7f7d94a0, count: -13
> > 
> > ddb{0}> trace /t 0t157353
> > sleep_finish(800022ca8b70,1) at sleep_finish+0xfe
> > rw_enter(822b4f80,1) at rw_enter+0x1cb
> > pf_purge(0) at pf_purge+0x1d
> > taskq_thread(822ac568) at taskq_thread+0x100
> > end trace frame: 0x0, count: -4
> > 
> > ifconfig waits for the sched_barrier_task() on the systqmp task
> > queue while holding the netlock.  pf_purge() runs on the systqmp
> > task queue and is waiting for the netlock.  The netlock has been
> > taken by ifconfig in in_ioctl_change_ifaddr().
> > 
> > The problem has been introduced when pf_purge() was moved from systq
> > to systqmp.
> > https://marc.info/?l=openbsd-cvs&m=166818274216800&w=2
> 
> I'd say pfpurge should be moved to its own taskq.

we're working toward dropping the need for NET_LOCK before PF_LOCK. could
we try the diff below as a compromise?

> ixgb(4) holding netlock while calling sched_barrier() is probably
> wrong too.

it's pretty baked in at this point that the SIOCSIFFLAGS ioctl is called
while holding the net lock, and we've been saying for ages that you can
clear IFF_RUNNING and then call intr_barrier (and lots of other
barriers) to wait for things to get off the rings before tearing them
down.

long term, drivers should protect themselves. we're nowhere near that
point though.

Index: pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1153
diff -u -p -r1.1153 pf.c
--- pf.c12 Nov 2022 02:48:14 -  1.1153
+++ pf.c22 Nov 2022 00:29:30 -
@@ -1603,20 +1620,14 @@ pf_purge(void *null)
 {
unsigned int interval = max(1, pf_default_rule.timeout[PFTM_INTERVAL]);
 
-   /* XXX is NET_LOCK necessary? */
-   NET_LOCK();
-
-   PF_LOCK();
-
+   rw_enter_write(&pf_lock); /* PF_LOCK() without NET_LOCK() */
pf_purge_expired_src_nodes();
-
-   PF_UNLOCK();
+   rw_exit_write(&pf_lock); /* PF_UNLOCK() without NET_LOCK() */
 
/*
 * Fragments don't require PF_LOCK(), they use their own lock.
 */
pf_purge_expired_fragments();
-   NET_UNLOCK();
 
/* interpret the interval as idle time between runs */
timeout_add_sec(&pf_purge_to, interval);
@@ -1891,9 +1902,8 @@ pf_purge_expired_states(const unsigned i
if (SLIST_EMPTY(&gcl))
return (scanned);
 
-   NET_LOCK();
rw_enter_write(&pf_state_list.pfs_rwl);
-   PF_LOCK();
+   rw_enter_write(&pf_lock); /* PF_LOCK() without NET_LOCK() */
PF_STATE_ENTER_WRITE();
SLIST_FOREACH(st, &gcl, gc_list) {
if (st->timeout != PFTM_UNLINKED)
@@ -1902,9 +1912,8 @@ pf_purge_expired_states(const unsigned i
pf_free_state(st);
}
PF_STATE_EXIT_WRITE();
-   PF_UNLOCK();
+   rw_exit_write(&pf_lock); /* PF_UNLOCK() without NET_LOCK() */
rw_exit_write(&pf_state_list.pfs_rwl);
-   NET_UNLOCK();
 
while ((st = SLIST_FIRST(&gcl)) != NULL) {
SLIST_REMOVE_HEAD(&gcl, gc_list);
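
the hang in this thread is a two party cycle: ifconfig holds the netlock
and waits for a barrier task on systqmp, while the systqmp thread is stuck
in pf_purge() waiting for the netlock. as a toy model only (this is not
how the kernel detects anything, and every name below is made up), the
situation before and after the diff can be written as a "who waits on whom"
table and checked for a cycle:

```c
#include <stdbool.h>

#define NACTORS 2
enum { IFCONFIG = 0, SYSTQMP = 1, NOBODY = -1 };

/*
 * waits_on[a] is the actor that a is blocked behind, or NOBODY.
 * returns true if following the chain from any actor leads back to it.
 */
static bool
has_deadlock(const int waits_on[NACTORS])
{
	int start, cur, steps;

	for (start = 0; start < NACTORS; start++) {
		cur = waits_on[start];
		for (steps = 0; cur != NOBODY && steps < NACTORS; steps++) {
			if (cur == start)
				return true;
			cur = waits_on[cur];
		}
	}

	return false;
}
```

with { SYSTQMP, IFCONFIG } (ifconfig waits on the taskq, the taskq waits on
ifconfig's netlock) the function reports a cycle. once pf_purge() stops
taking NET_LOCK the second entry becomes NOBODY and the cycle is gone.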

Re: pf panic with clean snapshot (GENERIC.MP) #570

2022-06-09 Thread David Gwynne



i upgraded one of the work firewalls to -current and added the diff
below in, and got what looks like a different panic:

ddb{6}> tr
db_enter() at db_enter+0x5
panic(81e2cc31) at panic+0xbf
__assert(81eae23b,81ee4549,797,81e7fd91) at __assert+0x25
pfsync_insert_state(fd816375b720) at pfsync_insert_state+0xec
pf_state_insert(801f,800024cd9cc0,800024cd9cb8,fd816375b720) at pf_state_insert+0x2df
pf_test_rule(800024cd9e50,800024cd9e38,800024cd9e30,800024cd9e40,800024cd9e28,800024cd9e4e,1) at pf_test_rule+0xec4
pf_test(2,3,816ab800,800024cd9fe8) at pf_test+0x1126
ip_output(fd8052dae400,0,800024cda0b0,1,0,0,81640801) at ip_output+0x72a
ip_forward(fd8052dae400,81640800,fd81840e2ad0,0) at ip_forward+0x286
ip_input_if(800024cda2e0,800024cda2dc,4,0,81640800) at ip_input_if+0x347
ipv4_input(81640800,fd8052dae400) at ipv4_input+0x37
ether_input(81640800,fd8052dae400) at ether_input+0x394
carp_input(816a2000,fd8052dae400,5e000156) at carp_input+0x186
ether_input(816a2000,fd8052dae400) at ether_input+0x1c5
vlan_input(8161a000,fd8052dae400,800024cda4fc) at vlan_input+0x22d
ether_input(8161a000,fd8052dae400) at ether_input+0x83
if_input_process(8019a048,800024cda588) at if_input_process+0x4a
ifiq_process(801f0800) at ifiq_process+0x8e
taskq_thread(8002c080) at taskq_thread+0xfa
end trace frame: 0x0, count: -19
ddb{6}> sh panic
*cpu6: kernel diagnostic assertion "st->sync_state == PFSYNC_S_NONE" failed: file "/usr/src/sys/net/if_pfsync.c", line 1943

i'll try and have a look at it. i am probably most responsible for the
code :(

On Wed, Jun 08, 2022 at 12:42:33AM +0200, Alexandr Nedvedicky wrote:
> Hello Hrvoje,
> 
> 
> > 
> > Hi,
> > 
> > while booting with this diff I've got this log:
> > 
> > starting early daemons: syslogd pflogd ntpdwitness: lock_object uninitialized: 0xfd8785c81a90
> > Starting stack trace...
> > witness_checkorder(fd8785c81a90,9,0) at witness_checkorder+0xad
> > mtx_enter(fd8785c81a80) at mtx_enter+0x34
> > pf_remove_state(fd8785c81988) at pf_remove_state+0x1da
> > pfsync_in_del_c(fd80028977b0,c,2,2) at pfsync_in_del_c+0x9f
> > pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c
> > ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103
> > ip_local(800020b056e8,800020b056f4,fe007fff0220,0) at ip_local+0x1b7
> > ipintr() at ipintr+0x5f
> > if_netisr(0) at if_netisr+0xca
> > taskq_thread(80036000) at taskq_thread+0x11a
> 
> thanks for the quick test with pfsync. it turned out I forgot to initialize
> pf_state::mtx in the pfsync_state_import() function.
> 
> below is updated diff, which should fix a stack trace reported by witness.
> 
> thanks and
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
> index c78ca62766e..b039a50a7bb 100644
> --- a/sys/net/if_pfsync.c
> +++ b/sys/net/if_pfsync.c
> @@ -157,16 +157,16 @@ struct {
>  };
>  
>  struct pfsync_q {
> - void    (*write)(struct pf_state *, void *);
> + int     (*write)(struct pf_state *, void *);
>   size_t  len;
>   u_int8_t action;
>  };
>  
>  /* we have one of these for every PFSYNC_S_ */
> -void pfsync_out_state(struct pf_state *, void *);
> -void pfsync_out_iack(struct pf_state *, void *);
> -void pfsync_out_upd_c(struct pf_state *, void *);
> -void pfsync_out_del(struct pf_state *, void *);
> +int  pfsync_out_state(struct pf_state *, void *);
> +int  pfsync_out_iack(struct pf_state *, void *);
> +int  pfsync_out_upd_c(struct pf_state *, void *);
> +int  pfsync_out_del(struct pf_state *, void *);
>  
>  struct pfsync_q pfsync_qs[] = {
>   { pfsync_out_iack,  sizeof(struct pfsync_ins_ack), PFSYNC_ACT_INS_ACK },
> @@ -516,10 +516,10 @@ pfsync_alloc_scrub_memory(struct pfsync_state_peer *s,
>   return (0);
>  }
>  
> -void
> +int
>  pfsync_state_export(struct pfsync_state *sp, struct pf_state *st)
>  {
> - pf_state_export(sp, st);
> + return (pf_state_export(sp, st));
>  }
>  
>  int
> @@ -680,6 +680,7 @@ pfsync_state_import(struct pfsync_state *sp, int flags)
>   st->sync_state = PFSYNC_S_NONE;
>  
>   refcnt_init(&st->refcnt);
> + mtx_init(&st->mtx, IPL_NET);
>  
>   /* XXX when we have anchors, use STATE_INC_COUNTERS */
>   r->states_cur++;
> @@ -1529,24 +1530,25 @@ pfsyncioctl(struct ifnet *ifp, u_long cmd, caddr_t 
> data)
>   return (0);
>  }
>  
> -void
> +int
>  pfsync_out_state(struct pf_state *st, void *buf)
>  {
>   struct pfsync_state *sp = buf;
>  
> - pfsync_state_export(sp, st);
> + return (pfsync_state_export(sp, st));
>  }
>  
> -void
> +int
>  pfsync_out_iack(struct pf_state

Re: relayd panic

2022-06-06 Thread David Gwynne




> On 6 Jun 2022, at 18:08, Claudio Jeker  wrote:
> 
> On Mon, Jun 06, 2022 at 12:03:06AM +0200, Alexandr Nedvedicky wrote:
>> Hello,
>> 
>> I'll commit the one-liner diff on Tuesday morning (Jun 6th) Prague time
>> without an explicit OK, unless there is an objection.
> 
> The diff is OK claudio@.

ok by me too.

> 
>> regards
>> sashan
>> 
>> 
>> On Sun, Jun 05, 2022 at 09:44:45AM +0100, Stuart Henderson wrote:
>>> I don't know this code well enough to give a meaningful OK, but this
>>> should probably get committed.
>>> 
>>> 
>>> On 2022/06/01 09:16, Alexandr Nedvedicky wrote:
>>>> Hello,
>>>>
>>>>
>>>>> r420-1# rcctl -f start relayd
>>>>> relayd(ok)
>>>>> r420-1# uvm_fault(0xfd862f82f990, 0x0, 0, 1) -> e
>>>>> kernel: page fault trap, code=0
>>>>> Stopped at pf_find_or_create_ruleset+0x1c: movb 0(%rdi),%al
>>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND
>>>>> 431388 19003 0 0x2 0 5 relayd
>>>>> 174608 32253 89 0x112 0 2 relayd
>>>>> 395415 12468 0 0x2 0 4 relayd
>>>>> 493579 11904 0 0x2 0 3 relayd
>>>>> *101082 14967 89 0x1100012 0 0K relayd
>>>>> pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c
>>>>> pfr_add_tables(832d7cca800,1,80eaf43c,1000) at pfr_add_tables+0x6ae
>>>>> pfioctl(4900,c450443d,80eaf000,3,80002272e7f0) at pfioctl+0x1d9f
>>>>> VOP_IOCTL(fd8551f82dd0,c450443d,80eaf000,3,fd862f7d60c0,80002272e7f0) at VOP_IOCTL+0x5c
>>>>> vn_ioctl(fd855ecec1e8,c450443d,80eaf000,80002272e7f0) at vn_ioctl+0x75
>>>>> sys_ioctl(80002272e7f0,8000227d9980,8000227d99d0) at sys_ioctl+0x2c4
>>>>> syscall(8000227d9a40) at syscall+0x374
>>>>> Xsyscall() at Xsyscall+0x128
>>>>> end of kernel
>>>>
>>>> it looks like we are dying here at line 239 due to a NULL pointer dereference:
>>>>
>>>> 232 struct pf_ruleset *
>>>> 233 pf_find_or_create_ruleset(const char *path)
>>>> 234 {
>>>> 235         char *p, *aname, *r;
>>>> 236         struct pf_ruleset *ruleset;
>>>> 237         struct pf_anchor *anchor;
>>>> 238
>>>> 239         if (path[0] == 0)
>>>> 240                 return (&pf_main_ruleset);
>>>> 241
>>>> 242         while (*path == '/')
>>>> 243                 path++;
>>>> 244
>>>>
>>>> I've followed the same steps to reproduce the issue to check if
>>>> diff below resolves the issue. The bug has been introduced by
>>>> my recent change to pf_table.c [1] from May 10th:
>>>>
>>>>     Modified files:
>>>>         sys/net : pf_ioctl.c pf_table.c
>>>>
>>>>     Log message:
>>>>     move memory allocations in pfr_add_tables() out of
>>>>     NET_LOCK()/PF_LOCK() scope. bluhm@ helped a lot
>>>>     to put this diff into shape.
>>>>
>>>> besides using a regression test I also did some simple testing
>>>> using a 'load anchor':
>>>>
>>>> netlock# cat /tmp/anchor.conf
>>>> load anchor "test" from "/tmp/pf.conf"
>>>> netlock#
>>>> netlock# cat /tmp/pf.conf
>>>> table <try> { 192.168.1.1 }
>>>> pass from <try>
>>>> netlock#
>>>> netlock# pfctl -sA
>>>> test
>>>> netlock# pfctl -a test -sT
>>>> try
>>>> netlock# pfctl -a test -t try -T show
>>>> 192.168.1.1
>>>>
>>>> OK to commit fix below?
>>>>
>>>> thanks and
>>>> regards
>>>> sashan
>>>>
>>>> [1]
>>>> https://urldefense.com/v3/__https://marc.info/?l=openbsd-cvs=16522243003=2__;!!ACWV5N9M2RV99hQ!LsTJPPsMku6N_u9xzJu6Tj6XpZWyLzLWPmbWr-Z-p845Y8r6LH4Ul8PyX8EmqI6alhF0JqadpBBF4mn53v-rQdY$
>>>>
>>>>
>>>> 8<---8<---8<--8<
>>>> diff --git a/sys/net/pf_table.c b/sys/net/pf_table.c
>>>> index 8315ea5dd3a..dfc49de5efe 100644
>>>> --- a/sys/net/pf_table.c
>>>> +++ b/sys/net/pf_table.c
>>>> @@ -1628,8 +1628,7 @@ pfr_add_tables(struct pfr_table *tbl, int size, int
>>>> *nadd, int flags)
>>>>    if (r != NULL)
>>>>    continue;
>>>>
>>>> -  q->pfrkt_rs = pf_find_or_create_ruleset(
>>>> -   q->pfrkt_root->pfrkt_anchor);
>>>> +  q->pfrkt_rs = pf_find_or_create_ruleset(q->pfrkt_anchor);
>>>>    /*
>>>>     * root tables are attached to main ruleset,
>>>>     * because ->pfrkt_anchor[0] == '\0'
>>>>
>> 
> 
> -- 
> :wq Claudio

Re: [External] : Re: ip6 forwarding with pf and pfsync over veb/vport

2022-05-28 Thread David Gwynne




> On 24 May 2022, at 17:01, Alexandr Nedvedicky 
>  wrote:
> 
> Hello Hrvoje,
> 
> On Mon, May 23, 2022 at 06:34:07PM +0200, Hrvoje Popovski wrote:
>> On 23.5.2022. 10:41, Hrvoje Popovski wrote:
>>> On 23.5.2022. 8:34, Alexandr Nedvedicky wrote:
>>>> looks like kind of memory corruption. my bet is use-after-free.
>>>> will try to get to it later today.
>>>>
>>>> does it mean there is no such panic, when we handle IPv4 traffic only?
>>> 
>>> Hi,
>>> 
>>> yes, it seems that i can't trigger panic with ip4 only traffic, at least
>>> the same way i can with ip6 traffic
>>> 
>> 
>> All day I'm trying to trigger panic with ip4 and I just can't 
> 
>    interesting. I went through the mbuf handling in if_veb.c.
>    I could find just a single nit, which is most likely unrelated,
>    however I think it's still worth giving the diff below a try.
> 
>    basically all calls to veb_pf() read as follows:
>        m = veb_pf(ifp, ..., m);
>    except the one in veb_broadcast(), which reads as:
>        m = veb_pf(ifp, ..., m0);
>    I think it is a bug, the veb_pf() caller should continue to run
>    with the packet returned by veb_pf().

yes. ok by me. can you fix the same thing in the ipsec handling too?

> 
> thanks and
> regards
> sashan
> 
> 8<---8<---8<--8<
> diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> index 67185fde228..30a002f95a2 100644
> --- a/sys/net/if_veb.c
> +++ b/sys/net/if_veb.c
> @@ -944,7 +944,7 @@ veb_broadcast(struct veb_softc *sc, struct veb_port *rp, 
> struct mbuf *m0,
>* let pf look at it, but use the veb interface as a proxy.
>*/
>   if (ISSET(ifp->if_flags, IFF_LINK1) &&
> - (m = veb_pf(ifp, PF_OUT, m0)) == NULL)
> + (m0 = veb_pf(ifp, PF_OUT, m0)) == NULL)
>   return;
> #endif
> 
>
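
the rule being applied here (a hook may hand back a different packet or
none at all, so the caller has to carry on with the return value and forget
the pointer it passed in) can be shown outside the kernel. this is a sketch
under made-up names: struct buf, filter() and process() are hypothetical and
only imitate the "returns same, new, or NULL" contract described above.

```c
#include <stdlib.h>

struct buf {
	int	len;
};

/*
 * hypothetical filter hook: it consumes the buffer it is given and
 * returns either a replacement or NULL if the packet was dropped.
 * the pointer passed in must not be used after the call.
 */
static struct buf *
filter(struct buf *b)
{
	struct buf *nb;

	if (b->len == 0) {		/* drop */
		free(b);
		return NULL;
	}

	nb = malloc(sizeof(*nb));	/* replace */
	if (nb == NULL) {
		free(b);
		return NULL;
	}
	nb->len = b->len + 1;
	free(b);

	return nb;
}

/* correct caller: keep working with whatever filter() returned */
static int
process(int len)
{
	struct buf *b0 = malloc(sizeof(*b0));
	int out;

	if (b0 == NULL)
		return -1;
	b0->len = len;

	if ((b0 = filter(b0)) == NULL)
		return -1;

	out = b0->len;
	free(b0);
	return out;
}
```

assigning the result to a second variable and then continuing with the
first one would touch freed memory in this sketch, which is the same class
of mistake the one line change avoids.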

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread David Gwynne

On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote:
> On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> > On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> >> On 12.5.2022. 14:48, Claudio Jeker wrote:
> >>> I think the diff below may be enough to fix this issue. It drops the SMR
> >>> critical section around the enqueue operation but uses a reference on the
> >>> port instead to ensure that the device can't be removed during the
> >>> enqueue. Once the enqueue is finished we enter the SMR critical section
> >>> again and drop the reference.
> >>>
> >>> To make it clear that the SMR_TAILQ remains intact while a refcount is
> >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> >>> not strictly needed since the next pointer remains valid up until the
> >>> smr_barrier() call but I find this a bit easier to understand.
> >>> First make sure nobody else holds a reference then remove the port from
> >>> the list.
> >>>
> >>> I currently have no test setup to verify this but I hope someone else can
> >>> give this a spin.
> >> Hi,
> >>
> >> for now veb seems stable and i can't panic box although it should, but
> >> please give me few more hours to torture it properly.
> > 
> > 
> > I can trigger panic in veb with this diff.
> > 
> > Thank you ..
> > 
> > 
> 
> I can't trigger ... :))) sorry ..

sorry i'm late to the party. can you try this diff?

this diff replaces the list of ports with an array/map of ports.
the map takes references to all the ports, so the forwarding paths
just have to hold a reference to the map to be able to use all the
ports. the forwarding path uses smr to get hold of a map, takes a map
ref, and then leaves the smr crit section before iterating over the map
and pushing packets.

this means we should only take and release a single refcnt when
we're pushing packets out any number of ports.

if no span ports are configured, then there's no span port map and
we don't try and take a ref, we can just return early.

we also only take and release a single refcnt when we forward the
actual packet. forwarding to a single port provided by an etherbridge
lookup already takes/releases the single port ref. if it falls
through that for unknown unicast or broadcast/multicast, then it's
a single refcnt for the current map of all ports.
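
The layout the diff relies on, a small header followed by an array of port pointers in a single allocation, looks roughly like this in userland C. This is only a sketch under assumptions: ALIGN_UP is a hypothetical stand-in for the kernel's _ALIGN, a plain counter stands in for struct refcnt, ports_create() is an invented helper, and struct port is left opaque.

```c
#include <stddef.h>
#include <stdlib.h>

struct port;				/* opaque in this sketch */

struct ports {
	unsigned int	m_refs;		/* stand-in for struct refcnt */
	unsigned int	m_count;
	/* followed by an array of struct port pointers */
};

/* Hypothetical stand-in for _ALIGN(): round up to pointer alignment. */
#define ALIGN_UP(n) \
	(((n) + sizeof(void *) - 1) & ~(sizeof(void *) - 1))

/* Bytes needed for the header plus n trailing pointers. */
size_t
ports_size(unsigned int n)
{
	return (ALIGN_UP(sizeof(struct ports)) + n * sizeof(struct port *));
}

/* Address of the pointer array that follows the header. */
struct port **
ports_array(struct ports *m)
{
	return ((struct port **)((char *)m + ALIGN_UP(sizeof(*m))));
}

/* Build a new map holding copies of the n pointers in src. */
struct ports *
ports_create(struct port **src, unsigned int n)
{
	struct ports *m;
	unsigned int i;

	m = malloc(ports_size(n));
	if (m == NULL)
		return (NULL);
	m->m_refs = 1;
	m->m_count = n;
	for (i = 0; i < n; i++)
		ports_array(m)[i] = src[i];
	return (m);
}
```

With this shape a reader only needs one reference on the map itself and can then loop over m_count entries, which is the "single refcnt for any number of ports" property described above.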

Index: if_veb.c
===
RCS file: /cvs/src/sys/net/if_veb.c,v
retrieving revision 1.25
diff -u -p -r1.25 if_veb.c
--- if_veb.c	4 Jan 2022 06:32:39 -0000	1.25
+++ if_veb.c	13 May 2022 02:01:43 -0000
@@ -139,13 +139,13 @@ struct veb_port {
 	struct veb_rule_list		 p_vr_list[2];
 #define VEB_RULE_LIST_OUT			0
 #define VEB_RULE_LIST_IN			1
-
-	SMR_TAILQ_ENTRY(veb_port)	 p_entry;
 };
 
 struct veb_ports {
-	SMR_TAILQ_HEAD(, veb_port)	 l_list;
-	unsigned int			 l_count;
+	struct refcnt			 m_refs;
+	unsigned int			 m_count;
+
+	/* followed by an array of veb_port pointers */
 };
 
 struct veb_softc {
@@ -155,8 +155,8 @@ struct veb_softc {
 	struct etherbridge		 sc_eb;
 
 	struct rwlock			 sc_rule_lock;
-	struct veb_ports		 sc_ports;
-	struct veb_ports		 sc_spans;
+	struct veb_ports		*sc_ports;
+	struct veb_ports		*sc_spans;
 };
 
 #define DPRINTF(_sc, fmt...)do { \
@@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u
 static int	veb_p_output(struct ifnet *, struct mbuf *,
 		    struct sockaddr *, struct rtentry *);
 
-static void	veb_p_dtor(struct veb_softc *, struct veb_port *,
-		    const char *);
+static inline size_t
+veb_ports_size(unsigned int n)
+{
+	/* use of _ALIGN is inspired by CMSGs */
+	return _ALIGN(sizeof(struct veb_ports)) +
+	    n * sizeof(struct veb_port *);
+}
+
+static inline struct veb_port **
+veb_ports_array(struct veb_ports *m)
+{
+	return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m)));
+}
+
+static void	veb_ports_free(struct veb_ports *);
+
+static void	veb_p_unlink(struct veb_softc *, struct veb_port *);
+static void	veb_p_fini(struct veb_port *);
+static void	veb_p_dtor(struct veb_softc *, struct veb_port *);
 static int	veb_add_port(struct veb_softc *,
 		    const struct ifbreq *, unsigned int);
 static int	veb_del_port(struct veb_softc *,
@@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
 		return (ENOMEM);
 
 	rw_init(&sc->sc_rule_lock, "vebrlk");
-	SMR_TAILQ_INIT(&sc->sc_ports.l_list);
-	SMR_TAILQ_INIT(&sc->sc_spans.l_list);
+	sc->sc_ports = NULL;
+	sc->sc_spans = NULL;
 
 	ifp = &sc->sc_if;
 
@@ -314,7 +331,10 @@ static int
 veb_clone_destroy(struct ifnet *ifp)
 {
struct veb_softc *sc = ifp->if_softc;
-   struct

Re: route sourceaddr and ping/traceroute

2022-03-19 Thread David Gwynne

On Sat, Mar 19, 2022 at 12:53:54PM +0100, Claudio Jeker wrote:
> On Sat, Mar 19, 2022 at 12:27:43PM +0100, Claudio Jeker wrote:
> > > On Sat, Mar 19, 2022 at 10:51:13AM +0000, Stuart Henderson wrote:
> > > On 2022/03/19 19:14, David Gwynne wrote:
> > > > On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> > > > > > This should not be done in applications.  The kernel must do it.  It
> > > > > > means the current kernel code is wrong.
> > > > 
> > > > i think this is the right place for raw ipv4 sockets like what
> > > > ping/traceroute uses.
> > > > 
> > > > ipv6 looks like it already does the right thing.
> > > 
> > > Given that this turned out so similar to the existing v6 support that
> > > I think you didn't notice before writing the v4 version, this looks
> > > like the right place indeed :)
> > > 
> > > Works for me, OK
> >  
> > Did someone try this on an ospf router?
> > I think our ospfd(8) always includes the source address and uses IP_HDRINCL
> > but not sure about other daemons. Are there other raw IP users that expect
> > the IP source to be set to the outgoing interface by default?
> > Maybe IGMP proxies and routers. E.g. dvmrpd depends on IP_MULTICAST_IF to
> > set the source IP address.
> > 
> > Looking at the code I think this will break the IP_MULTICAST_IF case.
> > In ip_output() the check is like this:
> > 
> > if ((IN_MULTICAST(ip->ip_dst.s_addr) ||
> > (ip->ip_dst.s_addr == INADDR_BROADCAST)) &&
> > imo != NULL && (ifp = if_get(imo->imo_ifidx)) != NULL) {
> > 
> > mtu = ifp->if_mtu;
> > if (ip->ip_src.s_addr == INADDR_ANY) {
> > struct in_ifaddr *ia;
> > 
> > IFP_TO_IA(ifp, ia);
> > if (ia != NULL)
> > ip->ip_src = ia->ia_addr.sin_addr;
> > }
> > 
> > Now raw_ip.c already set ip_src to something so
> > if (ip->ip_src.s_addr == INADDR_ANY)
> > is never true.
> > 
> > It is possible to skip the in_pcbselsrc() call in rip_output() when
> > IN_MULTICAST(ip->ip_dst.s_addr) || (ip->ip_dst.s_addr == INADDR_BROADCAST)
> > 
> > Is that good enough?
> 
> Actually in_pcbselsrc() checks the multicast options but only for the
> IN_MULTICAST() case. I guess we could add INADDR_BROADCAST handling in
> there as well. Seems like a sensible thing to do. Broadcast is just a very
> special version of multicast.

i know we talked about this off list, but for the record there are two
kinds of IP broadcast. there's 255.255.255.255, and the broadcast
address on subnets, eg, 192.168.1.0/24 has a broadcast address of
192.168.1.255. you're talking about 255.255.255.255 here.
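
to make the distinction concrete, here is a small sketch that works on addresses in host byte order. the function names are made up for illustration and are not kernel interfaces. it also shows why the existing multicast test does not catch 255.255.255.255: the top four bits of that address are 1111, not 1110.

```c
#include <stdint.h>

/*
 * Directed (subnet) broadcast: the network address with all host
 * bits set. addr is in host byte order, prefixlen must be 0..32.
 */
uint32_t
directed_broadcast(uint32_t addr, unsigned int prefixlen)
{
	uint32_t mask;

	mask = prefixlen == 0 ? 0 : ~(uint32_t)0 << (32 - prefixlen);
	return ((addr & mask) | ~mask);
}

/* Limited broadcast is the single address 255.255.255.255. */
int
is_limited_broadcast(uint32_t addr)
{
	return (addr == 0xffffffffU);
}

/* Multicast: top four bits are 1110, i.e. 224.0.0.0/4. */
int
is_multicast(uint32_t addr)
{
	return ((addr & 0xf0000000U) == 0xe0000000U);
}
```

for 192.168.1.0/24 directed_broadcast() gives 192.168.1.255, which is neither limited broadcast nor multicast by these tests.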

im honestly surprised in_pcbselsrc doesnt already do what you're talking
about. like i said, the impression i get from stevens and other bits of
the stack is that 255.255.255.255 is largely treated as a multicast
address, which is why im surprised that it isnt handled already and why
i think it makes sense. what tests do we need to feel confident with it
though? 

> Also if this is changed is there still a way to have ip->ip_src ==
> INADDR_ANY in ip_output()?

dunno. would slapping a counter on it be useful?

Index: in_pcb.c
===
RCS file: /cvs/src/sys/netinet/in_pcb.c,v
retrieving revision 1.261
diff -u -p -r1.261 in_pcb.c
--- in_pcb.c	14 Mar 2022 22:38:43 -0000	1.261
+++ in_pcb.c	19 Mar 2022 12:54:55 -0000
@@ -893,11 +893,13 @@ in_pcbselsrc(struct in_addr **insrc, str
}
 
/*
-* If the destination address is multicast and an outgoing
-* interface has been set as a multicast option, use the
-* address of that interface as our source address.
+* If the destination address is multicast or limited
+* broadcast (255.255.255.255) and an outgoing interface has
+* been set as a multicast option, use the address of that
+* interface as our source address.
 */
-   if (IN_MULTICAST(sin->sin_addr.s_addr) && mopts != NULL) {
+   if ((IN_MULTICAST(sin->sin_addr.s_addr) || 
+   sin->sin_addr.s_addr == INADDR_BROADCAST) && mopts != NULL) {
struct ifnet *ifp;
 
ifp = if_get(mopts->imo_ifidx);

Re: route sourceaddr and ping/traceroute

2022-03-19 Thread David Gwynne

On Thu, Mar 17, 2022 at 12:05:16PM -0600, Theo de Raadt wrote:
> This should not be done in applications.  The kernel must do it.  It means
> the current kernel code is wrong.

i think this is the right place for raw ipv4 sockets like what
ping/traceroute uses.

ipv6 looks like it already does the right thing.

Index: raw_ip.c
===
RCS file: /cvs/src/sys/netinet/raw_ip.c,v
retrieving revision 1.123
diff -u -p -r1.123 raw_ip.c
--- raw_ip.c	14 Mar 2022 22:38:43 -0000	1.123
+++ raw_ip.c	19 Mar 2022 03:40:44 -0000
@@ -222,6 +222,7 @@ int
 rip_output(struct mbuf *m, struct socket *so, struct sockaddr *dstaddr,
 struct mbuf *control)
 {
+   struct sockaddr_in *dst = satosin(dstaddr);
struct ip *ip;
struct inpcb *inp;
int flags, error;
@@ -246,8 +247,8 @@ rip_output(struct mbuf *m, struct socket
ip->ip_off = htons(0);
ip->ip_p = inp->inp_ip.ip_p;
ip->ip_len = htons(m->m_pkthdr.len);
-   ip->ip_src = inp->inp_laddr;
-   ip->ip_dst = satosin(dstaddr)->sin_addr;
+   ip->ip_src.s_addr = INADDR_ANY;
+   ip->ip_dst = dst->sin_addr;
ip->ip_ttl = inp->inp_ip.ip_ttl ? inp->inp_ip.ip_ttl : MAXTTL;
} else {
if (m->m_pkthdr.len > IP_MAXPACKET) {
@@ -262,11 +263,23 @@ rip_output(struct mbuf *m, struct socket
ip = mtod(m, struct ip *);
if (ip->ip_id == 0)
ip->ip_id = htons(ip_randomid());
+   dst->sin_addr = ip->ip_dst;
 
/* XXX prevent ip_output from overwriting header fields */
flags |= IP_RAWOUTPUT;
ipstat_inc(ips_rawout);
}
+
+	if (ip->ip_src.s_addr == INADDR_ANY) {
+		struct in_addr *laddr;
+
+		error = in_pcbselsrc(&laddr, dst, inp);
+		if (error != 0)
+			return (error);
+
+		ip->ip_src = *laddr;
+	}
+
 #ifdef INET6
/*
 * A thought:  Even though raw IP shouldn't be able to set IPv6

Re: UDP divert-to rule: getsockname(2) won't show original destination

2022-02-23 Thread David Gwynne




> On 23 Feb 2022, at 02:12, K R  wrote:
> 
> Hi David,
> 
> On Tue, Feb 22, 2022 at 5:27 AM David Gwynne  wrote:
> 
> 
> > On 22 Feb 2022, at 06:31, K R  wrote:
> > 
> >> Synopsis:  UDP divert-to rule: getsockname(2) won't show original
> > destination
> >> Category:  kernel amd64
> >> Environment:
> >System  : OpenBSD 7.1-beta
> >Details : OpenBSD 7.1-beta (GENERIC) #353: Sun Feb 20 17:14:05
> > MST 2022
> > 
> >Architecture: OpenBSD.amd64
> >Machine : amd64
> >> Description:
> > 
> > getsockname(2) won't show the original destination address/port for a
> > UDP inet packet redirected using a PF divert-to rule to a local
> > socket.
> > 
> > This works as expected for TCP.
> > 
> >> How-To-Repeat:
> > 
> > server:
> > 
> > (pf.conf)
> > pass in on vio0 inet proto udp from any to 100.64.0.100 divert-to 127.0.0.1
> > port 9000
> > 
> >>>> import socket
> >>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> >>>> s.bind(("127.0.0.1", 9000))
> >>>> s.recvfrom(1024)
> > (b'data\n', ('100.64.0.1', 16079))
> >>>> s.getsockname()
> > ('127.0.0.1', 9000)
> > 
> > client:
> > 
> > $ echo data | nc -u 100.64.0.100 12345
> > 
> >> Fix:
> >Unknown.
> 
> This is working as expected for UDP, which is a datagram socket, not a 
> connected TCP stream socket like what you're trying to compare it to. A 
> locally bound but not connected UDP socket will not keep information about 
> received packets on it, you have to get all that from the messages as they're 
> being received.
> 
> If you want the original destination address for that message, you have to 
> ask for it as part of receiving the message. For IPv4 you do that by setting 
> the IP_RECVDSTADDR sockopt on the UDP socket, and then using recvmsg() 
> instead of recvfrom() with space for a control message set up for it to use.
> 
> Thanks, of course, you are right.  Now that you mentioned it, I could find 
> this information on the ip(4)
> 
>  If the IP_RECVDSTADDR option is enabled on a SOCK_DGRAM socket, the
>  recvmsg(2) call will return the destination IP address for a UDP
>  datagram.  The msg_control field in the msghdr structure points to a
>  buffer that contains a cmsghdr structure followed by the IP address.
>  [...]
> 
> and pf(4) manpages:
>  
>  getsockname(2).  For SOCK_DGRAM sockets, the ip(4) socket
>  options IP_RECVDSTADDR and IP_RECVDSTPORT can be used to
>  retrieve the destination address and port.
> 
> What would be nice, IMHO, is to make this clear on the pf.conf(5) 
> manpage when divert-to is mentioned:
> 
>  divert-to host port port
>  Used to redirect packets to a local socket bound to host and
>  port.  The packets will not be modified, so getsockname(2) on
>  the socket will return the original destination address of the
>  packet.

agreed. i put something in.

> 
> 
> src/usr.bin/tftpd/tftpd.c does this if you want some code to refer to.
> 
> I believe it is src/usr.sbin/tftp-proxy/tftp-proxy.c, correct?

even though tftpd isn't used with divert-to, it still uses the sockopts and 
control messages to get destination addresses for tftp requests. tftp-proxy 
might be better, but i was in tftpd more recently so it was what i remembered 
first.

dlg

Re: UDP divert-to rule: getsockname(2) won't show original destination

2022-02-22 Thread David Gwynne

> On 22 Feb 2022, at 06:31, K R  wrote:
> 
>> Synopsis:  UDP divert-to rule: getsockname(2) won't show original
> destination
>> Category:  kernel amd64
>> Environment:
>System  : OpenBSD 7.1-beta
>Details : OpenBSD 7.1-beta (GENERIC) #353: Sun Feb 20 17:14:05
> MST 2022
> 
>Architecture: OpenBSD.amd64
>Machine : amd64
>> Description:
> 
> getsockname(2) won't show the original destination address/port for a
> UDP inet packet redirected using a PF divert-to rule to a local
> socket.
> 
> This works as expected for TCP.
> 
>> How-To-Repeat:
> 
> server:
> 
> (pf.conf)
> pass in on vio0 inet proto udp from any to 100.64.0.100 divert-to 127.0.0.1
> port 9000
> 
> >>> import socket
> >>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> >>> s.bind(("127.0.0.1", 9000))
> >>> s.recvfrom(1024)
> (b'data\n', ('100.64.0.1', 16079))
> >>> s.getsockname()
> ('127.0.0.1', 9000)
> 
> client:
> 
> $ echo data | nc -u 100.64.0.100 12345
> 
>> Fix:
>Unknown.

This is working as expected for UDP, which is a datagram socket, not a 
connected TCP stream socket like what you're trying to compare it to. A locally 
bound but not connected UDP socket will not keep information about received 
packets on it, you have to get all that from the messages as they're being 
received.

If you want the original destination address for that message, you have to ask 
for it as part of receiving the message. For IPv4 you do that by setting the 
IP_RECVDSTADDR sockopt on the UDP socket, and then using recvmsg() instead of 
recvfrom() with space for a control message set up for it to use.

src/usr.bin/tftpd/tftpd.c does this if you want some code to refer to.

dlg

Re: vxlan broken

2022-02-18 Thread David Gwynne

On Fri, Feb 18, 2022 at 02:25:45PM +0100, Anton Lindqvist wrote:
> On Thu, Feb 17, 2022 at 10:25:19PM +0100, Anton Lindqvist wrote:
> > On Thu, Feb 17, 2022 at 09:50:20PM +0100, Alexander Bluhm wrote:
> > > Hi,
> > > 
> > > With this snapshot regress/sys/net/vxlan crashes the kernel
> > > OpenBSD 7.0-current (GENERIC.MP) #355: Wed Feb 16 13:44:38 MST 2022
> > > 
> > > START sys/net/vxlan   2022-02-17T08:07:25Z
> > > 
> > > rm -f a.out [Ee]rrs mklog *.core y.tab.h
> > > 
> > >  vxlan_1 
> > > ksh /usr/src/regress/sys/net/vxlan/vxlan_1.sh -R "11 12" -I "11 12"
> > > ifconfig: SIOCSLIFPHYRTABLE: Device busy
> > > ifconfig: SIOCSLIFPHYRTABLE: Device busy
> > > Timeout, server ot6 not responding.
> > > 
> > > uvm_fault(0xfd8240699668, 0xc, 0, 2) -> e
> > > kernel: page fault trap, code=0
> > > Stopped at      in_delmulti+0x54:       addl    $-0x1,0xc(%r14)
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > *455775  97669      0         0x2          0    1K ifconfig
> > > in_delmulti(0) at in_delmulti+0x54
> > > vxlan_ioctl(80e03000,80206910,80002236d760) at vxlan_ioctl+0xfce
> > > ifioctl(fd82765d8e08,80206910,80002236d760,8000221b8008) at ifioctl+0x92b
> > > soo_ioctl(fd81408b7ae0,80206910,80002236d760,8000221b8008) at soo_ioctl+0x161
> > > sys_ioctl(8000221b8008,80002236d870,80002236d8c0) at sys_ioctl+0x2c4
> > > syscall(80002236d930) at syscall+0x374
> > > Xsyscall() at Xsyscall+0x128
> > > end of kernel
> > > end trace frame: 0x7f7cb4f0, count: 8
> > > 
> > > The same happens on i386, powerpc64, armv7, arm64, sparc64.
> > 
> > Same here on my amd64-regress machine.
> 
> No panic this time but the tests are failing.
> 
> > sys/net/vxlan:
> Exit: 1
> Duration: 00:04:35
> Log: 151-sys-net-vxlan.log

vxlan needs a parent interface specified when using a multicast
destination address on the tunnel.

the first chunk forces arp to have run before letting pure icmp
through. it shouldnt be necessary, but i have to dig around the arp code
again.

Index: vxlan_2.sh
===
RCS file: /cvs/src/regress/sys/net/vxlan/vxlan_2.sh,v
retrieving revision 1.2
diff -u -p -r1.2 vxlan_2.sh
--- vxlan_2.sh	30 Nov 2016 22:21:20 -0000	1.2
+++ vxlan_2.sh	18 Feb 2022 03:34:20 -0000
@@ -22,6 +22,7 @@ do_ping()
 {
 	local source="$1"
 	local dest="${VXLAN_NETID}${2}"
+	$PING -q -c 1 -w 1 -V "$source" "$dest" > /dev/null # warm up arp
 	$PING -q -c 3 -w 1 -V "$source" "$dest" | grep -q ' 0.0% packet loss' && return
 	echo "Failed to ping $dest from vstack $source"
 	STATUS=1
@@ -96,7 +97,7 @@ vstack_add() {
 	$SUDO ifconfig "$vstack_pairname" rdomain "$vstack" $IFCONFIG_OPTS
 	$SUDO ifconfig "$vstack_pairname" "$AF" "${vstack_tunsrc}${PAIR_PREFX}" up
 	$SUDO ifconfig "vxlan$vstack" rdomain "$vstack" tunneldomain "$vstack" $IFCONFIG_OPTS
-	$SUDO ifconfig "vxlan$vstack" vnetid "$VNETID" tunnel "$vstack_tunsrc" "${VXLAN_TUNDST}${tundst_sufx}" up
+	$SUDO ifconfig "vxlan$vstack" vnetid "$VNETID" tunnel "$vstack_tunsrc" "${VXLAN_TUNDST}${tundst_sufx}" parent "$vstack_pairname" up
 	[[ -n $DYNAMIC ]] && $SUDO ifconfig "bridge$vstack" rdomain "$vstack" add "vxlan$vstack" $IFCONFIG_OPTS up
 }

Re: msk(4) not working with trunk(4) (Marvell Yukon 88E8042)

2022-01-04 Thread David Gwynne

On Tue, Dec 28, 2021 at 04:10:52PM +0100, Alessandro De Laurenzis wrote:
> Ciao David,
> 
> > msk is the first port added to the trunk? ie, it's the preferred port? if 
> > you run tcpdump on msk or watch systat if, do you see packets on msk?
> 
> The network config is pretty standard; an Ethernet port (msk0), a wifi one
> (iwn0), trunk0 with failover (using msk0 as "preferred" port):
> 
> > $ cat /etc/hostname.trunk0
> > trunkproto failover
> > trunkport msk0
> > trunkport iwn0
> > autoconf
> > up
> 
> Bear with me, tcpdump is a kind of stranger world for me...
> 
> Enclosed please find the output files from the following commands:
> 
> > $ doas tcpdump -i trunk0 -c 50 -w trunk0.dump
> > $ doas tcpdump -i msk0 -c 50 -w msk0.dump
> > $ doas tcpdump -i iwn0 -c 50 -w iwn0.dump
> 
> I see some "broadcast" packages on both trunk0 and msk0 (trunk0 didn't
> receive the inet address from the DHCP server, of course); nothing as
> expected on iwn0.
> 
> Hope this answers to your question...

it was hard to see the trees because of the forest :(

im back in the office today, so i dug out a box with msk and was able to
reproduce the problem.

it looks like the hardware "remembers" stuff between when an msk port is
taken down and goes up again, like what trunk does to an interface when
it's added as a port. i think the problem would occur if you just took
msk down and up again during normal operation too, but that seems to be
a rare event...

the diff that borked it did so because of how it accounted for free
space on the ring, and was otherwise lucky.

can you try this diff?

Index: if_msk.c
===
RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
retrieving revision 1.136
diff -u -p -r1.136 if_msk.c
--- if_msk.c	12 Dec 2020 11:48:53 -0000	1.136
+++ if_msk.c	4 Jan 2022 08:06:11 -0000
@@ -135,7 +135,7 @@ int msk_intr(void *);
 void msk_intr_yukon(struct sk_if_softc *);
 static inline int msk_rxvalid(struct sk_softc *, u_int32_t, u_int32_t);
 void msk_rxeof(struct sk_if_softc *, struct mbuf_list *, uint16_t, uint32_t);
-void msk_txeof(struct sk_if_softc *);
+void msk_txeof(struct sk_if_softc *, unsigned int);
 static unsigned int msk_encap(struct sk_if_softc *, struct mbuf *, uint32_t);
 void msk_start(struct ifnet *);
 int msk_ioctl(struct ifnet *, u_long, caddr_t);
@@ -1561,7 +1561,7 @@ msk_watchdog(struct ifnet *ifp)
 * Reclaim first as there is a possibility of losing Tx completion
 * interrupts.
 */
-   msk_txeof(sc_if);
+   //msk_txeof(sc_if, sc_if->sk_cdata.sk_tx_prod);
if (sc_if->sk_cdata.sk_tx_prod != sc_if->sk_cdata.sk_tx_cons) {
printf("%s: watchdog timeout\n", sc_if->sk_dev.dv_xname);
 
@@ -1639,26 +1639,19 @@ msk_rxeof(struct sk_if_softc *sc_if, str
 }
 
 void
-msk_txeof(struct sk_if_softc *sc_if)
+msk_txeof(struct sk_if_softc *sc_if, unsigned int prod)
 {
 	struct ifnet		*ifp = &sc_if->arpcom.ac_if;
 	struct sk_softc		*sc = sc_if->sk_softc;
-	uint32_t		prod, cons;
+	uint32_t		cons;
 	struct mbuf		*m;
 	bus_dmamap_t		map;
-   bus_size_t  reg;
-
-   if (sc_if->sk_port == SK_PORT_A)
-   reg = SK_STAT_BMU_TXA1_RIDX;
-   else
-   reg = SK_STAT_BMU_TXA2_RIDX;
 
/*
 * Go through our tx ring and free mbufs for those
 * frames that have been sent.
 */
cons = sc_if->sk_cdata.sk_tx_cons;
-   prod = sk_win_read_2(sc, reg);
 
if (cons == prod)
return;
@@ -1770,7 +1763,7 @@ msk_intr(void *xsc)
};
struct ifnet*ifp0 = NULL, *ifp1 = NULL;
int claimed = 0;
-   u_int32_t   status;
+   u_int32_t   status, sk_status;
struct msk_status_desc  *cur_st;
 
status = CSR_READ_4(sc, SK_Y2_ISSR2);
@@ -1812,10 +1805,19 @@ msk_intr(void *xsc)
 			    lemtoh32(&cur_st->sk_status));
 			break;
 		case SK_Y2_STOPC_TXSTAT:
+			sk_status = lemtoh32(&cur_st->sk_status);
 			if (sc_if0)
-				msk_txeof(sc_if0);
-			if (sc_if1)
-				msk_txeof(sc_if1);
+				msk_txeof(sc_if0, sk_status & 0xfff);
+			if (sc_if1) {
+				/*
+				 * this would be easier as a 64bit
+				 * load of the whole status descriptor,
+				 * a shift, and a mask.
+				 */
+				unsigned int cons = (sk_status >> 24) & 0xff;
+				cons |= (lemtoh16(&cur_st->sk_len) & 0xf) << 8;
+				msk_txeof(sc_if1, cons);
+
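
The bit extraction in the TXSTAT case above can be tried in isolation. The sketch below assumes, based only on the comment in the diff, that the 16-bit length field directly follows the 32-bit status word in the little-endian descriptor, so that the two together form one 48-bit value; struct status_desc and the function names are hypothetical and hold values already converted to host order.

```c
#include <stdint.h>

/* Assumed first six bytes of a status descriptor, in host order. */
struct status_desc {
	uint32_t	status;		/* after lemtoh32() */
	uint16_t	len;		/* after lemtoh16() */
};

/* Port A: the index is the low 12 bits of the status word. */
unsigned int
txa_index(const struct status_desc *d)
{
	return (d->status & 0xfff);
}

/* Port B as in the diff: 8 bits from status, 4 more from len. */
unsigned int
txb_index(const struct status_desc *d)
{
	unsigned int cons = (d->status >> 24) & 0xff;

	cons |= (d->len & 0xf) << 8;
	return (cons);
}

/* Port B via one 64-bit value, a shift, and a mask. */
unsigned int
txb_index64(const struct status_desc *d)
{
	uint64_t v = ((uint64_t)d->len << 32) | d->status;

	return ((unsigned int)((v >> 24) & 0xfff));
}
```

Under that layout assumption both port B variants return the same 12-bit index, since bits 24..31 of the status word and bits 0..3 of the length are bits 24..35 of the combined value.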

Re: CARP load balancing problems under KVM

2021-01-09 Thread David Gwynne

Hey Carlos,

I've spent a bit of time today trying to figure out what's going on here, and 
haven't found anything that looks wrong with carp in 6.8.

I did have a lot of trouble trying to reproduce it though, but that's because 
some of the switches involved seem to be "helping" and filtering packets sent 
from a multicast MAC address. I could see the carp interface get arp requests 
for the shared IP, and reply to them, but I never saw the replies on any other 
machine. However, I was able to build a test setup with carp on top of nvgre 
between a bunch of machines, and that abstracted me enough off the physical 
network to test with. As expected, it all worked fine.

The only thing that's changing in your setup is the openbsd version? You're not 
upgrading the host machines or using different physical switches at the same 
time or anything?

To debug this further I'd like to look at packet captures. Can you tcpdump on 
the carp hosts and the client machines? If possible, captures from a 6.7 setup 
too would be nice.

Cheers,
dlg

> On 5 Jan 2021, at 1:59 am, Carlos Lopez  wrote:
> 
> Good afternoon,
> 
> Any news about this bug?
> 
> On 21/10/20, 12:37, "owner-b...@openbsd.org on behalf of Carlos Lopez" 
>  wrote:
> 
>Hi all,
> 
>Before the upgrade from OpenBSD 6.7 to OpenBSD 6.8, my pair of firewalls had 
> been using carp in IP balance mode without problems for several months. These 
> firewalls are installed on a RHEL 8.2 (fully patched) KVM host.
> 
>After upgrading to OpenBSD 6.8, carp IP balance mode doesn't work. I have 
> also tested reconfiguring the balance mode to ip-stealth and ip-unicast and the 
> result is always the same: network packets are not processed by the firewalls. 
> But if I configure CARP using "the simple configuration", where one node is 
> master and the other is backup, everything works without problems.
> 
>All CARP interfaces are configured as this one:
> 
>carpdev vio0 balancing ip pass 7254e4bc3024e35490e4b9942f919e9b
>inet 172.22.55.30 0xffe0 172.22.55.31
>carpnodes 10:0,11:100
>description "Production Network"
> 
>sysctl.conf file:
> 
>net.inet.carp.preempt=1
>net.inet.carp.log=2
>net.inet.ip.forwarding=1
>net.inet.tcp.mssdflt=1440
>net.inet.ip.redirect=0
>net.inet.ip.mtudisc=0
>net.inet.tcp.rfc3390=1
>net.inet.ip.arptimeout=60
>kern.bufcachepercent=70
>net.inet.icmp.tstamprepl=0
>net.inet.udp.sendspace=262144
>net.inet.udp.recvspace=262144
> 
> 
>OpenBSD kvm guest config:
> 
>[libvirt domain XML for the guest; the markup did not survive in this copy.
> Remaining values: obsdfw01, "OpenBSD Security Gateway Cluster",
> 786432, 786432, 1, /machine, hvm, Broadwell, destroy, restart,
> destroy, /usr/libexec/qemu-kvm.]

Re: tcpdump 'ip6' filter doesn't work on wg0 (wireguard)

2020-07-20 Thread David Gwynne

On Mon, Jul 20, 2020 at 05:45:52PM +0200, Klemens Nanni wrote:
> On Sun, Jul 19, 2020 at 02:24:36PM +0200, Matthieu Herrb wrote:
> > Trying to look at IPv6 traffic on my wireguard VPN with
> > 
> > tcpdump -n -i wg0 ip6
> > 
> > also shows all IPv4 traffic. Other interfaces seem to filter the v6
> > protocol correctly.
> This happens for all interfaces without link-layer, e.g. lo(4) as well;
> see `tcpdump -c1 -ilo0 ip & ping6 -qc1 ::1'.
> 
> > Any suggestion before I try to dig into the kernel code (which I'm not
> > really familiar with) ?
> Not yet, but I'm curiously looking at this.

kn@ pointed me at this, and we came up with the following. firstly, we
narrowed the problem down to pcap not actually looking at the header to
decide if a packet was ipv4 or ipv6:

$ sudo tcpdump -i wg0 -d ip
(000) ret  #116
$ sudo tcpdump -i wg0 -d ip6
(000) ret  #116
$ sudo tcpdump -i gre0 -d ip 
(000) ret  #116
$ sudo tcpdump -i gre0 -d ip6
(000) ret  #116

our tunnel interfaces pretty much all use DLT_LOOP as their link type,
so this behaviour is consistent across all of them.

why the filter unconditionally matches these packets is because of
this stuff in src/lib/libpcap/gencode.c. im including bits for DLT_NULL
for comparison:

static void
init_linktype(type)
int type;
{
[snip]
switch (type) {
[snip]
case DLT_NULL:
off_linktype = 0;
off_nl = 4;
return;
[snip]
case DLT_LOOP:
off_linktype = -1;
off_nl = 4;
return;
[snip]
}

the actual filter is generated in gen_linktype:

static struct block *
gen_linktype(proto)
int proto;
{
struct block *b0, *b1;

/* If we're not using encapsulation and checking for IP, we're done */
if ((off_linktype == -1 || mpls_stack > 0) && proto == ETHERTYPE_IP)
return gen_true();
#ifdef INET6
/* this isn't the right thing to do, but sometimes necessary */
if ((off_linktype == -1 || mpls_stack > 0) && proto == ETHERTYPE_IPV6)
return gen_true();
#endif

switch (linktype) {
[snip]
case DLT_LOOP:
case DLT_ENC:
case DLT_NULL:
/* XXX */
if (proto == ETHERTYPE_IP)
return (gen_cmp(0, BPF_W, (bpf_int32)htonl(AF_INET)));
#ifdef INET6
else if (proto == ETHERTYPE_IPV6)
return (gen_cmp(0, BPF_W, (bpf_int32)htonl(AF_INET6)));
#endif /* INET6 */
else
return gen_false();
break;

cos init_linktype sets off_linktype to -1, gen_linktype thinks that
DLT_LOOP has no linktype header, and just assumes everything is both
ipv4 and ipv6.

DLT_LOOP does have a link type header though, so we should fix
init_linktype. this is backed up by
https://www.tcpdump.org/linktypes.html.
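
for reference, the check the filter ought to make can be written out by hand. this is a sketch, not libpcap code: it assumes the DLT_LOOP header is the 4-byte address family in network byte order described on the linktypes page, and the function names are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Return the address family from a DLT_LOOP header, or -1 if the
 * packet is too short to hold one.
 */
int
loop_family(const unsigned char *pkt, size_t len)
{
	uint32_t af;

	if (len < sizeof(af))
		return (-1);
	memcpy(&af, pkt, sizeof(af));	/* the header may be unaligned */
	return ((int)ntohl(af));
}

/* What "ip" should match on a DLT_LOOP interface. */
int
loop_is_ip(const unsigned char *pkt, size_t len)
{
	return (loop_family(pkt, len) == AF_INET);
}

/* What "ip6" should match on a DLT_LOOP interface. */
int
loop_is_ip6(const unsigned char *pkt, size_t len)
{
	return (loop_family(pkt, len) == AF_INET6);
}
```

with off_linktype set to -1 the generated filter never looks at these four bytes, which is why both "ip" and "ip6" compile down to an unconditional accept.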

this diff seems to work ok:

$ sudo tcpdump -i gre0 -d ip  
(000) ld       [0]
(001) jeq      #0x2000000       jt 2	jf 3
(002) ret      #116
(003) ret      #0
$ sudo tcpdump -i gre0 -d ip6
(000) ld       [0]
(001) jeq      #0x18000000      jt 2	jf 3
(002) ret      #116
(003) ret      #0

ok?

Index: gencode.c
===
RCS file: /cvs/src/lib/libpcap/gencode.c,v
retrieving revision 1.52
diff -u -p -r1.52 gencode.c
--- gencode.c	9 Dec 2018 15:07:06 -0000	1.52
+++ gencode.c	20 Jul 2020 23:59:44 -0000
@@ -770,7 +770,7 @@ init_linktype(type)
return;
 
case DLT_LOOP:
-   off_linktype = -1;
+   off_linktype = 0;
off_nl = 4;
return;

Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-11 Thread David Gwynne

are there any config options on the switch side relating to flow control you 
can try turning off? are there any counters for pause frames on the switch side 
too?

dlg

> On 12 Jun 2020, at 12:16 pm, Gabri Tofano  wrote:
> 
> Apparently it is not:
> 
> #ifconfig em0 hwfeatures
> em0: flags=808843 mtu 1500
>hwfeatures=36 hardmtu 9216
>lladdr XX:XX:XX:XX:XX:XX
>index 1 priority 0 llprio 3
>groups: egress
>media: Ethernet autoselect (1000baseT full-duplex)
>status: active
>inet XX:XX:XX:XX netmask 0xff00 broadcast XX:XX:XX:XX
> 
> 
> On 2020-06-11 21:57, David Gwynne wrote:
>> Is flow control enabled? Can you try disabling rxpause and txpause?
>>> On 12 Jun 2020, at 10:36 am, Gabri Tofano  wrote:
>>> Yes, this is today without resetting the interface:
>>> #netstat -ie
>>> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
>>> em0     1500  <Link>      XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
>>> em0     1500  XX:XX:XX:XX XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
>>> em1     1500  <Link>      XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
>>> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
>>> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
>>> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
>>> em3*    1500  <Link>      00:e0:67:10:9d:97        0     0        0     0     0
>>> enc0*   0     <Link>                               0     0        0     0     0
>>> pflog0  33136 <Link>                               0     0   128982     0     0
>>> On 2020-06-11 20:29, David Gwynne wrote:
>>>> Is it consistently Ierrs?
>>>> dlg
>>>>> On 11 Jun 2020, at 10:14 pm, Gabri Tofano  wrote:
>>>>> #netstat -id
>>>>> Name    Mtu   Network     Address              Ipkts Idrop    Opkts Odrop Colls
>>>>> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
>>>>> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
>>>>> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
>>>>> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
>>>>> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
>>>>> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
>>>>> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
>>>>> enc0*   0     <Link>                               0     0        0     0     0
>>>>> pflog0  33136 <Link>                               0     0    29771     0     0
>>>>> #netstat -ie
>>>>> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
>>>>> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
>>>>> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
>>>>> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
>>>>> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
>>>>> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
>>>>> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
>>>>> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
>>>>> enc0*   0     <Link>                               0     0        0     0     0
>>>>> pflog0  33136 <Link>                               0     0    30174     0     0
>>>>> #systat queues
>>>>> QUEUE  BW/FL SCH  PKTSBYTES   DROP_P   
>>>>> DROP_B QLEN BORROW SUSPEN P/S B/S
>>>>> main on em0 120M fifo000  
>>>>>   00
>>>>> defq   100M fifo   139394 215744110   
>>>>>  00
>>>>> voip10M fifo34699  49496350   
>>>>>  00
>>>>> games   10M fifo32277  24608070   
>>>>>  00
>>>>> Thank you!
>>>>> Gabri

Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-11 Thread David Gwynne

Is flow control enabled? Can you try disabling rxpause and txpause?

> On 12 Jun 2020, at 10:36 am, Gabri Tofano  wrote:
> 
> Yes, this is today without resetting the interface:
> 
> #netstat -ie
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> em0     1500  <Link>      XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
> em0     1500  XX:XX:XX:XX XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
> em1     1500  <Link>      XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
> em3*    1500  <Link>      00:e0:67:10:9d:97        0     0        0     0     0
> enc0*   0     <Link>                               0     0        0     0     0
> pflog0  33136 <Link>                               0     0   128982     0     0
> 
> On 2020-06-11 20:29, David Gwynne wrote:
>> Is it consistently Ierrs?
>> dlg
>>> On 11 Jun 2020, at 10:14 pm, Gabri Tofano  wrote:
>>> #netstat -id
>>> Name    Mtu   Network     Address              Ipkts Idrop    Opkts Odrop Colls
>>> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
>>> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
>>> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
>>> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
>>> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
>>> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
>>> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
>>> enc0*   0     <Link>                               0     0        0     0     0
>>> pflog0  33136 <Link>                               0     0    29771     0     0
>>> #netstat -ie
>>> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
>>> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
>>> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
>>> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
>>> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
>>> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
>>> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
>>> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
>>> enc0*   0     <Link>                               0     0        0     0     0
>>> pflog0  33136 <Link>                               0     0    30174     0     0
>>> #systat queues
>>> QUEUE         BW/FL SCH      PKTS    BYTES   DROP_P   DROP_B QLEN BORROW SUSPEN P/S B/S
>>> main on em0    120M fifo        0        0        0        0    0
>>> defq           100M fifo   139394 21574411        0        0    0
>>> voip            10M fifo    34699  4949635        0        0    0
>>> games           10M fifo    32277  2460807        0        0    0
>>> Thank you!
>>> Gabri

Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-11 Thread David Gwynne

Is it consistently Ierrs?

dlg

> On 11 Jun 2020, at 10:14 pm, Gabri Tofano  wrote:
> 
> #netstat -id
> Name    Mtu   Network     Address              Ipkts Idrop    Opkts Odrop Colls
> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
> enc0*   0     <Link>                               0     0        0     0     0
> pflog0  33136 <Link>                               0     0    29771     0     0
> 
> #netstat -ie
> Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
> em0     1500  <Link>      XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
> em1     1500  <Link>      XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
> em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
> enc0*   0     <Link>                               0     0        0     0     0
> pflog0  33136 <Link>                               0     0    30174     0     0
> 
> 
> #systat queues
> QUEUE         BW/FL SCH      PKTS    BYTES   DROP_P   DROP_B QLEN BORROW SUSPEN P/S B/S
> main on em0    120M fifo        0        0        0        0    0
> defq           100M fifo   139394 21574411        0        0    0
> voip            10M fifo    34699  4949635        0        0    0
> games           10M fifo    32277  2460807        0        0    0
> 
> Thank you!
> Gabri

Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-11 Thread David Gwynne

The Ifail and Ofail columns are a sum of queue drops and errors. Could you run 
that netstat command with -d and -e so we can see the drops and errors 
separately?

Cheers,
dlg

> On 11 Jun 2020, at 2:21 pm, Gabri Tofano  wrote:
> 
> After extensive testing the latency spikes showed up again:
> 
> To the inside interface of the firewall:
> 
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time=132ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> Reply from 172.16.200.1: bytes=32 time<1ms TTL=254
> 
> And to the firewall's next hop (ISP ONT) at the same time:
> 
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=3ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=3ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=242ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=2ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=1ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=3ms TTL=62
> Reply from 74.215.235.1: bytes=32 time=3ms TTL=62
> 
> Interface errors are now showing up just on the output:
> 
> #netstat -i
> Name    Mtu   Network     Address              Ipkts Ifail    Opkts Ofail Colls
> em0     1500  <Link>      XX:XX:XX:XX:XX:XX    22655     0    41589     0     0
> em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX    22655     0    41589     0     0
> em1     1500  <Link>      XX:XX:XX:XX:XX:XX    39924     0    20476     1     0
> em1     1500  172.16.200. XX:XX:XX:XX:XX:XX    39924     0    20476     1     0
> em2     1500  <Link>      XX:XX:XX:XX:XX:XX      427     0      330     2     0
> em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX      427     0      330     2     0
> em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
> enc0*   0     <Link>                               0     0        0     0     0
> pflog0  33136 <Link>                               0     0     1294     0     0
> 
> UDP real-time traffic is the most affected, as it is very sensitive, and I keep
> having spikes while playing online.
> 
> Thank you!
> Gabri
> 
> On 2020-06-10 22:50, Gabri Tofano wrote:
>> Another user pointed out to me that the OpenBSD 6.7 release notes
>> contain a statement regarding the em(4) driver: "Improvements in
>> the em(4) driver." So I gave it a try and reinstalled with
>> OpenBSD 6.6. It looks like the system is now stable, and latency
>> spikes/interface errors are not present at all, even under heavy
>> traffic loads. I am not sure what introduced the issue, but maybe one
>> of the devs can take a look?
>> Thank you!
>> Gabri
>> On 2020-06-09 13:01, Gabri Tofano wrote:
>>> Hi all,
>>> I'm using a "Protectli FW1" with FreeBSD 12.1 amd64 as a firewall,
>>> which is serving me with great performance and no issues at all. The
>>> appliance has 4 Intel Gigabit 82583V Ethernet NIC ports which are
>>> working very well. I used pfSense as well prior to FreeBSD and it
>>> worked without issues too.
>>> I took the decision to move to OpenBSD 6.7 amd64 in order to benefit
>>> from the latest pf (and other) features, but unfortunately the OS is
>>> giving me an issue which I guess is related to the NIC drivers. When I
>>> was connected via ssh I felt some glitches while I was
>>> typing/moving around in the editor, so I started to ping the inside
>>> interface from a wired PC and found out that from time to time
>>> the appliance responds with a 100+/200+ ms delay (I have cut
>>> some 1ms replies to make it shorter):
>>> Reply from 172.16.200.1: bytes=32 time=1ms TTL=254
>>> Reply from 172.16.200.1: bytes=32 time=1ms TTL=254

Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)

2020-04-11 Thread David Gwynne

On Fri, Jul 05, 2019 at 03:51:31AM +0000, Adam Steen wrote:
> >Synopsis:Packet loss / ENOBUFs with kqueue(2) and tap(4)
> >Category:bug
> >Environment:
>   System  : OpenBSD 6.5
>   Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 
> 19:39:46 AWST 2019
>
> ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   In Solo5 we have been working towards supporting multiple network
>   interfaces, implemented this using kqueue(2) and tap(4).
> 
>   This involves setting up two Tap interfaces, starting up the program.
>   In another session flood pinging the first Tap interface,
>   Solo5 handles this with no packets dropped.
>   In another session ping the second Tap interface, then for every
>   ping to the second interface a packet is dropped on the first. If you
>   switch to a flood ping on the second tap interface, you will observe
>   massive packet loss on both interfaces, and ping complaining about
>   No buffer space available (ENOBUFS).
> 
>   see https://github.com/Solo5/solo5/issues/374 for more information.
>   
> >How-To-Repeat:
>   I have been able to reproduce this in a hacked-up example program,
>   available here https://github.com/adamsteen/test_net_2if. Please note
>   this is a hacked, generally butchered program, which demonstrates the
>   problem. (If required I can try to clean up this test case.)
> 
>   01. git clone https://github.com/adamsteen/test_net_2if
>   02. cd test_net_2if
>   03. make
>   04. doas setup.sh (Setup up the Tap interfaces)
>   05. doas ./test_net_2if
>   06. in another session start a flood ping
>   doas ping -f 10.0.0.2
>   07. Observe that the flood ping is functioning correctly,
>   with no packets dropped.
>   08. In another session, start a normal ping
>   ping 10.1.0.2
>   09. Observe that, for each ping sent to service1, a packet is dropped.
>   10. Kill the normal ping
>   11. start a flood ping
>   doas ping -f 10.1.0.2
>   12. Observe massive packet loss on both interfaces, and ping
>   complaining about No buffer space available (ENOBUFS).
> >Fix:
>   Not Known.

Hi Adam,

claudio@ and I looked at this during a2k20, and came to the conclusion
that the packet loss occurred because an interface queue filled up
and it was shedding load. It was annoyingly easy to get to that point
though.

We also spent a lot of time massaging the tun/tap code to try and unify
the semantics of tun and tap going through the network stack, and in
particular tried to avoid queuing packets until we finally get to the
output side of the stack.

I'm not saying we've fixed this problem for you, but hopefully we've
mitigated it a bit. Could you try again and let us know if you see any
difference? If there's no difference, could you tweak your test to loop
on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
whatever the errno is that means there's no packet to read right now?

Cheers,
dlg

Re: Crash while using ospfd over vxlan

2020-04-11 Thread David Gwynne

On Fri, Apr 10, 2020 at 09:51:40AM +0200, Martin Pieuchot wrote:
> On 09/04/20(Thu) 16:10, Massimiliano Stucchi wrote:
> > >Synopsis:  Crash while using ospfd over vxlan
> > >Category:  bug
> > >Environment:
> > System  : OpenBSD 6.6
> > Details : OpenBSD 6.6 (GENERIC.MP) #5: Sun Feb 16 01:56:11 MST 2020
> > 
> > r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Setting up an OSPF session over VXLAN leads to a kernel crash
> > >How-To-Repeat:
> > 
> > I have setup an ospf session over a vxlan interface.  When this is up,
> > it takes about 2-3 minutes for the crash to consistently happen.
> > 
> > No other action is necessary.
> > 
> > At this address:
> > 
> > https://max.stucchi.ch/bugreport/
> > 
> > you can find screenshots from the ddb prompt, including a full trace.
> > 
> > If needed, I can also provide access to the console.
> 
> It's a recursion.  I don't know anything about vxlan(4) or how the
> encapsulation works but the following happens at least 10 times:
> 
>   ...
>   vxlan_lookup()
>   udp_input()
>   ip_deliver()
>   ip_ours()
>   ip_input_if()
>   ipv4_input()
>   ether_input()
>   if_vinput()
>   vxlan_lookup()
>   ...
> 
> Maybe you can share your setup (vxlan config, ospf config, etc) so
> somebody can try to reproduce and fix it.

Possible recursion through encap drivers is a problem they all share.
The vxlan driver should probably have the following text copied from one
of the other manpages into it:

  For correct operation, encapsulated traffic must not be routed over the
  interface itself.  This can be implemented by adding a distinct or a more
  specific route to the tunnel destination than the hosts or networks
  routed via the tunnel interface.  Alternatively, the tunnel traffic may
  be configured in a separate routing table to the encapsulated traffic.

Misconfiguration shouldn't result in a panic or fault though, so can you
try the following diff? It copies the mechanism used to prevent
recursion into vxlan. There's some more drivers that don't do this which
I'll try and fix up in the next few days.
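As a rough userland model of that mechanism (not the kernel code): every packet remembers the interface indexes it has already been output on, and a tunnel refuses a packet that carries its own index. struct pkt, MAX_TAGS and tunnel_output() are invented for this sketch; the real diff uses mbuf tags.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_TAGS 8	/* arbitrary limit for this sketch */

/* Toy stand-in for an mbuf: it only records which interfaces it crossed. */
struct pkt {
	unsigned int	tags[MAX_TAGS];
	size_t		ntags;
};

/*
 * Output path of a tunnel interface. Returns EIO if the packet already
 * carries this interface's index (a loop), ENOMEM if no tag can be
 * added, and 0 once the packet has been tagged.
 */
static int
tunnel_output(struct pkt *p, unsigned int ifindex)
{
	size_t i;

	for (i = 0; i < p->ntags; i++) {
		if (p->tags[i] == ifindex)
			return (EIO);	/* seen before: drop, don't recurse */
	}
	if (p->ntags == MAX_TAGS)
		return (ENOMEM);
	p->tags[p->ntags++] = ifindex;
	return (0);
}
```

The first pass through an interface tags the packet; a second pass through the same interface fails with EIO instead of recursing until the stack runs out.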

Cheers,
dlg

Index: if_vxlan.c
===================================================================
RCS file: /cvs/src/sys/net/if_vxlan.c,v
retrieving revision 1.76
diff -u -p -r1.76 if_vxlan.c
--- if_vxlan.c	8 Nov 2019 07:16:29 -0000	1.76
+++ if_vxlan.c	12 Apr 2020 01:36:26 -0000
@@ -82,6 +82,8 @@ struct vxlan_softc {
 
 void	 vxlanattach(int);
 int	 vxlanioctl(struct ifnet *, u_long, caddr_t);
+int	 vxlanoutput(struct ifnet *, struct mbuf *, struct sockaddr *,
+	    struct rtentry *);
 void	 vxlanstart(struct ifnet *);
 int	 vxlan_clone_create(struct if_clone *, int);
 int	 vxlan_clone_destroy(struct ifnet *);
@@ -150,6 +152,7 @@ vxlan_clone_create(struct if_clone *ifc,
 
 	ifp->if_softc = sc;
 	ifp->if_ioctl = vxlanioctl;
+	ifp->if_output = vxlanoutput;
 	ifp->if_start = vxlanstart;
 	IFQ_SET_MAXLEN(&ifp->if_snd, IFQ_MAXLEN);
 
@@ -294,6 +297,33 @@ vxlan_multicast_join(struct ifnet *ifp, 
 	if_detachhook_add(mifp, &sc->sc_dtask);
 
 	return (0);
+}
+
+int
+vxlanoutput(struct ifnet *ifp, struct mbuf *m, struct sockaddr *dst,
+    struct rtentry *rt)
+{
+	struct m_tag *mtag;
+
+	/* Try to limit infinite recursion through misconfiguration. */
+	for (mtag = m_tag_find(m, PACKET_TAG_GRE, NULL); mtag;
+	    mtag = m_tag_find(m, PACKET_TAG_GRE, mtag)) {
+		if (memcmp((caddr_t)(mtag + 1), &ifp->if_index,
+		    sizeof(ifp->if_index)) == 0) {
+			m_freem(m);
+			return (EIO);
+		}
+	}
+
+	mtag = m_tag_get(PACKET_TAG_GRE, sizeof(ifp->if_index), M_NOWAIT);
+	if (mtag == NULL) {
+		m_freem(m);
+		return (ENOMEM);
+	}
+	memcpy((caddr_t)(mtag + 1), &ifp->if_index, sizeof(ifp->if_index));
+	m_tag_prepend(m, mtag);
+
+	return (ether_output(ifp, m, dst, rt));
 }
 
 void

Re: splassert w/ add/del vlan on bridge

2020-04-11 Thread David Gwynne

On Sat, Apr 11, 2020 at 01:43:20PM +0000, Visa Hankala wrote:
> On Sat, Apr 11, 2020 at 11:09:54PM +1000, David Gwynne wrote:
> > On Sat, Apr 11, 2020 at 03:21:49AM +0000, Visa Hankala wrote:
> > > On Fri, Apr 10, 2020 at 01:30:47PM -0600, Theo de Raadt wrote:
> > > > Why did it take almost a year to find this?
> > > > 
> > > > Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22?
> > > 
> > > This is not related to ioctl(2) becoming UNLOCKED. Lower-layer ioctl
> > > code, soo_ioctl() included, lock the kernel when needed. However, most
> > > .if_ioctl backends need NET_LOCK() in addition to KERNEL_LOCK(). In
> > > most cases, that is satisfied by ifioctl() which acquires the lock
> > > before invoking .if_ioctl(). bridge_ioctl() nullifies this by
> > > releasing NET_LOCK().
> > 
> > yes.
> > 
> > i came up with the following diff before i read the thread here. it's
> > largely identical to what you (visa) already came up with, but it adds
> > some extra checks to ifpromisc based on the doco around the struct ifnet
> > members in src/sys/net/if_var.h. i audited the rest of the ifpromisc
> > calls and found another one in if_aggr that i was able to trigger.
> > 
> > i think the only other call to ifpromisc outside src/sys/net is in carp,
> > and i managed to convince myself that all those calls hold NET_LOCK
> > already.
> > 
> > Index: if.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if.c,v
> > retrieving revision 1.601
> > diff -u -p -r1.601 if.c
> > --- if.c	10 Mar 2020 09:11:55 -0000	1.601
> > +++ if.c	11 Apr 2020 13:08:46 -0000
> > @@ -3031,7 +3031,9 @@ ifpromisc(struct ifnet *ifp, int pswitch
> > unsigned short oif_flags;
> > int oif_pcount, error;
> >  
> > +   NET_ASSERT_LOCKED(); /* modifying if_flags */
> > oif_flags = ifp->if_flags;
> > +   KERNEL_ASSERT_LOCKED(); /* modifying if_pcount */
> > oif_pcount = ifp->if_pcount;
> > if (pswitch) {
> > if (ifp->if_pcount++ != 0)
> > Index: if_aggr.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_aggr.c,v
> > retrieving revision 1.28
> > diff -u -p -r1.28 if_aggr.c
> > --- if_aggr.c	11 Mar 2020 07:01:42 -0000	1.28
> > +++ if_aggr.c	11 Apr 2020 13:08:46 -0000
> > @@ -589,8 +589,10 @@ aggr_clone_destroy(struct ifnet *ifp)
> > if_detach(ifp);
> >  
> > /* last ref, no need to lock. aggr_p_dtor locks anyway */
> > +   NET_LOCK();
> > while ((p = TAILQ_FIRST(&sc->sc_ports)) != NULL)
> > aggr_p_dtor(sc, p, "destroy");
> > +   NET_UNLOCK();
> >  
> > free(sc, M_DEVBUF, sizeof(*sc));
> >  
> > Index: if_bridge.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_bridge.c,v
> > retrieving revision 1.338
> > diff -u -p -r1.338 if_bridge.c
> > --- if_bridge.c	6 Nov 2019 03:51:26 -0000	1.338
> > +++ if_bridge.c	11 Apr 2020 13:08:46 -0000
> > @@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c
> > break;
> > }
> >  
> > +   NET_LOCK();
> > error = ifpromisc(ifs, 1);
> > +   NET_UNLOCK();
> > if (error != 0) {
> > free(bif, M_DEVBUF, sizeof(*bif));
> > break;
> > @@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi
> > }
> >  
> > bif->ifp->if_bridgeidx = 0;
> > +   NET_LOCK();
> > error = ifpromisc(bif->ifp, 0);
> > +   NET_UNLOCK();
> >  
> > bridge_rtdelete(sc, bif->ifp, 0);
> > bridge_flushrule(bif);
> > Index: if_tpmr.c
> > ===================================================================
> > RCS file: /cvs/src/sys/net/if_tpmr.c,v
> > retrieving revision 1.9
> > diff -u -p -r1.9 if_tpmr.c
> > --- if_tpmr.c	11 Apr 2020 11:01:03 -0000	1.9
> > +++ if_tpmr.c	11 Apr 2020 13:08:46 -0000
> > @@ -201,12 +201,14 @@ tpmr_clone_destroy(struct ifnet *ifp)
> >  
> > if_detach(ifp);
> >  
> > +   NET_LOCK();
> > for (i = 0; i < nitems(sc->sc_ports); i++) {
> > struct tpmr_port *p = SMR_PTR_GET_LOCKED(&sc->sc_ports[i]);
> > if (p == NULL)
> > continue;
> > tpmr_p_dtor(sc, p, &q

Re: splassert w/ add/del vlan on bridge

2020-04-11 Thread David Gwynne

On Sat, Apr 11, 2020 at 03:35:45PM +0200, Martin Pieuchot wrote:
> On 11/04/20(Sat) 23:09, David Gwynne wrote:
> > On Sat, Apr 11, 2020 at 03:21:49AM +0000, Visa Hankala wrote:
> > > On Fri, Apr 10, 2020 at 01:30:47PM -0600, Theo de Raadt wrote:
> > > > Why did it take almost a year to find this?
> > > > 
> > > > Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22?
> > > 
> > > This is not related to ioctl(2) becoming UNLOCKED. Lower-layer ioctl
> > > code, soo_ioctl() included, lock the kernel when needed. However, most
> > > .if_ioctl backends need NET_LOCK() in addition to KERNEL_LOCK(). In
> > > most cases, that is satisfied by ifioctl() which acquires the lock
> > > before invoking .if_ioctl(). bridge_ioctl() nullifies this by
> > > releasing NET_LOCK().
> > 
> > yes.
> > 
> > i came up with the following diff before i read the thread here. it's
> > largely identical to what you (visa) already came up with, but it adds
> > some extra checks to ifpromisc based on the doco around the struct ifnet
> > members in src/sys/net/if_var.h. i audited the rest of the ifpromisc
> > calls and found another one in if_aggr that i was able to trigger.
> 
> The documentation says `if_pcount' is protected by the KERNEL_LOCK() but
> in fact it is only read & modified in ifpromisc().
> 
> So I'd suggest fixing the documentation and not add another assert there.

Can do.

> > i think the only other call to ifpromisc outside src/sys/net is in carp,
> > and i managed to convince myself that all those calls hold NET_LOCK
> > already.
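The rule being settled here (callers take the net lock, ifpromisc asserts it) can be illustrated with a small userland sketch. Everything in it is a stand-in: net_lock_enter(), net_lock_exit(), struct iface and iface_promisc() are made-up names, and the "held" flag is a crude substitute for a real owner check.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's net lock. */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;
static int net_lock_held;	/* crude flag, only written under net_lock */

static void
net_lock_enter(void)
{
	pthread_mutex_lock(&net_lock);
	net_lock_held = 1;
}

static void
net_lock_exit(void)
{
	net_lock_held = 0;
	pthread_mutex_unlock(&net_lock);
}

#define F_PROMISC	0x1

struct iface {
	int	flags;
	int	pcount;		/* number of promiscuous listeners */
};

/* Caller must hold the net lock: flags and pcount change together. */
static void
iface_promisc(struct iface *ifp, int on)
{
	assert(net_lock_held);	/* catches callers that forgot the lock */

	if (on) {
		if (ifp->pcount++ == 0)
			ifp->flags |= F_PROMISC;
	} else {
		if (--ifp->pcount == 0)
			ifp->flags &= ~F_PROMISC;
	}
}
```

A caller that forgets net_lock_enter() trips the assertion straight away, which is the point of putting the check in the callee rather than relying on every call site being audited.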

Index: if_var.h
===================================================================
RCS file: /cvs/src/sys/net/if_var.h,v
retrieving revision 1.103
diff -u -p -r1.103 if_var.h
--- if_var.h	8 Nov 2019 07:16:29 -0000	1.103
+++ if_var.h	11 Apr 2020 13:38:10 -0000
@@ -130,7 +130,7 @@ struct ifnet {				/* and the entries */
 					/* [I] check or clean routes (+ or -)'d */
 	void	(*if_rtrequest)(struct ifnet *, int, struct rtentry *);
 	char	if_xname[IFNAMSIZ];	/* [I] external name (name + unit) */
-	int	if_pcount;		/* [k] # of promiscuous listeners */
+	int	if_pcount;		/* [N] # of promiscuous listeners */
 	unsigned int if_bridgeidx;	/* [k] used by bridge ports */
 	caddr_t	if_bpf;			/* packet filter structure */
 	caddr_t if_switchport;		/* used by switch ports */
Index: if.c
===================================================================
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.601
diff -u -p -r1.601 if.c
--- if.c	10 Mar 2020 09:11:55 -0000	1.601
+++ if.c	11 Apr 2020 13:38:10 -0000
@@ -3031,6 +3031,8 @@ ifpromisc(struct ifnet *ifp, int pswitch
 	unsigned short oif_flags;
 	int oif_pcount, error;
 
+	NET_ASSERT_LOCKED(); /* modifying if_flags and if_pcount */
+
 	oif_flags = ifp->if_flags;
 	oif_pcount = ifp->if_pcount;
 	if (pswitch) {
Index: if_aggr.c
===================================================================
RCS file: /cvs/src/sys/net/if_aggr.c,v
retrieving revision 1.28
diff -u -p -r1.28 if_aggr.c
--- if_aggr.c	11 Mar 2020 07:01:42 -0000	1.28
+++ if_aggr.c	11 Apr 2020 13:38:10 -0000
@@ -589,8 +589,10 @@ aggr_clone_destroy(struct ifnet *ifp)
 	if_detach(ifp);
 
 	/* last ref, no need to lock. aggr_p_dtor locks anyway */
+	NET_LOCK();
 	while ((p = TAILQ_FIRST(&sc->sc_ports)) != NULL)
 		aggr_p_dtor(sc, p, "destroy");
+	NET_UNLOCK();
 
 	free(sc, M_DEVBUF, sizeof(*sc));
 
Index: if_bridge.c
===================================================================
RCS file: /cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.338
diff -u -p -r1.338 if_bridge.c
--- if_bridge.c	6 Nov 2019 03:51:26 -0000	1.338
+++ if_bridge.c	11 Apr 2020 13:38:10 -0000
@@ -313,7 +313,9 @@ bridge_ioctl(struct ifnet *ifp, u_long c
 			break;
 		}
 
+		NET_LOCK();
 		error = ifpromisc(ifs, 1);
+		NET_UNLOCK();
 		if (error != 0) {
 			free(bif, M_DEVBUF, sizeof(*bif));
 			break;
@@ -558,7 +560,9 @@ bridge_ifremove(struct bridge_iflist *bi
 	}
 
 	bif->ifp->if_bridgeidx = 0;
+	NET_LOCK();
 	error = ifpromisc(bif->ifp, 0);
+	NET_UNLOCK();
 
 	bridge_rtdelete(sc, bif->ifp, 0);
 	bridge_flushrule(bif);
Index: if_tpmr.c
===================================================================
RCS file: /cvs/src/sys/net/if_tpmr.c,v
retrieving revision 1.9
diff -u -p -r1.9 if_tpmr.c
--- if_tpmr.c   1

Re: netstart: PROMISC,ALLMULTI not set on parent interface of vlan that joins a bridge until run again

2019-11-03 Thread David Gwynne

Hi Jon,

This should be fixed in current as of r1.199 of src/sys/net/if_vlan.c

Sorry for the inconvenience.

Cheers,
dlg

> On 11 Oct 2018, at 06:45, Jon Williams  wrote:
> 
>> Synopsis: Running netstart 1x does not set PROMISC,ALLMULTI on parent 
>> interface of vlan members of bridges
>> Category: system
>> Environment:
>  System  : OpenBSD 6.3
>  Details : OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 2018
>   
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>  Architecture: OpenBSD.amd64
>  Machine : amd64
>> Description:
> 
> em0 is a parent of vlan1 and vlan2. vlan1 is a member of bridge0. vlan2 is a 
> member of bridge2.
> After system boot, em0 does not have PROMISC,ALLMULTI set in ifconfig - i.e. 
> it cannot forward
> traffic correctly, until netstart is manually re-run by root user.
> 
>> How-To-Repeat:
> NB: have been unable to replicate inside vmd using a vio interface for vlan 
> parent. All vio
> interfaces appear to be always PROMISC,ALLMULTI.
> 
> (Even a temporary workaround would be helpful as this machine is a 
> gateway/bgp host.)
> 
> /etc/hostname.em0
> description "nycmesh-lbe-1659"
> group trunk
> up
> 
> /etc/hostname.vlan1
> description "nycmesh-lbe-1659 mgmt VLAN"
> parent em0 vnetid 2
> group lan
> group trusted
> group bridged
> up
> 
> cat /etc/hostname.vlan2
> description "nycmesh-lbe-1659 WAN VLAN"
> group nycmesh
> group bridged
> parent em0 vnetid 3
> up
> 
> /etc/hostname.bridge0
> description "Bridged LAN"
> group lan
> group trusted
> group bridge
> add vether0
> add em1
> add em2
> add em3
> add em4
> add em5
> add vlan1
> # Try to stop the Airport express from sending weird arps
> rule block on em1 src 28:37:37:3f:5:4c arp spa 10.70.145.50
> up
> 
> cat /etc/hostname.bridge2
> description "Bridged WAN"
> group wan
> group bridge
> add vether2
> add vlan2
> up
> 
> SENDBUG: Run sendbug as root if this is an ACPI report!
> SENDBUG: dmesg and usbdevs are attached.
> SENDBUG: Feel free to delete or use the -D flag if they contain sensitive 
> information.
> 
> dmesg:
> OpenBSD 6.3 (GENERIC.MP) #11: Thu Sep 20 16:05:37 CEST 2018
>
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8520617984 (8125MB)
> avail mem = 8255311872 (7872MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x8e622000 (85 entries)
> bios0: vendor American Megatrends Inc. version "5.12" date 07/01/2018
> bios0: Default string Default string
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S5
> acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET SSDT SSDT UEFI 
> SSDT LPIT SSDT SSDT SSDT SSDT DBGP DBG2 SSDT DMAR ASF! WSMT
> acpi0: wakeup devices PXSX(S3) RP09(S3) PXSX(S3) RP10(S3) PXSX(S3) RP11(S3) 
> PXSX(S3) RP12(S3) PXSX(S3) RP13(S3) PXSX(S3) RP01(S3) PXSX(S3) RP02(S3) 
> PXSX(S3) RP03(S3) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Celeron(R) CPU 3865U @ 1.80GHz, 1696.65 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,PAGE1GB,
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Celeron(R) CPU 3865U @ 1.80GHz, 1696.06 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,PAGE1GB,
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 120 pins
> acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
> acpihpet0 at acpi0: 23999999 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus -1 (PEG0)
> acpiprt2 at acpi0: bus -1 (PEG1)
> acpiprt3 at acpi0: bus -1 (PEG2)
> acpiprt4 at acpi0: bus -1 (RP09)
> acpiprt5 at acpi0: bus -1 (RP10)
> acpiprt6 at acpi0: bus -1 (RP11)
> acpiprt7 at acpi0: bus -1 (RP12)
> acpiprt8 at acpi0: bus -1 (RP13)
> acpiprt9 at acpi0: bus 1 (RP01)
> acpiprt10 at acpi0: bus 2 (RP02)
> acpiprt11 at acpi0: bus 3 (RP03)
> acpiprt12 at acpi0: bus 4 (RP04)
> acpiprt13 at acpi0: bus 5 (RP05)
> acpiprt14 at acpi0: bus 6 (RP06)
> acpiprt15 at acpi0: bus -1 (RP07)
> acpiprt16 at acpi0: bus -1 (RP08)
> acpiprt17 at

Re: Bad TTL when applying multiple MPLS labels

2019-08-27 Thread David Gwynne

The fix should be in now. Thanks for the report, and the diff pointing
right at the problem.

On Tue., 27 Aug. 2019, 01:45 Gerrie Roos,  wrote:

> I just tested your patch and it works! I can see the correct TTL.
>
> -----Original Message-----
> From: "David Gwynne" 
> Sent: Monday, 26 August, 2019 4:21am
> To: "Gerrie Roos" 
> Cc: bugs@openbsd.org
> Subject: Re: Bad TTL when applying multiple MPLS labels
>
> On Fri, Aug 23, 2019 at 10:05:25AM -0500, Gerrie Roos wrote:
> > Hi David,
> >
> > Apologies, I haven't used CVS in a while - here you go. Hope it's useful.
> >
> > Gerrie
>
> npz, i can read the attachment fine.
>
> if someone's carrying a non-ip payload over mpls with a nibble that
> makes it look like ip, your diff will cause (very) short payloads
> to be dropped when m_pullup can't find bytes to copy into the first
> mbuf.
>
> could you try the following? it uses m_getptr to reach into the right
> mbuf and byte offset in that mbuf for the ttl. if the packet is
> short, it just returns the default ttl.
>
> i've attached the diff for your convenience too.
>
> cheers,
> dlg
>
> Index: mpls_output.c
> ===================================================================
> RCS file: /cvs/src/sys/netmpls/mpls_output.c,v
> retrieving revision 1.26
> diff -u -p -r1.26 mpls_output.c
> --- mpls_output.c	2 Dec 2015 08:47:00 -0000	1.26
> +++ mpls_output.c	26 Aug 2019 09:11:08 -0000
> @@ -162,41 +162,39 @@ mpls_do_cksum(struct mbuf *m)
>  u_int8_t
>  mpls_getttl(struct mbuf *m, sa_family_t af)
>  {
> -   struct shim_hdr *shim;
> -   struct ip *ip;
> -#ifdef INET6
> -   struct ip6_hdr *ip6hdr;
> -#endif
> +   struct mbuf *n;
> +   int loc, off;
> u_int8_t ttl = mpls_defttl;
>
> /* If the AF is MPLS then inherit the TTL from the present label. */
> -   if (af == AF_MPLS) {
> -   shim = mtod(m, struct shim_hdr *);
> -   ttl = ntohl(shim->shim_label & MPLS_TTL_MASK);
> -   return (ttl);
> -   }
> -   /* Else extract TTL from the encapsualted packet. */
> -   switch (*mtod(m, u_char *) >> 4) {
> -   case IPVERSION:
> -   if (!mpls_mapttl_ip)
> +   if (af == AF_MPLS)
> +   loc = 3;
> +   else {
> +   switch (*mtod(m, uint8_t *) >> 4) {
> +   case 4:
> +   if (!mpls_mapttl_ip)
> +   return (ttl);
> +
> +   loc = offsetof(struct ip, ip_ttl);
> break;
> -   if (m->m_len < sizeof(*ip))
> -   break;  /* impossible */
> -   ip = mtod(m, struct ip *);
> -   ttl = ip->ip_ttl;
> -   break;
>  #ifdef INET6
> -   case IPV6_VERSION >> 4:
> -   if (!mpls_mapttl_ip6)
> +   case 6:
> +   if (!mpls_mapttl_ip6)
> +   break;
> +
> +   loc = offsetof(struct ip6_hdr, ip6_hlim);
> break;
> -   if (m->m_len < sizeof(struct ip6_hdr))
> -   break;  /* impossible */
> -   ip6hdr = mtod(m, struct ip6_hdr *);
> -   ttl = ip6hdr->ip6_hlim;
> -   break;
>  #endif
> -   default:
> -   break;
> +   default:
> +   return (ttl);
> +   }
> }
> +
> +   n = m_getptr(m, loc, &off);
> +   if (n == NULL)
> +   return (ttl);
> +
> +   ttl = *(mtod(n, uint8_t *) + off);
> +
> return (ttl);
>  }
>
>
>

Re: Bad TTL when applying multiple MPLS labels

2019-08-26 Thread David Gwynne

On Fri, Aug 23, 2019 at 10:05:25AM -0500, Gerrie Roos wrote:
> Hi David,
> 
> Apologies, I haven't used CVS in a while - here you go. Hope it's useful.
> 
> Gerrie

npz, i can read the attachment fine.

if someone's carrying a non-ip payload over mpls with a nibble that
makes it look like ip, your diff will cause (very) short payloads
to be dropped when m_pullup can't find bytes to copy into the first
mbuf.

could you try the following? it uses m_getptr to reach into the right
mbuf and byte offset in that mbuf for the ttl. if the packet is
short, it just returns the default ttl.
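To make the idea concrete, here is a simplified userland model of that lookup. It is a sketch and not the kernel's m_getptr: struct buf, buf_getptr() and byte_at_or() are invented names, and the chain is a plain linked list of byte arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an mbuf chain: each link holds part of the packet. */
struct buf {
	struct buf	*next;
	const uint8_t	*data;
	size_t		 len;
};

/*
 * Walk the chain to the buffer holding byte `loc` of the packet.
 * Returns that buffer and stores the offset within it in *off, or
 * returns NULL if the packet is too short to contain that byte.
 */
static struct buf *
buf_getptr(struct buf *b, size_t loc, size_t *off)
{
	for (; b != NULL; b = b->next) {
		if (loc < b->len) {
			*off = loc;
			return (b);
		}
		loc -= b->len;
	}
	return (NULL);
}

/* Fetch the byte at loc, or fall back to def for short packets. */
static uint8_t
byte_at_or(struct buf *b, size_t loc, uint8_t def)
{
	size_t off;
	struct buf *n = buf_getptr(b, loc, &off);

	return (n == NULL ? def : n->data[off]);
}
```

Nothing is copied or pulled up, so a short non-IP payload that happens to start with a 4 or 6 nibble is left alone and simply gets the default value.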

i've attached the diff for your convenience too.

cheers,
dlg

Index: mpls_output.c
===
RCS file: /cvs/src/sys/netmpls/mpls_output.c,v
retrieving revision 1.26
diff -u -p -r1.26 mpls_output.c
--- mpls_output.c   2 Dec 2015 08:47:00 -   1.26
+++ mpls_output.c   26 Aug 2019 09:11:08 -
@@ -162,41 +162,39 @@ mpls_do_cksum(struct mbuf *m)
 u_int8_t
 mpls_getttl(struct mbuf *m, sa_family_t af)
 {
-   struct shim_hdr *shim;
-   struct ip *ip;
-#ifdef INET6
-   struct ip6_hdr *ip6hdr;
-#endif
+   struct mbuf *n;
+   int loc, off;
u_int8_t ttl = mpls_defttl;
 
/* If the AF is MPLS then inherit the TTL from the present label. */
-   if (af == AF_MPLS) {
-   shim = mtod(m, struct shim_hdr *);
-   ttl = ntohl(shim->shim_label & MPLS_TTL_MASK);
-   return (ttl);
-   }
-   /* Else extract TTL from the encapsualted packet. */
-   switch (*mtod(m, u_char *) >> 4) {
-   case IPVERSION:
-   if (!mpls_mapttl_ip)
+   if (af == AF_MPLS)
+   loc = 3;
+   else {
+   switch (*mtod(m, uint8_t *) >> 4) {
+   case 4:
+   if (!mpls_mapttl_ip)
+   return (ttl);
+
+   loc = offsetof(struct ip, ip_ttl);
break;
-   if (m->m_len < sizeof(*ip))
-   break;  /* impossible */
-   ip = mtod(m, struct ip *);
-   ttl = ip->ip_ttl;
-   break;
 #ifdef INET6
-   case IPV6_VERSION >> 4:
-   if (!mpls_mapttl_ip6)
+   case 6:
+   if (!mpls_mapttl_ip6)
+   break;
+
+   loc = offsetof(struct ip6_hdr, ip6_hlim);
break;
-   if (m->m_len < sizeof(struct ip6_hdr))
-   break;  /* impossible */
-   ip6hdr = mtod(m, struct ip6_hdr *);
-   ttl = ip6hdr->ip6_hlim;
-   break;
 #endif
-   default:
-   break;
+   default:
+   return (ttl);
+   }
}
+
+   n = m_getptr(m, loc, &off);
+   if (n == NULL)
+   return (ttl);
+
+   ttl = *(mtod(n, uint8_t *) + off);
+
return (ttl);
 }
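
The lookup described above (walk the chain to the buffer holding a given byte, fall back to a default if the packet is too short) can be sketched in userland. This is a simplified model, not the kernel code: `struct buf`, `chain_getptr` and `chain_byte` are hypothetical names standing in for mbufs and m_getptr, and the end-of-chain handling is stricter than the real m_getptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: one buffer in a singly linked chain. */
struct buf {
	struct buf	*next;
	const uint8_t	*data;
	size_t		 len;
};

/*
 * Find the buffer holding byte number `loc` of the chain and store the
 * offset inside that buffer in *off.  Returns NULL if the chain holds
 * fewer than loc + 1 bytes.
 */
const struct buf *
chain_getptr(const struct buf *b, size_t loc, size_t *off)
{
	assert(off != NULL);

	for (; b != NULL; b = b->next) {
		if (loc < b->len) {
			*off = loc;
			return (b);
		}
		loc -= b->len;
	}

	return (NULL);
}

/* Read the byte at `loc`, or return `def` when the chain is too short. */
uint8_t
chain_byte(const struct buf *b, size_t loc, uint8_t def)
{
	const struct buf *n;
	size_t off;

	n = chain_getptr(b, loc, &off);
	if (n == NULL)
		return (def);

	return (n->data[off]);
}
```

With an IPv4 header split across two buffers, `chain_byte(chain, 8, 255)` reads the ttl byte from whichever buffer holds offset 8, and a chain shorter than 9 bytes yields the default instead of a dropped packet.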

Re: ifiq_input pressure drop too aggressive for msk interfaces

2019-07-29 Thread David Gwynne

On Mon, Jul 29, 2019 at 12:22:28AM +0200, Olivier Taïbi wrote:
> On one of my machines equipped with two msk(4) gigabit ethernet devices,
> the reintroduction (rev. 1.32) of counting backpressure in ifiq_input() in
> sys/net/ifq.c causes drops when more than 8 ethernet frames arrive in a short
> amount of time.  I noticed this because this machine is an NFS server, mounted
> with the (client-side) option "-w 32768", but the drops are reproducible
> (visible with netstat -I msk*) simply by sending to the machine a large (e.g.
> 17kB) UDP packet which gets fragmented into >8 IP packets.
> Increasing net.link.ifrxq.pressure_drop (e.g. from the default value 8 to 40)
> suppresses this problem, although I am not sure whether this is the correct
> solution.

i think i see the more fundamental problem. the diff below should fix it
so you don't need to tune the ifrxq pressure levels.

this basically tweaks msk so it enqueues packets for the stack once per
interrupt, rather than once per rx completion event. this helps when you
rx multiple packets per interrupt like you are when a packet gets
fragmented. while here it uses ifiq_input so it can apply earlier
backpressure.

i'll commit this soon, so hopefully it will be in the tree for you to
test.

Index: if_msk.c
===
RCS file: /cvs/src/sys/dev/pci/if_msk.c,v
retrieving revision 1.131
diff -u -p -r1.131 if_msk.c
--- if_msk.c6 Jan 2018 03:11:04 -   1.131
+++ if_msk.c30 Jul 2019 03:59:11 -
@@ -134,7 +134,7 @@ int mskcprint(void *, const char *);
 int msk_intr(void *);
 void msk_intr_yukon(struct sk_if_softc *);
 static inline int msk_rxvalid(struct sk_softc *, u_int32_t, u_int32_t);
-void msk_rxeof(struct sk_if_softc *, uint16_t, uint32_t);
+void msk_rxeof(struct sk_if_softc *, struct mbuf_list *, uint16_t, uint32_t);
 void msk_txeof(struct sk_if_softc *);
 static unsigned int msk_encap(struct sk_if_softc *, struct mbuf *, uint32_t);
 void msk_start(struct ifnet *);
@@ -1591,11 +1591,11 @@ msk_rxvalid(struct sk_softc *sc, u_int32
 }
 
 void
-msk_rxeof(struct sk_if_softc *sc_if, uint16_t len, uint32_t rxstat)
+msk_rxeof(struct sk_if_softc *sc_if, struct mbuf_list *ml,
+uint16_t len, uint32_t rxstat)
 {
struct sk_softc *sc = sc_if->sk_softc;
struct ifnet *ifp = &sc_if->arpcom.ac_if;
-   struct mbuf_listml = MBUF_LIST_INITIALIZER();
struct mbuf *m = NULL;
int prod, cons, tail;
bus_dmamap_tmap;
@@ -1640,8 +1640,7 @@ msk_rxeof(struct sk_if_softc *sc_if, uin
 
m->m_pkthdr.len = m->m_len = len;
 
-   ml_enqueue(&ml, m);
-   if_input(ifp, &ml);
+   ml_enqueue(ml, m);
 }
 
 void
@@ -1770,8 +1769,12 @@ msk_intr(void *xsc)
struct sk_if_softc  *sc_if;
struct sk_if_softc  *sc_if0 = sc->sk_if[SK_PORT_A];
struct sk_if_softc  *sc_if1 = sc->sk_if[SK_PORT_B];
+   struct mbuf_listml[2] = {
+   MBUF_LIST_INITIALIZER(),
+   MBUF_LIST_INITIALIZER(),
+   };
struct ifnet*ifp0 = NULL, *ifp1 = NULL;
-   int claimed = 0, rx[2] = {0, 0};
+   int claimed = 0;
u_int32_t   status;
struct msk_status_desc  *cur_st;
 
@@ -1809,8 +1812,8 @@ msk_intr(void *xsc)
switch (cur_st->sk_opcode) {
case SK_Y2_STOPC_RXSTAT:
sc_if = sc->sk_if[cur_st->sk_link & 0x01];
-   rx[cur_st->sk_link & 0x01] = 1;
-   msk_rxeof(sc_if, lemtoh16(&cur_st->sk_len),
+   msk_rxeof(sc_if, &ml[cur_st->sk_link & 0x01],
+   lemtoh16(&cur_st->sk_len),
lemtoh32(&cur_st->sk_status));
break;
case SK_Y2_STOPC_TXSTAT:
@@ -1837,12 +1840,16 @@ msk_intr(void *xsc)
 
CSR_WRITE_4(sc, SK_Y2_ICR, 2);
 
-   if (rx[0]) {
+   if (!ml_empty(&ml[0])) {
+   if (ifiq_input(&ifp0->if_rcv, &ml[0]))
+   if_rxr_livelocked(&sc_if0->sk_cdata.sk_rx_ring);
msk_fill_rx_ring(sc_if0);
SK_IF_WRITE_2(sc_if0, 0, SK_RXQ1_Y2_PREF_PUTIDX,
sc_if0->sk_cdata.sk_rx_prod);
}
-   if (rx[1]) {
+   if (!ml_empty(&ml[1])) {
+   if (ifiq_input(&ifp1->if_rcv, &ml[1]))
+   if_rxr_livelocked(&sc_if1->sk_cdata.sk_rx_ring);
msk_fill_rx_ring(sc_if1);
SK_IF_WRITE_2(sc_if1, 0, SK_RXQ1_Y2_PREF_PUTIDX,
sc_if1->sk_cdata.sk_rx_prod);
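
Why handing packets over once per interrupt helps can be shown with a toy model. It assumes that backpressure is counted once per input call and cleared when the stack drains the queue, which matches the "more than 8 frames" symptom in the report but is not a copy of ifq.c. All `toy_*` and `rx_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PRESSURE_DROP	8	/* default value quoted in the report */

/* Toy input queue: not the real struct ifiqueue. */
struct toy_ifiq {
	unsigned int	pressure;	/* input calls since the last drain */
	unsigned int	queued;		/* packets waiting for the stack */
	unsigned int	drops;		/* packets dropped under pressure */
};

/*
 * Hand `npkts` packets to the stack in one call.  Pressure goes up once
 * per call, so only the number of calls matters, not the batch size.
 * Returns nonzero if the batch was dropped.
 */
int
toy_input(struct toy_ifiq *q, unsigned int npkts)
{
	assert(q != NULL);

	if (++q->pressure > PRESSURE_DROP) {
		q->drops += npkts;
		return (1);
	}

	q->queued += npkts;
	return (0);
}

/* The stack ran: the queue is emptied and the pressure is relieved. */
void
toy_drain(struct toy_ifiq *q)
{
	assert(q != NULL);

	q->queued = 0;
	q->pressure = 0;
}

/* Old behaviour: one input call per rx completion event. */
void
rx_per_packet(struct toy_ifiq *q, unsigned int npkts)
{
	unsigned int i;

	for (i = 0; i < npkts; i++)
		toy_input(q, 1);
}

/* New behaviour: collect the packets, one input call per interrupt. */
void
rx_per_interrupt(struct toy_ifiq *q, unsigned int npkts)
{
	if (npkts > 0)
		toy_input(q, npkts);
}
```

In this model a burst of 12 fragments delivered by `rx_per_packet` before the stack gets to run queues 8 and drops 4, while `rx_per_interrupt` queues all 12 with a single bump of the pressure counter.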

Re: hostname.gre panic. 6.5 RELEASE

2019-07-28 Thread David Gwynne




> On 26 Jul 2019, at 5:22 am, Alexander Bluhm  wrote:
> 
> On Thu, Jul 25, 2019 at 12:40:22PM +, andr...@nullbyte.se wrote:
>> # Which results in the following error as can be seen in this screenshot:
>> http://gw.nullbyte.se/dump/openbsd/openbsd_65_gre_panic.PNG
>> 
>>> Fix:
>> # Make sure the 'tunnel' statement is before the inet/inet6 commands
> 
> The inet6 duplicate address detection packet is sent before the
> tunnel is set up.  We should reject packets during that time window.
> 
> While there, count errors and use generic unhandled_af().
> 
> ok?

ok

> 
> bluhm
> 
> Index: net/if_gre.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_gre.c,v
> retrieving revision 1.151
> diff -u -p -r1.151 if_gre.c
> --- net/if_gre.c  17 Jul 2019 16:46:17 -  1.151
> +++ net/if_gre.c  25 Jul 2019 19:03:45 -
> @@ -1930,8 +1930,10 @@ mgre_output(struct ifnet *ifp, struct mb
>   }
> 
>   m = gre_l3_encap_dst(&sc->sc_tunnel, addr, m, dest->sa_family);
> - if (m == NULL)
> + if (m == NULL) {
> + ifp->if_oerrors++;
>   return (ENOBUFS);
> + }
> 
>   m->m_pkthdr.ph_family = dest->sa_family;
> 
> @@ -2142,6 +2144,10 @@ gre_encap_dst_ip(const struct gre_tunnel
> struct mbuf *m, uint8_t ttl, uint8_t tos)
> {
>   switch (tunnel->t_af) {
> + case AF_UNSPEC:
> + /* packets may arrive before tunnel is set up */
> + m_freem(m);
> + return (NULL);
>   case AF_INET: {
>   struct ip *ip;
> 
> @@ -2188,8 +2194,7 @@ gre_encap_dst_ip(const struct gre_tunnel
>   }
> #endif /* INET6 */
>   default:
> - panic("%s: unsupported af %d in %p", __func__, tunnel->t_af,
> - tunnel);
> + unhandled_af(tunnel->t_af);
>   }
> 
>   return (m);
> @@ -2215,8 +2220,7 @@ gre_ip_output(const struct gre_tunnel *t
>   break;
> #endif
>   default:
> - panic("%s: unsupported af %d in %p", __func__, tunnel->t_af,
> - tunnel);
> + unhandled_af(tunnel->t_af);
>   }
> 
>   return (0);
> @@ -4286,7 +4290,7 @@ gre_ip_cmp(int af, const union gre_addr
>   case AF_INET:
>   return (memcmp(&a->in4, &b->in4, sizeof(a->in4)));
>   default:
> - panic("%s: unsupported af %d\n", __func__, af);
> + unhandled_af(af);
>   }
> 
>   return (0);
>
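
The idea in the quoted diff (treat a tunnel with no address family yet as a normal output error instead of a panic) can be sketched as follows. The `toy_*` names, the header lengths and the -1 return value are illustrative assumptions. The real code frees the mbuf and returns ENOBUFS from mgre_output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address families; TOY_AF_UNSPEC means "tunnel not set up". */
enum toy_af { TOY_AF_UNSPEC = 0, TOY_AF_INET, TOY_AF_INET6 };

struct toy_tunnel {
	enum toy_af	t_af;
};

struct toy_if {
	struct toy_tunnel	tunnel;
	unsigned int		oerrors;	/* rejected packets */
	unsigned int		opackets;	/* encapsulated packets */
};

/*
 * Length of the outer header for this tunnel, or -1 while the tunnel
 * endpoints have not been configured.
 */
int
toy_encap_hdrlen(const struct toy_tunnel *t)
{
	assert(t != NULL);

	switch (t->t_af) {
	case TOY_AF_UNSPEC:
		/* packets may arrive before the tunnel is set up */
		return (-1);
	case TOY_AF_INET:
		return (20);
	case TOY_AF_INET6:
		return (40);
	}

	return (-1);
}

/* Count an output error and reject instead of crashing. */
int
toy_output(struct toy_if *ifp)
{
	assert(ifp != NULL);

	if (toy_encap_hdrlen(&ifp->tunnel) < 0) {
		ifp->oerrors++;
		return (-1);
	}

	ifp->opackets++;
	return (0);
}
```

A packet sent before the tunnel is configured only bumps `oerrors`. Once `t_af` is set, the same call succeeds.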

Re: panic: Stopped at kqueue_scan

2019-04-29 Thread David Gwynne




> On 30 Apr 2019, at 03:24, Martin Pieuchot  wrote:
> 
> On 29/04/19(Mon) 17:24, David Gwynne wrote:
>> On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
>>> On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
>>>>> Synopsis:panic: Stopped at kqueue_scan
>>>>> Category:kernel i386
>>>>> Environment:
>>>>System  : OpenBSD 6.5
>>>>Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
>>>> 19:50:46 MDT 2019
>>>> 
>>>> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
>>>> 
>>>>Architecture: OpenBSD.i386
>>>>Machine : i386
>>>>> Description:
>>>> Hi, since my last update I have regular panic crashes. 4 in two days.
>>>> At least 3 of them, with certainty, occurred while I was accessing the
>>>> Internet via my smartphone connected to my OpenBSD WiFi access point
>>>> through my Always-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
>>>> This setup has worked for years.
>>>> 
>>>> The machine then displays something like:
>>>> uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
>>>> kernel: page fault trap, code=0
>>>> Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
>>>> ddb{1}>
>>> 
>>> So this indicates that the `kqueue' is empty.  It should not happen
>>> because the caller, in your case npppd, always places a marker in the
>>> list.
>>> 
>>> Since the caller is not threaded and the syscall is executed with the
>>> KERNEL_LOCK() held, we can suppose that another part of the kernel is
>>> removing the marker.  That would imply that the other part isn't running
>>> with the KERNEL_LOCK() and requires a MP kernel.
>>> 
>>> Could you try *very hard* to reproduce the problem with a kernel built
>>> with the diff below?  Hopefully you'll make it crash and we'll find the
>>> bug.  Otherwise we'll look for another possible cause of the marker
>>> removal.
>>> 
>>> Index: kern/kern_event.c
>>> ===
>>> RCS file: /cvs/src/sys/kern/kern_event.c,v
>>> retrieving revision 1.101
>>> diff -u -p -r1.101 kern_event.c
>>> --- kern/kern_event.c   27 Nov 2018 15:52:50 -  1.101
>>> +++ kern/kern_event.c   28 Apr 2019 21:47:25 -
>>> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
>>> struct kqueue *kq = kn->kn_kq;
>>> struct klist *list;
>>> 
>>> +   KERNEL_ASSERT_LOCKED();
>>> +
>>> if (kn->kn_fop->f_isfd)
>>> list = &kq->kq_knlist[kn->kn_id];
>>> else
>>> 
>> 
>> i had a similar diff in my tree. with some clues from this thread
>> and the same panic from jmc@, it points out that tun(4) calls tun_wakeup
>> without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
>> the kq code messing up the kq_head list. fun fun.
>> 
>> below is an extremely clever diff to tun to avoid doing the wakeup
>> without the kernel lock held. i say extremely clever because tun(4) is
>> not marked as MPSAFE, which means the if_start handler gets called with
>> KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
>> tun_wakeup with that implicit KERNEL_LOCK held.
>> 
>> i don't know why this hasn't blown up before. the network stack
>> hasn't been run with the KERNEL_LOCK for ages now.
>> 
>> tun_output with a custom if_enqueue handler would be a lot smarter,
>> but that is a more invasive diff.
> 
> I'd prefer if we could grab the KERNEL_LOCK() around csignal() and
> selwakeup() like it is done in the socket code.  I believe that we
> need to show where the lock is taken in order to help people realize
> where it needs to be pushed down.

I should have been clearer: this diff is to fix the tree, it's by no
means a final fix. tun(4) itself could do with some further changes, including
the ones you describe above.

If you prefer I could just add KERNEL_LOCK to tun_wakeup?

> Can we solve both issues with a similar fix?  How hard can it be to get
> selwakeup() out of the KERNEL_LOCK()?

Figuring that out was going to go onto my todo.

> Should we add a KASSERT in csignal() too?

If it currently needs the kernel lock then it should have the assert for it. 
Otherwise we have issues like these waiting days or weeks for fixes.

dlg

> 
>> Index: net/if_tun.c
>>

Re: panic: Stopped at kqueue_scan

2019-04-29 Thread David Gwynne

On Sun, Apr 28, 2019 at 06:57:02PM -0300, Martin Pieuchot wrote:
> On 23/04/19(Tue) 12:16, Olivier Antoine wrote:
> > >Synopsis:panic: Stopped at kqueue_scan
> > >Category:kernel i386
> > >Environment:
> > System  : OpenBSD 6.5
> > Details : OpenBSD 6.5-current (GENERIC.MP) #1368: Sun Apr 21
> > 19:50:46 MDT 2019
> >  
> > dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.i386
> > Machine : i386
> > >Description:
> > Hi, since my last update I have regular panic crashes. 4 in two days.
> > At least 3 of them, with certainty, occurred while I was accessing the
> > Internet via my smartphone connected to my OpenBSD WiFi access point
> > through my Always-on VPN isakmp/ipsec/nppp relaying traffic in Tor.
> > This setup has worked for years.
> > 
> > The machine then displays something like:
> > uvm_fault(0xd34e5f3c, 0x0, 0, 2) -> e
> > kernel: page fault trap, code=0
> > Stopped at kqueue_scan+0x246: movl %eax,0(%ecx)
> > ddb{1}>
> 
> So this indicates that the `kqueue' is empty.  It should not happen
> because the caller, in your case npppd, always places a marker in the
> list.
> 
> Since the caller is not threaded and the syscall is executed with the
> KERNEL_LOCK() held, we can suppose that another part of the kernel is
> removing the marker.  That would imply that the other part isn't running
> with the KERNEL_LOCK() and requires a MP kernel.
> 
> Could you try *very hard* to reproduce the problem with a kernel built
> with the diff below?  Hopefully you'll make it crash and we'll find the
> bug.  Otherwise we'll look for another possible cause of the marker
> removal.
> 
> Index: kern/kern_event.c
> ===
> RCS file: /cvs/src/sys/kern/kern_event.c,v
> retrieving revision 1.101
> diff -u -p -r1.101 kern_event.c
> --- kern/kern_event.c 27 Nov 2018 15:52:50 -  1.101
> +++ kern/kern_event.c 28 Apr 2019 21:47:25 -
> @@ -1052,6 +1052,8 @@ knote_drop(struct knote *kn, struct proc
>   struct kqueue *kq = kn->kn_kq;
>   struct klist *list;
>  
> + KERNEL_ASSERT_LOCKED();
> +
>   if (kn->kn_fop->f_isfd)
>   list = &kq->kq_knlist[kn->kn_id];
>   else
> 

i had a similar diff in my tree. with some clues from this thread
and the same panic from jmc@, it points out that tun(4) calls tun_wakeup
without KERNEL_LOCK. tun_wakeup calls selwakeup, which ends up in
the kq code messing up the kq_head list. fun fun.

below is an extremely clever diff to tun to avoid doing the wakeup
without the kernel lock held. i say extremely clever because tun(4) is
not marked as MPSAFE, which means the if_start handler gets called with
KERNEL_LOCK taken by the stack on its behalf. tun_start then calls
tun_wakeup with that implicit KERNEL_LOCK held.

i don't know why this hasn't blown up before. the network stack
hasn't been run with the KERNEL_LOCK for ages now.

tun_output with a custom if_enqueue handler would be a lot smarter,
but that is a more invasive diff.

ok?

Index: net/if_tun.c
===
RCS file: /cvs/src/sys/net/if_tun.c,v
retrieving revision 1.184
diff -u -p -r1.184 if_tun.c
--- net/if_tun.c3 Feb 2019 23:04:49 -   1.184
+++ net/if_tun.c29 Apr 2019 07:14:46 -
@@ -576,7 +576,6 @@ tun_output(struct ifnet *ifp, struct mbu
return (error);
}
 
-   tun_wakeup(tp);
return (0);
 }
 
Index: kern/kern_event.c
===
RCS file: /cvs/src/sys/kern/kern_event.c,v
retrieving revision 1.101
diff -u -p -r1.101 kern_event.c
--- kern/kern_event.c   27 Nov 2018 15:52:50 -  1.101
+++ kern/kern_event.c   29 Apr 2019 07:14:46 -
@@ -1072,6 +1072,7 @@ knote_enqueue(struct knote *kn)
struct kqueue *kq = kn->kn_kq;
int s = splhigh();
 
+   KERNEL_ASSERT_LOCKED();
KASSERT((kn->kn_status & KN_QUEUED) == 0);
 
TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
@@ -1089,6 +1090,7 @@ knote_dequeue(struct knote *kn)
 
KASSERT(kn->kn_status & KN_QUEUED);
 
+   KERNEL_ASSERT_LOCKED();
TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe);
kn->kn_status &= ~KN_QUEUED;
kq->kq_count--;
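
The assertions added above follow a simple pattern: any function that modifies a structure protected by the big lock checks that the lock is held before touching it, so an unlocked caller fails immediately instead of corrupting the list later. A minimal userland sketch of that pattern, with hypothetical `toy_*` names and a lock that only records whether it is held (it provides no real exclusion):

```c
#include <assert.h>
#include <stddef.h>

/* Toy big lock: it only remembers whether somebody holds it. */
struct toy_biglock {
	int	held;
};

struct toy_queue {
	struct toy_biglock	*lock;	/* lock that protects count */
	unsigned int		 count;
};

void
toy_lock(struct toy_biglock *l)
{
	assert(l != NULL && !l->held);
	l->held = 1;
}

void
toy_unlock(struct toy_biglock *l)
{
	assert(l != NULL && l->held);
	l->held = 0;
}

/* In the spirit of KERNEL_ASSERT_LOCKED(): fail loudly at the call site. */
void
toy_assert_locked(const struct toy_queue *q)
{
	assert(q != NULL && q->lock != NULL);
	assert(q->lock->held);
}

void
toy_enqueue(struct toy_queue *q)
{
	toy_assert_locked(q);
	q->count++;
}

void
toy_dequeue(struct toy_queue *q)
{
	toy_assert_locked(q);
	assert(q->count > 0);
	q->count--;
}
```

Calling `toy_enqueue` without `toy_lock` first aborts in `toy_assert_locked`, which is the point: the bug shows up in the caller that forgot the lock, not in whoever scans the queue afterwards.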

Re: ARP issues when using ldpd(8) and mpw(4)

2019-02-18 Thread David Gwynne

On Fri, Feb 15, 2019 at 01:12:36PM +1100, Adrian Close wrote:
> >Synopsis: Incorrect ARPing for routed LDP peer's loopback IP instead of
> ARPing for next-hop router IP on expiry
> >Category: system
> >Environment:
> System  : OpenBSD 6.4
> Details : OpenBSD 6.4-current (GENERIC) #688: Wed Feb 13 12:16:06 MST
> 2019
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> I'm having some issues getting an mpw(4) MPLS "pseudowire" setup working
> reliably and it seems to come down to ARP issues between my "PE" and "P"
> boxes that occur when ldpd(8) thinks the pseudowire l2vpn tunnels are
> actually up.
> 
> My setup looks something like:
> 
>[cust L2] - [vic0]PE1[vic1] -- [vic0]P1[vic1] -- [vic1]P2[vic0] --
> [vic1][PE2][vic0] - [cust L2]
> 
> ... the idea being to provide a layer 2 circuit between the "cust L2"
> ports, over an MPLS network, which I've configured along the lines of
> Claudio's "Demystifying MPLS" paper and Renato's "VPLS basic test
> setup", using LDP and OSPF.  This all works fine, except when it doesn't:
> 
> So for example, when the ARP entry on the PE1 box for P1's vic0 IP
> address times out (or is manually deleted), PE1 sends out ARP requests
> not for that IP address, but for the loopback address of its pseudowire
> peer PE2.
> 
> It generally does this for about 40 seconds, and then finally sends an
> ARP request for the correct IP, which is answered straight away and
> things work again until next time.
> 
> If ldpd(8) thinks the l2vpn tunnels are down ("ldpctl show l2vpn
> pseudowires"), ARP requests are sent normally and work as expected.
> 
> 
> >How-To-Repeat:
> I made four VMs in VMWare ESX and connected them together in series,
> ie. PE1[vic1]:[vic0]P1[vic1]:[vic1]P2[vic0]:[vic1]PE2.
> Configure networking and ospfd/ldpd as follows:
> 
> PE1 configuration:
> ospfd.conf:
>   router-id 172.31.211.101
>   area 0.0.0.0 {
> interface vic1
> interface lo1
>   }
> ldpd.conf:
>   router-id 172.31.211.101
>   address-family ipv4 {
> interface vic1
>   }
>   l2vpn VMVPLS type vpls {
> bridge bridge0
> interface vic0
> pseudowire mpw2 {
> pw-id 102
> neighbor-id 172.31.221.101
> }
>   }
> vic1: 192.168.11.101/24, MPLS enabled
> lo1: 172.31.211.101/32
> bridge0: members vic0 + mpw2
> 
> P1 configuration:
> ospfd.conf:
>   router-id 172.31.100.11
>   area 0.0.0.0 {
> interface vic0
> interface vic1
> interface lo1
>   }
> ldpd.conf:
> router-id 172.31.100.11
> address-family ipv4 {
> interface vic0
> interface vic1
> }
> vic0: 192.168.11.11/24, MPLS enabled
> vic1: 192.168.12.11/24, MPLS enabled
> lo1: 172.31.100.11/32
> 
> P2 configuration:
> ospfd.conf:
>   router-id 172.31.100.21
>   area 0.0.0.0 {
> interface vic0
> interface vic1
> interface lo1
>   }
> ldpd.conf:
> router-id 172.31.100.21
> address-family ipv4 {
> interface vic0
> interface vic1
> }
> vic0: 192.168.12.21/24, MPLS enabled
> vic1: 192.168.22.21/24, MPLS enabled
> lo1: 172.31.100.21/32
> 
> PE2 configuration:
> ospfd.conf:
>   router-id 172.31.221.101
>   area 0.0.0.0 {
> interface vic1
> interface lo1
>   }
> ldpd.conf:
>   router-id 172.31.221.101
>   address-family ipv4 {
> interface vic1
>   }
>   l2vpn VMVPLS type vpls {
> bridge bridge0
> interface vic0
> pseudowire mpw2 {
> pw-id 102
> neighbor-id 172.31.211.101
> }
>   }
> vic1: 192.168.22.201/24, MPLS enabled
> lo1: 172.31.221.101/32
> bridge0: members vic0 + mpw2
> 
> 
> Bring up ospfd and ldpd on each box.  Routing and MPLS should converge OK.
> The L2VPN should come up (see 'ldpctl show l2vpn pseudowires' output).
> Ethernet frames pass between vic0 interfaces on the PE boxes, via MPLS.
> 
> Once the ARP entry on PE1 for P1 (192.168.11.11) and/or on PE2 for P2
> (192.168.22.21) times out, the pseudowire stops working.  Similar behaviour
> if
> the ARP entry is manually deleted.
> 
> The PE box will then start ARPing on vic1 for its pseudowire peer
> (eg. 172.31.211.101) instead of its nexthop router for that peer, which of
> course doesn't work.  After a while it does ARP for the nexthop router
> (eg. 192.168.11.11), P responds and connectivity is restored.
> 
> 
> >Fix:
> Configure static ARP entries for P router IP on PE boxes to work
> around this.  Ew.

Hi Adrian,

Could you please try the following diff? I was able to reproduce the
problem you were seeing, but this makes it go away.

Index: if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.258
diff -u -p -r1.258 if_ethersubr.c
--- if_ethersubr.c  18 Feb 2019 03:41:21 -  1.258
+++ if_ethersubr.c  19 Feb 2019 03:23:15 -
@@ -235,19 +235,16

Re: Bridging vlan over mpls. OpenBSD6.3

2018-05-23 Thread David Gwynne

hey andrew,

can i see the ifconfig output for the mpw0 interface please?

cheers,
dlg

> On 18 May 2018, at 20:51, and...@unixadmina.net wrote:
> 
> I found strange behavior when I tried to bridge a vlan from an OpenBSD box over
> mpls. It seems like the BSD box sends packets received from the mpls tunnel
> untagged instead of adding the vlan tag. Is it a known bug or am I just missing something?
> 
> OpenBSD running on a PC with two vlans.
> vyb-r0# uname -a
> OpenBSD vyb-r0.loc 6.3 GENERIC.MP#107 amd64
> 
> 
> vlan107 -- mpls enabled vlan:
> vyb-r0# ifconfig vlan107
> vlan107: flags=88843 mtu 1500
>lladdr 70:71:bc:cc:fb:d4
>description: Kinda uplink interface
>index 7 priority 0 llprio 3
>encap: vnetid 107 parent re0
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex)
>status: active
>inet 10.150.0.10 netmask 0xff00 broadcast 10.150.0.255
> 
> vlan2000 -- vlan that got to be bridged over mpls:
> vyb-r0# ifconfig vlan2000
> vlan2000: flags=8943 mtu 1500
>lladdr 70:71:bc:cc:fb:d4
>description: local L2 interface
>index 9 priority 0 llprio 3
>encap: vnetid 2000 parent re0
>groups: vlan
>media: Ethernet autoselect (1000baseT full-duplex)
>status: active
> 
> bridging interface:
> vyb-r0# ifconfig bridge0
> bridge0: flags=41
>index 4 llprio 3
>groups: bridge
>priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
>designated: id 00:00:00:00:00:00 priority 0
>vlan2000 flags=3
>port 9 ifpriority 0 ifcost 0
>mpw0 flags=3
>port 6 ifpriority 0 ifcost 0
>Addresses (max cache: 100, timeout: 240):
>e4:6f:13:aa:38:c1 mpw0 1 flags=0<>
>e4:6f:13:aa:37:c1 vlan2000 1 flags=0<>
> 
> 
> MPLS tunnel is up and running and I can see MACs and packets coming in and 
> out of tunnel.
> 
> vyb-r0# tcpdump -nibridge0 -e
> tcpdump: listening on bridge0, link-type EN10MB
> 13:02:48.323723 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has 
> 10.150.2.40 tell 10.150.2.50
> 13:02:48.324253 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply 
> 10.150.2.40 is-at e4:6f:13:aa:38:c1
> 13:02:49.347673 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has 
> 10.150.2.40 tell 10.150.2.50
> 13:02:49.348255 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply 
> 10.150.2.40 is-at e4:6f:13:aa:38:c1
> 13:02:50.371668 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has 
> 10.150.2.40 tell 10.150.2.50
> 13:02:50.372173 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply 
> 10.150.2.40 is-at e4:6f:13:aa:38:c1
> 13:02:51.395596 e4:6f:13:aa:37:c1 ff:ff:ff:ff:ff:ff 0806 60: arp who-has 
> 10.150.2.40 tell 10.150.2.50
> 13:02:51.396143 e4:6f:13:aa:38:c1 e4:6f:13:aa:37:c1 0806 60: arp reply 
> 10.150.2.40 is-at e4:6f:13:aa:38:c1
> 
> However, those arp replies don't reach 10.150.2.50 on vlan2000.
> 
> I've mirrored OpenBSD port on a switch. vlan3000 is a target vlan for port 
> mirroring. Here's tcpdump on another PC that receives mirroring vlan.
> 
> #tcpdump -nivlan3000 -e
> 13:29:46.445351 e4:6f:13:aa:37:c1 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q 
> (0x8100), length 64: vlan 2000, p 0, ethertype ARP, Request who-has 
> 10.150.2.40 tell 10.150.2.50, length 46
> 13:29:46.445410 70:71:bc:cc:fb:d4 > b8:38:61:1a:8e:a1, ethertype 802.1Q 
> (0x8100), length 90: vlan 107, p 0, ethertype MPLS unicast, MPLS (label 21, 
> exp 0, ttl 255) (label 20, exp 0, [S], ttl 255)
> 13:29:46.445820 b8:38:61:1a:8e:a1 > 70:71:bc:cc:fb:d4, ethertype 802.1Q 
> (0x8100), length 86: vlan 107, p 0, ethertype MPLS unicast, MPLS (label 16, 
> exp 0, [S], ttl 254)
> 13:29:46.445870 e4:6f:13:aa:38:c1 > e4:6f:13:aa:37:c1, ethertype ARP 
> (0x0806), length 60: Reply 10.150.2.40 is-at e4:6f:13:aa:38:c1, length 46
> 
> First line is arp-request. (vlan2000)
> Line two is arp-request sent over MPLS. (vlan107)
> Line three is arp-answer sent over MPLS. (vlan107)
> But the fourth line is a plain untagged reply, which has to be sent over vlan2000.
> 
> I haven't checked whether this issue is broadcast only or whether unicast has the
> same problem. I haven't checked whether it is an mpw only problem or whether any
> point-to-point interface suffers too. Bridge works ok with bridging two vlans.
> 
> ospfd.conf and ldpd.conf are as simple as it could be.
> 
> ospfd.conf:
> router-id 10.128.0.10
> 
> area 0.0.0.0 {
>interface vlan107
>interface lo1
> }
> 
> 
> ldpd.conf:
> router-id 10.128.0.10
> 
> l2vpn OFFICE type vpls {
>  bridge bridge0
>  interface vlan2000
>  pseudowire mpw0 {
> neighbor-id 10.128.0.9
> pw-id 
>  }
> }
> 
> address-family ipv4 {
>interface vlan107
> }
> 
> 
> The thing that confuses me is why vlan bridging over mpls isn't working in my 
> setup. Bridging client's vlan

Re: pair(4) crashes on strict alignment platforms

2018-03-08 Thread David Gwynne

On Thu, Mar 08, 2018 at 07:44:13PM +0100, Stefan Sperling wrote:
> Running the first set of example commands from the pair(4) man page
> crashes the kernel on at least sparc64 and octeon.
> 
>  # ifconfig pair1 rdomain 1 10.1.1.1/24 up
>  # ifconfig pair2 rdomain 2 10.1.1.2/24 up
>  # ifconfig pair1 patch pair2
>  # route -T 1 exec ping 10.1.1.2
> 
> A netcat<->telnet connection from 10.1.1.1 to 10.1.1.2 works.
> 
> It seems the problem only happens with ping, or short packets in general.
> It looks like the crash is happening while processing the icmp echo reply.
> This code in ip_input_if() calls m_pullup() which ends up setting m->m_data
> to an unaligned address:
> 
>   if (m->m_len < sizeof (struct ip) &&
>   (m = *mp = m_pullup(m, sizeof (struct ip))) == NULL) {
>   ipstat_inc(ips_toosmall);
>   goto bad;
>   }
>   ip = mtod(m, struct ip *);
>   if (ip->ip_v != IPVERSION) {  // we crash here because ip is misaligned
> 
> Note that pair(4) has dequeued this mbuf from its send queue and doesn't
> modify it except for resetting the packet header if it exists.
> 
> Trace from sparc64:
> 
> panic: trap type 0x34 (mem address not aligned): pc=11336d4 npc=11336d8 
> pstate=44820006
> Stopped at  db_enter+0x8:   nop
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *291733   6703  0 0x14000  0x2000  softnet
> trap(40016e6b890, 34, 11336d4, 44820006, 400023ff800, 8848) at trap+0x2e0
> Lslowtrap_reenter(400023ace00, 0, , 1c176d0, 4, 1) at 
> Lslowtrap_reenter+0xf8
> ip_input_if(40016e6bb48, 40016e6bb54, 4, 0, 400023ff800, 8848) at 
> ip_input_if+0x120
> ipv4_input(400023ff800, 18184e8, , 1c176d0, 4, 1) at 
> ipv4_input+0x3c
> ether_input(400023ff800, 400023ace00, 0, 16545e8, , 8848) at 
> ether_input+0xc8
> if_input_process(400021607c0, 40016e6bde0, 131cb20, 1c176d0, 4, 0) at 
> if_input_process+0x11c
> taskq_thread(4000216c080, 40002142fc0, 1758938, 16545e8, 0, 3b9ac800) at 
> taskq_thread+0x6c
> proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x14
> https://www.openbsd.org/ddb.html describes the minimum info required in bug 
> reports.
> Insufficient info makes it difficult to find and fix bugs.
> 

the diff below fixes the panic.

the problem is m_adj doesn't keep m_data updated when removing all
the data from one mbuf in a chain.

in more detail, when we read the ping packet from userland into
the kernel, it's put at the front of an mbuf, which is properly
aligned for an ip packet. pair is an ethernet interface, so to send
it an ethernet header is prepended. because the ip packet is at the
start of an mbuf we allocate a new one for the ethernet header.
that ends up being 14 bytes at the end of a new mbuf, with the end
of the ethernet header properly aligned for an ip packet.

pair then sends the packet back into the network stack, which m_adjs
the ethernet header off the front of the packet. m_adj never deletes
mbufs, it just sets their lengths to 0 if there's no more data left
in an mbuf. because the ethernet header is all that's in that first
mbuf it sets m_len to 0, but doesn't touch m_data.

the ip stack then tries to access the ip header in the mbuf chain.
because the first mbuf has 0 bytes in it it uses m_pullup to get
to the ip header. m_pullup goes to a lot of lengths to maintain
the alignment of the payload in an mbuf. because m_adj left m_data
where the ethernet header was (ie, 14 bytes with the end aligned for
a payload), m_pullup puts the ip header where the ethernet header
was, which is only 2 byte aligned.

netcat works because non-raw ip sockets have their headers allocated
with space at the start for link headers.

this diff makes m_adj update m_data in all paths.

another option could be to have m_pullup skip over mbufs with
m_len == 0 before finding the target alignment. or do both.

Index: uipc_mbuf.c
===
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.253
diff -u -p -r1.253 uipc_mbuf.c
--- uipc_mbuf.c 16 Jan 2018 19:44:34 -  1.253
+++ uipc_mbuf.c 9 Mar 2018 01:34:05 -
@@ -812,11 +812,12 @@ m_adj(struct mbuf *mp, int req_len)
while (m != NULL && len > 0) {
if (m->m_len <= len) {
len -= m->m_len;
+   m->m_data += m->m_len; /* move alignment */
m->m_len = 0;
m = m->m_next;
} else {
-   m->m_len -= len;
m->m_data += len;
+   m->m_len -= len;
len = 0;
}
}
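
The alignment argument can be checked with a small userland model. It tracks only the data offset and length of each buffer (standing in for m_data and m_len) and assumes, as described above, that a later pullup inherits its alignment from the first buffer's data offset. `struct buf`, `adj_old`, `adj_new` and `payload_align` are hypothetical names, and the 256-byte storage size in the usage note is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_LEN	14

/* Hypothetical stand-in for an mbuf. */
struct buf {
	struct buf	*next;
	size_t		 off;	/* data offset in the storage, models m_data */
	size_t		 len;	/* bytes of data at off, models m_len */
};

/* Trim `req` bytes off the front of the chain, behaviour before the diff. */
void
adj_old(struct buf *b, size_t req)
{
	while (b != NULL && req > 0) {
		if (b->len <= req) {
			req -= b->len;
			b->len = 0;		/* off is left behind */
			b = b->next;
		} else {
			b->off += req;
			b->len -= req;
			req = 0;
		}
	}
}

/* Same, but an emptied buffer also has its data offset moved along. */
void
adj_new(struct buf *b, size_t req)
{
	while (b != NULL && req > 0) {
		if (b->len <= req) {
			req -= b->len;
			b->off += b->len;	/* move alignment */
			b->len = 0;
			b = b->next;
		} else {
			b->off += req;
			b->len -= req;
			req = 0;
		}
	}
}

/* Alignment (mod 4) that data placed at the first buffer's offset gets. */
size_t
payload_align(const struct buf *b)
{
	assert(b != NULL);

	return (b->off % sizeof(uint32_t));
}
```

With a 14-byte ethernet header sitting at the end of 256 bytes of storage (offset 242), `adj_old` leaves the emptied buffer at offset 242, so `payload_align` is 2. `adj_new` moves it to 256, so `payload_align` is 0 and a header pulled up there is 4 byte aligned.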

Re: m_pullup(9) regression

2016-11-08 Thread David Gwynne


> On 8 Nov 2016, at 20:01, Martin Pieuchot  wrote:
> 
> semarie exposed a bug in m_pullup(9) while testing my diff to
> automatically create lo(4) interfaces per rdomain.
> 
> In the block below ``m'' is dereferenced without being previously set.
> 
> Is the diff below correct?

yes.

ok by me.

> 
> Index: kern/uipc_mbuf.c
> ===
> RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
> retrieving revision 1.237
> diff -u -p -r1.237 uipc_mbuf.c
> --- kern/uipc_mbuf.c  27 Oct 2016 03:29:55 -  1.237
> +++ kern/uipc_mbuf.c  8 Nov 2016 09:57:06 -
> @@ -896,7 +896,7 @@ m_pullup(struct mbuf *n, int len)
>   if (len > tail - mtod(n, caddr_t)) {
>   /* need to memmove to make space at the end */
>   memmove(head, mtod(n, caddr_t), n->m_len);
> - m->m_data = head;
> + n->m_data = head;
>   }
> 
>   len -= n->m_len;

Re: hang at scsibus1 at mpii0: 128 targets

2016-10-31 Thread David Gwynne

On Mon, Oct 31, 2016 at 02:58:10PM +0100, Simon Mages wrote:
> >Synopsis:  OpenBSD-current hang at scsibus1 at mpii0: 128 targets
> >Category:  driver issue
> >Environment:
> System  : OpenBSD 6.0
> Details : OpenBSD 6.0-current (GENERIC.MP) #2: Mon Oct 31
> 14:39:18 CET 2016
> 
> r...@somebox.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> after rebuilding the current kernel and booting it the machine hangs
> at "scsibus1 at mpii0: 128 targets".

can you try this?

Index: mpii.c
===
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.107
diff -u -p -r1.107 mpii.c
--- mpii.c  24 Oct 2016 01:50:09 -  1.107
+++ mpii.c  1 Nov 2016 00:13:56 -
@@ -888,6 +888,9 @@ mpii_scsi_probe(struct scsi_link *link)
if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED))
return (1);
 
+   if (ISSET(flags, MPII_DF_VOLUME))
+   return (0);
+
memset(&ehdr, 0, sizeof(ehdr));
ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;
ehdr.page_number = 0;

Re: changing lladdr on vlan(4) breaks rx-path

2016-04-19 Thread David Gwynne

On Tue, Dec 22, 2015 at 10:11:23PM +0100, Paul de Weerd wrote:
> Hi all,
> 
> My ISP requires me to use different MAC addresses for internet and TV
> access.  These services arrive at the demarc as two different ethernet
> VLANs, vlan4 (television) and vlan34 (internet), on one copper GE
> port.
> 
> I can get a lease on vlan34 just fine, and have internet access.  Then
> I change the MAC on my vlan4 interface:
> 
>   $ ifconfig vlan4 lladdr 
> 
> However, dhclient doesn't give me a lease.  To debug, I run tcpdump
> and try again: lo, an offer arrives.
> 
> Turns out, vlan4 only works when the parent interface (em2, in my
> case) is in promiscuous mode.  Seems to make sense from a naive
> point of view: the interface filters traffic that's destined for its
> own MAC address, dropping traffic for other MACs.  Is this a problem
> with the em(4) driver, with vlan(4) or elsewhere?

i think vlan.

the diff below allows a vlan interface to be configured with a
custom mac address. if that is done, itll mark itself as having a
custom lladdr and will in turn enable promisc on the parent interface
so those packets will actually end up coming into the kernel.

it supports setting the mac address while the vlan interface is
both up and down.

if you set the mac address on the vlan interface to 00:00:00:00:00:00,
itll treat that as removing the custom mac and will replace it with
the parents mac address.

lastly, this makes no effort to cope with the mac address of the
parent interface being changed at runtime.
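
as a rough model of the behaviour described above (not the diff itself), the sketch below uses a hypothetical toy_vlan struct and TOY_F_* flags in place of struct ifvlan and the IFVF_* flags. it ignores the up/down handling and only shows the lladdr decision: a non-zero address is stored and flagged as custom, an all-zero address clears the flag and falls back to the parent's address, and the parent is wanted promisc whenever either flag is set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN	6
#define TOY_F_PROMISC	0x01	/* the parent should be made promisc */
#define TOY_F_LLADDR	0x02	/* custom mac, don't inherit the parent's */

/* Hypothetical model of a vlan interface and its parent's address. */
struct toy_vlan {
	uint8_t	lladdr[TOY_ADDR_LEN];
	uint8_t	parent[TOY_ADDR_LEN];
	int	flags;
};

/* Set a custom address, or revert to the parent's on 00:00:00:00:00:00. */
static void
toy_setlladdr(struct toy_vlan *v, const uint8_t *addr)
{
	static const uint8_t anyaddr[TOY_ADDR_LEN];	/* all zeroes */

	assert(v != NULL && addr != NULL);

	if (memcmp(addr, anyaddr, TOY_ADDR_LEN) == 0) {
		v->flags &= ~TOY_F_LLADDR;
		memcpy(v->lladdr, v->parent, TOY_ADDR_LEN);
	} else {
		v->flags |= TOY_F_LLADDR;
		memcpy(v->lladdr, addr, TOY_ADDR_LEN);
	}
}

/* The parent needs promisc if it was asked for or the mac is custom. */
static int
toy_parent_promisc(const struct toy_vlan *v)
{
	return ((v->flags & (TOY_F_PROMISC | TOY_F_LLADDR)) != 0);
}
```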

Index: if_vlan_var.h
===
RCS file: /cvs/src/sys/net/if_vlan_var.h,v
retrieving revision 1.35
diff -u -p -r1.35 if_vlan_var.h
--- if_vlan_var.h   15 Apr 2016 04:34:10 -  1.35
+++ if_vlan_var.h   19 Apr 2016 13:30:31 -
@@ -81,7 +81,8 @@ struct ifvlan {
 #define ifv_tag ifv_mib.ifvm_tag
 #define ifv_prio ifv_mib.ifvm_prio
 #define ifv_type ifv_mib.ifvm_type
-#define IFVF_PROMISC 0x01
+#define IFVF_PROMISC 0x01 /* the parent should be made promisc */
+#define IFVF_LLADDR 0x02 /* don't inherit the parents mac */
 
 struct mbuf *vlan_inject(struct mbuf *, uint16_t, uint16_t);
 #endif /* _KERNEL */
Index: if_vlan.c
===
RCS file: /cvs/src/sys/net/if_vlan.c,v
retrieving revision 1.162
diff -u -p -r1.162 if_vlan.c
--- if_vlan.c   15 Apr 2016 04:34:10 -  1.162
+++ if_vlan.c   19 Apr 2016 13:30:31 -
@@ -92,8 +92,6 @@ int   vlan_up(struct ifvlan *);
 int vlan_parent_up(struct ifvlan *, struct ifnet *);
 int vlan_down(struct ifvlan *);
 
-int vlan_promisc(struct ifvlan *, int);
-
 void   vlan_ifdetach(void *);
 void   vlan_link_hook(void *);
 void   vlan_link_state(struct ifvlan *, u_char, u_int64_t);
@@ -107,6 +105,9 @@ int vlan_multi_del(struct ifvlan *, stru
 void   vlan_multi_apply(struct ifvlan *, struct ifnet *, u_long);
 void   vlan_multi_free(struct ifvlan *);
 
+int vlan_iff(struct ifvlan *);
+int vlan_setlladdr(struct ifvlan *, struct ifreq *);
+
 int vlan_set_compat(struct ifnet *, struct ifreq *);
 int vlan_get_compat(struct ifnet *, struct ifreq *);
 
@@ -432,6 +433,7 @@ vlan_parent_up(struct ifvlan *ifv, struc
if_ih_insert(ifp0, vlan_input, NULL);
 
return (0);
+
 }
 
 int
@@ -470,7 +472,8 @@ vlan_up(struct ifvlan *ifv)
/* parent is fine, let's prepare the ifv to handle packets */
ifp->if_hardmtu = hardmtu;
SET(ifp->if_flags, ifp0->if_flags & IFF_SIMPLEX);
-   if_setlladdr(ifp, LLADDR(ifp0->if_sadl));
+   if (!ISSET(ifv->ifv_flags, IFVF_LLADDR))
+   if_setlladdr(ifp, LLADDR(ifp0->if_sadl));
 
if (ifv->ifv_type != ETHERTYPE_VLAN) {
/*
@@ -522,7 +525,8 @@ leave:
rw_exit(&vlan_tagh_lk);
 scrub:
ifp->if_capabilities = 0;
-   if_setlladdr(ifp, etheranyaddr);
+   if (!ISSET(ifv->ifv_flags, IFVF_LLADDR))
+   if_setlladdr(ifp, etheranyaddr);
CLR(ifp->if_flags, IFF_SIMPLEX);
ifp->if_hardmtu = 0x;
 put:
@@ -564,7 +568,8 @@ vlan_down(struct ifvlan *ifv)
rw_exit_write(&vlan_tagh_lk);
 
ifp->if_capabilities = 0;
-   if_setlladdr(ifp, etheranyaddr);
+   if (!ISSET(ifv->ifv_flags, IFVF_LLADDR))
+   if_setlladdr(ifp, etheranyaddr);
CLR(ifp->if_flags, IFF_SIMPLEX);
ifp->if_hardmtu = 0x;
 
@@ -617,29 +622,6 @@ vlan_link_state(struct ifvlan *ifv, u_ch
 }
 
 int
-vlan_promisc(struct ifvlan *ifv, int promisc)
-{
-   struct ifnet *ifp0;
-   int error = 0;
-
-   if ((ISSET(ifv->ifv_flags, IFVF_PROMISC) ? 1 : 0) == promisc)
-   return (0);
-
-   ifp0 = if_get(ifv->ifv_ifp0);
-   if (ifp0 != NULL) {
-   error = ifpromisc(ifp0, promisc);
-   }
-   if_put(ifp0);
-
-   if (error == 0) {
-   CLR(ifv->ifv_flags,

Re: re(4) driver issues on RTL8168F

2016-03-08 Thread David Gwynne

hey marc,

is this reproducible?

if so, next time it happens can you collect ifconfig, systat mb, and vmstat -m 
output? im looking for IFF_OACTIVE in ifconfig, an ALIVE value that is 
different to the CWM in systat mb, or really high numbers against the mbuf 
pools in vmstat -m output.

cheers,
dlg

> On 9 Mar 2016, at 04:23, Marc Espie  wrote:
> 
> Well, I've got this on my desktop for a while, didn't work at the time, got
> a cheap pci card that worked.
> 
> However, I got a NAS, with two network ports, and I thought "let's try
> dedicating one port to the desktop, that way I get 1Gb instead of going
> thru the switch".
> 
> 
> Unfortunately, re(4) is still broken in the exact same way as when I got the
> controller.
> 
> Symptom:
> during rsync copy, card completely freezes, tcpdump shows no packets.
> ^C in the copying window gets the card "back" up to speed.
> 
> /var/log/messages shows this each time:
> Mar  3 17:34:44 nausicaa /bsd: re0: watchdog timeout
> Mar  3 17:34:44 nausicaa /bsd: re0: stopping TXQ timed out!
> 
> There's probably something missing on that particular make of the controller.
> I note that the recent tests DON'T involve any RTL8168F, lucky me.
> 
> Suggestions/fix welcome...
> 
> dmesg:
> 
> OpenBSD 5.9-current (GENERIC.MP) #1905: Sun Mar  6 19:13:08 MST 2016
>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8522469376 (8127MB)
> avail mem = 8259854336 (7877MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb600 (49 entries)
> bios0: vendor American Megatrends Inc. version "0601" date 12/25/2012
> bios0: ASUSTeK COMPUTER INC. CM1435
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG HPET MSDM SSDT SSDT IVRS BGRT
> acpi0: wakeup devices SBAZ(S4) PS2K(S4) PS2M(S4) OHC1(S4) EHC1(S4) OHC2(S4) 
> EHC2(S4) OHC3(S4) EHC3(S4) OHC4(S4) XHC0(S4) XHC1(S4) PE21(S4) RLAN(S4) 
> PE22(S4) PE23(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 16 (boot processor)
> cpu0: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.45 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1
> cpu0: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
> cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 100MHz
> cpu0: mwait min=64, max=64, IBE
> cpu1 at mainbus0: apid 17 (application processor)
> cpu1: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1
> cpu1: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu1: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
> cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 18 (application processor)
> cpu2: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1
> cpu2: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu2: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
> cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 19 (application processor)
> cpu3: AMD A10-5700 APU with Radeon(tm) HD Graphics, 3417.01 MHz
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,F16C,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,XOP,SKINIT,WDT,FMA4,NODEID,TBM,TOPEXT,ITSC,BMI1
> cpu3: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB 64b/line 
> 16-way L2 cache
> cpu3: ITLB 48 4KB

Re: alignment fault on armv7 when using carp(4)

2016-02-09 Thread David Gwynne

On Mon, Feb 08, 2016 at 11:02:06PM +1000, David Gwynne wrote:
> On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote:
> > >Synopsis:  
> > 
> > To me that behavior might suggest the problem is deeper than a
> > bookkeeping mistake of aligning memory in mbuf.
> 
> nope, you were right, it's a screwup with alignment.
> 
> the problem is multicast packets that arent to a carp interfaces
> mac address have to be duplicated and sent to all carp interfaces
> on a parent. the duplication is done with m_copym2, which doesn't
> respect the alignment requirements of the ip header inside the 14
> byte ethernet header.
> 
> the following dups the packet inside carp, and makes sure the
> ethernet payload is aligned properly.
> 
> i was able to reproduce this on sparc64, and i believe this fixes
> it. could you test it and see if it helps?

mpi@ pointed out that bridge@ has a special function to do a deep
copy of mbufs that get the ip payload alignment right, and that we
should share.

this moves the functionality in with the rest of the mbuf functions.
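
to make the alignment point concrete, here is a small userland sketch, not the m_dup_pkt in the diff below. toy_pkt and toy_dup_pkt are hypothetical names, and it assumes a 14 byte ethernet header and a buffer from malloc that starts on at least a 4 byte boundary. copying the frame in at an offset of 2 puts the ip header 16 bytes into the buffer, which is 4 byte aligned; copying it in at offset 0 leaves it at 14, which is not.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat packet: buf is the allocation, data the frame start. */
struct toy_pkt {
	uint8_t	*buf;
	uint8_t	*data;
	size_t	 len;
};

/*
 * Deep copy a frame into fresh storage, leaving adj bytes of padding in
 * front so that whatever follows the ethernet header can be aligned.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
toy_dup_pkt(struct toy_pkt *dst, const uint8_t *src, size_t len, size_t adj)
{
	assert(dst != NULL && src != NULL && len > 0);

	dst->buf = malloc(len + adj);	/* malloc storage is suitably aligned */
	if (dst->buf == NULL)
		return (-1);

	dst->data = dst->buf + adj;	/* skip the padding */
	dst->len = len;
	memcpy(dst->data, src, len);

	return (0);
}
```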

could a bridge user test this to see if it still works? carp seems
fine with this on sparc64 still.

ok?

Index: kern/uipc_mbuf.c
===
RCS file: /cvs/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.218
diff -u -p -r1.218 uipc_mbuf.c
--- kern/uipc_mbuf.c 31 Jan 2016 00:18:07 -  1.218
+++ kern/uipc_mbuf.c 9 Feb 2016 09:49:21 -
@@ -1213,6 +1213,40 @@ m_dup_pkthdr(struct mbuf *to, struct mbu
return (0);
 }
 
+struct mbuf *
+m_dup_pkt(struct mbuf *m0, unsigned int adj, int wait)
+{
+   struct mbuf *m;
+   int len;
+
+   len = m0->m_pkthdr.len + adj;
+   if (len > MAXMCLBYTES) /* XXX */
+   return (NULL);
+
+   m = m_get(m0->m_type, wait);
+   if (m == NULL)
+   return (NULL);
+
+   if (m_dup_pkthdr(m, m0, wait) != 0)
+   goto fail;
+
+   if (len > MHLEN) {
+   MCLGETI(m, len, NULL, wait);
+   if (!ISSET(m->m_flags, M_EXT))
+   goto fail;
+   }
+
+   m->m_len = m->m_pkthdr.len = len;
+   m_adj(m, adj);
+   m_copydata(m0, 0, m0->m_pkthdr.len, mtod(m, caddr_t));
+
+   return (m);
+
+fail:
+   m_freem(m);
+   return (NULL);
+}
+
 #ifdef DDB
 void
 m_print(void *v,
Index: net/if_bridge.c
===
RCS file: /cvs/src/sys/net/if_bridge.c,v
retrieving revision 1.275
diff -u -p -r1.275 if_bridge.c
--- net/if_bridge.c 5 Dec 2015 10:07:55 -   1.275
+++ net/if_bridge.c 9 Feb 2016 09:49:21 -
@@ -137,7 +137,6 @@ int bridge_ipsec(struct bridge_softc *, 
 int bridge_clone_create(struct if_clone *, int);
 int bridge_clone_destroy(struct ifnet *ifp);
 int bridge_delete(struct bridge_softc *, struct bridge_iflist *);
-struct mbuf *bridge_m_dup(struct mbuf *);
 
 #define ETHERADDR_IS_IP_MCAST(a) \
/* struct etheraddr *a; */  \
@@ -800,7 +799,7 @@ bridge_output(struct ifnet *ifp, struct 
used = 1;
mc = m;
} else {
-   mc = bridge_m_dup(m);
+   mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT);
if (mc == NULL) {
sc->sc_if.if_oerrors++;
continue;
@@ -1090,7 +1089,7 @@ bridge_process(struct ifnet *ifp, struct
(ifl->bif_state == BSTP_IFSTATE_DISCARDING))
goto reenqueue;
 
-   mc = bridge_m_dup(m);
+   mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT);
if (mc == NULL)
goto reenqueue;
 
@@ -1227,7 +1226,7 @@ bridge_broadcast(struct bridge_softc *sc
mc = m;
used = 1;
} else {
-   mc = bridge_m_dup(m);
+   mc = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT);
if (mc == NULL) {
sc->sc_if.if_oerrors++;
continue;
@@ -1277,7 +1276,7 @@ bridge_localbroadcast(struct bridge_soft
return;
}
 
-   m1 = bridge_m_dup(m);
+   m1 = m_dup_pkt(m, ETHER_ALIGN, M_DONTWAIT);
if (m1 == NULL) {
sc->sc_if.if_oerrors++;
return;
@@ -2017,37 +2016,4 @@ bridge_copyaddr(struct sockaddr *src, st
memcpy(dst, src, src->sa_len);
else
dst->sa_family = AF_UNSPEC;
-}
-
-/*
- * Specialized deep copy to ensure that the payload after the Ethernet
- * header is nicely aligned.
- */
-struct mbuf *
-bridge_m

Re: alignment fault on armv7 when using carp(4)

2016-02-09 Thread David Gwynne

On Mon, Feb 08, 2016 at 05:45:50PM -0500, Anthony Eden wrote:
> Thanks for the quick response. Indeed, the patch makes the alignment
> faults go away. But the 'device timeout' messages coming from cpsw(4)
> remain.
> 
> To elaborate a bit, I set up three terminals pinging 192.168.123.201,
> 192.168.123.202, and 192.168.123.222 (the shared IP). After ~1min I
> get no answers from 192.168.123.201 and 192.168.123.202 (although the
> times differ). For a few minutes the hosts remain unreachable. dmesg
> output looks like
> 
> carp0: state transition: BACKUP -> MASTER
> carp0: state transition: MASTER -> BACKUP
> carp0: state transition: BACKUP -> MASTER
> cpsw0: device timeout
> carp0: state transition: MASTER -> BACKUP
> carp0: state transition: BACKUP -> MASTER
> cpsw0: device timeout
> ...
> 
> This bug seems unrelated to the alignment faults issue. I would
> investigate given some pointers in the right direction.
> 
> If this is under the purview of cpsw(4), would it be advisable to
> submit a new bug report?

id mail canacar@, he's been working on cpsw.

Re: alignment fault on armv7 when using carp(4)

2016-02-09 Thread David Gwynne


> On 9 Feb 2016, at 9:12 PM, Mike Belopuhov <m...@belopuhov.com> wrote:
> 
> On 9 February 2016 at 11:31, David Gwynne <da...@gwynne.id.au> wrote:
>> On Mon, Feb 08, 2016 at 11:02:06PM +1000, David Gwynne wrote:
>>> On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote:
>>>>> Synopsis:  
>>>> 
>>>>To me that behavior might suggest the problem is deeper than a
>>>>bookkeeping mistake of aligning memory in mbuf.
>>> 
>>> nope, you were right, it's a screwup with alignment.
>>> 
>>> the problem is multicast packets that arent to a carp interfaces
>>> mac address have to be duplicated and sent to all carp interfaces
>>> on a parent. the duplication is done with m_copym2, which doesn't
>>> respect the alignment requirements of the ip header inside the 14
>>> byte ethernet header.
>>> 
>>> the following dups the packet inside carp, and makes sure the
>>> ethernet payload is aligned properly.
>>> 
>>> i was able to reproduce this on sparc64, and i believe this fixes
>>> it. could you test it and see if it helps?
>> 
>> mpi@ pointed out that bridge@ has a special function to do a deep
>> copy of mbufs that get the ip payload alignment right, and that we
>> should share.
>> 
>> this moves the functionality in with the rest of the mbuf functions.
>> 
>> could a bridge user test this to see if it still works? carp seems
>> fine with this on sparc64 still.
>> 
>> ok?
>> 
> 
> m_adj can be done as part of the m_copym2 as well.

you want to shove m_adj into m_copym2? or you want m_copym2 callers to 
m_prepend 2 bytes first?

> In the long run I don't think that introducing a new function
> makes sense, not sure about 5.9 and right now, though.

im not sure using m_copym2 for a deep copy makes that much sense generally. 
it's not a great implementation, and the vast majority of the callers use it to 
copy everything.

Re: alignment fault on armv7 when using carp(4)

2016-02-08 Thread David Gwynne

On Sat, Feb 06, 2016 at 04:43:28PM -0500, Anthony Eden wrote:
> >Synopsis:  
> >Category:  arm
> >Environment:
> System  : OpenBSD 5.9
> Details : OpenBSD 5.9 (DBGGENERIC) #0: Sat Feb  6 12:22:27 EST 2016
>  r...@beagle2.mit.edu:/usr/src/sys/arch/armv7/compile/DBGGENERIC
> 
> Architecture: OpenBSD.armv7
> Machine : armv7
> >Description:
> With two beaglebone black's running -current, an alignment fault is
> encountered at ip_input.c:262 in ipv4_input() when they are
> configured to use carp(4) to share the same IP address.
> 
> Source context from ip_input.c (alignment fault occurs when
> ip->ip_dst.s_addr is loaded at line 262):
> 
> 258:ip = mtod(m, struct ip *);
> 259:}
> 260:
> 261:/* 127/8 must not appear on wire - RFC1122 */
> 262:if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET 
> ||
> 263:   (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
> 264:if ((ifp->if_flags & IFF_LOOPBACK) == 0) {
> 265:ipstat.ips_badaddr++;
> 266:goto bad;
> 
> ddb(4) output:
> 
> $ Fatal kernel mode data abort: 'Alignment Fault 1'
> trapframe: 0xcb2d8e40
> DFSR=0001, DFAR=c4cb401e, spsr=8013
> r0 =c924d400, r1 =0003, r2 =0045, r3 =0038
> r4 =c4cb400e, r5 =c06f2ca4, r6 =0014, r7 =c4d65800
> r8 =c0710e50, r9 =c069294c, r10=c0692918, r11=cb2d8eb8
> r12=6093, ssp=cb2d8e8c, slr=c040bc88, pc =c04616ec
> 
> Stopped at  ipv4_input+0x9c:ldrls   r3, [r4, #0x010]
> ddb> trace
> ipv4_input+0xc
> scp=0xc046165c rlv=0xc0461ab4 (ipintr+0x24)
> rsp=0xcb2d8ebc rfp=0xcb2d8ecc
> r10=0xc0692918 r8=0xc0710e50 r7=0xc06edd88 r6=0xc06edd88
> r5=0x r4=0x0004
> ipintr+0xc
> scp=0xc0461a9c rlv=0xc041b290 (netintr+0xa0)
> rsp=0xcb2d8ed0 rfp=0xcb2d8ef0
> netintr+0xc
> scp=0xc041b1fc rlv=0xc053f3d0 (softintr_dispatch+0x84)
> rsp=0xcb2d8ef4 rfp=0xcb2d8f10
> r7=0x r6=0xc0710eb4 r5=0xc0710ec0 r4=0xc89e13a0
> softintr_dispatch+0x18
> scp=0xc053f364 rlv=0xc053eef8 (arm_do_pending_intr+0x110)
> rsp=0xcb2d8f14 rfp=0xcb2d8f40
> r6=0xc0710190 r5=0x2013 r4=0x0004
> arm_do_pending_intr+0x10
> scp=0xc053edf8 rlv=0xc040d9a8 (if_input_process+0xcc)
> rsp=0xcb2d8f44 rfp=0xcb2d8f78
> r10=0xc0692918 r9=0x r8=0x r7=0xcb2d8f44
> r6=0x r5=0xc4d65800 r4=0xc4d57480
> if_input_process+0xc
> scp=0xc040d8e8 rlv=0xc03b5c2c (taskq_thread+0x90)
> rsp=0xcb2d8f7c rfp=0xcb2d8fb0
> r10=0xc06e643c r8=0xc06e65d8 r7=0xcb2d8f7c r6=0x0001
> r5=0xc89e2040 r4=0xc03b5b04
> taskq_thread+0xc
> scp=0xc03b5ba8 rlv=0xc0536c10 (proc_trampoline+0x18)
> rsp=0xcb2d8fb4 rfp=0xc07f3edc
> r7=0x r6=0x r5=0xc89e2040 r4=0xc03b5b9c
> Bad frame pointer: 0xc07f3edc
> 
> this problem has also been encountered with both BB's running -stable.
> 
> >How-To-Repeat:
> Install either -current or -stable on two beaglebone black's, with names
> beagle1 and beagle2. On a LAN 192.168.123.0/24 with default
> gateway 192.168.123.2, set /etc/mygate to 192.168.123.2 on beagle1 and
> beagle2, then set /etc/hostname.cpsw0 on beagle1 to be
> 
> inet 192.168.123.201 255.255.255.0 NONE
> 
> and on beagle2
> 
> inet 192.168.123.202 255.255.255.0 NONE
> 
> then run the following commands on both to use carp(4):
> 
> doas ifconfig carp0 create
> doas ifconfig carp0 vhid 1 pass tyrell carpdev cpsw0 192.168.123.222
> netmask 255.255.255.0
> 
> shortly thereafter a beaglebone will encounter an alignment fault.
> 
> >Fix:
> The cause of this problem is unknown to me. I would speculate that the
> issue lies in m_pullup mishandling alignment, given that networking on
> the beaglebone black usually functions normally, and that there are
> branches prior to the crash in which m_pullup is used in deriving a
> pointer to ip, which, when using carp(4), is apparently misaligned.
> 
> In investigating this issue further, I replaced offending 32-bit loads
> in the kernel with calls to get_unaligned_le32(), defined as (from
> linux/unaligned/packed_struct.h):
> 
> struct __una_u32 { u32 x; } __packed;
> static inline u32 get_unaligned_le32(const void *p) {
> const struct __una_u32 *ptr = (const struct __una_u32 *)p;
> return ptr->x;
> }
> 
> Other than replacements in ip_input.c, udp_usrreq.c was also changed as
> well as the macros IN6_IS_ADDR_UNSPECIFIED, IN6_IS_ADDR_LOOPBACK,
> IN6_IS_ADDR_V4COMPAT, and IN6_IS_ADDR_V4MAPPED in in6.h.
> 
> This resulted in carp(4) appearing to function normally, but beagle1
> and beagle2 repeatedly lost networking temporarily and recurrent
> 'device timeout's appeared in dmesg (as well as carp(4) messages
> informing state changes from master to

Re: vr(4) mbuf leak

2016-01-25 Thread David Gwynne


> On 25 Jan 2016, at 9:19 AM, Richard Procter  
> wrote:
> 
> 
> vr(4) leaks mbufs on vr_encap() failure:
> 
>  - neither vr_encap() nor vr_start() call m_free*()
>  - *m has been dequeued at call to vr_encap()
> 
> Tested by forcing vr_encap() failure and observing 'netstat -m'
> 
> While here, prettify a NULL test.

absolutely correct. i just committed these tweaks.

thank you,
dlg
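
the ownership rule behind the fix can be sketched in userland like this. everything here is hypothetical (toy_get, toy_free, toy_encap, toy_start and the live counter are not driver or mbuf API); it only assumes that encap takes ownership of the buffer on success and leaves it with the caller on failure, so a caller that has already dequeued the buffer has to free it itself.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_live;	/* buffers allocated and not yet freed */

static void *
toy_get(void)
{
	void *p = malloc(64);

	if (p != NULL)
		toy_live++;
	return (p);
}

static void
toy_free(void *p)
{
	if (p != NULL) {
		free(p);
		toy_live--;
	}
}

/* Hypothetical encap step: stores the buffer in the slot only on success. */
static int
toy_encap(void **slot, void *m, int fail)
{
	if (fail)
		return (1);
	*slot = m;
	return (0);
}

/* Caller side: m is already off the queue, so a failed encap must free it. */
static int
toy_start(void **slot, void *m, int fail)
{
	assert(slot != NULL && m != NULL);

	if (toy_encap(slot, m, fail)) {
		toy_free(m);	/* nobody else holds a reference */
		return (1);
	}
	return (0);
}
```

without the toy_free() in the failure path the live counter never comes back down, which is the same thing the netstat -m test in the report shows.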

> 
> best, 
> Richard. 
> 
> vr0 at pci0 dev 9 function 0 "VIA VT6105M RhineIII" rev 0x96: irq 10, 
> address 00:0d:b9:xx:xx:xx
> ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 
> 0x004063, model 0x0034
> 
> Index: sys/dev/pci/if_vr.c
> ===
> --- sys.orig/dev/pci/if_vr.c
> +++ sys/dev/pci/if_vr.c
> @@ -1324,12 +1324,13 @@ vr_start(struct ifnet *ifp)
>   }
> 
>   IFQ_DEQUEUE(&ifp->if_snd, m);
> - if (m== NULL)
> + if (m == NULL)
>   break;
> 
>   /* Pack the data into the descriptor. */
>   head_tx = cur_tx;
>   if (vr_encap(sc, &cur_tx, m)) {
> + m_freem(m);
>   ifp->if_oerrors++;
>   continue;
>   }
>

Re: system crash on latest snapshot

2016-01-12 Thread David Gwynne


> On 12 Jan 2016, at 22:24, Timo Myyrä  wrote:
>>> 
> 
> No joy in sight, just got another router crash while streaming video (tablet)
> and downloading files (laptop) at the same time through it. Nothing in the
> logs, so no details to share. Both clients were connected to the crashing
> router/firewall with its athn0 wireless adapter. I guess the increased stress
> on the egress em0 interface triggers the crash, but I don't have any real
> facts to share.
> 
> I updated the kernel this morning as I saw you made some changes to if_em.c,
> to see if it would help. Doesn't seem to be the case for me.

do you know if it was the same panic or what the trace was like?

dlg

Re: system crash on latest snapshot

2016-01-11 Thread David Gwynne

Any joy? Or the other one where it panics?
On 12 Jan 2016 04:43, "Timo Myyrä" <timo.my...@wickedbsd.net> wrote:

> David Gwynne <da...@gwynne.id.au> writes:
>
> >> On 11 Jan 2016, at 3:59 AM, timo.my...@wickedbsd.net wrote:
> >>
> >>> Synopsis:   server occasional hangs / crashes on latest snapshot
> >>> Category:   kernel
> >>> Environment:
> >>  System  : OpenBSD 5.9
> >>  Details : OpenBSD 5.9-beta (GENERIC.MP) #1807: Sat Jan  9
> 13:46:21 MST 2016
> >>   dera...@amd64.openbsd.org:
> /usr/src/sys/arch/amd64/compile/GENERIC.MP
> >>
> >>  Architecture: OpenBSD.amd64
> >>  Machine : amd64
> >>> Description:
> >>
> >>  I updated my router box to 8.1. snapshot. Noticed that the system
> >>  would hang / crash occasional in the normal use.
> >>  /var/log/messages has following:
> >>  Jan 10 11:36:17 charon /bsd: em0: watchdog: cons 0 prod 503 free
> 521 TDH 503 TDT 503
> >>  Jan 10 18:25:12 charon /bsd: em0: watchdog: cons 0 prod 494 free
> 530 TDH 494 TDT 494
> >>  Jan 10 18:29:40 charon /bsd: fatal protection fault in supervisor
> mode
> >>  Jan 10 18:29:40 charon /bsd: trap type 4 code 0 rip
> 811f2d60 cs 8 rflags 10282 cr2  ef9ef4cf010 cpl 5 rsp
> 8000319c38c8
> >>  Jan 10 18:29:40 charon /bsd: panic: trap type 4, code=0,
> pc=811f2d60
> >>  Jan 10 18:29:40 charon /bsd: Starting stack trace...
> >>  Jan 10 18:29:40 charon /bsd: panic() at panic+0x10b
> >>  Jan 10 18:29:40 charon /bsd: trap() at trap+0x7b8
> >>  Jan 10 18:29:40 charon /bsd: --- trap (number 4) ---
> >>  Jan 10 18:29:40 charon /bsd: _bpf_mtap() at _bpf_mtap+0x50
> >>  Jan 10 18:29:40 charon /bsd: bpf_mtap_ether() at
> bpf_mtap_ether+0x39
> >>  Jan 10 18:29:40 charon /bsd: em_start() at em_start+0xa7
> >>  Jan 10 18:29:40 charon /bsd: ifq_serialize() at ifq_serialize+0xd9
> >>  Jan 10 18:29:40 charon /bsd: if_enqueue() at if_enqueue+0x71
> >>  Jan 10 18:29:40 charon /bsd: ether_output() at ether_output+0x166
> >>  Jan 10 18:29:40 charon /bsd: ip_output() at ip_output+0x71b
> >>  Jan 10 18:29:40 charon /bsd: ip_forward() at ip_forward+0x1cb
> >>  Jan 10 18:29:40 charon /bsd: ipv4_input() at ipv4_input+0x309
> >>  Jan 10 18:29:40 charon /bsd: ipintr() at ipintr+0x1e
> >>  Jan 10 18:29:40 charon /bsd: netintr() at netintr+0x64
> >>  Jan 10 18:29:40 charon /bsd: softintr_dispatch() at
> softintr_dispatch+0x8b
> >>  Jan 10 18:29:40 charon /bsd: Xsoftnet() at Xsoftnet+0x1f
> >>  Jan 10 18:29:40 charon /bsd: --- interrupt ---
> >>  Jan 10 18:29:40 charon /bsd: end trace frame: 0x0, count: 242
> >>  Jan 10 18:29:40 charon /bsd: taskq_thread+0x6c:
> >>  Jan 10 18:29:40 charon /bsd: End of stack trace.
> >>
> >>
> >>> How-To-Repeat:
> >>  Keep the system running.
> >>> Fix:
> >>  not known
> >
> > ola,
> >
> > could you try -current, specifically after src/sys/dev/pci/if_em.c
> r1.328, and see if this is still a problem?
> >
> > dlg
>
> Ok,
>
> I updated to the latest snapshot (dated Mon Jan 11
> 01:47:58 MST 2016) and with it system seemed to crash while I was away.
> Couldn't see anything in the logs nor was the display or keyboard
> responsive so
> I haven't got any details out of it.
>
> Just updated the source tree and compiled kernel from latest sources.
> Waiting to see
> how it behaves with it.
>
> Timo
>
>

Re: system crash on latest snapshot

2016-01-11 Thread David Gwynne


> On 11 Jan 2016, at 3:59 AM, timo.my...@wickedbsd.net wrote:
> 
>> Synopsis:server occasional hangs / crashes on latest snapshot
>> Category:kernel
>> Environment:
>   System  : OpenBSD 5.9
>   Details : OpenBSD 5.9-beta (GENERIC.MP) #1807: Sat Jan  9 13:46:21 
> MST 2016
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>> Description:
> 
>   I updated my router box to 8.1. snapshot. Noticed that the system
>   would hang / crash occasional in the normal use. 
>   /var/log/messages has following:
>   Jan 10 11:36:17 charon /bsd: em0: watchdog: cons 0 prod 503 free 521 
> TDH 503 TDT 503
>   Jan 10 18:25:12 charon /bsd: em0: watchdog: cons 0 prod 494 free 530 
> TDH 494 TDT 494
>   Jan 10 18:29:40 charon /bsd: fatal protection fault in supervisor mode
>   Jan 10 18:29:40 charon /bsd: trap type 4 code 0 rip 811f2d60 cs 
> 8 rflags 10282 cr2  ef9ef4cf010 cpl 5 rsp 8000319c38c8
>   Jan 10 18:29:40 charon /bsd: panic: trap type 4, code=0, 
> pc=811f2d60
>   Jan 10 18:29:40 charon /bsd: Starting stack trace...
>   Jan 10 18:29:40 charon /bsd: panic() at panic+0x10b
>   Jan 10 18:29:40 charon /bsd: trap() at trap+0x7b8
>   Jan 10 18:29:40 charon /bsd: --- trap (number 4) ---
>   Jan 10 18:29:40 charon /bsd: _bpf_mtap() at _bpf_mtap+0x50
>   Jan 10 18:29:40 charon /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39
>   Jan 10 18:29:40 charon /bsd: em_start() at em_start+0xa7
>   Jan 10 18:29:40 charon /bsd: ifq_serialize() at ifq_serialize+0xd9
>   Jan 10 18:29:40 charon /bsd: if_enqueue() at if_enqueue+0x71
>   Jan 10 18:29:40 charon /bsd: ether_output() at ether_output+0x166
>   Jan 10 18:29:40 charon /bsd: ip_output() at ip_output+0x71b
>   Jan 10 18:29:40 charon /bsd: ip_forward() at ip_forward+0x1cb
>   Jan 10 18:29:40 charon /bsd: ipv4_input() at ipv4_input+0x309
>   Jan 10 18:29:40 charon /bsd: ipintr() at ipintr+0x1e
>   Jan 10 18:29:40 charon /bsd: netintr() at netintr+0x64
>   Jan 10 18:29:40 charon /bsd: softintr_dispatch() at 
> softintr_dispatch+0x8b
>   Jan 10 18:29:40 charon /bsd: Xsoftnet() at Xsoftnet+0x1f
>   Jan 10 18:29:40 charon /bsd: --- interrupt ---
>   Jan 10 18:29:40 charon /bsd: end trace frame: 0x0, count: 242
>   Jan 10 18:29:40 charon /bsd: taskq_thread+0x6c:
>   Jan 10 18:29:40 charon /bsd: End of stack trace.
>   
>   
>> How-To-Repeat:
>   Keep the system running. 
>> Fix:
>   not known

ola,

could you try -current, specifically after src/sys/dev/pci/if_em.c r1.328, and 
see if this is still a problem?

dlg

Re: httpd crashes when fetching a hidden file located on a CD

2015-12-12 Thread David Gwynne


> On 11 Dec 2015, at 9:23 PM, Ted Unangst  wrote:
> 
> Ted Unangst wrote:
>> Jonathan Gray wrote:
 
>>>> There's one thing to add though, it looks like it happens for any file on
>>>> cd9660, not just dotfiles.
>>> 
>>> It is worth pointing out that httpd has had trouble serving files off
>>> specific filesystems in the past due to kqueue issues.
>>> 
>>> cd9660_vops does not currently set .vop_kqfilter, does anything change
>>> if you set EVENT_NOKQUEUE before running httpd?
>> 
>> this maybe adds kqueue to cd9660.
> 
> We've confirmed this diff fixes the problem. However, there seems to be a
> larger problem that httpd/libevent cannot gracefully handle the condition
> where kevent returns an error. We are doomed to see this problem repeat if
> that is not addressed.

on one hand i agree with you, but on the other i wonder why httpd thinks 
setting events up on files is useful.

Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

2015-12-07 Thread David Gwynne


> On 8 Dec 2015, at 10:17, Steven Chamberlain <ste...@pyro.eu.org> wrote:
> 
> Hi,
> 
> David Gwynne wrote:
>> hrm. could you try it without your diff below and see if its still stable?
> 
> I can still reproduce the same crash, if I use your diff without mine.
> 
> I'll try again with latest -CURRENT in light of these recent commits:
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/vnet.c.diff?r1=1.51&r2=1.52&f=h
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/dev/vnet.c.diff?r1=1.52&r2=1.53&f=h

thats a network driver for "hardware" you dont have on an x1.

> I wonder, exactly which pools does dc(4) use?  As with previous crashes,
> dma512 had Pgreq=1, Pgrel=1 meaning it was empty at the time, and had
> already been freed.

dc uses the mbuf pools. the dma512 use comes from an ata driver from what i can 
tell.

maybe i should try and get an x1 to reproduce this with.

Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

2015-12-01 Thread David Gwynne


> On 30 Nov 2015, at 9:01 AM, Steven Chamberlain <ste...@pyro.eu.org> wrote:
> 
> Hi!
> 
> David Gwynne wrote:
>> could you guys try this and let me know how it goes? i dont expect
>> it to fix the problems, but i also dont expect them to get worse.
> 
> I've never seen those panics with dc(4), but I'm now testing -CURRENT
> with your patch applied as well as mine below, and my Netra X1 is stable
> so far.  Thanks!

hrm. could you try it without your diff below and see if its still stable?

my theory is dc is (was?) sensitive to the layout of objects in memory, so by 
moving the pool page headers in or out of the item pages you're moving dc next 
to something that ends up causing the iommu to fault.
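
To make the layout theory a bit more concrete, here is a small userland model of the header-placement branch that the quoted diff touches. It is only a sketch under stated assumptions: `HDR_SIZE` is an assumed stand-in for `sizeof(struct pool_item_header)`, `items = pgsize / size` is assumed as the starting value, and the implicit off-page fallback is a reading of the surrounding code rather than something the diff shows. `pool_layout` and `struct layout` are made-up names.

```c
#include <stddef.h>

/* Assumed stand-in for sizeof(struct pool_item_header); the real value differs. */
#define HDR_SIZE 64

struct layout {
	int    in_page;  /* 1 if the page header lives inside the item page */
	size_t items;    /* number of items that fit in one page */
};

/*
 * Model of the branch in pool_init() from the quoted diff.
 * mult is the multiplier the diff changes (2 before, 8 after).
 */
static struct layout
pool_layout(size_t pgsize, size_t size, size_t mult)
{
	struct layout l;

	l.items = pgsize / size;
	l.in_page = 0;

	if (pgsize - (size * l.items) > HDR_SIZE) {
		/* enough slack at the end of the page to hold the header */
		l.in_page = 1;
	} else if (HDR_SIZE * mult >= size) {
		/* give up some items to make room for the header */
		l.in_page = 1;
		l.items = (pgsize - HDR_SIZE) / size;
	}
	/* otherwise the header is kept off-page (assumed fallback) */
	return l;
}
```

With these assumed numbers, a 512-byte pool on an 8192-byte page keeps its header off-page with a multiplier of 2 (16 items per page) and moves it in-page with a multiplier of 8 (15 items per page), which changes where items, and anything allocated next to them, end up in memory.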

cheers,
dlg


> 
> --- kern/subr_pool.c11 Sep 2015 09:26:13 -  1.193
> +++ kern/subr_pool.c29 Nov 2015 22:59:21 -
> @@ -258,7 +258,7 @@ pool_init(struct pool *pp, size_t size, 
> */
>if (pgsize - (size * items) > sizeof(struct pool_item_header)) {
>off = pgsize - sizeof(struct pool_item_header);
> -   } else if (sizeof(struct pool_item_header) * 2 >= size) {
> +   } else if (sizeof(struct pool_item_header) * 8 >= size) {
>off = pgsize - sizeof(struct pool_item_header);
>items = off / size;
>}
> 
> Regards,
> -- 
> Steven Chamberlain
> ste...@pyro.eu.org

Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

2015-11-28 Thread David Gwynne


> On 29 Nov 2015, at 5:55 AM, Fred <open...@crowsons.com> wrote:
> 
> On 11/27/15 12:43, David Gwynne wrote:
>> On Wed, Nov 25, 2015 at 08:32:19PM +, Fred wrote:
>>> 
>>> Well with that diff I hit another panic - which seems to be
>>> triggered by the nic:
>> 
>> i also think this is related to the nic. i have started cleaning
>> dc(4), but would like some tests before going further.
>> 
>> could you guys try this and let me know how it goes? i dont expect
>> it to fix the problems, but i also dont expect them to get worse.
>> 
>> cheers,
>> dlg
>> 
> 
> Hi David,
> 
> I've been running this diff today - and so far the kernel has been more 
> stable than it has been recently.

ok. i'll put it in. thank you for testing.

i'll have a look at another cleanup soon too.

cheers,
dlg

> 
> Output from ifconfig dc, and dmesg below.
> 
> Thanks
> 
> Fred
> 
> dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>   lladdr 00:03:ba:13:a8:c7
>   priority: 0
>   media: Ethernet autoselect (none)
>   status: no carrier
>   inet 192.168.50.50 netmask 0xff00 broadcast 192.168.50.255
> dc1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>   lladdr 00:03:ba:13:a8:c8
>   priority: 0
>   groups: egress
>   media: Ethernet autoselect (100baseTX full-duplex)
>   status: active
>   inet 192.168.50.51 netmask 0xff00 broadcast 192.168.50.255
> 
> dmesg:OpenBSD 5.8-current (bsddc) #0: Sat Nov 28 14:12:26 GMT 2015
>f...@ultra.crowsons.com:/usr/src/sys/arch/sparc64/compile/bsddc
> real mem = 268435456 (256MB)
> avail mem = 248209408 (236MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root: Sun Fire V100 (UltraSPARC-IIe 500MHz)
> cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
> cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 
> b/l)
> psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
> psycho0: bus range 0-0, PCI bus 0
> psycho0: dvma map 6000-7fff
> pci0 at psycho0
> ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
> "dma" at ebus0 addr 0- ivec 0x2a not configured
> rtc0 at ebus0 addr 70-71: m5819
> power0 at ebus0 addr 2000-2007 ivec 0x23
> lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.11
> com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
> com0: console
> com1 at ebus0 addr 2e8-2ef ivec 0x2b: ns16550a, 16 byte fifo
> "flashprom" at ebus0 addr 0-7 not configured
> alipm0 at pci0 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
> iic0 at alipm0
> "max1617" at alipm0 addr 0x18 skipped due to alipm0 bugs
> spdmem0 at iic0 addr 0x56: 128MB SDRAM registered ECC PC133CL2
> spdmem1 at iic0 addr 0x57: 128MB SDRAM registered ECC PC133CL2
> dc0 at pci0 dev 12 function 0 "Davicom DM9102" rev 0x31: ivec 0x7c6, address 
> 00:03:ba:13:a8:c7
> amphy0 at dc0 phy 1: DM9102 10/100 PHY, rev. 0
> dc1 at pci0 dev 5 function 0 "Davicom DM9102" rev 0x31: ivec 0x7dc, address 
> 00:03:ba:13:a8:c8
> amphy1 at dc1 phy 1: DM9102 10/100 PHY, rev. 0
> ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7e4, 
> version 1.0, legacy support
> pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, 
> channel 0 configured to native-PCI, channel 1 configured to native-PCI
> pciide0: using ivec 0x7cc for native-PCI interrupt
> pciide0: channel 0 disabled (no drives)
> wd0 at pciide0 channel 1 drive 0: 
> wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> atapiscsi0 at pciide0 channel 1 drive 1
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0: <TEAC, CD-224E, 1.7A> ATAPI 5/cdrom removable
> wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> cd0(pciide0:1:1): using PIO mode 4, Ultra-DMA mode 2
> usb0 at ohci0: USB revision 1.0
> uhub0 at usb0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> bootpath: /pci@1f,0/ide@d,0/disk@2,0
> root on wd0a (a0af7098621c5786.a) swap on wd0b dump on wd0b
>

Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

2015-11-27 Thread David Gwynne

On Wed, Nov 25, 2015 at 08:32:19PM +, Fred wrote:
> 
> Well with that diff I hit another panic - which seems to be
> triggered by the nic:

i also think this is related to the nic. i have started cleaning
dc(4), but would like some tests before going further.

could you guys try this and let me know how it goes? i dont expect
it to fix the problems, but i also dont expect them to get worse.

cheers,
dlg

Index: dc.c
===
RCS file: /cvs/src/sys/dev/ic/dc.c,v
retrieving revision 1.148
diff -u -p -r1.148 dc.c
--- dc.c25 Nov 2015 03:09:58 -  1.148
+++ dc.c27 Nov 2015 12:37:04 -
@@ -125,8 +125,7 @@
 int dc_intr(void *);
 struct dc_type *dc_devtype(void *);
 int dc_newbuf(struct dc_softc *, int, struct mbuf *);
-int dc_encap(struct dc_softc *, struct mbuf *, u_int32_t *);
-int dc_coal(struct dc_softc *, struct mbuf **);
+int dc_encap(struct dc_softc *, bus_dmamap_t, struct mbuf *, u_int32_t *);
 
 void dc_pnic_rx_bug_war(struct dc_softc *, int);
 int dc_rx_resync(struct dc_softc *);
@@ -1658,17 +1657,19 @@ hasmac:
BUS_DMA_NOWAIT, &sc->sc_rx_sparemap) != 0) {
printf(": can't create rx spare map\n");
return;
-   }
+   }   
 
for (i = 0; i < DC_TX_LIST_CNT; i++) {
if (bus_dmamap_create(sc->sc_dmat, MCLBYTES,
-   DC_TX_LIST_CNT - 5, MCLBYTES, 0, BUS_DMA_NOWAIT,
+   (sc->dc_flags & DC_TX_COALESCE) ? 1 : DC_TX_LIST_CNT - 5,
+   MCLBYTES, 0, BUS_DMA_NOWAIT,
&sc->dc_cdata.dc_tx_chain[i].sd_map) != 0) {
printf(": can't create tx map\n");
return;
}
}
-   if (bus_dmamap_create(sc->sc_dmat, MCLBYTES, DC_TX_LIST_CNT - 5,
+   if (bus_dmamap_create(sc->sc_dmat, MCLBYTES,
+   (sc->dc_flags & DC_TX_COALESCE) ? 1 : DC_TX_LIST_CNT - 5,
MCLBYTES, 0, BUS_DMA_NOWAIT, &sc->sc_tx_sparemap) != 0) {
printf(": can't create tx spare map\n");
return;
@@ -2488,39 +2489,14 @@ dc_intr(void *arg)
  * pointers to the fragment pointers.
  */
 int
-dc_encap(struct dc_softc *sc, struct mbuf *m_head, u_int32_t *txidx)
+dc_encap(struct dc_softc *sc, bus_dmamap_t map, struct mbuf *m, u_int32_t *idx)
 {
struct dc_desc *f = NULL;
int frag, cur, cnt = 0, i;
-   bus_dmamap_t map;
-
-   /*
-* Start packing the mbufs in this chain into
-* the fragment pointers. Stop when we run out
-* of fragments or hit the end of the mbuf chain.
-*/
-   map = sc->sc_tx_sparemap;
-
-   if (bus_dmamap_load_mbuf(sc->sc_dmat, map,
-   m_head, BUS_DMA_NOWAIT) != 0)
-   return (ENOBUFS);
 
-   cur = frag = *txidx;
+   cur = frag = *idx;
 
for (i = 0; i < map->dm_nsegs; i++) {
-   if (sc->dc_flags & DC_TX_ADMTEK_WAR) {
-   if (*txidx != sc->dc_cdata.dc_tx_prod &&
-   frag == (DC_TX_LIST_CNT - 1)) {
-   bus_dmamap_unload(sc->sc_dmat, map);
-   return (ENOBUFS);
-   }
-   }
-   if ((DC_TX_LIST_CNT -
-   (sc->dc_cdata.dc_tx_cnt + cnt)) < 5) {
-   bus_dmamap_unload(sc->sc_dmat, map);
-   return (ENOBUFS);
-   }
-
f = &sc->dc_ldata->dc_tx_list[frag];
f->dc_ctl = htole32(DC_TXCTL_TLINK | map->dm_segs[i].ds_len);
if (cnt == 0) {
@@ -2535,12 +2511,12 @@ dc_encap(struct dc_softc *sc, struct mbu
}
 
sc->dc_cdata.dc_tx_cnt += cnt;
-   sc->dc_cdata.dc_tx_chain[cur].sd_mbuf = m_head;
+   sc->dc_cdata.dc_tx_chain[cur].sd_mbuf = m;
sc->sc_tx_sparemap = sc->dc_cdata.dc_tx_chain[cur].sd_map;
sc->dc_cdata.dc_tx_chain[cur].sd_map = map;
sc->dc_ldata->dc_tx_list[cur].dc_ctl |= htole32(DC_TXCTL_LASTFRAG);
if (sc->dc_flags & DC_TX_INTR_FIRSTFRAG)
-   sc->dc_ldata->dc_tx_list[*txidx].dc_ctl |=
+   sc->dc_ldata->dc_tx_list[*idx].dc_ctl |=
htole32(DC_TXCTL_FINT);
if (sc->dc_flags & DC_TX_INTR_ALWAYS)
sc->dc_ldata->dc_tx_list[cur].dc_ctl |=
@@ -2551,43 +2527,9 @@ dc_encap(struct dc_softc *sc, struct mbu
bus_dmamap_sync(sc->sc_dmat, map, 0, map->dm_mapsize,
BUS_DMASYNC_PREWRITE);
 
-   sc->dc_ldata->dc_tx_list[*txidx].dc_status = htole32(DC_TXSTAT_OWN);
-
-   bus_dmamap_sync(sc->sc_dmat, sc->sc_listmap,
-   offsetof(struct dc_list_data, dc_tx_list[*txidx]),
-   sizeof(struct dc_desc) * cnt,
-   BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
-
-   *txidx = frag;
+   sc->dc_ldata->dc_tx_list[*idx].dc_status = htole32(DC_TXSTAT_OWN);
 
-   return (0);
-}
-
-/*
- * Coalesce an mbuf chain into a single mbuf

Re: kernel panic - sparc64 on Netra X1 - psycho0: uncorrectable DMA error AFAR

2015-11-19 Thread David Gwynne


> On 20 Nov 2015, at 09:48, Steven Chamberlain  wrote:
> 
> Hi,
> 
> Fred wrote:
>> I've just updated to:
>> OpenBSD 5.8-current (GENERIC) #801: Wed Nov 18 16:37:51 MST 2015
> 
> I'd used:
> OpenBSD 5.8-current (GENERIC) #799: Wed Nov 18 01:34:20 MST 2015
> 
>> but I still had the following panic:
>> 
>> panic: psycho0: uncorrectable DMA error AFAR 663f8450 (pa=0 tte=0/6218a012)
>> AFSR 41ff4080
> 
> Damn!  although this was stable for a couple of hours yesterday
> (and that's quite an improvement to how it was) - today I booted
> it up and it crashed on my very first SSH login attempt.
> 
> Thank you for re-testing anyway.  I notice your machine crashed in
> process sshd also.  My backtrace looks slightly different to yours so I
> share it here:

my guess is this has something to do with the handling of long mbuf chains for 
transmit. i say that because ssh is very good at generating these long chains, 
and it has caused problems on various other chips. either that or the chip has 
alignment or minimum transfer requirements that such packets dont respect.

if you're inclined to play with the code, try making the code coalesce 
(dc_coal) every packet and see if the problem still occurs.
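
The coalescing experiment amounts to copying each chain into one contiguous buffer before handing it to the chip, so the device only ever sees a single segment. A userland sketch of that idea follows. `struct frag`, `chain_len` and `coalesce` are made-up names for illustration; this is not the driver's `dc_coal`, which works on real mbufs and clusters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userland stand-in for an mbuf: one fragment of a packet. */
struct frag {
	const unsigned char *data;
	size_t               len;
	struct frag         *next;
};

/* Total payload length across the whole chain. */
static size_t
chain_len(const struct frag *f)
{
	size_t total = 0;

	for (; f != NULL; f = f->next)
		total += f->len;
	return total;
}

/*
 * Copy every fragment into one contiguous buffer.
 * Returns NULL if the packet is empty, larger than max, or if
 * allocation fails. The caller frees the result.
 */
static unsigned char *
coalesce(const struct frag *f, size_t max, size_t *outlen)
{
	size_t total = chain_len(f), off = 0;
	unsigned char *buf;

	if (total == 0 || total > max)
		return NULL;
	if ((buf = malloc(total)) == NULL)
		return NULL;
	for (; f != NULL; f = f->next) {
		memcpy(buf + off, f->data, f->len);
		off += f->len;
	}
	*outlen = total;
	return buf;
}
```

If the panics stop once every packet goes through a path like this, that would point at the long-chain handling rather than at the pool layout.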

dlg


> 
> panic: psycho0: uncorrectable DMA error AFAR 6e868448 (pa=0 tte=0/69a12012) 
> AFSR 4100ff002080
> Stopped at  Debugger+0x8:   nop
>   TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *13074  13074  00x12  00  sshd
> psycho_ue(48a3200, 0, 4000fb13bc8, 0, 0, 0) at psycho_ue+0x7c
> intr_handler(e0017ec8, 48a3300, 58c86, 4000fb13cd8, 1, 0) at 
> intr_handler+0xc
> sparc_interrupt(0, 4000fb13db0, 4000fb13df0, 0, 0, 14b) at 
> sparc_interrupt+0x298
> syscall(4000fb13ed0, 404, 1cd56442e8, 1cd56442ec, 0, 0) at syscall+0x34c
> softtrap(3, 1c49412ee4, 54, 0, 0, 0) at softtrap+0x19c
> http://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb> trace
> psycho_ue(48a3200, 0, 4000fb13bc8, 0, 0, 0) at psycho_ue+0x7c
> intr_handler(e0017ec8, 48a3300, 58c86, 4000fb13cd8, 1, 0) at 
> intr_handler+0xc
> sparc_interrupt(0, 4000fb13db0, 4000fb13df0, 0, 0, 14b) at 
> sparc_interrupt+0x298
> syscall(4000fb13ed0, 404, 1cd56442e8, 1cd56442ec, 0, 0) at syscall+0x34c
> softtrap(3, 1c49412ee4, 54, 0, 0, 0) at softtrap+0x19c
> ddb> ps
>   TID   PPID   PGRPUID  S   FLAGS  WAIT  COMMAND
>  4621  13074   4621  0  20x11sshd
> *13074   5310  13074  0  70x12sshd
> 19768  1  19768  0  30x83  ttyin getty
>  9712  1   9712  0  30x80  poll  cron
> 30895  1  30895 99  30x90  poll  sndiod
> 24430  1  24430 79  30x90  kqreadtftpd
> 15649  18969  18969 95  30x90  kqreadsmtpd
>  9170  18969  18969 95  30x90  kqreadsmtpd
> 31583  18969  18969 95  30x90  kqreadsmtpd
>46  18969  18969 95  30x90  kqreadsmtpd
> 32085  18969  18969 95  30x90  kqreadsmtpd
> 14545  18969  18969103  30x90  kqreadsmtpd
> 18969  1  18969  0  30x80  kqreadsmtpd
> 17656  10194  10194  0  30x82  piperdcat
> 10194  17726  10194  0  30x8a  pause ksh
> 24201  1  24201 77  30x90  poll  dhcpd
> 17726   5310  17726  0  20x12sshd
>  5310  1   5310  0  30x80  selectsshd
> 11444  22135  26256 83  30x90  poll  ntpd
> 22135  26256  26256 83  30x90  poll  ntpd
> 26256  1  26256  0  30x80  poll  ntpd
> 20693  19390  19390 74  30x90  bpf   pflogd
> 19390  1  19390  0  30x80  netio pflogd
> 18490  1  1 73  30x90  kqreadsyslogd
> 1  1  1  0  30x80  netio syslogd
> 19396  0  0  0  2 0x14200zerothread
> 27403  0  0  0  3 0x14200  aiodoned  aiodoned
> 30791  0  0  0  3 0x14200  syncerupdate
>  5885  0  0  0  3 0x14200  cleaner   cleaner
> 17162  0  0  0  3 0x14200  reaperreaper
>  6922  0  0  0  3 0x14200  pgdaemon  pagedaemon
> 10422  0  0  0  3 0x14200  bored crypto
>  7995  0  0  0  3 0x14200  pftm  pfpurge
> 28357  0  0  0  3 0x14200  usbtskusbtask
> 16205  0  0  0  3 0x14200  usbatsk   usbatsk
>  9316  0  0  0  3 0x14200  bored sensors
> 14821  0  0  0  3 0x14200  bored softnet
> 25203  0  0  0  3 0x14200  bored systqmp
> 20530  0  0

Re: OpenBSD 5.6 Kernel panic - panic: mtx_enter: locking against myself

2015-05-28 Thread David Gwynne

On Mon, Mar 30, 2015 at 05:01:57PM -0500, Walter Daugherity wrote:
> You are right, there were two interrupts.  Here is the latest crash trace,
> ps, show registers, and dmesg.   (There was also another crash which just
> died and never went to ddb.)

you could try the following diff to see if it fixes this problem:

Index: if_ppp.c
===
RCS file: /cvs/src/sys/net/if_ppp.c,v
retrieving revision 1.83
diff -u -p -r1.83 if_ppp.c
--- if_ppp.c13 May 2015 10:42:46 -  1.83
+++ if_ppp.c28 May 2015 05:51:28 -
@@ -154,18 +154,10 @@ static void   ppp_ifstart(struct ifnet *if
 intppp_clone_create(struct if_clone *, int);
 intppp_clone_destroy(struct ifnet *);
 
-/*
- * Some useful mbuf macros not in mbuf.h.
- */
-#define M_IS_CLUSTER(m) ((m)->m_flags & M_EXT)
-
-#define M_DATASTART(m) \
-   (M_IS_CLUSTER(m) ? (m)->m_ext.ext_buf : \
-   (m)->m_flags & M_PKTHDR ? (m)->m_pktdat : (m)->m_dat)
-
-#define M_DATASIZE(m)  \
-   (M_IS_CLUSTER(m) ? (m)->m_ext.ext_size : \
-   (m)->m_flags & M_PKTHDR ? MHLEN: MLEN)
+void   ppp_pkt_list_init(struct ppp_pkt_list *, u_int);
+intppp_pkt_enqueue(struct ppp_pkt_list *, struct ppp_pkt *);
+struct ppp_pkt *ppp_pkt_dequeue(struct ppp_pkt_list *);
+struct mbuf *  ppp_pkt_mbuf(struct ppp_pkt *);
 
 /*
  * We steal two bits in the mbuf m_flags, to mark high-priority packets
@@ -234,7 +226,7 @@ ppp_clone_create(struct if_clone *ifc, i
 IFQ_SET_MAXLEN(&sc->sc_if.if_snd, IFQ_MAXLEN);
 mq_init(&sc->sc_inq, IFQ_MAXLEN, IPL_NET);
 IFQ_SET_MAXLEN(&sc->sc_fastq, IFQ_MAXLEN);
-IFQ_SET_MAXLEN(&sc->sc_rawq, IFQ_MAXLEN);
+ppp_pkt_list_init(&sc->sc_rawq, IFQ_MAXLEN);
 IFQ_SET_READY(&sc->sc_if.if_snd);
 if_attach(&sc->sc_if);
 if_alloc_sadl(&sc->sc_if);
@@ -315,6 +307,7 @@ pppalloc(pid_t pid)
 void
 pppdealloc(struct ppp_softc *sc)
 {
+struct ppp_pkt *pkt;
 struct mbuf *m;
 
 splsoftassert(IPL_SOFTNET);
@@ -323,12 +316,8 @@ pppdealloc(struct ppp_softc *sc)
 sc->sc_if.if_flags &= ~(IFF_UP|IFF_RUNNING);
 sc->sc_devp = NULL;
 sc->sc_xfer = 0;
-for (;;) {
-   IF_DEQUEUE(&sc->sc_rawq, m);
-   if (m == NULL)
-   break;
-   m_freem(m);
-}
+while ((pkt = ppp_pkt_dequeue(&sc->sc_rawq)) != NULL)
+   ppp_pkt_free(pkt);
 while ((m = mq_dequeue(&sc->sc_inq)) != NULL)
m_freem(m);
 for (;;) {
@@ -1052,31 +1041,28 @@ void
 pppintr(void)
 {
 struct ppp_softc *sc;
-int s, s2;
+int s;
+struct ppp_pkt *pkt;
 struct mbuf *m;
 
 splsoftassert(IPL_SOFTNET);
 
-s = splsoftnet();  /* XXX - what's the point of this? see comment above */
 LIST_FOREACH(sc, &ppp_softc_list, sc_list) {
if (!(sc->sc_flags & SC_TBUSY) &&
 (!IFQ_IS_EMPTY(&sc->sc_if.if_snd) ||
!IFQ_IS_EMPTY(&sc->sc_fastq))) {
-   s2 = splnet();
+   s = splnet();
sc->sc_flags |= SC_TBUSY;
-   splx(s2);
+   splx(s);
(*sc->sc_start)(sc);
}
-   while (!IFQ_IS_EMPTY(&sc->sc_rawq)) {
-   s2 = splnet();
-   IF_DEQUEUE(&sc->sc_rawq, m);
-   splx(s2);
+   while ((pkt = ppp_pkt_dequeue(&sc->sc_rawq)) != NULL) {
+   m = ppp_pkt_mbuf(pkt);
if (m == NULL)
-   break;
+   continue;
ppp_inproc(sc, m);
}
 }
-splx(s);
 }
 
 #ifdef PPP_COMPRESS
@@ -1199,15 +1185,11 @@ ppp_ccp_closed(struct ppp_softc *sc)
  * were omitted.
  */
 void
-ppppktin(struct ppp_softc *sc, struct mbuf *m, int lost)
+ppppktin(struct ppp_softc *sc, struct ppp_pkt *pkt, int lost)
 {
-int s = splnet();
-
-if (lost)
-   m->m_flags |= M_ERRMARK;
-IF_ENQUEUE(&sc->sc_rawq, m);
-schednetisr(NETISR_PPP);
-splx(s);
+pkt->p_hdr.ph_errmark = lost;
+if (ppp_pkt_enqueue(&sc->sc_rawq, pkt) == 0)
+   schednetisr(NETISR_PPP);
 }
 
 /*
@@ -1389,19 +1371,6 @@ ppp_inproc(struct ppp_softc *sc, struct 
 }
 #endif /* VJC */
 
-/*
- * If the packet will fit in a header mbuf, don't waste a
- * whole cluster on it.
- */
-if (ilen <= MHLEN && M_IS_CLUSTER(m)) {
-   MGETHDR(mp, M_DONTWAIT, MT_DATA);
-   if (mp != NULL) {
-   m_copydata(m, 0, ilen, mtod(mp, caddr_t));
-   m_freem(m);
-   m = mp;
-   m->m_len = ilen;
-   }
-}
 m->m_pkthdr.len = ilen;
 m->m_pkthdr.rcvif = ifp;
 
@@ -1542,4 +1511,96 @@ ppp_ifstart(struct ifnet *ifp)
sc = ifp->if_softc;
(*sc->sc_start)(sc);
 }
+
+void
+ppp_pkt_list_init(struct ppp_pkt_list *pl, u_int limit)
+{
+   mtx_init(&pl->pl_mtx, IPL_TTY);
+   pl->pl_head = pl->pl_tail = NULL;
+   pl->pl_count = 0;
+   pl->pl_limit = limit;
+}
+
+int
+ppp_pkt_enqueue(struct ppp_pkt_list *pl, struct ppp_pkt *pkt)
+{
+   int drop = 0;
+
+   mtx_enter(&pl->pl_mtx);
+   if (pl->pl_count < pl->pl_limit) {
+   if (pl->pl_tail == NULL)
+

Re: Panic: malloc: out of space in kmem_map

2015-04-07 Thread David Gwynne


> On 7 Apr 2015, at 6:38 pm, Evgeniy Sudyr <eject.in...@gmail.com> wrote:
> 
> David,
> 
> yes, there are next changes in sysctl.conf, but kernel options were
> untouched (again it was GENERIC.MP -stable).
> 
> $ cat /etc/sysctl.conf
> net.inet.ip.forwarding=1
> net.inet.carp.preempt=1
> net.inet6.ip6.forwarding=1
> kern.maxfiles=5048026
> kern.maxclusters=200

why did you raise those last two values?

dlg

ps. that last one is the cause of your panics.

 
 
 On Tue, Apr 7, 2015 at 9:37 AM, David Gwynne da...@gwynne.id.au wrote:
 
 On 6 Apr 2015, at 05:32, Evgeniy Sudyr eject.in...@gmail.com wrote:
 
 Mark, I will dig in to this.
 
 Sorry, but can someone give a hint about which pool values
 there are unusual and could be related to the kernel panic I've reported
 at the very beginning?
 
 Current  vmstat -m output is:
 
 the abbreviated version below is kind of interesting.
 
 are you setting the kern.maxclusters sysctl? if so, to what value?
 
 
 Memory Totals:  In Use    Free    Requests
                 76695K    862K    24831415
 Memory resource pool statistics
 NameSize Requests FailInUse Pgreq Pgrel Npage Hiwat Minpg Maxpg 
 Idle
 mbpl 256 2741641011  0  346  4789 0 0  4789 1
 125000 4767
 mcl2k   2048 1108887843  0  183 10052 0 0 10052 4
 100 9959
 
 In use 210238K, total allocated 0K; utilization inf%
 
 
 Will update if will find something...
 
 On Sun, Apr 5, 2015 at 6:59 PM, Mark Kettenis mark.kette...@xs4all.nl 
 wrote:
 Date: Sun, 5 Apr 2015 18:44:43 +0200
 From: Evgeniy Sudyr eject.in...@gmail.com
 
 Stuart,
 
 as part of troubleshooting, BIOS was upgraded from R 3.0 to latest R 3.2
 
 http://www.supermicro.com/products/motherboard/Xeon/C600/X9SRW-F.cfm 
 X9SRW5.115
 
 How likely is it that we hit a bug which was fixed in the latest BIOS
 release, so that this will not occur again? Did you notice anything we
 can check with Supermicro support to make sure?
 
 So far I've not seen any real evidence that the BIOS is causing
 problems.  Ted noticed the higher-than-usual ACPI memory usage,
 suggesting a memory leak.  This made Stuart suggest that it might be
 worth updating your BIOS.  But we haven't actually established that
 there is indeed a memory leak.  In fact the information you posted
 earlier suggests that there is no ACPI memory leak, or at least not
 one directly related to executing AML.
 
 You'll really need to do some digging yourself here.  Look at the
 vmstat -m output immediately after booting your machine.  Then keep
 looking at it periodically and identify the memory types and pools
 that keep growing.  For malloc'ed memory look at the MemUse column
 under Memory statistics by type.  For pools, look at the InUse
 column under Memory resource pool statistics.
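
The comparison described above can be sketched as a small helper that takes two snapshots of the InUse column and flags the pools that grew. This is only an illustration: `struct pool_sample` and `pools_grown` are hypothetical names, and getting the numbers out of `vmstat -m` is left to the reader.

```c
#include <stddef.h>
#include <string.h>

/* One row of interest from a vmstat -m snapshot (hypothetical struct). */
struct pool_sample {
	const char   *name;
	unsigned long inuse;
};

/*
 * Compare two snapshots. For each pool in `now`, grew[i] is set to 1
 * if the same-named pool in `before` had a smaller InUse value.
 * grew must have room for nn entries. Returns how many pools grew.
 */
static size_t
pools_grown(const struct pool_sample *before, size_t nb,
    const struct pool_sample *now, size_t nn, int *grew)
{
	size_t i, j, count = 0;

	for (i = 0; i < nn; i++) {
		grew[i] = 0;
		for (j = 0; j < nb; j++) {
			if (strcmp(now[i].name, before[j].name) == 0 &&
			    now[i].inuse > before[j].inuse) {
				grew[i] = 1;
				count++;
				break;
			}
		}
	}
	return count;
}
```

A single increase between two samples proves nothing by itself, since pools grow and shrink with load. It is the pools that are flagged in every comparison over a long period that deserve a closer look.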
 
 
 
 --
 --
 With regards,
 Eugene Sudyr
 
 
 
 
 
 -- 
 --
 With regards,
 Eugene Sudyr

Re: modload issues on armv7

2014-11-03 Thread David Gwynne


 On 29 Oct 2014, at 09:11, Dimitri Sokolyuk de...@dim13.org wrote:
 
 
 On 28 Oct 2014, at 20:15, Stuart Henderson st...@openbsd.org wrote:
 
 On 2014/10/28 19:03, Dimitri Sokolyuk wrote:
 Synopsis:  modload (ld) issues on armv7 (beaglebone black)
 Category:  system, ld, kernel
 Environment:
 System  : OpenBSD 5.6
 Details : OpenBSD 5.6-current (GENERIC-OMAP) #5: Thu Oct  9 
 16:58:24 AEDT 2014
  
 r...@armv7.jsg.id.au:/usr/src/sys/arch/armv7/compile/GENERIC-OMAP
 
 Architecture: OpenBSD.armv7
 Machine : armv7
 Description:
 ld fails with internal error on lkm load:
 
 ld -nopie -Z -R /dev/ksyms -e test_lkmentry -o test -Ttext 0x0 
 combined.o
 internal error: aborting at /usr/src/gnu/usr.bin/binutils/ld/ldlang.c 
 line 3835 in lang_place_orphans
 ld: please report this bug
 
 LKM support was removed.
 
 Really sad news. :( Sorry, I’ve missed the announcement 
 (http://www.openbsd.org/faq/current.html#20141013) 
 
 It’s a wrong mailing list, but still, which is a recommended way to develop 
 user kernel extensions and new drivers then?

i build new code as part of the monolithic kernel, copy it to /bsd, and reboot 
to try it. i keep a copy of a working kernel as bsd.working in case my changes 
dont work.

 
 LKM didn’t get much love in past years, but still it was a very powerful tool 
 for small and useful things, which will never make into base code.
 
 --
 Dimitri Sokolyuk — 0x5a7c3054 — http://www.dim13.org/

Re: pfsync over ipsec is broken

2014-10-18 Thread David Gwynne


 On 18 Oct 2014, at 11:42 pm, Stefan Sperling s...@openbsd.org wrote:
 
 On Fri, Oct 17, 2014 at 10:51:54AM +1000, David Gwynne wrote:
 as discussed, a fix for this has been committed in src/sys/net/if_pfsync.c 
 r1.210
 
 thank you for the good bug report. your recipe was easy to follow.
 
 
 There is still another problem with pfsync and IPsec.
 
 Using the previously described setup, the pfsync peers don't properly
 keep their pf states in sync while IPsec is in use. The reason seems
 to be that once local TDBs used for protecting pfsync traffic are synced
 to the peer then IPsec replay checks trigger on the peer's side when
 future pfsync updates are sent:
 
 # netstat -p esp -s | grep replay
304 possibly replayed packets received
 
 In this state, pfsync updates are being dropped by the peer's IPsec stack.
 Running tcpdump on enc0 shows that there is no bi-directional pfsync traffic.
 The peers are sending pfsync updates out but neither is receiving them.
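
For readers unfamiliar with replay checks, the following is a generic sliding-window anti-replay check in the style of RFC 4303. It is a textbook sketch, not OpenBSD's IPsec code, and `struct replay_state` and `replay_check` are made-up names. How exactly a synced TDB disturbs the counters on the peer is the theory under discussion here; it can only be mimicked in this model by overwriting `last` by hand.

```c
#include <stdint.h>

#define REPLAY_WINDOW 64

struct replay_state {
	uint32_t last;    /* highest sequence number accepted so far */
	uint64_t bitmap;  /* bit i set: sequence (last - i) was already seen */
};

/* Returns 1 and records seq if it is acceptable, 0 if it looks replayed. */
static int
replay_check(struct replay_state *rs, uint32_t seq)
{
	uint32_t diff;

	if (seq == 0)
		return 0;               /* sequence numbers start at 1 */
	if (seq > rs->last) {
		/* newer than anything seen: slide the window forward */
		diff = seq - rs->last;
		rs->bitmap = (diff < REPLAY_WINDOW) ?
		    (rs->bitmap << diff) | 1 : 1;
		rs->last = seq;
		return 1;
	}
	diff = rs->last - seq;
	if (diff >= REPLAY_WINDOW)
		return 0;               /* too old: left of the window */
	if (rs->bitmap & ((uint64_t)1 << diff))
		return 0;               /* duplicate inside the window */
	rs->bitmap |= (uint64_t)1 << diff;
	return 1;
}
```

In this model, if the receiver's `last` jumps far ahead of what the sender is actually using, every packet the sender emits afterwards falls to the left of the window and is counted as a possible replay until the sender's counter catches up.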
 
 To test my theory I've verified that turning pfsync_update_tdb() into a
 stub that returns immediately allows both peers to keep pf states in sync
 and tcpdump on enc0 now shows bi-directional pfsync traffic.
 
 I'm not sure what the best fix is. Perhaps making pfsync prevent particular
 TDBs from being synced would do. However, this problem might affect any
 SAs negotiated between the peers. In case the peers have multiple
 IPsec-protected links between them we might have to prevent syncing IPsec
 state not directly related to pfsync. The question then becomes which
 TDB's can actually be synced without breaking communication between the
 peers, in general, without knowing what addresses the remote peer is using
 besides the syncpeer address.
 
 I don't know enough about IPsec to come up with an answer I feel confident
 about. The best answer I could come up with so far was any TDB that uses
 a source address from any local interface should be exempt from syncing.
 Is that acceptable? Other ideas?

pfsync implicitly adds NO_SYNC on pf states for pfsync traffic. state update 
messages never occur on the wire for pfsync protocol traffic. i'd argue 
something that gives the same result is necessary here. the problem is the flow 
you set up to protect pfsync traffic also protects all traffic between the 
hosts, not just pfsync.

on my big boxes at work i try to have something along the lines of "pass 
!received-on any keep state (no-sync)" to avoid creating states for connections 
that are locally terminated. i dont know if there was a no-sync flag added 
for ipsec flows that pfsync could work with, but that would be necessary in 
this situation.

dlg

Re: bge NOT work on Dell R720

2013-04-06 Thread David Gwynne

i'll try to chase this down, but its hard going by the freebsd bug report cos 
its lots of vague times and no references to specific revisions of their driver.

i have r420s and r520s with 5720s in them which work fine. i do have a r720 i 
can try, but its hard to pull out of production for this kind of testing.

i'll try to get to this soon.

cheers,
dlg

On 05/04/2013, at 10:04 PM, Robert Young yay...@gmail.com wrote:

 Dell PowerEdge R720
 Broadcom Gigabit Ethernet BCM5720
 
 Tested kernel:
 http://ftp.openbsd.org/pub/OpenBSD/snapshots/amd64/bsd.rd
 http://ftp.openbsd.org/pub/OpenBSD/snapshots/amd64/bsd.mp
 
 dmesg:
 OpenBSD 5.3-current (RAMDISK_CD) #96: Wed Apr 3 02:19:34 MDT 2013
 bge2 at pci2 dev 0 function 0 Broadcom BCM5720 rev 0x00, BCM5720 A0
 (0x572), APE firmware NCSI 1.1.7.0: apic 1 int 3, address
 90:b1:1c:3a:a8:19
 brgphy2 at bge2 phy 1: BCM5720C 10/100/1000baseT PHY, rev. 0
 
 It's OK 156 .
 ping -s 156 10.2.1.29
 164 bytes from ...
 
 No reply at >= 157 (NIC not reset; returns to normal with a small-packet test):
 ping -s 157 10.2.1.29
 ... 0 packets received
 
 With an even larger packet the NIC resets (after a wait it returns to normal):
 ping -s 275 10.2.1.29
 bge2: watchdog timeout -- resetting
 
 
 FreeBSD have encountered same issue,I tested, same issue found in:
 http://ftp.freebsd.org/pub/FreeBSD/ISO-IMAGES-amd64/9.1/FreeBSD-9.1-RELEASE-amd64-dvd1.iso
 
 This problem was fixed by FreeBSD team:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=171121
 
 I tested, This version have fixed this problem:
 http://ftp.freebsd.org/pub/FreeBSD/snapshots/amd64/amd64/ISO-IMAGES/10.0/FreeBSD-10.0-CURRENT-amd64-20130323-r248655-release.iso

Re: OpenBSD crash on an IBM x3550 M3

2011-03-04 Thread David Gwynne

i agree that mikeb's change should go in.

On 05/03/2011, at 12:10 AM, Mark Kettenis wrote:

 Date: Fri, 4 Mar 2011 07:30:24 -0600
 From: Marco Peereboom sl...@peereboom.us

 That is a huge penalty because it is read over the pci bus.  The trick
 with 0xffffffff should work just fine per the doco and other os' drivers
 (off the top of my head).  The question I have is does Linux only have one
 device per interrupt?

 Linux probably does a better job at avoiding shared interrupts than we
 do, but on some hardware it can't be avoided so it has to deal with
 it.

 If you want to avoid reading the interrupt status register, you'll have
 to stop trusting the hardware (or rather the firmware) in
 mpi_reply(), and do bounds checks before accessing sc->sc_rcbs[] and
 sc->sc_ccbs[].  To be honest, that would be a good idea even if we
 didn't have this bug.
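
The kind of bounds check suggested above could look roughly like the sketch below. It is a userland illustration only: `NCCBS`, `struct ccb`, `ccbs` and `ccb_lookup` are hypothetical names and sizes, not the layout of the real `sc_ccbs[]` in mpi(4).

```c
#include <stddef.h>
#include <stdint.h>

#define NCCBS 128   /* assumed table size, for illustration only */

struct ccb {
	int in_use;     /* non-zero while a command is in flight */
};

static struct ccb ccbs[NCCBS];

/*
 * Look up the command block for an index reported by the device.
 * The index is treated as untrusted: anything outside the table, or
 * pointing at a slot with no command in flight, yields NULL instead
 * of a wild pointer.
 */
static struct ccb *
ccb_lookup(uint32_t id)
{
	if (id >= NCCBS)
		return NULL;
	if (!ccbs[id].in_use)
		return NULL;
	return &ccbs[id];
}
```

A caller would drop the reply when the lookup returns NULL, so a garbled reply costs one lost completion rather than a page fault.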

 In the meantime I think mikeb's fix should be committed.

 I am going to reference the doco one more time on this.

 On Thu, Mar 03, 2011 at 10:35:59PM -0500, Kenneth R Westerback wrote:
 On Thu, Mar 03, 2011 at 07:11:52PM +0100, Mike Belopuhov wrote:
 On Fri, Feb 04, 2011 at 14:53 +, emeric boit wrote:
 Hello,

 After doing a clean install of OpenBSD 4.8 (AMD64) on an IBM x3550 M3,
 I find
 the
 system randomly panics after a period of use.
 uvm_fault(0x80cc8360, 0x8000149b7000, 0, 1) - e
 kernel: page
 fault trap, code=0
 Stopped at  mpi_reply+0x102:movq
 0(%r13),%rax
 ddb{0}

 ddb{0} trace
 mpi_reply() at mpi_reply+0x102
 mpi_intr()
 at mpi_intr+0x20
 Xintr_ioapic_level18() at Xintr_ioapic_level18+0xec
 ---
 interrupt ---
 Bad frame pointer: 0x8000194e1920
 end trace frame:
 0x8000194e1920, count: -3
 Xspllower+0xe:
 ddb{0}

 We've tried different things, but after this hint i realised
 that what might be happening is that bnx and mpi interrupts
 are chained (it's bnx0 actually, my initial guess about bnx1
 was wrong) and mpi_intr is called first.  Currently neither
 mpi(4) nor mpii(4) checks the interrupt status register
 but look directly into the reply post queue.  Although,
 there's not supposed to be any race between host cpu reading
 from the memory and ioc writing to it, in practice it turns
 out that in some particular hardware configurations this rule
 is violated and we read a garbled reply from the controller.

 If my memory serves, I've considered this for the mpii_intr
 but never got into the situation where it was needed and
 thus omitted it.  I guess I have to bring it back too.

 Emeric tortured the machine with this diff and reported that
 it solves the issue for him.  OK to commit?

 On Wed, Mar 02, 2011 at 17:20 +, emeric boit wrote:
 hi,

 This change doesn't solve the issue.

 I have noticed that the server crashes when I use the network.

 I can copy a small file several times without a problem.
 On the IBM I do:
 scp USER@IP:/tmp/mpi.c .

 But when I copy a larger file the server crashes:
 scp USER@IP:/bsd .

 And when I copy the same file (bsd) from a usb key I don't have a
 problem.

 Emeric.

 that sounds like an interrupt sharing bug of some sort.
 is it bnx1 that you're using to reproduce a crash?

 try the following diff please (on a clean checkout):

 Index: mpi.c
 ===
 RCS file: /home/cvs/src/sys/dev/ic/mpi.c,v
 retrieving revision 1.166
 diff -u -p -r1.166 mpi.c
 --- mpi.c  1 Mar 2011 23:48:33 -   1.166
 +++ mpi.c  2 Mar 2011 17:40:13 -
 @@ -887,6 +887,9 @@ mpi_intr(void *arg)
u_int32_t   reg;
int rv = 0;

 +  if ((mpi_read_intr(sc) & MPI_INTR_STATUS_REPLY) == 0)
 +  return (rv);
 +
while ((reg = mpi_pop_reply(sc)) != 0xffffffff) {
mpi_reply(sc, reg);
rv = 1;

 ok krw@.

  Ken
