Re: poor routing/nat performance

2022-12-19 Thread Gabor LENCSE

Dear David,

In my experience, the IPv4/IPv6 packet forwarding performance of 
OpenBSD is about an order of magnitude lower than that of Linux on a 
16-core server.


When I tried to identify the root causes, I found two things:

1. I used an RFC 2544 compliant test with a single IP address pair and 
RFC 4814 pseudorandom port numbers. However, the interrupts caused by 
the packet arrivals were processed by only two CPU cores (one core per 
direction); the others did not take part. This is because OpenBSD does 
not support configuring RSS (Receive-Side Scaling) appropriately; 
please see the details in:
https://marc.info/?l=openbsd-misc&m=166581934723445&w=2
If you forward Internet traffic, then you have many different IP 
addresses, so this will not be an issue for you.


2. When I checked the CPU utilization using the top command, I found 
that only 3 CPU cores (out of the 32 CPU cores of my server) had 
non-zero load: two of them processed interrupts and had about 25-27% CPU 
utilization, and very likely the third one did the packet forwarding and 
it had about 90-95% CPU utilization in my particular experiment. That 
is, very likely the packet forwarding process can use only a single CPU 
core.
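
The first point can be illustrated with a small sketch. It is not OpenBSD's or any NIC's actual RSS implementation: CRC32 stands in for the hardware hash, and the queue count, the addresses, and the helper names (`pick_queue`, `flow_keys`, `queues_used`) are assumptions made for illustration. It only shows why a hash over the IP addresses alone sends every packet of a single address pair (in one direction) to one queue, whereas a hash that also covers the port numbers can spread the same traffic over many queues.

```python
import ipaddress
import random
import struct
import zlib
from collections import Counter

N_QUEUES = 16  # hypothetical number of receive queues


def pick_queue(key: bytes, n_queues: int = N_QUEUES) -> int:
    """Map a flow key to a queue; CRC32 is a stand-in for the real NIC hash."""
    return zlib.crc32(key) % n_queues


def flow_keys(n_packets: int, seed: int = 4814):
    """Yield (addresses_only_key, four_tuple_key) for one fixed IP pair."""
    rng = random.Random(seed)
    src = ipaddress.IPv4Address("198.18.0.2").packed  # illustrative addresses
    dst = ipaddress.IPv4Address("198.19.0.2").packed
    for _ in range(n_packets):
        # Pseudorandom ports per packet; the ranges here are illustrative.
        sport = rng.randrange(1024, 65536)
        dport = rng.randrange(1, 49152)
        yield src + dst, src + dst + struct.pack("!HH", sport, dport)


def queues_used(n_packets: int = 10_000):
    """Count packets per queue for both hashing choices."""
    by_addr, by_tuple = Counter(), Counter()
    for addr_key, tuple_key in flow_keys(n_packets):
        by_addr[pick_queue(addr_key)] += 1
        by_tuple[pick_queue(tuple_key)] += 1
    return by_addr, by_tuple


if __name__ == "__main__":
    by_addr, by_tuple = queues_used()
    print("queues used, addresses only:   ", len(by_addr))
    print("queues used, addresses + ports:", len(by_tuple))
```

With real Internet traffic the address part of the key already varies, which is why this limitation matters mainly for benchmarks that use a single address pair.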


I saved the output of the top command and copy it here:

36 processes: 35 idle, 1 on processor up  0:12
CPU00 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU01 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
*CPU02 states:  0.0% user,  0.0% nice, 93.8% sys,  6.2% spin,  0.0% intr,  0.0% idle*
CPU03 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU04 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU05 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU06 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU07 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU08 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
*CPU09 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 25.0% intr, 75.0% idle*
CPU10 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU11 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU12 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU13 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU14 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU15 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU16 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU17 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU18 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU19 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU20 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU21 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU22 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU23 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU24 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
*CPU25 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin, 26.7% intr, 73.3% idle*
CPU26 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU27 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU28 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU29 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU30 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU31 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle

Memory: Real: 32M/1397M act/tot Free: 371G Cache: 712M Swap: 0K/256M

As you can see, I made the lines with non-zero CPU utilization *bold*.

I expect that this issue will be a problem for you, too: the packet 
forwarding performance of your OpenBSD system will not scale up with the 
number of CPU cores.


Best regards,

Gábor

On 12/19/2022 5:35 PM, David Hajes wrote:

hi guys,

I have simple PcEngines APU2 router running latest OpenBSD stable.

em0 is WAN (bridge to CaTV modem with 1Gbps/100Mbps connectivity with normal 
ether connectivity with DHCP...no special stuff like PPPoE)

em1-3 is in vether/bridge mode with NAT routing to local network.

I have complained to ISP about speeds because it is supposed to run at almost 1Gbps.

results (speedtest.net used by ISP for some reason):

800+/85 Mbps measured by ISP technician directly from CaTV modem.
440Mbps/85Mbps simple NAT firewall pf.conf based on OpenBSD suggestions
380/80Mbps with my 

Re: poor routing/nat performance

2022-12-19 Thread Daniel Ouellet

With 7.2 on the APU 2, when I tested, it was about 650 Mbps or so.

I didn't send the info as it is not connected now.

But either way, you can't get Gb speed on it no matter what.


On 12/19/22 2:43 PM, Stuart Henderson wrote:

On 2022-12-19, Daniel Ouellet  wrote:

OpenBSD 6.8 (GENERIC.MP) #4: Thu Aug  5 11:02:18 MDT 2021


This is too old for a good comparison, many improvements have been made since 
then.






Re: poor routing/nat performance

2022-12-19 Thread Stuart Henderson
On 2022-12-19, Daniel Ouellet  wrote:
> OpenBSD 6.8 (GENERIC.MP) #4: Thu Aug  5 11:02:18 MDT 2021

This is too old for a good comparison, many improvements have been made since 
then.




Re: poor routing/nat performance

2022-12-19 Thread Daniel Ouellet

I have the APU 1 and here is what I get

TEST_DATE         TIME_ZONE  DOWNLOAD_MEGABITS  UPLOAD_MEGABITS
12/19/2022 11:52  GMT        429.05             422.17

LATENCY_MS  SERVER_NAME  DISTANCE_MILES  CONNECTION_MODE
3           Ashburn VA   0               multi

SERVER_COUNT
multi 4

I haven't tested with the APU 2 that I have,  but with NAT I don't think 
you can get the full 1Gb speed.


I have 1Gb symmetric line and with NAT I can't come close to the full 
line speed.




OpenBSD 6.8 (GENERIC.MP) #4: Thu Aug  5 11:02:18 MDT 2021

t...@syspatch-68-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4246003712 (4049MB)
avail mem = 4102266880 (3912MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdf16d820 (7 entries)
bios0: vendor coreboot version "4.0" date 09/08/2014
bios0: PC Engines APU
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP SPCR HPET APIC HEST SSDT SSDT SSDT
acpi0: wakeup devices AGPB(S4) HDMI(S4) PBR4(S4) PBR5(S4) PBR6(S4) PBR7(S4) PE20(S4) PE21(S4) PE22(S4) PE23(S4) PIBR(S4) UOH1(S3) UOH2(S3) UOH3(S3) UOH4(S3) UOH5(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD G-T40E Processor, 1000.13 MHz, 14-02-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
cpu0: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: 8 4MB entries fully associative
cpu0: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 199MHz
cpu0: mwait min=64, max=64, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD G-T40E Processor, 1000.01 MHz, 14-02-00
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,MWAIT,SSSE3,CX16,POPCNT,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,IBS,SKINIT,ITSC
cpu1: 32KB 64b/line 2-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: 8 4MB entries fully associative
cpu1: DTLB 40 4KB entries fully associative, 8 4MB entries fully associative
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 21, 24 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (AGPB)
acpiprt2 at acpi0: bus -1 (HDMI)
acpiprt3 at acpi0: bus 1 (PBR4)
acpiprt4 at acpi0: bus 2 (PBR5)
acpiprt5 at acpi0: bus 3 (PBR6)
acpiprt6 at acpi0: bus -1 (PBR7)
acpiprt7 at acpi0: bus 5 (PE20)
acpiprt8 at acpi0: bus -1 (PE21)
acpiprt9 at acpi0: bus -1 (PE22)
acpiprt10 at acpi0: bus -1 (PE23)
acpiprt11 at acpi0: bus 4 (PIBR)
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
acpibtn0 at acpi0: PWRB
acpicpu0 at acpi0: C2(0@100 io@0x841), C1(@1 halt!), PSS
acpicpu1 at acpi0: C2(0@100 io@0x841), C1(@1 halt!), PSS
cpu0: 1000 MHz: speeds: 1000 800 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "AMD 14h Host" rev 0x00
ppb0 at pci0 dev 4 function 0 "AMD 14h PCIE" rev 0x00: msi
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), msi, address 00:0d:b9:3e:d5:5c
rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ppb1 at pci0 dev 5 function 0 "AMD 14h PCIE" rev 0x00: msi
pci2 at ppb1 bus 2
re1 at pci2 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), msi, address 00:0d:b9:3e:d5:5d
rgephy1 at re1 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ppb2 at pci0 dev 6 function 0 "AMD 14h PCIE" rev 0x00: msi
pci3 at ppb2 bus 3
re2 at pci3 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E (0x2c00), msi, address 00:0d:b9:3e:d5:5e
rgephy2 at re2 phy 7: RTL8169S/8110S/8211 PHY, rev. 4
ahci0 at pci0 dev 17 function 0 "ATI SBx00 SATA" rev 0x40: apic 2 int 19, AHCI 1.2
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0:  naa.5000
sd0: 15272MB, 512 bytes/sector, 31277232 sectors, thin
ohci0 at pci0 dev 18 function 0 "ATI SB700 USB" rev 0x00: apic 2 int 18, version 1.0, legacy support
ehci0 at pci0 dev 18 function 2 "ATI SB700 USB2" rev 0x00: apic 2 int 17
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "ATI EHCI root hub" rev 2.00/1.00 addr 1
ohci1 at pci0 dev 19 function 0 "ATI SB700 USB" rev 0x00: apic 2 int 18, version 1.0, legacy support
ehci1 at pci0 dev 19 function 2 "ATI SB700 USB2" rev 0x00: apic 2 int 17
usb1 at ehci1: USB revision 2.0
uhub1 at usb1 configuration 1 interface 0 "ATI EHCI root hub" rev 

Re: poor routing/nat performance

2022-12-19 Thread Stuart Henderson
On 2022-12-19, David Hajes  wrote:
> hi guys,
>
> I have simple PcEngines APU2 router running latest OpenBSD stable.
>
> em0 is WAN (bridge to CaTV modem with 1Gbps/100Mbps connectivity with normal 
> ether connectivity with DHCP...no special stuff like PPPoE)
>
> em1-3 is in vether/bridge mode with NAT routing to local network.
>
> I have complained to ISP about speeds because it is supposed to run at almost 1Gbps.
>
> results (speedtest.net used by ISP for some reason):
>
> 800+/85 Mbps measured by ISP technician directly from CaTV modem.
> 440Mbps/85Mbps simple NAT firewall pf.conf based on OpenBSD suggestions
> 380/80Mbps with my strict firewall rules

APU2 is not particularly powerful. When running OpenBSD on it, it's
pretty much OK for VDSL type speeds / 100M leased line / some slower FTTP.
It's not really gigabit-capable.

You aren't helping by making further requirements on the CPU by using
it as a bridge as well; you might help things a bit by removing the bridge
configuration on em1-3 and just have a standard single em1 interface,
connect a real ethernet switch for additional machines. But I don't think
that will get you really close to gigabit - maybe you can find another
100-150M or so, but that's probably about it.

You will see better speeds from the APU hardware with other OS, though
full gigabit with anything complex in terms of packet filtering is still
a bit unlikely. If you want full gigabit from OpenBSD you'll need faster
hw.

> I have used following guide 
> http://dant.net.ru/calomel/network_performance.html No changes, same 
> performance.

That guide is often quoted (though fortunately not quite so often
these days). But it's fairly useless.

-- 
Please keep replies on the mailing list.



Re: poor routing/nat performance

2022-12-19 Thread Brian Conway
On Mon, Dec 19, 2022, at 10:35 AM, David Hajes wrote:
> I am guessing HW is not issue.

I would not be totally sure on that. The CPU in the APU2 is pretty slow. While 
you can no doubt find some tweaked Linux or FreeBSD configurations that push it 
close to wire speed, the best I've ever been able to accomplish with the 
simplest pf.conf and forwarding between em0-em1 is 500-550 Mbps sustained, with 
occasional bursts to 600 Mbps. Research indicates others have had similar 
experiences.

If you check the misc@ list archive, you'll find a bunch of threads with people 
looking for inexpensive alternatives to the APU2+ platform, and there are 
plenty in the $100-200 USD range for amd64. Most of my APU2s have been retired 
to terminal/console server duty.

> CPU bored, max. load 25%

It sounds like 1 of your 4 cores is maxed, which would not be surprising.
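
As a quick check of that arithmetic, here is a minimal sketch that averages per-core utilization. The function name and the per-core figures for the APU2 are assumptions for illustration (one core fully busy, three idle); the 32-core figures are the non-idle lines from the top output posted elsewhere in this thread.

```python
def average_load(per_core_busy_percent):
    """Return the mean busy percentage across all cores."""
    return sum(per_core_busy_percent) / len(per_core_busy_percent)


# Hypothetical APU2 snapshot: one of four cores saturated, the rest idle.
apu2 = [100.0, 0.0, 0.0, 0.0]

# 32-core server from the top output: CPU02 at 93.8% sys + 6.2% spin,
# CPU09 at 25.0% intr, CPU25 at 26.7% intr, all other cores idle.
server = [0.0] * 32
server[2] = 93.8 + 6.2
server[9] = 25.0
server[25] = 26.7

if __name__ == "__main__":
    print(f"APU2 average load:   {average_load(apu2):.1f}%")
    print(f"Server average load: {average_load(server):.1f}%")
```

An overall figure of 25% on a four-core machine is consistent with one saturated core, and the 32-core server averages under 5% even though one of its cores has no idle time left.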

Brian Conway



Re: poor routing/nat performance

2022-12-19 Thread Hrvoje Popovski
On 19.12.2022. 17:35, David Hajes wrote:
> hi guys,
> 
> I have simple PcEngines APU2 router running latest OpenBSD stable.
> 
> em0 is WAN (bridge to CaTV modem with 1Gbps/100Mbps connectivity with normal 
> ether connectivity with DHCP...no special stuff like PPPoE)
> 
> em1-3 is in vether/bridge mode with NAT routing to local network.
> 
> I have complained to ISP about speeds because it is supposed to run at almost 1Gbps.
> 
> results (speedtest.net used by ISP for some reason):
> 
> 800+/85 Mbps measured by ISP technician directly from CaTV modem.
> 440Mbps/85Mbps simple NAT firewall pf.conf based on OpenBSD suggestions
> 380/80Mbps with my strict firewall rules
> 
> I have used following guide 
> http://dant.net.ru/calomel/network_performance.html No changes, same 
> performance.
> 
> Checking out router monitoring
> 
> 3k packets/s firewall throughput
> pf_states lookup max. 12k/s, ~2k/s
> CPU bored, max. load 25%
> RAM 2.6 GB from 4GB free, swap never used
> 
> I am guessing HW is not issue.
> 
> Are there any issues with bridging local interfaces, and routing/NAT 
> performance, please?
> 
> I tried to Google answers, and there is lots of whining but no real info. It 
> is supposed to run at double the speed, at least 800Mbps as shown by ISP technicians.
> 
> Any suggestions for bottleneck, please?
> 

Could you try veb(4) instead of bridge(4)?
Bridge is quite slow.

https://undeadly.org/cgi?action=article;sid=20220319123157




poor routing/nat performance

2022-12-19 Thread David Hajes
hi guys,

I have simple PcEngines APU2 router running latest OpenBSD stable.

em0 is WAN (bridge to CaTV modem with 1Gbps/100Mbps connectivity with normal 
ether connectivity with DHCP...no special stuff like PPPoE)

em1-3 is in vether/bridge mode with NAT routing to local network.

I have complained to ISP about speeds because it is supposed to run at almost 1Gbps.

results (speedtest.net used by ISP for some reason):

800+/85 Mbps measured by ISP technician directly from CaTV modem.
440Mbps/85Mbps simple NAT firewall pf.conf based on OpenBSD suggestions
380/80Mbps with my strict firewall rules

I have used following guide http://dant.net.ru/calomel/network_performance.html 
No changes, same performance.

Checking out router monitoring

3k packets/s firewall throughput
pf_states lookup max. 12k/s, ~2k/s
CPU bored, max. load 25%
RAM 2.6 GB from 4GB free, swap never used
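
For orientation, throughput can be converted to an approximate packet rate. The sketch below assumes full-size 1500-byte IP packets and ignores Ethernet framing overhead; both are simplifying assumptions, and the function names are illustrative. Real traffic mixes packet sizes, so actual packet rates would be higher.

```python
def packets_per_second(mbps, packet_bytes=1500):
    """Approximate packet rate for a given throughput in Mbps."""
    return mbps * 1_000_000 / (packet_bytes * 8)


def megabits_per_second(pps, packet_bytes=1500):
    """Approximate throughput in Mbps for a given packet rate."""
    return pps * packet_bytes * 8 / 1_000_000


if __name__ == "__main__":
    for mbps in (380, 440, 800):
        print(f"{mbps} Mbps is roughly {packets_per_second(mbps):,.0f} packets/s")
    print(f"3000 packets/s is roughly {megabits_per_second(3000):.0f} Mbps")
```

Under these assumptions, 440 Mbps corresponds to tens of thousands of packets per second, so a reading of 3k packets/s may have been taken outside the speed test or averaged over a longer interval.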

I am guessing HW is not issue.

Are there any issues with bridging local interfaces, and routing/NAT 
performance, please?

I tried to Google answers, and there is lots of whining but no real info. It 
is supposed to run at double the speed, at least 800Mbps as shown by ISP technicians.

Any suggestions for bottleneck, please?

Regards

DavidH
