Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)

2020-04-14 Thread Adam Steen
On Sun, Apr 12, 2020 at 11:53:06AM +1000, David Gwynne wrote:
> Hi Adam,
> 
> claudio@ and I looked at this during a2k20, and came to the conclusion
> that the packet loss occurred because an interface queue filled up
> and it was shedding load. It was annoyingly easy to get to that point
> though.
> 
> We also spent a lot of time massaging the tun/tap code to try and unify
> the semantics of tun and tap going through the network stack, and in
> particular tried to avoid queuing packets until we finally get to the
> output side of the stack.
> 
> I'm not saying we've fixed this problem for you, but hopefully we've
> mitigated it a bit. Could you try again and let us know if you see any
> difference? If there's no difference, could you tweak your test to loop
> on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
> whatever the errno is that means there's no packet to read right now?
> 
> Cheers,
> dlg

Hi David

I have tested again, but there was no difference. I tried the
EWOULDBLOCK approach, but no luck there either. I will do more reading
and see what I can come up with regarding EWOULDBLOCK.

Also, did you know that with this recent work a newly created tap
interface starts UP, but once my test app (or the real app) has used it
and finished, the tap interface is no longer UP? Before the recent work
this was not the case.

i.e.
before:

tap100: flags=8843 mtu 1500
lladdr fe:e1:ba:d4:05:17
index 12 priority 0 llprio 3
groups: tap
status: no carrier
inet 10.0.0.1 netmask 0xff00 broadcast 10.0.0.255

after:

tap100: flags=8802 mtu 1500
lladdr fe:e1:ba:d4:05:17
index 12 priority 0 llprio 3
groups: tap
status: no carrier
inet 10.0.0.1 netmask 0xff00 broadcast 10.0.0.255
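
The flags= values in the two listings above are hexadecimal bitmasks, so
the difference between 8843 and 8802 can be read off by decoding them.
The following is a small sketch only: it assumes the bit values from
OpenBSD's <net/if.h>, lists just a subset of them, and uses Python purely
for illustration. The helper name decode_flags is hypothetical.

```python
# Subset of the interface flag bits defined in OpenBSD's <net/if.h>.
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0008: "LOOPBACK",
    0x0010: "POINTOPOINT",
    0x0040: "RUNNING",
    0x0100: "PROMISC",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


before = decode_flags(0x8843)
after = decode_flags(0x8802)
print("before: ", ",".join(before))
print("after:  ", ",".join(after))
print("cleared:", ",".join(f for f in before if f not in after))
```

Under these assumptions the only bits that differ are UP and RUNNING,
which matches the observation that the interface is no longer up once
the application has finished.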

Cheers
Adam



Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)

2020-04-11 Thread Adam Steen
On Sun, Apr 12, 2020 at 09:53, David Gwynne wrote:

> Hi Adam,
>
> claudio@ and I looked at this during a2k20, and came to the conclusion
> that the packet loss occurred because an interface queue filled up
> and it was shedding load. It was annoyingly easy to get to that point
> though.
>
> We also spent a lot of time massaging the tun/tap code to try and unify
> the semantics of tun and tap going through the network stack, and in
> particular tried to avoid queuing packets until we finally get to the
> output side of the stack.
>
> I'm not saying we've fixed this problem for you, but hopefully we've
> mitigated it a bit. Could you try again and let us know if you see any
> difference? If there's no difference, could you tweak your test to loop
> on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or 
> whatever the errno is that means there's no packet to read right now?

> Cheers,
> dlg

Hi dlg

I will definitely have a look, and I will give the “EWOULDBLOCK” errno idea a 
go! Will probably get back to you Tuesday sometime!

Cheers
Adam

Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)

2020-04-11 Thread David Gwynne
Hi Adam,

claudio@ and I looked at this during a2k20, and came to the conclusion
that the packet loss occurred because an interface queue filled up
and it was shedding load. It was annoyingly easy to get to that point
though.

We also spent a lot of time massaging the tun/tap code to try and unify
the semantics of tun and tap going through the network stack, and in
particular tried to avoid queuing packets until we finally get to the
output side of the stack.

I'm not saying we've fixed this problem for you, but hopefully we've
mitigated it a bit. Could you try again and let us know if you see any
difference? If there's no difference, could you tweak your test to loop
on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
whatever the errno is that means there's no packet to read right now?

Cheers,
dlg
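
The read loop suggested above could look roughly like the following.
This is a minimal sketch, not the code in test_net_2if or Solo5 (both of
which are C); it is written in Python for brevity, and the names drain
and serve are hypothetical. It assumes descriptors in non-blocking mode,
where read(2) fails with EWOULDBLOCK/EAGAIN (surfaced by Python as
BlockingIOError) once nothing is queued, and it uses
selectors.DefaultSelector, which is backed by kqueue(2) where that is
available.

```python
import os
import selectors


def drain(fd, bufsize=2048):
    """Read from a non-blocking fd until it would block; return the chunks read.

    On tap(4) each successful read() is expected to return one frame.
    """
    packets = []
    while True:
        try:
            data = os.read(fd, bufsize)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: nothing left to read right now.
            break
        if not data:
            # End of file (possible on pipes and sockets).
            break
        packets.append(data)
    return packets


def serve(fds, handle, rounds=1, timeout=1.0):
    """Wait for readiness on fds and fully drain every descriptor that is ready."""
    sel = selectors.DefaultSelector()
    for fd in fds:
        os.set_blocking(fd, False)
        sel.register(fd, selectors.EVENT_READ)
    try:
        for _ in range(rounds):
            for key, _events in sel.select(timeout):
                for pkt in drain(key.fd):
                    handle(key.fd, pkt)
    finally:
        sel.close()


if __name__ == "__main__":
    # Stand-in for two tap devices: two pipes holding one "packet" each.
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    os.write(w1, b"ping-a")
    os.write(w2, b"ping-b")
    serve([r1, r2], lambda fd, pkt: print(fd, pkt))
```

The point of the inner loop is that one readiness event may cover several
queued packets, so the reader keeps going until the kernel reports that
the queue is empty instead of reading once per event.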



Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)

2019-07-05 Thread Adam Steen
CC Mato from Solo5 (not subscribed to bugs@)


Packet loss / ENOBUFs with kqueue(2) and tap(4)

2019-07-04 Thread Adam Steen
>Synopsis:      Packet loss / ENOBUFs with kqueue(2) and tap(4)
>Category:  bug
>Environment:
System  : OpenBSD 6.5
Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019
ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
In Solo5 we have been working towards supporting multiple network
interfaces, implemented using kqueue(2) and tap(4).

To reproduce, set up two tap interfaces and start the program.
Flood pinging the first tap interface from another session works:
Solo5 handles this with no packets dropped.
If you then ping the second tap interface from another session, a packet
is dropped on the first interface for every ping sent to the second. If
you switch to a flood ping on the second tap interface, you will observe
massive packet loss on both interfaces, and ping complaining about
"No buffer space available" (ENOBUFS).

See https://github.com/Solo5/solo5/issues/374 for more information.

>How-To-Repeat:
I have been able to reproduce this in a hacked-up example program,
available at https://github.com/adamsteen/test_net_2if. Please note
that this is a hacked, generally butchered program that demonstrates
the problem. (If required, I can try to clean up this test case.)

01. git clone https://github.com/adamsteen/test_net_2if
02. cd test_net_2if
03. make
04. doas setup.sh (sets up the tap interfaces)
05. doas ./test_net_2if
06. In another session, start a flood ping:
    doas ping -f 10.0.0.2
07. Observe that the flood ping is functioning correctly,
    with no packets dropped.
08. In another session, start a normal ping:
    ping 10.1.0.2
09. Observe that, for each ping sent to service1, a packet is dropped.
10. Kill the normal ping.
11. Start a flood ping:
    doas ping -f 10.1.0.2
12. Observe massive packet loss on both interfaces, and ping
    complaining about "No buffer space available" (ENOBUFS).
>Fix:
Not Known.

dmesg:
OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019
ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP
real mem = 17041059840 (16251MB)
avail mem = 16514461696 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (64 entries)
bios0: vendor LENOVO version "8DET69WW (1.39 )" date 07/18/2013
bios0: LENOVO 4291N58
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA SSDT SSDT DMAR UEFI UEFI UEFI
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EXP7(S4) EHC1(S3) EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.31 MHz, 06-2a-07
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz, 06-2a-07
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 5 (EXP4)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus -1 (EXP7)
acpicpu0 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpicpu1 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpipwrres0 at acpi0: PUBS, resource for EHC1, EHC2
acpitz0 at acpi0: critical temperature is 99 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpipci0 at acpi0 PCI0: 0x 0x0