Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)
On Sun, Apr 12, 2020 at 11:53:06AM +1000, David Gwynne wrote: > > On Fri, Jul 05, 2019 at 03:51:31AM +, Adam Steen wrote: > > >Synopsis: Packet loss / ENOBUFs with kqueue(2) and tap(4) > > >Category: bug > > >Environment: > > System : OpenBSD 6.5 > > Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 > > 19:39:46 AWST 2019 > > > > ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP > > > > Architecture: OpenBSD.amd64 > > Machine : amd64 > > >Description: > > In Solo5 we have been working towards supporting multiple network > > interfaces, implemented this using kqueue(2) and tap(4). > > > > This involves setting up two Tap interfaces, starting up the program. > > In another session flood pinging the first Tap interface, > > Solo5 handles this with no packets dropped. > > In another session ping the second Tap interface, then for every > > ping to the second interface a packet is dropped on the first. If you > > switch to a flood ping on the second tab interface, you will observe > > massive packet loss on both interfaces, and ping complaining about > > No buffer space available (ENOBUFS). > > > > see https://github.com/Solo5/solo5/issues/374 for more information. > > > > >How-To-Repeat: > > I have been able to reproduct this in a hacked up exampled program, > > available here https://github.com/adamsteen/test_net_2if. Please note > > this is hacked, generally butchered program, which demonstrates the > > problem. (if required i can try and clean up this test case) > > > > 01. git clone https://github.com/adamsteen/test_net_2if > > 02. cd test_net_2if > > 03. make > > 04. doas setup.sh (Setup up the Tap interfaces) > > 05. doas ./test_net_2if > > 06. in another seesion start a flood ping > > doas ping -f 10.0.0.2 > > 07. Observe that the flood ping is functioning correctly, > > with no packets dropped. > > 08. In another session, start a normal ping > > ping 10.1.0.2 > > 09. Observe that, for each ping sent to service1, a packet is dropped. 
> > 10. Kill the normal ping
> > 11. start a flood ping
> >     doas ping -f 10.1.0.2
> > 12. Observe massive packet loss on both interfaces, and ping
> >     complaining about No buffer space available (ENOBUFS).
> > >Fix:
> > Not Known.
>
> Hi Adam,
>
> claudio@ and I looked at this during a2k20, and came to the conclusion
> that the packet loss occurred because an interface queue filled up
> and it was shedding load. It was annoyingly easy to get to that point
> though.
>
> We also spent a lot of time massaging the tun/tap code to try and unify
> the semantics of tun and tap going through the network stack, and in
> particular tried to avoid queuing packets until we finally get to the
> output side of the stack.
>
> I'm not saying we've fixed this problem for you, but hopefully we've
> mitigated it a bit. Could you try again and let us know if you see any
> difference? If there's no difference, could you tweak your test to loop
> on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
> whatever the errno is that means there's no packet to read right now?
>
> Cheers,
> dlg

Hi David,

I have tested again, but there was no difference. I also tried the
EWOULDBLOCK approach, but no luck there either. I will do more reading
and see what I can come up with regarding EWOULDBLOCK.

Also, did you know that with this recent work, if you create a tap
interface it starts UP, but once I have used it with my test app (or
the real app) and the app has finished, the tap interface is no longer
up? Before the recent work this was not the case.

i.e. before:

tap100: flags=8843 mtu 1500
        lladdr fe:e1:ba:d4:05:17
        index 12 priority 0 llprio 3
        groups: tap
        status: no carrier
        inet 10.0.0.1 netmask 0xff00 broadcast 10.0.0.255

after:

tap100: flags=8802 mtu 1500
        lladdr fe:e1:ba:d4:05:17
        index 12 priority 0 llprio 3
        groups: tap
        status: no carrier
        inet 10.0.0.1 netmask 0xff00 broadcast 10.0.0.255

Cheers
Adam
Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)
On Sun, Apr 12, 2020 at 09:53, David Gwynne wrote: > On Fri, Jul 05, 2019 at 03:51:31AM +, Adam Steen wrote: >> >Synopsis: Packet loss / ENOBUFs with kqueue(2) and tap(4) >> >Category: bug >> >Environment: >> System : OpenBSD 6.5 >> Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST >> 2019 >> ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP >> >> Architecture: OpenBSD.amd64 >> Machine : amd64 >> >Description: >> In Solo5 we have been working towards supporting multiple network >> interfaces, implemented this using kqueue(2) and tap(4). >> >> This involves setting up two Tap interfaces, starting up the program. >> In another session flood pinging the first Tap interface, >> Solo5 handles this with no packets dropped. >> In another session ping the second Tap interface, then for every >> ping to the second interface a packet is dropped on the first. If you >> switch to a flood ping on the second tab interface, you will observe >> massive packet loss on both interfaces, and ping complaining about >> No buffer space available (ENOBUFS). >> >> see https://github.com/Solo5/solo5/issues/374 for more information. >> >> >How-To-Repeat: >> I have been able to reproduct this in a hacked up exampled program, >> available here https://github.com/adamsteen/test_net_2if. Please note >> this is hacked, generally butchered program, which demonstrates the >> problem. (if required i can try and clean up this test case) >> >> 01. git clone https://github.com/adamsteen/test_net_2if >> 02. cd test_net_2if >> 03. make >> 04. doas setup.sh (Setup up the Tap interfaces) >> 05. doas ./test_net_2if >> 06. in another seesion start a flood ping >> doas ping -f 10.0.0.2 >> 07. Observe that the flood ping is functioning correctly, >> with no packets dropped. >> 08. In another session, start a normal ping >> ping 10.1.0.2 >> 09. Observe that, for each ping sent to service1, a packet is dropped. >> 10. Kill the normal ping >> 11. 
start a flood ping >> doas ping -f 10.1.0.2 >> 12. Observe massive packet loss on both interfaces, and ping >> complaining about No buffer space available (ENOBUFS). >> >Fix: >> Not Known. > > Hi Adam, > > claudio@ and I looked at this during a2k20, and came to the conclusion > that the packet loss occurred because an interface queue filled up > and it was shedding load. It was annoyingly easy to get to that point > though. > > We also spent a lot of time massaging the tun/tap code to try and unify > the semantics of tun and tap going through the network stack, and in > particular tried to avoid queuing packets until we finally get to the > output side of the stack. > > I'm not saying we've fixed this problem for you, but hopefully we've > mitigated it a bit. Could you try again and let us know if you see any > difference? If there's no difference, could you tweak your test to loop > on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or > whatever the errno is that means there's no packet to read right now? > Cheers, > dlg Hi dlg I will definitely have a look, and I will give the “EWOULDBLOCK” errno idea a go! Will probably get back to you Tuesday sometime! Cheers Adam
Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)
On Fri, Jul 05, 2019 at 03:51:31AM +, Adam Steen wrote: > >Synopsis:Packet loss / ENOBUFs with kqueue(2) and tap(4) > >Category:bug > >Environment: > System : OpenBSD 6.5 > Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 > 19:39:46 AWST 2019 > > ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 > >Description: > In Solo5 we have been working towards supporting multiple network > interfaces, implemented this using kqueue(2) and tap(4). > > This involves setting up two Tap interfaces, starting up the program. > In another session flood pinging the first Tap interface, > Solo5 handles this with no packets dropped. > In another session ping the second Tap interface, then for every > ping to the second interface a packet is dropped on the first. If you > switch to a flood ping on the second tab interface, you will observe > massive packet loss on both interfaces, and ping complaining about > No buffer space available (ENOBUFS). > > see https://github.com/Solo5/solo5/issues/374 for more information. > > >How-To-Repeat: > I have been able to reproduct this in a hacked up exampled program, > available here https://github.com/adamsteen/test_net_2if. Please note > this is hacked, generally butchered program, which demonstrates the > problem. (if required i can try and clean up this test case) > > 01. git clone https://github.com/adamsteen/test_net_2if > 02. cd test_net_2if > 03. make > 04. doas setup.sh (Setup up the Tap interfaces) > 05. doas ./test_net_2if > 06. in another seesion start a flood ping > doas ping -f 10.0.0.2 > 07. Observe that the flood ping is functioning correctly, > with no packets dropped. > 08. In another session, start a normal ping > ping 10.1.0.2 > 09. Observe that, for each ping sent to service1, a packet is dropped. > 10. Kill the normal ping > 11. start a flood ping > doas ping -f 10.1.0.2 > 12. 
Observe massive packet loss on both interfaces, and ping
>     complaining about No buffer space available (ENOBUFS).
> >Fix:
> Not Known.

Hi Adam,

claudio@ and I looked at this during a2k20, and came to the conclusion
that the packet loss occurred because an interface queue filled up
and it was shedding load. It was annoyingly easy to get to that point
though.

We also spent a lot of time massaging the tun/tap code to try and unify
the semantics of tun and tap going through the network stack, and in
particular tried to avoid queuing packets until we finally get to the
output side of the stack.

I'm not saying we've fixed this problem for you, but hopefully we've
mitigated it a bit. Could you try again and let us know if you see any
difference? If there's no difference, could you tweak your test to loop
on the read() of the /dev/tap entry until it gets back EWOULDBLOCK or
whatever the errno is that means there's no packet to read right now?

Cheers,
dlg
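The drain loop suggested above could look roughly like the sketch below.
It is not taken from test_net_2if or Solo5: the helper names
set_nonblocking(), drain_packets() and demo_drain() are made up for
illustration, and a datagram socketpair stands in for the /dev/tap
descriptor so the sketch can be run without a tap interface. It assumes
the descriptor is non-blocking, so that read() fails with EAGAIN (the
same value as EWOULDBLOCK on OpenBSD) once nothing is queued.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Read from a non-blocking fd until the kernel reports that nothing
 * more is queued. Returns the number of packets read, or -1 on a
 * real error. Assumes one read() yields at most one packet.
 */
static int
drain_packets(int fd)
{
	unsigned char buf[2048];
	int npackets = 0;

	assert(fd >= 0);
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			/* A real program would process buf[0..n) here. */
			npackets++;
			continue;
		}
		if (n == 0)
			return npackets;	/* nothing more to read */
		if (errno == EINTR)
			continue;		/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return npackets;	/* queue is empty for now */
		return -1;			/* unexpected error */
	}
}

/*
 * Hypothetical self-check: queue `count` small datagrams on one end
 * of a socketpair and drain them from the other end.
 */
static int
demo_drain(int count)
{
	int sv[2], i, got = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return -1;
	if (set_nonblocking(sv[0]) == -1)
		goto out;
	for (i = 0; i < count; i++) {
		if (write(sv[1], "ping", 4) != 4)
			goto out;
	}
	got = drain_packets(sv[0]);
out:
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

In a kqueue(2) loop, drain_packets() would be called for each descriptor
that kevent() reports as readable, instead of reading a single packet
per event.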
Re: Packet loss / ENOBUFs with kqueue(2) and tap(4)
CC Mato from Solo5 (not subscribed to bugs@) On Fri, Jul 5, 2019 at 11:51, Adam Steen wrote: >>Synopsis: Packet loss / ENOBUFs with kqueue(2) and tap(4) >>Category: bug >>Environment: > System : OpenBSD 6.5 > Details : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019 > ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP > > Architecture: OpenBSD.amd64 > Machine : amd64 >>Description: > In Solo5 we have been working towards supporting multiple network > interfaces, implemented this using kqueue(2) and tap(4). > > This involves setting up two Tap interfaces, starting up the program. > In another session flood pinging the first Tap interface, > Solo5 handles this with no packets dropped. > In another session ping the second Tap interface, then for every > ping to the second interface a packet is dropped on the first. If you > switch to a flood ping on the second tab interface, you will observe > massive packet loss on both interfaces, and ping complaining about > No buffer space available (ENOBUFS). > > see https://github.com/Solo5/solo5/issues/374 for more information. > >>How-To-Repeat: > I have been able to reproduct this in a hacked up exampled program, > available here https://github.com/adamsteen/test_net_2if. Please note > this is hacked, generally butchered program, which demonstrates the > problem. (if required i can try and clean up this test case) > > 01. git clone https://github.com/adamsteen/test_net_2if > 02. cd test_net_2if > 03. make > 04. doas setup.sh (Setup up the Tap interfaces) > 05. doas ./test_net_2if > 06. in another seesion start a flood ping > doas ping -f 10.0.0.2 > 07. Observe that the flood ping is functioning correctly, > with no packets dropped. > 08. In another session, start a normal ping > ping 10.1.0.2 > 09. Observe that, for each ping sent to service1, a packet is dropped. > 10. Kill the normal ping > 11. start a flood ping > doas ping -f 10.1.0.2 > 12. 
Observe massive packet loss on both interfaces, and ping > complaining about No buffer space available (ENOBUFS). >>Fix: > Not Known. > > dmesg: > OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019 > ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP > real mem = 17041059840 (16251MB) > avail mem = 16514461696 (15749MB) > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (64 entries) > bios0: vendor LENOVO version "8DET69WW (1.39 )" date 07/18/2013 > bios0: LENOVO 4291N58 > acpi0 at bios0: ACPI 4.0 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA > SSDT SSDT DMAR UEFI UEFI UEFI > acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EXP7(S4) EHC1(S3) > EHC2(S3) HDEF(S4) > acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpihpet0 at acpi0: 14318179 Hz > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.31 MHz, 06-2a-07 > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu0: 256KB 64b/line 8-way L2 cache > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE > cpu1 at mainbus0: apid 2 (application processor) > cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz, 06-2a-07 > cpu1: > 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu1: 256KB 64b/line 8-way L2 cache > cpu1: smt 0, core 1, package 0 > ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins > acpimcfg0 at acpi0 > acpimcfg0: addr 0xf800, bus 0-63 > acpiec0 at acpi0 > acpiprt0 at acpi0: bus 0 (PCI0) > acpiprt1 at acpi0: bus -1 (PEG_) > acpiprt2 at acpi0: bus 2 (EXP1) > acpiprt3 at acpi0: bus 3 (EXP2) > acpiprt4 at acpi0: bus 5 (EXP4) > acpiprt5 at acpi0: bus 13 (EXP5) > acpiprt6 at acpi0: bus -1 (EXP7) > acpicpu0 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS > acpicpu1 at acpi0: C3(350@104 io@0x415), C1(1000@1 ha
Packet loss / ENOBUFs with kqueue(2) and tap(4)
>Synopsis:      Packet loss / ENOBUFs with kqueue(2) and tap(4)
>Category:      bug
>Environment:
        System      : OpenBSD 6.5
        Details     : OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019
                      ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
        In Solo5 we have been working towards supporting multiple network
        interfaces, and have implemented this using kqueue(2) and tap(4).

        The problem shows up with two tap interfaces set up and the program
        running. When the first tap interface is flood pinged from another
        session, Solo5 handles this with no packets dropped. When the second
        tap interface is then pinged from a further session, a packet is
        dropped on the first interface for every ping sent to the second.
        If you switch to a flood ping on the second tap interface, you will
        observe massive packet loss on both interfaces, and ping complaining
        about No buffer space available (ENOBUFS).

        See https://github.com/Solo5/solo5/issues/374 for more information.

>How-To-Repeat:
        I have been able to reproduce this in a hacked-up example program,
        available at https://github.com/adamsteen/test_net_2if. Please note
        that this is a hacked, generally butchered program which demonstrates
        the problem. (If required, I can try to clean up this test case.)

        01. git clone https://github.com/adamsteen/test_net_2if
        02. cd test_net_2if
        03. make
        04. doas setup.sh (sets up the tap interfaces)
        05. doas ./test_net_2if
        06. In another session, start a flood ping
            doas ping -f 10.0.0.2
        07. Observe that the flood ping is functioning correctly,
            with no packets dropped.
        08. In another session, start a normal ping
            ping 10.1.0.2
        09. Observe that, for each ping sent to service1, a packet is dropped.
        10. Kill the normal ping
        11. Start a flood ping
            doas ping -f 10.1.0.2
        12. Observe massive packet loss on both interfaces, and ping
            complaining about No buffer space available (ENOBUFS).
>Fix:
        Not known.

dmesg:
OpenBSD 6.5-current (GENERIC.MP) #123: Sat Jun 29 19:39:46 AWST 2019
    ast...@x220.adamsteen.com.au:/sys/arch/amd64/compile/GENERIC.MP
real mem = 17041059840 (16251MB)
avail mem = 16514461696 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xdae9c000 (64 entries)
bios0: vendor LENOVO version "8DET69WW (1.39 )" date 07/18/2013
bios0: LENOVO 4291N58
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC SSDT SSDT SSDT HPET APIC MCFG ECDT ASF! TCPA SSDT SSDT DMAR UEFI UEFI UEFI
acpi0: wakeup devices LID_(S3) SLPB(S3) IGBE(S4) EXP4(S4) EXP7(S4) EHC1(S3) EHC2(S3) HDEF(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2492.31 MHz, 06-2a-07
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz, 2491.91 MHz, 06-2a-07
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpiec0 at acpi0
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus 5 (EXP4)
acpiprt5 at acpi0: bus 13 (EXP5)
acpiprt6 at acpi0: bus -1 (EXP7)
acpicpu0 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpicpu1 at acpi0: C3(350@104 io@0x415), C1(1000@1 halt), PSS
acpipwrres0 at acpi0: PUBS, resource for EHC1, EHC2
acpitz0 at acpi0: critical temperature is 99 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpipci0 at acpi0 PCI0: 0x 0x0