Re: network performance fun
> On 25 Oct 2016, at 16:31, Mike Belopuhov wrote:
>
> On 25 October 2016 at 08:25, Mike Belopuhov wrote:
>> On 25 October 2016 at 02:34, David Gwynne wrote:
>>>> I see. I will double check this tomorrow but your approach
>>>> looks solid.
>>>
>>> it's obviously an interaction between intel nics that do not align
>>> their ethernet headers correctly, and the M_PREPEND which i just
>>> changed (and you oked) in src/sys/net/if_ethersubr.c r1.240).
>>>
>>> basically the stack strips the 6 byte ethernet header before pushing
>>> the packet into the ip stack, and forwarding causes it to be output
>>> as an ethernet packet. the M_PREPEND of 8 bytes in ether_output
>>> causes an mbuf to be prefixed cos the frame has 6 bytes free, not
>>> 8.
>>>
>>
>> Ah right, it's the same cluster we're transmitting as we have Rx'ed...
>>
>
> You need to get your mcl2k2 change in then.

ill put it in tonight. cheers, dlg

>>> the good news is that at least the prepended mbuf gets its ethernet
>>> header correctly aligned.
>>>
>>> dlg
Re: network performance fun
On 25 October 2016 at 08:25, Mike Belopuhov wrote:
> On 25 October 2016 at 02:34, David Gwynne wrote:
>>> I see. I will double check this tomorrow but your approach
>>> looks solid.
>>
>> it's obviously an interaction between intel nics that do not align
>> their ethernet headers correctly, and the M_PREPEND which i just
>> changed (and you oked) in src/sys/net/if_ethersubr.c r1.240).
>>
>> basically the stack strips the 6 byte ethernet header before pushing
>> the packet into the ip stack, and forwarding causes it to be output
>> as an ethernet packet. the M_PREPEND of 8 bytes in ether_output
>> causes an mbuf to be prefixed cos the frame has 6 bytes free, not
>> 8.
>>
>
> Ah right, it's the same cluster we're transmitting as we have Rx'ed...
>

You need to get your mcl2k2 change in then.

>> the good news is that at least the prepended mbuf gets its ethernet
>> header correctly aligned.
>>
>> dlg
Re: network performance fun
On 25 October 2016 at 02:34, David Gwynne wrote:
>> I see. I will double check this tomorrow but your approach
>> looks solid.
>
> it's obviously an interaction between intel nics that do not align
> their ethernet headers correctly, and the M_PREPEND which i just
> changed (and you oked) in src/sys/net/if_ethersubr.c r1.240).
>
> basically the stack strips the 6 byte ethernet header before pushing
> the packet into the ip stack, and forwarding causes it to be output
> as an ethernet packet. the M_PREPEND of 8 bytes in ether_output
> causes an mbuf to be prefixed cos the frame has 6 bytes free, not
> 8.
>

Ah right, it's the same cluster we're transmitting as we have Rx'ed...

> the good news is that at least the prepended mbuf gets its ethernet
> header correctly aligned.
>
> dlg
Re: network performance fun
On Tue, Oct 25, 2016 at 12:50:50AM +0200, Mike Belopuhov wrote:
> On Tue, Oct 25, 2016 at 00:22 +0200, Hrvoje Popovski wrote:
> > On 24.10.2016. 23:36, Mike Belopuhov wrote:
> > > On Mon, Oct 24, 2016 at 19:04 +0200, Hrvoje Popovski wrote:
> > >> Hi all,
> > >>
> > >> OpenBSD box acts as transit router for /8 networks without pf and with
> > >> this sysctls
> > >>
> > >> ddb.console=1
> > >> kern.pool_debug=0
> > >> net.inet.ip.forwarding=1
> > >> net.inet.ip.ifq.maxlen=8192
> > >>
> > >> netstat
> > >> 11/8   192.168.11.2   UGS   0   114466419   -   8   ix0
> > >> 12/8   192.168.12.2   UGS   0   0           -   8   ix1
> > >> 13/8   192.168.13.2   UGS   0   0           -   8   myx0
> > >> 14/8   192.168.14.2   UGS   0   0           -   8   myx1
> > >> 15/8   192.168.15.2   UGS   0   0           -   8   em3
> > >> 16/8   192.168.16.2   UGS   0   89907239    -   8   em2
> > >> 17/8   192.168.17.2   UGS   0   65791508    -   8   bge0
> > >> 18/8   192.168.18.2   UGS   0   0           -   8   bge1
> > >>
> > >> while testing dlg@ "mcl2k2 mbuf clusters" patch with todays -current i
> > >> saw that performance with plain -current drops for about 300Kpps vs
> > >> -current from 06.10.2016. by bisecting cvs tree it seems that this
> > >> commit is guilty for this
> > >>
> > >> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c?rev=1.240&content-type=text/x-cvsweb-markup
> > >>
> > >
> > > I don't see how this change can affect performance in such a way
> > > unless you're sending jumbo packets, but then the packet rates
> > > are too high. Are you 100% sure it's this particular change?
> > >
> >
> > No, no, i'm not 100% sure. I was doing this to try to find bottleneck:
> >
> > cvs -q checkout -D "2016-10-XX" -P src
> >
> > 2016-10-06 - 900kpps
> > 2016-10-07 - 900kpps
> > 2016-10-10 - 900kpps
> > 2016-10-11 - 650kpps
> > 2016-10-11 with if_ethersubr.c 1.239 - 900kpps
> > ...
> > 2016-10-14 - 650kpps
> > 2016-10-14 with dlg@ patch - 900kpps
> > 2016-10-14 with dlg@ patch and with if_ethersubr.c 1.239 - 880kpps
> >
> > 2016-10-24 - results are in mail ...
> >
> > and then i looked at networking diffs from 2016-10-10 and 2016-10-11 and
> > it seems that if_ethersubr.c is guilty
> >
> > tests was done over ix only ...
> >
> > although as you can see with today's plain -current i'm getting 690kpps
> > and with today's -current with if_ethersubr.c 1.239 i'm getting 910kpps
> >
> > so i thought that there must be something with if_ethersubr.c
> >
>
> I see. I will double check this tomorrow but your approach
> looks solid.

it's obviously an interaction between intel nics that do not align
their ethernet headers correctly, and the M_PREPEND which i just
changed (and you oked) in src/sys/net/if_ethersubr.c r1.240).

basically the stack strips the 6 byte ethernet header before pushing
the packet into the ip stack, and forwarding causes it to be output
as an ethernet packet. the M_PREPEND of 8 bytes in ether_output
causes an mbuf to be prefixed cos the frame has 6 bytes free, not
8.

the good news is that at least the prepended mbuf gets its ethernet
header correctly aligned.

dlg
Re: network performance fun
On Tue, Oct 25, 2016 at 00:22 +0200, Hrvoje Popovski wrote:
> On 24.10.2016. 23:36, Mike Belopuhov wrote:
> > On Mon, Oct 24, 2016 at 19:04 +0200, Hrvoje Popovski wrote:
> >> Hi all,
> >>
> >> OpenBSD box acts as transit router for /8 networks without pf and with
> >> this sysctls
> >>
> >> ddb.console=1
> >> kern.pool_debug=0
> >> net.inet.ip.forwarding=1
> >> net.inet.ip.ifq.maxlen=8192
> >>
> >> netstat
> >> 11/8   192.168.11.2   UGS   0   114466419   -   8   ix0
> >> 12/8   192.168.12.2   UGS   0   0           -   8   ix1
> >> 13/8   192.168.13.2   UGS   0   0           -   8   myx0
> >> 14/8   192.168.14.2   UGS   0   0           -   8   myx1
> >> 15/8   192.168.15.2   UGS   0   0           -   8   em3
> >> 16/8   192.168.16.2   UGS   0   89907239    -   8   em2
> >> 17/8   192.168.17.2   UGS   0   65791508    -   8   bge0
> >> 18/8   192.168.18.2   UGS   0   0           -   8   bge1
> >>
> >> while testing dlg@ "mcl2k2 mbuf clusters" patch with todays -current i
> >> saw that performance with plain -current drops for about 300Kpps vs
> >> -current from 06.10.2016. by bisecting cvs tree it seems that this
> >> commit is guilty for this
> >>
> >> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c?rev=1.240&content-type=text/x-cvsweb-markup
> >>
> >
> > I don't see how this change can affect performance in such a way
> > unless you're sending jumbo packets, but then the packet rates
> > are too high. Are you 100% sure it's this particular change?
> >
>
> No, no, i'm not 100% sure. I was doing this to try to find bottleneck:
>
> cvs -q checkout -D "2016-10-XX" -P src
>
> 2016-10-06 - 900kpps
> 2016-10-07 - 900kpps
> 2016-10-10 - 900kpps
> 2016-10-11 - 650kpps
> 2016-10-11 with if_ethersubr.c 1.239 - 900kpps
> ...
> 2016-10-14 - 650kpps
> 2016-10-14 with dlg@ patch - 900kpps
> 2016-10-14 with dlg@ patch and with if_ethersubr.c 1.239 - 880kpps
>
> 2016-10-24 - results are in mail ...
>
> and then i looked at networking diffs from 2016-10-10 and 2016-10-11 and
> it seems that if_ethersubr.c is guilty
>
> tests was done over ix only ...
>
> although as you can see with today's plain -current i'm getting 690kpps
> and with today's -current with if_ethersubr.c 1.239 i'm getting 910kpps
>
> so i thought that there must be something with if_ethersubr.c
>

I see. I will double check this tomorrow but your approach
looks solid.

> > What kind of traffic are you testing this with?
> > I assume small IP or UDP packets, correct?
> >
>
> yes, 64 byte UDP without flowcontrol..
>
> > Actually I'd like to know what causes this.
> >
> > So far I've noticed that the code generating ICMP error doesn't
> > reserve space for the link header but it's unlikely a culprit.
> > (The diff was only compile tested so far...)
> >
>
> with -current from few minutes ago and with this diff i'm getting panic

MH_ALIGN gets in the way... This should solve it, but needs to be
tested with large packets.

diff --git sys/netinet/ip_icmp.c sys/netinet/ip_icmp.c
index cdd60aa..5542f64 100644
--- sys/netinet/ip_icmp.c
+++ sys/netinet/ip_icmp.c
@@ -210,7 +210,8 @@ icmp_do_error(struct mbuf *n, int type, int code, u_int32_t dest, int destmtu)
 		icmplen = MCLBYTES - ICMP_MINLEN - sizeof (struct ip);
 
 	m = m_gethdr(M_DONTWAIT, MT_HEADER);
-	if (m && (sizeof (struct ip) + icmplen + ICMP_MINLEN > MHLEN)) {
+	if (m && (max_linkhdr + sizeof(struct ip) + icmplen +
+	    ICMP_MINLEN > MHLEN)) {
 		MCLGET(m, M_DONTWAIT);
 		if ((m->m_flags & M_EXT) == 0) {
 			m_freem(m);
@@ -224,6 +225,8 @@ icmp_do_error(struct mbuf *n, int type, int code, u_int32_t dest, int destmtu)
 	m->m_len = icmplen + ICMP_MINLEN;
 	if ((m->m_flags & M_EXT) == 0)
 		MH_ALIGN(m, m->m_len);
+	else
+		m->m_data += max_linkhdr;
 	icp = mtod(m, struct icmp *);
 	if ((u_int)type > ICMP_MAXTYPE)
 		panic("icmp_error");
Re: network performance fun
On 25.10.2016. 0:22, Hrvoje Popovski wrote:
> On 24.10.2016. 23:36, Mike Belopuhov wrote:
>> On Mon, Oct 24, 2016 at 19:04 +0200, Hrvoje Popovski wrote:
>>> Hi all,
>>>
>>> OpenBSD box acts as transit router for /8 networks without pf and with
>>> this sysctls
>>>
>>> ddb.console=1
>>> kern.pool_debug=0
>>> net.inet.ip.forwarding=1
>>> net.inet.ip.ifq.maxlen=8192
>>>
>>> netstat
>>> 11/8   192.168.11.2   UGS   0   114466419   -   8   ix0
>>> 12/8   192.168.12.2   UGS   0   0           -   8   ix1
>>> 13/8   192.168.13.2   UGS   0   0           -   8   myx0
>>> 14/8   192.168.14.2   UGS   0   0           -   8   myx1
>>> 15/8   192.168.15.2   UGS   0   0           -   8   em3
>>> 16/8   192.168.16.2   UGS   0   89907239    -   8   em2
>>> 17/8   192.168.17.2   UGS   0   65791508    -   8   bge0
>>> 18/8   192.168.18.2   UGS   0   0           -   8   bge1
>>>
>>> while testing dlg@ "mcl2k2 mbuf clusters" patch with todays -current i
>>> saw that performance with plain -current drops for about 300Kpps vs
>>> -current from 06.10.2016. by bisecting cvs tree it seems that this
>>> commit is guilty for this
>>>
>>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c?rev=1.240&content-type=text/x-cvsweb-markup
>>>
>>
>> I don't see how this change can affect performance in such a way
>> unless you're sending jumbo packets, but then the packet rates
>> are too high. Are you 100% sure it's this particular change?
>>
>
> No, no, i'm not 100% sure. I was doing this to try to find bottleneck:
>
> cvs -q checkout -D "2016-10-XX" -P src
>
> 2016-10-06 - 900kpps
> 2016-10-07 - 900kpps
> 2016-10-10 - 900kpps
> 2016-10-11 - 650kpps
> 2016-10-11 with if_ethersubr.c 1.239 - 900kpps
> ...
> 2016-10-14 - 650kpps
> 2016-10-14 with dlg@ patch - 900kpps
> 2016-10-14 with dlg@ patch and with if_ethersubr.c 1.239 - 880kpps
>
> 2016-10-24 - results are in mail ...
>
> and then i looked at networking diffs from 2016-10-10 and 2016-10-11 and
> it seems that if_ethersubr.c is guilty
>
> tests was done over ix only ...
>
> although as you can see with today's plain -current i'm getting 690kpps
> and with today's -current with if_ethersubr.c 1.239 i'm getting 910kpps
>

just please see that bge performance are the same with if_ethersubr.c
1.239 or 1.241. i haven't test myx, will do that ...
Re: network performance fun
On 24.10.2016. 23:36, Mike Belopuhov wrote:
> On Mon, Oct 24, 2016 at 19:04 +0200, Hrvoje Popovski wrote:
>> Hi all,
>>
>> OpenBSD box acts as transit router for /8 networks without pf and with
>> this sysctls
>>
>> ddb.console=1
>> kern.pool_debug=0
>> net.inet.ip.forwarding=1
>> net.inet.ip.ifq.maxlen=8192
>>
>> netstat
>> 11/8   192.168.11.2   UGS   0   114466419   -   8   ix0
>> 12/8   192.168.12.2   UGS   0   0           -   8   ix1
>> 13/8   192.168.13.2   UGS   0   0           -   8   myx0
>> 14/8   192.168.14.2   UGS   0   0           -   8   myx1
>> 15/8   192.168.15.2   UGS   0   0           -   8   em3
>> 16/8   192.168.16.2   UGS   0   89907239    -   8   em2
>> 17/8   192.168.17.2   UGS   0   65791508    -   8   bge0
>> 18/8   192.168.18.2   UGS   0   0           -   8   bge1
>>
>> while testing dlg@ "mcl2k2 mbuf clusters" patch with todays -current i
>> saw that performance with plain -current drops for about 300Kpps vs
>> -current from 06.10.2016. by bisecting cvs tree it seems that this
>> commit is guilty for this
>>
>> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c?rev=1.240&content-type=text/x-cvsweb-markup
>>
>
> I don't see how this change can affect performance in such a way
> unless you're sending jumbo packets, but then the packet rates
> are too high. Are you 100% sure it's this particular change?
>

No, no, i'm not 100% sure. I was doing this to try to find bottleneck:

cvs -q checkout -D "2016-10-XX" -P src

2016-10-06 - 900kpps
2016-10-07 - 900kpps
2016-10-10 - 900kpps
2016-10-11 - 650kpps
2016-10-11 with if_ethersubr.c 1.239 - 900kpps
...
2016-10-14 - 650kpps
2016-10-14 with dlg@ patch - 900kpps
2016-10-14 with dlg@ patch and with if_ethersubr.c 1.239 - 880kpps

2016-10-24 - results are in mail ...

and then i looked at networking diffs from 2016-10-10 and 2016-10-11 and
it seems that if_ethersubr.c is guilty

tests was done over ix only ...

although as you can see with today's plain -current i'm getting 690kpps
and with today's -current with if_ethersubr.c 1.239 i'm getting 910kpps

so i thought that there must be something with if_ethersubr.c

> What kind of traffic are you testing this with?
> I assume small IP or UDP packets, correct?
>

yes, 64 byte UDP without flowcontrol..

> Actually I'd like to know what causes this.
>
> So far I've noticed that the code generating ICMP error doesn't
> reserve space for the link header but it's unlikely a culprit.
> (The diff was only compile tested so far...)
>
> diff --git sys/netinet/ip_icmp.c sys/netinet/ip_icmp.c
> index cdd60aa..b3ddea4 100644
> --- sys/netinet/ip_icmp.c
> +++ sys/netinet/ip_icmp.c
> @@ -208,19 +208,21 @@ icmp_do_error(struct mbuf *n, int type, int code,
> u_int32_t dest, int destmtu)
>
>  	if (icmplen + ICMP_MINLEN > MCLBYTES)
>  		icmplen = MCLBYTES - ICMP_MINLEN - sizeof (struct ip);
>
>  	m = m_gethdr(M_DONTWAIT, MT_HEADER);
> -	if (m && (sizeof (struct ip) + icmplen + ICMP_MINLEN > MHLEN)) {
> +	if (m && (max_linkhdr + sizeof(struct ip) + icmplen +
> +	    ICMP_MINLEN > MHLEN)) {
>  		MCLGET(m, M_DONTWAIT);
>  		if ((m->m_flags & M_EXT) == 0) {
>  			m_freem(m);
>  			m = NULL;
>  		}
>  	}
>  	if (m == NULL)
>  		goto freeit;
> +	m->m_data += max_linkhdr;
>  	/* keep in same rtable */
>  	m->m_pkthdr.ph_rtableid = n->m_pkthdr.ph_rtableid;
>  	m->m_len = icmplen + ICMP_MINLEN;
>  	if ((m->m_flags & M_EXT) == 0)
>  		MH_ALIGN(m, m->m_len);
>

with -current from few minutes ago and with this diff i'm getting panic

login: panic: pool_do_get: mbufpl free list modified: page 0xff00697e8000;
item addr 0xff00697e8800; offset 0x0 = 0x384500081c56 != 0xf2a4b1392c5839b2
Stopped at      Debugger+0x9:   leave
   TID     PID   UID   PRFLAGS   PFLAGS   CPU   COMMAND
*11010   11010    83   0x100012       0     2   ntpd
Debugger() at Debugger+0x9
panic() at panic+0xfe
pool_runqueue() at pool_runqueue
pool_get() at pool_get+0xb5
m_get() at m_get+0x28
m_getuio() at m_getuio+0x5c
sosend() at sosend+0x268
sendit() at sendit+0x258
sys_sendmsg() at sys_sendmsg+0xc0
syscall() at syscall+0x27b
--- syscall (number 28) ---
end of kernel
end trace frame: 0x7f7f11f0, count: 5
0xd9f5f7f362a:

https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{2}> show panic
pool_do_get: mbufpl free list modified: page 0xff00697e8000; item addr
0xff00697e8800; offset 0x0=0x384500081c56 != 0xf2a4b1392c5839b2

ddb{2}> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
pool_runqueue() at pool_runqueue
pool_get() at pool_get+0xb5
m_get() at m_get+0x28
m_getuio() at m_getuio+0x5c
sosend() at
Re: network performance fun
On Mon, Oct 24, 2016 at 19:04 +0200, Hrvoje Popovski wrote:
> Hi all,
>
> OpenBSD box acts as transit router for /8 networks without pf and with
> this sysctls
>
> ddb.console=1
> kern.pool_debug=0
> net.inet.ip.forwarding=1
> net.inet.ip.ifq.maxlen=8192
>
> netstat
> 11/8   192.168.11.2   UGS   0   114466419   -   8   ix0
> 12/8   192.168.12.2   UGS   0   0           -   8   ix1
> 13/8   192.168.13.2   UGS   0   0           -   8   myx0
> 14/8   192.168.14.2   UGS   0   0           -   8   myx1
> 15/8   192.168.15.2   UGS   0   0           -   8   em3
> 16/8   192.168.16.2   UGS   0   89907239    -   8   em2
> 17/8   192.168.17.2   UGS   0   65791508    -   8   bge0
> 18/8   192.168.18.2   UGS   0   0           -   8   bge1
>
> while testing dlg@ "mcl2k2 mbuf clusters" patch with todays -current i
> saw that performance with plain -current drops for about 300Kpps vs
> -current from 06.10.2016. by bisecting cvs tree it seems that this
> commit is guilty for this
>
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_ethersubr.c?rev=1.240&content-type=text/x-cvsweb-markup
>

I don't see how this change can affect performance in such a way
unless you're sending jumbo packets, but then the packet rates
are too high. Are you 100% sure it's this particular change?

What kind of traffic are you testing this with?
I assume small IP or UDP packets, correct?

> -current from 24.10.2016
>
> ix
> send      receive
> 690Kpps   690Kpps
> 700Kpps   680Kpps
> 800Kpps   350Kpps
> 1.4Mpps   305Kpps
> 14Mpps    305Kpps
>
> em
> send      receive
> 690Kpps   690Kpps
> 700Kpps   680Kpps
> 800Kpps   700Kpps
> 1.4Mpps   700Kpps
>
> bge
> send      receive
> 620Kpps   620Kpps
> 630Kpps   515Kpps
> 700Kpps   475Kpps
> 800Kpps   430Kpps
> 1.4Mpps   350Kpps
>
>
> -current with if_ethersubr.c version 1.239
>
> ix
> send      receive
> 910Kpps   910Kpps
> 920Kpps   820Kpps
> 1Mpps     825Kpps
> 1.4Mpps   700Kpps
> 14Mpps    700Kpps
>
> em
> send      receive
> 940Kpps   940Kpps
> 950Kpps   845Kpps
> 1Mpps     855Kpps
> 1.4Mpps   845Kpps
>
> bge
> send      receive
> 620Kpps   620Kpps
> 630Kpps   525Kpps
> 700Kpps   485Kpps
> 800Kpps   435Kpps
> 1.4Mpps   350Kpps
>
>
> applying dlg@ "mcl2k2 mbuf clusters" patch to todays -current brings
> performance back to ix and em ... bge is emotional as always :)
>
> ix
> send      receive
> 900Kpps   900Kpps
> 910Kpps   895Kpps
> 1Mpps     895Kpps
> 1.4Mpps   810Kpps
> 14Mpps    815Kpps
>
> em
> send      receive
> 940Kpps   940Kpps
> 950Kpps   930Kpps
> 1Mpps     920Kpps
> 1.4Mpps   930Kpps
>
> bge
> send      receive
> 620Kpps   620Kpps
> 630Kpps   520Kpps
> 700Kpps   475Kpps
> 800Kpps   430Kpps
> 1.4Mpps   366Kpps
>
>
> i honestly don't know what all that means, i'm just writing my
> observation ...
>

Actually I'd like to know what causes this.

So far I've noticed that the code generating ICMP error doesn't
reserve space for the link header but it's unlikely a culprit.
(The diff was only compile tested so far...)

diff --git sys/netinet/ip_icmp.c sys/netinet/ip_icmp.c
index cdd60aa..b3ddea4 100644
--- sys/netinet/ip_icmp.c
+++ sys/netinet/ip_icmp.c
@@ -208,19 +208,21 @@ icmp_do_error(struct mbuf *n, int type, int code, u_int32_t dest, int destmtu)
 
 	if (icmplen + ICMP_MINLEN > MCLBYTES)
 		icmplen = MCLBYTES - ICMP_MINLEN - sizeof (struct ip);
 
 	m = m_gethdr(M_DONTWAIT, MT_HEADER);
-	if (m && (sizeof (struct ip) + icmplen + ICMP_MINLEN > MHLEN)) {
+	if (m && (max_linkhdr + sizeof(struct ip) + icmplen +
+	    ICMP_MINLEN > MHLEN)) {
 		MCLGET(m, M_DONTWAIT);
 		if ((m->m_flags & M_EXT) == 0) {
 			m_freem(m);
 			m = NULL;
 		}
 	}
 	if (m == NULL)
 		goto freeit;
+	m->m_data += max_linkhdr;
 	/* keep in same rtable */
 	m->m_pkthdr.ph_rtableid = n->m_pkthdr.ph_rtableid;
 	m->m_len = icmplen + ICMP_MINLEN;
 	if ((m->m_flags & M_EXT) == 0)
 		MH_ALIGN(m, m->m_len);