Francois,
We did a bit of throughput testing with VPP back in April. The machines used are detailed in a somewhat infamous 1 April blog post:
https://www.netgate.com/blog/building-a-behemoth-router.html

Basically a pair of boxes with i7-6950X CPUs, some water cooling, and Intel XL710 cards (we also added one of our 8955 CPIC cards to each, because the real focus was IPsec throughput testing). Source and sink were a pair of 4-core Xeons with XL710 cards installed.

Anyway, using pktgen and 64-byte UDP frames, we got to 42.60 Mpps using 8 concurrent flows.

Some quick back-of-the-envelope math (a small worked version of this calculation is appended below the quoted thread):

  50,000,000,000 bits/sec / (84 * 8) bits/packet = 74,404,761.90 packets/sec
  (64-byte frames plus SFD (1 byte) + preamble (7 bytes) + IFG (12 bytes) = 84 bytes on the wire)

  42.6 / 74.4 = 0.573, or about 57% of the 50 Gbps line rate with an untuned setup.

You state that only 47% is achievable.

Thanks,

Jim

> On Oct 11, 2017, at 9:50 AM, Francois Ozog <francois.o...@linaro.org> wrote:
> 
> Hi Damjan,
> 
> When it comes to performance, the contiguity prevents DMA transaction
> coalescing, which is required to reach line rate for 64-byte packets at
> 25 Gbps and above.
> 
> On a PCIe gen 3 x8 slot, you have 50 Gbps and roughly 35M DMA
> transactions per second. This allows reaching 47% of the 64-byte line
> rate of a 50 Gbps network adapter. The same is true of PCIe x16 for a
> 100 Gbps card.
> 
> With hardware that can do this DMA coalescing, you need bounce buffers
> and pay the cost of a packet copy to approach line rate. Allowing
> non-contiguous packet data and metadata is a way to keep a zero-copy
> solution.
> 
> Packet copies are awfully bad when doing L3 forwarding with 1M routes
> on an internet router, way worse than independent reads...
> 
> Of course, you could say "use PCIe x16 for 50 Gbps cards", but that
> also reduces the number of ports that can be installed in a system,
> assuming we can find systems with enough x16 slots.
> 
> So even if the rework is significant, is the end goal worth the cost?
> 
> -FF
> 
> On 5 October 2017 at 11:42, Damjan Marion <dmarion.li...@gmail.com> wrote:
>> Francois,
>> 
>> Almost every VPP feature assumes that data is adjacent to vlib_buffer_t.
>> 
>> It will be a huge rework to make this happen, and it will slow down
>> performance, as it will introduce dependent reads in many places in
>> the code...
>> 
>> So to answer your question, we don’t have such plans.
>> 
>> Thanks,
>> 
>> Damjan
>> 
>>> On 4 Oct 2017, at 17:17, Francois Ozog <francois.o...@linaro.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Hardware that is capable of 50 Gbps and above (at 64-byte line rate)
>>> places packets next to each other in large memory zones rather than
>>> in individual memory buffers.
>>> 
>>> Handling packets without a copy would require vlib_buffer_t to allow
>>> packet data that is NOT contiguous with it.
>>> 
>>> Are there plans, or have there been discussions on this topic before?
>>> (I checked the archive and did not find any reference.)
>>> 
>>> -FF
>> 
> 
> -- 
> François-Frédéric Ozog | Director Linaro Networking Group
> T: +33.67221.6485
> francois.o...@linaro.org | Skype: ffozog

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
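
Appended note 1: a minimal, self-contained sketch in C of the back-of-the-envelope math above. It only plugs in numbers quoted in the thread (50 Gbps of usable bandwidth, the standard 20 bytes of per-frame Ethernet overhead, Jim's 42.60 Mpps measurement, and Francois's ~35M DMA transactions/sec estimate); nothing here is a new measurement.

/* Back-of-the-envelope line-rate math from the thread above.
 * All constants are the ones quoted in the emails. */
#include <stdio.h>

int main (void)
{
  const double link_bps       = 50e9;        /* ~50 Gbps usable on PCIe gen3 x8, per the thread */
  const double frame_bytes    = 64.0;        /* minimum Ethernet frame                          */
  const double overhead_bytes = 7 + 1 + 12;  /* preamble + SFD + inter-frame gap                */
  const double wire_bits      = (frame_bytes + overhead_bytes) * 8;  /* 672 bits per packet     */

  double line_rate_pps = link_bps / wire_bits;   /* ~74.40 Mpps theoretical maximum             */
  double measured_pps  = 42.60e6;                /* Jim's pktgen result, 8 flows                */
  double dma_txn_pps   = 35e6;                   /* Francois's DMA transaction estimate         */

  printf ("64B line rate @ 50 Gbps : %.2f Mpps\n", line_rate_pps / 1e6);
  printf ("measured 42.60 Mpps     : %.1f %% of line rate\n",
          100.0 * measured_pps / line_rate_pps);
  printf ("35M DMA transactions/s  : %.1f %% of line rate\n",
          100.0 * dma_txn_pps / line_rate_pps);
  return 0;
}

Running it reproduces the two figures being compared in the thread: about 57% for the measured 42.60 Mpps and about 47% for 35M DMA transactions per second.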
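
Appended note 2: a very simplified sketch of the adjacency assumption discussed in the quoted thread, i.e. metadata with packet bytes directly behind it versus metadata that points into a separate NIC-managed memory zone. The struct and macro names below are illustrative only and are not the real VPP definitions; the actual vlib_buffer_t carries many more fields.

/* Simplified sketch only -- not the real VPP definitions. */

/* Today: packet bytes live directly after the metadata, so the
 * "current data" pointer is just an offset from the buffer header. */
typedef struct
{
  short current_data;          /* offset of first byte of packet data     */
  unsigned short current_length;
  /* ... many more metadata fields in the real vlib_buffer_t ...          */
  unsigned char data[0];       /* packet bytes are adjacent to the header */
} adjacent_buffer_t;

#define adjacent_get_current(b) ((b)->data + (b)->current_data)

/* What the thread asks about: metadata points at packet bytes that may
 * live in a large, separate RX zone.  Following that pointer is the
 * extra dependent read the quoted discussion is concerned about.        */
typedef struct
{
  unsigned char *data;         /* pointer into a NIC-managed memory zone  */
  short current_data;
  unsigned short current_length;
} split_buffer_t;

#define split_get_current(b) ((b)->data + (b)->current_data)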