Francois,

We did a bit of throughput testing with VPP back in April.  The machines used 
are detailed in a somewhat infamous 1 April blog post:
https://www.netgate.com/blog/building-a-behemoth-router.html

Basically a pair of boxes with i7-6950X CPUs, some water cooling, and Intel 
XL710 cards (we also added one of our 8955 CPIC cards to each, because the 
real focus was IPsec throughput testing).

Source and sink were a pair of 4C Xeons with XL710 cards installed.

Anyway, using pktgen and 64-byte UDP frames, we got to 42.60 Mpps with 8 
concurrent flows.

Some quick back-of-the-envelope math:

50,000,000,000 bits/sec / (84 * 8) bits/packet   (64-byte tinygrams plus the SFD 
(1 byte) + preamble (7 bytes) + IFG (12 bytes) = 84 bytes on the wire)
= 74,404,761.90 packets/sec

42.6 / 74.4 = 0.5726, or about 57% of the 50 Gbps line rate with an untuned setup.
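
If anyone wants to rerun the numbers, here is a minimal standalone sketch of the 
same arithmetic (plain C, not VPP code; the constants are just the figures above):

#include <stdio.h>

int main(void)
{
    double link_bps      = 50e9;              /* the 50 Gbps figure used above */
    double wire_bytes    = 64 + 7 + 1 + 12;   /* frame + preamble + SFD + IFG = 84 */
    double line_rate_pps = link_bps / (wire_bytes * 8.0);
    double measured_pps  = 42.60e6;           /* pktgen result with 8 flows */

    printf("line rate: %.2f Mpps\n", line_rate_pps / 1e6);   /* ~74.40 Mpps */
    printf("achieved : %.1f%% of line rate\n",
           100.0 * measured_pps / line_rate_pps);            /* ~57.3 */
    return 0;
}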

You state that only 47% is achievable.

Thanks,

Jim 

> On Oct 11, 2017, at 9:50 AM, Francois Ozog <francois.o...@linaro.org> wrote:
> 
> Hi Damjan,
> 
> When it comes to performance, the contiguity prevents the DMA transaction
> coalescing that is required to reach line rate for 64-byte packets at
> 25 Gbps and above.
> 
> On a PCIe gen 3 x8 slot, you have 50 Gbps and roughly 35M DMA
> transactions per second. This allows reaching 47% of the 64-byte line
> rate of a 50 Gbps network adapter. The same is true for PCIe x16 and a
> 100 Gbps card.
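
For reference, here is a small sketch of where that 47% appears to come from, 
assuming one DMA transaction per 64-byte packet (my assumption, not stated above):

#include <stdio.h>

int main(void)
{
    double dma_txn_per_sec = 35e6;                /* PCIe gen3 x8 figure quoted above */
    double line_rate_pps   = 50e9 / (84.0 * 8.0); /* ~74.4 Mpps for 64-byte frames */

    printf("DMA-limited share of line rate: %.0f%%\n",
           100.0 * dma_txn_per_sec / line_rate_pps);   /* ~47 */
    return 0;
}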
> 
> With hardware that can do this DMA coalescing, you need bounce
> buffers, and the cost of a packet copy, to approach line rate. Allowing
> non-contiguous packet data and metadata is a way to keep a zero-copy
> solution.
> 
> Packet copy is awfully bad when doing L3 forwarding with a 1M-route
> table on an internet router, way worse than independent reads...
> 
> Of course, you could say to use PCIe x16 for 50 Gbps cards, but that
> also reduces the number of ports that can be installed in a system,
> assuming we can find systems with enough x16 slots.
> 
> So even if the rework is significant, is the end goal worth the cost?
> 
> -FF
> 
> On 5 October 2017 at 11:42, Damjan Marion <dmarion.li...@gmail.com> wrote:
>> Francois,
>> 
>> Almost every VPP feature assumes that data is adjacent to vlib_buffer_t.
>> 
>> It will be a huge rework to make this happen, and it will slow down
>> performance, as it will introduce a dependent read in many places in
>> the code...
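
To make the "data adjacent to vlib_buffer_t" vs. "dependent read" point concrete, 
here is a simplified sketch of the two layouts. This is not the actual 
vlib_buffer_t definition; the field names are made up for illustration:

#include <stdint.h>

/* Adjacent layout: packet data follows the metadata in the same buffer,
 * so its address can be computed without touching memory. */
typedef struct {
    int16_t  current_data;    /* offset of first byte of packet data */
    uint16_t current_length;  /* bytes of packet data */
    uint8_t  data[];          /* packet bytes start right here */
} buffer_adjacent_t;

static inline uint8_t *
adjacent_get_current (buffer_adjacent_t *b)
{
    return b->data + b->current_data;   /* address = buffer + constant + offset */
}

/* Non-contiguous layout: metadata points at packet bytes elsewhere
 * (e.g. a large DMA region filled back-to-back by the NIC). */
typedef struct {
    uint16_t current_length;
    uint8_t *data;            /* pointer must be loaded before data can be touched */
} buffer_indirect_t;

static inline uint8_t *
indirect_get_current (buffer_indirect_t *b)
{
    return b->data;           /* dependent load: a pointer chase */
}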
>> 
>> So to answer your question, we don’t have such plans.
>> 
>> Thanks,
>> 
>> Damjan
>> 
>>> On 4 Oct 2017, at 17:17, Francois Ozog <francois.o...@linaro.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Hardware that is capable of 50 Gbps and above (at 64-byte line rate)
>>> places packets next to each other in large memory zones rather than in
>>> individual memory buffers.
>>> 
>>> Handling packets without a copy would require vlib_buffer_t to allow
>>> packet data that is NOT consecutive to it.
>>> 
>>> Are there plans, or have there been discussions on this topic before?
>>> (I checked the archive and did not find any reference.)
>>> 
>>> -FF
>> 
> 
> 
> 
> -- 
> François-Frédéric Ozog | Director Linaro Networking Group
> T: +33.67221.6485
> francois.o...@linaro.org | Skype: ffozog

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
