Dear David,

A bit of history. We worked on VPP for a decade before making any serious effort to multi-thread it. The first scheme I tried was to break the graph into reconfigurable pipeline stages. Effective partitioning of the graph is highly workload-dependent, and it can change in a heartbeat. The resulting system runs at the speed of the slowest pipeline stage. In terms of easily measured inter-thread handoff cost, it's not awful: 2-3 clocks/pkt. But handing vectors of packets between threads can cause a festival of cache-coherence traffic, and that can easily undo the positive effects of DDIO (packet-data DMA into the cache hierarchy).

We actually use the scheme you describe in a very fine-grained way: dual- and quad-loop graph dispatch functions process 2 or 4 packets at the same time. Until we run out of registers, a superscalar CPU can "do the same thing to 2 or 4 packets at the same time" pretty effectively. Including memory-hierarchy stalls, VPP averages more than two instructions retired per clock cycle.

At the graph-node level, I can't see how to leverage this technique. Presenting [identical] vectors to 2 (or more) nodes running on multiple threads would mean that (1) the parallelized subgraph would run at the speed of the slowest node, (2) you'd pay the handoff costs already discussed above, (3) you'd need an expensive algorithm to make sure that all vector replicas were finished before reentering sequential processing, and (4) none of the graph nodes we've ever constructed are free of ordering constraints. Every node alters packet state in a meaningful way, or it wouldn't be worth having. (😉)

We've had considerable success with flow-hashing across a set of identical graph replicas [worker threads], even when available hardware RSS hashing is not useful [think about NATted UDP traffic].

Hope this is of some interest.
Thanks… Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of David Bainbridge
Sent: Friday, January 26, 2018 12:39 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP Graph Optimization

I have just started to read up on VPP/FD.io, and I have a question about graph optimization. I was wondering whether (as I suspect) this has already been thought about and either planned or decided against.

The documentation I found on VPP essentially says that VPP uses batch processing and runs all packets in a vector through one step before proceeding to the next step. The claim is that this provides better overall throughput because of instruction caching. I was wondering whether optimizing the graph to identify where concurrency can be leveraged has been considered, as well as where you could process the vector by two steps with an offset. If this is possible, then steps could be pinned to cores, and perhaps both concurrency and instruction caching could be leveraged.

For example, assume the following graph: [graph figure not preserved]

In this graph, steps B and C can be done concurrently, as they don't "modify" the vector. Steps D and E can't be done concurrently, but as they don't require look-back/look-forward, they can be done at an offset. What I am suggesting is: if there are enough cores, then steps could be pinned to cores to achieve the benefits of instruction caching, and after step A is complete, steps B and C could be done concurrently. After B and C are complete, D can be started, and as D completes processing of a packet, it can then be processed by E (i.e., the entire vector does not need to be processed by D before processing by E is started).

I make no argument that this doesn't increase complexity; it also introduces coordination costs that don't exist today. To be fair, offset processing could be viewed as splitting the original large vector into smaller vectors and processing the smaller vectors from start to finish (almost dynamic optimization based on dynamic vector resizing).
Just curious to hear others' thoughts, and whether some of this has been thought through or experimented with. As I said, just thinking off the cuff and wondering; not fully thought through.

with respect [avèk respè],
/david
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev