Hello DPDK Users:

I have some in-depth questions about DPDK's ACL matching capabilities.  To
answer them, I have been scouring the web, reading the available
documentation, and poring over the source code to the best of my ability.
As I dig through the source, I am struggling to understand all of the
nuances of the implementation, and I was hoping someone could offer a
higher-level discussion of how the ACL matching algorithm behaves.  If such
a discussion doesn't exist, I will continue to plow through the source, but
it isn't a fast process.

Currently, my organization has a product that does ACL matching using a
TCAM.  Because of this, I can make hard guarantees about our system's
performance.  A TCAM has some desirable traits:

1. Line-rate matching of 64K frames
2. No performance degradation as the rule set grows (until the TCAM is
exhausted, of course)
3. No restrictions on the rule set (i.e., no rule set behaves better or
worse than another; they all match at line rate)
4. Adding or removing a rule is fast
5. No preprocessing or "compilation" of the rule set is required
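
To make that matching model concrete, here is a minimal software sketch of
what a TCAM does per lookup (my own illustration, not our product code):
every entry holds a value and a mask, masked-out bits are don't cares, and
the hardware evaluates all entries in parallel, returning the
highest-priority hit.

    #include <stddef.h>
    #include <stdint.h>

    /* Software model of one TCAM entry: a key matches when every bit
     * covered by the mask equals the stored value; zero mask bits are
     * don't cares. */
    struct tcam_entry {
        uint64_t value;
        uint64_t mask;
        int      rule_id;
    };

    /* Hardware checks all entries in parallel; in software we scan in
     * priority order and return the first (highest-priority) hit. */
    static int
    tcam_lookup(const struct tcam_entry *tbl, size_t n, uint64_t key)
    {
        for (size_t i = 0; i < n; i++)
            if ((key & tbl[i].mask) == (tbl[i].value & tbl[i].mask))
                return tbl[i].rule_id;
        return -1;  /* no match */
    }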

While the TCAM is nice, I am interested in moving our product away from
specialized hardware by replacing the TCAM with DPDK.  However, I have
concerns about the ACL matching capability DPDK provides.  I would like to
be able to make assertions about which rule sets perform better than
others.  Specifically, I would like to know:

1.  What types of rules increase memory use?
2.  What types of rules lead to decreased performance?
3.  Has any benchmarking been done to find pathological worst-case rule
sets?  What do these rule sets look like?

Personally, I have built several "home grown" ACL matching algorithms on
top of DPDK.  I have experimented with many variants of tries, with
cross-producting all permutations of fields/dimensions, and with many other
approaches.  None of my implementations match the observed capability of
the DPDK-provided ACL implementation.  I have even generated (via script)
random ACL rule sets that are quite large (10K rules).  When I load them
into the l3fwd-acl example program, performance is still quite strong.
Rule sets that were a pathological worst case for my implementations were
generally handled gracefully by DPDK ACLs.  I need to understand this
implementation, but I tend to get lost when working through the source.
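
For reference, this is the shape of the rte_acl API flow I have been
exercising.  It is a deliberately tiny sketch with a single one-byte
bitmask field (the ACL documentation requires the first field to be one
byte); it assumes EAL is already initialized, and the names "acl-sketch"
and build_and_classify are mine, not DPDK's:

    #include <stdint.h>
    #include <string.h>

    #include <rte_acl.h>

    enum { PROTO_FIELD, NUM_FIELDS };

    /* One one-byte field matched via bitmask -- the same way the
     * l3fwd-acl example matches the IP protocol byte. */
    static struct rte_acl_field_def field_defs[NUM_FIELDS] = {
        {
            .type = RTE_ACL_FIELD_TYPE_BITMASK,
            .size = sizeof(uint8_t),
            .field_index = PROTO_FIELD,
            .input_index = PROTO_FIELD,
            .offset = 0,
        },
    };

    RTE_ACL_RULE_DEF(acl_rule, NUM_FIELDS);

    static int
    build_and_classify(uint8_t proto)
    {
        struct rte_acl_param prm = {
            .name = "acl-sketch",
            .socket_id = SOCKET_ID_ANY,
            .rule_size = RTE_ACL_RULE_SZ(NUM_FIELDS),
            .max_rule_num = 8,
        };
        struct rte_acl_ctx *ctx = rte_acl_create(&prm);
        if (ctx == NULL)
            return -1;

        /* Match TCP (protocol 6); userdata is returned on a hit. */
        struct acl_rule r = {
            .data = { .userdata = 100, .category_mask = 1,
                      .priority = 1 },
            .field[PROTO_FIELD] = { .value.u8 = 6,
                                    .mask_range.u8 = 0xff },
        };
        rte_acl_add_rules(ctx, (const struct rte_acl_rule *)&r, 1);

        /* The "compilation" step my questions are about: trie
         * construction and compaction happen inside rte_acl_build(). */
        struct rte_acl_config cfg = {
            .num_categories = 1,
            .num_fields = NUM_FIELDS,
        };
        memcpy(cfg.defs, field_defs, sizeof(field_defs));
        if (rte_acl_build(ctx, &cfg) != 0) {
            rte_acl_free(ctx);
            return -1;
        }

        const uint8_t *data[1] = { &proto };
        uint32_t results[1];
        rte_acl_classify(ctx, data, results, 1, 1);
        rte_acl_free(ctx);
        return (int)results[0];  /* 100 on match, 0 otherwise */
    }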

I understand that the ACL implementation uses tries with an 8-bit stride.
However, I don't understand how the DPDK implementation efficiently handles
don't cares or ranges that may appear in the various search fields.  My
homegrown trie implementation accounted for them in one of two ways:

1.  Expand the tree to account for don't cares -- This was effective, but
pathological rule sets caused huge amounts of RAM usage.  A don't-care bit
early in the rule resulted in many duplicate copies of subtrees under the
don't-care trie paths.
2.  Use a "ternary" trie with backtracking (sketched just below) -- This
was also effective, but since the trie was ternary, a lookup sometimes had
to backtrack to follow the don't-care path.  In the pathological worst case
the traversal essentially had to examine every rule, and performance was
abysmal.
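
For concreteness, here is a stripped-down version of the lookup from
approach 2 (my own code, heavily simplified): each node carries a third
don't-care edge, and a miss on the exact path forces a retry down it.

    #include <stddef.h>
    #include <stdint.h>

    /* A 1-bit ternary trie node: one child per bit value, plus a third
     * edge for rules with a don't care at this bit position. */
    struct tnode {
        struct tnode *kid[2];  /* follow bit 0 or bit 1 */
        struct tnode *any;     /* follow a don't-care edge */
        int rule_id;           /* >= 0 if a rule ends here, else -1 */
    };

    /* Backtracking lookup: at every node both the exact edge and the
     * don't-care edge can lead to a match, so a miss on the exact
     * path forces a retry down the don't-care path.  Pathological
     * rule sets drive this toward visiting every rule. */
    static int
    tlookup(const struct tnode *n, const uint8_t *key, int bit,
            int nbits)
    {
        if (n == NULL)
            return -1;
        if (bit == nbits)
            return n->rule_id;
        int b = (key[bit >> 3] >> (7 - (bit & 7))) & 1;
        int r = tlookup(n->kid[b], key, bit + 1, nbits);
        if (r >= 0)
            return r;                                /* exact path hit */
        return tlookup(n->any, key, bit + 1, nbits); /* backtrack */
    }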

What I need to understand is the DPDK ACL algorithm and the motivation
behind it.  It seems that ACLs are generated in two phases.  The build
phase evaluates each rule and then sorts the rules by "wildness."  This is
a concept/heuristic that is currently lost on me, though I take a guess at
it below.  I then see that the algorithm develops a trie for each rule.  It
seems that this initial trie is actually a 1-bit-stride trie that may be
quite large.  The algorithm then moves to the "generate" phase, where the
internal tries are compacted into something more space/time efficient.  It
is unclear to me how the algorithm avoids:

1. Memory bloat if runtime performance is favored
2. Degraded matching performance if memory efficiency is favored
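
For what it's worth, my current (unverified) reading of the "wildness"
heuristic in acl_bld.c is that each field of a rule is scored by the share
of its value space the rule matches, and rules are sorted and grouped by
that score before the tries are built.  Something in the spirit of:

    #include <stdint.h>

    /* My guess at the idea (unverified): score each field of a rule
     * by the share of its value space the rule matches -- an exact
     * value scores ~0, a full don't care scores 100 -- then sort and
     * group rules by that score before building tries.  Fields are
     * limited to 32 bits here to keep the arithmetic simple. */
    static uint32_t
    field_wildness_pct(uint64_t match_span, unsigned int field_bits)
    {
        uint64_t full_span = 1ULL << field_bits;  /* field_bits <= 32 */
        return (uint32_t)((match_span * 100) / full_span);
    }
    /* e.g. an IPv4 /8 prefix spans 2^24 values of a 32-bit field
     * -> 0 (well under 1%); a /0 wildcard spans 2^32 -> 100. */

If someone who knows the code can confirm or correct that reading, it would
help a lot.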

If anyone has background on the implemented algorithm that could help me
see the forest for the trees, I would be grateful.  I am quite willing to
document what I learn and feed that documentation back to the community if
it is of value.  I would also be willing to share my test results showing
performance observations for best/average/pathological rule sets.

Best Regards,
Nathan Jensen
