On Fri, Apr 17, 2015 at 06:24:55PM +0100, Paul Jakma wrote: > On Mon, 13 Apr 2015, David Lamparter wrote: > > > Prefix lists were implemented with a simple linear list that is scanned > > sequentially. This is, of course, extremely inefficient as it scales by > > O(n). This patch adds a trie-ish data structure that allows quickly > > descending based on the prefix. > > > > Note that the trie structure used here is designed for real-world use, > > hence it uses a relatively crude fixed-size bytewise table instead of > > some fancy balancing scheme. It is quite cacheline efficient. > > > > Using real-world routeserver prefix lists, matching against a fulltable > > dump: > > > > entries before after factor > > 9103 63.8s .0124s 5142x > > 772 4.52s .0101s 445.3x > > 86 .445s .0098s 45.51x > > 7 .0379s .0099s 3.834x > > 2 .0136s .0095s 1.440x > > 1 .0084s .0095s .879x > > Nice speedups. > > One thing, maybe stick the trie in a different file, and put a generic > interface over it. We already have a binary trie in lib/table, be nice > they use the same thing. Indeed, you could even use lib/table for this? > Why not?
lib/table.c is not optimal because: - it won't out of the box support multiple entry for a single prefix (prefix lists can contain that with different le/ge values) - lookups touch considerably more cache lines - one per prefix bit, because it's a binary tree - which is the most expensive thing on modern CPUs. The byte-level trie jumps, well, a byte at a time, with little cache pollution. Even more, the list off the pointer in the trie will only have one entry for "usual" prefix lists, meaning we only need to grab 3 cachelines to get to a /16. lib/table.c will load 17 cachelines, an order of magnitude more... (and, additionally, the working set of frequently-accessed data is larger with lots of nodes being candidates to load; on the other hand the cachelines for the bytewise approach contain neighboring information too and are fewer in total...) The structure will start behaving badly for prefix list entries longer than the depth of the table - where it regresses into linked list - but testing this against actual data indicates it's not a serious problem. We could try hanging off a lib/table.c trie instead of the linked list... (That said, I guess lib/table.c would still perform in under a second for all cases in the above table.) > Also, the bucketing thing in this trie is sort of like a fixed-bucket > B-tree, but without balancing. Maybe one day it should become a b-tree. Since this is fixed bytewise, the only thing to balance would be the depth of the trie. We could look at that in a future patch. Moving to a B-tree seems to introduce complexity while making memory access efficiency worse. The beauty of this trie implementation is in its simplicity and cacheline efficiency. > And that raises the question, is this bucketed tree correct for plist > addresses data? If so, should lib/table use this? If not, why is the > distribution of plist prefixes different from route prefixes? ... I don't understand the correctness question, can you re-word? I've put some thought into adapting something similar for lib/table.c; however I haven't come to a satisfying idea yet. Either way the code added here is relatively special-purpose and not all that long, so I don't see a major advantage in splitting it off. -David _______________________________________________ Quagga-dev mailing list [email protected] https://lists.quagga.net/mailman/listinfo/quagga-dev
