On 14 Jun 2014, at 18:34, Alex Rousskov <rouss...@measurement-factory.com> wrote:
> On 06/14/2014 12:39 AM, Kinkie wrote:
>> On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov
>> <rouss...@measurement-factory.com> wrote:
>>> On 06/11/2014 04:43 PM, Kinkie wrote:
>>>> level-compact-trie: the mean time is 11 sec; all runs take between
>>>> 10.782 and 11.354 secs; 18 Mb of core used
>>>
>>>> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core.
>>>
>>>> splay-based: mean time is 16.3sec; all runs take between 16.193 and
>>>> 16.427 secs; 14 Mb of core
>>>
>>> How about std::map? Have you considered using a standard class for this
>>> purpose?
>>
>> std::map doesn't offer efficient prefix matching, but sure, I'll try
>> to prepare a test run on the same data to establish a baseline.
>
> Is explicit support to prefix matching actually required though? I am
> thinking of using std::map::lower_bound or similar to find the vicinity
> of a possible prefix match. For example, when the map has a b.c domain
> stored, and you are searching for a.b.c, the lower_bound method returns
> a "pointer" to the stored b.c node. After that, a simple prefix test
> between the two strings can return the final match/mismatch answer.
>
> Needless to say, all domain names ought to be stored/compared in the
> reverse order of their labels. For example, a.b.c is stored internally
> as c.b.a. It just feels more natural for me to render it as a.b.c in a
> human-oriented text like this email.
>
> And if we need to support prefixes that do not end at domain label
> boundaries (i.e., *foo.bar should match zzzfoo.bar), then we just treat
> each character as a "label".

More data: I haven't tested your solution yet, but I did yet another
round of TrieNode (it's easier). This one uses a variable-length
std::vector plus an offset to compact each TrieNode to the range of
child slots that is actually needed for that one node.
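To make the vector-plus-offset layout concrete, here is a minimal sketch
of the idea. It is not the code that was benchmarked: the names
CompactTrieNode, childOrCreate, insert and hasPrefixOf are made up for
illustration, keys are assumed to be already reversed (e.g. "c.b.a"),
and matching is per character with no check for label boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: children_ only covers bytes
// [offset_, offset_ + children_.size()), not all 256 values.
class CompactTrieNode {
public:
    bool terminal = false; // true if a stored key ends at this node

    // Child for byte c, or nullptr if c is out of range or the slot is empty.
    const CompactTrieNode *child(unsigned char c) const {
        if (c < offset_ || c - offset_ >= children_.size())
            return nullptr;
        return children_[c - offset_].get();
    }

    // Child for byte c, widening the covered range and creating it if needed.
    CompactTrieNode *childOrCreate(unsigned char c) {
        if (children_.empty()) {
            offset_ = c;
            children_.resize(1);
        } else if (c < offset_) {
            // Grow downward: move the old slots to the tail of a wider vector.
            const std::size_t extra = offset_ - c;
            std::vector<std::unique_ptr<CompactTrieNode>> wider(children_.size() + extra);
            std::move(children_.begin(), children_.end(), wider.begin() + extra);
            children_ = std::move(wider);
            offset_ = c;
        } else if (c - offset_ >= children_.size()) {
            children_.resize(c - offset_ + 1); // grow upward
        }
        assert(c >= offset_ && c - offset_ < children_.size());
        std::unique_ptr<CompactTrieNode> &slot = children_[c - offset_];
        if (!slot)
            slot.reset(new CompactTrieNode);
        return slot.get();
    }

    // Number of child slots actually allocated for this node.
    std::size_t slots() const { return children_.size(); }

private:
    std::size_t offset_ = 0; // byte value stored in children_[0]
    std::vector<std::unique_ptr<CompactTrieNode>> children_;
};

// Store key (assumed to be already reversed) in the trie rooted at root.
void insert(CompactTrieNode &root, const std::string &key) {
    CompactTrieNode *node = &root;
    for (unsigned char c : key)
        node = node->childOrCreate(c);
    node->terminal = true;
}

// True if some stored key is a character-wise prefix of query.
bool hasPrefixOf(const CompactTrieNode &root, const std::string &query) {
    const CompactTrieNode *node = &root;
    if (node->terminal)
        return true;
    for (unsigned char c : query) {
        node = node->child(c);
        if (!node)
            return false;
        if (node->terminal)
            return true;
    }
    return false;
}
```

With "moc.elpmaxe" and "gro.diuqs" inserted, the root allocates only the
slots between 'g' and 'm' rather than a full 256-entry table, and a
lookup for "moc.elpmaxe.www" stops as soon as it reaches the terminal
node of the stored key.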
The results are very interesting: this implementation uses the least
memory of the trie-based solutions (16.5 Mb for the test data set), and
performance is only 12% worse than the full-fledged trie (8.5 sec +- 0.3
sec over 10 tests).

More to come.

Kinkie