Re: [code] [for discussion] map-trie

Francesco Mon, 16 Jun 2014 10:41:39 -0700

On 14 Jun 2014, at 18:34, Alex Rousskov <rouss...@measurement-factory.com> 
wrote:


> On 06/14/2014 12:39 AM, Kinkie wrote:
>> On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov
>> <rouss...@measurement-factory.com> wrote:
>>> On 06/11/2014 04:43 PM, Kinkie wrote:
>>>> level-compact-trie: the mean time is 11 sec; all runs take between
>>>> 10.782 and 11.354 secs; 18 Mb of core used
>>> 
>>>> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core.
>>> 
>>>> splay-based: mean time is 16.3sec; all runs take between 16.193 and
>>>> 16.427 secs; 14 Mb of core
>>> 
>>> How about std::map? Have you considered using a standard class for this
>>> purpose?
>> 
>> std::map doesn't offer efficient prefix matching, but sure, I'll try
>> to prepare a test run on the same data to establish a baseline.
> 
> Is explicit support to prefix matching actually required though? I am
> thinking of using std::map::lower_bound or similar to find the vicinity
> of a possible prefix match. For example, when the map has a b.c domain
> stored, and you are searching for a.b.c, the lower_bound method returns
> a "pointer" to the stored b.c node. After that, a simple prefix test
> between the two strings can return the final match/mismatch answer.
> 
> Needless to say, all domain names ought to be stored/compared in the
> reverse order of their labels. For example, a.b.c is stored internally
> as c.b.a. It just feels more natural for me to render it as a.b.c in a
> human-oriented text like this email.
> 
> And if we need to support prefixes that do not end at domain label
> boundaries (i.e., *foo.bar should match zzzfoo.bar), then we just treat
> each character as a "label".

More data: I haven't yet tested out your solution, but yet another round of 
TrieNode (it's easier), which uses a variable-length std::vector + an offset to 
compact the TrieNode to the actual range that's needed for that one node. The 
results are very interesting: This implementation uses the least memory of the 
trie-based solutes (16.5Mb for the test data set) and performance is  only 12% 
worse than full-fledged trie (8.5sec +- 0.3sec over 10 tests).

More to come.

        Kinkie

Re: [code] [for discussion] map-trie

Reply via email to