Re: [code] [for discussion] map-trie
On 14 Jun 2014, at 18:34, Alex Rousskov wrote: > On 06/14/2014 12:39 AM, Kinkie wrote: >> On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov >> wrote: >>> On 06/11/2014 04:43 PM, Kinkie wrote: level-compact-trie: the mean time is 11 sec; all runs take between 10.782 and 11.354 secs; 18 Mb of core used >>> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. >>> splay-based: mean time is 16.3sec; all runs take between 16.193 and 16.427 secs; 14 Mb of core >>> >>> How about std::map? Have you considered using a standard class for this >>> purpose? >> >> std::map doesn't offer efficient prefix matching, but sure, I'll try >> to prepare a test run on the same data to establish a baseline. > > Is explicit support to prefix matching actually required though? I am > thinking of using std::map::lower_bound or similar to find the vicinity > of a possible prefix match. For example, when the map has a b.c domain > stored, and you are searching for a.b.c, the lower_bound method returns > a "pointer" to the stored b.c node. After that, a simple prefix test > between the two strings can return the final match/mismatch answer. > > Needless to say, all domain names ought to be stored/compared in the > reverse order of their labels. For example, a.b.c is stored internally > as c.b.a. It just feels more natural for me to render it as a.b.c in a > human-oriented text like this email. > > And if we need to support prefixes that do not end at domain label > boundaries (i.e., *foo.bar should match zzzfoo.bar), then we just treat > each character as a "label". More data: I haven't yet tested out your solution, but yet another round of TrieNode (it's easier), which uses a variable-length std::vector + an offset to compact the TrieNode to the actual range that's needed for that one node. The results are very interesting: This implementation uses the least memory of the trie-based solutes (16.5Mb for the test data set) and performance is only 12% worse than full-fledged trie (8.5sec +- 0.3sec over 10 tests). More to come. Kinkie
Re: [code] [for discussion] map-trie
On 06/14/2014 12:39 AM, Kinkie wrote: > On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov > wrote: >> On 06/11/2014 04:43 PM, Kinkie wrote: >>> level-compact-trie: the mean time is 11 sec; all runs take between >>> 10.782 and 11.354 secs; 18 Mb of core used >> >>> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. >> >>> splay-based: mean time is 16.3sec; all runs take between 16.193 and >>> 16.427 secs; 14 Mb of core >> >> How about std::map? Have you considered using a standard class for this >> purpose? > > std::map doesn't offer efficient prefix matching, but sure, I'll try > to prepare a test run on the same data to establish a baseline. Is explicit support to prefix matching actually required though? I am thinking of using std::map::lower_bound or similar to find the vicinity of a possible prefix match. For example, when the map has a b.c domain stored, and you are searching for a.b.c, the lower_bound method returns a "pointer" to the stored b.c node. After that, a simple prefix test between the two strings can return the final match/mismatch answer. Needless to say, all domain names ought to be stored/compared in the reverse order of their labels. For example, a.b.c is stored internally as c.b.a. It just feels more natural for me to render it as a.b.c in a human-oriented text like this email. And if we need to support prefixes that do not end at domain label boundaries (i.e., *foo.bar should match zzzfoo.bar), then we just treat each character as a "label". HTH, Alex.
Re: [code] [for discussion] map-trie
FWIW my intent with libTrie was always to head for an LPC Trie implementation as the only implementation. This should be faster still than the current naive trie, and faster than the std::map compression technique you've implemented. [ IIRC the LPC paper was Implementing a Dynamic Compressed Trie from Stefan Nilsson, Matti Tikkanen ]- I have the pdf floating around here somewhere, but a quick google just hit paywalls these days :(. -Rob On 12 June 2014 10:43, Kinkie wrote: > Hi, > I've done some benchmarking, here are the results so far: > The proposal I'm suggesting for dstdomain acl is at > lp:~kinkie/squid/flexitrie . It uses the level-compact trie approach > I've described in this thread (NOT a Patricia trie). As a comparison, > lp:~kinkie/squid/domaindata-benchmark implements the same benchmark > using the current splay-based implementation. > > I've implemented a quick-n-dirty benchmarking tool > (src/acl/testDomainDataPerf); it takes as input an acl definition - > one dstdomain per line, as if it was included in squid.conf, and a > hostname list file (one destination hostname per line). > > I've run both variants of the code against the same dataset: a 4k > entries domain list, containing both hostnames and domain names, and a > 18M entries list of destination hostnames, both matching and not > matching entries in the domain list (about 7.5M hits, 10.5M misses). > > Tested 10 times on a Core 2 PC with plenty of RAM - source datasets > are in the fs cache. > level-compact-trie: the mean time is 11 sec; all runs take between > 10.782 and 11.354 secs; 18 Mb of core used > full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. > splay-based: mean time is 16.3sec; all runs take between 16.193 and > 16.427 secs; 14 Mb of core > > I expect compact-trie to scale better as the number of entries in the > list grows and with the number of clients and requests per second; > furthermore using it removes 50-100 LOC, and makes code more readable. > > IMO it is the best compromise in terms of performance, resources > useage and expected scalability; before pursuing this further however > I'd like to have some feedback. > > Thanks > > > On Wed, Jun 4, 2014 at 4:11 PM, Alex Rousskov > wrote: >> On 06/04/2014 02:06 AM, Kinkie wrote: >> >>> there are use cases >>> for using a Trie (e.g. prefix matching for dstdomain ACL); these may >>> be served by other data strcutures, but none we could come up with on >>> the spot. >> >> std::map and StringIdentifier are the ones we came up with on the spot. >> Both may require some adjustments to address the use case you are after. >> >> Alex. >> >> >>> A standard trie uses quite a lot of RAM for those use cases. >>> There are quite a lot of areas where we can improve so this one is not >>> urgent. Still I'd like to explore it as it's synchronous code (thus >>> easier for me to follow) and it's a nice area to tinker with. >>> >>> On Tue, Jun 3, 2014 at 10:12 PM, Alex Rousskov >>> wrote: On 06/03/2014 08:40 AM, Kinkie wrote: > Hi all, > as an experiment and to encourage some discussion I prepared an > alternate implementation of TrieNode which uses a std::map instead of > an array to store a node's children. > > The expected result is a worst case performance degradation on insert > and delete from O(N) to O(N log R) where N is the length of the > c-string being looked up, and R is the size of the alphabet (as R = > 256, we're talking about 8x worse). > > The expected benefit is a noticeable reduction in terms of memory use, > especially for sparse key-spaces; it'd be useful e.g. in some lookup > cases. > > Comments? To evaluate these optimizations, we need to know the targeted use cases. Amos mentioned ESI as the primary Trie user. I am not familiar with ESI specifics (and would be surprised to learn you want to optimize ESI!), but would recommend investigating a different approach if your goal is to optimize search/identification of strings from a known-in-advance set. Cheers, Alex. >>> >>> >>> >> > > > > -- > Francesco -- Robert Collins Distinguished Technologist HP Converged Cloud
Re: [code] [for discussion] map-trie
On Fri, Jun 13, 2014 at 4:50 PM, Alex Rousskov wrote: > On 06/11/2014 04:43 PM, Kinkie wrote: >> level-compact-trie: the mean time is 11 sec; all runs take between >> 10.782 and 11.354 secs; 18 Mb of core used > >> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. > >> splay-based: mean time is 16.3sec; all runs take between 16.193 and >> 16.427 secs; 14 Mb of core > > How about std::map? Have you considered using a standard class for this > purpose? std::map doesn't offer efficient prefix matching, but sure, I'll try to prepare a test run on the same data to establish a baseline. -- Kinkie
Re: [code] [for discussion] map-trie
On 06/11/2014 04:43 PM, Kinkie wrote: > level-compact-trie: the mean time is 11 sec; all runs take between > 10.782 and 11.354 secs; 18 Mb of core used > full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. > splay-based: mean time is 16.3sec; all runs take between 16.193 and > 16.427 secs; 14 Mb of core How about std::map? Have you considered using a standard class for this purpose? Thank you, Alex.
Re: [code] [for discussion] map-trie
On 12 Jun 2014, at 11:33, Francesco wrote: > It's currently one single typedef in libtrie. Could be made templatized with > no effort. > However libtrie needs some work still: it can't be walked, it's not > type-safe, it can't be used for algorithms. > I expect I could merge the work I've done on the ternary search trie > (lp:~kinkie/squid/tst) and this to have a fully-functional trie > implementation in a few days of work, exposing the same API as std::map, > having level-compact-trie or full-trie implementation via a template > parameter. .. a followup to myself: the current code has a bug: if the domain list contains .example.com www.example.com then a match for .example.com will fail, while it shouldn't. Fixing this is trivial but decreases the efficiency a bit, unfortunately. Kinkie
Re: [code] [for discussion] map-trie
On 12 Jun 2014, at 11:28, Amos Jeffries wrote: > On 12/06/2014 10:43 a.m., Kinkie wrote: >> Hi, >> I've done some benchmarking, here are the results so far: >> The proposal I'm suggesting for dstdomain acl is at >> lp:~kinkie/squid/flexitrie . It uses the level-compact trie approach >> I've described in this thread (NOT a Patricia trie). As a comparison, >> lp:~kinkie/squid/domaindata-benchmark implements the same benchmark >> using the current splay-based implementation. >> >> I've implemented a quick-n-dirty benchmarking tool >> (src/acl/testDomainDataPerf); it takes as input an acl definition - >> one dstdomain per line, as if it was included in squid.conf, and a >> hostname list file (one destination hostname per line). >> >> I've run both variants of the code against the same dataset: a 4k >> entries domain list, containing both hostnames and domain names, and a >> 18M entries list of destination hostnames, both matching and not >> matching entries in the domain list (about 7.5M hits, 10.5M misses). >> >> Tested 10 times on a Core 2 PC with plenty of RAM - source datasets >> are in the fs cache. >> level-compact-trie: the mean time is 11 sec; all runs take between >> 10.782 and 11.354 secs; 18 Mb of core used >> full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. >> splay-based: mean time is 16.3sec; all runs take between 16.193 and >> 16.427 secs; 14 Mb of core >> >> I expect compact-trie to scale better as the number of entries in the >> list grows and with the number of clients and requests per second; >> furthermore using it removes 50-100 LOC, and makes code more readable. >> >> IMO it is the best compromise in terms of performance, resources >> useage and expected scalability; before pursuing this further however >> I'd like to have some feedback. > > Does making compact vs full trie configurable affect compexity much? I > think we have sufficiently wide customer base that people will want to > go further now that its known there is a much faster version. It's currently one single typedef in libtrie. Could be made templatized with no effort. However libtrie needs some work still: it can't be walked, it's not type-safe, it can't be used for algorithms. I expect I could merge the work I've done on the ternary search trie (lp:~kinkie/squid/tst) and this to have a fully-functional trie implementation in a few days of work, exposing the same API as std::map, having level-compact-trie or full-trie implementation via a template parameter. Kinkie
Re: [code] [for discussion] map-trie
On 12/06/2014 10:43 a.m., Kinkie wrote: > Hi, > I've done some benchmarking, here are the results so far: > The proposal I'm suggesting for dstdomain acl is at > lp:~kinkie/squid/flexitrie . It uses the level-compact trie approach > I've described in this thread (NOT a Patricia trie). As a comparison, > lp:~kinkie/squid/domaindata-benchmark implements the same benchmark > using the current splay-based implementation. > > I've implemented a quick-n-dirty benchmarking tool > (src/acl/testDomainDataPerf); it takes as input an acl definition - > one dstdomain per line, as if it was included in squid.conf, and a > hostname list file (one destination hostname per line). > > I've run both variants of the code against the same dataset: a 4k > entries domain list, containing both hostnames and domain names, and a > 18M entries list of destination hostnames, both matching and not > matching entries in the domain list (about 7.5M hits, 10.5M misses). > > Tested 10 times on a Core 2 PC with plenty of RAM - source datasets > are in the fs cache. > level-compact-trie: the mean time is 11 sec; all runs take between > 10.782 and 11.354 secs; 18 Mb of core used > full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. > splay-based: mean time is 16.3sec; all runs take between 16.193 and > 16.427 secs; 14 Mb of core > > I expect compact-trie to scale better as the number of entries in the > list grows and with the number of clients and requests per second; > furthermore using it removes 50-100 LOC, and makes code more readable. > > IMO it is the best compromise in terms of performance, resources > useage and expected scalability; before pursuing this further however > I'd like to have some feedback. Does making compact vs full trie configurable affect compexity much? I think we have sufficiently wide customer base that people will want to go further now that its known there is a much faster version. Amos
Re: [code] [for discussion] map-trie
Hi, I've done some benchmarking, here are the results so far: The proposal I'm suggesting for dstdomain acl is at lp:~kinkie/squid/flexitrie . It uses the level-compact trie approach I've described in this thread (NOT a Patricia trie). As a comparison, lp:~kinkie/squid/domaindata-benchmark implements the same benchmark using the current splay-based implementation. I've implemented a quick-n-dirty benchmarking tool (src/acl/testDomainDataPerf); it takes as input an acl definition - one dstdomain per line, as if it was included in squid.conf, and a hostname list file (one destination hostname per line). I've run both variants of the code against the same dataset: a 4k entries domain list, containing both hostnames and domain names, and a 18M entries list of destination hostnames, both matching and not matching entries in the domain list (about 7.5M hits, 10.5M misses). Tested 10 times on a Core 2 PC with plenty of RAM - source datasets are in the fs cache. level-compact-trie: the mean time is 11 sec; all runs take between 10.782 and 11.354 secs; 18 Mb of core used full-trie: mean is 7.5 secs +- 0.2secs; 85 Mb of core. splay-based: mean time is 16.3sec; all runs take between 16.193 and 16.427 secs; 14 Mb of core I expect compact-trie to scale better as the number of entries in the list grows and with the number of clients and requests per second; furthermore using it removes 50-100 LOC, and makes code more readable. IMO it is the best compromise in terms of performance, resources useage and expected scalability; before pursuing this further however I'd like to have some feedback. Thanks On Wed, Jun 4, 2014 at 4:11 PM, Alex Rousskov wrote: > On 06/04/2014 02:06 AM, Kinkie wrote: > >> there are use cases >> for using a Trie (e.g. prefix matching for dstdomain ACL); these may >> be served by other data strcutures, but none we could come up with on >> the spot. > > std::map and StringIdentifier are the ones we came up with on the spot. > Both may require some adjustments to address the use case you are after. > > Alex. > > >> A standard trie uses quite a lot of RAM for those use cases. >> There are quite a lot of areas where we can improve so this one is not >> urgent. Still I'd like to explore it as it's synchronous code (thus >> easier for me to follow) and it's a nice area to tinker with. >> >> On Tue, Jun 3, 2014 at 10:12 PM, Alex Rousskov >> wrote: >>> On 06/03/2014 08:40 AM, Kinkie wrote: Hi all, as an experiment and to encourage some discussion I prepared an alternate implementation of TrieNode which uses a std::map instead of an array to store a node's children. The expected result is a worst case performance degradation on insert and delete from O(N) to O(N log R) where N is the length of the c-string being looked up, and R is the size of the alphabet (as R = 256, we're talking about 8x worse). The expected benefit is a noticeable reduction in terms of memory use, especially for sparse key-spaces; it'd be useful e.g. in some lookup cases. Comments? >>> >>> >>> To evaluate these optimizations, we need to know the targeted use cases. >>> Amos mentioned ESI as the primary Trie user. I am not familiar with ESI >>> specifics (and would be surprised to learn you want to optimize ESI!), >>> but would recommend investigating a different approach if your goal is >>> to optimize search/identification of strings from a known-in-advance set. >>> >>> >>> Cheers, >>> >>> Alex. >>> >> >> >> > -- Francesco
Re: [code] [for discussion] map-trie
On 06/04/2014 02:06 AM, Kinkie wrote: > there are use cases > for using a Trie (e.g. prefix matching for dstdomain ACL); these may > be served by other data strcutures, but none we could come up with on > the spot. std::map and StringIdentifier are the ones we came up with on the spot. Both may require some adjustments to address the use case you are after. Alex. > A standard trie uses quite a lot of RAM for those use cases. > There are quite a lot of areas where we can improve so this one is not > urgent. Still I'd like to explore it as it's synchronous code (thus > easier for me to follow) and it's a nice area to tinker with. > > On Tue, Jun 3, 2014 at 10:12 PM, Alex Rousskov > wrote: >> On 06/03/2014 08:40 AM, Kinkie wrote: >>> Hi all, >>> as an experiment and to encourage some discussion I prepared an >>> alternate implementation of TrieNode which uses a std::map instead of >>> an array to store a node's children. >>> >>> The expected result is a worst case performance degradation on insert >>> and delete from O(N) to O(N log R) where N is the length of the >>> c-string being looked up, and R is the size of the alphabet (as R = >>> 256, we're talking about 8x worse). >>> >>> The expected benefit is a noticeable reduction in terms of memory use, >>> especially for sparse key-spaces; it'd be useful e.g. in some lookup >>> cases. >>> >>> Comments? >> >> >> To evaluate these optimizations, we need to know the targeted use cases. >> Amos mentioned ESI as the primary Trie user. I am not familiar with ESI >> specifics (and would be surprised to learn you want to optimize ESI!), >> but would recommend investigating a different approach if your goal is >> to optimize search/identification of strings from a known-in-advance set. >> >> >> Cheers, >> >> Alex. >> > > >
Re: [code] [for discussion] map-trie
Hi, to align the others on what we discussed about on IRC (feel free to correct me if I'm remembering inappropriately): there are use cases for using a Trie (e.g. prefix matching for dstdomain ACL); these may be served by other data strcutures, but none we could come up with on the spot. A standard trie uses quite a lot of RAM for those use cases. There are quite a lot of areas where we can improve so this one is not urgent. Still I'd like to explore it as it's synchronous code (thus easier for me to follow) and it's a nice area to tinker with. On Tue, Jun 3, 2014 at 10:12 PM, Alex Rousskov wrote: > On 06/03/2014 08:40 AM, Kinkie wrote: >> Hi all, >> as an experiment and to encourage some discussion I prepared an >> alternate implementation of TrieNode which uses a std::map instead of >> an array to store a node's children. >> >> The expected result is a worst case performance degradation on insert >> and delete from O(N) to O(N log R) where N is the length of the >> c-string being looked up, and R is the size of the alphabet (as R = >> 256, we're talking about 8x worse). >> >> The expected benefit is a noticeable reduction in terms of memory use, >> especially for sparse key-spaces; it'd be useful e.g. in some lookup >> cases. >> >> Comments? > > > To evaluate these optimizations, we need to know the targeted use cases. > Amos mentioned ESI as the primary Trie user. I am not familiar with ESI > specifics (and would be surprised to learn you want to optimize ESI!), > but would recommend investigating a different approach if your goal is > to optimize search/identification of strings from a known-in-advance set. > > > Cheers, > > Alex. > -- Francesco
Re: [code] [for discussion] map-trie
On 06/03/2014 08:40 AM, Kinkie wrote: > Hi all, > as an experiment and to encourage some discussion I prepared an > alternate implementation of TrieNode which uses a std::map instead of > an array to store a node's children. > > The expected result is a worst case performance degradation on insert > and delete from O(N) to O(N log R) where N is the length of the > c-string being looked up, and R is the size of the alphabet (as R = > 256, we're talking about 8x worse). > > The expected benefit is a noticeable reduction in terms of memory use, > especially for sparse key-spaces; it'd be useful e.g. in some lookup > cases. > > Comments? To evaluate these optimizations, we need to know the targeted use cases. Amos mentioned ESI as the primary Trie user. I am not familiar with ESI specifics (and would be surprised to learn you want to optimize ESI!), but would recommend investigating a different approach if your goal is to optimize search/identification of strings from a known-in-advance set. Cheers, Alex.
Re: [code] [for discussion] map-trie
On Tue, Jun 3, 2014 at 5:52 PM, Amos Jeffries wrote: > On 4/06/2014 2:40 a.m., Kinkie wrote: >> Hi all, >> as an experiment and to encourage some discussion I prepared an >> alternate implementation of TrieNode which uses a std::map instead of >> an array to store a node's children. >> >> The expected result is a worst case performance degradation on insert >> and delete from O(N) to O(N log R) where N is the length of the >> c-string being looked up, and R is the size of the alphabet (as R = >> 256, we're talking about 8x worse). >> >> The expected benefit is a noticeable reduction in terms of memory use, >> especially for sparse key-spaces; it'd be useful e.g. in some lookup >> cases. >> >> Comments? >> > > Since this is in libTrie used for ESI message body parsing the major > operations being done on it are insert and delete. These happen a > relatively large number of times per-request as the XML payload/body > gets parsed and a whole tree is constructed+destructed. er.. Trie doesn't support delete. Only add, find and findPrefix. > So the worst-case areas are worse thay they would be for example in ACLs > based on Trie. Note I'm not suggesting to blindly replace TrieNode with this. Assuming we don't go for something more complete (e.g. the ternary search trie I'm tinkering with), and stick to this very simple API, then the easiest thing to do is to make Trie a generic, taking as argument the template to be used. So there'd be a FastTrie (aka Trie) and a CompactTrie (aka Trie). -- Kinkie
Re: [code] [for discussion] map-trie
On 4/06/2014 2:40 a.m., Kinkie wrote: > Hi all, > as an experiment and to encourage some discussion I prepared an > alternate implementation of TrieNode which uses a std::map instead of > an array to store a node's children. > > The expected result is a worst case performance degradation on insert > and delete from O(N) to O(N log R) where N is the length of the > c-string being looked up, and R is the size of the alphabet (as R = > 256, we're talking about 8x worse). > > The expected benefit is a noticeable reduction in terms of memory use, > especially for sparse key-spaces; it'd be useful e.g. in some lookup > cases. > > Comments? > Since this is in libTrie used for ESI message body parsing the major operations being done on it are insert and delete. These happen a relatively large number of times per-request as the XML payload/body gets parsed and a whole tree is constructed+destructed. So the worst-case areas are worse thay they would be for example in ACLs based on Trie. Amos
[code] [for discussion] map-trie
Hi all, as an experiment and to encourage some discussion I prepared an alternate implementation of TrieNode which uses a std::map instead of an array to store a node's children. The expected result is a worst case performance degradation on insert and delete from O(N) to O(N log R) where N is the length of the c-string being looked up, and R is the size of the alphabet (as R = 256, we're talking about 8x worse). The expected benefit is a noticeable reduction in terms of memory use, especially for sparse key-spaces; it'd be useful e.g. in some lookup cases. Comments? -- Francesco === modified file 'lib/libTrie/Trie.cc' --- lib/libTrie/Trie.cc 2014-06-03 10:49:08 + +++ lib/libTrie/Trie.cc 2014-06-03 14:16:21 + @@ -48,7 +48,7 @@ return head->add(aString, theLength, privatedata, transform); } -head = new TrieNode; +head = new node_type; return head->add(aString, theLength, privatedata, transform); } === modified file 'lib/libTrie/Trie.h' --- lib/libTrie/Trie.h 2014-06-03 10:49:08 + +++ lib/libTrie/Trie.h 2014-06-03 14:20:17 + @@ -58,7 +58,8 @@ bool add(char const *, size_t, void *); private: -TrieNode *head; +typedef MapTrieNode node_type; +node_type *head; /* transfor each 8 bits in the element */ TrieCharTransform *transform; === modified file 'lib/libTrie/TrieNode.cc' --- lib/libTrie/TrieNode.cc 2014-06-03 11:59:56 + +++ lib/libTrie/TrieNode.cc 2014-06-03 13:09:24 + @@ -24,13 +24,13 @@ #include #endif -TrieNode::TrieNode() : _privateData(NULL) +ArrayTrieNode::ArrayTrieNode() : _privateData(NULL) { for (int i = 0; i < 256; ++i) internal[i] = NULL; } -TrieNode::~TrieNode() +ArrayTrieNode::~ArrayTrieNode() { for (int i = 0; i < 256; ++i) delete internal[i]; @@ -38,7 +38,7 @@ /* as for find */ bool -TrieNode::add(char const *aString, size_t theLength, void *privatedata, TrieCharTransform *transform) +ArrayTrieNode::add(char const *aString, size_t theLength, void *privatedata, TrieCharTransform *transform) { /* We trust that privatedata and existant keys have already been checked */ @@ -46,7 +46,7 @@ int index = transform ? (*transform)(*aString): *aString; if (!internal[index]) -internal[index] = new TrieNode; +internal[index] = new ArrayTrieNode; return internal[index]->add(aString + 1, theLength - 1, privatedata, transform); } else { === modified file 'lib/libTrie/TrieNode.h' --- lib/libTrie/TrieNode.h 2014-06-03 10:49:08 + +++ lib/libTrie/TrieNode.h 2014-06-03 14:12:01 + @@ -29,14 +29,14 @@ * i.e. M-ary internal node sizes etc */ -class TrieNode +class ArrayTrieNode { public: -TrieNode(); -~TrieNode(); -TrieNode(TrieNode const &); -TrieNode &operator= (TrieNode const &); +ArrayTrieNode(); +~ArrayTrieNode(); +ArrayTrieNode(ArrayTrieNode const &); +ArrayTrieNode &operator= (ArrayTrieNode const &); /* Find a string. * If found, return the private data. @@ -58,7 +58,7 @@ * internal[0] is the terminal node for * a string and may not be used */ -TrieNode * internal[256]; +ArrayTrieNode * internal[256]; /* If a string ends here, non NULL */ void *_privateData; @@ -66,7 +66,7 @@ /* recursive. TODO? make iterative */ void * -TrieNode::find (char const *aString, size_t theLength, TrieCharTransform *transform, bool const prefix) const +ArrayTrieNode::find (char const *aString, size_t theLength, TrieCharTransform *transform, bool const prefix) const { if (theLength) { int index = -1; @@ -92,4 +92,79 @@ return _privateData; } } + +#include +class MapTrieNode { +public: +MapTrieNode() : _privateData(NULL) {} +inline ~MapTrieNode(); +MapTrieNode(MapTrieNode const &); +MapTrieNode &operator=(MapTrieNode const &); + +inline void *find(char const *, size_t, TrieCharTransform *, bool const prefix) const; +inline bool add (char const *, size_t, void *, TrieCharTransform *); + +private: +typedef std::map internalmap; +internalmap internal; +void *_privateData; +}; + +MapTrieNode::~MapTrieNode() +{ +using std::map; +for (internalmap::iterator i =internal.begin(); i != internal.end(); ++i) +if (i->second) +delete i->second; +} + +bool +MapTrieNode::add(char const *aString, size_t theLength, void *privatedata, TrieCharTransform *transform) +{ +if (theLength) { +int index = transform ? (*transform)(*aString): *aString; +const internalmap::iterator i = internal.find(index); + +if (i == internal.end()) +internal[index] = new MapTrieNode; + +return internal[index]->add(aString + 1, theLength - 1, privatedata, transform); +} else { +/* terminal node */ + +if (_privateData) +return false; + +_privateData = privatedata; + +return true; +} +}