Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps
Don Stewart wrote: wren: Alex Mason wrote: TernaryTrees is a package that extends Data.Set ad Data.Map with some ternary tree structures, based on the article [http://www.pcplus.co.uk/node/3074/] . For the string (or rather ByteString) version: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie Which has a number of other significant performance improvements (e.g. node fusion, ByteString instead of String) and a highly expressive interface. Because it uses ByteStrings it can trie any type which can be serialized into a vector of bits[1], albeit indirectly. The real trick with tries is not in just having them[2], it's in having the right interface to make use of what they're good at. For example, if I have multiple tries, I'd like to merge them without doing it element by element[3]. Or if I know I'm going to be making a number of similar queries, it'd be nice if I could cache my position in the trie[4] to avoid repeating the work for the prefixes of all my queries[5]. Using tricks like these leads to significant improvements over using them like hashtables; tries aren't hashtables just like lists aren't arrays. Do you have benchmarks? Somewhere in my email archive (care of Mark Wotton). I'll see if I can dig them up this weekend. The biggest issue here is finding nice datasets (and tasks) to give reasonable benchmarks for. Reading in all of /usr/dict (or the Brown corpus) and looking up all keys only gives one perspective (or two), and not necessarily the most helpful one for real world use. I haven't found any good dataset/task suites like there are for the Language Benchmarks Game, though I'd love to hear about one. The tries /= hashtables comment stems from discussions on various haskell blogs with people inventing their own (or wanting to benchmark Data.Map vs hashtables vs tries vs bloomfilters). As a drop-in replacement tries will perform adequately, but they're nothing overwhelming; the overwhelming comes from changing the usage algorithms to match the stride of the datastructure. I don't think I have links to these discussions anymore to pull up code examples. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps
TernaryTrees is a package that extends Data.Set ad Data.Map with some ternary tree structures, based on the article [http://www.pcplus.co.uk/node/3074/ ] . So far there are three modules: Data.Set.TernarySet, Data.Map.TernaryMap and Data.Set.StringSet, which can hold `Ord a = [a]`, Ord a = [a] keys with b values, and Strings respectively. The interfaces for these types are very much like those of Data.{Set,Map}, though not wuite as featurefull. Later releases will also have a StringMap, and I'll update the TernaryMap to match the Set implementations more closely in a not too distant update. Ternary trees are supposed to be one of the more efficient ways of storing strings in a set, and my testing of this package seems to support this (being able to insert 230,000+ words, check that all those words are actually in the set, write the set out to disk using the Data.Binary instance, reading them back in, and checking the old and new sets are equal takes about 3.5 seconds on my machine). Included is a small example program that runs through the above sequence, and then asks the user to enter words to see if they're in the set, called tdict. Please give it a try and let me know what you think, it's my first (hopefully) useful hackage package, and I'd love some feedback. There is also a darcs repo [http://random.axman6.com/darcs/TernaryTrees/], and any patches are welcome. Cheers, Alex Mason (Axman6) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps
Alex Mason wrote: TernaryTrees is a package that extends Data.Set ad Data.Map with some ternary tree structures, based on the article [http://www.pcplus.co.uk/node/3074/] . That's just scary. I was just in the middle of writing the exact same thing! :-D (I read that very article...) Please give it a try and let me know what you think, it's my first (hopefully) useful hackage package, and I'd love some feedback. OK, will do... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps
On Tue, Jun 30, 2009 at 03:29:45AM +1000, Alex Mason wrote: (being able to insert 230,000+ words, check that all those words are actually in the set, write the set out to disk using the Data.Binary instance, reading them back in, and checking the old and new sets are equal takes about 3.5 seconds on my machine). It would be nice to know how much time the same test takes using other kinds of containers. -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: TernaryTrees-0.1.1.1 - An efficient ternary tree implementation of Sets and Maps
G'day all. Quoting Andrew Coppin andrewcop...@btinternet.com: That's just scary. I was just in the middle of writing the exact same thing! :-D (I read that very article...) When you're both done, please compare with the implementation that's been in Edison for about five years: http://www.cs.princeton.edu/~rdockins/edison/edison-core/src/Data/Edison/Assoc/ If you've got something that's an improvement (and believe me, it shouldn't be hard to improve on it), I'm sure that Rob would appreciate a patch. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe