Re: [Haskell-cafe] ANN: bytestring-trie 0.1.0

2008-12-21 Thread Sterling Clover
Thanks for working on this! A nice efficient bytestring-trie is the  
sort of data structure we should have had in Haskell for some time  
now, and I'm sure I'll be giving it a great deal of use.


Regards,
Sterl.

On Dec 20, 2008, at 1:06 AM, wren ng thornton wrote:



-- Announcing: bytestring-trie 0.1.0 (beta)


An efficient finite map from (byte)strings to values.

The implementation is based on big-endian patricia trees, like  
Data.IntMap. We first trie on the Word8 elements of a  
Data.ByteString, sharing string prefixes where possible, and then  
trie on the big-endian bit representation of those elements.  
Patricia trees have efficient algorithms for union and other  
merging operations, but they're also quick for lookups and insertions.




-- Future Extensions


* I've done spot checking to make sure it works, but haven't  
constructed an extensive test suite. Does anyone know of a good  
data set already out there for testing corner cases? I'm sure other  
dictionary writers have come up with one.


* A Word8 is much smaller than the architecture's natural Word  
size. Therefore it'd be more efficient for the trie on bits to read  
off a whole Word at a time. This work is in progress, but I need  
help from people with 64-bit and big-endian machines in order to  
verify the code works on those architectures.


* Using ByteStrings allows for trieing on any packed byte  
representation of a value, but they're not Strings. Integration  
with Data.ByteString.Char8, utf8-string, and utf8-light should happen.


* The current implementation also only accepts strict ByteStrings.  
When chopping up strings to share prefixes we share the smaller  
string. For very long strings when many deletions are expected,  
this can still hold onto more memory than necessary. Accepting lazy  
ByteStrings or adding heuristics for when to copy prefixes instead  
of sharing will fix this.




-- Links


Homepage:
http://code.haskell.org/~wren/

Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/ 
bytestring-trie


Darcs:
http://code.haskell.org/~wren/bytestring-trie/

Haddock (Darcs version):
http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/ 
bytestring-trie/


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ANN: bytestring-trie 0.1.0

2008-12-19 Thread wren ng thornton


-- Announcing: bytestring-trie 0.1.0 (beta)


An efficient finite map from (byte)strings to values.

The implementation is based on big-endian patricia trees, like 
Data.IntMap. We first trie on the Word8 elements of a Data.ByteString, 
sharing string prefixes where possible, and then trie on the big-endian 
bit representation of those elements. Patricia trees have efficient 
algorithms for union and other merging operations, but they're also 
quick for lookups and insertions.




-- Future Extensions


* I've done spot checking to make sure it works, but haven't constructed 
an extensive test suite. Does anyone know of a good data set already out 
there for testing corner cases? I'm sure other dictionary writers have 
come up with one.


* A Word8 is much smaller than the architecture's natural Word size. 
Therefore it'd be more efficient for the trie on bits to read off a 
whole Word at a time. This work is in progress, but I need help from 
people with 64-bit and big-endian machines in order to verify the code 
works on those architectures.


* Using ByteStrings allows for trieing on any packed byte representation 
of a value, but they're not Strings. Integration with 
Data.ByteString.Char8, utf8-string, and utf8-light should happen.


* The current implementation also only accepts strict ByteStrings. When 
chopping up strings to share prefixes we share the smaller string. For 
very long strings when many deletions are expected, this can still hold 
onto more memory than necessary. Accepting lazy ByteStrings or adding 
heuristics for when to copy prefixes instead of sharing will fix this.




-- Links


Homepage:
http://code.haskell.org/~wren/

Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie

Darcs:
http://code.haskell.org/~wren/bytestring-trie/

Haddock (Darcs version):

http://code.haskell.org/~wren/bytestring-trie/dist/doc/html/bytestring-trie/

--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe