Does anybody have 'j code' implementing this? The discription found at: http://citeseer.ist.psu.edu/cachedpage/526368/1
indicates to me to me it could eligantly be emplimented in J. If you were to turn it into an .exe file for dos/windows or an executable for unix ??? ABSTRACT Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We Purpose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. The experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order. Conclusion: We have described the burst trie, a variant form of the trie that is highly efficient for managing large sets of strings in memory. In the burst trie, each leaf holds a set of strings in a container, allowing dramatic space reductions with no impact on efficiency. We have explored a wide range of options for managing a burst trie, including different container structures, different heuristics for deciding when to burst, and different techniques for bursting. In large scale experiments with a variety of real data sets, we have shown that burst tries are faster than compact tries and use 1/6 of the space: are more compact than binary trees or splay trees and are over two times faster: and are close to hash tables in efficiency, yet keep the data in sort order. These results demonstrate that the burst trie is dramatically more efficient than any previous structure for the task of managing sorted strings. The improvements are the same on data sets ranging from 100 Mb to 45 Gb in size. For text data, these gains are because the more common strings are typically also the shorter strings, and are stored wholly within the access trie. Rare strings held in containers, but because they are typically long there is no loss of efficiency: the cost of searching the container is offset by the savings of not having to traverse a large number of trie nodes. Together, these savings allow in memory processing of larger sets of strings than was previously possible, in much less time. the performance of burst trie depends on the distribution of strings. The worst case we observed was genomic data, where the strings are of equal length and have a relatively flat probability distribution. Even in this case, performance was comparable to the other tree structures. Analytical results have shown that the height of a burst trie is expected to be logarithmic. We therefore believe that the burst trie is the data structure of choice for practical string management. ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
