Just don't forget that a RadixTree lookup is O(L) in the length of the string,
while a Set lookup is O(1) on average (worse the more collisions
you have), since a String's hashCode is cached in an instance field.
But since that hash is calculated lazily, a Set lookup for a brand-new
string also costs O(L), because the hash has to be computed first.
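A minimal sketch of the Set-based check, assuming all the strings fit in memory; `HashSetUniqueness` and `allUnique` are hypothetical names used only for illustration. It shows where the per-string O(L) costs come from: computing a not-yet-cached hash, and calling `equals()` when an entry with the same hash is found.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashSetUniqueness {
    // Returns true if no string occurs twice. HashSet.add returns false
    // when an equal element is already present, so we can stop early.
    static boolean allUnique(Iterable<String> strings, int expectedSize) {
        // Pre-size to avoid rehashing (HashSet's default load factor is 0.75).
        Set<String> seen = new HashSet<>((int) (expectedSize / 0.75f) + 1);
        for (String s : strings) {
            // For a String whose hash has not been computed yet, hashCode()
            // walks all L characters once and caches the result in the
            // String instance. When an entry with the same hash is found,
            // equals() is also called, which may compare up to L characters.
            if (!seen.add(s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique(Arrays.asList("abc", "def", "ghi"), 3)); // true
        System.out.println(allUnique(Arrays.asList("abc", "def", "abc"), 3)); // false
    }
}
```

Pre-sizing matters at 2,000,000 entries, since growing the table repeatedly means rehashing everything already inserted.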
On 3 Sep., 17:14, Barney wrote:
> Is it realistic to use HashSet to determine whether a large amount of
> string data (2 000 000 strings of length 20) consists of unique
> entries?
I needed something like this recently. I used a radix tree data
structure to store all the strings; quite space-saving.
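A minimal radix-tree (compressed trie) set, written only to illustrate the idea; it is not the amino-cbbs API, and `RadixTreeSet` and `add` are hypothetical names. Each edge carries a string label and is keyed by its first character; when a new string diverges partway along a label, that edge is split. An insert therefore walks at most the L characters of the string.

```java
import java.util.HashMap;
import java.util.Map;

class RadixTreeSet {
    private static final class Node {
        final Map<Character, Edge> edges = new HashMap<>(); // keyed by first char of label
        boolean terminal;                                    // a stored string ends here
    }

    private static final class Edge {
        String label;
        Node child;
        Edge(String label, Node child) { this.label = label; this.child = child; }
    }

    private final Node root = new Node();

    // Inserts s; returns true if it was not already present (false = duplicate).
    boolean add(String s) {
        Node node = root;
        int i = 0;
        while (i < s.length()) {
            Edge e = node.edges.get(s.charAt(i));
            if (e == null) {
                // No edge starts with this char: hang the whole remainder off a new leaf.
                Node leaf = new Node();
                leaf.terminal = true;
                node.edges.put(s.charAt(i), new Edge(s.substring(i), leaf));
                return true;
            }
            // Length of the common prefix of the edge label and the rest of s.
            String label = e.label;
            int j = 0;
            while (j < label.length() && i + j < s.length()
                    && label.charAt(j) == s.charAt(i + j)) {
                j++;
            }
            if (j < label.length()) {
                // s diverges (or ends) inside the label: split the edge at j.
                Node mid = new Node();
                mid.edges.put(label.charAt(j), new Edge(label.substring(j), e.child));
                e.label = label.substring(0, j);
                e.child = mid;
            }
            node = e.child;
            i += j;
        }
        if (node.terminal) {
            return false; // already stored
        }
        node.terminal = true;
        return true;
    }

    public static void main(String[] args) {
        RadixTreeSet set = new RadixTreeSet();
        System.out.println(set.add("romane"));  // true
        System.out.println(set.add("romanus")); // true
        System.out.println(set.add("romane"));  // false
    }
}
```

The HashMap-per-node layout is chosen for readability, not compactness; how space-saving a radix tree turns out in practice depends heavily on how nodes and labels are represented.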
Check this out at:
http://amino-cbbs.sourceforge.net/
http://amino-cbbs.sourceforge.net/java_apidocs/index.html
Maybe parallelism could help here.
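A minimal sketch of the parallel idea, assuming Java 8+ and that the strings are already in a List. It uses the JDK's `ConcurrentHashMap.newKeySet` rather than anything from amino-cbbs, and `ParallelUniqueness`/`allUnique` are hypothetical names. Whether it actually beats the single-threaded version depends on the machine, so measure it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ParallelUniqueness {
    // Returns true if no string occurs twice, checking from multiple threads.
    static boolean allUnique(List<String> strings) {
        // Thread-safe Set view backed by a ConcurrentHashMap, pre-sized.
        Set<String> seen = ConcurrentHashMap.newKeySet(strings.size());
        // add() is atomic, so of two equal strings only one add() can return
        // true; any duplicate makes some add() return false. allMatch stops
        // early once that happens, and can only return true after every
        // element has been added successfully.
        return strings.parallelStream().allMatch(seen::add);
    }

    public static void main(String[] args) {
        System.out.println(allUnique(Arrays.asList("a", "b", "c"))); // true
        System.out.println(allUnique(Arrays.asList("a", "b", "a"))); // false
    }
}
```

The predicate here is deliberately side-effecting, which the stream docs normally discourage; it is still sound because the only shared state is the concurrent set and the answer depends only on whether some `add` failed.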
Regards
On Fri, Sep 4, 2009 at 12:59 AM, Christian Catchpole <christ...@catchpole.net> wrote:
Do it and find out :) I don't think the hashing in the collections
classes has any problem with such high object counts. It might just
be a memory concern.
On average, do you expect all 2 million strings to be unique? How
often do you expect duplicates?
You could do the processing in small