Thanks a
lot for your detailed answer. I used this kind of trees
in Prolog but since
I'm worried about the
access time I didn't take this
datatype into consideration.
I'll have a look into
your links. Management of very huge sets of data frequently
is a hybrid form of trees and hash tables, so
trees can't be inherently bad in
terms
of performance.
Looking at the superficial outline
structure first is indeed a technique worth knowing
and searching can be carried out well without hashing on sorted lists by binary
seach
for example. However, these problems
require case-by-case decisions. Since
PolyML
has hashing I'd like to give
it a try and I would appreciate to
see an example.
Regards,
Michael
Am 10/17/2012 01:52 PM, schrieb Makarius:
On Wed, 17 Oct 2012, Michael Moeller wrote:
Thanks for the links, I'll look into it.
Talking about large amounts of data, do you happen to have an
example of using HashArray? This would be great.
Maybe someone else on the list has experience with hash tables.
In Isabelle/ML we do the "large data" management in memory as
immutable tree structures. This allows to conserve a lot of
memory in Poly/ML 5.5.0 in particular, where David provides this
exceedingly nice online sharing of old/life data on the heap. It
also works well with multicore / multithread programming to have
everything persistent, immutable, sharable by default.
The main workhorse for map/set-like structures in Isabelle/ML is
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/Pure/General/table.ML
This is a quite efficient and reasonably simple implementation of
2-3 trees. When starting that in 1999, I've first had a look at
the red-black tree implementation in SML/NJ (ported from C) and
was not convinced -- later some people found actual problems in it
that might be still there.
Another prerequisite of efficiency are reasonable order functions
defined on the key values for indexing tables. E.g. for type
string we look at the length in the descriptor first, before
walking through the memory to compare the text (you cannot do this
in C). Same for complex types like lists, trees, lambda-terms
etc.: first compare superficial outline structure, then walk down
to the content.
Since red-black trees are more present in the public these days
(why?), several people have worked on this verified implementation
using Isabelle/HOL:
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/HOL/Library/RBT_Impl.thy
http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/HOL/Library/RBT.thy
You can backport that to SML manually, or use the Isabelle/HOL
generator to produce SML, Haskell, Scala etc. -- the result would
not look much worse than the C -> SML translation performed by
hand in SML/NJ some years ago.
Makarius
|
_______________________________________________
polyml mailing list
polyml@inf.ed.ac.uk
http://lists.inf.ed.ac.uk/mailman/listinfo/polyml