Hi,

after getting GiST works we're trying to use RD-Tree in
our fulltext search application. We have universe of lexems
(words in dictionaries) which is rather large, so
we need some compression to effectively use RD-Tree.
When we did index support for int arrays we compressed
set by range sets but it's not applicable if cardinality of
universe set is very high. We're thinking about algorithm of
creating good signature for set of integers. This signature
must follow several rules:

1). if set A is contained in set B, then sig(A) is also contained in sig(B)
2). if set C is a union of set A and set B, then sig(C) is union of
   sig(A) and sig(B)

Also, signature should be good for effective tree construction (RD-Tree),
i.e. it should be not degenerated for set size about 10^6 .

We need 1) for search operation and 2) for tree contructing.

Right now we implementing so-called "superimposed coding" technique
(D. Knuth, vol.3) which is based on idea to hash attribute values
into random k-bit codes in a b-bit field and to superimpose the codes for
each attribute value in a record. This technique was proposed by Sven Helmer
("Index Structures for Databases Containing Data Items with Set-valued Attr
ibutes",1997, Sven Helmer, paper is available from my gist page)
to represent sets in the index structures. This technique is great because
of fixed length and great speed of calculation (used only bit operations).
It follows rules 1 and 2, but it's not good for big sets, because for
internal nodes and especially for root (a union of sets) we get signature
fully consisting of 1.
We couldn't use arbitrarily long signature, because we have 8Kb limit of
index page size. For signature of variable size length we don't know
how to define 1) and 2)

While we 're investigating the problem, I'd be glad
to know some references, ideas.

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Reply via email to