--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke
<[EMAIL PROTECTED]> wrote:
Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in today's system memories.
Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
Space efficiency at ~1 entry per bucket: how about using closed hashing?
Each page would start with a bitmask that specifies which slots hold
valid entries, and the rest of the page would hold only row pointers (is
that the correct term? I don't know...) with no further data. That should
give a reasonably simple data structure and natural alignment for the
pointers.
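
To make the page layout concrete, here is a rough sketch in Python (not
backend code). The 8192-byte page, the 6-byte pointer (4-byte block number
plus 2-byte offset), linear probing within one page, and all names are my
assumptions for illustration only. Since neither key nor hash is stored,
a lookup can only return candidate pointers that the caller must recheck
against the heap. Deletion is left out; with linear probing it would need
tombstones or re-insertion.

```python
import struct

PAGE_SIZE = 8192                      # assumed page size in bytes
POINTER = struct.Struct("<IH")        # assumed row pointer: block number + offset (6 bytes)
# Each slot costs one bitmask bit plus one pointer.
SLOTS = (PAGE_SIZE * 8) // (POINTER.size * 8 + 1)
BITMAP_BYTES = (SLOTS + 7) // 8       # bitmask stored at the front of the page


class HashPage:
    """One page: validity bitmask first, then an array of bare row pointers."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)

    def _used(self, slot):
        return bool(self.buf[slot >> 3] & (1 << (slot & 7)))

    def insert(self, hash_value, block, offset):
        """Store a pointer in the first free slot at or after the home slot."""
        for i in range(SLOTS):
            slot = (hash_value + i) % SLOTS
            if not self._used(slot):
                self.buf[slot >> 3] |= 1 << (slot & 7)
                POINTER.pack_into(self.buf, BITMAP_BYTES + slot * POINTER.size,
                                  block, offset)
                return slot
        return None                   # page is full

    def candidates(self, hash_value):
        """Yield every pointer from the home slot up to the first empty slot."""
        for i in range(SLOTS):
            slot = (hash_value + i) % SLOTS
            if not self._used(slot):
                return
            yield POINTER.unpack_from(self.buf, BITMAP_BYTES + slot * POINTER.size)
```

With these assumed sizes a page holds 1337 pointers, and the bitmask takes
168 bytes.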
Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance
characteristics that we are seeking.
One should look into new plan nodes for "!= ANY()", "NOT EXISTS" and
similar. A node like "look into the hash and return true if the bucket is
empty" would work without checking tuple visibility when the bucket is
empty, and could be a win in some situations.
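
A toy sketch of what I mean, in Python rather than executor code. The
bucket list, the use of Python's built-in hash(), and the recheck callback
(standing in for heap fetch, visibility check, and key comparison) are all
hypothetical names for illustration:

```python
def bucket_of(key, n_buckets):
    # Stand-in hash function; a real index would use its own.
    return hash(key) % n_buckets


def not_exists(buckets, key, recheck):
    """Return True if no visible row with this key exists.

    buckets: list of lists of row pointers (no keys stored).
    recheck: hypothetical callback (pointer, key) -> bool that fetches the
             row, checks visibility, and compares the key.
    """
    bucket = buckets[bucket_of(key, len(buckets))]
    if not bucket:
        # Fast path: an empty bucket proves absence, no heap access needed.
        return True
    # Slow path: every pointer in the bucket has to be rechecked.
    return not any(recheck(ptr, key) for ptr in bucket)
```

The win is limited to the fast path; once the bucket holds anything, the
usual heap visits are still required.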
Do we want special cases for short keys like INT4? In those cases the
implementation might use hash == key and put that knowledge to use in
plans. Even a unique constraint might then be doable. Does the
PostgreSQL storage backend on Linux support sparse files? That might be a
win when holes turn up in the key sequence.
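
For illustration only, a small Python sketch of the hash == key idea with
a directly addressed file. The 7-byte entry layout (used flag, block
number, offset), the class name, and the restriction to non-negative keys
are my assumptions; visibility is ignored entirely, so the uniqueness
check here is much simpler than a real one would be. Seeking past the end
of the file before writing leaves a hole, which a filesystem with
sparse-file support need not allocate.

```python
import os
import struct
import tempfile

ENTRY = struct.Struct("<BIH")   # assumed entry: used flag, block number, offset


class DirectInt4Index:
    """Slot number is the key itself, so there are no collisions to resolve."""

    def __init__(self, path):
        self.f = open(path, "w+b")

    def lookup(self, key):
        self.f.seek(key * ENTRY.size)
        raw = self.f.read(ENTRY.size)
        if len(raw) < ENTRY.size:       # beyond end of file: never written
            return None
        used, block, offset = ENTRY.unpack(raw)
        return (block, offset) if used else None   # holes read back as zeros

    def insert_unique(self, key, block, offset):
        """Insert and return True, or return False if the key is present."""
        if not 0 <= key < 2**32:
            raise ValueError("sketch only handles non-negative 32-bit keys")
        if self.lookup(key) is not None:
            return False
        self.f.seek(key * ENTRY.size)
        self.f.write(ENTRY.pack(1, block, offset))
        self.f.flush()
        return True

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "int4.idx")
    idx = DirectInt4Index(path)
    print(idx.insert_unique(100000, 7, 2))   # first insert succeeds
    print(idx.insert_unique(100000, 9, 1))   # duplicate key is rejected
    idx.close()
```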