--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke <[EMAIL PROTECTED]> wrote:
Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in today's system memories.


Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
Space efficiency at ~1 entry per bucket: how about using closed hashing? Each page would start with a bitmask that marks which slots hold valid entries, and the rest of the page would hold only row pointers (is this the correct expression? I don't know...) without further data. That should provide a reasonably simple data structure and good alignment for the pointers.
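
To make the page layout concrete, here is a minimal sketch of one such page with linear probing inside it. It is only an illustration under assumptions: the names HashPage and SLOTS_PER_PAGE, the tiny slot count and the probing strategy are hypothetical and not taken from any PostgreSQL code. Deletion is left out, since clearing a bit would break a probe chain.

```python
from typing import List, Optional

SLOTS_PER_PAGE = 8  # hypothetical; a real page would hold far more slots


class HashPage:
    """One page: an occupancy bitmask followed by fixed-size row pointers."""

    def __init__(self) -> None:
        self.bitmask = 0                    # bit i set => slot i holds a valid entry
        self.slots = [0] * SLOTS_PER_PAGE   # row pointers only, no key is stored

    def is_valid(self, i: int) -> bool:
        return ((self.bitmask >> i) & 1) == 1

    def insert(self, hash_value: int, row_pointer: int) -> Optional[int]:
        """Probe linearly from the home slot; return the slot used, or None if full."""
        home = hash_value % SLOTS_PER_PAGE
        for step in range(SLOTS_PER_PAGE):
            i = (home + step) % SLOTS_PER_PAGE
            if not self.is_valid(i):
                self.slots[i] = row_pointer
                self.bitmask |= 1 << i
                return i
        return None

    def candidates(self, hash_value: int) -> List[int]:
        """Row pointers that must be rechecked in the heap, since no key is stored."""
        home = hash_value % SLOTS_PER_PAGE
        found = []
        for step in range(SLOTS_PER_PAGE):
            i = (home + step) % SLOTS_PER_PAGE
            if not self.is_valid(i):
                break  # an empty slot ends the probe sequence
            found.append(self.slots[i])
        return found
```

Because neither the key nor the hash is kept, every pointer returned by candidates() has to be verified against the heap tuple.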

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance
characteristics that we are seeking.

One should look into new plan nodes for "!= ANY()", "NOT EXISTS" and similar. A node like "look into the hash and return true if the bucket is empty" would not need to check tuple visibility when the bucket is empty, which could be a win in some situations.
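
A rough sketch of what such a node would compute, assuming the index is modelled as a dict from bucket number to a list of row pointers. The function name not_exists and the recheck callback (standing in for the heap fetch plus visibility test) are hypothetical.

```python
from typing import Callable, Dict, List


def not_exists(
    buckets: Dict[int, List[int]],
    num_buckets: int,
    key: int,
    hash_fn: Callable[[int], int],
    recheck: Callable[[int, int], bool],
) -> bool:
    """Return True if no visible row matches key."""
    bucket = buckets.get(hash_fn(key) % num_buckets, [])
    if not bucket:
        # Empty bucket: the index has no entry that could match,
        # so the heap (and tuple visibility) is never consulted.
        return True
    # Non-empty bucket: fall back to the usual heap recheck per pointer.
    return not any(recheck(ptr, key) for ptr in bucket)
```

The fast path only pays off when empty buckets are common, which is the ~1 entry per bucket regime discussed above.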

Do we want special cases for short keys like INT4? In those cases the implementation might use hash == key and put that knowledge to use in plans. Even a unique constraint might then be doable. Does the PostgreSQL storage backend on Linux support sparse files? That might be a win when holes in the sequence turn up.
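
As an illustration of the hash == key idea, the sketch below uses an identity hash for 4-byte integers, so that equal hashes imply equal keys and a duplicate can be detected from the index alone. The names int4_hash and UniqueInt4Index are hypothetical, a plain dict stands in for the on-disk structure, and tuple visibility is ignored entirely.

```python
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def int4_hash(key: int) -> int:
    """Identity hash: the 32-bit pattern of the key is the hash itself."""
    if not INT4_MIN <= key <= INT4_MAX:
        raise ValueError("key does not fit in INT4")
    return key & 0xFFFFFFFF


class UniqueInt4Index:
    """Toy unique index: hash -> row pointer (visibility is not modelled)."""

    def __init__(self) -> None:
        self.entries = {}

    def insert(self, key: int, row_pointer: int) -> None:
        h = int4_hash(key)
        if h in self.entries:
            # Equal hash means equal key, so no heap lookup is needed here.
            raise KeyError("duplicate key")
        self.entries[h] = row_pointer
```

With a general hash function the same check would need a key comparison against the heap, because distinct keys can collide.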



