Re: [HACKERS] Hash index todo list item

Jens-Wolfhard Schicke Mon, 10 Sep 2007 02:03:25 -0700

--On Saturday, September 08, 2007 18:56:23 -0400 Mark Mielke <[EMAIL PROTECTED]> wrote:

Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in today's system memories.



Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).

Space efficiency at ~1 entry per bucket: How about using closed hashing,
saving in each page a bitmask in front which specifies which slots hold
valid entries, and in the rest of the page row-pointers (is this the
correct expression? I don't know...) without further data. That should
provide a reasonably simple data structure and alignment for the
pointers.
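
To make the page layout concrete, here is a rough sketch in Python of
such a page. Everything in it is an assumption for illustration, not
PostgreSQL code: an 8192-byte page, a 6-byte row pointer (4-byte block
number plus 2-byte offset), linear probing as the closed-hashing scheme,
and the names HashPage, insert and candidates. Since no key is stored,
a lookup can only return candidate pointers, and the caller would still
have to recheck the key in the heap. Deletion is left out.

```python
import struct

PAGE_SIZE = 8192   # assumed page size in bytes
PTR_SIZE = 6       # assumed row pointer: 4-byte block + 2-byte offset

# Largest slot count whose bitmask plus pointers fit in one page:
# each slot costs 1 bit of bitmask and PTR_SIZE bytes of pointer.
SLOTS = (PAGE_SIZE * 8) // (PTR_SIZE * 8 + 1)
BITMAP_BYTES = (SLOTS + 7) // 8


class HashPage:
    """Hypothetical page: [validity bitmask][fixed-size row pointers]."""

    def __init__(self):
        self.buf = bytearray(PAGE_SIZE)

    def _used(self, i):
        return bool(self.buf[i >> 3] & (1 << (i & 7)))

    def _ptr_off(self, i):
        return BITMAP_BYTES + i * PTR_SIZE

    def insert(self, hashval, block, offset):
        """Store a row pointer in the first free slot at or after the home slot."""
        start = hashval % SLOTS
        for step in range(SLOTS):
            i = (start + step) % SLOTS
            if not self._used(i):
                struct.pack_into("<IH", self.buf, self._ptr_off(i), block, offset)
                self.buf[i >> 3] |= 1 << (i & 7)
                return i
        raise OverflowError("page full")

    def candidates(self, hashval):
        """Yield row pointers from the home slot up to the first empty slot."""
        start = hashval % SLOTS
        for step in range(SLOTS):
            i = (start + step) % SLOTS
            if not self._used(i):
                return
            yield struct.unpack_from("<IH", self.buf, self._ptr_off(i))
```

With these assumed sizes a page holds 1337 slots: 168 bytes of bitmask
and 8022 bytes of pointers, which leaves 2 bytes unused.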

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance
characteristics that we are seeking.

One should look into new plan nodes for "!= ANY()", "NOT EXISTS" and
similar. A node like "look into hash and return true if bucket is empty"
would work without checking tuple visibility when the bucket is empty
and could be a win in some situations.
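
A minimal sketch of what such a node might do, in Python. The names
not_exists, buckets and heap_has_visible_match are hypothetical, and
the sketch assumes the index holds an entry for every heap tuple that
has not yet been vacuumed away, so that an empty bucket proves no tuple
of any visibility can match.

```python
def not_exists(key, buckets, nbuckets, heap_has_visible_match, hash_fn=hash):
    """Return True if no visible row matches `key`.

    buckets: dict mapping bucket number -> list of row pointers (assumed layout)
    heap_has_visible_match: callback doing the expensive heap/visibility check
    """
    entries = buckets.get(hash_fn(key) % nbuckets)
    if not entries:
        # Empty bucket: nothing hashes here, so no heap visit is needed.
        return True
    # Non-empty bucket: the entries are only candidates, so fall back to
    # the normal path and check key equality and visibility in the heap.
    return not heap_has_visible_match(key, entries)
```

The win, if any, comes only from the first branch; a populated bucket
costs the same as an ordinary probe.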

Do we want special cases for short keys like INT4? In those cases the
implementation might use hash == key and put that knowledge to use in
plans. Even a unique constraint might then be doable. Does the
PostgreSQL storage backend on Linux support sparse files? That might be
a win when holes in the sequence turn up.
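
To illustrate why hash == key matters for uniqueness, here is a small
Python sketch. All names (lossy_hash, identity_hash, UniqueHashIndex)
are made up for the example, and tuple visibility is ignored entirely.
With a lossy hash, an equal hash code only says the keys might be equal,
so the heap has to be consulted; with the identity hash on a 4-byte
integer, an equal hash code already proves a duplicate.

```python
def lossy_hash(key, bits=8):
    # Hypothetical lossy hash: keeps only the low `bits` bits of the key.
    return key & ((1 << bits) - 1)


def identity_hash(key):
    # For a 4-byte integer the hash code can be the key itself.
    return key & 0xFFFFFFFF


class UniqueHashIndex:
    """Toy index storing only hash codes, never the keys."""

    def __init__(self, hash_fn, exact):
        self.hash_fn = hash_fn
        self.exact = exact      # True when equal codes imply equal keys
        self.codes = set()
        self.rechecks = 0       # number of heap lookups that were needed

    def insert(self, key, key_exists_in_heap):
        code = self.hash_fn(key)
        if code in self.codes:
            if self.exact:
                raise ValueError("duplicate key")
            # Possible collision: only the heap can tell.
            self.rechecks += 1
            if key_exists_in_heap(key):
                raise ValueError("duplicate key")
        self.codes.add(code)
```

Inserting 1 and then 257 shows the difference: the lossy variant maps
both to code 1 and needs a heap recheck, while the identity variant
never touches the heap.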



