Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>:

> I'm not really familiar enough with hash indexes to know if this
> would
> work, but if the maximum bucket size was known you could use that to
> determine a maximum range of buckets to look at. In some cases, that
> range would include only one bucket, otherwise it would be a set of
> buckets. If you found a set of buckets, I think you could then just
> go
> to the specific one you need.
> If we assume that the maximum bucket size is one page it becomes
> more
> realistic to take an existing large bucket and split it into several
> smaller ones. This could be done on an update to the index page, or
> a
> background process could handle it.
> In any case, should this go on the TODO list?
> > Allowing a bucket size to be specified at CREATE INDEX doesn't seem
> out
> > of line though.  We'd have to think up a scheme for
> index-AM-specific
> > index parameters ...
> -- 
> Jim C. Nasby, Database Consultant               [EMAIL PROTECTED] 
> Give your computer some brain candy! www.distributed.net Team #1828

Google "dynamic hash" or "linear hash". It takes care of not needing to
have varying bucket sizes.

Hash indexes are useful if you ALWAYS require disk access; they behave
like worst-case random cache-thrash tests. That's probably why dbs have
gravitated toward tree indexes instead. On the other hand, there's more
(good) to hashing than initially meets the eye.

Dynamic multiway hashing has come a long way from just splicing the bits
together from multiple columns' hash values. If you can lay your hands
on Tim Merrett's old text "Relational Information Systems", it's an
eye-opener. Picture an efficient terabyte spreadsheet.

For one thing, unlike a btree, a multicolumn hash is symmetric: it
doesn't matter which column(s) you do not specify in a partial match.

For another, a multiway hash is useful for much lower selectivity than a
btree. I built such indexes for OLAP cubes, and some dimensions were
only 10 elements wide. At the point where btree indexing becomes worse
than seqscan, a multiway hash tells you which 10% of the disk to scan.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to