On Sun, Oct 4, 2009 at 7:13 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> I see that the docs were recently changed from discouraging hash
>> indexes both because there were no known uses in which they were
>> meaningfully better than btree, and because they aren't recoverable;
>> to now just discouraging them because they are not recoverable.  Does
>> that mean there are now uses in which hash indexes are provably better
>> than btree if one is willing to overlook the lack of recoverability?
>> If so, what are those situations?
>
> One reason is that you can index values that are too big for btrees;
> since hash indexes now store only the hash codes, they don't have a hard
> length limit on the underlying value.  I'm not sure how useful that
> really is in practice, but it's at least an argument for considering
> them in certain cases.
I haven't tested this situation at all yet; I'm hoping to show that
there is a substantially less restrictive use case under which they can
be made to work well.

>> I've played around a bit with hash indexes, and it seems to me that
>> making them generally worthwhile will take (at least) reducing or
>> entirely doing away with the heavy-weight locks.
>
> Concurrency is really the least of the issues for hash indexes.  So far
> it's not clear that they're fast enough even in sequential use ...

Hash index performance came up again in another thread, so I wanted to
come back to this and report my findings.

I've done testing using select-only statements with either a btree or a
hash index on pgbench_accounts.aid, using only one active backend.  For
all tests, CPU usage was the bottleneck, essentially 100%.  I used 600MB
shared_buffers and -s 30.

I figure that if each table-page fetch is a physical IO, then that will
dominate the performance, so the index's nature is irrelevant.  So if
the hash index has a strength, it will show best when the table fits in
memory.  (The alternative would be a table so large that not even the
btree's index pages fit in memory, resulting in IO for those index pages
as well as the table's, but I didn't have the patience or an adequate IO
subsystem to set up a test for that.)

With pgbench -S (select only, one tuple at a time) the overhead of
communication between pgbench and the backend dominated the run time,
and moving the select loop into a pl/pgsql routine did not help much.
So instead I made a temp table with 10,000 randomly selected ids (from
1 to 3,000,000) and used it as the outer side of a nested loop join
against pgbench_accounts, with a custom query of:

  select sum(abalance) from pgbench_accounts, foo where aid=random_id

The HEAD code could do 16-17 tps (160,000 to 170,000 inner-loop tuple
fetches per second) using the hash index, and 18 tps with the btree.
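For reference, the setup described above can be reconstructed roughly
as follows.  This is a sketch, not the script actually used: it assumes
a database initialized with "pgbench -i -s 30" (3,000,000 accounts);
the names foo and random_id follow the query above.

```sql
-- Build the driver table: 10,000 ids drawn at random from 1..3,000,000.
CREATE TEMP TABLE foo AS
    SELECT (random() * 2999999)::int + 1 AS random_id
    FROM generate_series(1, 10000);

-- Custom query (put in nested_loop.sql and run via pgbench -f):
-- foo is the outer side of a nested loop against pgbench_accounts,
-- so each execution performs ~10,000 index fetches on aid.
SELECT sum(abalance)
FROM pgbench_accounts, foo
WHERE aid = random_id;
```

Since this needs a live, pgbench-initialized server, it can't be run
standalone; it is only meant to make the test shape concrete.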
(But occasionally the rank order would be reversed, over many
alternating runs of reindexing, vacuuming, then doing pgbench -T 600 -f
nested_loop.sql.)

When I disabled the lmgr page locking of the hash index, by making it
think it was a temporary relation, the tps for the hash index jumped
to 24.

If I make all the aid values in pgbench_accounts even, and load "foo"
with only odd ids so that no rows are actually found, then the speed of
the hash index jumps to about 48 tps.  So about half the remaining CPU
usage with the hash index pertains to dealing with the rows once they
are found, not with using the index to attempt to find them.  (I don't
know how much of this time is used to recheck the qual, which btree
doesn't need to do.)

So my conclusions are:

1) btree indexes are so good at equality lookup that there may not be
much room for hash indexes to improve on them by a substantial amount.

2) Heavy-weight locks are called that for a reason: they use a lot of
CPU even without contention.

3) The CPU usage of the hash-index code proper is quite small; more
time is spent in heavy-weight page locks (specific to hash indexes, but
not part of the hash index code itself) and in executor code common to
all index methods than in the hash index code.

Based on this, I don't think that changing the bucket size to be
smaller than the 8K block size is going to make much difference in hash
index performance.

Thinking about how to get rid of heavy-weight locks and add WAL, I was
wondering if we couldn't get rid of bucket splits altogether, requiring
the bucket count to be set immutably at build time based on the
estimated eventual max/equilibrium size of the table.  Maybe this would
render hash indexes somewhat of a niche feature, but probably much less
so than they currently are.  I think that getting rid of dynamic bucket
splits would make it easier to get rid of heavy-weight locks and also
much easier to implement WAL.
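To make the fixed-bucket idea concrete, here is a minimal sketch in
Python (not backend C, and not proposed code): bucket count is frozen
at build time, growth happens only by chaining overflow pages, and only
hash codes are stored, so the caller must recheck the qual against the
heap.  The names FixedHashIndex and Bucket, and the tiny page capacity,
are illustrative assumptions.

```python
PAGE_CAPACITY = 4  # entries per bucket page, kept tiny for illustration


class Bucket:
    def __init__(self):
        self.entries = []     # (hash_code, heap_tid) pairs on this page
        self.overflow = None  # chained overflow page, allocated on demand

    def insert(self, hash_code, tid):
        if len(self.entries) < PAGE_CAPACITY:
            self.entries.append((hash_code, tid))
        else:
            if self.overflow is None:
                # Adding an overflow page is a small, local change,
                # so WAL-logging it would be btree-split-like in scope.
                self.overflow = Bucket()
            self.overflow.insert(hash_code, tid)

    def lookup(self, hash_code):
        for h, tid in self.entries:
            if h == hash_code:
                yield tid
        if self.overflow is not None:
            yield from self.overflow.lookup(hash_code)


class FixedHashIndex:
    def __init__(self, nbuckets):
        # Bucket count is immutable: chosen at build time from the
        # estimated equilibrium table size, never split afterwards.
        self.buckets = [Bucket() for _ in range(nbuckets)]

    def insert(self, key, tid):
        h = hash(key)
        self.buckets[h % len(self.buckets)].insert(h, tid)

    def lookup(self, key):
        # Only hash codes are stored, so the caller must still recheck
        # the qual against the heap tuple (as real hash indexes do).
        h = hash(key)
        return list(self.buckets[h % len(self.buckets)].lookup(h))
```

The point of the sketch is that no operation ever moves entries between
buckets, which is the property that would simplify both locking and WAL.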
WAL logging the addition of an overflow page seems about the same as a
btree node split, while WAL logging a bucket split, when the bucket
potentially has overflow pages in it, seems much, much harder.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)