Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Tue, Aug 01, 2006 at 02:26:18PM -0700, [EMAIL PROTECTED] wrote:
> Kenneth Marshall wrote:
> > On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> > > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > > > Jim Nasby wrote:
> > > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > > > >Hannu Krosing <[EMAIL PROTECTED]> writes:
> > > > > >
> > > > > >>What would be the use-case for hash indexes ? And what should be
> > > > > >>done to make them faster than btree ?
> > > > > >
> > > > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > > > >to find out.
> > > > >
> > > > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > > > wouldn't hash indexes be faster for those operations than regular
> > > > > indexes?
> > > >
> > > > The main problem doesn't seem to be in the hash algorithm (which I
> > > > understand to mean the hashing function), but in the protocol for
> > > > concurrent access of index pages, and the distribution of keys in pages
> > > > of a single hash key.
> > > >
> > > > This is described in a README file or a code comment somewhere in the
> > > > hash AM code.  Someone needs to do some profiling to find out what the
> > > > bottleneck really is, and ideally find a way to fix it.
> > >
> > > What I'm getting at is that I've never seen any explanation for the
> > > theoretical use cases where a hash index would outperform a btree. If we
> > > knew what kind of problems hash indexes were supposed to solve, we could
> > > try and interest people who are solving those kinds of problems in
> > > fixing hash indexes.
> >
> > The big win for hash indexes is the idea that searching for a single
> > value should only take 1 I/O operation in a perfect world. Btree can
> > not do that.
>
> Hash indexes stored on disk still need a level of indirection -- you've
> got to look up what range of blocks contains your hash value. How big
> your table of ranges is depends on how big the hash index is and how
> big your ranges are. Almost always you can fit that table into a block
> cached in memory. But, the root of a BTree is often cached in memory
> too. So there's no advantage for a hash index over a BTree index until
> the BTree needs to grow to three levels deep, what is that, usually
> 10,000 or 100,000 records. Beyond that, you're right, the BTree slowly
> grows deeper while the hash index doesn't.

I have seen some clever hash techniques that used knowledge of the file
and directory structure to avoid the indirection, allowing a single I/O
operation to retrieve the value from the index without another layer of
indirection. So it is possible given appropriate constraints. Of course,
postgresql would need to check for tuple validity unless that could be
incorporated into the index in some fashion.

Ken

---(end of broadcast)---
TIP 6: explain analyze is your friend
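The indirection being discussed (an in-memory table mapping hash-value ranges to disk blocks) can be sketched in a few lines. This is an illustrative model only, not PostgreSQL's actual bucket-addressing structures; all names here are invented:

```python
import bisect

# Hypothetical directory: sorted lower bounds of hash-value ranges,
# each mapped to the disk block holding that range's bucket.
range_starts = [0x00000000, 0x40000000, 0x80000000, 0xC0000000]
blocks = ["block-17", "block-42", "block-99", "block-7"]

def block_for(hash_value):
    # One cheap in-memory lookup selects the block; the single
    # disk I/O would then fetch that block directly.
    i = bisect.bisect_right(range_starts, hash_value) - 1
    return blocks[i]

print(block_for(0x5EADBEEF))  # falls inside the 0x40000000 range
```

As long as this directory fits in one cached block, the probe costs a single I/O regardless of index size, which is the claimed advantage over a btree descent.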
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Kenneth Marshall wrote:
> On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > > Jim Nasby wrote:
> > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > > >Hannu Krosing <[EMAIL PROTECTED]> writes:
> > > > >
> > > > >>What would be the use-case for hash indexes ? And what should be
> > > > >>done to make them faster than btree ?
> > > > >
> > > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > > >to find out.
> > > >
> > > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > > wouldn't hash indexes be faster for those operations than regular
> > > > indexes?
> > >
> > > The main problem doesn't seem to be in the hash algorithm (which I
> > > understand to mean the hashing function), but in the protocol for
> > > concurrent access of index pages, and the distribution of keys in pages
> > > of a single hash key.
> > >
> > > This is described in a README file or a code comment somewhere in the
> > > hash AM code.  Someone needs to do some profiling to find out what the
> > > bottleneck really is, and ideally find a way to fix it.
> >
> > What I'm getting at is that I've never seen any explanation for the
> > theoretical use cases where a hash index would outperform a btree. If we
> > knew what kind of problems hash indexes were supposed to solve, we could
> > try and interest people who are solving those kinds of problems in
> > fixing hash indexes.
>
> The big win for hash indexes is the idea that searching for a single
> value should only take 1 I/O operation in a perfect world. Btree can
> not do that.

Hash indexes stored on disk still need a level of indirection -- you've
got to look up what range of blocks contains your hash value. How big
your table of ranges is depends on how big the hash index is and how
big your ranges are. Almost always you can fit that table into a block
cached in memory. But, the root of a BTree is often cached in memory
too. So there's no advantage for a hash index over a BTree index until
the BTree needs to grow to three levels deep, what is that, usually
10,000 or 100,000 records. Beyond that, you're right, the BTree slowly
grows deeper while the hash index doesn't.
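The back-of-the-envelope figure above (a btree reaching three levels somewhere around 10,000 to 100,000 records) is easy to check with a quick calculation. The fanout of 100 keys per page is an illustrative assumption, not a measured PostgreSQL value:

```python
def btree_levels(n_records, fanout):
    # Count the levels a btree needs to address n_records entries,
    # assuming each page holds `fanout` keys; the root is level 1.
    levels, capacity = 1, fanout
    while capacity < n_records:
        levels += 1
        capacity *= fanout
    return levels

# With ~100 keys per page, two levels cover 10,000 records;
# a third level only becomes necessary beyond that.
for n in (10_000, 100_000, 1_000_000):
    print(n, btree_levels(n, 100))
```

With a larger real-world fanout (several hundred keys per page), the three-level threshold moves into the millions of records, which only strengthens the point that the btree root and inner pages stay cached.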
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Tue, 2006-08-01 at 07:55 -0700, Luke Lonergan wrote:
> WRT hashing - we use FNV hash which is a very nice, very fast modern hash
> algorithm. We would contribute that if we worked on this.

We currently use Bob Jenkins' hash function[1], which is apparently
faster than FNV on most architectures except the Pentium IV (because the
P4 has slow shifting -- see [2]). I haven't compared their collision
rates -- it may be that we could improve matters incrementally by
switching to FNV, but the fundamental problems with our hash index
implementation lie elsewhere.

-Neil

[1] http://burtleburtle.net/bob/hash/doobs.html
[2] http://www.azillionmonkeys.com/qed/hash.html
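For reference, the FNV hash under discussion is only a few lines of code. Here is a sketch of one common variant, 32-bit FNV-1a (the mails don't say which FNV variant Greenplum uses, so the choice of variant is an assumption):

```python
def fnv1a_32(data: bytes) -> int:
    # 32-bit FNV-1a: xor each byte into the hash, then multiply
    # by the FNV prime, keeping only the low 32 bits.
    h = 0x811C9DC5              # FNV-1 32-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV 32-bit prime
    return h

print(hex(fnv1a_32(b"hello")))
```

The appeal is exactly what Luke describes: no shifts at all, just one xor and one multiply per byte, which is why its relative speed depends so much on the CPU's multiplier versus its shifter.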
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Jim,

On 7/28/06 12:27 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> In that case, perhaps this is something Greenplum might be interested
> in, since it might fit nicely between bitmap and btree indexes.

I'm certainly following the thread. We have talked about hash and btree
organized tables both here, but haven't gotten far enough along to
evaluate what's already there in pg. Looks like this thread has nicely
characterized the problems with what's there.

WRT hashing - we use FNV hash which is a very nice, very fast modern
hash algorithm. We would contribute that if we worked on this.

- Luke
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Tom Lane <[EMAIL PROTECTED]> writes:
> I think the problem may well be that we use hash buckets that are too
> large (ie, whole pages). After we fetch the page, we have to grovel
> through every tuple on it to find the one(s) that really match the
> query, whereas btree has a much more intelligent strategy (viz binary
> search) to do its intrapage searches. Smaller buckets would help make
> up for this.

Hm, you would expect hash indexes to still be a win for very large
indexes where you're worried more about i/o than cpu resources.

> Another issue is that we don't store the raw hashcode in the index
> tuples, so the only way to test a tuple is to actually invoke the
> datatype equality function. If we stored the whole 32-bit hashcode
> we could eliminate non-matching hashcodes cheaply. I'm not sure how
> painful it'd be to do this though ... hash uses the same index tuple
> layout as everybody else, and so there's no convenient place to put
> the hashcode.

I looked a while back and was suspicious about the actual hash functions
too. It seemed like a lot of them were vastly suboptimal. That would
mean we're often dealing with mostly empty and mostly full buckets
instead of well distributed hash tables.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
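The suspicion about badly distributed buckets is straightforward to test empirically: hash a batch of keys, bin them into buckets, and inspect the occupancy spread. A sketch (using Python's built-in `hash` as a stand-in for the index's hash function, and integer keys, which Python hashes to themselves):

```python
from collections import Counter

def bucket_occupancy(keys, n_buckets, hash_fn=hash):
    # Assign each key to a bucket by its hash, then count how many
    # keys land in each bucket.
    counts = Counter(hash_fn(k) % n_buckets for k in keys)
    return [counts.get(b, 0) for b in range(n_buckets)]

# A well-distributed hash keeps bucket sizes near the mean
# (1000 keys / 16 buckets = 62.5) with no empty or overfull buckets.
occupancy = bucket_occupancy(range(1000), 16)
print(min(occupancy), max(occupancy))
```

Running the same measurement over the real hash functions with realistic key sets would show directly whether the "mostly empty and mostly full buckets" scenario actually occurs.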
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > Jim Nasby wrote:
> > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > >Hannu Krosing <[EMAIL PROTECTED]> writes:
> > > >
> > > >>What would be the use-case for hash indexes ? And what should be
> > > >>done to make them faster than btree ?
> > > >
> > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > >to find out.
> > >
> > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > wouldn't hash indexes be faster for those operations than regular
> > > indexes?
> >
> > The main problem doesn't seem to be in the hash algorithm (which I
> > understand to mean the hashing function), but in the protocol for
> > concurrent access of index pages, and the distribution of keys in pages
> > of a single hash key.
> >
> > This is described in a README file or a code comment somewhere in the
> > hash AM code.  Someone needs to do some profiling to find out what the
> > bottleneck really is, and ideally find a way to fix it.
>
> What I'm getting at is that I've never seen any explanation for the
> theoretical use cases where a hash index would outperform a btree. If we
> knew what kind of problems hash indexes were supposed to solve, we could
> try and interest people who are solving those kinds of problems in
> fixing hash indexes.

The big win for hash indexes is the idea that searching for a single
value should only take 1 I/O operation in a perfect world. Btree can
not do that.

Ken
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Fri, Jul 28, 2006 at 03:14:33PM -0400, Alvaro Herrera wrote:
> Jim C. Nasby wrote:
>
> > What I'm getting at is that I've never seen any explanation for the
> > theoretical use cases where a hash index would outperform a btree. If we
> > knew what kind of problems hash indexes were supposed to solve, we could
> > try and interest people who are solving those kinds of problems in
> > fixing hash indexes.
>
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored. The hash index can
> get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to
> get a single key, while hash is O(1). Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).
>
> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).

In that case, perhaps this is something Greenplum might be interested
in, since it might fit nicely between bitmap and btree indexes.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored. The hash index can
> get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to
> get a single key, while hash is O(1). Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).
>
> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).

I think the problem may well be that we use hash buckets that are too
large (ie, whole pages). After we fetch the page, we have to grovel
through every tuple on it to find the one(s) that really match the
query, whereas btree has a much more intelligent strategy (viz binary
search) to do its intrapage searches. Smaller buckets would help make
up for this.

Another issue is that we don't store the raw hashcode in the index
tuples, so the only way to test a tuple is to actually invoke the
datatype equality function. If we stored the whole 32-bit hashcode
we could eliminate non-matching hashcodes cheaply. I'm not sure how
painful it'd be to do this though ... hash uses the same index tuple
layout as everybody else, and so there's no convenient place to put
the hashcode.

Anyway the bottom line here is that no one's tried hard to fix it,
but there are certainly things that might help.

			regards, tom lane
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Jim C. Nasby wrote:

> What I'm getting at is that I've never seen any explanation for the
> theoretical use cases where a hash index would outperform a btree. If we
> knew what kind of problems hash indexes were supposed to solve, we could
> try and interest people who are solving those kinds of problems in
> fixing hash indexes.

The btree index needs to descend potentially many pages before getting
to the leaf page, where the actual index is stored. The hash index can
get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to
get a single key, while hash is O(1). Our problem lies in the
constants; for btree they are smaller than for hash, so in practice
that O(logN) is always smaller than O(1).

I've heard other database systems manage to have hash indexes that are
actually faster than btree, so either (1) our btree absolutely rocks, or
(2) their hash implementations are better (probably both).

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
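The point about constants can be seen even in a toy in-memory model: both an O(1) hash probe and an O(log N) ordered search are a handful of cheap operations, so the asymptotic difference is swamped by per-operation cost until N is very large. A rough sketch using a dict versus binary search on a sorted list (in-memory stand-ins, not on-disk index structures):

```python
import bisect

keys = list(range(1_000_000))
table = {k: True for k in keys}   # hash structure: O(1) probes
sorted_keys = keys                # already sorted: O(log N) probes

def hash_probe(k):
    return k in table

def ordered_probe(k):
    # Binary search stands in for a btree descent: ~20 comparisons
    # at N = 1e6, each of them very cheap -- constants dominate.
    i = bisect.bisect_left(sorted_keys, k)
    return i < len(sorted_keys) and sorted_keys[i] == k

print(hash_probe(123_456), ordered_probe(123_456))
```

On disk the same logic applies with pages instead of comparisons: log N page fetches for the btree are nearly all cache hits, so the hash index's single-fetch advantage only pays off if its per-page constants are competitive.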
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> Jim Nasby wrote:
> > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > >Hannu Krosing <[EMAIL PROTECTED]> writes:
> > >>What would be the use-case for hash indexes ? And what should be
> > >>done to make them faster than btree ?
> > >
> > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > >to find out.
> >
> > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > wouldn't hash indexes be faster for those operations than regular
> > indexes?
>
> The main problem doesn't seem to be in the hash algorithm (which I
> understand to mean the hashing function), but in the protocol for
> concurrent access of index pages, and the distribution of keys in pages
> of a single hash key.
>
> This is described in a README file or a code comment somewhere in the
> hash AM code.  Someone needs to do some profiling to find out what the
> bottleneck really is, and ideally find a way to fix it.

What I'm getting at is that I've never seen any explanation for the
theoretical use cases where a hash index would outperform a btree. If we
knew what kind of problems hash indexes were supposed to solve, we could
try and interest people who are solving those kinds of problems in
fixing hash indexes.

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Re: [HACKERS] Hash indexes (was: On-disk bitmap index patch)
Jim Nasby wrote:
> On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> >Hannu Krosing <[EMAIL PROTECTED]> writes:
> >>What would be the use-case for hash indexes ? And what should be
> >>done to make them faster than btree ?
> >
> >If we knew, we'd do it ;-)  But no one's put enough effort into it
> >to find out.
>
> Do they use the same hash algorithm as hash joins/aggregation? If so,
> wouldn't hash indexes be faster for those operations than regular
> indexes?

The main problem doesn't seem to be in the hash algorithm (which I
understand to mean the hashing function), but in the protocol for
concurrent access of index pages, and the distribution of keys in pages
of a single hash key.

This is described in a README file or a code comment somewhere in the
hash AM code.  Someone needs to do some profiling to find out what the
bottleneck really is, and ideally find a way to fix it.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Hash indexes (was: On-disk bitmap index patch)
On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > On Tue, 2006-07-25 at 13:06, Tom Lane wrote:
> > > The reason I have such high sales resistance is that we've carried
> > > the hash and rtree AMs for years, hoping that someone would do the
> > > work to make them actually worth using, with little result.
> > What would be the use-case for hash indexes ? And what should be
> > done to make them faster than btree ?
>
> If we knew, we'd do it ;-)  But no one's put enough effort into it
> to find out.

Do they use the same hash algorithm as hash joins/aggregation? If so,
wouldn't hash indexes be faster for those operations than regular
indexes?

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461