Re: [HACKERS] Hash index todo list item

2007-11-14 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: > On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: > > This is a great starting point. I would appreciate it if you have the > > time and could make it apply cleanly to HEAD. > > Just to give you an update on this, I'll try to

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Tom Lane
Shreya Bhargava <[EMAIL PROTECTED]> writes: > We performed some probe tests on our patch on > hash index and the original btree index to compare the > performance between the two. Interesting, but ... > From the results obtained, the average of all the hash probes is 141.8ms, the > average for

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Jens-Wolfhard Schicke
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Shreya Bhargava wrote: > "Note that the bottom line for the problems with hash indexes is that the > current implementation doesn't offer any advantages over btree indexes. Hash > indexes need to be not only as good of a choice as btree indexes but > s

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Heikki Linnakangas
Shreya Bhargava wrote: 1. Populate the table with 80 million tuples. 2. Create HASH index on the table. 3. clear both linux cache & psql buffers. (exiting psql and restarting it cleared the psql buffers; to clear linux cache, we used drop_cache command) 4. start psql 5. select on an intege

Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Gokulakannan Somasundaram
I have not followed this thread very closely. But just wanted to give my inputs. > From the results obtained, the average of all the hash probes is 141.8ms, > the average for btree is 168.5, a difference of about 27.The standard > deviations are about 23, so this is a statistically significant di

Re: [HACKERS] Hash index todo list item

2007-11-05 Thread Shreya Bhargava
"Note that the bottom line for the problems with hash indexes is that the current implementation doesn't offer any advantages over btree indexes. Hash indexes need to be not only as good of a choice as btree indexes but significantly better a significantly better choice at least for some specific c

Re: [HACKERS] Hash index todo list item

2007-10-22 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote: > On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote: > > This is a great starting point. I would appreciate it if you have the > > time and could make it apply cleanly to HEAD. > > Just to give you an update on this, I'll try to

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Kenneth Marshall
Tom, Thank you for the update. I am currently working on updating the patch Neil Conway sent in against 8.0-ish that stores only the hash in the index and locates the entries within the page using a binary search. Then I will fold in your recent update. On Sun, Oct 21, 2007 at 01:13:48PM -0700, T

Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Tom Raney
Kenneth, I just pushed the revised patch (v2!). The revised approach samples the parent relation to estimate the number of tuples rather than performing a complete scan. In my tests, the estimate appears to be accurate, erring on the larger side, which is fine. Tom, That is great. I am loo

Re: [HACKERS] Hash index todo list item

2007-10-17 Thread Kenneth Marshall
Tom, That is great. I am looking forward to your patch. After the issues that you needed to address, I think that it would be reasonable to add a few more user settings for the hash index. Fill-factor is too course a knob. The others that I have been considering are: maxtuples - Not really the ma

Re: [HACKERS] Hash index todo list item

2007-10-12 Thread Kenneth Marshall
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote: > On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: > > Dear PostgreSQL Hackers: > > > > After following the hackers mailing list for quite a while, > > I am going to start investigating what will need to be done > >

Re: [HACKERS] Hash index todo list item

2007-09-28 Thread Bruce Momjian
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --- Tom Raney wrote: > We are pleased to announce an upcoming patch to the hash index code > which improves build time an

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Tom Raney
Kenneth Marshall wrote: On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: "Kenneth Marshall" <[EMAIL PROTECTED]> writes: On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: Using our implementation, build times and index sizes are comparable with btree inde

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Michael Glaesemann
On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote: Although I am very excited about this patch, I do not see any real value in including it in 8.3. I don't think you have to worry about it being in 8.3. Feature freeze was months ago. Michael Glaesemann grzm seespotcode net -

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote: > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: > > > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: > > > >> Using our implementation, build times and index sizes are > >> comparable with btree index build times and index

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Gregory Stark
"Kenneth Marshall" <[EMAIL PROTECTED]> writes: > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: > >> Using our implementation, build times and index sizes are >> comparable with btree index build times and index sizes. >... > That is super! (and timely) It is pretty super. I have a fe

Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote: > We are pleased to announce an upcoming patch to the hash index code > which improves build time and index size, based on this item in the > TODO list: > During index creation, pre-sort the tuples to improve build speed > http://archives.p

Re: [HACKERS] Hash index todo list item

2007-09-20 Thread Tom Raney
We are pleased to announce an upcoming patch to the hash index code which improves build time and index size, based on this item in the TODO list: During index creation, pre-sort the tuples to improve build speed http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php Using our implemen

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Martijn van Oosterhout
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote: > I think that if the case of >1 entry per hash becomes common enough to > be significant, and the key is stored in the hash, that a btree will > perform equal or better, and there is no point in pursuing such a hash > index model. Th

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Mark Mielke
Kenneth Marshall wrote: On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: Kenneth Marshall <[EMAIL PROTECTED]> writes: ... This is the rough plan. Does anyone see anything critical that is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
More random thoughts: - Hash-Indices are best for unique keys, but every table needs a new hash key, which means one more random page access. Is there any way to build multi-_table_ indices? A join might then fetch all table rows with a given unique key after one page fetch for the combined in

Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke <[EMAIL PROTECTED]> wrote: Kenneth Marshall wrote: Along with the hypothetical performance wins, the hash index space efficiency would be improved by a similar factor. Obviously, all of these ideas would need to be tested in various work

Re: [HACKERS] Hash index todo list item

2007-09-09 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: > Kenneth Marshall <[EMAIL PROTECTED]> writes: > > ... This is the rough plan. Does anyone see anything critical that > > is missing at this point? > > Sounds pretty good. Let me brain-dump one item on you: one thing that > hash currently

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke
Kenneth Marshall wrote: The value of storing the actual value, possibly as an option, is that the key check can be done in the index without requiring a heap lookup to check the actual value which would be a benefit for a unique index. I agree that supporting variable length data would complicate

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote: > Kenneth Marshall wrote: >> Continuing this train of thought While it would make sense for larger >> keys to store the hash in the index, if the key is smaller, particularly >> if it is of fixed size, it would make sense to store the

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke
Kenneth Marshall wrote: Continuing this train of thought While it would make sense for larger keys to store the hash in the index, if the key is smaller, particularly if it is of fixed size, it would make sense to store the key in the index instead. This would have the benefit of allowing use

Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: > Kenneth Marshall wrote: > >> I understand that a hash value is a many-to-one mapping. That is the >> point of the flag in the index. The flag means that there is only one >> item in the heap corresponding to that hash value. In this case

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote: > Kenneth Marshall wrote: > >>> How likely is it that you will get a hash collision, two strings that are >>> different that will hash to the same value? To avoid this requires a >>> very large hash key (128 bits, minimum)- o

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt
Kenneth Marshall wrote: How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum)- otherwise you get into birthday attack problems. With a 32-bit hash, the li

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote: > Kenneth Marshall wrote: > >> I understand that a hash value is a many-to-one mapping. That is the >> point of the flag in the index. The flag means that there is only one >> item in the heap corresponding to that hash value. In this case

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote: > Kenneth Marshall wrote: >> I understand that a hash value is a many-to-one mapping. That is the >> point of the flag in the index. The flag means that there is only one >> item in the heap corresponding to that hash value. In this case

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt
Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke
Kenneth Marshall wrote: I understand that a hash value is a many-to-one mapping. That is the point of the flag in the index. The flag means that there is only one item in the heap corresponding to that hash value. In this case we know that the value in the heap is the correct one and a possibly v

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: > On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: > > 2. Evaluate the performance of different hash index implementations > >and/or changes to the current implementation. My current plan is > >to keep the implementation

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote: > Kenneth Marshall wrote: >> On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: >> >>> You might find this patch useful: >>> >>> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php >>> ... >>> >>> Unfortunat

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke
Kenneth Marshall wrote: On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: You might find this patch useful: http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php ... Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge it up to HEAD if you'd lik

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote: > Neil Conway wrote: > > You might find this patch useful: > > > > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php > > Oh, I had forgot about that. > > > It implements the "just store the hash in the index"

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote: > On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: > > 2. Evaluate the performance of different hash index implementations > >and/or changes to the current implementation. My current plan is > >to keep the implementation

Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Heikki Linnakangas
Neil Conway wrote: > You might find this patch useful: > > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php Oh, I had forgot about that. > It implements the "just store the hash in the index" idea; it also sorts > the entries in a bucket by the hash value, which allows binar

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Neil Conway
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote: > 2. Evaluate the performance of different hash index implementations >and/or changes to the current implementation. My current plan is >to keep the implementation as simple as possible and still provide >the desired performance

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Martijn van Oosterhout
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote: > Since we already have to check the actual tuple values for any index > lookup in postgresql, we could only store the full hash value and the > corresponding TIDs in the bucket. Then when we lookup an item by > calculating its hash,

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke
Michael Glaesemann wrote: On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of co

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote: > Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. >>> How does that help? You still need all the keys to find out which >>> bucket to lo

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane: > Hannu Krosing <[EMAIL PROTECTED]> writes: > > Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: > >> No, because part of the deal is that you can do lookups using only the > >> leading index columns. At least, all the

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Michael Glaesemann
On Sep 6, 2007, at 10:53 , Mark Mielke wrote: I don't like the truncating hash suggestion because it limits the ability of a hash code to uniquely identify a key. AIUI, a hash can't be used as a unique identifier: it always needs to be rechecked due to the chance of collisions. There might

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke
Hannu Krosing wrote: One approahc is not to mix hashes, but to partition the hash, so that each column gets its N bits in the hash. How does that help? You still need all the keys to find out which bucket to look in. no. you need to look at only the buckets where that part of has

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Tom Lane
Hannu Krosing <[EMAIL PROTECTED]> writes: > Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: >> No, because part of the deal is that you can do lookups using only the >> leading index columns. At least, all the existing multicolumn index >> types can do that. > One approahc is no

Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane: > Gregory Stark <[EMAIL PROTECTED]> writes: > > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: > >> - What about multi-column indexes? The current implementation > >> only supports 1 column. > > > That seems kind of weird. It seems ob

Re: [HACKERS] Hash index todo list item

2007-09-05 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: > Dear PostgreSQL Hackers: > > After following the hackers mailing list for quite a while, > I am going to start investigating what will need to be done > to improve hash index performance. Below are the pieces of > this project tha

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote: > "Ben Tilly" <[EMAIL PROTECTED]> writes: > > That raises a very random thought. One of the nicer features of > > Oracle is the ability to have function-based indexes. So you could > > index, say, trim(lower(person.name)). > > > Is there any prospect

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
"Ben Tilly" <[EMAIL PROTECTED]> writes: > That raises a very random thought. One of the nicer features of > Oracle is the ability to have function-based indexes. So you could > index, say, trim(lower(person.name)). > Is there any prospect of postgres aquiring that functionality? Uh, no, since i

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote: > > That raises a very random thought. One of the nicer features of > Oracle is the ability to have function-based indexes. So you could > index, say, trim(lower(person.name)). There are a *lot* of practical > situations where that come

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Gregory Stark <[EMAIL PROTECTED]> wrote: > > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: > > > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: > >> Kenneth Marshall <[EMAIL PROTECTED]> writes: > >> > ... This is the rough plan. Does anyone see anything critical that > >> > i

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > "Kenneth Marshall" <[EMAIL PROTECTED]> writes: >> - What about multi-column indexes? The current implementation >> only supports 1 column. > That seems kind of weird. It seems obvious that you mix the three hashes > together which reduces it to the solve

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Gregory Stark
"Kenneth Marshall" <[EMAIL PROTECTED]> writes: > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: >> Kenneth Marshall <[EMAIL PROTECTED]> writes: >> > ... This is the rough plan. Does anyone see anything critical that >> > is missing at this point? >> >> Sounds pretty good. Let me brai

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote: > > > > This is the rough plan. Does anyone see anything critical that > > is missing at this point? Please send me any suggestions for test > > data and various performance test ideas, since I will be working > > on that first. > > Sou

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote: > Kenneth Marshall <[EMAIL PROTECTED]> writes: > > ... This is the rough plan. Does anyone see anything critical that > > is missing at this point? > > Sounds pretty good. Let me brain-dump one item on you: one thing that > hash currently

Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Simon Riggs
On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote: > Dear PostgreSQL Hackers: > > After following the hackers mailing list for quite a while, > I am going to start investigating what will need to be done > to improve hash index performance. Below are the pieces of > this project that I am

Re: [HACKERS] Hash index todo list item

2007-09-02 Thread Tom Lane
Kenneth Marshall <[EMAIL PROTECTED]> writes: > ... This is the rough plan. Does anyone see anything critical that > is missing at this point? Sounds pretty good. Let me brain-dump one item on you: one thing that hash currently has over btree is the ability to handle index items up to a full page.