On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
> On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
> > This is a great starting point. I would appreciate it if you have the
> > time and could make it apply cleanly to HEAD.
>
> Just to give you an update on this, I'll try to
Shreya Bhargava <[EMAIL PROTECTED]> writes:
> We performed some probe tests on our patch on
> hash index and the original btree index to compare the
> performance between the two.
Interesting, but ...
> From the results obtained, the average of all the hash probes is 141.8ms, the
> average for
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Shreya Bhargava wrote:
> "Note that the bottom line for the problems with hash indexes is that the
> current implementation doesn't offer any advantages over btree indexes. Hash
> indexes need to be not only as good of a choice as btree indexes but
> s
Shreya Bhargava wrote:
1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both linux cache & psql buffers.
(exiting psql and restarting it cleared the psql buffers;
to clear linux cache, we used drop_cache command)
4. start psql
5. select on an intege
I have not followed this thread very closely. But just wanted to give my inputs.
> From the results obtained, the average of all the hash probes is 141.8ms,
> the average for btree is 168.5, a difference of about 27.The standard
> deviations are about 23, so this is a statistically significant di
"Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
significantly better a significantly better choice at least for some specific
c
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
> On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
> > This is a great starting point. I would appreciate it if you have the
> > time and could make it apply cleanly to HEAD.
>
> Just to give you an update on this, I'll try to
Tom,
Thank you for the update. I am currently working on updating the
patch Neil Conway sent in against 8.0-ish that stores only the hash
in the index and locates the entries within the page using a binary
search. Then I will fold in your recent update.
On Sun, Oct 21, 2007 at 01:13:48PM -0700, T
Kenneth, I just pushed the revised patch (v2!). The revised approach
samples the parent relation to estimate the number of tuples rather than
performing a complete scan. In my tests, the estimate appears to be
accurate, erring on the larger side, which is fine.
Tom,
That is great. I am loo
Tom,
That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:
maxtuples - Not really the ma
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
> On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> > Dear PostgreSQL Hackers:
> >
> > After following the hackers mailing list for quite a while,
> > I am going to start investigating what will need to be done
> >
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
---
Tom Raney wrote:
> We are pleased to announce an upcoming patch to the hash index code
> which improves build time an
Kenneth Marshall wrote:
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
"Kenneth Marshall" <[EMAIL PROTECTED]> writes:
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
Using our implementation, build times and index sizes are
comparable with btree inde
On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote:
Although I am very excited about this patch, I do not see any real
value
in including it in 8.3.
I don't think you have to worry about it being in 8.3. Feature freeze
was months ago.
Michael Glaesemann
grzm seespotcode net
-
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
> "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
>
> > On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> >
> >> Using our implementation, build times and index sizes are
> >> comparable with btree index build times and index
"Kenneth Marshall" <[EMAIL PROTECTED]> writes:
> On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
>
>> Using our implementation, build times and index sizes are
>> comparable with btree index build times and index sizes.
>...
> That is super! (and timely)
It is pretty super. I have a fe
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
> We are pleased to announce an upcoming patch to the hash index code
> which improves build time and index size, based on this item in the
> TODO list:
> During index creation, pre-sort the tuples to improve build speed
> http://archives.p
We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
Using our implemen
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote:
> I think that if the case of >1 entry per hash becomes common enough to
> be significant, and the key is stored in the hash, that a btree will
> perform equal or better, and there is no point in pursuing such a hash
> index model. Th
Kenneth Marshall wrote:
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
Kenneth Marshall <[EMAIL PROTECTED]> writes:
... This is the rough plan. Does anyone see anything critical that
is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one
More random thoughts:
- Hash-Indices are best for unique keys, but every table needs a new hash
key, which means one more random page access. Is there any way to build
multi-_table_ indices? A join might then fetch all table rows with a given
unique key after one page fetch for the combined in
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke
<[EMAIL PROTECTED]> wrote:
Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various work
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> > ... This is the rough plan. Does anyone see anything critical that
> > is missing at this point?
>
> Sounds pretty good. Let me brain-dump one item on you: one thing that
> hash currently
Kenneth Marshall wrote:
The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> Continuing this train of thought While it would make sense for larger
>> keys to store the hash in the index, if the key is smaller, particularly
>> if it is of fixed size, it would make sense to store the
Kenneth Marshall wrote:
Continuing this train of thought While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case
On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>>> How likely is it that you will get a hash collision, two strings that are
>>> different that will hash to the same value? To avoid this requires a
>>> very large hash key (128 bits, minimum)- o
Kenneth Marshall wrote:
How likely is it that you will get a hash collision, two strings that are
different that will hash to the same value? To avoid this requires a very
large hash key (128 bits, minimum)- otherwise you get into birthday attack
problems. With a 32-bit hash, the li
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
> Kenneth Marshall wrote:
>
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case
On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> I understand that a hash value is a many-to-one mapping. That is the
>> point of the flag in the index. The flag means that there is only one
>> item in the heap corresponding to that hash value. In this case
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
Kenneth Marshall wrote:
I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
v
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
> On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> > 2. Evaluate the performance of different hash index implementations
> >and/or changes to the current implementation. My current plan is
> >to keep the implementation
On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote:
> Kenneth Marshall wrote:
>> On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
>>
>>> You might find this patch useful:
>>>
>>> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>>> ...
>>>
>>> Unfortunat
Kenneth Marshall wrote:
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
You might find this patch useful:
http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
...
Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd lik
On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote:
> Neil Conway wrote:
> > You might find this patch useful:
> >
> > http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
>
> Oh, I had forgot about that.
>
> > It implements the "just store the hash in the index"
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
> On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> > 2. Evaluate the performance of different hash index implementations
> >and/or changes to the current implementation. My current plan is
> >to keep the implementation
Neil Conway wrote:
> You might find this patch useful:
>
> http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
Oh, I had forgot about that.
> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binar
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> 2. Evaluate the performance of different hash index implementations
>and/or changes to the current implementation. My current plan is
>to keep the implementation as simple as possible and still provide
>the desired performance
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
> Since we already have to check the actual tuple values for any index
> lookup in postgresql, we could only store the full hash value and the
> corresponding TIDs in the bucket. Then when we lookup an item by
> calculating its hash,
Michael Glaesemann wrote:
On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to
be rechecked due to the chance of co
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
> Hannu Krosing wrote:
One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash.
>>> How does that help? You still need all the keys to find out which
>>> bucket to lo
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
> >> No, because part of the deal is that you can do lookups using only the
> >> leading index columns. At least, all the
On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to
be rechecked due to the chance of collisions. There might
Hannu Krosing wrote:
One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash.
How does that help? You still need all the keys to find out which
bucket to look in.
no. you need to look at only the buckets where that part of has
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Ãhel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
>> No, because part of the deal is that you can do lookups using only the
>> leading index columns. At least, all the existing multicolumn index
>> types can do that.
> One approahc is no
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
> Gregory Stark <[EMAIL PROTECTED]> writes:
> > "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
> >> - What about multi-column indexes? The current implementation
> >> only supports 1 column.
>
> > That seems kind of weird. It seems ob
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project tha
On 9/3/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Ben Tilly" <[EMAIL PROTECTED]> writes:
> > That raises a very random thought. One of the nicer features of
> > Oracle is the ability to have function-based indexes. So you could
> > index, say, trim(lower(person.name)).
>
> > Is there any prospect
"Ben Tilly" <[EMAIL PROTECTED]> writes:
> That raises a very random thought. One of the nicer features of
> Oracle is the ability to have function-based indexes. So you could
> index, say, trim(lower(person.name)).
> Is there any prospect of postgres aquiring that functionality?
Uh, no, since i
On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote:
>
> That raises a very random thought. One of the nicer features of
> Oracle is the ability to have function-based indexes. So you could
> index, say, trim(lower(person.name)). There are a *lot* of practical
> situations where that come
On 9/3/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
>
> "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
>
> > On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> >> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> >> > ... This is the rough plan. Does anyone see anything critical that
> >> > i
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Kenneth Marshall" <[EMAIL PROTECTED]> writes:
>> - What about multi-column indexes? The current implementation
>> only supports 1 column.
> That seems kind of weird. It seems obvious that you mix the three hashes
> together which reduces it to the solve
"Kenneth Marshall" <[EMAIL PROTECTED]> writes:
> On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
>> Kenneth Marshall <[EMAIL PROTECTED]> writes:
>> > ... This is the rough plan. Does anyone see anything critical that
>> > is missing at this point?
>>
>> Sounds pretty good. Let me brai
On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote:
> >
> > This is the rough plan. Does anyone see anything critical that
> > is missing at this point? Please send me any suggestions for test
> > data and various performance test ideas, since I will be working
> > on that first.
>
> Sou
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
> Kenneth Marshall <[EMAIL PROTECTED]> writes:
> > ... This is the rough plan. Does anyone see anything critical that
> > is missing at this point?
>
> Sounds pretty good. Let me brain-dump one item on you: one thing that
> hash currently
On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote:
> Dear PostgreSQL Hackers:
>
> After following the hackers mailing list for quite a while,
> I am going to start investigating what will need to be done
> to improve hash index performance. Below are the pieces of
> this project that I am
Kenneth Marshall <[EMAIL PROTECTED]> writes:
> ... This is the rough plan. Does anyone see anything critical that
> is missing at this point?
Sounds pretty good. Let me brain-dump one item on you: one thing that
hash currently has over btree is the ability to handle index items up
to a full page.
59 matches
Mail list logo