Re: [HACKERS] Hash index todo list item

2007-11-14 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
 On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
  This is a great starting point. I would appreciate it if you have the
  time and could make it apply cleanly to HEAD.
 
 Just to give you an update on this, I'll try to find the time to get it
 done soon, but my day job is keeping me really busy these days, so I'm
 not sure when I'll be able to get to it...
 
 -Neil
 
Neil,

I have been working on putting an updated version of your
patch into the current source. My first try was to try and
put your patch in directly, but it differed so much from the
current build that it was not obvious how to address things
like the current hash_index sorted build patch, which I need
to be able to test with indexes of any size at all. My current
try is to replace the _hash_formitem() calls with a function
called _hash_form_tuple() that actually returns an IndexTuple
and not an HashItem. This will allow it to be used quite
naturally with the current sorted build patch. Here is what
it looks like now:

/*
 * _hash_form_tuple -- construct index tuple using hash(value) not value
 */
IndexTuple
_hash_form_tuple(IndexTuple itup, Relation rel)
{
IndexTuple  result;
Sizesize;
uint32  hashkey;
Datum   datum;
boolisnull;

/* disallow nulls in hash keys */
if (IndexTupleHasNulls(itup))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg(hash indexes cannot contain null keys)
));

if (rel-rd_rel-relnatts != 1)
elog(ERROR, hash indexes support only one index key);

/* hash the tuple; we only store the hash value in the index */
datum = index_getattr(itup, 1, RelationGetDescr(rel), isnull);
Assert(!isnull);
hashkey = _hash_datum2hashkey(rel, datum);

size = IndexTupleSize(itup);
result = (IndexTuple) palloc(size);
memcpy(result, itup, size);
return result;
}

I am not currently doing anything other than returning the current
IndexTuple that was created with index_form_tuple(). Am I daft, or
can I just memcpy() the 6 bytes of TID, add the 2 bytes of t_info
(everything 0 and the size set to 6 + 2 + sizeof(hash) = 10), and
the 4 bytes of hash. This will allow me to handle 8-byte hashes
in the future. If you see a problem with this approach, please
let me know. I would appreciate any feedback you can give.

Regards,
Ken
 
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Gokulakannan Somasundaram
I have not followed this thread very closely. But just wanted to give my inputs.


 From the results obtained, the average of all the hash probes is 141.8ms,
 the average for btree is 168.5, a difference of about 27.The standard
 deviations are about 23, so this is a statistically significant difference.
 Our prediction that the hash index would take on the average one
 probe for about 10ms and the btree would take three probes for about 30 ms
 or a difference of about 20ms was pretty well shown by the difference we
 got of about 27. Hope these data points will help with some questions
 about the performance differences between Hash and Btree index.


We all know that Hash indexes are good for equality queries and Binary
indexes are good for both equality queries and Range queries. So for
equality queries, Hash index should do only one Logical I/O (if no
hash collisions) and Binary index should do atleast 3 (I don't know
the level of B-tree that came out as a result of this). You can enable
the Logical I/O count by applying this patch.

*** postgresql-8.3beta1/src/backend/storage/buffer/bufmgr.c Tue Sep 25
18:11:48 2007
--- postgresql-8.3patch/src/backend/storage/buffer/bufmgr.c Fri Oct 19
23:18:36 2007
***
*** 1470,1477 
localhitrate = (float) LocalBufferHitCount *100.0 / 
ReadLocalBufferCount;

appendStringInfo(str,
!   !\tShared blocks: %10ld read, %10ld written, buffer hit rate = 
%.2f%%\n,
!   ReadBufferCount - BufferHitCount, 
BufferFlushCount, hitrate);
appendStringInfo(str,
!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = 
%.2f%%\n,
 ReadLocalBufferCount - 
LocalBufferHitCount,
LocalBufferFlushCount, localhitrate);
--- 1470,1477 
localhitrate = (float) LocalBufferHitCount *100.0 / 
ReadLocalBufferCount;

appendStringInfo(str,
!   !\tShared blocks: %10ld Logical Reads, %10ld Physical Reads, %10ld
written, buffer hit rate = %.2f%%\n,
!   ReadBufferCount, ReadBufferCount - 
BufferHitCount,
BufferFlushCount, hitrate);
appendStringInfo(str,
!\tLocal  blocks: %10ld read, %10ld written, buffer hit rate = 
%.2f%%\n,
 ReadLocalBufferCount - 
LocalBufferHitCount,
LocalBufferFlushCount, localhitrate);


If possible, it would be useful for you to present the reduction in
Logical I/O count, as it very well might translate to Physical I/Os
for simple index scans. Atleast check whether it is in the ratio 1:3.

Hope my suggestion helps.


Thanks,
Gokul.
CertoSQL Project,
Allied Solution Group.
(www.alliedgroups.com)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Jens-Wolfhard Schicke
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Shreya Bhargava wrote:
 Note that the bottom line for the problems with hash indexes is that the
 current implementation doesn't offer any advantages over btree indexes. Hash
 indexes need to be not only as good of a choice as btree indexes but
 significantly better a significantly better choice at least for some
 specific circumstances.
Oh it does. I recently used a hash index to speed up my database. Namely
I found it improved performance when indexing a non-integer column
containing english words.

I don't know how much of that data was cached, according to the sound
of my harddrive it wasn't all of it. Consider this anecdotical
evidence, but the speedup was noticeable.

 We performed some probe tests on our patch on 
 hash index and the original btree index to compare the 
 performance between the two. We used a 80 million tuple table.
 The table contained single integer attribute and the data
 range was 0 - 1. (generated by a random generator).

I'd be interested how much difference is there between non-integer
index behaviour. I at least had the impression that in my case
the sorted strings in the btree pages didn't compare too well.

 */Gregory Stark [EMAIL PROTECTED]/* wrote:
 Kenneth Marshall writes:
  On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
  Using our implementation, build times and index sizes are
  comparable with btree index build times and index sizes.
Way to go! Currently building hash indices is no fun.

Regards,
  Jens-W. Schicke
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHMErXzhchXT4RR5ARAh7pAKCZIZFJFa7Oq25GvwDhiZJFsrtwgACbBC1F
otwhIZVlNgUGlroePIafi1c=
=N1f7
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Heikki Linnakangas

Shreya Bhargava wrote:

1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both linux cache  psql buffers.
   (exiting psql and restarting it cleared the psql buffers;
to clear linux cache, we used drop_cache command)
4. start psql
5. select on an integer in the range of values in the table.
   (all test numbers were big ones, like 98934599)
6. record the time.
7. exit psql.
8. drop caches.(as described above)
9. repeat 4-8 for different numbers.
10. Drop Hash index.
11. Create Btree index and repeat 3-9.


It seems you're mostly measuring the overhead of starting a backend, 
populating the relcache etc.


Restarting psql doesn't clear the postgres shared buffer cache. Or did 
you mean that you restarted postgres?


Anyway, I don't think it's interesting to test with cleared caches. 
Surely the metapage and first 1-2 levels of the b-tree would stay cached 
all the time in real life.



From the results obtained, the average of all the hash probes is 141.8ms, the 
average for btree is 168.5, a difference of about 27.The standard deviations 
are about 23, so this is a statistically significant difference.


I don't trust those numbers much, but in any case I don't think that 
edge is big enough to justify the existence of hash indexes.


If you're looking for a use case where hash index is faster, I'd suggest 
using a data type with an expensive comparison function. Like long 
multi-byte strings in UTF-8 encoding.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-11-06 Thread Tom Lane
Shreya Bhargava [EMAIL PROTECTED] writes:
 We performed some probe tests on our patch on 
 hash index and the original btree index to compare the 
 performance between the two.

Interesting, but ...

 From the results obtained, the average of all the hash probes is 141.8ms, the 
 average for btree is 168.5, a difference of about 27.The standard deviations 
 are about 23, so this is a statistically significant difference.

 Our prediction that the hash index would take on the average one
 probe for about 10ms and the btree would take three probes for about 30 ms
 or a difference of about 20ms was pretty well shown by the difference we
 got of about 27.

... unfortunately, a zero-cache situation isn't very reflective of the
real world.  In reality, for an index that is getting used often enough
to impact application performance at all, the root page will stay in
cache and likely so will some of the upper tree levels.  So I don't
think this experiment proves anything at all about real-world
performance.

What I'd like to see is an aggregate time measurement for millions of
random probes into an index that's noticeably larger than memory.
It would also be interesting to measure the speed of the fully-cached
case (same test, but index smaller than memory).

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-11-05 Thread Shreya Bhargava
Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
significantly better a significantly better choice at least for some specific
circumstances.


We performed some probe tests on our patch on 
hash index and the original btree index to compare the 
performance between the two. We used a 80 million tuple table.
The table contained single integer attribute and the data
range was 0 - 1. (generated by a random generator).
We did the following:

1. Populate the table with 80 million tuples.
2. Create HASH index on the table.
3. clear both linux cache  psql buffers.
   (exiting psql and restarting it cleared the psql buffers;
to clear linux cache, we used drop_cache command)
4. start psql
5. select on an integer in the range of values in the table.
   (all test numbers were big ones, like 98934599)
6. record the time.
7. exit psql.
8. drop caches.(as described above)
9. repeat 4-8 for different numbers.
10. Drop Hash index.
11. Create Btree index and repeat 3-9.
 
The results are as shown below. All trials are in ms. 
Count is the number of such tuples in the table.
 
 HASH INDEX:
 Number   Count   Trial1  Trial2  Trial3
 21367898  2 152.0129.5  131.1
 95678699  2 140.6145.6  137.5
 99899799  2 148.0147.4  152.6
 97677899  2 142.0153.5  112.0
 94311123  2 137.6146.3  141.3
 67544455  2 141.6144.6  152.7
 88877677  2 122.1123.1  130.8
 88788988  2 150.2150.0  171.7
 4444  1 119.9116.3  117.6
 34267878  1  97.5 99.9  114.8
 55489781  2 115.7117.2  145.3 
 97676767  1 169.5168.5  181.7 
 99767564  1 142.7133.6  132.7
 21367989  4 198.0193.2  186.7
 
BTREE INDEX
 
 Number  Trial1Trial2  Trial3
 21367898   145.5 145.6   150.6
 95678699   177.5 170.0   176.4
 99899799   175.4 181.2   185.4
 97677899   136.9 149.0   130.8
 94311123   179.0 175.3   166.3
 67544455   161.7 162.2   170.4
 88877677   138.7 135.2   149.9
 88788988   165.7 179.3   186.3
 4444   166.0 172.8   184.3
 34267878   159.1 168.8   147.8
 55489781   183.2 195.4   185.1
 97676767   147.2 143.6   132.6
 99767564   167.8 178.3   162.1
 21367989   232.3 238.1   216.9
From the results obtained, the average of all the hash probes is 141.8ms, the 
average for btree is 168.5, a difference of about 27.The standard deviations 
are about 23, so this is a statistically significant difference.

Our prediction that the hash index would take on the average one
probe for about 10ms and the btree would take three probes for about 30 ms
or a difference of about 20ms was pretty well shown by the difference we
got of about 27. Hope these data points will help with some questions
about the performance differences between Hash and Btree index.

Regards,
Shreya




Gregory Stark [EMAIL PROTECTED] wrote: Kenneth Marshall  writes:

 On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:

 Using our implementation, build times and index sizes are
 comparable with btree index build times and index sizes.
...
 That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly in the graphs it seems the index size graph has an exponential x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential which would mean linear
growth but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird
since I would expect there would be a visible extra component since sort times
are n-log(n). Perhaps you need to test still larger tables to see that though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree 

Re: [HACKERS] Hash index todo list item

2007-10-22 Thread Kenneth Marshall
On Thu, Sep 13, 2007 at 02:02:14PM -0700, Neil Conway wrote:
 On Fri, 2007-09-07 at 08:29 -0500, Kenneth Marshall wrote:
  This is a great starting point. I would appreciate it if you have the
  time and could make it apply cleanly to HEAD.
 
 Just to give you an update on this, I'll try to find the time to get it
 done soon, but my day job is keeping me really busy these days, so I'm
 not sure when I'll be able to get to it...
 
 -Neil
 
Neil,

I am currently working on integrating your previous patch with the
8.3beta1 hash code + the sorted build patch by Tom Raney. I have
been adding back pieces that were removed and I had a question
about index tuples, in particular what does the size field indicate?
Is it the size in bytes of the data after the IndexTupleData or the
total size in bytes including the IndexTupleData?

In order to use the hitem with the sorted build process, I think
that the flags should provide the size info so that the tuplesort
code can be used. This will also allow the use of 32-bit hash
values. Am I completely off base?

Regards,
Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Tom Raney
Kenneth, I just pushed the revised patch (v2!).  The revised approach 
samples the parent relation to estimate the number of tuples rather than 
performing a complete scan.  In my tests, the estimate appears to be 
accurate, erring on the larger side, which is fine.



Tom,

That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:
  



maxtuples - Not really the maximum, but a target value to use
  for setting up the initial buckets. This would allow you to
  set it for data loads and avoid the split-n-copy trauma
  that you are trying to avoid with your new hash build process.
  
If I understand you correctly, I believe we already do this with our 
current build process, there should not be any splits of the index if we 
estimated the tuple count correctly.  However, what gets you is 
collisions where lots of overflow pages occur when distinct keys map to 
the same bucket, or if you have lots of duplicate keys.  Because your 
overall tuple count doesn't exceed the fill factor, no splits occur, but 
lengthy bucket chains lead to lots of IOs.  You touch on this below.

multiplicity - Try to capture use cases that would require many
  overflow pages. In particular, if we discard the normal index
  page layout we can skip the space overhead of the page pointer
  and generate a more compact index. Then you could use a few
  more hash bits to lookup the index entry in the page. How many
  bits would be determined by this factor. 8-bits would give
  you 256 sub-pieces that could each hold about 3 entries using
  the current 4-byte hash, or 2 entries using an 8-byte hash.

What do you think?
  
Yes, this is a good direction.  If we can increase the number of buckets 
and reduce the bucket size (either physically or virtually) to allow 
more direct access without creating a huge index on disk, that would be 
ideal.  But, then if you do have collisions, overflows occur more 
frequently.  I spoke with Neil Conway yesterday at the PG conference 
here in Portland and he piqued my interest in examining his hash code 
more closely to see what he has already done in this area.


-Tom

Cheers,
Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
  

Kenneth,

Great!

Yes, we did update the code to use the estimate.  I will post the patch 
with this update.  We only saw a very small difference in index build time, 
but you may when you add many columns to the base relation. 
With 1 billion tuples, you should start to see the hash index outperform 
the btree for some equality probes, I would imagine.  With a 90% fill 
factor, the btree would require 4 levels to index that many tuples.  If the 
top two were in memory, there would be 3 IOs needed.  I don't think PG 
supports index only scans, so it will take the extra IO to probe the base 
relation.  The hash may take up to 2 IOs and maybe even less (or maybe more 
depending on how many overflow buckets there are).  It might be interesting 
to fiddle with the fill factors of each index - hash pages (buckets) 
default to 75% full. 
-Tom


Tom,

I am getting ready to stitch in an updated, simplified version
of Neil Conway's hash-only hash index patch. Did you have a
chance to update your sizing function to use the planner-like
estimate and not a full table scan? I would like to be able
to use that when my test table start to have 10^9 entries.
If you have not had a chance, I will try and add it myself.

Regards,
Ken

  
  



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

  



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-10-21 Thread Kenneth Marshall
Tom,

Thank you for the update. I am currently working on updating the
patch Neil Conway sent in against 8.0-ish that stores only the hash
in the index and locates the entries within the page using a binary
search. Then I will fold in your recent update.

On Sun, Oct 21, 2007 at 01:13:48PM -0700, Tom Raney wrote:
 Kenneth, I just pushed the revised patch (v2!).  The revised approach 
 samples the parent relation to estimate the number of tuples rather than 
 performing a complete scan.  In my tests, the estimate appears to be 
 accurate, erring on the larger side, which is fine.


Yes, larger is great and what we need to avoid all the index tuple
suffling between pages.

 Tom,

 That is great. I am looking forward to your patch. After the
 issues that you needed to address, I think that it would be
 reasonable to add a few more user settings for the hash index.
 Fill-factor is too course a knob. The others that I have been
 considering are:
   

 maxtuples - Not really the maximum, but a target value to use
   for setting up the initial buckets. This would allow you to
   set it for data loads and avoid the split-n-copy trauma
   that you are trying to avoid with your new hash build process.
   
 If I understand you correctly, I believe we already do this with our 
 current build process, there should not be any splits of the index if we 
 estimated the tuple count correctly.  However, what gets you is collisions 
 where lots of overflow pages occur when distinct keys map to the same 
 bucket, or if you have lots of duplicate keys.  Because your overall tuple 
 count doesn't exceed the fill factor, no splits occur, but lengthy bucket 
 chains lead to lots of IOs.  You touch on this below.

Yes, you do address this in your patches and it works well for an
existing heap. My idea was to minimize the shuffling problem when we
are doing a data load and do not have a heap to get a count from
because it has not been loaded yet.
 multiplicity - Try to capture use cases that would require many
   overflow pages. In particular, if we discard the normal index
   page layout we can skip the space overhead of the page pointer
   and generate a more compact index. Then you could use a few
   more hash bits to lookup the index entry in the page. How many
   bits would be determined by this factor. 8-bits would give
   you 256 sub-pieces that could each hold about 3 entries using
   the current 4-byte hash, or 2 entries using an 8-byte hash.

 What do you think?
   
 Yes, this is a good direction.  If we can increase the number of buckets 
 and reduce the bucket size (either physically or virtually) to allow more 
 direct access without creating a huge index on disk, that would be ideal.  
 But, then if you do have collisions, overflows occur more frequently.  I 
 spoke with Neil Conway yesterday at the PG conference here in Portland and 
 he piqued my interest in examining his hash code more closely to see what 
 he has already done in this area.

Right, overflows would occur more frequently and any overflow would
allocate a full page. It may be possible to estimate the multiplicity
and minimize the use of overflow pages. If we know that on average that
there are no more than 10 items in a bucket, we can size the virtual
buckets on the first page to support 10 items and minimize the rollover
to an overflow page.

Other ideas, once we hit the overflow page, back-off to the current
fullpage use to maximize the fill-factor, possibly using Neil's
binary search to speed lookups within the overflow pages. Also, with
the smaller virtual buckets use the remaining space on the page as
the first overflow page to hopefully prevent needing to allocate a
full new overflow page. This would be most useful for the smaller
virtual bucket sizes and improve the overall packing efficiency of
the index. If we intend to take advantage of the O(1) behavior, it
will be most useful for large numbers of tuples, more than 10^9. A
32-bit hash is not good enough and we will need to use a 64-bit hash
to reduce collisions in the hash table and consequent need for overflow
pages. I already have the hash function updated to support 64-bit and
will look at getting that functional once I have the hash-only version
working with the 32-bit hashes. Also, without the need to store the
value in the index, we can boost the size of indexable fields to the
page size, and larger. I was tentatively targeting 32k as the implementation
max, because then hashing time becomes the issue. Of course, all of these
changes and options will need to be benchmarked and discussed.

Thank you again for posting the updated patch.

Regards,
Ken


 -Tom
 Cheers,
 Ken

 On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
   
 Kenneth,

 Great!

 Yes, we did update the code to use the estimate.  I will post the patch 
 with this update.  We only saw a very small difference in index build 
 time, but you may when you add many columns to the base relation. With 1 
 billion 

Re: [HACKERS] Hash index todo list item

2007-10-17 Thread Kenneth Marshall
Tom,

That is great. I am looking forward to your patch. After the
issues that you needed to address, I think that it would be
reasonable to add a few more user settings for the hash index.
Fill-factor is too course a knob. The others that I have been
considering are:

maxtuples - Not really the maximum, but a target value to use
  for setting up the initial buckets. This would allow you to
  set it for data loads and avoid the split-n-copy trauma
  that you are trying to avoid with your new hash build process.

multiplicity - Try to capture use cases that would require many
  overflow pages. In particular, if we discard the normal index
  page layout we can skip the space overhead of the page pointer
  and generate a more compact index. Then you could use a few
  more hash bits to lookup the index entry in the page. How many
  bits would be determined by this factor. 8-bits would give
  you 256 sub-pieces that could each hold about 3 entries using
  the current 4-byte hash, or 2 entries using an 8-byte hash.

What do you think?

Cheers,
Ken

On Wed, Oct 17, 2007 at 03:31:58PM -0700, Tom Raney wrote:
 Kenneth,

 Great!

 Yes, we did update the code to use the estimate.  I will post the patch 
 with this update.  We only saw a very small difference in index build time, 
 but you may when you add many columns to the base relation. 
 With 1 billion tuples, you should start to see the hash index outperform 
 the btree for some equality probes, I would imagine.  With a 90% fill 
 factor, the btree would require 4 levels to index that many tuples.  If the 
 top two were in memory, there would be 3 IOs needed.  I don't think PG 
 supports index only scans, so it will take the extra IO to probe the base 
 relation.  The hash may take up to 2 IOs and maybe even less (or maybe more 
 depending on how many overflow buckets there are).  It might be interesting 
 to fiddle with the fill factors of each index - hash pages (buckets) 
 default to 75% full. 
 -Tom
 Tom,

 I am getting ready to stitch in an updated, simplified version
 of Neil Conway's hash-only hash index patch. Did you have a
 chance to update your sizing function to use the planner-like
 estimate and not a full table scan? I would like to be able
 to use that when my test table start to have 10^9 entries.
 If you have not had a chance, I will try and add it myself.

 Regards,
 Ken

   



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-10-12 Thread Kenneth Marshall
On Wed, Sep 05, 2007 at 03:07:03PM -0500, Kenneth Marshall wrote:
 On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
  Dear PostgreSQL Hackers:
  
  After following the hackers mailing list for quite a while,
  I am going to start investigating what will need to be done
  to improve hash index performance. Below are the pieces of
  this project that I am currently considering:
  
  1. Characterize the current hash index implementation against
 the BTree index, with a focus on space utilization and
 lookup performance against a collection of test data. This
 will give a baseline performance test to evaluate the impact
 of changes. I initially do not plan to bench the hash creation
 process since my initial focus will be on lookup performance.
  
 
 Here are very basic results for a table with 1.6m entries:
 
 postgres=# CREATE TABLE dict (word varchar(100));
 CREATE TABLE
 
 postgres=# COPY dict FROM '/tmp/words';
 COPY 1648379
 postgres=# select count(*) from dict;
   count  
 -
  1648379
 (1 row)
 
 Time: 11187.418 ms
 postgres=# select count(*) from dict;
   count  
 -
  1648379
 (1 row)
 
 Time: 6040.912 ms
 postgres=# CREATE INDEX wordhash ON dict USING hash (word);
 CREATE INDEX
 Time: 11108707.160 ms
 
 postgres=# select * from dict where word = 'avatar';
   word  
 
  avatar
 (1 row)
 
 Time: 79.823 ms
 postgres=# select * from dict where word = 'zebra';
  word  
 ---
  zebra
 (1 row)
 
 Time: 9.864 ms
 postgres=# select * from dict where word = 'turkey'; 
   word  
 
  turkey
 (1 row)
 
 Time: 18.418 ms
 Time: 1.045 ms
 Time: 1.257 ms
 Time: 1.080 ms
 
 postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
 CREATE INDEX
 
 Time: 25438.884 ms
 
 postgres=# select * from dict where word = 'avatar';
   word  
 
  avatar
 (1 row)
 
 Time: 13.400 ms
 postgres=# select * from dict where word = 'zebra';
  word  
 ---
  zebra
 (1 row)
 
 Time: 1.173 ms
 postgres=# select * from dict where word = 'turkey';
   word  
 
  turkey
 (1 row)
 
 Time: 1.186 ms
 Time: 1.103 ms
 Time: 1.099 ms
 Time: 1.108 ms
 
 --
 Size of table =   87556096
 
 Size of hash index = 268451840
 Size of btree index = 53510144
 
 From my very small sample on an unloaded machine, a hash index lookup
 took the least amount of time. It had a much larger initial time which
 could be attributable to cache population effects. The size is 5X that
 of the Btree index. I will continue to improve the test suite as more
 granularity is needed. If anyone has a good data generator, please let
 me know. Otherwise I will just roll my own.
 
 Regards,
 Ken
 
I have just re-ran the previous benchmark against 8.3beta1 (versus 8.2)
including the hash index sorted build patch and the results are below:

test=# CREATE TABLE dict (word varchar(100));
CREATE TABLE
Time: 24.463 ms
test=# COPY dict FROM '/tmp/cracklib-words';
LOG:  checkpoints are occurring too frequently (21 seconds apart)
HINT:  Consider increasing the configuration parameter checkpoint_segments.
COPY 1648379
Time: 44470.235 ms
test=# select count(*) from dict;
  count  
-
 1648379
(1 row)

Time: 4553.924 ms
test=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 85905.960 ms
test=# select * from dict where word = 'avatar';
  word  

 avatar
(1 row)

Time: 592.906 ms
test=# select * from dict where word = 'zebra';
 word  
---
 zebra
(1 row)

Time: 21.458 ms
test=# select * from dict where word = 'turkey';
  word  

 turkey
(1 row)

Time: 22.532 ms
Time: 1.224 ms
Time: 1.200 ms
Time: 1.264 ms
test=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX
Time: 33714.874 ms
test=# select * from dict where word = 'avatar';
  word  

 avatar
(1 row)

Time: 24.296 ms
test=# select * from dict where word = 'zebra';
 word  
---
 zebra
(1 row)

Time: 1.400 ms
test=# select * from dict where word = 'turkey';
  word  

 turkey
(1 row)

Time: 28.302 ms
Time: 1.284 ms
Time: 1.313 ms

---
Size of table = 69877760

Size of hash index =   268451840
Size of btree index =   48570368

I just wanted to have this baseline for future patches since
8.3 is just around the bend. As expected, the sorted build
process is a huge improvement from the unsorted original.

Regards,
Ken Marshall

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-28 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Raney wrote:
 We are pleased to announce an upcoming patch to the hash index code
 which improves build time and index size, based on this item in the
 TODO list:
 During index creation, pre-sort the tuples to improve build speed
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php
 
 Using our implementation, build times and index sizes are
 comparable with btree index build times and index sizes.
 For example, for a particular 12 million row relation, the
 8.2.4 release requires over 2.8 hours to build the hash index. 
 With our patch, the index is built in 80 seconds.
 Here is the link for a graph, showing a comparative analysis of
 btree and hash index builds and describing the benchmark data.
 http://web.cecs.pdx.edu/~raneyt/pg/
 
 We are currently cleaning up the patch and will submit it asap.
 
 Regards,
 Shreya Bhargava [EMAIL PROTECTED]
 Tom Raney [EMAIL PROTECTED]
 
 
 Kenneth Marshall wrote:
  Dear PostgreSQL Hackers:
 
  After following the hackers mailing list for quite a while,
  I am going to start investigating what will need to be done
  to improve hash index performance. Below are the pieces of
  this project that I am currently considering:
 
  1. Characterize the current hash index implementation against
 the BTree index, with a focus on space utilization and
 lookup performance against a collection of test data. This
 will give a baseline performance test to evaluate the impact
 of changes. I initially do not plan to bench the hash creation
 process since my initial focus will be on lookup performance.
 
  2. Evaluate the performance of different hash index implementations
 and/or changes to the current implementation. My current plan is
 to keep the implementation as simple as possible and still provide
 the desired performance. Several hash index suggestions deal with
 changing the layout of the keys on a page to improve lookup
 performance, including reducing the bucket size to a fraction of
 a page or only storing the hash value on the page, instead of
 the index value itself. My goal in this phase is to produce one
 or more versions with better performance than the current BTree.
 
  3. Look at build time and concurrency issues with the addition of
 some additional tests to the test bed. (1)
 
  4. Repeat as needed.
 
  This is the rough plan. Does anyone see anything critical that
  is missing at this point? Please send me any suggestions for test
  data and various performance test ideas, since I will be working
  on that first.
 
  Regards,
  Ken Marshall 
 
  ---(end of broadcast)---
  TIP 5: don't forget to increase your free space map settings
 
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
 We are pleased to announce an upcoming patch to the hash index code
 which improves build time and index size, based on this item in the
 TODO list:
 During index creation, pre-sort the tuples to improve build speed
 http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

 Using our implementation, build times and index sizes are
 comparable with btree index build times and index sizes.
 For example, for a particular 12 million row relation, the
 8.2.4 release requires over 2.8 hours to build the hash index. With our 
 patch, the index is built in 80 seconds.
 Here is the link for a graph, showing a comparative analysis of
 btree and hash index builds and describing the benchmark data.
 http://web.cecs.pdx.edu/~raneyt/pg/

 We are currently cleaning up the patch and will submit it asap.

 Regards,
 Shreya Bhargava [EMAIL PROTECTED]
 Tom Raney [EMAIL PROTECTED]


That is super! (and timely)

I was just looking at that some myself since the large index build
times make testing a very laborious process. I am looking forward to
seeing the details. 

Regards,
Ken

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Gregory Stark
Kenneth Marshall [EMAIL PROTECTED] writes:

 On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:

 Using our implementation, build times and index sizes are
 comparable with btree index build times and index sizes.
...
 That is super! (and timely)

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly in the graphs it seems the index size graph has an exponential x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential which would mean linear
growth but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird
since I would expect there would be a visible extra component since sort times
are n-log(n). Perhaps you need to test still larger tables to see that though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
significantly better a significantly better choice at least for some specific
circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS
HEAD and work from that. I don't think there are huge differences in the area
of hash indexes though. But in most other areas you would be spending quite a
bit of time dealing details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze.
Normally any patch submitted now would be held until 8.3 is released and
development on 8.4 is started. I could imagine an exception being made for
hash indexes since they're so moribund currently but probably not. The flip
side of that argument is that there's not much point in making an exception
for something which will only be really useful once further work is done in
the same area.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Kenneth Marshall
On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
 
  On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:
 
  Using our implementation, build times and index sizes are
  comparable with btree index build times and index sizes.
 ...
  That is super! (and timely)
 
 It is pretty super. I have a few comments to raise but don't take it to be too
 negative, it sounds like this is a big step forward towards making hash
 indexes valuable.
 
 Firstly in the graphs it seems the index size graph has an exponential x-axis
 but a linear y-axis. This makes it hard to see what I would expect to be
 pretty much linear growth. The curves look exponential which would mean linear
 growth but of course it's hard to tell.
 
 Also, the growth in the time chart looks pretty much linear. That seems weird
 since I would expect there would be a visible extra component since sort times
 are n-log(n). Perhaps you need to test still larger tables to see that though.
 
 In any case it's clear from the results you have there that the change is a
 positive one and fixes a fundamental problem with the hash index build code.
 
 Something else you should perhaps test is indexing something which is
 substantially larger than hash function output. A hash function is going to
 make the most sense when indexing something like strings for which you want to
 avoid the long comparison costs. Especially consider testing this on a UTF8
 locale with expensive comparisons (like a CJK locale for example).
 
 Note that the bottom line for the problems with hash indexes is that the
 current implementation doesn't offer any advantages over btree indexes. Hash
 indexes need to be not only as good of a choice as btree indexes but
 significantly better a significantly better choice at least for some specific
 circumstances.
 
 Also, if you're going to submit a patch you should check out a copy of the CVS
 HEAD and work from that. I don't think there are huge differences in the area
 of hash indexes though. But in most other areas you would be spending quite a
 bit of time dealing details which have changed since.
 
 Finally note that we're in the final throes of the 8.3 feature freeze.
 Normally any patch submitted now would be held until 8.3 is released and
 development on 8.4 is started. I could imagine an exception being made for
 hash indexes since they're so moribund currently but probably not. The flip
 side of that argument is that there's not much point in making an exception
 for something which will only be really useful once further work is done in
 the same area.
 

Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Michael Glaesemann


On Sep 25, 2007, at 11:26 , Kenneth Marshall wrote:

Although I am very excited about this patch, I do not see any real  
value

in including it in 8.3.


I don't think you have to worry about it being in 8.3. Feature freeze  
was months ago.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-25 Thread Tom Raney

Kenneth Marshall wrote:

On Tue, Sep 25, 2007 at 03:35:47PM +0100, Gregory Stark wrote:
  

Kenneth Marshall [EMAIL PROTECTED] writes:



On Thu, Sep 20, 2007 at 05:12:45PM -0700, Tom Raney wrote:

  

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.


...
That is super! (and timely)
  

It is pretty super. I have a few comments to raise but don't take it to be too
negative, it sounds like this is a big step forward towards making hash
indexes valuable.

Firstly in the graphs it seems the index size graph has an exponential x-axis
but a linear y-axis. This makes it hard to see what I would expect to be
pretty much linear growth. The curves look exponential which would mean linear
growth but of course it's hard to tell.

Also, the growth in the time chart looks pretty much linear. That seems weird
since I would expect there would be a visible extra component since sort times
are n-log(n). Perhaps you need to test still larger tables to see that though.

In any case it's clear from the results you have there that the change is a
positive one and fixes a fundamental problem with the hash index build code.

Something else you should perhaps test is indexing something which is
substantially larger than hash function output. A hash function is going to
make the most sense when indexing something like strings for which you want to
avoid the long comparison costs. Especially consider testing this on a UTF8
locale with expensive comparisons (like a CJK locale for example).

Note that the bottom line for the problems with hash indexes is that the
current implementation doesn't offer any advantages over btree indexes. Hash
indexes need to be not only as good of a choice as btree indexes but
significantly better a significantly better choice at least for some specific
circumstances.

Also, if you're going to submit a patch you should check out a copy of the CVS
HEAD and work from that. I don't think there are huge differences in the area
of hash indexes though. But in most other areas you would be spending quite a
bit of time dealing details which have changed since.

Finally note that we're in the final throes of the 8.3 feature freeze.
Normally any patch submitted now would be held until 8.3 is released and
development on 8.4 is started. I could imagine an exception being made for
hash indexes since they're so moribund currently but probably not. The flip
side of that argument is that there's not much point in making an exception
for something which will only be really useful once further work is done in
the same area.




Although I am very excited about this patch, I do not see any real value
in including it in 8.3. As you mention, we need to to have a hash index
implementation that outperforms btree in some problem regime and that is
currently not the case. I have just recently started the process of
gathering ideas and having discussions on various approaches to making
hash indexes more performant and we have a number of ideas on which to
start our investigation. I do think that this patch will make the testing
and evaluation, that will be needed to truly improve the hash index, much
much easier.

Regards,
Ken

  
We're glad to contribute and be a part of Postgres.  The patch has been 
posted to [EMAIL PROTECTED]


Speeding up the creation time of hash indexes on non-trivial relations 
was our goal.  This will allow some interesting performance tests of the 
hash index on very large relations.  It may be that the near constant 
lookup time of the hash index outperforms the Btree index for some large 
data sets and for certain types of data and distributions.


Sincerely,
Tom Raney




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-20 Thread Tom Raney

We are pleased to announce an upcoming patch to the hash index code
which improves build time and index size, based on this item in the
TODO list:
During index creation, pre-sort the tuples to improve build speed
http://archives.postgresql.org/pgsql-hackers/2007-03/msg01199.php

Using our implementation, build times and index sizes are
comparable with btree index build times and index sizes.
For example, for a particular 12 million row relation, the
8.2.4 release requires over 2.8 hours to build the hash index. 
With our patch, the index is built in 80 seconds.

Here is the link for a graph, showing a comparative analysis of
btree and hash index builds and describing the benchmark data.
http://web.cecs.pdx.edu/~raneyt/pg/

We are currently cleaning up the patch and will submit it asap.

Regards,
Shreya Bhargava [EMAIL PROTECTED]
Tom Raney [EMAIL PROTECTED]


Kenneth Marshall wrote:

Dear PostgreSQL Hackers:

After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:

1. Characterize the current hash index implementation against
   the BTree index, with a focus on space utilization and
   lookup performance against a collection of test data. This
   will give a baseline performance test to evaluate the impact
   of changes. I initially do not plan to bench the hash creation
   process since my initial focus will be on lookup performance.

2. Evaluate the performance of different hash index implementations
   and/or changes to the current implementation. My current plan is
   to keep the implementation as simple as possible and still provide
   the desired performance. Several hash index suggestions deal with
   changing the layout of the keys on a page to improve lookup
   performance, including reducing the bucket size to a fraction of
   a page or only storing the hash value on the page, instead of
   the index value itself. My goal in this phase is to produce one
   or more versions with better performance than the current BTree.
   
3. Look at build time and concurrency issues with the addition of

   some additional tests to the test bed. (1)

4. Repeat as needed.

This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.

Regards,
Ken Marshall 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke
--On Samstag, September 08, 2007 18:56:23 -0400 Mark Mielke 
[EMAIL PROTECTED] wrote:

Kenneth Marshall wrote:
Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.


Space efficiency is provided by not storing the key, nor the header data
required (length prefix?).
Space efficiency at ~1 entry per bucket: How about using closed hashing, 
saving in each page a bitmask in front which specifies which entries hold 
valid entries and in the rest of the page row-pointers (is this the correct 
expression? I don't know...) without further data. Should provide 
reasonably simple data structure and alignment for the pointers.



Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance
characteristics
that we are seeking.


One should look into new plan nodes for != ANY(), NOT EXISTS and 
similar. A node like look into hash and true if bucket is empty would 
work without checking tuple visibility when the bucket is empty and could 
be a win in some situations.


Do we want special cases for short keys like INT4? In those cases the 
implementation might use hash == key and put that knowledge to use in 
plans. Even a unique constraint might then be doable. Does the 
postgresql-storage backend on linux support sparse files? Might be a win 
when holes in the sequence turn up.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Jens-Wolfhard Schicke

More random thoughts:

- Hash-Indices are best for unique keys, but every table needs a new hash 
key, which means one more random page access. Is there any way to build 
multi-_table_ indices? A join might then fetch all table rows with a given 
unique key after one page fetch for the combined index.


- Hashes with trivial hash-functions (like identity) can also return rows 
in a desired order.


- Is there a case where a sequentially scanning a hash-index is useful? I 
can't find any, but maybe somebody else has a use-case.


- What about HashJoins when the referenced tables have hash-indices?

- What about hash-indices where entries are inserted for multiple columns. 
Assume a table like this:


CREATE TABLE link (obj_id1 INT4, obj_id2 INT4);

and a query like

SELECT * FROM link WHERE ? IN (obj_id1, obj_id2);

or some join using a similar condition. It might be a nice thing to insert 
entries at both HASH(obj_id1) and HASH(obj_id2), which would eliminate the 
need to check in two indices and do a bitmap OR. OTOH it might not be 
faster in any significant use cases because who'd need a link table with 
nearly unique linked objects?


- In cases where the distribution of the hash-function is good, but a small 
and relatively even number of rows exist for each key (like it might be the 
case in the above example), it might be nice to reserve a given amount of 
same-key row entries in each bucket, and hold a fill-count at the front of 
it. That would avoid costly page fetches after each collision. You'd create 
a hash-index with n-buckets, each m-elements large. When the bucket is 
full, the usual collision handling continues.


- About hash enlargement: What about always using only the first k bits of 
each hash value. When you find that the hash is quite-full (however that 
is defined and detected), k is increased by one, effectively doubling the 
hash size. New entries are then written as usual, while retrieving the old 
entries needs to test at the k-bit-position first and if there is a miss, 
also at the k-1-position and so forth. To limit this search, some 
background process could after analyzing the index move old entries to the 
now correct k-bit-position and increment some min-k-value once all old 
entries have been moved. After the hash has been increased, the index would 
approximately half it's speed for some time then. Additionally one could 
also insert the entry at the new position if it has been found at the old 
one only while using the index. A special miss-entry at the new position 
doesn't help if nothing could be found because the old positions will 
usually hold some data which resides there even if it uses k bits.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Mark Mielke

Kenneth Marshall wrote:

On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
  

Kenneth Marshall [EMAIL PROTECTED] writes:


... This is the rough plan. Does anyone see anything critical that
is missing at this point?
  

Sounds pretty good.  Let me brain-dump one item on you: one thing that
hash currently has over btree is the ability to handle index items up
to a full page.  Now, if you go with a scheme that only stores hash
codes and not the underlying data, you can not only handle that but
improve on it; but if you reduce the bucket size and don't remove the
data, it'd be a step backward.  The idea I had about dealing with that
was to only reduce the size of primary buckets --- if it's necessary to
add overflow space to a bucket, the overflow units are still full pages.
So an index tuple up to a page in size can always be accommodated by
adding an overflow page to the bucket.

Just a thought, but AFAIR it's not in the archives anywhere.

regards, tom lane



I was thinking about this some more, and it strikes me that we can
keep the page size = bucket size = overflow size in the new scheme
of storing the hash value in the index and still reduce the effective
bucket size. Let's make the new size the (page size / 2^n) where n
is chosen appropriately. Then we we store the value in the page we
simply use n bits of the hash to determine which sub-piece of the
page to use to actually store the value. We may need to add a multiplier
to adjust the decision to split the page based on the mini-page. This
should allow us to much more densely pack the pages and approach the
1 item per bucket. This could easily shrink the size of the index by
a factor of 2^n. Let me know what you think.
  
My personal opinion is that something like this is required to take best 
advantage of hashes. I didn't respond immediately because I don't have 
advice on what the best implementation would look like.


I have also been wondering about the effect of a hash index that 
includes multiple values to the same key (either a non-unique key, or 
different tuples from the same key due to table updates). I started by 
thinking that the maximum number of hash entries per hash bucket should 
be kept to 2 or 3 to prevent reduction in performance to that of btree, 
but I think this doesn't work if multiple tuples can have the same key. 
Unless - the maps is hash value =1:1 index page =1:1 hash bucket =1:N 
hash value =1:M= tuples. Guarantee than N is small (either = 2 or =4 
depending on performance evaluation) by re-hashing if N ever becomes  2 
or  4. Choose a sparse harsh. Let 1:M be indefinite? Also, optimize the 
1:M=1:1 case, such that the value can be inline?


For most cases, I would think the above model would make it cheap to 
determine if the key does not exist, as well as the 1:1 (hash value:key) 
case requiring a single page lookup. Update performance would suffer, 
but lookup should be faster than btree in these cases, as btree often 
requires  1 index page scan.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Hash index todo list item

2007-09-10 Thread Martijn van Oosterhout
On Sat, Sep 08, 2007 at 06:56:23PM -0400, Mark Mielke wrote:
 I think that if the case of 1 entry per hash becomes common enough to 
 be significant, and the key is stored in the hash, that a btree will 
 perform equal or better, and there is no point in pursuing  such a hash 
 index model. This is where we are today.

There is the point that if a user does an UPDATE of a row without
changing the key your index will have to store entries with the same
hash. If your goal is mostly write-once tables, then that's cool, but
otherwise you're going to have to find a way of dealing with that. It's
not just going to happen a lot, it is going to be common, even for
unique indexes.

Presumably HOT will help with this, but that relies on all index columns
not to change.

The major benenfits will mostly come from not storing the key at all. I
think.

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Hash index todo list item

2007-09-09 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
  ... This is the rough plan. Does anyone see anything critical that
  is missing at this point?
 
 Sounds pretty good.  Let me brain-dump one item on you: one thing that
 hash currently has over btree is the ability to handle index items up
 to a full page.  Now, if you go with a scheme that only stores hash
 codes and not the underlying data, you can not only handle that but
 improve on it; but if you reduce the bucket size and don't remove the
 data, it'd be a step backward.  The idea I had about dealing with that
 was to only reduce the size of primary buckets --- if it's necessary to
 add overflow space to a bucket, the overflow units are still full pages.
 So an index tuple up to a page in size can always be accommodated by
 adding an overflow page to the bucket.
 
 Just a thought, but AFAIR it's not in the archives anywhere.
 
   regards, tom lane
 
I was thinking about this some more, and it strikes me that we can
keep the page size = bucket size = overflow size in the new scheme
of storing the hash value in the index and still reduce the effective
bucket size. Let's make the new size the (page size / 2^n) where n
is chosen appropriately. Then we we store the value in the page we
simply use n bits of the hash to determine which sub-piece of the
page to use to actually store the value. We may need to add a multiplier
to adjust the decision to split the page based on the mini-page. This
should allow us to much more densely pack the pages and approach the
1 item per bucket. This could easily shrink the size of the index by
a factor of 2^n. Let me know what you think.

Regards,
Ken

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
 Kenneth Marshall wrote:

 I understand that a hash value is a many-to-one mapping. That is the
 point of the flag in the index. The flag means that there is only one
 item in the heap corresponding to that hash value. In this case we
 know that the value in the heap is the correct one and a possibly
 very expensive string comparison can be skipped. Given that the hash
 function is doing its job, almost every string comparison can be skipped.
 How long would it take to compare 1-32K of data? How much CPU usage?
 With this field in place, you only need to check tuple visibility.
  

 How likely is it that you will get a hash collision, two strings that are 
 different that will hash to the same value?  To avoid this requires a very 
 large hash key (128 bits, minimum)- otherwise you get into birthday attack 
 problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
 strings in a collection of 100,000 will hash to the same value.  With a 
 64-bit hash, the likelyhood is greater than 50% that two strings in a 
 collection of 10 billion will has to same value.  10 billion is a large 
 number, but not an unreasonable number, of strings to want to put into a 
 hash table- and it's exactly this case where the O(1) cost of hashtables 
 starts being a real win.

 Brian

Continuing this train of thought While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use of the hash index
in non-lossy mode albeit with a slight increase in complexity.

Ken

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote:

Continuing this train of thought While it would make sense for larger
keys to store the hash in the index, if the key is smaller, particularly
if it is of fixed size, it would make sense to store the key in the index
instead. This would have the benefit of allowing use of the hash index
in non-lossy mode albeit with a slight increase in complexity.
  
I suspect there is no value in designing a hash implementation to work 
well for a context where a btree index would already perform equally well.


If there are too few hash buckets, performance is not O(1). For a hash 
index to function better than btree, I believe focus should be spent on 
the O(1) case, which means ensuring that enough hash buckets are used to 
provide O(1).


All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.

In the optimum O(1) scenario, each existing key will map to a hash 
bucket that contains ~1 entry. For this case, there is no value to 
having the key stored in the index row, as 3) Tuple visibility, will 
still require access to the table row. In this optimum scenario, I do 
not believe anything of value is saved by storing the key in the index 
row. The loss, however, is that the hash index data structures become 
more complex, and would likely require support for variable length data. 
The resulting increase in hash index size and code complexity would 
reduce performance.


Just an opinion.

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Kenneth Marshall
On Sat, Sep 08, 2007 at 05:14:09PM -0400, Mark Mielke wrote:
 Kenneth Marshall wrote:
 Continuing this train of thought While it would make sense for larger
 keys to store the hash in the index, if the key is smaller, particularly
 if it is of fixed size, it would make sense to store the key in the index
 instead. This would have the benefit of allowing use of the hash index
 in non-lossy mode albeit with a slight increase in complexity.
   
 I suspect there is no value in designing a hash implementation to work well 
 for a context where a btree index would already perform equally well.

 If there are too few hash buckets, performance is not O(1). For a hash 
 index to function better than btree, I believe focus should be spent on the 
 O(1) case, which means ensuring that enough hash buckets are used to 
 provide O(1).

 All of these must match: 1) Hash value, 2) Key value, 3) Tuple visibility.

 In the optimum O(1) scenario, each existing key will map to a hash bucket 
 that contains ~1 entry. For this case, there is no value to having the key 
 stored in the index row, as 3) Tuple visibility, will still require access 
 to the table row. In this optimum scenario, I do not believe anything of 
 value is saved by storing the key in the index row. The loss, however, is 
 that the hash index data structures become more complex, and would likely 
 require support for variable length data. The resulting increase in hash 
 index size and code complexity would reduce performance.

 Just an opinion.

 Cheers,
 mark


I agree that we should focus on the O(1) area. The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing. It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency. Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.

Regards,
Ken

 -- 
 Mark Mielke [EMAIL PROTECTED]


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-09-08 Thread Mark Mielke

Kenneth Marshall wrote:

The value of storing the
actual value, possibly as an option, is that the key check can be done
in the index without requiring a heap lookup to check the actual value
which would be a benefit for a unique index. I agree that supporting
variable length data would complicate the index and reduce performance.
I am not willing to assume that ~1 entry per hash bucket is necessarily
what we want, at least without some testing.
I think that if the case of 1 entry per hash becomes common enough to 
be significant, and the key is stored in the hash, that a btree will 
perform equal or better, and there is no point in pursuing  such a hash 
index model. This is where we are today.



It seems reasonable that
with the performance differences between L1/L2/L3 cache, main memory,
and the disk subsystem a higher load factor would result in better
overall system performance by reducing cache line misses and improving
hardware pre-fetch efficiency.
If the key is stored, all of these factors likely favor the btree format 
over the hash format. Again, this is where we are today.



Along with the hypothetical performance
wins, the hash index space efficiency would be improved by a similar
factor. Obviously, all of these ideas would need to be tested in
various workload environments. In the large index arena, 10^6 to 10^9
keys and more, space efficiency will help keep the index manageable
in todays system memories.
  
Space efficiency is provided by not storing the key, nor the header data 
required (length prefix?).

Please keep the ideas and comments coming. I am certain that a synthesis
of them will provide an implementation with the performance characteristics
that we are seeking.
  

The subject interests me. :-)

Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Neil Conway
On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
 2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as simple as possible and still provide
the desired performance. Several hash index suggestions deal with
changing the layout of the keys on a page to improve lookup
performance, including reducing the bucket size to a fraction of
a page or only storing the hash value on the page, instead of
the index value itself.

You might find this patch useful:

http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

It implements the just store the hash in the index idea; it also sorts
the entries in a bucket by the hash value, which allows binary search to
be used to locate candidate matches.

I was surprised that this didn't result in a performance improvement for
the benchmarks that I ran, but I never got around to investigating
further (either running more benchmarks or checking whether there was a
bug in the implementation).

Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd like.

-Neil



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 09:50:07AM -0400, Mark Mielke wrote:
 Kenneth Marshall wrote:
 On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
   
 You might find this patch useful:

 http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
 ...

 Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
 it up to HEAD if you'd like.
 
 This is a great starting point. I would appreciate it if you have the
 time and could make it apply cleanly to HEAD. I remember when you first
 posted it but had forgotten, probably because of the lack-luster results.
 Just a quick glance at the patch and from what I can tell, tagging the
 index as lossy causes a lot more work to be done than should be needed
 in theory. Currently the index-scan machinery will recheck the value
 against the original value for lossy indexes. However, given that we
 are using a good hash function with a low chance of collision, if we
 mark the unique items in the index then they do not actually have to
 be rechecked during the scan. Do you have any suggestions for implementing
 that optimization or is there any option to tell the scan machinery to
 only re-check a certain list of tuples? Thank you again for pointing
 this patch out and please let me know when you have a version against
 HEAD.
   
 What do you mean by mark the unique items in the index then they do not 
 actually have to be rechecked during the scan. Even if there is a unique 
 hash value mapping to a unique key, there is no guarantee that a new value 
 won't result in that same hash value. Such is the nature of hashes. A hash 
 key map does not mean a value match. The value must be checked. The 
 opposite, however, may be true. If the hash key is not found, then we know 
 the row for the value does not exist.

 Have you measured the performance of re-checking? I have always assumed the 
 performance of re-checking was near free when compared to the cost of 
 looking up the tuples in the table to determine whether or not they are 
 live. This is why I have not been upset that bitmap index scans often 
 re-check.

 Cheers,
 mark

I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
very expensive string comparison can be skipped. Given that the hash
function is doing its job, almost every string comparison can be skipped.
How long would it take to compare 1-32K of data? How much CPU usage?
With this field in place, you only need to check tuple visibility.

Regards,
Ken

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke

Kenneth Marshall wrote:

On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
  

You might find this patch useful:

http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
...

Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
it up to HEAD if you'd like.


This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD. I remember when you first
posted it but had forgotten, probably because of the lack-luster results.
Just a quick glance at the patch and from what I can tell, tagging the
index as lossy causes a lot more work to be done than should be needed
in theory. Currently the index-scan machinery will recheck the value
against the original value for lossy indexes. However, given that we
are using a good hash function with a low chance of collision, if we
mark the unique items in the index then they do not actually have to
be rechecked during the scan. Do you have any suggestions for implementing
that optimization or is there any option to tell the scan machinery to
only re-check a certain list of tuples? Thank you again for pointing
this patch out and please let me know when you have a version against
HEAD.
  
What do you mean by mark the unique items in the index then they do not 
actually have to be rechecked during the scan. Even if there is a 
unique hash value mapping to a unique key, there is no guarantee that a 
new value won't result in that same hash value. Such is the nature of 
hashes. A hash key map does not mean a value match. The value must be 
checked. The opposite, however, may be true. If the hash key is not 
found, then we know the row for the value does not exist.


Have you measured the performance of re-checking? I have always assumed 
the performance of re-checking was near free when compared to the cost 
of looking up the tuples in the table to determine whether or not they 
are live. This is why I have not been upset that bitmap index scans 
often re-check.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 12:55:37PM +0100, Heikki Linnakangas wrote:
 Neil Conway wrote:
  You might find this patch useful:
  
  http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
 
 Oh, I had forgot about that.
 
  It implements the just store the hash in the index idea; it also sorts
  the entries in a bucket by the hash value, which allows binary search to
  be used to locate candidate matches.
  
  I was surprised that this didn't result in a performance improvement for
  the benchmarks that I ran, but I never got around to investigating
  further (either running more benchmarks or checking whether there was a
  bug in the implementation).
 
 You did get a smaller index, which has cache benefits with larger
 tables. You didn't compare the size comparison against b-tree, but a
 smaller index is a good thing.
 
 I think you left some money on the table on that front. Since all the
 HashItems are fixed size, you can get rid of the line pointers
 altogether. Given that a sizeof(HashItemData) is 8 bytes, and a line
 pointer is 4 bytes, that should give a further 1/3 reduction in index
 size. If you're willing to give up on the alignment of HashItemData, you
 can get it down to 6 bytes.
 
 Even from a CPU point of view, as the table becomes bigger and the
 b-tree becomes deeper, the difference between O(1) and O(log n) lookup
 starts to become more significant.
 
If you use the size values from my initial tests, the hash index is
down to 3X the btree size instead of 5X. If we can make these additional
changes and add a unique flags to the index entry, we would have a much
smaller index. I had not thought of it at the time, but the addition of
the unique/sole item flag would allow the use of the hash index to
support unique indexes and be used for primary keys. In some usage
cases, the O(1) would be very advantageous.

Regards,
Ken

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
 On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
  2. Evaluate the performance of different hash index implementations
 and/or changes to the current implementation. My current plan is
 to keep the implementation as simple as possible and still provide
 the desired performance. Several hash index suggestions deal with
 changing the layout of the keys on a page to improve lookup
 performance, including reducing the bucket size to a fraction of
 a page or only storing the hash value on the page, instead of
 the index value itself.
 
 You might find this patch useful:
 
 http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
 
 It implements the just store the hash in the index idea; it also sorts
 the entries in a bucket by the hash value, which allows binary search to
 be used to locate candidate matches.
 
 I was surprised that this didn't result in a performance improvement for
 the benchmarks that I ran, but I never got around to investigating
 further (either running more benchmarks or checking whether there was a
 bug in the implementation).
 
 Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
 it up to HEAD if you'd like.
 
 -Neil
 
This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD. I remember when you first
posted it but had forgotten, probably because of the lack-luster results.
Just a quick glance at the patch and from what I can tell, tagging the
index as lossy causes a lot more work to be done than should be needed
in theory. Currently the index-scan machinery will recheck the value
against the original value for lossy indexes. However, given that we
are using a good hash function with a low chance of collision, if we
mark the unique items in the index then they do not actually have to
be rechecked during the scan. Do you have any suggestions for implementing
that optimization or is there any option to tell the scan machinery to
only re-check a certain list of tuples? Thank you again for pointing
this patch out and please let me know when you have a version against
HEAD.

Regards,
Ken
 
 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Heikki Linnakangas
Neil Conway wrote:
 You might find this patch useful:
 
 http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php

Oh, I had forgot about that.

 It implements the just store the hash in the index idea; it also sorts
 the entries in a bucket by the hash value, which allows binary search to
 be used to locate candidate matches.
 
 I was surprised that this didn't result in a performance improvement for
 the benchmarks that I ran, but I never got around to investigating
 further (either running more benchmarks or checking whether there was a
 bug in the implementation).

You did get a smaller index, which has cache benefits with larger
tables. You didn't compare the size comparison against b-tree, but a
smaller index is a good thing.

I think you left some money on the table on that front. Since all the
HashItems are fixed size, you can get rid of the line pointers
altogether. Given that a sizeof(HashItemData) is 8 bytes, and a line
pointer is 4 bytes, that should give a further 1/3 reduction in index
size. If you're willing to give up on the alignment of HashItemData, you
can get it down to 6 bytes.

Even from a CPU point of view, as the table becomes bigger and the
b-tree becomes deeper, the difference between O(1) and O(log n) lookup
starts to become more significant.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
 On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
  2. Evaluate the performance of different hash index implementations
 and/or changes to the current implementation. My current plan is
 to keep the implementation as simple as possible and still provide
 the desired performance. Several hash index suggestions deal with
 changing the layout of the keys on a page to improve lookup
 performance, including reducing the bucket size to a fraction of
 a page or only storing the hash value on the page, instead of
 the index value itself.
 
 You might find this patch useful:
 
 http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
 
 It implements the just store the hash in the index idea; it also sorts
 the entries in a bucket by the hash value, which allows binary search to
 be used to locate candidate matches.
 
 I was surprised that this didn't result in a performance improvement for
 the benchmarks that I ran, but I never got around to investigating
 further (either running more benchmarks or checking whether there was a
 bug in the implementation).
 
 Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
 it up to HEAD if you'd like.
 
 -Neil
 
I have another question. Did the scan code at this time use the
heap-order scanning? Could that have had an impact on the patch
performance?

Ken

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Mark Mielke

Kenneth Marshall wrote:

I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
very expensive string comparison can be skipped. Given that the hash
function is doing its job, almost every string comparison can be skipped.
How long would it take to compare 1-32K of data? How much CPU usage?
With this field in place, you only need to check tuple visibility
The value comparison cannot be skipped. I do not think you understand 
the many-to-one mapping in its entirety.


Example:

   Table has: a(1), b(2), c(3)
   Index has: 1 = 1 (unique), 2 = 2 (unique), 3 = 3 (unique)

Query:

   select * from table where key = 'z';

If 'z' hashes to '3' (completely possible), then the index record 3 
points to tuple 3, and it exists. Only, it doesn't because 'a'  'z'. 
You MUST check the value.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:36:41AM -0400, Brian Hurt wrote:
 Kenneth Marshall wrote:

 I understand that a hash value is a many-to-one mapping. That is the
 point of the flag in the index. The flag means that there is only one
 item in the heap corresponding to that hash value. In this case we
 know that the value in the heap is the correct one and a possibly
 very expensive string comparison can be skipped. Given that the hash
 function is doing its job, almost every string comparison can be skipped.
 How long would it take to compare 1-32K of data? How much CPU usage?
 With this field in place, you only need to check tuple visibility.
  

 How likely is it that you will get a hash collision, two strings that are 
 different that will hash to the same value?  To avoid this requires a very 
 large hash key (128 bits, minimum)- otherwise you get into birthday attack 
 problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
 strings in a collection of 100,000 will hash to the same value.  With a 
 64-bit hash, the likelyhood is greater than 50% that two strings in a 
 collection of 10 billion will has to same value.  10 billion is a large 
 number, but not an unreasonable number, of strings to want to put into a 
 hash table- and it's exactly this case where the O(1) cost of hashtables 
 starts being a real win.

 Brian

Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.

Regards,
Ken

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote:
 Kenneth Marshall wrote:

  
 How likely is it that you will get a hash collision, two strings that are 
 different that will hash to the same value?  To avoid this requires a 
 very large hash key (128 bits, minimum)- otherwise you get into birthday 
 attack problems.  With a 32-bit hash, the likelyhood is greater than 50% 
 that two strings in a collection of 100,000 will hash to the same value.  
 With a 64-bit hash, the likelyhood is greater than 50% that two strings 
 in a collection of 10 billion will has to same value.  10 billion is a 
 large number, but not an unreasonable number, of strings to want to put 
 into a hash table- and it's exactly this case where the O(1) cost of 
 hashtables starts being a real win.

 Brian


 Yes, there is a non-negligible chance of collision (In a DB is there
 any chance that is non-negligible? :) ) and the values must be checked
 against the actual. The win is the collapse of the index size and only
 needed to check a small fraction of the actual tuples.


  

 Ah, OK- I misunderstood you.  I thought you were saying that the hash 
 values would need to be unique, and you wouldn't check the original values 
 at all.  My bad.

 Brian

No, you were correct. I misstated originally and you and Mark both pointed
out my mistake.

Regards,
Ken

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Kenneth Marshall
On Fri, Sep 07, 2007 at 10:30:30AM -0400, Mark Mielke wrote:
 Kenneth Marshall wrote:
 I understand that a hash value is a many-to-one mapping. That is the
 point of the flag in the index. The flag means that there is only one
 item in the heap corresponding to that hash value. In this case we
 know that the value in the heap is the correct one and a possibly
 very expensive string comparison can be skipped. Given that the hash
 function is doing its job, almost every string comparison can be skipped.
 How long would it take to compare 1-32K of data? How much CPU usage?
 With this field in place, you only need to check tuple visibility
 The value comparison cannot be skipped. I do not think you understand the 
 many-to-one mapping in its entirety.

 Example:

Table has: a(1), b(2), c(3)
Index has: 1 = 1 (unique), 2 = 2 (unique), 3 = 3 (unique)

 Query:

select * from table where key = 'z';

 If 'z' hashes to '3' (completely possible), then the index record 3 points 
 to tuple 3, and it exists. Only, it doesn't because 'a'  'z'. You MUST 
 check the value.

 Cheers,
 mark

Yes, you are completely correct. Thank you for the reminder.

Regards,
Ken

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote:



 

How likely is it that you will get a hash collision, two strings that are 
different that will hash to the same value?  To avoid this requires a very 
large hash key (128 bits, minimum)- otherwise you get into birthday attack 
problems.  With a 32-bit hash, the likelyhood is greater than 50% that two 
strings in a collection of 100,000 will hash to the same value.  With a 
64-bit hash, the likelyhood is greater than 50% that two strings in a 
collection of 10 billion will has to same value.  10 billion is a large 
number, but not an unreasonable number, of strings to want to put into a 
hash table- and it's exactly this case where the O(1) cost of hashtables 
starts being a real win.


Brian

   


Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual. The win is the collapse of the index size and only
needed to check a small fraction of the actual tuples.


 



Ah, OK- I misunderstood you.  I thought you were saying that the hash 
values would need to be unique, and you wouldn't check the original 
values at all.  My bad.


Brian



Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Brian Hurt

Kenneth Marshall wrote:


I understand that a hash value is a many-to-one mapping. That is the
point of the flag in the index. The flag means that there is only one
item in the heap corresponding to that hash value. In this case we
know that the value in the heap is the correct one and a possibly
very expensive string comparison can be skipped. Given that the hash
function is doing its job, almost every string comparison can be skipped.
How long would it take to compare 1-32K of data? How much CPU usage?
With this field in place, you only need to check tuple visibility.
 



How likely is it that you will get a hash collision, two strings that 
are different that will hash to the same value?  To avoid this requires 
a very large hash key (128 bits, minimum)- otherwise you get into 
birthday attack problems.  With a 32-bit hash, the likelyhood is greater 
than 50% that two strings in a collection of 100,000 will hash to the 
same value.  With a 64-bit hash, the likelyhood is greater than 50% that 
two strings in a collection of 10 billion will has to same value.  10 
billion is a large number, but not an unreasonable number, of strings to 
want to put into a hash table- and it's exactly this case where the O(1) 
cost of hashtables starts being a real win.


Brian


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Hash index todo list item

2007-09-07 Thread Martijn van Oosterhout
On Thu, Sep 06, 2007 at 01:08:59PM -0500, Kenneth Marshall wrote:
 Since we already have to check the actual tuple values for any index
 lookup in postgresql, we could only store the full hash value and the
 corresponding TIDs in the bucket. Then when we lookup an item by
 calculating its hash, if the exact hash is not present in the bucket,
 then we know that the item is not in the index.

Sounds like you'd be returning a bitmap for use with a bitmap scan.
That's a different take on other suggestions I've heard and would allow
a hash index to have an almost unlimited key size yet flexible
matching... (combined with other index, or even just the same index).

Neat.

Have a nice day,
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
 Gregory Stark [EMAIL PROTECTED] writes:
  Kenneth Marshall [EMAIL PROTECTED] writes:
  - What about multi-column indexes? The current implementation
  only supports 1 column.
 
  That seems kind of weird. It seems obvious that you mix the three hashes
  together which reduces it to the solved problem. 
 
 No, because part of the deal is that you can do lookups using only the
 leading index columns.  At least, all the existing multicolumn index
 types can do that.

One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash. 

If you do it smartly you can use any column for index lookups, not just
the leading one.

   regards, tom lane
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
 No, because part of the deal is that you can do lookups using only the
 leading index columns.  At least, all the existing multicolumn index
 types can do that.

 One approahc is not to mix hashes, but to partition the hash, so that
 each column gets its N bits in the hash. 

How does that help?  You still need all the keys to find out which
bucket to look in.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Hannu Krosing wrote:

One approahc is not to mix hashes, but to partition the hash, so that
each column gets its N bits in the hash. 
  

How does that help?  You still need all the keys to find out which
bucket to look in.



no. you need to look at only the buckets where that part of hash matches

say you allocate bits 4-7 for column 2 and then need to look up column 2
value with hash 3 . here you need to look at only buckets N*16 + 3, that
is, you need to examine only each 16th bucket

  


I don't like the truncating hash suggestion because it limits the 
ability of a hash code to uniquely identify a key.


If a user requires the ability to search on both (column1) and (column1, 
column2), they can create two hash indexes and the planner can decide 
which to use.
Or, they can use a btree. I think hash has a subset of uses where it 
would be a significant gain, and focus should be spent on this subset.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Michael Glaesemann


On Sep 6, 2007, at 10:53 , Mark Mielke wrote:

I don't like the truncating hash suggestion because it limits the  
ability of a hash code to uniquely identify a key.


AIUI, a hash can't be used as a unique identifier: it always needs to  
be rechecked due to the chance of collisions. There might be other  
issues with truncation, but preventing hashes from being unique isn't  
one of them.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-09-06 kell 09:38, kirjutas Tom Lane:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Ühel kenal päeval, E, 2007-09-03 kell 19:55, kirjutas Tom Lane:
  No, because part of the deal is that you can do lookups using only the
  leading index columns.  At least, all the existing multicolumn index
  types can do that.
 
  One approahc is not to mix hashes, but to partition the hash, so that
  each column gets its N bits in the hash. 
 
 How does that help?  You still need all the keys to find out which
 bucket to look in.

no. you need to look at only the buckets where that part of hash matches

say you allocate bits 4-7 for column 2 and then need to look up column 2
value with hash 3 . here you need to look at only buckets N*16 + 3, that
is, you need to examine only each 16th bucket

-
Hannu


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Kenneth Marshall
On Thu, Sep 06, 2007 at 11:53:45AM -0400, Mark Mielke wrote:
 Hannu Krosing wrote:
 One approahc is not to mix hashes, but to partition the hash, so that
 each column gets its N bits in the hash.   
 How does that help?  You still need all the keys to find out which
 bucket to look in.
 

 no. you need to look at only the buckets where that part of hash matches

 say you allocate bits 4-7 for column 2 and then need to look up column 2
 value with hash 3 . here you need to look at only buckets N*16 + 3, that
 is, you need to examine only each 16th bucket

   

 I don't like the truncating hash suggestion because it limits the ability 
 of a hash code to uniquely identify a key.

 If a user requires the ability to search on both (column1) and (column1, 
 column2), they can create two hash indexes and the planner can decide which 
 to use.
 Or, they can use a btree. I think hash has a subset of uses where it would 
 be a significant gain, and focus should be spent on this subset.

 Cheers,
 mark

I agree that we should focus primarily on the subset of uses for hash
indexes where there would be a significant gain. I do think that being
able to use a single O(1) hash lookup against all the values specified
in a pseudo-multi-column index could be very beneficial in reducing
access time and I/O. 

Since we already have to check the actual tuple values for any index
lookup in postgresql, we could only store the full hash value and the
corresponding TIDs in the bucket. Then when we lookup an item by
calculating its hash, if the exact hash is not present in the bucket,
then we know that the item is not in the index. If the value exists,
then we would check the heap tuples before returning the results. Thus
a negative lookup only needs to check the index and if the hash function
is good there will be optimally only 1 possibly valid heap tuple if
there is a match. One very big win for this change is to allow a much
smaller index size (hash value + relevant TIDs) and the large column
values are only stored in the actual data tuples.

Regards,
Ken
 -- 
 Mark Mielke [EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-06 Thread Mark Mielke

Michael Glaesemann wrote:

On Sep 6, 2007, at 10:53 , Mark Mielke wrote:
I don't like the truncating hash suggestion because it limits the 
ability of a hash code to uniquely identify a key.
AIUI, a hash can't be used as a unique identifier: it always needs to 
be rechecked due to the chance of collisions. There might be other 
issues with truncation, but preventing hashes from being unique isn't 
one of them.


Of course - that's why I used the word limit.

Hash works best, when the key is unique, however. A 32-bit hash will be 
many powers of 2 more unique than a 8-bit hash.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Hash index todo list item

2007-09-05 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote:
 Dear PostgreSQL Hackers:
 
 After following the hackers mailing list for quite a while,
 I am going to start investigating what will need to be done
 to improve hash index performance. Below are the pieces of
 this project that I am currently considering:
 
 1. Characterize the current hash index implementation against
the BTree index, with a focus on space utilization and
lookup performance against a collection of test data. This
will give a baseline performance test to evaluate the impact
of changes. I initially do not plan to bench the hash creation
process since my initial focus will be on lookup performance.
 

Here are very basic results for a table with 1.6m entries:

postgres=# CREATE TABLE dict (word varchar(100));
CREATE TABLE

postgres=# COPY dict FROM '/tmp/words';
COPY 1648379
postgres=# select count(*) from dict;
  count  
-
 1648379
(1 row)

Time: 11187.418 ms
postgres=# select count(*) from dict;
  count  
-
 1648379
(1 row)

Time: 6040.912 ms
postgres=# CREATE INDEX wordhash ON dict USING hash (word);
CREATE INDEX
Time: 11108707.160 ms

postgres=# select * from dict where word = 'avatar';
  word  

 avatar
(1 row)

Time: 79.823 ms
postgres=# select * from dict where word = 'zebra';
 word  
---
 zebra
(1 row)

Time: 9.864 ms
postgres=# select * from dict where word = 'turkey'; 
  word  

 turkey
(1 row)

Time: 18.418 ms
Time: 1.045 ms
Time: 1.257 ms
Time: 1.080 ms

postgres=# CREATE INDEX wordbtree ON dict USING btree (word);
CREATE INDEX

Time: 25438.884 ms

postgres=# select * from dict where word = 'avatar';
  word  

 avatar
(1 row)

Time: 13.400 ms
postgres=# select * from dict where word = 'zebra';
 word  
---
 zebra
(1 row)

Time: 1.173 ms
postgres=# select * from dict where word = 'turkey';
  word  

 turkey
(1 row)

Time: 1.186 ms
Time: 1.103 ms
Time: 1.099 ms
Time: 1.108 ms

--
Size of table =   87556096

Size of hash index = 268451840
Size of btree index = 53510144

From my very small sample on an unloaded machine, a hash index lookup
took the least amount of time. It had a much larger initial time which
could be attributable to cache population effects. The size is 5X that
of the Btree index. I will continue to improve the test suite as more
granularity is needed. If anyone has a good data generator, please let
me know. Otherwise I will just roll my own.

Regards,
Ken

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Simon Riggs
On Sun, 2007-09-02 at 13:04 -0500, Kenneth Marshall wrote:
 Dear PostgreSQL Hackers:
 
 After following the hackers mailing list for quite a while,
 I am going to start investigating what will need to be done
 to improve hash index performance. Below are the pieces of
 this project that I am currently considering:
 
 1. Characterize the current hash index implementation against
the BTree index, with a focus on space utilization and
lookup performance against a collection of test data. This
will give a baseline performance test to evaluate the impact
of changes. I initially do not plan to bench the hash creation
process since my initial focus will be on lookup performance.
 
 2. Evaluate the performance of different hash index implementations
and/or changes to the current implementation. My current plan is
to keep the implementation as simple as possible and still provide
the desired performance. Several hash index suggestions deal with
changing the layout of the keys on a page to improve lookup
performance, including reducing the bucket size to a fraction of
a page or only storing the hash value on the page, instead of
the index value itself. My goal in this phase is to produce one
or more versions with better performance than the current BTree.

 3. Look at build time and concurrency issues with the addition of
some additional tests to the test bed. (1)
 
 4. Repeat as needed.
 
 This is the rough plan. Does anyone see anything critical that
 is missing at this point? Please send me any suggestions for test
 data and various performance test ideas, since I will be working
 on that first.

Sounds good.

I'd be particularly interested in large indexes, say ~ 0.5 - 2GB. There
are likely to be various effects apparent as the indexes grow. It would
be too easy to do all the tests with smaller indexes and miss things.

Other factors are:
- volatility
- concurrency

My general experience is that hash-based indexes are better when the
range of inputs is relatively well-known, allowing a fast lookup. If
that is the only benefit of hash indexes, a flexible hashing scheme may
simply weaken the benefit-case for using them. If that's true, should
the index build process examine the key values in the data to determine
the best parameters to use? Kind of ANALYZE before build.

My current feeling is that they ought to be very good at handling
read-mostly situations such as privilege checking or UPDATE-intensive
situations such as Customer-Current-Credit tracking, when the number of
customers is large.

It might also be worth looking at lossy hash indexes, i.e. the index
stores only the block numbers. That would need to be part of the
discussion around how lossy we will allow indexes to be.

We currently have two kinds of full text index with different
concurrency use cases, so it should be acceptable to have hash indexes
have a clear benefit in one use case but a clear loss in another.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
  ... This is the rough plan. Does anyone see anything critical that
  is missing at this point?
 
 Sounds pretty good.  Let me brain-dump one item on you: one thing that
 hash currently has over btree is the ability to handle index items up
 to a full page.  Now, if you go with a scheme that only stores hash
 codes and not the underlying data, you can not only handle that but
 improve on it; but if you reduce the bucket size and don't remove the
 data, it'd be a step backward.  The idea I had about dealing with that
 was to only reduce the size of primary buckets --- if it's necessary to
 add overflow space to a bucket, the overflow units are still full pages.
 So an index tuple up to a page in size can always be accommodated by
 adding an overflow page to the bucket.
 
 Just a thought, but AFAIR it's not in the archives anywhere.
 
   regards, tom lane
 
Tom,

Thank you for the input. I agree that keeping the ability to accomodate
an index tuple up to a page is size worth keeping. I think that your
goal in reducing the bucket size is to improve the packing efficiency
of the index. Since the on disk page size remains the same, it may be
possible to use a different structure overlayed on the current bucket
size and still improve the packing efficiency of the index. After some
more mulling, here are some further thoughts on the improved hash table
implementation:

- Hash lookup is O(1) while btree is O(logN). Is there a value
  in optimizing the NOT case, i.e. the entry is not in the table?

- Space versus performance trade-off. This may tie into cache
  efficiency and use of L2/L3, shared buffers, main memory.
  Denser layouts with a higher load facter may be a bit slower
  in lookups but play much nicer in a multi-user system. Look
  at the possibility of a lossy mapping?

- Build time versus update time. How does concurrency enter into
  the picture regarding simultaneous updates, inserts, deletes,
  and lookups?

- Could a hybrid structure with some type of prefix compression
  give a more efficient layout and possibly improve performance?

- Index larger fields. Btree is limited to blocksize/3, the
  current hash implementation can go up to a full block.

- What about multi-column indexes? The current implementation
  only supports 1 column.

More ideas are welcome and I will add them to the list for
investigation.

Regards,
Ken

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 10:33:54AM +0100, Simon Riggs wrote:
  
  This is the rough plan. Does anyone see anything critical that
  is missing at this point? Please send me any suggestions for test
  data and various performance test ideas, since I will be working
  on that first.
 
 Sounds good.
 
 I'd be particularly interested in large indexes, say ~ 0.5 - 2GB. There
 are likely to be various effects apparent as the indexes grow. It would
 be too easy to do all the tests with smaller indexes and miss things.
 
 Other factors are:
 - volatility
 - concurrency
 
 My general experience is that hash-based indexes are better when the
 range of inputs is relatively well-known, allowing a fast lookup. If
 that is the only benefit of hash indexes, a flexible hashing scheme may
 simply weaken the benefit-case for using them. If that's true, should
 the index build process examine the key values in the data to determine
 the best parameters to use? Kind of ANALYZE before build.
 
 My current feeling is that they ought to be very good at handling
 read-mostly situations such as privilege checking or UPDATE-intensive
 situations such as Customer-Current-Credit tracking, when the number of
 customers is large.
 
 It might also be worth looking at lossy hash indexes, i.e. the index
 stores only the block numbers. That would need to be part of the
 discussion around how lossy we will allow indexes to be.
 
 We currently have two kinds of full text index with different
 concurrency use cases, so it should be acceptable to have hash indexes
 have a clear benefit in one use case but a clear loss in another.
 
 -- 
   Simon Riggs
   2ndQuadrant  http://www.2ndQuadrant.com
 

Simon,

Thank you for your input. I would like to include some tests with large
indexes too. Do you have any ideas for a test corpus or should we try
and generate the test data programatically? Many people in the literature
of text retrieval use the TREC* data for at least some of their runs. I
am going to check at work to see if the campus has access to the data,
otherwise I will do some web crawling to generate some sample data. I
have just posted a reply to Tom Lane with some further ideas for consideration
in the new hash index support. Like you, I suspect that volatile data that
results in many index changes may not work well with hash indexes, in general.
PostgreSQL has the additional burden of needing to access both the index and
the data heap. Obviously, the less I/O that is needed the better the
performance is likely to be. The new HOT functionality plus clustering the
table data on the hash index would effectively organize the table into the
hash buckets which could help with reducing both the churn in the index
as well as in the tables.

Regards,
Ken

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Gregory Stark

Kenneth Marshall [EMAIL PROTECTED] writes:

 On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
 Kenneth Marshall [EMAIL PROTECTED] writes:
  ... This is the rough plan. Does anyone see anything critical that
  is missing at this point?
 
 Sounds pretty good.  Let me brain-dump one item on you: one thing that
 hash currently has over btree is the ability to handle index items up
 to a full page.  Now, if you go with a scheme that only stores hash
 codes and not the underlying data, you can not only handle that but
 improve on it; 

I think that would be a big selling point for hash indexes. It would let you
index even toasted data which are larger than a page. I'm not sure whether you
can make it work for unique indexes though. But for non-unique indexes I think
it would be a solid win and mean you cover a set of use cases quite distinct
from btree indexes.

 - Hash lookup is O(1) while btree is O(logN). 

That's not really true. There's a tradeoff between insertion time and lookup
time. In order to get O(1) lookups you need to work pretty hard to maintain
the hash table including spending a lot of time reorganizing it when you grow
it. If you don't want to spend the time on inserts then you end up with
buckets and the hash table is basically just a linear speedup to whatever
algorithm you use to scan the buckets.


 - What about multi-column indexes? The current implementation
   only supports 1 column.

That seems kind of weird. It seems obvious that you mix the three hashes
together which reduces it to the solved problem. 

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Kenneth Marshall [EMAIL PROTECTED] writes:
 - What about multi-column indexes? The current implementation
 only supports 1 column.

 That seems kind of weird. It seems obvious that you mix the three hashes
 together which reduces it to the solved problem. 

No, because part of the deal is that you can do lookups using only the
leading index columns.  At least, all the existing multicolumn index
types can do that.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Gregory Stark [EMAIL PROTECTED] wrote:

 Kenneth Marshall [EMAIL PROTECTED] writes:

  On Sun, Sep 02, 2007 at 10:41:22PM -0400, Tom Lane wrote:
  Kenneth Marshall [EMAIL PROTECTED] writes:
   ... This is the rough plan. Does anyone see anything critical that
   is missing at this point?
 
  Sounds pretty good.  Let me brain-dump one item on you: one thing that
  hash currently has over btree is the ability to handle index items up
  to a full page.  Now, if you go with a scheme that only stores hash
  codes and not the underlying data, you can not only handle that but
  improve on it;

 I think that would be a big selling point for hash indexes. It would let you
 index even toasted data which are larger than a page. I'm not sure whether you
 can make it work for unique indexes though. But for non-unique indexes I think
 it would be a solid win and mean you cover a set of use cases quite distinct
 from btree indexes.

  - Hash lookup is O(1) while btree is O(logN).

 That's not really true. There's a tradeoff between insertion time and lookup
 time. In order to get O(1) lookups you need to work pretty hard to maintain
 the hash table including spending a lot of time reorganizing it when you grow
 it. If you don't want to spend the time on inserts then you end up with
 buckets and the hash table is basically just a linear speedup to whatever
 algorithm you use to scan the buckets.

These facts notwithstanding, average insert performance remains O(1)
if you grow the hash exponentially every time it needs to be grown.
Suppose, for example, that you use a power of 2 arrangement.  Then the
worst case scenario, right after a split, is that all of your keys had
to be inserted, all had to be moved once, half had to be moved twice,
a quarter 3 times, etc.  So the ratio of moves to keys is 1 + 1/2 +
1/4 + ... which is a well-known geometric series converging on 2.

True, when you cross the threshold a lot of work needs to be done.
Life would be simpler if you could just put up a lock while you split
the hash.  You can't do that for a busy transactional database though.
 But if you want to be clever about it, you build into your hash
implementation the intelligence to be able to have 1 or 2 hash
locations to search.  When they are both present, all inserts go into
one of them, all deletes and updates are performed against both.  Then
you're able to have a background job reorganize your hash while the
database continues to use it.

  - What about multi-column indexes? The current implementation
only supports 1 column.

 That seems kind of weird. It seems obvious that you mix the three hashes
 together which reduces it to the solved problem.

That raises a very random thought.  One of the nicer features of
Oracle is the ability to have function-based indexes.  So you could
index, say, trim(lower(person.name)).  There are a *lot* of practical
situations where that comes in handy.  The best workaround that I can
think of for not having that is to have a column defined to hold the
result of the function, maintain that column with a trigger, then
index that column.  Which works, but is inelegant.  (It also requires
storing completely redundant data.)

Is there any prospect of postgres aquiring that functionality?

Ben

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Kenneth Marshall
On Mon, Sep 03, 2007 at 05:20:34PM -0700, Ben Tilly wrote:
 
 That raises a very random thought.  One of the nicer features of
 Oracle is the ability to have function-based indexes.  So you could
 index, say, trim(lower(person.name)).  There are a *lot* of practical
 situations where that comes in handy.  The best workaround that I can
 think of for not having that is to have a column defined to hold the
 result of the function, maintain that column with a trigger, then
 index that column.  Which works, but is inelegant.  (It also requires
 storing completely redundant data.)
 
 Is there any prospect of postgres aquiring that functionality?
 
 Ben
 
I believe that PostgreSQL already supports functional indexes. In fact,
one suggestion to address the egregiously poor performance of the current
hash index was to replace it with a functional index.

Regards,
Ken

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Tom Lane
Ben Tilly [EMAIL PROTECTED] writes:
 That raises a very random thought.  One of the nicer features of
 Oracle is the ability to have function-based indexes.  So you could
 index, say, trim(lower(person.name)).

 Is there any prospect of postgres aquiring that functionality?

Uh, no, since it's already there; has been since Berkeley days ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Hash index todo list item

2007-09-03 Thread Ben Tilly
On 9/3/07, Tom Lane [EMAIL PROTECTED] wrote:
 Ben Tilly [EMAIL PROTECTED] writes:
  That raises a very random thought.  One of the nicer features of
  Oracle is the ability to have function-based indexes.  So you could
  index, say, trim(lower(person.name)).

  Is there any prospect of postgres aquiring that functionality?

 Uh, no, since it's already there; has been since Berkeley days ...

Nice!

I know of at least one DBA who is moving from Oracle to postgres who
will be *very* happy to hear that.

Ben

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Hash index todo list item

2007-09-02 Thread Kenneth Marshall
Dear PostgreSQL Hackers:

After following the hackers mailing list for quite a while,
I am going to start investigating what will need to be done
to improve hash index performance. Below are the pieces of
this project that I am currently considering:

1. Characterize the current hash index implementation against
   the BTree index, with a focus on space utilization and
   lookup performance against a collection of test data. This
   will give a baseline performance test to evaluate the impact
   of changes. I initially do not plan to bench the hash creation
   process since my initial focus will be on lookup performance.

2. Evaluate the performance of different hash index implementations
   and/or changes to the current implementation. My current plan is
   to keep the implementation as simple as possible and still provide
   the desired performance. Several hash index suggestions deal with
   changing the layout of the keys on a page to improve lookup
   performance, including reducing the bucket size to a fraction of
   a page or only storing the hash value on the page, instead of
   the index value itself. My goal in this phase is to produce one
   or more versions with better performance than the current BTree.
   
3. Look at build time and concurrency issues with the addition of
   some additional tests to the test bed. (1)

4. Repeat as needed.

This is the rough plan. Does anyone see anything critical that
is missing at this point? Please send me any suggestions for test
data and various performance test ideas, since I will be working
on that first.

Regards,
Ken Marshall 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Hash index todo list item

2007-09-02 Thread Tom Lane
Kenneth Marshall [EMAIL PROTECTED] writes:
 ... This is the rough plan. Does anyone see anything critical that
 is missing at this point?

Sounds pretty good.  Let me brain-dump one item on you: one thing that
hash currently has over btree is the ability to handle index items up
to a full page.  Now, if you go with a scheme that only stores hash
codes and not the underlying data, you can not only handle that but
improve on it; but if you reduce the bucket size and don't remove the
data, it'd be a step backward.  The idea I had about dealing with that
was to only reduce the size of primary buckets --- if it's necessary to
add overflow space to a bucket, the overflow units are still full pages.
So an index tuple up to a page in size can always be accommodated by
adding an overflow page to the bucket.

Just a thought, but AFAIR it's not in the archives anywhere.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings