Re: Next Steps with Hash Indexes

2021-10-27 Thread Amit Kapila
On Wed, Oct 27, 2021 at 4:55 PM Matthias van de Meent wrote: > > On Wed, 27 Oct 2021 at 12:58, Amit Kapila wrote: > > > > On Wed, Oct 27, 2021 at 2:32 AM Robert Haas wrote: > > > > > > On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs > > > wrote: > > > > With unique data, starting at 1 and

Re: Next Steps with Hash Indexes

2021-10-27 Thread Matthias van de Meent
On Wed, 27 Oct 2021 at 12:58, Amit Kapila wrote: > > On Wed, Oct 27, 2021 at 2:32 AM Robert Haas wrote: > > > > On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs > > wrote: > > > With unique data, starting at 1 and monotonically ascending, hash > > > indexes will grow very nicely from 0 to 10E7 rows

Re: Next Steps with Hash Indexes

2021-10-27 Thread Amit Kapila
On Wed, Oct 27, 2021 at 2:32 AM Robert Haas wrote: > > On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs > wrote: > > With unique data, starting at 1 and monotonically ascending, hash > > indexes will grow very nicely from 0 to 10E7 rows without causing >1 > > overflow block to be allocated for any

Re: Next Steps with Hash Indexes

2021-10-26 Thread Robert Haas
On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs wrote: > With unique data, starting at 1 and monotonically ascending, hash > indexes will grow very nicely from 0 to 10E7 rows without causing >1 > overflow block to be allocated for any bucket. This keeps the search > time for such data to just 2 blocks

Re: Next Steps with Hash Indexes

2021-10-25 Thread Amit Kapila
On Sun, Oct 17, 2021 at 4:30 PM Simon Riggs wrote: > > On Thu, 14 Oct 2021 at 16:09, Peter Geoghegan wrote: > > > > On Thu, Oct 14, 2021 at 12:48 AM Simon Riggs > > wrote: > > > The hash index tuples are 20-bytes each. If that were rounded up to > > > 8-byte alignment, then that would be 24

Re: Next Steps with Hash Indexes

2021-10-25 Thread Amit Kapila
On Wed, Oct 13, 2021 at 4:13 PM Simon Riggs wrote: > > On Tue, 5 Oct 2021 at 20:06, Tomas Vondra > wrote: > > > >>> I have presented a simple, almost trivial, patch to allow multi-col > > >>> hash indexes. It hashes the first column only, which can be a downside > > >>> in *some* cases. If that

Re: Next Steps with Hash Indexes

2021-10-17 Thread Simon Riggs
On Thu, 14 Oct 2021 at 16:09, Peter Geoghegan wrote: > > On Thu, Oct 14, 2021 at 12:48 AM Simon Riggs > wrote: > > The hash index tuples are 20-bytes each. If that were rounded up to > > 8-byte alignment, then that would be 24 bytes. > > > > Using pageinspect, the max(live_items) on any data

Re: Next Steps with Hash Indexes

2021-10-14 Thread Peter Geoghegan
On Thu, Oct 14, 2021 at 12:48 AM Simon Riggs wrote: > The hash index tuples are 20-bytes each. If that were rounded up to > 8-byte alignment, then that would be 24 bytes. > > Using pageinspect, the max(live_items) on any data page (bucket or > overflow) is 407 items, so they can't be 24 bytes

Re: Next Steps with Hash Indexes

2021-10-14 Thread Simon Riggs
On Wed, 13 Oct 2021 at 20:16, Peter Geoghegan wrote: > > On Wed, Oct 13, 2021 at 3:44 AM Simon Riggs > wrote: > > > IMO it'd be nice to show some numbers to support the claims that storing > > > the extra hashes and/or 8B hashes is not worth it ... > > > > Using an 8-byte hash is possible, but

Re: Next Steps with Hash Indexes

2021-10-13 Thread Peter Geoghegan
On Wed, Oct 13, 2021 at 12:15 PM Peter Geoghegan wrote: > Are you sure? I know that nbtree index tuples for a single-column int8 > index are exactly the same size as those from a single column int4 > index, due to alignment overhead at the tuple level. So my guess is > that hash index tuples

Re: Next Steps with Hash Indexes

2021-10-13 Thread Peter Geoghegan
On Wed, Oct 13, 2021 at 3:44 AM Simon Riggs wrote: > > IMO it'd be nice to show some numbers to support the claims that storing > > the extra hashes and/or 8B hashes is not worth it ... > > Using an 8-byte hash is possible, but only becomes effective when > 4-byte hash collisions get hard to

Re: Next Steps with Hash Indexes

2021-10-13 Thread Simon Riggs
On Tue, 5 Oct 2021 at 20:06, Tomas Vondra wrote: > >>> I have presented a simple, almost trivial, patch to allow multi-col > >>> hash indexes. It hashes the first column only, which can be a downside > >>> in *some* cases. If that is clearly documented, it would not cause > >>> many issues,

Re: Next Steps with Hash Indexes

2021-10-05 Thread Tomas Vondra
On 10/5/21 18:28, Simon Riggs wrote: On Tue, 5 Oct 2021 at 12:24, Dilip Kumar wrote: On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs wrote: On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote: On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad

Re: Next Steps with Hash Indexes

2021-10-05 Thread Simon Riggs
On Tue, 5 Oct 2021 at 12:24, Dilip Kumar wrote: > > On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs > wrote: > > > > On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: > > > > > > On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar > > > wrote: > > > > > > > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad

Re: Next Steps with Hash Indexes

2021-10-05 Thread Dilip Kumar
On Tue, Oct 5, 2021 at 4:08 PM Simon Riggs wrote: > > On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: > > > > On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote: > > > > > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro > > > wrote: > > > > > > > > And to get the multi-column hash

Re: Next Steps with Hash Indexes

2021-10-05 Thread Simon Riggs
On Fri, 13 Aug 2021 at 05:01, Amit Kapila wrote: > > On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote: > > > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila > > wrote: > > > The design of the patch has changed since the initial proposal. It > > > tries to perform unique inserts by holding a

Re: Next Steps with Hash Indexes

2021-10-05 Thread Simon Riggs
On Mon, 27 Sept 2021 at 06:52, Amit Kapila wrote: > > On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote: > > > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro > > wrote: > > > > > > And to get the multi-column hash index selected, we may set > > > enable_hashjoin =off, to avoid any

Re: Next Steps with Hash Indexes

2021-09-26 Thread Amit Kapila
On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar wrote: > > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro wrote: > > > > And to get the multi-column hash index selected, we may set > > enable_hashjoin =off, to avoid any condition become join condition, > > saw similar behaviors in other DBs as

Re: Next Steps with Hash Indexes

2021-09-22 Thread Dilip Kumar
On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro wrote: > > > > One more thing to consider is that it seems that the planner requires > > > a condition for the first column of an index before considering an > > > indexscan plan. See Tom's email [1] in this regard. I think it would > > > be

Re: Next Steps with Hash Indexes

2021-09-22 Thread Sadhuprasad Patro
> > One more thing to consider is that it seems that the planner requires > > a condition for the first column of an index before considering an > > indexscan plan. See Tom's email [1] in this regard. I think it would > > be better to see what kind of work is involved there if you want to > >

Re: Next Steps with Hash Indexes

2021-09-01 Thread Sadhuprasad Patro
> > That's a significant difference. Have you checked via perf or some > other way what causes this difference? I have seen that sometimes > single client performance with pgbench is not stable, so can you > please once check with 4 clients or so and possibly with a larger > dataset as well. I

Re: Next Steps with Hash Indexes

2021-08-28 Thread Amit Kapila
On Fri, Aug 27, 2021 at 4:27 PM Sadhuprasad Patro wrote: > > IMHO, as discussed above, since other databases also have the > limitation that if you create a multi-column hash index then the hash > index can not be used until all the key columns are used in the search > condition. So my point is

Re: Next Steps with Hash Indexes

2021-08-27 Thread Sadhuprasad Patro
On Fri, Aug 13, 2021 at 11:40 AM Dilip Kumar wrote: > > On Fri, Aug 13, 2021 at 9:31 AM Amit Kapila wrote: > > > > On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote: > > > > > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila > > > wrote: > > > > The design of the patch has changed since the

Re: Next Steps with Hash Indexes

2021-08-13 Thread Dilip Kumar
On Fri, Aug 13, 2021 at 9:31 AM Amit Kapila wrote: > > On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote: > > > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila > > wrote: > > > The design of the patch has changed since the initial proposal. It > > > tries to perform unique inserts by holding a

Re: Next Steps with Hash Indexes

2021-08-12 Thread Amit Kapila
On Thu, Aug 12, 2021 at 8:30 PM Robert Haas wrote: > > On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila wrote: > > The design of the patch has changed since the initial proposal. It > > tries to perform unique inserts by holding a write lock on the bucket > > page to avoid duplicate inserts. > > Do

Re: Next Steps with Hash Indexes

2021-08-12 Thread Peter Geoghegan
On Wed, Aug 11, 2021 at 8:51 AM John Naylor wrote: > (Standard disclaimer that I'm not qualified to design index AMs) I've seen > one mention in the literature about the possibility of simply having a btree > index over the hash values. That would require faster search within pages, in >

Re: Next Steps with Hash Indexes

2021-08-12 Thread Robert Haas
On Thu, Aug 12, 2021 at 12:22 AM Amit Kapila wrote: > The design of the patch has changed since the initial proposal. It > tries to perform unique inserts by holding a write lock on the bucket > page to avoid duplicate inserts. Do you mean that you're holding a buffer lwlock while you search the

Re: Next Steps with Hash Indexes

2021-08-11 Thread Amit Kapila
On Wed, Aug 11, 2021 at 8:47 PM Tom Lane wrote: > > Robert Haas writes: > > I have to admit that after working with Amit on all the work to make > > hash indexes WAL-logged a few years ago, I was somewhat disillusioned > > with the whole AM. It seems like a cool idea to me but it's just not > >

Re: Next Steps with Hash Indexes

2021-08-11 Thread Amit Kapila
On Thu, Aug 12, 2021 at 9:09 AM Dilip Kumar wrote: > > On Wed, Aug 11, 2021 at 8:47 PM Tom Lane wrote: > > > As far as the specific point at hand is concerned, I think storing > > a hash value per index column, while using only the first column's > > hash for bucket selection, is what to do for

Re: Next Steps with Hash Indexes

2021-08-11 Thread Dilip Kumar
On Wed, Aug 11, 2021 at 8:47 PM Tom Lane wrote: > As far as the specific point at hand is concerned, I think storing > a hash value per index column, while using only the first column's > hash for bucket selection, is what to do for multicol indexes. > We still couldn't set amoptionalkey=true

Re: Next Steps with Hash Indexes

2021-08-11 Thread Robert Haas
On Wed, Aug 11, 2021 at 11:17 AM Tom Lane wrote: > Yeah, agreed. The whole buckets-are-integral-numbers-of-pages scheme > is pretty well designed to ensure bloat, but trying to ameliorate that > by reducing the number of buckets creates its own problems (since, as > you mention, we have no

Re: Next Steps with Hash Indexes

2021-08-11 Thread Tom Lane
John Naylor writes: > (Standard disclaimer that I'm not qualified to design index AMs) I've seen > one mention in the literature about the possibility of simply having a > btree index over the hash values. Yeah, that's been talked about in the past --- we considered it moderately seriously back

Re: Next Steps with Hash Indexes

2021-08-11 Thread John Naylor
On Wed, Aug 11, 2021 at 10:54 AM Robert Haas wrote: > don't know how the present patch tries to solve that problem.) It's > tempting to think that we should think about creating something > altogether new instead of hacking on the existing implementation, but > that's a lot of work and I'm not

Re: Next Steps with Hash Indexes

2021-08-11 Thread Tom Lane
Robert Haas writes: > I have to admit that after working with Amit on all the work to make > hash indexes WAL-logged a few years ago, I was somewhat disillusioned > with the whole AM. It seems like a cool idea to me but it's just not > that well-implemented. Yeah, agreed. The whole

Re: Next Steps with Hash Indexes

2021-08-11 Thread Robert Haas
On Wed, Aug 11, 2021 at 10:30 AM Tom Lane wrote: > Robert Haas writes: > > I suspect it would be hard to store multiple hash values, one per > > column. It seems to me that what we ought to do is combine the hash > > values for the individual columns using hash_combine(64) and store the > >

Re: Next Steps with Hash Indexes

2021-08-11 Thread Tom Lane
Robert Haas writes: > I suspect it would be hard to store multiple hash values, one per > column. It seems to me that what we ought to do is combine the hash > values for the individual columns using hash_combine(64) and store the > combined value. I can't really imagine why we would NOT do that.

Re: Next Steps with Hash Indexes

2021-08-11 Thread Robert Haas
On Tue, Aug 10, 2021 at 8:44 AM Dilip Kumar wrote: > I was looking into the hash_multicoul.v3.patch, I have a question > > > - Hash indexes support only single-column indexes and do not allow > - uniqueness checking. > + Hash indexes support uniqueness checking. > + Hash indexes support

Re: Next Steps with Hash Indexes

2021-08-11 Thread Amit Kapila
On Tue, Aug 10, 2021 at 6:14 PM Dilip Kumar wrote: > > On Fri, Jul 23, 2021 at 6:16 PM Simon Riggs > wrote: > > > > On Thu, 22 Jul 2021 at 06:10, Amit Kapila wrote: > > > Complete patch for hash_multicol.v3.patch attached, slightly updated > > from earlier patch. > > Docs, tests, passes make

Re: Next Steps with Hash Indexes

2021-08-10 Thread Peter Geoghegan
On Thu, Jul 15, 2021 at 9:41 AM Simon Riggs wrote: > It would be very desirable to allow Hash Indexes to become Primary Key > Indexes, which requires both > amroutine->amcanunique = true; > amroutine->amcanmulticol = true; Why do you say that? I don't think it's self-evident that it's

Re: Next Steps with Hash Indexes

2021-08-10 Thread Dilip Kumar
On Fri, Jul 23, 2021 at 6:16 PM Simon Riggs wrote: > > On Thu, 22 Jul 2021 at 06:10, Amit Kapila wrote: > Complete patch for hash_multicol.v3.patch attached, slightly updated > from earlier patch. > Docs, tests, passes make check. I was looking into the hash_multicoul.v3.patch, I have a

Re: Next Steps with Hash Indexes

2021-07-23 Thread Simon Riggs
On Thu, 22 Jul 2021 at 06:10, Amit Kapila wrote: > It will surely work if we have an exclusive lock on both the buckets > (old and new) in this case but I think it is better if we can avoid > exclusive locking the old bucket (bucket_being_split) unless it is > really required. We need an

Re: Next Steps with Hash Indexes

2021-07-21 Thread Amit Kapila
On Tue, Jul 20, 2021 at 6:32 PM Simon Riggs wrote: > > On Tue, Jul 20, 2021 at 1:00 PM Amit Kapila wrote: > > > > On Thu, Jul 15, 2021 at 10:11 PM Simon Riggs > > wrote: > > > > I think the main thing to think about for uniqueness check during > > split (where we scan both the old and new

Re: Next Steps with Hash Indexes

2021-07-20 Thread Simon Riggs
On Tue, Jul 20, 2021 at 1:26 PM Amit Kapila wrote: > One more thing we need to think about here is when to find the right > bucket page in the chain where we can insert the new tuple. Do we > first try to complete the uniqueness check (which needs to scan > through the entire bucket chain) and

Re: Next Steps with Hash Indexes

2021-07-20 Thread Simon Riggs
On Tue, Jul 20, 2021 at 1:00 PM Amit Kapila wrote: > > On Thu, Jul 15, 2021 at 10:11 PM Simon Riggs > wrote: > > > > 2. Unique Hash Indexes have been summarized here: > > https://www.postgresql.org/message-id/CAA4eK1KATC1TA5bR5eobYQVO3RWsnH6djNpk3P376em4V8MuUA%40mail.gmail.com > > which also

Re: Next Steps with Hash Indexes

2021-07-20 Thread Amit Kapila
On Tue, Jul 20, 2021 at 5:30 PM Amit Kapila wrote: > > On Thu, Jul 15, 2021 at 10:11 PM Simon Riggs > wrote: > > > > 2. Unique Hash Indexes have been summarized here: > > https://www.postgresql.org/message-id/CAA4eK1KATC1TA5bR5eobYQVO3RWsnH6djNpk3P376em4V8MuUA%40mail.gmail.com > > which also

Re: Next Steps with Hash Indexes

2021-07-20 Thread Amit Kapila
On Thu, Jul 15, 2021 at 10:11 PM Simon Riggs wrote: > > 2. Unique Hash Indexes have been summarized here: > https://www.postgresql.org/message-id/CAA4eK1KATC1TA5bR5eobYQVO3RWsnH6djNpk3P376em4V8MuUA%40mail.gmail.com > which also seems to have two parts to it. > > 2.1 Uniqueness Check > Amit: "to

Next Steps with Hash Indexes

2021-07-15 Thread Simon Riggs
Hi, I've been investigating hash indexes and have what I think is a clear picture in my head, so time for discussion. It would be very desirable to allow Hash Indexes to become Primary Key Indexes, which requires both amroutine->amcanunique = true; amroutine->amcanmulticol = true; Every