Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-02-08 Thread Alvaro Herrera
I'm closing this as returned-with-feedback; AFAICS even the last version submitted is still in research stage. Please resubmit once you make further progress. Thanks, -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training &

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-10 Thread Tomas Vondra
Hi, On 01/10/2016 01:08 AM, Peter Geoghegan wrote: On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote: So, this seems to bring reasonable speedup, as long as the selectivity is below 50%, and the data set is sufficiently large. What about semijoins? Apparently

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-10 Thread Tomas Vondra
Hi, On 01/10/2016 04:03 AM, Peter Geoghegan wrote: On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote: Also, have you considered Hash join conditions with multiple attributes as a special case? I'm thinking of cases like this: Sorry, accidentally fat-fingered my enter

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-10 Thread Tomas Vondra
Hi, On 01/10/2016 05:11 AM, Peter Geoghegan wrote: On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote: Which means the "dim.r" column has 100 different values (0-99) with uniform distribution. So e.g. "WHERE r < 15" matches 15%. I think that the use of a

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-10 Thread David Rowley
On 11 January 2016 at 09:30, Tomas Vondra wrote: > Hi, > > On 01/10/2016 04:03 AM, Peter Geoghegan wrote: > >> On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote: > > Also, are you aware of this? >> >> >>

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-09 Thread Peter Geoghegan
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote: > So, this seems to bring reasonable speedup, as long as the selectivity is > below 50%, and the data set is sufficiently large. What about semijoins? Apparently they can use bloom filters particularly

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-09 Thread Peter Geoghegan
On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote: > Also, have you considered Hash join conditions with multiple > attributes as a special case? I'm thinking of cases like this: Sorry, accidentally fat-fingered my enter key before I was finished drafting that mail. That

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2016-01-09 Thread Peter Geoghegan
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote: > Which means the "dim.r" column has 100 different values (0-99) with uniform > distribution. So e.g. "WHERE r < 15" matches 15%. I think that the use of a uniform distribution to demonstrate this patch is a bad

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-31 Thread Tomas Vondra
Hi, attached is v2 of the patch, with a number of improvements: 0) This relies on the the other hashjoin patches (delayed build of buckets and batching), as it allows sizing the bloom filter. 1) enable_hashjoin_bloom GUC This is mostly meant for debugging and testing, not for

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread David Rowley
On 28 December 2015 at 23:23, Tomas Vondra wrote: > On 12/28/2015 03:15 AM, David Rowley wrote: > >> Maybe it would be better to, once the filter is built, simply count the >> > number of 1 bits and only use the filter if there's less than >> 1 bits compared to the

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra
On 12/28/2015 11:52 AM, David Rowley wrote: On 28 December 2015 at 23:44, Tomas Vondra > wrote: On 12/28/2015 11:38 AM, David Rowley wrote: If so, then a filter with all 1 bits set should be thrown away, as

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra
On 12/28/2015 11:38 AM, David Rowley wrote: On 28 December 2015 at 23:23, Tomas Vondra > wrote: On 12/28/2015 03:15 AM, David Rowley wrote: Maybe it would be better to, once the filter is built, simply count

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra
On 12/28/2015 03:15 AM, David Rowley wrote: On 18 December 2015 at 04:34, Tomas Vondra > wrote: I think ultimately we'll need to measure the false positive rate, so that we can use it to dynamically disable the bloom

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread David Rowley
On 28 December 2015 at 23:44, Tomas Vondra wrote: > On 12/28/2015 11:38 AM, David Rowley wrote: > >> If so, then a filter with all 1 bits set should be thrown away, as >> > it'll never help us, and the filter should generally become more >> worthwhile as it contains

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-27 Thread David Rowley
On 18 December 2015 at 04:34, Tomas Vondra wrote: > I think ultimately we'll need to measure the false positive rate, so that > we can use it to dynamically disable the bloom filter if it gets > inefficient. Also maybe put some of that into EXPLAIN ANALYZE. > I'm

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-24 Thread Tomas Vondra
On 12/24/2015 02:51 PM, Simon Riggs wrote: On 17 December 2015 at 16:00, Tomas Vondra > wrote: On 12/17/2015 11:44 AM, Simon Riggs wrote: My understanding is that the bloom filter would be ineffective in

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-24 Thread Aleksander Alekseev
Hello, Tomas. Great idea! Did you consider to cache bloom filter or at least part(s) of it somehow? I think this way we could gain some more TPS. This of course assuming that creating a bloom filter is really a bottleneck here, which would be nice be investigated first. Best regards, Aleksander

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-24 Thread Simon Riggs
On 17 December 2015 at 16:00, Tomas Vondra wrote: > On 12/17/2015 11:44 AM, Simon Riggs wrote: > >> >> My understanding is that the bloom filter would be ineffective in any of >> these cases >> * Hash table is too small >> > > Yes, although it depends what you mean

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-20 Thread Tomas Vondra
Hi, On 12/20/2015 05:46 AM, Oleg Bartunov wrote: Tomas, have you seen http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru I have very limited internet connection (no graphics) , so I may miss something I haven't seen that, but I don't really see how that's related - your post is

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-19 Thread Oleg Bartunov
Tomas, have you seen http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru I have very limited internet connection (no graphics) , so I may miss something Oleg On Wed, Dec 16, 2015 at 4:15 AM, Tomas Vondra wrote: > Hi, > > while working on the Hash Join

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-17 Thread Shulgin, Oleksandr
On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra wrote: > > Attached is a spreadsheet with results for various work_mem values, and > also with a smaller data set (just 30M rows in the fact table), which > easily fits into memory. Yet it shows similar gains, shaving

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-17 Thread Tomas Vondra
On 12/17/2015 11:44 AM, Simon Riggs wrote: My understanding is that the bloom filter would be ineffective in any of these cases * Hash table is too small Yes, although it depends what you mean by "too small". Essentially if we can do with a single batch, then it's cheaper to do a single

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-17 Thread Tomas Vondra
Hi, On 12/17/2015 10:50 AM, Shulgin, Oleksandr wrote: On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra > wrote: Attached is a spreadsheet with results for various work_mem values, and also with a smaller data set (just 30M rows

Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-17 Thread Simon Riggs
On 15 December 2015 at 22:30, Tomas Vondra wrote: 3) Currently the bloom filter is used whenever we do batching, but it > should really be driven by selectivity too - it'd be good to (a) > estimate the fraction of 'fact' tuples having a match in the hash