I'm closing this as returned-with-feedback; AFAICS even the last version
submitted is still in the research stage. Please resubmit once you make
further progress.
Thanks,
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 01/10/2016 01:08 AM, Peter Geoghegan wrote:
> On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote:
>> So, this seems to bring reasonable speedup, as long as the selectivity is
>> below 50%, and the data set is sufficiently large.
> What about semijoins? Apparently
Hi,
On 01/10/2016 04:03 AM, Peter Geoghegan wrote:
> On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
>> Also, have you considered Hash join conditions with multiple
>> attributes as a special case? I'm thinking of cases like this:
> Sorry, accidentally fat-fingered my enter
Hi,
On 01/10/2016 05:11 AM, Peter Geoghegan wrote:
> On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote:
>> Which means the "dim.r" column has 100 different values (0-99) with uniform
>> distribution. So e.g. "WHERE r < 15" matches 15%.
> I think that the use of a
On 11 January 2016 at 09:30, Tomas Vondra wrote:
> Hi,
>
> On 01/10/2016 04:03 AM, Peter Geoghegan wrote:
>> On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
>
> Also, are you aware of this?
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote:
> So, this seems to bring reasonable speedup, as long as the selectivity is
> below 50%, and the data set is sufficiently large.
What about semijoins? Apparently they can use bloom filters particularly
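For anyone skimming the thread, a minimal self-contained sketch of the
technique being benchmarked here (this is not the patch's code; the
sizes, hash scheme and names are all illustrative): build a Bloom
filter from the build side of the hash join, then test each probe-side
key against it before doing any hash-table or batch-file work.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NBITS   (1 << 20)   /* filter size in bits, power of two (made up) */
#define NHASHES 3           /* number of hash functions (made up) */

/* cheap seeded integer hash (splitmix64-style finalizer) */
static uint64_t
mix(uint64_t x, uint64_t seed)
{
    x += seed + 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void
bloom_add(uint8_t *filter, uint64_t key)
{
    for (uint64_t i = 0; i < NHASHES; i++)
    {
        uint64_t bit = mix(key, i) & (NBITS - 1);

        filter[bit >> 3] |= (uint8_t) (1 << (bit & 7));
    }
}

static int
bloom_maybe_contains(const uint8_t *filter, uint64_t key)
{
    for (uint64_t i = 0; i < NHASHES; i++)
    {
        uint64_t bit = mix(key, i) & (NBITS - 1);

        if (!(filter[bit >> 3] & (1 << (bit & 7))))
            return 0;       /* definitely not on the build side */
    }
    return 1;               /* possibly on the build side */
}

int
main(void)
{
    uint8_t *filter = calloc(NBITS / 8, 1);
    long    probed = 0, pruned = 0;

    /* "dimension" (build) side: keys 0..99 */
    for (uint64_t k = 0; k < 100; k++)
        bloom_add(filter, k);

    /* "fact" (probe) side: only keys passing the filter pay for the
     * expensive path (hash table probe, or a write to a batch file) */
    for (uint64_t k = 0; k < 1000000; k++)
    {
        if (bloom_maybe_contains(filter, k % 1000))
            probed++;
        else
            pruned++;
    }
    printf("probed=%ld pruned=%ld\n", probed, pruned);
    free(filter);
    return 0;
}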
On Sat, Jan 9, 2016 at 4:08 PM, Peter Geoghegan wrote:
> Also, have you considered Hash join conditions with multiple
> attributes as a special case? I'm thinking of cases like this:
Sorry, accidentally fat-fingered my enter key before I was finished
drafting that mail. That
On Sat, Jan 9, 2016 at 11:02 AM, Tomas Vondra wrote:
> Which means the "dim.r" column has 100 different values (0-99) with uniform
> distribution. So e.g. "WHERE r < 15" matches 15%.
I think that the use of a uniform distribution to demonstrate this
patch is a bad
Hi,
Attached is v2 of the patch, with a number of improvements:
0) This relies on the other hashjoin patches (delayed build of
buckets and batching), as that allows sizing the bloom filter.
1) enable_hashjoin_bloom GUC
This is mostly meant for debugging and testing, not for
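For reference, the sizing mentioned in (0) presumably follows the
standard Bloom-filter formulas; a hedged sketch, with an assumed tuple
count and target false-positive rate (both made up, not from the patch):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double n = 1e6;     /* expected build-side tuples (assumed) */
    double p = 0.05;    /* target false-positive rate (assumed) */
    double ln2 = log(2.0);

    /* m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes */
    double m = ceil(-n * log(p) / (ln2 * ln2));
    int    k = (int) round((m / n) * ln2);

    printf("m = %.0f bits (%.2f MB), k = %d\n",
           m, m / 8.0 / (1024 * 1024), k);
    return 0;
}

With these inputs this works out to roughly 6.2M bits (about 0.74 MB)
and k = 4, which is why the filter can only be sized sensibly once the
batching patches provide a decent estimate of the build-side tuple count.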
On 28 December 2015 at 23:23, Tomas Vondra wrote:
> On 12/28/2015 03:15 AM, David Rowley wrote:
>
>> Maybe it would be better to, once the filter is built, simply count the
>> number of 1 bits and only use the filter if there's less than
>> 1 bits compared to the
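The 1-bit-counting idea quoted above has a standard closed form: a
filter with fill fraction f (the fraction of bits set) and k hash
functions answers a random miss with a false positive with probability
of roughly f^k. A sketch of how such a post-build check might look
(the names and the cutoff are hypothetical, not from the patch):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

static double
bloom_fp_estimate(const uint8_t *filter, size_t nbytes, int nhashes)
{
    uint64_t ones = 0;

    for (size_t i = 0; i < nbytes; i++)
        ones += __builtin_popcount(filter[i]);  /* GCC/Clang builtin */

    double fill = (double) ones / (nbytes * 8.0);

    return pow(fill, nhashes);  /* ~ false-positive probability */
}

/* e.g. discard the filter after build if it will rarely say "no";
 * the 0.5 cutoff is arbitrary, for illustration only */
static int
bloom_worth_using(const uint8_t *filter, size_t nbytes, int nhashes)
{
    return bloom_fp_estimate(filter, nbytes, nhashes) < 0.5;
}

In particular, a filter with all bits set has f = 1 and hence a
false-positive rate of 1, matching the observation below that such a
filter can only ever cost cycles.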
On 12/28/2015 11:52 AM, David Rowley wrote:
On 28 December 2015 at 23:44, Tomas Vondra wrote:
On 12/28/2015 11:38 AM, David Rowley wrote:
If so, then a filter with all 1 bits set should be thrown away, as
On 12/28/2015 11:38 AM, David Rowley wrote:
On 28 December 2015 at 23:23, Tomas Vondra wrote:
On 12/28/2015 03:15 AM, David Rowley wrote:
Maybe it would be better to, once the filter is built, simply
count
On 12/28/2015 03:15 AM, David Rowley wrote:
On 18 December 2015 at 04:34, Tomas Vondra wrote:
I think ultimately we'll need to measure the false positive rate, so
that we can use it to dynamically disable the bloom
On 28 December 2015 at 23:44, Tomas Vondra wrote:
> On 12/28/2015 11:38 AM, David Rowley wrote:
>
>> If so, then a filter with all 1 bits set should be thrown away, as
>> it'll never help us, and the filter should generally become more
>> worthwhile as it contains
On 18 December 2015 at 04:34, Tomas Vondra wrote:
> I think ultimately we'll need to measure the false positive rate, so that
> we can use it to dynamically disable the bloom filter if it gets
> inefficient. Also maybe put some of that into EXPLAIN ANALYZE.
>
I'm
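One possible shape for the runtime measurement quoted above (purely
illustrative; the counters, sampling interval and threshold are made
up, not taken from the patch):

#include <stdbool.h>
#include <stdint.h>

typedef struct BloomStats
{
    uint64_t lookups;   /* probe-side tuples tested against the filter */
    uint64_t rejected;  /* tuples the filter ruled out */
    bool     enabled;
} BloomStats;

static void
bloom_track(BloomStats *st, bool filter_said_no)
{
    st->lookups++;
    if (filter_said_no)
        st->rejected++;

    /* re-evaluate every 64K lookups (arbitrary interval) */
    if (st->enabled && (st->lookups & 0xFFFF) == 0)
    {
        double reject_rate = (double) st->rejected / st->lookups;

        if (reject_rate < 0.05)         /* arbitrary threshold */
            st->enabled = false;        /* filter isn't paying for itself */
    }
}

The same two counters would also be cheap to surface in EXPLAIN
ANALYZE, as the quoted mail suggests.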
On 12/24/2015 02:51 PM, Simon Riggs wrote:
On 17 December 2015 at 16:00, Tomas Vondra wrote:
On 12/17/2015 11:44 AM, Simon Riggs wrote:
My understanding is that the bloom filter would be ineffective in
Hello, Tomas.
Great idea!
Did you consider caching the bloom filter, or at least part(s) of it,
somehow? I think this way we could gain some more TPS. This of course
assumes that creating a bloom filter is really a bottleneck here, which
would be nice to investigate first.
Best regards,
Aleksander
On 17 December 2015 at 16:00, Tomas Vondra wrote:
> On 12/17/2015 11:44 AM, Simon Riggs wrote:
>
>> My understanding is that the bloom filter would be ineffective in any of
>> these cases
>> * Hash table is too small
>
> Yes, although it depends what you mean
Hi,
On 12/20/2015 05:46 AM, Oleg Bartunov wrote:
Tomas,
have you seen
http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru
I have very limited internet connection (no graphics), so I may miss
something.
I haven't seen that, but I don't really see how that's related - your
post is
Tomas,
have you seen
http://www.postgresql.org/message-id/4b4dd67f.9010...@sigaev.ru
I have very limited internet connection (no graphics), so I may miss
something.
Oleg
On Wed, Dec 16, 2015 at 4:15 AM, Tomas Vondra wrote:
> Hi,
>
> while working on the Hash Join
On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra wrote:
>
> Attached is a spreadsheet with results for various work_mem values, and
> also with a smaller data set (just 30M rows in the fact table), which
> easily fits into memory. Yet it shows similar gains, shaving
On 12/17/2015 11:44 AM, Simon Riggs wrote:
> My understanding is that the bloom filter would be ineffective in any of
> these cases
> * Hash table is too small
Yes, although it depends what you mean by "too small".
Essentially if we can do with a single batch, then it's cheaper to do a
single
Hi,
On 12/17/2015 10:50 AM, Shulgin, Oleksandr wrote:
> On Tue, Dec 15, 2015 at 11:30 PM, Tomas Vondra wrote:
>> Attached is a spreadsheet with results for various work_mem
>> values, and also with a smaller data set (just 30M rows
On 15 December 2015 at 22:30, Tomas Vondra wrote:
> 3) Currently the bloom filter is used whenever we do batching, but it
> should really be driven by selectivity too - it'd be good to (a)
> estimate the fraction of 'fact' tuples having a match in the hash
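A hypothetical sketch of what the selectivity-driven check in (3) might
look like; the 50% cutoff merely echoes the benchmark numbers upthread,
and the single-batch shortcut restates the point made earlier in the
thread, so both are illustrative rather than tuned values:

#include <stdbool.h>

static bool
bloom_filter_useful(double est_match_fraction, int nbatches)
{
    /* with a single batch every probe hits an in-memory hash table
     * anyway, so the filter only adds per-tuple hashing overhead */
    if (nbatches <= 1)
        return false;

    /* (a) estimated fraction of 'fact' tuples with a match */
    return est_match_fraction < 0.5;
}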