subject:"Re\: Hash Joins vs. Bloom Filters \/ take 2"

Re: Hash Joins vs. Bloom Filters / take 2

2018-11-30 Thread Dmitry Dolgov

On Thu, Nov 1, 2018 at 10:17 PM Tomas Vondra wrote: > > I haven't really planned to work on this anytime soon, unfortunately, > which is why I proposed to mark it as RwF at the end of the last CF. I > already have a couple other patches there, and (more importantly) I > don't have a very clear ide

Re: Hash Joins vs. Bloom Filters / take 2

2018-11-02 Thread Robert Haas

On Thu, Nov 1, 2018 at 5:07 PM Thomas Munro wrote: > Would you compute the hash for the outer tuples in the scan, and then > again in the Hash Join when probing, or would you want to (somehow) > attach the hash to emitted tuples for later reuse by the higher node? I'm interested in what Jim has t

Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Tomas Vondra

On 11/01/2018 10:06 PM, Thomas Munro wrote: > On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty wrote: >> I'm very interested in this patch, and particularly in possible >> extensions to push the Bloom filter down on the probe side of the join. I >> made a few small edits to the patch to enable it

Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Thomas Munro

On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty wrote: > I'm very interested in this patch, and particularly in possible > extensions to push the Bloom filter down on the probe side of the join. I > made a few small edits to the patch to enable it to compile on PG11, and can > send it to you if y

Re: Hash Joins vs. Bloom Filters / take 2

2018-10-01 Thread Tomas Vondra

On 10/01/2018 09:15 AM, Michael Paquier wrote: > On Thu, Mar 01, 2018 at 07:04:41PM -0500, David Steele wrote: >> After reviewing the thread I also agree that this should be pushed to >> 2018-09, so I have done so. >> >> I'm very excited by this patch, though. In general I agree with Peter that >> a higher rat

Re: Hash Joins vs. Bloom Filters / take 2

2018-10-01 Thread Michael Paquier

On Thu, Mar 01, 2018 at 07:04:41PM -0500, David Steele wrote: > After reviewing the thread I also agree that this should be pushed to > 2018-09, so I have done so. > > I'm very excited by this patch, though. In general I agree with Peter that > a higher rate of false positives is acceptable to sa

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-06 Thread Patrick Krecker

On Thu, Mar 1, 2018 at 4:04 PM, David Steele wrote: > On 3/1/18 6:52 PM, Tomas Vondra wrote: >> >> On 03/02/2018 12:31 AM, Andres Freund wrote: >>> >>> >>> >>> On March 1, 2018 3:22:44 PM PST, Tomas Vondra >>> wrote: On 03/01/2018 11:01 PM, Andres Freund wrote: > > Hi,

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-01 Thread David Steele

On 3/1/18 6:52 PM, Tomas Vondra wrote: > On 03/02/2018 12:31 AM, Andres Freund wrote: >> On March 1, 2018 3:22:44 PM PST, Tomas Vondra wrote: >>> On 03/01/2018 11:01 PM, Andres Freund wrote: >>>> Hi, >>>> >>>> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote: >>>>> So I've decided to revive the old patch, rebase it

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-01 Thread Tomas Vondra

On 03/02/2018 12:31 AM, Andres Freund wrote: > > > On March 1, 2018 3:22:44 PM PST, Tomas Vondra > wrote: >> >> >> On 03/01/2018 11:01 PM, Andres Freund wrote: >>> Hi, >>> >>> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote: So I've decided to revive the old patch, rebase it to current >>

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-01 Thread Andres Freund

On March 1, 2018 3:22:44 PM PST, Tomas Vondra wrote: > > >On 03/01/2018 11:01 PM, Andres Freund wrote: >> Hi, >> >> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote: >>> So I've decided to revive the old patch, rebase it to current >master, >>> and see if we can resolve the issues that killed

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-01 Thread Tomas Vondra

On 03/01/2018 11:01 PM, Andres Freund wrote: > Hi, > > On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote: >> So I've decided to revive the old patch, rebase it to current master, >> and see if we can resolve the issues that killed it in 2016. > > There seems to be some good discussion in the thr

Re: Hash Joins vs. Bloom Filters / take 2

2018-03-01 Thread Andres Freund

Hi, On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote: > So I've decided to revive the old patch, rebase it to current master, > and see if we can resolve the issues that killed it in 2016. There seems to be some good discussion in the thread. But the patch arrived just before the last commitfest

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Peter Geoghegan

On Thu, Feb 22, 2018 at 1:14 PM, Tomas Vondra wrote: > OK, thanks for reminding me about SBF and for the discussion. > > At this point I'll probably focus on the other parts though - > determining selectivity of the join, etc. Which I think is crucial, and > we need to get that right even for accu

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Tomas Vondra

On 02/22/2018 09:52 PM, Claudio Freire wrote: > On Thu, Feb 22, 2018 at 5:11 PM, Tomas Vondra > wrote: >> On 02/22/2018 08:33 PM, Claudio Freire wrote: >>> That's kinda slow to do per-item. I tried to "count" distinct items by >>> checking the BF before adding (don't add redundantly), but that's

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Claudio Freire

On Thu, Feb 22, 2018 at 5:11 PM, Tomas Vondra wrote: > On 02/22/2018 08:33 PM, Claudio Freire wrote: >> That's kinda slow to do per-item. I tried to "count" distinct items by >> checking the BF before adding (don't add redundantly), but that's less >> precise than a HLL in my experience. > > But y

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Tomas Vondra

On 02/22/2018 08:33 PM, Claudio Freire wrote: > On Thu, Feb 22, 2018 at 12:45 PM, Tomas Vondra > wrote: >> >> >> On 02/22/2018 12:44 PM, Claudio Freire wrote: >>> ... >>> >>> An HLL can be used to estimate set size, the paper makes no >>> mention of it, probably assuming only distinct items are ad

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Claudio Freire

On Thu, Feb 22, 2018 at 12:45 PM, Tomas Vondra wrote: > > > On 02/22/2018 12:44 PM, Claudio Freire wrote: >> Let me reiterate, you can avoid both issues with scalable bloom filters[1]. >> > > I'm afraid it's not as straight-forward as "Use scalable bloom filters!" > > This is not merely a question

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Tomas Vondra

On 02/22/2018 12:44 PM, Claudio Freire wrote: > On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra > wrote: >> On 02/21/2018 02:10 AM, Peter Geoghegan wrote: >>> ... >>> I misunderstood. I would probably do something like double or triple >>> the original rows estimate instead, though. The estimate

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-22 Thread Claudio Freire

On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra wrote: > On 02/21/2018 02:10 AM, Peter Geoghegan wrote: >> On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra >> wrote: I suspect that it could make sense to use a Bloom filter to summarize the entire inner side of the join all at once, even >>>

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-21 Thread Tomas Vondra

On 02/21/2018 08:17 AM, Thomas Munro wrote: > On Wed, Feb 21, 2018 at 10:23 AM, Tomas Vondra > wrote: >> In 2015/2016 I've been exploring if we could improve hash joins by >> leveraging bloom filters [1], and I was reminded about this idea in a >> thread about amcheck [2]. I also see that bloom f

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-21 Thread Tomas Vondra

On 02/21/2018 02:10 AM, Peter Geoghegan wrote: > On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra > wrote: >>> I suspect that it could make sense to use a Bloom filter to >>> summarize the entire inner side of the join all at once, even >>> when there are multiple batches. I also suspect that this i

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Thomas Munro

On Wed, Feb 21, 2018 at 10:23 AM, Tomas Vondra wrote: > In 2015/2016 I've been exploring if we could improve hash joins by > leveraging bloom filters [1], and I was reminded about this idea in a > thread about amcheck [2]. I also see that bloom filters were briefly > mentioned in the thread about

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Peter Geoghegan

On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra wrote: >> I suspect that it could make sense to use a Bloom filter to >> summarize the entire inner side of the join all at once, even when >> there are multiple batches. I also suspect that this is particularly >> beneficial with parallel hash joins,
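
The idea quoted above, summarizing the inner side of the join in a Bloom filter so that outer tuples with no possible match can be skipped before probing, can be sketched roughly as follows. This is only an illustrative toy in Python, not the patch discussed in the thread: the names `BloomFilter`, `might_contain`, and `filtered_hash_join` are hypothetical, SHA-256 with double hashing stands in for whatever hash functions a real implementation would use, and batching and parallelism are ignored entirely.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, not PostgreSQL code)."""

    def __init__(self, nbits, nhashes):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, key):
        # Derive two 64-bit hashes from one digest and combine them
        # (double hashing) to obtain nhashes bit positions.
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def filtered_hash_join(inner, outer, nbits=1024, nhashes=3):
    """Join (key, payload) pairs, consulting the filter before each probe."""
    table = {}
    bloom = BloomFilter(nbits, nhashes)
    for key, payload in inner:  # build phase: hash table plus filter
        table.setdefault(key, []).append(payload)
        bloom.add(key)

    result = []
    for key, payload in outer:  # probe phase
        if not bloom.might_contain(key):
            continue  # no inner match is possible, skip the probe
        for inner_payload in table.get(key, []):
            result.append((key, inner_payload, payload))
    return result
```

Because a Bloom filter never reports a false negative, the join result is the same with or without the filter; a false positive only costs one wasted probe. In this in-memory toy the filter saves little, since the dictionary lookup is already cheap. The benefit discussed in the thread concerns cases where the probe itself is expensive, such as joins with multiple batches.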

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Peter Geoghegan

On Tue, Feb 20, 2018 at 3:48 PM, Claudio Freire wrote: >> Do we need to eliminate 99% of all hash join probes (that find nothing >> to join on) to make this Bloom filter optimization worthwhile? >> Personally, I doubt it. > > Even for 90% it's about 4.6 bits per element. 4.6 bits is vastly less t

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Tomas Vondra

On 02/21/2018 12:06 AM, Peter Geoghegan wrote: > On Tue, Feb 20, 2018 at 1:23 PM, Tomas Vondra > wrote: >> In 2015/2016 I've been exploring if we could improve hash joins by >> leveraging bloom filters [1], and I was reminded about this idea in a >> thread about amcheck [2]. I also see that bloo

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Claudio Freire

On Tue, Feb 20, 2018 at 8:23 PM, Peter Geoghegan wrote: > On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire > wrote: >> I've worked a lot with bloom filters, and for large false positive >> rates and large sets (multi-million entries), you get bloom filter >> sizes of about 10 bits per distinct it

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Peter Geoghegan

On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire wrote: > I've worked a lot with bloom filters, and for large false positive > rates and large sets (multi-million entries), you get bloom filter > sizes of about 10 bits per distinct item. It's generally true that you need 9.6 bits per element to ge
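
The bits-per-element figures exchanged in these messages can be checked against the standard sizing formula for an optimally configured Bloom filter, bits per element = -ln(p) / (ln 2)^2, where p is the target false positive rate. The short Python sketch below is only an illustration; the function names are hypothetical and appear nowhere in the thread. It gives roughly 9.6 bits per element at a 1% rate and a little under 5 bits at a 10% rate.

```python
import math


def bloom_bits_per_element(false_positive_rate):
    """Bits per element for an optimally sized Bloom filter (standard formula)."""
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be strictly between 0 and 1")
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def optimal_hash_count(bits_per_element):
    """Number of hash functions that minimizes the false positive rate."""
    return max(1, round(bits_per_element * math.log(2)))


for rate in (0.10, 0.01, 0.001):
    bits = bloom_bits_per_element(rate)
    print(f"p={rate}: {bits:.2f} bits/element, {optimal_hash_count(bits)} hashes")
```

Relaxing the target rate from 1% to 10% roughly halves the filter size, which is the trade-off being argued over in these messages.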

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Claudio Freire

On Tue, Feb 20, 2018 at 8:06 PM, Peter Geoghegan wrote: > You should try to exploit the fact that a Bloom filter can summarize a > large set reasonably well with a very compact, simple representation. > A false positive rate of 10% sounds a lot worse than 1% or 0.1%, but > for cases where Bloom pr

Re: Hash Joins vs. Bloom Filters / take 2

2018-02-20 Thread Peter Geoghegan

On Tue, Feb 20, 2018 at 1:23 PM, Tomas Vondra wrote: > In 2015/2016 I've been exploring if we could improve hash joins by > leveraging bloom filters [1], and I was reminded about this idea in a > thread about amcheck [2]. I also see that bloom filters were briefly > mentioned in the thread about p

29 matches
