On Thu, Nov 1, 2018 at 10:17 PM Tomas Vondra
wrote:
>
> I haven't really planned to work on this anytime soon, unfortunately,
> which is why I proposed to mark it as RwF at the end of the last CF. I
> already have a couple other patches there, and (more importantly) I
> don't have a very clear ide
On Thu, Nov 1, 2018 at 5:07 PM Thomas Munro
wrote:
> Would you compute the hash for the outer tuples in the scan, and then
> again in the Hash Join when probing, or would you want to (somehow)
> attach the hash to emitted tuples for later reuse by the higher node?
I'm interested in what Jim has t
On 11/01/2018 10:06 PM, Thomas Munro wrote:
> On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty wrote:
>> I'm very interested in this patch, and particularly in possible
>> extensions to push the Bloom filter down on the probe side of the join. I
>> made a few small edits to the patch to enable it
On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty wrote:
> I'm very interested in this patch, and particularly in possible
> extensions to push the Bloom filter down on the probe side of the join. I
> made a few small edits to the patch to enable it to compile on PG11, and can
> send it to you if y
On 10/01/2018 09:15 AM, Michael Paquier wrote:
> On Thu, Mar 01, 2018 at 07:04:41PM -0500, David Steele wrote:
>> After reviewing the thread I also agree that this should be pushed to
>> 2018-09, so I have done so.
>>
>> I'm very excited by this patch, though. In general I agree with Peter that
>> a higher rat
On Thu, Mar 01, 2018 at 07:04:41PM -0500, David Steele wrote:
> After reviewing the thread I also agree that this should be pushed to
> 2018-09, so I have done so.
>
> I'm very excited by this patch, though. In general I agree with Peter that
> a higher rate of false positives is acceptable to sa
On Thu, Mar 1, 2018 at 4:04 PM, David Steele wrote:
> On 3/1/18 6:52 PM, Tomas Vondra wrote:
>>
>> On 03/02/2018 12:31 AM, Andres Freund wrote:
>>>
>>>
>>>
>>> On March 1, 2018 3:22:44 PM PST, Tomas Vondra
>>> wrote:
>>>> On 03/01/2018 11:01 PM, Andres Freund wrote:
>>>>>
>>>>> Hi,
On 3/1/18 6:52 PM, Tomas Vondra wrote:
> On 03/02/2018 12:31 AM, Andres Freund wrote:
>> On March 1, 2018 3:22:44 PM PST, Tomas Vondra
>> wrote:
>>> On 03/01/2018 11:01 PM, Andres Freund wrote:
>>>> Hi,
>>>> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
>>>>> So I've decided to revive the old patch, rebase it
On 03/02/2018 12:31 AM, Andres Freund wrote:
>
>
> On March 1, 2018 3:22:44 PM PST, Tomas Vondra
> wrote:
>>
>>
>> On 03/01/2018 11:01 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
>>>> So I've decided to revive the old patch, rebase it to current
>>
On March 1, 2018 3:22:44 PM PST, Tomas Vondra
wrote:
>
>
> On 03/01/2018 11:01 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
>>> So I've decided to revive the old patch, rebase it to current master,
>>> and see if we can resolve the issues that killed
On 03/01/2018 11:01 PM, Andres Freund wrote:
> Hi,
>
> On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
>> So I've decided to revive the old patch, rebase it to current master,
>> and see if we can resolve the issues that killed it in 2016.
>
> There seems to be some good discussion in the thr
Hi,
On 2018-02-20 22:23:54 +0100, Tomas Vondra wrote:
> So I've decided to revive the old patch, rebase it to current master,
> and see if we can resolve the issues that killed it in 2016.
There seems to be some good discussion in the thread. But the patch
arrived just before the last commitfest
On Thu, Feb 22, 2018 at 1:14 PM, Tomas Vondra
wrote:
> OK, thanks for reminding me about SBF and for the discussion.
>
> At this point I'll probably focus on the other parts though -
> determining selectivity of the join, etc. Which I think is crucial, and
> we need to get that right even for accu
On 02/22/2018 09:52 PM, Claudio Freire wrote:
> On Thu, Feb 22, 2018 at 5:11 PM, Tomas Vondra
> wrote:
>> On 02/22/2018 08:33 PM, Claudio Freire wrote:
>>> That's kinda slow to do per-item. I tried to "count" distinct items by
>>> checking the BF before adding (don't add redundantly), but that's
On Thu, Feb 22, 2018 at 5:11 PM, Tomas Vondra
wrote:
> On 02/22/2018 08:33 PM, Claudio Freire wrote:
>> That's kinda slow to do per-item. I tried to "count" distinct items by
>> checking the BF before adding (don't add redundantly), but that's less
>> precise than a HLL in my experience.
>
> But y
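For readers following the "check the BF before adding" idea quoted above, here is a minimal sketch of it. It is not code from the patch: the `BloomFilter` class, its `add_if_new` method, and the SHA-256 double-hashing scheme are all assumptions made for illustration. Because a Bloom filter can report false positives, the counter can only undercount distinct items, which is consistent with the remark that it is less precise than a HLL.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter that also keeps an approximate distinct count (illustrative only)."""

    def __init__(self, nbits: int, nhashes: int) -> None:
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)
        self.approx_distinct = 0  # incremented only when an item looks new

    def _positions(self, item: bytes) -> list[int]:
        # Double hashing: derive all probe positions from two 64-bit values.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def _test(self, pos: int) -> bool:
        return bool(self.bits[pos >> 3] & (1 << (pos & 7)))

    def contains(self, item: bytes) -> bool:
        return all(self._test(p) for p in self._positions(item))

    def add_if_new(self, item: bytes) -> bool:
        """Add item unless the filter already reports it; return True if it was added."""
        positions = self._positions(item)
        if all(self._test(p) for p in positions):
            return False  # either a true duplicate or a false positive
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        self.approx_distinct += 1
        return True


bf = BloomFilter(nbits=16384, nhashes=7)
for _ in range(2):  # insert every key twice to exercise the duplicate check
    for i in range(1000):
        bf.add_if_new(str(i).encode())
print(bf.approx_distinct)
```

The extra membership test costs one full set of probes per insert, which is the per-item overhead the quoted message calls "kinda slow".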
On 02/22/2018 08:33 PM, Claudio Freire wrote:
> On Thu, Feb 22, 2018 at 12:45 PM, Tomas Vondra
> wrote:
>>
>>
>> On 02/22/2018 12:44 PM, Claudio Freire wrote:
>>> ...
>>>
>>> An HLL can be used to estimate set size, the paper makes no
>>> mention of it, probably assuming only distinct items are ad
On Thu, Feb 22, 2018 at 12:45 PM, Tomas Vondra
wrote:
>
>
> On 02/22/2018 12:44 PM, Claudio Freire wrote:
>> Let me reiterate, you can avoid both issues with scalable bloom filters[1].
>>
>
> I'm afraid it's not as straight-forward as "Use scalable bloom filters!"
>
> This is not merely a question
On 02/22/2018 12:44 PM, Claudio Freire wrote:
> On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra
> wrote:
>> On 02/21/2018 02:10 AM, Peter Geoghegan wrote:
>>> ...
>>> I misunderstood. I would probably do something like double or triple
>>> the original rows estimate instead, though. The estimate
On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra
wrote:
> On 02/21/2018 02:10 AM, Peter Geoghegan wrote:
>> On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
>> wrote:
>>>> I suspect that it could make sense to use a Bloom filter to
>>>> summarize the entire inner side of the join all at once, even
>>>
On 02/21/2018 08:17 AM, Thomas Munro wrote:
> On Wed, Feb 21, 2018 at 10:23 AM, Tomas Vondra
> wrote:
>> In 2015/2016 I've been exploring if we could improve hash joins by
>> leveraging bloom filters [1], and I was reminded about this idea in a
>> thread about amcheck [2]. I also see that bloom f
On 02/21/2018 02:10 AM, Peter Geoghegan wrote:
> On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
> wrote:
>>> I suspect that it could make sense to use a Bloom filter to
>>> summarize the entire inner side of the join all at once, even
>>> when there are multiple batches. I also suspect that this i
On Wed, Feb 21, 2018 at 10:23 AM, Tomas Vondra
wrote:
> In 2015/2016 I've been exploring if we could improve hash joins by
> leveraging bloom filters [1], and I was reminded about this idea in a
> thread about amcheck [2]. I also see that bloom filters were briefly
> mentioned in the thread about
On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
wrote:
>> I suspect that it could make sense to use a Bloom filter to
>> summarize the entire inner side of the join all at once, even when
>> there are multiple batches. I also suspect that this is particularly
>> beneficial with parallel hash joins,
On Tue, Feb 20, 2018 at 3:48 PM, Claudio Freire wrote:
>> Do we need to eliminate 99% of all hash join probes (that find nothing
>> to join on) to make this Bloom filter optimization worthwhile?
>> Personally, I doubt it.
>
> Even for 90% it's about 4.6 bits per element.
4.6 bits is vastly less t
On 02/21/2018 12:06 AM, Peter Geoghegan wrote:
> On Tue, Feb 20, 2018 at 1:23 PM, Tomas Vondra
> wrote:
>> In 2015/2016 I've been exploring if we could improve hash joins by
>> leveraging bloom filters [1], and I was reminded about this idea in a
>> thread about amcheck [2]. I also see that bloo
On Tue, Feb 20, 2018 at 8:23 PM, Peter Geoghegan wrote:
> On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire
> wrote:
>> I've worked a lot with bloom filters, and for large false positive
>> rates and large sets (multi-million entries), you get bloom filter
>> sizes of about 10 bits per distinct it
On Tue, Feb 20, 2018 at 3:17 PM, Claudio Freire wrote:
> I've worked a lot with bloom filters, and for large false positive
> rates and large sets (multi-million entries), you get bloom filter
> sizes of about 10 bits per distinct item.
It's generally true that you need 9.6 bits per element to ge
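For reference, the bits-per-element figures in this part of the thread can be checked against the textbook sizing formula for an optimally configured Bloom filter, m/n = -ln(p) / (ln 2)^2, with k = -log2(p) hash functions. The sketch below is only that formula; the function names are illustrative and do not come from the patch, and a real implementation may round sizes or use a different number of hash functions.

```python
import math


def bits_per_element(fpr: float) -> float:
    """Bits per element for an optimally sized Bloom filter at false-positive rate fpr."""
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must be strictly between 0 and 1")
    return -math.log(fpr) / (math.log(2) ** 2)


def optimal_hash_count(fpr: float) -> int:
    """Number of hash functions that minimizes the false-positive rate at that size."""
    return max(1, round(-math.log2(fpr)))


for fpr in (0.10, 0.01, 0.001):
    print(f"fpr={fpr:<6} bits/element={bits_per_element(fpr):.2f} "
          f"hashes={optimal_hash_count(fpr)}")
```

Under this formula a 1% false-positive rate needs roughly 9.6 bits per element, and a 10% rate needs a little under 5 bits, so relaxing the target rate by a factor of ten about halves the filter size.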
On Tue, Feb 20, 2018 at 8:06 PM, Peter Geoghegan wrote:
> You should try to exploit the fact that a Bloom filter can summarize a
> large set reasonably well with a very compact, simple representation.
> A false positive rate of 10% sounds a lot worse than 1% or 0.1%, but
> for cases where Bloom pr
On Tue, Feb 20, 2018 at 1:23 PM, Tomas Vondra
wrote:
> In 2015/2016 I've been exploring if we could improve hash joins by
> leveraging bloom filters [1], and I was reminded about this idea in a
> thread about amcheck [2]. I also see that bloom filters were briefly
> mentioned in the thread about p
29 matches