On Wed, Feb 21, 2018 at 11:21 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> On 02/21/2018 02:10 AM, Peter Geoghegan wrote:
>> On Tue, Feb 20, 2018 at 3:54 PM, Tomas Vondra
>> <tomas.von...@2ndquadrant.com> wrote:
>>>> I suspect that it could make sense to use a Bloom filter to
>>>> summarize the entire inner side of the join all at once, even
>>>> when there are multiple batches. I also suspect that this is
>>>> particularly beneficial with parallel hash joins, where
>>>> IPC/synchronization overhead can be a big problem.
>>> But that's what the patch does, currently - the filter is built
>>> during the initial pass through the data, and then used for all
>>> batches.
>> I misunderstood. I would probably do something like double or triple
>> the original rows estimate instead, though. The estimate must be at
>> least slightly inaccurate when we get to this point, but I don't
>> think that that's a good enough reason to give up on the estimate
>> completely.
> That's a problem only for the multi-batch case, though.
> With a single batch we can walk the hash table and count non-empty
> buckets, to get a good ndistinct estimate cheaply. And then size the
> filter considering both memory requirements (fits into CPU cache) and
> false positive rate. There are other things we may need to consider
> (memory usage vs. work_mem) but that's a separate issue.
> With multiple batches I think we could use the "size the bloom filter
> for a fraction of work_mem" approach, which the current patch uses when
> switching to multiple batches halfway through. That pretty much entirely
> ignores the estimate and essentially replaces it with a "fictional" one.
> I think that's a better approach than using some arbitrary multiple of
> the estimate. When we have to start batching halfway through, the
> estimate is proven to be rather bogus anyway, but we may treat it as a
> lower bound for the bloom filter size.
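
On the sizing point, for anyone following along: the textbook math ties the
ndistinct estimate and the target false positive rate directly to the filter
size, so the "fits into CPU cache" check falls out naturally. A minimal
sketch of that math (illustration only, not the patch's actual code):

#include <math.h>
#include <stdint.h>

/*
 * Textbook Bloom filter sizing: given an ndistinct estimate and a
 * target false positive rate, compute the optimal number of bits and
 * hash functions.
 */
static void
bloom_sizing(double ndistinct, double fpr, uint64_t *nbits, int *nhashes)
{
    double      m = -ndistinct * log(fpr) / (M_LN2 * M_LN2);

    *nbits = (uint64_t) ceil(m);
    *nhashes = (int) round((m / ndistinct) * M_LN2);
}

E.g. 1M distinct keys at a 1% false positive target works out to roughly
9.6 bits per key (about 1.2 MB) and 7 hash functions, which is the kind of
number you would then compare against the CPU cache size.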


>> As I said, X should not be a portion of work_mem, because that has
>> only a weak relationship to what really matters.
> I agree a fixed fraction of work_mem may not be the right thing, but the
> goal was to make the bloom filter part of the Hash memory budget, i.e.
>     bloom filter + hash table <= work_mem
> (which I think we agree should be the case), without increasing the
> number of batches too much. For example, if you size the filter ignoring
> this, and it ends up being 90% of work_mem, you may need to do the hash
> join in 128 batches instead of just 16. Or something like that.
> Maybe that would still be a win, though. Firstly, the higher number of
> batches may not have a huge impact - in one case we need to serialize
> 15/16 and in the other one 127/128. That's 93% vs. 99%. And if the more
> accurate filter allows us to discard much more data from the outer
> relation ...
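
To make that trade-off concrete, a toy model of the batch count (the real
sizing logic in nodeHash.c is more involved; this only shows the shape of
the problem):

/*
 * Toy model only: if the Bloom filter is carved out of work_mem, the
 * hash table gets the remainder, and the number of batches (a power
 * of two) has to grow accordingly.
 */
static int
toy_nbatch(double inner_bytes, double work_mem_bytes, double filter_bytes)
{
    double      avail = work_mem_bytes - filter_bytes;
    int         nbatch = 1;

    if (avail <= 0)
        return -1;              /* filter alone overflows work_mem */

    while (nbatch * avail < inner_bytes)
        nbatch *= 2;
    return nbatch;
}

With 64 MB of work_mem and a 1 GB inner side this gives 16 batches with no
filter, and 128 once the filter eats 7/8 of work_mem - the kind of jump
described above.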

Let me reiterate: you can avoid both issues with scalable bloom filters [1].
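
To spell out what the paper proposes: start with a small slice, and whenever
the current slice fills up to the capacity it was sized for, add a new slice
with a larger capacity and a geometrically tighter false positive target, so
the compound false positive rate stays bounded (about twice the first
slice's target with a tightening ratio of 1/2) without needing a good
ndistinct estimate up front. Inserts only touch the newest slice; lookups
probe all of them. A bare-bones sketch, with the per-slice primitives
assumed (any ordinary Bloom filter implementation would do):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* assumed per-slice primitives - any plain Bloom filter works */
typedef struct bloom_slice bloom_slice;
extern bloom_slice *slice_create(uint64_t capacity, double fpr);
extern void slice_add(bloom_slice *slice, uint64_t hash);
extern bool slice_test(const bloom_slice *slice, uint64_t hash);

typedef struct
{
    bloom_slice   **slices;
    int             nslices;
    uint64_t        capacity;   /* capacity of the newest slice */
    uint64_t        nitems;     /* items added to the newest slice */
    double          fpr;        /* FP target of the newest slice */
} scalable_bloom;   /* zero the struct, then seed capacity and fpr */

static void
sbloom_add(scalable_bloom *sb, uint64_t hash)
{
    if (sb->nslices == 0 || sb->nitems >= sb->capacity)
    {
        /* newest slice is full - add a larger, stricter one */
        if (sb->nslices > 0)
        {
            sb->capacity *= 2;  /* growth factor s = 2 */
            sb->fpr *= 0.5;     /* tightening ratio r = 1/2 */
        }
        sb->slices = realloc(sb->slices,
                             (sb->nslices + 1) * sizeof(bloom_slice *));
        sb->slices[sb->nslices++] = slice_create(sb->capacity, sb->fpr);
        sb->nitems = 0;
    }
    slice_add(sb->slices[sb->nslices - 1], hash);
    sb->nitems++;
}

static bool
sbloom_test(const scalable_bloom *sb, uint64_t hash)
{
    /* an element may live in any slice, so probe them all */
    for (int i = 0; i < sb->nslices; i++)
        if (slice_test(sb->slices[i], hash))
            return true;
    return false;
}

The cost is that a lookup touches every slice, so for the hash join case you
would want the growth factor large enough to keep the slice count low.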

An HLL can be used to estimate the set size; the paper makes no mention of
it, probably because it assumes only distinct items are added to the set.
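
To be explicit about the HLL part: feeding the same hash values into a small
register array gives a cheap running ndistinct estimate, which is what would
drive when and how much the filter needs to grow. A bare-bones sketch,
leaving out the small/large-range corrections from the HLL paper:

#include <math.h>
#include <stdint.h>

#define HLL_BITS    11              /* 2^11 = 2048 registers, ~2 kB */
#define HLL_REGS    (1 << HLL_BITS)

typedef struct
{
    uint8_t     regs[HLL_REGS];     /* zero-initialized */
} hll;

static void
hll_add(hll *h, uint64_t hash)
{
    uint32_t    idx = (uint32_t) (hash >> (64 - HLL_BITS));
    uint64_t    rest = (hash << HLL_BITS) | 1;     /* avoid clz(0) */
    uint8_t     rank = (uint8_t) (__builtin_clzll(rest) + 1);

    if (rank > h->regs[idx])
        h->regs[idx] = rank;
}

static double
hll_estimate(const hll *h)
{
    double      alpha = 0.7213 / (1.0 + 1.079 / HLL_REGS);
    double      sum = 0.0;

    for (int i = 0; i < HLL_REGS; i++)
        sum += pow(2.0, -(double) h->regs[i]);

    /* raw estimate; the paper adds corrections for small/large counts */
    return alpha * HLL_REGS * HLL_REGS / sum;
}

A couple of kilobytes of registers gets you roughly 2-3% standard error,
which is plenty for sizing decisions like this.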

[1] http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
