Re: Bloom filter Pushdown Optimization for Merge Join

2022-10-12 Thread Zhihong Yu
On Wed, Oct 12, 2022 at 3:36 PM Lyu Pan  wrote:

> Hello Zhihong Yu & Tomas Vondra,
>
> Thank you so much for your review and feedback!
>
> We made some updates based on previous feedback and attached the new
> patch set. Due to time constraints, we didn't get to resolve all the
> comments, and we'll continue to improve this patch.
>
> > In this prototype, the cost model is based on an assumption that there is
> > a linear relationship between the performance gain from using a semijoin
> > filter and the estimated filtering rate:
> > % improvement to Merge Join cost = 0.83 * estimated filtering rate - 0.137.
> >
> > How were the coefficients (0.83 and 0.137) determined?
> > I guess they were based on the results of running a certain workload.
>
> Right, the coefficients (0.83 and 0.137) were determined based on some
> preliminary testing. The current costing model is pretty naive, and
> we'll work on a more robust costing model in future work.
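Taking the quoted formula at face value, it can be read as the sketch below (the function name and the clamping are illustrative assumptions, not the patch's code). Note that with these coefficients the predicted improvement only becomes positive once the estimated filtering rate exceeds roughly 16.5% (0.137 / 0.83):

    /*
     * Hedged sketch of the prototype's linear cost model:
     *   % improvement to Merge Join cost = 0.83 * estimated filtering rate - 0.137
     * The clamp at zero is an assumption for illustration.
     */
    static double
    semijoin_filter_cost_multiplier(double est_filtering_rate)
    {
        double improvement = 0.83 * est_filtering_rate - 0.137;

        if (improvement < 0.0)
            improvement = 0.0;      /* filter predicted not to pay off */

        /* Scale the plain merge join cost by this factor. */
        return 1.0 - improvement;
    }

For example, an estimated filtering rate of 0.5 predicts a cost reduction of 0.83 * 0.5 - 0.137 = 0.278, i.e. about 28%.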
>
>
> > I agree, in principle, although I think the current logic / formula is a
> > bit too crude and fitted to the simple data used in the test. I think
> > this needs to be formulated as a regular costing issue, considering
> > stuff like cost of the hash functions, and so on.
> >
> > I think this needs to do two things:
> >
> > 1) estimate the cost of building the bloom filter - This shall depend on
> > the number of rows in the inner relation, number/cost of the hash
> > functions (which may be higher for some data types), etc.
> >
> > 2) estimate improvement for the probing branch - Essentially, we need to
> > estimate how much we save by filtering some of the rows, but this also
> > needs to include the cost of probing the bloom filter.
> >
> > This will probably require some improvements to the lib/bloomfilter, in
> > order to estimate the false positive rate - this may matter a lot for
> > large data sets and small work_mem values. The bloomfilter library
> > simply reduces the size of the bloom filter, which increases the false
> > positive rate. At some point it'll start reducing the benefit.
> >
>
> These suggestions make a lot of sense. The current costing model is
> definitely not good enough, and we plan to work on a more robust
> costing model as we continue to improve the patch.
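For the false-positive-rate question, the standard approximation for a bloom filter with m bits, k hash functions, and n inserted elements is p = (1 - e^(-k*n/m))^k. A minimal sketch of the estimator such a costing model could consult follows; this is not an existing lib/bloomfilter function, only an illustration of what the suggested improvement would expose:

    #include <math.h>

    /*
     * Hedged sketch: estimated false positive rate of a bloom filter with
     * m bits and k hash functions after n insertions, using the standard
     * approximation p = (1 - e^(-k*n/m))^k.
     */
    static double
    bloom_estimated_fpr(double m_bits, double k_hashes, double n_elements)
    {
        return pow(1.0 - exp(-k_hashes * n_elements / m_bits), k_hashes);
    }

This makes the work_mem interaction concrete: shrinking m for a fixed n drives the estimate up quickly, which is exactly the point at which probing the filter stops paying for itself.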
>
>
> > OK. Could also build the bloom filter in shared memory?
> >
>
> We considered this approach but decided against it: if all worker
> processes share the same bloom filter in shared memory, we would need
> to lock and unlock the filter frequently to avoid race conditions. So
> we decided to have each worker process create its own bloom filter.
>
>
> > IMHO we shouldn't draw too many conclusions from these examples. Yes,
> > it shows merge join can be improved, but only for cases where a hash
> > join works better, so we wouldn't use merge join anyway.
> >
> > I think we should try constructing examples where either merge join wins
> > already (and gets further improved by the bloom filter), or would lose
> > to hash join and the bloom filter improves it enough to win.
> >
> > AFAICS that requires a join of two large tables - large enough that hash
> > join would need to be batched, or pre-sorted inputs (which eliminates
> > the explicit Sort, which is the main cost in most cases).
> >
> > The current patch only works with sequential scans, which eliminates the
> > second (pre-sorted) option. So let's try the first one - can we invent
> > an example with a join of two large tables where a merge join would win?
> >
> > Can we find such an example in existing benchmarks like TPC-H/TPC-DS?
> >
>
> Agreed. The current examples are only intended to show that using
> bloom filters in merge join could improve merge join performance
> in some cases. We are working on testing more examples in which a
> merge join with a bloom filter could outperform a hash join, which
> should be more persuasive.
>
>
> > The bloom filter is built by the first seqscan (on t0), and then used by
> > the second seqscan (on t1). But this only works because we always run
> > the t0 scan to completion (because we're feeding it into Sort) before we
> > start scanning t1.
> >
> > But when the scan on t1 switches to an index scan, it's over - we'd be
> > building the filter without being able to probe it, and when we finish
> > building it we no longer need it. So this seems pretty futile.
> >
> > It might still improve plans like
> >
> >->  Merge Join
> >  Merge Cond: (t0.c1 = t1.c1)
> >  SemiJoin Filter Created Based on: (t0.c1 = t1.c1)
> >  SemiJoin Estimated Filtering Rate: 1.
> > ->  Sort
> >Sort Key: t0.c1
> >->  Seq Scan on t0
> >  ->  Index Scan on t1
> >
> > But I don't know how common/likely that actually is. I'd expect to have
> > an index on both sides, but perhaps I'm wrong.
> >
> > This is why hashjoin seems like a more natural fit for the bloom filter,
> > BTW, because there we have a guarantee the inner 

Re: Bloom filter Pushdown Optimization for Merge Join

2022-09-30 Thread Zhihong Yu
On Fri, Sep 30, 2022 at 3:44 PM Zheng Li  wrote:

> Hello,
>
> A bloom filter provides early filtering of rows that cannot be joined
> before they would reach the join operator; the optimization is also
> called a semi join filter (SJF) pushdown. Such a filter can be created
> when one child of the join operator must materialize its derived table
> before the other child is evaluated.
>
> For example, a bloom filter can be created using the join keys of
> the build side/inner side of a hash join or the outer side of a merge
> join; the bloom filter can then be used to pre-filter rows on the
> other side of the join operator during the scan of the base relation.
> The thread about “Hash Joins vs. Bloom Filters / take 2” [1] is a good
> discussion of using such an optimization for hash join, without going
> into pushing the filter down, where its performance gain could be
> further increased.
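To make the pre-filtering step concrete, below is a minimal sketch of a per-tuple probe against PostgreSQL's lib/bloomfilter API; the helper name and the key handling are illustrative assumptions, not the patch's actual code:

    #include "postgres.h"
    #include "lib/bloomfilter.h"

    /*
     * Hedged sketch: decide whether a scanned tuple survives a pushed-down
     * semijoin filter.  bloom_lacks_element() never reports an added element
     * as absent, so filtering here cannot drop genuine join matches; false
     * positives only let some non-matching rows through.
     */
    static bool
    tuple_survives_semijoin_filter(bloom_filter *filter,
                                   unsigned char *join_key, size_t keylen)
    {
        if (filter == NULL)
            return true;        /* no filter pushed down: keep the tuple */

        return !bloom_lacks_element(filter, join_key, keylen);
    }

A SeqScan node holding such a filter would call this per tuple and skip rows for which it returns false, which is the early filtering described above.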
>
> We worked on prototyping bloom filter pushdown for both hash join and
> merge join. Attached is a patch set for bloom filter pushdown for
> merge join. We also plan to send the patch for hash join once we have
> it rebased.
>
> Here is a summary of the patch set:
> 1. Bloom Filter Pushdown optimizes Merge Join by filtering rows early
> during the table scan instead of later on.
>    - The bloom filter is pushed down along the execution tree to
> the target SeqScan nodes.
>    - Experiments show that this optimization can speed up Merge
> Join by up to 36%.
>
> 2. The planner makes the decision to use the bloom filter based on the
> estimated filtering rate and the expected performance gain (a sketch of
> such an estimate follows this list).
>    - The planner accomplishes this by estimating four numbers per
> variable: the total number of rows of the relation, the number of
> distinct values of the variable, and the minimum and maximum value of
> the variable (when applicable). Using these numbers, the planner
> estimates the filtering rate of a potential filter.
>    - Because actually creating and applying the filter adds more
> operations, there is a minimum threshold of filtering below which the
> filter would not actually be useful. Based on testing, we check whether
> the estimated filtering rate is higher than 35%, and that informs our
> decision to use a filter or not.
>
> 3. If using a bloom filter, the planner also adjusts the expected cost
> of Merge Join based on expected performance gain.
>
> 4. Capability to build the bloom filter in parallel in the case of a
> parallel SeqScan. This is done efficiently by populating a local bloom
> filter for each parallel worker and then taking a bitwise OR over all
> the local bloom filters to form a shared bloom filter at the end of
> the parallel SeqScan (a sketch of this merge also follows this list).
>
> 5. The optimization is GUC controlled, with settings of
> enable_mergejoin_semijoin_filter and force_mergejoin_semijoin_filter.
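As referenced in item 2 above, here is a rough sketch of how a filtering rate could fall out of those per-variable estimates under a uniform-distribution assumption (ndistinct could refine the estimate within the overlap but is omitted for brevity); this illustrates the idea rather than reproducing the patch's actual estimator:

    #include <math.h>

    /*
     * Hedged sketch: estimate the fraction of probe-side rows that a
     * semijoin filter built on the other side of the join would remove,
     * assuming roughly uniform value distributions.
     */
    static double
    estimate_filtering_rate(double probe_min, double probe_max,
                            double build_min, double build_max)
    {
        double probe_range = probe_max - probe_min;
        double lo = fmax(probe_min, build_min);
        double hi = fmin(probe_max, build_max);
        double overlap_frac;

        if (probe_range <= 0.0)
            return 0.0;         /* degenerate range: assume nothing filtered */

        /* Fraction of the probe side's range covered by the build side. */
        overlap_frac = (hi > lo) ? (hi - lo) / probe_range : 0.0;

        /*
         * Rows outside the overlap are guaranteed non-matches; rows inside
         * it survive the filter (ignoring false positives).
         */
        return 1.0 - overlap_frac;
    }

On the t0/t1 example below (probe range 1..10M, build range 4M..14M), this yields 1 - 6M/10M = 40%, matching the stated theoretical filtering rate.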
>
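And for the merge step referenced in item 4: OR-ing together bloom filters created with identical size and hash functions yields exactly the filter a single serial build would have produced, so each worker can populate a private filter lock-free and synchronize only once at the end. A minimal sketch with simplified types, not the patch's actual code:

    #include <stddef.h>

    /*
     * Hedged sketch: fold per-worker local bloom filter bitmaps into the
     * shared bitmap with a bitwise OR.  Assumes every filter was created
     * with the same size and the same hash functions.
     */
    static void
    merge_local_bloom_filters(unsigned char *shared_bitmap,
                              unsigned char **local_bitmaps,
                              int nworkers, size_t nbytes)
    {
        for (int w = 0; w < nworkers; w++)
        {
            for (size_t i = 0; i < nbytes; i++)
                shared_bitmap[i] |= local_bitmaps[w][i];
        }
    }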
> We found in experiments that there is a significant improvement
> when using the bloom filter during Merge Join. One experiment involved
> joining two large tables while varying the theoretical filtering rate
> (TFR) between the two tables; the TFR is defined as the percentage by
> which the two datasets are disjoint. Both tables in the merge join were
> the same size. We varied the TFR to see how it changes the
> effectiveness of the filtering optimization.
>
> For example, let’s imagine t0 has 10 million rows, which contain the
> numbers 1 through 10 million randomly shuffled, and t1 has the
> numbers 4 million through 14 million randomly shuffled. Then the TFR
> for a join of these two tables is 40%, since 40% of each table is
> disjoint from the other table (1 through 4 million for t0, 10 million
> through 14 million for t1).
>
> Here is the performance test result joining two tables:
> TFR: theoretical filtering rate
> EFR: estimated filtering rate
> AFR: actual filtering rate
> HJ: hash join
> MJ Default: default merge join
> MJ Filter: merge join with bloom filter optimization enabled
> MJ Filter Forced: merge join with bloom filter optimization forced
>
> TFR    EFR     AFR     HJ     MJ Default   MJ Filter   MJ Filter Forced
> -----------------------------------------------------------------------
> 10     33.46    7.41   6529   22638        21949       23160
> 20     37.27   14.85   6483   22290        21928       21930
> 30     41.32   22.25   6395   22374        20718       20794
> 40     45.67   29.7    6272   21969        19449       19410
> 50     50.41   37.1    6210   21412        18222       18224
> 60     55.64   44.51   6052   21108        17060       17018
> 70     61.59   51.98   5947   21020        15682       15737
> 80     68.64   59.36   5761   20812        14411       14437
> 90     77.83   66.86   5701   20585        13171       13200
>
> Table. Execution Time (ms) vs Filtering Rate (%) for Joining Two
> Tables of 10M Rows.
>
> Attached you can find figures for the same performance test and a SQL
> script to reproduce it.
>
> The first thing to