Re: Employ bloom filters in joins

2023-05-04 Thread Chunwei Lei
It sounds like a Runtime Filter[1], which is commonly used by many systems. As Stamatis mentioned, integrating it into the cost model is much more challenging than implementing the rule. Fortunately, we can refer to the practices of other systems. [1]
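
For readers new to the idea, here is a minimal sketch of what a runtime filter does at execution time, assuming an in-memory emp/dept join like the one in the original post. It is not Calcite code and does not use any Calcite API: `RuntimeFilterSketch`, `IntBloomFilter`, `Emp`, `Dept`, and `joinWithRuntimeFilter` are hypothetical names, and the filter size (1024 bits, 3 hashes) is an arbitrary illustrative choice.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuntimeFilterSketch {
    /** Tiny Bloom filter over int keys (illustrative, not tuned). */
    static final class IntBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        IntBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Double hashing: position i is (h1 + i * h2) mod numBits.
        private int index(int key, int i) {
            int h1 = key * 0x9E3779B9;
            int h2 = Integer.rotateLeft(h1, 15) | 1;
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(int key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** May return false positives, never false negatives. */
        boolean mightContain(int key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    static final class Emp {
        final String name;
        final int deptno;
        Emp(String name, int deptno) { this.name = name; this.deptno = deptno; }
    }

    static final class Dept {
        final int deptno;
        final String dname;
        Dept(int deptno, String dname) { this.deptno = deptno; this.dname = dname; }
    }

    /** Returns names of employees whose department has the given name. */
    static List<String> joinWithRuntimeFilter(List<Emp> emps, List<Dept> depts, String dname) {
        // Build side: scan dept once, collecting exact keys and the Bloom filter.
        IntBloomFilter filter = new IntBloomFilter(1024, 3);
        Set<Integer> buildKeys = new HashSet<>();
        for (Dept d : depts) {
            if (d.dname.equals(dname)) {
                buildKeys.add(d.deptno);
                filter.add(d.deptno);
            }
        }
        // Probe side: the cheap Bloom check drops most non-matching rows early;
        // the exact lookup afterwards removes any false positives.
        List<String> result = new ArrayList<>();
        for (Emp e : emps) {
            if (filter.mightContain(e.deptno) && buildKeys.contains(e.deptno)) {
                result.add(e.name);
            }
        }
        return result;
    }
}
```

In a real engine the Bloom filter would be shipped to the probe-side scan (possibly on another node) so that rows are discarded before they reach the join; the exact check here stands in for the join itself. Whether the extra build pass pays off depends on the selectivity of the build side, which is the cost-model question raised in this thread.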

Re: Employ bloom filters in joins

2023-04-29 Thread Stamatis Zampetakis
The topic is really interesting, thanks for sharing your ideas Zoltan! I see no drawbacks to adding the new transformation rule; it is definitely worth having! However, adding it to the default rule set or using it in a cost-based decision may require much more work/thinking. Calcite's built-in cost

Re: Employ bloom filters in joins

2023-04-28 Thread Julian Hyde
It would be great to have such a rule. People who don’t want it can disable it; and people who enable it can use a cost function. Some systems that use Bloom filters (and other probabilistic filters) don’t execute the query twice but use a side-channel to send the Bloom filter from one scan to

Employ bloom filters in joins

2023-04-28 Thread Zoltan Haindrich
Hi, I was wondering about the pros and cons of having a Calcite rule which could rewrite a join to utilize bloom filters; something like: select e.* from emp e join dept d on (e.deptno = d.deptno) where d.dname = 'Sales'; into something like: select e.* from (