Re: When will be the stats based join selector be implemented?

Li Gao Thu, 08 Oct 2015 13:16:25 -0700

Hi Maryann,

Those are great pointers. Thanks for the detailed descriptions.


Thanks,
Li


On Thu, Oct 8, 2015 at 1:08 PM, Maryann Xue <[email protected]> wrote:

> Hi Li,
>
> What you are asking about seems to be more a matter of how Calcite works.
>
> Anyway, in short, Calcite works with rules. You can think of applying a
> set of rules as producing a number of different query plans you could
> go with. Calcite then calculates the cumulative cost of each candidate
> (this is only the idea; the implementation differs a little) and picks
> the cheapest plan among these candidates.
>
> So for example, we have several different implementations of joins in
> Phoenix, and those correspond to different physical operators in Calcite
> (PhoenixServerJoin.java, PhoenixClientJoin.java). We override the cost
> function ("computeSelfCost"), trying to model the runtime overhead as
> closely as possible. Both versions (using PhoenixServerJoin and
> PhoenixClientJoin) exist among the candidates, and which one comes out
> cheaper usually depends on the join's input. For example, if both sides of
> the join operator are sorted on the join keys, the merge join is most
> likely going to be chosen.
>
> There are quite a lot of general optimization rules provided by Calcite
> already (in the Calcite project), like the filter push down rule. There are
> also some Phoenix-specific rules under org.apache.phoenix.calcite.rel.rules.
>
> For examples, you can look at CalciteIT.java, which contains some basic
> test cases as well as some interesting stuff.
>
>
> Thanks,
> Maryann
>
>
>
> On Thu, Oct 8, 2015 at 2:37 PM, Li Gao <[email protected]> wrote:
>
>> Hi Maryann,
>>
>> I am wondering if you could help me understand how the Phoenix calcite
>> branch uses Calcite to do query optimization. Specifically:
>>
>>    - some pointers to the code where the joins can detect whether a hash
>>    join or a sort merge join should be used for a given case
>>    - pointers to how the cost is calculated in the code
>>    - pointers to how the filter predicate push down is implemented in
>>    the code
>>
>> Examples would be greatly appreciated.
>>
>> Thanks,
>> Li
>>
>>
>> On Mon, Oct 5, 2015 at 5:49 PM, Maryann Xue <[email protected]>
>> wrote:
>>
>>> Hi Li,
>>>
>>> Sorry, I forgot to mention that this calcite branch now depends on
>>> Apache Calcite's master branch instead of any of its releases. So you need
>>> to check out Calcite (git://github.com/apache/incubator-calcite.git)
>>> first and run `mvn install` for that project before going back to the
>>> Phoenix project and running mvn commands.
>>>
>>> On Mon, Oct 5, 2015 at 6:43 PM, Li Gao <[email protected]> wrote:
>>>
>>>> Hi Maryann,
>>>>
>>>> This looks great. Thanks for pointing me to the right branch! For some
>>>> reason I am getting the following errors when I run mvn package:
>>>>
>>>> [WARNING] The POM for
>>>> org.apache.calcite:calcite-avatica:jar:1.5.0-incubating-SNAPSHOT is
>>>> missing, no dependency information available
>>>>
>>>> [WARNING] The POM for
>>>> org.apache.calcite:calcite-core:jar:1.5.0-incubating-SNAPSHOT is missing,
>>>> no dependency information available
>>>>
>>>> [WARNING] The POM for
>>>> org.apache.calcite:calcite-core:jar:tests:1.5.0-incubating-SNAPSHOT is
>>>> missing, no dependency information available
>>>>
>>>> [WARNING] The POM for
>>>> org.apache.calcite:calcite-linq4j:jar:1.5.0-incubating-SNAPSHOT is missing,
>>>> no dependency information available
>>>>
>>>> Where can I find these dependencies?
>>>>
>>>> Thanks,
>>>>
>>>> Li
>>>>
>>>>
>>>>
>>>> On Mon, Oct 5, 2015 at 12:19 PM, Maryann Xue <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Li,
>>>>>
>>>>> We are now moving towards integrating with Calcite for our stats-based
>>>>> optimization. You can check out our calcite
>>>>> <https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite>
>>>>> branch and play with it if you are interested. It's still under
>>>>> development, but you can already see some amazing optimization examples in
>>>>> our test file CalciteIT.java. You can also go to
>>>>> http://www.slideshare.net/HBaseCon/ecosystem-session-2-49044349 for
>>>>> more information.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 5, 2015 at 2:08 PM, Li Gao <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am currently looking into getting optimized joins based on table
>>>>>> stats. I noticed that QueryCompiler, at lines 232-234, still says
>>>>>> "TODO":
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/phoenix/blob/4.x-HBase-1.0/phoenix-core/src/main/java/org/apache/phoenix/compile/QueryCompiler.java
>>>>>>
>>>>>> We need the join selector to be enabled based on the sizes of the
>>>>>> LHS and RHS tables.
>>>>>>
>>>>>> Thanks,
>>>>>> Li
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
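
The cost-based selection Maryann describes in this thread (several candidate join plans, a cost per candidate, the cheapest one wins) can be illustrated with a minimal sketch in plain Java. It is a toy model only: it does not use Calcite or Phoenix classes, and every name in it (JoinPlanSketch, Input, Candidate, sortCost, hashJoin, mergeJoin, cheapest) and every cost formula is a hypothetical assumption made for illustration, not what PhoenixServerJoin, PhoenixClientJoin, or "computeSelfCost" actually compute.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of cost-based join selection; NOT Phoenix or Calcite code. */
final class JoinPlanSketch {

    /** A join input: estimated row count and whether it is sorted on the join key. */
    static final class Input {
        final double rows;
        final boolean sortedOnJoinKey;

        Input(double rows, boolean sortedOnJoinKey) {
            this.rows = rows;
            this.sortedOnJoinKey = sortedOnJoinKey;
        }
    }

    /** A candidate physical plan together with its estimated cost. */
    static final class Candidate {
        final String name;
        final double cost;

        Candidate(String name, double cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    // Assumed cost of sorting n rows: n * log2(n); zero if already sorted.
    static double sortCost(Input in) {
        if (in.sortedOnJoinKey || in.rows <= 1) {
            return 0.0;
        }
        return in.rows * (Math.log(in.rows) / Math.log(2));
    }

    // Assumed hash join cost: build a hash table on the right side
    // (weighted 2x), then probe it once per left row.
    static Candidate hashJoin(Input left, Input right) {
        return new Candidate("hash-join", left.rows + 2.0 * right.rows);
    }

    // Assumed merge join cost: sort any unsorted side, then one linear pass.
    static Candidate mergeJoin(Input left, Input right) {
        double cost = sortCost(left) + sortCost(right) + left.rows + right.rows;
        return new Candidate("merge-join", cost);
    }

    // Enumerate the candidates and keep the one with the lowest cost.
    static Candidate cheapest(Input left, Input right) {
        List<Candidate> candidates =
                Arrays.asList(hashJoin(left, right), mergeJoin(left, right));
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            if (c.cost < best.cost) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Input sorted = new Input(1000, true);
        Input unsorted = new Input(1000, false);
        System.out.println("both sorted:   " + cheapest(sorted, sorted).name);
        System.out.println("both unsorted: " + cheapest(unsorted, unsorted).name);
    }
}
```

With these toy numbers, inputs that are already sorted on the join key make the merge join the cheaper candidate, while unsorted inputs make the sort expensive enough that the hash join wins. In the real branch the candidates come from Calcite rules and the costs from the operators' own cost functions, so actual choices may differ.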
