Hello,

I am trying to build my own query planner based on Hive's implementation of
the Calcite planner
(https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java).
I have split my optimization procedure into phases, similar to Hive's
planner. First, I apply some pre-join-order optimizations. Then I use
LoptOptimizeJoinRule.INSTANCE for join ordering, and finally I apply some
rules that don't need statistics to get my final plan. I face two problems:

1) When I have a query like this:
     "select *  "
          + "from  s.products join s.orders  "
          + "on s.orders.productid = s.products.productid  "
         + " where units>10 and description < 20 "
        );

I get this plan after applying LoptOptimizeJoinRule:
LogicalProject(rowtime=[$5], productid=[$6], description=[$7], rowtime0=[$0], orderid=[$1], productid0=[$2], units=[$3], customerid=[$4])
  LogicalJoin(condition=[=($6, $2)], joinType=[inner])
    LogicalFilter(condition=[>($3, 10)])
      LogicalTableScan(table=[[s, orders]])
    LogicalFilter(condition=[<($2, 20)])
      LogicalTableScan(table=[[s, products]])

The final plan has an extra Project on top of the Join. As far as I can
tell, this projection serves no purpose, and I want to get rid of it.
I tried to write a rule that transforms project(join) -> join when both have
the same output schema, but I couldn't find out how to get the output schema
of the join operator. Am I doing something wrong with the order or the way I
apply the rules? Is there an easy way to get rid of this top Project?
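
For reference, the top Project in the plan above reads $5, $6, $7 first and
then $0 to $4. It therefore appears to reorder the Join's fields rather than
copy them, presumably because the join inputs were swapped and the Project
restores the original column order. The sketch below is plain Java, not
Calcite API. ProjectionCheck and isIdentity are hypothetical names, and a
Project is modelled simply as the list of input field indexes it emits. In
Calcite itself the row type of any operator, including the Join, should be
available through RelNode.getRowType().

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Calcite: a Project is modelled as the list
// of input field indexes it emits, e.g. [$5, $6, $7, $0, ...] -> [5, 6, 7, 0, ...].
final class ProjectionCheck {

    // A Project is removable only if it emits every input field, in input order.
    static boolean isIdentity(List<Integer> projectedFields, int inputFieldCount) {
        if (projectedFields.size() != inputFieldCount) {
            return false;
        }
        for (int i = 0; i < inputFieldCount; i++) {
            if (projectedFields.get(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Field references of the top LogicalProject in the plan above.
        List<Integer> topProject = Arrays.asList(5, 6, 7, 0, 1, 2, 3, 4);
        System.out.println(isIdentity(topProject, 8)); // prints false

        // A Project that copies its input unchanged would be removable.
        System.out.println(isIdentity(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7), 8)); // prints true
    }
}
```

Under this model the Project in the plan is a permutation, not an identity,
so a rule that removes only trivial projections would be expected to keep it.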

2) After I have applied LoptOptimizeJoinRule and obtained my optimized join
order, I can't use JoinCommuteRule, because the HepPlanner runs forever.
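
A plausible cause, stated as an assumption rather than a diagnosis: swapping
the inputs of a join produces another join that the same rule matches again,
so a planner that applies rules until nothing changes never reaches a fixed
point. The toy model below is plain Java, not Calcite code, and all names in
it are hypothetical. It only shows how a match limit bounds such a rule.
Calcite's HepProgramBuilder offers addMatchLimit(int) for this purpose.

```java
// Toy model, not Calcite code: a join is an ordered pair of input names and
// the "commute" rule swaps them. All names here are hypothetical.
final class CommuteLoopDemo {

    // Fires the swap rule until the match limit is reached and returns the
    // number of firings. Every swap yields another valid match, so without
    // the limit this loop would have no reason to stop.
    static int applyCommute(String[] join, int matchLimit) {
        int firings = 0;
        while (firings < matchLimit) {
            String left = join[0];
            join[0] = join[1];
            join[1] = left;
            firings++;
        }
        return firings;
    }

    public static void main(String[] args) {
        String[] join = {"orders", "products"};
        int firings = applyCommute(join, 3);
        // prints "3 firings -> products JOIN orders"
        System.out.println(firings + " firings -> " + join[0] + " JOIN " + join[1]);
    }
}
```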

Thank you in advance,
George
