Never mind, I figured it out by looking at the Calcite tests :)



On 9/22/16, 9:26 PM, "Vineet Garg" <vg...@hortonworks.com> wrote:

>Hi Julian,
>
>Thank you for your response. I have a few follow-up questions:
>
>> Yes. Remember it should return only the correlating variables it sets, not 
>> those it inherits.
>What do you mean by “inherit”? Could you kindly provide an example to elaborate?
>
>> No it shouldn’t necessarily. The id must be unique within the whole query.
>If the id is unique, how is a correlated variable in the inner query bound to the 
>outer query? I.e., how would Calcite figure out which variable in the outer query a 
>particular correlated variable refers to?
>
>Vineet
>
>From: Julian Hyde <jh...@apache.org>
>Date: Thursday, September 22, 2016 at 3:05 PM
>To: default <vg...@hortonworks.com>
>Cc: "dev@calcite.apache.org" <dev@calcite.apache.org>
>Subject: Re: Subquery de-correlation
>
>Vineet,
>
>Thanks for your message. See my responses inline.
>
>On Sep 21, 2016, at 5:11 PM, Vineet Garg <vg...@hortonworks.com> wrote:
>
>Hello Julian/Calcite community,
>
>I am working on adding subquery support in Hive using Calcite.  From what I 
>have read and understood so far, Calcite requires Hive to create a RexSubqueryNode 
>corresponding to a subquery and then call SubQueryRemoveRule to get rid of the 
>RexSubqueryNode and change it to a join. This seems to be working for 
>uncorrelated queries, where SubQueryRemoveRule creates Aggregate + Join to get 
>rid of the RexSubqueryNode. But I am running into the following issues with 
>correlated queries (note that I am using the FILTER rule):
>
>  *   Looking at the SubQueryRemoveRule code, it should create a Correlate node 
> if it finds any correlation in the given filter. To find out whether the filter 
> has a correlation, getVariablesSet is called on the filter, which supposedly 
> should return the set of correlated variables, but it always returns an empty 
> set because Filter does not implement this method. Shouldn’t Filter implement 
> this method to return the appropriate correlated variables?
>
>Yes. Remember it should return only the correlating variables it sets, not 
>those it inherits.
>
>  *   The comments in SubQueryRemoveRule mention that “The correlate can be 
> removed using RelDecorrelator”. But I don’t see SubQueryRemoveRule using 
> RelDecorrelator to de-correlate the given query. Should SubQueryRemoveRule call 
> it? If not, is doing de-correlation immediately after SubQueryRemoveRule 
> appropriate?
>
>I would tend to invoke RelDecorrelator on the whole tree. But I see no reason 
>in principle why it can’t be called on a section of the tree, as long as that 
>section is self-contained (i.e. no unbound correlating variables).
>
>Here is what I have done so far for correlated queries. Could you please 
>comment on whether this is right?
>
>  *   While creating the RexSubqueryNode and the RelNode for the subquery, I am 
> creating a RexCorrelVariable. RexCorrelVariable needs a correlation id, and 
> CorrelationId requires an integer id. Should this id be the same as the index of 
> the correlated column in the outer table?
>
>No it shouldn’t necessarily. The id must be unique within the whole query.
>
>  *   Hive has a HiveFilter, which extends Filter. I implemented the 
> getVariablesSet method to look at the condition and return all correlated 
> variables in the condition’s RelNode. Does this sound correct?
>
>Yes, sounds right.
>
>  *   I am calling RelDecorrelator’s decorrelateQuery immediately after 
> calling SubQueryRemoveRule.  After I implemented getVariablesSet in HiveFilter, 
> SubQueryRemoveRule seems to create the appropriate LogicalCorrelate for 
> correlated queries, but decorrelateQuery throws an exception.
>
>I can’t help too much if you are getting errors in Hive-land. This stuff is so 
>complicated I strongly suggest unit tests. Don’t do anything “new” in Hive, 
>make sure that it all works on Calcite logical nodes. Write tests in 
>RelOptRulesTest.
>
>Julian
>
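
For readers following the thread: the distinction between the correlating variables a node "sets" and those it "inherits", and the requirement that a subtree be self-contained before decorrelation, can be illustrated with a toy model. This is only a sketch under stated assumptions. The classes and methods below (CorrelationSketch, Node, unbound, idsAreUnique) are hypothetical and are not Calcite's API, and the subquery plan is modelled as an ordinary input for simplicity. It assumes that a reference is bound to the node that sets the same id, which is why ids need to be unique across the whole query.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; hypothetical classes, not Calcite's RelNode API. */
public class CorrelationSketch {

    /** A plan node that may bind correlation ids and may reference them. */
    static final class Node {
        final String name;
        final Set<Integer> variablesSet;   // ids this node itself binds for its subtree
        final Set<Integer> variablesUsed;  // ids referenced by this node's own expressions
        final List<Node> inputs;           // child plans (subquery plans modelled as inputs)

        Node(String name, Set<Integer> variablesSet, Set<Integer> variablesUsed,
             Node... inputs) {
            this.name = name;
            this.variablesSet = variablesSet;
            this.variablesUsed = variablesUsed;
            this.inputs = List.of(inputs);
        }
    }

    /** Ids referenced in the subtree that nothing in the subtree binds ("inherited"). */
    static Set<Integer> unbound(Node node) {
        Set<Integer> result = new TreeSet<>(node.variablesUsed);
        for (Node input : node.inputs) {
            result.addAll(unbound(input));
        }
        result.removeAll(node.variablesSet);
        return result;
    }

    /** True if no id is bound by more than one node in the tree. */
    static boolean idsAreUnique(Node root) {
        return collectSetIds(root, new TreeSet<>());
    }

    private static boolean collectSetIds(Node node, Set<Integer> seen) {
        for (Integer id : node.variablesSet) {
            if (!seen.add(id)) {
                return false;  // the same id is bound twice
            }
        }
        for (Node input : node.inputs) {
            if (!collectSetIds(input, seen)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // C references ids 0 and 1; B binds id 1; A binds id 0.
        Node c = new Node("C", Set.of(), Set.of(0, 1));
        Node b = new Node("B", Set.of(1), Set.of(), c);
        Node a = new Node("A", Set.of(0), Set.of(), b);

        System.out.println("B sets " + b.variablesSet + ", inherits " + unbound(b));
        System.out.println("A is self-contained: " + unbound(a).isEmpty());
        System.out.println("ids unique: " + idsAreUnique(a));
    }
}
```

In this model, B sets id 1 and merely inherits id 0 from A, so a getVariablesSet-style method on B would report only id 1. The subtree rooted at B is not self-contained, while the tree rooted at A is.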
