Never mind, I figured it out by looking at the Calcite tests :)
On 9/22/16, 9:26 PM, "Vineet Garg" <vg...@hortonworks.com> wrote:

>Hi Julian,
>
>Thank you for your response. I have a few follow-up questions:
>
>Yes. Remember it should return only the correlating variables it sets, not
>those it inherits
>What do you mean by inherit? Could you kindly provide an example to elaborate?
>
>No it shouldn’t necessarily. The id must be unique within the whole query.
>If the id is unique, how is a correlated variable in the inner query bound to
>the outer query? I.e., how would Calcite figure out which variable in the
>outer query a particular correlated variable refers to?
>
>Vineet
>
>From: Julian Hyde <jh...@apache.org>
>Date: Thursday, September 22, 2016 at 3:05 PM
>To: default <vg...@hortonworks.com>
>Cc: "dev@calcite.apache.org" <dev@calcite.apache.org>
>Subject: Re: Subquery de-correlation
>
>Vineet,
>
>Thanks for your message. See my responses inline.
>
>On Sep 21, 2016, at 5:11 PM, Vineet Garg <vg...@hortonworks.com> wrote:
>
>Hello Julian/Calcite community,
>
>I am working on adding subquery support in Hive using Calcite. From what I
>have read/understood so far, Calcite requires Hive to create a RexSubqueryNode
>corresponding to a subquery and then call SubQueryRemoveRule to get rid of
>the RexSubqueryNode and change it to a join. This seems to be working for
>uncorrelated queries, where SubQueryRemoveRule creates Aggregate + Join to get
>rid of the RexSubqueryNode. But I am running into the following issues with
>correlated queries (note that I am using the FILTER rule):
>
>  * Looking at the SubQueryRemoveRule code, it should be creating a Correlate
>    node if it finds any correlation in the given filter. To find whether the
>    given filter has correlation, getVariablesSet is called on the filter,
>    which supposedly should return the set of correlated variables, but it
>    always returns an empty set, as Filter does not implement this method.
>    Shouldn’t Filter implement this method to return the appropriate
>    correlated variables?
>
>Yes. Remember it should return only the correlating variables it sets, not
>those it inherits.
>
>  * Comments in SubQueryRemoveRule mention that “The correlate can be
>    removed using RelDecorrelator”. But I don’t see SubQueryRemoveRule using
>    RelDecorrelator to de-correlate the given query. Should SubQueryRemoveRule
>    call this? If not, is doing de-correlation immediately after
>    SubQueryRemoveRule appropriate?
>
>I would tend to invoke RelDecorrelator on the whole tree. But I see no reason
>in principle why it can’t be called on a section of the tree, as long as that
>section is self-contained (i.e. no unbound correlating variables).
>
>Here is what I have done so far for correlated queries. Could you please
>comment if this is right?
>
>  * While creating the RexSubqueryNode and RelNode for the subquery I am
>    creating a RexCorrelVariable. RexCorrelVariable needs a correlation id.
>    CorrelationId requires an integer id. Should this id be the same as the
>    index of the correlated column in the outer table?
>
>No it shouldn’t necessarily. The id must be unique within the whole query.
>
>  * Hive has a HiveFilter which extends Filter. I implemented the
>    getVariablesSet method to look at the condition and return all correlated
>    variables in the condition’s RelNode. Does this sound correct?
>
>Yes, sounds right.
>
>  * I am calling RelDecorrelator’s decorrelateQuery immediately after
>    calling SubQueryRemoveRule. After implementing getVariablesSet in
>    HiveFilter, SubQueryRemoveRule seems to be creating the appropriate
>    LogicalCorrelate for correlated queries, but decorrelateQuery is throwing
>    an exception.
>
>I can’t help too much if you are getting errors in Hive-land. This stuff is so
>complicated I strongly suggest unit tests. Don’t do anything “new” in Hive,
>make sure that it all works on Calcite logical nodes. Write tests in
>RelOptRulesTest.
>
>Julian
>
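
For readers following the "sets" versus "inherits" question in this thread, here is a minimal toy sketch in plain Java. It does not use Calcite's classes: `CorrelScopeSketch`, `Node`, `variablesUsed`, and `unbound` are hypothetical names made up for illustration, and a subquery is simply modeled as a child node. The assumption is that each node lists only the correlation ids it defines (the role `getVariablesSet` plays in the thread), that ids come from one per-query counter so they are unique across the whole query, and that a reference is bound to whichever enclosing node sets that id. A subtree with no unbound ids is what the thread calls "self-contained".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; these are NOT Calcite's classes or APIs. */
public class CorrelScopeSketch {

  /** A plan node: ids it sets, ids its own expressions reference, and its inputs. */
  static final class Node {
    final String name;
    final Set<String> variablesSet;   // ids this node defines for its subtree
    final Set<String> variablesUsed;  // ids referenced directly by this node
    final List<Node> inputs;          // children (a subquery is modeled as a child)

    Node(String name, Set<String> variablesSet, Set<String> variablesUsed,
        Node... inputs) {
      this.name = name;
      this.variablesSet = variablesSet;
      this.variablesUsed = variablesUsed;
      this.inputs = List.of(inputs);
    }
  }

  /** Ids referenced in the subtree but not set inside it, i.e. inherited from outside. */
  static Set<String> unbound(Node node) {
    Set<String> result = new LinkedHashSet<>(node.variablesUsed);
    for (Node input : node.inputs) {
      result.addAll(unbound(input));
    }
    // Anything this node sets is bound here, so it is no longer unbound.
    result.removeAll(node.variablesSet);
    return result;
  }

  public static void main(String[] args) {
    // One counter per query keeps ids unique; an id is not a column index.
    int nextId = 0;
    String cor0 = "$cor" + nextId++;
    String cor1 = "$cor" + nextId++;

    Node inner = new Node("innerFilter", Set.of(), Set.of(cor0, cor1));
    // middleFilter sets only cor1; cor0 is inherited from outerFilter.
    Node middle = new Node("middleFilter", Set.of(cor1), Set.of(cor0), inner);
    Node outer = new Node("outerFilter", Set.of(cor0), Set.of(), middle);

    for (Node n : List.of(outer, middle, inner)) {
      System.out.println(n.name + " sets " + n.variablesSet
          + ", inherits " + unbound(n));
    }
  }
}
```

In this sketch the middle node reports only the id it sets even though it also references an inherited one, and only the outermost subtree has an empty unbound set, so under these assumptions only that subtree would be a candidate for de-correlation on its own.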