[ https://issues.apache.org/jira/browse/SPARK-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell resolved SPARK-18582. --------------------------------------- Resolution: Fixed Assignee: Nattavut Sutyanyong Fix Version/s: 2.1.0 > Whitelist LogicalPlan operators allowed in correlated subqueries > ---------------------------------------------------------------- > > Key: SPARK-18582 > URL: https://issues.apache.org/jira/browse/SPARK-18582 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 2.0.0 > Reporter: Nattavut Sutyanyong > Assignee: Nattavut Sutyanyong > Fix For: 2.1.0 > > > We want to tighten the code that handles correlated subquery to whitelist > operators that are allowed in it. > The current code in {{def pullOutCorrelatedPredicates}} looks like > {code} > // Simplify the predicates before pulling them out. > val transformed = BooleanSimplification(sub) transformUp { > case f @ Filter(cond, child) => ... > case p @ Project(expressions, child) => ... > case a @ Aggregate(grouping, expressions, child) => ... > case w : Window => ... > case j @ Join(left, _, RightOuter, _) => ... > case j @ Join(left, right, FullOuter, _) => ... > case j @ Join(_, right, jt, _) if !jt.isInstanceOf[InnerLike] => ... > case u: Union => ... > case s: SetOperation => ... > case e: Expand => ... > case l : LocalLimit => ... > case g : GlobalLimit => ... > case s : Sample => ... > case p => > failOnOuterReference(p) > ... > } > {code} > The code disallows operators in a sub plan of an operator hosting correlation > on a case by case basis. As it is today, it only blocks {{Union}}, > {{Intersect}}, {{Except}}, {{Expand}} {{LocalLimit}} {{GlobalLimit}} > {{Sample}} {{FullOuter}} and right table of {{LeftOuter}} (and left table of > {{RightOuter}}). That means any {{LogicalPlan}} operators that are not in the > list above are permitted to be under a correlation point. Is this risky? > There are many (30+ at least from browsing the {{LogicalPlan}} type > hierarchy) operators derived from {{LogicalPlan}} class. > For the case of {{ScalarSubquery}}, it explicitly checks that only > {{SubqueryAlias}} {{Project}} {{Filter}} {{Aggregate}} are allowed > ({{CheckAnalysis.scala}} around line 126-165 in and after {{def > cleanQuery}}). We should whitelist which operators are allowed in correlated > subqueries. At my first glance, we should allow, in addition to the ones > allowed in {{ScalarSubquery}}: {{Join}}, {{Distinct}}, {{Sort}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org