GitHub user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/12810#discussion_r61944296
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1250,40 +1251,77 @@ class Analyzer(
}
/**
- * Rewrites table generating expressions that either need one or more of the following in order
- * to be resolved:
- * - concrete attribute references for their output.
- * - to be relocated from a SELECT clause (i.e. from a [[Project]]) into a [[Generate]]).
+ * Extracts [[Generator]] from the projectList of a [[Project]] operator and create [[Generate]]
+ * operator under [[Project]].
*
- * Names for the output [[Attribute]]s are extracted from [[Alias]] or [[MultiAlias]] expressions
- * that wrap the [[Generator]]. If more than one [[Generator]] is found in a Project, an
- * [[AnalysisException]] is throw.
+ * This rule will throw [[AnalysisException]] for following cases:
+ * 1. [[Generator]] is nested in expressions, e.g. `SELECT explode(list) + 1 FROM tbl`
+ * 2. more than one [[Generator]] is found in projectList,
+ * e.g. `SELECT explode(list), explode(list) FROM tbl`
+ * 3. [[Generator]] is found in other operators that are not [[Project]] or [[Generate]],
+ * e.g. `SELECT * FROM tbl SORT BY explode(list)`
*/
- object ResolveGenerate extends Rule[LogicalPlan] {
+ object ExtractGenerator extends Rule[LogicalPlan] {
+ private def hasGenerator(expr: Expression): Boolean = {
+ expr.find(_.isInstanceOf[Generator]).isDefined
+ }
+
+ private def hasNestedGenerator(expr: NamedExpression): Boolean = expr match {
+ case UnresolvedAlias(_: Generator, _) => false
+ case Alias(_: Generator, _) => false
+ case MultiAlias(_: Generator, _) => false
+ case other => hasGenerator(other)
+ }
+
+ private def trimAlias(expr: NamedExpression): Expression = expr match {
+ case UnresolvedAlias(child, _) => child
+ case Alias(child, _) => child
+ case MultiAlias(child, _) => child
+ case _ => expr
+ }
+
+ /** Extracts a [[Generator]] expression and any names assigned by aliases to their output. */
+ private object AliasedGenerator {
+ def unapply(e: Expression): Option[(Generator, Seq[String])] = e match {
+ case Alias(g: Generator, name) if g.resolved && g.elementSchema.length > 1 =>
+ // If not given the default names, and the TGF with multiple output columns
--- End diff --
Dumb question: what is TGF? Table Generating Function?
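
For anyone following the diff, here is a minimal standalone sketch of the nested-generator check, mirroring the shape of `hasGenerator` and `hasNestedGenerator` above. Every type in it (`Expr`, `Column`, `Literal`, `Explode`, `Add`, and this `Alias`) is a hypothetical toy stand-in, not a Catalyst class, and `GeneratorCheck` is an illustrative name only:

```scala
// Toy stand-ins for Catalyst expressions; all of these types are hypothetical.
sealed trait Expr { def children: Seq[Expr] }
case class Column(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Literal(value: Int) extends Expr { val children: Seq[Expr] = Nil }
case class Explode(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }
case class Add(left: Expr, right: Expr) extends Expr { val children: Seq[Expr] = Seq(left, right) }
case class Alias(child: Expr, name: String) extends Expr { val children: Seq[Expr] = Seq(child) }

object GeneratorCheck {
  // True if a generator (only Explode in this toy model) appears anywhere in the tree.
  def hasGenerator(e: Expr): Boolean = e match {
    case _: Explode => true
    case other      => other.children.exists(hasGenerator)
  }

  // A generator sitting directly under an alias is fine; otherwise fall back
  // to searching the whole expression, as the rule in the diff does.
  def hasNestedGenerator(e: Expr): Boolean = e match {
    case Alias(_: Explode, _) => false
    case other                => hasGenerator(other)
  }
}
```

With these toy types, `GeneratorCheck.hasNestedGenerator(Alias(Explode(Column("list")), "col"))` is `false`, while `GeneratorCheck.hasNestedGenerator(Alias(Add(Explode(Column("list")), Literal(1)), "c"))` is `true`, which corresponds to case 1 in the new doc comment (`SELECT explode(list) + 1 FROM tbl`).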