Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/6107#discussion_r30379270
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -521,66 +525,89 @@ class Analyzer(
}
/**
- * When a SELECT clause has only a single expression and that expression is a
- * [[catalyst.expressions.Generator Generator]] we convert the
- * [[catalyst.plans.logical.Project Project]] to a [[catalyst.plans.logical.Generate Generate]].
+ * Rewrites table generating expressions that either need one or more of the following in order
+ * to be resolved:
+ * - concrete attribute references for their output.
+ * - to be relocated from a SELECT clause (i.e. from a [[Project]]) into a [[Generate]]).
+ *
+ * Names for the output [[Attributes]] are extracted from [[Alias]] or [[MultiAlias]] expressions
+ * that wrap the [[Generator]]. If more than one [[Generator]] is found in a Project, an
+ * [[AnalysisException]] is throw.
*/
- object ImplicitGenerate extends Rule[LogicalPlan] {
+ object ResolveGenerate extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
- case Project(Seq(Alias(g: Generator, name)), child) =>
- Generate(g, join = false, outer = false,
- qualifier = None, UnresolvedAttribute(name) :: Nil, child)
- case Project(Seq(MultiAlias(g: Generator, names)), child) =>
- Generate(g, join = false, outer = false,
- qualifier = None, names.map(UnresolvedAttribute(_)), child)
+ case p: Generate if !p.child.resolved || !p.generator.resolved => p
+ case g: Generate if g.resolved == false =>
+ g.copy(
+ generatorOutput = makeGeneratorOutput(g.generator, g.generatorOutput.map(_.name)))
+
+ case p @ Project(projectList, child) =>
+ // Holds the resolved generator, if one exists in the project list.
+ var resolvedGenerator: Generate = null
+
+ val newProjectList = projectList.flatMap {
+ case AliasedGenerator(generator, names) if generator.childrenResolved =>
+ if (resolvedGenerator != null) {
+ failAnalysis(
+ s"Only one generator allowed per select but ${resolvedGenerator.nodeName} and " +
+ s"and ${generator.nodeName} found.")
+ }
+
+ resolvedGenerator =
+ Generate(
+ generator,
+ join = projectList.size > 1, // Only join if there are other expressions in SELECT.
--- End diff --
The point of this PR is explicitly to allow a single table generating function
(TGF) in a select clause alongside other expressions. I know that Hive does not
allow this, but that seems like an unreasonable limitation, and it is one we are
trying to remove.
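
To make the `join` flag in the diff concrete, here is a minimal sketch of the intended semantics for a query shaped like `SELECT id, explode(xs) AS x FROM t`. It is a toy model over plain Scala maps, not Spark's API: `GenerateJoinSketch`, `Row`, `explode`, and `generate` are hypothetical names used only for illustration, and the real rule builds a `Generate` logical plan node instead.

```scala
// Toy model (assumption: not Spark code). A row is a map from column name to value.
object GenerateJoinSketch {
  type Row = Map[String, Any]

  // Hypothetical stand-in for a table generating function such as explode:
  // emits one single-column row per element of the given array column.
  def explode(column: String, outputName: String)(row: Row): Seq[Row] =
    row(column).asInstanceOf[Seq[Any]].map(v => Map(outputName -> v))

  // join = true keeps the input row's columns next to each generated row,
  // which is what a SELECT with other expressions needs.
  // join = false returns only the generator's output.
  def generate(rows: Seq[Row], generator: Row => Seq[Row], join: Boolean): Seq[Row] =
    rows.flatMap { row =>
      generator(row).map(generated => if (join) row ++ generated else generated)
    }
}
```

With `join = false` only the generated column survives, which matches the old single-expression case. With `join = true` each generated row still carries `id`, so a projection on top can select it alongside the generator output.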