Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/6107#discussion_r30381898
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -521,66 +525,89 @@ class Analyzer(
}
/**
- * When a SELECT clause has only a single expression and that expression is a
- * [[catalyst.expressions.Generator Generator]] we convert the
- * [[catalyst.plans.logical.Project Project]] to a [[catalyst.plans.logical.Generate Generate]].
+ * Rewrites table generating expressions that need one or more of the following in order
+ * to be resolved:
+ * - concrete attribute references for their output.
+ * - to be relocated from a SELECT clause (i.e. from a [[Project]]) into a [[Generate]].
+ *
+ * Names for the output [[Attributes]] are extracted from [[Alias]] or [[MultiAlias]] expressions
+ * that wrap the [[Generator]]. If more than one [[Generator]] is found in a Project, an
+ * [[AnalysisException]] is thrown.
*/
- object ImplicitGenerate extends Rule[LogicalPlan] {
+ object ResolveGenerate extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
- case Project(Seq(Alias(g: Generator, name)), child) =>
- Generate(g, join = false, outer = false,
- qualifier = None, UnresolvedAttribute(name) :: Nil, child)
- case Project(Seq(MultiAlias(g: Generator, names)), child) =>
- Generate(g, join = false, outer = false,
- qualifier = None, names.map(UnresolvedAttribute(_)), child)
+ case p: Generate if !p.child.resolved || !p.generator.resolved => p
+ case g: Generate if !g.resolved =>
+ g.copy(
+ generatorOutput = makeGeneratorOutput(g.generator, g.generatorOutput.map(_.name)))
+
+ case p @ Project(projectList, child) =>
+ // Holds the resolved generator, if one exists in the project list.
+ var resolvedGenerator: Generate = null
+
+ val newProjectList = projectList.flatMap {
+ case AliasedGenerator(generator, names) if generator.childrenResolved =>
+ if (resolvedGenerator != null) {
+ failAnalysis(
+ s"Only one generator allowed per select but ${resolvedGenerator.nodeName} and " +
+ s"${generator.nodeName} found.")
+ }
+
+ resolvedGenerator =
+ Generate(
+ generator,
+ join = projectList.size > 1, // Only join if there are other expressions in SELECT.
--- End diff --
@rxin and I are afraid that allowing more than one generator in a single
select is too confusing, so we explicitly disallow it. The reason is that you
get an implicit Cartesian product of the two things you are exploding. Users
who want more than one generator can use more than one select, and we think
the result is more obvious that way.
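
To make the Cartesian product concern concrete, here is a minimal sketch using
plain Scala collections rather than Spark. The `Record` type and the `explode*`
helpers are hypothetical names invented for illustration; they are not part of
Catalyst. The sketch only models what exploding two sequence columns would
produce.

```scala
// Hypothetical record with two sequence-valued columns; not a Spark type.
case class Record(id: Int, xs: Seq[Int], ys: Seq[String])

object ExplodeSketch {
  val records: Seq[Record] = Seq(Record(1, Seq(10, 20), Seq("a", "b", "c")))

  // Models two generators in a single select: every x is silently
  // paired with every y, so one input row yields xs.size * ys.size rows.
  def explodeBoth(rs: Seq[Record]): Seq[(Int, Int, String)] =
    for { r <- rs; x <- r.xs; y <- r.ys } yield (r.id, x, y)

  // Models one generator per select: the first select explodes xs only.
  def explodeXs(rs: Seq[Record]): Seq[(Int, Int, Seq[String])] =
    for { r <- rs; x <- r.xs } yield (r.id, x, r.ys)

  // The second select explodes ys, making the cross product an explicit step.
  def explodeYs(rs: Seq[(Int, Int, Seq[String])]): Seq[(Int, Int, String)] =
    rs.flatMap { case (id, x, ys) => ys.map(y => (id, x, y)) }

  def main(args: Array[String]): Unit = {
    explodeBoth(records).foreach(println)
    explodeYs(explodeXs(records)).foreach(println)
  }
}
```

Both paths produce the same rows; the difference is that the chained version
spells out each multiplication of the row count as its own step.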