[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: URL: https://github.com/apache/spark/pull/27803#discussion_r416334410 ## File path: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ## @@ -344,12 +345,73 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession { } } + test("Supported nested inner generators") { Review comment: If you don't mind, `SPARK-31049: ` prefix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: URL: https://github.com/apache/spark/pull/27803#discussion_r416333033 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -2203,13 +2220,41 @@ class Analyzer( }.nonEmpty) } -private def trimAlias(expr: NamedExpression): Expression = expr match { +private def trimAlias(expr: Expression): Expression = expr match { case UnresolvedAlias(child, _) => child case Alias(child, _) => child case MultiAlias(child, _) => child case _ => expr } +private def createGenerate( +generator: Generator, +outer: Boolean, +names: Seq[String], +child: LogicalPlan): Generate = { + Generate( +generator, +unrequiredChildIndex = Nil, +outer = outer, +qualifier = None, +generatorOutput = ResolveGenerate.makeGeneratorOutput(generator, names), +child) +} + +private def collectAdjacentGenerators(children: Seq[Expression]): Seq[(Generator, Boolean)] = { Review comment: Could you describe the output as a function description, please? For example, the purpose of the second boolean in the result pair? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: URL: https://github.com/apache/spark/pull/27803#discussion_r416330704 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ## @@ -425,38 +425,35 @@ class AnalysisErrorSuite extends AnalysisTest { TimeWindow(Literal("2016-01-01 01:01:01"), "1 second", "0 second", "0 second").as("window")), "The slide duration" :: " must be greater than 0." :: Nil ) - +"Nested generators are supported only when inner generators " + + "are unary and adjacent with single output, but got: " Review comment: Ur, maybe, mistake? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r399906896 ## File path: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ## @@ -344,12 +345,55 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession { } } + test("Supported nested inner generators") { +// Project cases +checkAnswer( + sql("SELECT explode(explode(array(array(1, 2), array(3"), + Row(1) :: Row(2) :: Row(3) :: Nil) +checkAnswer( + sql("SELECT array(array(4), array(5)) v").select(explode_outer(explode_outer($"v"))), + Row(4) :: Row(5) :: Nil) +checkAnswer( + sql("SELECT posexplode(explode(array(array(1, 2), array(3"), + Row(0, 1) :: Row(1, 2) :: Row(0, 3) :: Nil) +checkAnswer( + sql("SELECT array(array(4), array(5)) v").select(posexplode_outer(explode($"v"))), + Row(0, 4) :: Row(0, 5) :: Nil) +checkAnswer( + sql("SELECT inline(explode(array(array(struct(1, 'a'), struct(2, 'b')"), + Row(1, "a") :: Row(2, "b") :: Nil) + +// Aggregate cases +checkAnswer( + sql("SELECT explode_outer(explode(array(array(min(v), max(v) FROM VALUES 1, 2, 3 t(v)"), + Row(1) :: Row(3) :: Nil) +checkAnswer( + sql("SELECT array(array(min(v), max(v))) ar FROM VALUES 7, 9 t(v)") +.select(explode(explode_outer($"ar"))), + Row(7) :: Row(9) :: Nil) Review comment: Can we add more deep test case like the following? ``` scala> sql("select explode(explode(explode(explode(array(array(array(array(1, 2), array(3, 4").show() +---+ |col| +---+ | 1| | 2| | 3| | 4| +---+ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r399906896 ## File path: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ## @@ -344,12 +345,55 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession { } } + test("Supported nested inner generators") { +// Project cases +checkAnswer( + sql("SELECT explode(explode(array(array(1, 2), array(3"), + Row(1) :: Row(2) :: Row(3) :: Nil) +checkAnswer( + sql("SELECT array(array(4), array(5)) v").select(explode_outer(explode_outer($"v"))), + Row(4) :: Row(5) :: Nil) +checkAnswer( + sql("SELECT posexplode(explode(array(array(1, 2), array(3"), + Row(0, 1) :: Row(1, 2) :: Row(0, 3) :: Nil) +checkAnswer( + sql("SELECT array(array(4), array(5)) v").select(posexplode_outer(explode($"v"))), + Row(0, 4) :: Row(0, 5) :: Nil) +checkAnswer( + sql("SELECT inline(explode(array(array(struct(1, 'a'), struct(2, 'b')"), + Row(1, "a") :: Row(2, "b") :: Nil) + +// Aggregate cases +checkAnswer( + sql("SELECT explode_outer(explode(array(array(min(v), max(v) FROM VALUES 1, 2, 3 t(v)"), + Row(1) :: Row(3) :: Nil) +checkAnswer( + sql("SELECT array(array(min(v), max(v))) ar FROM VALUES 7, 9 t(v)") +.select(explode(explode_outer($"ar"))), + Row(7) :: Row(9) :: Nil) Review comment: Can we add more deep test case like the following? ```scala scala> sql("select explode(explode(explode(explode(array(array(array(array(1, 2), array(3, 4").show() +---+ |col| +---+ | 1| | 2| | 3| | 4| +---+ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r390677763 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -2206,17 +2220,21 @@ class Analyzer( * @return (the [[Generator]], seq of output names, outer flag) */ def unapply(e: Expression): Option[(Generator, Seq[String], Boolean)] = e match { -case Alias(GeneratorOuter(g: Generator), name) if g.resolved => Some((g, name :: Nil, true)) -case MultiAlias(GeneratorOuter(g: Generator), names) if g.resolved => Some((g, names, true)) -case Alias(g: Generator, name) if g.resolved => Some((g, name :: Nil, false)) -case MultiAlias(g: Generator, names) if g.resolved => Some((g, names, false)) +case Alias(GeneratorOuter(g: Generator), name) => Some((g, name :: Nil, true)) +case MultiAlias(GeneratorOuter(g: Generator), names) => Some((g, names, true)) +case Alias(g: Generator, name) => Some((g, name :: Nil, false)) +case MultiAlias(g: Generator, names) => Some((g, names, false)) case _ => None } } def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case Project(projectList, _) if projectList.exists(hasNestedGenerator) => -val nestedGenerator = projectList.find(hasNestedGenerator).get + // We need to resolve all the functions that might be replaced with generators + // before validating a plan in this rule. + case p if p.expressions.exists(_.find(_.isInstanceOf[UnresolvedFunction]).isDefined) => p + + case Project(projectList, _) if projectList.exists(hasUnsupportedNestedGenerator) => +val nestedGenerator = projectList.find(hasUnsupportedNestedGenerator).get throw new AnalysisException("Generators are not supported when it's nested in " + Review comment: Do we need to revise this message, `Generators are not supported when it's nested`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r390677218 ## File path: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ## @@ -344,12 +344,35 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSession { } } + test("Supported nested inner generators") { Review comment: Can we add more generators? Or, shall we mention `explode/explode_outer`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r390676615 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -2286,24 +2304,66 @@ class Analyzer( // Holds the resolved generator, if one exists in the project list. var resolvedGenerator: Generate = null +def createGenerate(g: Generator, outer: Boolean, names: Seq[String], child: LogicalPlan) = { Review comment: Shall we put this outside like `private def trimAlias` because this block seems to grow too big? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v)) URL: https://github.com/apache/spark/pull/27803#discussion_r390676744 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -2286,24 +2304,66 @@ class Analyzer( // Holds the resolved generator, if one exists in the project list. var resolvedGenerator: Generate = null +def createGenerate(g: Generator, outer: Boolean, names: Seq[String], child: LogicalPlan) = { + Generate( +g, +unrequiredChildIndex = Nil, +outer = outer, +qualifier = None, +generatorOutput = ResolveGenerate.makeGeneratorOutput(g, names), +child) +} + val newProjectList = projectList .map(CleanupAliases.trimNonTopLevelAliases(_).asInstanceOf[NamedExpression]) .flatMap { -case AliasedGenerator(generator, names, outer) if generator.childrenResolved => +case AliasedGenerator(generator, names, outer) => // It's a sanity check, this should not happen as the previous case will throw // exception earlier. assert(resolvedGenerator == null, "More than one generator found in SELECT.") - resolvedGenerator = -Generate( - generator, - unrequiredChildIndex = Nil, - outer = outer, - qualifier = None, - generatorOutput = ResolveGenerate.makeGeneratorOutput(generator, names), - child) + if (hasInnerGenerator(generator)) { +// The case of multiple nested inner generators +def collectAdjacentGenerators(children: Seq[Expression]) Review comment: This one too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org