[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-04-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #27803:
URL: https://github.com/apache/spark/pull/27803#discussion_r416334410



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala
##
@@ -344,12 +345,73 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSparkSession {
 }
   }
 
+  test("Supported nested inner generators") {

Review comment:
   If you don't mind, `SPARK-31049: ` prefix?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-04-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #27803:
URL: https://github.com/apache/spark/pull/27803#discussion_r416333033



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -2203,13 +2220,41 @@ class Analyzer(
   }.nonEmpty)
 }
 
-private def trimAlias(expr: NamedExpression): Expression = expr match {
+private def trimAlias(expr: Expression): Expression = expr match {
   case UnresolvedAlias(child, _) => child
   case Alias(child, _) => child
   case MultiAlias(child, _) => child
   case _ => expr
 }
 
+private def createGenerate(
+generator: Generator,
+outer: Boolean,
+names: Seq[String],
+child: LogicalPlan): Generate = {
+  Generate(
+generator,
+unrequiredChildIndex = Nil,
+outer = outer,
+qualifier = None,
+generatorOutput = ResolveGenerate.makeGeneratorOutput(generator, 
names),
+child)
+}
+
+private def collectAdjacentGenerators(children: Seq[Expression]): 
Seq[(Generator, Boolean)] = {

Review comment:
   Could you describe the output as a function description, please? For 
example, the purpose of the second boolean in the result pair?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-04-27 Thread GitBox


dongjoon-hyun commented on a change in pull request #27803:
URL: https://github.com/apache/spark/pull/27803#discussion_r416330704



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##
@@ -425,38 +425,35 @@ class AnalysisErrorSuite extends AnalysisTest {
   TimeWindow(Literal("2016-01-01 01:01:01"), "1 second", "0 second", "0 
second").as("window")),
   "The slide duration" :: " must be greater than 0." :: Nil
   )
-
+"Nested generators are supported only when inner generators " +
+  "are unary and adjacent with single output, but got: "

Review comment:
   Ur, maybe, mistake?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r399906896
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala
 ##
 @@ -344,12 +345,55 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSparkSession {
 }
   }
 
+  test("Supported nested inner generators") {
+// Project cases
+checkAnswer(
+  sql("SELECT explode(explode(array(array(1, 2), array(3"),
+  Row(1) :: Row(2) :: Row(3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(4), array(5)) 
v").select(explode_outer(explode_outer($"v"))),
+  Row(4) :: Row(5) :: Nil)
+checkAnswer(
+  sql("SELECT posexplode(explode(array(array(1, 2), array(3"),
+  Row(0, 1) :: Row(1, 2) :: Row(0, 3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(4), array(5)) 
v").select(posexplode_outer(explode($"v"))),
+  Row(0, 4) :: Row(0, 5) :: Nil)
+checkAnswer(
+  sql("SELECT inline(explode(array(array(struct(1, 'a'), struct(2, 
'b')"),
+  Row(1, "a") :: Row(2, "b") :: Nil)
+
+// Aggregate cases
+checkAnswer(
+  sql("SELECT explode_outer(explode(array(array(min(v), max(v) FROM 
VALUES 1, 2, 3 t(v)"),
+  Row(1) :: Row(3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(min(v), max(v))) ar FROM VALUES 7, 9 t(v)")
+.select(explode(explode_outer($"ar"))),
+  Row(7) :: Row(9) :: Nil)
 
 Review comment:
   Can we add more deep test case like the following?
   ```
   scala> sql("select 
explode(explode(explode(explode(array(array(array(array(1, 2), array(3, 
4").show()
   +---+
   |col|
   +---+
   |  1|
   |  2|
   |  3|
   |  4|
   +---+
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r399906896
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala
 ##
 @@ -344,12 +345,55 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSparkSession {
 }
   }
 
+  test("Supported nested inner generators") {
+// Project cases
+checkAnswer(
+  sql("SELECT explode(explode(array(array(1, 2), array(3"),
+  Row(1) :: Row(2) :: Row(3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(4), array(5)) 
v").select(explode_outer(explode_outer($"v"))),
+  Row(4) :: Row(5) :: Nil)
+checkAnswer(
+  sql("SELECT posexplode(explode(array(array(1, 2), array(3"),
+  Row(0, 1) :: Row(1, 2) :: Row(0, 3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(4), array(5)) 
v").select(posexplode_outer(explode($"v"))),
+  Row(0, 4) :: Row(0, 5) :: Nil)
+checkAnswer(
+  sql("SELECT inline(explode(array(array(struct(1, 'a'), struct(2, 
'b')"),
+  Row(1, "a") :: Row(2, "b") :: Nil)
+
+// Aggregate cases
+checkAnswer(
+  sql("SELECT explode_outer(explode(array(array(min(v), max(v) FROM 
VALUES 1, 2, 3 t(v)"),
+  Row(1) :: Row(3) :: Nil)
+checkAnswer(
+  sql("SELECT array(array(min(v), max(v))) ar FROM VALUES 7, 9 t(v)")
+.select(explode(explode_outer($"ar"))),
+  Row(7) :: Row(9) :: Nil)
 
 Review comment:
   Can we add more deep test case like the following?
   ```scala
   scala> sql("select 
explode(explode(explode(explode(array(array(array(array(1, 2), array(3, 
4").show()
   +---+
   |col|
   +---+
   |  1|
   |  2|
   |  3|
   |  4|
   +---+
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r390677763
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -2206,17 +2220,21 @@ class Analyzer(
* @return (the [[Generator]], seq of output names, outer flag)
*/
   def unapply(e: Expression): Option[(Generator, Seq[String], Boolean)] = 
e match {
-case Alias(GeneratorOuter(g: Generator), name) if g.resolved => 
Some((g, name :: Nil, true))
-case MultiAlias(GeneratorOuter(g: Generator), names) if g.resolved => 
Some((g, names, true))
-case Alias(g: Generator, name) if g.resolved => Some((g, name :: Nil, 
false))
-case MultiAlias(g: Generator, names) if g.resolved => Some((g, names, 
false))
+case Alias(GeneratorOuter(g: Generator), name) => Some((g, name :: 
Nil, true))
+case MultiAlias(GeneratorOuter(g: Generator), names) => Some((g, 
names, true))
+case Alias(g: Generator, name) => Some((g, name :: Nil, false))
+case MultiAlias(g: Generator, names) => Some((g, names, false))
 case _ => None
   }
 }
 
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case Project(projectList, _) if projectList.exists(hasNestedGenerator) =>
-val nestedGenerator = projectList.find(hasNestedGenerator).get
+  // We need to resolve all the functions that might be replaced with 
generators
+  // before validating a plan in this rule.
+  case p if 
p.expressions.exists(_.find(_.isInstanceOf[UnresolvedFunction]).isDefined) => p
+
+  case Project(projectList, _) if 
projectList.exists(hasUnsupportedNestedGenerator) =>
+val nestedGenerator = 
projectList.find(hasUnsupportedNestedGenerator).get
 throw new AnalysisException("Generators are not supported when it's 
nested in " +
 
 Review comment:
   Do we need to revise this message, `Generators are not supported when it's 
nested`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r390677218
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala
 ##
 @@ -344,12 +344,35 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSparkSession {
 }
   }
 
+  test("Supported nested inner generators") {
 
 Review comment:
   Can we add more generators? Or, shall we mention `explode/explode_outer`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r390676615
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -2286,24 +2304,66 @@ class Analyzer(
 // Holds the resolved generator, if one exists in the project list.
 var resolvedGenerator: Generate = null
 
+def createGenerate(g: Generator, outer: Boolean, names: Seq[String], 
child: LogicalPlan) = {
 
 Review comment:
   Shall we put this outside like `private def trimAlias` because this block 
seems to grow too big?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] Support nested adjacent generators, e.g., explode(explode(v))

2020-03-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #27803: [SPARK-31049][SQL] 
Support nested adjacent generators, e.g., explode(explode(v))
URL: https://github.com/apache/spark/pull/27803#discussion_r390676744
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -2286,24 +2304,66 @@ class Analyzer(
 // Holds the resolved generator, if one exists in the project list.
 var resolvedGenerator: Generate = null
 
+def createGenerate(g: Generator, outer: Boolean, names: Seq[String], 
child: LogicalPlan) = {
+  Generate(
+g,
+unrequiredChildIndex = Nil,
+outer = outer,
+qualifier = None,
+generatorOutput = ResolveGenerate.makeGeneratorOutput(g, names),
+child)
+}
+
 val newProjectList = projectList
   
.map(CleanupAliases.trimNonTopLevelAliases(_).asInstanceOf[NamedExpression])
   .flatMap {
-case AliasedGenerator(generator, names, outer) if 
generator.childrenResolved =>
+case AliasedGenerator(generator, names, outer) =>
   // It's a sanity check, this should not happen as the previous 
case will throw
   // exception earlier.
   assert(resolvedGenerator == null, "More than one generator found 
in SELECT.")
 
-  resolvedGenerator =
-Generate(
-  generator,
-  unrequiredChildIndex = Nil,
-  outer = outer,
-  qualifier = None,
-  generatorOutput = 
ResolveGenerate.makeGeneratorOutput(generator, names),
-  child)
+  if (hasInnerGenerator(generator)) {
+// The case of multiple nested inner generators
+def collectAdjacentGenerators(children: Seq[Expression])
 
 Review comment:
   This one too?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org