[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-07-01 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448709387



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -172,13 +188,23 @@ object NestedColumnAliasing {
   (f, Alias(f, s"_gen_alias_${exprId.id}")(exprId, Seq.empty, None))
 }
 
+
+// Do deduplication based on semanticEquals, and then sum.
+val nestedFieldNum = nestedFieldToAlias
+  .foldLeft(Seq[ExtractValue]()) {
+(unique, curr) => if (!unique.exists(curr._1.semanticEquals(_))) {
+  curr._1 +: unique
+} else {
+  unique
+}
+  }
+  .map { t => totalFieldNum(t.dataType)  }
+  .sum

Review comment:
   Could you check this part, @viirya? Also cc @dongjoon-hyun because IIUC he is the original author of this file: 
https://github.com/apache/spark/commit/257391497bea51553d5a9e420c20a3fdfe4d36a9#diff-43334bab9616cc53e8797b9afa9fc7aaR117-R126
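   For readers following along, here is a minimal sketch of what this deduplicate-then-sum step does, using the same names as the diff above (`nestedFieldToAlias`, `ExtractValue`, `totalFieldNum`); an illustration only, not the PR's final code:
   ```
   // Sketch: count the leaf fields actually referenced, counting
   // semantically equal nested-field extractions only once.
   val dedupedFields = nestedFieldToAlias
     .map { case (nestedField, _) => nestedField }
     .foldLeft(Seq.empty[ExtractValue]) { (unique, field) =>
       if (unique.exists(u => field.semanticEquals(u))) unique else field +: unique
     }
   val nestedFieldNum = dedupedFields.map(f => totalFieldNum(f.dataType)).sum
   ```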








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-07-01 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448692070



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -176,9 +192,16 @@ object NestedColumnAliasing {
 // By default, ColumnPruning rule uses `attr` already.
 if (nestedFieldToAlias.nonEmpty &&
 nestedFieldToAlias
-  .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
-  .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  .foldLeft(Seq[ExtractValue]()) {
+(unique, curr) => if 
(!unique.exists(curr._1.semanticEquals(_))) {
+  curr._1 +: unique
+} else {
+  unique
+}
+  }
+  .map { t => totalFieldNum(t.dataType)  }
+  .sum < totalFieldNum(attr._2)) {

Review comment:
   I think it's hard to read this part. Could you pull the left value out of the `if` condition?
   ```
   val xxx = yyy
   if (xxx < totalFieldNum(attr._2)) { ...
   ```
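   Something along these lines, for example (a sketch under the diff's own assumptions, where `attr._2` is the attribute's `DataType`; not a definitive rewrite):
   ```
   // Name the deduplicated field count first, then compare it in the condition.
   val prunedFieldNum = nestedFieldToAlias
     .foldLeft(Seq.empty[ExtractValue]) { (unique, curr) =>
       if (unique.exists(u => curr._1.semanticEquals(u))) unique else curr._1 +: unique
     }
     .map(f => totalFieldNum(f.dataType))
     .sum

   if (nestedFieldToAlias.nonEmpty && prunedFieldNum < totalFieldNum(attr._2)) {
     // ... as before ...
   }
   ```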








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-07-01 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448692070



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -176,9 +192,16 @@ object NestedColumnAliasing {
 // By default, ColumnPruning rule uses `attr` already.
 if (nestedFieldToAlias.nonEmpty &&
 nestedFieldToAlias
-  .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
-  .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  .foldLeft(Seq[ExtractValue]()) {
+(unique, curr) => if 
(!unique.exists(curr._1.semanticEquals(_))) {
+  curr._1 +: unique
+} else {
+  unique
+}
+  }
+  .map { t => totalFieldNum(t.dataType)  }
+  .sum < totalFieldNum(attr._2)) {

Review comment:
   It's hard to read this part. Could you pull the left value out of the `if` condition?
   ```
   val xxx = yyy
   if (xxx < totalFieldNum(attr._2)) { ...
   ```








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-30 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448035806



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -178,11 +195,16 @@ object NestedColumnAliasing {
 nestedFieldToAlias
   .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
   .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  Some((attr.exprId, nestedFieldToAlias))
 } else {
   None
 }
   }
+  .groupBy(_._1) // To fix same ExprId mapped to different attribute 
instance

Review comment:
   Did you mean this fix is only for the Window case?








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-30 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448035514



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -178,11 +195,16 @@ object NestedColumnAliasing {
 nestedFieldToAlias
   .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
   .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  Some((attr.exprId, nestedFieldToAlias))

Review comment:
   ?








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-30 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r447488633



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -153,6 +169,7 @@ object NestedColumnAliasing {
 val aliasSub = nestedFieldReferences.asInstanceOf[Seq[ExtractValue]]
   .filter(!_.references.subsetOf(exclusiveAttrSet))
   .groupBy(_.references.head)
+  .toList

Review comment:
   Do we need this?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,20 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+/**
+ * This pattern is needed to support [[Filter]] plan cases like
+ * [[Project]]->[[Filter]]->listed plan in `canProjectPushThrough` (e.g., 
[[Window]]).
+ * The reason why we don't simply add [[Filter]] in 
`canProjectPushThrough` is that
+ * the optimizer can hit an infinite loop during the 
[[PushDownPredicates]] rule.
+ */
+case Project(projectList, Filter(condition, child))
+  if SQLConf.get.nestedSchemaPruningEnabled && 
canProjectPushThrough(child) =>

Review comment:
   nit: indent
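   For illustration, the guard could be continued on the next line with a deeper indent, roughly like this (the same pattern as in the diff, just reindented; the exact style is up to the reviewers):
   ```
   // Fragment from the diff above, shown only to illustrate the indentation.
   case Project(projectList, Filter(condition, child))
       if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
   ```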

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -178,11 +195,16 @@ object NestedColumnAliasing {
 nestedFieldToAlias
   .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
   .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  Some((attr.exprId, nestedFieldToAlias))

Review comment:
   Why did you change this?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -178,11 +195,16 @@ object NestedColumnAliasing {
 nestedFieldToAlias
   .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
   .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  Some((attr.exprId, nestedFieldToAlias))
 } else {
   None
 }
   }
+  .groupBy(_._1) // To fix same ExprId mapped to different attribute 
instance
+  .map {
+case (exprId: ExprId, expressions: List[(ExprId, Seq[(ExtractValue, 
Alias)])]) =>
+  exprId -> expressions.flatMap(_._2)
+  }

Review comment:
   Please avoid the long chaining; see: https://github.com/databricks/scala-style-guide#chaining
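   For example, the grouping step could be pulled out into named intermediates, roughly like this (a sketch; `candidates` is a placeholder for the result of the preceding `flatMap`, not a name from the PR):
   ```
   // Merge entries whose ExprId is the same but whose attribute instances differ.
   val groupedByExprId = candidates.groupBy { case (exprId, _) => exprId }
   val attrToAliases = groupedByExprId.map { case (exprId, entries) =>
     exprId -> entries.flatMap { case (_, fieldToAlias) => fieldToAlias }
   }
   ```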

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -178,11 +195,16 @@ object NestedColumnAliasing {
 nestedFieldToAlias
   .map { case (nestedField, _) => 
totalFieldNum(nestedField.dataType) }
   .sum < totalFieldNum(attr.dataType)) {
-  Some(attr.exprId -> nestedFieldToAlias)
+  Some((attr.exprId, nestedFieldToAlias))
 } else {
   None
 }
   }
+  .groupBy(_._1) // To fix same ExprId mapped to different attribute 
instance

Review comment:
   Is this an existing issue? Could you show us a query to reproduce this?

##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,87 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for Window") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+
+val query1 = contact
+  .select($"name.first", winExpr.as('window))
+  .analyze
+val optimized1 = Optimize.execute(query1)
+val expected1 = contact
+  .select($"name.first", $"address", $"id")
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized1, expected1)
+
+val query2 = contact
+  .select($"name.first", winExpr.as('window))
+  .orderBy($"name.last".asc)
+  .analyze
+val optimized2 = Optimize.execute(query2)
+val aliases2 = collectGeneratedAliases(optimized2)
+val expected2 = contact
+  .select($"name.first", $"address", $"id", $"name.last".as(aliases2(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"window", $"${aliases2(1)}".as(aliases2(0)))
+  .orderBy($"${aliases2(0)}".asc)
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized2, expected2)
+  }
+
+  test("Nested field pruning for Filter") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 

[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499537



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized = Optimize.execute(query)
+val aliases = collectGeneratedAliases(optimized)
+val expected = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
+  .where($"window" === 1 && $"${aliases(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized, expected)
+  }
+
+  test("Nested field pruning for orderBy") {

Review comment:
   Actually, `test("Nested field pruning for Sort") {`








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446508750



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,22 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+/**
+ * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`.
+ * In this case, `Window` can be plan that is `canProjectPushThrough`.
+ * By adding this, it allows nested columns to be passed onto next stages.
+ * Currently, not adding `Filter` into `canProjectPushThrough` due to
+ * infinitely loop in optimizers during the predicate push-down rule.
+ */

Review comment:
   BTW, do you know why the optimizer can hit the issue? I think it's better to check the root cause for future reference, if possible.








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446502299



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,22 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+/**
+ * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`.
+ * In this case, `Window` can be plan that is `canProjectPushThrough`.
+ * By adding this, it allows nested columns to be passed onto next stages.
+ * Currently, not adding `Filter` into `canProjectPushThrough` due to
+ * infinitely loop in optimizers during the predicate push-down rule.
+ */

Review comment:
   How about rephrasing it like this?
   ```
   /**
* This pattern is needed to support [[Filter]] plan cases like
* [[Project]]->[[Filter]]->listed plan in `canProjectPushThrough` 
(e.g., [[Window]]).
* The reason why we don't simply add [[Filter]] in 
`canProjectPushThrough` is that
* the optimizer can hit an infinite loop during the 
[[PushDownPredicates]] rule.
*/
   ```








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446500230



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala
##
@@ -460,6 +460,40 @@ abstract class SchemaPruningSuite
 checkAnswer(query4, Row(2, null) :: Row(2, 4) :: Nil)
   }
 
+  testSchemaPruning("select nested field in window function") {
+val windowSql =
+  """
+|with contact_rank as (
+|  select row_number() over (partition by address order by id desc) as 
__rank,
+|  contacts.*
+|  from contacts
+|)
+|select name.first , __rank from contact_rank
+|where name.first = 'Jane' AND __rank = 1

Review comment:
   The same comment applies here, too: https://github.com/apache/spark/pull/28898/files#r446499910








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499982



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala
##
@@ -460,6 +460,40 @@ abstract class SchemaPruningSuite
 checkAnswer(query4, Row(2, null) :: Row(2, 4) :: Nil)
   }
 
+  testSchemaPruning("select nested field in window function") {
+val windowSql =
+  """
+|with contact_rank as (
+|  select row_number() over (partition by address order by id desc) as 
__rank,

Review comment:
   nit: `__rank` -> `rank`








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499910



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized = Optimize.execute(query)
+val aliases = collectGeneratedAliases(optimized)
+val expected = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
+  .where($"window" === 1 && $"${aliases(0)}" === "a")

Review comment:
   Just a suggestion: could you remove this `where` in this test, then add 
a separate test unit like `test("Nested field pruning for Filter") {` for 
exhaustive filter tests?
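   As a rough illustration of the split, the Filter-specific test could start from essentially the body quoted above, just under its own name (a sketch, not the PR's actual test):
   ```
   test("Nested field pruning for Filter") {
     // Same plan shape as above: Project -> Filter -> Window over nested fields.
     val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, UnspecifiedFrame)
     val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
     val query = contact
       .select($"name.first", winExpr.as('window))
       .where($"window" === 1 && $"name.first" === "a")
       .analyze
     val optimized = Optimize.execute(query)
     val aliases = collectGeneratedAliases(optimized)
     val expected = contact
       .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
       .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
       .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
       .where($"window" === 1 && $"${aliases(0)}" === "a")
       .select($"first", $"window")
       .analyze
     comparePlans(optimized, expected)
   }
   ```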








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499675



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,22 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+/**
+ * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`.
+ * In this case, `Window` can be plan that is `canProjectPushThrough`.
+ * By adding this, it allows nested columns to be passed onto next stages.
+ * Currently, not adding `Filter` into `canProjectPushThrough` due to
+ * infinitely loop in optimizers during the predicate push-down rule.
+ */
+

Review comment:
   nit: remove this unnecessary blank line.








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499643



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -39,6 +39,22 @@ object NestedColumnAliasing {
   NestedColumnAliasing.replaceToAliases(plan, nestedFieldToAlias, 
attrToAliases)
   }
 
+/**
+ * This is to solve a `LogicalPlan` like `Project`->`Filter`->`Window`.

Review comment:
   Is this only for `Window`? It seems not.








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499537



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized = Optimize.execute(query)
+val aliases = collectGeneratedAliases(optimized)
+val expected = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
+  .where($"window" === 1 && $"${aliases(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized, expected)
+  }
+
+  test("Nested field pruning for orderBy") {

Review comment:
   Actually, `test("Nested field pruning for Sort") {`








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499537



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized = Optimize.execute(query)
+val aliases = collectGeneratedAliases(optimized)
+val expected = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
+  .where($"window" === 1 && $"${aliases(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized, expected)
+  }
+
+  test("Nested field pruning for orderBy") {

Review comment:
   Actually, `Nested field pruning for Sort`








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499494



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {

Review comment:
   `Nested field pruning for window functions` -> `Nested field pruning for 
Window`








[GitHub] [spark] maropu commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-06-27 Thread GitBox


maropu commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r446499394



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala
##
@@ -493,6 +491,58 @@ class NestedColumnAliasingSuite extends SchemaPruningTest {
 comparePlans(optimized3, expected3)
   }
 
+  test("Nested field pruning for window functions") {
+val spec = windowSpec($"address" :: Nil, $"id".asc :: Nil, 
UnspecifiedFrame)
+val winExpr = windowExpr(RowNumber().toAggregateExpression(), spec)
+val query = contact.select($"name.first", winExpr.as('window))
+  .where($"window" === 1 && $"name.first" === "a")
+  .analyze
+val optimized = Optimize.execute(query)
+val aliases = collectGeneratedAliases(optimized)
+val expected = contact
+  .select($"name.first", $"address", $"id", $"name.first".as(aliases(1)))
+  .window(Seq(winExpr.as("window")), Seq($"address"), Seq($"id".asc))
+  .select($"first", $"${aliases(1)}".as(aliases(0)), $"window")
+  .where($"window" === 1 && $"${aliases(0)}" === "a")
+  .select($"first", $"window")
+  .analyze
+comparePlans(optimized, expected)
+  }
+
+  test("Nested field pruning for orderBy") {

Review comment:
   Why did you add separate tests for orderBy/sortBy? They produce the same plan, `Sort`.




