[GitHub] [spark] HyukjinKwon commented on a change in pull request #28383: [SPARK-31590][SQL] The filter used by Metadata-only queries should not have Unevaluable

2020-05-01 Thread GitBox


HyukjinKwon commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r418857861



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
##
@@ -117,7 +117,7 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
           case a: AttributeReference =>
             a.withName(relation.output.find(_.semanticEquals(a)).get.name)
         }
-      }
+      }.filterNot(SubqueryExpression.hasSubquery)

Review comment:
   Yeah, let's keep the PR title and description matched.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #28383: [SPARK-31590][SQL] The filter used by Metadata-only queries should not have Unevaluable

2020-04-30 Thread GitBox


HyukjinKwon commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r418417828



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuerySuite.scala
##
@@ -150,4 +150,30 @@ class OptimizeMetadataOnlyQuerySuite extends QueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-31590 The filter used by Metadata-only queries should not have Unevaluable") {
+    withTable("test_tbl") {
+      withSQLConf(OPTIMIZER_METADATA_ONLY.key -> "true") {
+        sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET PARTITIONED BY (d ,h)")

Review comment:
   Can we reuse `testMetadataOnly`?








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28383: [SPARK-31590][SQL] The filter used by Metadata-only queries should not have Unevaluable

2020-04-30 Thread GitBox


HyukjinKwon commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r418415403



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuerySuite.scala
##
@@ -150,4 +150,30 @@ class OptimizeMetadataOnlyQuerySuite extends QueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-31590 The filter used by Metadata-only queries should not have Unevaluable") {
+    withTable("test_tbl") {
+      withSQLConf(OPTIMIZER_METADATA_ONLY.key -> "true") {
+        sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET PARTITIONED BY (d ,h)")

Review comment:
   Can you minimise the test case and make it consistent with the style used 
in this file? I think you can create the partitioned table via the 
`write.partitionBy(...)` syntax instead of relying on SQL syntax here, even 
when you create tables.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28383: [SPARK-31590][SQL] The filter used by Metadata-only queries should not have Unevaluable

2020-04-30 Thread GitBox


HyukjinKwon commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r418415403



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuerySuite.scala
##
@@ -150,4 +150,30 @@ class OptimizeMetadataOnlyQuerySuite extends QueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-31590 The filter used by Metadata-only queries should not have Unevaluable") {
+    withTable("test_tbl") {
+      withSQLConf(OPTIMIZER_METADATA_ONLY.key -> "true") {
+        sql("CREATE TABLE test_tbl (a INT,d STRING,h STRING) USING PARQUET PARTITIONED BY (d ,h)")

Review comment:
   Can you minimise the test case and make it consistent with the style used 
in this file? I think you can create the partitioned table via the 
`write.partitionBy(...)` syntax instead of relying on SQL syntax here.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28383: [SPARK-31590][SQL] The filter used by Metadata-only queries should not have Unevaluable

2020-04-30 Thread GitBox


HyukjinKwon commented on a change in pull request #28383:
URL: https://github.com/apache/spark/pull/28383#discussion_r418408328



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala
##
@@ -119,6 +119,10 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
           }
         }
 
+        if (normalizedFilters.exists(_.find(_.isInstanceOf[Unevaluable]).isDefined)) {
+          return child
+        }

Review comment:
   Why don't you just filter out subqueries, consistently with the other 
normalized filters, via 
`normalizedFilters.filterNot(SubqueryExpression.hasSubquery)`?
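
   To illustrate the suggestion, here is a minimal, self-contained sketch of 
the `filterNot` pattern. It does not use Spark: `Expr`, `Attr`, `Lit`, 
`EqualTo`, `ScalarSubquery` and `FilterSketch` are hypothetical stand-ins for 
Catalyst expressions, and `hasSubquery` only mimics what 
`SubqueryExpression.hasSubquery` is assumed to do (search the expression tree 
for a subquery node).

```scala
// Toy expression tree; illustrative stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class Lit(value: Any) extends Expr { def children: Seq[Expr] = Nil }
case class EqualTo(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
case class ScalarSubquery(sql: String) extends Expr { def children: Seq[Expr] = Nil }

object FilterSketch {
  // True if the expression itself or any descendant is a subquery node.
  def hasSubquery(e: Expr): Boolean = e match {
    case _: ScalarSubquery => true
    case other => other.children.exists(hasSubquery)
  }

  // Drop only the filters that contain a subquery; keep the rest.
  def withoutSubqueries(filters: Seq[Expr]): Seq[Expr] =
    filters.filterNot(hasSubquery)
}
```

   In this sketch, a filter list holding `d = '2020-04-30'` and 
`h = (subquery)` is reduced to the first filter only. Unlike the early 
`return child` in the diff, the filters that do not contain a subquery 
remain available to the rule.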




