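[GitHub] [spark] huaxingao commented on a change in pull request #34248: [SPARK-36647][SQL][TESTS] Push down Aggregate (Min/Max/Count) for Parquet if filter is on partition col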

GitBox Tue, 12 Oct 2021 21:16:43 -0700


huaxingao commented on a change in pull request #34248:
URL: https://github.com/apache/spark/pull/34248#discussion_r727685501




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetAggregatePushDownSuite.scala
##########
@@ -240,6 +239,29 @@ abstract class ParquetAggregatePushDownSuite
     }
   }
 
+  test("aggregate push down - aggregate with partition filter can be pushed down") {
+    withTempPath { dir =>
+      spark.range(10).selectExpr("id", "id % 3 as p")
+        .write.partitionBy("p").parquet(dir.getCanonicalPath)
+      withTempView("tmp") {
+        spark.read.parquet(dir.getCanonicalPath).createOrReplaceTempView("tmp");
+        Seq("false", "true").foreach { enableVectorizedReader =>
+          withSQLConf(SQLConf.PARQUET_AGGREGATE_PUSHDOWN_ENABLED.key -> "true",
+            vectorizedReaderEnabledKey -> enableVectorizedReader) {
+            val max = sql("SELECT max(id) FROM tmp WHERE p = 0")

Review comment:
   Added.
   Group by on a partition column is a little more complicated and needs some
code changes: currently, the returned row contains only the aggregate values.
For group by on a partition column, we will need to pass down the partition
column value and prepend it to the aggregation row. I will open a separate PR
for that work.
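
   To illustrate the "prepend the partition value" idea, here is a minimal
sketch in plain Scala. It does not use Spark's internal row classes; the
`Row` alias, `PartitionAggSketch`, and `prependPartitionValues` are
hypothetical names made up for this example and are not part of the PR. The
data mirrors the test above (`id` in 0 until 10, `p = id % 3`).

```scala
// Hypothetical illustration only; none of these names exist in Spark.
object PartitionAggSketch {
  // Stand-in for a row of values; the real code would use Spark's row types.
  type Row = Seq[Any]

  // Today the pushed-down aggregate returns only the aggregate values,
  // e.g. Seq(max(id)). For GROUP BY on a partition column, the partition
  // value would have to be placed in front of those values.
  def prependPartitionValues(partitionValues: Row, aggRow: Row): Row =
    partitionValues ++ aggRow

  def main(args: Array[String]): Unit = {
    // Same shape as the test data: ids 0..9, partitioned by p = id % 3.
    val maxPerPartition: Map[Long, Long] =
      (0L until 10L).groupBy(_ % 3).map { case (p, ids) => p -> ids.max }

    // One output row per partition: (p, max(id)).
    val rows: Seq[Row] = maxPerPartition.toSeq.sortBy(_._1).map {
      case (p, maxId) => prependPartitionValues(Seq(p), Seq(maxId))
    }
    rows.foreach(println)
  }
}
```

   With this data the three rows hold (0, 9), (1, 7) and (2, 8), which is what
`SELECT p, max(id) FROM tmp GROUP BY p` would be expected to return. How the
partition value actually reaches the reader is left to the follow-up PR.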





[GitHub] [spark] huaxingao commented on a change in pull request #34248: [SPARK-36647][SQL][TESTS] Push down Aggregate (Min/Max/Count) for Parquet if filter is on partition col
