huaxingao commented on a change in pull request #34248:
URL: https://github.com/apache/spark/pull/34248#discussion_r727685501
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetAggregatePushDownSuite.scala
##########
@@ -240,6 +239,29 @@ abstract class ParquetAggregatePushDownSuite
}
}
+ test("aggregate push down - aggregate with partition filter can be pushed down") {
+ withTempPath { dir =>
+ spark.range(10).selectExpr("id", "id % 3 as p")
+ .write.partitionBy("p").parquet(dir.getCanonicalPath)
+ withTempView("tmp") {
+ spark.read.parquet(dir.getCanonicalPath).createOrReplaceTempView("tmp");
+ Seq("false", "true").foreach { enableVectorizedReader =>
+ withSQLConf(SQLConf.PARQUET_AGGREGATE_PUSHDOWN_ENABLED.key -> "true",
+ vectorizedReaderEnabledKey -> enableVectorizedReader) {
+ val max = sql("SELECT max(id) FROM tmp WHERE p = 0")
Review comment:
Added.
Group by on a partition column is a little more complicated and needs some
code changes: currently, the returned row contains only the aggregate values.
To support group by on a partition column, we need to pass down the partition
column value and prepend it to the aggregation row. I will open a separate PR
for that work.
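As a rough illustration of the prepend step, here is a minimal sketch in plain Scala, with no Spark dependency. `PartitionAggregates` and `toGroupByRow` are hypothetical names made up for this example and are not part of Spark or this PR; the data mirrors the test above (`spark.range(10)` partitioned by `id % 3`).

```scala
// Hypothetical sketch: plain Scala collections stand in for Spark's internal rows.
object PartitionGroupBySketch {
  // One partition (e.g. p=0) paired with the aggregate values computed for it.
  // Both field names are assumptions made for illustration only.
  final case class PartitionAggregates(partitionValues: Seq[Any], aggregates: Seq[Any])

  // Today only `aggregates` would be returned. GROUP BY on the partition
  // column additionally needs the partition value in front of those values.
  def toGroupByRow(pa: PartitionAggregates): Seq[Any] =
    pa.partitionValues ++ pa.aggregates

  // Emulates "SELECT p, max(id) FROM tmp GROUP BY p" for ids 0..9 with p = id % 3.
  def groupedMax(): Seq[Seq[Any]] =
    (0L until 10L)
      .groupBy(_ % 3)
      .toSeq
      .sortBy(_._1)
      .map { case (p, ids) => toGroupByRow(PartitionAggregates(Seq(p), Seq(ids.max))) }

  def main(args: Array[String]): Unit =
    groupedMax().foreach(println)
}
```

The real change would operate on Spark's internal row types and the partition values carried by each file, so this only shows the shape of the result row, not the actual implementation.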
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]