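[GitHub] [spark] timarmstrong edited a comment on pull request #33639: [SPARK-36645][SQL] Aggregate (Min/Max/Count) push down for Parquet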

GitBox Wed, 13 Oct 2021 09:41:35 -0700


timarmstrong edited a comment on pull request #33639:
URL: https://github.com/apache/spark/pull/33639#issuecomment-942489610



   > If the aggregate column is on partition column, only Count will be pushed, 
Min or Max will not be pushed down because Parquet doesn't return max/min for 
partition column.
   
   In the traditional Hive table layout, partition columns are not stored in the 
data files at all, so the reader has to materialise the partition column values 
via a different mechanism (e.g. the partition column value is carried in the 
plan metadata somewhere). I don't know the Spark readers well, but I think the 
Spark Parquet reader must have access to the partition values.
   
   So the min/max could be materialised for a partition column too; the values 
just need to come from that mechanism rather than from Parquet itself.
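
To illustrate the idea only (this is not how Spark implements it), here is a minimal Python sketch that derives min/max for a partition column purely from Hive-style `key=value` directory names, without opening any Parquet file. The table paths, the helper names `partition_values` and `min_max_from_partitions`, and the `cast` parameter are all hypothetical. The sketch also assumes every listed partition contains at least one row and ignores null/default partitions.

```python
from urllib.parse import unquote


def partition_values(path):
    """Extract Hive-style `key=value` segments from a file path."""
    values = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            # Assumption: special characters in names are percent-encoded.
            values[unquote(key)] = unquote(value)
    return values


def min_max_from_partitions(paths, column, cast=str):
    """Return (min, max) of a partition column using directory names only."""
    seen = []
    for path in paths:
        values = partition_values(path)
        if column in values:
            # Partition values are strings in the path; cast to the column type.
            seen.append(cast(values[column]))
    if not seen:
        return None
    return min(seen), max(seen)


# Hypothetical table layout, partitioned by year and month.
paths = [
    "warehouse/events/year=2019/month=12/part-00000.parquet",
    "warehouse/events/year=2020/month=1/part-00000.parquet",
    "warehouse/events/year=2021/month=10/part-00000.parquet",
]

print(min_max_from_partitions(paths, "year", int))   # (2019, 2021)
print(min_max_from_partitions(paths, "month", int))  # (1, 12)
```

The cast matters: compared as strings, month "10" would sort before "2", so the partition value has to be converted to the column's declared type before taking min/max.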


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


[GitHub] [spark] timarmstrong edited a comment on pull request #33639: [SPARK-36645][SQL] Aggregate (Min/Max/Count) push down for Parquet
