[GitHub] [spark] cloud-fan commented on pull request #37881: [SPARK-40169][SQL] Don't pushdown Parquet filters with no reference to data schema

GitBox Thu, 15 Sep 2022 07:20:17 -0700


cloud-fan commented on PR #37881:
URL: https://github.com/apache/spark/pull/37881#issuecomment-1248170006


   This seems like a corner case that occurs when data columns and partition 
columns overlap (assuming you didn't set the case sensitivity flag, 
`spark.sql.caseSensitive`, to true).
   
   When data columns and partition columns overlap, Spark reads the actual 
values from the partition columns and ignores the overlapping data columns. See 
`HadoopFsRelation.schema`. That said, in your example, the filter `col > 10` 
should be a partition filter, not a data filter.
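
   To make the overlap rule concrete, here is a minimal toy model in plain 
Python. It is not Spark's implementation: `normalize`, `merge_schema`, and 
`classify_filter` are hypothetical helpers, and the column names are invented 
for illustration. It only assumes what is described above: an overlapping name 
resolves to the partition column, and a filter that references only partition 
columns is treated as a partition filter.

```python
# Toy model only: these helpers are hypothetical and do not exist in Spark.

def normalize(name, case_sensitive=False):
    """Compare names case-insensitively unless case sensitivity is enabled."""
    return name if case_sensitive else name.lower()


def merge_schema(data_cols, partition_cols, case_sensitive=False):
    """Merge data and partition columns into (name, source) pairs.

    A data column whose name overlaps a partition column is replaced by the
    partition column; the remaining partition columns are appended at the end.
    """
    part_by_key = {normalize(c, case_sensitive): c for c in partition_cols}
    merged = []
    overlapped = set()
    for col in data_cols:
        key = normalize(col, case_sensitive)
        if key in part_by_key:
            merged.append((part_by_key[key], "partition"))
            overlapped.add(key)
        else:
            merged.append((col, "data"))
    for col in partition_cols:
        if normalize(col, case_sensitive) not in overlapped:
            merged.append((col, "partition"))
    return merged


def classify_filter(referenced_cols, partition_cols, case_sensitive=False):
    """Return "partition" if every referenced column is a partition column."""
    part_keys = {normalize(c, case_sensitive) for c in partition_cols}
    refs = {normalize(c, case_sensitive) for c in referenced_cols}
    return "partition" if refs and refs <= part_keys else "data"


if __name__ == "__main__":
    data_cols = ["COL", "value"]   # columns stored in the data files
    partition_cols = ["col"]       # columns encoded in the directory layout

    print(merge_schema(data_cols, partition_cols))
    print(classify_filter(["col"], partition_cols))
    # With case sensitivity enabled, "COL" and "col" no longer overlap.
    print(merge_schema(data_cols, partition_cols, case_sensitive=True))
    print(classify_filter(["COL"], partition_cols, case_sensitive=True))
```

   In this model, a filter on `col` is classified as a partition filter in the 
default case-insensitive mode, which matches the expectation stated above for 
`col > 10`.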


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

