revans2 opened a new pull request, #37747: URL: https://github.com/apache/spark/pull/37747
### What changes were proposed in this pull request?

This fixes SPARK-40280 by normalizing a Parquet int/long that carries optional metadata so that it looks like the expected version without the extra metadata.

### Why are the changes needed?

This allows predicate pushdown in Parquet to work when reading files that are compliant with the Parquet specification but different from what Spark writes.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

I added unit tests that cover this use case. I also did some manual testing on some queries to verify that less data is actually read after this change.
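
As a rough illustration of the normalization described above (not the code in this PR), the idea can be sketched in Python. The Parquet specification lets a writer annotate an `int32` column with `INT(32, true)` or an `int64` column with `INT(64, true)`, which describes the same values as the unannotated physical type. The class and function names below are hypothetical, and the assumption that only signed, same-width annotations are dropped is mine; the PR description does not list the exact cases it covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntAnnotation:
    """Stand-in for Parquet's INT(bitWidth, isSigned) logical type annotation."""
    bit_width: int
    signed: bool


@dataclass(frozen=True)
class PrimitiveType:
    """Hypothetical model of a Parquet primitive column type."""
    physical: str                              # e.g. "INT32" or "INT64"
    annotation: Optional[IntAnnotation] = None


# Bit widths of the two physical types this sketch covers.
_NATIVE_WIDTH = {"INT32": 32, "INT64": 64}


def normalize(t: PrimitiveType) -> PrimitiveType:
    """Drop an annotation that only restates the physical type's default.

    Assumption: only a signed annotation whose width equals the physical
    width is treated as redundant; anything else is returned unchanged.
    """
    a = t.annotation
    if a is not None and a.signed and _NATIVE_WIDTH.get(t.physical) == a.bit_width:
        return PrimitiveType(t.physical)
    return t


if __name__ == "__main__":
    annotated = PrimitiveType("INT32", IntAnnotation(32, True))
    # After normalization the column compares equal to a plain INT32,
    # so a filter builder keyed on the plain type would recognize it.
    print(normalize(annotated) == PrimitiveType("INT32"))
```

In this sketch an unsigned or narrower annotation such as `INT(8, true)` is left alone, since it changes how the stored values should be interpreted.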
