Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/511#issuecomment-41361251
But are there any realistic workloads where you'd want to turn this on all
the time, or turn it off all the time? It seems that in an ad-hoc query
workload, you'll have some queries that can use this, and some that can't. You
should just pick whether you want it as a default. Personally I'd go for it
unless the cost is super high in the cases where it doesn't work, because I
imagine filtering is pretty common in large schemas and I hope Parquet itself
optimizes this down the line.