Github user andreweduffy commented on the issue:
https://github.com/apache/spark/pull/14671
Yeah, benchmarking is definitely a great idea, since Spark will likely be
faster than Parquet at filtering individual records. But I'm still not quite
understanding why this filter is any different, or why it should block on a
row-by-row filtering decision. To my understanding, _all_ pushed-down filters
are evaluated row-by-row through ParquetRecordReader, and this one is no
different from any of the others in ParquetFilters.
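To make the row-by-row point concrete, here is a minimal conceptual sketch in plain Scala. The `Record` type and `rowByRowFilter` helper are hypothetical stand-ins, not the actual ParquetRecordReader or ParquetFilters APIs; the point is only that a pushed-down predicate, like any other, is ultimately applied to each materialized record in turn:

```scala
// Hypothetical stand-in for a materialized row; not a real Parquet type.
case class Record(id: Int, name: String)

// A pushed-down predicate, analogous to an entry built in ParquetFilters,
// is still evaluated against every record the reader produces.
def rowByRowFilter(records: Seq[Record], pred: Record => Boolean): Seq[Record] =
  records.filter(pred)

val records = Seq(Record(1, "a"), Record(2, "b"), Record(3, "c"))
val kept = rowByRowFilter(records, _.id > 1)
println(kept.map(_.id).mkString(","))  // prints 2,3
```

Under that view, any one filter in ParquetFilters costs the same per-record check as the rest, which is why this one doesn't look special.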