sadikovi commented on PR #37419:
URL: https://github.com/apache/spark/pull/37419#issuecomment-1212757388
Not exactly: the filter does reference columns that exist in the file.
What matters in the code, apparently, is the projection.
Here is what they have in the javadoc:
```
* @param paths
*          the paths of the columns used in the actual projection; a
*          column not being part of the projection will be
*          handled as containing {@code null} values only even if the
*          column has values written in the file
```
https://github.com/apache/parquet-mr/blob/0819356a9dafd2ca07c5eab68e2bffeddc3bd3d9/parquet-column/src/main/java/org/apache/parquet/internal/filter2/columnindex/ColumnIndexFilter.java#L80
I am not very familiar with the implementation, but I think the library
should be returning all rows instead of empty rows.
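
To make the javadoc's wording concrete, here is a toy model of the behavior as I read it. This is not parquet-mr code: the class `ProjectionNullSketch`, the method `matchingRows`, and the in-memory "file" are all hypothetical and only illustrate how a column left out of the projection ends up being treated as all nulls by an equality filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT parquet-mr code. All names here are hypothetical.
public class ProjectionNullSketch {

    // Returns the indexes of rows where `column == expected`, treating any column
    // that is not in `projection` as if every value in it were null.
    static List<Integer> matchingRows(Map<String, int[]> file,
                                      Set<String> projection,
                                      String column,
                                      int expected) {
        List<Integer> rows = new ArrayList<>();
        if (!projection.contains(column)) {
            // Column outside the projection: modelled as all nulls, so an
            // equality predicate against a non-null value matches nothing.
            return rows;
        }
        int[] values = file.getOrDefault(column, new int[0]);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == expected) {
                rows.add(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, int[]> file =
            Map.of("a", new int[] {1, 2, 1}, "b", new int[] {7, 8, 9});

        // Filter column is part of the projection: prints [0, 2].
        System.out.println(matchingRows(file, Set.of("a", "b"), "a", 1));

        // Same filter, but "a" is left out of the projection: prints [],
        // even though the file contains matching values.
        System.out.println(matchingRows(file, Set.of("b"), "a", 1));
    }
}
```

The second call is the case discussed here: the filter column exists in the file, but because it is not in the projection the sketch returns no rows.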