Hi, I know that in ORC, with SearchArguments and the row index, we can skip reading and decoding row groups that fall outside the range of the predicate. But does ORC have late materialization functionality? That is, after decoding and evaluating the predicate column(s), we would read and decode only the row groups of the projection columns in which the matching rows reside. This can further reduce IO and decoding overhead. It seems the C++ version does not have this. I am asking because parquet-rs recently added this: https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
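To make clear what I mean by late materialization, here is a minimal sketch in Python. It is only a toy in-memory model, not the ORC API: the names `late_materialize`, `predicate_col`, and `projection_cols` are hypothetical, and each "row group" is just a list of already-available integers standing in for an encoded row group on disk.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for one row group of one column (assumption: plain ints).
RowGroup = List[int]


def late_materialize(
    predicate_col: List[RowGroup],
    projection_cols: Dict[str, List[RowGroup]],
    predicate: Callable[[int], bool],
) -> Tuple[List[Dict[str, int]], int]:
    """Return (matching rows, number of projection row groups touched)."""
    rows: List[Dict[str, int]] = []
    touched = 0
    for g, group in enumerate(predicate_col):
        # Step 1: decode only the predicate column and evaluate the filter.
        selected = [i for i, value in enumerate(group) if predicate(value)]
        if not selected:
            # No matching row here, so the projection columns for this
            # row group are never read or decoded.
            continue
        # Step 2: read the projection columns only for this row group.
        touched += len(projection_cols)
        for i in selected:
            rows.append({name: col[g][i] for name, col in projection_cols.items()})
    return rows, touched


if __name__ == "__main__":
    pred = [[1, 2, 3], [10, 11, 12], [4, 5, 6]]
    proj = {"a": [[100, 200, 300], [400, 500, 600], [700, 800, 900]]}
    print(late_materialize(pred, proj, lambda v: v >= 11))
```

In this model only one of the three row groups of column `a` is touched, whereas a reader without late materialization would decode all three unless min/max statistics happened to exclude them.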
Another question is about the row index. Since each row group is logically 10000 rows and may not align with CompressionChunk boundaries, does this cause issues for predicate pushdown? E.g., even if we can skip one row group, we may still need to do IO on the boundary CompressionChunks. Thanks a lot, Xinyu