Hi,

I know that in ORC with SearchArguments and row index, we can skip
reading and decoding row groups that are out of the range of
predicate. But does ORC have late materialization functionality?
That is, after decoding and evaluating the predicate column(s), we
would read and decode only those row groups of the projection columns
that contain matching rows. This could further reduce IO and decoding
overhead. It seems the C++ version does not have this. I am asking
because parquet-rs recently added this:
https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
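
To make the question concrete, here is a toy sketch of what I mean by
late materialization. It is not the ORC API: columns are plain Python
lists split into row groups, and every name in it (ROWS_PER_GROUP,
split_into_row_groups, late_materialize) is made up for illustration.

```python
# Toy in-memory model, NOT the ORC reader API. All names are hypothetical.
ROWS_PER_GROUP = 4  # small stand-in for a real row index stride


def split_into_row_groups(values, rows_per_group=ROWS_PER_GROUP):
    """Split a flat column into consecutive row groups."""
    return [values[i:i + rows_per_group]
            for i in range(0, len(values), rows_per_group)]


def late_materialize(predicate_col, projection_col, predicate):
    """Evaluate the predicate column first, then touch only the
    projection row groups that contain at least one matching row."""
    results = []
    touched_groups = []
    for group_id, pred_group in enumerate(predicate_col):
        matches = [i for i, v in enumerate(pred_group) if predicate(v)]
        if not matches:
            continue  # this projection row group is never read or decoded
        touched_groups.append(group_id)
        proj_group = projection_col[group_id]  # "read + decode" would happen here
        results.extend(proj_group[i] for i in matches)
    return results, touched_groups


ids = split_into_row_groups(list(range(12)))
names = split_into_row_groups([f"row{i}" for i in range(12)])
rows, touched = late_materialize(ids, names, lambda v: v in (1, 9))
print(rows, touched)
```

In this sketch the middle row group of the projection column is never
touched, which is the saving I am asking about.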

Another question is about the row index. Since each row group is
logically 10000 rows (by default) and may not align with
CompressionChunk boundaries, does this cause issues for predicate
pushdown? E.g., even if we can skip one row group, we may still need
to do IO on the boundary CompressionChunks.
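
A rough illustration of the boundary concern, under simplifying
assumptions of my own: row group start positions are given as
uncompressed byte offsets into one stream, and every compression chunk
covers a fixed number of uncompressed bytes. The function
chunks_to_read and all the numbers are hypothetical, not taken from
the ORC code.

```python
def chunks_to_read(row_group_offsets, stream_length, chunk_size, selected):
    """Return the sorted indices of compression chunks that overlap the
    selected row groups.

    row_group_offsets[i] is the (assumed) uncompressed byte offset at
    which row group i starts; chunk_size is the (assumed) fixed number
    of uncompressed bytes per compression chunk.
    """
    needed = set()
    for g in selected:
        start = row_group_offsets[g]
        if g + 1 < len(row_group_offsets):
            end = row_group_offsets[g + 1]
        else:
            end = stream_length
        if end <= start:
            continue  # empty row group, nothing to read
        first = start // chunk_size
        last = (end - 1) // chunk_size
        needed.update(range(first, last + 1))
    return sorted(needed)


# Four row groups of 150 bytes each, chunks of 256 bytes (chunks 0, 1, 2).
offsets = [0, 150, 300, 450]
needed = chunks_to_read(offsets, stream_length=600, chunk_size=256,
                        selected=[0, 2])
print(needed)
```

Here row group 1 is skipped by the predicate, but it spans chunks 0 and
1, and both of those still have to be read and decompressed because
row groups 0 and 2 live in them. Only chunk 2 is avoided.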

Thanks a lot,
Xinyu
