Hi Xinyu,

The C++ library does not provide lazy materialization. The Java library
supports row-level filtering; please check it out if interested:
https://issues.apache.org/jira/browse/ORC-577
Regarding the IO amplification introduced by PPD, I think we have
discussed this earlier and there is a pending work item:
https://issues.apache.org/jira/browse/ORC-1264

Best,
Gang

On Mon, Jan 16, 2023 at 5:41 PM Xinyu Z <xzen...@gmail.com> wrote:
> Hi,
>
> I know that in ORC, with SearchArguments and the row index, we can skip
> reading and decoding row groups that are outside the range of a
> predicate. But does ORC have late materialization functionality?
> Basically, after decoding and evaluating the predicate column(s), we
> could read and decode only the row groups of the projection columns
> where the matching rows reside. This can further reduce IO and decoding
> overhead. It seems the C++ version does not have this. I am asking
> because parquet-rs recently added this:
>
> https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
>
> Another question is about the row index. Since each row group is
> logically 10000 rows and may not align with CompressionChunk boundaries,
> does this cause issues for predicate pushdown? E.g., even if we can skip
> one row group, we may still need to do IO on the boundary
> CompressionChunks.
>
> Thanks a lot,
> Xinyu
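
For readers following the thread, here is a minimal, hypothetical sketch of
the late-materialization idea described in the quoted question. This is not
the ORC (or parquet-rs) API; columns are plain Python lists, the row-group
size is shrunk for illustration, and the predicate is evaluated on decoded
values rather than via min/max statistics. It only shows the control flow:
decode the predicate column per row group, and decode a projection row group
only when that group contains at least one matching row.

```python
# Hypothetical sketch of late materialization over row groups.
# Assumed names (scan, row_groups, ROW_GROUP_SIZE) are for illustration only.

ROW_GROUP_SIZE = 4  # real ORC uses 10000 rows per row group

def row_groups(column):
    """Split a column into fixed-size row groups."""
    return [column[i:i + ROW_GROUP_SIZE]
            for i in range(0, len(column), ROW_GROUP_SIZE)]

def scan(pred_col, proj_col, predicate):
    """Decode the predicate column first; decode projection row groups
    only where at least one row matched (late materialization)."""
    results = []
    for pg, jg in zip(row_groups(pred_col), row_groups(proj_col)):
        # If no row in this group matches, skip decoding the
        # projection group entirely (this is the IO/decoding saving).
        if not any(predicate(v) for v in pg):
            continue
        # Only now "decode" the projection group and keep matching rows.
        results.extend(j for p, j in zip(pg, jg) if predicate(p))
    return results

ids = [1, 2, 3, 4, 5, 6, 7, 8]
names = ["a", "b", "c", "d", "e", "f", "g", "h"]
print(scan(ids, names, lambda x: x > 6))  # only the second row group is decoded
```

In a real reader the `any(...)` check would be replaced by the row index's
min/max statistics, so the skip decision is made before decoding anything.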