Hi Xinyu,

The C++ library does not provide lazy materialization. The Java library
supports row-level filtering; please check it out if you are interested:
https://issues.apache.org/jira/browse/ORC-577

Regarding the IO amplification introduced by predicate pushdown (PPD), I
think we discussed this earlier, and there is a pending work item:
https://issues.apache.org/jira/browse/ORC-1264
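To illustrate the amplification: when row groups do not align with
compression chunk boundaries, a skipped row group can still force IO on its
boundary chunks. A toy sketch of that effect (the chunk size and byte
offsets here are hypothetical, not actual ORC internals):

```python
# Toy model: row-group data is laid out inside fixed-size compression
# chunks, so a row group whose bytes start or end mid-chunk shares its
# boundary chunks with its neighbors.

CHUNK_SIZE = 256 * 1024  # hypothetical compression chunk size in bytes

def chunks_to_read(byte_ranges, chunk_size=CHUNK_SIZE):
    """Map the byte ranges of the row groups we must decode to the set
    of compression chunks we must actually read from storage."""
    chunks = set()
    for start, end in byte_ranges:
        first = start // chunk_size
        last = (end - 1) // chunk_size  # end is exclusive
        chunks.update(range(first, last + 1))
    return sorted(chunks)

# Three consecutive row groups; PPD lets us skip the middle one, but its
# neighbors share chunks with it, so in this layout skipping saves no IO.
row_groups = [(0, 300_000), (300_000, 600_000), (600_000, 900_000)]
selected = [row_groups[0], row_groups[2]]  # middle group skipped by PPD

print(chunks_to_read(row_groups))  # chunks for all three groups
print(chunks_to_read(selected))    # chunks still needed after skipping
```

In this example both calls return the same chunk set, which is exactly the
boundary-chunk IO that ORC-1264 is about reducing.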

Best,
Gang

On Mon, Jan 16, 2023 at 5:41 PM Xinyu Z <xzen...@gmail.com> wrote:

> Hi,
>
> I know that with SearchArguments and the row index, ORC can skip
> reading and decoding row groups that fall outside the predicate's
> range. But does ORC have late materialization functionality?
> Basically, after decoding and evaluating the predicate column(s), we
> would read and decode only the row groups of the projection columns
> where the matching rows reside. This could further reduce IO and
> decoding overhead. It seems the C++ version does not have this. I am
> asking because parquet-rs recently added this:
>
> https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
>
> Another question is about the row index. Since each row group is
> logically 10,000 rows and may not align with CompressionChunk
> boundaries, does this cause issues for predicate pushdown? E.g., even
> if we can skip a row group, we may still need to do IO on the boundary
> CompressionChunks.
>
> Thanks a lot,
> Xinyu
>