Hi Xinyu,

When the row index stride is set to 100, each stripe ends up with many row groups, and each row group contributes a protobuf object to the stripe index. That is why loadStripeIndex() shows up as the most expensive function in your profile.
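
To give a rough sense of the scale, here is a minimal sketch of how the number of index entries grows as the stride shrinks. The helper names are hypothetical and not part of the ORC C++ API, and the sketch assumes one row index entry per row group per column.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the ORC C++ API.
// Number of row groups in one stripe for a given rowIndexStride.
inline uint64_t rowGroupsPerStripe(uint64_t rowsInStripe, uint64_t rowIndexStride) {
  assert(rowIndexStride > 0);
  // Ceiling division: the last row group may be partially filled.
  return (rowsInStripe + rowIndexStride - 1) / rowIndexStride;
}

// Hypothetical helper: total row index entries to parse for one stripe,
// assuming one entry per row group per column.
inline uint64_t indexEntriesPerStripe(uint64_t rowsInStripe,
                                      uint64_t rowIndexStride,
                                      uint64_t numColumns) {
  return rowGroupsPerStripe(rowsInStripe, rowIndexStride) * numColumns;
}
```

With made-up numbers, a stripe of 1,000,000 rows and 10 columns has 1,000 entries at a stride of 10,000, but 100,000 entries at a stride of 100.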

Note also that smaller row groups may not reduce I/O, because compression blocks are by design not aligned to row group boundaries. For example, if one compression block contains 5 row groups and only the 3rd one survives predicate pushdown (PPD), we still have to read the entire compressed block and decompress the two row groups that precede the 3rd one.

Hope this helps.

Best,
Gang

On Mon, Sep 5, 2022 at 4:15 PM Xinyu Z <xzen...@gmail.com> wrote:
> Hi community,
>
> I am using ORC C++ with filter pushdown (using similar approaches in
> TestPredicatePushdown.cc). By varying rowIndexStride, I found that for
> a low selectivity query, which means smaller rowIndexStride should
> eliminate more IO, the scan time even goes up. This typically happens
> when rowIndexStride is below 1000.
>
> A simple perf profiling shows that for an extreme case where I set
> rowIndexStride=100, the time cost is from loadStripeIndex(). I was
> wondering why? Is this because of the cost of protobuf parsing of a
> lot of indexes?
>
> Thanks a lot,
> Xinyu