Hi all, I am trying to read ORC files in chunks. I have a very large ORC file on disk (say 100 GB) and very limited memory (e.g., I can buffer at most 1 MB of data in memory). I want to scan the ORC file intelligently:
1. Read the file footer.
2. Get the addresses of the stripes.
3. Read the first stripe's metadata (stripe footer) and apply some filters.
4. Read the first stripe's index.
5. Read the first stripe's data (chunk by chunk, 1 MB at a time).
6. Move on to the next stripe.

I have tried to use MemoryInputStream.hh from the ORC repo: https://github.com/apache/orc/blob/main/c++/test/MemoryInputStream.hh

However, while reading the data, its read method can access large amounts of data (well beyond 1 MB), since it simply memcpys out of a buffer that holds the entire file:

```cpp
virtual void read(void* buf, uint64_t length, uint64_t offset) override {
  memcpy(buf, buffer + offset, length);
}
```

So, is there a way to read/access different parts of the ORC file incrementally, with a limited in-memory buffer? Or do I have to materialize the whole ORC file on local disk or in memory?

Thanks!
Jeyhun