GitHub user pitrou added a comment to the discussion: It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files
In addition to `batch_readahead`, you can also try the [`cache_metadata`](https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset11ScanOptions14cache_metadataE) option.

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13765913