GitHub user adamreeve added a comment to the discussion: It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files
You might want to try creating `ParquetFragmentScanOptions` and setting them on the `ScannerBuilder`: enable a buffered stream on the `parquet::ReaderProperties` and disable pre-buffering on the `parquet::ArrowReaderProperties`. Together, those options should reduce memory use when reading Parquet files. See this issue for more context: https://github.com/apache/arrow/issues/46935

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13676765
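For reference, a minimal sketch of that combination with the Arrow C++ dataset API might look like the following. The file path, the column name `x`, and the filter expression are hypothetical, and error handling is abbreviated with `ARROW_ASSIGN_OR_RAISE` / `ARROW_RETURN_NOT_OK`:

```cpp
// Sketch: scan a single Parquet file as a dataset with a pushed-down filter,
// using a buffered stream and no pre-buffering to keep peak memory low.
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/filesystem/api.h>
#include <parquet/properties.h>

arrow::Result<std::shared_ptr<arrow::Table>> ReadFiltered(const std::string& path) {
  // Wrap the single Parquet file in a FileSystemDataset.
  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();
  ARROW_ASSIGN_OR_RAISE(
      auto factory,
      arrow::dataset::FileSystemDatasetFactory::Make(
          fs, {path}, format, arrow::dataset::FileSystemFactoryOptions{}));
  ARROW_ASSIGN_OR_RAISE(auto dataset, factory->Finish());

  // Fragment-level Parquet options: stream column chunks through a buffer
  // instead of reading them whole, and don't pre-buffer row group bytes.
  auto parquet_options =
      std::make_shared<arrow::dataset::ParquetFragmentScanOptions>();
  parquet_options->reader_properties->enable_buffered_stream();
  parquet_options->arrow_reader_properties->set_pre_buffer(false);

  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  ARROW_RETURN_NOT_OK(builder->FragmentScanOptions(parquet_options));
  // Hypothetical predicate on a column "x"; pushed down so row groups whose
  // statistics rule it out can be skipped.
  ARROW_RETURN_NOT_OK(builder->Filter(arrow::compute::greater(
      arrow::compute::field_ref("x"), arrow::compute::literal(0))));
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  return scanner->ToTable();
}
```

Note that pre-buffering trades memory for fewer, larger I/O calls, so turning it off may mean more small reads; whether that is a net win depends on your storage (local disk vs. something high-latency like S3).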