GitHub user adamreeve added a comment to the discussion: It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files
You might want to try creating `ParquetFragmentScanOptions` and setting them on the `ScannerBuilder`: enable a buffered stream on the `parquet::ReaderProperties` and disable pre-buffering on the `parquet::ArrowReaderProperties`. Together, those options should reduce memory use when reading Parquet files. See this issue for more context: https://github.com/apache/arrow/issues/46935

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13676765
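For reference, a minimal sketch of that combination with the Arrow C++ dataset API might look like the following. The file path, the column name `x`, and the filter expression are hypothetical, and error handling is abbreviated with `ARROW_ASSIGN_OR_RAISE` / `ARROW_RETURN_NOT_OK`:

```cpp
// Sketch: scan a single Parquet file as a dataset with a pushed-down filter,
// using a buffered stream and no pre-buffering to keep peak memory low.
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/compute/api.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/filesystem/api.h>
#include <parquet/properties.h>

arrow::Result<std::shared_ptr<arrow::Table>> ReadFiltered(const std::string& path) {
  // Wrap the single Parquet file in a FileSystemDataset.
  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();
  ARROW_ASSIGN_OR_RAISE(
      auto factory,
      arrow::dataset::FileSystemDatasetFactory::Make(
          fs, {path}, format, arrow::dataset::FileSystemFactoryOptions{}));
  ARROW_ASSIGN_OR_RAISE(auto dataset, factory->Finish());

  // Fragment-level Parquet options: stream column chunks through a buffer
  // instead of reading them whole, and don't pre-buffer row group bytes.
  auto parquet_options =
      std::make_shared<arrow::dataset::ParquetFragmentScanOptions>();
  parquet_options->reader_properties->enable_buffered_stream();
  parquet_options->arrow_reader_properties->set_pre_buffer(false);

  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  ARROW_RETURN_NOT_OK(builder->FragmentScanOptions(parquet_options));
  // Hypothetical predicate on a column "x"; pushed down so row groups whose
  // statistics rule it out can be skipped.
  ARROW_RETURN_NOT_OK(builder->Filter(arrow::compute::greater(
      arrow::compute::field_ref("x"), arrow::compute::literal(0))));
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  return scanner->ToTable();
}
```

Note that pre-buffering trades memory for fewer, larger I/O calls, so turning it off may mean more small reads; whether that is a net win depends on your storage (local disk vs. something high-latency like S3).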