GitHub user adamreeve added a comment to the discussion: It is possible to 
reduce peak memory usage when using datasets (to use predicate pushdown) when 
reading single parquet files

You might want to try creating a `ParquetFragmentScanOptions` object and setting it 
on the `ScannerBuilder`, with a buffered stream enabled on the 
`parquet::ReaderProperties` and pre-buffering disabled on the 
`parquet::ArrowReaderProperties`. Together those options should reduce memory 
use when reading Parquet files. See this issue for more context: 
https://github.com/apache/arrow/issues/46935
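
For reference, here's a minimal C++ sketch of one way to wire those options together 
with the Dataset API. The file path "data.parquet", the column name "x", and the 
filter value are placeholders I've made up for illustration, not anything from the 
original discussion:

```cpp
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>
#include <arrow/compute/expression.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/filesystem/api.h>
#include <parquet/properties.h>

arrow::Result<std::shared_ptr<arrow::RecordBatchReader>> OpenFiltered() {
  namespace ds = arrow::dataset;
  namespace cp = arrow::compute;

  // Build a dataset over a single local Parquet file (placeholder path).
  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
  auto format = std::make_shared<ds::ParquetFileFormat>();
  std::vector<std::string> paths = {"data.parquet"};
  ARROW_ASSIGN_OR_RAISE(
      auto factory, ds::FileSystemDatasetFactory::Make(
                        fs, paths, format, ds::FileSystemFactoryOptions{}));
  ARROW_ASSIGN_OR_RAISE(auto dataset, factory->Finish());

  // The options mentioned above: buffered stream on, pre-buffering off.
  auto parquet_scan_options =
      std::make_shared<ds::ParquetFragmentScanOptions>();
  parquet_scan_options->reader_properties->enable_buffered_stream();
  parquet_scan_options->arrow_reader_properties->set_pre_buffer(false);

  ARROW_ASSIGN_OR_RAISE(auto builder, dataset->NewScan());
  ARROW_RETURN_NOT_OK(builder->FragmentScanOptions(parquet_scan_options));
  // Predicate pushdown: filter on a placeholder column "x".
  ARROW_RETURN_NOT_OK(
      builder->Filter(cp::greater(cp::field_ref("x"), cp::literal(0))));
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());

  // Stream record batches rather than materializing the whole table at once.
  return scanner->ToRecordBatchReader();
}
```

Roughly speaking, the buffered stream reads column chunks in small pieces instead of 
loading each chunk whole, and disabling pre-buffering avoids fetching row-group byte 
ranges ahead of decoding, so peak memory stays closer to the size of the batches you 
actually consume.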

GitHub link: 
https://github.com/apache/arrow/discussions/47003#discussioncomment-13676765

