GitHub user adamreeve added a comment to the discussion: It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files
One other option that comes to mind is reducing `batch_readahead`. I believe it is 16 by default, so reducing it to something low like 1, or disabling it completely by setting it to 0, should reduce memory use too. `fragment_readahead` probably won't have any effect if you are only reading one Parquet file.

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13714353
