GitHub user aavbsouza added a comment to the discussion: Is it possible to reduce peak memory usage when using datasets (for predicate pushdown) to read single Parquet files?
Hello. The suggestion by @adamreeve to reduce batch_readahead was effective at lowering memory consumption, at the cost of a longer read time. What I found more unexpected is that the memory used (with a readahead of 16) was many times greater than the size of the uncompressed data: about 70 GB, while the Parquet file is 6.8 GB and the saved column 7.6 GB. @pitrou I built the Arrow library using vcpkg with the jemalloc feature; switching the memory-pool environment variable (ARROW_DEFAULT_MEMORY_POOL) to "system" reduced the max RSS (measured with time -v) to about half of what it was with the jemalloc pool.

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13739945

----
This is an automatically sent email for user@arrow.apache.org. To unsubscribe, please send an email to: user-unsubscr...@arrow.apache.org