GitHub user adamreeve added a comment to the discussion: It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files
It's interesting that you see reduced RSS with the system allocator. In some tests I did recently, streaming record batches from Parquet without the dataset API, the max RSS with jemalloc was a lot lower than with the system allocator. I guess heap management is complicated, and no single allocator will be best for all workloads.

GitHub link: https://github.com/apache/arrow/discussions/47003#discussioncomment-13745844
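For anyone who wants to repeat this kind of comparison, here is a rough sketch, not the test described above. It assumes pyarrow is installed on a Unix-like system and that the allocator is chosen through Arrow's `ARROW_DEFAULT_MEMORY_POOL` environment variable. The helper names (`child_env`, `measure`), the batch size, and the file path passed on the command line are all made up for illustration. Each allocator runs in its own process because max RSS is a per-process high-water mark that never goes back down.

```python
import os
import resource  # Unix only
import subprocess
import sys

# Child program: stream record batches from one Parquet file without the
# dataset API, then print the active pool backend, the row count, and this
# process's peak RSS. The batch size is an arbitrary choice for illustration.
CHILD = """
import resource, sys
import pyarrow as pa
import pyarrow.parquet as pq

rows = 0
for batch in pq.ParquetFile(sys.argv[1]).iter_batches(batch_size=64 * 1024):
    rows += batch.num_rows
print(pa.default_memory_pool().backend_name, rows,
      resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def child_env(allocator):
    """Copy the current environment and request an Arrow memory pool backend."""
    env = dict(os.environ)
    env["ARROW_DEFAULT_MEMORY_POOL"] = allocator
    return env


def measure(path, allocator):
    """Run the child in a fresh process and return (backend, rows, max_rss)."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD, path],
        env=child_env(allocator),
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    backend, rows, max_rss = out.split()
    # ru_maxrss is in kibibytes on Linux and in bytes on macOS.
    return backend, int(rows), int(max_rss)


if __name__ == "__main__" and len(sys.argv) > 1:
    for allocator in ("system", "jemalloc", "mimalloc"):
        print(allocator, measure(sys.argv[1], allocator))
```

The child prints the backend name it actually got, since a pyarrow build may not include every allocator and can fall back to its default. The numbers will depend heavily on the file, the batch size, and the platform, so treat any single run as a data point rather than a verdict on which allocator is better.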