GitHub user adamreeve added a comment to the discussion: It is possible to 
reduce peak memory usage when using datasets (to use predicate pushdown) when 
reading single parquet files

It's interesting that you see reduced RSS with the system allocator. In some
tests I ran recently that streamed record batches from Parquet without using
the dataset API, the max RSS with jemalloc was a lot lower than with the system
allocator. I guess heap management is complicated, and no single allocator
will be best for all workloads.

GitHub link: 
https://github.com/apache/arrow/discussions/47003#discussioncomment-13745844

----
This is an automatically sent email for user@arrow.apache.org.
To unsubscribe, please send an email to: user-unsubscr...@arrow.apache.org
