Re: [D] It is possible to reduce peak memory usage when using datasets (to use predicate pushdown) when reading single parquet files [arrow]

via GitHub Sun, 13 Jul 2025 14:04:49 -0700


GitHub user adamreeve added a comment to the discussion: It is possible to 
reduce peak memory usage when using datasets (to use predicate pushdown) when 
reading single parquet files


It's interesting that you see reduced RSS with the system allocator. In some 
tests I ran recently that streamed record batches from Parquet without using 
the dataset API, the max RSS with jemalloc was a lot lower than with the system 
allocator. I guess heap management is complicated and no single allocator will 
be best for all workloads.

GitHub link: 
https://github.com/apache/arrow/discussions/47003#discussioncomment-13745844

