Wes McKinney created ARROW-6570:
-----------------------------------

             Summary: [Python] Use MemoryPool to allocate memory for NumPy arrays in to_pandas calls
                 Key: ARROW-6570
                 URL: https://issues.apache.org/jira/browse/ARROW-6570
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.15.0

It occurred to me that we can likely improve the performance and scalability of {{Table.to_pandas}} and the other {{to_pandas}} methods by using the active MemoryPool to allocate memory for the resulting NumPy arrays rather than letting NumPy use the system allocator. We would need to use the {{PyCapsule}} approach to set a {{shared_ptr<Buffer>}} as the base of the created NumPy arrays.

This has the additional benefit of tracking NumPy-related allocations in the MemoryPool, so we will have a more precise accounting of allocated memory.