Wes McKinney created ARROW-6570:
-----------------------------------

             Summary: [Python] Use MemoryPool to allocate memory for NumPy arrays in to_pandas calls
                 Key: ARROW-6570
                 URL: https://issues.apache.org/jira/browse/ARROW-6570
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.15.0


It occurred to me that we can likely improve the performance and scalability of 
{{Table.to_pandas}} and the other {{to_pandas}} methods by using the active 
MemoryPool to allocate memory for the resulting NumPy arrays rather than 
letting NumPy use the system allocator. To do this we would need to use the 
{{PyCapsule}} approach: wrap a {{shared_ptr<Buffer>}} in a capsule and set it 
as the base of each created NumPy array, so that the buffer is released when 
the array is garbage-collected.

This has the additional benefit of tracking NumPy-related allocations in the 
MemoryPool so we will have a more precise accounting of allocated memory. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)