[GitHub] [arrow] pitrou commented on a change in pull request #9626: ARROW-11855: [C++][Python] Memory leak in to_pandas when converting chunked struct array

GitBox Thu, 04 Mar 2021 01:59:51 -0800


pitrou commented on a change in pull request #9626:
URL: https://github.com/apache/arrow/pull/9626#discussion_r587322650




##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -2272,6 +2272,30 @@ def test_to_pandas(self):
         series = pd.Series(arr.to_pandas())
         tm.assert_series_equal(series, expected)
 
+    def test_to_pandas_multiple_chunks(self):
+        # ARROW-11855
+        bytes_start = pa.total_allocated_bytes()

Review comment:
       Probably want to call `gc.collect()` just before this, to avoid false positives.
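
A minimal sketch of that suggestion, assuming the test compares an allocation counter before and after the conversion. The helper name `allocated_delta` and its parameters are hypothetical and not part of pyarrow; `allocated_bytes` stands for any zero-argument callable that reports current allocation, such as the `pa.total_allocated_bytes` used in the test.

```python
import gc


def allocated_delta(action, allocated_bytes):
    """Return the change in allocated bytes across a call to `action`.

    Hypothetical helper: `allocated_bytes` is any zero-argument callable
    reporting the current allocation (e.g. `pa.total_allocated_bytes`).
    """
    gc.collect()  # collect garbage from earlier code before taking the baseline
    before = allocated_bytes()
    action()
    gc.collect()  # collect reference cycles the action may have left behind
    return allocated_bytes() - before
```

With this shape, the leak check reduces to asserting that the returned delta is zero for the conversion under test, and garbage that is merely awaiting collection no longer skews the measurement.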

##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -689,7 +691,8 @@ Status ConvertStruct(PandasOptions options, const ChunkedArray& data,
           auto name = array_type->field(static_cast<int>(field_idx))->name();
           if (!arr->field(static_cast<int>(field_idx))->IsNull(i)) {
             // Value exists in child array, obtain it
-            auto array = reinterpret_cast<PyArrayObject*>(fields_data[field_idx].obj());
+            auto array = reinterpret_cast<PyArrayObject*>(
+                fields_data[field_idx + fields_data_offset].obj());

Review comment:
       Does this mean that conversion could give the wrong results (in addition
to being leaky)? If so, can you add a test showcasing that? (I believe the
chunks need to be unequal for the problem to show up.)
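
To illustrate the concern, here is a pure-Python sketch of the indexing pattern in the diff. It assumes `fields_data` holds one converted array per (chunk, field) pair, stored chunk by chunk; that layout is inferred from the `field_idx + fields_data_offset` expression and is not confirmed here. The sample data and the `convert` function are hypothetical.

```python
# Two chunks of a struct array with fields "a" and "b".
# The chunks hold different values on purpose.
chunks = [
    {"a": [1, 2], "b": [10, 20]},
    {"a": [3, 4], "b": [30, 40]},
]
field_names = ["a", "b"]
num_fields = len(field_names)

# Assumed layout: one converted array per (chunk, field), chunk by chunk.
fields_data = [chunk[name] for chunk in chunks for name in field_names]


def convert(use_offset):
    """Build one dict per row, with or without the per-chunk offset."""
    rows = []
    for chunk_idx, chunk in enumerate(chunks):
        # Without the offset, every chunk reads the first chunk's arrays.
        fields_data_offset = chunk_idx * num_fields if use_offset else 0
        for i in range(len(chunk[field_names[0]])):
            rows.append({
                name: fields_data[field_idx + fields_data_offset][i]
                for field_idx, name in enumerate(field_names)
            })
    return rows


buggy = convert(use_offset=False)
fixed = convert(use_offset=True)
```

Under that assumption, the version without the offset repeats the first chunk's values for every later chunk. If both chunks held identical data, the two versions would agree, which is why a test needs unequal chunks to expose wrong results.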



