[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references
[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258073#comment-16258073 ]

Uwe L. Korn commented on ARROW-1769:

We could add `gc.collect()` calls in various places, but I would like to refrain from that and hope that the next pandas release arrives soon.

> Python: pyarrow.parquet.write_to_dataset creates cyclic references
> ------------------------------------------------------------------
>
>                 Key: ARROW-1769
>                 URL: https://issues.apache.org/jira/browse/ARROW-1769
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.7.1
>            Reporter: Uwe L. Korn
>             Fix For: 0.8.0
>
> See https://github.com/apache/arrow/issues/1285 for the initial issue. Having
> cyclic references is a valid state in Python as they can be cleaned up by the
> garbage collector. But as the garbage collector normally runs at a point
> which is not clear to the user and we deal here normally with larger objects,
> we should get rid of the cyclic reference to evict data as soon as possible
> from main memory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
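The behaviour described in the issue can be illustrated without pyarrow or pandas. The following is a minimal sketch, assuming CPython's reference-counting semantics; `Node`, `make_plain`, and `make_cycle` are hypothetical names used only for this illustration and are not part of either library. It shows that an object outside a cycle is freed as soon as its last reference goes away, while objects in a cycle stay in memory until the garbage collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a large object (e.g. a temporary DataFrame)."""

    def __init__(self):
        self.other = None


def make_plain():
    a = Node()
    # The weak reference lets us observe a's lifetime without keeping it alive.
    return weakref.ref(a)


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a forms a reference cycle
    return weakref.ref(a)


gc.disable()  # keep the automatic collector out of the way for a deterministic demo
try:
    plain_ref = make_plain()
    cycle_ref = make_cycle()

    # Without a cycle, the refcount drops to zero when the function returns.
    plain_freed_immediately = plain_ref() is None

    # With a cycle, each object still holds a reference to the other.
    cycle_alive_before_gc = cycle_ref() is not None

    gc.collect()  # an explicit collection breaks the cycle and frees both objects
    cycle_freed_after_gc = cycle_ref() is None
finally:
    gc.enable()

print(plain_freed_immediately, cycle_alive_before_gc, cycle_freed_after_gc)
```

An explicit `gc.collect()` after a large write is the kind of workaround mentioned in the comment above; removing the cycle at its source avoids the need for it.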
[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references
[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257183#comment-16257183 ]

Wes McKinney commented on ARROW-1769:

Is there something actionable we could do in pyarrow?
[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references
[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239596#comment-16239596 ]

Uwe L. Korn commented on ARROW-1769:

Verified with {{pandas==0.22.0.dev0+61.gbc69dc69b}} that the underlying problem is caused by Pandas and will vanish after the next Pandas release.
[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references
[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239566#comment-16239566 ]

Uwe L. Korn commented on ARROW-1769:

We generate temporary DataFrames inside of {{write_to_dataset}} in the above case. This could probably be fixed by https://github.com/pandas-dev/pandas/issues/15746