[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-18 Thread Uwe L. Korn (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258073#comment-16258073 ]

Uwe L. Korn commented on ARROW-1769:


We could drop in various `gc.collect()` calls in different places, but I would like 
to refrain from that and instead hope that the next pandas release arrives soon.
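The `gc.collect()` workaround mentioned here could be sketched roughly as follows, assuming CPython. `collect_after`, `Node`, `leaky_step`, `cleaned_step` and `demo` are hypothetical names for illustration, not part of pyarrow. The decorator forces a full collection after the wrapped call, so reference cycles that the call leaves behind are freed immediately instead of whenever the collector next decides to run.

```python
import functools
import gc
import weakref


def collect_after(func):
    """Hypothetical decorator: run a full GC pass after func returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        finally:
            # Free any cyclic garbage that func left behind right away.
            gc.collect()
    return wrapper


class Node:
    """Stand-in for a large temporary object that ends up in a cycle."""
    def __init__(self):
        self.payload = bytearray(1_000_000)
        self.me = self  # self-reference: refcounting alone cannot free it


def leaky_step(refs):
    node = Node()
    refs.append(weakref.ref(node))  # observe lifetime without keeping it alive


@collect_after
def cleaned_step(refs):
    leaky_step(refs)


def demo():
    """Return (cycle survived plain call, all cycles freed after decorated call)."""
    was_enabled = gc.isenabled()
    gc.disable()  # make the automatic collector's timing irrelevant for the demo
    try:
        refs = []
        leaky_step(refs)
        survived = refs[0]() is not None
        cleaned_step(refs)
        return survived, all(r() is None for r in refs)
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, `demo()` should return `(True, True)`: the cyclic object survives the plain call and is gone after the decorated one. Each `gc.collect()` call performs a full collection over all tracked container objects, so calling it after every step has a cost.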

> Python: pyarrow.parquet.write_to_dataset creates cyclic references
> --
>
> Key: ARROW-1769
> URL: https://issues.apache.org/jira/browse/ARROW-1769
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Uwe L. Korn
> Fix For: 0.8.0
>
>
> See https://github.com/apache/arrow/issues/1285 for the initial issue. Cyclic 
> references are a valid state in Python, as the garbage collector can clean 
> them up. But the garbage collector runs at points the user cannot predict, 
> and here we usually deal with large objects, so we should get rid of the 
> cyclic reference to evict data from main memory as soon as possible.
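The point about cycles can be seen directly in CPython, where reference counting frees an acyclic object as soon as its last reference disappears, while an object that is part of a cycle waits for the cyclic garbage collector. The sketch below is a minimal illustration under that assumption; `Frame` and `lifetime` are hypothetical names, with `Frame` standing in for a large temporary object, and a `weakref` is used to observe whether it is still alive.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a large temporary object such as a DataFrame."""
    def __init__(self, cyclic):
        self.payload = bytearray(1_000_000)
        # A self-reference forms a cycle that reference counting cannot break.
        self.parent = self if cyclic else None


def lifetime(cyclic):
    """Return (freed right after del, freed after gc.collect()) for one Frame."""
    was_enabled = gc.isenabled()
    gc.disable()  # prevent the automatic collector from running mid-check
    try:
        obj = Frame(cyclic)
        ref = weakref.ref(obj)
        del obj
        freed_on_del = ref() is None
        gc.collect()
        return freed_on_del, ref() is None
    finally:
        if was_enabled:
            gc.enable()
```

`lifetime(False)` should give `(True, True)` and `lifetime(True)` should give `(False, True)`: only the cyclic object outlives `del` until a collection runs. On interpreters that do not use reference counting, such as PyPy, the first value may differ.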



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-17 Thread Wes McKinney (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257183#comment-16257183 ]

Wes McKinney commented on ARROW-1769:

Is there something actionable we could do in pyarrow?



[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-05 Thread Uwe L. Korn (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239596#comment-16239596 ]

Uwe L. Korn commented on ARROW-1769:


Verified with {{pandas==0.22.0.dev0+61.gbc69dc69b}} that the underlying problem 
is caused by pandas and will vanish with the next pandas release.



[jira] [Commented] (ARROW-1769) Python: pyarrow.parquet.write_to_dataset creates cyclic references

2017-11-05 Thread Uwe L. Korn (JIRA)

[ https://issues.apache.org/jira/browse/ARROW-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239566#comment-16239566 ]

Uwe L. Korn commented on ARROW-1769:


We generate temporary DataFrames inside of {{write_to_dataset}} in the above 
case. This could probably be fixed by 
https://github.com/pandas-dev/pandas/issues/15746
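To make the memory concern concrete, the sketch below models a partitioned write in plain Python: rows are grouped by a partition key and one temporary object is built per group, loosely mirroring the temporary DataFrames described above. `TempFrame` and `write_partitions` are hypothetical stand-ins, not pyarrow or pandas code, and the self-reference merely models whatever back-reference closes the cycle. With automatic collection paused, every cyclic temporary stays alive after its loop iteration, so memory grows with the number of partitions until a collection runs.

```python
import gc
import weakref
from collections import defaultdict


class TempFrame:
    """Hypothetical stand-in for a per-partition temporary DataFrame."""
    def __init__(self, rows, cyclic):
        self.rows = rows
        # Models an internal back-reference that closes a reference cycle.
        self.owner = self if cyclic else None


def write_partitions(rows, key, cyclic):
    """Group rows by key and build one temporary per group.

    Returns (temporaries alive after the loop, alive after gc.collect()).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    was_enabled = gc.isenabled()
    gc.disable()  # pause automatic collection so leftovers are observable
    try:
        refs = []
        for subset in groups.values():
            tmp = TempFrame(subset, cyclic)
            refs.append(weakref.ref(tmp))
            # A real writer would serialize tmp under "<root>/<key>=<value>/" here.
            del tmp
        still_alive = sum(1 for r in refs if r() is not None)
        gc.collect()  # what an explicit workaround call would do
        after_collect = sum(1 for r in refs if r() is not None)
        return still_alive, after_collect
    finally:
        if was_enabled:
            gc.enable()


rows = [
    {"year": 2016, "v": 1},
    {"year": 2017, "v": 2},
    {"year": 2017, "v": 3},
    {"year": 2018, "v": 4},
]
```

With the three `year` partitions above, the cyclic variant should leave three temporaries alive until the collection, while the acyclic variant frees each one at the end of its iteration.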
