[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968683#comment-16968683 ] Wes McKinney commented on ARROW-6910: The place to start will be twiddling with the jemalloc conf

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968675#comment-16968675 ] V Luong commented on ARROW-6910: ok [~wesm] let me create a new JIRA ticket for 0.15.1

2019-11-06 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968670#comment-16968670 ] Wes McKinney commented on ARROW-6910: If you can open a new JIRA for further investigation that

2019-11-06 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968667#comment-16968667 ] Wes McKinney commented on ARROW-6910: What platform are you on? It's possible the background thread

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654 ] V Luong commented on ARROW-6910: [~apitrou] [~wesm] I'm re-testing this issue using the newly-released

2019-10-19 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955242#comment-16955242 ] V Luong commented on ARROW-6910: Great, thank you a great deal [~wesm]!

2019-10-19 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955206#comment-16955206 ] Wes McKinney commented on ARROW-6910: I can confirm that setting the "dirty_page_ms" jemalloc option

2019-10-18 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955038#comment-16955038 ] Wes McKinney commented on ARROW-6910: I can access it. I'll try to have a closer look in the next

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954147#comment-16954147 ] V Luong commented on ARROW-6910: Using the code above, after just 10 iterations of reading up the file

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954134#comment-16954134 ] V Luong commented on ARROW-6910: [~wesm] [~jorisvandenbossche] [~apitrou] can you try "wget

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954029#comment-16954029 ] V Luong commented on ARROW-6910: ok let me check again on another machine [~wesm] and let you know

2019-10-17 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954026#comment-16954026 ] Wes McKinney commented on ARROW-6910: {code} $ aws s3 cp s3://public-parquet-test-data/big.parquet .

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954025#comment-16954025 ] V Luong commented on ARROW-6910: [~wesm] could you try "aws s3 sync

2019-10-17 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954022#comment-16954022 ] Wes McKinney commented on ARROW-6910: Can you give me an HTTPS link to download that file? I tried

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021 ] V Luong commented on ARROW-6910: [~wesm] [~jorisvandenbossche] I've made a Parquet data set available at

2019-10-17 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954005#comment-16954005 ] Wes McKinney commented on ARROW-6910: I don't think this is a bug. I wrote a script to make and read

2019-10-17 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953751#comment-16953751 ] Joris Van den Bossche commented on ARROW-6910: [~MBALearnsToCode] If it might not be a

2019-10-17 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953748#comment-16953748 ] Wes McKinney commented on ARROW-6910: I see. Is there something you can do to make the issue more