[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2021-01-12 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263669#comment-17263669 ] Weston Pace commented on ARROW-9974: Now that ARROW-11049 is finished, I tried this out with the

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-28 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255684#comment-17255684 ] Ashish Gupta commented on ARROW-9974: Tried, didn't work.

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255671#comment-17255671 ] Weston Pace commented on ARROW-9974: You could try adding... pa.jemalloc_set_decay_ms(1) to the

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-28 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255667#comment-17255667 ] Ashish Gupta commented on ARROW-9974: I will wait for ARROW-11009 to be resolved so I can switch

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-28 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255642#comment-17255642 ] Antoine Pitrou commented on ARROW-9974: > Why this is not an issue on windows 10, read somewhere

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-28 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255514#comment-17255514 ] Ashish Gupta commented on ARROW-9974: Looks like that's the issue. Thanks! {code:java} cat

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-27 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255333#comment-17255333 ] Weston Pace commented on ARROW-9974: Also, this behavior was introduced between 0.13.0 and 1.0.1

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-27 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255332#comment-17255332 ] Weston Pace commented on ARROW-9974: Ok.  I think I've really tracked it down now.  It appears the

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-24 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254703#comment-17254703 ] Weston Pace commented on ARROW-9974: > If the system memory limit is the issue, would it have worked

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-24 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254642#comment-17254642 ] Ashish Gupta commented on ARROW-9974: If the system memory limit is the issue, would it have worked

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-23 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254171#comment-17254171 ] Weston Pace commented on ARROW-9974: I believe what is happening is that the `ParquetDataset`

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-23 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254007#comment-17254007 ] Ashish Gupta commented on ARROW-9974: This is a dedicated physical server. sysctl

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-22 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253825#comment-17253825 ] Weston Pace commented on ARROW-9974: I was able to reproduce memory issues at 6500 rows. 

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-21 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252725#comment-17252725 ] Ashish Gupta commented on ARROW-9974: Thanks for looking into this. I suspect the dataframe size is

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-12-17 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251419#comment-17251419 ] Weston Pace commented on ARROW-9974: I attempted to reproduce this on centos-8 and was not

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-10-07 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209705#comment-17209705 ] Ashish Gupta commented on ARROW-9974: Tried `export MALLOC_MMAP_THRESHOLD_=65536`: same error
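
A note on the `MALLOC_MMAP_THRESHOLD_` setting tried above: glibc's malloc picks it up from the environment during process startup, before any Python code runs, so it has to be present in the environment of the process that does the reading (which is what `export` in the launching shell achieves), not assigned to `os.environ` from inside that process. A minimal sketch, assuming a glibc-based Linux host; the helper name `run_with_mmap_threshold` is hypothetical, and `test.py` only stands in for the reporter's script:

```python
import os
import subprocess
import sys


def run_with_mmap_threshold(args, threshold=65536):
    """Run a Python child process with glibc's MALLOC_MMAP_THRESHOLD_ set.

    The variable is placed in the child's environment before the interpreter
    starts, which is when glibc reads it; setting it later has no effect on
    an already running process.
    """
    env = dict(os.environ, MALLOC_MMAP_THRESHOLD_=str(threshold))
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode / stderr
    )
```

For example, `run_with_mmap_threshold(["test.py"])` is the programmatic counterpart of `MALLOC_MMAP_THRESHOLD_=65536 python test.py` in a shell. As reported in the comment above, this setting did not make the error go away in this case.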

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-10-07 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209676#comment-17209676 ] Antoine Pitrou commented on ARROW-9974: [~kgashish] Can you try what I suggested above?

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-10-07 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209671#comment-17209671 ] Ashish Gupta commented on ARROW-9974: Anyone tried to reproduce on centos-8?

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-10-05 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17208167#comment-17208167 ] Antoine Pitrou commented on ARROW-9974: The error message ("OSError: Out of memory: malloc of size

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-09-29 Thread Krisztian Szucs (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204338#comment-17204338 ] Krisztian Szucs commented on ARROW-9974: I have not had the time to test it on centos 8 yet, but

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-09-29 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204167#comment-17204167 ] Ashish Gupta commented on ARROW-9974: I have test.py as below {code:java}

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-09-29 Thread Krisztian Szucs (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204038#comment-17204038 ] Krisztian Szucs commented on ARROW-9974: Sure! First you need to enable coredumps on the
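
Krisztian's instructions are truncated in this listing. As an illustration only, not his actual steps: on Linux, a common first step is allowing a non-zero core-file size, similar to running `ulimit -c unlimited` in the launching shell, except that an unprivileged process can only raise its soft limit as far as its hard limit. A minimal sketch using Python's standard `resource` module; the helper name `enable_core_dumps` is hypothetical, and where the kernel writes the dump is configured separately (e.g. via `/proc/sys/kernel/core_pattern`):

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit as far as the hard limit allows.

    An unprivileged process cannot raise its hard limit, so the soft limit is
    set equal to the current hard limit (which may be RLIM_INFINITY).
    Returns the resulting (soft, hard) pair.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling it at the top of the reader script, before any Parquet files are opened, means a later crash in that process (or in child processes it starts, which inherit the limit) can leave a core file to inspect with a debugger.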

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-09-29 Thread Ashish Gupta (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204007#comment-17204007 ] Ashish Gupta commented on ARROW-9974: It seems it has something to do with the operating system. The

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset

2020-09-29 Thread Krisztian Szucs (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203912#comment-17203912 ] Krisztian Szucs commented on ARROW-9974: I was also unable to reproduce the error, tried with

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset (works fine with version 0.13)

2020-09-11 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194504#comment-17194504 ] Joris Van den Bossche commented on ARROW-9974: [~kgashish] thanks for opening the issue

[jira] [Commented] (ARROW-9974) [Python][C++] pyarrow version 1.0.1 throws Out Of Memory exception while reading large number of files using ParquetDataset (works fine with version 0.13)

2020-09-11 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194299#comment-17194299 ] Wes McKinney commented on ARROW-9974: cc [~bkietz] [~jorisvandenbossche]