[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-26 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Description: I have Parquet files with certain complex columns of type List, List, etc. and am using latest

[jira] [Created] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-26 Thread V Luong (Jira)
V Luong created ARROW-6719: -- Summary: Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...> Key: ARROW-6719 URL:

[jira] [Closed] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-27 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong closed ARROW-6719. -- Resolution: Invalid > Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data > for

[jira] [Commented] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-27 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939159#comment-16939159 ] V Luong commented on ARROW-6719: Sorry, I've just found out that the PyArrow used was actually

[jira] [Comment Edited] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654 ] V Luong edited comment on ARROW-6910 at 11/6/19 8:05 PM: - [~apitrou] [~wesm]

[jira] [Comment Edited] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654 ] V Luong edited comment on ARROW-6910 at 11/6/19 8:08 PM: - [~apitrou] [~wesm]

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968675#comment-16968675 ] V Luong commented on ARROW-6910: ok [~wesm] let me create a new JIRA ticket for 0.15.1 > [Python]

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654 ] V Luong commented on ARROW-6910: [~apitrou] [~wesm] I'm re-testing this issue using the newly-released

[jira] [Updated] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-11-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Description: I realize that when I read up a lot of Parquet files using pyarrow.parquet.read_table(...), my

[jira] [Commented] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952129#comment-16952129 ] V Luong commented on ARROW-6719: I have attached some data above to reproduce the problem > Parquet

[jira] [Reopened] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong reopened ARROW-6719: I am encountering this issue in PyArrow 0.15.0 again > Parquet read_table error in Python3.7:

[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Affects Version/s: 0.15.0 > Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data >

[jira] [Comment Edited] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951797#comment-16951797 ] V Luong edited comment on ARROW-6719 at 10/15/19 10:12 AM: --- I am encountering

[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Attachment: read-fail.snappy.parquet > Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid:

[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Affects Version/s: (was: 0.14.1) > Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid:

[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Description: I have Parquet files with certain complex columns of type List, List, etc. and am using latest

[jira] [Issue Comment Deleted] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-10-15 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6719: --- Comment: was deleted (was: Sorry, I've just found out that the PyArrow used was actually 0.14.0.RAY,

[jira] [Updated] (ARROW-6910) pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-16 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Description: I realize that when I read up a lot of Parquet files using pyarrow.parquet.read_table(...), my

[jira] [Created] (ARROW-6910) pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-16 Thread V Luong (Jira)
V Luong created ARROW-6910: -- Summary: pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits Key: ARROW-6910 URL: https://issues.apache.org/jira/browse/ARROW-6910

[jira] [Updated] (ARROW-6910) pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-16 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Description: I realize that when I read up a lot of Parquet files using pyarrow.parquet.read_table(...), my

[jira] [Commented] (ARROW-6910) pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-16 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953357#comment-16953357 ] V Luong commented on ARROW-6910: [~wesm] [~apitrou] ARROW-6874's title states that Table.to_pandas()

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-19 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955242#comment-16955242 ] V Luong commented on ARROW-6910: Great, thank you a great deal [~wesm]! > [Python]

[jira] [Comment Edited] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021 ] V Luong edited comment on ARROW-6910 at 10/17/19 6:54 PM: -- [~wesm]

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021 ] V Luong commented on ARROW-6910: [~wesm][~jorisvandenbossche] I've made a Parquet data set available at

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954025#comment-16954025 ] V Luong commented on ARROW-6910: [~wesm] could you try "aws s3 sync

[jira] [Comment Edited] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021 ] V Luong edited comment on ARROW-6910 at 10/17/19 6:51 PM: -- [~wesm]

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954029#comment-16954029 ] V Luong commented on ARROW-6910: ok let me check again on another machine [~wesm] and let you know >

[jira] [Issue Comment Deleted] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Comment: was deleted (was: [~wesm] could you try "aws s3 sync s3://public-parquet-test-data/big.parquet

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954134#comment-16954134 ] V Luong commented on ARROW-6910: [~wesm] [~jorisvandenbossche] [~apitrou] can you try "wget

[jira] [Updated] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Description: I realize that when I read up a lot of Parquet files using pyarrow.parquet.read_table(...), my

[jira] [Comment Edited] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021 ] V Luong edited comment on ARROW-6910 at 10/17/19 9:50 PM: -- [~wesm]

[jira] [Updated] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6910: --- Description: I realize that when I read up a lot of Parquet files using pyarrow.parquet.read_table(...), my

[jira] [Commented] (ARROW-6910) [Python] pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-17 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954147#comment-16954147 ] V Luong commented on ARROW-6910: Using the code above, after just 10 iterations of reading up the file

[jira] [Updated] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-05 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6796: --- Description: My Spark workloads produce small-to-moderately-sized Parquet files with typical on-disk sizes

[jira] [Updated] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-05 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6796: --- Description: My Spark workloads produce small-to-moderately-sized Parquet files with typical on-disk sizes

[jira] [Updated] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-05 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6796: --- Description: My Spark workloads produce small-to-moderately-sized Parquet files with typical on-disk sizes

[jira] [Created] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-05 Thread V Luong (Jira)
V Luong created ARROW-6796: -- Summary: Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table Key: ARROW-6796 URL:

[jira] [Updated] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-05 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated ARROW-6796: --- Description: My Spark workloads produce small-to-moderately-sized Parquet files with typical on-disk sizes

[jira] [Commented] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945365#comment-16945365 ] V Luong commented on ARROW-6796: [~emkornfi...@gmail.com] thank you very much. Yes, indeed this

[jira] [Resolved] (ARROW-6796) Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table

2019-10-06 Thread V Luong (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong resolved ARROW-6796. Fix Version/s: 0.15.0 Resolution: Fixed Resolved in 0.15.0, which has fixed a number of