[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Description:
I have Parquet files with certain complex columns of type List,
List, etc. and am using latest
V Luong created ARROW-6719:
--
Summary: Parquet read_table error in Python3.7:
pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is
inconsistent with schema list<...>
Key: ARROW-6719
URL:
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong closed ARROW-6719.
--
Resolution: Invalid
> Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data
> for
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939159#comment-16939159
]
V Luong commented on ARROW-6719:
Sorry, I've just found out that the PyArrow used was actually
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654
]
V Luong edited comment on ARROW-6910 at 11/6/19 8:05 PM:
-
[~apitrou] [~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654
]
V Luong edited comment on ARROW-6910 at 11/6/19 8:08 PM:
-
[~apitrou] [~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968675#comment-16968675
]
V Luong commented on ARROW-6910:
ok [~wesm] let me create a new JIRA ticket for 0.15.1
> [Python]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654
]
V Luong commented on ARROW-6910:
[~apitrou] [~wesm] I'm re-testing this issue using the newly-released
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Description:
I realize that when I read up a lot of Parquet files using
pyarrow.parquet.read_table(...), my
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968654#comment-16968654
]
V Luong edited comment on ARROW-6910 at 11/6/19 8:08 PM:
-
[~apitrou] [~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952129#comment-16952129
]
V Luong commented on ARROW-6719:
I have attached some data above to reproduce the problem
> Parquet
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong reopened ARROW-6719:
I am encountering this issue in PyArrow 0.15.0 again
> Parquet read_table error in Python3.7:
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Affects Version/s: 0.15.0
> Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data
>
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951797#comment-16951797
]
V Luong edited comment on ARROW-6719 at 10/15/19 10:12 AM:
---
I am encountering
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Attachment: read-fail.snappy.parquet
> Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid:
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Affects Version/s: (was: 0.14.1)
> Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid:
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Description:
I have Parquet files with certain complex columns of type List,
List, etc. and am using latest
[
https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6719:
---
Comment: was deleted
(was: Sorry, I've just found out that the PyArrow used was actually 0.14.0.RAY,
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Description:
I realize that when I read up a lot of Parquet files using
pyarrow.parquet.read_table(...), my
V Luong created ARROW-6910:
--
Summary: pyarrow.parquet.read_table(...) takes up lots of memory
which is not released until program exits
Key: ARROW-6910
URL: https://issues.apache.org/jira/browse/ARROW-6910
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Description:
I realize that when I read up a lot of Parquet files using
pyarrow.parquet.read_table(...), my
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953357#comment-16953357
]
V Luong commented on ARROW-6910:
[~wesm] [~apitrou] ARROW-6874's title states that Table.to_pandas()
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955242#comment-16955242
]
V Luong commented on ARROW-6910:
Great, thank you a great deal [~wesm]!
> [Python]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021
]
V Luong edited comment on ARROW-6910 at 10/17/19 6:54 PM:
--
[~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021
]
V Luong commented on ARROW-6910:
[~wesm][~jorisvandenbossche] I've made a Parquet data set available at
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954025#comment-16954025
]
V Luong commented on ARROW-6910:
[~wesm] could you try "aws s3 sync
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021
]
V Luong edited comment on ARROW-6910 at 10/17/19 6:51 PM:
--
[~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954029#comment-16954029
]
V Luong commented on ARROW-6910:
ok let me check again on another machine [~wesm] and let you know
>
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Comment: was deleted
(was: [~wesm] could you try "aws s3 sync
s3://public-parquet-test-data/big.parquet
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954134#comment-16954134
]
V Luong commented on ARROW-6910:
[~wesm] [~jorisvandenbossche] [~apitrou] can you try "wget
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Description:
I realize that when I read up a lot of Parquet files using
pyarrow.parquet.read_table(...), my
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954021#comment-16954021
]
V Luong edited comment on ARROW-6910 at 10/17/19 9:50 PM:
--
[~wesm]
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6910:
---
Description:
I realize that when I read up a lot of Parquet files using
pyarrow.parquet.read_table(...), my
[
https://issues.apache.org/jira/browse/ARROW-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954147#comment-16954147
]
V Luong commented on ARROW-6910:
Using the code above, after just 10 iterations of reading up the file
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6796:
---
Description:
My Spark workloads produce small-to-moderately-sized Parquet files with typical
on-disk sizes
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6796:
---
Description:
My Spark workloads produce small-to-moderately-sized Parquet files with typical
on-disk sizes
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6796:
---
Description:
My Spark workloads produce small-to-moderately-sized Parquet files with typical
on-disk sizes
V Luong created ARROW-6796:
--
Summary: Certain moderately-sized (~100MB)
default-Snappy-compressed Parquet files take enormous memory and long time to
load by pyarrow.parquet.read_table
Key: ARROW-6796
URL:
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong updated ARROW-6796:
---
Description:
My Spark workloads produce small-to-moderately-sized Parquet files with typical
on-disk sizes
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945365#comment-16945365
]
V Luong commented on ARROW-6796:
[~emkornfi...@gmail.com] thank you very much. Yes, indeed this
[
https://issues.apache.org/jira/browse/ARROW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V Luong resolved ARROW-6796.
Fix Version/s: 0.15.0
Resolution: Fixed
Resolved in 0.15.0, which has fixed a number of