[jira] [Created] (ARROW-1644) Parquet with nested structs can not be loaded in pyarrow in Oct 2017 nightly build

2017-10-04 Thread DB Tsai (JIRA)
DB Tsai created ARROW-1644: -- Summary: Parquet with nested structs can not be loaded in pyarrow in Oct 2017 nightly build Key: ARROW-1644 URL: https://issues.apache.org/jira/browse/ARROW-1644 Project: Apache

[jira] [Commented] (ARROW-1644) [Python] Read and write nested Parquet data with a mix of struct and list nesting levels

2017-10-04 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192346#comment-16192346 ] DB Tsai commented on ARROW-1644: [~wesmckinn] Thanks for the detail reply. Are you saying that in

[jira] [Updated] (ARROW-1873) Segmentation fault when loading total 2GB of parquet files

2017-11-29 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated ARROW-1873: --- Description: We are trying to load 100 parquet files, and each of them is around 20MB. Before we port

[jira] [Created] (ARROW-1873) Segmentation fault when loading total 2GB of parquet files

2017-11-29 Thread DB Tsai (JIRA)
DB Tsai created ARROW-1873: -- Summary: Segmentation fault when loading total 2GB of parquet files Key: ARROW-1873 URL: https://issues.apache.org/jira/browse/ARROW-1873 Project: Apache Arrow Issue

[jira] [Commented] (ARROW-1873) [Python] Segmentation fault when loading total 2GB of parquet files

2017-12-04 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277181#comment-16277181 ] DB Tsai commented on ARROW-1873: [~wesmckinn] thanks. I wonder why the overhead of loading parquet files

[jira] [Created] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread DB Tsai (JIRA)
DB Tsai created ARROW-1830: -- Summary: [Python] Error when loading all the files in a dictionary Key: ARROW-1830 URL: https://issues.apache.org/jira/browse/ARROW-1830 Project: Apache Arrow Issue

[jira] [Updated] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-17 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated ARROW-1830: --- Description: I can read one parquet file, but when I tried to read all the parquet files in a folder, I got

[jira] [Commented] (ARROW-1830) [Python] Error when loading all the files in a dictionary

2017-11-20 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258929#comment-16258929 ] DB Tsai commented on ARROW-1830: Those are parquet files generated by Spark written into Hive with S3

[jira] [Commented] (ARROW-1873) Segmentation fault when loading total 2GB of parquet files

2017-12-01 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275093#comment-16275093 ] DB Tsai commented on ARROW-1873: There are only 14 elements in the feature inner array, but there are 100M

[jira] [Commented] (ARROW-1873) Segmentation fault when loading total 2GB of parquet files

2017-12-01 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275374#comment-16275374 ] DB Tsai commented on ARROW-1873: Doing some digging in a beefy machine, and it runs in big machine! We