[jira] [Created] (ARROW-2656) [Python] Improve ParquetManifest creation time for highly

2018-05-31 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2656: - Summary: [Python] Improve ParquetManifest creation time for highly Key: ARROW-2656 URL: https://issues.apache.org/jira/browse/ARROW-2656 Project: Apache Arrow

[jira] [Updated] (ARROW-2656) [Python] Improve ParquetManifest creation time

2018-05-31 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener updated ARROW-2656: -- Summary: [Python] Improve ParquetManifest creation time (was: [Python] Improve

[jira] [Commented] (ARROW-2656) [Python] Improve ParquetManifest creation time for highly

2018-05-31 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497147#comment-16497147 ] Robert Gruener commented on ARROW-2656: --- I will attempt to get some code as a benchmark however (I

[jira] [Updated] (ARROW-2656) [Python] Improve ParquetManifest creation time

2018-05-31 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener updated ARROW-2656: -- Description: When a parquet dataset is highly partitioned, the time to call the constructor
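
ARROW-2656 concerns datasets with many partition directories. This is a rough sketch only, not taken from the ticket. When each directory listing is slow (for example on HDFS), one way to cut construction time is to list all directories at the same depth concurrently instead of one at a time. `list_partition_files`, `_list_dir` and `max_workers` are hypothetical names. The sketch walks the local filesystem with `os.scandir` and assumes that files whose names start with `_` or `.` (such as `_SUCCESS`) are not data files.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def _list_dir(path: str) -> Tuple[List[str], List[str]]:
    """List one directory, returning (subdirectories, data files)."""
    subdirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                subdirs.append(entry.path)
            elif entry.is_file() and not entry.name.startswith(("_", ".")):
                # Assumption: names like _SUCCESS or hidden .crc files are not data.
                files.append(entry.path)
    return subdirs, files


def list_partition_files(root: str, max_workers: int = 8) -> List[str]:
    """Breadth-first walk that lists every directory at one depth in parallel."""
    files: List[str] = []
    pending = [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            next_level: List[str] = []
            # pool.map issues the listings for this whole level concurrently.
            for subdirs, found in pool.map(_list_dir, pending):
                next_level.extend(subdirs)
                files.extend(found)
            pending = next_level
    return sorted(files)
```

The benefit depends on listing latency, not CPU. On a fast local disk the thread pool may add overhead rather than save time.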

[jira] [Commented] (ARROW-2656) [Python] Improve ParquetManifest creation time

2018-06-27 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525236#comment-16525236 ] Robert Gruener commented on ARROW-2656: --- I have opened [https://github.com/apache/arrow/pull/2185] 

[jira] [Commented] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-02 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567478#comment-16567478 ] Robert Gruener commented on ARROW-2800: --- So I have dug into the code here a bit out of curiosity

[jira] [Commented] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-02 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567515#comment-16567515 ] Robert Gruener commented on ARROW-2800: --- Nevermind, I found

[jira] [Assigned] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-02 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener reassigned ARROW-2800: - Assignee: Robert Gruener > [Python] Unavailable Parquet column statistics from

[jira] [Commented] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-02 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567531#comment-16567531 ] Robert Gruener commented on ARROW-2800: --- Ok so I was wondering why parquet-mr 1.10.0 can read the

[jira] [Commented] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-03 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568244#comment-16568244 ] Robert Gruener commented on ARROW-2800: --- Is there a way to move this ticket to be under PARQUET? (I

[jira] [Commented] (ARROW-2800) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-07-26 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558352#comment-16558352 ] Robert Gruener commented on ARROW-2800: --- Ah, I see. Though why can parquet-cpp read the statistics

[jira] [Assigned] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2018-08-20 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener reassigned ARROW-1983: - Assignee: Robert Gruener > [Python] Add ability to write parquet `_metadata` file >

[jira] [Resolved] (ARROW-2842) [Python] Cannot read parquet files with row group size of 1 From HDFS

2018-07-25 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener resolved ARROW-2842. --- Resolution: Invalid I have not been able to reproduce well. It likely was due to an hdfs

[jira] [Commented] (ARROW-2911) [Python] Parquet binary statistics that end in '\0' truncate last byte

2018-07-25 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556161#comment-16556161 ] Robert Gruener commented on ARROW-2911: --- This is likely related to ARROW-2800 ? > [Python] Parquet
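This is a minimal illustration of one way the symptom in ARROW-2911 can arise. It assumes, without confirmation from the ticket, that a binary min/max value passes through a length-unaware, NUL-terminated conversion somewhere along the way. `roundtrip` is a hypothetical helper built on Python's `ctypes`. Reading the buffer as a C string drops a trailing `b"\x00"`, while slicing with the known length keeps it.

```python
import ctypes
from typing import Tuple


def roundtrip(raw: bytes) -> Tuple[bytes, bytes]:
    """Copy a binary statistic into a C buffer and read it back two ways."""
    # One extra byte so the buffer always ends in a terminating NUL.
    buf = ctypes.create_string_buffer(raw, len(raw) + 1)
    # Length-unaware read: stops at the first NUL, so a trailing b"\x00" is lost.
    as_c_string = buf.value
    # Length-aware read: uses the known length, so every byte survives.
    with_length = buf.raw[: len(raw)]
    return as_c_string, with_length
```

For `b"abc\x00"` the first read yields `b"abc"` and the second yields `b"abc\x00"`. A C-string read would also cut off everything after an embedded NUL, not only a trailing one.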

[jira] [Commented] (ARROW-1796) [Python] RowGroup filtering on file level

2018-09-05 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604756#comment-16604756 ] Robert Gruener commented on ARROW-1796: --- That sounds good to me. I would like to point out it would
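Row-group filtering within a file generally relies on per-row-group column statistics. A row group can be skipped when its min/max range shows that no row in it can satisfy the predicate. The sketch below assumes that non-null min/max statistics for a single column are already available. `RowGroupStats` and `row_groups_to_read` are hypothetical names, not pyarrow API.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RowGroupStats:
    # Hypothetical summary of one column in one row group.
    index: int
    min: Any
    max: Any


def row_groups_to_read(stats: List[RowGroupStats], op: str, value: Any) -> List[int]:
    """Return indices of row groups that *might* satisfy `column <op> value`.

    A row group is dropped only when its [min, max] range proves that no
    row in it can match.
    """
    keep = []
    for rg in stats:
        if op == "=":
            may_match = rg.min <= value <= rg.max
        elif op == "<":
            may_match = rg.min < value
        elif op == "<=":
            may_match = rg.min <= value
        elif op == ">":
            may_match = rg.max > value
        elif op == ">=":
            may_match = rg.max >= value
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        if may_match:
            keep.append(rg.index)
    return keep
```

Consider row groups spanning 1-10, 11-20 and 21-30. For these, `row_groups_to_read(stats, ">", 15)` keeps indices 1 and 2. The check is conservative: a kept row group may still contain no matching rows, so rows still need to be filtered after reading.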

[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2018-07-11 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540231#comment-16540231 ] Robert Gruener commented on ARROW-1983: --- This looks like it would need changes in parquet-cpp as

[jira] [Assigned] (ARROW-2656) [Python] Improve ParquetManifest creation time

2018-07-06 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener reassigned ARROW-2656: - Assignee: Robert Gruener > [Python] Improve ParquetManifest creation time >

[jira] [Commented] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset

2018-07-06 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535115#comment-16535115 ] Robert Gruener commented on ARROW-1956: --- Can this not already be done using the filters argument on
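Partition-level selection on a Hive-style layout can be sketched as follows. The `(key, op, value)` tuple shape imitates the style of the `filters` argument mentioned in the comment. However, `parse_partition_keys` and `select_partitions` are hypothetical helpers. Values are compared as the strings that appear in the directory names, and every filter must hold (logical AND).

```python
import operator
from typing import Any, Dict, List, Tuple

_OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def parse_partition_keys(rel_path: str) -> Dict[str, str]:
    """Turn 'year=2018/month=06/part-0.parquet' into {'year': '2018', 'month': '06'}."""
    keys = {}
    for part in rel_path.split("/")[:-1]:  # the last component is the file name
        if "=" in part:
            k, v = part.split("=", 1)
            keys[k] = v
    return keys


def select_partitions(paths: List[str], filters: List[Tuple[str, str, Any]]) -> List[str]:
    """Keep paths whose partition keys satisfy every (key, op, value) filter."""
    selected = []
    for path in paths:
        keys = parse_partition_keys(path)
        if all(_OPS[op](keys[k], v) for k, op, v in filters):
            selected.append(path)
    return selected
```

Because the comparisons are on strings, ordering filters such as `>=` only behave numerically when values are zero-padded to a fixed width, as `month=06` is here.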

[jira] [Created] (ARROW-2801) [Python] Implement splt_row_groups for ParquetDataset

2018-07-06 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2801: - Summary: [Python] Implement splt_row_groups for ParquetDataset Key: ARROW-2801 URL: https://issues.apache.org/jira/browse/ARROW-2801 Project: Apache Arrow
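The sketch below shows what splitting row groups could mean for scheduling reads. When the flag is enabled, each row group of each file becomes its own piece, so readers can parallelize below the file level. The flag is modeled on the `split_row_groups` option the ticket title refers to. Its semantics here are an assumption, and `Piece`, `make_pieces` and the `row_group_counts` input are hypothetical. In practice the counts would come from each file's footer metadata.

```python
from typing import Dict, List, NamedTuple, Optional


class Piece(NamedTuple):
    # Hypothetical unit of read work: a file, optionally narrowed to one row group.
    path: str
    row_group: Optional[int]


def make_pieces(row_group_counts: Dict[str, int], split_row_groups: bool) -> List[Piece]:
    """Expand files into pieces; one piece per row group when splitting."""
    pieces = []
    for path in sorted(row_group_counts):
        if split_row_groups:
            pieces.extend(Piece(path, i) for i in range(row_group_counts[path]))
        else:
            # row_group=None means "read the whole file".
            pieces.append(Piece(path, None))
    return pieces
```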

[jira] [Updated] (ARROW-2842) [Python] Cannot read parquet files with row group size of 1 From HDFS

2018-07-12 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Gruener updated ARROW-2842: -- Description: This might be a bug in parquet-cpp, I need to spend a bit more time tracking

[jira] [Created] (ARROW-2842) [Python] Cannot read parquet files with row group size of 1 From HDFS

2018-07-12 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2842: - Summary: [Python] Cannot read parquet files with row group size of 1 From HDFS Key: ARROW-2842 URL: https://issues.apache.org/jira/browse/ARROW-2842 Project:

[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2018-07-12 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541803#comment-16541803 ] Robert Gruener commented on ARROW-1983: --- [~xhochy] I made this dependent task PARQUET-1348 >

[jira] [Created] (ARROW-2761) Support set filter operators on Hive partitioned Parquet files

2018-06-28 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2761: - Summary: Support set filter operators on Hive partitioned Parquet files Key: ARROW-2761 URL: https://issues.apache.org/jira/browse/ARROW-2761 Project: Apache Arrow

[jira] [Commented] (ARROW-2761) Support set filter operators on Hive partitioned Parquet files

2018-06-28 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526544#comment-16526544 ] Robert Gruener commented on ARROW-2761: --- https://github.com/apache/arrow/pull/2188 > Support set
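Set operators extend partition filtering from single values to collections of values. This is a minimal sketch that assumes the operators are spelled `in` and `not in` and that partition keys have already been parsed into a dict of strings. `matches` is a hypothetical helper, not the code from the linked pull request.

```python
from typing import Any, Dict, List, Tuple

Filter = Tuple[str, str, Any]


def matches(partition_keys: Dict[str, str], filters: List[Filter]) -> bool:
    """True if the partition satisfies every (key, op, value) filter."""
    for key, op, value in filters:
        actual = partition_keys[key]
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "in":
            # `value` is a collection (set, list, ...) of allowed values.
            ok = actual in value
        elif op == "not in":
            ok = actual not in value
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        if not ok:
            return False
    return True
```

For example, `matches({"year": "2018", "month": "06"}, [("month", "in", {"05", "06"})])` returns `True`. Using a set for the collection keeps each membership test constant-time even when many values are listed.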

[jira] [Created] (ARROW-2763) [Python] Make parquet _metadata file accessible from ParquetDataset

2018-06-29 Thread Robert Gruener (JIRA)
Robert Gruener created ARROW-2763: - Summary: [Python] Make parquet _metadata file accessible from ParquetDataset Key: ARROW-2763 URL: https://issues.apache.org/jira/browse/ARROW-2763 Project: Apache

[jira] [Commented] (ARROW-2801) [Python] Implement splt_row_groups for ParquetDataset

2018-12-05 Thread Robert Gruener (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710584#comment-16710584 ] Robert Gruener commented on ARROW-2801: --- I might have time to finish this up next week. I actually