Robert Gruener created ARROW-2656:
-
Summary: [Python] Improve ParquetManifest creation time for highly
Key: ARROW-2656
URL: https://issues.apache.org/jira/browse/ARROW-2656
Project: Apache Arrow
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener updated ARROW-2656:
--
Summary: [Python] Improve ParquetManifest creation time (was: [Python]
Improve
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497147#comment-16497147
]
Robert Gruener commented on ARROW-2656:
---
I will attempt to get some code as a benchmark however (I
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener updated ARROW-2656:
--
Description:
When a parquet dataset is highly partitioned, the time to call the constructor
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener updated ARROW-2656:
--
Description:
When a parquet dataset is highly partitioned, the time to call the constructor
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525236#comment-16525236
]
Robert Gruener commented on ARROW-2656:
---
I have opened [https://github.com/apache/arrow/pull/2185]
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567478#comment-16567478
]
Robert Gruener commented on ARROW-2800:
---
So I have dug into the code here a bit out of curiosity
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567515#comment-16567515
]
Robert Gruener commented on ARROW-2800:
---
Never mind, I found
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener reassigned ARROW-2800:
-
Assignee: Robert Gruener
> [Python] Unavailable Parquet column statistics from
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567531#comment-16567531
]
Robert Gruener commented on ARROW-2800:
---
Ok so I was wondering why parquet-mr 1.10.0 can read the
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568244#comment-16568244
]
Robert Gruener commented on ARROW-2800:
---
Is there a way to move this ticket to be under PARQUET? (I
[
https://issues.apache.org/jira/browse/ARROW-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558352#comment-16558352
]
Robert Gruener commented on ARROW-2800:
---
Ah, I see. Though why can parquet-cpp read the statistics
[
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener reassigned ARROW-1983:
-
Assignee: Robert Gruener
> [Python] Add ability to write parquet `_metadata` file
>
[
https://issues.apache.org/jira/browse/ARROW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener resolved ARROW-2842.
---
Resolution: Invalid
I have not been able to reproduce well. It likely was due to an hdfs
[
https://issues.apache.org/jira/browse/ARROW-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556161#comment-16556161
]
Robert Gruener commented on ARROW-2911:
---
This is likely related to ARROW-2800?
> [Python] Parquet
[
https://issues.apache.org/jira/browse/ARROW-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604756#comment-16604756
]
Robert Gruener commented on ARROW-1796:
---
That sounds good to me. I would like to point out it would
[
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540231#comment-16540231
]
Robert Gruener commented on ARROW-1983:
---
This looks like it would need changes in parquet-cpp as
[
https://issues.apache.org/jira/browse/ARROW-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener reassigned ARROW-2656:
-
Assignee: Robert Gruener
> [Python] Improve ParquetManifest creation time
>
[
https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535115#comment-16535115
]
Robert Gruener commented on ARROW-1956:
---
Can this not already be done using the filters argument on
Robert Gruener created ARROW-2801:
-
Summary: [Python] Implement split_row_groups for ParquetDataset
Key: ARROW-2801
URL: https://issues.apache.org/jira/browse/ARROW-2801
Project: Apache Arrow
[
https://issues.apache.org/jira/browse/ARROW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener updated ARROW-2842:
--
Description:
This might be a bug in parquet-cpp, I need to spend a bit more time tracking
Robert Gruener created ARROW-2842:
-
Summary: [Python] Cannot read parquet files with row group size of 1 From HDFS
Key: ARROW-2842
URL: https://issues.apache.org/jira/browse/ARROW-2842
Project:
[
https://issues.apache.org/jira/browse/ARROW-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Gruener updated ARROW-2842:
--
Description:
This might be a bug in parquet-cpp, I need to spend a bit more time tracking
[
https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541803#comment-16541803
]
Robert Gruener commented on ARROW-1983:
---
[~xhochy] I made this dependent task PARQUET-1348
>
Robert Gruener created ARROW-2761:
-
Summary: Support set filter operators on Hive partitioned Parquet files
Key: ARROW-2761
URL: https://issues.apache.org/jira/browse/ARROW-2761
Project: Apache Arrow
[
https://issues.apache.org/jira/browse/ARROW-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526544#comment-16526544
]
Robert Gruener commented on ARROW-2761:
---
https://github.com/apache/arrow/pull/2188
> Support set
Robert Gruener created ARROW-2763:
-
Summary: [Python] Make parquet _metadata file accessible from ParquetDataset
Key: ARROW-2763
URL: https://issues.apache.org/jira/browse/ARROW-2763
Project: Apache
[
https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710584#comment-16710584
]
Robert Gruener commented on ARROW-2801:
---
I might have time to finish this up next week. I actually