Gidon Gershinsky created PARQUET-2148:
-
Summary: Enable uniform decryption with plaintext footer
Key: PARQUET-2148
URL: https://issues.apache.org/jira/browse/PARQUET-2148
Project: Parquet
On Thu, 12 May 2022 09:46:57 -0700
William Butler
wrote:
>
> From the JIRA, the converted type looks something like
>
> required group FeatureAmounts (MAP) {
> repeated group map (MAP_KEY_VALUE) {
> required binary key (STRING);
> required binary key (STRING);
> }
> }
>
Parth Chandra created PARQUET-2149:
--
Summary: Implement async IO for Parquet file reader
Key: PARQUET-2149
URL: https://issues.apache.org/jira/browse/PARQUET-2149
Project: Parquet
Issue
[
https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Parth Chandra updated PARQUET-2149:
---
Description:
ParquetFileReader's implementation has the following flow (simplified) -
theosib-amazon commented on PR #960:
URL: https://github.com/apache/parquet-mr/pull/960#issuecomment-1127827189
That improvement comes from a larger set of changes. I have a design doc
that goes over all those changes plus some more that make it possible to get
even more performance
theosib-amazon commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873888258
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##
@@ -44,8 +45,15 @@ public class CodecFactory implements CompressionCodecFactory
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537614#comment-17537614
]
ASF GitHub Bot commented on PARQUET-2126:
-
theosib-amazon commented on code in PR #959:
URL:
shangxinli commented on PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127847443
My question is when a thread exits, we don't have a corresponding evict
operation on the map. Using thread pool might be OK if the thread object is not
changed, but not sure if
theosib-amazon commented on PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127885617
> My question is when a thread exits, we don't have a corresponding evict
operation on the map. Using thread pool might be OK if the thread object is not
changed, but not sure if
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537636#comment-17537636
]
ASF GitHub Bot commented on PARQUET-2126:
-
theosib-amazon commented on PR #959:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537604#comment-17537604
]
Timothy Miller commented on PARQUET-2069:
-
Well, I tried modifying prepareForRead to just
theosib-amazon commented on PR #957:
URL: https://github.com/apache/parquet-mr/pull/957#issuecomment-1127822921
OK, check out the code changes. I've redone this completely. Now what it
does is try out the avro schema, and if that fails, it caches the exception and
tries again with an avro
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537611#comment-17537611
]
ASF GitHub Bot commented on PARQUET-2126:
-
theosib-amazon commented on code in PR #959:
URL:
theosib-amazon commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873884939
##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##
@@ -184,8 +192,18 @@ public CompressionCodecName getCodecName() {
}
+ /*
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537613#comment-17537613
]
ASF GitHub Bot commented on PARQUET-2126:
-
theosib-amazon commented on PR #959:
URL:
theosib-amazon commented on PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127839048
> If we change it to be per thread, then would it be a problem in the
scenario where short living threads come and go? When the thread stopped, we
might not know and leak here.
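The leak concern quoted above can be illustrated with a minimal sketch. It assumes a hypothetical cache keyed by `Thread` with no eviction; the class and method names are invented for illustration and are not taken from `CodecFactory` or PR #959. Each short-lived thread leaves an entry behind after it exits, so the map grows with the number of threads ever seen.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerThreadCacheSketch {
    // Hypothetical per-thread cache: keyed by Thread, never evicted.
    static final Map<Thread, Object> byThread = new ConcurrentHashMap<>();

    // Starts n short-lived threads, each of which registers one cached
    // object under its own Thread key, then exits. Returns the number of
    // entries still held by the map once all threads have terminated.
    static int entriesAfterShortLivedThreads(int n) throws InterruptedException {
        byThread.clear();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> byThread.put(Thread.currentThread(), new Object()));
            t.start();
            t.join(); // the thread has exited here, but its entry remains
        }
        return byThread.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // All five threads are dead, yet the map still references them.
        System.out.println(entriesAfterShortLivedThreads(5));
    }
}
```

In this sketch the map keeps strong references to terminated threads and their cached objects. Possible mitigations, such as a `ThreadLocal`, weak keys, or explicit removal on release, are options to weigh rather than what the PR necessarily does.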
[
https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537607#comment-17537607
]
ASF GitHub Bot commented on PARQUET-2069:
-
theosib-amazon commented on PR #957:
URL:
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537620#comment-17537620
]
ASF GitHub Bot commented on PARQUET-2126:
-
shangxinli commented on PR #959:
URL:
theosib-amazon commented on PR #962:
URL: https://github.com/apache/parquet-mr/pull/962#issuecomment-1128059990
There is no new functionality here. There is just a performance
optimization. It looks like the following tests should already handle this:
BitPackingPerfTest,
[
https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537813#comment-17537813
]
ASF GitHub Bot commented on PARQUET-2149:
-
parthchandra opened a new pull request, #968:
URL:
parthchandra opened a new pull request, #968:
URL: https://github.com/apache/parquet-mr/pull/968
### Jira
This PR addresses the following
[PARQUET-2149](https://issues.apache.org/jira/browse/PARQUET-2149): Implement
async IO for Parquet file reader
### Tests
[
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537823#comment-17537823
]
Parth Chandra commented on PARQUET-2126:
FWIW, I just submitted a PR to implement async io for