[jira] [Created] (PARQUET-2148) Enable uniform decryption with plaintext footer

2022-05-16 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created PARQUET-2148: Summary: Enable uniform decryption with plaintext footer Key: PARQUET-2148 URL: https://issues.apache.org/jira/browse/PARQUET-2148 Project: Parquet

Re: Forward & Backwards Compatibility

2022-05-16 Thread Antoine Pitrou
On Thu, 12 May 2022 09:46:57 -0700 William Butler wrote: > > From the JIRA, the converted type looks something like > > required group FeatureAmounts (MAP) { > repeated group map (MAP_KEY_VALUE) { > required binary key (STRING); > required binary key (STRING); > } > } >

[jira] [Created] (PARQUET-2149) Implement async IO for Parquet file reader

2022-05-16 Thread Parth Chandra (Jira)
Parth Chandra created PARQUET-2149: Summary: Implement async IO for Parquet file reader Key: PARQUET-2149 URL: https://issues.apache.org/jira/browse/PARQUET-2149 Project: Parquet Issue

[jira] [Updated] (PARQUET-2149) Implement async IO for Parquet file reader

2022-05-16 Thread Parth Chandra (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Parth Chandra updated PARQUET-2149: Description: ParquetFileReader's implementation has the following flow (simplified) -

[GitHub] [parquet-mr] theosib-amazon commented on pull request #960: Performance optimization: Move all LittleEndianDataInputStream functionality into ByteBufferInputStream

2022-05-16 Thread GitBox
theosib-amazon commented on PR #960: URL: https://github.com/apache/parquet-mr/pull/960#issuecomment-1127827189 That improvement comes from a larger set of changes. I have a design doc that goes over all those changes plus some more that make it possible to get even more performance

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-05-16 Thread GitBox
theosib-amazon commented on code in PR #959: URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873888258 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -44,8 +45,15 @@ public class CodecFactory implements CompressionCodecFactory

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537614#comment-17537614 ] ASF GitHub Bot commented on PARQUET-2126: theosib-amazon commented on code in PR #959: URL:

[GitHub] [parquet-mr] shangxinli commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-05-16 Thread GitBox
shangxinli commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127847443 My question is when a thread exits, we don't have a corresponding evict operation on the map. Using thread pool might be OK if the thread object is not changed, but not sure if

[GitHub] [parquet-mr] theosib-amazon commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-05-16 Thread GitBox
theosib-amazon commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127885617 > My question is when a thread exits, we don't have a corresponding evict operation on the map. Using thread pool might be OK if the thread object is not changed, but not sure if

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537636#comment-17537636 ] ASF GitHub Bot commented on PARQUET-2126: theosib-amazon commented on PR #959: URL:

[jira] [Commented] (PARQUET-2069) Parquet file containing arrays, written by Parquet-MR, cannot be read again by Parquet-MR

2022-05-16 Thread Timothy Miller (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537604#comment-17537604 ] Timothy Miller commented on PARQUET-2069: Well, I tried modifying prepareForRead to just

[GitHub] [parquet-mr] theosib-amazon commented on pull request #957: PARQUET-2069: Allow list and array record types to be compatible.

2022-05-16 Thread GitBox
theosib-amazon commented on PR #957: URL: https://github.com/apache/parquet-mr/pull/957#issuecomment-1127822921 OK, check out the code changes. I've redone this completely. Now what it does is try out the avro schema, and if that fails, it caches the exception and tries again with an avro

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537611#comment-17537611 ] ASF GitHub Bot commented on PARQUET-2126: theosib-amazon commented on code in PR #959: URL:

[GitHub] [parquet-mr] theosib-amazon commented on a diff in pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-05-16 Thread GitBox
theosib-amazon commented on code in PR #959: URL: https://github.com/apache/parquet-mr/pull/959#discussion_r873884939 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java: ## @@ -184,8 +192,18 @@ public CompressionCodecName getCodecName() { } + /*

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537613#comment-17537613 ] ASF GitHub Bot commented on PARQUET-2126: theosib-amazon commented on PR #959: URL:

[GitHub] [parquet-mr] theosib-amazon commented on pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

2022-05-16 Thread GitBox
theosib-amazon commented on PR #959: URL: https://github.com/apache/parquet-mr/pull/959#issuecomment-1127839048 > If we change it to be per thread, then would it be a problem in the scenario where short living threads come and go? When the thread stopped, we might not know and leak here.

[jira] [Commented] (PARQUET-2069) Parquet file containing arrays, written by Parquet-MR, cannot be read again by Parquet-MR

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537607#comment-17537607 ] ASF GitHub Bot commented on PARQUET-2069: theosib-amazon commented on PR #957: URL:

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537620#comment-17537620 ] ASF GitHub Bot commented on PARQUET-2126: shangxinli commented on PR #959: URL:

[GitHub] [parquet-mr] theosib-amazon commented on pull request #962: Performance optimization to ByteBitPackingValuesReader

2022-05-16 Thread GitBox
theosib-amazon commented on PR #962: URL: https://github.com/apache/parquet-mr/pull/962#issuecomment-1128059990 There is no new functionality here. There is just a performance optimization. It looks like the following tests should already handle this: BitPackingPerfTest,

[jira] [Commented] (PARQUET-2149) Implement async IO for Parquet file reader

2022-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537813#comment-17537813 ] ASF GitHub Bot commented on PARQUET-2149: parthchandra opened a new pull request, #968: URL:

[GitHub] [parquet-mr] parthchandra opened a new pull request, #968: PARQUET-2149: Async IO implementation for ParquetFileReader

2022-05-16 Thread GitBox
parthchandra opened a new pull request, #968: URL: https://github.com/apache/parquet-mr/pull/968 ### Jira This PR addresses the following [PARQUET-2149](https://issues.apache.org/jira/browse/PARQUET-2149): Implement async IO for Parquet file reader ### Tests

[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-05-16 Thread Parth Chandra (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537823#comment-17537823 ] Parth Chandra commented on PARQUET-2126: FWIW, I just submitted a PR to implement async io for