[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119727#comment-17119727
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r432580685



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   It is not a blocking comment and I am fine with it. But generally 
speaking, adding too much nested if/else diverges the code path and causes the 
complexity for reading. One way to avoid duplicating is to wrap them up in 
helper functions. I understand column indexes and bloom filters already did 
that but if we keep adding features like this, the code will become less and 
less readable. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119473#comment-17119473
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

andersonm-1 commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r432393858



##
File path: 
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestBloomEncryption.java
##
@@ -0,0 +1,313 @@
+/*

Review comment:
   Thank you, @gszadovszky , for the suggestion. We've added it to #782 .





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119463#comment-17119463
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky merged pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118716#comment-17118716
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431856097



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -1071,12 +1229,31 @@ public BloomFilterReader 
getBloomFilterDataReader(BlockMetaData block) {
*/
   public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws 
IOException {
 long bloomFilterOffset = meta.getBloomFilterOffset();
+
+if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to 
handle this?

Review comment:
   yep, this works; also, makes sense to me - we should not proceed to read 
a bloom filter header (including seeking its offset), if there is no bloom 
filter in the file; this can be checked via the `bloomFilterOffset` variable - 
it can't be 0 if a bloom filter is present; and if no bloom filter, it can be 0 
only.
   
   I'll remove the TODO comment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118703#comment-17118703
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431856097



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -1071,12 +1229,31 @@ public BloomFilterReader 
getBloomFilterDataReader(BlockMetaData block) {
*/
   public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws 
IOException {
 long bloomFilterOffset = meta.getBloomFilterOffset();
+
+if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to 
handle this?

Review comment:
   yep, this works; also, makes sense to me - we should not proceed to read 
a bloom filter header, if there is no bloom filter in the file; this can be 
checked via the `bloomFilterOffset` variable - it can't be 0 if a bloom filter 
is present; and if no bloom filter, it can be 0 only.
   
   I'll remove the TODO comment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118691#comment-17118691
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r431848468



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
##
@@ -165,10 +200,10 @@ protected static boolean positiveLongFitsInAnInt(long 
value) {
 return (value >= 0) && (value + Integer.MIN_VALUE <= Integer.MAX_VALUE);
   }
 
-  private final EncodingStats encodingStats;
+  protected EncodingStats encodingStats;

Review comment:
   Sure, will change this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117737#comment-17117737
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-634639953


   The current review round is addressed; the checks pass.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117539#comment-17117539
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430964257



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -1114,7 +1299,20 @@ public ColumnIndex readColumnIndex(ColumnChunkMetaData 
column) throws IOExceptio
   return null;
 }
 f.seek(ref.getOffset());
-return 
ParquetMetadataConverter.fromParquetColumnIndex(column.getPrimitiveType(), 
Util.readColumnIndex(f));
+
+column.decryptIfNeededed();

Review comment:
   found a better place for it, inside the `ColumnChunkMetaData` object





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117537#comment-17117537
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430962619



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java
##
@@ -188,7 +189,11 @@ private void checkDeltaByteArrayProblem(FileMetaData meta, 
Configuration conf, B
   // this is okay if not using DELTA_BYTE_ARRAY with the bug
   Set encodings = new HashSet();
   for (ColumnChunkMetaData column : block.getColumns()) {
-encodings.addAll(column.getEncodings());
+try {
+  encodings.addAll(column.getEncodings());
+} catch (KeyAccessDeniedException e) {

Review comment:
   removed the suppression





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117455#comment-17117455
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430902400



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
##
@@ -165,10 +200,10 @@ protected static boolean positiveLongFitsInAnInt(long 
value) {
 return (value >= 0) && (value + Integer.MIN_VALUE <= Integer.MAX_VALUE);
   }
 
-  private final EncodingStats encodingStats;
+  protected EncodingStats encodingStats;

Review comment:
   these fields are accessed by the `EncryptedColumnChunkMetaData` class, 
that sets their values after decrypting the column metadata.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117454#comment-17117454
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r430901237



##
File path: 
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestBloomEncryption.java
##
@@ -0,0 +1,313 @@
+/*

Review comment:
   this will be a part of https://github.com/apache/parquet-mr/pull/782 (it 
will also remove the TestBloomEncryption.java and 
TestColumnIndexEncryption.java files).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113915#comment-17113915
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632617498


   > > I know, this is not the whole feature yet but would like to ensure that 
at the end we will have proper test coverity. So, the question is do we have or 
planning to have tests to verify the working of semi-encrypted files (some 
columns and the footer are plaintext) with previous readers.
   > 
   > We test this manually. We also have an automatic test that runs the 
current reader code on files that were previously created with other writers 
(such as parquet-cpp with encryption; these files are kept in parquet-testing 
repo and fetched from there). But I don't know how to automate running of 
previous readers on the files written by the current code - ?
   
   Good question. We should have the tooling to provide such tests but not sure 
if we already have it. I am fine with testing it manually for now just document 
it in details (e.g. adding to the git comment of the test PR).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113254#comment-17113254
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632140012


   > Now, we are in sync. I will try to keep it this way. ;)
   
   Appreciated! 
   Addressing today's comments will keep me busy for a while :), I'll get back 
next week.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113247#comment-17113247
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428710799



##
File path: 
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryption.java
##
@@ -0,0 +1,215 @@
+/* 
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.hadoop;
+
+import static org.junit.Assert.assertEquals;
+import static org.apache.parquet.hadoop.TestUtils.enforceEmptyDir;
+import static 
org.apache.parquet.hadoop.metadata.CompressionCodecName.UNCOMPRESSED;
+import static org.apache.parquet.schema.MessageTypeParser.parseMessageType;
+
+import java.nio.charset.StandardCharsets;
+import java.util.HashMap;
+import java.util.Random;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.junit.Rule;
+import org.junit.Test;
+
+import org.apache.parquet.crypto.ColumnEncryptionProperties;
+import org.apache.parquet.crypto.FileDecryptionProperties;
+import org.apache.parquet.crypto.FileEncryptionProperties;
+import org.apache.parquet.crypto.ParquetCipher;
+import org.apache.parquet.crypto.StringKeyIdRetriever;
+import org.apache.parquet.example.data.Group;
+import org.apache.parquet.example.data.simple.SimpleGroupFactory;
+
+
+import org.apache.parquet.hadoop.example.GroupReadSupport;
+import org.apache.parquet.hadoop.example.GroupWriteSupport;
+import org.apache.parquet.hadoop.metadata.ColumnPath;
+import org.apache.parquet.io.api.Binary;
+import org.apache.parquet.schema.MessageType;
+import org.junit.rules.TemporaryFolder;
+
+public class TestEncryption {
+
+  @Test
+  public void test() throws Exception {
+Configuration conf = new Configuration();
+Path root = new Path("target/tests/TestEncryption/");
+enforceEmptyDir(conf, root);
+
+Random random = new Random();
+int numberOfEncryptionModes = 5;
+FileEncryptionProperties[] encryptionPropertiesList = new 
FileEncryptionProperties[numberOfEncryptionModes];
+FileDecryptionProperties[] decryptionPropertiesList = new 
FileDecryptionProperties[numberOfEncryptionModes];
+
+// #0 Unencrypted - make sure null encryption properties don't break 
regular Parquet
+encryptionPropertiesList[0] = null;
+decryptionPropertiesList[0] = null;
+
+// #1 Basic encryption setup
+byte[] encryptionKey = new byte[16];
+random.nextBytes(encryptionKey);
+FileEncryptionProperties encryptionProperties = 
FileEncryptionProperties.builder(encryptionKey).build();
+FileDecryptionProperties decryptionProperties = 
FileDecryptionProperties.builder().withFooterKey(encryptionKey).build();
+encryptionPropertiesList[1] = encryptionProperties;
+decryptionPropertiesList[1] = decryptionProperties;
+
+// #2 Default algorithm, non-uniform encryption, key metadata, key 
retriever, AAD prefix
+byte[] footerKey = new byte[16];
+random.nextBytes(footerKey);
+byte[] columnKey0 = new byte[16];
+random.nextBytes(columnKey0);
+byte[] columnKey1 = new byte[16];
+random.nextBytes(columnKey1);
+ColumnEncryptionProperties columnProperties0 = 
ColumnEncryptionProperties.builder("binary_field")
+.withKey(columnKey0)
+.withKeyID("ck0")
+.build();
+ColumnEncryptionProperties columnProperties1 = 
ColumnEncryptionProperties.builder("int32_field")
+.withKey(columnKey1)
+.withKeyID("ck1")
+.build();
+HashMap columnPropertiesMap = new 
HashMap();
+columnPropertiesMap.put(columnProperties0.getPath(), columnProperties0);
+columnPropertiesMap.put(columnProperties1.getPath(), columnProperties1);
+byte[] AADPrefix = root.getName().getBytes(StandardCharsets.UTF_8);
+encryptionProperties = FileEncryptionProperties.builder(footerKey)
+.withFooterKeyID("fk")
+.withAADPrefix(AADPrefix)
+.withEncryptedColumns(columnPropertiesMap)
+.build();
+StringKeyIdRetriever keyRetriever = new StringKeyIdRetriever();

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113242#comment-17113242
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-632130361


   > I know, this is not the whole feature yet but would like to ensure that at 
the end we will have proper test coverity. So, the question is do we have or 
planning to have tests to verify the working of semi-encrypted files (some 
columns and the footer are plaintext) with previous readers.
   
   We test this manually. We also have an automatic test that runs the current 
reader code on files that were previously created with other writers (such as 
parquet-cpp with encryption; these files are kept in parquet-testing repo and 
fetched from there). But I don't know how to automate running of previous 
readers on the files written by the current code - ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113214#comment-17113214
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428673721



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -90,6 +102,10 @@
 
   // Update last two bytes with new page ordinal (instead of creating new page 
AAD from scratch)
   public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) {
+if (newPageOrdinal < 0) {

Review comment:
   I agree on validating the arguments is important. But the proper 
exception to be thrown for a `null` is a `NullPointerException` that would be 
thrown at line 134 anyway. If you really want to validate the argument for 
`null` at the first line of the method I would suggest using 
`java.util.Objects.requireNonNull(Object)`.

##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -78,11 +78,22 @@
 if (rowGroupOrdinal < 0) {
   throw new IllegalArgumentException("Wrong row group ordinal: " + 
rowGroupOrdinal);
 }
-byte[] rowGroupOrdinalBytes = shortToBytesLE(rowGroupOrdinal);
+short shortRGOrdinal = (short) rowGroupOrdinal;
+if (shortRGOrdinal != rowGroupOrdinal) {
+  throw new ParquetCryptoRuntimeException("Encrypted parquet files can't 
have "
+  + "more than Short.MAX_VALUE row groups: " + rowGroupOrdinal);

Review comment:
   I think, writing the actual value instead of referencing a java constant 
is more informative.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113204#comment-17113204
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428562074



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -1131,7 +1329,19 @@ public OffsetIndex readOffsetIndex(ColumnChunkMetaData 
column) throws IOExceptio
   return null;
 }
 f.seek(ref.getOffset());
-return 
ParquetMetadataConverter.fromParquetOffsetIndex(Util.readOffsetIndex(f));
+
+column.decryptIfNeededed();

Review comment:
   Same as above.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java
##
@@ -1035,20 +1154,89 @@ private static void serializeBloomFilters(
 
 long offset = out.getPos();
 column.setBloomFilterOffset(offset);
-
Util.writeBloomFilterHeader(ParquetMetadataConverter.toBloomFilterHeader(bloomFilter),
 out);
-bloomFilter.writeTo(out);
+
+BlockCipher.Encryptor bloomFilterEncryptor = null;
+byte[] bloomFilterHeaderAAD = null;
+byte[] bloomFilterBitsetAAD = null;
+if (null != fileEncryptor) {
+  InternalColumnEncryptionSetup columnEncryptionSetup = 
fileEncryptor.getColumnSetup(column.getPath(), false, (short) cIndex);
+  if (columnEncryptionSetup.isEncrypted()) {
+bloomFilterEncryptor = 
columnEncryptionSetup.getMetaDataEncryptor();
+short columnOrdinal = columnEncryptionSetup.getOrdinal();
+bloomFilterHeaderAAD = 
AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), 
ModuleType.BloomFilterHeader, 
+block.getOrdinal(), columnOrdinal, (short)-1);
+bloomFilterBitsetAAD = 
AesCipher.createModuleAAD(fileEncryptor.getFileAAD(), 
ModuleType.BloomFilterBitset, 
+block.getOrdinal(), columnOrdinal, (short)-1);
+  }
+}
+
+
Util.writeBloomFilterHeader(ParquetMetadataConverter.toBloomFilterHeader(bloomFilter),
 out, 
+bloomFilterEncryptor, bloomFilterHeaderAAD);
+
+ByteArrayOutputStream tempOutStream = new ByteArrayOutputStream();
+bloomFilter.writeTo(tempOutStream);
+byte[] serializedBitset = tempOutStream.toByteArray();
+if (null != bloomFilterEncryptor) {
+  serializedBitset = bloomFilterEncryptor.encrypt(serializedBitset, 
bloomFilterBitsetAAD);
+}
+out.write(serializedBitset);
   }
 }
   }
-
-  private static void serializeFooter(ParquetMetadata footer, 
PositionOutputStream out) throws IOException {
-long footerIndex = out.getPos();
+  
+  private static void serializeFooter(ParquetMetadata footer, 
PositionOutputStream out,
+  InternalFileEncryptor fileEncryptor) throws IOException {
+
 ParquetMetadataConverter metadataConverter = new 
ParquetMetadataConverter();
-org.apache.parquet.format.FileMetaData parquetMetadata = 
metadataConverter.toParquetMetadata(CURRENT_VERSION, footer);
-writeFileMetaData(parquetMetadata, out);
-LOG.debug("{}: footer length = {}" , out.getPos(), (out.getPos() - 
footerIndex));
-BytesUtils.writeIntLittleEndian(out, (int) (out.getPos() - footerIndex));
-out.write(MAGIC);
+
+// Unencrypted file
+if (null == fileEncryptor) {
+  long footerIndex = out.getPos();
+  org.apache.parquet.format.FileMetaData parquetMetadata = 
metadataConverter.toParquetMetadata(CURRENT_VERSION, footer);
+  writeFileMetaData(parquetMetadata, out);
+  LOG.debug("{}: footer length = {}" , out.getPos(), (out.getPos() - 
footerIndex));
+  BytesUtils.writeIntLittleEndian(out, (int) (out.getPos() - footerIndex));
+  out.write(MAGIC);
+  return;
+}
+
+org.apache.parquet.format.FileMetaData parquetMetadata =
+metadataConverter.toParquetMetadata(CURRENT_VERSION, footer, 
fileEncryptor);
+
+// Encrypted file with plaintext footer 
+if (!fileEncryptor.isFooterEncrypted()) {

Review comment:
   I don't know why we think that the footer length is an important 
information but this is the only case where we do not log it. We might want to 
add it here as well.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -1071,12 +1229,31 @@ public BloomFilterReader 
getBloomFilterDataReader(BlockMetaData block) {
*/
   public BloomFilter readBloomFilter(ColumnChunkMetaData meta) throws 
IOException {
 long bloomFilterOffset = meta.getBloomFilterOffset();
+
+if (0 == bloomFilterOffset) { // TODO Junjie - is there a better way to 
handle this?

Review comment:
   @chenjunjiedada, could you please check this?

##
File path: 

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113075#comment-17113075
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428584998



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -705,22 +797,39 @@ public ParquetFileReader(Configuration conf, Path file, 
ParquetMetadata footer)
   paths.put(ColumnPath.get(col.getPath()), col);
 }
 this.crc = options.usePageChecksumVerification() ? new CRC32() : null;
+this.fileDecryptor = fileMetaData.getFileDecryptor();
   }
 
   public ParquetFileReader(InputFile file, ParquetReadOptions options) throws 
IOException {
+this(file, options, null);
+  }
+
+  public ParquetFileReader(InputFile file, ParquetReadOptions options, 
+  FileDecryptionProperties fileDecryptionProperties) throws IOException {
 this.converter = new ParquetMetadataConverter(options);
 this.file = file;
 this.f = file.newStream();
 this.options = options;
+if ((null == fileDecryptionProperties) && (file instanceof 
HadoopInputFile)) {
+  HadoopInputFile hadoopFile = (HadoopInputFile) file;
+  fileDecryptionProperties = getDecryptionProperties(hadoopFile.getPath(), 
hadoopFile.getConfiguration());

Review comment:
   Sounds good. I'll check what can be done here.
   Also, will check how the deprecated readFooter functions can be handled for 
encryption.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112990#comment-17112990
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428535767



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration 
configuration, Path bas
*/
   @Deprecated
   public static final ParquetMetadata readFooter(Configuration configuration, 
Path file) throws IOException {
-return readFooter(configuration, file, NO_FILTER);
+return readFooter(configuration, file, getDecryptionProperties(file, 
configuration));
+  }
+
+  @Deprecated
+  public static final ParquetMetadata readFooter(Configuration configuration, 
Path file, 

Review comment:
   So, all of these new methods are used inside parquet-mr? If not, then I 
don't think we need them. If yes, then please, try to refactor the caller part 
to use the non-deprecated ones instead. If it does not require too much effort.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112833#comment-17112833
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428465094



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -705,22 +797,39 @@ public ParquetFileReader(Configuration conf, Path file, 
ParquetMetadata footer)
   paths.put(ColumnPath.get(col.getPath()), col);
 }
 this.crc = options.usePageChecksumVerification() ? new CRC32() : null;
+this.fileDecryptor = fileMetaData.getFileDecryptor();
   }
 
   public ParquetFileReader(InputFile file, ParquetReadOptions options) throws 
IOException {
+this(file, options, null);
+  }
+
+  public ParquetFileReader(InputFile file, ParquetReadOptions options, 
+  FileDecryptionProperties fileDecryptionProperties) throws IOException {
 this.converter = new ParquetMetadataConverter(options);
 this.file = file;
 this.f = file.newStream();
 this.options = options;
+if ((null == fileDecryptionProperties) && (file instanceof 
HadoopInputFile)) {
+  HadoopInputFile hadoopFile = (HadoopInputFile) file;
+  fileDecryptionProperties = getDecryptionProperties(hadoopFile.getPath(), 
hadoopFile.getConfiguration());

Review comment:
   We need the file `Path` in order to create the file decryption 
properties. `ParquetReadOptions` and `HadoopReadOptions` don't keep/handle the 
file paths.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112812#comment-17112812
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428455071



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration 
configuration, Path bas
*/
   @Deprecated
   public static final ParquetMetadata readFooter(Configuration configuration, 
Path file) throws IOException {
-return readFooter(configuration, file, NO_FILTER);
+return readFooter(configuration, file, getDecryptionProperties(file, 
configuration));
+  }
+
+  @Deprecated
+  public static final ParquetMetadata readFooter(Configuration configuration, 
Path file, 

Review comment:
   These are encrypting versions of the existing `readFooter` functions, 
already marked as deprecated, but still actively used (eg in parquet-cli and in 
Spark). The deprecation comment says "@ deprecated will be removed in 2.0.0". 
Since we are not at parquet 2.0 yet, I've marked the encrypting versions of 
these functions as deprecated too, in order not to forget to handle them when 
working on parquet-2.0.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112252#comment-17112252
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r428008353



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -499,55 +528,100 @@ public static final ParquetMetadata 
readFooter(Configuration configuration, File
*/
   @Deprecated
   public static final ParquetMetadata readFooter(InputFile file, 
MetadataFilter filter) throws IOException {
+return readFooter(file, filter, null);
+  }
+
+  @Deprecated
+  public static final ParquetMetadata readFooter(InputFile file, 
MetadataFilter filter, 

Review comment:
   Why do we introduce new public methods that are deprecated already?

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1465,23 +1674,38 @@ public void writeDataPageV2Header(
 dataEncoding,
 rlByteLength, dlByteLength), to);
   }
-
+  

Review comment:
   Please, check your code to not to introduce any trailing whitespaces.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
##
@@ -442,7 +460,13 @@ static ParquetMetadata readSummaryMetadata(Configuration 
configuration, Path bas
*/
   @Deprecated
   public static final ParquetMetadata readFooter(Configuration configuration, 
Path file) throws IOException {
-return readFooter(configuration, file, NO_FILTER);
+return readFooter(configuration, file, getDecryptionProperties(file, 
configuration));
+  }
+
+  @Deprecated
+  public static final ParquetMetadata readFooter(Configuration configuration, 
Path file, 

Review comment:
   Why do we introduce new public methods that are deprecated already?

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 0, 
serializedFooter.length);
+
+byte[] signedFooterAAD = 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+byte[] encryptedFooterBytes = footerSigner.encrypt(false, 
serializedFooter, nonce, signedFooterAAD);
+byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - 
AesCipher.GCM_TAG_LENGTH, 
+calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+if (!Arrays.equals(gcmTag, calculatedTag)) {
+  throw new TagVerificationException("Signature mismatch in plaintext 
footer");
+}
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter) throws IOException {
+return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter,
+  final InternalFileDecryptor fileDecryptor, final boolean 
encryptedFooter, 
+  final int combinedFooterLength) throws IOException {
+
+final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? 
fileDecryptor.fetchFooterDecryptor() : null);
+final byte[] encryptedFooterAAD = (encryptedFooter? 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
 FileMetaData fileMetaData = filter.accept(new 
MetadataFilterVisitor() {
   @Override
   public FileMetaData visit(NoFilter filter) throws IOException {
-return readFileMetaData(from);
+return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-return readFileMetaData(from, true);
+return readFileMetaData(from, true, footerDecryptor, 
encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(OffsetMetadataFilter filter) throws 
IOException {
-return filterFileMetaDataByStart(readFileMetaData(from), filter);
+

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112247#comment-17112247
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427843291



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   Also - like with the page numbers in the previous comment, having too 
many row groups will adversely affect encryption performance. There are 
per-rowgroup encryption operations, always performed on small  buffers - 
therefore, very slow (no hardware acceleration, etc). Having dozens/hundreds of 
thousands (or more) of them will significantly affect the overall encryption 
time of a file. Moreover, having lots of row groups might lead to having 
smaller data pages, which decreases the performance further.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112078#comment-17112078
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427934346



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   Sounds good to me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112076#comment-17112076
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427932476



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   Thanks for the value checking tip, I'll update the code to use it. As 
for a centralization - I think this function (`createModuleAAD`) is the right 
place. In the encryption feature, ordinals are used only for integrity 
verification - performed via AADs, which are calculated here for both 
encryption and decryption. Everywhere in the code, the ordinals will be an 
`int`. Since the `createModuleAAD` is called only for encrypted files, an 
exception will be thrown only for them (if an ordinal exceeds the 
`Short.MAX_VALUE`).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112063#comment-17112063
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427925111



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   I understand that large numbers of pages/row groups or columns would 
lead to significant performance drawbacks but it should not limit what the spec 
allows otherwise.
   
   Since it is discussed and approved already, I am fine with using `short` 
values for these. What I would suggest adding though is to have the conversion 
from `int` to `short` centralized and and have specific error messages so it is 
clear that the limit reached is a hard limit for the encryption feature. Also, 
if we will publish any description/example for the encryption feature these 
limitations shall be listed there.
   
   One more thing: the check of `intValue > Short.MAX_VALUE` is not complete. 
In case of `intValue` is negative the cast may result in a valid positive 
`short` value. I would suggest using `(short) intValue != intValue` instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112057#comment-17112057
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427922312



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   Still, the variables in this function (and elsewhere) don't have to be 
`short`. After looking at the code, it seems `int`s are better suited for 
managing and checking these parameters (and for enabling any values in 
unencrypted files). I'll make this change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111945#comment-17111945
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427845827



##
File path: 
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java
##
@@ -49,6 +49,13 @@
* @return the index of the first row in the page
*/
   public long getFirstRowIndex(int pageIndex);
+  
+  /**
+   * @param pageIndex
+   * the index of the page
+   * @return the original ordinal of the page in the column chunk
+   */
+  public short getPageOrdinal(int pageIndex);

Review comment:
   Plus - the page headers are also encrypted. These are small, so the 
hardware acceleration is not applied on them. Having dozens/hundreds of 
thousands (or more) of page headers will significantly affect the overall 
encryption time of a file. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111799#comment-17111799
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427759101



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 0, 
serializedFooter.length);
+
+byte[] signedFooterAAD = 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+byte[] encryptedFooterBytes = footerSigner.encrypt(false, 
serializedFooter, nonce, signedFooterAAD);
+byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - 
AesCipher.GCM_TAG_LENGTH, 
+calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+if (!Arrays.equals(gcmTag, calculatedTag)) {
+  throw new TagVerificationException("Signature mismatch in plaintext 
footer");
+}
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter) throws IOException {
+return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter,
+  final InternalFileDecryptor fileDecryptor, final boolean 
encryptedFooter, 
+  final int combinedFooterLength) throws IOException {
+
+final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? 
fileDecryptor.fetchFooterDecryptor() : null);
+final byte[] encryptedFooterAAD = (encryptedFooter? 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
 FileMetaData fileMetaData = filter.accept(new 
MetadataFilterVisitor() {
   @Override
   public FileMetaData visit(NoFilter filter) throws IOException {
-return readFileMetaData(from);
+return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-return readFileMetaData(from, true);
+return readFileMetaData(from, true, footerDecryptor, 
encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(OffsetMetadataFilter filter) throws 
IOException {
-return filterFileMetaDataByStart(readFileMetaData(from), filter);
+return filterFileMetaDataByStart(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 
   @Override
   public FileMetaData visit(RangeMetadataFilter filter) throws IOException 
{
-return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+return filterFileMetaDataByMidpoint(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 });
 LOG.debug("{}", fileMetaData);
-ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+if (!encryptedFooter && null != fileDecryptor) {
+  if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file

Review comment:
   In cryptography, plaintext is an opposite of ciphertext (the result of 
plaintext encryption). 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111782#comment-17111782
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427745576



##
File path: 
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java
##
@@ -49,6 +49,13 @@
* @return the index of the first row in the page
*/
   public long getFirstRowIndex(int pageIndex);
+  
+  /**
+   * @param pageIndex
+   * the index of the page
+   * @return the original ordinal of the page in the column chunk
+   */
+  public short getPageOrdinal(int pageIndex);

Review comment:
   The background discussion is here,
   https://github.com/apache/parquet-mr/pull/776#discussion_r427743861
   
   In the case of pages, encryption becomes an order (or two orders) of 
magnitude slower if the pages are small. Basically, the hardware acceleration 
does not kick in with small pages (and there are additional problems). This is 
another reason not to allow more than 32K pages in a chunk.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111775#comment-17111775
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427743861



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   The links to the discussion on this,
   https://github.com/apache/parquet-format/pull/114#discussion_r234022579
   https://github.com/apache/parquet-format/pull/114#discussion_r232941138
   
http://mail-archives.apache.org/mod_mbox/parquet-dev/201901.mbox/%3CCAO4re1kM4xGMNT4CGrjvA43t-QgUmUwLMskTJfd8ivgCfF8rSw%40mail.gmail.com%3E
   
   The parquet-cpp approach to this is to allow for any number of row groups in 
files without encryption, and to limit it to 32K in encrypted files,
   
https://github.com/apache/arrow/commit/0c5168cf203ddf94dff01c178d649853323acbb8
   "While writing files with so many row groups is a bad idea, people will 
still do it... This .. enables reading the many-row-group files again. Files 
with encrypted row group metadata with that many row groups cannot be read"





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111206#comment-17111206
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427284527



##
File path: 
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java
##
@@ -49,6 +49,13 @@
* @return the index of the first row in the page
*/
   public long getFirstRowIndex(int pageIndex);
+  
+  /**
+   * @param pageIndex
+   * the index of the page
+   * @return the original ordinal of the page in the column chunk
+   */
+  public short getPageOrdinal(int pageIndex);

Review comment:
   Why do we need it as a `short` instead of keeping it as an `int`? As per 
the parquet.thrift spec we never say that we cannot have more pages than 
`32767` even if it is unlikely to have such many.

##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -68,19 +67,32 @@
 
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType, 
   short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {

Review comment:
   Theoretically we don't give hard limits for the number of row groups, 
number of columns or the number of pages in the spec. There is a de facto limit 
that we use thrift lists where the size is an i32 meaning that we should allow 
java int values here.
   Also, there was a post commit discussion in a [related 
PR](https://github.com/apache/parquet-format/pull/142#issuecomment-600294754). 
It is unfortunate that that time parquet-format was already released so I don't 
know if there is a way to properly fix this issue in the format. Anyway, I 
would not restrict these values to a `short`.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java
##
@@ -98,16 +104,21 @@
 ((lengthBuffer[0] & 0xff));
 
 if (ciphertextLength < 1) {
-  throw new IOException("Wrong length of encrypted metadata: " + 
ciphertextLength);
+  throw new ParquetCryptoRuntimeException("Wrong length of encrypted 
metadata: " + ciphertextLength);
 }
 
 byte[] ciphertextBuffer = new byte[ciphertextLength];
 gotBytes = 0;
 // Read the encrypted structure contents
 while (gotBytes < ciphertextLength) {
-  int n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - 
gotBytes);
+  int n;
+  try {
+n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - gotBytes);
+  } catch (IOException e) {

Review comment:
   We should let the `IOException` thrown out.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java
##
@@ -70,23 +69,30 @@
   if (null != AAD) cipher.updateAAD(AAD);
 
   cipher.doFinal(ciphertext, inputOffset, inputLength, plainText, 
outputOffset);
-}  catch (GeneralSecurityException e) {
-  throw new IOException("Failed to decrypt", e);
+}  catch (AEADBadTagException e) {
+  throw new TagVerificationException("GCM tag check failed", e);
+} catch (GeneralSecurityException e) {
+  throw new ParquetCryptoRuntimeException("Failed to decrypt", e);
 }
 
 return plainText;
   }
 
   @Override
-  public byte[] decrypt(InputStream from, byte[] AAD) throws IOException {
+  public byte[] decrypt(InputStream from, byte[] AAD) {
 byte[] lengthBuffer = new byte[SIZE_LENGTH];
 int gotBytes = 0;
 
 // Read the length of encrypted Thrift structure
 while (gotBytes < SIZE_LENGTH) {
-  int n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes);
+  int n;
+  try {
+n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes);
+  } catch (IOException e) {

Review comment:
   We should let the `IOException` thrown out.

##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103615#comment-17103615
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-626266715


   @ggershinsky Do you have time to review the code?  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103614#comment-17103614
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r422577368



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   Not all the changes are isolated. Generally, adding 'if/else' will add 
diverge the code and add the complexity. One other thing is regression 
thinking. If fileEncryptor is null, which would be most of the case, then it 
just executes the existing method without change. It would be less error prone. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103604#comment-17103604
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r422575326



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 0, 
serializedFooter.length);
+
+byte[] signedFooterAAD = 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+byte[] encryptedFooterBytes = footerSigner.encrypt(false, 
serializedFooter, nonce, signedFooterAAD);
+byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - 
AesCipher.GCM_TAG_LENGTH, 
+calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+if (!Arrays.equals(gcmTag, calculatedTag)) {
+  throw new TagVerificationException("Signature mismatch in plaintext 
footer");
+}
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter) throws IOException {
+return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter,
+  final InternalFileDecryptor fileDecryptor, final boolean 
encryptedFooter, 
+  final int combinedFooterLength) throws IOException {
+
+final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? 
fileDecryptor.fetchFooterDecryptor() : null);
+final byte[] encryptedFooterAAD = (encryptedFooter? 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
 FileMetaData fileMetaData = filter.accept(new 
MetadataFilterVisitor() {
   @Override
   public FileMetaData visit(NoFilter filter) throws IOException {
-return readFileMetaData(from);
+return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-return readFileMetaData(from, true);
+return readFileMetaData(from, true, footerDecryptor, 
encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(OffsetMetadataFilter filter) throws 
IOException {
-return filterFileMetaDataByStart(readFileMetaData(from), filter);
+return filterFileMetaDataByStart(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 
   @Override
   public FileMetaData visit(RangeMetadataFilter filter) throws IOException 
{
-return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+return filterFileMetaDataByMidpoint(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 });
 LOG.debug("{}", fileMetaData);
-ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+if (!encryptedFooter && null != fileDecryptor) {
+  if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file

Review comment:
   No. I see most of them on up line but a few on the same line. It is not 
a must. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message 

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-05-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098269#comment-17098269
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

prdpsvs commented on pull request #472:
URL: https://github.com/apache/parquet-mr/pull/472#issuecomment-623066084


   When is this feature going to be available? Any alternatives available today?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093195#comment-17093195
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415622844



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 0, 
serializedFooter.length);
+
+byte[] signedFooterAAD = 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+byte[] encryptedFooterBytes = footerSigner.encrypt(false, 
serializedFooter, nonce, signedFooterAAD);
+byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - 
AesCipher.GCM_TAG_LENGTH, 
+calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+if (!Arrays.equals(gcmTag, calculatedTag)) {
+  throw new TagVerificationException("Signature mismatch in plaintext 
footer");
+}
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter) throws IOException {
+return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter,
+  final InternalFileDecryptor fileDecryptor, final boolean 
encryptedFooter, 
+  final int combinedFooterLength) throws IOException {
+
+final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? 
fileDecryptor.fetchFooterDecryptor() : null);
+final byte[] encryptedFooterAAD = (encryptedFooter? 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
 FileMetaData fileMetaData = filter.accept(new 
MetadataFilterVisitor() {
   @Override
   public FileMetaData visit(NoFilter filter) throws IOException {
-return readFileMetaData(from);
+return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-return readFileMetaData(from, true);
+return readFileMetaData(from, true, footerDecryptor, 
encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(OffsetMetadataFilter filter) throws 
IOException {
-return filterFileMetaDataByStart(readFileMetaData(from), filter);
+return filterFileMetaDataByStart(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 
   @Override
   public FileMetaData visit(RangeMetadataFilter filter) throws IOException 
{
-return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+return filterFileMetaDataByMidpoint(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 });
 LOG.debug("{}", fileMetaData);
-ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+if (!encryptedFooter && null != fileDecryptor) {
+  if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file

Review comment:
   is there a requirement in the code formatting rules in this community to 
keep comments in separate lines?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> 

[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093193#comment-17093193
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415621398



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   encryption is isolated there, with _if (null != fileEncryptor)_  and _if 
(encryptMetaData)_ - similar to the _if (columnIndexRef != null)_  and _if 
(offsetIndexRef != null)_ in the same function.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093189#comment-17093189
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415617873



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -90,6 +102,10 @@
 
   // Update last two bytes with new page ordinal (instead of creating new page 
AAD from scratch)
   public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) {
+if (newPageOrdinal < 0) {

Review comment:
   Ok, I'll add this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093187#comment-17093187
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

gszadovszky commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-619820989


   > > preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli 
can you apply #777 to the encryption branch.
   > 
   > It seems I still don't have permission to do so. @gszadovszky, I did 'git 
cherry-pick' and then git push to encryption branch', but got permission error. 
Do I have to create a PR for it?
   
   @shangxinli, I've done a rebase on the branch `encryption`. Maybe, that's 
why you were unable to push a commit because after the rebase it would not be a 
fast forward commit (would need a force-push).
   (Because of the rebase the Travis fix is already in.)
   If you have any problems with manual push to the repo, let's bring it 
offline. I'm happy to help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092906#comment-17092906
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415446216



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
 return offset;
   }
 
+  private static void verifyFooterIntegrity(InputStream from, 
InternalFileDecryptor fileDecryptor, 
+  int combinedFooterLength) throws IOException {
+
+byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+from.read(nonce);
+byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+from.read(gcmTag);
+
+AesGcmEncryptor footerSigner =  
fileDecryptor.createSignedFooterEncryptor();
+
+byte[] footerAndSignature = ((ByteBufferInputStream) 
from).slice(0).array();
+int footerSignatureLength = AesCipher.NONCE_LENGTH + 
AesCipher.GCM_TAG_LENGTH;
+byte[] serializedFooter = new byte[combinedFooterLength - 
footerSignatureLength];
+System.arraycopy(footerAndSignature, 0, serializedFooter, 0, 
serializedFooter.length);
+
+byte[] signedFooterAAD = 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+byte[] encryptedFooterBytes = footerSigner.encrypt(false, 
serializedFooter, nonce, signedFooterAAD);
+byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - 
AesCipher.GCM_TAG_LENGTH, 
+calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+if (!Arrays.equals(gcmTag, calculatedTag)) {
+  throw new TagVerificationException("Signature mismatch in plaintext 
footer");
+}
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter) throws IOException {
+return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, 
MetadataFilter filter,
+  final InternalFileDecryptor fileDecryptor, final boolean 
encryptedFooter, 
+  final int combinedFooterLength) throws IOException {
+
+final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? 
fileDecryptor.fetchFooterDecryptor() : null);
+final byte[] encryptedFooterAAD = (encryptedFooter? 
AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
 FileMetaData fileMetaData = filter.accept(new 
MetadataFilterVisitor() {
   @Override
   public FileMetaData visit(NoFilter filter) throws IOException {
-return readFileMetaData(from);
+return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-return readFileMetaData(from, true);
+return readFileMetaData(from, true, footerDecryptor, 
encryptedFooterAAD);
   }
 
   @Override
   public FileMetaData visit(OffsetMetadataFilter filter) throws 
IOException {
-return filterFileMetaDataByStart(readFileMetaData(from), filter);
+return filterFileMetaDataByStart(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 
   @Override
   public FileMetaData visit(RangeMetadataFilter filter) throws IOException 
{
-return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+return filterFileMetaDataByMidpoint(readFileMetaData(from, 
footerDecryptor, encryptedFooterAAD), filter);
   }
 });
 LOG.debug("{}", fileMetaData);
-ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+if (!encryptedFooter && null != fileDecryptor) {
+  if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file

Review comment:
   Move comments up





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092904#comment-17092904
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   Should we keep original addRowGroup() intact and just add a new one with 
InternalFileEncryptor to isolate the change's impact? There is not much 
duplicate code if doing so and we can refactor the existing code with helper 
functions.  In the majority use cases, they are non-encryption cases. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092903#comment-17092903
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   Should we keep original addRowGroup() intact and just add a new one with 
InternalFileEncryptor? There is not much duplicate code if doing so and we can 
refactor the existing code with helper functions.  In the majority use cases, 
they are non-encryption cases. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092902#comment-17092902
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415444590



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,

Review comment:
   Should we keep original addRowGroup() intact and just add a new one with 
InternalFileEncryptor? There is not much duplicate code if doing so and we can 
refactor the existing code with helper functions.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092901#comment-17092901
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415440161



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,
+  InternalFileEncryptor fileEncryptor) throws IOException {
+
 //rowGroup.total_byte_size = ;

Review comment:
   Can we remove this? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092900#comment-17092900
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415440161



##
File path: 
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##
@@ -463,14 +486,29 @@ ConvertedType 
convertToConvertedType(LogicalTypeAnnotation logicalTypeAnnotation
 }
   }
 
-  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block) {
+  private void addRowGroup(ParquetMetadata parquetMetadata, List 
rowGroups, BlockMetaData block,
+  InternalFileEncryptor fileEncryptor) throws IOException {
+
 //rowGroup.total_byte_size = ;

Review comment:
   Can we remove this? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092890#comment-17092890
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r415429416



##
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##
@@ -90,6 +102,10 @@
 
   // Update last two bytes with new page ordinal (instead of creating new page 
AAD from scratch)
   public static void quickUpdatePageAAD(byte[] pageAAD, short newPageOrdinal) {
+if (newPageOrdinal < 0) {

Review comment:
   Since this is a public method, can we validate pageAAD also? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092885#comment-17092885
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-619640880


   > preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli 
can you apply #777 to the encryption branch.
   
   It seems I still don't have permission to do so. @gszadovszky, I did 'git 
cherry-pick' and then git push to encryption branch', but got permission error. 
Do I have to create a PR for it?  



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089768#comment-17089768
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on issue #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617843400


   the PR is ready for review



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088391#comment-17088391
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on issue #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617009069


   > can you squash it to one single commit to make the review easier?
   
   This can be reviewed as a single commit at
   https://github.com/apache/parquet-mr/pull/776/files
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088389#comment-17088389
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on issue #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-617008320


   preferably reviewed after the Travis fix is in. @gszadovszky @shangxinli can 
you apply #777 to the encryption branch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087801#comment-17087801
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

shangxinli commented on issue #776:
URL: https://github.com/apache/parquet-mr/pull/776#issuecomment-616599564


   Is this ready to review? Since there is no comment yet, can you squash it to 
one single commit to make the review easier? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-04-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072479#comment-17072479
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #776: PARQUET-1229: Parquet MR encryption
URL: https://github.com/apache/parquet-mr/pull/776
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2020-03-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071492#comment-17071492
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #643: PARQUET-1229 Parquet-MR encryption
URL: https://github.com/apache/parquet-mr/pull/643
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2019-05-11 Thread Gidon Gershinsky (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837768#comment-16837768
 ] 

Gidon Gershinsky commented on PARQUET-1229:
---

Once parquet-mr-1.11.0 is released, a new PR will be created for this jira.

> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2019-05-10 Thread Ken Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837203#comment-16837203
 ] 

Ken Wang commented on PARQUET-1229:
---

I see the original PR was close, but I didn't find the code in master branch. 

> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2019-02-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764761#comment-16764761
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky commented on pull request #472: PARQUET-1229: Encryption/decryption 
support to the existing Parquet classes
URL: https://github.com/apache/parquet-mr/pull/472
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>  Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1229) parquet-mr code changes for encryption support

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459521#comment-16459521
 ] 

ASF GitHub Bot commented on PARQUET-1229:
-

ggershinsky opened a new pull request #472: PARQUET-1229: Encryption/decryption 
support to the existing Parquet classes
URL: https://github.com/apache/parquet-mr/pull/472
 
 
   Depends on #471 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> parquet-mr code changes for encryption support
> --
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-mr
>Reporter: Gidon Gershinsky
>Assignee: Gidon Gershinsky
>Priority: Major
>
> Addition of encryption/decryption support to the existing Parquet classes and 
> APIs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)